Modern Computer Algebra, 3rd Edition
Computer algebra systems are now ubiquitous in all areas of science and engineer-
ing. This highly successful textbook, widely regarded as the “bible of computer
algebra”, gives a thorough introduction to the algorithmic basis of the mathematical
engine in computer algebra systems. Designed to accompany one- or two-semester
courses for advanced undergraduate or graduate students in computer science or
mathematics, its comprehensiveness and reliability have also made it an essential
reference for professionals in the area.
Special features include: detailed study of algorithms including time analysis;
implementation reports on several topics; complete proofs of the mathematical
underpinnings; and a wide variety of applications (among others, in chemistry,
coding theory, cryptography, computational logic, and the design of calendars and
musical scales). A great deal of historical information and illustration enlivens the
text.
In this third edition, errors have been corrected and much of the Fast Euclidean
Algorithm chapter has been renovated.
Joachim von zur Gathen has a PhD from Universität Zürich and has taught at the
University of Toronto and the University of Paderborn. He is currently a professor
at the Bonn–Aachen International Center for Information Technology (B-IT) and
the Department of Computer Science at Universität Bonn.
Jürgen Gerhard has a PhD from Universität Paderborn. He is now Director of
Research at Maplesoft in Canada, where he leads research collaborations with
partners in Canada, France, Russia, Germany, the USA, and the UK, as well as
a number of consulting projects for global players in the automotive industry.
Modern Computer Algebra
Third Edition
JOACHIM VON ZUR GATHEN
Universität Bonn
JÜRGEN GERHARD
Maplesoft, Waterloo
CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town,
Singapore, São Paulo, Delhi, Mexico City
Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9781107039032
A catalogue record for this publication is available from the British Library
To Mercedes Cappuccino
Contents
Introduction 1
I Euclid 23
2 Fundamental algorithms 29
2.1 Representation and addition of numbers . . . . . . . . . . . . . . 29
2.2 Representation and addition of polynomials . . . . . . . . . . . . 32
2.3 Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.4 Division with remainder . . . . . . . . . . . . . . . . . . . . . . 37
Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
II Newton 217
IV Fermat 511
V Hilbert 585
24 Applications 677
24.1 Gröbner proof systems . . . . . . . . . . . . . . . . . . . . . . . 677
24.2 Petri nets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 679
24.3 Proving identities and analysis of algorithms . . . . . . . . . . . 681
24.4 Cyclohexane revisited . . . . . . . . . . . . . . . . . . . . . . . 685
Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 697
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 698
Appendix 701
Keeping up to date
http://cosec.bit.uni-bonn.de/science/mca/
A Beggar’s Book Out-worths a Noble’s Blood.1
William Shakespeare (1613)
[Arabic quotation]
Ghiyāth al-Dīn Jamshīd bin Masʿūd bin Maḥmūd al-Kāshī (1427)
matrices, polynomials, etc. They will become an indispensable tool for the sci-
entist and engineer, from students to the work place. These systems are now be-
coming integrated with other software, like numerical packages, CAD/CAM, and
graphics.
The goal of this text is to give an introduction to the basic methods and tech-
niques of computer algebra. Our focus is threefold:
◦ complete presentation of the mathematical underpinnings,
◦ asymptotic analysis of our algorithms, sometimes “Oh-free”,
◦ development of asymptotically fast methods.
It is customary to give bounds on running times of algorithms (if any are given
at all) in a “big-Oh” form (explained in Section 25.7), say as O(n log n) for the
FFT. We often prove “Oh-free” bounds in the sense that we identify the numeri-
cal coefficient of the leading term, as (3/2) n log_2 n in the example; we may then add
O(smaller terms). But we have not played out the game of minimizing these coef-
ficients; the reader is encouraged to find smaller constants herself.
Many of these fast methods have been known for a quarter of a century, but
their impact on computer algebra systems has been slight, partly due to an “unfor-
tunate myth” (Bailey, Lee & Simon 1990) about their practical (ir)relevance. But
their usefulness has been forcefully demonstrated in the last few years; we can now
solve problems—for example, the factorization of polynomials—of a size that was
unassailable a few years ago. We expect this success to expand into other areas of
computer algebra, and indeed hope that this text may contribute to this develop-
ment. The full treatment of these fast methods motivates the “modern” in its title.
(Our title is a bit risqué, since even a “modern” text in a rapidly evolving discipline
such as ours will obsolesce quickly.)
The basic objects of computer algebra are numbers and polynomials. Through-
out the text, we stress the structural and algorithmic similarities between these two
domains, and also where the similarities break down. We concentrate on polyno-
mials, in particular univariate polynomials over a field, and pay special attention
to finite fields.
We will consider arithmetic algorithms in some basic domains. The tasks that
we will analyze include conversion between representations, addition, subtraction,
multiplication, division, division with remainder, greatest common divisors, and
factorization. The domains of fundamental importance for computer algebra are
the natural numbers, the rational numbers, finite fields, and polynomial rings.
Our three goals, as stated above, are too ambitious to keep up throughout. In
some chapters, we have to content ourselves with sketches of methods and out-
looks on further results. Due to space limitations, we sometimes have recourse to
the lamentable device of “leaving the proof to the reader”. Don’t worry, be happy:
solutions to the corresponding exercises are available on the book’s web site.
After writing most of the material, we found that we could structure the book
into five parts, each named after a mathematician that made a pioneering con-
tribution on which some (but, of course, not all) of the modern methods in the
respective part rely. In each part, we also present selected applications of some of
the algorithmic methods.
The first part EUCLID examines Euclid’s algorithm for calculating the gcd,
and presents the subresultant theory for polynomials. Applications are numerous:
modular algorithms, continued fractions, Diophantine approximation, the Chinese
Remainder Algorithm, secret sharing, and the decoding of BCH codes.
The second part NEWTON presents the basics of fast arithmetic: FFT-based mul-
tiplication, division with remainder and polynomial equation solving via Newton
iteration, and fast methods for the Euclidean Algorithm and the solution of sys-
tems of linear equations. The FFT originated in signal processing, and we discuss
one of its applications, image compression.
The third part GAUSS deals exclusively with polynomial problems. We start
with univariate factorization over finite fields, and include the modern methods
that make attacks on enormously large problems feasible. Then we discuss polyno-
mials with rational coefficients. The two basic algorithmic ingredients are Hensel
lifting and short vectors in lattices. The latter has found many applications, from
breaking certain cryptosystems to Diophantine approximation.
The fourth part FERMAT is devoted to two integer problems that lie at the foun-
dation of algorithmic number theory: primality testing and factorization. The most
famous modern application of these classical topics is in public key cryptography.
The fifth part HILBERT treats three different topics which are somewhat more
advanced than the rest of the text, and where we can only exhibit the foundations
of a rich theory. The first area is Gröbner bases, a successful approach to deal with
multivariate polynomials, in particular questions about common roots of several
polynomials. The next topic is symbolic integration of rational and hyperexponen-
tial functions. The final subject is symbolic summation; we discuss polynomial
and hypergeometric summation.
The text concludes with an appendix that presents some foundational material in
the language we use throughout the book: The basics of groups, rings, and fields,
linear algebra, probability theory, asymptotic O-notation, and complexity theory.
Each of the first three parts contains an implementation report on some of the
algorithms presented in the text. As case studies, we use two special purpose pack-
ages for integer and polynomial arithmetic: NTL by Victor Shoup and BIPOLAR
by the authors.
Most chapters end with some bibliographical and historical notes or supple-
mentary remarks, and a variety of exercises. The latter are marked according
to their difficulty: exercises with a ∗ are somewhat more advanced, and the few
marked with ∗∗ are more difficult or may require material not covered in the text.
Laborious (but not necessarily difficult) exercises are marked by a long arrow −→ .
The book’s web page http://cosec.bit.uni-bonn.de/science/mca/ pro-
vides some solutions.
This book presents foundations for the mathematical engine underlying any
computer algebra system, and we give substantial coverage—often, but not al-
ways, up to the state of the art—for the material of the first three parts, dealing
with Euclid’s algorithm, fast arithmetic, and the factorization of polynomials. But
we hasten to point out some unavoidable shortcomings. For one, we cannot cover
completely even those areas that we discuss, and our treatment leaves out ma-
jor interesting developments in the areas of computational linear algebra, sparse
multivariate polynomials, combinatorics and computational number theory, quan-
tifier elimination and solving polynomial equations, and differential and difference
equations. Secondly, some important questions are not touched at all; we only
mention computational group theory, parallel computation, computing with tran-
scendental functions, isolating real and complex roots of polynomials, and the
combination of symbolic and numeric methods. Finally, a successful computer
algebra system involves much more than just the mathematical engine: efficient
data structures, a fast kernel and a large compiled or interpreted library, user inter-
face, graphics capability, interoperability of software packages, clever marketing,
etc. These issues are highly technology-dependent, and there is no single good
solution for them.
The present book can be used as the textbook for a one-semester or a two-
semester course in computer algebra. The basic arithmetic algorithms are dis-
cussed in Chapters 2 and 3, and Sections 4.1–4.4, 5.1–5.5, 8.1–8.2, 9.1–9.4, 14.1–
14.6, and 15.1–15.2. In addition, a one-semester undergraduate course might be
slanted towards computational number theory (9.5, 18.1–18.4, and parts of Chap-
ter 20), geometry (21.1–21.6), or integration (4.5, 5.11, 6.2–6.4, and Chapter 22),
supplemented by fun applications from 4.6–4.8, 5.6–5.9, 6.8, 9.6, Chapter 13, and
Chapters 1 and 24. A two-semester course could teach the “basics” and 6.1–6.7,
10.1–10.2, 15.4–15.6, 16.1–16.5, 18.1–18.3, 19.1–19.2, 19.4, 19.5 or 19.6–19.7,
and one or two of Chapters 21–23, maybe with some applications from Chapters
17, 20, and 24. A graduate course can be more eclectic. We once taught a course
on “factorization”, using parts of Chapters 14–16 and 19. Another possibility is
a graduate course on “fast algorithms” based on Part II. For any of these sugges-
tions, there is enough material so that an instructor will still have plenty of choice
of which areas to skip. The logical dependencies between the chapters are given
in Figure 1.
The prerequisite for such a course is linear algebra and a certain level of mathe-
matical maturity; particularly useful is a basic familiarity with algebra and analysis
of algorithms. However, to allow for the large variations in students’ background,
we have included an appendix that presents the necessary tools. For that mate-
rial, the borderline between the boring and the overly demanding varies too much
FIGURE 1: Leitfaden. [A diagram of the logical dependencies between the chapters, grouped
into the parts EUCLID, NEWTON, GAUSS, FERMAT, and HILBERT.]
to get it right for everyone. If those notions and tools are unfamiliar, an instructor
may have to expand beyond the condensed description in the appendix. Otherwise,
most of the presentation is self-contained, and the exceptions are clearly indicated.
By their nature, some of the applications assume a background in the relevant area.
The beginning of each part presents a biographical sketch of the scientist after
which it is named, and throughout the text we indicate some of the origins of our
material. For lack of space and competence, this is not done in a systematic way,
let alone with the goal of completeness, but we do point to some early sources,
often centuries old, and quote some of the original work. Interest in such historical
issues is, of course, a matter of taste. It is satisfying to see how many algorithms
are based on venerable methods; our essentially “modern” aspect is the concern
with asymptotic complexity and running times, faster and faster algorithms, and
their computer implementation.
The 2003 edition. The great French mathematician Pierre Fermat never pub-
lished a thing in his lifetime. One of the reasons was that in his days, books and
other publications often suffered vitriolic attacks for perceived errors, major or
minor, frequently combined with personal slander.
Our readers are friendlier. They pointed out about 160 errors and possible im-
provements in the 1999 edition to us, but usually sugared their messages with
sweet compliments. Thanks, friends, for helping us feel good and produce a better
book now! We gratefully acknowledge the assistance of Sergeı̆ Abramov, Michael
Barnett, Andreas Beschorner, Murray Bremner, Peter Bürgisser, Michael Clausen,
Rob Corless, Abhijit Das, Ruchira Datta, Wolfram Decker, Emrullah Durucan,
Friedrich Eisenbrand, Ioannis Emiris, Torsten Fahle, Benno Fuchssteiner, Rod
Glover, David Goldberg, Mitch Harris, Dieter Herzog, Andreas Hirn, Mark
van Hoeij, Dirk Jung, Kyriakos Kalorkoti, Erich Kaltofen, Karl-Heinz Kiyek,
Andrew Klapper, Don Knuth, Ilias Kotsireas, Werner Krandick, Daniel Lauer,
Daniel Bruce Lloyd, Martin Lotz, Thomas Lücking, Heinz Lüneburg, Mantsika
Matooane, Helmut Meyn, Eva Mierendorff, Daniel Müller, Olaf Müller, Seyed
Hesameddin Najafi, Michael Nöcker, Michael Nüsken, Andreas Oesterhelt, Daniel
Panario, Thilo Pruschke, Arnold Schönhage, Jeff Shallit, Hans Stetter, David
Theiwes, Thomas Viehmann, Volker Weispfenning, Eugene Zima, and Paul
Zimmermann.
Our thanks also go to Christopher Creutzig, Katja Daubert, Torsten Metzner,
Eva Müller, Peter Serocka, and Marianne Wehry.
Besides correcting the known errors and (unintentionally) introducing new ones,
we smoothed and updated various items, and made major changes in Chapters 3,
15, and 22.
Paderborn, February 2002
The 2013 edition. Many people have implemented algorithms from this text
and were happy with it. A few have tried their hands at the fast Euclidean al-
gorithm from Chapter 11 and became unhappy. No wonder — the description
contained a bug which squeezed through an unseen crack in our proof of correct-
ness. That particular crack has been sealed for the present edition, and in fact
much of Chapter 11 is renovated. In addition, about 80 other errors have been
corrected. Thanks go to John R. Black, Murray Bremner, Winfried Bruns, Evan
Jingchi Chen, Howard Cheng, Stefan Dreker, Olav Geil, Giulio Genovese, Stefan
Gerhold, Charles-Antoine Giuliani, Sebastian Grimsell, Masaaki Kanno, Tom
Koornwinder, Heiko Körner, Volker Krummel, Martina Kuhnert, Jens Kunerle,
Note. We produced the postscript files for this book with the invaluable help of
the following software packages: Leslie Lamport’s LaTeX, based on Don Knuth’s TeX,
Klaus Lagally’s ArabTeX, Oren Patashnik’s BibTeX, Pehong Chen’s MakeIndex, Maple,
MuPAD, Victor Shoup’s NTL, Thomas Williams’ and Colin Kelley’s gnuplot, the Persis-
tence of Vision Ray Tracer POV-Ray, and xfig.
Clarke’s Third Law:
Any sufficiently advanced technology is indistinguishable from magic.
Arthur C. Clarke (c. 1969)
There are too goddamned many machines that spew out data too fast.
Robert Ludlum (1995)
1 The advancement and perfection of mathematics are intimately connected with the prosperity of the State.
1 Cyclohexane, cryptography, codes, and computer algebra
Three examples in this chapter illustrate some applications of the ideas and meth-
ods of computer algebra: the spatial configurations (conformations) of the cy-
clohexane molecule, a chemical problem with an intriguing geometric solution;
a cryptographic protocol for the secure transmission of messages; and distributed
codes for sharing secrets or sending packets over a faulty network. Throughout
this book you will find such sample applications in a wide variety of areas, from
the design of calendars and musical scales to image compression and the intersec-
tion of algebraic curves. The last section in this chapter gives a concise overview
of some computer algebra systems.
FIGURE 1.1: The structure formula for cyclohexane (C6H12), and the orientation we give
to the bonds a_1, . . . , a_6.
We start with an example from chemistry. It illustrates the three typical steps in
mathematical applications: creating a mathematical model of the problem at hand,
“solving” the model, and interpreting the solution in the original problem. Usually,
none of these steps is straightforward, and one often has to go back and modify the
approach.
Cyclohexane C6 H12 (Figure 1.1), a molecule from organic chemistry, is a hydro-
carbon consisting of six carbon atoms (C) connected to each other in a cycle and
twelve hydrogen atoms (H), two attached to each carbon atom. The four bonds of
one carbon atom (two bonds to adjacent carbon atoms and two bonds to hydrogen
atoms) are arranged in the form of a tetrahedron, with the carbon in the center and
its bonds pointing to the four corners. The angle α between any two bonds is about
109 degrees (the precise value of α satisfies cos α = −1/3). Two adjacent carbon
atoms may freely rotate around the bond between them.
Chemists have observed that cyclohexane occurs in two incongruent conforma-
tions (which are not transformable into each other by rotations and reflections),
a “chair” (Figure 1.2) and a “boat” (Figure 1.3), and experiments have shown
that the “chair” occurs far more frequently than the “boat”. The frequency of
occurrence of a conformation depends on its free energy—a general rule is that
molecules try to minimize the free energy—which in turn depends on the spatial
structure.
When modeling the molecule by means of plastic tubes (Figure 1.4) representing
the carbon atoms and the bonds between them (omitting the hydrogen atoms for
simplicity) in such a way that rotations around the bonds are possible, one observes
that there is a certain amount of freedom in moving the atoms by rotations around
the bonds in the “boat” conformation (we will call it the flexible conformation),
but that the “chair” conformation is rigid, and that it appears to be impossible to
get from the “boat” to the “chair” conformation. Can we mathematically model
and, if possible, explicitly describe this behavior?
We let a_1, . . . , a_6 ∈ R^3 be the orientations of the six bonds in three-space, so
that all six vectors point in the same direction around the cyclic structure (Fig-
ure 1.1), and normalize the distance between two adjacent carbon atoms to be one.
By u ⋆ v = u_1 v_1 + u_2 v_2 + u_3 v_3 we denote the usual inner product of two vectors
u = (u_1, u_2, u_3) and v = (v_1, v_2, v_3) in R^3. The cosine theorem says that u ⋆ v =
||u||_2 · ||v||_2 · cos β, where ||u||_2 = (u ⋆ u)^{1/2} is the Euclidean norm and β ∈ [0, π] is
the angle between u and v, when both vectors are rooted at the origin. The above
conditions then lead to the following system of equations:
    a_1 ⋆ a_1 = a_2 ⋆ a_2 = · · · = a_6 ⋆ a_6 = 1,
    a_1 ⋆ a_2 = a_2 ⋆ a_3 = · · · = a_6 ⋆ a_1 = 1/3,                                   (1)
    a_1 + a_2 + · · · + a_6 = 0.
The first line says that the length of each bond is 1. The second line expresses the
fact that the angle between two bonds adjacent to the same carbon atom is α (the
cosine is 1/3 instead of −1/3 since, seen from the carbon atom, the two bonds
FIGURE 1.3: Three “boat” conformations of cyclohexane and a stereo image of the middle
one (see Figure 1.2 for a viewing instruction).
have opposite orientation). Finally, the last line expresses the cyclic nature of the
structure.
Together, (1) comprises 6 + 6 + 3 = 15 equations in the 18 coordinates of the
points a_1, . . . , a_6. The first ones are quadratic, and the last ones are linear. There
is still redundancy coming from the whole structure’s possibility to move and ro-
tate around freely in three-space. One possibility to remedy this is to introduce
three more equations expressing the fact that a_1 is parallel to the x-axis and a_2
to the x, y-plane. These equations can be solved with a computer alge-
bra system, but the resulting description of the solutions is highly complicated and
non-intuitive.
FIGURE 1.4: A plumbing knee model of cyclohexane, with nearly right angles.
[FIGURE 1.5: the closed curve of flexible conformations, with the “boat” conformation
marked.]
We built the simple physical “model” in Figure 1.4 of something similar to cy-
clohexane as follows. We bought six plastic plumbing “knees”, with approxi-
mately a right angle. (German plumbing knees actually have an angle of about 93
degrees, for some deep hydrodynamic reason.) This differs considerably from the
109 degrees of the carbon tetrahedron, but on the other hand, it only cost about
€7. We stuck the six parts together and pulled an elastic cord through them to
keep them from falling apart. Then one can smoothly turn the structure through
the flexible conformations corresponding to the curve in Figure 1.5, physically
“feeling” the curve. Pulling the whole thing forcibly apart, one can also get into
the “chair” position. Now no wiggling or gentle twisting will move the structure;
it is quite rigid.
In classical symmetric cryptosystems, Alice and Bob use the same key for
both encryption and decryption. The RSA cryptosystem, described in detail in
plaintext x  −→  encryption ε  −→  transmitted ciphertext y = ε(x)  −→  decryption δ  −→  decrypted text δ(y)
Long messages are broken into pieces. Now Bob wants to send a message x ∈
{0, . . . , N − 1} to Alice that only she can read. He looks up her public key (N, e),
computes the encryption y = ε(x) ∈ {0, . . . , N − 1} of x such that y ≡ x^e mod N,
and sends y. Computing y can be done very efficiently using repeated squaring
(Algorithm 4.8). To decrypt y, Alice uses her private key (N, d) to compute the
decryption x∗ = δ(y) ∈ {0, . . . , N − 1} of y with x∗ ≡ y^d mod N. Now Euler’s
theorem (Section 18.1) says that x^{ϕ(N)} ≡ 1 mod N, if x and N are coprime. Thus

    x∗ ≡ y^d ≡ x^{ed} = x · (x^{ϕ(N)})^k ≡ x mod N,   where ed = 1 + k·ϕ(N) for some k ∈ N,

and it follows that x∗ = x since x and x∗ are both in {0, . . . , N − 1}. In fact, x∗ = x
also holds when x and N have a nontrivial common divisor.
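To make the protocol concrete, here is a minimal Python sketch with tiny, purely illustrative parameters (real keys use primes of several hundred digits); the primes, exponent, and message below are hypothetical, Python's built-in pow performs the repeated squaring of Algorithm 4.8, and pow(e, -1, phi) (Python 3.8 or later) computes the modular inverse via the Extended Euclidean Algorithm.

    # Toy RSA round trip; all parameters are illustrative, not realistic.
    p, q = 47, 59                 # Alice's secret primes
    N = p * q                     # public modulus
    phi = (p - 1) * (q - 1)       # Euler's totient of N
    e = 17                        # public exponent, coprime to phi
    d = pow(e, -1, phi)           # private exponent with e*d = 1 mod phi

    x = 2014                      # Bob's message, 0 <= x < N
    y = pow(x, e, N)              # encryption by repeated squaring
    assert pow(y, d, N) == x      # Alice's decryption recovers x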
Without knowledge of d, however, it seems currently infeasible to compute x
from N, e, and y. The only known way to do this is to factor N into its prime
factors, and then to compute d with the Extended Euclidean Algorithm as Alice
did, but factoring integers (Chapter 19) is extremely time-consuming: 300 digit
numbers are beyond the capabilities of currently known factoring algorithms even
on modern supercomputers or workstation networks.
Software packages like PGP (“Pretty Good Privacy”; see Zimmermann (1996)
and http://www.openpgp.org) use the RSA cryptosystem for encrypting and
authenticating e-mail and data files, and for secure communication over local area
networks or the internet.
lose them, in Chapter 7.) An obvious solution would be to send the message l + 1
times, but this increases message length and hence slows down communication
speed by a factor of l + 1 and is unacceptable even for small values of l.
Again we may assume that each packet is encoded as an element of some field F,
and that the whole message is the sequence of packets f_0, . . . , f_{n−1}. Then we
choose k = n + l distinct evaluation points u_0, . . . , u_{k−1} ∈ F and send the k packets
f(u_0), . . . , f(u_{k−1}) over the net. Assuming that the sequence number i is contained
in the packet header and that the recipient knows u_0, . . . , u_{k−1}, she can reconstruct
the original message—the (coefficients of the) polynomial f—from any n of the
surviving packets by interpolation (and may discard any others).
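The following Python sketch plays through the scheme over a small prime field F_p; the prime p, the message, and the helper routines are hypothetical choices made only for illustration.

    p = 101                                    # packets are elements of F_p = {0, ..., p-1}
    message = [17, 42, 5]                      # f_0, ..., f_{n-1}, the coefficients of f
    n, l = len(message), 2                     # tolerate the loss of up to l packets
    u = list(range(n + l))                     # distinct evaluation points u_0, ..., u_{k-1}

    def evaluate(f, x):                        # Horner's rule modulo p
        r = 0
        for c in reversed(f):
            r = (r * x + c) % p
        return r

    packets = [(ui, evaluate(message, ui)) for ui in u]
    received = packets[l:]                     # the first l packets are lost in transit

    def interpolate(points):                   # Lagrange interpolation over F_p
        result = [0] * len(points)
        for i, (xi, yi) in enumerate(points):
            num, den = [1], 1                  # build prod_{j != i} (x - x_j) and its value at x_i
            for j, (xj, _) in enumerate(points):
                if j != i:
                    num = [(a - xj * b) % p for a, b in zip([0] + num, num + [0])]
                    den = den * (xi - xj) % p
            c = yi * pow(den, -1, p) % p
            result = [(r + c * a) % p for r, a in zip(result, num)]
        return result

    assert interpolate(received) == message    # any n surviving packets reconstruct f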
The above scheme can also be used to distribute n data blocks (for example,
records of a database) among k = n + l computers in such a way that after failure
of up to l of them the complete information can still be recovered. The difference
between secret sharing and this scheme is that in the former the relevant piece of
information is only one coefficient of f , while in the latter it is the whole polyno-
mial.
The above methods can be viewed as problems in distributed data structures.
Parallel and distributed computing is an active area of research in computer sci-
ence. Developing algorithms and data structures for parallel computing is a non-
trivial task, often more challenging than for sequential computing. The amount
of parallelism that a particular problem admits is sometimes difficult to detect. In
computer algebra, modular algorithms (Chapters 4 and 5) provide a “natural”
parallelism for a certain class of algebraic problems. These are divided into smal-
ler problems by reduction modulo several “primes”, the subproblems can be solved
independently in parallel, and the solution is put together using the Chinese Re-
mainder Algorithm 5.4. An important particular case is when the “primes” are
linear polynomials x − u_i. Then modular reduction corresponds to evaluation at u_i,
and the Chinese Remainder Algorithm is just interpolation at all points u_i, as in the
examples above.
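A small Python sketch of this pattern: an integer product is computed modulo several (hypothetical) primes, and the Chinese Remainder Algorithm reassembles the true result; correctness relies on the product being smaller than the product of the moduli.

    def chinese_remainder(residues, moduli):   # moduli assumed pairwise coprime
        m = 1
        for mi in moduli:
            m *= mi
        x = 0
        for ri, mi in zip(residues, moduli):
            ni = m // mi                       # product of the other moduli
            x += ri * ni * pow(ni, -1, mi)     # inverse computed by the Extended Euclidean Algorithm
        return x % m

    primes = [1009, 1013, 1019]
    a, b = 12345, 67890
    images = [(a % pi) * (b % pi) % pi for pi in primes]     # independent small subproblems
    assert chinese_remainder(images, primes) == a * b        # valid since a*b < 1009*1013*1019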
If the interpolation points are roots of unity (Section 8.2), then there is a par-
ticularly efficient method for evaluating and interpolating at those points, the Fast
Fourier Transform (Chapters 8 and 13). It is the starting point for efficient algo-
rithms for polynomial (and integer) arithmetic in Part II.
work will live long after all the text-books of the present day are superseded and
forgotten. It is one of the noblest monuments of antiquity; no mathematician
worthy of the name can afford not to know Euclid. Since the invention of
non-Euclidean geometry and the new ideas of Klein and Hilbert in the 19th
century, we don’t take the Elements quite that seriously any longer.
In the Dark Ages, Europe’s intellectuals were more interested in the maximal
number of angels able to dance on a needle tip, and the Elements mainly survived
in the Arabic civilization. The first translation from the Greek was done by
Al-Ḥajjāj bin Yūsuf bin Maṭar (c. 786–835) for Caliph Hārūn al-Rashīd
(766–809). These were later translated into Latin, and Erhard Ratdolt produced in
Venice the first printed edition of the Elements in 1482; in fact, this was the first
mathematics book to be printed. On page 23 we reproduce its first page from a
copy in the library of the University of Basel; the underlining is possibly by the
lawyer Bonifatius Amerbach, its 16th century owner, who was a friend of
Erasmus.
Most of the Elements deals with geometry, but Books 7, 8, and 9 treat arithmetic.
Proposition 2 of Book 7 asks: “Given two numbers not prime to one another, to find
their greatest common measure”, and the core of the algorithm goes as follows: “Let
AB, CD be the two given numbers not prime to one another [. . . ] if CD does not
measure AB, then, the lesser of the numbers AB, CD being continually subtracted
from the greater, some number will be left which will measure the one before it”
(translation from Heath 1925).
Numbers here are represented by line segments,
and the proof that the last number left (dividing the one before it) is a common
divisor and the greatest one is carried out for the case of two division steps (ℓ = 2
in Algorithm 3.6). This is Euclid’s algorithm, “the oldest nontrivial algorithm that
has survived to the present day” (Knuth 1998, §4.5.2), and to whose
understanding the first part of this text is devoted. In contrast to the modern
version, Euclid does repeated subtraction instead of division with remainder.
Since some quotient might be large, this does not give a polynomial-time
algorithm, but the simple idea of removing powers of 2 whenever possible
already achieves this (Exercise 3.25).
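For readers who want to experiment, here is one possible Python reading of that idea, a sketch only (Exercise 3.25 may have a slightly different variant in mind): repeated subtraction as in Euclid, but with factors of 2 removed whenever possible.

    def binary_gcd(a, b):                      # a, b nonnegative integers
        if a == 0:
            return b
        if b == 0:
            return a
        shift = 0
        while a % 2 == 0 and b % 2 == 0:       # common factors of 2
            a, b, shift = a // 2, b // 2, shift + 1
        while b > 0:
            while a % 2 == 0:                  # strip remaining factors of 2
                a //= 2
            while b % 2 == 0:
                b //= 2
            if a > b:
                a, b = b, a
            b = b - a                          # Euclid's subtraction step
        return a << shift

    assert binary_gcd(126, 35) == 7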
In the geometric Book 10, Euclid repeats this argument in Proposition 3 for
“commensurable magnitudes”, which are real numbers whose quotient is rational,
and Proposition 2 states that if this process does not terminate, then the two
magnitudes are incommensurable.
The other arithmetical highlight is Proposition 20 of Book 9: “Prime numbers
are more than any assigned multitude of prime numbers.” Hardy (1940) calls its
proof “as fresh and significant as when it was discovered—two thousand years
have not written a wrinkle on [it]”. (For lack of notation, Euclid only illustrates
his proof idea by showing how to find from three given primes a fourth one.)
It is amusing to see how after such a profound discovery comes the platitude of
Proposition 21: “If as many even numbers as we please be added together, the
whole is even.” The Elements is full of such surprises, unnecessary case
distinctions, and virtual repetitions. This is, to a certain extent, due to a lack of
good notation. Indices came into use only in the early 19th century; a system
designed by Leibniz in the 17th century did not become popular.
Euclid authored some other books, but they never hit the bestseller list, and
some are forever lost.
26
Die ganzen Zahlen hat der liebe Gott gemacht,
alles andere ist Menschenwerk.1
Leopold Kronecker (1886)
1 God made the integers; everything else is the work of man.
“I only took the regular course.” “What was that?” enquired Alice.
“Reeling and Writhing, of course, to begin with,” the Mock Turtle
replied: “and then the different branches of Arithmetic—Ambition,
Distraction, Uglification, and Derision.”
Lewis Carroll (1865)
where s ∈ {0, 1}, 0 ≤ n + 1 < 2^63, and a_i ∈ {0, . . . , 2^64 − 1} for all i are the digits
    s · 2^63 + n + 1,   a_0, . . . , a_n
of 64-bit words. This representation can be made unique by requiring that the
leading digit a_n be nonzero if a ≠ 0 (and using the single-entry array 0 to repre-
sent a = 0). We will call this the standard representation for a. For example, the
standard representation of −1 is 2^63 + 1, 1. It is, however, convenient also to allow
nonstandard representations with leading zero digits since this sometimes facili-
tates memory management, but we do not want to go into details here. The range
of integers that can be represented in standard representation on a 64-bit processor
is between −2^{64·2^63} + 1 and 2^{64·2^63} − 1; each of the two boundaries requires 2^63 + 1
words of storage. This size limitation is quite sufficient for practical purposes: one
of the larger representable numbers would fill about 70 million 1-TB discs.
For a nonzero integer a ∈ Z, we define the length λ(a) of a as
    λ(a) = ⌊log_{2^64} |a|⌋ + 1 = ⌊(log_2 |a|) / 64⌋ + 1,
where ⌊·⌋ denotes rounding down to the nearest integer (so that ⌊2.7⌋ = 2 and
⌊−2.7⌋ = −3). Thus λ(a) + 1 = n + 2 is the number of words in the standard
representation (1) of a (see Exercise 2.1). This is quite a cluttered expression, and
it is usually sufficient to know that about (1/64) log_2 |a| words are needed, or even more
succinctly O(log_2 |a|), where the big-Oh notation “O” hides an arbitrary constant
(Section 25.7).
We assume that our hypothetical processor has at its disposal a command for
the addition of two single precision integers a and b. The output of the addition
command is a 64-bit word c plus the content of the carry flag γ ∈ {0, 1}, a special
bit in the processor status word which indicates whether the result exceeds 2^64 or
not. In order to be able to perform addition of multiprecision integers more easily,
the carry flag is also input to the addition command. More precisely, we have
    a + b + γ = γ* · 2^64 + c,
where γ is the value of the carry flag before the addition and γ* is its value after-
wards. Usually there are processor instructions to clear and set the carry flag.
If a = ∑_{0≤i≤n} a_i 2^{64i} and b = ∑_{0≤i≤m} b_i 2^{64i} are two multiprecision integers, then
their sum is

    c = ∑_{0≤i≤k} (a_i + b_i) 2^{64i},

where k = max{n, m}, and if, say, m ≤ n, then b_{m+1}, . . . , b_n are set to zero. (In other
words, we may assume that m = n.) In general, a_i + b_i may be larger than 2^64, and
if so, then the carry has to be added to the next digit in order to get a 2^64-ary
representation again. This process propagates from the lower order to the higher
order digits, and in the worst case, a carry from the addition of a_0 and b_0 may
influence the addition of a_n and b_n, as the example a = 2^{64(n+1)} − 1 and b = 1
shows. Here is an algorithm for the addition of two multiprecision integers of the
same sign; see Exercise 2.3 for a subtraction algorithm.
1. γ_0 ←− 0
2. for i = 0, . . . , n do
       c_i ←− a_i + b_i + γ_i,   γ_{i+1} ←− 0
       if c_i ≥ 2^64 then c_i ←− c_i − 2^64,   γ_{i+1} ←− 1
3. c_{n+1} ←− γ_{n+1}
   return (−1)^s ∑_{0≤i≤n+1} c_i 2^{64i}
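A Python sketch of this loop, for two nonnegative integers given as lists of 2^64-ary digits that have already been padded to the same length; an ordinary integer stands in for the carry flag.

    B = 2**64

    def add(a, b):                             # a, b lists of digits a_0, ..., a_n and b_0, ..., b_n
        c, gamma = [], 0
        for ai, bi in zip(a, b):
            s = ai + bi + gamma
            gamma = 1 if s >= B else 0         # carry out of this digit position
            c.append(s - B if gamma else s)
        c.append(gamma)                        # c_{n+1} <- gamma_{n+1}
        return c

    # (2^128 - 1) + 1 = 2^128: a carry propagates through both digit positions
    assert add([B - 1, B - 1], [1, 0]) == [0, 0, 1]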
integers can be done in time O(n), or at cost O(n), or with O(n) word operations;
the constants hidden in the big-Oh will depend on the details of the machine. We
gain two advantages from this concept: a shorter and more intuitive notation, and
independence of particular machines. The abstraction is justified by the fact that
the actual performance of an algorithm often depends on compiler optimization,
clever cache usage, pipelining effects, and many other things that are quite techni-
cal and nearly impossible to describe in a comparatively high-level programming
language. However, experiments show that “big-Oh” statements are reflected sur-
prisingly well by implementations on any kind of sequentially working processor:
adding two multiprecision integers is a linear operation in the sense that doubling
the input size also approximately doubles the running time.
One can make these statements more precise and formally satisfying. The cost
measure that is widely used for algorithms dealing with integers is the number of
bit operations which can be rigorously defined as the number of steps of a Tur-
ing or register machine (random access machine, RAM) or the number of gates
of a Boolean circuit implementing the algorithm. Since the details of those com-
putational models are rather technical, however, we will content ourselves with
informal arguments and cost measures, as above.
Related data types occurring in currently available processors and mathematical
software are single and multiprecision floating point numbers. These represent
approximations of real numbers, and arithmetic operations, such as addition and
multiplication, are subject to rounding errors, in contrast to the arithmetic opera-
tions on multiprecision integers, which are exact. Algorithms based on computa-
tions with floating point numbers are the main topic in numerical analysis, which
is a theme of its own; neither it nor the recent attempts at systematically combining
exact and numerical computations will be discussed in this text.
    a = a_n r^n + a_{n−1} r^{n−1} + · · · + a_1 r + a_0 = ∑_{0≤i≤n} a_i r^i,

    a = ∑_{0≤i≤n} a_i x^i   and   b = ∑_{0≤i≤m} b_i x^i                                        (3)
     i    3    2    1    0
    a_i   9    4    3    8
    b_i   0    9    4    5
    c_i   9   13    7   13
1. for i = 0, . . . , n do c_i ←− a_i + b_i
2. return c = ∑_{0≤i≤n} c_i x^i
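A Python sketch of this algorithm for coefficient lists in increasing degree; padding the shorter input with zeros is an implementation convenience, not part of the algorithm as stated.

    def poly_add(a, b):                        # a, b lists of coefficients, constant term first
        if len(a) < len(b):
            a, b = b, a
        b = b + [0] * (len(a) - len(b))
        return [ai + bi for ai, bi in zip(a, b)]

    # the coefficientwise sums from the small table above
    assert poly_add([8, 3, 4, 9], [5, 4, 9]) == [13, 7, 13, 9]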
It is somewhat simpler than integer addition, with its carries. This simplicity
propagates down the line for more complicated algorithms such as multiplication,
division with remainder, etc. Although integers are more intuitive (we learn about
them at a much earlier stage in life), their algorithms are a bit more involved,
and we adopt in this book as a general program the strategy to present mainly
the simpler polynomial case which allows us to concentrate on the essentials, and
often leave details in the integer case to the exercises.
As a first example, we have seen that addition of two polynomials of degree up
to n takes at most n + 1 or O(n) arithmetic operations in R; there is no concern
with machine details here. This is a much coarser cost measure than the number of
word operations for integers. If, for example, R = Z and the coefficients are less
than B in absolute value, then the cost in word operations is O(n log B), which is
the same order of magnitude as the input size. Moreover, additive operations +, −
in R are counted at the same cost as multiplicative operations ·, /, while in most
applications the latter are significantly more expensive than the former.
As a general rule, we will analyze the number of arithmetic operations in the
ring R (additions and multiplications, and also divisions if R is a field) used by an
algorithm. In our analyses, the word addition stands for addition or subtraction;
we do not count the latter separately. The number of other operations, such as in-
dex calculations or memory accesses, tends to be of the same order of magnitude.
These are usually performed with machine instructions on single words, and their
cost is negligible when the arithmetic quantities are large, say multiprecision inte-
gers. The input size is the number of ring elements that the input occupies. If the
coefficients are integers or polynomials themselves, we may then consider sepa-
rately the size of the coefficients involved and the cost for coefficient arithmetic.
We try to provide explicit (but not necessarily minimal) constants for the domi-
nant term in our analyses of algorithms on polynomials when the cost measure is
the number of arithmetic operations in the coefficient ring, but confine ourselves to
O-estimates when counting the number of word operations for algorithms working
on integers or polynomials with integral coefficients.
2.3. Multiplication
Following our program, we first consider the product c = a · b = ∑_{0≤k≤n+m} c_k x^k of
two polynomials a and b in R[x], as in (3). Its coefficients are

    c_k = ∑_{i+j=k, 0≤i≤n, 0≤j≤m} a_i b_j                                        (4)
for 0 ≤ k ≤ n + m. We can just take this formula and turn it into a subroutine, after
figuring out suitable loop variables and boundaries:
for k = 0, . . . , n + m do
    c_k ←− 0
    for i = max{0, k − m}, . . . , min{n, k} do
        c_k ←− c_k + a_i · b_{k−i}
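A direct Python transcription of this double loop, again for coefficient lists in increasing degree (a sketch, not an optimized implementation):

    def poly_mul(a, b):                        # classical multiplication, formula (4)
        n, m = len(a) - 1, len(b) - 1
        c = [0] * (n + m + 1)
        for k in range(n + m + 1):
            for i in range(max(0, k - m), min(n, k) + 1):
                c[k] += a[i] * b[k - i]
        return c

    # (1 + x) * (1 + 2x + x^2) = 1 + 3x + 3x^2 + x^3
    assert poly_mul([1, 1], [1, 2, 1]) == [1, 3, 3, 1]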
There are other ways to organize the loops. We learned in school the following
algorithm.
1. for i = 0, . . . , n do d_i ←− a_i x^i · b
2. return c = ∑_{0≤i≤n} d_i
How much time does this take, that is, how many operations in the ground
ring R? Each of the n + 1 coefficients of a has to be multiplied with each of the
m + 1 coefficients of b, for a total of (n + 1)(m + 1) multiplications. Then these
are summed up in n + m + 1 sums; summing s items costs s − 1 additions. So the
total number of additions is
(n + 1)(m + 1) − (n + m + 1) = nm,
and the total cost for multiplication is 2nm + n + m + 1 ≤ 2(n + 1)(m + 1) op-
erations in R. (If a is monic, then the bound drops to 2nm + n ≤ 2n(m + 1).)
Thus we can say that two polynomials of degree at most n can be multiplied
using 2n^2 + 2n + 1 operations, or 2n^2 + O(n) operations, or O(n^2) operations,
FIGURE 2.1: An arithmetic circuit for polynomial multiplication. The flow of control is
directed downwards. An “electrical” view is to think of the edges as lines, where ring
elements “flow”, with “contact” crossings marked with a •, and no contact at the other
crossings. The size of this circuit equals 32, the number of arithmetic gates in it.
or in quadratic time. The three expressions for the running time get progres-
sively simpler but also less precise. In this book, each of the three versions has its
place (and there are even more versions).
For a computer implementation, Algorithm 2.3 has the drawback of requiring
us to store n + 1 polynomials with m + 1 coefficients each. A way around this is
to interleave the final addition with the computation of the a_i x^i b. This takes the
same time but uses only O(n + m) storage and is shown in Figure 2.1 for n = 3
and m = 4. Each horizontal level corresponds to one pass through the loop body
in step 1.
We will call classical those algorithms that take a definition of a function and
implement it fairly literally, as the multiplication algorithms above implement the
formula (4). One might think that this is the only way of doing it. Fortunately,
there are much faster ways of multiplying, in almost linear rather than quadratic
time. We will study these fast algorithms in Part II. By contrast, for the addition
problem no improvement is possible, nor is it necessary: the algorithm uses only
linear time.
According to our general program, we now examine the integer case. The prod-
uct of two single precision integers a, b between 0 and 2^64 − 1 has “double preci-
sion”: it lies in the interval {0, . . . , 2^128 − 2^65 + 1}. We assume that our processor
has a single precision multiplication instruction which returns the product in two
64-bit words c, d such that a · b = d · 2^64 + c. Here is the integer analog of Algo-
rithm 2.3.
Besides the multiplication by 2^{64i}, which is just a shift in the 2^64-ary represen-
tation, multiplication of a multiprecision integer b by a single precision integer a_i
must be implemented. The time for this is O(m) (Exercise 2.5), and the total time
is quadratic: O(nm). In line with our general program, we omit the details for
implementing this efficiently.
We conclude this section with the example multiplication of a = 521 = 5 · 10^2 +
2 · 10 + 1 and b = 2135 = 2 · 10^3 + 10^2 + 3 · 10 + 5 in decimal representation, ac-
cording to Algorithm 2.4.

      521 · 2135
            2135
         + 42700
       + 1067500
         1112335
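The same computation can be phrased as a Python sketch of Algorithm 2.4 in radix 10; mul_word plays the role of the single precision multiplication of Exercise 2.5, and the digit lists (least significant digit first) and function names are hypothetical.

    r = 10                                     # radix; 2^64 on the hypothetical processor

    def mul_word(b, ai):                       # single digit ai times multiprecision b
        c, carry = [], 0
        for bj in b:
            d = ai * bj + carry
            c.append(d % r)
            carry = d // r
        if carry:
            c.append(carry)
        return c

    def add_digits(a, b):                      # digit addition with carries, cf. Algorithm 2.1
        c, carry = [], 0
        for i in range(max(len(a), len(b))):
            s = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
            c.append(s % r)
            carry = s // r
        return c + [carry] if carry else c

    def mul(a, b):                             # Algorithm 2.4: sum the shifted partial products
        c = [0]
        for i, ai in enumerate(a):
            c = add_digits(c, [0] * i + mul_word(b, ai))
        return c

    # 521 * 2135 = 1112335, digits listed least significant first
    assert mul([1, 2, 5], [5, 3, 1, 2]) == [5, 3, 3, 2, 1, 1, 1]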
2.4. Division with remainder

The basic tool for modular arithmetic is division with remainder: given inte-
gers a, b, with b nonzero, we want to find a quotient q and a remainder r—both
integers—so that
a = qb + r, |r| < |b|.
In line with our general program, we first discuss the computational aspect of this
problem for polynomials. So we are given a, b ∈ R[x], with b nonzero, and want to
find q, r ∈ R[x] so that
a = qb + r, deg r < deg b. (5)
A first problem is that such q and r do not always exist: it is impossible to divide
x^2 by 2x + 1 with remainder in Z[x]! (See Exercise 2.8.) There is a way around
this, the pseudodivision explained in Section 6.12. However, for the moment we
simplify the problem by assuming that the leading coefficient lc(b) of b is a unit
in R, so that it has an inverse v ∈ R with lc(b)v = 1. For R = Z, that still only allows
1 or −1 as leading coefficient, but when R is a field, division with remainder by an
arbitrary nonzero polynomial is possible.
We remind the reader of the “synthetic division” learned in high school with a
small example in Z[x]:

    a                        =  3x^4 + 2x^3          +   x + 5
    a − 3x^2 · b             =       − 4x^3 − 9x^2   +   x + 5
    a − (3x^2 − 4x) · b      =               − x^2   + 13x + 5
    a − (3x^2 − 4x − 1) · b  =                         15x + 8  =  r

Thus the coefficients of the quotient q = 3x^2 − 4x − 1 are determined one by one,
starting at the top, by setting them equal to the corresponding coefficient of the
current “remainder” (in general, one additionally has to divide by lc(b)), which
initially is a = 3x^4 + 2x^3 + x + 5. Then the remainder is adjusted by subtracting
the appropriate multiple of b = x^2 + 2x + 3. The final remainder is r = 15x + 8.
The degree of q is deg a − deg b if q ≠ 0. The following algorithm formalizes
this familiar classical method for division with remainder by a polynomial whose
leading coefficient is a unit.
1. r ←− a,   u ←− b_m^{−1}
2. for i = n − m, n − m − 1, . . . , 0 do
3.     if deg r = m + i then q_i ←− lc(r) u,   r ←− r − q_i x^i b
       else q_i ←− 0
4. return q = ∑_{0≤i≤n−m} q_i x^i and r
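A Python sketch of this algorithm over the rational numbers, with coefficient lists in increasing degree; Fraction provides exact field arithmetic, and over a mere ring one would additionally have to require that lc(b) is a unit.

    from fractions import Fraction

    def poly_divmod(a, b):                     # deg a = n >= m = deg b, b nonzero
        n, m = len(a) - 1, len(b) - 1
        r = [Fraction(c) for c in a]
        u = 1 / Fraction(b[m])                 # u <- lc(b)^(-1)
        q = [Fraction(0)] * (n - m + 1)
        for i in range(n - m, -1, -1):
            if r[m + i] != 0:                  # deg r = m + i (higher coefficients are already zero)
                q[i] = r[m + i] * u
                for j in range(m + 1):         # r <- r - q_i x^i b
                    r[i + j] -= q[i] * b[j]
        return q, r[:m]                        # quotient and remainder, deg r < deg b

    # 3x^4 + 2x^3 + x + 5 = (3x^2 - 4x - 1)(x^2 + 2x + 3) + (15x + 8), as in the example above
    q, r = poly_divmod([5, 1, 0, 2, 3], [3, 2, 1])
    assert q == [-1, -4, 3] and r == [8, 15]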
FIGURE 2.2: An arithmetic circuit for polynomial division. A subtraction node computes
the difference of its left input minus its right input.
additions and multiplications in R plus one division for inverting b_m, and only at
most 2 deg b (deg q + 1) additions and multiplications if b is monic. In many appli-
cations, we have n < 2m, and then the cost is at most 2m^2 + O(m) ring operations
(plus an inversion), which is essentially the same as for multiplying two polyno-
mials of degree at most m.
It is easy to see that the quotient and remainder are uniquely determined (when
lc(b) is a unit). Namely, another equation a = q∗ b + r∗ , with q∗ , r∗ ∈ R[x] and
deg r∗ < deg b, yields by subtraction
(q∗ − q)b = r − r∗ .
The right hand side has degree less than deg b, and the left hand side has degree at
least deg b, unless q∗ − q = 0. Therefore the latter is true, q = q∗ , and r = r∗ . We
write “a quo b” for the quotient q and “a rem b” for the remainder r.
What about the integer case? The analog of Algorithm 2.5 is well known from
high school, at least in the decimal representation:
Notes. Good texts on algorithms and their analysis are Brassard & Bratley (1996) and
Cormen, Leiserson, Rivest & Stein (2009).
Addition and multiplication algorithms in decimal notation are explicitly described in
Stevin (1585). Several algorithms for computer arithmetic, such as fast (carry look-ahead
and carry-save) addition, are given in Cormen, Leiserson, Rivest & Stein (2009). For
information about the highly active area of symbolic-numeric computations see the Special
Issue of the Journal of Symbolic Computation (Watt & Stetter 1998) and Corless, Kaltofen
& Watt (2003).
2.4. The first algorithm for division with remainder of polynomials appears in Nuñez
(1567). He is, of course, limited by the concepts of his times to specific degrees, 3 and 1
in his case, and positive coefficients. On fo 31ro , Nuñez writes: Si el partidor fuere com-
puesto, partiremos las mayores dignidades de lo que se ha de partir por la mayor dignidad
del partidor, dexandole en que pueda caber la otra dignidad del partidor, y lo q̃ viniere
multiplicaremos por el partidor, y lo produzido por essa multiplicacion sacaremos de toda
la sũma que se parte, y lo mismo obraremos en lo q̃ restare, por el modo q̃ tenemos quando
partimos numero por numero. Y llegando a numero o dignidad en esta obra que sea de
menor denominacion, que el partidor, quedara essa quantidad en quebrado, [. . . ]1 He then
explains the division of 12x^3 + 18x^2 + 27x + 17 by 4x + 3 with quotient 3x^2 + 2 1/4 x + 5 1/16
and remainder 1 13/16, and checks his result by multiplying out.
An anonymous author (1835) presents a decimal division algorithm for hand calculation
based on a “10’s complement” notation.

1 If the divisor is composed [of more than one summand], we divide the leading term of the dividend by the
leading term of the divisor, ignoring the other terms of the divisor, and we multiply the result by the divisor and
subtract the result of this multiplication from the whole of the dividend, and we apply the same procedure to what
is left, in the way we use it when we divide one number by another. And if we arrive in this procedure at numbers
or terms whose degree is less than that of the divisor, then this quantity will remain as a fraction [...]
Exercises.
2.1 For an integer r ∈ N_{>1}, we consider the variable-length radix r representation (a_0, . . . , a_{l−1}) of
a positive integer a, with a = ∑_{0≤i<l} a_i r^i, a_0, . . . , a_{l−1} ∈ {0, . . . , r − 1}, and a_{l−1} ≠ 0. Prove that its
length l is ⌊log_r a⌋ + 1.
2.2 Design a representation for integers of unlimited size on a 64-bit machine.
2.3 (i) Specify a processor instruction analogous to the addition instruction mentioned in the text
which performs subtraction of two single precision integers. Use the carry flag to indicate whether
the result is negative or not.
(ii) Design an algorithm similar to Algorithm 2.1 for the subtraction of two multiprecision integers
a and b of equal sign and with |a| > |b|.
(iii) Discuss how to decide whether |a| > |b| holds.
2.4 Here is a piece of code implementing Algorithm 2.1 for nonnegative multiprecision integers
(that is, when s = 0) on a hypothetical processor. Text enclosed in /* and */ is a comment. The
processor has 26 freely usable registers named A to Z. Initially, registers A and B point to the first
word (the one containing the length) of the representations of a and b, respectively, and C points to a
piece of memory where the representation of c shall be placed.
1: LOAD N, [A] /* load the word that A points to into register N */
2: ADD K, N, 1 /* add 1 to register N and store the result in K
(without affecting the carry flag) */
3: STORE [C], K /* store K in the word that C points to */
4: ADD A, A, 1 /* increase register A by 1 */
5: ADD B, B, 1
6: ADD C, C, 1
7: LOAD I, 1 /* load the constant 1 into register I */
8: CLEARC /* clear carry flag */
9: COMP I, N /* compare the contents of registers I and N ... */
10: BGT 20 /* ... and jump to line 20 if I is greater */
11: LOAD S, [A]
12: LOAD T, [B]
13: ADDC S, S, T /* add the contents of register T to register S
using the carry flag */
14: STORE [C], S
15: ADD A, A, 1
16: ADD B, B, 1
17: ADD C, C, 1
18: ADD I, I, 1
19: JMP 9 /* unconditionally jump to line 9 */
20: ADDC S, 0, 0 /* store carry flag in S */
21: STORE [C], S
22: RETURN
Suppose that our processor runs at 2 GHz and that the execution of one instruction takes one machine
cycle = 0.5 nanoseconds = 5 · 10−10 seconds. Calculate the precise time, in terms of n, to run the
above piece of code, and convince yourself that this is indeed O(n).
2.5 Give an algorithm for multiplying a multiprecision integer b by a single precision integer a,
making use of the single precision multiply instruction described in Section 2.3. Show that your al-
gorithm uses λ(b) single precision multiplications and the same number of single precision additions.
Convert your algorithm into a machine program as in Exercise 2.4.
2.6 Prove that max{λ(a), λ(b)} ≤ λ(a + b) ≤ max{λ(a), λ(b)} + 1 and λ(a) + λ(b) − 1 ≤ λ(ab) ≤
λ(a) + λ(b) hold for all a, b ∈ N>0 .
2.7 Let a > b ∈ N>0 , m = λ(a), n = λ(b) and q = ⌊a/b⌋. Give tight upper and lower bounds for
λ(q) in terms of m and n.
2.8 Prove that in Z[x] one cannot divide x^2 by 2x + 1 with remainder as in (5).
2.9∗ Let R be an integral domain with field of fractions K and a, b ∈ R[x] of degree n ≥ m ≥ 0. Then
we can apply the polynomial division algorithm 2.5 to compute q, r ∈ K[x] such that a = qb + r and
deg r < deg b.
(i) Prove that there exist q, r ∈ R[x] with a = qb + r and deg r < deg b if and only if lc(b) | lc(r)
in R every time the algorithm passes through step 3, and that they are unique in that case.
(ii) Modify Algorithm 2.5 so that on input a, b, it decides whether q, r ∈ R[x] as in (i) exist, and
if so, computes them. Show that this takes the same number of operations in R as given in the text,
where one operation is either an addition or a multiplication in R, or a test which decides whether an
element c ∈ R divides another element d ∈ R, and if so, computes the quotient d/c ∈ R.
2.10 Let R be a ring (commutative, with 1) and a = ∑_{0≤i≤n} a_i x^i ∈ R[x] of degree n, with all a_i ∈ R.
The weight w(a) of a is the number of nonzero coefficients of a besides the leading coefficient:

    w(a) = #{0 ≤ i < n : a_i ≠ 0}.

Thus w(a) ≤ deg a, with equality if and only if all coefficients of a are nonzero. The sparse repre-
sentation of a, which is particularly useful if a has small weight, is a list of pairs (i, a_i)_{i∈I}, with each
a_i ∈ R and a = ∑_{i∈I} a_i x^i. Then we can choose #I = w(a) + 1.
(i) Show that two polynomials a, b ∈ R[x] of weight n = w(a) and m = w(b) can be multiplied in
the sparse representation using at most 2nm + n + m + 1 arithmetic operations in R.
(ii) Draw an arithmetic circuit for division of a polynomial a ∈ R[x] of degree less than 9 by b =
x^6 − 3x^4 + 2 with remainder. Try to get its size as small as possible.
(iii) Let n ≥ m. Show that quotient and remainder on division of a polynomial a ∈ R[x] of degree
less than n by b ∈ R[x] of degree m, with lc(b) a unit, can be computed using n − m divisions in R,
and w(b) · (n − m) multiplications and subtractions in R each.
2.11 Let R be a ring and k, m, n ∈ N. Show that the “classical” multiplication of two matrices
A ∈ R^{k×m} and B ∈ R^{m×n} takes (2m − 1)kn arithmetic operations in R.
‘Immortality’ may be a silly word, but probably a mathematician
has the best chance of whatever it may mean.
Godfrey Harold Hardy (1940)
The ignoraunte multitude doeth, but as it was euer wonte, enuie that
knoweledge, whiche thei can not attaine, and wishe all men ignoraunt,
like unto themself. [. . . ] Yea, the pointe in Geometrie,
and the unitie in Arithmetike, though bothe be undiuisible,
doe make greater woorkes, & increase greater multitudes,
then the brutishe bande of ignoraunce is hable to withstande.
Robert Recorde (1557)
I have often wished, that I had employed about the speculative part of
geometry, and the cultivation of the specious Algebra [multivariate
polynomials] I had been taught very young, a good part of that time
and industry, that I had spent about surveying and fortification (of
which I remember I once wrote an entire treatise) and other practick
parts of mathematicks. And indeed the operations of symbolical
arithmetick (or the modern Algebra) seem to me to afford men one of
the clearest exercises of reason that I ever yet met with.
Robert Boyle (1671)
Integers and polynomials with coefficients in a field behave similarly in many re-
spects. Often—but not always—the algorithms for both types of objects are quite
similar, and sometimes one can find a common abstraction of both domains, and it
is then sufficient to design one algorithm for this generalization to solve both prob-
lems in one fell swoop. In this chapter, the Euclidean domain covers the structural
similarities between gcd computations for integers and polynomials. Typically, in
such a situation the polynomial version is slightly simpler, and in Chapter 6, we
will meet polynomial subresultants which have no integer analog at all.
    126 = 3 · 35 + 21,
     35 = 1 · 21 + 14,
     21 = 1 · 14 + 7,                                                      (1)
     14 = 2 · 7,
and 7 is the greatest common divisor of 126 and 35. One of the most important
applications is for exact arithmetic on rational numbers, where one has to simplify
35/126 to 5/18 in order to keep the numbers small.
This algorithm can also be adapted to work for polynomials. It is convenient
to use the following general scenario, which captures both situations under one
umbrella. The reader may always think of R as being either the integers or poly-
nomials. The algebraic terminology is explained in Chapter 25.
We say that q = a quo b is the quotient and r = a rem b the remainder, although
q and r need not be unique. Such a d is called a Euclidean function on R.
EXAMPLE 3.2. (i) R = Z and d(a) = |a| ∈ N. Here the quotient and the remain-
der can be made unique by the additional requirement that r ≥ 0.
(ii) R = F[x], where F is a field, and d(a) = deg a. We define the degree of the
zero polynomial to be −∞. It is easy to show uniqueness of the quotient and the
remainder in this case (Section 2.4).
(iii) R = Z[i] = {a + ib : a, b ∈ Z}, the ring of Gaussian integers, with i = √−1,
and d(a + ib) = a^2 + b^2 (Exercise 3.19).
(iv) R a field, and d(a) = 1 if a ≠ 0 and d(0) = 0. ✸
(i) c | a and c | b,
(i) a | c and b | c,
LEMMA 3.4. The gcd in Z has the following properties, for all a, b, c ∈ Z.
For a proof, see Exercise 3.3. Because of the associativity, we may write
1. r0 ←− f , r1 ←− g
2. i ←− 1
while ri ≠ 0 do ri+1 ←− ri−1 rem ri , i ←− i + 1
3. return ri−1 .
For f = 126 and g = 35, the algorithm works precisely as illustrated at the
beginning of this section.
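As an illustration, the division loop above can be transcribed into a few lines of Python (a minimal sketch of ours, not taken from the text; the function name euclid is ours):

    def euclid(f, g):
        """Traditional Euclidean Algorithm: return a gcd of f and g."""
        r0, r1 = f, g
        while r1 != 0:
            r0, r1 = r1, r0 % r1   # replace (r_{i-1}, r_i) by (r_i, r_{i-1} rem r_i)
        return r0

    # For f = 126 and g = 35 the remainders are 21, 14, 7, 0, as in (1).
    assert euclid(126, 35) == 7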
Moreover, we can read off a representation of the gcd as a linear combination,
7 = 2 · 126 − 7 · 35, which is obtained by reading the lines of (1) from the bottom up. This important
method is called the Extended Euclidean Algorithm and works in any Euclidean
domain. In various incarnations, it plays a central role throughout this book.
We note that the algorithm terminates because the d(ri ) are strictly decreasing
nonnegative integers for 1 ≤ i ≤ ℓ, where d is the Euclidean function on R. The
elements ri for 0 ≤ i ≤ ℓ + 1 are the remainders and the qi for 1 ≤ i ≤ ℓ are
the quotients in the traditional (Extended) Euclidean Algorithm. The elements
ri , si , and ti form the ith row in the traditional Extended Euclidean Algorithm, for
0 ≤ i ≤ ℓ + 1. The central property is that si f + ti g = ri for all i; in particular,
sℓ f + tℓ g = rℓ is a gcd of f and g (see Lemma 3.8 below). We will see later that all
other intermediate results computed by the algorithm are useful for various tasks
in computer algebra.
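The following Python sketch (ours, not from the text) carries the coefficients si and ti along and checks the central property si f + ti g = ri for integer inputs:

    def extended_euclid(f, g):
        """Traditional Extended Euclidean Algorithm over Z.

        Returns the rows (r_i, s_i, t_i) and the quotients q_i;
        the last nonzero r_i is a gcd of f and g.
        """
        rows = [(f, 1, 0), (g, 0, 1)]          # rows 0 and 1
        quotients = []
        while rows[-1][0] != 0:
            (r0, s0, t0), (r1, s1, t1) = rows[-2], rows[-1]
            q = r0 // r1                       # q_i = r_{i-1} quo r_i
            quotients.append(q)
            rows.append((r0 - q * r1, s0 - q * s1, t0 - q * t1))
        return rows, quotients

    rows, qs = extended_euclid(126, 35)
    assert all(s * 126 + t * 35 == r for r, s, t in rows)
    assert rows[-2][0] == 7                    # r_l = 7 = 2 * 126 - 7 * 35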
(ii) R = Q[x], f = 18x³ − 42x² + 30x − 6, g = −12x² + 10x − 2. Then the com-
putation of the traditional Extended Euclidean Algorithm goes as follows. Row
i + 1 is obtained from the two preceding ones by first computing the quotient
qi = ri−1 quo ri and then for each of the three remaining columns by subtracting
the quotient times the entry in row i of that column from the entry in row i − 1.
 i    qi                 ri                       si              ti
 0                       18x³ − 42x² + 30x − 6    1               0
 1    −(3/2)x + 9/4      −12x² + 10x − 2          0               1
 2    −(8/3)x + 4/3      (9/2)x − 3/2             1               (3/2)x − 9/4
 3                       0                        (8/3)x − 4/3    4x² − 8x + 4

   (9/2)x − 3/2 = 1 · (18x³ − 42x² + 30x − 6) + ((3/2)x − 9/4) · (−12x² + 10x − 2). ✸
PROOF. For (i) and (ii) we proceed by induction on i. The case i = 0 is clear from
step 1 of the algorithm, and we may assume i ≥ 1. Then

      (ri−1)   (0    1 ) (ri−1)   (     ri     )   ( ri )
   Qi (  ri) = (1  −qi ) (  ri) = (ri−1 − qi ri) = (ri+1),
and (i) follows from Ri = Qi Ri−1 and the induction hypothesis. Similarly, (ii)
follows from
      (si−1  ti−1)   ( si    ti )
   Qi ( si    ti ) = (si+1  ti+1)
and the induction hypothesis.
For (iii), let i ∈ {0, . . . , ℓ}. We conclude from (i) that
   (rℓ)                    ( f )                 (  ri)
   ( 0) = Qℓ · · · Qi+1 Ri ( g ) = Qℓ · · · Qi+1 (ri+1).
Comparing the first entry on both sides, we see that rℓ is a linear combination of
ri and ri+1 , and hence any common divisor of ri and ri+1 divides rℓ . On the other
hand, det Qi = −1 and the matrix Qi is invertible over R, with inverse
             (qi  1)
   Qi^{−1} = ( 1  0),

and hence

   (  ri)                           (rℓ)
   (ri+1) = Qi+1^{−1} · · · Qℓ^{−1} ( 0).
Thus both ri and ri+1 are divisible by rℓ , and rℓ ∼ gcd(ri , ri+1 ). In particular, this is
true for i = 0, so that gcd( f , g) ∼ gcd(r0 , r1 ) ∼ rℓ .
The claim (iv) follows immediately from (i) and (ii), and (v) follows from (ii)
by taking determinants:
                          ( si    ti )
   si ti+1 − ti si+1 = det(si+1  ti+1) = det Ri

                                             (s0  t0)
                     = det Qi · · · det Q1 · det(s1  t1) = (−1)^i.
In particular, this implies that gcd(si ,ti ) ∼ 1 and that Ri is invertible. Now let p ∈ R
be a divisor of ti . If p | f , then clearly p | si f + ti g = ri . On the other hand, if p | ri ,
then p also divides si f = ri −ti g, and hence p divides f since si and ti are coprime.
This proves (vi). For (vii), we multiply both sides of (i) by Ri^{−1} and obtain

   (r0)           (  ri)          ( ti+1  −ti) (  ri)
   (r1) = Ri^{−1} (ri+1) = (−1)^i (−si+1   si) (ri+1),
using (ii) and (v), and the claim follows by writing this out as a system of linear
equations. ✷
C OROLLARY 3.9.
Any two elements f , g of a Euclidean domain R have a gcd h ∈ R, and it is express-
ible as a linear combination h = s f + tg with s,t ∈ R.
L EMMA 3.10.
P ROOF. We only prove the first equality; the second can be verified in the same
way (Exercise 3.21 (i)). We show (5) and
and
   deg si+1 = deg qi + deg si = ∑_{2≤j<i} deg qj + deg qi = ∑_{2≤j<i+1} deg qj,
T HEOREM 3.11.
The traditional Extended Euclidean Algorithm 3.6 for polynomials f , g ∈ F[x] with
deg f = n ≥ deg g = m can be performed with
◦ at most m + 1 inversions and 2nm + O(n) additions and multiplications in F if
only the quotients qi and the remainders ri are needed,
◦ at most m + 1 inversions and 6nm + O(n) additions and multiplications in F
for computing all results.
PROOF. The first claim has already been shown, and it remains to analyze the
additional cost for computing the si and ti . At each step, the computation of ti+1 =
ti−1 − qi ti requires at most 2 deg qi · deg ti + deg qi + deg ti + 1 field operations for
the product (Section 2.3), plus at most deg ti+1 + 1 operations for the subtraction.
Using Lemma 3.10, we obtain
   ∑_{2≤i≤ℓ} ( 2(ni−1 − ni)(n0 − ni−1) + 2(n0 − ni + 1) )

      = n − m + 1 + 4 ∑_{2≤i≤m+1} (n − m + i − 1)
A similar argument as above shows that the normal case is the worst case, so
that the bound is valid in general. Finally, Exercise 3.22 (i) shows that the cost for
the si ’s is at most 2(m2 + m), and the claim follows. ✷
In Chapter 11, we will find a much faster algorithm for the gcd.
Now we sketch the cost analysis when R = Z and d(a) = |a|. We may assume
that f = r0 ≥ g = r1 > r2 · · · > rℓ ≥ 0, so that qi ≥ 1 for all i, and represent all
numbers in 2^64-ary standard representation (Section 2.1). Then the length λ(a) of
a positive integer a is λ(a) = ⌊(log a)/64⌋ + 1, where log is the binary logarithm.
But now the bound corresponding to what we used for polynomials, namely ℓ ≤
d(g) + 1 = g + 1 = (2^64)^{(log g)/64} + 1 ≤ 2^{64·λ(g)}, on the number of division steps in
the Euclidean Algorithm for the pair ( f , g) ∈ N² is exponential in the input size
λ( f ) + λ(g) (if λ( f ) is not much bigger than λ(g)) and hence rather useless. We
can in fact prove a polynomial upper bound on ℓ, as follows. For 1 ≤ i ≤ ℓ, we
have
ri−1 = qi ri + ri+1 ≥ ri + ri+1 > 2ri+1 .
Thus

   ∏_{2≤i<ℓ} ri−1 > 2^{ℓ−2} ∏_{2≤i<ℓ} ri+1,

so that

   2^{ℓ−2} < (r1 r2)/(rℓ−1 rℓ) < r1²/2,

and hence

   ℓ ≤ ⌊2 log r1⌋ + 1 = ⌊128 · (log g)/64⌋ + 1 ≤ 128 · (⌊(log g)/64⌋ + 1) = 128 λ(g).
This bound can still be improved. For N ∈ N and f , g ∈ Z with N ≥ f > g > 0,
the largest possible number of division steps ℓ for ( f , g) is the one where all the
quotients are equal to 1, so that f and g are the two largest successive Fibonacci
numbers up to N. As an example, the Euclidean Algorithm for ( f , g) = (13, 8)
computes
13 = 1 · 8 + 5,
8 = 1 · 5 + 3,
5 = 1 · 3 + 2,
3 = 1 · 2 + 1,
2 = 2 · 1.
The nth Fibonacci number Fn (with F0 = 0, F1 = 1, and Fn = Fn−1 + Fn−2 for
n ≥ 2) is approximately φ^n/√5, where φ = (1 + √5)/2 ≈ 1.618 is the golden
ratio (Exercise 3.28). Thus the following holds for the number ℓ of division steps
for ( f , g) = (Fn+1 , Fn ) if n ≥ 1:

   ℓ = n − 1 ≈ log_φ(√5 g) − 1 ∈ 1.441 log g + O(1).                  (8)
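This worst case is easy to observe experimentally. The following Python fragment (ours, not from the text) counts division steps and confirms ℓ = n − 1 for consecutive Fibonacci numbers, in line with (8):

    def division_steps(f, g):
        """Number of division steps of the Euclidean Algorithm for (f, g)."""
        steps = 0
        while g != 0:
            f, g = g, f % g
            steps += 1
        return steps

    fib = [0, 1]                       # F_0, F_1, F_2, ...
    while len(fib) <= 31:
        fib.append(fib[-1] + fib[-2])

    # (F_{n+1}, F_n) forces exactly n - 1 division steps, as for (13, 8) = (F_7, F_6)
    for n in range(2, 31):
        assert division_steps(fib[n + 1], fib[n]) == n - 1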
The average number of division steps for ( f , g) when g is fixed and f varies is
   ℓ ≈ (12 (ln 2)²/π²) · log g ≈ 0.584 log g.
Now that we have a good upper bound for the number of steps in the Euclidean
Algorithm, we look at the cost for each step. First we consider the cost for one
division step. Let a > b > 0 be integers and a = qb + r with q, r ∈ N and 0 ≤
r < b. According to Section 2.4, computing q and r takes O((λ(a) − λ(b)) · λ(b))
word operations, where λ(a) and λ(b) are the lengths of a and b in the standard
representation, respectively.
Then setting n = λ( f ) and m = λ(g), we obtain—by analogy with (4)—that the
total cost for performing the traditional Euclidean Algorithm (without computing
the si and ti ) is O(nm) word operations.
The following integer analog of Lemma 3.10 is proven in Exercise 3.23.
LEMMA 3.12. |si | ≤ g/ri−1 and |ti | ≤ f /ri−1 for 1 ≤ i ≤ ℓ + 1.
Lemma 3.12 yields analogous bounds for the length of si and ti as in the poly-
nomial case, and we have the following theorem, whose proof is left as Exercise
3.24.
T HEOREM 3.13.
The traditional Extended Euclidean Algorithm 3.6 for positive integers f , g with
λ( f ) = n ≥ λ(g) = m can be performed with O(nm) word operations.
       N       cN/N²
      10       0.63
     100       0.6087
    1000       0.608383
  10 000       0.60794971
 100 000       0.6079301507

TABLE 3.1: The probabilities that two random positive integers below N are coprime.
We conclude this section with the following question: what is the probability
that two random integers are coprime? More precisely, when N gets large and
cN = #{1 ≤ x, y ≤ N: gcd(x, y) = 1}, we are interested in the numerical value of
cN/N². Table 3.1 gives cN/N² for some values of N; it seems to approach a limit
which is a little larger than 3/5. In fact, the value is
   cN/N² ∈ 6/π² + O((log N)/N) ≈ 0.6079271016 + O((log N)/N).
Interestingly, a similar approximation holds for the probability that a random inte-
ger is squarefree, so that it has no square divisor p²:

   #{1 ≤ x ≤ N: x is squarefree} / N ∈ 6/π² + O(1/√N).
Exercises 4.18 and 14.32 answer the corresponding questions for polynomials over
a finite field.
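The probabilities in Table 3.1 can be recomputed with a few lines of Python (ours, not from the text); the brute force count is slow for large N, but it illustrates the limit 6/π²:

    from math import gcd, pi

    def coprime_fraction(N):
        """c_N / N^2: the fraction of pairs 1 <= x, y <= N with gcd(x, y) = 1."""
        c = sum(1 for x in range(1, N + 1)
                  for y in range(1, N + 1) if gcd(x, y) == 1)
        return c / N ** 2

    print(coprime_fraction(100))   # 0.6087, as in Table 3.1
    print(6 / pi ** 2)             # 0.6079271018...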
In Figure 3.2, we see a two-dimensional coordinate system where the point
(x, y) ∈ N 2 for x, y ≤ 200 is colored white if gcd(x, y) = 1 and gray otherwise.
The intensity of a pixel is proportional to the number of prime factors in the gcd.
The probability that two random integers below 200 are coprime is precisely the
percentage of the area of the 200 × 200 pixels that is colored white. Thus about
3/5 of all pixels are white, and about 2/5 are gray.
If you hold the page horizontally in front of your eyes, you can see (almost)
white horizontal and vertical lines corresponding to prime values of x and y, and
dark lines through the origin corresponding to lines ax = by with small integers
a, b, the most clearly visible being the line x = y.
element gcd( f , g) ∈ Q[x], which one should we choose? In other words, how do
we choose one representative from among all the multiples of a? A reasonable
choice is the monic polynomial, that is, the one with leading coefficient 1. Thus
if lc(a) ∈ Q \ {0} is the leading coefficient of a ∈ Q[x], then we take normal(a) =
a/ lc(a) as the normal form of a. (This has nothing to do with the “normal EEA”
on page 51.)
To make this work in an arbitrary Euclidean domain R, we assume that we have
selected some normal form normal(a) ∈ R for every a ∈ R so that a ∼ normal(a).
We call the unit u ∈ R with a = u· normal(a) the leading unit lu(a) of a. Moreover,
we set lu(0) = 1 and normal(0) = 0. The following two properties are required:
◦ two elements of R have the same normal form if and only if they are associate,
◦ the normal form of a product is equal to the product of the normal forms.
These properties in particular imply that the normal form of any unit is 1. We say
that an element a in normal form, so that lu(a) = 1, is normalized.
In our two main applications, integers and univariate polynomials over a field,
we have natural normal forms. If R = Z, lu(a) = sign(a) if a 6= 0 and normal(a) =
|a| defines a normal form, so that an integer is normalized if and only if it is
nonnegative. When R = F[x] for a field F, then letting lu(a) = lc(a) (with the
convention that lu(0) = 1) and normal(a) = a/ lc(a) defines a normal form, and a
nonzero polynomial is normalized if and only if it is monic.
Given such a normal form, we define gcd(a, b) to be the unique normalized
associate of all greatest common divisors of a and b, and similarly lcm(a, b) as the
normalized associate of all least common multiples of a and b. Thus gcd(a, b) > 0
for R = Z and gcd(a, b) is monic for R = F[x] if at least one of a, b is nonzero, and
gcd(0, 0) = 0 in both cases. Lemma 3.4 then remains valid if we replace | · | by
normal(·).
In the polynomial case, it turns out that it is not only useful to have a normal
form for the gcd, but to modify the traditional Euclidean Algorithm so that all
the remainders ri are normalized. In Chapter 6, we will see that for R = Q[x] the
computations of the traditional Euclidean Algorithm produce remainders whose
coefficients have huge numerators and denominators even for inputs of moderate
size, and that the coefficients of the monic associates of the remainders are much
smaller (see pages 143 and 185). In this book, we will often use the following
variant of the traditional Extended Euclidean Algorithm 3.6 which works with
these monic associates.
1. ρ0 ←− lu( f ), r0 ←− f /ρ0 , s0 ←− 1/ρ0 , t0 ←− 0,
   ρ1 ←− lu(g), r1 ←− g/ρ1 , s1 ←− 0, t1 ←− 1/ρ1
2. i ←− 1
while ri ≠ 0 do
qi ←− ri−1 quo ri
ρi+1 ←− lu(ri−1 − qi ri )
ri+1 ←− (ri−1 − qi ri )/ρi+1
si+1 ←− (si−1 − qi si )/ρi+1
ti+1 ←− (ti−1 − qiti )/ρi+1
i ←− i + 1
3. ℓ ←− i − 1
return ℓ, ρi , ri , si ,ti for 0 ≤ i ≤ ℓ + 1, and qi for 1 ≤ i ≤ ℓ
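To experiment with this algorithm over Q[x], exact rational arithmetic is enough. The following self-contained Python sketch (ours, not from the text; polynomials are stored as coefficient lists, lowest degree first, and inputs are assumed nonzero) reproduces the computation of Example 3.7 (ii) below:

    from fractions import Fraction

    def sub(p, q):                        # difference of two polynomials
        n = max(len(p), len(q))
        r = [(p[i] if i < len(p) else 0) - (q[i] if i < len(q) else 0) for i in range(n)]
        while r and r[-1] == 0:
            r.pop()
        return r

    def scale(p, c):                      # multiply a polynomial by a constant
        return [ci * c for ci in p]

    def mul(p, q):                        # product of two polynomials
        if not p or not q:
            return []
        r = [Fraction(0)] * (len(p) + len(q) - 1)
        for i, pi in enumerate(p):
            for j, qj in enumerate(q):
                r[i + j] += pi * qj
        return r

    def quo_rem(a, b):                    # division with remainder; lc(b) nonzero
        q = [Fraction(0)] * max(len(a) - len(b) + 1, 0)
        r = list(a)
        while r and len(r) >= len(b):
            c = r[-1] / b[-1]
            k = len(r) - len(b)
            q[k] = c
            r = sub(r, [Fraction(0)] * k + scale(b, c))   # subtract c * x^k * b
        while q and q[-1] == 0:
            q.pop()
        return q, r

    def monic_eea(f, g):
        """Extended Euclidean Algorithm with monic remainders (Algorithm 3.14)."""
        f = [Fraction(c) for c in f]
        g = [Fraction(c) for c in g]
        rho0, rho1 = f[-1], g[-1]
        rows = [(rho0, scale(f, 1 / rho0), [1 / rho0], []),
                (rho1, scale(g, 1 / rho1), [], [1 / rho1])]
        while rows[-1][1]:                                 # while r_i != 0
            (_, r0, s0, t0), (_, r1, s1, t1) = rows[-2], rows[-1]
            q, rem = quo_rem(r0, r1)
            rho = rem[-1] if rem else Fraction(1)          # rho_{i+1} = lu(r_{i-1} - q_i r_i)
            rows.append((rho, scale(rem, 1 / rho),
                         scale(sub(s0, mul(q, s1)), 1 / rho),
                         scale(sub(t0, mul(q, t1)), 1 / rho)))
        return rows

    # f = 18x^3 - 42x^2 + 30x - 6 and g = -12x^2 + 10x - 2 as in Example 3.7
    rows = monic_eea([-6, 30, -42, 18], [-2, 10, -12])
    assert rows[-2][1] == [Fraction(-1, 3), Fraction(1)]   # gcd(f, g) = x - 1/3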
EXAMPLE 3.7 (continued). (ii) With monic remainders, the following quanti-
ties are computed.

 i   qi         ρi    ri                             si               ti
 0              18    x³ − (7/3)x² + (5/3)x − 1/3    1/18             0
 1   x − 3/2    −12   x² − (5/6)x + 1/6              0                −1/12
 2   x − 1/2    1/4   x − 1/3                        2/9              (1/3)x − 1/2
 3              1     0                              −(2/9)x + 1/9    −(1/3)x² + (2/3)x − 1/3
LEMMA 3.15. (a) With the following modifications, all statements of Lemma 3.8
hold for the results of Algorithm 3.14.
(iii) gcd( f , g) = gcd(ri , ri+1 ) = rℓ ,
(v) si ti+1 − ti si+1 = (−1)^i (ρ0 · · · ρi+1)^{−1},
(vi) gcd(ri ,ti ) = gcd( f ,ti ),
(vii) f = (−1)^i ρ0 · · · ρi+1 (ti+1 ri − ti ri+1 ), g = (−1)^{i+1} ρ0 · · · ρi+1 (si+1 ri − si ri+1 ).
(b) If R = F[x] for a field F , deg f ≥ deg g, and ni = deg ri for all i, then the degree
formulas of Lemma 3.10 hold for the results of Algorithm 3.14 as well.
PROOF. With the following changes, the proof of Lemma 3.8 goes through:

      (ri−1)   (   0        1    ) (ri−1)   (        ri        )   ( ri )
   Qi (  ri) = (1/ρi+1  −qi/ρi+1 ) (  ri) = ((ri−1 − qi ri)/ρi+1) = (ri+1),

             (qi  ρi+1)
   Qi^{−1} = ( 1    0 ),

                               (s0  t0)
   det Qi · · · det Q1 · det (s1  t1) = (−1)^i (ρ0 · · · ρi+1)^{−1},

   (r0)           (  ri)                          ( ti+1  −ti) (  ri)
   (r1) = Ri^{−1} (ri+1) = (−1)^i (ρ0 · · · ρi+1) (−si+1   si) (ri+1).
Statements (iii) and (vi) follow from the fact that all elements involved are normal-
ized. The proof of (b) is left as Exercise 3.21 (ii). ✷
We conclude this section with a cost analysis of the EEA for polynomials. It
turns out to be not more expensive than the traditional EEA.
T HEOREM 3.16.
For the monic normal form normal(h) = h/ lc(h) on F[x], the Extended Euclidean
Algorithm 3.14 for polynomials f , g ∈ F[x] with deg f = n ≥ deg g = m can be
performed with
◦ at most m + 2 inversions and 2nm + O(n) additions and multiplications in F if
only the quotients qi , the remainders ri , and the coefficients ρi are needed,
◦ at most m + 2 inversions and 6nm + O(n) additions and multiplications in F
for computing all results.
As in Section 3.3, the normal case is the worst case, normalizing f and g takes two
inversions and n + m multiplications, and (i) follows.
   = m + 1 + ∑_{1≤i≤m+1} 4(n − m + i − 1)
Exercise 3.22 (ii) shows that the cost for computing all si is at most 2m2 + 3m + 1
in the normal case. Again, the normal case is the worst case, and (ii) follows. ✷
Theorem 6.53 (i) in Section 6.11 shows that in the polynomial case, the results
of the traditional EEA and the results of Algorithm 3.14 are constant multiples of
each other.
Taking the positive or monic gcd in Z or F[x], respectively, is a reasonable
solution to the nonuniqueness problem. However, when you implement computer
algebra software, many other rings will be relevant, and often normalization is
not compatible across domains. For example, gcd(−10x, 5x2 ) is not really defined
unless we specify the domain R in Definition 3.3. Using R as a subscript, we
have—under normalization—gcdQ[x] (−10x, 5x2 ) = x, and ±5x are candidates for
gcdZ[x] (−10x, 5x2 ). A computer algebra system has to make an assumption here,
unless it allows the user to specify the domain; for our example, usually Z[x] is
assumed.
If R is a domain with a normal form normalR , then we get one for the polynomial
ring R[x] by setting
   normalR[x]( f ) = (normalR(lc( f )) / lc( f )) · f ,
where lc( f ) is the leading coefficient of f (Exercise 3.8 (iii)). Inductively, this
defines a normal form, and hence a unique gcd, for multivariate polynomials over
Z or over any field.
Notes. 3.1. The algorithm described in Euclid’s Elements does not use division with
remainder, but rather subtracts the smaller number g from the larger one until it becomes
smaller than g, and then swaps the two.
Allowing −∞ as a value of a Euclidean function d is a bit annoying and makes our two
main examples, integers and univariate polynomials over a field, look different. The proper
analogy between Z and F[x] goes as follows. We can take d(a) = |a| on Z and d(a) = 2deg a
on F[x], including d(0) = 0 in both cases; then d(ab) = d(a)d(b). Or, equivalently, we can
take d(a) = ⌊log2 |a|⌋ on Z (Exercise 3.5) and d(a) = deg a on F[x], with d(0) = −∞ in
both cases; then d(ab) is d(a) + d(b) (or d(a) + d(b) + 1 in Z).
3.2. The astronomical book Āryabhat.ı̄ya, written by Āryabhat.a in Sanskrit near the end
of the fifth century AD, contains an algorithm for computing from two coprime integers
f , g ∈ N two integers s,t such that s f + tg = 1. This problem is also solved in Bachet
(1612).
Exercise 3.25 discusses the binary Euclidean Algorithm of Stein (1967). Knuth (1998),
already in the second edition, states a binary EEA due to Michael Penk (Algorithm Y in
the Answers to Exercises of §4.5.2). Weilert (2000) adapts the binary Euclidean Algorithm
to the Gaussian integers.
Although the polynomial version of the (Extended) Euclidean Algorithm is conceptually
somewhat simpler, it is much younger (Stevin 1585; Newton 1707, page 38) than the
2000-year old integer algorithm. One reason for this is that we have a more intuitive
understanding of integers than we do of polynomials.
3.3. The fact that the number of division steps is maximal for Fibonacci numbers is Lamé’s
(1844) theorem. The scholarly work of Bach & Shallit (1996) contains more complete his-
torical information about this and many other topics in this book. The interesting paper
by Shallit (1994) points to three earlier analyses of the number of divisions in Euclid’s
algorithm: Reynaud (1824), Finck (1841), and Binet (1841); the latter allows negative
remainders, as in Exercise 3.13. Finck’s wording un problème qui [. . . ] a pour objet
de déterminer le nombre des opérations de la recherche du p.g.c.d. de deux nombres en-
tiers1 is a remarkably modern-sounding demand for the analysis of Euclid’s algorithm.
He gives the inequality√ri−1 > 2ri+1 that we used. Dupré (1846) gives √ the bounds of
about (log f )/ log((1 + 5)/2) for the ordinary and (log f )/ log(1 + 2) for Binet’s Eu-
clidean Algorithm (Exercise 3.30). Much earlier, Schwenter (1636), 86. Auffgab, calls
the Euclidean Algorithm for 770 020 512 197 390 and 124 591 930 070 091, with 32 divi-
sions, the arithmetical labyrinth, due to Simon Jacob von Coburg, and points to the Fi-
bonacci numbers as requiring many divisions in the Euclidean Algorithm. (The two large
integers are not Fibonacci numbers, and Schwenter says that their Euclidean Algorithm
requires 54 divisions; there is a calculation or copying mistake somewhere.) We have
gcd(Fn , Fm ) = Fgcd(n,m) ; see Exercise 3.31.
The average number of division steps in the Euclidean Algorithm for integers was inves-
tigated by Heilbronn (1968) and Dixon (1970), and in the binary algorithm (Exercise 3.25)
by Brent (1976); see Knuth (1998), §4.5.2, and Shallit (1994) for surveys. Those results
were all based on reasonable but unproven assumptions. The question was finally settled by
2
Vallée (2003); she gives average case analyses of several variations, with about π6 n many
divisions on average for the Euclidean Algorithm on n-bit numbers. For polynomials over
1 a problem that [. . . ] has as its goal to determine the number of operations in computing the gcd of two integers
finite fields, Ma & von zur Gathen (1990)) give worst case and average case analyses of
several variants of the Euclidean Algorithm.
The fact that two random integers are coprime with probability 6/π 2 is a theorem of
Dirichlet (1849). Dirichlet also proves the fact, surprising at first sight, that for fixed a
in a division the remainder r = a rem b, with 0 ≤ r < b, is more likely to be smaller
than b/2 than larger: If pa denotes the probability for the former, where 1 ≤ b ≤ a is
chosen uniformly at random, then pa is asymptotically 2 − ln 4 ≈ 61.37%. For Dirichlet’s
theorem, and also the corresponding statement about the probability of being squarefree
(due to Gegenbauer 1884), see Hardy & Wright (1985), §§18.5 and 18.6. A heuristic
argument goes as follows. A prime p divides a random integer x with probability 1/p, and
it divides both x and y with probability 1/p², hence not both with probability 1 − 1/p². Thus
gcd(x, y) = 1 happens with probability ζ(2)^{−1} = ∏_{p prime} (1 − 1/p²) = 6/π²; see Notes 18.4 for a discussion of Riemann’s zeta
function. The value of ζ(2) was determined by Euler (1734/35b, 1743); see Apostol (1983)
for a simple way of calculating this quantity.
3.4. The Euclidean Algorithm 3.14 with monic remainders (for univariate polynomials)
appears in the 1969 edition of Knuth (1998), and in Brown (1971).
The calculation of the Bézout coefficients via the EEA in general is in Euler (1748a),
§70. See also Notes 6.3. Gauß (1863b), articles 334 and 335, does this for polynomials in
F p [x], where p is prime.
Exercises.
3.1 Prove that two odd integers whose difference is 32 are coprime.
3.2 Let R be an integral domain. Show that
Our two familiar examples, the degree on F[x] for a field F and the absolute value on Z, both fulfill
this property. This exercise shows that every Euclidean domain has such a Euclidean function.
(i) Show that δ: Z −→ N with δ(3) = 2 and δ(a) = |a| if a 6= 3 is a Euclidean function on Z
violating (9).
(ii) Suppose that R is a Euclidean domain and D = {δ: δ is a Euclidean function on R}. Then D is
nonempty, and we may define a function d: R −→ N ∪ {−∞} by d(a) = min{δ(a): δ ∈ D}. Show that
d is a Euclidean function on R (called the minimal Euclidean function).
(iii) Let δ be a Euclidean function on R such that δ(ab) < δ(b) for some a, b ∈ R \ {0}. Find
another Euclidean function δ ∗ that is smaller than δ. Conclude that the minimal Euclidean function
d satisfies (9).
(iv) Show that for all a, b ∈ R \ {0} and a Euclidean function d satisfying (9), we have d(0) < d(a),
and d(ab) = d(b) if and only if a is a unit.
(v) Let d be the minimal Euclidean function as in (ii). Conclude that d(0) = −∞ and the group of
units of R is R× = {a ∈ R \ {0}: d(a) = 0}.
(vi) Prove that d(a) = deg a is the minimal Euclidean function on F[x] for a field F, and that
d(a) = ⌊log2 |a|⌋ is the minimal Euclidean function on Z, with d(0) = −∞ in both cases.
3.6∗ (i) Show that each two nonzero elements a, b of a UFD R have a gcd as well as a lcm. You
may assume that a normal form on R is given (this is not a restriction, by Exercise 3.9). Hint: First
look at the special case R = Z, and use the factorizations of normal(a) and normal(b) into normalized
primes.
(ii) Prove that gcd(a, b) · lcm(a, b) = normal(a · b).
(iii) Conclude that lcm(a1 , . . ., an ) = normal(a1 · · ·an ) for any n nonzero elements a1 , . . ., an ∈ R
that are pairwise coprime (you might need Exercise 3.4).
(iv) Is gcd(a1 , . . ., an ) · lcm(a1 , . . ., an ) = normal(a1 · · ·an ) valid for arbitrary n ∈ N?
3.7∗ Let R be a Euclidean domain, with a Euclidean function d: R −→ N ∪ {−∞} that has the addi-
tional properties
◦ d(ab) = d(a) + d(b),
◦ d(a + b) ≤ max{d(a), d(b)}, with equality if d(a) 6= d(b),
◦ d is surjective,
for all a, b ∈ R. Prove that R is a polynomial ring with d as degree function. Proceed as follows:
(i) Prove that d(a) = −∞ if and only if a = 0.
(ii) Show that F = {a ∈ R: d(a) ≤ 0} is a subfield of R.
(iii) Let x ∈ R be such that d(x) = 1, and prove that every nonzero a ∈ R has a unique representation
a = an xn + an−1 xn−1 + · · · + a1 x + a0 ,
3.11 For each of the following pairs of integers, find their greatest common divisor using the Eu-
clidean Algorithm:
(i) 34, 21; (ii) 136, 51; (iii) 481, 325; (iv) 8771, 3206.
3.12 Show that {s f +tg: s,t ∈ Z} = {k · gcd( f , g): k ∈ Z} holds for all f , g ∈ Z. (In other words, the
two ideals h f , gi and hgcd( f , g)i are identical.)
3.13 The Euclidean Algorithm for integers can be slightly speeded up if it is permitted to carry out
divisions with negative remainders, so that ri−1 = ri qi + ri+1 with −|ri /2| < ri+1 ≤ |ri /2|. Do the
four examples in Exercise 3.11 using this method.
3.14 Use the Extended Euclidean Algorithm to find gcd( f , g), for f , g ∈ Z p [x] in each of the fol-
lowing examples (arithmetic in Z p = {0, . . ., p − 1} is done modulo p). In each case compute the
corresponding polynomials s and t such that gcd( f , g) = s f + tg.
(i) f = x3 + x + 1, g = x2 + x + 1 for p = 2 and p = 3.
(ii) f = x4 + x3 + x + 1, g = x3 + x2 + x + 1 for p = 2 and p = 3.
(iii) f = x5 + x4 + x3 + x + 1, g = x4 + x3 + x2 + x + 1 for p = 5.
(iv) f = x5 + x4 + x3 − x2 − x + 1, g = x3 + x2 + x + 1 for p = 3 and p = 5.
3.15 Show that the si and ti in the traditional Extended Euclidean Algorithm for inputs f , g ∈ Z
with f > g > 0 alternate in sign, so that s2i and t2i−1 are positive and s2i+1 and t2i are negative
for all admissible values of i ≥ 1. Conclude that 0 = s1 < 1 = s2 ≤ |s3 | < |s4 | < · · · < |sℓ+1 | and
0 = t0 < 1 = t1 ≤ |t2 | < |t3 | < · · · < |tℓ+1 |.
3.16 Let R be a Euclidean domain, a, b, c ∈ R, and gcd(a, b) = 1. Prove the following:
(i) a | bc =⇒ a | c,
(ii) a | c and b | c =⇒ ab | c.
Hint: You may want to use the fact that the Extended Euclidean Algorithm computes s,t ∈ R such
that sa + tb = 1.
3.17 Prove that Z[x] is not a Euclidean domain. Hint: If it were, then we could compute s,t ∈ Z[x]
such that s · 2 + t · x = gcd(2, x), using the Extended Euclidean Algorithm.
3.18∗ Let R = F[x] for a field F and
   S = ⋃_{ℓ≥1} (F \ {0})^{ℓ+1} × (R \ {0})² × {q ∈ R: deg q > 0, q monic}^{ℓ−1}.
The Euclidean representation of a pair ( f , g) ∈ (R \ {0})2 with deg f ≥ deg g is defined as the list
(ρ0 , . . ., ρℓ , rℓ , q1 , . . ., qℓ ) ∈ S formed from the results of the Euclidean Algorithm. Show that the map
for 1 ≤ i ≤ ℓ.
(vi) Write a M APLE program that implements the traditional Extended Euclidean Algorithm and
additionally computes all continuants ci (qℓ−i+2 , . . ., qℓ ) for r0 = x20 and r1 = x19 + 2x18 + x in Q[x],
where q1 , . . ., qℓ are the quotients in the traditional Extended Euclidean Algorithm.
3.21 (i) Prove Lemma 3.10 (6) for the traditional EEA 3.6. Hint: Since q1 may be constant, it is
wise to start the induction with i = 3 and show the cases i = 1 and i = 2 separately.
(ii) Prove Lemma 3.10 for the Extended Euclidean Algorithm 3.14.
3.22 (i) Show that for polynomials f , g ∈ F[x] of degrees n ≥ m, where F is a field, computing
all entries si in the traditional Extended Euclidean Algorithm from the quotients qi takes at most
2m2 + 2m additions and multiplications in F. Hint: Exhibit the bound for the normal case and prove
that this is the worst case.
(ii) Prove that the corresponding estimate for the Extended Euclidean Algorithm is 2m2 + 3m + 1.
3.23 Prove Lemma 3.12. Hint: Use Lemma 3.8 and Exercise 3.15.
3.24∗ Prove Theorem 3.13.
3.25∗ We consider the following recursive algorithm for computing the gcd of two integers.
A LGORITHM 3.17 Binary Euclidean Algorithm.
Input: a, b ∈ N>0 .
Output: gcd(a, b) ∈ N.
1. if a = b then return a
2. if both a and b are even then return 2 · gcd(a/2, b/2)
3. if exactly one of the two numbers, say a, is even then return gcd(a/2, b)
4. if both a and b are odd and, say, a > b, then return gcd((a − b)/2, b)
(i) Run the algorithm on the examples of Exercise 3.11.
(ii) Prove that the algorithm works correctly.
(iii) Find a “good” upper bound on the recursion depth of the algorithm, and show that it takes
O(n2 ) word operations on inputs of length at most n.
(iv) Modify the algorithm so that it additionally computes s,t ∈ N such that sa + tb = gcd(a, b).
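For experimenting with parts (i) and (iii), here is a direct Python transcription of Algorithm 3.17 (ours; it is only a sketch, not an intended solution of the exercise):

    from math import gcd

    def binary_gcd(a, b):
        """Binary Euclidean Algorithm (Algorithm 3.17) for positive integers."""
        if a == b:
            return a
        if a % 2 == 0 and b % 2 == 0:
            return 2 * binary_gcd(a // 2, b // 2)
        if a % 2 == 0:
            return binary_gcd(a // 2, b)
        if b % 2 == 0:
            return binary_gcd(a, b // 2)
        # both odd: replace the larger one by half the (even) difference
        if a > b:
            return binary_gcd((a - b) // 2, b)
        return binary_gcd(a, (b - a) // 2)

    # the examples of Exercise 3.11
    assert all(binary_gcd(a, b) == gcd(a, b)
               for a, b in [(34, 21), (136, 51), (481, 325), (8771, 3206)])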
3.26∗ Adapt the algorithm from Exercise 3.25 to polynomials over a field. Hint: Start with F2 [x].
3.27 Let Fn and Fn+1 be consecutive terms in the Fibonacci sequence. Show that gcd(Fn+1 , Fn ) = 1.
3.28 (i) Prove the formula
   Fn = (1/√5) (φ+^n − φ−^n)   for n ∈ N                              (10)

for the Fibonacci numbers, where φ+ = (1 + √5)/2 ≈ 1.618 is the golden ratio and φ− = −1/φ+ =
(1 − √5)/2 ≈ −0.618. Conclude that Fn is the nearest integer to φ+^n/√5 for all n.
(ii) For n ∈ N>0 , let kn = [1, . . ., 1] be the continued fraction of length n with all entries equal to 1
(Section 4.6). Prove that kn = Fn+1 /Fn , and conclude that limn−→∞ kn = φ+ .
3.29∗ This continues Exercise 3.28.
(i) Let h = ∑n≥0 Fn xn ∈ Q[[x]] be the formal power series whose coefficients are the Fibonacci
numbers. Derive a linear equation for h from the recursion formula for the Fibonacci numbers and
solve it for h. (It will turn out that h is a rational function in x.)
(ii) Compute the partial fraction expansion (Section 5.11) of h and use it to prove (10) again by
employing the formula ∑n≥0 xn = 1/(1 − x) for the geometric series and comparing coefficients.
3.30∗ In the least absolute remainder variant of the Euclidean Algorithm for integers (Exercise
3.13), all quotients qi (with the possible exception of q1 ) are at least two in absolute value. Thus
the nonnegative integers with the largest possible number of division steps in this variant, that is, the
analog of the Fibonacci numbers in Lamé’s theorem, are recursively defined by
G0 = 0, G1 = 1, Gn+1 = 2Gn + Gn−1 for n ≥ 1.
(i) Find a closed form expression similar to (10) for Gn . Hint: Proceed as in Exercise 3.29.
(ii) Derive a tight upper bound on the length ℓ of the least absolute remainder Euclidean Algorithm
for two integers f , g ∈ N with f > g in terms of log g, and compare it to (8).
3.31∗ For n ∈ N, let Fn be the nth Fibonacci number, with F0 = 0 and F1 = 1. Prove or disprove that
the following properties hold for all n, k ∈ N.
(i) Fn+k+1 = Fn Fk + Fn+1 Fk+1 ,
(ii) Fk divides Fnk ,
(iii) gcd(Fnk+1 , Fk ) = 1 if k ≥ 1 (hint: Exercise 3.27),
(iv) Fn rem Fk = Fn rem k if k ≥ 1,
(v) gcd(Fn , Fk ) = gcd(Fk , Fn rem k ) if k ≥ 1 (hint: Exercise 3.16),
(vi) gcd(Fn , Fk ) = Fgcd(n,k) .
(vii) Conclude from (i) that Fn can be calculated with O(log n) arithmetic operations in Z.
(viii) Generalize your answers to Lucas sequences (Ln )n≥0 of the form L0 = 0, L1 = 1, and Ln+2 =
aLn+1 + Ln for n ∈ N, where a ∈ Z is a fixed constant.
3.32∗ We define the sequence f0 , f1 , f2 , . . . ∈ Q[x] of monic polynomials by
◦ gcd( fn , fn−1 ) = 1 for n ≥ 1,
◦ for every n ≥ 1 the number of division steps in the Euclidean Algorithm for ( fn , fn−1 ) is n, and
all quotients are equal to x.
(i) What are the remainders in the Euclidean Algorithm for fn and fn−1 ? What are the ρi ? Find a
recursion for the fn . What is the degree of fn ?
(ii) What is the connection between the fn and the Fibonacci numbers?
(iii) State and prove a theorem saying that the number of division steps in the Euclidean Algorithm
for the pair ( fn , fn−1 ) is maximal. Make explicit what you mean by maximal.
3.33 Let R be a ring, and f , g, q, r ∈ R[x] with g 6= 0, f = qg + r, and deg r < deg g. Prove that q and
r are unique if and only if lc(g) is not a zero divisor.
Die Musik hat viel Aehnlichkeit mit der Algeber.1
Novalis (1799)
1 Music bears much resemblance to algebra.
The science of algebra, independently of any of its uses, has all the
advantages which belong to mathematics in general as an object of
study, and which it is not necessary to enumerate. Viewed either as a
science of quantity, or as a language of symbols, it may be made of the
greatest service to those who are sufficiently acquainted with
arithmetic, and who have sufficient power of comprehension
to enter fairly upon its difficulties.
Augustus De Morgan (1837)
[Arabic epigraph by al-Khwārizmı̄; its content is discussed in the Notes to Section 4.6.]
Abū Ja֒far Muh.ammad bin Mūsā al-Khwārizmı̄ (c. 830)
4 Applications of the Euclidean Algorithm
By standard tricks, such as rerunning the test or choosing a larger data base, this
probability can be made arbitrarily small.
The technique can also be used to test equalities like f · g = h for polynomials
f , g, h, by substituting a random value, or A · B = C for matrices A, B,C, by evalu-
ating at a random vector.
This fingerprinting method can even be applied to problems outside the alge-
braic realm, by “arithmetizing” combinatorial problems. Suppose that one main-
tains a large data base in North America and a mirror image in Europe, by per-
forming all updates on both. Each night, one wants to check whether they indeed
are identical. Sending the whole data base would take too long. So one considers
the data base as a string of words, many gigabytes long, and the (large) number
a whose 264 -ary representation this is. Then one chooses a prime p, computes
a rem p and sends this to the mirror site. The corresponding calculation is per-
formed on the other data base, and the two results are compared. If they disagree,
then the two data bases differ. If they agree, then probably the two data bases
are identical, provided p was chosen appropriately. This can be set up so that the
size of the transmitted message is only logarithmic in the size of the data bases.
Exercise 4.3 asks you to apply this method to the more general problem of string
matching.
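A toy version of this comparison in Python (ours; the data and the choice of prime are made up for illustration) looks as follows:

    def fingerprint(data: bytes, p: int) -> int:
        """Residue of the data, read as one huge integer, modulo the prime p."""
        # in practice one would reduce word by word rather than build the big integer
        return int.from_bytes(data, "big") % p

    p = 2**61 - 1                         # some fixed prime agreed on by both sites
    local_copy  = b"...contents of the local data base..."
    mirror_copy = b"...contents of the mirrored data base..."

    if fingerprint(local_copy, p) != fingerprint(mirror_copy, p):
        print("the data bases certainly differ")
    else:
        print("the residues agree; the data bases are probably identical")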
Division with remainder of a large number by a small number is easy (Exer-
cise 4.1), and you are familiar with one particularly simple example: the remainder
of a number modulo 9 (or 3) equals the remainder of the sum of its decimal digits.
In particular, the number is divisible by 9 (or 3) if and only if this sum is. Why
does this work? Let a = ∑_{0≤i<l} ai · 10^i be the decimal representation of a ∈ N.
Since 10 ≡ 1 mod 9 (and mod 3), we have a ≡ ∑_{0≤i<l} ai · 1^i = ∑_{0≤i<l} ai mod 9
(and mod 3) (see Exercise 4.4 for remainders modulo 11).
Computing with remainders of arithmetic expressions modulo some nonzero in-
teger is called modular arithmetic. Given an expression e involving integers and
arithmetic operations +, −, ·, we can compute e modulo some number m very effi-
ciently by first reducing all integers modulo m and then, step by step, performing
an arithmetic operation in Z and immediately reducing the result modulo m again,
as we have done in the examples above. Here is another one:
e = 20 · (−89) + 32 ≡ 6 · 2 + 4 ≡ 12 + 4 ≡ 5 + 4 ≡ 9 ≡ 2 mod 7.
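In Python (a small sketch of ours, not from the text), the same step-by-step reduction of a sum of products reads:

    def eval_mod(terms, m):
        """Evaluate a sum of products modulo m, reducing after every operation."""
        total = 0
        for factors in terms:
            prod = 1
            for a in factors:
                prod = (prod * (a % m)) % m   # intermediate results stay below m^2
            total = (total + prod) % m
        return total

    # e = 20 * (-89) + 32 modulo 7, as computed above
    assert eval_mod([(20, -89), (32,)], 7) == 2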
In this way, the intermediate results never exceed m2 . The basic rules for comput-
ing with congruences are
where α = (x mod (x³ + 4x)) ∈ Z5 [x]/⟨x³ + 4x⟩. We have done the calculations in
detail in this example to illustrate the principle; later we will suppress the details.
T HEOREM 4.1.
Let R be a Euclidean domain, a, m ∈ R, and S = R/mR. Then a mod m ∈ S is a
unit if and only if gcd(a, m) = 1. In this case, the modular inverse of a mod m can
be computed by means of the Extended Euclidean Algorithm.
P ROOF. We have
If, on the other hand, gcd(a, m) = 1, then the Extended Euclidean Algorithm pro-
vides such s,t ∈ R. ✷
E XAMPLE 4.2. We let R = Z, m = 29, and a = 12. Then gcd(a, m) = 1, and the
Extended Euclidean Algorithm computes 5·29+(−12)·12 = 1. Thus (−12)·12 ≡
17 · 12 ≡ 1 mod 29, and hence 17 is the inverse of 12 modulo 29. ✸
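In code, Theorem 4.1 becomes a small routine. The following Python sketch (ours, not from the text) keeps only the coefficients of a in the Extended Euclidean Algorithm for (m, a):

    def modinv(a, m):
        """Inverse of a modulo m via the Extended Euclidean Algorithm (Theorem 4.1)."""
        r0, r1 = m, a
        s0, s1 = 0, 1          # coefficients of a: r_i = s_i * a + t_i * m for some t_i
        while r1 != 0:
            q = r0 // r1
            r0, r1 = r1, r0 - q * r1
            s0, s1 = s1, s0 - q * s1
        if r0 != 1:
            raise ValueError("a and m are not coprime")
        return s0 % m

    assert modinv(12, 29) == 17            # Example 4.2
    assert (17 * 12) % 29 == 1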
this construction works for any prime power q, namely, there exist irreducible
polynomials in Fq [x] of any degree, and any two irreducible polynomials of the
same degree lead to isomorphic fields.
COROLLARY 4.6.
Let F be a field, and f ∈ F[x] of degree n ∈ N. One arithmetic operation in
F[x]/⟨ f ⟩, that is, addition, multiplication, or division by an invertible element,
can be done using O(n²) arithmetic operations in F . More precisely, we have at
most 4n² + O(n) operations for a multiplication modulo f and at most 6n² + O(n)
operations for an inversion modulo f .
COROLLARY 4.7.
One arithmetic operation in Zm , where m ∈ N>0 and n = λ(m) = ⌊(log2 m)/64⌋ + 1
is the length of m in the standard representation, can be done using O(n²) word
operations.
   ϕ(m) = #Zm^× = #{0 ≤ a < m: gcd(a, m) = 1}.
In Section 5.4, we will derive a formula for ϕ(m) when m is arbitrary from the
Chinese Remainder Theorem. Exercise 4.19 discusses the analog of Euler’s totient
function for polynomials over a finite field.
1. { binary representation of n }
   write n = 2^k + nk−1 · 2^{k−1} + · · · + n1 · 2 + n0 , with all ni ∈ {0, 1}
   bk ←− a
2. for i = k − 1, k − 2, . . . , 0 do
   if ni = 1 then bi ←− (bi+1)² · a else bi ←− (bi+1)²
3. return b0
Correctness follows easily from the invariant bi = a^⌊n/2^i⌋. This procedure uses
⌊log n⌋ squarings plus w(n) − 1 ≤ ⌊log n⌋ multiplications in R, where log is the
binary logarithm and w(n) is the Hamming weight of the binary representation
of n (Chapter 7), that is, the number of ones in it. Thus the total cost is at most
2 log n multiplications. For example, the binary representation of 13 is 1 · 2³ +
1 · 2² + 0 · 2 + 1 and has Hamming weight 3. Thus a^13 would be computed as
((a² · a)²)² · a, using three squarings and two multiplications. If R = Z17 = Z/⟨17⟩
and a = 8 mod 17, then we compute 8^13 mod 17 as

   8^13 = ((8² · 8)²)² · 8 ≡ ((13 · 8)²)² · 8 ≡ (2²)² · 8 ≡ 16 · 8 ≡ 9 mod 17,

which is much faster than first evaluating 8^13 = 549 755 813 888 and then divid-
ing by 17 with remainder. This method was already used by Euler (1761). He
calculated 7^160 mod 641 by computing 7^2, 7^4, 7^8, 7^16, 7^32, 7^64, 7^128, 7^160 = 7^128 · 7^32,
reducing modulo 641 after each step. (He also listed, unnecessarily, 7^3.) As an-
other example, starting from 2^(2^3) = 2^8 = 256, we only need two squarings modulo
5 · 2^7 + 1 = 641 to calculate ((2^8)²)² = 2^(2^5) ≡ −1 mod 641. This shows that 641
divides the fifth Fermat number F5 = 2^(2^5) + 1, as discovered by Euler (1732/33);
see Sections 18.2 and 19.1. Even if we were given the 10-digit number 2^(2^5) + 1 =
4 294 967 297, it would seem more laborious to divide it by 641 with remainder
rather than to use modular repeated squaring.
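The repeated squaring procedure is only a few lines of Python (our sketch, not from the text; Python's built-in pow(a, n, m) does the same job):

    def repeated_squaring(a, n, m):
        """Compute a^n mod m, processing the bits of n from the most significant down."""
        if n == 0:
            return 1 % m
        bits = bin(n)[2:]                  # binary representation, leading 1 first
        b = a % m
        for bit in bits[1:]:
            b = (b * b) % m                # squaring step
            if bit == "1":
                b = (b * a) % m            # multiplication step
        return b

    assert repeated_squaring(8, 13, 17) == 9       # 8^13 mod 17, as above
    assert repeated_squaring(2, 32, 641) == 640    # 2^32 = -1 mod 641, so 641 divides F_5
    assert repeated_squaring(7, 160, 641) == pow(7, 160, 641)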
There are clever ways of reducing the cost to (1 + o(1)) log n multiplications
(Exercise 4.21), where o(1) goes to zero as n gets large. On the other hand, start-
ing with an indeterminate x and using d multiplications or additions, one can only
compute polynomials of degree at most 2d , and thus ⌈log n⌉ multiplications are
indeed necessary to obtain xn . However, when x is not an indeterminate but from a
well-structured domain, one can sometimes exploit that structure for faster expo-
nentiation algorithms. We will see an example in the iterated Frobenius algorithm
of Section 14.7. Particularly important for cryptographic applications are methods
based on normal bases and Gauß periods in finite fields.
Using this property, we obtain a pretty, elementary proof of the following famous
number-theoretic theorem, which—in a more general form—will have many ap-
plications in factoring polynomials and primality testing (Chapters 14 and 18).
a p = ((a − 1) + 1) p ≡ (a − 1) p + 1 p ≡ (a − 1) + 1 = a mod p,
s f + tg = a. (4)
The set of all real solutions of (4) is a line in the plane R², a one-dimensional
object, that can be written as a sum v + U of a particular solution v ∈ R² and the
set U of all solutions of the homogeneous equation
s f + tg = 0. (5)
The following lemma says that this is also true for the set of integral solutions.
Moreover, we can decide whether (4) is solvable over Z, and if so, compute all
solutions with the Extended Euclidean Algorithm. Since the proof is the same, we
state the result for arbitrary Euclidean domains.
THEOREM 4.10.
Let R be a Euclidean domain, a, f , g ∈ R, and h = gcd( f , g).
(i) The equation (4) has a solution (s,t) ∈ R² if and only if h | a.
(ii) If h ≠ 0 and (s∗ ,t ∗ ) ∈ R² is a solution of (4), then the set of all solutions is
(s∗ ,t ∗ ) +U , where
   U = R · (g/h, − f /h) ⊆ R²
is the set of all solutions to the homogeneous equation (5).
(iii) If R = F[x] for a field F , h ≠ 0, (4) is solvable, and deg f + deg g − deg h >
deg a, then there is a unique solution (s,t) ∈ R² of (4) such that deg s <
deg g − deg h and deg t < deg f − deg h.
P ROOF. (i) If s,t ∈ R satisfy (4), then gcd( f , g) divides s f + tg and hence a.
Conversely, we assume that h = gcd( f , g) divides a. The claim is trivial if h = 0;
otherwise we can compute s∗ ,t ∗ ∈ R such that s∗ f + t ∗ g = h, using the Extended
Euclidean Algorithm, and (s,t) = (s∗ a/h,t ∗ a/h) solves (4).
(ii) For (s,t) ∈ R², we have, since h ≠ 0 and f /h and g/h are coprime, that

   (5) ⇐⇒ ( f /h) s = −(g/h) t ⇐⇒ ∃k ∈ R: s = k (g/h) and t = k (− f /h) ⇐⇒ (s,t) ∈ U.
Then also
(4) ⇐⇒ f · (s − s∗ ) + g · (t − t ∗ ) = 0 ⇐⇒ (s − s∗ ,t − t ∗ ) ∈ U.
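A Python sketch of Theorem 4.10 for R = Z (ours, not from the text; it assumes f, g > 0 for simplicity) returns a particular solution together with a generator of U:

    def solve_linear_diophantine(f, g, a):
        """All integer solutions of s*f + t*g = a, following Theorem 4.10.

        Returns a particular solution (s, t) and a generator (u, v) of the
        homogeneous solutions, so every solution is (s + k*u, t + k*v), k in Z;
        returns None if there is no solution.  Assumes f, g > 0.
        """
        r0, r1, s0, s1, t0, t1 = f, g, 1, 0, 0, 1
        while r1 != 0:                     # Extended Euclidean Algorithm
            q = r0 // r1
            r0, r1 = r1, r0 - q * r1
            s0, s1 = s1, s0 - q * s1
            t0, t1 = t1, t0 - q * t1
        h = r0                             # h = gcd(f, g) = s0*f + t0*g
        if a % h != 0:
            return None                    # part (i): solvable if and only if h | a
        return (s0 * (a // h), t0 * (a // h)), (g // h, -(f // h))

    (s, t), (u, v) = solve_linear_diophantine(126, 35, 14)   # gcd = 7 divides 14
    assert s * 126 + t * 35 == 14
    assert u * 126 + v * 35 == 0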
The situation can be generalized to higher dimensions. The proof of the follow-
ing theorem is left as Exercise 4.24.
T HEOREM 4.11.
Let R be a Euclidean domain, a, f1 , . . . , fn ∈ R, with all fi nonzero, and U the set
of all solutions of the homogeneous equation f1 s1 + · · · + fn sn = 0.
f1 s1 + · · · + fn sn = a (6)
E XAMPLE 4.12. We can rewrite the Euclidean Algorithm from page 45 for r0 =
126 and r1 = 35 as follows.
   q1 = ⌊r0/r1⌋ = ⌊126/35⌋ = 3,      r2/r1 = r0/r1 − q1 = 21/35,
   q2 = ⌊r1/r2⌋ = ⌊35/21⌋ = 1,       r3/r2 = r1/r2 − q2 = 14/21,
   q3 = ⌊r2/r3⌋ = ⌊21/14⌋ = 1,       r4/r3 = r2/r3 − q3 = 7/14,
   q4 = ⌊r3/r4⌋ = ⌊14/7⌋ = 2,        r5/r4 = r3/r4 − q4 = 0.

Thus the continued fraction expansion of 126/35 ∈ Q is

   126/35 = 18/5 = [3, 1, 1, 2] = 3 + 1/(1 + 1/(1 + 1/2)). ✸
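The rewriting above is just the Euclidean Algorithm. In Python (our sketch, not from the text):

    from fractions import Fraction

    def continued_fraction(f, g):
        """Continued fraction expansion of f/g, computed by the Euclidean Algorithm."""
        cf = []
        while g != 0:
            q, r = divmod(f, g)
            cf.append(q)
            f, g = g, r
        return cf

    def collapse(cf):
        """Turn a continued fraction [q1, q2, ...] back into a single fraction."""
        value = Fraction(cf[-1])
        for q in reversed(cf[:-1]):
            value = q + 1 / value
        return value

    assert continued_fraction(126, 35) == [3, 1, 1, 2]
    assert collapse([3, 1, 1, 2]) == Fraction(18, 5)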
This holds for every continued fraction approximation with c = 1. Of three con-
secutive continued fraction approximations, at least one satisfies (7) with c = √5,
and for any c > √5 there are real numbers α that have only finitely many rational
approximations with (7); see Notes 4.6. For comparison, this is about twice as
good as decimal fractions, where we restrict q to be a power of 10 and can achieve
an approximation error of 1/(2q).
Polynomial analogs of (7) are discussed in Exercises 4.29 and 4.30. The latter
shows that every power series f has approximations
with polynomials r,t such that deg r, degt ≤ n, for infinitely many n. These are just
certain Padé approximants (Section 5.9).
Table 4.2 shows the rational approximations of π that result from truncating
the continued fraction expansion after the ith component for i = 1, . . . , 5 and the
number of correct digits (after the decimal point). Throughout history, people
have grappled with practical problems, in architecture, land surveying, astronomy
etc., that required “squaring the circle”. The Egyptian Rhind Papyrus from about
1650 BC gives the value (16/9)² ≈ 3.1604. Archimedes (287–212 BC) gave a
method to approximate π, in principle arbitrarily well, using polygons inscribed
and circumscribed to a circle; he proved 3 10/71 < 25 344/8069 < π < 29 376/9347 <
3 1/7. The Chinese astronomer Tsu Ch’ung-chih (430–501) determined six decimal
digits of π and deduced the approximation 355/113, which was also found by
Adrian Antoniszoon (1527–1607). Lambert (1761) proved that π is irrational,
and Lindemann (1882) proved that it is transcendental. An interesting unsolved
question asks whether the decimal digits of π are uniformly distributed or even
random, in some sense. We do not even know how to prove that the digit 1, say, occurs
infinitely often!
Table 4.3 shows some steps in our knowledge about the decimal expansion of π .
Of the records in the 20th century, we only list some where the number of decimal
digits of the number of decimal digits (in the rightmost column) increased. The
current world record is an awesome 10 trillion digits, but is unlikely to stand for
long.
Archimedes c. 250 BC 2
Tsu Ch’ung-chih 5th c. 7
Al-Kāshı̄ 1424 14
van Ceulen 1615 35
Machin 1706 100
William Shanks 1853 527
Reitwiesner 1949 2 035
Genuys 1958 10 000
Daniel Shanks & Wrench 1962 100 265
Guilloud & Bouyer 1973 1 001 250
Kanada, Yoshino & Tamura 1982 16 777 206
Kanada, Tamura et al. 1987 133 554 400
Kanada & Tamura 1989 1 073 740 000
Kanada & Takahashi 1997 51 539 600 000
Kanada & Takahashi 1999 206 158 430 000
Kanada Laboratory 2002 1 241 100 000 000
Yee & Kondo 2011 10 000 000 000 050
William Shanks published a book on his computation of 607 digits, but made
an error at the 528th digit. With a modern computer algebra system the first
100000 digits require just a few keystrokes (evalf[100000](Pi), for example,
in MAPLE), and bang! there it is on your screen.
The computation of π to many digits is based on deep mathematics and is only
possible with the help of fast algorithms for high precision integer and floating
point arithmetic, based on the Fast Fourier Transform (Chapter 8) and fast division
(Chapter 9). It is a good test for computer hardware, which is routinely performed
on some supercomputers before shipping. Borwein, Borwein & Bailey (1989)
speak from experience:
4.7. Calendars
The tropical year, to which our calendar adheres, is the period of time between
two successive occasions of vernal equinox, the precise point of time in spring
when the sun crosses the celestial equator. The length of the tropical year is about
365d 5h 48′ 45.2′′ , or 365.242190 days. (Actually, the exact value is currently
diminishing by about 0.53 seconds each century, but this shall not bother us here.)
Since the dawn of civilization, people have used calendars to express the regu-
larities of the moon’s rotation around the earth and of the seasons. Lunar calen-
dars divide time into months, where originally each month began with new moon.
Since the length of a lunar month is between 29 and 30 days, lunar calendars are
asynchronous to the year of the seasons. Solar calendars, however, ignore the
moon phases and try to approximate the year of the seasons as closely as possible.
The early Roman calendar was of a mixed lunisolar type. It consisted of origi-
nally 10 and later 12 months, and occasionally one extra month was added in order
to keep in step with the seasons. The Julian calendar, named after Julius Caesar
(and which had been invented by the Egyptian Sosigenes), started on 1 January
45 BC. Since the Romans before Caesar had badly neglected the management of
the calendar, the year 46 BC, the annus confusionis, had 445 days! Caesar used
the approximation of 365.25 days for the year and introduced one additional 366th
leap day every four years. Although this approximation is quite close to the exact
length of the tropical year, the Julian calendar was fast by about three days every
400 years.
Towards the end of the 16th century, vernal equinox was on 10 March rather than
on its “correct” date of 21 March. To rectify this, Pope Gregory XIII introduced
the following calendar reform. First, the erroneous calendar gain was fixed by
eliminating the 10 days between 4 October and 15 October 1582. Second, the
leap year rule was modified by turning those years which are divisible by 100 but
not by 400 into normal years; this removed three leap days in 400 years. So, for
example, the years 1700, 1800, and 1900 AD were all normal years, but counting
the year 2000 AD as normal would be the bug of the millennium. This Gregorian
calendar, which is essentially still used today, corresponds to an approximation of
the tropical year as
   365 + 1/4 − 3/400 = 365 97/400 = 365.2425
days. It is too long by about 26.8 seconds a year.
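One way to see where good leap year rules might come from is to look at the continued fraction convergents (Section 4.6) of the fractional part of the tropical year; the following Python sketch (ours, not from the text) computes the first few:

    from fractions import Fraction

    def convergents(x, count):
        """The first few continued fraction convergents of a rational number x."""
        f, g = x.numerator, x.denominator
        cf = []
        while g != 0 and len(cf) < count:
            q, r = divmod(f, g)
            cf.append(q)
            f, g = g, r
        result = []
        for k in range(1, len(cf) + 1):
            value = Fraction(cf[k - 1])
            for q in reversed(cf[:k - 1]):
                value = q + 1 / value
            result.append(value)
        return result

    fractional_part = Fraction(242190, 1000000)    # 0.242190 days per year
    print([str(c) for c in convergents(fractional_part, 5)])
    # ['0', '1/4', '7/29', '8/33', '31/128']: 1/4 is Caesar's rule of one leap day
    # every four years, and 8/33 is the approximation mentioned in the Notes below.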
ratio of the two tones involved. Table 4.5 lists some common intervals and their
frequency ratios.
frequency ratio name example
r1 = 2 : 1 octave c–C
r2 = 3 : 2 fifth G–C
r3 = 4 : 3 fourth F–C
r4 = 5 : 4 major third E–C
r5 = 6 : 5 minor third E♭ –C
r6 = 9 : 8 whole tone D–C
tone C D E F G A B c
Pythagorean tuning 1:1 9:8 81:64 4:3 3:2 27:16 243:128 2:1
diatonic tuning 1:1 9:8 5:4 4:3 3:2 5:3 15:8 2:1
TABLE 4.6: Frequency ratios with respect to the base tone C in the diatonic scale.
[Keyboard diagram: the black keys C♯ = D♭, D♯ = E♭, F♯ = G♭, G♯ = A♭, and A♯ = B♭ lie between the white keys C D E F G A B c.]
Suppose that we want to divide the octave into n equal parts, so that one half-
tone has frequency ratio 2^{1/n}, in such a way that each of the pleasant intervals
can be reached with an integral number of half-tone steps. This means that for
i = 1, . . . , 6, the frequency ratio ri from Table 4.5 should be close to 2^{di/n} for some
integer di ; equality would be best, but, for example, there are no integers d, n
such that r2 = 3/2 = 2^{d/n} (Exercise 4.31). Taking logarithms, we have the task of
finding di ∈ N with

   log ri − di/n                                                      (8)

as small as possible.
[Figure 4.8, upper panel: for each n = 6, 7, . . . , 36, the multiples 0, 1/n, 2/n, . . . , 1 plotted against horizontal lines at log r1, . . . , log r6.]
[Figure 4.8, lower panel: error in the nth approximation, for n = 6, . . . , 36.]
This problem reappears in Section 17.3. Here we solve it graphically: in Figure 4.8 on page 87, we have
a horizontal line at log ri , for the six “pleasant intervals” shown, and the dots
on the vertical line passing through n = 6, 7, . . . , 36 have distance 0, 1/n, 2/n, . . . ,
(n − 1)/n, 1 from the horizontal axis. Inspection reveals a reasonable fit for n = 12;
the lower diagram depicts the quality of the approximations, defined as the sum
over all i of the squares of the distance between log ri and its closest neighbor. For
n = 12, a whole tone is two half-tones, a minor third three, and so on.
Another good fit is at n = 19. Then a whole tone is three “third-tones”, a minor
third is five third-tones, and so on. For example, the tones E♭ and D♯ , which are
distinguished in music sheets, correspond to the same key on the piano (but are
different when played on a violin, say). In a well-tempered 19-tone scale, however,
the minor third E♭ –C is five third-tones, while the augmented second D♯ –C is only
four third-tones.
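The graphical inspection can be replaced by a short computation. The following Python sketch (ours, not from the text) evaluates, for each n, the sum of squared distances used above:

    from math import log2

    ratios = [2, 3/2, 4/3, 5/4, 6/5, 9/8]        # the intervals of Table 4.5

    def scale_error(n):
        """Sum over i of the squared distance of log2(r_i) to the nearest multiple of 1/n."""
        total = 0.0
        for r in ratios:
            x = log2(r)
            d = round(x * n)                     # best integral number of steps d_i
            total += (x - d / n) ** 2
        return total

    for n in range(6, 37):
        print(n, round(scale_error(n), 5))
    # n = 12 and n = 19 give noticeably smaller errors than their neighbours,
    # matching the dips visible in Figure 4.8.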
Notes. 4.1. The fingerprinting technique was invented by Freivalds (1977); an early use
was by the Persian philosopher and scientist Avicenna (980–1037), who apparently verified
his calculations by checking them modulo 9, as did al-Khwārizmı̄ and al-Kāshı̄. DeMillo
& Lipton (1978) apply it to checking equivalence of two arithmetic circuits. Many other
applications in computer science, from universal hashing to probabilistically checkable
proofs, are described in Motwani & Raghavan (1995). The general topic of probabilistic
algorithms is discussed in Section 6.5, and Section 18.4 presents techniques for finding
prime numbers.
4.2. Moore (1896) proved that any finite field is an Fq , as described, for a prime power q,
and coined the term “Galois-field”; the notation GF(q) is quite common.
4.3 and 4.4. The term repeated squaring seems to have been standard in the early 20th
century, since Pocklington (1917) uses it without further explanation (except to say that for
his modular problem one divides with remainder by the modulus after each multiplication).
Bürgisser, Clausen & Shokrollahi (1997) note that repeated squaring for the computation
of an is just Horner’s rule for the binary representation of the exponent n. Knuth (1998),
§4.6.3, discusses at length the topic of addition chains where one tries to minimize the
number of multiplications for an exponentiation. We refer to Mullin, Onyszchuk, Vanstone
& Wilson (1989), von zur Gathen & Nöcker (1997, 1999), Gao, von zur Gathen & Panario
(1998), and Gao, von zur Gathen, Panario & Shoup (2000) for exponentiation in finite
fields.
Euler (1732/33) proved that any prime p dividing Fn satisfies p ≡ 1 mod 2^{n+2} (Exercise
18.26). He found the factor 641 of F5 as only the second possibility allowed by his con-
dition. In the same five-page paper, he also states Fermat’s little theorem, but saying that
he has no proof and eo autem difficiliorem puto eius demonstrationem esse, quia non est
verum, nisi n + 1 sit numerus primus.2 Our proof is from his later paper Euler (1736b).
Fermat never communicated a proof of his “little theorem”. An unpublished manuscript
of Leibniz from 12 September 1680 (and also Leibniz 1697) contains the first proof of
Fermat’s little theorem; see Mahnke (1912/13), page 38, Vacca (1894), Tropfke (1902),
page 62, and Dickson (1919), Chapter III, pages 59–60. Mahnke considers it likely that
2 I also assume its proof to be rather difficult because it is not true unless n + 1 is prime.
Leibniz found the statement of the theorem himself, but it cannot be completely ruled out
that he had already read Fermat’s Varia Opera, published in 1679.
4.5. Generalizing our single linear Diophantine equation, we may consider a system of
linear Diophantine equations, that is, a matrix F ∈ Rm×n and a vector a ∈ Rm , where R is
a Euclidean domain, and ask for s ∈ Rn satisfying Fs = a. There are variants of Gauss-
ian elimination that allow only elementary unimodular row and column transformations,
that is, permutations, multiplications by units of R, and additions of multiples of one row or
column to another. They transform the original system into an equivalent one, for example,
in Smith or in Hermite normal form, in which solvability of the system and the set of
solutions are easy to determine. For example, the Hermite normal form of a nonsingular
square matrix F ∈ Z^{n×n} is the unique lower triangular matrix H = UF, where U ∈ Z^{n×n}
is unimodular, so that det U = ±1, all diagonal elements of H are positive, and each en-
try below the diagonal is nonnegative and smaller than the diagonal element in the same
column (Exercise 16.7).
Such unimodular transformations correspond to the division steps in the Euclidean Al-
gorithm. For the case of one equation, for example, the Hermite normal form of a one-row
matrix F = ( f1 , . . . , fn ) ∈ R1×n is (h, 0, . . . , 0), where h = gcd( f1 , . . . , fn ), so that computing
the Hermite normal form in this case is the same as computing a gcd. We will encounter
a different type of unimodular “Gaussian elimination” for R = Z in the basis reduction of
Chapter 16.
Finally, we may drop the requirement that the equations be linear and ask whether a
system of polynomial equations in several variables with coefficients in R has a solution,
and if so, look for an explicit description of the set of all solutions. Hilbert’s tenth problem
(see page 587) asks to determine the solvability of a Diophantine equation for R = Z.
Against Hilbert’s intuition, the question turns out to be undecidable, in the sense of Turing.
(See for example Sipser (1997) for the background.) This was proved by Matiyasevich
(1970), who showed that any recursively enumerable set D ⊆ N can be represented as
for some n ∈ N and some polynomial f ∈ Z[x1 , . . . , xn ]. Thus D ≠ Ø if and only if there is
some s ∈ N^n such that f (s) = 0. By Lagrange’s famous theorem, every nonnegative integer
can be written as a sum of four squares, and hence D ≠ Ø if and only if g(t) = 0 has an
integral solution t ∈ Z^{4n}, where

   g = f (y1² + y2² + y3² + y4², . . . , y_{4n−3}² + y_{4n−2}² + y_{4n−1}² + y_{4n}²) ∈ Z[y1 , . . . , y_{4n}].
that the continued fraction expansion is periodic for a root of an irreducible quadratic poly-
nomial with integer coefficients; Lagrange (1770a) proved the converse. By relating con-
tinued fractions to certain differential equations, Euler (1737) derived the expansion for e.
In one of his papers on continued fractions, Euler (1762/63) introduces explicit expressions
for our numbers si and ti and proves Lemma 3.8 (iv) and (v). Hurwitz (1891) showed the
quality 1/(√5 q²) of rational approximations, and the optimality of c = √5 in (7). Lagrange
(1798) had proven that any “best” rational approximation comes from a continued frac-
tion. Perron’s (1929) classic explains the rich and interesting theory of continued fractions
in detail; for further reading and references see also Knuth (1998), §4.5.3, and Bombieri &
van der Poorten (1995) give an amusing introduction.
Al-Khwārizmı̄ gives in his Algebra, written around 825, three values for π: 3 1/7 ≈ 3.1428,
√10 ≈ 3.16, and 62 832/20 000 = 3.1416. Already the Indian mathematician Āryabhat.a
(c. 530) had obtained the latter. The quote at the beginning of this chapter shows that
al-Khwārizmı̄ was well aware of the inexact quality of these approximations. The word
algorithm is derived from al-Khwārizmı̄’s name, indicating his family’s origin from the
town of Khwarezm, present-day Khiva in Uzbekistan (the old-fashioned “algorism” was
a better transliteration than the anagram of logarithm ). Algebra comes from the word
al-jabr in the title of his algebra book (al-kitāb al-mukhtaṣar fī ḥisāb al-jabr wa-l-muqābala = The concise book on computing by moving and reducing terms). The word jabara means “to break” and refers to the technique of moving
terms of an equation to the other side so that all resulting terms are positive; he did not allow
negative terms. (The Spanish word algebrista designates both someone who does algebra
or who sets broken bones, as in Cervantes (1615), chapter XV, where one of Don Quixote’s
companions is healed by an algebrista.) Muqābala is probably the “reduction” of
equations by subtracting equal quantities on both sides, and these are his two techniques for
solving linear and quadratic equations, the topic of the first part of his book. The influence
of al-Khwārizmı̄’s work on Arab, and later medieval European, mathematics was profound.
Al-Kāshı̄ was chief astronomer at Ulugh Beg’s court in Samarkand. His Instructions
on the circle’s circumference, written around 1424 (see Luckey 1953), is a remarkable
achievement. He presents his calculation of π with great concern about error control,
Newton iteration for square roots with just the required precision, and a conversion of his
hexagesimal result to decimal notation.
Euler (1736a) introduced the symbols π (§638) and e (§171); they became popular
with his Introductio in analysin infinitorum (Euler 1748a), but Gauß (1866) still used a
different notation. π had been used by Jones (1706) and by Christian Goldbach in 1742.
Ludolph van Ceulen (1540–1610) published 20 digits of π in 1596, and his tombstone in
the Pieterskerk in Leiden, Holland, recorded 35 digits. It was lost in the 19th century,
and on 5 July 2000 a reconstruction was ceremoniously installed. Still, tombstones have
not caught on as publication medium. Shanks had calculated 527 digits of π by February
1853, all correct. In March and April 1853, he extended this to 607 digits, incorrectly.
The 2011 record by Yee & Kondo took about one year on a home-brew desktop computer
with about 60 TB of disk space. This impressive achievement is largely due to ingenious
algorithms devised by the Chudnovsky brothers, the Borwein brothers, Richard Brent, and
their collaborators. Hilbert’s (1893) elegant paper proves the transcendence of e and π
on four pages. Berggren, Borwein & Borwein (1997) present a magnificent collection of
writings about π. It is a must for π gourmets, and we used their material liberally.
There are analogs of continued fractions in the polynomial case. If F is a field, then an
element α of F((x−1 )) or F((x)), the field of formal Laurent series in x−1 or x, respectively,
may be represented by an infinite continued fraction whose initial segments converge to α
with regard to the degree valuation and the x-adic valuation (Section 9.6), respectively.
This is discussed in Exercises 4.29 and 4.30.
4.7. Euler (1737) uses a year of 365d 5h 49′ 8′′, which is 22.8 seconds longer than our value, to calculate several calendars, including the Julian and Gregorian ones, and 365 8/33.
Lagrange (1798), §20, finds several rational approximations to the length of the year, our
four among them, via continued fractions based on a year of 365d 5h 48′ 49′′ , 3.8 seconds
longer than our assumption. He ends his calculations by admonishing the astronomers to
do their homework: comme les Astronomes sont encore partagés sur la véritable longueur de l’année, nous nous abstiendrons de prononcer sur ce sujet (since the astronomers have not yet agreed on the true length of the year, we will refrain from making a recommendation on this subject). We follow Lagrange’s
modesty.
4.8. Drobisch (1855) first used continued fractions to approximate log(3/2) by rational
numbers in order to divide the octave into equal parts. Ternary continued fractions, studied
by Jacobi (1868), can be used to approximate two irrational quantities simultaneously,
and Barbour (1948) applied them to the tuning problem by approximating log(3/2) and
log(5/4).
Actually, perfectly well-tempered instruments are quite rare. In a piano, the thick strings
for low notes produce anharmonic overtones (“harmonics”). Octaves usually have a ratio
slightly higher than 2 : 1, and on violins a fourth in one octave may have a ratio different
from the same fourth’s ratio in another octave.
Exercises.
4.1 Suppose that on a 64-bit processor you are given a single precision number p with 2^63 < p < 2^64,
and the words making up a positive multiprecision integer a, say of n words. Give an algorithm that
computes a rem p in O(n) word operations. You may assume that the processor has a double-by-
single-precision division instruction, which takes as input three single precision integers a0 , a1 , p
such that a1 < p and returns single precision integers q, r with a1 · 2^64 + a0 = qp + r and r < p. Hint:
You have to take care of the leading bit of a.
4.2 Suppose that the two data bases to be compared, as in Section 4.1, are up to 10 GB long and
actually different, and that we use single-precision primes p with 2^63 < p < 2^64. There are at least 10^17 such primes (Exercise 18.18).
(i) Modulo how many primes can they agree at most?
(ii) If we choose our prime p at random, what is the probability that our test gives an incorrect
“ok”?
4.3∗ You are to apply the fingerprinting technique to string matching . Given are two strings x =
x0 x1 · · ·xm−2 xm−1 and y = y0 y1 · · ·yn−2 yn−1 , say consisting of symbols xi , yi ∈ {0, 1} for all i, of
lengths m < n, respectively. We want to determine whether x occurs as a substring of y. Let zi =
yi yi+1 · · ·yi+m−1 be the substring of length m of y starting at position i, for 0 ≤ i < n − m. Thus the
task is to determine whether x = zi for some i.
(i) Describe a simple algorithm that uses O(mn) symbol comparisons.
(ii) Let a = ∑_{0≤j<m} x_j 2^j and b_i = ∑_{0≤j<m} y_{i+j} 2^j be the integers whose binary representation (with most significant bit right) is x and z_i, respectively, and 2^63 < p < 2^64 a single precision prime. Give
an algorithm that computes all bi rem p and compares them to a rem p in O(n) word operations.
(iii) Any match is certainly found by your algorithm. If m ≤ 63k, i < n − m, and p was chosen at
random among the at least 10^17 single precision primes (Exercise 18.18), what is the probability that x ≠ z_i and yet a ≡ b_i mod p (in terms of k)? What is the probability that some such false match is
reported, in terms of k and n? For which k and n is the latter probability below 0.1%?
4.4 Prove that an integer a = ∑_{0≤i≤l} a_i · 10^i ∈ N is divisible by 11 if and only if the alternating sum
a0 − a1 + a2 − a3 ± · · · + (−1)l al of its decimal digits is.
4.5 Show that for any integer m, congruence mod m is an equivalence relation on Z, and prove (1).
4.6 Let m ∈ N_{≥1} and f ∈ Zm[x] be monic of degree n. Show that the residue class ring Zm[x]/⟨f⟩ has m^n elements.
4.7 Is there a b ∈ Z such that 6b ≡ 1 mod 81?
4.8 (i) Let a ∈ N be such that 0 ≤ a < 1000 and the three least significant digits in the decimal
representation of 17a are 001. What is a?
(ii) Same question when the least significant digits are 209.
4.9 Let f = x⁴ + x³ + 2x² + x + 1, g1 = x, and g2 = x³ + x in Q[x]. Compute polynomials t1, t2 ∈ Q[x] such that ti gi ≡ 1 mod f for i = 1, 2, if they exist. Is Q[x]/⟨f⟩ a field?
4.10 Show that the polynomial f = x³ + x + 1 ∈ F2[x] is irreducible, and compute the inverses of all nonzero elements in F8 = F2[x]/⟨f⟩ using the Extended Euclidean Algorithm.
4.11 Let g = x⁵ + x + 1 ∈ F2[x]. For each of the two polynomials
(i) f = x³ + x + 1, (ii) f = x³ + 1
in F2[x], do the following. If f mod g is a unit in F2[x]/⟨g⟩, compute its inverse h mod g. If f mod g is a zero divisor, find a polynomial h ∈ F2[x] of degree less than 5 such that f h ≡ 0 mod g.
4.12 Prove carefully that R[x]/⟨x² + 1⟩ and C are isomorphic fields.
4.13 (i) Find a polynomial f ∈ F7[x] of degree less than 4 solving the congruence (x² − 1) · f ≡ x³ + 2x + 5 mod x⁴ + 2x² + 1 in F7[x].
(ii) Show that the residue class ring F343 = F7[x]/⟨x³ + x + 1⟩ is a field, and compute the inverse of x² mod x³ + x + 1 in F343.
4.14 (i) Let R be a Euclidean domain and m, f ∈ R. Show that f mod m is a zero divisor (see page 227) in R/⟨m⟩ if and only if gcd(f, m) ≠ 1 if and only if f mod m is not invertible in R/⟨m⟩.
(ii) Give an example of a ring containing nonzero elements that are neither units nor zero divisors.
4.15 Let R be a Euclidean domain and a, b, c ∈ R.
(i) Show that the congruence ax ≡ b mod c has a solution x ∈ R if and only if g = gcd(a, c) di-
vides b. Prove that in the latter case, the congruence is equivalent to (a/g)x ≡ b/g mod (c/g).
(ii) For R = Z and a = 5, 6, 7, determine whether the congruence ax ≡ 9 mod 15 is solvable, and
if so, give all solutions x ∈ {0, . . ., 14}.
4.16∗ The degree sequence of a pair ( f , g) ∈ (F[x] \ {0})2 of nonzero polynomials over a field F is
(deg r0 , deg r1 , . . ., deg rℓ ) ∈ N ℓ+1 , where r0 , r1 , . . ., rℓ are the remainders in the Euclidean Algorithm
for f and g. How many pairs of polynomials ( f , g) ∈ (Fq [x] \ {0})2 over the finite field Fq with
q elements have degree sequence (4, 3, 1, 0)? Generalize your answer for arbitrary given degree
sequences (n0 , n1 , . . ., nℓ ) ∈ N ℓ+1 with n0 ≥ n1 > · · · > nℓ ≥ 0 for ℓ ≥ 1. Hint: Use Exercise 3.18.
For all possible degree sequences with n0 = 3 and n1 = 2, list the corresponding pairs of polynomials
in (F2 [x] \ {0})2 .
4.17∗ This continues Exercise 4.16. Let Fq be a finite field with q elements and n, m ∈ Z with
n ≥ m ≥ 0.
(i) For two disjoint subsets S, T ⊆ {0, . . ., m − 1}, let pS,T denote the probability that no degree in
S and all degrees in T occur in the remainder sequence of the Euclidean Algorithm for two random
polynomials in Fq [x] of degrees n and m, respectively. Prove that pS,T = q−#S (1 − q−1 )#T .
(ii) For 0 ≤ i < m, let Xi denote the random variable that has Xi = 1 if i occurs in the degree
sequence of the Euclidean Algorithm for two random polynomials in Fq [x] of degrees n and m,
respectively, and Xi = 0 otherwise. Show that X0 , . . ., Xm−1 are independent and prob(Xi = 0) = 1/q
for all i.
4.18∗ Let q be a prime power and n, m ∈ N with n ≥ m > 0. Use Exercise 4.17 to prove the following
statements.
(i) The probability that two random polynomials of degree n and m, respectively, in Fq [x] are
coprime is 1 − 1/q.
(ii) The probability that n2 = n1 − 1 is 1 − 1/q.
(iii) The probability that the degree sequence is normal, that is, ℓ = m + 1 and ni+1 = ni − 1 for
1 ≤ i < ℓ, is (1 − 1/q)m ≥ 1 − m/q.
4.19 Let Fq be a finite field with q elements, f ∈ Fq [x] of degree n > 0, and R = Fq [x]/h f i the
residue class ring modulo f . Then R× , the set of elements of R that have a multiplicative inverse, is
a multiplicative group, and Theorem 4.1 implies that R× = {g mod f : gcd( f , g) = 1}. We denote its
cardinality by Φ( f ) = #R× = #{g ∈ Fq [x]: degg < n and gcd( f , g) = 1}.
(i) Prove that Φ( f ) = qn − 1 if f is irreducible.
(ii) Show that Φ( f ) = (qd − 1)qn−d if f is a power of an irreducible polynomial of degree d.
4.20 Devise a recursive variant of the repeated squaring algorithm 4.8, and also an iterative variant
which proceeds from the low order to the high order bits of the binary representation of n. Trace all
three algorithms on the computation of a45 .
4.21∗ Give a “repeated fourth powering” algorithm that uses 2⌊log4 n⌋ squarings and w4 (n) + 1
ordinary multiplications, where w4 (n) is the number of nonzero digits in the 4-ary representation
of n. Trace your algorithm on the computation of a45 . Generalize your algorithm to a “repeated 2k th
powering” algorithm for k ∈ N>0 .
4.22 Compute 15−1 mod 19 via Euclid and via Fermat.
4.23 Derive (a + b) p ≡ a p + b p mod p for all a, b ∈ Z and prime p from Fermat’s little theorem.
4.24∗ (i) Let R be a Euclidean domain, f1 , . . ., fn ∈ R, and h = gcd( f1 , . . ., fn ). Prove that there
exist s1 , . . ., sn ∈ R such that s1 f1 + · · · + sn fn = h.
(ii) Prove Theorem 4.11.
(iii) Let l = lcm( f1 , . . ., fn ). Show that if R = F[x] for a field F, h 6= 0, and deg a < deg l, then
there exist s1 , . . ., sn ∈ R solving (6) such that deg si < deg l − deg fi .
4.25 Compute integral solutions of the linear Diophantine equations 24s + 33t = 9 and 6s1 + 10s2 +
15s3 = 7.
4.26 (i) Expand the rational fractions 14/3 and 3/14 into finite continued fractions.
(ii) Convert [2, 1, 4] and [0, 1, 1, 100] into rational numbers.
4.27 Expand each of the following as infinite continued fractions: √2, √2 − 1, √2/2, √5, √7.
4.28 Let R be a Euclidean domain and q1 , . . ., qℓ ∈ R \ 0. Show that
    [q1, …, qi] = c_{i+1}(q1, …, qi) / c_{i+1}(0, q2, …, qi)
for 1 ≤ i ≤ ℓ, where ci is the ith continuant polynomial (Exercise 3.20).
4.29∗∗ This exercise assumes familiarity with valuations and formal Laurent series. Let F be a field.
The field F((x^{−1})) of formal Laurent series in x^{−1} consists of expressions of the form
    g = ∑_{−∞<j≤m} g_j x^j,   g_m, g_{m−1}, … ∈ F
for some m ∈ Z. We set deg g = max{ j ≤ m: g j 6= 0}, with the convention that deg 0 = −∞. This
degree function has the usual properties, as the degree of polynomials. In fact, the field F(x) of
rational functions is a subfield of F((x−1 )), and we have deg(a/b) = deg a − deg b for a, b ∈ F[x].
For a Laurent series g, we obtain the continued fraction [q1 , q2 , . . .] of g as follows. Set α1 = g,
and recursively define qi = ⌊αi ⌋ ∈ F[x] and αi+1 = 1/(αi − qi ) for i ∈ N>0 . Here, ⌊·⌋ extracts the
polynomial part, so that deg(αi − qi ) < 0.
(i) Show that deg qi = deg αi for all i ∈ N>0 and deg αi > 0 if i ≥ 2.
(ii) Prove that the continued fraction of a rational function r0 /r1 ∈ F((x−1 )), with nonzero r0 , r1 ∈
F[x], is finite, and that the qi are the quotients in the traditional Euclidean Algorithm for r0 , r1 .
(iii) Let s0 = t1 = 1, s1 = t0 = 0, and si+1 = si−1 − qi si , ti+1 = ti−1 − qiti for i ≥ 1, as in the
traditional Extended Euclidean Algorithm. Prove that the ith convergent ci = [q1 , . . ., qi−1 ] of g is
ci = −ti /si , for all i ≥ 2.
(iv) Show that g = −(ti−1 − αi ti )/(si−1 − αi si ), and conclude that deg(g − ci ) < −2 deg si for all
i ≥ 2. Thus if |h| = 2deg h is the degree valuation of a Laurent series h, then we obtain the analog
|g + ti /si | < |si |−2 of (7).
(v) Now let i ∈ N≥2 , k ≥ n = deg si , r0 = ⌊xn+k g⌋, r1 = xn+k , and ri = si r0 + ti r1 . Conclude
from (iv) that deg ri < k, and show that ri /si ≡ r0 mod xn+k if x ∤ si . (In fact, Lemma 11.3 implies
that q1 , . . ., qi−1 are the first i − 1 quotients and ri is the ith remainder in the traditional Euclidean
Algorithm for r0 , r1 .)
4.30∗∗ This exercise is an analog of Exercise 4.29, now for Laurent series in x rather than in x−1 .
Let F be a field. The field F((x)) of formal Laurent series in x consists of expressions of the form
    g = ∑_{m≤j<∞} g_j x^j,   g_m, g_{m+1}, … ∈ F
for some m ∈ Z. We let v(g) = min{ j ≥ m: g j 6= 0}, with the convention that v(0) = ∞.
For a Laurent series g, we obtain the continued fraction [q1 , q2 , . . .] of g as follows. Set α1 = g,
and recursively define qi = ⌊αi ⌋ ∈ F[1/x] and αi+1 = 1/(αi − qi ) for i ∈ N>0 . Here, ⌊·⌋ extracts the
part which is polynomial in 1/x, so that v(αi − qi ) > 0, or equivalently, x | (αi − qi ).
(i) Prove that v( f g) = v( f ) + v(g), v(1/g) = −v(g) if g 6= 0, and v( f + g) ≥ min{v( f ), v(g)}, with
equality if v( f ) 6= v(g), hold for all f , g ∈ F((x)).
(ii) Let s0 = t1 = 1, s1 = t0 = 0, and si+1 = si−1 − qi si , ti+1 = ti−1 − qiti for i ≥ 1, as in the
traditional Extended Euclidean Algorithm. Then the si ,ti are polynomials in 1/x. Prove that the ith
convergent ci = [q1 , . . ., qi−1 ] of g is ci = −ti /si , for all i ≥ 2.
(iii) Show that g = −(ti−1 − αi ti )/(si−1 − αi si ), and conclude that v(g − ci ) > −2v(si ) for all
i ≥ 2. Thus if |h| = 2−v(h) is the x-adic valuation of a Laurent series h, then we obtain the analog
|g + ti /si | < |si |−2 of (7).
(iv) Now assume that g ∈ F[[x]] is a power series, let i ∈ N≥2 , and n = −v(si ) ∈ N. Prove that xn si
and xn ti are polynomials of degree at most n, and conclude that there exist polynomials s,t ∈ F[x] of
degree not more than n such that x ∤ s and t/s ≡ g mod x2n+1 .
4.31 Prove that there do not exist integers d, n such that 3/2 = 2^{d/n}.
4.32∗∗ (Sturm 1835) Let f ∈ R[x] have no multiple roots, so that gcd( f , f ′ ) = 1, and determine
f0 = f , f1 = f ′ , f2 , . . ., fℓ , q1 , . . ., qℓ ∈ R[x] similarly as in the traditional Euclidean Algorithm, but
according to the modified rule
qi = fi−1 quo fi , fi+1 = −( fi−1 rem fi )
for 1 ≤ i ≤ ℓ, with the convention that fℓ+1 = 0. The difference to the traditional Euclidean Algorithm
is the sign of fi+1 ; this corresponds to taking ρ0 = ρ1 = 1 and ρi = −1 for 2 ≤ i ≤ ℓ in the Extended
Euclidean Algorithm 3.14. The polynomials f0 , f1 , . . ., fℓ form the Sturm chain of f . For each
b ∈ R, let w(b) be the number of sign alternations in the sequence f0 (b), . . ., fℓ (b). Here, a sign
alternation occurs when either fi (b) < 0, fi+1 (b) ≥ 0 or fi (b) > 0, fi+1 (b) ≤ 0. Prove Sturm’s
theorem, which says that for all b, c ∈ R such that f (b) 6= 0 6= f (c) and b < c, the number of real
roots of f in the interval (b, c) is w(b) − w(c). Hint: It is sufficient to prove the theorem for intervals
containing at most one zero of all the fi ’s. Show that w does not change at a zero of some fi with
i > 0, but that w drops by one at a zero of f0 = f .
4.33 Let F be a field.
(i) Show that a polynomial f ∈ F[x] of degree 2 or 3 is irreducible if and only if it has no roots
in F.
(ii) For each of the two fields F = Q and F = F2 , find a polynomial of degree 4 that is reducible
and has no roots in F.
All is fair in war, love, and mathematics.
Eric Temple Bell (1937)
These [results] must not be taken on trust by the student, but must be
worked by his own pen, which must never be out of his hand while
engaged in any algebraical process.
Augustus De Morgan (1831)
1 When controversies arise, there will not be a greater dispute between two philosophers than between two com-
puters. It will be sufficient for them to take pen in hand, sit down with their calculators, and (having summoned
a friend, if they like) say to each other: Let us calculate.
5
Modular algorithms and interpolation
[Diagrams: the general scheme of a modular algorithm. A problem in R is mapped by modular reduction to a problem in R/⟨m⟩; instead of the direct computation in R, one performs a modular computation in R/⟨m⟩ and obtains the solution in R by reconstruction. In the small primes variant, the reduction goes from R to R/⟨p_1⟩, …, R/⟨p_r⟩, and the solution in R is reconstructed from the r modular solutions. In the prime power variant, one reduces from R to R/⟨p^l⟩ and further to R/⟨p⟩, computes modulo p, and returns to R/⟨p^l⟩ by lifting and to R by reconstruction.]
◦ The final result, for example the gcd of two polynomials in Z[x] of degree at most n, has coefficients not much larger than the input polyno-
mials, but the intermediate coefficients in Euclid’s algorithm may be longer by
a factor of about n than the inputs, and by a factor of about n2 for the traditional
Euclidean Algorithm. When computing in a modular fashion, we may choose
the moduli mi such that their product is only slightly larger than the final result,
and by reducing modulo the mi where possible, also the intermediate results in
the modular computation remain as “small” as the final result.
◦ The designer of a modular algorithm is free in her choice of the moduli, as long
as their product is large enough to recover the result. Thus she may choose
the moduli to be Fourier primes which support particularly fast polynomial
arithmetic; these will be discussed in Chapter 8.
◦ In nearly all tasks in computer algebra, the cost for solving a problem with
input size n is at least linear in n. For example, if we use the algorithms for
integer and polynomial arithmetic as described in Chapters 2 and 3, then one
arithmetic operation on integers of length n or on polynomials of degree n takes
O(n2 ) operations. In such cases, it is cheaper to solve r “small” problems with
inputs of size about n/r rather than one “big” problem. In the extreme case
n = r, the cost of the modular computation becomes just O(n), but this has to
be balanced against the cost for the change of representation.
◦ If the moduli mi in the small primes approach fit into one machine word of
the target processor, then the cost for an arithmetic operation modulo one mi
amounts to only a few machine cycles.
◦ The r subtasks modulo the distinct small primes are independent of each other
and can be performed in a distributed fashion using r processors or machines
in parallel.
As an aside, we note that as long as the direct computation uses only additions
and multiplications, but no divisions, an arbitrary modulus—or arbitrary pairwise
coprime moduli in the “small primes” variant—may be chosen in a modular algo-
rithm.
Besides the big prime method, we discuss in this chapter the theoretical under-
pinnings of the small primes modular algorithm, namely the Chinese Remainder
Algorithm, and two applications: secret sharing and computing the determinant.
We have to wait until Chapter 9 for the tools used in the third variant: Newton
iteration and Hensel lifting. The prime power approach will play a major role in
the polynomial factorization algorithm in Chapter 15. Table 15.5 on page 460 lists
eleven problems for which we will have learnt modular algorithms by then.
We also discuss applications of the Extended Euclidean Algorithm and the Chi-
nese Remainder Algorithm to various kinds of interpolation problems and to par-
tial fraction decomposition.
where m = (x−u0 )(x−u1 ) · · · (x−un−1 ), as basis for all polynomials of degree less
than n in the first case; see the Lagrange interpolation formula (3) below. Some
problems, such as multiplication, are quite easy in appropriate bases of this kind,
while others, like division with remainder, seem to require a representation of the
first type.
For each computational problem, one should examine whether this general-
purpose tool is of use. This involves two questions:
We discuss some fundamental tools for these questions in this book: the Chinese
Remainder Algorithm, the Extended Euclidean Algorithm, and Newton iteration
(including Hensel lifting).
It is important to realize the similarity between evaluating a polynomial at a
point u and taking the remainder of an integer modulo a prime p. The former is the
same as taking the remainder modulo x − u, and so the latter can be thought of as
“evaluating the integer at p”. The inverse operation of recovering the coefficients
of a polynomial from its values at several points is interpolation. For integers, this
is afforded by the Chinese Remainder Algorithm, and it is useful to understand this
as “interpolating an integer from its values at several primes”.
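As a quick illustration of this analogy, here is a small sketch in Python with SymPy (the polynomial x³ + 1, the point 2, and the modulus 7 are arbitrary choices for illustration, not taken from the text):

```python
from sympy import symbols, rem

x = symbols('x')
f = x**3 + 1

# Evaluating f at u = 2 is the same as taking the remainder of f modulo x - 2 ...
print(f.subs(x, 2), rem(f, x - 2, x))   # prints: 9 9

# ... just as reducing an integer modulo a prime "evaluates" the integer at that prime.
print(1234 % 7)                          # prints: 2
```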
Similar representations exist for rational functions and rational numbers. We
will later discuss conversion algorithms: Cauchy interpolation and Padé approxi-
mation for rational functions (Sections 5.8 and 5.9), and rational number recon-
struction (Section 5.10).
The proper choice of representation is vital when dealing with multivariate poly-
nomials. Four important possibilities—dense, sparse, by an arithmetic circuit, or
by a “black box”—are briefly discussed in Section 16.6.
Important applications of the general idea of change of representation are the fast multiplication algorithms in Chapter 8 (based on the FFT). In the three major
problems in Part V, namely Gröbner bases, integration, and summation, the basic
task can be interpreted as transforming a general input into a representation where
the problem at hand is fairly easy to solve.
Thus we can evaluate f at all points ui using 2n² − 2n operations in F. What about
interpolation?
The Lagrange interpolant
    li = ∏_{0≤j<n, j≠i} (x − uj)/(ui − uj) ∈ F[x]    (2)
[Figure 5.4: the Lagrange interpolants l0, …, l5 for the sample points 0, 1, …, 5.]
has the property that li(uj) is 0 if i ≠ j and 1 when i = j; see Figure 5.4. For arbitrary v0, …, v_{n−1} ∈ F,
    f = ∑_{0≤i<n} vi li = ∑_{0≤i<n} vi ∏_{0≤j<n, j≠i} (x − uj)/(ui − uj)    (3)
is a polynomial of degree less than n such that f (ui ) = vi for all i. The interpolating
polynomial with this degree constraint is unique, since the difference of two such
polynomials has degree less than n and n roots, hence is the zero polynomial.
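Formula (3) translates directly into a program. The following is a minimal sketch (not the algorithm behind Theorem 5.1 below; the helper names are ad hoc) that builds the interpolating polynomial over Q with exact rational arithmetic, representing polynomials by coefficient lists with the constant term first:

```python
from fractions import Fraction

def poly_mul(f, g):
    """Multiply polynomials given as coefficient lists (constant term first)."""
    h = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return h

def interpolate(points):
    """Coefficients of the polynomial of degree < n through the n points (u_i, v_i),
    built from the Lagrange interpolants as in formula (3)."""
    n = len(points)
    f = [Fraction(0)] * n
    for i, (ui, vi) in enumerate(points):
        li = [Fraction(1)]                 # l_i = prod_{j != i} (x - u_j)/(u_i - u_j)
        for j, (uj, _) in enumerate(points):
            if j != i:
                li = poly_mul(li, [Fraction(-uj, ui - uj), Fraction(1, ui - uj)])
        for d, c in enumerate(li):
            f[d] += vi * c
    return f

# the polynomial with f(0) = 1, f(1) = 2, f(2) = 4, here over Q
print(interpolate([(0, 1), (1, 2), (2, 4)]))
# [Fraction(1, 1), Fraction(1, 2), Fraction(1, 2)], i.e. f = 1 + x/2 + x^2/2
```

The sample values are those of Exercise 5.4, read over Q instead of F5.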
T HEOREM 5.1.
Evaluating a polynomial f ∈ F[x] of degree less than n at n distinct points u0 , . . . ,
un−1 ∈ F or computing an interpolating polynomial at these points can be per-
formed with O(n²) operations in F. More precisely, evaluation takes 2n² − 2n operations, and Lagrange interpolation uses 7n² − 7n operations.
P ROOF. It remains to prove the claim for interpolation. Let mi = x−ui for all i. We
first compute m0 m1 , m0 m1 m2 , . . . , m = m0 · · · mn−1 . This amounts to multiplying a
monic linear polynomial by a monic polynomial of degree i for 1 ≤ i < n, taking
    ∑_{1≤i<n} 2i = n² − n
χ( f ) = ( f mod 11, f mod 13) = (2 mod 11, 7 mod 13) ∈ Z11 × Z13 .
T HEOREM 5.2.
χ is surjective with kernel ⟨m⟩.
    = ∑_{0≤i<r} (0, …, 0, vi mod mi, 0, …, 0) = v.
P ROOF. Theorem 5.2 and the homomorphism theorem for rings (Section 25.2)
imply (6). For f ∈ R, we have
f is invertible modulo m ⇐⇒ gcd( f , m) = 1 ⇐⇒ gcd( f , mi ) = 1 for 0 ≤ i < r
⇐⇒ f is invertible modulo mi for 0 ≤ i < r,
and the second claim follows. ✷
The proof of Theorem 5.2 is constructive and yields the following algorithm.
1. m ←− m0 · · · mr−1
2. for i = 0, . . . , r − 1 do
compute m/mi
call the Extended Euclidean Algorithm 3.14 to compute si ,ti ∈ R with
          (m/mi) · si + ti · mi = 1
     ci ←− vi si rem mi
3. return ∑_{0≤i<r} ci · m/mi
EXAMPLE 5.5. (i) We let R = Z, mi = p_i^{e_i} for 0 ≤ i < r, where the pi ∈ N are distinct primes and ei ∈ N_{>0} for 0 ≤ i < r. Then
    m = ∏_{0≤i<r} p_i^{e_i},
and for arbitrary v0 , . . . , vr−1 ∈ Z the CRA computes a solution f ∈ Z of the system
of congruences
    f ≡ vi mod p_i^{e_i} for 0 ≤ i < r.
For example, we take r = 2, m0 = 11, m1 = 13, and m = 11 · 13 = 143, and find
for v0 = 2 and v1 = 7 an f ∈ Z with 0 ≤ f < m and f ≡ 2 mod 11, f ≡ 7 mod 13. In step 2, the Extended Euclidean Algorithm yields s0 = 6 and s1 = −7. The Lagrange interpolants l0 and l1 from Theorem 5.2, which
do not occur explicitly in the algorithm, are l0 = 6 · 13 = 78 and l1 = (−7) · 11 =
−77, and we check that in fact l0 ≡ 1 mod 11, l0 ≡ 0 mod 13, l1 ≡ 0 mod 11, and
l1 ≡ 1 mod 13. Now
c0 = v0 s0 rem m0 = 2 · 6 rem 11 = 1,
c1 = v1 s1 rem m1 = 7 · (−7) rem 13 = 3.
Finally, in step 3 we compute
    f = c0 · m/m0 + c1 · m/m1 = 1 · 13 + 3 · 11 = 46,
and indeed 46 = 4 · 11 + 2 = 3 · 13 + 7.
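Part (i) can be replayed in a few lines of Python, as a minimal sketch of Algorithm 5.4 for R = Z (the function names are ad hoc, and the extended gcd here may return a cofactor that differs from the si above by a multiple of mi, which does not change ci):

```python
def ext_gcd(a, b):
    """Return (g, s, t) with s*a + t*b = g = gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, s, t = ext_gcd(b, a % b)
    return g, t, s - (a // b) * t

def crt(moduli, values):
    """Chinese Remainder Algorithm 5.4 for pairwise coprime integer moduli."""
    m = 1
    for mi in moduli:                      # step 1: m = m_0 * ... * m_{r-1}
        m *= mi
    f = 0
    for mi, vi in zip(moduli, values):
        g, si, ti = ext_gcd(m // mi, mi)   # step 2: s_i*(m/m_i) + t_i*m_i = 1
        ci = (vi * si) % mi
        f += ci * (m // mi)                # step 3: sum of c_i * m/m_i
    return f % m

print(crt([11, 13], [2, 7]))   # 46, and indeed 46 = 4*11 + 2 = 3*13 + 7
```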
(ii) We let R = F[x] for a field F and mi = x − ui for 0 ≤ i < r, where u0 , . . . , ur−1
in F are pairwise distinct. Then f ≡ f (ui ) mod (x − ui ) for 0 ≤ i < r and arbitrary
f ∈ F[x], by Section 4.1, and hence the ring homomorphism
    χ: F[x] → F[x]/⟨x − u0⟩ × ⋯ × F[x]/⟨x − u_{r−1}⟩ ≅ F^r
    f ↦ (f(u0), …, f(u_{r−1}))
from Theorem 5.2 is just the evaluation homomorphism (4) at u0 , . . . , ur−1 . (The
ring F r consists of the r-tuples with entries from F, and the ring operations are
done coordinatewise.) Moreover, the li from the proof of Theorem 5.2 satisfying
li ≡ li (ui ) = 1 mod (x − ui ),
li ≡ li (u j ) = 0 mod (x − u j ) for j 6= i
and deg li < r are the Lagrange interpolants
    li = ∏_{0≤j<r, j≠i} (x − uj)/(ui − uj).
With R = Z and the mi as in (i) of the above example, we obtain the following
formula for Euler’s totient function from (2) of Section 4.2 and the Chinese Re-
mainder Theorem. Exercise 5.28 gives the corresponding formula for polynomials
over a finite field.
COROLLARY 5.6. If m = p_0^{e_0} ⋯ p_{r−1}^{e_{r−1}} with distinct primes p0, …, p_{r−1} ∈ N and e0, …, e_{r−1} ∈ N_{>0}, then
    ϕ(m) = (p0 − 1)p_0^{e_0−1} ⋯ (p_{r−1} − 1)p_{r−1}^{e_{r−1}−1} = m · ∏_{p|m, p prime} (1 − 1/p).
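A direct transcription of this formula, as a minimal sketch (the factorization is assumed to be given as a list of (prime, exponent) pairs):

```python
def euler_phi(factorization):
    """Euler's totient from the prime factorization [(p_0, e_0), ..., (p_{r-1}, e_{r-1})]."""
    phi = 1
    for p, e in factorization:
        phi *= (p - 1) * p ** (e - 1)
    return phi

print(euler_phi([(2, 2), (3, 1)]))   # phi(12) = 4
```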
T HEOREM 5.7.
Let R = F[x] for a field F , m0 , . . . , mr−1 , m ∈ R as in (5), di = deg mi ≥ 1 for 0 ≤
i < r, n = deg m = ∑0≤i<r di , and vi ∈ R with deg vi < di . Then the unique solution
f ∈ F[x] with deg f < n of the Chinese Remainder Problem
f ≡ vi mod mi for 0 ≤ i < r can be computed using O(n²) operations in F.
We fix i ∈ {0, . . . , r − 1} in step 2. The Extended Euclidean Algorithm with
input m/mi and mi takes O(di (n − di )) operations (Theorem 3.16). By the degree
formula for si (Lemma 3.15 (b)), we have deg si < deg mi = di , and hence the
multiplication of vi and si , together with the subsequent division with remainder
by mi , takes O(di2 ) operations. So we have O(di n) operations for each i, and O(n2 )
for step 2.
Finally, in step 3 we need O(di (n − di )) operations for the multiplication of ci
and m/mi for 0 ≤ i < r, and O(rn) for the addition of all the products (their degree
is strictly less than n). This gives a cost of O(n2 ) for step 3, and also a total cost of
O(n2 ) for the whole algorithm. ✷
The following is the integer analog of Theorem 5.7; see Exercise 5.29.
T HEOREM 5.8.
Let R = Z, m0 , . . . , mr−1 , m ∈ N as in (5), n = ⌊log2 m/64⌋ + 1 the word length
of m, and vi ∈ Z such that 0 ≤ vi < mi for 0 ≤ i < r. Then the unique solution
f ∈ Z with 0 ≤ f < m of the Chinese Remainder Problem f ≡ vi mod mi for 0 ≤ i < r can be computed using O(n²) word operations.
    ∗  ⋯  ∗   ∗
       ⋱  ⋮   ⋮
          ∗   ∗
    0  ⋯  0   a_kk^(k)  ⋯  a_kj^(k)  ⋯
                 ⋮            ⋮
    0  ⋯  0   a_ik^(k)  ⋯  a_ij^(k)  ⋯
                 ⋮            ⋮
The table represents the matrix after k − 1 pivoting stages, a “∗” denotes an arbi-
trary rational number, and the upper diagonal entries are nonzero. The diagonal
element a_kk^(k) ≠ 0 is the new pivot element, and the entries of the kth column below
the pivot element must be made zero in the kth stage by subtracting an appropriate
multiple of the kth row. The entries of the matrix for k < i ≤ n and k ≤ j ≤ n
change according to the formula
    a_ij^(k+1) = a_ij^(k) − (a_ik^(k) / a_kk^(k)) · a_kj^(k).    (8)
The entries a_ij^(1) = a_ij are the entries of the original matrix A. If bk is an upper bound for the absolute value of the numerators and denominators of all a_ij^(k) for 1 ≤ i, j ≤ n, so that in particular |a_ij| ≤ b1 for 1 ≤ i, j ≤ n, then the formula (8) gives
    bk ≤ 2 b_{k−1}^4 ≤ 2^{1+4} b_{k−2}^{4²} ≤ ⋯ ≤ 2^{1+4+⋯+4^{k−2}} b_1^{4^{k−1}} = 2^{(4^{k−1}−1)/3} b_1^{4^{k−1}},
which is an exponentially large upper bound in the input size n²λ(b1) ≈ n² log_{2^64} b1
(see Sections 2.1 and 6.1 concerning the length λ). At this point, we may won-
der whether Gaussian elimination indeed uses polynomial time, if we count word
operations. In fact, the length of the intermediate results and the number of word
operations for Gaussian elimination over Q are polynomial in the input size, but
the proof is nontrivial. We use an alternative approach to reach the same goal,
a polynomial time algorithm for computing det A. This illustrates modular compu-
tation in a simple case, and introduces some tools of more general interest.
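The contrast between the crude worst-case bound above and the actual behaviour is easy to observe. The following minimal sketch (the 8 × 8 matrix with single-digit random entries is an arbitrary choice) performs exact Gaussian elimination with Python's Fraction, which keeps every entry in lowest terms, and prints the largest numerator or denominator bit length after each stage; the observed growth stays moderate, in line with the polynomial bound of Edmonds and Bareiss mentioned in the Notes:

```python
from fractions import Fraction
import random

def max_bits(A):
    """Largest bit length among all numerators and denominators in A."""
    return max(max(abs(a.numerator).bit_length(), a.denominator.bit_length())
               for row in A for a in row)

n = 8
random.seed(1)
A = [[Fraction(random.randint(-9, 9)) for _ in range(n)] for _ in range(n)]

for k in range(n - 1):
    # simple pivot search: swap in a row with a nonzero entry in column k
    pivot = next(i for i in range(k, n) if A[i][k] != 0)
    A[k], A[pivot] = A[pivot], A[k]
    for i in range(k + 1, n):
        factor = A[i][k] / A[k][k]
        for j in range(k, n):
            A[i][j] -= factor * A[k][j]     # formula (8)
    print(f"after stage {k + 1}: max {max_bits(A)} bits per numerator/denominator")
```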
The simplest way of obtaining a polynomial-time computation for the determi-
nant d = det A of a matrix A ∈ Z n×n is to choose a prime p which is guaranteed to
be bigger than 2|d|, perform Gaussian elimination on A mod p ∈ Z pn×n to calculate
d mod p, and represent this value in the “symmetric” system
    −(p − 1)/2, …, (p − 1)/2    (9)
of representatives (if p is odd; see Section 4.1). If r ∈ Z is this representative, then
of representatives (if p is odd; see Section 4.1). If r ∈ Z is this representative, then
    r ≡ d mod p,   −p/2 < r < p/2.
The congruence holds since any polynomial expression like the determinant com-
mutes with the canonical homomorphism Z −→ Z p (Section 25.3); so that the
determinant, taken modulo p, of A equals the determinant in Z p of the matrix
(A mod p) ∈ Z pn×n whose entries are those of A, taken modulo p. It follows that p
divides d − r,
    |d − r| ≤ |d| + |r| < p/2 + p/2 = p,
and hence d = r.
The word length λ(C) of the bound C = n^{n/2} B^n on |det A| is about (1/64) log₂ C = (1/64) n((1/2) log₂ n + log₂ B), and thus polynomial in the input size n²λ(B), and we will
see in Section 18.4 that a prime p between, say, 2C and 4C can be found easily
with a probabilistic polynomial-time algorithm. Then arithmetic modulo p can
be performed in polynomial time, in fact, with O(log² C) word operations. All
entries of A are less than p in absolute value, and nothing happens computationally
in reducing them modulo p. Thus the cost of the algorithm is O(n3 ) operations
modulo p, which shows that the determinant of an integer matrix can be computed
with
    O(n³ · n²(log n + log B)²)  or  O∼(n⁵ log² B)    (11)
word operations, where the O∼ notation ignores logarithmic factors (Section 25.7).
This is polynomial time, but not cubic (in n)! Using the fast integer arithmetic of
Part II, the running time can be reduced to O∼ (n4 log B); this is softly quadratic
in the input size. Well, should we call the running time of Gaussian elimination
quadratic, cubic, quartic, or quintic?
The algorithm indicated above is not much progress over Gaussian elimination
in Q, except that we could easily prove that it works in polynomial time. The
really big idea, however, is not to compute with a single modulus, but with several
moduli at a time: small primes modular computation. These primes can then be
chosen very small, of only logarithmic length, and the main cost of the resulting
algorithm are many small Gaussian eliminations, which can be performed in a
parallel or even distributed fashion. This method is much more efficient.
3. for i = 0, . . . , r − 1 do
compute di ∈ {0, . . . , mi −1} such that di ≡ det A mod mi using Gauss-
ian elimination over Zmi
4. call the Chinese Remainder Algorithm 5.4 to determine d ∈ Z of least abso-
lute value with d ≡ di mod mi for 0 ≤ i < r
5. return d
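Steps 3 and 4 are easy to sketch in Python (a minimal illustration, not the implementation analyzed below: the four primes are an arbitrary choice whose product exceeds 2|det A| for this particular matrix, and the reconstruction uses incremental Chinese remaindering rather than a direct call to Algorithm 5.4, with the same result):

```python
def det_mod_p(A, p):
    """Determinant of A modulo the prime p, by Gaussian elimination over Z_p."""
    A = [[a % p for a in row] for row in A]
    n, d = len(A), 1
    for k in range(n):
        pivot = next((i for i in range(k, n) if A[i][k]), None)
        if pivot is None:
            return 0
        if pivot != k:
            A[k], A[pivot] = A[pivot], A[k]
            d = -d                                  # a row swap flips the sign
        d = d * A[k][k] % p
        inv = pow(A[k][k], -1, p)
        for i in range(k + 1, n):
            factor = A[i][k] * inv % p
            for j in range(k, n):
                A[i][j] = (A[i][j] - factor * A[k][j]) % p
    return d % p

def small_primes_det(A, primes):
    """Steps 3 and 4: determinants modulo each prime, then symmetric reconstruction."""
    m, d = 1, 0
    for p in primes:
        dp = det_mod_p(A, p)
        t = (dp - d) * pow(m, -1, p) % p    # adjust d so that d = dp mod p as well
        d, m = d + t * m, m * p
    return d if d <= m // 2 else d - m      # representative of least absolute value

A = [[3, 1, 4], [1, 5, 9], [2, 6, 5]]
primes = [101, 103, 107, 109]               # product must exceed 2*|det A|
print(small_primes_det(A, primes))          # -90
```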
E XAMPLE 5.11. We take the first four prime numbers as moduli and get
For the cost analysis, Theorem 18.10 says that we can calculate the first r primes using O(r log² r loglog r) word operations, and that log mi ∈ O(log r) for all i. (Actually, somewhat fewer than r primes are sufficient, see Exercise 18.21.) Thus log m = ∑_{0≤i<r} log mi ∈ O(r log r). A single arithmetic operation modulo mi can be done with O(log² mi) or O(log² r) word operations, and hence the total cost of all Gaussian eliminations in step 3 is O(n³ r log² r) word operations. The reduction of an entry of A modulo one modulus mi takes O(λ(B)λ(mi)) or O(log B · log r) word operations, by Section 2.4. Therefore the reduction of all entries of A modulo m0, …, m_{r−1} in step 2 takes O(n² log B · r log r) word operations. The cost for step 4 is O(r² log² r) word operations, by Theorem 5.8, and dominates the cost of
step 1. The fact that r ∈ O(n log(nB)) leads to the following theorem, which says
that the small primes approach is faster by about two orders of magnitude than the
big prime algorithm.
T HEOREM 5.12.
The determinant of a matrix A ∈ Z n×n with all entries less than B in absolute value
can be computed deterministically with
word operations.
In practice, one would precompute and store not the first r primes but r single
precision primes close to the word size of the processor (say between 2^63 and 2^64 − 1 if the word size is 64). Exercise 18.18 shows that there are sufficiently many
such single precision primes for all practical purposes. Then one operation modulo
an mi takes constant time, and the total cost is O(n³ r), plus O(n² log B · r + r²) or O(n r²) for the initial modular reduction and the CRA, where r is about λ(2C) or O(n log(nB)). In contrast, the cost of the big prime variant is about O(n³ r²) word
operations.
Similarly to the integer case, a modular algorithm for computing determinants
of matrices with entries in F[x], where F is a field, can be designed. If the field is
large enough, then this is even easier than the integer case (Exercise 5.32).
E XAMPLE 5.13. We look for a polynomial f ∈ Q[x] of degree less than 4 such
that
f (0) = 0, f ′ (0) = 1, f (1) = 1, f ′ (1) = 0, (15)
Thus the initial segments of the Taylor expansions of f at x = 0 and x = 1 are
v0 = f (0) + f ′ (0)x = x and v1 = f (1) + f ′ (1)(x − 1) = 1, respectively, and the
conditions (15) are equivalent to the congruences
    f ≡ x mod x²,  f ≡ 1 mod (x − 1)².
Here our moduli are m0 = x² and m1 = (x − 1)², and the Extended Euclidean Algorithm finds that (−2x + 3)x² + (2x + 1)(x − 1)² = 1. Thus s0 = 2x + 1 and
s1 = −2x + 3 in step 2 of the Chinese Remainder Algorithm 5.4,
    c0 = v0 s0 rem m0 = x · (2x + 1) rem x² = x,
    c1 = v1 s1 rem m1 = 1 · (−2x + 3) rem (x − 1)² = −2x + 3,
and finally
    f = c0 · m/m0 + c1 · m/m1 = x(x − 1)² + (−2x + 3)x² = −x³ + x² + x.
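The computation of this example can be replayed with a few lines of SymPy (a quick check, not an implementation of the general algorithm; gcdex returns s, t, h with s·f + t·g = h = gcd(f, g)):

```python
from sympy import symbols, gcdex, rem, expand

x = symbols('x')
m0, m1 = x**2, (x - 1)**2        # moduli encoding value and derivative at 0 and at 1
v0, v1 = x, 1                    # truncated Taylor expansions of f at 0 and at 1

s0, _, _ = gcdex(m1, m0, x)      # s0 = inverse of m/m0 = m1 modulo m0, here 2x + 1
s1, _, _ = gcdex(m0, m1, x)      # s1 = inverse of m/m1 = m0 modulo m1, here -2x + 3
c0 = rem(v0 * s0, m0, x)
c1 = rem(v1 * s1, m1, x)

print(expand(c0 * m1 + c1 * m0))   # -x**3 + x**2 + x
```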
C OROLLARY 5.14.
The Hermite interpolation problem (14) can be solved using O(n2 ) arithmetic op-
erations in F .
Now this is a strictly weaker condition, and we will see that it can always be
satisfied, but that there are (exceptional) cases where (16) has no solution.
The following lemma is in a sense the converse of Lemma 3.15 (b), which states
that the si ,ti in the Extended Euclidean Algorithm have small degrees. It says that
any linear combination r = s f + tg of f and g, where f , g, r, s,t ∈ F[x] and the
degrees of r, s,t are “small”, is a multiple of some row r j = s j f + t j g in the EEA.
L EMMA 5.15 (Uniqueness of the EEA entries). Let F be a field, f , g, r, s,t ∈ F[x]
with deg f = n, r = s f + tg and t ≠ 0, and suppose that
r = αr j , s = αs j , t = αt j .
P ROOF. First, we claim that s j t = st j . Suppose that the claim is false, and consider
the equation
    ( sj  tj ) ( f )   ( rj )
    ( s   t  ) ( g ) = ( r  ).
The coefficient matrix is nonsingular, and we can solve for f in F(x) using Cra-
mer’s rule (Theorem 25.6), obtaining
    f = det( rj  tj ; r  t ) / det( sj  tj ; s  t ).    (19)
The degree of the left hand side of (19) is n, while
by Lemma 3.15 (b) and (18), and the degree of the right hand side of (19) is strictly
less than n. Thus we have a contradiction, proving the claim.
Now, Lemma 3.15 (v) implies that s j and t j are relatively prime, and from the
claim we have t j | s j t, so that t j | t. We write t = αt j , where α ∈ F[x], and α 6= 0
since t 6= 0. Then we have st j = s j t = αs j t j , and cancelling t j , we obtain s = αs j .
Finally,
r = s f + tg = α(s j f + t j g) = αr j . ✷
We now show that (17) can be solved by means of the Extended Euclidean Al-
gorithm. We say that a rational function r/t ∈ F(x), with r,t ∈ F[x], is in canonical
form if t is monic and gcd(r,t) = 1. Every rational function has a unique canonical
form.
T HEOREM 5.16.
Let m ∈ F[x] of degree n > 0 and g ∈ F[x] of degree less than n. Furthermore, let
r j , s j ,t j ∈ F[x] be the jth row in the Extended Euclidean Algorithm for m, g, where
j is minimal such that deg r j < k.
(i) There exist polynomials r,t ∈ F[x] satisfying (17), namely r = r j and t = t j .
If in addition gcd(r j ,t j ) = 1, then r and t also solve (16).
(ii) If r/t ∈ F(x) is a canonical form solution to (16), then r = τ −1 r j and t =
τ −1t j , where τ = lc(t j ) ∈ F \ {0}. In particular, (16) is solvable if and only
if gcd(r j ,t j ) = 1.
C OROLLARY 5.17.
There is an algorithm which decides whether (16) is solvable, and if so, computes
its unique solution using O(n2 ) operations in F .
In the next two sections, several examples will illustrate Theorem 5.16. We will
combine the Chinese Remainder Algorithm for polynomials with rational function
reconstruction to solve various interpolation problems, depicted in Table 5.5. The
problems will be precisely specified below. When the input consists of n items,
we also have a degree constraint on the output which leaves n choices. Then the
polynomial problems always have a solution, and the rational ones typically do,
but not always. The polynomial output in the second and the fourth row is trivially
just the input. The next-to-last column in Table 5.5 can be considered as a special
case of the last column. The third row is the “least common generalization” of the
first two rows and a special case of the last row. The solution of these problems
proceeds in two steps.
◦ First a polynomial solution is computed, by the Chinese Remainder Algorithm.
This was done in the preceding sections.
TABLE 5.5.
input                                  | moduli                              | polynomial output              | rational function output
several values                         | mi = x − ui, ui distinct            | polynomial interpolation, §5.2 | Cauchy interpolation, §5.8
Taylor expansion around 0              | m = x^n                             | (trivial)                      | Padé approximation, §5.9
Taylor expansions around several ui    | mi = (x − ui)^{e_i}, ui distinct    | Hermite interpolation, §5.6    | rational Hermite interpolation, Exercises 5.42, 5.43
remainder mod m                        | m arbitrary                         | (trivial)                      | rational function reconstruction, §5.7
remainders modulo several mi           | mi arbitrary, pairwise coprime      | CRA, §5.4                      | rational CRA, Exercise 5.42
◦ Then the required rational solution is calculated, via the Extended Euclidean
Algorithm for the polynomial solution and a problem-specific modulus.
Like polynomial interpolation, Cauchy and Hermite interpolation and Padé ap-
proximation are well-studied problems in numerical analysis. The various desig-
nations illustrate the power of our general approach: people had found each of
these problems interesting and studied them, and only in hindsight can we classify
them as special instances of the single general task “rational CRA”.
Now for any i, r(ui ) = t(ui )vi = t(ui )g(ui ) if and only if r ≡ tg mod (x −ui ), and by
the Chinese Remainder Theorem, Corollary 5.3, (21) is in turn equivalent to (17)
with m = (x − u0 ) · · · (x − un−1 ). The following consequence of Theorem 5.16 on
rational function reconstruction gives a complete answer on existence and unique-
ness of a solution to (20).
C OROLLARY 5.18.
Let F be a field, u0 , . . . , un−1 ∈ F be distinct, v0 , . . . , vn−1 ∈ F , g ∈ F[x] of degree
less than n with g(ui ) = vi for all i, and k ∈ {0, . . . , n}. Furthermore, let r j , s j ,t j ∈
F[x] be the jth row in the Extended Euclidean Algorithm for the polynomials m =
(x − u0 ) · · · (x − un−1 ) and g, where j is minimal such that deg r j < k.
(i) There exist polynomials r,t ∈ F[x] satisfying (21), namely r = r j and t = t j .
If in addition gcd(r j ,t j ) = 1, then r and t also solve (20).
E XAMPLE 5.19. (i) Let F = F5 , and suppose that we want to compute a rational
function ρ = r/t ∈ F5(x), with r, t ∈ F5[x] of degree at most one, such that ρ(i) = 2^i for i = 0, 1, 2. Exercise 5.4 computes the interpolating polynomial g = 3x² + 3x + 1 of degree less than 3. The Extended Euclidean Algorithm for m = x(x − 1)(x − 2) = x³ + 2x² + 2x and g computes
 j   q_j       ρ_j   r_j               s_j           t_j
 0             1     x³ + 2x² + 2x     1             0
 1   x + 1     3     x² + x + 2        0             2
 2   x + 4     4     x + 2             4             2x + 2
 3   x + 2     4     1                 4x + 1        2x² + 1
 4             1     0                 x² + x + 2    3x³ + x² + x
and from row 2 we get the desired rational function
    ρ = r2/t2 = (x + 2)/(2x + 2) = (3x + 1)/(x + 1) ∈ F5(x).
Row 3 gives another rational interpolating function, namely
    ρ = r3/t3 = 1/(2x² + 1) = 3/(x² + 3).
Row 4 would yield ρ = r4 /t4 = 0, but this is obviously not an interpolating func-
tion. We have gcd(r4 ,t4 ) = gcd(m,t4 ) = m.
(ii) Let F = Q, n = 3, u0 = 0, u1 = 1, u2 = −1, v0 = 1, v1 = 2, v2 = 2, and suppose
that we are looking for a rational function ρ = r/t ∈ Q(x) satisfying (20) for k = 2.
Making the ansatz r = a1 x + a0 and t = b1 x + b0 and plugging in u0 , u1 , u2 , we
arrive at the linear system
a0 = b0 , a1 + a0 = 2(b1 + b0 ), −a1 + a0 = 2(−b1 + b0 ),
which is equivalent to (21). It simplifies to
a0 = b0 , 2a0 = 4b0 , 2a1 = 4b1 ,
and hence r = 2x, t = x form—up to multiplication by a constant—the unique
solution of (21). However, the rational function ρ = r/t = 2x/x = 2 does not solve (20) since obviously ρ(u0) = ρ(0) ≠ 1 = v0, and hence (20) has no
solution.
The Extended Euclidean Algorithm for m = x(x − 1)(x + 1) = x³ − x and the interpolating polynomial x² + 1 yields
 j   q_j   ρ_j   r_j       s_j               t_j
 0         1     x³ − x    1                 0
 1   x     1     x² + 1    0                 1
 2   x     −2    x         −1/2              (1/2)x
 3   x     1     1         (1/2)x            −(1/2)x² + 1
 4         1     0         −(1/2)x² − 1/2    (1/2)x³ − (1/2)x
We see from row 2 that r = x and t = x/2 solves (21), but we are not allowed to
cancel the common factor x since ρ = 2 does not solve (20). ✸
The alternative way of solving (21) via a system of linear equations, as we did
above, works in general but is less efficient than the EEA.
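The EEA-based method of Theorem 5.16 is straightforward to program. The following minimal sketch (ad hoc helper names; polynomials are coefficient lists over F_p with the constant term first, and the traditional rather than the monic Extended Euclidean Algorithm is used, so the computed row differs from the one in Example 5.19 (i) only by a constant factor) reproduces that example:

```python
P = 5  # we work over F_5, as in Example 5.19 (i)

def trim(f):
    """Remove leading zero coefficients (lists have the constant term first)."""
    while f and f[-1] == 0:
        f.pop()
    return f

def add(f, g, c=1):
    """f + c*g over F_P."""
    h = [0] * max(len(f), len(g))
    for i, a in enumerate(f):
        h[i] = a
    for i, b in enumerate(g):
        h[i] = (h[i] + c * b) % P
    return trim(h)

def mul(f, g):
    h = [0] * (len(f) + len(g) - 1) if f and g else []
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] = (h[i + j] + a * b) % P
    return trim(h)

def quo_rem(f, g):
    """Quotient and remainder of f on division by g over F_P."""
    f = list(f)
    q = [0] * max(len(f) - len(g) + 1, 0)
    inv = pow(g[-1], -1, P)
    while len(f) >= len(g):
        c = f[-1] * inv % P
        d = len(f) - len(g)
        q[d] = c
        f = add(f, [0] * d + g, -c)       # cancel the leading term of f
    return trim(q), f

def rational_reconstruction(m, g, k):
    """Row (r_j, t_j) of the traditional EEA for (m, g), j minimal with deg r_j < k."""
    r0, t0 = m, []
    r1, t1 = g, [1]
    while r1 and len(r0) - 1 >= k:
        q, r = quo_rem(r0, r1)
        r0, r1 = r1, r
        t0, t1 = t1, add(t0, mul(q, t1), -1)
    return r0, t0

m = [0, 2, 2, 1]   # x^3 + 2x^2 + 2x = x(x-1)(x-2) over F_5
g = [1, 3, 3]      # 3x^2 + 3x + 1, the interpolating polynomial of Example 5.19 (i)
print(rational_reconstruction(m, g, 2))
# ([3, 4], [3, 3]), that is r = 4x + 3 and t = 3x + 3: four times the row (x + 2, 2x + 2)
# of the example, so the rational function r/t = (3x + 1)/(x + 1) is the same.
```

Calling the same routine with m = x^n, that is, m = [0]*n + [1], yields the Padé approximants of Section 5.9.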
C OROLLARY 5.20.
There is an algorithm that either computes the canonical form solution to (20) or
else certifies that (20) is unsolvable, using O(n2 ) arithmetic operations in F .
C OROLLARY 5.21.
Let g ∈ F[x] have degree less than n ∈ N, k ∈ {0, . . . , n}, and r j , s j ,t j ∈ F[x] be the
jth row in the Extended Euclidean Algorithm for m = x^n and g, where j is minimal
such that deg r j < k.
 j   q_j            ρ_j   r_j                   s_j                   t_j
 0                  1     x⁴                    1                     0
 1   x + 3          4     x³ + 2x² + 3x + 4     0                     4
 2   x              1     x² + 2x + 3           1                     x + 3
 3   x² + 2x + 3    4     1                     x                     x² + 3x + 1
 4                  1     0                     4x³ + 3x² + 2x + 1    4x⁴
is the required solution. In fact, this is a (k, n − k)-Padé approximant to g for all
values of k ≥ 1 and n such that n − k ≥ 2, since g is the formal derivative of the geometric series 1/(1 − x) = ∑_{i≥0} x^i, and hence g = 1/(x − 1)² is the formal power series inverse of (x − 1)².
The above table contains other Padé approximants to g: row 1 gives the trivial
(4, 0) approximant r1/t1 = 4x³ + 3x² + 2x + 1 and row 2 yields the (3, 1) approximant r2/t2 = (x² + 2x + 3)/(x + 3), but row 4 does not give a Padé approximant
since x divides t4 , and in fact r4 /t4 = 0 does not approximate g.
(ii) Let g = x² + 1 ∈ Q[[x]] and n = 3. Then there is no (2, 1)-Padé approximant to g. To see why, we assume that there are polynomials r, t ∈ Q[x] of degree at most 1 such that x ∤ t and r ≡ tg mod x³. Let t = ax + b, with a, b ∈ Q and b ≠ 0. Then
    r ≡ (ax + b)(x² + 1) ≡ bx² + ax + b mod x³,
approximations are so good that in a plot it is hard to tell them apart from the
tangent function. Instead, Figure 5.7 on page 124 shows the difference between
the tangent function and each of the four Padé approximants from Table 5.6 on
the interval (−π /2, π /2). It can be seen that the Padé approximant for k = 5 or 4
is the best one. For example, its approximation error to tan(1.5) ≈ 14.1 is about
0.059, while the Taylor polynomial has an approximation error of about 9.54 at
that point. ✸
Corollary 5.21 yields a decision procedure for Padé approximants: Compute the
appropriate results r j and t j of the Extended Euclidean Algorithm. If their gcd is
one, then r j /t j is the unique (k, n − k)-Padé approximant as in (22), otherwise no
such approximant exists.
[Figure 5.7: The difference of tan x to its Padé approximants of order 9 around the origin, on the interval (−π/2, π/2); curves shown for the Taylor polynomial and the Padé approximants for k = 7 or 6, k = 5 or 4, and k = 3 or 2.]
C OROLLARY 5.24.
There is an algorithm that either computes the canonical form solution to (22) or
else certifies that (22) is unsolvable, using O(n2 ) arithmetic operations in F .
This algorithm will be put to use in Chapter 7. Using the fast Euclidean Algo-
rithm from Chapter 11, the running time drops to O(n log² n loglog n) arithmetic
operations in F.
L EMMA 5.25. Let f , g ∈ N and r, s,t ∈ Z with r = s f + tg, and suppose that
    |r| < k and 0 < t ≤ f/k for some k ∈ {1, …, f}.
We let ri , si ,ti ∈ Z for 0 ≤ i ≤ ℓ + 1 be the results of the traditional Extended
Euclidean Algorithm for f , g, with ri ≥ 0 for all i. Moreover, we define j ∈
{1, . . . , ℓ + 1} by
r j < k ≤ r j−1 , (26)
and if j ≤ ℓ, we choose q ∈ N≥1 such that
r j t = rt j + f . (30)
Since t j−1 and t j alternate in sign (Exercise 3.15), so do t j and t ∗j , whence rt ∗j > 0
and
r∗j t − rt ∗j < kt − rt ∗j ≤ f − rt ∗j < f .
since r∗j + r j = r j−1 − (q − 1)r j ≥ k > |r| ≥ (−1) j r, by the choice of q. After
dividing by the positive integer r j , we obtain r∗j t − rt ∗j > − f , and the claim follows.
As in the first case, we conclude that s∗j t = st ∗j . Then equation (29) implies
that gcd(s∗j ,t ∗j ) = 1 and t = αt ∗j for some nonzero α ∈ Z, and finally s = αs∗j and
r = αr∗j , as above. ✷
The next theorem is an integer variant of Theorem 5.16. We say that a rational
number r/t ∈ Q, with r,t ∈ Z, is in canonical form if t > 0 and gcd(r,t) = 1.
T HEOREM 5.26.
Let g, m ∈ N with g < m, k ∈ {1, . . . , m}, and r j , s j ,t j ∈ Z be the jth row in the
Extended Euclidean Algorithm for m and g, where j is minimal such that r j < k.
(i) There exist r,t ∈ Z satisfying (25), namely (r,t) = (r j ,t j ) if t j > 0, and
(r,t) = (−r j , −t j ) otherwise. If in addition gcd(r,t) = gcd(r j ,t j ) = 1, then r
and t also solve (24).
(ii) If r/t ∈ Q is a canonical form solution to (24), then either (r,t) = (τ r j , τ t j )
or (r,t) = (τ r∗j , τ t ∗j ), where r∗j , t ∗j are as in Lemma 5.25 and τ = sign(t j ) or
τ = sign(t ∗j ), respectively.
(iv) There is at most one canonical form solution to (24) satisfying |r| < k/2.
(iv) Let both r/t and r∗ /t ∗ be canonical form solutions of (24) with |r| < k/2
and |r∗ | < k/2. Since m divides r − tg and r∗ − t ∗ g, it also divides t ∗ (r − tg) −
t(r∗ − t ∗ g) = rt ∗ − r∗t. However, |r|t ∗ < m/2 and |r∗ |t < m/2, whence rt ∗ = r∗t.
The claim now follows from gcd(r,t) = gcd(r∗ ,t ∗ ) = 1. ✷
We note that t j t ∗j < 0 (Exercise 3.15) and r j , r∗j ≥ 0, and hence the two possible
solutions r j /t j and r∗j /t ∗j of (24) have opposite signs (as rational numbers).
 j   q_j   r_j   s_j   t_j
 0         22    1     0
 1   2     9     0     1
 2   2     4     1     −2
 3   4     1     −2    5
 4         0     9     −22
For k = 10, we have j = 1, and r1 /t1 = 9/1 is obviously a solution of (24). Now
q = 2, (r1∗, t1∗) = (r2, t2) = (4, −2), and |t1∗| = 2 ≤ 22/10 = m/k, whence (r1∗, t1∗) is
a second solution of (25). But gcd(r1∗ ,t1∗ ) = 2, and r1∗ /t1∗ = −2 is not a solution
of (24). Thus we have two solutions of (25), but only one of them also solves (24).
For k = 9, we have j = 2, and (r2 ,t2 ) = (4, −2) is a solution of (25), but not
of (24). Here, q = 1 and (r2∗ ,t2∗ ) = (r1 − r2 ,t1 − t2 ) = (5, 3), but |t2∗ | = 3 > 22/9 =
m/k, and hence (r2∗ ,t2∗ ) does not solve (25), so that (25) has a unique solution and
(24) is unsolvable.
If k = 7, however, then j and q are as before, but now |t2∗ | = 3 ≤ 22/7 = m/k,
and r2∗ /t2∗ = 5/3 is the only solution of (24).
(iii) Let m = 36 and g = 13. The Extended Euclidean Algorithm for m and g
yields
 j   q_j   r_j   s_j    t_j
 0         36    1      0
 1   2     13    0      1
 2   1     10    1      −2
 3   3     3     −1     3
 4   3     1     4      −11
 5         0     −13    36
and for k = 11 we find that j = 2, (r2 ,t2 ) = (10, −2) solves (25) but not (24), q = 1,
and (r2∗ ,t2∗ ) = (r1 − r2 ,t1 − t2 ) = (3, 3) also solves (25) but not (24). Thus (25) has
two solutions while (24) has none. ✸
Together with the Chinese Remainder Algorithm for integers, Theorem 5.26
leads to a Chinese Remainder Algorithm for rational numbers (Exercise 5.44).
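The integer case looks like this as a minimal sketch (the function name is ad hoc; it only returns the EEA row (r_j, t_j) with j minimal such that r_j < k, that is, a solution of (25); the gcd test and the alternative row (r_j∗, t_j∗) from Lemma 5.25 that are needed to decide (24) are left out):

```python
def rational_number_reconstruction(g, m, k):
    """Row (r_j, t_j) of the traditional EEA for (m, g), j minimal with r_j < k."""
    r0, t0 = m, 0
    r1, t1 = g, 1
    while r1 and r0 >= k:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        t0, t1 = t1, t0 - q * t1
    return r0, t0

# Example 5.27: m = 22, g = 9
print(rational_number_reconstruction(9, 22, 10))  # (9, 1):  9/1 also solves (24)
print(rational_number_reconstruction(9, 22, 7))   # (4, -2): solves (25) but not (24),
                                                  # since gcd(4, -2) != 1; the example
                                                  # finds 5/3 as the solution of (24)
```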
L EMMA 5.29. There exist unique polynomials ci ∈ F[x] with deg ci < ei deg fi for
all i such that
    g/f = c1/f_1^{e_1} + ⋯ + cr/f_r^{e_r}.    (33)
P ROOF. We multiply both sides in (33) by f and obtain the linear equation
    g = c1 ∏_{j≠1} f_j^{e_j} + ⋯ + cr ∏_{j≠r} f_j^{e_j}    (34)
with “unknowns” c1 , . . . , cr . (We have already seen in Section 4.5 how to find
polynomial solutions of such equations.) For any i ≤ r, each summand with the possible exception of the ith one is divisible by f_i^{e_i}, whence g ≡ ci ∏_{j≠i} f_j^{e_j} mod f_i^{e_i}.
Now each fj is coprime to fi and hence invertible modulo f_i^{e_i}, and we obtain
    ci ≡ g ∏_{j≠i} f_j^{−e_j} mod f_i^{e_i},    (35)
It remains to say how to obtain the decomposition (31) from (33). This uses the
following generalization of the Taylor expansion. Let R be a ring (commutative,
with 1) and a, p ∈ R[x] with p monic of degree m > 0 and a of degree less than km,
for some k, m ∈ N. The p-adic expansion of a is a = a_{k−1} p^{k−1} + ⋯ + a_1 p + a_0 with a_i ∈ R[x] of degree less than m for all i.
LEMMA 5.30. The p-adic expansion exists uniquely, and it can be computed using at most (km)² − km² operations in R.
For the uniqueness, let a = a∗_{k−1} p^{k−1} + ⋯ + a∗_1 p + a∗_0 be another p-adic expansion. Then a∗_0 is the remainder and a∗_{k−1} p^{k−2} + ⋯ + a∗_1 is the quotient of a on division by p. By induction, the p-adic expansion of the quotient is unique, and hence so is the p-adic expansion of a.
The cost for the first division with remainder is 2 deg p (1 + deg a − deg p) ≤ 2m²(k − 1), and hence the total number of operations in R is at most ∑_{1≤i<k} 2m²i = m²k(k − 1) = (km)² − km². ✷
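The repeated division in this proof is a short loop; here is a minimal sketch with SymPy (the polynomial x³ + 1 and the choice p = x − 2, which gives the Taylor expansion around 2, are arbitrary):

```python
from sympy import symbols, div

x = symbols('x')

def p_adic_expansion(a, p):
    """Coefficients a_0, a_1, ... of a = sum_i a_i * p**i, each of degree < deg p."""
    coeffs = []
    while a != 0:
        a, r = div(a, p, x)     # a = quotient * p + r with deg r < deg p
        coeffs.append(r)
    return coeffs

print(p_adic_expansion(x**3 + 1, x - 2))
# [9, 12, 6, 1], that is x**3 + 1 = (x-2)**3 + 6*(x-2)**2 + 12*(x-2) + 9
```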
T HEOREM 5.31.
The partial fraction decomposition (31) exists uniquely, and it can be computed
using O(n2 ) operations in F .
PROOF. The existence follows from the preceding two lemmas by taking the fi-adic expansion ci = g_{i,e_i} f_i^{e_i−1} + ⋯ + g_{i,2} fi + g_{i,1} with g_{ij} ∈ F[x] of degree less than deg fi for all i, j. If g/f = ∑_{1≤i≤r} ∑_{1≤j≤e_i} g∗_{ij}/f_i^j is another partial fraction decomposition of g/f, with g∗_{ij} ∈ F[x] of degree less than deg fi for all i, j, then Lemma 5.29 implies that ci = g∗_{i,e_i} f_i^{e_i−1} + ⋯ + g∗_{i,2} fi + g∗_{i,1} for all i, and the uniqueness of the fi-adic expansion implies that g_{ij} = g∗_{ij}, for all i, j.
To prove the running time bound, we let di = ei deg fi and compute mi = f_i^{e_i} and vi = g rem f_i^{e_i}, at a cost of O(n di) operations in F altogether, for 1 ≤ i ≤ r, taking in total O(n²) operations. Then we perform steps 1 and 2 of the Chinese Remainder Algorithm 5.4 with input m1, …, mr and v1, …, vr to compute ci ≡ vi (f/mi)^{−1} ≡ g (f/f_i^{e_i})^{−1} mod mi for all i, taking another O(n²) operations, by Theorem 5.7. Finally, we compute the fi-adic expansion of ci, taking O(di²)
operations, by Lemma 5.30, for each i. This is dominated by the cost for the first
step, and the claim follows. ✷
    c2 ≡ v2 (m/m2)^{−1} = 2(x³ + x²)^{−1} ≡ 2 · 2^{−1} = 1 mod x − 1,
    c3 ≡ v3 (m/m3)^{−1} = 2(x³ − x²)^{−1} ≡ 2 · (−2)^{−1} = −1 mod x + 1,
and hence
    (x³ + 4x² − x − 2)/(x⁴ − x²) = (x + 2)/x² + 1/(x − 1) + (−1)/(x + 1).
Using the x-adic expansion 1 · x + 2 of x + 2 immediately leads to (32). ✸
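The decomposition of this example can be checked with SymPy's apart (a quick verification, not the algorithm of Theorem 5.31):

```python
from sympy import symbols, apart

x = symbols('x')
print(apart((x**3 + 4*x**2 - x - 2) / (x**4 - x**2), x))
# prints the decomposition 1/(x - 1) - 1/(x + 1) + 1/x + 2/x**2, up to the order of terms
```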
Notes. 5.1. A general theory of these representations and conversions for polynomials
and rational functions is in von zur Gathen (1986).
5.2. The Lagrange interpolant was invented in Waring (1779) and Lagrange (1795), page
286.
5.3. The secret sharing scheme is from Shamir (1979). Asmuth & Blakley (1982) propose
using the CRA for fault-tolerant communication, and Rabin (1989) uses interpolation.
5.4. The name of the Chinese Remainder Theorem derives from the Suan-ching (arith-
metic) of Sun-Tsŭ, written about the first century AD. He solves a particular problem
(Exercise 5.15) in verse-form, using the integer versions of the “Lagrange interpolants”
li as in the proof of Theorem 5.2; see Shen (1988) and Ku & Sun (1992). Variants of
the question appear later in Chinese, Indian, and European mathematics, for example in
Schwenter (1636), 3. Auffgab, with moduli 3, 5, 7, and 8. General solutions are due to
Euler (1734/35a, 1747/48), Lagrange (1770b), §25, Gauß (1801), article 32, and Cauchy
(1841).
Euler (1760/61) proved Corollary 5.6 about his totient function, and also that a^k ≡ 1 mod m when gcd(a, m) = 1 and k = lcm(ϕ(p_0^{e_0}), …, ϕ(p_{r−1}^{e_{r−1}})), where m = p_0^{e_0} ⋯ p_{r−1}^{e_{r−1}} is the prime factorization of m (Exercise 18.13); this is also in Gauß (1801), article 92. Since
Gauß (1801), article 38, the notation ϕ is used.
In a more general version of the Chinese Remainder Theorem, the moduli mi are not
required to be pairwise coprime, but only that vi ≡ v j mod gcd(mi , m j ) for all i, j. Under
these conditions solutions exist always and are unique modulo the least common multiple
of all mi (Exercise 5.23).
5.5. Gauß introduced his elimination method for astronomical calculations (Gauß 1809,
article 182; Gauß 1810). Lagrange (1759) presented a similar procedure for 2 × 2 and
3 × 3 matrices. Edmonds (1967) and Bareiss (1968) showed that the intermediate results
of Gaussian elimination over Q are polynomially bounded.
The modular determinant computation for polynomial matrices, called the interpolation
method, is already in Mikeladze (1948); see also Faddeev & Faddeeva (1963), Section 49.
Early suggestions for modular computer arithmetic are in Svoboda & Valach (1955), Svo-
boda (1957), and Garner (1959); see Szabó & Tanaka (1967) for a discussion.
5.8. Cauchy (1821) discusses his rational interpolation problem without paying attention
to its solvability. Kronecker (1881a), page 544, was the first to point out the bisher wohl
noch nicht bemerkte Einschränkung der Lösbarkeit der Cauchy’schen Aufgabe1, namely,
that (20) may have no solution (Exercise 5.36).
5.9. The Padé approximation problem derives its name from Padé’s (1892) dissertation. It
is a bit of a misnomer, since Kronecker (1881a) already stated and solved the problem, also
proving the necessary and sufficient conditions for solvability (Corollary 5.21). However,
Kronecker’s approach was purely algebraic (as is ours), while Padé also considered func-
tions like exp(x) and brought these approximations to the attention of numerical analysts.
Jacobi (1846) had given an explicit solution to (23). Frobenius (1881) describes relations
between the various Padé approximants of one power series.
Baker & Graves-Morris (1996) explain in detail the theory of Padé approximants, its
connection with continued fractions, and its application to root and singularity finding,
convergence acceleration, and various other problems in numerical analysis and theoretical
physics. They also discuss the numerical stability of different methods for computing Padé
approximants.
5.10. Theorem 5.26 is essentially in Kaltofen & Rolletschek (1989). The existence of a solution to (25) in the special case where k = ⌊√m⌋ + 1 was shown by Thue (1902).
5.11. The partial fraction decomposition is described in Euler (1748a), §39 ff. and Cauchy
(1821), Chapter XI.
Exercises.
5.1 Let m0 , . . ., mr ∈ N≥2 .
(i) Prove that every nonnegative integer a < m0 · · ·mr has a mixed-radix representation of the
form
    a = a_0 + a_1 m_0 + a_2 m_0 m_1 + · · · + a_r m_0 · · · m_{r−1},
with unique integers ai satisfying 0 ≤ ai < mi for all i. Relate this to the usual p-adic representation
of an integer a, for an integer p > 1.
(ii) Compute the above representation of a = 42 for m0 = 2, m1 = 3, m2 = 2, and m3 = 5.
(iii) What is the analogous mixed-radix representation for polynomials?
5.2∗ Let a = s/t ∈ Q, with coprime s,t ∈ N such that 0 < s < t. With respect to an arbitrary base p ∈ N≥2, a has a unique periodic p-adic expansion
    a = ∑_{i≥1} a_i p^{−i},
with all a_i ∈ {0, . . ., p − 1}. We say that this expansion is purely periodic if there is a positive l ∈ N such that a_{i+l} = a_i for all i ≥ 1, and the least such l is the length of the period. Moreover, we let k ∈ N be the smallest integer such that the sequence a_{k+1}, a_{k+2}, . . . is purely periodic, and call it the length of the preperiod. For example, the 10-adic representation of 1/6 is 0.1666. . . = 1 · 10^{−1} + ∑_{i≥2} 6 · 10^{−i}, with k = l = 1.
(i) Show that there exist unique t ∗ , u ∈ N with t = ut ∗ such that gcd(p,t ∗ ) = 1 and every prime
divisor of u divides p.
(ii) Prove that the p-adic expansion of a terminates (so that only finitely many ai are nonzero) if
and only if t ∗ = 1.
(iii) Show that the p-adic expansion of a is purely periodic if and only if p and t are coprime, and that then l = ord_t(p), the order of p in the multiplicative group Z_t^×.
(iv) Prove that l = ord_{t∗}(p) and k = min{n ∈ N: u | p^n} in the general case.
(v) Conclude that l ≤ ϕ(t ∗ ) < t and k ≤ log2 t.
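For experimentation, parts (i)–(v) translate into a few lines of Python; the following sketch (function and variable names are ad hoc, not from the text) computes t∗, the preperiod length k, and the period length l.

from math import gcd

def preperiod_and_period(s, t, p):
    assert 0 < s < t and gcd(s, t) == 1 and p >= 2
    # split t = u * t_star with gcd(p, t_star) = 1 and every prime divisor of u dividing p;
    # note that k and l depend only on t and p, not on s
    t_star = t
    g = gcd(t_star, p)
    while g > 1:
        while t_star % g == 0:
            t_star //= g
        g = gcd(t_star, p)
    u = t // t_star
    # k = min{ n in N : u divides p^n }
    k, power = 0, 1
    while power % u != 0:
        k, power = k + 1, power * p
    # l = ord_{t_star}(p), the multiplicative order of p modulo t_star
    if t_star == 1:
        l = 1                          # terminating expansion: only trailing zeros repeat
    else:
        l, q = 1, p % t_star
        while q != 1:
            l, q = l + 1, q * p % t_star
    return k, l

print(preperiod_and_period(1, 6, 10))    # (1, 1), as for 1/6 = 0.1666... above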
5.3 Let R be a ring (commutative, with 1) and u ∈ R. Prove that Horner’s rule not only computes the
remainder f (u) of a polynomial f ∈ R[x] of degree n − 1 on division by x − u but also the coefficients
of the quotient ( f − f (u))/(x − u).
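A tiny Python sketch (not the book's pseudocode) of what Exercise 5.3 asks you to prove: the intermediate values of Horner's rule are exactly the coefficients of the quotient.

def horner_divide(coeffs, u):
    # coeffs = [f_{n-1}, ..., f_1, f_0], highest degree first
    values = []
    acc = 0
    for c in coeffs:
        acc = acc * u + c
        values.append(acc)
    remainder = values.pop()          # the final accumulator value is f(u)
    return values, remainder          # quotient coefficients (highest degree first), f(u)

# (x^3 + 2x^2 - 5) = (x - 2)(x^2 + 4x + 8) + 11
print(horner_divide([1, 2, 0, -5], 2))    # ([1, 4, 8], 11)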
5.4 Let F5 = Z5 be the finite field with 5 elements.
(i) Compute a polynomial f ∈ F5 [x] of degree at most 2 satisfying
f (0) = 1, f (1) = 2, f (2) = 4 (37)
(ii) List all polynomials f ∈ F5 [x] of degree at most 3 satisfying (37). How many of degree at
most 4 are there? Generalize your answer to solutions of degree at most n for n ∈ N.
5.5 Let F7 = Z7 be the finite field with 7 elements and m = x(x + 1)(x + 6) = x3 + 6x ∈ F7 [x].
(i) Let J ⊆ F7 [x] be the set of all polynomials h ∈ F7 [x] solving the interpolation problem
h(0) = 1, h(1) = 5, h(6) = 2.
Compute the unique polynomial f ∈ J of least degree.
(ii) Find a surjective ring homomorphism χ: F7 [x] −→ F7^3 such that ker χ = ⟨m⟩ = {rm: r ∈ F7 [x]},
and compute χ( f ) and χ(x2 + 3x + 2).
(iii) Show that J = f + ker χ = { f + rm: r ∈ F7 [x]}.
5.6 Let r = x3 + x2 ∈ F5 [x].
(i) List all polynomials f ∈ F5 [x] of degree at most 5 satisfying
f (a) = r(a) for all a ∈ F5 . (38)
(ii) How many polynomials f ∈ F5 [x] of degree at most 6 solve (38)?
5.7 (i) Show that ∑_{0≤i<n} l_i = 1, where the l_i are the Lagrange interpolants as in (2).
(ii) Let u_n ∈ F be another point different from u_0, . . ., u_{n−1}. Show how one can obtain the Lagrange interpolants l_0^∗, . . ., l_{n−1}^∗, l_n^∗ corresponding to u_0, . . ., u_n from l_0, . . ., l_{n−1}.
5.8 Let R be an integral domain, u_0, . . ., u_{n−1} ∈ R, and V = VDM(u_0, . . ., u_{n−1}) ∈ R^{n×n}. Prove that
    det V = ∏_{0≤ j<i<n} (u_i − u_j).
5.11∗ Another possibility to compute the interpolating polynomial of least degree is Newton inter-
polation. Suppose that u0 , . . ., un−1 , v0 , . . ., vn−1 in a field F are given, with distinct u0 , . . ., un−1 , and
let f ∈ F[x] of degree less than n be the interpolating polynomial with f (ui ) = vi for all i. We divide
f by x − u0 with remainder and obtain f = (x − u0 )g + f (u0 ) = (x − u0 )g + v0 for some g ∈ F[x] of
degree deg f − 1. For i ≥ 1, the value of g at ui is g(ui ) = (vi − v0 )/(ui − u0 ), and we can determine
g recursively in the same fashion.
(i) Design an algorithm for Newton interpolation, prove that it works correctly, and analyze its cost. It is possible to solve the problem with at most (5/2) n² operations in F.
(ii) Trace your algorithm on the examples of Exercises 5.4 and 5.5.
(iii) What is the connection to the mixed-radix representation of f , as discussed in Exercise 5.1,
when mi = x − ui for 0 ≤ i < n?
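The recursion just described is easy to turn into code; here is a Python sketch over Q (an illustration with ad hoc data, not the book's pseudocode, and not tuned to the operation count of part (i)).

from fractions import Fraction

def newton_interpolation(u, v):
    # returns [f_0, f_1, ..., f_{n-1}] with f = sum_i f_i x^i and f(u_i) = v_i,
    # for distinct points u_i; exact arithmetic over Q via Fraction
    if len(u) == 1:
        return [Fraction(v[0])]
    # g interpolates (v_i - v_0)/(u_i - u_0) at u_i for i >= 1, and f = (x - u_0) g + v_0
    w = [Fraction(v[i] - v[0], u[i] - u[0]) for i in range(1, len(u))]
    g = newton_interpolation(u[1:], w)
    f = [Fraction(v[0])] + g                  # v_0 + x * g ...
    for i, gi in enumerate(g):
        f[i] -= Fraction(u[0]) * gi           # ... - u_0 * g
    return f

u, v = [0, 1, 2, 3], [1, 3, 2, 7]
f = newton_interpolation(u, v)
assert all(sum(c * ui**k for k, c in enumerate(f)) == vi for ui, vi in zip(u, v))
print(f)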
5.12∗ Let F be a field, u0 , . . ., un−1 ∈ F \ {0} with u_i ≠ ±u_j for 0 ≤ i < j < n, and v0 , . . ., vn−1 ∈ F.
(i) Let f ∈ F[x] of degree less than 2n be such that f (ui ) = f (−ui ) for 0 ≤ i < n. Prove that
f (x) = f (−x), so that f is even.
(ii) Use the Lagrange interpolation formula and (i) to show that there is a unique even interpolating
polynomial f ∈ F[x] of degree less than 2n such that f (ui ) = vi for 0 ≤ i < n.
(iii) Let g ∈ F[x] be the unique polynomial of degree less than n such that g(u2i ) = vi for 0 ≤ i < n.
How is the polynomial f from (ii) related to g?
(iv) What are the statements corresponding to (i) through (iii) for odd interpolating polynomials, provided that u_i ≠ 0 for all i?
(v) Compute an even polynomial f0 ∈ R[x] of degree at most 4 interpolating the cosine function at
u0 = π/6, u1 = π/3, and u2 = π/2, and an odd polynomial f1 ∈ R[x] of degree at most 5 interpolating
the sine function at those points.
(Euler (1783) stated interpolation formulas for odd and even functions.)
5.13∗ In this exercise, we discuss bivariate interpolation.
(i) Develop an algorithm for computing f ∈ F[x, y], F a field, where the degree of f in y is less
than n and
f (x, ui ) = vi for i = 0, 1, . . ., n − 1,
for distinct ui ∈ F and arbitrary vi ∈ F[x]. Show that f is unique.
(ii) Assuming that the degree of each vi is less than m, what is the computing time of your algorithm
(in terms of m and n)?
(iii) Compute f ∈ F11 [x, y] such that
    f (x, 0) = x² + 7,   f (x, 1) = x³ + 2x + 3,   f (x, 2) = x³ + 5.
5.14 Let F be a field, f ∈ F[x] of degree less than n, and u0 , . . ., un−1 ∈ F \ {0} distinct. Determine
the set of all interpolation polynomials g ∈ F[x] of degree less than n with g(ui ) = f (ui ) for 0 ≤
i ≤ n − 2. (In the situation of Section 5.3, this represents the knowledge of all players minus player
n − 1.) Let c ∈ F. How many of these g have constant coefficient c? (Your answer should imply that
the secret sharing scheme is secure.)
5.15 What is the least nonnegative integer f with f ≡ 2 mod 3, f ≡ 3 mod 5, and f ≡ 2 mod 7?
5.16 How many common solutions f ∈ Z with 0 ≤ f < 106 do the following congruences possess?
f ≡ 2 mod 11, f ≡ −1 mod 13, f ≡ 10 mod 17.
5.17 Carl Friedrich, Joachim, and Jürgen met at a Sylvester party on Thursday, 31 December 1998.
They agreed to play Skat (a German card game) together some day as soon as all of them find the time
to do so. But they got into the usual troubles: Carl Friedrich was busy except on Fridays, Joachim
had time on 7 January and then again every 9th day, and Jürgen was free on 6 January and then again
every 11th day. Which date did they agree upon?
5.18 Ernie, Bert, and the Cookie Monster want to measure the length of Sesame Street. Each of
them does it his own way. Ernie relates: “I made a chalk mark at the beginning of the street and
then again every 7 feet. There were 2 feet between the last mark and the end of the street.” Bert tells
you: “Every 11 feet, there are lamp posts in the street. The first one is 5 feet from the beginning,
and the last one is exactly at the end of the street.” Finally, the Cookie Monster says: “Starting at the
beginning of Sesame Street, I put down a cookie every 13 feet. I ran out of cookies 22 feet from the
end.” All three agree that the length does not exceed 1000 feet. How long is Sesame Street?
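Exercises 5.15–5.18 are all instances of integer Chinese remaindering. A minimal Python sketch (Lagrange-style combination via the extended Euclidean algorithm, shown on an arbitrary instance rather than the exercise data) looks as follows.

def ext_gcd(a, b):
    # returns (g, s, t) with g = gcd(a, b) = s*a + t*b
    if b == 0:
        return a, 1, 0
    g, s, t = ext_gcd(b, a % b)
    return g, t, s - (a // b) * t

def crt(residues, moduli):
    # pairwise coprime moduli; x = sum_i v_i * (m/m_i) * ((m/m_i)^(-1) mod m_i)
    m = 1
    for mi in moduli:
        m *= mi
    x = 0
    for vi, mi in zip(residues, moduli):
        ni = m // mi
        _, s, _ = ext_gcd(ni, mi)             # s is an inverse of ni modulo mi
        x += vi * ni * s
    return x % m, m

residues, moduli = [1, 2, 3], [5, 7, 11]      # an arbitrary instance
x, m = crt(residues, moduli)
assert all(x % mi == vi for vi, mi in zip(residues, moduli))
print(x, m)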
5.19 (i) Find a polynomial in F5 [x] of degree four which is reducible but has no roots in F5 . Are
there such examples of lower degree?
(ii) Which of the following polynomials in F5 [x] are irreducible, which are reducible?
    m0 = x² + 2,   m1 = x² + 3x + 4,   m2 = x³ + 2,   m3 = x³ + x + 1.
(iii) Show that the system of congruences
    f ≡ x + 1 mod m0 ,   f ≡ 3 mod m1
has a solution f ∈ F5 [x], and compute the unique solution of least degree.
5.20 Compute a solution f ∈ F5 [x] of the system of congruences such that deg f < 5. Hint: First bring each of the congruences into the form f ≡ v mod m for some v, m ∈ F5 [x], using Exercise 4.15. What is the set of all solutions without the degree constraint?
5.21−→ Let m0 = x² + 1, m1 = x² − 1, m2 = x³ + x − 1, v0 = −x, v1 = x + 1, and v2 = x⁵ − x in F3 [x].
(i) How many polynomials f ∈ F3 [x] are there with f ≡ vi mod mi for i = 0, 1, 2, and deg f ≤ 8?
Answer this without solving (ii).
(ii) Give a list of all f as in (i).
5.22∗ Let p0 , p1 ∈ N be distinct primes, m = p0 p1 , n ∈ N, and u0 , . . ., un−1 , v0 , . . ., vn−1 ∈ Z.
(i) Show that there exists an interpolating polynomial f ∈ Z[x] such that
f has coefficients in {0, . . ., m − 1}, deg f < n, and f (ui ) ≡ vi mod m for 0 ≤ i < n (39)
if and only if
ui ≡ u j mod pk =⇒ vi ≡ v j mod pk
for 0 ≤ i < j < n and k = 0, 1.
(ii) Show that (39) has a unique solution if and only if ui 6≡ u j mod pk for 0 ≤ i < j < n and
k = 0, 1.
(iii) Compute all interpolating polynomials f ∈ Z[x] with coefficients in {0, . . ., 14}, deg f < 3,
and
f (1) ≡ 2 mod 15, f (2) ≡ 5 mod 15, f (4) ≡ −1 mod 15.
5.23∗ (i) Let R be a Euclidean domain, m0 , m1 ∈ R \ {0}, and v0 , v1 ∈ R. Show that
f ≡ v0 mod m0 , f ≡ v1 mod m1
5.27 Make a list showing all integers m for which ϕ(m) ≤ 10, and prove that your list is complete.
5.28 Let F_q be a finite field with q elements and f = f_0^{e_0} · · · f_{r−1}^{e_{r−1}} with f_0, . . ., f_{r−1} ∈ F_q[x] irreducible and pairwise coprime and e_0, . . ., e_{r−1} ∈ N_{>0}. Let n = deg f and n_i = deg f_i for all i. Recall the analog Φ of Euler’s totient function (Exercise 4.19). Prove that
    Φ( f ) = (q^{n_0} − 1) q^{n_0(e_0−1)} · · · (q^{n_{r−1}} − 1) q^{n_{r−1}(e_{r−1}−1)} = q^n ∏_{0≤i<r} (1 − q^{−n_i}).
Hint: CRT.
5.29∗ Prove Theorem 5.8.
5.30 Let A_n = (i^j)_{1≤i, j≤n} ∈ Z^{n×n}.
(i) Compute a good upper bound on | det A_n | in terms of n using Hadamard’s inequality 16.6.
(ii) Compute det A_3 with the small primes modular algorithm.
5.31 Use the familiar formula det A = ∑_{σ∈S_n} sign(σ) · a_{1σ(1)} · · · a_{nσ(n)} for the determinant of a square matrix A ∈ Z^{n×n}, where S_n is the symmetric group of all n! permutations of {1, . . ., n} (Section 25.1), to derive an upper bound on | det A| in terms of n and B = max_{1≤i, j≤n} |a_{ij}|. Compare this to the Hadamard bound, and tabulate both bounds and their ratio for 1 ≤ n ≤ 10.
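Exercises 5.30–5.33 revolve around the small primes modular approach. The following Python sketch (an illustration under a crude version of Hadamard's bound, not the book's algorithm) shows the pattern for integer determinants: compute the determinant modulo several primes by Gaussian elimination and recover the result by Chinese remaindering in the symmetric range.

from math import isqrt, prod

def det_mod_p(a, p):
    a = [[x % p for x in row] for row in a]
    n, det = len(a), 1
    for i in range(n):
        pivot = next((r for r in range(i, n) if a[r][i] != 0), None)
        if pivot is None:
            return 0
        if pivot != i:
            a[i], a[pivot] = a[pivot], a[i]
            det = -det
        det = det * a[i][i] % p
        inv = pow(a[i][i], -1, p)
        for r in range(i + 1, n):
            factor = a[r][i] * inv % p
            for c in range(i, n):
                a[r][c] = (a[r][c] - factor * a[i][c]) % p
    return det % p

def hadamard_bound(a):
    # |det A| <= product of the Euclidean row norms (Hadamard's inequality 16.6)
    return prod(isqrt(sum(x * x for x in row)) + 1 for row in a)

def det_small_primes(a, primes):
    assert prod(primes) > 2 * hadamard_bound(a), "not enough primes for the bound"
    x, m = 0, 1
    for p in primes:
        d = det_mod_p(a, p)
        # incremental Chinese remaindering: adjust x modulo m*p
        t = (d - x) * pow(m, -1, p) % p
        x, m = x + t * m, m * p
    return x if x <= m // 2 else x - m        # symmetric representative

A = [[3, 1, 4], [1, 5, 9], [2, 6, 5]]
print(det_small_primes(A, [10007, 10009, 10037]))    # -90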
5.32−→ Let F be a field, n ∈ N>0 , and A = (ai j )1≤i, j≤n ∈ F[x]n×n a square matrix with polynomial
entries. Moreover, let m = max{deg ai j : 1 ≤ i, j ≤ n}.
(i) Find a tight upper bound r ∈ N on deg(det A) in terms of m and n.
(ii) Describe an algorithm for computing det A using a small primes modular approach if the field
F has more than r elements. Hint: Choose linear moduli. How many operations in F does your
algorithm use (in terms of n and m)?
(iii) Use your algorithm to compute the determinant of the matrix
    A = ( −x + 1    0         2
          x         x + 1     2x
          2x        3x + 1    x )    ∈ F7 [x]^{3×3}.
(iv) Find a tight upper bound on deg(det A) in terms of the maximal degrees mi in the ith row of A
for 1 ≤ i ≤ n. (Sometimes, this bound or the corresponding bound arising from the maximal column
degrees is better than the bound from (i).)
(v) Using the bound from (iv), compute the determinant of
    A = ( x − 1      x − 2          x − 3
          2x + 1     2x + 3         2x − 2
          x² − 1     x² + x + 1     (x − 1)² )    ∈ F7 [x]^{3×3}.
5.33∗ The goal of this exercise is to show that nonsingular linear systems over Q can be solved in
polynomial time using a modular approach. Thus let A ∈ Z n×n and b ∈ Z n for some n ∈ N, and
assume that det A 6= 0. Then there is a unique solution x ∈ Q n of the linear system Ax = b, namely
x = A−1 b.
(i) Given a bound B ∈ N on the absolute values of the entries of A and b, show that the numerators and denominators of the coefficients of x are less than n^{n/2} B^n in absolute value. Hint: Use Cramer’s rule 25.6 and Hadamard’s inequality 16.6.
(ii) We consider the following modular algorithm. Choose a prime p ∈ N greater than 2 n^n B^{2n}, and perform Gaussian elimination on A mod p and b mod p. Convince yourself that p ∤ det A. Find a y ∈ Z^n such that y mod p is the unique solution of the modular linear system (A mod p)(y mod p) = b mod p. Now x ≡ y mod p, and we can reconstruct x from y using rational number reconstruction (Section 5.10) for each of the coefficients. Prove that this algorithm works correctly.
(iii) Show that the running time of the algorithm is O(n³ log² p) word operations, or O∼(n⁵ log² B) when p is close to 2 n^n B^{2n}.
(iv) Run your algorithm on the matrix A3 from Example 5.30 and the vector b = (1, 1, 1)T .
5.34∗ Given are a positive integer n ∈ N, two polynomials a = ∑_{0≤i<n} a_i x^i, b = ∑_{0≤i<n} b_i x^i in Z[x], and a bound B ∈ N on the coefficients such that |a_i|, |b_i| ≤ B for 0 ≤ i < n. Moreover, let ab = c = ∑_{0≤i<2n} c_i x^i ∈ Z[x].
(i) Find a tight common upper bound on the |c_i| in terms of n and B.
(ii) Describe an algorithm for the computation of c using a small primes modular approach.
(iii) Trace your algorithm on the computation of the product of
    a = 987x³ + 654x² + 321x,   b = −753x³ − 333x² − 202x + 815.
5.35−→ Let n + 1 points u0 < u1 < · · · < un in R be given, and v0 , . . ., vn ∈ R arbitrary. You are to
find a twice continuously differentiable function f : [u0 , un ] −→ R which takes the value vi at point ui
and has f ′ (u0 ) = f ′′ (u0 ) = 0, as follows. Construct a sequence of polynomials f0 , . . ., fn ∈ R[x] of
degree at most 3 such that f0 = v0 ,
    f_i(u_{i−1}) = f_{i−1}(u_{i−1}),   f_i′(u_{i−1}) = f_{i−1}′(u_{i−1}),   f_i″(u_{i−1}) = f_{i−1}″(u_{i−1}),   f_i(u_i) = v_i ,
for 1 ≤ i ≤ n. This amounts to solving a Hermite interpolation problem for each interval [ui−1 , ui ],
with three conditions on fi and its first two derivatives at the left boundary and one condition on fi
at the right boundary. Then f is defined to be equal to fi on the interval [ui−1 , ui ], for all i. Such an
f is called a cubic spline.
(i) Prove that f0 , . . ., fn exist uniquely.
(ii) Compute and draw the cubic spline for the data u_i = i for 0 ≤ i ≤ 3, v0 = v1 = 1, v2 = 0, and v3 = −1.
(iii) In (ii), give various other values to v2 , say −5, −3, −2, −1, 1, 2, 3, 5, 7.
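For experimentation, the piecewise construction of this exercise can be sketched in a few lines of SymPy (illustration only, with arbitrary data rather than those of part (ii)): each new cubic copies the value and first two derivatives of its predecessor at the left endpoint, and the single remaining degree of freedom is used to hit v_i at the right endpoint.

from sympy import symbols, diff, expand

x = symbols('x')

def one_sided_cubic_spline(u, v):
    pieces = [v[0] + 0 * x]                   # f_0 is the constant v_0, so f' = f'' = 0 at u_0
    for i in range(1, len(u)):
        prev, a = pieces[-1], u[i - 1]
        taylor = (prev.subs(x, a)
                  + diff(prev, x).subs(x, a) * (x - a)
                  + diff(prev, x, 2).subs(x, a) / 2 * (x - a) ** 2)
        # choose the cubic coefficient so that the new piece passes through (u_i, v_i)
        c = (v[i] - taylor.subs(x, u[i])) / (u[i] - a) ** 3
        pieces.append(expand(taylor + c * (x - a) ** 3))
    return pieces

print(one_sided_cubic_spline([0, 1, 2], [1, 2, 0]))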
5.36 (Kronecker 1881a, page 546) Let F = Q, n = 4, ui = i + 1 for i = 0, . . ., 3, v0 = 6, v1 = 3,
v2 = 2, v3 = 3. Show that for k = 2, (21) has a unique solution r,t ∈ Q[x] with t monic, while (20) is
unsolvable.
5.37 For 1 ≤ k ≤ 5, try to solve the Cauchy interpolation problem
    t(i) ≠ 0 and (r/t)(i) = v_i for 0 ≤ i ≤ 4,   gcd(r,t) = 1    (40)
for polynomials r,t ∈ F5 [x] with deg r < k and degt ≤ 5 − k, where the vi are given by the following
table.
i 0 1 2 3 4
vi 1 2 3 2 1
For which values of k is there no solution?
5.38 Let F be a field, u0 , . . ., un−1 ∈ F distinct, v0 , . . ., vn−1 ∈ F, and S = {0 ≤ i < n: vi = 0}. Show
that the Cauchy interpolation problem (20) has no solution if k ≤ #S < n.
5.39 Tabulate all (k, n − k)-Padé approximants to g = x4 + x3 + 3x2 + 1 ∈ F5 [x] for 0 ≤ k ≤ n ≤ 5.
Mark the entries in the table where no approximant exists.
5.40−→ Give all Padé approximants in Q(x) to the exponential function exp(x) = 1 + x + x²/2 + x³/6 + x⁴/24 + · · · modulo x⁵.
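The EEA-based recipe behind Exercises 5.39–5.41 (run the extended Euclidean scheme on (x^n, g) and stop as soon as the remainder degree drops below k, as in Section 5.9) can be sketched in Python/SymPy as follows; the function name pade_eea is ad hoc, the output is not normalized, and the degenerate cases where no approximant exists are ignored.

from sympy import symbols, Poly, QQ

x = symbols('x')

def pade_eea(g, n, k):
    # returns (r, t) with r/t = g mod x^n, deg r < k, deg t <= n - k (when it exists)
    r0, r1 = Poly(x**n, x, domain=QQ), Poly(g, x, domain=QQ)
    t0, t1 = Poly(0, x, domain=QQ), Poly(1, x, domain=QQ)
    while r1.degree() >= k:
        q, rem = r0.div(r1)
        r0, r1 = r1, rem
        t0, t1 = t1, t0 - q * t1
    return r1, t1

# the (1, 3) approximant to 1 + x + x^2 + x^3, i.e. to 1/(1 - x) mod x^4
r, t = pade_eea(1 + x + x**2 + x**3, 4, 1)
print(r.as_expr(), '/', t.as_expr())          # 1 / (1 - x), up to normalization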
5.41 Let F be a field, n ∈ N, g ∈ F[x] of degree less than n, and ℓ ∈ N>0 the number of division
steps in the Euclidean Algorithm for the pair (xn , g).
(i) Show that there are at most ℓ distinct coprime pairs (r,t) ∈ F[x]² such that t ≠ 0 is monic,
    gcd(t, m_i) = 1 and r t^{−1} ≡ v_i mod m_i for 0 ≤ i < l,   deg r < k, and deg t ≤ n − k,    (41)
where t −1 is the modular inverse of t modulo mi (Section 4.2). Let g ∈ F[x] be the polynomial
solution of the system of congruences g ≡ vi mod mi for all i. Furthermore, let r j , s j ,t j ∈ F[x] be the
jth row in the Extended Euclidean Algorithm for m and g, where j is minimal such that deg r j < k.
Prove:
(i) There exist polynomials r,t ∈ F[x] satisfying
where A, B,C, D are points in R2 , i ∈ N, and the parameter u runs through the real interval [i, i + 1].
(These curves, and also similar surfaces, were introduced in the late 1960s by Bézier at the Renault
car company, and by de Casteljau at Citroën; see Bézier (1970) and de Casteljau (1985).) A cubic
Bézier spline is a parametric curve on the real interval [0, . . ., n] which is defined by
for u ∈ [i, i + 1] and 0 ≤ i < n, i.e., it is a Bézier curve on each interval [i, i + 1]. The points
A0 , B0 ,C0 , A1 , . . ., An−1 , Bn−1 ,Cn−1 , An ∈ R2 are the control points of the Bézier spline.
(i) Show that the Bézier spline is continuous and passes through the points A0 , . . ., An .
Exercises 139
(ii) Prove that the Bézier spline is continuously differentiable with respect to the parameter u if
and only if Bi − Ai = Ai −Ci−1 for 1 ≤ i < n. Show that in this case, Bi − Ai is the tangent vector to
the curve at the point Ai for 0 ≤ i < n. What is the tangent vector at the point An ?
(iii) Consider the following control points for n = 4:
i 0 1 2 3 4
Ai (−1.8, 0) (−3.8, 8.5) (0, 12.8) (3.8, 8.5) (1.8, 0)
Bi (−1.8, 2) (−3.8, 11.17) (2, 12.8) (3.8, 5.83)
Ci (−3.8, 5.83) (−2, 12.8) (3.8, 11.17) (1.8, 2)
The corresponding Bézier spline models the inner boundary of the uppercase Greek letter Ω. Plot this
Bézier spline. Is it continuously differentiable? Is it twice continuously differentiable?
(iv) Prove that for arbitrary points A0 , . . ., An ∈ R2 , there exists a Bézier spline passing through
these points that is twice continuously differentiable with respect to the parameter u. Show that there
is a unique such spline if B0 and C0 are prescribed as well. Hint: Exercise 5.35.
The mathematician’s pattern, like a painter’s or the poet’s, must be
beautiful. [. . . ] Beauty is the first test; there is no permanent place
in the world for ugly mathematics.
Godfrey Harold Hardy (1940)
1 Furthermore, it is an error to believe that rigor in proof is an enemy of simplicity. On the contrary we find it
confirmed by numerous examples that the rigorous method is at the same time the simpler and the more compre-
hensible one. The very effort for rigor forces us to find simpler proof methods.
2 The mathematician is perfect only in so far as he is a perfect being, in so far as he perceives the beauty of truth;
only then will he appear to be thorough, transparent, comprehensive, pure, clear, gracious, and even elegant.
6
The resultant and gcd computation
We start this chapter with a typical example illustrating the growth of coefficients
in the Euclidean Algorithm for polynomials over Q. Much of the rest of this
chapter is devoted to getting a handle on this growth. As an application, we obtain
modular algorithms for the gcd in Q[x] and F[x, y] for a field F; these are much
more efficient than the direct computation.
Gauß’ lemma in Section 6.2 illuminates the non-obvious relation between gcds
of integer polynomials in Z[x] and Q[x]. We then introduce the resultant, which
gives control over the Bézout coefficients s and t in the presentation s f + tg =
gcd( f , g). This yields a modular gcd calculation for bivariate polynomials, and,
together with Mignotte’s factor bound, also for integer polynomials. Section 6.10
discusses the more general subresultants, which govern the coefficient growth in
the whole Extended Euclidean Algorithm, and provide a modular approach to the
EEA.
In between, we digress to two applications: computing the intersection points
of two plane algebraic curves, and an unexpectedly efficient way of computing the
gcd of many polynomials.
    ρ_0 r_0 = f,                          ρ_0 s_0 = 1,                          ρ_0 t_0 = 0,
    ρ_1 r_1 = g,                          ρ_1 s_1 = 0,                          ρ_1 t_1 = 1,
    ρ_2 r_2 = r_0 − q_1 r_1,              ρ_2 s_2 = s_0 − q_1 s_1,              ρ_2 t_2 = t_0 − q_1 t_1,
        ⋮                                     ⋮                                     ⋮                      (1)
    ρ_{i+1} r_{i+1} = r_{i−1} − q_i r_i,  ρ_{i+1} s_{i+1} = s_{i−1} − q_i s_i,  ρ_{i+1} t_{i+1} = t_{i−1} − q_i t_i,
        ⋮                                     ⋮                                     ⋮
    0 = r_{ℓ−1} − q_ℓ r_ℓ,                s_{ℓ+1} = s_{ℓ−1} − q_ℓ s_ℓ,          t_{ℓ+1} = t_{ℓ−1} − q_ℓ t_ℓ,
with deg ri+1 < deg ri for all i ≥ 1. Thus ri−1 = qi ri + ρi+1 ri+1 is the division of
ri−1 by ri with remainder ρi+1 ri+1 ; the leading coefficient ρi+1 serves to have a
normalized remainder ri+1 . A basic invariant is ri = si f +ti g. We define the degree
sequence (n0 , n1 , . . . , nℓ ) by ni = deg ri for all i. Then
    n = n_0 ≥ n_1 > n_2 > · · · > n_ℓ ≥ 0.
It is convenient to set ρℓ+1 = 1, rℓ+1 = 0, and nℓ+1 = −∞. The number of arithmetic
operations in F performed by the (Extended) Euclidean Algorithm for f and g is
O(nm) (Theorem 3.16).
In order to get a bound on the number of word operations of Euclid’s algorithm
over F = Q, we need to get a bound on the length of the numbers involved in the
computation. We extend the definition of the length of an integer from Section 2.1
to rational numbers and polynomials with rational coefficients. We bring all coef-
ficients of a polynomial a ∈ Q[x] to a common denominator, and then λ(a) is the
maximal number of words required to encode the denominator or a coefficient of
the numerator of a. More precisely, we use
◦ λ(a) = ⌊(log2 |a|)/64⌋ + 1, when a ∈ Z \ {0}, and λ(0) = 0,
◦ λ(a) = max{λ(b), λ(c)}, if a = b/c ∈ Q \ {0} with b, c ∈ Z and gcd(b, c) = 1,
◦ λ(a) = max{λ(a_0), . . . , λ(a_n), λ(b)}, when a = (∑_{0≤i≤n} a_i x^i)/b ∈ Q[x] with all a_i ∈ Z and b ∈ N≥1 such that gcd(a_0, . . . , a_n, b) = 1.
Thus a can be represented with about λ(a)(2 + deg a) words. Then for a, b ∈ Z[x]
and c, d ∈ Q, we have
    q = x + (a_{n−1} d − b_{m−1} c)/(cd),      λ(q) ≤ λ(a) + λ(b) + 1,
    ρr = a − qb = (acd² − xbcd² − (a_{n−1} d − b_{m−1} c) bd)/(cd²),
    λ(ρr) ≤ λ(a) + 2λ(b) + 3,
and the latter estimate also holds for λ(r) since the cd² in the denominator of ρr and the numerator of 1/ρ cancel. Assuming that λ(a) ≤ λ(b), we see that the
coefficient size grows at most by a factor of about 3 in one division. Some exper-
iments with pseudorandom polynomials of degree n = 10 and with 10, 100, and
1000 decimal digit coefficients indicate that this is essentially sharp: the average
length ratio between the remainder coefficients and those of the input was 2.92,
2.998, and 2.9999, respectively, for 10 experiments each.
In a typical execution of the Euclidean Algorithm, the degrees of all the quotient
polynomials will be 1. From the above worst-case estimate, we find that λ(rℓ ) ∈
O(3ℓ · max{λ( f ), λ(g)}). This looks like bad news: an exponential upper bound on
the size of the gcd and the number of word operations of the Euclidean Algorithm!
In reality, however, the sizes do not grow like that at every step, and we can prove
that the coefficient sizes in the Euclidean Algorithm remain polynomially bounded
in the input size. To prove this non-obvious result, we need a “global view” of the
Euclidean Algorithm provided by the theory of (sub-)resultants. This theory will
give us explicit formulas for the coefficients that appear in the polynomials in the
Euclidean Algorithm. As a bonus, this theory will allow us to compute gcds in
Q[x] using a modular approach, yielding a much more practical algorithm.
The following example illustrates the huge coefficients that actually occur in
the Euclidean Algorithm in Q[x]. It is typical in the sense that for most pairs of
polynomials, with about as many coefficient digits as the degree, a similar growth
of intermediate results occurs.
each step, is run in Section 6.11 on the same example; its intermediate results are
considerably bigger than here. The example also illustrates the phenomenon of in-
termediate expression swell: The 25-digit coefficients of a4 contract to 18 digits
in its normalized version r4 . At the next step, the relation between a5 and r5 is
even more drastic. In the normal case, where all the quotients have degree 1, this
is not a serious problem: the discussion in Section 6.11 implies that then λ(ai ) is
at most about 3λ(ri ). A more important issue is that the upper bound on the size
of the coefficients of the gcd is smaller by about one order of magnitude than the
corresponding bound for the other remainders, even when the gcd is nonconstant.
This follows from the estimates in Section 6.6 and 6.11 below. ✸
The basic question now is: does this algorithm really run in polynomial time?
In other words: do the coefficients that occur have polynomially bounded length?
The naive exponential upper bound and the above example may raise some doubt
about this. But not to worry, all is well! Our proofs of polynomial bounds proceed
in two stages: first for the computation of the gcd in Sections 6.5 and 6.6, and
finally for all results of the EEA in Section 6.11.
Once we have a good bound on the final result, the basic idea to circumvent the
intermediate expression swell is to use a modular approach. When the input poly-
nomials f , g are in Z[x], then we may choose an appropriate prime p ∈ N, compute
gcd( f mod p, g mod p) in F p [x], and recover the gcd from its image modulo p.
printlevel := 2:
for i from 1 to 5 do
  # quotient of r[i-1] by r[i] in F_7[x]; the remainder is assigned to a[i+1]
  q[i] := Quo(r[i - 1], r[i], x, 'a[i + 1]') mod 7;
  a[i + 1] := a[i + 1];
  if (a[i + 1] <> 0) then
    # normalize the remainder by its leading coefficient
    rho[i + 1] := lcoeff(a[i + 1], x);
    r[i + 1] := a[i + 1] / rho[i + 1] mod 7;
  fi;
od;
q1 := x + 6
a2 := x^3 + 4 x^2 + 5 x + 3
ρ2 := 1
r2 := x^3 + 4 x^2 + 5 x + 3
q2 := x + 5
a3 := 5 x^2 + x + 5
ρ3 := 5
r3 := x^2 + 3 x + 1
q3 := x + 1
a4 := x + 2
ρ4 := 1
r4 := x + 2
q4 := x + 1
a5 := 6
ρ5 := 6
r5 := 1
q5 := x + 2
a6 := 0
In the above example, the modular approach has revealed that f and g have
no nonconstant common divisors in Z[x], but does this also imply that they have
no nonconstant common divisors in Q[x]? The answer is yes, but it requires an
important tool, Gauß’ lemma, which we will discuss in Section 6.2 below.
Besides that, the following questions have to be addressed in order to make the
idea from the above example into an algorithm.
◦ How big do we have to choose the modulus p so that we can recover the gcd
from its image modulo p? This requires an upper bound on the size of the
coefficients of the gcd, which is provided by Mignotte’s bound 6.33 in Sec-
tion 6.6. The corresponding question for polynomials with coefficients in F[y]
for a field F is trivial: the degree in y of the gcd is at most that of the input
polynomials.
◦ How do we find the denominators of the monic gcd? If the gcd is constant, as
in Example 6.1, then this is not an issue, but in general a monic nonconstant
gcd will have rational coefficients that are not integers. One solution is rational
number reconstruction, as discussed in Section 5.10. Another possibility is to
multiply the modular gcd by a known multiple of all denominators; the results
of Section 6.2 will provide such a multiple.
◦ Does the approach work for any prime, or are there primes where the degree of
the modular gcd is too large? Unfortunately, there are such “unlucky” primes,
but fortunately, not too many of them. This can be shown by using resultants,
which we discuss in Section 6.3.
Then
    f = (y³ + 3y² + 2y) x³ + (y² + 3y + 2) x² + (y³ + 3y² + 2y) x + (y² + 3y + 2),
    g = (2y³ + 3y² + y) x² + (3y² + 4y + 1) x + (y + 1)
L EMMA 6.5. For f ∈ R[x] and c ∈ R, cont(c f ) = cont(c) · cont( f ) and pp(c f ) =
pp(c) · pp( f ).
The following result, due to Gauß, is the cornerstone for unique factorization of
polynomials over UFDs.
C OROLLARY 6.7.
For f , g ∈ R[x], cont( f g) = cont( f ) cont(g) and pp( f g) = pp( f ) pp(g).
P ROOF. Let h = pp( f g). By Gauß’ lemma, h∗ = pp( f ) pp(g) is primitive. Then cont( f ) cont(g) h∗ = f g = cont( f g) h, by Lemma 6.5, and since cont( f ) cont(g) is normalized and h∗ is primitive, the claim follows. ✷
Lemma 6.5 is just the special case of Corollary 6.7 when g = c is constant.
It is convenient to extend the definition of content and primitive part to poly-
nomials in K[x]. If f = ∑0≤i≤n (ai /b)xi ∈ K[x], with a common denominator b ∈
R \ {0} and all ai ∈ R, then we let cont( f ) = gcd(a0 , . . . , an )/ cont(b) ∈ K and
pp( f ) = f / cont( f ). For example, we have cont(−3x − 9/2) = 3/2 ∈ Q and
pp(−3x − 9/2) = −2x − 3 ∈ Z[x]. Then pp( f ) is a primitive polynomial in R[x],
and Exercise 6.4 shows that Lemma 6.5 and Corollary 6.7 hold for c ∈ K and
f , g ∈ K[x].
We recall the following notions. A nonzero nonunit p of a ring R is prime if
p | ab implies that p | a or p | b, and p is irreducible if p = ab implies that one of
a and b is a unit. Multiplication by units does not change the property of being (or
not being) prime or irreducible. Prime elements are irreducible, and if R is a UFD,
then the two notions coincide (Section 25.2). We can now prove the following
celebrated theorem of Gauß.
P ROOF. Since R is an integral domain, deg( f g) = deg f + deg g holds for any
nonzero polynomials f , g ∈ R[x]. This implies that the units of R[x] are precisely
the units of R, and that a prime p ∈ R is irreducible in R[x].
Let f ∈ R[x] be a nonzero nonunit. Since R is a UFD, cont( f ) can be written as a
product of irreducibles of R, by the above. Let K denote the field of fractions of R.
Then K[x] is a Euclidean domain and therefore a UFD, and pp( f ) = f1 f2 · · · fr
in K[x] with (over K) irreducible nonconstant polynomials f1 , . . . , fr . Extracting
contents, Corollary 6.7 yields the factorization
pp( f ) = pp( f1 ) · · · pp( fr ) (2)
into primitive polynomials in R[x]. Since each pp( fi ) is primitive in R[x] and irre-
ducible in K[x], it is irreducible in R[x]. This proves the existence of a factorization
into irreducibles in R[x].
By the additivity of the degree, every irreducible factor of a constant f ∈ R
belongs to R, and the uniqueness of the factorization of f in R[x] follows from the
one in R. Now we assume that f ∈ R[x] is nonconstant, and let
p1 · · · pk · f1 · · · fr = f = q1 · · · ql · g1 · · · gs
be two factorizations of f into irreducibles, with normalized p1 , . . . , pk , q1 , . . . , ql in
R and nonconstant primitive f1 , . . . , fr , g1 , . . . , gs ∈ R[x]. Then p1 · · · pk = cont( f ) =
q1 · · · ql , by Corollary 6.7. Thus k = l and p1 = q1 , . . . , pk = qk after reordering,
since R is a UFD. Furthermore,
f1 · · · fr = pp( f ) = g1 · · · gs (3)
In particular, since R[x] is a UFD, any two elements of R[x] have a gcd. In
order to have a function gcd on R[x], we extend “lu” to R[x] via lu( f ) = lu(lc( f ))
(Exercise 3.8 (iii)) to define a normal form on R[x]. Then a polynomial in R[x] is
normalized precisely when its leading coefficient is, and gcd( f , g) is the unique
normalized associate in R[x] of all greatest common divisors of f and g, as usual.
Both 5 and 5x + 1 are primes in Z[x], but 5 is a unit in Q[x], while 5x + 1 is a
prime also in Q[x]. More generally, nonconstant polynomials are not units, so that
R× = (R[x])× , and {1, −1} = Z× = (Z[x])× ⊂ Q \ {0} = Q× = (Q[x])× , where R×
is the group of units of a ring R.
C OROLLARY 6.9.
Let R be Z or a field, and n ≥ 0. Then R[x1 , . . . , xn ] is a Unique Factorization
Domain.
C OROLLARY 6.10.
Let R be a UFD with field of fractions K , f , g ∈ R[x], and h the normalized gcd of
f and g in R[x].
(i) The primes of R[x] are the primes of R plus the primitive polynomials in R[x]
that are irreducible in K[x].
(ii) cont(h) = gcd(cont( f ), cont(g)) in R and pp(h) = gcd(pp( f ), pp(g)) in R[x].
In particular, h = gcd(cont( f ), cont(g)) · gcd(pp( f ), pp(g)), and h is primi-
tive if one of f and g is.
(iii) h/ lc(h) ∈ K[x] is the monic gcd of f and g in K[x].
P ROOF. (i) Let p ∈ R[x]. We first assume that p is prime. If p is a constant, then
p is prime in R. Otherwise, it is a primitive polynomial and irreducible in K[x],
since a factorization in K[x] leads to one in R[x], as in (2).
On the other hand, if p is not prime, then a factorization p = uv with u, v 6∈ R×
shows that p is not prime in R, and, if p is primitive and nonconstant, that p is
reducible in K[x].
(ii) The polynomial h divides f , and hence cont(h) divides cont( f ), and, by sym-
metry, it divides cont(g) and hence gcd(cont( f ), cont(g)). On the other hand, this
gcd is in R and a common factor of f and g, hence divides h and then also cont(h).
This proves the first claim. The second one follows similarly, using the fact that
pp(h) divides pp( f ), by Corollary 6.7, and that pp(h) is normalized since h is
(Exercise 3.8 (iv)).
(iii) Since h/ lc(h) is a divisor of f and g in K[x], it also divides their monic
gcd h∗ . On the other hand, f = f ∗ h∗ for some f ∗ ∈ K[x], and taking contents
together with Corollary 6.7 shows that pp(h∗ ) | pp( f ) | f in R[x], and similarly
pp(h∗ ) | g. Thus pp(h∗ ) divides h = gcd( f , g) in R[x], which implies that h∗ and
h/ lc(h) divide each other in K[x], and since both are monic, they are equal. ✷
We note that part (ii) is wrong when h is not normalized: for example, if R = Z, f = g = x, and h = −x, then pp(h) = −x ≠ x = gcd(pp( f ), pp(g)). The following
examples illustrate the difference between gcds in R[x] and K[x].
E XAMPLE 6.3 (continued). With the Euclidean Algorithm in Q[x], we find that
gcd( f , g) = gcd(pp( f ), pp(g)) = x − 1/3 in Q[x]; see the continuation of Example
3.7 on page 58. Hence
The polynomials f and pp( f ) are normalized in Z[x] since their leading coeffi-
cients are positive, but g and pp(g) are not. Both gcd( f , g) and gcd(pp( f ), pp(g))
are normalized. ✸
    gcd(cont( f ), cont(g)) = y + 1,
    gcd(pp( f ), pp(g)) = yx + 1                                                in F5 [y][x] = F5 [x, y],
    gcd( f , g) = gcd(pp( f ), pp(g)) = x + 1/y                                 in F5 (y)[x],
    gcd( f , g) = gcd(cont( f ), cont(g)) · gcd(pp( f ), pp(g)) = (y² + y) x + (y + 1)   in F5 [y][x].
Thus f and pp( f ) are normalized in R[x], while g and pp(g) are not. Both gcd( f , g)
and gcd(pp( f ), pp(g)) are normalized. ✸
We obtain the following algorithm for calculating gcds in Z[x] and F[x, y]. By
Corollary 6.10 (ii) we may assume that the input polynomials are primitive.
T HEOREM 6.12.
The algorithm works correctly as specified.
realize clearly that for gcd calculations the resultant is purely an (indispensable)
conceptual tool and does not enter the algorithms, but only their analysis.
Now let F be a field and f , g ∈ F[x]. The following lemma says that the van-
ishing linear combination (−g) · f + f · g = 0 has the smallest possible coefficient
degrees if and only if gcd( f , g) = 1.
L EMMA 6.13. Let f , g ∈ F[x] be nonzero. Then gcd( f , g) ≠ 1 if and only if there exist s,t ∈ F[x] \ {0} such that s f + tg = 0, deg s < deg g, and deg t < deg f .
be the “linear combination map”. For d ∈ N, we let Pd = {a ∈ F[x]: deg a < d},
with the convention that P0 = {0}. Then ϕ is a linear mapping of infinite-dimen-
sional vector spaces over F. (It is also an F[x]-linear map of F[x]-modules, in the
natural way.) The restriction of ϕ to ϕ0 : Pm × Pn −→ Pn+m is an F-linear mapping
between vector spaces of the same finite dimension, and Lemma 6.13 says the
following.
T HEOREM 6.14.
Let f , g ∈ F[x] be nonzero of degrees n, m, respectively.
For our map ϕ0 between vector spaces of equal (finite) dimension, the following
three properties are equivalent:
◦ ϕ0 is an isomorphism,
◦ ϕ0 is injective (or one-to-one),
◦ ϕ0 is surjective (or onto).
Claim (i) now follows. For (ii), we recall from Lemma 3.15 (b) that (sℓ ,tℓ ) ∈
Pm × Pn . Since ϕ0 is an isomorphism, the solution ϕ0 (sℓ ,tℓ ) = 1 is unique. ✷
with m columns of f j ’s and n columns of g j ’s, and all entries outside the two
“parallelograms” equal to zero. This means that when we write
    s = ∑_{0≤ j<m} y_j x^j,   t = ∑_{0≤ j<n} z_j x^j,   s f + tg = ∑_{0≤ j<n+m} u_j x^j,
This is the central step; we advise the reader to understand it thoroughly. Theorem
6.14 can now be restated as follows.
C OROLLARY 6.15.
Let f , g, n, m be as in Theorem 6.14.
If we divide out the gcd, so that r_0 = f /(x − 1) = x³ − 2x² − 2x and r_1 = g/(x − 1) = x² + x + 1, we have
    r_0 = q_1 r_1 + r_2 = (x − 3) r_1 + 3 · 1,
    r_1 = q_2 r_2 = (x² + x + 1) r_2.
C OROLLARY 6.17.
Let F be a field, and f , g ∈ F[x] nonzero. Then the following are equivalent:
(i) gcd( f , g) = 1,
C OROLLARY 6.20.
Let R be a UFD and f , g ∈ R[x] not both zero. Then gcd( f , g) is nonconstant in
R[x] if and only if res( f , g) = 0 in R.
C OROLLARY 6.21.
Let R be an integral domain and f , g ∈ R[x] nonzero with deg f + deg g ≥ 1. Then
there exist nonzero s,t ∈ R[x] such that s f + tg = res( f , g), deg s < deg g, and
degt < deg f .
When f , g ∈ F[x, y], we write resx ( f , g) for the resultant in F[y] with respect
to x. Symmetrically, there is also a polynomial resy ( f , g) ∈ F[x]. We have the
following bound on degy resx ( f , g), where degy denotes the degree with respect to
the variable y (Section 25.3).
T HEOREM 6.22.
Let f , g ∈ F[x, y] with n = deg_x f , m = deg_x g, and deg_y f , deg_y g ≤ d . Then deg_y res_x( f , g) ≤ (n + m) d .
P ROOF. When we write the determinant resx ( f , g) as the familiar sum of (n + m)!
terms, then each nonzero term has m factors that are coefficients of f , and n factors
that are coefficients of g. Hence the degree of each term is at most md + nd. ✷
|| f ||_2 = (∑_{0≤i≤n} f_i²)^{1/2}. The max-norm is || f ||_∞ = max{| f_i |: 0 ≤ i ≤ n}, and the relation || f ||_∞ ≤ || f ||_2 ≤ (n + 1)^{1/2} || f ||_∞ shows that the two norms differ only by a small factor (Section 25.5).
T HEOREM 6.23.
Let f , g ∈ Z[x], n = deg f , and m = deg g. Then |res( f , g)| ≤ || f ||_2^m · ||g||_2^n .
E XAMPLE 6.24. To get a taste of what can go wrong without further assumptions, we let R = Z and p = 2, and write ā for the reduction of a modulo 2. When f = x + 2 and g = x, then res( f , g) = −2 ≠ 0 and res( f̄ , ḡ ) = 0, as expected. But when f = 4x³ − x and g = 2x + 1, then res( f , g) = 0 and res( f̄ , ḡ ) = res(x, 1) = 1 ≠ 0; in particular, the image of res( f , g) modulo 2 is not res( f̄ , ḡ ). ✸
The reason for the unexpected behavior in the last example is that the two rele-
vant Sylvester matrices are formed in rather different ways. Fortunately, this nui-
sance disappears when p does not divide at least one of the leading coefficients.
P ROOF. We write f = ∑_{0≤ j≤n} f_j x^j, g = ∑_{0≤ j≤m} g_j x^j, with nonzero f_n, g_m and all f_j, g_j ∈ R. If deg f = 0, then both Sylvester matrices Syl( f , g) and Syl( f̄ , ḡ ) are diagonal, with f and f̄ on the diagonal, respectively, and both r and res( f̄ , ḡ ) are nonzero. So let deg f ≥ 1. If ḡ = 0, then res( f̄ , ḡ ) = 0 and each column of g_j’s in the Sylvester matrix Syl( f , g) vanishes modulo I, so that r̄ = 0.
We now assume that ḡ ≠ 0, and let i be the smallest index with ḡ_{m−i} ≠ 0. Then we can partition Syl( f , g) as in Figure 6.1.

[Figure 6.1: the Sylvester matrix Syl( f , g), partitioned into i columns of coefficients of f , m − i further columns of coefficients of f , and n columns of coefficients of g; the lower right (n + m − i) × (n + m − i) block, read modulo I, is the Sylvester matrix Syl( f̄ , ḡ ).]

The lower right submatrix, taken modulo I, is Syl( f̄ , ḡ ). All ḡ_j in the first i rows are zero modulo I, and repeated Laplace expansion (Section 25.5) of r = det Syl( f , g) along the first row yields modulo I that r̄ = f̄_n^i res( f̄ , ḡ ). This proves (i), and the second claim follows from Corollary 6.20. ✷
The conclusion may be false when both leading coefficients vanish modulo I, as
in the second case of Example 6.24.
T HEOREM 6.26.
Let R be a Euclidean domain, p ∈ R prime, and f , g ∈ R[x] nonzero. Furthermore,
let h = gcd( f , g) ∈ R[x], e = deg h, α = lc(h), and assume that p does not divide
b = gcd(lc( f ), lc(g)) ∈ R. A bar denotes reduction modulo p, and we let e∗ = deg gcd( f̄ , ḡ ). Then
(i) α divides b,
(ii) e∗ ≥ e,
(iii) e∗ = e ⇐⇒ α · gcd( f̄ , ḡ ) = h̄ ⇐⇒ p ∤ res( f /h, g/h) in R.
P ROOF. Since h divides f and g in R[x], lc(h) divides lc( f ) and lc(g) in R, and (i) follows. Let u = f /h and v = g/h ∈ R[x]. Then deg h̄ = e, since p ∤ b and by (i), and
    ū h̄ = f̄  and  v̄ h̄ = ḡ     (6)
imply that h̄ divides gcd( f̄ , ḡ ), which shows (ii) and the first equivalence in (iii). (Recall that over a field such as R/⟨p⟩, polynomial gcds are always taken to be monic.)
Now p ∤ b implies that p divides at most one of lc(u) and lc(v), say p ∤ lc(u). Then Lemma 6.25 (ii) implies that p divides res(u, v) if and only if gcd(ū, v̄) ≠ 1 in R/⟨p⟩. From (6), we find that gcd( f̄ , ḡ ) = gcd(ū, v̄) · h̄/α, and this implies the second equivalence in (iii). ✷
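A concrete illustration of Theorem 6.26, using SymPy as a stand-in (this is not code from the text): for f = x − 1 and g = x + 1 we have h = gcd( f , g) = 1 and res( f /h, g/h) = res( f , g) = 2, so p = 2 is the only unlucky prime.

from sympy import symbols, Poly, gcd, resultant

x = symbols('x')
f, g = x - 1, x + 1

print(gcd(f, g))                                             # 1 in Q[x]
print(resultant(f, g, x))                                    # 2
print(gcd(Poly(f, x, modulus=2), Poly(g, x, modulus=2)))     # x + 1: p = 2 is unlucky
print(gcd(Poly(f, x, modulus=5), Poly(g, x, modulus=5)))     # 1: p = 5 is lucky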
We write res_x to indicate that we consider x as the main variable; there is also res_y( f , g) = x² + 1 ∈ F5 [x]. If we now let ā = a(−1) for a ∈ R, corresponding to p = y + 1, then b = gcd(lc_x( f ), lc_x(g)) = y does not vanish modulo p, the image of res( f /h, g/h) is res( f /h, g/h)(−1) ≠ 0, and hence deg gcd( f̄ , ḡ ) = 1. Actually
On the other hand, if ā = a(1) for a ∈ R, then b̄ ≠ 0, the image of res( f /h, g/h) is res( f /h, g/h)(1) = 0, and deg gcd( f̄ , ḡ ) > 1. In fact, gcd( f̄ , ḡ ) = x² + 3x + 2 = ḡ/3. ✸
2. repeat
4. f̄ ←− f mod p,  ḡ ←− g mod p
   call the Euclidean Algorithm 3.14 over R/⟨p⟩ to compute the monic
   v ∈ R[x] with deg_y v < deg_y p and v mod p = gcd( f̄ , ḡ ) ∈ (R/⟨p⟩)[x]
If also the cofactors f /h and g/h are needed, they can easily be obtained as
ppx ( f ∗ ) and ppx (g∗ ). Before computing f ∗ and g∗ in step 5, one will first test
whether the constant coefficient of w divides the constant coefficients of b f and bg,
and go back to step 3 if this test fails. We may compute f ∗ and g∗ as f ∗ ≡ f /v
mod p and g∗ ≡ g/v mod p.
To compute the gcd of non-primitive polynomials, we first compute the gcd of
their contents, then apply the algorithm to their primitive parts, and finally multiply
its result by the gcd of the contents. If the gcd of the constant coefficients of
f and g is smaller than b, then exchanging the roles of the leading and the constant
coefficients decreases the required degree of p.
The remarks above also apply to the modular gcd algorithms 6.34, 6.36, and
6.38 below.
T HEOREM 6.29.
Let f , g be an input, h = gcd( f , g) in R[x], and r = res_x( f /h, g/h) ∈ R = F[y].
Then r is a nonzero polynomial of degree at most 2 n d , the halting condition in step 6 is satisfied if and only if p does not divide r, and then the correct output is returned in step 7. The cost for one iteration of steps 4 through 6 is no more than 48 n²d² + O(nd(n + d)) or O(n²d²) operations in F . If b = 1, then the cost is at most 12 n²d² + O(nd(n + d)). Steps 1 and 7 take O(nd²) operations in F .
P ROOF. We have gcd( f /h, g/h) = 1. Since h and f /h divide f , their degrees in y
are at most degy f ≤ d, and similarly for g/h, and Corollary 6.20 and Theorem 6.22
yield the first claim. Moreover, degy b < degy p, and hence p ∤ b. We first assume
that p ∤ r, and let α = lc(h) ∈ R. Then Theorem 6.26 implies that αv ≡ h mod p.
Moreover, α | b, and hence w ≡ bv ≡ (b/α)h mod p. Both w and (b/α)h have
degree in y less than degy p, whence they are equal. Similarly, we find that f ∗ =
b f /w and g∗ = bg/w, and the degree conditions in step 6 are satisfied since all
congruences in (7) are in fact equalities. Now h is primitive, by Corollary 6.10,
and the algorithm returns the correct result ppx (w) = ppx ((b/α)h) = h in step 7
since h, α, and b are all normalized.
On the other hand, if p | r, then Theorem 6.26 implies that degx w = degx v >
degx h. If the degree conditions in step 6 were true, then the congruences in (7)
would be equalities, and pp(w) would be a common divisor of f and g of higher
degree in x than degx h. This contradiction finishes the correctness proof.
Computationally, nothing happens in reducing f and g modulo p in step 4. The
cost for the Euclidean Algorithm is at most 2n2 + O(n) additions and multiplica-
tions in R/hpi, plus at most n + 2 modular inversions, by Theorem 3.16. The cost
for one addition or multiplication in this residue class ring is at most 4(deg p)2 +
O(deg p) operations in F, by Corollary 4.6. Since deg p = d + 1 + deg b ≤ 2d + 1,
the total cost for step 4 is at most 32n2 d 2 + O(nd(n + d)) operations in F (the
cost for the modular inversions is subsumed by the “O” term), and only at most
8n2 d 2 + O(nd(n + d)) if b = 1.
By Section 2.4, the cost for the three multiplications by leading coefficients
and the two modular divisions in step 5 is at most 4 degx w · (n − degx w) + O(n)
additions and multiplications modulo p. Since m(n − m) ≤ n2 /4 for all m ∈ R,
this amounts to at most n2 + O(n) modular operations or 16n2 d 2 + O(nd(n + d))
operations in F, and only 4n2 d 2 + O(nd(n + d)) if b = 1. Steps 1 and 7 use at most
n + 1 gcds and divisions of polynomials in F[y] of degree at most 2d, or O(nd 2 )
operations in F. ✷
We have ignored the cost for finding p in step 3. For a finite field F = Fq , we
will discuss this in Section 14.9: Corollary 14.44 implies that this can be done
with an expected number of O∼ (d 2 log q) operations in Fq , and that the expected
number of iterations of the algorithm is at most two if d ≥ 4 + 2 log2 n. Here the
O∼ notation ignores logarithmic factors (Section 25.7).
is trivial and quite sufficient. Over Z, we could use the subresultant bound of
Theorem 6.52 below, but we now derive a much better bound. It actually depends
only on one argument of the gcd, say f , and is valid for all factors of f . We will
use this again for the factorization of f in Chapter 15.
We extend the 2-norm to a complex polynomial f = ∑_{0≤i≤n} f_i x^i ∈ C[x] by || f ||_2 = (∑_{0≤i≤n} | f_i |²)^{1/2} ∈ R, where |a| = (a · ā)^{1/2} ∈ R is the norm of a ∈ C and ā is the complex conjugate of a. We will derive a bound for the norm of factors
of f in terms of || f ||2 , that is, a bound B ∈ R such that any factor h ∈ Z[x] of f
satisfies ||h||2 ≤ B. One might hope that we can take B = || f ||2 , but this is not
the case. For example, let f = xn − 1 and h = Φn ∈ Z[x] be the nth cyclotomic
polynomial (Section 14.10). Thus Φn divides xn − 1, and the direct analog of (8)
would say that each coefficient of Φn is at most 1 in absolute value, but for ex-
ample Φ105 , of degree 48, contains the term −2x7 . In fact, the coefficients of Φn
are unbounded in absolute value if n −→ ∞, and hence this is also true for ||h||2 .
Worse yet, for infinitely many integers n, Φn has a very large coefficient, namely
larger than exp(exp(ln 2 · ln n/ lnln n)), where ln is the logarithm in base e; such a
coefficient has word length somewhat less than n. It is not obvious how to control
the coefficients of factors at all, and it is not surprising that we have to work a little
bit to establish a good bound.
L EMMA 6.30. For f ∈ C[x] and z ∈ C, we have ||(x − z) f ||_2 = ||(z̄x − 1) f ||_2 .
= ||(z̄x − 1) f ||_2². ✷
Let
    f = ∑_{0≤i≤n} f_i x^i = f_n ∏_{1≤i≤n} (x − z_i),
P ROOF. We arrange the roots so that |z_1|, . . . , |z_k| > 1 and |z_{k+1}|, . . . , |z_n| ≤ 1 for some k ∈ {0, . . . , n}, so that M( f ) = | f_n · z_1 · · · z_k |. Let
    g = f_n ∏_{1≤i≤k} (z̄_i x − 1) ∏_{k<i≤n} (x − z_i) = g_n x^n + · · · + g_0 ∈ C[x].
Then
    M( f )² = | f_n z_1 · · · z_k |² = |g_n|² ≤ ||g||_2² = || (g/(z̄_1 x − 1)) · (x − z_1) ||_2² = · · ·
            = || (g/((z̄_1 x − 1) · · · (z̄_k x − 1))) · (x − z_1) · · · (x − z_k) ||_2² = || f ||_2²,
It is convenient to use also the 1-norm || f ||1 = ∑0≤i≤n | fi |, so that || f ||∞ ≤ || f ||2 ≤
|| f ||1 ≤ (n + 1)|| f ||∞ .
T HEOREM 6.32.
If h = ∑_{0≤i≤m} h_i x^i ∈ C[x] of degree m divides f = ∑_{0≤i≤n} f_i x^i ∈ C[x] of degree n ≥ m, then
    ||h||_2 ≤ ||h||_1 ≤ 2^m M(h) ≤ 2^m |h_m / f_n| · || f ||_2 .
by the sum formula for the binomial coefficients and Landau’s inequality. ✷
(i) ||g||_∞ ||h||_∞ ≤ ||g||_2 ||h||_2 ≤ ||g||_1 ||h||_1 ≤ 2^{m+k} || f ||_2 ≤ (n + 1)^{1/2} 2^{m+k} || f ||_∞ ,
(ii) ||h||_∞ ≤ ||h||_2 ≤ 2^k || f ||_2 ≤ 2^k || f ||_1 and ||h||_∞ ≤ ||h||_2 ≤ (n + 1)^{1/2} 2^k || f ||_∞ .
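A quick numerical sanity check of part (ii) on the cyclotomic example mentioned above (illustration only, in Python/SymPy; not code from the text): every factor h of x^105 − 1 in Z[x] indeed satisfies ||h||_2 ≤ 2^{deg h} ||x^105 − 1||_2, even though Φ105 has a coefficient of absolute value 2.

from math import sqrt
from sympy import symbols, Poly, factor_list

x = symbols('x')
f = Poly(x**105 - 1, x)
norm_f = sqrt(float(sum(c * c for c in f.all_coeffs())))

_, factors = factor_list(f)
for h, _multiplicity in factors:
    h = Poly(h, x)
    norm_h = sqrt(float(sum(c * c for c in h.all_coeffs())))
    assert norm_h <= 2 ** h.degree() * norm_f
    print(h.degree(), max(abs(c) for c in h.all_coeffs()))   # the degree-48 factor shows a 2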
Suppose that the polynomials f , g ∈ Z[x] have degrees n = deg f ≥ deg g and max-norm || f ||_∞, ||g||_∞ at most A. Then the max-norm of gcd( f , g) ∈ Z[x] is at most (n + 1)^{1/2} 2^n A, by Corollary 6.33. We now have the following algorithm for computing a gcd in Z[x], completely analogous to Algorithm 6.28.
2. repeat
4. f ←− f mod p, g ←− g mod p
call the Euclidean Algorithm 3.14 over Z p to compute the monic v in
Z[x] with ||v||∞ < p/2 such that v mod p = gcd( f , g) ∈ Z p [x]
In step 3, we need primes satisfying certain conditions. We do not yet have the
tools to solve this task, and postpone its discussion to Section 18.4. The following
is analogous to Theorem 6.29.
T HEOREM 6.35.
Let h be the normalized gcd of f and g in Z[x], so that lc(h) > 0. Then r = res( f /h, g/h) is a nonzero integer with |r| ≤ (n + 1)^n A^{2n}, the halting condition in step 6 is true if and only if p does not divide r, and then the output in step 7 is correct. The cost for one execution of steps 4 and 5 is O(n²(n² + log² A)) word operations, and steps 1 and 7 take O(n(n² + log² A)) word operations.
P ROOF. For the correctness, it is sufficient to see that the condition in step 6
holds if and only if pp(w) = h. If the condition holds, then || f ∗ w||∞ ≤ || f ∗ w||1 ≤
|| f ∗ ||1 ||w||1 ≤ B < p/2, and ||b f ||∞ < p/2 and f ∗ w ≡ b f mod p imply that f ∗ w =
b f . Similarly, we find g∗ w = bg. Thus w | gcd(b f , bg), and Theorem 6.26 (ii) im-
plies that deg w = deg gcd(b f , bg), and hence pp(w) = gcd( f , g) since both poly-
nomials are normalized.
On the other hand, if pp(w) = gcd( f , g) with the w calculated in step 5, then
w divides b f , Mignotte’s bound 6.33 shows that ||b f /w||∞ ≤ B < p/2, and hence
the congruence f ∗ ≡ b f /w mod p is an equality. Similarly, we find g∗ = bg/w,
and another application of Corollary 6.33 implies the condition in step 6. Exercise
6.25 shows that p ∤ r if and only if pp(w) = h.
With k = deg h, we have || f /h||_2, ||g/h||_2 ≤ (n + 1)^{1/2} 2^{n−k} A, again by Corollary 6.33, and Theorem 6.23 gives |r| ≤ 4^{n²} (n + 1)^n A^{2n}; Exercise 6.24 yields the better bound stated in the theorem. Step 4 takes O(n²) arithmetic operations in Z_p, and the cost for each of these is O(log² p) word operations. Now log p ≤ log(4B) ∈ O(n + log A), whence step 4 uses O(n²(n² + log² A)) word operations, and the same bound holds for the divisions in step 5. Steps 1 and 7 take O(n) gcd’s and divisions on integers of length O(n + log A), or O(n(n² + log² A)) word operations. ✷
In Section 18.4, we show that we can find a random number p between 2B and 4B such that p is prime and p ∤ r with probability at least 1/2 by a probabilistic algorithm using O∼(log³ B) or O∼(n³ + log³ A) word operations (Corollary 18.11). Then the expected number of iterations of the algorithm is at most two.
w(x, u) = b(u)vu ,
In practice, one will choose the points from F adaptively, starting with about l
or even fewer elements in S, remove “unlucky” points that are detected in steps 4,
5, or 7 from S, and add some new random points to S if the condition in step 7
is violated. If the gcd is constant, then only one “lucky” point is sufficient to
detect this. The analysis of the above algorithm is somewhat easier, though. These
remarks also apply to Algorithm 6.38 below.
T HEOREM 6.37.
Algorithm 6.36 correctly computes the gcd of f and g. One iteration of the loop uses at most 10 n²d + 36 nd² + O((n + d)d) arithmetic operations in F , and only 5 n²d + 13 nd² + O((n + d)d) if b = 1. If d ≥ 1 and we choose S in step 3 as a uniform random subset with 2l elements of a fixed finite set U ⊆ F of cardinality #U ≥ (4n + 2)d , then the expected number of iterations is at most 2. The cost for steps 1 and 8 is at most 10 nd² + O(nd) operations in F , or even (5/2) nd² + O((n + d)d) if b = 1.
By increasing the size of U, the failure probability in a single run can be re-
duced and the expected number of iterations of the algorithm can be brought down
arbitrarily close to one. A variant of the algorithm is analyzed in Section 24.3.
The running time of the small primes modular gcd algorithm is better by about
one order of magnitude than for the big prime variant when n ≈ d. If fast polyno-
mial arithmetic, as described in Part II, is used, then the cost even drops to O∼ (nd)
(Corollary 11.12).
When F does not have sufficiently many elements, say when F = F2 , then we
have a problem in step 3. This can be circumvented by either making a suit-
able field extension, which increases all timings by a factor of O(log2 (nd)) (Exer-
cise 6.32), or by choosing nonlinear moduli.
Here is the analogous algorithm for Z[x]. We denote the natural (base e) log-
arithm by ln.
2. repeat
4. S ←− {p ∈ S: p ∤ b}
for each p ∈ S call the Euclidean Algorithm 3.14 over Z_p to compute the monic v_p ∈ Z[x] with coefficients in {0, . . . , p − 1} such that v_p mod p = gcd( f̄ , ḡ ) ∈ Z_p[x], where the bar indicates reduction of each coefficient modulo p
5. e ←− min{deg v p : p ∈ S}, S ←− {p ∈ S: deg v p = e}
if #S ≥ l then remove #S − l elements from S else goto 3
6. call the Chinese Remainder Algorithm 5.4 to compute each coeffi-
cient of the unique polynomials w, f ∗ , g∗ ∈ Z[x] with max-norms less
than (∏ p∈S p)/2 and
for all p ∈ S
7. until || f ∗ ||1 ||w||1 ≤ B and ||g∗ ||1 ||w||1 ≤ B
8. return pp(w)
T HEOREM 6.39.
Algorithm 6.38 works correctly. One execution of steps 4 through 7 can be performed with O(n(n² + log² A)(log n + loglog A)²) word operations, and the same estimate holds for steps 1 and 8.
P ROOF. Correctness follows as in the proof of Theorem 6.35. For the running
time estimate, we first note that log p ∈ O(log k) for each prime p ∈ S. In step 4,
the cost per prime p is O(n log A · log k) word operations for reducing b and all
coefficients of f and g modulo p, and O(n2 ) operations in Z p or O(n2 log2 k) word
operations for the gcd, a total of O(n(n log k + log A)l log k) word operations. In
step 6, we perform two divisions with remainder f /v p and g/v p modulo p for
each p ∈ S, taking O(n2 log2 k) word operations, and then apply the Chinese Re-
mainder Algorithm to each of the at most 2n + 2 coefficients of w, f ∗ , and g∗ .
We have log ∏ p∈S p = ∑ p∈S log p ∈ O(l log k), Theorem 5.8 implies that the cost
for each coefficient is O(l 2 log2 k) word operations, and the cost for all coeffi-
cients is O(nl 2 log2 k). The cost for steps 1 and 8 is as in Theorem 6.35. We have
l ∈ O(n + log A) and log k ∈ O(log n + loglog A), and the claims follow. ✷
As in the polynomial case, the cost estimate for the small primes algorithm is
smaller by about one order of magnitude as for the big prime variant. If we use
single precision primes, then the cost is about O(nl(n + l)) word operations. In
Section 11.1, we will show that the cost drops to O∼ (n2 + n log A) when using the
fast methods for polynomial and integer arithmetic from Part II. In Section 18.4,
we show that the first k primes p1 = 2, . . . , pk can be computed deterministically
by the sieve of Eratosthenes, taking O(k log2 k loglog k) or O∼ (n(n + log A)) word
operations, and that each of them is at most 2k ln k. The value k is an upper bound
on 2 log2 |b res( f /h, g/h)|, by Theorem 6.35, and this guarantees that at least k/2
of our k primes do not divide b res( f /h, g/h). By Theorem 6.26, at least half of
the primes p1 , . . . , pk are “lucky”. We have 2l ≤ k, and if we choose the set S as
a uniform random subset with 2l elements of {p1 , . . . , pk } in step 3, then Exercise
6.31 shows that at least l of the primes in S are lucky with probability no less than
1/2. As in the bivariate case, the condition in step 7 is satisfied if and only if at
least l primes are lucky, and the expected number of iterations of the algorithm is
at most two.
In practice, one would use an adaptive approach as described for the bivariate
case, in particular since the Mignotte bound on the coefficients of h is often too
large. We present running times of such an implementation in Section 6.13.
The historical purpose of the resultant was to solve geometric problems by elim-
ination of variables. As an example, we want to determine the common roots of
two polynomials in two variables, or, equivalently, the intersection of two plane
curves. Suppose we are given f , g ∈ F[x, y], where F is a field, and want to inter-
sect the two plane curves
X = {(a, b) ∈ F^2 : f (a, b) = 0},   Y = {(a, b) ∈ F^2 : g(a, b) = 0}.
Here we take f = y^2 + x^2 − 1, so that X is the unit circle, and g = 4y + 3x, so that Y is a line through the origin.
Bézout’s theorem says that there are deg f · deg g = 2 intersection points; they are
depicted in Figure 6.2 (black). We might compute them by solving g = 0 for y and
plugging this into f = 0, but let us proceed systematically to illustrate the resultant
method. We have
\[
  \operatorname{res}_y(f, g) = \det\begin{pmatrix} 1 & 4 & 0 \\ 0 & 3x & 4 \\ x^2 - 1 & 0 & 3x \end{pmatrix} = 25x^2 - 16,
\]
and the projection Z of X ∩ Y onto the x axis consists of the two zeroes Z =
{4/5, −4/5} of resy ( f , g) (red). We obtain the corresponding values for y by tak-
ing gcds:
\[
  \gcd\Bigl( f\bigl(\tfrac{4}{5}, y\bigr),\, g\bigl(\tfrac{4}{5}, y\bigr) \Bigr) = \gcd\Bigl( y^2 - \tfrac{9}{25},\, 4y + \tfrac{12}{5} \Bigr) = y + \tfrac{3}{5},
\]
\[
  \gcd\Bigl( f\bigl(-\tfrac{4}{5}, y\bigr),\, g\bigl(-\tfrac{4}{5}, y\bigr) \Bigr) = \gcd\Bigl( y^2 - \tfrac{9}{25},\, 4y - \tfrac{12}{5} \Bigr) = y - \tfrac{3}{5},
\]
and hence the two intersection points are
\[
  X \cap Y = \Bigl\{ \bigl( \tfrac{4}{5}, -\tfrac{3}{5} \bigr),\; \bigl( -\tfrac{4}{5}, \tfrac{3}{5} \bigr) \Bigr\}. \qquad ✸
\]
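The whole computation is easy to reproduce in MAPLE; the following sketch assumes the circle f and the line g as above.
f := y^2 + x^2 - 1:  g := 4*y + 3*x:
r := resultant(f, g, y);                     # 25*x^2 - 16
Z := [solve(r = 0, x)];                      # the projections 4/5 and -4/5
gcd(subs(x = 4/5, f), subs(x = 4/5, g));     # a constant multiple of y + 3/5
gcd(subs(x = -4/5, f), subs(x = -4/5, g));   # a constant multiple of y - 3/5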
EXAMPLE 6.41. We consider the two plane curves X, Y ⊆ C^2 given by the two
polynomials
Exchanging x and y corresponds to a swap of f and g, and hence the whole situa-
tion is symmetric with respect to this exchange.
Before any calculations, let us look at a picture, in Figure 6.3 on page 174. This
is easy to generate with M APLE’s implicitplot command. The projection Z of
X ∩Y onto the x-axis is readily calculated: the resultant
[Figure 6.2: the curves X and Y, their two intersection points, and the projection Z of X ∩ Y onto the x-axis.]
FIGURE 6.3: The three curves f = 0 (blue), g = 0 (green), and f + g = 0 (pink), and the
projection of their intersection points to the x-axis (red).
and
\[
  \gcd\Bigl( f\Bigl(\frac{1 \pm \sqrt{15}\,i}{2}, y\Bigr),\, g\Bigl(\frac{1 \pm \sqrt{15}\,i}{2}, y\Bigr) \Bigr) = y - \frac{1 \mp \sqrt{15}\,i}{2}.
\]
one observes that in f + g the terms of degree 3 cancel, and in fact f + g = 0 is the
equation of a circle, the pink curve in Figure 6.3.
Bézout’s theorem says that X ∩Y consists of 3 · 3 = 9 points. We only found six
of them. Where are the others? This book’s margin is too narrow to contain them,
because they lie at infinity! ✸
This linear polynomial has a root if and only if either the leading coefficient vp−uq
is nonzero (then X ∩Y consists of one point), or if it and vr − wq both vanish (then
X = Y ). ✸
The general theory of linear algebra generalizes this well-known criterion for
simultaneous solvability of two linear equations in two variables. In a similar way,
geometric elimination theory tries to generalize our curve intersection method to
higher dimensions. This is a much more difficult problem, and the current algo-
rithmic methods are feasible only for a fairly small number of variables. We give
an introduction to one successful method in Chapter 21: Gröbner bases.
We give a further application of resultants from the theory of algebraic field
extensions. Suppose we have two elements α, β in an algebraic extension E of a
field F, with minimal polynomials f , g ∈ F[x], respectively (Section 25.3). How
can we find the minimal polynomial h of α + β ? Since (α + β , β ) ∈ E 2 is a com-
mon zero of g(y) and of f (x − y), the resultant r = resy ( f (x − y), g(y)) ∈ F[x] is
nonconstant and has α + β as a root. Thus h is a factor of r.
EXAMPLE 6.43. We let F = Q, α = i = √−1, β = √3. Then f = x^2 + 1, g =
x^2 − 3, f (x − y) = y^2 − 2xy + x^2 + 1, and
\[
  r = \operatorname{res}_y(f(x - y), g(y)) = \det\begin{pmatrix} 1 & 0 & 1 & 0 \\ -2x & 1 & 0 & 1 \\ x^2 + 1 & -2x & -3 & 0 \\ 0 & x^2 + 1 & 0 & -3 \end{pmatrix} = x^4 - 4x^2 + 16.
\]
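This resultant is quickly checked in MAPLE (a sketch); since r turns out to be irreducible over Q, it is in fact the minimal polynomial of α + β.
f := x^2 + 1:  g := x^2 - 3:
r := resultant(subs(x = x - y, f), subs(x = y, g), y);   # x^4 - 4*x^2 + 16
irreduc(r);                                              # true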
(i) If r is not the zero polynomial, then r has at most ds^{n−1} zeroes in S^n .
PROOF. (i) We prove the claim by induction on n. The case n = 1 is clear, since
a nonzero univariate polynomial of degree at most d over an integral domain has
at most d zeroes (Lemma 25.4). For the induction step, we write r as a polynomial
in xn with coefficients in x1 , . . . , xn−1 : r = ∑0≤i≤k ri x_n^i with ri ∈ R[x1 , . . . , xn−1 ] for
0 ≤ i ≤ k and rk ≠ 0. Then deg rk ≤ d − k, and by the induction hypothesis, rk has
at most (d − k)s^{n−2} zeroes in S^{n−1} , so that there are at most (d − k)s^{n−1} common
zeroes of r and rk in S^n . Furthermore, for each a ∈ S^{n−1} with rk (a) ≠ 0, the uni-
variate polynomial ra = ∑0≤i≤k ri (a)x_n^i ∈ R[xn ] of degree k has at most k zeroes, so
that the total number of zeroes of r in S^n is bounded by
\[
  (d - k)s^{n-1} + k\,s^{n-1} = d\,s^{n-1}.
\]
For the probabilistic algorithm below, we assume that we have a finite set S ⊆ F
and a “random element generator for S”, which produces a uniform random mem-
ber of S. Instead of computing many gcds, it just uses one.
1. choose a3 , . . . , an ∈ S independently and uniformly at random
2. g ←− f2 + ∑3≤i≤n ai fi
3. return gcd( f1 , g)
THEOREM 6.46.
Suppose that deg fi ≤ d for each i, and h∗ = gcd( f1 , . . . , fn ). Then the algorithm
uses at most 2(n − 2)(d + 1) + 2d 2 + O(d) operations in F , h∗ divides h, and
prob{h ≠ h∗ } ≤ d/#S.
PROOF. The cost estimate is immediate from Theorem 3.16. Since h∗ divides
f1 , . . . , fn , it divides g and gcd( f1 , g) = h. It remains to establish the bound on the
error probability.
Dividing each fi by h∗ if necessary, we may assume that gcd( f1 , . . . , fn ) = 1, and
also that f1 ≠ 0. Let A3 , . . . , An be new indeterminates over F(x), R = F[A3 , . . . , An ],
K = F(A3 , . . . , An ) the field of fractions of R, G = f2 + ∑3≤i≤n Ai fi ∈ R[x], and
r = resx ( f1 , G) ∈ R. Then r is a polynomial in A3 , . . . , An of degree at most d, and
Lemma 6.25, applied to the ideal I = ⟨A3 − a3 , . . . , An − an ⟩ for which R/I ≅ F,
shows that
In order to apply Lemma 6.44, we have to show that r is not the zero polynomial.
Let u be a common divisor of f1 and G in R[x]. Since u divides f1 , its coefficients
lie in the splitting field E of f1 over F. But E[x] ∩ K[x] = F[x], and hence u ∈ F[x].
If we think of G as a linear polynomial in A3 , . . . , An with coefficients in F[x], then
u | G implies that u divides the coefficients of G in that representation, so that u
divides f1 , . . . , fn . Since gcd( f1 , . . . , fn ) = 1, it follows that u ∈ F. Therefore the
gcd of f1 and G in R[x] is a constant. By Corollary 6.20, r is not the zero element
of R, and Lemma 6.44 yields the bound on the error probability. ✷
The 2d 2 + O(d) from the time bound can be replaced by O(d log2 d · loglog d)
when using the fast Euclidean Algorithm from Chapter 11. The dominating cost
of about dn for Algorithm 6.45 is unavoidable, since this is the input size.
In practice, one would choose f1 to have minimal degree among f1 , . . . , fn . To
reduce the error probability to zero, we can in addition compute the remainders
f1 rem h, . . . , fn rem h; this is somewhat cheaper than n gcds, and h = h∗ if and
only if all remainders are zero. In the rare event that they are not, one can rerun the
algorithm with h and these remainders. This is particularly useful for computing
the primitive part of a bivariate polynomial with respect to one variable, since then
the quotients f1 /h∗ , . . . , fn /h∗ are needed anyway.
Using the Extended Euclidean Algorithm in step 3 yields a representation of h as
a linear combination of f1 , . . . , fn (Exercise 6.38). Using Algorithm 6.45, one can
also compute the least common multiple of several polynomials (Exercise 6.39).
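A minimal MAPLE sketch of the idea: one gcd with a random linear combination, followed by the zero-remainder certificate described above. The polynomials in F and the range 0..99 for the random multipliers are made up for illustration.
h := x^2 + 1:
F := [expand((x + 2)*h), expand((x - 5)*h), expand((x^2 + x + 1)*h), expand((3*x - 1)*h)]:
a := [seq(rand(0 .. 99)(), i = 3 .. nops(F))]:     # random multipliers a_3, ..., a_n
g := F[2] + add(a[i - 2]*F[i], i = 3 .. nops(F)):
hstar := gcd(F[1], g);                             # equals x^2 + 1 with high probability
map(q -> rem(q, hstar, x), F);                     # all remainders zero certifies the gcd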
6.10. Subresultants
In this section, we extend the resultant theory—which governs the gcd—to the
subresultants which cover all results of the Extended Euclidean Algorithm. As
before, this leads to efficient modular methods, but now for the whole algorithm.
The reader only interested in efficient gcd algorithms may skip this and proceed
directly to the implementation report in Section 6.13.
So now let F be an arbitrary field, and f , g ∈ F[x] nonzero of degrees n ≥ m,
respectively. We use the notation for the results of the Extended Euclidean Algo-
rithm, as in (1) on page 141, and ni = deg ri for 0 ≤ i ≤ ℓ + 1, with rℓ+1 = 0 and
deg rℓ+1 = −∞.
THEOREM 6.47.
Let 0 ≤ k ≤ m ≤ n. Then k does not appear in the degree sequence if and only if
there exist s,t ∈ F[x] satisfying
s ≠ 0,   deg s < m − k,   deg t < n − k,   and   deg(s f + tg) < k.   (11)
PROOF. “=⇒”: Suppose that k does not appear in the degree sequence. Then
there exists an i with 2 ≤ i ≤ ℓ + 1 such that ni < k < ni−1 . We claim that s = si
and t = ti do the job. We have s f + tg = ri , and deg ri = ni < k. Furthermore, from
Lemma 3.15 (b) we have
The case i = ℓ + 1 gives s = g/rℓ and t = − f /rℓ , where k < nℓ and rℓ+1 = 0.
“⇐=”: Suppose there exist s,t ∈ F[x] satisfying (11). The Uniqueness Lemma
5.15 implies that there exist i ∈ {1, . . . , ℓ + 1} and α ∈ F[x] \ {0} such that t = αti
and r = s f + tg = αri . Then from Lemma 3.15 (b) we find
Together these imply that ni < k < ni−1 , so that k is between two consecutive
remainder degrees and does not occur in the degree sequence. ✷
As we did for the resultant, we now restate Theorem 6.47 in the language of
linear algebra. The reader should keep comparing our development with the ma-
terial about the resultant in Section 6.3, which is just the special case k = 0. We
recall that Pd ⊆ F[x] denotes the vector space of all polynomials of degree less
than d ∈ N. For 0 ≤ k ≤ m, we consider the restriction of the map ϕ from (4) to
Pm−k × Pn−k . These polynomials are mapped to Pn+m−k . But now
This is now a linear map between spaces of the same dimensions. Then Theorem
6.47 becomes the following.
COROLLARY 6.48.
Let 0 ≤ k ≤ m ≤ n, and 1 ≤ i ≤ ℓ + 1.
We have used the fact that s ≠ 0 and ϕk (s,t) = 0 imply t ≠ 0. This proves (i).
For (ii), we note that if k = ni < n, then si ∈ Pm−k and ti ∈ Pn−k satisfy ϕk (si ,ti ) = 1.
Since ϕk is an isomorphism, this implies the claim. ✷
then
Sk · (ym−k−1 , . . . , y0 , zn−k−1 , . . . , z0 )T = (un+m−k−1 , . . . , uk )T
where T denotes transposition. Again, the reader is advised to understand this rela-
tion carefully. We have immediately the following consequence of Corollary 6.48.
COROLLARY 6.49.
Let 0 ≤ k ≤ m ≤ n, and 1 ≤ i ≤ ℓ + 1.
(i) k appears in the degree sequence ⇐⇒ det Sk ≠ 0.
(ii) If k = ni < n, and y0 , . . . , ym−k−1 , z0 , . . . , zn−k−1 ∈ F form the unique solution
to
Sk · (ym−k−1 , . . . , y0 , zn−k−1 , . . . , z0 )T = (0, . . . , 0, 1)T , (12)
then si = ∑_{0≤ j<m−k} y_j x^j and ti = ∑_{0≤ j<n−k} z_j x^j .
EXAMPLE 6.1 (continued). The following MAPLE code computes the subresul-
tants of the two polynomials from Example 6.1.
f := 824 x5 − 65 x4 − 814 x3 − 741 x2 − 979 x − 764
g := 216 x4 + 663 x3 + 880 x2 + 916 x + 617
with(LinearAlgebra):
S[0] := Transpose(SylvesterMatrix(f, g, x));
S0 :=
  [  824     0     0     0   216     0     0     0     0 ]
  [  -65   824     0     0   663   216     0     0     0 ]
  [ -814   -65   824     0   880   663   216     0     0 ]
  [ -741  -814   -65   824   916   880   663   216     0 ]
  [ -979  -741  -814   -65   617   916   880   663   216 ]
  [ -764  -979  -741  -814     0   617   916   880   663 ]
  [    0  -764  -979  -741     0     0   617   916   880 ]
  [    0     0  -764  -979     0     0     0   617   916 ]
  [    0     0     0  -764     0     0     0     0   617 ]
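A hypothetical continuation of this session: the 0th subresultant is the resultant itself, so the determinant of S[0] can be compared with MAPLE's built-in resultant.
sigma[0] := Determinant(S[0]);
resultant(f, g, x);            # the built-in resultant, for comparison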
THEOREM 6.50.
Let f , g ∈ Z[x], n = deg f , m = deg g, and 0 ≤ k ≤ min{n, m}. Then
THEOREM 6.51.
Let f , g ∈ F[x, y] with n = degx f , m = degx g, degy f , degy g ≤ d , and 0 ≤ k ≤
min{n, m}. Then degy σk ≤ (n + m − 2k)d .
THEOREM 6.52.
Let f , g ∈ Z[x] have degrees n ≥ m and max-norm || f ||∞ , ||g||∞ at most A, and let
δ = max{ni−1 − ni : 1 ≤ i ≤ ℓ} be the maximal degree difference of consecutive
remainders. The results ri , si ,ti of the Extended Euclidean Algorithm 3.14 for f and
g in Q[x] have numerators and denominators (in lowest terms) absolutely bounded
by B = (n + 1)n An+m . The corresponding bound for qi and ρi is C = (2B)δ+2 . The
algorithm can be performed with O(n3 m δ 2 log2 (nA)) word operations.
PROOF. Let 2 ≤ i ≤ ℓ and ni = deg ri . In the EEA, si and ti form the unique solution
to the system (12) of linear equations, so that σni si , σni ti , and σni ri = σni si f + σni ti g
are in Z[x], and by Cramer’s rule 25.6 and Hadamard’s inequality 16.6 we have
\[
  \sigma_{n_i}^{k+1} (\sigma_{n_{i-1}} r_{i-1}) = (\sigma_{n_i}^{k} \sigma_{n_{i-1}} q_i) \cdot (\sigma_{n_i} r_i) + (\sigma_{n_i}^{k+1} \sigma_{n_{i-1}} \rho_{i+1} r_{i+1}), \qquad (13)
\]
where the four terms in parentheses are in Z[x]. By Exercise 6.44, we have
\[
  \|\sigma_{n_i}^{k} \sigma_{n_{i-1}} q_i\|_\infty \le \|\sigma_{n_{i-1}} r_{i-1}\|_\infty \cdot \bigl(\|\sigma_{n_i} r_i\|_\infty + |\sigma_{n_i}|\bigr)^{k} \le (2B)^{k+1},
\]
\[
  \|\sigma_{n_i}^{k+1} \sigma_{n_{i-1}} \rho_{i+1} r_{i+1}\|_\infty \le \|\sigma_{n_{i-1}} r_{i-1}\|_\infty \cdot \bigl(\|\sigma_{n_i} r_i\|_\infty + |\sigma_{n_i}|\bigr)^{k+1} \le (2B)^{k+2}
\]
For random inputs (say, of fixed degrees and coefficient lengths), the expected
value of δ is quite small if the degrees n and m of the two inputs are close to each
other (Exercise 6.46). It is conceivable that the δ in the estimates for qi and ρi
is an artifact and that a more careful analysis would in fact reveal that it can be
replaced by 1. Lickteig & Roy (1996, 2001) discuss a variant of the EEA where
this is indeed the case.
For comparison, we state the analogous bounds for the traditional EEA 3.6,
where the remainders are given by r^∗_{i+1} = r^∗_{i−1} rem r^∗_i for all i, without dividing out
the leading coefficient.
THEOREM 6.53.
We denote by q∗i , ri∗ , s∗i ,ti∗ ∈ Q[x] the results of the traditional Extended Euclidean
Algorithm, and
\[
  \alpha_i = \begin{cases} \rho_i \rho_{i-2} \cdots \rho_2 \rho_0 & \text{if } i \ge 0 \text{ is even,} \\ \rho_i \rho_{i-2} \cdots \rho_3 \rho_1 & \text{if } i \ge 1 \text{ is odd.} \end{cases}
\]
(i) The length of the algorithm equals that of the monic EEA, and for all i we
have
\[
  q_i^* = \frac{\alpha_{i-1}}{\alpha_i}\, q_i, \qquad r_i^* = \alpha_i r_i, \qquad s_i^* = \alpha_i s_i, \qquad t_i^* = \alpha_i t_i.
\]
(ii) Let n, m, δ , A,C be as in Theorem 6.52. The numerators and denominators of
the coefficients of all results of the traditional algorithm in Q[x] are bounded
by Cm+2 in absolute value, and the computing time is O(n3 m3 δ 2 log2 (nA)).
Exercise 6.47 asks for a proof, and Exercise 6.49 gives a slightly better bound
for the ri∗ , s∗i ,ti∗ in the traditional EEA, essentially replacing δ by 1.
We compare the two bounds from Theorems 6.52 and 6.53 with Mignotte’s
bound, say when A is an n-digit number and δ = 1. Then the “traditional” bound
is a number of about n3 digits, the “monic” one has about n2 digits, and Mignotte’s
only about n digits! Of course Mignotte’s bound only applies to the gcd, and one
cannot hope for a bound of similar quality for all results of the EEA.
Theorem 6.52 provides a clear explanation for the coefficient growth, as in Ex-
ample 6.1. The results of the EEA are governed by subresultants and grow at a
quadratic rate in length. But the leading coefficients αi in the traditional Euclid-
ean Algorithm are a product of i/2 such entries, and thus grow at a cubic rate.
This does not literally follow from Theorem 6.53, which gives only upper bounds,
but there seems to be typically little cancellation in the product defining αi . For
instance, in the example below, the products of numerators and denominators in
r5 = α5 = ρ1 · ρ3 · ρ5 have 50 and 48 digits, respectively, and only the 2-digit fac-
tor 24 cancels. A practical recommendation is therefore to use the monic version
wherever possible.
EXAMPLE 6.1 (continued). The traditional Euclidean Algorithm for the polyno-
mials from Example 6.1 produces the following quotients and remainders; almost
any (random) input will exhibit a similar behavior:
r0 := 824 x5 − 65 x4 − 814 x3 − 741 x2 − 979 x − 764
r1 := 216 x4 + 663 x3 + 880 x2 + 916 x + 617
for i from 1 to 5 do
q[i] := quo(r[i - 1], r[i], x, 'r[i + 1]');
r[i + 1] := sort(r[i + 1]);
od;
q1 := 103/27 x − 5837/486
r2 := 614269/162 x^3 + 1539085/243 x^2 + 1863490/243 x + 3230125/486
q2 := 34992/614269 x + 30072401334/377326404361
r3 := −23256341085690/377326404361 x^2 − 27844657381944/377326404361 x + 32938754949612/377326404361
q3 := −231779913080427109/3767527255881780 x − 212504381367397914300612023767/7301574909368361826957477350
r4 := 163630473867966784641771618997/15023816685943131331188225 x + 276046921899101981276672067323/30047633371886262662376450
q4 := −349399005257174220664364219554244000250/61742098348486478706658122441075651245917 x
      − 53605502942609915156276524064879156029311616760832823425/26774931978255360791810790390285343980469602246030531286009
r5 := 14999180998204546086628509444183593910034968673275/141919206653976666794661960809129382074315418338
q5 := 23222307035756106796937177832200054614723837798596144162324081189217602966986/225344945756306612080710638588525392490642345609867489818460486857552186875 x
      + 19588180077596405579156628228918052861081903680682675547410956194774022384587/225344945756306612080710638588525392490642345609867489818460486857552186875
r6 := 0
One can see clearly that the numerators and denominators are considerably
larger than in the monic Euclidean Algorithm on page 143. ✸
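For comparison, here is a sketch of the corresponding monic loop in MAPLE (the normalization conventions of the monic EEA 3.14 are only approximated); its remainders can be compared with the output on page 143.
r[0] := (824*x^5 - 65*x^4 - 814*x^3 - 741*x^2 - 979*x - 764)/824:
r[1] := (216*x^4 + 663*x^3 + 880*x^2 + 916*x + 617)/216:
for i from 1 to 5 do
  q[i] := quo(r[i - 1], r[i], x, 'r[i + 1]');
  if r[i + 1] <> 0 then
    rho[i + 1] := lcoeff(r[i + 1], x);
    r[i + 1] := expand(r[i + 1]/rho[i + 1]);
  fi;
od;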
The following analog of Theorems 6.52 and 6.53 for bivariate polynomials is
proven in Exercise 6.48.
THEOREM 6.54.
Let F be a field, f , g ∈ F[x, y] with n = degx f ≥ m = degx g and degy f , degy g ≤ d ,
and let δ = max{ni−1 − ni : 1 ≤ i ≤ ℓ} be the maximal degree difference of consec-
utive remainders in the Euclidean Algorithm for f and g in F(y)[x].
(i) The results ri , si ,ti of the Extended Euclidean Algorithm 3.14 for f and g
in F(y)[x] have numerators and denominators (in lowest terms) of degree in
y at most (n + m − 2ni )d ≤ (n + m)d . The corresponding bound for the qi
and ρi is (δ + 2)(n + m)d . The Extended Euclidean Algorithm 3.14 can be
performed with O(n3 m δ 2 d 2 ) operations in F .
(ii) For the traditional EEA, the degree bound is (m + 2)(δ + 2)(n + m)d , and
the number of operations in F is O(n3 m3 δ 2 d 2 ).
THEOREM 6.55.
Let R be a Euclidean domain with field of fractions K , p ∈ R prime, and f , g ∈ R[x]
nonzero with deg f ≥ deg g and such that p does not divide b = gcd(lc( f ), lc(g)).
Furthermore let 1 ≤ i ≤ ℓ and ri , si ,ti ∈ K[x] be the results in the ith row of the
monic EEA, ni = deg ri < deg f , and σ = σni ∈ R the ni th subresultant of f , g.
A bar denotes the reduction modulo p.
(ii) The remainder degree ni occurs in the EEA for f̄ , ḡ over R/⟨p⟩ if and only
if p ∤ σ .
PROOF. (i) follows from Cramer's rule, as in the proof of Theorem 6.52. We first
assume that ni ≤ min{deg f , deg g}. Then σ is, up to a unit modulo p, equal to the
The degrees 2 and 0 are missing in the degree sequence modulo 3, but nonetheless
the two remainders r3 and r2∗ of degree 1 are equal modulo 3. ✸
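Degree sequences modulo a prime are easy to inspect in MAPLE; in the following sketch, the polynomials of Example 6.1 and the prime p = 5 serve only as placeholders.
f := 824*x^5 - 65*x^4 - 814*x^3 - 741*x^2 - 979*x - 764:
g := 216*x^4 + 663*x^3 + 880*x^2 + 916*x + 617:
p := 5:
r0 := f mod p:  r1 := g mod p:
degs := [degree(r0, x), degree(r1, x)]:
while r1 <> 0 do
  r0, r1 := r1, Rem(r0, r1, x) mod p;
  if r1 <> 0 then degs := [op(degs), degree(r1, x)] fi;
od:
degs;                          # the degree sequence modulo p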
We obtain the following modular algorithm for the results of the EEA in Q[x].
THEOREM 6.58.
The algorithm returns the correct values as specified. If S in step 1 consists of the
first r primes, then the algorithm uses O(n3 m log2 (nA)(log2 n + (loglog A)2 )) word
operations.
modulo p takes O(nm log2 r) word operations (Theorem 3.16 and Corollary 4.7),
and O(nmr log2 r) for all p ∈ S. Since log ∏ p∈S p = ∑ p∈S log p ∈ O(r log r), we have
O(r2 log2 r) word operations for reconstructing one rational coefficient in step 3
from its image modulo all primes in S by means of the Chinese Remainder Algo-
rithm 5.4 and the (traditional) Extended Euclidean Algorithm 3.6 over Z. Since all
ri , si ,ti together have O(nm) coefficients, the total cost for step 3 is O(nmr2 log2 r)
word operations. This dominates the cost for the other steps, and the claim follows
from r ∈ O(n log(nA)). ✷
THEOREM 6.60.
The algorithm returns the correct values as specified. It uses O(n3 md 2 ) operations
in F .
See Exercise 6.50 for the proof. As for the modular gcd algorithm, we make a
suitable field extension when F does not have sufficiently many elements.
The timing estimate for the small primes modular EEA is the same as for both
the big prime variant and direct calculation, and better by a factor of m if we
only want to compute a single row ri , si ,ti of the EEA, as in the integer case. The
estimate drops to O∼ (n3 d) when using fast arithmetic (Part II), or even to O∼ (n2 d)
when only one row of the EEA is required (Corollary 11.12). Both bounds are
optimal up to logarithmic factors since the output size for all results is about n3 d,
and for one row about n2 d, at least in a generic sense.
Our purpose in studying subresultants has been to gain a conceptual understand-
ing of Euclid’s algorithm and a bound on the coefficients occurring in it. One
might be tempted to actually execute Euclid’s algorithm by calculating subresul-
tants via Gaussian elimination. This would be highly inefficient; in Section 11.2
we show how to calculate subresultants efficiently from the ρi in the Euclidean Al-
gorithm. They may then be used to replace the rational number reconstruction and
the Cauchy interpolation, respectively, in step 3 of the modular EEA algorithms,
by a (computationally easier) polynomial Chinese remainder or interpolation al-
gorithm, after multiplying all modular images by the corresponding subresultant.
The same modular techniques also apply to gcds of multivariate polynomials
over Q or a finite field. The rational case is reduced to the finite field case by com-
puting modulo small prime numbers, and the computation of multivariate polyno-
mial gcds over finite fields is reduced to univariate gcd computations by evaluating
one of the variables at distinct points and proceeding recursively.
(assuming b ≠ 0). The integer factor multiplied to a ensures that the division can
be carried out in Z[x] (see also Exercise 2.9). This works with any integral domain
R instead of Z, and is useful when R is a ring of multivariate polynomials over an
integral domain. As in Section 6.2, we assume that we have some normal form
normal on R, which we extend to R[x] via normal( f ) = normal(lc( f )) f / lc( f ).
1. r0 ←− f , r1 ←− g, n0 ←− n, n1 ←− m
2. i ←− 1
while ri ≠ 0 do
{ Pseudodivision }
ai−1 ←− lc(ri )1+ni−1 −ni ri−1 , qi ←− ai−1 quo ri ,
ri+1 ←− pp(ai−1 rem ri ), ni+1 ←− deg ri+1
i ←− i + 1
3. ℓ ←− i − 1
return normal(rℓ )
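A runnable MAPLE transcription of this algorithm for R = Z might look as follows (a sketch: it assumes deg f ≥ deg g and takes normal to be the primitive part with positive leading coefficient).
primitive_euclid := proc(f, g, x)
  local r0, r1, r2, a, q;
  r0 := f;  r1 := g;
  while r1 <> 0 do
    # pseudodivision: premultiply by a power of lc(r1) so the division stays in Z[x]
    a := lcoeff(r1, x)^(1 + degree(r0, x) - degree(r1, x)) * r0;
    q := quo(expand(a), r1, x, 'r2');
    r0 := r1;
    r1 := `if`(r2 = 0, 0, primpart(r2, x));
  end do;
  sign(lcoeff(r0, x))*primpart(r0, x);
end proc:

primitive_euclid(824*x^5 - 65*x^4 - 814*x^3 - 741*x^2 - 979*x - 764,
                 216*x^4 + 663*x^3 + 880*x^2 + 916*x + 617, x);    # 1 for the pair from Example 6.1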
THEOREM 6.62.
The algorithm correctly computes the gcd as specified. We let δ = max{ni−1 − ni :
1 ≤ i ≤ ℓ} be the maximal quotient degree.
(i) If R = Z and || f ||∞ , ||g||∞ ≤ A, then the max-norm of the intermediate results
is at most (2(n + 1)n An+m )δ+2 , and the algorithm uses O(n3 m δ 2 log2 (nA))
word operations.
(ii) If R = F[y] for a field F and degy f , degy g ≤ d , then the degree in y of all
intermediate results is at most (δ + 2)(n + m)d , and the time is O(n3 m δ 2 d 2 )
operations in F .
||qi ||∞ , ||ai−1 rem ri ||∞ ≤ ||ri−1 ||∞ (||ri ||∞ + |αi |)ni−1 −ni +1 ≤ (2B)δ+2 .
The latter number bounds all integers in the algorithm. The number of operations
in Z is O(nm), and the estimate follows from log B ∈ O(n log(nA)). ✷
The time estimate for the primitive Euclidean Algorithm is the same as for the
monic variant; Exercise 6.53 gives a slightly better bound. Its main advantage over
the monic algorithm is that it avoids rational numbers or functions completely. Ex-
periments with R = Z in Section 6.13 show that the primitive algorithm clearly
beats the monic algorithm in practice, but it is still slower than the modular algo-
rithms.
We can profitably use Algorithm 6.45 for computing the content and the prim-
itive part of ai−1 rem ri in step 2 when R = F[y]. This content is usually quite
large: Exercise 6.54 shows that its degree is about (ni−1 − ni + 1)(n + m − 2ni )d −
2(ni−1 − ni+1 )d ≥ 2(n + m − 2ni − 2)d.
EXAMPLE 6.1 (continued). We run the primitive Euclidean Algorithm 6.61 on the
polynomials from Example 6.1 in MAPLE.
r0 := 824 x5 − 65 x4 − 814 x3 − 741 x2 − 979 x − 764
r1 := 216 x4 + 663 x3 + 880 x2 + 916 x + 617
for i from 1 to 5 do
a[i - 1] := r[i - 1] * lcoeff(r[i], x)
^ (degree(r[i - 1], x) - degree(r[i], x) + 1);
q[i] := quo(a[i - 1], r[i], x, 'r[i + 1]');
r[i + 1] := sort(primpart(r[i + 1], x));
od;
a0 := 38444544 x5 − 3032640 x4 − 37977984 x3 − 34572096 x2 − 45676224 x
− 35645184
q1 := 177984 x − 560352
r2 := 1842807 x3 + 3078170 x2 + 3726980 x + 3230125
6.13. Implementations
Table 6.5 gives an overview of the different variants of the Euclidean Algorithm
for computing gcds in Z[x] and F[x, y] that we have discussed in this chapter.
We have implemented all algorithms discussed in this chapter for computing
the gcd of two polynomials with integral coefficients in C++, using Victor Shoup’s
“Number Theory Library” N TL 1.5 for integer and polynomial arithmetic (see
http://www.shoup.net/ntl/); we will describe parts of it in Section 9.7. The
running times are given in Figures 6.4 and 6.6 for various degrees and coefficient
sizes. The integer arithmetic of N TL uses Karatsuba’s multiplication algorithm
(Section 8.1) which is asymptotically faster than classical multiplication, so that,
for example, the running time of the big prime gcd algorithm is only about n3.18 .
All timings are the averages over 10 pseudorandom inputs. The software ran in
1998 on a Sun Sparc Ultra 1 clocked at 167 MHz.
The experiments were as follows. For each choice of n and k, we pseudo-
randomly and independently chose three polynomials a, b, c ∈ Z[x] of degree less
than n2 and with nonnegative coefficients less than 2k/2 , and computed the gcd of
[Two plots of running time (CPU seconds, resp. CPU minutes) against n.]
FIGURE 6.4: Various gcd algorithms in Z[x] for pseudorandom polynomials of degree less
than n − 1 with nonnegative coefficients less than n2n−1 , for 2 ≤ n ≤ 64 and for 64 ≤ n ≤
8192.
TABLE 6.5: Comparison of various Euclidean Algorithms in Z[x] and F[x, y]. The time
(word and field operations, respectively) is for polynomials of degree at most n in x and
with coefficients of length or degree at most n, respectively, with a normal degree sequence,
and ignores logarithmic factors.
[Plot of CPU minutes against the input size nk in megabits, for n = 128, n = 2048, n = k, k = 128, and k = 2048.]
FIGURE 6.6: The small primes modular gcd algorithm in Z[x] of NTL for various pseudo-
random polynomials of degree less than n − 1 with about k-bit coefficients.
ac and bc in Z[x]. Thus the degree of the gcd was at least 2n − 1; in fact, it was
equal to 2n − 1 in all cases when n ≥ 6.
In these cases where the gcd is essentially c, the Mignotte bound for the length of
the coefficients of the gcd is too large by a factor of about 2n , which discriminates
against our implementation of the big prime algorithm. For that reason we also ran
a variant of the big prime method using the known bound 2n/2 on the coefficients
of c, and it computed the correct gcd in all cases, in time faster than the original
big prime algorithm but still slower than the small primes algorithm. The standard
deviations in the experiments with the big prime algorithm are considerably higher
than for the other algorithms. The reason is that there were enormous differences
in the time spent for finding a big prime with the routines of N TL that implement
the probabilistic primality testing algorithms of Chapter 18. Figure 6.4 also shows
the timings for the big prime algorithm with bound 2n/2 and without the cost of
the prime search; one can see that the corresponding curves are much smoother.
We implemented two variants of the “heuristic” gcd algorithm (Exercise 6.27).
In the first variant, the two input polynomials are evaluated at a random point,
and the evaluation point is a power of two in the second variant. The most time
consuming part in both variants is the gcd calculation of two integers with about
n2 bits.
We give timings for two variants of the small primes algorithm: our implemen-
tation and the built-in routine of N TL. Both routines differ from Algorithm 6.38 in
that they work in an adaptive fashion. They do not compute the Mignotte bound
at all, but take only as many single precision primes as needed to recover the co-
efficients of the gcd. This is achieved by deterministically adding one new prime
each time, discarding the “unlucky” ones which lead to a modular gcd of too large
degree, and performing a divisibility test after each new “lucky” prime, starting
with the divisibility check for the constant coefficients.
We did not try to optimize our routines and merely implemented them as de-
scribed in the text, using low-level routines of N TL. The one exception is the small
primes algorithm where we employed the adaptivity. By their nature, such compar-
isons depend on the effort spent on the various subroutines and hence contain an
element of unfairness. In particular, the different types of integer and polynomial
arithmetic of N TL favor some algorithms and disfavor others.
Nevertheless, the timings confirm the ranking that the theoretical bounds of Ta-
ble 6.5 suggest, in particular, the efficiency of small primes modular algorithms.
Moreover, they show a clear distinction between the monic and the primitive Eu-
clidean Algorithm, which is probably caused by the absence of arithmetic with
rational numbers in the latter, and between the big prime and the small primes
modular algorithm, which is partly due to the adaptivity and partly reflects our in-
tuition that it is cheaper to solve many “small” problems than one “big” problem
of the same total size. In summary, an adaptive small primes modular algorithm
appears to be the most favorable method to implement.
Notes. 6.1. When researchers started experimenting in the late 1960s with the first
computer algebra systems (often built by themselves), they observed an unpleasant phe-
nomenon: the rapid coefficient growth in the traditional Euclidean Algorithm, say in
Q[x], as in Example 6.1 on page 185 (and almost any random example). If pseudodivi-
sion is used at each step without removing common factors, then the remainder coeffi-
cients actually grow exponentially. As long as each rational number is reduced to lowest
terms, Theorem 6.52 guarantees that all coefficient lengths in the (monic) Extended Eu-
clidean Algorithm 3.14 are polynomially bounded. This was discovered by Collins (1966,
1967), and led to several new variants of the traditional Euclidean Algorithm: using the
monic versions, as discussed in Chapter 3, or the primitive versions (Section 6.12) of the
remainders. See also Collins (1971), Brown (1971), Brown & Traub (1971), and von zur
Gathen & Lücking (2000). Computationally, they are inferior to the modular approach,
introduced by Brown (1971) and Collins (1971); see also Notes 6.7 and 6.10.
6.2. Gauß (1801) proves his important “Lemma” in article 42. He shows in article 340 of
Gauß (1863b) that F p [x] is a UFD, for a prime p.
6.3. Leibniz (1683) drafted a letter to Tschirnhaus, but never mailed it. He describes how
to calculate the resultant of two polynomials of degree five by Euclid’s Algorithm, and
says that its vanishing means that the two polynomials have a nontrivial gcd. After earlier
work of Newton and Maclaurin, Euler (1748c) and Bézout (1764) introduced the resul-
tant; the name comes perhaps from Bézout’s équation resultante de l’élimination. Bézout
obtains the resultant as the determinant of a matrix, today called Bézout’s matrix , with
only min{n, m} rows and columns (Exercise 6.14). He describes how to calculate a k × k
determinant as the familiar sum of k! terms, and gives the linear equations describing the
Bézout coefficients s,t with s f + tg = constant, tacitly assuming the gcd to be trivial.
Later algebraic geometers generalized the resultant to more than two variables and poly-
nomials: Euler (1764), Sylvester (1840, 1853), Cayley (1848), Macaulay (1902, 1916,
1922), and many others. The basics of the subresultant theory for univariate polynomials,
as presented here, were developed by Jacobi (1836, 1846), Cayley (1848), Kronecker
(1873, 1878, 1881b), and Frobenius (1881). Jacobi (1836), § 4, shows that the resul-
tant is irreducible (as a polynomial in indeterminate coefficients of the two input polyno-
mials) and proves our Uniqueness Lemma 5.15 in his §15. (He only considers the normal
case.) He performs Euclid’s algorithm with pseudodivision, and obtains the description
si f + ti g = ri of the remainders. Cauchy (1840) discusses various elimination methods,
Euler’s and Bézout’s among them, proves irreducibility of the resultant, and writes down
explicitly the 5 × 5 Sylvester matrix for two polynomials of degrees 2 and 3, respectively.
Cauchy presents an early use of indices: he writes “ f = a0 xn + · · · + an ”. The lack of such
a notation had made earlier work cumbersome. Kronecker (1881b) contains many of the
(non-computational) results of this section, including Theorem 6.47. A “modern” presen-
tation along these lines is in von zur Gathen (1984b), where the goal is parallel algorithms
for the results of the Extended Euclidean Algorithm—not a topic of this text.
The general notion of a field emerged only towards the end of the 19th century; before
that, results were proven separately for various cases. As an example, Sylvester (1881)
wrote a note saying that Corollary 6.17 is also valid for polynomials with integer coeffi-
cients modulo a prime number.
Sylvester (1840) contains an explicit description of the resultant and of subresultants,
as determinants of his matrix and its submatrices, and how to compute the remainders in
the Euclidean Algorithm—which he calls derivation—from them. Apparently ignorant of
Díaz & Kaltofen (1995), Theorem 6.2 and “Note Added in Proof”, analyze Algorithm
6.45, the latter also for multivariate polynomials. A method similar to Algorithm 6.45 also
works to calculate the gcd of many integers (von zur Gathen & Shparlinski (2006)). It is
particularly useful for calculating the content of a polynomial in Z[x]. Rowland & Cowles
(1986), Chen & Kao (1997, 2000), Lewin & Vadhan (1998), and Moeller (1999) use alge-
braic field extensions to reduce the amount of randomness for zero testing of polynomials.
The unsolved question of a deterministic polynomial-time zero test for polynomials given
by an arithmetic circuit has an unexpected connection to other areas of complexity theory.
See Shpilka & Yehudayoff (2010) for a survey of this active research area.
6.10. Subresultants were introduced into computer algebra by Collins (1966, 1967, 1973);
see also Brown (1971, 1978) and Brown & Traub (1971). In fact, they work with a slightly
different notion of “subresultant”; in our presentation, both definitions and theorems take
a somewhat simpler form. Their “subresultants” are constant multiples of the remainders
of the Euclidean Algorithm, rather than just constants. Mulders (1997) describes an error
in software implementations of an integration algorithm due to a confusion about subre-
sultants; see Section 22.3.
Subresultants were introduced by Sylvester (1840) and are treated in Trudi (1862) and
Gordan’s (1885) textbook, §132 ff. Habicht (1948) studies subresultants systematically
in the generic case where the coefficients of the input polynomials are indeterminates.
He calls our subresultants Nebenresultanten1 and also gives explicit formulas for the si ,ti
in the EEA in terms of determinants of submatrices of the Sylvester matrix. Von zur
Gathen (1984b) discusses the relation between the monic and the traditional Euclidean
Algorithm. The “primitive polynomial remainder sequence” discussed in Section 6.12 was
introduced by Collins (1967). Two further algorithms, based on the so-called “reduced
polynomial remainder sequence” and the “subresultant polynomial remainder sequence”,
were invented by Collins (1967) and Brown & Traub (1971). Both avoid rational arithmetic
by using pseudodivision, but in contrast to the primitive Euclidean Algorithm, they do not
divide out the complete content but a divisor of it which can be computed without gcd
calculations. The “reduced” algorithm appears to use exponential time in the worst case
(Brown 1971, page 485). This is not the case for the “subresultant” algorithm: Brown
(1978) gives an estimate for its running time which is essentially the same as the bound
for the primitive Euclidean Algorithm from Exercise 6.53. Lickteig & Roy (1996, 2001),
Ducos (2000), and Lombardi, Roy & Safey El Din (2000) present clever variants of the
“subresultant” algorithm that eliminate the factor δ from the time bound of Exercise 6.53.
Von zur Gathen & Lücking (2003) give a historic overview and a systematic treatment of
subresultants and polynomial remainder sequences.
Exercises.
6.1 Give a sharp estimate for λ(ab) when a, b ∈ Q[x].
6.2 Let a = qb + r be a division with remainder, with a, b, q, r ∈ Q[x], −1 + deg a = deg b > deg r,
and λ(a), λ(b) ≤ l ∈ N. Give estimates for λ(q) and λ(r) in terms of l (a and b need not be monic).
6.3 Let f ∈ R[x] for a Unique Factorization Domain R. Show that f = pp( f ) if and only if f is
primitive.
1 minor resultants
6.4 Prove that Lemma 6.5 and Corollary 6.7 hold when c ∈ K and f , g ∈ K[x], where K is the field
of fractions of R.
6.5 Let R be a UFD on which a normal form luR in the sense of Section 3.4 is given, and let K be
the field of fractions of R. We extend cont and pp to K[x] as described in Section 6.2. Prove that
taking luK[x] ( f ) = luR (lc(pp( f ))) cont( f ) defines a normal form for K[x]. Hint: Exercise 3.8. Why
is luK[x] ( f ) = cont( f ) not a normal form?
6.6 Let f ∈ Z[x] be monic, and α ∈ Q be a root of f . Show that α ∈ Z.
6.7 Let p be a prime and ϕ: Z[x] −→ Z p [x] be defined by taking coefficients modulo p. Show that
when f ∈ Z[x], p ∤ lc( f ), and ϕ( f ) is irreducible in Z p [x], then f is irreducible in Q[x].
6.8 Show that the probability for two random polynomials in Z[x] of degree at most n and max-norm
at most A to be coprime in Q[x] is at least 1 − 1/(2A + 1). Hint: Exercise 4.18.
6.9 Consider the ring R = Z[1/2] = {a/2n : a ∈ Z, n ∈ N} of binary rationals.
(i) Prove that R is the smallest subring of Q containing Z and 1/2.
(ii) What are the units of R?
(iii) You may use the fact that R is a UFD and that any two elements of R have a gcd which is
unique up to associates. Find a normal form on R and use this to define a gcd function on R.
(iv) Determine the content and primitive part of the polynomial f = 2x2 + 6x − 4 with respect to
the three rings Z, R, and Q. Is f primitive with respect to R?
6.10 Let f , g ∈ Z[x], r = res( f , g) ∈ Z, and u ∈ Z. Prove that gcd( f (u), g(u)) divides r. Hint:
Corollary 6.21.
6.11 Let F be a field and f = ∑0≤i≤n fi xi and g = ∑0≤i≤m gi xi in F[x, y] have total degrees n and m,
respectively, so that each fi , gi ∈ F[y] with degy fi ≤ n − i, degy gi ≤ m − i. Let r = resx ( f , g) ∈
F[y]. Show that each of the (n + m)! summands contributing to r has degree at most nm, and hence
degy r ≤ nm.
6.12∗ Let R be a UFD with field of fractions F, f , g ∈ R[x] nonzero of degrees n, m, respectively,
and α1 , . . ., αn and β1 , . . ., βm the roots of f and g, respectively, in an extension field of F, counted
with multiplicities.
(i) Prove:
\[
  \operatorname{res}(f, g) = \operatorname{lc}(f)^m \prod_{1 \le i \le n} g(\alpha_i) = (-1)^{nm} \operatorname{lc}(g)^n \prod_{1 \le j \le m} f(\beta_j) = \operatorname{lc}(f)^m \operatorname{lc}(g)^n \prod_{\substack{1 \le i \le n \\ 1 \le j \le m}} (\alpha_i - \beta_j).
\]
Hint: First prove the claim in the case where the roots are considered to be indeterminates. Then
apply the ring homomorphism which maps them to the actual roots.
(ii) Conclude that res( f , gh) = res( f , g) res( f , h) for all f , g, h ∈ R[x].
6.13 This exercise provides an alternative proof of Corollaries 6.20 and 6.21. Let R be a UFD and
f , g ∈ R[x] nonzero of degrees n, m, respectively, where n + m ≥ 1.
(i) Prove that
(xn+m−1 , . . ., x, 1) · Syl( f , g) = (xm−1 f , . . ., f , xn−1 g, . . ., g) in R[x]n+m ,
and conclude that there exist nonzero s,t ∈ R[x] with deg s < m and degt < n such that s f + tg =
res( f , g). Hint: Cramer’s rule.
(ii) Conclude that res( f , g) = 0 if and only if gcd( f , g) is nonconstant.
6.14∗ Let f = ∑0≤i≤n fi xi , g = ∑0≤i≤n gi xi ∈ F[x], with a field F and f0 6= 0. For 0 ≤ k < n, cross-
multiply each polynomial by the leading k + 1 terms of the other polynomial and subtract:
\[
  b_k = f \cdot \sum_{0 \le j \le k} g_{n-k+j}\, x^j - g \cdot \sum_{0 \le j \le k} f_{n-k+j}\, x^j = \sum_{0 \le l} b_{kl}\, x^l .
\]
6.24∗ (i) Let R be a UFD and f , g, h ∈ R[x] nonzero of degrees n, m, k, respectively, such that h
divides f and g, and f ∗ = f /h and g∗ = g/h. Moreover, let S = Syl( f ∗ , g∗ ) ∈ R(n+m−2k)×(n+m−2k) ,
r = det S = res( f ∗ , g∗ ) ∈ R, and
\[
  H = \begin{pmatrix}
    h_k & 0 & 0 & \cdots & 0 \\
    h_{k-1} & h_k & 0 & \cdots & 0 \\
    h_{k-2} & h_{k-1} & h_k & \cdots & 0 \\
    \vdots & \vdots & \vdots & \ddots & \vdots \\
    \cdot & \cdot & \cdot & \cdots & h_k
  \end{pmatrix} \in R^{(n+m-2k) \times (n+m-2k)}
\]
the Toeplitz matrix whose rows are shifts of the coefficient sequence h0 , . . ., hk ∈ R of h. Prove that
T = HS is a matrix whose first m − k columns are shifts of the coefficient sequence of f and whose
last n − k columns are shifts of the coefficient sequence of g, and show that det T = lc(h)n+m−2k r.
(In fact, det T is the kth subresultant of f and g.)
(ii) Let R = Z, m ≤ n, and || f ||∞ , ||g||∞ ≤ A. Show that |r| ≤ (n + 1)n A2(n−k) .
6.25 Complete the proof of Theorem 6.35 by showing that p ∤ r holds if and only if pp(w) = h.
6.26 Let F be a field. This exercise discusses the task to decide whether a polynomial g ∈ F[x, y]
divides another polynomial f ∈ F[x, y], and if so, to compute the quotient f /g ∈ F[x, y], using a
modular approach. Suppose that degx g ≤ degx f = n and degy g ≤ degy f = d, and let p ∈ F[y] be
nonconstant and coprime to lcx (g).
(i) Convince yourself that g mod p divides f mod p in (F[y]/hpi)[x] if g divides f in F[x, y].
(ii) Now assume that g mod p divides f mod p. One might be tempted to assume that degy p > d
is sufficient to conclude that g divides f . Prove that this is wrong in general by considering the
example f = xn + (yd + yd−1 )xn−2 , g = x − yd , and p = yd+1 + y + 1 for n ≥ 2 and d ≥ 1.
(iii) Assume that g mod p divides f mod p and degy p > d, and let h ∈ F[x, y] be the modular
quotient, with degx f ≥ degx g + degx h, degy h < degy p, and f ≡ gh mod p. Prove that f = gh if
degy f ≥ degy g + degy h. Given p, what is the running time of this method? Compute h in the
example of (ii).
(iv) Find and prove analogs of (ii) and (iii) for f , g ∈ Z[x] and p ∈ Z. Hint: Use Mignotte’s bound
6.33 and look at the proof of Theorem 6.35.
6.27∗ This exercise discusses a modular gcd algorithm for Z[x] by Char, Geddes & Gonnet (1989)
(who in fact give an algorithm for multivariate polynomials over Z; they call it “heuristic gcd”) and
Schönhage (1985, 1988). The modulus is not a prime but a linear polynomial x − u. Let f , g ∈ Z[x]
be nonzero and primitive of degree at most n and with max-norm at most A, h = gcd( f , g) ∈ Z[x],
and u ∈ N such that u > 4A.
(i) Prove that h(u) | c = gcd( f (u), g(u)) in Z and that h(u) ≠ 0.
(ii) Let v ∈ Z[x] whose coefficients vi satisfy −u/2 < vi ≤ u/2, and v(u) = c. Give an algorithm
for computing v from c.
(iii) Now assume that pp(v) | f and pp(v) | g. Writing h = pp(v)w with a primitive w ∈ Z[x], prove
that w(u) | cont(v). Use Exercise 6.23 to show that u/2 ≥ |w(u)| ≥ | lc(w)| · (u −2A)degw > (u/2)deg w
if w is nonconstant, and conclude that h = ± pp(v).
(iv) Compute gcd(3x4 + 6x3 + 5x2 − 2x − 2, x3 + 4x2 + 6x + 4) by the above method. Find out the
smallest evaluation point where the method works, and compare it to your choice of u.
Schönhage (1988) proves that for u > 4(n + 1)n A2n the divisibility conditions assumed in (iii) are
always satisfied, so that the method always terminates. He also discusses a probabilistic variant
where u is chosen at random from a “small” interval and the length of u is dynamically increased
(say, doubled) on failure, and proves that the expected length of a successful u is only O∼ (n), in
contrast to O∼ (n log A) for the deterministic algorithm.
X = {(a, b) ∈ R 2 : b − a3 + 7a − 5 = 0},
Y = {(a, b) ∈ R 2 : 20a2 − 5ab − 4b2 + 35a + 35b − 21 = 0}
in R 2 . Determine the intersection of X and Y in two ways: by projecting it to the first coordinate,
and by projecting it to the second coordinate. Comment on the differences. Plot the two curves and
mark their intersection points.
6.34 Compute the minimal polynomial f ∈ Q[x] of √2 + √3 over Q. Let F192 = F19 [z]/⟨z^2 − 2⟩
and α = z mod x2 − 2 ∈ F192 a square root of 2. Check that 7α is a square root of 3, and compute the
minimal polynomial of α + 7α over F19 . How is it related to f ?
6.35∗ Let α, β be two nonzero algebraic numbers, with (monic) minimal polynomials f , g ∈ Q[x] of
degrees n, m, respectively.
(i) Prove that the reversal rev( f ) = xn f (x−1 ) of f is the minimal polynomial of α−1 .
(ii) Let r = resy (rev( f )(y), g(xy)) ∈ Q[x]. Show that degx r = nm and r(αβ) = 0 (hint: Exer-
cise 6.12), and conclude that r is the minimal polynomial of αβ if it is irreducible.
(iii) Find multiples of degree nm of the minimal polynomials of aα + bβ, where a, b ∈ Q \ {0} are
arbitrary, and of α/β.
(iv) Compute the minimal polynomials of √2 − 2√3 over Q and of √2 · ∛3 over Q and over F13 .
6.36∗ Let α be an algebraic number and f , g ∈ Q[x] of degrees n, m ∈ N≥1 such that f is the minimal
polynomial of α. We want to compute the minimal polynomial of g(α) and therefore may assume
that n > m.
(i) Let r = resy ( f (y), x − g(y)) ∈ Q[x]. Show that degx r = n and that the minimal polynomial of
g(α) divides r (hint: Exercise 6.12). (In fact, r is a power of the minimal polynomial of g(α), which
equals r/ gcd(r, r′ ).)
(ii) Compute the minimal polynomials of √3 + 1 and 2^{2/3} + 2^{1/3} + 1 over Q.
A different algorithm is given in Exercise 12.10.
6.37 This exercise discusses a variant of Lemma 6.44, due to Zippel (1993). Let R be an integral
domain, n ∈ N, S ⊆ R finite with s = #S elements, and r ∈ R[x1 , . . ., xn ] a polynomial of degree at
most di ≤ s in the variable xi .
(i) Show that r has at most sn − (s − d1 ) · · ·(s − dn ) ≤ (d1 + · · · + dn )sn−1 zeroes in Sn if it is not
the zero polynomial. Hint: Prove inductively that the number of elements of S that are not zeroes of
r is at least (s − d1 ) · · ·(s − dn ).
(ii) Let u1 , . . ., us be the elements of S. Prove that the polynomial
\[
  r = \prod_{1 \le i \le n} \prod_{1 \le j \le d_i} (x_i - u_j)
\]
(i) Let ||a||∞ ≤ A, ||b||∞ ≤ B, and |c| ≤ C. Prove that we have |qi | ≤ A(B + C)k−iCi and ||ri ||∞ ≤
A(B +C)k+1−iCi for k ≥ i ≥ 0, and ||r||∞ ≤ A(B +C)k+1 .
(ii) Conclude that the cost for computing q, r from a, b is O(mk2 log2 B) word operations if A ≤ B.
(iii) Let A, B,C ∈ N>0 , and assume that a has positive coefficients not smaller than A, and that
b = Cxm − b∗ for a polynomial b∗ ∈ Z[x] with deg b∗ < m and positive coefficients greater or equal
to B. Show that qi ≥ A(B +C)k−iCi for k ≥ i ≥ 0 and ||r||∞ ≥ A(B +C)k+1 . (Thus the bound from (i)
is essentially sharp.)
(iv) Give statements analogous to (i) and (ii) for pseudodivision of bivariate polynomials.
6.45 Let F be a field, f , g ∈ F[x] nonzero of degree n ≥ m, ri , si ,ti ∈ F[x] the ith row in the EEA for
f , g and some i ≥ 1, ni = deg ri , S the ni th submatrix of the Sylvester matrix of f and g, and σ = det S
the ni th subresultant.
(i) We have shown in Theorem 6.52 that the coefficients of σsi and σti are determinants of sub-
matrices of S of order n + m − 2ni − 1. Let U,V be the matrices that arise from S by replacing
the last row by (xm−ni −1 , . . ., x, 1, 0, . . ., 0) and (0, . . ., 0, xn−ni −1 , . . ., x, 1), respectively. Prove that
σsi = detU and σti = detV .
(ii) Let W be the matrix S with its last row replaced by (xm−ni −1 f , . . ., x f , f , xn−ni −1 g, . . ., xg, g).
Conclude from (i) that σri = detW .
(iii) Prove that every coefficient of σri has absolute value at most (n + 1)n−ni An+m−2ni if f , g are
in Z[x] with max-norms at most A. Hint: The coefficient of x j in σri is obtained by taking only terms
containing x j in the last row of W .
6.46∗ Let Fq be a finite field with q elements and n, m ∈ Z with n ≥ m ≥ 0.
(i) Let X0 , . . ., Xm−1 be independent random variables with prob(Xi = 0) = q−1 for all i, and ρ the
random variable that counts the longest run of zeroes in X0 , . . ., Xm−1 :
ρ = max{0 ≤ i ≤ m: ∃ j ≤ m − i X j = X j+1 = · · · = X j+i−1 = 0}.
Prove that prob(ρ ≥ d) ≤ (m − d + 1)q−d for 1 ≤ d ≤ m, and conclude that the expected value
\[
  E(\rho) = \sum_{0 \le d \le m} d \cdot \operatorname{prob}(\rho = d) = \sum_{1 \le d \le m} \operatorname{prob}(\rho \ge d)
\]
of ρ is at most 1+m/(q−1). (In fact, the better bound E(ρ) ∈ O(log m) holds; see Guibas & Odlyzko
(1980) for a proof and references.)
(ii) For two uniform random polynomials f , g ∈ Fq [x] of degrees n and m, respectively, let δ denote
the maximal degree difference of two consecutive remainders in the Euclidean Algorithm for f , g,
with δ( f , g) = 0 if g | f (the first difference deg f − deg g is not counted). Use Exercise 4.17 to
conclude that E(δ) ∈ O(log m).
(iii) Let A ≥ 1. Derive the same upper bound on the expected value of δ for random f , g ∈ Z[x] of
degrees n and m, respectively, and with || f ||∞ , ||g||∞ ≤ A.
6.47∗ Prove Theorem 6.53.
6.48∗ Prove Theorem 6.54.
6.49∗ The aim of this exercise, which follows Shoup (1991), is to prove a bound on the coefficients
of the results ri∗ , s∗i ,ti∗ of the traditional EEA for two nonzero elements f , g of Z[x] or F[x, y], where
F is a field, that is independent of the value δ from Theorems 6.53 and 6.54. So let αi = lc(ri∗ )
and ni = deg ri∗ for 0 ≤ i ≤ ℓ, σk the kth subresultant, and Sk the submatrix of the Sylvester matrix
Syl( f , g) whose determinant is σk , for 0 ≤ k ≤ n1 , as usual.
(i) Let κi , λi be the constant coefficients of si = α_i^{−1} s_i^∗ and ti = α_i^{−1} t_i^∗ , respectively, for 2 ≤ i ≤ ℓ.
By Corollary 6.48 and Cramer's rule, we have
\[
  \kappa_i = \frac{\det Y_i}{\sigma_{n_i}}, \qquad \lambda_i = \frac{\det Z_i}{\sigma_{n_i}} \qquad \text{for } 2 \le i \le \ell,
\]
where Yi , Zi are matrices that result from Sni by replacing a certain column by a unit vector. Let
\[
  \gamma_2 = \det Y_2, \qquad \gamma_i = \det Y_{i-1} \cdot \det Z_i - \det Z_{i-1} \cdot \det Y_i \quad \text{for } 3 \le i \le \ell,
\]
and prove that
\[
  \alpha_2 = \frac{\sigma_{n_2}}{\gamma_2}, \qquad \alpha_i = \frac{(-1)^{i-1}\, \sigma_{n_i} \sigma_{n_{i-1}}}{\gamma_i\, \alpha_{i-1}} \quad \text{for } 3 \le i \le \ell.
\]
Hint: Lemma 3.15 and Theorem 6.53. Conclude that
\[
  \alpha_i = (-1)^{(i+1)(i+2)/2}\, \sigma_{n_i} \prod_{2 \le j \le i} \gamma_j^{(-1)^{i+j-1}} \quad \text{for } 2 \le i \le \ell.
\]
1 The instruction of children should aim gradually to combine knowing and doing. Among all sciences mathe-
matics seems to be the only one that satisfies this purpose best.
7
Application: Decoding BCH codes
Coding theory deals with the detection and correction of transmission errors. The
scenario is that a message m is sent over a transmission channel, and due to noise
on the channel some of the symbols in the received message r are different from
those in m. How can we correct them?
m ──────▶ channel ──────▶ r
A simple strategy is to send m three or five times and take a majority vote on
each symbol. If errors occur too frequently, then this may not help much, but the
usual assumption is that errors occur only with fairly small probability, and then
this strategy will give an erroneous result only with much smaller probability than
accepting r as is.
However, the cost (= length) of transmission has increased by a factor of three
or five. The fundamental task of coding theory is to see whether small error prob-
ability can be achieved at reasonable cost. The basic framework of this theory was
established in the pioneering work of Shannon (1948). Error correcting codes are
employed in numerous situations, from computer networks to satellite TV, digi-
tal telephony, and the technology that make CDs so remarkably resistant against
scratches. They must not be confused with cryptography , the art of sending secret
messages that only the intended receiver can read (see Section 20).
It turns out that the tools of algebra provide many useful codes. We describe a
particular class of such codes. Let Fq be a finite field with q elements, n, k ∈ N,
and C ⊆ Fqn a k-dimensional linear subspace. C is called a linear code over Fq .
Any basis of C provides an isomorphism Fqk −→ C, and ε: Fqk −→ C ⊆ Fqn is the
encoding map. The number n is the length of C, k is its dimension, and the ratio
k/n ≤ 1 is the rate of C.
To transmit a message m, we first identify it with an element of Fqk . If, say,
q = 2 and k = 64, and we want to transmit messages in ASCII, then each ASCII
letter can be identified with an 8-bit string, and a block of 8 letters with a “word”
in F264 . Now the simple code which sends each “word” three times has length 192,
dimension 64, and rate 1/3.
For an element a = (a1 , . . . , an ) ∈ Fqn , we denote by
w(a) = #{i: 1 ≤ i ≤ n, ai ≠ 0}
its Hamming weight, and by
d(C) = min{w(c): c ∈ C \ {0}}
the minimal distance of C. Since C is a linear subspace, w(a − b) ≥ d(C) for all
distinct a, b ∈ C. Our triple repetition code is C = {(a, a, a) ∈ F2192 : a ∈ F264 } and
has minimal distance d(C) = 3.
On receiving a word r ∈ Fqn , it is decoded as c ∈ C with w(r − c) minimal. Since
fewer errors are more probable, this is called maximum likelihood decoding.
If less than d(C)/2 errors occurred in transmitting the word, then this will work
correctly. If a single letter in Fq is received incorrectly with probability ε ≪ 1, and
errors occur independently, then this decoding procedure makes a mistake with
probability no more than
\[
  \sum_{d(C)/2 \le j \le n} \binom{n}{j}\, \varepsilon^j (1 - \varepsilon)^{n-j}.
\]
One of the goals in coding theory is to make this probability small without de-
creasing the rate too much.
For example, ε ≈ 10−4 seems to be a reasonable value for transmissions over
copper wires. In Table 7.1 below, we have a code C over F2 with dimension 8,
length 15, and minimal distance 5. Then this error probability becomes ≈ 5 · 10−8 :
a tremendous gain, at the cost of not even halving the transmission rate. This
is much better than the triple repetition code mentioned above, which has error
probability of about 10−5 and transmission rate 1/3.
We now describe a popular class of codes, the BCH codes, together with an
efficient way of implementing the decoding procedure.
Let Fq be a finite field and f ∈ Fq [x] be irreducible and monic with deg f = m.
Then Fqm = Fq [x]/h f i, and α = x mod f ∈ Fqm is a root of f (Lemma 4.5). Since
2 m−1
f (xq ) = f (x)q for each f ∈ Fq [x], the elements αq , αq , . . . , αq ∈ Fqm are also
i
roots of f . Furthermore, the αq for 0 ≤ i < m are all distinct (we will prove this in
Section 14.10 when α is a primitive root of unity). Hence they are all roots of f ,
m−1
and f = (x − α)(x − αq ) · · · (x − αq ). The minimal polynomial of an element
β ∈ Fqm is the monic (nonzero) polynomial f ∈ Fq [x] of least degree such that
f (β ) = 0. It exists and is unique, and for all g ∈ Fq [x], we have g(β ) = 0 if and
only if f | g. These basic facts about finite fields are explained in Section 25.4.
(i) β^n = 1, and
(ii) β^k ≠ 1 for 0 < k < n.
Thus a primitive nth root of unity is just an element of order n (Section 25.1)
in the multiplicative group F×qm = Fqm \ {0}. Such roots of unity will play a major
role for the Fast Fourier Transform in Section 8.2. We are now in a position to say
what a BCH code is.
DEFINITION 7.3. Let q = p^r for some prime p and let n, δ ≥ 1, β a primitive nth
root of unity in some extension Fqm of Fq , and g ∈ Fq [x] the monic lcm of the
minimal polynomials of β , β^2 , . . . , β^{δ−1} . Then the vector space
\[
  C = \sum_{0 \le i < n - \deg g} x^i g \cdot F_q \subseteq F_q[x]/\langle x^n - 1 \rangle = R \cong F_q^{\,n}
\]
is the BCH code BCH(q, n, δ ).
The notation BCH(q, n, δ ) does not reflect the fact that the code depends on
the choice of the primitive nth root of unity β , but the properties of the code (in
particular, its minimal distance) are essentially independent of β . We will discuss
in Section 14.10 how to construct BCH codes in general, and only give an example
here.
EXAMPLE 7.4. We will construct all BCH codes of length 15 over F2 . The fac-
torization of x15 − 1 over F2 into irreducible factors is
mod 2. We take F16 = F2 [x]/⟨ f5 ⟩, as in Example 7.1 (iii). For β = x mod f5 ∈ F16 ,
the elements β 3 , β 2 , β , 1 form a basis of F16 over F2 , and
F16 = {a3 β 3 + a2 β 2 + a1 β + a0 : a3 , a2 , a1 , a0 ∈ F2 }.
The parameter δ is called the designed distance of the BCH code. The next
theorem shows that the minimal distance is at least as great.
THEOREM 7.5.
The minimal distance d(C) of the code C = BCH(q, n, δ ) is at least δ .
Furthermore we have a primitive nth root of unity β ∈ Fqm for some m ≥ 1, and for
a ∈ R we have
If we divide it by β^i , we obtain
\[
  \begin{pmatrix} 1 \\ \beta^i \\ \vdots \\ \beta^{(\delta-2)i} \end{pmatrix},
\]
Table 7.1 shows that the minimal distance of a BCH code can be strictly larger
than the designed distance.
Now we will see how the decoding of a BCH code works. Let C = BCH(q, n, δ )
be given via β , and let δ be odd. Suppose that c ∈ C is the transmitted and r the
received word. We want to correct up to t = (δ − 1)/2 errors. Let e = r − c = ∑_{i∈M} e_i x^i
be the error, where M = {0 ≤ i < n: e_i ≠ 0} is the set of error positions, and let
\[
  u = \prod_{i \in M} (1 - \beta^i y), \qquad v = \sum_{i \in M} e_i \beta^i y \prod_{j \in M \setminus \{i\}} (1 - \beta^j y)
\]
in Fqm [y]. Then #M ≤ t, deg u ≤ t, and deg v ≤ t. If we know u and v, then the errors can
be corrected in the following way. By evaluating u at 1, β −1 , β −2 , . . . , β −n+1 , we
obtain M. If i ∈ M, then we use the following observations to calculate ei (this is
only necessary, of course, if q > 2). The formal derivative u′ of u with respect to y
(Section 9.3) is
\[
  u' = \sum_{i \in M} (-\beta^i) \prod_{j \in M \setminus \{i\}} (1 - \beta^j y).
\]
Thus
\[
  v(\beta^{-i}) = e_i \prod_{j \in M \setminus \{i\}} (1 - \beta^{j-i}) = -e_i \beta^{-i} u'(\beta^{-i}),
\]
and hence
\[
  e_i = \frac{-v(\beta^{-i})\, \beta^i}{u'(\beta^{-i})}.
\]
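These identities are easy to check symbolically; the following MAPLE sketch uses a made-up error set M = {2, 5}, symbolic error values e2, e5, and writes b for β.
u := (1 - b^2*y)*(1 - b^5*y):
v := e2*b^2*y*(1 - b^5*y) + e5*b^5*y*(1 - b^2*y):
simplify(-subs(y = 1/b^2, v)*b^2/subs(y = 1/b^2, diff(u, y)) - e2);   # 0
simplify(-subs(y = 1/b^5, v)*b^5/subs(y = 1/b^5, diff(u, y)) - e5);   # 0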
To compute u and v, we define
\[
  w = \frac{v}{u} = \sum_{i \in M} \frac{e_i \beta^i y}{1 - \beta^i y} = \sum_{i \in M} \sum_{k \ge 1} e_i (\beta^i y)^k = \sum_{k \ge 1} y^k \sum_{i \in M} e_i \beta^{ki} = \sum_{k \ge 1} y^k e(\beta^k).
\]
Notes. Coding theory was founded by Shannon (1948). There are many good texts
available, among them Berlekamp (1984), MacWilliams & Sloane (1977), and van Lint
(1982). The coding technology for CDs is described in detail in Hoffman, Leonard, Lind-
ner, Phelps, Rodger & Wall (1991).
For arbitrary codes, it is not clear how to decode them efficiently, and, in fact, a suffi-
ciently general version of the decoding problem is NP-complete (Berlekamp, McEliece
& van Tilborg 1978). BCH codes were discovered by Bose & Ray-Chaudhuri (1960) and
independently by Hocquenghem (1959). Berlekamp (1984), already in the 1968 edition,
and Massey (1965) discovered the decoding procedure for BCH codes, in a different for-
malism, and Dornstetter (1987) pointed out the relation to the Euclidean Algorithm.
Rabin (1989), Albanese, Blömer, Edmonds, Luby & Sudan (1994), and Alon, Edmonds
& Luby (1995) describe erasure codes , a related class of codes which is used for commu-
nication over faulty networks that occasionally lose (or delay) packets (but do not change
them).
Exercises.
7.1 Let F be a field, k < n positive integers, and u1 , . . ., un ∈ F distinct. For f ∈ F[x], let χ( f ) =
( f (u1 ), . . ., f (un )) ∈ F n , that is, χ is the evaluation map at u1 , . . ., un . We define the linear code
C ⊆ F n by C = {χ( f ): f ∈ F[x], deg f ≤ k}. Show that C has minimal distance n − k.
7.2 Compute the (2, 1)-Padé approximant to w from (1).
7.3 Determine generator polynomials and minimal distances of all BCH codes for q = 2 and n = 7.
Hint: The polynomial x7 − 1 ∈ F2 [x] factors into three irreducible polynomials
x7 − 1 = (x + 1)(x3 + x + 1)(x3 + x2 + 1),
and β = x mod x3 + x + 1 ∈ F8 = F2 [x]/hx3 + x + 1i is a primitive 7th root of unity.
7.4 Let C = BCH(2, 7, 3) be generated by g = x3 + x + 1 ∈ F2 [x], and β = x mod g be as in Exer-
cise 7.3. Assuming that at most one error has occurred, decode the received words
r1 = x6 + x5 + x3 + 1 mod x7 − 1, r2 = x6 + x + 1 mod x7 − 1.
Find a codeword c ∈ C such that d(r2 − c) = 2.
7.5−→ Let q = 11 and n = 10.
(i) Prove that β = 2 ∈ Fq is a primitive nth root of unity.
(ii) Show that the polynomial x10 − 1 splits into linear factors over Fq .
(iii) Tabulate generator polynomials and minimal distances of all BCH codes for the above values
of q, n, and β.
(iv) Let C = BCH(11, 10, 5). Check that the generator polynomial for C is g = x4 + 3x3 + 5x2 +
8x + 1. Assuming that at most two errors have occurred, decode the received word
r = x6 + 7x + 4 mod x10 − 1 ∈ F11 [x]/hx10 − 1i.
Part II
Newton
Isaac Newton (1642–1727) had a rather tough childhood. His father died during
his mother’s pregnancy and his mother remarried when he was three years
old—and left little Isaac in the care of his grandmother.
In 1661, Newton entered Trinity College in Cambridge, and graduated with a
BA in 1664, after an unimpressive student career. But then the university shut
down for two years because of the Great Plague, and Newton, back in his native
Woolsthorpe, laid the ground for much of his future work in the anni mirabiles
1664–1666. He invented calculus (his method of fluxions) and the law of
gravitation, and showed by experiment the prismatic composition of white light.
All this before he turned 25. (Inventing calculus means that he developed a
widely applicable theory; its roots go back, of course, to the work of many
people, Archimedes and Fermat among them.)
Back at Cambridge, Newton became Lucasian Professor of Mathematics, at the
age of 26. His former teacher, Isaac Barrow, resigned from that position to make
way for the greater scientist (and to prepare his own move into a better position as
chaplain to King Charles II). At that time, Newton was the prototype of the
“forgetful professor”, rather negligent about trifles such as his appearance. His
nephew Humphrey Newton wrote: He very rarely went to Dine in ye Hall unless
upon some Publick Dayes, & then, if He has not been minded, would go very
carelesly, wth Shooes down at Heels, Stockins unty’d, surplice on, & his Head
scarcely comb’d.
Newton did not publish his early results; this was later to work against him in
disputes over priority. This was partly due to the publishers who were reluctant to
invest in a money pit like a mathematical monograph. A 1672 paper on optics
was so heavily criticized by overbearing referees (against whose beliefs his new
and correct theory ran) that he withdrew it in the end.
Finally, he published in 1687 his masterpiece Philosophiae Naturalis Principia
Mathematica, containing his discoveries in mechanics and astronomy.
In the summer of 1669, Newton had finished his De Analysi per Æquationes
Numero Terminorum Infinitas. It circulated among English mathematicians and
also abroad (Scotland and France), but appeared in print only in 1711. Among
other things, he describes what is now called Newton’s method (or Newton
iteration) for approximating real roots of polynomial equations. He takes
ϕ = y3 − 2y − 5 ∈ Q[y] as an example, and proceeds as follows:
be added to the Quotient: viz. thus (neglecting p3 + 6p2 upon the
Account of their smallness) 10p − 1 = 0, or p = 0.1 is near the Truth;
therefore I write 0.1 in the Quotient, and then suppose 0.1 + q = p,
and this it’s value I substitute, as formerly, whence results
q^3 + 6.3 q^2 + 11.23 q + 0.061 = 0.
And since 11.23 q + 0.061 = 0 comes near to the Truth, or since q is
almost equal to −0.0054 ( viz. by dividing until as many Figures arise
as there are places betwixt the first Figures of this and the principal
Quotient) I write −0.0054 in the lower Part of the Quotient, since it is
negative.
His choice of the starting point 2 is justified by the fact that ϕ(2) = −1 < 0
< 16 = ϕ(3), so that there is a root of ϕ between 2 and 3. This example later
became a standard test for root finding methods. Joseph Raphson (1690)
discussed this approach, acknowledging Newton as the originator, and it is
sometimes called the Newton–Raphson method. Newton himself calls it “an
improved version of the procedure, expounded by Viète and simplified by
Oughtred”. We will use Newton iteration in Chapters 9 and 15.
After many years of studying
religious subjects (in particular,
the Biblical chronology), Newton
turned to public office, serving as
Member of (powerless) Parliament
until its dissolution in 1690.
In 1699, he was awarded the
moderately prestigious office of
Warden of the Mint. Bell (1937) writes scathingly about this elevation: “The
crowning imbecility of the Anglo-Saxon breed is its dumb belief in public office
or an administrative position as the supreme honour for a man of intellect.”—is
that restricted to one “breed”?
The “universal genius” Leibniz, in Hannover, invented calculus independently,
probably in the mid-1670s, and published this in 1684, before Newton published
his ideas, almost two decades old by then. At first, the two men seem to have had
mutual respect for each other’s achievements. But fueled by the nationalism of
the day (so, what else is new?), this degenerated into one of the bitterest
controversies about priority in the history of science, an embarrassment for all
persons involved.
Sir Isaac Newton, knighted by Queen Anne in 1705, was President of the
Royal Society until his death at the age of 85.
Classical in this context came to mean something like make-believe.
Richard Phillips Feynman (1984)
1 Any mathematical task could, in principle, be solved by direct counting. However, there are counting problems
that can presently be solved in a few minutes, but for which without mathematical method a lifetime would not
be sufficient.
8
Fast multiplication
In this chapter, we introduce fast methods for multiplying integers and polyno-
mials. We start with a simple method due to Karatsuba which reduces the cost
from the classical O(n2 ) for polynomials of degree n to O(n1.59 ). The Discrete
Fourier Transform and its efficient implementation, the Fast Fourier Transform,
are the backbone of the fastest algorithms. These work only when appropriate
roots of unity are present, but Schönhage & Strassen (1971) showed how to cre-
ate “virtual” roots that lead to a multiplication cost of only O(n log n loglog n). In
Chapter 9, Newton iteration will help us extend this to fast division with remainder.
General-purpose computer algebra systems typically only implement the clas-
sical method, and sometimes Karatsuba’s. This is quite sufficient as long as one
deals with fairly small numbers or polynomials, but for many high-performance
tasks fast arithmetic is indispensable. Examples include factoring large polyno-
mials (Section 15.7), finding primes and twin primes (Notes to Chapter 18), and
computing billions of digits of π (Section 4.6) or billions of roots of Riemann’s
zeta function (Notes 18.4).
Asymptotically fast methods are standard tools in many areas of computer sci-
ence, where, say, O(n log n) sorting algorithms like quicksort or mergesort are
widely used and experiments show that they outperform the “classical” O(n2 ) sort-
ing algorithms like bubble sort or insertion sort already for values of n below 100.
In contrast, the asymptotically fast algorithms for polynomial and integer arith-
metic, in particular for multiplication, have received until recently comparatively
little attention in the computer algebra world since their invention around 1970.
Some of the reasons may be that the fast algorithms are often considerably more
complicated than the classical ones, and that the crossover points between these
algorithms may be disappointingly high when the algorithms are implemented “lit-
erally” as described in textbooks, without any further optimization. On the other
hand, experiments with highly optimized software, such as those described in Sec-
tion 9.7, show that Karatsuba’s algorithm has a fairly small crossover with the
classical algorithm, and that even faster multiplication algorithms already come
into play for moderately sized inputs. Designers of a computer algebra system
should carefully determine the crossover points and then, depending on the size of
the problems their system is intended to solve, decide which algorithms to offer.
Last but not least there is the intellectual beauty of asymptotic analysis. Com-
plexity theory provides a precise framework in which to compare algorithms via
their asymptotic running time (or some other measure, such as memory or paral-
lel time). For our problems, both Boolean and arithmetic complexity theory play
a role; Bürgisser, Clausen & Shokrollahi (1997) give an overview of the impres-
sive results of the latter. It provides tools for proving lower bounds, saying that
every conceivable algorithm must use at least so and so many operations, and in
lucky cases even that some algorithms are optimal. Furthermore, this crystal-clear
framework allows for a precise statement that “this new method makes progress”
and, if incorrect, its refutation. Practical results, such as the experiments reported
in this book, are also important, but often open to dispute as they take place in
the muddy waters of difficult-to-compare computing environments, difference of
opinion on the “important cases and examples”, and difficulty to reproduce. The
latter is extremely laborious or virtually impossible for a large implementation ef-
fort. As an example, Fürer (2009) improved the O(n log n loglog n) multiplication
algorithm of Schönhage & Strassen (1971); it is difficult to imagine this status of
a well-defined challenge for over a quarter century for an experimental problem.
Some areas have an accepted set of concrete benchmark problems, such as the
“most wanted” Cunningham numbers for integer factorization (see Chapter 19).
But even there, asymptotic progress is the holy grail.
The lower table on the inside back cover lists the problems in polynomial al-
gebra for which we will achieve almost linear-time algorithms in the following
chapters. The algorithms work over an arbitrary ring or field, and fast polynomial
multiplication is crucial for them. For all these problems, the input size is about n,
and the classical algorithms from Part I take quadratic time. All algorithms have
integer analogs, which—as always—are more complicated due to the carries, with
about the same running time in word operations when the input consists of n words.
recursive application will drastically reduce the overall cost (see Figure 8.2). To
explain the general approach, we assume that n = 2k for some k ∈ N, set m = n/2,
and rewrite f and g in the form f = F1 xm + F0 with F0 , F1 ∈ R[x] of degree less than
m and similarly g = G1 xm + G0 . (If deg f < n − 1, then some of the top coefficients
are zero.) Now f g = F1 G1 xn + (F0 G1 + F1 G0 )xm + F0 G0 . In this form, multiplica-
tion of f and g has been reduced to four multiplications of polynomials of degree
less than m. Multiplication by a power of x does not count as a multiplication,
since it corresponds merely to a shift of the coefficients.
So far we have not really achieved anything. But the method by Karatsuba in
Karatsuba & Ofman (1962), explained for n = 1 above, shows how this expression
for f g can be rearranged to reduce the number of multiplications of the smaller
polynomials at the expense of increasing the number of additions. Since multipli-
cation is slower than addition, a saving is obtained when n is sufficiently large. We
rewrite the product as f g = F1 G1 xn + ((F0 + F1 )(G0 + G1 ) − F0 G0 − F1 G1 )xm +
F0 G0 . This expression shows that multiplication of f and g requires only three
multiplications of polynomials of degree less than m and some additions. The
same method is now applied recursively to the smaller multiplications. If T (n)
denotes the time necessary to multiply two polynomials of degree less than n, then
T (2n) ≤ 3T (n) + cn, for some constant c. The linear term comes from the obser-
vation that addition of two polynomials of degree less than d can be done with d
operations in R.
Here is the corresponding algorithm.
1. if n = 1 then return f · g ∈ R
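The remaining steps split f and g into halves and recombine the three recursive products as described above. As a concrete illustration, here is a minimal Python sketch of this scheme (our own, not the book's pseudocode), using coefficient lists over Z as a stand-in for an arbitrary ring R and assuming both inputs have the same length n, a power of 2.

def karatsuba(f, g):
    """Multiply coefficient lists f, g of equal length n (a power of 2)."""
    n = len(f)
    if n == 1:
        return [f[0] * g[0]]                     # step 1: constant polynomials
    m = n // 2
    F0, F1 = f[:m], f[m:]                        # f = F1*x^m + F0
    G0, G1 = g[:m], g[m:]                        # g = G1*x^m + G0
    A = karatsuba(F0, G0)                        # F0*G0
    B = karatsuba(F1, G1)                        # F1*G1
    C = karatsuba([a + b for a, b in zip(F0, F1)],
                  [a + b for a, b in zip(G0, G1)])   # (F0+F1)(G0+G1)
    mid = [c - a - b for c, a, b in zip(C, A, B)]    # F0*G1 + F1*G0
    h = [0] * (2 * n - 1)
    for i, a in enumerate(A):
        h[i] += a
    for i, c in enumerate(mid):
        h[i + m] += c
    for i, b in enumerate(B):
        h[i + n] += b
    return h

# example: (1 + 2x + 3x^2 + 4x^3) * (5 + 6x + 7x^2 + 8x^3)
print(karatsuba([1, 2, 3, 4], [5, 6, 7, 8]))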
Figure 8.1 visualizes this algorithm in the form of an arithmetic circuit for n = 4.
We first need a lemma which will be helpful in the analysis of several recursive
algorithms. T (n) will be the cost on input size n of the algorithm, which consists
of b recursive calls with inputs of size n/2, plus some cost denoted by S(n). We
denote by log the binary logarithm.
FIGURE 8.1: An arithmetic circuit illustrating Karatsuba's algorithm for n = 4, with inputs f_0, ..., f_3, g_0, ..., g_3 and outputs h_0, ..., h_6. The shaded boxes are Karatsuba circuits for n = 2. A subtraction node computes the difference of its left input minus its right input. The flow of control is from top to bottom.
LEMMA 8.2. Let b, c ∈ R_{>0}, d ∈ R_{≥0}, and S, T: N_{≥1} → N_{≥1} be functions such that S(2n) ≥ c S(n) for all n ∈ N_{≥1}, and
    T(1) = d,    T(n) ≤ b T(n/2) + S(n)   for n = 2^i and i ∈ N_{≥1}.
Then for i ∈ N and n = 2^i we have
    T(n) ≤ d n^{log b} + S(n) log n                            if b = c,
    T(n) ≤ d n^{log b} + (c/(b − c)) S(n) (n^{log(b/c)} − 1)   if b ≠ c.
If S(1) > 0, S and T are non-decreasing, and S(2n) ≤ e S(n) for some e ∈ R_{>0} and all n ∈ N_{≥1}, then
    T(n) ∈ O(S(n) log n)         if b = c,
    T(n) ∈ O(S(n) n^{log(b/c)})  if b ≠ c.
where the first inequality follows by induction, and we use S(2^{i−j}) ≤ c^{−j} S(2^i) in the second one. If b = c, then the last sum simplifies to S(2^i) · i. If b ≠ c, then we have a geometric sum
    Σ_{0≤j<i} (b/c)^j = ((b/c)^i − 1) / (b/c − 1) = (c/(b − c)) (2^{i log(b/c)} − 1),
    T(n) ≤ T(2^i) ≤ a · S(2^i) 2^{i log(b/c)} ≤ a · S(2n) · (2n)^{log(b/c)} ≤ (aeb/c) S(n) n^{log(b/c)},
and similarly for the case b = c. ✷
THEOREM 8.3. Karatsuba's algorithm 8.1 multiplies polynomials of degree less than a power n of 2 over a ring using at most 9 n^{log 3} or O(n^{1.59}) ring operations.
This is a substantial improvement over the classical method, since log 3 < 2.
The savings are visualized in Figure 8.2.
FIGURE 8.2: Cost (= black area) of Karatsuba's algorithm for increasing recursion depths (classical, 1 iteration, ..., 5 iterations). The image approaches a fractal of dimension log_2 3 ≈ 1.59.
If n is not a power of 2, then there are two ways to proceed. The first one is
to apply the algorithm for the least power of 2 that is greater than n, that is, for
2⌈log n⌉ . This is easy to analyze, but introduces an additional factor of 3 in the
running time, which is annoying if n is only slightly greater than a power of 2. The
second possibility is to split the polynomials into blocks of about half the degree
each time in the recursive process. This leads to an algorithm that performs better
than the first variant, but the analysis is somewhat more involved; see Exercise 8.5.
The same method applies to multiplication of two (positive) integers a and b in r-ary representation, say with r = 2^{64} (see Section 2.1). When they have length at most n, the classical integer multiplication algorithm requires O(n^2) word operations (Section 2.3). Karatsuba's algorithm for integers writes a = A_1 2^{64m} + A_0 and b = B_1 2^{64m} + B_0, where A_0, A_1, B_0, B_1 < 2^{64m} and n = 2m is assumed to be a power of 2. Then ab = A_1 B_1 2^{64n} + ((A_0 + A_1)(B_0 + B_1) − A_0 B_0 − A_1 B_1) 2^{64m} + A_0 B_0. As in the polynomial case, multiplication of two integers has been reduced to multiplication of three integers of at most half the size plus O(n) word operations (if A_0 + A_1 or B_0 + B_1 exceeds 2^{64m}, then one has to take extra care of the leading bits, or alternatively one computes A_0 B_1 + A_1 B_0 = A_0 B_0 + A_1 B_1 − (A_0 − A_1)(B_0 − B_1)). This leads to the following theorem.
THEOREM 8.4. Multiplication of two integers of length at most n words can be done with O(n^{log 3}) or O(n^{1.59}) word operations using Karatsuba's algorithm.
8.2. The Discrete Fourier Transform and the Fast Fourier Transform
In this section, we discuss a polynomial multiplication algorithm which works in
nearly linear time. It requires that the coefficient ring contain certain roots of unity.
We recall that an element a of a ring R is a zero divisor if there exists a nonzero
b ∈ R with ab = 0. In particular, 0 is a zero divisor (unless R is the trivial ring
{0}). The reader should note that in many algebra texts, 0 is not considered a zero
divisor.
FIGURE 8.3: The 8th roots of unity e^{2πi·k/8}, 0 ≤ k < 8, in C. The black square has order 1, the black circle order 2, the two gray circles order 4, and the four white circles are the primitive 8th roots of unity.
EXAMPLE 8.6. (i) ω = e^{2πi/8} ∈ R = C is a primitive 8th root of unity, where i = √−1; see Figure 8.3.
(ii) Z_8 has no primitive square root of unity, despite the fact that 3^2 ≡ 1, since 2 is not a unit.
(iii) For the "Fermat prime" 2^4 + 1 = 17, the element 3 is a primitive 16th root of unity in Z_17, and 2 is not. ✸
The following lemma extends the property of Definition 8.5 (ii) for ω^ℓ − 1 from ℓ = n/t to all ℓ that are not divisible by n.
LEMMA 8.7. Let R be a ring, ℓ, n ∈ N_{≥1} such that 1 ≤ ℓ < n, and ω ∈ R a primitive nth root of unity. Then
(i) ω^ℓ − 1 is not a zero divisor in R,
(ii) Σ_{0≤j<n} ω^{ℓj} = 0.
PROOF. We repeatedly use the identity
    (c − 1) Σ_{0≤j<m} c^j = c^m − 1,    (1)
which holds for all m ∈ N and c ∈ R (in fact, even for an indeterminate c).
(i) Let g = gcd(ℓ, n) and u, v ∈ Z so that uℓ + vn = g. Since 1 ≤ g < n, we can choose a prime divisor t of n so that g divides n/t. Letting c = ω^g and m = n/(tg) in (1), we obtain a · (ω^g − 1) = ω^{n/t} − 1 for some a ∈ R. If b ∈ R satisfies b · (ω^g − 1) = 0, then also b · (ω^{n/t} − 1) = 0, and hence b = 0 since ω^{n/t} − 1 is not a zero divisor. Thus ω^g − 1 is not a zero divisor either.
Furthermore, (1) with c = ω^ℓ and m = u implies that ω^ℓ − 1 divides ω^{uℓ} − 1 = ω^{uℓ} ω^{vn} − 1 = ω^g − 1. The same argument as above implies that ω^ℓ − 1 is not a zero divisor.
(ii) By letting c = ω^ℓ and m = n in (1), we see that
    (ω^ℓ − 1) Σ_{0≤j<n} (ω^ℓ)^j = ω^{ℓn} − 1 = 0.
Since ω^ℓ − 1 is not a zero divisor, by (i), the sum Σ_{0≤j<n} ω^{ℓj} must be zero. ✷
When R is an integral domain (for example, a field), then (i) simply says ω^ℓ ≠ 1, and it is sufficient to check this for ℓ = n/t, with t running through the prime divisors of n, for the last property required of a primitive nth root of unity.
The following lemma, proven in Exercise 8.18, says when primitive roots of
unity exist in a finite field Fq (as defined on page 73) with q elements.
LEMMA 8.8. For a prime power q and n ∈ N, a finite field F_q contains a primitive nth root of unity if and only if n divides q − 1.
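For a prime q = p, this lemma suggests a simple search. The following Python sketch is our own illustration (the function names are ours): assuming p prime and n | p − 1, it returns the least a with a^n = 1 and a^{n/t} ≠ 1 for every prime divisor t of n. For p = 17 and n = 16 it returns 3, in accordance with Example 8.6 (iii).

def prime_divisors(n):
    ps, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            ps.add(d)
            n //= d
        d += 1
    if n > 1:
        ps.add(n)
    return ps

def primitive_root_of_unity(n, p):
    """Least primitive nth root of unity in F_p; requires n | p - 1 (Lemma 8.8)."""
    assert (p - 1) % n == 0
    for a in range(1, p):
        if pow(a, n, p) == 1 and all(pow(a, n // t, p) != 1 for t in prime_divisors(n)):
            return a
    return None

print(primitive_root_of_unity(16, 17))   # -> 3, cf. Example 8.6 (iii)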
    h = f ∗_n g = Σ_{0≤ℓ<n} h_ℓ x^ℓ ∈ R[x],
where
    h_ℓ = Σ_{j+k≡ℓ mod n} f_j g_k = Σ_{0≤j<n} f_j g_{ℓ−j}   for 0 ≤ ℓ < n,
with index arithmetic modulo n. If n is clear from the context, we will simply write ∗ for ∗_n. If we regard the coefficients as vectors in R^n, then h is called the cyclic convolution of the vectors f and g.
The special form of the binomial x^4 − u makes division with remainder particularly easy: the quotient is the upper part of fg, and the remainder is the lower part of fg plus u times the upper part. In particular, for u = 1 we have fg ≡ 3x^3 + 5x^2 + 4x + 2 mod (x^4 − 1), or equivalently, f ∗_4 g = 3x^3 + 5x^2 + 4x + 2. A similar phenomenon will help us in the FFT algorithm below. ✸
We may also consider the map R[x] → R^n that evaluates f at ω^0, ..., ω^{n−1}. Its kernel is ⟨x^n − 1⟩, and the lemma says that DFT_ω: R[x]/⟨x^n − 1⟩ → R^n is a homomorphism of R-algebras, where multiplication in R^n is pointwise multiplication of vectors. The following commutative diagram illustrates this:

    (R[x]/⟨x^n − 1⟩)^2  ──DFT_ω × DFT_ω──▶  R^n × R^n
         │ cyclic                              │ pointwise                (2)
         │ convolution                         │ multiplication
         ▼                                     ▼
    R[x]/⟨x^n − 1⟩      ────── DFT_ω ─────▶   R^n
In fact, DFTω is an isomorphism. If R is a field, then this is the special case of the
Chinese Remainder Theorem 5.3, where m j = x − ω j for 0 ≤ j < n. We discussed
in Section 5.1 the general principle of change of representation and will now see
how the particular example (2) gives rise to a fast multiplication algorithm.
In the following, polynomials of degree less than n over an integral domain R
are—besides the usual dense representation by their coefficient vectors—represen-
ted by their values at n distinct points u0 , . . . , un−1 ∈ R, namely the powers u j = ω j
for 0 ≤ j < n of a primitive nth root of unity ω ∈ R. The reason for consider-
ing the value representation is that multiplication in that representation is easy:
If f (u0 ), . . . , f (un−1 ) and g(u0 ), . . . , g(un−1 ) are the values of two polynomials f
and g with deg( f g) < n at n distinct points, then the values of the product poly-
nomial f g at those points are f (u0 ) · g(u0 ), . . . , f (un−1 ) · g(un−1 ). Hence the cost
of polynomial multiplication in the value representation is linear in the degree,
while we do not know how to multiply polynomials in the dense representation
in linear time. Thus a fast way of doing multipoint evaluation and interpolation
leads to a fast polynomial multiplication algorithm: evaluate the two input poly-
nomials, multiply the results pointwise, and finally interpolate to get the product
polynomial.
The Discrete Fourier Transform is a special multipoint evaluation at the powers
1, ω , . . . , ω n−1 of a primitive nth root of unity ω , and we will now show that both
the DFT and its inverse, the interpolation at the powers of ω , can be computed with
O(n log n) operations in R, and thus obtain an O(n log n) multiplication algorithm
for polynomials. In Chapter 10, we will see a fast algorithm for evaluation and
interpolation at arbitrary points.
First we show that interpolation at the powers of ω is essentially again a Discrete
Fourier Transform. The Vandermonde matrix
    V_ω = VDM(1, ω, ..., ω^{n−1}) =
        [ 1    1           1             ...   1              ]
        [ 1    ω           ω^2           ...   ω^{n−1}        ]
        [ 1    ω^2         ω^4           ...   ω^{2(n−1)}     ]
        [ :    :           :                   :              ]
        [ 1    ω^{n−1}     ω^{2(n−1)}    ...   ω^{(n−1)^2}    ]
      = (ω^{jk})_{0≤j,k<n} ∈ R^{n×n}
is the matrix of the multipoint evaluation map DFT_ω (Section 5.2).
EXAMPLE 8.12. For the primitive 4th root of unity ω = i = √−1 ∈ C, we have
    V_i = VDM(1, i, −1, −i) =
        [ 1    1     1     1  ]
        [ 1    i    −1    −i  ]
        [ 1   −1     1    −1  ]
        [ 1   −i    −1     i  ].   ✸
THEOREM 8.13. Let R be a ring (commutative, with 1), n ∈ N_{≥1}, and ω ∈ R be a primitive nth root of unity. Then ω^{−1} is a primitive nth root of unity and V_ω · V_{ω^{−1}} = nI, where I is the n × n identity matrix.
PROOF. Exercise 8.13 shows that ω^{−1} is a primitive nth root of unity. Let 0 ≤ j, ℓ < n, and
    u = (V_ω · V_{ω^{−1}})_{jℓ} = Σ_{0≤k<n} (V_ω)_{jk} (V_{ω^{−1}})_{kℓ} = Σ_{0≤k<n} ω^{jk} ω^{−kℓ} = Σ_{0≤k<n} (ω^{j−ℓ})^k.
If j = ℓ, then u = n; otherwise u = 0, by Lemma 8.7 (ii) applied to ω or to ω^{−1}. ✷
In particular, the theorem implies that (V_ω)^{−1} = n^{−1} V_{ω^{−1}}, so that computing the inverse is fairly easy.
EXAMPLE 8.12 (continued).
    V_i^{−1} = (1/4) ·
        [ 1    1     1     1  ]
        [ 1   −i    −1     i  ]
        [ 1   −1     1    −1  ]
        [ 1    i    −1    −i  ].   ✸
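The identity V_ω · V_{ω^{−1}} = nI is easy to check numerically. Here is a small Python check of our own over F_17 with n = 4, using ω = 4 (one verifies 4^2 ≡ −1 and 4^4 ≡ 1 mod 17, so 4 is a primitive 4th root of unity; the choice of this example is ours).

p, n, w = 17, 4, 4                    # 4 is a primitive 4th root of unity mod 17
winv = pow(w, -1, p)

V     = [[pow(w,    j * k, p) for k in range(n)] for j in range(n)]
V_inv = [[pow(winv, j * k, p) for k in range(n)] for j in range(n)]

product = [[sum(V[j][k] * V_inv[k][l] for k in range(n)) % p
            for l in range(n)] for j in range(n)]
print(product)                        # n*I mod p: 4 on the diagonal, 0 elsewhere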
The key step is to divide f with remainder by x^{n/2} − 1 and by x^{n/2} + 1, writing
    f = q_0 · (x^{n/2} − 1) + r_0 = q_1 · (x^{n/2} + 1) + r_1    (3)
for some q_0, r_0, q_1, r_1 ∈ R[x] of degree less than n/2. Due to the special form of the divisor polynomials, the computation of the remainders r_0 and r_1 (we do not actually need the quotients) can be done by adding the upper n/2 coefficients of f to, respectively subtracting them from, the lower n/2 coefficients, as in Example 8.10, at a total cost of n operations in R. In other words, if f = F_1 x^{n/2} + F_0 with deg F_0, deg F_1 < n/2, then x^{n/2} − 1 divides f − F_0 − F_1, and hence r_0 = F_0 + F_1; similarly r_1 = F_0 − F_1. If we plug in a power of ω for x in (3), we find
    f(ω^{2ℓ}) = q_0(ω^{2ℓ})(ω^{nℓ} − 1) + r_0(ω^{2ℓ}) = r_0(ω^{2ℓ}),
    f(ω^{2ℓ+1}) = q_1(ω^{2ℓ+1})(ω^{nℓ} ω^{n/2} + 1) + r_1(ω^{2ℓ+1}) = r_1(ω^{2ℓ+1})
for all 0 ≤ ℓ < n/2. We have used the facts that ω^{nℓ} = 1 and ω^{n/2} = −1, since 0 = ω^n − 1 = (ω^{n/2} − 1)(ω^{n/2} + 1) and ω^{n/2} − 1 is not a zero divisor. It remains to evaluate r_0 at the even powers of ω and r_1 at the odd powers. Now ω^2 is a primitive (n/2)th root of unity (Exercise 8.13), and hence the first task is a DFT of order n/2. But also the evaluation of r_1 can be reduced to a DFT of order n/2 by noting that r_1(ω^{2ℓ+1}) = r_1^*(ω^{2ℓ}) for r_1^* = r_1(ωx). The computation of the coefficients of r_1^* uses n/2 multiplications by powers of ω. If n is a power of 2, we can proceed recursively to evaluate r_0 and r_1^* at the powers 1, ω^2, ..., ω^{2n−2} of ω^2, and obtain the following algorithm.
1. if n = 1 then return ( f0 )
FIGURE 8.4: A butterfly operation (left) and an arithmetic circuit computing the FFT for n = 8 (right). Each butterfly takes inputs f_j and f_{j+n/2} and outputs f_j + f_{j+n/2} and (f_j − f_{j+n/2}) ω^j. A subtraction node computes the difference of its left input minus its right input.
THEOREM 8.15. Let n be a power of 2 and ω ∈ R be a primitive nth root of unity. Then Algorithm 8.14 correctly computes DFT_ω using n log n additions in R and (n/2) log n multiplications by powers of ω, in total (3/2) n log n ring operations.
For 0 ≤ ℓ < n/2, we have
    r_0(ω^{2ℓ}) = Σ_{0≤j<n/2} (f_j + f_{j+n/2}) ω^{2ℓj} = Σ_{0≤j<n/2} f_j ω^{2ℓj} + Σ_{0≤j<n/2} f_{j+n/2} ω^{2ℓj} ω^{ℓn}
              = Σ_{0≤j<n} f_j ω^{2ℓj} = f(ω^{2ℓ}),
    r_1^*(ω^{2ℓ}) = Σ_{0≤j<n/2} (f_j − f_{j+n/2}) ω^j ω^{2ℓj} = Σ_{0≤j<n} f_j ω^{(2ℓ+1)j} = f(ω^{2ℓ+1}).
Let S(n) and T (n) denote the number of additions and multiplications in R,
respectively, that the algorithm uses for input size n. The cost for the individ-
ual steps is: 0 in steps 1 and 4, n additions and n/2 multiplications in step 2,
and 2S(n/2) additions and 2T (n/2) multiplications in step 3. This yields S(1) =
T (1) = 0, S(n) = 2S(n/2) + n, and T (n) = 2T (n/2) + n/2, and by unfolding the
recursions we find that S(n) = n log n and T(n) = (1/2) n log n. ✷
The FFT can be nicely illustrated in form of an arithmetic circuit. It is built from
elementary blocks that execute step 2 of the above algorithm for one particular
value of j, called a butterfly operation. One such building block as well as the
entire circuit for n = 8 are shown in Figure 8.4.
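As a concrete illustration of this recursion, here is a minimal Python sketch of our own of the fast Fourier transform over a prime field F_p, following the split into r_0 and r_1^* described above; the example input is the polynomial of Exercise 8.10, padded to length 8, with ω = 2 in F_17.

def fft(f, omega, p):
    """DFT_omega(f) for a coefficient list f of length n (a power of 2) over F_p."""
    n = len(f)
    if n == 1:
        return [f[0] % p]                                   # step 1
    half = n // 2
    r0 = [(f[j] + f[j + half]) % p for j in range(half)]    # f rem (x^{n/2} - 1)
    r1s = [((f[j] - f[j + half]) * pow(omega, j, p)) % p    # r1*(x) = r1(omega*x)
           for j in range(half)]
    even = fft(r0, omega * omega % p, p)                    # values at even powers
    odd = fft(r1s, omega * omega % p, p)                    # values at odd powers
    out = [0] * n
    out[0::2] = even
    out[1::2] = odd
    return out

# example over F_17 with the primitive 8th root of unity omega = 2 (Exercise 8.10)
f = [3, -4, 3, 5, 0, 0, 0, 0]            # 5x^3 + 3x^2 - 4x + 3, padded to length 8
print(fft(f, 2, 17))                     # [f(1), f(2), f(2^2), ..., f(2^7)] mod 17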
Figure 8.5 illustrates the cost of Algorithm 8.14 if the recursion is stopped at
depths 0, 1, . . . , 5 and the remaining subproblems are computed by Horner’s rule.
The diagonal lines visualize the linear cost at each recursive step which—in con-
trast to Karatsuba’s method—contributes to the overall cost. Now we use the FFT
to compute convolutions and products of polynomials quickly.
1. compute ω^2, ..., ω^{n−1}
2. α ← DFT_ω(f),  β ← DFT_ω(g)
3. γ ← α · β   { pointwise product }
4. return (1/n) DFT_{ω^{−1}}(γ) = DFT_ω^{−1}(γ)
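The steps translate directly into a short Python sketch (ours): fft below is the same recursive sketch as given after Theorem 8.15, the inverse transform in step 4 is DFT_{ω^{−1}} followed by division by n, and the example polynomials are of our own choosing.

def fft(f, omega, p):                      # same recursive FFT sketch as above
    n = len(f)
    if n == 1:
        return [f[0] % p]
    half = n // 2
    r0 = [(f[j] + f[j + half]) % p for j in range(half)]
    r1s = [((f[j] - f[j + half]) * pow(omega, j, p)) % p for j in range(half)]
    out = [0] * n
    out[0::2] = fft(r0, omega * omega % p, p)
    out[1::2] = fft(r1s, omega * omega % p, p)
    return out

def fast_convolution(f, g, omega, p):
    """f *_n g in F_p[x]/<x^n - 1> for coefficient lists of length n = 2^k."""
    n = len(f)
    alpha = fft(f, omega, p)                              # step 2
    beta = fft(g, omega, p)
    gamma = [a * b % p for a, b in zip(alpha, beta)]      # step 3: pointwise
    h = fft(gamma, pow(omega, -1, p), p)                  # step 4: DFT_{omega^{-1}}
    ninv = pow(n, -1, p)
    return [c * ninv % p for c in h]                      # ... divided by n

# multiply two polynomials of degree < 4 over F_17 (product degree < 8 = n)
p, n, omega = 17, 8, 2
f = [3, 2, 0, 1, 0, 0, 0, 0]               # x^3 + 2x + 3
g = [5, 0, 4, 0, 0, 0, 0, 0]               # 4x^2 + 5
print(fast_convolution(f, g, omega, p))    # coefficients of f*g mod 17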
FIGURE 8.5: Cost of the FFT for increasing recursion depths (classical, 1 iteration, ..., 5 iterations). The black area is proportional to the total work.
THEOREM 8.18. Let R be a ring that supports the FFT, and n = 2^k for some k ∈ N. Then convolution in R[x]/⟨x^n − 1⟩ and multiplication of polynomials f, g ∈ R[x] with deg(fg) < n can be performed using 3n log n additions in R, (3/2) n log n + n − 2 multiplications by powers of ω, n multiplications in R, and n divisions by n, in total (9/2) n log n + O(n) arithmetic operations.
PROOF. The costs of the individual steps of Algorithm 8.16 are:
1. n − 2 multiplications by ω,
2. 2n log n additions and n log n multiplications by powers of ω for the two DFTs,
3. n multiplications,
4. n log n additions, (n/2) log n multiplications by powers of ω, and n divisions by n for the inverse transform.
Summing up gives the claimed bounds. ✷
COROLLARY 8.19. If R supports the FFT, then polynomials in R[x] of degree less than n can be multiplied with 18 n log n + O(n) operations in R.
    f = Σ_{0≤j<t} f_j x^{mj},    g = Σ_{0≤j<t} g_j x^{mj},
with f_j, g_j ∈ R[x] of degree less than m for 0 ≤ j < t. With f' = Σ_{0≤j<t} f_j y^j, g' = Σ_{0≤j<t} g_j y^j ∈ R[x, y], we then have f = f'(x, x^m) and g = g'(x, x^m). It is sufficient to compute f'g' modulo y^t + 1, since
We want to compute h' ∈ R[x, y] with deg_y h' < t satisfying (4) (this uniquely determines h'; see Section 2.4). Comparing coefficients of y^j for j ≥ t, we see that deg_x q' ≤ deg_x(f'g') < 2m and conclude that
With f* = f' mod (x^{2m} + 1), g* = g' mod (x^{2m} + 1), and h* = h' mod (x^{2m} + 1) in D[y], (4) implies
    f* g* ≡ h* mod (y^t + 1) in D[y].    (6)
Since the three polynomials have degrees in x less than 2m, by (5), reducing them modulo x^{2m} + 1 is just taking a different algebraic meaning of the same coefficient array. In particular, the coefficients of h' ∈ R[x][y] can be read off the coefficients of h* ∈ D[y].
The following picture illustrates the relations between h, h' and h*; the arrows are ring homomorphisms.

                h' ∈ R[x, y]
               ↙            ↘
      h ∈ R[x]                h* ∈ D[y]
or
    f*(ηy) g*(ηy) ≡ h*(ηy) mod (y^t − 1),    (7)
since η^t = −1. Given f*(ηy) and g*(ηy) in D[y], Algorithm 8.16 computes h*(ηy) with O(t log t) operations in D, using essentially three t-point FFTs. A multiplication of two elements in D is again a negative wrapped convolution over R which can be handled recursively. Putting things together, we obtain the following algorithm.
1. if k ≤ 2 then
       call the classical algorithm 2.3 (or Karatsuba's algorithm 8.1) to compute f · g
       return f g rem (x^n + 1)
2. m ← 2^{⌊k/2⌋},  t ← n/m
   let f', g' ∈ R[x, y] with deg_x f', deg_x g' < m such that f = f'(x, x^m) and g = g'(x, x^m)
3. let D = R[x]/⟨x^{2m} + 1⟩
   if t = 2m then η ← x mod (x^{2m} + 1) else η ← x^2 mod (x^{2m} + 1)
   { η is a primitive 2t th root of unity }
   f* ← f' mod (x^{2m} + 1),  g* ← g' mod (x^{2m} + 1)
   call the fast convolution algorithm 8.16 with ω = η^2 to compute h* ∈ D[y] of degree less than t such that
    f* = y^2 + 2η + 3,    g* = (2η + 1)y + 4η + 2,
    f*(ηy) = η^2 y^2 + 2η + 3,    g*(ηy) = (2η^2 + η)y + 4η + 2.
Now Algorithm 8.16 (or, in this case, a direct calculation of f*(ηy) g*(ηy)) yields
and in fact equality holds in the second line since the degrees of both sides are less than 8. ✸
THEOREM 8.22. Algorithm 8.20 works correctly and uses (9/2) n log n loglog n + O(n log n) operations in R.
PROOF. Correctness follows from the discussion preceding the algorithm. Let T(k) denote the number of arithmetic operations in R that the algorithm uses on inputs of size n = 2^k. The cost for step 1 is O(1). In step 2, no arithmetic operations in R are performed. By Theorem 8.18, Algorithm 8.16 uses 3t log t additions in D and (3/2) t log t multiplications by powers of ω = η^2 in the FFT steps, plus t divisions by t ∈ R and t "essential" multiplications of two arbitrary elements of D (the powers of ω in step 1 of Algorithm 8.16 need not be computed). One addition in D costs 2m additions in R, one division by t costs 2m divisions in R, and one multiplication of a = Σ_{0≤j<2m} a_j x^j mod (x^{2m} + 1) by a power of η corresponds to a cyclic shift of the coordinates a_i and a sign inversion of the "wrapped around" coordinates, using at most 2m operations in R. Each essential multiplication in D is done recursively, using T(⌊k/2⌋ + 1) operations in R. The computation of f*(ηy), g*(ηy) from f*, g* and of h* = h*(η(η^{−1}y)) from h*(ηy) amounts to 3t multiplications by powers of η. Thus the total cost of step 3 is at most 9mt log t + 8mt + t · T(⌊k/2⌋ + 1). In step 4,
the only cost is at most n = mt additions for the computation of h from h′ . Together,
we have
    T(k) ≤ 2^{⌈k/2⌉} T(⌊k/2⌋ + 1) + 9 · 2^k (⌈k/2⌉ + 1)
if k > 2. Thus
    2^{−k} T(k + 1) + 45 ≤ 2^{⌈(k+1)/2⌉−k} T(⌊(k + 1)/2⌋ + 1) + 90 + 18(⌈(k + 1)/2⌉ + 1) − 45
                         = 2 (2^{−⌈k/2⌉} T(⌈k/2⌉ + 1) + 45) + 18(⌊k/2⌋ − 1/2)
if k > 1, where we used ⌊(k + 1)/2⌋ = ⌈k/2⌉ and ⌈(k + 1)/2⌉ = ⌊k/2⌋ + 1. Writing S(k) = (2^{−k} T(k + 1) + 45)/(k − 1), we obtain
We note that the value of m in step 2 and also the value of n need not be powers of 2; it is sufficient that t be a power of 2 dividing 2m. This allows for choices of n such as n = 3 · 2^k or n = 5 · 2^k, which have the advantage that fewer zeroes have to be padded when deg(fg) is just below such a number. For example, if deg(fg) = 3 · 2^{2l−1} − 1 for some l ∈ N, then the literal approach would use n = 2^{2l+1} and m = 2^l, t = 2^{l+1} in step 2, while it seems better to choose n = 3 · 2^{2l−1}, with m = 3 · 2^{l−1} and t = 2^l in step 2.
Exercise 8.30 discusses the analog of Algorithm 8.20 when 3 is a unit in R, using
a 3-adic FFT. In particular, this covers the case when R is a field of characteristic 2.
What about arbitrary rings R? All divisions in Algorithm 8.16 are by powers of two. We replace the last line by
    4. return DFT_{ω^{−1}}(γ)
Then it uses only additions and multiplications, but no divisions in R, and returns n · (a ∗ b) instead of a ∗ b. If we use this modified algorithm in step 3 of Algorithm 8.20, then Exercise 8.31 shows that the modified algorithm works over any (commutative) ring and returns 2^κ · h for some κ ∈ N. Similar modifications of the 3-adic FFT algorithm from Exercise 8.30 lead to an algorithm that computes 3^λ · h for some λ ∈ N.
If we now want to compute the product of two polynomials f, g ∈ R[x] of degree less than n, we choose k, l ∈ N such that 2^{k−2} < n ≤ 2^{k−1} and 3^{l−1} < n ≤ 3^l and call both the modified 2-adic and the modified 3-adic algorithm to compute 2^κ fg and 3^λ fg. Using the Extended Euclidean Algorithm, we find s, t ∈ Z such that s 2^κ + t 3^λ = 1, and obtain s 2^κ fg + t 3^λ fg = fg. Thus we have the following result.
THEOREM 8.23. Over any commutative ring R, polynomials of degree less than n can be multiplied using at most (18 + 72 log_3 2) n log n loglog n + O(n log n) or 63.43 n log n loglog n + O(n log n) arithmetic operations in R.
The constant 63.43 is probably not the best possible, and the reader is encour-
aged to figure out a smaller constant herself.
We do not present the details of this algorithm (it is partly described in Exer-
cise 8.36), but rather a different approach for fast integer multiplication which
only works for integers of bounded length but seems sufficient for most prac-
tical purposes (inputs up to millions of gigabytes). Let a = Σ_{0≤j<n} a_j 2^{64j}, b = Σ_{0≤k<n} b_k 2^{64k} in 2^{64}-ary representation, and A = Σ_{0≤j<n} a_j x^j, B = Σ_{0≤k<n} b_k x^k in Z[x], so that a = A(2^{64}) and b = B(2^{64}). If C = AB = Σ_{0≤l<2n−1} c_l x^l ∈ Z[x], then we obtain ab = C(2^{64}). Now 0 ≤ c_l = Σ_{j+k=l} a_j b_k < Σ_{j+k=l} 2^{128} ≤ n · 2^{128} for all l. We assume that n < 2^{61}, take three single precision primes p_1, p_2, p_3 between 2^{63} and 2^{64}, and multiply A and B modulo each p_j. Then the Chinese Remainder Theorem guarantees that we can reconstruct AB from its images modulo the three primes.
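The reconstruction step is easy to illustrate. In the Python sketch below (our own), three small primes and naive modular multiplication stand in for the 64-bit Fourier primes and the FFT-based multiplication of Algorithm 8.16, but the Chinese remainder recombination of the coefficients c_l is the same.

primes = [97, 101, 103]                      # stand-ins; need n*(max coeff)^2 < p1*p2*p3

def mul_mod(A, B, p):                        # naive product of coefficient lists mod p
    C = [0] * (len(A) + len(B) - 1)
    for i, a in enumerate(A):
        for j, b in enumerate(B):
            C[i + j] = (C[i + j] + a * b) % p
    return C

def crt(residues, moduli):                   # Chinese remainder reconstruction
    M = 1
    for m in moduli:
        M *= m
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)
    return x % M

A = [12, 0, 7, 30]                           # 30x^3 + 7x^2 + 12
B = [5, 9, 11]                               # 11x^2 + 9x + 5
images = [mul_mod(A, B, p) for p in primes]
C = [crt([img[l] for img in images], primes) for l in range(len(A) + len(B) - 1)]
print(C)                                     # the coefficients of A*B over Z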
For the three modular multiplications, we want to use the FFT multiplication algorithm 8.16, and this requires that each p_j − 1 be divisible by a sufficiently high power of 2; we will call such a prime a Fourier prime. More precisely, if t = ⌈log(2n − 1)⌉ and 2^t divides p_j − 1 for j = 1, 2, 3, then F_{p_j} contains a primitive 2^t th root of unity, by Lemma 8.8, and we may use Algorithm 8.16 with R = F_{p_j} to compute AB mod p_j. If n is not too large, then three such primes can be found by successively testing 2^t + 1, 2·2^t + 1, 3·2^t + 1, ... for primality, using the algorithms from Chapter 18. (Exercise 18.16 shows how to find a primitive 2^t th root of unity modulo such a prime.) For example, for each of the six pairs
    k   29   71   75   95   108   123
    ω   21   287  149  55   64    493
p = k · 2^{57} + 1 is prime and ω is the least positive primitive 2^{57}th root of unity modulo p. In fact, these are all primes p below 2^{64} such that 2^{57} divides p − 1, and all but the first one are greater than 2^{63}. (For p = 108 · 2^{57} + 1, p − 1 is even divisible by 2^{59}; it is the only prime below 2^{64} with that property, and there is no prime p below 2^{64} such that p − 1 is divisible by a higher power of 2.) Three such pairs may be precomputed once and for all. Here is the corresponding algorithm.
In the last line, log* is the number of times that logarithms have to be taken until the result is less than 1. Thus log* n ≤ 5 for n < 2^{65536}. For practical purposes, log* n behaves like a constant, and Fürer's result comes close to being O(n log n), which is also conjectured to be optimal.
In principle, any multiplication algorithm leads to a multiplication time. Table
8.6 summarizes the multiplication times for the algorithms we discuss. You find a
    Algorithm                                                     M(n)
    classical                                                     2n^2
    Karatsuba (Karatsuba & Ofman 1962)                            O(n^{1.59})
    FFT multiplication (provided that R supports the FFT)         O(n log n)
    Schönhage & Strassen (1971), Schönhage (1977),
      Cantor & Kaltofen (1991); FFT based                         O(n log n loglog n)
    Fürer (2009); FFT based                                       n log n · 2^{O(log* n)}
TABLE 8.6: Various polynomial multiplication algorithms and their running times.
similar table for quick reference on the inside back cover. In the remainder of this
text, we will assume that the multiplication time satisfies
    M(n)/n ≥ M(m)/m if n ≥ m,    M(mn) ≤ m^2 M(n),    (8)
for all n, m ∈ N>0 . The first inequality yields the superlinearity properties
M(mn) ≥ m · M(n), M(n + m) ≥ M(n) + M(m), and M(n) ≥ n (9)
for all m, n ∈ N>0 (Exercise 8.33). The last property in (8) says that M is “at most
quadratic” and implies that M(cn) ∈ O(M(n)) for all positive constants c. Theorem
8.23 implies that we may take
M(n) ∈ 63.43 n log n loglog n + O(n log n)
for an arbitrary commutative ring R, and we will mainly use this result.
for 0 ≤ k ≤ 2n − 2. If we choose t ∈ N so that n 2^{128l} < 2^{64t}, then the coefficients of h can be read off the 2^{64}-adic representation of the number h(2^{64t}) = f(2^{64t}) g(2^{64t});
COROLLARY 8.27. Polynomials in Z[x] of degree less than n with coefficients of length at most l can be multiplied using O(M(n(l + log n))) or O~(nl) word operations.
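With Python's built-in long integers this substitution is easy to try out; the sketch below is our own, handles nonnegative coefficients only, and chooses the block size from the coefficient bound in the same way as above (the big-integer product here is a stand-in for a fast integer multiplication routine).

def kronecker_mul(f, g, coeff_bits=64):
    """Multiply f, g in Z[x] (nonnegative coefficients < 2**coeff_bits)
    by evaluating at a power of two and reading off the digits."""
    n = max(len(f), len(g))
    # products of coefficients are < 2**(2*coeff_bits), and up to n of them
    # add up in one coefficient of h, so n * 2**(2*coeff_bits) must fit per block
    b = 2 * coeff_bits + n.bit_length()
    F = sum(c << (b * i) for i, c in enumerate(f))      # F = f(2^b)
    G = sum(c << (b * i) for i, c in enumerate(g))      # G = g(2^b)
    H = F * G                                           # one big-integer product
    h, mask = [], (1 << b) - 1
    for _ in range(len(f) + len(g) - 1):
        h.append(H & mask)                              # digits of H in base 2^b
        H >>= b
    return h

print(kronecker_mul([3, 0, 1, 4], [2, 5]))   # (4x^3 + x^2 + 3)(5x + 2)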
and we see that the coefficients of h can be read off the image of fg modulo x − y^{2d−1}, which in turn can be computed using fast multiplication of univariate polynomials in R[y]. Since the degrees of f(y^{2d−1}, y) and g(y^{2d−1}, y) are at most (n − 1)(2d − 1) + d − 1 = 2nd − n − d, we have the following result.
COROLLARY 8.28. Polynomials in R[x, y], where R is a ring (commutative, with 1), of degree less than n in x and less than d in y can be multiplied using O(M(nd)) or O~(nd) ring operations.
Notes. 8.1. Karatsuba and Ofman participated in a seminar run by Kolmogorov, who had
conjectured that the quadratic cost for multiplication cannot be beaten and then actually
wrote (probably with Ofman) and submitted their paper for publication. Karatsuba (1995)
says about the work that made him famous: I learned about the article only when I was
given its reprints.
8.2. Cooley & Tukey (1965) discovered the FFT for computer science, causing a revolution
in digital signal processing methods. Its original invention goes back a century and a half:
Gauß found it around 1805, but it appeared only posthumously (Gauß 1866). Gauß also
discovered the usual Fourier Transform, before Fourier did so in 1807 (published in Fou-
rier 1822). The algorithm was rediscovered several times over the years. The fascinating
history of the FFT is described in Cooley (1987, 1990) and Heideman, Johnson & Burrus
(1984).
8.3. Algorithm 8.20 is an adaption to polynomials of Schönhage & Strassen’s (1971) in-
teger multiplication algorithm, where 2 plays the role of our x; Schönhage (1977) solved
the additional complication that occurs in characteristic 2 by using a 3-adic FFT (Exer-
cise 8.30). These two papers, and also Cantor & Kaltofen (1991), showed Theorem 8.23.
Before Schönhage & Strassen’s breakthrough, Toom (1963), Cook (1966), Schönhage
(1966), and Strassen (1968, unpublished) had found n1+o(1) multiplication algorithms for
integers of length n, but Schönhage & Strassen set the world record that stood for a quarter
century, until Fürer (2009). Gentleman & Sande (1966) had earlier proposed to use the
FFT for polynomial multiplication. See Knuth (1998), §4.3.3, for a description of some of
these methods. An alternative, also briefly discussed there, is to employ the FFT over the
complex numbers, with approximations to sufficiently high precision of the complex roots
of unity. This is used in calculating π to record accuracy; FFT multiplication uses 90%
of the CPU time for such high precision calculations (Kanada (1988); see Section 4.6).
Bernstein (2001) gives an exhaustive discussion of fast multiplication routines. Schönhage
showed that n-bit integers can be multiplied on random access machines (with cost m to
access an m-bit address) using O(n log n) word operations (see Knuth (1998), §4.3.3 C).
Pollard (1971) presents an FFT-based multiplication algorithm for polynomials over
finite fields, over Z, and for integers, including an implementation report, but without
asymptotic analysis or a general construction of the required primitive roots of unity. He
also gives the three primes FFT algorithm 8.25; see also Lipson (1981), §IX.2.2. Moenck
(1976) presents an early implementation report.
For some of the multiplication times that we use, the required properties (8) may fail to
hold for n ≤ 2. We will ignore this systematically.
8.4. Kronecker (1882), §4, invented his substitution for multivariate polynomials; its effect
is to keep distinct coefficients “apart” after the substitution. The algorithm for multiplying
in Z[x] modulo several small primes appears in Pollard (1971).
Exercises.
8.1 Given the real and imaginary parts √ a0 , a1 , b0 , b1 ∈ R of two nonzero complex numbers z1 =
a0 + a1 i and z2 = b0 + b1 i, where i = −1, show how to compute the real and imaginary parts of
the quotient z1 /z2 ∈ C using at most 7 multiplications and divisions in R. Draw an arithmetic circuit
illustrating your algorithm. Can you achieve at most 6 real multiplications and divisions?
8.2 Let R be a ring (commutative, with 1) and f , g ∈ R[x, y], and assume that f and g have degrees
less than m in y and less than n in x. Let h = f · g.
(i) Using classical univariate polynomial multiplication and viewing R[x, y] as R[y][x], bound the
number of arithmetic operations in R to compute h.
(ii) Using Karatsuba’s algorithm bound the number of operations in R to compute h.
(iii) Generalize parts (i) and (ii) to polynomials in an arbitrary number of variables.
8.3 With notation as in Lemma 8.2, assume that in addition S and T are monotonically increasing
and S(2n) ≤ 4S(n) for all n ∈ N>0 . Derive a tight upper bound on T (n) that is valid for all n ∈ N>0
and not only for powers of two. Hint: n lies in some interval [2k−1 + 1, 2k ].
8.4 We have seen in Section 2.3 that the classical multiplication method for polynomials of degree
less than n requires 2n2 − 2n + 1 ring operations. For which values of n = 2k is this larger than the
9 · 3k − 8 · 2k for Karatsuba’s algorithm?
8.5∗ Let R be a ring (commutative, with 1) and f , g ∈ R[x] of degrees less than n. You are to analyze
two variants of Karatsuba’s algorithm 8.1 when n is not a power of two.
(i) The first variant is to call Algorithm 8.1 with 2⌈log n⌉ instead of n. Show that this takes at most
9 · 3⌈logn⌉ ≤ 27nlog 3 operations in R.
(ii) Let m = ⌈n/2⌉, and modify Algorithm 8.1 so that it divides f and g into blocks of degree less
than m. If T (n) denotes the cost of this algorithm, show that T (1) = 1 and T (n) ≤ 3T (⌈n/2⌉) + 4n.
(iii) Show that T (2k ) ≤ 9 · 3k − 8 · 2k and T (2k−1 + 1) ≤ 6 · 3k − 4 · 2k − 2 for all k ∈ N>0 , and
compare this to (i) for n = 2k and n = 2k−1 + 1. Plot the curves of the two running time bounds from
(i) and (ii) for n ≤ 50.
(iv) Implement both algorithms for R = Z. Experiment with various values of n, say 2 ≤ n ≤ 50 and
n = 100, 200, . . ., 1000. Use random polynomials with one-digit coefficients, and also with n-digit
coefficients.
8.6∗ Karatsuba’s algorithm is slower for small inputs than classical multiplication. You are to inves-
tigate a hybrid algorithm which recursively does Karatsuba until the degrees get smaller than some
bound 2d ∈ N and then switches to classical multiplication, that is, we replace line 1 of Karatsuba’s
algorithm 8.1 by
1. if n ≤ 2d then call Algorithm 2.3 to compute f g ∈ R[x]
Algorithm 8.1 then corresponds to d = 0. Let T (n) denote the cost of this algorithm when n = 2k for
some k ∈ N, and prove that T (n) ≤ γ(d)nlog 3 − 8n holds for n ≥ 2d , where γ(d) depends only on d.
Find the value of d ∈ N that minimizes γ(d), and compare the result to the running time bound of
Theorem 8.3.
8.7 Karatsuba's method for polynomial multiplication can be generalized as follows. Let F be a field, m, n ∈ N_{>0}, and f = Σ_{0≤i<n} f_i x^i, g = Σ_{0≤i<n} g_i x^i in F[x]. To multiply f and g, we divide each of them into m ≥ 2 blocks of size k = ⌈n/m⌉:
    f = Σ_{0≤i<m} F_i x^{ki},    g = Σ_{0≤i<m} G_i x^{ki},
with all F_i, G_i ∈ F[x] of degree less than k. Then fg = Σ_{0≤i<2m−1} H_i x^{ki}, where H_i = Σ_{0≤j≤i} F_j G_{i−j} for 0 ≤ i < 2m − 1 and we assume that F_j, G_j = 0 if j ≥ m.
    αβ = Σ_{0≤j≤2m−2} v_j l_j = Σ_{0≤i≤2m−2} γ_i x^i ∈ K[x],
since deg(αβ) < 2m − 1 and the interpolating polynomial is unique, and each γ_i ∈ K is an F-linear combination
    γ_i = Σ_{0≤j≤2m−2} c_{ij} v_j
of v_0, ..., v_{2m−2}, with c_{ij} ∈ F for 0 ≤ i, j ≤ 2m − 2. (In fact, the matrix (c_{ij})_{0≤i,j≤2m−2} is the inverse of the Vandermonde matrix (u_i^j)_{0≤i,j≤2m−2} in F^{(2m−1)×(2m−1)}.) This yields the following scheme for computing H_0, ..., H_{2m−2}. We assume that the values u_i^j and c_{ij} for 0 ≤ i, j ≤ 2m − 2 are precomputed and stored.
1. Set P_i = Σ_{0≤j<m} F_j u_i^j and Q_i = Σ_{0≤j<m} G_j u_i^j for 0 ≤ i ≤ 2m − 2.
2. Compute R_i = P_i Q_i for 0 ≤ i ≤ 2m − 2.
3. Set H_i = Σ_{0≤j≤2m−2} c_{ij} R_j for 0 ≤ i ≤ 2m − 2.
Prove that this scheme works correctly (hint: first consider k = 1) and figure out the precise number of additions and multiplications in F that steps 1 and 3 take.
(ii) Calculate the values of u_i^j and c_{ij} for F = F_5, m = 3, and u_i = i mod 5 for 0 ≤ i ≤ 4.
(iii) Use the scheme from (i) to construct a recursive algorithm for polynomial multiplication, and determine its asymptotic cost when n is a power of m. Conclude that if F is infinite, then for each positive ε ∈ R there is an algorithm for multiplying polynomials of degree less than n in F[x] taking O(n^{1+ε}) operations in F.
8.9 Let F = F29 .
(i) Find a primitive 4th root of unity ω ∈ F, and compute its inverse ω −1 ∈ F.
(ii) Find the matrices for DFTω and DFTω−1 , and check that their product is 4I.
8.10−→ Let F = F17 and f = 5x3 + 3x2 − 4x + 3, g = 2x3 − 5x2 + 7x − 2 in F[x].
(i) Show that ω = 2 is a primitive 8th root of unity in F, and compute the inverse 2−1 mod 17 of
ω in F.
(ii) Compute h = f · g ∈ F[x].
(iii) For 0 ≤ j < 8, compute α j = f (ω j ), β j = g(ω j ), and γ j = α j · β j . Compare γ j to h(ω j ).
(iv) Show the two matrices V1 = Vω and V2 = 8−1Vω−1 , and compute their product. Compute the
matrix–vector products V1 f ,V1 g, and V2 γ, identifying f and g with their coefficient vectors, and with
γ = (γ j )0≤ j<8 as computed. Comment.
(v) Trace the FFT multiplication algorithm 8.16 to multiply f and g, with ω as above.
8.11−→ Let F = F41 .
(i) Prove that ω = 14 ∈ F is a primitive 8th root of unity. Compute all powers of ω, and mark the
ones that are primitive 8th roots of unity.
(ii) Let η = ω 2 , and f = x7 + 2x6 + 3x4 + 2x + 6 ∈ F[x]. Give an explicit calculation of α =
DFTω ( f ), using the FFT. You only have to do one recursive step, and then can use direct evaluation
at powers of η.
(iii) Let g = x7 + 12x5 + 353 + 1 ∈ F[x]. Compute β = DFTω (g), γ = α · β with coordinate–wise
product, and h = DFTω−1 (γ).
(iv) Compute f · g in F[x] and f ∗8 g. Compare with your result from (iii).
√
8.12−→ The complex number ω = exp(2πi/8) ∈ C, where i = −1, is a primitive 8th root of unity.
3 2 3 2
Let f = 5x + 3x − 4x + 3 and g = 2x − 5x + 7x − 2 in C[x], and run the fast convolution algorithm
8.16 on this example to calculate the coefficients of the product f · g. (Of course, on such a small
example the “fast” algorithm is more tedious than the school method. But, who knows, you may
want to multiply polynomials of degree 1 000 000 one day . . . .) Multiply linear polynomials by the
“classical” method. Use ω only symbolically, with the fact that ω 4 = −1.
8.13 Let R be a ring, n ∈ N≥1 , and ω ∈ R be a primitive nth root of unity.
(i) Show that ω −1 is a primitive nth root of unity.
(ii) If n is even, then show that ω 2 is a primitive (n/2)th root of unity. If n is odd, then show that
ω 2 is a primitive nth root of unity.
(iii) Let k ∈ Z and d = n/ gcd(n, k). Show that ω k is a primitive dth root of unity; this generalizes
both (i) and (ii).
8.14 Let R be a ring, n ∈ N≥2 , ω ∈ R a primitive nth root of unity, and η ∈ R with η 2 = ω. Under
what conditions is η a primitive 2nth root of unity?
8.15∗ Let n ∈ N>0 and R be an integral domain of characteristic coprime to n.
(i) Show that the set Rn of all nth roots of unity is a subgroup of the multiplicative group R× .
(ii) Prove that the following are equivalent for an nth root of unity ω ∈ R:
(a) ω is a primitive nth root of unity,
(b) ω ℓ 6= 1 for 0 < ℓ < n (that is, ω has order n in R),
(c) ω ℓ 6= 1 for all 0 < ℓ < n with ℓ | n,
(d) ω n/p 6= 1 for all prime divisors p of n.
We now assume that R contains a primitive nth root of unity ω.
(iii) Draw all 12th roots of unity for R = C and mark the primitive ones.
(iv) Show that Rn is cyclic and isomorphic to the additive group Zn of integers modulo n (so in
particular, #Rn = n). Hint: The polynomial xn − 1 ∈ R[x] has at most n roots.
(v) Prove that there are precisely ϕ(n) primitive nth roots of unity, where ϕ is Euler’s totient
function.
8.16∗ Let q be a prime power, F_q a finite field with q elements, and n ∈ N a divisor of q − 1, with prime factorization n = p_1^{e_1} ··· p_r^{e_r}. For a ∈ F_q^×, we denote by ord(a) the order of a in the multiplicative group F_q^×, and want to show that ord(a) = q − 1 for some a ∈ F_q^×. Prove:
(i) ord(a) = n if and only if a^n = 1 and a^{n/p_j} ≠ 1 for 1 ≤ j ≤ r.
(ii) F_q^× contains an element b_j of order p_j^{e_j}, for 1 ≤ j ≤ r. Hint: Consider an element of F_q^× which is not a root of the polynomial x^{(q−1)/p_j} − 1.
(iii) If a, b ∈ F_q^× are elements of coprime orders, then ord(ab) = ord(a) ord(b).
(iv) F_q^× contains an element of order n.
(v) F_q^× is cyclic.
    h_0(x) ≡ f_0(x) · g_0(x) mod (x^{n/2} − 1),    h_1(ωx) ≡ f_1(ωx) · g_1(ωx) mod (x^{n/2} − 1)
4. return (1/2) ((h_0 − h_1) x^{n/2} + h_0 + h_1)
(i) Prove that the algorithm works correctly and takes (11/2) n log n operations in R.
(ii) For small inputs, it is faster to use classical multiplication (or Karatsuba's algorithm) and a subsequent reduction modulo x^n − 1 to compute the result. Replace the first line of the above algorithm by
1. if k ≤ d then call Algorithm 2.3 to compute f g ∈ R[x] and return f g rem (x^n − 1)
The above algorithm corresponds to d = 0. Let T(n) denote the cost of the hybrid algorithm when n = 2^k for some k ∈ N, and prove that T(n) = (11/2) n log n + γ(d) n holds for n ≥ 2^d, where γ(d) depends only on d. Find the value of d ∈ N which minimizes γ(d), and compare the result to (i).
8.25∗ Algorithm 8.14 computes DFTω over a (commutative) ring R by dividing the input polyno-
mial f ∈ R[x] of degree less than n by xn/2 − 1 and xn/2 + 1 with remainder. A different approach is
to split f into its odd and even parts, that is, to write f = f0 (x2 ) + x f1 (x2 ) with f0 , f1 ∈ R[x] of degree
less than n/2, and then to compute DFTω2 ( f0 ) and DFTω2 ( f1 ) recursively. Work out the details and
prove that your algorithm uses cn log n operations in R for some positive constant c ∈ Q when n is
a power of 2. Modify, if necessary, your algorithm so that c = 3/2, as in Theorem 8.15. Hint: Use
ω n/2 = −1. Draw an arithmetic circuit illustrating your algorithm for n = 8, and compare it to the
circuit in Figure 8.4.
8.26∗ Let R be a ring (commutative, with 1) containing a primitive 3k th root of unity for any k ∈ N.
(i) Design a 3-adic FFT algorithm, taking as input k ∈ N, a polynomial f ∈ R[x] of degree less than
n = 3k , and a list of powers 1, ω, ω 2 , . . ., ω n−1 of a primitive nth root of unity ω ∈ R, and returning
f (1), f (ω), f (ω 2 ), . . ., f (ω n−1 ). Prove the correctness of your algorithm. Hint: Consider dividing f
with remainder by xn/3 − 1, xn/3 − ω n/3 , and xn/3 − ω 2n/3 .
(ii) Draw an arithmetic circuit illustrating your algorithm for n = 9.
(iii) Let T (n) denote the cost of your algorithm in operations in R when n = 3k for some k ∈ N.
Set up a recursion for T (n) (don’t forget the initial condition) and solve it.
(iv) Assuming that R contains primitive nth root of unity for any n ∈ N, generalize the above to an
m-adic FFT algorithm for arbitrary m ∈ N≥2 .
(v) Formulate an alternative m-adic FFT algorithm as in Exercise 8.25.
8.27 Let F be a field containing a primitive 2k th root of unity for all k ∈ N. Algorithm 8.16 shows
that the convolution ∗n in F[x] can be computed with O(n log n) operations in F if n is a power of 2.
The goal of this exercise is to generalize this to arbitrary n ∈ N. So let f , g ∈ F[x] and m ∈ N be a
power of 2 such that m/2 < 2n ≤ m, and set a = f · (xm−n + 1) and b = g. Show how to obtain the
coefficients of f ∗n g from those of a ∗m b, and derive the claim from this.
8.28 Show that ω = x mod (xn − 1) ∈ R = F[x]/hxn − 1i, where F is a field of characteristic not
dividing n, is not a primitive nth root of unity for n ≥ 2.
8.30∗∗ In this exercise, we discuss Schönhage's (1977) 3-adic variant of Algorithm 8.20. It works over any (commutative) ring R such that 3 is a unit in R, in particular over a field of characteristic 2.
1. if k ≤ 2 then
       call the classical algorithm 2.3 (or Karatsuba's algorithm 8.1) to compute f · g
       return f g rem (x^{2n} + x^n + 1)
2. m ← 3^{⌈k/2⌉},  t ← n/m
   let f', g' ∈ R[x, y] with deg_x f', deg_x g' < m such that f = f'(x, x^m) and g = g'(x, x^m)
3. let D = R[x]/⟨x^{2m} + x^m + 1⟩
   if m = t then η ← x mod (x^{2m} + x^m + 1) else η ← x^3 mod (x^{2m} + x^m + 1)
   { η is a primitive 3t th root of unity }
   f* ← f' mod (x^{2m} + x^m + 1),  g* ← g' mod (x^{2m} + x^m + 1)
4. for j = 1, 2 do
       f_j ← f* rem (y^t − η^{jt}),  g_j ← g* rem (y^t − η^{jt})
       call the fast convolution algorithm 8.16 with ω = η^3 to compute h_j ∈ D[y] of degree less than t such that
           f_j(η^j y) g_j(η^j y) ≡ h_j(η^j y) mod (y^t − 1)
   { the DFTs are performed by the 3-adic FFT algorithm from Exercise 8.26, and Algorithm 8.30 is used recursively for multiplications of elements in D }
5. h* ← (1/3) (y^t (h_2 − h_1) + η^{2t} h_1 − η^t h_2)(2η^t + 1)
   let h' ∈ R[x, y] with deg_x h' < 2m such that h* = h' mod (x^{2m} + x^m + 1)
   h ← h'(x, x^m) rem (x^{2n} + x^n + 1)
   return h
(i) Use Exercise 8.29 to prove that the algorithm works correctly.
(ii) Let T(k) denote the cost of the algorithm for n = 3^k. Prove that T(k) ≤ 2 · 3^{⌊k/2⌋} T(⌈k/2⌉) + (c + 48(⌊k/2⌋ + 1/2)) 3^k for k > 2 and some constant c ∈ N, and conclude that T(k) is at most 24 · 3^k · k · log k + O(3^k · k) = 24 n log_3 n log_2 log_3 n + O(n log n). Hint: Consider the function S(k) = (3^{−k} T(k) + c)/(k − 1), and prove that S(k) ≤ S(⌈k/2⌉) + 24 if k > 2.
8.31∗ (i) Let R be a (commutative) ring, n ∈ N_{≥2} a power of two, and ω = (x mod x^{n/2} + 1) ∈ R[x]/⟨x^{n/2} + 1⟩. Show that the conclusion V_ω · V_{ω^{−1}} = nI of Theorem 8.13 holds even when n is not a unit in R. Hint: Show first that ω^{nt/2} + 1 = 0 for all odd t ∈ N and use the factorization
Mit Ausnahme der paar von Hand gefertigten Möbel, Kleider, Schuhe
und der Kinder erhalten wir alles unter Einschaltung mathematischer
Berechnungen. Dieses ganze Dasein, das um uns läuft, rennt, steht, ist
nicht nur für seine Einsehbarkeit von der Mathematik abhängig,
sondern ist effektiv durch sie entstanden.1
Robert Musil (1913)
1 Except for a few hand-made pieces of furniture, clothes, shoes, and our children we obtain everything by
using mathematical computations. The whole world we live in, everything which walks, runs, stands around us,
depends not only for our understanding on mathematics, but has effectively been created by it.
2 That this subject [the imaginary magnitudes] has hitherto been considered from the wrong point of view and
surrounded by a mysterious obscurity, is to be attributed largely to an unfortunate notation. If +1, −1, √−1 had
not been called positive, negative, and imaginary (or even impossible) unit, but rather direct, inverse, and lateral
unit, then such an obscurity would probably not have arisen.
3 Thus spake Al-Khwārizmı̄: [. . . ] So this is everything that is necessary for men concerning the division and
multiplication with an integer, and the other things that are connected with it. Having completed this, we now
begin to discuss the multiplication of fractions and their division, and the extraction of roots, if God so wills.
9
Newton iteration
(where, as usual, we assume that the zero polynomial has degree −∞). Since b is
monic, such q, r exist uniquely even if D is not a field (Section 2.4).
Substituting 1/x for the variable x and multiplying by x^n, we obtain
   x^n a(1/x) = ( x^{n−m} q(1/x) ) · ( x^m b(1/x) ) + x^{n−m+1} ( x^{m−1} r(1/x) ).   (1)
We define the reversal of a as rev_k(a) = x^k a(1/x). When k = n, this is the poly-
nomial with the coefficients of a reversed, that is, if a = a_n x^n + a_{n−1} x^{n−1} + · · · +
a_1 x + a_0, then
   rev_n(a) = a_0 x^n + a_1 x^{n−1} + · · · + a_{n−1} x + a_n.
With this notation, (1) reads
   rev_n(a) = rev_{n−m}(q) · rev_m(b) + x^{n−m+1} rev_{m−1}(r),
and therefore,
   rev_n(a) ≡ rev_{n−m}(q) · rev_m(b) mod x^{n−m+1}.
We note that rev_m(b) has constant coefficient 1 and thus is invertible modulo
x^{n−m+1}, by Theorem 4.1. Hence we find
   rev_{n−m}(q) ≡ rev_n(a) · rev_m(b)^{−1} mod x^{n−m+1}.
It follows that q = rev_{n−m}(rev_{n−m}(q)) and r = a − qb can be computed from
rev_n(a) · rev_m(b)^{−1} rem x^{n−m+1}.
So now we have to solve the problem of finding, from a given f ∈ D[x] and l ∈ N
with f(0) = 1, a g ∈ D[x] satisfying f g ≡ 1 mod x^l.
T HEOREM 9.2.
Let D be a ring (commutative, with 1), f, g_0, g_1, … ∈ D[x], with f(0) = 1, g_0 = 1,
and g_{i+1} ≡ 2g_i − f g_i^2 mod x^{2^{i+1}} for all i. Then f g_i ≡ 1 mod x^{2^i} for all i ≥ 0.
F IGURE 9.1: One step of Newton iteration: the tangent to the graph y = ϕ(z) at the point (g_i, ϕ(g_i)) meets the z-axis at g_{i+1}.
T HEOREM 9.4.
Algorithm 9.3 correctly computes the inverse of f modulo x^l. If l = 2^r is a power
of 2, then it uses at most 3M(l) + l ∈ O(M(l)) arithmetic operations in D.
P ROOF. Correctness follows from Theorem 9.2 and the fact that x^l divides x^{2^r}. In
step 2, all terms of degree at least 2^i can be dropped, and since g_i ≡ g_{i−1} · (2 − f g_{i−1}) ≡
g_{i−1} mod x^{2^{i−1}}, the terms of degree less than 2^{i−1} need not be computed either. The cost for one iteration of
step 2 is M(2^{i−1}) for the computation of g_{i−1}^2, M(2^i) for the product f g_{i−1}^2 mod x^{2^i},
and then the negative of the upper half of f g_{i−1}^2 modulo x^{2^i} is the upper half of g_i,
taking 2^{i−1} operations. Thus we have M(2^i) + M(2^{i−1}) + 2^{i−1} ≤ (3/2)M(2^i) + 2^{i−1} in
step 2, and the total running time is
   ∑_{1≤i≤r} ( (3/2)M(2^i) + 2^{i−1} ) ≤ ( (3/2)M(2^r) + 2^{r−1} ) ∑_{1≤i≤r} 2^{i−r} < 3M(2^r) + 2^r = 3M(l) + l,
using that M(2^i) ≤ 2^{i−r} M(2^r) for 1 ≤ i ≤ r. ✷
If l is not a power of 2, then the above algorithm computes too many coefficients
of the inverse. Exercise 9.6 gives a better algorithm with essentially the same
running time bound in this general case.
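For concreteness, here is a small Python sketch of this Newton inversion. The precision schedule follows the variant of Exercise 9.6, schoolbook multiplication stands in for the fast routines of Chapter 8, and the function name is ours.

def newton_inverse(f, l):
    """Coefficient list g with f*g = 1 mod x^l, for f[0] = 1 (integer
    coefficients), via the iteration g <- 2g - f*g^2 of Theorem 9.2."""
    def mulmod(a, b, k):                    # schoolbook product modulo x^k
        c = [0] * k
        for i, ai in enumerate(a[:k]):
            for j, bj in enumerate(b[:k - i]):
                c[i + j] += ai * bj
        return c
    assert f[0] == 1
    g, prec = [1], 1
    while prec < l:
        prec = min(2 * prec, l)             # precision doubling (cf. Exercise 9.6)
        fg2 = mulmod(f, mulmod(g, g, prec), prec)
        g = [(2 * g[i] if i < len(g) else 0) - fg2[i] for i in range(prec)]
    return g

print(newton_inverse([1, -1], 8))           # 1/(1-x): [1, 1, 1, 1, 1, 1, 1, 1]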
T HEOREM 9.6.
Let D be a ring (commutative, with 1). Division with remainder of a polynomial
a ∈ D[x] of degree n + m by a monic polynomial b ∈ D[x] of degree n, where
n ≥ m ∈ N, can be done using 4M(m) + M(n) + O(n) ring operations.
P ROOF. Let a = qb + r, with q, r ∈ D[x] such that deg r < n. Then we have deg q =
deg a − deg b = m. The correctness of Algorithm 9.5 follows from the discussion
at the beginning of the section. Using Exercises 8.34 and 9.6, we have at most
3M(m) + O(m) operations in step 2 of Algorithm 9.5, M(m) + O(m) in step 3, and
finally M(n) + O(n) in step 4; only the lower part of a − qb has to be computed
since deg r < deg b. ✷
It may seem circular to use an algorithm that uses the rem operation to perform
division. However, we are only using the rem operation to truncate the polyno-
mial. It is similar to finding the quotient and remainder of a large number written in
base 10 when divided by 10 000. Division in this special case costs no operations.
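A short Python sketch of division with remainder along these lines, using the reversal trick from the beginning of the section together with the newton_inverse sketch above (naive multiplication again; all names are ours):

def div_rem(a, b):
    """Quotient and remainder of a by monic b (integer coefficient lists,
    lowest degree first), via the reversal trick and Newton inversion."""
    n, m = len(a) - 1, len(b) - 1           # degrees
    if n < m:
        return [0], a[:]
    assert b[-1] == 1                       # b monic
    rev_a, rev_b = a[::-1], b[::-1]
    inv = newton_inverse(rev_b, n - m + 1)  # rev(b)^(-1) mod x^(n-m+1)
    # rev(q) = rev(a) * rev(b)^(-1) mod x^(n-m+1)
    rev_q = [sum(rev_a[i] * inv[k - i] for i in range(min(k, len(rev_a) - 1) + 1))
             for k in range(n - m + 1)]
    q = rev_q[::-1]
    # only the lower part of a - q*b is needed, since deg r < deg b
    r = [a[i] - sum(q[j] * b[i - j]
                    for j in range(max(0, i - m), min(i, len(q) - 1) + 1))
         for i in range(m)]
    return q, r

# x^3 + 2x + 5 = (x + 1)(x^2 - x + 3) + 2
print(div_rem([5, 2, 0, 1], [1, 1]))        # ([3, -1, 1], [2])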
What is the number of word operations for division with remainder in Z[x]? If
b is monic and a, b have max-norm ||a||_∞, ||b||_∞ < 2^l, then Exercise 6.44 shows that
||q||_∞, ||r||_∞ < 2^{nl}. Exercise 9.15 shows that all intermediate results in Algorithm 9.3
have coefficients of length O(nl), and hence the cost for division using Newton
inversion is O(M(n)M(nl)) or O~(n^2 l) word operations. Since the output size is
O(n^2 l) and Exercise 6.44 also shows that this bound can be achieved, the running
time is—up to logarithmic factors—asymptotically optimal. A similar statement
also holds for division with remainder in F[y][x] for a field F.
Exercises 8.21 and 9.14 discuss slightly faster algorithms for exact division,
where the remainder is known to be zero in advance.
C OROLLARY 9.7.
Let D be a ring (commutative, with 1) and f ∈ D[x] monic of degree n. Then one
multiplication in the residue class ring D[x]/⟨f⟩ can be done using 6M(n) + O(n)
or O(M(n)) arithmetic operations in D.
T HEOREM 9.8.
Division with remainder of integers of length n can be done with O(M(n)) word
operations.
C OROLLARY 9.9.
For an integer m ∈ N of length n, one multiplication in the residue class ring Zm
can be performed using O(M(n)) word operations.
1. r ←− ⌈log l⌉
2. for i = 1, . . . , r compute g_i ∈ R such that g_i ≡ (2g_{i−1} − f g_{i−1}^2) mod p^{2^i}
3. return g_r
E XAMPLE 9.11. We let R = Z, and wish to compute the inverse of 5 modulo 81.
We begin with g_0 = −1, since −1 · 5 ≡ 1 mod 3. Then g_1 ≡ 2g_0 − 5g_0^2 = −2 − 5 ≡ 2 mod 9
and g_2 ≡ 2g_1 − 5g_1^2 = 4 − 20 ≡ 65 mod 81, and indeed 5 · 65 = 325 ≡ 1 mod 81. ✸
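The same computation in a few lines of Python for R = Z (the helper is ours, not the book's):

from math import ceil, log2

def padic_inverse(f, p, l, g0):
    """g with f*g = 1 mod p^l, given g0 with f*g0 = 1 mod p,
    by the iteration g <- 2g - f*g^2 modulo p^(2^i)."""
    r = max(1, ceil(log2(l)))                # number of lifting steps
    g, k = g0, 1
    for _ in range(r):
        k = min(2 * k, l)
        g = (2 * g - f * g * g) % p**k
    return g

print(padic_inverse(5, 3, 4, -1))            # 65, and indeed 5*65 = 325 = 4*81 + 1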
T HEOREM 9.12.
Algorithm 9.10 correctly computes the inverse of f mod p^l. It uses O(M(l log p))
word operations if R = Z, p > 1, and | f | < p^l, and O(M(l deg p)) operations in D
if R = D[x] for a (commutative) ring D, p is monic, and deg f < l deg p.
C OROLLARY 9.13.
Let R be a ring (commutative, with 1), p ∈ R, and l ∈ N>0. An element f ∈ R is
invertible modulo p^l if and only if it is invertible modulo p.
Let R be a ring (commutative, with 1), p ∈ R[x] monic of degree m ≥ 1, and a ∈ R[x] of
degree n. Then a can be written uniquely in its p-adic expansion
   a = a_{k−1} p^{k−1} + · · · + a_1 p + a_0 = ∑_{0≤i<k} a_i p^i  with all deg a_i < m,   (4)
where k = ⌊n/m⌋ + 1 (see Section 5.11). This will be used in the integration algo-
rithm of Chapter 22; the reader may skip the present section at first reading.
A special case is the usual coefficient sequence when p = x, or, more generally,
the Taylor expansion of a around u for p = x − u (Section 5.6). We have seen in
Section 5.11 that the p-adic expansion can be computed using O(n^2) operations
in R, and we will now see how to do this in softly linear time.
1. if k = 1 then return a_0 = a
2. t ←− k/2
   call the repeated squaring algorithm 4.8 to compute p^t ∈ R[x]
3. q ←− a quo p^t, r ←− a rem p^t
4. call the algorithm recursively to compute the p-adic expansions r = ∑_{0≤i<t} a_i p^i
   of r and q = ∑_{0≤i<t} a_{i+t} p^i of q
5. return a_0, . . . , a_{k−1}
T HEOREM 9.15.
Algorithm 9.14 correctly computes the p-adic expansion of a and uses at most
(3M(km) + O(km)) log k or O(M(km) log k) operations in R.
   a = q p^t + r = ( ∑_{0≤i<t} a_{i+t} p^i ) p^t + ∑_{0≤i<t} a_i p^i = ∑_{0≤i<k} a_i p^i.
Let T (k) denote the cost of the algorithm. Step 1 is for free, and hence T (1) = 0.
By treating leading coefficients in polynomial multiplication separately, we have
One may also take k = ⌊(deg a)/m⌋ + 1, not necessarily a power of 2, and
t = ⌊k/2⌋ in step 2. The effect is a more "balanced" binary splitting in steps 3
and 4, possibly resulting in a slightly faster algorithm, but the analysis is more
involved. If k is a power of 2, however, one may precompute p^2, p^4, . . . , p^{k/4}, p^{k/2},
thus performing repeated squaring only once instead of every time the algorithm
passes through step 2 in the recursive process.
Exercise 9.20 shows that the “reverse” task of computing the coefficients of a
from its p-adic expansion (4) can also be done in time O(M(mk) log k).
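A Python sketch of this divide-and-conquer expansion, with naive polynomial arithmetic standing in for the fast routines of Chapters 8 and 9 (all helper names are ours):

def polymul(a, b):
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

def poly_divmod(a, b):
    """Quotient and remainder of a by monic b (coefficient lists, low degree first)."""
    a, q = a[:], [0] * max(1, len(a) - len(b) + 1)
    for i in range(len(a) - len(b), -1, -1):
        q[i] = a[i + len(b) - 1]
        for j, bj in enumerate(b):
            a[i + j] -= q[i] * bj
    return q, a[:len(b) - 1]

def padic_expansion(a, p, k):
    """The coefficients a_0, ..., a_{k-1} (deg < deg p) with a = sum a_i p^i,
    by the balanced splitting of Algorithm 9.14."""
    if k == 1:
        return [a]
    t = k // 2
    pt = p
    for _ in range(t - 1):                   # repeated squaring would be used here
        pt = polymul(pt, p)
    q, r = poly_divmod(a, pt)
    return padic_expansion(r, p, t) + padic_expansion(q, p, k - t)

# Taylor expansion of x^3 around u = 1, i.e. p = x - 1:
print(padic_expansion([0, 0, 0, 1], [-1, 1], 4))   # [[1], [3], [3], [1]]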
C OROLLARY 9.16.
Let n ∈ N be a power of 2. The Taylor expansion of a polynomial a ∈ R[x] of
degree n around u ∈ R can be computed using at most (3M(n) + O(n)) log n or
O(M(n) log n) operations in R.
The analog of Algorithm 9.14 for integers can be used to convert an integer
from the 2^{64}-ary representation to an expansion with respect to the powers of an
arbitrary base p ∈ N>1 in softly linear time. The following theorem about this
radix conversion is proven in Exercise 9.21.
T HEOREM 9.17.
Given a, p ∈ N with p of length m and a of length at most km for some k, m ∈ N,
we can compute the p-adic expansion of a using O(M(km) log k) word operations.
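The integer analog, in a short Python sketch (assuming 0 ≤ a < p^k; the function name is ours):

def radix_digits(a, p, k):
    """The k base-p digits of a (least significant first), 0 <= a < p**k,
    by the divide-and-conquer splitting behind Theorem 9.17."""
    if k == 1:
        return [a]
    t = k // 2
    q, r = divmod(a, p**t)                   # p**t would be precomputed and reused
    return radix_digits(r, p, t) + radix_digits(q, p, k - t)

print(radix_digits(2024, 10, 4))             # [4, 2, 0, 2]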
In the next section, we will use Newton iteration to solve polynomial equations whose
solutions are initially only known modulo p. But first we need to adapt some well known tools from calculus to our
purely algebraic setting.
For a polynomial ϕ = ∑_{0≤i≤n} ϕ_i y^i ∈ R[y] over a ring R, the formal derivative of ϕ is
   ϕ′ = ∑_{0≤i≤n} i ϕ_i y^{i−1}.
For R = R, this is the familiar notion usually defined by a limit process. But
in general, say over a finite field, there is no concept like a "limit". We note that
i plays two different roles here: as a summation index into the vector
(ϕ_0, . . . , ϕ_n) ∈ R^{n+1} of coefficients, and as the ring element i = 1 + · · · + 1 ∈ R.
The formal derivative has some familiar properties.
(ii) Because of linearity, it is enough to show the claim for powers of y. So let
n, m ∈ N.
(iii) Again, it is sufficient to show the claim for ϕ being a power of y, ϕ = yn for
n ∈ N say. But then the claim reduces to (ψ n )′ = nψ n−1 ψ ′ , which is easily proven
using the Leibniz rule and induction on n. ✷
We note one difference from the usual derivatives, say over R. Over F p (or,
more generally, any field of characteristic p > 0) any pth derivative is zero. For
example, ϕ′′ = 0 for all ϕ ∈ F2 [y].
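In code, the formal derivative is just a shift of the coefficient vector; the following Python lines (our own small helper) also illustrate the vanishing second derivative over F_2:

def derivative(phi, char=0):
    """Formal derivative of the coefficient list phi = [phi_0, phi_1, ...];
    char > 0 reduces the ring elements i = 1 + ... + 1 modulo char."""
    d = [i * c for i, c in enumerate(phi)][1:]
    return [c % char for c in d] if char else d

phi = [1, 1, 1, 1, 1]                        # 1 + y + y^2 + y^3 + y^4 over F_2
print(derivative(phi, 2))                    # [1, 0, 1, 0]
print(derivative(derivative(phi, 2), 2))     # [0, 0, 0]: the second derivative vanishes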
P ROOF. We have seen in Section 5.6 that ϕ has the Taylor expansion
   ϕ = ∑_{0≤i≤n} ϕ_i · (y − g)^i = ϕ_0 + ϕ_1 · (y − g) + ψ · (y − g)^2,
P ROOF. By Corollary 9.13, ϕ′(g) is invertible modulo m^2, and hence the right
hand side of (5) is well defined; Algorithm 9.10 computes ϕ′(g)^{−1} mod m^2 given
ϕ′(g)^{−1} mod m. Since m | m^2, the congruence (5) also holds modulo m, and
   h ≡ g − ϕ(g)ϕ′(g)^{−1} ≡ g mod m
because ϕ(g) vanishes modulo m. This proves the second assertion.
For the first one, we make use of the Taylor expansion given by Lemma 9.20 of
ϕ around g and substitute h for y:
   ϕ(h) = ϕ(g) + ϕ′(g)·(h − g) + ψ(h)·(h − g)^2 ≡ ϕ(g) − ϕ′(g)ϕ(g)ϕ′(g)^{−1} ≡ 0 mod m^2.
Here, we use the fact that m^2 divides (h − g)^2, by the second assertion.
Since h ≡ g mod m, we have ψ (h) ≡ ψ (g) mod m for any ψ ∈ R[y], in particular
for ψ = ϕ′ . This is just a special case of a general principle: since the reduction
map modulo m is a ring homomorphism, it commutes with the ring operations +
and ·, and hence with any polynomial over R. Now Corollary 9.13 proves the last
claim. ✷
1. r ←− ⌈log l⌉
2. for i = 1, . . . , r − 1 do
      g_i ←− (g_{i−1} − ϕ(g_{i−1}) s_{i−1}) rem p^{2^i}
      s_i ←− (2s_{i−1} − ϕ′(g_i) s_{i−1}^2) rem p^{2^i}
      { The second computation is the ith execution of step 2 in the Newton iter-
      ation of Algorithm 9.10 for the inversion of ϕ′(g_i). }
3. return (g_{r−1} − ϕ(g_{r−1}) s_{r−1}) rem p^l
T HEOREM 9.23.
Algorithm 9.22 works correctly.
P ROOF. Let g_r ≡ g_{r−1} − ϕ(g_{r−1})s_{r−1} mod p^{2^r}. Then g ≡ g_r mod p^l, and it is suf-
ficient to show the invariants
   g_i ≡ g_0 mod p,  ϕ(g_i) ≡ 0 mod p^{2^i},  s_i ≡ ϕ′(g_i)^{−1} mod p^{2^i} if i < r
T HEOREM 9.25.
When R = D[x] for a ring D (commutative, with 1), p = x, g0 ∈ D, l ∈ N is a
power of 2, and ϕ ∈ R[y] with degy ϕ = n and degx ϕ < l , then Algorithm 9.22
takes (3n + 3/2)M(l) + O(nl) operations in D.
P ROOF. Reducing modulo x^{2^i} where possible, we may assume that the degrees of
s_i and g_i are less than 2^i for all i. At first, we compute ϕ′, taking nl = n2^r operations
in D. In step 2, we compute ϕ(g_{i−1}) and ϕ′(g_i) modulo x^{2^i} using Horner's rule, at
a total cost of 2n − 1 multiplications and the same number of additions modulo x^{2^i},
or (2n − 1)(M(2^i) + 2^i) operations in D. Computing g_i from g_{i−1}, s_{i−1}, and ϕ(g_{i−1})
can be done using at most M(2^{i−1}) + 2^{i−1} ≤ (1/2)M(2^i) + 2^{i−1} operations in D: since
the lower part of ϕ(g_{i−1}) is zero, we only need to multiply its upper part by s_{i−1} and
take the negative of the lower part of the result as the upper part of g_i. Similarly,
computing s_i from s_{i−1} and ϕ′(g_i) takes M(2^i) + M(2^{i−1}) + 2^{i−1} ≤ (3/2)M(2^i) + 2^{i−1}
operations, as in the proof of Theorem 9.4. Thus the cost for the ith iteration of
step 2 is at most (2n + 1)M(2^i) + 2n · 2^i operations in D, and similarly we have
(n + 1/2)(M(2^r) + 2^r) operations in step 3. Now
   ∑_{1≤i<r} ( (2n + 1)M(2^i) + 2n · 2^i ) ≤ ( (2n + 1)M(2^r) + 2n · 2^r ) ∑_{1≤i<r} 2^{i−r}
      ≤ (2n + 1)M(2^r) + 2n · 2^r,
T HEOREM 9.26.
When R = Z, 0 ≤ g_0 < p, and ϕ has degree n and coefficients absolutely less
than p^l, then Algorithm 9.22 takes O(n M(l log p)) word operations.
When calculating by hand, one may perform all computations in Algorithm 9.22
in the p-adic representation, since then reductions modulo powers of p are for free.
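A Python sketch of this lifting for R = Z, structured like Algorithm 9.22 with the inverse of ϕ′ updated alongside the root. All names are ours, and the example (the polynomial y^2 − 2 and the prime 7) is illustrative only.

def lift_root(phi, p, l, g0):
    """Lift a simple root g0 of phi modulo the prime p to a root modulo p^l.
    phi is a coefficient list, low degree first; phi'(g0) must be a unit mod p."""
    def ev(poly, x, m):                      # poly(x) modulo m, by Horner's rule
        y = 0
        for c in reversed(poly):
            y = (y * x + c) % m
        return y

    dphi = [i * c for i, c in enumerate(phi)][1:]
    s = pow(ev(dphi, g0, p), -1, p)          # phi'(g0)^(-1) mod p
    g, k = g0, 1
    while k < l:
        k = min(2 * k, l)
        m = p**k
        g = (g - ev(phi, g, m) * s) % m      # Newton step for the root
        s = (2 * s - ev(dphi, g, m) * s * s) % m   # Newton step for the inverse
    return g

g = lift_root([-2, 0, 1], 7, 8, 3)           # y^2 - 2, starting solution 3 mod 7
assert (g * g - 2) % 7**8 == 0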
One question that did not come up with the Newton iteration algorithm for in-
version is that of uniqueness of the solution. Inverses modulo p^l are unique, but
solutions of an arbitrary polynomial equation ϕ(y) = 0 modulo p^l generally are
not, because there may already be several solutions modulo p. The following the-
orem implies that for any l ∈ N>0, every starting solution gives rise to exactly one
solution modulo p^l, so that there are as many solutions modulo p^l as there are
modulo p (with nonvanishing ϕ′).
Now
By Corollary 9.13, there exist some s, t ∈ R such that s · (ϕ′(h) + c · (h∗ − h)) =
1 + t p^l, and (6) implies that
The left hand side of this equation vanishes modulo p^l, and the claim follows. ✷
The conclusion of Theorem 9.27 need no longer be true if g violates the second
condition for a starting solution, namely if ϕ′(g) is not invertible modulo p. For
example, the equation y^4 = 0 has only one solution g ≡ 0 modulo 5, but five so-
lutions h ≡ 0, 5, 10, 15, 20 modulo 25 that are all congruent to 0 modulo 5. Here
ϕ = y^4 and ϕ′(0) ≡ 0 mod 5, so that 0 is not a proper starting solution.
We will meet Newton iteration again in Chapter 15, under the name of Hensel
lifting, which is used for (approximate) factorizations of polynomials.
and g_0 = 1 is a valid starting solution for the 2-adic Newton iteration to solve
ϕ(y) = y^n − a = 0, as in Algorithm 9.22, since ϕ′(1) = n · 1^{n−1} ≡ 1 ≢ 0 mod 2.
We choose k ∈ N minimal such that 2^{nk} > a, and after r = ⌈log k⌉ steps Algorithm
9.22 has computed g ∈ N with ϕ(g) = g^n − a ≡ 0 mod 2^k. If now g^n = a in Z,
then g = ⁿ√a. Otherwise, we claim that a is not an nth power in Z. To see why,
we assume that we have b ∈ N with b^n = a. Then b is odd, b ≡ g_0 ≡ g mod 2,
0 ≤ b < 2^k, and
   ϕ(b) = b^n − a = 0 ≡ g^n − a = ϕ(g) mod 2^k.
Now the uniqueness of Newton iteration (Theorem 9.27) yields b ≡ g mod 2^k, and
since both sides are nonnegative and less than 2^k, they are equal.
In order to save computing time, we set t_0 = 1 in step 1 and additionally compute
t_i = g_i^{n−1} rem 2^{2^{i+1}} in step 2. In the ith iteration of step 2 in Algorithm 9.22, we
then calculate
   g_i ≡ g_{i−1} − ϕ(g_{i−1})s_{i−1} ≡ g_{i−1} − (g_{i−1}t_{i−1} − a)s_{i−1} mod 2^{2^i}
with two multiplications and two additions modulo 2^{2^i}. Then we compute t_i and
   s_i ≡ 2s_{i−1} − ϕ′(g_i)s_{i−1}^2 ≡ 2s_{i−1} − n t_i s_{i−1}^2 mod 2^{2^i},
taking three multiplications and two additions modulo 2^{2^i}. For the computation
of t_i, we use repeated squaring (Section 4.3), at a cost of at most 2 log n mul-
tiplications modulo 2^{2^{i+1}}. Thus the total cost for the ith iteration of step 2 is
O(M(2^i) log n) word operations. In the 2^{64}-ary representation, reduction modulo
2^{2^i} is essentially free.
T HEOREM 9.28.
Let a, n ∈ N be odd, a < 2^l, and 3 ≤ n < l. Then the above algorithm either
computes the unique positive integer ⁿ√a ∈ N, or certifies that a is not an nth power
in Z, using O(M(l)) word operations.
P ROOF. Correctness is clear from the above discussion. Let c ∈ R>0 such that the
cost for the ith iteration of step 2 of the Newton iteration algorithm 9.22 is at most
c M(2^i) log n word operations and the cost for step 3 is at most c M(k) log n. With
r = ⌈log k⌉ as above, the total cost is no more than
   c log n ( M(k) + ∑_{1≤i<r} M(2^i) ) ≤ c log n ( M(k) + M( ∑_{1≤i<r} 2^i ) )
E XAMPLE 9.29. Let us compute ∛2197. We may choose k = 4, since 2^{3·4} =
2^{12} = 4 096 > 2197. Now g_0 = s_0 = t_0 = 1 and
   g_1 ≡ g_0 − (g_0 t_0 − a)s_0 ≡ 1 + 2196 ≡ 1 mod 4,  t_1 = g_1^2 rem 16 = 1,  s_1 ≡ 2s_0 − 3t_1 s_0^2 ≡ 3 mod 4,
and finally g ≡ g_1 − (g_1 t_1 − a)s_1 ≡ 1 + 3 · 2196 ≡ 13 mod 16. Indeed 13^3 = 2197. ✸
In Exercise 9.44, we use this to test whether a ∈ N>1 is a perfect power, so that
a = c^n for integers c, n > 1. In Sections 14.4 and 15.6, we discuss algorithms for
computing integer roots of arbitrary polynomials.
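A Python sketch of the root-finding procedure just described, for odd a and odd n ≥ 3 (the function name and the final certification test are ours):

def odd_nth_root(a, n):
    """For odd a, n with n >= 3, return the positive integer n-th root of a,
    or None if a is not an n-th power, via 2-adic Newton iteration."""
    k = 1
    while 2 ** (n * k) <= a:                 # smallest k with 2^(nk) > a
        k += 1
    g, s, prec = 1, 1, 1                     # g0 = s0 = t0 = 1
    while prec < k:
        prec = min(2 * prec, k)
        m = 1 << prec
        t = pow(g, n - 1, m)
        g = (g - (g * t - a) * s) % m        # root update
        s = (2 * s - n * pow(g, n - 1, m) * s * s) % m   # inverse-derivative update
    return g if g ** n == a else None

print(odd_nth_root(2197, 3))                 # 13
print(odd_nth_root(2199, 3))                 # None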
The following are some commonly used valuations on integers and polynomials.
A ring may, of course, have more than one valuation.
E XAMPLE 9.31. (i) Let R = Z, and v(a) = |a|, the absolute value.
(ii) With R = Z and p prime, let
   v_p(a) = 0 if a = 0, and v_p(a) = p^{−n} if p^n | a and p^{n+1} ∤ a.
(iii) With R = F[x] for a field F, let
   v(a) = 0 if a = 0, and v(a) = 2^{−n} if x^n | a and x^{n+1} ∤ a.   (7)
This is the x-adic valuation. Similarly, if p ∈ F[x] is irreducible, we get the p-adic
valuation if x is replaced by p in (7).
(iv) With R = F[x], F a field, let
   v(a) = 0 if a = 0, and v(a) = 2^{deg a} if a ≠ 0.
E XAMPLE 9.32. v_3(54) = 3^{−3} = 1/27, v_3(55) = 1, v_3(54 000 000) = 1/27. ✸
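In code, the valuation of part (ii) reads as follows (a small Python helper of our own):

from fractions import Fraction

def v(a, p):
    """The p-adic valuation of Example 9.31 (ii): v_p(0) = 0, else p^(-n),
    where p^n is the exact power of p dividing a."""
    if a == 0:
        return 0
    n = 0
    while a % p == 0:
        a, n = a // p, n + 1
    return Fraction(1, p**n)

print(v(54, 3), v(55, 3), v(54_000_000, 3))  # 1/27 1 1/27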
The p-adic valuation for integers is non-Archimedean, while the absolute value
for integers is Archimedean. The Newton iteration of Algorithm 9.22 for solving
polynomial equations approximately can be carried over to any non-Archimedean
valuation. Lemma 9.21 reads as follows in this generality.
P ROOF. We only show the first two bounds, using the Taylor expansion (Lem-
ma 9.20) of ϕ around g:
   v(h − g) = v( (h − g + ϕ(g)/ϕ′(g)) − ϕ(g)/ϕ′(g) )
            ≤ max { v( h − g + ϕ(g)/ϕ′(g) ), v( ϕ(g)/ϕ′(g) ) } = max {ε^2, ε} = ε,
   v(ϕ(h)) = v( ϕ(g) + ϕ′(g)(h − g) + ψ(h)·(h − g)^2 )
           = v( ϕ(g) − ϕ′(g)·ϕ(g)/ϕ′(g) + ϕ′(g)·(h − g + ϕ(g)/ϕ′(g)) + ψ(h)·(h − g)^2 )
           ≤ max { v(ϕ′(g)) · v( h − g + ϕ(g)/ϕ′(g) ), v(ψ(h)) · v(h − g)^2 }
           ≤ max {1 · ε^2, 1 · ε^2} = ε^2. ✷
The division by ϕ′ (g) in the above formulas leads, in principle, out of the ring R.
There are three ways of dealing with this problem: we can replace ϕ′ (g)−1 by a
sufficiently good approximation in R, as computed by Algorithm 9.10, or we can
extend v to the field of fractions of R, by setting v(a/b) = v(a)/v(b) if b 6= 0, or
we can multiply by ϕ′ (g), where necessary, and conclude from v(ϕ′ (g)) = 1 that
the valuations do not change.
Newton iteration in R = Q for solving y^2 − 2 = 0 and starting with g_0 = 2 leads to
better and better rational approximations to the root. But √2 itself is not a rational
number; in order to capture such an exact root, the domain has to be enlarged, say
to R.
A similar phenomenon happens with the p-adic valuations on Z or F[x]. One
can enlarge these rings to their completions, namely to the ring Z(p) of p-adic
integers or the ring F[[x]] of formal power series (for the x-adic valuation), and
in these larger rings Newton iteration converges to an exact root. We do not go
into details, since these quantities cannot be represented in a finite manner, and
these rings are mainly of conceptual interest to computer algebra. One can finitely
represent initial segments of them, say a mod p^l for a ∈ Z(p), but that is essentially
the same as some integer modulo p^l.
Newton iteration for inversion tells us what the units are in these rings. An
element a = a_0 + a_1 p + a_2 p^2 + · · · ∈ Z(p), with a_0, a_1, a_2, . . . ∈ {0, . . . , p − 1}, is a
unit if and only if a_0 mod p is a unit in Z_p, that is, if and only if a_0 ≠ 0. The
power series a = a_0 + a_1 x + a_2 x^2 + · · · ∈ F[[x]], with a_0, a_1, a_2, . . . ∈ F, is a unit in
F[[x]] if and only if a_0 ≠ 0. As an example, 1 − x ∈ F[[x]] is a unit, with inverse
1 + x + x^2 + · · ·.
In the real or complex numbers, the behavior of Newton iteration is quite intri-
cate, as seen in the simple case of finding the three roots 1, e2πi/3 , e4πi/3 in C of
ϕ = y3 − 1.
In Figure 9.2, the three roots are marked by white circles, and the colored areas
indicate convergence to one of them via the Newton iteration. Brightness corre-
sponds to the “convergence speed”: the brighter the color of a point, the earlier
does Newton iteration starting at that point approach its final limit. The intricacy
of the picture illustrates the difficulty of finding a simple rule for telling where
a point will go. We have big bright areas where the limit point is clear, but also
other areas where a small change of the initial value will lead to a different desti-
nation. The points on the real line have nowhere to go but to 1. But the real root
of 16x9 + 51x6 + 21x3 + 2 near −1.43 first crashes into 0 and then explodes . . .
This problem is part of a larger question: given an iteration function gi+1 =
ψ (gi ), determine the behavior for any starting value g0 . For example, for which g0
does this converge at all? The set of all these g0 is called the Julia set of ψ , after
the French mathematician Gaston Julia who first studied it. Such sets are highly
complicated and provide stunning pictures. Their study is mainly a part of dy-
namical systems theory . The beautiful mathematical theory of chaos and fractals
F IGURE 9.4: Convergence of Newton iteration to solve y^3 = 1 over the 7-adic integers.
is described and richly illustrated in Mandelbrot (1977) and Peitgen, Jürgens &
Saupe (1992).
In Figure 9.4, we see the analog of Figure 9.2 over Z(7) , the set of 7-adic integers.
The seven elements of Z7 are arranged as in Figure 9.3, and Z(7) can be represented
by the fractal composed of infinitely many recursive compositions of this centered
hexagon. The boundary is a Koch snowflake. The derivative ϕ′ = 3y^2 of ϕ =
y^3 − 1 vanishes modulo 7 on the white points in the center, and Newton iteration
does not work. All other points converge to the root of ϕ of their color, and brighter
color means faster convergence. The three roots of ϕ in Z(7), whose sum equals 0,
are 1, 2 + 4 · 7 + 6 · 7^2 + 3 · 7^3 + · · ·, and 4 + 2 · 7 + 0 · 7^2 + 3 · 7^3 + · · ·. In Z_7^×, at most
one iteration leads to a root modulo 7 and then to convergence modulo higher
powers of 7. However, for large p, it seems to be as tricky as in the complex
situation to determine which elements of Z_p^× lead to a root modulo p, except that
we have only a finite number of starting points to try. Once a root is reached,
convergence is quick.
Many of our algorithms are (hopefully) of long-lasting interest, but the computer
timings reported now will be out of date before the book goes to press.
The first lesson in implementing a software package for fast integer or polyno-
mial arithmetic is that a large variety of algorithms have to be coded and tested to
determine the crossover points. These are the input sizes at which one algorithm
beats another one. A typical experience is that, say for multiplication, the classical
method is best for small inputs, Karatsuba’s algorithm takes over for intermediate
sizes, and a fast, for example, an FFT-based method, excels for large problems.
The second lesson is that just casting the algorithms “from the book” into soft-
ware will not work well. One has to understand the algorithmic ideas in depth and
use a multitude of tricks and special relations to make things go at lightning speed.
Only a few of these methods can be explained here; fortunately, there is no limit
to the ingenuity of the programmer (except for having to complete the project in
some reasonable time frame).
Several factors determine whether a software package for (integer or polyno-
mial) arithmetic is fast in practice. Besides choosing the algorithms and deter-
mining the crossover points between various methods, one has to design suitable
data types, exploit fast hardware arithmetic whenever possible, and customize for
specific types and sizes of problems.
Currently, there are—besides implementations in any general purpose computer
algebra system—several libraries available for arbitrary precision integer arithme-
tic and univariate polynomial arithmetic over finite fields, Z, Q, algebraic number
fields, R, and C (among others GNU MP, PARI, L IDIA), but only few that imple-
ment the fast algorithms presented in Chapters 8 through 11. Among them are L IP
by Arjen Lenstra and Paul Leyland, the package of Schönhage, Grotefeld & Vetter
(1994) (see also Reischert 1995), N TL by Shoup (an early version is described in
Shoup 1995), and B I P OL A R (Binary Polynomial Arithmetic) by von zur Gathen
& Gerhard (1996). The last two of these will be described below.
The C++ library B I P OL A R was designed and optimized for univariate polyno-
mial factorization over F2 . This is a very narrow focus, but we use it to explain
some general principles. The first question when writing the package was the
choice of data types. When programming on top of an existing package, one may
not have much choice. Experience has shown that for high performance code, one
should represent algebraic data as compactly as possible, since all linear operations
like addition or copying take time proportional to the length of the representation
(that is, the number of machine words it occupies in memory). Thus, on a ma-
chine with a word size of 32 bit, we represent polynomials over F2 as arrays of 32
bit words; each word contains 32 consecutive coefficients. In this representation,
all linear-time operations are straightforward to implement, and the next task is to
tackle the nontrivial arithmetic operations, starting with multiplication.
We have five methods at our disposal:
◦ table lookup,
◦ classical multiplication,
◦ Karatsuba’s algorithm,
◦ an algorithm by Cantor (1989),
◦ FFT-based algorithms.
We did not experiment with the last one. As explained above, each method typ-
ically has its range of input sizes where it beats the other methods. One has to
implement many variations of these approaches and test them to determine the
best one for each range, starting with the small ranges. A typical outcome then
is a hybrid algorithm where one performs, say, first a few Karatsuba steps and
then classical multiplication on small arguments. As an example, for single pre-
cision polynomials of degree less than 32 we found the following to work best:
two stages of Karatsuba’s algorithm plus table lookup for the 9 resulting multi-
plications at degrees less than 8; the size of the table is 2^8 · 2^8 · 16 bits or 128
kilobytes. (Unfortunately, there is no hardware support for multiplication in F2 [x]
on general purpose microprocessors—there is no possibility to sever the “carry”
line—and one has to implement the single precision multiplication in software.)
On top of this, both the classical algorithm 2.3 and Karatsuba’s algorithm 8.1 are
implemented at machine word level, that is, with base x32 instead of x; the block
sizes are multiples of 32 and recursion in the latter algorithm stops as soon as the
polynomials are of degree less than 32.
We also implemented an algorithm by Cantor (1989) for multiplication in F2 [x],
which uses evaluation and interpolation at linear subspaces of F2m for some m ∈ N
and is similar to the FFT based methods from Chapter 8. Its running time is
O(n (log n)^{1.59}) arithmetic operations in F_2. For practical purposes, we may take
m = 32, so that one element of F_{2^m} fits precisely into a machine word. Here the
single precision operations are multiplications in F_{2^16} and F_{2^32}. Again, we might
have implemented these by doing one polynomial multiplication in the way de-
scribed above and one subsequent division with remainder, but we have chosen
a different approach using tables based on the multiplicative structure of finite
fields, as in Pollard (1971) and Montgomery (1991). We take a fixed generator
g of the multiplicative group F_{2^16}^×, and compute two tables for the exponentiation
map {0, . . . , 2^16 − 2} −→ F_{2^16}^×, with a ↦ g^a, and its inverse. Two nonzero el-
ements c, d ∈ F_{2^16} are multiplied by determining a, b ∈ {0, . . . , 2^16 − 2} such that
c = g^a and d = g^b, and computing cd = g^{a+b}. This amounts to essentially one addi-
tion modulo 2^16 − 1 and three table lookups; the size of each table is 2^16 · 16 bits or
128 KB. Inversion in F_{2^16}^× is done similarly. One multiplication in F_{2^32} is reduced
to three multiplications in F_{2^16} à la Karatsuba; this requires a change of basis.
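A scaled-down Python illustration of these exponentiation and logarithm tables, over F_{2^8} rather than F_{2^16}. The reducing polynomial x^8 + x^4 + x^3 + x + 1 (0x11B) and the generator g = x + 1 (0x03) are our choices for this sketch and are not taken from B I P OL A R.

def build_tables():
    exp, log = [0] * 255, [0] * 256
    a = 1
    for i in range(255):                     # exp[i] = g^i, log[g^i] = i
        exp[i], log[a] = a, i
        a ^= (a << 1) ^ (0x11B if a & 0x80 else 0)   # multiply a by g = x + 1
    return exp, log

EXP, LOG = build_tables()

def mul(c, d):
    """Product in F_{2^8}: one addition modulo 255 and three table lookups."""
    if c == 0 or d == 0:
        return 0
    return EXP[(LOG[c] + LOG[d]) % 255]

assert mul(0x02, 0x02) == 0x04               # x * x = x^2
assert mul(0x02, 0x80) == 0x1B               # x * x^7 = x^8 = x^4 + x^3 + x + 1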
After determining the best (that is, the fastest) routines for single precision arith-
metic, we implemented the three multiplication algorithms mentioned above for
F IGURE 9.5: CPU times in seconds for multiplying polynomials of degree n in F_2[x]: classical, Karatsuba, and Cantor multiplication, for n up to 65536.
the degrees of the input polynomials to the two crossover degrees and then decides
which of the three algorithms to use. Its performance can be seen in Table 9.6.
n CPU seconds
512 0.0004
1024 0.0006
2048 0.0014
4096 0.0038
8192 0.0110
16 384 0.0329
32 768 0.0971
65 536 0.2135
131 072 0.4666
262 144 1.0218
524 288 2.2330
1 048 576 4.9560
We have to stress that here (and in all our implementation discussions) the tim-
ings and crossover points depend on our efforts and on our computing environ-
ment. We expect our software to perform quite well on other similar processors,
but for example to use the power of true 64 bit machines one would have to start
all over again—at least for the single precision arithmetic. The one universal truth
is that a well-done implementation is very labor-intensive and requires close fa-
miliarity with the algorithmics.
For division with remainder in F2 [x], we first wrote single precision routines for
both the classical algorithm and Newton inversion, working at the bit level. On top
of these, we implemented multiprecision versions working at machine word level
(Algorithm 2.5 with base x32 instead of x, and Algorithm 9.3 followed by essen-
tially two polynomial multiplications). The asymptotically fast division algorithm
uses the hybrid multiplication algorithm as a subroutine. Figure 9.7 shows some
experiments; the crossover point between the two algorithms (the top two curves)
is near degree 10 000.
In some applications, in particular in modular arithmetic and polynomial fac-
torization, many remainder computations modulo a fixed divisor f ∈ F2 [x] have to
be performed. In that case, rev( f )−1 mod xdeg f can be precomputed using Algo-
rithm 9.3 and stored, and then one remainder computation amounts to essentially
two polynomial multiplications of degree about deg f . When counting only the
latter, the crossover point drops to about 4000 (Figure 9.7). Further optimization
is possible when deg f is above the crossover degree for Cantor multiplication,
which reduces the time for one remainder computation modulo f to essentially the
time for a polynomial multiplication of the same size. Similar optimizations are
F IGURE 9.7: CPU times in seconds for division with remainder in F_2[x] at degree n: classical division, Newton inversion, and Newton inversion with precomputation.
possible when many modular multiplications gh rem f with both f and h fixed are
performed.
B I P OL A R also implements the Extended Euclidean Algorithm and polynomial
factorization routines for F2 [x]; the latter will be discussed in Section 15.7.
The integer arithmetic of Shoup’s N TL is highly optimized. On a processor
with a word length of 32 bits, arbitrary precision integers are represented as arrays
of machine words, where—depending on the underlying hardware—between 26
and 30 consecutive bits of the binary representation are packed into one machine
word. Multiplication and division of such single precision integers is done by
cleverly employing the hardware floating point arithmetic, which in most currently
available microprocessors is considerably faster than hardware integer arithmetic.
N TL uses classical integer multiplication for integers of size up to about 500
bits, and Karatsuba’s algorithm for larger integers. Other arithmetic operations
like division with remainder and the Extended Euclidean Algorithm are all done
in the classical way.
We have implemented the classical algorithm 2.4, Karatsuba’s algorithm 8.1,
the three primes FFT algorithm 8.25, and the algorithm of Schönhage & Strassen
(1971) for integer multiplication using low-level routines of N TL version 1.5. Fig-
ure 9.8 gives running times for our implementations and the built-in routine of
N TL. We have not invested much effort into optimizing our routines, and the tim-
ings of our Karatsuba implementation are at most twice as large as those of N TL’s
routine. The graphs for the algorithms which are not FFT-based are quite smooth,
F IGURE 9.8: CPU times in seconds for multiplying integers of k words: classical, Karatsuba, the NTL built-in routine, the three primes FFT, and the Schönhage & Strassen FFT.
F IGURES 9.9–9.11: CPU times in seconds for polynomial multiplication in NTL (classical, Karatsuba, modular FFT, and Fermat number FFT) for various degrees and coefficient sizes.
while we have large steps near powers of 2 for the three primes FFT and Schön-
hage & Strassen’s algorithm. These steps may be smoothed with some additional
effort, but we have not tried this.
For multiplying polynomials over Z and Zm with m ∈ Z, N TL implements the
classical algorithm for small degrees and coefficient sizes, Karatsuba’s algorithm
for polynomials of medium degree, and the FFT-based modular approach de-
scribed in Section 8.4 and a variant of Algorithm 8.20 using FFT modulo Fermat
numbers (Exercise 8.36) for larger polynomials. Figures 9.9 through 9.11 show
running times for various degrees and coefficient sizes in N TL. For division with
remainder, N TL uses the classical algorithm for polynomials of small degree, and
Newton inversion (Algorithm 9.3) for higher degree polynomials.
Besides basic arithmetic for multiprecision integers, floating point numbers, fi-
nite fields, and univariate polynomials and matrices over these domains, the recent
version 3.1 of N TL includes routines for primality testing (Chapter 18), Chinese
remaindering (Chapters 5 and 10), computing greatest common divisors (Chap-
ters 3 and 6), factorization of univariate polynomials (Part III), computing reduced
bases in lattices over Z (Chapter 16), and much more. The polynomial factoriza-
tion routines will be discussed in Section 15.7. N TL is a C++ library and can be
downloaded from Victor Shoup’s homepage http://www.shoup.net. We rec-
ommend this package to anybody who is not ready to reinvent the wheel.
Notes. 9.1. Cook (1966) devised a division algorithm for integers that costs the same
number of word operations as a multiplication, up to a constant factor. Sieveking (1972),
Strassen (1973a), Kung (1974), and Borodin & Moenck (1974) gave analogous algorithms
for polynomials. For the details of the division method for integers, see Knuth (1998),
Algorithm 4.3.3 R, and Aho, Hopcroft & Ullman (1974), §8.2. In the nonscalar model,
Schönhage has shown how to divide a polynomial of degree at most 2n by one of degree n
with 5.875n multiplications and divisions (see Kalorkoti 1993 and Bürgisser, Clausen &
Shokrollahi 1997, Corollary 2.26 and Notes 2.8). Karp & Markstein (1997) state a mod-
ification of Algorithm 9.5 taking only 27 M(m) + M(n) + O(n) ring operations. Burnikel
& Ziegler (1998) give a divide-and-conquer algorithm for division with remainder taking
time about 2M(n) when Karatsuba multiplication is used; see also Jebelean (1997).
9.2. Algorithm 9.14 is from von zur Gathen (1990a). In fact, the Taylor expansion can be
computed with M(n) + O(n) or O(M(n)) ring operations (Aho, Steiglitz & Ullman 1975,
see also Schönhage, Grotefeld & Vetter 1994, page 284, and Exercise 9.49).
9.3. The Taylor expansion goes back to Taylor (1715) and Maclaurin (1742), and is already
in Newton (1710) for ϕ = yn .
9.4 and 9.5. The formulas of Newton's iteration for square and cube roots were known by
the Babylonians, and appear in the 6th century Indian text Āryabhaṭīya. Muḥammad al-
Khwārizmī described the Newton iteration for square roots around 830 (see Folkerts 1997).
Jamshīd Al-Kāshī, who lived in Samarkand in the early 15th century, had used a single
Newton step for root finding. Both one- and two-dimensional Newton iteration is explicitly
Newton step for root finding. Both one- and two-dimensional Newton iteration is explicitly
described in Waring (1770). The history of Newton’s method is traced in Goldstine (1977),
§2.4. Cauchy (1847) describes the arithmetic Newton iteration for finding, from a root of
an integer polynomial modulo m, roots modulo m2 , m3 , . . . . Bach & Sorenson (1993) and
Bernstein (1998b) present efficient tests whether an integer is a perfect power.
9.6. Von Koch (1904) designed a continuous curve which is nowhere differentiable. Three
joint copies of it—called a Koch snowflake or a Koch island—form the boundary of the
fractal in Figure 9.4. Each of the six white areas around the center “subflake” has again a
Koch flake as boundary, and so on forever (or at least up to the resolution).
The total length of the edges grows exponentially fast. At each iteration, the (linear)
size of the hexagon shrinks to 1/3. If we draw the edges of the hexagon, as in Figure 9.3,
around the smallest hexagons only, then the total length 6 · l, say, where l is the length of
one edge, is replaced by 7 · 6 · l/3 = 14l. Starting with length 6 cm, as in Figure 9.4, we
get a length of (14/6)^3 · 6 cm ≈ 76 cm after three iterations (which is approximated by the
little hexagons in the figure). After 83 iterations, the length is more than (current estimates
of) the diameter of the universe.
We thank Rob Corless for pointing out the relation of our picture of Z(7) to von Koch’s
snowflake.
9.7. The development of B IPOLAR has been discontinued and the system is not available
anymore. Von zur Gathen & Gerhard (1996) describe an extension of Cantor’s (1989)
algorithm. Montgomery (1992) discusses algorithms and implementation results for fast
integer arithmetic, in the context of factoring with the elliptic curve method.
Exercises.
9.1 Use Newton iteration to compute f^{−1} mod x^8 for f = x^2 − 2x + 1 ∈ Q[x].
9.2 Compute 94^{−1} mod 6561 using Newton iteration.
9.3 Let a = x^7 + 2x^4 − 1 and b = x^3 + 2x^2 − 3x − 1 in Q[x]. Compute the quotient and remainder
of the division of a by b. Trace by hand the "fast" algorithm for division with remainder on this
example.
9.4−→ Let a = 30x^7 + 31x^6 + 32x^5 + 33x^4 + 34x^3 + 35x^2 + 36x + 37 and b = 17x^3 + 18x^2 + 19x + 20
in F_101[x], and f ∈ F_101[x] the reversal of b.
(i) Compute f^{−1} mod x^4.
(ii) Use (i) to find q, r ∈ F_101[x] with a = qb + r and deg r < 3.
(iii) Use the Extended Euclidean Algorithm to find a^{−1} mod b, that is, a polynomial c ∈ F_101[x] of
degree less than 3 with ac ≡ 1 mod b.
(iv) Use Newton iteration to find a^{−1} mod b^4.
9.5 Let D be a ring (commutative, with 1) and f, g ∈ D[x] monic of degree n > 0.
(i) Prove that rev(fg)^{−1} rem x^{2n} can be computed from rev(f)^{−1} rem x^n, rev(g)^{−1} rem x^n, and
fg using 2M(n) + M(2n) + O(n) arithmetic operations in D.
(ii) Prove that rev(f)^{−1} rem x^n can be computed from rev(fg)^{−1} rem x^{2n} using M(n) + O(n) op-
erations in D.
9.6∗ Consider the following variant of the Newton inversion algorithm 9.3. Instead of computing
f^{−1} mod x^{2^i} for i = 1, 2, . . ., compute the inverse modulo x^{⌈l/2^r⌉}, x^{⌈l/2^{r−1}⌉}, . . ., x^{⌈l/2⌉}, x^l. Show that
the cost of this algorithm is at most l + ∑_{1≤j≤r} ( M(⌈l 2^{−j}⌉) + M(⌈l 2^{−j−1}⌉) ). Use ⌈l 2^{−j}⌉ ≤ ⌊l 2^{−j}⌋ + 1
for all j and Exercise 8.34 to conclude that the overall cost is at most 3M(l) + O(l).
9.7 Let D be a ring (commutative, with 1), R = D[x], p ∈ R monic nonconstant, r ∈ N, and f ∈ R of
degree less than n = 2^r deg p.
(i) Show that p^2, p^4, . . ., p^{2^r} can be computed with M(n) + O(n) ring operations in D.
(ii) Prove that given the polynomials from (i), rev(p)^{−1} rem x^{deg p}, rev(p^2)^{−1} rem x^{2 deg p}, . . .,
rev(p^{2^r})^{−1} rem x^n can be computed using at most 4M(n) + O(n) operations in D. Hint: Exercise 9.5.
(iii) Given the data from (i) and (ii), show that f rem p^{2^{r−1}}, f rem p^{2^{r−2}}, . . ., f rem p^2, f rem p can
be computed with 2M(n) + O(n) operations in D.
(iv) Show that when R = Z and f, p ∈ N with f < p^{2^r}, you can compute p^2, p^4, . . ., p^{2^r} and
f rem p^{2^{r−1}}, f rem p^{2^{r−2}}, . . ., f rem p^2, f rem p using O(M(2^r log p)) word operations.
9.8 (i) Prove that the Newton inversion algorithm 9.3 works correctly as specified.
(ii) Use Exercise 9.7 to show that the algorithm takes 14M(l deg p) + O(l deg p) ring operations in
D if R = D[x] for a (commutative) ring D, p is monic, l is a power of 2, and deg f < l deg p, and
O(M(l log p)) word operations if R = Z and | f | < p^l.
9.9 We consider the linear variant of the Newton inversion algorithm 9.3, where the inverse is com-
puted successively modulo x^2, x^3, x^4, . . ., x^l. If g_i is the inverse modulo x^i, give an explicit formula
for the coefficient of x^i in g_{i+1} in terms of the coefficients of g_i and the first i + 1 coefficients of f.
Show that this algorithm takes O(l^2) ring operations.
9.10 Show that the cost of the Newton inversion algorithm 9.3 drops to at most 2M(l)+2l arithmetic
operations if char D = 2.
9.11∗ Let D be a (commutative) ring, k ∈ N>0, and f, g ∈ D[x] with f(0) = 1 and f g ≡ 1 mod x^k.
(i) Let d ∈ N, e = 1 − f g, and h = g · (e^{d−1} + e^{d−2} + · · · + e + 1). Prove that f h ≡ 1 mod x^{dk}.
(ii) Letting d = 2 gives precisely Algorithm 9.3. State an algorithm for Newton inversion modulo
x^l with cubic convergence (that is, d = 3), and analyze its cost when l is a power of 3.
9.12∗ This exercise discusses an alternative to the fast division algorithm 9.5 for computing in
residue class rings. It is an adaption of Montgomery’s (1985) integer algorithm to polynomials.
We let F be a field and f , r ∈ F[x] such that f is nonconstant, deg r < deg f = n, and f and r are
coprime. For a ∈ F[x], we represent the residue class a mod f ∈ R = F[x]/⟨f⟩ by the polynomial
a∗ = ra rem f ∈ F[x]. This is particularly useful when performing a long computation in R, for
example, a modular exponentiation.
(i) Show that (a + b)∗ = a∗ + b∗ and (ab)∗ ≡ r^{−1} a∗ b∗ mod f for all a, b ∈ F[x].
(ii) Let s ∈ F[x] of degree less than n be the inverse of f modulo r, so that s f ≡ 1 mod r. Consider
the following algorithm for computing (ab)∗ from a∗ and b∗ .
A LGORITHM 9.35 Montgomery multiplication.
Input: a∗ , b∗ ∈ F[x] of degrees less than n.
Output: (ab)∗ ∈ F[x].
1. u ←− a∗ b∗ , v ←− u rem r
2. w ←− vs rem r, c∗ ←− (u − w f )/r
3. return c∗
Prove that r divides u − w f in step 2. Conclude that the algorithm works correctly, so that deg c∗ <
n and c∗ ≡ r^{−1} a∗ b∗ mod f, if deg r = n − 1.
(iii) Now let r = x^{n−1} and show that the algorithm can be executed with 3M(n) + n operations in F.
You may ignore the cost for computing s. Compare this to using Newton iteration with precomputa-
tion.
(iv) Let a ∈ F[x] of degree less than n and r as in (iii). Employ the above algorithm to show that a
can be computed from a∗ using 2M(n) + n operations in F, and that conversely a∗ can be computed
from a using 3M(n) + n operations if r∗ is precomputed.
9.13 Let F be a field of characteristic different from 2, and M(n), I(n), D(n), S(n) be the computing
times for multiplying two polynomials of degree less than n, computing the inverse of a polynomial
modulo xn , division of a polynomial of degree less than 2n by a polynomial of degree n, and squaring
a polynomial of degree less than n, respectively. Theorems 9.4 and 9.6 show that I ∈ O(M) and
D ∈ O(M). The purpose of this exercise is to show that all four functions are of the same order of
magnitude.
(i) Prove the identity y^2 = ( y^{−1} − (y + 1)^{−1} )^{−1} − y, and conclude that S ∈ O(I).
(ii) Show that M ∈ O(S), using the identity f g = ((f + g)^2 − f^2 − g^2)/2.
(iii) For a polynomial b ∈ R[x] of degree n, relate rev_n(b)^{−1} mod x^n to the quotient of x^{2n−1} on
division by b, and conclude that I ∈ O(D). Conclude that O(M) = O(I) = O(D) = O(S).
9.14∗ Let a, b, q ∈ Z[x] such that a = qb, deg a = n, and ||a||∞ ≤ A. Use Mignotte’s bound 6.33 and
a big prime modular approach to show that q can be computed from a and b using O∼ (n(n + log A))
word operations. You may ignore the cost for finding a big prime. Use Corollary 11.13 for modular
arithmetic. See also Exercises 6.26 and 10.21; the latter discusses the small primes variant.
9.15∗ Let a, b ∈ Z[x] such that n = deg a = m + deg b, with n, m ∈ N, b is monic, and ||a||_∞, ||b||_∞ < 2^l.
(i) Let f = rev_{deg b}(b) ∈ Z[x]. Prove that ||g_i||_∞ < 2^{2(i−1)+l} ||g_{i−1}||_∞^2 for 1 ≤ i ≤ r in the Newton
inversion algorithm 9.3.
(ii) Prove that ∑_{0≤j<i} j 2^{−j} ≤ 2 for all i ∈ N. Hint: Consider the formal derivative of the polyno-
mial ∑_{0≤j<i} x^j = (1 − x^i)/(1 − x) ∈ Z[x].
(iii) Let S(i) = log ||g_i||_∞ for 0 ≤ i ≤ r. Conclude from (i) and (ii) that S(i) ≤ (2 + l)2^i ∈ O(nl) for
all i.
(iv) Perform a similar analysis when a, b ∈ R[y][x] are bivariate polynomials over a (commutative)
ring R and b is monic with respect to x.
9.16 This exercise discusses division with remainder when the degrees of the divisor and the quo-
tient differ significantly. Let k, m ∈ N be positive. We consider univariate polynomials over an
arbitrary ring (commutative, with 1, as usual).
(i) Prove that division with remainder of a polynomial a of degree less than km by a monic poly-
nomial b of degree m can be done in time (2k + 1)M(m) + O(km). Hint: Partition the dividend a into
blocks of size m, and compute rev_m(b)^{−1} mod x^m only once.
(ii) Prove that dividing a polynomial of degree n < km by a monic polynomial of degree n − m
takes at most (k + 3)M(m) + O(km) ring operations. Hint: Exercise 8.35.
Determine a small value for the constant in the “O” in both cases.
9.17 Trace the generalized Taylor expansion algorithm 9.14 on computing the (x^2 + 1)-adic expan-
sion of x^{15} in Q[x].
9.18 Use the integer variant of Algorithm 9.14 to convert the decimal integer 64 180 into hexadeci-
mal.
9.19 This exercise discusses a divide-and-conquer variant of Horner’s rule for computing Taylor
expansions. Let R be a ring (commutative, with 1), u ∈ R, n = 2^k ∈ N a power of 2, and a ∈ R[x]
of degree less than n. By writing a = a_1 x^{n/2} + a_0 with a_0, a_1 ∈ R[x] of degree less than n/2, devise
a recursive algorithm which computes a(x + u) and (x + u)^n and takes at most (cM(n) + O(n)) log n
ring operations for some constant c. (The coefficients of a(x + u) are the coefficients in the Taylor
expansion of a around u, by Section 5.6.) Determine a small value for c, and compare your result to
Corollary 9.16.
9.20 Let R be a ring (commutative, with 1) and a, p ∈ R[x] with deg p = m and deg a < km for some
k, m ∈ N. Prove that the coefficients of a can be computed from its p-adic expansion (4) using at
most ((1/2)M(km) + O(km))(1 + log k) ring operations when k is a power of 2.
Let y be another indeterminate. Show that f has the Taylor expansion f(x) = ∑_{0≤i≤n} f^{[i]}(y) · (x − y)^i
around y.
9.27 Let R be a ring (commutative, with 1), f_1, . . ., f_r ∈ R[x] and e_1, . . ., e_r, n ∈ N≥1. You are to
prove three generalizations of the Leibniz rule.
(i) (f_1 f_2)^{(n)} = ∑_{0≤i≤n} \binom{n}{i} f_1^{(i)} f_2^{(n−i)}, where ^{(i)} denotes the ith derivative,
(ii) (f_1 · · · f_r)′ = ∑_{1≤i≤r} f_i′ ∏_{j≠i} f_j,
(iii) (f_1^{e_1} · · · f_r^{e_r})′ = ∑_{1≤i≤r} e_i f_i′ f_i^{e_i−1} ∏_{j≠i} f_j^{e_j}.
(iv) Conclude from (ii) that
   f′/f = f_1′/f_1 + · · · + f_r′/f_r
is the partial fraction decomposition of f′/f, for f = f_1 · · · f_r.
9.28 Compute the first 16 decimal digits of the real root of y^3 − 2y − 5 using Newton iteration and
y0 = 2 as your starting value. Compare your results with Newton’s (page 219). What are the other
two roots?
9.29 Under which condition does the Newton iteration algorithm 9.22 work for a rational function
ϕ ∈ R(y)? The Newton formula for ϕ = 1/y − f ∈ R(y) gives exactly the inversion procedure from
Theorem 9.2. Why does the polynomial ϕ = f y − 1 ∈ R[y] not work directly?
9.30 Let ϕ = x^4 + 25x^3 + 129x^2 + 60x + 108 ∈ Z[x] and p = 5.
(i) Determine all roots of ϕ mod p in F_p.
(ii) Find an a priori bound B such that every root a ∈ Z of ϕ has |a| ≤ B.
(iii) Choose l ∈ N such that 2B < p^l, and apply p-adic Newton iteration to all modular roots of ϕ
from (i).
(iv) Use the results from (iii) to find all roots of ϕ in Z.
9.31∗ Let R = D[x] for a (commutative) ring D, and ϕ, p, l, g0 be inputs to the p-adic Newton iter-
ation 9.22 with p monic nonconstant, deg g0 < deg p, degx ϕ < l deg p, and degy ϕ = n. Show that
Algorithm 9.22 takes O(n M(l deg p)) operations in D. Hint: Exercise 9.7.
Prove that ϕ(h) ≡ 0 mod p^{k+1}, h ≡ g mod p^k, and sϕ′(h) ≡ 1 mod p. Derive a linearly conver-
gent analog of Algorithm 9.22 from this, and show that when R = D[x] for a ring D and p = x, it
takes O(nl^2) operations in D. This is slower than the quadratically convergent variant, but has the
advantage that the inverse of the derivative need not be updated.
9.34 Derive the formula
   g_{i+1} = (1/2)( g_i + a/g_i )
for i ≥ 0, which was already known to the Babylonians, and is the Newton iteration for approximat-
ing the square root of a. Using this formula, compute a square root of 2 modulo 3^8. What is the
corresponding formula for computing an nth root of a?
9.35 Find the Newton formula for approximating 1/√a. What is the remarkable difference to the
Newton formula for √a?
9.36 Compute a square root g ∈ Q[x] of f = 1 + 4x ∈ Q[x] modulo x^8 such that g(0) = 1, using
Newton iteration.
9.37 Compute a cube root of 2 modulo 625, that is, g ∈ {0, . . ., 624} such that g^3 ≡ 2 mod 625.
How many such g are there?
9.38 Consider the three prime numbers p = 5, 7, and 17. We want to calculate p-adic approxima-
tions to √2.
(i) For which of the three p does 2 have a square root modulo arbitrary powers of p?
(ii) For those p where possible, compute all square roots of 2 modulo p^6.
9.39 Let a ∈ N>0 be of word length l, such that a < 2^{64l}. For n ∈ N, we denote by T(n) the number
of word operations to compute a^n using repeated squaring. Prove that T(n) ≤ T(⌊n/2⌋) + O(M(nl))
if n > 1, and conclude that T(n) ∈ O(M(nl)). What is the corresponding result when a is a univariate
polynomial over a (commutative) ring R?
9.40−→ For n ∈ N≥2 and a ∈ Z let S_n(a) be the number of solutions g ∈ {0, . . ., n − 1} of the
quadratic congruence g^2 ≡ a mod n.
(i) Which values for S_p(a) are possible when p is prime? Distinguish the three cases p = 2, p | a,
and 2 ≠ p ∤ a.
(ii) Let p ≠ 2 be prime and e ∈ N>0. Show that S_{p^e}(a) = S_p(a) if p ∤ a, and give a counterexample
when p | a.
(iii) Now let n be an odd integer and n = p_1^{e_1} · . . . · p_r^{e_r} its prime factorization, with distinct primes
p_1, . . ., p_r ∈ N and positive integers e_1, . . ., e_r. Find a formula expressing S_n(a) in terms of S_{p_1}(a), . . .,
S_{p_r}(a) in the case where a and n are coprime. Hint: Chinese Remainder Theorem. Conclude that
S_n(1) = 2^r.
(iv) Which of the numbers 10 001, 42 814, 31 027, 17 329 have square roots modulo 50 625?
(v) Compute all square roots of 91 modulo 2025 and of 1 modulo 50 625.
9.41∗ For n ∈ N≥2 and a ∈ Z let C_n(a) be the number of solutions g ∈ {0, . . ., n − 1} of the cubic
congruence g^3 ≡ a mod n.
(i) Show that the following hold for an odd prime p:
◦ C_p(a) ≤ 3,
◦ C_p(a) = 1 if p | a or p = 3,
◦ C_p(a) ≠ 2, and for any value C ∈ {0, 1, 3} there is an odd prime p and an integer a such that
3 ≠ p ∤ a and C_p(a) = C.
(ii) Let p > 3 be a prime and e ∈ N>0. Show that C_{p^e}(a) = C_p(a) if p ∤ a, and give a counterexample
when p | a.
(iii) Now let n ∈ N such that gcd(n, 6) = 1, and let n = p_1^{e_1} · . . . · p_r^{e_r} be its prime factorization, with
distinct primes p_1, . . ., p_r ∈ N and positive integers e_1, . . ., e_r. Find a formula expressing C_n(a) in
terms of C_{p_1}(a), . . ., C_{p_r}(a) in the case where a and n are coprime.
(iv) Compute all cube roots of 11 modulo 225 625.
9.42 Let n ∈ N>0. How many cube roots g ∈ F_7[x] modulo x^n of degree less than n does f =
−x^3 + x^2 − x + 1 ∈ F_7[x] have, and how can they be computed? Compute one for n = 4.
9.43∗ Modify the algorithm for computing nth roots in Z so as to work when n is a power of 2, by
using a 3-adic Newton iteration. Prove that your algorithm is correct, and show that it uses O(M(l))
word operations on inputs of length l. Apply your algorithm to compute ⁴√(2 313 441).
9.44∗ Design a test whether a ∈ N is a perfect power. Your test should output b, d, e, r ∈ N such that
a = 2^d 3^e b^r, gcd(b, 6) = 1, and r is maximal, using O(log a · M(log a)) word operations.
9.45 Let R be a ring (commutative, with 1) with a valuation v, with the special property that v(a) ≤ 1
for all a ∈ R. Show that if a ∈ R is a unit, then v(a) = 1.
9.46 Let R be an integral domain with a valuation v, and K the field of fractions of R. Show that
w(a/b) = v(a)/v(b) defines a valuation w on K.
9.47 Conclude the proof of Lemma 9.34.
9.48∗ Let F be a field, and v: F[[x]] −→ R be the x-adic valuation on the ring F[[x]] of formal power
series.
(i) For n ∈ N, let f_n = 1 + x + · · · + x^{2n} − x^{2n+1} ∈ F[[x]]. Show that f_0, f_1, · · · is a Cauchy sequence,
so that
∀ε > 0 ∃N ∈ N ∀n, m > N v( fn − fm ) ≤ ε.
(ii) Prove that the sequence has a limit in F[[x]], so that there exists f ∈ F[[x]] with
∀ε > 0 ∃N ∈ N ∀n > N v( f − fn ) ≤ ε.
(iii) Prove that every Cauchy sequence in F[[x]] has a limit in F[[x]], so that F[[x]] is complete.
Show that F[x] with the x-adic valuation does not have this property. (In fact, F[[x]] can be obtained
from F[x] by the same process of “completion” by which one obtains R from Q with respect to the
absolute value.)
(iv) Let f = a0 + a1 x + · · · ∈ F[[x]], and a0 = 0. Prove that f does not have an inverse in F[[x]].
(v) Let f = a_0 + a_1 x + · · · ∈ F[[x]], and a_0 ≠ 0. Use Newton iteration to prove that f has an inverse
in F[[x]].
9.49 (Aho, Steiglitz & Ullman 1975; see also Schönhage, Grotefeld & Vetter 1994, page 284)
In this exercise, you are to improve the cost estimate of Corollary 9.16 by a factor of log n. Let
n ∈ N, R a ring such that (n − 1)! is a unit in R, u ∈ R, and a = ∑_{0≤i<n} a_i x^i ∈ R[x]. Moreover, let
f = ∑_{0≤i<n} i! a_i x^{n−1−i} and g = ∑_{0≤j<n} u^j x^j / j!. Show that the coefficient of x^k in the polynomial
a(x + u) is equal to 1/k! times the coefficient of x^{n−1−k} in the product polynomial f g, for 0 ≤ k < n.
Conclude that the coefficients of a(x + u), or equivalently, the coefficients of the Taylor expansion of
a around u, by Section 5.6, can be computed using M(n) + 5n arithmetic operations in R.
The second concept is the asymptotic behavior of the number of
operations. This was not significant for small N so the importance of
early forms of the FFT algorithms was not noticed even where they
would have been very useful.
James William Cooley (1987)
1 There is an astonishing imagination in mathematics. [. . . ] There was far more imagination in the head of
Archimedes than in that of Homer.
2 Leibniz believed he saw the image of creation in his binary arithmetic in which he employed only the two
characters, zero and unity. He imagined that unity can represent God, and zero nothing; and that the Supreme
Being might have drawn all beings from nothing, just as unity with zero expresses in this binary arithmetic all
numbers.
10
Fast polynomial evaluation and interpolation
In the preceding chapters, we have seen extremely fast algorithms for multiplica-
tion and division with remainder. We now tackle the next set of problems: eval-
uation of a polynomial at many points, its inverse problem, namely interpolation,
and a substantial generalization, the Chinese Remainder Algorithm.
If R is a ring (commutative, with 1), u_0, . . . , u_{n−1} ∈ R, and m = (x − u_0) · · · (x − u_{n−1}) ∈ R[x], then the map
   χ: R[x]/⟨m⟩ −→ R^n,  f ↦ ( f(u_0), . . . , f(u_{n−1}) )
is a ring homomorphism. If R is a field, then R[x] and Rn are vector spaces over R,
thus R-algebras, and χ is in fact an isomorphism of R-algebras if u0 , . . . , un−1 are
distinct. This is a special case of the Chinese Remainder Theorem 5.3.
In this and the next section, we want to solve the following two problems. We
simplify our exposition by assuming that the number n of points is a power of 2.
For general n, we have the two options of either adding some “phantom” points
or splitting points into two roughly equal halves in the recursive calls. This is
discussed after Theorems 8.3 and 9.15.
We have already discussed these problems for a field R and presented algorithms
taking time O(n2 ) in Chapter 5. The methods of this chapter are only of interest
in connection with subquadratic multiplication routines, as those from Chapter 8.
There we have seen that the evaluation and interpolation problems can be solved
with O(n log n) operations in R if R supports the FFT and ui = ω i , where ω is a
primitive nth root of unity. Our goal now is a similar bound for the general case.
For arbitrary points u0 , . . . , un−1 , multipoint evaluation can be done with O(n2 )
operations in R by using Horner’s rule n times. In fact, it can be proved that one
evaluation requires at least n multiplications. One might be tempted to think that
then n evaluations require at least n2 multiplications. This is false, and our goal
in this section is to see that mass-production of evaluations can be done much
cheaper. In the next section, we show the same bound for interpolation.
The idea of the evaluation algorithm is to split the point set {u0 , . . . , un−1 } into
two halves of equal cardinality and to proceed recursively with each of the two
halves. This leads to a binary tree of depth log n with root {u0 , . . . , un−1 } and the
singletons {ui } for 0 ≤ i < n at the leaves (see Figure 10.1), where log is the binary
logarithm.
F IGURE 10.1: The subproduct tree: the root M_{k,0} at level i = k corresponds to the full point set u_0, . . . , u_{n−1}, its children M_{k−1,0} and M_{k−1,1} at level k − 1 correspond to u_0, . . . , u_{n/2−1} and u_{n/2}, . . . , u_{n−1}, and so on down to the singletons at level 0.
M_{i,j} = m_{j·2^i} · m_{j·2^i+1} · · · m_{j·2^i+2^i−1} = ∏_{0≤l<2^i} m_{j·2^i+l}   (1)
for 0 ≤ i ≤ k = log n and 0 ≤ j < 2^{k−i}. Thus each M_{i,j} is a subproduct with 2^i
factors of m = ∏_{0≤l<n} m_l = M_{k,0} and satisfies for each i, j the recursive equations
M_{0,j} = m_j,   M_{i,j} = M_{i−1,2j} · M_{i−1,2j+1}.   (2)
If R is an integral domain and u0 , . . . , un−1 are distinct, then Mi, j is the monic
squarefree polynomial whose zero set is the jth node from the left at level i of the
tree in Figure 10.1.
The following algorithm solves the more general problem of computing the sub-
products Mi, j for arbitrary moduli m0 , . . . , mr−1 . It proceeds from the leaves to the
root of the subproduct tree in Figure 10.1.
1. for j = 0, . . . , r − 1 do M_{0,j} ←− m_j
2. for i = 1, . . . , k do
3.     for j = 0, . . . , 2^{k−i} − 1 do M_{i,j} ←− M_{i−1,2j} · M_{i−1,2j+1}
   return M_{i,j} for 0 ≤ i ≤ k and 0 ≤ j < 2^{k−i}
L EMMA 10.4. Algorithm 10.3 correctly computes all subproducts Mi, j ∈ R[x] and
takes at most M(n) log r operations in R, where n = ∑0≤i<r deg mi .
P ROOF. Correctness is clear from (2). Let di, j = deg Mi, j for all i and j. Step 1
uses no arithmetic operations, and the cost for the ith iteration of step 3 is at most
∑_{0≤j<2^{k−i}} M(d_{i,j}) ≤ M(∑_{0≤j<2^{k−i}} d_{i,j}) = M(n)
operations in R, since ∑0≤ j<2k−i di, j = n. The time estimate follows, since there are
k = log r iterations. ✷
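The structure of Algorithm 10.3 is easy to mirror in code. The following hedged Python sketch (our own function names; plain quadratic polynomial multiplication stands in for an M(n)-time routine, so it reproduces the tree of Lemma 10.4 but not its cost bound) builds the levels from the moduli upwards.

```python
def poly_mul(a, b):
    """Schoolbook product of coefficient lists (ascending powers)."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

def subproduct_tree(moduli):
    """Levels M[0], ..., M[k] of the subproduct tree for r = len(moduli)
    moduli (r a power of two), with M[i][j] = M[i-1][2j] * M[i-1][2j+1]
    as in Algorithm 10.3."""
    levels = [list(moduli)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([poly_mul(prev[2 * j], prev[2 * j + 1])
                       for j in range(len(prev) // 2)])
    return levels

if __name__ == "__main__":
    # moduli m_j = x - u_j for the points u = 0, 1, 2, 3
    tree = subproduct_tree([[-u, 1] for u in (0, 1, 2, 3)])
    print(tree[-1][0])   # [0, -6, 11, -6, 1] = coefficients of x(x-1)(x-2)(x-3)
```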
Exercise 10.8 proves an analogous result for integers. If all mi have the same
degree, then Exercise 10.3 proves the better timing estimate ((1/2) M(n) + O(n)) log r.
If the degrees of the mi differ considerably from each other, then the tree in Figure
10.1 is quite unbalanced with respect to the degree. In fact, it is possible to prove
a slightly better bound on the arithmetic cost for that case. If p0 , . . . , pr−1 ∈ R are
positive probabilities that sum to 1, then their entropy is H(p_0, . . . , p_{r−1}) = ∑_{0≤i<r} p_i log(1/p_i).
1. if n = 1 then return f
2. r_0 ←− f rem M_{k−1,0},   r_1 ←− f rem M_{k−1,1}
3. call the algorithm recursively to evaluate r_0 at u_0, . . . , u_{n/2−1} and r_1 at u_{n/2}, . . . , u_{n−1}, using the corresponding subtrees of subproducts
4. return the concatenated lists of values
T HEOREM 10.6.
Algorithm 10.5 works correctly and takes at most D(n) log n operations in R, which
is at most (5M(n) + O(n)) log n or O(M(n) log n).
Let T (n) = T (2^k) denote the cost for the recursive process. Then T (1) = 0 and
T (2^k) ≤ 2 T (2^{k−1}) + 2 D(2^{k−1})
for k ≥ 1, so that T (2^k) ≤ 2k · D(2^{k−1}) ≤ D(n) log n, by Lemma 8.2, and the claim
follows from Theorem 9.6. ✷
Putting things together, we obtain the following algorithm for fast multipoint
evaluation.
1. call Algorithm 10.3 with input m_j = x − u_j for 0 ≤ j < n to compute the subproducts M_{i,j}
2. call Algorithm 10.5 with input f, the points u_i, and the subproducts M_{i,j}
   return its results
C OROLLARY 10.8.
Evaluation of a polynomial in R[x] of degree less than n at n points in R can be
performed using at most ((11/2) M(n) + O(n)) log n or O(M(n) log n) operations in R.
The time bound follows from Exercise 10.3 and Theorem 10.6. Exercise 10.9
proves the smaller bound (1 + (7/2) log n)(M(n) + O(n)). Exercise 10.11 shows that
if many evaluations at the same set of points have to be performed, then all data
depending only on the evaluation points may be precomputed and stored, and the
cost drops to essentially (2M(n) + O(n)) log n.
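An end-to-end sketch of this evaluation strategy is given below in Python (not from the text; all helper names are ours, and schoolbook arithmetic replaces the fast subroutines, so only the recursive structure of Algorithms 10.3, 10.5 and 10.7 is reproduced, not the running time of Corollary 10.8): build the subproduct tree of the moduli x − u_i going up, then reduce f modulo the two children of each node going down.

```python
def poly_mul(a, b):
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

def poly_rem(a, b):
    """Remainder of a modulo b (b monic); coefficient lists in ascending powers."""
    a, db = list(a), len(b) - 1
    while len(a) - 1 >= db and any(a):
        if a[-1] == 0:
            a.pop(); continue
        coeff, shift = a[-1], len(a) - 1 - db
        for i, bi in enumerate(b):
            a[shift + i] -= coeff * bi
        a.pop()
    return a if a else [0]

def evaluate(f, points):
    """Values f(u) for all u in points (len(points) a power of two)."""
    # going up: subproduct tree of the moduli x - u (Algorithm 10.3)
    levels = [[[-u, 1] for u in points]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([poly_mul(prev[2*j], prev[2*j+1]) for j in range(len(prev)//2)])
    # going down: reduce modulo the two children of each node (Algorithm 10.5)
    residues = [poly_rem(f, levels[-1][0])]
    for level in reversed(levels[:-1]):
        residues = [poly_rem(r, level[2*j + s])
                    for j, r in enumerate(residues) for s in (0, 1)]
    return [r[0] for r in residues]

if __name__ == "__main__":
    f, us = [1, 2, 0, 5], [0, 1, 2, 3]          # f = 1 + 2x + 5x^3
    print(evaluate(f, us))                      # [1, 8, 45, 142]
    print([1 + 2*u + 5*u**3 for u in us])       # the same values, computed directly
```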
Recall the Lagrange interpolation formula from Chapter 5: the unique polynomial
f ∈ F[x] of degree less than n that takes the value v_i at the point u_i for all i is
f = ∑_{0≤i<n} v_i s_i m/(x − u_i), where m = (x − u_0) · · · (x − u_{n−1}), as before, and
s_i = ∏_{j≠i} 1/(u_i − u_j).   (3)
Over a ring R, this is still valid if we demand that u_i − u_j is a unit for i ≠ j. Theorem
10.13 below shows that this condition is also necessary in the general case.
We first explain an idea to compute the s_i fast. The formal derivative of m is m′ =
∑_{0≤j<n} m/(x − u_j), and since m/(x − u_i) vanishes at all points u_j with j ≠ i, we
have
m′(u_i) = (m/(x − u_i))(u_i) = 1/s_i.   (4)
Given m, the computation of all the si amounts to one evaluation of m′ at n points,
at a cost of O(M(n) log n) operations in R, plus n inversions.
The following divide-and-conquer algorithm is the core of the fast interpolation
algorithm. It proceeds from the leaves to the root of the tree in Figure 10.1.
1. if n = 1 then return c_0
2. call the algorithm recursively to compute r_0 = ∑_{0≤i<n/2} c_i · M_{k−1,0}/(x − u_i)
3. call the algorithm recursively to compute r_1 = ∑_{n/2≤i<n} c_i · M_{k−1,1}/(x − u_i)
4. return M_{k−1,1} · r_0 + M_{k−1,0} · r_1
T HEOREM 10.10.
Algorithm 10.9 takes at most (M(n) + O(n)) log n or O(M(n) log n) arithmetic op-
erations in R to correctly compute the result.
Let T (n) = T (2k ) denote the cost of the algorithm. The cost for the individ-
ual steps is 0 for step 1, T (n/2) for each of the steps 2 and 3, and at most
2M(n/2 + 1) + n ∈ M(n) + O(n) (Exercise 8.34) for step 4. (The “+1” comes
from our convention that M(n) is the time to multiply polynomials of degree less
than n.) Thus T (1) = 0 and T (n) ≤ 2T (n/2) + M(n) + cn for n > 1 and some
constant c ∈ R, and Lemma 8.2 yields the claim. ✷
C OROLLARY 10.12.
Algorithm 10.11 solves the interpolation problem 10.2 over a (commutative) ring
R using at most ((13/2) M(n) + O(n)) log n or O(M(n) log n) operations in R.
P ROOF. The cost for step 1 is at most ((1/2) M(n) + O(n)) log n, by Exercise 10.3.
The cost for step 2 is at most (5M(n) + O(n)) log n operations, by Theorem 10.6,
including the computation of m′ and the final modular inversions. Finally, step 3
takes no more than (M(n) + O(n)) log n operations, by Theorem 10.10. ✷
Exercise 10.11 shows that if many interpolations at the same set of points have
to be performed, then all data depending only on the interpolation points may be
precomputed and stored, and the cost drops to essentially (M(n) + O(n)) log n.
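The interpolation route just described—compute m and m′ from the subproduct tree, the values s_i = 1/m′(u_i), and then the linear combination ∑ v_i s_i m/(x − u_i) by the divide-and-conquer of Algorithm 10.9—can be sketched as follows (a hedged illustration, not the text's algorithm verbatim: our own names, exact rational arithmetic, and plain Horner evaluation of m′ where a fast version would use Algorithm 10.5).

```python
from fractions import Fraction

def poly_mul(a, b):
    c = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

def poly_add(a, b):
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else Fraction(0)) + (b[i] if i < len(b) else Fraction(0))
            for i in range(n)]

def interpolate(points, values):
    """Interpolating polynomial of degree < n through the (u_i, v_i), n a power of two."""
    n = len(points)
    # subproduct tree of the linear moduli x - u_i (Algorithm 10.3)
    levels = [[[Fraction(-u), Fraction(1)] for u in points]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([poly_mul(prev[2*j], prev[2*j+1]) for j in range(len(prev)//2)])
    m = levels[-1][0]
    m_prime = [i * m[i] for i in range(1, len(m))]          # formal derivative m'
    def horner(poly, u):
        acc = Fraction(0)
        for coeff in reversed(poly):
            acc = acc * u + coeff
        return acc
    # c_i = v_i * s_i with s_i = 1/m'(u_i), see (3) and (4)
    c = [Fraction(values[i]) / horner(m_prime, points[i]) for i in range(n)]
    # linear combination sum_i c_i * m/(x - u_i), going up the tree (Algorithm 10.9)
    def combine(i, j, lo, hi):
        if i == 0:
            return [c[lo]]
        mid = (lo + hi) // 2
        r0 = combine(i - 1, 2 * j, lo, mid)
        r1 = combine(i - 1, 2 * j + 1, mid, hi)
        return poly_add(poly_mul(levels[i - 1][2 * j + 1], r0),
                        poly_mul(levels[i - 1][2 * j], r1))
    return combine(len(levels) - 1, 0, 0, n)

if __name__ == "__main__":
    points, values = [0, 1, 2, 3], [1, 8, 45, 142]          # values of 1 + 2x + 5x^3
    print(interpolate(points, values))                      # [1, 2, 0, 5] as Fractions
```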
T HEOREM 10.13.
Let r ≥ 1, R be a ring (commutative, with 1, as always), m0 , . . . , mr−1 ∈ R[x] monic
and nonconstant, and m = m0 · · · mr−1 . Then the following are equivalent.
(i) The ring homomorphism χ in (5) is an isomorphism.
(ii) There exist polynomials s0 , . . . , sr−1 ∈ R[x] such that ∑0≤i<r si m/mi = 1.
(iii) For i ≠ j there exist polynomials s_{ij}, t_{ij} ∈ R[x] such that s_{ij} m_j + t_{ij} m_i = 1.
(iv) res(m_i, m_j) ∈ R× for i ≠ j.
T HEOREM 10.15.
Algorithm 10.14 works correctly and takes no more than (10M(n) + O(n)) log r or
O(M(n) log r) operations in R.
P ROOF. The correctness proof is similar to that of Theorem 10.6 and left as Ex-
ercise 10.15. For the cost analysis, we see that the algorithm works from the root
to the leaves along the binary tree formed by the subproducts Mi, j . The cost for a
vertex Mi, j with i ≥ 1 is the cost for dividing a polynomial of smaller degree than
deg Mi, j by Mi−1,2 j and Mi−1,2 j+1 with remainder, using at most 2D(deg Mi, j ) ring
operations. The total cost at level i is then at most 2 ∑_{0≤j<2^{k−i}} D(deg M_{i,j}) ≤ 2D(n),
as in the proof of Lemma 10.4, and the claim follows from Theorem 9.6 and the
fact that there are log r levels. ✷
C OROLLARY 10.17.
Given monic nonconstant polynomials m0 , . . . , mr−1 ∈ R[x], where r ∈ N is a power
of 2, and f ∈ R[x] of degree less than n = ∑0≤i<r deg mi , Algorithm 10.16 computes
f rem m0 , . . . , f rem mr−1 using at most (11M(n) + O(n)) log r or O(M(n) log r)
operations in R.
Exercise 10.17 gives a better analysis of Algorithm 10.16 when all moduli have
the same degree.
For the fast Chinese Remainder Algorithm, we recall the generalization of La-
grange’s formula from Chapter 5. Given pairwise distinct and nonconstant mod-
uli m0 , . . . , mr−1 ∈ F[x] for a field F and polynomials v0 , . . . , vr−1 ∈ F[x] with
deg vi < deg mi for all i, there is a unique polynomial f ∈ F[x] of degree less
than n = ∑0≤i<r deg mi satisfying f ≡ vi mod mi for all i, and it is given by f =
∑0≤i<r (vi si rem mi )m/mi , where m = m0 · · · mr−1 and si ∈ F[x] is an inverse of
m/mi modulo mi . Theorem 10.13 implies that this is true for arbitrary coefficient
rings R if we require res(m_i, m_j) ∈ R× for i ≠ j.
As in the case of interpolation, we first address the task of computing the si .
This need be done only once if several computations with the same set of moduli
are to be executed.
4. return s0 , . . . , sr−1
P ROOF. Let R be a field and di = deg mi for 0 ≤ i < r. The cost for step 1 is
O(M(2n) log r) ring operations, including the cost for computing all m2i . Step 2
costs D(di ) ∈ O(M(di )) for mi . In Chapter 11, we will see that step 3 can be done
with O(M(di ) log di ) operations in R for each i. Using
∑_{0≤i<r} M(d_i) ≤ M(∑_{0≤i<r} d_i) = M(n),
we have a cost of O(M(n)) and O(M(n) log n) for steps 2 and 3, respectively, and
the claim follows. ✷
1. if r = 1 then return c_0
2. call the algorithm recursively to compute r_0 = ∑_{0≤i<r/2} c_i · M_{k−1,0}/m_i
3. call the algorithm recursively to compute r_1 = ∑_{r/2≤i<r} c_i · M_{k−1,1}/m_i
4. return M_{k−1,1} · r_0 + M_{k−1,0} · r_1
T HEOREM 10.21.
Algorithm 10.20 works correctly. If ∑0≤i<r deg mi < n, then it takes no more than
(2M(n) + O(n)) log r or O(M(n) log r) arithmetic operations in R.
The correctness proof is analogous to the proof of Theorem 10.10 and the run-
ning time bound can be obtained by considering the same binary tree as in the
proof of Theorem 10.15. The details can be found in Exercise 10.16.
1. call Algorithm 10.3 with input m0 , . . . , mr−1 to compute the polynomials Mi, j
as in (1)
2. call Algorithm 10.18 with input m_0, . . . , m_{r−1} and m = M_{k,0} to compute poly-
nomials s_i ∈ R[x] with s_i · m/m_i ≡ 1 mod m_i and deg s_i < deg m_i for all i
3. call Algorithm 10.20 with input m0 , . . . , mr−1 , v0 s0 rem m0 , . . . , vr−1 sr−1 rem
mr−1 , and the polynomials Mi, j
return its result
C OROLLARY 10.23.
Given m0 , . . . , mr−1 ∈ F[x] monic and pairwise coprime, where F is a field, and
v0 , . . . , vr−1 ∈ F[x] with deg vi < deg mi for all i, we can compute the unique solution
f ∈ F[x] of degree less than n = ∑_{0≤i<r} deg m_i of the Chinese Remainder Problem f ≡ v_i mod m_i for 0 ≤ i < r using O(M(n) log n) operations in F.
Exercise 10.17 gives an explicit constant for the leading cost term of Algorithm
10.22 when all moduli have the same degree.
We only state the corresponding results for the integer case. Algorithms 10.14
and 10.20 carry over almost literally; the details are left as an exercise.
T HEOREM 10.24.
Given m0 , . . . , mr−1 ∈ N≥2 and f ∈ N less than m = ∏0≤i<r mi , we can compute
f rem m0 , . . . , f rem mr−1 using O(M(log m) log r) word operations.
T HEOREM 10.25.
Given pairwise coprime integers m0 , . . . , mr−1 ∈ N≥2 and v0 , . . . , vr−1 ∈ N such
that vi < mi for all i, we can compute the unique solution f ∈ N with f < m =
∏0≤i<r mi of the Chinese Remainder Problem f ≡ vi mod mi for 0 ≤ i < r, at a
cost of O(M(log m) loglog m) word operations.
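As a concrete illustration of the integer case, here is a hedged Python sketch (our own function names; Python's built-in integer arithmetic and pow play the roles of fast multiplication and of Algorithm 10.18): it reduces f modulo all m_i along a product tree and then recombines residues with the formula f = ∑ (v_i s_i rem m_i) · m/m_i, where s_i = (m/m_i)^(−1) mod m_i.

```python
def product_tree(moduli):
    """Levels of the integer subproduct tree (len(moduli) a power of two)."""
    levels = [list(moduli)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[2*j] * prev[2*j+1] for j in range(len(prev) // 2)])
    return levels

def remainders(f, moduli):
    """f rem m_i for all i, going down the product tree."""
    levels = product_tree(moduli)
    res = [f % levels[-1][0]]
    for level in reversed(levels[:-1]):
        res = [r % level[2*j + s] for j, r in enumerate(res) for s in (0, 1)]
    return res

def crt(moduli, residues):
    """Unique f with 0 <= f < prod(moduli) and f = v_i mod m_i for all i."""
    m = 1
    for mi in moduli:
        m *= mi
    f = 0
    for mi, vi in zip(moduli, residues):
        si = pow(m // mi, -1, mi)          # (m/m_i)^(-1) mod m_i  (Python 3.8+)
        f += (vi * si % mi) * (m // mi)
    return f % m

if __name__ == "__main__":
    moduli = [23, 24, 25, 29]              # pairwise coprime moduli
    print(remainders(300000, moduli))      # [11, 0, 0, 24]
    print(crt(moduli, [5, 3, 1, 22]))      # the unique solution below 23*24*25*29
```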
Notes. Pan (1966) proved the optimality of Horner’s rule. The results in Sections 10.1
through 10.3 are based on Lipson (1971), Fiduccia (1972a), Horowitz (1972), Moenck &
Borodin (1972), and Borodin & Moenck (1974). Borodin & Munro (1975) give a compre-
hensive treatment.
Exercises.
10.1 Let f = 8x7 + 7x6 + 6x5 + 5x4 + 4x3 + 3x2 + 2x + 1 ∈ Q[x]. Trace Algorithm 10.7 to evaluate f
at the eight integer points −3, −2, · · ·, 4. In the recursive algorithm 10.5, you need only execute the
last recursive step and may compute its inputs directly.
10.2 Let R be a ring (commutative, with 1), n ∈ N a power of 2, and k ∈ N. Show that you can
evaluate a polynomial of degree less than kn at n points from R using (2k + 1 + (11/2) log n) M(n) +
O((k + log n)n) additions and multiplications in R. Hint: Exercise 9.16.
10.3 Let R be a ring (commutative, with 1), m0 , . . ., mr−1 ∈ R[x] of degree d > 0, and n = rd, where
r is a power of two. Using Exercise 8.34, prove that Algorithm 10.3 takes only (M(n/2) + O(n)) log r
or ((1/2) M(n) + O(n)) log r ring operations.
(Figure: an example of a stochastic mobile with node weights 5/12, 7/12, 1/12, and 1/6.)
(i) Prove that the average depth of any stochastic mobile with leaf weights p1 , . . ., pn is at least
H(p1 , . . ., pn ). Hint: Induction on n.
(ii) For given p1 , . . ., pn ∈ R>0 such that ∑1≤i≤n pi = 1, let li = −⌊log pi ⌋ > 0 for 1 ≤ i ≤ n,
l = max{li : 1 ≤ i ≤ n}, and n j be the number of indices i such that li = j, for 1 ≤ j ≤ l. Prove that
∑1≤ j≤l n j 2− j ≤ 1.
(iii) Consider the following algorithm, which uses ideas by Shannon (1948), Fano (1949, 1961),
and Kraft (1949) for constructing a stochastic mobile of small average depth.
A LGORITHM 10.26 Building a mobile.
Input: p1 , . . ., pn ∈ R>0 such that ∑1≤i≤n pi = 1.
Output: A stochastic mobile with leaf weights p1 , . . ., pn .
1. let l1 , . . ., ln , l, n1 , . . ., nl be as in (ii), and create a full binary tree t (that is, a complete binary
tree with 2l leaves of depth l) with all node weights equal to zero
2. for j = 1, . . ., l do
3. assign those weights pi with li = j to the first n j of the nodes of depth j in t and
remove the subtree of each such node with positive weight from t
4. for j = l, l − 1, . . ., 1 remove the leaves of depth j with zero weight in t
5. while the tree t is not complete, so that there exists a node with a single child, identify that
node with its child and remove the edge between them
6. compute the weights of the inner nodes in t, proceeding from the leaves to the root
return t
Use (ii) to show that before the jth pass through step 3, there are precisely 2^j − n_1 · 2^{j−1} − n_2 · 2^{j−2} −
· · · − n_{j−1} · 2 ≥ n_j nodes of depth j left in t, and conclude that the algorithm works correctly.
(iv) Prove that the average depth of t after step 4 is less than H(p1 , . . ., pn ) + 1, and conclude that
this is true as well for the tree returned in step 6.
(v) Run the algorithm with p1 = p3 = p7 = p8 = 1/17, p2 = 5/17, p4 = p5 = 2/17, and p6 = 4/17.
10.6∗ This exercise discusses Huffman (1952) codes, a tool for data compression. Suppose that we
want to encode a piece of text over a finite alphabet Σ = {σ1 , . . ., σn } in binary using as few bits as
possible. If we know nothing more than the size of Σ, then there seems to be no better way than to
choose some encoding of the elements Σ as bit strings of fixed length ⌈log n⌉. Suppose now that for
each element σi , we know the frequency pi with which it occurs in our text. The idea of the Huffman
code is then to use a variable-length encoding which encodes letters that occur frequently by shorter
bit strings than letters that only rarely occur. Huffman codes are instantaneous codes, so that no
codeword is a prefix of another codeword, and can be represented by binary trees.
Here is an algorithm which dynamically builds a stochastic mobile (Exercise 10.5) with leaf
weights p1 , . . ., pn , the Huffman tree, which has minimal average depth.
A LGORITHM 10.27 Building a Huffman tree.
Input: p1 , . . ., pn ∈ R>0 such that p1 + · · · + pn = 1.
Output: A Huffman tree for p1 , . . ., pn .
10.11 In this exercise, you are to examine the cost for several evaluations and interpolations at
the same set of points. Show that one evaluation can be done with at most (2M(n) + O(n)) log n
operations in R and one interpolation with at most (M(n) + O(n)) log n operations if the cost for
precomputing data depending only on the points is ignored. Hint: Some preconditioning on the
divisors in the “going down” is possible; see Section 9.1.
10.12∗ Let n = 2k be a power of 2, F a field, u0 , . . ., un−1 ∈ F distinct, and v0 , . . ., vn−1 ∈ F. You
are to design an interpolation algorithm with running time O(M(n) log2 n).
(i) Let m1 , m2 ∈ F[x] of degree n be monic and coprime and v1 , v2 ∈ F[x] of degree less than n.
Using Theorem 11.10, give an algorithm that computes a solution f ∈ F[x] of degree less than 2n of
the congruences f ≡ v1 mod m1 and f ≡ v2 mod m2 , taking O(M(n) log n) operations in F.
(ii) Use (i) to design a divide-and-conquer algorithm computing the interpolating polynomial
f ∈ F[x] of degree less than n such that f (ui ) = vi for all i, and show that it takes O(M(n) log2 n)
operations in F.
(iii) Trace your algorithm on the example from Exercise 10.10.
10.13∗ Prove Theorem 10.13. Hint: Exercise 6.15.
10.14∗ Prove the following generalized version of the Chinese Remainder Theorem. Let R be a ring
(commutative, with 1), I_0, . . . , I_{r−1} ideals, and I = I_0 ∩ · · · ∩ I_{r−1}. If I_i + I_j = R for i ≠ j, then the map
R/I −→ R/I_0 × · · · × R/I_{r−1} sending f + I to ( f + I_0, . . . , f + I_{r−1}) is an isomorphism.
10.15 Prove that Algorithm 10.14 works correctly.
10.16∗ Prove Theorem 10.21.
10.17∗ (i) Prove that when all moduli have the same degree, then the fast modular reduction
algorithm 10.16 takes only (11M(n/2) + O(n)) log r or ((11/2) M(n) + O(n)) log r ring operations and
Algorithm 10.20 takes only (M(n) + O(n)) log r operations.
(ii) Show that Algorithm 10.18 takes at most (11 log r + 24 log(n/r) + 5)M(n) + O(n log n) arith-
metic operations if R is a field and all moduli have the same degree, using (i) and Theorem 11.10.
(iii) Conclude that the fast Chinese Remainder Algorithm 10.22 takes at most (24M(n) + O(n)) ·
log n arithmetic operations if R is a field and all moduli have the same degree.
You may find Exercise 10.3 useful.
10.18 Let F be a field, f1 , . . ., fr ∈ F[x] pairwise coprime, e1 , . . ., er ∈ N>0 , f = f1e1 · · · frer , and
g ∈ F[x] of degree less than deg f . Show that the partial fraction decomposition of g/ f with respect to
the given factorization of f (see Section 5.11) can be computed with O(M(n) log n) field operations, where n = deg f.
10.19∗ Work out the details for the integer versions of Algorithms 10.14 and 10.20, and prove
Theorems 10.24 and 10.25.
10.20−→ You are to trace the integer analog of Algorithms 10.14 and 10.20. The point is to see
how the algorithm works, not just to compute the final result. Let m0 = 23, m1 = 24, m2 = 25, and
m3 = 29.
(i) Check that the moduli are pairwise coprime.
(ii) Compute the binary tree of products.
(iii) Compute 300 000 mod mi for 0 ≤ i < 4, using Algorithm 10.14.
(iv) Let v0 = 5, v1 = 3, v2 = 1, and v3 = 22. Use the fast Chinese Remainder Algorithm to compute
f ∈ Z such that f ≡ vi mod mi for 0 ≤ i < 4.
10.21∗ Let a, b ∈ Z[x] nonzero such that deg b < deg a = n and ||a||∞ ≤ A. You are to design a small
primes modular algorithm that decides whether b | a, and if so, computes the quotient q = a/b ∈ Z[x].
By Mignotte’s bound 6.33, we have ||b||1 ||q||1 ≤ B = (n + 1)1/2 2n A in the latter case. The algorithm
should choose a collection of distinct primes p1 , . . ., pr < 2r log r not dividing lc(b), with r chosen
appropriately such that their product exceeds 2B, calculate (a mod pi )/(b mod pi ) for all i (if this
is not possible, then certainly b ∤ a), compute a trial quotient q by Chinese remaindering, and fi-
nally check whether ||b||1 ||q||1 ≤ B. Work out the details, prove that this procedure works correctly,
and show that it takes O((M(n) + logloglog B) log B · M(loglog B) + n M(log B loglog B) loglog B) or
O∼ (n2 + n log A) word operations. You may ignore the cost of O(log B(loglog B)2 logloglog B) word
operations for finding the small primes (Theorem 18.10). Use Corollary 11.13 for arithmetic in F pi .
See also Exercises 6.26 and 9.14.
The mathematically sophisticated will know how to skip formulæ.
This skill is easy to practice for others also.
Leslie G. Valiant (1994)
1 Have you [Glaukon] ever noticed that those who have a talent for mathematics are, almost without exception,
talented in all sciences? And that mentally slow people, if they be trained and exercised in this study, become
invariably quicker than they were before, even if they draw no other profit from it?
11
Fast Euclidean Algorithm
The main result of this chapter is a fast algorithm for the quotients in the Euclidean
Algorithm for univariate polynomials over a field, using O(M(n) log n) field oper-
ations for inputs of degree at most n. One can also compute a single remainder ri
together with the corresponding si and ti , at the same cost, but this is not possible
for all remainders together. The final section shows how one can also calculate
subresultants in softly linear time.
In order to make the description and the proofs simpler, the fast Euclidean Algo-
rithm that we present in this section will not make the remainders in intermediate
computations monic, but only at the end, if desired. For that reason, it is most suit-
able for coefficient fields with a fixed size representation for the elements, such as
finite fields. For other coefficient domains, it is best to use a combination of the
modular algorithms described in Chapter 6 with the algorithm in this chapter.
r_2 = r_0 − q_1 r_1,            s_2 = s_0 − q_1 s_1,            t_2 = t_0 − q_1 t_1,
    ⋮                               ⋮                               ⋮
r_{i+1} = r_{i−1} − q_i r_i,    s_{i+1} = s_{i−1} − q_i s_i,    t_{i+1} = t_{i−1} − q_i t_i,      (1)
    ⋮                               ⋮                               ⋮
0 = r_{ℓ−1} − q_ℓ r_ℓ,          s_{ℓ+1} = s_{ℓ−1} − q_ℓ s_ℓ,    t_{ℓ+1} = t_{ℓ−1} − q_ℓ t_ℓ
where
Q_i = [0, 1; 1, −q_i] ∈ F[x]^{2×2}   and   R_i = Q_i · · · Q_1 = [s_i, t_i; s_{i+1}, t_{i+1}].   (2)
Let ni = deg ri for 0 ≤ i ≤ ℓ and mi = deg qi = ni−1 − ni for 1 ≤ i ≤ ℓ. The
sequence (n0 , n1 , . . . , nℓ ) is the degree sequence in the Extended Euclidean Algo-
rithm for r0 , r1 . If F = Fq is a finite field with q elements and r0 , r1 ∈ Fq [x] with
deg r0 = n0 > deg r1 = n1 are uniform random polynomials, then
prob(deg r_2 < n_1 − 1) = 1/q
(Exercise 4.18), which is rather small for large q. So typically one can expect that
the degree of each quotient is 1, or equivalently, that ni+1 = ni − 1 for 1 ≤ i < ℓ. In
that case, we call the degree sequence (n0 , . . . , nℓ ) normal.
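The recurrences (1) and (2) are easy to check experimentally. The hedged sketch below (our own helpers, working over a small prime field F_p with plain long division instead of the fast algorithm of this chapter) runs the traditional Extended Euclidean Algorithm, accumulates the matrices R_i = Q_i · · · Q_1, prints the quotient degrees m_i, and verifies that the first row (s_ℓ, t_ℓ) of the last matrix gives s_ℓ r_0 + t_ℓ r_1 = r_ℓ.

```python
p = 7   # a small prime; all arithmetic is in F_p

def trim(a):
    a = [c % p for c in a]
    while len(a) > 1 and a[-1] == 0:
        a.pop()
    return a

def add(a, b):
    n = max(len(a), len(b))
    return trim([(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) for i in range(n)])

def mul(a, b):
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return trim(c)

def neg(a):
    return [(-c) % p for c in a]

def poly_divmod(a, b):
    q, r = [0], trim(a)
    while len(r) >= len(b) and r != [0]:
        shift = len(r) - len(b)
        coeff = r[-1] * pow(b[-1], -1, p) % p
        t = [0] * shift + [coeff]
        q, r = add(q, t), add(r, neg(mul(t, b)))
    return q, r

def eea_matrices(r0, r1):
    """Quotients q_i, matrices R_i = Q_i ... Q_1 as in (1), (2), and the last remainder."""
    R, Rs, quotients = [[[1], [0]], [[0], [1]]], [], []
    rows = (trim(r0), trim(r1))
    while rows[1] != [0]:
        q, rem = poly_divmod(rows[0], rows[1])
        Q = [[[0], [1]], [[1], neg(q)]]
        R = [[add(mul(Q[i][0], R[0][j]), mul(Q[i][1], R[1][j])) for j in range(2)]
             for i in range(2)]
        Rs.append(R)
        quotients.append(q)
        rows = (rows[1], rem)
    return quotients, Rs, rows[0]

if __name__ == "__main__":
    r0 = [1, 0, 3, 0, 1, 2, 1]      # degree 6 over F_7, ascending coefficients
    r1 = [2, 5, 1, 0, 0, 4]         # degree 5
    qs, Rs, g = eea_matrices(r0, r1)
    print("quotient degrees m_i:", [len(q) - 1 for q in qs])
    s, t = Rs[-1][0]                # first row of R_l is (s_l, t_l)
    assert add(mul(s, r0), mul(t, r1)) == g
    print("gcd (up to a unit):", g)
```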
The basic idea leading to a fast gcd algorithm is that the first quotients qi only
depend on the highest coefficients of r0 and r1 . To express this idea formally, we
introduce some notation.
Let f = f_n x^n + f_{n−1} x^{n−1} + · · · + f_0 ∈ F[x] with leading coefficient f_n ≠ 0, and
k ∈ Z. Then we define the truncated polynomial
f ↾ k = f quo x^{n−k} = f_n x^k + f_{n−1} x^{k−1} + · · · + f_{n−k},
where we set f_i = 0 for i < 0. So for k ≥ 0, f ↾ k is a polynomial of degree k whose
coefficients are the k + 1 highest coefficients of f, and f ↾ k = 0 if k < 0. We also
define f ↾ −∞ = 0, and 0 ↾ k = 0 for all k ∈ Z ∪ {−∞}. For all i ≥ 0 we have that
(f x^i) ↾ k = f ↾ k.
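A few lines of code make the truncation operator concrete (a hedged sketch with our own names; coefficient lists are in ascending order of powers, and [] stands for the zero polynomial).

```python
def truncate(f, k):
    """f 'restricted to' k: f_n x^k + f_{n-1} x^(k-1) + ... + f_{n-k}, built from the
    k+1 highest coefficients of f; the zero polynomial for k < 0 or f = 0."""
    if not f or all(c == 0 for c in f) or k < 0:
        return []
    n = len(f) - 1                      # n = deg f, assuming f[-1] != 0
    top = f[max(0, n - k):]             # the coefficients f_{n-k}, ..., f_n
    return [0] * (k - n) + top if k > n else top

if __name__ == "__main__":
    f = [3, 1, 4, 1, 5]                 # f = 5x^4 + x^3 + 4x^2 + x + 3
    print(truncate(f, 2))               # [4, 1, 5], i.e. 5x^2 + x + 4
    print(truncate(f, 6))               # [0, 0, 3, 1, 4, 1, 5] = x^2 * f, since k > deg f
```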
Now let f , g, f ∗ , g∗ ∈ F[x] with f , f ∗ both nonzero, deg f ≥ deg g and deg f ∗ ≥
deg g∗ , and k ∈ Z. Then (f, g) and (f ∗ , g ∗ ) coincide up to k if
f ↾ k = f ∗ ↾ k,
g ↾ (k − (deg f − deg g)) = g∗ ↾ (k − (deg f ∗ − deg g∗ )).
This defines an equivalence relation on F[x] \ {0} × F[x] (Exercise 11.1). If ( f , g)
and ( f ∗ , g∗ ) coincide up to k and k ≥ deg f − deg g, then deg f − deg g = deg f ∗ −
deg g∗ .
We consider one division step in the Euclidean Algorithm.
and
f − f ∗ = q(g − g∗ ) + (q − q∗ )g∗ + (r − r∗ ). (4)
The polynomials f − f ∗ , q(g − g∗ ) and r − r∗ all have degree less than deg g by (3),
hence also deg((q − q∗ )g∗ ) < deg g = deg g∗ , which implies that q = q∗ .
Now we assume that r ≠ 0 and k − deg q ≥ deg g − deg r. We have to show that
g ↾ (2(k − deg q)) = g∗ ↾ (2(k − deg q)),
r ↾ (2(k − deg q) − (deg g − deg r)) = r∗ ↾ (2(k − deg q) − (deg g∗ − deg r∗ )).
The first assertion follows from the coincidence up to 2k of ( f , g) and ( f ∗ , g∗ ).
Furthermore we have
deg(r − r∗ ) ≤ max{deg( f − f ∗ ), deg q + deg(g − g∗ )}
< deg q + deg f − 2k = deg g − 2(k − deg q) (5)
= deg r − (2(k − deg q) − (deg g − deg r)),
by (4) and (3), and by the above assumption
deg r ≥ deg q + deg g − k ≥ deg q + deg f − 2k > deg(r − r∗ ),
so that deg r = deg r∗ . Now the second assertion follows from the second inequality
in (5). ✷
Then
q = x + 1, r = 2x6 + x5 + 2x4 + 6x3 + 5x + 6,
q∗ = x + 1, r∗ = 2x6 + x5 + 3x4 .
We see that q = q∗ and r ↾ 1 = r∗ ↾ 1, and since g ↾ 2 = g∗ ↾ 2, we have that (g, r)
and (g∗ , r∗ ) coincide up to 2 = 2(k − deg q). Now r∗ /x4 = ( f ↾ 4) rem (g ↾ 3), and
we find that also (g, r) and (g ↾ 3, ( f ↾ 4) rem (g ↾ 3)) coincide up to 2(k − deg q),
as stated in the lemma. ✸
Lemma 11.1 gives only sufficient conditions for the quotients to be equal. Often
less information is necessary; in the above example, the constant coefficient of f ∗
may be altered without changing the quotient.
Next we consider the Euclidean Algorithm for two pairs r0 , r1 and r0∗ , r1∗ of poly-
nomials with deg r_0 > deg r_1 and deg r_0^* > deg r_1^*. Denote the results of the two Euclidean Algorithms,
of length ℓ and ℓ^*, respectively, by r_i, q_i and r_i^*, q_i^*, and let m_i = deg q_i for 1 ≤ i ≤ ℓ and m_i^* = deg q_i^*
for 1 ≤ i ≤ ℓ∗ . As usual, we let ni = deg ri = n0 − m1 − · · · − mi for 0 ≤ i ≤ ℓ and
nℓ+1 = −∞, and we define n∗i accordingly, for 0 ≤ i ≤ ℓ∗ + 1.
Moreover, we define for any k ∈ N the number η(k) ∈ N by
η(k) = max{0 ≤ j ≤ ℓ : ∑_{1≤i≤j} m_i ≤ k},
so that
n_0 − n_{η(k)} = ∑_{1≤i≤η(k)} m_i ≤ k < ∑_{1≤i≤η(k)+1} m_i = n_0 − n_{η(k)+1},   (6)
where the second inequality only holds if η (k) < ℓ, and η (k) is uniquely determined
by (6). In other words, the number n0 − k is sandwiched between the two con-
secutive remainder degrees deg rη(k) and deg rη(k)+1 in the Euclidean Algorithm.
In particular, η (k) ≤ k since mi ≥ 1 for 1 ≤ i ≤ ℓ. We define η ∗ (k) analogously.
The following lemma says, in a precise way, that the first results in the Euclidean
Algorithm only depend on the top part of the inputs.
L EMMA 11.3. Let k ∈ N, h = η (k), and h∗ = η ∗ (k). If (r0 , r1 ) and (r0∗ , r1∗ ) coincide
up to 2k, then h = h∗ and qi = q∗i for 1 ≤ i ≤ h.
4. [r_{d−1}; r_d] ←− R · [r_0; r_1]
5. q_d ←− r_{d−1} quo r_d,   r_{d+1} ←− r_{d−1} rem r_d,   Q_d ←− [0, 1; 1, −q_d]
6. d∗ ←− ⌊k/2⌋
8. return S · Q_d · R
2. d = ⌈5/2⌉ = 3.
4. [r_2; r_3] = R · [r_0; r_1] = [2x^6 + x^5 + 2x^4 + 6x^3 + 5x + 6; 3x^5 + 6x^4 + x^3 + 5x^2 + 5x + 3].
6. d∗ ←− 2.
8. The matrix
S Q_3 R = [1, 5x + 4; 3x, x^2 + 5x + 1] · [0, 1; 1, 4x + 1] · [1, 6x + 6; 3x, 4x^2 + 4x + 1]
        = [4x^3 + 6x + 4, 3x^4 + 3x^3 + 4x + 1; 5x^4 + 2x^2 + x + 1, 2x^5 + 2x^4 + 2x^3 + 4x^2 + 3x] = [s_5, t_5; s_6, t_6]
is returned. ✸
T HEOREM 11.5.
In the case of a normal degree sequence, Algorithm 11.4 works correctly. If k ≥
n/2, it uses no more than (12M(k) + O(k)) log k additions and multiplications plus
k inversions, in total O(M(k) log k) operations in F .
P ROOF. Since the empty product is defined to be the multiplicative neutral ele-
ment, in this case the identity matrix, the statements of the theorem are satisfied if
k = 0. Otherwise, we have n > 0, and we see by induction on k and Lemma 11.3
that the results of the recursive call in step 3 are correct. Since the degree sequence
is normal, we have deg r_d = n − d ≥ 0 and therefore r_d ≠ 0 in step 5. Again by
induction and Lemma 11.3, the results of the recursive call in step 7 are correct,
and hence also the final result in step 8.
We may arrange things in such a way that the only inversions in F take place in
step 5, and that lc(r j )−1 is computed only once. During the recursive process, step
5 is executed k times.
Let T (k) denote the number of additions and multiplications that the algorithm
uses on input k. Since d − 1 = ⌈k/2⌉ − 1 ≤ ⌊k/2⌋, assuming monotonicity of T ,
steps 3 and 7 take at most T (⌊k/2⌋) operations each for solving a subproblem
of the same kind. We now analyze the cost for the polynomial multiplications,
divisions, and additions in steps 4, 5 and 8. We note that, by our assumption,
ni = deg ri = n − i for 0 ≤ i ≤ n.
In step 4, the entries of R are sd−1 ,td−1 , sd ,td , by Lemma 3.8 (ii), and their
degrees are n1 − nd−2 , n0 − nd−2 , n1 − nd−1 , n0 − nd−1 , by Lemma 3.10. All four
degrees are at most d − 1 < k/2. We have four multiplications of polynomials of
degree at most k/2 by polynomials of degree at most n ≤ 2k, plus some additions.
Dividing the larger polynomials into blocks of degree at most k/2 (Exercise 8.35),
the cost for step 4 is 16M(k/2) + O(k), or 8M(k) + O(k), by the superlinearity
properties (9) in Section 8.3.
Since nd−1 = nd + 1, the cost for step 5 is O(k). In step 8, we first compute Qd R,
or equivalently, sd+1 = sd−1 − qd sd and td+1 = td−1 − qd td . Since deg qd = 1, this
takes another O(k) operations.
The degrees of sd+1 and td+1 , in the bottom row of Qd R, are at most n0 − nd =
⌈k/2⌉. The entries of S have degrees nd+1 − nk−1 , nd − nk−1 , nd+1 − nk , nd − nk ,
all at most d ∗ ≤ k/2. Thus the computation of S · Qd R takes at most another
8M(k/2) + O(k), or 4M(k) + O(k) operations.
Putting things together, we have a cost of at most 12M(k) + O(k). Thus T satis-
fies the recursive inequalities
T (1) = 0,   T (k) ≤ 2 T (⌊k/2⌋) + 12 M(k) + ck for k ≥ 2,
for some constant c ∈ R, and hence T (k) is at most (12M(k) + O(k)) log k, by
Lemma 8.2. ✷
◦ Some optimization is possible in step 4. First, since the remainders are going to
be truncated anyway in step 7, it is sufficient to compute only the top 2(d ∗ + 2)
coefficients of rd−1 and rd , which we obtain by applying R to the vector with
the truncated entries r0 ↾ 2k and r1 ↾ 2k − 1. This will also allow us to drop the
restriction k ≥ n/2 for the cost estimate in Theorem 11.5.
In addition, we can make use of the fact that the top d − 1 coefficients in the
matrix-vector product vanish, and truncate at the top as well. This reduces
the cost for step 4 to 6M(k) + O(k) and the constant in Theorem 11.5 from 12
to 10.
◦ Using the strategy discussed in the paragraph before Algorithm 11.4, the im-
proved algorithm also works when the degree sequence is not normal. Instead
of returning Rk , the algorithm returns h = η (k), as defined in (6), and Rh . In par-
ticular, the restriction deg r0 > deg r1 can be relaxed to deg r0 ≥ deg r1 .
◦ In addition to the matrix Rh , the improved algorithm also returns the quotients
q1 , . . . , qh .
4. j ←− 3, δ ←− 2, r_0 ↾ 10 = x^2 r_0, r_1 ↾ 9 = x^2 r_1,
[r̃_2; r̃_3] = R · [r_0 ↾ 10; r_1 ↾ 9] = [2x^8 + x^7 + 2x^6 + 6x^5 + 6x^4 + 5x^3 + 6x^2; 3x^7 + 6x^6 + 5x^5 + 6x^4 + 5x^3 + 3x^2],
ñ_2 ←− 8, and ñ_3 ←− 7.
7. d∗ ←− 2.
∑_{1≤i≤3} m_i = 3 ≤ 5 < 6 = ∑_{1≤i≤4} m_i. ✸
T HEOREM 11.7.
Algorithm 11.6 works correctly and uses no more than (22M(k) + O(k)) log k ad-
ditions and multiplications plus k + 1 inversions, in total O(M(k) log k) opera-
tions in F . The bound on the number of additions and multiplications drops to
(10M(k) + O(k)) log k if the degree sequence is normal.
P ROOF. Let ℓ be the number of division steps in the Euclidean Algorithm for
(r0 , r1 ), ni = deg ri for 0 ≤ i ≤ ℓ, and mi = deg qi for 1 ≤ i ≤ ℓ as usual. If r1 = 0 or
k < n0 − n1 , then η (k) = 0, and the algorithm correctly returns the identity matrix
in step 1 (note that the empty product is defined to be the multiplicative neutral
element). If k = 0 = n0 − n1 , then the first quotient is q1 = lc(r0 )/ lc(r1 ), and the
correct result is returned as well. Otherwise, k ≥ 1, and we see by induction on
k and Lemma 11.3 that the results of the recursive call in step 3 are correct. In
particular, δ = m1 + · · · + m j−1 ≤ d − 1 < k by (6) and the definition of d.
In step 4, we have that (r j−1 , r j ) and (r̃ j−1 , r˜j ) coincide up to 2(k − δ ), unless
r̃ j = 0 or k − δ < ñ j−1 − ñ j , by arguments similar to the ones used in the proof of
Lemma 11.1. In particular, n0 − n j−1 = 2k − ñ j−1 and ñ j−1 − n˜j = m j = n j−1 − n j
n0 − nh = (n0 − n j ) + (n j − nh ) ≤ (n0 − n j ) + d ∗
< (n0 − n j ) + (n j − nh+1 ) = n0 − nh+1
or else h = ℓ. But this implies h = η (n0 − n j + d ∗ ) = η (k) by (6) and the definition
of d ∗ , and hence the final results in step 9 are correct.
As in the proof of Theorem 11.5, we may arrange things in such a way that the
number of inversions in F performed during the recursive process is at most k + 1.
Let T (k) denote the number of additions and multiplications that the algorithm
uses on input k. Steps 3 and 8 take T (d − 1) and T (d ∗ ) operations, respectively, for
solving a subproblem of the same kind, together at most 2T (⌊k/2⌋) since d − 1 ≤
⌊k/2⌋ and d ∗ = k − (n0 − n j ) < k − (d − 1) = ⌊k/2⌋ + 1, by (6). We now analyze
the cost for the polynomial multiplications, divisions, and additions in steps 1, 4,
6 and 9. The cost for step 1 is O(1).
In step 4, the entries of R are s j−1 ,t j−1 , s j ,t j , by Lemma 3.8 (ii), and their degrees
are n1 − n j−2 , n0 − n j−2 , n1 − n j−1 , n0 − n j−1 , by Lemma 3.10. All four degrees are
at most n0 − n j−1 = δ ≤ d − 1 < k/2. We have four multiplications of polynomials
of degree at most k/2 by polynomials of degree at most 2k, plus some additions,
taking 8M(k) + O(k) operations in F, as in the proof of Theorem 11.5. (As noted
earlier, in the normal case, the bound drops to 6M(k) + O(k).)
In step 6, computing the quotient of degree m j = ñ j−1 − ñ j ≤ k of the division of
r̃ j−1 by r̃ j , of degree ñ j = 2k − (n0 − n j ) = k + d ∗ , takes 4M(k) + O(k) operations,
by Theorem 9.6. By partitioning the divisor r̃ j into two blocks of sizes at most
k/2, as in Exercise 8.35, the remainder can be computed using at most another
2M(k) + O(k) operations, together no more than 6M(k) + O(k).
In step 9, we first compute Q j R, or equivalently, s j+1 = s j−1 − q j s j and t j+1 =
t j−1 − q j t j . This amounts to two multiplications of the quotient q j of degree
n j−1 − n j ≤ k by a polynomial of degree at most n0 − n j−1 < k/2 plus some ad-
ditions, taking at most 2M(k) + O(k).
The degrees of s j+1 and t j+1 , the lower row of Q j R, are at most n0 − n j ≤ k. The
entries of S have degrees n j+1 − nh−1 , n j − nh−1 , n j+1 − nh , n j − nh , respectively,
Altogether, T satisfies T (1) = 0 and T (k) ≤ 2 T (⌊k/2⌋) + 22 M(k) + ck for k ≥ 2 and some constant c ∈ R, and hence T (k) is at most (22M(k) + O(k)) log k, by
Lemma 8.2. In the normal case, the bound drops to (10M(k) + O(k)) log k. ✷
We have not attempted to determine the smallest possible constant in the place
of 22. In fact, it is possible to prove a slightly better bound on the arithmetic cost
for the “non-normal” case where not all the quotients have small degree. Strassen
(1983) showed that in the nonscalar model of computation, where additions and
multiplications by scalars are not counted and an interpolation algorithm proves
(over an infinite field) that M(n) ∈ O(n), the cost for Algorithm 11.6 when k = n
can be bounded by O(n · H(m1 /m, . . . , mℓ /m)), where m = ∑1≤i≤ℓ mi and H is the
entropy function, as in Section 10.1. This coincides with the bound of Theorem
11.7 in the normal case, since then mi /m = 1/n for 1 ≤ i ≤ ℓ = n.
Strassen also showed that the computation of all quotients of the Euclidean Al-
gorithm in the nonscalar model requires at least about n · H(m1 /m, . . . , mℓ /m) field
operations for almost all pairs of polynomials with quotient degrees m1 , . . . , mℓ .
This lower bound shows that Algorithm 11.6 is uniformly optimal in the nonscalar
model.
The qi as calculated in step 5 of Algorithm 11.6 are exactly the quotients in the
Euclidean Algorithm. Thus, taking M(n) ∈ O(n log n loglog n), Theorem 11.7 for
k = n implies that all the quotients qi in the Euclidean Algorithm can be computed
in time O∼ (n). Can we also compute all the remainders ri together in softly linear
time? The answer is: no. In the normal case, where always deg ri = n − i, the
number of coefficients of r_0, . . . , r_ℓ is ∑_{0≤i≤ℓ} (n_i + 1) = ∑_{0≤i≤n} (n − i + 1) = (n + 1)(n + 2)/2 > n^2/2.
But this means that the output size is quadratic in the input size 2n, and hence any
algorithm that computes r0 , . . . , rℓ requires at least n2 /2 field operations, since in
some examples all output values are different (if the field is large enough), and
therefore each requires at least one operation.
Algorithm 11.6 computes the product Rη(k) of the Q-matrices in the Euclid-
ean Algorithm such that the sum of the degrees of the corresponding quotients is
roughly k. Given f , g ∈ F[x], it is easy to compute the gcd of f and g and the
Bézout coefficients s,t from this matrix, by letting k = n. Then η (k) = ℓ, sℓ and tℓ
constitute the first row of the matrix Rℓ , and
rℓ = sℓ f + tℓ g.
1. call Algorithm 11.6 with input f, g, and k, to compute h = η(k), q_1, . . . , q_h,
   and R_h = [s_h, t_h; s_{h+1}, t_{h+1}]
2. return q_1, . . . , q_h, s_h, t_h, and s_h f + t_h g
By Theorem 11.7, the cost for computing only the quotients and sh ,th in this
algorithm is O(M(k) log k) additions and multiplications in F plus at most k + 1
inversions. The additional cost for computing rh is O(M(n)) additions and mul-
tiplications. Note that it is easy to compute the corresponding row in the monic
Extended Euclidean Algorithm by multiplying rh , sh ,th by lc(rh )−1 , at a cost of
another O(n) field operations.
C OROLLARY 11.9.
For polynomials f , g ∈ F[x] of degree at most n, all of the following can be com-
puted with O(M(n) log n) additions and multiplications plus at most n + 2 inver-
sions, or O∼ (n) operations in F :
◦ rℓ = gcd( f , g),
◦ the Bézout coefficients s,t ∈ F[x] with s f + tg = rℓ ,
◦ the entries rh , sh ,th ∈ F[x] of an arbitrary row in the (traditional or monic) Ex-
tended Euclidean Algorithm for f , g,
◦ the quotients q1 , . . . , qℓ in the traditional Euclidean Algorithm for f , g.
We give explicit constants for the first two statements of the above corollary.
T HEOREM 11.10.
Let f , g ∈ F[x] \ {0} with deg g ≤ deg f ≤ n.
(i) With at most (22M(n) + O(n)) log n additions and multiplications plus n + 2
inversions in F , we can decide whether f and g are coprime, and if so,
compute the Bézout coefficients s,t ∈ F[x] such that s f + tg = 1.
(ii) If the degree sequence is normal, then we can compute the monic gcd( f , g)
and the Bézout coefficients s,t ∈ F[x] such that s f + tg = gcd( f , g) using at
most 10M(n) log n + 2M(n) + O(n log n) additions and multiplications plus
n + 2 inversions.
P ROOF. (i) We see from Lemma 3.10 that the gcd is constant if and only if
deg sℓ+1 = deg g and degtℓ+1 = deg f . If so, then we return s = sℓ /rℓ and t = tℓ /rℓ .
The cost for computing the constant polynomial rℓ = sℓ (0) f (0) + tℓ (0)g(0) and
dividing sℓ ,tℓ by it is one inversion and O(n) multiplications and additions, and the
claim follows from Theorem 11.7 with k = n.
(ii) The degrees of sℓ and tℓ are less than n, by Lemma 3.10, and hence computing
rℓ = sℓ r0 + tℓ r1 amounts to 2M(n) + O(n) arithmetic operations. ✷
C OROLLARY 11.11.
Let f ∈ F[x] of degree n > 0. A product in the residue class ring F[x]/⟨f⟩ can
be computed with 6M(n) + O(n) arithmetic operations in F, and an inverse with
no more than (22M(n) + O(n)) log n operations. Thus one arithmetic operation in
F[x]/⟨f⟩ takes O∼(n) arithmetic operations in F.
Applying Corollaries 10.12, 10.17, 11.9, and 11.11 to the analysis of the modu-
lar Extended Euclidean Algorithms 6.36 and 6.59, we obtain the following result.
C OROLLARY 11.12.
Let F be a field with at least (6n + 3)d elements and f , g ∈ F[y][x] of degree at
most n in x and at most d in y.
(i) With an expected number of O(d M(n) log n + n M(d) log d) or O∼ (nd) arith-
metic operations in F , we can compute the gcd of f and g.
(ii) A single row of the EEA for f , g can be computed with O(n M(nd) log(nd))
or O∼ (n2 d) operations in F .
The Euclidean Algorithm for integers. The method also works for integers,
although there are some complications due to the carries. But Corollary 11.9 is
true also for integers when the cost measure is the number of word operations
instead of field operations. Wang & Pan (2003) describe a suitable algorithm; see
also Pan & Wang (2004). The output row i is specified by an integer h and the
condition si ≤ 2h < si+1 . For inputs of at most n words, it uses O(M(n) log h) word
operations.
C OROLLARY 11.13.
For an integer m ∈ N of length n, one arithmetic operation in the residue class
ring Zm can be performed using O∼ (n) word operations. More precisely, the cost
is O(n) for an addition, O(M(n)) for a multiplication, and O(M(n) log n) for an
inversion or a division.
C OROLLARY 11.14.
Let f , g ∈ Z[x] of degree at most n and with max-norm at most A.
word operations.
(f_n, f_{n−1}, . . . , f_0)^T − G · (q_{n−m}, . . . , q_0)^T = (0, . . . , 0, r_d, . . . , r_0)^T,   (8)
where G is the (n + 1) × (n − m + 1) matrix whose n − m + 1 columns each contain the coefficient vector (g_m, g_{m−1}, . . . , g_0)^T of g, shifted down by 0, 1, . . . , n − m positions and padded with zeroes.
and the second summand on the left hand side of (8), extended by zeroes or
truncated—as necessary—to length n + m − 2κ, is a linear combination of the
columns in the right part of Sκ ( f , g). Thus the column
(0, . . . , 0, rd , . . . , r2κ−m+1 )T ,
σ_κ(f, g) = det S_κ(f, g) = (−1)^{(n−κ)(m−κ)} det T,
where T is the square matrix whose first n − κ columns contain the coefficients g_m, g_{m−1}, . . . of g, shifted down column by column so that its last row reads g_{2κ−n+1}, . . . , g_κ, and whose last m − κ columns contain the coefficients r_d, r_{d−1}, . . . of r, shifted in the same fashion so that its last row reads r_{2κ−m+1}, . . . , r_κ.
We recall the degree sequence (n0 , . . . , nℓ ), with ni = deg ri for all i. Using the
above lemma repeatedly, we arrive at the following celebrated theorem.
σ_m = α_1^{n−m},   σ_{n_{i+1}} = (−1)^{(n_i − n_{i+1})(n − n_{i+1} + i + 1)} (α_i α_{i+1})^{n_i − n_{i+1}} σ_{n_i}.
P ROOF. (i) We know from Theorem 6.48 that σκ vanishes in the second case. So
we may assume that κ = ni for some i ≤ ℓ. This i is unique, so that the expressions
in the claim are well defined.
Induction on h for 0 ≤ h < i, using Lemma 11.15, shows that
σ_κ(r_0, r_1) = σ_κ(r_h, r_{h+1}) ∏_{1≤j≤h} (−1)^{(n_{j−1}−κ)(n_j−κ)} α_j^{n_{j−1}−n_{j+1}}.
The claim follows from the case κ = n_i and h = i − 1, together with σ_{n_i}(r_{i−1}, r_i) = α_i^{n_{i−1}−n_i}.
(ii) follows from (i) by calculating τi+1 − τi modulo 2. ✷
Exercise 11.10 shows that (−1)τi = 1 when the degree sequence is normal.
We illustrate the theorem with an example.
σ_2 = σ_{n_1} = det(3) = 3 = α_1,
σ_1 = σ_{n_2} = det [1, 3, 0; 2, 2, 3; 3, 1, 2] = 16 = (−1)^{τ_2} α_2 α_1^2,
σ_0 = σ_{n_3} = det [1, 0, 3, 0, 0; 2, 1, 2, 3, 0; 3, 2, 1, 2, 3; 4, 3, 0, 1, 2; 0, 4, 0, 0, 1] = 256 = (−1)^{τ_3} α_3 α_2^2 α_1^2. ✸
C OROLLARY 11.18.
Let f , g ∈ F[x] \ {0} have degrees n = n0 ≥ n1 , and let 0 ≤ k ≤ n. Then all
subresultants σ j for n1 ≥ j ≥ (n − k) of ( f , g) can be calculated with at most
(22M(k) + O(k)) log k operations in F .
P ROOF. We use Algorithm 11.8 with input f , g and k to compute the quotients
q1 , . . . , qh , where h = η (k). By Theorem 6.53 (b), we can compute the αi =
αi−1 / lc(qi ) from α0 and the leading coefficients of the quotients qi using at most
k multiplications and inversions each. Then we compute the subresultants up to
σn−k along the recursion formula (ii) from Theorem 11.16. This takes at most 2k
additional multiplications, and Theorem 11.7 implies the claim. ✷
Because of its importance, we highlight the case of the resultant, which corre-
sponds to κ = 0 in Theorem 11.16 and to k = n in Corollary 11.18.
C OROLLARY 11.19.
Let f , g ∈ F[x] \ {0}, and let n = n0 ≥ n1 ≥ · · · ≥ nℓ and α0 , . . . , αℓ be the degrees
and the leading coefficients, respectively, of the remainders in the Euclidean Al-
gorithm for ( f , g).
If deg gcd( f , g) = nℓ ≥ 1, then res( f , g) = 0. Otherwise, if nℓ = 0,
res(f, g) = (−1)^τ α_ℓ^{n_{ℓ−1}} ∏_{1≤j<ℓ} α_j^{n_{j−1}−n_{j+1}},
where τ = ∑1≤ j<ℓ n j−1 n j . This resultant can be calculated using no more than
(22M(n) + O(n)) log n operations in F .
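The formula of Corollary 11.19 is easy to evaluate once the remainder degrees n_i and leading coefficients α_i are known. The hedged sketch below (our own names, exact rational arithmetic, and plain long division instead of the fast Euclidean Algorithm, so it illustrates the formula rather than the softly linear running time) computes res(f, g) this way.

```python
from fractions import Fraction

def divmod_poly(a, b):
    """Quotient and remainder of a by b over Q; ascending coefficient lists."""
    q, r = [Fraction(0)], [Fraction(c) for c in a]
    while len(r) >= len(b) and any(r):
        while r and r[-1] == 0:
            r.pop()
        if len(r) < len(b):
            break
        shift, coeff = len(r) - len(b), r[-1] / Fraction(b[-1])
        for i, bi in enumerate(b):
            r[shift + i] -= coeff * bi
        r.pop()
        q += [Fraction(0)] * (shift + 1 - len(q))
        q[shift] = coeff
    return q, (r if any(r) else [Fraction(0)])

def resultant(f, g):
    """res(f, g) from the remainder sequence, via Corollary 11.19; the inputs are
    nonzero coefficient lists with nonzero leading coefficients."""
    degs, lcs = [], []                         # n_0, n_1, ... and alpha_0, alpha_1, ...
    r0, r1 = [Fraction(c) for c in f], [Fraction(c) for c in g]
    while any(r1):
        while r1 and r1[-1] == 0:
            r1.pop()
        degs.append(len(r0) - 1)
        lcs.append(r0[-1])
        r0, r1 = r1, divmod_poly(r0, r1)[1]
    degs.append(len(r0) - 1)                   # n_l, the degree of the gcd
    lcs.append(r0[-1])                         # alpha_l
    l = len(degs) - 1                          # number of division steps
    if degs[-1] >= 1:
        return Fraction(0)                     # nonconstant gcd
    tau = sum(degs[j - 1] * degs[j] for j in range(1, l))
    res = (-1) ** tau * lcs[l] ** degs[l - 1]
    for j in range(1, l):
        res *= lcs[j] ** (degs[j - 1] - degs[j + 1])
    return res

if __name__ == "__main__":
    print(resultant([-1, 0, 1], [-2, 1]))      # res(x^2 - 1, x - 2) = f(2) = 3
```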
Corollary 11.18 implies that in fact all subresultants can be computed within
the same time bound. This can be used to replace the rational number reconstruc-
tion (or Cauchy interpolation) in the modular EEA algorithms of Section 6.11, as
follows. We multiply the modular images of the ith row ri , si ,ti of the EEA with
the modular image of the subresultant σni , for all “lucky” primes, and reconstruct
σni ri , σni si , σni ti , which have integral (or polynomial) coefficients, by the fast Chi-
nese Remainder Algorithm (or fast interpolation), for all i.
C OROLLARY 11.20.
We can compute all subresultants of two polynomials f , g ∈ Z[x] of degree at most
n and with max-norm at most A using O(M(n) log n · M(n log(nA)) log(n log(nA)))
or O∼ (n2 log A) word operations.
P ROOF. We modify the small primes modular EEA of Section 6.11 so that in ad-
dition it computes all subresultants modulo each small prime, using the leading
coefficients of the quotients from the fast Euclidean Algorithm, as in the proof of
Corollary 11.18, and recover them from their modular images by Chinese remain-
dering. By Corollary 11.18, the additional cost is negligible, and Corollary 11.14
implies the claim. ✷
C OROLLARY 11.21.
We can compute all subresultants of two polynomials f , g ∈ F[y][x] of degree at
most n in x and at most d in y, where F is a field, using O(n M(nd) log(nd)) or
O∼ (n2 d) arithmetic operations in F .
Notes. 11.1. The idea for a fast gcd algorithm is due to Lehmer (1938); later work in-
cludes Knuth (1970), Schönhage (1971), Moenck (1973), Aho, Hopcroft & Ullman (1974),
§8.9, Schwartz (1980), Brent, Gustavson & Yun (1980), and Strassen (1983); the first three
papers actually deal with integers. Brent, Gustavson & Yun apply their algorithm to com-
puting Padé approximants and solving Toeplitz (or Hankel) systems of linear equations,
and show that the two problems are equivalent. They also note that the “HGCD” algorithm
from Aho, Hopcroft & Ullman does not always return the correct result in non-normal
cases where not all quotients have degree 1. The Fast Extended Euclidean Algorithm 11.4
in the 1999 and 2003 editions of this text also contained an error. The concept of and
notation for “coinciding” is from Strassen’s paper. Pan (1997) gives a (sometimes) faster
computation for Padé approximants and the decoding of BCH codes.
11.2. Lemma 11.15 for the resultant is in Gordan (1885), §145, Haskell (1891/92), and
“well-known” to Swan (1962). The last paper also contains implicitly Theorem 11.16.
In the traditional, and in Sturm’s, variant of the Euclidean Algorithm (Exercise 4.32), the
(sub)resultants of consecutive entries are all identical, up to sign. The fundamental theorem
on subresultants is in Collins (1967) and Brown & Traub (1971); for our development, it
takes a back seat compared to the results of Chapter 6. Exercise 11.8 provides a simple
proof of the special case κ = 0 of Lemma 11.15.
Lickteig & Roy (1996, 2001) and Reischert (1997) give non-modular methods for com-
puting resultants of polynomials in Z[x] or F[y, x] and prove running time estimates which
are—up to logarithmic factors—within the bounds from Corollaries 11.20 and 11.21.
Exercises.
11.1 Prove that for a ring R and each k ∈ N, “coinciding up to k” is an equivalence relation on
R[x] \ {0} × R[x].
11.2 Let F be a field supporting the FFT. In the text, we have given fast O(n log n) and O(n log^2 n)
algorithms, respectively, for the following six problems for polynomials in one variable of degree
less than n over F: multiplication, division with remainder, inverse modulo xn , evaluation at n points,
interpolation at n points, and greatest common divisor. For each of these problems, describe in one
sentence of at most 20 words how the algorithm for it depends on the FFT and the algorithms for the
other problems, and what method is used for the “dependency”.
11.3 Let F be a field, k ∈ N, and f , g, f ∗ , g∗ ∈ F[x] nonzero. Prove or disprove the following
“converse” of Lemma 11.1: If g ↾ k = g∗ ↾ k and f quo g = f ∗ quo g∗ has degree k, then f ↾ 2k =
f ∗ ↾ 2k.
11.4 Let F be a field, f ∈ F[x] of degree n, and m1 , . . ., mr ∈ F[x] such that ∑1≤i≤r deg mi ≤ n.
Prove that gcd( f , m1 ), . . ., gcd( f , mr ) can be computed with O(M(n) log n) operations in F. Hint:
Corollary 10.17.
11.6 In this exercise, you are to show how the Half gcd algorithm 11.6 can be sped up if FFT
multiplication is used. Suppose that F is a field that supports the FFT. We know that the entries
of SQ j R in step 9 are of degree at most n0 − nh ≤ k, and it is sufficient to compute all of them
modulo xκ − 1, where κ ∈ N is the least power of 2 strictly larger than k. Then the product matrix
can be computed by evaluating all entries of S, Q j , R at the primitive κth roots of unity, performing
the matrix products pointwise, and interpolating to obtain the result. Count the number of κ-point
FFTs computed, and compare your result to the number of κ-point FFTs that the usual approach of
separately computing all polynomial products takes.
11.7∗ You are to modify Algorithm 11.4 (Half gcd for normal degree sequence) so that in addition
the two remainders [r_k; r_{k+1}] = R_k · [r_0; r_1] are output. You may assume that n_0 ≤ 2k. Analyze your
algorithm carefully. The goal is to obtain a cost of at most (10M(k) + O(k)) log k field operations.
Hint: In your modified step 4, you only have to compute R · [r_0^*; r_1^*], where r_0^*, r_1^* are the “lower parts”
of r_0, r_1, in order to obtain r_{d−1} and r_d, and a similar computation in step 8 gives r_k, r_{k+1}. Trace
your algorithm on the data of Example 11.2. Can you explain why this algorithm outputs more than
Algorithm 11.4 and still uses fewer operations?
11.8 Give an alternative proof of Lemma 11.15 for k = 0 using Exercise 6.12.
11.9 Let F be a field and n, e1 , . . ., ed ∈ N, with e1 > · · · > ed and e1 + · · · + ed ≤ n.
(i) Let f , g ∈ F[x] of degree at most n, and assume that e1 , . . ., ed occur in the degree sequence
of the EEA for f and g. Show that the remainders of degrees e1 , . . ., ed can be computed with
O(M(n) log n) operations in F.
(ii) Let f , g ∈ F[y][x] of degree at most n in x and at most d in y, and assume that #F ≥ (6n+3)d and
e1 , . . ., ed occur in the degree sequence of the EEA for f and g in F(y)[x]. Show that the remainders
of degrees e1 , . . ., ed in x can be computed with O(n M(nd) log(nd)) operations in F.
11.10 Prove that (−1)τi = 1 for all i in Theorem 11.16 in case the degree sequence is normal.
Research problems.
11.11 Determine the smallest possible constant c ≤ 22 such that your favorite fast Euclidean Algo-
rithm works in time (c M(k) + O(k)) logk.
11.12 Can you find an algorithm for computing the resultant of two polynomials f , g ∈ F[x, y] of
degree at most n in x and at most d in y taking O∼ (nd) operations in the field F?
11.13 Implement carefully fast algorithms for large instances of the problems discussed in this
book.
Общеизвестно, что задача обращения матриц [. . . ] является одной из центральных и трудных задач теории матриц. [. . . ] К сожалению, несмотря на обширную литературу, посвящённую этому вопросу, проблема во многих её аспектах требует дальнейшего углублённого исследования.1
Iosif Semenovich Iohvidov (1974)
1 It is commonly known that the problem of the inversion of matrices [. . . ] is one of the central and difficult
problems in matrix theory. [. . . ] Unfortunately, in spite of extensive literature dealing with this question, the
problem requires in many of its aspects a further and profound investigation.
2 When it is not in our power to tell the most correct opinions apart, we ought to follow the most probable ones.
12
Fast linear algebra
The “classical” algorithms for problems in linear algebra, such as matrix multipli-
cation, computing the determinant, or solving systems of linear equations, all take
O(n3 ) arithmetic operations for inputs of size n × n. In this chapter, we discuss
two totally different approaches to improving this. The first is a general method
whose most powerful variant leads to O(n2.3727 ) operations, but whose practical
use may be limited. The second one uses a radically different model of linear
algebra: instead of writing down the matrix (“explicit linear algebra”), we only
use a (fast) mechanism for evaluating the matrix at a vector (“black box linear al-
gebra”). This is only applicable with profit to restricted classes of matrices, but
many problems arising in practice fall into this category: Sylvester, Vandermonde,
and Toeplitz matrices, the Berlekamp matrix in Berlekamp’s polynomial factor-
ization algorithm 14.31, and large sparse matrices over F2 in integer factorization
algorithms (Algorithm 19.12).
4. P1 = A11 B11 ,   P5 = S1 T1 ,
   P2 = A12 B21 ,   P6 = S2 T2 ,
   P3 = S4 B22 ,    P7 = S3 T3 ,
   P4 = A22 T4
5. U1 ←− P1 + P2 , U5 ←− U4 + P3
U2 ←− P1 + P6 , U6 ←− U3 − P4
U3 ←− U2 + P7 , U7 ←− U3 + P5
U4 ←− U2 + P5
6. return [U1, U5; U6, U7]
T HEOREM 12.2.
Algorithm 12.1 correctly computes the product matrix and uses at most 6 n^{log_2 7}
additions and multiplications in R. For arbitrary n ∈ N, an n × n matrix product
can be computed with 42 n^{log_2 7} ∈ O(n^{log_2 7}) ring operations.
P ROOF. The correctness is left as Exercise 12.1. For n = 2^k ∈ N, let T (n) denote
the number of arithmetic operations in R that the algorithm performs on inputs
of size n × n. Then T (1) = 1 and T (2^k) = 15 · 2^{2k−2} + 7 T (2^{k−1}) for k ≥ 1, and
Lemma 8.2 implies the first claim. For arbitrary n, we pad the matrices with zeroes,
thereby at most doubling the dimension. ✷
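For completeness, here is a hedged Python sketch of the recursion behind Algorithm 12.1 (our own code, not the book's; the linear combinations S_1, . . . , S_4 and T_1, . . . , T_4, whose defining steps are not reproduced above, are taken here to be the usual Strassen–Winograd combinations, under which the products P_1, . . . , P_7 and sums U_1, . . . , U_7 listed in steps 4 and 5 do yield the product matrix).

```python
def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def strassen(A, B):
    """Product of two n x n matrices, n a power of two, with 7 recursive
    multiplications and 15 additions/subtractions per level."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2
    split = lambda M: (([r[:h] for r in M[:h]], [r[h:] for r in M[:h]]),
                       ([r[:h] for r in M[h:]], [r[h:] for r in M[h:]]))
    (A11, A12), (A21, A22) = split(A)
    (B11, B12), (B21, B22) = split(B)
    # assumed Winograd combinations (the missing steps 1-3)
    S1 = mat_add(A21, A22); S2 = mat_sub(S1, A11); S3 = mat_sub(A11, A21); S4 = mat_sub(A12, S2)
    T1 = mat_sub(B12, B11); T2 = mat_sub(B22, T1); T3 = mat_sub(B22, B12); T4 = mat_sub(T2, B21)
    # step 4: the seven recursive products
    P1 = strassen(A11, B11); P2 = strassen(A12, B21); P3 = strassen(S4, B22)
    P4 = strassen(A22, T4);  P5 = strassen(S1, T1);   P6 = strassen(S2, T2)
    P7 = strassen(S3, T3)
    # step 5: the recombination sums
    U1 = mat_add(P1, P2); U2 = mat_add(P1, P6); U3 = mat_add(U2, P7)
    U4 = mat_add(U2, P5); U5 = mat_add(U4, P3); U6 = mat_sub(U3, P4); U7 = mat_add(U3, P5)
    # step 6: reassemble [U1, U5; U6, U7]
    return [u1 + u5 for u1, u5 in zip(U1, U5)] + [u6 + u7 for u6, u7 in zip(U6, U7)]

if __name__ == "__main__":
    import random
    n = 4
    A = [[random.randint(-9, 9) for _ in range(n)] for _ in range(n)]
    B = [[random.randint(-9, 9) for _ in range(n)] for _ in range(n)]
    classical = [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    assert strassen(A, B) == classical
    print("Strassen-Winograd product agrees with the classical product")
```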
Strassen’s (1969) discovery was the starting signal for the development of fast
algorithms. Although subquadratic integer multiplication algorithms had been
around for a while (Section 8.1), it was the surprise of realizing that the “obvi-
ous” cubic algorithm for matrix multiplication could be improved that kicked this
development into high gear and inspired, within the following five years, the many
new ideas for almost all the fast algorithms discussed in Part II of this book.
On a more technical level, Strassen’s result spawned three lines of research:
◦ faster matrix multiplication,
◦ other problems from linear algebra,
◦ bilinear complexity.
For a field F, a number ω ∈ R is a feasible matrix multiplication exponent
if two n × n matrices over F can be multiplied with O(nω ) operations in F. The
classical algorithm shows that ω = 3 is feasible, and Strassen’s that ω = log2 7 is.
The matrix multiplication exponent µ (for F) is the infimum of all feasible ones.
Thus
2≤µ≤ω
for all feasible ω ’s. This µ is the same for all fields of a fixed characteristic, and
all feasible exponents discovered so far work for all fields.
The fascinating history of the smallest exponents known is in the Notes 12.1;
the current world record is ω < 2.3727. It seems natural to conjecture that µ = 2;
there is currently no method in sight that might prove or disprove this.
How practical are these algorithms? Bailey, Lee & Simon (1990) deplore “an
unfortunate myth [. . . ] regarding the crossover point for Strassen’s algorithm”, and
show that for the Sun–4 “Strassen is faster for matrices as small as 16 × 16. For
Cray systems the crossover point is roughly 128”. They conclude that “it appears
that Strassen’s algorithm can indeed be used to accelerate practical-sized linear
algebra calculations.” Besides a Cray library implementation (SGEMMS) of fast
matrix multiplication, there is also one, the ESSL library, for IBM 3090 machines.
Higham (1990) reports on a set of FORTRAN 77 routines (level 3 BLAS) using
“Strassen’s method for fast matrix multiplication, which is now recognized to be
a practically useful technique once matrix dimensions exceed about 100”. In all
these experiments, the coefficients are floating point numbers of a fixed precision.
A further avenue to explore with Strassen’s algorithm is that its recursive parti-
tion employs data access that is essentially different from classical multiplication.
This may make it attractive for machines with a hierarchical memory structure and
for large matrices stored in secondary memory, possibly reducing data transfer
(paging) time.
Further computational problems in linear algebra include matrix inversion, com-
puting the determinant, the characteristic polynomial, or the LR-decomposition of
a matrix, and, for F = C, the QR-decomposition and unitary transformation to
upper Hessenberg form. It turns out that all these problems have the same asym-
ptotic complexity as matrix multiplication (up to constant factors), so that a fast
algorithm for one of them immediately gives fast algorithms for all of them.
The exponent η for solving systems of linear equations satisfies η ≤ ω for all
feasible ω . It is not known whether η = µ.
The most fundamental consequence of Strassen’s breakthrough was the devel-
opment of bilinear complexity theory , a deep and rich area that is concerned with
good and optimal algorithms for functions that depend linearly on each of two
sets of variables, just like the entries of the product of two matrices (or polyno-
mials) do. Bürgisser, Clausen & Shokrollahi (1997) give a detailed account of the
achievements in this theory, which is part of algebraic complexity theory .
1. m ←− ⌈n^{1/2}⌉
   let g = ∑_{0≤i<m} g_i x^{mi}, with g_0, . . . , g_{m−1} ∈ R[x] of degree less than m
3. let A ∈ R^{m×n} be the matrix whose rows are the coefficients of 1, h rem f,
   h^2 rem f, . . . , h^{m−1} rem f, and B ∈ R^{m×m} the matrix whose rows are the co-
   efficients of g_0, g_1, . . . , g_{m−1}, and compute BA ∈ R^{m×n} via ⌈n/m⌉ ≤ m matrix
   multiplications of size m × m
4. for i = 0, . . . , m − 1 do
   let r_i ∈ R[x] be the polynomial whose coefficients form the ith row
   of BA, and compute b = ∑_{0≤i<m} r_i · (h^m)^i rem f using Horner’s rule
5. return b
F IGURE 12.1: The matrix product in the modular composition algorithm 12.3: row i of B holds the coefficients of g_i, row j of A holds the coefficients of h^j rem f (with h^0 = 1), and row i of the product BA holds the coefficients of r_i = g_i(h) rem f.
THEOREM 12.4.
Algorithm 12.3 works correctly and uses at most ⌈n^{1/2}⌉ matrix multiplications of size ⌈n^{1/2}⌉ × ⌈n^{1/2}⌉, plus no more than 6 n^{1/2} (M(n) + O(n)) additions and multiplications in R.
PROOF. Let i < m, and g_i = ∑_{0≤j<m} g_{ij} x^j with all g_{ij} ∈ R. Then r_i = ∑_{0≤j<m} g_{ij} · (h^j rem f) = g_i(h) rem f (see Figure 12.1), and
   ∑_{0≤i<m} r_i · (h^m)^i ≡ ∑_{0≤i<m} g_i(h) · (h^m)^i = g(h) mod f,
so that the polynomial b computed in step 4 equals g(h) rem f.
Taking M(n) ∈ O(n log n loglog n) and ω < 2.3727, we obtain the following.
COROLLARY 12.5.
The modular composition g(h) rem f for three polynomials f, g, h ∈ R[x] with deg g, deg h < deg f = n and f ≠ 0 monic can be computed using O(n^{1.687}) operations in R.
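As an illustration of the baby step/giant step structure of Algorithm 12.3, here is a Python sketch over a prime field F_p with schoolbook polynomial arithmetic (coefficient lists, lowest degree first). The helper names are assumptions of this sketch, and it performs step 3 by direct linear combinations rather than by the fast matrix multiplication that gives the theorem its cost bound.

def poly_mulmod(a, b, f, p):
    # (a * b) rem f over F_p, for a monic f; coefficient lists, lowest degree first
    prod = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] = (prod[i + j] + ai * bj) % p
    n = len(f) - 1
    for k in range(len(prod) - 1, n - 1, -1):    # reduce modulo f
        c = prod[k]
        if c:
            for j in range(n + 1):
                prod[k - n + j] = (prod[k - n + j] - c * f[j]) % p
    return prod[:n] + [0] * (n - len(prod))

def modular_composition(g, h, f, p):
    # g(h) rem f with deg g, deg h < deg f = n, following the shape of Algorithm 12.3
    n = len(f) - 1
    m = int(n ** 0.5)
    if m * m < n:
        m += 1                                   # m = ceil(sqrt(n))
    powers = [[1] + [0] * (n - 1)]               # baby steps: 1, h, ..., h^(m-1) rem f
    for _ in range(1, m):
        powers.append(poly_mulmod(powers[-1], h, f, p))
    H = poly_mulmod(powers[-1], h, f, p)         # giant step: h^m rem f
    g = g + [0] * (m * m - len(g))
    result = [0] * n
    for i in reversed(range(m)):                 # Horner's rule in H on the blocks g_i
        r_i = [0] * n
        for j, c in enumerate(g[i * m:(i + 1) * m]):
            r_i = [(x + c * y) % p for x, y in zip(r_i, powers[j])]
        result = poly_mulmod(result, H, f, p)
        result = [(x + y) % p for x, y in zip(result, r_i)]
    return result

g = h = [4, 3, 2, 1]           # x^3 + 2x^2 + 3x + 4, as in Exercise 12.2
f = [4, 0, 0, 0, 1]            # x^4 - 1 over F_5
print(modular_composition(g, h, f, 5))   # coefficients of g(h) rem f, lowest degree first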
Umans (2008) and Kedlaya & Umans (2009) present major progress, namely a new algorithm for modular composition that uses only n^{1+o(1)} · O~(log q) bit operations. This corresponds, up to small factors, to only O(n) operations in F_q. One
of the main innovations in their work is that they do not use algebra in Fq , but bit
operations. Namely, they reduce the problem to multipoint evaluation for multi-
variate polynomials. They solve this task in a modular fashion, by considering it
over the integers, solving it modulo various primes, and reconstructing the result
via Chinese remaindering.
For matrices A with "small" evaluation cost c(A) — the number of arithmetic operations needed to multiply A by a single vector — we also have "fast" matrix multiplication, in that the product with an arbitrary (explicitly given) n × n matrix can be calculated with n · c(A) arithmetic operations. The transposition principle (see
Notes 12.3) says that for the evaluation cost, it does not matter whether we mul-
tiply by a vector from the right or the left, or whether we consider a matrix or its
transpose.
Before we can present the algorithm, we need some facts about linearly recurrent sequences. Let F be a field and V ≠ {0} a vector space over F. Then V^N is the (infinite-dimensional) vector space of infinite sequences (a_i)_{i∈N}, with all a_i ∈ V. A sequence a = (a_i)_{i∈N} ∈ V^N is linearly recurrent (over F) if there exist n ∈ N and f_0, ..., f_n ∈ F with f_n ≠ 0 such that
   f_n a_{i+n} + f_{n−1} a_{i+n−1} + · · · + f_0 a_i = 0
for all i ∈ N. The polynomial f = ∑_{0≤j≤n} f_j x^j ∈ F[x] of degree n is called a characteristic (or annihilating, or generating) polynomial of a.
FIGURE 12.3: Initial state of a linear feedback shift register for the sequence a = (a_i)_{i∈N} with characteristic polynomial f = x^n − f_{n−1} x^{n−1} − f_{n−2} x^{n−2} − · · · − f_1 x − f_0.
The constants f ∈ F act on sequences in the usual way, and the indeterminate x
acts as a shift operator:
x • a = (ai+1 )i∈N .
This makes V N , together with •, into an F[x]-module. A module is something
similar to a vector space, with the only difference that the “scalars” may be ele-
ments of an arbitrary (commutative) ring instead of a field. In particular, • has the
following properties:
f • (a + b) = f • a + f • b, (1)
f • 0 = 0, (2)
( f + g) • a = f • a + g • a, (3)
( f g) • a = f • (g • a) = g • ( f • a), (4)
0 • a = 0, (5)
1 • a = a, (6)
for all f, g ∈ F[x] and a, b ∈ V^N, where 0 = (0)_{i∈N} is the zero sequence. The proofs are in Exercise 12.5. For example, every commutative group G is a Z-module by letting f • a = a^f for a ∈ G and f ∈ Z.
We can express the property of being a characteristic polynomial in terms of
the operation •: f ∈ F[x] \ {0} is a characteristic polynomial of a ∈ V N if and
only if f • a = 0. The set of all characteristic polynomials of a sequence a ∈ V N ,
together with the zero polynomial, is an ideal in F[x]: if f , g are both characteristic
polynomials or zero, then so is f + g, and if r ∈ F[x] is arbitrary, then r f is either
zero or a characteristic polynomial, by (2), (3), and (4). This ideal is called the
annihilator of a and denoted by Ann(a). Since any ideal in F[x] is generated by a
single polynomial (Section 25.3), either Ann(a) = {0} or there is a unique monic
polynomial m ∈ Ann(a) of least degree such that ⟨m⟩ = {rm : r ∈ F[x]} = Ann(a).
This polynomial is called the minimal polynomial of a and divides any other
characteristic polynomial of a. We denote it by ma . If a is not linearly recurrent,
then Ann(a) = {0}, and we set ma = 0. The degree of ma is called the recursion
order of a. Summarizing, we have the following equivalences for f ∈ F[x] and a ∈ V^N:
   f = 0 or f is a characteristic polynomial of a ⇐⇒ f • a = 0 ⇐⇒ f ∈ Ann(a) ⇐⇒ m_a | f,
   a ∈ V^N is linearly recurrent ⇐⇒ ∃ f ∈ F[x] \ {0}: f • a = 0 ⇐⇒ Ann(a) ≠ {0} ⇐⇒ m_a ≠ 0.
EXAMPLE 12.7 (continued). (i) Any polynomial annihilates the zero sequence, by (2). Thus Ann(0) = F[x] and m_0 = 1.
(ii) The minimal polynomial of the Fibonacci sequence is m_a = x^2 − x − 1. This is because the polynomial is irreducible over Q (its roots (1 ± √5)/2 are irrational), and hence no proper divisor of m_a annihilates a (1 obviously does not).
(iii) The minimal polynomial of the matrix A is also the minimal polynomial of the sequence (A^i)_{i∈N}.
(iv) m_a divides the minimal polynomial of A.
(v) m_a divides the minimal polynomial of (A^i b)_{i∈N}.
(vi) Ann(a) ⊆ Ann(ϕ(a)) and m_{ϕ(a)} | m_a.
(vii) Let V be an algebraic field extension of F, α ∈ V, and a = (α^i)_{i≥0}. Then a is linearly recurrent, and the minimal polynomial of a is the minimal polynomial of α over F. ✸
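The action of F[x] on sequences is easy to experiment with. The following small Python check (an illustration, not part of the text) applies a polynomial f to a finite prefix of a sequence and confirms that x^2 − x − 1 annihilates the Fibonacci sequence, while x + 1 does not.

def apply_poly(f, a):
    # prefix of the sequence f . a, for f given as a coefficient list (lowest degree first)
    n = len(f) - 1
    return [sum(f[j] * a[i + j] for j in range(n + 1)) for i in range(len(a) - n)]

fib = [0, 1]
while len(fib) < 12:
    fib.append(fib[-1] + fib[-2])

print(apply_poly([-1, -1, 1], fib))   # x^2 - x - 1: all zeros, a characteristic polynomial
print(apply_poly([1, 1], fib))        # x + 1: not identically zero, so not characteristic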
P ROOF. For the proof of (i), see Exercise 12.7. For (ii), we note that deg r ≤ d,
with equality if and only if x ∤ f , and hence d ≥ max{1 + deg g, deg r} in (i).
Now let f = ma , and suppose that d > max{1 + deg g, deg r}. Then x divides f ,
r = rev( f /x), and f /x is a characteristic polynomial of a of degree d − 1, by (i),
contradicting the minimality of ma . Thus d = max{1 + deg g, deg r}.
Let u = gcd(g, r). Then f ∗ = f / rev(u) is a polynomial of degree d − deg u,
r/u = rev( f ∗ ), and (r/u)h = (g/u) is a polynomial of degree less than d − deg u.
Hence f ∗ is a characteristic polynomial of a, again by (i), and the minimality of d
implies that deg u = 0. ✷
since Lemma 12.8 (ii) implies that (s,t) = (g, r) is a solution to (7) (note that x ∤ r,
by the definition of rev). We have seen in Section 5.9 that the solution to (7) is
unique (up to multiplication by constants) and can be computed with the Extended
Euclidean Algorithm, using O(n2 ) arithmetic operations in F. This leads to the
following algorithm.
1. h ←− a_{2n−1} x^{2n−1} + · · · + a_1 x + a_0
2. call the Extended Euclidean Algorithm to compute s, t ∈ F[x] such that t(0) = 1 and (7) holds, as described in Section 5.9
T HEOREM 12.10.
Algorithm 12.9 correctly computes the minimal polynomial of a linearly recurrent
sequence (ai )i∈N of recursion order at most n and uses O(n2 ) operations in F .
Using the fast Euclidean Algorithm from Chapter 11, the minimal polynomial
can actually be computed with O(M(n) log n) field operations, but this does not
help in our intended application.
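As a concrete alternative to the Extended Euclidean route of Algorithm 12.9, the minimal polynomial of a linearly recurrent sequence over a prime field can also be computed with the closely related Berlekamp–Massey algorithm. The following Python sketch is an illustration under that substitution, not the book's algorithm; it expects a prefix of at least twice the recursion order.

def berlekamp_massey(s, p):
    # minimal polynomial (monic, coefficients lowest degree first) of the linearly
    # recurrent sequence with prefix s over F_p
    C, B = [1], [1]          # current and previous connection polynomials
    L, m, b = 0, 1, 1        # L = current recursion order, b = last nonzero discrepancy
    for n in range(len(s)):
        d = 0                # discrepancy of the current relation at position n
        for i in range(L + 1):
            d = (d + C[i] * s[n - i]) % p
        if d == 0:
            m += 1
            continue
        coef = d * pow(b, p - 2, p) % p
        T = C[:]
        while len(C) < len(B) + m:
            C.append(0)
        for i in range(len(B)):              # C(x) <- C(x) - coef * x^m * B(x)
            C[i + m] = (C[i + m] - coef * B[i]) % p
        if 2 * L <= n:                       # the recursion order grows
            L, B, b, m = n + 1 - L, T, d, 1
        else:
            m += 1
    C = (C + [0] * (L + 1))[:L + 1]
    return list(reversed(C))                 # reverse the connection polynomial

fib = [0, 1]
for _ in range(10):
    fib.append((fib[-1] + fib[-2]) % 101)
print(berlekamp_massey(fib, 101))            # [100, 100, 1], that is x^2 - x - 1 over F_101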
j q j−1 aj tj
6
0 x 0
1 x4 + x2 1
2 2 2
2 x −1 x −x + 1
3 x2 + 1 0 x4
   m(A)b = ∑_{0≤j≤d} m_j A^j b = 0 in F^n.   (8)

   A · (−m_0^{−1}) ∑_{1≤j≤d} m_j A^{j−1} b = −m_0^{−1} ∑_{1≤j≤d} m_j A^j b = b,

and y = −m_0^{−1} ∑_{1≤j≤d} m_j A^{j−1} b ∈ F^n is the required solution and can be computed in a Horner-like fashion (Section 5.2), using d − 1 < n evaluations of A at a vector plus O(n^2) field operations for additions of vectors and multiplications by scalars.
We note that y belongs to the Krylov subspace of A and b.
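The Horner-like evaluation just described is short enough to write out. The sketch below (illustrative names, prime field F_p, the matrix accessible only through a matrix–vector product) assumes the minimal polynomial m of (A^i b)_{i∈N} has already been found, for instance from the Berlekamp–Massey sketch in the previous section applied to (u^T A^i b)_{i∈N}.

def wiedemann_solve(matvec, b, m, p):
    # solve A y = b over F_p, given the minimal polynomial m of (A^i b) with m(0) != 0;
    # m is a coefficient list, lowest degree first, and matvec(v) returns A v
    d = len(m) - 1
    y = [m[d] * bi % p for bi in b]
    for j in range(d - 1, 0, -1):            # Horner: y = sum_{1<=j<=d} m_j A^(j-1) b
        y = matvec(y)
        y = [(yi + m[j] * bi) % p for yi, bi in zip(y, b)]
    inv = pow((-m[0]) % p, p - 2, p)         # multiply by -m_0^(-1)
    return [yi * inv % p for yi in y]

p = 101
A = [[2, 0], [0, 3]]
matvec = lambda v: [sum(A[i][j] * v[j] for j in range(2)) % p for i in range(2)]
m = [6, 96, 1]                               # (x - 2)(x - 3), minimal polynomial of (A^i b) for b = (1, 1)
print(wiedemann_solve(matvec, [1, 1], m, p)) # [51, 34], and indeed A y = b over F_101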
This leads to the following algorithm.
1. if b = 0 then return 1
We have already seen in Example 12.11 that the minimal polynomial of this sequence is m = x^2 + 2x + 2. In step 5 of Algorithm 12.13, we calculate
   m(A)b = A^2 b + 2Ab + 2b = (0, 2, 3)^T,
so that m is not the minimal polynomial of (A^i b)_{i∈N}. We go back to step 3 and choose u = (1, 2, 0)^T this time. Then (u^T A^i b)_{i∈N} = (0, 1, 2, 2, 3, 2, . . .), and Algorithm 12.9 yields the minimal polynomial m = x^3 + 3x + 1. Since the minimal polynomial of (A^i b)_{i∈N} is a monic multiple of m of degree at most 3, it equals m. We check this by calculating
   m(A)b = A^3 b + 3Ab + b = (0, 0, 0)^T = 0,
and in fact Ay = b. ✸
T HEOREM 12.15.
An output returned by Algorithm 12.13 is correct. If it returns after k iterations,
the cost is at most 2n c(A) + O(kn2 ) operations in F .
It remains to prove that the condition in step 5 is true with reasonable probability,
so that we expect to get an output after a small number of iterations. Thus we have
to find a lower bound on the probability that for random u ∈ U n , the polynomial m
computed in step 4 of Algorithm 12.13 is the minimal polynomial of (Ai b)i∈N .
For a nonzero f ∈ F[x] of degree d, we consider the set M f ⊆ F N of all sequences
a ∈ F N that are annihilated by f . For example, Mxd −1 is the set of all periodic
sequences with period d. M f is an F[x]-submodule of F N , since a + b and g • a
are annihilated by f if a and b are and g ∈ F[x] is arbitrary, by (1), (2), and (4).
Since any sequence in M_f is completely determined by its d initial values, M_f is an F-vector space of dimension d; a basis is given by the shifts x^j • c, for 0 ≤ j < d, of the impulse response sequence c = (c_i)_{i∈N} for f, whose d initial values are
0, 0, . . . , 0, 1, and whose remaining values are determined by the recurrence relation
f • c = 0. Hence M f is the cyclic F[x]-module F[x] • c generated by c: if a =
∑0≤ j<d g j (x j • c) ∈ M f is arbitrary and g = ∑0≤ j<d g j x j ∈ F[x], then a = g • c.
A cyclic module M = R • c over a ring R is isomorphic to R/Ann(c). This
follows from the fact that λ: R −→ M with λ(g) = g • c is a surjective homomor-
phism of R-modules with kernel Ann(c), and by the homomorphism theorem for
R-modules, the map ϕ: R/Ann(c) −→ M given by ϕ(g mod Ann(c)) = g • c is an
isomorphism (see the latest edition of van der Waerden 1931, §86). This may be
familiar to the reader in the case of commutative groups, where R = Z, M is a finite
cyclic group of order n ∈ N, and the module operation is g • a = ag for g ∈ Z and
a ∈ M. Then, for any generator c of M, we have M = Z • c, Ann(c) = nZ, and in
fact ϕ: Z/nZ −→ M with ϕ(g mod n) = cg is an isomorphism of Z-modules.
In our situation, Ann(c) = ⟨f⟩. This is because clearly f annihilates c, and on the other hand no nonzero g ∈ F[x] of degree k < d satisfies g • c = 0, since the (d − 1 − k)th coefficient of g • c is the leading coefficient of g. Thus M_f and F[x]/⟨f⟩ are isomorphic as F[x]-modules, where the module operation on F[x]/⟨f⟩ is defined by g • (h mod f) = (g mod f)(h mod f) = gh mod f, and
   ϕ: F[x]/⟨f⟩ −→ M_f,   ϕ(g mod f) = g • c   (9)
is an isomorphism.
LEMMA 12.16. Let A ∈ F^{n×n}, b ∈ F^n \ {0}, and f ∈ F[x] be the minimal polynomial of the sequence (A^i b)_{i∈N} ∈ (F^n)^N. There is a surjective F-linear map ψ: F^n −→ F[x]/⟨f⟩ such that for all u ∈ F^n we have
both are annihilated by f . This shows that we can regard ψ ∗ as a surjective map
from F n onto M f . We now set ψ = ϕ−1 ◦ ψ ∗ , where ϕ: F[x]/h f i −→ M f is the
isomorphism (9) of cyclic F[x]-modules. Then ψ is surjective,
with h1 , . . . , hn ∈ F[x] of degree less than d and ψ (e j ) = h j mod f for all j. Let
y1 , . . . , yn be new indeterminates over F[x] and r = resx (y1 h1 + · · · + yn hn , f ) ∈
F[y1 , . . . , yn ]. Then the total degree of r is at most d, and by Lemma 6.25
Lemma 12.17 gives a good lower bound on the success probability of Algorithm
12.13 for fields of sufficiently large cardinality, say at least 2n.
T HEOREM 12.18.
The expected cost of Algorithms 12.12 and 12.13 is at most 2n c(A) + O(n2 ) field
operations if F has at least 2n elements.
P ROOF. From Lemmas 12.16 and 12.17, we conclude that the expected number of
iterations of Algorithm 12.13 is at most 2 if we take any subset U ⊆ F of cardinality
at least 2n. We may assume that the vectors Ab, A2 b, . . . , A2n−1 b computed in step 3
of Algorithm 12.13 are stored, and then the cost for step 2 of Algorithm 12.12 is
only another O(n2 ) field operations for vector additions and scalar multiplications.
The claim now follows from Theorem 12.15. ✷
For “small” finite fields Fq , we might make a suitable field extension (Ex-
ercise 12.16), which would add a factor of O(M(logq n)) to the timings from
Theorem 12.18. Wiedemann (1986) shows that this factor can be replaced by 2
(Exercise 12.18), by computing the least common multiple of the minimal polyno-
mials of (uT Ai b)i∈N for several independently chosen uniformly random u ∈ F n .
Wiedemann (1986) also addresses the singular and nonsquare case. A different
variant in the singular case is from Kaltofen & Saunders (1991). They prove the
following theorem.
T HEOREM 12.19.
Let F be a field, n ∈ N>0 , A ∈ F n×n of rank r ≤ n with leading principal r × r
submatrix nonsingular, and b ∈ F n such that the linear system Ay = b is solvable.
Then for any vector v ∈ F n there is a unique v∗ ∈ F n such that A · (v∗ − v) = b and
the lower n − r coordinates of v∗ are zero. Moreover, v∗ − v is a uniform random
vector in the solution space {y ∈ F n : Ay = b} if v is a uniform random vector in F n .
Thus given A, b as in the theorem, we choose a random vector v and apply Wie-
demann’s algorithm 12.12 to the linear system Ar yr = br , where Ar ∈ F r×r is the
leading principal submatrix of A, the vector br ∈ F r consists of the upper r coor-
dinates of b + Av, and yr ∈ F r is to be computed. If we let v∗ ∈ F n be the vector
whose upper part is yr and whose lower part is zero, then the theorem implies that
y = v∗ − v is a uniform random solution of the linear system Ay = b. (This works,
in particular, for b = 0.) Since we may apply Ar to a vector vr ∈ F r by applying A
to the vector whose upper r coordinates are those of vr and whose lower n − r ones
are zero and taking the upper r coordinates of the result, the cost for the method
described above is O(n(c(A) + n)) operations in F.
Kaltofen & Saunders also give a probabilistic algorithm which transforms an ar-
bitrary matrix C ∈ F n×n into the form required in Theorem 12.19, thereby preserv-
ing the black box property, and determines its rank. If the field has at least 3(n2 +n)
elements, then their algorithm uses O(n(c(C)+M(n))) field operations and returns
the correct result with probability at least 1/2. To find a solution of the linear sys-
tem with coefficient matrix C, the above method is applied to the transformed
matrix, and a uniform random solution of the original linear system can be com-
puted within the same time bound. The cost for one application of the transformed
matrix to a vector is c(C) + 2M(n), and the overall cost is O(n(c(C) + M(n)))
operations.
An important aspect of Wiedemann’s algorithm is that the only use of the ma-
trix A is to evaluate it at vectors. Thus instead of storing an array of n2 entries,
all we need is a “black box” for the evaluation of A, that is, a subroutine which
on input v ∈ F n returns Av ∈ F n . This leads to a new way of doing linear algebra,
black box linear algebra (or implicit linear algebra), in contrast to the traditional
explicit linear algebra, where all entries of A are explicitly stored. A somewhat
intermediate concept is sparse linear algebra, where A is stored in a sparse format, listing only those (i, j, a_{ij}) with a_{ij} ≠ 0; this is appropriate for Wiedemann's original application to integer factorization (Chapter 19).
As an example, let ω ∈ F be a primitive nth root of unity and suppose that A = VDM(1, ω, ω^2, ..., ω^{n−1}) is the matrix of the Discrete Fourier Transform DFT_ω (Section 8.2). Thus solving Av = b for v corresponds to interpolating at the powers of ω with values determined by b, and evaluation of A at v corresponds to computing the Discrete Fourier Transform of the polynomial corresponding to v, which can be done at a cost of O(n log n) arithmetic operations. Actually, this does not yield improved algorithms here, since DFT_ω^{−1} can be computed with O(n log n) operations, while the black box linear algebra approach takes O(n^2 log n), but it does when applied to the Berlekamp matrix for factoring polynomials (Section 14.8).
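To see the black box point of view in code, the sketch below (using numpy over C, purely as an illustration; the text's application is over finite fields) wraps the FFT as a black box for the matrix A = VDM(1, ω, ..., ω^{n−1}): one evaluation costs O(n log n) instead of the n^2 operations of an explicit matrix–vector product.

import numpy as np

n = 8
omega = np.exp(2j * np.pi / n)

def dft_blackbox(v):
    # A v for A = VDM(1, omega, ..., omega^(n-1)); numpy's ifft uses exp(+2*pi*i*jk/n)/n
    return n * np.fft.ifft(v)

A = np.array([[omega ** (i * j) for j in range(n)] for i in range(n)])
v = np.arange(n, dtype=float)
print(np.allclose(A @ v, dft_blackbox(v)))   # True: the black box agrees with the explicit matrix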
Notes. 12.1. Algorithm 12.1 is due to Winograd (1971), and the current world record
ω < 2.3727 is from Williams (2011), after the result of Coppersmith & Winograd (1990)
had stood for a quarter century. The entries of the following table indicate the approximate
date of discovery of new feasible matrix multiplication exponents in history; publication
was often years later.
The details of these algorithms are beyond the scope of this text. The most comprehensive
treatment is in Bürgisser, Clausen & Shokrollahi (1997); we also refer to the books by Pan
(1984) and de Groote (1987), and the survey articles of Strassen (1984, 1990) and von zur
Gathen (1988) for details and references. Also for the rest of this section, we refer the
reader to those texts.
Fast algorithms (or reductions) for the problems at the end of the section are in van der
Waerden (1938), Strassen (1969, 1973a), Bunch & Hopcroft (1974), Baur & Strassen
(1983), and Keller-Gehrig (1985). Chou, Deng, Li & Wang (1995) report on a success-
ful parallel implementation of Strassen’s matrix multiplication.
Various normal forms of matrices are important tools in linear algebra, say over a Eu-
clidean domain R. The Hermite normal form (Notes 4.5) is particularly useful for solving
linear equations over R. Algorithm 16.26 computes the Hermite normal form over Z of a
nonsingular square matrix, and faster algorithms are in Storjohann (2000). Giesbrecht,
Storjohann & Villard (2003) is a survey of this active area by three of the main contribu-
tors.
12.2. Algorithm 12.3 is from Brent & Kung (1978). It can be slightly speeded up by using
faster algorithms for rectangular matrix multiplication. The direct approach for multiplying an n × n matrix by an n × n^2 matrix using ω < 2.373 takes O(n^{3.373}) ring operations. Huang & Pan (1998) have improved the exponent for this particular rectangular problem to less than 3.334, and then the n^{1.687} in Corollary 12.5 drops to n^{1.667}. In the special case f = x^n, Brent & Kung give an O(M(n) (n log n)^{1/2}) solution (Exercise 12.4). Bernstein
(1998a) presents a faster algorithm for rings of small characteristic.
12.3. Linearly recurrent sequences were already studied by de Moivre, and the equivalence
between finding recurrence relations and computing Padé approximants (Lemma 12.8) was
known to Kronecker (1881a), page 566. The problem is also intimately connected to solv-
ing Toeplitz (or Hankel) systems of equations. Krylov (1931) invented his method in the
context of solving differential equations for oscillation problems. The transposition prin-
ciple says that for a given square matrix A, the cost of computing Av or wA, for input
vectors v and w, are essentially the same (Fiduccia 1972b, 1973, Kaminski, Kirkpatrick &
Bshouty 1988). Kaltofen (2000) proposes as an open problem the task to relate these two
costs more precisely. The connection between these two problems is well-studied in digital
filter design; see Antoniou (1979), §4.7.
12.4. The block Wiedemann method, proposed by Coppersmith (1994) and analyzed and
improved by Kaltofen (1995b) and Villard (1997), reduces the number of evaluations of A
from 2n to (1 + ε)n for any ε > 0. It is particularly well suited for F = F2 and also leads
to efficient parallel algorithms over arbitrary fields.
A different black box method, based on inner products, for solving linear systems is due
to Lanczos (1952). LaMacchia & Odlyzko (1990) introduced the algorithm into computer
algebra by employing it for integer factorization. Block variants were given by Copper-
smith (1993) and Montgomery (1995), and Eberly & Kaltofen (1997) analyze randomized
Lanczos algorithms. Giesbrecht, Lobo & Saunders (1998) solve the problem of certify-
ing inconsistency of a system of linear equations over a field or over the integers by the
Wiedemann method.
Exercises.
12.1 Prove that Algorithm 12.1 works correctly.
12.2 Let g = h = x3 + 2x2 + 3x + 4 and f = x4 − 1 in F5 [x]. Trace Algorithm 12.3 on computing
g(h) rem f .
12.3 Let R be a ring (commutative, with 1), f , g, h ∈ R[x] with deg f = n, deg g < d, and deg h < m,
such that f is monic and d is a power of 2. Devise a divide-and-conquer algorithm for computing
g(h) rem f by splitting g into two blocks of size d/2. Prove that your algorithm takes O(M(n) log n)
operations in R if dm ≤ n, and O((dm/n)M(n) logn) in general.
12.4∗ Let n, m ∈ N, R be a (commutative) ring such that n! is a unit in R, and g, h ∈ R[x] of degrees
less than n. We let k = ⌈n/m⌉ and write h = h1 + h0 , with h0 , h1 ∈ R[x] such that deg h0 < m and xm
divides h1 .
(i) Prove that the following Taylor expansion holds:
   g(h) ≡ g(h_0) + g′(h_0) h_1 + · · · + (g^{(k)}(h_0)/k!) h_1^k mod x^n.
(ii) The chain rule implies that g^{(i+1)}(h_0) · h_0′ = (g^{(i)}(h_0))′ for all i ∈ N. Assuming that h_0′(0) is nonzero, show that g^{(i+1)}(h_0) rem x^{n+k−i−1} can be computed from g^{(i)}(h_0) rem x^{n+k−i} using O(M(n)) operations in R, for 0 ≤ i < k.
(iii) Consider Brent & Kung's (1978) algorithm for computing g(h) mod x^n.

ALGORITHM 12.20 Composition modulo powers of x.
Input: n ∈ N and g, h ∈ R[x] of degrees less than n such that h′(0) ≠ 0.
Output: g(h) rem x^n ∈ R[x].
1. write h = h_1 + h_0 as above
   for i = 2, ..., k compute h_1^i / i! rem x^n
2. call the algorithm from Exercise 12.3 to compute g(h_0) rem x^{n+k}
3. for i = 1, ..., k compute g^{(i)}(h_0) rem x^{n+k−i}
4. return ∑_{0≤i≤k} g^{(i)}(h_0) · (h_1^i / i!) rem x^n

Its correctness follows from (i). Use (ii) to prove that the algorithm takes O((k + m log n) M(n)) operations in R.
(iv) Which choice of m minimizes the running time?
(v) Can you remove the restriction that h′ (0) be nonzero, using essentially the same time bound?
12.5 Prove properties (1) through (6) of a module operation •.
12.6 Let V be a vector space over a field F.
(i) What is the minimal polynomial of the sequence (ai )i∈N ∈ F N defined by ai = 1 for 0 ≤ i < n
and ai = 0 otherwise?
(ii) How are m_a and m_{x^n • a} for a ∈ V^N and n ∈ N related?
12.7 Prove Lemma 12.8 (i).
12.8 Determine a recurrence relation and sufficiently many initial values of the sequence (a_i)_{i∈N} ∈ Q^N if h = ∑_{i≥0} a_i x^i ∈ Q[[x]] is
(i) h = (x^2 + x)/(x^3 − x − 1),  (ii) h = (x^2 − x)/(x^4 − x^2 − x),  (iii) h = (x^4 + x)/(x^3 − x − 1).
12.9 Compute the minimal polynomial of the sequence 1, 3, 4, 7, 11, 18, 29, 47, . . . of rational num-
bers using Algorithm 12.9. You may assume that the recursion order is at most four. Give the next
12 elements of the sequence.
12.10∗ Let F be a field, f ∈ F[x] an irreducible polynomial of degree n, E = F[x]/⟨f⟩, α = x mod f ∈ E, and β = g(α) ∈ E for some nonzero polynomial g ∈ F[x] of degree less than n.
(i) Prove that the minimal polynomial m ∈ F[x] of β over F is equal to the minimal polynomial of the sequence (β^i)_{i∈N} ⊆ E^N over F.
(ii) Let τ: E −→ F be the F-linear map with τ(∑_{0≤i<n} c_i α^i) = c_0 for all c_0, ..., c_{n−1} ∈ F. Prove that m is the minimal polynomial of the sequence (τ(β^i))_{i∈N} ⊆ F^N. Hint: m is irreducible.
(iii) Show that m can be computed using O(n M(n)) operations in F.
(iv) Compute the minimal polynomial of 2^{2/3} + 2^{1/3} + 1 over Q.
(Shoup (1999) employs Algorithm 12.3 to compute minimal polynomials over a finite field F with O(n^{1/2} M(n) + n^2) arithmetic operations.)
12.11 Let F be a field, A ∈ F n×n , u, b ∈ F n , and define sequences a = (Ai )i∈N and a∗ = (Ai b)i∈N .
(i) Prove that f ∈ F[x] is a characteristic polynomial of a if and only if f (A) = 0 in F n×n .
(ii) Prove that f ∈ F[x] is a characteristic polynomial of a∗ if and only if f (A)b = 0 in F n .
12.12∗ This continues Exercise 12.11. Let a∗∗ = (u^T A^i b)_{i∈N}. Find a situation where u^T f(A)b = 0, but f is not a characteristic polynomial of a∗∗. Can you determine a stronger condition such that a similar equivalence as in Exercise 12.11 holds? Hint: Consider the condition that u^T f(A) be orthogonal to the Krylov subspace ⟨A^i b : i ∈ N⟩ of F^n generated by b.
12.13−→ Let
   A = [ 1 2 3 ; 4 0 1 ; 1 3 1 ] ∈ F_5^{3×3},   b = (0, 1, 2)^T ∈ F_5^3,
and compute A^{−1} b using Wiedemann's algorithm 12.12.
12.14∗ Let F be a field, n ∈ N, A ∈ F n×n , and ei ∈ F n be the ith unit vector, that is, the column
vector with 1 in coordinate i and 0 everywhere else, for 1 ≤ i ≤ n. Prove that if f , f1 , . . ., fn ∈ F[x]
are the minimal polynomials of A, (Ai e1 )i∈N , . . ., (Ai en )i∈N , respectively, then f = lcm{ f1 , . . ., fn }.
Generalize this to arbitrary bases e1 , . . ., en of F n .
12.15∗ Let F be a field.
(i) Design an algorithm that, given a matrix A ∈ F n×n , computes its minimal polynomial by ran-
domly choosing u, b ∈ F n , computing the minimal polynomial of (uT Ai b)i∈N , and checking whether
it is actually the minimal polynomial of A. Prove that your algorithm is correct if it halts.
(ii) Let f ∈ F[x] be the minimal polynomial of A. Show that there is a surjective bilinear map
ψ: F n × F n −→ F[x]/h f i (bilinear means that ψ is linear with respect to both arguments) such that
(iii) Let uk ∈ F[x] be the value of u chosen in the kth iteration and gk ∈ F[x] the minimal polynomial
of the sequence a(k) = (uTk Ai b∗ )i∈N , where b∗ ∈ F n is the initial value of b. Prove that the invariant
m = gk / gcd(g, gk ) holds after the kth pass through step 3, and conclude that g = lcm(g1 , . . ., gk )
holds before the kth pass through step 2.
(iv) Let U = F = F_q be a finite field with q elements and f ∈ F[x]. Prove that for k polynomials h_1, ..., h_k ∈ F[x] of degree less than deg f chosen uniformly at random and independently, the probability that gcd(h_1, ..., h_k, f) = 1 is p_k = ∏_{1≤j≤r} (1 − q^{−k deg f_j}), where f_1, ..., f_r ∈ F[x] are the distinct monic irreducible factors of f.
(v) Now let f be the minimal polynomial of (Ai b∗ )i∈N . Prove the following generalization of
Lemma 12.16 (with b replaced by b∗ ): If ψ(u) = h mod f with h ∈ F[x], then mψ∗ (u) = f / gcd( f , h).
Conclude that pk from (iv) is the probability that the above algorithm terminates after at most k
iterations.
(vi) Let ni = #{1 ≤ j ≤ r: deg f j = i} for all i. Prove that
Use that n_i ≤ q^i / i, by Lemma 14.38, to show that p_k ≥ 1 − 2q^{1−k} if k ≥ 2. Conclude that the expected number ∑_{k≥0} (1 − p_k) of iterations of the algorithm is at most 4.
Research problem.
12.19 Can you improve the cost for modular composition of polynomials of degrees at most n to, say, O~(n^{1.5}) or better?
Völker, hört die Signale!1
Emil Luckhardt (c. 1890)
In this chapter, we discuss the background of the Fourier Transform from electri-
cal engineering and signal processing. Its fundamental property of transforming a
(discrete or continuous) signal from its description in the time domain to an equiv-
alent characterization in the frequency domain is used to describe and analyze the
contributions of different frequencies to a signal. Furthermore, we present an ap-
plication of the Fourier Transform in image processing.
A continuous signal is a function f: D −→ R^n, where D ⊆ R^m and m, n ∈ N. A discrete signal is a function f: D −→ R^n, where D ⊆ Z^m and m, n ∈ N.
Sound is a continuous signal that varies over time and has range loudness. It is
an example of a signal f : R −→ R. In the case of gray-scale pixels on a screen, the
signal associates to each point an intensity value, so that f : D ⊆ Z 2 −→ R. When
color is represented by the constituent amounts of three basic colors (RGB) or four
basic colors (CMYK), we have a signal mapping into R 3 or R 4 , respectively.
A discrete signal is often obtained by sampling a continuous signal at discrete
intervals. This is illustrated in Figure 13.1, where a continuous signal is sampled
at regularly spaced points.
Discrete signals find applications in areas such as biomedical engineering, seis-
mology, acoustics, sonar and radar imaging, speech communication, data com-
munication, television satellite communication, satellite images, and many more.
Speech and telephone signals are examples of signals with only one dimensional
FIGURE 13.1: The analog signal f(t) = sin(t/10) + t^2 sin(t/2)/40 000 (red curve), and the corresponding discrete signal (blue dots).
domains, while radar imaging, satellite images, and lunar images are processed
with two dimensional domains. When modeling complicated problems, such as
those that appear in seismology, the domain can have many dimensions.
It is important to perform certain operations on signals to extract relevant infor-
mation from them, or to transform the signal to make it easier to use. For instance
one may wish to extract some important parameters from the data such as danger
alerts from an electrocardiogram or electroencephalogram. One may want to com-
press the data contained in a telephone signal or recognize the words associated
with speech signals. A common problem is to extract relevant information from
masses of data associated with such things as television transmission or satellite
images. Another application of signal processing in signal transmission is to try
to remove signal interference contributed by transmission noise, fading or channel
distortion.
Of particular importance are the sine signal f: R −→ R with f(t) = sin t, and its complex variant f: R −→ R^2 ≅ C with f(t) = e^{it} = cos t + i sin t, where i = √−1. More generally, we have the signal f: R −→ C with f(t) = a · e^{ikt}, with amplitude a, corresponding to the intensity of the signal (for example, loudness of an audio signal or luminance of a video signal), and frequency k (corresponding to pitch or color, respectively). All those signals are examples of periodic signals: there exists a period T ∈ R_{>0} such that f(t + T) = f(t) for all t ∈ R. Applying the transformation t ↦ 2πt/T, we may assume that T = 2π. For the sinusoidal signal f(t) = a · e^{ikt}, the smallest such period T is the wavelength and related to the frequency by T = 2π/k.
[Figure: the signal sin(t) + sin(10t)/10 together with its components sin(t) and sin(10t)/10.]
The following inversion formula expresses the function f in terms of its Fourier Transform:
   f(t) = (1/2π) ∑_{k∈Z} f̂(k) e^{ikt}.   (1)
The values f̂(k), k ∈ Z, are the Fourier coefficients of f. The inversion formula says that the function f is uniquely determined by the sequence of its Fourier coefficients. The special functions e^{ikt}, for k ∈ Z, are a "basis" for the complex vector space of all 2π-periodic functions; however, there are in general infinitely many nonzero coefficients.
While the original signal f is described in the time domain by assigning to each time t ∈ [0, 2π] the value f(t) of the signal at that time, the Fourier Transform f̂ is an equivalent characterization of f in the frequency domain. It associates to each frequency k ∈ Z the contribution f̂(k) of that frequency, namely of the signal exp(ikt)/2π, to f, as given by the inversion formula (1). It compresses the "continuously" many values f(t) into the countably many values f̂(k). For k ∈ N_{>0}, the signal (f̂(k) exp(ikt) + f̂(−k) exp(−ikt))/2π is called the kth harmonic of f.
and f̂(k) = 0 for k ≠ ±1, ±10. Thus the first harmonic of f is (f̂(1) exp(it) + f̂(−1) exp(−it))/2π = sin(t), the 10th harmonic is sin(10t)/10, and all other harmonics are zero. ✸
The Discrete Fourier Transform is the analog of the Continuous Fourier Trans-
form for discrete periodic signals. If f : Z −→ C is a discrete signal with period
n ∈ N>0 , its Discrete Fourier Transform fb: Z −→ C is defined by
   DCT(f)(k) = (1/√n) · c(k) · ∑_{0≤j<n} f(j) cos( πk(2j + 1) / 2n )   for 0 ≤ k < n,
   IDCT(f)(j) = (1/√n) · ∑_{0≤k<n} c(k) f(k) cos( πk(2j + 1) / 2n )   for 0 ≤ j < n,
where c(k) = 1 if k = 0 and c(k) = √2 otherwise, then DCT and IDCT are inverse operators mapping real-valued signals of finite duration n to signals of the same
[Figure 13.3: the cosine basis signals γ_k on 0, ..., 7 for n = 8, one panel for each k = 0, ..., 7.]
kind (Exercise 13.6). DCT( f ) is the Discrete Cosine Transform (DCT) of f . Ex-
ercise 13.6 also shows that computing this transform or its inverse can be reduced
to computing a Discrete Fourier Transform, which can be done efficiently via the
FFT if n is a power of 2, using O(n log n) operations in R.
The inversion formula
   f(j) = (IDCT ∘ DCT)(f)(j) = (1/√n) ∑_{0≤k<n} c(k) DCT(f)(k) cos( πk(2j + 1) / 2n )
for 0 ≤ j < n shows that the Discrete Cosine Transform leads to a representation of the original signal f as a linear combination of periodic signals γ_k, where γ_k(j) = cos(πk(2j + 1)/2n), with coefficients c(k) DCT(f)(k)/√n. Figure 13.3 depicts the signals γ_k on the interval 0, ..., n − 1 for n = 8. Larger values of k correspond to a more rapid variation of γ_k.
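For concreteness, here is a direct O(n^2) Python implementation of the DCT and IDCT formulas above (an illustration only; the fast O(n log n) route goes through the FFT as in Exercise 13.6).

import math

def dct(f):
    n = len(f)
    return [(1.0 if k == 0 else math.sqrt(2.0)) / math.sqrt(n)
            * sum(f[j] * math.cos(math.pi * k * (2 * j + 1) / (2 * n)) for j in range(n))
            for k in range(n)]

def idct(F):
    n = len(F)
    return [sum((1.0 if k == 0 else math.sqrt(2.0)) * F[k]
                * math.cos(math.pi * k * (2 * j + 1) / (2 * n)) for k in range(n)) / math.sqrt(n)
            for j in range(n)]

signal = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
print([round(x, 6) for x in idct(dct(signal))])   # recovers the original signal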
A possible image data compression algorithm works as follows. For each row
f : {0, . . . , n − 1} −→ R of the image, where f ( j) is the luminance of the jth pixel
in that row, compute DCT( f )(k) for 0 ≤ k < n (if the image is a color image, then,
for example, apply this separately for the intensities of each of the three colors
red, green, and blue). Then choose a quantizing parameter q ∈ R≥1 , divide all val-
ues of DCT( f ) by q, and round to the nearest integer. The effect of quantization
F IGURE 13.4: A grayscale image of Schloß Neuhaus at Paderborn, and the absolute values
of its row-wise Discrete Cosine Transform. The color white corresponds to zero, and
frequency increases from left to right.
is that those values of DCT( f )(k) that are close to zero in absolute value (which
in general will be the case for the high frequency parts, that is, for large k) van-
ish completely. Thus large values of q correspond to high compression rates but
also to worse image quality. Finally a combination of lossless data compression
techniques, such as run length encoding and Huffman encoding, is applied to the
quantized values. Run length encoding compresses each consecutive sequence
(= run) of zeroes to two integers: a zero marking the position of the run, followed
by the length of the run. For example, the sequence
1, 2, 3, 0, 0, 0, 0, 4, 0, 5, −6, 0, 0, 0, 1, 2
is compressed to
1, 2, 3, 0, 4, 4, 0, 1, 5, −6, 0, 3, 1, 2
thereby decreasing the length by two. We note that “runs” of zeroes of length 1
actually increase the size, like the single zero between 4 and 5 in the above exam-
ple. In order to reconstruct the image, we proceed in the opposite direction: after decoding the compressed values, we multiply them by q and apply the inverse Discrete Cosine Transform row by row.
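The run length scheme just described fits in a few lines of Python (an illustrative sketch, not part of the text; a zero in the code stream always acts as a marker followed by the run length).

def rle_encode(seq):
    out, i = [], 0
    while i < len(seq):
        if seq[i] == 0:
            j = i
            while j < len(seq) and seq[j] == 0:
                j += 1
            out += [0, j - i]        # marker, then run length
            i = j
        else:
            out.append(seq[i])
            i += 1
    return out

def rle_decode(seq):
    out, i = [], 0
    while i < len(seq):
        if seq[i] == 0:
            out += [0] * seq[i + 1]
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

data = [1, 2, 3, 0, 0, 0, 0, 4, 0, 5, -6, 0, 0, 0, 1, 2]
print(rle_encode(data))                        # [1, 2, 3, 0, 4, 4, 0, 1, 5, -6, 0, 3, 1, 2]
print(rle_decode(rle_encode(data)) == data)    # True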
Figure 13.4 shows at left a grayscale image with 1088 rows, 728 columns, and
luminance values between 0 (= black) and 255 (= white). Hence the size of the im-
age is 1088 · 728 = 792 064 bytes in a dense encoding, where the luminance value
of each pixel is stored in one byte. At right, we see the absolute values of the row-
wise Discrete Cosine Transforms of that image. The luminance of the kth pixel
from the left in row fl corresponds to the absolute value of DCT( fl )(k) (for better
visibility, white represents 0 in this image and all values are multiplied by 10), and
one can observe that the DCT coefficients get smaller as the frequency increases.
Finally, Figure 13.5 shows the same image after quantizing and dequantizing with
parameter q (that is, rounding the DCT coefficients to integral multiples of q) and
row-wise application of IDCT, for q = 10 and q = 100. The image at right in
Figure 13.4 illustrates why these compression methods are successful. Light gray
areas are “rounded down” to 0, and the darker values are similarly simplified. The
larger q is, the more rounding occurs.
F IGURE 13.5: The image from Figure 13.4 after quantizing with q = 10 and q = 100,
using a row-wise Discrete Cosine Transform.
Table 13.6 lists the compression rates—the size ratio of the compressed and the
original image—for the image from Figure 13.4 with various quantizing parame-
ters q and different lossless data compression techniques. For example, the size
of the file after quantizing with q = 10 and using both run length and Huffman
encoding is 107 646 bytes, which is about 13.59% of the original size, and this
q 1 2 5 10 20 50 100
Huffman 55.89 44.00 29.37 22.60 18.24 14.89 13.62
Run length 91.81 75.01 43.26 28.52 18.49 8.99 4.64
Run length + Huffman 52.83 39.89 22.26 13.59 8.21 3.85 1.94
TABLE 13.6: Compression rates in % for the image from Figure 13.4 with the row-wise
Discrete Cosine Transform and different encoding schemes.
q 1 2 5 10 20 50 100
Huffman 40.95 29.62 20.38 17.13 15.43 14.18 13.52
Run length 78.14 54.35 23.39 12.57 8.08 5.34 4.21
Run length + Huffman 35.14 23.98 11.24 6.14 3.68 2.07 1.33
TABLE 13.7: Compression rates in % for the image from Figure 13.4 with the Discrete
Cosine Transform for 8 × 8 squares and different encoding schemes.
F IGURE 13.8: The image from Figure 13.4 after quantizing with q = 10 and q = 100,
using the Discrete Cosine Transform for 8 × 8 squares.
compression rate is the entry in the last row of the column with head 10 in Ta-
ble 13.6. For comparison, Huffman encoding applied to the image itself (instead
of its Discrete Cosine Transform) yields a compression rate of 75.62%, and the
GIF graphics format achieves a lossless compression to 54.08% of the original
size.
The method described above has the disadvantage that quantization leads to
perturbations in the whole row, as can be seen in Figure 13.5 for q = 100. Thus
the local structure of an image (for example, slow variation of the luminance in
the sky parts of Figure 13.4) cannot be exploited. This can be circumvented by
dividing the image into smaller parts of a fixed size (instead of complete rows) and
applying the above compression technique to each part separately.
In the JPEG still image compression standard, for example, the original image
is divided into squares of 8 × 8 pixels, and the two-dimensional Discrete Cosine
Transform of each square (which is a combination of a row-wise and a column-
wise one-dimensional Discrete Cosine Transform) is computed. Then the DCT
coefficients of all squares are quantized, run length encoded and finally Huffman
encoded. While the one-dimensional row-wise Discrete Cosine Transform takes
only horizontal dependencies into account, its two-dimensional variant covers hor-
izontal and vertical dependencies simultaneously. This together with the improved
adaptivity to the local structure of an image leads to significantly higher compres-
sion rates than the above row-wise approach (Table 13.7).
Figure 13.8 shows the image from Figure 13.4 after compression and decom-
pression with the Discrete Cosine Transform for 8 × 8 squares and quantization
factors q = 10 and q = 100. For q = 10, for example, one hardly notices any dif-
ferences between the images in Figures 13.5 and 13.8, but the compression rate of
the former is about 13.59%, while the latter compresses down to 6.14%.
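A separable two-dimensional transform is easy to express in code: the sketch below (an illustration with the direct O(n^2) formula, not the JPEG reference implementation, and without the quantization and entropy coding steps) applies the one-dimensional DCT first to every row and then to every column of a toy 8 × 8 block.

import math

def dct1(f):
    n = len(f)
    return [(1.0 if k == 0 else math.sqrt(2.0)) / math.sqrt(n)
            * sum(f[j] * math.cos(math.pi * k * (2 * j + 1) / (2 * n)) for j in range(n))
            for k in range(n)]

def dct2d(block):
    rows = [dct1(row) for row in block]                       # row-wise DCT
    cols = [dct1([rows[i][j] for i in range(len(rows))])      # then column-wise DCT
            for j in range(len(rows[0]))]
    return [[cols[j][i] for j in range(len(cols))] for i in range(len(cols[0]))]

block = [[8 * i + j for j in range(8)] for i in range(8)]     # a toy 8 x 8 block of luminances
coeffs = dct2d(block)
print(round(coeffs[0][0], 2))   # 252.0, i.e. 8 times the block's mean value 31.5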
Notes. 13.1. Good references for (digital) signal processing are Oppenheim & Schafer
(1975) and Oppenheim, Willsky & Young (1983).
13.2. For a description of the Huffman code, see Huffman (1952), §16.3 in Cormen, Leiser-
son, Rivest & Stein (2009), and Exercise 10.6. The JPEG standard is described in Wallace
(1991) and Pennebaker & Mitchell (1993).
Exercises.
13.1 Show that for any discrete periodic signal f : Z −→ C there is a least period (called the funda-
mental period) n ∈ N>0 such that any other period of f is an integral multiple of n.
13.2 Let f, g: R −→ C be two 2π-periodic signals. If f and g are sufficiently smooth, then the convolution
   (f ∗ g)(t) = ∫_0^{2π} f(s) g(t − s) ds
exists for all t ∈ R. Prove that f ∗ g is again 2π-periodic, and that the convolution property (f ∗ g)^ = f̂ · ĝ holds, so that
   (f ∗ g)^(k) = f̂(k) · ĝ(k) for all k ∈ Z.
(Thus the Fourier Transform converts convolution into pointwise multiplication.) You may assume
that all occurring integrals exist.
13.3 Let f(t) = sin(t) + sin(10t)/10. Compute f̂(k) for k ∈ Z.
13.4−→ (i) Compute the Fourier coefficients of f(t) = e^{int} for a fixed n ∈ Z, where i = √−1.
(ii) Compute the Fourier coefficients of the 2π-periodic square wave which has f(t) = −1 for −π ≤ t < 0 and f(t) = 1 for 0 ≤ t < π.
(iii) Compute the Fourier coefficients of the 2π-periodic triangular wave which has f (t) = t/π for
−π ≤ t < π.
13.5 Let f: Z −→ C be a discrete signal of period n ∈ N_{>0}. Show that ℜ f̂(k) = 0 for all k if f is odd, that is, f(j) = −f(−j) for all j, and that ℑ f̂(k) = 0 for all k if f is even, that is, f(j) = f(−j) for all j.
13.6∗ Let f : {0, . . ., n − 1} −→ R be a discrete signal of finite duration n. We associate to f a signal
g: Z −→ R of period 4n, by letting
and periodically extending g to a function that is defined for all integers. This corresponds to gluing
f and a reflected copy of f together and interleaving the result with zeroes. Obviously g is even and
g( j) vanishes if j is even.
(i) Prove that the Discrete Fourier Transform ĝ is real-valued of period 4n and has the symmetry properties
   ĝ(k) = ĝ(4n − k) = −ĝ(2n + k) = −ĝ(2n − k)   for k ∈ Z.
(ii) Show that the inversion formula
   f(j) = g(2j + 1) = (1/n) ( ĝ(0)/2 + ∑_{1≤k<n} ĝ(k) cos( πk(2j + 1) / 2n ) )
14 and 15). This material was published posthumously from his handwritten
notes (Gauß 1863b).
Gauß married Johanna Osthoff, younger by three years, in 1805. They had
three children, but Johanna died after the last birth. Less than a year later, Gauß
married her friend Minna Waldeck, eleven years younger, and they also had three
children.
Gauß’ influence permeates many parts of this book. His study of roots of unity
and their subdivision according to subgroups of the relevant Galois group, the
Gauß periods , can be seen as a precursor of the Fast Fourier Transform in
Section 8.2. (These periods are also instrumental for modern fast algorithms for
exponentiation in finite fields.) He proved the basic facts about factoring
polynomials and the relation between factoring over Z and over Q (Section 6.2),
found (but did not publish) the distinct-degree factorization method over finite
fields (Section 14.2) and Hensel lifting (Section 15.4), guessed the prime number
theorem (but did not prove it; see Notes 18.4), and studied hypergeometric series
(Section 23.4). His Gaussian elimination is a staple of linear algebra
(Sections 5.5 and 14.8).
Perhaps as important as his monumental contributions to so many fields is the
fact that he championed the idea of mathematical rigor and watertight proof. This
was often absent in 18th century mathematics, which lacked a precise
understanding of things like limits and infinite sums. (Later, people such as
Cauchy, Weierstraß, and Hilbert perfected Gauß’ approach.)
According to himself,
Gauß’ work was only
motivated by his inner urge
for mathematical discoveries,
and not his desire to publish
or impress others. This
manifested itself in markedly
weak public relations. He did
not educate a school of eager
young disciples to spread his gospel, but he had a few brilliant students: Bernhard
Riemann was his only pupil in the usual sense; Ferdinand Eisenstein and Richard
Dedekind were his students in a wider sense. Many of Gauß’ discoveries were not
published during his lifetime: his insights on the arithmetic–geometric mean,
elliptic functions and their double periodicity (with which Abel and Jacobi
struggled later), the fundamental theorem on analytic functions (vanishing of
closed curve integrals, rediscovered by Cauchy), quaternions (found by William
Hamilton on 16 October 1843, when Gauß’ notes had already slumbered in his
drawers for thirty years), and his 1816 discovery of non-Euclidean geometry
(given to the world by Nikolas Lobachevsky in 1829, and the son Johann Bolyai
de Bolya of Gauß’ friend Wolfgang Bolyai in 1832).
His appointment at the university of Göttingen in 1807 was as professor of
astronomy. On 1 January 1801, Giuseppe Piazzi had discovered the asteroid
Ceres—and it vanished in February. The astronomers could not find it again.
Gauß used his newly devised computational methods in astronomy to calculate
the orbit, and thanks to this, Ceres was rediscovered in December. This brought
world-wide fame to him instantly. During his 48 years as professor, he gave 181
courses and seminars; of these, 128 were on astronomy, and only one on number
theory.
One highly unusual aspect of Gauß’ work is his uncanny mixture of theory and
practice, with either profiting from the other. (Archimedes had a similar talent,
while Newton’s theoretical determination of improved ship hull cross-sections
failed in practice.) This gave his scientific achievements a much wider audience
than usual, and, after the low ebb of natural science (as opposed to literature,
music, and philosophy) in Germany during the 18th century, he helped create an
atmosphere in which bright young men were attracted to mathematics and science
in the 19th century.
Gauß led, over many years, a geodetical survey of the Kingdom of Hannover.
A private goal was to determine, in view of his discovery of non-Euclidean
geometry, whether physical triangles really have an angle sum of 180 degrees—
a question that astronomers still work on today with high-precision instruments.
This work stimulated his research in differential geometry, leading to the
important concept of Gaussian curvature and the Gauß–Bonnet theorem. He
constructed, with Wilhelm Weber, an electric telegraph in 1833, with a 2-km-long
wire, destroyed by lightning in 1845. He worked, also with Weber, on earth
magnetism, and the unit of the magnetic force is called a gauß. At the Senate’s
request, he reorganized the University Widow’s Fund, and on the way created the
basis for modern life insurance calculations.
Gauß died in 1855, at age 77, and was buried in St. Albani’s cemetery in
Göttingen; today this is a pleasant park.
Polynomial factorization is perhaps one of the
most striking successes of symbolic computation.
Zhuojun Liu and Paul S. Wang (1994)
1 The question of factorization, which G AUSS rightly considered as fundamental, is treated in our work with an
abundance of details that one does not find usually in a textbook, and some notions are developed there for the
first time. [. . . ] It is not appropriate for us to make statements about the scientific value of our exposition, but we
have the conviction that we have spared neither effort nor time in order to elucidate this important question.
2 Most often, however, it will be easy to find by trial an irreducible polynomial [modulo a prime number] of a
given degree ν.
3 In most sciences one generation tears down what another has built and what one has established another undoes.
In mathematics alone each generation builds a new storey on top of the old structure.
14
Factoring polynomials over finite fields
[Figure 14.1: the factorization problems over F_q[x], Q(α)[x], Q[x_1, ..., x_n], and Q(α)[x_1, ..., x_n] and their dependencies.]
The dependencies between some of these are shown in Figure 14.1. It turns out
that factoring univariate polynomials over finite fields is a basic task used in many
other factoring algorithms. Factoring in Q[x] is the topic of Chapters 15 and 16.
Some algorithms for finite fields proceed in three stages:
FIGURE 14.2: The stages of univariate polynomial factorization over finite fields.
1. squarefree factorization,
2. distinct-degree factorization,
3. equal-degree factorization.
Squarefree factorization gets rid of multiple factors, distinct-degree factorization
splits irreducible factors according to their degrees, and equal-degree factorization
solves the remaining problem, where all irreducible factors are distinct and of the
same degree. In Figure 14.2, we see how the three stages work. The width of a
box represents the degree of the corresponding polynomial; different colors stand
for different irreducible factors. In the example, the original polynomial consists
of four factors of degree 2 (two of them equal), one factor of degree 4, and one of
degree 6.
The first stage is quite easy, both in theory and in practice. When the input is a
large random polynomial, then the third stage is likely to be needed only for very
small polynomials, and the second stage consumes the bulk of the computing time
(more than 99% in our experiments described in Section 15.7).
In the next three sections we present in detail a conceptually simple complete
factorization algorithm; see Figure 14.7. The determination of repeated factors in
Figure 14.2 is actually delayed until the end.
A fundamental tool for our algorithms is the following theorem (see Section
25.4 for a proof), which generalizes Theorem 4.9.
The reader must be thoroughly familiar with the material of Section 25.4 on
finite fields, of which we make substantial use. The notation Fq and Fermat’s
little theorem in Fq are used over and over again. We will also use the fact that if
f ∈ F_q[x] is irreducible of degree n, then F_{q^n} = F_q[x]/⟨f⟩ is a field with q^n elements (Section 4.2), and Fermat's little theorem implies that F_q = {a ∈ F_{q^n} : a^q = a}.
The possible sizes of finite fields are precisely the prime powers, and in the
following, q always denotes a prime power. The reader may think of q as being a
prime number. However, most statements or proofs do not become simpler for this
special case, so that we may as well work in full generality.
THEOREM 14.2.
For any d ≥ 1, x^{q^d} − x ∈ F_q[x] is the product of all monic irreducible polynomials in F_q[x] whose degree divides d.

PROOF. By Fermat's little theorem 14.1, applied to F_{q^d}, h = x^{q^d} − x is the product of all x − a with a ∈ F_{q^d}. If g^2 divides h (over F_q) with g ∈ F_q[x] \ F_q, then some x − a divides g and (x − a)^2 divides h. Since this is impossible, no such g exists, and x^{q^d} − x is squarefree. It is sufficient to show for any monic irreducible polynomial f ∈ F_q[x] of degree n that
   f divides x^{q^d} − x ⇐⇒ n divides d.
We consider the field extension F_q ⊆ F_{q^d}. If f divides x^{q^d} − x, then from Theorem 14.1, applied to F_{q^d}, we get a set A ⊆ F_{q^d} with f = ∏_{a∈A} (x − a). We choose some a ∈ A, and let F_q[x]/⟨f⟩ ≅ F_q(a) ⊆ F_{q^d}, where F_q(a) is the smallest subfield of F_{q^d} containing a (Section 25.3). This is a field with q^n elements, and F_{q^d} is an extension of F_q(a), so that q^d = (q^n)^e for some integer e ≥ 1. Hence n divides d.
Now suppose that n divides d, let F_{q^n} = F_q[x]/⟨f⟩, and a = (x mod f) ∈ F_{q^n} be a root of f. Theorem 14.1 says that a^{q^n} = a. Since q^n − 1 divides q^d − 1 = (q^n − 1) · e with e = q^{d−n} + q^{d−2n} + · · · + 1, also x^{q^n − 1} − 1 divides
   x^{q^d − 1} − 1 = (x^{q^n − 1} − 1)(x^{(q^n − 1)(e−1)} + · · · + 1).
Multiplying by x, we find that x^{q^n} − x divides x^{q^d} − x, and hence
   (x − a) | (x^{q^n} − x) | (x^{q^d} − x),
so that x − a divides gcd(f, x^{q^d} − x) in F_{q^n}[x]. But the gcd of two polynomials with coefficients in F_q also has coefficients in F_q (Example 6.19), and since it is nonconstant and f is irreducible, gcd(f, x^{q^d} − x) = f, or, equivalently, f divides x^{q^d} − x. ✷
THEOREM 14.4.
The distinct-degree factorization algorithm works correctly as specified. It takes O(s M(n) log(nq)) or O~(n^2 log q) operations in F_q, where s is the largest degree of an irreducible factor of f.
The first two claims are clear for i = 0. For i ≥ 1, we have h_i ≡ h_{i−1}^q ≡ (x^{q^{i−1}})^q = x^{q^i} mod f, so that h_i − x ≡ x^{q^i} − x mod f and
   g_i = gcd(h_i − x, f_{i−1}) = gcd(x^{q^i} − x, f_{i−1}).
By Theorem 14.2, g_i is the product of all monic irreducible polynomials in F_q[x] of degree dividing i that divide f_{i−1} = G_i · · · G_n, hence g_i = G_i. Furthermore, f_i = G_i · · · G_n / g_i = G_{i+1} · · · G_n. This finishes the inductive step and also shows that s = t.
The cost for computing h_i in step 2 is O(log q) multiplications modulo f, or O(M(n) log q) operations in F_q, by Corollary 11.11. Similarly, the cost for computing g_i and f_i is O(M(n) log n) operations. ✷
Algorithm 14.3 may be stopped as soon as deg fi < 2(i + 1), since all irreduc-
ible factors of fi have degree at least i + 1, and hence fi is irreducible in that
case. This is called early abort and guarantees that the algorithm stops after
i = max{m1 /2, m2 } ≤ n/2, where m1 and m2 are the degrees of the largest and
the second largest irreducible factor of f , respectively. In step 2, hi is actually only
needed modulo fi−1 .
EXAMPLE 14.5. We let q = 3 and trace Algorithm 14.3 on the squarefree polynomial f = x^8 + x^7 − x^6 + x^5 − x^3 − x^2 − x ∈ F_3[x]. Then
   h_1 = h_0^3 rem f = x^3 rem f = x^3,
   g_1 = gcd(h_1 − x, f_0) = gcd(x^3 − x, f) = x,
   f_1 = f_0 / g_1 = f / x = x^7 + x^6 − x^5 + x^4 − x^2 − x − 1,
   h_2 = h_1^3 rem f = x^9 rem f = −x^7 + x^6 + x^5 + x^4 − x,
   g_2 = gcd(h_2 − x, f_1) = gcd(−x^7 + x^6 + x^5 + x^4 + x, f_1) = x^4 + x^3 + x − 1,
   f_2 = f_1 / g_2 = (x^7 + x^6 − x^5 + x^4 − x^2 − x − 1) / (x^4 + x^3 + x − 1) = x^3 − x + 1.
At this point, Algorithm 14.3 would perform one further iteration, but the early
abort condition deg f2 < 2(2 + 1) = 6 says that this is not necessary since f2 is
already irreducible. Thus f has one linear factor, two distinct irreducible quadratic
factors (which we do not know yet), and one irreducible cubic factor. The trace is
illustrated in Figure 14.3. ✸
[Figure 14.3: the splittings in the trace of Example 14.5 — f_0 = f is split into g_1 · f_1, and f_1 into g_2 · f_2.]
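The following Python sketch (not the book's Algorithm 14.3, but the same strategy over a prime field F_p, with schoolbook polynomial arithmetic, no early abort, and illustrative helper names) reproduces the splitting of Example 14.5.

def trim(a):
    while len(a) > 1 and a[-1] == 0:
        a.pop()
    return a

def polydivmod(a, b, p):
    # quotient and remainder of a by b over F_p; coefficient lists, lowest degree first
    r = list(a)
    q = [0] * max(1, len(a) - len(b) + 1)
    inv = pow(b[-1], p - 2, p)
    for k in range(len(a) - len(b), -1, -1):
        q[k] = r[k + len(b) - 1] * inv % p
        for j in range(len(b)):
            r[k + j] = (r[k + j] - q[k] * b[j]) % p
    return trim(q), trim(r[:len(b) - 1] or [0])

def polygcd(a, b, p):
    # monic gcd over F_p
    a, b = trim(list(a)), trim(list(b))
    while b != [0]:
        a, b = b, polydivmod(a, b, p)[1]
    inv = pow(a[-1], p - 2, p)
    return [c * inv % p for c in a]

def polymulmod(a, b, f, p):
    prod = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] = (prod[i + j] + ai * bj) % p
    return polydivmod(prod, f, p)[1]

def polysub(a, b, p):
    n = max(len(a), len(b))
    a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
    return trim([(x - y) % p for x, y in zip(a, b)])

def distinct_degree(f, p):
    # pairs (g_i, i): g_i is the product of the monic irreducible factors of f of degree i
    result, h, fi, i = [], [0, 1], trim(list(f)), 0      # h = x, fi = f_0 = f
    while len(fi) > 1:
        i += 1
        acc, base, e = [1], h, p                         # h <- h^p rem f by repeated squaring
        while e:
            if e & 1:
                acc = polymulmod(acc, base, f, p)
            base = polymulmod(base, base, f, p)
            e >>= 1
        h = acc
        g = polygcd(polysub(h, [0, 1], p), fi, p)        # gcd(h - x, f_{i-1})
        if g != [1]:
            result.append((g, i))
            fi = polydivmod(fi, g, p)[0]
    return result

f = [0, 2, 2, 2, 0, 1, 2, 1, 1]      # x^8 + x^7 - x^6 + x^5 - x^3 - x^2 - x over F_3
print(distinct_degree(f, 3))         # the parts x, x^4 + x^3 + x - 1, x^3 - x + 1 of Example 14.5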
[Figure 14.4: the squaring map σ on the elements of F_13^× and of F_17^×.]
We first collect some facts about the squaring map σ: F_q^× −→ F_q^×, with σ(a) = a^2. As an example, the effect of σ on the elements of F_13^× and of F_17^× is given in Figure 14.4. An arrow from a number i to a number j indicates that j = σ(i). Each
element has either two or zero arrows pointing to it; the first ones are the squares,
the second ones the nonsquares. Both sets contain exactly half of the elements.
Lemma 14.7 below, which is the special case k = 2 of the following lemma, says
that this is always the case.
   ker σ_k = {a ∈ F_q^× : σ_k(a) = 1} = {a ∈ F_q^× : a^k = 1},   (1)
the set of kth roots of unity. Since F_q is a field, the polynomial x^k − 1 ∈ F_q[x] has at most k roots in F_q (Lemma 25.4), and hence #ker σ_k ≤ k.
Since (b^k)^{(q−1)/k} = b^{q−1} = 1 for all b ∈ F_q^×, by Fermat's little theorem 14.1, we have S ⊆ ker σ_{(q−1)/k}. By the same reasoning as above, this implies that #S ≤ (q − 1)/k. Now
   q − 1 = #F_q^× = #ker σ_k · #im σ_k = #ker σ_k · #S ≤ k · (q − 1)/k = q − 1,
by the homomorphism theorem for groups, and this implies that #ker σ_k = k, #S = (q − 1)/k, and S = ker σ_{(q−1)/k}. ✷
(i) S ⊆ F_q^× is a (multiplicative) subgroup of order (q − 1)/2,
(ii) S = {a ∈ F_q^× : a^{(q−1)/2} = 1},
Now we want to factor a monic polynomial f ∈ Fq [x] with deg f = n, and have
a divisor d ∈ N of n so that each irreducible factor of f has degree d. There are
r = n/d such factors, and we can write f = f_1 · · · f_r with distinct monic irreducible f_1, ..., f_r ∈ F_q[x]. We may assume that r ≥ 2; otherwise, we know that f is irreducible. Since gcd(f_i, f_j) = 1 for i ≠ j, we have the ring isomorphism of the Chinese Remainder Theorem 5.3:
   χ: R = F_q[x]/⟨f⟩ −→ R_1 × · · · × R_r,   where   F_{q^d} ≅ R_i = F_q[x]/⟨f_i⟩ ⊇ F_q.
We use the convention that for a ∈ Fq [x], we have a mod f ∈ R and χ(a mod f ) =
(a mod f1 , . . . , a mod fr ) = (χ1 (a), . . . , χr (a)), where χi (a) = a mod fi ∈ Ri . For
a ∈ Fq [x] and i ≤ r, we have that fi divides a if and only if χi (a) = 0. If we obtain
an a ∈ Fq [x] with some χi (a) equal to zero and others nonzero, then gcd(a, f ) is a
nontrivial divisor of f . We now describe a probabilistic procedure to find such a
splitting polynomial a.
2. g_1 ←− gcd(a, f)
   if g_1 ≠ 1 then return g_1
4. g_2 ←− gcd(b − 1, f)
   if g_2 ≠ 1 and g_2 ≠ f then return g_2 else return "failure"
THEOREM 14.9.
Algorithm 14.8 works correctly as specified. It returns "failure" with probability less than 2^{1−r} ≤ 1/2, where r = n/d ≥ 2, and takes O((d log q + log n) M(n)) or O~(n^2 log q) operations in F_q.
PROOF. The failure probability has been given above as 2^{1−r} if gcd(a, f) = 1. For general a, where step 2 might find a factor, the failure probability is less than 2^{1−r}. The cost for the gcds in steps 2 and 4 is O(M(n) log n), and computing b in step 3 takes at most 2 log_2(q^d) ∈ O(d log q) multiplications modulo f or O(M(n) d log q) operations in F_q. ✷
The usual trick of running the algorithm k times makes the failure probability less than 2^{(1−r)k} ≤ 2^{−k}.
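The following Python sketch shows one splitting attempt in the spirit of Algorithm 14.8 over a prime field F_p with p odd; the random choice of a and the powering b = a^{(p^d − 1)/2} rem f correspond to the steps not reproduced above, so their exact formulation here is this sketch's assumption. The polynomial helpers are the same schoolbook routines as in the distinct-degree sketch of Section 14.2, repeated so that the sketch is self-contained.

import random

def trim(a):
    while len(a) > 1 and a[-1] == 0:
        a.pop()
    return a

def polydivmod(a, b, p):
    r = list(a)
    q = [0] * max(1, len(a) - len(b) + 1)
    inv = pow(b[-1], p - 2, p)
    for k in range(len(a) - len(b), -1, -1):
        q[k] = r[k + len(b) - 1] * inv % p
        for j in range(len(b)):
            r[k + j] = (r[k + j] - q[k] * b[j]) % p
    return trim(q), trim(r[:len(b) - 1] or [0])

def polygcd(a, b, p):
    a, b = trim(list(a)), trim(list(b))
    while b != [0]:
        a, b = b, polydivmod(a, b, p)[1]
    inv = pow(a[-1], p - 2, p)
    return [c * inv % p for c in a]

def polymulmod(a, b, f, p):
    prod = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] = (prod[i + j] + ai * bj) % p
    return polydivmod(prod, f, p)[1]

def equal_degree_split(f, d, p):
    # one attempt at a proper monic factor of a squarefree monic f over F_p whose
    # irreducible factors all have degree d; returns None on failure
    n = len(f) - 1
    a = [random.randrange(p) for _ in range(n)]      # random a with deg a < n
    if all(c == 0 for c in a):
        return None
    g1 = polygcd(a, f, p)
    if len(g1) > 1:                        # a already shares a factor with f
        return g1
    e = (p ** d - 1) // 2                  # b = a^((p^d - 1)/2) rem f, by repeated squaring
    b, base = [1], trim(list(a))
    while e:
        if e & 1:
            b = polymulmod(b, base, f, p)
        base = polymulmod(base, base, f, p)
        e >>= 1
    b = b + [0] * (n - len(b))
    b[0] = (b[0] - 1) % p                  # g_2 = gcd(b - 1, f)
    g2 = polygcd(b, f, p)
    return g2 if 1 < len(g2) <= n else None

f = [2, 1, 0, 1, 1]        # x^4 + x^3 + x - 1 over F_3, as in the example below
g = None
while g is None:
    g = equal_degree_split(f, 2, 3)
print(g)                   # [1, 0, 1] or [2, 1, 1], i.e. x^2 + 1 or x^2 + x - 1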
   g_1 = gcd(a, f) = gcd(x + 1, x^4 + x^3 + x − 1) = 1,
   b = a^4 rem f = (x + 1)^4 rem (x^4 + x^3 + x − 1) = −1,
   g_2 = gcd(b − 1, f) = gcd(1, f) = 1,
and this choice is unlucky. Our next random choice might be a = x. Then
   g_1 = gcd(a, f) = gcd(x, x^4 + x^3 + x − 1) = 1,
   b = a^4 rem f = x^4 rem (x^4 + x^3 + x − 1) = −x^3 − x + 1,
   g_2 = gcd(b − 1, f) = gcd(−x^3 − x, x^4 + x^3 + x − 1) = x^2 + 1.
The latter is one irreducible factor of f, and the other one is f/(x^2 + 1) = x^2 + x − 1.
FIGURE 14.5: The lucky and unlucky choices for factoring x^4 + x^3 + x − 1 ∈ F_3[x].
Figure 14.5 illustrates the situation. On the left hand side, we have R = F[x]/h f i,
consisting of the 81 polynomials a3 x3 + a2 x2 + a1 x + a0 mod f , with all ai ∈ F3 .
The possible values for a1 x + a0 are along the horizontal axis, and similarly a3 x3 +
a2 x2 along the vertical axis. Our two choices are marked by a •.
We have R ≅ F_3[x]/⟨x^2 + 1⟩ × F_3[x]/⟨x^2 + x − 1⟩ ≅ F_9 × F_9 on the right hand
side, with the nine elements of the first factor on the horizontal axis, and the sec-
ond factor on the vertical axis. We have arranged our two copies of F9 in an
isomorphic way; in particular, mapping x mod x2 + 1 to x + 2 mod x2 + x − 1 gives
an isomorphism, since (x + 2)2 + 1 ≡ 0 mod x2 + x − 1. On both axes, we first
have 0, then the four nonzero squares, and then the four nonsquares.
The lucky choices of a are colored, the unlucky ones white. At right, it is
clear what happens: the 16 blue elements with exactly one coordinate 0 give a
Algorithm 14.8 gives a factorization into two factors. If we need just one irre-
ducible factor, we can apply the algorithm recursively to the smaller factor (Exer-
cise 14.15). However, we will usually want all r factors, and this can be done by
running the algorithm recursively on each factor.
T HEOREM 14.11.
A squarefree polynomial of degree n = rd with r irreducible factors of degree d can
be completely factored with an expected number of O((d log q + log n)M(n) log r)
or O∼ (n2 log q) operations in Fq .
We now show that the expected depth of the tree is O(log r), which together
with r ≤ n implies the claims. Let 1 ≤ i < j ≤ r be fixed. Then in Algorithm 14.8,
the probability that a mod gi and a mod g j are neither both squares nor both non-
squares is at least 1/2, by the Chinese Remainder Theorem. Thus for each level
of the tree, the probability that a call to Algorithm 14.8 separates gi and g j at that
level is at least 1/2 (if they were not already separated before). Hence the prob-
ability that gi and g j are not yet separated at depth k is at most 2−k . This is true
for any pair of irreducible factors of f , and since there are (r2 − r)/2 ≤ r2 such
pairs, the probability pk that not all irreducible factors are separated at depth k is at
most r2 2−k . This is the probability that the depth of the tree is greater than k, and
pk−1 − pk is the probability that the depth is exactly k. Let s = ⌈2 log2 r⌉. Then the
expected depth of the tree is
∑_{k≥1} k(p_{k−1} − p_k) = ∑_{k≥0} p_k = ∑_{0≤k<s} p_k + ∑_{k≥s} p_k ≤ ∑_{0≤k<s} 1 + ∑_{k≥s} r^2 2^{−k}
 = s + r^2 2^{−s} ∑_{k≥0} 2^{−k} ≤ s + 2 ∈ O(log r). ✷
E XAMPLE 14.12. Suppose that we want to find all the irreducible factors fi of
f = f0 · · · f9 ∈ Fq [x], where the fi are monic, irreducible, pairwise distinct, and
have the same degree d.
FIGURE 14.6: The workings of the equal-degree factorization algorithm 14.10 in Example 14.12.
When q is large enough, there is a way to replace almost all powerings with
exponent (qd − 1)/2 in step 3 of Algorithm 14.8 by cheaper powerings with expo-
nent (q − 1)/2, leading to an expected time of O(d M(n) log q + M(n) log(qn) log r)
operations in Fq for a variant of Algorithm 14.10 (Exercise 14.17).
1. h_0 ←− x, v_0 ←− f/lc(f), i ←− 0, U ←− Ø
repeat
2. i ←− i + 1
{ one distinct-degree factorization step }
call the repeated squaring algorithm 4.8 in R = Fq [x]/h f i to compute
h_i = h_{i−1}^q rem f
g ←− gcd(hi − x, vi−1 )
3. if g 6= 1 then
{ equal-degree factorization }
call Algorithm 14.10 with input g and i to compute the monic
irreducible factors g1 , . . . , gs ∈ Fq [x] of g
4. vi ←− vi−1
{ determine multiplicities }
for j = 1, . . . , s do
e ←− 0
while g_j | v_i do v_i ←− v_i/g_j, e ←− e + 1
U ←− U ∪ {(g j , e)}
5. until vi = 1
6. return U
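To make the loop structure of steps 1, 2 and 5 concrete, the following sketch (ours) runs the distinct-degree part of the loop for a squarefree monic input over a prime field F_p, reusing the helpers trim, pmul, pmod, pgcd and ppow from the sketch in Section 14.3; the equal-degree refinement of step 3 and the multiplicity bookkeeping of step 4 are omitted.

```python
def pdiv(a, b, p):                     # quotient of a by b over F_p (assumes b | a)
    a, q = list(a), [0] * max(len(a) - len(b) + 1, 0)
    inv = pow(b[-1], -1, p)
    while len(a) >= len(b):
        s, c = len(a) - len(b), a[-1] * inv % p
        q[s] = c
        for i, y in enumerate(b):
            a[s + i] = (a[s + i] - c * y) % p
        trim(a)
    return trim(q)

def distinct_degree_parts(f, p):
    """For a squarefree monic f in F_p[x], return the pairs (i, g) with g the
    product of all monic irreducible factors of f of degree i -- the
    polynomials handed to equal-degree factorization in step 3."""
    parts, h, v, i = [], [0, 1], list(f), 0        # h_0 = x, v_0 = f
    while len(v) > 1:                              # until v_i = 1
        i += 1
        h = ppow(h, p, v, p)       # h_i = h_{i-1}^p rem v_i; reducing modulo the
                                   # divisor v_i of f still keeps h_i = x^(p^i) mod v_i
        hmx = list(h) + [0] * max(2 - len(h), 0)
        hmx[1] = (hmx[1] - 1) % p                  # h_i - x
        g = pgcd(trim(hmx), v, p)                  # product of the degree-i factors
        if len(g) > 1:
            parts.append((i, g))
            v = pdiv(v, g, p)                      # remove them from v
    return parts

# f = (x + 1)(x^2 + x + 1)(x^3 + x + 1) = x^6 + x^4 + x + 1 over F_2
print(distinct_degree_parts([1, 1, 0, 0, 1, 0, 1], 2))
# -> [(1, [1, 1]), (2, [1, 1, 1]), (3, [1, 1, 0, 1])]
```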
T HEOREM 14.14.
Algorithm 14.13 correctly computes the irreducible factorization of f . If deg f = n,
then it takes an expected number of O(n M(n) log(qn)) or O∼ (n2 log q) arithmetic
operations in Fq .
PROOF. Let f = lc(f) ∏_{1≤i≤r} f_i^{e_i} be the irreducible factorization of f, with distinct monic irreducible polynomials f_1, . . . , f_r ∈ F_q[x] and positive integers e_1, . . . , e_r. We prove that the invariants
h_i ≡ x^{q^i} mod f,   v_i = ∏_{deg f_k > i} f_k^{e_k}
hold each time before the algorithm passes through step 2. The first invariant is
shown as in the proof of Theorem 14.4. The second one is clear for i = 0, and we
may assume that i ≥ 1. By Theorem 14.2, x^{q^i} − x is the product of all distinct monic irreducible polynomials in F_q[x] of degree dividing i, and hence in particular it is squarefree. Thus, since v_{i−1} | f and by the induction hypothesis, the polynomial
g = gcd(h_i − x, v_{i−1}) = gcd(x^{q^i} − x, v_{i−1}) = ∏_{deg f_k = i} f_k
of the algorithm, takes an expected number of O((i log q + log mi )M(mi ) log(mi /i))
operations, by Theorem 14.11. Now
i log(m_i/i) = m_i · log(m_i/i)/(m_i/i) ≤ m_i,
and hence
∑_i (i log q + log m_i) M(m_i) log(m_i/i) ≤ ∑_i (m_i log q + log^2 m_i) M(n) ∈ O(n M(n) log q),
where we have used ∑i mi ≤ n and log is the binary logarithm. If e j denotes the
multiplicity of g j in f , then one execution of the body of the for loop in step 4
takes O(e j M(n)) operations in Fq , and the overall cost for step 4 is O(n M(n))
operations, since the sum of the multiplicities of all irreducible factors of f is at
most n. The timing estimate for step 2 is dominant, and the claim follows. ✷
We note that this is the same time bound as for the distinct-degree factorization
algorithm 14.3 with a squarefree input.
We now have the central result of this chapter: a complete factorization algo-
rithm over finite fields in polynomial time. In the next sections, we study the
problem in greater depth, discussing different (and faster) algorithms and various
applications.
C OROLLARY 14.16.
Given f ∈ Fq [x] of degree n, we can find all roots of f in Fq using an expected
number of O(M(n) log n log(nq)) or O∼ (n log q) operations in Fq .
Algorithm 14.15 can be used to find all integral roots of a polynomial f ∈ Z[x]
in a modular fashion, as follows.
1. B ←− 2n(A2 + A)
let p ∈ N be an odd prime between B + 1 and 2B
2. call Algorithm 14.15 to find all distinct roots {u1 mod p, . . . , ur mod p} in
F p of f mod p, with ui ∈ Z and |ui | < p/2 for all i
T HEOREM 14.18.
Algorithm 14.17 correctly computes all integral roots of f . The cost for step 2 is
or O∼ (n log2 A) word operations, and the cost for step 3 per ui is O(n M(log(nA)))
or O∼ (n log A) word operations.
The cost for finding p is discussed in Section 18.4. In Section 15.6, we will
discuss a faster algorithm for computing integer roots.
1. u ←− gcd( f , f ′ )
2. return v = f/u.
T HEOREM 14.20.
Algorithm 14.19 works correctly as specified and takes O(M(n) log n) operations
in F .
PROOF. For the correctness, we note that each e_i f_i′ is nonzero, and by the above
u = ∏_{1≤i≤r} f_i^{e_i − 1},   v = ∏_{1≤i≤r} f_i.
The running time estimate follows from Theorems 9.6 and 11.7. ✷
Now g_j divides each summand with i ≠ j, and gcd(g_j, g_j′) = gcd(g_j, g/g_j) = 1, since F has characteristic zero and g_j and g are squarefree. Thus the claim follows from
gcd(g_j, h − c g′) = gcd(g_j, (c_j − c) g_j′ · g/g_j) = gcd(g_j, c_j − c). ✷
T HEOREM 14.23.
The algorithm uses O(M(n) log n) operations in F and it computes correctly the
squarefree decomposition of f .
for 0 ≤ i ≤ m. This is clear for v_1, and the claim for w_1 follows from
f′ = ∑_{1≤j≤m} (f/g_j) · j g_j′ = u ∑_{1≤j≤m} j g_j′ v_1/g_j.
For i ≥ 1, Lemma 14.22 gives h_i = g_i. Then v_{i+1} = ∏_{i<j≤m} g_j is clear, and
w_{i+1} = ( ∑_{i≤j≤m} (j − (i−1)) g_j′ v_i/g_j − ∑_{i≤j≤m} g_j′ v_i/g_j ) / g_i
        = ∑_{i<j≤m} (j − i) g_j′ v_i/(g_j g_i) = ∑_{i<j≤m} (j − i) g_j′ v_{i+1}/g_j.
For the cost estimate, let d j = deg g j for 1 ≤ j ≤ m. Step 1 takes O(M(n) log n)
arithmetic operations. Moreover, deg vi = ∑i≤ j≤m d j , deg wi = (deg vi ) − 1, the gcd
computation in the ith loop takes O(M(deg vi ) log n), and the two division steps
O(M(deg vi )) operations in F. Using the superlinearity of M (Section 8.3), we find
∑_{1≤i≤m} M(deg v_i) ≤ M( ∑_{1≤i≤m} deg v_i ) = M( ∑_{1≤i≤j≤m} d_j ) = M( ∑_{1≤i≤m} i d_i ) = M(n). ✷
EXAMPLE 14.24. Suppose that f = a b c^2 d^4 for monic distinct irreducible polynomials a, b, c, d ∈ F[x]. Then Algorithm 14.21 computes u = gcd(f, f′) = c d^3,
An interesting possibility may occur if char F = p for a prime p, which does not happen if char F = 0: f = ∑_{0≤i≤n} a_i x^i ∉ F and f′ = 0. This happens if and only if each i with a_i ≠ 0 is divisible by p; then the summand i a_i x^{i−1} is zero in F[x]. If F = F_p, then we can write
f = ∑_{0≤i≤n/p} a_{ip} x^{ip} = ( ∑_{0≤i≤n/p} a_{ip} x^i )^p,   (3)
since (g + h)^p = g^p + h^p for all g, h ∈ F_p[x] and a_{ip}^p = a_{ip} for all a_{ip} ∈ F_p (Section 25.4). For example, (x^4 + x^2 + 1)′ = 0 in F_2[x], and x^4 + x^2 + 1 = (x^2 + x + 1)^2.
Similarly, if F = F_q for a prime power q = p^s and s ≥ 1, then Fermat's little theorem 14.1 says that a^q = a for all a ∈ F_q, and hence a^{p^{s−1}} = a^{q/p} is a pth root of a. Then for g = ∑_{0≤i≤n/p} a_{ip}^{q/p} x^i, we have f = g^p, in analogy to (3). On the other hand, if f = g^p, then f′ = p g^{p−1} g′ = 0, and thus
f′ = 0 if and only if f is a pth power in F_q[x].   (4)
C OROLLARY 14.25.
Let F be a finite field or a field of characteristic zero and f ∈ F[x] nonconstant.
Then f is squarefree if and only if gcd( f , f ′ ) = 1.
Exercises 14.27 and 14.30 discuss squarefree factorization over finite fields.
1. γ_0 ←− ξ, γ_1 ←− ξ^q, l ←− ⌈log_2 d⌉
2. for i = 1, . . . , l do
       call the fast multipoint evaluation algorithm 10.7 over R to compute
       γ_{2^{i−1}+j} = γ̌_{2^{i−1}}(γ_j) for 1 ≤ j ≤ 2^{i−1}
3. call the fast multipoint evaluation algorithm 10.7 over R to compute δ_k = α̌(γ_k) for 0 ≤ k ≤ d
4. return δ_0, . . . , δ_d
We note that the input ξ q is not required for the correctness of the algorithm, but
we need it for the running time bound.
T HEOREM 14.27.
Algorithm 14.26 works correctly as specified and uses O(M(n)2 log n log d) or
O∼ (n2 ) operations in Fq .
PROOF. For the correctness, we prove the invariant γ_k = ξ^{q^k} for 0 ≤ k ≤ 2^i by induction on i. The case i = 0 is clear from step 1. For the inductive step, it is sufficient to prove the claim for k > 2^{i−1}. For 1 ≤ j ≤ 2^{i−1}, we have that
γ_{2^{i−1}+j} = γ̌_{2^{i−1}}(γ_j) = γ̌_{2^{i−1}}(ξ^{q^j}) = (γ̌_{2^{i−1}}(ξ))^{q^j} = γ_{2^{i−1}}^{q^j} = ξ^{q^{2^{i−1}+j}},
by step 2, (5), (6), and the induction hypothesis. Finally, in step 3 we correctly compute
δ_k = α̌(γ_k) = α̌(ξ^{q^k}) = (α̌(ξ))^{q^k} = α^{q^k}
for 0 ≤ k ≤ d.
By Corollary 10.8, the polynomials γ̌_{2^{i−1}} and α̌ in R[x] of degree less than n can be evaluated at no more than n ring elements using at most (11/2 · M(n) + O(n)) log_2 n multiplications and additions in R. In steps 2 and 3 of Algorithm 14.26, we solve l + 1 ∈ O(log d) such multipoint evaluation problems, at a total cost of O(M(n) log n log d) operations in R, or O(M(n)^2 log n log d) operations in F_q. ✷
operations in Fq are sufficient, but since this saves only factors log n when d = n,
we omit the proof.
The process of the iterated Frobenius algorithm can be illustrated as follows:
ξ^{q^1}   ξ^{q^2}   ξ^{q^3}  ξ^{q^4}   ξ^{q^5}  · · ·  ξ^{q^8}   ξ^{q^9}  · · ·  ξ^{q^16}   . . .
          |__i=1__| |_____i=2_____|    |______i=3______|         |_______i=4_______|
The ith brace encloses those powers of ξ that are newly computed in the ith iter-
ation of step 2. The advantage of the iterated Frobenius algorithm over the naïve
k
successive computation of the ξ q might be compared to the advantage of repeated
squaring for the computation of one power an over repeated multiplication.
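Here is a small sketch (ours) of the doubling idea for a prime field, reusing the F_p helpers from the sketch in Section 14.3 together with a Horner-style modular composition; it computes γ_k = x^{q^k} rem f for k ≤ d and checks the result against direct repeated squaring.

```python
def padd_const(a, c, p):               # a + c for a constant c
    a = list(a) if a else [0]
    a[0] = (a[0] + c) % p
    return trim(a)

def pcompose(a, b, m, p):              # a(b) rem m by Horner's rule
    r = []
    for c in reversed(a):
        r = padd_const(pmod(pmul(r, b, p), m, p), c, p)
    return r

def iterated_frobenius(f, q, d, p):
    """gamma[k] = x^(q^k) rem f for k = 0..d, computed with O(log d) rounds of
    modular composition; each round doubles the number of known powers,
    using gamma_{h+j} = gamma_h(gamma_j) as in the proof of Theorem 14.27."""
    gamma = [[0, 1], ppow([0, 1], q, f, p)]          # gamma_0 = x, gamma_1 = x^q rem f
    while len(gamma) <= d:
        h = len(gamma) - 1                           # gamma_0 .. gamma_h are known
        gamma += [pcompose(gamma[h], gamma[j], f, p) for j in range(1, h + 1)]
    return gamma[:d + 1]

p = q = 5
f = [3, 0, 1, 0, 1, 1]                 # x^5 + x^4 + x^2 + 3 over F_5 (arbitrary monic)
gam = iterated_frobenius(f, q, 4, p)
assert all(gam[k] == ppow([0, 1], q**k, f, p) for k in range(5))
print("x^(q^k) rem f agrees with direct repeated squaring for k = 0..4")
```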
Algorithm 14.26 can be used for distinct-degree factorization as well as for
equal-degree factorization. Remember that in Algorithm 14.3, we had to compute
x^{q^i} − x mod f = ξ^{q^i} − ξ = γ_i − ξ
for 1 ≤ i ≤ n. This can be done by first computing ξ^q using repeated squaring and
then applying steps 1 and 2 of the iterated Frobenius algorithm with d = n. The cost
for the other steps in the distinct-degree factorization algorithm is dominated by
the cost for the iterated Frobenius, and using (7), we have the following corollary.
C OROLLARY 14.28.
The distinct-degree decomposition of a squarefree polynomial f ∈ Fq [x] of degree
n can be computed using O(M(n2 ) log n + M(n) log q) or O∼ (n2 + n log q) opera-
tions in Fq .
In the equal-degree factorization of Algorithm 14.8, we compute α^{(q^d − 1)/2} for a uniform random element α ∈ R = F_q[x]/⟨f⟩, where d is the degree of any of the irreducible factors of f. The exponent can be written as
(q^d − 1)/2 = (q^{d−1} + q^{d−2} + · · · + q + 1) · (q − 1)/2,
and hence
α^{(q^d − 1)/2} = (α^{q^{d−1}} · · · α^q · α)^{(q−1)/2} = (δ_{d−1} · · · δ_1 · δ_0)^{(q−1)/2},
which can be computed using the iterated Frobenius algorithm, and repeated squaring for the computation of the initial power ξ^q and the final ((q − 1)/2)th power. Again
using (7), we have the following result.
C OROLLARY 14.29.
The complete factorization of a squarefree polynomial f ∈ Fq [x] of degree n = rd
with r irreducible factors of degree d can be computed using an expected number
of O((M(nd)r log d + M(n) log q) log r) or O∼ (n2 + n log q) operations in Fq .
Similarly as for the equal-degree factoring algorithm 14.10, one finds a slightly
better estimate of O(M(nd)r log d + M(n) log r log q) arithmetic operations in Fq
(Exercise 14.17), or even O(M(nd)r + M(n) log q) for finding only one irreducible
factor. By replacing steps 2 and 3 in Algorithm 14.13 with the two algorithms
for distinct-degree and equal-degree factorization presented above, we obtain the
following result.
C OROLLARY 14.30.
A polynomial f ∈ Fq [x] of degree n can be completely factored with an expected
number of O(M(n2 ) log n + M(n) log n log q) or O∼ (n2 + n log q) operations in Fq .
R ≅ F_q[x]/⟨f_1⟩ × · · · × F_q[x]/⟨f_r⟩.   (8)
As in Section 14.3, each F_q[x]/⟨f_i⟩ is a finite field with q^{deg f_i} elements and contains F_q as a subfield (the constant polynomials modulo f_i). Now for a ∈ F_q[x], we have
[Figure: the Berlekamp algebra B, a copy of F_q × · · · × F_q sitting inside F_q[x]/⟨f⟩ ≅ F_q[x]/⟨f_1⟩ × F_q[x]/⟨f_2⟩ × · · · × F_q[x]/⟨f_r⟩.]
The matrix Q ∈ F_q^{n×n} representing the Frobenius map σ with respect to the polynomial basis x^{n−1} mod f, . . . , x mod f, 1 mod f of R was first used in Petr (1937) for distinct-degree factorization, and has been a staple of computer algebra
since Berlekamp’s work. We will call it the Petr-Berlekamp matrix of f . Now
Berlekamp’s algorithm first determines a basis b1 mod f , . . . , br mod f of B using
Gaussian elimination on Q − I. We note that
Now we assume for simplicity that q is odd (see Exercise 14.16 for character-
istic 2), and let b = c1 b1 + · · · + cr br be a uniformly random linear combination
of basis elements, with c1 , . . . , cr ∈ Fq chosen independently, so that b mod f is
a uniform random element of B . We now employ the same (q − 1)/2 trick as
in the equal-degree factorization. The b mod fi are uniformly and independently
distributed random elements of Fq for 1 ≤ i ≤ r. Hence, if no fi divides b, then
b(q−1)/2 ≡ ±1 mod fi , and both possibilities occur with probability 1/2 and inde-
pendently for all i, by Lemma 14.7. This yields the following Las Vegas algorithm.
1. call the repeated squaring algorithm 4.8 to compute x^q rem f
2. compute the Petr-Berlekamp matrix Q ∈ F_q^{n×n} of f
3. compute a basis b_1 mod f, . . . , b_r mod f of the Berlekamp algebra B, using Gaussian elimination on Q − I
4. choose c_1, . . . , c_r ∈ F_q uniformly at random and set a ←− c_1 b_1 + · · · + c_r b_r
5. g_1 ←− gcd(a, f)
   if g_1 ≠ 1 and g_1 ≠ f then return g_1
6. call the repeated squaring algorithm 4.8 in R = F_q[x]/⟨f⟩ to compute b = a^{(q−1)/2} rem f
7. g_2 ←− gcd(b − 1, f)
   if g_2 ≠ 1 and g_2 ≠ f then return g_2 else return "failure"
T HEOREM 14.32.
Algorithm 14.31 works correctly as specified and returns “failure” with probability
at most 1/2. It uses O(nω + M(n) log q) operations in Fq if ω > 2.
P ROOF. Correctness is clear from the discussion preceding the algorithm. In order
to analyze the failure probability, we note that a is a uniformly random element
of B , so that ui ≡ a mod fi for 1 ≤ i ≤ r are independent random elements of Fq
(via its embedding in Fq [x]/h f i). If some ui is zero and some u j nonzero, a factor is
returned in step 5. With probability q^{−r}, all u_i's are zero. All u_i's are nonzero with probability (1 − q^{−1})^r, and then each v_i = u_i^{(q−1)/2} is 1 or −1 with probability 2^{−1} for either case, and all v_i's are equal with probability 2 · 2^{−r}. Thus failure occurs
in step 7 with probability t = q^{−r} + (1 − q^{−1})^r · 2^{−r+1} < 2^{−1}, since r ≥ 2: the last inequality holds for r = 2, and t is monotonically decreasing in r.
The cost for step 1 is O(M(n) log q) field operations. Step 2 uses n − 2 multi-
plications modulo f , or O(n M(n)) operations in Fq . The cost for step 3 is O(nω ),
by Section 12.1. This dominates the cost for step 2, the O(nr) field operations for
step 4, and the O(M(n) log n) for the gcds in steps 5 and 7. Finally, step 6 uses
another O(M(n) log q) field operations. ✷
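The linear algebra is easy to reproduce for a prime field. The sketch below (ours, reusing the F_p helpers from Section 14.3's sketch) builds the matrix of the Frobenius map in the basis 1, x, . . . , x^{n−1} (the reverse of the ordering used in the text, which does not change the rank) and returns n − rank(Q − I), the number r of monic irreducible factors of a squarefree f.

```python
def number_of_factors(f, p):
    """For squarefree monic f in F_p[x] of degree n: the number of monic
    irreducible factors, computed as n - rank(Q - I), i.e. the dimension of
    the Berlekamp algebra."""
    n = len(f) - 1
    xq = ppow([0, 1], p, f, p)                  # x^p rem f
    rows, power = [], [1]                       # power = x^(i*p) rem f for i = 0, 1, ...
    for i in range(n):
        row = list(power) + [0] * (n - len(power))
        row[i] = (row[i] - 1) % p               # coordinates of sigma(x^i) - x^i
        rows.append(row)
        power = pmod(pmul(power, xq, p), f, p)
    # Gaussian elimination over F_p to compute the rank of Q - I.
    rank, m = 0, [list(r) for r in rows]
    for c in range(n):
        piv = next((r for r in range(rank, n) if m[r][c]), None)
        if piv is None:
            continue
        m[rank], m[piv] = m[piv], m[rank]
        inv = pow(m[rank][c], -1, p)
        m[rank] = [v * inv % p for v in m[rank]]
        for r in range(n):
            if r != rank and m[r][c]:
                m[r] = [(a - m[r][c] * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return n - rank

# f = (x^2 + 1)(x^2 + x + 2) = x^4 + x^3 + x + 2 over F_3, two irreducible factors:
print(number_of_factors([2, 1, 0, 1, 1], 3))    # -> 2
```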
4. g1 ←− gcd(a, f )
if g1 6= 1 and g1 6= f then return g1
5. call the repeated squaring algorithm 4.8 in R = Fq [x]/h f i to compute b =
a(q−1)/2 rem f
6. g2 ←− gcd(b − 1, f )
if g2 6= 1 and g2 6= f then return g2 else return “failure”
The analyses in Kaltofen & Lobo (1994) and Kaltofen & Shoup (1998) imply
the following.
T HEOREM 14.34.
Algorithm 14.33 works correctly as specified, returns “failure” with probability at
most 1/2 if q ≥ 4n, and takes O(M(n2 ) log n + M(n) log q) operations in Fq . If the
algorithm is used recursively to factor f completely, then the expected recursion
depth is O(log p n · log r), where p = char Fq and r is the number of irreducible
factors of f .
Figure 14.9 illustrates how the asymptotic running times of four factorization
algorithms depend on the relation between the two independent parameters n, the
degree of the input polynomial, and log2 q. The unit of time is log2 q bit opera-
tions, which is the cost of one operation in Fq . We ignore factors no(1) . The figure,
based on a similar one in Kaltofen & Shoup (1998), abstracts a three-dimensional
picture of the running time as a function of n and log2 q into a two-dimensional
figure with two logarithmic axes x and y, where log2 q and the time are about nx
and ny , respectively. The figure pictures Berlekamp’s classical algorithm 14.31, the
method of Cantor & Zassenhaus (Algorithms 14.3 and 14.10), the iterated Frobe-
nius algorithm of von zur Gathen & Shoup (Corollary 14.30), and the subquadratic
algorithm of Kaltofen & Shoup (1998), incorporating Huang & Pan’s (1998) fast
rectangular matrix multiplication. A derivation of the (rounded) numerical value
in Figure 14.9 for the latter is given in Notes 14.8. Huang & Pan (1998) present an-
other algorithm whose running time, corresponding to x+1.80535, beats the others
for x ≤ 0.00173; its graph is virtually indistinguishable for these small values from
the lower left segment. Finally, Kedlaya & Umans (2009) have applied their fast
modular composition algorithm, mentioned at the end of Section 12.2, to obtain
a factoring algorithm using an expected number of O(n1.5+o(1) log q + n log2 q) of
bit operations. They achieve this by moving out of the algebraic model of us-
ing only operations in Fq . Their operations count amounts to about n1.5 + n log q
Fq -equivalent operations. Each of these six algorithms is asymptotically faster
than previously known methods for some choices of n and q.
Computationally, (fast) polynomial factorization over a finite field is a much
more advanced task than, for example, multiplication or even gcd computation.
[Figure 14.9: asymptotic running times of five factorization algorithms, plotted as time ∈ O∼(n^y) against log_2 q = n^x for 0 ≤ x ≤ 1.376: Berlekamp (1970), Cantor & Zassenhaus (1981), von zur Gathen & Shoup (1992), Kaltofen & Shoup (1998), and Kedlaya & Umans (2009); the plotted exponents include y = 2.376, y = x + 2, y = 2, y = 0.41565x + 1.80636, and y = 1.5.]
C OROLLARY 14.35.
A polynomial f ∈ F_q[x] of degree n ≥ 1 is irreducible if and only if
(i) f divides x^{q^n} − x, and
(ii) gcd(x^{q^{n/t}} − x, f) = 1 for all prime divisors t of n.
[Diagram (labels only): multiplication, division with remainder, explicit linear algebra, black-box linear algebra, minimal polynomial, iterated Frobenius; algorithms: Cantor & Zassenhaus, Berlekamp, Niederreiter.]
P ROOF. It follows immediately from Theorem 14.2 that f satisfies the two condi-
tions if it is irreducible. Conversely, if (i) holds, then Theorem 14.2 implies that
the degree of any irreducible factor of f divides n. Let g be such an irreducible
factor, and suppose that d = deg g < n. Then d divides n/t for some prime factor t of n, and hence g | x^{q^{n/t}} − x. This contradicts (ii), and we conclude that d = n and
f is irreducible. ✷
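For a prime field, the criterion can be checked directly. The sketch below (ours, reusing the F_p helpers from Section 14.3's sketch) is a naïve implementation that recomputes x^{q^e} rem f by plain repeated squaring, rather than by the modular composition trick of Algorithm 14.36 analyzed next.

```python
def psub(a, b, p):                        # difference of two polynomials over F_p
    m = max(len(a), len(b))
    return trim([((a[i] if i < len(a) else 0) - (b[i] if i < len(b) else 0)) % p
                 for i in range(m)])

def is_irreducible(f, p):
    """Irreducibility test from the corollary above, for monic f over F_p:
    f of degree n is irreducible iff f | x^(p^n) - x and
    gcd(x^(p^(n/t)) - x, f) = 1 for every prime t dividing n."""
    n, x = len(f) - 1, [0, 1]
    xr = pmod(x, f, p)                                   # x rem f
    frob = lambda e: psub(ppow(x, p**e, f, p), xr, p)    # x^(p^e) - x rem f
    if frob(n):                                          # condition (i): must vanish
        return False
    primes = [t for t in range(2, n + 1)
              if n % t == 0 and all(t % s for s in range(2, int(t**0.5) + 1))]
    return all(len(pgcd(frob(n // t), f, p)) == 1 for t in primes)

print(is_irreducible([1, 1, 0, 0, 1], 2))   # x^4 + x^3 + 1 over F_2         -> True
print(is_irreducible([0, 1, 0, 0, 1], 2))   # x^4 + x = x(x+1)(x^2+x+1)      -> False
```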
4. return “irreducible”
T HEOREM 14.37.
Algorithm 14.36 correctly decides whether the input polynomial is irreducible. It
can be implemented so as to use O(M(n) log q + (n(ω+1)/2 + n1/2 M(n))δ (n) log n)
or O∼ (n(ω+1)/2 + n log q) operations in Fq .
PROOF. Correctness follows from Corollary 14.35. The cost for computing x^q rem f in step 1 is O(M(n) log q) field operations. To compute s_m = x^{q^m} rem f for some m ∈ N, we employ the polynomial representation (6) of the Frobenius map, noting that
x^{q^{i+j}} mod f = ξ^{q^{i+j}} = (s_i(ξ))^{q^j} = s_i(ξ^{q^j}) = s_i(s_j(ξ)) = s_i(s_j) mod f
for all i, j. Thus x^{q^m} rem f can be computed from x^q rem f in a "repeated squaring"
fashion along the binary representation of m, taking O(log m) modular composition
steps of the form si (s j ) rem f . By Theorem 12.4, this can be done at a total cost of
O((n(ω+1)/2 + n1/2 M(n)) log m) operations in Fq , dominating the cost for the gcd
in step 3. The total number of times we have to compute some sm is 1 + δ (n), and
the claim follows since m ≤ n in all those cases. ✷
With the current world record ω < 2.373 (Section 12.1), we have (ω + 1)/2 <
1.687. The iterated Frobenius algorithm for distinct-degree factorization can be
used for testing irreducibility and takes O∼ (n2 + n log q) operations in Fq (Corol-
lary 14.28). A third irreducibility test (for a squarefree polynomial) is given by (9);
it is sufficient to compute the rank of Q − I, taking O(nω + M(n) log q) field oper-
ations.
Comparing the three tests and using classical matrix arithmetic, where ω = 3, the
first two give the same soft-Oh estimate n2 + n log q, but the Oh-bound shows that
the test 14.36 is faster: n2 δ (n) log n versus M(n2 ) log n, for small q. The n3 estimate
for the third method is in a different league. When we take ω < 3, say ω = 2.373
(Section 12.1), then the estimate for Algorithm 14.36 shrinks to only O∼ (n1.687 +
n log q). The run time of a method by Kedlaya & Umans (2009) corresponds, as
explained at the end of the previous section, to only n1.5+o(1) + O∼ (n log q) opera-
tions in Fq .
Now that we know how to test a polynomial for irreducibility, it is natural to
ask how to find irreducible polynomials. This is used to construct finite extension
fields of finite fields and in modular algorithms. The following result tells us how
frequently irreducible polynomials occur among arbitrary polynomials.
L EMMA 14.38. Let q be a prime power and n ∈ N≥1 . Then the number I(n, q) of
monic irreducible polynomials of degree n in Fq [x] satisfies
(q^n − 2q^{n/2})/n ≤ I(n, q) ≤ q^n/n.
In particular, if q^n ≥ 16, then the probability p_n for a uniformly random monic polynomial of degree n to be irreducible satisfies
1/(2n) ≤ (1/n)(1 − 2/q^{n/2}) ≤ p_n ≤ 1/n.
∑_{d|n, d<n} deg f_d ≤ ∑_{1≤d≤n/2} deg f_d ≤ ∑_{1≤d≤n/2} q^d < (q^{n/2+1} − 1)/(q − 1) ≤ 2q^{n/2},
1/n ≥ I(n, q)/q^n ≥ (1/n)(1 − 2q^{−n/2}) ≥ 1/(2n)
In fact, the probability is close to 1/n when q^n is not too small. The precise formula
n · I(n, q) = ∑_{d|n} µ(n/d) q^d
can be found by using a well-known number theoretic tool called Möbius inver-
sion (Exercise 14.46). Here
1 if n = 1,
µ(n) = (−1)k if n is the product of k distinct primes, (11)
0 if n is not squarefree,
is the Möbius function, defined for positive integers n. The first few values of
µ are listed in Section 17.4. Table 14.11 tabulates the values of I(n, q) for some
small values of n and q.
n  | q=2 | q=3  | q=4     | q=5     | q=7        | q=8         | q=9
2  | 1   | 3    | 6       | 10      | 21         | 28          | 36
3  | 2   | 8    | 20      | 40      | 112        | 168         | 240
4  | 3   | 18   | 60      | 150     | 588        | 1008        | 1620
5  | 6   | 48   | 204     | 624     | 3360       | 6552        | 11 808
6  | 9   | 116  | 670     | 2580    | 19 544     | 43 596      | 88 440
7  | 18  | 312  | 2340    | 11 160  | 117 648    | 299 592     | 683 280
8  | 30  | 810  | 8160    | 48 750  | 720 300    | 2 096 640   | 5 380 020
9  | 56  | 2184 | 29 120  | 217 000 | 4 483 696  | 14 913 024  | 43 046 640
10 | 99  | 5880 | 104 754 | 976 248 | 28 245 840 | 107 370 900 | 348 672 528
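The Möbius formula is easy to evaluate; the following snippet (ours) reproduces the entries of Table 14.11.

```python
def mobius(n):                       # Moebius function mu(n), cf. (11)
    k, m, d = 0, n, 2
    while d * d <= m:
        if m % d == 0:
            m //= d
            if m % d == 0:
                return 0             # n has a square factor
            k += 1
        d += 1
    if m > 1:
        k += 1
    return -1 if k % 2 else 1

def count_irreducible(n, q):
    """I(n, q) from the formula n * I(n, q) = sum_{d | n} mu(n/d) * q^d."""
    return sum(mobius(n // d) * q**d for d in range(1, n + 1) if n % d == 0) // n

print(count_irreducible(10, 2), count_irreducible(8, 9), count_irreducible(5, 7))
# -> 99 5380020 3360, matching Table 14.11
```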
C OROLLARY 14.39.
For a prime power q and n ∈ N>0 , one can find a uniformly random irreducible
polynomial of degree n in Fq [x] using an expected number of
O(n M(n) log q + (n(ω+3)/2 + n3/2 M(n))δ (n) log n) or O∼ (n(ω+3)/2 + n2 log q)
operations in Fq .
The exponent (ω + 3)/2 is less than 2.688 for the smallest currently known ω .
The following alternative method is somewhat faster.
1. choose f ∈ F_q[x] monic of degree n at random
2. for i = 1, . . . , ⌊n/2⌋ do
       g_i ←− gcd(x^{q^i} − x, f), if g_i ≠ 1 then goto 1
3. return f
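Here is a sketch (ours) of Ben-Or's rejection loop for a prime field, reusing the F_p helpers (trim, psub, pgcd, ppow) from the earlier sketches; a candidate is discarded as soon as some gcd in step 2 is nontrivial, that is, as soon as a factor of degree at most n/2 shows up.

```python
import random

def random_irreducible(n, p):
    """Ben-Or's search for q = p prime: draw random monic f of degree n and
    restart as soon as some gcd(x^(p^i) - x, f) with i <= n/2 is nontrivial."""
    while True:
        f = [random.randrange(p) for _ in range(n)] + [1]    # random monic of degree n
        h, ok = [0, 1], True                                 # h = x^(p^i) rem f
        for i in range(1, n // 2 + 1):
            h = ppow(h, p, f, p)
            if len(pgcd(psub(h, [0, 1], p), f, p)) > 1:      # g_i = gcd(x^(p^i) - x, f)
                ok = False
                break
        if ok:
            return f

print(random_irreducible(6, 5))   # a random monic irreducible polynomial of degree 6 over F_5
```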
By Theorem 14.4, step 2 can be performed with O∼ (n2 log q) field operations,
and Lemma 14.38 would imply a total cost of O∼ (n3 log q), but the following anal-
ysis shows that the actual cost is lower by about one order of magnitude. We state
the following property without proof.
L EMMA 14.41. Let q be a prime power and n ∈ N>0 . The expected value of
the degree of the smallest irreducible factor of a uniformly random polynomial of
degree n in Fq [x] is O(log n).
T HEOREM 14.42.
Ben-Or’s algorithm 14.40 works correctly as specified and takes an expected num-
ber of O(n M(n) log n log(nq)) or O∼ (n2 log q) operations in Fq .
C OROLLARY 14.43.
For a prime power q and n ∈ N, we can construct the extension field Fqn of Fq
using an expected number of O(n M(n) log n log(nq)) or O∼ (n2 log q) operations
in Fq .
For the big prime modular gcd algorithm 6.28 in Fq [x, y], we have to find an
irreducible polynomial which does not divide some unknown resultant r ∈ Fq [y],
on which only a degree bound deg r ≤ m is known.
C OROLLARY 14.44.
Let n ∈ N, q be a prime power, and r ∈ Fq [y] nonzero of degree at most m. Then
we can compute a uniformly random irreducible polynomial f ∈ Fq [y] of degree n
taking O(n M(n) log n log(nq)) or O∼ (n2 log q) operations in Fq , and f does not
divide r with probability at least 1/2 if qn ≥ 2m.
P ROOF. The cost estimate is from Theorem 14.42. There are I(n, q) irreducible
polynomials of degree n, and at most ⌊m/n⌋ of them divide r. Thus the probability
that f does not divide r is
(I(n, q) − ⌊m/n⌋) / I(n, q) ≥ 1 − (m/n) · (n/q^n) = 1 − m/q^n,
Φ_n = ∏_{ω ∈ C a primitive nth root of unity} (x − ω) = ∏_{1≤k<n, gcd(k,n)=1} (x − e^{2πik/n}) ∈ C[x]
Lemma 14.47 below implies that Φn has coefficients in Z. Table 14.12 lists the
first 20 cyclotomic polynomials. We have deg Φn = ϕ(n), where ϕ is Euler’s totient
function (Section 4.2).
x^n − 1 = ∏_{d|n} Φ_d.   (12)
P ROOF. Let ω ∈ C be a zero of xn −1, that is, an nth root of unity. Then ord(ω ) = d
for some divisor d of n, by Lagrange’s theorem (Section 25.1). But this means that
n  | Φ_n                                      | n  | Φ_n
1  | x − 1                                    | 11 | x^10 + x^9 + · · · + x + 1
2  | x + 1                                    | 12 | x^4 − x^2 + 1
3  | x^2 + x + 1                              | 13 | x^12 + x^11 + · · · + x + 1
4  | x^2 + 1                                  | 14 | x^6 − x^5 + x^4 − x^3 + x^2 − x + 1
5  | x^4 + x^3 + x^2 + x + 1                  | 15 | x^8 − x^7 + x^5 − x^4 + x^3 − x + 1
6  | x^2 − x + 1                              | 16 | x^8 + 1
7  | x^6 + x^5 + x^4 + x^3 + x^2 + x + 1      | 17 | x^16 + x^15 + · · · + x + 1
8  | x^4 + 1                                  | 18 | x^6 − x^3 + 1
9  | x^6 + x^3 + 1                            | 19 | x^18 + x^17 + · · · + x + 1
10 | x^4 − x^3 + x^2 − x + 1                  | 20 | x^8 − x^6 + x^4 − x^2 + 1
As examples, we have
Φ_n = ∏_{d|n} (x^d − 1)^{µ(n/d)},
by Möbius inversion of (12), and
Φ_6 = x^2 − x + 1 = (x^6 − 1)(x − 1) / ((x^3 − 1)(x^2 − 1)).
1. f_0 ←− x − 1
2. for i = 1, . . . , r do f_i ←− f_{i−1}(x^{p_i}) / f_{i−1}
T HEOREM 14.49.
Algorithm 14.48 uses O(M(n) log n) arithmetic operations in Z and correctly com-
putes the nth cyclotomic polynomial.
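For small n one can also compute Φ_n over Z directly from formula (12), by dividing x^n − 1 by the cyclotomic polynomials of the proper divisors of n. The following self-contained sketch (ours, and not Algorithm 14.48) does exactly that, with naive integer polynomial division.

```python
def zdiv(a, b):                      # exact quotient a / b over Z (b monic, b | a)
    a, q = list(a), [0] * (len(a) - len(b) + 1)
    while len(a) >= len(b):
        s, c = len(a) - len(b), a[-1]
        q[s] = c
        for i, y in enumerate(b):
            a[s + i] -= c * y
        while a and a[-1] == 0:
            a.pop()
    return q

def cyclotomic(n):
    """Phi_n over Z from (12): Phi_n = (x^n - 1) / prod of Phi_d over the
    proper divisors d of n (a naive recursion, coefficients low degree first)."""
    f = [-1] + [0] * (n - 1) + [1]   # x^n - 1
    for d in range(1, n):
        if n % d == 0:
            f = zdiv(f, cyclotomic(d))
    return f

print(cyclotomic(12))                # -> [1, 0, -1, 0, 1], i.e. x^4 - x^2 + 1 (Table 14.12)
print(cyclotomic(6))                 # -> [1, -1, 1],       i.e. x^2 - x  + 1
```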
Φ_n = ∏_{ω ∈ E a primitive nth root of unity} (x − ω)
holds over any extension E of F containing a primitive nth root of unity, and that
Φn is irreducible over Q (so that (12) is the irreducible factorization of xn − 1
over Q). The following lemma says that the latter is not true over finite fields.
PROOF. First we note that d | ϕ(n) = #Z_n^×, by Lagrange's theorem. Now n divides q^d − 1 = #F_{q^d}^×, and hence F_{q^d} contains a primitive nth root of unity ω (Lemma 8.8). We choose such an ω, and let f ∈ F_q[x] be the unique irreducible factor of Φ_n that has ω as a root. Since f(x^q) = f(x)^q, the element ω^{q^i} is a root of f for all i ∈ N. Now 1, q, q^2, . . . , q^{d−1} are distinct modulo n, the order of ω in F_{q^d}^×, and hence {ω, ω^q, ω^{q^2}, . . . , ω^{q^{d−1}}} are d distinct roots of f. Thus deg f ≥ d. On the other hand, we have F_q[x]/⟨f⟩ ≅ F_q(ω) ⊆ F_{q^d} (Section 25.3), whence deg f ≤ d. Thus deg f = d, and since the choice of ω was arbitrary, this is true for all irreducible factors of Φ_n. ✷
For example, the order of 3 modulo 8 is 2, and in fact Φ8 splits into two irreduc-
ible factors of degree 2 over F3 : x4 + 1 = (x2 + x − 1)(x2 − x − 1).
E XAMPLE 14.51. We take q = 2 and n = 15. Then d = ord15 (2) = 4. The poly-
nomial x15 − 1 factors in F2 [x] as
x^15 − 1 = Φ_15 Φ_5 Φ_3 Φ_1
         = (x^8 − x^7 + x^5 − x^4 + x^3 − x + 1)(x^4 + x^3 + x^2 + x + 1)(x^2 + x + 1)(x − 1)
         = (x^4 + x + 1)(x^4 + x^3 + 1)(x^4 + x^3 + x^2 + x + 1)(x^2 + x + 1)(x + 1).
As predicted by Lemma 14.50, Φ15 splits into two irreducible factors of degree 4,
and Φ5 , Φ3 , and Φ1 remain irreducible. Let β ∈ F16 be a root of x4 + x + 1. Then
β is a primitive 15th root of unity. The roots of the minimal polynomial x4 + x + 1
of β are β , β 2 , β 4 , β 8 .
For i ∈ Z, β i is a primitive lth root of unity, where l = ord(β i ) = n/ gcd(n, i)
(Exercise 8.13). We have ord(β 3 ) = 15/ gcd(3, 15) = 5, so that β 3 is a primitive
5th root of unity. Now ord5 (2) = 4, so that the minimal polynomial of β 3 has pre-
cisely the four roots β 3 , β 6 , β 12 , β 24 = β 9 . Similarly, ord(β 5 ) = 3 and ord3 (2) = 2,
whence β 5 is a primitive third root of unity and its minimal polynomial only has
the two roots β 5 , β 10 . ✸
i ∼ j ⟺ ∃ l ∈ Z: i q^l = j.   (13)
If i ∈ Z_n^×, then the equivalence class of i is the cyclotomic coset i · ⟨q⟩ of the cyclic subgroup ⟨q⟩ of Z_n^×. If d = ord_n(q) and β ∈ F_{q^d} is a primitive nth root of unity,
then the powers β i and β j have the same minimal polynomial if and only if i ∼ j,
as in the example.
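Cyclotomic cosets are cheap to enumerate; the snippet below (ours) lists the q-cyclotomic cosets modulo n for gcd(q, n) = 1 and, for q = 2 and n = 15, recovers the grouping of the powers of β from Example 14.51, whose coset sizes are the degrees of the irreducible factors of x^15 − 1.

```python
def cyclotomic_cosets(n, q):
    """q-cyclotomic cosets modulo n (gcd(q, n) = 1): the classes of relation (13).
    The coset of i collects the exponents j with beta^j a root of the minimal
    polynomial of beta^i."""
    seen, cosets = set(), []
    for i in range(n):
        if i in seen:
            continue
        coset, j = [], i
        while j not in seen:
            seen.add(j)
            coset.append(j)
            j = j * q % n
        cosets.append(coset)
    return cosets

print(cyclotomic_cosets(15, 2))
# -> [[0], [1, 2, 4, 8], [3, 6, 12, 9], [5, 10], [7, 14, 13, 11]]
# The coset sizes 1, 4, 4, 2, 4 are the degrees of the irreducible factors
# of x^15 - 1 over F_2 listed in Example 14.51.
```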
Lemma 14.50 implies that cyclotomic polynomials can be directly factored over
finite fields using equal-degree factorization, without performing either squarefree
or distinct-degree factorization. The cost is O∼ (n2 + n log q) operations in Fq or
O∼ (n2 log q + n log2 q) word operations if p ∤ n. Exercise 14.47 yields an even
faster algorithm for factoring xn − 1, taking only O∼ (n log2 q) word operations,
which can be modified so as to work for Φ_n as well. When p divides n, we have Φ_n = Φ_{n/p}^p in F_q[x].
In Chapter 7, we discussed a class of cyclic codes that are of importance in
modern coding theory, the BCH codes. For a finite field Fq , a primitive nth root of
unity β ∈ Fqd in some extension field of Fq , and a positive integer δ , BCH(q, n, δ )
is the cyclic code (that is, the ideal) in Fq [x]/hxn − 1i generated by g mod xn − 1,
where g ∈ Fq [x] is the least common multiple of the minimal polynomials of
β , β 2 , . . . , β δ−1 over Fq . We now show how to compute such a generator poly-
nomial g.
4. compute β^2, β^3, . . . , β^{δ−1}
5. for k = 1, . . . , t do
       i ←− min S_k,  m ←− #S_k
       compute β^{2i}, . . . , β^{(2m−1)i}
       use Exercise 12.10 to compute the minimal polynomial g_k ∈ F_q[x] of β^i
6. return g_1 · · · g_t
g = g1 g2 g3 = x10 + x8 + x5 + x4 + x2 + x + 1 ∈ F2 [x]
T HEOREM 14.53.
Algorithm 14.52 works correctly as specified and takes O(M(nd)(n/d) log n +
M(n) log q) or O∼ (n2 + n log q) operations in Fq , where d = ordn (q).
Notes. The pioneering works for this area of computer algebra are those of Berlekamp
(1967, 1970), Zassenhaus (1969), and Cantor & Zassenhaus (1981).
14.2 and 14.3. The distinct-degree factorization of Section 14.2 appears in Zassenhaus
(1969), Kempfert (1969), Knuth (1998), already in the 1969 edition, Berlekamp (1970),
and Cantor & Zassenhaus (1981); the latter also contains the equal-degree factorization of
Section 14.3.
Actually the basic algorithms go back almost two centuries. Gauß’ Disquisitiones Gen-
erales de Congruentiis were to appear as Part 8 of his Disquisitiones Arithmeticae, but did
not make it (see page 372). Written in 1797 or 1798, but not quite polished to the Mas-
ter’s usual high gloss, his hand-written notes were published in his Nachlass (Gauß 1863a,
1863b). In article 370, Gauß writes: Sit itaque X functio, quae nullos amplius divisores
1 Let X be a polynomial that has no further multiple divisors. We have seen above that x p − x is the product of
all irreducible polynomials of degree one. If ξ is the greatest common divisor of the polynomials X and x p − x,
then ξ will be the product of all divisors of X of degree one, and X/ξ will not have such factors any more. But
if it is found that the polynomials X and x p − x are coprime, then X will have no divisor of degree one and hence
the congruence X ≡ 0 will have no real [integer] roots. Moreover, since x^{p^2} − x is the product of all irreducible polynomials of degrees two or one, the greatest common divisor ξ′ of x^{p^2} − x and X/ξ will contain all divisors of
X of degree two. Continuing from here, one sees that X will be factored in this manner into factors ξ, ξ ′ , ξ ′′ etc.,
which contain all [irreducible] divisors of degree one, two, three etc., respectively.
2 We do not want to expand on this question, since a skilled calculator, well versed in these principles, will easily
find special tricks when needed.
3 One expands [. . . ] x(A−1)/2 in powers of x less than xn , and sets this value equal to 1 and to −1. [. . . ]
Whenever the original equation has roots of both types [both squares and nonsquares], say p of the first type,
satisfying x(A−1)/2 = 1 and q of the other type, satisfying x(A−1)/2 = −1, the separation of these types is achieved
by the preceding method. [. . . ] One can set x = y ± k, k being arbitrary, and solve the equation in y by the same
principles.
The history of factorization algorithms involves many more people; see the references
in the surveys of Kaltofen (1982, 1990, 1992) and von zur Gathen & Panario (2001), and in
Shparlinski’s (1992, 1999) treatises. Other early algorithms are by Prange (1959), Lloyd
(1964), Lloyd & Remmers (1966), and Willett (1978). The survey by Slisenko (1981)
mentions unpublished algorithms by Skopin and Faddeev, apparently found in the late
1960s.
If each deg gi in Algorithm 14.3 is 0 or i, then the distinct-degree factorization is already
the complete factorization of f . How often does this happen? We consider monic random
polynomials in Fq [x] of degree n. Then, when n is fixed and q −→ ∞, the probability goes
to e−γ ≈ 56% for large n, and for q fixed and n −→ ∞, this probability tends to a limit cq ,
with 66.56% ≈ c2 > cq > e−γ for all q ≥ 3. This was shown by Flajolet, Gourdon & Pa-
nario (2001) (Theorem 4.1), who give further results on the distribution of factor degrees
of random polynomials and the average case analysis of factoring algorithms. Similar and
related results are in Knopfmacher & Knopfmacher (1993), Knopfmacher (1995), Knopf-
macher & Warlimont (1995), Gourdon (1996), Gao & Panario (1997), Panario (1997),
Panario, Gourdon & Flajolet (1998), Panario & Richmond (1998), and Panario & Viola
(1998).
Gourdon (1996), Panario (1997), and Panario, Gourdon & Flajolet (1998) give results
about the distributions of the degrees of the largest and the second largest irreducible factor
of a random polynomial in Fq [x].
In general, an isomorphism between finite fields F_q ≅ F_p[x]/⟨f⟩ and F_q ≅ F_p[x]/⟨g⟩, where f, g ∈ F_p[x] are irreducible of degree n and q = p^n, can be obtained by mapping x mod f to a root of f in F_p[x]/⟨g⟩. Lenstra (1991) shows that such an isomorphism can even be constructed in deterministic polynomial time.
Instead of the norm N(α) = α^{q^{d−1}+q^{d−2}+···+1}, one can also use the trace T(α) = α^{q^{d−1}} + α^{q^{d−2}} + · · · + α in equal-degree factorization; see McEliece (1969), Berlekamp (1970),
Camion (1981, 1982, 1983), and von zur Gathen & Shoup (1992). Both functions have the
crucial property that N(α), T (α) ∈ Fq for all α ∈ Fqd . The trace also works in characteris-
tic 2 (Exercise 14.16) where it is more tricky to apply the norm for factoring.
14.6. Gauß (1863b), article 368, basically describes the squarefree part algorithm 14.19,
but does not deal with the difficulty when the characteristic divides all exponents (and
the editor Dedekind repeats the incorrect statement). Lagrange (1769), §15, notes that
f / gcd( f , f ′ ) has the same roots as f , but each with multiplicity one (over C).
Algorithm 14.21 is from Yun (1976). For a random polynomial in Fq [x] of degree n, the
expected degree of the squarefree part is asymptotically about n − 1/q (Flajolet, Gourdon
& Panario 2001, Theorem 2.1).
We have already noted that (4), saying that a polynomial with vanishing derivative over a
finite field of characteristic p > 0 is a pth power, is not true for arbitrary fields of character-
istic p. In fact, over sufficiently bizarre (but still “computable”) fields it is undecidable—in
the sense of Turing—whether a polynomial is squarefree or not (von zur Gathen (1984a),
based on van der Waerden (1930a) and Fröhlich & Shepherdson (1955–56)). Van der
Waerden’s result is of particular interest because he has to assume explicitly that an unde-
cidable problem—an “ignorabimus”—exists (this was proven by Turing later in 1937) and
because Hilbert’s (1930) article in the same volume of the Mathematische Annalen ends
with Hilbert’s credo: In der Mathematik gibt es kein ignorabimus.4
4 There is no undecidable problem in mathematics.
14.7. The iterated Frobenius algorithm is from von zur Gathen & Shoup (1992), where the
estimate (7) is proven in their Theorem 3.2. This is also used in Kedlaya & Umans (2009).
14.8. The first pioneering random polynomial-time algorithms, based on linear algebra,
are due to Berlekamp (1967, 1970). The matrix Q was already used by Petr (1937) who
determined the characteristic polynomial of Q − I and gave a distinct-degree factorization
method using Q as representing the Frobenius automorphism. Schwarz (1939, 1940, 1956,
1960, 1961) and Butler (1954) used Q in various algorithms, for example to compute the
number of factors of a given degree. Camion (1980) coined the term Berlekamp algebra
for the kernel of β.
Berlekamp (1970) introduced the (q − 1)/2 trick into modern polynomial factorization;
Legendre already stated it in 1785.
A different linear algebra based method for factoring polynomials in Fq [x] was devel-
oped by Niederreiter (1993a, 1993b, 1994a), Göttfert (1994), and Niederreiter & Göttfert
(1993, 1995); see Niederreiter (1994b) for an overview. The method turned out to be
closely related to Berlekamp’s algorithm. Gao & von zur Gathen (1994) showed how to
combine it with Wiedemann’s method. The special case where q is prime is discussed in
Exercise 14.42. Gao (2003) extends Niederreiter’s method to bivariate polynomials over a
finite field.
Kaltofen & Shoup (1998) have found clever improvements to the factorization methods
in this chapter that yield algorithms whose dependence on the degree of the polynomial
to be factored is less than quadratic, namely O(n1.815 (log q)0.407 ) operations in Fq . For
practical purposes, they recommend a O∼ (n2.5 + n log q) version of their method.
As Kaltofen and Shoup say in their “Note added in proof”, their estimates can be im-
proved slightly by combining them with the fast rectangular matrix multiplication algo-
rithm of Huang & Pan (1998). This does not require any new algorithmic idea. As it is
not in the literature, we briefly explain such an improvement, assuming familiarity with
both papers. Theorem 10.2 of Huang & Pan gives an upper bound on ω(1, 1, r), which
is defined so that an n × n times n × nr matrix product can be calculated with O(nω(1,1,r) )
arithmetic operations. This bound contains two parameters l and b. We set l = 7 and
b = −0.00191r + 0.03551 in their bound and obtain a function ϕ(r) with ω(1, 1, r) ≤ ϕ(r).
Then one verifies that ϕ(r) ≤ 0.95732r + 1.42261 for 1.36437 ≤ r ≤ 1.67555. In Lemma 3
of Kaltofen & Shoup (1998), the dominating cost is a t × t times t × t r matrix multiplica-
tion (more exactly, its transpose) with t = n1/r and r = 1/(1 − β/2), for a parameter β.
The cost for their algorithm is then O∼ (nω(1,1,r)/r + n1+β +x ), where x = logn log2 q, as in
Figure 14.9. Using fast square matrix multiplication and the Coppersmith & Winograd
exponent, we have ω(1, 1, r) ≤ r − 1 + ω(1, 1, 1) ≤ r + 1.375477. Equating the two expo-
nents of n yields the value of β which minimizes the cost and provides the upper bound
of 0.407x + 1.815 given in Kaltofen & Shoup (1998). Substituting the better linear bound
on ω(1, 1, r) for fast rectangular matrix multiplication from above and equating exponents,
we find the upper bound 0.41565x + 1.80636, as in Figure 14.9. The required values of r
all lie within the interval given above.
This estimate is not the best that one can get from the methods of Kaltofen & Shoup
and Huang & Pan, but it is not clear to us how to obtain a simple explicit description of
the running time that results from combining these methods in an optimal way. We do not
claim that calculations as the above are of much value for practical purposes.
For large fields of small characteristic, say F2k , Kaltofen & Shoup (1997) present even
faster solutions by applying variants of the iterated Frobenius algorithm 14.26 over the
prime field. The natural cost measure now is to count word operations; as an example, they
achieve O(n(log q)1.687 ) word operations when k = ⌈n1.5 ⌉.
14.9. The worst-case and average upper bounds on δ(n) are in Hardy & Wright (1985),
§22.10. An exact formula for I(n, q) and the approximation qn /n are in Gauß (1863b),
articles 344–347; the slightly sharper bound
q^n/n − q(q^{n/2} − 1)/((q − 1)n) ≤ I(n, q) ≤ (q^n − q)/n
for n ≥ 2 is in Lidl & Niederreiter (1997), Theorem 3.25 and Exercises 3.26 and 3.27.
Algorithm 14.36 is due to Rabin (1980b), and Algorithm 14.40 to Ben-Or (1981). Lemma
14.41 was stated in Ben-Or (1981) and is proven in the solution to Exercise 7.32 of Bach &
Shallit (1996). Panario & Richmond (1998) give a precise analysis of the implied constant.
The expected minimal degree has a large variance, namely about cn for some constant
c ≈ 0.5568. Shepp & Lloyd (1966) proved a similar result about permutations, namely that
the expected length of the shortest cycle of a random permutation on n letters is O(log n).
Panario & Viola (1998) give an analysis of Rabin’s algorithm. The estimate of the prob-
ability that a random polynomial over a finite field with no small factors is irreducible is
from Gao & Panario (1997).
Galois (1830) proposed a probabilistic approach to finding irreducible polynomials over
finite fields; see the quote at the beginning of this chapter. The asymptotically fastest
method for computing irreducible polynomials is in Shoup (1994), using O∼ (n2 + n log q)
operations in Fq .
Further notes. The central open question in the theory of factoring polynomials over finite
fields is: can this be done in deterministic polynomial time? We recall that the distinct-
degree algorithm 14.3 is deterministic, but the equal-degree algorithm 14.10 is probabilis-
tic. Thus we may assume f ∈ Fq [x] to be equal-degree. Berlekamp (1970) significantly
simplified the problem: we may assume that q is a prime, and that f has only linear factors
(see Exercise 14.40). Thus the question is the following:
Several special cases have been solved: when p − 1 has only small prime factors (so
that p − 1 is smooth, see Section 19.5) (Moenck 1977a, von zur Gathen 1987, Mignotte &
Schnorr 1988), when Φk (p) is smooth for some cyclotomic polynomial Φk ∈ Z[y] (Bach,
von zur Gathen & Lenstra 2001), when f is cyclotomic when considered in Q[x] or, more
generally, has commutative Galois group (Huang 1985, Rónyai 1989), or when n is small
(Rónyai 1988). The most general result is Evdokimov’s (1994) algorithm with an almost
polynomial number of word operations (nlog n log p)O(1) . All these results assume the Ex-
tended Riemann Hypothesis (ERH; see Notes 18.4). Irreducible polynomials can be com-
puted in deterministic polynomial time under the ERH (Adleman & Lenstra 1986). Shoup
(1990) and Lange & Winterhof (2000) present deterministic polynomial-time algorithms
which factor almost all polynomials.
We stress that a solution of this interesting problem is unlikely to affect the practice of
factoring, since there the probabilistic algorithms are just fine.
Exercises.
14.1 (i) Let F_q be a finite field with q elements. Prove Wilson's theorem ∏_{a∈F_q^×} a = −1. Hint: Every a ∈ F_q^× different from ±1 has a^{−1} ≠ a.
(ii) Prove a converse of Wilson’s theorem: If n is an integer such that (n − 1)! ≡ −1 mod n, then n
is prime.
14.2 Suppose p ≥ 5 is a prime, f ∈ F_p[x] has degree 4, and gcd(x^p − x, f) = gcd(x^{p^2} − x, f) = 1.
What can you say about the factorization of f in F p [x]?
14.3 Trace Algorithm 14.3 on computing the distinct-degree decomposition of the squarefree poly-
nomial
Tell from the output only how many irreducible factors of degree i the polynomial f has, for all i.
14.4 Let q ∈ N be a prime power.
(i) Use Theorem 14.2 to prove that if r is a prime number, then there are (qr − q)/r distinct monic
irreducible polynomials of degree r in Fq [x]. (Observe that, by Fermat’s little theorem 4.9, (qr − q)/r
is an integer.)
(ii) Now suppose that r is a prime power. Find a simple formula for the number of monic irreduc-
ible polynomials of degree r over Fq .
14.5 Let p ∈ N be a prime and f ∈ Z[x] monic of degree n. Prove that the congruence f (a) ≡ 0
mod p has n solutions a ∈ Z p if and only f mod p is a factor of x p − x; that is, if and only if
x p − x = f q + pr, where q and r have integral coefficients, and where r is a polynomial of degree less
than n.
14.6∗ Let q be a prime power and f ∈ Fq [x] squarefree of degree n.
(i) Prove that for 1 ≤ a ≤ b ≤ n, the polynomial
gcd( ∏_{a≤d<b} (x^{q^d} − x), f )
is the product of all monic irreducible factors of f whose degree divides some number in the interval {a, a + 1, . . . , b − 1}.
(ii) Determine gcd( ∏_{a≤d<b} (x^{q^b} − x^{q^{b−d}}), f ).
(iii) Consider the following blocking strategy for distinct degree factorization. We partition the set
{1, . . ., n} of possible degrees of irreducible factors of f into k intervals I1 = {c0 = 1, 2, . . ., c1 − 1},
I2 = {c1 , c1 + 1, . . ., c2 − 1}, . . ., Ik = {ck−1 , ck−1 + 2, . . ., ck − 1 = n}, with integers 1 = c0 < c1 <
c2 < · · · < ck = n + 1. Describe an algorithm which, on input f , computes the polynomials g1 , . . ., gk
such that g j is the product of all monic irreducible factors of f with degree in the interval I j , for
1 ≤ j ≤ k.
14.7 Show that −1 is a square in F_q^× for an odd prime power q if and only if q ≡ 1 mod 4.
(v) Modify Berlekamp’s algorithm 14.31 so as to work for q = 2k , by computing b = Tk (a) rem f
in step 6. Prove that the modified algorithm fails with probability at most 1/2, and that its running
time is the same as that of the original algorithm.
14.17∗∗ The aim of this exercise is to reduce the expected cost estimate for equal degree factoriza-
tion from O((d log q + log n)M(n) log r) field operations, as shown in Theorem 14.9 and the discus-
sion following it, to O(d log q · M(n) + log(qn)M(n) log r).
Let q be an odd prime power and f ∈ Fq [x] squarefree of degree n with r ≥ 2 irreducible factors
f1 , . . ., fr of degree d = n/r. We let R, R1 , . . ., Rr and the Chinese remainder isomorphism χ =
χ_1 × · · · × χ_r : R −→ R_1 × · · · × R_r be as in Section 14.3. The norm on R_i ≅ F_{q^d} is defined by N(α) = α α^q α^{q^2} · · · α^{q^{d−1}} = α^{(q^d−1)/(q−1)}, and we use the same formula to define the norm on R.
(i) Let α ∈ R^× be a uniform random element, β = N(α), and 1 ≤ i ≤ r. Show that χ_i(β) is a root of x^{q−1} − 1, and conclude that χ_i(β) is a uniform random element in F_q^×. Hint: N is a homomorphism
of multiplicative groups.
(ii) Provided that q > r, what is the probability that the χi (β) are distinct for 1 ≤ i ≤ r? Prove that
this probability is at least 1/2 if q − 1 ≥ r2 .
(iii) For u ∈ Fq , let π(u) = u(q−1)/2 , so that π(u) ∈ {−1, 0, 1}, π(u) = 0 if and only if u = 0, and
π(u) = −1 if and only if u is a nonsquare. Moreover, let u, v ∈ Fq be distinct. Prove that for a
uniformly random t ∈ Fq , we have π(u + t) 6= π(v + t) with probability at least 1/2. Hint: The map
t 7−→ (u + t)/(v + t) if t 6= −v and −v 7−→ 1 is a bijection of Fq .
(iv) Consider the following variant of Algorithm 14.8, due to Rabin (1980b).
A LGORITHM 14.54 Equal-degree splitting.
Input: A squarefree monic reducible polynomial f ∈ Fq [x] of degree n, where q is an odd prime
power, a divisor d < n of n, so that all irreducible factors of f have degree d, and a ∈ Fq [x] of
degree less than n with χi (a mod f ) ∈ Fq for all i.
Output: A proper monic factor g ∈ Fq [x] of f , or “failure”.
1. g1 ←− gcd(a, f )
if g1 6= 1 and g1 6= f then return g1
2. choose t ∈ Fq at random
3. call the repeated squaring algorithm 4.8 in R = Fq [x]/h f i to compute b = (a +t)(q−1)/2 rem f
4. g2 ←− gcd(b − 1, f )
if g2 6= 1 and g2 6= f then return g2 else return “failure”
Use (iii) to prove that the failure probability of the algorithm is at most 1/2 if a 6∈ Fq .
(v) Use the algorithm from (iv) as a subroutine to create a recursive algorithm for equal-degree
factorization, which has the same input specification as the above algorithm and outputs all irreduci-
ble factors of f . The value of a is passed to the recursive calls. Prove that the algorithm never halts
if χi (a mod f ) = χ j (a mod f ) for some i 6= j, and that otherwise, if all χi (a mod f ) are distinct ele-
ments of Fq , the probability for its recursion depth to be more than k = 1 + ⌈2 log2 r⌉ is at most 1/2.
Conclude that in the latter case, the number of operations in Fq is O(M(n) log(qn) log r).
d
(vi) Now we first compute a = c(q −1)/(q−1) rem f for a uniform random polynomial c ∈ Fq [x] of
degree less than n, and then call the algorithm from (v) for that value of a and stop the recursion at
depth k. We assume that q − 1 ≥ r2 . Prove that with probability at least 1/4, this method yields the r
irreducible factors of f in time O(d M(n) log q + M(n) log(qn) log r).
14.18−→ Use Algorithm 14.13 to factor the polynomial x6 + x3 + x2 + x + 1 ∈ F2 [x] into irreducible
factors. Show all your steps.
14.19 Let F be a field and f ∈ F[x] with f (0) 6= 0. We recall rev( f ) = f ∗ = xdeg f f (1/x), the
reversal (or reciprocal polynomial) of f (Section 9.1). We say that f is self-reciprocal if f = f ∗ .
(i) Show that ∗ is multiplicative, so that ( f g)∗ = f ∗ g∗ for all g ∈ F[x] with g(0) 6= 0.
(ii) Prove that f (α−1 ) = 0 ⇐⇒ f ∗ (α) = 0, for all α ∈ F. Conclude that the set of zeroes of f is
closed under inversion if f is self-reciprocal.
(iii) Show that every self-reciprocal polynomial f of odd degree satisfies f (−1) = 0.
(iv) Let f ∈ F[x] with f (0) 6= 0 be self-reciprocal and g ∈ F[x] an irreducible factor of f . Then
also g∗ is an irreducible factor of f .
(v) The squarefree polynomial f = (x21 + 1)/(x + 1) ∈ F2 [x] has—among others—the following
irreducible factors: x2 + x + 1, x3 + x + 1, and x6 + x4 + x2 + x + 1. What are the others?
14.20∗ Let f ∈ Fq [x] of degree n be given, and for a ∈ Fq , let Ba = {b ∈ Fq : f (b) = a} be the set of
preimages of a under the mapping b 7−→ f (b) induced by f .
(i) Given a, show how to compute ∏b∈Ba (y − b) ∈ Fq [y] with O(M(n) log(qn)) operations in Fq .
(ii) Given a, show how to compute probabilistically Ba with O(M(n) log n log(qn)) operations
in Fq .
(iii) If the function corresponding to f is bijective (so that #Ba = 1 for all a ∈ Fq ), then f is called
a permutation polynomial. Use Exercise 14.11 to derive a criterion when f = xn is a permutation
polynomial.
(iv) If f is not a permutation polynomial, then in fact
#{a : B_a ≠ Ø} = # im f ≤ q (1 − 1/n)
(Wan 1993; a weaker result is in von zur Gathen 1991b). Use this fact to derive a probabilistic (Monte
Carlo) test for permutation polynomials, taking O(n M(n) log(qn)) operations in Fq .
14.21 Let f ∈ Z[x] be of degree n and max-norm || f ||∞ = A, and f = (ux+v)g, with nonzero u, v ∈ Z
and g = ∑0≤i<n gi xi ∈ Z[x].
(i) Prove that |gi | ≤ (i + 1)A/|v| for 0 ≤ i < n − 1 if |u| = |v|, and conclude that then ||g||∞ ≤ nA.
(ii) Now assume that α = |u/v| < 1. Show that |gi | ≤ A(1 − αi+1 )/(1 − α)|v| for 0 ≤ i < n − 1,
and conclude that ||g||∞ ≤ A. Prove that the latter also holds if |u/v| > 1.
14.22 (i) Use the Leibniz rule to prove (2).
(ii) Conclude that f / gcd( f , f ′ ) = ∏ei fi′ 6=0 fi .
14.23 Prove or disprove:
(i) The polynomial x1000 + 2 ∈ F5 [x] is squarefree.
(ii) Let F be a field and f , g ∈ F[x]. Then the squarefree part of f g is the product of the squarefree
parts of f and of g.
14.24 (Yun 1977b) Over a field F of characteristic zero, Algorithm 14.19 reduces the problem of
computing the squarefree part of a polynomial to a gcd computation.
(i) Show that conversely computing a gcd of two squarefree polynomials f , g ∈ F[x] can be reduced
to computing the squarefree part of a certain polynomial.
(ii) Let f , g ∈ F[x] be monic nonconstant, with squarefree decompositions f = ∏1≤i≤m fii and
g = ∏1≤i≤k gii . Show that gcd( f , g) = ∏1≤i≤min{m,k} gcd( fi · · · fm , gi · · ·gk ), and conclude from (i)
that computing gcd’s can be reduced to computing squarefree decompositions.
14.25 Test the following polynomials for multiple factors in Q[x].
(i) x3 − 3x2 + 4, (ii) x3 − 2x2 − x + 2.
14.26 Let F be a field of characteristic zero, f ∈ F[x] monic nonconstant of degree n, f = g_1 g_2^2 · · · g_m^m its squarefree decomposition, v = g_1 · · · g_m, u = f/v, and w = f′/u.
(i) Show that gcd(f, f′) = u and w = ∑_{1≤i≤m} i g_i′ v/g_i. Hint: Exercise 14.22.
14.34 Let F_q be a finite field with q elements, f ∈ F_q[x] nonconstant, and ξ = x^q mod f ∈ R = F_q[x]/⟨f⟩. Prove or disprove that α^q = ξ̌(α) for all α ∈ R.
14.35∗ Find “small” constants c1 , c2 ∈ Q such that the running time of the iterated Frobenius algo-
rithm 14.26 is at most (c1 n/d + c2 )M(d) log2 d + O(M(d) + n log d) additions and multiplications in
R when n and d are powers of 2. Hint: Exercise 10.2.
14.36∗ Let q be a prime power, f ∈ Fq [x] of degree n, and R = Fq [x]/h f i.
d−1
(i) Consider the following algorithm for computing the norm Nd (α) = ααq · · ·αq for α ∈ R and
a power of two d < n.
A LGORITHM 14.55 Norm computation.
Input: f ∈ Fq [x] of degree n, a power of two d ∈ N with d ≤ n, ξ q ∈ R = Fq [x]/h f i, where ξ = x
mod f ∈ R, and α ∈ R.
Output: Nd (α) ∈ R.
1. γ0 ←− ξ q , δ0 ←− ξ, l ←− log2 d
2. for i = 1, . . ., l do
call the modular composition algorithm 12.3 to compute γi = γ̌i−1 (γi−1 ) and
δ̌i−1 (γi−1 )
δi ←− δi−1 · δ̌i−1 (γi−1 )
3. return δl
Prove that the algorithm works correctly and takes O((n(ω+1)/2 + n1/2 M(n)) log d) operations
in Fq . Compare this to the time for computing the norm by employing the iterated Frobenius algo-
rithm 14.26.
(ii) Modify the algorithm so as to also work when d is not necessarily a power of two.
d−1
(iii) Design a similar algorithm for computing the trace Td (α) = α + αq + · · · + αq of α ∈ R.
14.37 Let q be a prime power, f = f1 · · · fr ∈ Fq [x] squarefree, with monic irreducible and pairwise
coprime f1 , . . ., fr ∈ Fq [x], and B ⊆ Fq [x]/h f i the Berlekamp algebra of f . Prove that the “Lagrange
interpolants” l1 , . . ., lr ∈ Fq [x] of degree less than deg f and with li ≡ 0 mod f j if j 6= i and li ≡ 1
mod fi are a basis of B.
14.38∗ Let f = f1 · · · fr ∈ F2 [x] be squarefree of degree n, with f1 , . . ., fr ∈ F2 [x] monic irreducible
and pairwise coprime, B ⊆ F2 [x]/h f i its Berlekamp algebra, and b1 mod f , . . ., br mod f a basis
of B, with all bi ∈ F2 [x] of degree less than n.
(i) Show that for 1 ≤ i ≤ r with at most one exception, there exist indices j, k such that fi | b j and
f i ∤ bk .
(ii) Let f = g1 · · ·gs be a partial factorization of f , with all g j monic nonconstant and pairwise
coprime, and 1 ≤ i ≤ r. Use Exercise 11.4 to show that
gcd(b_i, g_1), g_1/gcd(b_i, g_1), . . . , gcd(b_i, g_s), g_s/gcd(b_i, g_s)
can be computed using O(M(n) log n) operations in F2 (we call this a refinement with bi ).
(iii) Show that by successively refining partial factorizations of f with b1 , . . ., br , starting with the
trivial factorization f = f , we obtain all irreducible factors of f in time O(r · M(n) log n).
14.39∗ Let p ∈ N be prime and q = pk for some positive k ∈ N, f ∈ Fq [x] monic squarefree of
degree n, and R = Fq [x]/h f i. We may replace the Frobenius endomorphism α 7−→ αq of R over Fq
in Berlekamp’s algorithm 14.31 by the absolute Frobenius endomorphism α 7−→ α p of R over the
prime field F p . Analyze this variant and compare its expected running time to that of the original
algorithm.
428 14. Factoring polynomials over finite fields
14.40∗ It is clear that finding roots of polynomials is a special case of factoring polynomials. This
exercise shows conversely how factoring over a finite field can be reduced to root finding over the
prime field. Let q = pk be a prime power for some positive k ∈ N, f ∈ Fq [x] monic squarefree
of degree n, R = Fq [x]/h f i, and B = {a mod f ∈ R: a p ≡ a mod f } ⊆ R the absolute Berlekamp
subalgebra (see Exercise 14.39).
(i) Let b ∈ Fq [x] such that b mod f ∈ B. Prove that f = ∏ gcd( f , b − a).
a∈F p
(ii) Let y be a new indeterminate and r = resx ( f , b − y) ∈ Fq [y]. Show that r has some roots in F p ,
and that any root of r in F p leads to a nontrivial factor of f if b 6∈ F p .
(iii) Give a deterministic polynomial-time reduction from factoring in Fq [x] to root finding in F p [x].
14.41∗ Let q be a prime power, f ∈ Fq [x] monic squarefree of degree n, and R = Fq [x]/h f i, as usual.
(i) Show that if f splits into r irreducible factors of degrees d1 , . . ., dr , then lcm{xdi − 1: 1 ≤ i ≤ r}
is the minimal polynomial of the matrix Q representing the Frobenius endomorphism α 7−→ αq on R.
Hint: Start with r = 1. Conclude that f is irreducible if and only if the minimal polynomial of Q is
xn − 1.
(ii) Use (i) and Exercise 12.15 to design a Monte Carlo test whether f is irreducible. Your test
should take O(n · M(n) logq) operations in Fq if q is “large enough”.
14.42∗∗ This exercise discusses the easiest case of another factoring method based on linear algebra,
due to Niederreiter (see Notes 14.8). Let p ∈ N be prime.
(i) Prove that for all rational functions h ∈ F p (x), the (p − 1)st derivative h(p−1) is a pth power.
(ii) Show that for any nonzero polynomial f ∈ F p [x], the rational function h = f ′ / f ∈ F p (x) is a
solution of the differential equation
h(p−1) + h p = 0. (14)
Hint: Prove this first when f is squarefree, using Exercise 9.27 over the splitting field of f , and
Wilson’s theorem (Exercise 14.1). For the general case, employ the squarefree decomposition of f
and Exercise 9.27.
(iii) Prove that if h = g/ f ∈ F p (x) satisfies (14), with nonzero coprime f , g ∈ F p [x] and f monic,
then deg g < deg f and f is squarefree.
(iv) Let f , g be as in (iii) and λ1 , . . .λn ∈ E the (distinct) roots of f in a splitting field E of f
over F p . By partial fraction decomposition, there exist d1 , . . ., dn ∈ E such that
g di
= ∑ .
f 1≤i≤n x − λi
Show that y = di /(x − λi ) solves (14) for 1 ≤ i ≤ n. (Hint: Uniqueness of partial fraction decompo-
sition). Prove that di = dk ∈ F p if λi and λk are roots of the same irreducible factor in F p [x] of f , and
conclude that
g f j′
= ∑ cj
f 1≤ j≤r fj
for some c1 , . . ., cr ∈ F p , where f1 , . . ., fr are the distinct monic irreducible factors of f .
(v) Let f ∈ F p [x] be monic of degree n with the factorization f = f1e1 · · · frer into irreducible factors,
and
g
N = {g ∈ F p [x]: deg g < n and h = satisfies (14)}.
f
Prove that f1′ f / f1 , . . ., fr′ f / fr is a basis of N as a vector space over F p .
(vi) Now let f be squarefree and B ⊆ F p [x]/h f i the Berlekamp algebra of f . Prove that the map
ϕ: N −→ B with ϕ(g) = g · ( f ′ )−1 mod f is a vector space isomorphism. Hint: Consider ϕ(g)
mod f j for all j.
Exercises 429
(vii) Assume that p > 2. Let f as in (vi), g = ∑1≤ j≤r c j f j′ f / f j ∈ N with all ci ∈ F p , and S ⊆ F×
p
the set of squares. Show that
gcd(g(p−1)/2 − ( f ′ )(p−1)/2 , f ) = ∏ f j,
c j ∈S
and conclude that this gcd is nontrivial with probability at least 1/2 if c1 , . . ., cr are chosen uniformly
at random in F p and gcd( f , g) = 1.
14.43∗∗ This exercise turns the theory from Exercise 14.42 into an algorithm for p = 2. Let f =
∑0≤i≤n fi xi ∈ F2 [x] be monic squarefree of degree n.
(i) Prove that N = {g ∈ F2 [x]: deg g < n and ( f g)′ = g2 }.
1/2
(ii) Let N ∈ F2n×n be the matrix of the linear operator g 7−→ ( f g)′ on the vector space of all
polynomials in F2 [x] of degree less than n with respect to the polynomial basis xn−1 , xn−2 , . . ., x, 1,
so that
!2 gn−1 hn−1
∑ hi xi ⇐⇒ N · ... = ... .
( f ∑ gi xi )′ =
0≤i<n 0≤i<n
g0 h0
Prove that
fn 0 0 0 0 0 ···
fn−2
fn−1 fn 0 0 0 ···
N = fn−4 fn−3 fn−2 fn−1 fn 0 ··· .
.. ..
. .
(iii) Design an algorithm for factoring f by determining a basis of N − I, where I is the n × n
identity matrix, and using an analog of Exercise 14.38. Prove that it takes O(nω ) operations in F2 ,
like Berlekamp’s algorithm.
14.44∗ Let q be a prime power, t ∈ N a prime divisor of q − 1, and a ∈ F× q.
(i) Show that the polynomial xt − a ∈ Fq [x] splits into linear factors if a is a tth power Hint: Use
Lemma 8.8.
(ii) Show that xt − a is irreducible if a is not a tth power Hint: Use (i) for the splitting field of xt − a
and consider the constant coefficient of a hypothetical factor f ∈ Fq [x] of xt − a.
(iii) Derive a formula for the probability that a random binomial xt − a (that is, for random a ∈ F× q)
is irreducible, and compare it to the probability that a random polynomial of degree t in Fq [x] is
irreducible.
14.45 Prove Lemma 14.47.
14.46 This exercise discusses a useful tool from number theory: Möbius inversion. Let the Möbius
function µ be defined as in (11).
(i) Prove that µ is multiplicative, so that µ(mn) = µ(m)µ(n) whenever m, n ∈ N>0 are coprime.
(ii) Show that ∑d|n µ(d) = 0 if n > 1, where the sum is over all positive divisors of n.
(iii) Let R be an arbitrary ring (commutative, with 1) and f , g: N>0 −→ R be two functions such
that
f (n) = ∑ g(d) for n ∈ N>0 .
d|n
Prove that
n n
g(n) = ∑ µ f (d) = ∑ µ(d) f for n ∈ N>0 .
d|n
d d|n
d
430 14. Factoring polynomials over finite fields
(ii) Give an algorithm which computes all irreducible factors of the polynomial xn − 1 in Fq [x]
using O(log q log n + n M(log n)) word operations and an expected number of O(M(n) log(qn) log r)
field operations in Fq , in total O∼ (n log2 q) word operations. Hint: Use the method of Exercise 11.4
to split factors already found and perform a similar analysis as in the proof of Theorem 14.11.
Research problems.
14.48 Find a deterministic polynomial-time algorithm for computing a root of a squarefree polyno-
mial f ∈ F p [x] which divides x p − x, where p is a prime number. (Exercise 14.40 implies that then
the general problem of factoring polynomials can be solved in deterministic polynomial time.)
14.49 Allowing probabilistic algorithms using only operations in Fq , show that Ω(n log q) oper-
ations are required to factor a polynomial of degree n in Fq [x]. This corresponds to the diagonal
y = x + 1 in Figure 14.9.
The operation of factoring [polynomials]
must be performed by inspection.
Charles Davies (1867)
1 All the effects of nature are only mathematical results of a small number of immutable laws.
15
Hensel lifting and factoring polynomials
In this chapter, we present two modular algorithms for factoring in Q[x] and F[x, y]
for a field F. The first one uses factorization modulo a “big” prime and is concep-
tually easier, and the second one uses factorization modulo a “small” prime and
then “lifts” it to a factorization modulo a power of that prime. The latter is compu-
tationally faster and comprises our most powerful employment of the prime power
modular approach introduced in Chapter 5.
433
434 15. Hensel lifting and factoring polynomials
part of f . Thus
factoring in Z[x] ←→ factoring in Q[x] plus factoring in Z.
The best known algorithms for factoring in Z (Chapter 19) are much less efficient
than those for Q[x] that we present in this and the next chapter. From now on,
“factoring in Z[x]” will usually refer to primitive polynomials, for which the part
“factoring in Z” is trivial.
The basic idea of the factoring algorithm is as follows. Let f ∈ Z[x] be a prim-
itive polynomial to be factored. Using the squarefree part algorithm 14.19 if nec-
essary, we may assume that f is squarefree. We take a “big” prime p ∈ Z not
dividing the leading coefficient of f and such that f mod p ∈ F p [x] is squarefree
(we will make precise later what “big” means). Using one of the (probabilistic)
algorithms in Chapter 14, we factor f modulo p. If g ∈ Z[x] is a factor of f and
g1 , . . . , gs are the irreducible factors of g modulo p, then we can recover g from
them. If f factors as f = f1 · · · fk in Z[x], then also f = f1 · · · fk in Z p [x], where the
bar means taking each coefficient modulo p. But if the true factor f1 is irreducible,
then f1 need not be irreducible, and our factorization modulo p will return all the
irreducible modular factors of all fi , but we will not immediately know which of
them belong together (see Figure 15.2).
The following questions arise if we want to turn this sketch into an algorithm.
◦ How large do we have to choose p so that we can recover the coefficients of
any factor from its image modulo p? The answer has already been given by
Mignotte’s bound 6.33 in Section 6.6.
◦ In what range do we have to choose a random p so that f mod p is squarefree
with sufficiently high probability? The answer to this question is provided
by the resultant theory from Chapter 6 and the prime number theorem in
Chapter 18.
◦ Finally, the trickiest question is: how can we find the modular factors of f
mod p that correspond to a true factor of f in Z[x]? Very easy: we simply try
all possible factor combinations. Unfortunately, this leads to an exponential
algorithm in the worst case; examples are given by the Swinnerton-Dyer poly-
nomials in Section 15.3. In Chapter 16, we present a method to circumvent
this: short vectors in lattices.
Our algorithms consist of two stages: a modular factorization stage, where we
take either a single big prime or a prime power as modulus, and a second stage
where we try to find true factors from modular ones, either by factor combination
or with the aid of short vectors in lattices. This is illustrated in Figure 15.1; each
of the two variants in the top row may be freely combined with each of the two
methods in the bottom row. The next section describes the “big prime” and “factor
combination” stages, Sections 15.4 and 15.5 the “prime power” approach, and
“short vectors” are treated in Chapter 16.
15.2. A factoring algorithm 435
modular
big prime prime power
factorization
❄ ✙ ❥ ❄
finding
factor combination short vectors
factors
A closer look at the Sylvester matrix shows that lc( f ) divides res( f , f ′ ) (Ex-
ercise 6.41). Hence f is squarefree if p does not divide res( f , f ′ ) ∈ Z \ {0}; the
resultant is nonzero because f is squarefree.
f =
≡ mod p
f = f1 · · · fk = lc( f )g1 · · · gr
lc( f )
f1 ≡ lc( f ) ∏ gi mod p. (1)
lc( f1 ) i∈S
If p/2 is larger than the Mignotte bound (n + 1)1/2 2n | lc( f )| · || f ||∞ , then the coef-
ficients of lc( f ) f1 / lc( f1 ) are integers less than p/2 in absolute value, by Corol-
lary 6.33, and the polynomials in (1) are equal if we use symmetric representatives
between −(p − 1)/2 and (p − 1)/2 for the elements of F p . Therefore we can
construct f1 from the gi ’s and S.
Unfortunately, there seems to be no easy way to find the set S: in Figure 15.2,
we are only given the boxes in the lower row, but do not know which ones have
the same color. Trying all subsets of {1, . . . , r} leads to the following algorithm.
For compatibility with later algorithms, it has no step numbered 4. We recall
the max-norm || f ||∞ = maxi | fi | and the one-norm || f ||1 = ∑i | fi | of a polynomial
f = ∑i fi xi ∈ Z[x].
1. if n = 1 then return { f }
b ←− lc( f ), B ←− (n + 1)1/2 2n Ab
2. repeat
choose a random odd prime number p with 2B < p < 4B
f ←− f mod p
′
until gcd( f , f ) = 1 in F p [x]
3. { modular factorization }
compute g1 , . . . , gr ∈ Z[x] of max-norm less than p/2 that are nonconstant,
monic, and irreducible modulo p, such that f ≡ bg1 · · · gr mod p
5. { initialize the index set T of modular factors still to be treated, the set G of
factors found, and the polynomial f ∗ still to be factored }
T ←− {1, . . . , r}, s ←− 1, G ←− Ø, f ∗ ←− f
6. { factor combination }
while 2s ≤ #T do
15.2. A factoring algorithm 437
T HEOREM 15.3.
Algorithm 15.2 works correctly. If β = log B, then β ∈ O(n + log A), and the
expected cost of steps 2 and 3 is
O β 2 M(β ) log β + (M(n2 ) + M(n)β ) log n · M(β ) log β or O∼ (n3 + log3 A)
P ROOF. By Lemma 15.1, f mod p is squarefree in step 3, since p > B implies that
p ∤ b.
We show first that the condition in step 9 is true if and only if g∗ h∗ = b f ∗ . If
the latter holds, then ||g∗ ||1 ||h∗ ||1 ≤ B, by Corollary 6.33. Conversely, let g∗ and h∗
be as in step 8. Then g∗ h∗ ≡ b f ∗ mod p. Now ||g∗ h∗ ||∞ ≤ ||g∗ h∗ ||1 ≤ ||g∗ ||1 ||h∗ ||1 ≤
B < p/2 implies that both sides of the congruence have coefficients less than p/2
in absolute value, and hence they are equal.
For a factor u ∈ Z[x] of f , we denote by µ(u) the number of monic irreducible
factors which divide u modulo p; since F p [x] is a UFD, these factors form a subset
of {g1 , . . . , gr }. We show by induction that the invariants
By Mignotte’s bound 6.33, the coefficients of lc(h)g and lc(g)h are at most B <
p/2 in absolute value. Since also ||g∗ ||∞ , ||h∗ ||∞ < p/2, we have lc(h)g = g∗ ,
lc(g)h = h∗ , g∗ h∗ = b f ∗ , and the condition in step 9 is true for that particular
subset S of T . This contradiction shows that f ∗ has no irreducible factor g with
µ(g) = s, and step 10 guarantees that the invariants hold again at the next pass
through step 6.
It remains to show that f ∗ is irreducible if 2s > #T in step 6. Let g ∈ Z[x] be
an irreducible factor of f ∗ and h = f ∗ /g. By (2), we have s ≤ µ(g), µ(h) ≤ #T if
h is nonconstant. But µ(g) + µ(h) = #T , and s > #T /2 implies that h = ±1 and
f ∗ = ±g is irreducible.
For the running time estimate, we first note that b ≤ || f ||∞ = A, and hence
β ∈ O(n + log A). In Section 18.4, we show that a random prime as required in
step 2 can be found by a probabilistic algorithm using O(β 2 M(β ) log β ) word op-
erations, and that p ∤ disc( f ) with probability at least 1/2 (Corollary 18.12). Hence
the expected number of iterations of step 2 is at most two. The cost for the gcd is
O(M(n) log n) arithmetic operations in F p or O(M(n) log n M(β ) log β ) word oper-
ations. Thus the expected cost of step 2 is O((β 2 + M(n) log n)M(β ) log β ) word
operations. By Corollary 14.30, step 3 can be done with O((M(n2 )+M(n)β ) log n)
arithmetic operations in F p . Each of these in turn takes O(M(β ) log β ) word oper-
ations, by Corollary 11.13, and the expected number of word operations for step 3
is O((M(n2 ) + M(n)β ) log n · M(β ) log β ).
Computing g∗ and h∗ in step 8 can be done with O(M(n) log n) additions and
multiplications modulo p, by Lemma 10.4, or O(M(n) log n M(β )) word opera-
tions. The primitive parts in step 9 can be computed with at most n gcds of integers
absolutely bounded by B, or O(n M(β ) log β ) word operations. This has to be done
k ≤ n many times. Between two subsequent times that the condition in step 9 is
true, there are at most 2#T executions of step 8. Now #T decreases by at least one
if the condition is true, and hence the total number of iterations of step 8 is at most
∑ 2i ≤ 2r+1 ≤ 2n+1 . ✷
1≤i≤r
15.2. A factoring algorithm 439
We now trace steps 8 and 9 for two specific subsets S ⊆ {1, . . . , 4}. It turns out that
the condition in step 9 is false for all subsets S ⊆ {1, . . . 4} of cardinality s = 1, and
f ∗ = f has no linear factor. For s = 2 and S = {1, 2}, we compute
in step 8. Obviously ||g∗ ||1 ||h∗ ||1 ≥ ||g∗ ||∞ ||h∗ ||∞ = 1863 · 1289 > B in step 9, and in
fact g∗ h∗ 6= b f ∗ , which can be seen by comparing the constant coefficients
in step 8, ||g∗ ||1 ||h∗ ||1 = 10 · 21 < B in step 9, and in fact g∗ h∗ = b f ∗ , so that
pp(g∗ ) = 3x2 + x + 1 and pp(h∗ ) = 2x2 + x + 4 are the irreducible factors of f
in Z[x]. ✸
Before computing g∗ and h∗ in step 8, one will test first whether the constant
coefficients of g∗ h∗ and b f ∗ are equal (unless f (0) = 0, which can be ruled out in
advance), as in Example 15.4. They can be computed with at most r multiplica-
tions of integers of absolute value at most B or O(r · M(n+log A)) word operations,
which is much faster than the worst case bound for steps 8 and 9 in the theorem.
In practice, most unsuccessful g∗ and h∗ already fail this simple test. Instead of
multiplying with b, we might also compute the monic associates of g∗ and h∗ by
rational number reconstruction (Section 5.10). If the constant coefficient of f is
smaller than b, then exchanging the roles of the leading and the constant coeffi-
cient decreases the required size of p. These remarks also apply to the prime power
algorithms 15.19 and 15.22 below and the corresponding algorithms in Chapter 16.
To factor an arbitrary polynomial f ∈ Z[x], we might apply Algorithm 15.2 to
the squarefree part h/ gcd(h, h′ ) ∈ Z[x] of h = pp( f ) and afterwards determine the
440 15. Hensel lifting and factoring polynomials
1. c ←− cont( f ), g ←− pp( f )
if lc( f ) < 0 then c ←− −c, g ←− −g
3. G ←− Ø
for i = 1, . . . , s do
5. return c and G
in F p [x]. The computational difficulty is that some fi mod p may not be irreducible
in F p [x], and that the factor combination stage may have to try exponentially many
combinations of the irreducible factors that were computed modulo p.
An example of a “bad” polynomial is the ith Swinnerton-Dyer polynomial
√ √ √ √
f = ∏(x ± 2 ± 3 ± 5 ± · · · ± pi ) ∈ Z[x],
where pi is the ith prime and the product runs over all 2i possible combinations of
+ and − signs. It follows from Galois theory that f is an irreducible polynomial
i
of
√ degree 2 in √ Z[x]. But since for any prime p, F p2 contains all the square roots
2 mod p, . . . , pi mod p, the reduction of f modulo p splits into linear factors
over F p2 . Hence the irreducible factors of f mod p in F p [x] are either all linear
(namely, if 2, 3, . . . , pi are squares modulo p) or all quadratic, if p does not divide
the discriminant of f (Exercise 15.8), and there are at least 2i−1 = n/2 of them,
where n = 2i is the degree of f . Then the factorization algorithm 15.2 will run
through about 2n/4 sets S before it is finally able to decide that f is irreducible.
Other examples of “bad” polynomials are the cyclotomic polynomials Φn , which
are irreducible over Q but split modulo each prime for most n (Exercise 15.7).
The Swinnerton-Dyer polynomials and the cyclotomic polynomials make the
factor combination stage work really hard. But is that typical? For example, we
have used the fact that squarefree polynomials “usually” remain squarefree mod-
ulo a prime. Can we hope that “usually” an irreducible polynomial in Z[x] remains
irreducible modulo a prime? The answer is no, and the powerful theorems of Fro-
benius (1896) and Chebotarev (1926) give precise information. Their explanation
requires some concepts not used elsewhere in this text; the background and a proof
can be found in Stevenhagen & Lenstra (1996).
So we have a primitive polynomial f ∈ Z[x] of degree n, irreducible over Q. We
let G be the Galois group of the splitting field of f over Q. Each automorphism in
G is a permutation of the n roots of f and has a unique decomposition into disjoint
cycles, say of lengths λ1 , . . . , λr . Then λ1 + · · · + λr = n, so that λ = (λ1 , . . . , λr )
is a partition of n. For an arbitrary partition λ of n, we let Hλ ⊆ G be the set of
those automorphisms that have cycle decomposition λ. Thus µ(λ) = #Hλ /#G is
the relative frequency with which the cycle type λ occurs in G.
442 15. Hensel lifting and factoring polynomials
f1 = x4 − 6x3 − 5x2 + 8,
f2 = x4 + x3 + x2 + x + 1,
√ √
f3 = x4 − 10x2 + 1 = ∏(x ± 2 ± 3)
in Q[x]. The first one f1 was chosen at random from the monic polynomials in Z[x]
with coefficients absolutely less than 10, f2 = Φ5 is the 5th cyclotomic polynomial,
and f3 is a Swinnerton-Dyer polynomial. With appropriate numberings of the
roots of f1 , f2 , f3 , their Galois groups are Gal( f1 ) = S4 , the full symmetric group
on four letters, Gal( f2 ) = h(1234)i ∼ = Z4 , the cyclic group with four elements, and
Gal( f3 ) is Klein’s group V4 ∼ = Z2 × Z2 . The partitions of 4, that is, the possible
cycle types of automorphisms of f1 , f2 , f3 , are (1, 1, 1, 1), (2, 1, 1), (2, 2), (3, 1),
and (4). For each of the three polynomials and each cycle type λ, Table 15.3 lists
the automorphisms of type λ in the Galois group. The fractions in bold are the
relative frequencies. ✸
cycle type f1 f2 f3
1 1 1
(1, 1, 1, 1) id id id
24 4 4
(12), (13), (14), 6
(2, 1, 1)
(23), (24), (34) 24
(12)(34), (13)(24), 3 1 (12)(34), (13)(24), 3
(2, 2) (13)(24)
(14)(23) 24 4 (14)(23) 4
(123), (124), (132), 8
(3, 1) (134), (142), (143),
(234), (243) 24
(1234), (1243), (1324), 6 (1234), 2
(4)
(1342), (1423), (1432) 24 (1432) 4
TABLE 15.3: Cycle types of the Galois groups of f1 , f2 , f3 and their relative frequencies.
If we factor f modulo a prime p that does not divide res( f , f ′ ), then the de-
grees λ1 , . . . , λr of the irreducible factors also form a partition of n, the factor-
ization pattern λ = (λ1 , . . . , λr ) of f modulo p. For any partition λ, we can
consider the set Pλ of those primes where λ is this factorization pattern. Then
Frobenius’ density theorem says that Pλ has density µ(λ), so that a randomly
chosen prime is in Pλ with probability µ(λ). Chebotarev proved a stronger ver-
sion of this result, and Chebotarev’s density theorem has become much better
known than Frobenius’ theorem. For practical purposes, we would like to use this
kind of estimate for the primes in our algorithms. Unfortunately, nothing much
can be proved about this, because even the best versions of Chebotarev’s theorem
15.3. Frobenius’ and Chebotarev’s density theorems 443
(Lagarias & Odlyzko 1977, Oesterlé 1979) do not allow us to conclude that the
asymptotic density of Pλ already applies to the fairly small values of p that we
use (or of any size that leads to practical algorithms). However, case studies like
Example 15.6 give rise to the hope that more may be true than what can be proved
today.
pattern f1 f2 f3
(1, 1, 1, 1) 3.96% 24.84% 24.78%
(2, 1, 1) 25.30%
(2, 2) 12.70% 24.91% 75.22%
(3, 1) 33.18%
(4) 24.86% 50.25%
TABLE 15.4: Factorization patterns of f1 , f2 , f3 modulo the first 10 000 primes not dividing
the discriminant.
E XAMPLE 15.6 (continued). For each of the three polynomials f1 , f2 , f3 from Ex-
ample 15.6, Table 15.4 shows the frequencies of factorization patterns modulo the
first 10 000 primes where the polynomial is squarefree. One can see that these
approximate the relative frequencies of the conjugacy classes quite well. For ex-
ample, the partition (2, 2), corresponding to a factorization into two irreducible
quadratic factors, occurs in 12.70% of all cases for f1 , which is close to the fre-
quency 12.5% of the cycle type (2, 2) in Gal( f1 ). For f2 , this factorization pattern
occurs about twice as often, in 24.91% of all cases, and 25% of the four elements
of its Galois group have cycle type (2, 2). ✸
and find
The above example shows one drawback to our first approach: the degrees of ĝ
and ĥ are higher than those of g and h, in particular their sum exceeds the degree
of f . This may happen because the multiples of m are zero divisors modulo m2 ,
and hence the product of the leading coefficients of two polynomials may vanish
modulo m2 .
To overcome this problem, we use division with remainder in R[x]. Since R is
not a field, this is not always possible. The following lemma states that division
with remainder by monic polynomials always works.
L EMMA 15.9. (i) Let f , g ∈ R[x], with g nonzero and monic. Then there exist
unique polynomials q, r ∈ R[x] with f = qg + r and deg r < deg g.
(ii) If f , g, q, r are as in (i) and f ≡ 0 mod m for some m ∈ R, then q ≡ r ≡ 0
mod m.
Part (i) has been proven in Section 2.4, and the proof of (ii) is Exercise 15.12. We
do not need the coefficients of the new polynomials exactly, but only modulo m2 .
This means that over a Euclidean domain R we can reduce them accordingly and
keep their sizes small. Here are the formulas that work.
3. return g∗ , h∗ , s∗ ,t ∗
Then f ≡ g∗ h∗ mod 25, and the degrees of g∗ , h∗ are the same as those of g and h;
the polynomials are simpler than ĝ, ĥ as calculated before. As in Example 9.24,
we obtain that 7 is a solution to x4 − 1 ≡ 0 mod 25 that is congruent to the starting
solution 2 modulo 5.
To obtain s∗ ,t ∗ , which we need for the next iteration, we compute
Polynomial division yields c = 10x − 10 and d = −10 with sb ≡ ch∗ + d mod 25.
Now
Then indeed s∗ g∗ + t ∗ h∗ ≡ 1 mod 25, and the degrees of s∗ ,t ∗ agree with those of
s,t, respectively. ✸
T HEOREM 15.11.
Algorithm 15.10 works correctly as specified. It uses O(M(n)M(log m)) word
operations if R = Z, m > 1, and all inputs have max-norm less than m2 , and
O(M(n)M(degy m)) operations in the field F if R = F[y] and the degree in y of
all inputs is less than 2 degy m.
15.4. Hensel lifting 447
P ROOF. For the correctness, we only prove the claims about g∗ , h∗ ; those for s∗ ,t ∗
are left as Exercise 15.17. We calculate
f − g∗ h∗ ≡ f − (g + te + qg)(h + se − qh)
= f − gh − (sg + th)e − ste2 − (sg − th)qe + ghq2
≡ (1 − sg − th)e − ste2 − (sg − th)qe + ghq2 ≡ 0 mod m2 ,
e2 ≡ 50 x2 − 50 mod 625,
q2 ≡ −225x + 300 mod 625, r2 ≡ −175 mod 625,
3 2
g2 ≡ x + 182 x − x − 182 mod 625, h2 ≡ x − 182 mod 625,
2
b2 ≡ −225x + 300x − 25 mod 625,
c2 ≡ 75x − 200 mod 625, d2 ≡ 275 mod 625,
s2 ≡ −267 mod 625, t2 ≡ 267x2 − 312x − 176 mod 625.
Then s2 g2 + t2 h2 ≡ 1 mod 625, which we don’t actually need if we are only inter-
ested in a factorization modulo p4 , f ≡ g2 h2 mod 625, and as in Example 9.24,
448 15. Hensel lifting and factoring polynomials
we see that 182 is the fourth root of 1 modulo 625 that is congruent to the starting
solution 2 modulo 5. ✸
e1 ≡ −5 mod 25,
q1 = 0, r1 ≡ 10x + 5 mod 25,
2
g1 ≡ x − 8x + 7 mod 25, h1 ≡ x2 + 8x + 7 mod 25,
b1 ≡ 5x2 + 10 mod 25,
c1 ≡ −10x mod 25, d1 ≡ −10 mod 25,
s1 ≡ −2x + 9 mod 25, t1 ≡ 2x + 9 mod 25,
2
e2 ≡ 50x − 50 mod 625,
q2 ≡ −100x mod 625, r2 ≡ 175x + 175 mod 625,
2
g2 ≡ x − 183x + 182 mod 625, h2 ≡ x2 + 183x + 182 mod 625,
b2 ≡ 125x2 + 150 mod 625,
c2 ≡ −250x mod 625, d2 ≡ 200x + 100 mod 625,
s2 ≡ −202x − 91 mod 625, t2 ≡ 202x − 91 mod 625. ✸
that p ∤ u. Now
Since p is not a zero divisor, we have p | pl−i | (g∗ v + hu). We denote by a bar the
reduction modulo p. Then sg + th = 1, g∗ = g, and g∗ v + hu = 0. Thus
and hence g | u. Since lc(g) = lc(g∗ ) and deg g = deg g∗ , we have deg u < deg g.
Since lc(g) = lc(g) is not a zero divisor, neither is g, and u is the zero polynomial.
This contradicts our assumption that p ∤ u, and the claim is proved. ✷
C OROLLARY 15.15.
Let R be a Euclidean domain, p ∈ R prime, l ∈ N>0 , f , g, u ∈ R[x] nonzero such that
p ∤ lc( f ), f mod p is squarefree, g divides f in R[x], and u is monic, nonconstant,
and divides f modulo pl and g modulo p. Then u divides g modulo pl .
There is also an infinite version of Hensel’s lemma. Let p ∈ R be prime and R(p)
the p-adic completion of R (Section 9.6). If R = Z, then this is the ring Z(p) of
p-adic integers, whose elements can be represented by “power series in p” of the
form ∑i≥0 ai pi with 0 ≤ ai < p for all i ∈ N. If R = F[y] for a field F and p = y,
then R(p) = F[[y]] is the ring of formal power series in y with coefficients in F.
The general multivariate Newton iteration works as follows. One has n functions
ϕ = (ϕ1 , . . . , ϕn ) in n variables y1 , . . . , yn , and from an approximation a ∈ R n to a
450 15. Hensel lifting and factoring polynomials
1. if r = 1 then compute f1∗ ∈ R[x] with f ≡ lc( f ) f1∗ mod pl and return f1∗
2. k ←− ⌊r/2⌋, d ←− ⌈log2 l⌉
4. compute s0 ,t0 ∈ R[x] such that s0 g0 + t0 h0 ≡ 1 mod p, deg s0 < deg h0 , and
degt0 < deg g0 , using the Extended Euclidean Algorithm if R/hpi is a field
and Exercise 15.29 otherwise
5. for j = 1, . . . , d do
j−1
6. call the Hensel step algorithm 15.10 with m = p2 to lift the con-
j−1
gruences f ≡ g j−1 h j−1 and s j−1 g j−1 + t j−1 h j−1 ≡ 1 modulo p2 to
j
congruences f ≡ g j h j and s j g j + t j h j ≡ 1 modulo p2
7. g ←− gd , h ←− hd
15.5. Multifactor Hensel lifting 451
T HEOREM 15.18.
Algorithm 15.17 works correctly as specified.
(i) If R = Z, p ∈ N is prime, || f ||∞ < pl , and || fi ||∞ < p for all i, then the algo-
rithm takes
O (M(n)M(l µ) + M(n) log n · M(µ) + n M(µ) log µ) log r
We may assume that r is a power of two. Steps 8 and 9 take T (n1 , . . . , nk ) and
T (nk+1 , . . . , nr ) word operations, respectively. Adding costs leads to the recursive
inequalities
If we balance degrees in step 2, as discussed in Section 10.1, then the factor log r
in the timing estimate above may be replaced by the entropy H(n1 /n, . . . , nr /n),
where ni = deg fi for 1 ≤ i ≤ r.
g ≡ f1 f2 ≡ x2 + 2x + 2 mod 5, h ≡ f3 f4 ≡ x2 − 2x + 2 mod 5
in step 3, and the Extended Euclidean Algorithm in F5 [x] yields s = −2x − 1 and
t = 2x − 1 in step 4. In Example 15.13, we have already performed the computa-
tions of steps 5 and 6, and we find g = x2 − 183x + 182 and h = x2 + 183x + 182 in
step 7. In steps 8 and 9, we recursively lift the factorizations
to factorizations
g ≡ f1∗ f2∗ = (x − 1)(x − 182) mod 625, h ≡ f3∗ f4∗ = (x + 182)(x + 1) mod 625.
Thus f ≡ (x − 1)(x − 182)(x + 182)(x + 1) mod 625, and the fourth roots of unity
modulo 625 = 54 are ±1 and ±182. ✸
15.6. Factoring using Hensel lifting: Zassenhaus’ algorithm 453
10. s ←− s + 1
11. return G ∪ { f ∗ }
There are several ways to find suitable primes in step 2: we might try the small
primes 2, 3, 5, . . . one after the other, or we might use a single precision prime just
below the processor’s word length from a precalculated list. Both approaches work
well in practice, but do not yield a generally valid result since for some particular
input all primes from any fixed list might divide the discriminant. Another alterna-
tive which provably works is to choose p randomly; the required number theoretic
arguments will be discussed in Section 18.4.
T HEOREM 15.20.
Algorithm 15.19 works correctly. We have γ ∈ O(n log(nA)), and the expected
cost of steps 2 and 3 is
2 2
O γ log γ loglog γ + (M(n ) + M(n) log γ ) log n · M(log γ ) loglog γ
P ROOF. The correctness proof of Theorem 15.3 carries over with the following
modifications. We replace the congruence in (2) by f ∗ ≡ b ∏i∈T gi mod pl . In
one part of that proof we assume that the condition in step 9 is false for all sub-
sets S ⊆ T of cardinality s, but that f ∗ has an irreducible factor g ∈ Z[x] with
µ(g) = s, and the fact that F p [x] is a UFD yields a set S ⊆ T of cardinality s
such that the condition in step 9 is true for that particular subset. Now Z pl [x] is
not a UFD in general (it even has nonzero zero divisors), and we have to replace
the argument by unique factorization in F p [x] plus an appeal to the uniqueness of
Hensel lifting (Theorem 15.14). Namely, let h = f ∗ /g and S ⊆ T with #S = s
be such that lc(h)g ≡ b ∏i∈S hi mod p and lc(g)h ≡ b ∏i∈T \S hi mod p. Now for
that same subset S, let g∗ ≡ b ∏i∈S gi mod pl and h∗ ≡ b ∏i∈T \S gi mod pl . Thus
b f ∗ ≡ lc(h)g · lc(g)h mod pl and b f ∗ ≡ g∗ h∗ mod pl are both liftings of the same
factorization of b f ∗ modulo p, and the uniqueness of Hensel lifting (Theorem
15.14) implies that lc(h)g ≡ g∗ mod pl and lc(g)h ≡ h∗ mod pl . Now B < pl /2,
by the choice of l, and as in the proof of Theorem 15.3, we arrive at the contradic-
tion that ||g∗ ||1 ||h∗ ||1 ≤ B holds in step 9.
Corollary 18.12 says that with O(γ log2 γ loglog γ ) word operations, we can find
a random p in step 2, and that p divides disc( f ) with probability at most 1/2.
15.6. Factoring using Hensel lifting: Zassenhaus’ algorithm 455
Now b | disc( f ), by Exercise 6.41, and Lemma 15.1 implies that the expected
number of iterations of step 2 is at most two. The length of p is in O(log γ ), the
cost for reducing all coefficients of f modulo p is O(n log A · log γ ) word oper-
ations, and the gcd takes O(M(n) log n · M(log γ ) loglog γ ) word operations. The
cost estimate for step 4 follows from Theorem 15.18 with µ = log p ∈ O(log γ ),
using log n log γ ∈ O(n + log A) and log γ loglog γ ∈ O(n + log A), and the rest of
the analysis is as in the proof of Theorem 15.3. ✷
in step 8 and obtain ||g∗ ||1 ||h∗ ||1 ≤ B and g∗ h∗ = b f ∗ in step 9. Thus pp(g∗ ) =
3x2 + x + 1 and pp(h∗ ) = 2x2 + x + 4 are the irreducible factors of f in Z[x], as in
Example 15.4. ✸
Steps 2 and 3 of the big prime algorithm 15.2 take about O∼ (n3 + log3 A) word
operations, while the cost for the corresponding steps 2 through 4 in the prime
power algorithm 15.19 is only about O∼ (n2 + n log A) word operations. If n ≈
log A, then the former is roughly cubic in n, while the latter is only quadratic.
Like Algorithm 15.2, Zassenhaus’ algorithm has exponential running time in the
worst case. Nevertheless, the algorithm works well in practice and should be used
in the complete factorization algorithm 15.5; this is confirmed by experiments in
Section 15.7. Collins (1979) showed, under a plausible but unproven hypothesis,
that the algorithm uses polynomial time “on the average”.
There is one further advantage of Zassenhaus’ algorithm over the big prime
approach: since the Mignotte bound determining l is usually far too large, we
may interleave Hensel lifting and factor combination, as follows. We first lift the
∗
factorization modulo pl for some l ∗ < l, then check whether some of the modular
factors are true factors in Z[x], remove these from f , and then lift the remaining
456 15. Hensel lifting and factoring polynomials
factorization modulo some higher power of p. This is iterated until all factors of f
are found, which—if we are lucky—may happen before pl is reached. A natural
choice for these l ∗ are consecutive powers of 2, starting with the smallest such l ∗
which is at least || f ||∞ .
If we are only interested in computing all integral or rational roots of a given
polynomial f ∈ Z[x], then we can greatly simplify Zassenhaus’ algorithm 15.19.
We have already discussed an approach via factoring modulo a big prime in Sec-
tion 14.5. For the prime power approach, we modify Algorithm 15.19 in the fol-
lowing respects. Firstly, since we are only interested in linear factors, we may
replace the bound 2B by 2nb(A2 + A) if the latter is smaller, as in the big prime
algorithm 14.17. Secondly, we need only compute the linear factors of f modulo p
in step 3 (executing the distinct-degree and equal-degree factorization only for de-
gree 1), and finally—and most importantly—the whole factor combination stage
may be replaced by a simple check whether the linear factors modulo pl are linear
factors in Z[x].
T HEOREM 15.21.
Given a nonconstant squarefree primitive polynomial f ∈ Z[x], we can compute
all its rational roots with an expected number of
O n log(nA)(loglog A)2 logloglog A
+ M(n) log n · M(log(n log A)) log(n log A) loglog(n log A) + n2 M(log(nA))
P ROOF. Let γ = logC ∈ O(n log(nA)). The expected cost for step 2 of Algorithm
15.19, modified as described above, is
word operations, as in the proof of Theorem 15.20. The expected cost for com-
puting all (monic) linear factors of f mod p in step 3 is O(M(n) log n log(nγ ))
arithmetic operations in F p , by Corollary 14.16, or
word operations. We take all monic linear factors plus the remaining monic co-
factor of f mod p as inputs to the Hensel lifting in step 4. The cost for step 4
is
O (M(n)M(log(nA)) + M(n) log n · M(log γ ) + n M(log γ ) loglog γ ) log n
15.6. Factoring using Hensel lifting: Zassenhaus’ algorithm 457
word operations, by Theorem 15.18 with µ = log p ∈ O(log γ ). For each linear
factor bx − c ∈ Z[x] dividing f modulo pl , with b = lc( f ), we compute the cor-
responding cofactor v ∈ Z[x] such that (bx − c)v ≡ b f mod pl in step 8, at a cost
of O(n M(log(nA))) word operations, as in the proof of Theorem 14.18. There
are at most n modular factors, and hence the total cost for checking all of them is
O(n2 M(log(nA))) word operations. We obtain an overall cost of
O(γ log2 γ loglog γ + M(n) log n log(nγ ) M(log γ ) loglog γ + n2 M(log(nA)))
word operations, and the claim follows. ✷
Due to the checking in the last step, the overall asymptotic running time of the
prime power algorithm for root finding is about the same as for the big prime
variant (Algorithm 14.17). However, the modular factorization stage of the former
takes about O∼ (n log2 A) word operations, while the cost for the modular factoriza-
tion stage, including Hensel lifting, is only about O∼ (n log A) for the prime power
algorithm. In practice, one would expect that most of the “false” roots exceed the
trailing coefficient in absolute value, and would test the remaining trial roots mod-
ulo some other small primes first; this should rule out most of those which are not
roots in Q. The hope is that then there remain only few false roots.
With minor changes, Zassenhaus’ algorithm 15.19 can be adapted to factor bi-
variate polynomials over a field F with effective univariate factorization, so that
we know how to factor univariate polynomials over F (for example, F = Q or
F = Fq for a prime power q). The degree in y of a polynomial f ∈ F[x, y] plays the
role of the max-norm, and a bound for possible factors of f is much simpler than
in the integer case: divisors of f never have larger degree than f does. “Primitive”
now is with respect to the variable x, so that contx ( f ) = 1. Moreover, we require
f to have a trivial gcd with its derivative with respect to x. This implies that f is
squarefree. The converse is true in characteristic zero; see Exercise 15.25 for a
counterexample in positive characteristic.
3. { modular factorization }
use the univariate factorization algorithm to compute a factorization f ≡
bh1 · · · hr mod (y − u) in (F[y]/hy − ui)[x] ∼
= F[x] with distinct monic irre-
ducible h1 , . . . , hr ∈ F[x]
4. { Hensel lifting }
call Algorithm 15.17 to compute a factorization f ≡ bg1 · · · gr mod (y − u)l
with polynomials g1 , . . . , gr ∈ F[x, y] that are monic with respect to x such
that degy gi < l and gi (x, u) = hi for 1 ≤ i ≤ r
5. { initialize the index set T of modular factors still to be treated, the set G of
factors found, and the polynomial f ∗ still to be factored }
T ←− {1, . . . , r}, s ←− 1, G ←− Ø, f ∗ ←− f
6. { factor combination }
while 2s ≤ #T do
11. return G ∪ { f ∗ }
T HEOREM 15.23.
Algorithm 15.22 works correctly. The expected cost of step 2 is O(nd +M(n) log n)
or O∼ (nd) arithmetic operations in F , and step 4 takes O(M(n) log n(log n + M(d))
or O∼ (nd) operations. The number of field operations for one iteration of steps
8 and 9 is O((n log d + M(n) log n)M(d)) or O∼ (nd), and there are at most 2n+1
iterations. If F = Fq is a finite field with q elements, then the expected number of
operations in Fq for step 3 is O(M(n2 ) log n + M(n) log n log q) or O∼ (n2 + n log q).
with a worse cost estimate for the modular factorization stage. Or we may factor
f modulo a nonlinear irreducible m ∈ Fq [y] of degree O(log(nd)) in step 3 instead
of modulo y − u, and lift this to a factorization modulo a sufficiently high power
of m in step 4. This increases the timings of steps 2 and 3 by a factor of at most
O∼ (log2 (nd)). Or we perform a field extension of degree O(log(nd)) and employ
the algorithm over the larger field, thereby multiplying a factor of O∼ (log2 (nd)) to
all timings. However, irreducible factors of f in Fq [x, y] may split over the larger
field. In some applications, such a finer factorization may be advantageous, but
if factors over Fq are needed, one has to take care of this separately if necessary.
For example, if g ∈ Fqt [x, y] is an irreducible factor of f over the larger field Fqt ,
t
then g(q −1)/(q−1) ∈ Fq [x, y] (the norm of g) is a power of an irreducible factor of f
over Fq .
To factor an arbitrary polynomial f ∈ F[x, y], we proceed similarly as described
after Algorithm 15.2, with one notable difference: If F = Fq is a finite field, then
Fq (y) is not a perfect field, and we cannot literally use the algorithms of Section
14.6 for squarefree factorization. For example, if p = char Fq , then the polynomial
f = x p − y is irreducible, but has derivative ∂ f /∂x = 0. If we exchange the roles
of x and y, then ∂ f /∂y = −1 and gcd( f , ∂ f /∂y) = 1 in Fq (x)[y], and the algorithm
can be applied. If both partial derivatives ∂ f /∂x and ∂ f /∂y vanish, as for f =
x p − y p , then f is a pth power, as in the univariate case (here f = (x − y) p ), and it is
sufficient to factor f 1/p . Here is the analog of Algorithm 15.5; see Exercise 15.25
for a correctness proof.
5. for all v ∈ V ∪W do
determine the multiplicity e of v in g by trial division
g
G ←− G ∪ {(v, e)}, g ←− e
v
6. if F is finite of characteristic p and g 6∈ F then
call the algorithm recursively with input g1/p , yielding a ∈ F and a
set of pairs H
a ←− a p
for each pair (g, e) ∈ H do G ←− G ∪ {(g, ep)}
else a ←− g
7. return a and G
computation
big prime small primes prime power
problem
Algorithm 5.10,
determinant Section 5.5
Exercise 5.32
linear system
Exercise 5.33
solving
Algorithms EZ-GCD
polynomial gcd Algorithms 6.36, 6.38
6.28, 6.34 (Notes 15.6)
polynomial EEA Algorithms 6.57, 6.59
integer
Exercise 8.36 Algorithm 8.25
multiplication
polynomial Algorithms 8.16, 8.20,
multiplication Exercise 5.34
polynomial
Exercise 9.14 Exercise 10.21 Algorithm 9.3
division
roots of integers Section 9.5
root finding Algorithm 14.17 Theorem 15.21
squarefree
Exercises 15.26, 15.31 Exercise 15.27
decomposition
polynomial Algorithms
Algorithm 15.2
factorization 15.19, 15.22, 16.22
For all problems listed, there exist big prime algorithms, and in most cases also
small primes and prime power algorithms, although we did not discuss all of them.
Usually the small primes or prime power variants are more efficient, both in theory
and in practice, than either the big prime approach or a direct computation. To ap-
ply the prime power approach, the computational problem needs to be described
15.7. Implementations 461
in terms of “equations” which are then lifted. Some problems are badly suited for
the small primes approach, such as polynomial factorization.
15.7. Implementations
In this section, we continue the description of N TL and B I P OL A R from Sec-
tion 9.7, the focus now being on polynomial factorization. All experiments with
N TL were done in version 1.5.
B I P OL A R is a C++ library for polynomial factorization in F2 [x]. It contains
algorithms for squarefree decomposition (Section 14.6), distinct-degree factoriza-
tion (Section 14.2), and equal-degree factorization (Section 14.3). Experiments
show that for random polynomials, the distinct-degree factorization stage is by far
the dominant part of the whole computation.
Both in theory and in practice, a gcd computation is more costly than a multi-
plication or a division with remainder. Since a polynomial of degree n does not
have irreducible factors of all degrees between 1 and n (on average, the number
of irreducible factors is O(log n); see Notes 15.7), most gcds in the distinct-degree
factorization algorithm 14.3 are equal to 1. B I P OL A R uses a blocking strategy to
reduce the number of gcds, at the expense of additional—but cheaper—modular
multiplications. The range {1, . . . , n} of possible degrees for the irreducible factors
of f is partitioned into disjoint intervals, and then for each interval I—proceeding
from lower to higher degrees—the
product
of all irreducible factors with degree
qi
in I is obtained as gcd f , ∏i∈I (x − x) and removed from f (Exercise 14.6). If
we are lucky, then each of these polynomials is already irreducible; otherwise a
fine distinct-degree factorization à la Algorithm 14.3 is performed. Experiments
with random polynomials show that on average, this is only necessary for small
degrees. In B I P OL A R, the intervals grow linearly in size; this takes into account
the fact that random polynomials tend to have many small but few large factors.
Every time a factor is split off, an irreducibility test (similar to Algorithm 14.36)
for the remaining polynomial is started up on a second processor. Much of the data
required for this test has already been computed in the distinct-degree factorization
phase.
Figure 15.6 gives some running times with B I P OL A R from 1998 on two Sun
Sparc Ultra 1 computers rated at 167 MHz each; one running the distinct-degree
algorithm described above, and the second one performing the irreducibility test
in parallel. The timings are the maximum of the CPU times on both machines. An
implementation on a parallel machine can factor polynomials in F2 [x] of degree
over one million (Bonorden, von zur Gathen, Gerhard, Müller & Nöcker 2001).
The abort degree is the degree to which the distinct-degree factorization had
progressed when the computation was finished. It is printed in bold if the irre-
ducibility test won; otherwise the distinct-degree factorization terminated when
462 15. Hensel lifting and factoring polynomials
5’ 3818 12616
6’ 4724 10002
degree about max{m1 /2, m2 } was reached, where m1 , m2 are the degrees of the
largest and the second largest factor of the input polynomial, respectively.
For factoring in F p [x] for a prime p and in Q[x], N TL first computes the square-
free decomposition. Over finite prime fields, both the algorithms of Berlekamp
(1967,1970) and Shoup (1995) (who also does distinct- and equal-degree factor-
ization) are implemented. For efficiency reasons, there are special factoring sub-
routines for “small” primes p that fit into one machine word. Figure 15.7 shows
some running times in N TL for various primes and degrees. The timings are aver-
ages for 10 pseudorandomly chosen inputs.
To factor a squarefree polynomial with integral coefficients in Q[x], N TL first
computes—after extracting its content—its irreducible factorizations modulo sev-
eral “small” primes, thus (hopefully) gaining some information about the fac-
torization pattern in Q[x], as described at the end of Section 15.3. Then one of
these primes is selected and the factorization is lifted using a variant of Algorithm
15.17. For each product in the factor combination stage, the divisibility test is
only executed after checking that the degree of the product is compatible with all
modular factorization patterns, and then performing the constant coefficient test.
15.7. Implementations 463
30
25
20
CPU minutes
15
10
n = 64
5
n=k
k = 64
30
binomial
25 random
20
CPU minutes
15
10
F IGURE 15.8: Factoring xn−1 − 1 in Z[x] in NTL (green crosses). The timings depend
highly on the factorization of n − 1. The test series is for n = 32, 64, 96, . . . , 2048. The
running times for the seven values n = 704, 1024, 1248, 1376, 1408, 2016, and 2048 were
above four hours, in five of these cases even over one day. For comparison, we have
included the average timings for 10 pseudorandom polynomials of degree n − 1 with coef-
ficients in {−1, 0, 1} (blue curve).
464 15. Hensel lifting and factoring polynomials
30
25
20
CPU minutes
15
10
n = 64
5
n=k
k = 64
30
1 random factor
25 2 random factors
4 random factors
8 random factors
20 16 random factors
CPU minutes
15
10
F IGURE 15.10: Factoring polynomials of degree about n with about n-bit integer coef-
ficients in NTL. The green curve is for pseudorandom polynomials, the blue curve for
products of two pseudorandom polynomials of degree (n/2) − 1 with about n/2-bit coeffi-
cients, and so on.
Notes 465
Figures 15.8 through 15.10 give some running times in N TL for pseudorandom
polynomials with various degrees, coefficient sizes, and factorization patterns. The
timings are averaged over 10 pseudorandomly chosen inputs. Since random poly-
nomials are irreducible with high probability (see Notes 15.3), we also took poly-
nomials with a designed number of pseudorandom factors of the same degree as
input; in fact, these were the irreducible factors in almost all cases.
Figure 15.10 indicates that the algorithms take longer if there are more factors,
as expected. For a fixed input size n · k, the “diagonal” case n = k appears to
be the most favorable to the implementation, as in Figure 15.9. A comparison
to Figure 15.7 indicates that the software factors random polynomials with k-bit
coefficients over the integers much faster than modulo k-bit primes. (A major dif-
ference between the two factorization tasks is that random polynomials of degree
n are irreducible over Q with high probability, but modulo a prime only with prob-
ability about 1/n.) Presumably the reason is that the modular factorization stage,
which takes about O∼ (n2 log p + n log2 p) word operations, uses a small prime p
with only about log2 k bits when factoring over Z by Hensel lifting.
Notes. 15.1. The first factoring algorithms for Z[x] are due to von Schubert (1793),
Gergonne (1822), and Kronecker (1882, 1883). In our terminology, they use a “small
primes approach” with linear moduli x − ui for small integers ui , plus factor combination.
It involves the factorization of large integers and is utterly impractical.
15.2. Algorithm 15.2 appears in Musser (1971). Our algorithms become somewhat simpler
when the input is monic. One can reduce the general case to this, replacing f ∈ Z[x] by
lc( f )n−1 f (x/ lc( f )), but this is computationally disadvantageous.
In our approach, factors are selected according to the number of irreducible modular
factors that they comprise. One might also consider selecting them according to their
degree, but Collins (1979) argues that this is disadvantageous. Collins & Encarnación
(1996) and Abbott, Shoup & Zimmermann (2000) propose some heuristic techniques for
the factor combination stage.
Arjen Lenstra (1984, 1987) gives algorithms for factoring polynomials over algebraic
number fields.
15.3. The Swinnerton-Dyer polynomials were suggested by H. P. F. Swinnerton-Dyer, as
Berlekamp (1970) mentions. Kaltofen, Musser & Saunders (1983) investigate generaliza-
tions and also give bounds on the coefficient sizes.
Frobenius (1896) had found his theorem in 1880, and Chebotarev (1926) generalized it
in the following way, as already conjectured by Frobenius. In the full symmetric group Sn ,
all permutations with the same cycle structure form a conjugacy class, but this may not be
true in other Galois groups. As an example, the two 4-cycles in the Galois group of f2 in
Table 15.3 are not conjugate within that group. But still each set of permutations with the
same cycle structure is a union of conjugacy classes. While Frobenius’ theorem refers to
cycle structure (and primes with that factorization pattern), Chebotarev’s result proves the
corresponding density estimate for the finer division into conjugacy classes (and primes
whose Frobenius automorphism lies within that conjugacy class).
Van der Waerden (1934) proved that random integer polynomials have the full sym-
metric group as their Galois group with probability 1. In particular, they are probably
466 15. Hensel lifting and factoring polynomials
irreducible. See also Dörge (1926). Quantitative estimates are in Gallagher (1973). Wilf
(1994), §4.1, proves that the average number of cycles of a random permutation on n letters
is Hn ∈ ln n + O(1), where Hn = 1 + 21 + · · · + 1n is the nth harmonic number (Section 23.2).
15.4. Legendre (1785) factors some integer polynomials by a p-adic method. One of his
examples is in Exercise 15.11. Another example gives two cubic factors of a polynomial
of degree six. But he does not state a general method, and cautions the reader (pages
506/507): Ces méthodes sont fort imparfaites, mais l’utilité de leur objet nous a engagés à
les insérer ici, quelque petit que soit le nombre des cas où on peut s’en servir avec succès.1
Hensel (1918) introduced the p-adic numbers, and his factoring method Hensel lift-
ing was first used in a computer algebra context by Zassenhaus (1969); see also Kempfert
(1969). As with so many topics of this book, Gauß had preempted them all. In his Nachlass
(Gauß 1863a, 1863b, see page 372) we find in articles 373 and 374 an explicit description
of the lifting procedure modulo prime powers, and Gauß concludes si functio X aequales
non habeat divisores secundum modulum p, eam secundum modulum pk similiter in fac-
tores discerpi posse, uti secundum modulum p. At si X divisores aequales habeat, res fit
multo magis complicata neque adeo ex principiis praecedentibus prorsus exhauriri potest.2
He then even considers the case where multiple factors modulo p may occur, but the cal-
culations in his manuscript end dramatically in the middle of an equation.
Indeed, an integer polynomial f that is squarefree modulo a prime p has a unique fac-
torization modulo any power pk , and this is easily computed by Hensel lifting. But when
f is not squarefree modulo p, this factorization is quite tricky. Hensel lifting reduces the
problem to the case where f is a power of an irreducible polynomial modulo p. There
may be exponentially (in deg f ) many irreducible factors, but a representation of them can
still be computed in polynomial time (in deg f , log p, and k) if the discriminant is nonzero
modulo pk (von zur Gathen & Hartlieb 1998). This is based on polynomial-time factoring
algorithms over the p-adic integers Z(p) ; see Chistov (1990) and Cantor & Gordon (2000).
Exercise 15.18 shows how factoring the innocuous polynomial x modulo a composite num-
ber is a nontrivial task.
15.5. Victor Shoup has implemented a somewhat more efficient variant of Algorithm 15.17
in N TL. It is described in the 1999 edition of this book. A different approach to lift factors
simultaneously is in von zur Gathen (1984a).
15.6. The prime power factorization algorithm 15.19, based on Hensel lifting, is essentially
due to Zassenhaus (1969). Loos (1983) gives an algorithm based on Hensel lifting for
computing all rational roots of an integer polynomial.
A conceptually simple way to factor bivariate polynomials over small finite fields is to
make an extension of prime degree larger than deg( f ); then all factors over the extension
are actually in the ground field (von zur Gathen 1985). However, this approach is compu-
tationally inferior to the other solutions presented.
Trager (1976) shows that if F ⊆ E is a finite Galois extension of fields, f ∈ F[x] is
irreducible, and g ∈ E[x] is an irreducible factor of f , then the norm of g is a power of f .
Moses & Yun (1973) propose the EZ-GCD algorithm for computing gcds in Z[x] and
rings of multivariate polynomials via Hensel lifting; see also Lauer (2000). Yun (1976)
1 These methods are quite imperfect, but the importance of their goal induced us to insert them here, however
small the number of cases may be where they can be used successfully.
2 If the polynomial X is squarefree modulo p, then it factors modulo pk in the same way as modulo p. But if X
has multiple factors, the task is much more complicated and cannot even be solved by the preceding principles in
a straightforward manner.
Exercises 467
Exercises.
15.1 (i) Prove Eisenstein’s theorem: If f ∈ Z[x] and p ∈ N is a prime number such that p ∤ lc( f ),
p divides all other coefficients of f , and p2 ∤ f (0), then f is irreducible in Q[x].
(ii) Conclude that for any n ∈ N, the polynomial xn − p is irreducible in Q[x].
15.2 Trace Algorithm 15.2 on factoring f = 30x5 + 39x4 + 35x3 + 25x2 + 9x + 2 ∈ Z[x]. Choose the
prime p = 5003 in step 2.
15.3 Here are the irreducible factorizations of the monic polynomial f ∈ Z[x] of degree 6 modulo
some small primes:
What can you say about the degrees of the irreducible factors of f in Z[x]?
15.4−→ Compute the coefficients of the Swinnerton-Dyer polynomial
√ √ √ √ √ √ √ √
f = (x + −1 + 2)(x + −1 − 2)(x − −1 + 2)(x − −1 − 2) ∈ Z[x]
15.9∗∗ Let Fq be a finite field with q elements, for an odd prime power q, let x, y be indeterminates
over Fq , and
√ p √ p √ p √ p
f = (x + y + y + 1)(x + y − y + 1)(x − y + y + 1)(x − y − y + 1).
Show that f ∈ Fq [x, y] and that f is irreducible, but that f (x, u) ∈ Fq [x] splits into at least two factors
for all u ∈ Fq .
15.10∗ A partition of a positive integer n is a sequence λ = (λ1 , . . ., λr ) of positive integers such
that λ1 ≥ · · · ≥ λr and n = λ1 + · · · + λr ; r is the length of the partition. For example, if F is a
field and f ∈ F[x] of degree n, then the factorization pattern of f , with the degrees of the factors in
descending order, is a partition of n.
If λ = (λ1 , . . ., λr ) and µ = (µ1 , . . ., µs ) are two partitions of n, then we say that λ is finer than
µ and write λ ≼ µ if there is a surjective map σ: {1, . . ., r} −→ {1, . . ., s} such that µi = ∑_{σ(j)=i} λj
for all i ≤ s. For example, λ = (4, 2, 1, 1) is finer than µ = (5, 3), as furnished by σ(1) = σ(3) = 1
and σ(2) = σ(4) = 2. (The function σ need not be unique.) In particular, (n) is the coarsest and
(1, 1, . . ., 1) is the finest partition of n.
(i) Prove that if f ∈ Z[x] has degree n and p ∈ N is prime, µ is the factorization pattern of f in
Q[x], and λ is the factorization pattern of f mod p in Fp[x], then λ ≼ µ.
(ii) Show that λ ≼ λ, that λ ≼ µ and µ ≼ λ imply λ = µ, and that λ ≼ µ ≼ ν =⇒ λ ≼ ν holds for all partitions λ, µ, ν
of n, so that ≼ is a partial order on the set of all partitions of n.
(iii) Enumerate all partitions of n = 8, and draw them in form of a directed graph, with an edge
from λ to µ if µ is a direct successor of λ with respect to the order ≼, so that λ ≼ µ, λ ≠ µ, and
λ ≼ ν ≼ µ =⇒ λ = ν or ν = µ for all partitions ν.
(iv) Use (iii) to show that there exist partitions λ, µ of 8 that do not have a supremum with respect
to ≼. (Thus the partitions do not form a “lattice” in the sense of order theory, not to be confused with
the Z-module lattices in Chapter 16.)
(v) Let a_{n,r} denote the number of partitions of n of length r. Thus a_{n,1} = a_{n,n} = 1 and a_{n,r} = 0
for 1 ≤ n < r. Prove the recursion formula a_{n,r} = ∑_{1≤j≤r} a_{n−r,j} for 1 ≤ r < n. Calculate a_{n,r} for
1 ≤ r ≤ n ≤ 8, and compare with your results from (iii).
15.11 (Legendre 1785, p. 490) Let f = x^3 − 292x^2 − 2 170 221x + 6 656 000 ∈ Z[x]. Find 13-adic
linear factors x − ai with f rem (x − ai ) ≡ 0 mod 13^{2^i} for i = 0, 1, 2, starting with a0 = 0.
15.12 Prove Lemma 15.9 (ii).
15.13 Suppose that the monic polynomial f ∈ Z[x] has degree 8, and p is a prime so that f mod
p = g1 g2 g3 factors into three irreducible and pairwise coprime polynomials g1 , g2 , g3 ∈ F p [x] with
deg g1 = 1, deg g2 = 2, and deg g3 = 5.
(i) What can you say about the possible factorizations of f modulo p^100 ?
(ii) What can you say about the possible factorizations of f in Q[x]?
(iii) Suppose q is another prime for which f mod q = h1 h2 with h1 , h2 ∈ Fq [x] irreducible and
deg h1 = deg h2 = 4. What can you say about the possible factorizations of f in Q[x], using all this
information?
15.14 Let f = x^15 − 1 ∈ Z[x]. Take a nontrivial factorization f ≡ gh mod 2 with g, h ∈ Z[x] monic
and of degree at least 2. Compute g∗ , h∗ ∈ Z[x] such that
Show your intermediate results. Can you guess some factors of f in Z[x]?
15.15 Let f = 14x^4 + 15x^3 + 42x^2 + 3x + 1 ∈ Z[x].
(i) Find a suitable prime p ∈ N such that f mod p is squarefree and has degree 4.
(ii) Compute the irreducible factorization of f mod p in F p [x]. Choose two factors g, h ∈ Z[x] that
are coprime modulo p such that h is monic and irreducible modulo p and f ≡ gh mod p. Determine
s,t ∈ Z[x] with sg + th ≡ 1 mod p.
(iii) Execute two successive Hensel steps (Algorithm 15.10 for m = p and m = p^2 ) to obtain a
factorization f ≡ g∗ h∗ mod p^4 with g∗ ≡ g mod p and h∗ ≡ h mod p. Can you derive a factorization
of f in Q[x] from it?
15.16−→ Consider the polynomial
f = x^5 + (3y^3 + 39y^2 + 50y + 28) x^4 + (36y^5 + 2y^4 + 47y^3 + 63y^2 + 49y + 58) x^3
+ (91y^6 + 18y^5 + 81y^4 + 37y^3 + 36y^2 + 53y + 64) x^2
+ (74y^7 + 54y^6 + 24y^5 + 39y^4 + 71y^3 + 18y^2 + 93y + 53) x
+ 62y^6 + 72y^5 + 87y^4 + 27y^3 + 19y^2 + 61y ∈ F97 [x, y].
(i) Compute a factorization f ≡ gh mod y with coprime nonconstant polynomials g, h ∈ F97 [x],
and polynomials s,t ∈ F97 [x] with sg + th = 1.
(ii) Execute two successive Hensel steps (Algorithm 15.10 with m = y and m = y^2 ) to obtain
polynomials g∗ , h∗ ∈ F97 [x, y] such that f ≡ g∗ h∗ mod y^4 , g∗ ≡ g mod y, and h∗ ≡ h mod y.
15.17 Complete the proof of Theorem 15.11.
15.18 (Shamir 1993) Let N = p · q be the product of two distinct primes p, q.
(i) Show that u = p^2 + q^2 is a unit in Z_N , that is, u ∈ Z_N^× .
(ii) Verify the factorization x ≡ u^{−1} (px + q)(qx + p) mod N.
(iii) Prove that the two linear factors in (ii) are irreducible in ZN [x]. Hint: CRT.
15.19∗ Let N = p1 · · · ps be a product of s distinct primes, and f ∈ ZN [x] be monic and squarefree.
(i) Let g1 ∈ Z p1 [x] be irreducible, and g ∈ ZN [x] with g ≡ g1 mod p1 and g ≡ 1 mod pi for i ≥ 2.
Prove that g is irreducible in ZN [x].
(ii) Assume that we have factored f modulo each pi . Determine the factorization of f into irre-
ducible polynomials in ZN [x]. How many irreducible factors are there, in terms of the numbers of
irreducible factors modulo each pi ?
(iii) How many irreducible factors does x^3 − x have modulo 105? Find four of them.
15.20 This exercise discusses a variant of Hensel lifting with linear convergence.
(i) One step works as follows. In Algorithm 15.10, we have an additional input p ∈ R, and the
congruence sg + th ≡ 1 should hold modulo p instead of m. In step 1, we perform all computations
modulo mp instead of m2 , and step 2 is omitted completely. Prove that then the output specifications
of Algorithm 15.10 for f , g∗ , h∗ hold if m2 is replaced by mp.
(ii) Now we start with a factorization of f , including the polynomials s,t, as specified in Algorithm
15.10 for m = p, and want to compute a factorization modulo p^l for some l ∈ N. Show that for
R = Z, this takes O(M(n)M(l log p) log l) word operations when using the quadratic lifting algorithm
15.10 for m = p, p^2 , p^4 , p^8 , . . ., and O(M(n)M(l log p) l) word operations when using the linear lifting
algorithm from (i) for m = p, p^2 , p^3 , p^4 , . . ..
In fact, by employing fast lazy multiplication techniques (van der Hoeven 1997), the cost for linear
Hensel lifting can be reduced to O∼ (nl log p) as well (Bernardin 1998, private communication).
15.21 Let R be a ring (commutative, with 1) and f ∈ R[x], g = ∑_{0≤i≤m} gi x^i , h = ∑_{0≤i≤k} hi x^i be poly-
nomials such that n = deg f = m + k ≥ 1 and lc( f ) = gm hk . Regarding the n coefficients g0 , . . ., gm−1 ,
h0 , . . ., hk−1 as indeterminates, we define n polynomials ϕ0 , . . ., ϕn−1 in these indeterminates by let-
ting ϕi be the coefficient of x^i in gh − f for 0 ≤ i < n.
(i) Prove that the Jacobian J ∈ R[g0 , . . ., gm−1 , h0 , . . ., hk−1 ]^{n×n} , whose ith row comprises the par-
tial derivatives ∂ϕi /∂h j and ∂ϕi /∂g j , is precisely the Sylvester matrix of g and h.
(ii) Conclude that for specific values of the coefficients of g, h and a given p ∈ R such that lc( f )
is a unit modulo p, there exist s,t ∈ R[x] such that sg + th ≡ 1 mod p if and only if J is invertible
modulo p. Hint: Exercise 6.15.
15.22 Prove the running time estimate of Theorem 15.18 for the case R = F[y].
15.23 Let f = 6x^5 + 23x^4 + 51x^3 + 65x^2 + 65x + 42 ∈ Z[x] and p = 11.
(i) Compute the irreducible factorization of f mod p.
(ii) Use Algorithm 15.17 to lift the factorization above to a factorization of f modulo p^4 .
(iii) Try to find nontrivial factors of f in Z[x] via factor combination.
15.24∗ Prove Theorem 15.23.
15.25∗ (i) Show that the bivariate polynomial factorization algorithm 15.24 works correctly when
F is a field of characteristic zero. First convince yourself that u = v is the squarefree part of g.
(ii) Let Fq be a finite field of characteristic p and h ∈ Fq [x, y]. Prove that h is a pth power if and
only if ∂h/∂x = ∂h/∂y = 0.
(iii) Show that h is squarefree if it is coprime to one of ∂h/∂x or ∂h/∂y. Hint: Exercise 14.22.
(iv) Prove that in step 3 of Algorithm 15.24, gcd(u, ∂u/∂x) = gcd(w, ∂w/∂x) = 1 in Fq (y)[x] and
gcd(v, ∂v/∂y) = 1 in Fq (x)[y], and that u, v, and w are squarefree.
(v) Now assume that h is an irreducible factor of g with multiplicity e in step 3 of Algorithm 15.24.
Conclude from the above that if p ∤ e, then h | vw in step 3 and h ∤ g in step 8, and that h^e still divides
g in step 8 if p | e.
(vi) Prove that Algorithm 15.24 works correctly when F = Fq .
15.26∗ (Gerhard 2001a) This exercise discusses a small primes modular algorithm for computing
the squarefree decomposition of a primitive polynomial f ∈ Z[x].
(i) Let R be a UFD and f ∈ R[x] nonconstant and primitive. Prove that there exist primitive
squarefree and pairwise coprime polynomials g1 , . . ., gm ∈ R[x] such that gm is nonconstant and
f = g1 g2^2 · · · gm^m . Show that m is unique and the gi are unique up to multiplication by units in R^× .
For the special cases R = Z or R = F[y], where F is a field, we can make the decomposition above
unique by stipulating that lc(gi ) ∈ R be positive or monic, respectively, assuming that lc( f ) is also
positive or monic. Then we call the sequence (g1 , . . ., gm ) the primitive squarefree decomposition
of f .
(ii) Now let R = Z, p be a prime not dividing lc( f ) and f ≡ lc( f ) h1 h2^2 · · · hk^k mod p be the square-
free decomposition of f modulo p, with monic h1 , . . ., hk ∈ Z[x] that are squarefree and pairwise
coprime modulo p, and hk ≠ 1. Show that k ≥ m and the modular squarefree part h1 · · · hk divides
modulo p the squarefree part g = g1 · · · gm of f . Prove that k = m and gi ≡ lc(gi ) hi mod p for all i if
p does not divide the (nonzero) discriminant res(g, g′ ) ∈ Z of g.
(iii) Prove the following generalization of Mignotte’s bound (Corollary 6.33): If f , g1 , . . ., gm ∈
Z[x] are nonzero polynomials with f = g1 · · ·gm and n = deg f , then
(iv) Design a small primes modular algorithm for computing the primitive squarefree decomposi-
tion of f , in analogy to the small primes modular gcd algorithm 6.38. Your algorithm should check
that the result is correct, so that it is Las Vegas, and use O∼ (n2 +n log A) word operations if A = || f ||∞ .
15.27∗ Using Exercise 15.26, design a prime power modular algorithm for computing the squarefree
decomposition of a primitive polynomial f ∈ Z[x]. Your algorithm should check that the result is
correct, so that it is Las Vegas, and use O∼ (n2 + n log A) word operations if n = deg f and A = || f ||∞ .
15.28 We have indicated in Section 15.7 that factoring a random polynomial with k-bit coefficients
over the integers is computationally faster than factoring the same polynomial modulo a k-bit prime.
This suggests a factoring algorithm over finite prime fields which first factors the input polynomial
over the integers and then calls the factoring algorithm over finite fields for its (in Z[x]) irreducible
factors. Explain why this is not of much help for random polynomials.
15.29 Let R be a ring, 1 ≤ k < r, f1 , . . ., fr ∈ R[x] monic, nonconstant, and pairwise Bézout-coprime,
b ∈ R× , g = b f1 · · · fk , and h = fk+1 · · · fr . Show that g and h are Bézout-coprime. (Hint: Proceed as
in the solution of Exercise 10.13.) More precisely, prove that there exist polynomials s,t ∈ R[x] such
that sg + th = 1, deg s < deg h, and degt < deg g.
15.30 The aim of this exercise is to shave off the factor log r in the running time estimate of Theo-
rem 15.18 when using classical arithmetic. In addition to the input specification of Algorithm 15.17,
we assume that the fi are sorted by degree, so that n1 = deg f1 ≤ n2 = deg f2 ≤ · · · ≤ nr = deg fr .
Let li = ⌊log ni ⌋ for all i and e = e(n1 , . . ., nr ) = ⌈log ∑_{1≤i≤r} 2^{li} ⌉. (As usual, log denotes the binary
logarithm.)
(i) Assume r ≥ 2 and let 1 ≤ k < r be maximal such that ∑_{k<i≤r} 2^{li} ≤ 2^{e−1} . Prove that such a k
exists, that actually equality holds, and conclude that e(n1 , . . ., nk ) ≤ e − 1 and e(nk+1 , . . ., nr ) ≤ e − 1.
(ii) We replace the definition of k in step 2 of Algorithm 15.17 by the definition as in (i) and
denote by T (n1 , . . ., nr ) the cost of the algorithm with classical arithmetic. Prove that there is a
positive constant c such that
Research problem.
15.32 Let f ∈ Z[x], p be a prime, and k ∈ N. Can you find all factorizations of f into irreducible
factors modulo p^k in time polynomial in deg f and k log p? An apparently difficult case is when the
discriminant res( f , f ′ ) is zero.
La clarté est, en effet, d’autant plus nécessaire,
qu’on a dessein d’entraîner le lecteur plus loin
des routes battues et dans des contrées plus arides.1
Joseph Liouville (1846)
1 Clarity is all the more necessary when one intends to guide the reader further away from the beaten track and
into more arid countryside.
16
Short vectors in lattices
16.1. Lattices
The methods we discuss in this chapter deal with computational aspects of the ge-
ometry of numbers , a mathematical theory initiated by Hermann Minkowski in the
1890s. This theory produces many results about Diophantine approximation, con-
vex bodies, embeddings of algebraic number fields in C, and the ellipsoid method
for rational linear programming.
Let f = ( f1 , . . . , fn ) ∈ R^n . In this chapter, we use the norm (or 2-norm, or
Euclidean norm) of f , given by
|| f || = || f ||2 = ( ∑_{1≤i≤n} fi^2 )^{1/2} = ( f ⋆ f )^{1/2} ∈ R,
F IGURE 16.1: The lattice in R 2 generated by (12, 2) (red) and (13, 4) (green).
E XAMPLE 16.4. We let n = 3, f1 = (1, 1, 0), f2 = (1, 0, 1), f3 = (0, 1, 1), and
calculate f1∗ = f1 = (1, 1, 0),
µ21 = ( f2 ⋆ f1∗ )/( f1∗ ⋆ f1∗ ) = 1/2, f2∗ = f2 − µ21 f1∗ = (1/2, −1/2, 1),
µ31 = ( f3 ⋆ f1∗ )/( f1∗ ⋆ f1∗ ) = 1/2, µ32 = ( f3 ⋆ f2∗ )/( f2∗ ⋆ f2∗ ) = 1/3,
f3∗ = f3 − µ31 f1∗ − µ32 f2∗ = (−2/3, 2/3, 2/3),
so that
F = ( 1 1 0 ; 1 0 1 ; 0 1 1 ) = ( 1 0 0 ; 1/2 1 0 ; 1/2 1/3 1 ) · ( 1 1 0 ; 1/2 −1/2 1 ; −2/3 2/3 2/3 ) = M · F∗ .
We have || f1 ||^2 = || f2 ||^2 = || f3 ||^2 = 2 and || f1∗ ||^2 = 2, || f2∗ ||^2 = 3/2, || f3∗ ||^2 = 4/3. ✸
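Computations like those in Example 16.4 are easy to reproduce mechanically. The following minimal Python sketch (our own illustration, not part of the text) computes the GSO M, F∗ of a list of row vectors with exact rational arithmetic, using the formula for µi j and fi∗ given above.

from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gso(F):
    """Gram-Schmidt orthogonalization without normalization.
    F: list of linearly independent rows.  Returns (M, Fstar) with
    F = M * Fstar, M unit lower triangular, rows of Fstar pairwise orthogonal."""
    n = len(F)
    M = [[Fraction(0)] * n for _ in range(n)]
    Fstar = []
    for i, f in enumerate(F):
        M[i][i] = Fraction(1)
        fstar = [Fraction(x) for x in f]
        for j in range(i):
            # mu_ij = (f_i * f_j^*) / (f_j^* * f_j^*)
            M[i][j] = dot(f, Fstar[j]) / dot(Fstar[j], Fstar[j])
            fstar = [a - M[i][j] * b for a, b in zip(fstar, Fstar[j])]
        Fstar.append(fstar)
    return M, Fstar

# Example 16.4:
M, Fstar = gso([(1, 1, 0), (1, 0, 1), (0, 1, 1)])
# Fstar == [[1, 1, 0], [1/2, -1/2, 1], [-2/3, 2/3, 2/3]]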
T HEOREM 16.5.
Let f1 , . . . , fn ∈ R^n be linearly independent, and f1∗ , . . . , fn∗ their Gram-Schmidt
orthogonal basis. Let 0 ≤ k ≤ n, and let Uk = ∑1≤i≤k R fi ⊆ R n be the R-subspace
spanned by f1 , . . . , fk .
F IGURE 16.2: The Gram-Schmidt orthogonal basis of (12, 2) and (13, 4).
| det A| ≤ || f1 || · · · || fn || ≤ n^{n/2} B^n .
P ROOF. We may assume that A is nonsingular and the fi are linearly independent.
If ( f1∗ , . . . , fn∗ ) is their Gram-Schmidt orthogonal basis, then Theorem 16.5 implies
that
| det ( f1 ; . . . ; fn ) | = | det ( f1∗ ; . . . ; fn∗ ) | = || f1∗ || · · · || fn∗ || ≤ || f1 || · · · || fn ||.
The second inequality follows from noting that || fi || ≤ n^{1/2} B for all i. ✷
L EMMA 16.7. Let L ⊆ R^n be a lattice with basis ( f1 , . . . , fn ), and let ( f1∗ , . . . , fn∗ )
be its Gram-Schmidt orthogonal basis. Then for any f ∈ L \ {0} we have || f || ≥ min{|| f1∗ ||, . . . , || fn∗ ||}.
P ROOF. Let f = ∑_{1≤i≤n} λi fi ∈ L \ {0} be arbitrary, with all λi ∈ Z, and let k be the
highest index such that λk ≠ 0. Substituting ∑_{1≤j≤i} µi j f j∗ for fi yields
f = ∑_{1≤i≤k} λi ∑_{1≤j≤i} µi j f j∗ = λk fk∗ + ∑_{1≤i<k} νi fi∗
for some νi ∈ R, and hence || f ||^2 = λk^2 || fk∗ ||^2 + ∑_{1≤i<k} νi^2 || fi∗ ||^2 ≥ || fk∗ ||^2 ,
where we used the pairwise orthogonality of the fi∗ and that λk ∈ Z \ {0}. ✷
F IGURE 16.3: The vectors computed by the basis reduction algorithm 16.10 for the lattice
of Example 16.3.
T HEOREM 16.9.
Let ( f1 , . . . , fn ) be a reduced basis of the lattice L ⊆ R^n and f ∈ L \ {0}. Then
|| f1 || ≤ 2^{(n−1)/2} · || f ||.
P ROOF. We have || f1 ||^2 = || f1∗ ||^2 ≤ 2 || f2∗ ||^2 ≤ 2^2 || f3∗ ||^2 ≤ · · · ≤ 2^{n−1} || fn∗ ||^2 . Thus
|| f || ≥ min{|| f1∗ ||, . . . , || fn∗ ||} ≥ 2^{−(n−1)/2} || f1 ||, using Lemma 16.7. ✷
1. for i = 1, . . . , n do gi ←− fi
compute the GSO G∗ , M ∈ Q^{n×n} , as in (1) and (2), i ←− 2
2. while i ≤ n do
3. for j = i − 1, i − 2, . . . , 1 do
4. gi ←− gi − ⌈µi j ⌋g j , update the GSO { replacement step }
step  (g1 ; g2)             µ21      (g1∗ ; g2∗)                action
4     (12, 2), (13, 4)      41/37    (12, 2), (−11/37, 66/37)   row 2 ←− row 2 − row 1
5     (12, 2), (1, 2)       4/37     (12, 2), (−11/37, 66/37)   exchange rows 1 and 2
4     (1, 2), (12, 2)       16/5     (1, 2), (44/5, −22/5)      row 2 ←− row 2 − 3 · row 1
6     (1, 2), (9, −4)       1/5      (1, 2), (44/5, −22/5)
TABLE 16.4: Trace of the basis reduction algorithm 16.10 on the lattice of Example 16.3.
In fact, Algorithm 16.10 does more than required: Lemma 16.12 (iii) below
implies that |µi j | ≤ 1/2 holds for the GSO of the reduced basis (g1 , . . . , gn ). A re-
duced basis with this additional property is “almost orthogonal”.
E XAMPLE 16.3 (continued). Table 16.4 traces the algorithm on the lattice of Ex-
ample 16.3, and Figure 16.3 depicts the vectors gi in the computation. We start
with g1 = f1 = (12, 2) (red) and g2 = f2 = (13, 4) (green). In the second row of
Table 16.4, g2 is replaced by u = g2 − ⌈41/37⌋g1 = (1, 2) (yellow). Then g1 = f1
and g2 = u are exchanged in the third row. In the last row, v = g2 − ⌈16/5⌋g1 =
f1 − 3u = (9, −4) (blue) is computed, and the algorithm returns the reduced basis
u = (1, 2) and v = (9, −4). We can see clearly in Figure 16.3 that the final g1 = u
(the yellow vector) is much shorter than the two input vectors f1 , f2 , and that the
computed basis u, v (the yellow and the blue vectors) is nearly orthogonal. ✸
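For experimentation, the overall structure of Algorithm 16.10 — replacement steps gi ←− gi − ⌈µi j ⌋g j as in step 4, and an exchange of gi−1 and gi whenever the condition ||g∗i−1 ||^2 ≤ 2 ||g∗i ||^2 of Lemma 16.14 fails — can be sketched in Python as follows. This is only an illustration with exact rational arithmetic, not the book's version: it recomputes the GSO from scratch instead of updating it.

from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gso(G):
    """Return (mu, Gstar) for the rows of G, as in the sketch after Example 16.4."""
    mu, Gstar = [], []
    for i, g in enumerate(G):
        row = [Fraction(0)] * len(G)
        row[i] = Fraction(1)
        gstar = [Fraction(x) for x in g]
        for j in range(i):
            row[j] = dot(g, Gstar[j]) / dot(Gstar[j], Gstar[j])
            gstar = [a - row[j] * b for a, b in zip(gstar, Gstar[j])]
        mu.append(row)
        Gstar.append(gstar)
    return mu, Gstar

def reduce_basis(F):
    """Basis reduction in the spirit of Algorithm 16.10: replacement steps with
    rounded mu_ij, and an exchange while ||g*_{i-1}||^2 > 2 ||g*_i||^2."""
    G = [[Fraction(x) for x in f] for f in F]
    n, i = len(G), 1                      # index i here corresponds to i = 2 in the text
    while i < n:
        mu, Gstar = gso(G)
        for j in range(i - 1, -1, -1):    # replacement steps (step 4)
            lam = round(mu[i][j])         # a nearest integer; Python rounds halves to even
            if lam != 0:
                G[i] = [a - lam * b for a, b in zip(G[i], G[j])]
                mu, Gstar = gso(G)
        if dot(Gstar[i - 1], Gstar[i - 1]) > 2 * dot(Gstar[i], Gstar[i]):
            G[i - 1], G[i] = G[i], G[i - 1]   # exchange step
            i = max(i - 1, 1)
        else:
            i += 1
    return G

# Example 16.3: reduce_basis([(12, 2), (13, 4)]) returns the basis (1, 2), (9, -4).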
In the example above, the final g1 is actually a shortest vector. This seems to
happen quite often, but Theorem 16.9 only guarantees that the norm of the first
vector in the computed basis is bigger by a factor of at most 2(n−1)/2 than the norm
of a shortest vector, where n is the dimension of the lattice.
L EMMA 16.12. (i) We consider one execution of step 4, and let λ = ⌈µi j ⌋ for
short. Let G, G∗ , M and H, H ∗ , N in Q^{n×n} be the matrices of the gk , g∗k , µkl
before and after the replacement, respectively, and E = (ekl ) ∈ Z^{n×n} the
matrix which has ekk = 1 for all k, ei j = −λ, and ekl = 0 otherwise. Then H = EG, H ∗ = G∗ , and N = EM.
(iii) The Gram-Schmidt orthogonal basis g∗1 , . . . , g∗n does not change in step 4,
and after the loop in step 3 we have |µil | ≤ 1/2 for 1 ≤ l < i.
P ROOF. (i) The equality H = EG is just another way of saying that gi is replaced
by gi − λg j and all other gk remain unchanged. Since j < i, for any k ≤ n the space
spanned by g1 , . . . , gk remains the same, and hence the orthogonal vectors g∗1 , . . . , g∗n
are not changed, which means that G∗ = H ∗ . Now the third claim follows from the
equations analogous to (2),
EMG∗ = EG = H = NH ∗ = NG∗ ,
[Schematic of a unit lower triangular transition matrix, with the entries in row i that are affected by the replacement step marked (◦, •) and the rows below row i marked ∗.]
L EMMA 16.13. Suppose that gi−1 and gi are exchanged in step 5, and denote by
hk and h∗k the vectors and their Gram-Schmidt orthogonal basis after the exchange,
respectively. Then
||h∗i−1 ||^2 = ||g∗i ||^2 + µ^2_{i,i−1} ||g∗i−1 ||^2 ≤ (1/2) ||g∗i−1 ||^2 + (1/4) ||g∗i−1 ||^2 = (3/4) ||g∗i−1 ||^2 ,
by the condition for the exchange, the orthogonality of g∗i and g∗i−1 , and the fact
that |µi,i−1 | ≤ 1/2, by the previous lemma.
(iii) We let u = ∑_{1≤l<i−1} µi−1,l g∗l and U = ∑_{1≤l<i−1} R gl for short. Then the vector
h∗i is the component of gi−1 = g∗i−1 + u orthogonal to U + R gi . Now Theorem 16.5
L EMMA 16.14. At the beginning of each iteration of the loop in step 2, the fol-
lowing invariants hold:
|µkl | ≤ 1/2 for 1 ≤ l < k < i, and ||g∗k−1 ||^2 ≤ 2 ||g∗k ||^2 for 1 < k < i.
P ROOF. The claim is trivial at the beginning of the algorithm. So we assume that
the invariants hold at the beginning of step 3 and prove that they hold again at the
end of step 5. Lemma 16.12 implies that the first invariant also holds for k = i
immediately before step 5, and since an exchange does not affect the µkl for k <
i − 1, the first invariant holds after step 5 in any case. Again by Lemma 16.12, the
g∗k do not change in steps 3 and 4, and the second invariant is still valid immediately
before step 5. Now an exchange in step 5 does not affect the g∗k for k ∉ {i − 1, i},
by Lemma 16.13, and the second invariant holds again after step 5 in any case as
well. ✷
In particular, the above lemma implies that the basis g1 , . . . , gn is reduced upon
termination of the algorithm, and it remains to bound the number of iterations of
the loop in step 2. At any stage in the algorithm and for 1 ≤ k ≤ n, we consider the
matrix Gk ∈ Z^{k×n} comprising the first k vectors g1 , . . . , gk as its rows, their Gramian
matrix Gk · Gk^T = (g j ⋆ gl )_{1≤ j,l≤k} ∈ Z^{k×k} , and the Gramian determinant
dk = det(Gk · Gk^T ) ∈ Z. For convenience, we let d0 = 1.
Let G∗k ∈ R^{k×n} be the matrix with rows g∗1 , . . . , g∗k , and Mk ∈ Q^{k×k} the corresponding
transition matrix. Then det Mk = 1, G∗k · (G∗k )^T ∈ R^{k×k} is a diagonal matrix with diagonal entries
||g∗1 ||^2 , . . . , ||g∗k ||^2 , Gk = Mk G∗k , and hence dk = det(Gk · Gk^T ) = ||g∗1 ||^2 · · · ||g∗k ||^2 .
So now we have found our desired loop variant D = ∏1≤k<n dk , and can bound
the number of arithmetic operations. Step 1 takes O(n3 ) operations in Z. With
the notation as in Lemma 16.12, one execution of step 4 amounts to computing
the matrix products EG and EM, at a cost of O(n) operations. Thus the number
of operations in Z used in the loop in step 3 is O(n2 ). If an exchange happens in
step 5, then only g∗i−1 , g∗i , and rows and columns i − 1 and i of the transition matrix
M change, and they can be updated using O(n) operations, which is dominated by
the cost for the loop 3. We always have 1 ≤ D ∈ Z, and its initial value D0 , at the
start of the algorithm, satisfies
since fi∗ is a projection of fi for all i. By Lemma 16.16, D does not change in steps
3 and 4, and decreases at least by a factor of 3/4 if an exchange happens in step 5,
so that the number of such exchange steps is bounded by log_{4/3} D0 ∈ O(n^2 log A).
At any stage in the algorithm, let e ∈ N denote the number of exchange steps
performed so far and e∗ the number of times where the else-branch in step 5 has
been taken. Since i is decreased by one in an exchange step and increased by one
otherwise, the number i + e − e∗ is constant throughout the loop of step 2. Initially,
it equals 2, and hence n + 1 + e − e∗ = 2 at termination. Thus the total number of
iterations of the loop in step 2 is e + e∗ = 2e + n − 1 ∈ O(n2 log A), and we get a
total of O(n4 log A) operations in Z, as claimed in Theorem 16.11.
[Table 16.6 (numerical trace omitted; the successive actions are rep(3, 2), rep(3, 1), ex(3, 2), ex(2, 1), rep(2, 1)).]
TABLE 16.6: Trace of the basis reduction algorithm 16.10 on the lattice L = Z(1, 1, 1) +
Z(−1, 0, 2) + Z(3, 5, 6). We have d1 = ||g∗1 ||^2 , d2 = ||g∗1 ||^2 ||g∗2 ||^2 , D = d1 d2 , and (det L)^2 =
d3 = ||g∗1 ||^2 ||g∗2 ||^2 ||g∗3 ||^2 = 9 throughout. Only the relevant values of the µi j and the squares
of the norms of the g∗i are given, and we have abbreviated a replacement gi ←− gi − ⌈µi j ⌋g j
by rep(i, j) and an exchange of gi and gi−1 by ex(i, i − 1) in the “action” column.
Table 16.6 traces Algorithm 16.10 and the values of the Gramian determinants
dk and of their product D on a three-dimensional lattice.
We still have the task of bounding the size of (the numerators and denomina-
tors of) the rational numbers that occur in the algorithm. The following lemma
P ROOF. (i) We can write g∗k = gk − ∑1≤l<k λkl gl , with some λkl ∈ R. (In fact, the
λkl are the coefficients of M −1 below the diagonal.) We take the inner product with
g j for some j < k. Then g∗k ⋆ g j = 0, and
gk ⋆ g j = ∑_{1≤l<k} λkl (gl ⋆ g j ).
(i) At any stage in the algorithm, except possibly in steps 3 and 4 when k = i,
we have ||gk || ≤ n^{1/2} A.
(ii) During each execution of step 4, ||gi || ≤ n (2A)^n .
P ROOF. Initially, we have ||gk || ≤ A for all k. Step 5 does not change the ||gk ||, and
it is sufficient to examine what happens in steps 3 and 4. So we assume that the
claims are true immediately before step 3. The vectors gk for k 6= i are not affected
by step 4. Let mi = max{|µil |: 1 ≤ l ≤ i} be the maximal absolute value in the ith
row of M. From gi = ∑_{1≤l≤i} µil g∗l and the orthogonality of the g∗l ’s, we find
||gi ||^2 = ∑_{1≤l≤i} µ^2_{il} ||g∗l ||^2 ≤ n m^2_i A^2 , and ||gi || ≤ n^{1/2} mi A. (4)
At the end of loop 3, we have mi = 1, by Lemma 16.12; this concludes the proof
of (i).
Lemma 16.17 and (i) imply that at the beginning of loop 3, we have
mi ≤ max{ d^{1/2}_{l−1} : 1 ≤ l < i} · ||gi || ≤ A^{n−2} · n^{1/2} A = n^{1/2} A^{n−1} . (5)
In step 4, the entry µil is replaced by µil − ⌈µi j ⌋µ jl , and
|µil − ⌈µi j ⌋µ jl | ≤ |µil | + |⌈µi j ⌋| · |µ jl | ≤ mi + (mi + 1/2) · (1/2) = (3/2) mi + 1/4 ≤ 2 mi
for 1 ≤ l < j. For l = j, the new value of µi j is absolutely at most 1/2, by con-
struction, and also the values of µil for l > j, by Lemma 16.12. Together we find
that for each value of j, the value of mi doubles at most, so that during the loop 3
the value of mi increases at most by a factor 2^{i−1} ≤ 2^{n−1} . Together with (5), this
shows that mi ≤ n^{1/2} (2A)^{n−1} at all times. Using (4), we have
||gi || ≤ n^{1/2} mi A ≤ n (2A)^{n−1} A ≤ n (2A)^n .
P ROOF of Theorem 16.11. We have already shown the correctness, and that the
number of arithmetic operations in Z is O(n^4 log A). The denominators dl of the
rational numbers computed in the algorithm are at most A^{2n} , and their length is
O(n log A). The numerators are absolutely at most
◦ ||gk ||∞ ≤ ||gk || ≤ n (2A)^n for gk , by Lemma 16.18,
◦ ||dk−1 g∗k ||∞ ≤ ||dk−1 g∗k || ≤ A^{2k−2} A ≤ A^{2n} for g∗k , by Lemma 16.17 and (3),
◦ |dl µkl | ≤ dl d^{1/2}_{l−1} ||gk || ≤ A^{2l} A^{l−1} n (2A)^n ≤ n (2A^4 )^n for µkl , by Lemmas 16.17 and
16.18,
and hence their length is O(n log A) as well. ✷
C OROLLARY 16.19.
Given linearly independent vectors f1 , . . . , fn ∈ Z^n with max1≤i≤n || fi || = A, we can
compute a “short” nonzero vector u ∈ L = ∑_{1≤i≤n} Z fi with ||u|| ≤ 2^{(n−1)/2} || f || for all f ∈ L \ {0},
using O((n^4 log A) M(n log A) log(n log A)) or O∼ (n^5 log^2 A) word operations.
P ROOF. The claim follows immediately from Theorem 16.11, noting that one
arithmetic operation in Z (addition, multiplication, division with remainder, or
gcd) on integers of length m can be performed with O(M(m) log m) or O∼ (m)
word operations. ✷
L EMMA 16.20. Let f , g ∈ Z[x] have positive degrees n, k, respectively, and sup-
pose that u ∈ Z[x] is nonconstant, monic, and divides both f and g modulo m for
some m ∈ N with || f ||^k ||g||^n < m. Then gcd( f , g) ∈ Z[x] is nonconstant.
P ROOF. Suppose that gcd( f , g) = 1 in Q[x]. Then there exist s,t ∈ Z[x] such
that s f + tg ≡ res( f , g) mod m, by Corollary 6.21. Since u divides both f and
g modulo m, it divides res( f , g) modulo m. But u is monic and nonconstant, and
thus res( f , g) ≡ 0 mod m. Since | res( f , g)| ≤ || f ||^k ||g||^n < m, by Theorem 6.23,
it follows that res( f , g) is zero. This contradiction to our assumption shows that
gcd( f , g) ∈ Q[x] is nonconstant. By Corollary 6.10, the gcd of f and g in Z[x] is
also nonconstant. ✷
The idea of the factoring algorithm is as follows. Suppose that we are given a
squarefree primitive polynomial f ∈ Z[x] of degree n and have computed a monic
polynomial u ∈ Z[x] of degree d < n that divides f modulo m for some m ∈ N.
Then we find a “short” polynomial g ∈ Z[x], meaning that ||g||^n < m · || f ||^{− deg g} , that
is also divisible by u modulo m. Then the above lemma gives us a nontrivial factor
of f in Z[x].
To find such a g of degree less than some bound j, we consider the lattice L ⊆ Z j
generated by (the coefficient vectors of)
{uxi : 0 ≤ i < j − d} ∪ {mxi : 0 ≤ i < d}.
An element g of L has the form
g = qu + rm with q, r ∈ Z[x], deg q < j − d, deg r < d, (6)
and degree less than j. In particular, u divides g modulo m. If, on the other
hand, some g ∈ Z[x] is of degree less than j and divisible by u modulo m, then we
have g = q∗ u + r∗ m for some q∗ , r∗ ∈ Z[x]. Division with remainder by the monic
polynomial u yields q∗∗ , r∗∗ ∈ Z[x] with r∗ = q∗∗ u+r∗∗ and deg r∗∗ < deg u. Letting
q = q∗ + mq∗∗ and r = r∗∗ , we see that g has the form (6), and conclude that
g ∈ L ⇐⇒ deg g < j and u divides g modulo m. (7)
Thus we can use basis reduction to find a “short” vector g ∈ L with the desired
properties.
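For concreteness, here is a small Python sketch (our own illustration) that writes down this generating set as coefficient vectors, highest degree first; the rows could then be handed to any basis reduction routine, such as the sketch after Example 16.3.

def factor_lattice_basis(u, m, j):
    """Coefficient vectors (highest degree first, length j) of
    { u*x^i : 0 <= i < j - d }  union  { m*x^i : 0 <= i < d },
    generating the lattice of polynomials of degree < j divisible by u modulo m,
    cf. (6) and (7).  u: coefficients of the monic polynomial u, highest degree first."""
    d = len(u) - 1                                     # d = deg u
    assert u[0] == 1 and 0 < d < j, "u must be monic and nonconstant with deg u < j"
    rows = [[0] * i + list(u) + [0] * (j - d - 1 - i) for i in range(j - d)]   # u * x^(j-d-1-i)
    rows += [[m if k == j - 1 - i else 0 for k in range(j)] for i in range(d)]  # m * x^i
    return rows

# With u = x - 5136, m = 5**6 and j = 3 this yields the generators
# (1, -5136, 0), (0, 1, -5136), (0, 0, 15625) that appear in the worked example below.
print(factor_lattice_basis([1, -5136], 5**6, 3))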
T HEOREM 16.23.
The algorithm works correctly, and its expected cost in word operations is
O( n^6 (n + log A) M(n^2 (n + log A)) (log n + log log A) ) or O∼ (n^{10} + n^8 log^2 A).
and Lemma 16.20 says that gcd(g, g∗ ) is nonconstant in Z[x]. Since g is irreducible
and deg g∗ ≤ j − 1 = deg g, we have g = ± pp(g∗ ).
We let h = f ∗ /g and S ⊆ T be as in step 9. As in the proof of Theorem 15.20,
the uniqueness of Hensel lifting (Theorem 15.14) implies that lc(g)h ≡ h∗ mod pl
in step 9. Since p^l /2 is larger than the Mignotte bound bB on ||bh||∞ , we have
lc(g)h = h∗ , h = pp(h∗ ), and f ∗ = ± pp(g∗ ) pp(h∗ ), and Corollary 6.33 implies
that the condition in step 9 is true. The actions taken in the then clause ensure
that the invariants (8) hold at the next pass through step 6. This proves that the
algorithm will indeed return the factor g of f .
The cost of the algorithm is dominated by the cost for the short vector computa-
tions in step 7. We have ||v|| ≤ j^{1/2} ||v||∞ ≤ n^{1/2} p^l for all generators v of L in step 7.
Letting δ = log(n^{1/2} p^l ) ∈ O(n^2 + n log B) = O(n^2 + n log A), Corollary 16.19 im-
plies that one short vector computation takes O( j^4 δ M( j δ ) log( j δ )) word opera-
tions. Let f1 , . . . , fk ∈ Z[x] be the irreducible factors of f . By what we have shown
above, the value j in step 7 runs through j = 2, . . . , 1 + deg fi for each irreducible
factor fi . Now ∑1≤i≤k (1 + deg fi ) = k + n ≤ 2n, and hence
∑_{1≤i≤k} ∑_{2≤ j≤1+deg fi} j^4 δ M( j δ ) log( j δ ) ∈ O( n^5 δ M(n δ ) log(n δ ) ),
by the superlinearity properties of M (Section 8.3). This establishes the time esti-
mate. ✷
We might also replace steps 1 through 4 of Algorithm 16.22 by the first three
steps of the big prime algorithm 15.2, yielding an algorithm with the same asym-
ptotic time bound.
of the two vectors (73, 72) and (−143, 73), and the polynomial g∗ in step 8 is
g∗ = 73x + 72. In step 9, we find that only g1 mod 5 = x − 1 divides g∗ mod 5,
whence S = {1} and
h∗ ≡ 6 g2 g3 g4 ≡ 6x^3 − 420x^2 − 840x − 1728 mod 5^6 .
Now both g∗ and h∗ are primitive, || pp(g∗ )||1 || pp(h∗ )||1 ≥ || pp(g∗ )||∞ || pp(h∗ )||∞ >
B ≈ 3219.9, and in fact f ≠ ± pp(g∗ ) pp(h∗ ), which can also be seen by comparing
leading coefficients, and we continue the loop 7 with j = 3.
So now we consider the lattice generated by the vectors
(1, −5136, 0), (0, 1, −5136), (0, 0, 15 625).
Exercise 16.6 shows that a “short” vector in this lattice is (3, 1, 1) ∈ Z^3 , and hence
g∗ = 3x^2 + x + 1 in step 8. Now both g1 mod 5 = x − 1 and g3 mod 5 = x − 2 divide
g∗ mod 5, so that S = {1, 3}, h∗ ≡ 6 g2 g4 = 6x^2 + 3x + 12 mod 5^6 , pp(g∗ ) = g∗ ,
and pp(h∗ ) = 2x^2 + x + 4 in step 9. In fact, we have f = pp(g∗ ) pp(h∗ ), g∗ is an
irreducible factor of f (although we started with a smaller value of l than required),
and the assignments in the if clause yield T = {2, 4}, G = {x − 984, x − 6828},
f ∗ = 2x^2 + x + 4, and b = 2. The next iteration of the while loop 6 would reveal
that f ∗ is irreducible, as we have already seen in Example 15.4. ✸
One might run factor combination and the short vector algorithm concurrently
(on one or, even better, on two processors) after Hensel lifting, and take the result
from whoever finishes first. This hybrid algorithm is reasonably fast on all inputs,
at a cost of at most doubling the overall running time.
C OROLLARY 16.25.
A polynomial f ∈ Z[x] of degree n ≥ 1 and with max-norm || f ||∞ = A can be
completely factored in Q[x] with an expected number of
O( n^6 (n + log A) M(n^2 (n + log A)) (log n + log log A) ) or O∼ (n^{10} + n^8 log^2 A)
word operations.
word operations, and the claim follows from n1 +· · ·+ns ≤ n and the superlinearity
of M (Section 8.3). ✷
f = x^3 + y^3 − z^3 ∈ Q[x, y, z] (9)
this reads:
Throughout this book, we have assumed (at least implicitly) this representation
for univariate and bivariate polynomials as inputs to algorithms, such as gcd com-
putations or factorization. (In examples, the format like (9) is used.) Multivariate
polynomials can be factored in random polynomial time in the length of this dense
representation over the usual fields of relevance to computer algebra, such as finite
fields, the rational numbers, and finite algebraic and transcendental extensions of
these. In fact, it is not hard to adapt the gcd algorithms of Chapter 6 and the bi-
variate factorization in the previous section to this situation. Is the problem then
If the list consists of s entries, then clearly the length is at least s. But this is not
enough; we have to bring the degree into play, since otherwise arbitrarily large
degrees might occur. So we consider the length of a list entry (ak , ik1 , . . . , ikt ) to be
1 + ik1 + · · · + ikt ; if we count word operations, say over Q, then the summand 1
has to be replaced by the length of ak . This convention for the length can be ex-
pressed by saying that the individual degrees ik1 , . . . are encoded in unary. One
might think that the binary encoding for the exponents is more natural. But then
the degree may be exponential in the length, and even univariate polynomials be-
come unmanageable. For very simple questions no polynomial-time answer is
known in this ultra-concise encoding; for example: given two polynomials in this
representation, does the first one divide the second one?
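Concretely, one may picture a list entry (ak , ik1 , . . ., ikt ) as a pair of a coefficient and an exponent vector. The following few lines of Python (our own packaging, not a prescribed format) spell this out for the polynomial (9) and compute the length just described.

# Sparse representation of f = x^3 + y^3 - z^3 from (9): one entry per nonzero term.
f_sparse = [(1, (3, 0, 0)), (1, (0, 3, 0)), (-1, (0, 0, 3))]

# Length convention from the text: 1 + i_k1 + ... + i_kt per entry (exponents in unary).
length = sum(1 + sum(exps) for _, exps in f_sparse)
print(length)   # 3 * (1 + 3) = 12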
This sparse representation is the natural mathematical notation, and the user of a
computer algebra system will want to see her input and output in that format. For a
“random” polynomial (with a fixed number of variables and fixed degree), almost
all possible coefficients will be nonzero, and there will not be much difference be-
tween the sparse and the dense representations. However, natural problems given
to a computer algebra system tend to be sparse; see the cyclohexane example in
Section 24.4.
Unfortunately, no algorithm for factoring is known that runs in time polynomial
in the length of the sparse representation. There are even examples where the
output size is more than polynomial in the input size. But even if one allows time
polynomial in the combined input plus output size, no direct “sparse” solution is
known, but the arithmetic circuit and black box representations discussed below
solve the problem.
The key to get over this hurdle is to consider even more concise representations.
At first sight, the problem becomes even harder, since the input size (for a fixed
polynomial) might be even smaller. But the gain is that the output might also be
smaller, and, above all, that new computational methods may be used.
The first new idea is the arithmetic circuit representation, where a polyno-
mial is represented by an arithmetic circuit, as illustrated in Chapters 2 and 8, that
computes f using x1 , . . . , xt and constants from F as inputs and only addition and
multiplication gates. (There is an efficient way to remove division gates if they
are also present.) For (9), this looks like Figure 16.7. Equivalently, an arithmetic
circuit may be represented by a straight-line program such as
g1 ←− x ∗ x
g2 ←− g1 ∗ x
g3 ←− y ∗ y
g4 ←− g3 ∗ y
g5 ←− z ∗ z
g6 ←− g5 ∗ z
g7 ←− g2 + g4
g8 ←− g6 − g7
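Such a straight-line program is trivial to store and to evaluate. The following Python fragment (an illustration of the idea, not a data structure from the text) encodes the program above as a list of instructions and evaluates it at a point; this is already all that is needed to use the polynomial as a black box.

# Each instruction: (result, operation, left operand, right operand).
program = [
    ("g1", "*", "x", "x"),
    ("g2", "*", "g1", "x"),
    ("g3", "*", "y", "y"),
    ("g4", "*", "g3", "y"),
    ("g5", "*", "z", "z"),
    ("g6", "*", "g5", "z"),
    ("g7", "+", "g2", "g4"),
    ("g8", "-", "g6", "g7"),
]

def evaluate(program, inputs):
    """Evaluate a straight-line program; works over any ring whose elements
    support +, -, * (integers, Fractions, a modular arithmetic class, ...)."""
    ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
    vals = dict(inputs)
    for res, op, left, right in program:
        vals[res] = ops[op](vals[left], vals[right])
    return vals[res]                     # value of the last assignment

print(evaluate(program, {"x": 1, "y": 2, "z": 3}))   # g8 = z^3 - (x^3 + y^3) = 18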
ments of this in the literature, none of them is strong enough to derive a (proba-
bilistic) polynomial-time algorithm.
However, the situation can be saved by leaving one more variable and consid-
ering more general substitutions of the form ax1 + bx2 + c for the variables, where
a, b, c are chosen randomly from a (sufficiently large) finite subset of the ground
field. Then for an irreducible polynomial in F[x1 , . . . , xn ], the substituted polyno-
mial in F[x1 , x2 ] will be irreducible for almost all random choices. An arbitrary
polynomial can be mapped to two variables by such a substitution, then the bivari-
ate factoring technology can be applied, and finally Hensel lifting to get back to
the original multivariate situation. The role of the efficient Hilbert irreducibility
theorem is to insure that one (probably) does not have to worry about irreduci-
ble polynomials splitting after the substitution. This phenomenon required fac-
tor combination or a short vector computation for substitutions Z[x] −→ Z p [x] or
F[x, y] −→ F[x], but such methods would not lead to polynomial-time methods in
the multivariate case.
An even more powerful technique is the black box representation. A polyno-
mial f ∈ F[x1 , . . . , xn ] is now given by a “black box” subroutine which on input
a1 , . . . , an ∈ F returns the value f (a1 , . . . , an ) ∈ F. We have discussed this type of
representation for matrices in Section 12.4. Initially, a polynomial will often be
given in some other representation, say the sparse one. It is then easy to build
a black box for it. The power of the method is that now these black boxes can
be handled efficiently; Kaltofen & Trager (1990) give random polynomial-time
algorithms for several problems, including factorization. Finally, the black box
representation has to be converted back to human-readable output. There are sev-
eral interpolation algorithms for achieving this. Now an output polynomial with a
few hundred terms is rather useless for the human reader (but possibly useful as
input to another procedure). One interpolation method has the beautiful feature
that one can tell it to print only about a dozen (or about a hundred) terms (and to
say that there are more if that is so).
The black box technology has also been successfully applied to other problems
such as the gcd of two multivariate polynomials.
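As a toy illustration of the black box representation (our own example; Kaltofen & Trager's algorithms of course do far more than this), the following Python fragment wraps a sparsely represented polynomial over a prime field as a black box and probes it at random points — the only kind of access the black box algorithms require.

import random

def black_box_from_sparse(terms, p):
    """Turn a sparse representation [(coeff, (i1, ..., it)), ...] over F_p into a
    black box: a function mapping a point (a1, ..., at) to f(a1, ..., at) mod p."""
    def f(*point):
        total = 0
        for c, exps in terms:
            m = c % p
            for a, e in zip(point, exps):
                m = m * pow(a, e, p) % p
            total = (total + m) % p
        return total
    return f

# f = x^3 + y^3 - z^3 over F_101, probed at three random points:
f = black_box_from_sparse([(1, (3, 0, 0)), (1, (0, 3, 0)), (-1, (0, 0, 3))], 101)
print([f(*[random.randrange(101) for _ in range(3)]) for _ in range(3)])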
Notes. 16.1. Minkowski (1910) describes the geometry of numbers that he invented.
Grötschel, Lovász & Schrijver (1993) is a good textbook in this area, and Kannan (1987)
presents an overview on computational aspects of this theory, including basis reduction and
several applications.
Ajtai (1997) shows that computing a shortest vector in a lattice is “NP-hard”. This is
under probabilistic polynomial-time reductions rather than the usual deterministic reduc-
tion for standard NP-hardness. As long as BPP ≠ NP is considered about as likely as
P ≠ NP, this difference does not matter much.
16.2 and 16.3. The Gram-Schmidt orthogonalization procedure is from Schmidt (1907),
§3, who states that Gram (1883) has given essentially the same formulas. Hadamard (1893)
proved Theorem 16.6. The geometrical idea is that the volume | det A| of the polytope
The generic determinant polynomial for n × n matrices is an example where the length
of the sparse representation, which is essentially n!, is exponential in the length of the
arithmetic circuit representation, since there is an arithmetic circuit (with divisions) of size
O(n3 ) performing Gaussian elimination.
Lang (1983), chapter 9, describes Hilbert’s irreducibility theorem and the theory of
Hilbertian fields for which this theorem holds. Results on specific substitutions that con-
serve irreducibility are in Sprindžuk (1981, 1983) and Dèbes (1996). Efficient probabilistic
versions of Hilbert irreducibility, valid over any field but reducing only to two variables,
can be found in Kaltofen (1985b), von zur Gathen (1985), and Kaltofen (1995a). Huang &
Wong (1998) give a similar result for more general polynomial ideals.
The important paper of Kaltofen & Trager (1990) introduced the black box method and
gave several algorithms discussed above. A seminal idea for sparse interpolation is due to
Zippel (1979); several other papers deal with various aspects of interpolation: Ben-Or &
Tiwari (1988), Kaltofen & Lakshman (1988), Borodin & Tiwari (1990), Grigoriev, Karpin-
ski & Singer (1990), Clausen, Dress, Grabmeier & Karpinski (1991), Grigoriev, Karpinski
& Singer (1994). Freeman, Imirzian, Kaltofen & Lakshman (1988) and Díaz & Kaltofen
(1998) describe implementations of the straight-line and the black box technologies, re-
spectively.
Exercises.
16.1 Let F ∈ R n×n be nonsingular. Show that the GSO M, F ∗ of F is uniquely determined by the
conditions that F = MF ∗ , M be lower triangular with ones on the diagonal, and F ∗ (F ∗ )T be diagonal.
16.2∗ Prove Theorem 16.5.
16.3 We define an inner product ⋆ on the vector space V of continuous real-valued functions on the
real interval [−1, 1] by f ⋆ g = ∫_{−1}^{1} f (y) g(y) √(1 − y^2 ) dy.
(i) Convince yourself that ⋆ is in fact an inner product.
(ii) Compute the Gram-Schmidt orthogonal basis of f0 , f1 , f2 , f3 , where fi (x) = xi for −1 ≤ x ≤ 1.
(The resulting polynomials are the monic associates of the first four Chebyshev polynomials of the
second kind.).
16.4∗ Let g1 , . . ., gn ∈ R n be linearly independent and L = ∑1≤i≤n Zgi the lattice that they generate.
Prove that for each vector x ∈ R n there is a vector g ∈ L such that
||x − g||^2 ≤ (1/4) (||g1 ||^2 + · · · + ||gn ||^2 ).
Hint: Induction on n. For the induction step, determine a suitable λ ∈ Z such that the vector x − λgn
has minimal distance to the hyperplane spanned by g1 , . . ., gn−1 .
16.5 (i) Compute the GSO of (22, 11, 5), (13, 6, 3), (−5, −2, −1) ∈ R 3 .
(ii) Trace Algorithm 16.10 on computing a reduced basis of the lattice in Z 3 spanned by the vectors
from (i). Trace also the values of the di and of D, and compare the number of exchange steps to the
theoretical upper bound from Section 16.3.
16.6−→ Compute a “short” vector in the lattice in Z 3 spanned by (1, −5136, 0), (0, 1, −5136), and
(0, 0, 15 625).
16.7∗ The following algorithm takes an arbitrary nonsingular matrix A ∈ Z n×n and computes a
Hermite normal form H of A (Notes 4.5), such that H = UA for a matrix U ∈ Z n×n which is
unimodular, so that detU = ±1.
16.10 Let x, y ∈ Cn .
(i) Prove the Cauchy-Schwarz inequality |x ⋆ y| ≤ ||x||2 ||y||2 . Hint: Consider the inner product
(||y||2 x + ||x||2 y) ⋆ (||y||2 x + ||x||2 y).
(ii) Use (i) to prove the triangle inequality ||x + y||2 ≤ ||x||2 + ||y||2 .
16.11∗ Lemma 16.17 shows that at any stage in the basis reduction algorithm 16.10, dl µkl and
dk−1 g∗k have integral coefficients for 1 ≤ l < k ≤ n. By multiplying the entries of the GSO by (and,
where possible, dividing out) appropriate dl ’s, convert Algorithm 16.10 into a fraction-free algorithm,
so that all intermediate coefficients are in Z.
16.12∗∗ This exercise discusses basis reduction for polynomials. Let F be a field, R = F[y], and
n ∈ N>0 . The max-norm of a vector f = ( f1 , . . ., fn ) ∈ Rn is || f || = || f ||∞ = max{deg fi : 1 ≤ i ≤ n}.
For vectors f1 , . . ., fm ∈ R which are linearly independent over F(y), the field of fractions of R, the
R-module spanned by f1 , . . ., fm is M = ∑1≤i≤m R fi , and ( f1 , . . ., fm ) is a basis of M.
(i) Let f1 , . . ., fm ∈ Rn be linearly independent (over F(y)), with fi = ( fi1 , . . ., fin ) for 1 ≤ i ≤ m.
We say that the sequence ( f1 , . . ., fm ) is reduced if
◦ || f1 || ≤ || f2 || ≤ · · · ≤ || fm ||, and
◦ deg fi j ≤ deg fii for 1 ≤ j ≤ n, with strict inequality if j < i, for 1 ≤ i ≤ m.
In particular, we have || fi || = deg fii for 1 ≤ i ≤ m. Prove that f1 is a shortest vector in the R-module
M = ∑1≤i≤m R fi , so that || f1 || ≤ || f || for all nonzero f ∈ M.
(ii) Consider the following algorithm, from von zur Gathen (1984a).
A LGORITHM 16.27 Basis reduction for polynomials.
Input: Linearly independent (over F(y)) row vectors f1 , . . ., fm ∈ Rn , where R = F[y] for a field F,
with || fi || < d for 1 ≤ i ≤ m.
Output: Row vectors g1 , . . ., gm ∈ Rn and a permutation matrix A ∈ Rn×n such that (g1 , . . ., gm ) is a
reduced sequence and (g1 A, . . ., gm A) is a basis of M = ∑1≤i≤m R fi .
1. let g1 , . . ., gm be such that {g1 , . . ., gm } = { f1 , . . ., fm } and ||gi || ≤ ||gi+1 || for 1 ≤ i < m
A ←− id, k ←− 1
2. while k ≤ m do
3. { (g1 , . . ., gk−1 ) is reduced and ||gi || ≤ ||gi+1 || for 1 ≤ i < m }
u ←− ||gk ||
4. for i = 1, . . ., k − 1 do
5. q ←− gki quo gii , gk ←− gk − qgi
6. if ||gk || < u then
r ←− min{i: i = k or (1 ≤ i < k and ||gi || > ||gk ||)}
replace gr , . . ., gk−1 , gk by gk , gr , . . ., gk−1
k ←− r, goto 2
7. l ←− min{k ≤ j ≤ n: deg gkl = u}
let B ∈ Rn×n be the permutation matrix for the exchange of columns k and l
for i = 1, . . ., m do gi ←− gi B
A ←− BA, k ←− k + 1
8. return g1 , . . ., gm and A
Show that M = ∑1≤i≤m R · gi A holds throughout the algorithm, and conclude that the gi are always
nonzero vectors.
(iii) Assume that the invariants in curly braces are true in step 3. Convince yourself that ||gk−1 || ≤ u
holds during steps 4 and 5 if k ≥ 2. Show that gii 6= 0 holds in step 5 if k ≥ 2, so that the division
with remainder can be executed, and prove the invariants ||gk || ≤ u and deg gk j < u for 1 ≤ j < i of
the loop 4.
(iv) Show that (g1 , . . ., gk−1 ) is reduced and ||gi || ≤ ||gi+1 || for 1 ≤ i < m holds each time the
algorithm passes through step 3. Conclude that it works correctly if it halts in step 8.
(v) Show that ||gi || < d for 1 ≤ i ≤ m holds throughout the algorithm. Prove that the cost for
one execution of steps 3 through 7 is O(nm) arithmetic operations (additions, multiplications, and
divisions with remainder) in R or O(nm M(d)) operations in F.
(vi) Show that the function s(g1 , . . ., gm ) = ∑1≤i≤m ||gi || never increases in the algorithm and
strictly decreases if the condition in step 6 is true. Conclude that the number of times when the latter
happens is at most md and that the number of iterations of the loop 2 is at most (m − 1)(md + 1).
(vii) Putting everything together, show that the running time of the algorithm is O(nm3 d M(d)) or
O∼ (nm3 d 2 ) arithmetic operations in F.
(viii) Trace the algorithm on the F97 [y]-module generated by
(5y^3 + 44y^2 + 37y + 91, 8y^3 + 86y^2 + 91y + 89, 16y^3 + 65y^2 + 20y + 76),
(8y^3 + 70y + 37, 16y^3 + 7y^2 + 54y + 38, 32y^3 + 23y^2 + 80y + 77),
(16y^2 + 84y + 63, 32y^2 + 15y + 19, 64y^2 + 48y + 51) ∈ F97 [y]^3 .
Mulders & Storjohann (2000) give an algorithm for computing a reduced basis taking only O(nm2 d 2 )
arithmetic operations in F.
16.13 State and prove the analog of Lemma 16.20 for polynomials in F[x, y] for a field F when || · ||2
is replaced by || f ||∞ = degy f .
16.14∗ Use Exercises 16.12 and 16.13 to adapt Algorithm 16.22 to bivariate polynomials over a
field. Prove that your algorithm works correctly and analyze its running time. You may assume that
F has effective univariate factorization and is “large enough”, so that the modulus p may be chosen
linear.
16.15 Let F be a field and n ∈ N. What is the size of the sparse representation of the polynomial
∏_{0≤i<n} (x + y^{2^i} ) ∈ F[x, y]? Find an arithmetic circuit representation of size 3n − 2.
16.16∗ You are to design an algorithm for factoring multivariate polynomials over a field F with
efficient univariate factorization. Suppose that f ∈ F[x1 , . . ., xt ] has degree less than n in each vari-
able, and consider the Kronecker substitution σ: F[x1 , . . ., xt ] −→ F[x] which maps xi to x^{n^{i−1}} for
1 ≤ i ≤ t. This is a ring homomorphism.
(i) Show that polynomials with degree less than n in each variable can be uniquely recovered
from their image under σ. More precisely, let U ⊆ F[x1 , . . ., xt ] be the vector space of all these
polynomials, and V = {h ∈ F[x]: deg h < n^t }. Show that σ gives a vector space isomorphism between
U and V .
(ii) Prove that the following procedure correctly factors f : Factor σ( f ) into irreducible factors
h1 , . . ., hr ∈ F[x], and test for each factor combination h of them whether its inverse σ −1 (h), in the
sense of (i), divides f .
(iii) Analyze the cost of your algorithm from (ii). You will first have to estimate the cost for mul-
tivariate multiplication (Exercise 8.38). Ignore the time for univariate factorization in this analysis.
(iv) Trace your algorithm on the example f = −x^4 y + x^3 z + x z^2 + y z^2 ∈ F3 [x, y, z].
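The substitution σ itself (though not the factoring procedure asked for above) is a one-liner on sparse input. Here is a hypothetical Python helper, with n = 5 chosen for the example in (iv) — any n exceeding all individual degrees works.

def kronecker_substitute(terms, n):
    """Apply sigma: x_i -> x^(n^(i-1)) to a sparse polynomial given as
    [(coeff, (i_1, ..., i_t)), ...] with all individual degrees < n.
    Returns a dict mapping univariate exponents to coefficients."""
    result = {}
    for c, exps in terms:
        e = sum(ei * n ** k for k, ei in enumerate(exps))
        result[e] = result.get(e, 0) + c
    return result

# The example from (iv): f = -x^4*y + x^3*z + x*z^2 + y*z^2, here with n = 5:
f = [(-1, (4, 1, 0)), (1, (3, 0, 1)), (1, (1, 0, 2)), (1, (0, 1, 2))]
print(kronecker_substitute(f, 5))    # exponents 9, 28, 51, 55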
Research problem.
16.17 Can one compute the gcd of two multivariate polynomials in random polynomial time in the
length of the sparse representation plus the degree? Is the output length always polynomial in the
input length? As mentioned in Section 16.6, one can factor polynomials in random polynomial time
both in the arithmetic circuit and in the black box representation.
Il faut bien distinguer entre la géométrie utile et la géométrie curieuse.
L’utile est le compas de proportion inventé par Galilée [. . . ]
Presque tous les autres problèmes peuvent éclairer l’esprit et le
fortifier; bien peu seront d’une utilité sensible au genre humain.1
Voltaire (1771)
But yet one commoditie moare [. . . ] I can not omitte. That is the
filyng, sharpenyng, and quickenyng of the witte, that by practice of
Arithmetike doeth insue. It teacheth menne and accustometh them, so
certainly to remember thynges paste: So circumspectly to consider
thynges presente: And so prouidently to forsee thynges that followe:
that it maie truelie bee called the File of witte.
Robert Recorde (1557)
1 One has to distinguish carefully between practical geometry and theoretical geometry. Practical is the rule of
proportions invented by Galileo [. . . ] Almost all other problems can enlighten the mind and strengthen it; rather
few will be of any reasonable usefulness to mankind.
2 To share a night between a beautiful woman and a clear sky, the day in seeking agreement between one’s
observations and calculations, seems to me happiness on earth.
17
Applications of basis reduction
This chapter presents four applications of basis reduction: breaking certain cryp-
tosystems and linear congruential pseudorandom generators, finding simultaneous
Diophantine approximations, and a refutation of Mertens’ conjecture. We can only
give the basic ideas; technical details can be found in the references, provided in
the notes. The first two sections assume familiarity with the basics of cryptogra-
phy, as explained in Chapter 20.
For example, an instance is to ask whether there exist x1 , . . . , x6 ∈ {0, 1} such that
366x1 + 385x2 + 392x3 + 401x4 + 422x5 + 437x6 = 1215.
This problem is NP -complete, as is a slight generalization of it, called the knap-
sack problem. After Diffie & Hellman (1976) invented public key cryptography,
Merkle & Hellman (1978) proposed a public key cryptosystem based on the subset
sum problem. The computations in this system were much less voluminous than
for other systems such as RSA (Section 20.2), and its higher throughput seemed
to promise a bright future. Several other such systems were proposed, based on
versions of the knapsack problem. But the roof fell in when Shamir (1984) broke
the Merkle & Hellman system, and almost all subsequently proposed improved
schemes have suffered the same fate. Basis reduction has played a major role in
some of these cryptanalyses.
In the notation of Section 20.1, Alice publishes her public key a1 , . . . , an for such
a knapsack cryptosystem. When Bob wants to send her n bits x1 , . . . , xn secretly,
he encodes them as s = ∑1≤i≤n ai xi and sends s. Decoding a general such problem
is NP -complete and hence infeasible, but the idea now is to use a special type
of problem for which decoding is easy with some secret additional knowledge,
but hopefully hard without the secret. These special subset sum problems start
with “superincreasing” b1 ≪ b2 ≪ · · · ≪ bn as the summands; a trivial example
is bi = 2^{i−1} , where the solution (x1 , . . . , xn ) is just the binary representation of s.
More generally, it is sufficient to have bi > ∑1≤ j<i b j for all i; the solution is then
unique and easy to calculate. The “easiness” is then hidden by multiplying the bi ’s
with a random number c modulo another random number m to obtain the public
ai ’s. Alice’s private key c, m allows her to multiply s by c−1 modulo m and then
solve an easy subset sum problem. At first sight, the ai look like a general subset
sum problem, but the cryptanalysts’ work then showed that this hiding does not
work. Of course, the breaking of these schemes does not mean that large instances
of an NP -complete problem can be solved routinely; the “superincreasing” subset
sum problem is just too special.
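For a superincreasing sequence, the decoding is a single greedy pass from the largest summand downwards; a minimal sketch (our own illustration, not from the text):

def solve_superincreasing(b, s):
    """Solve sum_i x_i * b_i = s with x_i in {0, 1}, assuming each b_i exceeds
    the sum of all earlier summands; returns None if there is no solution."""
    x = [0] * len(b)
    for i in reversed(range(len(b))):
        if s >= b[i]:
            x[i], s = 1, s - b[i]
    return x if s == 0 else None

print(solve_superincreasing([1, 2, 4, 8, 16, 32], 43))   # [1, 1, 0, 1, 0, 1]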
The connection between the subset sum problem and short vectors is given by
the fact that a solution of (1) yields a short vector in the lattice L ⊆ Z^{n+1} generated
by the rows r1 , . . . , rn+1 ∈ Z^{n+1} of the matrix
1 0 · · · 0 −a1
0 1 · · · 0 −a2
...
0 0 · · · 1 −an
0 0 · · · 0 s          ∈ Z^{(n+1)×(n+1)} .
To see this, let (x1 , . . . , xn ) ∈ {0, 1}^n be a solution of (1). Then
v = ∑_{1≤i≤n} xi ri + rn+1 = (x1 , . . . , xn , 0) ∈ L
is a vector with ||v||2 ≤ √n, which is very small since the ai are typically very
large numbers. The approach to breaking such a cryptosystem is to compute a
reduced basis of L and to hope that the resulting short vector is (essentially) v. Of
course, this does not work too well for general subset sum problems, but it does
work for low-density subset sums, where the ratio n/(maxi log2 ai ) of information
bits to transmitted bits is small. This number is about 1/n for Merkle & Hellman’s
original scheme, and then this attack is very successful. In the example, the density
6/ log2 437 ≈ 0.684 is high.
For example, with a1 , . . . , a6 as in the beginning, we consider the lattice L ⊆ Z 7
generated by the rows of the matrix
1 0 0 0 0 0 −366
0 1 0 0 0 0 −385
0 0 1 0 0 0 −392
0 0 0 1 0 0 −401
0 0 0 0 1 0 −422
0 0 0 0 0 1 −437
0 0 0 0 0 0 1215          ∈ Z^{7×7} .
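Setting up such a lattice is purely mechanical; the following Python lines (an illustration only) build the (n + 1) × (n + 1) basis for a given instance, ready to be handed to a basis reduction routine.

def subset_sum_lattice(a, s):
    """Rows r_1, ..., r_{n+1} of the subset sum lattice for weights a and target s."""
    n = len(a)
    rows = [[1 if k == i else 0 for k in range(n)] + [-a[i]] for i in range(n)]
    rows.append([0] * n + [s])
    return rows

rows = subset_sum_lattice([366, 385, 392, 401, 422, 437], 1215)
# A solution (x_1, ..., x_6) of the instance corresponds to the short vector
# sum_i x_i r_i + r_7 = (x_1, ..., x_6, 0) in this lattice.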
For general n, Dirichlet (1842) showed that there are infinitely many approxi-
mations with |αi − pi /q| ≤ q^{−(1+1/n)} for all i. Lenstra, Lenstra & Lovász (1982)
expressed this as a short vector problem, as follows. Given individual rational ap-
proximations αi ≈ βi = ui /vi with ui , vi ∈ Z for 1 ≤ i ≤ n (but not necessarily the
same denominator as demanded for a simultaneous approximation), and a rational
ε with 0 < ε < 1, we take Q = ε^{−n} as an approximate bound on the denominator q.
We set w = 2^{−n(n+1)/4} ε^{n+1} , and let L ⊆ Q^{n+1} be the lattice generated by the rows
f0 , . . . , fn ∈ Q^{n+1} of the matrix
w  β1  β2  · · ·  βn
0  −1  0   · · ·  0
0  0   −1  · · ·  0
...
0  0   0   · · ·  −1          ∈ Q^{(n+1)×(n+1)} .
In our treatment, we always assumed the vectors generating the lattice to have inte-
gral coefficients, but basis reduction also works for rational coefficients, as we have
them here. It will produce in polynomial time a reduced basis for L = ∑0≤i≤n Z fi .
By multiplying together the n + 1 inequalities in the proof of Theorem 16.9, we
find that its first vector g satisfies
||g||^{2(n+1)} ≤ || f0∗ ||^2 · 2 || f1∗ ||^2 · · · 2^n || fn∗ ||^2 = 2^{n(n+1)/2} || f0∗ ||^2 · · · || fn∗ ||^2 .
Since f0∗ , . . . , fn∗ are orthogonal, we find from Theorem 16.5 (iv)
||g|| ≤ 2^{n/4} (|| f0∗ || · · · || fn∗ ||)^{1/(n+1)} = 2^{n/4} | det ( f0∗ ; . . . ; fn∗ ) |^{1/(n+1)}
= 2^{n/4} | det ( f0 ; . . . ; fn ) |^{1/(n+1)} = ε < 1.
Lagarias (1985) gave the following all-integer variant. Input is the vector β =
(u1 /v1 , . . . , un /vn ) ∈ Q^n . For any nonzero q ∈ N, we write
{{β q}} = min_{p1 ,...,pn ∈ Z} max_{1≤i≤n} | ui /vi − pi /q |
for the best approximation quality with denominator q. For pi , one simply takes
the integer nearest to ui q/vi . A further input is a bound Q on the denomina-
tor. If a simultaneous approximation denominator q∗ exists with 1 ≤ q∗ ≤ Q and
ε = {{β q∗ }}, say as guaranteed by Dirichlet’s theorem but unknown, then the
algorithm produces an approximation q which is almost as good:
1 ≤ q ≤ 2^{n/2} QV, and {{β q}} ≤ √5 n 2^{(n−1)/2} ε. (2)
We let V = v1 · · · vn , and assume that ε > 0. For all j ∈ {0, . . . , n + log2 (QV )}, we
consider the lattice L j ⊆ Z^{n+1} spanned by the rows of the matrix
2^j  QV u1 /v1  QV u2 /v2  · · ·  QV un /vn
0    QV         0           · · ·  0
0    0          QV          · · ·  0
...
0    0          0           · · ·  QV          ∈ Z^{(n+1)×(n+1)} .
The lower n vectors have only one nonzero entry each. We run the basis reduction
algorithm 16.10 on this basis, and let x^{( j)} = (x0^{( j)} , . . . , xn^{( j)} ) ∈ Z^{n+1} be the short
vector returned. Lagarias shows that for some value of j, the denominator q = x0^{( j)}
provides an approximation satisfying (2), and that the whole algorithm runs in
polynomial time.
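Writing down L j is again mechanical. The sketch below (our own, for illustration) takes the numerators ui together with a common denominator V of the βi — for instance V = v1 · · · vn — so that the entries QV βi = Q ui are integers; it reproduces the 5 × 5 matrix of Example 17.1 below.

def lagarias_lattice(u, V, Q, j):
    """Rows spanning the lattice L_j for beta_i = u_i / V (V a common denominator
    of the beta_i), denominator bound Q, and scaling exponent j."""
    n = len(u)
    rows = [[2 ** j] + [Q * ui for ui in u]]
    rows += [[Q * V if k == i else 0 for k in range(n + 1)] for i in range(1, n + 1)]
    return rows

# Example 17.1: beta = (0.42, 0.32, 0.26, 0.17), common denominator V = 100, Q = 1, j = 0:
print(lagarias_lattice([42, 32, 26, 17], 100, 1, 0))
# first row (1, 42, 32, 26, 17), the other rows 100 times the unit vectors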
E XAMPLE 17.1. We try Lagarias’ method on the binary logarithms of the mu-
sical intervals 2, 3/2, 4/3, 5/4, 6/5, 9/8 from Section 4.8. Since log2 2 = 1 and
log2 (3/2) + log2 (4/3) = 1, it suffices to find simultaneous Diophantine approxi-
mations for α1 = log2 (4/3) ≈ 0.42, α2 = log2 (5/4) ≈ 0.32, α3 = log2 (6/5) ≈
0.26, and α4 = log2 (9/8) ≈ 0.17. We take as initial approximation ui /vi the dec-
imal expansion of αi rounded to two digits, so that we start with a simultaneous
Diophantine approximation with common denominator V = 100 (instead of the
product required above). Letting Q = 1 and j = 0, we obtain the lattice L ⊆ Z^5
generated by the rows of the matrix
1 42 32 26 17
0 100 0 0 0
0 0 100 0 0
0 0 0 100 0
0 0 0 0 100 .
Mertens (1897) contains a table and a foldout chart of values up to 10 000, and
he conjectured that |M(x)| ≤ √x for all x ∈ N; a similar conjecture had been made
in 1885 by Stieltjes.
The conjecture may seem to come out of the blue, but in analytic number theory
one studies various functions that take values 0, 1, −1 (or other complex values
with absolute value 0 or 1) such as the Jacobi symbol (Section 18.5) and proves
that their sum up to x is absolutely bounded by O(√x ). The same is true for the
absolute value of the sum of a random sequence of 1 and −1 (Exercise 19.18; the
sum itself has mean 0); in fact, the quotient M(x)/x goes to zero if and only if µ
takes the values 1 and −1 roughly equally often. This principle will motivate the
bound on the size of elliptic curves (Hasse’s theorem 19.20).
Mertens proved that his conjecture implies the famous Riemann Hypothesis
(Notes 18.4). Furthermore, it was known that it implies the unsolvability of a cer-
tain (inhomogeneous) simultaneous Diophantine approximation problem, as in the
previous section, involving roots of Riemann’s zeta function. However, Odlyzko &
te Riele (1985) used basis reduction in a lattice in R 70 to show that this approxima-
tion problem does have a solution, and thus disproved Mertens’ conjecture. Their
account is eminently readable also for the non-specialist, and their method sug-
gests that a counterexample might exist for an x of order exp(10^{65}), but current
algorithmics do not allow us to calculate M for such huge arguments. So we know
that an x with M(x) > 1.06 √x exists, but we do not know any such x.
Notes. 17.1. The subset sum problem was proven to be NP-complete by Karp (1972);
see problem SP13 of Garey & Johnson (1979). The original successful attack on the Merkle
& Hellman system was by Shamir (1984). Lagarias & Odlyzko (1985) described the short
vector attack, and Odlyzko (1990) gives a nice overview of the problem. Other subset sum
cryptosystems were proposed by Graham & Shamir (see Shamir & Zippel (1980) for a
description), Lu & Lee (1979), Niederreiter (1986), Goodman & McAuley (1984), Ong,
Schnorr & Shamir (1984). Most of them were broken in the 1980s using basis reduction,
among others by Adleman (1983) and Brickell (1984, 1985). The only knapsack type
cryptosystem withstanding all attacks up to now is the Chor & Rivest (1988) scheme.
Many further applications of basis reduction in cryptography are discussed in Nguyen &
Stern (2001).
17.2. Lagarias (1990), §8, surveys pseudorandom number generators in cryptography.
17.3. Dirichlet (1842) showed that there exist simultaneous Diophantine approximations
with absolute error bound q−(1+1/n) , and Lagarias (1982a, 1982b) presents many results
concerning best approximations. Lagarias (1985) discusses the computational complexity
of various such problems, which, depending on the specification, ranges from polynomial-
time to NP-complete. As an example, the following is NP-complete: given β ∈ Q n , as in
Section 17.3, and integers Q, s,t, is there an approximation denominator q with 1 ≤ q ≤ Q
and {{βq}} ≤ s/t?
Exercises.
17.1→ We consider the following knapsack cryptosystem. The pairs AA, AB, . . ., AZ, BA, BB, . . .,
BZ, . . ., ZA, ZB, . . ., ZZ of letters are identified with the 10-bit representations of the numbers 0, . . .,
26² − 1 = 675. For example, the pair AL corresponds to the bit string x_9 x_8 · · · x_0 = 0000001011.
Longer messages are broken into two-letter blocks and each block is treated separately.
1. The private key is c_9, . . ., c_0, m, w ∈ N with c_{i+1} ≥ 2c_i for 0 ≤ i ≤ 8, m > ∑_{0≤i≤9} c_i, and
gcd(w, m) = 1.
2. The public key is a_i = (w c_i rem m) ∈ N for i = 0, . . ., 9.
3. A bit string x = x_9 x_8 · · · x_0 is encrypted as s = ∑_{0≤i≤9} x_i a_i ∈ N.
4. To decrypt a ciphertext s, you compute t ∈ N such that t ≡ w^{−1} s mod m and 0 ≤ t < m. Then
t = ∑_{0≤i≤9} x_i c_i ∈ N, and you can reconstruct x_9, x_8, . . ., x_0 from t.
(i) Write procedures for encryption and decryption, and check them with the key c_0 = 1 and c_{i+1} =
2c_i + 1 for 0 ≤ i ≤ 8, m = 9973, and w = 2001, on the message “ALGEBRAISFUN”. (A sketch of
such procedures follows this exercise.)
(ii) Prove that t = ∑_{0≤i≤9} x_i c_i actually holds in step 4.
(iii) Now you are an eavesdropper who knows the public key
   i     9     8     7     6     5     4     3     2     1     0
   a_i  2720  2580  5963  5712  7529  8393  6372  6749  6660  2775
(in decimal representation). Try basis reduction to find the original message. This need not work for
all blocks.
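For part (i), the following Python sketch shows one way such procedures could look with the sample key from the exercise; the function names and the greedy bit recovery (which relies on the superincreasing private key) are our choices, not part of the exercise.

def keygen():
    # sample private key from part (i): c_0 = 1, c_{i+1} = 2 c_i + 1, m = 9973, w = 2001
    c = [1]
    for _ in range(9):
        c.append(2 * c[-1] + 1)
    m, w = 9973, 2001
    a = [(w * ci) % m for ci in c]            # public key a_i = w c_i rem m
    return c, m, w, a

def encrypt(block, a):
    # block is a pair of capital letters, e.g. "AL" -> 0*26 + 11 = 11 = 0000001011
    x = 26 * (ord(block[0]) - 65) + (ord(block[1]) - 65)
    bits = [(x >> i) & 1 for i in range(10)]  # x_0, ..., x_9
    return sum(xi * ai for xi, ai in zip(bits, a))

def decrypt(s, c, m, w):
    t = (pow(w, -1, m) * s) % m               # t = sum of x_i c_i, since that sum is < m
    bits = [0] * 10
    for i in range(9, -1, -1):                # greedy recovery: c is superincreasing
        if t >= c[i]:
            bits[i], t = 1, t - c[i]
    x = sum(b << i for i, b in enumerate(bits))
    return chr(65 + x // 26) + chr(65 + x % 26)

# c, m, w, a = keygen()
# decrypt(encrypt("AL", a), c, m, w)          # 'AL'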
Part IV
Fermat
Pierre Fermat (c. 1601–1665) has been called the greatest amateur
mathematician. After growing up in Beaumont-de-Lomagne in Gascony (where
his home now houses an interesting museum⋆⋆ ), he studied in Orléans and
Toulouse, became “commissioner of requests” in 1631, and conseiller du roi in
the local parlement, through which any petitions to the king had to pass. He died
in Castres, where he was in the commission implementing the Édit de Nantes,
which gave some protection to the persecuted protestant Huguenots. Fermat
never left the area, never published a paper, and still became the second-best
mathematician of his century (after Newton). Fermat communicated his
mathematical discoveries in numerous letters, usually without proof and often in
the form of challenges, to his contemporaries. (Among them was René Descartes,
who could only be reached through his friend Marin Mersenne in Paris, because
for many years he lived in Holland without a fixed address—a Flying Dutchman
of mathematics, like the modern-day late Pál Erdős.)
Fermat was a pioneer in several areas. His method for drawing a tangent to certain plane curves was
a step in the invention of calculus—later came Newton and Leibniz. He invented probability theory,
in extensive correspondence with Blaise Pascal around 1654. He determined extrema of functions as
zeroes of their derivative, and used this to calculate the path of light through different media according
to the “principle of least time”. There was a controversy between him and Descartes about the
discovery of analytical geometry; certainly Fermat was the first to use it in three dimensions.
But Fermat’s greatest contributions—and those of interest for computer
algebra—were in number theory. He was fascinated by perfect and amicable
numbers, and the Pell–Fermat equation x² − ny² = 1. Fermat discovered that
primes of the form 4n + 1 can be represented (in exactly one way) as a sum of two
squares; for example, 29 = 5² + 2². (It is easy to show that numbers of the form
4n − 1 are never sums of two squares; see Exercise 18.1.) His “method of infinite
descent” can determine the (un)solvability of many Diophantine equations.
⋆⋆ worth the detour
Fermat wrote to Bernard Frénicle de Bessy around August 1640 that the numbers
2^{2^0} + 1 = 3, 2^{2^1} + 1 = 5, 2^{2^2} + 1 = 17, 2^{2^3} + 1 = 257, and 2^{2^4} + 1 = 65 537 are
primes; he conjectured that all these Fermat numbers F_n = 2^{2^n} + 1 were prime.
He was wrong; the next values of Fn , at least up to n = 23, are not prime; see
Section 4.3. Not much harm done; he pointed out that he did not have a proof of
his conjecture. We come across Fermat numbers in Chapter 8, where they are
used in the integer Fourier Transform, and Chapter 18. Weil (1984) presents
Fermat’s achievements in detail.
The theorem that includes Fermat into our little Hall of Fame is that for a prime p and an integer a,
a^{p−1} − 1 is divisible by p. He stated it in a letter to Frénicle on 18 October 1640: Tout nombre
premier mesure infailliblement une des puissance −1 de quelque progression que ce soit, et l’exposant
de la dite puissance est sous-multiple du nombre premier donné −1.¹ He forgot to mention that we
have to disallow the geometric progression a, a², a³, . . . if p divides a. We will call it and various
generalizations “Fermat’s little theorem” in this book; they play a crucial role in primality testing
and the factorization of polynomials and integers (Chapters 14, 18, and 19). Leibniz rediscovered and
proved this result in unpublished notes from 1680; see Notes 4.4. Euler (1732/33, 1747/48) was the
first to publish a proof. He also derived conditions on the factors of Fermat numbers, which led him
to the factor 641 of F_5.
Finally, there is Fermat’s (in)famous remark in the margin near the eighth
problem of Book II of Bachet’s translation of Diophantus’ Arithmetic, which
deals with rational solutions of the equation x2 + y2 = z2 : Cubum autem in duos
cubos, aut quadratoquadratum in duos quadratoquadratos & generaliter nullam in
infinitum ultra quadratum potestatem in duos eiusdem nominis fas est dividere;
cuius rei demonstrationem mirabilem sane detexi. Hanc marginis exiguitas non
1 Every prime number divides invariably one of the powers −1 in any given geometric progression, and the
exponent of this power is a divisor of the given prime number −1.
caperet.2 A proof of this eluded mathematicians for over three centuries and
inspired Kummer’s theory of ideals and the construction of large parts of the
edifice of arithmetic algebraic geometry, culminating in Wiles’ proof of
“Fermat’s last theorem” via a special case of the Taniyama–Weil conjecture
(Wiles 1995, Taylor & Wiles 1995; see van der Poorten 1996 for the mathematics
and Singh 1997 for the story). The designation comes from the fact that after all
of Fermat’s claims had been proven, this remained open as the last one.
His son, Samuel de Fermat, published Diophantus’ Arithmetic with Fermat’s
annotations, and in 1679 the Varia opera mathematica D. Petri de Fermat,
Senatoris Tolosani. His dedication to Ferdinand II. von Fürstenberg, featured on
page 513, reads: To His Highness Prince Ferdinand, Bishop of Paderborn,
Coadiutor of Münster, Duke of Pyrmont, Free Baron of Fürstenberg. By Samuel
de Fermat S. P. Motto: soft and strong. Ferdinand II. (1626–1683) was a shining
light of Paderborn science, student at the University of Paderborn from 1644 to
1646, erudite author of Monumenta Paderbornensia on the local history,
correspondent with the leading philosophers and scientists of his times, and
sponsor of the arts and architecture. His residence is shown in Figure 13.4.
Fermat’s dedication makes it plausible that he financed the Varia opera.
Samuel de Fermat includes a poem, whose title and first lines are:
2 But it is impossible to divide a cube into two cubes, or a fourth power into two fourth powers, or generally any
power beyond the squares into two like powers; I discovered a truly marvelous proof of this fact. The margin is
too narrow to write it down.
3 On the Prince’s famous work Monumenta Paderbornensia : The Prince, hope and pillar of the chorus of the
Muses, celebrates the sources of the Pader river with his eternal song. Just as he who builds such monuments
towers high, he carries in his way [through his generosity] another work [Fermat’s Opera ] higher up to the stars!
Il est remarquable qu’on déduise ainsi du calcul intégral
une propriété essentielle des nombres premiers; mais
toutes les vérités mathématiques sont liées les unes aux autres,
et tous les moyens de les découvrir sont également admissibles.1
Adrien-Marie Legendre (1830)
1 It is remarkable that one should deduce in this way from integral calculus an essential property of the prime
numbers; but all mathematical truths are connected to each other, and all means of discovering them are equally
admissible.
2 “You have said that before,” Kollberg said drily. “It is pure guesswork.”—“The principle of probability.”
18
Primality testing
We want to know whether a given integer is prime or not. Certainly we can find
out by factoring it. Can you think of any other way? Well, there is, and the major
discovery in this area is that primality testing is much easier than factoring, at least
to current knowledge. One can test integers with many thousands of digits, but
factoring numbers with only 300 digits is in general not feasible.
In this chapter, we provide an efficient probabilistic algorithm to test primality;
factorization is the subject of the next chapter. As an easy application, we can also
find large prime numbers, as they are required in some modular algorithms and
in modern cryptography. We conclude with brief discussions of other primality
testing algorithms. The long-standing quest for a deterministic polynomial-time
primality test, stated as a Research Problem in the first two editions of this book,
was resolved by Agrawal, Kayal & Saxena (2004).
For numbers of a special form, such as the Mersenne numbers Mn = 2n − 1,
particularly efficient methods have been known since the 19th century. Indeed,
throughout history the largest known prime has usually been a Mersenne prime.
On 23 August 2008, a UCLA computer managed by Edson Smith discovered
M43 112 609 , the largest among 47 known Mersenne primes. It has 12 978 189 deci-
mal digits. This current world record (at the time of writing) is an achievement of
the Great Internet Mersenne Prime Search (GIMPS), based on software by George
Woltman and Scott Kurowski. This record, together with another one, discovered
two weeks later and slightly smaller, earned a US$ 100 000 award by the Electronic
Frontier Foundation. GIMPS harnesses the spare power of over 20 000 computers
all over the world, and performs about 720 billion calculations per second. This
new paradigm of internet computing started in the area of integer factorization, and
may solve in the future very large instances of such easily distributed problems.
(Section 25.2). The other integers N ≥ 2 are composite. (The number 1 is neither
prime nor composite, but a unit. In the ring Z, −5 is a prime just as 5 is; see
Section 3.1 for a discussion of associates.) The prime numbers form the “building
blocks” for all integers, according to the following basic fact.
The idea of primality was known to the Pythagoreans (around 500 BC), and
Book IX of Euclid’s Elements contains his famous proof that there are infinitely
many primes (see page 26).
This chapter deals mainly with testing whether a given integer N is prime or not.
Can we do better than to try division by all integers up to √N, a method already
known to Eratosthenes in the third century BC? Yes, indeed, there are efficient
algorithms that differentiate between prime and composite numbers, and as a result
the set of prime numbers is in P . In this text, we only present a probabilistic
algorithm. It shows that this set is in the complexity class ZPP (see Section
25.8), and is eminently practical.
We recall the following facts. Z_N^× = {a mod N ∈ Z_N : gcd(a, N) = 1} is the mul-
tiplicative group of units in Z_N = Z/NZ. Remember that a unit in a ring is an
element that has an inverse in the ring. The elements of Z_N^× form a multiplica-
tive group of cardinality ϕ(N) = #Z_N^×; ϕ is Euler's totient function. If the prime
factorization of N is N = p_1^{e_1} · · · p_r^{e_r}, where p_1, . . . , p_r are distinct positive primes
and e_1, . . . , e_r are positive integers, then the Chinese Remainder Theorem 5.3 says
that Z_N ≅ Z_{p_1^{e_1}} × · · · × Z_{p_r^{e_r}} (a ring isomorphism), and that Z_N^× ≅ Z_{p_1^{e_1}}^× × · · · × Z_{p_r^{e_r}}^×
(a group isomorphism). If N is prime, then Z_N is a field, and Z_N^× is a group of
order ϕ(N) = N − 1. If N = p^e is a prime power, then ϕ(N) = p^{e−1}(p − 1), and in
general, ϕ(N) = p_1^{e_1−1} · (p_1 − 1) · · · p_r^{e_r−1} · (p_r − 1), by Corollary 5.6.
A central fact is Fermat's little theorem 4.9 which says that a^{N−1} ≡ 1 mod N
for a prime N and any a ∈ Z which is coprime to N. For coprime integers a, N
we define the order ord_N(a) of a modulo N as the smallest integer k ≥ 1 such
that a^k ≡ 1 mod N. Euler's theorem, generalizing Fermat's, states that a^{ϕ(N)} ≡ 1
mod N, and is a consequence of Lagrange's theorem (Section 25.1). Besides these
“upper bounds” on the order, we also need some “lower bounds”.
PROOF. (i) Let e = ord_N(a) and divide k by e with remainder: k = qe + r with
0 ≤ r < e. Then a^r = a^{k−qe} = a^k · (a^e)^{−q} ≡ 1 mod N, and hence r = 0. By Euler's
theorem, we have a^{ϕ(N)} ≡ 1 mod N, and (i) follows.
(ii) We have a^p ≡ ∑_{0≤i≤p} \binom{p}{i} p^{(e−1)i} ≡ 1 mod p^e. By (i), ord_N(a) is either 1 or p,
and since a ≢ 1 mod N, the claim follows. ✷
If a and N are not coprime, then neither are b and N, and the algorithm correctly
returns “composite”. So we may assume that gcd(a, N) = 1. Then by Fermat’s
little theorem, the answer is correct if the test replies “composite”. If it replies
“possibly prime”, it may be right or it may be wrong. We have to understand when
and why an error may occur. To this end, we consider the subgroup
   L_N = {u ∈ Z_N^× : u^{N−1} = 1}
of Z_N^×. Clearly L_N is a group, and Fermat's little theorem says that L_N = Z_N^× if N is
prime. If L_N ≠ Z_N^×, then in fact #L_N ≤ (1/2)·#Z_N^×, since the size of a finite group is an
integer multiple of the size of any of its subgroups, by Lagrange's theorem (Sec-
tion 25.1). If the a chosen in step 1, taken modulo N, happens to be in Z_N^× \ L_N, then
the test will answer “composite”. Such an a, and also its residue class a mod N, is
called a Fermat witness to the compositeness of N. Similarly, if a mod N ∈ L_N,
then a (and also a mod N) is a Fermat liar for N.
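A minimal Python sketch of a Fermat test along these lines (the choice of the range for a and the output strings are ours, not a transcription of Algorithm 18.2):

import random

def fermat_test(N, trials=1):
    # returns "composite" if a Fermat witness is found, else "possibly prime"; intended for N >= 5
    for _ in range(trials):
        a = random.randrange(2, N - 1)
        if pow(a, N - 1, N) != 1:             # repeated squaring modulo N
            return "composite"                # a is a Fermat witness
    return "possibly prime"                   # every a tried was a Fermat liar, or N is prime

# fermat_test(221)   # 221 = 13 * 17: "composite" with probability at least 1/2 per trial
# fermat_test(561)   # the Carmichael number 561 = 3 * 11 * 17 usually answers "possibly prime"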
T HEOREM 18.3.
If N is prime, then the Fermat test 18.2 returns “possibly prime”. If N is compos-
ite and not a Carmichael number, then it returns “composite” with probability at
least 1/2. The algorithm uses O(log N · M(log N)) word operations.
P ROOF. If gcd(a, N) > 1, then also gcd(b, N) > 1, and the test returns “compos-
ite”, so that we only need to consider the cases where a and N are coprime. If N
is composite and not Carmichael, then #LN ≤ ϕ(N)/2, as noted above, so that at
least half of the possible choices for a in step 1 (coprime to N) are Fermat wit-
nesses. Repeated squaring in step 2 takes O(log N) multiplications modulo N or
O(log N M(log N)) word operations, and the bound on the running time follows. ✷
If N is a Carmichael number, then the Fermat test returns either “possibly prime”
or “composite”. The latter happens only when gcd(a, N) > 1.
We now resolve the shortcoming of the Fermat test in a drastic way: the new test
not only distinguishes primes from Carmichael numbers, it actually factors these
seemingly difficult numbers in random polynomial time. In general, factoring
integers is much harder than testing them for primality, and so these numbers turn
out to be quite harmless after all.
PROOF. We take a prime number p and assume that it divides the Carmichael
number N exactly e ≥ 2 times. By the Chinese Remainder Theorem, there exists
an a ∈ Z with a ≡ 1 + p^{e−1} mod p^e and a ≡ 1 mod N/p^e. Then a has order p
modulo p^e, by Lemma 18.1, and hence also modulo N. Since a^{N−1} ≡ 1 mod N, it
follows that p divides N − 1, by the same lemma. Since p also divides N, we have
a contradiction, and the claim is proved. ✷
2. d ←− gcd(a, N)
   if d > 1 then return d
6. g ←− gcd(b_j + 1, N)
   if g = 1 or g = N then return “probably prime” else return g
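The surviving steps fit into the following Python sketch of one round of a strong pseudoprimality test that also tries to split Carmichael numbers; it follows the analysis in Theorem 18.6 below, but the exact ranges and names are ours.

import math, random

def strong_test(N):
    # one round for an odd N >= 5; returns "probably prime", "composite", or a proper divisor
    v, m = 0, N - 1
    while m % 2 == 0:                         # write N - 1 = 2^v * m with m odd
        v, m = v + 1, m // 2
    a = random.randrange(2, N - 1)
    d = math.gcd(a, N)
    if d > 1:
        return d                              # step 2: a proper divisor found outright
    b = [pow(a, m, N)]                        # b_0 = a^m rem N
    if b[0] == 1:
        return "probably prime"
    for _ in range(v):
        b.append(b[-1] * b[-1] % N)           # b_i = b_{i-1}^2 rem N
    if b[v] != 1:
        return "composite"                    # a is a Fermat witness
    j = max(i for i in range(v) if b[i] != 1) # b_j != 1 but b_{j+1} = 1
    g = math.gcd(b[j] + 1, N)                 # step 6
    return "probably prime" if g in (1, N) else g

# strong_test(561)   # for the Carmichael number 561 this often returns a proper divisor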
T HEOREM 18.6.
If N is prime, then Algorithm 18.5 returns “probably prime”. If N is composite and
not a Carmichael number, then the algorithm returns “composite” with probability
at least 1/2. If N is a Carmichael number, the algorithm returns a proper divisor of
N with probability at least 1/2. It uses O(log N · M(log N)) word operations.
PROOF. By induction, we have b_i ≡ a^{2^i m} mod N for 0 ≤ i ≤ v, and in particular
b_v ≡ a^{N−1} mod N. If b_{i−1} = 1, then also b_i = 1, for any i. If N is composite and
not Carmichael, then with probability at least 1/2, a is a Fermat witness for N,
b_v ≠ 1, and the algorithm returns “composite” in step 5. We next assume that N
is prime. Then b_v = 1. If b_0 = 1, then the algorithm correctly returns “probably
prime” in step 3. Otherwise, we have b_j ≠ 1 and b_j² ≡ b_{j+1} = 1 mod N in step 6.
By Lemma 25.4, the polynomial x² − 1 ∈ Z_N[x] has at most two zeroes. Hence the
only square roots of 1 modulo N are 1 and −1, so that b_j = N − 1 and g = N, and
the correct result is returned in step 6.
The last case to be considered is when N is a Carmichael number. We let P be
the set of prime divisors of N. Since N is squarefree, we have N = ∏ p∈P p. We
consider
   I = {i : 0 ≤ i ≤ v and u^{2^i m} = 1 for all u ∈ Z_N^×}.
Then v ∈ I, by the definition of Carmichael numbers, and i + 1 ∈ I for any i ∈ I
with i < v. Since m is odd, we have (−1)^m = −1 ≠ 1, and therefore 0 ∉ I. Hence
there exists some l < v such that I = {l + 1, l + 2, . . . , v}. Now let
   G = {u ∈ Z_N^× : u^{2^l m} = ±1} ⊆ Z_N^×.
This is a subgroup of Z_N^×, and we now show that G ≠ Z_N^×. There exists some p ∈ P
and b ∈ Z coprime to p with b^{2^l m} ≢ 1 mod p, since otherwise we would have l ∈ I.
We take some such p and b. The Chinese Remainder Theorem implies that there
exists a c ∈ Z such that c ≡ b mod p and c ≡ 1 mod N/p. Then c mod N ∈ Z_N^× \ G.
Being a proper subgroup, G has at most #Z_N^×/2 = ϕ(N)/2 elements.
If a in step 1 is chosen so that a mod N ∈ Z_N^× \ G, then we claim that the algo-
rithm will actually discover a proper divisor of N. The fact that b_{l+1} ≡ a^{2^{l+1} m} ≡ 1
mod N implies that for all p ∈ P, also b_{l+1} ≡ 1 mod p. Again, the only square roots
of 1 modulo p are 1 and −1, so that for each p, a^{2^l m} mod p is either 1 or −1. Since
b_l mod N = a^{2^l m} mod N is neither 1 nor −1, both possibilities actually occur, we
have j = l in step 5, and
   g = gcd(b_l + 1, N) = ∏_{p ∈ P, a^{2^l m} ≡ −1 mod p} p
The first statement says that a random integer near x is prime with probability
about (ln x)^{−1}. If we choose random n-bit integers and test them for primality, we
expect to find a prime after about n · ln 2 trials. Throughout this section, “ln” is the
“natural” logarithm of prime number theory, but we continue to use “log” in
“O” estimates of running times, where the base is irrelevant.
In order to find a large prime p, say with B < p ≤ 2B for some given B, we
simply test uniformly selected random numbers p in the range for primality and
return the first number that passes k such tests, for some given k. On any composite
number, the tests return “probably prime” with probability at most 2−k . One might
then want to conclude that the output is prime with probability at least 1 − 2−k .
This is fallacious. Imagine that there were only few primes between B and 2B, say
just one. Then for small k one would be much more likely to receive a composite
number than a prime. Thus the density of the primes enters the following result.
THEOREM 18.8.
Given positive integers B, k, the output of the above procedure is prime with prob-
ability at least 1 − 2^{−k+1} ln B. It uses an expected number of O(k (log² B) M(log B))
word operations.
PROOF. The probability space here is the set of all random choices within the
procedure. By the prime number theorem, the set P of primes considered has size
   #P = π(2B) − π(B) ≥ (B/ln B)·(1 − 3/ln B) ≥ B/(2 ln B),   (1)
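In Python, the procedure analyzed in Theorem 18.8 might be sketched as follows, with the number k of rounds and a one-round test (for example the strong_test above, wrapped to return a Boolean) passed in as parameters.

import random

def find_prime(B, k, passes_one_test):
    # draw p uniformly from {B+1, ..., 2B} until one passes k independent test rounds;
    # by Theorem 18.8, the result is prime with probability at least 1 - 2^(1-k) ln B
    while True:
        p = random.randint(B + 1, 2 * B)
        if all(passes_one_test(p) for _ in range(k)):
            return p

# find_prime(10**20, 40, lambda N: strong_test(N) == "probably prime")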
PROOF. There are at most log_a |M| ≤ log_a C primes in P that divide M. ✷
We are now in a position to provide one of the ingredients for the modular
algorithms for the determinant (Section 5.5), gcds and the Extended Euclidean
Algorithm (Chapter 6), root finding (Sections 14.5 and 15.6), and factorization
(Chapters 15 and 16) in Z[x], and for cryptography: finding suitable primes. Table
18.1 summarizes the costs and requirements. The second last algorithm does not
occur explicitly in Section 16.5; essentially, one has to replace the first four steps of
526 18. Primality testing
modular algorithm   prime               requirements                     cost: prime finding   cost: algorithm
determinant         big prime §5.5      p > 2 n^{n/2} A^n                n^3 log^3 A           n^4 log A
determinant         small primes 5.10   p_1, . . ., p_r < 2r ln r        n log A               n^4 log A
gcd                 big prime 6.34      p > (n+1)^{1/2} 2^{n+1} A^2      n^3 + log^3 A         n^2 + n log A
gcd                 small primes 6.38   p_1, . . ., p_r < 2r ln r        n log A               n^2 + n log A
EEA                 small primes 6.57   p_1, . . ., p_r < 2r ln r        n log A               n^3 log A
root                big prime 14.17     p > 2n(A^2 + A)                  log^3 A               n log A + n log^2 A
TABLE 18.1: Costs and requirements of various modular algorithms on inputs of degree
(or dimension) n and max-norm at most A. For all small prime and prime power algorithms,
there is a parameter r ∈ O(n log(nA)). For some big prime algorithms, we also have the
requirement that p does not divide a certain subresultant, of word length O(n log(nA)).
The last column contains the running time for the remaining algorithm without the prime
finding stage. All stated costs are with fast arithmetic and ignore logarithmic factors.
Algorithm 16.22 by the first three steps of the big prime algorithm 15.2. For most
algorithms, we see in the fourth column that the time for finding one or several
small primes is much less than the time for finding a big prime; in practice, we
would work with a precomputed list of small primes, as discussed below. However,
also the remaining stages of small primes and prime power algorithms are faster
than the corresponding stages of their big prime counterparts: in theory only by
logarithmic factors, which do not show up in the last column of Table 18.1, but
they are clearly visible in practice (see Section 6.13).
T HEOREM 18.10.
(i) There is a probabilistic algorithm which, with probability at least 3/4, re-
turns a prime p between B +1 and 2B, for any positive integer B ∈ N of word
length β . Moreover, if M ∈ Z is a nonzero number such that 6 ln |M| ≤ B,
then p is prime and p does not divide M with probability at least 1/2. The
algorithm takes O(β² · M(β) log β) word operations.
P ROOF. (i) If B ≥ 6, then Theorem 18.8 with k = 2 + ⌈log2 ln B⌉ gives the first
claim. Using Lemma 18.9 and (1), we find that p divides M with probability
at most
   log_B C / #P ≤ (ln C · 2 ln B)/(ln B · B) ≤ 1/3
if it is prime. Therefore the probability that p has the required properties is at least
(3/4)·(1 − 1/3) = 1/2. If B ∈ {1, . . . , 5}, then |M| ≤ ⌊e^{B/6}⌋ ≤ B, so that none of the primes
between B + 1 and 2B divides M, and we may take one of 2, 3, 5, or 7 for p.
(ii) We have p_r < r(ln r + lnln r − 1/2) ≤ 2r ln r for r ≥ 20, by the prime number
theorem 18.7. In fact, p_r < 2r ln r for all r ≥ 2. We find our primes by the sieve
of Eratosthenes, as follows. We write down a list of all integers below x = 2r ln r.
Then we cross out all even numbers, all multiples of 3, all multiples of 5, and so
on, for each prime less than √x. The remaining integers are not divisible by a
prime less than √x, and hence they are prime. The cost is ⌊x/p⌋ steps for each
prime p ≤ √x, altogether at most
   x · ∑_{p < √x, p prime} 1/p ∈ O(x loglog x)
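A compact Python version of this sieve; the lower bound max(6, ...) on the sieve size is a small safeguard for tiny r, otherwise the code follows the proof.

import math

def first_primes(r):
    # list all integers below x = 2 r ln r and cross out the proper multiples
    # of every prime p <= sqrt(x); what remains are primes
    x = max(6, int(2 * r * math.log(r)) + 1)
    is_prime = [True] * x
    is_prime[0] = is_prime[1] = False
    for p in range(2, math.isqrt(x - 1) + 1):
        if is_prime[p]:
            for multiple in range(p * p, x, p):   # about x/p crossings for the prime p
                is_prime[multiple] = False
    return [p for p in range(x) if is_prime[p]][:r]

# first_primes(10)   # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]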
Our first application is the big prime modular gcd algorithm 6.34.
C OROLLARY 18.11.
Let n ∈ N_{≥2}, f, g ∈ Z[x] be primitive, with degrees at most n and max-norms at
most A, h = gcd(f, g), b = gcd(lc(f), lc(g)), B = ⌈(n + 1)^{1/2} 2^{n+1} bA⌉, and β =
log B. If n ≥ 5 or A ≥ 5, then we can find an integer p with B < p ≤ 2B such that,
with probability at least 1/2, p is prime and does not divide res(f/h, g/h). This
algorithm uses O(β² M(β) log β) or O∼(n³ + log³ A) word operations.
PROOF. We let σ = res(f/h, g/h), and have |σ| ≤ (n + 1)^n A^{2n}, by Theorem 6.35,
and therefore 6 ln|σ| ≤ 12n ln((n + 1)A). Since 12n < 2^{n+1} and ln((n + 1)A) <
(n + 1)^{1/2} A if n ≥ 5, and
in spite of their conceptual simplicity. In Figure 6.4 the big prime running times
are quite erratic, but smooth when the cost of prime generation is suppressed. In
small primes or prime power modular algorithms, the cost of obtaining primes is
negligible.
The prime finding step in the factorization algorithms 15.2 (big prime) and 15.19
and 16.22 (prime power) in Z[x] is quite inexpensive.
C OROLLARY 18.12.
Let f ∈ Z[x] be squarefree of degree n ≥ 2 and with max-norm ||f||_∞ = A, γ =
2n ln((n + 1)A), and suppose that A ≥ 5 if n ≤ 4.
(i) There is a probabilistic algorithm which, with probability at least 1/2, out-
puts a prime p between B + 1 and 2B and not dividing res(f, f′), where
B = ⌈(n + 1)^{1/2} 2^{n+1} |lc(f)| A⌉. Then β = log B ∈ O(n + log A), and the al-
gorithm uses an expected number of O(β² M(β) log β) or O∼(n³ + log³ A)
word operations.
(ii) With O(γ log² γ loglog γ) or O∼(n log A) word operations we can find (prob-
abilistically) a prime p of word length in O(log γ) and such that p ∤ res(f, f′)
with probability at least 1/2.
With some more calculations using the prime number theorem one can shave off
a logarithmic factor in the estimate of (ii) (Exercise 18.21). Similar improvements
are possible for small primes modular determinant computation (Algorithm 5.10),
small primes modular gcd computation (Algorithm 6.38), and the small primes
modular EEA (Algorithm 6.57).
A software implementation of small primes or prime power modular algorithms
should precompute a table of small primes, so that for most purposes only table
look-up is needed. Rather than using the first primes, it is more efficient to take
the largest single precision primes. As discussed at the beginning of Chapter 5 and
in Section 8.3, it is advantageous in the small primes modular approach to choose
p_1, . . . , p_r to be Fourier primes, so that p_i − 1 is divisible by some large power 2^t
of 2, for all i. There are quantitative versions of Dirichlet's (1837) famous theo-
rem on primes in arithmetic progressions that give asymptotic estimates, but even
the best versions (Alford, Granville & Pomerance 1994, Bach & Sorenson 1996)
are considerably less precise than the prime number theorem 18.7. For practical
purposes, however, it is reasonable to assume that a random number p ≡ 1 mod 2^t
near x is prime with probability about 2/ln x. To find enough such primes, we
consecutively test 2^t + 1, 2 · 2^t + 1, 3 · 2^t + 1, . . . for primality until we have found r
primes. This will be a precomputation stage. Exercise 18.19 estimates the number
of single precision Fourier primes for 32-bit and 64-bit processors.
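Such a precomputation is easy to sketch in Python; the probabilistic primality test is passed in as a parameter (for instance a wrapper around the strong_test sketch above), since the choice of test is left open here. In practice one would search downward from the largest single precision candidates rather than upward, as noted above; the upward search below just keeps the sketch short.

def fourier_primes(t, r, is_probably_prime):
    # test 2^t + 1, 2*2^t + 1, 3*2^t + 1, ... and return the first r candidates
    # that pass the given probabilistic primality test
    primes, k = [], 1
    while len(primes) < r:
        candidate = k * 2**t + 1
        if is_probably_prime(candidate):
            primes.append(candidate)
        k += 1
    return primes

# fourier_primes(27, 5, lambda N: all(strong_test(N) == "probably prime" for _ in range(25)))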
In our big prime algorithms for gcd computation and factorizations, it may hap-
pen that the number p given by Theorem 18.10 is not prime. If we stumble upon
a nonzero element which is not invertible in our computation, then we recognize
our p as composite and start all over again with a new one. However, it may hap-
pen that all computations go through even if p is composite. In the gcd case, it
is possible to show that the output is nevertheless correct, but in the polynomial
factoring algorithm, reducible polynomials may wrongly be declared irreducible.
However, this only happens with probability at most 1/2, by Theorem 18.10, and
we may simply rerun the whole algorithm several times independently to make the
error probability arbitrarily small. Or, preferably, we use the prime power factoring
algorithm, which is faster and returns the complete factorization in any case.
are equal unless and only unless both a and N are congruent to 3 modulo 4. The
Jacobi symbol is the generalization to an arbitrary odd N. If N = p_1^{e_1} · · · p_r^{e_r} is its
prime factorization, then it is defined as
   (a/N) = (a/p_1)^{e_1} · · · (a/p_r)^{e_r}.
This quantity can be computed by an efficient method, akin to the Euclidean Al-
gorithm, without actually factoring N (Notes 18.5 and Exercise 18.23).
When N is a prime, then Lemma 14.7 implies that
   (a/N) ≡ a^{(N−1)/2} mod N   (3)
for all a ∈ Z. Solovay & Strassen (1977) prove that (3) is false for at least half of all
a in {1, . . . , N −1} if N is composite and not a prime power. Their algorithm checks
(3) for randomly chosen a; each test takes O(log N · M(log N)) word operations.
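A Python sketch of the Jacobi symbol, computed via the reciprocity laws of Exercise 18.23 in the spirit of the Euclidean Algorithm, together with one round of the Solovay and Strassen check of congruence (3); the explicit gcd guard in the test is our addition.

import math

def jacobi(a, N):
    # Jacobi symbol (a/N) for odd N >= 3, computed without factoring N
    a %= N
    result = 1
    while a != 0:
        while a % 2 == 0:                 # (2/N) = -1 if and only if N = 3 or 5 mod 8
            a //= 2
            if N % 8 in (3, 5):
                result = -result
        a, N = N, a                       # reciprocity: the sign flips iff a = N = 3 mod 4
        if a % 4 == 3 and N % 4 == 3:
            result = -result
        a %= N
    return result if N == 1 else 0        # 0 signals gcd(a, N) > 1

def solovay_strassen_round(N, a):
    # one round of the Solovay-Strassen test for odd N >= 3 and 1 <= a < N
    if math.gcd(a, N) != 1:
        return False                      # N is certainly composite
    return jacobi(a, N) % N == pow(a, (N - 1) // 2, N)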
Although Berlekamp’s (1970) probabilistic algorithm for factoring polynomials
(Section 14.8) had been around for a while, it was the Solovay & Strassen (1977)
result for integers that aroused widespread interest in the power of randomized al-
gorithms. (Are numbers more intuitive to computer scientists than polynomials?
The reader should by now—and even more so after reading Chapter 19—be con-
vinced that polynomials are much easier objects than numbers.) See Notes 6.5.
(10^{1031} − 1)/9 =
111111111111111111111111111111111111111111111111111111111111
111111111111111111111111111111111111111111111111111111111111
111111111111111111111111111111111111111111111111111111111111
111111111111111111111111111111111111111111111111111111111111
111111111111111111111111111111111111111111111111111111111111
111111111111111111111111111111111111111111111111111111111111
111111111111111111111111111111111111111111111111111111111111
111111111111111111111111111111111111111111111111111111111111
111111111111111111111111111111111111111111111111111111111111
111111111111111111111111111111111111111111111111111111111111
111111111111111111111111111111111111111111111111111111111111
111111111111111111111111111111111111111111111111111111111111
111111111111111111111111111111111111111111111111111111111111
111111111111111111111111111111111111111111111111111111111111
111111111111111111111111111111111111111111111111111111111111
111111111111111111111111111111111111111111111111111111111111
111111111111111111111111111111111111111111111111111111111111
11111111111.
Notes. Good references for the material of this chapter are Knuth (1998), §4.5.4, Koblitz
(1987a), Bach (1990), Lenstra & Lenstra (1990), Lenstra (1990), Adleman (1994), and
Bach & Shallit (1996).
The word prime number (πρῶτος ἀριθμός) comes, according to Iamblichus, from the
fact that in Eratosthenes’ sieve (Section 18.4) they are the first to appear in the sequence of
their multiples which have to be removed.
An integer is perfect if it equals the sum of all its proper divisors; 6 = 1 + 2 + 3 is an
example. Euclid proves in Proposition 36 of Book 9 of his Elements that 2^{n−1} M_n is perfect
for any Mersenne prime M_n (Exercise 18.11); for n = 2 and 3 we obtain 6 and 28. The
39th known Mersenne prime provides in this way the largest known perfect number (at the
time of writing).
The idea of internet computing was pioneered by Silverman in the area of factoring inte-
gers (see Caron & Silverman 1988), and became popular with Lenstra & Manasse’s (1990)
article. All about Mersenne prime records can be found at http://www.mersenne.org.
18.2. Carmichael numbers have their name from the work of Carmichael (1909/10, 1912).
Mahnke (1912/13) discusses Leibniz’ proof of Fermat’s little theorem (see Notes 4.4) and
his attempts at a converse; Leibniz thought for a while that N is prime if 2^{N−1} ≡ 1 mod N.
In his discussion, Mahnke gives the defining property of Carmichael numbers, proves that
neither a prime power nor a product of two primes is Carmichael, and gives five examples,
including 561. He mentions a letter of Bachmann with similar results. Actually, Ball &
Coxeter (1947), first issued in 1892, attribute a Chinese origin to Leibniz’ incorrect pri-
mality criterion, and Tarry (1898) asked in the 19th century version of sci.math whether
the criterion is true. In his 11-line reply, Korselt (1899) gives the definition of Carmichael
numbers (as on page 520) and states that he has proved the characterization of Exercise
18.9 (ii), namely squarefreeness and divisibility of N − 1 by the least common multiple of
all p − 1. Thus we may consider Korselt as the discoverer of Carmichael numbers, and
the characterization is known as Korselt’s criterion. The editor notes after Korselt’s reply
that five other replies were received to this question; toutes à peu près dans le même sens1 .
Lenstra (1979b) proves that if a^{N−1} ≡ 1 mod N for every prime a < ln² N, then N is
squarefree.
18.3. Miller (1976) proposed a deterministic version of the strong pseudoprimality test
18.5 and showed that it runs in polynomial time under the ERH (Theorem 18.6), and Rabin
(1976, 1980a) suggested the probabilistic variant. Neither of the two algorithms looks for
factors of Carmichael numbers. Earlier versions of the test were given by Dubois (1971)
and Selfridge (not later than 1974, unpublished), but they did not reach a wide audience.
Dubois suggests the strong pseudoprimality test with a = 2, 3, and 5. He is well aware that
it may fail, and proposes to use a = 7 as well. See the end of these Notes for the smallest
N on which this variant fails.
Bach, Miller & Shallit (1986) state a generalized version of Algorithm 18.5 for integer
factorization, without actually mentioning the application to the Carmichael case. The fact
that Carmichael numbers can be factored in random polynomial time seems to be folklore.
Alford, Granville & Pomerance (1994) solved a long-standing open problem by proving
that there are infinitely many Carmichael numbers.
For an odd composite integer N, the probability for a random a ∈ {1, . . . , N − 1} with
gcd(a, N) = 1 to be a strong liar is at most 1/2, by the proof of Theorem 18.6. Rabin
(1980a), Monier (1980), and Atkin & Larson (1982) have shown the smaller bound 1/4. To
generate random primes, suppose that we fix n and k, choose n-bit odd numbers uniformly
at random, subject them to k strong pseudoprimality tests, and return the first one that
passes all these tests. We call pn,k the probability that a composite number is returned.
Damgård, Landrock & Pomerance (1993) deal with the subtleties of estimating pn,k , as
noted before Theorem 18.8, and prove several estimates, for example:
   p_{600,1} ≤ 2^{−75},   p_{n,k} < (1/7) n^{15/4} 2^{−n/2−2k}   if 4k ≥ n ≥ 21.
All reasonably small numbers have small strong witnesses: Pomerance, Selfridge & Wag-
staff (1980) prove that for all composite N ≤ 25 × 10^9 (except for N = 3 215 031 751), at
least one of 2, 3, 5, and 7 is a strong witness. Pinch (1993) describes erroneous results of
primality tests implemented in some computer algebra systems.
Both the Solovay–Strassen and the strong pseudoprimality tests should properly be
called compositeness tests because they show the set P RIMES of all prime numbers is in
co-RP and its complement (without 0 and 1) C OMPOSITES is in RP, which is a (possibly
proper) subset of BPP (Section 25.8), but the wrong terminology has stuck.
18.4. The prime number theorem is a central result in number theory, and has a long and
distinguished history. Proofs of the asymptotic version stated first in Theorem 18.7 are in
many texts, for example Hardy & Wright (1985). The precise version stated in Theorem
18.7 is from Rosser & Schoenfeld (1962).
1 all in about the same sense
An early attempt at the prime number theorem was by Legendre (1798), and Gauß
(1849) said that he found the estimate around 1792. Chebyshev (1849, 1852) proved that
π(x) is asymptotically x/ ln x, up to a constant factor, and de la Vallée Poussin (1896) and
Hadamard (1896) proved that π(x) = x/ln x + o(x/ln x). A better approximation is given
by the logarithmic integral π(x) ≈ Li(x) = ∫_2^x dt/ln t.
A vital tool in modern prime number theory is Riemann's (1859) zeta function ζ(s),
a meromorphic function on the complex plane. It is obtained by analytic continuation of
the series ∑_{n≥1} n^{−s}, which is defined when ℜs > 1, and which already Euler used. Riemann made his fa-
mous conjecture, the Riemann Hypothesis, that all zeroes s of ζ lie on the critical line
ℜs = 1/2: [. . .] und es ist sehr wahrscheinlich, dass alle Wurzeln [von ζ(1/2 + it)] reell sind.
Hiervon wäre allerdings ein strenger Beweis zu wünschen; ich habe indess die Aufsuchung
desselben nach einigen flüchtigen vergeblichen Versuchen vorläufig bei Seite gelassen, da
er für den nächsten Zweck meiner Untersuchung entbehrlich schien.2 A proof of this
conjecture, still elusive after over 130 years, would imply dramatic improvements in the
estimates for the error term in the prime number theorem. Clever methods have been de-
vised to calculate billions of roots of the zeta function (van de Lune, te Riele & Winter
1986, Odlyzko & Schönhage 1988, Odlyzko 1995c); fast arithmetic is a must for such
high-performance calculations.
Already Legendre (1798) had used a logarithmic integral, but he was well aware that he
had no proof of his (incorrect) formula. He set the task solved in this chapter: Il serait à
désirer, pour la perfection de la théorie des nombres, qu’on trouvât une méthode praticable
au moyen de laquelle on pût décider assez promptement si un nombre donné est premier
ou s’il ne l’est pas.3 Did he already feel that there is a computational difference between
testing primality and factoring?
The zeta function has been generalized from integers to algebraic number fields. The
conjecture that all those generalizations have their zeroes on the critical line is called the
Extended Riemann Hypothesis. For several algorithms, the estimates of their running time
(or their proofs of correctness) rely on the ERH; see Notes 14.9.
Pritchard (1983, 1987) and Sorenson (1998) give several more efficient versions of the
sieve of Eratosthenes.
18.5. We heard the “unless and only unless” from Hendrik Lenstra. The “iff” was coined
by Halmos (see Halmos 1985, page 403), and Conway invented comic imitations like
“unlesss”.
Monier (1980) compares the two tests by Solovay & Strassen (1977) and by Miller and
Rabin. Eisenstein (1844) and Lebesgue (1847) present algorithms for the Jacobi symbol.
They are analyzed in Shallit (1990), and efficient methods are given by Bach & Shallit
(1996), §5.9, and Meyer & Sorenson (1998).
Miller, Rabin, Solovay, and Strassen shared the ACM Paris Kanellakis Award for their
work.
2 [. . .] and it is very probable that all roots [of ζ(1/2 + it)] are real. A rigorous proof of this would be desir-
able; I have, however, left aside the quest for one after several brief and unsuccessful attempts, since it seemed
dispensable for the immediate goal of my investigation.
3 It would be desirable for the perfection of number theory to find a practical method by which one should be
able to decide fairly quickly whether a given number is prime or not.
18.6. See Bach & Shallit (1996) for details on Pepin’s test, and Hardy & Wright (1985),
§2.5, for the example F5 . In fact, Pepin used 5 instead of 3 as his witness. It is widely
conjectured that no Fn > F4 is prime, but Wagstaff (1983) has conjectured that
   #{p < x : M_p is prime} ≈ (e^γ / ln 2) · lnln x ≈ 2.57 lnln x,
where γ = 0.5772156649 . . . is Euler’s constant. In particular, this would imply that there
are infinitely many Mersenne primes. The term “Mersenne number” was apparently coined
by Rouse Ball in 1892 (see Ball & Coxeter 1947, page 65: Mersenne’s Numbers ), and
“repunit” is from Beiler (1964), page 83.
Further notes. The largest twin primes known at the time of writing are 1 807 318 575 ·
2^{98305} ± 1, with 29 603 decimal digits, discovered by Underbakke, Carmody, and Gallot.
Caldwell’s prime list http://www.utm.edu/research/primes regularly updates this
and many other prime records. These numbers have a very special form and can be proven
to be prime by deterministic methods which are far too slow for general numbers of this
size.
Exercises.
18.1 Show that a² + b² ≡ 0, 1, or 2 modulo 4 for all a, b ∈ Z.
18.2 Compute 2^{1 000 005} mod 55. Hint: This needs virtually no calculation.
18.3 Which of the two integers 10^{200} + 349 and 10^{200} + 357 is probably prime and which is certainly
composite? You may use a computer algebra system to find this out, but you should not use routines
like isprime or ifactor. Warning: not every exponentiation routine is suited for solving this task.
18.4∗ You are to determine precisely the error probability of the Fermat test in a special case. Let
p ≠ q be primes with p ≡ q ≡ 3 mod 4 and gcd(p − 1, q − 1) = 2, and N = pq.
(i) Prove that gcd(N − 1, p − 1) = 2, and conclude that {u^{N−1} : u ∈ Z_p^×} = {u² : u ∈ Z_p^×} and
prob(a^{N−1} ≡ 1 mod p) = 2/(p − 1) for a uniform random element a ∈ {1, . . ., p − 1}. Hint: Ex-
ercise 14.11.
(ii) Calculate the probability that the Fermat test outputs “possibly prime” on input N, assuming
that a is chosen from {1 ≤ c < N: gcd(c, N) = 1} uniformly at random in step 1. Compare your result
numerically to the estimate from Theorem 18.3 for p = 79 and q = 83.
18.7 Prove that for N = 3 215 031 751, the smallest strong witness is 11.
18.8 Find a 20 decimal digit prime. Explain how you obtained it and why you believe it is prime.
You may find functions such as M APLE’s isprime useful.
18.14∗ Let N > 1 be an odd integer, λ(N) as in Exercise 18.13, and C_N = {a ∈ Z_N^× : a^{λ(N)/2} = ±1}.
(i) Prove that C_N is a multiplicative subgroup of Z_N^×.
(ii) Show that if N = p^e for some e ≥ 1 and some prime p, then C_N = Z_N^×. Hint: Exercise 9.40.
(iii) Prove the converse of (ii). Hint: CRT.
(iv) Recall that N is a perfect power if N = m^l for some integers m, l > 1. Discuss whether the
following is a good primality testing algorithm: First check whether N is a perfect power. If it is not,
then output “probably prime” if a^{λ(N)/2} = ±1 for a randomly chosen a ∈ Z_N^×, and return “composite”
otherwise.
(i) The small primes modular determinant algorithm 5.10 and the small primes modular EEA 6.57
only require that we find a collection of primes p_1, . . ., p_r ≤ x such that their product exceeds a given
bound C ∈ R_{>0}, and we took the first r ≈ log₂ C primes for simplicity (Theorems 5.12 and 6.58),
so that x = p_r ≈ (log₂ C) ln log₂ C. But in fact, ϑ(x) ≥ ln C is sufficient, which leads to the choice
x ≈ ln C, by the above, so that r = π(x) ≈ ln C/lnln C. Work out the details and show that the cost
of Algorithms 5.10 and 6.57 drops to O(n^4 log(nB) loglog(nB) + n^3 log²(nB)) and O(n^3 m log²(nA))
word operations, respectively.
(ii) In Corollary 18.12 (ii), the requirement is slightly different: we need r primes such that
the product of each r/2 of them exceeds the discriminant bound C. Thus we may take x ≈ ln C
and r = 2π(x) ≈ 2 ln C/lnln C. Use this to improve the cost estimate of Corollary 18.12 (ii) to
O(γ log γ loglog γ) word operations.
18.22 Let p ∈ N be an odd prime.
(i) Prove that 4 divides p − 1 if −1 is a square modulo p. Hint: Lagrange’s theorem.
(ii) Prove the converse of (i). Hint: Consider a^{(p−1)/4} for a nonsquare a ∈ F_p^×.
(iii) Conclude that the Legendre symbol (−1/p) is 1 if and only if p ≡ 1 mod 4.
18.23∗ (i) Show that the Jacobi symbol is multiplicative with respect to both arguments:
   (ab/N) = (a/N)(b/N),   (a/MN) = (a/M)(a/N)
for all a, b, M, N ∈ N_{>0} with M, N ≥ 3 odd.
(ii) Prove that the law of quadratic reciprocity also holds for the Jacobi symbol: If a, N ∈ N are
coprime and odd, then (a/N) and (N/a) are equal unless and only unless a ≡ N ≡ 3 mod 4.
(iii) A special case of the law of quadratic reciprocity is that (2/N) = 1 if and only if N ≡ ±1 mod 8
for an odd prime N ∈ N. Prove that this also holds for the Jacobi symbol, where N ≥ 3 is an arbitrary
odd integer.
(iv) Show that (a/N) = ((a rem N)/N) for all a, N ∈ N_{≥1} with N ≥ 3 odd.
(v) Write an efficient algorithm that, given an odd integer N > 1 and a ∈ {1, . . ., N − 1}, computes
the Jacobi symbol (a/N), and analyze its cost.
18.24∗ (Lehmann 1982) Let N ∈ N_{≥3} be odd, σ: Z_N^× −→ Z_N^× the power map σ(a) = a^{(N−1)/2}, and
T = im(σ) ⊆ Z_N^×.
(i) Show that T = {1, −1} if N is prime.
(ii) Prove that T ≠ {1, −1} if N is not a prime power. Hint: Assume that −1 ∈ T and apply the
Chinese Remainder Theorem.
(iii) Show that T ≠ {1, −1} if N = p^e for a prime p ∈ N and e ∈ N_{≥2}. Hint: Lemma 18.1.
(iv) Prove that N is a Carmichael number if T = {1}.
(v) Consider the following algorithm.
ALGORITHM 18.13 Lehmann's primality test.
Input: An odd integer N ≥ 3 and a parameter k ∈ N.
Output: Either “probably composite” or “probably prime”.
1. for i = 1, . . ., k do
2.    choose a_i ∈ {1, . . ., N − 1} uniformly at random
3.    call the repeated squaring algorithm 4.8 to compute b_i = a_i^{(N−1)/2} rem N.
4. if {b_1, . . ., b_k} ≠ {1, −1} then return “probably composite” else return “probably prime”
Prove that the algorithm outputs “probably prime” with probability at least 1 − 2^{1−k} if N is prime,
and that it outputs “probably composite” with probability at least 1 − 2^{−k} if N is composite.
(vi) Prove that Lehmann’s algorithm can be executed with O(k log N · M(log N)) word operations.
(vii) Discuss the following modification of step 4: if bi = −1 for 1 ≤ i ≤ k, then the algorithm
should return “probably prime” as well.
(viii) For each of the composite numbers N = 343, 561, 667, and 841, compute T and determine
exactly the error probability of Lehmann’s algorithm for k = 10 (you may assume that gcd(ai , N) = 1
for all i). Compare your results to the estimate from (v).
18.25∗ Let F_n = 2^{2^n} + 1 be the nth Fermat number, for n ∈ N.
(i) Assume that F_n is prime. Show that 3 and 7 are nonsquares modulo F_n if n ≥ 1 and that 5 is a
nonsquare modulo F_n if n ≥ 2. Hint: Exercise 18.23.
(ii) Conclude that for n ≥ 1, Pepin's (1877) test works correctly: F_n is prime if and only if
3^{(F_n−1)/2} ≡ −1 mod F_n.
(iii) Show that Pepin's test can be performed with O(2^n M(2^n)) word operations.
18.26∗ Let n ∈ N_{≥2}, F_n = 2^{2^n} + 1 the nth Fermat number, and p ∈ N a prime divisor of F_n. Prove
that 2^{n+2} | p − 1. Hint: Lagrange's theorem and Exercise 18.23.
Problema, numeros primos a compositis dignoscendi,
hosque in factores suos primos resolvendi, ad gravissima
ac utilissima totius arithmeticae pertinere [. . . ] tam notum est,
ut de hac re copiose loqui superfluum foret. [. . . ] Praetereaque
scientiae dignitas requirere videtur, ut omnia subsidia ad solutionem
problematis tam elegantis ac celebris sedulo excolantur.1
Carl Friedrich Gauß (1801)
1 The problem of distinguishing prime numbers from composite numbers and of resolving the latter into their
prime factors is so well known to be one of the most important and useful in arithmetic [. . . ] that it is superfluous
to speak at length about this matter. [. . . ] Further, the dignity of the science itself seems to require that every
possible means be explored for the solution of such an elegant and celebrated problem.
2 Anton Felkel [. . .] had completed the manuscript of his table [of factors of integers] up to two million [. . .] ; the
parts that had been printed in Vienna with a government grant did not sell and were, unfortunately, used to make
cartridges in the war against the Turks!
3 The equation x2 − y2 = N is of paramount importance in the factorization problem.
19
Factoring integers
In this chapter, we present several of the algorithms listed in Table 19.1 to factor
an integer N of length n into its prime divisors. The running time of Lenstra’s
algorithm actually depends not on n, but mainly on the size of the second largest
prime factor of N. Some of the timing analyses are only heuristic, not rigorously
proven. We note that the input size is about n/64 ≈ (log₂ N)/64 words.
method                                           year    time
trial division                                    −∞     O∼(2^{n/2})
Pollard's p − 1 method                           1974    O∼(2^{n/4})
Pollard's ρ method                               1975    O∼(2^{n/4})
Pollard's and Strassen's method                  1976    O∼(2^{n/4})
Morrison's and Brillhart's continued fractions   1975    exp(O∼(n^{1/2}))
Dixon's random squares                           1981    exp(O∼(n^{1/2}))
Lenstra's elliptic curves                        1987    exp(O∼(n^{1/2}))
number field sieve                               1990    exp(O∼(n^{1/3}))
The reader will become convinced that a fair amount of mathematical ingenuity
has been spent on this problem, and that modern methods can attack surprisingly
large numbers. But in comparison to the striking success in factoring polynomials
(say, of degree 1 000 000 over F2 ), the current records, still under 300 digits, are
disappointingly small. This is a manifestation of the practical relevance of poly-
nomial time.
In Chapter 20, we will see how this disappointment has been turned around to
major progress in another area: the assumed difficulty of factoring is important for
the security of some cryptosystems.
The elliptic curve method has been successful in finding some of the “most
wanted” factorizations of the Cunningham project, in particular for some Fermat
numbers. Table 19.2 shows the history of factorizations of Fermat numbers Fn ;
p_k stands for a prime number with k decimal digits. Richard Brent (1999) reports
on two factorizations that he calculated:
F_10 = 45 592 577 · 6 487 031 809 ·
       4 659 775 785 220 018 543 264 560 743 076 778 192 897 · p_252,
F_11 = 319 489 · 974 849 · 167 988 556 341 760 475 137 · 3 560 841 906 445 833 920 513 · p_564.
He factored the 617-digit number F11 in 1988, and F10 (with 309 decimal digits)
in 1995; the former was easier for his elliptic curve software because its second-
largest prime divisor has 22 digits vs. 40 digits for F10 and it is this second-largest
divisor that determines the running time of the elliptic curve algorithm. Numbers
of this size are typically outside the reach of modern factorization software. The
next Fermat number factors as
F12 = 114 689 · 26 017 793 · 63 766 529 · 190 274 191 361 · 1 256 132 134 125 569 · c,
1 http://www.cerias.purdue.edu/homes/ssw/cun/index.html
To find all prime factors, we divide out p as often as possible and continue as
illustrated in Figure 19.3. When calling the algorithm again (with some larger
value of b in case of failure), we may of course use that the input has no prime
factors below p. This procedure will terminate when p is the second largest prime
factor of N.
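A plain Python sketch of complete factorization by trial division; it leaves out the intermediate tests of Figure 19.3 (root extraction and recognizing a prime cofactor early), which are what keep the number of trial divisions near S_2(N).

def trial_division(N):
    # complete prime factorization of N >= 2, dividing out each factor as often as possible
    factors, d = [], 2
    while d * d <= N:
        while N % d == 0:
            factors.append(d)
            N //= d
        d += 1
    if N > 1:
        factors.append(N)     # the remaining cofactor has no divisor up to its square root
    return factors

# trial_division(720)   # [2, 2, 2, 2, 3, 3, 5]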
[FIGURE 19.3 (flowchart); labels include “extract largest possible root” and “found factors”.]
For N ∈ N, we let S_1(N) denote the largest prime divisor of N, and S_2(N) the
second largest prime divisor of N. Thus S_2(N) < N^{1/2}. The number of steps
required by trial division is S_2(N) (log N)^{O(1)}. For random integers N,
   prob(S_1(N) > N^{0.85}) ≈ 0.20,   prob(S_2(N) > N^{0.30}) ≈ 0.20.
Thus, the number of steps needed for the trial division algorithm is O∼(N^{0.30})
“most of the time”.
   (c²)! = ∏_{0≤i<c} f(ic).
1. c ←− ⌈b^{1/2}⌉
   call Algorithm 10.3 to compute the coefficients of f = ∏_{1≤j≤c} (x + j) ∈ Z_N[x]
4. return min{kc + 1 ≤ d ≤ kc + c : d | N}
T HEOREM 19.3.
Algorithm 19.2 works correctly and uses O(M(b^{1/2}) M(log N)(log b + loglog N))
word operations and space for O(b^{1/2} log N) words.
P ROOF. For 0 ≤ i < c, a prime divisor p of N divides F(ic), and hence also
gcd(gi , N) = gcd(F(ic) rem N, N), if and only if p divides some number in the
interval {ic + 1, . . . , ic + c}, and the correctness follows.
By Lemma 10.4 and Corollary 10.8, the cost for steps 1 and 2 is O(M(c) log c)
additions and multiplications in ZN . Step 3 takes O(c M(log N) loglog N) word
operations, as noted in the end of Section 11.1, and step 4 takes O(c M(log N))
word operations, by Theorem 9.8. The cost for one addition or multiplication in ZN
is O(M(log N)), by Corollary 9.9, and we get a total cost of O(M(b^{1/2}) M(log N) ·
(log b + loglog N)). We have to store O(b^{1/2}) integers of length O(log N). ✷
C OROLLARY 19.4.
Using Algorithm 19.2, we can completely factor N with
   O(M(S_2(N)^{1/2}) M(log N) log N)   or   O∼(N^{1/4})
P ROOF. Let s be the number of choices until a collision occurs, that is, two iden-
tical balls are chosen. This is a random variable. For j ≥ 2, we have
   prob(s ≥ j) = (1/p^{j−1}) ∏_{1≤i<j} (p − (i − 1)) = ∏_{1≤i<j} (1 − (i−1)/p)
              ≤ ∏_{1≤i<j} e^{−(i−1)/p} = e^{−(j−1)(j−2)/2p} ≤ e^{−(j−2)²/2p},
Floyd’s cycle detection trick. Given an integer x0 ∈ {0, . . . , p−1} and a function
f : {0, . . . , p − 1} −→ {0, . . . , p − 1}, we examine the sequence x0 , x1 , . . . defined by
xi+1 = f (xi ) for i ≥ 0. This is an infinite sequence from a finite set, so that at
some point the values repeat. This results in a cycle of some length l > 0 such that
xi = xi+l for all i ≥ t, for some t ∈ N. We may assume that l and t are minimal with
that property. In Figure 19.4, we see an example with t = 3 and l = 7.
An obvious method to find i 6= j such that xi = x j is to write down the sequence
until a value repeats itself, but this requires space O(t +l). The following algorithm
uses only constant space. The idea of Floyd’s 1-step/2-step cycle detection method
[FIGURE 19.4: the values x_0, x_1, . . . , x_9, with x_3 = x_10, arranged in the shape of the letter ρ.]
is to use a second sequence (yi )i∈N that iterates f with double speed, so that yi = x2i
for all i, and to store only the current values of xi and yi . Intuitively, it should be
clear that the “faster” sequence “overtakes” the slower one for some i, and then we
have x2i = yi = xi .
1. y0 ←− x0 , i ←− 0
3. return i
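In Python, Floyd's trick can be sketched as follows; for the map x ↦ x² + 1 modulo 41 with starting value 16, the situation of Figure 19.4 with t = 3 and l = 7, it returns t + (−t rem l) = 7, in line with Lemma 19.7 below.

def floyd(f, x0):
    # iterate the slow sequence x_{i+1} = f(x_i) and the fast sequence y_{i+1} = f(f(y_i))
    # in tandem, storing only the two current values, and return the first i with x_i = y_i
    x = y = x0
    i = 0
    while True:
        i += 1
        x = f(x)              # x_i
        y = f(f(y))           # y_i = x_{2i}
        if x == y:
            return i

# floyd(lambda x: (x * x + 1) % 41, 16)   # 7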
The following lemma says that the number of steps until the first collision xi = yi
in Floyd’s method occurs is at most the number of steps until the first collision
xi = x j with i < j happens.
L EMMA 19.7. With t and l as above, Algorithm 19.6 halts after at most t +l steps.
P ROOF. Since yi = x2i for all i, we have xi = yi if and only if i ≥ t and l | (2i−i) = i,
and the smallest positive such index is i = t + (−t rem l) < t + l if t > 0, and i = l
if t = 0. ✷
2. repeat
3.    i ←− i + 1, x_i ←− (x_{i−1}² + 1) rem N, y_i ←− ((y_{i−1}² + 1)² + 1) rem N
4.    g ←− gcd(x_i − y_i, N)
      if 1 < g < N then return g
      else if g = N then return “failure”
THEOREM 19.9.
Let N ∈ N be composite, p its smallest prime factor, and f(x) = x² + 1. Under
the assumption that the sequence (f^i(x_0))_{i∈N} behaves modulo p like a random se-
quence, the expected running time of Pollard's algorithm for finding the smallest
prime factor p of N is O(√p · M(log N) loglog N). By applying the algorithm re-
cursively, N can be completely factored in expected time S_2(N)^{1/2} · O∼(log² N), or
O∼(N^{1/4}).
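A Python sketch along the lines of Algorithm 19.8; the random choice of x_0 and the usage example are ours.

import math, random

def pollard_rho(N):
    # Pollard's rho method with f(x) = x^2 + 1 and Floyd's cycle detection;
    # returns a proper divisor of N, or "failure" (then restart with a new x_0)
    x = y = random.randrange(N)
    while True:
        x = (x * x + 1) % N                   # x_i = f(x_{i-1}) rem N
        y = ((y * y + 1)**2 + 1) % N          # y_i = f(f(y_{i-1})) rem N, so y_i = x_{2i}
        g = math.gcd(x - y, N)
        if 1 < g < N:
            return g
        if g == N:
            return "failure"

# pollard_rho(8051)   # 8051 = 83 * 97; typically returns 83 or 97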
   i    x_i mod N    x_i mod 41
   0        631          16
   1     69 670          11
   2     28 986          40
   3     69 907           2
   4     13 166           5
   5     64 027          26
   6     40 816          21
   7     80 802          32
   8     20 459           0
   9     71 874           1
  10      6 685           2
The iteration modulo 41 is illustrated in Figure 19.4; the algorithm’s name derives
from the similarity to the Greek letter ρ. It leads to the factor gcd(x3 −x10 , N) = 41.
When we execute the algorithm, only the values modulo N are known; the values
modulo 41 are included for our understanding. The algorithm calculates in tandem
xi and yi = x2i and performs the gcd test each time. We have t = 3, l = 7, and
t + (−t rem l) = 7, and in fact, after seven 2-steps the algorithm catches up with
the 1-steps:
$$N = s^2 - t^2 = (s+t)(s-t), \qquad N = a \cdot b = \Bigl(\frac{a+b}{2}\Bigr)^{2} - \Bigl(\frac{a-b}{2}\Bigr)^{2}$$
describe a bijection between factorizations of N and representations of N as a difference of two squares. This immediately suggests a crude factorization algorithm: for t = ⌈√N⌉, ⌈√N⌉ + 1, . . . , check whether t^2 − N is a perfect square. If we find such a square, then we can factor N. This algorithm works well if N = ab with |a − b| small, since then the running time depends on |a − b|. This was already clear to Fermat; he took N = 2 027 651 281, so that √N ≈ 45 029, and found 45 041^2 − N = 1 040 400 = 1020^2, so that N = (45 041 − 1020) · (45 041 + 1020) = 44 021 · 46 061.
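The crude difference-of-squares search fits in a few lines of Python (a sketch; math.isqrt is the integer square root):

```python
import math

def fermat_factor(N):
    """For t = ceil(sqrt(N)), ceil(sqrt(N)) + 1, ..., test whether t^2 - N is a
    perfect square s^2; then N = (t - s)(t + s).  Assumes N odd and composite."""
    t = math.isqrt(N)
    if t * t < N:
        t += 1
    while True:
        d = t * t - N
        s = math.isqrt(d)
        if s * s == d:
            return t - s, t + s
        t += 1

print(fermat_factor(2027651281))      # Fermat's example: (44021, 46061)
```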
E XAMPLE 19.11. Let N = 2183. Suppose that we have found the system of congruences
453^2 ≡ 7 mod N,   1 014^2 ≡ 3 mod N,   209^2 ≡ 3 · 7 mod N.
Then we obtain (453 · 1 014 · 209)^2 ≡ 21^2 mod N, or 687^2 ≡ 21^2 mod N. This yields the factors 37 = gcd(687 − 21, N) and 59 = gcd(687 + 21, N); in fact N = 37 · 59 is the prime factorization of N. ✸
$$\operatorname{prob}\{s \equiv \pm t \bmod N\} = \frac{2}{2^{r}} \le \frac{1}{2}.$$
In Example 19.11, with B = {2, 3, 5, 7}, we have ε1 = (0, 0, 0, 1), ε2 = (0, 1, 0, 0), ε3 = (0, 1, 0, 1), and ε1 + ε2 + ε3 = 0 in F_2^4. Furthermore, γ1 = γ3 = 0, γ2 = γ4 = 1, s = 453 · 1 014 · 209, and t = 2^0 · 3^1 · 5^0 · 7^1.
Here is the resulting algorithm.
4. a ←− b^2 rem N
{ factor a over {p1 , . . . , ph } }
for i = 1, . . . , h do
5.     { determine the multiplicity of p_i in a }
       α_i ←− 0
       while p_i divides a do  a ←− a/p_i ,  α_i ←− α_i + 1
7. until #A = h + 1
8. find distinct pairs (b_1 , α^{(1)}), . . . , (b_l , α^{(l)}) ∈ A with α^{(1)} + · · · + α^{(l)} ≡ 0 mod 2 in F_2^h, for some l ≥ 1, by solving an (h + 1) × h system of linear equations over F_2
9.  (γ_1 , . . . , γ_h) ←− (1/2)(α^{(1)} + · · · + α^{(l)})
    s ←− ∏_{1≤i≤l} b_i ,   t ←− ∏_{1≤j≤h} p_j^{γ_j} ,   g ←− gcd(s + t, N)
    if g < N then return g else return “failure”
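The skeleton of Algorithm 19.12 can be sketched in Python as follows for small N. Relations b^2 ≡ p_1^{α_1} · · · p_h^{α_h} mod N are collected by trial division over the factor base, a dependency of the exponent vectors over F_2 is found by Gaussian elimination on bitmasks, and then gcd(s + t, N) is computed as in step 9. All function names are ours, and none of the speed-ups mentioned in the text (sieving, fast smoothness tests, sparse linear algebra) is attempted.

```python
import math, random

def primes_up_to(B):
    """Factor base p_1 < ... < p_h: all primes up to B (sieve of Eratosthenes)."""
    sieve = bytearray([1]) * (B + 1)
    sieve[:2] = b'\x00\x00'
    for i in range(2, math.isqrt(B) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(sieve[i * i::i]))
    return [i for i in range(2, B + 1) if sieve[i]]

def dixon(N, B):
    """One run of the random squares method; returns a proper factor of N or None."""
    base = primes_up_to(B)
    h = len(base)
    relations = []                        # pairs (b, alpha) with b^2 = prod p_i^alpha_i mod N
    while len(relations) < h + 1:         # step 7: until #A = h + 1
        b = random.randrange(2, N - 1)
        if math.gcd(b, N) != 1:
            continue
        a, alpha = b * b % N, [0] * h     # step 4: a = b^2 rem N
        for i, p in enumerate(base):      # step 5: multiplicity of p_i in a
            while a % p == 0:
                a //= p
                alpha[i] += 1
        if a == 1:                        # b is a B-number
            relations.append((b, alpha))
    # step 8: a dependency of the exponent vectors over F_2, by elimination on bitmasks
    pivots, dependency = {}, 0
    for j, (_, alpha) in enumerate(relations):
        v = sum((e & 1) << i for i, e in enumerate(alpha))
        m = 1 << j                        # records which relations were combined
        for bit in range(h):
            if v >> bit & 1:
                if bit in pivots:
                    pv, pm = pivots[bit]
                    v, m = v ^ pv, m ^ pm
                else:
                    pivots[bit] = (v, m)
                    break
        else:                             # v reduced to 0: alpha^(1) + ... + alpha^(l) = 0 mod 2
            dependency = m
            break
    chosen = [rel for j, rel in enumerate(relations) if dependency >> j & 1]
    # step 9: gamma, s, t, and the gcd
    gamma = [sum(alpha[i] for _, alpha in chosen) // 2 for i in range(h)]
    s = math.prod(b for b, _ in chosen) % N
    t = math.prod(pow(p, g, N) for p, g in zip(base, gamma)) % N
    g = math.gcd(s + t, N)
    return g if 1 < g < N else None       # None plays the role of "failure"

# dixon(2183, 7) factors the N of Example 19.11 with good probability; repeat on failure.
```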
We let n = log N. Using the sieve of Eratosthenes, the cost for setting up the
factor base in step 1 is O(h log2 h loglog h) word operations, by Theorem 18.10, and
the cost for checking divisibility is O(h · M(n)). The cost for one iteration of the
loop 2 is O(M(n) log n) word operations for the gcd, O(M(n)) word operations to
compute b2 rem N, and O((h + n)M(n)) operations for trial division by all primes
p1 , . . . , ph to check smoothness. (The check can actually be performed faster with
a modification of the Pollard and Strassen algorithm 19.2.) If k is the number of
iterations of the loop 2, then the total cost of the loop is O(k(h + n)M(n)) word
operations. The cost of solving the system of linear equations over F2 in step 8 is
O(h3 ) word operations. The cost of all other steps is dominated by these estimates,
and we obtain a total cost of
word operations.
In practice, −1 is included in B and the numbers b are chosen randomly in the vicinity of √N, because the least absolute residue of b^2 modulo N is then only about O(√N) and more likely to have all prime factors less than B than an arbitrary number up to N. For our arguments below, however, we need the assumption that the b's are uniform random numbers between 1 and N − 1.
Our goal now is to estimate the expected number k of iterations, and to determine
the right choice for B, given just N. For given x, y ∈ R_{>2}, we let
$$\Psi(x, y) = \{a \in \mathbb{N}: 1 \le a \le x \text{ and every prime divisor of } a \text{ is at most } y\}, \qquad \psi(x, y) = \#\Psi(x, y). \qquad (4)$$
The numbers in Ψ(x, y), all of whose prime factors are not greater than y, are called y-smooth. We have
b is a B-number ⇐⇒ b^2 rem N ∈ Ψ(N, B).
Clearly the crux of the problem is to choose B wisely: if B is too small, then B-
numbers are rare and take a long time to find, and if B is too large, then it takes a
long time to test a prospective B-number, and the linear system is large.
As a warmup exercise, we estimate roughly the probability that a random integer
a ∈ {1, . . . , x} is y-smooth, with y = B. We put u = ln(x)/ln(y), so that y = x^{1/u}, and v = ⌊u⌋, and let (a_1, a_2, . . . , a_v) ∈ {p_1, . . . , p_h}^v and a = a_1 a_2 · · · a_v. (ln is the natural logarithm.) Then a ≤ B^v ≤ y^u = x, hence a ∈ Ψ(x, y). Each a comes from at most v! vectors in {p_1, . . . , p_h}^v, and hence we have the approximate inequalities
$$\psi(x, y) \ge \frac{h^v}{v!} \ge \Bigl(\frac{h}{v}\Bigr)^{v} \gtrsim \Bigl(\frac{y}{\ln y}\Bigr)^{v} \cdot v^{-v} \approx \Bigl(\frac{y}{\ln y}\Bigr)^{u} \cdot u^{-u} = x\,(u \ln y)^{-u},$$
by the prime number theorem 18.7 which says that h ≈ y/ ln y. So for a random
positive integer a ≤ x, we have
$$\operatorname{prob}\{a \text{ is } y\text{-smooth}\} = \frac{\psi(x, y)}{x} \gtrsim (u \ln y)^{-u}.$$
We state without proof that for reasonably small u, the true order of this proba-
bility, called Dickman’s ρ-function, is u−u for large enough y. Although we will
not use this fact, it is comforting to know that our coarse estimate is not too far off.
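A quick numerical experiment illustrates these estimates (a sketch; the bounds x = 10^5 and y = 100 are arbitrary choices of ours):

```python
import math

def is_smooth(a, y):
    """Trial division: True if every prime factor of a is at most y."""
    for p in range(2, y + 1):
        while a % p == 0:
            a //= p
    return a == 1

x, y = 10**5, 100
u = math.log(x) / math.log(y)
count = sum(is_smooth(a, y) for a in range(1, x + 1))
print(count / x)                      # empirical probability of being y-smooth
print(u ** -u)                        # the order u^{-u} of Dickman's rho
print((u * math.log(y)) ** -u)        # the coarse lower estimate (u ln y)^{-u}
```

The empirical fraction has the same rough order of magnitude as u^{-u}; the coarse bound (u ln y)^{-u} is smaller, but still a valid lower estimate.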
T HEOREM 19.13.
Let u: N −→ R_{>1} be an increasing function with u(x) ∈ O(log x/ loglog x). Then the probability that a random integer in {1, . . . , ⌊x⌋} is x^{1/u}-smooth satisfies
$$\frac{\psi(x, x^{1/u})}{x} = u^{-u(1+o(1))}.$$
Here, o(1) is shorthand for a function which tends to zero as u approaches in-
finity. The above estimates apply to random values a. We now prove a similar
result about b^2 rem N for random values b. Then the expected number of trials necessary to find a single B-number is at most (u ln y)^u (or, in fact, at most u^u).
A different argument is in Exercise 19.10.
$$\#\{b \in \mathbb{N}: 1 \le b < N \text{ and } b^2 \operatorname{rem} N \in \Psi(N, p_h)\} \ge \frac{h^{2r}}{(2r)!}. \qquad (5)$$
P ROOF. The idea of the proof is to adapt our warmup strategy to this situation.
So we consider power products b of p1 , . . . , ph with exactly r factors. The square
of such a b is smooth, and therefore b is clearly in the set S on the left hand
side of (5). But there are not enough of these numbers. It would be sufficient
if the product of any two b’s, rather than just the squares, was actually a square
modulo N. This looks implausible at first. But consider a prime factor q of N.
Modulo q, half of the numbers are squares (Lemma 14.7), and we have not only
“square · square = square”, but also “nonsquare · nonsquare = square” (Exercise
14.8). The same actually holds modulo a power of q. The proof will produce
sufficiently many numbers in S by partitioning the set of all b’s as above according
to their square/nonsquare character modulo all the prime factors of N. Then when
we take two b’s that are distinct but have the same character, their product will
actually be a square modulo all q’s, and therefore modulo N. The following proof
makes this precise. For a first understanding, the reader may want to assume that
N = q1 q2 is the product of two distinct primes.
To begin the proof, we let N = q_1^{l_1} · · · q_t^{l_t} be the prime factorization of N. The quadratic character χ_i = χ_{q_i^{l_i}} on Z^×_{q_i^{l_i}} is defined as follows:
$$\chi_i(a \bmod q_i^{l_i}) = \begin{cases} 1 & \text{if } \exists b \in \mathbb{N}\ \ a \equiv b^2 \bmod q_i^{l_i}, \\ -1 & \text{otherwise.} \end{cases}$$
Relatives of this character have played a role (implicitly) in equal-degree factor-
ization (Section 14.3) and primality testing (Chapter 18), then called the Jacobi
symbol. The map χi is a group homomorphism. Putting all these characters to-
gether, we get
$$\chi: \mathbb{Z}_N^{\times} \longrightarrow \{1, -1\}^{t} = G, \qquad a \bmod N \longmapsto \bigl(\chi_1(a \bmod q_1^{l_1}), \ldots, \chi_t(a \bmod q_t^{l_t})\bigr).$$
We let T_s(x) be the set of p_h-smooth integers below x with exactly s (not necessarily distinct) prime factors. By assumption, we have a mod N ∈ Z_N^× for all a ∈ T_s(x). Now we partition T_r(√N) into 2^t sets U_g for g ∈ G:
$$U_g = \{a \in T_r(\sqrt{N}): \chi(a \bmod N) = g\}.$$
We denote by V the image of the multiplication map
$$\mu: \bigcup_{g \in G} (U_g \times U_g) \longrightarrow \mathbb{N}$$
with µ(b, c) = bc rem N. Since χ(bc mod N) = (1, . . . , 1) for all b, c ∈ U_g and g ∈ G, we have V ⊆ Q. Furthermore, V ⊆ T_{2r}(N), so that V ⊆ T_{2r}(N) ∩ Q.
Every element in T_{2r}(N) ∩ Q has exactly 2^t square roots, and these are all in S, so that #S ≥ 2^t · #(T_{2r}(N) ∩ Q). How many elements (b, c) ∈ ⋃_{g∈G} U_g × U_g are mapped by µ to the same a ∈ V? Since b, c ≤ √N and bc ≡ a mod N, we then actually have bc = a. Thus we have to split the 2r prime factors of a into two halves to make up b and c, and there are at most $\binom{2r}{r} = (2r)!/(r!)^2$ ways of doing this. Thus
$$\#V \cdot \frac{(2r)!}{(r!)^2} \ge \#\Bigl(\bigcup_{g \in G} U_g \times U_g\Bigr) = \sum_{g \in G} (\#U_g)^2.$$
The Cauchy-Schwarz inequality (Exercise 16.10) says that for any two vectors
x = (x_1, . . . , x_n), y = (y_1, . . . , y_n) ∈ R^n we have
$$\sum_{1 \le i \le n} x_i^2 \cdot \sum_{1 \le i \le n} y_i^2 = \|x\|_2^2 \cdot \|y\|_2^2 \ge (x \star y)^2 = \Bigl(\sum_{1 \le i \le n} x_i y_i\Bigr)^{2}.$$
Step 1 of Algorithm 19.12 guarantees that N is not divisible by any of the primes
p1 , . . . , ph . Then the expected number of trials to find a single B-number is at most
$$\Bigl(\frac{\#\{B\text{-numbers}\}}{N}\Bigr)^{-1} = \frac{N}{\#\{B\text{-numbers}\}} \le \frac{N\,(2r)!}{h^{2r}} < \frac{N(\ln B)^{2r}}{B^{2r}}\,(2r)^{2r} = n^{2r}.$$
T HEOREM 19.15.
Dixon’s random squares method factors an integer N with an expected number of O∼(L(N)^{2\sqrt{2}}) word operations.
Variants of this algorithm are used for factoring large integers, and many practi-
cal improvements to it have been made. We only mention two. The first one notes
that, since a number below N has at most log2 N factors, each exponent vector (1)
has at most log2 N nonzero entries, the matrix of the linear system in step 8 is
sparse, and we can use a variant of Wiedemann’s algorithm (Section 12.4) to solve
it in O∼ (h2 ) steps. In fact, Wiedemann (1986) invented his algorithm specifically
for this approach. However, this does not decrease the cost estimate of Theo-
rem 19.15.
F IGURE 19.5: The elliptic curve y^2 = x^3 − x over the real numbers (left diagram), and the elliptic curves y^2 = x^3 − x + b for b = 0, 1/10, 2/10, 3/10, 4/10, 5/10.
random squares method, but instead of L(N) we have L(p), where p is the second
largest prime factor of N. Thus it is faster than Dixon’s method when N = pq is
the product of two primes of substantially different sizes, say 50 and 100 digits. In
Section 19.1, we have highlighted some of the successes of this method.
Elliptic Curves. The basic approach in the elliptic curve method to factor N is as
follows. One prescribes a certain sequence of computations modulo N. A division
by w ∈ Z in this sequence can only be executed if gcd(w, N) = 1. Thus at each
division step, we either continue the computation or we are lucky—gcd(w, N) is
not trivial and we have found a divisor of N. This is sometimes called the pretend
field technique. Calculating a multiple of a random point on a random elliptic
curve leads to such a sequence of computations. What makes this work is that we
will be lucky with reasonably large probability.
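The "pretend field" idea is easy to make concrete: we compute modulo N as if Z_N were a field, and any attempted division by a noninvertible element hands us a factor. A minimal Python sketch (names are ours):

```python
import math

def divide_or_factor(u, w, N):
    """Try to compute u / w modulo N.  Returns ('quotient', u * w^{-1} rem N) if w is
    invertible modulo N, and otherwise ('factor', gcd(w, N)) -- the lucky event."""
    g = math.gcd(w % N, N)
    if g == 1:
        return 'quotient', u * pow(w, -1, N) % N
    return 'factor', g                  # 1 < g <= N; if g < N, it is a proper divisor
```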
The elliptic curve factoring method corresponds to choosing randomly a group
G from a set of elliptic curve groups. Lenstra (1987) showed that with large enough
probability at least one curve will have smooth order.
We start by defining elliptic curves and stating some of their properties. They
inhabit the realm of algebraic geometry , one of the richest and deepest areas of
mathematics. In this text, we cannot but scratch the surface of this beautiful theory.
We have to rely on several results whose proof is beyond the scope of this text.
F IGURE 19.6: Adding two points P with x = −0.9 (red) and Q with x = −0.5 (green) on the elliptic curve y^2 = x^3 − x. The point R = P + Q (blue) is the negative of the intersection point S (black) of the two lines with the curve.
The reader may imagine that O lies beyond the horizon in the direction of the
y-axis (up and down), and that any two vertical lines “intersect” at O. Projective
geometry provides a rigorous framework for these notions.
An elliptic curve E is nonsingular (or smooth) in the geometric sense, as follows. Let f = y^2 − (x^3 + ax + b) ∈ F[x, y], so that E = { f = 0} ∪ {O}. For (u, v) ∈ E \ {O}, we have
$$\Bigl(\frac{\partial f}{\partial x}(u, v),\ \frac{\partial f}{\partial y}(u, v)\Bigr) = (-3u^2 - a,\ 2v),$$
Since a, b, r, s are all fixed, this is a cubic equation for u. In the case of a vertical
line L = {(u, v): v ∈ F}, where u ∈ F is fixed, one of the points is O.
The group structure. The fundamental property that makes elliptic curves in-
teresting for factorization is that they have a group structure in a natural way. We
define the group operation as follows. The negative of a point P = (u, v) ∈ E is its
mirror image −P = (u, −v) upon reflection at the x-axis, and −O = O. When we
intersect the line through P and Q with E, we get three points, say {P, Q, S}. Then
R = P + Q = −S
P + O = −(−P) = P
(iii) Q = −P. We take again the vertical line through P and Q and obtain
P + (−P) = −O = O.
It turns out that these definitions make E into a commutative group. The second
special case above shows that O is the neutral element of E, and the third case says
that the inverse of a point P is its negative −P. As usual, for k ∈ Z and P ∈ E we
will write kP for adding P (respectively −P if k < 0) k times (−k times if k < 0)
to itself, and 0P = O.
We now derive the rational expressions for addition on an elliptic curve E. Sup-
pose that P = (x_1, y_1), Q = (x_2, y_2), and x_1 ≠ x_2. Then R = (x_3, y_3) = P + Q ∈ E \ {O}. The line through P and Q has the equation y = αx + β, where α = (y_2 − y_1)/(x_2 − x_1) and β = y_1 − αx_1. Let S = (x_3, −y_3) be the third intersection point of this line with the curve. Then (αx_3 + β)^2 = x_3^3 + ax_3 + b. Since x_1, x_2 are
the two other roots of the cubic equation (u^3 + au + b) − (αu + β)^2 = 0, we have x_1 + x_2 + x_3 = α^2. It follows that
$$x_3 = \Bigl(\frac{y_2 - y_1}{x_2 - x_1}\Bigr)^{2} - x_1 - x_2, \qquad y_3 = -y_1 + \frac{y_2 - y_1}{x_2 - x_1} \cdot (x_1 - x_3). \qquad (9)$$
Thus, the coefficients of the sum of two distinct points are given by rational func-
tions of the input coefficients. We note that these formulas do not explicitly use the
Weierstraß coefficients of E, which are determined in fact by the two points on it.
A similar formula holds for doubling a point (where R = 2P, x1 = x2 , and y1 = y2 ;
see Exercise 19.15): we have
$$x_3 = \Bigl(\frac{3x_1^2 + a}{2y_1}\Bigr)^{2} - 2x_1, \qquad y_3 = -y_1 + \frac{3x_1^2 + a}{2y_1} \cdot (x_1 - x_3), \qquad (10)$$
if y_1 ≠ 0, and 2P = O if y_1 = 0.
The curve E with this operation is a commutative group. We have already
checked all required properties, except associativity. The latter is not hard to check
on a computer algebra system (Exercise 19.17).
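Over a prime field, the formulas (9) and (10) translate directly into code. The following Python sketch (our own; the point O is encoded as None, and point coordinates are assumed to be reduced modulo p) can be used, for instance, to check the group structures in Example 19.21 below.

```python
def ec_add(P, Q, a, p):
    """Add two points of y^2 = x^3 + a*x + b over F_p (p prime, p > 3); O is None."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if P == Q:
        if y1 == 0:
            return None                                       # 2P = O when y1 = 0
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p      # tangent slope, formula (10)
    elif x1 == x2:
        return None                                           # Q = -P, so P + Q = O
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p             # chord slope, formula (9)
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def ec_mul(k, P, a, p):
    """Compute kP by repeated doubling and adding."""
    R = None
    while k > 0:
        if k & 1:
            R = ec_add(R, P, a, p)
        P = ec_add(P, P, a, p)
        k >>= 1
    return R

# the point (4, 2) on y^2 = x^3 - x over F_7 has order 4 (see Example 19.21):
print(ec_mul(4, (4, 2), -1, 7))       # None, that is, O
```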
The size of an elliptic curve. Our intuition so far has been based on the real
numbers. But for the intended application we have to consider elliptic curves over
finite fields. Our first task is to determine the size of such an elliptic curve, that is,
to estimate the number of points on it. The following estimate is easy and crude.
T HEOREM 19.19.
Let E be an elliptic curve over the finite field Fq of characteristic greater than three.
Then #E ≤ 2q + 1.
P ROOF. For each of the q possible values for u, there are at most two possible values for v such that v^2 = u^3 + au + b, corresponding to the two square roots of u^3 + au + b. Adding the point at infinity gives the required estimate. ✷
One reason to think that this is a crude estimate is that, pretending that the value of u^3 + au + b varies randomly as u ranges over F_q, we should expect that for about half of the u's there would be two solutions v for the equation, and no solution for the other half. In other words, u^3 + au + b should be a square about half of the time. Random elements have this property by Lemma 14.7. More formally, we consider the quadratic character χ: F_q −→ {1, 0, −1} defined by
$$\chi(c) = \begin{cases} 1 & \text{if } c \text{ is a square,} \\ 0 & \text{if } c = 0, \\ -1 & \text{otherwise.} \end{cases}$$
For q prime, χ(c) = (c/q) is called the Legendre symbol (Section 18.5), and for all c ∈ F_q,
$$\#\{v \in \mathbb{F}_q: v^2 = c\} = 1 + \chi(c).$$
From this we conclude that
$$\#E = 1 + \sum_{u \in \mathbb{F}_q} \bigl(1 + \chi(u^3 + au + b)\bigr) = q + 1 + \sum_{u \in \mathbb{F}_q} \chi(u^3 + au + b).$$
If χ(u^3 + au + b) were a uniformly distributed random variable, then the sum would behave like a random walk on the line. After q steps of such a random walk, we expect to be about √q steps away from the origin (Exercise 19.18). Of course this is not at all a random process, but the analogy provides some intuitive motivation for the following result.
E XAMPLE 19.21. Let q = 7. By the Hasse bound, each elliptic curve E over F_7 has |#E − 8| ≤ 2√7, so that 3 ≤ #E ≤ 13. Table 19.7 gives the orders of all 42 elliptic curves over F_7.
n 3 4 5 6 7 8 9 10 11 12 13
#{E: #E = n} 1 4 3 6 4 6 4 6 3 4 1
One example is the curve E with the equation y^2 = x^3 − x, comprising the eight points (0, 0), (1, 0), (4, 2), (4, 5), (5, 1), (5, 6), (6, 0), O.
This group is generated by the two elements (4, 2) of order 4 and (0, 0) of order 2,
and hence is isomorphic to Z4 × Z2 .
Another example is the curve E^∗ with the equation y^2 = x^3 + x, comprising the eight points
(0, 0), (1, 3), (1, 4), (3, 3), (3, 4), (5, 2), (5, 5), O.
E ∗ is cyclic and generated, for example, by (3, 3). Figure 19.8 illustrates the group
structures of E and E ∗ . ✸
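The point counts in Example 19.21 are easily verified by brute force (a sketch):

```python
def affine_points(a, b, p):
    """All affine points of y^2 = x^3 + a*x + b over F_p (the point O not included)."""
    return [(x, y) for x in range(p) for y in range(p)
            if (y * y - x * x * x - a * x - b) % p == 0]

print(len(affine_points(-1, 0, 7)) + 1)    # E : y^2 = x^3 - x over F_7, 8 points with O
print(len(affine_points(1, 0, 7)) + 1)     # E*: y^2 = x^3 + x over F_7, 8 points with O
```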
F IGURE 19.8: Structure of the elliptic curve groups E (left) and E ∗ (right) from Example
19.21. E is generated by (4, 2) (red) and (0, 0) (green), and E ∗ is generated by (3, 3) (red).
There is a colored arrow from a point P to a point Q if Q − P is the generator of that color.
The elliptic curve algorithm. We first state Lenstra’s algorithm to factor N, and
then prove some of its properties.
3. for i = 1, . . . , h do
       e_i ←− ⌊ log_{p_i}(C + 2√C + 1) ⌋
       for j = 0, . . . , e_i − 1 do
           { Loop invariants:  t = p_i^j ∏_{1≤r<i} p_r^{e_r}  and  Q = tP }
The “elliptic curve” in step 2 is in quotes because the proper definition for a
composite N is more complicated. All we need here is that for each prime factor p
of N, E mod p is an elliptic curve in the proper sense. In particular, the equations
(9) and (10) do not make E into a group. This is plausible since some denominator
might be nonzero but not invertible, so that the expressions might not be well-
defined modulo N. The point is that, until a divisor is found, they give the group
structure on the reduction E p modulo any prime divisor p of N.
We are going to show that successful termination eventually occurs if N has a
prime factor below C. (The order of computation, from smaller to larger prime
factors, might not be essential for the validity of the algorithm, but is required in
the proof given below.)
Let p be a prime divisor of N. Then p does not divide 4a3 + 27b2 , since other-
wise the choice is (successfully or unsuccessfully) abandoned in step 1. We denote
by E p the reduction of E modulo p, that is, the elliptic curve over Z p with Weier-
straß coefficients a, b modulo p. To P ∈ E corresponds Pp ∈ E p , just by reducing
the coefficients modulo p. Moreover, we let O p , the point at infinity on E p , cor-
respond to the point at infinity O of E. Then P_p ≠ O_p for all P ∈ E \ {O}, and
hence
Pp = O p if and only if P = O. (11)
Then, until the divisor p is found (so that p | gcd(w, N) in step 4), the com-
putation in the algorithm can be considered as implementing arithmetic on E p in
the sense that each partial result Q = tP on E gives, modulo p, the partial result
Q p = tPp on E p ; in other words, tPp = (tP) p . The lucky event that provides a fac-
torization occurs when we reach a multiple of the order of Pp on E p but not of Pq
on Eq , for two prime divisors p, q of N. We had a similar situation in Pollard’s
p − 1 method, with the elliptic curve replaced by the group of units.
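One trial of the elliptic curve method can be sketched in Python as follows. Point arithmetic uses the formulas (9) and (10) modulo N, every division goes through a gcd, and a noninvertible denominator is exactly the lucky event described above. The random choices in the steps of Algorithm 19.22 that are not reproduced here are paraphrased, so this is our reading of the method, not a quotation of it.

```python
import math, random
from sympy import primerange

class FactorFound(Exception):
    """Raised when a denominator is not invertible modulo N."""
    def __init__(self, g):
        self.g = g

def inv(w, N):
    g = math.gcd(w % N, N)
    if g != 1:
        raise FactorFound(g)
    return pow(w, -1, N)

def add(P, Q, a, N):
    """Formulas (9)/(10) modulo N; the point at infinity O is None."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if (x1 - x2) % N == 0:
        if (y1 + y2) % N == 0:
            return None                          # Q = -P, so P + Q = O
        num, den = 3 * x1 * x1 + a, 2 * y1       # tangent slope, formula (10)
    else:
        num, den = y2 - y1, x2 - x1              # chord slope, formula (9)
    lam = num * inv(den, N) % N
    x3 = (lam * lam - x1 - x2) % N
    return (x3, (lam * (x1 - x3) - y1) % N)

def mul(k, P, a, N):
    R = None
    while k > 0:
        if k & 1:
            R = add(R, P, a, N)
        P = add(P, P, a, N)
        k >>= 1
    return R

def ecm_trial(N, B, C):
    """One random choice of curve and point; returns a proper factor of N or None."""
    a, u, v = (random.randrange(N) for _ in range(3))
    b = (v * v - u ** 3 - a * u) % N             # choose b so that P = (u, v) lies on the curve
    g = math.gcd(4 * a ** 3 + 27 * b * b, N)
    if g == N:
        return None                              # degenerate choice: try again
    if g > 1:
        return g
    Q = (u, v)
    try:
        for p in primerange(2, B + 1):           # the primes p_1, ..., p_h up to B
            e = int(math.log(C + 2 * math.isqrt(C) + 1, p))
            for _ in range(e):                   # Q = tP for a growing product t of prime powers
                Q = mul(p, Q, a, N)
                if Q is None:
                    return None
    except FactorFound as exc:
        return exc.g if exc.g < N else None
    return None

# several random trials may be needed, e.g. ecm_trial(N, 40, 12000) as in Exercise 19.19
```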
L EMMA 19.23. Suppose that (E, P) is chosen, p, q are distinct prime divisors
of N , l is the largest prime factor of the order of Pp in the group E p , p ≤ C, #E p is
B-smooth, and l ∤ #Eq . Then the algorithm factors N .
P ROOF. We let k = ∏_{1≤r≤h} p_r^{e_r}, with e_r as in step 3 for 1 ≤ r ≤ h. The loop invariants are easily checked by induction on i and j. Since #E_p is B-smooth and
p ≤ C, the Hasse bound implies that #E p | k. Let d be the order of Pp in E p . Then
d | #E p , and hence l ≤ B and d | k. Let pi = l and e be the exponent of l in d, so
that 1 ≤ e ≤ ei . When j = e − 1, then
the result could only be O. We show that in fact it terminates by finding a divisor
before reaching this situation.
Assume to the contrary that lQ = O is computed in step 4 by the algorithm.
Since this computation also implements arithmetic on Eq , we have computed lQq =
(l tP)q = Oq . But then, since l does not divide #Eq or the order of Pq , the point
Qq = tPq is already Oq . It then follows that Q = O by (11) and hence Q p = O p ,
a contradiction. ✷
We now turn to the analysis of the probability that the assumptions in this lemma
are satisfied. We note that prob(l ∤ #Eq ) is almost 1 (see Notes 19.7), so we only
consider the probability of the randomly chosen parameters to produce an elliptic
curve E p with #E p being B-smooth. This is basically settled by the following result
of Lenstra’s, whose proof is outside the scope of this text.
T HEOREM 19.24.
There exists a c ∈ R_{>0} with the following property. Let p be prime, S ⊆ N with S ⊆ (p + 1 − √p, p + 1 + √p) and #S ≥ 3, and a, b ∈ F_p chosen at random. Let
$$E_p = \{(u, v): v^2 = u^3 + au + b\} \cup \{O\}$$
be an elliptic curve over F_p. Then
$$\operatorname{prob}\{\#E_p \in S\} \ge \frac{c \cdot \#S}{\sqrt{p}\,\log p}.$$
p log p
We note that S is taken from the middle half of the range given by the Hasse
bound; we have S ⊆ {6, 7, 8, 9, 10} for p = 7. Thus the sizes of elliptic curves
are roughly equally distributed in this middle half. Taking the set of B-smooth
numbers for S and using Lemma 19.23, we have the following consequence.
C OROLLARY 19.25.
There exists a c ∈ R>0 with the following property. Let p ≤ C be a prime divisor
of N , and
$$\sigma = \#\{B\text{-smooth numbers in } (p + 1 - \sqrt{p},\ p + 1 + \sqrt{p})\} = \psi(p + 1 + \sqrt{p},\, B) - \psi(p + 1 - \sqrt{p},\, B),$$
where ψ is the “number of smooth numbers” function defined in (4). If σ ≥ 3, then
the number M of triples (a, u, v) ∈ {0, . . . , N − 1}^3 for which the algorithm factors N satisfies
$$\frac{M}{N^3} \ge \frac{c\,\sigma}{\sqrt{p}\,\log p}.$$
How many trials of the algorithm are necessary for successful factoring with high probability? We let s = σ/(2√p) denote the probability for a random number in the range (p + 1 − √p, p + 1 + √p) to be B-smooth. If we run the algorithm repeatedly, say m times, then the failure probability is at most
$$\Bigl(1 - \frac{M}{N^3}\Bigr)^{m} \le \Bigl(1 - \frac{sc}{\ln p}\Bigr)^{m} \le \Bigl(1 - \frac{sc}{\ln C}\Bigr)^{m} \le e^{-msc/\ln C} \le \varepsilon,$$
C ONJECTURE 19.26. For positive real numbers x, u, and an integer d chosen uniformly at random from the interval (x − √x, x + √x), we have
$$\ln(u^u) + \ln B = \frac{\ln p}{\ln B} \cdot \ln\frac{\ln p}{\ln B} + \ln B. \qquad (13)$$
Setting
$$B = e^{\sqrt{(\ln p \cdot \ln\ln p)/2}} = L(p)^{1/\sqrt{2}}, \qquad (14)$$
with the function L defined in (8), we have
$$\frac{\ln p}{\ln B} = \frac{\sqrt{2}\,\ln p}{(\ln p \cdot \ln\ln p)^{1/2}} = \Bigl(\frac{2 \ln p}{\ln\ln p}\Bigr)^{1/2}.$$
An important variant of the quadratic sieve, the multiple polynomial quadratic sieve,
is practically useful for distributed computation on a network of workstations (Caron &
Silverman 1988). See Silverman (1987) and Pomerance (1990) for an overview.
19.6. The p − 1 method is from Pollard (1974).
19.7. Elliptic curves and the role of the “point at infinity” are best understood in the framework of projective geometry. The projective plane P^2 over F consists of all triples (u : v : w) with (u, v, w) ∈ F^3, not all zero, where we identify two such triples if they are multiples of each other. We may also regard (u : v : w) as the line in F^3 through (u, v, w) and the origin. The projective curve in P^2 corresponding to an elliptic curve E given by y^2 = x^3 + ax + b is
$$\tilde{E} = \{(U : V : W) \in \mathbb{P}^2: V^2 W = U^3 + aUW^2 + bW^3\},$$
modulo 2 to find squares.) Already Lehmer & Powers (1931) had used this expansion to find two squares that are congruent modulo N. Pomerance (1982) exhibits variants that use L(N)^{\sqrt{3/2}+o(1)} word operations, under some unproven hypotheses. Further discussions are
in Pomerance & Wagstaff (1983) and Williams & Wunderlich (1987). The origins of this
method can already be found in Legendre (1785), § XV. (In the Berkeley library copy that
we consulted, D. H. Lehmer has corrected a calculation error of Legendre’s.)
The number field sieve by Lenstra, Lenstra, Manasse & Pollard (1990) runs in time exp(O((log N (loglog N)^2)^{1/3})). It was the first general asymptotic progress (in terms of the
order of the exponent) since Dixon’s (1981) random squares method. Lenstra & Lenstra
(1993) give a status report. The original approach was designed for numbers of a spe-
cial form (as they occur in the Cunningham project), but newer versions apply to arbi-
trary numbers; see Dodson & Lenstra (1995) and Cowie, Dodson, Elkenbracht-Huizing,
Lenstra, Montgomery & Zayer (1996) about their efficiency. In 1999, it was used to break the 211-digit repunit (10^{211} − 1)/9 into its two prime factors, with 93 and 118 digits.
As we have seen, the analyses of several factoring algorithms rely on unproven conjec-
tures. The current world records on rigorously proven upper bounds on integer factoring al-
gorithms are Pollard’s and Strassen’s O∼(N^{1/4}) for deterministic methods and L(N)^{1+o(1)},
due to Lenstra & Pomerance (1992), for probabilistic algorithms.
Exercises.
19.1−→ Prove that the quotient N of 2^{599} − 1 divided by its 23-digit prime factor
16 659 379 034 607 403 556 537 (15)
is composite. N has 159 decimal digits.
19.2∗ (Lenstra 1990) Consider the following special polynomial factorization task: input is a prime p and f ∈ F_p[x] of degree n and dividing x^p − x, so that all monic irreducible factors of f in F_p[x] are linear and distinct. Adapt the Pollard and Strassen method to find a deterministic algorithm for factoring f with O∼(n√p) operations in F_p if p^2 > n.
19.3 Factor the integer N = 23 802 996 783 967 using Pollard’s ρ method, and also with the Pollard
and Strassen method.
19.4 Let p be a prime. For a sequence u = (u_i)_{i∈N} ∈ Z_p^N let S(u) = min{i ∈ N: ∃ j < i  u_j = u_i} be the least index with a collision.
(i) For any u_0 ∈ Z_p, we define a sequence u = (u_i)_{i∈N} ∈ Z_p^N by u_i = u_{i−1}^2 + 1 if i ≥ 1, as in Pollard's
algorithm 19.8. Determine the mean value (over the choices of u0 ) of S(u) for p = 167 and p = 179
by trying all possible initial values of u0 . Compare your result with the estimated expected value of
S(u) for random sequences from the proof of Theorem 19.5.
(ii) Determine the mean value of T(u) = min{i ∈ N_{>0}: u_i = u_{2i}}, with u and p as in (i), for all
possible values of u0 . Compare to your results of (i).
19.5 (Guy 1975) Let x_0 = 2 and x_i = x_{i−1}^2 + 1 for i ≥ 1. For p ∈ N, we let e(p) = min{i ∈ N_{≥1}: x_i ≡ x_{2i} mod p}.
(i) Calculate e(p) for the primes p ≤ 11.
(ii) Calculate e(p) for the primes p ≤ 106 . You should find e(p) ≤ 3680 for all these p. (Guy
(1975) notes that e(p) seems to grow like (p ln p)1/2 .)
(iii) Let N be a number to be factored, run Pollard's ρ method on it with initial value x_0 = 2, and assume that gcd(x_i − x_{2i}, N) = 1 for 1 ≤ i ≤ k. Show that e(p) > k for all prime divisors p of N.
(iv) Conclude that if the gcd in (iii) is trivial for 3680 steps, then N has no factor up to 106 .
of Z_N^× into 2^t subsets T_ε of equal size.
If y ∈ Tε , then also all a computed in step 4 are in Tε . If Tε has its fair share of B-smooth numbers,
namely about σ · #Tε many, then the algorithm will work well with that choice of y. However, we do
not know that the smooth numbers are equally distributed over the 2t sets Tε . So the first question is
to show that a reasonable fraction of all y’s is sufficiently good.
(i) Let A = ⋃_{i∈I} B_i be a partition of a finite set A into disjoint subsets of equal size k = #A/#I, C ⊆ A, and s = #C/#A. Then for at least s · #I/2 indices i ∈ I we have #(B_i ∩ C) ≥ sk/2.
(ii) Show that for a fraction at least σ/2 of the ε ∈ {1, −1}^t, T_ε contains a fraction at least σ/2 of B-smooth numbers. Hint: Apply (i) to A = Z_N^×, C the B-smooth numbers, so that s = σ, and the partition into the subsets T_ε.
(iii) Analyze the success probability and the running time of the algorithm described above.
19.11 Check that the curve E = {(x, y) ∈ F_7^2: y^2 = x^3 + x + 3} over F_7 is nonsingular. Compute all points on it, and verify that it is cyclic and generated by (4, 1).
19.13 Let E be an elliptic curve and P, Q ∈ E. Explain why P + Q = S, where S is the third inter-
section point of E with the line through P and Q (see Figure 19.6), is not a group operation.
19.14 Let F be a field and f = x^3 + ax + b ∈ F[x].
(i) Check that r = res(f, f′) = 4a^3 + 27b^2.
(ii) Conclude that f is squarefree if and only if r ≠ 0.
(iii) For which values of b does y^2 = x^3 − x + b not define an elliptic curve over F = R? Plot the
curves for all these values.
19.15 Let E = {(x, y) ∈ F^2: y^2 = x^3 + ax + b} be an elliptic curve over a field F and P = (x_1, y_1) ∈ E. Determine the equation of the tangent to E through P (distinguish the two cases y_1 = 0 and y_1 ≠ 0),
and prove that the doubling formula (10) realizes the geometric description using the tangent line.
19.16 Show that an elliptic curve E has at most three points P of order 2, for which P ≠ O and 2P = O.
19.17−→ You are to check associativity of the addition we defined on an elliptic curve E.
(i) Write a procedure add to calculate the sum of two distinct points, using (9).
(ii) Check that for three points P, Q, R,
is not zero.
(iii) What has gone wrong in (ii)? We have not used that the three points lie on the same curve. Cal-
culate the Weierstraß coefficients a, b from P and Q, set f = y_3^2 − (x_3^3 + ax_3 + b), where R = (x_3, y_3),
and check that ass ≡ 0 mod f . (You may have to simplify and take numerators at the appropriate
place.)
(iv) We now have associativity at three “generic” points P, Q, R. Check associativity when one of
them is O.
(v) It remains to check the cases where two points coincide, say P = Q or P + Q = R, so that (9) is
not applicable. You have two ways of doing this: writing a little program for these cases, or arguing
by continuity. The latter requires some algebraic geometry.
19.18∗∗ (i) Prove that ∑_{0≤k<n} \binom{2n}{k} = (4^n − \binom{2n}{n})/2 and ∑_{0≤k<n} \binom{2n−1}{k} = 4^{n−1}, for all positive integers n.
(ii) Let n ∈ N_{>0}, X_i for 1 ≤ i ≤ 2n be a collection of independent random variables which take on each of the two values 1 and −1 with probability 1/2, and X = ∑_{1≤i≤2n} X_i be a random walk of length 2n. Prove that prob(X = 2(n − k)) = prob(X = −2(n − k)) = \binom{2n}{k} 4^{−n} for 0 ≤ k ≤ n.
(iii) Show that E(X) = 0 and E(|X|) = 2n \binom{2n}{n} 4^{−n}.
(iv) Use Stirling's formula n! ∈ \sqrt{2πn}\,(n/e)^n (1 + O(n^{−1})) (see Graham, Knuth & Patashnik 1994) to show that E(|X|) ∈ 2π^{−1/2} n^{1/2} + O(n^{−1/2}).
(v) Prove the same formulas as in (iii) when there are 2n − 1 instead of 2n random variables.
19.19−→ Program Lenstra’s algorithm 19.22, and use it to factor the number N from Exercise 19.3
with B = 40 and C = 12 000.
Real mathematics has no effects on war. No one has yet discovered any
warlike purpose to be served by the theory of numbers or relativity;
and it seems very unlikely that anyone will do so for many years.
Godfrey Harold Hardy (1940)
“Right. So I have a translation key and you have a signature key and all
the communication from you to me needs both
those keys to encode and decode it properly. But if I want
to send a message back, I can’t use those same keys—
I need my signature key and your translation key.”
“And Joe has a different translation key and when I send
him a message I have to use his key. And that’s how
everybody is approaching this, and doing it that way has
the kinds of problems we’re sitting here to solve.”
Philip Friedman (1996)
1 Pure number theory is that part of mathematics for which up to now no application has ever been found.
20
Application: Public key cryptography
This chapter presents one of the most interesting applications of the ideas from
complexity theory and the algorithms from computer algebra: modern cryptog-
raphy. After an introduction to the problem, we present six cryptographic algo-
rithms: the famous RSA scheme, the Diffie-Hellman key exchange, two crypto-
systems by ElGamal and by Rabin, and systems based on elliptic curves and short
vectors in lattices.
It is satisfying to see how many of the computer algebra methods discussed in
this text, certainly designed without this application in mind, have been useful for
cryptography.
20.1. Cryptosystems
The scenario in this chapter is as follows. Bob wants to send a message to Alice in
such a way that an eavesdropper Eve1 listening to the transmission channel cannot
understand the message. This is done by enciphering the message so that only
Alice, possessing the right key, can decipher it, but Eve, having no access to the
key, has no chance to recover the message.
The following are some of the ciphers that have been used in history.
◦ The Caesar cipher, which simply permutes the alphabet. The classical Caesar
cipher used the cyclic shift by three letters A 7−→ D, B 7−→ E, C 7−→ F, . . .,
Y 7−→ B, Z 7−→ C. For example, the word “CAESAR” is then enciphered as
“FDHVDU”. This cryptosystem is trivial to break: there are only 26 possibili-
ties to try. More generally, one can use any of the 26! ≈ 4 · 1026 permutations
1 Alice, Bob, and Eve are the leading characters of modern cryptography.
F IGURE 20.1: The plaintext x is encrypted to the ciphertext y = ε(x), transmitted, and then decrypted to δ(y).
Diffie & Hellman (1976) made a revolutionary proposal which has since then
been known as public key cryptography. The idea is to have two different keys
K and S for encryption and decryption, respectively, such that both encryption and
decryption are “easy”, but decryption without knowledge of S is “hard”. Here
“easy” means polynomial time, preferably almost linear or quadratic time in the
message length. Figure 20.1 illustrates the situation. The name “public key cryp-
tography” comes from the fact that the encryption key K may be publicly available.
Since we want x = δ (y) = δ (ε(x)), δ is an inverse of ε. A function that is “easy” but
its inverse is “hard” to compute without additional knowledge, like the encryption
function in a public key cryptosystem, is called a trapdoor function. The keys
K and S are called public key and private key, respectively. With such an asym-
metric cryptosystem, n public-private key pairs are sufficient to permit secure
communication among any two of n parties.
A cryptosystem is certainly broken when the private key is easy to find, but
an appropriate notion of breaking a code is much more generous: a system is
considered broken if there exists a Boolean predicate B(x)—say, the parity of x
if x is an integer—and a polynomial-time probabilistic algorithm which takes y =
ε(x) as input and has a slightly better capability of predicting B(x) than a random
guess. Otherwise, the system is semantically secure; this is only possible for
probabilistic encryption schemes, and the precise definition is a bit tricky.
There are several possibilities to make precise what “hard” means. Here is a list
of some, ordered in increasing desirability.
◦ The inventor of the cryptosystem does not know of any polynomial time algo-
rithm.
◦ Nobody knows of a polynomial time algorithm.
◦ Whoever breaks the system will probably in turn have solved a well-studied
“hard” problem.
◦ Whoever breaks the system has in turn solved a well-studied “hard” problem.
◦ Whoever breaks the system has in turn solved an NP –complete problem (Sec-
tion 25.8).
◦ There is provably no (probabilistic) polynomial-time algorithm, as we have
stipulated above.
At present, nobody knows of a cryptosystem fulfilling any of the last three require-
ments. However, it was a major conceptual breakthrough of the Diffie & Hellman
proposal that the hitherto elusive notion of a “hard-to-break cipher” should be
studied within the well-established framework of computational complexity.
Some of the modern proposals for cryptosystems have already been broken.
Merkle & Hellman (1978) suggested a cryptosystem based on the subset sum
problem. This system and several variants were broken using a basis reduction al-
gorithm (Section 17.1). Another cipher proposed by Cade in 1985 (see Cade 1987)
was based on the assumed hardness of the functional decomposition problem for
polynomials: Given a polynomial f over a field F of degree n, decide if there exist
polynomials g, h ∈ F[x] of degree at least 2 such that f = g ◦ h = g(h), and if so,
compute such g and h. The system was broken by Kozen & Landau (1989), who
gave an algorithm for the problem with running time O(n3 ). Beyond our fairly sim-
ple scenario, modern cryptography studies many other tasks: electronic signatures
and message authentication, multi-party communication, electronic cash, etc.
We now present some modern public key cryptosystems.
hardness of factoring integers. The idea is that Alice randomly chooses two large
(say 150-digit) primes p ≠ q, and sets N = pq. Anybody who can factor N can
break the system; ideally, we would also like the converse to be true, since numbers
N with more than about 160 decimal digits seem out of the range of current integer
factorization software (see the Notes and Chapter 19). Messages are encoded as
sequences of elements of ZN = {0, . . . , N −1}. If, for example, we use the standard
alphabet Σ = {A, . . . , Z} of cardinality #Σ = 26, then messages of up to 212 =
⌊log26 10300 ⌋ letters can be uniquely represented by a single element of ZN , using
the 26-adic representation. For example, the message “CAESAR” is encoded as
If Alice wants to receive messages from Bob, she chooses e ∈ {2, . . . , ϕ(N) − 2}
with gcd(e, ϕ(N)) = 1 at random, where ϕ is Euler’s totient function (Section 4.2)
and ϕ(N) = #Z_N^× = (p − 1)(q − 1). (She can also fix e, say e = 3.) Then she
computes d ∈ {2, . . . , ϕ(N) − 2} with de ≡ 1 mod ϕ(N), using the Extended Eu-
clidean Algorithm (Theorem 4.1), publishes the pair K = (N, e) as her public key,
and keeps her private key S = (N, d) as well as p, q secret (the latter may even be
discarded). The encryption and decryption functions ε, δ: Z_N^× −→ Z_N^× are defined by ε(x) = x^e and δ(y) = y^d. To send a message x ∈ Z_N^× to Alice, Bob looks up her
public key, computes y = ε(x), and sends this to Alice, who computes δ (y), using
her private key. Then, with u ∈ Z such that de − 1 = u · ϕ(N), we have
$$\delta(\varepsilon(x)) = x^{de} = x \cdot (x^{\varphi(N)})^{u} = x,$$
since x^{ϕ(N)} = 1 by Euler's theorem (Section 18.1). Although the latter is only valid
if gcd(x, N) = 1, actually (δ ◦ ε)(x) = x is true for all x (Exercise 20.5). However,
values of x that are not coprime to N lead to the factorization of N and thus to a
break of the cryptosystem. Fortunately, if p and q are large and we assume that
many or all messages x are likely to occur, then this will practically never happen.
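The scheme fits in a few lines of Python. The sketch below uses sympy's randprime and, for speed, much smaller primes than the 150-digit ones recommended above; all function names are ours.

```python
import math, random
from sympy import randprime

def rsa_keygen(bits=64):
    """Return a public/private key pair ((N, e), (N, d)); toy key sizes, for illustration only."""
    p = randprime(2 ** (bits - 1), 2 ** bits)
    q = randprime(2 ** (bits - 1), 2 ** bits)
    while q == p:
        q = randprime(2 ** (bits - 1), 2 ** bits)
    N, phi = p * q, (p - 1) * (q - 1)
    while True:
        e = random.randrange(3, phi - 1)
        if math.gcd(e, phi) == 1:
            break
    d = pow(e, -1, phi)            # via the Extended Euclidean Algorithm (Theorem 4.1)
    return (N, e), (N, d)

def rsa_crypt(x, key):
    """epsilon(x) = x^e rem N with the public key; delta(y) = y^d rem N with the private key."""
    N, exponent = key
    return pow(x, exponent, N)

K, S = rsa_keygen()
x = 1234567
assert rsa_crypt(rsa_crypt(x, K), S) == x      # delta(epsilon(x)) = x
```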
We recall that a polynomial-time reduction from one problem X to another
one Y is a polynomial-time algorithm for X making calls to a subroutine for Y . If
polynomial-time reductions exist in both directions, then X and Y are polynomial-
time equivalent (see Section 25.8). The following theorem is proven in Exercise
20.6.
T HEOREM 20.1.
The following three problems are polynomial-time equivalent:
(i) factoring N ,
(ii) computing ϕ(N),
(iii) computing d ∈ N with de ≡ 1 mod ϕ(N) from K = (N, e).
Unfortunately, the theorem does not say that breaking the system means that
one can factor integers efficiently, since there might be a successful attack that
does not compute the private key at all.
The RSA scheme can also be used for authentication, where the sender of a
message has to prove that he actually is the originator. This is also called a digital
signature. If Bob wants to send a signed message x to Alice, he computes y = δ (x)
using his own private key, and sends this to Alice, who looks up Bob’s public key
and recovers x = ε(y). Since only Bob is assumed to know his private key, no
forger would have been able to produce y, and Alice is convinced that the message
originated from Bob. Instead of the whole message x, Bob might just sign a short
digest of x obtained with a cryptographic hash function.
The authentication scheme may even be used in conjunction with the encryption
scheme to ensure privacy. If εA , δA and εB , δB are Alice’s and Bob’s encryption and
decryption functions, respectively, and Bob wants to send a signed message x to
Alice that no one else can decipher, he computes y = εA (δB (x)), and sends this to
Alice, who first decrypts δA (y) = δB (x) and then x = εB (δA (y)), at the same time
assuring herself that the message originates from Bob.
Both parties may use g^{ab} as a common key for further communication with a symmetric cryptosystem. In this context, the following problems play a central role.
receive messages from Bob, Alice randomly chooses S = b ∈ Z_{q−1} as her private key and publishes K = (q, g, g^b). If Bob wants to send a message x to Alice, he looks up her public key, randomly chooses k ∈ Z_{q−1}, computes g^k and xg^{kb}, and sends y = (u, v) = (g^k, xg^{kb}) to Alice, who computes the original message as x = v/u^b. Computing x from y without knowing S is polynomial-time equivalent
to the Diffie-Hellman Problem.
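In Python, the ElGamal system looks as follows (a sketch with toy parameters of our own choosing; q = 101 is prime and g = 2 generates F_101^×):

```python
import random

def elgamal_keygen(q, g):
    b = random.randrange(1, q - 1)                  # private key S = b in Z_{q-1}
    return (q, g, pow(g, b, q)), b                  # public key K = (q, g, g^b)

def elgamal_encrypt(x, K):
    q, g, gb = K
    k = random.randrange(1, q - 1)                  # fresh random k in Z_{q-1}
    return pow(g, k, q), x * pow(gb, k, q) % q      # y = (u, v) = (g^k, x g^{kb})

def elgamal_decrypt(y, b, q):
    u, v = y
    return v * pow(pow(u, b, q), -1, q) % q         # x = v / u^b

q, g = 101, 2
K, b = elgamal_keygen(q, g)
y = elgamal_encrypt(42, K)
assert elgamal_decrypt(y, b, q) == 42
```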
A practical problem in implementing the Diffie-Hellman scheme or the ElGamal system is that exponentiation in F_q^× is theoretically easy (O(n^3) word operations for q = 2^n using classical arithmetic), but not fast enough to achieve high throughput. One can, however, achieve time O∼(n^2).
Remainder Algorithm. There are various tricks to deal with the choice among the
four different answers computed by Alice.
However, the system’s use as a signature scheme is vulnerable to an active at-
tack: if Eve chooses a random x and gets Alice to sign a message y ≡ x2 mod N,
by returning a square root z of y modulo N, then with probability 1/2, gcd(x −z, N)
will be a proper factor of N. The system is not considered secure for this reason.
Notes. 20.1. The one-time pad is in Vernam (1926). Cryptographically strong pseudo-
random number generators, whose prediction is thought to be computationally hard, are
discussed in Lagarias (1990). An algorithm with time O∼ (n) for functional decomposition
is in von zur Gathen (1990a, 1990b); see Exercise 20.3.
20.2. The security of individual bits in the RSA scheme has been discussed by several
researchers; see Näslund (1998) and Håstad & Näslund (1998) for references. It is gener-
ally hard to predict, and even harder to predict the future. Nevertheless, Odlyzko (1995b)
extrapolates past progress and concludes that 1500 to 10 000 bits are needed for a number
(used in a cryptosystem) to be safe against factoring attempts. Attacks via basis reduction
on small RSA exponents and other cryptographic applications are discussed in Nguyen &
Stern (2001).
20.3. McCurley (1990) gives an overview on discrete logarithm algorithms. Maurer &
Wolf (1999) reduce DL to DH in some special cases.
20.4. ElGamal (1985) gives his cryptosystem. Fast exponentiation in finite fields F2n
can be achieved using Gauß periods, normal bases, and fast arithmetic (von zur Gathen
& Nöcker 1997, Gao, von zur Gathen & Panario 1998, Gao, von zur Gathen, Panario &
Shoup 2000).
20.6. Public key cryptosystems based on elliptic curves were invented by Miller (1986)
and Koblitz (1987b). Menezes (1993) and Blake, Seroussi & Smart (1999) present com-
prehensive treatments.
Exercises.
20.1 As in Section 20.1, we identify the letters A, B, C, . . ., Z with the elements 0, 1, 2, . . ., 25 of Z26 .
(i) The word “OAYBGFQD” is the result of an encryption with a Caesar cipher, which maps each
letter x ∈ Z26 to x + k, where k ∈ Z26 is the key. What are the cleartext and the key?
(ii) The word “MLSELVY” is the ciphertext after encryption with the one-time pad using the key
“IAMAKEY”. Find the cleartext.
20.2−→ This exercise is about a variant of the following password encryption scheme suggested
by Purdy (1974), before the advent of public-key cryptography (Diffie & Hellman 1976). Let p = 2^{64} − 59, which is the largest prime below our processor's assumed word length 2^{64}, encode a 13-letter password w^∗ over the 26-letter alphabet {A, B, . . ., Z} as a number w using 26-adic notation, and consider w ∈ F_p. This makes sense, since 26^{13} < p. Then w is encrypted as f(w) ∈ F_p, where
$$f = x^{2^{24}+17} + a_1 x^{2^{24}+3} + a_2 x^3 + a_3 x^2 + a_4 x + a_5 \in \mathbb{F}_p[x],$$
for some specific values a1 , . . ., a5 ∈ F p . The pairs (login-name, f (password)) are stored in a public
file. When a user logs on and types in her password w∗ , f (w) is calculated and checked against the
entry in the file.
(i) Let a1 = 2, a2 = 37, a3 = −42, a4 = 15, a5 = 7, and w∗ = RUMPELSTILTZK. Calculate w
and f (w).
(ii) How many arithmetic operations in F p are used to calculate f (w) from w?
(iii) Let v ∈ F p . The core of the algorithm in Exercise 14.20, which calculates {w ∈ F p : f (w) = v},
is the computation of x^p rem f. Extrapolating the timings from Figure 9.10, you may assume that
one multiplication modulo f can be done in about one hour. Since f is sparse, the reduction modulo
f is inexpensive. How long does the computation of x p rem f take approximately? What do you
conclude about the security of this system (on today’s computers)?
20.3∗ (Kozen & Landau 1989, von zur Gathen 1990a) Let F be a field and f ∈ F[x] of degree n.
A functional decomposition of f is given by two polynomials g, h ∈ F[x] of degrees at least two
such that f = g ◦ h = g(h). If no such decomposition exists, then f is indecomposable. Obviously a
necessary condition for the existence of a decomposition is that n be composite.
(i) Let f = g ◦ h be a functional decomposition and c, d ∈ F with c 6= 0. Show that f = g(cx + d) ◦
(h − d)/c is also a functional decomposition. Find a functional decomposition f / lc( f ) = g∗ ◦ h∗ into
monic polynomials g∗ , h∗ ∈ F[x], with the same degrees as g, h, and such that h(0) = 0. We call such
a decomposition normal.
(ii) Let f = g ◦ h be a normal decomposition, r = deg g, s = deg h, and f^∗ = rev(f) = x^n f(x^{−1}) and h^∗ = rev(h) = x^s h(x^{−1}) the reversals of f and h, respectively. Prove that f^∗ ≡ (h^∗)^r mod x^s.
(iii) Let f = g1 ◦ h1 be another normal decomposition with r = deg g1 and s = deg h1 and assume
that r is coprime to char F. Prove that h = h1 and g = g1 . Hint: Uniqueness of Newton iteration
(Theorem 9.27).
(iv) Consider the following algorithm, which works even over rings.
A LGORITHM 20.4 Functional decomposition of polynomials.
Input: A monic polynomial f ∈ R[x] of degree n > 3 and a nontrivial divisor r of n, where R is a ring
(commutative, with 1) of characteristic coprime to r.
Output: Either a normal decomposition f = g ◦ h with g, h ∈ R[x] and deg g = r, or “no such decom-
position”.
1. f ∗ ←− rev( f ), s ←− n/r
{ compute rth root of f ∗ via Newton iteration }
call the Newton iteration algorithm 9.22 to compute h^∗ ∈ R[x] of degree less than s with h^∗(0) = 1 and (h^∗)^r ≡ f^∗ mod x^s
h ←− x^s h^∗(x^{−1})
2. call Algorithm 9.14 to compute the h-adic expansion f = h^r + g_{r−1} h^{r−1} + · · · + g_1 h + g_0 of f, with g_{r−1}, . . ., g_0 ∈ R[x] of degrees less than s
3. if g_i ∈ R for all i then return g = x^r + ∑_{0≤i<r} g_i x^i and h
else return “no such decomposition”
Prove that the algorithm works correctly, and show that it takes O(M(n) log r) additions and mul-
tiplications in R. What goes wrong if gcd(r, char R) > 1?
(v) Apply the algorithm to find a decomposition of f = x^6 + x^5 + 2x^4 + 3x^3 + 3x^2 + x + 1 ∈ F_5[x].
20.4 Let N = 8051 = 97 · 83.
(i) The public key in a RSA cryptosystem is K = (N, e) = (8051, 3149). Find the corresponding
private key S = (N, d).
(ii) A message x has been encrypted using K, and the resulting ciphertext is 694. What is x?
20.5 Let p, q ∈ N be distinct primes, N = pq, K = (N, e) the public key, and S = (N, d) the private
key in a RSA cryptosystem, such that d, e ∈ N satisfy de ≡ 1 mod ϕ(N).
(i) In Section 20.2, we have assumed that messages x to be encrypted are coprime to N. Prove that
the RSA scheme also works if this condition is violated. Hint: Chinese Remainder Theorem.
(ii) Show that the intruder Eve, who has intercepted the ciphertext ε(x) but does not know the
private key S, can easily break the system if x is not coprime to N.
20.6∗ In this exercise, you are to prove Theorem 20.1. So let N = pq for two distinct primes p, q ∈ N.
(i) Show how to compute p, q from the knowledge of N and ϕ(N). Hint: Consider the quadratic
polynomial (x − p)(x − q) ∈ Z[x].
(ii) Suppose that you are given a black box which on input e ∈ N decides whether it is coprime
to ϕ(N), and if so, returns d ∈ {1, . . ., ϕ(N) − 1} such that de ≡ 1 mod ϕ(N). Give an algorithm
using this black box which computes ϕ(N) in time (log N)O(1) . Hint: Find a “small” e coprime to
ϕ(N).
20.7−→ (i) Program a procedure key generate that generates a pair (K, S) of keys for the RSA
cryptosystem, such that K = (N, e) is the public key, S = (N, d) is the private key, N is the product
of two random 100 bit prime numbers, e ∈ {2, . . .ϕ(N) − 2} is chosen uniformly at random, and d ∈
{2, . . ., ϕ(N) − 2} satisfies de ≡ 1 mod ϕ(N).
(ii) Design a coding for short strings of English words with at most 30 letters, including punc-
tuation marks, parentheses, and blanks, as integers between 0 and N − 1, and write corresponding
procedures encode and decode.
(iii) Write a procedure crypt for encrypting and decrypting with the RSA cryptosystem. Its argu-
ments should be a number in ZN and a key.
[Diagram: key generate produces the key pair (K, S); a text is passed through encode, then crypt with K giving y = ε(x), then crypt with S recovering x = δ(y), and finally decode.]
(iv) Check your programs with sample messages of your choice, and produce some timings.
Research problems.
20.8 Reduce (in probabilistic polynomial time) factoring integers to breaking RSA (or some other
cryptosystem).
20.9 Reduce DL to DH in polynomial time.
Part V
Hilbert
David Hilbert (1862–1943) grew up in Königsberg, then capital of East Prussia
and now Kaliningrad in Russia, in an upper middle-class family; his father was a
judge. The town had been home to the philosopher Immanuel Kant, to Leonhard Euler, whose solution to the riddle of how to cross its seven bridges across the
river Pregel without re-using one became a starting point for graph theory and
topology, and to C. G. J. Jacobi.
After an unimpressive school career, he studied at the university to graduate
with his doctoral thesis on invariant theory in 1885. He worked in this area until
1893, proving among other things the Hilbert basis theorem saying that any ideal
in a polynomial ring (in finitely many variables over a field) is finitely generated
(Theorem 21.23), and introducing the Hilbert function of algebraic varieties.
Two further results from his “multivariate polynomial phase” are relevant to the
subject matter of this text: firstly Hilbert’s Nullstellensatz 1 (1890), which says
that if a polynomial g vanishes on the set of common roots of some multivariate
polynomials f_1, . . . , f_s over C, then some power g^e is in the ideal ⟨f_1, . . . , f_s⟩ (see
Section 21.7). Secondly, Hilbert’s irreducibility theorem (1892), stating that for
an irreducible polynomial f ∈ Q[x, y], the univariate polynomial f (x, a) ∈ Q[x] is
irreducible for “most” a ∈ Z. This sounds useful for reducing bivariate to
univariate factorization. Unfortunately, no efficient versions of “most” are known,
but, fortunately, such versions are known for reducing from many to two variables
(Section 16.6).
Hilbert became a professor at the university of Göttingen in 1895. Under his
leadership and that of Felix Klein, its fame, established by Gauß, as a center for
mathematics kept growing. Among their famous colleagues were Hermann
Minkowski, Ernst Zermelo, Constantin Carathéodory, Emmy Noether, Hermann
Weyl, Carl Runge, Richard Courant, Edmund Landau, Alexander Ostrowski, Carl
Ludwig Siegel, and Bartel van der Waerden, who based his Modern Algebra
(1930b, 1931) on Emmy Noether’s Göttingen lectures.
Hilbert’s Zahlbericht 2, commissioned by the Deutsche Mathematiker -
Vereinigung 3, gave a rich overview of the state of algebraic number theory and led
him to a vast and elegant generalization of Gauß’ quadratic reciprocity law and to
the Hilbert class field theory .
His next area of work culminated in the booklet Grundlagen der Geometrie 4,
where he laid down the basic properties that a “nice” system of axioms should
have: soundness, completeness, and independence.
1 Nullstelle = root
2 Report on [the theory of] numbers
3 German Mathematical Society
4 Foundations of Geometry
Then came what turned out to be his most influential “work”: his talk on
August 8, 1900, at the International Congress of Mathematicians in Paris (Hilbert
1900). He began with: Wer von uns würde nicht gern den Schleier lüften, unter
dem die Zukunft verborgen liegt, um einen Blick zu werfen auf die bevor-
stehenden Fortschritte unserer Wissenschaft und in die Geheimnisse ihrer
Entwicklung während der künftigen Jahrhunderte!5, and ended with the list of the
23 Hilbert problems . As intended, this set of problems shaped the mathematics of
the next century, and those who contributed to a solution would be said to belong
to the “honors class” of mathematicians.
Hilbert liked lecturing,
and excelled at it. He usually
prepared only an outline of his
lecture and filled in the details in
front of the students—so he got
stuck and confused everybody
at times, but “a third of his
lectures were superb”. He was
Doktorvater to the impressive
number of 69 doctoral
students, and this Hilbert
school spread his approach to
mathematics around the world.
Hilbert could be funny
and entertaining at social events,
loved dancing, and was a
successful charmer of the ladies.
His unprejudiced and liberal
thinking led to a clash with
the German authorities when he
refused to sign, at the beginning
of World War I, a declaration
supporting the Kaiser and his
government. At the beginning
of the next German catastrophe,
the Nazis forced in 1933 almost
all Jewish professors out of their positions (and brutally worse was to come).
Constance Reid (1970) relates in her wonderful biography how the Nazi minister
of education said to Hilbert at a banquet in 1933 that mathematical life in
5 Who of us would not be glad to lift the veil behind which the future lies hidden, to cast a glance at the next
advances of our science and at the secrets of its development during future centuries!
Göttingen probably had not suffered from being freed of Jewish influence.
Hilbert’s reply: Jelitten? Das hat nicht jelitten, das jibt es nicht mehr.6
After work on the Dirichlet Principle, Waring’s Problem, the transcendence of e
and π , integral equations, Hilbert spaces (spectral theory ), calculus of variations,
and a less successful attempt at laying the foundations of modern physics, Hilbert
returned to logic and the foundations of mathematics in the 1920s. The 19th
century philosopher Emil du Bois-Reymond had pointed to the limits of our
understanding of nature: ignoramus et ignorabimus7 . Hilbert was strongly
opposed to this scepticism (in der Mathematik gibt es kein ignorabimus8 ) and set
himself the goal of formalizing mathematics in a symbolic way, as pioneered by
Gottlob Frege, and Bertrand Russell and Alfred North Whitehead. Alas, Hilbert’s
program was proved to be infeasible on this point by Kurt Gödel and Alan
Turing; see Section 14.6 for the interesting juxtaposition of Hilbert’s belief and a
precocious undecidability result in polynomial factorization by van der Waerden.
Although that particular goal of Hilbert’s turned out to be unattainable, the ideas
he introduced into proof theory and symbolic logic are alive and well today; see
Section 24.1 for a small example. In fact, modern programming languages
realize, in some sense, Hilbert’s program of formalizing mathematics and science.
In the last decade of Hilbert’s life, his health—including his mental
faculties—deteriorated, and he led a secluded life. He died in February 1943, of
the long-term effects of a fall. By then, the war had his country in its grip, and
only a miserable procession of a dozen people accompanied the great
mathematician on his last trip.
Tant que l’Algèbre et la Géométrie ont été séparées, leurs progrès
ont été lents et leurs usages bornés; mais lorsque ces deux sciences
se sont réunies, elles se sont prêté des forces mutuelles
et ont marché ensemble d’un pas rapide vers la perfection.1
Joseph Louis Lagrange (1795)
1 As long as algebra and geometry proceeded separately, their progress was slow and their application limited;
but when these two sciences joined forces, they mutually strengthened each other, and marched together at a rapid
pace toward perfection.
2 Actually the effort required to find this divisor will, in several cases, be so large as to discourage the most
intrepid Computer. [. . . ] In an undertaking that is as hard as elimination often is, it is not useless to multiply the
methods between which Computers can make their choice.
21
Gröbner bases
E XAMPLE 21.1. Figure 21.1 shows a very simple robot, one-armed with two
joints. The arm is fixed at one end with a joint to a point (say, the origin of the
Cartesian plane), and has another joint in the middle. The distance between the
two joints is 2, the joint between the two arms is in position (x, y), and the distance
from the second joint to the endpoint, at position (z, w), is 1. Furthermore, there is
F IGURE 21.2: The three medians AP, BQ, and CR of a triangle ABC intersect at the center
of gravity S.
and an answer to the question is either a quadruple (x, y, z, w) satisfying (1) and the
additional equation w = λz + µ, or a proof that no such quadruple exists. ✸
E XAMPLE 21.2. A well-known geometric theorem says that the three medians of
a triangle intersect at one point, the center of gravity of the triangle, and that the
intersection point trisects each median (Figure 21.2). We now formulate this as
a problem about multivariate polynomials. Since the assumptions and the conclu-
sion of the theorem are invariant under translation, rotation, and scaling, we may
assume that two of the vertices of the triangle are A = (0, 0) and B = (1, 0), and the
third point is C = (x, y), with arbitrary x, y ∈ R. Then the midpoints of the three
edges BC, AC, and AB are P = ((x + 1)/2, y/2), Q = (x/2, y/2), and R = (1/2, 0),
respectively. We let S = (u, v) be the intersection point of the two medians AP
and BQ. (If y = 0, then these two lines coincide.) The condition that S lies on AP
is equivalent to saying that AS and AP have the same slope, so that
u/v = (x + 1)/y ,
or, after clearing denominators,
f1 = uy − v(x + 1) = 0.
Similarly, the condition that S lies on BQ can be expressed as
f2 = (u − 1)y − v(x − 2) = 0.
The claims now are that S also lies on the third median CR, or
g1 = −2(u − x)y − (v − y)(1 − 2x) = −2uy − (v − y) + 2vx = 0,
and that S trisects each of the three medians, so that
(u, v) = AS = 2SP = (x + 1 − 2u, y − 2v),
(u − 1, v) = BS = 2SQ = (x − 2u, y − 2v),
(u − x, v − y) = CS = 2SR = (−2u + 1, −2v),
or equivalently,
g2 = 3u − x − 1 = 0 and g3 = 3v − y = 0.
A short computation shows that g1 = − f1 − f2 , so that g1 = 0 follows from f1 =
f2 = 0, which establishes that the three medians indeed intersect in S. We will
continue this example in Section 21.6. ✸
◦ Is V (I) ≠ Ø?
◦ How “big” is V (I)?
◦ Ideal membership problem: given f ∈ R, is f ∈ I?
◦ Triviality: Is I = R?
g1 = − f1 − f2 ∈ ⟨ f1 , f2 ⟩ ⊆ R[u, v, x, y],
is the intersection of the circle V (x2 + y2 − 1) with the line V (y − 2) (see Fig-
ure 21.3), which is empty over R. If we regard f1 , f2 as polynomials in C[x, y] and
consider their variety over the complex numbers, then
V (I) = {(u, 2) ∈ C² : u² = −3} = {(√3 i, 2), (−√3 i, 2)}
consists of two points, where i = √−1 ∈ C.
[Figure 21.3: The circle V (x² + y² − 1) and the line V (y − 2) in the (x, y)-plane.]
(ii) Let f = (y² + 6)(x − 1) − y(x² + 1), g = (x² + 6)(y − 1) − x(y² + 1), h = (x − 5/2)² + (y − 5/2)² − 1/2 in C[x, y], and I = ⟨ f , g⟩. We have seen in Example 6.41 that V (I), the intersection of the two plane curves V ( f ) and V (g), consists of the six points

V (I) = { (2, 2), (2, 3), (3, 2), (3, 3), ((1 ± √15 i)/2, (1 ∓ √15 i)/2) } ⊆ C².   (2)
⟨ f1 , . . . , fs ⟩ = ⟨gcd( f1 , . . . , fs )⟩   (3)
(Exercise 21.3), so that we may assume s = 1. Now we let f , g ∈ F[x] and divide f
by g with remainder, yielding q, r ∈ F[x] with f = qg + r and deg r < deg g. Then
f ∈ ⟨g⟩ ⇐⇒ r = 0,   (4)
These conditions imply that ≺ is asymmetric: ((α ≺ β ) and (β ≺ α)) is always
false (Exercise 21.7). A partial order is a total order (or simply order) if either
x^α = x1^{α1} · · · xn^{αn} ∈ R.
In all three examples, the variables (that is, the monomials of degree one) are
ordered as x1 ≻ x2 ≻ . . . ≻ xn−1 ≻ xn . “Graded” refers to the fact that the total
degree ∑ αi is the main criterion. In the case n = 1, we have ≺lex = ≺grlex = ≺grevlex .
Once we have a monomial order on R, we can sort terms of a polynomial ac-
cording to ≺.
E XAMPLE 21.5 (continued). Let f = 4xyz² + 4x³ − 5y⁴ + 7xy²z ∈ Q[x, y, z] (we
always identify x, y, z with x1 , x2 , x3 ). Then the orders of f with respect to ≺lex ,
≺grlex , and ≺grevlex are: 4x³ + 7xy²z + 4xyz² − 5y⁴ , 7xy²z + 4xyz² − 5y⁴ + 4x³ , and
−5y⁴ + 7xy²z + 4xyz² + 4x³ , respectively. ✸
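To make the three orders concrete, here is a small Python sketch (an illustration, not part of the text's algorithms) that sorts the exponent vectors of the terms of f and reproduces the three orderings just given.

# Terms of f = 4xyz^2 + 4x^3 - 5y^4 + 7xy^2z as exponent vectors for (x, y, z).
f = {(1, 1, 2): 4, (3, 0, 0): 4, (0, 4, 0): -5, (1, 2, 1): 7}

def lex_key(a):
    return a                                  # compare exponents left to right

def grlex_key(a):
    return (sum(a), a)                        # total degree first, then lex

def grevlex_key(a):
    # total degree first; ties broken so that the rightmost nonzero entry of
    # the difference is negative (reverse lex on the reversed, negated tuple)
    return (sum(a), tuple(-e for e in reversed(a)))

for name, key in [("lex", lex_key), ("grlex", grlex_key), ("grevlex", grevlex_key)]:
    terms = sorted(f, key=key, reverse=True)  # largest monomial first
    print(name, [(f[a], a) for a in terms])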
T HEOREM 21.6.
≺lex , ≺grlex , and ≺grevlex are monomial orders.
P ROOF. The proof is a simple check; we give some details only for ≺grevlex . We
omit the verification that ≺grevlex is a partial order. For each α, β ∈ N^n with α ≠ β ,
we have either ∑1≤i≤n αi < ∑1≤i≤n βi , or ∑1≤i≤n βi < ∑1≤i≤n αi , or ∑1≤i≤n αi =
∑1≤i≤n βi , and in the last case either the rightmost nonzero entry in α − β is positive
or the rightmost nonzero entry in β − α is positive. Thus ≺grevlex is total.
For condition (ii), we have
Our next goal is an algorithm for division with remainder in R. Given polyno-
mials f , f1 , . . . , fs ∈ R, we want to write f = q1 f1 + . . . + qs fs + r with q1 , . . . , qs , r
in R. Before stating the algorithm formally, we give some examples.
              xy + 1   y + 1    |                 xy + 1   y + 1
  xy² + 1        y              |     xy² + 1                 xy
−(xy² + y)                      |   −(xy² + xy)
    −y + 1              −1      |       −xy + 1               −x
 −(−y − 1)                      |    −(−xy − x)
         2                      |         x + 1
In the left hand table, division is performed as in the univariate case, with the
difference that we have two divisors instead of one. The quotient of the two lead-
ing terms that we get in each step is recorded in the column below the respec-
tive divisor. In the last line, 2 is not divisible by the leading term of f1 or f2 ,
The following theorem implies the correctness of the algorithm; its proof is left
as Exercise 21.12.
T HEOREM 21.12.
Each time the algorithm passes through step 3, the following invariants hold.
(i) mdeg(p) ≼ mdeg( f ) and f = p + q1 f1 + · · · + qs fs + r,
(ii) qi ≠ 0 =⇒ mdeg(qi fi ) ≼ mdeg( f ) for 1 ≤ i ≤ s,
(iii) no term in r is divisible by any lt( fi ).
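The division tableaux above can be checked with sympy's reduced, which performs this kind of multivariate division with remainder; the sketch below is an illustration (not the book's code) and reproduces the remainders 2 and x + 1 by swapping the order of the divisors.

# Multivariate division with remainder, as in the two tableaux of Example 21.9.
from sympy import symbols, reduced

x, y = symbols('x y')
f = x*y**2 + 1
f1, f2 = x*y + 1, y + 1

q, r = reduced(f, [f1, f2], x, y, order='lex')
print(q, r)   # expected: quotients [y, -1], remainder 2

q, r = reduced(f, [f2, f1], x, y, order='lex')
print(q, r)   # expected: quotients [x*y - x, 0], remainder x + 1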
This kind of division with remainder need not be unique: there may be a choice
for the value of i in step 3 when the leading term of f is divisible by more than one
lt( fi ). We have already encountered this in Example 21.9, and here is another one.
Our goal is now to find a special basis of an arbitrary ideal such that the remain-
der on division by that basis is unique and thus gives the correct answer to the ideal
membership problem, as in (4) for n = 1. At first sight, it is not clear whether such
a type of basis exists at all.
x^β ∈ I ⇐⇒ ∃α ∈ A   x^α | x^β .
L EMMA 21.16. Let I ⊆ R be a monomial ideal and f ∈ R. Then the following are
equivalent:
(i) f ∈ I ,
(ii) each term of f is in I ,
(iii) f is an F –linear combination of monomials in I .
For example, if I = ⟨x³ , x²y⟩ ⊆ Q[x, y], then the lemma shows that 3x⁴ + 5x²y³ ∈ I
and 2x⁴y + 7x² ∉ I. The implication (i) =⇒ (ii) is false for some ideals, as shown
in Example 21.21 below.
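The term-by-term criterion of Lemma 21.16 is easy to implement directly; the following Python sketch (an illustration only) encodes monomials as exponent vectors and checks the two membership claims just made for I = ⟨x³, x²y⟩.

# Membership in a monomial ideal: every term must be divisible by a generator.
def divides(alpha, beta):
    return all(a <= b for a, b in zip(alpha, beta))

def in_monomial_ideal(terms, generators):
    return all(any(divides(g, t) for g in generators) for t in terms)

gens = [(3, 0), (2, 1)]                              # I = <x^3, x^2*y>
print(in_monomial_ideal([(4, 0), (2, 3)], gens))     # 3x^4 + 5x^2y^3: True
print(in_monomial_ideal([(4, 1), (2, 0)], gens))     # 2x^4y + 7x^2: False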
C OROLLARY 21.17.
Two monomial ideals are identical if and only if they contain the same monomials.
P ROOF. Except for its last sentence, our proof is purely combinatorial, without
any algebra. The claim is trivial if A = Ø, and we may assume that A is nonempty.
We define a relation “≤” on N n by
α ≤ β ⇐⇒ αi ≤ βi for 1 ≤ i ≤ n,
For any α ∈ N n there are only finitely many β ∈ N n with β ≤ α (Exercise 21.13),
and hence there is no infinite descending chain of elements α(1) > α(2) > α(3) > · · ·
in N n . In particular, for any α ∈ A there is some minimal element β ∈ B such that
β ≤ α.
It remains to show that B is finite, which we prove by induction on n. If n = 1,
then ≤ is a total order, and B consists of the unique smallest element of A. If
n ≥ 2, we let A∗ = {(α1 , . . . , αn−1 ) ∈ N n−1 : ∃αn ∈ N (α1 , . . . , αn ) ∈ A}. By the
induction hypothesis, the set B∗ of minimal elements of A∗ is finite. For each β =
(β1 , . . . , βn−1 ) ∈ B∗ , we choose some bβ ∈ N such that (β1 , . . . , βn−1 , bβ ) ∈ A, and
let b = max{bβ : β ∈ B∗ }. We claim that every (α1 , . . . , αn ) ∈ B has αn ≤ b. Let α =
(α1 , . . . , αn ) ∈ B. Then there exists some minimal element β = (β1 , . . . , βn−1 ) ∈ B∗
of A∗ such that β ≤ (α1 , . . . , αn−1 ). If αn > b, then
and α is not minimal. This proves the claim, and similarly we also find that all
other coordinates of minimal elements are bounded, which implies that there are
only finitely many of them.
Now (6) and the fact that α ≤ β ⇐⇒ x^α | x^β imply that x^A ⊆ ⟨x^B ⟩, whence
⟨x^A ⟩ ⊆ ⟨x^B ⟩, and the reverse inclusion follows trivially from B ⊆ A. ✷
E XAMPLE 21.19. Let n = 2 and A = {(α1 , α2 ) ∈ N² : 6α2 = α1² − 7α1 + 18}. Then
the set of minimal elements is B = {(0, 3), (1, 2), (3, 1)}, as can be seen from Fig-
ure 21.4, and hence ⟨x^A ⟩ = ⟨y³ , xy² , x³y⟩. ✸
[Figure 21.4: The points of the set A from Example 21.19 and its minimal elements, plotted in the (α1 , α2 )-plane.]
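For this example, the minimal elements can also be found by a brute-force search; the Python sketch below (an illustration; the search bound 50 is an arbitrary choice that suffices here) recovers B = {(0, 3), (1, 2), (3, 1)}.

# Minimal elements of A = {(a1, a2) in N^2 : 6*a2 = a1^2 - 7*a1 + 18}.
A = [(a1, (a1*a1 - 7*a1 + 18) // 6)
     for a1 in range(50)
     if (a1*a1 - 7*a1 + 18) % 6 == 0]

def leq(a, b):            # componentwise partial order on N^2
    return a[0] <= b[0] and a[1] <= b[1]

B = [a for a in A if not any(leq(b, a) and b != a for b in A)]
print(sorted(B))          # expected: [(0, 3), (1, 2), (3, 1)]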
C OROLLARY 21.20.
Let ≺ be a total order on N^n such that
∀α, β , γ ∈ N^n   α ≺ β =⇒ α + γ ≺ β + γ .
Then ≺ is a well-order if and only if α ≽ 0 for all α ∈ N^n .
P ROOF. We only prove “⇐=”; the reverse implication is Exercise 21.8. Let A ⊆
N^n be nonempty, and I = ⟨x^A ⟩ ⊆ R. Then I is finitely generated, by Dickson’s
lemma, and
∃α1 , . . . , αs ∈ A   I = ⟨x^{α1} , . . . , x^{αs} ⟩.
We order them so that α1 ≺ α2 ≺ · · · ≺ αs , and claim that min≺ A = α1 . Let α ∈ A
be arbitrary. Since x^α ∈ I, by Lemma 21.15 there exist i ≤ s and γ ∈ N^n with
α = αi + γ . Thus α = αi + γ ≽ α1 + γ ≽ α1 + 0 = α1 , and hence α1 = min≺ A. ✷
Thus we can replace the condition (iii) in the definition of monomial orders by
(iii)’ ∀α ∈ N^n   α ≽ 0.
For any subset G ⊆ R different from Ø and {0}, we let lt(G) = {lt(g): g ∈ G}.
If I ⊆ R is an ideal, then there is a finite subset G ⊆ I such that ⟨lt(G)⟩ = ⟨lt(I)⟩,
by Dickson’s lemma. However, it can happen that a finite set G generates I but
⟨lt(G)⟩ ⊊ ⟨lt(I)⟩, as in the following example.
Together with Dickson’s lemma, applied to hlt(I)i, and the fact that the zero
ideal {0} is generated by the zero polynomial, we obtain the following famous
result.
P ROOF. Let I = ⋃_{j≥1} Ij . Then I is an ideal, which is finitely generated, by Hilbert’s basis theorem, say I = ⟨g1 , . . . , gs ⟩. With n = min{ j ≥ 1: g1 , . . . , gs ∈ Ij }, we then have In = In+1 = · · · = I. ✷
Lemma 21.22 says that any Gröbner basis G for I is in fact a basis of I in the
ring theoretic sense, which means that ⟨G⟩ = I. With the convention that ⟨ ⟩ =
⟨Ø⟩ = {0}, Hilbert’s basis theorem implies the following.
C OROLLARY 21.26.
Every ideal I in R = F[x1 , . . . , xn ] has a Gröbner basis.
In Example 21.21, G is not a Gröbner basis for I, but {g, h, x² , 2xy, −2y² + x} is,
as we will see below.
We now want to show that division with remainder by a Gröbner basis is a valid
ideal membership test. Throughout this section, we assume some monomial order
on R.
(i) f − r ∈ I ,
P ROOF. Existence follows from Theorem 21.12. For the uniqueness, we suppose
that f = h1 + r1 = h2 + r2 with h1 , h2 ∈ I and no term of r1 or r2 divisible by any
of lt(G). Then r1 − r2 = h2 − h1 ∈ I, and lt(r1 − r2 ) is divisible by some lt(g) with
g ∈ G, by Lemma 21.15. Hence r1 − r2 = 0. ✷
f rem G = r ∈ R.
T HEOREM 21.28.
Let G be a Gröbner basis for the ideal I ⊆ R with respect to a monomial order ≺,
and f ∈ R. Then f ∈ I if and only if f rem G = 0.
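In sympy, this membership test can be carried out with groebner and reduced. The sketch below assumes that the ideal of Example 21.21 is I = ⟨x³ − 2xy, x²y − 2y² + x⟩ under ≺grlex, as suggested by the S-polynomial computed just below; the calls themselves are standard sympy, not the book's code.

# Ideal membership via a Groebner basis and division with remainder.
from sympy import symbols, groebner, reduced

x, y = symbols('x y')
g, h = x**3 - 2*x*y, x**2*y - 2*y**2 + x
G = groebner([g, h], x, y, order='grlex')

def in_ideal(f):
    _, r = reduced(f, list(G), x, y, order='grlex')
    return r == 0

print(list(G))          # the reduced Groebner basis of I
print(in_ideal(x**2))   # True:  x**2 = -(y*g - x*h) lies in I
print(in_ideal(x))      # False: x is already reduced modulo G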
Clearly S(h, g) = −S(g, h), and since x^γ / lt(g), x^γ / lt(h) ∈ R, we have S(g, h) ∈
⟨g, h⟩. In Example 21.21, we have α = (3, 0), β = (2, 1), γ = (3, 1), and

S(g, h) = (x³y/x³ ) · g − (x³y/(x²y)) · h = −x² .
The following lemma says that when cancellation of leading terms occurs in a
linear combination of polynomials in G, it necessarily comes from S-polynomials.
f = ∑_{1≤i≤s} ci x^{αi} gi ∈ R,   (8)
and hence mdeg(S(gi , gj )) ≼ γij , by Lemma 21.8. Since the leading terms in (7)
cancel, we have

g = f − c1 x^{δ−γ12} S(g1 , g2 )
  = c1 x^{α1} g1 + c2 x^{α2} g2 + ∑_{3≤i≤s} ci x^{αi} gi − c1 x^{δ−γ12} ( (x^{γ12}/lt(g1 )) g1 − (x^{γ12}/lt(g2 )) g2 )
  = c1 (x^{α1} − x^{δ−mdeg(g1 )} ) g1 + (c2 x^{α2} + c1 x^{δ−mdeg(g2 )} ) g2 + ∑_{3≤i≤s} ci x^{αi} gi
  = (c1 + c2 ) x^{α2} g2 + ∑_{3≤i≤s} ci x^{αi} gi ,
g = ∑_{2≤i<j≤s} cij x^{δ−γij} S(gi , gj ),
T HEOREM 21.31.
A finite set G = {g1 , . . . , gs } ⊆ R is a Gröbner basis of the ideal ⟨G⟩ if and only if S(gi , gj ) rem (g1 , . . . , gs ) = 0 for all 1 ≤ i < j ≤ s.
P ROOF. “=⇒” follows from Theorem 21.28, since S(gi , g j ) ∈ I = hGi for all i, j.
For the reverse direction, we let f ∈ I \ {0}, and have to show that lt( f ) ∈ hlt(G)i.
We write
f = ∑_{1≤i≤s} qi gi ,     δ = max≺ {mdeg(qi gi ): 1 ≤ i ≤ s},   (11)
with all qi ∈ R. Then mdeg( f ) ≼ δ . If strict inequality holds, then some cancella-
tion of leading terms occurs in (11), and
f ∗ = ∑_{1≤i≤s, mdeg(qi gi )=δ} lt(qi ) gi
S( f1 , f3 ) rem ( f1 , f2 , f3 ) = −2xy = f4 ,
[Figure 21.5: The twisted cubic, depicted as the intersection of the two surfaces defined by g1 = y − x² and g2 = z − x³.]

S( f1 , f4 ) = y · f1 − (−(1/2)x² ) · f4 = −2xy² = y · f4 ,
S( f2 , f3 ) = 1 · f2 − (−y) f3 = −2y² + x.
1. G ←− { f1 , . . . , fs }
2. repeat
3.     S ←− Ø
       order the elements of G somehow as g1 , . . . , gt
       for i = 1, . . . , t − 1 and j = i + 1, . . . , t do
4.         r ←− S(gi , gj ) rem (g1 , . . . , gt )
           if r ≠ 0 then S ←− S ∪ {r}
5.     if S = Ø then return G else G ←− G ∪ S
T HEOREM 21.34.
Algorithm 21.33 works correctly as specified.
P ROOF. First we show the correctness assuming that the procedure terminates. At
any stage of the algorithm, the set G in step 2 is a basis of I and f1 , . . . , fs ∈ G,
since this is true initially and only elements of I, namely the remainders of S-
polynomials of gi , g j ∈ I on division by elements of I, are added to G during the
algorithm. If the algorithm terminates, the remainders of all the S-polynomials on
division by G are zero, and G is a Gröbner basis by Theorem 21.31.
It remains to show that the algorithm terminates. If G and G∗ correspond to
successive passes through step 2, then G∗ ⊇ G and ⟨lt(G∗ )⟩ ⊇ ⟨lt(G)⟩. Hence the
ideals ⟨lt(G)⟩ in successive passes through step 2 form an ascending chain, which
stabilizes by the ascending chain condition of Corollary 21.24. Thus, after a finite
number of steps we have ⟨lt(G∗ )⟩ = ⟨lt(G)⟩. We claim that then G = G∗ . So let
g, h ∈ G, and r = S(g, h) rem G. Then r ∈ G∗ and either r = 0 or lt(r) ∈ ⟨lt(G∗ )⟩ =
⟨lt(G)⟩, and from the definition of the remainder we conclude that r = 0. ✷
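As an illustration of Algorithm 21.33 (and not an efficient or complete implementation), here is a minimal Python/sympy sketch of its main loop. It assumes the running example f1 = x³ − 2xy, f2 = x²y − 2y² + x inferred from the S-polynomial computations above; sympy's own groebner is printed for comparison (it returns the reduced basis, which generates the same ideal).

from sympy import symbols, expand, reduced, groebner, LT, lcm

x, y = symbols('x y')
gens, order = (x, y), 'grlex'

def s_poly(g, h):
    # S-polynomial of g and h (up to a harmless constant factor)
    lg, lh = LT(g, *gens, order=order), LT(h, *gens, order=order)
    m = lcm(lg, lh)
    return expand((m / lg) * g - (m / lh) * h)

def buchberger(F):
    G = list(F)
    while True:
        new = []
        for i in range(len(G)):
            for j in range(i + 1, len(G)):
                s = s_poly(G[i], G[j])
                if s == 0:
                    continue
                _, r = reduced(s, G, *gens, order=order)
                if r != 0 and all(expand(r - t) != 0 for t in new):
                    new.append(r)
        if not new:
            return G
        G += new

f1, f2 = x**3 - 2*x*y, x**2*y - 2*y**2 + x   # assumed running example
print(buchberger([f1, f2]))
print(groebner([f1, f2], x, y, order=order))  # reduced basis, same ideal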
C OROLLARY 21.35.
The following problems are solvable using Gröbner bases:
T HEOREM 21.38.
Every ideal has a unique reduced Gröbner basis.
P ROOF. We first show the existence. Repeatedly applying Lemma 21.36 if neces-
sary, we may start with a minimal Gröbner basis G = {g1 , . . . , gs }. For 1 ≤ i ≤ s,
we then set
hi = gi rem {h1 , . . . , hi−1 , gi+1 , . . . , gs }.
Induction on i proves that lt(g j ) = lt(h j ) and h j is reduced with respect to Gi =
{h1 , . . . , hi , gi+1 , . . . , gs } for 0 ≤ j ≤ i ≤ s, and finally Gs = {h1 , . . . , hs } is a reduced
Gröbner basis.
Now suppose that G and G∗ are reduced Gröbner bases of I. We claim that
lt(G) = lt(G∗ ). For g ∈ G, we have lt(g) ∈ lt(G) ⊆ ⟨lt(G)⟩ = ⟨lt(I)⟩ = ⟨lt(G∗ )⟩, so there is some g∗ ∈ G∗
such that lt(g∗ ) | lt(g), by Lemma 21.15. By a symmetric argument, there exists
a g∗∗ ∈ G such that lt(g∗∗ ) | lt(g∗ ). Since G is minimal, we have lt(g) = lt(g∗ ) =
lt(g∗∗ ) ∈ lt(G∗ ), and lt(G) ⊆ lt(G∗ ). Similarly, lt(G∗ ) ⊆ lt(G), which proves the
claim.
For a given g ∈ G, let g∗ ∈ G∗ be such that lt(g) = lt(g∗ ). Both G and G∗
are reduced, and hence no monomial in g − g∗ ∈ I is divisible by any element of
lt(G) = lt(G∗ ). Thus g − g∗ = (g − g∗ ) rem G = 0 since g − g∗ ∈ I, whence g ∈ G∗ ,
G ⊆ G∗ , and by a symmetric argument, also G∗ ⊆ G. ✷
At the beginning of this section, we saw how several polynomials may have
to be added to form a Gröbner basis. How many? In Section 21.7, we will learn
the rather devastating answer: sometimes doubly exponentially many, and their de-
grees may be doubly exponentially large (in the number of variables). It is not easy
to say how many steps Buchberger’s algorithm takes, but for such huge outputs it
uses at least exponential space.
Both Gaussian elimination and Euclid’s algorithm for gcds in F[x] are special
cases of Buchberger’s algorithm (see Exercise 21.24 for the former).
f1 = uy − vx − v, f2 = uy − vx + 2v − y
S( f1 , f3 ) rem ( f1 , f2 , f3 ) = ( (1/3)uy² − v²x − v² ) rem ( f1 , f2 , f3 ) = 0,
S( f2 , f3 ) rem ( f1 , f2 , f3 ) = ( (1/3)uy² − v²x + 2v² − vy ) rem ( f1 , f2 , f3 ) = 0,
[Figure 21.6: The twisted cubic in explicit, parametrized form.]
f4 = f1 rem f3 = uy − (1/3)xy − (1/3)y,        f3 rem f4 = f3 ,
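The argument can be checked mechanically: with f1 and f2 as above and g2 = 3u − x − 1 from Example 21.2, the Notes to this chapter record that g2 ∉ I = ⟨f1, f2⟩ while yg2 ∈ I. The following sympy sketch verifies this; any monomial order works for the membership test, and the lex order used here is just a choice.

from sympy import symbols, groebner, reduced

u, v, x, y = symbols('u v x y')
f1 = u*y - v*x - v
f2 = u*y - v*x + 2*v - y
g2 = 3*u - x - 1

G = groebner([f1, f2], u, v, x, y, order='lex')

def rem(f):
    return reduced(f, list(G), u, v, x, y, order='lex')[1]

print(rem(g2))       # nonzero: g2 is not in I
print(rem(y * g2))   # 0: y*g2 lies in I  (indeed y*g2 = (2 - x)*f1 + (x + 1)*f2)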
Implicitization. Let f1 , . . . , fn ∈ F[t1 , . . . ,tm ], and suppose that the affine alge-
braic variety V ⊆ F n is given in parametrized form by
x1 = f1 (t1 , . . . , tm ),
      ⋮
xn = fn (t1 , . . . , tm ),
so that V is “explicitly” described as V = {a ∈ F n : ∃b ∈ F m a = ( f1 (b), . . . , fn (b))}.
The task is now to find polynomials g1 , . . . , gs ∈ F[x1 , . . . , xn ] such that V has the
“implicit” representation V = V (I), where I = hg1 , . . . , gs i. (More precisely, V (I)
will equal the “closure” of V .)
E XAMPLE 21.32 (continued). The twisted cubic C from Example 21.32 can be
parametrized by
x = t,   y = t² ,   z = t³ .
An implicitization for C is g1 = y − x² , g2 = z − x³ . The curve is illustrated in
Figures 21.5 and 21.6 on pages 609 and 613, respectively. The latter corresponds
to the explicit representation of C (the plot was generated by letting the parameter
t run through the interval [−1, 1]), while the former depicts the implicit form as
the intersection of the two surfaces defined by g1 and g2 ; this picture is somehow
more informative. ✸
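This elimination can be reproduced with sympy: with a lex order in which the parameter t is the largest variable, the Gröbner basis elements free of t describe the curve implicitly. The sketch below is an illustration, not the book's algorithm.

from sympy import symbols, groebner

t, x, y, z = symbols('t x y z')
G = groebner([x - t, y - t**2, z - t**3], t, x, y, z, order='lex')
implicit = [g for g in G if not g.has(t)]
print(implicit)   # expected to generate the same ideal as y - x**2, z - x**3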
h(x, 2) = x² − 5x + 6 = (x − 2)(x − 3) = 0,
for x, we find the two common zeroes (2, 2) and (3, 2) of f and g in C². The
other four intersection points from (2) can be obtained in a similar fashion, by
substituting y = 3 or y = (1 ± √15 i)/2 in h = p = 0 and solving for x.
T HEOREM 21.40.
The problem of finding a reduced Gröbner basis is EXPSPACE-complete.
f = ∑_{1≤i≤s} qi fi .   (12)
She gave a doubly exponential bound on the degrees of these qi ’s; see also Mayr &
Meyer (1982). Mayr & Ritscher (2010) showed that the degrees of the polynomials
in a reduced Gröbner basis for an ideal in F[x1 , . . . , xn ] of dimension r are at most
2 ( d^{n−r}/2 + d )^{2^r} ,   (13)
when deg fi ≤ d for all i. This bound depends neither on the number of polynomials defining the ideal nor on their coefficients; it is polynomial in the degree, exponential in the codimension, and doubly exponential in the dimension. They also show a lower bound of a similar form.
If an ideal is such that the degrees in its reduced Gröbner basis come close to
the doubly exponential bound (13), one might think that the output cannot even be
written down in exponential space, and similarly for the certificate (12) of ideal
membership. However, the model of space-bounded computations is such that one
may take doubly exponential time to write a result of doubly exponential length
on a special output tape, all the while using only singly exponential work space
(Section 25.8).
Hilbert’s famous Nullstellensatz says the following. If F is algebraically closed
(say, F = C), f , f1 , . . . , fs ∈ F[x1 , . . . , xn ], and f (a) = 0 for all a ∈ F n with f1 (a) =
· · · = fs (a) = 0, then there exists e ∈ N with f e ∈ h f1 , . . . , fs i. In particular, the
variety V ( f1 , . . . , fs ) is empty if and only if 1 ∈ h f1 , . . . , fs i. If this is the case, then
1 will appear in any Gröbner basis of f1 , . . . , fs . This provides a test of whether
V ( f1 , . . . , fs ) is empty. For this particular instance of IM, better results are available:
one can always choose e and the degrees in (12) to be simply exponential, and this
implies that the problem is in PSPACE.
The worst-case cost of Buchberger’s algorithm is still unknown today, but the
important result of Theorem 21.40 completely settles the question of the “best”
worst-case cost of any algorithm for Gröbner bases, and provides a lower bound
for Buchberger’s. It gives rise to the pessimistic view that these methods for poly-
nomial ideals are not useful in practice, except for rather small cases.
However, it is not the full story. The inputs used for the lower bound are more
combinatorial than geometric in nature, while most of the problems that people
try to solve derive from geometric tasks. The algorithm of Kühnle & Mayr (1996)
uses essentially the same time for all polynomials of a given degree and number of
variables and thus is uniformly impractical, while one might hope that “natural”
geometric problems are easier to solve than “combinatorial” ones.
Notes. 21.1. The papers in Eisenbud & Robbiano (1993) present the state of the art at
that time. A good reference for this chapter is Cox, Little & O’Shea (1997), which we have
followed closely in our exposition. Cox, Little & O’Shea (1998) discuss more advanced
topics.
21.3. With the passage of time, the proof of Hilbert’s basis theorem has become quite
simple. But his 1890 paper was a milestone, solving this long–standing open question, and
furthermore introducing the Hilbert function of an ideal and showing that invariant rings
are finitely generated.
21.5. Buchberger (1965,1970,1976,1985,1987) explains his Gröbner basis method, gives
many references, and puts it in the context of more general questions about grammars and
term writing systems. Two further important contributions of Bruno Buchberger are the
founding of the Journal of Symbolic Computation in 1985 and of the Research Institute for
Symbolic Computation (RISC) in Linz, Austria.
21.6. Buchberger & Winkler (1998) contains a variety of tutorials on applications of Gröb-
ner bases. With slight modifications, the approach described in the text works for a rich
class of geometric theorems (see Cox, Little & O’Shea 1997, §6.4, and Wu 1994).
In Section 21.6, we found that g2 6∈ I and yg2 ∈ I. Thus we may conclude that g2 (x, y) =
0 if (x, y) ∈ V (I) and y 6= 0. This can be phrased as an ideal membership property via
Rabinowitsch’s trick (1930) of adding 1 − yz to I, where z is a new indeterminate. This
ensures that the value of y is nonzero, and g2 = 3z · f4 + g2 · (1 − yz) ∈ ⟨ f3 , f4 , 1 − yz⟩.
The reverse question of transforming an implicit representation of a variety into an ex-
plicit one has in general no solution; it can be solved only for the “rational varieties” of
genus zero. However, it can be solved “near a smooth point” in general, if we allow ap-
propriate power series for the fi ’s, as in the implicit function theorem of calculus. The
state of the art about algorithms on parametric varieties is presented in the twelve articles
of a Special Issue of the Journal of Symbolic Computation (Hoffman, Sendra & Winkler
1997).
It can be shown that for zero-dimensional ideals I ⊆ F[x1 , . . . , xn ], so that V (I) is finite,
a Gröbner basis with respect to lexicographic order x1 ≺ x2 ≺ · · · ≺ xn always contains a
“triangular” subset g1 , . . . , gn , such that gi ∈ F[x1 , . . . , xi ] and lm(gi ) is a power of xi , for
1 ≤ i ≤ n (see, for example, Becker & Weispfenning (1993), Theorem 6.54, or Cox, Little
& O’Shea (1997), §3.1 and 3.2).
21.7. Yap (1991) gave an improvement of Mayr & Meyer’s result. Brownawell (1987)
and Kollár (1988) proved that one can always choose the Nullstellensatz exponent e to
be simply exponential; see also Amoroso (1989). Caniglia, Galligo & Heintz (1989),
Lakshman (1990), and Berenstein & Yger (1990) showed that in some important cases
the qi ’s in (12) have only singly exponential degree: for zero-dimensional varieties, and
for complete intersections.
Giusti (1984), Möller & Mora (1984), Bayer & Stillman (1988), and Dubé (1990)
proved upper bounds for the elements of a reduced Gröbner basis. Huynh (1986) showed
a lower bound on the number and degrees of polynomials in Gröbner bases. Mayr (1997)
gives a survey of complexity results, more references, and discusses applications such as
those in Section 24.2.
Bayer & Stillman (1988) considered an invariant m associated to any multivariate ideal,
called the Castelnuovo-Mumford regularity . This number seems to be fairly small for
many natural geometric problems, but is exponential in the number of variables for the
combinatorial problems of Mayr & Meyer (1982). Furthermore, Bayer & Stillman prove
that, after a generic change of coordinates, the polynomials in a Gröbner basis with respect
to the graded reverse lexicographic order ≺grevlex have degree at most m. This gives rise to
a bit of hope that the method might be able to deal successfully with interesting geometric
problems, and to a practical recommendation in favor of ≺grevlex .
Almost all computer algebra systems contain some routines for Gröbner bases; Dave
Bayer’s system MACAULAY focuses particularly on this problem, and SINGULAR is an-
other powerful package in this area. The research projects POSSO and FRISCO of the Eu-
ropean Community have produced substantial software and a library of benchmark prob-
lems. Efficient algorithms and software are a highly active area of research. Three topics
of particular interest are modular algorithms (Traverso 1988), selection strategies for S-
polynomials (Giovini, Mora, Niesi, Robbiano & Traverso 1991), and conversion between
Gröbner bases with respect to different orders (Faugère, Gianni, Lazard & Mora 1993).
Finally, we mention that there are other methods of dealing with certain geometric prob-
lems, based on cylindrical algebraic decomposition (Collins 1975), elimination theory, also
using arithmetic circuits as a data structure (Chistov & Grigor’ev 1984, Caniglia, Galligo
& Heintz 1988, Fitchas, Galligo & Morgenstern 1990, Giusti & Heintz 1991), u-resultants
(Macaulay 1902, 1916, 1922, Canny 1987), and characteristic sets (Ritt 1950, Wu 1994,
Gallo & Mishra 1991).
The important subject of computational real algebraic geometry started with Tarski
(1948). Major progress was made by Collins (1975), and later by Ben-Or, Kozen & Reif
(1986), Fitchas, Galligo & Morgenstern (1987), Grigor’ev (1988), Canny (1988), Renegar
(1992a, 1992b, 1992c), and others. See the surveys of Heintz, Recio & Roy (1991) and
Renegar (1991) for references and applications; our cyclohexane example in Section 24.4
can be viewed as such an application.
Exercises.
21.1 Let F be a field and x, y indeterminates. Prove that the two ideals ⟨x, y⟩ and ⟨gcd(x, y)⟩ in
F[x, y] are distinct, and conclude that F[x, y] is not Euclidean. Hint: Exercise 3.17.
21.2 Let F be a field. Prove that the ideals I = ⟨x + xy, y + xy, x² , y² ⟩ and J = ⟨x, y⟩ in F[x, y] are
identical. Your proof should also work if char F = 2. Hint: It is sufficient to prove that the generators
of each ideal belong to the other ideal.
21.3 Prove (3). Hint: Theorem 4.11.
21.4∗ Besides the usual Cartesian coordinates (u, v) with u, v ∈ R, we represent the points of the
plane by polar coordinates (r, ϕ) with r ∈ R and 0 ≤ ϕ < 2π. This representation is not unique;
for example, when ϕ < π then (r, ϕ) and (−r, ϕ + π) represent the same point. We obtain the polar
coordinates from the Cartesian ones by the formulas u = r cos ϕ, and v = r sin ϕ. Now consider the
curve C = {(r, ϕ) : 0 ≤ ϕ < 2π and r = sin 2ϕ} ⊆ R² , and let I = ⟨(x² + y² )³ − 4x²y² ⟩ ⊆ R[x, y].
(i) Create a plot of C.
(ii) Using the addition formulas for sine and cosine, show that C ⊆ V (I).
(iii) Prove that also the reverse inclusion V (I) ⊆ C holds (be careful with the signs).
21.5∗ Let F be a field and n ∈ N. For a subset M ⊆ F n , we define the ideal of M by
using ≺ = ≺grlex with x ≺ y ≺ z. Compare your output to the Gröbner basis that MAPLE computes
with a different order.
21.24∗ Let F be a field, n ∈ N, and A = (ai j )1≤i, j≤n ∈ F n×n a square matrix. Moreover, let GA =
{∑1≤ j≤n ai j x j : 1 ≤ i ≤ n} ⊆ F[x1 , . . ., xn ] be the set of linear polynomials corresponding to the rows
of A and IA = hGA i. Then V (GA ) = V (IA ) is equal to ker A, the set of solutions v ∈ F n of the linear
system Av = 0. Prove:
(i) ILA = IA if L ∈ F n×n is nonsingular.
(ii) Assume that there exists a nonsingular matrix L ∈ F n×n such that
U = LA = ( Ir   V )
          ( 0    0 ) ,
where r is the rank of A, Ir is the r × r identity matrix, and V ∈ F r×(n−r) (this means that no column
exchange is necessary when applying Gaussian elimination to A). Prove that GU is a reduced Gröbner
basis of IA with respect to any monomial order ≺ such that x1 ≻ x2 ≻ · · · ≻ xn .
(iii) What is the reduced Gröbner basis of IA if A is nonsingular, with respect to an arbitrary
monomial order?
21.25∗ You are to solve the following nonlinear optimization problem: Determine all maxima and
minima of the polynomial f = x2 y − 2xy + y + 1 on the unit circle S = {(u, v) ∈ R 2 : g(u, v) = 0},
where g = x2 + y2 − 1. In numerical analysis, such a problem is solved with the aid of Lagrange
multipliers: if ∇ f = ( fx , fy ) and ∇g = (gx , gy ) are the Jacobians of f and g, respectively, where
fx = ∂ f /∂x and fy , gx , gy , are defined analogously, then the equality ∇ f = λ∇g holds at a local
maximum or minimum of f on S for some λ ∈ R.
(i) Set up the system of polynomial equations
1 Beauty, I hear you ask; do not the Graces flee where integrals stretch forth their necks?
2 It may be said that the notions of differential quotient and integral, whose origin certainly goes back to
Archimedes, were essentially introduced into science by the investigations of Kepler, Descartes, Cavalieri, Fer-
mat, and Wallis. [. . . ] They had not yet noticed that differentiation and integration are inverse operations; this
capital discovery belongs to Newton and Leibniz.
22
Symbolic integration
L EMMA 22.2. In a differential algebra (R, D), the usual properties hold for all
f , g ∈ R:
(i) D(1) = 0,
(ii) D is R0 –linear: D(a f + bg) = aD( f ) + bD(g) for a, b ∈ R0 ,
(iii) D( f /g) = (D( f )g − f D(g))/g²   if g is a unit,
(iv) D( f^n ) = n f^{n−1} D( f ) for n ≥ 1,
(v) ∫( f D(g)) = f g − ∫(D( f )g)   (integration by parts).
E XAMPLE 22.3. (i) D(a) = 0 for all a ∈ R. This is the trivial derivative on R,
with R0 = R.
(ii) R = Q(x), D(x) = 1, and D(a) = 0 for all a ∈ Q. This gives the usual deriva-
tive: D(∑i fi x^i ) = ∑i i fi x^{i−1} when all fi are rational numbers. Here, R0 = Q (Exer-
cise 22.4), and polynomials are easily integrated as

∫ ∑i fi x^i = ∑i ( fi /(i + 1)) x^{i+1} . ✸
L EMMA 22.4. The rational function 1/x ∈ Q(x) has no rational integral:
∀ f ∈ Q(x)   f ′ ≠ 1/x.
See Exercise 22.5 for a proof. The lemma motivates the need for domain exten-
sions when looking for integrals: the usual derivation on Q(x) is not surjective, so
that we need logarithms.
We note that log is in general a relation, not a function, since we may add an
arbitrary constant to f and get another integral of D(u)/u.
P ROOF. Splitting off the polynomial part is one division with remainder, taking
O(M(n)) operations in F. The squarefree decomposition of g can be computed us-
ing O(M(n) log n) operations, by Theorem 14.23. Using fast Chinese remaindering
(Section 10.3), the partial fraction decomposition (2) can again be computed with
O(M(n) log n) arithmetic operations (Exercise 10.18). Let di = deg gi for all i. To
analyze the cost of Hermite reduction, let 1 ≤ j ≤ i ≤ m. Then one Hermite step (3)
takes O(M(di ) log di ) operations for computing s and t, plus O(di ) operations for
updating hi, j−1 , in total O(i M(di ) log di ) operations per gi . Now
∑_{1≤i≤m} i M(di ) log di ≤ (log n) M( ∑_{1≤i≤m} i di ) ≤ M(n) log n,
It is sufficient to use the fast Extended Euclidean Algorithm only once per gi to
compute s∗ ,t ∗ ∈ F[x] such that s∗ gi +t ∗ g′i = 1, and then for each j the polynomials
s,t can be obtained from s∗ ,t ∗ and hi j as described in Section 4.5, using only
O(M(di )) arithmetic operations. This does not affect the asymptotic time bound
but is a practical improvement.
A different approach, due to Horowitz (1971), is the method of undetermined
coefficients. Splitting off the polynomial part if necessary, we may assume that
deg f < deg g. All denominators arising via (3) outside an integral divide the product g2 g3² · · · gm^{m−1} , and hence so does the denominator d in (1). Thus we may take
the latter polynomial as d, the squarefree part (Section 14.6) of g as b, and plug
a of degree deg b − 1 and c of degree deg g − deg b − 1 with unknown coefficients
into (1). This yields a linear system of equations for the coefficients of a and c,
which has a coefficient matrix with at most n rows and columns, and it can be
solved in time O(n3 ) using Gaussian elimination. Hermite reduction, however, is
asymptotically faster by nearly two orders of magnitude.
E XAMPLE 22.6 (continued). The integral in (iii) has been expressed without any
algebraic extension of the field of constants Q, while it is not clear how to do that
with the integral in (ii). In fact, from what we will prove below, it is impossible to
write the latter integral as a sum of logarithms with rational arguments. ✸
This example shows that it may be unwise to compute the integral using the
complete partial fraction decomposition. The following method computes the in-
tegral with as small an algebraic extension of the field of constants as possible.
(ii) The polynomial r = resx (b, a − yb′ ) ∈ F[y] splits over E in linear factors,
c1 , . . . , cl are precisely the distinct roots of r, and vi = gcd(b, a − ci b′ ) for
1 ≤ i ≤ l . Here, resx denotes the resultant with respect to the variable x
(Chapter 6).
a/b = ∑_{1≤i≤l} ci vi′ /vi ,
or equivalently,
a · ∏_{1≤j≤l} vj = b · ∑_{1≤i≤l} ci ui vi′ ,
where ui = ∏1≤ j≤l, j6=i v j . We claim that b = ∏1≤ j≤l v j and a = ∑1≤i≤l ci ui v′i .
Since a and b are coprime, b divides ∏1≤ j≤l v j . On the other hand, v j divides
b ∑_{1≤i≤l} ci ui vi′ and vj | ui for i ≠ j, whence vj | b · cj uj vj′ . But cj ∈ E is nonzero,
gcd(v j , u j ) = 1, and gcd(v j , v′j ) = 1, so that v j | b for 1 ≤ j ≤ l. By the relative
primality of the v j , we also have ∏ j v j | b. This implies that b = ∏1≤ j≤l v j , since b
and all the v j are monic, and a = ∑ ci ui v′i , as claimed.
Now Lemma 14.22 yields
gcd(b, a − cb′ ) =  vj   if c = cj for some j ∈ {1, . . . , l},
                    1    otherwise,
by Lemma 6.25. Thus r splits over E, and {c1 , . . . , cl } are precisely the distinct
roots of r.
(ii) =⇒ (i): Let K be the splitting field of b over E, and λ1 , . . . , λn ∈ K pairwise
distinct with b = ∏_{1≤k≤n} (x − λk ). Since b is squarefree, b′ (λk ) ≠ 0 for 1 ≤ k ≤ n.
For c ∈ K, we have
for 1 ≤ k ≤ n. Both a and ∑1≤i≤l ci ui v′i have degrees less than n and interpolate the
same values at the n points λ1 , . . . , λn , and thus are equal. Hence
( ∑_{1≤i≤l} ci log vi )′ = ∑_{1≤i≤l} ci vi′ /vi = (1/b) · ∑_{1≤i≤l} ci ui vi′ = a/b. ✷
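For a concrete instance, take a = 1 and b = x² − 2, which appears to be the situation of Example 22.6 (the zeroes ±1/(2√2) that appear a little further below arise exactly here). The following sympy sketch computes the Rothstein–Trager resultant and, for comparison, sympy's own antiderivative; it is an illustration only.

from sympy import symbols, resultant, diff, roots, integrate

x, y = symbols('x y')
a, b = 1, x**2 - 2

r = resultant(b, a - y*diff(b, x), x)
print(r, roots(r, y))     # 1 - 8*y**2, roots +-sqrt(2)/4 = +-1/(2*sqrt(2))

# sympy's integrator returns an antiderivative in which these constants
# appear as the coefficients of the logarithms (up to equivalent rewriting):
print(integrate(a/b, x))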
Although Theorem 22.8 keeps the degree of the algebraic extension as small as
possible, it does not avoid gcd computations in such an algebraic extension. The
following observation, however, will lead to a purely rational algorithm.
(i) deg vi = e.
(ii) Let w(x, y) ∈ F(y)[x] denote the remainder of degree e in the monic Euclid-
ean Algorithm for b and a − yb′ in F(y)[x]. Then vi = w(x, ci ).
1. h ←− f rem g,   ∑i qi x^i ←− f quo g,   U ←− ∑i (qi /(i + 1)) x^{i+1}

3. compute the partial fraction decomposition h/g = ∑_{1≤i≤m} ∑_{1≤j≤i} hij /gi^j , with hij ∈
F[x] such that deg hij < deg gi for 1 ≤ j ≤ i ≤ m
4. { Hermite reduction }
   V ←− 0
   for i = 2, . . . , m do
       for j = i, i − 1, . . . , 2 do
           compute s, t ∈ F[x] such that sgi + tgi′ = hij and deg s, deg t < deg gi , using Theorem 4.10
           V ←− V − t/(( j − 1)gi^{j−1} ),   hi,j−1 ←− hi,j−1 + s + t′/( j − 1)
5. W ←− 0
   for i = 1, . . . , m do
6.     { Lazard-Rioboo-Trager method }
       a ←− hi1 ,   b ←− gi ,   r ←− resx (b, a − yb′ )
       call Yun’s algorithm 14.21 to compute the squarefree decomposition r = lc(r) ∏_{1≤e≤d} re^e of r

8. return U + V + W
left as is, without further evaluation (for example, by using the RootOf construct in MAPLE).
W = ∑_{r1 (γ)=0} γ log(x − 1/(2γ)),

and if we plug in the two zeroes ±1/(2√2) ∈ R of r1 for γ , we arrive at the same
result as in Example 22.6. ✸
T HEOREM 22.11.
Algorithm 22.10 works correctly as specified. If f and g are of degree at most n,
its running time is O(n M(n) log n) or O∼ (n2 ) operations in F .
P ROOF. Correctness follows from the discussion preceding Theorem 22.7 and
from Theorems 22.8 and 22.9.
By Theorem 22.7, steps 1 through 4 take O(M(n) log n) operations. Let i ∈
{1, . . . , m} and di = deg gi . Exercise 6.12 implies that deg r = di , and the cost for
computing r is O(di M(di ) log di ), by Corollary 11.21. Within the same time bound,
we can in fact compute all the we with deg re > 0 in step 7, by Exercise 11.9, since
∑_{1≤e≤d, deg re >0} e ≤ ∑_{1≤e≤d} e · deg re = deg r = di .
Now ∑1≤i≤m di ≤ n, and the overall cost for the loop 5 is O(n M(n) log n) operations
in F. This dominates the cost for the other steps, and the claim follows. ✷
f ′ = (τ g)′ = τ ′ g + τ g′ = (τ ′ + στ )g,
and this equals g if and only if τ satisfies the first order differential equation
τ ′ + στ = 1. (4)
We note that only rational functions occur in (4). Thus we have eliminated
all hyperexponential elements and reduced the original task to a purely rational
problem. In what follows, we discuss how to solve the differential equation (4).
We write σ = a/b and τ = u/v, with nonzero polynomials a, b, u, v ∈ F[x] such
that b and v are monic and gcd(a, b) = gcd(u, v) = 1. Then (4) becomes
τ ′ + στ = (u′ v − uv′ )/v² + au/(bv) = 1.
After multiplying by bv², we obtain the equivalent polynomial differential equation

b(u′ v − uv′ ) + auv = bv².   (5)
Conversely, any solution u, v ∈ F[x] of (5) for given a, b ∈ F[x] yields a solution of
our hyperexponential integration problem by setting f = gu/v.
The algorithm proceeds in two phases. In the first phase, we find a multiple V
of any possible denominator v. Once we know v (or a multiple of it), then (5) is
just a system of linear equations in the coefficients of u. It is easily solved if we
can bound the degree of u; such a bound is calculated in Lemma 22.18 below.
For the first phase, we want to determine a suitable multiple of v. Let v0 , v1 ∈
F[x] be such that v = v0 ·gcd(v, v′ ) and v′ = v1 ·gcd(v, v′ ). Dividing (5) by gcd(v, v′ ),
we find
bu′ v0 − buv1 + auv0 = bvv0 . (6)
We see that v0 divides buv1 , and since gcd(u, v) = 1 = gcd(v0 , v1 ), it divides b.
Thus we can divide (6) by v0 and obtain
bu′ + ( a − (b/v0 ) v1 ) u = bv.
Again, since v0 divides b and is coprime to u, we conclude that v0 divides a −
(b/v0 )v1 .
Now we let (h1 , . . . , hm ) be the squarefree decomposition of v, as defined in
Section 14.6, with m ∈ N≥1 and monic squarefree and pairwise coprime polyno-
mials h1 , . . . , hm ∈ F[x] such that v = h1 h2² · · · hm^m and hm ≠ 1. Then v0 = h1 · · · hm
and v1 = ∑1≤i≤m ih′i v0 /hi , by Exercise 14.26. Letting 1 ≤ i ≤ m and computing
modulo hi , we find

(b/v0 ) v1 ≡ (b/v0 ) · i hi′ (v0 /hi ) = i (b/hi ) hi′ ≡ i ( (b/hi ) hi′ + (b/hi )′ hi ) = i b′ mod hi .
Now hi divides b, by the above, and both left hand summands in

( a − (b/v0 ) v1 ) + ( (b/v0 ) v1 − ib′ ) = a − ib′ ,
so that hi divides gcd(b, a − ib′ ), for 1 ≤ i ≤ m. This leads to the following algo-
rithm for computing a multiple of v.
We note that the resultant in step 1 is the same as in algorithm 22.10. However,
here we do not need its complete factorization into irreducible polynomials, but
only its positive integral roots.
T HEOREM 22.15.
Algorithm 22.14 works correctly as specified. More precisely, if u, v are coprime
polynomials in F[x] solving (5), v 6= 0 is monic, and (h1 , . . . , hm ) is the squarefree
decomposition of v, then m ≤ d and hi divides Hi for 1 ≤ i ≤ m.
P ROOF. We may assume that deg v ≥ 1. We have deg hm > 0, by the definition of
the squarefree decomposition. By the discussion preceding the theorem, hi divides
gcd(b, a − ib′ ) for all i. In particular, this implies that gcd(b, a − mb′ ) is noncon-
stant, and hence R(m) = resx (b, a − mb′ ) = 0. Thus m ≤ d and hi | Hi for 1 ≤ i ≤ m.
Finally, v = h1 h2² · · · hm^m divides H1 H2² · · · Hm^m Hm+1^{m+1} · · · Hd^d = V . ✷
We note that it is sufficient to perform the loop in step 2 only for those i that
are roots of the resultant R. Moreover, it is somewhat more efficient to remove Hi
from b and a − ib′ in step 2; see Exercise 22.11.
H1 = gcd(b, a − b′ ) = gcd(x² + x, x² + 2x + 1) = x + 1,
H2 = gcd(b, a − 2b′ ) = gcd(x² + x, x² ) = x,
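The computation can be replayed in sympy. In the sketch below, b = x² + x is as in the example, while a = x² + 4x + 2 is inferred from the two displayed gcds (an assumption for illustration, not a value stated explicitly here); the code follows the steps of Algorithm 22.14.

from sympy import symbols, diff, resultant, gcd, roots

x, y = symbols('x y')
a, b = x**2 + 4*x + 2, x**2 + x

R = resultant(b, a - y*diff(b, x), x)
int_roots = [int(r) for r in roots(R, y) if r.is_integer and r > 0]
d = max(int_roots, default=0)

H = {i: gcd(b, a - i*diff(b, x)) for i in range(1, d + 1)}
V = 1
for i, Hi in H.items():
    V *= Hi**i
print(H, V)    # expected: H1 = x + 1, H2 = x, V = x**2*(x + 1)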
The following example shows that d may be exponentially large in the input
size.
g′ /g = (x + n)/x ,
L EMMA 22.18. Let r, s,t,U ∈ F[x] satisfy (8), with r,U nonzero and r monic, let
m = max{deg r − 1, deg s}, and let δ ∈ F be the coefficient of xm in s. (As usual,
δ = 0 if m < 0 or deg s < m.) Moreover, let
e = deg t − m   if deg r − 1 < deg s or δ ∉ N \ {0, 1, . . . , deg t − m},   and   e = δ   otherwise.
(i) Either degU = degt − m, or else degU = δ > degt − m and deg r − 1 ≥ deg s.
In particular, degU ≤ e.
We recall that the zero polynomial has degree −∞; thus degt − m is to be inter-
preted as −∞ if t = 0. We have N ⊆ Q ⊆ F, so that any integer is also an element
of F.
P ROOF. We compare the degrees and the top coefficients in (8). Firstly, we have
degt ≤ max{deg(rU ′ ), deg(sU)}
≤ max{deg r + degU − 1, deg s + degU} = m + degU.
Let γ ∈ F denote the coefficient of x^{m+1} in r. Then the coefficient of x^{m+deg U}
in rU ′ is γ lc(U ) deg U , and the coefficient of x^{m+deg U} in sU is δ lc(U ). Thus the
coefficient of x^{m+deg U} in t is (γ deg U − δ ) lc(U ), and deg t < m + deg U if and only
if this coefficient vanishes.
If deg r − 1 < deg s, then γ = 0 and δ = lc(s) 6= 0, and hence degU = degt − m.
Otherwise, we have γ = lc(r) = 1. We conclude that degU ≥ degt − m, with strict
inequality if and only if deg r − 1 ≥ deg s and degU = δ . This proves (i), (ii), and
(iii).
To show (iv), we assume that U ∗ ∈ F[x] is another solution of (8). Then the
difference U −U ∗ satisfies the homogeneous equation r(U −U ∗ )′ −s(U −U ∗ ) = 0,
and the claim follows from (iii). ✷
Thus we can (almost) determine degU from the known polynomials r, s and
t = rV , namely degU = e = degt − m if deg r − 1 < deg s, and degU ≤ e =
max({degt − m, δ } ∩ Z) if deg r − 1 ≥ deg s. If the bound e is nonnegative and
if deg r − 1 < deg s or deg t − m ≠ δ , we set up the system of linear equations
equivalent to (8) and solve it for the unknown coefficients of U. If the system has
a solution, then τ = U/V satisfies (4), and gU/V is a hyperexponential integral
of g. Otherwise, or if e < 0, or if deg r − 1 ≥ deg s and degt − m = δ , then we know
that (4) has no rational solution τ ∈ F(x), and hence g has no hyperexponential
integral.
2. h ←− gcd(bV, bV ′ − aV ),   r ←− bV /h,   s ←− (bV ′ − aV )/h,   t ←− r · V
3. m ←− max{deg r − 1, deg s}
   let δ be the coefficient of x^m in s
   if deg r − 1 < deg s or δ ∉ N then e ←− deg t − m
   else if deg t − m = δ then return “unsolvable”
   else e ←− max{deg t − m, δ }
   if e < 0 then return “unsolvable”
4. solve the linear system corresponding to (8) for the unknown coefficients
U0 , . . . ,Ue ∈ F of U of degree at most e
   if the system is unsolvable then return “unsolvable”
   else U ←− Ue x^e + · · · + U1 x + U0

5. return U/ gcd(U,V ) and V / gcd(U,V )
The solution space of the system of linear equations in step 4 is either empty, or
has precisely one element, or turns out to be one-dimensional (Exercise 22.13). In
the latter case, we preferably take a solution leading to a numerator U of smallest
degree. The coefficient matrix is triangular, so that the system is particularly easy
to solve without Gaussian elimination, simply by back substitution, taking O(e2 )
operations in F. At most one diagonal element is zero, and this occurs only if
deg r − 1 ≥ deg s and δ ∈ N, by Lemma 22.18 (iii).
Exercise 22.12 shows that we may even obtain gcd(r,t) = 1 in step 2, by appro-
priately dividing out common factors, and perform steps 3 through 5 for a divisor
of U. This may further reduce the size of the system in step 4.
We now give some examples.
U ′ + U = x^n .   (9)
We have m = max{deg r − 1, deg s} = 0 > deg r − 1, so that we are in the first case
of Lemma 22.18, and the degree bound in step 3 is e = degt − m = n. In step 4, we
let U = Un xn + · · · +U1 x +U0 , with undetermined coefficients Un , . . . ,U0 . Plugging
this into (9), we obtain the system of linear equations
This system has a unique solution, and it satisfies U0 ≠ 0. Finally, the algorithm
returns u = U and v = V in step 5.
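For a concrete instance, the sketch below solves (9) for n = 3 by undetermined coefficients and compares the result with the antiderivative of x³ eˣ, for which U′ + U = x³ is exactly the equation one obtains; the choice n = 3 and the use of sympy are for illustration only.

from sympy import symbols, diff, solve, integrate, exp, expand

x = symbols('x')
n = 3
coeffs = symbols('U0:%d' % (n + 1))               # U0, ..., U3
U = sum(c*x**i for i, c in enumerate(coeffs))

eqs = (diff(U, x) + U - x**n).as_poly(x).all_coeffs()
sol = solve(eqs, coeffs)
U = U.subs(sol)
print(U)                                          # x**3 - 3*x**2 + 6*x - 6
print(expand(integrate(x**n*exp(x), x) - U*exp(x)))   # 0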
Notes. Historically, the foundations of symbolic integration were laid by Joseph Liou-
ville (1833a, 1833b, 1835). Ritt (1948) invented the notion of a differential algebra, which
is the appropriate framework for the integration problem. A more general method was
presented by Risch (1969, 1970), and variants of his algorithm are implemented today in
almost any computer algebra system. They employ suitably modified versions of Hermite’s
and Rothstein’s and Trager’s methods.
Richardson (1968) and Caviness (1970) showed that a sufficiently general version of the
integration problem is unsolvable. Already when we just consider real functions built up
from the constant 1, a single variable, the four arithmetic operations, and the sine function,
then determining whether the definite integral (from −∞ to ∞) of such an expression exists
is undecidable, and similarly for the existence of indefinite integrals as we considered in
this chapter (Matiyasevich 1993, §9.4).
In spite of this fundamental limitation, symbolic integration and, more generally, the
symbolic solution of ordinary differential equations, is a highly active area of research.
Among its goals are algorithms for a wider range of problems, and better algorithms for
special types of problems. Bronstein (1997) gives a nice overview.
22.1. A project for OCR-reading integral tables is described in Berman & Fateman (1994).
22.2. Most undergraduate calculus textbooks contain an integration algorithm for rational
functions by factoring the denominator into linear polynomials over the complex numbers
(or at most quadratic polynomials over the real numbers) and performing a complete partial
fraction decomposition. For rational functions with only simple poles, this algorithm first
appears in Johann Bernoulli (1703). For symbolic computation, this approach is inefficient
since it involves polynomial factorization and computation with algebraic numbers, and
most computer algebra systems implement the algorithms described in this chapter.
In fact, on the pages just preceding Bernoulli’s article, Leibniz (1703) goes a step further.
He also computes the integral of a rational function by partial fraction decomposition if the
denominator is a product of distinct linear factors. But then he also gives the decomposition
in several cases where the denominator is not squarefree, for example,
1/(h⁴ l) = 1/(ωh⁴ ) − 1/(ω²h³ ) + 1/(ω³h² ) − 1/(ω⁴h) + 1/(ω⁴ l),
where h = x + a, l = x + b, and ω = b − a, similarly for 1/(h⁴ l³ ), and a general formula
for 1/(h^t l^s ). For those terms, interveniunt etiam Hyperboloidum quadraturæ, quales sunt,
quarum ordinatæ sunt 1/xx, 1/x³ , 1/x⁴ , &c,1 which are calculated by the usual rules. Here we
have the essential ingredients of Hermite’s method, 170 years before Hermite!
Ostrogradsky (1845) shows that for coprime f , g ∈ F[x] with deg f < deg g, there exist
unique polynomials a, c ∈ F[x] such that (1) holds, deg a < deg b, and deg c < deg d, where
b is the squarefree part of g and d = g/b, and presents an algorithm for computing a and c.
The algorithm described in Section 22.2 is from Hermite (1872). Theorem 22.7 appears in
Yun (1977a). Gerhard (2001a) gives fast modular algorithms for Hermite reduction.
22.3. Theorem 22.8 is from Rothstein (1976, 1977) and Trager (1976). Theorem 22.9 is
from Lazard & Rioboo (1990). Independently, Trager implemented the resulting algorithm
in S CRATCHPAD, but did not publish it. Mulders (1997) describes an error in software
implementations of the Lazard-Rioboo-Trager method.
1 furthermore enter the integrals of the hyperbolas, whose function values are x^{−2} , x^{−3} , x^{−4} , etc.
Gerhard (2001a) presents a modular variant of Algorithm 22.10. If f , g ∈ Z[x] and their
coefficients are absolutely bounded by A ∈ N, then this algorithm takes O∼ (n5 + n4 log A)
word operations.
22.4. Algorithm 22.19 is from Almkvist & Zeilberger (1990). It is the analog of Gosper’s
(1978) algorithm for hypergeometric summation, which we discuss in Section 23.4. The
method of Almkvist & Zeilberger for determining a multiple of the denominator is de-
scribed in Exercise 22.11.
Equation (4) is a special case of the Risch differential equation : given elements σ, ρ in
a differential field K, determine τ ∈ K satisfying Dτ + στ = ρ. Equation (4) corresponds
to the case K = F(x), ρ = 1, and D = ′ . The more general equation plays a significant
role in Risch’s (1969, 1970) algorithm. Other algorithms for solving the Risch differ-
ential equation are given by Rothstein (1976, 1977), Kaltofen (1984), Davenport (1986),
and Bronstein (1990, 1991, 1997). Algorithms computing rational solutions of linear dif-
ferential equations of higher order are given by Abramov (1989a, 1989b), Abramov &
Kvashenko (1991), Bronstein (1992), Abramov, Bronstein & Petkovšek (1995), and Bron-
stein & Fredet (1999). The algorithms of Singer (1991), Petkovšek & Salvy (1993), and
Pflügel (1997), for example, find more general solutions of higher order equations.
The idea of finding polynomial solutions of linear differential equations by first de-
termining a degree bound and then solving a system of linear equations is known as
the method of undetermined coefficients and goes back to Newton (1691/92). Gerhard
(2001b), Chapter 9, presents several asymptotically fast methods for first order equations.
In practice, one would compute all integral roots of the resultant R in step 1 of Algo-
rithm 22.14, using Corollary 11.21 and Theorem 15.21, and then iterate the loop in step 2
only for those i that are roots of the resultant. Gerhard (2001b), Section 8.1 and Chapter 10,
gives a cost analysis of modular variants of Algorithms 22.14 and 22.19, respectively, in
terms of word operations.
Whether the output of Algorithm 22.14 is exponential in the input size depends on what
we regard as input. For example, if we consider hyperexponential functions of the form
g = r1 exp(r2 ) with r1 , r2 ∈ Q(x), then the result returned by Algorithm 22.14 is polynomial
in the degrees of the numerators and denominators of r1 and r2 (Theorem 10.16 in Gerhard
2001b). If we think of this algorithm as the first step of an integration algorithm, then it is
natural to regard g—and not the logarithmic derivative g′ /g—as input. If we represent g by
storing the rational functions r1 and r2 in a dense format as quotients of polynomials, then
the output of Algorithm 22.14 is polynomial in the input size. However, if we represent g
in a sparse format, e.g., by an “expression tree”, or if we consider the logarithmic derivative
g′ /g as input, then Example 22.17 shows that the output may be exponential in this input
size. On the other hand, Algorithm 22.14 may also be regarded as the first step of a method
for solving the linear differential equation (4), and then it is natural to consider σ = g′ /g
as the input. In fact, for a higher order linear differential equation with rational function
coefficients, the degrees of the rational solutions are in general exponential in the size of
the coefficients of the differential equation. Similar remarks also apply to the degree of the
numerator in Algorithm 22.19.
Exercises.
22.1 Let (R, D) be a differential algebra. Show that R0 is in fact a subring of R, and a subfield if R
is a field.
22.9 What is the leading coefficient of the resultant r in Theorem 22.8? Hint: Prove that it is the
constant coefficient of resx (ay − b′ , b).
22.10−→ Trace Algorithm 22.10 on computing the integral of
f = x⁹/(x⁷ + 3x⁶ − 5x⁵ − 23x⁴ − 8x³ + 40x² + 48x + 16) ∈ Q(x).
22.11 Let F be a field of characteristic zero and a, b ∈ F[x] nonzero and coprime.
(i) Let γ ∈ F and p ∈ F[x] be an irreducible factor of gcd(b, a − γb′ ). Prove that p² ∤ b, and
conclude that the gcd is squarefree.
(ii) Show that gcd(b, a − γ1 b′ ) and gcd(b, a − γ2 b′ ) are coprime if γ1 , γ2 ∈ F are distinct.
(iii) Consider the following variant of Algorithm 22.14.
A LGORITHM 22.22 Almkvist & Zeilberger’s multiple of integration denominator.
Input: Relatively prime polynomials a, b ∈ F[x] with b ≠ 0 monic.
Output: A monic polynomial V ∈ F[x] such that for any coprime u, v ∈ F[x], equation (5) implies that
v divides V .
1. R ←− resx (b, a − yb′ ), d ←− max{i ∈ N: i = 0 or R(i) = 0}
if d = 0 then return 1
2. a0 ←− a, b0 ←− b
for i = 1, . . . , d do
    Hi ←− gcd(bi−1 , ai−1 − b′i−1 ),   ai ←− (ai−1 − b′i−1 )/Hi ,   bi ←− bi−1 /Hi
3. return H1 H2² · · · Hd^d
Show that Hi = gcd(b, a − ib′ ) for all i and conclude that the algorithm returns the same result as
Algorithm 22.14.
(iv) Trace Algorithm 22.22 on the input from Example 22.16.
22.12 Let F be a field of characteristic zero and r, s,t,U ∈ F[x] with rU ′ − sU = t and gcd(r, s) = 1.
Suppose that deg gcd(r,t) ≥ 1, and reduce this differential equation for U to a differential equation
for a proper divisor U ∗ of U, with coefficients r, s∗ , and t ∗ = t/ gcd(r,t).
22.13 Let F be a field of characteristic zero and r, s,t ∈ F[x] such that r is nonzero and monic.
(i) Let S = {H ∈ F[x]: rH ′ − sH = 0} and H1 , H2 ∈ S \ {0}. Show that there is a nonzero constant
c ∈ F such that H1 = cH2 . Hint: Lemma 22.18.
(ii) Prove that either S = {0} or there is a unique monic polynomial H0 ∈ S, which has degree δ,
where δ is as in Lemma 22.18, such that S = {cH0 : c ∈ F}.
(iii) Suppose that the inhomogeneous equation (8) has a solution U ∈ F[x]. Prove that the set of
all solutions to this equation is {U + H: H ∈ S}.
Summa cum laude.1
The task that we address in this chapter is, given an “expression” g(n) depending
on n, to find an “expression” f (n) such that
f (n) = ∑_{0≤k<n} g(k),
or, more generally, a closed form for the sum ∑a≤k<b g(k) for arbitrary nonnegative
integers a ≤ b. We will explain later what kind of expressions we consider; for the
time being, the reader may imagine univariate rational functions over a field of
characteristic zero.
We first solve the summation problem for polynomials, and introduce much of
the notation used later. After a digression about harmonic numbers, we discuss
hypergeometric terms and their summation. Section 24.3 gives a brief outlook on
further extensions, where computer algebra systems have had remarkable success
in giving short proofs of seemingly difficult problems. In contrast to the rest of
this book, we omit cost analyses.
∑_{0≤k<n} k = n(n − 1)/2,
∑_{0≤k<n} k² = n(n − 1)(2n − 1)/6,
∑_{0≤k<n} k³ = n²(n − 1)²/4,
∑_{0≤k<n} k⁴ = n(n − 1)(2n − 1)(3n² − 3n − 1)/30,
∑_{0≤k<n} c^k = (c^n − 1)/(c − 1)   if c ≠ 1,
∑_{0≤k≤n} \binom{n}{k} = (1 + 1)^n = 2^n ,   (1)
∑_{0≤k≤n} (−1)^k \binom{n}{k} = (1 − 1)^n = 0   if n > 0.   (2)
The last two summations are of a different type than the others, in that the up-
per bound n also occurs in the summand. In this chapter, we will only consider
summations where this is not the case; this is called indefinite summation.
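The closed forms above are easy to check with a computer algebra system; for instance, the following sympy sketch (an illustration only) reproduces the four power-sum formulas.

from sympy import symbols, summation, factor

k, n = symbols('k n', integer=True)
for m in range(1, 5):
    s = summation(k**m, (k, 0, n - 1))
    print(m, factor(s))   # n*(n-1)/2, n*(n-1)*(2n-1)/6, ...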
A useful tool for symbolic summation is the difference operator ∆. It asso-
ciates to an expression f an expression ∆ f , defined by (∆ f )(n) = f (n+1)− f (n).
It has the following properties:
◦ Linearity: ∆(a f + bg) = a∆ f + b∆g for expressions f , g and constants a, b
in F,
◦ Product rule: ∆( f g) = f ∆g + g∆ f + ∆ f · ∆g for expressions f , g.
In particular, ∆ f is a rational function if f is. We will see below, however, that
the converse is false in general. Related operators are the shift operator E, with
(E f )(n) = f (n + 1), and its powers (E k f )(n) = f (n + k) for k ∈ Z. We have the
operator identity ∆ = E − I, where I = E 0 is the identity operator.
The following lemma gives the connection between the difference operator and
symbolic summation.
P ROOF.
∑_{a≤k<b} g(k) = ∑_{a≤k<b} ( f (k + 1) − f (k)) = ∑_{a≤k<b} f (k + 1) − ∑_{a≤k<b} f (k)
Both here and in (3) it is assumed that the integral and the sum, respectively, are
well-defined.
We have seen that the product rule is somewhat different from the Leibniz rule.
What about the analog of
D(x^m ) = m x^{m−1}   (4)
for m ∈ N? For m = 3, for example, we have ∆(x³ ) = (x + 1)³ − x³ = 3x² + 3x + 1,
so that (4) does not hold with ∆ instead of D. The following notions restore (4)
for the difference operator.
f^{\underline{m}} = f (x) f (x − 1) · · · f (x − m + 1) = f · E^{−1} f · E^{−2} f · · · E^{−m+1} f .
In particular, we have
x^{\underline{m}} = x(x − 1) · · · (x − m + 1),
which is a monic polynomial of degree m. Similarly, we also have the mth rising
factorial
f^{\overline{m}} = f (x) f (x + 1) · · · f (x + m − 1) = E^{m−1} f^{\underline{m}} .
For m = 0, we let f^{\underline{0}} = f^{\overline{0}} = 1.
P ROOF. Statements (i)–(v) and (vi) “⇐=” are clear. We first prove (vi) “=⇒”
when ρ = ∑0≤i≤n fi xi ∈ F[x] is a polynomial, with fn 6= 0 and n > 0. Then the
coefficient of xn−1 in E ρ = fn (x + 1)n + fn−1 (x + 1)n−1 + ∑0≤i≤n−2 fi (x + 1)i is
n fn + fn−1 , and in ρ it is fn−1 . Since F has characteristic zero, n fn 6= 0, and E ρ 6= ρ.
Now we let ρ = f /g, with coprime f , g ∈ F[x] and deg g > 1, and assume that
E ρ = ρ, or equivalently, g · E f = f · Eg. Then g | Eg, by the relative primality of
f and g. Since the degrees and the leading coefficients of g and Eg agree, we have
g = Eg, and hence g ∈ F, by what we have already shown. This is a contradiction
to deg g > 1 and concludes the proof. ✷
for all m ∈ N, and this is the discrete analog of (4). Thus Σ xm = xm+1 /(m + 1) and
nm+1
∑ km =
0≤k<n m+1
for all m, n ∈ N, by Lemma 23.1. We are somewhat abusing our notation here in
that we write km instead of xm (k) for k ∈ N.
23.1. Polynomial summation 649
E XAMPLE 23.4.
n2 n(n − 1)
∑ k= ∑ k1 = = ,
0≤k<n 0≤k<n 2 2
n3 n2 n(n − 1)(n − 2) n(n − 1)
∑ k2 = ∑ (k2 + k1 ) = + = +
0≤k<n 0≤k<n 3 2 3 2
n(n − 1)(2n − 1)
= ,
6
n4 n2 n2 (n − 1)2
∑ k3 = ∑ (k3 + 3k2 + k1 ) = + n3 + = .
0≤k<n 0≤k<n 4 2 4
These sums are in accordance with the first three examples at the beginning of the
chapter. ✸
for m > 1, where we have used that xi+1 = xi (x − i) in the second line. Comparing
coefficients of xi in the last line and in (5), we find that the { mi } satisfy the recursion
formula nmo m − 1 m − 1
= +i (6)
i i−1 i
for m ≥ i > 0, with the boundary conditions
nmo nmo
0
= 0 if i > m, = 0 for m ≥ 1, = 1.
i 0 0
In particular, { mi } is a nonnegative integer for 0 ≤ i ≤ m.
650 23. Symbolic summation
x1 = x1 ,
x2 = x2 + x1 ,
x3 = x3 + 3x2 + x1 ,
x4 = x4 + 6x3 + 7x2 + x1 ,
x5 = x5 + 10x4 + 25x3 + 15x2 + x1 .
The rational number Hn is called the nth harmonic number, since it is the nth
partial sum of the (divergent!) harmonic series ∑k≥1 1/k. The following lemma
implies that the harmonic numbers cannot be represented by a rational function.
P ROOF. We assume the contrary, namely that there exist coprime polynomials
f , g ∈ F[x] of degrees m and n, respectively, such that
1 f Ef f g · E f − f · Eg
=∆ = − =
x g Eg g g · Eg
m = n ⇐⇒ deg(g · Eg) = m + n
⇐⇒ deg (x(g · E f − f · Eg)) = m + n
⇐⇒ m 6= n.
Generalizations of this lemma are in Exercises 23.12 and 23.28. The harmonic
numbers are the discrete analog of the natural logarithm ln x, and in fact
1 1
Hn ∈ ln n + γ + + O 2 ,
2n n
n Hn ln n Hn − ln n n(Hn − ln n − γ)
10 2.9289682540 2.3025850930 0.6263831610 0.4916749607
100 5.1873775176 4.6051701860 0.5822073316 0.4991666750
1000 7.4854708606 6.9077552790 0.5777155816 0.4999166667
10 000 9.7876060360 9.2103403720 0.5772656640 0.4999916667
100 000 12.0901461299 11.5129254650 0.5772206649 0.4999991667
1 000 000 14.3927267229 13.8155105579 0.5772161650 0.4999999167
One can also prove that there is no rational function representing the sum
1
∑ k j
1≤k≤n
1 π2
ζ (2) = ∑ 2 6.=
k≥1 k
We have come across this number in Chapter 3, where we found its inverse to be
the probability that two random integers have a nontrivial gcd.
g = Df g = ∆f
R
f= g f = Σg
Z b
f (b) − f (a) = g(x)dx f (b) − f (a) = ∑ g(k)
a a≤k<b
Intuitively, the first factorization in the above example is “maximal” in the sense
that we have extracted the “greatest” possible falling factorial. An informal algo-
rithm for its computation would be to look for the largest m ∈ N such that gm | f
for some nonconstant g ∈ F[x], divide gm out, and proceed recursively.
The definition formalizes the maximality condition indicated above: (F3 ) states
j
that the falling factorial f j = f j · E −1 f j · · · E − j+1 f j cannot be extended to the left
(if g = gcd( fii , E f j ) 6= 1, then g j+1 is a falling factorial of length j + 1 dividing f ),
and (F4 ) means that it cannot be extended to the right.
In the example, only the first sequence (x, 1, 1, x + 2) is a greatest factorial fac-
torization. For (1, x, x + 2), condition (F4 ) is violated since gcd(x2 , E −3 (x + 2)) =
x − 1, so that (x + 2)3 may be extended to the right by (x − 1) to get (x + 2)4 .
The factorization (x2 + 2x, 1, x + 1) fails to satisfy (F3 ) for i = 1 and j = 3, since
(x + 1)3 may be extended to the left by the factor x + 2 of x2 + 2x, and ( f ) violates
both (F3 ) and (F4 ) for i = j = 1.
are all integral shifts of x. Figure 23.3 illustrates the shift structure of f . A bullet at
point (i, j) ∈ N 2 indicates that E −i x j = (x − i) j divides f . A gff of f can be read off
this figure by collecting maximal horizontal chains and packing together chains of
equal length (shaded equally in Figure 23.3). Thus (x2 − 5x + 4, x2 − 5x + 4, x) =
((x − 1)(x − 4), (x − 1)(x − 4), x) is a gff of f . ✸
23.3. Greatest factorial factorization 655
1
0 1 2 3 4 5 i
F IGURE 23.3: The shift structure of x(x − 1)3 (x − 2)2 (x − 4)2 (x − 5).
0 1 2 3 4 5 i
1
Examples 23.8 and 23.9 illustrate two extreme cases. The general situation is
three-dimensional: If we partition the monic irreducible factors of a polynomial
f ∈ F[x] into shift-equivalence classes Π1 , . . . , Πl and determine the unique rep-
resentative pk ∈ Πk for each k so that all other elements of Πk are of the form E −i pk
656 23. Symbolic summation
for some i ∈ N, then the full information about the shift structure of f can be read
off the set of ordered triples S = {(i, j, k) ∈ N 3 : E −i pkj | f }. In Example 23.8, we
have only one shift-equivalence class Π1 = {x, x − 1, x − 2, x − 4, x − 5} and p1 = x.
In Example 23.9, we have the three classes Π1 = {x, x − 1, x − 2, x − 3, x − 5},
Π2 = {x − 1/3, x − 7/3, x − 10/3}, and Π3 = {x − 2/3, x − 5/3}, and all multiplic-
ities are 1. It is easy to find a gff of f knowing S: As in the examples, we look for
maximal chains “in the i-direction” and combine all chains of equal length.
The examples suggest that the gff is unique, and the following lemma confirms
this.
L EMMA 23.10. A nonzero monic polynomial f ∈ F[x] has at most one gff.
P ROOF. If f = 1, then the empty sequence is the only gff of f , by (F1 ), and we
now assume that deg f > 0. We suppose that ( f1 , . . . , fm ) and (g1 , . . . , gn ) are both
greatest factorial factorizations of f , and show by induction on deg f that they are
equal. Let p ∈ F[x] be an irreducible factor of fm . We will show that p | gm . To
this end, we let k ∈ {1, . . . , n} be maximal such that gcd(pm , gkk ) 6= 1. Then there
exists some i ∈ {−m + 1, . . . , k − 1} with p | E −i gk .
If i > 0, then E p | E −i+1 gk | f , which is impossible by (F3 ) for ( f1 , . . . , fm ), and
j
we conclude that i ≤ 0. If i < 0, then E i+1 p divides f , and hence E i+1 p | g j for some
j
j ∈ {1, . . . , k}, by the maximality of k. Thus E i+1 p | gcd(g j , Egk ), contradicting
(F3 ) for (g1 , . . . , gn ), which implies that i = 0 and p | gk . In (10) and (11), the
situations i > 0 and i < 0, respectively, are illustrated; arrows indicate divisibility.
p E −1 p E −2 p · · ·
↓ ↓ ↓ (10)
gk E −1 gk · · · E −i+1 gk E −i gk E −i−1 gk E −i−2 gk · · ·
The following lemma not only proves that a gff always exists, but also leads
to an algorithm for its computation similar to the algorithm for computing the
squarefree decomposition (Section 14.6).
P ROOF. Uniqueness was proven in Lemma 23.10. For the existence, we proceed
by induction on deg f . If f = 1, then by definition gff( f ) = (), and we may assume
that deg f > 0. Then g = gcdE( f ) has strictly smaller degree than f , since f = E f
if and only if f is constant, by Lemma 23.3. If g = 1, then gff( f ) = ( f ) (it is easily
checked that (F1 ) through (F4 ) hold).
If g is nonconstant, then by induction there are m ∈ N≥2 and nonconstant monic
g1 , . . . , gm−1 ∈ F[x] such that
Thus
g E −1 g lcm(g, E −1 g)
(E −1 g1 ) · · · (E −m+1 gm−1 ) = E −1 = = .
gcdE(g) gcd(g, E −1 g) g
lcm(g, E −1 g) f
(E −1 g1 ) · · · (E −m+1 gm−1 ) = | . (14)
g g
If we let fi+1 = gi for 1 ≤ i < m, then (14) proves that f1 in (13) is indeed a
polynomial, and (F1 ) and (F2 ) are satisfied for ( f1 , . . . , fm ).
To show (F3 ) for ( f1 , . . . , fm ), let 1 ≤ i ≤ j ≤ m. If i ≥ 2, then property (F3 ) for
i−1 i−1
(g1 , . . . , gm−1 ) implies that gcd( fi , E f j ) = gcd(gi−1 , Eg j−1 ) = 1. Now E −i+1 fi =
E −i+1 gi−1 divides f /g, by (14), and E f j divides E( f1 · · · fm ) = (E f )/g, and since
658 23. Symbolic summation
f /g and (E f )/g are coprime, so are E −i+1 fi and E f j . Thus (F3 ) holds if i ≥ 2. For
i = 1, we have that f1 | f /g and E f j | (E f )/g, and again gcd( f /g, (E f )/g) = 1
implies that gcd( f1 , E f j ) = 1. This concludes the proof of (F3 ).
The proof of (F4 ) is similar, see Exercise 23.16. ✷
4. return ( f1 , . . . , fm )
T HEOREM 23.14.
Algorithm 23.13 works correctly as specified and uses O(n · M(n) log n) operations
in F , where n = deg f .
P ROOF. Correctness follows from the fundamental lemma. The recursion depth
of the algorithm is at most n. The cost for one iteration of the algorithm is
O(M(n) log n), and the claim follows. ✷
Our most important example of a difference field is the field F(x) of rational
functions, with E being the shift operator E f = f (x + 1), as before, and Lemma
23.3 implies that CF(x) = F. Another example is the difference field Q(2x ) with
E(2x ) = 2x+1 = 2 · 2x .
∆ f = E f − f = E τ · Eg − τ g = (E τ · σ − τ )g,
E τ · σ − τ = 1 in F(x). (15)
We note that only rational functions occur in (15); we have eliminated all hy-
pergeometric terms and reduced the original hypergeometric summation problem
to a purely rational question.
If we write σ = a/b with coprime polynomials a, b ∈ F[x], b 6= 0 monic, and
similarly τ = u/v with coprime u, v ∈ F[x], v 6= 0 monic, and multiply up denomi-
nators in (15), we arrive at the equivalent polynomial condition
a · v · Eu − b · u · Ev = b · v · Ev. (16)
Conversely, we see that any polynomial solution u, v ∈ F[x] of the above equa-
tion for the given a, b ∈ F[x] yields a solution to our hypergeometric summation
23.4. Hypergeometric summation: Gosper’s algorithm 661
problem by setting
u
f = g. (17)
v
In order to solve (16), we try to find a suitable denominator polynomial v or a
multiple of it. We define v0 , v1 ∈ F[x] by v = v0 · gcdE(v) and Ev = v1 · gcdE(v),
so that v0 , v1 are coprime. Dividing (16) by gcdE(v) yields
a · v0 · Eu − b · u · v1 = b · v0 · v1 · gcdE(v). (18)
Since v0 divides a · v0 · Eu and the right hand side of (18), it divides b · u · v1 . But
gcd(u, v0 ) divides gcd(u, v) = 1 = gcd(v1 , v0 ), and hence v0 | b. Similarly, v1 | a.
Let gff(v) = (h1 , . . . , hm ) be the greatest factorial factorization of v. Then
h11 h22 · · · hm
m
v0 = m−1
= h1 · (E −1 h2 ) · · · (E −m+1 hm ) | b,
h21 · · · hm
(19)
(Eh1 )1 (Eh2 )2 · · · (Ehm )m
v1 = m−1
= (Eh1 )(Eh2 ) · · · (Ehm ) | a,
h21 · · · hm
T HEOREM 23.19.
Algorithm 23.18 works correctly as specified. In particular, if u, v are coprime
polynomials in F[x], with v 6= 0 monic, solving (16), and gff(v) = (h1 , . . . , hm ),
then m ≤ d and hi | Hi for 1 ≤ i ≤ m.
662 23. Symbolic summation
2. a0 ←− a, b0 ←− b
for i = 1, . . . , d do
ai−1 bi−1
Hi ←− gcd(E −1 ai−1 , E i−1 bi−1 ), ai ←− , bi ←−
EHi E −i+1 Hi
H1 = gcd(E −1 a, b) = gcd(x + 1, x2 + x) = x + 1,
H2 = gcd(E −1 a, Eb) = gcd(x + 1, x2 + 3x + 2) = x + 1,
Let a, b ∈ F[x] be nonzero monic polynomials that split over some extension
field K of F into linear factors a = ∏1≤i≤m (x − αi ) and b = ∏1≤ j≤n (x − β j ), and
R = resx (a(x), b(x + y)) ∈ F[y], as in step 1 of Algorithms 23.18 and 23.20. For
any γ ∈ K, we have
R(γ ) = 0 ⇐⇒ resx (a(x), b(x + γ )) = 0 in K
⇐⇒ gcd(a(x), b(x + γ )) 6= 1 in K[x]
⇐⇒ a(x) and b(x + γ ) = ∏ (x − β j + γ ) have a common zero
1≤ j≤n
⇐⇒ γ = β j − αi for some i ∈ {1, . . . , m} and j ∈ {1, . . . , n}.
Thus the roots of R are exactly the distances between the roots of a and b. We
have already used this in Section 6.8 to find the minimal polynomial of the sum
of two algebraic numbers. In particular, if the value d computed in step 1 of
Algorithms 23.18 and 23.20 is nonzero, then it is the maximal positive integer
distance between a root of a and a root of b. Thus for computing d, we may
replace a and b by their squarefree parts.
H1 = gcd(E −1 a, b) = gcd(x2 , x) = x
(Algorithm 23.20 computes the same value for H1 ), and the denominator v of a
solution u/v of (16) divides V = H11 = x—provided that such a solution exists. ✸
Exercise 23.27 gives an upper bound on d, and the following example shows
that this bound is almost sharp, and that the value of d may be exponentially large
in the input length.
E XAMPLE 23.23. We let g = (x2 + nx)−1 ∈ Q(x), with a parameter n ∈ N≥1 . Its
term ratio is
Eg x(x + n)
= ,
g (x + 1)(x + n + 1)
so that the input to Algorithm 23.18 is a = x(x + n) and b = (x + 1)(x + n + 1).
Then d = n − 1 in step 1, Hi = gcd(a(x − 1), b(x + i − 1)) = 1 for 1 ≤ i < n − 1,
and Hn−1 = gcd(a(x − 1), b(x + n − 2)) = (x + n − 1), so that V = (x + n − 1)n−1 =
(x + 1)n−1 . (Algorithm 23.20 computes the same result.) Its degree is exponential
in the size of a and b, which is about log264 n words. ✸
L EMMA 23.24. Let r, s,t,U ∈ F[x] satisfy (21), with r,U nonzero and r monic,
let m = max{deg r − 1, deg(s − r)}, and let δ ∈ F be the coefficient of xm in s − r.
(As usual, δ = 0 if m < 0 or deg(s − r) < m.) Moreover, let
degt − m if deg r − 1 < deg(s − r) or δ 6∈ N \ {0, 1, . . . , degt − m},
e=
δ otherwise.
(i) Either degU = degt − m, or else degU = δ > degt − m and deg r − 1 ≥
deg(s − r). In particular, degU ≤ e.
(iv) If deg r − 1 < deg(s − r) or δ 6∈ N, then exactly one U ∈ F[x] satisfies equa-
tion (21).
The proof is analogous to the proof of Lemma 22.18 and left as Exercise 23.31.
We recall that the zero polynomial has degree −∞; thus degt − m is to be inter-
preted as −∞ if t = 0. We have N ⊆ Q ⊆ F, so that any integer is also an element
of F.
In any case, we can (almost) determine degU from the known polynomials r, s,
and t = s · V . If the value e from Lemma 23.24 is a nonnegative integer and if
deg r − 1 < deg(s − r) or degt − m 6= δ , we set up the system of linear equa-
tions equivalent to (21) and solve it for the unknown coefficients of U. Then
τ = U/V ∈ F(x) satisfies (15), and we get a solution to our hypergeometric sum-
mation problem as in (17). If, however, e is negative, or if deg r − 1 ≥ deg(s − r)
and degt − m = δ , or if the linear system is unsolvable, then we know that (15) has
no rational solution τ ∈ F(x), and no hypergeometric term f with ∆ f = g exists.
Here is the complete algorithm.
The solution space of the linear system in step 4 is either empty, or has precisely
one element, or is one-dimensional. In the latter case, we preferably take a solu-
tion leading to a numerator U of smallest degree. The coefficient matrix of the
linear system is triangular, so that the system is particularly easy to solve without
Gaussian elimination, simply by back substitution, taking O(e2 ) operations in F.
At most one diagonal element is zero, and this occurs only if deg r −1 ≥ deg(s−r).
Exercise 23.26 shows that we may even obtain gcd(r,t) = gcd(s,t) = 1 in step 2,
by appropriately dividing out common factors, and perform steps 3 through 5 for
a divisor of U. This may further reduce the size of the linear system in step 4.
We conclude this section with a series of examples.
U x2 + 25 x + 1 x + 21
f= g=− 2 =−
V (x + 2x)(x + 1) x(x + 1)
satisfies ∆ f = g. Exercise 23.13 shows that for an arbitrary positive integer n,
U = − 1n (x + n)(xn )′ solves (22), where ′ denotes the formal derivative, and that
∆(−(xn )′ /nxn ) = 1/(x2 + nx). ✸
E XAMPLE 23.26. Let g = 1/x ∈ F(x). Then σ = Eg/g = x/(x + 1), and a = x,
b = x + 1. We compute R = resx (x, x + y + 1) = y + 1 in step 1 of Algorithms 23.20
and 23.18, and hence d = 0 and V = 1. Plugging this into (21), we obtain
x · EU − (x + 1)U = (x + 1).
The following example shows that if deg r − 1 ≥ deg(s − r) in Lemma 23.24 (i),
both choices for degU may lead to a solution of the hypergeometric summation
problem.
We know, of course, that the sum is 0 for m > n, by the binomial theorem, but
sometimes such sums also occur with the upper summation bound being smaller
than n. The term ratio of the summand g = (−1)x ( nx ) is
x+1 n
(−1)
Eg x+1 x−n
σ= = n = ,
g (−1)x x+1
x
668 23. Symbolic summation
and hence a = x − n and b = x + 1. (The reader being concerned about what (−1)x
might mean for x 6∈ Z may replace it by eiπx .) The first step of Algorithms 23.18
and 23.20 computes
R = resx (x − n, x + y + 1) = y + n + 1,
so that d = 0 and V = 1. Equation (21) then is
(x − n)EU − (x + 1)U = x + 1, (24)
and we have r = x − n, s = t = x + 1, deg r − 1 = 0 = deg(s − r), and δ = n + 1.
Lemma 23.24 (i) implies that either degU = degt − m = 1 or degU = δ = n + 1.
The lemma does not tell us which choice to make, so we try degU = 1 first, since
it looks easier. With U = U1 x +U0 , (24) is
x + 1 = (x − n)(U1 (x + 1) +U0 ) − (x + 1)(U1 x +U0 )
= (−n)U1 x + (−nU1 − (n + 1)U0 ),
and the linear system
1 = −nU1 , 1 = −nU1 − (n + 1)U0
has the unique solution U1 = −1/n and U0 = 0. Thus τ = U/V = −x/n,
x n x n x x n x−1 n − 1
Σ (−1) = τ · (−1) = − (−1) = (−1) ,
x x n x x−1
and consequently
n n
∑ (−1)k = 1+ ∑ (−1)k
0≤k<m k 1≤k<m k
m−1n−1 0 n−1
= 1 + (−1) − (−1)
m−1 0
n−1
= (−1)m−1
m−1
for all m ∈ N. In particular, the sum is 0 if we plug in m > n, as it should be.
Now let us try degU = δ = n + 1. If a solution U ∗ ∈ F[x] of (24) of degree n + 1
exists, then the difference U = U ∗ −U solves the homogeneous equation
(x − n)EU − (x + 1)U = 0,
and vice versa. Inspection shows that U = xn+1 is a solution of the homogeneous
equation, and hence U ∗ = U +U = xn+1 − x/n is a solution of (24) of degree n + 1.
Now U = Γ (x + 1)/Γ (x − n), and hence
(x + 1)Γ (x + 1)
E(Ug) = EU · Eg = · σ g = Ug,
(x − n)Γ (x − n)
Notes 669
Notes. An excellent reference for much of this chapter is Graham, Knuth & Patashnik
(1994); the book tower example is from this text. It also contains useful information about
binomial coefficients, Bernoulli, Euler, and Stirling numbers, including lots of pretty sums
involving them.
The theory of differencing, summing, and solving difference (or recurrence) equations
in a symbolic fashion is treated in classics like Boole (1860) and Jordan (1965, first edition
1939). The solution of differential equations by discretization has been a driving force for
studying “difference calculus” and solving difference equations numerically.
23.1. Archimedes gives in his book On spiral lines, Proposition 10, essentially the formula
for ∑0≤k<n k2 . Fermat says in a letter to Mersenne, to be forwarded to de Sainte-Croix,
from September or October 1636, that he has found a way of computing ∑0≤k<n km for
any m. He gives his solution for m = 4 (actually describing it for n = 5):
!
1 2
∑ k4 = 5 4(n − 1) + 2 · ∑ k − ∑ k2 .
0≤k<n 0≤k<n 0≤k<n
Von zur Gathen & Gerhard (1997) analyze several algorithms to compute E f = f (x + 1)
for a polynomial f ∈ Z[x].
There is a common framework which covers the similarity between the two equalities
Dxn = nxn−1 and ∆xn = nxn−1 for n ∈ N, the umbral calculus. It studies linear operators
on the vector space F[x] of polynomials over a field F. If T is such a linear operator with
the additional properties that T commutes with the differential operator D and deg(T f ) =
deg f −1 for all nonzero f ∈ F[x], then there is a unique sequence f0 = 1, f1 , f2 , . . . ∈ F[x] of
polynomials such that deg fn = n, T fn = n fn−1 , and fn (0) = 0 for all n ≥ 1, the sequence
associated to T . Thus fn = xn is associated to D and fn = xn is associated to ∆. All
associated sequences satisfy a binomial theorem
n
fn (x + y) = ∑ fk (x) fn−k (y)
0≤k≤n k
(this is clear for fn = xn , and Exercise 23.9 proves it for the falling factorials); in fact, this
property is equivalent to being an associated sequence. The origins of the umbral calculus
date back to the middle of the 19th century, and Gian-Carlo Rota put it on a rigorous formal
basis in the 1970s. An excellent reference is Roman (1984).
670 23. Symbolic summation
R
The gamma function is defined by Γ (x) = 0∞ e−t t x−1 dt for all x ∈ R≥0 . It satisfies
the functional equation Γ (x + 1) = xΓ (x) for all x ≥ 1, and since Γ (1) = 1, we have in
particular Γ (n + 1) = n! for all n ∈ N (Exercise 23.5). By analytic continuation, the gamma
function can be extended to a meromorphic function on the complex plane with simple
poles at the nonpositive integers 0, −1, −2, −3, . . .. This leads to a more general definition
of (falling and rising) factorials and binomial coefficients, via xn = Γ (x + n)/Γ (x) for
arbitrary complex numbers x and n with x + n 6∈ −N.
The rising factorial is also called the Pochhammer symbol and written as (x)m .
The formula for summing a polynomial goes back to Stirling (1730), where also the
Stirling numbers (of both kinds) are defined. Vandermonde introduced xn in 1772. Knuth
(1993) explains Johann Faulhaber’s (1631) methods for summation of powers, yielding
much more beautiful expressions than the ones we give. A true renaissance man.
23.3. The definition of the gff, as well as Theorem 23.12 and Algorithm 23.13, are from
Paule (1995). We have adopted the graphical representation of the shift structure from
Pirastu (1992).
23.4. Hypergeometric terms inherit their name from hypergeometric series, which are
power series over C with hypergeometric coefficients in the sense of Definition 23.16:
a series f = ∑k≥0 fk xk /k! ∈ C[[z]] is hypergeometric if
ak1 · · · akm
fk =
bk1 · · · bkn
for all k ∈ N, where the upper parameters a1 , . . . , am ∈ C may be arbitrary and the lower
parameters b1 , . . . , bn ∈ C must not lie in −N. Hypergeometric series have a distinguished
history in calculus, and many familiar series such as
zk 1 zk 1 1k 1k zk
exp(z) = ∑ k! , a
= ∑ ak , ln =z∑
k≥0 (1 − z) k≥0 k! 1−z k≥0 2 k!
k
are in fact hypergeometric, including the second example for a = 1: the geometric se-
ries. Hypergeometric series with two upper parameters and one lower parameter were first
studied by Euler, Gauß, and Pfaff.
Algorithm 23.20, Lemma 23.24, and Algorithm 23.25 are due to Gosper (1978); see
also Graham, Knuth & Patashnik (1994), §5.7, and Koepf (1998). Our presentation follows
Paule (1995).
In view of the fact that the values of d and δ may be exponentially large in the input
size, as in Examples 23.23 and Exercise 23.24, it would be interesting to know what asym-
ptotically fast methods can achieve in this area; a first step was taken by von zur Gathen
& Gerhard (1997). Gerhard (2001b), Chapters 8 and 10, gives a cost analysis of modular
variants of Algorithms 23.20 and 23.25, respectively, in terms of word operations.
If Gosper’s algorithm is applied to a rational function g = p/q, with nonzero p, q ∈ F[x],
then deg r − 1 ≥ deg(s − r) in Lemma 23.24. Lisoněk, Paule & Strehl (1993) prove that
in this situation the case degU = δ occurs if and only if g is a proper rational function, so
that deg p < deg q. Moreover, they show that if Σg exists, then (21) has a unique solution
U ∈ F[x] of degree degt − m, so that the case degU = δ in Lemma 23.24 (i) need not be
considered.
Exercises 671
Further notes. The early work of Abramov (1971, 1975), Moenck (1977b), Gosper
(1978), and Karr (1981, 1985) was influential for symbolic summation. Lafon (1983)
gives an overview of the state in the early 1980s. More recent works on rational and hyper-
geometric summation and extensions of Gosper’s algorithm are due to Lisoněk, Paule &
Strehl (1993), Man (1993), Petkovšek (1994), Pirastu & Strehl (1995), Koepf (1995), Paule
(1995), Paule & Strehl (1995), Pirastu (1996), Bauer & Petkovšek (1999), and Gerhard,
Giesbrecht, Storjohann & Zima (2003).
An exciting development was started by Zeilberger’s (1990a, 1990b, 1991) solution of
the definite hypergeometric summation problem (the two sums (1) and (2) are of that type).
It provided rather surprising computer-aided verifications of well-known identities, such as
the Rogers-Ramanujan formula, the Pfaff-Saalschütz identity, Dixon’s theorem, Apéry’s
formula, and similar proofs of new identities. We will briefly discuss this in Section 24.3;
see also Notes 24.3. Almkvist & Zeilberger (1990) discuss an analogous algorithm for
definite hyperexponential integration. Generalizations of these algorithms are due to Wilf
& Zeilberger (1992), Chyzak (1998a, 1998b, 2000), Chyzak & Salvy (1998), and Abramov
& van Hoeij (1999).
Equation (15) is a special case of a first order linear difference equation with rational
coefficients. Algorithms for solving higher order linear difference equations are given, for
example, by Abramov (1989a, 1989b, 1995), Petkovšek (1992), van Hoeij (1998, 1999),
Hendriks & Singer (1999), and Bronstein (2000).
Exercises.
23.1 Give an example of functions f , g: R −→ R such that f (k + 1) − f (k) = g(k) for all k ∈ Z but
∆ f 6= g.
23.2 Let ∇ be the “backward” difference operator, with ∇ f = f − E −1 f = ∆E −1 f . Prove the
following identities for all m ∈ N.
(i) xm = (x + m − 1)m = (−1)m (−x)m ,
(ii) ∆xm = m · Exm−1 ,
(iii) ∇xm = mxm−1 .
23.3 (i) Show that ∆ f · g − f · ∆g = E f · g − f · Eg.
(ii) The product rule for the difference operator ∆ can be written as
∆( f · g) = f · ∆g + Eg · ∆ f = E f · ∆g + g · ∆ f .
Find and prove a quotientrule for ∆ expressing
∆( f /g) in terms of f , ∆ f , g, ∆g, and Eg.
(iii) Prove that ∆ f m = E f − E 1−m f · f m−1 for all m ∈ N.
23.4∗ Let F be a field of characteristic zero. For an arbitrary h ∈ F, we define the h-shift operator
E h by E h f = f (x + h) (if h ∈ Z, then this coincides with the usual definition as hth power of E), and
similarly ∆h = E h − I.
(i) Show that ∑0≤k<n g(a + kh) = f (a + nh) − f (a) for all n ∈ N and a ∈ F if ∆h f = g.
k
(ii) Prove the operator identities ∆kh = ∑0≤i≤k (−1)k−i i E ih and ∆k = ∆ ∑0≤i<k E i for k ∈ N.
(iii) Let f ∈ F[x] have degree less than n, and let h ∈ F. Prove that f has the Newton expansion
(∆ih f )(0)
f= ∑ x(x − h) · · · (x − ih + h),
0≤i<n hi i!
and relate this to the Taylor expansion of f around 0, and also to Newton interpolation (Exercise 5.11)
at the equidistant pointsu j = jh for 0 ≤ i < n.
(iv) Conclude that i! mi = (∆i xm )(0) for 0 ≤ i ≤ m.
672 23. Symbolic summation
R
23.5 (i) Show that Γ (x) = 0∞ e−t t x−1 dt exists for all x ∈ R>0 .
(ii) Prove the functional equation Γ (x + 1) = xΓ (x) for all x ∈ R>0 . Hint: Integration by parts.
(iii) Show that Γ (1) = 1, and conclude that Γ (n + 1) = n! for all n ∈ N.
m
23.6 Show that m−1 = m2 and m2 = 2m−1 − 1 for m ≥ 1.
23.7∗ Let nk denote the number of permutations on {1, . . ., n} with exactly k cycles, for all non-
negative integers.
h i Theseh i numbers are the Stirling
h i numbers of the first kind. We have the boundary
conditions 00 = 1, n0 = 0 if n > 0, and nk = 0 if k > n.
(i) Give all permutations on {1, . . ., n} having k cycles, for 1 ≤ k ≤ n ≤ 4.
n
(ii) Prove that nn = 1, n−1 = n2 , and n1 = (n − 1)! for all n ∈ N>0 .
(iii) Find and prove a recursion formula for nk for 1 ≤ k ≤ n. Hint: Distinguish the two cases
whether n is a fixed point (a cycle of length 1) or not.
(iv) Prove that xm = ∑0≤i≤m (−1)m−i mi xi for m ∈ N. What is the corresponding formula for xm ?
(v) Conclude that
hmin n o nmoh n i
∑ (−1)m−i i m = δn−i = ∑ (−1)n−m i m
i≤m≤n i≤m≤n
(iii) Prove that ∆Sm = xm for all m ∈ N. (Hint: Use (ii).) Show that this implies ∑0≤k<n km = Sm (n)
for all n ∈ N.
(iv) Conclude from Exercise 23.7 and (7) that
! ( )" #
Bm+1−k m + 1 (−1)i+1−k m i+1
m+1 k
= ∑ i+1 i k
k−1≤i≤m+1
for integers r, s, m with 1 ≤ m ≤ r + s by counting in two different ways the number of possibilities
to choose m persons among r women and s men.
Exercises 673
(ii) Give a different proof of (25) by comparing coefficients of zm on both sides of the (formal)
power series equality
1 1 1
· =
(1 − z)r (1 − z)s (1 − z)r+s
in Q[[z]].
(iii) Show that for each m ∈ N≥1 , (25) becomes an equality in the polynomial ring Z[x, y] if we
formally replace r, s by indeterminates x, y. Hint: Lemma 6.44.
(iv) Conclude that the falling factorials satisfy the binomial theorem
m
∑ i xi ym−i = (x + y)m .
0≤i≤m
(i) Prove that there exist hi j ∈ F[x] of degree less than j deg gi for 1 ≤ j ≤ i ≤ m such that we have
the “partial fraction decomposition”
f hi j
= ∑ j
.
g 1≤ j≤i≤m g i
(ii) Show that there are polynomials s,t ∈ F[x] of degree less than deg gi such that sE − j+1 gi + t ·
(gi − E − j+1 gi ) = hi j . Using the product rule, conclude that
hi j t s − Et
j
= ∆ j−1
+ j−1 .
gi gi gi
(iii) Let b = g1 · · ·gm and d = E −1 (g/b). Conclude that there exist polynomials a, c ∈ F[x] with
deg a < deg b and deg c < deg d such that
f c a
=∆ + .
g d b
23.18 Which of the following expressions are hypergeometric, which are not? Compute the term
ratios.
(x + 1)2 2 2
(i) , (ii) (x + 1)2x , (iii) (−1)x Γ (x + 1).
3
23.19 Prove that the set of hypergeometric terms is closed under multiplication and division but not
under addition.
23.20 Trace Algorithm 23.25 on computing Σx2 , and compare its result to the one that you obtain
from Section 23.1.
23.21 Let n ∈ N≥1 . Prove that there is no hypergeometric expression f such that ∆ f = nx .
23.22 Decide whether the following hypergeometric expressions have hypergeometric sums, and if
so, compute
them.
3x + 1 2x 100
(i) , (ii) (2x + 1)2x Γ (x + 1), (iii) 2x .
x+1 x x
2 2
23.23 Fix n ∈ N≥1 . Which of the two indefinite sums Σ(−1)x nx and Σ nx are hypergeomet-
ric?
23.24−→ (Gerhard 1998) Let n ∈ N≥2 and
24x
g= 2 .
n+x 2n + 2x 2
x n+x
(i) Show that the term ratio of g is
Eg (x + 1)2 x2 + 2x + 1
σ= = 2 = . (27)
g 2n + 1 2 (2n + 1)2
x+ x + (2n + 1)x +
2 4
(ii) The numerator and denominator in (i) are a = x2 +2x +1 and b = x2 +(2n +1)x +(2n +1)2 /4.
Show that both Algorithms 23.20 and 23.18 return the denominator V = 1.
(iii) Equation (21) now reads a · EU − b · u = b. Prove that the solution U ∈ Q[x] has degree
δ = 2n − 1 if it exists.
(iv) Show that the linear operator L = aE − b: Q[x] −→ Q[x], with L f = a · E f − b · f , maps the 2n-
dimensional vector space W ⊆ Q[x] of all polynomials of degree less than 2n to itself, and conclude
that there is a unique polynomial U ∈ Q[x] with LU = a · EU − b ·U = b.
(v) Compute U for n = 6.
Exercises 675
x4 + 4x3 + 3x2
∈ Q(x).
x6 + 7x4 + 5x5 + x3 − 2x2 − 4x − 8
The example in (iv) is from Abramov & Petkovšek (2001), who also show that the extended Gosper-
Petkovšek form is unique in the case (v).
23.30 Let F be a field of characteristic zero, f ∈ F[x] nonconstant, and k ∈ N. Prove that
23.31 Prove Lemma 23.24. Hint: Rewrite (21) in terms of the difference operator ∆.
It is no paradox to say that in our most theoretical moods
we may be nearest to our most practical applications.
Alfred North Whitehead (1911)
L’algèbre est généreuse, elle donne souvent plus qu’on lui demande.1
Jean le Rond D’Alembert ()
677
678 24. Applications
true ←→ 0, false ←→ 1
¬ϕ ←→ 1 − ϕ
e; eψe;
ϕ ∨ ψ ←→ ϕ e)(1 − ψe).
ϕ ∧ ψ ←→ 1 − (1 − ϕ
1 Dirichlet’s drawer principle. The English “pigeonhole” is actually an office mail box.
24.2. Petri nets 679
The Nullstellensatz proof system asks to derive 1, that is, to prove that 1 ∈ I,
according to these rules, in order to refute S. Corresponding to (2), we have
and a refutation of S is
xe2 = (1 − xe1 )e
x2 + xe2 · xe1 , xe3 = (1 − xe2 )e
x3 + xe3 · xe2 ,
xe4 = (1 − xe3 )e
x4 + xe4 · xe3 , 1 = xe4 + (1 − xe4 ) ∈ I.
The Gröbner proof system, introduced by Clegg, Edmonds & Impagliazzo
(1996), computes the reduced Gröbner basis G of I and then checks whether
G = {1}, thus testing whether 1 ∈ I (Exercise 21.19). For the ideal in (4), M APLE’s
Groebner:-Basis command produces G = {1}, immediately showing that 1 ∈ I.
Clegg et al. prove that their Gröbner proof system can simulate others efficiently,
such as Horn clause resolution, and thus is at least not worse than these, in an
appropriate sense. They also show that in some cases, this system is better than
others. The details are not too difficult, but beyond the scope of this text.
transitions t1 ,t2 ,t3 , and weights on the edges. All weights are 1, except weight 2
from s3 to t3 . Furthermore, a marking M of the net assigns to each place si a non-
negative integer M(si ), the number of tokens on that node. Such a marked Petri
net can be used to describe a state of a system of processes.
s1 s2
t1 t2
s3
2
t3
s1 s2 s1 s2 s1 s2
t1 t2 t1 t2 t1 t2
s3 s3 s3
2 2 2
t3 t3 t3
F IGURE 24.2: The Petri net from Figure 24.1 after one, two, and three firings, respectively.
A Petri net is reversible if for each firing there is a sequence of firings that
form the reverse of the given one. This is illustrated in Figure 24.2, where the last
two firings reverse the first one. The reachability problem for (reversible) Petri
nets is, given two markings M and M ∗ of the same net, to decide whether M ∗ can
be obtained from M by a sequence of firings. If the Petri net is reversible, then M ∗
is reachable from M if and only if M is reachable from M ∗ .
24.3. Proving identities and analysis of algorithms 681
marking in Figure 24.2, respectively, are reachable from each other, and in fact the
polynomial
x2 x3 − x1 x2 = x2 · f1
is in the ideal h f1 , f2 , f3 i ⊆ F[x1 , x2 , x3 ].
We model this by the following random experiment. An urn contains w white and
n − w black balls, for integers 0 < w ≤ n. The white and black balls represent the
lucky and unlucky points in U, respectively. We repeatedly draw balls, without
replacement, and let the random variable T denote the number of trials until we
have first found precisely s white balls, where 0 ≤ s ≤ w. Moreover, we consider
random variables X1 , . . . , Xn , where Xk counts the number of white balls obtained
after k trials. Then the Xk have a hypergeometric distribution
wn−w
prob(Xk = s) = s k− s ,
n
k
since there are ws possibilities to choose s white balls, n−w k−s possibilities to
n
obtain k − s black balls, and the total number of choices is k . If k < s, then we
have prob(Xk = s) = 0. Using conditional probabilities, we find
(n + 1)s
∑ kpk = , (6)
0≤k≤n w+1
(this is quite close to the expected value ns/w for drawing with replacement ) and
let
k n−k k n−k
kpk (w + 1) s w−s s w−s
g(n, k) = = = (7)
(n + 1)s n+1 n n+1
w+1 w w+1
for 0 ≤ k ≤ n. If we define S(n) = ∑0≤k≤n g(n, k) = ∑k∈Z g(n, k), then the claim (6)
is equivalent to S(n) = 1 for all n ∈ N with n ≥ w. It is easy to check that the
24.3. Proving identities and analysis of algorithms 683
(k − n − 1)(k − s)
f (n, k) = − g(n, k) (8)
(n + 2)(k − n − 1 + w − s)
satisfies
f (n, k + 1) − f (n, k) = g(n + 1, k) − g(n, k) (9)
for all n ≥ w and k ∈ Z (after dividing both sides in (9) by g(n, k), only rational
functions in n and k remain). Thus
for all n ≥ w, so that indeed S(n) = S(n − 1) = · · · = S(w) = 1 (by (7), the only
nonvanishing summand in S(w) = ∑0≤k≤w g(w, k) is g(w, s) = 1), and the claim is
proved.
But where does the magic term f in (8) come from? We consider s, w to be
indeterminates, let x and y be two other indeterminates over Q, and see that (9),
with n and k replaced by y and x, respectively, is just an indefinite summation
problem with respect to the variable x over the field Q(s, w, y) of rational functions
in s, w, y, namely ∆ f ∗ = g∗ , where f ∗ (x) = f (y, x) and g∗ (x) = g(y + 1, x) − g(y, x),
and we can apply the hypergeometric summation algorithm 23.25. In our example,
we have
y−x+1 y−x
x w−s − w−s
g(y + 1, x) − g(y, x) =
s y+2 y+1
w+1 w+1
x y−x
(y + 1 − w)(x − y − 1) s w−s
= −1
(y + 2)(x − y + w − s − 1) y+1
w+1
(w + 1)x − (s + 1)y + w − 2s − 1
=− g(y, x), (10)
(y + 2)(x − y + w − s − 1)
g(y + 1, x + 1) − g(y, x + 1)
σ=
g(y + 1, x) − g(y, x)
(x + 1)(x − y + w − s − 1) (w + 1)x − (s + 1)y + 2w − 2s
= ,
(x − s + 1)(x − y) (w + 1)x − (s + 1)y + w − 2s − 1
684 24. Applications
U (x − y − 1)(x − s)
f (y, x) = (g(y + 1, x) − g(y, x)) = − g(y, x),
V (y + 2)(x − y − 1 + w − s)
which enabled us to verify the correctness of (6) independently of the above com-
putations, by simply checking the validity of (9). In the example, it may be easier
to prove (6) directly by induction (Exercise 24.3). A variant of the above procedure
can even find the right hand side in (6), given only the summand kpk .
The above approach, due to Herbert Wilf and Doron Zeilberger, works for a
certain class of bivariate hypergeometric summands. In this way a large variety of
combinatorial identities involving sums over binomial coefficients, with (6) being
only a rather trivial special case, can be automatically proved, and proof certificates
similar to f in (9) such that f (n, k)/g(n, k) is a rational function can be generated.
If a closed form for the sum is not known in advance, then Zeilberger’s method
finds one if it exists, or otherwise at least a recursion formula. For example, the
method finds routinely the second order recurrence
which played an important role in Roger Apéry’s sensational proof that ζ (3) is
irrational. The ideas can also be extended to more than two variables, nested sums,
24.4. Cyclohexane revisited 685
a1 ⋆ a1 = a2 ⋆ a2 = · · · = a6 ⋆ a6 = 1,
1
a1 ⋆ a2 = a2 ⋆ a3 = · · · = a6 ⋆ a1 = , (12)
3
a1 + a2 + · · · + a6 = 0,
where ⋆ is the inner product. These conditions express the convention that each
bond has unit length, the angle α between two successive (oriented) bonds has
cos α = 1/3, and that the structure is cyclic. We now let Si j = ai ⋆ a j for 1 ≤ i, j ≤ 6.
Under the conditions (12), Si j is the cosine of the angle between ai and a j . Since
Si j = S ji for 1 ≤ i < j ≤ 6,
S11 = S22 = · · · = S66 = 1,
1 (14)
S12 = S23 = · · · = S61 = ,
3
Si1 + Si2 + · · · + Si6 = 0 for 1 ≤ i ≤ 6
The advantage of (14) over (12) is that all equations are linear. We have 33 =
15 + 6 + 6 + 6 linear equations in 36 variables. It turns out that these equations
are linearly independent, so that the solution space has 36 − 33 = 3 dimensions.
Calculating by hand, we plug the first three lines into the last six equations and
686 24. Applications
a4
a3 a5
a2 a6
a1
F IGURE 24.3: A “chair” conformation of cyclohexane, and the orientation we give to the
bonds a1 , . . . , a6 .
arrange the indices so that only Si j ’s with i < j appear. All 36 values are then
expressed as linear functions of the nine unknowns
with(LinearAlgebra):
G := Matrix(1..6, 1..6, (i,j)->’S’[i,j], shape=symmetric);
for i from 1 to 6 do
S[i,i] := 1:
end:
24.4. Cyclohexane revisited 687
for i from 1 to 5 do
S[i,i+1] := 1/3:
end:
S[1,6] := 1/3:
eq := {seq(S[i,1]+S[i,2]+S[i,3]+S[i,4]+S[i,5]+S[i,6]=0,
i=1..6)}:
sol := solve(eq, {S[1,4],S[2,4],S[2,5],S[2,6],S[3,6],S[4,6]});
G := eval(G, sol);
S1, 1 S1, 2 S1, 3 S1, 4 S1, 5 S1, 6
S2, 1 S2, 2 S2, 3 S2, 4 S2, 5 S2, 6
S3, 1 S3, 2 S3, 3 S3, 4 S3, 5 S3, 6
G :=
S4, 1 S4, 2 S4, 3 S4, 4 S4, 5 S4, 6
S5, 1 S5, 2 S5, 3 S5, 4 S5, 5 S5, 6
S6, 1 S6, 2 S6, 3 S6, 4 S6, 5 S6, 6
5 5
sol := {S1, 4 = − − S1, 3 − S1, 5 , S2, 5 = −S1, 5 − S3, 5 − ,
3 3
5
S3, 6 = −S1, 3 − − S3, 5 , S4, 6 = S1, 3 , S2, 6 = S3, 5 , S2, 4 = S1, 5 }
3
1 5 1
1 , , S1, 3 , − − S1, 3 − S1, 5 , S1, 5 ,
3 3 3
1 1 5
, 1 , , S1, 5 , −S1, 5 − S3, 5 − , S3, 5
3 3 3
1 1 5
S1, 3 , , 1 , , S3, 5 , −S1, 3 − − S3, 5
3 3 3
G :=
5 1 1
− − S1, 3 − S1, 5 , S1, 5 , , 1 , , S1, 3
3 3 3
S , −S − S − 5 , S , 1 , 1 , 1
1, 5 1, 5 3, 5
3
3, 5
3 3
1 5 1
, S3, 5 , −S1, 3 − − S3, 5 , S1, 3 , , 1
3 3 3
We have included M APLE’s results here. The M APLE code can be downloaded
from the book’s web page http://cosec.bit.uni-bonn.de/science/mca/,
and the reader is encouraged to run this code herself or to modify it so that it runs
under her computer algebra system, if that is different from M APLE.
In the transition from the vectors ai to the inner products Si j we have lost one
central piece of information: the dimension of the space in which the ai ’s live.
Mathematically (but not chemically) the Si j might just as well be the inner products
of some ai ’s in R 10 or so. In fact, if we consider science-fiction cyclohexane
in six (or more) dimensions, then (14) describes precisely all its conformations.
(Actually, some inequalities also have to hold; these are explained later.)
688 24. Applications
Back to the real world! We have to reintroduce the information that chemistry
(unlike mathematics and computer science) lives in three dimensions only. In R 3 ,
any four vectors are linearly dependent, and hence a solution to (14) can be realized
in three-space if and only if any four of a1 , . . . , a6 are linearly dependent. In order
to express this in terms of the Si j , we introduce the abbreviations x = S13 , y = S35 ,
and z = S15 , and consider the Gramian matrix (Section 25.5)
The convert command transforms the set A \ {i, j} into a list, the data format
required for the second and third arguments of the SubMatrix command.
As noted above, any assignment of specific real values for the inner products
Si j for 1 ≤ i, j ≤ 6 coming from a conformation of the cyclohexane molecule in
three-space yields a common zero of the polynomials in F, and hence a zero of all
polynomials in I. More precisely, we have
{(S13 , S35 , S15 ) ∈ R 3 : ∃S11 , S12 , . . . , S66 such that (14) and (18) hold} = V (I).
We first look at a two-dimensional image of our situation. We are lucky: the set
F already contains exactly one polynomial in x and y only, namely g1 from (19).
It will turn out that its zero set X = V (g1 ), shown in Figure 24.4, is the projection
of V (F) onto the x, y-plane, and that with two exceptions, over each point of X lies
exactly one point of V (F).
The polynomial g1 is quadratic both in x and y and of total degree four, and for
a specific value u ∈ R, we can determine v ∈ R such that (u, v) ∈ V (g1 ) by solving
g1 (u, v) = 0:
p
−3u2 + 10u + 17 ± 8(27u4 + 48u3 − 24u2 − 44u − 7)
v= , (20)
9u2 + 6u − 23
as provided by M APLE’s command
g[1] := Determinant(SubMatrix(G, [1,2,3,6], [1,2,3,6]));
solve(g[1], y);
If the denominator in (20) does not vanish, then this equation has 0, 1, or 2 real
solutions v ∈ R, depending on whether the discriminant
32
(27u4 + 48u3 − 24u2 − 44u − 7)
81
690 24. Applications
x-y=0
–4 –2 0 2 4 x
x+y+2/3=0
g[1]=0 –2
–4
0.2
–0.2
Q y
–0.4
"boat"
–0.6
g[1]=0
–0.8
–1
of g1 (see Example 6.18) is negative, zero, or positive. The latter happens when
√ √
6 7 6
u < −1 − or − < u < −1 + or 1 < u, (21)
3 9 3
and single solutions occur for
√
6 7
u ∈ {−1 ± , − , 1} ≈ {−1.8165, −0.1835, −0.7778, 1} (22)
3 9
(the pink points in Figure 24.4), as certified by the M APLE command
factor(discrim(g[1], y));
√
For u = (−1 ± 2 6)/3 ≈ −0.3333 ± 1.6330 (the light blue points in Figure 24.4),
the coefficient of y2 in g1 , which equals the denominator in (20), vanishes, and
there is a unique v ∈ R such that g1 (u, v) = 0, namely
√
23u2 + 34u + 15 −1 ∓ 2 6
v= = . (23)
6u2 − 20u − 34 3
Figure 24.4 shows a plot of X and of a magnification of the central piece of X,
which will turn out to be the only piece relevant for our cyclohexane problem. (The
yellow triangle is explained below.) The points (−1/3, −1/3), (−1/3, −7/9), and
(−7/9, −1/3) marked in Figure 24.4 are the projections of the three “boat” points
in Figure 1.5 on page 15.
What is the precise connection between X and V (F)? Since g1 ∈ F, for any
(u, v, w) ∈ V (F), we have g1 (u, v) = 0, and thus the whole projection of V (F) is
contained in X. Conversely, if (u, v) ∈ X, can we find a w ∈ R such that (u, v, w) is
a common zero of the other eight polynomials in F? All of them have degree two
in z, and it would be nice if we could somehow eliminate the occurrences of z2 .
We take a close look, and find that the two polynomials
g2 (x, y, z) = det Ga2 ,a3 ,a5 ,a6
1
= (9x2 y2 + 18x2 yz + 9x2 z2 + 18xy2 z + 18xyz2 + 9y2 z2 + 30x2 y
9
+30x2 z + 60xy2 + 120xyz + 30xz2 + 60y2 z + 30yz2 + 16x2
+118xy + 98xz + 36y2 + 118yz + 16z2 + 50x + 60y + 50z + 21),
g3 (x, y, z) = det Ga1 ,a3 ,a4 ,a6 = g2 (y, x, z)
have the same leading coefficient (9x2 + 18xy + 9y2 + 30x + 30y + 16)/9 with re-
spect to the variable z. Thus
10 2
g4 = g3 − g2 = (3x y + 3x2 z − 3xy2 − 3y2 z + 2x2 + 2xz − 2y2 − 2yz + x − y)
9
10
= (x − y)(3xy + 3xz + 3yz + 2x + 2y + 2z + 1)
9
692 24. Applications
3uv + 2u + 2v + 1
w=− . (24)
3u + 3v + 2
as provided by
factor(eval(g[1], y=x));
The plane curve X contains no line and has degree four, and any line in the plane
intersects X in at most four points, counting multiplicities, by the famous theorem
of Bézout; see Section 6.8. We have found those four points for our special line
x = y (see Figure 24.4). We have to check separately—by using other polynomials
from F—that they yield the six points
1 1 1 1 1 7
(−3, −3, −3), (−3, −3, 1),
C = − ,− ,− , − ,− ,− ,
3 3 3 3 3 9
(26)
2√ 2√ 1√ 2√ 2√ 1√
1+ 6, 1 + 6, −1 − 6 , 1− 6, 1 − 6, −1 + 6
3 3 3 3 3 3
factor(eval(g[1], y = -x-2/3));
gives three values for u, and substituting each of them in the equation v = −u−2/3
yields the three intersection points
1 1 −1 + 2√6 −1 − 2√6 −1 − 2√6 −1 + 2√6
− ,− , , , , (27)
3 3 3 3 3 3
of the line V (3x + 3y + 2) with X (see Figure 24.4). The first one is the point Q
from above, which lies on the central piece and is actually a “double point” of the
intersection. The slope of the tangent to X at a point (u, v) is
∂g1 /∂x 18uv2 + 12uv + 6v2 − 46u − 20v − 34
− (u, v) = − ,
∂g1 /∂y 18u2 v + 6u2 + 12uv − 20u − 46v − 34
and its value at Q = (−1/3, −1/3) is −1. The line has the same slope, and so it
is the tangent at Q to X. We have already seen that there are precisely two points
of V (F) lying over Q. In the vicinity of the other two intersection points, the z-
coordinate grows unboundedly, and there are no points of V (F) lying above them.
Putting things together, we have found the following. In order to determine all
solutions S11 , S12 , . . . , S66 of (14) and (18), we can proceed as follows. We pick
a point P = (u, v, w) ∈ V (F), set S13 = u, S35 = v, S15 = w, and solve for the
remaining Si j via (14) and (16). To find P ∈ V (F), we pick a real number u as
in (21) or (22), determine v according to (20) or (23) (so that (u, v) ∈ X), and w
by (24) if (u, v) equals none of the points in (25) and (27); otherwise, P is one of the
six points in (26). We obtain similar solutions, with the roles of three coordinates
permuted, when we apply the M APLE command solve directly to the original
nine equations in F.
Finally, another constraint comes into play that we have ignored so far. Each
Si j is the cosine of an angle, and hence lies between −1 and 1. These inequalities,
applied to S14 , S25 , S36 , and using (16), show that all our physical solutions lie in
the polytope
2 2 2
A = {(u, v, w) ∈ [−1, 1]3 : u + v ≤ − , u + w ≤ − , v + w ≤ − },
3 3 3
The projection of A onto the first two coordinates is the yellow triangle in Fig-
ure 24.4. Thus neither the point (−3, −3, −3) nor the outlying branches of X con-
tribute physical
√ solutions, but C = (−1/3, −1/3, −1/3) (the “chair”) and −7/9 ≤
u ≤ −1 + 6/3 do, leading precisely to the points in V (F) ∩ A.
694 24. Applications
From the point of view of computer algebra, this completely solves the problem,
except that we would also have to calculate actual a1 , . . . , a6 from S11 , S12 , . . . , S66
(Exercise 24.6).
We now indicate how Gröbner bases lead to a somewhat more systematic way
of solving our problem than the above ad hoc approach. It may seem that this is a
bit of overkill, but the reader may imagine that the solution “by hand” is no longer
feasible for cycloheptane (seven carbon atoms), where 35 polynomial equations in
seven unknowns have to be solved, while Gröbner bases still work.
We take the lexicographical order with z ≻ y ≻ x. In M APLE, the commands
with(Groebner):
B := Basis(F, plex(z,y,x));
provide in a few seconds the reduced Gröbner basis B of V (F) consisting of the
four polynomials
f1 = 9g1 = 9x2 y2 + 6x2 y + 6xy2 − 23x2 − 20xy − 23y2 − 34x − 34y − 15,
f2 = 27x4 y + 27x4 z + 18x4 + 108x3 y + 108x3 z + 18x2 y + 18x2 z
−284x2 − 212xy − 212xz − 400x − 69y − 69z − 102.
f3 = −9x3 y − 9x3 z − 6x3 − 9x2 y − 9x2 z + 18x2 + 41xy + 41xz + 20yz
+54x + 21y + 21z + 18,
f4 = 9x2 z2 + 6x2 z + 6xz2 − 23x2 − 20xz − 23z2 − 34x − 34z − 15.
B := factor(B);
24.4. Cyclohexane revisited 695
shows that f1 , f3 and f4 are irreducible over Q. For f1 , for example, this follows
from the facts that f1 is primitive with respect to x, so that it has no nonconstant
factor in Q[y], and that f1 (x, 2) = 25(x2 − 2x − 7) is irreducible. The polynomial
f2 factors as f2 = (x + 3)(3x + 1) f5 , where
The first two factors can be found by computing contents with respect to y or z,
and f5 is irreducible because it is primitive with respect to x, so that it contains no
nonconstant factor in Q[y, z], and since f5 (x, 0, 0) = 2(3x2 − 10x − 17) is irreduci-
ble. Since a point in R 3 is a root of a polynomial f = gh if and only if it is either
a root of g or a root of h (or of both), we have V (I) = V (I1 ) ∪V (I2 ) ∪V (I3 ), where
I1 = h f1 , f3 , f4 , x + 3i, I2 = h f1 , f3 , f4 , 3x + 1i, I3 = h f1 , f3 , f4 , f5 i.
F1 := {B[1],B[3],B[4],op(1,B[2])}:
F2 := {B[1],B[3],B[4],op(2,B[2])}:
F3 := {B[1],B[3],B[4],op(3,B[2])}:
B1 := Basis(F1, plex(z,y,x));
B2 := Basis(F2, plex(z,y,x));
B3 := Basis(F3, plex(z,y,x));
compute the reduced Gröbner bases for the three new ideals with respect to the
lexicographic order and z ≻ y ≻ x. The op command in the first line extracts the
first operand x+3 of its second argument B[2], and analogously in the following
two lines. We obtain
B1 = {z2 + 2z − 3, yz + 3y + 3z + 9, y2 + 2y − 3, x + 3},
B2 = {27z2 + 30z + 7, 9yz + 3y + 3z + 1, 27y2 + 30y + 7, 3x + 1},
B3 = { f1 , f5 , f6 }, (28)
where
f6 = 3xy + 3xz + 3yz + 2x + 2y + 2z + 1.
All polynomials in B1 and B2 are products of linear factors; for example, the second
one in B1 is yz + 3y + 3z + 9 = (y + 3)(z + 3). The solutions of B1 and B2 are easily
determined:
V1 := solve({op(B1)});
V2 := solve({op(B2)});
The following three M APLE lines verify that of the six points in V (I1 ) and V (I2 ),
all but (−3, −3, −3) and C = (−1/3, −1/3, −1/3) are also contained in V (I3 ).
for v in V1 do
eval(B3, v);
end;
for v in V2 do
eval(B3, v);
end;
We note that f6 = 9g4 /(x − y), and proceeding as in our ad-hoc approach, we can
show that over each point of X = V ( f1 ) lies precisely one point of V (I3 ), and
hence V (F) is the disjoint union of V (I3 ) and two isolated points (−3, −3, −3)
and C = (−1/3, −1/3, −1/3).
Figure 1.5 on page 15 gives a three-dimensional plot of E = V (I3 ) ∩ A. We did
not proceed as described above to produce this plot, but instead took the equation
f5 (u, v, w) = 0 to obtain the third coordinate
and we used some ad hoc tricks. Our strategy can be summarized as follows: we
expressed the solutions as roots of polynomial equations, computed Gröbner bases,
and factored our polynomials whenever possible. A factorization splits the prob-
lem into smaller subproblems; each of these is better manageable than the big
problem. The bottleneck in this approach is typically the Gröbner basis computa-
tion.
Notes. 24.1. We refer the reader to Krajíček (1995), Urquhart (1995), Pitassi (1997), and
Beame & Pitassi (1998) for excellent surveys of the state of the art in proof systems. The
Nullstellensatz proof system was introduced by Beame, Impagliazzo, Krajíček, Pitassi &
Pudlák (1996). Gröbner proof systems are an active area of research; see Buss, Impagli-
azzo, Krajíček, Pudlák, Razborov & Sgall (1996/97) and Razborov (1998); they are also
called polynomial calculus systems . A central measure of the complexity of such a system
is the degree of the polynomials that occur. The pigeonhole principle for n = 100 is stated
in Schwenter (1636), 53. Auffgab.
24.2. Peterson (1981) and Reisig (1985) give general introductions to Petri nets. Our alge-
braic description of Petri nets follows Mayr (1992). Further results about the reachability
problem for Petri nets and connections to finitely presented commutative semigroups and
vector addition systems are given in Mayr (1995). The fact about binomial ideals is due to
Eisenbud & Sturmfels (1996).
Mayr (1984) solved a long-standing open problem by showing that reachability for Petri
nets is decidable. His algorithm is not primitive recursive, but the best proven lower bound
for reachability is EX PSPA CE -hardness. In fact, reachability for reversible Petri nets is
EX PSPA CE-complete.
24.3. See also Notes 23.4. The method of creative telescoping for proving hypergeometric
identities was pioneered by Doron Zeilberger (1990a, 1990b, 1991), who has his comput-
ers publish papers (Ekhad 1990, Ekhad & Tre 1990) (“And how many papers did your
workstation publish?”), and Herbert Wilf (Wilf & Zeilberger 1990, 1992), and started a
debate about the price of mathematical theorems (Zeilberger 1993, Andrews 1994). Wilf
and Zeilberger received the 1998 Leroy P. Steele Prize for a Seminal Contribution to Re-
search, as reported in the April 1998 issue of the Notices of the AMS. We refer the reader
to the well-written books of Petkovšek, Wilf & Zeilberger (1996) and Koepf (1998) for
further reading on the method.
Our proof of (6) actually uses WZ-pairs , as invented by Wilf & Zeilberger (1990); see
also Wilf (1994), §4.4. Van der Poorten (1978) gives an overview of Apéry’s proof. Paule’s
(1994) computer-generated proof of (11) can be verified using only high-school algebra.
For the analysis of algorithms, it is often sufficient to have only an asymptotic approxi-
mation for a sum such as (6), in particular in those cases where (provably) no closed form
exists. There is a powerful general tool, generating functions , which together with singu-
larity analysis from complex analysis provides a standard means to obtain such asymptotic
expansions (Flajolet & Odlyzko 1990, Vitter & Flajolet 1990, Flajolet, Salvy & Zimmer-
mann 1991, Odlyzko 1995a). The software package ΛΥΩ (Flajolet, Salvy & Zimmer-
mann 1989a, 1989b, Salvy 1991, Zimmermann 1991) automates this process; it is inte-
grated in the combstruct package of the M APLE library A LGOLIB developed at INRIA
(http://algo.inria.fr/libraries/software.html). Sedgewick & Flajolet (1996)
is a very readable textbook in this area.
698 24. Applications
24.4. Most introductory texts on organic chemistry discuss the cyclohexane conforma-
tions; see for example Wade (1995). Sachse (1890, 1892) first postulated the existence of
infinitely many flexible and a single rigid conformation of cyclohexane. Oosterhoff (1949)
and Hazebroek & Oosterhoff (1951), whose goal was to determine the potential energy of
cyclohexane conformations, pioneered the approach given here, based on the inner prod-
ucts. Levelt (1997) applied computer algebra to the problem, actually to cycloheptane; our
presentation is based on Levelt’s work. Levelt cautions: Does it matter to the chemists?
The answer is “NO!”. In our model the geometry rules: the ‘building blocks’ are rigid;
in chemistry energy rules and nothing is rigid. Distances between carbon molecules may
vary, just as angles between bonds. The molecule is viewed as a conglomerate of atoms
kept together by the various forces between the constituents. The geometry of the molecule
is the result of the balance of the forces. The chemist’s flexible model is the opposite of
the rigid one in this [model]. Of course, one might now take the formulas for the potential
energy and process them with computer algebra tools; we do not know whether something
really useful comes out of that.
In their important paper, Gō & Scheraga (1970) found that for a cyclic molecule with
n ≥ 6 carbon atoms, the solution space of possible conformations is (n − 6)-dimensional in
the generic case. This does not apply to cyclohexane, where n = 6 and we have seen that the
solution space contains a one-dimensional component. For more recent contributions and
references, see Havel & Najfeld (1995) and Emiris & Mourrain (1999). The cyclohexane
problem is closely related to the well-studied 6R inverse kinematics problem from robotics;
see Parsons & Canny (1994) for references and an overview on related problems.
M OLGEN is a general purpose computer chemistry system (Benecke, Grund, Hohberger,
Kerber, Laue & Wieland 1995).
The two equations (24) and (29) are equivalent since
so that both equations determine the same value of w whenever (u, v) ∈ X = V (g1 ) and
both are defined.
Exercises.
24.1−→ Refute PHP2 with both the Nullstellensatz and the Gröbner proof system.
24.2 Prove that the Petri net in Figure 24.1 is not reversible. Hint: Consider the marking M(s1 ) = 2,
M(s2 ) = M(s3 ) = 0.
24.3∗ Prove (6) for all nonnegative integers n ≥ w ≥ s by double induction on w and n.
24.4∗ Let V be a vector space over a field F and ⋆:V ×V −→ F an inner product on V . For a finite
sequence a1 , . . ., an ∈ V of vectors, let G = (ai ⋆ a j ) ∈ F n×n be the Gramian matrix of a1 , . . ., an .
(i) Show that det G = 0 if and only if a1 , . . ., an are linearly dependent.
(ii) Conclude that the rank of G is equal to the rank of {a1 , . . ., an }.
Exercises 699
This appendix presents some of the basic notions used throughout the text, for the reader’s
reference. By necessity, this is kept rather short and without proofs; we indicate, however,
reference texts where these can be found. The reader is required to either have previous
acquaintance with the material or be willing to read up on it. Our presentation is too concise
for self-study; its purpose is to fix the language and point the reader to those areas, if any,
where she needs brushing up.
The first five sections deal with algebra: groups, rings, polynomials and fields, finite
fields, and linear algebra. Then we discuss finite probability spaces. After this mathe-
matical background come some fundamentals from computer science: O-notation and a
modicum of complexity theory.
25.1. Groups
The material of the first three sections can be found in any basic algebra text, such as
Hungerford (1990) or the latest edition of van der Waerden’s (1930b, 1931) classic on
Modern Algebra.
A group consists of a set G together with an associative binary operation ·, a neutral element 1 ∈ G with 1 · a = a · 1 = a for all a ∈ G, and for each a ∈ G an inverse a^{−1} with a · a^{−1} = a^{−1} · a = 1; the group is commutative (or abelian) if a · b = b · a for all a, b ∈ G. The group is denoted by (G; ·, 1, ^{−1}), but usually just the set name G is sufficient.
It is usual, for convenience of notation, to omit the symbol · from products. Thus a · b becomes the simpler ab. We will also frequently need to distinguish between two group operations. The alternate notation + for ·, 0 for 1, and −a instead of a^{−1} is then used. The first representation is called a multiplicative group and the new one is called an additive group, denoted by (G; +, 0, −).
Familiar examples are the additive groups of Z, Q, R, and C, the multiplicative groups of
Q\{0}, R\{0}, and C\{0}, and for any n ∈ N, the additive group Zn = {0, 1, 2, . . . , n − 1} with addition modulo n and the multiplicative group Zn^× = {1 ≤ a < n: gcd(a, n) = 1} with multiplication modulo n.
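These two groups are easy to experiment with. The following small sketch (in Python, which is not the language of this text; all names are ours) lists Z12 and Z12^× and checks closure and the existence of inverses:

    from math import gcd

    n = 12
    Zn = list(range(n))                                  # the additive group Z_12
    units = [a for a in range(1, n) if gcd(a, n) == 1]   # the multiplicative group Z_12^x
    print(Zn)        # [0, 1, ..., 11]
    print(units)     # [1, 5, 7, 11]
    # closure of Z_12^x: the product of two units modulo n is again a unit
    assert all((a * b) % n in units for a in units for b in units)
    # every unit has an inverse modulo n
    assert all(any((a * b) % n == 1 for b in units) for a in units)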
For n ∈ N>1 , the symmetric group Sn consists of all permutations of the elements
{1, . . . , n}:
Sn = {σ: {1, . . . , n} −→ {1, . . . , n}: σ bijective};
the group operation is the composition ◦ of maps. This group has #Sn = n! elements and is
not commutative if n ≥ 3.
25.2. Rings
A ring is an algebraic structure with two operations, as follows: a set R together with an addition + making (R; +, 0, −) a commutative group, an associative multiplication · with a neutral element 1 (so that 1 · a = a · 1 = a for all a ∈ R), and the distributive laws a · (b + c) = a · b + a · c and (a + b) · c = a · c + b · c for all a, b, c ∈ R. The ring is commutative if a · b = b · a for all a, b ∈ R.
Familiar examples are Z, Q, R, and C, with the usual addition and multiplication, and,
for all n ∈ N>0 , Zn with addition and multiplication modulo n, and the set R n×n of all
n × n matrices with real entries, with matrix addition and matrix multiplication. All these
examples, except the matrices, form a commutative ring. Matrix rings briefly occur in
Chapter 12.
In this book, all rings are commutative with 1 unless otherwise stated.
In general, we write
⟨a1, . . . , as⟩ = a1 R + · · · + as R = {a1 r1 + · · · + as rs : r1, . . . , rs ∈ R}
for the ideal generated by a1, . . . , as, and say that a1, . . . , as is a basis of that ideal. In particular, ⟨a⟩ = aR = {ar: r ∈ R} is the principal ideal generated by a ∈ R. We note an ambiguity inherent in the notation ⟨a⟩, as exemplified by 12Z = “⟨12⟩” ≠ 12Q = Q.
Suppose that I ⊆ R is an ideal, and r, s ∈ R. We say that r and s are congruent modulo I
(written as “r ≡ s mod I”) if r − s ∈ I. As an example, with R = Z, we have 14 ≡ 2
mod 12Z, which we also write as 14 ≡ 2 mod 12. If a, b ∈ R, we write a | b (“a divides b”)
if there exists some r ∈ R with ar = b, and a ∤ b otherwise.
For r ∈ R, the set r mod I = r + I = {r + a: a ∈ I} ⊆ R is a residue class modulo I or a
coset of the ideal I. (Note the distinction between the congruence relation modulo I, as in
14 ≡ 2 mod 12Z, where the “ mod ” belongs to the ≡ sign, and the residue class modulo I,
as in 2 mod 12Z.) We have the following equivalences:
r mod I = s mod I ⇐⇒ r − s ∈ I ⇐⇒ r ≡ s mod I
for all r, s ∈ R. The set R/I = {r mod I: r ∈ R} of all residue classes modulo I is again a ring
if we define the ring operations by (r mod I) + (s mod I) = (r + s) mod I and (r mod I) ·
(s mod I) = (rs) mod I. It is called the residue class ring (or factor ring) of R modulo I.
We have the canonical ring homomorphism ϕ: R −→ R/I mapping r ∈ R to its residue
class r mod I. For instance, if R = Z and I = 12Z, then R/I = Z/12Z = {0 mod 12Z,
1 mod 12Z, 2 mod 12Z, . . . , 11 mod 12Z}, and ϕ(14) = 14 mod 12Z = 2 mod 12Z. We
will also write 2 mod 12, or even simply 2, for 2 mod 12Z, thus identifying the residue
class ring Z/12Z with Z12 = {0, 1, 2, . . . , 11}.
More generally, a set of elements S ⊆ R is a system of representatives for I if for all
a ∈ R there exists exactly one b ∈ S such that a ≡ b mod I. For example, {0, 1, 2, . . . , 11}
is a system of representatives for I = 12Z; another one is {−5, −4, . . . , 4, 5, 6}. There are
many other such systems. A system of representatives can be made into a ring again, by
using multiplication and addition modulo I, and then S ≅ R/I.
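Concretely, computing in Z/12Z with the representatives {0, 1, . . . , 11} just means reducing after every ring operation; a small Python sketch (illustrative only, names ours):

    n = 12  # the residue class ring Z/12Z, represented by {0, ..., 11}

    def add(a, b):
        return (a + b) % n

    def mul(a, b):
        return (a * b) % n

    print(add(7, 8))          # 3, since 7 + 8 = 15 = 12 + 3
    print(mul(7, 8))          # 8, since 7 * 8 = 56 = 4*12 + 8
    # 14 and 2 lie in the same residue class modulo 12Z
    print(14 % n == 2 % n)    # True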
As for groups, there is also a homomorphism theorem for rings. If R and S are rings,
ϕ: R −→ S is a ring homomorphism, and I = ker ϕ = {r ∈ R: ϕ(r) = 0} ⊆ R is the kernel
of ϕ, then I is an ideal of R, and R/I is isomorphic to the subring ϕ(R) = {ϕ(r): r ∈ R}
of S, the image of ϕ.
If R and S are rings, then the ring R × S = {(r, s): r ∈ R, s ∈ S} is the direct product of R
and S. The ring operations are defined componentwise: (r1 , s1 )+(r2 , s2 ) = (r1 +r2 , s1 +s2 )
and (r1 , s1 ) · (r2 , s2 ) = (r1 r2 , s1 s2 ) for all r1 , r2 ∈ R and s1 , s2 ∈ S.
We can add more and more properties to rings, to get domains in which more interesting
things can be done, for instance computing greatest common divisors or factoring. The first
restriction is to consider integral domains, which are nontrivial commutative rings without
nonzero zero divisors. Here, nontrivial means that 1 and 0 are distinct, and a zero divisor
is an element a ∈ R for which there is a nonzero element b ∈ R such that ab = 0. (Is Z12
or Z7 an integral domain?) Thus 0 is a zero divisor in any ring. In an integral domain we
have the useful fact that if a ≠ 0 and ab = ac then b = c, known as the cancellation law.
For the ring of integers, we have two further interesting properties.
◦ Division property: ∀a, b ∈ Z with b ≠ 0 ∃! q, r ∈ Z such that a = qb + r and 0 ≤ r < |b|.
◦ Unique factorization: Every integer greater than 1 has an (essentially) unique factor-
ization as a product of primes.
These properties generalize to other rings. To talk about the division property we need an
extra function to satisfy the role played by the absolute value in the integer case. An integral
The ring Z[i] of Gaussian integers is the special case where d = −1. The norm N: R −→ Z
is defined by N(a) = aa = |a|2 = b2 + c2 , where a denotes the complex conjugate of a =
b + ic, with b, c ∈ R, and takes only nonnegative values. This norm is a Euclidean function
on R if and only if d ∈ {−1, −2, −3, −7, −11}, and these are the only cases where R is
Euclidean at all. Furthermore, R is a Unique Factorization Domain if and only if R is
Euclidean or d ∈ {−19, −43, −67, −163}.
There exist integral domains that are not even UFDs. The classical example (from Dirichlet’s (1893) Zahlentheorie, page 451) is R = O_{−5} = Z + Z√−5, the ring of algebraic integers of Q(√−5). In this ring,
(1 + √−5) · (1 − √−5) = 6 = 2 · 3
exhibits two essentially different factorizations of 6 into irreducible elements, so that R is not a UFD.
T HEOREM 25.3.
For an integral domain R, the following are equivalent.
(i) R is a UFD.
(ii) Any nonzero nonunit in R can be written as a product of primes.
(iii) Any nonzero nonunit in R can be written as a product of irreducibles, and any irre-
ducible in R is prime.
(iv) Any nonzero nonunit in R can be written as a product of irreducibles, and any two
nonzero elements of R have a gcd in R.
In particular, since gcd’s exist in Euclidean domains (Chapter 3), every Euclidean do-
main is a UFD. The reverse is false in general, as we have seen above: O−19 is a UFD but
not a Euclidean domain. Other examples are in Exercises 3.17 and 21.1.
We also use polynomials in two or more variables. R[x][y] consists of univariate polyno-
mials in y with coefficients in R[x], but by collecting powers of x, we may as well consider
its elements as univariate polynomials in x with coefficients in R[y]. To reflect this symme-
try, we use the notation R[x, y], and more generally R[x1 , . . . , xn ]. We denote the degree and
the leading coefficient of such a multivariate polynomial a with respect to the variable xi
by deg_{x_i} a and lc_{x_i}(a), respectively. The total degree of a multivariate monomial x_1^{e_1} · · · x_n^{e_n} is e_1 + · · · + e_n, and the total degree of a ≠ 0 is the maximal total degree of its monomials.
If R is a commutative ring or an integral domain, then so is R[x]. Gauß’ famous theo-
rem 6.8 shows that R[x] is a UFD if R is. We might hope that the same holds for Euclidean
domains. However, the division property goes away (say, in Z[x], as can be seen when you try to divide x^2 + 3 by 3x + 1). The division property does hold when the leading coefficient of the divisor b is a unit of R (Section 2.4).
If R is integral, then the units of R[x] are simply the units of R, where we use the natural identification of R with polynomials of degree 0. Irreducibles are a bit trickier. For instance, x^2 + 1 is irreducible in Z[x] and Z3[x], but in Z5[x], x^2 + 1 = (x + 2)(x − 2).
The following lemma states an important property of polynomials over integral domains.
LEMMA 25.4. Let R be a commutative ring, f ∈ R[x], and u ∈ R.
(i) u is a root of f, that is, f (u) = 0, if and only if x − u divides f.
(ii) If R is an integral domain and f ≠ 0, then f has at most deg f many roots in R.
Claim (ii) is not true in general rings: f = x^2 ∈ Z16[x] has the four roots 0, 4, 8, and 12.
For m ∈ Z, the canonical ring homomorphism ϕ: Z −→ Zm can be applied to each co-
efficient of a polynomial; this yields a homomorphism Z[x] −→ Zm [x] that is usually also
denoted by ϕ. Its kernel is m · Z[x], the ideal of polynomials with all coefficients divisible
by m. When u is an element of a ring R, then the evaluation homomorphism ε: R[x] −→ R
takes a polynomial f ∈ R[x] to ε( f ) = f (u) ∈ R. Its kernel is ⟨x − u⟩, by Lemma 25.4 (i), and the homomorphism theorem for rings shows that R[x]/⟨x − u⟩ ≅ im ε = R.
More generally, we have the canonical homomorphism R[x] −→ R[x]/⟨m⟩ for any m in R[x]. If m is nonconstant and monic, then the polynomials f ∈ R[x] of degree less than deg m form a system of representatives for ⟨m⟩, and hence they form the factor ring R[x]/⟨m⟩, with addition and multiplication modulo m. When computing in R[x]/⟨m⟩, one
usually takes these representatives. The evaluation homomorphism ε is the special case
m = x − u.
If R and S are rings, ϕ: R −→ S is a ring homomorphism, f ∈ R[x1, . . . , xn] a polynomial in n variables, and r1, . . . , rn are elements of R, then ϕ( f (r1, . . . , rn)) = ϕ( f )(ϕ(r1), . . . , ϕ(rn)), where ϕ( f ) ∈ S[x1, . . . , xn] denotes the polynomial obtained from f by applying ϕ to each coefficient.
A field is an integral domain in which every nonzero element is a unit, that is, it
has a multiplicative inverse. Familiar examples are the fields Q of the rational num-
bers, the field R of the real numbers, and the field C of the complex numbers. We have
Q ⊆ R ⊆ C. The polynomial ring F[x] over a field F is Euclidean.
The number of elements in a field or ring is called its order. The above fields all have
infinite order, however, there are fields of finite order too. Among them are the fields Z p ,
where p is a prime. The existence of the inverse of a nonzero element a in Z p follows from
the fact that for 1 ≤ a < p, the (traditional) Extended Euclidean Algorithm 3.6 computes
s,t ∈ Z such that 1 = as + pt ≡ as mod p. Finite fields will be discussed in the next section.
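The following Python sketch mirrors this computation of inverses modulo a prime; it is only an illustration of the idea, not the text’s Algorithm 3.6, and the function names are ours:

    def extended_gcd(a, b):
        # returns (g, s, t) with g = gcd(a, b) = s*a + t*b
        if b == 0:
            return a, 1, 0
        g, s, t = extended_gcd(b, a % b)
        return g, t, s - (a // b) * t

    def inverse_mod(a, p):
        # multiplicative inverse of a modulo a prime p, from 1 = s*a + t*p
        g, s, t = extended_gcd(a % p, p)
        assert g == 1
        return s % p

    p = 13
    for a in range(1, p):
        assert a * inverse_mod(a, p) % p == 1
    print(inverse_mod(5, 13))   # 8, since 5 * 8 = 40 = 3*13 + 1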
If we consider the ring Z3 × Z3 of order 9, then its multiplicative identity is (1, 1), and
(1, 1) + (1, 1) + (1, 1) = (0, 0). This leads us to define the characteristic char R of a ring
or field R to be the minimum number of times the identity element can be added to itself
to get 0. In the case where this can never produce zero, the ring or field is said to have
characteristic zero. Q, R, and C are fields of characteristic zero, and the characteristic of
Z p is p.
If R is an integral domain, then K = {a/b: a, b ∈ R, b 6= 0} is the field of fractions of R.
For example, Q is the field of fractions of Z, and F(x), the set of rational functions in x
with coefficients in the field F, is the field of fractions of the polynomial ring F[x].
If a field F is contained in another field E, then we say that E is an extension field
of F and F is a subfield of E. For instance, C is an extension field of R, and R is an
extension field of Q. E is a vector space over F (see Section 25.5). An element α ∈ E is
algebraic over F if it is the root of a polynomial f ∈ F[x]: f (α) = 0 (or equivalently, if the F-subspace of E generated by 1, α, α^2, . . . is finite dimensional). Thus all elements of F are algebraic over F, and i = √−1 ∈ C is algebraic over Q and R (taking f = x^2 + 1). Elements
that are not algebraic are called transcendental. For example, π and e are transcendental
over Q (see Notes 4.6). If all elements of E are algebraic over F, then we say that E is an
algebraic extension of F. For example, C is an algebraic extension of R, but not of Q.
If the dimension of E as a vector space over F is finite, then we say that E is a finite
extension of F. The dimension is denoted by [E : F], also called the degree of E over F.
All finite extensions are algebraic. If F ⊆ E ⊆ K are finite extensions, we have the degree
formula [K : F] = [K : E] · [E : F].
If α ∈ E is algebraic over F, then the set I = { f ∈ F[x]: f (α) = 0} of all polynomials
with coefficients in F that have α as a root is an ideal in F[x]. Since F[x] is a Euclidean
domain, every ideal in F[x] is generated by a single element, namely the unique nonzero
monic polynomial mα of least degree in I, so that I = ⟨mα⟩. It is called the minimal
polynomial of α. The minimal polynomial mα is irreducible, since otherwise there would
be a divisor of mα of smaller degree that has α as a root. Since mα generates I, all other
polynomials having α as a root are divisible by mα , and it is the only monic polynomial
with that property. The degree of α over F is deg mα . If F(α) ⊆ E denotes the smallest
subfield of E containing F and α, then deg mα = [F(α) : F]. For example, if E = C, F = R, and α = i, then m_i = x^2 + 1, R(i) = C, and [C : R] = 2.
We may construct an algebraic field extension of a field F by taking E = F[x]/⟨ f ⟩ for an irreducible polynomial f ∈ F[x]. Since F[x] is Euclidean, the Extended Euclidean Algorithm 3.14 computes s, t ∈ F[x] with 1 = as + f t ≡ as mod f for all nonzero a ∈ F[x] with deg a < deg f . This shows that all nonzero elements of E are invertible, and E is an extension field of F if we identify F with the set of constant polynomials in E; in fact, it is an algebraic extension. The polynomial f ∈ F[x] has α = (x mod f ) ∈ E as a root (Lemma 4.5); actually, it is the minimal polynomial of α. If deg f = n, then the elements α^{n−1}, . . . , α^2, α, 1 ∈ E are a basis of E over F. Thus E = F(α) and [E : F] = n.
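As a concrete instance, F9 can be realized as F3[x]/⟨x^2 + 1⟩, since x^2 + 1 is irreducible over F3 (see above). The following Python sketch (illustrative only; the pair representation c0 + c1·α and all names are ours) checks that the nine elements form a field and that α is a root of x^2 + 1:

    p = 3                      # F_9 = F_3[x]/<x^2 + 1>: elements are c0 + c1*alpha
                               # with alpha^2 = -1 = 2 in F_3

    def add(a, b):
        return ((a[0] + b[0]) % p, (a[1] + b[1]) % p)

    def mul(a, b):
        # (a0 + a1*alpha)(b0 + b1*alpha) = a0*b0 - a1*b1 + (a0*b1 + a1*b0)*alpha
        return ((a[0] * b[0] - a[1] * b[1]) % p, (a[0] * b[1] + a[1] * b[0]) % p)

    elements = [(c0, c1) for c0 in range(p) for c1 in range(p)]
    nonzero = [e for e in elements if e != (0, 0)]
    # every nonzero element has a multiplicative inverse, so this is a field
    assert all(any(mul(a, b) == (1, 0) for b in nonzero) for a in nonzero)
    # alpha = (0, 1) satisfies alpha^2 + 1 = 0
    assert add(mul((0, 1), (0, 1)), (1, 0)) == (0, 0)
    print(len(elements))       # 9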
On the other hand, if E is any extension field of F and α ∈ E is algebraic over F, with
minimal polynomial f ∈ F[x], then F(α), the smallest subfield of E containing F and α,
is isomorphic to F[x]/⟨ f ⟩. This follows from the homomorphism theorem for rings, since the homomorphism F[x] −→ E which evaluates at α has kernel ⟨ f ⟩ and image F(α). For example, R[x]/⟨x^2 + 1⟩ and C are isomorphic fields, under an isomorphism that associates x mod (x^2 + 1) to i.
An algebraic field extension E of a field F is the splitting field of a nonconstant poly-
nomial f ∈ F[x] if f splits into linear factors over E, but not over any proper subfield of E.
A field F is algebraically closed if and only if every nonconstant polynomial f ∈ F[x]
has a root in F; then f has deg f many roots, counting multiplicities. The fundamental
theorem of algebra says that the field C of complex numbers has this property. A smallest
algebraically closed field containing F (with no proper subfield enjoying this property) is
called an algebraic closure of F; this always exists.
Figure 25.1 illustrates the classes of rings that we have discussed so far and their containment relations: every field is a Euclidean domain, every Euclidean domain is a UFD, every UFD is an integral domain, every integral domain is a commutative ring, and every commutative ring is a ring.
25.4. Finite fields
For every prime power q = p^n, there exists a field with q elements. All such fields are (non-canonically) isomorphic to each other, and we write Fq for any of them. In particular, Fp = Zp for a prime p, but it is important to remember that F_{p^n} and Z_{p^n} are not isomorphic for n ≥ 2. On the other hand, every finite field has p^n elements, for some prime p and n ≥ 1, and is isomorphic to Zp[y]/⟨ f ⟩ for some irreducible polynomial f ∈ Zp[y] of degree n. The characteristic of F_{p^n} is p.
Fermat’s little theorem 4.9 says that a^{p−1} = 1 for a prime p and all a ∈ Fp^×, hence a^p = a for all a ∈ Fp. This holds in arbitrary finite fields:
x^q − x = ∏_{a∈Fq} (x − a)   in Fq[x].
PROOF. Lagrange’s theorem implies that each element g of a group with m elements satisfies g^m = 1. The unit group Fq^× = Fq\{0} has q − 1 elements, so that a^{q−1} = 1 for all nonzero a ∈ Fq, and a^q = a for all a ∈ Fq. Thus x − a divides x^q − x for all a ∈ Fq, and since gcd(x − a, x − b) = 1 for a ≠ b, we have that ∏_{a∈Fq}(x − a) divides x^q − x. Both polynomials are monic and have degree q, and hence they are equal. ✷
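The identity is easy to verify numerically for a small prime power; the following Python sketch (illustrative only, helper names ours) multiplies out ∏_{a∈F7}(x − a) with coefficients reduced modulo 7, storing polynomials as coefficient lists of increasing degree:

    q = 7   # work in F_7[x]

    def polymul(f, g):
        # product of two coefficient lists, reduced modulo q
        h = [0] * (len(f) + len(g) - 1)
        for i, fi in enumerate(f):
            for j, gj in enumerate(g):
                h[i + j] = (h[i + j] + fi * gj) % q
        return h

    prod = [1]
    for a in range(q):
        prod = polymul(prod, [(-a) % q, 1])             # multiply by (x - a)

    xq_minus_x = [0, (-1) % q] + [0] * (q - 2) + [1]    # coefficients of x^q - x
    print(prod == xq_minus_x)                           # True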
[Figure 25.2: the subfields of F_{q^12}: Fq at the bottom, F_{q^2} and F_{q^3} above it, F_{q^4} and F_{q^6} above those, and F_{q^12} at the top.]
If a finite field F_{q^m} is contained in another finite field F_{q^n}, then F_{q^n} is a vector space over F_{q^m}, and in particular the number of elements #F_{q^n} = q^n is a power of #F_{q^m} = q^m, or equivalently, m | n. Conversely, if m | n, then F_{q^n} is an extension field of (an isomorphic copy of) F_{q^m}, namely the set of all roots of x^{q^m} − x in F_{q^n}. For example, F4 is a subfield
of F16 , but F8 is not, despite the fact that 8 | 16. Figure 25.2 shows the lattice of all sub-
fields of Fq12 corresponding to the lattice of divisors of 12; a field is contained in another
one if there is a path from the latter down to the former. (A different notion of “lattice” is
used in Chapter 16.)
The order of the multiplicative group Fq^× is q − 1. Fermat’s little theorem implies that ord(a) | q − 1 for all a ∈ Fq^×. An element a ∈ Fq^× is primitive if it generates the group Fq^×, or equivalently, if its order is q − 1. Fq^× contains a primitive element, so that Fq^× is cyclic (Exercise 8.16). More generally, Fq contains an element of order n if and only if n | (q − 1) (Lemma 8.8).
A ring R containing Fp is called an Fp-algebra. A fundamental property is that for any commutative Fp-algebra R, elements a, b ∈ R, and i ∈ N, we have
(a + b)^{p^i} = a^{p^i} + b^{p^i}.
This is proved by induction on i; for i = 1, all binomial coefficients \binom{p}{k} with 0 < k < p in the expansion of the left hand power are divisible by p, and hence 0 in R. Let F_{q^n} be an extension field of Fq. The map
ϕ: F_{q^n} −→ F_{q^n},   α ↦ α^q
is an automorphism of the finite field F_{q^n}, called the Frobenius automorphism. The following hold for all α, β ∈ F_{q^n}:
(α + β)^q = α^q + β^q,   (αβ)^q = α^q β^q,   α^q = α ⇐⇒ α ∈ Fq.   (1)
The last property, an immediate consequence of Fermat’s little theorem, says in the language of Galois theory that Fq is the fixed field of ϕ.
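For the prime field Fp = Zp these identities can be checked directly with integer arithmetic modulo p; a tiny Python sketch (p = 5 is an arbitrary choice of ours):

    p = 5   # check the Frobenius identities in F_p = Z_p

    for a in range(p):
        for b in range(p):
            # (a + b)^p = a^p + b^p  and  (a*b)^p = a^p * b^p  in F_p
            assert pow(a + b, p, p) == (pow(a, p, p) + pow(b, p, p)) % p
            assert pow(a * b, p, p) == (pow(a, p, p) * pow(b, p, p)) % p
        # a^p = a: every element of F_p is fixed by the Frobenius map
        assert pow(a, p, p) == a % p
    print("Frobenius identities hold modulo", p)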
Similarly, in any Fq-algebra R, we have the Frobenius endomorphism
ϕ: R −→ R,   α ↦ α^q.   (2)
Of particular importance is the case R = Fq[x], which also shows that ϕ is not surjective in general.
If f ∈ Fq[x] is irreducible of degree n and α ∈ F_{q^n} ≅ Fq[x]/⟨ f ⟩ is a root of f, then f (α^q) = f (α)^q = 0, so that α^q is also a root of f. More generally, the roots of f in F_{q^n} are precisely the n conjugates α, α^q, α^{q^2}, . . . , α^{q^{n−1}} of α.
In computer algebra, both the finite fields F_{p^n} and the finite commutative rings Z_{p^n}, each with p^n elements, play a role. If n ≥ 2, then these are non-isomorphic objects, since the former is a field while the latter has nonzero zero divisors. Another difference is that char F_{p^n} = p and its additive group is isomorphic to Z_p^n, the direct product of n copies of Zp, while char Z_{p^n} = p^n and its additive group is cyclic.
25.5. Linear algebra
A vector space V over a field F is an additive group (V ; +, 0, −) together with a scalar multiplication ·: F × V −→ V such that
◦ λ · (v + w) = λ · v + λ · w,
◦ (λ + µ) · v = λ · v + µ · v,
◦ λ · (µ · v) = (λµ) · v,
◦ 1 · v = v
for all λ, µ ∈ F and v, w ∈ V . We will write λv instead of λ · v for short. The elements of V
are called vectors, those of F scalars. The most popular example of a vector space is F n
for some n ∈ N, whose elements are n-tuples (a1 , . . . , an ) of elements a1 , . . . , an ∈ F, with
componentwise addition and scalar multiplication.
A subset U of a vector space V is a subspace of V if it is closed under addition and
scalar multiplication, so that u + v and λv are again in U for all u, v ∈ U and λ ∈ F. A fi-
nite sequence v1 , . . . , vn ∈ V of vectors is called linearly dependent if there exist scalars
λ1 , . . . , λn ∈ F, not all zero, such that λ1 v1 + · · · + λn vn = 0. Otherwise, v1 , . . . , vn are lin-
early independent. The subspace generated by the vectors v1, . . . , vn ∈ V is the set of all linear combinations ⟨v1, . . . , vn⟩ = {λ1 v1 + · · · + λn vn : λ1, . . . , λn ∈ F}. A vector space V is finite-dimensional if it is generated by finitely many vectors. A finite sequence (v1, . . . , vn) of elements of V is a basis of V if the vectors are linearly independent and ⟨v1, . . . , vn⟩ = V.
A central theorem in linear algebra is that any finitely generated vector space V has a finite
basis (and any generating sequence contains one), and that all bases have the same number
of elements, called the dimension dimV of V . For example, dim F 3 = 3, and a basis is
given by the three unit vectors (1, 0, 0), (0, 1, 0), and (0, 0, 1). More generally, we have
dim F n = n for all n ∈ N>0 . With respect to a basis (v1 , . . . , vn ) of V , every vector v has a
unique representation v = λ1 v1 + · · · + λn vn as a linear combination of the basis elements,
with coordinates λ1 , . . . , λn ∈ F.
A map f :V −→ W between two vector spaces over the same field F is (F -)linear or
a homomorphism if f (v1 + v2 ) = f (v1 ) + f (v2 ) and f (λv1 ) = λ f (v1 ) for all λ ∈ F and
v1 , v2 ∈ V . The notions endo-, iso-, and automorphism are defined similarly as for groups.
V and W are isomorphic if there exists an isomorphism between them. If V and W are
finite-dimensional, then they are isomorphic if and only if dimV = dimW . The image
im f = { f (v): v ∈ V } of a homomorphism f :V −→ W is a subspace of W , and the kernel
ker f = {v ∈ V : f (v) = 0} is a subspace of V . As for groups, f is injective if and only
if ker f = {0}, and f is surjective if and only if im f = W . An equivalent of Lagrange’s
theorem is the dimension formula for homomorphisms:
dim ker f + dim im f = dimV, (3)
if V is finitely generated.
If V and W are vector spaces over F with bases v1, . . . , vn and w1, . . . , wm, respectively, then to a homomorphism f : V −→ W corresponds the m × n matrix A = (a_{ij})_{1≤i≤m, 1≤j≤n} ∈ F^{m×n} defined by
f (v_i) = a_{1i} w_1 + · · · + a_{mi} w_m,   (4)
and then
A · (λ1, . . . , λn)^T = (µ1, . . . , µm)^T  ⇐⇒  f (λ1 v1 + · · · + λn vn) = µ1 w1 + · · · + µm wm
for arbitrary λi, µj ∈ F. Conversely, for any matrix A ∈ F^{m×n}, (4) defines a homomorphism
f :V −→ W , and the kernel and image of A are defined to be those of f . The rank of A is
dim(im A), or equivalently, the maximal number of linearly independent columns (or rows)
of A. Composition of homomorphisms corresponds to multiplication of matrices.
A square matrix A = (Ai j )1≤i, j≤n ∈ F n×n is nonsingular (or invertible) if there exists
a matrix B ∈ F n×n such that AB = In , where In is the n × n unit matrix. Otherwise, A is
singular. We write A−1 for B. The matrix A is nonsingular if and only if the endomorphism
y 7−→ Ay of F n is an automorphism, which holds if and only if the rank of A is n. The set
of all nonsingular n × n matrices forms a group with respect to matrix multiplication.
An n × n matrix A = (ai j )1≤i, j≤n ∈ F n×n is a permutation matrix if there is a permu-
tation σ ∈ Sn such that for all i, j, we have ai j = 1 if j = σ(i) and ai j = 0 otherwise. The
set of all permutation matrices in R n×n is a finite subgroup of the multiplicative group of
invertible n × n matrices which is isomorphic to Sn .
A system of linear equations over F has the form
a11 y1 + · · · + a1n yn = b1,   . . . ,   am1 y1 + · · · + amn yn = bm,
where a11, a12, . . . , amn, b1, . . . , bm ∈ F are given and y1, . . . , yn ∈ F are sought. The matrix
A = (ai j ) ∈ F m×n is the coefficient matrix and the vector b = (b1 , . . . , bm )T ∈ F m the
right hand side of the system, where T denotes transposition. The system may then be
written more briefly as Ay = b, where y = (y1 , . . . , yn )T is the vector of indeterminates. The
solution space {y ∈ F n : Ay = b} of the linear system is either empty or a coset (in the sense
of additive groups) v + ker A of the subspace ker A = {y ∈ F n : Ay = 0}, where v ∈ F n is any
particular solution. In the language of homomorphisms, {y ∈ F n : Ay = b} is the preimage
of b under the homomorphism f : F n −→ F m given by f (y) = Ay.
The famous Gaussian elimination algorithm provides a means for solving linear sys-
tems (and many other computational problems in linear algebra). Given a matrix A ∈ F m×n ,
Gaussian elimination computes an invertible matrix L ∈ F m×m and a permutation matrix
P ∈ F^{n×n} such that U = LAP is of block form
U = ( I_r  V
       0   0 ),
where r is the rank of A, I_r denotes the r × r unit matrix, and V ∈ F^{r×(n−r)}.
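The following Python sketch shows the basic elimination idea over Q, using exact fractions and searching each column for a nonzero pivot; it is only an illustration and not the variant of Gaussian elimination analyzed in this text (it returns an echelon form and the rank rather than the factors L and P):

    from fractions import Fraction

    def row_echelon(A):
        # bring a copy of A into row echelon form over Q, return it with its rank
        A = [[Fraction(x) for x in row] for row in A]
        m, n = len(A), len(A[0])
        rank, col = 0, 0
        while rank < m and col < n:
            pivot = next((i for i in range(rank, m) if A[i][col] != 0), None)
            if pivot is None:
                col += 1
                continue
            A[rank], A[pivot] = A[pivot], A[rank]          # row exchange
            for i in range(rank + 1, m):
                factor = A[i][col] / A[rank][col]
                for j in range(col, n):
                    A[i][j] -= factor * A[rank][j]         # eliminate below the pivot
            rank += 1
            col += 1
        return A, rank

    U, r = row_echelon([[2, 4, 1], [1, 2, 3], [3, 6, 4]])
    print(r)   # 2: the third row is the sum of the first two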
The determinant of a square matrix A = (a_{ij})_{1≤i,j≤n} ∈ F^{n×n} is
det A = ∑_{σ∈S_n} sign σ · a_{1σ(1)} · · · a_{nσ(n)},
where sign σ = (−1)^{#{1 ≤ i < j ≤ n: σ(i) > σ(j)}}, the exponent being the number of inversions of the permutation σ of {1, . . . , n}. The determinant is multiplicative, so that det(AB) = det A · det B for all A, B ∈ F^{n×n}, changes sign when two rows (or columns) are exchanged, and is invariant under addition of a multiple of one row (or column) to another one. Moreover,
det A = 0 ⇐⇒ A is singular.
For any 1 ≤ i ≤ n, we also have
det A = ∑_{1≤j≤n} (−1)^{i+j} a_{ij} det A_{ij},
where A_{ij} ∈ F^{(n−1)×(n−1)} denotes the matrix obtained from A by deleting the ith row and the jth column; this is called Laplace expansion (or expansion into cofactors) along the ith row. Of course, this also holds when the roles of rows and columns are exchanged.
For computing determinants, this is not useful. It is more efficient to use a variant of Gaussian elimination which produces a matrix L ∈ F^{n×n} with det L = 1 and a permutation matrix P ∈ F^{n×n}, which has det P = ±1, such that U = LAP is upper triangular, and then det A = det L^{−1} · det U · det P^{−1} = ± det U is, up to sign, equal to the product of the diagonal elements of U. This follows from repeated use of Laplace expansion.
If A ∈ F^{n×n} is nonsingular, then the linear system Ay = b has a unique solution y ∈ F^n for any right hand side b ∈ F^n, namely y = A^{−1} b. Cramer’s rule, an important theoretical application of determinants that is not useful for the practical solution of nonsingular systems of linear equations, expresses this solution as
y_i = det A_i / det A   for 1 ≤ i ≤ n,
where A_i ∈ F^{n×n} is the matrix A with the ith column replaced by b.
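The last two paragraphs combine into a small Python sketch (illustrative only; for real computations one would use Gaussian elimination as discussed above): a determinant by Laplace expansion along the first row, and Cramer’s rule for a 2 × 2 system over Q.

    from fractions import Fraction

    def det(A):
        # Laplace expansion along the first row; exponential time, tiny matrices only
        n = len(A)
        if n == 1:
            return A[0][0]
        return sum((-1) ** j * A[0][j] * det([row[:j] + row[j+1:] for row in A[1:]])
                   for j in range(n))

    def cramer(A, b):
        # solve A y = b for nonsingular A via y_i = det(A_i) / det(A)
        d = det(A)
        n = len(A)
        return [Fraction(det([row[:i] + [b[k]] + row[i+1:] for k, row in enumerate(A)]), d)
                for i in range(n)]

    A = [[2, 1], [1, 3]]
    b = [5, 10]
    print(cramer(A, b))   # [Fraction(1, 1), Fraction(3, 1)], that is, y = (1, 3)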
||a||_p ≤ ||a||_q for all a ∈ C^n and p > q, where ||a||_p = (∑_{1≤i≤n} |a_i|^p)^{1/p} for real p ≥ 1 and ||a||_∞ = max_{1≤i≤n} |a_i| denote the p-norm and the max-norm of a = (a_1, . . . , a_n) ∈ C^n. Besides the max-norm, the most important norms are the 1-norm ||a||_1 = ∑_{1≤i≤n} |a_i| and the 2-norm (or Euclidean norm) ||a||_2 = (∑_{1≤i≤n} |a_i|^2)^{1/2}. We have the following relations between these three norms:
||a||_∞ ≤ ||a||_2 ≤ √n · ||a||_∞,   ||a||_2 ≤ ||a||_1 ≤ n · ||a||_∞.   (5)
These norms carry over in a natural way to univariate polynomials with complex coeffi-
cients: for f = ∑0≤i≤n fi xi ∈ C[x], we write || f ||q for the q-norm of the coefficient vector
( f0 , . . . , fn ) ∈ C n+1 .
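A short Python check of these definitions and of (5) for one concrete coefficient vector (the example vector is ours):

    f = [3, 0, -4, 1]          # coefficients of f = 3 - 4x^2 + x^3
    norm1 = sum(abs(c) for c in f)                     # ||f||_1 = 8
    norm2 = sum(abs(c) ** 2 for c in f) ** 0.5         # ||f||_2 = sqrt(26)
    norminf = max(abs(c) for c in f)                   # ||f||_inf = 4
    n = len(f)
    print(norminf <= norm2 <= n ** 0.5 * norminf)      # True, first half of (5)
    print(norm2 <= norm1 <= n * norminf)               # True, second half of (5)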
A map ⋆:V × V −→ F, where V is a vector space over a field F, is called an inner
product on V if
◦ v ⋆ v = 0 ⇐⇒ v = 0,
◦ u ⋆ v = v ⋆ u,
◦ (λu + µv) ⋆ w = λ(u ⋆ w) + µ(v ⋆ w)
hold for all u, v, w ∈ V and λ, µ ∈ F. (Not all vector spaces have such an inner product; for example, F_2^2 does not.) Two vectors v, w ∈ V are orthogonal (with respect to ⋆) if v ⋆ w = 0. The most important example of an inner product on R^n is (x1, . . . , xn) ⋆ (y1, . . . , yn) = x1 y1 + · · · + xn yn, and we have v ⋆ v = ||v||_2^2 for all v ∈ R^n. For a sequence v1, . . . , vn ∈ V of vectors, G = (v_i ⋆ v_j)_{1≤i,j≤n} ∈ F^{n×n} is the Gramian matrix of v1, . . . , vn, and det G is their Gramian determinant. The vectors v1, . . . , vn ∈ V are linearly dependent if and only if their Gramian determinant vanishes (Exercise 24.4).
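A small Python example with the standard inner product on R^3 (names and data ours): the third vector is the sum of the first two, so the Gramian determinant must vanish.

    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    def det3(G):
        # determinant of a 3x3 matrix by the explicit cofactor formula
        return (G[0][0] * (G[1][1] * G[2][2] - G[1][2] * G[2][1])
              - G[0][1] * (G[1][0] * G[2][2] - G[1][2] * G[2][0])
              + G[0][2] * (G[1][0] * G[2][1] - G[1][1] * G[2][0]))

    v1, v2 = (1, 0, 2), (0, 1, 1)
    v3 = tuple(a + b for a, b in zip(v1, v2))             # v3 = v1 + v2
    vectors = [v1, v2, v3]
    G = [[dot(u, v) for v in vectors] for u in vectors]   # Gramian matrix
    print(det3(G))    # 0, detecting the linear dependence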
A basis (v1 , . . . , vn ) of a vector space V is orthogonal with respect to an inner product ⋆
if its vectors are pairwise orthogonal, so that their Gramian matrix is a diagonal matrix.
The Gram-Schmidt orthogonalization procedure, described in Chapter 16, transforms
an arbitrary basis into an orthogonal one.
25.6. Finite probability spaces
A finite probability space is a pair (U, P) consisting of a nonempty finite sample space U, whose elements represent the possible outcomes of a random experiment, and a probability function P: U −→ R with P(u) ≥ 0 for all u ∈ U and ∑_{u∈U} P(u) = 1. For the experiment of rolling a fair die, U = {1, 2, 3, 4, 5, 6} and P(u) = 1/6 for all u ∈ U gives a finite probability space describing the possible outcomes of the experiment. When P(u) = 1/#U for all u, as in the example, we say that P is a uniform probability function.
An event is a subset A ⊆ U, and the probability of A is P(A) = ∑u∈A P(u). In the above
example, the probability of the event “odd roll” A = {1, 3, 5} is 1/2. We have P(Ø) = 0,
P(U \A) = 1−P(A), and P(A∪B) = P(A)+P(B)−P(A∩B) for all A, B ⊆ U. In particular,
P(A ∪ B) = P(A) + P(B) if A and B are disjoint. We usually write prob(A) for P(A).
The conditional probability P_B(A) = P(A ∩ B)/P(B) for two events A, B with P(B) ≠ 0 is the probability of A under the condition that also B happens. This makes (B, P_B) into a finite probability space. The events A and B are independent if P(A ∩ B) = P(A)P(B). In that case, we have P_B(A) = P(A) if P(B) ≠ 0. In the above example, the two events A =
{u ∈ U: u is odd} = {1, 3, 5} and B = {u ∈ U: u ≤ 2} = {1, 2} are independent, while A
and C = {u ∈ U: u ≤ 3} = {1, 2, 3} are not. Intuitively, if two events are independent, then
the occurrence of one of them has no impact on the probability of the other one to happen.
A random variable X on a finite probability space (U, P) is a function X:U −→ R. The
expected value (or mean value, or average) of X is
E(X) = ∑_{x∈X(U)} x · prob(X = x),
where X = x is shorthand for the event X^{−1}(x) = {u ∈ U: X(u) = x}. If X(u) = u in our running example, then the expected value of X is
E(X) = ∑_{1≤i≤6} i · 1/6 = 21/6 = 3.5.
Finally, lim_{n→∞} E(Y^{(n)}) = 1/p, as expected, since |q| < 1 implies that lim_{n→∞} q^{n+1} = 0. As an example, the waiting time for a 6 to be rolled with a fair die is close to 1/(1/6) = 6. More precisely, the value of E(Y^{(n)}) with p = 1/6 is the expected number of rolls until a 6 shows up, if that happens with no more than n rolls, and counting n + 1 if no 6 shows up at all. This value gets close to 6 when n is large; the difference q^{n+1}/(1 − q) is about 0.13 for n = 20 and about 0.6 · 10^{−7} for n = 100.
The probability that we need at least k ≤ n trials until A happens for the first time is
P(Y^{(n)} ≥ k) = P(X_1 = 0) · · · P(X_{k−1} = 0) = q^{k−1}
for k ≥ 1, independent of n. It is exponentially decreasing with k. For example, the probability that we need at least 10 rolls until a 6 occurs is (5/6)^9 ≈ 19.38 %.
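These numbers are easy to reproduce with a few lines of Python (the formulas are the ones just discussed; the script is ours):

    p = 1 / 6          # probability of rolling a 6
    q = 1 - p

    # probability of needing at least 10 rolls until the first 6
    print(q ** 9)                          # 0.1938..., about 19.38 %

    # difference between E(Y^(n)) and the limit 1/p = 6
    for n in (20, 100):
        print(n, q ** (n + 1) / (1 - q))   # about 0.13 for n = 20, 6e-08 for n = 100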
D EFINITION 25.7. (i) A partial function f : N −→ R, that is, one that need not be de-
fined for all n ∈ N, is called eventually positive if there is a constant N ∈ N such
that f (n) is defined and strictly positive for all n ≥ N .
(ii) Let g: N −→ R be eventually positive. Then O(g) is the set of all eventually positive
functions f : N −→ R for which there exist N, c ∈ N such that f (n) and g(n) are
defined and f (n) ≤ cg(n) for all n ≥ N .
If f (n) denotes the cost for matrix multiplication as above and g(n) = n3 , we may write
f ∈ O(g). In the literature, one often finds f = O(g) or f (n) = O(n3 ) for this. The equal
sign then has unusual properties: if g(n) = n3 and h(n) = n4 , then g = O(h) and h = O(h),
but we do not want to conclude that g = h.
A slight abuse of notation is that we often write, for example, n3 ∈ O(n4 ). For each
n ∈ N, n3 is just a number and the O-notation makes little sense for single numbers. What
is meant is that g ∈ O(h), with g, h as above. Similarly, we may write f (n) ∈ O(n3 ), or
f ∈ O(n3 ). There is a notation to avoid this abuse, called the λ-calculus, but it is somewhat
clumsy and we do not use it.
For example, if f : N −→ R with f (n) = 3n^4 − 300n + 1, then f (n) ∈ O(n^4), and also f (n) ∈ O(n^5) (“O” does not imply “as accurate as possible”), but f ∉ O(n^3). An eventually
positive function h satisfies h(n) ∈ O(1) if and only if h is bounded from above. We have
f ∈ O( f ), and f ∈ O(g) and g ∈ O(h) imply that f ∈ O(h), for all eventually positive
functions f , g, h. If f : N −→ R is eventually nonzero, then we write f ∈ O(g) instead of
| f | ∈ O(g) for short. Thus, for example, −2n ∈ O(n).
Often the O is used in a more extended form, where it may appear anywhere on the right hand side. For instance, f (n) ∈ g(n) + O(h(n)) is shorthand for f (n) = g(n) + k(n) with some k ∈ O(h), or more briefly f − g ∈ O(h). Similarly, f (n) ∈ g(n) · O(h(n)) if ( f /g) ∈ O(h), f (n) ∈ g(n)^{O(h(n))} if f (n) = g(n)^{k(n)} for some k ∈ O(h), and more generally f (n) ∈ g(n, O(h(n))) if f (n) = g(n, k(n)) for some k ∈ O(h). If f , g: N −→ R are eventually positive, then
◦ c · O( f ) = O( f ) for any c ∈ R>0,
◦ O( f ) + O(g) = O( f + g) = O(max( f , g)), where max is the pointwise maximum,
◦ O( f ) · O(g) = O( f · g) = f · O(g),
◦ O( f )^m = O( f^m) for any m ∈ R>0, where f^m denotes the function for which f^m(n) = f (n)^m (and not the m-fold composition f ( f (· · · f (n) · · ·))), and
◦ f (n) ∈ g(n)^{O(1)} ⇐⇒ f is bounded by a polynomial in g.
All equations are equations of sets, so for example O( f ) + O(g) is shorthand for the set
{h + k: h ∈ O( f ) and k ∈ O(g)}.
We use logarithms in O-expressions without explicit reference to a base, and the reader
may always think of some fixed base such as 2 or e.
Be cautious with exponentiation of O! At first glance, you might consider e^{O( f )} = O(e^f ) to be valid. But we have e^{2n} ∈ e^{O(n)}, and yet e^{2n} = (e^n)^2 ∉ O(e^n). The constant hidden within the “O” does influence the rate of growth when it occurs as an exponent.
We often also use the O-notation for functions g of two or three arguments, in the fol-
lowing sense. A partial function g: N × N −→ R is eventually positive if there is a constant
N ∈ N such that g(m, n) is defined and positive for all m, n ≥ N. For such a function g,
O(g) is the set of all eventually positive functions f : N × N −→ R for which there exist
N, c ∈ N such that f (m, n), g(m, n) are defined and f (m, n) ≤ cg(m, n) for all m, n ≥ N, and
similarly for ternary functions.
In some situations, the “O” carries still too much information. For instance, the fast
algorithm for multiplying two integers of length n uses O(n log n loglog n) word operations
(Section 8.3), and hence is, up to logarithmic factors, essentially a linear algorithm like the
addition algorithm for n-word integers.
Thus n log n loglog n ∈ O∼ (n), and the ugly log-factors are swallowed by the “soft Oh”.
We use terminology like “quadratic time” for O(n2 ), and “softly linear time” for O∼ (n).
and the same bound holds for the rejection probability of an x ∈ I \ X. Furthermore, X ∈
BPP if and only if its complement I \ X is in BPP.
The complexity class RP (“random polynomial time”) consists of those decision prob-
lems X for which there exists a polynomial-time Turing machine which, given an instance
x ∈ I of X and a random bit string of length polynomial in λ(x), does the following. If
x ∈ X, then it accepts x with probability at least 1/2, while if x 6∈ X, then it always rejects x.
The difference to the definition of BPP is that a BPP machine is allowed to make mis-
takes both in accepting instances not in X and in rejecting instances in X, while an RP
machine is not allowed to accept instances not in X. A standard trick is to run such an
algorithm k times and accept if and only if one of the runs accepts; then an element in X
will be accepted with probability at least 1 − 2^{−k}. Furthermore, we define co-RP to consist
of those problems X whose complement I \ X is in RP. An RP Turing machine is also
called a one-sided Monte Carlo Turing machine.
The class ZPP = RP ∩ co-RP (“zero-error probabilistic polynomial time”) consists of
those problems for which probabilistic polynomial-time algorithms exist that always give
the right answer; their running time is a random variable with a certain mean t (polynomial
in the input length) and exponential decay: prob(time ≥ at) ≤ 2^{−a}. The only problem
in this class that is not known to be already in P is treated extensively in this text: (the
decision version of) factoring polynomials over finite fields. A ZPP Turing machine is
also called a Las Vegas Turing machine. We have P ⊆ ZPP, since every deterministic
algorithm is also a probabilistic one (that just does not use any random bits).
The class NP (“non-deterministic polynomial time”), introduced by Cook (1971) and
Karp (1972), comprises those problems X that have a non-deterministic polynomial-time
solution, so that there exists a deterministic polynomial-time Turing machine M such that
for all x ∈ I we have x ∈ X if and only if there exists a bit string y of length polynomial in
λ(x) such that M accepts (x, y). (“Non-deterministic” does not mean “not deterministic”;
taking the empty string for y shows that P ⊆ NP.) The only known simulations of M on a
realistic computer try all exponentially many possibilities for y, and it is the most important
open problem in theoretical computer science to prove Cook’s hypothesis that P ≠ NP.
The class co-NP consists of those X for which I \ X ∈ NP.
A (Turing-)reduction from a problem X to a problem Y is a deterministic polynomial-
time algorithm (Turing machine) for X that may use an (unspecified) subroutine for decid-
ing membership in Y . If such a reduction exists, then X is (polynomial-time) reducible
to Y ; this implies that X is not harder to solve than Y (in the sense of polynomial time). If
also Y is reducible to X, then they are (polynomial-time) equivalent.
A decision problem X is C-hard for a complexity class C if every problem in C is
reducible to it, and C-complete if in addition X ∈ C. The C-complete problems are the
“hardest” ones in C. Cook’s hypothesis implies that the NP-complete problems cannot
be solved in polynomial time. His first example were satisfiable formulas of propositional
calculus (see Section 24.1), and the subset sum problems of Section 17.1 are also NP-
complete. The classic by Garey & Johnson (1979) lists over 1000 such problems.
[Figure 25.3: the “complexity onion”, showing the containments among P, RP, co-RP, ZPP, NP, co-NP, NP ∩ co-NP, PSPACE, EXPTIME, EXPSPACE, and EXPEXPTIME.]
For the classes EXPTIME and EXPEXPTIME, one allows the algorithms to take exponential and doubly-exponential time, respectively, that is, time 2^{n^{O(1)}} and 2^{2^{n^{O(1)}}}, respectively,
on inputs of length n. Such algorithms can be run in practice only for rather small values
of n.
In space bounded complexity classes, one limits the number of memory cells used by
algorithms. The read-only input cells and the write-only output cells are not counted, but
only the essential work cells. This leads to the classes PSPACE and EXPSPACE, with polynomially and exponentially bounded work space, respectively.
The relations between the various complexity classes described above are illustrated
in the “complexity onion” of Figure 25.3. Clearly RP ⊆ BPP ∩ NP and co-RP ⊆
BPP ∩ co-NP, but no “nice” inclusion for BPP in these classes is known, except that BPP ⊆ PSPACE. The feasible problems, for which algorithms exist that can handle in-
puts up to a “reasonable” size, are those in BPP and smaller classes. (This statement has to
be taken with a grain of salt.) The first two thirds of this book (up to Chapter 18) deal with
such problems. Sometimes an effort is required to show that they are in ZPP (for poly-
nomial factorization over finite fields) or P (P RIMES and Gaussian elimination over Q),
and sometimes they are clearly in P and our effort goes into reducing the time from O(n2 )
to O∼ (n), say (for multiplication, division with remainder, etc.). The later chapters treat
problems that are not known to be in BPP. For their solutions, there are reasonably small
inputs (say, a 400-digit integer to be factored) for which all known methods take more time
than is feasible in practice. But still, experience so far gives rise to the hope that improved
algorithms (and hardware) will increase the range of interesting solvable problems further
and further.
Notes. 25.2. Proposition 30 in Book 7 of Euclid’s Elements shows that prime numbers
are irreducible. Gauß (1831, 1863c) showed that O−1 = Z[i] and O−3 are Euclidean.
Hendrik Lenstra (1979a, 1980a, 1980b) studied Euclidean number fields in detail, and
Lemmermeyer (1995) provides an exhaustive discussion.
25.3. Gauß (1863a), article 243, proved Lemma 25.4 (ii) for R = Z p , where p is a prime.
25.4. Galois (1830) laid the foundations of the theory of finite fields. They are often called
Galois fields, and GF(q) is a common notation for our Fq .
25.5. See Notes 5.5 for Gauß’ elimination procedure. Laplace (1772), chapter IV, gives his
determinant expansion; it is also in Bézout (1764), pages 293 ff. Cramer (1750), Appendice
No I, page 658, states his rule.
25.7. The “big Oh” notation was introduced by Paul Bachmann and Edmund Landau in
number theory at the end of the 19th century, and popularized in computer science by
Don Knuth (1970). Von zur Gathen (1985) and Babai, Luks & Seress (1988) invented the
“soft Oh” notation.
25.8. Ulam used a randomized method to estimate the success probability in the card game
solitaire and apparently coined the term Monte Carlo . Levin (1973) also introduced the
class NP. Babai (1979) invented the designation Las Vegas algorithm. See Notes 6.5.
The best known upper bound for BPP appears to be BPP ⊆ MA ⊆ Σ_2^p ∩ Π_2^p; Johnson
(1990) gives more detailed information about complexity classes.
Exercises for this chapter can be found on the book’s web page.
Sources of illustrations
Page 14: Désirée von zur Gathen twisting a plastic model of cyclohexane.
Page 23: First page of Euclid’s Elements , printed 1482 by Erhard Ratdolt in Venice. University Library, Basel.
Reproduced with kind permission.
Page 25: Miniature from the manuscript Agrimensorum on Roman land surveyors, 6th century AD. Possibly
represents Euclid. Courtesy Herzog–August–Bibliothek, manuscript 2403, Wolfenbüttel.
Page 217: Portrait of Isaac Newton by Sir Godfrey Kneller, 1689. Courtesy of the Trustees of the Portsmouth
Estates.
Page 219: One pound UK banknote (in circulation until 1988) depicting Isaac Newton. Reproduced with kind
permission of the Bank of England.
Pages 365 ff.: Schloß Neuhaus in Paderborn. Residence of Bishop Ferdinand von Fürstenberg (see page 513).
Page 371: Portrait of Carl Friedrich Gauß. Oil painting by Gottlieb Biermann (1824–1908), a copy made in
1887 of a portrait executed by Christian Albrecht Jensen (1792–1870) in 1840. Lecture Hall in the Sternwarte
(observatory), Göttingen. Reproduced with kind permission of the Universitäts-Sternwarte Göttingen.
Page 373: German 10 DM banknote, using a mirror-image of the portrait on page 371. Designed by Reinhold
Gerstetter. In circulation from 16 April 1991 to 31 December 2001. Reproduced with kind permission of
Deutsche Bundesbank.
Page 511: Marble statue of Pierre Fermat with muse, by Théophile Barrau, 1898. Inscription: Fermat. Inventeur
du calcul différentiel. 1585[sic!]–1665. Salle des illustres, Capitole, Toulouse.
Page 512: Engraving of Pierre Fermat by François Poilly. From Varia Opera, Toulouse, 1679.
Page 513: Dedication by Samuel Fermat to Ferdinand von Fürstenberg, Bishop of Paderborn, in the Varia Opera,
Toulouse, 1679.
Page 585: Portrait of David Hilbert. Lecture Hall in the Mathematisches Institut, Universität Göttingen.
Reproduced with kind permission of Mathematisches Institut der Georg-August-Universität, Göttingen.
Page 587: Signed photograph of David Hilbert. Apparently taken by the photographer August Schmidt. This
was one in a series of popular postcards Portraits Göttinger Professoren. Hrsg. von der Göttinger Freien
Studentenschaft. Nr. 13. Acquired by the library in 1915. Courtesy Niedersächsische Staats- und
Universitätsbibliothek, Göttingen.
All photographs except those on pages 25, 217, and 587 are © 1999 by Joachim von zur Gathen.
Sources of quotations
Introduction (page 0): William Shakespeare (1564–1616), King Henry VIII, 1.1.123. The Works, Jacob
Tonson, London, 1709, vol. 4, p. 1725. Lord Francis Bacon (1561–1626), Essays, Of Studies, 1597. Reprinted
by Henry Altemus Company, Philadelphia PA, c. 1900, p. 201. Anonymous referee, Bulletin des sciences
mathématiques Férussac 3 (1825), p. 77. Isaac Newton (1642–1727), Universal Arithmetick: or, A Treatise of
Arithmetical Composition and Resolution, translated by the late Mr. Raphson and revised and corrected by Mr.
Cunn, London, 1728, Preface To The Reader. Translation of Arithmetica Universalis, sive de compositione et
resolutione arithmetica liber, 1707. Reprinted in: Derek T. Whiteside, The mathematical works of Isaac Newton,
vol. 2, Johnson Reprint Co, New York, 1967, pp. 4–5. Ghiyāth al-Dı̄n Jamshı̄d bin Mas֒ūd bin Mah.mūd
al-Kāshı̄ (c. 1390–c. 1448), H. AmÌ '@ hAJ®Ó (miftāh. al-h.isāb, The key to computing ), written in 1427. Manuscript
copied in 1645, now in the Preußische Staatsbibliothek, Berlin, edited by Luckey (1951), p. 128, lines 15–17.
Chapter 1 (page 10): Arthur C. Clarke (*1917). An article by Jeremy Bernstein in the New Yorker of 9 Au-
gust 1969 mentions Clarke’s Third Law as being most recently formulated and which he made use of in writing the
enigmatic ending of “2001”. Napoléon I. Bonaparte (1769–1821). Correspondance de Napoléon, t. 24, p. 131,
letter 19 028, 1 August 1812, Vitebsk, to Laplace. Imprimerie Royale, Paris, 1868. Augusta Ada Lovelace
(1815–1852), Sketch of the Analytical Engine Invented by Charles Babbage, Esq., by L. F. Menabrea (translated
and with notes by “A. A. L.”). Taylor’s Scientific Memoirs 3 (1843), Article XXIX, 666–731. Reprinted in Bab-
bage’s Calculating Engines, E. and F. N. Spon, London, 1889, 4–50, p. 23. Reprinted in The Charles Babbage
Institute Reprint Series for the History of Computing, vol. II, Tomash Publishers, Los Angeles/San Francisco CA,
1982. Robert Ludlum (*1927), Apocalypse Watch, Bantam paperback, 1996, ch. 8, p. 135. Reprinted with kind
permission of Bantam Books, a divison of Bantam, Doubleday, Dell Publishing Group, Inc., New York. Eric
Temple Bell (1883–1960), Men of Mathematics I, ch. 1: Introduction, Penguin Books, 1937, p. 2.
Chapter 2 (page 28): Leopold Kronecker (1823–1891), Vortrag bei der Berliner Naturforscher-Versamm-
lung, 1886. Quoted by H. Weber, Leopold Kronecker, Jahresberichte der Deutschen Mathematiker Vereinigung 2
725
726 Sources of quotations
(1891/92), p. 19. Also quoted by David Hilbert, Neubegründung der Mathematik, Abhandlungen aus dem Mathe-
matischen Seminar der Hamburger Universität 1 (1922), p. 161. Lewis Carroll (Rev. Charles Lutwidge Dodgson)
(1832–1898), Alice’s Adventures in Wonderland, Macmillan and Co., London, 1865, Ch. 9: The mock turtle’s
story. Reprinted by Avon, The Heritage Press, 1969. Isaac Newton (1642–1727), Universal Arithmetick: or,
A Treatise of Arithmetical Composition and Resolution, translated by the late Mr. Raphson and revised and
corrected by Mr. Cunn, London, 1728, p. 1. Translation of Arithmetica Universalis, sive de compositione et
resolutione arithmetica liber, 1707. Reprinted in: Derek T. Whiteside, The mathematical works of Isaac New-
ton, vol. 2, Johnson Reprint Co, New York, 1967, pp. 6–7. Stanisław Marcin Ulam (1909–1984), Computers,
Scientific American, September 1964, 203–216. Reprinted with kind permission. Also reprinted in Science,
Computers, and People, Birkhäuser, Boston, 1986, p. 43. Marcus Tullius Cicero (106–43 BC), Tusculanae Dis-
putationes, Liber primus, II.5. Opera Omnia, Lugdunus, Sumptibus Sybillæ à Porta, 1588, vol. 4, p. 165. Robert
Louis Stevenson (1850–1894), The Master of Ballantrae, Collins, London and Glasgow, 1889, p. 51. State of
California, Instructions for Form 540 NR, California Nonresident or Part-Year Resident Income Tax Return,
1996, p. 3.
Chapter 3 (page 44): Godfrey Harold Hardy (1877–1947), A Mathematician’s Apology, Cambridge Uni-
versity Press, 1940, ch. 8, p. 21. Robert Recorde (c. 1510–1558), The Whetstone of Witte, The seconde parte
of Arithmetike, London, 1557. Murray Gell-Mann (*1929), The Quark and the Jaguar, Abacus, London, 1994,
ch. 9: What is fundamental, p. 109. Reprinted with kind permission from Little, Brown, London and Murray Gell-
Mann, Santa Fe NM. Robert Boyle (1627–1691), Some Considerations touching the Usefulness of Experimental
Natural Philosophy, vol. 2, The Usefulness of Mathematicks to Natural Philosophy ; Oxford, 1671. The Works,
ed. by Thomas Birch, vol. 3, London, 1772, p. 426. Augustus De Morgan (1806–1871), Smith’s Dictionary of
Greek and Roman Biography and Mythology, London, c. 1844, Article “Eucleides”, 63–75, p. 63.
Chapter 4 (page 68): Novalis (Friedrich Leopold Freiherr von Hardenberg) (1772–1801), Materialien zur
Encyclopädie. In: Schriften, hrsg. Ernst Heilbronn, Teil 2, Georg Reimer, Berlin, 1901, p. 549. Karl Theodor
Wilhelm Weierstraß (1815–1897), letter to Sonja Kowalevski, 27 August 1883. See Gustav Magnus Mittag-
Leffler: Une page de la vie de Weierstrass, Compte rendu du deuxième congrès international des mathématiciens
(Paris, 1900), Gauthiers-Villars, Paris, 1902, p. 149. David Hume (1711–1776), A Treatise of Human Nature,
John Noon, London, 1739, Part III: Of Knowledge and Probability, Sect. I: Of Knowledge. Augustus De Morgan
(1806–1871), Elements of Algebra, London, 1837, Preface. Abū Ja֒far Muh.ammad bin Mūsā al-Khwārizmı̄
(c. 780–c. 850), al-Kitāb al-mukhtas.ar fı̄ h.isāb al-jabr wa-l-muqābala (
The concise book on computing by moving and reducing terms ), often called Algebra, c. 825, marginal note
to p. 51 (51) and pp. 299–300 of Rosen’s (1831) edition. Manuscript in the Bodleian Library at Oxford, UK,
transcribed in 1342, edited by Frederic Rosen.
Chapter 5 (page 96): Eric Temple Bell (1883–1960), Men of Mathematics I, ch. 2: Modern minds in ancient
bodies, Penguin Books, 1937, p. 33. James Joseph Sylvester (1814–1897), Proof of the Fundamental Theorem
of Invariants, Philosophical Magazine (1878), p. 186. Collected Mathematical Papers, vol. 3, p. 126. Gottfried
Wilhelm Freiherr von Leibniz (1646–1716), Untitled and unpublished manuscript, Hannover Library. From:
Gottfried Wilhelm Leibniz, Opera philosophica, ed. Johann Eduard Erdmann, 1840, XI. De scientia universali
seu calculo philosophico (title by Erdmann). Reprint Scientia Verlag, Aalen, 1974, p. 84. Augustus De Morgan
(1806–1871), Study and Difficulties of Mathematics, Society for the Diffusion of Useful Knowledge, 1831,
chap. 12, On the Study of Algebra. Fourth Reprint Edition, Open Court Publishing Company, La Salle IL, 1943,
p. 176.
Chapter 6 (page 140): Godfrey Harold Hardy (1877–1947), A Mathematician’s Apology, Cambridge Uni-
versity Press, 1940, ch. 10, p. 25. David Hilbert (1862–1943), Mathematische Probleme, Nachrichten von der
Königlichen Gesellschaft der Wissenschaften zu Göttingen (1900), 253–297. Archiv für Mathematik und Physik,
3. Reihe 1 (1901), 44–63 and 213–237. Gesammelte Abhandlungen, Springer Verlag, 1970, 290–329, p. 294.
Reprinted with kind permission. Johann Wolfgang von Goethe (1749–1832), Wilhelm Meisters Wanderjahre,
Zweites Buch; Betrachtungen im Sinne der Wanderer: Kunst, Ethisches, Natur. Thomas Edward Lawrence
(1888–1935), Seven Pillars of Wisdom, George Doran Publishing Co., 1926. Book III: A railway diversion,
ch. XXXIII. Reprint by Anchor Books, Doubleday, New York, 1991, p. 192.
Chapter 7 (page 208): Oliver Cromwell (1599–1658), Letter C (= 100), to Richard Mayor, father of Crom-
well’s daughter-in-law, written off Milford Haven, 13th August 1649. In: Thomas Carlyle, Oliver Cromwell’s
Letters and Speeches, vol. II, Chapman and Hall, London, 1845, p. 41. John Locke (1632–1704), An Essay
concerning Humane Understanding: in Four Books, Thomas Basset, London, 1690, Bk. 4: Of Knowledge and
Opinion, chap. 3: Of the extent of human knowledge, sect. 18. John Cougar Mellencamp (*1951), CD Big
Daddy, J. M.’s Question, Mercury Records, Copyright
c Full Keel Music Co. Rights for Germany, Austria,
Switzerland and Eastern Europe except Lithuania, Latvia, and Estonia by Heinz Funke Musikverlag GmbH,
Berlin. Reprinted with kind permission of Heinz Funke Musikverlag GmbH, Berlin. Michael Crichton (*1942),
The Lost World. Ballantine Books, Random House, Inc., New York, 1996, ch. Raptor, pp. 82–83. Reprinted with
kind permission of Alfred A. Knopf Incorporated, New York, and Random House, Inc., New York. Immanuel
Kant (1724–1804), Über Pädagogik (A. Von der physischen Erziehung). Notes on his lectures on pedagogy
Sources of quotations 727
between 1776 and 1787, published 1803 by Friedrich Theodor Rink. Werke, hrsg. Karl Rosenkranz und Friedrich
Wilhelm Schubert, Band 9, Leopold Voss, Leipzig, 1838, 367–439, p. 409.
Chapter 8 (page 220): Richard Phillips Feynman (1918–1988), Surely You’re Joking, Mr. Feynman. Ad-
ventures of a Curious Character. With Ralph Leighton. W. W. Norton Inc., 1984. Paperback: Vintage, 1992,
p. 100. Reprinted with kind permission of W. W. Norton & Company. Inc., New York and Random House
UK Limited, London. John le Carré (David John Moore Cornwell) (*1931), The Russia House, Hodder &
Stoughton, 1989, ch. 8, p. 160. Reprinted by kind permission of David Higham Associates Limited, London.
Arnold Schönhage (*1934), Andreas F. W. Grotefeld, Ekkehard Vetter, Fast Algorithms: A Multitape Tur-
ing Machine Implementation, BI-Wissenschaftsverlag, Mannheim, 1994, p. 284.
c Spektrum Akademischer
Verlag, Heidelberg. Reprinted with kind permission. Ernst Mach (1836–1916), Populär-wissenschaftliche Vor-
lesungen. Barth, Leipzig, 1896. 13. Vorlesung: Die ökonomische Natur der physikalischen Forschung, 217–244,
pp. 228–229. Reprinted by Böhlau Verlag Wien, Köln, Graz 1987. English translation by McCormack, Popular
Scientific Lectures, Open Court Publishing Company, La Salle IL, 1895.
Chapter 9 (page 256): Isaac Newton (1642–1727), Saying attributed to Newton. Robert Edler von Musil
(1880–1942), Der mathematische Mensch, 1913. Gesammelte Werke, Band II, hrsg. Adolf Frisé, Rowohlt,
1978, p. 1006. Copyright
c 1978 by Rowohlt Verlag GmbH, Reinbek. Reprinted with kind permission. Carl
Friedrich Gauß (1777–1855), Announcement of Theoria residuorum biquadraticorum, Commentatio secunda ;
Göttingische Gelehrte Anzeigen (1831). Werke II, Königliche Gesellschaft der Wissenschaften, Göttingen, 1863,
169–178, pp. 177–178. Reprinted by Georg Olms Verlag, Hildesheim New York, 1973. Alfred North White-
head (1861–1947), An Introduction to Mathematics, Ch. 5: The Symbolism of Mathematics, Oxford University
Press, 1911, pp. 39–40. Reprinted by kind permission of Oxford University Press, New York. Abū Ja֒far
Muh.ammad bin Mūsā al-Khwārizmı̄ (c. 780–c. 850), Algorithmi de numero Indorum, often called Arithmetic,
c. 830. 13th century Latin manuscript from the library of the Hispanic Society of America, New York. It is
probably a copy of a 12th century Latin translation of al-Khwārizmı̄’s book on arithmetic whose original is lost.
It was written after his Algebra. The recently discovered manuscript was edited by Folkerts (1997). Quote from
end of Chapter 7, Plate 8 (f. 20v) and p. 70. Crossley & Henry (1990) translate the Latin text of another surviving
manuscript, at Cambridge.
Chapter 10 (page 294): James William Cooley (*1926), The Re-Discovery of the Fast Fourier Transform
Algorithm, Mikrochimica Acta (Wien) 3 (1987), 33–45. Reprinted with kind permission of Springer-Verlag,
Wien. Voltaire (François-Marie Arouet) (1694–1778), Questions sur l’Encyclopédie, Article “Imagination”,
1771. Reprinted in Dictionnaire de la pensée de Voltaire par lui-même, Éditions Complexe, 1994, p. 604. Pierre
Simon Laplace (1749–1827), Théorie analytique des probabilités, Courcier, Paris, 1812. Œuvres, Paris, 1847,
t. 7, p. 131. James Joseph Sylvester (1814–1897), On the explicit values of Sturm’s quotients, Philosophical
Magazine 6 (1853), 293–296. Mathematical Papers, vol. 1, p. 637–640.
Chapter 11 (page 312): Leslie Gabriel Valiant (*1949), Circuits of the Mind, Oxford University Press,
1994, p. ix. Copyright
c 1994 by Oxford University Press. Reprinted with kind permission of Oxford Univer-
sity Press, Inc. Charles Babbage (1792–1871), Passages from the Life of a Philosopher, Chapter VIII: Of the
Analytical Engine. Reprinted in Babbage’s Calculating Engines, E. and F. N. Spon, London, 1889, 154–283,
p. 167. Reprinted in The Charles Babbage Institute Reprint Series for the History of Computing, vol. II, Tomash
Publishers, Los Angeles/San Francisco CA, 1982. Plato (c. 428–c. 347 BC), Πολιτεια (Republic ), Book 7,
chap. 8.
Chapter 12 (page 334): Iosif Semenovich Iokhvidov, Gankelevy i teplitsevy matritsy i formy, §18: Obrashchenie teplitsevykh i gankelevykh matrits, Nauka, 1974, p. 171. English
translation by G. Philipp A. Thijsse: I. S. Iohvidov, Hankel and Toeplitz Matrices and Forms, §18. Inversion of
Toeplitz and Hankel matrices, Birkhäuser, Basel, 1982, p. 147. Reprinted with kind permission of Birkhäuser
Verlag AG, Basel, Switzerland. James Joseph Sylvester (1814–1897), On the relation between the minor deter-
minants of linearly equivalent quadratic functions, Philosophical Magazine 1 (1851), 295–305, p. 300. Collected
Mathematical Papers 1, 241–250, pp. 246–247. René Descartes (1596–1650), Discours de la Méthode, troisième
partie, 1637.
Chapter 13 (page 358): Emil Luckhardt, German version of the Internationale. Original French version
1871 by Eugène Pottier (Paris, 1816–1887), music 1888 by Pierre-Chrétien Degeyter (or de Geyter, Lille, 1848–
1932). Jean Baptiste Joseph Fourier (1768–1830), Théorie Analytique de la Chaleur, Firmin Didot Frères,
Paris, 1822. Discours Préliminaire, p. xxii. Felix Klein (1849–1925), Elementarmathematik vom höheren
Standpunkte aus, Band II, Springer, Leipzig, 1909. Also: Grundlehren der Mathematik 15, 1925, Springer, Berlin,
p. 206. © Springer-Verlag, Heidelberg. Reprinted with kind permission. Johann Wolfgang von Goethe (1749–
1832), Maximen und Reflexionen, Aus dem Nachlass, Sechste Abtheilung, No. 1279. Mark Twain (Samuel
Langhorne Clemens) (1835–1910), A Tramp Abroad, Vol. 2, Appendix D: The awful German language. Harper
& Brothers Publishers, New York and London, 1897.
Chapter 14 (page 376): Zhuojun Liu and Paul Shyh-Horng Wang, Height as a Coefficient Bound for
Univariate Polynomial Factors, Part I, SIGSAM Bulletin 28(2) (1994), ACM Press, 20–27. Reprinted with kind
permission of ACM Publications. Maurice Borisovitch Kraïtchik (1882–1957), Théorie des Nombres, Tome II,
Gauthier-Villars et Cie., Paris, 1926, Avant-propos, pp. iii–iv. Évariste Galois (1811–1832), Sur la théorie des
nombres, Bulletin des sciences mathématiques Férussac 13 (1830), 428–435. See Galois (1830). Hermann
Hankel (1839–1873), Die Entwicklung der Mathematik in den letzten Jahrhunderten, 2. Auflage, Fues’sche
Sortiment Buchhandlung Tübingen, 1885, p. 25. Tom Clancy (*1947), Debt of Honor, G. P. Putnam’s Sons,
New York, 1994, ch. 44 . . . from one who knows the score . . ., p. 687.
Chapter 15 (page 432): Charles Davies, University Algebra, Barnes & Co., New York, 1867, p. 41. Pierre
Simon Laplace (1749–1827), Théorie analytique des probabilités, Courcier, Paris, 1812. Œuvres, Paris, 1847,
t. 7, p. 156.
Chapter 16 (page 472): Joseph Liouville (1809–1882), Œuvres mathématiques d’Évariste Galois, Journal
de mathématiques pures et appliquées 9 (1846), 381–444, p. 382. Sue Taylor Grafton (*1940), “A” is for Alibi,
Bantam Books, 1987, ch. 9, p. 71. Holt, Rinehart & Winston 1982. Philip Friedman, Inadmissible Evidence,
Ivy Books, published by Ballantine Books, 1992, ch. 22, p. 224. © Random House, Inc., New York. Reprinted
with kind permission.
Chapter 17 (page 502): Voltaire (François-Marie Arouet) (1694–1778), Questions sur l’Encyclopédie, Ar-
ticle “Géométrie”, 1771. Reprinted in Dictionnaire de la pensée de Voltaire par lui-même, Éditions Complexe,
1994, p. 479. Napoléon I. Bonaparte (1769–1821). Correspondance de Napoléon, t. 2, letter 1231, 15 frimaire 5
= 5 December 1796, to Lalande. Imprimerie Royale, Paris, 1868. Robert Recorde (c. 1510–1558), The Whet-
stone of Witte, London, 1557.
Chapter 18 (page 516): Adrien-Marie Legendre (1752–1833), Théorie des nombres, Firmin Didot Frères,
Paris, 1830. 4e édition, Hermann, Paris, 1900, p. 70. The Rolling Stones, UK: LP The Rolling Stones, 26 April
1964; USA: LP England’s Newest Hit Makers, 1964. Composers: Eddie Holland/Lamont Dozier/Brian Hol-
land. Stanisław Marcin Ulam (1909–1984), Computers, Scientific American, September 1964, 203–216, p. 207.
Reprinted with kind permission. Also reprinted in Science, Computers, and People, Birkhäuser, Boston, 1986,
p. 48. Edgar Allan Poe (1809–1849), The Mystery of Marie Rogêt. Snowden’s Ladies’ Companion, Novem-
ber and December 1842 and February 1843, pp. 15–20, 93–99, 162–167. Collected Works, ed. Thomas Ollive
Mabbott, Harvard University Press, Cambridge MA, 1978, 723–774. Maj Sjöwall (*1935) and Per Wahlöö
(1926–1975), Mannen på balkongen, ch. 24, P. A. Norstedt & Söner, 1967. English translation: The Man On The
Balcony, Random House, New York, 1968. Reprinted with kind permission of Norstedts Förlag AB, Stockholm.
Chapter 19 (page 540): Carl Friedrich Gauß (1777–1855), Disquisitiones Arithmeticae, Duae methodi
numerorum factores investigandi. Article 329, p. 401. Carl Friedrich Gauß, Review of Ladislaus Chernac,
Cribrum Arithmeticum, 1811, in Göttingische Gelehrte Anzeigen (1812). Werke II, Königliche Gesellschaft
der Wissenschaften, Göttingen, 1863, p. 182. Reprinted by Georg Olms Verlag, Hildesheim New York, 1973.
Daniel W. Fish, The Complete Arithmetic, ch. Factoring, §162. Ivison, Blakeman, Taylor & Co, New York and
Chicago, 1874, p. 81. Maurice Borisovitch Kraïtchik (1882–1957), Théorie des nombres, Tome II, Gauthier-
Villars et Cie., Paris, 1926, chap. XII, p. 144. Richard Phillips Feynman (1918–1988), Surely You’re Joking,
Mr. Feynman. Adventures of a Curious Character. With Ralph Leighton. W. W. Norton Inc., 1984. Paperback:
Vintage, 1992, p. 77. Reprinted with kind permission of W. W. Norton & Company, Inc., New York and Random
House UK Limited, London.
Chapter 20 (page 572): Godfrey Harold Hardy (1877–1947), A Mathematician’s Apology, Cambridge Uni-
versity Press, 1940, ch. 28, p. 80. David Hilbert (1862–1943), Naturerkennen und Logik, Naturwissenschaften
(1930), 959–963. Gesammelte Abhandlungen, Springer-Verlag 1970, Teil 3, 378–387, p. 386.
© Springer-Verlag, Heidelberg. Reprinted with kind permission. Abraham Adrian Albert (1905–1972), Some Mathemat-
ical Aspects of Cryptography, Invited address, AMS Meeting in Manhattan KS on 22 November 1941. Collected
Mathematical Papers 2, AMS, Providence RI, 1993, 903–920. Reprinted with kind permission of American
Mathematical Society. Sir Arthur Conan Doyle (1859–1930), The Sign of the Four; or, The Problem of the
Sholtos, Lippincott’s Magazine, February 1890. Also The Sign of Four, Chapter 1, Spencer Blackett, London,
1890. Philip Friedman, Grand Jury, ch. 14, Ivy Books, Random House, Inc., New York, 1996. Tom Clancy
(*1947), The Cardinal of the Kremlin, ch. 18: Advantages, Harper Collins Publisher, London, 1988.
Chapter 21 (page 590): Joseph Louis Lagrange (1736–1813), Leçons élémentaires sur les Mathématiques,
Leçon Cinquième: Sur l’usage des courbes dans la solution des Problèmes, École Polytechnique, Paris, 1795.
Journal de l’École Polytechnique, VIIe et VIIIe cahiers, tome 2, 1812. Œuvres, publiées par J.-A. Serret,
Gauthier-Villars, Paris, 1877, t. 7, 183–288, p. 271. Étienne Bézout (1739–1783), Recherches sur le degré
des équations résultantes de l’évanouissement des inconnues, Histoire de l’académie royale des sciences, 1764,
288–338, pp. 290–291. Francis Sowerby Macaulay (1862–1937), The Algebraic Theory of Modular Systems,
Introduction, Cambridge University Press, 1916, p. 2.
Chapter 22 (page 622): Ludwig Boltzmann (1844–1906), Gustav Robert Kirchhoff, Festrede, Graz, 15.11.
1887. Reprinted in: Ludwig Boltzmann, Populäre Schriften, eingeleitet und ausgewählt von Engelbert Broda,
Friedr. Vieweg & Sohn, Braunschweig/Wiesbaden, 1979, 47–53, p. 50. Reprinted with kind permission of Friedr.
Vieweg & Sohn, Wiesbaden. Marius Sophus Lie (1842–1899), Zur allgemeinen Theorie der partiellen Differ-
entialgleichungen beliebiger Ordnung, Leipziger Berichte 47 (1895), Math.-phys. Classe, 53–128, p. 53. Gesam-
melte Abhandlungen, herausgegeben durch Friedrich Engel und Poul Heegaard, B. G. Teubner, Leipzig, 1929,
vol. 4, p. 320. Augustus De Morgan (1806–1871), On Divergent Series, and various Points of Analysis con-
nected with them. Transactions of the Cambridge Philosophical Society 8 (1844), 182–203, p. 188. George
Berkeley (1684–1753), The Analyst, J. Tonson, London, 1734, sect. 7. William Shanks (1812–1882), Contri-
butions to Mathematics, comprising chiefly the Rectification of the Circle to 607 places of decimals, G. Bell,
London, 1853, p. vi. Excerpt reprinted in Berggren, Borwein & Borwein (1997), 147–161.
Chapter 23 (page 644): Joseph Rudyard Kipling (1865–1936), To the True Romance, In Many Inventions,
MacMillan, London, 1893. James Gleick (*1954), Genius: The life and science of Richard Feynman, Vintage
Books, Random House, Inc., New York, 1992, Prologue, p. 7. © Random House, Inc., New York. Reprinted
with kind permission. Eric Temple Bell (1883–1960), Men of Mathematics I, ch. 9: Analysis incarnate (Euler),
Penguin Books, 1937, p. 152. George Eyre Andrews (*1938), q-series: Their Development and Application in
Analysis, Number Theory, Combinatorics, Physics, and Computer Algebra, AMS Regional Conference Series in
Mathematics 66, American Mathematical Society, 1986, p. 87. Reprinted with kind permission of the American
Mathematical Society.
Chapter 24 (page 676): Alfred North Whitehead (1861–1947), An Introduction to Mathematics, Oxford
University Press, 1911, p. 71. Reprinted with kind permission. Jean le Rond D’Alembert (1717–1783), Quoted
in Edward Kasner, The present problems of geometry, Bulletin of the American Mathematical Society 11 (1905),
283–314, p. 285. Charles Babbage (1792–1871), On the Theoretical Principles of the Machinery for Calculat-
ing Tables, Letter to Dr. Brewster, 6 November, 1822. Appeared in Brewster’s Journal of Science. Reprinted in
Babbage’s Calculating Engines, E. and F. N. Spon, London, 1889, 216–219, p. 218. Reprinted in The Charles
Babbage Institute Reprint Series for the History of Computing, vol. II, Tomash Publishers, Los Angeles/San Fran-
cisco CA, 1982. Marko Petkovšek, Herbert Saul Wilf, and Doron Zeilberger, A=B, A K Peters, Natick MA,
1996, ch. 9, p. 193. Reprinted with kind permission.
End of Chapter 24 (page 699): Michel Eyquem Seigneur de Montaigne (1533–1592), Essais, Au Lecteur,
Bordeaux, 1580. Aulus Persius Flaccus (34–62 AD), Satura prima, line 2. Published posthumously. Markus
Werner, Zündels Abgang, Residenz Verlag, 1984, p. 30. © 1984 Residenz Verlag, Salzburg und Wien. Reprinted
with kind permission. Carl Friedrich Gauß (1777–1855), Disquisitiones generales de congruentiis. Analysis
residuorum caput octavum. Article 367. Werke II, Handschriftlicher Nachlass, Königliche Gesellschaft der
Wissenschaften, Göttingen, 1863, 212–242. Reprinted by Georg Olms Verlag, Hildesheim New York, 1973.
Published posthumously, see page 372.
Chapter 25 (page 702): Sherlock Holmes’ most famous words do not occur in the writing of Sir Arthur
Conan Doyle (1859–1930). The actor Clifford Hardman (Clive) Brook (1887–1974) said them in his title role in
the first talking film The Return of Sherlock Holmes about the famous sleuth. Garrett Ford (1898–1945) and Basil
Dean wrote the screenplay, Basil Dean directed the movie of 79 minutes’ length, Paramount Famous Players
Lasky Corporation produced it, and it was released on 18 October 1929. William Kingdon Clifford (1845–
1879), The Common Sense of the Exact Sciences, London, 1885 (appeared posthumously), chap. 1, sect. 7, p. 20.
Izaak Walton (1593–1683), The Compleat Angler, Richard Marriot, London, 1653. Dedication to all readers.
p. xvii. John Updike (*1932), Rabbit is Rich, Fawcett Crest, New York, published by Ballantine Books, Random
House, Inc., 1982, ch. IV, p. 301. © Random House, Inc., New York. Reprinted with kind permission. Jonathan
Swift (1667–1745), Lemuel Gulliver, Travels into Several Remote Nations of the World, Part III: A voyage to
Laputa, Balnibarbi, Glubbdubdrib, Luggnagg, and Japan, Ch. V: The grand academy of Lagado, London, 1726.
References (page 734): Novalis (Friedrich Leopold Freiherr von Hardenberg) (1772–1801), Mathematische
Fragmente. In Schriften, hrsg. Richard Samuel, vol. 3, Verlag W. Kohlhammer, Stuttgart, 1983, Handschrift
Nr. 241, p. 594. Eugenio Beltrami (1835–1900), Foreword to A. Clebsch’s Commemorazione di Giulio Plücker,
Giornale di matematiche 11 (1873), Napoli, 153–179, p. 153. Bartel Leendert van der Waerden (1903–1996),
Ontwakende wetenschap, Een woord vooraf. P. Noordhoff N.V., Groningen, 1950, English translation by Arnold
Dresden: Science awakening, Oxford University Press, 1961. Raymond Chandler (1888–1959), The Simple
Art of Murder, An Essay, Houghton Mifflin, 1950. Copyright © 1950 by Raymond Chandler, © renewed 1978
by Helga Greene. Reprinted by kind permission of Houghton Mifflin Co. All rights reserved.
End of index (page 796): Al-Qur’ān, Sūra 27 al-naml (The ants), 76. Joseph Liouville (1809–1882), Œuvres
mathématiques d’Évariste Galois, Journal de mathématiques pures et appliquées 9 (1846), 381–444, p. 381.
René Descartes (1596–1650), Principia philosophiæ, Elzevier, Amsterdam, 1644. Œuvres de Descartes, tome
VIII-1, publiées par Charles Adam et Paul Tannery, 1905, p. 329. Reprinted by Librairie Philosophique J. Vrin,
Paris, 1973. Francis Sowerby Macaulay (1862–1937), The Algebraic Theory of Modular Systems, Preface,
Cambridge University Press, 1916, p. xiv. Robert Recorde (c. 1510–1558), The Whetstone of Witte, The preface.
London, 1557. Douglas Noël Adams (1952–2001), The Restaurant at the End of the Universe, Pan Books,
London, 1980. UK and Commonwealth copyright © Serious Productions Ltd 1980. Copyright for the rest of the
universe © Completely Unexpected Productions 1980. Reprinted with kind permission of The Crown Publishing
Group, New York, of Macmillan Publishers, London, and of Ed Victor Ltd, London.
Moritz’ (1914) compilation is a rich source of mathematical quotations.
List of algorithms
2.1 Addition of multiprecision integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.2 Addition of polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.3 Multiplication of polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.4 Multiplication of multiprecision integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.5 Polynomial division with remainder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.5 Traditional Euclidean Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.6 Traditional Extended Euclidean Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.14 Extended Euclidean Algorithm (EEA) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.17 Binary Euclidean Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.8 Repeated squaring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
5.4 Chinese Remainder Algorithm (CRA) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.10 Small primes modular determinant computation . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
6.11 Gcd of primitive polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
6.28 Modular bivariate gcd: big prime version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
6.34 Modular gcd in Z[x]: big prime version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
6.36 Modular bivariate gcd: small primes version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
6.38 Modular gcd in Z[x]: small primes version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
6.45 Gcd of many polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
6.57 Modular EEA in Q[x]: small primes version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
6.59 Modular bivariate EEA: small primes version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
6.61 Primitive Euclidean Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
8.1 Karatsuba’s polynomial multiplication algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
8.14 Fast Fourier Transform (FFT) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
8.16 Fast convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
8.20 Fast negative wrapped convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
8.25 Three primes FFT integer multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
8.29 Fast convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
8.30 Schönhage’s algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
9.3 Inversion using Newton iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
9.5 Fast division with remainder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
9.10 p-adic inversion using Newton iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
9.14 Generalized Taylor expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
9.22 p-adic Newton iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268
9.35 Montgomery multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
10.3 Building up the subproduct tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
10.5 Going down the subproduct tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
10.7 Fast multipoint evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
10.9 Linear combination for linear moduli . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300
10.11 Fast interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
10.14 Fast simultaneous reduction with precomputation . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
10.16 Fast simultaneous modular reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
10.18 Simultaneous inverse computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304
10.20 Linear combination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304
10.22 Fast Chinese Remainder Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
10.26 Building a mobile . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
10.27 Building a Huffman tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
11.4 Half gcd for normal degree sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
11.6 Half gcd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320
11.8 Fast Extended Euclidean Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
12.1 Matrix multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 336
12.3 Fast modular composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338
12.9 Minimal polynomial for F N . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344
12.12 Solving a nonsingular square linear system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346
12.13 Minimal polynomial for Krylov subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
12.20 Composition modulo powers of x . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
14.3 Distinct-degree factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381
14.8 Equal-degree splitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
14.10 Equal-degree factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 387
14.13 Polynomial factorization over finite fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389
14.15 Root finding over finite fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 392
Het is niet alleen veel leerrijker, het geeft ook veel meer genot de
klassieke schrijvers zelf te lezen. [. . . ] Daarom zeg ik mijn lezers
met nadruk: geloof niets op mijn woord, maar kijk alles na!3
Bartel Leendert van der Waerden (1950)
References
The numbers in brackets at the end of a reference are the pages on which it is cited. Names of authors and titles
are usually given in the same form as on the article or book.
J OHN A BBOTT, V ICTOR S HOUP, and PAUL Z IMMERMANN (2000), Factorization in Z[x]: The Searching
Phase. In Proceedings of the 2000 International Symposium on Symbolic and Algebraic Computation
ISSAC2000, St. Andrews, Scotland, ed. C ARLO T RAVERSO, 1–7. [465]
S. A. A BRAMOV (1971), O summirovanii ratsional'nykh funktsii (in Russian). Zhurnal vychislitel'noi
matematiki i matematicheskoi fiziki 11(4), 1071–1075. English translation: S. A. A BRAMOV, On the
summation of rational functions, U.S.S.R. Computational Mathematics and
Mathematical Physics 11(4), 324–330. [671, 675]
S. A. A BRAMOV (1975), Ratsional'naya komponenta resheniya lineinogo rekurrentnogo sootnosheniya pervogo
poryadka s ratsional'noi pravoi chast'yu (in Russian). Zhurnal vychislitel'noi matematiki i
matematicheskoi fiziki 15(4), 1035–1039. English translation: S. A. A BRAMOV, The rational component of the
solution of a first-order linear recurrence relation with rational right side, U.S.S.R. Computational
Mathematics and Mathematical Physics 15(4), 216–221. [671]
S. A. A BRAMOV (1989a), Zadachi komp'yuternoi algebry, svyazannye s poiskom polinomial'nykh reshenii
lineinykh differentsial'nykh i raznostnykh uravnenii (in Russian). Vestnik Moskovskogo Universiteta,
Seriya 15: Vychislitel'naya matematika i kibernetika 3, 56–60. English translation: S. A. A BRAMOV, Problems
of computer algebra involved in the search for polynomial solutions of linear differential and difference
equations, Moscow University Computational Mathematics and Cybernetics 3, 63–68. [641, 671]
S. A. A BRAMOV (1989b), Ratsional'nye resheniya lineinykh differentsial'nykh i raznostnykh uravnenii s
polinomial'nymi koeffitsientami (in Russian). Zhurnal vychislitel'noi matematiki i matematicheskoi
fiziki 29(11), 1611–1620. English translation: S. A. A BRAMOV, Rational solutions of linear differential and
difference equations with polynomial coefficients, U.S.S.R. Computational Mathematics and Mathematical
Physics 29(6), 7–12. [641, 671]
1 He who does not take a mathematical book with reverence and reads it like God’s word, does not understand it.
2 Students [. . . ] should learn [. . . ] to study at an early stage the main works of the great masters instead of
making their minds sterile through the everlasting exercises of college.
3 It is not only more instructive but also more fun to read the classical authors themselves [. . . ] Therefore
I implore my readers: do not believe anything I say, verify everything!
S. A. A BRAMOV (1995), Rational solutions of linear difference and q-difference equations with polynomial
coefficients. In Proceedings of the 1995 International Symposium on Symbolic and Algebraic
Computation ISSAC ’95, Montreal, Canada, ed. A. H. M. L EVELT, ACM Press, 285–289. [671]
S ERGEI A. A BRAMOV, M ANUEL B RONSTEIN, and M ARKO P ETKOVŠEK (1995), On Polynomial Solutions of
Linear Operator Equations. In Proceedings of the 1995 International Symposium on Symbolic and
Algebraic Computation ISSAC ’95, Montreal, Canada, ed. A. H. M. L EVELT, ACM Press, 290–296.
[641]
S ERGEI A. A BRAMOV and M ARK VAN H OEIJ (1999), Integration of solutions of linear functional equations.
Integral Transforms and Special Functions 8(1–2), 3–12. [671]
S. A. A BRAMOV and K. Y U . K VANSENKO [K. Y U . K VASHENKO ] (1991), Fast Algorithms to Search for the
Rational Solutions of Linear Differential Equations with Polynomial Coefficients. In Proceedings of the
1991 International Symposium on Symbolic and Algebraic Computation ISSAC ’91, Bonn, Germany, ed.
S TEPHEN M. WATT, ACM Press, 267–270. [641]
S. A. A BRAMOV and M. P ETKOVŠEK (2001), Canonical Representations of Hypergeometric Terms. In Formal
Power Series and Algebraic Combinatorics (FPSAC01), Tempe AZ. [675]
L. M. A DLEMAN (1983), On Breaking Generalized Knapsack Public Key Cryptosystems. In Proceedings of the
Fifteenth Annual ACM Symposium on Theory of Computing, Boston MA, ACM Press, 402–412. [509]
L EONARD M. A DLEMAN (1994), Algorithmic Number Theory—The Complexity Contribution. In Proceedings
of the 35th Annual IEEE Symposium on Foundations of Computer Science, Santa Fe NM, ed. S HAFI
G OLDWASSER, IEEE Computer Society Press, Santa Fe NM, 88–113. [531]
L EONARD M. A DLEMAN and H ENDRIK W. L ENSTRA , J R . (1986), Finding Irreducible Polynomials over
Finite Fields. In Proceedings of the Eighteenth Annual ACM Symposium on Theory of Computing,
Berkeley CA, ACM Press, 350–355. [421]
M ANINDRA AGRAWAL, N EERAJ K AYAL, and N ITIN S AXENA (2004), PRIMES is in P. Annals of
Mathematics 160(2), 781–793. [517, 543]
A LFRED V. A HO, J OHN E. H OPCROFT, and J EFFREY D. U LLMAN (1974), The Design and Analysis of
Computer Algorithms. Addison-Wesley, Reading MA. [286, 332]
A. V. A HO, K. S TEIGLITZ, and J. D. U LLMAN (1975), Evaluating polynomials at fixed sets of points. SIAM
Journal on Computing 4, 533–539. [286, 292]
M. A JTAI (1997), The Shortest Vector Problem in L_2 is NP-hard for Randomized Reductions. Electronic
Colloquium on Computational Complexity TR97-047. 33 pages. [496]
A NDRES A LBANESE, J OHANNES B LÖMER, J EFF E DMONDS, M ICHAEL L UBY, and M ADHU S UDAN (1994),
Priority Encoding Transmission. In Proceedings of the 35th Annual IEEE Symposium on Foundations of
Computer Science, Santa Fe NM, ed. S HAFI G OLDWASSER, IEEE Computer Society Press, Los
Alamitos CA, 604–612. [215]
W ILLIAM ROBERT A LFORD, A NDREW G RANVILLE, and C ARL P OMERANCE (1994), There are infinitely
many Carmichael numbers. Annals of Mathematics 140, 703–722. [529, 532]
G ERT A LMKVIST and D ORON Z EILBERGER (1990), The Method of Differentiating under the Integral Sign.
Journal of Symbolic Computation 10, 571–591. [641, 671]
N OGA A LON, J EFF E DMONDS, and M ICHAEL L UBY (1995), Linear Time Erasure Codes With Nearly Optimal
Recovery. In Proceedings of the 36th Annual IEEE Symposium on Foundations of Computer Science,
Milwaukee WI, IEEE Computer Society Press, Los Alamitos CA, 512–519. [215]
F RANCESCO A MOROSO (1989), Tests d’appartenance d’après un théorème de Kollár. Comptes Rendus de
l’Académie des Sciences Paris, série I 309, 691–694. [618]
G EORGE E. A NDREWS (1994), The Death of Proof? Semi-Rigorous Mathematics? You’ve Got to Be Kidding!
The Mathematical Intelligencer 16(4), 16–18. [697]
A NONYMOUS (1835), Wie sich die Division mit Zahlen erleichtern und zugleich sicherer ausführen läßt, als auf
die gewöhnliche Weise. Journal für die reine und angewandte Mathematik 13(3), 209–218. [41]
A NDREAS A NTONIOU (1979), Digital filters: analysis and design. McGraw-Hill electrical engineering series:
Communications and information theory section, McGraw-Hill, New York. [353]
TOM M. A POSTOL (1983), A Proof that Euler Missed: Evaluating ζ(2) the Easy Way. The Mathematical
Intelligencer 5(3), 59–60. Reprinted in Berggren, Borwein & Borwein (1997), 456–457. [62]
A RCHIMEDES (c. 250 BC), Κύκλου μέτρησις (Measurement of a circle). In Opera Omnia, vol. I, ed. I. L.
H EIBERG, 231–243. B. G. Teubner, Stuttgart, Germany, 1910. Reprinted 1972. [82]
A. A RWIN (1918), Über Kongruenzen von dem fünften und höheren Graden nach einem Primzahlmodulus.
Arkiv för matematik, astronomi och fysik 14(7), 1–46. [418]
C. A. A SMUTH and G. R. B LAKLEY (1982), Pooling, splitting and restituting information to overcome total
failure of some channels of communication. In Proceedings 1982 Symposium on Security and Privacy,
IEEE Computer Society Press, Los Alamitos CA, 156–159. [131]
A. O. L. ATKIN and R. G. L ARSON (1982), On a primality test of Solovay and Strassen. SIAM Journal on
Computing 11(4), 789–791. [532]
L. BABAI (1979), Monte Carlo algorithms in graph isomorphism testing. Technical Report 79-10, Département
de Mathématique et Statistique, Université de Montréal. [198, 724]
L ÁSZLÓ BABAI, E UGENE M. L UKS, and Á KOS S ERESS (1988), Fast Management of Permutation Groups.
In Proceedings of the 29th Annual IEEE Symposium on Foundations of Computer Science, White
Plains NY, IEEE Computer Society Press, Washington DC, 272–282. [724]
E RIC BACH (1990), Number-theoretic algorithms. Annual Review of Computer Science 4, 119–172. [531]
E RIC BACH (1996), Weil Bounds for Singular Curves. Applicable Algebra in Engineering, Communication and
Computing 7, 289–298. [568]
E RIC BACH, J OACHIM VON ZUR G ATHEN, and H ENDRIK W. L ENSTRA , J R . (2001), Factoring Polynomials
over Special Finite Fields. Finite Fields and Their Applications 7, 5–28. [421]
E RIC BACH, G ARY M ILLER, and J EFFREY S HALLIT (1986), Sums of divisors, perfect numbers and factoring.
SIAM Journal on Computing 15(4), 1143–1154. [532, 535]
E RIC BACH and J EFFREY S HALLIT (1988), Factoring with cyclotomic polynomials. Mathematics of
Computation 52(185), 201–219. [568]
E RIC BACH and J EFFREY S HALLIT (1996), Algorithmic Number Theory, Vol.1: Efficient Algorithms.
MIT Press, Cambridge MA. [61, 421, 531, 533, 534]
E RIC BACH and J ONATHAN S ORENSON (1993), Sieve algorithms for perfect power testing. Algorithmica 9,
313–328. [287]
E RIC BACH and J ONATHAN S ORENSON (1996), Explicit bounds for primes in residue classes. Mathematics of
Computation 65(216), 1717–1735. [529]
J OHANN S EBASTIAN BACH (1722), Das Wohltemperierte Klavier. BWV 846–893, Part I appeared in 1722,
Part II in 1738. [86]
C LAUDE G ASPAR BACHET DE M ÉZIRIAC (1612), Problèmes plaisans et délectables, qui se font par les
nombres. Pierre Rigaud, Lyon. [61]
DAVID H. BAILEY, K ING L EE, and H ORST D. S IMON (1990), Using Strassen’s Algorithm to Accelerate the
Solution of Linear Systems. The Journal of Supercomputing 4(4), 357–371. [2, 337]
G EORGE A. BAKER , J R . and P ETER G RAVES -M ORRIS (1996), Padé Approximants . Encyclopedia of
Mathematics and its Applications 59, Cambridge University Press, Cambridge, UK, 2nd edition. First
edition published in two volumes by Addison-Wesley, Reading MA, 1982. [132]
W. W. ROUSE BALL and H. S. M. C OXETER (1947), Mathematical Recreations & Essays. The Macmillan
Company, New York, American edition. First edition 1892. [531, 534]
J. M. BARBOUR (1948), Music and ternary continued fractions. The American Mathematical Monthly 55,
545–555. [91]
E RWIN H. BAREISS (1968), Sylvester’s Identity and Multistep Integer-Preserving Gaussian Elimination.
Mathematics of Computation 22(101–104), 565–578. [132]
A NDREJ BAUER and M ARKO P ETKOVŠEK (1999), Multibasic and Mixed Hypergeometric Gosper-Type
Algorithms. Journal of Symbolic Computation 28, 711–736. [671]
WALTER BAUR and VOLKER S TRASSEN (1983), The complexity of partial derivatives. Theoretical Computer
Science 22, 317–330. [352]
DAVID BAYER and M ICHAEL S TILLMAN (1988), On the complexity of computing syzygies. Journal of
Symbolic Computation 6, 135–147. [618]
PAUL W. B EAME, RUSSELL I MPAGLIAZZO, JAN K RAJÍ ČEK, TONIANN P ITASSI, and PAVEL P UDLÁK (1996),
Lower bounds on Hilbert’s Nullstellensatz and propositional proofs. Proceedings of the London
Mathematical Society 3, 1–26. [697]
PAUL B EAME and TONIANN P ITASSI (1998), Propositional Proof Complexity: Past, Present, and Future.
Bulletin of the European Association for Theoretical Computer Science 65, 66–89. [697]
T HOMAS B ECKER and VOLKER W EISPFENNING (1993), Gröbner Bases—A Computational Approach to
Commutative Algebra . Graduate Texts in Mathematics 141, Springer-Verlag, New York. [618]
A LBERT H. B EILER (1964), Recreations in the Theory of Numbers: The Queen of Mathematics Entertains.
Dover Publications, Inc., New York. [534]
E RIC T EMPLE B ELL (1937), Men of Mathematics. Penguin Books Ltd., Harmondsworth, Middlesex.
[219, 725, 726, 729]
C HRISTOF B ENECKE, ROLAND G RUND, R EINHARD H OHBERGER, A DALBERT K ERBER, R EINHARD L AUE,
and T HOMAS W IELAND (1995), MOLGEN, a computer algebra system for the generation of molecular
graphs. In Computer Algebra in Science and Engineering, Bielefeld, Germany, August 1994, eds.
J. F LEISCHER, J. G RABMEIER, F. W. H EHL, and W. K ÜCHLIN, World Scientific, Singapore, 260–272.
[698]
M. B EN -O R (1981), Probabilistic algorithms in finite fields. In Proceedings of the 22nd Annual IEEE
Symposium on Foundations of Computer Science, Nashville TN, 394–398. [421]
M. B EN -O R, D. KOZEN, and J. R EIF (1986), The complexity of elementary algebra and geometry. Journal of
Computer and System Sciences 32, 251–264. [619]
M ICHAEL B EN -O R and P RASOON T IWARI (1988), A Deterministic Algorithm For Sparse Multivariate
Polynomial Interpolation. In Proceedings of the Twentieth Annual ACM Symposium on Theory of
Computing, Chicago IL, ACM Press, 301–309. [498]
C ARLOS A. B ERENSTEIN and A LAIN Y GER (1990), Bounds for the Degrees in the Division Problem.
Michigan Mathematical Journal 37, 25–43. [618]
L ENNART B ERGGREN, J ONATHAN B ORWEIN, and P ETER B ORWEIN, eds. (1997), Pi: A Source Book.
Springer-Verlag, New York. [90, 729, 735, 737, 749, 751, 753, 761, 763]
E. R. B ERLEKAMP (1967), Factoring polynomials over finite fields. Bell System Technical Journal 46,
1853–1859. [401, 417, 420, 462]
E. R. B ERLEKAMP (1970), Factoring Polynomials Over Large Finite Fields. Mathematics of
Computation 24(11), 713–735. [198, 401, 406, 417, 419, 420, 421, 462, 465, 530]
E LWYN R. B ERLEKAMP (1984), Algebraic Coding Theory. Aegean Park Press. First edition McGraw Hill,
New York, 1968. [215, 467]
E LWYN R. B ERLEKAMP, ROBERT J. M C E LIECE, and H ENK C. A. VAN T ILBORG (1978), On the Inherent
Intractability of Certain Coding Problems. IEEE Transactions on Information Theory IT-24(3), 384–386.
[215]
B ENJAMIN P. B ERMAN and R ICHARD J. FATEMAN (1994), Optical character recognition for typeset
mathematics. In Proceedings of the 1994 International Symposium on Symbolic and Algebraic
Computation ISSAC ’94, Oxford, UK, eds. J. VON ZUR G ATHEN and M. G IESBRECHT, ACM Press,
348–353. [640]
J OANNES B ERNOULLIUS [J OHANN B ERNOULLI ] (1703), Problema exhibitum. Acta eruditorum, 26–31. [640]
DANIEL J. B ERNSTEIN (1998a), Composing Power Series Over a Finite Ring in Essentially Linear Time.
Journal of Symbolic Computation 26(3), 339–341. [353]
DANIEL J. B ERNSTEIN (1998b), Detecting perfect powers in essentially linear time. Mathematics of
Computation 67(223), 1253–1283. [287]
DANIEL J. B ERNSTEIN (2001), Multidigit multiplication for mathematicians. 19 pp.
http://cr.yp.to/papers/m3.ps. [247]
P. B ÉZIER (1970), Emploi des Machines à Commande Numérique. Masson & Cie , Paris. English translation:
Numerical Control, John Wiley & Sons, 1972. [138]
É TIENNE B ÉZOUT (1764), Recherches sur le degré des Équations résultantes de l’évanouissement des
inconnues, Et sur les moyens qu’il convient d’employer pour trouver ces Équations. Histoire de
l’académie royale des sciences, 288–338. Summary 88–91. [197, 724, 728]
J. B INET (1841), Recherches sur la théorie des nombres entiers et sur la résolution de l’équation indéterminée
du premier degré qui n’admet que des solutions entières. Journal de Mathématiques Pures et
Appliquées 6, 449–494. [61]
I AN B LAKE, G ADIEL S EROUSSI, and N IGEL S MART (1999), Elliptic Curves in Cryptography. London
Mathematical Society Lecture Note Series 265, Cambridge University Press. [580]
E NRICO B OMBIERI and A LFRED J. VAN DER P OORTEN (1995), Continued fractions of algebraic numbers.
In Computational Algebra and Number Theory, eds. W IEB B OSMA and A LF VAN DER P OORTEN,
Kluwer Academic Publishers, 137–155. [90]
O LAF B ONORDEN, J OACHIM VON ZUR G ATHEN, J ÜRGEN G ERHARD, O LAF M ÜLLER, and M ICHAEL
N ÖCKER (2001), Factoring a binary polynomial of degree over one million. ACM SIGSAM
Bulletin 35(1), 16–18. [461]
G EORGE B OOLE (1860), Calculus of finite differences. Chelsea Publishing Company, New York. 5th edition
1970. [669]
A. B ORODIN and R. M OENCK (1974), Fast Modular Transforms. Journal of Computer and System
Sciences 8(3), 366–386. [286, 306]
A. B ORODIN and I. M UNRO (1975), The Computational Complexity of Algebraic and Numeric Problems.
Theory of computation series 1, American Elsevier Publishing Company, New York. [306]
A LLAN B ORODIN and P RASOON T IWARI (1990), On the Decidability of Sparse Univariate Polynomial
Interpolation. In Proceedings of the Twenty-second Annual ACM Symposium on Theory of Computing,
Baltimore MD, ACM Press, 535–545. [498]
J. M. B ORWEIN, P. B. B ORWEIN, and D. H. BAILEY (1989), Ramanujan, Modular Equations, and
Approximations to Pi or How to Compute One Billion Digits of Pi. The American Mathematical
Monthly 96(3), 201–219. Reprinted in Berggren, Borwein & Borwein (1997), 623–641. [83]
R. C. B OSE and D. K. R AY-C HAUDHURI (1960), On A Class of Error Correcting Binary Group Codes.
Information and Control 3, 68–79. [215]
J OAN B OYAR (1989), Inferring Sequences Produced by Pseudo-Random Number Generators. Journal of
the ACM 36(1), 129–141. [505]
G ILLES B RASSARD and PAUL B RATLEY (1996), Fundamentals of Algorithmics. Prentice-Hall, Inc.,
Englewood Cliffs NJ. First published as Algorithmics - Theory & Practice, 1988. [41, 720]
A. B RAUER (1939), On addition chains. Bulletin of the American Mathematical Society 45, 736–739.
R ICHARD P. B RENT (1976), Analysis of the binary Euclidean algorithm. In Algorithms and Complexity, ed.
J. F. T RAUB, 321–355. Academic Press, New York. [61]
R ICHARD P. B RENT (1980), An improved Monte Carlo factorization algorithm. BIT 20, 176–184. [567]
R. P. B RENT (1989), Factorization of the eleventh Fermat number (preliminary report). AMS Abstracts 10,
89T-11-73. [542]
R ICHARD P. B RENT (1999), Factorization of the tenth Fermat number. Mathematics of Computation 68(225),
429–451. [542, 567]
R ICHARD P. B RENT, F RED G. G USTAVSON, and DAVID Y. Y. Y UN (1980), Fast Solution of Toeplitz Systems
of Equations and Computation of Padé Approximants. Journal of Algorithms 1, 259–295. [332]
R. P. B RENT and H. T. K UNG (1978), Fast Algorithms for Manipulating Formal Power Series. Journal of
the ACM 25(4), 581–595. [353, 354]
R ICHARD P. B RENT and J OHN M. P OLLARD (1981), Factorization of the Eighth Fermat Number. Mathematics
of Computation 36(154), 627–630. Preliminary announcement in AMS Abstracts 1 (1980), 565.
[542, 567]
E RNEST F. B RICKELL (1984), Solving low density knapsacks. In Advances in Cryptology: Proceedings of
CRYPTO ’83, Plenum Press, New York, 25–37. [509]
E RNEST F. B RICKELL (1985), Breaking iterated knapsacks. In Advances in Cryptology: Proceedings of
CRYPTO ’84, Santa Barbara, CA. Lecture Notes in Computer Science 196, Springer-Verlag, 342–358.
[509]
E GBERT B RIESKORN and H ORST K NÖRRER (1986), Plane Algebraic Curves. Birkhäuser Verlag, Basel. [568]
J OHN B RILLHART, D. H. L EHMER, J. L. S ELFRIDGE, B RYANT T UCKERMAN, and S. S. WAGSTAFF , J R .
(1988), Factorizations of b^n ± 1, b = 2, 3, 5, 6, 7, 10, 11, 12 up to high powers. Contemporary
Mathematics 22, American Mathematical Society, Providence RI, 2nd edition. [542]
M ANUEL B RONSTEIN (1990), The Transcendental Risch Differential Equation. Journal of Symbolic
Computation 9, 49–60. [641]
M ANUEL B RONSTEIN (1991), The Risch Differential Equation on an Algebraic Curve. In Proceedings of the
1991 International Symposium on Symbolic and Algebraic Computation ISSAC ’91, Bonn, Germany, ed.
S TEPHEN M. WATT, ACM Press, 241–246. [641]
M ANUEL B RONSTEIN (1992), On solutions of linear ordinary differential equations in their coefficient field.
Journal of Symbolic Computation 13, 413–439. [641]
M ANUEL B RONSTEIN (1997), Symbolic Integration I—Transcendental Functions. Algorithms and
Computation in Mathematics 1, Springer-Verlag, Berlin Heidelberg. [640, 641, 642]
M ANUEL B RONSTEIN (2000), On Solutions of Linear Ordinary Difference Equations in their Coefficient Field.
Journal of Symbolic Computation 29, 841–877. [671]
M ANUEL B RONSTEIN and A NNE F REDET (1999), Solving Linear Ordinary Differential Equations over
C(x, e^(∫ f(x) dx)). In Proceedings of the 1999 International Symposium on Symbolic and Algebraic
Computation ISSAC ’99, Vancouver, Canada, ed. S AM D OOLEY, ACM Press, 173–180. [641]
W. S. B ROWN (1971), On Euclid’s Algorithm and the Computation of Polynomial Greatest Common Divisors.
Journal of the ACM 18(4), 478–504. [62, 197, 198, 199]
W. S. B ROWN (1978), The Subresultant PRS Algorithm. ACM Transactions on Mathematical Software 4(3),
237–249. [199]
W. S. B ROWN and J. F. T RAUB (1971), On Euclid’s Algorithm and the Theory of Subresultants. Journal of
the ACM 18(4), 505–514. [197, 199, 332]
W. DALE B ROWNAWELL (1987), Bounds for the degrees in the Nullstellensatz. Annals of Mathematics 126,
577–591. [618]
B RUNO B UCHBERGER (1965), Ein Algorithmus zum Auffinden der Basiselemente des Restklassenringes nach
einem nulldimensionalen Polynomideal. PhD thesis, Philosophische Fakultät an der
Leopold-Franzens-Universität, Innsbruck, Austria. [591, 609, 618]
B. B UCHBERGER (1970), Ein algorithmisches Kriterium für die Lösbarkeit eines algebraischen
Gleichungssystems. aequationes mathematicae 4(3), 271–272 and 374–383. English translation by
Michael Abramson and Robert Lumbert in Buchberger & Winkler (1998), 535–545. [618]
B. B UCHBERGER (1976), A theoretical basis for the reduction of polynomials to canonical forms.
ACM SIGSAM Bulletin 10(3), 19–29. [618]
B. B UCHBERGER (1985), Gröbner Bases: An Algorithmic Method in Polynomial Ideal Theory.
In Multidimensional Systems Theory, ed. N. K. B OSE, Mathematics and Its Applications, chapter 6,
184–232. D. Reidel Publishing Company, Dordrecht. [618]
B RUNO B UCHBERGER (1987), History and basic features of the critical–pair/completion procedure. Journal of
Symbolic Computation 3, 3–38. [618]
B RUNO B UCHBERGER and F RANZ W INKLER, eds. (1998), Gröbner Bases and Applications . London
Mathematical Society Lecture Note Series 251, Cambridge University Press, Cambridge, UK. [618, 738]
JAMES R. B UNCH and J OHN E. H OPCROFT (1974), Triangular Factorization and Inversion by Fast Matrix
Multiplication. Mathematics of Computation 28(125), 231–236. [352]
P ETER B ÜRGISSER (1998), On the Parallel Complexity of the Polynomial Ideal Membership Problem. Journal
of Complexity 14, 176–189. [616]
P ETER B ÜRGISSER, M ICHAEL C LAUSEN, and M. A MIN S HOKROLLAHI (1997), Algebraic Complexity
Theory. Grundlehren der mathematischen Wissenschaften 315, Springer-Verlag. [88, 222, 286, 338, 352]
C HRISTOPH B URNIKEL and J OACHIM Z IEGLER (1998), Fast Recursive Division. Research Report
MPI-I-98-1-022, Max-Planck-Institut für Informatik, Saarbrücken, Germany.
http://domino.mpi-inf.mpg.de/internet/reports.nsf/NumberView/1998-1-022, iv + 27
pages. [286]
S. B USS, R. I MPAGLIAZZO, J. K RAJÍ ČEK, P. P UDLÁK, A. A. R AZBOROV, and J. S GALL (1996/97), Proof
complexity in algebraic systems and bounded depth Frege systems with modular counting. computational
complexity 6(3), 256–298. [697]
M. C. R. B UTLER (1954), On the reducibility of polynomials over a finite field. Quarterly Journal of
Mathematics Oxford 5(2), 102–107. [420]
J OHN J. C ADE (1987), A modification of a broken public-key cipher. In Advances in Cryptology: Proceedings
of CRYPTO ’86, Santa Barbara, CA, ed. A. M. O DLYZKO. Lecture Notes in Computer Science 263,
Springer-Verlag, 64–83. [576]
PAUL C AMION (1980), Un algorithme de construction des idempotents primitifs d'idéaux d'algèbres sur F_q.
Comptes Rendus de l'Académie des Sciences Paris 291, 479–482. [420]
PAUL C AMION (1981), Factorisation des polynômes de F_q. Revue du CETHEDEC 18, 1–17. [419]
PAUL C AMION (1982), Un algorithme de construction des idempotents primitifs d'idéaux d'algèbres sur F_q.
Annals of Discrete Mathematics 12, 55–63. [419]
PAUL F. C AMION (1983), Improving an Algorithm for Factoring Polynomials over a Finite Field and
Constructing Large Irreducible Polynomials. IEEE Transactions on Information Theory IT-29(3),
378–385. [419]
E. R. C ANFIELD, PAUL E RD ŐS, and C ARL P OMERANCE (1983), On a problem of Oppenheim concerning
‘Factorisatio Numerorum’. Journal of Number Theory 17, 1–28. [567]
L ÉANDRO C ANIGLIA, A NDRÉ G ALLIGO, and J OOS H EINTZ (1988), Borne simple exponentielle pour les
degrés dans le théorème des zéros sur un corps de caractéristique quelconque. Comptes Rendus de
l’Académie des Sciences Paris, série I 307, 255–258. [619]
L ÉANDRO C ANIGLIA, A NDRÉ G ALLIGO, and J OOS H EINTZ (1989), Some new effectivity bounds in
computational geometry. In Algebraic Algorithms and Error-Correcting Codes: AAECC-6, Rome, Italy,
1988, ed. T. M ORA, Lecture Notes in Computer Science 357, 131–152. Springer-Verlag. [618]
J OHN C ANNY (1987), A New Algebraic Method for Robot Motion Planning and Real Geometry.
In Proceedings of the 28th Annual IEEE Symposium on Foundations of Computer Science,
Los Angeles CA, IEEE Computer Society Press, Washington DC, 39–48. [619]
J OHN F. C ANNY (1988), The Complexity of Robot Motion Planning. ACM Doctoral Dissertation Award 1987,
MIT Press, Cambridge MA. [619]
DAVID G. C ANTOR (1989), On Arithmetical Algorithms over Finite Fields. Journal of Combinatorial Theory,
Series A 50, 285–300. [280, 281, 282, 287]
DAVID G. C ANTOR and DANIEL M. G ORDON (2000), Factoring Polynomials over p-Adic Fields.
In Algorithmic Number Theory, Fourth International Symposium, ANTS-IV, Leiden, The Netherlands, ed.
W IEB B OSMA, Springer-Verlag, 185–208. [466]
DAVID G. C ANTOR and E RICH K ALTOFEN (1991), On Fast Multiplication of Polynomials Over Arbitrary
Algebras. Acta Informatica 28, 693–701. [245, 247]
DAVID G. C ANTOR and H ANS Z ASSENHAUS (1981), A New Algorithm for Factoring Polynomials Over Finite
Fields. Mathematics of Computation 36(154), 587–592. [405, 406, 417, 418]
L EONARD C ARLITZ (1932), The arithmetic of polynomials in a Galois field. American Journal of
Mathematics 54, 39–50. [426]
R. D. C ARMICHAEL (1909/10), Note on a new number theory function. Bulletin of the American Mathematical
Society 16, 232–238. [531]
R. D. C ARMICHAEL (1912), On composite numbers P which satisfy the Fermat congruence a^(P−1) ≡ 1 mod P.
The American Mathematical Monthly 19, 22–27. [531]
T HOMAS R. C ARON and ROBERT D. S ILVERMAN (1988), Parallel implementation of the quadratic sieve. The
Journal of Supercomputing 1, 273–290. [531, 567]
PAUL DE FAGET DE C ASTELJAU (1985), Shape mathematics and CAD. Hermes Publishing, Paris. [138]
P IETRO A NTONIO C ATALDI (1613), Trattato del modo brevissimo di trouare la radice quadra delli numeri.
Bartolomeo Cochi, Bologna. [89]
AUGUSTIN C AUCHY (1821), Sur la formule de Lagrange relative à l’interpolation. In Cours d’analyse de
l’École Royale Polytechnique (Analyse algébrique), Note V. Imprimerie royale Debure frères, Paris.
Œuvres Complètes, IIe série, tome III, Gauthier-Villars, Paris, 1897, 429–433. [132]
AUGUSTIN C AUCHY (1840), Mémoire sur l’élimination d’une variable entre deux équations algébriques.
In Exercices d’analyse et de physique mathématique, tome 1er . Bachelier, Paris. Œuvres Complètes,
IIe série, tome 11. Gauthier-Villars, Paris, 1913, 466–509. [197]
AUGUSTIN C AUCHY (1841), Mémoire sur diverses formules relatives à l’Algèbre et à la théorie des nombres.
Comptes Rendus de l’Académie des Sciences Paris 12, p. 813 ff. Œuvres Complètes, Ire série, tome 6,
Gauthier-Villars, Paris, 1888, 113–146. [131]
AUGUSTIN C AUCHY (1847), Mémoire sur les racines des équivalences correspondantes à des modules
quelconques premiers ou non premiers, et sur les avantages que présente l’emploi de ces racines dans la
théorie des nombres. Comptes Rendus de l’Académie des Sciences Paris 25, p. 37 ff. Œuvres Complètes,
Ire série, tome 10, Gauthier-Villars, Paris, 1897, 324–333. [286]
B. F. C AVINESS (1970), On Canonical Forms and Simplification. Journal of the Association for Computing
Machinery 17(2), 385–396. [640]
A RTHUR C AYLEY (1848), On the theory of elimination. The Cambridge and Dublin Mathematical Journal 3,
116–120. Also Cambridge Mathematical Journal 7. [197]
M IGUEL DE C ERVANTES S AAVEDRA (1615), El ingenioso cavallero Don Quixote de la Mancha, segunda parte .
Francisco de Robles, Madrid. [90]
JASBIR S. C HAHAL (1995), Manin’s Proof of the Hasse Inequality Revisited. Nieuw Archief voor Wiskunde,
Vierde serie 13(2), 219–232. [568]
B RUCE W. C HAR, K EITH O. G EDDES, and G ASTON H. G ONNET (1989), GCDHEU: Heuristic Polynomial
GCD Algorithm Based On Integer GCD Computation. Journal of Symbolic Computation 7, 31–48.
Extended Abstract in Proceedings of EUROSAM ’84, ed. J OHN F ITCH, Lecture Notes in Computer
Science 174, Springer-Verlag, 285–296. [202]
N. T SCHEBOTAREFF [N. C HEBOTAREV ] (1926), Die Bestimmung der Dichtigkeit einer Menge von
Primzahlen, welche zu einer gegebenen Substitutionsklasse gehören. Mathematische Annalen 95,
191–228. [441, 465]
P. L. C HEBYSHEV (1849), Ob opredelenii chisla prostykh chisel, ne prevoskhodyashchikh dannoi velichiny
(On the determination of the number of primes not exceeding a given value; in Russian). Mémoires présentés à
l'Académie Impériale des sciences de St.-Pétersbourg par divers savants 6, 141–157. French translation:
P. L. C HEBYSHEV, Sur la fonction qui détermine la totalité des nombres
premiers inférieurs à une limite donnée. Journal de Mathématiques Pures et Appliquées, I série 17 (1852),
341–365. Œuvres I, eds. A. M ARKOFF and N. S ONIN, 1899, reprint by Chelsea Publishing Co.,
New York, 26–48. [533]
P. L. C HEBYSHEV (1852), Mémoire sur les nombres premiers. Journal de Mathématiques Pures et Appliquées,
I série 17, 366–390. Mémoires présentées à l’Académie Impériale des sciences de St.-Pétersbourg par
divers savants 6 (1854), 17–33. Œuvres I, eds. A. M ARKOFF and N. S ONIN, 1899, reprint by Chelsea
Publishing Co., New York, 49–70. [533]
Z HI -Z HONG C HEN and M ING -YANG K AO (1997), Reducing Randomness via Irrational Numbers.
In Proceedings of the Twenty-ninth Annual ACM Symposium on Theory of Computing, El Paso TX,
ACM Press, 200–209. [199]
Z HI -Z HONG C HEN and M ING -YANG K AO (2000), Reducing randomness via irrational numbers. SIAM
Journal on Computing 29(4), 1247–1256. [199]
A LEXANDRE L. C HISTOV (1990), Efficient Factoring Polynomials over Local Fields and Its Applications.
In Proceedings of the International Congress of Mathematicians 1990, Kyoto, Japan, vol. II, 1509–1519.
Springer-Verlag. [466]
A. L. C HISTOV and D. Y U . G RIGOR ’ EV (1984), Complexity of quantifier elimination in the theory of
algebraically closed fields. In Proceedings of the 11th International Symposium Mathematical
Foundations of Computer Science 1984, Praha, Czechoslovakia. Lecture Notes in Computer Science 176,
Springer-Verlag, Berlin, 17–31. [619]
B ENNY C HOR and RONALD L. R IVEST (1988), A knapsack–type public key cryptosystem based on arithmetic
in finite fields. IEEE Transactions on Information Theory IT-34(5), 901–909. Advances in Cryptology:
Proceedings of CRYPTO 1984, Santa Barbara CA, Lecture Notes in Computer Science 196,
Springer-Verlag, New York, 1985, 54–65. [509]
C.-C. C HOU, Y.-F. D ENG, G. L I, and Y. WANG (1995), Parallelizing Strassen’s Method for Matrix
Multiplication on Distributed-Memory MIMD Architectures. Computers & Mathematics with
Applications 30(2), 49–69. [352]
F RÉDÉRIC C HYZAK (1998a), Fonctions holonomes en calcul formel. PhD thesis, École Polytechnique, Paris.
[671]
F RÉDÉRIC C HYZAK (1998b), Gröbner Bases, Symbolic Summation and Symbolic Integration. In Gröbner
Bases and Applications , eds. B RUNO B UCHBERGER and F RANZ W INKLER. London Mathematical
Society Lecture Note Series 251, Cambridge University Press, Cambridge, UK, 32–60. [671]
F RÉDÉRIC C HYZAK (2000), An extension of Zeilberger’s fast algorithm to general holonomic functions.
Discrete Mathematics 217, 115–134. [671]
F RÉDÉRIC C HYZAK and B RUNO S ALVY (1998), Non-commutative Elimination in Ore Algebras Proves
Multivariate Identities. Journal of Symbolic Computation 26(2), 187–227. [671]
M ICHAEL C LAUSEN, A NDREAS D RESS, J OHANNES G RABMEIER, and M AREK K ARPINSKI (1991),
On Zero–Testing and Interpolation of k-Sparse Multivariate Polynomials over Finite Fields. Theoretical
Computer Science 84, 151–164. [498]
M ATTHEW C LEGG, J EFFREY E DMONDS, and RUSSELL I MPAGLIAZZO (1996), Using the Groebner basis
algorithm to find proofs of unsatisfiability. In Proceedings of the Twenty-eighth Annual ACM
Symposium on Theory of Computing, Philadelphia PA, ACM Press, 174–183. [679]
G. E. C OLLINS (1966), Polynomial remainder sequences and determinants. The American Mathematical
Monthly 73, 708–712. [197, 199]
G EORGE E. C OLLINS (1967), Subresultants and Reduced Polynomial Remainder Sequences. Journal of
the ACM 14(1), 128–142. [197, 199, 332]
G. E. C OLLINS (1971), The Calculation of Multivariate Polynomial Resultants. Journal of the ACM 18(4),
515–532. [197, 198]
G. E. C OLLINS (1973), Computer algebra of polynomials and rational functions. The American Mathematical
Monthly 80, 725–55. [199]
G. E. C OLLINS (1975), Quantifier elimination for real closed fields by cylindrical algebraic decomposition.
Lecture Notes in Computer Science 33, Springer-Verlag. [619]
G. E. C OLLINS (1979), Factoring univariate integral polynomials in polynomial average time. In Proceedings
of EUROSAM ’79, Marseille, France. Lecture Notes in Computer Science 72, 317–329. [455, 465]
G EORGE E. C OLLINS and M ARK J. E NCARNACIÓN (1996), Improved Techniques for Factoring Univariate
Polynomials. Journal of Symbolic Computation 21, 313–327. [465]
S. A. C OOK (1966), On the minimum computation time of functions. Doctoral Thesis, Harvard University,
Cambridge MA. [247, 286]
S TEPHEN A. C OOK (1971), The Complexity of Theorem–Proving Procedures. In Proceedings of the Third
Annual ACM Symposium on Theory of Computing, Shaker Heights OH, ACM Press, 151–158. [722]
JAMES W. C OOLEY (1987), The Re–Discovery of the Fast Fourier Transform Algorithm. Mikrochimica Acta 3,
33–45. [247, 727]
JAMES W. C OOLEY (1990), How the FFT Gained Acceptance. In A History of Scientific Computing, ed.
S TEPHEN G. NASH, ACM Press, New York, and Addison-Wesley, Reading MA, 133–140. [247]
JAMES W. C OOLEY and J OHN W. T UKEY (1965), An Algorithm for the Machine Calculation of Complex
Fourier Series. Mathematics of Computation 19, 297–301. [233, 247]
D. C OPPERSMITH (1993), Solving Linear Equations Over GF(2): Block Lanczos Algorithm. Linear Algebra
and its Applications 192, 33–60. [353]
D ON C OPPERSMITH (1994), Solving homogeneous linear equations over GF(2) via block Wiedemann
algorithm. Mathematics of Computation 62(205), 333–350. [353]
D ON C OPPERSMITH and S HMUEL W INOGRAD (1990), Matrix Multiplication via Arithmetic Progressions.
Journal of Symbolic Computation 9, 251–280. [352, 420]
ROBERT M. C ORLESS, E RICH K ALTOFEN, and S TEPHEN M. WATT (2003), Hybrid Methods. In Computer
Algebra Handbook – Foundations, Applications, Systems, eds. J OHANNES G RABMEIER, E RICH
K ALTOFEN, and VOLKER W EISPFENNING, 112–125. Springer-Verlag, Berlin, Heidelberg, New York.
[41]
T HOMAS H. C ORMEN, C HARLES E. L EISERSON, RONALD L. R IVEST, and C LIFFORD S TEIN (2009),
Introduction to Algorithms. MIT Press, Cambridge MA, London UK, third edition. [41, 368]
JAMES C OWIE, B RUCE D ODSON, R. M ARIJE E LKENBRACHT-H UIZING, A RJEN K. L ENSTRA, P ETER L.
M ONTGOMERY, and J ÖRG Z AYER (1996), A World Wide Number Field Sieve Factoring Record: On to
512 Bits. In Advances in Cryptology—ASIACRYPT ’96. Lecture Notes in Computer Science 1163,
Springer-Verlag, 382–394. [569]
DAVID A. C OX (1989), Primes of the Form x^2 + ny^2 — Fermat, Class Field Theory, and Complex
Multiplication . John Wiley & Sons, New York. [568]
DAVID C OX, J OHN L ITTLE, and D ONAL O’S HEA (1997), Ideals, Varieties, and Algorithms: An Introduction to
Computational Algebraic Geometry and Commutative Algebra. Undergraduate Texts in Mathematics,
Springer-Verlag, New York, 2nd edition. First edition 1992. [614, 617, 618]
DAVID C OX, J OHN L ITTLE, and D ONAL O’S HEA (1998), Using Algebraic Geometry. Graduate Texts in
Mathematics 185, Springer-Verlag, New York. [617]
G ABRIEL C RAMER (1750), Introduction a l’analyse des lignes courbes algébriques. Frères Cramer &
Cl. Philibert, Genève. [198, 724]
J OHN N. C ROSSLEY and A LAN S. H ENRY (1990), Thus Spake al-Khwārizmī: A Translation of the Text of
Cambridge University Library Ms. Ii.vi.5. Historia Mathematica 17, 103–131. [727]
A LLAN J. C. C UNNINGHAM and H. J. W OODALL (1925), Factorization of (y^n ∓ 1), y = 2, 3, 5, 6, 7, 10, 11, 12
up to high powers (n). Francis Hodgson, London. [541]
I VAN DAMGÅRD, P ETER L ANDROCK, and C ARL P OMERANCE (1993), Average case error estimates for the
strong probable prime test. Mathematics of Computation 61(203), 177–194. [532]
J. H. DAVENPORT (1986), The Risch differential equation problem. SIAM Journal on Computing 15(4),
903–918. [641]
P IERRE D ÈBES (1996), Hilbert subsets and s-integral points. Manuscripta Mathematica 89, 107–137. [498]
R ICHARD A. D E M ILLO and R ICHARD J. L IPTON (1978), A probabilistic remark on algebraic program testing.
Information Processing Letters 7(4), 193–195. [88, 198]
A NGEL D ÍAZ and E RICH K ALTOFEN (1995), On Computing Greatest Common Divisors with Polynomials
Given By Black Boxes for Their Evaluations. In Proceedings of the 1995 International Symposium on
Symbolic and Algebraic Computation ISSAC ’95, Montreal, Canada, ed. A. H. M. L EVELT, ACM Press,
232–239. [199]
A NGEL D ÍAZ and E RICH K ALTOFEN (1998), F OX B OX: A System for Manipulating Symbolic Objects in Black
Box Representation. In Proceedings of the 1998 International Symposium on Symbolic and Algebraic
Computation ISSAC ’98, Rostock, Germany, ed. O LIVER G LOOR, ACM Press, 30–37. [498]
L EONARD E UGENE D ICKSON (1919), History of the Theory of Numbers, vol. 1. Carnegie Institute of
Washington. Published in 1919, 1920, and 1923 as publication 256. Reprinted by Chelsea Publishing
Company, New York, N.Y., 1971. [88]
W HITFIELD D IFFIE and M ARTIN E. H ELLMAN (1976), New directions in cryptography. IEEE Transactions on
Information Theory IT-22(6), 644–654. [503, 575, 576, 578, 581]
G. L EJEUNE D IRICHLET (1837), Beweis des Satzes, dass jede unbegrenzte arithmetische Progression, deren
erstes Glied und Differenz ganze Zahlen ohne gemeinschaftlichen Factor sind, unendlich viele Primzahlen
enthält. Abhandlungen der Königlich Preussischen Akademie der Wissenschaften, 45–81. Werke, Erster
Band, ed. L. K RONECKER, 1889, 315–342. Reprint by Chelsea Publishing Co., 1969. [528]
G. L EJEUNE D IRICHLET (1842), Verallgemeinerung eines Satzes aus der Lehre von den Kettenbrüchen nebst
einigen Anwendungen auf die Theorie der Zahlen. Bericht über die Verhandlungen der Königlich
Preussischen Akademie der Wissenschaften, 93–95. Werke, Erster Band, ed. L. K RONECKER, 1889,
635–638. Reprint by Chelsea Publishing Co., 1969. [506, 509]
G. L EJEUNE D IRICHLET (1849), Über die Bestimmung der mittleren Werthe in der Zahlentheorie.
Abhandlungen der Königlich Preussischen Akademie der Wissenschaften, 69–83. Werke, Zweiter Band,
ed. L. K RONECKER, 1897, 51–66. Reprint by Chelsea Publishing Co., 1969. [62]
P. G. L EJEUNE D IRICHLET (1893), Vorlesungen über Zahlentheorie, herausgegeben von R. D EDEKIND.
Friedrich Vieweg & Sohn, Braunschweig, 4th edition. Corrected reprint, Chelsea Publishing Co.,
New York, 1968. First edition 1863. [707]
J OHN D. D IXON (1970), The Number of Steps in the Euclidean Algorithm. Journal of Number Theory 2,
414–422. [61]
J OHN D. D IXON (1981), Asymptotically Fast Factorization of Integers. Mathematics of Computation 36(153),
255–260. [541, 549, 569]
B RUCE D ODSON and A RJEN K. L ENSTRA (1995), NFS with Four Large Primes: An Explosive Experiment.
In Advances in Cryptology: Proceedings of CRYPTO ’95, Santa Barbara, CA, ed. D ON C OPPERSMITH.
Lecture Notes in Computer Science 963, Springer-Verlag, 372–385. [569]
K ARL D ÖRGE (1926), Über die Seltenheit der reduziblen Polynome und der Normalgleichungen.
Mathematische Annalen 95, 247–256. [466]
J EAN L OUIS D ORNSTETTER (1987), On the Equivalence Between Berlekamp’s and Euclid’s Algorithms. IEEE
Transactions on Information Theory IT-33(3), 428–431. [215]
M. W. D ROBISCH (1855), Über musikalische Tonbestimmung und Temperatur. Abhandlungen der
Mathematisch-Physischen Classe der Königlich Sächsischen Gesellschaft der Wissenschaften 4, 1–120
plus 1 table. [91]
T HOMAS W. D UBÉ (1990), The structure of polynomial ideals and Gröbner bases. SIAM Journal on
Computing 19(4), 750–773. [618]
R AYMOND D UBOIS (1971), Utilisation d’un théorème de Fermat à la découverte des nombres premiers et notes
sur les nombres de Fibonacci. Albert Blanchard, Paris. [532]
L IONEL D UCOS (2000), Optimizations of the subresultant algorithm. Journal of Pure and Applied Algebra 145,
149–163. [199]
ATHANASE D UPRÉ (1846), Sur le nombre des divisions a effectuer pour obtenir le plus grand commun diviseur
entre deux nombres entiers. Journal de Mathématiques Pures et Appliquées 11, 41–64. [61]
WAYNE E BERLY and E RICH K ALTOFEN (1997), On Randomized Lanczos Algorithms. In Proceedings of the
1997 International Symposium on Symbolic and Algebraic Computation ISSAC ’97, Maui HI, ed.
W OLFGANG W. K ÜCHLIN, ACM Press, 176–183. [353]
JACK E DMONDS (1967), Systems of Distinct Representatives and Linear Algebra. Journal of Research of the
National Bureau of Standards 71B(4), 241–245. [132]
D. E ISENBUD and L. ROBBIANO, eds. (1993), Computational algebraic geometry and commutative algebra.
Symposia Mathematica 34, Cambridge University Press, Cambridge, UK. [617]
D. E ISENBUD and B. S TURMFELS (1996), Binomial ideals. Duke Mathematical Journal 84(1), 1–45. [697]
G. E ISENSTEIN (1844), Einfacher Algorithmus zur Bestimmung des Werthes von (a/b). Journal für die reine und
angewandte Mathematik 27(4), 317–318. [533]
S HALOSH B. E KHAD (1990), A Very Short Proof of Dixon’s Theorem. Journal of Combinatorial Theory,
Series A 54, 141–142. [697]
S HALOSH B. E KHAD and S OL T RE (1990), A Purely Verification Proof of the First Rogers–Ramanujan
Identity. Journal of Combinatorial Theory, Series A 54, 309–311. [697]
I. Z. E MIRIS and B. M OURRAIN (1999), Computer Algebra Methods for Studying and Computing Molecular
Conformations. Algorithmica 25(2/3), 372–402. Special Issue on Algorithms for Computational Biology.
[698]
L EONHARD E ULER (1732/33), Observationes de theoremate quodam Fermatiano aliisque ad numeros primos
spectantibus. Commentarii academiae scientiarum imperialis Petropolitanae 6, 103–107. Eneström 26.
Opera Omnia, series 1, volume 2, B. G. Teubner, Leipzig, 1915, 1–5. [76, 88, 513, 542]
L EONHARD E ULER (1734/35a), Solutio problematis arithmetici de inveniendo numero qui per datos numeros
divisus relinquat data residua. Commentarii academiae scientiarum imperialis Petropolitanae 7, 46–66.
Eneström 36. Opera Omnia, series 1, volume 2, B. G. Teubner, Leipzig, 1915, 18–32. [131]
L EONHARD E ULER (1734/35b), De summis serierum reciprocarum. Commentarii Academiae Scientiarum
Petropolitanae 7, 123–134. Eneström 41. Opera Omnia, series 1, volume 14, B. G. Teubner, Leipzig,
1925, 73–86. [62]
L EONHARD E ULER (1736a), Mechanica sive motus scientia analytice exposita, Tomus I. Typographia
Academia Scientiarum, Petropolis. Opera Omnia, series 2, volume 1, B. G. Teubner, Leipzig, 1912. [90]
L EONHARD E ULER (1736b), Theorematum quorundam ad numeros primos spectantium demonstratio.
Commentarii academiae scientiarum imperialis Petropolitanae 8, 1741, 141–146. Eneström 54. Opera
Omnia, series 1, volume 2, B. G. Teubner, Leipzig, 1915, 33–37. [88]
L EONHARD E ULER (1737), De fractionibus continuis dissertatio. Commentarii academiae scientiarum
imperialis Petropolitanae 9, 1744, 98–137. Eneström 71. Opera Omnia, series 1, volume 14, B. G.
Teubner, Leipzig, 1925, 187–215. [89, 90, 91]
L EONHARD E ULER (1743), Démonstration de la somme de cette suite 1 + 1/4 + 1/9 + 1/16 + 1/25 + 1/36 + etc.
Journal littéraire d’Allemagne, de Suisse et du Nord (La Haye) 2, 115–127. Bibliotheca Mathematica, Serie 3, 8,
1907–1908, 54–60. Eneström 63. Opera Omnia, series 1, volume 14, 177–186. [62]
L EONHARD E ULER (1747/48), Theoremata circa divisores numerorum. Novi commentarii academiae
scientiarum imperialis Petropolitanae 1, 20–48. Summarium ibidem, 35–37. Eneström 134. Opera Omnia,
series 1, volume 2, B. G. Teubner, Leipzig, 1915, 62–85. [131, 513, 542]
L EONHARD E ULER (1748a), Introductio in analysin infinitorum, tomus primus et secundus. M.-M. Bousquet,
Lausanne. Opera Omnia, series 1, volume 8 and 9. Teubner, Leipzig, 1922/1945. [62, 90, 132]
L EONHARD E ULER (1748b), Sur une contradiction apparente dans la doctrine des lignes courbes. Mémoires de
l’Académie des Sciences de Berlin 4, 1750, 219–233. Eneström 147. Opera Omnia, series 1, volume 26,
Orell Füssli, Zürich, 1953, 34–45. [198]
L EONHARD E ULER (1748c), Démonstration sur le nombre des points où deux lignes des ordres quelconques
peuvent se couper. Mémoires de l’Académie des Sciences de Berlin 4, 1750, 234–248. Eneström 148.
Opera Omnia, series 1, volume 26, Orell Füssli, Zürich, 1953, 46–59. [197, 198]
L EONHARD E ULER (1754/55), Demonstratio theorematis Fermatiani omnem numerum sive integrum sive
fractum esse summam quatuor pauciorumve quadratorum. Novi commentarii academiae scientiarum
imperialis Petropolitanae 5, 13–58. Summarium ibidem 6–7. Eneström 242. Opera Omnia, series 1,
volume 1, B. G. Teubner, Leipzig, 1915, 339–372. [418]
L EONHARD E ULER (1760/61), Theoremata arithmetica nova methodo demonstrata. Novi commentarii
academiae scientiarum imperialis Petropolitanae 8, 74–104. Summarium ibidem 15–18. Eneström 271.
Opera Omnia, series 1, volume 2, B. G. Teubner, Leipzig, 1915, 531–555. [131]
L EONHARD E ULER (1761), Theoremata circa residua ex divisione potestatum relicta. Novi commentarii
academiae scientiarum imperialis Petropolitanae 7, 49–82. Eneström 262. Opera Omnia, series 1,
volume 2, B. G. Teubner, Leipzig, 1915, 493–518. [76, 418]
L EONHARD E ULER (1762/63), Specimen algorithmi singularis. Novi commentarii academiae scientiarum
imperialis Petropolitanae 9, 1764, 53–69. Summarium ibidem 10–13. Eneström 281. Opera Omnia,
series 1, volume 15, B. G. Teubner, Leipzig, 1927, 31–49. [90]
L EONHARD E ULER (1764), Nouvelle méthode d’éliminer les quantités inconnues des équations. Mémoires de
l’Académie des Sciences de Berlin 20, 1766, 91–104. Eneström 310. Opera Omnia, series 1, volume 6,
B. G. Teubner, Leipzig, 1921, 197–211. [197, 198]
L EONHARD E ULER (1783), De eximio methodi interpolationum in serierum doctrina. Opuscula analytica 1,
157–210. Eneström 555. Opera Omnia, ser. 1, vol. 15, Teubner, Leipzig, 1927, 435–497. [134]
S ERGEI E VDOKIMOV (1994), Factorization of Polynomials over Finite Fields in Subexponential Time under
GRH. In Algorithmic Number Theory, First International Symposium, ANTS-I, Ithaca, NY, USA. Lecture
Notes in Computer Science 877, 209–219. [421]
G. F ROBENIUS (1896), Über Beziehungen zwischen den Primidealen eines algebraischen Körpers und den
Substitutionen seiner Gruppe. Sitzungsberichte der Königlich Preussischen Akademie der
Wissenschaften, Berlin, 689–702. [441, 465]
A. F RÖHLICH and J. C. S HEPHERDSON (1955–56), Effective procedures in field theory. Philosophical
Transactions of the Royal Society of London 248, 407–432. [419]
W. F ULTON (1969), Algebraic Curves. W. A. Benjamin, Inc., New York. [568]
M ARTIN F ÜRER (2009), Faster Integer Multiplication. SIAM Journal on Computing 39(3), 979–1005.
[222, 244, 247]
P. X. G ALLAGHER (1973), The large sieve and probabilistic Galois theory. In Analytic Number Theory, ed.
H AROLD G. D IAMOND. Proceedings of Symposia in Pure Mathematics 24, American Mathematical
Society, Providence RI, 91–101. [466]
G. G ALLO and B. M ISHRA (1991), Wu-Ritt Characteristic sets and Their Complexity. In Discrete and
Computational Geometry: Papers from the DIMACS Special Year, eds. JACOB E. G OODMAN, R ICHARD
P OLLACK, and W ILLIAM S TEIGER. DIMACS Series in Discrete Mathematics and Theoretical Computer
Science 6, American Mathematical Society and ACM, 111–136. [619]
É. G ALOIS (1830), Sur la théorie des nombres. Bulletin des sciences mathématiques Férussac 13, 428–435.
See also Journal de Mathématiques Pures et Appliquées 11 (1846), 398–407, and Écrits et mémoires
d’Évariste Galois, eds. ROBERT B OURGNE and J.-P. A ZRA, Gauthier-Villars, Paris, 1962, 112–128.
[198, 418, 421, 724, 728]
TAHER E L G AMAL (1985), A Public Key Cryptosystem and a Signature Scheme Based on Discrete Logarithms.
IEEE Transactions on Information Theory IT-31(4), 469–472. [580]
S HUHONG G AO (2003), Factoring multivariate polynomials via partial differential equations. Mathematics of
Computation 72(242), 801–822. [420]
S HUHONG G AO and J OACHIM VON ZUR G ATHEN (1994), Berlekamp’s and Niederreiter’s Polynomial
Factorization Algorithms. In Finite Fields: Theory, Applications and Algorithms, eds. G. L. M ULLEN
and P. J.-S. S HIUE. Contemporary Mathematics 168, American Mathematical Society, 101–115. [420]
S HUHONG G AO, J OACHIM VON ZUR G ATHEN, and DANIEL PANARIO (1998), Gauss periods: orders and
cryptographical applications. Mathematics of Computation 67(221), 343–352. With microfiche
supplement. [88, 580]
S HUHONG G AO, J OACHIM VON ZUR G ATHEN, DANIEL PANARIO, and V ICTOR S HOUP (2000), Algorithms
for Exponentiation in Finite Fields. Journal of Symbolic Computation 29(6), 879–889. [88, 580]
S HUHONG G AO and DANIEL PANARIO (1997), Tests and Constructions of Irreducible Polynomials over Finite
Fields. In Foundations of Computational Mathematics, eds. F ELIPE C UCKER and M ICHAEL S HUB,
346–361. Springer Verlag. [419, 421]
M ICHAEL R. G AREY and DAVID S. J OHNSON (1979), Computers and intractability: A Guide to the Theory of
NP-Completeness. W. H. Freeman and Co., San Francisco CA. [509, 722]
H ARVEY L. G ARNER (1959), The Residue Number System. IRE Transactions on Electronic Computers,
140–147. [132]
J OACHIM VON ZUR G ATHEN (1984a), Hensel and Newton methods in valuation rings. Mathematics of
Computation 42(166), 637–661. [419, 466, 497, 500]
J OACHIM VON ZUR G ATHEN (1984b), Parallel algorithms for algebraic problems. SIAM Journal on
Computing 13(4), 802–824. [197, 199]
J OACHIM VON ZUR G ATHEN (1985), Irreducibility of Multivariate Polynomials. Journal of Computer and
System Sciences 31(2), 225–264. [466, 497, 498, 724]
J OACHIM VON ZUR G ATHEN (1986), Representations and parallel computations for rational functions. SIAM
Journal on Computing 15(2), 432–452. [131]
J OACHIM VON ZUR G ATHEN (1987), Factoring polynomials and primitive elements for special primes.
Theoretical Computer Science 52, 77–89. [421]
J OACHIM VON ZUR G ATHEN (1988), Algebraic complexity theory. Annual Review of Computer Science 3,
317–347. [352]
J OACHIM VON ZUR G ATHEN (1990a), Functional Decomposition of Polynomials: the Tame Case. Journal of
Symbolic Computation 9, 281–299. [286, 580, 581]
J OACHIM VON ZUR G ATHEN (1990b), Functional Decomposition of Polynomials: the Wild Case. Journal of
Symbolic Computation 10, 437–452. [580]
J OACHIM VON ZUR G ATHEN (1991a), Tests for permutation polynomials. SIAM Journal on Computing 20(3),
591–602. [497]
J OACHIM VON ZUR G ATHEN (1991b), Values of polynomials over finite fields. Bulletin of the Australian
Mathematical Society 43, 141–146. [425]
J OACHIM VON ZUR G ATHEN and J ÜRGEN G ERHARD (1996), Arithmetic and Factorization of Polynomials
over F2 . In Proceedings of the 1996 International Symposium on Symbolic and Algebraic Computation
ISSAC ’96, Zürich, Switzerland, ed. L AKSHMAN Y. N., ACM Press, 1–9. Technical report
C ARL F RIEDRICH G AUSS (1866), Theoria interpolationis methodo nova tractata. In Werke III, Nachlass,
265–330. Königliche Gesellschaft der Wissenschaften, Göttingen. Reprinted by Georg Olms Verlag,
Hildesheim New York, 1973. [90, 247]
L EOPOLD G EGENBAUER (1884), Asymptotische Gesetze der Zahlentheorie. Denkschriften der kaiserlichen
Akademie der Wissenschaften Wien 49, 37–80. [62]
W. M. G ENTLEMAN and G. S ANDE (1966), Fast Fourier transforms—for fun and profit. In Proceedings of the
Fall Joint Computer Conference, San Francisco CA. AFIPS Conference Proceedings 29, Spartan books,
Washington DC, 563–578. [247]
F RANÇOIS G ENUYS (1958), Dix mille décimales de π. Chiffres 1, 17–22. [82]
J OSEPH D IAZ G ERGONNE (1822), De la recherche des facteurs rationnels des polynomes. Annales de
mathématiques pures et appliquées 12, 309–316. [465]
J ÜRGEN G ERHARD (1998), High degree solutions of low degree equations. In Proceedings of the 1998
International Symposium on Symbolic and Algebraic Computation ISSAC ’98, Rostock, Germany, ed.
O LIVER G LOOR, ACM Press, 284–289. [674]
J ÜRGEN G ERHARD (2001a), Fast Modular Algorithms for Squarefree Factorization and Hermite Integration.
Applicable Algebra in Engineering, Communication and Computing 11(3), 203–226. [470, 640]
J ÜRGEN G ERHARD (2001b), Modular algorithms in symbolic summation and symbolic integration. Lecture
Notes in Computer Science 3218, Springer-Verlag, Berlin, Heidelberg. [641, 670]
J. G ERHARD, M. G IESBRECHT, A. S TORJOHANN, and E. V. Z IMA (2003), Shiftless Decomposition and
Polynomial-time Rational Summation. In Proceedings of the 2003 International Symposium on Symbolic
and Algebraic Computation ISSAC2003, Philadelphia PA, ed. J. R. S ENDRA, ACM Press, 119–126. [671]
M. G IESBRECHT, A. L OBO, and B. D. S AUNDERS (1998), Certifying Inconsistency of Sparse Linear Systems.
In Proceedings of the 1998 International Symposium on Symbolic and Algebraic Computation
ISSAC ’98, Rostock, Germany, ed. O LIVER G LOOR, ACM Press, 113–119. [353]
M ARK G IESBRECHT, A RNE S TORJOHANN, and G ILLES V ILLARD (2003), Algorithms for Matrix Canonical
Forms. In Computer Algebra Handbook – Foundations, Applications, Systems, eds. J OHANNES
G RABMEIER, E RICH K ALTOFEN, and VOLKER W EISPFENNING, 38–41. Springer-Verlag, Berlin,
Heidelberg, New York. [353]
J OHN G ILL (1977), Computational complexity of probabilistic Turing machines. SIAM Journal on
Computing 6(4), 675–695. [198]
A LESSANDRO G IOVINI, T EO M ORA, G IANFRANCO N IESI, L ORENZO ROBBIANO, and C ARLO T RAVERSO
(1991), “One sugar cube, please” or Selection strategies in the Buchberger algorithm. In Proceedings of
the 1991 International Symposium on Symbolic and Algebraic Computation ISSAC ’91, Bonn, Germany,
ed. S TEPHEN M. WATT, ACM Press, 49–54. [619]
M. G IUSTI (1984), Some effectivity problems in polynomial ideal theory. In Proceedings of EUROSAM ’84,
Cambridge, UK, ed. J OHN F ITCH, Lecture Notes in Computer Science 174, 159–171. Springer-Verlag,
Berlin. [618]
M ARC G IUSTI and J OOS H EINTZ (1991), Algorithmes – disons rapides – pour la décomposition d’une variété
algébrique en composantes irréductibles et équidimensionnelles. In Proceedings of Effective Methods in
Algebraic Geometry MEGA ’90, eds. T EO M ORA and C ARLO T RAVERSO. Progress in Mathematics 94,
Birkhäuser Verlag, Basel, 169–193. [619]
N OBUHIRO G Ō and H AROLD A. S CHERAGA (1970), Ring Closure and Local Conformational Deformations of
Chain Molecules. Macromolecules 3(2), 178–187. [698]
H ERMANN H. G OLDSTINE (1977), A History of Numerical Analysis from the 16th through the 19th Century.
Studies in the History of Mathematics and Physical Sciences 2, Springer-Verlag, New York. [286]
R. M. F. G OODMAN and A. J. M C AULEY (1984), A New Trapdoor Knapsack Public Key Cryptosystem.
In Advances in Cryptology: Proceedings of EUROCRYPT 1984, Paris, France, eds. T. B ETH, N. C OT,
and I. I NGEMARSSON. Lecture Notes in Computer Science 209, Springer-Verlag, Berlin, 150–158. [509]
PAUL G ORDAN (1885), Vorlesungen über Invariantentheorie. Erster Band: Determinanten. B. G. Teubner,
Leipzig. Herausgegeben von G EORG K ERSCHENSTEINER. [199, 332]
DANIEL M. G ORDON (1993), Discrete logarithms in GF(p) using the number field sieve. SIAM Journal on
Discrete Mathematics 6(1), 124–138. [579]
R. W ILLIAM G OSPER , J R . (1978), Decision procedure for indefinite hypergeometric summation. Proceedings
of the National Academy of Sciences of the USA 75(1), 40–42. [641, 662, 670, 671, 675]
R. G ÖTTFERT (1994), An acceleration of the Niederreiter factorization algorithm in characteristic 2.
Mathematics of Computation 62(206), 831–839. [420]
X AVIER G OURDON (1996), Combinatoire, Algorithmique et Géométrie des Polynômes. PhD thesis, École
Polytechnique, Paris. [419]
R. L. G RAHAM, D. E. K NUTH, and O. PATASHNIK (1994), Concrete Mathematics. Addison-Wesley,
Reading MA, 2nd edition. First edition 1989. [571, 669, 670, 717, 720]
J. P. G RAM (1883), Ueber die Entwickelung reeller Functionen in Reihen mittelst der Methode der kleinsten
Quadrate. Journal für die reine und angewandte Mathematik 94, 41–73. [496]
A NDREW G RANVILLE (1990), Bounding the Coefficients of a Divisor of a Given Polynomial. Monatshefte für
Mathematik 109, 271–277. [198]
D. Y U . G RIGOR ’ EV (1988), Complexity of deciding Tarski algebra. Journal of Symbolic Computation 4(1/2).
[619]
D IMA Y U . G RIGORIEV, M AREK K ARPINSKI, and M ICHAEL F. S INGER (1990), Fast parallel algorithms for
sparse multivariate polynomial interpolation over finite fields. SIAM Journal on Computing 19(6),
1059–1063. [498]
D IMA G RIGORIEV, M AREK K ARPINSKI, and M ICHAEL F. S INGER (1994), Computational complexity of
sparse rational interpolation. SIAM Journal on Computing 23(1), 1–11. [498]
H. F. DE G ROOTE (1987), Lectures on the Complexity of Bilinear Problems. Lecture Notes in Computer
Science 245, Springer-Verlag. [352]
M ARTIN G RÖTSCHEL, L ÁSZLÓ L OVÁSZ, and A LEXANDER S CHRIJVER (1993), Geometric Algorithms and
Combinatorial Optimization. Algorithms and Combinatorics 2, Springer-Verlag, Berlin, Heidelberg,
2nd edition. First edition 1988. [496]
L. J. G UIBAS and A. M. O DLYZKO (1980), Long Repetitive Patterns in Random Sequences. Zeitschrift für
Wahrscheinlichkeitstheorie und verwandte Gebiete 53, 241–262. [205]
R ICHARD K. G UY (1975), How to factor a number. In Proceedings of the Fifth Manitoba Conference on
Numerical Mathematics, 49–89. [568, 569]
WALTER H ABICHT (1948), Eine Verallgemeinerung des Sturmschen Wurzelzählverfahrens. Commentarii
Mathematici Helvetici 21, 99–116. [199]
J. H ADAMARD (1893), Résolution d’une question relative aux déterminants. Bulletin des Sciences
Mathématiques 17, 240–246. [496]
J. H ADAMARD (1896), Sur la distribution des zéros de la fonction ζ(s) et ses conséquences arithmétiques.
Bulletin de la Société mathématique de France 24, 199–220. [533]
A RMIN H AKEN (1985), The intractability of resolution. Theoretical Computer Science 39, 297–308. [678]
PAUL R. H ALMOS (1985), I want to be a mathematician. Springer-Verlag. [533]
J OHN H. H ALTON (1970), A retrospective and prospective survey of the Monte Carlo method. SIAM
Review 12(1), 1–63. [198]
R ICHARD W. H AMMING (1986), Coding and Information Theory. Prentice-Hall, Inc., Englewood Cliffs NJ,
2nd edition. First edition 1980. [308]
G. H. H ARDY (1937), The Indian Mathematician Ramanujan. The American Mathematical Monthly 44,
137–155. Collected Papers, volume VII, Clarendon Press, Oxford, 1979, 612–630. [535]
G ODFREY H AROLD H ARDY (1940), A mathematician’s apology. Cambridge University Press,
Cambridge, UK. [26, 726, 728]
G. H. H ARDY and E. M. W RIGHT (1985), An introduction to the theory of numbers. Clarendon Press, Oxford,
5th edition. First edition 1938. [62, 421, 532, 534]
W ILLIAM H ART, M ARK VAN H OEIJ, and A NDREW N OVOCIN (2011), Practical Polynomial Factoring in
Polynomial Time. In Proceedings of the 2011 International Symposium on Symbolic and Algebraic
Computation ISSAC2011, San Jose CA, ed. A NTON L EYKIN, ACM Press, 163–170. [497]
ROBIN H ARTSHORNE (1977), Algebraic Geometry. Graduate Texts in Mathematics 52, Springer-Verlag,
New York. [568]
M. W. H ASKELL (1891/92), Note on resultants. Bulletin of the New York Mathematical Society 1, 223–224.
[332]
H ELMUT H ASSE (1933), Beweis des Analogons der Riemannschen Vermutung für die Artinschen und F. K.
Schmidtschen Kongruenzzetafunktionen in gewissen elliptischen Fällen. Vorläufige Mitteilung.
Nachrichten von der Gesellschaft der Wissenschaften zu Göttingen, Mathematisch-Physikalische
Klasse 42, 253–262. [568]
J OHAN H ÅSTAD and M ATS N ÄSLUND (1998), The Security of Individual RSA Bits. In Proceedings of the 39th
Annual IEEE Symposium on Foundations of Computer Science, Palo Alto CA, IEEE Computer Society
Press, Los Alamitos CA, 510–519. [580]
T IMOTHY F. H AVEL and I GOR NAJFELD (1995), A new system of equations, based on geometric algebra,
for the ring closure in cyclic molecules. In Computer Algebra in Science and Engineering, Bielefeld,
Germany, August 1994, eds. J. F LEISCHER, J. G RABMEIER, F. W. H EHL, and W. K ÜCHLIN, World
Scientific, Singapore, 243–259. [698]
P. H AZEBROEK and L. J. O OSTERHOFF (1951), The isomers of cyclohexane. Discussions of the Faraday
Society 10, 88–93. [698]
T HOMAS L. H EATH, ed. (1925), The thirteen books of Euclid’s elements, vol. 1. Dover Publications, Inc.,
New York, Second edition. First edition appeared 1908. Translated from the text of Heiberg. [24, 25]
M ICHAEL T. H EIDEMAN, D ON H. J OHNSON, and C. S IDNEY B URRUS (1984), Gauss and the history of the
Fast Fourier Transform. IEEE ASSP Magazine, 14–21. [247]
H. H EILBRONN (1968), On the average length of a class of finite continued fractions. In Abhandlungen aus
Zahlentheorie und Analysis. Zur Erinnerung an Edmund Landau (1877–1938), ed. PAUL T URÁN, 87–96.
VEB Deutscher Verlag der Wissenschaften, Berlin. Also in Number Theory and Analysis, a Collection of
Papers in Honor of Edmund Landau (1877–1938), Plenum Press, New York, 1969. [61]
J OOS H EINTZ, TOMAS R ECIO, and M ARIE -F RANÇOISE ROY (1991), Algorithms in Real Algebraic Geometry
and Applications to Computational Geometry. In Discrete and Computational Geometry: Papers from the
DIMACS Special Year, eds. JACOB E. G OODMAN, R ICHARD P OLLACK, and W ILLIAM S TEIGER.
DIMACS Series in Discrete Mathematics and Theoretical Computer Science 6, American Mathematical
Society and ACM, 137–163. [619]
J OOS H EINTZ and M ALTE S IEVEKING (1981), Absolute Primality of Polynomials is Decidable in Random
Polynomial Time in the Number of Variables. In Proceedings of the 8th International Colloquium on
Automata, Languages and Programming ICALP 1981, Acre (‘Akko), Israel. Lecture Notes in Computer
Science 115, Springer-Verlag, 16–27. [497]
P ETER A. H ENDRIKS and M ICHAEL F. S INGER (1999), Solving Difference Equations in Finite Terms. Journal
of Symbolic Computation 27, 239–259. [671]
K URT H ENSEL (1918), Eine neue Theorie der algebraischen Zahlen. Mathematische Zeitschrift 2, 433–452.
[444, 466]
G RETE H ERMANN (1926), Die Frage der endlich vielen Schritte in der Theorie der Polynomideale.
Mathematische Annalen 95, 736–788. [616]
C. H ERMITE (1872), Sur l’intégration des fractions rationnelles. Annales de Mathématiques, 2ème série 11,
145–148. [640]
N ICHOLAS J. H IGHAM (1990), Exploiting Fast Matrix Multiplication Within the Level 3 BLAS. ACM
Transactions on Mathematical Software 16(4), 352–368. [337]
DAVID H ILBERT (1890), Ueber die Theorie der algebraischen Formen. Mathematische Annalen 36, 473–534.
[586, 616, 618]
DAVID H ILBERT (1892), Ueber die Irreducibilität ganzer rationaler Functionen mit ganzzahligen Coefficienten.
Journal für die reine und angewandte Mathematik 110, 104–129. [495, 586]
DAVID H ILBERT (1893), Ueber die Transcendenz der Zahlen e und π. Mathematische Annalen 43, 216–219.
Nachrichten von der Königlichen Gesellschaft der Wissenschaften und der Georg-Augusts-Universität zu
Göttingen 2 (1893), 113–116. Reprinted in Berggren, Borwein & Borwein (1997), 226–229. [90]
DAVID H ILBERT (1900), Mathematische Probleme. Nachrichten von der Königlichen Gesellschaft der
Wissenschaften zu Göttingen, 253–297. Archiv für Mathematik und Physik, 3. Reihe 1 (1901), 44–63 and
213–237. English translation: Mathematical Problems, Bulletin of the American Mathematical Society 8
(1902), 437–479. [587, 726]
DAVID H ILBERT (1930), Probleme der Grundlegung der Mathematik. Mathematische Annalen 102, 1–9. [419]
H EISUKE H IRONAKA (1964), Resolution of singularities of an algebraic variety over a field of characteristic
zero. Annals of Mathematics 79(1), I: 109–203, II: 205–326. [591]
A. H OCQUENGHEM (1959), Codes correcteurs d’erreurs. Chiffres 2, 147–156. [215]
M ARK VAN H OEIJ (1998), Rational Solutions of Linear Difference Equations. In Proceedings of the 1998
International Symposium on Symbolic and Algebraic Computation ISSAC ’98, Rostock, Germany, ed.
O LIVER G LOOR, ACM Press, 120–123. [671]
M ARK VAN H OEIJ (1999), Finite singularities and hypergeometric solutions of linear recurrence equations.
Journal of Pure and Applied Algebra 139, 109–131. [671]
M ARK VAN H OEIJ (2002), Factoring polynomials and the knapsack problem. Journal of Number Theory 96(2),
167–189. [497]
J ORIS VAN DER H OEVEN (1997), Lazy Multiplication of Formal Power Series. In Proceedings of the 1997
International Symposium on Symbolic and Algebraic Computation ISSAC ’97, Maui HI, ed.
W OLFGANG W. K ÜCHLIN, ACM Press, 17–20. [469]
C. M. H OFFMAN, J. R. S ENDRA, and F. W INKLER, eds. (1997), Parametric Algebraic Curves and
Applications. Special Issue of the Journal of Symbolic Computation 23(2/3). [618]
D. G. H OFFMAN, D. A. L EONARD, C. C. L INDNER, K. T. P HELPS, C. A. RODGER, and J. R. WALL (1991),
Coding Theory: The Essentials. Marcel Dekker, Inc., New York. [215]
E LLIS H OROWITZ (1971), Algorithms for partial fraction decomposition and rational function integration.
In Proceedings 2nd ACM Symposium on Symbolic and Algebraic Manipulation, Los Angeles CA, ed.
S. R. P ETRICK, ACM Press, 441–457. [627]
E LLIS H OROWITZ (1972), A fast method for interpolation using preconditioning. Information Processing
Letters 1, 157–163. [306]
J EREMY H ORWITZ and R AMARATHNAM V ENKATESAN (2002), Random Cayley Digraphs and the Discrete
Logarithm. In Algorithmic Number Theory Symposium V, ANTS-V, eds. C LAUS F IEKER and DAVID R.
KOHEL. Lecture Notes in Computer Science 2369, Springer-Verlag, 100–114. [567]
M ING -D EH A. H UANG (1985), Riemann Hypothesis and Finding Roots over Finite Fields. In Proceedings of
the Seventeenth Annual ACM Symposium on Theory of Computing, Providence RI, ACM Press,
121–130. [421]
M ING -D EH H UANG and Y IU -C HUNG W ONG (1998), Extended Hilbert Irreducibility and its Applications.
In Proceedings of the 9th Annual ACM-SIAM Symposium on Discrete Algorithms SODA ’98, 50–58.
[498]
X IAOHAN H UANG and V ICTOR Y. PAN (1998), Fast Rectangular Matrix Multiplication and Applications.
Journal of Complexity 14, 257–299. [353, 405, 420]
DAVID A. H UFFMAN (1952), A Method for the Construction of Minimum-Redundancy Codes. Proceedings of
the I.R.E. 40(9), 1098–1101. [307, 368]
C HRISTIANUS H UGENIUS [C HRISTIAAN H UYGENS ] (1703), Descriptio Automati Planetarii. In Opuscula
postuma, quae continent: Dioptricam. Commentarios de vitris figurandis. Dissertationem de corona &
parheliis. Tractatum de motu/de vi centrifuga. Descriptionem automati planetarii. Cornelius Boutesteyn,
Leyden. [89]
T HOMAS W. H UNGERFORD (1990), Abstract Algebra: An Introduction. Saunders College Publishing,
Philadelphia PA. [703]
A. H URWITZ (1891), Ueber die angenäherte Darstellung der Irrationalzahlen durch rationale Brüche.
Mathematische Annalen 39, 279–284. [90]
D UNG T. H UYNH (1986), A Superexponential Lower Bound for Gröbner Bases and Church-Rosser
Commutative Thue Systems. Information and Control 68(1–3), 196–206. [618]
C. G. J. JACOBI (1836), De eliminatione variabilis e duabus aequationibus algebraicis. Journal für die reine und
angewandte Mathematik 15, 101–124. [197]
C. G. J. JACOBI (1846), Über die Darstellung einer Reihe gegebner Werthe durch eine gebrochne rationale
Function. Journal für die reine und angewandte Mathematik 30, 127–156. [132, 197]
C. G. J. JACOBI (1868), Allgemeine Theorie der kettenbruchähnlichen Algorithmen, in welchen jede Zahl aus
drei vorhergehenden gebildet wird. Journal für die reine und angewandte Mathematik 69, 29–64. [91]
T UDOR J EBELEAN (1997), Practical Integer Division with Karatsuba Complexity. In Proceedings of the 1997
International Symposium on Symbolic and Algebraic Computation ISSAC ’97, Maui HI, ed.
W OLFGANG W. K ÜCHLIN, ACM Press, 339–341. [286]
DAVID S. J OHNSON (1990), A Catalog of Complexity Classes. In Handbook of Theoretical Computer Science,
vol. A, ed. J. VAN L EEUWEN, 67–161. Elsevier Science Publishers B.V., Amsterdam, and The MIT Press,
Cambridge MA. [724]
W ILLIAM J ONES (1706), Synopsis Palmariorum Matheseos: or, a New Introduction to the Mathematics,
London. [90]
C HARLES J ORDAN (1965), Calculus of finite differences. Chelsea Publishing Company, New York. First
edition Röttig and Romwalter, Sopron, Hungary, 1939. [669]
N ORBERT K AJLER and N EIL S OIFFER (1998), A Survey of User Interfaces for Computer Algebra Systems.
Journal of Symbolic Computation 25, 127–159. [21]
K. K ALORKOTI (1993), Inverting polynomials and formal power series. SIAM Journal on Computing 22(3),
552–559. [286]
E. K ALTOFEN (1982), Factorization of Polynomials. In Computer Algebra, Symbolic and Algebraic
Computation, eds. B. B UCHBERGER, G. E. C OLLINS, and R. L OOS, 95–113. Springer-Verlag,
New York, 2nd edition. [419]
E RICH K ALTOFEN (1983), On the Complexity of Finding Short Vectors in Integer Lattices. In Proceedings of
EUROCAL 1983, London, UK. Lecture Notes in Computer Science 162, Springer-Verlag, Berlin /
New York, 236–244. [497]
E RICH K ALTOFEN (1984), A Note on the Risch Differential Equation. In Proceedings of EUROSAM ’84,
Cambridge, UK, ed. J OHN F ITCH. Lecture Notes in Computer Science 174, Springer-Verlag, Berlin,
359–366. [641]
E RICH K ALTOFEN (1985a), Polynomial-time reductions from multivariate to bi- and univariate integral
polynomial factorization. SIAM Journal on Computing 14(2), 469–489. [497]
E RICH K ALTOFEN (1985b), Effective Hilbert Irreducibility. Journal of Computer and System Sciences 66,
123–137. [498]
E. K ALTOFEN (1989), Factorization of Polynomials Given by Straight-Line Programs. In Randomness and
Computation, ed. S. M ICALI, JAI Press, Greenwich CT, 375–412. [495, 497]
E. K ALTOFEN (1990), Polynomial factorization 1982–1986. In Computers in Mathematics, eds. D. V.
C HUDNOVSKY and R. D. J ENKS, Marcel Dekker, Inc., New York, 285–309. [419]
E. K ALTOFEN (1992), Polynomial Factorization 1987–1991. In Proceedings of LATIN ’92, São Paulo, Brazil,
ed. I. S IMON. Lecture Notes in Computer Science 583, Springer-Verlag, 294–313. [419]
E RICH K ALTOFEN (1995a), Effective Noether Irreducibility Forms and Applications. Journal of Computer and
System Sciences 50(2), 274–295. [498]
E RICH K ALTOFEN (1995b), Analysis of Coppersmith’s block Wiedemann algorithm for the parallel solution of
sparse linear systems. Mathematics of Computation 64(210), 777–806. [353]
E RICH K ALTOFEN (2000), Challenges of Symbolic Computation: My Favourite Open Problems. Journal of
Symbolic Computation 29(6), 891–919. With an Additional Open Problem By ROBERT M. C ORLESS
and DAVID J. J EFFREY. [353]
E RICH K ALTOFEN and L AKSHMAN YAGATI (1988), Improved Sparse Multivariate Polynomial Interpolation
Algorithms. In Proceedings of the 1988 International Symposium on Symbolic and Algebraic
Computation ISSAC ’88, Rome, Italy, ed. P. G IANNI. Lecture Notes in Computer Science 358,
Springer-Verlag, 467–474. [498]
E. K ALTOFEN and A. L OBO (1994), Factoring High-Degree Polynomials by the Black Box Berlekamp
Algorithm. In Proceedings of the 1994 International Symposium on Symbolic and Algebraic Computation
ISSAC ’94, Oxford, UK, eds. J. VON ZUR G ATHEN and M. G IESBRECHT, ACM Press, 90–98. [404, 405]
E RICH K ALTOFEN, DAVID R. M USSER, and B. DAVID S AUNDERS (1983), A generalized class of polynomials
that are hard to factor. SIAM Journal on Computing 12(3), 473–483. [465]
E RICH K ALTOFEN and H EINRICH ROLLETSCHEK (1989), Computing greatest common divisors and
factorizations in quadratic number fields. Mathematics of Computation 53(188), 697–720. [132]
E RICH K ALTOFEN and B. DAVID S AUNDERS (1991), On Wiedemann’s Method of Solving Sparse Linear
Systems. In Algebraic Algorithms and Error-Correcting Codes: AAECC-10, San Juan de Puerto Rico.
Lecture Notes in Computer Science 539, Springer-Verlag, 29–38. [340, 351, 404]
E RICH K ALTOFEN and V ICTOR S HOUP (1997), Fast Polynomial Factorization Over High Algebraic Extensions
of Finite Fields. In Proceedings of the 1997 International Symposium on Symbolic and Algebraic
Computation ISSAC ’97, Maui HI, ed. W OLFGANG W. K ÜCHLIN, ACM Press, 184–188. [420]
E RICH K ALTOFEN and V ICTOR S HOUP (1998), Subquadratic-Time Factoring of Polynomials over Finite
Fields. Mathematics of Computation 67(223), 1179–1197. Extended Abstract in Proceedings of the
Twenty-seventh Annual ACM Symposium on the Theory of Computing, Las Vegas NV, ACM Press,
1995, 398–406. [401, 405, 406, 420]
E RICH K ALTOFEN and BARRY M. T RAGER (1990), Computing with Polynomials Given By Black Boxes for
Their Evaluations: Greatest Common Divisors, Factorization, Separation of Numerators and
Denominators. Journal of Symbolic Computation 9, 301–320. [496, 498]
M ICHAEL K AMINSKI, DAVID G. K IRKPATRICK, and NADER H. B SHOUTY (1988), Addition Requirements for
Matrix and Transposed Matrix Products. Journal of Algorithms 9, 354–364. [353]
YASUMASA K ANADA (1988), Vectorization of Multiple-Precision Arithmetic Program and 201,326,000
Decimal Digits of π Calculation. In Supercomputing ’88, Volume II: Science and Applications, 117–128.
Reprinted in Berggren, Borwein & Borwein (1997), 576–587. [247]
R AVI K ANNAN (1987), Algorithmic geometry of numbers. Annual Review of Computer Science 2, 231–267.
[496]
A. A. K ARATSUBA (1995), The Complexity of Computations. Proceedings of the Steklov Institute of
Mathematics 211, 169–183. Translated from Trudy Matematicheskogo Instituta imeni
V. A. Steklova 211 (1995), 186–202. [247]
A. K ARATSUBA and Y U . O FMAN (1962), Umnozhenie mnogoznachnykh chisel na avtomatakh. Doklady
Akademii Nauk SSSR 145, 293–294. English translation: Multiplication of multidigit numbers on automata,
Soviet Physics–Doklady 7 (1963), 595–596. [223, 245]
A LAN H. K ARP and P ETER M ARKSTEIN (1997), High-Precision Division and Square Root. ACM
Transactions on Mathematical Software 23(4), 561–589. [286]
R ICHARD M. K ARP (1972), Reducibility among combinatorial problems. In Complexity of computer
computations, eds. R AYMOND E. M ILLER and JAMES W. T HATCHER, 85–103. Plenum Press, New York.
[509, 722]
M ICHAEL K ARR (1981), Summation in Finite Terms. Journal of the ACM 28(2), 305–350. [671]
M ICHAEL K ARR (1985), Theory of Summation in Finite Terms. Journal of Symbolic Computation 1, 303–315.
[671]
K IRAN S. K EDLAYA and C HRISTOPHER U MANS (2008), Fast modular composition in any characteristic.
In Proceedings of the 49th Annual IEEE Symposium on Foundations of Computer Science,
Philadelphia, PA, IEEE Computer Society Press, 146–155. [751]
K IRAN S. K EDLAYA and C HRISTOPHER U MANS (2009), Fast polynomial factorization and modular
composition. Merged work of Kedlaya & Umans (2008) and Umans (2008). SIAM Journal on Computing,
to appear. Conference version in Proceedings of the 49th Annual IEEE Symposium on Foundations of
Computer Science, Philadelphia, PA, 481–490. IEEE Computer Society Press. [339, 405, 406, 408, 420]
WALTER K ELLER -G EHRIG (1985), Fast algorithms for the characteristic polynomial. Theoretical Computer
Science 36, 309–317. [352]
H. K EMPFERT (1969), On the Factorization of Polynomials. Journal of Number Theory 1, 116–120. [417, 466]
T HORSTEN K LEINJUNG, K AZUMARO AOKI, J ENS F RANKE, A RJEN K. L ENSTRA, E MMANUEL T HOMÉ,
J OPPE W. B OS, P IERRICK G AUDRY, A LEXANDER K RUPPA, P ETER L. M ONTGOMERY, DAG A RNE
O SVIK, H ERMAN TE R IELE, A NDREY T IMOFEEV, and PAUL Z IMMERMANN (2010), Factorization of a
768-Bit RSA Modulus. In Advances in Cryptology: Proceedings of CRYPTO ’10, Santa Barbara, CA,
ed. TAL R ABIN. Lecture Notes in Computer Science 6223, Springer-Verlag, Berlin, Heidelberg, New
York, 333–350. [542]
A RNOLD K NOPFMACHER (1995), Enumerating basic properties of polynomials over a finite field. South
African Journal of Science 91, 10–11. [419]
A RNOLD K NOPFMACHER and J OHN K NOPFMACHER (1993), Counting irreducible factors of polynomials over
a finite field. Discrete Mathematics 112, 103–118. [419]
A RNOLD K NOPFMACHER and R ICHARD WARLIMONT (1995), Distinct degree factorizations for polynomials
over a finite field. Transactions of the American Mathematical Society 347(6), 2235–2243. [419]
D ONALD E. K NUTH (1970), The analysis of algorithms. In Proceedings of the International Congress of
Mathematicians 1970, Nice, France, vol. 3, 269–274. [332, 724]
D ONALD E. K NUTH (1993), Johann Faulhaber and sums of powers. Mathematics of Computation 61(203),
277–294. [670]
D ONALD E. K NUTH (1997), The Art of Computer Programming, vol. 1, Fundamental Algorithms.
Addison-Wesley, Reading MA, 3rd edition. First edition 1969. [308]
D ONALD E. K NUTH (1998), The Art of Computer Programming, vol. 2, Seminumerical Algorithms.
Addison-Wesley, Reading MA, 3rd edition. First edition 1969.
[25, 40, 61, 62, 88, 90, 247, 286, 417, 505, 531, 567]
D ONALD E. K NUTH and L UIS T RABB PARDO (1976), Analysis of a simple factorization algorithm.
Theoretical Computer Science 3(3), 321–348. [567]
N EAL KOBLITZ (1987a), A Course in Number Theory and Cryptography. Graduate Texts in Mathematics 114,
Springer-Verlag, New York. [531, 568]
N EAL KOBLITZ (1987b), Elliptic Curve Cryptosystems. Mathematics of Computation 48(177), 203–209. [580]
H ELGE VON KOCH (1904), Sur une courbe continue sans tangente obtenue par une construction géométrique
élémentaire. Arkiv för matematik, astronomi och fysik 1, 681–702. [287]
W OLFRAM KOEPF (1995), Algorithms for m-fold Hypergeometric Summation. Journal of Symbolic
Computation 20, 399–417. [671]
W OLFRAM KOEPF (1998), Hypergeometric Summation. Advanced Lectures in Mathematics, Friedrich Vieweg
& Sohn, Braunschweig / Wiesbaden. [670, 697]
J ÁNOS KOLLÁR (1988), Sharp effective Nullstellensatz. Journal of the American Mathematical Society 1(4),
963–975. [618]
A LWIN KORSELT (1899), Problème chinois. L’Intermédiaire des Mathématiciens 6, p. 143. [532]
H ENRIK KOY and C LAUS P ETER S CHNORR (2001a), Segment LLL-Reduction of Lattice Bases.
In Cryptography and Lattices, International Conference (CaLC 2001), Providence RI, ed. J OSEPH H.
S ILVERMAN. Lecture Notes in Computer Science 2146, Springer-Verlag, 67–80. [497]
H ENRIK KOY and C LAUS P ETER S CHNORR (2001b), Segment LLL-Reduction with Floating Point
Orthogonalization. In Cryptography and Lattices, International Conference (CaLC 2001), Providence RI,
ed. J OSEPH H. S ILVERMAN. Lecture Notes in Computer Science 2146, Springer-Verlag, 81–96. [497]
D EXTER KOZEN and S USAN L ANDAU (1986), Polynomial Decomposition Algorithms. Technical Report
86-773, Department of Computer Science, Cornell University, Ithaca NY. [752]
D EXTER KOZEN and S USAN L ANDAU (1989), Polynomial Decomposition Algorithms. Journal of Symbolic
Computation 7, 445–456. An earlier version was published as Kozen & Landau (1986). [576, 581]
L EON G. K RAFT , J R . (1949), A Device for Quantizing, Grouping, and Coding Amplitude Modulated Pulses.
M.Sc. thesis, Electrical Engineering Department, M.I.T. [307]
M. K RAÏTCHIK (1926), Théorie des Nombres, vol. II. Gauthier-Villars, Paris. [567, 727, 728]
J. K RAJÍ ČEK (1995), Bounded arithmetic, propositional logic and complexity theory. Encyclopedia of
Mathematics and its Applications 60, Cambridge University Press, Cambridge, UK. [697]
L. K RONECKER (1873), Die verschiedenen Sturm’schen Reihen und ihre gegenseitigen Beziehungen.
Monatsberichte der Königlich Preussischen Akademie der Wissenschaften, Berlin, 117–154. [197]
L. K RONECKER (1878), Über Sturm’sche Functionen. Monatsberichte der Königlich Preussischen Akademie
der Wissenschaften, Berlin, 95–121. Werke, Zweiter Band, ed. K. H ENSEL, Leipzig, 1897, 37–70.
Reprint by Chelsea Publishing Co., New York, 1968. [197]
L. K RONECKER (1881a), Zur Theorie der Elimination einer Variabeln aus zwei algebraischen Gleichungen.
Monatsberichte der Königlich Preussischen Akademie der Wissenschaften, Berlin, 535–600. Werke,
Zweiter Band, ed. K. H ENSEL, Leipzig, 1897, 113–192. Reprint by Chelsea Publishing Co., New York,
1968. [132, 137, 353]
L. K RONECKER (1881b), Auszug aus einem Briefe des Herrn Kronecker an E. Schering. Nachrichten der
Akademie der Wissenschaften, Göttingen, 271–279. [197]
L. K RONECKER (1882), Grundzüge einer arithmetischen Theorie der algebraischen Grössen. Journal für die
reine und angewandte Mathematik 92, 1–122. Werke, Zweiter Band, ed. K. H ENSEL, Leipzig, 1897,
237–387. Reprint by Chelsea Publishing Co., New York, 1968. [247, 465]
L EOPOLD K RONECKER (1883), Die Zerlegung der ganzen Grössen eines natürlichen Rationalitäts-Bereichs in
ihre irreductibeln Factoren. Journal für die reine und angewandte Mathematik 94, 344–348. Werke,
Zweiter Band, ed. K. H ENSEL, Leipzig, 1897, 409–416. Reprint by Chelsea Publishing Co., New York,
1968. [465]
A. N. K RYLOV (1931), O chislennom reshenii uravneniya, kotorym v tekhnicheskikh voprosakh
opredelyayutsya chastoty malykh kolebanii material’nykh sistem (On numerical solutions which determine
the frequencies of small oscillations of material systems in technical problems). Izvestiya Akademii Nauk
SSSR, Otdelenie Matematicheskikh i estestvennykh nauk (Bulletin de l’académie des sciences de l’URSS,
Classe des sciences mathématiques et naturelles) 4, 491–539. [353]
Y. H. K U and X IAOGUANG S UN (1992), The Chinese Remainder Theorem. Journal of the Franklin
Institute 329, 93–97. [131]
K LAUS K ÜHNLE and E RNST W. M AYR (1996), Exponential Space Computation of Gröbner Bases.
In Proceedings of the 1996 International Symposium on Symbolic and Algebraic Computation
ISSAC ’96, Zürich, Switzerland, ed. L AKSHMAN Y. N., ACM Press, 63–71. [616, 617]
H. T. K UNG (1974), On Computing Reciprocals of Power Series. Numerische Mathematik 22, 341–348. [286]
J. C. L AFON (1983), Summation in Finite Terms. In Computer Algebra, Symbolic and Algebraic Computation,
eds. B. B UCHBERGER, G. E. C OLLINS, and R. L OOS, 71–77. Springer-Verlag, New York, 2nd edition.
[671]
J. C. L AGARIAS (1982a), Best simultaneous Diophantine approximations. I. Growth rates of best approximation
denominators. Transactions of the American Mathematical Society 272(2), 545–554. [509]
J. C. L AGARIAS (1982b), Best simultaneous Diophantine approximations. II. Behavior of consecutive best
approximations. Pacific Journal of Mathematics 102(1), 61–88. [509]
J. C. L AGARIAS (1985), The computational complexity of simultaneous Diophantine approximation problems.
SIAM Journal on Computing 14(1), 196–209. [506, 509]
J. C. L AGARIAS (1990), Pseudorandom Number Generators in Cryptography and Number Theory.
In Cryptology and Computational Number Theory, ed. C ARL P OMERANCE. Proceedings of Symposia in
Applied Mathematics 42, American Mathematical Society, 115–143. [509, 580]
J. C. L AGARIAS and A. M. O DLYZKO (1977), Effective Versions of the Chebotarev Density Theorem.
In Algebraic Number Fields, ed. A. F RÖHLICH, 409–464. Academic Press, London. [443]
J. C. L AGARIAS and A. M. O DLYZKO (1985), Solving Low-Density Subset Sum Problems. Journal of
the ACM 32(1), 229–246. [509]
J OSEPH L OUIS DE L AGRANGE (1759), Recherches sur la méthode de maximis et minimis. Miscellanea
Taurinensia 1. Œuvres, publiées par J.-A. S ERRET, vol. 1, 1867, Gauthier-Villars, Paris, 1–20. [131]
J OSEPH L OUIS DE L AGRANGE (1769), Sur la résolution des équations numériques. Mémoires de l’Académie
des Sciences et Belles-Lettres de Berlin 23. Œuvres, publiées par J.-A. S ERRET, vol. 2, 1868,
Gauthier-Villars, Paris, 539–578. [419]
J OSEPH L OUIS DE L AGRANGE (1770a), Additions au mémoire sur la résolution des équations numériques.
Mémoires de l’Académie des Sciences et Belles-Lettres de Berlin 24. Œuvres, publiées par J.-A.
S ERRET, vol. 2, 1868, Gauthier-Villars, Paris, 581–652. [90]
J OSEPH L OUIS DE L AGRANGE (1770b), Nouvelle méthode pour résoudre les problèmes indéterminés en
nombres entiers. Mémoires de l’Académie des Sciences et Belles-Lettres de Berlin 24. Œuvres, publiées
par J.-A. S ERRET, vol. 2, 1868, Gauthier-Villars, Paris, 655–726. [131]
J OSEPH L OUIS DE L AGRANGE (1795), Sur l’usage des courbes dans la solution des Problèmes. In Leçons
élémentaires sur les mathématiques, Leçon cinquième. École Polytechnique, Paris. Œuvres, publiées par
J.-A. S ERRET, vol. 7, 1877, Gauthier-Villars, Paris, 271–287. [131, 728]
J OSEPH L OUIS DE L AGRANGE (1798), Additions aux éléments d’algèbre d’Euler. Analyse indéterminée.
In L EONHARD E ULER, Éléments d’algèbre, St. Petersburg. Œuvres, publiées par J.-A. S ERRET, vol. 7,
1877, Gauthier-Villars, Paris, 5–180. [90, 91]
L AKSHMAN Y. N. (1990), On the Complexity of Computing a Gröbner Basis for the Radical of a Zero
Dimensional Ideal. In Proceedings of the Twenty-second Annual ACM Symposium on Theory of
Computing, Baltimore MD, ACM Press, 555–563. [618]
B. A. L A M ACCHIA and A. M. O DLYZKO (1990), Solving large sparse linear systems over finite fields.
In Advances in Cryptology: Proceedings of CRYPTO ’90, Santa Barbara, CA. Lecture Notes in Computer
Science 537, Springer-Verlag, Berlin and New York, 109–133. [353]
L ARRY A. L AMBE, ed. (1997), Special Issue on Applications of Symbolic Computation to Research and
Education. Journal of Symbolic Computation 23(5/6). [21]
L AMBERT (1761), Mémoire sur quelques propriétés remarquables des quantités transcendentes circulaires et
logarithmiques. Histoire de l’Académie Royale des Sciences et des Belles-Lettres de Berlin 17, 265–322.
Reprint of pages 265–276 in Berggren, Borwein & Borwein (1997), 129–140. [82]
G ABRIEL L AMÉ (1844), Note sur la limite du nombre des divisions dans la recherche du plus grand commun
diviseur entre deux nombres entiers. Comptes Rendus de l’Académie des Sciences Paris 19, 867–870.
[61]
C. L ANCZOS (1952), Solutions of systems of linear equations by minimized iterations. Journal of Research of
the National Bureau of Standards 49, 33–53. [353]
E. L ANDAU (1905), Sur quelques théorèmes de M. Petrovitch relatifs aux zéros des fonctions analytiques.
Bulletin de la Société Mathématique de France 33, 251–261. [165]
F. L ANDRY (1880), Note sur la décomposition du nombre 2⁶⁴ + 1 (Extrait). Comptes Rendus de l’Académie des
Sciences Paris 91, p. 138. [542]
S ERGE L ANG (1983), Fundamentals of Diophantine Geometry. Springer-Verlag, New York. [498]
TANJA L ANGE and A RNE W INTERHOF (2000), Factoring polynomials over arbitrary finite fields. Theoretical
Computer Science 234, 301–308. [421]
DE LA P LACE (1772), Recherches sur le calcul intégral et sur le système du monde. Mémoires de l’Académie
Royale des Sciences II. Œuvres complètes de Laplace, vol. 8, Gauthier-Villars, Paris, 1891, 367–501.
[724]
DANIEL L AUER (2000), Effiziente Algorithmen zur Berechnung von Resultanten und Subresultanten. Berichte
aus der Informatik, Shaker Verlag, Aachen. PhD thesis, University of Bonn, Germany. [198, 466]
D. L AZARD and R. R IOBOO (1990), Integration of Rational Functions: Rational Computation of the
Logarithmic Part. Journal of Symbolic Computation 9, 113–115. [640]
V.-A. L EBESGUE (1847), Sur le symbole (a/b) et quelques-unes de ses applications. Journal de Mathématiques
Pures et Appliquées 12, 497–517. [533]
A. M. L EGENDRE (1785), Recherches d’analyse indéterminée. Mémoires de l’Académie Royale des Sciences,
465–559. [198, 418, 420, 466, 468, 569]
A. M. LE G ENDRE (1798, An VI), Essai sur la théorie des nombres. Duprat, Paris. [418, 533, 728]
D. J. L EHMANN (1982), On primality tests. SIAM Journal on Computing 11, 374–375. [537]
D. H. L EHMER (1930), An extended theory of Lucas’ functions. Annals of Mathematics, Series II 31, 419–448.
[530]
D. H. L EHMER (1935), On Lucas’s test for the primality of Mersenne’s numbers. Journal of the London
Mathematical Society 10, 162–165. [530]
D. H. L EHMER (1938), Euclid’s algorithm for large numbers. The American Mathematical Monthly 45,
227–233. [332]
D. H. L EHMER and R. E. P OWERS (1931), On factoring large numbers. Bulletin of the American Mathematical
Society 37, 770–776. [569]
G OTTFRIED W ILHELM L EIBNIZ (1683), Draft letter to Tschirnhaus. In Der Briefwechsel von Gottfried
Wilhelm Leibniz mit Mathematikern, Erster Band, ed. C. I. G ERHARDT, 446–450. Mayer & Müller,
Berlin, 1899. Reprinted by Georg Olms Verlag, Hildesheim, 1987. [197]
G OTTFRIED W ILHELM L EIBNIZ (1697), Nova algebrae promotio. Undated manuscript, c. 1697.
In Mathematische Schriften, vol. 7, ed. C. I. G ERHARDT, 154–189. Halle, 1863. In: Gesammelte Werke
aus den Handschriften der Königlichen Bibliothek zu Hannover, Band VII, Kapitel XV, reprinted by
Georg Olms Verlag, Hildesheim, 1971. [88]
G OTTFRIED W ILHELM L EIBNIZ (1701), Initia mathematica. De ratione et proportione. Undated manuscript,
c. 1701. In Mathematische Schriften, vol. 7, ed. C. I. G ERHARDT, 1863, 40–49. Reprinted by Georg
Olms Verlag, Hildesheim, 1971. [89]
G OTHOFREDUS W ILHELMUS L EIBNITZ [G OTTFRIED W ILHELM L EIBNIZ ] (1703), Continuatio analyseos
quadraturarum rationalium. Acta eruditorum, 19–26. [640]
F RANZ L EMMERMEYER (1995), The Euclidean algorithm in algebraic number fields. Expositiones
Mathematicae 13, 385–416. [724]
A RJEN K. L ENSTRA (1984), Factoring Polynomials over Algebraic Number Fields. In Proceedings of the 11th
International Symposium Mathematical Foundations of Computer Science 1984, Praha, Czechoslovakia.
Lecture Notes in Computer Science 176, 389–396. [465]
A RJEN K. L ENSTRA (1987), Factoring multivariate polynomials over algebraic number fields. SIAM Journal
on Computing 16, 591–598. [465]
A RJEN K. L ENSTRA (1990), Primality Testing. In Cryptology and Computational Number Theory, ed. C ARL
P OMERANCE. Proceedings of Symposia in Applied Mathematics 42, American Mathematical Society,
13–25. [531]
A RJEN K. L ENSTRA and H ENDRIK W. L ENSTRA , J R . (1990), Algorithms in Number Theory. In Handbook of
Theoretical Computer Science, vol. A, ed. J. VAN L EEUWEN, 673–715. Elsevier Science Publishers B.V.,
Amsterdam, and The MIT Press, Cambridge MA. [531]
A RJEN K. L ENSTRA and H ENDRIK W. L ENSTRA , J R ., eds. (1993), The development of the number field sieve.
Lecture Notes in Mathematics 1554, Springer-Verlag, Berlin. [569]
A. K. L ENSTRA, H. W. L ENSTRA , J R ., and L. L OVÁSZ (1982), Factoring Polynomials with Rational
Coefficients. Mathematische Annalen 261, 515–534. [474, 497, 506]
A RJEN K. L ENSTRA, H ENDRIK W. L ENSTRA , J R ., M. S. M ANASSE, and J. M. P OLLARD (1990),
The number field sieve. In Proceedings of the Twenty-second Annual ACM Symposium on Theory of
Computing, Baltimore MD, ACM Press, 564–572. [569]
É DOUARD L UCAS (1878), Théorie des fonctions numériques simplement périodiques. American Journal of
Mathematics 1, I: 184–240, II: 289–321. [530]
PAUL L UCKEY (1951), Die Rechenkunst bei Ǧamšı̄d b. Masʿūd al-Kāšı̄. Abhandlungen für die Kunde des
Morgenlandes, XXXI,1, Kommissionsverlag Franz Steiner GmbH, Wiesbaden. Herausgegeben von der
Deutschen Morgenländischen Gesellschaft. [725]
P. L UCKEY (1953), Der Lehrbrief über den Kreisumfang (Ar-risāla al-muḥīṭīya) von Ǧamšı̄d B. Masʿūd
Al-Kāšı̄. Abhandlungen der Deutschen Akademie der Wissenschaften zu Berlin, Klasse für Mathematik
und allgemeine Naturwissenschaften 6, Akademie-Verlag, Berlin. [90]
J. VAN DE L UNE, H. J. J. TE R IELE, and D. T. W INTER (1986), On the Zeros of the Riemann Zeta Function in
the Critical Strip. IV. Mathematics of Computation 46(174), 667–681. [533]
K EJU M A and J OACHIM VON ZUR G ATHEN (1990), Analysis of Euclidean Algorithms for Polynomials over
Finite Fields. Journal of Symbolic Computation 9, 429–455. [62]
F. S. M ACAULAY (1902), Some formulæ in elimination. Proceedings of the London Mathematical Society 35,
3–27. [197, 619]
F. S. M ACAULAY (1916), The algebraic theory of modular systems. Cambridge University Press,
Cambridge, UK. Reissued 1994. [197, 619, 728, 729]
F. S. M ACAULAY (1922), Note on the resultant of a number of polynomials of the same degree. Proceedings of
the London Mathematical Society, Second Series 21, 14–21. [197, 619]
D. M ACK (1975), On rational integration. Technical Report UCP-38, Department of Computer Science,
University of Utah. [642]
C OLIN M ACLAURIN (1742), A treatise of fluxions. 2 volumes, Edinburgh. 2nd ed., London, 1801; French
translation Paris, 1749. [286]
F. J. M AC W ILLIAMS and N. J. A. S LOANE (1977), The Theory of Error-Correcting Codes. Mathematical
Library 16, North-Holland, Amsterdam. [215]
D IETRICH M AHNKE (1912/13), Leibniz auf der Suche nach einer allgemeinen Primzahlgleichung. Bibliotheca
Mathematica, Serie 3, 13, 29–61. [88, 531]
Y IU -K WONG M AN (1993), On Computing Closed Forms for Indefinite Summations. Journal of Symbolic
Computation 16, 355–376. [671]
B ENOÎT B. M ANDELBROT (1977), The fractal geometry of nature. Freeman. [278]
Y U . I. M ANIN (1956), O sravneniyakh tret’ei stepeni po prostomu modulyu. Izvestiya Akademii Nauk SSSR,
Seriya Matematicheskaya 20, 673–678. English translation: On cubic congruences to a prime modulus,
American Mathematical Society Translations, Series 2, 13 (1960), 1–7. [568]
J. L. M ASSEY (1965), Step by step decoding of the Bose-Chaudhuri-Hocquenghem codes. IEEE Transactions
on Information Theory IT-11, 580–585. [215]
Y U . V. M ATIYASEVICH (1970), Diofantovost’ perechislimykh mnozhestv. Doklady Akademii Nauk
SSSR 191(2), 279–282. English translation: Enumerable sets are Diophantine, Soviet Mathematics
Doklady 11(2), 354–358. [89]
Y URI V. M ATIYASEVICH (1993), Desyataya problema Gil’berta. Nauka, Moscow. English translation: Hilbert’s
Tenth Problem, Foundations of Computing Series, The MIT Press, Cambridge MA, 1993. [89, 640]
U ELI M. M AURER and S TEFAN W OLF (1999), The relationship between breaking the Diffie-Hellman protocol
and computing discrete logarithms. SIAM Journal on Computing 28(5), 1689–1721. [580]
E RNST W. M AYR (1984), An algorithm for the general Petri net reachability problem. SIAM Journal on
Computing 13(3), 441–460. [697]
E RNST M AYR (1989), Membership in Polynomial Ideals over Q Is Exponential Space Complete.
In Proceedings of the 6th Annual Symposium on Theoretical Aspects of Computer Science STACS ’89,
Paderborn, Germany, eds. B. M ONIEN and R. C ORI. Lecture Notes in Computer Science 349,
Springer-Verlag, 400–406. [616]
E RNST W. M AYR (1992), Polynomial ideals and applications. Mitteilungen der Mathematischen Gesellschaft in
Hamburg 12(4), 1207–1215. Festschrift zum 300jährigen Bestehen der Gesellschaft. [616, 697]
E RNST W. M AYR (1995), On Polynomial Ideals, Their Complexity, and Applications. In Proceedings of the
10th International Conference on Fundamentals of Computation Theory FCT ’95, Dresden, Germany, ed.
H ORST R EICHEL. Lecture Notes in Computer Science 965, Springer-Verlag, 89–105. [616, 697]
E RNST W. M AYR (1997), Some complexity results for polynomial ideals. Journal of Complexity 13, 303–325.
[618]
E RNST W. M AYR and A LBERT R. M EYER (1982), The Complexity of the Word Problems for Commutative
Semigroups and Polynomial Ideals. Advances in Mathematics 46, 305–329. [616, 617, 618]
E RNST W. M AYR and S TEPHAN R ITSCHER (2010), Degree Bounds for Gröbner Bases of Low-Dimensional
Polynomial Ideals. Proceedings of the 2010 International Symposium on Symbolic and Algebraic
Computation ISSAC2010, Munich, Germany, 21–27. [617]
K EVIN S. M C C URLEY (1990), The Discrete Logarithm Problem. In Cryptology and Computational Number
Theory, ed. C ARL P OMERANCE. Proceedings of Symposia in Applied Mathematics 42, American
Mathematical Society, 49–74. [580]
ROBERT J. M C E LIECE (1969), Factorization of Polynomials over Finite Fields. Mathematics of
Computation 23, 861–867. [419]
A LFRED M ENEZES (1993), Elliptic curve public key cryptosystems. Kluwer Academic Publishers, Boston MA.
[580]
R ALPH C. M ERKLE and M ARTIN E. H ELLMAN (1978), Hiding information and signatures in trapdoor
knapsacks. IEEE Transactions on Information Theory IT-24(5), 525–530. [503, 504, 509, 576]
M ARIN M ERSENNE (1636), Harmonie universelle contenant la théorie et la pratique de la musique. Sebastien
Cramoisy, Paris. Reprinted by Centre National de la Recherche Scientifique, Paris, 1975. [86]
F. M ERTENS (1897), Über eine zahlentheoretische Function. Sitzungsberichte der Akademie der
Wissenschaften, Wien, Mathematisch-Naturwissenschaftliche Classe 106, 761–830. [508]
N ICHOLAS M ETROPOLIS and S. U LAM (1949), The Monte Carlo Method. Journal of the American Statistical
Association 44, 335–341. [198]
S HAWNA M EYER E IKENBERRY and J ONATHAN P. S ORENSON (1998), Efficient algorithms for computing the
Jacobi symbol. Journal of Symbolic Computation 26(4), 509–523. [533]
M. M IGNOTTE (1974), An Inequality About Factors of Polynomials. Mathematics of Computation 28(128),
1153–1157. [198]
M. M IGNOTTE (1982), Some Useful Bounds. In Computer Algebra, Symbolic and Algebraic Computation, eds.
B. B UCHBERGER, G. E. C OLLINS, and R. L OOS, 259–263. Springer-Verlag, New York, 2nd edition.
[198]
M AURICE M IGNOTTE (1988), An Inequality about Irreducible Factors of Integer Polynomials. Journal of
Number Theory 30, 156–166. [198]
M AURICE M IGNOTTE (1989), Mathématiques pour le calcul formel. Presses Universitaires de France, Paris.
English translation: Mathematics for Computer Algebra, Springer-Verlag, New York, 1992. [198]
M AURICE M IGNOTTE and P HILIPPE G LESSER (1994), On the Smallest Divisor of a Polynomial. Journal of
Symbolic Computation 17, 277–282. [198]
M AURICE M IGNOTTE and C. S CHNORR (1988), Calcul des racines d-ièmes dans un corps fini. Comptes
Rendus de l’Académie des Sciences Paris 290, 205–206. [421]
SH. E. MIKELADZE (1948), O razlozhenii opredelitelya, elementami kotorogo sluzhat polinomy (On the
expansion of a determinant whose entries are polynomials). Prikladnaya matematika i mekhanika 12,
219–222. [132]
G ARY L. M ILLER (1976), Riemann’s Hypothesis and Tests for Primality. Journal of Computer and System
Sciences 13, 300–317. [532]
V ICTOR S. M ILLER (1986), Use of Elliptic Curves in Cryptography. In Advances in Cryptology: Proceedings
of CRYPTO ’85, Santa Barbara, CA, ed. H UGH C. W ILLIAMS. Lecture Notes in Computer Science 218,
Springer-Verlag, Berlin, 417–426. [580]
H. M INKOWSKI (1910), Geometrie der Zahlen. B. G. Teubner, Leipzig. [496]
R. T. M OENCK (1973), Fast computation of gcd’s. In Proceedings of the Fifth Annual ACM Symposium on
Theory of Computing, Austin TX, ACM Press, 142–151. [332]
ROBERT T. M OENCK (1976), Practical Fast Polynomial Multiplication. In Proceedings of the 1976 ACM
Symposium on Symbolic and Algebraic Computation SYMSAC ’76, Yorktown Heights NY, ed. R. D.
J ENKS, ACM Press, 136–148. [247]
ROBERT T. M OENCK (1977a), On the Efficiency of Algorithms for Polynomial Factoring. Mathematics of
Computation 31(137), 235–250. [421]
ROBERT M OENCK (1977b), On computing closed forms for summation. In Proceedings of the 1977
MACSYMA Users Conference, Berkeley CA, NASA, Washington DC, 225–236. [671, 673]
R. M OENCK and A. B ORODIN (1972), Fast modular transform via division. In Proceedings of the 13th Annual
IEEE Symposium on Switching and Automata Theory, Yorktown Heights NY, IEEE Press, New York,
90–96. [306]
M ICHAEL M OELLER (1999), Good non-zeros of polynomials. ACM SIGSAM Bulletin 33(3), 10–11. [199]
H. M ICHAEL M ÖLLER and F ERDINANDO M ORA (1984), Upper and lower bounds for the degree of Gröbner
bases. In Proceedings of EUROSAM ’84, Cambridge, UK, ed. J OHN F ITCH. Lecture Notes in Computer
Science 174, Springer-Verlag, New York, 172–183. [618]
L OUIS M ONIER (1980), Evaluation and comparison of two efficient probabilistic primality testing algorithms.
Theoretical Computer Science 12, 97–108. [532, 533]
P ETER L. M ONTGOMERY (1985), Modular Multiplication Without Trial Division. Mathematics of
Computation 44(170), 519–521. [288]
P ETER L. M ONTGOMERY (1991), Factorization of X^216091 + X + 1 mod 2—A problem of Herb Doughty.
Manuscript. [280]
P ETER L AWRENCE M ONTGOMERY (1992), An FFT Extension of the Elliptic Curve Method of Factorization.
PhD thesis, University of California, Los Angeles CA.
http://research.microsoft.com/en-us/um/people/petmon/thesis.pdf. [287, 308]
P ETER L. M ONTGOMERY (1995), A Block Lanczos Algorithm for Finding Dependencies over GF(2).
In Advances in Cryptology: Proceedings of EUROCRYPT 1995, Saint-Malo, France, eds. L OUIS C.
G UILLOU and J EAN -JACQUES Q UISQUATER. Lecture Notes in Computer Science 921, Springer-Verlag,
106–120. [353]
E LIAKIM H ASTINGS M OORE (1896), A doubly-infinite system of simple groups. In Mathematical papers read
at the International Mathematical Congress: held in connection with the World’s Columbian exposition,
Chicago, 1893, Macmillan, New York, 208–242. [88]
ROBERT E DOUARD M ORITZ (1914), Memorabilia Mathematica. The Mathematical Association of America.
[729]
M ICHAEL A. M ORRISON and J OHN B RILLHART (1971), The factorization of F7 . Bulletin of the American
Mathematical Society 77(2), p. 264. [542, 568]
M ICHAEL A. M ORRISON and J OHN B RILLHART (1975), A Method of Factoring and the Factorization of F7 .
Mathematics of Computation 29(129), 183–205. [541, 568]
J OEL M OSES and DAVID Y. Y. Y UN (1973), The EZGCD Algorithm. In Proceedings of the ACM National
Conference, Atlanta GA, 159–166. [198, 466]
R AJEEV M OTWANI and P RABHAKAR R AGHAVAN (1995), Randomized Algorithms. Cambridge University
Press, Cambridge, UK. [88, 198]
T HOM M ULDERS (1997), A note on subresultants and the Lazard/Rioboo/Trager formula in rational function
integration. Journal of Symbolic Computation 24(1), 45–50. [199, 640]
T. M ULDERS and A. S TORJOHANN (2000), On Lattice Reduction for Polynomial Matrices. Technical
Report 356, Department of Computer Science, ETH Zürich. 26 pages,
ftp://ftp.inf.ethz.ch/pub/publications/tech-reports/3xx/356.ps.gz. [501]
R. C. M ULLIN, I. M. O NYSZCHUK, S. A. VANSTONE, and R. M. W ILSON (1989), Optimal normal bases in
GF(p^n). Discrete Applied Mathematics 22, 149–161. [88]
DAVID R. M USSER (1971), Algorithms for Polynomial Factorization. PhD thesis, Computer Science
Department, University of Wisconsin. Technical Report #134, 174 pages. [465]
M ATS N ÄSLUND (1998), Bit Extraction, Hard-Core Predicates, and the Bit Security of RSA. PhD thesis,
Department of Numerical Analysis and Computing Science, Kungl Tekniska Högskolan (Royal Institute
of Technology), Stockholm. [580]
I SAAC N EWTON (1691/92), De quadratura curvarum. The revised and augmented treatise. Unpublished
manuscript. In: D EREK T. W HITESIDE, The mathematical papers of Isaac Newton vol. VII, Cambridge
University Press, Cambridge, UK, 1976, pp. 48–128. [641]
I SAAC N EWTON (1707), Arithmetica Universalis, sive de compositione et resolutione arithmetica liber.
J. Senex, London. English translation as Universal Arithmetick: or, A Treatise on Arithmetical
composition and Resolution, translated by the late Mr. Raphson and revised and corrected by Mr. Cunn,
London, 1728. Reprinted in: D EREK T. W HITESIDE, The mathematical works of Isaac Newton, Johnson
Reprint Co, New York, 1967, p. 4 ff. [61, 203, 725, 726]
I SAAC N EWTON (1710), Quadrature of Curves. In Lexicon Technicum. Or, an Universal Dictionary of Arts and
Sciences, vol. 2, John Harris. Reprinted in: D EREK T. W HITESIDE, The mathematical works of Isaac
Newton, vol. 1, Johnson Reprint Co, New York, 1967. [286]
P HONG Q. N GUYEN and JACQUES S TERN (2001), The Two Faces of Lattices in Cryptology. In Cryptography
and Lattices, International Conference (CaLC 2001), Providence RI, ed. J OSEPH H. S ILVERMAN. Lecture
Notes in Computer Science 2146, Springer-Verlag, 146–180. [509, 580]
T HOMAS R. N ICELY (1996), Enumeration to 10^14 of the Twin Primes and Brun’s Constant. Virginia Journal of
Science 46(3), 195–204. [83]
H. N IEDERREITER (1986), Knapsack-type cryptosystems and algebraic coding theory. Problems of Control and
Information Theory 15, 159–166. [509]
H ARALD N IEDERREITER (1993a), A New Efficient Factorization Algorithm for Polynomials over Small Finite
Fields. Applicable Algebra in Engineering, Communication and Computing 4, 81–87. [420]
H. N IEDERREITER (1993b), Factorization of Polynomials and Some Linear Algebra Problems over Finite
Fields. Linear Algebra and its Applications 192, 301–328. [420]
H ARALD N IEDERREITER (1994a), Factoring polynomials over finite fields using differential equations and
normal bases. Mathematics of Computation 62(206), 819–830. [420]
H ARALD N IEDERREITER (1994b), New deterministic factorization algorithms for polynomials over finite
fields. In Finite fields: theory, applications and algorithms, eds. G. L. M ULLEN and P. J.-S. S HIUE.
Contemporary Mathematics 168, American Mathematical Society, 251–268. [420]
H ARALD N IEDERREITER and R AINER G ÖTTFERT (1993), Factorization of Polynomials over Finite Fields and
Characteristic Sequences. Journal of Symbolic Computation 16, 401–412. [420]
H ARALD N IEDERREITER and R AINER G ÖTTFERT (1995), On a new factorization algorithm for polynomials
over finite fields. Mathematics of Computation 64(209), 347–353. [420]
P EDRO N UÑEZ (1567), Libro de algebra en arithmetica y geometrica. Iuan Stelfio, widow and heirs, Anvers.
[41]
A. M. O DLYZKO (1990), The Rise and Fall of Knapsack Cryptosystems. In Cryptology and Computational
Number Theory, ed. C ARL P OMERANCE. Proceedings of Symposia in Applied Mathematics 42,
American Mathematical Society, 75–88. [497, 509]
A. M. O DLYZKO (1995a), Asymptotic Enumeration Methods. In Handbook of Combinatorics, eds.
R. G RAHAM, M. G RÖTSCHEL, and L. L OVÁSZ. Elsevier Science Publishers B.V., Amsterdam,
and The MIT Press, Cambridge MA. [697]
A NDREW M. O DLYZKO (1995b), The Future of Integer Factorization. CryptoBytes 1(2), 5–12. [580]
A NDREW M. O DLYZKO (1995c), Analytic computations in number theory. In Mathematics of Computation
1943–1993: A Half-Century of Computational Mathematics, ed. WALTER G AUTSCHI. Proceedings of
Symposia in Applied Mathematics 48, American Mathematical Society, 451–463. [533]
A. M. O DLYZKO and H. J. J. TE R IELE (1985), Disproof of the Mertens conjecture. Journal für die reine und
angewandte Mathematik 357, 138–160. [508]
A. M. O DLYZKO and A. S CHÖNHAGE (1988), Fast algorithms for multiple evaluations of the Riemann zeta
function. Transactions of the American Mathematical Society 309(2), 797–809. [533]
J OSEPH O ESTERLÉ (1979), Versions effectives du théorème de Chebotarev sous l’hypothèse de Riemann
généralisée. Société Mathématique de France, Astérisque 61, 165–167. [443]
H. O NG, C. P. S CHNORR, and A. S HAMIR (1984), An efficient signature scheme based on quadratic equations.
In Proceedings of the Sixteenth Annual ACM Symposium on Theory of Computing, Washington DC,
ACM Press, 208–216. [509]
L UITZEN J OHANNES O OSTERHOFF (1949), Restricted free rotation and cyclic molecules. PhD thesis,
Rijksuniversiteit te Leiden. [698]
A LAN V. O PPENHEIM and RONALD W. S CHAFER (1975), Digital Signal Processing. Prentice-Hall, Inc.,
Englewood Cliffs NJ. [368]
A LAN V. O PPENHEIM, A LAN S. W ILLSKY, and I AN T. YOUNG (1983), Signals and Systems. Prentice-Hall
signal processing series, Prentice-Hall, Inc., Englewood Cliffs NJ. [368]
M. O STROGRADSKY (1845), De l’intégration des fractions rationnelles. Bulletin de la classe physico-
mathématique de l’Académie Impériale des Sciences de Saint-Pétersbourg 4(82/83), 145–167. [640]
H. PADÉ (1892), Sur la représentation approchée d’une fonction par des fractions rationnelles. Annales
Scientifiques de l’Ecole Normale Supérieure, 3e série 9, Supplément S3-S93. [132]
V. YA. PAN (1966), O sposobakh vychisleniya znacheniy mnogochlenov (Methods of computing values of
polynomials). Uspekhi Matematicheskikh Nauk 21(1(127)), 103–134. English translation: Russian
Mathematical Surveys 21 (1966), 105–136. [306]
V. YA . PAN (1984), How to multiply matrices faster. Lecture Notes in Computer Science 179, Springer-Verlag,
New York. [352]
V ICTOR Y. PAN (1997), Faster Solution of the Key Equation for Decoding BCH Error-Correcting Codes.
In Proceedings of the Twenty-ninth Annual ACM Symposium on Theory of Computing, El Paso TX,
ACM Press, 168–175. [332]
V ICTOR Y. PAN and X INMAO WANG (2004), On Rational Number Reconstruction and Approximation. SIAM
Journal on Computing 33(2), 502–503. [327]
DANIEL N ELSON PANARIO RODRIGUEZ (1997), Combinatorial and Algebraic Aspects of Polynomials over
Finite Fields. PhD thesis, Department of Computer Science, University of Toronto. Technical Report
306/97, 154 pages. [419]
DANIEL PANARIO, X AVIER G OURDON, and P HILIPPE F LAJOLET (1998), An Analytic Approach to Smooth
Polynomials over Finite Fields. In Algorithmic Number Theory, Third International Symposium,
ANTS-III, Portland, Oregon, USA, ed. J. P. B UHLER. Lecture Notes in Computer Science 1423,
Springer-Verlag, 226–236. [419]
DANIEL PANARIO and B RUCE R ICHMOND (1998), Analysis of Ben-Or’s Polynomial Irreducibility Test.
Random Structures and Algorithms 13(3/4), 439–456. [419, 421]
DANIEL PANARIO and A LFREDO V IOLA (1998), Analysis of Rabin’s polynomial irreducibility test.
In Proceedings of LATIN ’98, Campinas, Brazil, eds. C LÁUDIO L. L UCCHESI and A RNALDO V.
M OURA. Lecture Notes in Computer Science 1380, Springer-Verlag, 1–10. [419, 421]
C HRISTOS H. PAPADIMITRIOU (1993), Computational complexity. Addison-Wesley, Reading MA. [721]
DAVID PARSONS and J OHN C ANNY (1994), Geometric Problems in Molecular Biology and Robotics.
In Proceedings 2nd International Conference on Intelligent Systems for Molecular Biology, Palo Alto CA,
322–330. [698]
P ETER PAULE (1994), Short and Easy Computer Proofs of the Rogers-Ramanujan Identities and of Identities of
Similar Type. The Electronic Journal of Combinatorics 1(# R10). 9 pages. [697]
P ETER PAULE (1995), Greatest Factorial Factorization and Symbolic Summation. Journal of Symbolic
Computation 20, 235–268. [670, 671]
P ETER PAULE and VOLKER S TREHL (1995), Symbolic summation — some recent developments. In Computer
Algebra in Science and Engineering, Bielefeld, Germany, August 1994, eds. J. F LEISCHER,
J. G RABMEIER, F. W. H EHL, and W. K ÜCHLIN, World Scientific, Singapore, 138–162. [671]
H EINZ -OTTO P EITGEN, H ARTMUT J ÜRGENS, and D IETMAR S AUPE (1992), Chaos and Fractals:
New Frontiers of Science. Springer-Verlag, New York. [278]
W ILLIAM B. P ENNEBAKER and J OAN C. M ITCHELL (1993), JPEG still image data compression standard.
Van Nostrand Reinhold, New York. [368]
P ÉPIN (1877), Sur la formule 2^(2^n) + 1. Comptes Rendus des Séances de l’Académie des Sciences, Paris 85,
329–331. [530, 538]
O SKAR P ERRON (1929), Die Lehre von den Kettenbrüchen. B. G. Teubner, Leipzig, 2nd edition. Reprinted by
Chelsea Publishing Co., New York. First edition 1913. [90]
JAMES L. P ETERSON (1981), Petri net theory and the modeling of systems . Prentice-Hall, Inc., Englewood
Cliffs NJ. [697]
M ARKO P ETKOVŠEK (1992), Hypergeometric solutions of linear recurrences with polynomial coefficients.
Journal of Symbolic Computation 14, 243–264. [671, 675]
M ARKO P ETKOVŠEK (1994), A generalization of Gosper’s algorithm. Discrete Mathematics 134, 125–131.
[671]
M ARKO P ETKOVŠEK and B RUNO S ALVY (1993), Finding All Hypergeometric Solutions of Linear Differential
Equations. In Proceedings of the 1993 International Symposium on Symbolic and Algebraic Computation
ISSAC ’93, Kiev, ed. M ANUEL B RONSTEIN, ACM Press, 27–33. [641]
M ARKO P ETKOVŠEK, H ERBERT S. W ILF, and D ORON Z EILBERGER (1996), A=B. A K Peters,
Wellesley MA. [697, 729]
K AREL P ETR (1937), Über die Reduzibilität eines Polynoms mit ganzzahligen Koeffizienten nach einem
Primzahlmodul. Časopis pro pěstování matematiky a fysiky 66, 85–94. [402, 420]
C. A. P ETRI (1962), Kommunikation mit Automaten. PhD thesis, Universität Bonn. [679]
E CKHARD P FLÜGEL (1997), An Algorithm for Computing Exponential Solutions of First Order Linear
Differential Systems. In Proceedings of the 1997 International Symposium on Symbolic and Algebraic
Computation ISSAC ’97, Maui HI, ed. W OLFGANG W. K ÜCHLIN, ACM Press, 164–171. [641]
R. G. E. P INCH (1993), Some Primality Testing Algorithms. Notices of the American Mathematical
Society 40(9), 1203–1210. [532]
R. P IRASTU (1992), Algorithmen zur Summation rationaler Funktionen. Diplomarbeit, Universität
Erlangen-Nürnberg, Germany. [670, 673]
ROBERTO P IRASTU (1996), On Combinatorial Identities: Symbolic Summation and Umbral Calculus.
PhD thesis, Johannes Kepler Universität, Linz. [671]
R. P IRASTU and V. S TREHL (1995), Rational Summation and Gosper-Petkovšek Representation. Journal of
Symbolic Computation 20, 617–635. [671]
TONIANN P ITASSI (1997), Algebraic Propositional Proof Systems. In Descriptive Complexity and Finite
Models: Proceedings of a DIMACS Workshop, January 14–17, 1996, Princeton NJ, eds. N EIL
I MMERMAN and P HOKION G. KOLAITIS. DIMACS Series in Discrete Mathematics and Theoretical
Computer Science 31, American Mathematical Society, Providence RI, 215–244. [697]
H. C. P OCKLINGTON (1917), The Direct Solution of the Quadratic and Cubic Binomial Congruences with
Prime Moduli. Proceedings of the Cambridge Philosophical Society 19, 57–59. [88, 198]
J OHN M. P OLLARD (1971), The Fast Fourier Transform in a Finite Field. Mathematics of
Computation 25(114), 365–374. [247, 280]
J OHN M. P OLLARD (1974), Theorems on factorization and primality testing. Proceedings of the Cambridge
Philosophical Society 76, 521–528. [198, 541, 567]
J OHN M. P OLLARD (1975), A Monte Carlo method for factorization. BIT 15, 331–334. [198, 541, 545, 568]
C. P OMERANCE (1982), Analysis and comparison of some integer factoring algorithms. In Computational
Methods in Number Theory, Part 1, eds. H. W. L ENSTRA , J R . and R. T IJDEMAN, Mathematical Centre
Tracts 154, 89–139. Mathematisch Centrum, Amsterdam. [557, 567, 569]
C ARL P OMERANCE (1985), The quadratic sieve factoring algorithm. In Advances in Cryptology: Proceedings
of EUROCRYPT 1984, Paris, France, eds. T. B ETH, N. C OT, and I. I NGEMARSSON. Lecture Notes in
Computer Science 209, Springer-Verlag, Berlin, 169–182. [557]
C ARL P OMERANCE (1990), Factoring. In Cryptology and Computational Number Theory, ed. C ARL
P OMERANCE. Proceedings of Symposia in Applied Mathematics 42, American Mathematical Society,
27–47. [520, 567]
C. P OMERANCE, J. L. S ELFRIDGE, and S. S. WAGSTAFF , J R . (1980), The pseudoprimes to 25 · 10^9.
Mathematics of Computation 35, 1003–1025. [532]
C ARL P OMERANCE and S. S. WAGSTAFF , J R . (1983), Implementation of the continued fraction integer
factoring algorithm. Congressus Numerantium 37, 99–118. [569]
A LFRED VAN DER P OORTEN (1978), A proof that Euler missed . . . Apéry’s proof of the irrationality of ζ(3).
The Mathematical Intelligencer 1, 195–203. [697]
A LF VAN DER P OORTEN (1996), Notes on Fermat’s Last Theorem . Canadian Mathematical Society series of
monographs and advanced texts, John Wiley & Sons, New York. [514]
E UGENE P RANGE (1959), An algorism for factoring X^n − 1 over a finite field. Technical Report
AFCRC-TN-59-775, Air Force Cambridge Research Center, Bedford MA. [419, 430]
PAUL P RITCHARD (1983), Fast Compact Prime Number Sieves (among Others). Journal of Algorithms 4,
332–344. [533]
PAUL P RITCHARD (1987), Linear prime-number sieves: a family tree. Science of Computer Programming 9,
17–35. [533]
G EORGE B. P URDY (1974), A high-security log-in procedure. Communications of the ACM 17(8), 442–445.
[581]
M ICHAEL O. R ABIN (1976), Probabilistic algorithms. In Algorithms and Complexity, ed. J. F. T RAUB,
Academic Press, New York, 21–39. [532]
M ICHAEL O. R ABIN (1980a), Probabilistic Algorithms for Testing Primality. Journal of Number Theory 12,
128–138. [532]
M ICHAEL O. R ABIN (1980b), Probabilistic algorithms in finite fields. SIAM Journal on Computing 9(2),
273–280. [421, 424]
M ICHAEL O. R ABIN (1989), Efficient Dispersal of Information for Security, Load Balancing, and Fault
Tolerance. Journal of the Association for Computing Machinery 36(2), 335–348. [131, 215]
J. L. R ABINOWITSCH (1930), Zum Hilbertschen Nullstellensatz. Mathematische Annalen 102, p. 520. [618]
BARTOLOMÉ R AMOS (1482), De musica tractatus. Bologna. [86]
J OSEPH R APHSON (1690), Analysis Æquationum Universalis seu Ad Æquationes Algebraicas Resolvendas
Methodus Generalis, et Expedita, Ex nova Infinitarum serierum Doctrina Deducta ac Demonstrata. Abel
Swalle, London. [219]
A LEXANDER A. R AZBOROV (1998), Lower bounds for the polynomial calculus. computational
complexity 7(4), 291–324. [697]
C ONSTANCE R EID (1970), Hilbert . Springer-Verlag, Heidelberg, 1st edition. Third Printing 1978. [587]
DANIEL R EISCHERT (1995), Schnelle Multiplikation von Polynomen über GF(2) und Anwendungen.
Diplomarbeit, Institut für Informatik II, Rheinische Friedrich-Wilhelm-Universität Bonn, Germany. [279]
DANIEL R EISCHERT (1997), Asymptotically Fast Computation of Subresultants. In Proceedings of the 1997
International Symposium on Symbolic and Algebraic Computation ISSAC ’97, Maui HI, ed.
W OLFGANG W. K ÜCHLIN, ACM Press, 233–240. [332]
W OLFGANG R EISIG (1985), Petri Nets: An Introduction . EATCS Monographs on Theoretical Computer
Science 4, Springer-Verlag, Berlin. Translation of the German edition Petrinetze: eine Einführung,
Springer-Verlag, 1982. [697]
G EORGE W. R EITWIESNER (1950), An ENIAC Determination of π and e to more than 2000 Decimal Places.
Mathematical Tables and other Aids to Computation 4, 11–15. Reprinted in Berggren, Borwein &
Borwein (1997), 277–281. [82]
JAMES R ENEGAR (1991), Recent Progress on the Complexity of the Decision Problem for the Reals.
In Discrete and Computational Geometry: Papers from the DIMACS Special Year, eds. JACOB E.
G OODMAN, R ICHARD P OLLACK, and W ILLIAM S TEIGER. DIMACS Series in Discrete Mathematics
and Theoretical Computer Science 6, American Mathematical Society and ACM, 287–308. [619]
JAMES R ENEGAR (1992a), On the Computational Complexity of the First-order Theory of the Reals. Part I:
Introduction. Preliminaries. The Geometry of Semi-algebraic Sets. The Decision Problem for the
Existential Theory of the Reals. Journal of Symbolic Computation 13(3), 255–299. [619]
JAMES R ENEGAR (1992b), On the Computational Complexity of the First-order Theory of the Reals. Part II:
The General Decision Problem. Preliminaries for Quantifier Elimination. Journal of Symbolic
Computation 13(3), 301–327. [619]
JAMES R ENEGAR (1992c), On the Computational Complexity of the First-order Theory of the Reals. Part III:
Quantifier Elimination. Journal of Symbolic Computation 13(3), 329–352. [619]
R EYNAUD (1824), Traité d’arithmétique à l’usage des élèves qui se destinent à l’école royale polytechnique à
l’école spéciale militaire et à l’école de marine. Courcier, Paris, 12th edition. [61]
DANIEL R ICHARDSON (1968), Some undecidable problems involving elementary functions of a real variable.
Journal of Symbolic Logic 33(4), 514–520. [640]
G EORG F RIEDRICH B ERNHARD R IEMANN (1859), Ueber die Anzahl der Primzahlen unter einer gegebenen
Grösse. Monatsberichte der Berliner Akademie, 145–153. Gesammelte Mathematische Werke, ed.
H EINRICH W EBER, Teubner Verlag, Leipzig, 1892, 177-185. [533]
ROBERT H. R ISCH (1969), The problem of integration in finite terms. Transactions of the American
Mathematical Society 139, 167–189. [640, 641]
ROBERT H. R ISCH (1970), The solution of the problem of integration in finite terms. Bulletin of the American
Mathematical Society 76(3), 605–608. [640, 641]
J. F. R ITT (1948), Integration in Finite Terms. Columbia University Press, New York. [640]
J OSEPH F ELS R ITT (1950), Differential Algebra. AMS Colloquium Publications XXXIII, American
Mathematical Society, Providence RI. Reprint by Dover Publications, Inc., New York, 1966. [619]
R. L. R IVEST, A. S HAMIR, and L. M. A DLEMAN (1978), A Method for Obtaining Digital Signatures and
Public-Key Cryptosystems. Communications of the ACM 21(2), 120–126. [576]
S TEVEN ROMAN (1984), The umbral calculus. Pure and applied mathematics 111, Academic Press,
Orlando FL. [669]
L AJOS R ÓNYAI (1988), Factoring Polynomials over Finite Fields. Journal of Algorithms 9, 391–400. [421]
L AJOS R ÓNYAI (1989), Galois groups and factoring over finite fields. In Proceedings of the 30th Annual IEEE
Symposium on Foundations of Computer Science, Research Triangle Park NC, IEEE Computer Society
Press, Los Alamitos CA, 99–104. [421]
F REDERIC ROSEN (1831), The Algebra of Mohammed ben Musa . Oriental Translation Fund, London. Reprint
by Georg Olms Verlag, Hildesheim, 1986. [726]
J. BARKLEY ROSSER and L OWELL S CHOENFELD (1962), Approximate formulas for some functions of prime
numbers. Illinois Journal of Mathematics 6, 64–94. [527, 532, 536]
M ICHAEL ROTHSTEIN (1976), Aspects of symbolic integration and simplification of exponential and primitive
functions. PhD thesis, University of Wisconsin-Madison. [640, 641]
M ICHAEL ROTHSTEIN (1977), A new algorithm for the integration of exponential and logarithmic functions.
In Proceedings of the 1977 MACSYMA Users Conference, Berkeley CA, NASA, Washington DC,
263–274. [640, 641]
J OHN H. ROWLAND and J OHN R. C OWLES (1986), Small Sample Algorithms for the Identification of
Polynomials. Journal of the ACM 33(4), 822–829. [199]
H. S ACHSE (1890), Ueber die geometrischen Isomerien der Hexamethylenderivate. Berichte der Deutschen
Chemischen Gesellschaft 23, 1363–1370. [698]
H. S ACHSE (1892), Über die Konfigurationen der Polymethylenringe. Zeitschrift für physikalische Chemie 10,
203–241. [698]
B RUNO S ALVY (1991), Asymptotique automatique et fonctions génératrices. PhD thesis, École Polytechnique,
Paris. [697]
E RHARD S CHMIDT (1907), Zur Theorie der linearen und nichtlinearen Integralgleichungen, I. Teil:
Entwicklung willkürlicher Funktionen nach Systemen vorgeschriebener. Mathematische Annalen 63,
433–476. Reprint of Erhard Schmidt’s Dissertation, Göttingen, 1905. [496]
C. P. S CHNORR (1982), Refined Analysis and Improvements on Some Factoring Algorithms. Journal of
Algorithms 3, 101–127. [567]
C. P. S CHNORR (1987), A hierarchy of polynomial time lattice basis reduction algorithms. Theoretical
Computer Science 53, 201–224. [497]
C. P. S CHNORR (1988), A More Efficient Algorithm for Lattice Basis Reduction. Journal of Algorithms 9,
47–62. [497]
C. P. S CHNORR and M. E UCHNER (1991), Lattice Basis Reduction: Improved Practical Algorithms and
Solving Subset Sum Problems. In Proceedings of the 8th International Conference on Fundamentals of
Computation Theory 1991, Gosen, Germany, ed. L OTHAR B UDACH. Lecture Notes in Computer
Science 529, Springer-Verlag, 68–85. [497]
A. S CHÖNHAGE (1966), Multiplikation großer Zahlen. Computing 1, 182–196. [247]
A. S CHÖNHAGE (1971), Schnelle Berechnung von Kettenbruchentwicklungen. Acta Informatica 1, 139–144.
[332]
A. S CHÖNHAGE (1977), Schnelle Multiplikation von Polynomen über Körpern der Charakteristik 2. Acta
Informatica 7, 395–398. [245, 247, 253]
A RNOLD S CHÖNHAGE (1984), Factorization of univariate integer polynomials by Diophantine approximation
and an improved basis reduction algorithm. In Proceedings of the 11th International Colloquium on
Automata, Languages and Programming ICALP 1984, Antwerp, Belgium. Lecture Notes in Computer
Science 172, Springer-Verlag, 436–447. [497]
A RNOLD S CHÖNHAGE (1985), Quasi-GCD Computations. Journal of Complexity 1, 118–137. [202]
A. S CHÖNHAGE (1988), Probabilistic Computation of Integer Polynomial GCDs. Journal of Algorithms 9,
365–371. [202]
A RNOLD S CHÖNHAGE, A NDREAS F. W. G ROTEFELD, and E KKEHART V ETTER (1994), Fast Algorithms –
A Multitape Turing Machine Implementation . BI Wissenschaftsverlag, Mannheim. [279, 286, 292, 727]
A. S CHÖNHAGE and V. S TRASSEN (1971), Schnelle Multiplikation großer Zahlen. Computing 7, 281–292.
[221, 222, 243, 245, 247, 254, 283]
F RIEDRICH T HEODOR VON S CHUBERT (1793), De inventione divisorum. Nova Acta Academiae Scientiarum
Imperalis Petropolitanae 11, 172–186. [465]
J. T. S CHWARTZ (1980), Fast Probabilistic Algorithms for Verification of Polynomial Identities. Journal of
the ACM 27(4), 701–717. [198, 332]
Š TEFAN S CHWARZ (1939), Contribution à la réductibilité des polynômes dans la théorie des congruences.
Věstník Královské České Společnosti Nauk, Třída Matemat.-Př Ročník Praha, 1–7. [420]
Š TEFAN S CHWARZ (1940), Sur le nombre des racines et des facteurs irréductibles d’une congruence donnée.
Časopis pro pěstování matematiky a fysiky 69, 128–145. [420]
Š TEFAN S CHWARZ (1956), On the reducibility of polynomials over a finite field. Quarterly Journal of
Mathematics Oxford 7(2), 110–124. [420]
Š TEFAN S CHWARZ (1960), Ob odnom klasse mnogochlenov nad konechnym telom (On a class of polynomials
over a finite field). Matematicko-Fyzikálny Časopis 10, 68–80. [420]
Š TEFAN S CHWARZ (1961), O chisle neprivodimykh faktorov dannogo mnogochlena nad konechnym polem
(On the number of irreducible factors of a given polynomial over a finite field). Czechoslovak Mathematical
Journal 11(86), 213–225. [420]
DANIEL S CHWENTER (1636), Deliciæ Physico-Mathematicæ. Jeremias Dümler, Nürnberg. Reprint by Keip
Verlag, Frankfurt am Main, 1991. [61, 131, 697]
ROBERT S EDGEWICK and P HILIPPE F LAJOLET (1996), An Introduction to the Analysis of Algorithms.
Addison-Wesley, Reading MA. [697]
J.-A. S ERRET (1866), Cours d’algèbre supérieure. Gauthier-Villars, Paris, 3rd edition. [418]
J EFFREY S HALLIT (1990), On the Worst Case of Three Algorithms for Computing the Jacobi Symbol. Journal
of Symbolic Computation 10, 593–610. [533]
J EFFREY S HALLIT (1994), Origins of the Analysis of the Euclidean Algorithm. Historia Mathematica 21,
401–419. [61]
A DI S HAMIR (1979), How to Share a Secret. Communications of the ACM 22(11), 612–613. [131]
A DI S HAMIR (1984), A polynomial-time algorithm for breaking the basic Merkle-Hellman cryptosystem. IEEE
Transactions on Information Theory IT-30(5), 699–704. [503, 509]
A. S HAMIR (1993), On the Generation of Polynomials which are Hard to Factor. In Proceedings of the
Twenty-fifth Annual ACM Symposium on Theory of Computing, San Diego CA, ACM Press, 796–804.
[469]
A DI S HAMIR and R ICHARD E. Z IPPEL (1980), On the Security of the Merkle-Hellman Cryptographic Scheme.
IEEE Transactions on Information Theory IT-26(3), 339–340. [509]
DANIEL S HANKS and J OHN W. W RENCH , J R . (1962), Calculation of π to 100,000 Decimals. Mathematics of
Computation 16, 76–99. [82]
W ILLIAM S HANKS (1853), Contributions to Mathematics Comprising Chiefly the Rectification of the Circle to
607 Places of Decimals. G. Bell, London. Excerpt reprinted in Berggren, Borwein & Borwein (1997),
147–161. [82, 90, 729]
C. E. S HANNON (1948), A Mathematical Theory of Communication. Bell System Technical Journal 27,
379–423 and 623–656. Reprinted in C LAUDE E. S HANNON and WARREN W EAVER, The Mathematical
Theory Of Communication, University of Illinois Press, Urbana IL, 1949. [209, 215, 307]
S HEN K ANGSHENG (1988), Historical Development of the Chinese Remainder Theorem. Archive of the
History of Exact Sciences 38, 285–305. [131]
L. A. S HEPP and S. P. L LOYD (1966), Ordered cycle lengths in a random permutation. Transactions of the
American Mathematical Society 121, 340–357. [421]
V ICTOR S HOUP (1990), On the deterministic complexity of factoring polynomials over finite fields.
Information Processing Letters 33, 261–267. [421]
V ICTOR S HOUP (1991), Topics in the theory of computation. Lecture Notes for CSC 2429, Spring term,
Department of Computer Science, University of Toronto. [205]
V ICTOR S HOUP (1994), Fast Construction of Irreducible Polynomials over Finite Fields. Journal of Symbolic
Computation 17, 371–391. [421]
V ICTOR S HOUP (1995), A New Polynomial Factorization Algorithm and its Implementation. Journal of
Symbolic Computation 20, 363–397. [246, 279, 462]
V ICTOR S HOUP (1999), Efficient Computation of Minimal Polynomials in Algebraic Extensions of Finite
Fields. In Proceedings of the 1999 International Symposium on Symbolic and Algebraic Computation
ISSAC ’99, Vancouver, Canada, ed. S AM D OOLEY, ACM Press, 53–58. [354]
I GOR E. S HPARLINSKI (1992), Computational and Algorithmic Problems in Finite Fields. Mathematics and Its
Applications 88, Kluwer Academic Publishers. [419]
I GOR E. S HPARLINSKI (1999), Finite Fields: Theory and Computation. Mathematics and Its Applications,
Kluwer Academic Publishers, Dordrecht/Boston/London. [419]
A MIR S HPILKA and A MIR Y EHUDAYOFF (2010), Arithmetic Circuits: a survey of recent results and open
questions. Foundations and Trends in Theoretical Computer Science 5(3-4), 207–388. [199]
M. S IEVEKING (1972), An Algorithm for Division of Powerseries. Computing 10, 153–156. [286]
J OSEPH H. S ILVERMAN (1986), The Arithmetic of Elliptic Curves. Graduate Texts in Mathematics 106,
Springer-Verlag, New York. [568]
J. J. S YLVESTER (1840), A method of determining by mere inspection the derivatives from two equations of
any degree. Philosophical Magazine 16, 132–135. Mathematical Papers 1, Chelsea Publishing Co.,
New York, 1973, 54–57. [197, 199]
J. J. S YLVESTER (1853), On the explicit values of Sturm’s quotients. Philosophical Magazine VI, 293–296.
Mathematical Papers 1, Chelsea Publishing Co., New York, 1973, 637–640. [197, 727]
J. J. S YLVESTER (1881), On the resultant of two congruences. Johns Hopkins University Circulars 1, p. 131.
Mathematical Papers 3, Chelsea Publishing Co., New York, 1973, p. 475. [197]
N ICHOLAS S. S ZABÓ and R ICHARD I. TANAKA (1967), Residue arithmetic and its applications to computer
technology. McGraw-Hill, New York. [132]
G. TARRY (1898), Question 1401. Le problème chinois. L’Intermédiaire des Mathématiciens 5, 266–267.
Solution by Korselt. [531]
A LFRED TARSKI (1948), A decision method for elementary algebra and geometry. The Rand Corporation,
Santa Monica CA, 2nd edition. Project Rand, R-109. [619]
B ROOK TAYLOR (1715), Methodus Incrementorum Directa & Inversa. Gul. Innys, London. [286]
R ICHARD TAYLOR and A NDREW W ILES (1995), Ring-theoretic properties of certain Hecke algebras. Annals
of Mathematics 141, 553–572. [514]
G ÉRALD T ENENBAUM (1995), Introduction to analytic and probabilistic number theory. Cambridge studies in
advanced mathematics 46, Cambridge University Press, Cambridge, UK. [536]
A. T HUE (1902), Et par andtydninger til en talteoretisk methode. Videnskabers Selskab Forhandlinger
Christiana 7. [132]
A. L. TOOM (1963), O slozhnosti skhemy iz funktsional'nykh elementov, realizuyushchey umnozhenie tselykh
chisel (The complexity of a scheme of functional elements realizing the multiplication of integers). Doklady
Akademii Nauk SSSR 150(3), 496–498. English translation: Soviet Mathematics Doklady 4 (1963),
714–716. [247]
BARRY M. T RAGER (1976), Algebraic Factoring and Rational Function Integration. In Proceedings of the 1976
ACM Symposium on Symbolic and Algebraic Computation SYMSAC ’76, Yorktown Heights NY, ed.
R. D. J ENKS, ACM Press, 219–226. [466, 640]
C ARLO T RAVERSO (1988), Gröbner trace algorithms. In Proceedings of the 1988 International Symposium on
Symbolic and Algebraic Computation ISSAC ’88, Rome, Italy, ed. P. G IANNI. Lecture Notes in
Computer Science 358, Springer-Verlag, Berlin, 125–138. [619]
J OHANNES T ROPFKE (1902), Geschichte der Elementar-Mathematik, vol. 1. Veit & Comp., Leipzig. [88]
N ICOLA T RUDI (1862), Teoria de’ determinanti e loro applicazioni. Libreria Scientifica e Industriale de
B. Pellerano, Napoli. [199]
A. M. T URING (1937), On computable numbers, with an application to the Entscheidungsproblem. Proceedings
of the London Mathematical Society, Second Series, 42, 230–265, and 43, 544–546. [419]
C HRISTOPHER U MANS (2008), Fast Polynomial Factorization and Modular Composition in Small
Characteristic. In Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing,
Victoria, BC, Canada, ACM Press, 481–490. Invited to the STOC 2008 special issue of SICOMP.
[339, 751]
A LASDAIR U RQUHART (1995), The complexity of propositional proofs. The Bulletin of Symbolic Logic 1(4),
425–467. [697]
G IOVANNI VACCA (1894), Intorno alla prima dimostrazione di un teorema di Fermat. Bibliotheca Mathematica,
Serie 2, 8, 46–48. [88]
B RIGITTE VALLÉE (2003), Dynamical Analysis of a Class of Euclidean Algorithms. Theoretical Computer
Science 297, 447–486. [61]
C H .-J. DE LA VALLÉE P OUSSIN (1896), Recherches analytiques sur la théorie des nombres premiers. Annales
de la Société Scientifique de Bruxelles 20, 183–256 and 281–397. [533]
R. C. VAUGHAN (1974), Bounds for the coefficients of cyclotomic polynomials. Michigan Mathematical
Journal 21, 289–295. [198]
G. S. V ERNAM (1926), Cipher Printing Telegraph Systems. Journal of the American Institute of Electrical
Engineers 45, 109–115. [580]
G. V ILLARD (1997), Further Analysis of Coppersmith’s Block Wiedemann Algorithm for the Solution of
Sparse Linear Systems. In Proceedings of the 1997 International Symposium on Symbolic and Algebraic
Computation ISSAC ’97, Maui HI, ed. W OLFGANG W. K ÜCHLIN, ACM Press, 32–39. [353]
J EFFREY S COTT V ITTER and P HILIPPE F LAJOLET (1990), Average-Case Analysis of Algorithms and Data
Structures. In Handbook of Theoretical Computer Science, vol. A, ed. J. VAN L EEUWEN, 431–524.
Elsevier Science Publishers B.V., Amsterdam, and The MIT Press, Cambridge MA. [697]
L. G. WADE , J R . (1995), Organic Chemistry. Prentice-Hall, Inc., Englewood Cliffs NJ, 3rd edition. [698]
BARTEL L. VAN DER WAERDEN (1930a), Eine Bemerkung über die Unzerlegbarkeit von Polynomen.
Mathematische Annalen 102, 738–739. [419]
B. L. VAN DER WAERDEN (1930b), Moderne Algebra, Erster Teil. Die Grundlehren der mathematischen
Wissenschaften in Einzeldarstellungen 33, Julius Springer, Berlin. English translation: Algebra,
Volume I., Springer Verlag, 1991. [586, 703]
B. L. VAN DER WAERDEN (1931), Moderne Algebra, Zweiter Teil. Die Grundlehren der mathematischen
Wissenschaften in Einzeldarstellungen 34, Julius Springer, Berlin. English translation: Algebra,
Volume II., Springer Verlag, 1991. [349, 586, 703]
B. L. VAN DER WAERDEN (1934), Die Seltenheit der Gleichungen mit Affekt. Mathematische Annalen 109,
13–16. [465]
B. L. VAN DER WAERDEN (1938), Eine Bemerkung zur numerischen Berechnung von Determinanten und
Inversen von Matrizen. Jahresberichte der DMV 48, 29–30. [352]
S AMUEL S. WAGSTAFF , J R . (1983), Divisors of Mersenne numbers. Mathematics of Computation 40(161),
385–397. [534]
G REGORY K. WALLACE (1991), The JPEG Still Picture Compression Standard. Communications of the
ACM 34(4), 30–44. [368]
D. WAN (1993), A p-adic lifting lemma and its applications to permutation polynomials. In Proceedings 1992
Conference on Finite Fields, Coding Theory, and Advances in Communications and Computing, eds.
G. L. M ULLEN and P. J.-S. S HIUE. Lecture Notes in Pure and Applied Mathematics 141, Marcel Dekker,
Inc., 209–216. [425]
X INMAO WANG and V ICTOR Y. PAN (2003), Acceleration of Euclidean algorithm and rational number
reconstruction. SIAM Journal on Computing 32(2), 548–556. [327]
E DWARD WARING (1770), Meditationes Algebraicæ. J. Woodyer, Cambridge, England, second edition. English
translation by D ENNIS W EEKS, American Mathematical Society, 1991. [286]
E DWARD WARING (1779), Problems concerning Interpolations. Philosophical Transactions of the Royal
Society of London 69(7), 59–67. [131]
S TEPHEN M. WATT and H ANS J. S TETTER, eds. (1998), Symbolic-Numeric Algebra for Polynomials. Special
Issue of the Journal of Symbolic Computation 26(6). [41]
I NGO W EGENER (1987), The Complexity of Boolean Functions . Wiley-Teubner Series in Computer Science,
B. G. Teubner, Stuttgart, and John Wiley & Sons. [721]
B. M. M. DE W EGER (1989), Algorithms for Diophantine equations. CWI Tract no. 65, Centrum voor
Wiskunde en Informatica, Amsterdam. 212 pages. [497]
A NDRÉ W EIL (1984), Number theory: An approach through history; From Hammurapi to Legendre .
Birkhäuser Verlag. xxi+375 pages. [513]
A NDRÉ W EILERT (2000), (1 + i)-ary GCD Computation in Z[i] as an Analogue to the Binary GCD Algorithm.
Journal of Symbolic Computation 30(5), 605–617. [61]
A NDREAS W ERCKMEISTER (1691), Musicalische Temperatur. Theodorus Philippus Calvisius, Franckfurt und
Leipzig. First edition 1686/87. Reprint edited by G UIDO B IMBERG and R ÜDIGER P FEIFFER, Denkmäler
der Musik in Mitteldeutschland: Ser. 2., Documenta theoretica musicae; Bd. 1: Werckmeister-Studien.
Verlag Die Blaue Eule, Essen, 1996. [86]
D OUGLAS H. W IEDEMANN (1986), Solving Sparse Linear Equations Over Finite Fields. IEEE Transactions on
Information Theory IT-32(1), 54–62. [340, 346, 351, 352, 355, 556]
A NDREW W ILES (1995), Modular elliptic curves and Fermat’s Last Theorem. Annals of Mathematics 142,
443–551. [514]
H ERBERT S. W ILF (1994), generatingfunctionology. Academic Press, 2nd edition. First edition 1990.
[466, 697]
H ERBERT S. W ILF and D ORON Z EILBERGER (1990), Rational functions certify combinatorial identities.
Journal of the American Mathematical Society 3(1), 147–158. [697]
H ERBERT S. W ILF and D ORON Z EILBERGER (1992), An algorithmic proof theory for hypergeometric
(ordinary and “q”) multisum/integral identities. Inventiones mathematicae 108, 575–633. [671, 697]
M ICHAEL W ILLETT (1978), Factoring polynomials over a finite field. SIAM Journal on Applied
Mathematics 35, 333–337. [419]
H. C. W ILLIAMS (1982), A p + 1 Method of Factoring. Mathematics of Computation 39(159), 225–234. [568]
H. C. W ILLIAMS (1993), How was F6 factored? Mathematics of Computation 61(203), 463–474. [542]
H. C. W ILLIAMS and H ARVEY D UBNER (1986), The primality of R1031. Mathematics of
Computation 47(176), 703–711. [530]
H. C. W ILLIAMS and M. C. W UNDERLICH (1987), On the Parallel Generation of the Residues for the
Continued Fraction Factoring Algorithm. Mathematics of Computation 48(177), 405–423. [569]
L ELAND H. W ILLIAMS (1961), Algebra of Polynomials in Several Variables for a Digital Computer. Journal of
the ACM 8, 29–40. [20]
V IRGINIA VASSILEVSKA W ILLIAMS (2011), Breaking the Coppersmith-Winograd barrier.
http://www.cs.berkeley.edu/∼virgi/. Last visited 08 December 2011. 72 pp. [352]
S. W INOGRAD (1971), On Multiplication of 2 × 2 matrices. Linear Algebra and its Applications 4, 381–388.
[352]
W EN - TSÜN W U (1994), Mechanical Theorem Proving in Geometries: Basic Principles. Texts and Monographs
in Symbolic Computation, Springer-Verlag, Wien and New York. English translation by X IAOFAN J IN
and D ONGMING WANG. Originally published as “Basic Principles of Mechanical Theorem Proving in
Geometry” in Chinese language by Science Press, Beijing, 1984, XIV and 288 pp. [618, 619]
C HEE K. YAP (1991), A New Lower Bound Construction for Commutative Thue Systems with Applications.
Journal of Symbolic Computation 12, 1–27. [618]
A LEXANDER J. Y EE and S HIGERU KONDO (2011), Pi - 10 Trillion Digits. Last visited 16 October 2011. [90]
DAVID Y. Y. Y UN (1976), On Square-free Decomposition Algorithms. In Proceedings of the 1976 ACM
Symposium on Symbolic and Algebraic Computation SYMSAC ’76, Yorktown Heights NY, ed. R. D.
J ENKS, ACM Press, 26–35. [419, 466]
DAVID Y. Y. Y UN (1977a), Fast algorithm for rational function integration. In Information
Processing 77—Proceedings of the IFIP Congress 77, ed. B. G ILCHRIST, North-Holland, Amsterdam,
493–498. [640]
DAVID Y. Y. Y UN (1977b), On the equivalence of polynomial gcd and squarefree factorization problems.
In Proceedings of the 1977 MACSYMA Users Conference, Berkeley CA, NASA, Washington DC, 65–70.
[425]
H ANS Z ASSENHAUS (1969), On Hensel Factorization, I. Journal of Number Theory 1, 291–311.
[417, 444, 466]
D ORON Z EILBERGER (1990a), A holonomic systems approach to special function identities. Journal of
Computational and Applied Mathematics 32, 321–368. [671, 697]
D ORON Z EILBERGER (1990b), A fast algorithm for proving terminating hypergeometric identities. Discrete
Mathematics 80, 207–211. [671, 697]
D ORON Z EILBERGER (1991), The Method of Creative Telescoping. Journal of Symbolic Computation 11,
195–204. [671, 697]
D ORON Z EILBERGER (1993), Theorems for a Price: Tomorrow’s Semi-Rigorous Mathematical Culture.
Notices of the American Mathematical Society 40(8), 978–981. [697]
PAUL Z IMMERMANN (1991), Séries génératrices et analyse automatique d’algorithmes. PhD thesis, École
Polytechnique, Paris. [697]
P HILIP R. Z IMMERMANN (1996), The Official PGP User’s Guide. MIT Press. [18]
R ICHARD Z IPPEL (1979), Probabilistic Algorithms for sparse Polynomials. In Proceedings of EUROSAM ’79,
Marseille, France. Lecture Notes in Computer Science 72, Springer-Verlag, 216–226. [198, 498]
R ICHARD Z IPPEL (1993), Effective polynomial computation. Kluwer Academic Publishers, Boston MA. [204]
List of notation
N, N>n set of nonnegative integers, set of integers greater than n ∈ N
Z ring of integers
Q, Q>0 field of rational numbers, set of positive rational numbers
R, R>r field of real numbers, set of real numbers greater than r ∈ R
C field of complex numbers
Ø empty set
A∪B union of the sets A and B
A∩B intersection of the sets A and B
A\B set-theoretic difference of A and B
A×B Cartesian product of the sets A and B
A^n vectors of length n ∈ N over the set A
A^N countably infinite sequences over the set A, page 341
#A cardinality (number of elements) of the set A
⟨A⟩ subgroup, ideal, or subspace generated by the elements of A, pages 704, 706, 714
A ≅ B A and B are isomorphic groups or rings, pages 704, 705
R× group of units of the ring R, page 707
R[x] ring of polynomials in the variable x over the ring R, page 708
R[x1 , . . ., xn ] ring of polynomials in n variables over the ring R, page 709
R[[x]] ring of power series in the variable x over the ring R, page 708
R^(n×m) ring of n × m matrices over the ring R for n, m ∈ N
R/I residue class ring of the ring R modulo the ideal I ⊆ R, page 706
F(x) field of rational functions in the variable x over the field F, page 710
F((x)) field of Laurent series in the variable x over the field F, page 91
exp x exponential function, e^x for x ∈ R
ln x natural (base e) logarithm of x ∈ R>0
log x binary (base 2) logarithm of x ∈ R>0
ℜa real part of a ∈ C
ℑa imaginary part of a ∈ C
|a| absolute value of a ∈ C
sign(a) sign of a ∈ R
⌊a⌋ greatest integer less than or equal to a ∈ R
⌈a⌉ smallest integer greater than or equal to a ∈ R
⌈a⌋ nearest integer to a ∈ R, ⌊a + 1/2⌋, page 478
||a||1 1-norm of a vector or polynomial a, page 717
||a||2 Euclidean norm of a vector or a polynomial a, page 717
||a||∞ max-norm of a vector or polynomial a, page 717
a⋆b inner product of vectors a and b, page 717
a|b a divides b, ∃c b = ac
a∤b a does not divide b
f′ formal derivative of the polynomial or rational function f , page 266
∂ f /∂x formal derivative of the multivariate polynomial f with respect to x
f^{\overline{m}} mth rising factorial power for m ∈ Z, f · Ef · · · E^{m−1}f if m ∈ N, page 646
f^{\underline{m}} mth falling factorial power for m ∈ Z, f · E^{−1}f · · · E^{1−m}f if m ∈ N, page 647
\binom{n}{k} binomial coefficient for n, k ∈ N
{n \brack k} Stirling number of the first kind for n, k ∈ N, page 672
{n \brace k} Stirling number of the second kind for n, k ∈ N, page 650
[q1, q2, . . ., qn] continued fraction q1 + 1/(q2 + 1/(· · · + 1/qn) · · ·), page 79 (see the short sketch after this list)
←− assignment in algorithm
∗ , ∗∗ , −→ ranking of exercises: medium, difficult, lengthy (no mark = easy)
✷ end of proof
✸ end of example
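A minimal Python sketch illustrating two of the notations above: the continued fraction [q1, q2, . . ., qn] and the rising and falling factorial powers, here specialized to f = x with E assumed to be the shift operator f(x) ↦ f(x + 1), as the definitions above suggest. The function names cont_frac, rising_factorial, and falling_factorial are illustrative choices, not notation from the text.

from fractions import Fraction

def cont_frac(quotients):
    # Evaluate [q1, q2, ..., qn] = q1 + 1/(q2 + 1/(... + 1/qn)...),
    # working from the innermost quotient outwards.
    value = Fraction(quotients[-1])
    for q in reversed(quotients[:-1]):
        value = q + 1 / value
    return value

def rising_factorial(x, m):
    # x^{rising m} = x (x+1) ... (x+m-1), i.e. f * Ef * ... * E^{m-1}f for f = x.
    result = Fraction(1)
    for i in range(m):
        result *= x + i
    return result

def falling_factorial(x, m):
    # x^{falling m} = x (x-1) ... (x-m+1), i.e. f * E^{-1}f * ... * E^{1-m}f for f = x.
    result = Fraction(1)
    for i in range(m):
        result *= x - i
    return result

# [1, 2, 2, 2] = 1 + 1/(2 + 1/(2 + 1/2)) = 17/12, a convergent of sqrt(2).
print(cont_frac([1, 2, 2, 2]))    # 17/12
print(rising_factorial(3, 4))     # 3*4*5*6 = 360
print(falling_factorial(3, 4))    # 3*2*1*0 = 0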
Index
A page number is underlined (for example: 667) when it represents the definition or the main source of
information about the index entry. For several key words that appear frequently only ranges of pages or the
most important occurrences are indexed.
B-number . . . . . . . . . . . . . 550, 551, 553, 556, 557, 568 Berman, Benjamin P. . . . . . . . . . . . . . . . . . . . . . . 640, 737
Babai, László . . . . . . . . . . . . . . . . . . . . . . . . 198, 724, 736 Bernardin, Laurent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 469
Babbage, Charles . . . . . . . . . . 312, 676, 725, 727, 729 Bernoulli, Jakob (1654–1705)
baby step/giant step strategy . . . . . . . . . . . . . . . . . . . . . 544 number . . . . . . . . . . . . . . . . . . . . . . . . . . . . 650, 669, 672
Babylonians . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286, 291 random variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 719
Bach, Carl Eric 6, 61, 287, 421, 529, 531–535, 568, Bernoulli, Johann (1667–1748) . . . . . . . . . . . . 640, 737
736 Bernstein, Daniel Julius . . . . . . . . . 247, 287, 353, 737
Bach, Johann Sebastian . . . . . . . . . . . . . . . . . . . . . 86, 736 Bernstein, Jeremy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 725
Bachet, Claude Gaspard, sieur de Méziriac . 61, 513, Bert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
736 Bertossi, Leopoldo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Bachmann, Paul Gustav Heinrich . . . . . . . . . 531, 724 Bertrand, Joseph Louis François, postulate . . . . . . 525
back-tracking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 678 Beschorner, Andreas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Bacon, Lord Francis . . . . . . . . . . . . . . . . . . . . . . . . . 0, 725 Beth, Thomas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 747, 760
Bailey, David Harold . . . . . . . . . . . 2, 83, 337, 736, 737 Bézier, Pierre Étienne . . . . . . . . . . . . . . . . . . . . . 138, 737
Baker, George Allen, Jr. . . . . . . . . . . . . . . . . . . . 132, 736 curve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
Ball, Walter William Rouse . . . . . . . . . . 531, 534, 736 spline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
Barbour, James Murray . . . . . . . . . . . . . . . . . . . . . 91, 736 Bézout, Étienne . . . . . . . 172, 197, 590, 724, 728, 737
Bareiss, Erwin Hans . . . . . . . . . . . . . . . . . . . . . . 132, 736 -coprime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 450, 471
Barnett, Michael . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 coefficients . 58, 62, 141, 153, 155, 161, 197, 325,
Barrau, Théophile . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 725 326
Barrow, Isaac . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218 matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197, 201
basis theorem . . . . . . . 157, 172, 173, 175, 198, 560, 692
Gröbner ∼ . . . . . . . . . . . . . . . . . . . . . see Gröbner basis Bible . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24, 219, 711
Hilbert ∼ theorem . . 586, 601, 604, 605, 606, 618 Biermann, Gottlieb . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 725
normal ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76, 580 big Oh, O(·) 2, 30, 32, 703, 715, 720, 721, 723, 724
of a lattice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 473, 477 big prime modular algorithm . see modular algorithm
of a vector space . . . . 209, 212, 475, 714, 715, 717 bijective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 704, 705
of an ideal . . . . . . . . . . . . . . . . . . . . 593, 601, 608, 706 bilinear
orthogonal ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . 475, 717 complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337, 338
reduced ∼ . . . . . . . . . . . . . . . . . . . . . . . see reduced basis map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355
reduction 475, 478, 479, 480, 484, 488, 492, 493, Bimberg, Guido . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 766
496, 497, 499, 500, 503, 505–509, 576, 580 binary
standard ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 591 calendar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
Bauer, Andrej . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 671, 736 Euclidean Algorithm . . . . . . . . . . . . . . . . . 61, 65, 738
Baur, Walter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352, 736 rational . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
Bayer, David . . . . . . . . . . . . . . . . . . . . . . . . . 618, 619, 736 representation . . . . 75, 88, 100, 262, 283, 408, 504
BCH code . . . . 3, 209, 210, 211, 212–215, 325, 332, tree . . . . . . . . . . . . . . . . . . . . . . 296, 303, 305–307, 309
377, 412, 416, 417, 756 Binet, Jacques Philippe Marie . . . . . . . . . . . . . . . 61, 737
designed distance of a ∼ . . . . . . . . . . . . . . . . 212, 213 Bini, Dario Andrea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352
generator polynomial of a ∼ . . . 211, 212, 214, 215, binomial . . . . . . . . . . . . . . . . . . . . . . . . 230, 463, 616, 681
416 coefficient . 76, 166, 658–660, 669, 670, 684, 713,
Beame, Paul William . . . . . . . . . . . . . . . . . . . . 6, 697, 736 768
Becker, Thomas . . . . . . . . . . . . . . . . . . . . . . . . . . . 618, 736 ideal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 616, 681, 697
Beiler, Albert H. . . . . . . . . . . . . . . . . . . . . . . . . . . 534, 736 theorem . . . . . . . . . . . . . . . . . . . . . . . . 76, 667, 669, 673
Bell, Eric Temple (John Taine) . . . . 10, 96, 219, 644, B I P OL A R . . . . . . . . . . . . . . . 3, 279, 281–283, 461, 462
725, 726, 729, 736 Birch, Thomas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 726
Beltrami, Eugenio . . . . . . . . . . . . . . . . . . . . . . . . . 729, 734 birthday problem . . . . . . . . . . . . . . . . . . . . . . . . . . 546, 548
Benecke, Christof . . . . . . . . . . . . . . . . . . . . . . . . . 698, 736 bit operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Ben-Or, Michael . 410, 411, 421, 498, 619, 736, 737, bivariate
759 factorization . . . 433, 457, 459, 493, 496, 497, 586
Berenstein, Carlos Alberto . . . . . . . . . . . . . . . . 618, 737 interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
Berggren, Lennart . 90, 729, 735, 737, 749, 751, 753, modular
761, 763 gcd . . . . . . . . . . . . . . . . . . . . . . . . . . . see modular gcd
Berkeley, George . . . . . . . . . . . . . . . . . . . . . . . . . . 622, 729 EEA . . . . . . . . . . . . . . . . . . . . . . . . see modular EEA
Berlekamp, Elwyn Ralph . . . 198, 215, 335, 340, 352, polynomial . . . . 141, 162, 178, 182, 186, 203, 205,
401, 402, 404, 406, 417, 419–421, 428, 462, 465, 206, 246, 254, 289, 332, 457, 473, 493
467, 530, 737 black box . . . . . . . . 101, 340, 351–353, 355, 496, 498
algebra . . . 401, 402, 403, 420, 423, 427, 428, 430 linear algebra . . . . . . . 335, 340, 346, 352, 404, 407
algorithm 161, 198, 335, 402, 403, 404, 405, 407, representation of a polynomial . . . . . . . . . . . . . . . . 496
420, 424, 427, 428, 530, 745, 751 Black, John Richard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
-Massey algorithm . . . . . . . . . . . . . . . . . . . . . . 325, 742 Blake, Ian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 580, 737
matrix . . . . . . . . . . . . . . . . see Petr-Berlekamp matrix Blakley, George Robert (Bob) . . . . . . . . . . . . . 131, 735
Blau, Peter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 Bürgisser, Peter . 7, 88, 222, 286, 338, 352, 616, 739
blocking strategy . . . . . . . . . . . . . . . . . . . . . . . . . . 422, 461 Burnikel, Christoph . . . . . . . . . . . . . . . . . . . . . . . 286, 739
Blömer, Johannes Friedrich . . . . . . . . . . . . . . . 215, 735 Burrus, Charles Sydney . . . . . . . . . . . . . . . . . . . 247, 748
boat conformation . . . see cyclohexane conformation Buss, Samuel Rudolph . . . . . . . . . . . . . . . . . . . . 697, 739
Bob . . . . . . . . . . . . . . . . . 16, 17, 503, 573, 574, 577–580 Butler, Michael Charles Richard . . . . . . . . . . 420, 739
du Bois-Reymond, Emil . . . . . . . . . . . . . . . . . . . . . . . . . 588 butterfly operation . . . . . . . . . . . . . . . . . . . . . . . . 234, 235
Boltzmann, Ludwig . . . . . . . . . . . . . . . . . . . . . . . 622, 728 Büttner, J. G. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
Bolyai, Wolfgang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 374
Bolyai de Bolya, János (Johann) . . . . . . . . . . . . . . . . 374 C, field of complex numbers . . . . . . . . . . . . . . . . . . . . 768
Bombieri, Enrico . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90, 737 Cade, John Joseph . . . . . . . . . . . . . . . . . . . . . . . . 576, 739
Bonn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i, 8 Caesar, Gaius Julius . . . . . . . . . . . . . . . . . . . . . 83, 84, 575
Bonnet, Ossian Pierre . . . . . . . . . . . . . . . . . . . . . . . . . . . 374 cipher . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 573, 574, 580
Bonorden, Olaf . . . . . . . . . . . . . . . . . . . . . . . . . . . 461, 737 Caldwell, Chris Kelly . . . . . . . . . . . . . . . . . . . . . . . . . . . 534
Boole, George . . . . . . . . . . . . . . . . . . . . . . . 669, 737, 766 calendar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11, 69, 83
Boolean Gregorian ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84, 91
circuit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 Julian ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83, 84, 91
variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 678
lunar ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Borodin, Allan Bertram . . 6, 286, 306, 498, 737, 757
lunisolar ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Borwein, Jonathan Michael . . 83, 90, 729, 735, 737,
Camion, Paul Frédéric Roger . . . . . . . . . 419, 420, 739
749, 751, 753, 761, 763
cancellation law . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 706
Borwein, Peter Benjamin . . . . . 83, 90, 729, 735, 737,
Canfield, Earl Rodney . . . . . . . . . . . . . . . . . . . . . 567, 739
749, 751, 753, 761, 763
Caniglia, Leandro . . . . . . . . . . . . . . . . . . . . 618, 619, 739
Bos, Joppe Willem . . . . . . . . . . . . . . . . . . . . . . . . 542, 751
Bose, Nirmal Kumar . . . . . . . . . . . . . . . . . . . . . . . . . . . . 738 Cannon, John . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Bose, Raj Chandra . . . . . . . . . . . . . . . . . . . 215, 737, 756 Canny, John Francis . . . . . . . . . . . . . 619, 698, 739, 759
Bosma, Wiebren . . . . . . . . . . . . . . . . . . . . . . . . 6, 737, 739 canonical
bound form
Hasse ∼ . . . . . . . . . . . . . . . . . . 508, 562, 564, 565, 740 of a rational function . . 116, 117, 119, 121, 122,
Mignotte ∼ . . . . . . . . . . . . . . . . . . see Mignotte bound 124, 138
Weil ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 568, 736 of a rational number . . . . . . . . . . . . . . . . . . 126, 127
Bourgne, Robert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 745 representative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398
Bouyer, Martine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 ring homomorphism . . . . . . . 72, 104, 110, 706, 709
Boyar, Joan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 505, 737 Cantor, David Geoffrey . . . . 245, 247, 280–282, 287,
Boyle, Robert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44, 726 405, 406, 417, 418, 466, 739
BPP . . . . . . . . . . . . . . . . . 496, 532, 616, 721, 722, 724 and Zassenhaus algorithm . . . . . . . . . . . . . . 382, 407
Brassard, Gilles . . . . . . . . . . . . . . . . . . . . . . . . 41, 720, 737 multiplication algorithm . . . . . . . . . . . 281, 282, 287
Bratley, Paul . . . . . . . . . . . . . . . . . . . . . . . . . . . 41, 720, 737 Carathéodory, Constantin . . . . . . . . . . . . . . . . . . . . . . . 586
Brauer, Alfred Theodor . . . . . . . . . . . . . . . . . . . . . . . . . 737 cardinality, # . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 704, 768
Bremner, Murray Ronald . . . . . . . . . . . . . . . . . . . . . . . . . . 7 Carlitz, Leonard . . . . . . . . . . . . . . . . . . . . . . . . . . 426, 739
Brent, Richard Peirce . . . 61, 90, 332, 353, 354, 542, Carlyle, Thomas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 726
567, 738 Carmichael, Robert Daniel . . . . . . . . . . . . . . . . 531, 739
Brickell, Ernest Francis . . . . . . . . . . . . . . . . . . . 509, 738 function, λ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 535
Brieskorn, Egbert . . . . . . . . . . . . . . . . . . . . . . . . . 568, 738 number . . . 520, 521–523, 531, 532, 535, 537, 735
Brillhart, John David . . . . . . . 541, 542, 568, 738, 758 Carmody, Phil . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 534
Broda, Engelbert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 728 Caron, Thomas R. . . . . . . . . . . . . . . . . . . . . 531, 567, 739
Bronstein, Manuel . . . . 640–642, 671, 735, 738, 760 le Carré, John (David John Moore Cornwell) . . . 220,
Brook, Clifford Hardman (Clive) . . . . . . . . . . . . . . . . 729 727
Brown, William Stanley . . . . . . 62, 197–199, 332, 738 Carroll, Lewis (Rev. Charles Lutwidge Dodgson)
Brownawell, Woodrow Dale . . . . . . . . . . . . . . 618, 738 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28, 726
Brun, Viggo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 758 carry
Bruns, Winfried . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 flag . . . . . . . . . . . . . . . . . . . . . 30, 41, 42, 222, 262, 280
Bshouty, Nader Hanna . . . . . . . . . . . . . . . . . . . . . . 353, 751
look-ahead addition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
bubble sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221 de Casteljau, Paul de Faget . . . . . . . . . . . . . . . . 138, 739
Bucciarelli, Louis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 Castelnuovo, Guido, -Mumford regularity . . . . . . . 618
Buchberger, Bruno 21, 591, 609, 618, 738, 740, 750, Cataldi, Pietro Antonio . . . . . . . . . . . . . . . . . . . . . . 89, 739
753, 757 Cauchy, Augustin Louis . . . . 131, 132, 197, 286, 373,
algorithm . 591, 608, 609, 610, 611, 612, 617, 747 739, 740, 755
Buchmann, Johannes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 interpolation . . . 118, 121, 137, 138, 190, 325, 331
Budach, Lothar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 762 -Schwarz inequality . . . . . . . . . . . . . . . 485, 500, 555
Buffon, Georges Louis Leclerc, Comte de . . . . . . . 198 sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
Buhler, Joe Peter . . . . . . . . . . . . . . . . . . . . . . . . . . 759, 764 Cavalieri, Bonaventura . . . . . . . . . . . . . . . . . . . . . . . . . . 622
Bunch, James Raymond . . . . . . . . . . . . . . . . . . . 352, 738 Caviness, Bob Forrester . . . . . . . . . . . . . . . . . . . 640, 740
content, cont(·) . . 147, 148, 149, 150–152, 162, 192, ElGamal ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 579, 580
199, 200, 433, 695 elliptic curve ∼ . . . . . . . . . . . . . . . . . . . . . . . . . 573, 580
continuant polynomial . . . . . . . . . . . . . . . . . . . . . . . 65, 93 key in a ∼ . . . . . . . . . 16, 18, 505, 509, 573, 573–582
continued fraction . . 3, 69, 79–81, 84, 87, 89–91, 93, knapsack ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503, 509
94, 132, 542, 768 Rabin ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 573, 579
expansion . . . . . . . . . . . . . 79, 80, 81, 84, 87, 90, 568 RSA ∼ . . . . . . . . . . . . . . . . . . . . see RSA cryptosystem
factoring method . . . . . . . . . . . . . . . . . . . . . . . 541, 568 short vector ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 573
control point . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138 subset sum ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 509
convergent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 symmetric ∼ . . . . . . . . . . . . . . . . . . . . . . . . . 16, 575, 578
convex cubic spline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
body . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 473 Cucker, Felipe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 745
hull . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198 Cunn, Samuel . . . . . . . . . . . . . . . . . . . . . . . . 725, 726, 758
convolution Cunningham, Lt.-Col. Allan Joseph Champneys
cyclic ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230, 231 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 541, 543, 741
fast ∼ . . . . . 235, 240, 244, 250, 251, 252, 253, 254 number . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222, 542
negative wrapped ∼ . . . . . . . . . . . . . . . . . . . . . 238, 239 project . . . . . . . . . . . . . . . . . . . . . . . . . . . . 541, 542, 569
of polynomials . . . . . . . . . . . . . . . 230, 235, 237, 252 curve
of signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368, 369 algebraic ∼ . . . . . . . . . . . . . . . . 11, 172, 174, 175, 696
property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368 Bézier ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
Vandermonde ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 672 elliptic ∼ . . . . . . . . . . . . . . . . . . . . . . . . see elliptic curve
Conway, John Horton . . . . . . . . . . . . . . . . . . . . . . . . . . . 533 Gauß bell ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
Cook, Stephen Arthur . . . . . . . . . 6, 247, 286, 722, 741 nonsingular ∼ . . . . . . . . . . . . . . . . . . . . . . 559, 568, 571
Cookie Monster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 plane ∼ . . . . . . . . . . . . . . . . . . 173, 198, 203, 594, 615
Cooley, James William . . . . . 233, 247, 294, 727, 741 projective ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 567, 568
Coppersmith, Don 352, 353, 420, 741, 742, 750, 765 cycle structure of a permutation . . . . . . . . . . . . . . . . . 465
coprime . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46, 55, 450, 707 cyclic
Bézout-∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 450, 471 code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 416
Cori, Robert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 756 convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230, 231
Corless, Robert Malcolm . . . . . . . 7, 41, 287, 741, 751 group . . . . . . . . . . 250, 251, 349, 422, 578, 704, 713
Cormen, Thomas H. . . . . . . . . . . . . . . . . . . . 41, 368, 741 module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349, 350
co-RP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 532, 722, 723 cycloheptane . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 694, 698
de Correa, Isabel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 cyclohexane 11, 12, 14, 16, 494, 619, 685–699, 725
coset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 704, 715 conformation of ∼ 11, 12, 15, 685, 687, 689, 698,
cyclotomic ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415 699
of an ideal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 706 boat ∼ . . . . . . . . . . . . 12, 13, 15, 16, 690, 691, 696
cosine theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 chair ∼ . . . . . . . . . . . . . . . 12, 13, 16, 686, 692, 693
Cot, Norbert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 747, 760
flexible ∼ . . . . . . . . . . . . . . . . . . 12, 15, 16, 696, 698
Courant, Richard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 586
rigid ∼ . . . . . . . . . . . . . . . . . . . . . . . . . 12, 15, 16, 698
Cowie, James . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 569, 741
cyclotomic
Cowles, John Richard . . . . . . . . . . . . . . . . . . . . . 199, 762
coset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415
Cox, David Archibald . . . . . . 568, 614, 617, 618, 741
polynomial, Φn . . . . . 164, 201, 211, 253, 412, 413,
Coxeter, Harold Scott Macdonald . . . . 531, 534, 736
414, 416, 421, 441, 442, 467, 568
CRA . . . . . . . . . . . . see Chinese Remainder Algorithm
Cramer, Gabriel . . . . . . . . . . . . . . . . . . . . . . 198, 724, 741
rule 116, 136, 157, 183, 186, 200, 204, 205, 485, D, differential operator . . . . . . . . . . 624, 633, 669, 673
716 D, division time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
Cray . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337 D’Alembert, Jean le Rond . . . . . . . . . . . . . . . . . 676, 729
Creutzig, Christopher . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 Damgård, Ivan Bjerre . . . . . . . . . . . . . . . . . . . . . 532, 742
Crichton, Michael . . . . . . . . . . . . . . . . . . . . . . . . . 208, 726 Das, Abhijit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
critical line . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 533 data
Cromwell, Oliver . . . . . . . . . . . . . . . . . . . . . . . . . 208, 726 compression . . . . . . . . . . . . . . . . . . . . . . . 307, 363–366
Crossley, John Newsome . . . . . . . . . . . . . . . . . . 727, 741 structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97, 493
crossover point . . . 221, 222, 241, 251, 279, 281, 282, database integrity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
337 Datta, Ruchira Sreemati . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
CRT . . . . . . . . . . . . . . see Chinese Remainder Theorem Daubert, Katja Elisabeth . . . . . . . . . . . . . . . . . . . . . . . . . . 7
cryptanalysis . . . . . . . . . . . . . . . . . . . . . . . . . 503, 504, 575 Davenport, James Harold . . . . . . . . . . . . . . . . . 641, 742
cryptography . . . . 11, 16, 18, 37, 209, 503, 505, 509, Davies, Charles . . . . . . . . . . . . . . . . . . . . . . . . . . . 432, 728
517, 523, 525, 573–582 Davis, Martin David, -Putnam procedure . . . . . . . . 678
public key ∼ . . . . . . . . . . . . 3, 17, 503, 575, 573–582 DCT, Discrete Cosine Transform 363, 364, 363–369
cryptosystem . . . . . . . . 3, 17, 503, 504, 541, 542, 573, Dean, Basil . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 729
573–582 Dèbes, Pierre . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 498, 742
asymmetric ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17, 575 de Casteljau, Paul de Faget . . . . . . . . . . . . . . . . 138, 739
decimal representation . . . 31, 37, 40, 70, 71, 82, 92, difference
100, 505 equation . . . . . . . . . . . . . . . . . . . . . . 660, 669, 671, 675
decision problem . . . . . . . . . . . . . . . . . . . . . . . . . . 721, 722 field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 659, 660, 675
hard ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 722 operator, ∆ . . . . . . . . . . . . . . 646, 647, 660, 671, 673
instance of a ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 721 differential
Decker, Wolfram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 algebra . . . . . . . . . . . . . . . . . . . . . . . 623, 624, 640, 641
de Correa, Isabel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 equation . . . 1, 4, 90, 353, 428, 633, 640–643, 653,
decryption . . . . . . . . . . . . . . . . . . . . . . . . . 16, 17, 573–582 669, 684
Dedekind, Julius Wilhelm Richard . . . 373, 419, 742, Risch ∼ . . . . . . . . . . . . . . . . . . . . 641, 738, 742, 750
746 field . . . . . . . . . . . . . . . . . . . . . . . . . . 624, 625, 633, 641
Degeyter, Pierre-Chrétien . . . . . . . . . . . . . . . . . . . . . . . 727 operator, D . . . . . . . . . . . . . . . . . . . 624, 633, 669, 673
degree Diffie, Bailey Whitfield 503, 575, 576, 578, 581, 742
formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 710 -Hellman
function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63, 94 key exchange . . . . . . . . . . . . . . 573, 578, 579, 756
of a field extension . . . . . . . . . . . . . . . . 384, 710, 711 Problem . . . . . . . . . . . . . . . . . . . . . . . . 579, 580, 582
of a polynomial, deg . . . . . . . . . . . . . . . . 32, 708, 709 digital
of an algebraic element . . . . . . . . . . . . . . . . . . . . . . . 710 filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
sequence . . . 92, 93, 142, 178–181, 187, 188, 190, signal . . . . . . . . . . . . . . . . . . . . . . . . 247, 359, 363, 368
204, 314, 329, 333 signature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 578
normal ∼ . 51, 53, 59, 60, 65, 93, 195, 314, 317, dimension
319, 321–324, 326, 330, 333 formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 714
total ∼ . . . . 157, 172, 176, 493, 597, 616, 689, 709 of a code . . . . . . . . . . . . . . . . . . . . . . . . . . 209, 210, 211
valuation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91, 94, 274 of a lattice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 474, 480
de Groote, Hans Friedrich . . . . . . . . . . . . . . . . . 352, 748 of a vector space . . . . 349, 401, 674, 685, 687, 688,
Delaunay, Charles Eugène . . . . . . . . . . . . . . . . . . . . . . . 20 698, 710, 711, 714
∆, difference operator . . . . . 646, 647, 660, 671, 673 Diophantine
DeMillo, Richard Allan . . . . . . . . . . . . . . . . 88, 198, 742 approximation . . 3, 79, 80, 87, 473, 497, 505, 762
de Moivre, Abraham . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353 simultaneous ∼ . . . . . 87, 503, 505, 507–509, 753
De Morgan, Augustus . . . . . 44, 68, 96, 622, 726, 729 equation . . . . . . . . . . . . . . . . . . . . . . . . . . . 512, 764, 766
Deng, Yuefan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352, 740 linear . . . . . . . . . . . . . . . . . . . . . . . . 69, 77, 79, 89, 93
dense representation . . . . . . . . . . . . 101, 231, 493, 494 Diophantus of Alexandria (Διόφαντος
derivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197, 624 ᾿Αλεξανδρέως) . . . . . . . . . . . . . . 513, 514, 754, 756
derivative . . 113, 114, 122, 133, 156, 213, 259, 265, direct product
266, 267, 289–291, 300, 394, 623, 624, 633, 642, of finite probability spaces . . . . . . . . . . . . . . . . . . . . 718
647, 667, 768 of groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 704
Hasse-Teichmüller ∼ . . . . . . see Hasse-Teichmüller of rings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 706
logarithmic ∼ . . . . . . . . . . . . 633, 635, 636, 639, 641 directed graph . . . . . . . . . . . . . . . . . . . . . . . . 423, 468, 679
trivial ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 624, 642 Lejeune Dirichlet, Johann Peter Gustav . . . . . 62, 506,
DERIVE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 507, 509, 528, 588, 707, 742
de Sainte-Croix, Jumeau . . . . . . . . . . . . . . . . . . . . . . . . 669 Schubfachprinzip . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 678
Descartes, René, du Perron . 334, 512, 622, 727, 729, DISCO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
796 Discrete
designed distance of a BCH code . . . . . . . . . . 212, 213 Cosine Transform (DCT) . . . . . 363, 364, 363–369
determinant, det . . . 50, 100, 109–111, 113, 136, 157, Inverse ∼ (IDCT) . . . . . . . . . . . . . . . 363, 366, 369
172, 197–199, 204, 205, 328, 329, 335, 337, 688, Fourier Transform (DFT) . . . . . 229, 221–254, 262,
715, 716 340, 352, 362, 359–369
Gramian ∼ . . . . . . . . . . . . . . . . . . . . . . . . . 482, 484, 717 Logarithm Problem (DL) . . . . . . . . . . 579, 580, 582
modular ∼ . . . . . . . . . . . . . . . . . . . . 109, 113, 132, 525 signal . . . . . . . . . . . . . . . . . . . 359, 360–364, 368, 369
big prime ∼ . . . . . . . . . . . 110, 113, 168, 460, 526 discriminant, disc 156, 207, 435, 441, 443, 454, 455,
small primes ∼ . . . . . . . see modular determinant 466, 467, 470, 471, 537, 689
de Weger . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . see Weger dispersion, dis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 675
DFT . . . . . . . . . . . . . . . . see Discrete Fourier Transform distinct-degree
DH . . . . . . . . . . . . . . . . . . . . see Diffie-Hellman problem decomposition . . . . . . . . . . . . . . . . 381, 392, 400, 422
Diamond, Harold George . . . . . . . . . . . . . . . . . . . . . . . 745 factorization . . . . . . . . 373, 381, 377–421, 461, 462
diatonic scale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85, 86 distributed
Díaz, Angel Luis . . . . . . . . . . . . . . . . . . . . . 199, 498, 742 computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19, 99, 567
Dickman, Karl Daniel, ρ-function . . . . . . . . . . . . . . . 553 data structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18, 19
Dickson, Leonard Eugene . . . . . . . . . . . . . . 88, 591, 742 divide-and-conquer . . . 286, 289, 298, 300, 309, 317,
lemma . . . . . . . . . . . . . . . . . . . . . . . . 602, 603, 604, 620 353
Didymos of Alexandria (Δίδυμος ᾿Αλεξανδρέως) division
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 exact ∼ . . . . . . . . . . . . . . 42, 202, 251, 261, 289, 310
property . . . . . . . . . . . . . . . . . . . . . . . . . . . 706, 707, 709 elliptic curve . . . . . . . . . . 508, 558, 557–568, 571, 580
pseudo-∼ . . 38, 183, 190, 191, 197, 199, 204–206 cryptosystem . . . . . . . . . . . . . . . . . . . . . . . . . . . 573, 580
time, D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298 factoring method . . . 287, 541, 542, 563, 557–567,
trial ∼ . . . . . . . . . . . . . . . . . . . 389, 541, 543, 544, 552 571
with remainder . . . . . . 2, 26, 37, 38, 39, 41, 45, 51, size of an ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 561, 565
59–62, 100, 131, 257, 261, 262, 282, 283, 314, smooth ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 559, 560
407, 445 Emiris, Ioannis Zacharias (᾿Εμίρης, ᾿Ιωάννης
fast ∼ . . . . . . . . . . . . 221, 261, 264, 282, 287, 339 Ζαχαρίου) . . . . . . . . . . . . . . . . . . . . . . . . . . 7, 698, 743
multivariate ∼ . . . . 595, 598, 599, 600, 604, 605 Encarnación, Mark James . . . . . . . . . . . . . . . . . 465, 741
Dixon, Alfred Cardew . . . . . . . . . . . . . . . . . . . . . 671, 743 Encke, Johann Franz . . . . . . . . . . . . . . . . . . . . . . . . . . . . 746
Dixon, John Douglas . . . . . . . . . . . . . . 61, 568, 569, 742 encoding map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
random squares method . 340, 541, 549, 550, 551, encryption . . . . . . . . . . . . . . . . . . . . . . . . . 16, 17, 573–582
556, 558, 567, 569, 570, 579 endomorphism . . . . . . . . . . . . . . . . . . . . . . . . . . . . 714, 715
DL, Discrete Logarithm Problem . . . . . 579, 580, 582 Frobenius ∼ . . . . 398, 402, 404, 427, 428, 713, 746
Dodson, Bruce . . . . . . . . . . . . . . . . . . . . . . . 569, 741, 742 Eneström, Gustav . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 743
Don Quixote de la Mancha . . . . . . . . . . . . . . . . . . 90, 740 Engel, Friedrich . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 728
Dooley, Samuel Sean . . . . . . . . . . . . . . . . . . 738, 746, 763 ENIGMA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 574
Dörge, Karl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466, 742 entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298, 324, 452
Dornstetter, Jean Louis . . . . . . . . . . . . . . . . . . . . 215, 742 equal-degree
double-precision integer . . . . . . . . . . . . . . . . . . . . . . . . . . 29 factorization . . . 387, 377–421, 424, 461, 462, 554,
Doughty, Herb . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 757 579
Doyle, Sir Arthur Conan . . . . . . . . 572, 702, 728, 729 splitting . . . . . . . . . . . . . . . . . . . . . . 385, 387, 423, 424
Dozier, Lamont . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 728 equivalence relation . . . . 92, 314, 332, 430, 673, 707
Dreker, Stefan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 Erasmus of Rotterdam . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Dresden, Arnold . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 729 erasure code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18, 215
Dress, Andreas . . . . . . . . . . . . . . . . . . . . . . . . . . . . 498, 741 Eratosthenes of Cyrene (᾿Ερατοσθένης ὁ
Drobisch, Moritz Wilhelm . . . . . . . . . . . . . . . . . . 91, 742 Κυρηναι̃ος) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24, 518
Dubé, Thomas Willliam . . . . . . . . . . . . . . . . . . . 618, 742 sieve . . . . . . . . . . . . . . . 171, 527, 531, 533, 552, 557
Dubner, Harvey Allen . . . . . . . . . . . . . . . . . . . . . 530, 766 Erdmann, Johann Eduard . . . . . . . . . . . . . . . . . . . . . . . . 726
Dubois, Raymond . . . . . . . . . . . . . . . . . . . . . . . . . 532, 742 Erdős, Pál . . . . . . . . . . . . . . . . . . . . . . . . . . . . 512, 567, 739
du Bois-Reymond, Emil . . . . . . . . . . . . . . . . . . . . . . . . . 588 ERH . . . . . . . . . . . . see Extended Riemann Hypothesis
Ducos, Lionel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199, 742 Ernie . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
Dupré, Athanase . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61, 742 error
Durucan, Emrullah . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 correcting code . . . . . . . . . . . . . . . . . . . . . . . . . . . 18, 209
dynamical systems theory . . . . . . . . . . . . . . . . . . . . . . . 276 locator polynomial . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
Euchner, Martin . . . . . . . . . . . . . . . . . . . . . . . . . . . 497, 762
Euclid (Εὐκλείδης) . . . . 3, 24, 25, 26, 44, 61, 73, 93,
E, shift operator . . . . . . . . . . . . 646, 648, 659, 660, 671 518, 531, 724, 725, 748
early abort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 382 Euclidean
Eberly, Wayne Michael . . . . . . . . . . . . . . . . . 6, 353, 742 Algorithm . . 3, 4, 25, 45–207, 313–333, 530, 612,
Edmonds, Jeffrey Allen . . . . . . . . . 215, 679, 735, 741 616, 707, 738, 742, 754, 756, 763, 765, 766
Edmonds, John Robert (Jack) . . . . . . . . . . . . . 132, 742 binary ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61, 65, 738
EEA . . . . . . . . . . . . see Extended Euclidean Algorithm Extended ∼ (EEA) . . . . see Extended Euclidean
effective univariate factorization . . . . see factorization fast ∼ . . 3, 7, 178, 263, 325, 313–333, 345, 626
eigenvalue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 716 monic ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . see monic
Einstein, Albert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 734 primitive ∼ . . 190, 191, 192, 194–197, 199, 206
Eisenbrand, Friedrich . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 quotient in the ∼ . . . . . . . . . . . . . . . . . . . see quotient
Eisenbud, David . . . . . . . . . . . . . . . . 617, 697, 742, 743 remainder in the ∼ . . . . . . . . . . . . . . . see remainder
Eisenstein, Ferdinand Gotthold Max . 373, 533, 743 traditional ∼ . . . . . . . . . . . . . . . . . . . . . see traditional
theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 467 with least absolute remainders . . . . . . . . . . . . . . . 66
Ekhad, Shalosh B. . . . . . . . . . . . . . . . . . . . . . . . . 697, 743 domain . . 45, 45–95, 97, 104, 106, 135, 147, 158,
Eleatics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 159, 186, 257, 352, 595, 707, 708–711
Electronic Frontier Foundation . . . . . . . . . . . . . . . . . . 517 engine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
elementary function . . . . . . . . . . . . . 46, 47, 48, 61–64, 257, 707
functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 623 minimal ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62, 63
symmetric polynomial . . . . . . . . . . . . . . . . . . . . . . . . 166 norm, two-norm, || · ||2 . . . . 12, 157, 164, 473, 474,
Elements (Euclid) 24, 25, 26, 61, 518, 531, 724, 725 480, 487, 497, 717, 768
ElGamal, Taher . . . . . . . . . . . . . . . . . . . . . . . . . . see Gamal number field . . . . . . . . . . . . . . . . . . . . . . . . . . . . 724, 755
elimination of variables . . . . . . . . . . . . . . . . . . . . . . . . . 172 representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Elkenbracht-Huizing, Reina Marije . . . . . . . . 569, 741 Eudoxus of Cnidus (Εὔδοξος Αἰσχίνου Κνίδιος)
ellipsoid method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 473 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Euler, Leonhard . . 62, 76, 88–91, 131, 132, 134, 197, modular ∼ . . . . . . . . . . . . . . see modular factorization
198, 372, 418, 513, 520, 533, 542, 586, 644, 670, of integers . . . . 3, 17, 18, 198, 222, 335, 340, 352,
735, 743, 753, 761 353, 505, 513, 517, 521, 531–533, 541,
constant, γ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 534, 651 541–571, 577–579
number . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 669 of multivariate polynomials . . . . . . . . 493, 497, 501
theorem . . . . . . . . . . . . . . . . . . . 17, 518, 519, 577, 704 of polynomials . . 2, 4, 15, 20, 148, 271, 282, 286,
totient function, ϕ . . . . 17, 75, 108, 131, 136, 250, 372, 373, 377, 377–501, 505, 513, 586, 588
412, 518, 535, 577 over Z and Q . . . . . 37, 100, 164, 257, 373, 440,
evaluation 433–471, 473, 474, 487–501, 525, 528
homomorphism . . . . . . . . . . . . . . . . . . . . . . . . . 107, 709 over finite fields 3, 77, 279, 283, 340, 352, 389,
map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103, 215, 295 377–430, 461–463, 488, 493, 569, 724
multipoint ∼ 19, 97–103, 231–233, 280, 281, 295, of sparse polynomials . . . . . . . . . . . . . . . . . . . . . . . . 497
296, 299, 300, 302, 309, 333, 339, 399, 407, pattern . . . 435, 442, 443, 444, 462, 465, 467, 468
460 prime ∼ . . 106, 131, 291, 292, 518, 529, 535, 550,
fast ∼ . . . . . . . 231, 295, 298, 299, 308, 399, 544 554
of a matrix . . . . . . . . . . . . . . 340, 346, 348, 352, 353 squarefree ∼ . . . 377, 379, 389, 393, 395, 397, 416,
Evdokimov, Sergeı̆ Alekseevich (Evdokimov 426, 658
Serge i Alekseeviq) . . . . . . . . . . . . . . 421, 743 unique ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 706, 707
Eve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16, 573, 580, 582 domain (UFD) see unique factorization domain
event . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 718 Faddeev, Dmitriı̆ Konstantinovich (Faddeev
eventually positive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 720 Dmitri i Konstantinoviq) . . . . 132, 419,
exact division . . . . . . . . . . 42, 202, 251, 261, 289, 310 744
exp, exponential function . . . . . . . . . . . . . . . . . . . . . . . 768 Faddeeva, Vera Nikola’evna (Faddeeva Vera
expected value . . . . . . . . . . . . . 184, 205, 411, 682, 718 Nikolaьevna) . . . . . . . . . . . . . . . . . . . . . . 132, 744
EXPEXPTIME . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 723 Fahle, Torsten Klemens . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
explicit linear algebra . . . . . . . . . . . . . . . . 335, 352, 407 falling factorial . . 647, 649, 654, 669, 670, 673, 768
EXPSPACE . . . . . . . . . . . . . . . . . . . . . . . . . . 616, 697, 723 Fano, Robert Mario . . . . . . . . . . . . . . . . . . . . . . . 307, 744
EXPTIME . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 723 fast
Extended convolution . . . . 235, 240, 244, 250, 251, 252, 253,
Euclidean Algorithm (EEA) . . 17, 48, 57, 45–207, 254
214, 242, 283, 304, 313–333, 344, 407, 448, CRA . . . . . . . . . see Chinese Remainder Algorithm
450–452, 505, 577, 710 division with remainder . . . . . . . . . . . . . . see division
big prime modular ∼ . . . . . . . 189, 190, 195, 206 Euclidean Algorithm . . . see Euclidean Algorithm
bivariate modular ∼ . . . . . . . . . . . . . . . . . . . . . . . . 189 exponentiation . . . . . . . . . . . . . . . . . . . . . . . . . . 373, 580
modular ∼ . . . . . . . . . . . . . 183, 186, 206, 326, 331 integer multiplication . . . . . . . . . . . . . . . . . . . . 221–254
primitive ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192 interpolation . . . . . . . . . . . . . 231, 301, 295–310, 331
small primes modular ∼ . . . . . see modular EEA matrix multiplication . . . . . . . . . . . . . . 336, 337, 340
traditional ∼ . . . . . . . . . . . . . . . . . . . . . see traditional modular composition . . . . . . . . . . . . . . 338, 339, 405
Riemann Hypothesis (ERH) . . 421, 532, 533, 743, multipoint evaluation . . . . . . . . . . . . . . see evaluation
759 polynomial multiplication . . . . . . . . . . . . . . . 221–254
extension field . . . . . . . . . . . . . . . . . . . see field extension sorting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
EZ-GCD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 460, 466 Fast Fourier Transform (FFT) . . . 3, 19, 82, 211, 233,
221–254, 281, 363, 364, 373, 741, 744, 747, 748,
factor 760
base . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 550, 557 arithmetic circuit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
combination . . . 434–436, 441, 453, 455, 458, 462, Fermat number ∼ . . . . . . . . . . . . . . . . . . . . . . . . 284–286
465, 489, 492, 496, 497 multiplication . . . 3, 101, 238, 243, 247, 250, 251,
group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 704 262, 279, 280, 283, 286, 333
factorial support the ∼ . . . . . . . . . . . . 237, 245, 251, 296, 333
falling ∼ . . . . . . . 647, 649, 654, 669, 670, 673, 768 3-adic ∼ . . . . . . . . . . . . . . . . . . . . . . 242, 247, 252, 253
greatest ∼ factorization, gff . see greatest factorial three primes ∼ . . . . . . 243, 246, 247, 283, 284, 286
ring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 707 Fateman, Richard Jay . . . . . . . . . . . . . . . . . . . . . 640, 737
rising ∼ . . . . . . . . . . . . . . . . . . . . . . . 647, 670, 673, 768 Faugère, Jean-Charles . . . . . . . . . . . . . . . . . . . . . 619, 744
factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2, 20 Faulhaber, Johann . . . . . . . . . . . . . . . . . . . . . . . . . 670, 752
bivariate ∼ . . . . . 433, 457, 459, 493, 496, 497, 586 feasible matrix multiplication exponent . . . . 337, 338,
by continued fraction method . . . . . . . . . . . 541, 568 352
by elliptic curve method . . . . . . . . see elliptic curve Feisel, Sandra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6, 7
distinct-degree ∼ . . . . 373, 381, 377–421, 461, 462 Felkel, Anton . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 540
effective univariate ∼ . . . . 457, 459, 473, 493, 501 Feller, William . . . . . . . . . . . . . . . . . . . . . . . . . . . . 717, 744
equal-degree ∼ . . . . . see equal-degree factorization Ferdinand, Duke of Braunschweig . . . . . . . . . . . . . . 372
greatest factorial ∼ , gff . . . . . see greatest factorial Ferdinand von Fürstenberg . . . . . . . . . . . . . . . . 514, 725
irreducible ∼ . . . . . . . . . see irreducible factorization Fermat, Clément-Samuel de . . . . . . . . . . . . . . . 514, 725
Fermat, Pierre de . . 3, 7, 24, 76, 88, 89, 93, 218, 512, Flajolet, Philippe Patrick Michel 419, 697, 744, 759,
513, 514, 520, 530, 550, 622, 669, 725, 739, 741, 763, 765
743, 744, 764, 765 Fleischer, Jochem . . . . . . . . . . . . . . . . . . . . 736, 748, 760
last theorem . . . . . . . . . . . . . . . . . . 514, 595, 761, 766 flexible conformation see cyclohexane conformation
liar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 519, 534 floating point
little theorem . . . . 77, 88, 379, 380, 398, 513, 518, arithmetic . . . . . . . . . . . . . . . . . . . . . . . 20, 82, 283, 497
520, 531, 704, 712, 713, 742, 743 number . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32, 286, 337
number, Fn . 76, 88, 246, 513, 520, 530, 538, 542, representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
738, 755 Floyd, Robert W. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 546
FFT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284–286 cycle detection trick . . . . . . . . . . 546, 547, 548, 567
polynomial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493 fluxions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
primality test . . . . . . . . . . . . 519, 520, 521, 523, 534 FOCS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
prime . . . . . . . . . . . . . . . . . . . . . . . . . 228, 251, 530, 536 Folkerts, Menso . . . . . . . . . . . . . . . . . . . . . . 286, 727, 744
witness . . . . . . . . . . . . . . . . . . 519, 520, 522, 523, 534 Ford, Garrett . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 729
Feynman, Richard Phillips . . . . . . 220, 540, 727, 728 Fourier, Jean Baptiste Joseph . . . . 247, 358, 727, 744
FFT . . . . . . . . . . . . . . . . . . . . . see Fast Fourier Transform coefficient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362, 369
Fibonacci, Leonardo Pisano, son of Bonaccio prime . . . . . . . . . . . . . . . . 99, 243, 246, 528, 529, 536
number . . . . . . . . . . . . . . . . . . . 53, 54, 61, 66, 89, 742 series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361, 741
sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66, 341, 343 Transform . . . . . . . . . . 247, 359, 361–363, 369, 513
Fich, Faith Ellen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 Continuous ∼ . . . . . . . . . . . . . . . . . . . 359, 361, 362
Fiduccia, Charles (Chuck) Michael . . . 306, 353, 744 Discrete ∼ (DFT) . . . . . . . . . see Discrete Fourier
Fieker, Claus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 749 Fast ∼ (FFT) . . . . . . . see Fast Fourier Transform
field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32, 708, 710, 711 F p , finite prime field . . . 73, 421, 427, 428, 462, 471,
algebraic number ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . 465 568
difference ∼ . . . . . . . . . . . . . . . . . . . . . . . 659, 660, 675 Fq , finite field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73, 712
differential ∼ . . . . . . . . . . . . . . . . . 624, 625, 633, 641 fractal . . . . . . . . . . . . . . . . . . . . . 226, 273, 276–278, 287
extension . . . . . . . 74, 398, 408, 411, 633, 663, 710, Franke, Jens . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 542, 751
711–713 Fredet, Anne . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 641, 738
algebraic ∼ . . 175, 343, 378, 384, 493, 627, 630, Freeman, Timothy Scott . . . . . . . . . . . . . . . . . . . 498, 744
710, 711 Frege, Friedrich Ludwig Gottlob . . . . . . . . . . 588, 739
degree of a ∼ . . . . . . . . . . . . . . . . . . . . 384, 710, 711 -Hilbert proof . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 678
finite ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 710 Freivalds, Rùsiņš . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88, 744
Galois ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466 Frénicle de Bessy, Bernard . . . . . . . . . . . . . . . . . . . . . . 513
normal ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398 frequency . . . . . . 84–86, 359, 360, 361–363, 365, 366
finite ∼ , Fq . . . . . . . . . . . . . . . . . . . . . . . . see finite field analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 574
Hilbertian ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 498 Friedman, Philip . . . . . . . . . . . . . . . . . . . . . 472, 572, 728
of constants . . . . . . . . . . . . . . . . . . . . . . . . . . . . 627, 659 Frieze, Alan Michael . . . . . . . . . . . . . . . . . . . . . . 505, 744
of fractions . 42, 79, 147, 149, 150, 152, 157, 177, FRISCO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 619
186, 191, 200, 275, 292, 433, 500, 710 Frisé, Adolf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 727
operation . . . . . . . . . . . . . . . . see arithmetic operation Frobenius, Ferdinand Georg . . . . . 132, 197, 441, 465,
perfect ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397 744, 745
splitting ∼ 177, 426, 429, 441, 627, 628, 630, 711 automorphism . . . . . . . . . . . . . . . . 398, 420, 465, 713
Fields, John Charles, medal . . . . . . . . . . . . . . . . . . . . . 591 density theorem . . . . . . . . . . . . . . 441, 442, 443, 465
Finck, Pierre Joseph Étienne . . . . . . . . . . . . . . . . 61, 744 endomorphism 398, 402, 404, 427, 428, 713, 746
fingerprinting . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70, 88, 91 polynomial representation of the ∼ . . . 398, 408
finite iterated ∼ algorithm . . . . . . . see iterated Frobenius
-dimensional vector space . . . . . . . . . . . . . . 710, 714 Fröhlich, Albrecht . . . . . . . . . . . . . . . . . . . 419, 745, 753
duration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363, 369 Fuchssteiner, Benno . . . . . . . . . . . . . . . . . . . . . . . . . . . 7, 20
extension of a field . . . . . . . . . . . . . . . . . . . . . . . . . . . 710 Fulton, William . . . . . . . . . . . . . . . . . . . . . . . . . . . 568, 745
field, Fq . . . 2, 18, 20, 55, 73, 75, 76, 88, 229, 266, functional decomposition . . . . . . . . . . . . 576, 580, 581
286, 313, 355, 377–430, 711, 712, 713 fundamental
irreducibility test over a ∼ . . . . . . . . . . . . . . . . . . 407 lemma about gff . . . . . . . . . . . . . . . . . . . 657, 658, 661
root finding over a ∼ . . . . . . . . 377, 392, 418, 428 period . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368
prime field, F p . 73, 421, 427, 428, 462, 471, 568 theorem
probability space . . . . . . . . . . . . . 703, 717, 718, 719 of algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372, 711
finitely generated of calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 647
ideal . . . . . . . . . . . . . . . . . . . . . . . . . . 593, 603, 604, 618 of number theory . . . . . . . . . . . . . . . . . . . . . 377, 518
vector space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 714 on subresultants . . . . . . . . . . . . . . . . . 327, 329, 332
Fish, Daniel W. . . . . . . . . . . . . . . . . . . . . . . . . . . . 540, 728 Fürer, Martin . . . . . . . . . . . . . . . 222, 244, 245, 247, 745
Fitch, John . . . . . . . . . . . . . . . . . . . . . . 740, 747, 750, 757
Fitchas, Noaï . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 619, 744 Galileo Galilei . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 502
Flaccus, Aules Persius . . . . . . . . . . . . . . . . . . . . 699, 729 Gallagher, Patrick Ximenes . . . . . . . . . . . . . . . 466, 745
Kedlaya, Kiran Sridhara . . . . 339, 405, 407, 408, 420, Kraft, Leon Gordon, Jr. . . . . . . . . . . . . . . . . . . . 307, 752
751 Kraïtchik, Maurice Borisovitch . . 376, 540, 567, 727,
Keller, Carsten . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 728, 752
Keller, Wilfrid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 Krajíček, Jan . . . . . . . . . . . . . . . . . . . . 697, 736, 739, 752
Keller-Gehrig, Walter . . . . . . . . . . . . . . . . . . . . . 352, 751 Krandick, Werner Johannes . . . . . . . . . . . . . . . . . . . . 6, 7
Kelley, Colin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 Kronecker, Leopold . . . . 28, 132, 137, 197, 247, 353,
Kempfert, Horst . . . . . . . . . . . . . . . . . . . . . . 417, 466, 751 465, 725, 742, 752
Kepler, Johannes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 622 substitution . . . . . . . . . . . . . . 245, 246, 254, 494, 501
Kerber, Adalbert . . . . . . . . . . . . . . . . . . . . . . . . 7, 698, 736 Krummel, Volker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
kernel of a homomorphism, ker . 105, 349, 383, 401, Kruppa, Alexander . . . . . . . . . . . . . . . . . . . . . . . . 542, 751
704, 706, 709, 714 Krylov, Alekseı̆ Nikolaevich (Krylov Alekse i
Kerschensteiner, Georg . . . . . . . . . . . . . . . . . . . . . . . . . 747 Nikolaeviq) . . . . . . . . . . . . . . . . . . . . . . . 353, 753
key subspace . . . . . . . . . . . . . . . . . . . . . 341, 346, 347, 355
Diffie-Hellman ∼ exchange . . . see Diffie-Hellman Ku, Yu Hsui . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131, 753
in a cryptosystem . 16, 18, 505, 509, 573, 573–582 Küchlin, Wolfgang Wilhelm . . . . . . . . . . 736, 742, 746,
private ∼ . . . . . . . . 17, 504, 509, 575, 576–579, 582 748–751, 760, 761, 765
public ∼ . . . . . . . . 17, 503, 509, 575, 577–579, 582 Kuhnert, Martina Ariane . . . . . . . . . . . . . . . . . . . . . . . 7, 8
tonal ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85, 86 Kühnle, Klaus . . . . . . . . . . . . . . . . . . . . . . . 616, 617, 753
al-Khwārizmī, Abū Jaʿfar Muḥammad bin Mūsā . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68, 88, 90,
Kummer, Ernst Eduard . . . . . . . . . . . . . . . . . . . . . . . . . . 514
. . Kunerle, Jens . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Kung, Hsiang Tseng . . . . . . . 286, 353, 354, 738, 753
256, 286, 726, 727, 741, 744, 762
Kipling, Joseph Rudyard . . . . . . . . . . . . . . . . . . 644, 729 Kurowski, Scott . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 517
Kirchhoff, Gustav Robert . . . . . . . . . . . . . . . . . . . . . . . 728 Kvashenko, Kirill Yur’evich (Kvaxenko Kirill
Kirkpatrick, David Galer . . . . . . . . . . . . . . . . . . 353, 751 rьeviq) . . . . . . . . . . . . . . . . . . . . . . . . . . . 641, 735
Kiyek, Karl-Heinz . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Klapper, Andrew Manoch . . . . . . . . . . . . . . . . . . . . . . . . . 7 Lafon, Jean-Claude . . . . . . . . . . . . . . . . . . . . . . . 671, 753
Klein, Felix . . . . . . . . . . . . . . . . . . . . . . . 25, 358, 586, 727 Lagally, Klaus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 442 Lagarias, Jeffrey Clarke . . . . 443, 505–507, 509, 580,
Kleinjung, Thorsten . . . . . . . . . . . . . . . . . . . . . . . 542, 751 744, 753
Klinger, Leslie Stuart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 Lagrange (la Grange), Joseph Louis, Comte de . . 90,
knapsack 91, 131, 419, 590, 728, 753
cryptosystem . . . . . . . . . . . . . . . . . . . . . . . . . . . 503, 509 interpolant . . . . . 101, 102, 105, 107, 131, 133, 246,
problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503 249, 427
Kneller, Sir Godfrey, Baronet . . . . . . . . . . . . . . . . . . . 725 interpolation . . . . 18, 100, 101, 102, 107, 118, 134,
Knopfmacher, Arnold . . . . . . . . . . . . . . . . . . . . . 419, 752 299, 303, 739
Knopfmacher, John Peter Louis . . . . . . . . . . . 419, 752 multiplier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 621
Knörrer, Horst . . . . . . . . . . . . . . . . . . . . . . . . . . . . 568, 738 theorem . . . 89, 212, 412, 415, 518, 519, 537, 538,
Knuth, Donald Ervin . . . . 7, 8, 25, 40, 61, 62, 88, 90, 562, 704, 712, 714
247, 286, 308, 332, 417, 505, 531, 567, 571, 669, Lakshman, Yagati Narayana . . . . . 498, 618, 744, 745,
670, 717, 720, 724, 747, 752 751, 753
Koblitz, Neal . . . . . . . . . . . . . . . . . . . 531, 568, 580, 752 Lalande, Joseph-Jérôme Lefrançais de . . . . . . . . . . . 728
von Koch, Niels Fabian Helge . . . . . . . . . . . . . 287, 752 LaMacchia, Brian A. . . . . . . . . . . . . . . . . . . . . . . 353, 753
snowflake . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278, 287 λ, Carmichael function . . . . . . . . . . . . . . . . . . . . . . . . . . 535
Koepf, Wolfram . . . . . . . . . . . . . . . . . 670, 671, 697, 752 λ, length of an integer . . . . . . . . . . . . . . . . . . . 30, 53, 142
Kohel, David Russell . . . . . . . . . . . . . . . . . . . . . . . . . . . . 749 Lambe, Larry Albert . . . . . . . . . . . . . . . . . . . . . . . . 21, 753
Kolaitis, Phokion-Gerasimos (Κολα´ϊτης, Lambert, Johann Heinrich . . . . . . . . . . . . . . . . . . . 82, 753
Φωκίων-Γεράσιμος) . . . . . . . . . . . . . . . . . . . . . . . 760 Lamé, Gabriel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61, 753
Kollár, János . . . . . . . . . . . . . . . . . . . . . . . . . 618, 735, 752 theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61, 66
Kollberg, Lennart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 516 Lamport, Leslie B. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Kolmogorov, Andreı̆ Nikolaevich (Kolmogorov Lanczos, Cornelius . . . . . . . . . . . . . . . . . . . . . . . . 353, 754
Andre i Nikolaeviq) . . . . . . . . . . . . . . . . . . 247 algorithm . . . . . . . . . . . . . . . . . . . . . 353, 741, 742, 758
Kondo, Shigero . . . . . . . . . . . . . . . . . . . . . . . . . 82, 90, 767 Landau, Edmund Georg Hermann . . . . 165, 586, 724,
Koornwinder, Tom Hendrik . . . . . . . . . . . . . . . . . . . . . . . 7 748, 752, 754
Körner, Heiko . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 inequality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165, 166
Korselt, Alwin Reinhold . . . . . . . . . . . . . . . . . . 532, 752 Landau, Susan Eva . . . . . . . . . . . . . . . . . . . 576, 581, 752
criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 532 Landrock, Peter . . . . . . . . . . . . . . . . . . . . . . . . . . . 532, 742
Kotsireas, Ilias Sotirios (Κοτσιρέας, ᾿Ηλίας Landry, Fortuné . . . . . . . . . . . . . . . . . . . . . . . . . . . 542, 754
Σωτηρίου) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 Lang, Serge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 498, 754
Kovalevskaya, Sof’ya Vasil’evna (Sonya Kowalewski, Lange, Tanja . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421, 754
Kovalevska Sofь Vasilьevna) . . 726 Laplace (la Place), Pierre Simon, Marquis de . . . 294,
Koy, Henrik . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 497, 752 432, 724, 725, 727, 728, 754
Kozen, Dexter . . . . . . . . . . . . . 576, 581, 619, 736, 752 expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159, 716
Larson, Richard Gustavus . . . . . . . . . . . . . . . . . 532, 735 Levin, Leonid Anatol’evich (Levin Leonid
Las Vegas Anatolьeviq) . . . . . . . . . . . . . . . . . . . . . . 724, 755
algorithm . . . . . . . . . . . 161, 198, 402, 470, 471, 724 Lewin, Daniel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199, 755
Turing machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 722 lexicographic order . . . 596, 598, 608, 615, 694, 695
lattice . . . . 3, 286, 434, 473, 473–501, 504, 506–508, Leykin, Anton (Le i kin, Anton Gennadi i oviq)
573, 712 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 748
dimension of a ∼ . . . . . . . . . . . . . . . . . . . . . . . . 474, 480 Leyland, Paul . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
norm of a ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 473, 474 Li, Gang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352, 740
Laue, Reinhard . . . . . . . . . . . . . . . . . . . . . . . . . . . 698, 736 Lickteig, Thomas Michael . . . . . . 184, 199, 332, 755
Lauer, Daniel (Daniel Reischert) . . 7, 198, 279, 332, LIDIA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20, 279
466, 754, 761 Lidl, Rudolf . . . . . . . . . . . . . . . . . . . . . . . . . . 421, 711, 755
Laurent, Pierre Alphonse, series . . . . . . . . . 91, 94, 768 Lie, Marius Sophus . . . . . . . . . . . . . . . . . . . . . . . 622, 728
law of quadratic reciprocity . . . . . 372, 529, 537, 586 von Lindemann, Carl Louis Ferdinand . . . . . . . 82, 755
Lawrence, Thomas Edward . . . . . . . . . . . . . . . 140, 726 Lindner, Charles Curt . . . . . . . . . . . . . . . . . . . . . 215, 749
Lazard, Daniel . . . . . . . . . . . . . 619, 640, 744, 754, 758 line at infinity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 568
-Rioboo-Trager integration . . . 627, 630, 631, 640, linear
758 algebra . . . . 3, 4, 21, 109, 175, 179, 335–356, 373,
lc, leading coefficient . . . . . . . . 32, 38, 597, 708, 709 401, 420, 475, 557, 703, 713, 714, 715
lcm, least common multiple . . . . . . . . . . . . . . . . . . 46, 57 black box ∼ . . . . . . 335, 340, 346, 352, 404, 407
le Carré, John . . . . . . . . . . . . . . . . . . . . . . . . . . . . . see Carré explicit ∼ . . . . . . . . . . . . . . . . . . . . . . . . 335, 352, 407
leading sparse ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352
coefficient, lc . . . . . . . . . . . . . . 32, 38, 597, 708, 709 code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209, 215
digit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30, 40 combination map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
monomial, lm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 598 congruential generator . . . . . . . . . . . . . 503, 505, 574
principal submatrix . . . . . . . . . . . . . . . . . . . . . 204, 351 Diophantine equation . . . . . . . . . . . 69, 77, 79, 89, 93
term, lt . . . . . . . . 595, 598, 599, 600, 604, 606–608 equation . . . 1, 3, 66, 129, 175, 197, 340, 346, 685
unit, lu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 feedback shift register . . . . . . . . . . . . . . . . . . 341, 342
leaf of a mobile . . . . . . . . . . . . . . . . . . . . . . 306, 307, 308 map . . . . . . . . . . . . . . . . . . . . . 103, 229, 349, 354, 714
leap day . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 473
least softly ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 721
absolute remainder Euclidean Algorithm . . . . . . . 66 subspace . . . . . . . . . . . 209, 210, 280, 714, 715, 768
absolute residue . . . . . . . . . . . . . . . . . . . . . . . . . . 72, 550 system of ∼ equations 1, 120, 131, 136, 183, 214,
common multiple, lcm . . . . . . . . . . . . . . . . . . . . . 46, 57 335–356, 460, 485, 552, 621, 638, 664–666,
Lebesgue (Le Besgue), Victor Amédée . . . . 533, 754 715, 716
Lee, King . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2, 337, 736 sparse ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325, 556
Lee, Lin-Nan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 509, 755 linearly
van Leeuwen, Jan . . . . . . . . . . . . . . . 750, 754, 764, 765 convergent Newton iteration . . . . . . . . . . . . . . . . . . 291
Legendre (le Gendre), Adrien Marie . . 198, 372, 418, dependent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 714, 717
420, 466, 468, 516, 533, 569, 728, 754, 766 independent . . . . . . . . . . . . . . . . . . . . . . . . . . . . 714, 715
symbol . . . . . . . . . . . . . . . . . . . . . . . . . . . . 529, 537, 562 recurrent sequence . . 340, 341, 343–347, 349, 353,
Lehmann, Daniel Jean . . . . . . . . . . . . . . . . . . . . 537, 754 355
primality test . . . . . . . . . . . . . . . . . . . . . . . . . . . 537, 538 van Lint, Jacobus Hendricus . . . . . . . . . . . . . . . 215, 755
Lehmer, Derrick Henry 332, 530, 542, 569, 738, 754 Liouville, Joseph . 472, 623, 640, 728, 729, 755, 764,
Leibniz, Gottfried Wilhelm, Freiherr von . 26, 88, 89, 796
96, 197, 219, 294, 512, 513, 531, 622, 640, 726, L IP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
754, 756 Lipson, John David . . . . . . . . . . . . . . . . 6, 247, 306, 755
rule . . . . . . . . . . . . . . . . . . . . . 266, 290, 425, 623, 647 Lipton, Richard Jay . . . . . . . . . . . . . . . . . . . . 88, 198, 742
Leighton, Ralph . . . . . . . . . . . . . . . . . . . . . . . . . . . 727, 728 Lisoněk, Petr . . . . . . . . . . . . . . . . . . . . . . . . . 670, 671, 755
Leiserson, Charles Eric . . . . . . . . . . . . . . . . 41, 368, 741 Little, John Brittain . . . . . . . . . . . . . 614, 617, 618, 741
Lemmermeyer, Franz . . . . . . . . . . . . . . . . . . . . . 724, 754 Liu, Zhuojun . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376, 727
length LLL algorithm . . . . . . . . see basis reduction algorithm
of a code . . . . . . . . . . . . . . . . . . . . . . . . . . 209, 210–212 Lloyd, Daniel Boone, Jr. . . . . . . . . . . . . . . . . . . 419, 755
of an integer, λ . . . . . . . . . . . . . . . . . . . . . . . 30, 53, 142 Lloyd, Daniel Bruce . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Lenstra, Arjen Klaas . . 279, 465, 474, 475, 497, 506, Lloyd, Stuart Phinney . . . . . . . . . . . . . . . . . . . . . 421, 763
531, 534, 542, 569, 741, 742, 751, 754, 755 lm, leading monomial . . . . . . . . . . . . . . . . . . . . . . . . . . . 598
Lenstra, Hendrik Willem, Jr. . . . . 419, 421, 441, 474, ln, natural logarithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 768
475, 497, 506, 531–534, 541, 542, 557, 558, 563, Lobachevskiı̆, Nikolaı̆ Ivanovich (Lobaqevski i
565, 568, 569, 724, 735, 736, 754, 755, 760, 764 Nikola i Ivanoviq) . . . . . . . . . . . . . . . . . . . 374
elliptic curve factoring method . . see elliptic curve Lobo, Austin . . . . . . . . . . 353, 404, 405, 407, 747, 751
Leonard, Douglas Alan . . . . . . . . . . . . . . . . . . . 215, 749 local area network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Levelt, Antonius Henricus Maria (Ton) . 6, 698, 735, Locke, John . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208, 726
742, 755 log, binary logarithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 768
Metzner, Torsten . . . 7
Meyer, Albert Ronald da Silva . . . 616–618, 756
Meyer auf der Heide, Friedhelm . . . 744
Meyer Eikenberry, Shawna . . . 533, 757
Meyn, Helmut . . . 6–8
Micali, Silvio . . . 750
Mierendorff, Eva . . . 7
Mignotte, Maurice . . . 146, 198, 421, 757
    bound . . . 141, 164, 166, 167, 171, 184, 194, 196, 198, 434, 436, 438, 455, 470, 488, 490, 492
Mihăilescu, Preda . . . 7
Mikeladze, Sh. E. . . . 132
millennium bug . . . 84
Miller, Gary Lee . . . 532, 533, 535, 736, 755, 757
Miller, Raymond Edward . . . 744, 751
Miller, Victor Saul . . . 580, 757
minimal
    distance of a code . . . 210, 211–213, 215
    Euclidean function . . . 62, 63
    Gröbner basis . . . 611, 620, 621
    polynomial
        of a matrix . . . 343, 346, 355, 404, 716
        of a sequence . . . 343, 344–351, 354–356, 404, 407
        of an algebraic element . . . 152, 175, 203, 210, 211, 343, 354, 415–417, 663, 710, 711
Minkowski, Hermann . . . 473, 496, 586, 757
Mishra, Bhubaneswar (Bud) . . . 619, 745
Mitchell, Joan Laverne . . . 368, 760
Mittag-Leffler, Gustav Magnus (Gösta) . . . 726
mixed-radix representation . . . 132, 134
mobile . . . 306, 307
    stochastic ∼ . . . 306, 307, 308
Möbius, August Ferdinand
    function, µ . . . 410, 429, 508
    inversion . . . 410, 413, 429
mod, congruent modulo . . . 69, 706
mod, residue class . . . 71, 72, 398, 706
modular
    algorithm . . . 3, 19, 97, 97–139, 152, 161, 183, 192, 339, 408, 433, 444, 505, 517, 523, 525, 526
        big prime ∼ . . . 97, 100, 152, 161, 289, 444, 460, 527
        prime power ∼ . . . 97, 99, 100, 198, 271, 433, 460, 470, 528, 536
        small primes ∼ . . . 97, 98–100, 112, 137, 247, 310, 444, 460, 467, 470, 471, 528, 536
    arithmetic . . . 69, 70, 71, 132, 282, 289, 709
    composition . . . 338, 339, 354, 356, 407, 408, 427
        fast ∼ . . . 338, 339, 405
    determinant . . . 109, 113, 132, 525
        big prime ∼ . . . 110, 113, 168, 460, 526
        small primes ∼ . . . 112, 113, 136, 168, 189, 460, 526, 528, 536, 537
    EEA
        big prime ∼ . . . 189, 190, 195, 206
        bivariate ∼ . . . 189
        small primes ∼ . . . 188, 189, 190, 195, 332, 460, 526, 528, 537
    exponentiation . . . 75
    factorization . . . 436, 453, 458, 489
        big prime ∼ . . . 433, 435, 436, 467, 526, 528, 529
        prime power ∼ . . . 435, 453, 457, 466, 467, 526, 528, 529
    gcd . . . 141, 146, 152, 158, 161, 163, 164, 190, 196, 198, 202, 313, 681
        big prime ∼ . . . 162, 166, 168, 169, 171, 193–196, 206, 411, 460, 526, 529
        bivariate ∼ . . . 141, 162, 168, 203
        small primes ∼ . . . 168, 169, 170, 171, 194–196, 203, 206, 460, 526, 528
    inversion . . . 69, 73, 76, 77, 111, 115, 124, 138, 163, 263, 265, 268
    multiplication . . . 73, 243, 262, 283, 460, 461, 536
module . . . 342, 349, 354, 500
    cyclic ∼ . . . 349, 350
    Z-∼ . . . 349, 473
Moenck, Robert Thomas . . . 247, 286, 306, 332, 421, 671, 673, 737, 757
de Moivre, Abraham . . . 353
Möller, Hans Michael . . . 199, 618, 757
monic
    Euclidean Algorithm . . . 57, 62, 184, 186, 187, 192, 196, 197, 199, 630, 631
    polynomial . . . 32, 35, 40, 56, 59, 60, 708, 710
Monien, Burkhard . . . 744, 756
Monier, Louis Marcel Gino . . . 532, 533, 757
monomial . . . 591, 596, 597, 601, 602, 606, 611, 620, 709
    ideal . . . 601, 602, 603, 620
    leading ∼ , lm . . . 598
    order . . . 595, 596, 597–599, 603–605, 610, 620, 621
Montaigne, Michel Eysquem, Seigneur de . . . 699, 729
Monte Carlo
    algorithm . . . 161, 198, 428, 724
    Turing machine
        one-sided ∼ . . . 722
        two-sided ∼ . . . 721
Montgomery, Peter Lawrence . . . 280, 287, 288, 308, 353, 542, 569, 741, 751, 757, 758
Moore, Eliakim Hastings . . . 88, 758
Mora, Ferdinando Teo . . . 618, 619, 739, 744, 746, 747, 757
Morain, François . . . 6
Moreno-Socías, Guillermo . . . 8
Morenz, Robert . . . 6
De Morgan, Augustus . . . 44, 68, 96, 622, 726, 729
Morgenstern, Jacques . . . 619, 744
Moritz, Robert Edouard . . . 729, 758
Morrison, Michael Allan . . . 541, 542, 568, 758
Moses, Joel . . . 20, 198, 466, 758
Motwani, Rajeev . . . 88, 198, 758
Moura, Arnaldo Vieira . . . 759
Mourrain, Bernard . . . 698, 743
µ, Möbius function . . . 410, 429, 508
Mulders, Thom . . . 199, 501, 640, 758
Mullen, Gary Lee . . . 745, 758, 766
Müller, Daniel . . . 7
Müller, Dirk . . . 6
Müller, Eva-Maria . . . 7
Müller, Olaf . . . 6–8, 461, 737
Mullin, Ronald Cleveland . . . 88, 758
multidegree of a polynomial, mdeg . . . 597, 598
multifactor Hensel lifting . . . 450
multiple polynomial quadratic sieve . . . 567
multiplication
    by scalars . . . 346, 348, 351, 713, 714
    Cantor ∼ . . . 281, 282, 287
    FFT ∼ . . . see Fast Fourier Transform
    matrix ∼ . . . 43, 335, 337–339, 411, 715, 720
        exponent . . . 337
    modular ∼ . . . 73, 243, 262, 283, 460, 461, 536
    of integers . . . 37, 227, 243, 247, 283, 284, 335, 337, 460
        fast ∼ . . . 221–254
    of polynomials . . . 35, 36, 39, 221–254, 280–282, 284, 285, 319, 323, 335, 460
        classical ∼ . . . 34
        fast ∼ . . . 221–254
        Schönhage and Strassen ∼ . . . see Schönhage
    time, M . . . 244, 245, 247, 254, 257, 381
multiplicative group . . . 93, 105, 133, 211, 212, 250, 280, 384, 535, 578, 580, 703, 704, 713
multiplicity . . . 200, 389–392, 394, 419, 440, 460, 470, 552, 560, 630, 656, 692, 711
multipoint evaluation . . . see evaluation
multiprecision integer . . . 29, 30–32, 34, 37, 41, 82, 283, 286
multivariate . . . see also bivariate
    division with remainder . . . 595, 598, 599, 600, 604, 605
    factorization . . . 493, 497, 501
    gcd . . . 190, 198, 202, 466, 496, 501
    Newton iteration . . . 449, 450
    polynomial . . . 3, 4, 21, 60, 101, 191, 198, 199, 254, 378, 493, 586, 591–621, 709, 768
    quotient . . . 600
    remainder . . . 599, 600, 601, 608, 610
muMATH . . . 20
Mumford, David Bryant . . . 618
Munro, James Ian . . . 306, 737
MuPAD . . . 8, 20
musical
    interval . . . 84, 85, 86, 88, 507
    scale . . . 11, 69, 84
    theory . . . 84, 85
Musil, Robert . . . 256, 727
Musser, David Rea . . . 465, 751, 758
Myerson, Gerald . . . 6
N, set of nonnegative integers . . . 768
NAG . . . 20
Najafi, Seyed Hesameddin . . . 7
Najfeld, Igor . . . 698, 748
Napoléon I. Bonaparte . . . 10, 502, 725, 728
Nash, Stephen Gregory . . . 741
Näslund, Mats . . . 580, 748, 758
Newton, Humphrey . . . 218
Newton, Sir Isaac . . . 0, 3, 24, 28, 61, 197, 203, 218, 219, 256, 286, 290, 358, 372, 374, 512, 622, 641, 725–727, 745, 758
    expansion . . . 671
    formula . . . 267, 290, 291
    interpolation . . . 103, 134, 135, 671
    inversion . . . 259, 261, 262, 268–270, 275, 282, 286–289
    iteration . . . 3, 90, 100, 101, 218, 219, 221, 259, 268, 257–292, 295–310, 444, 448, 450, 451, 581, 623
        linearly convergent ∼ . . . 291
        multivariate ∼ . . . 449, 450
        numerical ∼ . . . 262, 271
        p-adic ∼ . . . 268, 271, 272, 290, 292
Nguyen, Phong Quang . . . 509, 580, 758
Nicely, Thomas Ray . . . 83, 758
Niederreiter, Harald Günther . . . 407, 420, 421, 428, 509, 711, 745, 747, 755, 758, 759
Niesi, Gianfranco . . . 619, 747
Nilsson, Bengt Ola Peter . . . 8
Nöcker, Michael . . . 6, 7, 88, 461, 580, 737, 746
Noether, Amalie Emmy . . . 586, 604, 750
Noetherian ring . . . 604
non-Archimedean valuation . . . 274
non-Euclidean geometry . . . 25, 373, 374
nonresidue . . . 418
nonscalar model of computation . . . 286, 324
nonsingular
    curve . . . 559, 568, 571
    matrix . . . see matrix
norm . . . 419, 473, 707, 717
    Euclidean ∼ , || · ||2 . . . 12, 157, 164, 473, 474, 480, 487, 497, 717, 768
    max-∼ , || · ||∞ . . . see maximum norm
    of a lattice . . . 473, 474
    one-∼ , || · ||1 . . . 165, 717, 768
    q-∼ , || · ||q . . . 716, 717
normal
    basis . . . 76, 580
    degree sequence . . . see degree sequence
    field extension . . . 398
    form . . . 56, 57, 59, 60, 63, 64, 150, 191, 200
        Hermite ∼ . . . 89, 352, 498, 499
        matrix ∼ . . . 352
        Smith ∼ . . . 89
normalized . . . 57, 59, 63, 147, 148, 149
    polynomial . . . 57, 144, 150, 151, 152, 163, 167
Novalis (Friedrich Leopold Freiherr von Hardenberg) . . . 68, 726, 729, 734
Novocin, Andrew . . . 497, 748
NP . . . 215, 474, 496, 503, 504, 509, 576, 579, 616, 722, 723
    co-∼ . . . 722, 723
NTL . . . 3, 8, 20, 193–196, 279, 283, 284–286, 461–466, 497
Nullstellensatz
    Hilbert ∼ . . . 586, 595, 617, 618, 621, 736, 761
    proof system . . . 679, 697, 698
number
    field
        algebraic ∼ . . . 279, 378, 473, 533
        sieve . . . 541, 542, 569
    theory . . . 529, 530, 533, 724
        analytic ∼ . . . 508, 523, 532, 533, 652
        computational ∼ . . . 3, 4, 517–571, 586
        fundamental theorem of ∼ . . . 377, 518
numerical
    analysis . . . 1, 32, 118, 119, 121, 132, 259, 621
    Newton iteration . . . 262, 271
    part . . . 625, 642
    root . . . 456, 466
Razborov, Aleksandr Aleksandrovich (Разборов Александр Александрович) . . . 697, 739, 761
reachability problem . . . 680, 681, 697
Recio Muñiz, Tomás Jesús . . . 6, 619, 749
Recorde, Robert . . . 44, 502, 726, 728, 729, 796
rectangular matrix multiplication . . . 353
recurrence . . . 1, 349, 353, 354, 653, 669, 684
recursion order . . . 343, 344, 345, 354, 355
recursively enumerable . . . 89
REDUCE . . . 20
reduced
    basis . . . 286, 478, 479, 480, 482, 488, 491, 497, 498, 504, 506, 508
    element . . . 611
    Gröbner basis . . . see Gröbner basis
    polynomial remainder sequence . . . 199
reducible . . . 707
refutation . . . 678, 679
Reichel, Horst . . . 756
Reid, Constance . . . 587, 761
Reif, John Henry . . . 619, 736
Reischert, Daniel . . . see Lauer
Reisig, Wolfgang . . . 697, 761
Reitwiesner, George Walter . . . 82, 761
remainder, rem . . . 38, 40, 41, 46, 47, 261, 323, 600, 707
    division with ∼ . . . 2, 26, 37, 38, 39, 41, 45, 51, 59–62, 100, 131, 257, 261, 262, 282, 283, 314, 407, 445
    in the Euclidean Algorithm . . . 48, 52, 57, 58, 59, 61, 197, 199, 313, 324, 331, 630, 631
    multivariate ∼ . . . 599, 600, 601, 608, 610
Remmers, Harry . . . 419, 755
Renegar, James . . . 619, 761
repeated squaring . . . 17, 75, 76, 77, 88, 93, 264, 291, 381, 385, 389, 392, 403, 405, 407, 424, 519, 521, 537
representative
    canonical ∼ . . . 398
    system of ∼ s . . . 72, 706, 709
    symmetric ∼ . . . 72, 110, 436
repunit . . . 530, 534, 569
Research Institute for Symbolic Computation (RISC) . . . 618
residue
    class, mod . . . 71, 72, 398, 706
    class ring . . . 71, 72, 75, 92, 93, 163, 262, 326, 327, 398, 706, 768
resolution . . . 678
resultant, res . . . 15, 155, 157, 141–207, 327, 331–333, 434, 435, 453, 615, 619, 628, 630, 635, 641, 643, 662, 663, 694
reversal, rev . . . 203, 258, 262, 287, 343, 424
Reynaud, Antoine André Louis . . . 61, 761
Rhind Papyrus . . . 82
Richardson, Daniel . . . 640, 761
Richmond, Lawrence Bruce . . . 419, 421, 759
te Riele, Herman(us) Johannes Joseph . . . 508, 533, 542, 751, 756, 759
Riemann, Georg Friedrich Bernhard . . . 373, 533, 761
    Hypothesis . . . 508, 533, 748, 749, 757
        Extended ∼ , (ERH) . . . see Extended Riemann
    zeta function, ζ . . . 62, 221, 508, 533, 652, 684, 756, 759
right ideal . . . 705
rigid conformation of cyclohexane . . . 12, 15, 16, 698
ring . . . 32, 705, 711
    characteristic of a ∼ . . . 394, 395, 397, 415, 460, 558, 561, 581, 623, 626, 630, 658, 665, 710, 712
    commutative ∼ . . . 705, 706, 709, 711, 713
    factorial ∼ . . . 707
    homomorphism . . . 104, 107, 133, 295, 302, 705, 706, 709
        canonical ∼ . . . 72, 104, 110, 706, 709
    invariant ∼ . . . 618
    isomorphism . . . 705
    Noetherian ∼ . . . 604
    of algebraic integers . . . 707, 708
    of constants . . . 624
    of polynomials . . . 2, 708, 768
    operation . . . see arithmetic operation
Rink, Friedrich Theodor . . . 727
Rioboo, Renaud . . . 627, 630, 631, 640, 754, 758
Risch, Robert Henry . . . 640, 641, 761
    differential equation . . . 641, 738, 742, 750
rising factorial . . . 647, 670, 673, 768
Ritscher, Stephan . . . 617, 756
Ritt, Joseph Fels . . . 619, 640, 745, 762
Rivest, Ronald Linn . . . 16, 41, 368, 509, 576, 740, 741, 762
Robbiano, Lorenzo . . . 617, 619, 742, 747
robot . . . 591, 592
    kinematics . . . 615, 698
Rodger, Christopher Andrew . . . 215, 749
Rogers, Leonard James, -Ramanujan identity . . . 671, 685, 743, 759
Rolletschek, Heinrich Franz . . . 132, 751
The Rolling Stones . . . 516, 728
Roman, Steven . . . 669, 762
Rónyai, Lajos . . . 421, 762
root
    finding . . . 132, 219, 257, 273, 286, 392, 456, 457, 460, 466, 525, 526
        over finite fields . . . 377, 392, 418, 428
    integral ∼ . . . 392, 393, 635, 641
    of an integer . . . 271, 460
    of unity . . . 19, 227, 221–254, 262, 373, 384, 412
        primitive ∼ . . . 211, 209–215, 227, 221–254, 296, 333, 340, 352, 362, 412, 412–417, 536
    rational ∼ . . . 456, 466
Rosen, Frederic . . . 726, 762
Rosenkranz, Karl . . . 727
Rosser, John Barkley . . . 527, 532, 536, 750, 762
Rota, Gian-Carlo . . . 669
Rothstein, Michael . . . 640, 641, 762
    and Trager integration . . . 627, 640
rounding error . . . 32
routing . . . 18, 198
Rowland, John Hawley . . . 199, 762
Roy, Marie-Françoise . . . 184, 199, 332, 619, 749, 755
Strehl, Volker . . . 6, 7, 670, 671, 755, 760
string matching . . . 91
strong
    liar . . . 523, 532
    pseudoprimality test . . . 520, 521, 523, 532, 536
    witness . . . 523, 532, 534
Sturm, Jacques Charles François . . . 94, 332, 748, 764
    chain . . . 95, 752, 765
    theorem . . . 95, 198
Sturmfels, Bernd . . . 697, 743
subdeterminant . . . 688, 689, 694
subfield . . . 94, 641, 710, 711, 712
subgroup . . . 373, 704, 768
submodule . . . 348
subproduct tree . . . 296, 297, 298, 302
subresultant . . . 3, 33, 45, 141, 143, 152, 164, 181, 178–207, 313, 327–332, 616, 630, 681
    fundamental theorem on ∼ s . . . 327, 329, 332
    polynomial remainder sequence . . . 199
subring . . . 641, 706
subset sum
    cryptosystem . . . see knapsack cryptosystem
    problem . . . 503, 504, 509, 576
subspace
    Krylov ∼ . . . 341, 346, 347, 355
    linear ∼ . . . 209, 210, 280, 714, 715, 768
substitution . . . 623
Sudan, Madhu . . . 215, 735
summation . . . 3, 101, 645–675, 681
    hypergeometric ∼ . . . 3, 641, 660, 665, 658–669, 671, 674, 683, 685
    indefinite ∼ . . . 646, 683
    of polynomials . . . 3, 645, 649, 650, 658
Sun, Xiaoguang . . . 131, 753
Sun-Tsŭ . . . 131
supercomputer . . . 1, 18, 83, 575
superincreasing sequence . . . 504
superlinearity . . . 245
surjective . . . 704, 713, 714
Svoboda, Antonín . . . 132, 764
Swan, Richard Gordon . . . 207, 332, 764
Swift, Jonathan . . . 702, 729
Swinnerton-Dyer, Sir Henry Peter Francis . . . 465
    polynomial . . . 434, 441, 442, 443, 465, 467
Sylvester, James Joseph . . . 96, 197, 199, 294, 334, 726, 727, 736, 755, 765
    matrix, Syl . . . 155, 158, 159, 181, 197, 199, 201, 204, 205, 335, 340, 435, 470
symbolic-numeric computation . . . 41
symmetric
    cryptosystem . . . 16, 575, 578
    group . . . 136, 442, 465, 705
system of representatives . . . 72, 706, 709
    symmetric ∼ . . . 72, 110, 436
Szabó, Nicholas Sigismund . . . 132, 765
T, transpose . . . 715
tableau . . . 678
Takahashi, Daisuke . . . 82
Tamura, Yoshiaki . . . 82
Tanaka, Richard Isamu . . . 132, 765
tangent function . . . 123, 124
Taniyama, Yutaka, -Weil conjecture . . . 514
Tannery, Paul . . . 729, 744
Tarry, Gaston . . . 531, 765
Tarski, Alfred (Tajtelbaum) . . . 619, 748, 765
taxi-cab number . . . 535
Taylor, Brook . . . 286, 746, 765
    coefficient . . . 114
    expansion . . . 100, 114, 121, 113–131, 259, 264–278, 286, 289, 290, 292, 353, 623, 671
        generalized ∼ . . . 264, 289
    polynomial . . . 123
    series . . . 123
Taylor, Richard . . . 514, 765
Teichmüller, Oswald . . . 290
telescoping . . . 646
Tenenbaum, Gérald . . . 536, 765
te Riele, Herman(us) Johannes Joseph . . . see Riele
term
    ratio . . . 659, 660, 663, 664, 667, 674, 683
    rewriting . . . 591, 618
Thatcher, James Winthrop . . . 744, 751
Theaitetus (Θεαίτητος) . . . 24
Theiwes, David . . . 7
theorem in a proof system . . . 677
Theoretical Computer Science . . . 21
theory of a proof system . . . 677
Thijsse, Gérard Philip Antoine . . . 727
Thomé, Emmanuel . . . 542, 751
three primes FFT . . . 243, 246, 247, 283, 284, 286
3-adic FFT . . . 242, 247, 252, 253
Thue, Axel . . . 132, 750, 765, 767
Tijdeman, Robert . . . 760
van Tilborg, Henricus Carolus Adrianus (Henk) . . . 215, 737
Timofeev, Andrey . . . 542, 751
Tiwari, Prasoon . . . 498, 737
Toeplitz, Otto, matrix . . . 202, 332, 335, 353, 738
tonal key . . . 85, 86
Toom, Andrei Leonovich (Тоом Андрей Леонович) . . . 247, 765
total
    degree . . . 157, 172, 176, 493, 597, 616, 689, 709
    order . . . 595, 596, 602, 603, 620
Trabb Pardo, Luis Isidoro . . . 567, 752
trace . . . 382, 419
traditional
    Euclidean Algorithm . . . 47, 51, 54, 57, 79, 94, 95, 99, 144, 184, 185, 187, 197, 199, 329
    Extended Euclidean Algorithm (EEA) . . . 48, 49, 51, 52, 54, 57, 59, 60, 64, 65, 80, 94, 111, 125, 184, 186, 189, 205, 313, 317, 325, 710
Trager, Barry Marshall . . . 466, 496, 498, 627, 630, 631, 640, 751, 758, 765
transcendental . . . 82, 90, 710
transmission
    channel . . . 16, 209
    error . . . 209
    rate . . . 210
transposition principle . . . 340, 353
trapdoor function . . . 575
Traub, Joseph Frederick . . . 197, 199, 332, 738, 761
Traverso, Carlo . . . 619, 734, 747, 765