Topic 1 Handout

The document outlines a course on Numerical Mathematics at Maastricht University, focusing on computer arithmetic and algebraic equations. It includes topics such as numerical solutions of differential equations, polynomial interpolation, and numerical linear algebra, with a grading system based on exams and homework. The course emphasizes the importance of individual work and adherence to academic regulations.

Numerical Mathematics

Computer Arithmetic & Algebraic Equations


Pieter Collins

Department of Knowledge Engineering


Maastricht University
pieter.collins@maastrichtuniversity.nl

KEN1540, Block 5, April-May 2021

Organisation 2
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Course . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Regulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Homeworks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Computers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Online . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

Mathematical Preliminaries 9
Calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Rate of convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

Computer Arithmetic 12
Matlab arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Decimal expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Approximations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Significant figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Scientific notation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Representations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Binary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Floating-point . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Machine epsilon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Matlab floats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Philosophy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

Errors in Scientific Computing 27


Sources of error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Absolute/relative error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Error estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Rounded arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Fixed/floating point. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Accuracy/precision. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Working guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

Reducing Errors in Scientific Computing 39
Subtraction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
Quadratic formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
Nested form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

Solutions of Equations of One Variable 48


Algebraic equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Existence of solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
The bisection method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
The secant method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
Stopping criteria. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
Newton method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Rounding effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Parametrised equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Systems of equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
Brent’s method. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

Organisation 2 / 76

Introduction

Numerical mathematics deals with methods for the solution of problems in continuous mathematics which can be
implemented on a digital computer.
Typically, use floating-point arithmetic to perform approximate calculations on real numbers.
Based on ideas and techniques from calculus and linear algebra, but yields numerical values for the solution of
specific problems, rather than general formulae.
Important part of data science:
• estimate models from data,
• generate data as predictions from models, and
• compute properties of data directly.
3 / 76

Course

Topics
1. Computer Arithmetic & Algebraic Equations
2. Numerical Solution of Differential Equations
3. Polynomial (and Spline) Interpolation
4. Numerical Integration and Differentiation
5. Least-Squares Approximation
6. Numerical Linear Algebra
Classes Per topic: 2-3h lectures; 3-4h tutorials.
Plus: 2h revision tutorial.
Grading
80% Written exam (with calculator),
20% Homework programming assignments (4×5%).
10% Homework questions (preparation for tutorial).

4 / 76

Regulations

Assignments The graded assignments are individual assignments, and follow standard DKE regulations as
such.

Guidelines:
• You may not receive help solving a graded assignment from anybody else, including working together or
sharing code.
• Any sources (other than the textbook, slides, the Student Portal, and other material presented in-class) must
be referenced.
• You may work with other students to understand the material and on non-graded assignments (and are encouraged to do so)
• If you have previously written code for a related problem together with other students, you should
re-write the code yourself for the graded assignment.
• If you are unsure whether any work you have done together is allowed, you should declare this on your
homework.

5 / 76

Homeworks

Homeworks The homeworks are a vital part of the course!!! There is a very strong correlation between doing
the homeworks and passing the course!!!!!

Preparation You should attempt a significant proportion of the homeworks before the tutorials. Part of the
grade (for DKE students) is based on preparation. This way, we can spend time going over questions which you
find difficult.

Learning This course has a lot of formulae, which may seem hard at first, but don’t panic! With practice, most
of the questions should become routine. But you do really need to put the work in!

6 / 76

Computer use

Tutorials Bring your computer to the tutorial classes!

Matlab You are expected to have access to a computer with Matlab.


Alternatively, you may use a Matlab clone, such as GNU Octave or Scilab.

7 / 76

Online Learning

Instead of giving lectures in class time, I will pre-record lecture snippets.

You should read the slides and watch the snippets before the first class on a topic.

All class-time will be run as tutorial sessions. This will give you the maximum time to ask questions and
receive feedback.

In general, during tutorials, I will answer common questions in a “plenary” session, while the teaching
assistants provide individual help.

Online teaching is new to me (and new-ish to you), so this approach may change if it seems not to be working!

8 / 76

Mathematical Preliminaries 9 / 76

Calculus

• Definition of limit, derivative and integral.

• Differentiation including product and chain rules.

• Integrals of polynomials.

◦ No need to be able to perform complex integration :)


• Intermediate value theorem and mean value theorem.

• We will cover Taylor series later!

10 / 76

Rate of convergence

Positive limits
Write a_n ↘ 0 or a_n → 0⁺ if all a_n ≥ 0 and lim_{n→∞} a_n = 0.

Big-O Notation
If a_n, b_n ↘ 0 as n → ∞, say a_n = O(b_n) if there is a constant C > 0 such that a_n ≤ C b_n for all n.
If f, g ↘ 0 as h → 0, say f = O(g) if there is a constant C > 0 such that f(h) ≤ C g(h) whenever |h| < 1.

Little-o Notation
Say a_n = o(b_n) if lim_{n→∞} a_n/b_n = 0.
Say f = o(g) if lim_{h→0} f(h)/g(h) = 0.

Example The sequence a_n = 2n/(n+3) satisfies |a_n − 2| = 6/(n+3) ≤ 6 × 1/n.
Hence a_n − 2 = O(1/n). Say a_n converges to 2 at rate O(1/n).

Example If f′(x) = 0, then f(x + h) − f(x) = o(h).
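A quick numerical check of the O(1/n) rate for a_n = 2n/(n+3), in Python:

```python
# |a_n - 2| = 6/(n+3), which is at most 6 * (1/n) for every n >= 1.
for n in range(1, 10_000):
    a_n = 2 * n / (n + 3)
    assert abs(a_n - 2) <= 6 / n
```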

11 / 76

Computer Arithmetic 12 / 76

Arithmetic in Matlab

Let’s try doing some simple arithmetic with Matlab:


>> 0.1+0.3+0.6
ans = 1
This is what we expect.
>> 0.6+0.3+0.1
ans = 1.0000
Also as expected. But why this time 1.0000 instead of 1?
Subtract 1 from the answer:
>> (0.6+0.3+0.1)-1
ans = -1.1102e-16
The answer is not exactly 0! But why does this occur??

13 / 76

Computer arithmetic

Try displaying more digits in Matlab:


>> format long
>> 0.6+0.3+0.1
ans = 1.000000000000000
>> (0.6+0.3+0.1)-1
ans = -1.11022302462516e-16
Try using Python:
>>> 0.6+0.3+0.1
0.9999999999999999
>>> (0.6+0.3+0.1)-1
-1.1102230246251565e-16
Now we see that 0.6 + 0.3 + 0.1 is computed to a value different from 1!
Matlab does not display sufficient digits to distinguish the computed value from 1, whereas Python displays
enough digits to read the number back in exactly.
We shall see that the computed value of 0.6 + 0.3 + 0.1 is exactly 1 − 2⁻⁵³.
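This last claim can be checked directly in Python (2⁻⁵³ is half the spacing of doubles at 1, i.e. the gap between 1 and the largest double below it):

```python
s = 0.6 + 0.3 + 0.1
print(s)                  # 0.9999999999999999
print(s == 1 - 2**-53)    # True: exactly one unit-in-the-last-place below 1
print(s - 1)              # -1.1102230246251565e-16, matching the Matlab output
```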

14 / 76

Numbers and representations

Numbers What kinds of numbers are there?


Integers, Rationals, Reals, Complex, ...

Integers What are integers, and how can we describe them?


Positive integers count “how many” objects there are in a finite set.
Decimal “42”, binary “101010₂”, and English “forty-two” are different representations of the same number.
Each of these descriptions means “as many as ............................”.
Even though there are infinitely many integers, we can specify any integer with a finite amount of data.

Real numbers What are real numbers, and how can we describe them?
Positive real numbers measure “how much”, “where”, or “when”.
Representations include symbolic “√2” and decimal “1.414213562373 · · · ”.
Real numbers are uncountable, so we would need an infinite amount of data for a representation capable of
describing all of them!
15 / 76

Decimal expansions of real numbers

Rational Rational numbers have terminating or recurring decimal expansions.

e.g. 1/4 = 0.25, 1/6 = 0.16̇, 1/7 = 0.1̇42857̇ = 0.142857142857 · · · .
— Note that some numbers have two different, but equal representations!
e.g. 0.25 = 0.250̇ = 0.249̇ = 0.249999 · · · .

Irrational Most real numbers are irrational and don’t have a repeating decimal.
e.g. √2 = 1.41421356 · · · , e = 2.718281828 · · · , π = 3.1415926535 · · · .
— But each of the numbers above can be represented by a finite formula
e.g. e = Σ_{n=0}^∞ 1/n! = lim_{n→∞} (1 + 1/n)ⁿ.
— And we can write a program to compute arbitrarily many digits of the decimal expansion!

Uncomputable For “almost all” real numbers, there is no finite description of the decimal expansion!
— Requires “Computing with Infinite Data”. [Now, that’s BIG Data!!]

16 / 76

Decimal approximations to real numbers

Approximation Usually we only require a reasonably good approximation to a real number!

Digits far after the point have a small impact on the value.

Decimal places Approximate real numbers to a finite number of decimal places.

— Round to the nearest representable number.


e.g. π = 3.14159 (5 dp ) = 3.1416 (4 dp ) = 3.142 (3 dp ) = 3.14 (2 dp ).

— Traditionally, round ties (i.e. halves) away from zero.


e.g. 5.45 = 5.5 (1 dp ); −5.45 = −5.5 (1 dp ).

— Don’t round an already-rounded number!


e.g. 5.45 = 5.5 (1 dp ) = 5 (0 dp ) even though 5.5 = 6 (0 dp)! [Sorry]
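These conventions can be reproduced in Python with the standard decimal module; the built-in round() uses round-half-to-even instead, so it is not a faithful model of ties-away-from-zero. A small sketch (the helper dp is ours, not a library function):

```python
from decimal import Decimal, ROUND_HALF_UP

def dp(x, n):
    """Round the decimal string x to n decimal places, ties away from zero."""
    q = Decimal(1).scaleb(-n)   # e.g. n=1 gives the quantum 0.1
    return Decimal(x).quantize(q, rounding=ROUND_HALF_UP)

print(dp('5.45', 1))    # 5.5
print(dp('-5.45', 1))   # -5.5
print(dp('5.45', 0))    # 5  (not 6: never round an already-rounded number)
```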

17 / 76

Significant figures

Significant figures The number of significant figures of an approximation is the number of digits excluding
leading zeros.
π = 3.14159 (to 5 decimal places, 6 significant figures)
α = 0.007 297 352 57 (11 dp , 9 sf)

Zero Note that 0 has no significant figures!

Units The number of significant figures is independent of the unit used.


e.g. The density of gold is 19.32 g cm⁻³ (4 sf, 2 dp) = 19320 kg m⁻³ (4 sf).
Whereas the number of decimal places depends on the units used.
e.g. The density of gold is ρAu = 19.32 g cm⁻³ (2 dp) = 1932? kg m⁻³.

18 / 76

Scientific notation

Scientific Notation Write a number as a value ±m × 10ᵉ where 1 ≤ m < 10 and e is an integer.
e.g. α = 0.007 297 352 57 (11 dp, 9 sf) = 7.297 352 57×10⁻³ (9 sf).

Mantissa and exponent m is the mantissa and e the exponent.

Significant figures The length of the mantissa is the number of significant figures.
e.g. 10200 = 1.020 × 10⁴ (4 sf).

Physical constants Scientific notation is especially useful in physics, where quantities are often very, very
large or small:
e.g. h = 6.62607015×10⁻³⁴ J s
c = 299 792 458 m s⁻¹ = 2.99792458×10⁸ m s⁻¹ = 3.00×10⁸ m s⁻¹ (3 sf)

19 / 76

Digital representations

Memory How can we represent numbers on a digital computer?


Digital computer memory consists of a huge number of electromagnetic switches, capable of storing values ↑ or
↓.

Hardware Current digital computer hardware works most efficiently with fixed-width data types.
Only finitely many values can be represented in a fixed-width type.

Software Infinite data types must be implemented in software.


Countable types like the integers can be represented by lists of fixed-size words.
Uncountable types like the reals can be represented by infinite streams of data.
— At any time, we only have a finite approximation to the result.

20 / 76

Binary integers

Binary Each memory location can store a single binary digit (bit).
Represent 0 by ↓ and 1 by ↑.

Fixed-width Use a fixed number of digits for elementary data types.


e.g. Java’s int uses 32 bits to hold a value between −2³¹ and 2³¹−1.
Example The number 42 in 8-bit binary is 00101010₂, or ↓↓↑↓↑↓↑↓.
Using 16 bits, we have 0000000000101010₂, or ↓↓↓↓↓↓↓↓↓↓↑↓↑↓↑↓.

Variable-width Arbitrarily-sized integers can be implemented in software using a list of (e.g. 32-bit) words.
e.g. mpz_t from the GNU Multiple-Precision Library (GMP).
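The fixed-width binary patterns above are easy to check in Python (format codes '08b' and '016b' zero-pad to the given width):

```python
print(format(42, '08b'))     # 00101010
print(format(42, '016b'))    # 0000000000101010
print(int('00101010', 2))    # 42, converting back again
```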

21 / 76

Fixed- and floating-point numbers

Fixed-point A fixed-point representation of the real numbers uses a fixed number of fractional (binary) digits.
Used in signal and image processing, less often for scientific computing.

Floating-point Use a fixed number of significant digits in the mantissa, and determine the size by the
exponent.
Single-precision An IEEE standard, 32-bit floating-point format.
Double-precision Currently, the most commonly used approach for representing real numbers is the 64-bit
IEEE 754 double-precision binary floating-point format:
±1.XX···X₂ × 2^(±XX···X₂), with a 1-bit sign, a 1+52-bit mantissa, and an 11-bit exponent.
Example −6.75 = −1.6875 × 2² = −1.1011000···0₂ × 2^(+0000000010₂)

Dyadic Any binary fixed- or floating-point number is a dyadic of the form p/2^q for integers p, q.
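The bit layout of a double can be inspected with Python's standard struct module. For −6.75 the sign bit is 1 and the mantissa fraction begins 1011; the stored exponent field is 2 + 1023 = 1025, since IEEE 754 stores the exponent with a bias of 1023 (a detail not shown on the slide):

```python
import struct

# Big-endian bytes of the 64-bit double -6.75, written out as bits.
bits = ''.join(format(b, '08b') for b in struct.pack('>d', -6.75))
sign, exponent, mantissa = bits[0], bits[1:12], bits[12:]
print(sign)                      # 1            (negative)
print(exponent)                  # 10000000001  (1025 = 2 + bias 1023)
print(int(exponent, 2) - 1023)   # 2
print(mantissa[:8])              # 10110000     (fraction of 1.1011...)
```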

22 / 76

Floating-point arithmetic

Inexact The result of an arithmetical operation on floating-point numbers need not be a floating-point number!
e.g. 1/3 = 1.0000000₂/11.000000₂ = 0.0101010101···₂.

Round-to-Nearest Round the result of any arithmetical operation to the nearest representable number.
e.g. Using a 1+7-bit mantissa,
1/3 = 1.0000000₂/11.000000₂ = 0.010101010101···₂ ≈ 0.010101011₂.
Break ties to even.
e.g. 1.0000001₂ + 0.10000001₂ = 1.10000011₂ ≈ 1.1000010₂.

Directed Round upward or downward.
85/256 = 0.01010101₂ < 1/3 = 1.000000₂/11.00000₂ < 86/256 = 0.01010110₂

Example (1/7 + 4/7) + 2/7 ≈ (0.0010010010₂ + 0.10010010₂) + 0.010010010₂
= 0.1011011010₂ + 0.010010010₂ = 0.1111111110₂ ≈ 0.11111111₂
23 / 76

Machine epsilon

Machine epsilon The difference between 1 and the next higher representable number.
For single-precision floating-point,
ε = 1⁺ − 1 = 2⁻²³ ≈ 1.1921×10⁻⁷.
For double-precision floating-point,
ε = 1⁺ − 1 = 2⁻⁵² ≈ 2.2204×10⁻¹⁶.

Spacing Over the interval [1, 2], numbers have a spacing of ε.

Over [1/2, 1], the spacing is ε/2; on [2, 4] it is 2ε.
Small numbers are more closely spaced, allowing greater precision; large numbers are more widely spaced.

Minimum/maximum representable number For double-precision floating-point, the minimum strictly-positive
representable number is
0⁺ = 2⁻¹⁰⁷⁴ ≈ 4.94×10⁻³²⁴.
The maximum representable number is
∞⁻ = 2¹⁰²³(2 − ε) = 2¹⁰²⁴(1 − ε/2) ≈ 1.798×10³⁰⁸.
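These constants are all exposed in Python's standard library (math.nextafter requires Python 3.9+), so the claims are easy to verify:

```python
import math
import sys

print(sys.float_info.epsilon == 2**-52)            # True: double machine epsilon
print(math.nextafter(1.0, 2.0) - 1.0 == 2**-52)    # True: gap above 1 is eps
print(math.nextafter(0.5, 1.0) - 0.5 == 2**-53)    # True: spacing on [1/2, 1] is eps/2
print(2.0**-1074)                                  # 5e-324, smallest positive double
print(sys.float_info.max)                          # 1.7976931348623157e+308
```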
24 / 76

Floating-point numbers in Matlab

Double precision By default, Matlab uses double-precision floating-point numbers.

Single precision Use single(x) to convert x to single-precision. Use double(xs) to convert to


double-precision.

Display format By default, Matlab only displays 4-5 significant figures.

To display 15 significant figures, use
>> format long
To go back to 4-5 significant figures, use
>> format short
The use of format long is vital for displaying intermediates and results of highly accurate calculations!!

25 / 76

Philosophical question

Philosophical question Do Klingons use floating-point?

26 / 76

Errors in Scientific Computing 27 / 76

Sources of error

There are three main sources of error in scientific computing:


Roundoff errors Errors due to the use of inexact (floating-point) arithmetic for computations.
• Usually extremely small for simple double-precision calculations, but may become significant for long
calculations or due to ill-conditioning of a problem or method.

Truncation errors Errors due to the use of an inexact method.


• For example, the approximation f′(x) ≈ (f(x + h) − f(x − h))/(2h) has a truncation error O(h²).

Errors in data Data often contains measurement errors.


• Although we as knowledge engineers cannot do anything about these errors, we can try and estimate their
impact on the final result, and maybe even choose a method which reduces this.
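The O(h²) truncation error of the central-difference formula above can be seen numerically: shrinking h by a factor of 10 shrinks the error by a factor of about 100. A sketch using sin, whose derivative we know exactly:

```python
import math

def dcentral(f, x, h):
    """Central-difference approximation to f'(x); truncation error is O(h^2)."""
    return (f(x + h) - f(x - h)) / (2 * h)

err = lambda h: abs(dcentral(math.sin, 1.0, h) - math.cos(1.0))
print(err(0.1))    # roughly 9e-4
print(err(0.01))   # roughly 9e-6, i.e. about 100 times smaller
```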

28 / 76

Absolute and relative errors

Absolute error The absolute error in approximating p by p∗ is |p∗ − p|; equivalently |p − p∗|.

Relative error The relative error in approximating p by p∗ is |p∗ − p|/|p|.


An alternative form of the relative error is |p∗ /p − 1|.
The relative error is dimensionless; it is the same in any units.

Example Compute the absolute and relative errors of the approximation π ≈ 22/7.
p = π = 3.1415927 · · ·   p∗ = 22/7 = 3.1428571 · · ·
Absolute error:
|22/7 − π| = |3.1428571 · · · − 3.1415927 · · · | = 0.0012645 · · · = 1.3×10⁻³ (2 sf)
Relative error:
|22/7 − π|/|π| = 0.0012645 · · · /3.1415927 · · · = 0.00040250 · · · = 4.0×10⁻⁴ (2 sf)
|22/7 /π − 1| = |3.1428571 · · · /3.1415927 · · · − 1| = |1.0004024 · · · − 1|
= 4.0×10⁻⁴ (2 sf)
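The same computation in Python:

```python
import math

p = math.pi
p_star = 22 / 7
abs_err = abs(p_star - p)
rel_err = abs_err / abs(p)

print(abs_err)               # 0.0012644892... = 1.3e-3 (2 sf)
print(rel_err)               # 0.0004024994... = 4.0e-4 (2 sf)
print(abs(p_star / p - 1))   # the alternative form gives the same relative error
```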

29 / 76

Error computation

Error computation Use an unrounded version of the exact value, or a version rounded to much higher
precision than your approximation.
e.g. π ≈ π∗ = 3.14 with relative error
|3.14 − π|/π = |3.14/π − 1| = 0.00050697 · · · = 5.1×10⁻⁴ = 0.051% (2 sf)
≈ |3.14/3.1416 − 1| = 0.00050929 · · · = 0.051% (2 sf)
∼ |3.14/3.142 − 1| = 0.00063654 · · · = 0.064% (2 sf)
≉ |3.14/3.14 − 1| = 0.
30 / 76

Error estimates and bounds

Exact error In practice, we use numerical estimates because we don’t know the exact value. In this situation,
we can’t compute the actual error!
However, sometimes we use a numerical method on a problem for which we have an exact answer, to test the
method itself! Here, the exact error indicates the quality of the method.

Error estimates An error estimate is a value ẽ such that |p∗ − p| ≈ ẽ.

Error bounds An error bound is a value ē such that |p∗ − p| ≤ ē.

31 / 76

Error specification

Decimal places Requesting an answer to n decimal places corresponds to an absolute error of 10⁻ⁿ.
Significant figures Requesting an answer to n significant figures corresponds to a relative error of roughly
10⁻ⁿ.

Error specification In general, it is better to request a given number of significant figures when computing a
positive quantity (e.g. area), since this is independent of the units used.
For a dimensionless quantity which may be positive or negative, it is usually better to request a given number of
decimal places.
If we request a number of significant figures, and the quantity is near zero, then we often need to compute
with very high precision!
For a physical quantity which may be positive or negative, it is best to specify an accuracy relative to a
characteristic scaling for the problem.
e.g. For the difference in surface area of two balls whose diameter is measured
using a ruler with 1 mm markings, we might aim to find the answer to within 10 mm².

32 / 76

Rounded arithmetic

Floating-point When working with floating-point numbers, the result of every arithmetical operation is rounded
to the nearest representable number.
Accumulation of rounding errors can cause significant errors in the final result.

Decimal rounded arithmetic We can simulate the effect of round-off errors by performing hand calculations to
a fixed number of decimal significant figures.

Example Exact computation:

π × e = 3.14159 · · · × 2.71828 · · · = 8.53973 · · · = 8.54 (3 sf) = 8.5 (2 sf).
Three-digit rounded arithmetic:
π × e ≈ 3.14 × 2.72 = 8.5408 ≈ 8.54 (rounding to 3 sf at each step).
Two-digit rounded arithmetic:
π × e ≈ 3.1 × 2.7 = 8.37 ≈ 8.4 (rounding to 2 sf at each step).

Important Round after every operation!

π × e² ≈ 3.1 × 2.7² = 3.1 × 7.29 ≈ 3.1 × 7.3 = 22.63 ≈ 23 (rounding to 2 sf at each step).
33 / 76

Example of rounded arithmetic

Example Let f(x) = x³ − 5.34x² + 1.52x + 4.61. Evaluate f at 4.89 using 3-digit rounded arithmetic.
Compare your answer to the exact value.
x² = x × x = 4.89 × 4.89 = 23.9121 ≈ 23.9 (3 sf)
x³ = x² × x ≈ 23.9 × 4.89 = 116.871 ≈ 117. (3 sf)
5.34x² = 5.34 × 4.89² ≈ 5.34 × 23.9 = 127.626 ≈ 128. (3 sf)
1.52x = 1.52 × 4.89 = 7.4328 ≈ 7.43 (3 sf)
f(x) = ((x³ − 5.34x²) + 1.52x) + 4.61
≈ ((117. − 128.) + 7.43) + 4.61 = (−11.0 + 7.43) + 4.61 = −3.57 + 4.61
= 1.04.

Exact answer f(4.89) = 1.282355 = 1.28 (3 sf).

Relative error 19%, even though each step has a relative error of 0.1%!
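This calculation can be simulated in Python with a small helper that rounds to n significant figures (a sketch mirroring the round(x,n,'significant') idiom used in the Matlab code later; the helper rnd is ours):

```python
from math import floor, log10

def rnd(x, n=3):
    """Round x to n significant figures."""
    if x == 0:
        return 0.0
    return round(x, n - 1 - floor(log10(abs(x))))

x = 4.89
x2 = rnd(x * x)     # 23.9
x3 = rnd(x2 * x)    # 117.0
fx = rnd(rnd(rnd(x3 - rnd(5.34 * x2)) + rnd(1.52 * x)) + 4.61)
print(fx)           # 1.04, versus the exact value 1.282355
```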

34 / 76

Example of rounded arithmetic

Example Let f (x) = x3 − 5.34x2 + 1.52x + 4.61. Evaluate f at 4.89 using single-precision arithmetic.
Compare your answer to the exact value (estimated using double-precision arithmetic).
In Matlab:
c=[1.00,-5.34,1.52,4.61], x=4.89
fx=polyval(c,x)
sc=single(c), sx=single(x)
sfx=polyval(sc,sx)
es = abs(double(sfx)-fx)/abs(fx)

Exact value f (x) = 1.282355 (given by fx).


Single-precison result f (x) ≈ 1.2823482 (given by sfx).
Absolute error of 6.84×10−6 , relative error 5.3×10−6 .
Again, the accumulated error 5.3×10−6 is much higher than the machine epsilon for single-precision
ǫ = 2−23 ≈ 1.2×10−7 .
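Python has no built-in single-precision type, but one can be emulated by forcing every intermediate through a 32-bit float via the standard struct module (an illustrative sketch; the exact digits may differ slightly from Matlab's polyval, but the size of the error is the same):

```python
import struct

def f32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack('>f', struct.pack('>f', x))[0]

c = [1.00, -5.34, 1.52, 4.61]
x = 4.89

# Horner evaluation in double precision, then with every step forced to single.
fx = ((c[0] * x + c[1]) * x + c[2]) * x + c[3]
acc = f32(c[0])
for a in c[1:]:
    acc = f32(f32(acc * f32(x)) + f32(a))

rel = abs(acc - fx) / abs(fx)
print(fx)     # 1.282355...
print(rel)    # on the order of 1e-6: far above double eps, comparable to single eps
```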
35 / 76

Fixed-point versus floating-point

Fixed-point Addition and subtraction are exact when working to a fixed number of decimal places!
e.g. 148.41316 + 0.00067 = 148.41383 (5 dp); 149.905 − 146.936 = 2.969 (3 dp).

Multiplication of small numbers behaves poorly in fixed-point arithmetic!
e.g. 0.00674 × 0.00034 ≈ 0.00000 (5 dp).

Floating-point Under multiplication and division in floating-point arithmetic, small relative errors remain small!
e.g. 403.4 × 0.006738 = 2.7181092 ≈ 2.718 (4 sf)
e.g. 0.00674 × 0.000335 ≈ 0.0000022579 = 2.26×10⁻⁶ (3 sf)
Subtraction of almost-equal numbers causes loss of precision!
e.g. 149.905 − 146.936 = 2.969 = 2.96900 (6 sf).

36 / 76

Accuracy and precision

Accuracy Accurate to n digits means n digits are correct (±1 in last digit).

Precision Precision is number of digits used.


e.g. 3.1428571 is an approximation to π = 3.1415926 · · · specified with a precision of 8 digits, but only
accurate to 3 digits.

Giving an answer to higher precision than the accuracy is useless,


and gives a false impression of the accuracy!!

A certain amount of extra precision is useful in intermediate values to prevent unnecessary loss of accuracy
when rounding.

37 / 76

Working guidelines

Final answer If not specifically asked for, use the precision appropriate for the accuracy.
e.g. If the accuracy is ±0.02, give 2 decimal places of precision.

Intermediates Use more precision for intermediate results than needed in final answer.
— For hand calculations, try to use at least two (decimal) significant figures more.
— For computer calculations, use machine precision in intermediate results (and if necessary, write out as for
hand calculations).

Errors Use at most two significant figures when giving an error (estimate).
— e.g. Absolute error 0.0013, relative error 0.04%.
If asked to compare an approximate value with the exact value, use more precision for the exact value!

38 / 76

Reducing Errors in Scientific Computing 39 / 76

Subtraction

Loss of significance When subtracting two almost-equal quantities in rounded or floating-point arithmetic,
many significant figures of accuracy can be lost!

Example Compute x³ − y³ using three-digit arithmetic for x = 427, y = 426.

x³ − y³ = 427³ − 426³ = 77854483 − 77308776
≈ 77900000 − 77300000 = 600000 = 6.00×10⁵ (3 sf at each step).
Exact answer 545707. High relative error of 9.9%.
Re-write x³ − y³ = (x − y) × (x² + xy + y²). Then
x³ − y³ = (427 − 426) × (427² + 427×426 + 426²)
= 1 × (182329 + 181902 + 181476)
≈ 1 × (182000 + 182000 + 181000) = 545000 (3 sf at each step).
Exact answer 545707. Relative error 1.3×10⁻³ = 0.13%.

Safe subtraction Subtraction of exact values at the first step is safe! This is because errors have not had a
chance to accumulate.

40 / 76

Subtraction

Example Now compute x³ − y³ using single-precision arithmetic for the values x = 427, y = 426.
x³ − y³ = 427³ − 426³ = 77854483 − 77308776
≈ 77854480 − 77308776 = 545704 (rounding each cube to single precision).
Exact answer 545707. Relative error 5.5×10⁻⁶.
Re-write x³ − y³ = (x − y) × (x² + xy + y²). Then
x³ − y³ = (427 − 426) × (427² + 427×426 + 426²)
= 1 × (182329 + 181902 + 181476) = 545707.
Answer is exact!
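The single-precision numbers above can be reproduced in Python by rounding values through a 32-bit float with the standard struct module:

```python
import struct

def f32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack('>f', struct.pack('>f', x))[0]

x, y = 427.0, 426.0

naive = f32(f32(x**3) - f32(y**3))
print(f32(x**3))   # 77854480.0: 427^3 = 77854483 is not representable in single
print(naive)       # 545704.0, versus the exact 545707

factored = f32((x - y) * f32(f32(f32(x*x) + f32(x*y)) + f32(y*y)))
print(factored)    # 545707.0: every intermediate is below 2**24, hence exact
```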

41 / 76

Quadratic formula

Problem Compute the positive root of 0.5x² + 2x − 0.05 using 3-digit arithmetic.
Use the quadratic formula
x = (−b ± √(b² − 4ac)) / (2a)
Take a = 0.5, b = 2, c = −0.05.

b² − 4ac = 2² − 4 × 0.5 × (−0.05) = 4 − (−0.1) = 4.1,
2a = 2 × 0.5 = 1,
x = (√(b² − 4ac) − b) / (2a) = (√4.1 − 2)/1 = (2.02498 · · · − 2)/1
≈ 2.02 − 2 = 0.02 = 0.0200 (3 sf at each step).
Exact answer 0.02484567 · · · = 0.0248 (3 sf).
Absolute error |0.02 − 0.02484567| = 0.00484567 · · · = 0.0048 (2 sf).
Relative error |0.00484567|/|0.02484567| = 0.195031 · · · = 0.20 (2 sf) ≈ 20%!!

42 / 76

Quadratic formula
Rearrange the formula by completing the square:
x = (√(b² − 4ac) − b) / (2a)
  = (√(b² − 4ac) − b) / (2a) × (√(b² − 4ac) + b) / (√(b² − 4ac) + b)
  = ((b² − 4ac) − b²) / (2a(√(b² − 4ac) + b))
  = −4ac / (2a(√(b² − 4ac) + b))
  = −2c / (√(b² − 4ac) + b)

Example Compute the positive root of 0.5x² + 2x − 0.05 using 3-digit arithmetic.
x = −2c / (√(b² − 4ac) + b) = (−2 × (−0.05)) / (√4.1 + 2) = 0.1 / (2.02498 · · · + 2)
≈ 0.1 / (2.02 + 2) = 0.1/4.02 = 0.0248756 · · · ≈ 0.0249 (3 sf at each step).
Exact answer x = 0.02484567 · · · = 0.0248 (3 sf).
Absolute error |0.0249 − 0.02484567| = 0.00054326 · · · = 0.00054 (2 sf).
Relative error |0.00054326|/|0.02484567| = 0.002187 · · · = 0.0022 (2 sf) ≈ 0.2%.
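The same cancellation shows up in double precision whenever |4ac| ≪ b². A sketch comparing the two formulas on the illustrative equation x² + 10⁸x + 1 = 0, whose root near zero is approximately −10⁻⁸ (the coefficients are our own example, not from the slide):

```python
import math

a, b, c = 1.0, 1e8, 1.0
d = math.sqrt(b*b - 4*a*c)

naive = (-b + d) / (2*a)    # subtracts two nearly equal numbers
stable = -2*c / (b + d)     # no cancellation

print(naive)    # noticeably off: only a few digits correct
print(stable)   # approximately -1e-08, essentially full accuracy
```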

43 / 76

Polynomials in Horner nested form

Problem Evaluate f(x) = x³ − 5.34x² + 1.52x + 4.61 at x = 4.89 using 3-digit arithmetic.
We previously found f(x) ≈ 1.04 by direct evaluation; relative error 19%. Re-write in nested form (also known
as Horner’s rule):
x³ − 5.34x² + 1.52x + 4.61 = (x² − 5.34x + 1.52) · x + 4.61
= ((x − 5.34) · x + 1.52) · x + 4.61
f(4.89) = ((4.89 − 5.34) × 4.89 + 1.52) × 4.89 + 4.61
= (−0.45 × 4.89 + 1.52) × 4.89 + 4.61
= (−2.2005 + 1.52) × 4.89 + 4.61 ≈ (−2.20 + 1.52) × 4.89 + 4.61 (3 sf)
= −0.68 × 4.89 + 4.61 = −3.3252 + 4.61 ≈ −3.33 + 4.61 (3 sf)
= 1.28 (3 sf)

Exact answer f(4.89) = 1.282355 = 1.28 (3 sf).

Correct to given precision!

44 / 76

Nested form

Problem Evaluate f(x) = x³ − 5.34x² + 1.52x + 4.61 at x = 4.89 using 3-digit arithmetic in Matlab.
Use round(x,n,'significant') or the rnd(x,n) function on the Student Portal to round x to n significant figures.
Use the shorthand r=@(x)round(x,3,'significant') to shorten the implementation.
c=[1.0,-5.34,1.52,4.61]
fdirect = @(x) c(1)*x^3 + c(2)*x^2 + c(3)*x + c(4)
fnested = @(x) ((c(1)*x+c(2))*x+c(3))*x+c(4)
fdirectrounded = @(x) r(r(r(r(r(x*x)*x)-r(5.34*r(x*x)))+r(1.52*x))+4.61)
fnestedrounded = @(x) r(r(r(r(r(x-5.34)*x)+1.52)*x)+4.61)
Alternatively, use the Rounded class from the Student Portal.
xr=Rounded(x,3)
ydr=fdirect(xr); ydr.value
ynr=fnested(xr); ynr.value

45 / 76

Nested form

The nested form of

    ∑_{k=0}^n a_k x^k = a_n xⁿ + a_{n−1} xⁿ⁻¹ + ··· + a_2 x² + a_1 x + a_0

is

    ((··· (a_n · x + a_{n−1}) · x + ··· + a_2) · x + a_1) · x + a_0

Here, the formula is simply evaluated from left to right.
Alternatively, starting with the lowest power first:

    ∑_{k=0}^n a_k x^k = a_0 + x · (a_1 + x · (a_2 + x · (··· + x · (a_{n−1} + x · a_n) ···)))

But here, we evaluate from right to left.
e.g. For n = 5,

    a_5 x⁵ + a_4 x⁴ + a_3 x³ + a_2 x² + a_1 x + a_0
      = ((((a_5 · x + a_4) · x + a_3) · x + a_2) · x + a_1) · x + a_0
      = a_0 + x · (a_1 + x · (a_2 + x · (a_3 + x · (a_4 + x · a_5))))
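As a sketch, the left-to-right nested evaluation takes one multiply and one add per coefficient (shown here in Python for concreteness; the course code is in Matlab):

```python
def horner(coeffs, x):
    """Evaluate a polynomial with coefficients [a_n, ..., a_1, a_0]
    (highest power first) by Horner's nested form."""
    result = 0.0
    for a in coeffs:
        result = result * x + a   # ((a_n*x + a_{n-1})*x + ...)*x + a_0
    return result

# f(x) = x^3 - 5.34x^2 + 1.52x + 4.61 at x = 4.89, as in the earlier example
value = horner([1.0, -5.34, 1.52, 4.61], 4.89)   # ≈ 1.282355
```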

46 / 76

Quality of methods

A good method for a problem will always give an accurate answer, regardless of the input.

A bad method gives an inaccurate answer on some (or most) inputs,


but may give a (very) accurate answer in some cases.
e.g. Horner’s method does not always give a more accurate result than direct evaluation, but it does not have as bad a worst case.

47 / 76

Solutions of Equations of One Variable 48 / 76

Algebraic equations

Example problem Suppose x² = a. We can compute a easily from x by multiplication.
But how can we determine x given a? i.e. compute x = √a.
Approach Solve the equation f(x) = x² − a = 0 for x in terms of a.

Example problem Suppose we know variables x and y are related by

    cos(x) − x + eˣ y + y³ = 0.

How can we determine y for various values of x? Or x for a given value of y?
Approach Fix x-values (x0, x1, . . . , xn), and try to find y-values (y0, y1, . . . , yn), i.e. solve an equation of the form f(xi, y) = 0 to find yi.

[Plot of the solution curve y against x for −4 ≤ x ≤ 6.]

49 / 76

Algebraic equations

General Problem Given a continuous function f : R → R and real numbers a < b,
solve f(x) = 0 for x ∈ [a, b].
Roots A value p such that f(p) = 0 is called a root of f.
Note that f may have many roots in [a, b], or none at all!

Approximation Given a tolerance ε, compute some p∗ within ε of an actual root p.
Error The (absolute) error is |p∗ − p|.
Residual The residual is |f(p∗)|.

Example f(x) = x² − 2 has root p = √2 = 1.414···; approximate by p∗ = 1.4.
Error: |p∗ − p| = 0.014··· = 1.4×10⁻² (2 sf);
Residual: |f(p∗)| = |1.4² − 2| = |1.96 − 2| = 0.04 = 4×10⁻².
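The distinction can be checked directly; a small Python sketch of the numbers above (names are illustrative):

```python
import math

f = lambda x: x**2 - 2   # root p = sqrt(2)
p_star = 1.4             # approximation
p = math.sqrt(2)

error = abs(p_star - p)       # absolute error, about 0.0142
residual = abs(f(p_star))     # residual, about 0.04
```

Note that a small residual need not mean a small error (and vice versa) when f is very steep or very flat near the root.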

50 / 76

Existence of solutions

Intermediate value theorem Suppose


(i) f : [a, b] → R is continuous, and
(ii) f (a) and f (b) have different signs
(i.e. f (a) < 0 and f (b) > 0, or f (a) > 0 and f (b) < 0).
Then f has a root in (a, b).

[Graph of f crossing the x-axis between a and b.]

Bracket Call [a, b] a bracket for the root(s) of f.

Signs Note that if f(a), f(b) ≠ 0, then

    sgn(f(a)) ≠ sgn(f(b)) ⇐⇒ f(a)f(b) < 0.
51 / 76

The bisection method

Problem Find a root of f given a bracket [a, b].

Idea Shrink the bracket [a, b] to a point while preserving the bracket property.

Method Let c be the midpoint of [a, b], which is given by c = (a + b)/2.
If f(c) = 0, then c is a root.
Otherwise sgn(f(c)) differs from either sgn(f(a)) or sgn(f(b)).

Update a := c if sgn(f(a)) = sgn(f(c)) ≠ sgn(f(b)),
or b := c if sgn(f(a)) ≠ sgn(f(c)) = sgn(f(b)).

The width of the interval [a, b] is halved.

52 / 76

The bisection method

Problem Find a root of f given a bracket [a, b].

Idea Shrink the bracket [a, b] to a point while preserving the bracket property.

Method Let c be the midpoint of [a, b], which is given by c = (a + b)/2.
Update a := c if sgn(f(a)) = sgn(f(c)) ≠ sgn(f(b)),
or b := c if sgn(f(a)) ≠ sgn(f(c)) = sgn(f(b)).

Termination Stop when we can locate the root to within a tolerance ε.
If the radius of [a, b], which is given by (b − a)/2, is less than ε, then any point in [a, b], including the root p, is within ε of the midpoint c.
Taking p∗ = (a + b)/2 then yields |p∗ − p| < ε.

53 / 76

Iterative methods

Iterative methods The bisection method is an iterative method: we apply the same steps over and over again.
Iterative methods are typically implemented as a loop:
input f, a, b, ε such that sgn(f(a)) ≠ sgn(f(b)) and ε > 0
while (b − a)/2 > ε,
    c := (a + b)/2;
    if sgn(f(c)) = sgn(f(a))
        then a := c, b := b
        else a := a, b := c
    end if
end while
r := (a + b)/2
Here, we overwrite variables as they are no longer needed.

54 / 76

Iterative methods

Iterative methods The bisection method is an iterative method: we apply the same steps over and over again.

Indexed values In mathematical work, or if a record of previous values is needed, we often index the variables
by the loop-count:
input f, a0, b0, ε such that sgn(f(a0)) ≠ sgn(f(b0)) and ε > 0
n := 0;
while (bn − an)/2 > ε,
cn := (an + bn )/2;
if sgn(f (cn )) = sgn(f (an ))
then an+1 := cn , bn+1 := bn
else an+1 := an , bn+1 := cn
end if
n := n + 1
end while
r := (an + bn )/2
55 / 76

The bisection method



Example Estimate √2 to within 0.1.

Compute √2 by solving x² = 2, or equivalently, x² − 2 = 0.
Since 1 < √2 < 2, solve in interval [1, 2].
So need to find a root of f(x) = x² − 2, with initial a = 1 and b = 2.
Compute f(a) = f(1) = 1² − 2 = −1 and f(b) = f(2) = 2² − 2 = +2.
The midpoint of the interval [1, 2] is (1 + 2)/2 = 1.5, so set c = 1.5.
Compute f(c) = f(1.5) = 1.5² − 2 = 2.25 − 2 = 0.25.
Since f(c) > 0 has the opposite sign to f(a), keep a = 1.0 and set b := c = 1.5.

56 / 76

The bisection method



Example Estimate √2 to within 0.1.

Continue by finding a root of f(x) = x² − 2 in the interval [a, b] = [1.0, 1.5].
Set c = (a + b)/2 = (1.0 + 1.5)/2 = 1.25.
Compute f(c) = 1.25² − 2 = 1.5625 − 2 = −0.4375.
Since f(c) < 0 has the opposite sign to f(b), set a := c = 1.25 and keep b = 1.5.
Set c = (a + b)/2 = (1.25 + 1.5)/2 = 1.375.
Compute f(c) = 1.375² − 2 = −0.109375.
Since f(c) < 0 has the opposite sign to f(b), set a := c = 1.375 and keep b = 1.5.
Since (b − a)/2 = (1.5 − 1.375)/2 = 0.0625 < 0.1, taking p∗ = 1.4375, the midpoint of [a, b], means |p∗ − √2| < 0.0625 < 0.1.
In fact, |p∗ − √2| = 0.023 (2 sf).

57 / 76

The bisection method - Example (Complete)

Example Estimate √2 to within 0.1.
Start with a0 = 1 and b0 = 2: f(1) = 1² − 2 = −1, f(2) = 2² − 2 = 2.
Set c0 = 1.5 with f(c0) = 1.5² − 2 = 0.25 > 0, so f has a root in [a0, c0] = [1, 1.5].
Update a1 = a0 = 1.0, b1 = c0 = 1.5.
Set c1 = (a1 + b1)/2 = (1.0 + 1.5)/2 = 1.25.
Compute f(c1) = 1.25² − 2 = 1.5625 − 2 = −0.4375 < 0, so f has a root in [c1, b1] = [1.25, 1.5].
Update a2 = c1 = 1.25, b2 = b1 = 1.5.
Set c2 = (a2 + b2)/2 = (1.25 + 1.5)/2 = 1.375.
Since f(c2) = 1.375² − 2 = 1.890625 − 2 = −0.109375 < 0, f has a root in [c2, b2] = [1.375, 1.5].
Update a3 = c2 = 1.375, b3 = b2 = 1.5.
Since (b3 − a3)/2 = (1.5 − 1.375)/2 = 0.0625 < 0.1, taking p∗ = 1.4375, the midpoint of [a3, b3] = [1.375, 1.5], means |p∗ − √2| < 0.0625 < 0.1.

58 / 76

The bisection method

Implementation In file bisection_root.m

function r=bisection_root(f,a,b,e)
  % Solve f(x)=0 for x in [a,b] up to a tolerance of e.
  assert(a<b); assert(e>0);
  assert(sign(f(a))==-sign(f(b)));
  while (b-a)/2 > e,
    c=(a+b)/2;
    if sign(f(c))==sign(f(a)),
      a=c;
    else
      b=c;
    endif
  endwhile
  r=(a+b)/2;
endfunction
Usage In a separate script file, e.g. sqrt_two.m
f=@(x)x^2-2; a=1; b=2; tol=0.1;
r=bisection_root(f, a, b, tol)
59 / 76

The bisection method

Convergence Since the error halves at each step, the method obtains an approximation to within tolerance ε in n steps, where ½(b − a)/2ⁿ < ε, or

    n > log₂((b − a)/(2ε)) = O(log₂(1/ε)).

Note log₂(x) = ln(x)/ln(2) where ln is the natural logarithm.

Example To find a root of f in [1, 2] to tolerance ε = 0.1, need

    n > log₂((2 − 1)/(2 × 0.1)) = log₂(5) ≈ 2.3,

so take n = 3 steps.
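The bound can be checked by counting iterations; a Python sketch of the same loop as the Octave bisection_root above, with an added step counter:

```python
import math

def bisection_root(f, a, b, tol):
    """Bisection with an iteration counter (the counter is an
    addition for checking the convergence bound)."""
    assert a < b and tol > 0
    assert math.copysign(1, f(a)) != math.copysign(1, f(b))
    steps = 0
    while (b - a) / 2 > tol:
        c = (a + b) / 2
        if math.copysign(1, f(c)) == math.copysign(1, f(a)):
            a = c
        else:
            b = c
        steps += 1
    return (a + b) / 2, steps

r, n = bisection_root(lambda x: x**2 - 2, 1.0, 2.0, 0.1)
```

For [a, b] = [1, 2] and ε = 0.1 this stops after exactly n = 3 steps, matching the bound n > log₂(5) ≈ 2.3.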

60 / 76

The secant method

Idea Given approximations p, q to a root of f, approximate f by the (secant) line joining (p, f(p)) and (q, f(q)).
Obtain a better approximation r = S(f, p, q) to the root by finding where this line crosses the x-axis.
Starting from initial points p0, p1, iteratively compute p2 = S(f, p0, p1), p3 = S(f, p1, p2), . . ..
61 / 76

The secant method

Derivation
The line joining (p, f(p)) to (q, f(q)) has slope m = (f(q) − f(p))/(q − p).
The line through (q, f(q)) with slope m has equation y = f(q) + m(x − q).
Setting y = 0 and solving for x = r gives

    f(q) + m(r − q) = 0 ⇐⇒ r = q − f(q)/m.

Obtain intercept

    r = q − (q − p)/(f(q) − f(p)) · f(q)

Algorithm Apply as an iterative algorithm. Start with p0, p1, and set

    pn+1 = pn − (pn − pn−1)/(f(pn) − f(pn−1)) · f(pn).

Bracketing The points pn, pn+1 do not need to bracket a root!
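The update rule translates directly into code; a minimal Python sketch, stopping when successive iterates agree to within tol (the heuristic discussed on the stopping-criteria slide):

```python
def secant_root(f, p, q, tol):
    """Secant iteration: repeatedly replace (p, q) by (q, r), where r is
    the x-intercept of the line through (p, f(p)) and (q, f(q))."""
    while abs(q - p) > tol:
        r = q - (q - p) / (f(q) - f(p)) * f(q)
        p, q = q, r
    return q

root = secant_root(lambda x: x**2 - 2, 1.0, 2.0, 1e-8)   # ≈ 1.41421356
```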

62 / 76

The secant method

Example Solve f(x) = x² − 2 = 0. Start with p0 = 1, p1 = 2.

Secant method iterative formula:

    pn+1 = pn − (pn − pn−1)/(f(pn) − f(pn−1)) · f(pn)

We will work to machine precision, displaying intermediates to 3 decimal places.
Initial step computes p2 by taking n = 1 in the general formula.
Need f(p0) = f(1) = −1.000 and f(p1) = f(2) = 2.000.

    p2 = p1 − (p1 − p0)/(f(p1) − f(p0)) · f(p1) = 2.000 − (2.000 − 1.000)/(2.000 − (−1.000)) × 2.000
       = 2.000 − (1.000/3.000) × 2.000 = 2.000 − 0.667 = 1.333 (3 dp)

Second step computes p3 by taking n = 2 in the formula. Need f(p2) = −0.222.

    p3 = p2 − (p2 − p1)/(f(p2) − f(p1)) · f(p2) = 1.333 − (1.333 − 2.000)/(−0.222 − 2.000) × (−0.222)
       = 1.333 − (−0.667/−2.222) × (−0.222) = 1.333 − (−0.0667) = 1.400.
63 / 76

The secant method

Example Solve f(x) = x² − 2 = 0. Start with p0 = 1, p1 = 2.

    f(p0) = f(1.000) = −1.000;  f(p1) = f(2.000) = 2.000.
    p2 = p1 − (p1 − p0)/(f(p1) − f(p0)) · f(p1)
       = 2.000 − (2.000 − 1.000)/(2.000 − (−1.000)) × 2.000 = 1.333

    f(p2) = f(1.333) = 1.333² − 2 = 1.778 − 2 = −0.222.
    p3 = p2 − (p2 − p1)/(f(p2) − f(p1)) · f(p2)
       = 1.333 − (1.333 − 2.000)/(−0.222 − 2.000) × (−0.222) = 1.400

    f(p3) = f(1.400) = 1.400² − 2 = 1.9600 − 2 = −0.0400.
    p4 = 1.4000 − (1.4000 − 1.3333)/(−0.0400 − (−0.2222)) × (−0.0400)
       = 1.4000 − (−0.0146) = 1.4146 (4 dp)

    f(p4) = f(1.41463) = 0.00119.
    p5 = 1.41463 − (1.41463 − 1.40000)/(0.00119 − (−0.0400)) × 0.00119
       = 1.41463 − 0.00042 = 1.41421 (5 dp)

    f(p5) = f(1.414211) = −0.0000060.

64 / 76

Stopping criteria

Convergence Want to stop when |pn − p| < ε.

Problem We don't know the exact root p!
If convergence is rapid, expect |pn − p| ≪ |pn−1 − p|.
If |pn − p| = γ|pn−1 − p| with γ ≤ ½, find |pn−1 − pn| ≥ |p − pn|.

Practical stopping heuristic Stop when |pn − pn−1| < ε.

Error estimate For the heuristic |pn − pn−1| < ε, expect |pn − p| ≲ ε.

Error bound If also f(pn) and f(pn−1) have different signs, then |pn − p| < ε.

65 / 76

Stopping criteria

Example Solve x² − 2 = 0 to an accuracy of 0.01.

We've already computed (to 4 dp): p3 = 1.4000, p4 = 1.4146, p5 = 1.4142.
Check differences:
|p4 − p3| = |1.4146 − 1.4000| = 0.0146 > 0.01. Need another step!
|p5 − p4| = |1.4142 − 1.4146| = 0.0004 < 0.01.
So we can expect |p5 − √2| < 0.01.

Solution √2 ≈ p5 = 1.4142 (4 dp) = 1.41 (2 dp).

Note: Actual error |p5 − √2| = |1.414211 − √2| ≈ 2.1×10⁻⁶ ≪ 0.01.

66 / 76

The secant method

Implementation

function r=secant(f,p,q,e)
  while abs(q-p) > e,
    r = q - (q-p)/(f(q)-f(p))*f(q);
    p = q; q = r;
  endwhile
endfunction
67 / 76

Newton-Raphson method

Idea Instead of using the secant line joining (p, f(p)) and (q, f(q)), use the tangent line at (p, f(p)).

The tangent line at (p, f(p)) has equation y = f(p) + f′(p)(x − p).
Setting y = 0 and solving for x = r gives the intercept at r = p − f(p)/f′(p).

Algorithm Apply iteratively:

    pn+1 = pn − f(pn)/f′(pn).

Stopping heuristic As for the secant method, stop when |pn − pn−1| < ε.
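A minimal Python sketch of the iteration with this stopping heuristic (the function and its derivative are passed in separately):

```python
def newton_root(f, df, p, tol):
    """Newton-Raphson: p_{n+1} = p_n - f(p_n)/f'(p_n), stopping when
    successive iterates differ by less than tol."""
    while True:
        p_next = p - f(p) / df(p)
        if abs(p_next - p) < tol:
            return p_next
        p = p_next

root = newton_root(lambda x: x**2 - 2, lambda x: 2*x, 1.0, 1e-10)
```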

68 / 76

Newton-Raphson method

Example Solve f(x) = x² − 2 = 0 to an accuracy of 0.01.

Derivative f′(x) = 2x. Start with p0 = 1. Work to 4 decimal places.

p0 = 1.0000: f(p0) = 1.0000² − 2 = −1.0000; f′(p0) = 2 × 1.0000 = 2.0000.
p1 = p0 − f(p0)/f′(p0) = 1.0000 − (−1.0000)/2.0000 = 1.5000.

p1 = 1.5000: f(p1) = 1.5000² − 2 = 0.2500; f′(p1) = 2 × 1.5000 = 3.0000.
p2 = p1 − f(p1)/f′(p1) = 1.5000 − 0.2500/3.0000 = 1.5000 − 0.0833 = 1.4167.
Error estimate e2 := |p2 − p| ≲ |p2 − p1| = 0.083 > 0.01. Need another step!

p2 = 1.4167: f(p2) = 1.4167² − 2 = 2.0069 − 2 = 0.0069; f′(p2) = 2 × 1.4167 = 2.8333.
p3 = p2 − f(p2)/f′(p2) = 1.4167 − 0.0069/2.8333 = 1.4167 − 0.0025 = 1.4142.
Error estimate e3 ≲ |p3 − p2| = 0.0025 < 0.01.

Solution √2 ≈ p3 = 1.4142 (4 dp) = 1.41 (2 dp).

69 / 76

Rounding effects

Rounding effects There is usually a small difference between rounded and exact computation.
For p1 = 1.5, the exact value of p2 is 17/12 = 1 5/12 = 1.41666··· = 1.4167 (4 dp).
Using exact arithmetic, find p3 = 577/408 = 1 169/408 = 1.41421568···.
Taking p2 = 1.4167, using rounded arithmetic to 4 decimal places:
f(p2) = f(1.4167) = 1.4167² − 2 = 2.0070 − 2 = 0.0070 (4 dp).
f′(p2) = f′(1.4167) = 2 × 1.4167 = 2.8334.
p3 = p2 − f(p2)/f′(p2) = 1.4167 − 0.0070/2.8334 = 1.4167 − 0.0025 = 1.4142 (4 dp).
In this case, rounding the exact value of p3 gives the same value as computed using rounded arithmetic!
This is fairly common: in iterative methods, rounding errors in early steps can be compensated for by using higher precision in later steps!

70 / 76

Newton-Raphson method

Convergence analysis Let p∗ be the root. Then by Taylor's theorem,

    0 = f(p∗) = f(pn) + f′(pn)(p∗ − pn) + ½f′′(ξ)(p∗ − pn)²,

so

    pn+1 = pn − f(pn)/f′(pn) = p∗ + (f′′(ξ)/(2f′(pn)))(p∗ − pn)².

Setting error εn = pn − p∗ gives

    εn+1 = (f′′(ξ)/(2f′(pn))) εn² ≈ (f′′(p∗)/(2f′(p∗))) εn² = Cεn².

Error decays quadratically; very fast.
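The quadratic decay can be observed numerically for f(x) = x² − 2, where C = f′′(p∗)/(2f′(p∗)) = 1/(2√2) ≈ 0.354; a Python sketch:

```python
import math

f = lambda x: x**2 - 2
df = lambda x: 2 * x

p, iterates = 1.0, [1.0]
for _ in range(4):                       # Newton steps from p0 = 1
    p = p - f(p) / df(p)
    iterates.append(p)

errors = [abs(q - math.sqrt(2)) for q in iterates]
C = 1 / (2 * math.sqrt(2))
ratio = errors[3] / errors[2] ** 2       # e_{n+1}/e_n^2, approaches C
```

The ratio e₃/e₂² already agrees with C to about two digits.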

71 / 76

Comparison of methods

Reliability
+ The bisection method always works.
– The Newton-Raphson method and the secant method may cycle or diverge.

Requirements
+ The bisection and secant methods only require function values.
– The Newton-Raphson method requires the derivative of the function.

Efficiency
– The bisection method converges only linearly, εn+1 ∼ ½εn.
+ The Newton-Raphson method converges superlinearly at rate εn+1 ∼ Cεn², and the secant method at rate εn+1 ∼ Cεn^1.6.
– Per evaluation of f or f′, the Newton-Raphson method is only O(ε^1.4), slower than the secant method's O(ε^1.6).
72 / 76

Parametrised equations (Non-examinable)

Problem Solve f (x, y) = 0 for y in terms of x at points (x0 , . . . , xn ).


Equivalently, solve fa (x) = 0 for x in terms of the parameter a.

Solution
1. Solve f (x0 , y) = 0 using the Newton-Raphson method (or the secant method) with arbitrary starting y to
find y0 .

2. Successively solve f (xi , y) = 0 to find yi , using the solution yi−1 for xi−1 to hot-start the method.

73 / 76

Parametrised equations (Non-examinable)

Solve f(x, y) = cos(x) − x + eˣ y + y³ = 0 for y in terms of x.

Implementation
f=@(x,y)cos(x)-x+exp(x)*y+y*y*y;
dyf=@(x,y)exp(x)+3*y*y;
xmin=-4; xmax=+6;
h=0.1; tol=1e-8;
N=round((xmax-xmin)/h);
xs=linspace(xmin,xmax,N+1); ys=xs*NaN;
y=0;
for i=0:N,
  x=xs(i+1); yp=-inf;
  while abs(y-yp)>tol,
    yn=y-f(x,y)/dyf(x,y);
    yp=y; y=yn;
  end;
  ys(i+1)=y;
end;
plot(xs,ys)

74 / 76

Systems of equations (Non-examinable)

Systems of nonlinear equations Find a root of f : Rn → Rn .

Newton-Raphson method Generalises directly, with the Jacobian matrix Df in place of the derivative:

    pn+1 = pn − Df(pn)⁻¹ f(pn).

Secant method Generalises to the simplex method.

75 / 76

Brent’s method (Non-examinable)

Problem The secant method and the Newton-Raphson method do not always converge!

Description Aim to keep bracketing properties of the bisection method with the fast convergence of the secant
method.

Idea If a secant step does not sufficiently reduce the size of the bracketing interval, use bisection.

Efficiency Don’t allow successive bisections.

76 / 76
