
CS 323 — Numerical analysis and computing, Spring 2025

Rounding and error propagation
Peng Zhang
2025-01-28
Today’s plan
• More class logistics (office hours, who to contact for what, grading scale)

• Rounding a number

• Rounding modes, absolute error, relative error

• Error propagation

• Ordinary arithmetic operations, function evaluation

• Reference: [Atkinson-Han], Sections 2.1.2 and 2.3


Rounding
• Numbers with infinitely many binary digits cannot be stored exactly in computers.

• Example: 1/10 in base 10

• Sum of powers of 2: 1/10 = 2^−4 + 2^−5 + 2^−8 + 2^−9 + 2^−12 + 2^−13 + ⋯

• Right-hand side
  = (2^−4 + 2^−5) + 2^−4 (2^−4 + 2^−5) + 2^−8 (2^−4 + 2^−5) + 2^−12 (2^−4 + 2^−5) + ⋯
  = (2^−4 + 2^−5) (1 + 2^−4 + 2^−8 + 2^−12 + ⋯)
  = (2^−4 + 2^−5) × 1/(1 − 2^−4) = 1/10

• Base 2: 0.000110011001100… (1100 repeats)

• Floating-point format: 1.10011001100… × 2^−4 (1100 repeats)
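A quick way to see this on a machine (a minimal sketch; standard Python, no extra assumptions):

```python
from fractions import Fraction

# 1/10 cannot be stored exactly: printing extra digits exposes the rounded
# binary value, and Fraction(0.1) recovers the exact stored double.
print(f"{0.1:.20f}")   # 0.10000000000000000555...
print(Fraction(0.1))   # 3602879701896397/36028797018963968, i.e. 3602879701896397 x 2^-55
```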
Rounding
• Floating-point format: σ × a × 2^e (base b = 2)

• IEEE double precision: 1 bit for sign σ, 11 bits for exponent e (−1022 ≤ e ≤ 1023), 52 bits for significand a (store 52 digits after the binary point)

• Need to “round” 1.10011001100… × 2^−4 to a nearby floating-point number

• x → round(x)
Rounding modes in IEEE standard
• Round down: round(x) is the largest floating-point number with round(x) ≤ x.

• Round up: round(x) is the smallest floating-point number with round(x) ≥ x.

• Round towards 0: round(x) is either round-down(x) or round-up(x), whichever lies between 0 and x. Thus if x is positive then round(x) = round-down(x), while if x is negative then round(x) = round-up(x).

• Round to nearest: round(x) is either round-down(x) or round-up(x), whichever is closer. In case of a tie, it is the one whose least significant (rightmost) bit is 0.

• Default: round to nearest
Rounding examples in base 10
• Round to 2 digits for the fraction part.

            Round down   Round up   Round towards 0   Round to nearest
  10.459    10.45        10.46      10.45             10.46
  13.322    13.32        13.33      13.32             13.32
  -0.554    -0.56        -0.55      -0.55             -0.55
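The table can be reproduced with Python's decimal module (a sketch: the slide's "round down"/"round up" are the toward −∞ / toward +∞ modes, which correspond to ROUND_FLOOR / ROUND_CEILING; ROUND_DOWN is toward 0 and ROUND_HALF_EVEN is round to nearest with ties to even):

```python
from decimal import Decimal, ROUND_FLOOR, ROUND_CEILING, ROUND_DOWN, ROUND_HALF_EVEN

modes = [("round down", ROUND_FLOOR), ("round up", ROUND_CEILING),
         ("round towards 0", ROUND_DOWN), ("round to nearest", ROUND_HALF_EVEN)]

for x in ("10.459", "13.322", "-0.554"):
    # Quantize to 2 digits after the decimal point under each rounding mode.
    row = [str(Decimal(x).quantize(Decimal("0.01"), rounding=m)) for _, m in modes]
    print(x, row)
```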


Rounding examples in IEEE double
• Example: 1/10 in base 2: 1.10011001100… × 2^−4 (1100 repeats)

• IEEE double precision: 52 bits for significand (“normalized”)

• Round down or round towards 0: keep only the first 52 bits after the binary point (drop the rest)

• Round up or round to nearest: add 1 to the last (52nd) stored bit, since the discarded bits begin with a 1
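To check this on a machine (a sketch; float.hex shows the stored bits in hexadecimal, and math.nextafter needs Python 3.9+):

```python
import math

# Round to nearest (the default) stores 0x1.999999999999a x 2^-4: the final
# hex digit is 'a' because the 52-bit significand was rounded up.
print((0.1).hex())                      # 0x1.999999999999ap-4

# The round-down / round-towards-0 neighbor is one ulp below.
print(math.nextafter(0.1, 0.0).hex())   # 0x1.9999999999999p-4
```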


Error
• Absolute error: |round(x) − x|

• In IEEE double precision, if x = ±(1.a1a2…a52a53…) × 2^e where −1022 ≤ e ≤ 1023, then

  • round(x) is either ±(1.a1a2…a52) × 2^e or ±(1.a1a2…a52 + 0.00…1) × 2^e

• |round(x) − x| is less than 2^−52 × 2^e for any rounding mode

  • 2^−52 is the machine epsilon for IEEE double precision (the smallest ϵ > 0 such that 1 + ϵ is representable)

• For round to nearest, |round(x) − x| is less than 2^−53 × 2^e
Error
• Relative error: |round(x) − x| / |x|

• We already know |round(x) − x| < 2^−52 × 2^e

• Note x = ±a × 2^e, where 1 ≤ a < 2

• Thus, |round(x) − x| / |x| ≤ (2^−52 × 2^e) / 2^e = 2^−52

• For round to nearest, |round(x) − x| / |x| is less than 2^−53
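These bounds can be checked numerically for x = 1/10 (a minimal sketch using exact rational arithmetic):

```python
import sys
from fractions import Fraction

assert sys.float_info.epsilon == 2.0**-52   # machine epsilon for IEEE double

x_true  = Fraction(1, 10)    # exact 1/10
x_round = Fraction(0.1)      # exact value of the stored double round(1/10)

rel_err = abs(x_round - x_true) / x_true
print(float(rel_err))                 # about 5.6e-17
assert rel_err < Fraction(2)**-53     # round-to-nearest bound 2^-53
```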
Error propagation
• Input errors → calculations → output errors

• Calculations:

  • Ordinary arithmetic operations: +, −, ×, ÷

  • Function evaluation (i.e., given x, compute f(x))


Ordinary arithmetic operations
• xA, yA: numbers used for calculation. xT, yT: true values (unknown)

• ω: one of the operations +, −, ×, ÷

• Propagated error: E = xT ω yT − xA ω yA

• How to analyze propagated error:

  • Relative error: bound the relative error of xA ω yA

  • Interval arithmetic: find an interval that is guaranteed to contain xT ω yT


Relative error
• Relative error: Rel(xA ω yA) = (xT ω yT − xA ω yA) / (xT ω yT)

• Want to bound Rel(xA ω yA) using Rel(xA), Rel(yA)

• It is desirable if relative errors propagate slowly, that is, Rel(xA), Rel(yA) both close to 0 implies Rel(xA ω yA) close to 0.
Error propagation in multiplication
• Writing xA = xT (1 − Rel(xA)) and yA = yT (1 − Rel(yA)),
  Rel(xA yA) = (xT yT − xA yA) / (xT yT)
             = 1 − (1 − Rel(xA)) (1 − Rel(yA))
             = Rel(xA) + Rel(yA) − Rel(xA) Rel(yA)

• Usually, Rel(xA), Rel(yA) ≈ 0 (for example, machine epsilon). Thus, Rel(xA) Rel(yA) ≪ Rel(xA) + Rel(yA).

• Errors propagate slowly! Desirable!

• Error propagation in division is similar. For example, see Problem 9 in Exercise 2.3 of [Atkinson-Han].
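A numerical check of the identity above (a sketch; the 7-digit inputs are made-up values, not from the slides):

```python
from fractions import Fraction

def rel(true, approx):
    """Relative error Rel(approx) = (true - approx) / true."""
    return (true - approx) / true

xT, xA = Fraction("1.234567"), Fraction("1.235")   # hypothetical true/rounded pair
yT, yA = Fraction("9.876543"), Fraction("9.877")

rx, ry, rxy = rel(xT, xA), rel(yT, yA), rel(xT * yT, xA * yA)
assert rxy == rx + ry - rx * ry      # exact identity
print(float(rxy), float(rx + ry))    # nearly equal: the cross term is negligible
```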
Error propagation in addition and subtraction
• Relative errors don’t propagate slowly. Even if Rel(xA), Rel(yA) are close to 0, Rel(xA + yA), Rel(xA − yA) can be far from 0. Undesirable!

• Example: xT = xA = 13, yT = √168 = 12.9614814…, yA = 12.961

  • Rel(xA) = 0, Rel(yA) = 0.0000371…

  • Error of xA − yA: (xT − yT) − (xA − yA) = −0.0004814…

  • Rel(xA − yA) = −0.0125…

  • |Rel(xA − yA)| ≫ |Rel(xA)| + |Rel(yA)|

• In general, with ϵ = xT − xA and η = yT − yA,
  Rel(xA − yA) = (xT − yT − (xA − yA)) / (xT − yT) = (xT − xA − (yT − yA)) / (xT − yT) = (ϵ − η) / (xT − yT).
  If xT − yT ≈ 0, Rel(xA − yA) can be large.
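The example is easy to reproduce (a minimal sketch in Python):

```python
from math import sqrt

xT, xA = 13.0, 13.0
yT, yA = sqrt(168), 12.961            # yT = 12.9614814..., yA keeps 5 digits

rel = lambda true, approx: (true - approx) / true

print(rel(yT, yA))                    # about  3.71e-05 (tiny error in the input)
print(rel(xT - yT, xA - yA))          # about -1.25e-02 (much larger after subtraction)
```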
Function evaluation
• Want to evaluate f(xT), but can only use an approximation xA

• What is the approximation error f(xT) − f(xA)?

• Easy case: f(x) = x + 5, f(x) = 2x

• Hard case: f(x) = log(x), f(x) = 10x^7 + 8x^6 + 4x^5 + 19x^3 + 10x^2 + 1

• Assume f is continuous and has a derivative f′ around xA.

• Main tool: the Mean Value Theorem: f(b) − f(a) = f′(c) (b − a) for some c between a and b.

• Take b = xT, a = xA:

  • f(xT) − f(xA) = f′(c) (xT − xA) for some c ∈ (xA, xT)

• In general, xA ≈ xT ⟹ c ≈ xA ≈ xT, so f′(c) ≈ f′(xA) ≈ f′(xT)

• f(xT) − f(xA) ≈ f′(xA) (xT − xA) ≈ f′(xT) (xT − xA)

• Relative error: Rel(f(xA)) = (f(xT) − f(xA)) / f(xT) ≈ f′(xT) (xT − xA) / f(xT)
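A quick sanity check of the linearization for f(x) = log(x) (a sketch; the values of xT and xA are made up):

```python
from math import log

# f(x) = log(x), f'(x) = 1/x
xT, xA = 2.00013, 2.0                 # hypothetical true value and approximation

exact  = log(xT) - log(xA)            # true error f(xT) - f(xA)
approx = (1 / xA) * (xT - xA)         # estimate f'(xA)(xT - xA)

print(exact, approx)                  # both about 6.5e-05
```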
Example 2.3.2 in [Atkinson-Han]
• In chemistry, the ideal gas law: PV = nRT

  • P: the pressure of the gas, V: the volume of the gas, n: the number of moles of the gas, R: the ideal gas constant, and T: the temperature of the gas in Kelvin.

• Assume P = V = n = 1. Then, T = 1/R

• In the MKS measurement system (unit: J ⋅ K^−1 ⋅ mol^−1), R = 8.3143 + ϵ, |ϵ| ≤ 0.0012

• Goal: evaluate T
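The worked slides for this example did not survive extraction; the following sketch carries out the evaluation using the linearization f(xT) − f(xA) ≈ f′(xA)(xT − xA) with f(R) = 1/R:

```python
# T = 1/R with R = 8.3143 + eps, |eps| <= 0.0012.
# f(R) = 1/R has f'(R) = -1/R**2, so |T_true - T_A| is roughly |eps| / R_A**2.
R_A, eps_bound = 8.3143, 0.0012

T_A = 1 / R_A                        # about 0.12027
T_err_bound = eps_bound / R_A**2     # about 1.74e-05

print(T_A, T_err_bound)
```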
Example 2.3.3 in [Atkinson-Han]
• f(x) = b^x, where b is a positive constant.

• Since f′(x) = (log b) b^x, the relative error satisfies
  Rel(b^xA) ≈ f′(xT) (xT − xA) / f(xT) = (log b)(xT − xA) = (log b) xT ⋅ Rel(xA)

• K = (log b) xT is called a condition number.

• If K is large, then the relative error in b^xA will be much larger than the relative error in xA.
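A numerical illustration of the condition number (a sketch; b and xT here are made-up values):

```python
from math import log

b, xT = 10.0, 50.0                    # hypothetical constants
xA = 50.0001                          # small perturbation of xT

K = log(b) * xT                       # condition number, about 115
rel_in  = (xT - xA) / xT
rel_out = (b**xT - b**xA) / b**xT

print(K, rel_in, rel_out)             # rel_out is roughly K * rel_in, about 115x larger
```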
Total calculation error
• When using floating-point format, the calculation of xA ω yA has an additional error, caused by rounding xA ω yA to round(xA ω yA).

• Total error (assume we first calculate xA ω yA exactly, then round):

  • xT ω yT − round(xA ω yA) = (xT ω yT − xA ω yA) + (xA ω yA − round(xA ω yA))

  • Total error = propagated error + rounding error

• Correctly rounded: round(xA ω yA) = (1 + ϵ)(xA ω yA), where |ϵ| is less than the machine epsilon (e.g., in IEEE double precision, |ϵ| < 2^−52)
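The decomposition is an exact identity, which a short check makes concrete (a sketch with made-up 6-digit data, ω = ×, and results kept to 2 decimal digits):

```python
from fractions import Fraction

xT, yT = Fraction("3.14159"), Fraction("2.71828")   # hypothetical true values
xA, yA = Fraction("3.14"),    Fraction("2.72")      # rounded inputs
rounded_product = Fraction("8.54")                  # xA*yA = 8.5408, rounded to 2 digits

propagated = xT * yT - xA * yA
rounding   = xA * yA - rounded_product
total      = xT * yT - rounded_product

assert total == propagated + rounding
print(float(propagated), float(rounding), float(total))
```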
Interval arithmetic
• Suppose we know a range of xT − xA, yT − yA. We want to find an interval that is guaranteed to contain xT ω yT.

• Example: xA = 3.14, yA = 2.651, obtained by correctly rounding xT, yT to the numbers of digits shown.
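The slide stops here, but the idea can be sketched for this example: correct rounding to the digits shown puts xT in [3.135, 3.145] and yT in [2.6505, 2.6515], and combining interval endpoints then bounds the result (the choice of ω below is illustrative):

```python
x_lo, x_hi = 3.135, 3.145        # xA = 3.14 correctly rounded => xT lies in this interval
y_lo, y_hi = 2.6505, 2.6515      # yA = 2.651 correctly rounded => yT lies in this interval

# Sum: endpoints add.
print(x_lo + y_lo, x_hi + y_hi)  # xT + yT lies in [5.7855, 5.7965]

# Product: both intervals are positive, so the extremes come from matching endpoints.
print(x_lo * y_lo, x_hi * y_hi)  # xT * yT lies in [8.3093..., 8.3389...]
```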
