CS137Part03 Floats Mathlib Root Finding Post
$-2.61202 \cdot 10^{30}$
There are many different ways we can display these numbers using
the printf command. In general they have the format %±m.pX
where
• ± is an optional flag: − left-justifies the number within its
field (the default is right justification), while + forces a sign
to be printed
• m is the minimum field width, that is, the minimum number of
characters the number occupies
• p is the precision (what it means depends heavily on X)
• X is a letter specifying the conversion type (see next slide)
Conversion Specifications Continued
Write the code that displays the following numbers (Ensure you
get the white space correct as well!)
1. 3.14150e+10
2. 0436 (two leading white spaces)
3. 436 (three white spaces at the end)
4. 2.00001
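One possible set of answers to the exercise is sketched below. The helper name `format_exercise` and the `|` delimiters are assumptions added here so that the leading and trailing white space is visible; other format strings (for example `%g` for the last number) would work as well.

```c
#include <stdio.h>

/* Writes the four exercise answers into buf, one per line.
   The '|' characters are only there to make the padding visible. */
int format_exercise(char *buf, size_t size) {
    return snprintf(buf, size,
                    "|%.5e|\n"   /* 1. five digits after the point, e-notation  */
                    "|%6.4d|\n"  /* 2. at least 4 digits, right-justified in 6  */
                    "|%-6d|\n"   /* 3. left-justified in a field of width 6     */
                    "|%.5f|\n",  /* 4. five digits after the decimal point      */
                    3.1415e10, 436, 436, 2.00001);
}
```

Printing the buffer with `fputs(buf, stdout)` should show `|  0436|` for the second line and `|436   |` for the third.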
IEEE 754 Floating Point Standard
• Number is
$$(-1)^{\text{sign}} \cdot \text{fraction} \cdot 2^{\text{exponent}}$$
(This is a bit of a lie but good enough for us - the details of
this can get messy. See Wikipedia if you want more
information)
(Picture courtesy of Wikipedia)
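To see the sign, fraction and exponent of a particular double, the standard library function `frexp` can be used. The sketch below is only an illustration: the helper name `decompose` is an assumption, and `frexp` normalizes the fraction to the range [0.5, 1), which is not the same normalization the IEEE 754 bit layout uses.

```c
#include <math.h>

/* Splits a finite, nonzero x so that
   x == sign * fraction * 2^exponent with 0.5 <= fraction < 1. */
void decompose(double x, int *sign, double *fraction, int *exponent) {
    *sign = signbit(x) ? -1 : 1;            /* -1 for negative x, +1 otherwise */
    *fraction = frexp(fabs(x), exponent);   /* frexp also stores the exponent  */
}
```

For example, −6.0 decomposes into sign −1, fraction 0.75 and exponent 3, since 0.75 · 2^3 = 6.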
A Fun Aside
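• From
$$0.1 = \frac{a_1}{2} + \frac{a_2}{4} + \frac{a_3}{8} + \dots + \frac{a_k}{2^k} + \dots$$
• Multiplying by 2 yields
$$0.2 = a_1 + \frac{a_2}{2} + \frac{a_3}{4} + \dots + \frac{a_k}{2^{k-1}} + \dots \qquad \text{(Eqn 1)}$$
and so $a_1 = 0$ since $0.2 < 1$.
• Repeating gives
$$0.4 = a_2 + \frac{a_3}{2} + \frac{a_4}{4} + \dots + \frac{a_k}{2^{k-2}} + \dots$$
and again $a_2 = 0$.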
Continuing
• From
$$0.4 = 0 + \frac{a_3}{2} + \frac{a_4}{4} + \dots + \frac{a_k}{2^{k-2}} + \dots$$
multiplying by 2 gives
$$0.8 = a_3 + \frac{a_4}{2} + \frac{a_5}{4} + \dots + \frac{a_k}{2^{k-3}} + \dots$$
and again $a_3 = 0$. Doubling again gives
$$1.6 = a_4 + \frac{a_5}{2} + \frac{a_6}{4} + \dots + \frac{a_k}{2^{k-4}} + \dots$$
and so $a_4 = 1$. Now, we subtract 1 from both sides and then
repeat to see that... (see next slide)
Continuing
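$$1.6 - 1 = \frac{a_5}{2} + \frac{a_6}{4} + \dots + \frac{a_k}{2^{k-4}} + \dots$$
$$0.6 = \frac{a_5}{2} + \frac{a_6}{4} + \dots + \frac{a_k}{2^{k-4}} + \dots$$
$$1.2 = a_5 + \frac{a_6}{2} + \frac{a_7}{4} + \dots + \frac{a_k}{2^{k-5}} + \dots$$
giving $a_5 = 1$ as well. At this point, subtracting 1 from both sides
gives
$$0.2 = \frac{a_6}{2} + \frac{a_7}{4} + \dots + \frac{a_k}{2^{k-5}} + \dots$$
which has the same form as (Eqn 1) from two slides ago (with $a_1 = 0$),
so the digits repeat from here on and hence,
$$(0.1)_{10} = (0.0\overline{0011})_2$$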
Short Hand
0.1 · 2 = 0.2
0.2 · 2 = 0.4
0.4 · 2 = 0.8
0.8 · 2 = 1.6
0.6 · 2 = 1.2
0.2 · 2 = 0.4
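The doubling procedure above can be sketched in code. To keep every step exact, this version works on a fraction num/den with integers rather than on a double; the function name `binary_fraction_digits` and its parameters are assumptions made for illustration.

```c
#include <assert.h>

/* Writes the first n binary digits of num/den (0 <= num < den) into out
   as '0'/'1' characters followed by '\0'. Each pass doubles the value;
   if the result reaches 1 the digit is 1 and we subtract 1 (here: den). */
void binary_fraction_digits(unsigned num, unsigned den, int n, char *out) {
    assert(den > 0 && num < den);
    for (int i = 0; i < n; i++) {
        num *= 2;
        if (num >= den) {
            out[i] = '1';
            num -= den;
        } else {
            out[i] = '0';
        }
    }
    out[n] = '\0';
}
```

Calling it with num = 1, den = 10 and n = 12 gives the digits 000110011001, matching the repeating pattern derived by hand.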
Be wary of...
• Subtracting nearly equal numbers
• Dividing by very small numbers
• Multiplying by very large numbers
• Testing for equality
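As an illustration of the last point, a common workaround is to compare with a tolerance instead of using ==. The helper below is a minimal sketch; the name `nearly_equal` and the choice of an absolute tolerance are assumptions, and a relative tolerance may suit very large or very small values better.

```c
#include <math.h>
#include <stdbool.h>

/* Returns true when x and y differ by less than epsilon. */
bool nearly_equal(double x, double y, double epsilon) {
    return fabs(x - y) < epsilon;
}
```

On typical IEEE 754 doubles, `0.1 + 0.2 == 0.3` evaluates to false because none of the three values is stored exactly, while `nearly_equal(0.1 + 0.2, 0.3, 1e-9)` is true.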
An Example
• Given some a and b with f(a) > 0 and f(b) < 0, set
m = (a + b)/2.
• If f(m) < 0, set b = m.
• Otherwise, set a = m.
• Loop until either $|f(m)| < \epsilon$, $|m_{n-1} - m_n| < \epsilon$, or the
maximum number of iterations has been reached.
Bisection.h
#ifndef BISECTION_H
#define BISECTION_H

/*
  Pre:  None
  Post: Returns the value of x - cos(x)
*/
double f(double x);

/*
  Pre:  epsilon > 0 is a tolerance, iterations > 0,
        f(x) has only one root in [a, b], f(a) f(b) < 0
  Post: Returns an approximate root of f(x) using the
        bisection method. Stops when either the number of
        iterations is exceeded or |b - a| < epsilon
        (alternatively, |f(m)| < epsilon)
*/
double bisect(double a, double b,
              double epsilon, int iterations);

#endif
Bisection.c (Note: Squished code at the bottom!)
#include <assert.h>
#include <math.h>
#include "bisection.h"

double f(double x) { return x - cos(x); }

double bisect(double a, double b,
              double epsilon, int iterations) {
  double m = a;
  double fb = f(b);  // Why is this a good idea?
  assert(epsilon > 0.0 && f(a) * fb < 0);
  for (int i = 0; i < iterations; i++) {
    m = (a + b) / 2.0;
    if (fabs(b - a) < epsilon) return m;
    // Alternatively:
    // if (fabs(f(m)) < epsilon) return m;
    double fm = f(m);   // evaluate f at the midpoint only once
    if (fm * fb > 0) {  // f(m) and f(b) share a sign: keep [a, m]
      b = m;
      fb = fm;
    } else {            // otherwise keep [m, b]
      a = m;
    }
  }
  return m;
}
Main.c
$$x_0 = 0$$
$$g(x_0) = 1$$
$$g(g(x_0)) = g(1) = 0.540$$
$$g(g(g(x_0))) = g(g(1)) = g(0.540) = 0.858$$
$$g(g(g(g(x_0)))) = g(g(g(1))) = g(g(0.540)) = g(0.858) = 0.654$$
Bisection2.h
#ifndef BISECTION2_H
#define BISECTION2_H

double bisect2(double a, double b,
               double epsilon, int iterations,
               double (*f)(double));

#endif
Bisection2.c
#include <assert.h>
#include <math.h>
#include "bisection2.h"

double bisect2(double a, double b,
               double epsilon, int iterations,
               double (*f)(double)) {
  double m = a;
  double fb = f(b);
  assert(epsilon > 0.0 && f(a) * fb < 0);
  for (int i = 0; i < iterations; i++) {
    m = (a + b) / 2.0;
    if (fabs(b - a) < epsilon) return m;
    // Alternatively:
    // if (fabs(f(m)) < epsilon) return m;
    double fm = f(m);   // evaluate f at the midpoint only once
    if (fm * fb > 0) {  // f(m) and f(b) share a sign: keep [a, m]
      b = m;
      fb = fm;
    } else {            // otherwise keep [m, b]
      a = m;
    }
  }
  return m;
}
Main.c