DR - Shoeb ME212 Lec-3
Numerical Methods
Binary Representation
How a Decimal Number is Represented
Base 2
Convert Base 10 Integer to Binary Representation
Table 1. Converting a base-10 integer to binary representation.

Operation   Quotient   Remainder
11/2        5          1 = $a_0$
5/2         2          1 = $a_1$
2/2         1          0 = $a_2$
1/2         0          1 = $a_3$

Hence
$(11)_{10} = (a_3 a_2 a_1 a_0)_2 = (1011)_2$
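Flowchart: converting a base-10 integer to binary
1. Start. Input $(N)_{10}$, the integer N to be converted to binary format.
2. Set i = 0.
3. Divide N by 2 to get quotient Q and remainder R.
4. Set $a_i = R$.
5. Is Q = 0? If no, set i = i + 1 and N = Q, then go back to step 3.
6. If yes, set n = i. Then $(N)_{10} = (a_n \ldots a_0)_2$.
7. Stop.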
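The repeated-division procedure can be sketched in Python as follows. This is a minimal illustration, not part of the original slides; the function name `int_to_binary` is hypothetical, and the sketch assumes a non-negative integer input.

```python
def int_to_binary(n: int) -> str:
    """Convert a non-negative base-10 integer to a binary digit string
    by repeated division by 2 (remainders give a0, a1, a2, ...)."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    digits = []                    # remainders in the order a0, a1, a2, ...
    while True:
        q, r = divmod(n, 2)        # quotient Q and remainder R
        digits.append(str(r))      # a_i = R
        if q == 0:                 # stop once the quotient reaches zero
            break
        n = q                      # otherwise continue with N = Q
    return "".join(reversed(digits))   # written out as a_n ... a_0


print(int_to_binary(11))   # 1011
```

The remainders come out least-significant first, so they are reversed before being joined.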
Fractional Decimal Number to Binary
Table 2. Converting a base-10 fraction to binary representation.

Operation      Number   Number after decimal   Number before decimal
0.1875 × 2     0.375    0.375                  0 = $a_{-1}$
0.375 × 2      0.75     0.75                   0 = $a_{-2}$
0.75 × 2       1.5      0.5                    1 = $a_{-3}$
0.5 × 2        1.0      0.0                    1 = $a_{-4}$

Hence
$(0.1875)_{10} = (a_{-1} a_{-2} a_{-3} a_{-4})_2 = (0.0011)_2$
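Flowchart: converting a base-10 fraction to binary
1. Start. Input $(F)_{10}$, the fraction F to be converted to binary format.
2. Set i = -1.
3. Multiply F by 2 to get the number before the decimal, S, and the number after the decimal, T.
4. Set $a_i = S$.
5. Is T = 0? If no, set i = i - 1 and F = T, then go back to step 3.
6. If yes, set n = -i. Then $(F)_{10} = (a_{-1} \ldots a_{-n})_2$.
7. Stop.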
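The repeated-multiplication procedure can be sketched in Python as follows. This is an illustrative sketch only: the name `fraction_to_binary` and the `max_bits` cap are assumptions added here (the flowchart has no cap, so it would loop forever on a fraction whose binary expansion never terminates). `fractions.Fraction` is used so that the arithmetic is exact.

```python
from fractions import Fraction


def fraction_to_binary(f, max_bits: int = 32) -> str:
    """Convert a base-10 fraction in [0, 1) to a binary string by
    repeated multiplication by 2. `f` may be a string such as "0.1875"."""
    f = Fraction(f)                # exact rational arithmetic
    if not 0 <= f < 1:
        raise ValueError("expected a fraction in [0, 1)")
    bits = []                      # digits a_-1, a_-2, ...
    while f != 0 and len(bits) < max_bits:
        f *= 2
        s = int(f)                 # number before the decimal point (0 or 1)
        bits.append(str(s))
        f -= s                     # number after the decimal point, T
    return "0." + ("".join(bits) or "0")


print(fraction_to_binary("0.1875"))   # 0.0011
```

Passing the fraction as a string avoids the rounding that would occur if a literal such as `0.3` were first stored as a Python float.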
Decimal Number to Binary
$(11.1875)_{10} = (?.?)_2$
Since
$(11)_{10} = (1011)_2$
and
$(0.1875)_{10} = (0.0011)_2$
we have
$(11.1875)_{10} = (1011.0011)_2$
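Not All Fractional Decimal Numbers Can be Represented Exactly
Table 3. Converting a base-10 fraction to approximate binary representation.

Operation   Number   Number after decimal   Number before decimal
0.3 × 2     0.6      0.6                    0 = $a_{-1}$
0.6 × 2     1.2      0.2                    1 = $a_{-2}$
0.2 × 2     0.4      0.4                    0 = $a_{-3}$
0.4 × 2     0.8      0.8                    0 = $a_{-4}$
0.8 × 2     1.6      0.6                    1 = $a_{-5}$

The number after the decimal has returned to 0.6, so the digits repeat and the expansion never terminates. Truncating after five bits gives
$(0.3)_{10} \approx (0.01001)_2 = 0.28125$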
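One way to see why 0.3 has no exact binary form is to watch for a repeated "number after decimal" during the multiplication steps. The sketch below is an illustration added here, not the slides' method; `binary_expansion` and `max_bits` are hypothetical names. It returns the bits generated so far and the position at which they start repeating.

```python
from fractions import Fraction


def binary_expansion(text: str, max_bits: int = 64):
    """Return (bits, repeat_start) for a fraction in [0, 1).
    repeat_start is the index of the first repeating bit, or None
    if the expansion terminates within max_bits."""
    f = Fraction(text)
    seen = {}                      # fraction value -> index where it occurred
    bits = []
    while f != 0 and len(bits) < max_bits:
        if f in seen:              # same remainder again: digits now cycle
            return "".join(bits), seen[f]
        seen[f] = len(bits)
        f *= 2
        bit = int(f)               # number before the decimal point
        bits.append(str(bit))
        f -= bit                   # number after the decimal point
    return "".join(bits), None


bits, start = binary_expansion("0.3")
print(bits, start)                 # 01001 1
print(int(bits, 2) / 2 ** len(bits))   # 0.28125
```

For 0.3 the bits after the first one (1001) repeat indefinitely, which matches the table: the remainder 0.6 reappears after five steps.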
Another Way to Look at Conversion
$(11.1875)_{10} = (1011.0011)_2$
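The slide's worked steps did not survive extraction. Assuming the "other way" refers to reading each binary digit as a coefficient of a power of 2, the result can be checked with a short sketch; the function name `binary_to_decimal` is hypothetical.

```python
def binary_to_decimal(text: str) -> float:
    """Evaluate a binary string such as "1011.0011" by summing
    digit * 2**position over all digits."""
    int_part, _, frac_part = text.partition(".")
    value = 0.0
    # Digits left of the point carry weights 2**0, 2**1, 2**2, ...
    for k, bit in enumerate(reversed(int_part)):
        value += int(bit) * 2 ** k
    # Digits right of the point carry weights 2**-1, 2**-2, ...
    for k, bit in enumerate(frac_part, start=1):
        value += int(bit) * 2 ** -k
    return value


print(binary_to_decimal("1011.0011"))   # 11.1875
```

Here 1011.0011 evaluates as 8 + 2 + 1 + 0.125 + 0.0625 = 11.1875.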
Floating Point Representation
Floating Decimal Point: Scientific Form
The form is
$\text{sign} \times \text{mantissa} \times 10^{\text{exponent}}$
or
$\sigma \times m \times 10^{e}$
Example: For
$-2.5678 \times 10^{2}$
$\sigma = -1$
$m = 2.5678$
$e = 2$
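As a rough illustration of the scientific form, the sketch below splits a nonzero number into sign, mantissa, and base-10 exponent. The function name `scientific_form` is hypothetical, and the normalization 1 ≤ m < 10 is an assumption consistent with the example above. Because it relies on floating-point `log10`, results very close to a power of 10 may be off by one in the exponent.

```python
import math


def scientific_form(x: float):
    """Return (sigma, m, e) such that x = sigma * m * 10**e,
    with sigma = +1 or -1 and 1 <= m < 10 (x must be nonzero)."""
    if x == 0:
        raise ValueError("zero has no normalized scientific form")
    sigma = -1 if x < 0 else 1
    e = math.floor(math.log10(abs(x)))   # integer exponent
    m = abs(x) / 10 ** e                 # mantissa
    return sigma, m, e


sigma, m, e = scientific_form(-256.78)
print(sigma, round(m, 4), e)
```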
Floating Point Format for Binary Numbers
$y = \sigma \times m \times 2^{e}$
$\sigma$ = sign of number (0 for +ve, 1 for -ve)
$m$ = mantissa, with $(1)_2 \le m < (10)_2$
The leading 1 of the mantissa is not stored, since it is always 1.
$e$ = integer exponent
Example
9-bit hypothetical word
▪ the first bit is used for the sign of the number,
▪ the second bit for the sign of the exponent,
▪ the next four bits for the mantissa, and
▪ the next three bits for the exponent

0               | 0                | 1 0 1 1  | 1 0 1
sign of number  | sign of exponent | mantissa | exponent
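The 9-bit word can be decoded with a few lines of Python. This is a sketch under the layout described above; the function name `decode_9bit` is hypothetical, and it assumes the exponent sign bit follows the same convention as the number sign bit (0 for positive, 1 for negative) and that the leading 1 of the mantissa is implied.

```python
def decode_9bit(bits: str) -> float:
    """Decode the hypothetical 9-bit word:
    [sign of number][sign of exponent][4 mantissa bits][3 exponent bits]."""
    if len(bits) != 9 or set(bits) - {"0", "1"}:
        raise ValueError("expected a string of exactly nine 0/1 characters")
    sign = -1 if bits[0] == "1" else 1
    exp_sign = -1 if bits[1] == "1" else 1
    # Implied leading 1: the stored bits are the fractional part of (1.xxxx)_2
    mantissa = 1 + int(bits[2:6], 2) / 2 ** 4
    exponent = exp_sign * int(bits[6:9], 2)
    return sign * mantissa * 2.0 ** exponent


print(decode_9bit("001011101"))   # 54.0
```

For the word shown, the mantissa is $(1.1011)_2 = 1.6875$ and the exponent is $(101)_2 = 5$, so the value is $1.6875 \times 2^5 = 54$.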
Machine Epsilon
Machine epsilon is a measure of accuracy. It is the difference between 1 and the next larger number that can be represented.
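Machine epsilon can be found experimentally by halving a candidate until adding half of it to 1 no longer changes the result. The sketch below is illustrative; `machine_epsilon` is a hypothetical name. Python floats are double precision, so the value found is $2^{-52}$, not the single-precision value.

```python
import sys


def machine_epsilon() -> float:
    """Find the gap between 1.0 and the next larger float by halving."""
    eps = 1.0
    # Keep halving while 1 + eps/2 is still distinguishable from 1.
    while 1.0 + eps / 2 > 1.0:
        eps /= 2
    return eps


print(machine_epsilon())             # 2.220446049250313e-16
print(sys.float_info.epsilon)        # same value, reported by Python
```

By the same definition, the 9-bit hypothetical word with four mantissa bits would have a machine epsilon of $2^{-4} = 0.0625$.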
IEEE 754 Standards for Single Precision Representation
IEEE-754 Floating Point Standard
• Standardizes representation of floating point numbers on different computers in single and double precision.
• Standardizes representation of floating point operations on different computers.
One Great Reference
http://www.validlab.com/goldberg/paper.pdf
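IEEE-754 Format Single Precision
32 bits: 1 sign bit ($s$), 8 biased-exponent bits ($e'$), 23 mantissa bits ($m$)
$\text{Value} = (-1)^{s} \times (1.m)_2 \times 2^{e' - 127}$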
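The single-precision fields can be inspected with Python's standard `struct` module. The sketch below is an illustration, with hypothetical helper names `decode_single` and `value_from_fields`; it unpacks the sign, biased exponent, and mantissa bits, then rebuilds the value from the formula (valid for normal numbers only, $1 \le e' \le 254$).

```python
import struct


def decode_single(x: float):
    """Return (s, e_biased, m) for x stored as an IEEE-754 single."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))  # 32-bit pattern
    s = word >> 31                 # 1 sign bit
    e_biased = (word >> 23) & 0xFF # 8 biased-exponent bits
    m = word & 0x7FFFFF            # 23 mantissa bits
    return s, e_biased, m


def value_from_fields(s: int, e_biased: int, m: int) -> float:
    """Apply (-1)^s * (1.m)_2 * 2^(e' - 127) for a normal number."""
    return (-1) ** s * (1 + m / 2 ** 23) * 2.0 ** (e_biased - 127)


fields = decode_single(-5.5)
print(fields)                      # (1, 129, 3145728)
print(value_from_fields(*fields))  # -5.5
```

Here $-5.5 = -(1.011)_2 \times 2^{2}$, so the biased exponent is $2 + 127 = 129$.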
Exponent for 32 Bit IEEE-754
8 bits would represent
$0 \le e' \le 255$
Bias is 127, so subtract 127 from the representation:
$-127 \le e \le 128$
Exponent for Special Cases
Actual range of $e'$
$1 \le e' \le 254$
$e' = 0$ and $e' = 255$ are reserved for special numbers
Actual range of $e$
$-126 \le e \le 127$
Special Exponents and Numbers
$e' = 0$: all zeros
$e' = 255$: all ones

s        e'          m           Represents
0        all zeros   all zeros   0
1        all zeros   all zeros   -0
0        all ones    all zeros   ∞
1        all ones    all zeros   −∞
0 or 1   all ones    non-zero    NaN
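The special bit patterns in the table can be checked by assembling 32-bit words and reinterpreting them as single-precision numbers. This is an illustrative sketch using the standard `struct` module; `single_from_fields` is a hypothetical helper name.

```python
import math
import struct


def single_from_fields(s: int, e_biased: int, m: int) -> float:
    """Build the IEEE-754 single with the given sign, biased exponent,
    and mantissa bits, and return it as a Python float."""
    word = (s << 31) | (e_biased << 23) | m
    (x,) = struct.unpack(">f", struct.pack(">I", word))
    return x


print(single_from_fields(0, 0, 0))      # 0.0
print(single_from_fields(1, 0, 0))      # -0.0
print(single_from_fields(0, 255, 0))    # inf
print(single_from_fields(1, 255, 0))    # -inf
print(math.isnan(single_from_fields(0, 255, 1)))   # True
```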
IEEE-754 Format
Largest number by magnitude:
$(1.1\ldots1)_2 \times 2^{127} = 3.40 \times 10^{38}$
Machine epsilon:
$\epsilon_{mach} = 2^{-23} = 1.19 \times 10^{-7}$
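Both figures can be reproduced numerically. The sketch below is an illustration added here: it computes the largest single-precision magnitude as $(2 - 2^{-23}) \times 2^{127}$ (a mantissa of all ones) and the machine epsilon as $2^{-23}$, then cross-checks each against the corresponding bit pattern using the standard `struct` module.

```python
import struct

# Mantissa of 23 ones is (1.11...1)_2 = 2 - 2**-23; largest exponent is 127.
largest = (2 - 2.0 ** -23) * 2.0 ** 127
eps_single = 2.0 ** -23

# Cross-check against bit patterns: 0x7F7FFFFF is the largest finite single,
# and 0x3F800001 is the next single after 1.0.
(largest_bits,) = struct.unpack(">f", struct.pack(">I", 0x7F7FFFFF))
(next_after_one,) = struct.unpack(">f", struct.pack(">I", 0x3F800001))

print(f"{largest:.2e}")        # 3.40e+38
print(f"{eps_single:.2e}")     # 1.19e-07
print(largest == largest_bits, next_after_one - 1.0 == eps_single)
```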
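Taylor Series Revisited
What is a Taylor series?
Some examples of Taylor series which you must have seen:
$\cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots$
$\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots$
$e^{x} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$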
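The partial sums of these three series can be compared against the library functions. The sketch below is illustrative only; the function names and the `terms` parameter (number of terms kept) are assumptions introduced here.

```python
import math


def cos_series(x: float, terms: int) -> float:
    """Sum the first `terms` terms of 1 - x^2/2! + x^4/4! - ..."""
    return sum((-1) ** k * x ** (2 * k) / math.factorial(2 * k)
               for k in range(terms))


def sin_series(x: float, terms: int) -> float:
    """Sum the first `terms` terms of x - x^3/3! + x^5/5! - ..."""
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
               for k in range(terms))


def exp_series(x: float, terms: int) -> float:
    """Sum the first `terms` terms of 1 + x + x^2/2! + x^3/3! + ..."""
    return sum(x ** k / math.factorial(k) for k in range(terms))


x = 0.5
for n in (2, 4, 8):
    print(n, abs(cos_series(x, n) - math.cos(x)),
          abs(sin_series(x, n) - math.sin(x)),
          abs(exp_series(x, n) - math.exp(x)))
```

The printed errors shrink as more terms are kept.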
General Taylor Series
The general form of the Taylor series is given by
$f(x+h) = f(x) + f'(x)h + \frac{f''(x)}{2!}h^2 + \frac{f'''(x)}{3!}h^3 + \cdots$
provided that all derivatives of f(x) are continuous and exist in the interval [x, x+h].
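The general form can be applied directly when the function value and its derivatives at x are known. The sketch below is an illustration with a hypothetical helper `taylor_estimate`; the test function $f(x) = x^3$ is an example chosen here, not taken from the slides. Its derivatives at x = 4 are 64, 48, 24, and 6, and all higher derivatives vanish, so four terms recover f(6) exactly.

```python
import math


def taylor_estimate(derivatives, h: float) -> float:
    """Estimate f(x + h) from [f(x), f'(x), f''(x), ...] using
    sum of derivatives[k] * h**k / k!."""
    return sum(d * h ** k / math.factorial(k)
               for k, d in enumerate(derivatives))


# f(x) = x**3 at x = 4: f = 64, f' = 48, f'' = 24, f''' = 6
print(taylor_estimate([64, 48, 24, 6], 2))   # 216.0, which equals 6**3
```

Term by term: 64 + 48·2 + (24/2!)·4 + (6/3!)·8 = 64 + 96 + 48 + 8 = 216.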
2. http://numericalmethods.eng.usf.edu/topics/binary_representation.html
3. http://numericalmethods.eng.usf.edu/topics/floatingpoint_representation.html
4. http://numericalmethods.eng.usf.edu/topics/taylor_series.html
THE END