0% found this document useful (0 votes)
19 views43 pages

DR - Shoeb ME212 Lec-3

The document provides an introduction to numerical methods, focusing on binary representation of decimal numbers, floating point representation, and Taylor series. It explains how to convert decimal numbers to binary, the IEEE 754 standard for floating point representation, and includes examples and references for further reading. The content is aimed at students in mechanical engineering, particularly in the context of numerical methods.

Uploaded by

24304054joge
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
19 views43 pages

DR - Shoeb ME212 Lec-3

The document provides an introduction to numerical methods, focusing on binary representation of decimal numbers, floating point representation, and Taylor series. It explains how to convert decimal numbers to binary, the IEEE 754 standard for floating point representation, and includes examples and references for further reading. The content is aimed at students in mechanical engineering, particularly in the context of numerical methods.

Uploaded by

24304054joge
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 43

Introduction to

Numerical
Methods

Numerical Methods (ME 212)

Author: Dr. Shoeb Ahmed Syed


Papua New Guinea University of Technology
Department of Mechanical Engineering

1
Binary Representation
How a Decimal Number is
Represented

257.76 = 210 2 + 5101 + 7 100 + 7 10 −1 + 610 −2

3
Base 2

 (1 23 + 0 2 2 +1 21 +1 20 ) 


(1011.0011) 2 =  
 + (0 2 −1 + 0 2 −2 +1 2 −3 + 1 2−4 ) 
 10
= 11.1875

4
Convert Base 10 Integer to
binary representation
Table 1 Converting a base-10 integer to binary representation.
Quotient Remainder
11/2 5 1 = a0
5/2 2 1 = a1
2/2 1 0 = a2
1/2 0 1 = a3
Hence
(11)10 = (a3 a2 a1a0 ) 2
= (1011) 2

5
Start

Integer N to be
Input (N)10 converted to binary
format

i=0

Divide N by 2 to get
quotient Q & remainder R

i=i+1,N=Q ai = R

No
Is Q = 0?

Yes

n=i
(N)10 = (an. . .a0)2

STOP

6
Fractional Decimal Number
to Binary
Table 2. Converting a base-10 fraction to binary representation.
Number before
Number Number after decimal
decimal
0.1875  2 0.375 0.375 0 = a−1
0.375 2 0.75 0.75 0 = a−2
0.75 2 1.5 0.5 1 = a−3
0.5 2 1.0 0.0 1 = a−4

Hence
(0.1875)10 = (a−1a− 2 a− 3a− 4 )2
= (0.0011)2

7
Start

Fraction F to be
Input (F)10 converted to binary
format

i = −1

Multiply F by 2 to get
number before decimal,
S and after decimal, T

i = i − 1, F = T
ai = R

No
Is T =0?

Yes
n=i
(F)10 = (a-1. . .a-n)2

STOP

8
Decimal Number to Binary
(11.1875)10 = ( ?.? )2
Since
(11)10 = (1011) 2
and
(0.1875)10 = (0.0011) 2

we have
(11.1875)10 = (1011.0011) 2

9
All Fractional Decimal Numbers
Cannot be Represented Exactly
Table 3. Converting a base-10 fraction to approximate binary representation.
Number Number
Number after before
decimal Decimal
0.3 2 0.6 0.6 0 = a−1
0.6 2 1.2 0.2 1 = a−2
0.2  2 0.4 0.4 0 = a−3
0.4  2 0.8 0.8 0 = a−4
0.8 2 1.6 0.6 1 = a−5

(0.3)10  (a−1a−2 a−3a−4 a−5 ) 2 = (0.01001) 2 = 0.28125

10
Another Way to Look at
Conversion

Convert (11.1875)10 to base 2


(11)10 = 23 + 3
= 23 + 21 +1
= + 21 + 20
3
2
= 1 23 + 0 2 2 +1 21 +1 20
=
(1011)2
11
(0.1875)10 = 2−3 + 0.0625
−3
= 2 + 2 −4
= 0 2 −1 + 0 2 −2 +1 2 −3 +1 2 −4
= (.0011)2

(11.1875)10 = (1011.0011)2
12
Floating Point Representation
Floating Decimal Point : Scientific Form

256.78 is written as + 2.5678×10 2

0.003678 is written as + 3.678×10 −3

− 256.78 is written as − 2.5678 ×10 2


Example

The form is
sign × mantissa ×10exponent
or
σ × m ×10e
Example: For
− 2.5678×10 2
σ = −1
m = 2.5678
e=2
Floating Point Format for Binary
Numbers

y = σ × m × 2e
σ = sign of number (0for + ve,1for - ve)
m = mantissa [(1)<
2
m < (10)2 ]
1 is not stored as it is always given to be 1.
e = integer exponent
Example
9 bit-hypothetical word
▪the first bit is used for the sign of the number,
▪the second bit for the sign of the exponent,
▪the next four bits for the mantissa, and
▪the next three bits for the exponent

(54.75)10 = (110110.11) =2(1.1011011)2 × 25


≅ (1.1011)2 × (101)2
We have the representation as

0 0 1 0 1 1 1 0 1

mantissa exponent
Sign of the Sign of the
number exponent
Machine Epsilon
Defined as the measure of accuracy and found
by difference between 1 and the next number
that can be represented
IEEE 754 Standards for Single
Precision Representation
IEEE-754 Floating Point
Standard
• Standardizes representation of
floating point numbers on
different computers in single and
double precision.

• Standardizes representation of
floating point operations on
different computers.
One Great Reference

What every computer scientist (and even if


you are not) should know about floating point
arithmetic!

http://www.validlab.com/goldberg/paper.pdf
IEEE-754 Format Single
Precision

32 bits for single precision


0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Sign Biased Mantissa (m)


(s) Exponent (e’)

.
Value = (−1) × (1 m )2 ×2 e'−127
s
Exponent for 32 Bit IEEE-754
8 bits would represent
0 ≤ e′ ≤ 255
Bias is 127; so subtract 127 from
representation
−127 ≤ e ≤ 128
Exponent for Special Cases
Actual range of e′
1 ≤ e′ ≤ 254
e′ = 0 and e′ = 255 are reserved for special numbers

Actual range of e
−126 ≤ e ≤ 127
Special Exponents and Numbers
e′ = 0 all zeros
e′ = 255 all ones
s e′ m Represents
0 all zeros all zeros 0
1 all zeros all zeros -0
0 all ones all zeros ∞
1 all ones all zeros −∞
0 or 1 all ones non-zero NaN
IEEE-754 Format

The largest number by magnitude

(1.1........1)2 × 127
= 3.40×10 38

The smallest 2number by magnitude


(1.00......0)2 × 2 −126
= 2.18×10 −38

Machine epsilon
ε mach = 2−23 = 1.19 ×10−7
Taylor Series Revisited
What is a Taylor series?
Some examples of Taylor series which you must have
seen
x2
x 4 x6
cos(x) = 1− + − +
2! 4! 6!
x3 x5 x7
sin(x) = x − + − +
3! 5! 7!
x2 x3
e = 1+ x +
x
+ +
2! 3!
General Taylor Series
The general form of the Taylor series is given by
f  (x ) 2 f (x ) 3
f (x + h ) = f ( x ) + f (x )h + h + h+
2! 3!
provided that all derivatives of f(x) are continuous and
exist in the interval [x,x+h]

What does this mean in plain English?


As Archimedes would have said, “Give me the value of the function at
a single point, and the value of all (first, second, and so on) its
derivatives at that single point, and I can give you the value of the
function at any other point” (fine print excluded)
Example—Taylor Series
Find the value of f (6) given that f (4) = 125, f (4) = 74,
f  (4) = 30, f (4 ) = 6 and all other higher order derivatives
of f (x ) at x = 4 are zero.
Solution:
f (x + h ) = f (x ) + f (x )h +f  (x ) + f (x ) +
h2 h3
2! 3!
x=4
h=6−4= 2
References
1. http://numericalmethods.eng.usf.edu

2. http://numericalmethods.eng.usf.edu/topics/binary_repr
esentation.html

3.http://numericalmethods.eng.usf.edu/topics/floatingpoint_re
presentation.html

4. http://numericalmethods.eng.usf.edu/topics/taylor_seri
es.html
THE END

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy