
Numerical Analysis Guide

The document discusses numerical analysis and various computational methods. It introduces algebraic computation and computer arithmetic. It also describes finite difference methods and numerical solutions to linear and non-linear systems.

Uploaded by

Keamogetswe
Copyright © All Rights Reserved

TABLE OF CONTENTS

CHAPTER 1
• Algebraic computation
• Computer programming and errors

CHAPTER 2
NON-LINEAR FUNCTIONS
• APPROXIMATION TO FUNCTIONS
  • Non-linear algebraic and transcendental equations
• ROOTS OF NON-LINEAR FUNCTIONS
  • Bisection method
  • Newton-Raphson method and secant method
  • False position method
  • Simple iteration method

CHAPTER 3
FINITE DIFFERENCE METHODS
• Finite difference #1
• Finite difference #2: forward, backward and central difference notations
  • The shift operator E
  • The forward difference operator ∆
  • The backward difference operator ∇
  • The central difference operator δ
  • Difference displays
  • Lagrange interpolation formula

CHAPTER 4
NUMERICAL SOLUTION OF LINEAR SYSTEMS
• LU decomposition method
• Gauss-Seidel method
• Curve fitting

CHAPTER 5
MATHEMATICAL MODELLING
• First-order differential equations and applications
• Difference equations and applications

REFERENCES
SECTION A

CHAPTER 1

INTRODUCTION

Numerical Analysis is an introductory text for students of engineering, science, mathematics and computer science. Its goals are straightforward: to describe algorithms for solving science and engineering problems, and to discuss the mathematical underpinnings of the algorithms. Numerical Analysis is so rich with ideas that there is a clear danger of presenting it as a bag of neat but unrelated tricks. For a deep understanding, it is essential for readers to learn more than how to code Newton's method, the bisection method and finite difference methods. They must absorb the big ideas, the ones that permeate numerical analysis and unify its competing concerns.

1.1 ALGEBRAIC COMPUTATION

1.1.1 COMPUTER ARITHMETIC

Algebraic computation (also called computer algebra) is the scientific area concerned with the study and development of algorithms and software for manipulating mathematical expressions and other mathematical objects.

The Fundamentals of Algorithms

The study of algorithms has an ancient pedigree. The study of computers as we know them may date back a hundred years or so, but the study of algorithms dates back a millennium or more. In fact, the word algorithm is derived from the name of the great Islamic mathematician, astronomer, geographer and all-round polymath Muhammad ibn Musa al-Khwarizmi, who was a member of Dar Al-Hikmah (the House of Knowledge) in Baghdad in the 800s. Algorithms are simply precise ways of achieving some task: follow the algorithm and the job is done. It may be a computer following the algorithm, but it doesn't have to be. For most of history, people have done so. Al-Khwarizmi was interested in algorithms for solving algebraic equations and for calculating with our "modern" Hindu-Arabic positional number system, which he introduced to the western world. In ninth-century Islamic society, reliable ways to calculate things such as inheritance shares, as required by the Qur'an, were an important practical need. It was vital to be sure you had a way of calculating such things that was guaranteed to get the right answer. That is what the study of algorithms is about, though algorithms can be devised to do much more than simple algebra. Every computer gadget you have ever used follows algorithms to do whatever it does.

Computer scientists both invent algorithms and study their properties. Algorithms have been devised to beat humans at games, fly planes, recognize faces, process DNA, send money around the world, crack codes, navigate you home, control your washing machine, detect your movements, write down the words you speak, paint works of art, write jokes, control nuclear power plants ... You name it. Any individual program, in fact, will involve a whole range of algorithms, some simple, some complex.

Software is the set of programs and other operating information used by a computer. In algebraic computation, it refers to the use of machines such as computers to manipulate mathematical expressions and equations.

How computers communicate

Most computers use ASCII code to represent text (words, sentences and paragraphs), which makes it possible to transfer data from one computer to another. ASCII is an acronym for American Standard Code for Information Interchange (it is a character set, not a language). It represents English characters as numbers, with each character assigned a number from 0 to 127. For example, the ASCII code for upper-case M is 77.

Computers can only understand numbers, so an ASCII code is the numerical representation of a character such as 'a' or '@', or of an action of some sort. ASCII was developed a long time ago, and the non-printing characters are now rarely used for their original purpose. The table below lists the first ASCII codes, which are non-printing control characters. ASCII was originally designed for use with teletypes, so the descriptions of these characters are somewhat obscure. If someone asks for your CV in ASCII format, all this means is that they want 'plain' text with no formatting such as tabs, bold or underlining: the raw format that any computer can understand. This is usually so that they can easily import the file into their own applications without issues. Notepad.exe creates plain text, and in MS Word you can save a file as 'text only'.

Importance of ASCII Code

ASCII allows the symbols on the keyboard to be displayed on the screen. All letters, digits, punctuation symbols and many other characters are given codes. For example, the ASCII code for "a" is 97 and for "A" is 65. You can demonstrate this in Notepad. First make sure that Num Lock is on. Hold down the Alt key and, keeping it held down, type 65 on the numeric keypad (the one on the right of your keyboard). Let go of the Alt key, and an "A" should appear. Try this with other numbers between 0 and 255, and you will get characters and symbols that are not on your keyboard.
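As a quick check of the codes quoted above, Python's built-in ord() and chr() convert between characters and their numeric codes. This is a small sketch rather than part of the original notes. Strictly, ord() returns the Unicode code point, which coincides with the ASCII code for characters 0 to 127.

```python
# ord() maps a character to its code; chr() maps a code back to its character.
# For characters 0-127 the Unicode code point equals the ASCII code.
codes = {ch: ord(ch) for ch in ("A", "a", "M", "@")}
print(codes)      # {'A': 65, 'a': 97, 'M': 77, '@': 64}

letters = chr(65) + chr(97)
print(letters)    # Aa
```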

The software in your computer determines which character each key produces. If you change this software (for example, the keyboard layout setting), the same keys can produce characters different from those printed on them.

ASCII has a limited set of characters and cannot support languages such as Chinese and Japanese, which have thousands of different characters. To use Chinese and Japanese characters, you need a character set called Unicode, which is supported on modern computers. Another code is EBCDIC, an acronym for Extended Binary Coded Decimal Interchange Code. EBCDIC does not store the letters of the alphabet in one continuous sequence, which can create problems when alphabetising words. EBCDIC is an IBM-designed code for representing characters as numbers. IBM is an acronym for International Business Machines, a leading U.S. computer manufacturer.

The disadvantage of ASCII is that it is biased towards the English language. It has no codes for the accented letters and other scripts needed by many other languages.

Storage of characters

When you write 123 using a text editor, the file does not store the number 123. Instead, it stores the ASCII codes for the characters "1", "2" and "3". These are 31, 32 and 33 in hexadecimal, or 0011 0001, 0011 0010 and 0011 0011 in binary.
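The stored bytes can be inspected directly. In this sketch, Python's str.encode("ascii") produces the bytes a text editor would write for the three characters. format() then shows each byte in hexadecimal and in 8-bit binary.

```python
# Encode the text "123" as ASCII and show the byte values that would be stored.
text = "123"
data = text.encode("ascii")                          # bytes 0x31 0x32 0x33
hex_codes = [format(byte, "02X") for byte in data]   # iterating bytes yields ints
bin_codes = [format(byte, "08b") for byte in data]
print(hex_codes)   # ['31', '32', '33']
print(bin_codes)   # ['00110001', '00110010', '00110011']
```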

ASCII TABLE (codes 0 to 15)

Decimal    Binary     Octal      Hexadecimal   Character
(base 10)  (base 2)   (base 8)   (base 16)
0          0000       0          0             NUL
1          0001       1          1             SOH
2          0010       2          2             STX
3          0011       3          3             ETX
4          0100       4          4             EOT
5          0101       5          5             ENQ
6          0110       6          6             ACK
7          0111       7          7             BEL
8          1000       10         8             BS
9          1001       11         9             TAB
10         1010       12         A             LF
11         1011       13         B             VT
12         1100       14         C             FF
13         1101       15         D             CR
14         1110       16         E             SO
15         1111       17         F             SI

The binary column of the table above uses a block of 4 bits, called a nibble (half a byte). A nibble can hold a maximum value of 1111 = 15 in decimal.

A block of 8 bits is called a byte and can hold a maximum value of 11111111 = 255 in decimal.

Extended ASCII Codes

Most computers manipulate data in 8-bit bytes, one byte per character. The original ASCII characters, numbered 0 to 127, need only 7 bits. A bit is a single binary value, either "1" or "0". A byte is a sequence of bits, usually 8 bits = 1 byte (e.g. 11000000 is one byte). Extended ASCII adds 128 extra characters, numbered 128 to 255, and since its introduction each character has been stored in 8 bits. The first 32 characters (codes 0 to 31) are control codes. They are non-printable, and programs such as Microsoft Word do not display them on screen.

The word "bit" is short for binary digit. The easiest way to understand bits is to compare them with something you already know, namely digits.

A digit is a single place that can hold numerical values between 0 and 9. Computers use the base 2 number system, also known as the binary system. This is because it is easy to implement with current electronic technology and is relatively cheap compared with base 10.

Digits are normally combined to create larger numbers. For example, 5,357 has four digits. It is understood that 7 fills the units place, 5 the tens place, 3 the hundreds place and 5 the thousands place.

If you want to be explicit, you could express this number as:

$$(5 \times 1000) + (3 \times 100) + (5 \times 10) + (7 \times 1) = 5357$$

Another way to express this is to use powers of 10:

$$(5 \times 10^3) + (3 \times 10^2) + (5 \times 10^1) + (7 \times 10^0) = 5357$$

The above expression is a polynomial of degree 3 in the base 10. Here 5, 3, 5 and 7 are the coefficients (5 being the leading coefficient), and 10 is the base.

Bits have only two possible values, 0 and 1, so a binary number is composed only of 0s and 1s, like 1011. How do you figure out the value of a binary number?

We do it in a similar way as we did for 5,357, but this time using base 2 instead of base 10:

$$(1011)_2 = 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0 = 11$$
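Both expansions above follow the same rule: multiply each digit by the base raised to the power of its position, then add the results. This is a minimal sketch. positional_value is a hypothetical helper name, and it handles only the digit characters 0 to 9, which is enough for bases 2 to 10.

```python
def positional_value(digits: str, base: int) -> int:
    """Return d_n*base**n + ... + d_1*base + d_0 for a string of digits 0-9."""
    n = len(digits) - 1
    total = 0
    for k, ch in enumerate(digits):
        total += int(ch) * base ** (n - k)   # digit times its place value
    return total

print(positional_value("5357", 10))  # 5357
print(positional_value("1011", 2))   # 11
```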

Why Computers Use Binary


Binary numbers – seen as strings of 0's and 1's – are often associated with computers. But
why is this? Why can't computers just use base 10 instead of converting to and from binary?
Isn't it more efficient to use a higher base, since binary (base 2) representation uses up more
"spaces"?

I was recently asked this question by someone who knows a good deal about computers. But
this question is also often asked by people who aren't so tech-savvy. Either way, the answer is
quite simple.

WHAT IS "DIGITAL"?
A modern-day "digital" computer, as opposed to an older "analog" computer, operates on the
principle of two possible states of something – "on" and "off". This directly corresponds to
there either being an electrical current present, or said electrical current being absent. The
"on" state is assigned the value "1", while the "off" state is assigned the value "0".

The term "binary" implies "two". Thus, the binary number system is a system of numbers
based on two possible digits – 0 and 1. This is where the strings of binary digits come in.

Each binary digit, or "bit", is a single 0 or 1, which directly corresponds to a single "switch" in a circuit. Add enough of these "switches" together, and you can represent more numbers. So instead of 1 digit, you end up with 8 to make a byte. (A byte, the basic unit of storage, is simply defined as 8 bits. The well-known kilobytes, megabytes and gigabytes are derived from the byte, and each is traditionally 1,024 times as large as the one before. There is a 1024-fold difference rather than a 1000-fold difference because 1024 is a power of 2 but 1000 is not.)

DOES BINARY USE MORE STORAGE THAN DECIMAL?

At first glance, the binary representation of a number, 10010110, seems to use up more space than its decimal (base 10) representation, 150. After all, the first is 8 digits long and the second is 3 digits long. However, this argument only concerns how numbers are displayed on screen, since they are all stored in binary regardless! The only reason that 150 is "smaller" than 10010110 is the way we write it on the screen (or on paper).

Increasing the base will decrease the number of digits required to represent any given number. However, following from the previous point, it is very difficult in practice to build a digital circuit that operates in any base other than 2. This is because a switch reliably distinguishes only two states, "on" and "off" (unless you get into quantum computers... more on this later).

WHAT ABOUT OCTAL AND HEX?

Octal (base 8) and hexadecimal (base 16) are simply "shortcuts" for representing binary numbers, as both of these bases are powers of 2. Each octal digit stands for exactly 3 binary digits and each hexadecimal digit for exactly 4. One byte (8 binary digits) is therefore written as 2 hex digits, or at most 3 octal digits. It is easier for the human programmer to write a 32-bit integer, often used for 32-bit colour values, as FF00EE99 instead of 11111111000000001110111010011001. Read the Bitwise Operators article for a more in-depth discussion of this.
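The correspondence between hexadecimal and binary digits can be checked with Python's format() specifiers "b", "o" and "X". This short sketch uses the 32-bit colour value from the text. Splitting the bits into groups of four shows that each hex digit stands for one 4-bit group.

```python
value = 0xFF00EE99                        # 32-bit colour value written in hexadecimal
bits = format(value, "032b")              # the same value as 32 binary digits
nibbles = [bits[i:i + 4] for i in range(0, 32, 4)]   # one hex digit <-> 4 bits
print(bits)      # 11111111000000001110111010011001
print(nibbles)   # ['1111', '1111', '0000', '0000', '1110', '1110', '1001', '1001']

# The number 150 from the previous section in binary, octal and hexadecimal.
reps_150 = (format(150, "b"), format(150, "o"), format(150, "X"))
print(reps_150)  # ('10010110', '226', '96')
```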

NON-BINARY COMPUTERS
Imagine a computer based on base-10 numbers. Then, each "switch" would have 10 possible
states. These can be represented by the digits (known as "bans" or "dits", meaning "decimal
digits") 0 through 9. In this system, numbers would be represented in base 10. This is not
possible with regular electronic components of today, but it is theoretically possible on a
quantum level.

Is this system more efficient? Assume the "switches" of a standard binary computer take up the same amount of physical space (nanometres) as these base-10 switches. Then the base-10 computer would be able to fit considerably more processing power into the same physical space. So the claim that binary is "inefficient" has some validity in theory, but not in practical use today.

WHY DO ALL MODERN-DAY COMPUTERS USE BINARY THEN?


Simple answer: Computers weren't initially designed to use binary... rather, binary was
determined to be the most practical system to use with the computers we did design.

Full answer: We only use binary because we currently do not have the technology to create "switches" that can reliably hold more than two possible states. (Quantum computers aren't exactly on sale at the moment.) The binary system was chosen only because it is quite easy to distinguish the presence of an electric current from its absence, especially when working with trillions of such connections. And using any other number base in this system would be ridiculous, because the system would need to constantly convert between them. That's all there is to it.

DECIMAL NUMBERS

The numbers commonly used in scientific work are the so-called real numbers. The basic arithmetic operations performed by a computer are addition, subtraction, multiplication and division. These real numbers are first converted into machine language, which consists of 0s and 1s (the binary system).

Representing decimal numbers in polynomial form

$(9998)_{10}$ can be represented as a polynomial in the form

$$(9998)_{10} = 9 \times 10^3 + 9 \times 10^2 + 9 \times 10^1 + 8 \times 10^0 = a_3 \beta^3 + a_2 \beta^2 + a_1 \beta^1 + a_0 \beta^0$$

where $a_3 = 9$ is called the leading coefficient and $\beta = 10$ is called the base of the polynomial.

Definition: Let $a_0, a_1, a_2, a_3, \dots, a_n$ be $n+1$ numbers with $a_n \neq 0$. Then the function

$$P(z) = a_n z^n + a_{n-1} z^{n-1} + a_{n-2} z^{n-2} + \dots + a_1 z + a_0$$

is called a polynomial of degree $n$, and $a_n$ is called its leading coefficient.

BINARY SYSTEM

A non-negative integer $N$ is represented in the binary system as

$$N = (a_n a_{n-1} a_{n-2} \dots a_1 a_0)_2 = a_n 2^n + a_{n-1} 2^{n-1} + a_{n-2} 2^{n-2} + \dots + a_1 2^1 + a_0 2^0$$

where the coefficients $a_k$ are either 0 or 1. Note that $N$ is again represented as a polynomial, but now in the base 2. Many computer systems used in scientific work operate internally in the binary system. We, the users of computers, however, prefer to work in the more familiar decimal system. It is therefore necessary to have some means of converting from decimal to binary when information is submitted to the computer, and from binary to decimal for output purposes.

Conversion of a binary number to a decimal number may be accomplished directly:

$$(11100)_2 = 1 \times 2^4 + 1 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 0 \times 2^0 = 28$$

$$(1101)_2 = 1 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 = 13$$

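HORNER'S ALGORITHM

Given the coefficients $a_0, a_1, a_2, a_3, \dots, a_n$ of the polynomial

$$P(z) = a_n z^n + a_{n-1} z^{n-1} + a_{n-2} z^{n-2} + \dots + a_1 z + a_0$$

and $z = \beta$ (the base of the polynomial), compute recursively the numbers $b_n, b_{n-1}, b_{n-2}, \dots, b_0$:

$$b_n = a_n$$
$$b_{n-1} = a_{n-1} + b_n z$$
$$b_{n-2} = a_{n-2} + b_{n-1} z$$
$$b_{n-3} = a_{n-3} + b_{n-2} z$$
$$\vdots$$
$$b_0 = a_0 + b_1 z$$

The last number computed is the value of the polynomial: $b_0 = P(\beta)$.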

The decimal equivalent of $(1101)_2$, computed using Horner's algorithm with $z = 2$, is obtained from

$b_3 = 1$
$b_2 = 1 + 2 \times 1 = 3$
$b_1 = 0 + 2 \times 3 = 6$
$b_0 = 1 + 2 \times 6 = 13$

This implies that $(1101)_2 = (13)_{10}$.

Similarly, the decimal equivalent of $(10000)_2$ is obtained from

$b_4 = 1$
$b_3 = 0 + 2 \times 1 = 2$
$b_2 = 0 + 2 \times 2 = 4$
$b_1 = 0 + 2 \times 4 = 8$
$b_0 = 0 + 2 \times 8 = 16$

This implies that $(10000)_2 = (16)_{10}$.
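Horner's recursion translates directly into a short loop. In this sketch, the coefficients are supplied leading coefficient first. horner_steps is a hypothetical name; it returns every $b_k$ so that the hand calculations above can be compared step by step.

```python
def horner_steps(coeffs, z):
    """Horner's algorithm for coeffs = [a_n, ..., a_1, a_0].

    Returns [b_n, b_{n-1}, ..., b_0]; the last entry b_0 equals P(z).
    """
    b = [coeffs[0]]                  # b_n = a_n
    for a in coeffs[1:]:
        b.append(a + b[-1] * z)      # b_k = a_k + b_{k+1} * z
    return b

print(horner_steps([1, 1, 0, 1], 2))     # [1, 3, 6, 13]    -> (1101)_2 = 13
print(horner_steps([1, 0, 0, 0, 0], 2))  # [1, 2, 4, 8, 16] -> (10000)_2 = 16
```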

Converting a decimal integer N into its binary equivalent can also be accomplished
by Horner’s algorithm.

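Conversion of decimal numbers to binary numbers

Example: Convert $(156)_{10}$ into the binary system using Horner's algorithm.

$$(156)_{10} = 1 \times 10^2 + 5 \times 10^1 + 6 \times 10^0$$

First convert the decimal digits and the base 10 into binary form:

$(1)_{10} = (1)_2$, $(5)_{10} = (101)_2$, $(6)_{10} = (110)_2$, $(10)_{10} = (1010)_2$

so that

$$(156)_{10} = (1)_2 \times (1010)_2^2 + (101)_2 \times (1010)_2^1 + (110)_2 \times (1010)_2^0$$

Using the algorithm with $\beta = (1010)_2$ and binary arithmetic:

$b_2 = a_2 = (1)_2$
$b_1 = a_1 + b_2 \beta = (101)_2 + (1)_2 \times (1010)_2 = (1111)_2$
$b_0 = a_0 + b_1 \beta = (110)_2 + (1111)_2 \times (1010)_2 = (10011100)_2$

Verify the answer: $(10011100)_2 = 2^7 + 2^4 + 2^3 + 2^2 = 128 + 16 + 8 + 4 = (156)_{10}$.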
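A computer carries out the same Horner steps with $\beta = 10$ in its own binary arithmetic. In this sketch, Python integers do the arithmetic and format(b, "b") displays each intermediate $b_k$ in binary. The output reproduces the values $(1)_2$, $(1111)_2$ and $(10011100)_2$ obtained by hand.

```python
digits = [1, 5, 6]          # decimal digits a_2, a_1, a_0 of 156
beta = 10                   # the base, (1010)_2 in binary

b = 0
binary_steps = []
for a in digits:
    b = a + b * beta        # Horner step; the first pass gives b_2 = a_2
    binary_steps.append(format(b, "b"))

print(binary_steps)         # ['1', '1111', '10011100']
```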

OCTAL SYSTEM

The octal system uses the base $\beta = 8$:

$$N = (a_n a_{n-1} a_{n-2} \dots a_1 a_0)_8 = a_n 8^n + a_{n-1} 8^{n-1} + a_{n-2} 8^{n-2} + \dots + a_1 8^1 + a_0 8^0$$

Example: Convert the following decimal numbers into octal numbers.

a) $(9978)_{10}$ b) $(998)_{10}$

Solution:

$$(9978)_{10} = 9 \times 10^3 + 9 \times 10^2 + 7 \times 10^1 + 8 \times 10^0$$

We first convert the decimal digits and the base 10 into octal:

$(9)_{10} = (11)_8$, $(8)_{10} = (10)_8$, $(7)_{10} = (7)_8$, $(10)_{10} = (12)_8$

so that

$$(9978)_{10} = (11)_8 \times (12)_8^3 + (11)_8 \times (12)_8^2 + (7)_8 \times (12)_8^1 + (10)_8 \times (12)_8^0$$

where $\beta = (12)_8$. Applying Horner's algorithm with octal arithmetic:

$b_3 = a_3 = (11)_8$
$b_2 = a_2 + b_3 \beta = (11)_8 + (11)_8 \times (12)_8 = (143)_8$
$b_1 = a_1 + b_2 \beta = (7)_8 + (143)_8 \times (12)_8 = (1745)_8$
$b_0 = a_0 + b_1 \beta = (10)_8 + (1745)_8 \times (12)_8 = (23372)_8$

Verify the answer:

$$(23372)_8 = 2 \times 8^4 + 3 \times 8^3 + 3 \times 8^2 + 7 \times 8^1 + 2 \times 8^0 = 8192 + 1536 + 192 + 56 + 2 = (9978)_{10}$$
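The octal calculation can be reproduced the same way. Python integers again do the Horner arithmetic, and format(b, "o") prints each $b_k$ in octal. int(..., 8) then performs the verification step. This is a sketch only, not part of the original method.

```python
digits = [9, 9, 7, 8]       # decimal digits a_3, a_2, a_1, a_0 of 9978

b = 0
octal_steps = []
for a in digits:
    b = a + b * 10          # Horner step with beta = (10)_10 = (12)_8
    octal_steps.append(format(b, "o"))

print(octal_steps)              # ['11', '143', '1745', '23372']
print(int(octal_steps[-1], 8))  # 9978, the verification step
```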

EXERCISES:

1. Convert the following binary numbers to decimal numbers:

a) $(1010)_2$ b) $(100110)_2$ c) $(100110101)_2$
d) $(11101111010)_2$ e) $(11101110)_2$ f) $(10101111010)_2$
g) $(111011110100011)_2$

2. Convert the following decimal numbers to binary form using Horner's algorithm:

a) $(4832)_{10}$ b) $(4921)_{10}$ c) $(428)_{10}$ d) $(998)_{10}$
e) $(189)_{10}$ f) $(46832)_{10}$ g) $(96832)_{10}$ h) $(668328)_{10}$

3. Convert the following decimal numbers to octal form using Horner's algorithm:

a) $(4832)_{10}$ b) $(4921)_{10}$ c) $(428)_{10}$ d) $(998)_{10}$
e) $(189)_{10}$ f) $(46832)_{10}$ g) $(96832)_{10}$ h) $(668328)_{10}$
1. COMPUTER ARITHMETIC AND ERRORS

How can we define 'error' in a computation? In its simplest form, it is the difference between the exact answer, $A$ say, and the computed answer, $\tilde{A}$. Hence, we can write

$$\text{ERROR} = A - \tilde{A}.$$

Since we are usually interested in the magnitude or absolute value of the error, we can also define

$$\text{ABSOLUTE ERROR} = |A - \tilde{A}|.$$

In practical calculations, it is important to obtain an upper bound on the error, i.e. a number $E$ such that

$$|\tilde{A} - A| < E.$$

Clearly, we would like $E$ to be small!

In practice we are often more interested in so-called 'relative error' than in absolute error, and we define

$$\text{RELATIVE ERROR} = \frac{|\tilde{A} - A|}{|A|}.$$

This is often expressed as a percentage. Hence, an 'error' of $10^{-5}$ may be a good or a bad 'relative error', depending on the answer. For example,

answer = 1000, error = $10^{-5}$: very good
answer = 1, error = $10^{-5}$: good
answer = $10^{-5}$, error = $10^{-5}$: very bad
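The effect described above can be seen numerically: the same absolute error gives very different relative errors. In this small sketch, absolute_error and relative_error are hypothetical helper names implementing the two definitions.

```python
def absolute_error(exact, computed):
    return abs(computed - exact)

def relative_error(exact, computed):
    return abs(computed - exact) / abs(exact)

# The same absolute error of 1e-5 applied to answers of very different size.
for answer in (1000.0, 1.0, 1e-5):
    computed = answer + 1e-5
    print(answer, absolute_error(answer, computed), relative_error(answer, computed))
```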

What are the possible sources of error in a computation?

1. Human error
2. Truncation error
3. Rounding error

Typical 'human errors' are

• Arithmetic errors
• Programming errors

These errors can be very hard to detect unless they give obviously incorrect solutions. In discussing errors, we shall assume that human errors are not present.

1.1 Truncation error

A truncation error is present when some infinite process is approximated by a finite process. For example, consider the Taylor series expansion

$$e^x = 1 + x + \frac{x^2}{2!} + \dots + \frac{x^n}{n!} + \dots$$

If this formula is used to calculate $f = e^{0.1}$, we get

$$f = 1 + 0.1 + \frac{(0.1)^2}{2!} + \frac{(0.1)^3}{3!} + \dots$$

Where do we stop the calculation? How many terms do we include? Theoretically, the calculation will never stop: there are always more terms to add on. If we do stop after a finite number of terms, we will not get the exact answer. For example, if we take the first five terms as the approximation, we get

$$\tilde{f} = 1 + 0.1 + \frac{(0.1)^2}{2!} + \frac{(0.1)^3}{3!} + \frac{(0.1)^4}{4!} \approx 1.105$$

For this calculation, the truncation error TE (i.e. minus the sum of the terms that have been chopped off) is

$$TE = \tilde{f} - f = -\frac{(0.1)^5}{5!} - \frac{(0.1)^6}{6!} - \dots$$

The numerical analyst might try to estimate the size of the truncation error, $|TE|$. In this example, we can easily get a rough estimate:

$$|TE| = \frac{(0.1)^5}{5!}\left(1 + \frac{0.1}{6} + \frac{(0.1)^2}{6 \times 7} + \frac{(0.1)^3}{6 \times 7 \times 8} + \dots\right) \le \frac{(0.1)^5}{5!}\left(1 + 0.1 + \frac{(0.1)^2}{1 \times 2} + \frac{(0.1)^3}{1 \times 2 \times 3} + \dots\right) = \frac{(0.1)^5}{5!} e^{0.1} \cong \frac{0.00001}{120} \times 1.105 \approx 10^{-7}$$

∴ the error in truncating to five terms is approximately $10^{-7}$ at $x = 0.1$.

In general it is much harder to estimate the truncation error!
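The truncation error and the rough bound can also be evaluated numerically. This sketch assumes that Python's math.exp gives $e^{0.1}$ to far more accuracy than the quantities being compared (double precision, about 16 significant digits).

```python
import math

x = 0.1
f_exact = math.exp(x)
f_tilde = sum(x**k / math.factorial(k) for k in range(5))   # first five terms, k = 0..4
te = f_tilde - f_exact                                      # TE = f~ - f, negative here
bound = x**5 / math.factorial(5) * math.exp(x)              # rough estimate derived above
print(f_tilde)   # about 1.1051708333
print(te)        # about -8.47e-08
print(bound)     # about 9.21e-08
```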

1.2 Rounding error

In order to introduce the idea of a rounding error, consider the calculation of $\tilde{f}$ above:

$1 = 1$
$\frac{0.1}{1!} = 0.1$
$\frac{(0.1)^2}{2!} = 0.005$
$\frac{(0.1)^3}{3!} = 0.000166\dot{6}$
$\frac{(0.1)^4}{4!} = 0.00000416\dot{6}$

Summing the above gives $\tilde{f} = 1.10517083\dot{3}$.

The exact answer to the truncated problem, $\tilde{f}$, is an infinite string of digits and, as such, is not very useful. Since we know that it is in error in the seventh decimal place, we could round it to six or seven decimal places. For example, rounding to six decimal places gives

$$\tilde{f} \cong 1.105171 = \hat{f}$$

where the usual rounding process has been adopted: if the next figure is 0, 1, 2, 3 or 4, round down; if it is 5, 6, 7, 8 or 9, round up. The difference between $\hat{f}$ and $\tilde{f}$,

$$RE = \hat{f} - \tilde{f} = 0.00000016\dot{6},$$

is the rounding error RE. Using the usual rounding process (and rounding to six decimal places), the rounding error is always bounded by $\frac{1}{2} \times 10^{-6}$. Thus, in computing the answer

$$e^{0.1} \approx \hat{f} = 1.105171$$

two errors are present, and we have

$$\text{ERROR} = \hat{f} - f = (\hat{f} - \tilde{f}) + (\tilde{f} - f) = RE + TE$$

$$|\text{ERROR}| \le |RE| + |TE| \approx \frac{1}{2} \times 10^{-6} + 10^{-7} \approx \frac{1}{2} \times 10^{-6}$$

Note that in this case the actual error is dominated by ROUNDING.
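The split of the total error into RE and TE can be checked in the same way. In this sketch, Python's round() rounds to six decimal places. round() sends exact ties to the even digit rather than up, but that makes no difference here because the seventh decimal of $\tilde{f}$ is 8.

```python
import math

f_exact = math.exp(0.1)
f_tilde = sum(0.1**k / math.factorial(k) for k in range(5))  # truncated series
f_hat = round(f_tilde, 6)                                     # rounded to six decimals

re = f_hat - f_tilde        # rounding error RE, about +1.67e-07
te = f_tilde - f_exact      # truncation error TE, about -8.47e-08
error = f_hat - f_exact     # total ERROR = RE + TE
print(f_hat, re, te, error)
```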

1.3 COMPUTER ARITHMETIC

Computers allocate a fixed amount of storage to every number they use. Each number is stored as a string of digits. In practice, so-called floating point numbers are used. A computer using four-digit decimal arithmetic with floating point numbers would store

37.31 as $(0.3731, 2) = 0.3731 \times 10^2$
0.00004717 as $(0.4717, -4) = 0.4717 \times 10^{-4}$
0.03 as $(0.3000, -1) = 0.3 \times 10^{-1}$
14.211 as $(0.1421, 2) = 0.1421 \times 10^2$

The number pair $(p, q)$ is called a floating point number. $p$ is called the MANTISSA (or REAL PART) and $q$ is the CHARACTERISTIC (or INDEX or EXPONENT). The mantissa always has a fixed number of digits, and the index must lie in some range. Typically

$$-256 < \text{INDEX} < 256.$$

If the INDEX goes outside that range, then we get underflow (less than −256) or overflow (greater than 256). Some computers/systems automatically replace underflow by the special number 0 (zero). Overflow always gives some sort of error.

We note that the mantissa is always of the form 0.… and the digit after the decimal point is always nonzero. Thus, in the third example above, 0.03 is stored as (0.3000, −1) and not as (0.0300, 0). We also note that there is no representation of zero in this form; a computer normally has some special representation for this number. We further note, as in the fourth example, that the representation may not be exact.
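The four-digit representation can be imitated with Python's decimal module. This sketch assumes "usual" rounding (ROUND_HALF_UP). to_float4 is a hypothetical name; it returns the pair $(p, q)$ with $0.1 \le |p| < 1$.

```python
from decimal import Decimal, Context, ROUND_HALF_UP

FOUR_DIGITS = Context(prec=4, rounding=ROUND_HALF_UP)   # "usual" rounding: 5 rounds up

def to_float4(x: str):
    """Return (mantissa, exponent) with a normalised 4-digit mantissa."""
    d = FOUR_DIGITS.plus(Decimal(x))        # round to 4 significant digits
    if d == 0:
        raise ValueError("zero has no normalised representation")
    q = d.adjusted() + 1                    # exponent so the mantissa reads 0.d1d2d3d4
    mantissa = d.scaleb(-q).quantize(Decimal("0.0001"))
    return mantissa, q

for x in ("37.31", "0.00004717", "0.03", "14.211"):
    print(x, to_float4(x))
```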

Finally, it should be remembered that in practice computers do not use decimal numbers. They actually use binary numbers, and there is often some error in converting decimal numbers to or from binary numbers.

Rounding errors are therefore always present, since we can never be certain that a computation has been done exactly. For example, a computer working with four-digit, decimal, floating point arithmetic with

$$A = (0.3333, 2), \quad B = (0.4625, 3)$$

would compute

$A + B \to (+0.4958, 3) \neq A + B$
$A - B \to (-0.4292, 3) \neq A - B$
$A \times B \to (+0.1542, 5) \neq A \times B$

and none of these results is exact.
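The three results can be reproduced with a decimal context limited to four significant digits. This sketch assumes ROUND_HALF_UP rounding. For comparison, the exact values are computed in the module's default 28-digit context.

```python
from decimal import Decimal, Context, ROUND_HALF_UP

ctx = Context(prec=4, rounding=ROUND_HALF_UP)   # four-digit decimal arithmetic

A = Decimal("0.3333E2")    # (0.3333, 2) = 33.33
B = Decimal("0.4625E3")    # (0.4625, 3) = 462.5

computed = {
    "A+B": ctx.add(A, B),
    "A-B": ctx.subtract(A, B),
    "AxB": ctx.multiply(A, B),
}
exact = {"A+B": A + B, "A-B": A - B, "AxB": A * B}   # default 28-digit context

for op in computed:
    print(op, computed[op], exact[op], computed[op] - exact[op])
```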

In a more challenging computation, like solving $Ax = b$ where $A$ is a $1000 \times 1000$ matrix, there are millions of floating point calculations, and small errors are present in all of them! Can we be certain of the result? How accurate is it? Can we find algorithms that overcome the problem? All these questions are considered by numerical analysts.

1.4 CANCELLATION ERROR

In many books, 'cancellation', or more precisely 'cancellation error', is listed as a fourth source of error. This is not strictly a new source of error. Rather, it is a consequence of rounding and truncation errors that leads to a severe loss of accuracy in certain circumstances. Suppose, for example, that we have two numbers that we know really accurately:

$$a = 0.642136 \quad \left(\text{accurate to } \tfrac{1}{2} \times 10^{-6}\right)$$
$$b = 0.642125 \quad \left(\text{accurate to } \tfrac{1}{2} \times 10^{-6}\right)$$

Then

$$a - b = 0.000011$$

and this quantity will contain an error bounded by

$$|\text{error in } a| + |\text{error in } b| \le 1 \times 10^{-6}.$$

The relative error in $a - b$ is therefore about $10^{-6} / (1.1 \times 10^{-5}) \approx 10\%$. This is unacceptably large when the relative errors in $a$ and $b$ are only about 0.00008%. Moreover, if the errors in the data $a$ and $b$ were $\frac{1}{2} \times 10^{-4}$, then the answer would be meaningless!
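The numbers in this example can be recomputed as follows. In this sketch, data_error is a hypothetical name for the stated accuracy $\frac{1}{2} \times 10^{-6}$ of each input.

```python
a, b = 0.642136, 0.642125
data_error = 0.5e-6                        # each value is accurate to 1/2 x 10^-6

diff = a - b                               # about 1.1e-05
diff_error_bound = 2 * data_error          # errors in a and b may add: 1e-06
rel_error_a = data_error / a               # about 7.8e-07, i.e. roughly 0.00008 %
rel_error_diff = diff_error_bound / diff   # about 0.09, i.e. roughly 10 %

print(diff, rel_error_a, rel_error_diff)
```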

1.5 Examples

Example 1

Using 3-digit floating point arithmetic, find the answer of the calculation

$$M = \frac{a + b \times c}{b + c}$$

when $a = 11.13$, $b = 1.247$ and $c = -0.145$. Identify the rounding error at each stage of the calculation and the total effect of rounding error.

Solution

The representations of $a$, $b$ and $c$ as three-digit floating point numbers are:

a := (+0.111, +2)   rounding error = −0.03
b := (+0.125, +1)   rounding error = +0.003
c := (−0.145, 0)    rounding error = 0

Each calculation is performed as a series of single operations, each with its own rounding error. Thus we compute

X := b × c
Y := a + X
Z := b + c
M := Y / Z

and we obtain

X := (−0.181, 0)    rounding error = +0.00025
Y := (+0.109, +2)   rounding error = −0.019
Z := (+0.111, +1)   rounding error = +0.005
M := (+0.982, +1)   rounding error = +0.00018

Thus the computed answer is 9.82. The exact answer for the three-digit data $a = 11.1$, $b = 1.25$, $c = -0.145$ is 9.8812. Hence, the total effect of rounding error in the arithmetic operations, i.e. the computed value minus the exact value, is −0.06. (Using the original data, the exact value of $M$ is about 9.9357. When the errors in representing $a$ and $b$ are included, the total error is therefore about −0.12.)
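Example 1 can be replayed with a three-digit decimal context, so that every operation is rounded just as in the hand calculation. This sketch assumes ROUND_HALF_UP rounding, which sends 1.105 to 1.11 as above. The two exact values use the default 28-digit context.

```python
from decimal import Decimal, Context, ROUND_HALF_UP

ctx = Context(prec=3, rounding=ROUND_HALF_UP)      # three-digit arithmetic

a0, b0, c0 = Decimal("11.13"), Decimal("1.247"), Decimal("-0.145")
a, b, c = ctx.plus(a0), ctx.plus(b0), ctx.plus(c0)  # 11.1, 1.25, -0.145

X = ctx.multiply(b, c)   # -0.181
Y = ctx.add(a, X)        # 10.9
Z = ctx.add(b, c)        # 1.11
M = ctx.divide(Y, Z)     # 9.82

exact_rounded_data = (a + b * c) / (b + c)         # about 9.8812
exact_original_data = (a0 + b0 * c0) / (b0 + c0)   # about 9.9357
print(M, M - exact_rounded_data, M - exact_original_data)
```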

EXERCISES

1. If the exact answer is $A$ and the computed answer is $\tilde{A}$, find the absolute and relative errors when

a) $A = 10.147$, $\tilde{A} = 10.159$
b) $A = 0.0047$, $\tilde{A} = 0.0045$
c) $A = 0.671 \times 10^{12}$, $\tilde{A} = 0.669 \times 10^{12}$

2. Let $a = 0.471 \times 10^{-2}$ and $b = -0.185 \times 10^{-4}$. Use 3-digit floating point arithmetic to compute $a + b$, $a - b$, $a \times b$ and $a / b$. Find the rounding error in each case.

CHAPTER 2

Non-linear algebraic and transcendental equations

The first non-linear equation encountered in algebra courses is usually the quadratic equation

$$a x^2 + b x + c = 0$$

and all students will be familiar with the formula for its roots:

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

The formula for the roots of a general cubic is somewhat more complicated, and that for a general quartic usually takes several pages to describe! We are spared further effort by a theorem which states that there is no such formula (in terms of radicals) for general polynomials of degree higher than four. Accordingly, except in special cases (for example, when factorization is easy), we prefer in practice to use a numerical method to solve polynomial equations of degree higher than two.

Another class of nonlinear equations consists of those which involve transcendental functions such as $e^x$, $\ln x$, $\sin x$ and $\tan x$. Useful analytic solutions of such equations are rare, so we are usually forced to use numerical methods.

1. A transcendental equation

We shall use a simple mathematical problem to show that transcendental equations do arise quite naturally. Consider the height of a liquid in a cylindrical tank of radius $r$ with a horizontal axis, when the tank is a quarter full (see Figure 2). Denote the height of the liquid by $h$ (DB in the diagram). The condition to be satisfied is that the area of the segment ABC should be ¼ of the area of the circle. This reduces to

$$2\left(\tfrac{1}{2} r^2 \theta - \tfrac{1}{2} r^2 \sin\theta \cos\theta\right) = \tfrac{1}{4} \pi r^2,$$

where $\tfrac{1}{2} r^2 \theta$ is the area of the sector OAB and $\tfrac{1}{2} r^2 \sin\theta \cos\theta$ is the area of the triangle OAD. Hence

$$2\theta - 2\sin\theta\cos\theta = \frac{\pi}{2} \quad \text{or} \quad x + \cos x = 0, \quad \text{where } x = \frac{\pi}{2} - 2\theta,$$

since $2\sin\theta\cos\theta = \sin 2\theta = \sin\left(\frac{\pi}{2} - x\right) = \cos x$.

FIGURE 2. Cylindrical tank (cross-section).

When we have solved the transcendental equation

$$f(x) = x + \cos x = 0$$

we obtain $h$ from

$$h = OB - OD = r - r\cos\theta = r\left[1 - \cos\left(\frac{\pi}{4} - \frac{x}{2}\right)\right].$$

2. Locating roots

Let it be required to find some or all of the roots of the nonlinear equation $f(x) = 0$. Before we use a numerical method (the bisection method, false position method, Newton-Raphson method or simple iteration method), we should have some idea about the number, nature and approximate location of the roots. The usual approach involves the construction of graphs. A table of values of the function $f$ may then be used to confirm the information obtained from the graph.

We will now illustrate this approach with a few examples.

a) $\sin x - x + 0.5 = 0$

If we do not have a calculator or computer available to immediately plot the graph of $f(x) = \sin x - x + 0.5$, we can separate $f$ into two parts, sketch two curves on a single set of axes, and find out whether they intersect. Thus we sketch $y = \sin x$ and $y = x - 0.5$. Since $|\sin x| \le 1$, we are only interested in the interval $-0.5 \le x \le 1.5$ (outside which $|x - 0.5| > 1$). Thus we deduce from Fig. 3 that the equation has only one real root, near $x = 1.5$, and we tabulate as follows:

x         1.5       1.45      1.49
sin x     0.9975    0.9927    0.9967
f(x)      -0.0025   0.0427    0.0067

We now know that the root lies between 1.49 and 1.50. We can use a numerical method to obtain a more accurate answer, as discussed in later Steps.
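The table entries can be regenerated with a few lines of Python. This sketch prints $\sin x$ and $f(x)$ rounded to four decimal places.

```python
import math

def f(x):
    return math.sin(x) - x + 0.5

for x in (1.5, 1.45, 1.49):
    print(x, round(math.sin(x), 4), round(f(x), 4))
```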

b) $e^{-0.2x} = x(x - 2)(x - 3)$

Again, we sketch two curves: $y = e^{-0.2x}$ and $y = x(x - 2)(x - 3)$.

In order to sketch the second curve, we use the three obvious zeros at x = 0, 2 and 3. We also use the knowledge that $x(x - 2)(x - 3)$ is negative for x < 0 and for 2 < x < 3, but positive and increasing steadily for x > 3. We deduce from the graph (Fig. 4) that there are three real roots, near x = 0.2, 1.8 and 3.1, and tabulate as follows (with $f(x) = e^{-0.2x} - x(x - 2)(x - 3)$):

x               0.2       0.15      1.8       1.6       3.1       3.2
e^(-0.2x)       0.9608    0.9704    0.6977    0.7261    0.5379    0.5273
x(x-2)(x-3)     1.0080    0.7909    0.4320    0.8960    0.3410    0.7680
f(x)            -0.0472   0.1796    0.2657    -0.1699   0.1969    -0.2407

We conclude that the roots lie between 0.15 and 0.2, 1.6 and 1.8, and 3.1 and 3.2, respectively. Note that the values in the table were calculated to an accuracy of at least 5S (five significant digits). For example, working to 5S accuracy, we have f(0.15) = 0.97045 − 0.79088 = 0.17957, which is then rounded to 0.1796. Thus the entry in the table for f(0.15) is 0.1796, and not 0.1795 as one might expect from calculating 0.9704 − 0.7909.
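Instead of sketching, one can also scan a grid of points for sign changes of $f$. This sketch assumes a step of 0.05 on $0 \le x \le 4$, which is an arbitrary choice. By the intermediate value theorem, a sign change between neighbouring points brackets a root. A grid this coarse could, however, miss a pair of roots lying close together.

```python
import math

def f(x):
    return math.exp(-0.2 * x) - x * (x - 2) * (x - 3)

h = 0.05
xs = [i * h for i in range(0, 81)]      # grid 0, 0.05, ..., 4.0
brackets = [(round(x0, 2), round(x1, 2))
            for x0, x1 in zip(xs, xs[1:])
            if f(x0) * f(x1) < 0]       # sign change => a root inside (x0, x1)
print(brackets)   # [(0.15, 0.2), (1.65, 1.7), (3.1, 3.15)]
```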

23 | P a g e
EXERCISES

1. Locate the roots of the equation $x + \cos x = 0$.

2. Use curve sketching to roughly locate all the roots of the equations:

a) $x + 2\cos x = 0$.

b) $x + e^x = 0$.

c) $x(x-1) - e^x = 0$.

d) $x(x-1) - \sin x = 0$.

The bisection method


The bisection method, which is suitable for implementation on a computer, allows one to find the roots of the equation f(x) = 0. It is based on the following theorem:

Theorem: If f is continuous for x between a and b, and if f(a) and f(b) have opposite signs, then there exists at least one real root of f(x) = 0 between a and b.

1. Procedure: Suppose that a continuous function f is negative at x = a and positive at x = b, so that there is at least one real root between a and b. (As a rule, a and b may be found from a graph of f.) If we calculate $f\left(\frac{a+b}{2}\right)$, which is the function value at the point of bisection of the interval $a < x < b$, there are three possibilities:

1. $f\left(\frac{a+b}{2}\right) = 0$, in which case $\frac{a+b}{2}$ is the root;

2. $f\left(\frac{a+b}{2}\right) < 0$, in which case the root lies between $\frac{a+b}{2}$ and b;

3. $f\left(\frac{a+b}{2}\right) > 0$, in which case the root lies between a and $\frac{a+b}{2}$.

Presuming there is just one root, in Case 1 the process is terminated. In either Case 2 or Case 3, the process of bisection of the interval containing the root can be repeated until the root is obtained to the desired accuracy. In Figure 5, the successive points of bisection are denoted by $x_1$, $x_2$, and $x_3$.
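The procedure translates directly into a loop. Below is a minimal Python sketch, assuming f is continuous on [a, b] with f(a) and f(b) of opposite signs; the function name `bisection` and the stopping rule (half the current bracket below a tolerance `tol`, the criterion used in the worked example of Section 3) are our choices. Testing the sign of f(a)f(x) rather than assuming f(a) < 0 covers both orientations of the sign change.

```python
import math

def bisection(f, a, b, tol):
    """Bisection for a root of f in [a, b], where f(a) and f(b) have opposite signs.

    Stops when half the bracket length, (b - a)/2, is below tol; the midpoint is
    then within tol of a root (ignoring rounding error in the values of f).
    """
    fa = f(a)
    if fa * f(b) > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    while True:
        x = (a + b) / 2
        fx = f(x)
        if fx == 0 or (b - a) / 2 < tol:   # Case 1, or bracket small enough
            return x
        if fa * fx < 0:                     # root lies between a and x
            b = x
        else:                               # root lies between x and b
            a, fa = x, fx

f = lambda x: 3 * x - math.exp(-x)          # the equation 3x e^x = 1
root = bisection(f, 0.25, 0.27, 5e-4)
print(root, round(root, 3))
```

On [0.25, 0.27] with tol = 5 × 10^-4 this should return the sixth midpoint, approximately 0.2578125, i.e. 0.258 to 3D.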

2. Effectiveness: The bisection method is almost certain to give a root. Provided the conditions of the above theorem hold, it can only fail if the accumulated error in the calculation of f at a bisection point gives it a small negative value when actually it should have a small positive value (or vice versa); the interval subsequently chosen would then be wrong.

This can be overcome by working to sufficient accuracy. Such almost-assured convergence is not shared by many other methods of finding roots.

One drawback of the bisection method is that it applies only to roots of f about which f(x) changes sign. In particular, double roots can be overlooked; one should be careful to examine f(x) in any range where it is small, so that repeated roots, about which f(x) does not change sign, are found and evaluated by other means (for example, see Steps 9 and 10). Of course, such a close examination also avoids another nearby root being overlooked.

Finally, note that bisection is rather slow; after n iterations the interval containing the root is of length $\frac{b-a}{2^n}$. However, provided values of f can be generated readily, as when a computer is used, the rather large number of iterations which can be involved in the application of bisection is of relatively little consequence.
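Because the bracket is halved at every step, the number of bisections needed for a given accuracy can be predicted before starting: the n-th midpoint lies within $\frac{b-a}{2^n}$ of the root, so it suffices to take $n \ge \log_2\frac{b-a}{\text{tol}}$. A small sketch (the helper name `bisection_steps` is hypothetical):

```python
import math

def bisection_steps(a, b, tol):
    """Smallest n with (b - a) / 2**n <= tol, i.e. n >= log2((b - a) / tol)."""
    return max(0, math.ceil(math.log2((b - a) / tol)))

print(bisection_steps(1.0, 2.0, 1e-6))     # bracket of width 1, accuracy 1e-6
print(bisection_steps(0.25, 0.27, 5e-4))   # the interval of the worked example
```

On [1, 2] about 20 bisections are needed for an accuracy of 10^-6, and on [0.25, 0.27] six suffice for 5 × 10^-4, which matches the six iterations of the worked example below.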

3. Example

a) Solve $3xe^x = 1$ to three decimal places by the bisection method. Consider $f(x) = 3x - e^{-x}$, which changes sign in the interval $0.25 < x < 0.27$; one tabulates (working to 4D) as follows:

x        3x       e^(-x)    f(x)
0.25     0.75     0.7788   -0.0288
0.27     0.81     0.7634    0.0466

(Ascertain graphically that there is just one root!)

Denote the lower and upper endpoints of the interval bracketing the root at the n-th iteration by $a_n$ and $b_n$, respectively (with $a_1 = 0.25$ and $b_1 = 0.27$). Then the approximation to the root at the n-th iteration is given by $x_n = \frac{a_n + b_n}{2}$. Since the root is either in $[a_n, x_n]$ or in $[x_n, b_n]$, and both intervals are of length $\frac{b_n - a_n}{2}$, we see that $x_n$ will be accurate to three decimal places when $\frac{b_n - a_n}{2} < 5 \times 10^{-4}$. Proceeding to bisection:

n    a_n      b_n      x_n = (a_n + b_n)/2    3x_n      e^(-x_n)    f(x_n)
1    0.25     0.27     0.26                   0.78      0.7711      0.0089
2    0.25     0.26     0.255                  0.765     0.7749     -0.0099
3    0.255    0.26     0.2575                 0.7725    0.7730     -0.0005
4    0.2575   0.26     0.2588                 0.7763    0.7720      0.0042
5    0.2575   0.2588   0.2581                 0.7744    0.7725      0.0019
6    0.2575   0.2581   0.2578

(Note that the values in the table are displayed to only 4D.) Hence the root
accurate to three decimal places is 0.258.

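b) Use the bisection method to solve $f(x) = x^2 - 3 = 0$. Let the tolerance on the interval width be $\varepsilon_{step} = 0.01$ and the tolerance on $|f|$ be $\varepsilon_{abs} = 0.01$, and start with the interval [1, 2].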

n   a_n      b_n      f(a_n)     f(b_n)    c_n = (a_n + b_n)/2   f(c_n)     update      width after update
1   1.0      2.0      -2.0       1.0       1.5                  -0.75      a_n = c_n   0.5
2   1.5      2.0      -0.75      1.0       1.75                  0.0625    b_n = c_n   0.25
3   1.5      1.75     -0.75      0.0625    1.625                -0.3594    a_n = c_n   0.125
4   1.625    1.75     -0.3594    0.0625    1.6875               -0.1523    a_n = c_n   0.0625
5   1.6875   1.75     -0.1523    0.0625    1.7188               -0.0457    a_n = c_n   0.0313
6   1.7188   1.75     -0.0457    0.0625    1.7344                0.0081    b_n = c_n   0.0156
7   1.7188   1.7344   -0.0457    0.0081    1.7266               -0.0189    a_n = c_n   0.0078

Thus, after the seventh iteration, the final interval [1.7266, 1.7344] has a width less than 0.01 and $|f(1.7344)| < 0.01$, so we choose b = 1.7344 as our approximation of the root.
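The table above can be regenerated with a short Python sketch. It assumes our reading of the two tolerances: stop once the bracket width is below $\varepsilon_{step}$ and |f| at the better endpoint (the one with smaller |f|) is below $\varepsilon_{abs}$. The function name and that endpoint choice are our own assumptions. Because it works in full double precision, the f(c_n) column may differ in the last digit from the hand-computed table (for example at n = 5, where the table evaluates f at the rounded value 1.7188).

```python
def bisection_table(f, a, b, eps_step, eps_abs, max_iter=100):
    """Bisection that records (n, a_n, b_n, c_n, f(c_n), width after update).

    Stops when the bracket width is below eps_step and |f| at the endpoint
    with the smaller |f| is below eps_abs; returns that endpoint and the rows.
    """
    rows = []
    for n in range(1, max_iter + 1):
        c = (a + b) / 2
        fc = f(c)
        if fc == 0:                      # exact root found
            rows.append((n, a, b, c, fc, 0.0))
            return c, rows
        row = (n, a, b, c, fc)
        if f(a) * fc < 0:                # root lies in [a, c]
            b = c
        else:                            # root lies in [c, b]
            a = c
        rows.append(row + (b - a,))
        if b - a < eps_step:
            best = a if abs(f(a)) < abs(f(b)) else b
            if abs(f(best)) < eps_abs:
                return best, rows
    raise RuntimeError("no convergence within max_iter steps")

root, rows = bisection_table(lambda x: x * x - 3, 1.0, 2.0, 0.01, 0.01)
for n, a_n, b_n, c_n, f_c, width in rows:
    print(f"{n}  {a_n:.4f}  {b_n:.4f}  {c_n:.4f}  {f_c:+.4f}  {width:.4f}")
print("approximate root:", root)
```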

EXERCISES

a. Use the bisection method to find the root of the equation $x + \cos x = 0$ correct to two decimal places (2D).

b. Use the bisection method to find to 3D the positive root of the equation $x - 0.2\sin x - 0.5 = 0$.

c. Each equation in Exercises 2(a)-2(c) above has only one root. For each equation use the bisection method to find the root correct to 2D.

d. Use the bisection method to solve $f(x) = x + \frac{1}{x} - 3\sin x = 0$ on the interval [0.7, 0.9], working to 4D.

The Newton-Raphson iterative method
The Newton-Raphson method is suitable for implementation on a computer. It is a process for the determination of a real root of an equation f(x) = 0, given just one point close to the desired root. It can be viewed as a limiting case of the secant method or as a special case of simple iteration.

1. Procedure

Let $x_0$ denote the known approximate value of the root of f(x) = 0 and h the difference between the true value α and the approximate value, i.e.,

$$\alpha = x_0 + h.$$

The second degree, terminated Taylor expansion about $x_0$ is

$$f(\alpha) = f(x_0 + h) = f(x_0) + h f'(x_0) + \frac{h^2}{2!} f''(\xi),$$

where $\xi = x_0 + \theta h$, $0 < \theta < 1$, lies between α and $x_0$.

Ignoring the remainder term and writing $f(\alpha) = 0$, we obtain

$$f(x_0) + h f'(x_0) \approx 0, \quad \text{whence} \quad h \approx -\frac{f(x_0)}{f'(x_0)},$$

and consequently

$$x_1 = x_0 - \frac{f(x_0)}{f'(x_0)}$$

should be a better estimate of the root than $x_0$. Even better approximations may be obtained by repetition (iteration) of the process, which then becomes

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}.$$

The geometrical interpretation is that each iteration provides the point at which the tangent at the current point cuts the x-axis (Figure 9). Thus the equation of the tangent at $(x_0, f(x_0))$ is

$$y - f(x_0) = f'(x_0)(x - x_0),$$

so that $(x_1, 0)$ corresponds to

$$-f(x_0) = f'(x_0)(x_1 - x_0),$$

whence $x_1 = x_0 - \frac{f(x_0)}{f'(x_0)}$.

2. Example

We will use the Newton-Raphson method to find the positive root of the equation $\sin x = x^2$, correct to 3D.

It will be convenient to use the method of false position to obtain an initial approximation. Tabulation yields

x       f(x) = sin x - x^2
0        0
0.25     0.1849
0.5      0.2294
0.75     0.1191
1       -0.1585

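With numbers displayed to 4D, we see that there is a root in the interval $0.75 < x < 1$ at approximately

$$x_0 = \frac{1}{f(1) - f(0.75)} \begin{vmatrix} 0.75 & 0.1191 \\ 1 & -0.1585 \end{vmatrix} = -\frac{1}{0.2777}\,(-0.1189 - 0.1191) = \frac{0.2380}{0.2777} = 0.8573.$$

(Here $f(1) - f(0.75) = -0.2777$ when computed from values more accurate than those displayed.)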

Next, we will use the Newton-Raphson method:

$$f(0.8573) = \sin(0.8573) - (0.8573)^2 = 0.7561 - 0.7350 = 0.0211$$

and

$$f'(x) = \cos x - 2x,$$

yielding

$$f'(0.8573) = 0.6545 - 1.7146 = -1.0601.$$

Consequently, a better approximation is

$$x_1 = 0.8573 + \frac{0.0211}{1.0601} = 0.8573 + 0.0199 = 0.8772.$$

Repeating this step, we obtain $f(x_1) = f(0.8772) = -0.00053$ and $f'(x_1) = f'(0.8772) = -1.1151$, so that

$$x_2 = 0.8772 - \frac{0.00053}{1.1151} = 0.8772 - 0.0005 = 0.8767.$$

Since $f(x_2) = 0.0000$ to 4D, we conclude that the root is 0.877 to 3D.
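The iteration can be coded in a few lines. The sketch below is our own (the name `newton_raphson`, the tolerance on successive iterates, and the iteration cap are assumptions), applied to $\sin x = x^2$ with the same starting value $x_0 = 0.8573$ obtained above by false position. It raises an error if $f'(x_n) = 0$, where the tangent never meets the x-axis.

```python
import math

def newton_raphson(f, df, x0, tol=5e-5, max_iter=50):
    """Newton-Raphson iteration x_{n+1} = x_n - f(x_n)/f'(x_n).

    Stops when successive iterates differ by less than tol and returns the
    final iterate together with the list of all iterates.
    """
    xs = [x0]
    for _ in range(max_iter):
        d = df(xs[-1])
        if d == 0:
            raise ZeroDivisionError("f'(x_n) = 0: the tangent is horizontal")
        xs.append(xs[-1] - f(xs[-1]) / d)
        if abs(xs[-1] - xs[-2]) < tol:
            return xs[-1], xs
    raise RuntimeError("no convergence within max_iter iterations")

f = lambda x: math.sin(x) - x**2
df = lambda x: math.cos(x) - 2 * x
root, xs = newton_raphson(f, df, 0.8573)
for n, x in enumerate(xs):
    print(n, f"{x:.4f}")
print("root to 3D:", round(root, 3))
```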

3. Convergence

If we write

$$\phi(x) = x - \frac{f(x)}{f'(x)},$$

the Newton-Raphson iteration expression

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$$

may be rewritten

$$x_{n+1} = \phi(x_n).$$

We have observed that, in general, the iteration method converges when $|\phi'(x)| < 1$ near the root. In the case of Newton-Raphson, we have

$$\phi'(x) = 1 - \frac{[f'(x)]^2 - f(x) f''(x)}{[f'(x)]^2} = \frac{f(x) f''(x)}{[f'(x)]^2},$$

so that the criterion for convergence is

$$|f(x) f''(x)| < [f'(x)]^2,$$

i.e., convergence is not as assured as, say, for the bisection method.

4. Rate of convergence

The second degree terminated Taylor expansion about $x_n$ is

$$f(\alpha) = f(x_n + e_n) = f(x_n) + e_n f'(x_n) + \frac{e_n^2}{2!} f''(\xi_n),$$

where $e_n = \alpha - x_n$ is the error at the n-th iteration and $\xi_n = x_n + \theta e_n$, $0 < \theta < 1$.

Since $f(\alpha) = 0$, dividing by $f'(x_n)$ gives

$$0 = \frac{f(x_n)}{f'(x_n)} + (\alpha - x_n) + \frac{e_n^2 f''(\xi_n)}{2 f'(x_n)}.$$

But, by the Newton-Raphson formula, $\frac{f(x_n)}{f'(x_n)} = x_n - x_{n+1}$, whence the error at the (n + 1)-th iteration is

$$e_{n+1} = \alpha - x_{n+1} = -\frac{e_n^2 f''(\xi_n)}{2 f'(x_n)} \approx -\frac{e_n^2 f''(\alpha)}{2 f'(\alpha)},$$

provided $e_n$ is sufficiently small.

This result states that the error at the (n + 1)-th iteration is proportional to the square of the error at the n-th iteration; hence, if $f''(\alpha) \approx 4 f'(\alpha)$, an answer correct to one decimal place at one iteration should be accurate to two places at the next iteration, four at the next, eight at the next, etc. This quadratic (second-order) convergence outstrips the rate of convergence of the methods of bisection and false position!
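The error relation can be observed numerically. In this sketch, an illustration of our own rather than part of the text, the test equation $x^3 - 2x - 5 = 0$ and the starting value $x_0 = 2$ are our choices; Python's `decimal` module is used with 60 significant digits so that several squarings of the error remain visible. The ratios $e_{n+1}/e_n^2$ should settle near $-f''(\alpha)/(2f'(\alpha))$.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60   # enough digits to watch the error square several times

def f(x):
    return x**3 - 2 * x - 5

def df(x):
    return 3 * x**2 - 2

def d2f(x):
    return 6 * x

# Reference value of the root: iterate well past convergence at 60 digits.
alpha = Decimal(2)
for _ in range(12):
    alpha -= f(alpha) / df(alpha)

# Errors e_n = alpha - x_n of a fresh Newton-Raphson run from x_0 = 2.
x = Decimal(2)
errors = []
for _ in range(5):
    errors.append(alpha - x)
    x -= f(x) / df(x)

predicted = -d2f(alpha) / (2 * df(alpha))
for n in range(len(errors) - 1):
    ratio = errors[n + 1] / errors[n] ** 2
    print(n, f"e_n = {float(errors[n]):.3e}", f"e_(n+1)/e_n^2 = {float(ratio):.6f}")
print(f"-f''(alpha)/(2 f'(alpha)) = {float(predicted):.6f}")
```

The number of correct digits roughly doubles from one line to the next, as the error formula predicts.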

In computer programs that are used only occasionally, it may be wise to prefer the methods of bisection or false position, since their convergence is virtually assured. However, for hand calculations or for computer routines in constant use, the Newton-Raphson method is usually preferred.

5. The square root

One application of the Newton-Raphson method is in the computation of square roots. Finding $a^{1/2}$ is equivalent to finding the positive root of $x^2 = a$, i.e., of

$$f(x) = x^2 - a = 0.$$

Since $f'(x) = 2x$, we have the Newton-Raphson iteration formula

$$x_{n+1} = x_n - \frac{x_n^2 - a}{2x_n} = \frac{1}{2}\left(x_n + \frac{a}{x_n}\right),$$

a formula known to the ancient Greeks. Thus, if a = 16 and $x_0 = 5$, we find (working to 4D)

$$x_1 = \frac{5 + 3.2}{2} = 4.1, \quad x_2 = \frac{4.1 + 3.9024}{2} = 4.0012, \quad x_3 = \frac{4.0012 + 3.9988}{2} = 4.0000.$$
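A direct transcription of the square-root iteration follows; the helper name `newton_sqrt` and the stopping tolerance are our assumptions, and a > 0, $x_0 > 0$ are required.

```python
def newton_sqrt(a, x0, tol=1e-12, max_iter=100):
    """Approximate sqrt(a), a > 0, by x_{n+1} = (x_n + a/x_n)/2 starting from x0 > 0."""
    x = x0
    history = [x]
    for _ in range(max_iter):
        x_new = (x + a / x) / 2
        history.append(x_new)
        if abs(x_new - x) < tol:
            return x_new, history
        x = x_new
    raise RuntimeError("no convergence within max_iter iterations")

root, history = newton_sqrt(16, 5)
print([round(x, 4) for x in history])
```

With a = 16 and $x_0 = 5$ the rounded history begins 5, 4.1, 4.0012, in agreement with the hand calculation above.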

EXERCISES

1. Use the Newton-Raphson method to find to 4S the (positive) root of $3xe^x = 1$.

2. Derive the Newton-Raphson iteration formula

$$x_{n+1} = x_n - \frac{x_n^k - a}{k x_n^{k-1}}$$

for finding the k-th root of a.

3. Compute the square root of 10 to 5 significant digits from an initial guess.

4. Use the Newton-Raphson method to find to 4D the root of the equation $x + \cos x = 0$.

Method of false position
As mentioned in the Prologue, the method of false position dates back to the ancient Egyptians. It remains an effective alternative to the bisection method for solving the equation f(x) = 0 for a real root between a and b, given that f(x) is continuous and f(a) and f(b) have opposite signs. The algorithm is suitable for automatic computation.

1. PROCEDURE

The curve y = f(x) is not generally a straight line. However, one may join the points (a, f(a)) and (b, f(b)) by the straight line

$$\frac{y - f(a)}{f(b) - f(a)} = \frac{x - a}{b - a}.$$

This straight line cuts the x-axis at (X, 0), where

$$\frac{0 - f(a)}{f(b) - f(a)} = \frac{X - a}{b - a},$$

so that

$$X = a - \frac{f(a)(b - a)}{f(b) - f(a)} = \frac{a f(b) - b f(a)}{f(b) - f(a)}.$$

Suppose that f(a) is negative and f(b) is positive. As in the bisection method, there are three possibilities:

1. f(X) = 0, in which case X is the root;

2. f(X) < 0, in which case the root lies between X and b;

3. f(X) > 0, in which case the root lies between a and X.

Again, in Case 1 the process is terminated; in either Case 2 or Case 3, the process can be repeated until the root is obtained to the desired accuracy. In Fig. 6, the successive points where the straight lines cut the axis are denoted by $x_1$, $x_2$, $x_3$.

2. EFFECTIVENESS AND THE SECANT METHOD

Like the bisection method, the method of false position has almost assured convergence, and it may converge to a root faster. However, it may happen that most or all of the calculated values of X are on the same side of the root, in which case convergence may be slow (Fig. 7). This is avoided in the secant method, which resembles the method of false position except that no attempt is made to ensure that the root is enclosed; starting with two approximations to the root ($x_0$, $x_1$), further approximations $x_2, x_3, \ldots$ are computed from

$$x_{n+1} = x_n - f(x_n)\,\frac{x_n - x_{n-1}}{f(x_n) - f(x_{n-1})}.$$

Convergence is no longer guaranteed, but the process is simpler (the sign of $f(x_{n+1})$ is not tested) and often converges faster.
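A minimal secant-method sketch follows; the function name, tolerance, and iteration cap are our assumptions. Unlike false position, it never checks signs, so it can wander away from the root, and it fails if $f(x_n) = f(x_{n-1})$. As a test we use $3x - e^{-x} = 0$ (the equation $3xe^x = 1$ from the bisection example), starting from 0.25 and 0.27.

```python
import math

def secant(f, x0, x1, tol=1e-10, max_iter=50):
    """Secant iteration x_{n+1} = x_n - f(x_n)(x_n - x_{n-1})/(f(x_n) - f(x_{n-1})).

    The root is not kept bracketed, so convergence is not guaranteed.
    """
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        if f1 == f0:
            raise ZeroDivisionError("f(x_n) = f(x_{n-1}): the secant line is horizontal")
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        if abs(x2 - x1) < tol:
            return x2
        x0, f0, x1, f1 = x1, f1, x2, f(x2)
    raise RuntimeError("no convergence within max_iter iterations")

f = lambda x: 3 * x - math.exp(-x)
print(round(secant(f, 0.25, 0.27), 6))
```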

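With respect to the speed of convergence of the secant method, writing $e_n = \alpha - x_n$ as before, one has at the (n + 1)-th step

$$e_{n+1} = \alpha - x_{n+1} = \frac{e_{n-1} f(x_n) - e_n f(x_{n-1})}{f(x_n) - f(x_{n-1})}.$$

Hence, expanding $f(x_n) = f(\alpha - e_n)$ and $f(x_{n-1}) = f(\alpha - e_{n-1})$ in Taylor series about α,

$$e_{n+1} = \frac{e_{n-1}\left[f(\alpha) - e_n f'(\alpha) + \frac{e_n^2}{2!} f''(\alpha) - \cdots\right] - e_n\left[f(\alpha) - e_{n-1} f'(\alpha) + \frac{e_{n-1}^2}{2!} f''(\alpha) - \cdots\right]}{\left[f(\alpha) - e_n f'(\alpha) + \cdots\right] - \left[f(\alpha) - e_{n-1} f'(\alpha) + \cdots\right]} \approx -\frac{f''(\alpha)}{2 f'(\alpha)}\, e_{n-1} e_n,$$

where we have used the fact that $f(\alpha) = 0$. Thus we see that $e_{n+1}$ is proportional to $e_n e_{n-1}$, which may be expressed in mathematical notation as

$$e_{n+1} \propto e_{n-1} e_n.$$

We seek k such that $e_{n+1} \propto e_n^k$. Then $e_n \propto e_{n-1}^k$, so that $e_{n+1} \propto e_{n-1}^{k^2}$, while $e_{n-1} e_n \propto e_{n-1}^{k+1}$; hence

$$k^2 = k + 1 \quad \Longrightarrow \quad k = \frac{1 + \sqrt{5}}{2} \approx 1.618.$$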

Hence the speed of convergence is faster than linear (k =1 ), but slower than
quadratic (k=2). This rate of convergence is sometimes referred to as superlinear
convergence.

3. EXAMPLE

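Use the method of false position to solve

$$3xe^x = 1,$$

stopping when $|f(x_n)| < 5 \times 10^{-6}$, with $f(x) = 3x - e^{-x}$. Starting from the interval [0.25, 0.27], on which f changes sign (see the bisection example), we obtain

$$x_1 = \frac{0.25\,f(0.27) - 0.27\,f(0.25)}{f(0.27) - f(0.25)} = 0.2576373.$$

Then

$$f(x_1) = 3(0.2576373) - e^{-0.2576373} = 0.7729119 - 0.7728755 = 0.0000364.$$

The student may verify that doing one more iteration of the method of false position yields an estimate $x_2 = 0.257628$, for which the function value is less than $5 \times 10^{-6}$. Since $x_1$ and $x_2$ agree to 4D, we conclude that the root is 0.2576, correct to 4D.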
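The false-position procedure, with the stopping rule $|f(x_n)| < 5 \times 10^{-6}$ used in this example, can be sketched as follows. The function name and the iteration cap are our own assumptions; the sketch should reproduce $x_1 \approx 0.257637$ and $x_2 \approx 0.257628$.

```python
import math

def false_position(f, a, b, tol, max_iter=100):
    """Method of false position on [a, b], where f(a) and f(b) have opposite signs.

    Each step computes X = (a f(b) - b f(a)) / (f(b) - f(a)) and replaces the
    endpoint with the same sign as f(X), so the root stays bracketed.
    Stops when |f(X)| < tol and returns the list of iterates X_1, X_2, ...
    """
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    xs = []
    for _ in range(max_iter):
        x = (a * fb - b * fa) / (fb - fa)
        fx = f(x)
        xs.append(x)
        if abs(fx) < tol:
            return xs
        if fa * fx < 0:       # root between a and X
            b, fb = x, fx
        else:                 # root between X and b
            a, fa = x, fx
    raise RuntimeError("no convergence within max_iter iterations")

f = lambda x: 3 * x - math.exp(-x)
xs = false_position(f, 0.25, 0.27, 5e-6)
for n, x in enumerate(xs, start=1):
    print(n, f"{x:.6f}", f"{f(x):.2e}")
```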

EXERCISES

a. Use the method of false position to find the smallest root of the equation $f(x) = 2\sin x + x - 2 = 0$, stopping when $|f(x_n)| < 5 \times 10^{-5}$.

b. Compare the results obtained when you use

i. the bisection method,
ii. the method of false position, and
iii. the secant method

with starting values 0.7 and 0.9 to solve the equation $3\sin x = x + 1/x$.

c. Use the method of false position to find the root of the equation $f(x) \equiv x + \cos x = 0$, stopping when $|f(x_n)| < 5 \times 10^{-6}$.

The method of simple iteration
The method of simple iteration involves writing the equation f(x) = 0 in a form $x = \phi(x)$, suitable for the construction of a sequence of approximations to some root in a repetitive fashion.

1. Procedure

The iteration procedure is as follows. In some way, we obtain a rough approximation $x_0$ of the desired root, which may then be substituted into the right-hand side to give a new approximation, $x_1 = \phi(x_0)$. The new approximation is again substituted into the right-hand side to give a further approximation $x_2 = \phi(x_1)$, etc., until (hopefully) a sufficiently accurate approximation to the root is obtained. This repetitive process, based on $x_{n+1} = \phi(x_n)$, is called simple iteration; provided that $|x_{n+1} - x_n|$ decreases as n increases, the process tends to $\alpha = \phi(\alpha)$, where α denotes the root.

2. Example

Use simple iteration to find the root of the equation

$$3xe^x = 1$$

to an accuracy of 4D. One first writes $x = \frac{e^{-x}}{3} = \phi(x)$.

Assuming x0 = 1, successive iterations yield

x1 = 0.12263, x2 = 0.29486,
x3 = 0.24821, x4 = 0.26007,
x5 = 0.25700, x6 = 0.25779,
x7 = 0.25759, x8 = 0.25764.

Thus, we see that after eight iterations the root is 0.2576 to 4D. A graphical
interpretation of the first three iterations is shown in Fig. 8.
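The iteration above can be reproduced with a short loop. This is a sketch: the helper name `simple_iteration`, and the choice to stop when successive iterates differ by less than $5 \times 10^{-5}$, are our assumptions. With this rule the loop should stop one step after the eight listed above, at $x_9$; the iterates $x_1, \ldots, x_8$ agree with the listed values to 5D.

```python
import math

def simple_iteration(phi, x0, tol, max_iter=100):
    """Iterate x_{n+1} = phi(x_n) until |x_{n+1} - x_n| < tol; return all iterates."""
    xs = [x0]
    for _ in range(max_iter):
        xs.append(phi(xs[-1]))
        if abs(xs[-1] - xs[-2]) < tol:
            return xs
    raise RuntimeError("no convergence within max_iter iterations")

phi = lambda x: math.exp(-x) / 3      # 3x e^x = 1 rewritten as x = e^(-x)/3
xs = simple_iteration(phi, 1.0, 5e-5)
for n, x in enumerate(xs):
    print(n, f"{x:.5f}")
```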

3. Convergence

Whether an iteration procedure converges rapidly, or indeed at all, depends on the choice of the function $\phi(x)$ as well as on the starting value $x_0$. For example, the equation $x^2 = 3$ has two real roots $\pm\sqrt{3}$ ($\approx \pm 1.732$). It can be given the form

$$x = \frac{3}{x} = \phi(x),$$

which suggests the iteration $x_{n+1} = 3/x_n$. However, if the starting value $x_0 = 1$ is used, the successive iterations yield

$$x_1 = \frac{3}{x_0} = 3, \quad x_2 = \frac{3}{x_1} = 1, \quad x_3 = \frac{3}{x_2} = 3, \quad \text{etc.!}$$

We can examine the convergence of the iteration process $x_{n+1} = \phi(x_n)$ to $\alpha = \phi(\alpha)$ with the aid of the first-order Taylor series (mean value theorem)

$$\phi(\alpha) = \phi(x_k) + (\alpha - x_k)\,\phi'(\zeta_k), \quad k = 0, 1, \ldots, n,$$

where $\zeta_k$ is a point between the root α and the approximation $x_k$. We have

$$\begin{aligned}
\alpha - x_1 &= \phi(\alpha) - \phi(x_0) = (\alpha - x_0)\,\phi'(\zeta_0),\\
\alpha - x_2 &= \phi(\alpha) - \phi(x_1) = (\alpha - x_1)\,\phi'(\zeta_1),\\
&\;\;\vdots\\
\alpha - x_{n+1} &= \phi(\alpha) - \phi(x_n) = (\alpha - x_n)\,\phi'(\zeta_n).
\end{aligned}$$

Multiplying the n + 1 rows together and cancelling the common factors $\alpha - x_1, \alpha - x_2, \ldots, \alpha - x_n$ leaves

$$\alpha - x_{n+1} = (\alpha - x_0)\,\phi'(\zeta_0)\,\phi'(\zeta_1) \cdots \phi'(\zeta_n),$$

whence

$$|\alpha - x_{n+1}| = |\alpha - x_0|\,|\phi'(\zeta_0)|\,|\phi'(\zeta_1)| \cdots |\phi'(\zeta_n)|,$$

so that the absolute error $|\alpha - x_{n+1}|$ can be made as small as we please by sufficient iteration if $|\phi'| < 1$ in the neighbourhood of the root.

Note that $\phi(x) = 3/x$ has derivative $|\phi'(x)| = |-3/x^2| > 1$ for $|x| < \sqrt{3}$, and $|\phi'| = 1$ at the roots themselves, so this condition is not satisfied.
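The two choices of φ discussed in this Step can be compared directly. In this sketch (the helper name `iterate` is ours), $x_{n+1} = 3/x_n$ simply cycles between 1 and 3 from $x_0 = 1$, whereas for $\phi(x) = e^{-x}/3$ the derivative $\phi'(x) = -e^{-x}/3$ has magnitude about 0.26 near the root, comfortably below 1.

```python
import math

def iterate(phi, x0, n):
    """Return [x_0, x_1, ..., x_n] with x_{k+1} = phi(x_k)."""
    xs = [x0]
    for _ in range(n):
        xs.append(phi(xs[-1]))
    return xs

# x^2 = 3 rewritten as x = 3/x: |phi'(x)| = 3/x^2 equals 1 at x = sqrt(3)
print(iterate(lambda x: 3 / x, 1.0, 5))        # cycles 1, 3, 1, 3, ...

# 3x e^x = 1 rewritten as x = e^(-x)/3: phi'(x) = -e^(-x)/3
alpha = iterate(lambda x: math.exp(-x) / 3, 1.0, 60)[-1]
print(alpha, abs(-math.exp(-alpha) / 3))       # root and |phi'(root)|
```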

EXERCISES

1. Assuming $x_0 = 1$, show by simple iteration that one root of the equation $2x - 1 - 2\sin x = 0$ is 1.4973.

2. Use simple iteration to find (to 4D) the root of the equation $x + \cos x = 0$.

CHAPTER 3

FINITE DIFFERENCES 1

Tables

Historically speaking, numerical analysts have always been concerned with tables of numbers, and many techniques have been developed for dealing with mathematical functions represented in this way.

For example, the value of the function at an untabulated point may be required, so that an interpolation is necessary. It is also possible to estimate the derivative or the definite integral of a tabulated function, using finite processes to approximate the corresponding (infinitesimal) limiting procedures of calculus. In each case, it has been traditional to use finite differences. Another application of finite differences, which is outside the scope of this book, is the numerical solution of partial differential equations.

1. Tables of values

Many books contain tables of mathematical functions. One of the most comprehensive is the Handbook of Mathematical Functions, edited by Abramowitz and Stegun (see the Bibliography for publication details), which also contains useful information about numerical methods.

Although most tables use constant argument intervals, some functions change rapidly in value in particular regions of their argument, and hence may best be tabulated using intervals varying according to the local behaviour of the function. Tables with varying argument intervals are more difficult to work with, however, and it is common to adopt uniform argument intervals wherever possible. As a simple example, consider the 6S table of the exponential function over 0.10(0.01)0.14 (a notation which specifies the domain $0.10 \le x \le 0.14$ with increment 0.01):

x       f(x) = e^x
0.10    1.10517
0.11    1.11628
0.12    1.12750
0.13    1.13883
0.14    1.15027

It is extremely important that the interval between successive values is small enough to display the variation of the tabulated function, because usually the value of the function will be needed at some argument value between the values specified (for example, $e^x$ at x = 0.105 from the above table). If the table is constructed in this manner, we can obtain such intermediate values to a reasonable accuracy by using a polynomial representation (hopefully, of low degree) of the function f.

2. Finite differences

Since Newton, finite differences have been used extensively. The construction of a table of finite differences for a tabulated function is simple: one obtains first differences by subtracting each value from the succeeding value in the table, second differences by repeating this operation on the first differences, and so on for higher order differences. From the above table of $e^x$, extended to x = 0.10(0.01)0.18, one has the following (note the standard layout, with decimal points and leading zeros omitted from the differences):

                             Differences
x       f(x) = e^x      1st      2nd      3rd
0.10    1.10517
                        1111
0.11    1.11628                  11
                        1122                0
0.12    1.12750                  11
                        1133                0
0.13    1.13883                  11
                        1144                1
0.14    1.15027                  12
                        1156                0
0.15    1.16183                  12
                        1168               -1
0.16    1.17351                  11
                        1179                2
0.17    1.18530                  13
                        1192
0.18    1.19722

(In this case, the differences must be multiplied by $10^{-5}$ for comparison with the function values.)
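Differencing is easy to automate. The sketch below (the function name `difference_table` is ours) stores the 5D values of $e^x$ as integers in units of $10^{-5}$, so that the differences come out exactly as in the table, without floating-point noise.

```python
def difference_table(values, max_order):
    """Return [values, 1st differences, ..., differences of order max_order]."""
    table = [list(values)]
    for _ in range(max_order):
        prev = table[-1]
        table.append([later - earlier for earlier, later in zip(prev, prev[1:])])
    return table

# e^x for x = 0.10(0.01)0.18 to 5D, stored in units of 1e-5 so the arithmetic is exact
exp_values = [110517, 111628, 112750, 113883, 115027, 116183, 117351, 118530, 119722]
for order, diffs in enumerate(difference_table(exp_values, 3)):
    print(order, diffs)
```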

3. Influence of round-off errors

Consider the difference table given below for $f(x) = e^x$, x = 0.1(0.05)0.5, to 6S, constructed as in Section 2. As before, differences of increasing order decrease rapidly in magnitude, but the third differences are irregular. This is largely a consequence of round-off errors, as tabulation of the function to 7S and differencing to fourth order illustrates (compare Exercise 3).

                             Differences
x       f(x) = e^x      1st      2nd      3rd
0.10    1.10517
                        5666
0.15    1.16183                  291
                        5957               15
0.20    1.22140                  306
                        6263               14
0.25    1.28403                  320
                        6583               18
0.30    1.34986                  338
                        6921               16
0.35    1.41907                  354
                        7275               20
0.40    1.49182                  374
                        7649               18
0.45    1.56831                  392
                        8041
0.50    1.64872

Although the round-off errors in f should be less than 1/2 in the last
significant place, they may accumulate; the greatest error that can be
obtained corresponds to:

                                 Differences
Tabular error   1st    2nd    3rd    4th    5th    6th
+1/2
                -1
-1/2                   +2
                +1            -4
+1/2                   -2            +8
                -1            +4            -16
-1/2                   +2            -8            +32
                +1            -4            +16
+1/2                   -2            +8
                -1            +4
-1/2                   +2
                +1
+1/2
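The worst-case pattern above can be checked by differencing the alternating errors ±1/2 exactly with Python's `fractions` module; this is an illustrative sketch, and the helper name `kth_differences` is ours. The largest k-th difference is $2^{k-1}$ units in the last place.

```python
from fractions import Fraction

def kth_differences(values, k):
    """Apply the forward-difference operation k times to a list of values."""
    for _ in range(k):
        values = [later - earlier for earlier, later in zip(values, values[1:])]
    return values

# Worst case: tabular errors of +1/2, -1/2, +1/2, ... units in the last place
errors = [Fraction(1, 2) * (-1) ** i for i in range(7)]
for k in range(1, 7):
    print(k, [str(d) for d in kth_differences(errors, k)])
```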

A rough working criterion for the expected fluctuations (noise level) due to
round-off error is shown in the table:

Order of differences 1 2 3 4 5 6
Expected error limits ±1 ±2 ±3 ±6 ± 12 ± 22

EXERCISES

1. Construct the difference table for the function $f(x) = x^3$ for x = 0(1)6.

2. Construct difference tables for each of the polynomials:

a) $2x - 1$ for x = 0(1)3.

b) $3x^2 + 2x - 4$ for x = 0(1)4.

c) $2x^3 + 3x - 3$ for x = 0(1)5.

3. Construct a difference table for the function $f(x) = e^x$, given to 7D for x = 0.1(0.05)0.5:

x       f(x)        x       f(x)        x       f(x)
0.10    1.105171    0.25    1.284025    0.40    1.491825
0.15    1.161834    0.30    1.349859    0.45    1.568312
0.20    1.221403    0.35    1.419068    0.50    1.648721

FINITE DIFFERENCES 2

Forward, backward, central difference notations

There are several different notations for the single set of finite differences,
described in the preceding Step. We introduce each of these three notations in
terms of the so-called shift operator, which we will define first.

1. The shift operator E

Let $f_j \equiv f(x_j)$, where $x_j = x_0 + jh$, $j = 0, 1, 2, \ldots, n$, be a set of values of the function f(x). The shift operator E is defined by

$$E f_j \equiv f_{j+1}.$$

Consequently,

$$E^2 f_j = E(E f_j) = E f_{j+1} = f_{j+2},$$

and so on, i.e.,

$$E^k f_j = f_{j+k},$$

where k is any positive integer. Moreover, the last formula can be extended to negative integers, and indeed to all real values of j and k, so that, for example,

$$E^{-1} f_j = f_{j-1},$$

and

$$E^{1/2} f_j = f_{j+1/2} = f\left(x_j + \tfrac{1}{2}h\right) = f\left(x_0 + \left(j + \tfrac{1}{2}\right)h\right).$$

2. The forward difference operator ∆

If we define the forward difference operator ∆ by

$$\Delta \equiv E - 1,$$

then

$$\Delta f_j = (E - 1) f_j = E f_j - f_j = f_{j+1} - f_j,$$

which is the first-order forward difference at $x_j$. Similarly, we find that

$$\Delta^2 f_j = \Delta(\Delta f_j) = \Delta f_{j+1} - \Delta f_j = f_{j+2} - 2f_{j+1} + f_j$$

is the second-order forward difference at $x_j$, and so on. The forward difference of order k is

$$\Delta^k f_j = \Delta^{k-1}(\Delta f_j) = \Delta^{k-1}(f_{j+1} - f_j) = \Delta^{k-1} f_{j+1} - \Delta^{k-1} f_j,$$

where k is any positive integer.

3. The backward difference operator ∇

If we define the backward difference operator ∇ by

$$\nabla \equiv 1 - E^{-1},$$

then

$$\nabla f_j = (1 - E^{-1}) f_j = f_j - E^{-1} f_j = f_j - f_{j-1},$$

which is the first-order backward difference at $x_j$. Similarly,

$$\nabla^2 f_j = \nabla(\nabla f_j) = \nabla f_j - \nabla f_{j-1} = f_j - 2f_{j-1} + f_{j-2}$$

is the second-order backward difference at $x_j$, etc. The backward difference of order k is

$$\nabla^k f_j = \nabla^{k-1}(\nabla f_j) = \nabla^{k-1}(f_j - f_{j-1}) = \nabla^{k-1} f_j - \nabla^{k-1} f_{j-1},$$

where k is any positive integer. Note that $\nabla f_j = \Delta f_{j-1}$ and $\nabla^k f_j = \Delta^k f_{j-k}$.

4. The central difference operator δ

If we define the central difference operator δ by

$$\delta \equiv E^{1/2} - E^{-1/2},$$

then

$$\delta f_j = \left(E^{1/2} - E^{-1/2}\right) f_j = E^{1/2} f_j - E^{-1/2} f_j = f_{j+1/2} - f_{j-1/2},$$

which is the first-order central difference at $x_j$. Similarly,

$$\delta^2 f_j = \delta(\delta f_j) = \delta\left(f_{j+1/2} - f_{j-1/2}\right) = f_{j+1} - 2f_j + f_{j-1}$$

is the second-order central difference at $x_j$, etc. The central difference of order k is

$$\delta^k f_j = \delta^{k-1}(\delta f_j) = \delta^{k-1}\left(f_{j+1/2} - f_{j-1/2}\right) = \delta^{k-1} f_{j+1/2} - \delta^{k-1} f_{j-1/2},$$

where k is any positive integer. Note that $\delta f_{j+1/2} = \Delta f_j = \nabla f_{j+1}$ and, more generally, $\delta^k f_{j+k/2} = \Delta^k f_j = \nabla^k f_{j+k}$.
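The recursive definitions of ∆, ∇, and δ can be checked against one another on a table. In the sketch below (all function names are ours), half-integer central-difference subscripts are handled by passing twice the subscript as an integer, and the test data $f(x) = x^3$, x = 0(1)6, are our choice. It confirms that $\delta^k f_{j+k/2} = \Delta^k f_j = \nabla^k f_{j+k}$ on that table.

```python
def value(f, j):
    """Tabulated value f_j, refusing indices outside the table."""
    if not 0 <= j < len(f):
        raise IndexError(f"f_{j} is not in the table")
    return f[j]

def forward(f, j, k=1):
    """Delta^k f_j = Delta^(k-1) f_(j+1) - Delta^(k-1) f_j."""
    if k == 0:
        return value(f, j)
    return forward(f, j + 1, k - 1) - forward(f, j, k - 1)

def backward(f, j, k=1):
    """Nabla^k f_j = Nabla^(k-1) f_j - Nabla^(k-1) f_(j-1)."""
    if k == 0:
        return value(f, j)
    return backward(f, j, k - 1) - backward(f, j - 1, k - 1)

def central(f, j2, k=1):
    """delta^k f at subscript j2/2 (j2 is twice the possibly half-integer subscript).

    delta^k f_j = delta^(k-1) f_(j+1/2) - delta^(k-1) f_(j-1/2).
    """
    if k == 0:
        if j2 % 2:
            raise ValueError("no tabulated value at a half-integer subscript")
        return value(f, j2 // 2)
    return central(f, j2 + 1, k - 1) - central(f, j2 - 1, k - 1)

cubes = [x ** 3 for x in range(7)]          # f(x) = x^3 for x = 0(1)6
print(forward(cubes, 1, 3), backward(cubes, 4, 3), central(cubes, 5, 3))   # 6 6 6
```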

5. Differences display

The role of the forward, central, and backward differences is displayed by the difference table:

                                  Differences
x         f(x)        1st           2nd           3rd            4th
x_0       f_0
                      ∆f_0
x_1       f_1                       ∆²f_0
                      ∆f_1                        ∆³f_0
x_2       f_2                       ∆²f_1                        ∆⁴f_0
                      ∆f_2                        ∆³f_1
x_3       f_3                       ∆²f_2
                      ∆f_3
x_4       f_4
⋮         ⋮
x_{j-2}   f_{j-2}
                      δf_{j-3/2}
x_{j-1}   f_{j-1}                   δ²f_{j-1}
                      δf_{j-1/2}                  δ³f_{j-1/2}
x_j       f_j                       δ²f_j                        δ⁴f_j
                      δf_{j+1/2}                  δ³f_{j+1/2}
x_{j+1}   f_{j+1}                   δ²f_{j+1}
                      δf_{j+3/2}
x_{j+2}   f_{j+2}
⋮         ⋮
x_{n-4}   f_{n-4}
                      ∇f_{n-3}
x_{n-3}   f_{n-3}                   ∇²f_{n-2}
                      ∇f_{n-2}                    ∇³f_{n-1}
x_{n-2}   f_{n-2}                   ∇²f_{n-1}                    ∇⁴f_n
                      ∇f_{n-1}                    ∇³f_n
x_{n-1}   f_{n-1}                   ∇²f_n
                      ∇f_n
x_n       f_n

Although forward, central, and backward differences represent precisely the same data:

1. Forward differences are useful near the start of a table, since they involve only $f_j$ and tabulated function values below it in the table (i.e., at $x \ge x_j$);
2. Central differences are useful away from the ends of a table, where there are tabulated function values available both above and below $x_j$;
3. Backward differences are useful near the end of a table, since they involve only $f_j$ and tabulated function values above it in the table (i.e., at $x \le x_j$).

EXERCISES

1. Construct a table of differences for the polynomial

$$f(x) = 3x^3 - 2x^2 + x + 5$$

for x = 0(1)4. Use the table to obtain the values of:

a) $\Delta f_1, \Delta^2 f_1, \Delta^3 f_1, \Delta^3 f_0, \Delta^2 f_2$;

b) $\nabla f_1, \nabla f_2, \nabla^2 f_2, \nabla^2 f_3, \nabla^3 f_4$;

c) $\delta f_{1/2}, \delta^2 f_1, \delta^3 f_{3/2}, \delta^3 f_{5/2}, \delta^2 f_2$.

2. For the difference table of $f(x) = e^x$ for x = 0.1(0.05)0.5, determine to six significant digits the quantities (taking $x_0 = 0.1$):

a) $\Delta f_2, \Delta^2 f_2, \Delta^3 f_2, \Delta^4 f_2$;

b) $\nabla f_6, \nabla^2 f_6, \nabla^3 f_6, \nabla^4 f_6$;

c) $\delta^2 f_4, \delta^4 f_4$;

d) $\Delta^2 f_1, \delta^2 f_2, \nabla^2 f_3$;

e) $\Delta^3 f_3, \nabla^3 f_6, \delta^3 f_{9/2}$.
2

3. Prove the statements:

a) $E x_j = x_{j+1}$;

b) $\Delta^3 f_j = f_{j+3} - 3f_{j+2} + 3f_{j+1} - f_j$;

c) $\nabla^3 f_j = f_j - 3f_{j-1} + 3f_{j-2} - f_{j-3}$;

d) $\delta^3 f_j = f_{j+3/2} - 3f_{j+1/2} + 3f_{j-1/2} - f_{j-3/2}$.

Lagrange interpolation formula


In this part we consider an interpolation formula attributed to Lagrange, which
does not require function values at equal intervals. Lagrange's interpolation
formula has the disadvantage that the degree of the approximating polynomial
must be chosen at the outset; an alternative approach is discussed in the next Step.
Thus, Lagrange's formula is mainly of theoretical interest for us here; in passing,
we mention that there are some important applications of this formula beyond the
scope of this book - for example, the construction of basis functions to solve
differential equations using a spectral (discrete ordinate) method.

1. Procedure

Let the function f be tabulated at (n + 1), not necessarily equidistant, points $x_j$, $j = 0, 1, 2, \ldots, n$, and be approximated by the polynomial

$$P_n(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$$

of degree at most n, such that

$$f_j = f(x_j) = P_n(x_j) \quad \text{for } j = 0, 1, 2, \ldots, n.$$

Since, for k = 0, 1, 2, …, n,

$$L_k(x) = \frac{(x - x_0)(x - x_1)\cdots(x - x_{k-1})(x - x_{k+1})\cdots(x - x_n)}{(x_k - x_0)(x_k - x_1)\cdots(x_k - x_{k-1})(x_k - x_{k+1})\cdots(x_k - x_n)}$$

is a polynomial of degree n which satisfies

$$L_k(x_j) = 0, \quad j \ne k, \; j = 0, 1, 2, \ldots, n, \quad \text{and} \quad L_k(x_k) = 1,$$

it follows that

$$P_n(x) = \sum_{k=0}^{n} L_k(x)\, f_k$$

is a polynomial of degree (at most) n such that

$$P_n(x_j) = f_j, \quad j = 0, 1, 2, \ldots, n,$$

i.e., the (unique) interpolating polynomial. Note that for $x = x_j$ all terms in the sum vanish except the j-th, which is $f_j$; $L_k(x)$ is called the k-th Lagrange interpolation coefficient, and the identity

$$\sum_{k=0}^{n} L_k(x) = 1$$

(established by setting $f(x) \equiv 1$) may be used as a check. Note also that with n = 1 we recover the linear interpolation formula:

$$P_1(x) = \frac{x - x_1}{x_0 - x_1} f_0 + \frac{x - x_0}{x_1 - x_0} f_1 = f_0 + \frac{x - x_0}{x_1 - x_0}(f_1 - f_0).$$

2. Example

We will use Lagrange's interpolation formula to find the interpolating polynomial $P_3$ through the points (0, 3), (1, 2), (2, 7), and (4, 59), and then find the approximate value $P_3(3)$.

The Lagrange coefficients are:

$$L_0(x) = \frac{(x-1)(x-2)(x-4)}{(0-1)(0-2)(0-4)} = -\frac{1}{8}\left(x^3 - 7x^2 + 14x - 8\right),$$

$$L_1(x) = \frac{(x-0)(x-2)(x-4)}{(1-0)(1-2)(1-4)} = \frac{1}{3}\left(x^3 - 6x^2 + 8x\right),$$

$$L_2(x) = \frac{(x-0)(x-1)(x-4)}{(2-0)(2-1)(2-4)} = -\frac{1}{4}\left(x^3 - 5x^2 + 4x\right),$$

$$L_3(x) = \frac{(x-0)(x-1)(x-2)}{(4-0)(4-1)(4-2)} = \frac{1}{24}\left(x^3 - 3x^2 + 2x\right).$$

(The student should verify that $L_0(x) + L_1(x) + L_2(x) + L_3(x) = 1$.) Hence, the required polynomial is

$$\begin{aligned}
P_3(x) &= -\frac{3}{8}\left(x^3 - 7x^2 + 14x - 8\right) + \frac{2}{3}\left(x^3 - 6x^2 + 8x\right) - \frac{7}{4}\left(x^3 - 5x^2 + 4x\right) + \frac{59}{24}\left(x^3 - 3x^2 + 2x\right)\\
&= \frac{1}{24}\left(-9x^3 + 63x^2 - 126x + 72 + 16x^3 - 96x^2 + 128x - 42x^3 + 210x^2 - 168x + 59x^3 - 177x^2 + 118x\right)\\
&= \frac{1}{24}\left(24x^3 + 0x^2 - 48x + 72\right)\\
&= x^3 - 2x + 3.
\end{aligned}$$

Consequently, $f(3) \approx P_3(3) = 27 - 6 + 3 = 24$. However, note that, if the explicit form of the interpolating polynomial were not required, one would proceed to evaluate $P_3(x)$ for some value of x directly from the factored forms of $L_k(x)$. Thus, in order to evaluate $P_3(3)$, one has

$$L_0(3) = \frac{(3-1)(3-2)(3-4)}{(0-1)(0-2)(0-4)} = \frac{1}{4}, \quad \text{etc.}$$
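Evaluating $P_n(x)$ directly from the factored $L_k(x)$, as suggested above, takes only a few lines. This sketch (function names are ours) uses exact rational arithmetic via `fractions.Fraction`, which assumes integer or rational nodes and arguments; with floating-point data one would drop the `Fraction` wrapper and accept rounding error.

```python
from fractions import Fraction

def lagrange_coefficients(xs, x):
    """Return [L_0(x), ..., L_n(x)] for distinct nodes xs, from the factored form."""
    coeffs = []
    for k, xk in enumerate(xs):
        Lk = Fraction(1)
        for j, xj in enumerate(xs):
            if j != k:
                Lk *= Fraction(x - xj, xk - xj)
        coeffs.append(Lk)
    return coeffs

def lagrange_interpolate(xs, fs, x):
    """Evaluate P_n(x) = sum over k of L_k(x) f_k."""
    return sum(Lk * fk for Lk, fk in zip(lagrange_coefficients(xs, x), fs))

xs, fs = [0, 1, 2, 4], [3, 2, 7, 59]
L = lagrange_coefficients(xs, 3)
print([str(c) for c in L], sum(L))          # the L_k(3) and the check: they sum to 1
print(lagrange_interpolate(xs, fs, 3))      # P_3(3)
```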

EXERCISE

Given that f(-2) = 46, f(-1) = 4, f(1) = 4, f(3) = 156, and f(4) = 484, use Lagrange's interpolation formula to estimate the value of f(0).

CHAPTER 4

Use of LU decomposition
Another general approach to solving Ax = b is known as the method of LU decomposition, which provides new insights into matrix algebra and has many theoretical and practical uses. It yields efficient computer algorithms for handling practical problems.

The symbols L and U denote lower triangular and upper triangular matrices, respectively. Examples of lower triangular matrices are

$$L_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 2 & -0.5 & 1 \end{bmatrix} \quad \text{and} \quad L_2 = \begin{bmatrix} 2 & 0 & 0 \\ 1 & -1 & 0 \\ 2 & 3 & 1 \end{bmatrix}.$$

Note that in such a matrix all elements above the leading diagonal are zero. Examples of upper triangular matrices are:

$$U_1 = \begin{bmatrix} -1 & 2 & 1 \\ 0 & 8 & 6 \\ 0 & 0 & 6 \end{bmatrix} \quad \text{and} \quad U_2 = \begin{bmatrix} -1 & 2 & 0 \\ 0 & 1 & 2 \\ 0 & 0 & -1 \end{bmatrix},$$

where all elements below the leading diagonal are zero. The product of $L_1$ and $U_1$ is

$$A = L_1 U_1 = \begin{bmatrix} -1 & 2 & 1 \\ 0 & 8 & 6 \\ -2 & 0 & 5 \end{bmatrix}.$$

1. Procedure

Suppose we have to solve a linear system Ax = b and that we can express the coefficient matrix A in the form of the so-called LU decomposition A = LU. Then we may solve the linear system as follows:

Stage 1:

Write Ax = LUx = b.

Stage 2:

Set y = Ux, so that Ax = Ly = b. Use forward substitution with Ly = b to find $y_1, y_2, \ldots, y_n$ in that order, i.e., assume that the augmented matrix for the system Ly = b is:

$$\left[\begin{array}{ccccc|c}
l_{11} & 0 & \cdots & 0 & 0 & b_1\\
l_{21} & l_{22} & \cdots & 0 & 0 & b_2\\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots\\
l_{n-1,1} & l_{n-1,2} & \cdots & l_{n-1,n-1} & 0 & b_{n-1}\\
l_{n1} & l_{n2} & \cdots & l_{n,n-1} & l_{nn} & b_n
\end{array}\right]$$

Then forward substitution yields $y_1 = \frac{b_1}{l_{11}}$ and, subsequently,

$$y_i = \frac{1}{l_{ii}}\left[b_i - \sum_{j=1}^{i-1} l_{ij} y_j\right], \quad i = 2, 3, \ldots, n.$$

Note that the value of $y_i$ depends on the values $y_1, y_2, \ldots, y_{i-1}$, which have already been calculated.

Stage 3:

Finally, use back-substitution with Ux = y to find $x_n, \ldots, x_1$ in that order.

Later on we shall outline a general method for finding LU decompositions of square matrices. The following example demonstrates this method, involving the matrix $A = L_1 U_1$ above. If we wish to solve Ax = b for a number of different vectors b, then this method is more efficient than Gauss elimination: once we have found an LU decomposition of A, we need only forward- and back-substitute to solve the system for any b.
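Stages 2 and 3 can be sketched as two short loops. The function names below are our own, and the sketch assumes L and U are square, triangular, and have nonzero diagonal elements; it is applied to $L_1$ and $U_1$ with the right-hand side of the example that follows.

```python
def forward_substitution(L, b):
    """Solve L y = b for lower triangular L: y_i = (b_i - sum_{j<i} l_ij y_j) / l_ii."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][j] * y[j] for j in range(i))) / L[i][i]
    return y

def back_substitution(U, y):
    """Solve U x = y for upper triangular U, finding x_n, ..., x_1 in that order."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

def lu_solve(L, U, b):
    """Stage 2 (L y = b), then Stage 3 (U x = y)."""
    return back_substitution(U, forward_substitution(L, b))

L1 = [[1, 0, 0], [0, 1, 0], [2, -0.5, 1]]
U1 = [[-1, 2, 1], [0, 8, 6], [0, 0, 6]]
print(forward_substitution(L1, [0, 10, -11]))
print(lu_solve(L1, U1, [0, 10, -11]))
```

Once L and U are known, each new right-hand side b costs only these two substitutions.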

Example

We shall solve the system

−x 1+ ¿ 2 x 2 +¿ x3 =0
¿ 8 x 2+ ¿ 6 x3 =10 +¿ 5 x 3=−11 ¿
¿

Stage l:

An LU decomposition of the system is

AX =L1 U 1 X =b

[ ][ ][ ] [ ]
1 0 0 −1 2 1 x 1 0
AX= 0 1 0 0 8 6 x2 = 10
2 −0.5 1 2 0 6 x 3 −11

Stage 2:

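Set $y = U_1x$ and then solve the system $L_1y = b$, i.e.,

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 2 & -0.5 & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 10 \\ -11 \end{bmatrix}$$

Using forward substitution, we obtain:

$$y_1 = 0$$

$$y_2 = 10$$

$$2y_1 - 0.5y_2 + y_3 = -11 \implies y_3 = -6$$

$$Y = \begin{bmatrix} 0 \\ 10 \\ -6 \end{bmatrix}$$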

Stage 3:

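Solve $U_1X = Y$:

$$\begin{bmatrix} -1 & 2 & 1 \\ 0 & 8 & 6 \\ 0 & 0 & 6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 10 \\ -6 \end{bmatrix}$$

Back-substitution yields:

$$6x_3 = -6 \implies x_3 = -1$$

$$8x_2 + 6x_3 = 10 \implies x_2 = 2$$

$$-x_1 + 2x_2 + x_3 = 0 \implies x_1 = 3$$

Thus, the solution of $Ax = b$ is:

$$X = \begin{bmatrix} 3 \\ 2 \\ -1 \end{bmatrix}$$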

which you may check, using the original equations. We turn now to the problem of
finding an LU decomposition of a given square matrix A.

Realizing an LU decomposition

For an LU decomposition of a given $n \times n$ matrix $A$, we seek a lower triangular matrix $L$ and an upper triangular matrix $U$ (both of order $n \times n$) such that $A = LU$. The matrix $U$ may be taken to be the upper triangular matrix resulting from Gauss elimination, and the matrix $L$ may be taken to be the lower triangular matrix which has diagonal elements 1 and which has as its $(i, k)$ element the multiplier $m_{ik}$. This multiplier is calculated at the $k$-th stage of Gauss elimination and is required to transform the current value of $a_{ik}$ into 0. In our notation, these multipliers were given by

$$m_{ik} = \frac{a_{ik}}{a_{kk}}, \quad i = k+1, k+2, \ldots, n,$$

where $a_{ik}$ and $a_{kk}$ denote the current values at the $k$-th stage.

An example will help to clarify this procedure.

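Recall that Gauss elimination applied to the system

$$\begin{aligned} x + y - z &= 2 \\ x + 2y + z &= 6 \\ 2x - y + z &= 1 \end{aligned}$$

yielded the upper triangular matrix:

$$U = \begin{bmatrix} 1 & 1 & -1 \\ 0 & 1 & 2 \\ 0 & 0 & 9 \end{bmatrix}$$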

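Also, we saw that in the first stage we calculated the multipliers $m_{21} = \dfrac{a_{21}}{a_{11}} = 1$ and $m_{31} = \dfrac{a_{31}}{a_{11}} = 2$, while, in the second stage, we calculated the multiplier $m_{32} = \dfrac{a_{32}}{a_{22}} = -3$.

Thus

$$L = \begin{bmatrix} 1 & 0 & 0 \\ m_{21} & 1 & 0 \\ m_{31} & m_{32} & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 2 & -3 & 1 \end{bmatrix}$$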

It is readily verified that $LU$ equals the coefficient matrix of the original system:

$$LU = \begin{bmatrix} 1 & 1 & -1 \\ 1 & 2 & 1 \\ 2 & -1 & 1 \end{bmatrix}$$
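The construction just described, with $U$ from Gauss elimination and $L$ from the multipliers $m_{ik}$, can be sketched as follows. This is a minimal sketch assuming no pivot $a_{kk}$ becomes zero, so no rows are interchanged; `lu_from_gauss` and `matmul` are illustrative names of our own.

```python
def lu_from_gauss(A):
    """U = result of Gauss elimination on A; L = unit lower triangle holding the m_ik.

    Assumes every pivot a_kk met during elimination is non-zero (no row interchanges).
    """
    n = len(A)
    U = [[float(v) for v in row] for row in A]  # working copy, reduced in place to U
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n - 1):                      # k-th stage of elimination
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]               # m_ik = a_ik / a_kk (current values)
            L[i][k] = m
            for j in range(k, n):
                U[i][j] -= m * U[k][j]          # row_i <- row_i - m_ik * row_k
    return L, U


def matmul(X, Y):
    """Plain matrix product X Y for lists of lists."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


A = [[1, 1, -1], [1, 2, 1], [2, -1, 1]]
L, U = lu_from_gauss(A)
print(L)                # [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [2.0, -3.0, 1.0]]
print(U)                # [[1.0, 1.0, -1.0], [0.0, 1.0, 2.0], [0.0, 0.0, 9.0]]
print(matmul(L, U) == A)  # True
```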

Another technique which may be used to find an LU decomposition of an $n \times n$ matrix is direct decomposition. To illustrate this process, suppose we require an LU decomposition of the $3 \times 3$ coefficient matrix of the system above. Then the required $L$ and $U$ are of the form

$$L = \begin{bmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{bmatrix}, \quad U = \begin{bmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \end{bmatrix}$$

Note that the total number of unknowns in $L$ and $U$ is 12, whereas there are only 9 elements in the $3 \times 3$ coefficient matrix $A$. To ensure that $L$ and $U$ are unique, we need to impose $12 - 9 = 3$ extra conditions on the elements of these two triangular matrices. (In the general $n \times n$ case, $n$ extra conditions are required.) One common choice is to require all the diagonal elements of $L$ to be 1; the resulting method is known as Doolittle's method. Another choice is to make the diagonal elements of $U$ equal to 1; this is Crout's method. Since Doolittle's method would reproduce the decomposition of $A$ already obtained above from Gauss elimination, we shall use Crout's method to illustrate the direct decomposition procedure.

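We then require that

$$\begin{bmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{bmatrix} \begin{bmatrix} 1 & u_{12} & u_{13} \\ 0 & 1 & u_{23} \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 1 & -1 \\ 1 & 2 & 1 \\ 2 & -1 & 1 \end{bmatrix}$$

Multiplication of $L$ and $U$ yields:

$$\begin{bmatrix} l_{11} & l_{11}u_{12} & l_{11}u_{13} \\ l_{21} & l_{21}u_{12} + l_{22} & l_{21}u_{13} + l_{22}u_{23} \\ l_{31} & l_{31}u_{12} + l_{32} & l_{31}u_{13} + l_{32}u_{23} + l_{33} \end{bmatrix} = \begin{bmatrix} 1 & 1 & -1 \\ 1 & 2 & 1 \\ 2 & -1 & 1 \end{bmatrix}$$

Equating corresponding elements, we find in turn $l_{11} = 1$, $l_{21} = 1$, $l_{31} = 2$ (first column); $u_{12} = 1$, $u_{13} = -1$ (first row); $l_{22} = 2 - l_{21}u_{12} = 1$, $l_{32} = -1 - l_{31}u_{12} = -3$ (second column); $u_{23} = (1 - l_{21}u_{13})/l_{22} = 2$ (second row); and finally $l_{33} = 1 - l_{31}u_{13} - l_{32}u_{23} = 9$. Hence

$$L = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 2 & -3 & 9 \end{bmatrix}, \quad U = \begin{bmatrix} 1 & 1 & -1 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}$$

It is clear that this construction by Crout's method yields triangular matrices $L$ and $U$ for which $A = LU$.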
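Crout's method can be organized so that, at step $k$, column $k$ of $L$ and then row $k$ of $U$ are computed from the equations obtained by equating elements of $LU$ and $A$. The sketch below is one such arrangement, assuming every $l_{kk}$ met along the way is non-zero (no pivoting); the function name `crout` is our own.

```python
def crout(A):
    """Crout LU decomposition A = L U with unit diagonal in U.

    Assumes each l_kk computed along the way is non-zero (no pivoting).
    """
    n = len(A)
    A = [[float(v) for v in row] for row in A]
    L = [[0.0] * n for _ in range(n)]
    U = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n):
        # Column k of L: l_ik = a_ik - sum_{t<k} l_it * u_tk, for i >= k
        for i in range(k, n):
            L[i][k] = A[i][k] - sum(L[i][t] * U[t][k] for t in range(k))
        # Row k of U: u_kj = (a_kj - sum_{t<k} l_kt * u_tj) / l_kk, for j > k
        for j in range(k + 1, n):
            U[k][j] = (A[k][j] - sum(L[k][t] * U[t][j] for t in range(k))) / L[k][k]
    return L, U


A = [[1, 1, -1], [1, 2, 1], [2, -1, 1]]
L, U = crout(A)
print(L)  # [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [2.0, -3.0, 9.0]]
print(U)  # [[1.0, 1.0, -1.0], [0.0, 1.0, 2.0], [0.0, 0.0, 1.0]]
```

Compared with the Gauss-elimination factors, the factor 9 has simply moved from the diagonal of $U$ to the diagonal of $L$.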

EXERCISES

1. Find an LU decomposition of the matrix

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

where $a, b, c, d \neq 0$.

2. Solve each of the following systems by first finding an LU decomposition of the coefficient matrix and then using forward and backward substitutions.

a.

$$\begin{aligned} x_1 + x_2 - x_3 &= 0 \\ 2x_1 - x_2 + x_3 &= 6 \\ 3x_1 + 2x_2 - 4x_3 &= -4 \end{aligned}$$

b.

$$\begin{aligned} 2x + 6y + 4z &= 5 \\ 6x + 19y + 12z &= 6 \\ 2x + 8y + 14z &= 7 \end{aligned}$$

The Gauss-Seidel iterative method


1. Iterative methods

Iterative methods provide an alternative approach. Recall that an iterative method starts with an approximate solution and uses it, by means of a recurrence formula, to provide another approximate solution; by repeated application of the formula, a sequence of solutions is obtained which (under suitable conditions) converges to the exact solution. Iterative methods have the advantages of simplicity of operation and ease of implementation on computers, and they are relatively insensitive to propagation of errors; they would be used in preference to direct methods for solving linear systems involving several hundred variables, particularly if many of the coefficients were zero.

Systems of over 100 000 variables have been successfully solved on computers by
iterative methods, whereas systems of 10 000 or more variables are difficult or
impossible to solve by direct methods.

2. The Gauss-Seidel method

This text will only present one iterative method for linear equations, due to Gauss and improved by Seidel. We shall use this method to solve the system

$$\begin{aligned} 10x_1 + 2x_2 + x_3 &= 13 \\ 2x_1 + 10x_2 + x_3 &= 13 \\ 2x_1 + x_2 + 10x_3 &= 13 \end{aligned}$$

The method is well suited to implementation on computers.

The first step is to solve the first equation for $x_1$, the second for $x_2$, and the third for $x_3$, so that the system becomes:

$$x_1 = 1.3 - 0.2x_2 - 0.1x_3 \qquad (1)$$

$$x_2 = 1.3 - 0.2x_1 - 0.1x_3 \qquad (2)$$

$$x_3 = 1.3 - 0.2x_1 - 0.1x_2 \qquad (3)$$

An initial solution is now assumed; we shall use $x_1 = 0$, $x_2 = 0$ and $x_3 = 0$. Inserting these values into the right-hand side of Equation (1) yields $x_1 = 1.3$. This value for $x_1$ is used immediately, together with the remainder of the initial solution (i.e., $x_2 = 0$ and $x_3 = 0$), in the right-hand side of Equation (2) and yields $x_2 = 1.3 - 0.2 \times 1.3 - 0.1 \times 0 = 1.04$. Finally, the values $x_1 = 1.3$ and $x_2 = 1.04$ are inserted into Equation (3) to yield $x_3 = 1.3 - 0.2 \times 1.3 - 0.1 \times 1.04 = 0.936$. This second approximate solution $(1.3, 1.04, 0.936)$ completes the first iteration.

Beginning with this second approximation, we repeat the process to obtain a third
approximation, etc. Under certain conditions relating to the coefficients of the
system, this sequence will converge to the exact solution.

We can set up recurrence relations which show clearly how the iterative process proceeds. Denoting the $k$-th and $(k+1)$-th approximations by $(x_1^{(k)}, x_2^{(k)}, x_3^{(k)})$ and $(x_1^{(k+1)}, x_2^{(k+1)}, x_3^{(k+1)})$, respectively, we find

$$x_1^{(k+1)} = 1.3 - 0.2x_2^{(k)} - 0.1x_3^{(k)} \qquad (1)'$$

$$x_2^{(k+1)} = 1.3 - 0.2x_1^{(k+1)} - 0.1x_3^{(k)} \qquad (2)'$$

$$x_3^{(k+1)} = 1.3 - 0.2x_1^{(k+1)} - 0.1x_2^{(k+1)} \qquad (3)'$$

We begin with the starting vector $x^{(0)} = (x_1^{(0)}, x_2^{(0)}, x_3^{(0)})$, all components of which are 0, and then apply these relations repeatedly in the order (1)', (2)' and (3)'. Note that, when we insert values for $x_1$, $x_2$ and $x_3$ into the right-hand sides, we always use the most recent estimates found for each unknown.
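The recurrence relations (1)', (2)' and (3)' translate directly into a loop. This sketch is specific to the example system, and the helper name `gauss_seidel_sweep` is our own; three iterations from the zero starting vector reproduce, to 6D, the values tabulated in the convergence discussion that follows.

```python
def gauss_seidel_sweep(x1, x2, x3):
    """One Gauss-Seidel iteration for the example system, using (1)', (2)', (3)'."""
    x1 = 1.3 - 0.2 * x2 - 0.1 * x3  # (1)': uses x2^(k), x3^(k)
    x2 = 1.3 - 0.2 * x1 - 0.1 * x3  # (2)': uses the new x1^(k+1)
    x3 = 1.3 - 0.2 * x1 - 0.1 * x2  # (3)': uses the new x1^(k+1) and x2^(k+1)
    return x1, x2, x3


x = (0.0, 0.0, 0.0)  # starting vector x^(0)
for k in range(1, 4):
    x = gauss_seidel_sweep(*x)
    print(k, [round(v, 6) for v in x])
# 1 [1.3, 1.04, 0.936]
# 2 [0.9984, 1.00672, 0.999648]
# 3 [0.998691, 1.000297, 1.000232]
```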

3. Convergence

The sequence of solutions produced by the iterative process for the above numerical example is shown in the table:

Iteration    Approximate solution (Gauss-Seidel)
$k$          $x_1^{(k)}$    $x_2^{(k)}$    $x_3^{(k)}$
0            0              0              0
1            1.3            1.04           0.936
2            0.9984         1.00672        0.999648
3            0.998691       1.000297       1.000232

The student should check that the exact solution for this system is (1,1,1). It is seen
that the Gauss-Seidel solutions are rapidly approaching these values; in other
words, the method is converging.

Naturally, in practice, the exact solution is unknown. It is customary to end the iterative procedure as soon as the differences between the $x^{(k+1)}$ and $x^{(k)}$ values are suitably small. One stopping rule is to end the iteration when

$$S_k = \sum_{i=1}^{n} \left| x_i^{(k+1)} - x_i^{(k)} \right|$$

becomes less than a prescribed small number (usually chosen according to the accuracy of the machine on which the calculations are carried out).
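For a general system $Ax = b$, the same sweep combined with the stopping rule $S_k$ might be coded as below. This is a sketch under stated assumptions: every diagonal element $a_{ii}$ is non-zero, `tol` plays the role of the prescribed small number, and `max_iter` is a safety limit of our own, since convergence is not guaranteed for every system.

```python
def gauss_seidel(A, b, x0=None, tol=1e-6, max_iter=100):
    """Gauss-Seidel iteration for A x = b, stopped as soon as S_k < tol.

    S_k = sum_i |x_i^(k+1) - x_i^(k)|. Assumes every a_ii is non-zero;
    convergence is not guaranteed for an arbitrary system.
    Returns (x, last S_k, number of iterations performed).
    """
    n = len(b)
    x = [float(v) for v in x0] if x0 is not None else [0.0] * n
    s = float("inf")
    for k in range(1, max_iter + 1):
        old = x[:]
        for i in range(n):
            # Uses the newest values of the x_j already updated in this sweep
            off_diag = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - off_diag) / A[i][i]
        s = sum(abs(x[i] - old[i]) for i in range(n))
        if s < tol:
            return x, s, k
    return x, s, max_iter


A = [[10, 2, 1], [2, 10, 1], [2, 1, 10]]
b = [13, 13, 13]
x, s, k = gauss_seidel(A, b, tol=1e-6)
print([round(v, 6) for v in x], s, k)
```

Returning the final $S_k$ and the iteration count alongside $x$ makes it easy to see whether the loop stopped because of the tolerance or because it hit `max_iter`.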

The question of convergence with a given system of equations is crucial. As in the above example, the Gauss-Seidel method may quickly lead to a solution very close to the exact one; on the other hand, it may converge too slowly to be of practical use, or it may produce a sequence which diverges from the exact solution. The reader is referred to more advanced texts (such as Conte and de Boor (1980)) for treatments of this question.

In order to improve the chance (and rate) of convergence, the system of equations
should be rearranged before applying the iterative method, so that, as far as
possible, each leading-diagonal coefficient is larger (in absolute value) than any
other in its row.

EXERCISES

1. For the example treated above, compute the value of $S_3$, the quantity used in the suggested stopping rule, after the third iteration.

2. Use the Gauss-Seidel method to solve the following systems to 5D accuracy (remembering to rearrange the equations if appropriate). Compute the value of $S_k$ (to 6D) after each iteration.

a)

x - y + z = -7,
20x + 3y - 2z = 51,
2x + 8y + 4z = 25.

Remember to rearrange! Compute the value of Sk to 5 D after each iteration.

b)

10x - y = 1,
-x + 10y - z = 1,
-y + 10z - w = 1,
-z + 10w = 1.

Remarks: Other methods for solving systems of linear equations, such as Cramer's rule, Gaussian elimination and Gauss-Jordan elimination, can be found in the MATH 102 curriculum.

CURVE FITTING

1. Least squares

Scientists often wish to fit a smooth curve to experimental data. Given $(n + 1)$ points, an obvious approach is to use the interpolating polynomial of degree $n$, but when $n$ is large, this is usually unsatisfactory. Better results are obtained by piecewise use of polynomials, i.e., by fitting lower-degree polynomials through subsets of the data points.

A rather different, but often quite suitable, approach is a least squares fit, in which, instead of trying to fit the points exactly, a polynomial of low degree (often linear or quadratic) is obtained which fits the points closely (after all, the points themselves may not, in general, be exact, but subject to experimental error).

2. An illustration of the problem

Suppose we are studying experimentally the relationship between two variables $x$ and $y$, for example, quantities $x$ of a drug injected and observed responses $y$, recorded in a laboratory experiment. By carrying out the appropriate experiment, say, six times, we obtain six pairs of values $(x_j, y_j)$, which can be plotted on a diagram such as Figure 11(a).

Fig. 12 Fitting a straight line and a parabola

We may believe that the relationship between the variables can be described satisfactorily by a function $y = f(x)$, but that the $y$-values obtained experimentally are subject to errors (or noise). Therefore one arrives at the mathematical model

$$f(x_i) = y_i + \epsilon_i, \quad i = 1, 2, \ldots, n,$$

with $n$ data points, where $f(x_i)$ is the value of $y$ corresponding to the value $x_i$ used in the experiment, and $\epsilon_i$ is the experimental error involved in the measurement of the variable $y$ at the point $x_i$. Thus, the error in $y$ at the observed point is $\epsilon_i = f(x_i) - y_i$.

In the problem of curve fitting, we use the information in the sample data points to determine a suitable curve (i.e., find a suitable function $f$) so that the equation $y = f(x)$ gives a description of the $(x, y)$ relationship; in other words, it is hoped that predictions made by means of this equation will not be too much in error.

How does one choose the function $f$? There is an unlimited range of functions available. Figure 11(b) shows four possibilities. The polygon A passes through all six points; intuitively, however, we would prefer to fit a straight line B, or an exponential curve such as C. The curve D is clearly not a good candidate for our model.

3. A general approach to the problem

Let us, first of all, answer the question regarding the choice of function. Given a set of values $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, we shall pick a function which we can specify completely except for the values of a set of $k$ parameters $c_1, c_2, \ldots, c_k$; we shall denote this function by $y = f(x; c_1, c_2, \ldots, c_k)$. We then choose values for the parameters which will make the errors at the observation points $(x_i, y_i)$ as small as possible. Next, we shall suggest three ways by which the phrase "as small as possible" can be given specific meaning.

Examples of functions to use are:

1. $y(x) = c_1 + c_2x + c_3x^2 + \cdots + c_kx^{k-1}$ (polynomials),
2. $y(x) = c_1\sin\omega x + c_2\sin 2\omega x + c_3\sin 3\omega x + \cdots + c_k\sin k\omega x$ (combinations of sine functions),
3. $y(x) = c_1\cos\omega x + c_2\cos 2\omega x + c_3\cos 3\omega x + \cdots + c_k\cos k\omega x$ (combinations of cosine functions).

These examples are special cases of the general linear form:

4. $y(x) = c_1\phi_1(x) + c_2\phi_2(x) + c_3\phi_3(x) + \cdots + c_k\phi_k(x)$, where the functions $\phi_1, \phi_2, \ldots, \phi_k$ are a preselected set of functions.

In 1., the set of functions is $\{1, x, x^2, \ldots, x^{k-1}\}$; in 2., it is $\{\sin\omega x, \sin 2\omega x, \ldots, \sin k\omega x\}$, with $\omega$ a constant chosen to coincide with a periodicity in the data; while in 3., the set is $\{\cos\omega x, \cos 2\omega x, \ldots, \cos k\omega x\}$. Other functions commonly used in curve fitting are exponential functions, Bessel functions, Legendre polynomials, and Chebyshev polynomials (cf., for example, Burden and Faires (1993)).

4. The meaning of "errors as small as possible"

We now present criteria which make precise the concept of choosing a function to make measurement errors as small as possible. We suppose that the curve to be fitted can be expressed in a general linear form, with a known set of functions $\{\phi_1, \phi_2, \ldots, \phi_k\}$.

The errors $\epsilon_i = y(x_i) - y_i$ at the $n$ data points are:

$$\begin{aligned} \epsilon_1 &= c_1\phi_1(x_1) + c_2\phi_2(x_1) + \cdots + c_k\phi_k(x_1) - y_1 \\ \epsilon_2 &= c_1\phi_1(x_2) + c_2\phi_2(x_2) + \cdots + c_k\phi_k(x_2) - y_2 \\ &\;\;\vdots \\ \epsilon_n &= c_1\phi_1(x_n) + c_2\phi_2(x_n) + \cdots + c_k\phi_k(x_n) - y_n \end{aligned}$$

If the number of data points is less than or equal to the number of parameters, i.e., $n \le k$, it is possible to find values for $\{c_1, c_2, \ldots, c_k\}$ which make all the errors $\epsilon_i$ zero. If $n < k$, there is an infinite number of solutions for $\{c_i\}$ which make all the errors zero, so that an infinite number of curves of the given form pass through all the experimental points; in this case, the problem is not fully determined, i.e., more information is needed to choose an appropriate curve.

If $n > k$, which, in practice, is mostly the case, then it is not normally possible to make all the errors zero by a choice of the $\{c_i\}$. There are three possible choices:

1. a set $\{c_i\}$ which minimizes the total absolute error, i.e., minimizes the sum

$$\sum_{i=1}^{n} |\epsilon_i|;$$

2. a set $\{c_i\}$ which minimizes the maximum absolute error, i.e., minimizes

$$\max_{i=1,2,\ldots,n} |\epsilon_i|;$$

3. a set $\{c_i\}$ which minimizes the sum of the squares of the errors, i.e., minimizes

$$S = \sum_{i=1}^{n} \epsilon_i^2.$$

In general, Procedures 1 and 2 are not readily applied. Procedure 3, referred to as the principle of least squares, leads to a linear system of equations for the set $\{c_i\}$; it is used almost exclusively.

5. The least squares method and normal equations

In order to apply the principle of least squares, use has to be made of partial differentiation. We now describe the method and give an example, in order to show how it is used.

The sum of squared errors to be minimized is

$$S = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} \left[ c_1\phi_1(x_i) + c_2\phi_2(x_i) + \cdots + c_k\phi_k(x_i) - y_i \right]^2$$

The $n$ values of $(x_i, y_i)$ are the known measurements taken from $n$ experiments. When they are inserted on the right-hand side, $S$ becomes an expression involving only the $k$ unknowns $c_1, c_2, \ldots, c_k$. In other words, $S$ may be regarded as a function of the $c_i$, i.e., $S \equiv S(c_1, c_2, \ldots, c_k)$. The problem is now to choose that set of values $\{c_i\}$ which makes $S$ a minimum.

A theorem in calculus tells us that, under certain conditions which are usually satisfied in practice, the minimum of $S$ occurs when all the partial derivatives

$$\frac{\partial S}{\partial c_1}, \frac{\partial S}{\partial c_2}, \ldots, \frac{\partial S}{\partial c_k}$$

vanish. The partial derivative $\partial S / \partial c_1$ is simply the ordinary derivative $dS/dc_1$ computed while all the other $c_i$ are held constant; for instance, if $S = 3c_1 + 5c_2$, then

$$\frac{\partial S}{\partial c_1} = 3 \quad \text{and} \quad \frac{\partial S}{\partial c_2} = 5.$$

Thus, we have to solve the system of $k$ equations:

$$\frac{\partial S}{\partial c_1} = 0, \quad \frac{\partial S}{\partial c_2} = 0, \quad \ldots, \quad \frac{\partial S}{\partial c_k} = 0.$$

This system is a set of equations which is linear in the variables $c_1, c_2, \ldots, c_k$ and is referred to as the normal equations for the least squares approximation.
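The normal equations can be assembled directly from a list of basis functions $\phi_1, \ldots, \phi_k$. Differentiating $S$ with respect to $c_j$ and setting the result to zero gives $\sum_{m} \left(\sum_{i} \phi_j(x_i)\phi_m(x_i)\right) c_m = \sum_{i} \phi_j(x_i)\,y_i$ for $j = 1, \ldots, k$. The sketch below builds and solves this $k \times k$ system; the names `least_squares` and `solve` are our own, and the small Gauss elimination solver with partial pivoting is only for illustration. The usage lines take the data of the example that follows.

```python
def solve(M, r):
    """Solve M c = r by Gauss elimination with partial pivoting (small systems only)."""
    n = len(r)
    a = [[float(v) for v in M[i]] + [float(r[i])] for i in range(n)]  # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))  # row with the largest pivot
        a[k], a[p] = a[p], a[k]
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n + 1):
                a[i][j] -= m * a[k][j]
    c = [0.0] * n
    for i in range(n - 1, -1, -1):  # back-substitution
        c[i] = (a[i][n] - sum(a[i][j] * c[j] for j in range(i + 1, n))) / a[i][i]
    return c


def least_squares(basis, xs, ys):
    """Least squares coefficients c_1..c_k for y = sum_j c_j * phi_j(x).

    Row j of the normal equations:
    sum_m (sum_i phi_j(x_i) * phi_m(x_i)) * c_m = sum_i phi_j(x_i) * y_i.
    """
    k = len(basis)
    M = [[sum(basis[j](x) * basis[m](x) for x in xs) for m in range(k)] for j in range(k)]
    r = [sum(basis[j](x) * y for x, y in zip(xs, ys)) for j in range(k)]
    return solve(M, r)


xs = [1, 2, 3, 4, 5, 6]
ys = [1, 3, 4, 3, 4, 2]
line = least_squares([lambda x: 1.0, lambda x: x], xs, ys)
print([round(c, 2) for c in line])  # [2.13, 0.2]
```

Passing `[lambda x: 1.0, lambda x: x, lambda x: x * x]` instead fits a parabola, and sine or cosine bases with a chosen $\omega$ can be supplied in the same way.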

6. Example

The following points were obtained in an experiment:

x 1 2 3 4 5 6
y 1 3 4 3 4 2

We shall plot the points on a diagram and use the method of least squares to fit
through them

a) a straight line, and b) a parabola.

The plotted points are shown in Figure 12(a). In order to fit a straight line, we have to find a function $y = c_1 + c_2x$, i.e., a first-degree polynomial, which minimizes

$$S = \sum_{i=1}^{6} \epsilon_i^2 = \sum_{i=1}^{6} \left[ y_i - c_1 - c_2x_i \right]^2$$

Differentiating first with respect to $c_1$ (keeping $c_2$ constant) and then with respect to $c_2$ (keeping $c_1$ constant), and setting the results equal to zero, yields the normal equations:

$$\frac{\partial S}{\partial c_1} \equiv -2\sum_{i=1}^{6} (y_i - c_1 - c_2x_i) = 0$$

$$\frac{\partial S}{\partial c_2} \equiv -2\sum_{i=1}^{6} x_i(y_i - c_1 - c_2x_i) = 0$$

We may divide both equations by $-2$, take the summation operations through the brackets, and rearrange, in order to obtain:

$$\sum_{i=1}^{6} y_i = 6c_1 + \left(\sum_{i=1}^{6} x_i\right)c_2$$

$$\sum_{i=1}^{6} x_iy_i = \left(\sum_{i=1}^{6} x_i\right)c_1 + \left(\sum_{i=1}^{6} x_i^2\right)c_2$$

We see that, in order to obtain a solution, we have to evaluate the four sums $\sum x_i$, $\sum y_i$, $\sum x_i^2$, $\sum x_iy_i$ and insert them into these equations. We can arrange the work in a table as follows (the last three columns are devoted to fitting the parabola, and the required sums are in the last row):

$i$    $x_i$    $y_i$    $x_i^2$    $x_iy_i$    $x_i^2y_i$    $x_i^3$    $x_i^4$
1      1        1        1          1           1             1          1
2      2        3        4          6           12            8          16
3      3        4        9          12          36            27         81
4      4        3        16         12          48            64         256
5      5        4        25         20          100           125        625
6      6        2        36         12          72            216        1296
$\sum$ 21       17       91         63          269           441        2275
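The column totals in the table, and the straight-line coefficients obtained from them, can be checked with a few lines of Python. Solving the $2 \times 2$ normal equations by Cramer's rule is our own choice here; any method for linear systems would do.

```python
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 3, 4, 3, 4, 2]

# Column totals of the table; each entry is a sum over i = 1..6
sums = {
    "x": sum(xs),
    "y": sum(ys),
    "x^2": sum(x**2 for x in xs),
    "xy": sum(x * y for x, y in zip(xs, ys)),
    "x^2y": sum(x**2 * y for x, y in zip(xs, ys)),
    "x^3": sum(x**3 for x in xs),
    "x^4": sum(x**4 for x in xs),
}
print(sums)
# {'x': 21, 'y': 17, 'x^2': 91, 'xy': 63, 'x^2y': 269, 'x^3': 441, 'x^4': 2275}

# Straight line: 6 c1 + 21 c2 = 17 and 21 c1 + 91 c2 = 63, solved by Cramer's rule
n = len(xs)
det = n * sums["x^2"] - sums["x"] ** 2  # 6*91 - 21^2 = 105
c1 = (sums["y"] * sums["x^2"] - sums["x"] * sums["xy"]) / det
c2 = (n * sums["xy"] - sums["x"] * sums["y"]) / det
print(round(c1, 2), round(c2, 2))  # 2.13 0.2
```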

The corresponding normal equations for fitting a straight line are:

$$17 = 6c_1 + 21c_2$$

$$63 = 21c_1 + 91c_2$$

The solutions to 2D are $c_1 = 2.13$ and $c_2 = 0.20$, whence the required line is (Figure 12(b)):

$$y = 2.13 + 0.20x$$

In order to fit a parabola, we must find the second-degree polynomial

$$y = c_1 + c_2x + c_3x^2$$

which minimizes

$$S = \sum_{i=1}^{6} \epsilon_i^2 = \sum_{i=1}^{6} \left[ y_i - c_1 - c_2x_i - c_3x_i^2 \right]^2$$

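Taking partial derivatives and proceeding as above, we obtain the normal equations:

$$\sum_{i=1}^{6} y_i = 6c_1 + \left(\sum_{i=1}^{6} x_i\right)c_2 + \left(\sum_{i=1}^{6} x_i^2\right)c_3$$

$$\sum_{i=1}^{6} x_iy_i = \left(\sum_{i=1}^{6} x_i\right)c_1 + \left(\sum_{i=1}^{6} x_i^2\right)c_2 + \left(\sum_{i=1}^{6} x_i^3\right)c_3$$

$$\sum_{i=1}^{6} x_i^2y_i = \left(\sum_{i=1}^{6} x_i^2\right)c_1 + \left(\sum_{i=1}^{6} x_i^3\right)c_2 + \left(\sum_{i=1}^{6} x_i^4\right)c_3$$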
Inserting the values for the sums (see the table above), we obtain the system of linear equations:

$$\begin{aligned} 17 &= 6c_1 + 21c_2 + 91c_3 \\ 63 &= 21c_1 + 91c_2 + 441c_3 \\ 269 &= 91c_1 + 441c_2 + 2275c_3 \end{aligned}$$

The solution to 3D is $c_1 = -1.200$, $c_2 = 2.700$, and $c_3 = -0.357$. The required parabola is therefore (retaining 2D):

$$y = -1.20 + 2.70x - 0.36x^2;$$

it is also plotted in Figure 13(b). Obviously, the parabola is a better fit than the straight line!
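To see where the 3D values come from, the $3 \times 3$ normal equations can be solved exactly in rational arithmetic with Python's standard `fractions` module. This is a sketch; `solve_exact` is our own helper performing Gauss elimination without row interchanges (the pivots happen to be non-zero for this system).

```python
from fractions import Fraction

# Normal equations for the parabola, built from the column sums in the table
M = [[6, 21, 91], [21, 91, 441], [91, 441, 2275]]
r = [17, 63, 269]


def solve_exact(M, r):
    """Gauss elimination in exact rational arithmetic (assumes non-zero pivots)."""
    n = len(r)
    a = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(M, r)]
    for k in range(n):
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n + 1):
                a[i][j] -= m * a[k][j]
    c = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):  # back-substitution
        c[i] = (a[i][n] - sum(a[i][j] * c[j] for j in range(i + 1, n))) / a[i][i]
    return c


c1, c2, c3 = solve_exact(M, r)
print(c1, c2, c3)  # -6/5 27/10 -5/14
print(round(float(c1), 3), round(float(c2), 3), round(float(c3), 3))  # -1.2 2.7 -0.357
```

The exact value $c_3 = -5/14$ explains why it is quoted as $-0.357$ to 3D and $-0.36$ to 2D.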

EXERCISES

1. For the example above (the data points are shown in Figure 12(a)), compute the value of $S$, the sum of the squares of the errors at the points, from (i) the fitted line and (ii) the fitted parabola. Plot the points on graph paper, and fit a straight line by eye (i.e., use a ruler to draw a line, guessing its best position). Determine the value of $S$ for this line and compare it with the value for the least squares line.

Fit a straight line by the least squares method to each of the following sets of data:

a) Toughness $x$ and percentage of nickel $y$ in eight specimens of alloy steel.

toughness x    36    41    42    43    44    45    47    50
% nickel y     2.5   2.7   2.8   2.9   3.0   3.2   3.3   3.5

b) Aptitude test marks $x$, given to six trainee sales people, and their first-year sales $y$ in thousands of dollars.

Aptitude test x       25    29    33    36    42    54
First-year sales y    42    45    50    48    73    90

For both sets of data, plot the points and draw the least squares line. Use the lines to predict the percentage of nickel in a specimen of steel whose toughness is 38, and the likely first-year sales of a trainee sales person who obtains a mark of 48 in the aptitude test.

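2. Obtain the normal equations for fitting a third-degree polynomial $y = c_1 + c_2x + c_3x^2 + c_4x^3$ to a set of $n$ points. Show that they can be written in matrix form (all sums being from $i = 1$ to $i = n$)

$$\begin{bmatrix} \sum y_i \\ \sum x_iy_i \\ \sum x_i^2y_i \\ \sum x_i^3y_i \end{bmatrix} = \begin{bmatrix} n & \sum x_i & \sum x_i^2 & \sum x_i^3 \\ \sum x_i & \sum x_i^2 & \sum x_i^3 & \sum x_i^4 \\ \sum x_i^2 & \sum x_i^3 & \sum x_i^4 & \sum x_i^5 \\ \sum x_i^3 & \sum x_i^4 & \sum x_i^5 & \sum x_i^6 \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{bmatrix}$$

Deduce the matrix form of the normal equations for fitting a fourth-degree polynomial.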

3. Use the least squares method to fit a parabola to the points (0, 0), (1, 1), (2, 3), (3, 3), and (4, 2). Find the value of $S$ for this fit.

4. Find the normal equations which arise when fitting, by the least squares method, an equation of the form $y = c_1 + c_2\sin x$ to the set of points $(0, 0)$, $(\pi/6, 1)$, $(\pi/2, 3)$ and $(5\pi/6, 2)$. Solve them for $c_1$ and $c_2$.

SECTION B

FIRST-ORDER DIFFERENTIAL EQUATIONS AND APPLICATIONS.

Definition:

A differential equation is an equation involving one or more derivatives of an unknown function.

In this section we will denote the unknown function by $y = y(x)$ unless the differential equation arises from an applied problem involving time $t$, in which case we will denote it by $y = y(t)$.

The ORDER of a differential equation is the order of the highest derivative that it
contains.

Examples

Differential equation (order)

a) $\dfrac{dy}{dx} = 3y$ (order 1)

b) $\dfrac{d^2y}{dx^2} - 6\dfrac{dy}{dx} + 8y = 0$ (order 2)

c) $\dfrac{d^3y}{dt^3} - t\dfrac{dy}{dt} + ty = e^t$ (order 3)

d) $y' - y = e^{2x}$ (order 1)

e) $y'' + y' = \cos t$ (order 2)

f) $y''' - y' + t = 2x$ (order 3)

SOLUTIONS OF DIFFERENTIAL EQUATIONS

A function $y = y(x)$ is a solution of a differential equation on an open interval $I$ if the equation is satisfied identically on $I$ when $y$ and its derivatives are substituted into the equation.

Example:

$y = e^{2x}$ is a solution of the differential equation

$$\frac{dy}{dx} - y = e^{2x} \qquad (1)$$

on the interval $I = (-\infty, \infty)$, since substituting $y$ and its derivatives into the left side of this equation yields

$$\frac{dy}{dx} - y = \frac{d(e^{2x})}{dx} - e^{2x} = 2e^{2x} - e^{2x} = e^{2x}$$

for all real values of $x$. However, this is not the only solution on $I$; for example,

$$y = Ce^x + e^{2x} \qquad (2)$$

is also a solution for every real value of the constant $C$, since

$$\frac{dy}{dx} - y = \frac{d(Ce^x + e^{2x})}{dx} - (Ce^x + e^{2x}) = (Ce^x + 2e^{2x}) - (Ce^x + e^{2x}) = e^{2x}.$$
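The claim that (2) satisfies (1) for every $C$ can also be spot-checked numerically, replacing $dy/dx$ by the central difference $[y(x+h) - y(x-h)]/(2h)$. This is only a check at a few sample points and values of $C$ (our own choices), not a proof; the step $h = 10^{-5}$ is likewise an assumption.

```python
import math


def y(x, C):
    """The general solution (2): y = C e^x + e^{2x}."""
    return C * math.exp(x) + math.exp(2 * x)


def residual(x, C, h=1e-5):
    """dy/dx - y - e^{2x} for equation (1), with dy/dx from a central difference."""
    dydx = (y(x + h, C) - y(x - h, C)) / (2 * h)
    return dydx - y(x, C) - math.exp(2 * x)


# Largest |residual| over a few sample values of C and x; it should be tiny
worst = max(abs(residual(x, C)) for C in (-3.0, 0.0, 0.5, 2.0) for x in (-1.0, 0.0, 0.5, 1.0))
print(worst)
```

A residual of the order of the central-difference error ($O(h^2)$ plus rounding) is what we expect from a true solution; a wrong candidate, such as $y = Ce^x + 2e^{2x}$, would leave a residual of order 1.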

After developing some techniques for solving equations such as (1), we will be able to show that all solutions of (1) on $(-\infty, \infty)$ can be obtained by substituting values for the constant $C$ in (2).

On a given interval $I$, a solution of a differential equation from which all solutions on $I$ can be derived by substituting values for arbitrary constants is called a general solution of the equation on $I$. Thus (2) is a general solution of (1) on the interval $I = (-\infty, \infty)$.

The graph of a solution of a differential equation is called an integral curve.

INITIAL-VALUE PROBLEM (IVP)

For a first-order equation, the single arbitrary constant can be determined by specifying the value of the unknown function $y(x)$ at an arbitrary $x$-value $x_0$, say $y(x_0) = y_0$. This is called an initial condition, and the problem of solving a first-order equation subject to an initial condition is called a first-order initial-value problem.

Example: The solution of the initial-value problem

$$\frac{dy}{dx} - y = e^{2x}, \quad y(0) = 3$$

can be obtained by substituting the initial condition $x = 0$, $y = 3$ in the general solution (2) to find $C$:

$$3 = Ce^0 + e^0 = C + 1$$

$$\implies C = 2$$

$$\implies y(x) = 2e^x + e^{2x}$$

Geometrically, this is the integral curve of (1) that passes through the point $(0, 3)$.

FIRST-ORDER LINEAR EQUATIONS


Consider

$$\frac{dy}{dx} = Q(x) \qquad (3)$$

For example,

$$\frac{dy}{dx} = x^3 \qquad (4)$$

$$\implies y = \frac{x^4}{4} + C.$$

More generally, a first-order differential equation is called linear if it is expressible in the form

$$\frac{dy}{dx} + P(x)y = Q(x) \qquad (5)$$

Examples:

a) $\dfrac{dy}{dx} + x^2y = e^x$: $P(x) = x^2$, $Q(x) = e^x$

b) $\dfrac{dy}{dx} + (\sin x)y + x^3 = 0$: $P(x) = \sin x$, $Q(x) = -x^3$

c) $\dfrac{dy}{dx} + 5y = 2$: $P(x) = 5$, $Q(x) = 2$

d) $\dfrac{dy}{dx} - y = x$: $P(x) = -1$, $Q(x) = x$

We will generally obtain a family of solutions defined implicitly. In some cases it may be possible to solve this equation explicitly for $y$.

Example #1: Solve the differential equation $\dfrac{dy}{dx} = -4xy^2$ and then solve the initial-value problem $\dfrac{dy}{dx} = -4xy^2$, $y(0) = 1$.

$$\frac{1}{y^2}\frac{dy}{dx} = -4x, \quad \text{where } y \neq 0$$

$$\implies \int y^{-2}\,dy = -4\int x\,dx$$

$$\implies -\frac{1}{y} = -2x^2 + C$$

Solving for $y$ as a function of $x$, we obtain $y = \dfrac{1}{2x^2 - C}$.

Using the initial condition $y(0) = 1$ requires $x = 0$ and $y = 1$:

$$1 = \frac{1}{0 - C} \implies C = -1 \implies y(x) = \frac{1}{2x^2 + 1}$$

INTEGRATING FACTOR.

We will assume that the functions $P(x)$ and $Q(x)$ in (5) are continuous on a common interval $I$, and we will look for a general solution that is valid on $I$.

Define $\mu = \mu(x)$ by

$$\mu = e^{\int P(x)\,dx} \qquad (6)$$

Then

$$\frac{d\mu}{dx} = e^{\int P(x)\,dx} \cdot \frac{d}{dx}\int P(x)\,dx = \mu P(x).$$

Thus,

$$\frac{d}{dx}(\mu y) = \mu\frac{dy}{dx} + y\frac{d\mu}{dx} = \mu\frac{dy}{dx} + \mu P(x)y \qquad (7)$$

Multiplying (5) by $\mu$ gives $\mu\dfrac{dy}{dx} + \mu P(x)y = \mu Q(x)$, so that, by (7),

$$\frac{d}{dx}(\mu y) = \mu Q(x) \qquad (8)$$

$$\implies y = \frac{1}{\mu}\int \mu Q(x)\,dx \qquad (9)$$

The function $\mu$ is called an integrating factor for (5), and this method for finding a general solution of (5) is called the method of integrating factors.

THE METHOD OF INTEGRATING FACTORS

STEP # 1: Calculate the integrating factor $\mu = e^{\int P(x)\,dx}$. Since any $\mu$ will suffice, we can take the constant of integration to be zero in this step.

STEP # 2: Multiply both sides of (5) by $\mu$ and express the result as

$$\frac{d}{dx}(\mu y) = \mu Q(x)$$

STEP # 3: Integrate both sides of the equation obtained in STEP # 2 and then solve for $y$. Be sure to include a constant of integration in this step.
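The three steps can also be followed numerically for an initial-value problem. In the sketch below, the constant in STEP #1 is fixed by taking $\mu(x) = e^{\int_{x_0}^{x} P(t)\,dt}$, so that $\mu(x_0) = 1$, and integrating (8) from $x_0$ to $x$ gives $y(x) = \left[y_0 + \int_{x_0}^{x} \mu(t)Q(t)\,dt\right]/\mu(x)$. The integrals are approximated by the composite trapezoidal rule, which is our own choice, as are the names `trapezoid` and `integrating_factor_solution`. The usage compares the result with the exact solution $y = 2e^x + e^{2x}$ of the initial-value problem $dy/dx - y = e^{2x}$, $y(0) = 3$.

```python
import math


def trapezoid(f, a, b, n=200):
    """Composite trapezoidal rule for the integral of f from a to b (n subintervals)."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))


def integrating_factor_solution(P, Q, x0, y0, x):
    """Approximate y(x) for dy/dx + P(x) y = Q(x), y(x0) = y0.

    STEP 1: mu(t) = exp(integral of P from x0 to t), so mu(x0) = 1.
    STEPS 2-3: integrating (mu y)' = mu Q from x0 to x gives
               mu(x) y(x) - y0 = integral of mu Q from x0 to x.
    """
    def mu(t):
        return math.exp(trapezoid(P, x0, t))

    return (y0 + trapezoid(lambda t: mu(t) * Q(t), x0, x)) / mu(x)


# dy/dx - y = e^{2x}, y(0) = 3, so P(x) = -1 and Q(x) = e^{2x}
approx = integrating_factor_solution(lambda t: -1.0, lambda t: math.exp(2 * t), 0.0, 3.0, 1.0)
exact = 2 * math.e + math.e ** 2  # y = 2e^x + e^{2x} at x = 1
print(approx, exact)
```

Fixing $\mu(x_0) = 1$ is just one convenient choice of the zero constant allowed in STEP #1; any other choice cancels between the two sides.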

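Example #2: Solve the differential equation

$$\frac{dy}{dx} - y = e^{2x} \qquad (1)$$

Here $P(x) = -1$ and $Q(x) = e^{2x}$, so

$$\mu = e^{\int P(x)\,dx} = e^{\int (-1)\,dx} = e^{-x} \qquad (2)$$

Multiply both sides of (1) by $\mu = e^{-x}$:

$$e^{-x}\frac{dy}{dx} - e^{-x}y = e^{-x}e^{2x}$$

$$\implies \frac{d}{dx}\left(e^{-x}y\right) = e^x$$

$$\implies \int d\left(e^{-x}y\right) = \int e^x\,dx$$

$$\implies e^{-x}y = e^x + C$$

$$\implies y = e^{2x} + Ce^x$$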

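NOTE: A differential equation of the form

$$P(x)\frac{dy}{dx} + Q(x)y = R(x)$$

can be brought into the form (5) by dividing through by $P(x)$ (where $P(x) \neq 0$):

$$\frac{dy}{dx} + \frac{Q(x)}{P(x)}y = \frac{R(x)}{P(x)} \qquad (10)$$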

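Example #3: Solve the initial-value problem

$$x\frac{dy}{dx} - y = x, \quad y(1) = 2$$

Dividing by $x$,

$$\frac{dy}{dx} - \frac{1}{x}y = 1, \quad P(x) = -\frac{1}{x}, \quad Q(x) = 1$$

$$\mu = e^{\int P(x)\,dx} = e^{\int -\frac{1}{x}\,dx} = e^{-\ln x} = \frac{1}{x} \quad (x > 0)$$

$$\mu\frac{dy}{dx} - \mu\left(\frac{1}{x}\right)y = \mu \cdot 1$$

$$\implies \frac{1}{x}\frac{dy}{dx} - \frac{1}{x^2}y = \frac{1}{x}$$

$$\implies \frac{d}{dx}\left(\frac{1}{x}y\right) = \frac{1}{x}$$

$$\implies \int d\left(\frac{1}{x}y\right) = \int \frac{1}{x}\,dx$$

$$\implies \frac{1}{x}y = \ln|x| + C$$

At $x = 1$, $y = 2$, so $C = 2$.

Therefore $y = x\ln|x| + 2x$.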

FIRST-ORDER SEPARABLE EQUATIONS

Although there is no general method for solving nonlinear first-order ODEs, we will now consider a method of solution that can often be applied to first-order equations that are expressible in the form

$$h(y)\frac{dy}{dx} = g(x) \qquad (11)$$

$$\implies h(y)\,dy = g(x)\,dx \qquad (12)$$

The process of rewriting (11) in the form (12) is called separating variables.

METHOD OF SEPARATION OF VARIABLES

STEP #1: Separate the variables in (11) by rewriting the equation in the differential form $h(y)\,dy = g(x)\,dx$.

STEP #2: Integrate both sides of the equation in STEP #1 (the left side with respect to $y$ and the right side with respect to $x$):

$$\int h(y)\,dy = \int g(x)\,dx$$

STEP #3: If $H(y)$ is any antiderivative of $h(y)$ and $G(x)$ is any antiderivative of $g(x)$, then the equation $H(y) = G(x) + C$ implicitly defines a family of solutions.

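Example #4: Solve the initial-value problem

$$\frac{dy}{dx} = y, \quad y(0) = 2$$

$$\implies \int \frac{1}{y}\,dy = \int dx$$

$$\implies \ln|y| = x + C$$

$$\implies e^{\ln|y|} = e^{x+C}$$

$$\implies y = Ae^x, \quad \text{where } A = \pm e^C \text{ (constant)}$$

Since $x = 0$ and $y = 2$, we get $A = 2$. Therefore $y = 2e^x$.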

EXERCISES

1) Solve the equations using both the method of integrating factors and the method of separation of variables, and determine whether the solutions produced are the same.

a) $\dfrac{dy}{dx} + 3y = 0$   b) $\dfrac{dy}{dx} - 4y = 0$

c) $\dfrac{dy}{dx} - 4xy = 0$   d) $\dfrac{dy}{dx} + y = 0$

2) Solve the differential equation by the method of integrating factors.

a) $\dfrac{dy}{dx} + 4y = e^{-3x}$   b) $\dfrac{1}{x}\dfrac{dy}{dx} + 2y = 1$

c) $y' + y = \cos(e^x)$   d) $2\dfrac{dy}{dx} + 4y = 1$

e) $(x^2 + 1)\dfrac{dy}{dx} + xy = 0$   f) $\dfrac{dy}{dx} + y + \dfrac{1}{1 - e^x} = 0$

3) Solve the differential equation by separation of variables. Where reasonable, express the family of solutions as explicit functions of $x$.

a) $\dfrac{dy}{dx} = \dfrac{y}{x}$   b) $\dfrac{dy}{dx} = 2(1 + y^2)x$

c) $\left(\dfrac{\sqrt{1 + x^2}}{1 + y}\right)\dfrac{dy}{dx} = -x$   d) $y' = -xy$

e) $(2 + 2y^2)y' = e^x y$   f) $e^{-x}\sin x - y'\cos^2 x = 0$

g) $\dfrac{dy}{dx} - \dfrac{y^2 - y}{\sin x} = 0$   h) $y - \dfrac{dy}{dx}\sec x = 0$

4) In each part, find the solution of the differential equation $x\dfrac{dy}{dx} + y = x$ that satisfies the initial condition.

a) $y(1) = 2$   b) $y(-1) = 2$

5) In each part, find the solution of the differential equation $\dfrac{dy}{dx} = xy$ that satisfies the initial condition.

a) $y(0) = 1$   b) $y(0) = \dfrac{1}{2}$

6) Solve the initial-value problem by any method.

a) $\dfrac{dy}{dx} - 2xy = 2x$, $y(0) = 3$   b) $\dfrac{dy}{dt} + y = 2$, $y(0) = 1$

c) $y' = \dfrac{3x^2}{2y + \cos y}$, $y(0) = \pi$   d) $y' - xe^y = 2e^y$, $y(0) = 0$

e) $\dfrac{dy}{dt} = \dfrac{2t + 1}{2y - 2}$, $y(0) = -1$   f) $\dfrac{dx}{dt} = \dfrac{t^2 + 1}{x + 2}$, $x(0) = -2$

g) $t(t - 1)\dfrac{dx}{dt} = x(x + 1)$, $x(2) = 2$   h) $\dfrac{dx}{dt} = (x^2 - 1)\cos t$, $x(0) = 2$

i) $\dfrac{dx}{dt} = e^{x+t}$, $x(0) = a$   j) $\dfrac{dx}{dt} = \dfrac{4\ln t}{x^2}$, $x(1) = 0$

7) At time $t = 0$, a tank contains 25 oz of salt dissolved in 50 gal of water. Then brine containing 4 oz of salt per gallon of brine is allowed to enter the tank at a rate of 2 gal/min, and the mixed solution is drained from the tank at the same rate.

a) How much salt is in the tank at an arbitrary time $t$?

b) How much salt is in the tank after 25 minutes?

8) A chemical reaction is governed by the differential equation

$$\frac{dx}{dt} = K(5 - x)^2,$$

where $x(t)$ is the concentration of the chemical at time $t$. The initial concentration at time 5 s is found to be 2. Determine the reaction rate constant $K$ and find the concentration at times 10 s and 50 s. What is the ultimate value of the concentration?

9) Solve the differential equation by the method of integrating factors, subject to the given initial condition:

$$\frac{dy}{dx} - ky = 0, \quad y(0) = y_0$$

10) According to United Nations data, the world population in 1998 was approximately 5.9 billion and growing at a rate of about 1.33% per year. Assuming an exponential growth model, estimate the world population at the beginning of the year 2023.

DIFFERENCE EQUATIONS TO DIFFERENTIAL EQUATIONS

Difference Equations

At this point almost all of our sequences have had explicit formulas for their terms. That is, we have looked mainly at sequences for which we could write the $n$th term as $a_n = f(n)$ for some known function $f$. For example, if

$$a_n = \frac{n+1}{n^2+3},$$

then it is an easy matter to compute explicitly, say, $a_{10} = \dfrac{11}{103}$ or $a_{100} = \dfrac{101}{10003}$. In such cases we are able to compute any given term in the sequence without reference to any other terms in the sequence. However, it is often the case in applications that we do not begin with an explicit formula for the terms of a sequence; rather, we may know only some relationship between the various terms. An equation which expresses a value of a sequence as a function of the other terms in the sequence is called a difference equation. In particular, an equation which expresses the value $a_n$ of a sequence $\{a_n\}$ as a function of the term $a_{n-1}$ is called a first-order difference equation. If we can find a function $f$ such that $a_n = f(n)$, $n = 1, 2, 3, \ldots$, then we will have solved the difference equation. In this section we will consider a class of difference equations that are solvable in this sense; in the next section we will discuss an example where an explicit solution is not possible.

Example: Suppose a certain population of owls is growing at the rate of 2% per year. If we let $x_0$ represent the size of the initial population of owls and $x_n$ the number of owls $n$ years later, then

$$x_{n+1} = x_n + 0.02x_n \qquad (1)$$

for $n = 0, 1, 2, \ldots$. That is, the number of owls in any given year is equal to the number of owls in the previous year plus 2% of the number of owls in the previous year. Equation (1) is an example of a first-order difference equation; it relates the number of owls in a given year with the number of owls in the previous year. Hence we know the value of a specific $x_n$ once we know the value of $x_{n-1}$. To get the sequence started we have to know the value of $x_0$. For example, if initially we have a population of $x_0 = 100$ owls and we want to know what the population will be after 4 years, we may compute

$$x_1 = 1.02x_0 = (1.02)(100) = 102,$$

$$x_2 = 1.02x_1 = (1.02)(102) = 104.04,$$

$$x_3 = 1.02x_2 = (1.02)(104.04) = 106.1208,$$

and

$$x_4 = 1.02x_3 = (1.02)(106.1208) = 108.243216.$$

Figure 1.1 Plot of $(n, x_n)$, $n = 0, 1, 2, \ldots$, where $x_0 = 100$ and $x_{n+1} = 1.02x_n$


Thus we would expect about 108 owls in the population after 4 years. Note that
although it is not possible to have a fractional part of an owl, it is nevertheless
important to keep the fractional part in intermediate calculations.
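The step-by-step computation of $x_1, \ldots, x_4$ above can be reproduced with a short loop. This is a minimal Python sketch that applies equation (1) repeatedly from $x_0 = 100$ and keeps full precision until the final rounding. The function name `iterate_growth` is an illustrative choice and does not come from the text.

```python
def iterate_growth(x0, rate, steps):
    """Return [x_0, x_1, ..., x_steps] for the recurrence x_{n+1} = x_n + rate * x_n."""
    values = [x0]
    for _ in range(steps):
        x = values[-1]
        # Keep the fractional part; round only when reporting.
        values.append(x + rate * x)
    return values

owls = iterate_growth(100, 0.02, 4)
for n, x in enumerate(owls):
    print(n, round(x, 6))
```

The last printed value corresponds to $x_4$. Rounding it to a whole number of owls is left until the end, as the text recommends.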
We may work backwards to find $x_4$ explicitly in terms of $x_0$:

$$\begin{aligned}
x_4 &= 1.02\,x_3 \\
&= (1.02)(1.02)\,x_2 \\
&= (1.02)(1.02)(1.02)\,x_1 \\
&= (1.02)(1.02)(1.02)(1.02)\,x_0 \\
&= (1.02)^4 x_0.
\end{aligned}$$

This is interesting because it indicates that we can compute $x_4$ without reference to the values of $x_1$, $x_2$ and $x_3$, provided, of course, that we know the value of $x_0$. If we do this in general, then we have solved the difference equation $x_{n+1} = 1.02\,x_n$. Namely, for any $n = 0, 1, 2, \ldots$, we have

$$x_n = 1.02\,x_{n-1} = (1.02)^2 x_{n-2} = (1.02)^3 x_{n-3} = \cdots = (1.02)^n x_0. \qquad (2)$$

For example, if $x_0 = 100$ as above, then we can compute

$$x_{20} = (1.02)^{20}(100) \approx 149,$$

or even

$$x_{150} = (1.02)^{150}(100) \approx 1{,}950,$$

without having to compute any intermediate values.
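The closed form can be checked against brute-force iteration of the recurrence. This is a minimal Python sketch that assumes the general form $x_n = \alpha^n x_0$ with $\alpha = 1.02$ and $x_0 = 100$. The helper names `closed_form` and `iterate` are illustrative.

```python
def closed_form(x0, alpha, n):
    """Solution of x_{n+1} = alpha * x_n, namely x_n = alpha**n * x0."""
    return alpha ** n * x0

def iterate(x0, alpha, n):
    """Apply x_{k+1} = alpha * x_k n times, for comparison with the closed form."""
    x = x0
    for _ in range(n):
        x = alpha * x
    return x

for n in (20, 150):
    direct = closed_form(100, 1.02, n)
    stepped = iterate(100, 1.02, n)
    print(n, round(direct, 2), round(stepped, 2))
```

The closed form needs one exponentiation. The loop needs $n$ multiplications, and both agree up to floating-point rounding.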

For a geometric feeling of how the population is changing with time, Figure 1.1
shows a plot of the points $(n, x_n)$, $n = 0, 1, 2, \ldots, 100$. Of course, whether or not our
model will provide an accurate prediction of the owl population 100 or 200 years
into the future is an entirely different question. Frequently, a simple population
model like this will be valid only for a short span of time during which the rate of
growth of population remains stable.

By replacing 1.02 with an arbitrary constant $\alpha$ in (2), we arrive at the general result that the solution of the difference equation

$$x_{n+1} = \alpha\,x_n, \qquad (3)$$

$n = 0, 1, 2, \ldots$, is given by

$$x_n = \alpha^n x_0, \qquad (4)$$

$n = 0, 1, 2, \ldots$. Note that this difference equation, and its solution, are useful whenever we are interested in a sequence of numbers where the $(n+1)$st term is a constant multiple of the $n$th term. Our first example, where a population was assumed to grow at a constant rate, is a common example of this type of behavior. Another common example is when a quantity decreases at a constant rate over time. This behavior is discussed in the next example in the context of radioactive decay.

Example

Radium is a radioactive element which decays at a rate of 1% every 25 years. This means that the amount left at the beginning of any given 25-year period is equal to the amount at the beginning of the previous 25-year period minus 1% of that amount. That is, if $x_0$ is the initial amount of radium and $x_n$ is the amount of radium still remaining after $25n$ years, then

$$x_{n+1} = x_n - 0.01\,x_n = 0.99\,x_n \qquad (5)$$

for $n = 0, 1, 2, \ldots$. Since this is a difference equation of the form (3) with $\alpha = 0.99$, we know that the solution is of the form (4). Namely,

$$x_n = (0.99)^n x_0$$

for $n = 0, 1, 2, \ldots$. For example, the amount left after 100 years is given by

$$x_4 = (0.99)^4 x_0 = 0.9606\,x_0$$

where we have rounded the answer to four decimal places. That is, approximately
96% of the initial amount of radium will be left after 100 years. A plot of the
amount of radium left versus number of years, assuming an initial amount of 500
grams, is given in Figure 1.2.
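The figures for the radium model (5) can be tabulated directly. This is a minimal Python sketch that assumes the 500-gram initial amount mentioned for Figure 1.2 and lists one value per 25-year step. No plotting library is used, so the output is only the data a plot like Figure 1.2 would show.

```python
# Radium model (5): x_{n+1} = 0.99 * x_n, where one step is 25 years.
fraction_after_100_years = 0.99 ** 4          # x_4 / x_0
print(round(fraction_after_100_years, 4))

# Amounts every 25 years, assuming 500 grams initially (as for Figure 1.2).
initial_grams = 500.0
amounts = [initial_grams * 0.99 ** n for n in range(9)]
for n, grams in enumerate(amounts):
    print(25 * n, round(grams, 2))            # years elapsed, grams remaining
```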

Figure 1.2 Plot of amount of radium versus number of years.

The half-life of a radioactive element is the number of years required for one-half of an initial amount to decay. Suppose that, for this example, $N$ is the smallest integer for which $x_N$ is at most one-half of the initial amount of radium. This would mean that

$$\tfrac{1}{2}x_0 \ge (0.99)^N x_0,$$

which implies that

$$\tfrac{1}{2} \ge (0.99)^N.$$

Taking logarithms, we have

$$\log_{10}\left(\tfrac{1}{2}\right) \ge \log_{10}\left((0.99)^N\right),$$

which implies that

$$\log_{10}\left(\tfrac{1}{2}\right) \ge N \log_{10}(0.99).$$

Solving for $N$, and remembering that $\log_{10}(0.99) < 0$ (so dividing by it reverses the inequality), we have

$$N \ge \frac{\log_{10}\left(\tfrac{1}{2}\right)}{\log_{10}(0.99)} \approx 68.97,$$

rounding to two decimal places. Hence, since $N$ must be an integer, we have $N = 69$. Recalling that we are working with 25-year units of time, this shows that the half-life of radium is approximately $(25)(69) = 1725$ years. For example, this means that if we started with an initial amount of 100 grams of radium, after 1725 years we would have about 50 grams left. It would then take an additional 1725 years until the remaining amount would be reduced to about 25 grams.
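The half-life calculation can be scripted in two ways: once through the logarithm bound derived above, and once by stepping the recurrence until the fraction remaining drops to one-half or below. This is a minimal Python sketch. The function name `periods_to_halve` is illustrative, and the code assumes $0 < \alpha < 1$.

```python
import math

def periods_to_halve(alpha):
    """Return (bound, N): N is the smallest integer with alpha**N <= 1/2, for 0 < alpha < 1."""
    bound = math.log10(0.5) / math.log10(alpha)   # both logs are negative, so bound > 0
    return bound, math.ceil(bound)

bound, N = periods_to_halve(0.99)
print(round(bound, 2), N, 25 * N)   # bound, number of 25-year periods, years

# Brute-force check: multiply by 0.99 until at most half remains.
fraction, steps = 1.0, 0
while fraction > 0.5:
    fraction *= 0.99
    steps += 1
print(steps)
```

Both approaches give the same integer $N$. The logarithm route avoids looping, which matters when the decay rate is very small.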

Although we have stated the results of the preceding example in discrete time units, namely, units of 25 years each, later we will see that the results hold for continuous time as well. In other words, although the difference equation (5) has been set up for nonnegative integer values of $n$, the solution $x_n = (0.99)^n x_0$ is valid for arbitrary nonnegative values of $n$.

It is interesting to compare the plots in Figures 1.1 and 1.2. The first is an example of exponential growth, whereas the second is an example of exponential decay. In the first, the steepness of the graph increases with time; in the second, the graph flattens out over time. For a positive initial value $x_0$, the difference equation (3) will always lead to the first behaviour when $\alpha > 1$ and to the second when $0 < \alpha < 1$.

First-order linear difference equations

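Given constants $\alpha$ and $\beta$, a difference equation of the form

$$x_{n+1} = \alpha\,x_n + \beta, \qquad (6)$$

$n = 0, 1, 2, \ldots$, is called a first-order linear difference equation. Note that the difference equation (3) is of this form with $\beta = 0$. A procedure analogous to the method we used to solve (3) will enable us to solve this equation as well. Namely,

$$\begin{aligned}
x_n &= \alpha\,x_{n-1} + \beta \\
&= \alpha(\alpha\,x_{n-2} + \beta) + \beta \\
&= \alpha^2 x_{n-2} + \beta(\alpha + 1) \\
&= \alpha^2(\alpha\,x_{n-3} + \beta) + \beta(\alpha + 1) \\
&= \alpha^3 x_{n-3} + \beta(\alpha^2 + \alpha + 1) \\
&\;\;\vdots \\
&= \alpha^n x_0 + \beta\left(\alpha^{n-1} + \alpha^{n-2} + \cdots + \alpha^2 + \alpha + 1\right).
\end{aligned}$$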

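Note that if $\alpha = 1$, this gives us

$$x_n = x_0 + n\beta, \qquad (7)$$

$n = 0, 1, 2, \ldots$, as the solution of the difference equation $x_{n+1} = x_n + \beta$. If $\alpha \ne 1$, we know from the formula for a finite geometric sum that

$$\alpha^{n-1} + \alpha^{n-2} + \cdots + \alpha^2 + \alpha + 1 = \frac{1 - \alpha^n}{1 - \alpha}.$$

Hence

$$x_n = \alpha^n x_0 + \beta\,\frac{1 - \alpha^n}{1 - \alpha}, \qquad (8)$$

$n = 0, 1, 2, \ldots$, is the solution of the first-order linear difference equation $x_{n+1} = \alpha\,x_n + \beta$ when $\alpha \ne 1$.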
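Formulas (7) and (8) can be checked numerically against direct iteration of $x_{n+1} = \alpha x_n + \beta$. This is a minimal Python sketch. The function names and the test values of $\alpha$, $\beta$ and $x_0 = 7$ are arbitrary illustrative choices and are not taken from the text.

```python
def solve_linear(x0, alpha, beta, n):
    """Closed form of x_{n+1} = alpha * x_n + beta: formula (7) if alpha == 1, else (8)."""
    if alpha == 1:
        return x0 + n * beta
    return alpha ** n * x0 + beta * (1 - alpha ** n) / (1 - alpha)

def iterate_linear(x0, alpha, beta, n):
    """Apply the recurrence n times, for comparison with the closed form."""
    x = x0
    for _ in range(n):
        x = alpha * x + beta
    return x

for alpha, beta in [(1.0, 3.0), (0.5, 2.0), (2.0, -1.0)]:
    closed = solve_linear(7.0, alpha, beta, 10)
    stepped = iterate_linear(7.0, alpha, beta, 10)
    print(alpha, beta, closed, stepped)
```

The $\alpha = 1$ branch is needed because formula (8) would divide by zero there.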

We have seen examples of first-order linear equations in the population growth and radioactive decay examples above. Another interesting example arises in modeling the change in temperature of an object placed in an environment held at some constant temperature, such as a cup of tea cooling to room temperature or a glass of lemonade warming to room temperature. If $T_0$ represents the initial temperature of the object, $S$ the constant temperature of the surrounding environment, and $T_n$ the temperature of the object after $n$ units of time, then the change in temperature over one unit of time is given by

$$T_{n+1} - T_n = k(T_n - S), \qquad (9)$$

$n = 0, 1, 2, \ldots$, where $k$ is a constant which depends upon the object. This difference equation is known as Newton's law of cooling. The equation says that the change in temperature over a fixed unit of time is proportional to the difference between the temperature of the object and the temperature of the surrounding environment. That is, large temperature differences result in a faster rate of cooling (or warming) than do small temperature differences. If $S$ is known and enough information is given to determine $k$, then this equation may be rewritten in the form of a first-order linear difference equation and, hence, solved explicitly. The next example shows how this may be done.
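The rewriting can be carried out in general with a short piece of algebra (our own step, not spelled out in the original). Solving Newton's law of cooling, $T_{n+1} - T_n = k(T_n - S)$, for $T_{n+1}$ gives

$$T_{n+1} = T_n + k(T_n - S) = (1 + k)\,T_n - kS,$$

which is (6) with $\alpha = 1 + k$ and $\beta = -kS$. For $k \ne 0$, formula (8) then gives

$$T_n = (1 + k)^n T_0 - kS\,\frac{1 - (1 + k)^n}{1 - (1 + k)} = S + (T_0 - S)(1 + k)^n,$$

so whenever $-1 < k < 0$ the factor $(1 + k)^n$ tends to 0 and $T_n$ approaches the surrounding temperature $S$.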

Example

Suppose a cup of tea, initially at a temperature of 180°F, is placed in a room which is held at a constant temperature of 80°F. Moreover, suppose that after one minute the tea has cooled to 175°F. What will the temperature be after 20 minutes?

Solution

If we let $T_n$ be the temperature of the tea after $n$ minutes and we let $S$ be the temperature of the room, then we have $T_0 = 180$, $T_1 = 175$, and $S = 80$. Newton's law of cooling states that

$$T_{n+1} - T_n = k(T_n - 80), \qquad (10)$$

$n = 0, 1, 2, \ldots$, where $k$ is a constant which we will have to determine. To do so, we make use of the information given about the change in the temperature of the tea during the first minute. Namely, applying (10) with $n = 0$, we must have

$$T_1 - T_0 = k(T_0 - 80).$$

That is,

$$175 - 180 = k(180 - 80).$$

Hence

$$-5 = 100k,$$

and so

$$k = \frac{-5}{100} = -0.05.$$

Thus (10) becomes

$$T_{n+1} - T_n = -0.05(T_n - 80) = -0.05\,T_n + 4.$$

Hence

$$T_{n+1} = T_n - 0.05\,T_n + 4 = 0.95\,T_n + 4 \qquad (11)$$

for $n = 0, 1, 2, \ldots$.

Figure 1.3 Tea temperature decreases asymptotically toward room temperature.

Now (11) is in the standard form of a first-order linear difference equation, so from (8) we know that the solution is

$$\begin{aligned}
T_n &= (0.95)^n(180) + 4\left(\frac{1 - (0.95)^n}{1 - 0.95}\right) \\
&= 180(0.95)^n + 80\left(1 - (0.95)^n\right) \\
&= 80 + 100(0.95)^n
\end{aligned}$$

for $n = 0, 1, 2, \ldots$. In particular,

$$T_{20} = 80 + 100(0.95)^{20} \approx 115.85,$$

where we have rounded the answer to two decimal places. Hence after 20 minutes the tea has cooled to just under 116°F. Also, since

$$\lim_{n\to\infty} (0.95)^n = 0,$$

we see that

$$\lim_{n\to\infty} T_n = \lim_{n\to\infty}\left(80 + 100(0.95)^n\right) = 80. \qquad (12)$$

That is, as we would expect, the temperature of the tea will approach an equilibrium temperature of 80°F, the room temperature. In Figure 1.3 we have plotted temperature $T_n$ versus time $n$ for $n = 0, 1, 2, \ldots, 60$, along with the horizontal line $T = 80$. As indicated by (12), we can see that $T_n$ decreases asymptotically toward 80°F as $n$ increases.
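The whole tea example can be reproduced in a few lines: estimate $k$ from the first minute, iterate Newton's law of cooling, and compare with the closed form $T_n = 80 + 100(0.95)^n$. This is a minimal Python sketch. The helper names `cooling_constant` and `iterate_cooling` are illustrative, and the 500-step run is just a convenient stand-in for "large $n$".

```python
def cooling_constant(T0, T1, S):
    """Solve T1 - T0 = k * (T0 - S) for k (equation (10) with n = 0)."""
    return (T1 - T0) / (T0 - S)

def iterate_cooling(T0, S, k, steps):
    """Apply T_{n+1} = T_n + k * (T_n - S) repeatedly (Newton's law of cooling)."""
    T = T0
    for _ in range(steps):
        T = T + k * (T - S)
    return T

k = cooling_constant(180.0, 175.0, 80.0)       # expected to be -0.05
T20_iterated = iterate_cooling(180.0, 80.0, k, 20)
T20_closed = 80 + 100 * 0.95 ** 20             # closed-form solution from the text
print(k, round(T20_iterated, 2), round(T20_closed, 2))

T_long = iterate_cooling(180.0, 80.0, k, 500)  # many minutes later
print(round(T_long, 6))                        # approaches the room temperature
```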

EXERCISES

1. Compute the next five terms of each of the following sequences from the given information.

(a) $x_0 = 10$, $x_{n+1} = x_n + 4$
(b) $y_0 = -1$, $y_{n+1} = \dfrac{1}{y_n}$
(c) $x_0 = 40$, $x_{n+1} = 2x_n - 20$
(d) $z_0 = 2$, $z_{n+1} = z_n^2 - z_n$
(e) $x_0 = 2$, $x_1 = 3$, $x_{n+2} = x_{n+1} + x_n$
(f) $x_0 = 15$, $x_n = \tfrac{1}{3}x_{n-1} + 2$

2. Solve the following difference equations with the given initial condition. Use your solution to find $x_{10}$.

(a) $x_{n+1} = 2x_n$, $x_0 = 5$
(b) $x_{n+1} = \tfrac{3}{4}x_n$, $x_0 = 100$
(c) $x_{n+1} = 1.8\,x_n + 10$, $x_0 = 20$
(d) $4x_{n+1} - 2x_n = 12$, $x_0 = 6$
(e) $x_{n+1} - x_n = 3x_n + 4$, $x_0 = 2$
(f) $5x_{n+1} - 3x_n = 2x_{n+1} - x_n$, $x_0 = 100$

3. A population of weasels is growing at a rate of 3% per year. Let $w_n$ be the number of weasels $n$ years from now and suppose that there are currently 350 weasels.

(a) Write a difference equation which describes how the population changes from year to year.

(b) Solve the difference equation of part (a). If the population growth continues at the rate of 3%, how many weasels will there be 15 years from now?

(c) Plot $w_n$ versus $n$ for $n = 0, 1, 2, \ldots, 100$.

(d) How many years will it take for the population to double?

(e) Find $\lim_{n\to\infty} w_n$. What does this say about the long-term size of the population? Will this really happen?

4. If the rate of growth of the weasel population in Exercise 3 were 5% instead of 3%, how many years would it take for the population to double?

5. Suppose that the weasel population of Exercise 3 would grow at a rate of 3% a year if left to itself, but poachers kill 6 weasels every year for their fur.

(a) Write a difference equation which describes how the population changes from year to year.

(b) Solve the difference equation of part (a). How many weasels will there be in 15 years?

(c) Find $\lim_{n\to\infty} w_n$. What does this say about the long-term size of the population?

(d) Will the population eventually double? If so, how long will this take?

(e) Plot $w_n$ versus $n$ for $n = 0, 1, 2, \ldots, 100$.

6. A cup of coffee has an initial temperature of 180°F, but cools to 180°F in one minute when placed in a room with a temperature of 70°F. Let $T_n$ be the temperature of the coffee after $n$ minutes.

(a) Write a difference equation, in standard first-order linear form, which describes the change in temperature of the coffee from minute to minute.

(b) Solve the difference equation from part (a).

(c) Find the temperature of the coffee after 25 minutes.

(d) Find $\lim_{n\to\infty} T_n$.

(e) Plot $T_n$ versus $n$ for $n = 0, 1, 2, \ldots, 120$.

(f) Does the temperature ever reach 70°F?

7. A glass of lemonade, initially at a temperature of 42°F, is placed in a room with a temperature of 78°F. If the lemonade warms to 45°F in 30 seconds, what will its temperature be in 10 minutes?

8. An iron ingot, heated to a temperature of 300°C, is placed in a liquid bath held at a constant temperature of 90°C. If the ingot cools to 70°C in two minutes, what will its temperature be in 20 minutes?

9. A glass of ginger ale is left in a room. Initially, the ginger ale has a temperature of 45°F, but after one minute the temperature has increased to 50°F and after two minutes it has increased to 54°F. What is the temperature of the room?

