
Chapter 2

Optimization and Solving Nonlinear Equations

This chapter deals with an important problem in mathematics and statistics: finding values of x that satisfy
f(x) = 0. Such values are called the roots of the equation and are also known as the zeros of f(x).

2.1 The bisection method

The goal is to find a solution of the equation f(x) = 0.

A question that should be raised is the following: Is there a (real) root of f(x) = 0? One answer is
provided by the intermediate value theorem.

Intermediate value theorem. If f(x) is continuous on an interval [a, b], and f(a) and f(b) have opposite signs,
i.e., f(a)f(b) < 0, then there exists a point ξ ∈ (a, b) such that f(ξ) = 0.

The intermediate value theorem guarantees that a root exists under those conditions. However, it does
not tell us the precise value of the root ξ.

The bisection method assumes that we know two values a and b such that f(a)f(b) < 0, and works by
repeatedly narrowing the gap between a and b until it closes in on a root.

It narrows the gap by taking the average (a + b)/2 of a and b. If f((a + b)/2) = 0, then we have found a root
at (a + b)/2. Otherwise, look at the two subintervals (a, (a + b)/2) and ((a + b)/2, b). By the intermediate
value theorem again, there must be a root in the interval (a, (a + b)/2) when f(a)f((a + b)/2) < 0, or in the
interval ((a + b)/2, b) when f((a + b)/2)f(b) < 0.
We continue this procedure until a desired accuracy has been achieved.


Example 1. Find the zeros of f(x) = 5x⁵ − 4x⁴ + 3x³ − 2x² + x − 1.

There is at least one real zero of f(x) (why?).

It is helpful to start by drawing a graph of f(x).

> f=function(x){5*x^5-4*x^4+3*x^3-2*x^2+x-1}
> x=seq(-50, 50, length=500)
> plot(x, f(x))

> x=seq(-5, 5, length =500)


> plot(x, f(x))

> x=seq(0, 1, length =500)


> plot(x, f(x))

Next we use the bisection method to find the zero between 0 and 1.

> f(0)
[1] -1
> f(1)
[1] 2

> f(.5) # f value at midpoint of (0, 1)


[1] -0.71875 # This suggests next step go to (0.5, 1)

> f(0.75) # f value at midpoint of (0.5, 1)


[1] -0.1884766 # Go to (0.75, 1)

> f(0.875) # f value at midpoint of (0.75, 1)


[1] 0.5733337 # Go to (0.75, 0.875)

> f(0.8125) # f value at midpoint of (0.75, 0.875)


[1] 0.1285563 # Go to (0.75, 0.8125)

> f(0.78125) # at midpoint of (0.75, 0.8125)


[1] -0.04386625 # Go to (0.78125, 0.8125)

> f(0.796875) # at midpoint of (0.78125, 0.8125)


[1] 0.03862511 # Go to (0.78125, 0.796875)

> (0.78125+ 0.796875)/2


[1] 0.7890625
> f(0.7890625) # at midpoint of (0.78125, 0.796875)
[1] -0.003519249 # Go to (0.7890625, 0.796875)

> (0.7890625+ 0.796875)/2


[1] 0.7929688
> f(0.7929688) # at midpoint of (0.7890625, 0.796875)
[1] 0.01732467 # Go to (0.7890625, 0.7929688)

> (0.7890625+ 0.7929688)/2


[1] 0.7910157
> f(0.7910157)
[1] 0.006846331 # Go to (0.7890625, 0.7910157)

> (0.7890625+ 0.7910157)/2


[1] 0.7900391
> f((0.7890625+ 0.7910157)/2) # Go to (0.7890625, 0.7900391)
[1] 0.001649439

> (0.7890625+ 0.7900391)/2


[1] 0.7895508
> f((0.7890625+ 0.7900391)/2) # What do you think?
[1] -0.0009384231 # Is f(0.7895508) close enough to 0?

Below is a simple way to automate the procedure.

f=function(x){5*x^5-4*x^4+3*x^3-2*x^2+x-1}

bisection=function(a,b,n){ # perform n bisection steps starting from the bracket [a, b]
xa=a
xb=b
for(i in 1:n){ if(f(xa)*f((xa+xb)/2)<0) xb=(xa+xb)/2
else xa=(xa+xb)/2}
list(left=xa,right=xb, midpoint=(xa+xb)/2)
}

> bisection(0,1,15)
$left
[1] 0.7897034

$right
[1] 0.7897339

$midpoint
[1] 0.7897186
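
The function above runs a fixed number of steps. As a variant (a sketch, not part of the original notes), one can instead stop once the bracket is shorter than a user-specified tolerance; the helper bisection.tol below is hypothetical and uses the same global f.

bisection.tol=function(a,b,tol=1e-8){ # stop when the bracket is shorter than tol
if(f(a)*f(b)>0) stop("f(a) and f(b) must have opposite signs")
while((b-a)/2>tol){
m=(a+b)/2
if(f(a)*f(m)<0) b=m else a=m # keep the half-interval containing the sign change
}
(a+b)/2
}

> bisection.tol(0, 1) # should agree with the midpoint above to several decimals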

Example 2. Find the maximum of

g(x) = (log x)/(1 + x),   x > 0.

Since g(x) is differentiable, we look at its derivative

g′(x) = 1/{x(1 + x)} − (log x)/(1 + x)² = {(x + 1)/x − log x}/(1 + x)²,   x > 0,

and find a critical point of g(x) by solving g′(x) = 0, or equivalently by solving

(x + 1)/x − log x = 0,

whose root is denoted by c. Clearly, c > 1. It can be shown that g′(x) > 0 for all x ∈ (0, c), and g′(x) < 0
for all x ∈ (c, ∞). Thus, g(c) is the maximum value of g(x).

> gd=function(x){(1+x)/x-log(x)}
> x=seq(1, 10, length=50)
> plot(x, gd(x)) # The plot suggests that c is between 3 and 4

> gd(3)
[1] 0.2347210
> gd(6)
[1] -0.6250928

> bisection=function(a,b,n){
xa=a
xb=b
for(i in 1:n){ if(gd(xa)*gd((xa+xb)/2)<0) xb=(xa+xb)/2
else xa=(xa+xb)/2}
list(left=xa,right=xb, midpoint=(xa+xb)/2)
}

> bisection(3,6,30)
$left
[1] 3.591121

$right
[1] 3.591121

$midpoint
[1] 3.591121
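
As a sanity check (not in the original notes), R's built-in optimize() can maximize g(x) = log x/(1 + x) directly over a bracketing interval; it should locate the same maximizer.

> g=function(x){log(x)/(1+x)} # the objective itself, not its derivative
> optimize(g, interval=c(1, 10), maximum=TRUE)
# Expect $maximum close to 3.591121 and $objective close to g(3.591121) ≈ 0.2785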

Example 3. A Cauchy density function takes the form

f(x) = 1/[π{1 + (x − θ)²}],   x ∈ R,

where θ is a parameter.

(1) Generate 50 random numbers from a Cauchy distribution with θ = 1.

data = rcauchy(50, 1)

[Figure: two panels plotted against θ over roughly (−40, 40); the left panel shows the log-likelihood l(data, θ) and the right panel shows its derivative ld(data, θ).]

(2) Treat the data you get from step (1) as sample observations from a Cauchy distribution with an
unknown θ. Plot the log-likelihood function of θ,

l(θ) = −n ln π − Σ_{i=1}^{n} ln{1 + (xᵢ − θ)²},   θ ∈ R.

l=function(x,t){
s=0
n=length(x)
for(j in 1:n) s=s + log(1+(x[j]-t)^2)
l=-n*log(pi)-s
l
}
theta=seq(-50, 50,length=500)
plot(theta, l(data,theta), type="l",main="Log-likelihood function",
xlab=expression(theta), ylab=expression(l(data, theta)))

(3) The location of the maximum can be seen in the above plot of the log-likelihood function of θ. Use the
bisection method to find the maximum likelihood estimator of θ.

To do so, we calculate the derivative of l(θ),

l′(θ) = Σ_{i=1}^{n} 2(xᵢ − θ)/{1 + (xᵢ − θ)²},   θ ∈ R.

Dropping the constant factor −2 does not change its zeros, so we work with

Σ_{i=1}^{n} (θ − xᵢ)/{1 + (xᵢ − θ)²},   θ ∈ R,

(the function ld below) and draw a plot of it.

ld=function(x,t){
s=0
n=length(x)
for(j in 1:n) s=s + (t-x[j])/(1+(x[j]-t)^2)
l=s
l
}
theta=seq(-10, 10,length=500)
plot(theta, ld(data,theta), type="l",main="Derivative",
xlab=expression(theta), ylab=expression(ld(data, theta)))

The bisection method is applicable here, since ld(θ) is continuous everywhere. We reuse the bisection
function from Example 1 (the version written in terms of f), after redefining f:

f=function(t){ld(data, t)}

bisection(-10,10,30)
$left
[1] 0.9758892

$right
[1] 0.9758892

$midpoint
[1] 0.9758892

Hence, θ̂ = 0.9758892
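
As a cross-check (not part of the original notes), one can maximize the log-likelihood l(data, θ) directly with R's optimize(); it should return a maximizer close to the bisection answer.

> optimize(function(t) l(data, t), interval=c(-10, 10), maximum=TRUE)
# Expect $maximum to be close to 0.9758892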

2.2 Secant method

The secant method begins by finding two points on the curve of f(x), (x₀, f(x₀)) and (x₁, f(x₁)), hopefully
near the root r we seek. The straight line through these two points is

(y − f(x₀))/(f(x₁) − f(x₀)) = (x − x₀)/(x₁ − x₀).

Let x₂ be the point where this line crosses the x-axis, i.e., the point (x₂, 0) lies on the line. Then

(0 − f(x₀))/(f(x₁) − f(x₀)) = (x₂ − x₀)/(x₁ − x₀).

From this we solve for x₂,

x₂ = x₁ − f(x₁)(x₀ − x₁)/(f(x₀) − f(x₁)).

Because f(x) is not exactly linear, x₂ is not equal to r, but it should be closer than either of the two points
we began with.

If we repeat this, we have

xₙ₊₁ = xₙ − f(xₙ)(xₙ₋₁ − xₙ)/(f(xₙ₋₁) − f(xₙ)),   n = 1, 2, . . .

Suppose that the sequence {xₙ, n = 1, 2, . . .} converges to r, that f(x) is differentiable near r with f′
continuous at r, and that f′(r) ≠ 0. Then the difference quotient {f(xₙ₋₁) − f(xₙ)}/(xₙ₋₁ − xₙ) converges to
f′(r), so taking limits on both sides of the recursion gives

r = r − f(r)/f′(r),

which gives f(r) = 0.

Example. Find the zeros of f(x) = x³ − 2x² + x − 1 using the secant method.

f=function(x){x^3-2*x^2+x-1}

secant=function(x,y,n){ # Katherine Earles's code
if(abs(f(x))<abs(f(y))) {xa=x; xb=y}
else {xa=y; xb=x}
xc=0
for(i in 1:n){xc=xb-f(xb)/(f(xa)-f(xb))*(xa-xb)
xa=xb
xb=xc}
list("x(n)"=xa, "x(n+1)"=xb)
}

> secant(0,5,12)
$"x(n)"
[1] 1.754878

$"x(n+1)"
[1] 1.754878

> secant(5,0, 12)


$"x(n)"
[1] 1.754878

$"x(n+1)"
[1] 1.754878

> secant(5,0, 15)


$"x(n)"
[1] 1.754878

$"x(n+1)"
[1] NaN

The above code breaks down for large enough values of n (it returns NaN, because once xa and xb coincide
the update divides by f(xa) − f(xb) = 0). The function h below is an improved version that fixes the problem:
its if statement breaks out of the loop once the values of xa and xb are (numerically) equal.
2.2. SECANT METHOD 27

g=function(x,y){y-(f(y)/(f(x)-f(y)))*(x-y)}

h=function(x,y,n){ # Katherine Earles’s code


xa=x
xb=y
xc=0
for(i in (1:n)){if (identical(all.equal(xa, xb), TRUE)) break
else # or {xc=g(xa,xb)}&{ xa=xb}&{xb=xc}
xc=g(xa,xb)
xa=xb
xb=xc
}
list("x(n)"=xa,"x(n+1)"=xb)}

> h(-10,50,500)
$"x(n)"
[1] 1.754878

$"x(n+1)"
[1] 1.754878
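
As a quick check (not from the notes), base R's uniroot() can bracket the same root of f(x) = x³ − 2x² + x − 1, since f(1) = −1 < 0 and f(2) = 1 > 0; it should agree with the secant result.

> uniroot(f, c(1, 2))$root # expect approximately 1.754878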

2.3 Newton’s method

Newton's method, or the Newton-Raphson method, is a procedure or algorithm for approximating the zeros
of a function f (or, equivalently, the roots of an equation f(x) = 0). It consists of the following three steps:

Step 1. Make a reasonable initial guess as to the location of a solution; denote it by x₀.

Step 2. Calculate

x₁ = x₀ − f(x₀)/f′(x₀).

Step 3. If x₁ is sufficiently close to a solution, stop; otherwise, continue this procedure by

x₂ = x₁ − f(x₁)/f′(x₁),
x₃ = x₂ − f(x₂)/f′(x₂),
· · ·
xₙ = xₙ₋₁ − f(xₙ₋₁)/f′(xₙ₋₁).

Under the assumptions that the sequence x₀, x₁, . . . , xₙ, . . . converges to r, and that f(x) is differentiable
near r with f′(r) ≠ 0, by taking the limit on both sides of

xₙ = xₙ₋₁ − f(xₙ₋₁)/f′(xₙ₋₁),

we obtain

r = r − f(r)/f′(r),

which results in f(r) = 0.

This method requires that the first approximation is sufficiently close to the root r.

A comparison between the secant method and Newton's method. The secant method is obtained from
Newton's method by approximating the derivative of f(x) at xₙ by the difference quotient based on the two
points xₙ and xₙ₋₁,

f′(xₙ) ≈ {f(xₙ) − f(xₙ₋₁)}/(xₙ − xₙ₋₁).

Geometrically, Newton's method uses the tangent line, while the secant method approximates the tangent line
by a secant line.

Example 1. Find a zero of f(x) = x²⁰⁰⁶ + 2006x + 1.

> newton=function(x0, n){


f=function(x){x^(2006)+2006*x+1}
fd=function(x){2006*x^(2005)+2006}
x=x0
for (i in 1:n){x=x-f(x)/fd(x)}
list(x)
}

> newton(-.5, 20)


[1] -0.0004985045

> newton(3.5, 10) # It is sensitive to the initial guess x0


[1] NaN

> nr=function(x0, numstp, eps){ # Newton-Raphson with a relative-change stopping rule

f=function(x){x^(2006)+2006*x+1}
fd=function(x){2006*x^(2005)+2006}
numfin = numstp # numstp is carried along but not used as a step cap here
small = 1.0*10^(-8)
istop = 0
while(istop == 0){
x1=x0-f(x0)/fd(x0)
check = abs(x0-x1)/abs(x0 + small) # relative change in the iterate
if(check < eps){istop=1}
x0=x1
}
list(x1=x1,check=check)
}

> nr(20,0,20,0.3)
$x1
[1] -0.0004985045

$check
[1] 2.174953e-16
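
As another check (a sketch, not part of the notes), the same root can be bracketed with uniroot(): f(−1) = −2004 < 0 and f(0) = 1 > 0, so a root lies in (−1, 0). The definition of f below simply repeats the one used inside newton.

> f=function(x){x^(2006)+2006*x+1}
> uniroot(f, c(-1, 0))$root # expect a value close to -0.0004985045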

Example 2. The Weibull distribution function is of the form

F(x) = 1 − exp{−(βx)^λ}   if x ≥ 0,
F(x) = 0   elsewhere,

where λ and β are positive parameters.

(1) Generate 50 random numbers from a Weibull distribution with β = 1 and λ = 1.8.

> weib = rweibull(50, shape=1.8, scale = 1)

(2) Add three more numbers to the above group. Treat these 53 observations as your data from a Weibull
distribution with an unknown λ, but keep β = 1 fixed. Plot the log-likelihood function of λ.

> mydata = c(weib, 0.9, 1, 1.1) # add 3 numbers 0.9, 1, 1.1

The likelihood and log-likelihood functions of λ (with β = 1) are

L(λ) = λⁿ (∏_{k=1}^{n} xₖ)^{λ−1} exp(−Σ_{k=1}^{n} xₖ^λ)

and

l(λ) = n ln λ + (λ − 1) Σ_{k=1}^{n} ln xₖ − Σ_{k=1}^{n} xₖ^λ,

respectively.

> loglike=function(t){
x=mydata
s=0
for(i in 1:length(x)) s=s-x[i]^t+(t-1)*log(x[i])
loglike=53*log(t)+s
loglike
}

> l=seq(0.5, 3, len=200)

> plot(l, loglike(l), type='l', xlab=expression(lambda),
ylab=expression(l(lambda)),
main='loglikelihood function for Weibull Data')

It can be seen from the plot of the log-likelihood function that l(λ) is concave.

(3) Use Newton's method to find the maximum likelihood estimator of λ.

To do so, we need to solve the equation l′(λ) = 0 for stationary points. The first and second derivatives of
l(λ) are

l′(λ) = n/λ + Σ_{k=1}^{n} ln xₖ − Σ_{k=1}^{n} xₖ^λ ln xₖ

and

l″(λ) = −n/λ² − Σ_{k=1}^{n} xₖ^λ (ln xₖ)².

Since l″(λ) < 0 for all λ > 0, l(λ) has a unique maximum point.

ld=function(t){ # define l’(lambda)


x=mydata
s=0
for(i in 1:length(x)) s=s-(log(x[i]))*x[i]^t+log(x[i])
ld=s+53/t
ld
}

ldd=function(t){ # define l’’(lambda)


x=mydata
s=0
for(i in 1:length(x)) s=s-(log(x[i]))^2*x[i]^t
ldd=s-53/t^2
ldd
}

newton=function(t,n){ # Newton’s iteration


for(i in 1:n) {t=t-ld(t)/ldd(t)}
t
}

> newton(0.1,20) # x_0=0.1


[1] 1.704811
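
As a cross-check (not in the notes), the log-likelihood loglike defined above can be maximized directly with optimize(); it should return a maximizer close to the Newton answer.

> optimize(loglike, interval=c(0.5, 3), maximum=TRUE)
# Expect $maximum to be close to 1.704811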

2.4 Fixed-point iteration: x = g(x) method

Suppose that we can bring an equation f(x) = 0 into the form x = g(x), which usually can be done in several
ways. Whenever r = g(r), r is said to be a fixed point of the function g(x).

We can solve this equation, under certain conditions, using iteration.

Start with an approximation x₀ of the root.

Calculate

x₁ = g(x₀),
x₂ = g(x₁),
· · ·
xₙ₊₁ = g(xₙ),   n = 0, 1, 2, . . .

Example. Consider a simple equation

x² − 3x + 2 = 0.

It can be rewritten as x = g(x) in many ways. For instance,

x = (x² + 2)/3,
x = √(3x − 2),
x = −√(3x − 2),
x = x² − 2x + 2,
x = (1/2)√(3x − 2) + x/2.

The for loop can be easily set down.

> fixed=function(x, n){


for(i in 1:n){ x = g(x) }
x
}

Let us take a look at x = (x² + 2)/3.

> g=function(x){(x^2+2)/3}
> fixed(0.1, 20)
[1] 0.9999037 # It's close to 1, one of the roots.
> fixed(3, 20)
[1] Inf # A problem with the initial point?
> fixed(-4, 20)
[1] Inf
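
A short check (not in the original notes) explains this behavior. For g(x) = (x² + 2)/3 we have g′(x) = 2x/3, so |g′(x)| < 1 near the root x = 1 but |g′(x)| > 1 near the root x = 2; the theorem below therefore applies around 1 but not around 2, which is why the iteration is drawn toward 1 and pushed away from 2.

> gprime=function(x){2*x/3} # derivative of g(x)=(x^2+2)/3
> gprime(1) # 0.667 < 1: the iteration is attracted to the root 1
> gprime(2) # 1.333 > 1: the iteration is repelled from the root 2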

A solution is guaranteed under the assumptions of the following theorem.

Theorem. If |g′(x)| ≤ k < 1 in an interval (a, b), and the sequence {x₀, x₁, . . . , xₙ, . . .} belongs to (a, b), then
the sequence has a limit r, and r is the only root of x = g(x) in the interval (a, b).

Proof. Appealing to the mean value theorem (Lagrange's theorem), we can write

x₂ − x₁ = g(x₁) − g(x₀) = (x₁ − x₀)g′(c₁),   c₁ between x₀ and x₁,
x₃ − x₂ = g(x₂) − g(x₁) = (x₂ − x₁)g′(c₂),   c₂ between x₁ and x₂,
· · ·
xₙ₊₁ − xₙ = g(xₙ) − g(xₙ₋₁) = (xₙ − xₙ₋₁)g′(cₙ),   cₙ between xₙ₋₁ and xₙ.

Since |g′(x)| ≤ k < 1, we obtain

|x₂ − x₁| ≤ k|x₁ − x₀|,
|x₃ − x₂| ≤ k|x₂ − x₁| ≤ k²|x₁ − x₀|,
· · ·
|xₙ₊₁ − xₙ| ≤ k|xₙ − xₙ₋₁| ≤ · · · ≤ kⁿ|x₁ − x₀|,

and for m > n,

|xₘ − xₙ| ≤ |xₘ − xₘ₋₁| + |xₘ₋₁ − xₘ₋₂| + . . . + |xₙ₊₁ − xₙ|
  ≤ (kᵐ⁻¹ + . . . + kⁿ)|x₁ − x₀|
  = {(kⁿ − kᵐ)/(1 − k)}|x₁ − x₀|.

Thus, by Cauchy's criterion, the sequence {xₙ, n = 0, 1, 2, . . .} converges; say the limit is r. Since g is
differentiable (hence continuous), taking limits on both sides of the equation

xₙ₊₁ = g(xₙ)

gives lim_{n→∞} xₙ₊₁ = g(lim_{n→∞} xₙ), or

r = g(r),

which means that r is a root of the equation x = g(x).

If r₁ were a second root of x = g(x) in the interval (a, b), then

r₁ − r = g(r₁) − g(r) = (r₁ − r)g′(c),   with c ∈ (a, b),

which would force g′(c) = 1, contradicting |g′(c)| ≤ k < 1. □

Notice that Newton's method is a special case of the fixed-point iteration, with

g(x) = x − f(x)/f′(x)

and

g′(x) = 1 − [{f′(x)}² − f(x)f″(x)]/{f′(x)}² = f(x)f″(x)/{f′(x)}².

Applying the above theorem to this particular case, we obtain the following.

Corollary. Assume that the function f(x) is continuous in the interval [a, b] and is twice differentiable in
(a, b), with

|f(x)f″(x)|/{f′(x)}² ≤ k < 1,   x ∈ (a, b).

If the sequence {x₀, x₁, x₂, . . .} is generated by Newton's method with

xₙ₊₁ = xₙ − f(xₙ)/f′(xₙ),   n = 0, 1, 2, . . . ,

and xₙ ∈ (a, b), n = 0, 1, 2, . . . , then the sequence has a limit r, and r is the only root of f(x) = 0 in the
interval [a, b].

This corollary indicates that the initial point x₀ is very important for Newton's method. A good try
should start with an x₀ that satisfies

|f(x₀)f″(x₀)|/{f′(x₀)}² ≤ k < 1.
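
To connect this with Example 1 of Section 2.3, the corollary's quantity can be evaluated at the two starting points tried there for f(x) = x²⁰⁰⁶ + 2006x + 1. This small check is a sketch, not part of the original notes; it illustrates why x₀ = −0.5 behaved well while x₀ = 3.5 did not (at 3.5 the huge powers overflow, so the bound cannot hold).

> f  =function(x){x^(2006)+2006*x+1}
> fd =function(x){2006*x^(2005)+2006}
> fdd=function(x){2006*2005*x^(2004)}
> cond=function(x){abs(f(x)*fdd(x))/fd(x)^2} # the quantity in the corollary
> cond(-0.5) # essentially 0, far below 1: a safe starting point
> cond(3.5)  # overflows (NaN): no guarantee, and Newton's method failed from here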

2.5 Convergence rate

Consider a fixed-point iteration for solving the equation x = g(x) with the procedure

xₙ₊₁ = g(xₙ),   n = 0, 1, 2, . . .

Let r be the root of the equation. Define the nth step error by

eₙ = r − xₙ,   n = 1, 2, . . .

Since r = g(r), we obtain

eₙ₊₁ = r − xₙ₊₁ = g(r) − g(xₙ) = g′(cₙ)(r − xₙ) = g′(cₙ)eₙ,

where the third equality follows from the mean value theorem, with cₙ between xₙ and r.

This means the error at the (n + 1)th step is linearly related to the error at the nth step.

For Newton's method, it can be shown that the error at the (n + 1)th step is quadratically related to
the error at the nth step.
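
A small numerical illustration of this difference (a self-contained sketch, not from the notes): solve x² − 3x + 2 = 0 for the root r = 1 using the fixed-point map g(x) = (x² + 2)/3 from Section 2.4 and using Newton's method, starting both from 0.5. The fixed-point errors shrink by a roughly constant factor (about |g′(1)| = 2/3) per step, while the Newton errors are roughly squared at each step.

g =function(x){(x^2+2)/3} # fixed-point map
f =function(x){x^2-3*x+2}
fd=function(x){2*x-3}

x=0.5; y=0.5
for(i in 1:6){
x=g(x)          # fixed-point step
y=y-f(y)/fd(y)  # Newton step
cat(i, " fixed-point error:", abs(1-x), "  Newton error:", abs(1-y), "\n")
}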

2.6 Newton’s method for a system of nonlinear equations

Newton’s method can be applied for solving a system of nonlinear equations. This is particularly useful
when we try to find maximum likelihood estimators of several parameters.

Let F(x) be a vector-valued function of a vector argument x, assuming that both vectors contain m
components. To apply Newton's method to the problem of approximating a solution of

F(x) = 0,

we would like to start from an initial point x₀ and then write

xₙ₊₁ = xₙ − F(xₙ)/F′(xₙ),   n = 0, 1, 2, . . . .

Two questions arise in the above procedure immediately. First, what is meant by F′(xₙ)? And second, what
is meant by the division F(xₙ)/F′(xₙ)?

Here, F′(x) is the m × m matrix of partial derivatives whose (i, j) entry is ∂fᵢ(x)/∂xⱼ, where fᵢ denotes the
ith component of F:

F′(x) = [ ∂f₁(x)/∂x₁  ∂f₁(x)/∂x₂  . . .  ∂f₁(x)/∂xₘ ]
        [ ∂f₂(x)/∂x₁  ∂f₂(x)/∂x₂  . . .  ∂f₂(x)/∂xₘ ]
        [    . . .                          . . .    ]
        [ ∂fₘ(x)/∂x₁  ∂fₘ(x)/∂x₂  . . .  ∂fₘ(x)/∂xₘ ].

This matrix is known as the Jacobian matrix for the system and is typically denoted by J(x).

For the division, we use multiplication by an inverse. Thus, Newton's method takes the form

xₙ₊₁ = xₙ − {J(xₙ)}⁻¹ F(xₙ),   n = 0, 1, 2, . . . .

When implementing this scheme, rather than actually computing the inverse of the Jacobian matrix, we
define

vₙ = −{J(xₙ)}⁻¹ F(xₙ),

and then solve the linear system of equations

J(xₙ)vₙ = −F(xₙ)

for vₙ. Once vₙ is known, the next iterate is computed according to the rule

xₙ₊₁ = xₙ + vₙ,   n = 0, 1, 2, . . . .

Example 1. Find the solution of the system of two nonlinear equations

x₁³ − 2x₂ + 1 = 0,
x₁ + 2x₂³ − 3 = 0.

First of all, we set up

x = (x₁, x₂)ᵀ   and   F(x) = (x₁³ − 2x₂ + 1, x₁ + 2x₂³ − 3)ᵀ.

Then, find the Jacobian matrix for the system,

J(x) = [ 3x₁²    −2   ]
       [   1    6x₂²  ].

The following code was written by Katherine Earles.

F=function(x){ # define the (column) vector of equations


F=matrix(0,nrow=2) # nrow depends on the length of F
F[1]= x[1]^3-2*x[2]+1 # The first component of F
F[2]= x[1]+2*x[2]^3-3 # The second component of F
F # output F, a column vector of values
}

J=function(x){ # define the Jacobian of F


j=matrix(0,ncol=2,nrow=2) # ncol & nrow depend on the length of F
j[1,1]= 3*x[1]^2
j[1,2]= -2
j[2,1]= 1
j[2,2]= 6*x[2]^2
j # output j, a matrix of values
}

NNL=function(initial,n){ # Newton’s method for a system of non-linear equations


x=initial
v=matrix(0,ncol=length(x))
for (i in 1:n){
v=solve(J(x),-F(x))
x=x+v}
cat(" x1=",x[1],"\n","x2=",x[2],"\n")
}

Sometimes we may need to check whether the Jacobian matrix is invertible. For this purpose, the above code
is improved as follows.

NNL=function(initial,n){ # Newton’s method for a system of non-linear equations


x=initial
v=matrix(0,ncol=length(x))
for (i in 1:n){
d=det(J(x)) # check that J(x) is invertible
if (identical(all.equal(d,0),TRUE))
{cat("Jacobian has no inverse. Try a different initial point.","\n")
break}
else
v=solve(J(x),-F(x))
x=x+v
}
cat(" x1=",x[1],"\n","x2=",x[2],"\n")
}

> NNL(c(0.1, 0.2), 1)


x1= 2.901794
x2= 0.5425269
> NNL(c(0.1, 0.2), 2)
x1= 1.969765
x2= 0.9450524
> NNL(c(0.1, 0.2), 3)
x1= 1.387231
x2= 0.9309951
> NNL(c(0.1, 0.2), 4)
x1= 1.093614
x2= 0.9872401
> NNL(c(0.1, 0.2), 5)
x1= 1.007192
x2= 0.9989359
> NNL(c(0.1, 0.2), 6)
x1= 1.000047
x2= 0.9999933
> NNL(c(0.1, 0.2), 7)
x1= 1
x2= 1
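
As a quick verification (not in the notes), the limit (1, 1) satisfies both equations exactly: 1³ − 2·1 + 1 = 0 and 1 + 2·1³ − 3 = 0. Evaluating F at the final iterate confirms this.

> F(c(1, 1)) # both components should be exactly 0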

Example 2 (Logistic regression model). Let Y denote a binary response variable. The regression model

E(Y) = π(x) = exp(β₀ + β₁x)/{1 + exp(β₀ + β₁x)}

is called the logistic regression model, where β₀ and β₁ are parameters.

Suppose that Y₁, Y₂, . . . , Yₙ are independent Bernoulli random variables with

E(Yᵢ) = πᵢ = exp(β₀ + β₁xᵢ)/{1 + exp(β₀ + β₁xᵢ)},   i = 1, . . . , n,

where the x observations are assumed to be known constants.

The likelihood function of the parameters β₀ and β₁ is

L(β₀, β₁) = ∏_{i=1}^{n} πᵢ^{yᵢ}(1 − πᵢ)^{1−yᵢ}
          = ∏_{i=1}^{n} {πᵢ/(1 − πᵢ)}^{yᵢ} · ∏_{i=1}^{n} (1 − πᵢ)
          = exp{Σ_{i=1}^{n} (β₀ + β₁xᵢ)yᵢ} · ∏_{i=1}^{n} {1 + exp(β₀ + β₁xᵢ)}⁻¹.

From this we obtain the log-likelihood function

ℓ(β₀, β₁) = Σ_{i=1}^{n} (β₀ + β₁xᵢ)yᵢ − Σ_{i=1}^{n} ln{1 + exp(β₀ + β₁xᵢ)}.

However, no closed-form solution exists for the values of β₀ and β₁ that maximize the log-likelihood function
ℓ(β₀, β₁), so we need to maximize ℓ(β₀, β₁) numerically.

A data set from Kutner et al. (2005), Applied Linear Statistical Models, page 566 (x = months of experience,
y = task success):

x=c(14,29,6,25,18,4,18,12,22,6,30,11,30,5,20,13,9,32,24,13,19,4,28,22,8) # months
y=c(0,0,0,1,1,0,0,0,1,0,1,0,1,0,1,0,0,1,0,1,0,0,1,1,1) # success

We start by defining the partial derivatives of ℓ(β₀, β₁), namely ∂ℓ(β₀, β₁)/∂β₀ and ∂ℓ(β₀, β₁)/∂β₁, which are
our target functions.

F1=function(b){
F1=0
for(i in 1:length(x)) F1=F1+y[i]-exp(b[1]+b[2]*x[i])/(1+exp(b[1]+b[2]*x[i]))
F1
}

F2=function(b){
F2=0
for(i in 1:length(x)) F2=F2+x[i]*y[i]-x[i]*exp(b[1]+b[2]*x[i])/(1+exp(b[1]+b[2]*x[i]))
F2
}

F=function(b){
F=matrix(0,nrow=2)
F[1]=F1(b)
F[2]=F2(b)
F
}

Alternatively, the vector function F(β 0 , β1 ) can be set as follows

F=function(b){
F=matrix(0,nrow=2)
s1=0
s2=0
for(i in 1:length(x)){
s1 = s1 +y[i]-((exp(b[1]+b[2]*x[i]))*(1+exp(b[1]+b[2]*x[i]))^(-1))
s2 = s2 +x[i]*y[i]-(x[i]*(exp(b[1]+b[2]*x[i]))*(1+exp(b[1]+b[2]*x[i]))^(-1))}
F[1]=s1
F[2]=s2
F}

The next step is to set down the Jacobian matrix, a 2 × 2 matrix.

J=function(b){
j=matrix(0,ncol=2,nrow=2) # The format of J is 2 by 2
s11=0
s12=0
s22=0
for(i in 1:length(x)){
s11 = s11-exp(b[1]+b[2]*x[i])*(1+exp(b[1]+b[2]*x[i]))^(-2)
s12 = s12 -x[i]*exp(b[1]+b[2]*x[i])*(1+exp(b[1]+b[2]*x[i]))^(-2)
s22 = s22 -(x[i]^(2))*exp(b[1]+b[2]*x[i])*(1+exp(b[1]+b[2]*x[i]))^(-2)
}
j[1,1]=s11
j[1,2]=s12
j[2,1]=s12
j[2,2]=s22
j
}

The R code for Newton's method is

NNL = function(initial,n){
b=initial
v=matrix(0,ncol=length(b))
for (i in 1:n){
d=det(J(b)) # check that J(b0,b1) is invertible
if(identical(all.equal(d,0),TRUE))
{cat('Jacobian has no inverse. Try a different initial point.','\n')
break}
else
v=solve(J(b),-F(b))
b=b+v}
cat(' b0=',b[1],'\n','b1=',b[2],'\n')
}

Finally let us try several particular cases.

> NNL(c(1,1),10) # A good initial point is important!


Error in qr(x, tol = tol) : NA/NaN/Inf in foreign function call (arg 1)

> NNL(c(1,0),5) # A small n


b0= -3.059696
b1= 0.1614859

> NNL(c(1,0),200) # A large n


b0= -3.059696
b1= 0.1614859

> F(c(-3.059696, 0.1614859)) # check the value of F


[1,] 2.066355e-06
[2,] 4.156266e-05

Thus, the maximum likelihood estimators of β₀ and β₁ are -3.059696 and 0.1614859, respectively.
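
As a final cross-check (not part of the original notes), R's built-in glm() fits the same logistic regression by iteratively reweighted least squares and should report essentially the same coefficients.

> fit = glm(y ~ x, family=binomial)
> coef(fit) # expect (Intercept) ≈ -3.0597 and x ≈ 0.1615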
