Steepest Descent and Conjugate Gradients (CG)

• Solving the linear system Ax = b

• Problem: the dimension n is too big, or there is not enough time for Gaussian elimination
  ⇒ iterative methods are used to get an approximate solution.

• Definition (iterative method): given a starting point x_0, take steps ⇒ x_1, x_2, ... that hopefully converge to the exact solution x.

starting issues

• Solving Ax = b is equivalent to minimizing

  f(x) := (1/2) x^T A x - b^T x + c

• A has to be symmetric positive definite:

  A^T = A,   x^T A x > 0 for all x ≠ 0

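These two requirements can be checked numerically. A minimal sketch in NumPy (the matrix A below is an illustrative example, not from the slides):

```python
import numpy as np

# Illustrative example matrix (not from the slides)
A = np.array([[3.0, 2.0],
              [2.0, 6.0]])

# Symmetry: A^T = A
assert np.allclose(A.T, A)

# Positive definiteness: x^T A x > 0 for all x != 0.
# A Cholesky factorization succeeds exactly for symmetric positive definite
# matrices, so a successful call doubles as a practical SPD check.
np.linalg.cholesky(A)  # raises np.linalg.LinAlgError if A is not positive definite
```
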
starting issues

• ∇f(x) = (1/2) A^T x + (1/2) A x - b = A x - b   (A symmetric!)
  so ∇f(x) = 0 exactly when A x - b = 0

• If A is also positive definite, the solution of Ax = b is the minimum:

  f(A⁻¹b + d) = -(1/2) b^T A⁻¹ b + c + (1/2) d^T A d,   with d^T A d > 0 for d ≠ 0

starting issues

• error: e_i := x_i - x
  The norm of the error shows how far we are from the exact solution, but it can't be computed without knowing the exact solution x.

• residual: r_i := b - A x_i = -A e_i = -∇f(x_i)
  can be calculated

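A quick numerical sanity check that the residual really is the negative gradient of f; the matrix, right-hand side and trial point below are made up for illustration:

```python
import numpy as np

# Illustrative data (not from the slides)
A = np.array([[3.0, 2.0], [2.0, 6.0]])   # symmetric positive definite
b = np.array([2.0, -8.0])
c = 0.0

def f(x):
    return 0.5 * x @ A @ x - b @ x + c

def grad_f(x):
    return A @ x - b                     # valid because A is symmetric

x_i = np.array([1.0, 1.0])               # some iterate
r_i = b - A @ x_i                        # residual
assert np.allclose(r_i, -grad_f(x_i))    # r_i = -grad f(x_i)
```
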
Steepest Descent

• We are at the point x_i. How do we reach x_{i+1}?

• Idea: go in the direction in which f(x) decreases most quickly (-∇f(x_i) = r_i)

• How far should we go?
  Choose α so that f(x_i + α r_i) is minimized:

  d/dα f(x_i + α r_i) = 0
  ∇f(x_i + α r_i)^T r_i = 0
  (A(x_i + α r_i) - b)^T r_i = 0
  α (A r_i)^T r_i = (b - A x_i)^T r_i = r_i^T r_i
  ⇒ α = r_i^T r_i / (r_i^T A r_i)

Steepest Descent

⇒ one step of steepest descent can be calculated as follows:

  r_i = b - A x_i
  α_i = r_i^T r_i / (r_i^T A r_i)
  x_{i+1} = x_i + α_i r_i

• stopping criterion: i ≥ i_max or ||r_i|| ≤ ε ||r_0|| with a given small ε.
  It would be better to use the error instead of the residual, but you can't calculate the error.

Steepest Descent

⇒ Method of steepest descent:

  i = 0
  x = x_0
  r = b - A x
  r_0 = r
  while (i < i_max and r^T r > ε² r_0^T r_0)
      α = r^T r / (r^T A r)
      x = x + α r
      r = b - A x
      i = i + 1

Steepest Descent

• As you can see, the starting point is important!
  If you know anything about the solution, use it to guess a good starting point. Otherwise you can choose any starting point you want, e.g. x_0 = 0.

Steepest Descent - Convergence

• Definition (energy norm): ||x||_A := sqrt(x^T A x)

• Definition (condition number): κ := λ_max / λ_min
  (λ_max is the largest and λ_min the smallest eigenvalue of A)

• ||e_i||_A ≤ ((κ - 1)/(κ + 1))^i ||e_0||_A
  ⇒ convergence gets worse as the condition number gets larger

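To get a feeling for the bound, κ and the per-step reduction factor can be computed numerically (a small sketch; eigvalsh assumes A is symmetric, and the example matrix is illustrative):

```python
import numpy as np

A = np.array([[3.0, 2.0], [2.0, 6.0]])    # illustrative SPD matrix
eigenvalues = np.linalg.eigvalsh(A)        # ascending order for symmetric A
kappa = eigenvalues[-1] / eigenvalues[0]   # condition number lambda_max / lambda_min

factor = (kappa - 1) / (kappa + 1)         # per-step factor in the energy-norm bound
print(f"kappa = {kappa:.2f}, error shrinks by at most a factor {factor:.3f} per step")
```
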
Conjugate Gradients

• Is there a better direction?

• Idea: d_0, d_1, ..., d_{n-1} orthogonal search directions
  ⇒ x = Σ_{i=0}^{n-1} α_i d_i

• Only walk once in each direction and minimize
  ⇒ at most n steps are needed to reach the exact solution
  ⇒ e_{i+1} has to be orthogonal to d_i

Conjugate Gradients

• Example with the coordinate axes as orthogonal search directions:

  α_i = -d_i^T e_i / (d_i^T d_i)

  Problem: can't be computed, because you don't know e_i!

Conjugate Gradients

• New idea: d_0, d_1, ..., d_{n-1} A-orthogonal

• Definition (A-orthogonal): d_i, d_j A-orthogonal ⟺ d_i^T A d_j = 0
  (reminder: d_i, d_j orthogonal ⟺ d_i^T d_j = 0)

• Now e_{i+1} has to be A-orthogonal to d_i

  α_i = -d_i^T A e_i / (d_i^T A d_i) = d_i^T r_i / (d_i^T A d_i)

  ⇒ can be computed!

Conjugate Gradients

• A set of A-orthogonal directions can be built from n linearly independent vectors u_i with conjugate Gram-Schmidt (same idea as Gram-Schmidt).

Conjugate Gradients

• Gram-Schmidt: u_0, ..., u_{n-1} linearly independent vectors

  d_0 = u_0
  i > 0: d_i = u_i + Σ_{j=0}^{i-1} β_ij d_j   with β_ij = -u_i^T d_j / (d_j^T d_j)

• conjugate Gram-Schmidt:

  β_ij = -u_i^T A d_j / (d_j^T A d_j)

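A sketch of conjugate Gram-Schmidt in NumPy, only for illustration (CG itself never builds and stores all directions like this; the explicit double loop is far too expensive for large n):

```python
import numpy as np

def conjugate_gram_schmidt(U, A):
    """Turn the linearly independent columns of U into A-orthogonal directions."""
    n = U.shape[1]
    D = np.zeros_like(U)
    for i in range(n):
        d = U[:, i].copy()
        for j in range(i):
            # beta_ij = -u_i^T A d_j / (d_j^T A d_j)
            beta_ij = -(U[:, i] @ A @ D[:, j]) / (D[:, j] @ A @ D[:, j])
            d += beta_ij * D[:, j]
        D[:, i] = d
    return D

# Example: start from the coordinate axes and check A-orthogonality
A = np.array([[3.0, 2.0], [2.0, 6.0]])     # illustrative SPD matrix
D = conjugate_gram_schmidt(np.eye(2), A)
print(np.round(D.T @ A @ D, 10))           # off-diagonal entries should be ~0
```
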
Conjugate Gradients

• CG works by setting u_i = r_i (makes conjugate Gram-Schmidt easy)

  ⇒ d_i = r_i + β_i d_{i-1}   with β_i = r_i^T r_i / (r_{i-1}^T r_{i-1})

Conjugate Gradients

• i < j:  d_i^T r_j = -d_i^T A e_j = -d_i^T A Σ_{k=j}^{n-1} α_k d_k = -Σ_{k=j}^{n-1} α_k d_i^T A d_k = 0

• d_i = u_i + Σ_{k=0}^{i-1} β_ik d_k

  i < j:  0 = d_i^T r_j = u_i^T r_j + Σ_{k=0}^{i-1} β_ik d_k^T r_j   (d_k^T r_j = 0 for k < j)

  ⇒ u_i^T r_j = 0 for i < j

• u_i := r_i  ⇒  r_i^T r_j = 0 for i ≠ j (the residuals are mutually orthogonal)

Conjugate Gradients

• i < j:  d_i^T r_j = -d_i^T A e_j = -d_i^T A Σ_{k=j}^{n-1} α_k d_k = -Σ_{k=j}^{n-1} α_k d_i^T A d_k = 0

• u_i := r_i  ⇒  r_i^T r_j = 0 for i ≠ j

• d_i^T r_i = u_i^T r_i + Σ_{k=0}^{i-1} β_ik d_k^T r_i = u_i^T r_i   (d_k^T r_i = 0 for k < i)

Conjugate Gradients

• β_ij = -r_i^T A d_j / (d_j^T A d_j)   for i > j

• r_{j+1} = -A e_{j+1} = -A(e_j + α_j d_j) = r_j - α_j A d_j

  r_i^T r_{j+1} = r_i^T r_j - α_j r_i^T A d_j

  ⇒ α_j r_i^T A d_j = r_i^T r_j - r_i^T r_{j+1}

Conjugate Gradients

⇒ α_j r_i^T A d_j = r_i^T r_j - r_i^T r_{j+1}

  With r_i^T r_j = 0 for i ≠ j this gives:

  r_i^T A d_j =  (1/α_i) r_i^T r_i       if i = j
  r_i^T A d_j = -(1/α_{i-1}) r_i^T r_i   if i = j + 1
  r_i^T A d_j =  0                       if i ≠ j and i ≠ j + 1

Conjugate Gradients

⇒ β_ij := 0   for i > j + 1

  β_ij = r_i^T r_i / (α_{i-1} d_{i-1}^T A d_{i-1}) = r_i^T r_i / (d_{i-1}^T r_{i-1}) = r_i^T r_i / (r_{i-1}^T r_{i-1})   for i = j + 1

⇒ Method of Conjugate Gradients:

  i = 0
  x = x_0
  r = b - A x
  d = r
  r_0 = r
  while (i < i_max and r^T r > ε² r_0^T r_0)
      α = r^T r / (d^T A d)
      x = x + α d
      r_old = r
      r = b - A x
      β = r^T r / (r_old^T r_old)
      d = r + β d
      i = i + 1

Conjugate Gradients - Convergence

• ||e_i||_A ≤ 2 ((√κ - 1)/(√κ + 1))^i ||e_0||_A

• κ for steepest descent becomes √κ for CG

  ⇒ Convergence of CG is much better!

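For a concrete feel of the difference, one can compare the two per-step factors for a few condition numbers (a small illustrative sketch):

```python
import numpy as np

# Per-step error reduction factors from the two bounds (illustrative values of kappa)
for kappa in [10.0, 100.0, 1000.0]:
    sd = (kappa - 1) / (kappa + 1)                    # steepest descent
    cg = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)  # conjugate gradients
    print(f"kappa = {kappa:6.0f}   steepest descent: {sd:.4f}   CG: {cg:.4f}")
```
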