Regularization Methods: An Applied Mathematician's Perspective
Outline
Well- (and Ill-) Posedness
Regularization
  Optimization approach (Tikhonov)
  Bayesian connection (MAP)
  Filtering approach
  Iterative approach
Regularization Parameter Selection
Applied Math Wish List
Focus is on linear problems.
Well-Posedness
Definition (due to Hadamard, 1915): Given a mapping $K : X \to Y$, the equation $Kx = y$ is well-posed provided
  for each $y \in Y$ there exists a solution $x \in X$ with $Kx = y$;
  the solution is unique; and
  the solution depends continuously on the data, i.e., $K^{-1} : Y \to X$ is continuous.
Otherwise the equation is ill-posed. In the finite-dimensional matrix case, $Kx = y$ is well-posed if and only if $\det(K) \neq 0$.
Infinite-Dimensional Example
Consider the (compact) diagonal operator $K$ on the (Hilbert) space $\ell^2$,
$$ K = \mathrm{diag}\big(1, \tfrac{1}{2}, \tfrac{1}{3}, \ldots\big), \qquad (Kx)_j = x_j / j . $$
Define $y^{(n)} = K e_n = e_n / n$, where $e_n$ is the $n$-th unit vector. Then $\|y^{(n)}\| = 1/n \to 0$, but $\|K^{-1} y^{(n)}\| = \|e_n\| = 1$ for all $n$. Hence $K^{-1}$ (defined on the range of $K$ by the pseudo-inverse) is not continuous: arbitrarily small changes in the data can produce order-one changes in the solution.
But ... discrete problems approximate underlying infinite-dimensional problems, and discrete problems become increasingly ill-conditioned as they become more accurate. In inverse problems applications the operator $K$ is often compact, and it acts like the diagonal operator in the above example: compact operators can be diagonalized using the SVD, and the singular values decay to zero.
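As a numerical illustration (a minimal sketch, not part of the original talk), the following NumPy code truncates the diagonal operator to $n$ terms and applies the naive inverse to slightly perturbed data. The amplification of the data error grows with $n$, i.e., the discrete problem becomes more ill-conditioned as the discretization is refined.

```python
import numpy as np

# Truncated diagonal operator K = diag(1, 1/2, ..., 1/n): a discrete
# approximation of the compact operator in the example above.
for n in [10, 20, 40]:
    d = 1.0 / np.arange(1, n + 1)          # diagonal entries 1/j
    x_true = np.ones(n)                     # some "true" solution
    y = d * x_true                          # exact data y = K x_true

    rng = np.random.default_rng(0)
    delta = 1e-6 * rng.standard_normal(n)   # tiny data perturbation
    x_naive = (y + delta) / d               # naive inversion K^{-1}(y + delta)

    print(n, np.linalg.norm(delta), np.linalg.norm(x_naive - x_true))
    # The data error stays around 1e-6 while the solution error grows roughly
    # like n times the data error; for realistic operators whose singular
    # values decay to zero the amplification is far worse.
```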
Regularization
Remedy for ill-posedness (or ill-conditioning, in the discrete case).
Informal Definition: Regularization imposes stability on an ill-posed problem in a manner that yields accurate approximate solutions, often by incorporating prior information.
More Formal Definition: A regularization scheme is a parametric family of approximate inverse operators $\{R_\alpha\}$ with the following property: if $K x_{\mathrm{true}} = y_{\mathrm{true}}$ and $\|y_\delta - y_{\mathrm{true}}\| \le \delta$, we can pick parameters $\alpha = \alpha(\delta)$ such that
$$ \| R_{\alpha(\delta)}\, y_\delta - x_{\mathrm{true}} \| \to 0 \quad \text{as } \delta \to 0 . $$
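A minimal sketch of this defining property (my own illustration; the spectral cut-off family $R_\alpha$ and the parameter choice rule $\alpha(\delta) = \delta$ are assumptions made for the demo), using the diagonal operator from the previous example truncated to $n$ terms:

```python
import numpy as np

n = 200
d = 1.0 / np.arange(1, n + 1)              # diagonal operator K = diag(1/j)
x_true = 1.0 / np.arange(1, n + 1)         # a smooth-ish true solution
y_true = d * x_true

rng = np.random.default_rng(1)

def R(alpha, y):
    """Approximate inverse: invert only components with d_j^2 >= alpha."""
    x = np.zeros_like(y)
    keep = d**2 >= alpha
    x[keep] = y[keep] / d[keep]
    return x

for delta in [1e-2, 1e-3, 1e-4, 1e-5]:
    noise = rng.standard_normal(n)
    y = y_true + delta * noise / np.linalg.norm(noise)   # ||y - y_true|| = delta
    alpha = delta                                         # assumed rule alpha(delta) = delta
    print(delta, np.linalg.norm(R(alpha, y) - x_true))
    # The reconstruction error decreases as the data error delta -> 0,
    # which is exactly the defining property of a regularization method.
```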
Tikhonov Regularization
Math Interpretation: In the simplest case, assume $X$ and $Y$ are Hilbert spaces. To obtain a regularized solution to $Kx = y$, choose $x$ to fit the data in the least-squares sense, but penalize solutions of large norm. Solve the minimization problem
$$ x_\alpha = \arg\min_{x} \; \|Kx - y\|^2 + \alpha \|x\|^2, \qquad \alpha > 0, $$
whose minimizer satisfies the regularized normal equations
$$ (K^* K + \alpha I)\, x_\alpha = K^* y . $$
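A NumPy sketch (my own; the Gaussian blur test matrix, noise level, and value of $\alpha$ are made up) of computing the Tikhonov solution from the regularized normal equations and comparing it with the naive, unregularized solve:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
# Made-up ill-conditioned test matrix: a Gaussian blurring operator.
t = np.linspace(0, 1, n)
K = np.exp(-((t[:, None] - t[None, :]) ** 2) / (2 * 0.03**2))
K /= K.sum(axis=1, keepdims=True)

x_true = np.where((t > 0.3) & (t < 0.7), 1.0, 0.0)    # true solution
y = K @ x_true + 1e-3 * rng.standard_normal(n)        # noisy data

alpha = 1e-4
# Tikhonov solution: minimize ||Kx - y||^2 + alpha ||x||^2, i.e. solve the
# regularized normal equations (K^T K + alpha I) x = K^T y.
x_alpha = np.linalg.solve(K.T @ K + alpha * np.eye(n), K.T @ y)

x_naive = np.linalg.solve(K, y)                        # unregularized solution
print("naive error    :", np.linalg.norm(x_naive - x_true))
print("Tikhonov error :", np.linalg.norm(x_alpha - x_true))
# The naive solve amplifies the noise enormously; the Tikhonov solution
# remains a reasonable approximation to x_true for a sensible alpha.
```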
[Figure: contour plots in the plane, including contours of the least-squares term $\|Ax - y\|^2$, illustrating Tikhonov regularization in two dimensions.]
Bayesian Connection (MAP)
Statistician's interpretation: view the data as $y = Kx + n$, where the noise $n$ and the unknown $x$ are realizations of random variables with known probability densities.
The maximum a posteriori (MAP) estimator is the maximizer of the posterior pdf $p(x \mid y) \propto p(y \mid x)\, p(x)$. Equivalently, minimize $-\log p(x \mid y)$ with respect to $x$.
If the noise is Normal$(0, \sigma^2 I)$, the conditional pdf is
$$ p(y \mid x) \propto \exp\!\Big( -\tfrac{1}{2\sigma^2} \|Kx - y\|^2 \Big) . $$
If the prior is Normal$(0, (\sigma^2/\alpha) I)$, then
$$ p(x) \propto \exp\!\Big( -\tfrac{\alpha}{2\sigma^2} \|x\|^2 \Big), $$
and the MAP estimator is the minimizer of $\|Kx - y\|^2 + \alpha \|x\|^2$, i.e., the Tikhonov regularized solution.
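A small numerical check (my own sketch; the matrix and the values of $\sigma$ and $\sigma_x$ are made up) that under these Gaussian assumptions the MAP estimate coincides with the Tikhonov solution for $\alpha = \sigma^2/\sigma_x^2$:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 50, 30
K = rng.standard_normal((m, n))
sigma = 0.1          # noise standard deviation (assumed known)
sigma_x = 0.5        # prior standard deviation (assumed known)

x_sample = sigma_x * rng.standard_normal(n)          # draw from the prior
y = K @ x_sample + sigma * rng.standard_normal(m)    # noisy data

# MAP estimate: minimize ||Kx - y||^2/(2 sigma^2) + ||x||^2/(2 sigma_x^2).
# Setting the gradient to zero gives
#   (K^T K / sigma^2 + I / sigma_x^2) x = K^T y / sigma^2.
x_map = np.linalg.solve(K.T @ K / sigma**2 + np.eye(n) / sigma_x**2,
                        K.T @ y / sigma**2)

# Tikhonov solution with alpha = sigma^2 / sigma_x^2.
alpha = sigma**2 / sigma_x**2
x_tik = np.linalg.solve(K.T @ K + alpha * np.eye(n), K.T @ y)

print(np.allclose(x_map, x_tik))   # True: the two estimates coincide
```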
Illustrative Example
In the $n \times n$ matrix case, $y = K x_{\mathrm{true}} + n$, where $n$ is noise at a prescribed signal-to-noise ratio (SNR). The SVD of $K$ is
$$ K = U\, \mathrm{diag}(s_1, \ldots, s_n)\, V^T, \qquad s_1 \ge s_2 \ge \cdots \ge s_n > 0, $$
with orthonormal singular vectors $u_i$, $v_i$ and singular values $s_i$ that decay toward zero.
Tikhonov Filtering
In the case of Tikhonov regularization, using the SVD (and assuming an $n \times n$ matrix $K$ for simplicity),
$$ x_\alpha = (K^T K + \alpha I)^{-1} K^T y = V\, \mathrm{diag}\!\Big( \frac{w_\alpha(s_i^2)}{s_i} \Big)\, U^T y = \sum_{i=1}^{n} w_\alpha(s_i^2)\, \frac{u_i^T y}{s_i}\, v_i , $$
where the filter function is
$$ w_\alpha(s^2) = \frac{s^2}{s^2 + \alpha} . $$
If $\alpha = 0$, then $w_0 \equiv 1$, so $x_0 = K^{-1} y$; as $\alpha$ increases, more and more of the small singular components are damped.
A plot of the Tikhonov filter function shows that Tikhonov regularization filters out singular components $s_i^2$ that are small (relative to $\alpha$) while retaining components that are large.
[Figure: Tikhonov filter function $w_\alpha(s^2)$ plotted against $s^2$ on a logarithmic scale; $w$ rolls off smoothly from 1 to 0 as $s^2$ decreases below $\alpha$.]
An alternative is truncated SVD (TSVD) filtering, with filter function $w_\alpha(s^2) = 1$ for $s^2 \ge \alpha$ and $0$ otherwise. This has sharp cut-off behavior instead of the smooth roll-off behavior of the Tikhonov filter.
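As an illustration (my own sketch, reusing a made-up blur test problem; not code from the talk), both filters can be applied directly through the SVD:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 64
t = np.linspace(0, 1, n)
K = np.exp(-((t[:, None] - t[None, :]) ** 2) / (2 * 0.03**2))
K /= K.sum(axis=1, keepdims=True)
x_true = np.where((t > 0.3) & (t < 0.7), 1.0, 0.0)
y = K @ x_true + 1e-3 * rng.standard_normal(n)

U, s, Vt = np.linalg.svd(K)
beta = U.T @ y                                   # coefficients u_i^T y
alpha = 1e-4

# Tikhonov filter: smooth roll-off, w(s^2) = s^2 / (s^2 + alpha).
w_tik = s**2 / (s**2 + alpha)
x_tik = Vt.T @ (w_tik * beta / s)                # sum_i w_i (u_i^T y / s_i) v_i

# TSVD filter: sharp cut-off, keep components with s^2 >= alpha, drop the rest.
w_tsvd = (s**2 >= alpha).astype(float)
x_tsvd = Vt.T @ (w_tsvd * beta / s)

print("Tikhonov solution error:", np.linalg.norm(x_tik - x_true))
print("TSVD solution error    :", np.linalg.norm(x_tsvd - x_true))
```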
Iterative Regularization
Certain iterative methods, e.g., steepest descent, conjugate gradients, and Richardson-Lucy (EM), have regularizing effects, with the regularization parameter equal to the number of iterations. These are useful in applications, like 3-D imaging, with many unknowns. An example is Landweber iteration, a variant of steepest descent. Minimize the least squares fit-to-data functional
$$ J(x) = \tfrac{1}{2} \|Kx - y\|^2 $$
using a gradient descent iteration, with initial guess $x_0$ and a fixed step length parameter $\tau$.
Landweber Iteration
$$ x_{\nu+1} = x_\nu - \tau\, \mathrm{grad}\, J(x_\nu) = x_\nu - \tau\, K^T (K x_\nu - y), \qquad \nu = 0, 1, 2, \ldots $$
With initial guess $x_0 = 0$, the iterates have the SVD filtering representation
$$ x_\nu = V\, \mathrm{diag}\!\Big( \frac{w_\nu(s_i^2)}{s_i} \Big)\, U^T y, \qquad w_\nu(s^2) = 1 - (1 - \tau s^2)^\nu , $$
so the iteration count $\nu$ plays the role of the regularization parameter.
[Figure: Landweber filter function $w_\nu(s^2)$ plotted against $s^2$ on a logarithmic scale.]
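A sketch (my own; the test problem and noise level are made up) of Landweber iteration showing why the iteration count acts as the regularization parameter: the solution error decreases, reaches a minimum at an intermediate iteration, and then grows again as the iterates begin to fit the noise, so one regularizes by stopping early.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 64
t = np.linspace(0, 1, n)
K = np.exp(-((t[:, None] - t[None, :]) ** 2) / (2 * 0.03**2))
K /= K.sum(axis=1, keepdims=True)
x_true = np.where((t > 0.3) & (t < 0.7), 1.0, 0.0)
y = K @ x_true + 1e-2 * rng.standard_normal(n)       # fairly noisy data

tau = 1.0 / np.linalg.norm(K, 2) ** 2                # step length, 0 < tau < 2/s_1^2
x = np.zeros(n)                                       # initial guess x_0 = 0
errors = []
for nu in range(20000):
    x = x - tau * (K.T @ (K @ x - y))                 # x_{nu+1} = x_nu - tau K^T(K x_nu - y)
    errors.append(np.linalg.norm(x - x_true))

best = int(np.argmin(errors))
print("best iteration:", best + 1, " error:", errors[best])
print("final iteration error:", errors[-1])
# The error curve typically decreases, reaches a minimum at an intermediate
# iteration, and then creeps back up as the iterates start to fit the noise.
```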
[Figure: noisy data for the test problem plotted against the x axis.]
Singular Vectors
[Figure: singular vectors $v_1$ and $v_4$ of $K$, and the singular values of $K$ plotted against the index $i$ (up to $i \approx 80$); the singular values decay rapidly toward zero.]
Tikhonov Solutions vs. $\alpha$
The Tikhonov regularized solution is $x_\alpha = R_\alpha y$. The solution error is $e_\alpha = x_\alpha - x_{\mathrm{true}}$.
[Figure: Tikhonov solutions $x_\alpha(t)$ for several values of $\alpha$ (one panel labeled $\alpha = 0.0014384$), and the norm of the Tikhonov solution error $\|e_\alpha\|$ as a function of $\alpha$.]
Regularization Parameter Selection
Regularized solution: $x_\alpha = R_\alpha y = V\, \mathrm{diag}\!\big( w_\alpha(s_i^2)/s_i \big)\, U^T y$.
Predictive error: $p_\alpha = K x_\alpha - K x_{\mathrm{true}}$.
Solution error: $e_\alpha = x_\alpha - x_{\mathrm{true}}$.
Error Indicators
The residual is
$$ r_\alpha = K x_\alpha - y = p_\alpha - n, \qquad n = y - K x_{\mathrm{true}} . $$
Let $E$ denote the expected value operator. Assume $x_{\mathrm{true}}$ is deterministic (or independent of $n$), assume $n$ is white noise with variance $\sigma^2$, and note that the influence matrix $A_\alpha = K (K^T K + \alpha I)^{-1} K^T$ (for which $K x_\alpha = A_\alpha y$) is symmetric. Then
$$ E\|r_\alpha\|^2 = E\|p_\alpha\|^2 - 2\sigma^2\, \mathrm{trace}(A_\alpha) + N \sigma^2 , $$
where $N$ is the number of data values. So, up to the constant $N\sigma^2$, the (uncomputable) expected predictive error $E\|p_\alpha\|^2$ equals the computable quantity $E\|r_\alpha\|^2 + 2\sigma^2\, \mathrm{trace}(A_\alpha)$.
UPRE, Continued
The unbiased predictive risk estimator (UPRE) is
$$ U(\alpha) = \|r_\alpha\|^2 + 2\sigma^2\, \mathrm{trace}(A_\alpha) - N\sigma^2 , $$
which satisfies $E\, U(\alpha) = E\|p_\alpha\|^2$. Choose $\alpha$ to minimize $U(\alpha)$.
The predictive error norm and the solution error norm need not have the same minimizer, but the minimizers are often quite close. There is a variant of UPRE, called generalized cross validation (GCV), which requires minimization of
$$ \mathrm{GCV}(\alpha) = \frac{N\, \|r_\alpha\|^2}{\big[ \mathrm{trace}(I - A_\alpha) \big]^2} $$
and does not require knowledge of the noise variance $\sigma^2$.
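A sketch (my own; test problem and grid of $\alpha$ values are made up) of evaluating UPRE and GCV from the SVD. For Tikhonov regularization, $\mathrm{trace}(A_\alpha) = \sum_i s_i^2/(s_i^2+\alpha)$ and $\|r_\alpha\|^2 = \sum_i \big(w_\alpha(s_i^2) - 1\big)^2 (u_i^T y)^2$, so both indicators are cheap to evaluate on a grid:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 64
t = np.linspace(0, 1, n)
K = np.exp(-((t[:, None] - t[None, :]) ** 2) / (2 * 0.03**2))
K /= K.sum(axis=1, keepdims=True)
x_true = np.where((t > 0.3) & (t < 0.7), 1.0, 0.0)
sigma = 1e-3
y = K @ x_true + sigma * rng.standard_normal(n)

U, s, Vt = np.linalg.svd(K)
beta = U.T @ y

def indicators(alpha):
    w = s**2 / (s**2 + alpha)                   # Tikhonov filter factors
    r2 = np.sum(((w - 1.0) * beta) ** 2)        # ||K x_alpha - y||^2
    tr = np.sum(w)                               # trace(A_alpha)
    upre = r2 + 2 * sigma**2 * tr - n * sigma**2
    gcv = n * r2 / (n - tr) ** 2
    x_alpha = Vt.T @ (w * beta / s)
    return upre, gcv, np.linalg.norm(x_alpha - x_true)

alphas = np.logspace(-10, 0, 200)
vals = np.array([indicators(a) for a in alphas])
print("alpha minimizing UPRE :", alphas[np.argmin(vals[:, 0])])
print("alpha minimizing GCV  :", alphas[np.argmin(vals[:, 1])])
print("alpha minimizing error:", alphas[np.argmin(vals[:, 2])])
# The three minimizers need not coincide, but they are typically close.
```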
2-D image reconstruction problem, Gaussian noise, Tikhonov regularization. The figure compares, as functions of the regularization parameter: the solution error norm (o-o), the GCV function, the UPRE function, and the predictive error norm.
Mathematical Summary
There exists a well-developed mathematical theory of regularization. There are a number of different approaches to regularization:
  optimization-based (equivalent to MAP)
  filtering-based
  iteration-based
There are robust schemes for choosing regularization parameters. These techniques often work well in practical applications.
Applied Math Wish List
For high contrast imaging (a dim object near a very bright object), accurate modeling of the noise is critical. With ordinary (and even weighted) least squares, the dim object is missed.
The noise statistics are Poisson rather than Gaussian: the object-dependent photon counts are Poisson distributed with mean determined by $Kx$, and the background counts are Poisson distributed, with background level and detector noise parameters fixed and known. The resulting log likelihood $\ell(x; y)$ is messy (it is not quadratic in $x$).
With pixel discretization, the dimension is very large, e.g., the matrix $K$ has as many rows and columns as there are pixels in the image, or more.
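As a rough sketch (my own; the talk's exact noise model is not spelled out above, so the Poisson-count model $y_i \sim \mathrm{Poisson}([Kx]_i + b_i)$ with fixed, known background $b$ is an assumption made for illustration), the negative log likelihood and its gradient are no longer quadratic in $x$:

```python
import numpy as np

def neg_log_likelihood(x, K, y, b):
    """Negative Poisson log-likelihood for counts y_i ~ Poisson([Kx]_i + b_i),
    dropping the x-independent log(y_i!) term."""
    lam = K @ x + b                       # expected counts (must stay positive)
    return np.sum(lam - y * np.log(lam))

def gradient(x, K, y, b):
    """Gradient K^T (1 - y / (Kx + b)): unlike least squares, it is a nonlinear
    function of x, so the minimizer has no closed-form solution."""
    lam = K @ x + b
    return K.T @ (1.0 - y / lam)

# Tiny made-up example.
rng = np.random.default_rng(6)
n = 16
K = np.abs(rng.standard_normal((n, n))) / n   # nonnegative forward operator
x_true = np.full(n, 50.0)
b = np.full(n, 3.0)                            # fixed, known background level
y = rng.poisson(K @ x_true + b).astype(float)  # simulated photon counts

print(neg_log_likelihood(x_true, K, y, b),
      np.linalg.norm(gradient(x_true, K, y, b)))
```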