SIAM J. APPL. MATH. © 2002 Society for Industrial and Applied Mathematics
Vol. 62, No. 3, pp. 1019–1043
Key words. inpainting, disocclusion, interpolation, variational/PDE method, prior image models, total variation, digital zooming, image coding
PII. S0036139900368844
FIG. 1.1. Inpainting is to paint the missing u⁰|_D on an inpainting domain D based on the image information available outside (where u⁰|_{Ω\D} is given).
The word "inpainting" was initially invented by museum or art restoration work-
ers [17, 47]. The concept of digital inpainting was only recently introduced into digital
image processing by Bertalmio et al. [1], who were the first group to develop inpaint-
ing models based on high order PDEs. Earlier related works based on second order
PDEs and variational techniques can be found in Caselles, Morel, and Sbert [4] and
Masnou and Morel [31]. Meanwhile, in computer science, much research similar to
*Received by the editors March 10, 2000; accepted for publication (in revised form) May 9, 2001; published electronically February 6, 2002. This research was supported by grants from the NSF under grant number DMS-9973341 and from the ONR under N00014-96-1-0277.
http://www.siam.org/journals/siap/62-3/36884.html
†Institute of Pure and Applied Mathematics (IPAM), UCLA, Los Angeles, CA 90095-7121 (chan@ipam.ucla.edu).
‡School of Mathematics, University of Minnesota, Minneapolis, MN 55455 (jhshen@math.umn.edu).
1020 TONY F. CHAN AND JIANHONG SHEN
the inpainting problem has also been carried out in the context of image interpo-
lation [26], image replacement [22], and error concealment [23, 27], although these
works are more based on statistical and algorithmic approaches.
Important applications of digital inpainting include (a) digital restoration of an-
cient paintings for conservation purposes [17, 47], (b) restoring aged or damaged
photographs and films [25, 26], (c) text removal and object removal in images for
special effects [1], (d) disocclusion in vision research [31, 36], (e) digital zooming and
edge-based image coding (sections 8 and 9).
Mathematically, what makes the inpainting problem so challenging is the com-
plexity of image functions. Unlike many traditional interpolation or boundary value
problems, the target image functions to be inpainted typically lie outside the Sobolev
category. Some examples include: (a) natural images (clutters) are modeled by dis-
tributions (Mumford [33]); (b) texture images contain very rich statistical content
and are modeled by Markov random fields and Gibbs fields (Geman and Geman [18],
Bremaud [3]); and (c) most nontexture images can be well approximated by func-
tions with bounded variations (Rudin and Osher [40], Rudin, Osher, and Fatemi [41],
Chambolle and Lions [5]) and the celebrated Mumford and Shah object-boundary
model [34]. Such multilevel complexities of image functions force researchers to develop inpainting schemes targeted at specific classes of images. As a result, these inpainting models are necessarily low-level ones. The ultimate goal, of course, as in the blueprint
of vision and artificial intelligence, is eventually to be able to combine and integrate all
the low-level inpainting components into an ideal program that can well approximate
human inpainters.
The current paper represents a first systematic step toward this goal, and the
restrictive words "local" and "nontexture" in the title clearly indicate the low-level
nature of all inpainting models developed in this paper. The crucial concept and
principle of "locality" will be explained in the next section. And the reason that the
current paper does not touch texture inpainting is that all inpainting models here are
based on the variational principle or PDEs, and the resulting regularity requirement
is unsuitable for general statistical textures. For inpaintings of textures, some recent
work has been accomplished by Wei and Levoy [48] and Igehy and Pereira [22].
The paper is organized as follows. Section 2 clarifies the meaning of locality
through two examples connected to human visual inference. In section 3, we study
inpainting models and their accuracy analysis for smooth images. The key tool is
Green's second formula, which leads to linear and cubic schemes realized by harmonic
and biharmonic inpaintings. In section 4, we propose three inpainting principles for a
realistic low-level inpainting model. In this spirit, the total variation (TV) inpainting
model is formulated in section 5, which extends the classical TV denoising model of
Rudin and Osher [40] and Rudin, Osher, and Fatemi [41]. The digital implementation
of the TV inpainting model is also presented. In section 6, we propose a segmentation-
based inpainting scheme, as inspired by the well-known image model of Mumford and
Shah [34]. Section 7 introduces the so-called connectivity principle and the recent
model of Chan and Shen [11] on inpaintings based on curvature driven diffusions
(CDD). In section 8, for the first time in the literature of image processing, we make
the link between digital zoom-ins and TV inpaintings. A digital zoom-in model almost
identical to the continuous TV inpainting model is constructed based on the self-
contained digitized PDE method developed by Chan, Osher, and Shen [9]. Section 9
explains another new important application of the inpainting technique to edge-based
image coding schemes. The last section demonstrates many interesting applications
of the inpainting models developed in the paper.
MODELS FOR LOCAL NONTEXTURE INPAINTINGS 1021
Embedded
FIG. 2.1. A local inpainting scheme does not require global pattern recognition.
A classical example in vision analysis as shown in Figure 2.1 can clarify the above
discussion. For the image to be inpainted on the left, the inpainting (or occluded)
domain is the gray square at the intersection. Human observers can usually easily
"see" a complete black cross and thus fill in the black color. Most of us would agree
that it gives the best guess. In the right panel, the image on the left is embedded into
a larger structure. One can easily recognize the global chessboard pattern and thus
fill in the white color to complete the spatial symmetry. Therefore, human perceptual
inference depends on the global context.
Such complexity of human visual inference parallels that of inpainting models.
Any high-level inpainting scheme must be able to carry out pattern recognition. In this
paper, the inpainting models should be considered as low-level ones-the inpainting
outputs are independent of global patterns. Therefore, even for the right panel in
Figure 2.1, the inpainted color for the missing square domain will still be black.
The factor of scale or aspect ratio. Scale plays a universally significant role in
image and vision analysis. Thus it also does in the problem of inpainting.
Consider Figure 2.2. In the left panel, the inpainting scale L is much larger than that of the characteristic feature (denoted by l), and the left part "E" and right part "3" seem to be more uncorrelated. We thus tend to accept the figure as two separated letters "E 3." The image on the right, on the other hand, has an inpainting scale L smaller than l. Accordingly, we are more likely to believe that the figure is a broken letter "B." In this example, the nonuniqueness is not caused by global patterns, but by our guess on the correlation among the features left there. The controlling parameter
is thus the scale or aspect ratio. The TV inpainting model and the segmentation-based
inpainting model developed later can both imitate this effect. However, as discussed
in section 7, for many applications in image processing, due to the large dynamic
range of scales present, the connectivity principle must be enforced regardless of the
scale factor. Therefore, the major inpainting models developed in the current paper
FIG. 2.2. The role of the inpainting scale L relative to the feature scale l: two separated letters "E 3" (left, L larger than l) versus a single broken letter "B" (right, L smaller than l).
see their best performance in inpainting problems whose inpainting domains are small
or local, as in the right panel of Figure 2.2.
3. Inpaintings of smooth images and Green's second formula. To develop
a rigorous mathematical framework for inpaintings, as is well practiced in numerical
analysis, we start from a simple setting, in which the accuracy of inpainting can be
well studied. This is the case when the target image functions are smooth, or the
inpainting domains are contained in the interior of smooth two-dimensional (2-D)
objects. This simple model serves as the first step toward more general and realistic
inpainting models.
Let u⁰ be a smooth image function defined on a 2-D domain Ω (a rectangular domain, typically). Denote the domain to be inpainted by D, its diameter by d, and the restriction of u⁰ on D by u⁰|_D. Then to inpaint is to construct a good approximation u_D to u⁰|_D.
An inpainting scheme is said to be linear if for any smooth test image u⁰, as the diameter d of the inpainting region D shrinks to 0,
‖u_D − u⁰|_D‖_∞ = O(d²),
and cubic if the error order improves to O(d⁴). Denote by Δ the Laplacian
Δu := ∂²u/∂x² + ∂²u/∂y².
Then Green's second formula on D is
∫∫_D (v Δu − u Δv) dxdy = ∮_Γ (v ∂u/∂n − u ∂v/∂n) ds,
where
(a) u and v are any C² functions defined on the closure of D;
(b) n is the outward (w.r.t. D) normal direction of Γ, and s the length parameter.
Take G(z₀, z) to be the Green's function for the grounded Poisson equation on D. That is, for any "source" point z₀ = (x₀, y₀) ∈ D, as a function of the "field" point z = (x, y) ∈ D, G(z₀, z) solves
−Δ_z G(z₀, z) = δ(z − z₀), z ∈ D;  G(z₀, z) = 0, z ∈ Γ.
Applying Green's second formula to u⁰ and G splits u⁰ = u_h + u_a into a harmonic and an antiharmonic component, where the harmonic component is
u_h(z₀) = ∮_Γ u⁰(z) dω_{z₀}(z), with dω_{z₀} = (−∂G(z₀, z)/∂n) ds
the harmonic measure of Γ associated with a source point z₀ (Nevanlinna [35]). The antiharmonic component u_a := u⁰ − u_h satisfies the Poisson equation
(3.5) Δu_a(z) = Δu⁰(z), z ∈ D, and u_a|_Γ = 0.
Computationally, the Poisson equation is favored over the direct integration formula-
tion since one can profit from many numerical PDE schemes and their fast solvers.
To establish a rigorous result on the inpainting accuracy for smooth images, we
turn to the geometry of a 2-D domain encoded into its associated Green's function.
The following results on Green's functions are indeed standard. We include them
here due to the increasingly important role played by the complex potential theory
in signal and image processing. (For example, recent applications of the complex
potential theory to digital signal processing have been studied in [42, 43].)
THEOREM 3.1. Let d denote the diameter of a domain D and G(z₀, z) the associated Green's function for the Poisson equation. Then
∫∫_D G(z₀, z) dxdy ≤ d²/4 for any z₀ ∈ D.
The proof relies on two lemmas.
LEMMA 3.2 (domain monotonicity). Suppose D₁ ⊆ D₂ are two domains with grounded Green's functions G¹ and G². Then G¹(z₀, z) ≤ G²(z₀, z) for all z₀, z ∈ D₁.
Proof. Fix z₀ ∈ D₁ and set g(z) := G²(z₀, z) − G¹(z₀, z). Along ∂D₁, g = G² ≥ 0, since the grounded Green's function is always nonnegative. Moreover, g(z) is harmonic inside D₁ because the logarithm singularities at z₀ are canceled out. Therefore g(z) ≥ 0 for all z ∈ D₁ due to the maximum principle of harmonic functions: the minimum is always achieved along the boundary (Gilbarg and Trudinger [19]). This proves the lemma. □
LEMMA 3.3. Suppose B₁ is the unit disk centered at 0, and G¹(z₀, z) its Green's function. Then
∫∫_{B₁} G¹(z₀, z) dxdy = (1 − |z₀|²)/4.
Proof. The function v(z) := (1 − |z|²)/4 solves −Δv = 1 in B₁ with v|_{∂B₁} = 0, so that v(z₀) = ∫∫_{B₁} G¹(z₀, z) dxdy. (Since the Green's function of the unit disk is explicitly
G¹(z₀, z) = (1/2π) ln |(1 − z̄₀ z)/(z − z₀)|,
the lemma can also be worked out by evaluating the integral explicitly.) □
We are now ready to give a proof of Theorem 3.1.
Proof. Take any single point w ∈ D, and let B_d denote the disk centered at w and with radius d. Then
D ⊆ B_d.
Let G_d(z₀, z) denote the Green's function for B_d. Then Lemma 3.2 shows that
G(z₀, z) ≤ G_d(z₀, z), z₀, z ∈ D.
Assume for the moment that w = 0; then G_d(z₀, z) = G¹(z₀/d, z/d), where G¹, as in Lemma 3.3, is the Green's function for B₁. (This scaling law is true only for the 2-D case.) Therefore, by Lemma 3.3, for any z₀ ∈ D,
∫∫_D G(z₀, z) dxdy ≤ ∫∫_{B_d} G_d(z₀, z) dxdy = d² ∫∫_{B₁} G¹(z₀/d, z') dx'dy' = d² (1 − |z₀/d|²)/4 ≤ d²/4,
as asserted by the theorem. (The last step is due to our assumption that w = 0 ∈ D and z₀ ∈ D. If this is not the case, then simply replace z₀ and z by z₀ − w and z − w, and the proof still holds.) This completes the proof. □
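The closed form in Lemma 3.3 can be checked numerically against the explicit Green's function of the unit disk. The following is a quick verification sketch (not part of the paper); the midpoint-rule quadrature and the sample source point are illustrative choices:

```python
import math

def green_unit_disk(z0: complex, z: complex) -> float:
    """Grounded Green's function of the unit disk B1:
    G1(z0, z) = (1/(2*pi)) * ln| (1 - conj(z0)*z) / (z - z0) |."""
    return math.log(abs((1 - z0.conjugate() * z) / (z - z0))) / (2 * math.pi)

def integral_G1(z0: complex, n: int = 400) -> float:
    """Midpoint-rule approximation of the integral of G1(z0, .) over B1."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h
        for j in range(n):
            y = -1.0 + (j + 0.5) * h
            z = complex(x, y)
            if abs(z) < 1.0 and z != z0:   # the log singularity is integrable
                total += green_unit_disk(z0, z)
    return total * h * h

z0 = complex(0.3, 0.2)
est = integral_G1(z0)
exact = (1 - abs(z0) ** 2) / 4  # Lemma 3.3
print(est, exact)  # the two values agree to about three decimals
```

The estimate also stays below d²/4 = 1 for the unit disk (whose diameter is d = 2), consistent with Theorem 3.1.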
Based on this theorem, we can easily establish the accuracy orders for inpaintings based on Green's second formula.
(a) Linear inpainting via harmonic extension. Suppose we inpaint u⁰|_D simply by the harmonic extension, i.e., u_D = u_h. We now show that this is a linear inpainting scheme. Since u⁰ is smooth, there is a constant M with |Δu⁰(z)| ≤ M on D. By (3.5), the error u_a = u⁰ − u_D satisfies
u_a(z₀) = −∫∫_D G(z₀, z) Δu⁰(z) dxdy,
and therefore, by Theorem 3.1,
|u_D(z) − u⁰(z)| ≤ M d²/4 = O(d²)
for all z ∈ D. Hence harmonic inpainting is indeed a linear inpainting scheme.
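The O(d²) accuracy of harmonic inpainting can also be illustrated numerically. In the following sketch (our own illustration, not the authors' code), the smooth test image u⁰ = x³ + y³ and the Gauss–Seidel solver are arbitrary choices; the harmonic extension of the boundary data of a square hole of side d is compared against the true image:

```python
def harmonic_inpaint_error(d: float, m: int = 21, sweeps: int = 1500) -> float:
    """Harmonically inpaint the square hole D of side d centered at (0.5, 0.5)
    for the smooth test image u0(x, y) = x**3 + y**3, using Gauss-Seidel on an
    m-by-m grid over D, and return the sup-norm inpainting error."""
    u0 = lambda x, y: x ** 3 + y ** 3
    h = d / (m - 1)
    xs = [0.5 - d / 2 + i * h for i in range(m)]
    # boundary pixels carry the true data u0; the interior is unknown
    u = [[u0(xs[i], xs[j]) if i in (0, m - 1) or j in (0, m - 1) else 0.0
          for j in range(m)] for i in range(m)]
    for _ in range(sweeps):  # discrete harmonic extension
        for i in range(1, m - 1):
            for j in range(1, m - 1):
                u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                  + u[i][j - 1] + u[i][j + 1])
    return max(abs(u[i][j] - u0(xs[i], xs[j]))
               for i in range(1, m - 1) for j in range(1, m - 1))

e1, e2 = harmonic_inpaint_error(0.2), harmonic_inpaint_error(0.1)
print(e1, e2, e1 / e2)  # the ratio is close to 4, the O(d**2) rate
```

The errors stay below M d²/4 (here M = max_D |Δu⁰| = 6(x + y) evaluated over the hole), and halving d cuts the error by roughly a factor of four.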
FIG. 4.1. Harmonic inpaintings of a smooth image (u = r = √(x² + y²)) and an ideal step edge.
Human inpainters seem to have no difficulty in dealing with these two factors,
and we intend to design more realistic low-level inpainting models that can at least
imitate such functions. Thus we propose the following three inpainting principles for
the next step of model construction.
(a) Inpainting principle I. The model shall be local. Since we restrict ourselves to models which do not require global learning, the inpainting u_D must be determined solely by the image information in a neighborhood of the inpainting domain D.
FIG. 5.1. The TV inpainting model finds the best guess for u⁰|_D based on the TV norm on the extended inpainting domain E ∪ D and the noise constraint on E (Γ denotes the boundary).
(5.1) min_u R[u] = ∫∫_{E∪D} r(|∇u|) dxdy, subject to the noise constraint
(5.2) (1/Area(E)) ∫∫_E |u − u⁰|² dxdy ≤ σ².
Area(?) JE dxdy
Here,
(i) r is an appropriate real function which is nonnegative for nonnegative inputs;
(5.3) J_λ[u] = ∫∫_{E∪D} r(|∇u|) dxdy + (λ/2) ∫∫_E (u − u⁰)² dxdy,
where λ plays the role of the Lagrange multiplier for the constrained variational problem (5.1)–(5.2).
The Euler–Lagrange equation for the energy functional J_λ is
(5.4) −∇·(∇u/|∇u|) + λ_e (u − u⁰) = 0
for all z = (x, y) ∈ E ∪ D, plus the Neumann boundary condition [10, 41]. Here the extended Lagrange multiplier λ_e is given by
λ_e(z) = λ for z ∈ E, and 0 for z ∈ D.
The equilibrium can also be reached through the gradient descent flow
(5.5) ∂u/∂t = ∇·(∇u/|∇u|) + λ_e (u⁰ − u).
Since λ_e takes two different values, (5.4) or (5.5) is a two-phase problem, and the interface is the boundary Γ of the inpainting domain.
From the numerical point of view, in all of the above differential equations we replace the curvature term
(5.6) ∇·(∇u/|∇u|) by ∇·(∇u/√(|∇u|² + a²))
for some (usually small) positive lifting parameter a. This corresponds to the choice
of r(s) = √(s² + a²) for the regularizer R[u] in (5.1). We are thus actually minimizing
J_λ^a[u] = ∫∫_{E∪D} √(a² + |∇u|²) dxdy + (λ/2) ∫∫_E (u − u⁰)² dxdy.
As in most processing tasks involving thresholdings (like denoising and edge detec-
tion), the lifting parameter a also plays a thresholding role. In smooth regions where
|∇u| ≪ a, the model tries to imitate the harmonic inpainting, while along edges where |∇u| ≫ a, the model resumes the TV inpainting.
On the other hand, from the theoretical point of view, the lifting parameter a
also better conditions the TV inpainting model (5.3). In a noise-free situation, (5.3)
is reduced to a boundary value problem:
(5.7) ∇·(∇u/|∇u|) = 0, z ∈ D;  u|_∂D = u⁰|_∂D.
As explained in [4], this boundary value problem, unlike harmonic extensions, is
generally ill-posed and may fail to have or to uniquely have a solution. The parameter
a plays a conditioning role as follows. For the lifted model, the divergence term expands to
∇·(∇u/√(a² + |∇u|²)) = [(a² + u_y²) u_xx − 2 u_x u_y u_xy + (a² + u_x²) u_yy] / |∇_a u|³,
where |∇_a u| := √(a² + |∇u|²). Accordingly, the (second order) symbol of the lifted operator is
σ_a = (1/|∇_a u|³) [[a² + u_y², −u_x u_y], [−u_x u_y, a² + u_x²]],
and σ₀ denotes the symbol for the original TV model. Then it is easy to show the following.
PROPOSITION 5.1 (the conditioning effect of a). The TV symbol σ₀ has eigenvalues 0 and |∇u|^{−1}, while the lifted TV symbol σ_a satisfies
(a²/|∇_a u|³) I₂ ≤ σ_a ≤ (1/|∇_a u|) I₂.
Therefore, at each pixel away from the edges (where |∇u| is finite), the lifted TV equation is strongly elliptic; if u has a bounded gradient, then the lifted TV equation is in fact uniformly strongly elliptic. This is the conditioning effect of a.
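Proposition 5.1 can be verified numerically from the closed-form eigenvalues of the 2 × 2 symbol. The sketch below (our own check, with randomly sampled gradients) tests both bounds:

```python
import math, random

def lifted_tv_symbol_eigs(ux: float, uy: float, a: float):
    """Eigenvalues of the lifted TV symbol
    sigma_a = [[a^2 + uy^2, -ux*uy], [-ux*uy, a^2 + ux^2]] / w^3,
    where w = |grad_a u| = sqrt(a^2 + ux^2 + uy^2)."""
    w = math.sqrt(a * a + ux * ux + uy * uy)
    tr = (2 * a * a + ux * ux + uy * uy) / w ** 3
    det = a * a / w ** 4                 # det(sigma_a) = a^2 * w^2 / w^6
    disc = math.sqrt(max(tr * tr - 4 * det, 0.0))
    return (tr - disc) / 2, (tr + disc) / 2

random.seed(0)
for _ in range(1000):
    ux, uy = random.uniform(-2, 2), random.uniform(-2, 2)
    a = random.uniform(0.1, 1.0)
    lo, hi = lifted_tv_symbol_eigs(ux, uy, a)
    w = math.sqrt(a * a + ux * ux + uy * uy)
    # Proposition 5.1: a^2/|grad_a u|^3 <= sigma_a <= 1/|grad_a u|
    assert a * a / w ** 3 - 1e-5 <= lo <= hi <= 1.0 / w + 1e-5
```

In fact the two eigenvalues are exactly a²/|∇_a u|³ and 1/|∇_a u|, which is what makes the two-sided bound sharp.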
Remark 3. In the most recent work of Chan, Kang, and Shen [6], the existence
of solutions to the variational TV inpainting model (5.3) has been established in the
space of functions with bounded variations. The issue of uniqueness is also discussed
from the vision research point of view.
Remark 4. So far, the TV inpainting model has been solely motivated by the
three inpainting principles. We now further justify the TV inpainting model through
a well-known class of illusions in visual perception.
The vision phenomenon we are to discuss is best illustrated through the example
of Kanizsa's entangled woman and man, which is one of the many artistic inventions
FIG. 5.2. Can the TV inpainting model explain Kanizsa's entangled man?
of Kanizsa [24]. Its importance for the mathematical understanding and modeling
of human vision was first emphasized in Nitzberg, Mumford, and Shiota's systematic
work on disocclusion [36]. We have plotted a simplified version in Figure 5.2, which
we call "Kanizsa's entangled man."
Figure 5.2 shows how our visual perception can subconsciously contradict common
knowledge in life. What we perceive is a man entangled in the fence. Knowing by
common sense that he is behind the fence does not erase this false perception. As
Nitzberg, Mumford, and Shiota [36] wrote, "Simply put, we navigate in the world
successfully by seeing what's in front of what independently of knowing what's what."
We now apply the TV inpainting model to explain such a "stubborn best guess" by
our visual perception.
The contradiction occurs inside the circled region in Figure 5.2: the "fact" is that
the upper body of the man is behind the fence, while our perception strongly prefers
the opposite scenario. This disjunction is apparently caused by the presence of a color
shared by the fence and the man's upper body. So the puzzle is, Why does human
perception prefer to assign the controversial intersection to the upper body?
Kanizsa's original explanation was based on the modal and amodal completion
accomplished by the shortest edge continuation between T-junctions. Here we show
that the TV inpainting model offers another similar explanation. While in practice
the detection of T-junctions often relies on the sharpness of edges, our functional
approach based on the variational principle seems to be more general.
First we simplify the problem to that of the left image in Figure 5.3. The vertical and horizontal bars separately model the man's upper body and the fence. Notice the length scales L > l; in Figure 5.2, L is roughly a triple of l. Assume that the two bars share the same gray level u_b = u_f = 1/2 (with "b" and "f" tracking the "body" and "fence" variables). The uncertain region is denoted by D.
Outside D, let us make a small perturbation of the two gray levels:
u_b = 1/2 + ε,  u_f = 1/2 − ε,
for some small positive gray value ε (see the image in the right panel of Figure 5.3). Now treat D as an inpainting domain and denote by u_D the optimal solution on D obtained from the TV inpainting model with λ = ∞ (since there is no noise) and E the complement of D. A simple calculation shows that
(5.8) u_D ≡ u_b = 1/2 + ε,
FIG. 5.3. Left: the simplified geometry, with the two bars sharing the gray level u_b = u_f = 1/2 and length scales L > l. Right: the perturbed image, with u_b = 1/2 + ε and u_f = 1/2 − ε.
which coincides with our "stubborn" perception. In other words, the TV model is
consistent with the "algorithm" performed by our visual neurons.
In fact, it is easy to see that the optimal solution u_D must be a constant, say c. Then the maximum principle [9] requires that u_f ≤ c ≤ u_b. The total variation of u_D on the closure of D concentrates along the four edges and equals (Giusti [20])
(5.9) TV[u_D] = 2L (u_b − c) + 2l (c − u_f).
We do not care about the TV measure on E because it is a fixed quantity for this noise-free inpainting problem. To minimize the TV norm as given in (5.9), the only choice is c = u_b = 1/2 + ε, since L > l. This proves the claim.
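The one-dimensional minimization behind (5.8)–(5.9) can be replayed in a few lines; the values of L, l, and ε below are illustrative:

```python
def tv_on_D(c, L, l, ub, uf):
    # (5.9): two edges of length L border the body (gray u_b),
    # two edges of length l border the fence (gray u_f)
    return 2 * L * abs(ub - c) + 2 * l * abs(c - uf)

L, l, eps = 3.0, 1.0, 0.05            # L > l, as in Figure 5.2
ub, uf = 0.5 + eps, 0.5 - eps
candidates = [uf + k * (ub - uf) / 1000 for k in range(1001)]
best = min(candidates, key=lambda c: tv_on_D(c, L, l, ub, uf))
print(best)  # -> the body gray value ub = 0.55 (up to floating point)
```

Since L > l, the TV energy decreases monotonically in c over [u_f, u_b], so the minimizer is the body gray value u_b, matching our "stubborn" perception.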
5.2. Numerical implementation. If the inpainting domain D is empty, then
(5.4) and (5.5) together comprise exactly the Rudin-Osher-Fatemi [41] denoising and
deblurring restoration model. Its theoretical study can be found in Chambolle and
Lions [5]. Numerical investigations and discussions can be found in [2, 8, 15, 41],
and more recent ones in [9, 29, 38]. New applications of the TV model for restoring
nonflat image features such as optical flows and chromaticity have appeared in the
recent works of Perona [39]; Tang, Sapiro, and Caselles [45, 46]; Chan, Kang, and
Shen [7]; and Chan and Shen [10].
In this paper, we have adopted the following numerical scheme for the TV in-
painting model (5.4). Here we look for the steady solution directly, instead of by time
marching (5.5), which is usually slow due to the time step constraints imposed by
numerical stability.
FIG. 5.4. The stencil of the numerical scheme: a target pixel O, its four adjacent pixels E, N, W, S (with diagonal neighbors NE, SE, SW, NW), and the four midway points e, n, w, s.
As in Figure 5.4, at a given target pixel O, let E, N, W, S denote its four adjacent pixels, and e, n, w, s the corresponding four midway points (not directly available from the digital image). Denote the neighborhood of O by
Λ_O = {E, N, W, S}.
Let v = (v¹, v²) = ∇u/|∇u|. Then the divergence is first discretized by central differencing:
(5.10) ∇·v = ∂v¹/∂x + ∂v²/∂y
(5.11)     ≈ (v¹_e − v¹_w)/h + (v²_n − v²_s)/h,
where h denotes the grid size, which is always taken to be 1 in image processing. Next, we generate further approximations at the midway points, where image information is not directly available. Take the midpoint e, for example:
(5.12) v¹_e = (1/|∇u_e|) (∂u/∂x)|_e ≈ (1/|∇u_e|) (u_E − u_O)/h,
(5.13) |∇u_e| ≈ (1/h) √( (u_E − u_O)² + [(u_NE + u_N − u_S − u_SE)/4]² ).
Substituting these midway approximations into (5.4) yields, at each target pixel O,
(5.14) Σ_{P ∈ Λ_O} w_P (u_O − u_P) + λ_e(O) (u_O − u⁰_O) = 0,
where, with p denoting the midway point between O and its neighbor P,
(5.15) w_P := 1/|∇u_p|.
Define the filter coefficients
(5.16) h_{OP} = w_P / (Σ_{Q ∈ Λ_O} w_Q + λ_e(O)),
(5.17) h_{OO} = λ_e(O) / (Σ_{Q ∈ Λ_O} w_Q + λ_e(O)).
Then (5.14) can be rewritten as
(5.18) u_O = Σ_{P ∈ Λ_O} h_{OP} u_P + h_{OO} u⁰_O,
with
Σ_{P ∈ Λ_O} h_{OP} + h_{OO} = 1.
Equation (5.18) is in the form of a low pass filter, which is of course a system of
nonlinear equations since the filter coefficients all depend on u.
Freezing the filter coefficients (to linearize the equations), and adopting the Gauss–Jacobi iteration scheme for linear systems, at each step n, we update u^(n−1) to u^(n) by
u^(n)_O = Σ_{P ∈ Λ_O} h^(n−1)_{OP} u^(n−1)_P + h^(n−1)_{OO} u⁰_O,
where h^(n−1) = h(u^(n−1)). Since h is a low pass filter, the iterative algorithm is stable and satisfies the maximum principle [9]. In particular, the gray value interval [0, 1] is always preserved during the iterating process.
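The scheme (5.10)–(5.18) can be condensed into a short program. The following sketch (our own illustration, using the lifted weights of (5.20) and assuming a noise-free setting, so that λ_e ≡ 0 on D and the pixels outside D stay fixed) inpaints a small hole in a constant image:

```python
import math

def tv_inpaint(u, D, a=0.1, sweeps=500):
    """Gauss-Jacobi iteration of the low pass filter (5.18): only the pixels
    in the inpainting domain D are updated (lambda_e = 0 there), while the
    pixels outside D are kept fixed (a noise-free setting)."""
    def w(u, i, j, di, dj):
        # lifted 1/|grad u| at the midway point between O = (i, j)
        # and its neighbor P = (i + di, j + dj), as in (5.13) and (5.20)
        d1 = u[i + di][j + dj] - u[i][j]
        pi, pj = -dj, di                   # perpendicular offset
        d2 = (u[i + pi][j + pj] + u[i + di + pi][j + dj + pj]
              - u[i - pi][j - pj] - u[i + di - pi][j + dj - pj]) / 4.0
        return 1.0 / math.sqrt(d1 * d1 + d2 * d2 + a * a)
    for _ in range(sweeps):
        new = {}
        for (i, j) in D:
            ws = [(w(u, i, j, di, dj), u[i + di][j + dj])
                  for (di, dj) in ((0, 1), (0, -1), (1, 0), (-1, 0))]
            # (5.18) with lambda_e(O) = 0: a weighted average of the neighbors
            new[(i, j)] = (sum(wp * up for wp, up in ws)
                           / sum(wp for wp, _ in ws))
        for (i, j), v in new.items():
            u[i][j] = v
    return u

# a 9x9 image of constant gray 0.5 with a 3x3 hole (set to 0) to be inpainted
n = 9
u = [[0.5] * n for _ in range(n)]
D = [(i, j) for i in range(3, 6) for j in range(3, 6)]
for (i, j) in D:
    u[i][j] = 0.0
u = tv_inpaint(u, D)
print(max(abs(u[i][j] - 0.5) for (i, j) in D))  # -> essentially 0
```

Because each update is a convex combination of neighboring values, the iteration obeys the maximum principle, and the hole fills in toward the surrounding gray value 0.5.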
Useful variations of the algorithm can be obtained by altering the definition of w_P or |∇u_p| in (5.15). For instance, instead of (5.13), we can also try
(5.19) |∇u_e| ≈ (1/h) √( (u_E − u_O)² + [(u_NE − u_SE)/2]² ).
Experiments show that such variations sometimes work better for inpainting sharp
edges in the digital setting.
In implementation, as in (5.6), the weights w_P are "lifted" to
(5.20) w_P = 1/√( |∇u_p|² + a² )
for some small number a, to avoid a zero divisor in smooth regions. Notice that
choosing a large a brings the TV model closer to the harmonic inpainting (especially
computationally, since the spatial step size h is set to 1, and u takes values from the
finite gray-scale interval [0,1]). In addition, as a gets bigger, the convergence of the
iteration scheme speeds up.
The size of the extension domain E is also easily determined. If the image is
clean, E can simply be the boundary of the inpainting domain D. Otherwise, to
clean up statistical noise and extract reliable image information near the boundary,
one can choose E with a reasonable size, e.g., several pixels wide, as practiced in
image processing [21]. If, as for the inpainting of an old photo, the entire image
is contaminated by noise, then one should take E to be the complement of D, to
simultaneously clean and inpaint the photo.
6. Segmentation-based inpainting. The key to image inpainting is the right
model for image functions. Image models play a universally crucial role for image
restoration problems, such as image denoising, deblurring, and segmentation. In terms
of the Bayesian methodology, this is the significance of figuring out an appropriate
prior model. The link between the Bayesian approach and the variational method is
clearly explained in Mumford [32].
In the previous section, the inpainting model has been constructed based on
the total variation norm. The main merits of the total variation prior model are its
permission of edges and its convenient numerical PDE implementation. In this section,
we briefly discuss an inpainting model that is based on Mumford and Shah's [34]
object-boundary image model.
An image is considered as the union of a collection of 2-D smooth objects, which
meet each other along their edges. Thus in the variational formulation, the regularity
functional is no longer in the simple form of
R[u] = ∫∫_{E∪D} r(|∇u|) dxdy
as in (5.1). Instead, it imposes regularity conditions on both the edge curves and the individual smooth objects, as in the Mumford–Shah regularizer
R_seg[u, Γ] = (γ/2) ∫∫_{Ω\Γ} |∇u|² dxdy + β length(Γ).
For instance, for a cartoon image u that takes the gray value u₁ inside a disk of radius r₀ and u₀ outside,
TV[u] = ∫∫ |∇u| dxdy = 2π ∫ |u_r| r dr = 2π r₀ (u₁ − u₀),
where we have used the polar coordinates (r, θ). Similarly, the segmentation regularity (for the perfect segmentation) is
R_seg = β length(Γ) = 2π β r₀.
lent. This equivalence holds even for more complex and general image topology as long
as the image remains nearly a cartoon. But in terms of numerical implementation,
the TV inpainting model is much easier and faster.
7. The connectivity principle and CDD inpainting. Both TV inpainting
and segmentation-based inpainting share one drawback. That is, they both fail to
realize the so-called connectivity principle of the human disocclusion process [11]. See
Figure 7.1 for a typical case.
The example in the figure easily explains why the TV and segmentation-based
inpainting models fail to realize the connectivity principle when the inpainting scale
becomes large. Let u_dis and u_con denote the disconnected and connected inpainting reconstructions as in the figure. Suppose that l > w. Then the TV model prefers u_dis to u_con, since
TV[u_con] − TV[u_dis] = (2l − 2w)(u₁ − u₀) = 2(l − w) > 0,
assuming that the black bar has u₀ = 0 and the white background u₁ = 1. In the same fashion, under the segmentation regularity, we have
R_seg[u_con, Γ_con] − R_seg[u_dis, Γ_dis] = β(2l − 2w) = 2β(l − w) > 0.
FIG. 7.1. When l ≫ w, the TV and segmentation-based inpaintings both act against the connectivity principle of human perception: human observers mostly prefer to have the two disjoint parts connected, even when they are far apart [24, 36]. (Left: what is behind the box? Middle: the answer from most humans. Right: the answer by the TV model.)
To enforce the connectivity principle, the CDD model modifies the TV transport mechanism through a curvature driven diffusion coefficient:
(7.1) ∂u/∂t = ∇·[ G(κ, x) ∇u/|∇u| ] + λ_e (u⁰ − u), x ∈ Ω,
where κ is the scalar curvature ∇·[∇u/|∇u|]. The new ingredient of the CDD model, compared with the TV inpainting model, is the diffusion coefficient G(κ, x), which is given by
G(κ, x) = 1 for x ∈ Ω\D, and g(|κ|) for x ∈ D.
The choice of a coefficient value of 1 outside the inpainting domain indicates that
the model carries out the regular TV denoising task outside D. Meanwhile, g(s)
can be any appropriate function that penalizes large curvatures and stabilizes small
curvatures inside the inpainting domain. In Chan and Shen [11], it is argued that
g(s) must satisfy g(0) = 0 and g(+∞) = +∞. Thus, for example, one can choose g(s) = s^α for some α ≥ 1. Under this condition, the
model stretches out bent level lines inside the inpainting domain, outputs connected
objects, and therefore realizes the connectivity principle (see Figure 10.6, for example).
8. Digital zoom-in based on TV inpainting. Digital zoom-in has wide appli-
cations in digital photography, image superresolution, data compression, etc. Zoom-
out is a process of losing details or, in the framework of wavelets and multiresolution
analysis, a process of projections from fine scales to coarser ones [14, 44]. Zoom-in,
on the other hand, is the inverse problem of zoom-out and thus belongs to the gen-
eral category of image restoration problems. The literature on zoom-ins in image
processing has been growing.
One level of zoom-in from a given digital image u⁰ of size n by m is to reconstruct a new digital image u of size 2n by 2m (the factor 2 is typical but not unique), so that u⁰ can
be the one level zoom-out of u. Thus it is important to know the exact form of
the zoom-out operator. Typically, the zoom-out operator consists of two steps: a
low pass filtering (or local smooth averaging) of the fine scale image u, followed by a
subsampling process leading to the zoom-out u⁰ on a coarser grid, a scenario familiar from wavelet theory [44]. In what follows, we shall assume a direct subsampling zoom-out. That is, the filter is a Dirac δ, and thus the zoom-out is simply a restriction from a 2n by 2m grid to its n by m double-spaced subgrid.
In contrast to its utility for inpaintings on block domains, continuous modeling
becomes less appropriate for the digital setting of zoom-ins. A similar problem has
been addressed by Chan, Osher, and Shen [9] for image denoising and enhancement,
where a self-contained digital theory for TV denoising was developed and studied.
Here we follow the same framework to construct a zoom-in model, which is exactly
the digital version of the continuous TV inpainting model.
Let Ω denote the fine grid on which the zoom-in u is to be defined. The grid for the given coarse scale image u⁰ is denoted by Ω⁰, which is a subgrid of Ω. As in the practice of Markov random fields [3], assign a neighborhood system to Ω, so that each pixel α ∈ Ω has its neighborhood N_α, a collection of "nearby" pixels (excluding α itself). For example, we can assign a rectangular neighborhood system so that if α = (i, j), then N_α consists of the four pixels (i, j ± 1), (i ± 1, j).
At each pixel α, define the local variation as
|∇_α u| := √( Σ_{β ∈ N_α} (u_β − u_α)² ).
Also define the extended Lagrange multiplier λ_e as a function on the fine grid Ω:
λ_e(α) = λ for α ∈ Ω⁰, and 0 otherwise.
Then the digital TV zoom-in model attempts to minimize the digital energy J_λ over all possible fine scale images u:
(8.1) J_λ[u] = Σ_{α ∈ Ω} |∇_α u| + (1/2) Σ_{α ∈ Ω} λ_e(α) (u_α − u⁰_α)².
For the purpose of comparison, one may also try the digital harmonic zoom-in model:
(8.2) J_λ[u] = (1/2) Σ_{α ∈ Ω} |∇_α u|² + (1/2) Σ_{α ∈ Ω} λ_e(α) (u_α − u⁰_α)².
As established in [9], the minimization of the digital TV zoom-in energy can be carried out by repeatedly applying the so-called digital TV filter u → v = F(u): at each pixel α,
v_α = Σ_{β ∈ N_α} h_{αβ} u_β + h_{αα} u⁰_α,
where the exact formulae for the filter coefficients h_{αβ} depend on the input u and λ_e and are worked out in [9]. Starting with an arbitrary initial guess u^(0) for the zoom-in, we improve its quality by iterating the digital TV filter u^(n) = F(u^(n−1)). As n goes to ∞, u^(n) converges to the "best" digital zoom-in of u⁰.
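The digital TV filter iteration can be sketched as follows. The filter weights used here (symmetrized lifted local variations) are one plausible concrete choice in the spirit of [9], not the exact coefficients of that paper:

```python
import math

def digital_tv_zoom(u0, n, lam=100.0, a=1.0, sweeps=400):
    """Digital TV zoom-in on an n-by-n fine grid Omega; the coarse data u0
    lives on the double-spaced subgrid Omega0 (even i and j). The weights
    are an illustrative choice, not the exact formulae of [9]."""
    u = [[0.0] * n for _ in range(n)]
    def nbrs(i, j):
        return [(i + di, j + dj) for di, dj in ((0, 1), (0, -1), (1, 0), (-1, 0))
                if 0 <= i + di < n and 0 <= j + dj < n]
    def locvar(u, i, j):
        # lifted local variation |grad_alpha u|
        return math.sqrt(a * a + sum((u[p][q] - u[i][j]) ** 2
                                     for p, q in nbrs(i, j)))
    for _ in range(sweeps):
        lv = [[locvar(u, i, j) for j in range(n)] for i in range(n)]
        new = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                lam_e = lam if i % 2 == 0 and j % 2 == 0 else 0.0   # Omega0
                data = u0[i // 2][j // 2] if lam_e else 0.0
                ws = [(1.0 / lv[i][j] + 1.0 / lv[p][q], u[p][q])
                      for p, q in nbrs(i, j)]
                tot = sum(wv for wv, _ in ws) + lam_e
                new[i][j] = (sum(wv * uv for wv, uv in ws) + lam_e * data) / tot
        u = new
    return u

# a constant coarse image zooms in to the same constant
coarse = [[0.7] * 4 for _ in range(4)]
fine = digital_tv_zoom(coarse, 8)
print(max(abs(v - 0.7) for row in fine for v in row))  # -> essentially 0
```

Each sweep is one application of a low pass filter of the form v_α = Σ_β h_{αβ} u_β + h_{αα} u⁰_α, so the gray value range of the data is preserved throughout the iteration.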
As we have noticed, the digital TV zoom-in model (8.1) is almost identical to the
continuous TV inpainting model (5.3). The reason we prefer the self-contained digital
framework lies in the facts that it is independent of the numerical PDE schemes one
applies and always permits a solution (since we are working with finite-dimensional
data). The technical difficulty with continuous modeling is that existence is not
guaranteed, as discussed by Caselles, Morel, and Sbert [4]. The most understandable
case is when we choose the H1 regularity, analogous to the digital version (8.2).
Then in the noise-free case, the continuous model is equivalent to finding a harmonic
function u on a continuous 2-D domain Q, which interpolates the given data u? on a
finite set of pixels. But for harmonic extensions, it is a well-known ill-posed problem
to impose both the boundary condition and the 0-dimensional interior interpolation
constraint.
9. The inpainting approach to edge-based image coding. In this section,
we discuss a very interesting new application of the inpainting technique to edge-based
image coding and compression.
Ever since Marr and Hildreth [30], edge has played a crucial role in vision and
image analysis, from the classical theory of zero crossings to the more recent theory
of wavelets. In image coding, for example, the performance of a scheme is very
much determined by its reaction to edges. This viewpoint is further supported by
mainstream developments in the current wavelet theory for image coding: Donoho's
invention of curvelets and beamlets [16], Mallat's bandlets [28], and Cohen et al.'s
tree coding scheme [13].
It would be digressing too much if we tried to explore here the vast literature of
image coding and compression. Instead, we now introduce the inpainting approach
to (lossy) image coding and compression based on the edge information.
The encoding stage consists of three steps:
- (Edge detection E) Apply an edge detector (Canny's, for example) to detect
the edge collection E of a given image u⁰. E is typically a set of digital pixels
or curves, without good geometric regularities. In addition, we also demand
that the physical boundary of the entire image domain Ω belong to the edge
collection.
- (Edge tube T) Next, fixing a small constant ε, we generate the ε-neighborhood
T of the edge collection, or as we prefer to call it, an edge tube. Digitally, T
can be a 1- or 2-pixel thickening of E (see Figure 10.9).
- (Encoding) Finally, we encode the addresses of the tube pixels and use a high
bit rate to accurately code the gray values on the tube, u⁰|_T.
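The three encoding steps above can be sketched as follows. The paper's detector is Canny's; to keep the sketch self-contained, a plain gradient-magnitude threshold stands in for it here, and all names and parameter values are illustrative assumptions.

```python
import numpy as np

def encode_edge_tube(u0, thresh=0.25, tube_width=1):
    # Step 1 (edge detection E): large-gradient pixels, plus the physical
    # boundary of the image domain, as the text requires.
    gx = np.zeros_like(u0); gy = np.zeros_like(u0)
    gx[:, :-1] = np.abs(u0[:, 1:] - u0[:, :-1])
    gy[:-1, :] = np.abs(u0[1:, :] - u0[:-1, :])
    E = np.maximum(gx, gy) > thresh
    E[0, :] = E[-1, :] = True
    E[:, 0] = E[:, -1] = True
    # Step 2 (edge tube T): thicken E by tube_width pixels, a digital
    # stand-in for the epsilon-neighborhood of the edge collection.
    T = E.copy()
    for _ in range(tube_width):
        grown = T.copy()
        grown[1:, :] |= T[:-1, :]; grown[:-1, :] |= T[1:, :]
        grown[:, 1:] |= T[:, :-1]; grown[:, :-1] |= T[:, 1:]
        T = grown
    # Step 3 (encoding): keep only the tube-pixel addresses and their
    # gray values; everything off the tube is discarded.
    addresses = np.argwhere(T)
    values = u0[T]
    return T, addresses, values
```

On a simple step image, the tube hugs the jump (plus the domain boundary) while most pixels are left uncoded, which is where the compression comes from.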
This encoding scheme creates a large area of "empty seas" where the image in-
formation has been wiped out, and thus achieves a high compression rate. In the
absence of strong textures and small scale features, the edge collection consists of 1-D
piecewise smooth curves. Thus as ε tends to zero, the area of the tube T goes to
zero, which, theoretically, leads to an infinite compression ratio. Inevitably, such a
high compression ratio passes the reconstruction challenge to the decoding scheme.
Here we employ the digital TV inpainting scheme to "paint" the uncoded missing
information.
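To make the ε → 0 claim concrete, suppose the edge collection has total length L inside a unit-square image domain (both the length L and the normalization are illustrative); the tube of half-width ε then covers area roughly 2εL, so

```latex
% Illustrative back-of-the-envelope estimate, unit-square domain
|T| \approx 2\varepsilon L, \qquad
\text{compression ratio} \approx \frac{|\Omega|}{|T|}
  = \frac{1}{2\varepsilon L} \longrightarrow \infty
  \quad (\varepsilon \to 0).
```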
To decode, we apply the digital TV inpainting model to the tube T and the gray
value data u⁰|_T:

(9.1)    \min_u \sum_{\alpha \in \Omega} |\nabla_\alpha u| + \frac{1}{2} \sum_{\alpha \in \Omega} \lambda_T(\alpha)\,(u_\alpha - u^0_\alpha)^2,

where \lambda_T(\alpha) = \lambda for \alpha \in T, and \lambda_T(\alpha) = 0 for \alpha \in \Omega \setminus T.
1038 TONY F. CHAN AND JIANHONG SHEN
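A sketch of the decoder follows, assuming the digital TV filter of section 8 run with the tube multiplier λ_T: off the tube λ_T = 0, so those pixels are filled purely by TV-weighted neighbor averaging, while on the tube the coded gray values act as the data term. The weight formula, the regularization a, and all names are illustrative assumptions, not the exact scheme of [9].

```python
import numpy as np

def tv_inpaint_decode(u0, T, lam=50.0, a=0.1, n_iter=400, seed=0):
    # u0: gray values (only its values on T are used as data);
    # T: boolean tube mask; lambda_T is lam on T and 0 on Omega\T.
    rng = np.random.default_rng(seed)
    u = np.where(T, u0, rng.random(u0.shape))   # random guess off the tube
    lam_T = np.where(T, lam, 0.0)
    for _ in range(n_iter):
        # regularized gradient magnitude |grad u|_a (forward differences)
        gx = np.zeros_like(u); gy = np.zeros_like(u)
        gx[:, :-1] = u[:, 1:] - u[:, :-1]
        gy[:-1, :] = u[1:, :] - u[:-1, :]
        r = 1.0 / np.sqrt(gx**2 + gy**2 + a**2)   # 1/|grad u|_a per pixel
        num = lam_T * u0
        den = lam_T.copy()
        for axis, s in ((0, 1), (0, -1), (1, 1), (1, -1)):
            w = r + np.roll(r, s, axis=axis)       # w_{alpha beta}
            # zero the wrapped-around row/column at the image boundary
            edge = 0 if s == 1 else -1
            if axis == 0:
                w[edge, :] = 0.0
            else:
                w[:, edge] = 0.0
            num += w * np.roll(u, s, axis=axis)
            den += w
        u = num / den                              # Jacobi-style filter sweep
    return u
```

With a constant image coded only on a border tube, the interior is recovered from the random initial guess by the averaging alone, mirroring the "empty seas" reconstruction described above.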
10. Applications of inpainting. For all the inpainting examples of this section,
the inpainting domains are given to the algorithm and are initially painted with
random guesses, for both the iterative filtering algorithm and the time marching
scheme.
10.1. Inpainting a noisy step edge and occluded bars. See Figures 10.1
and 10.2. In the first example, a noisy step edge has been inpainted faithfully by the
TV inpainting model. For the second, the occluded bars are recovered as expected.
FIG. 10.1. Inpainting a noisy edge (10.1): the noisy input image (SNR = 20:1) and the TV inpainting.
FIG. 10.2. Inpainting occluded bars (10.1): the occluded black and white bars and the TV disocclusion.
10.3. Inpainting a noisy scratched photo. See Figure 10.4. The image rep-
resents the scanned noisy data of an old scratched photo. As promised, the TV
inpainting model can simultaneously denoise the available part of the photo and fill
in the missing features. This is the beauty of the TV inpainting: in both the model
and algorithm, denoising and inpainting are coherently integrated.
FIG. 10.3. Inpainting two disks (10.2). FIG. 10.4. Inpainting a noisy face (10.3): the scratched input and the image after inpainting.
FIG. 10.5. TV for text removal (10.4): the original complete image, the mask for the inpainting domain, the domain initially filled in with a random guess, and the TV output. FIG. 10.6. CDD for text removal (10.4): the output from the CDD inpainting.
10.4. Removal of thick text. See Figure 10.5. The text string "Lake & Me"
has been removed, and the original features occluded by these letters are inpainted.
Note that the black rim around the right arm of the T-shirt is not successfully restored
by the TV inpainting. The "failure" is due to the scale factor discussed in section 2.
The inpainting scale (i.e., the width of a letter in this case) is larger than that of
the feature (i.e., the black rim). In Figure 10.6, we have applied the CDD inpainting
scheme (section 7) to the same image. For CDD, the connectivity principle is enforced,
and therefore the broken rim segments are indeed connected.
10.5. Removal of dense text. See Figure 10.7. The dense text strings have
been successfully removed. We feel that this is a very promising application since
(a) such problems are typically local due to the small size of the letters, and (b)
the number of letters and the complexity of their shapes are well handled by the
TV inpainting algorithm since they are easily encoded into the extended Lagrange
multiplier λ_e.
10.6. Digital zoom-in. See Figure 10.8. We apply both the digital TV zoom-in
(8.1) and harmonic zoom-in (8.2) to the test image "Lamp" from the image bank
of Caltech's Computational Vision Group. It is clear that the TV zoom-in model
produces much better visual output in terms of edge sharpness and boundary regularity.
10.7. Edge decoding by inpainting. In Figure 10.9, we show an example of
the inpainting approach for image decoding based on edge information. The edge
detector we have employed belongs to Canny, which is now a standard MATLAB
built-in function. The thickening width described in the previous section is one pixel.
This highly lossy coding scheme certainly loses some details of the original image. But
remarkably, it faithfully captures the most essential visual information of the image.
Acknowledgments. We owe enormous gratitude to Dr. Marcelo Bertalmio for
his generosity in sharing his new work with us through his inspiring talk at UCLA. The
present paper would have been absolutely impossible without our initially reading
his paper [1]. In addition, the second author would like to thank Professor David
Mumford for his encouragement. The authors are also very grateful for inspiration
from Professors Stan Osher, Peter Olver, Fadil Santosa, Robert Gulliver, and Dan
Kersten during the manuscript revision period.
REFERENCES
[29] A. MARQUINA AND S. OSHER, A new time dependent model based on level set motion for nonlinear deblurring and noise removal, in Scale-Space Theories in Computer Vision, Lecture Notes in Comput. Sci. 1682, M. Nielsen, P. Johansen, O. F. Olsen, and J. Weickert, eds., Springer-Verlag, New York, 1999, pp. 429-434.
[30] D. MARR AND E. HILDRETH, Theory of edge detection, Proc. Royal Soc. London B, 207 (1980), pp. 187-217.
[31] S. MASNOU AND J.-M. MOREL, Level-lines based disocclusion, in Proceedings of the 5th IEEE International Conference on Image Processing, Chicago, IL, 1998, pp. 259-263.
[32] D. MUMFORD, The Bayesian rationale for energy functionals, in Geometry Driven Diffusion in Computer Vision, Kluwer Academic, Norwell, MA, 1994, pp. 141-153.
[33] D. MUMFORD, Empirical investigations into the statistics of clutter and the mathemati-
cal models it leads to, a lecture for the review of ARO, 1999; also available online at
www.dam.brown.edu/people/mumford/research-new.html.
[34] D. MUMFORD AND J. SHAH, Optimal approximations by piecewise smooth functions and asso-
ciated variational problems, Comm. Pure Appl. Math., 42 (1989), pp. 577-685.
[35] R. NEVANLINNA, Analytic Functions, Springer-Verlag, New York, 1970.
[36] M. NITZBERG, D. MUMFORD, AND T. SHIOTA, Filtering, Segmentation, and Depth, Lecture Notes in Comput. Sci. 662, Springer-Verlag, Berlin, 1993.
[37] S. OSHER AND J. A. SETHIAN, Fronts propagating with curvature-dependent speed: Algorithms
based on Hamilton-Jacobi formulations, J. Comput. Phys., 79 (1988).
[38] S. OSHER AND J. SHEN, Digitized PDE method for data restoration, in Handbook of Analytic-
Computational Methods in Applied Mathematics, Chapman and Hall/CRC Press, Boca
Raton, FL, 2000, pp. 751-771.
[39] P. PERONA, Orientation diffusion, IEEE Trans. Image Process., 7 (1998), pp. 457-467.
[40] L. RUDIN AND S. OSHER, Total variation based image restoration with free local constraints, in
Proceedings of the 1st IEEE International Conference on Image Processing, Austin, TX,
1994, pp. 31-35.
[41] L. RUDIN, S. OSHER, AND E. FATEMI, Nonlinear total variation based noise removal algorithms,
Phys. D, 60 (1992), pp. 259-268.
[42] J. SHEN AND G. STRANG, The asymptotics of optimal (equiripple) filters, IEEE Trans. Signal
Process., 47 (1999), pp. 1087-1098.
[43] J. SHEN, G. STRANG, AND A. J. WATHEN, The potential theory of several intervals and its
applications, Appl. Math. Optim., 44 (2001), pp. 67-85.
[44] G. STRANG AND T. NGUYEN, Wavelets and Filter Banks, Wellesley-Cambridge Press, Wellesley,
MA, 1996.
[45] B. TANG, G. SAPIRO, AND V. CASELLES, Color Image Enhancement via Chromaticity Diffusion,
Technical report, ECE-University of Minnesota, Minneapolis, MN, 1999.
[46] B. TANG, G. SAPIRO, AND V. CASELLES, Direction diffusion, in Proceedings of the International Conference on Computer Vision, to appear.
[47] S. WALDEN, The Ravished Image, St. Martin's Press, New York, 1985.
[48] L.-Y. WEI AND M. LEVOY, Fast Texture Synthesis Using Tree-Structured Vector Quantization,
Preprint, Computer Science, Stanford University, Stanford, CA, 2000; also in Proceedings
of SIGGRAPH 2000.