A Hybrid Approach To Clouds and Shadow Removal in Satellite Images
This work aims to overcome a common problem in many satellite images: the presence of undesirable
atmospheric components, such as clouds and shadows, at the time of scene capture. The presence of such
elements can hinder the identification of image objects, the monitoring of urban and environmental areas, and
subsequent steps of digital image processing, such as segmentation and classification, which are primarily
responsible for extracting information for the user. Thus, this work presents a new hybrid approach for the
detection, removal and replacement of these elements in satellite images. The approach proposes a region
decomposition method that uses a nonlinear median filter to map structure and texture regions, to which
inpainting by DCT-based smoothing and exemplar-based texture synthesis are applied, respectively. Finally,
the effectiveness of this technique was verified through a qualitative evaluation, while a discussion about
quantitative analysis is also presented.
area. To this end, the statistical measures of image average value and standard deviation are computed and the classification becomes possible.

Besides, (Siravenha 2011) added the capability of shadow detection by introducing two constants, called cc and sc, which increase the algorithm flexibility. The equation that describes this operation is

m(x, y) =
  0, if f(x, y) < sc × f_av−sd,
  1, if f_av−sd < f(x, y) < f_av,
  2, if f_av < f(x, y) < cc × f_av+sd,
  3, if f(x, y) > cc × f_av+sd,     (1)

where f(x, y) is the pixel value, f_av is the average value of the image pixels, f_av+sd is the sum of the average pixel value and the standard deviation of the image, and f_av−sd is the subtraction of the standard deviation from the average value.

The region labeled as 0 represents a shadow region, the label 1 means a region not affected by atmospheric interference, regions labeled as 2 represent thin clouds, and dense clouds are labeled as 3. For images with multiple bands these labels are assigned if and only if the rule is valid for all bands.

To complete this process, a morphological opening operation is applied, which aims to remove very small objects that could cause mistakes in the following steps.
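As a rough illustration of the detection rule in Eq. 1 and of the morphological clean-up, the following Python/NumPy sketch is one possible reading of this step; the single-band input, the default values of cc and sc, and the structuring-element size are illustrative assumptions, not parameters given in the text.

```python
import numpy as np
from scipy import ndimage

def detect_clouds_shadows(band, cc=1.0, sc=1.0, min_size=3):
    """Label pixels as 0 (shadow), 1 (clear), 2 (thin cloud), 3 (dense cloud).

    A sketch of Eq. 1; `cc`, `sc` and `min_size` are illustrative values only.
    """
    f_av = band.mean()
    f_sd = band.std()
    f_av_sd_minus = f_av - f_sd           # f_av-sd
    f_av_sd_plus = f_av + f_sd            # f_av+sd

    m = np.ones_like(band, dtype=np.uint8)              # default: unaffected region
    m[band < sc * f_av_sd_minus] = 0                    # shadow
    m[(band > f_av) & (band < cc * f_av_sd_plus)] = 2   # thin cloud
    m[band > cc * f_av_sd_plus] = 3                     # dense cloud

    # Morphological opening on the affected mask to drop very small objects.
    affected = (m != 1)
    opened = ndimage.binary_opening(affected,
                                    structure=np.ones((min_size, min_size)))
    m[affected & ~opened] = 1                           # tiny regions back to "clear"
    return m
```

For multi-band images, the text states that a label is kept only when the rule holds for every band, which could be expressed by intersecting the per-band masks produced by a function like this one.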
2.2 Inpainting by smoothing based on multidimensional DCT

This method was proposed by (Garcia 2010) and, as in (Bertalmio, Sapiro, Caselles, and Ballester 2000), is based on the propagation of information by smoothing, from the regions surrounding the one where the data need to be redefined. The specificity of this approach is the use of the Discrete Cosine Transform (DCT) and its inverse to simplify the solution of the linear systems, producing an efficient smoothing.

In statistics and data analysis, smoothing is used to reduce experimental noise or small-scale information while keeping the most important marks of a data set. Consider the following model for a one-dimensional noisy signal y, given in Eq. 2:

y = ŷ + ε,     (2)

where ε represents Gaussian noise with zero mean and unknown variance, and ŷ is the so-called smoothed signal, i.e., it has continuous derivatives up to some order (usually ≥ 2) throughout the domain. The smoothing of y depends on the best estimate of ŷ, and this operation is usually performed by parametric or nonparametric regression.

A classic approach to smoothing is penalized least squares regression. This technique consists in minimizing a criterion that balances fidelity to the data, measured by the residual sum-of-squares (RSS), against a penalty term P, which reflects the robustness of the smoothed data. A simple and straightforward way to express the robustness is the second-order divided difference, which produces a one-dimensional array of data.

Now, using the RSS and the second-order divided difference, the minimization of F(ŷ) results in the linear system expressed in Eq. 3, which allows the determination of the smoothed data:

(I_n + s DᵀD) ŷ = y,     (3)

where I_n is the n × n identity matrix, s is a positive real scalar that controls the degree of smoothing (as it increases, the degree of smoothing of ŷ increases too), and Dᵀ represents the transpose of D.

Eq. 3 can be solved using the so-called matrix left division applied to sparse matrices. Solving this linear system, however, can be time consuming for a large amount of data. The algorithm can be greatly simplified and accelerated if the data are evenly spaced, a situation that occurs in images, where pixels are equally spaced, resulting in the following equation for multidimensional data:

ŷ_{k+1} = IDCT_N(Γ_N ∘ DCT_N(y_k)),     (4)

where DCT_N and IDCT_N refer to the N-dimensional DCT and its inverse, respectively, k is the iteration number, N is the number of dimensions, ∘ is the Schur (element-by-element) product, and Γ_N represents a tensor of rank N defined by

Γ_N = 1_N ÷ (1_N + s Λ_N ∘ Λ_N).     (5)

Here the operator ÷ symbolizes element-by-element division, and 1_N is a tensor of rank N composed of 1's. Λ_N is the following tensor of rank N (Buckley 1994):

(Λ_N)_{i_1,...,i_N} = Σ_{j=1}^{N} ( −2 + 2 cos((i_j − 1)π / n_j) ),     (6)

where n_j denotes the size of Λ_N along the j-th dimension.

It can also be seen that, when there are undefined values in the image, the smoothing is also responsible for the interpolation of the data, functioning as an inpainting method. In order to accelerate the convergence, the process starts by performing a nearest neighbor interpolation on the image to be restored.
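A compact way to read Eqs. 4–6 is the iterative sketch below. It assumes SciPy's scipy.fft.dctn/idctn for the N-dimensional transforms; the fixed smoothing parameter s, the iteration count and the simple reset of the known pixels at each step are illustrative simplifications, not the full procedure of (Garcia 2010), which also uses weights and an automatic choice of s.

```python
import numpy as np
from scipy.fft import dctn, idctn
from scipy import ndimage

def dct_inpaint(y, mask, s=1.0, n_iter=100):
    """Smooth/inpaint the 2D array `y` where `mask` is True, iterating Eq. 4."""
    n1, n2 = y.shape
    # Lambda tensor of Eq. 6 (i_j - 1 runs over 0..n_j-1) and Gamma tensor of Eq. 5.
    lam = (-2 + 2 * np.cos(np.pi * np.arange(n1) / n1))[:, None] \
        + (-2 + 2 * np.cos(np.pi * np.arange(n2) / n2))[None, :]
    gamma = 1.0 / (1.0 + s * lam ** 2)

    # Start from a nearest-neighbour fill of the missing pixels (faster convergence).
    idx = ndimage.distance_transform_edt(mask, return_distances=False,
                                         return_indices=True)
    yk = y[tuple(idx)]

    for _ in range(n_iter):
        smoothed = idctn(gamma * dctn(yk, norm='ortho'), norm='ortho')
        # Keep the known pixels, update only the missing ones (inpainting behaviour).
        yk = np.where(mask, smoothed, y)
    return yk
```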
2.3 Texture Synthesis

Texture synthesis has been an intensive field of study because of the variety of its applications.
It can be applied, for example, in object fill-in tasks, image recovery, video compression and foreground removal.

Let us define texture as a visual pattern in a 2-D infinite plane which, at some scale, has a stationary distribution; naturally, one can obtain a finite sample of one of the textures present in this plane in order to synthesize other samples from the same texture. This finite sample could have been extracted from uncountably many different textures, so this is an ill-posed situation. To get around this, the assumption is that the sample is large enough to capture the stationary textural distribution of the image and that the scale of the texture elements is known (Efros and Leung 1999). Furthermore, texture synthesis is responsible for joining continuous regions, ensuring visual quality, when it is applied to fill in border regions.

The approach proposed by (Criminisi, Pérez, and Toyama 2004) aims to remove or redefine large objects in a digital image using neighborhood information. This method uses texture synthesis to fill in regions that contain two-dimensional textural patterns with moderate stochasticity. For this, it generates new texture samples from a source image and makes a simple copy to the target areas.

Figure 1: Model based texture synthesis: (a) Original image. (b) Ψp fragment centered on p ∈ Φ. (c) The most probable candidates Ψq′ and Ψq′′. (d) The most probable candidate is propagated to the target fragment.

It can be noted in Fig. 1 that the texture, as well as the structure (the line that separates the light and dark gray regions), is propagated to the Ψp fragment. The filling order is influenced by the linear structures adjacent to the target region. Thus, the model-based texture synthesis algorithm with propagation along the isophote directions of the image presents efficient and qualitatively good performance while respecting the restrictions imposed by the linear structures.

2.3.1 Fill in regions algorithm

Taking a source image with a target region Ω to be redefined and a source region Φ, which can be expressed as the subtraction of the target region from the image f (Φ = f − Ω), one must define the window size of the template, called Ψ. It is very common to use a window of dimension 9 × 9, but it is recommended that the size be larger than the biggest distinguishable textural element in the source region.

In this algorithm (Criminisi, Pérez, and Toyama 2004), each pixel maintains a color value (or NaN, if it is an undefined pixel to be filled) and a confidence value, which reflects the confidence in the color value once the pixel is filled. During the algorithm execution, the fragments located on the contour δΩ receive a temporary priority value, defining the order in which they will be filled. Hence, an iterative process proceeds in the following sequence:

1) Computing fragment priorities: Because the texture synthesis works with priorities, the so-called best-first strategy fills the regions according to the priority levels and tends toward regions that a) are on strong continuity borders or b) are surrounded by high-confidence pixels.

Figure 2 represents an image to be processed; given a fragment Ψp, np is the normal to the contour δΩ of the target region and ∇I⊥p is the isophote at point p. The isophote represents the direction and intensity at that point.
C(p) = ( Σ_{q ∈ Ψp ∩ (f − Ω)} C(q) ) / |Ψp|   and   D(p) = |∇I⊥p · np| / α,     (7)

where |Ψp| is the total area of Ψp, α is a normalization factor (255 in typical applications with gray-scale images), np is a unit vector orthogonal to δΩ at p, and ⊥ is the orthogonal operator. For each border fragment a priority P(p) = C(p)D(p) is computed, and every distinct pixel represents a fragment on the border of the target region. During the initialization, C(p) is taken equal to 0 for every point p in the target region and equal to 1 for every point in the source region.

The confidence represents the measure of trusted information surrounding a pixel. Thus, the algorithm aims to fill first the most reliable fragments, including those with more already-redefined pixels or fragments whose pixels were never part of the target region.

The D(p) term measures the strength of the isophotes hitting δΩ at every iteration and is responsible for increasing the priority of fragments crossed by linear structures, leading to a safer filling.
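To make the priority step concrete, a minimal sketch of how C(p), D(p) and the resulting priority P(p) = C(p)D(p) of Eq. 7 could be computed for one pixel of the fill front is given below; the finite-difference gradients, the normal estimated from the mask and the 9 × 9 window (half-width 4) are illustrative assumptions.

```python
import numpy as np

def patch_slice(p, half, shape):
    """Indices of the square fragment Psi_p centered at p, clipped to the image."""
    r, c = p
    return (slice(max(r - half, 0), min(r + half + 1, shape[0])),
            slice(max(c - half, 0), min(c + half + 1, shape[1])))

def priority(p, image, fill_mask, confidence, half=4, alpha=255.0):
    """P(p) = C(p) * D(p) for a (row, col) pixel p on the fill front (Eq. 7).

    `fill_mask` is True inside the target region Omega; `confidence` holds C.
    """
    sl = patch_slice(p, half, image.shape)
    # Confidence term: summed confidence of known pixels, normalized by |Psi_p|.
    known = ~fill_mask[sl]
    C = confidence[sl][known].sum() / float(confidence[sl].size)

    # Data term: |isophote . normal| / alpha, using simple finite differences.
    gy, gx = np.gradient(image.astype(float))
    isophote = np.array([-gx[p], gy[p]])        # gradient rotated by 90 degrees
    my, mx = np.gradient(fill_mask.astype(float))
    n = np.array([my[p], mx[p]])
    n = n / (np.linalg.norm(n) + 1e-8)          # unit normal to delta-Omega
    D = abs(isophote @ n) / alpha

    return C * D
```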
2) Structure and texture information propagation: Once the priorities are computed, the highest-priority fragment Ψp̂ is found and filled with the information extracted from the fragment of the source region Φ most similar to it. Formally,

Ψq̂ = arg min_{Ψq ∈ Φ} d(Ψp̂, Ψq),     (8)

where the distance d(·, ·) between two fragments Ψa and Ψb is defined as the Sum of Squared Differences (SSD) of the pixels that contain information in these fragments (possibly already-filled pixels). The ideal fragment to fill a region is the one that minimizes the SSD.

Having found the source fragment, every pixel value p′ ∈ Ψp̂ ∩ Ω is copied from its corresponding position inside Ψq̂.

3) Updating the confidence values: After Ψp̂ receives the new values, its confidence values are redefined as C(p) = C(p̂) ∀p ∈ Ψp̂ ∩ Ω.

This simple update rule allows measuring the relative confidence even without any specific image information. It is expected that the confidence values decay during the filling process, which indicates that there is less assurance about the color values of the pixels near the center of the target region.
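The best-exemplar search of Eq. 8 and the copy and confidence update of steps 2 and 3 can be sketched as follows for a grayscale image. This is a naive exhaustive scan meant only to illustrate the SSD criterion; the function and variable names are hypothetical, and a practical implementation would restrict or accelerate the candidate search.

```python
import numpy as np

def best_exemplar(image, fill_mask, target_sl):
    """Return the top-left corner of the source patch minimizing the SSD (Eq. 8)."""
    th = target_sl[0].stop - target_sl[0].start
    tw = target_sl[1].stop - target_sl[1].start
    target = image[target_sl].astype(float)
    known = ~fill_mask[target_sl]               # only known pixels enter the SSD

    best, best_ssd = None, np.inf
    H, W = image.shape[:2]
    for r in range(H - th + 1):
        for c in range(W - tw + 1):
            if fill_mask[r:r + th, c:c + tw].any():   # candidate must lie in Phi
                continue
            cand = image[r:r + th, c:c + tw].astype(float)
            ssd = ((target[known] - cand[known]) ** 2).sum()
            if ssd < best_ssd:
                best, best_ssd = (r, c), ssd
    return best

def fill_patch(image, fill_mask, confidence, target_sl, src_corner, c_hat):
    """Copy missing pixels from the chosen exemplar and update the confidence."""
    r, c = src_corner
    th = target_sl[0].stop - target_sl[0].start
    tw = target_sl[1].stop - target_sl[1].start
    missing = fill_mask[target_sl]
    image[target_sl][missing] = image[r:r + th, c:c + tw][missing]
    confidence[target_sl][missing] = c_hat      # C(p) = C(p_hat) for filled pixels
    fill_mask[target_sl][missing] = False
```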
2.4 Image Decomposition

The method proposed by (Vese and Osher 2002) decomposes an image into two sub-images, each representing the structure or the texture component; thus, a better redefinition of the image can be made. On the structured part, the technique of inpainting based on the DCT should be applied; on the other hand, texture portions and heterogeneous areas are suitable for texture synthesis.

The generalized model is defined as f = u + v, where f is the input image, u is the structure image and v is the texture image. So, given these sub-images, one can reconstruct the original image. However, in practice it is observed that the original image can only be approximately reconstructed, although the algorithm presented by the authors yields results with a very small reconstruction error. The goal of the method is to obtain a structure image u that preserves all edges considered strong and has smoothed interior regions, and an image v that contains all the texture and noise information.

The method used to construct the structure image u is based on the assumption that u is a 2D function and on the attempt to minimize this function in the space of all bounded variation (BV) functions. Functions in the BV space are functions whose total variation is limited by some constant value less than infinity. Minimizing u in the BV space ensures that the resulting image is stable and without infinite values at any point. It should be noted, however, that this space allows functions which have very large (although non-infinite) derivatives, thereby ensuring that the strong edges are preserved.

Keeping in mind the intuition described above, the minimization problem should logically have two terms. One term is the fidelity term, responsible for keeping the difference between f and u small; it ensures that the data of the input image are kept in the result. The second term implies a smoothing over u, although not necessarily over all components of u. The minimization is computed as

F(u) = ∫∫ |∇u| dxdy + λ ∫∫ |f − u|² dxdy,  u ∈ BV.

In the above equation, the second term is the data term, the first term is a regularization term that ensures a relatively smooth image, and λ is a tuning parameter. As can be seen, this only seeks to find the optimal u and ignores the v image. The reason for this is that in previous work the authors had considered the v image to be noise, and therefore to be discarded.

As illustrated in (Brennan 2007), there exists a unique solution to this optimization problem, and methods exist for finding it. Noting that v = f − u, it is possible to easily modify the above equation to incorporate v:

F(u) = ∫∫ |∇u| dxdy + λ ∫∫ ‖v‖² dxdy,  u ∈ BV,     (9)

which yields the Euler–Lagrange equation u = f + (1/(2λ)) div(∇u/|∇u|). Solving for v: v = f − u = −(1/(2λ)) div(∇u/|∇u|). At this point it is useful to break v into
its x and y components, respectively, which will be denoted g1 and g2, where

g1 = −(1/(2λ)) ∂x u / |∇u|   and   g2 = −(1/(2λ)) ∂y u / |∇u|.     (10)

This allows us to write v as v = div g⃗, where g⃗ = (g1, g2). It can be seen that √(g1² + g2²) = 1/(2λ), so that ‖g⃗‖ = 1/(2λ). This allows us to rewrite v as

v(x, y) = div g⃗ = ∂x g1(x, y) + ∂y g2(x, y).     (11)

This now leads to the final minimization problem:

G(u, g1, g2) = ∫∫ |∇u| dxdy + λ ∫∫ |f − u − ∂x g1 − ∂y g2|² dxdy + µ ∫∫ √(g1² + g2²) dxdy,  u ∈ BV.     (12)

Solving the above minimization problem yields the Euler–Lagrange equations:

u = f − ∂x g1 − ∂y g2 + (1/(2λ)) div(∇u/|∇u|),     (13)

µ g1 / √(g1² + g2²) = 2λ [ ∂(u − f)/∂x + ∂²xx g1 + ∂²xy g2 ],     (14)

µ g2 / √(g1² + g2²) = 2λ [ ∂(u − f)/∂y + ∂²xy g1 + ∂²yy g2 ].     (15)

2.4.1 Implementation proposed in this work

Although it served as the basis, the decomposition and addition explained above were not used in this study, since that approach results in considerable error at the image reconstruction and also because of the difficulty of establishing appropriate parameters for an acceptable image decomposition. Instead, a strategy for mapping the structure and texture areas of an image was used.

This process begins by transforming the component that contains the texture information, i.e., the image v, into a binary image with value 1 for heterogeneous texture areas and 0 for structure areas. Then a nonlinear median filter is applied to homogenize (smooth) areas where small gaps of a given feature are surrounded by predominant regions of another sort. This step is critical due to the presence of clouds and shadows in the image, and it is performed in order to correctly define the technique to be employed for each region. This happens because clouds and shadows will always be structure components, so to define which technique to use to remove them one must observe the surrounding regions. Therefore, as a result of the application of the filter, the regions to be redefined are mapped in the binary image as texture or structure regions, and texture synthesis or inpainting, respectively, is finally applied to the input image.
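The mapping described in Section 2.4.1 can be read as the sketch below: the binary texture/structure image is smoothed with a median filter and each pixel to be redefined is then routed to one of the two redefinition techniques. The filter window size and the assumption that the detection labels from Section 2.1 are available as an array are illustrative choices, not values given in the text.

```python
import numpy as np
from scipy import ndimage

def map_regions(texture_mask, detection_labels, filter_size=7):
    """Split the detected cloud/shadow pixels into structure and texture targets.

    `texture_mask` is the binary image (1 = texture/heterogeneous, 0 = structure);
    `detection_labels` is the output of the detection step (Section 2.1);
    `filter_size` is an illustrative window for the nonlinear median filter.
    """
    to_redefine = (detection_labels != 1)       # shadows, thin and dense clouds

    # The median filter homogenizes small gaps surrounded by a predominant region.
    smoothed = ndimage.median_filter(texture_mask.astype(np.uint8),
                                     size=filter_size)

    # Structure regions -> DCT-based inpainting; texture regions -> texture synthesis.
    structure_targets = to_redefine & (smoothed == 0)
    texture_targets = to_redefine & (smoothed == 1)
    return structure_targets, texture_targets
```

The returned masks would then be passed, respectively, to a DCT-based inpainting routine such as the sketch in Section 2.2 and to an exemplar-based filling loop built from the priority and SSD sketches in Section 2.3.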
3 Results and Discussions

A major problem in the literature is the lack of quantitative evaluation methods for inpainting algorithms. Throughout this work, the evaluation metrics PSNR (local and global), Kappa and SAD (Sum of Absolute Differences) were tested. It was concluded that none of them is appropriate for evaluating different approaches to redefining regions. For example, certain results of the hybrid approach, and even of texture synthesis alone, visually perform a region filling which appears more consistent than those resulting from inpainting (which sometimes shows large blurs). However, when the quantitative evaluations cited above are made, it is common for the inpainting approach to obtain better results.

According to (Taschler 2006), the only explanation for this discrepancy is that texture synthesis can reach results more faithful to the goal, but some elements are not located at the corresponding position in the reference image; i.e., if the difference is of one or two pixels, it leads to a lower PSNR value for the entire region. Another explanation is that these metrics and the inpainting operate pixel-by-pixel, while texture synthesis commonly operates block-by-block, which puts the latter at a disadvantage in the evaluation.
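For reference, the global and local variants of the metrics mentioned above can be computed as in the short sketch below; the 8-bit peak value and the restriction of the "local" variant to the mask of redefined pixels are assumptions about how the measurements were taken.

```python
import numpy as np

def psnr(reference, result, mask=None, peak=255.0):
    """Global PSNR, or 'local' PSNR when restricted to the redefined pixels."""
    ref = reference.astype(float)
    res = result.astype(float)
    if mask is not None:                  # local variant: only redefined pixels
        ref, res = ref[mask], res[mask]
    mse = np.mean((ref - res) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def sad(reference, result, mask=None):
    """Sum of Absolute Differences between the reference and the filled image."""
    diff = np.abs(reference.astype(float) - result.astype(float))
    return diff[mask].sum() if mask is not None else diff.sum()
```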
With regard to the qualitative assessment, Fig. 3 (a) shows an image affected by the presence of dense clouds and shadows over texture (urban) and structure (dense vegetation) areas. Fig. 3 (b) illustrates a black mask containing the regions to be redefined. Fig. 3 (c) and (d) show the results of inpainting and texture synthesis, respectively; as discussed above, blurs generated by the smoothing are visible in the urban areas, as well as the erroneous replacement performed by the texture synthesis in areas of dense vegetation. From these results, it was decided to apply the hybrid approach. Fig. 3 (e) shows the binary image obtained after the median filter, containing texture (white) and structure (black) regions. From this mapping, Fig. 3 (f) shows the results of the hybrid approach, where the union of the advantages of the techniques applied to their suitable regions becomes clear, thereby overcoming the other methods.

Figure 3: Process of clouds and shadows removal: (a) Original image. (b) Mask from regions detection. (c) Inpainting result. (d) Texture Synthesis result. (e) Binary image representing the proposed image decomposition. (f) Proposed hybrid approach results.

4 Conclusion

This work aimed to present a new way to perform a hybrid approach for the detection, removal and replacement of cloud and shadow areas in satellite images. The approach proposes a region decomposition
method that uses a nonlinear median filter to map structure and texture regions, to which inpainting by DCT-based smoothing and exemplar-based texture synthesis are applied, respectively.

In the qualitative evaluation it was evident that the hybrid approach outperforms the use of the techniques separately. In the quantitative tests it was not possible to make a fair assessment, due to the inadequacy of the various metrics for evaluating different approaches to redefining regions. We are in fact interested in using quantitative approaches with a more reasonable justification, using not only information about pixel values, but also context, shape and other attributes that are closer to the subjective evaluation performed by human eyes.

REFERENCES

Bertalmio, M., G. Sapiro, V. Caselles, and C. Ballester (2000). Image inpainting. In Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, pp. 417–424.

Brennan, S. (2007). EE264 project: Simultaneous structure and texture image inpainting.

Buckley, M. (1994). Fast computation of a discretized thin-plate smoothing spline for image data. Biometrika 81, 247–258.

Bugeau, A. and M. Bertalmio (2009). Combining texture synthesis and diffusion for image inpainting. In A. Ranchordas and H. Araújo (Eds.), VISAPP 2009 - Proceedings of the Fourth International Conference on Computer Vision Theory and Applications, Lisboa, Portugal, February 5-8, 2009 - Volume 1, pp. 26–33. INSTICC Press.

Criminisi, A., P. Pérez, and K. Toyama (2004). Region filling and object removal by exemplar-based image inpainting. IEEE Transactions on Image Processing 13(9), 1200–1212.

Efros, A. and T. Leung (1999). Texture synthesis by non-parametric sampling. In International Conference on Computer Vision, pp. 1033–1038.

Garcia, D. (2010). Robust smoothing of gridded data in one and higher dimensions with missing values. Computational Statistics & Data Analysis 54(4), 1167–1178.

Hau, C. Y., C. H. Liu, T. Y. Chou, and L. S. Yang (2008). The efficacy of semi-automatic classification result by using different cloud detection and diminution method. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences.

Hoan, N. T. and R. Tateishi (2008). Cloud removal of optical image using SAR data for ALOS applications. Experimenting on simulated ALOS data. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences.

Sarkar, S. and G. Healey (2010). Hyperspectral texture synthesis using histogram and power spectral density matching. IEEE Transactions on Geoscience and Remote Sensing 48(5), 2261–2270.

Siravenha, A. (2011). Um método para classificação de imagens de satélite usando transformada cosseno discreta com detecção e remoção de nuvens e sombras.

Taschler, M. (2006). A comparative analysis of image inpainting techniques. Technical report, The University of York.

Vese, L. A. and S. J. Osher (2002). Modeling textures with total variation minimization and oscillating patterns in image processing. Journal of Scientific Computing 19, 553–572.

Zhang, X., F. Qin, and Y. Qin (2010). Study on the thick cloud removal method based on multi-temporal remote sensing images. In International Conference on Multimedia Technology (ICMT), pp. 1–3.