
ChromaGAN: Adversarial Picture Colorization with Semantic Class Distribution

Patricia Vitoria, Lara Raad and Coloma Ballester


Department of Information and Communication Technologies
University Pompeu Fabra, Barcelona, Spain
{patricia.vitoria, lara.raad, coloma.ballester}@upf.edu

Abstract

The colorization of grayscale images is an ill-posed problem, with multiple correct solutions. In this paper, we propose an adversarial learning colorization approach coupled with semantic information. A generative network is used to infer the chromaticity of a given grayscale image conditioned on semantic clues. This network is framed in an adversarial model that learns to colorize by incorporating perceptual and semantic understanding of color and class distributions. The model is trained via a fully self-supervised strategy. Qualitative and quantitative results show the capacity of the proposed method to colorize images in a realistic way, achieving state-of-the-art results.

1. Introduction
Colorization is the process of adding plausible color information to monochrome photographs or videos (we refer to [37] for an interesting historical review). Currently, digital colorization of black and white visual data is a crucial task in areas as diverse as the advertising and film industries, photography technologies and artist assistance. Although important progress has been achieved in this field, automatic image colorization still remains a challenge.

Figure 1. ChromaGAN is able to colorize a grayscale image from the semantic understanding of the captured scene.

Colorization is a highly undetermined problem, requiring the mapping of a real-valued grayscale image to a three-dimensional color-valued one, which does not have a unique solution. Before the emergence of deep learning techniques, the most effective methods relied on human intervention, usually through either user-provided color scribbles or a color reference image. Recently, convolutional neural network strategies have benefited from the huge amount of publicly available color images in order to automatically learn what colors naturally correspond to real objects and their parts. In this paper, we propose a fully automatic end-to-end adversarial approach called ChromaGAN. It combines the strength of generative adversarial networks (GANs) with semantic class distribution learning. As a result, ChromaGAN is able to perceptually colorize a grayscale image from the semantic understanding of the captured scene. To illustrate this, Fig. 1 shows vibrant and diverse colorizations frequently achieved. Moreover, ChromaGAN shows variability by colorizing differently some objects belonging to the same category that may have several real colors, for example the birds in Fig. 1. The user-based perceptual ablation study shows that the effect of the generative adversarial learning is key to obtaining those vivid colorizations.

The contributions of this work include:
• An adversarial learning approach coupled with semantic information, leading to a three-term loss combining color and perceptual information with the semantic class distribution.
• An unsupervised semantic class distribution learning.
• A perceptual study showing that semantic clues coupled with an adversarial approach yield high quality results.

The outline of this paper is as follows. Section 2 reviews the related work. In Section 3 the proposed model, architecture and algorithm are detailed.

Section 4 presents qualitative and quantitative results. Finally, the paper is concluded in Section 5. The code is available at https://github.com/pvitoria/ChromaGAN.

2. Related Work

In the past two decades, several colorization techniques have been proposed. They can be classified in three classes: scribble-based, exemplar-based and deep learning-based methods. The first two classes depend on human intervention. The third one is based on learning, leveraging the possibility of easily creating training data from any color image.

Scribble-based methods. The user provides local hints, for instance color scribbles, which are then propagated to the whole image. These methods were initiated with the work of Levin et al. [23]. They assume that spatially neighboring pixels having similar intensities should have similar colors. They formalize this premise by optimizing a quadratic cost function constrained to the values given by the scribbles. Several improvements were proposed in the literature. Huang et al. [16] reduce the bleeding artifact using the grayscale image edge information. Yatziv et al. [37] propose a luminance-weighted chrominance blending to relax the dependency on the position of the scribbles. Then, Luan et al. [25] use the scribbles to segment the grayscale image and thus better propagate the colors. These methods suffer from requiring large amounts of user input, in particular when dealing with complex textures. Moreover, choosing the correct color palette is not a trivial task.

Exemplar-based methods. These transfer the colors of a reference image to a grayscale one. Welsh et al. [36] propose to match the luminance values and texture information between images. This approach lacks spatial coherency, which yields unsatisfactory results. To overcome this, several improvements have been proposed. Ironi et al. [18] transfer some color values from a segmented source image, which are then used as scribbles in [23]. Tai et al. [33] construct a probabilistic segmentation of both images to transfer color between any two regions having similar statistics. Charpiat et al. [7] deal with the multimodality of the colorization problem by estimating for each pixel the conditional probability of colors. Chia et al. [9] use the semantic information of the grayscale image. Gupta et al. [14] transfer colors based on the features of the superpixel representations of both images. Bugeau et al. [4] colorize an image by solving a variational model which selects the best color candidate from a previous selection of color values while adding some regularization to the colorization. Although this type of method significantly reduces the user input, it is still highly dependent on the reference image, which must be similar to the grayscale image.

Deep learning methods. Recently, different approaches have been proposed to leverage the huge amount of grayscale/color image pairs. Cheng et al. [8] first proposed a fully-automatic colorization method formulated as a least squares minimization problem solved with deep neural networks. A semantic feature descriptor is proposed and given as an input to the network. In [11], a supervised learning method is proposed through a linear parametric model and a variational autoencoder which is computed by quadratic regression on a dataset of color images. These approaches are improved by the use of CNNs and large-scale datasets. For instance, Iizuka et al. [17] extract local and global features to predict the colorization. The network is trained jointly for classification and colorization on a labeled dataset. Zhang et al. [39] learn the color distribution of every pixel. The network is trained with a multinomial cross entropy loss with rebalanced rare classes, allowing unusual colors to appear. Mouzon et al. [26] couple the resulting distribution of Zhang et al. [39] with the variational approach proposed in [28]. This allows to select for each pixel a color candidate from the pixel color distributions while regularizing the result. Also, it avoids the halo artifacts noticed in [39]. Larsson et al. [22] train a deep CNN to learn per-pixel color histograms. They use a VGG network to interpret the semantic composition of the scene and the localization of objects, and then predict the color histograms of every pixel. The network is trained with the KL divergence.

Other CNN based approaches are combined with user interactions. For instance, Zhang et al. [40] propose to train a deep network given the grayscale version and a set of sparse user inputs. This allows the user to obtain more than one plausible solution. Also, He et al. [15] propose an exemplar-based colorization method using a deep learning approach. The colorization network jointly learns faithful local colorization to a meaningful reference and plausible color prediction when a reliable reference is unavailable. These hybrid methods yield results containing rare colors. Recently, Yoo et al. [38] capture rare color instances without human interaction using a memory network together with a triplet loss, without the need of labels.

Some methods use GANs [12] to colorize grayscale images. Their ability to learn probability distributions over high-dimensional spaces of data such as color images has found widespread use for many tasks (e.g., [6, 19, 21, 24, 29, 35, 42]). Isola et al. [19] propose to use conditional GANs to map an input image to an output image using a U-Net based generator. They train their network by combining the L1 loss with an adapted GAN loss. An extension is proposed in [27], where the authors claim to generalize [19] to high resolution images and to speed up and stabilize the training. Cao et al. [5] also use conditional GANs and obtain diverse possible colorizations by sampling the input noise several times; it is incorporated in multiple layers of their architecture, which consists of a fully convolutional non-strided network. Notice that none of these GAN-based methods use additional information such as classification, while our method incorporates the distribution of semantic classes in an adversarial approach coupled with color regression.

3. Proposed Approach

Given a grayscale input image L, we learn a mapping G : L → (a, b) such that I = (L, a, b) is a plausible color image and a and b are chrominance channel images in the CIE Lab color space. A plausible color image is one having geometric, perceptual and semantic photo-realism.
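As a side note on how a prediction of G is turned back into a color image, the following sketch (ours, not part of the paper) recombines the input luminance with predicted chrominance using scikit-image, assuming the usual CIE Lab value ranges.

import numpy as np
from skimage import color

def assemble_rgb(L, ab):
    # L:  H x W luminance channel, values in [0, 100] (CIE Lab convention).
    # ab: H x W x 2 predicted chrominance channels.
    lab = np.concatenate([L[..., np.newaxis], ab], axis=-1)
    return color.lab2rgb(lab)   # float RGB image with values in [0, 1]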
In this paper, the mapping G is learnt by means of an adversarial learning strategy. The colorization is produced through a generator (equivalent to G above) that predicts the chrominance channels (a, b). In parallel, a discriminator evaluates how realistic the proposed colorization I = (L, a, b) of L is. To this aim, we propose in Section 3.1 a new adversarial energy that learns the parameters θ and w of the generator Gθ and the discriminator Dw, respectively. This is done by training the proposed network end-to-end in a self-supervised manner using a dataset of real color images. In particular, given a color image Ir = (L, ar, br) in the CIE Lab color space, we train our model to learn the color information by detaching the training data into two parts: the grayscale L and the chrominance channels (ar, br).

For the sake of clarity, and by a slight abuse of notation, we shall write Gθ and Dw instead of θ and w, respectively. Moreover, our generator Gθ will not only learn to generate color but also a class distribution vector, denoted by y ∈ R^m, where m is the fixed number of classes. This provides information about the probability distribution of the semantic content and objects present in the image. The authors of [17] also incorporate semantic vectors, but their proposal needs class labels, while we learn the semantic image distribution contained in L that boosts its colorization without any need of a labeled dataset. For that, our generator model combines two different modules. We denote it by Gθ = (G^1_θ1, G^2_θ2), where θ = (θ1, θ2) stands for all the generator parameters, G^1_θ1 : L → (a, b), and G^2_θ2 : L → y. An overview of the model architecture can be seen in Fig. 2 and will be described in Section 3.2. In the next Section 3.1 the proposed adversarial loss is stated.

3.1. The Objective Function

Our objective loss is defined by

  L(Gθ, Dw) = Le(G^1_θ1) + λg Lg(G^1_θ1, Dw) + λs Ls(G^2_θ2).   (1)

The first term denotes the color error loss

  Le(G^1_θ1) = E_{(L,ar,br)∼Pr} [ ‖G^1_θ1(L) − (ar, br)‖²₂ ]   (2)

where Pr stands for the distribution of real color images and ‖·‖₂ for the Euclidean norm. Notice that the Euclidean distance in the Lab color space is more adapted to perceptual color differences. Then,

  Ls(G^2_θ2) = E_{L∼Prg} [ KL( yv ‖ G^2_θ2(L) ) ]   (3)

denotes the class distribution loss, where Prg denotes the distribution of grayscale input images, and yv ∈ R^m is the output distribution vector of a pre-trained VGG-16 model [32] applied to the grayscale image (details are given below). KL(·‖·) stands for the Kullback-Leibler divergence.

Finally, Lg denotes the WGAN loss, which consists of an adversarial Wasserstein GAN loss [1]. Let us first remark that leveraging the WGAN instead of other GAN losses favours nice properties such as avoiding vanishing gradients and mode collapse, and achieves more stable training. To compute it, we use the Kantorovich-Rubinstein duality [20, 34]. As in [13], we also include a gradient penalty term constraining the L2 norm of the gradient of the discriminator with respect to its input and, thus, imposing that Dw ∈ D, where D denotes the set of 1-Lipschitz functions. To sum up, our WGAN loss is defined by

  Lg(G^1_θ1, Dw) = E_{Ĩ∼Pr} [ Dw(Ĩ) ] − E_{(a,b)∼P_{G^1_θ1}} [ Dw(L, a, b) ] − E_{Î∼P_Î} [ (‖∇_Î Dw(Î)‖₂ − 1)² ]   (4)

where P_{G^1_θ1} is the model distribution of G^1_θ1(L), with L ∼ Prg. As in [13], P_Î is implicitly defined by sampling uniformly along straight lines between pairs of points sampled from the data distribution Pr and the generator distribution P_{G^1_θ1}. The minus sign before the gradient penalty term in (4) corresponds to the fact that, in practice, when optimizing with respect to the discriminator parameters, our algorithm minimizes the negative of the loss instead of maximizing it.

From the previous loss (1), we compute the weights of Gθ and Dw by solving the min-max problem

  min_{Gθ} max_{Dw∈D} L(Gθ, Dw).   (5)

The hyperparameters λg and λs are fixed and set to 0.1 and 0.003, respectively. Let us comment in more detail on the benefits of each of the elements of our approach.

The adversarial strategy and the GAN loss Lg. The min-max problem (5) follows the usual generative adversarial game between the generator and the discriminator networks. The goal is to learn the parameters of the generator so that the probability distribution of the generated data approaches that of the real data, while the discriminator aims to distinguish between them. The initial GAN proposals optimize the Jensen-Shannon divergence, which can be non-continuous with respect to the generator parameters. In contrast, the WGAN [1, 2] minimizes an approximation of the Earth-Mover distance between two probability distributions. It is known to be a powerful tool to compare distributions with non-overlapping supports, in contrast to the KL and Jensen-Shannon divergences, which produce the vanishing gradients problem.

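To make the objective concrete, the following is a minimal PyTorch sketch of the three terms in (1)-(4). It is not the authors' implementation: the tensor shapes, the critic interface, the gradient-penalty weight (the common default of 10) and the use of a frozen torchvision VGG-16 to produce yv are our own assumptions.

import torch
import torch.nn.functional as F
from torchvision import models

vgg16 = models.vgg16(pretrained=True).eval()   # frozen, only used to produce y_v

def target_distribution(L):
    # L: (N, 1, H, W) luminance, assumed already normalized as VGG-16 expects.
    with torch.no_grad():
        return F.softmax(vgg16(L.repeat(1, 3, 1, 1)), dim=1)   # grayscale fed as (L, L, L)

def critic_loss(critic, L, real_ab, fake_ab, gp_weight=10.0):
    # WGAN loss with gradient penalty, Eq. (4): the critic D_w sees (L, a, b).
    real = torch.cat([L, real_ab], dim=1)
    fake = torch.cat([L, fake_ab], dim=1)
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    inter = (eps * real + (1.0 - eps) * fake).requires_grad_(True)   # straight-line samples
    grad = torch.autograd.grad(critic(inter).sum(), inter, create_graph=True)[0]
    penalty = ((grad.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()
    # The critic maximizes E[D(real)] - E[D(fake)]; we minimize its negative.
    return critic(fake).mean() - critic(real).mean() + gp_weight * penalty

def generator_loss(critic, L, fake_ab, real_ab, class_logits, y_v,
                   lambda_g=0.1, lambda_s=0.003):
    color_err = F.mse_loss(fake_ab, real_ab)                           # L_e, Eq. (2)
    adv = -critic(torch.cat([L, fake_ab], dim=1)).mean()               # generator side of L_g
    class_dist = F.kl_div(F.log_softmax(class_logits, dim=1), y_v,     # L_s, Eq. (3)
                          reduction='batchmean')
    return color_err + lambda_g * adv + lambda_s * class_dist          # Eq. (1)

If the critic outputs a spatial grid of patch scores, as the PatchGAN discriminator of Section 3.2 does, the .mean() calls simply average over that grid.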
Figure 2. Overview of our model, ChromaGAN, which automatically colorizes grayscale images. It combines a discriminator network Dw (in green) and a generator network Gθ. Gθ consists of two subnetworks: G^1_θ1 (yellow, purple, red and blue layers), which outputs the chrominance information (a, b) = G^1_θ1(L), and G^2_θ2 (yellow, red and gray layers), which outputs the class distribution vector y = G^2_θ2(L).
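Schematically, the generator in Fig. 2 behaves as a single module with two outputs. The sketch below is only an interface-level illustration under our own assumptions; the actual stages, layer widths and shared modules are those described in Section 3.2 and the supplementary material.

import torch.nn as nn

class TwoHeadGenerator(nn.Module):
    # Interface-level sketch of G_theta = (G^1_theta1, G^2_theta2): a shared
    # feature stage followed by a chrominance head and a class-distribution head.
    def __init__(self, shared, color_head, class_head):
        super().__init__()
        self.shared = shared          # stage shared by both subnetworks
        self.color_head = color_head  # predicts the (a, b) channels
        self.class_head = class_head  # predicts class logits

    def forward(self, L):
        feats = self.shared(L)
        ab = self.color_head(feats)                 # (a, b) = G^1(L)
        y = self.class_head(feats).softmax(dim=1)   # y = G^2(L), a class distribution
        return ab, y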

Also, the WGAN alleviates the mode collapse problem, which is interesting when aiming to capture multiple possible colorizations.

The proposed Lg loss favours perceptually real results. As the experiments in Section 4 show, and as has also been noticed by other authors [19], the adversarial GAN model produces sharp and colorful images, favouring the emergence of a perceptually real palette of colors instead of the ochreish outputs produced by colorization using only terms such as the L2 or L1 color error loss. This is in agreement with the analysis in [3]. The authors define the perceptual quality of an image as the degree to which it looks like a natural image and argue that it can be defined as the best possible probability of success in real-vs-fake user studies. They show that perceptual quality is proportional to the distance between the distribution of the generated images and the distribution of natural images, which, at the optimum generator and discriminator, corresponds to the Lg value.

Color Error Loss. In some colorization methods [22, 39] the authors propose to learn a per-pixel color probability distribution, allowing them to use different classification losses. Instead, we chose to learn two chrominance values (a, b) per pixel using the L2 norm. As mentioned, using only this type of loss yields ochreish outputs. However, in our case the use of the perceptual GAN-based loss relaxes this effect, making it sufficient to obtain notable results.

Class Distribution Loss. The KL-based loss Ls(G^2_θ2) in (3) compares the generated density distribution vector y = G^2_θ2(L) to the ground truth distribution yv ∈ R^m. The latter is computed using the VGG-16 [32] pre-trained on the ImageNet dataset [10]. The VGG-16 model was trained on color images; in order to use it without any further training, we reshape the grayscale image as (L, L, L). The class distribution loss adds semantic interpretation of the scene. The effect of this term is analyzed in Section 4.

3.2. Detailed Model Architecture

The proposed GAN architecture is conditioned on the grayscale image L through the proposed loss (1). It contains three distinct parts. The first two, belonging to the generator, focus on geometrically and semantically generating the color image information (a, b) and classifying its semantic content. The third one belongs to the discriminator network, which learns to distinguish between real and fake data.

An overview of the model is shown in Fig. 2. In the remainder of this section we describe the full architecture. More details are available in the supplementary material.

Generator Architecture. The generator Gθ is made of two subnetworks (denoted by G^1_θ1 and G^2_θ2) divided into three stages with some shared modules between them. Both of them take as input a grayscale image of size H × W. The subnetwork G^1_θ1 outputs the chrominance information, (a, b) = G^1_θ1(L), and the subnetwork G^2_θ2 outputs the class distribution vector, y = G^2_θ2(L).

Both subnetworks are jointly trained in a single backpropagation step.

The first stage (displayed in yellow in Fig. 2) is shared between both subnetworks. It has the same structure as the VGG-16 without the three last fully-connected layers at the top of the network. We initialize this stage with pre-trained VGG-16 weights, which are not frozen during training.

From this first stage on, the two subnetworks, G^1_θ1 and G^2_θ2, split into two tracks. The first one (in purple in Fig. 2) processes the data using two modules of the form Conv-BatchNorm-ReLU. The second track (in red) first processes the data using four modules of the form Conv-BatchNorm-ReLU, followed by three fully connected layers. This second path (in gray) outputs G^2_θ2, providing the class distribution vector. To generate the probability distribution y over the m semantic classes, we apply a softmax function. Notice that G^2_θ2(L) is a classification network and is initialized with pre-trained weights. But, as part of this subnetwork is shared with G^1_θ1, once the network is trained, it has learnt to give a class distribution close to the output of the VGG-16 and to generate useful information that helps the colorization process. This could be understood as fine-tuning the network to learn to perform two tasks at once.

In the third stage both branches are fused (in red and purple in Fig. 2) by concatenating their output features. Then, (a, b) is predicted by processing the information through six modules of the form Convolution-ReLU with two up-sampling layers in between. Note that when performing backpropagation with respect to the class distribution loss, only the second subnetwork G^2_θ2 is affected. For the color error loss, the entire network is affected.

Discriminator Architecture. The discriminator network Dw is based on the Markovian discriminator architecture (PatchGAN [19]). The PatchGAN discriminator keeps track of the high-frequency structures of the generated image, compensating for the fact that the L2 loss Le(G^1_θ1) fails to capture high-frequency structures but succeeds in capturing low-frequency ones. In order to model the high frequencies, the PatchGAN discriminator focuses on local patches. Thus, rather than giving a single output for the full image, it classifies each patch as real or fake. We follow the architecture defined in [19], where the input and output are of size H × W and H/8 × W/8, respectively.

4. Experimental Results and Discussion

In this section the proposed method is evaluated both quantitatively and qualitatively. Notice that evaluating the quality of a colorized image in a quantitative way is a challenging task. For some objects several colors could fit perfectly. Therefore, quantitative measures reflecting how close the outputs are to the ground truth data are not the best measures for this type of problem. For that reason, qualitative comparisons are provided as well as a user-based perceptual study quantifying the realism of images colorized by the proposed method. Nevertheless, in order to quantitatively compare the results of the proposed method to others in the literature, a quantitative measure is also used.

To assess the effect of each term of our loss function on the entire network, we perform an ablation study by evaluating three variants of our method: ChromaGAN, using the adversarial and classification approach; ChromaGAN w/o Class, avoiding the classification approach (λs = 0); and ChromaNet, avoiding the adversarial approach (λg = 0).

4.1. Dataset

We train each variant of the network end-to-end on 1.3M images from the subset [30] of ImageNet [10]. It contains objects from 1000 different categories under different color conditions, including grayscale images. Due to the presence of fully connected layers in our network, the input size to the class distribution branch has to be fixed. We chose to work with input images of 224 × 224 pixels, as is done when training the VGG-16 [32] on ImageNet. Thus, we have resized each image in the training set and converted it to a three-channel grayscale image by triplicating the luminance channel L.

4.2. Implementation Details

We train the network for a total of five epochs with a batch size of 10, on the 1.3M images from the ImageNet training dataset resized to 224 × 224. A single epoch takes approximately 23 hours on an NVIDIA Quadro P6000 GPU. The prediction of the colorization of a single image takes an average of 4.4 milliseconds. We minimize our objective loss using the Adam optimizer with a learning rate of 2e−5 and momentum parameters β1 = 0.5 and β2 = 0.999. We alternate the optimization of the generator Gθ and the discriminator Dw. The first stage of the network (displayed in yellow in Fig. 2) takes as input a grayscale image of size 224 × 224 and is initialized using the pre-trained weights of the VGG-16 [32] trained on ImageNet.

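The optimization just described could be sketched as follows. It reuses the hypothetical generator, critic, data loader and loss helpers from the earlier sketches; only the learning rate, momentum parameters and batch size come from the text above.

import torch

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-5, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(critic.parameters(), lr=2e-5, betas=(0.5, 0.999))

for L, real_ab in loader:                  # batches of ten 224 x 224 training images
    y_v = target_distribution(L)           # frozen VGG-16 target (Section 3.1)
    # Discriminator (critic) step on the WGAN-GP loss.
    opt_d.zero_grad()
    fake_ab, class_logits = generator(L)
    critic_loss(critic, L, real_ab, fake_ab.detach()).backward()
    opt_d.step()
    # Generator step on the three-term objective (1).
    opt_g.zero_grad()
    fake_ab, class_logits = generator(L)
    generator_loss(critic, L, fake_ab, real_ab, class_logits, y_v).backward()
    opt_g.step()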
Figure 3. Some qualitative results using, from right to left: grayscale input, ChromaGAN, ChromaGAN w/o Class, ChromaNet, Iizuka et al. [17], Larsson et al. [22] and Zhang et al. [39].

4.3. Quantitative Evaluation

A perceptual realism study is performed to show the strength of coupling the adversarial approach with semantic information. Furthermore, as is done by state-of-the-art colorization methods, a quantitative assessment of our method is included in terms of peak signal-to-noise ratio (PSNR).

The perceptual realism study was performed as follows. Images are shown one by one to non-expert participants, where some are natural color images and others are the result of a colorization method, namely ChromaGAN, ChromaGAN w/o Class, ChromaNet or one of the four state-of-the-art methods [17, 22, 39, 19]. For each image shown, the participant indicates whether the colorization is realistic or not. Fifty images are taken randomly from a set of 1600 images composed of 200 ground truth images (from the ImageNet and Places datasets [41]) and 200 results from each method (ChromaGAN, ChromaNet, ChromaGAN w/o Class and [17, 22, 39, 19]). The study was performed 62 times.

Table 1 shows the results of perceptual realism for each method. One can observe that ChromaGAN, which couples the adversarial approach with semantic information, yields perceptually more realistic results. Moreover, by comparing the results of ChromaNet, ChromaGAN w/o Class and [17, 22, 39, 19], one can conclude that the adversarial approach plays a more important role than the semantic information for the generation of natural color images.

Method                     Naturalness
Real images                87.1%
ChromaGAN                  76.9%
ChromaGAN w/o Class        70.9%
ChromaNet                  61.4%
Iizuka et al. [17]         53.9%
Larsson et al. [22]        53.6%
Zhang et al. [39]          52.2%
Isola et al. [19]          27.6%

Table 1. Numerical details of the perceptual test. The values show the mean naturalness over all the experiments for each method.

Additionally, the PSNR of the obtained (a, b) images is computed with respect to the ground truth and compared to that obtained by other fully automatic methods, as shown in Table 2. The table shows the average of this measure over all the test images. One can observe that, in general, our PSNR values are higher than those obtained by [17, 22, 39]. Moreover, comparing the PSNR of our three variants, the highest value is achieved by ChromaNet. This is not surprising, since the training loss of this method gives more importance to the quadratic color error term compared to the losses of ChromaGAN and ChromaGAN w/o Class.

Method                     PSNR (dB)
ChromaGAN                  24.98
ChromaGAN w/o Class        25.04
ChromaNet                  25.57
Iizuka et al. [17]         23.69
Larsson et al. [22]        24.93
Zhang et al. [39]          22.04
Isola et al. [19]          21.57

Table 2. Comparison of the average PSNR values for automatic methods, some extracted from the table in [40]. The experiment is performed on 1000 images of the ILSVRC2012 challenge set [31].

4.4. Qualitative Evaluation

Our results are compared to those obtained in [17, 22, 39] by using the publicly available online demos. The methods are trained on the ImageNet dataset in the case of [39, 22] and on the Places dataset in the case of [17]. Notice that while ImageNet contains 1.3M training images, Places contains 2.4M. Several colorization results are shown on the validation set of the ImageNet dataset in Fig. 3 and on Places in Fig. 5. As can be observed, the method in [17] and ChromaNet tend to output muted colours in comparison to the lively colors obtained with ChromaGAN, ChromaGAN w/o Class and [22, 39]. Also, ChromaGAN is able to reconstruct color information by adding natural and vivid colors in almost all the examples (especially the first, fifth, sixth, eighth and ninth rows). Notice that the deep CNN in [39] is trained with a multinomial cross entropy loss with rebalanced rare classes, allowing in particular unusual colors to appear in the colorized image. Desaturated results are mainly obtained by [17] and by our method without the adversarial approach (especially in the first, second, third, fourth, fifth and eighth rows), and in some cases also by [22] (second, fourth and eighth rows). Also, color boundaries are generally not clearly separated in the case of [17], and sometimes by ChromaGAN w/o Class (sixth row) and [22] (third, fourth and eighth rows). Inconsistent chromaticities can be found in the second and sixth rows by [39], where the wall is blue and the apples are green and red at the same time. The third and seventh rows display some failure cases of our method: the bottom-right butterfly wing is colored in green. In fact, the seventh row shows a difficult case for all the methods. For the sake of comparison, some results on the Places dataset are shown in Fig. 5 for ChromaGAN trained on ImageNet, together with the results of [17] trained on the Places dataset. In Fig. 6 the results of ChromaGAN are compared to those of the adversarial approach in [19]. The results in the second column of Fig. 6 were taken directly from [19]. Overall, their results sometimes provide more vivid colors, as in the second row, and sometimes uncolorized results, as in the first row. Furthermore, ChromaGAN yields better results in terms of consistent hue. In the second column, unnatural color stains are observed, for instance the green marks under the bird (second row) and the yellow and red stains on the coffee pot (third row).

Legacy Black and White Photographs. ChromaGAN is trained using color images from which the chrominance information is removed. Due to the progress in the field of photography, there is a great difference in quality between old black and white images and modern color images. Thus, generating color information in original black and white images is a challenging task. Fig. 4 shows some results. Additional examples can be found in the supplementary material, where we also include results on paintings.

Figure 4. Colorization results on historical black and white photographs using the proposed ChromaGAN. Note that old black and white photographs are statistically different from modern ones, which makes the colorization process more difficult.


Figure 5. Results on images from the validation set of the Places dataset. Left: ground truth, middle: Iizuka et al. [17], right: ChromaGAN. Notice that the model of [17] is trained on the Places dataset, which has double the number of training images; our model is trained on the ImageNet dataset. The results are comparable.

Figure 6. Results of GAN-based approaches. From left to right: ground truth, results from Isola et al. [19] and results from ChromaGAN. The results from Isola et al. were extracted from [19].

5. Conclusions

A novel colorization method is presented. The proposed ChromaGAN model is based on an adversarial strategy that captures geometric, perceptual and semantic information. A variant of ChromaGAN which avoids using the distribution of semantic classes also shows satisfying results. Both cases prove that our adversarial technique provides photo-realistic colorful images. The quantitative comparison to state-of-the-art methods shows that our method outperforms them in terms of PSNR and perceptual quality, while qualitative comparisons show the high visual quality of our results.

Acknowledgements

This work was partially funded by MICINN/FEDER UE project, reference PGC2018-098625-B-I00 VAGS, and by H2020-MSCA-RISE-2017 project, reference 777826 NoMADS. We thank the support of NVIDIA Corporation for the donation of GPUs used in this work, Rafael Grompone von Gioi for valuable suggestions and José Lezama for his help with the user study.

References

[1] M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein GAN. arXiv:1701.07875, 2017.
[2] M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein generative adversarial networks. In International Conference on Machine Learning, pages 214–223, 2017.
[3] Y. Blau and T. Michaeli. The perception-distortion tradeoff. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6228–6237, 2018.
[4] A. Bugeau, V.-T. Ta, and N. Papadakis. Variational exemplar-based image colorization. IEEE Transactions on Image Processing, 23(1):298–307, 2014.
[5] Y. Cao, Z. Zhou, W. Zhang, and Y. Yu. Unsupervised diverse colorization via generative adversarial networks. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 151–166. Springer, 2017.
[6] C. Chan, S. Ginosar, T. Zhou, and A. A. Efros. Everybody dance now. arXiv:1808.07371, 2018.
[7] G. Charpiat, M. Hofmann, and B. Schölkopf. Automatic image colorization via multimodal predictions. In European Conference on Computer Vision, pages 126–139. Springer, 2008.
[8] Z. Cheng, Q. Yang, and B. Sheng. Deep colorization. In Proceedings of the IEEE International Conference on Computer Vision, pages 415–423, 2015.
[9] A. Y.-S. Chia, S. Zhuo, R. K. Gupta, Y.-W. Tai, S.-Y. Cho, P. Tan, and S. Lin. Semantic colorization with internet images. In ACM Transactions on Graphics (TOG), volume 30, page 156. ACM, 2011.
[10] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
[11] A. Deshpande, J. Rock, and D. Forsyth. Learning large-scale automatic image colorization. In Proceedings of the IEEE International Conference on Computer Vision, pages 567–575, 2015.
[12] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
[13] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville. Improved training of Wasserstein GANs. In Advances in Neural Information Processing Systems, pages 5769–5779, 2017.
[14] R. K. Gupta, A. Y.-S. Chia, D. Rajan, E. S. Ng, and H. Zhiyong. Image colorization using similar images. In Proceedings of the 20th ACM International Conference on Multimedia, pages 369–378. ACM, 2012.
[15] M. He, D. Chen, J. Liao, P. V. Sander, and L. Yuan. Deep exemplar-based colorization. ACM Transactions on Graphics (TOG), 37(4):47, 2018.
[16] Y.-C. Huang, Y.-S. Tung, J.-C. Chen, S.-W. Wang, and J.-L. Wu. An adaptive edge detection based colorization algorithm and its applications. In Proceedings of the 13th Annual ACM International Conference on Multimedia, pages 351–354. ACM, 2005.
[17] S. Iizuka, E. Simo-Serra, and H. Ishikawa. Let there be color!: joint end-to-end learning of global and local image priors for automatic image colorization with simultaneous classification. ACM Transactions on Graphics (TOG), 35(4):110, 2016.
[18] R. Ironi, D. Cohen-Or, and D. Lischinski. Colorization by example. In Rendering Techniques, pages 201–210. Citeseer, 2005.
[19] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1125–1134, 2017.
[20] L. Kantorovitch. On the translocation of masses. Management Science, 5(1):1–4, 1958.
[21] T. Karras, S. Laine, and T. Aila. A style-based generator architecture for generative adversarial networks. arXiv preprint arXiv:1812.04948, 2018.
[22] G. Larsson, M. Maire, and G. Shakhnarovich. Learning representations for automatic colorization. In European Conference on Computer Vision, pages 577–593. Springer, 2016.
[23] A. Levin, D. Lischinski, and Y. Weiss. Colorization using optimization. In ACM Transactions on Graphics (TOG), volume 23, pages 689–694. ACM, 2004.
[24] K. Lin, D. Li, X. He, Z. Zhang, and M.-T. Sun. Adversarial ranking for language generation. In Advances in Neural Information Processing Systems, pages 3155–3165, 2017.
[25] Q. Luan, F. Wen, D. Cohen-Or, L. Liang, Y.-Q. Xu, and H.-Y. Shum. Natural image colorization. In Proceedings of the 18th Eurographics Conference on Rendering Techniques, pages 309–320. Eurographics Association, 2007.
[26] T. Mouzon, F. Pierre, and M.-O. Berger. Joint CNN and variational model for fully-automatic image colorization. In International Conference on Scale Space and Variational Methods in Computer Vision, pages 535–546. Springer, 2019.
[27] K. Nazeri and E. Ng. Image colorization with generative adversarial networks. arXiv preprint arXiv:1803.05400, 2018.
[28] F. Pierre, J.-F. Aujol, A. Bugeau, N. Papadakis, and V.-T. Ta. Luminance-chrominance model for image colorization. SIAM Journal on Imaging Sciences, 8(1):536–563, 2015.
[29] A. Pumarola, A. Agudo, A. Martinez, A. Sanfeliu, and F. Moreno-Noguer. GANimation: Anatomically-aware facial animation from a single image. In Proceedings of the European Conference on Computer Vision (ECCV), 2018.
[30] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet large scale visual recognition challenge. International Journal of Computer Vision (IJCV), 115(3):211–252, 2015.
[31] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015.
[32] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

[33] Y.-W. Tai, J. Jia, and C.-K. Tang. Local color transfer via probabilistic segmentation by expectation-maximization. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), volume 1, pages 747–754. IEEE, 2005.
[34] C. Villani. Optimal transport: old and new, volume 338. Springer Science & Business Media, 2008.
[35] P. Vitoria, J. Sintes, and C. Ballester. Semantic image inpainting through improved Wasserstein generative adversarial networks. In Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, pages 249–260. INSTICC, SciTePress, 2019.
[36] T. Welsh, M. Ashikhmin, and K. Mueller. Transferring color to greyscale images. In ACM Transactions on Graphics (TOG), volume 21, pages 277–280. ACM, 2002.
[37] L. Yatziv and G. Sapiro. Fast image and video colorization using chrominance blending. IEEE Transactions on Image Processing, 15(5):1120–1129, 2006.
[38] S. Yoo, H. Bahng, S. Chung, J. Lee, J. Chang, and J. Choo. Coloring with limited data: Few-shot colorization via memory augmented networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 11283–11292, 2019.
[39] R. Zhang, P. Isola, and A. A. Efros. Colorful image colorization. In European Conference on Computer Vision, pages 649–666. Springer, 2016.
[40] R. Zhang, J.-Y. Zhu, P. Isola, X. Geng, A. S. Lin, T. Yu, and A. A. Efros. Real-time user-guided image colorization with learned deep priors. arXiv preprint arXiv:1705.02999, 2017.
[41] B. Zhou, A. Lapedriza, A. Khosla, A. Oliva, and A. Torralba. Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(6):1452–1464, 2018.
[42] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 2223–2232, 2017.

