
Invertible Conditional GANs for image editing

Guim Perarnau, Joost van de Weijer, Bogdan Raducanu
Computer Vision Center, Barcelona, Spain
guimperarnau@gmail.com, {joost,bogdan}@cvc.uab.es

Jose M. Álvarez
Data61 @ CSIRO, Canberra, Australia
jose.alvarez@nicta.com.au

arXiv:1611.06355v1 [cs.CV] 19 Nov 2016

Abstract
Generative Adversarial Networks (GANs) have recently been shown to successfully
approximate complex data distributions. A relevant extension of this model is
conditional GANs (cGANs), where the introduction of external information makes it
possible to determine specific representations of the generated images. In this work, we
evaluate encoders that invert the mapping of a cGAN, i.e., map a real image
into a latent space and a conditional representation. This allows us, for example, to
reconstruct and modify real images of faces by conditioning on arbitrary attributes.
Additionally, we evaluate the design of cGANs. The combination of an encoder
with a cGAN, which we call an Invertible cGAN (IcGAN), enables re-generating real
images with deterministic, complex modifications.

1 Introduction
Image editing can be performed at different levels of complexity and abstraction. Common operations
consist of simply applying a filter to an image to, for example, increase the contrast or convert it to
grayscale. These, however, are low-complexity operations that do not necessarily require understanding
the scene or object that the image represents. On the other hand, modifying the attributes of a face
(e.g. adding a smile, changing the hair color or even the gender) is a far more complex and challenging
modification. In this case, obtaining realistic results often requires a skilled human with image
editing software.
A solution for automatically performing these non-trivial operations relies on generative models. Natural
image generation has been a strong research topic for many years, but it was not until 2015
that promising results were achieved by combining deep learning techniques with generative
modeling [1, 2]. Generative Adversarial Networks (GANs) [3] are one of the state-of-the-art approaches
for image generation. GANs are especially interesting as they are directly optimized towards
generating the most plausible and realistic data, as opposed to other models (e.g. Variational
Autoencoders [4]), which focus on an image reconstruction loss. Additionally, GANs can
explicitly control the features of generated images through a conditional extension, conditional GANs (cGANs).
However, the GAN framework lacks an inference mechanism, i.e., a way to find the latent representation of
an input image, which is a necessary step for reconstructing and modifying real images.
To overcome this limitation, in this paper we introduce Invertible Conditional GANs (IcGANs)
for complex image editing, formed by the union of an encoder used jointly with a cGAN. This model
maps real images into a high-level feature space (encoder) and performs meaningful modifications on
them (cGAN). As a result, we can explicitly control the attributes of a real image (Figure 1), which
could potentially be useful in several applications, be it creative processes, data augmentation or face
profiling.
Code available at https://github.com/Guim3/IcGAN

Workshop on Adversarial Training, NIPS 2016, Barcelona, Spain.


Figure 1: Example of how the IcGAN reconstructs and applies complex variations on a real image.

The summary of contributions of our work is the following:


• Proposing IcGANs, composed of two crucial parts: an encoder and a cGAN. We apply this
model to the MNIST [5] and CelebA [6] datasets, which allows performing meaningful and
realistic editing operations on them by arbitrarily changing the conditional information y.
• Introducing an encoder into the conditional GAN framework to compress a real image x into
a latent representation z and a conditional vector y. We consider several designs and training
procedures to best leverage the available conditional information.
• Evaluating and refining cGANs through conditional position and conditional sampling to
enhance the quality of generated images.

2 Related work

There are different approaches to generative models. Among them, two promising ones
have recently pushed the state of the art with highly plausible generated images.
The first one is Variational Autoencoders (VAEs) [1, 4, 7, 8], which impose a prior on the representation
space z (e.g. a normal distribution) in order to regularize the model and constrain it to sample from that prior.
However, the main limitation of VAEs is the pixel-wise reconstruction error used as a loss function, which
causes the output images to look blurry. The second approach is Generative Adversarial Nets (GANs).
Originally proposed by Goodfellow et al. [3], GANs have been improved with a deeper architecture
(DCGAN) by Radford et al. [2]. The latest advances introduced several techniques that improve
the overall performance of GAN training [9] and an unsupervised approach to disentangle feature
representations [10]. Additionally, the most advanced and recent work on cGANs trains a model to
generate realistic images from text descriptions and landmarks [11].
Our work is framed within the GAN framework. The baseline is the work of
Radford et al. (DCGAN) [2], to which we add a conditional extension. The difference of our
approach with respect to prior work is that we also propose an encoder (Invertible cGAN) with which,
given an input image x, we can obtain its representation as a latent variable z and a conditional vector y.
Then, we can modify z and y to re-generate the original image with complex variations. Dumoulin et
al. [12] and Donahue et al. [13] also proposed an encoder for GANs, but in a non-conditional,
jointly trained setting. Additionally, Makhzani et al. [14] and Larsen et al. [15] proposed an idea
similar to this paper by combining a VAE and a GAN, with promising results.
Reed et al. [16] implemented an encoder in a fashion similar to our approach; this paper
complements their work. In our case, we analyze the encoder more deeply by
also encoding conditional information and by testing different architectures and training approaches.
Also, we evaluate unexplored design decisions for building a cGAN.

3 Background: Generative Adversarial Networks

A GAN is composed of two neural networks, a generator G and a discriminator D. Both networks
are iteratively trained to compete against each other in a minimax game. The generator aims to
approximate the underlying unknown data distribution p_data in order to fool the discriminator, whilst the
discriminator focuses on telling real samples from generated ones. At convergence,
we want p_data = p_g, where p_g is the generator distribution.

More formally, considering the function v(θ_g, θ_d), where θ_g and θ_d are the parameters of the generator
G and the discriminator D respectively, we can formulate GAN training as optimizing

$$\min_{\theta_g} \max_{\theta_d} v(\theta_g, \theta_d) = \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))], \qquad (1)$$

where z is a noise vector sampled from a known simple distribution p_z (e.g. a normal distribution).
The GAN framework can be extended to conditional GANs (cGANs) [17]. They are quite similar to
vanilla (non-conditional) GANs; the only difference is that, in this case, we have extra information y
(e.g. class labels, attribute information) for a given real sample x. Conditional information strictly
depends on real samples, but we can fit a density model p_y in order to sample generated labels y′
for generated data x′. Then, Equation 1 can be reformulated for the cGAN extension as

$$\min_{\theta_g} \max_{\theta_d} v(\theta_g, \theta_d) = \mathbb{E}_{x, y \sim p_{\text{data}}}[\log D(x, y)] + \mathbb{E}_{z \sim p_z,\, y' \sim p_y}[\log(1 - D(G(z, y'), y'))]. \qquad (2)$$

Once a cGAN is trained, it allows us to generate samples with two levels of variation: constrained
and unconstrained. Constrained variations are modeled by y, which directly controls the data
features explicitly described by y. All the other variations of the data not modeled by y
(unconstrained variations) are encoded in z.
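
To make the cGAN objective of Eq. 2 concrete, the sketch below shows one training update in PyTorch (the paper's own implementation is in Torch/Lua, so all names here, G, D, opt_g, opt_d, nz, are illustrative assumptions). It uses the common non-saturating generator loss rather than the literal minimax form.

```python
# Minimal sketch of one cGAN update (Eq. 2); PyTorch used for illustration only.
import torch
import torch.nn.functional as F

def cgan_step(G, D, opt_g, opt_d, x, y, nz=100):
    batch = x.size(0)
    z = torch.randn(batch, nz)          # z ~ p_z = N(0, 1)
    y_fake = y[torch.randperm(batch)]   # y' ~ p_y (here: real labels, shuffled)

    # Discriminator: maximize log D(x, y) + log(1 - D(G(z, y'), y'))
    opt_d.zero_grad()
    d_real = D(x, y)
    d_fake = D(G(z, y_fake).detach(), y_fake)
    loss_d = F.binary_cross_entropy(d_real, torch.ones_like(d_real)) + \
             F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
    loss_d.backward()
    opt_d.step()

    # Generator: fool the discriminator (non-saturating variant of Eq. 2)
    opt_g.zero_grad()
    d_fake = D(G(z, y_fake), y_fake)
    loss_g = F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```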

4 Invertible Conditional GANs


We introduce Invertible Conditional GANs (IcGANs), which are composed of a cGAN and an encoder.
Even though encoders have recently been introduced into the GAN framework [12, 13, 16], we are
the first to include and leverage the conditional information y in the design of the encoding
process. In section 4.1 we explain how and why an encoder is included in the GAN framework in
a conditional setting. In section 4.2, we introduce our approach to refining cGANs in two aspects:
conditional position and conditional sampling. The model architecture is described in section 4.3.

4.1 Encoder

A generator x′ = G(z, y′) in the GAN framework does not have the capability to map a real image
x to its latent representation z. To overcome this problem, we can train an encoder/inference network
E that approximately inverts this mapping, (z, y) = E(x). This inversion allows us to obtain a
latent representation z from a real image x and then explore the latent space by
interpolating or adding variations to it, which results in variations of the generated image x′.
When combined with a cGAN, once the latent representation z has been obtained, explicitly controlled
variations can be added to an input image via the conditional information y (e.g. generate a certain digit
in MNIST or specify face attributes on a face dataset). We call this combination an Invertible cGAN, as
now the mapping can be inverted: (z, y) = E(x) and x′ = G(z, y), where x is an input image and
x′ its reconstruction. See Figure 2 for an example of how a trained IcGAN is used.
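
As a concrete illustration of this use of the inverted mapping, the following sketch (hypothetical names E_z, E_y and G; PyTorch assumed) reconstructs a real image and re-generates it with modified attributes:

```python
import torch

def edit_image(E_z, E_y, G, x, new_attributes):
    # Encode a real image x, then decode it with the original and with modified y.
    with torch.no_grad():
        z = E_z(x)                     # latent representation of x
        y = E_y(x)                     # inferred conditional information
        x_rec = G(z, y)                # reconstruction x'
        x_edit = G(z, new_attributes)  # same z, explicitly controlled variation
    return x_rec, x_edit
```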
Our approach consists of training an encoder E once the cGAN has been trained, similarly to
Reed et al. [16]. In our case, however, the encoder E is composed of two sub-encoders:
E_z, which encodes an image to z, and E_y, which encodes an image to y. To train E_z we use the
generator to create a dataset of generated images x′ and their latent vectors z, and then minimize
a squared reconstruction loss L_ez (Eq. 3). For E_y, we initially used generated images x′ and their
conditional information y′ for training. However, we found that generated images tend to be noisier
than real ones and, in this specific case, we could improve E_y by directly training with real images
and labels from the dataset p_data (Eq. 4).

$$L_{ez} = \mathbb{E}_{z \sim p_z,\, y' \sim p_y}\, \| z - E_z(G(z, y')) \|_2^2 \qquad (3)$$
$$L_{ey} = \mathbb{E}_{x, y \sim p_{\text{data}}}\, \| y - E_y(x) \|_2^2 \qquad (4)$$
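
A minimal sketch of how Eqs. 3 and 4 translate into training updates, assuming a trained generator G, hypothetical encoder modules E_z and E_y, and a sample_y function standing in for whichever strategy is used to draw y′ ∼ p_y (PyTorch used for illustration):

```python
import torch
import torch.nn.functional as F

def train_Ez_step(E_z, G, opt, sample_y, batch_size=64, nz=100):
    # One update of E_z on generated images, minimizing Eq. 3.
    with torch.no_grad():
        z = torch.randn(batch_size, nz)   # z ~ p_z
        y_fake = sample_y(batch_size)     # y' ~ p_y
        x_fake = G(z, y_fake)             # generated image x'
    opt.zero_grad()
    loss = F.mse_loss(E_z(x_fake), z)     # ||z - E_z(G(z, y'))||_2^2
    loss.backward()
    opt.step()
    return loss.item()

def train_Ey_step(E_y, opt, x, y):
    # One update of E_y on real images and labels, minimizing Eq. 4.
    opt.zero_grad()
    loss = F.mse_loss(E_y(x), y)          # ||y - E_y(x)||_2^2
    loss.backward()
    opt.step()
    return loss.item()
```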

Although E_z and E_y might seem completely independent, we can adopt different strategies to make
them interact and leverage the conditional information (for an evaluation of them, see section 5.3):

• SNG: One single encoder with shared layers and two outputs. That is, E_z and E_y are
embedded in a single encoder.

• IND: Two independent encoders. E_z and E_y are trained separately.

• IND-COND: Two encoders, where E_z is conditioned on the output of the encoder E_y.

Figure 2: Scheme of a trained IcGAN, composed of an encoder (IND approach) and a cGAN
generator. We encode a real image x into a latent representation z and attribute information y, and
then apply variations on it to generate a new modified image x′.
Recently, Dumoulin et al. [12] and Donahue et al. [13] proposed different approaches for training
an encoder within the GAN framework. One of the most interesting ones consists in jointly training
the encoder with both the discriminator and the generator. Although this approach is promising, our
work has been completely independent of these articles and focuses on another direction, since we
consider the encoder in a conditional setting. Consequently, we implemented our aforementioned
approach, which performs nearly on par with their strategy [13].

4.2 Conditional GAN

We consider two main design decisions concerning cGANs. The first one is finding the optimal
position of the conditional information y in the generator and the discriminator, which, to our knowledge,
has not been previously addressed. Secondly, we discuss the best approach for sampling conditional
information for the generator.
Conditional position In the cGAN, the conditional information vector y needs to be introduced
in both the generator and the discriminator. In the generator, y ∼ p_data and z ∼ p_z (where
p_z = N(0, 1)) are always concatenated along the filter dimension at the input level [16–18]. As for
the discriminator, different authors insert y in different parts of the model [16–18]. We expect that
the earlier y is positioned in the model, the better, since the model is allowed more learning
interactions with y. Experiments regarding the optimal position of y are detailed in section 5.2.
Conditional sampling There are two types of conditional information, y and y′. The first one is
trivially sampled from (x, y) ∼ p_data and is used for training the discriminator D(x, y) with a real
image x and its associated label y. The second one is sampled from y′ ∼ p_y and serves as input
to the generator G(z, y′) along with a latent vector z ∼ p_z to generate an image x′. It can be
sampled using different approaches:
• Kernel density estimation: also known as Parzen window estimation, it consists in randomly
sampling from a kernel (e.g. a Gaussian kernel with a cross-validated σ).
• Direct interpolation: interpolate between label vectors y from the training set [16]. The
reasoning behind this approach is that interpolations can belong to the label distribution p_y.
• Sampling from the training set, y′ ∼ p_y with p_y = p_data: directly use the real labels y from
the training set p_data. As Gauthier [18] pointed out, unlike the previous two approaches,
this method could overfit the model by using the conditional information to reproduce the
images of the training set. However, this is only likely to occur if the conditional information
is, to some extent, unique to each image. In the case where the attributes of an image are
binary, one attribute vector y can describe a varied and large enough subset of images,
preventing the model from overfitting given y.

Figure 3: Architecture of the generator (a) and discriminator (b) of our cGAN model. The generator
G takes as input both z and y. In the discriminator, y is concatenated in the first convolutional layer.

Kernel density estimation and direct interpolation are, in the end, two different ways of interpolating
on p_y. Nevertheless, interpolation is mostly suitable when the attribute information y is composed
of real-valued vectors y ∈ R^n, not binary ones. This is not the case for the binary conditional information of
the datasets used in this paper (see section 5.1 for dataset information). Directly interpolating binary
vectors would not create plausible conditional information, as an interpolated vector y ∈ R^n would
belong neither to p_y nor to p_data, both defined over {0, 1}^n. Using kernel density estimation would not make
sense either, as all the binary labels would fall on the corners of a hypercube. Therefore, we
directly sample y′ from p_data.
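
This sampling strategy is trivial to implement; a sketch, assuming the training attribute vectors are stored as a binary tensor train_labels of shape (N, n_y):

```python
import torch

def sample_y_from_data(train_labels, batch_size):
    # y' ~ p_y with p_y = p_data: draw attribute vectors directly from the training set.
    idx = torch.randint(0, train_labels.size(0), (batch_size,))
    return train_labels[idx]
```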

4.3 Model architecture

Conditional GAN The work of this paper is based on the Torch implementation of DCGAN¹
[2]. We use the recommended configuration for the DCGAN, which trains with the Adam optimizer
[19] (β1 = 0.5, β2 = 0.999, ε = 10⁻⁸), a learning rate of 0.0002 and a mini-batch size of 64
(samples drawn independently at each update step) for 25 epochs. The output image size used as
a baseline is 64 × 64. Also, we train the cGAN with the matching-aware discriminator method from
Reed et al. [16]. In Figure 3 we show an overview of the architecture of both the generator and the
discriminator of the cGAN. For a more detailed description of the model see Table 1.

Table 1: Detailed generator and discriminator architecture


Generator
Operation         Kernel  Stride  Filters  BN   Activation
Concatenation     Concatenate z and y′ on 1st dimension
Full convolution  4×4     2×2     512      Yes  ReLU
Full convolution  4×4     2×2     256      Yes  ReLU
Full convolution  4×4     2×2     128      Yes  ReLU
Full convolution  4×4     2×2     64       Yes  ReLU
Full convolution  4×4     2×2     3        No   Tanh

Discriminator
Operation         Kernel  Stride  Filters  BN   Activation
Convolution       4×4     2×2     64       No   Leaky ReLU
Concatenation     Replicate y and concatenate to 1st conv. layer
Convolution       4×4     2×2     128      Yes  Leaky ReLU
Convolution       4×4     2×2     256      Yes  Leaky ReLU
Convolution       4×4     2×2     512      Yes  Leaky ReLU
Convolution       4×4     1×1     1        No   Sigmoid
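
For reference, a PyTorch sketch of the Table 1 architecture (the original implementation is in Torch/Lua). The padding values, the stride of the generator's first layer and the LeakyReLU slope of 0.2 are not given by the table; they follow standard DCGAN settings so that a 64×64 output is produced, and are therefore assumptions.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, nz=100, ny=18):
        super().__init__()
        def up(c_in, c_out, stride=2, pad=1):
            return [nn.ConvTranspose2d(c_in, c_out, 4, stride, pad, bias=False),
                    nn.BatchNorm2d(c_out), nn.ReLU(True)]
        self.net = nn.Sequential(
            *up(nz + ny, 512, stride=1, pad=0),  # 1x1 -> 4x4
            *up(512, 256),                       # 4x4 -> 8x8
            *up(256, 128),                       # 8x8 -> 16x16
            *up(128, 64),                        # 16x16 -> 32x32
            nn.ConvTranspose2d(64, 3, 4, 2, 1),  # 32x32 -> 64x64, no BN
            nn.Tanh(),
        )

    def forward(self, z, y):
        zy = torch.cat([z, y], dim=1)            # concatenate z and y' on the filter dimension
        return self.net(zy.view(zy.size(0), -1, 1, 1))

class Discriminator(nn.Module):
    def __init__(self, ny=18):
        super().__init__()
        self.conv1 = nn.Sequential(nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2, True))
        def down(c_in, c_out):
            return [nn.Conv2d(c_in, c_out, 4, 2, 1, bias=False),
                    nn.BatchNorm2d(c_out), nn.LeakyReLU(0.2, True)]
        self.rest = nn.Sequential(*down(64 + ny, 128), *down(128, 256), *down(256, 512),
                                  nn.Conv2d(512, 1, 4, 1, 0), nn.Sigmoid())

    def forward(self, x, y):
        h = self.conv1(x)                        # 64x64 -> 32x32 feature maps
        # Replicate y spatially and concatenate it to the first conv layer's output.
        y_map = y.view(y.size(0), -1, 1, 1).expand(-1, -1, h.size(2), h.size(3))
        return self.rest(torch.cat([h, y_map], dim=1)).view(-1)
```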

Encoder For simplicity, we show the architecture of the IND encoders (Table 2), as they are the ones
that give the best performance. Batch normalization and the non-linear activation function are removed
from the last layer to guarantee that the output distribution is similar to p_z = N(0, 1). Additionally,
after trying different configurations, we have replaced the last two convolutional layers with two fully
connected layers at the end of the encoder, which yields a lower error. The training configuration
(Adam optimizer, batch size, etc.) is the same as the one used for the cGAN model.

Table 2: Encoder IND architecture. The last two layers have different sizes depending on the encoder (z
for E_z or y for E_y). n_y represents the size of y.
Operation Kernel Stride Filters BN Activation
Convolution 5×5 2×2 32 Yes ReLU
Convolution 5×5 2×2 64 Yes ReLU
Convolution 5×5 2×2 128 Yes ReLU
Convolution 5×5 2×2 256 Yes ReLU
Fully connected - - z: 4096, y: 512 Yes ReLU
Fully connected - - z: 100, y: ny No None
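
A PyTorch sketch of the IND encoder of Table 2. The convolution padding is an assumption chosen so that a 64×64 input reaches a 4×4 feature map before the fully connected layers; use out_dim=100, hidden=4096 for E_z and out_dim=n_y, hidden=512 for E_y.

```python
import torch
import torch.nn as nn

class EncoderIND(nn.Module):
    def __init__(self, out_dim=100, hidden=4096):
        super().__init__()
        def block(c_in, c_out):
            return [nn.Conv2d(c_in, c_out, 5, 2, 2, bias=False),
                    nn.BatchNorm2d(c_out), nn.ReLU(True)]
        self.conv = nn.Sequential(*block(3, 32), *block(32, 64),
                                  *block(64, 128), *block(128, 256))  # 64x64 -> 4x4
        self.fc = nn.Sequential(
            nn.Linear(256 * 4 * 4, hidden), nn.BatchNorm1d(hidden), nn.ReLU(True),
            nn.Linear(hidden, out_dim),      # last layer: no BN, no activation
        )

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))
```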

¹ Torch code for the DCGAN model is available at https://github.com/soumith/dcgan.torch

5 Experiments
5.1 Datasets

We use two image datasets of different complexity and variation, MNIST [5] and CelebFaces
Attributes (CelebA) [6]. MNIST is a dataset of grayscale digit images composed of 60,000 training
images and 10,000 test images. Each sample is a 28 × 28 centered image labeled with the class of
the digit (0 to 9). CelebA is a dataset composed of 202,599 color face images, each annotated with
a 40-dimensional binary attribute vector. We use the aligned and cropped version and scale the images
down to 64 × 64. We also use the official train and test partitions, 182K images for training and 20K
for testing. Of the original 40 attributes, we filter out those that do not have a clear visual impact on
the generated images, which leaves a total of 18 attributes. We evaluate the quality of generated samples
on both datasets. However, a quantitative evaluation is performed on CelebA only, as it is considerably
more complex than MNIST.

5.2 Evaluating the conditional GAN

This experiment has two goals. First, we evaluate the general performance of the cGAN with
an attribute predictor network (Anet) on the CelebA dataset. Second, we test the impact of adding y at
different layers of the cGAN (section 4.2, conditional position).
We use an Anet² for quantitative evaluation, in a manner similar to the Inception model of Salimans
et al. [9], as the output of the Anet (i.e., which attributes are detected in a generated
sample) is a good indicator of the generator's ability to model them. In other words, if the attributes y′
predicted by the Anet are close to the original attributes y used to generate an image x′, we expect that
the generator has successfully learned to generate new images that respect the semantic
meaning of the attributes. Therefore, we use the generator G to create images x′ conditioned on
attribute vectors y ∼ p_data (i.e. x′ = G(z, y)) and have the Anet predict their attributes. Using the Anet
output, we build a confusion matrix for each attribute and compute the mean accuracy and F1-score
to test the model and the optimal position at which y is inserted in both the generator and the discriminator.
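
A sketch of this evaluation, assuming a trained generator G and an attribute predictor anet that outputs one probability per attribute; the 0.5 decision threshold and the single-batch processing are simplifying assumptions.

```python
import torch

def evaluate_cgan(G, anet, y_real, nz=100, eps=1e-8):
    # Generate x' = G(z, y) for real attribute vectors y and predict attributes with the Anet.
    with torch.no_grad():
        z = torch.randn(y_real.size(0), nz)
        y_pred = (anet(G(z, y_real)) > 0.5).float()

    accs, f1s = [], []
    for a in range(y_real.size(1)):          # one confusion matrix per attribute
        t, p = y_real[:, a], y_pred[:, a]
        tp = ((p == 1) & (t == 1)).sum().float()
        fp = ((p == 1) & (t == 0)).sum().float()
        fn = ((p == 0) & (t == 1)).sum().float()
        accs.append((p == t).float().mean())
        precision, recall = tp / (tp + fp + eps), tp / (tp + fn + eps)
        f1s.append(2 * precision * recall / (precision + recall + eps))
    return torch.stack(accs).mean().item(), torch.stack(f1s).mean().item()
```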

Table 3: Quantitative cGAN evaluation depending on y inserted position. The first row shows the
results obtained with real CelebA images as an indication that Anet predictions are subject to error.
Model                    Discriminator                       Generator
                         Mean accuracy   Mean F1-Score       Mean accuracy   Mean F1-Score
CelebA test set          92.78%          71.47%              92.78%          71.47%
y inserted on input      85.74%          49.63%              89.83%          59.69%
y inserted on layer 1    86.01%          52.42%              87.16%          52.40%
y inserted on layer 2    84.90%          50.00%              82.49%          52.36%
y inserted on layer 3    85.96%          52.38%              82.49%          38.01%
y inserted on layer 4    77.61%          19.49%              73.90%          4.03%

In Table 3 we can see that cGANs have successfully learned to generate the visual representations
of the conditional attributes, with an overall accuracy of ∼86%. The best accuracy is achieved by
inserting y in the first convolutional layer of the discriminator and at the input level of the generator;
thus, we use this configuration for the IcGAN. Both accuracy and F1-score are similar as
long as y is not inserted in the last convolutional layers, in which case the performance drops
considerably, especially in the generator. These results reinforce our initial intuition that y should be added
at an early stage of the model to allow learning interactions with it.

5.3 Evaluating the encoder

In this experiment, we prioritize the visual quality of reconstructed samples as an evaluation criterion.
Among the different encoder configurations of section 4.1, IND and IND-COND yield similar
qualitative performance, with IND being slightly superior. A comparison of these configurations
is shown in Figure 4a, and in Figure 4b we focus on IND reconstructed samples. Moreover, the
fact that the generator is able, via an encoder, to reconstruct unseen images from the test set shows
that the cGAN generalizes and suggests that it does not suffer from overfitting, i.e., it is not just
memorizing and reproducing training samples.
² The architecture of the Anet is the same as that of E_y from Table 2.

Figure 4: (a) Comparison of different encoder configurations, where IND yields the most faithful
reconstructions. (b) Reconstructed samples from MNIST and CelebA using IND configuration.

Additionally, we compare the different encoder configurations quantitatively by using the
minimal squared reconstruction loss L_e as a criterion. Each encoder is trained by minimizing L_e with
respect to the latent representation z (L_ez) or the conditional information y (L_ey). Then, we quantitatively
evaluate the different model architectures using L_e as a metric on a test set of 150K generated CelebA
images. We find that the encoder that yields the lowest L_e is also IND (0.429), followed closely by
IND-COND (0.432), with SNG being the worst case (0.500).
Furthermore, we can see an interesting property of minimizing a loss based on the latent space instead
of a pixel-wise image reconstruction: reconstructed images tend to accurately keep the high-level features
of the input image (e.g. how a face generally looks) at the expense of more local details such as the
exact position of the hair, eyes or face. Consequently, a latent-space-based encoder is invariant to
these local details, making it an interesting approach for encoding purposes. For example, notice how
the reconstructions in the last row of the CelebA samples in Figure 4b fill in the part of the face
occluded by a hand. Another advantage with respect to element-wise encoders such as VAEs is that
GAN-based reconstructions do not look blurry.

5.4 Evaluating the IcGAN

To test that the model is able to correctly encode and re-generate a real image while preserving
its main attributes, we take real samples from the MNIST and CelebA test sets and reconstruct them with
modifications to the conditional information y. The result of this procedure is shown in Figure 5,
where, for image clarity, we show a subset of 9 of the 18 CelebA attributes. We can see that, on
MNIST, we are able to capture the handwritten style of real unseen digits and replicate that style on all
the other digits. On the other hand, on CelebA we can see how the reconstructed faces generally match
the specified attribute. Additionally, we noticed that faces with uncommon conditions (e.g., looking
away from the camera, face not centered) were the most likely to be noisy. Furthermore, attributes
such as mustache often fail to be generated, especially on women, which might indicate that
the generator has limitations with some unusual attribute combinations.
Manipulating the latent space The latent feature representation z and conditional information y
learned by the generator can be further explored beyond encoding real images or randomly sampling
z. To do so, we linearly interpolate both z and y between pairs of reconstructed images from the
CelebA test set (Figure 6a). All the interpolated faces are plausible and the transition between faces
is smooth, demonstrating that the manifold learned by the IcGAN is also consistent under interpolation.
This is also a good indicator that the model generalizes the face representation properly, as it
is not directly memorizing training samples.
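
A minimal sketch of this interpolation, assuming trained encoders E_z, E_y and generator G (hypothetical names, PyTorch for illustration):

```python
import torch

def interpolate_pair(E_z, E_y, G, x_a, x_b, steps=8):
    # Linearly interpolate both z and y between two reconstructed images (cf. Figure 6a).
    with torch.no_grad():
        z_a, y_a = E_z(x_a), E_y(x_a)
        z_b, y_b = E_z(x_b), E_y(x_b)
        frames = [G((1 - t) * z_a + t * z_b, (1 - t) * y_a + t * y_b)
                  for t in torch.linspace(0, 1, steps)]
    return torch.cat(frames)
```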
In addition, in Figure 6b we perform an attribute transfer between pairs of faces. We infer the latent
representation z and attribute information y of two real faces from the test set, swap y between
the two faces and re-generate them. As previously noticed, the results suggest that z encodes pose,
illumination and background information, while y tends to represent unique features of the face.
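
Given the same trained components, the attribute transfer reduces to a few lines (again a sketch with hypothetical names):

```python
import torch

def swap_attributes(E_z, E_y, G, x_a, x_b):
    # Each face keeps its own z but is re-generated with the other face's y (cf. Figure 6b).
    with torch.no_grad():
        z_a, y_a = E_z(x_a), E_y(x_a)
        z_b, y_b = E_z(x_b), E_y(x_b)
        return G(z_a, y_b), G(z_b, y_a)
```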

Figure 5: The result of applying an IcGAN to a set of real images from MNIST (a) and CelebA (b). A
real image is encoded into a latent representation z and conditional information y, and then decoded
into a new image. We fix z for every row and modify y for each column to obtain variations.

Figure 6: Different ways of exploring the latent space. In (a) we take two real images and linearly
interpolate both z and y to obtain a gradual transformation from one face to another. In (b) we take
two real images, reconstruct them and swap the attribute information y between them.

6 Conclusions
We introduce an encoder in a conditional setting within the GAN framework, a model we call
Invertible Conditional GANs (IcGANs). It solves the problem of GANs lacking the ability to map
real samples to a latent representation z, while also allowing explicit control over complex attributes
of generated samples through the conditional information y. We also refine the performance of cGANs
by testing the optimal position at which the conditional information y is inserted in the model. We
found that for the generator, y should be added at the input level, whereas the discriminator
works best when y is inserted at the first convolutional layer. Additionally, we evaluate several ways of
training an encoder. Training two independent encoders – one for encoding z and another for encoding
y – proved to be the best option in our experiments. The results obtained with a complex face dataset,
CelebA, are satisfactory and promising.
Acknowledgments This work is funded by the project TIN2013-41751-P of the Spanish Ministry
of Science and the CHIST-ERA project PCIN-2015-226.

References
[1] K. Gregor, I. Danihelka, A. Graves, D. Jimenez Rezende, and D. Wierstra, “DRAW: A Recurrent Neural
Network For Image Generation,” International Conference on Machine Learning (ICML), pp. 1462–1471,
2015.
[2] A. Radford, L. Metz, and S. Chintala, “Unsupervised Representation Learning with Deep Convolutional
Generative Adversarial Networks,” International Conference on Learning Representations (ICLR), 2016.
[3] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio,
“Generative adversarial nets,” in Advances in Neural Information Processing Systems 27, Z. Ghahramani,
M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2014, pp.
2672–2680. [Online]. Available: http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf
[4] D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” International Conference on Learning
Representations (ICLR), 2014. [Online]. Available: http://arxiv.org/abs/1312.6114
[5] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,”
in Proceedings of the IEEE, 1998, pp. 2278–2324.
[6] Z. Liu, P. Luo, X. Wang, and X. Tang, “Deep learning face attributes in the wild,” in Proceedings of
International Conference on Computer Vision (ICCV), December 2015.
[7] D. J. Rezende, S. Mohamed, and D. Wierstra, “Stochastic backpropagation and approximate inference in
deep generative models,” Proceedings of the 31st International Conference on Machine Learning (ICML),
vol. 32, pp. 1278–1286, 2014.
[8] D. P. Kingma, D. J. Rezende, S. Mohamed, and M. Welling, “Semi-supervised learning with deep
generative models,” Proceedings of Neural Information Processing Systems (NIPS), 2014. [Online].
Available: http://arxiv.org/abs/1406.5298
[9] T. Salimans, I. J. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen, “Improved
techniques for training GANs,” Neural Information Processing Systems (NIPS), 2016. [Online]. Available:
http://arxiv.org/abs/1606.03498
[10] X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel, “InfoGAN: Interpretable
Representation Learning by Information Maximizing Generative Adversarial Nets,” Neural Information
Processing Systems (NIPS), 2016. [Online]. Available: http://arxiv.org/abs/1606.03657
[11] S. Reed, Z. Akata, S. Mohan, S. Tenka, B. Schiele, and H. Lee, “Learning what and where to draw,” in
Advances in Neural Information Processing Systems (NIPS), 2016.
[12] V. Dumoulin, I. Belghazi, B. Poole, A. Lamb, M. Arjovsky, O. Mastropietro, and A. Courville,
“Adversarially Learned Inference,” Arxiv, 2016. [Online]. Available: http://arxiv.org/abs/1606.00704
[13] J. Donahue, P. Krähenbühl, and T. Darrell, “Adversarial feature learning,” CoRR, vol. abs/1605.09782,
2016. [Online]. Available: http://arxiv.org/abs/1605.09782
[14] A. Makhzani, J. Shlens, N. Jaitly, and I. Goodfellow, “Adversarial autoencoders,” in International
Conference on Learning Representations (ICLR), 2016. [Online]. Available: http://arxiv.org/abs/1511.05644
[15] A. B. L. Larsen, S. K. Sønderby, and O. Winther, “Autoencoding beyond pixels using a learned similarity
metric,” Proceedings of the 33rd International Conference on Machine Learning (ICML), pp. 1558–1566,
2015. [Online]. Available: http://arxiv.org/abs/1512.09300
[16] S. E. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee, “Generative adversarial text to
image synthesis,” International Conference on Machine Learning (ICML), 2016. [Online]. Available:
http://arxiv.org/abs/1605.05396
[17] M. Mirza and S. Osindero, “Conditional Generative Adversarial Nets,” CoRR, vol. abs/1411.1784, 2014.
[Online]. Available: http://arxiv.org/abs/1411.1784
[18] J. Gauthier, “Conditional generative adversarial nets for convolutional face generation,” Class project for
Stanford CS231N: Convolutional Neural Networks for Visual Recognition, Winter semester 2014, 2014.
[Online]. Available: http://cs231n.stanford.edu/reports/jgauthie_final_report.pdf
[19] D. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” International Conference for
Learning Representations (ICLR), 2014. [Online]. Available: http://www.arxiv.org/pdf/1412.6980.pdf
