Abstract
We consider the problem of image composition: combining an arbitrary foreground object with an arbitrary background. To achieve this we use a factorized latent space, introducing a model called the “Background and Foreground VAE” (BFVAE) that can combine arbitrary foregrounds and backgrounds from an image dataset to generate unseen images. To enhance the quality of the generated images we also propose a mixed VAE-GAN model called the “Latent Space Renderer-GAN” (LSR-GAN), which substantially reduces the blurriness of BFVAE images.
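The core idea of a factorized latent space can be illustrated with a toy sketch: an encoder maps each image to two separate latent codes, one for the foreground and one for the background, and a decoder renders an image from any (foreground, background) pairing. The sketch below uses random linear maps in NumPy purely as stand-ins for the trained networks; the dimensions and function names are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the paper): latent codes for foreground and
# background, and a flattened image vector.
Z_FG, Z_BG, IMG = 8, 8, 32

# Toy linear "encoders" and "decoder" standing in for trained networks.
enc_fg = rng.standard_normal((Z_FG, IMG))
enc_bg = rng.standard_normal((Z_BG, IMG))
dec = rng.standard_normal((IMG, Z_FG + Z_BG))

def encode(image):
    """Split an image into separate (foreground, background) latent codes."""
    return enc_fg @ image, enc_bg @ image

def compose(z_fg, z_bg):
    """Decode an image from an arbitrary foreground/background pairing."""
    return dec @ np.concatenate([z_fg, z_bg])

img_a = rng.standard_normal(IMG)  # image supplying the foreground
img_b = rng.standard_normal(IMG)  # image supplying the background

z_fg_a, _ = encode(img_a)
_, z_bg_b = encode(img_b)
composite = compose(z_fg_a, z_bg_b)  # an unseen combination of the two
```

Because the two latent codes are independent factors, swapping either one changes only the corresponding part of the generated image; this is the property the BFVAE training objective is designed to enforce.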
Copyright information
© 2022 Springer Nature Switzerland AG
Cite this paper
Zeng, Z., Hare, J., Prügel-Bennett, A. (2022). Compositing Foreground and Background Using Variational Autoencoders. In: El Yacoubi, M., Granger, E., Yuen, P.C., Pal, U., Vincent, N. (eds) Pattern Recognition and Artificial Intelligence. ICPRAI 2022. Lecture Notes in Computer Science, vol 13363. Springer, Cham. https://doi.org/10.1007/978-3-031-09037-0_45
Print ISBN: 978-3-031-09036-3
Online ISBN: 978-3-031-09037-0