
Compositing Foreground and Background Using Variational Autoencoders

  • Conference paper
  • First Online:
Pattern Recognition and Artificial Intelligence (ICPRAI 2022)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13363))


Abstract

We consider the problem of composing images by combining an arbitrary foreground object with some background. To achieve this we use a factorized latent space, introducing a model called the “Background and Foreground VAE” (BFVAE) that can combine the foreground of one image with the background of another, drawn arbitrarily from an image dataset, to generate unseen images. To enhance the quality of the generated images we also propose a VAE-GAN hybrid called the “Latent Space Renderer-GAN” (LSR-GAN), which substantially reduces the blurriness of images generated by BFVAE.
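The core mechanism the abstract describes, a latent space factorized into foreground and background parts so that the parts of two images can be recombined at decode time, can be sketched as follows. This is a minimal illustrative sketch only, not the authors' implementation: the linear `encode`/`decode` maps, the latent sizes, and all variable names are placeholder assumptions standing in for trained BFVAE networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: flattened image size and the two latent factors.
D, Z_FG, Z_BG = 64, 8, 8

# Hypothetical linear maps standing in for trained encoder/decoder networks.
W_enc = rng.standard_normal((Z_FG + Z_BG, D)) * 0.1
W_dec = rng.standard_normal((D, Z_FG + Z_BG)) * 0.1

def encode(x):
    """Map an image to a factorized latent: (foreground part, background part)."""
    z = W_enc @ x
    return z[:Z_FG], z[Z_FG:]

def decode(z_fg, z_bg):
    """Render an image from concatenated foreground and background latents."""
    return W_dec @ np.concatenate([z_fg, z_bg])

x_a = rng.standard_normal(D)  # image A (flattened)
x_b = rng.standard_normal(D)  # image B (flattened)

z_fg_a, z_bg_a = encode(x_a)
z_fg_b, z_bg_b = encode(x_b)

# Compositing: the foreground latent of A decoded against the background latent of B.
x_comp = decode(z_fg_a, z_bg_b)
```

The point of the sketch is that compositing reduces to swapping one block of the latent vector before decoding; the paper's contribution lies in training the encoder so the two blocks actually capture foreground and background, and in the LSR-GAN stage that sharpens the decoded output.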



Author information

Correspondence to Zezhen Zeng.


Copyright information

© 2022 Springer Nature Switzerland AG

About this paper


Cite this paper

Zeng, Z., Hare, J., Prügel-Bennett, A. (2022). Compositing Foreground and Background Using Variational Autoencoders. In: El Yacoubi, M., Granger, E., Yuen, P.C., Pal, U., Vincent, N. (eds) Pattern Recognition and Artificial Intelligence. ICPRAI 2022. Lecture Notes in Computer Science, vol 13363. Springer, Cham. https://doi.org/10.1007/978-3-031-09037-0_45


  • DOI: https://doi.org/10.1007/978-3-031-09037-0_45

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-09036-3

  • Online ISBN: 978-3-031-09037-0

  • eBook Packages: Computer Science (R0)

