Single Cross-domain Semantic Guidance Network for Multimodal Unsupervised Image Translation

  • Conference paper

MultiMedia Modeling (MMM 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13833)

Abstract

Multimodal image-to-image translation has received great attention due to its flexibility and practicality. Existing methods lack a general and effective style representation and cannot capture different levels of stylistic semantic information from cross-domain images. Moreover, they ignore parallelism in cross-domain image generation: each generator serves only a specific domain. To address these issues, we propose a novel Single Cross-domain Semantic Guidance Network (SCSG-Net) for coarse-to-fine, semantically controllable multimodal image translation. Images from different domains are mapped into a unified visual semantic latent space by a dual sparse feature pyramid encoder; the generative module then produces the output images by extracting a semantic style representation from the input images in a self-supervised manner, guided by adaptive discrimination. In particular, SCSG-Net accommodates user needs across different styles as well as diverse scenarios. Extensive experiments on several benchmark datasets show that our method outperforms other state-of-the-art methods both quantitatively and qualitatively.
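
The abstract describes a pipeline built from three ideas: a single pyramid-style encoder shared across domains, a unified style latent space, and one generator conditioned on a style code rather than tied to a fixed target domain. Below is a minimal PyTorch sketch of that idea. It is an illustrative assumption, not the paper's implementation: the module names, channel sizes, and the AdaIN-style injection are hypothetical stand-ins, since this page does not reproduce the actual architecture.

```python
# Minimal sketch of a shared pyramid style encoder plus a single
# style-conditioned generator, loosely following the abstract.
# All names, sizes, and the AdaIN-based injection are hypothetical.
import torch
import torch.nn as nn


class PyramidStyleEncoder(nn.Module):
    """Maps an image from any domain into one shared style space.

    Stand-in for the paper's dual sparse feature pyramid encoder: each
    pyramid level contributes one slice of the style code, so both
    coarse and fine stylistic semantics are represented.
    """

    def __init__(self, level_dim=64):
        super().__init__()
        chans = [3, 64, 128, 256]
        self.stages = nn.ModuleList(
            nn.Sequential(nn.Conv2d(ci, co, 4, 2, 1), nn.LeakyReLU(0.2))
            for ci, co in zip(chans[:-1], chans[1:]))
        self.heads = nn.ModuleList(nn.Linear(c, level_dim) for c in chans[1:])
        self.style_dim = level_dim * len(self.heads)

    def forward(self, x):
        parts = []
        for stage, head in zip(self.stages, self.heads):
            x = stage(x)
            parts.append(head(x.mean(dim=(2, 3))))  # pool each pyramid level
        return torch.cat(parts, dim=1)  # coarse-to-fine style code


class AdaIN(nn.Module):
    """Injects a style code as per-channel scale/shift after instance norm."""

    def __init__(self, style_dim, channels):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.affine = nn.Linear(style_dim, channels * 2)

    def forward(self, x, s):
        scale, shift = self.affine(s).chunk(2, dim=1)
        return self.norm(x) * (1 + scale[:, :, None, None]) + shift[:, :, None, None]


class Generator(nn.Module):
    """One generator for all domains: content comes from the source image,
    style comes from the reference image's code."""

    def __init__(self, style_dim):
        super().__init__()
        self.down = nn.Sequential(nn.Conv2d(3, 64, 4, 2, 1), nn.ReLU(),
                                  nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU())
        self.adain = AdaIN(style_dim, 128)
        self.up = nn.Sequential(nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),
                                nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh())

    def forward(self, content_img, style_code):
        return self.up(self.adain(self.down(content_img), style_code))


enc = PyramidStyleEncoder()
gen = Generator(enc.style_dim)
content = torch.randn(1, 3, 64, 64)    # source image (domain A)
reference = torch.randn(1, 3, 64, 64)  # style exemplar (domain B)
out = gen(content, enc(reference))     # output styled by the reference
print(out.shape)                       # torch.Size([1, 3, 64, 64])
```

Because the encoder and generator are shared, swapping the reference image alone changes the output style, which is the cross-domain parallelism the abstract emphasizes.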

Acknowledgements

This work was supported in part by the Science and Technology Research in Key Areas in Foshan under Grant 2020001006832, the Key-Area Research and Development Program of Guangdong Province under Grants 2018B010109007 and 2019B010153002, the Guangzhou R&D Programme in Key Areas of Science and Technology Projects under Grant 202007040006, the Guangdong Provincial Key Laboratory of Cyber-Physical System under Grant 2020B1212060069, and the Program of Marine Economy Development (Six Marine Industries) Special Foundation of the Department of Natural Resources of Guangdong Province under Grant GDNRC [2020]056.

Author information

Corresponding author

Correspondence to Guoheng Huang.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Lan, J. et al. (2023). Single Cross-domain Semantic Guidance Network for Multimodal Unsupervised Image Translation. In: Dang-Nguyen, DT., et al. MultiMedia Modeling. MMM 2023. Lecture Notes in Computer Science, vol 13833. Springer, Cham. https://doi.org/10.1007/978-3-031-27077-2_13

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-27077-2_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-27076-5

  • Online ISBN: 978-3-031-27077-2

  • eBook Packages: Computer Science, Computer Science (R0)
