Abstract
Multimodal image-to-image translation has received great attention for its flexibility and practicality. Existing methods lack a general and effective style representation and cannot capture different levels of stylistic semantic information from cross-domain images. Moreover, they ignore parallelism in cross-domain image generation: each generator can serve only a specific domain. To address these issues, we propose a novel Single Cross-domain Semantic Guidance Network (SCSG-Net) for coarse-to-fine, semantically controllable multimodal image translation. Images from different domains are mapped into a unified visual semantic latent space by a dual sparse feature pyramid encoder, and a generative module then synthesizes the output images by extracting semantic style representations from the input images in a self-supervised manner, guided by adaptive discrimination. In particular, SCSG-Net accommodates user needs across different styles and diverse scenarios. Extensive experiments on several benchmark datasets show that our method outperforms state-of-the-art methods both quantitatively and qualitatively.
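To make the pipeline above easier to picture, here is a minimal PyTorch sketch of the general idea: one shared encoder maps images from any domain into a common latent space, a style head pools encoder features into a style code, and a single generator is modulated by that code (AdaIN-style). Every module name, layer width, and the two-level feature pyramid here are illustrative assumptions; the abstract does not specify SCSG-Net's actual architecture.

```python
# Illustrative sketch only: a single shared encoder, a unified latent space,
# and one generator modulated by a style code from a reference image.
# Module names, sizes, and the 2-level "pyramid" are assumptions, not the
# paper's design.
import torch
import torch.nn as nn


class PyramidEncoder(nn.Module):
    """Shared encoder: returns fine and coarse feature maps (a 2-level pyramid)."""
    def __init__(self, ch=64):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, ch, 4, 2, 1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(ch, ch * 2, 4, 2, 1), nn.ReLU())

    def forward(self, x):
        f1 = self.stage1(x)   # fine features,   H/2 x W/2
        f2 = self.stage2(f1)  # coarse features, H/4 x W/4
        return f1, f2


class StyleHead(nn.Module):
    """Pools encoder features into a compact style code."""
    def __init__(self, ch=128, style_dim=64):
        super().__init__()
        self.fc = nn.Linear(ch, style_dim)

    def forward(self, f):
        return self.fc(f.mean(dim=(2, 3)))  # global average pool -> vector


class AdaIN(nn.Module):
    """Adaptive instance norm: style code predicts per-channel scale/shift."""
    def __init__(self, ch, style_dim):
        super().__init__()
        self.norm = nn.InstanceNorm2d(ch, affine=False)
        self.affine = nn.Linear(style_dim, ch * 2)

    def forward(self, x, s):
        gamma, beta = self.affine(s).chunk(2, dim=1)
        return self.norm(x) * (1 + gamma[..., None, None]) + beta[..., None, None]


class Generator(nn.Module):
    """Single generator shared by all domains; the style code picks the output look."""
    def __init__(self, ch=128, style_dim=64):
        super().__init__()
        self.adain = AdaIN(ch, style_dim)
        self.up = nn.Sequential(
            nn.Upsample(scale_factor=4, mode="nearest"),
            nn.Conv2d(ch, 3, 3, 1, 1), nn.Tanh())

    def forward(self, content_feat, style_code):
        return self.up(self.adain(content_feat, style_code))


if __name__ == "__main__":
    enc, style, gen = PyramidEncoder(), StyleHead(), Generator()
    content_img = torch.randn(1, 3, 64, 64)  # image supplying content
    ref_img = torch.randn(1, 3, 64, 64)      # image supplying style
    _, c_coarse = enc(content_img)
    _, s_coarse = enc(ref_img)
    out = gen(c_coarse, style(s_coarse))
    print(out.shape)                         # torch.Size([1, 3, 64, 64])
```

In this kind of design, swapping the reference image changes only the style code while the encoder and generator stay fixed, which is why a single generator can serve multiple domains, consistent with the parallelism the abstract claims.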
Acknowledgements
This work was supported in part by the Science and Technology Research in Key Areas in Foshan under Grant 2020001006832, the Key-Area Research and Development Program of Guangdong Province under Grants 2018B010109007 and 2019B010153002, the Guangzhou R&D Programme in Key Areas of Science and Technology Projects under Grant 202007040006, the Guangdong Provincial Key Laboratory of Cyber-Physical System under Grant 2020B1212060069, and the Program of Marine Economy Development (Six Marine Industries) Special Foundation of Department of Natural Resources of Guangdong Province under Grant GDNRC [2020]056.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Lan, J. et al. (2023). Single Cross-domain Semantic Guidance Network for Multimodal Unsupervised Image Translation. In: Dang-Nguyen, DT., et al. MultiMedia Modeling. MMM 2023. Lecture Notes in Computer Science, vol 13833. Springer, Cham. https://doi.org/10.1007/978-3-031-27077-2_13
Print ISBN: 978-3-031-27076-5
Online ISBN: 978-3-031-27077-2