Abstract
Digital cartoon production requires extensive manual labor to colorize sketches with a visually pleasing color composition and color shading. During colorization, the artist usually takes an existing cartoon image as color guidance, particularly when colorizing related characters or an animation sequence. Reference-guided colorization is more intuitive than colorization with other hints, such as color points, scribbles, or text-based tags. Unfortunately, reference-guided colorization is challenging, since the style of the colorized image should match that of the reference image in terms of both global color composition and local color shading. In this paper, we propose a novel learning-based framework that colorizes a sketch based on a color style feature extracted from a reference color image. Our framework contains a color style extractor to extract the color feature from a color image, a colorization network to generate multi-scale output images by combining a sketch with a color feature, and a multi-scale discriminator to improve the realism of the output image. Extensive qualitative and quantitative evaluations show that our method outperforms existing methods, providing both superior visual quality and stronger style consistency with the reference in the task of reference-based colorization.
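The abstract describes injecting a color style feature extracted from a reference image into the sketch-colorization network. One widely used mechanism for this kind of feature injection is adaptive instance normalization (AdaIN), which re-normalizes the content features so their per-channel statistics match the style's. The NumPy sketch below is purely illustrative of that general idea, not the paper's actual implementation; the function name, shapes, and toy data are assumptions.

```python
import numpy as np

def adain(content_feat, style_mean, style_std, eps=1e-5):
    """Adaptive instance normalization (illustrative, not the paper's method):
    re-normalize content features of shape (C, H, W) so that each channel's
    mean/std match the given style statistics."""
    c_mean = content_feat.mean(axis=(1, 2), keepdims=True)
    c_std = content_feat.std(axis=(1, 2), keepdims=True)
    normalized = (content_feat - c_mean) / (c_std + eps)  # zero mean, unit std per channel
    return normalized * style_std + style_mean

# Toy usage: a 4-channel "sketch" feature map modulated by a "reference" style.
rng = np.random.default_rng(0)
content = rng.normal(size=(4, 8, 8))                      # hypothetical sketch features
style_mean = rng.normal(size=(4, 1, 1))                   # hypothetical style statistics
style_std = np.abs(rng.normal(size=(4, 1, 1))) + 0.1

out = adain(content, style_mean, style_std)
```

After modulation, each channel of `out` carries the style's mean and standard deviation while keeping the content's spatial structure, which is the intuition behind transferring global color composition from a reference.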

Acknowledgements
This work was supported in part by a CIHE Institutional Development Grant No. IDG200107, the National Natural Science Foundation of China under Grant No. 61973221, and the Natural Science Foundation of Guangdong Province of China under Grant Nos. 2018A030313381 and 2019A1515011165.
Author information
Xueting Liu received her B.Eng. degree in computer science and technology from Tsinghua University and her Ph.D. degree in computer science and engineering from the Chinese University of Hong Kong in 2009 and 2014, respectively. She is currently an assistant professor in the School of Computing and Information Sciences, Caritas Institute of Higher Education. Her research interests include computational art, intelligent art, computer vision, and computer graphics.
Wenliang Wu received his B.Sc. degree from Guangdong Ocean University of Science and Technology in 2019. He is currently a graduate student in the College of Computer Science and Software Engineering, Shenzhen University. His research interests include computer vision and computer graphics.
Chengze Li received her B.Eng. degree from the University of Science and Technology of China in 2013, and her Ph.D. degree in computer science and engineering from the Chinese University of Hong Kong in 2020. She is currently an assistant professor in the School of Computing and Information Sciences, Caritas Institute of Higher Education, with research interests in 2D non-photorealistic media analysis and processing, computational photography, and computer graphics.
Yifan Li received his B.Sc. degree from Jiangxi University of Science and Technology in 2018 and is now a graduate student in the College of Computer Science and Software Engineering, Shenzhen University. His research interests include computer graphics, computer vision, machine learning, and deep learning.
Huisi Wu received his B.E. and M.E. degrees in computer science from Xi’an Jiaotong University in 2004 and 2007, respectively. He obtained his Ph.D. degree in computer science from the Chinese University of Hong Kong in 2011. He is currently an associate professor in the College of Computer Science and Software Engineering, Shenzhen University. His research interests include computer graphics, image processing, and medical imaging.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Liu, X., Wu, W., Li, C. et al. Reference-guided structure-aware deep sketch colorization for cartoons. Comp. Visual Media 8, 135–148 (2022). https://doi.org/10.1007/s41095-021-0228-6