SynthText3D: synthesizing scene text images from 3D virtual worlds

Liao, Minghui; Song, Boyu; Long, Shangbang; He, Minghang; Yao, Cong; Bai, Xiang

doi:10.1007/s11432-019-2737-0

SynthText3D: synthesizing scene text images from 3D virtual worlds

Research Paper
Special Focus on Deep Learning for Computer Vision
Published: 15 January 2020

Volume 63, article number 120105, (2020)
Cite this article

Science China Information Sciences Aims and scope Submit manuscript

403 Accesses
31 Citations
Explore all metrics

Abstract

With the development of deep neural networks, the demand for a significant amount of annotated training data becomes the performance bottlenecks in many fields of research and applications. Image synthesis can generate annotated images automatically and freely, which gains increasing attention recently. In this paper, we propose to synthesize scene text images from the 3D virtual worlds, where the precise descriptions of scenes, editable illumination/visibility, and realistic physics are provided. Different from the previous methods which paste the rendered text on static 2D images, our method can render the 3D virtual scene and text instances as an entirety. In this way, real-world variations, including complex perspective transformations, various illuminations, and occlusions, can be realized in our synthesized scene text images. Moreover, the same text instances with various viewpoints can be produced by randomly moving and rotating the virtual camera, which acts as human eyes. The experiments on the standard scene text detection benchmarks using the generated synthetic data demonstrate the effectiveness and superiority of the proposed method.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Text to Image Synthesis Using Bridge Generative Adversarial Network and Char CNN Model

UDiffText: A Unified Framework for High-Quality Text Synthesis in Arbitrary Images via Character-Aware Diffusion Models

Controllable 3D Object Generation with Single Image Prompt

References

Gupta A, Vedaldi A, Zisserman A. Synthetic data for text localisation in natural images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016. 2315–2324
Zhan F, Lu S, Xue C. Verisimilar image synthesis for accurate detection and recognition of texts in scenes. In: Proceedings of European Conference on Computer Vision, 2018. 249–266
Jaderberg M, Simonyan K, Vedaldi A, et al. Synthetic data and artificial neural networks for natural scene text recognition. 2014. ArXiv: 1406.2227
Zhu Z, Huang T, Shi B, et al. Progressive pose attention transfer for person image generation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019. 2347–2356
Varol G, Romero J, Martin X, et al. Learning from synthetic humans. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017. 109–117
Papon J, Schoeler M. Semantic pose using deep networks trained on synthetic RGB-D. In: Proceedings of IEEE International Conference on Computer Vision, 2015. 774–782
McCormac J, Handa A, Leutenegger S, et al. Scenenet RGB-D: 5 m photorealistic images of synthetic indoor trajectories with ground truth. 2016. ArXiv: 1612.05079
Ros G, Sellart L, Materzynska J, et al. The synthia dataset: a large collection of synthetic images for semantic segmentation of urban scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016. 3234–3243
Saleh F S, Aliakbarian M S, Salzmann M, et al. Effective use of synthetic data for urban scene semantic segmentation. In: Proceedings of European Conference on Computer Vision, 2018. 86–103
Peng X, Sun B, Ali K, et al. Learning deep object detectors from 3D models. In: Proceedings of IEEE International Conference on Computer Vision, 2015. 1278–1286
Tremblay J, To T, Birchfield S. Falling things: a synthetic dataset for 3D object detection and pose estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018. 2038–2041
Hinterstoisser S, Pauly O, Heibel H, et al. An annotation saved is an annotation earned: using fully synthetic training for object instance detection. 2019. ArXiv: 1902.09967
Ye Y Y, Zhang C, Hao X L. Arpnet: attention regional proposal network for 3D object detection. Sci China Inf Sci, 2019, 62: 220104
Article Google Scholar
Cao J, Pang Y, Li X. Learning multilayer channel features for pedestrian detection. IEEE Trans Image Process, 2017, 26: 3210–3220
Article MathSciNet MATH Google Scholar
Cao J, Pang Y, Li X. Pedestrian detection inspired by appearance constancy and shape symmetry. IEEE Trans Image Process, 2016, 25: 5538–5551
Article MathSciNet MATH Google Scholar
Quiter C, Ernst M. deepdrive/deepdrive: 2.0. 2018. https://zenodo.org/record/1248998#.Xhd25Ef0laQ
Martinez M, Sitawarin C, Finch K, et al. Beyond grand theft auto V for training, testing and enhancing deep learning in self driving cars. 2017. ArXiv: 1712.01397
Qiu W, Yuille A. Unrealcv: connecting computer vision to unreal engine. In: Proceedings of European Conference on Computer Vision, 2016. 909–916
Ganoni O, Mukundan R. A framework for visually realistic multi-robot simulation in natural environment. 2017. ArXiv: 1708.01938
Wang T, Wu J D, Coates A, et al. End-to-end text recognition with convolutional neural networks. In: Proceedings of the 21st International Conference on Pattern Recognition (ICPR), 2012. 3304–3308
Zhan F, Zhu H, Lu S. Spatial fusion gan for image synthesis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019. 3653–3662
Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. In: Proceedings of Advances in Neural Information Processing Systems, 2014. 2672–2680
Ye Q, Doermann D. Text detection and recognition in imagery: a survey. IEEE Trans Pattern Anal Mach Intell, 2015, 37: 1480–1500
Article Google Scholar
Bai X, Yang M K, Shi B G, et al. Deep learning for scene text detection and recognition (in Chinese). Sci Sin Inform, 2018, 48: 531–544
Article Google Scholar
Liu Y, Jin L, Zhang S, et al. Detecting curve text in the wild: new dataset and new solution. 2017. ArXiv: 1712.02170
Liao M, Shi B, Bai X, et al. TextBoxes: a fast text detector with a single deep neural network. In: Proceedings of the AAAI Conference on Artificial Intelligence, 2017. 4161–4167
Ma J, Shao W, Ye H, et al. Arbitrary-oriented scene text detection via rotation proposals. IEEE Trans Multimedia, 2018, 20: 3111–3122
Article Google Scholar
Liu Y, Jin L. Deep matching prior network: toward tighter multi-oriented text detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017. 1962–1969
He W, Zhang Y-X, Yin F, et al. Deep direct regression for multi-oriented scene text detection. In: Proceedings of IEEE International Conference on Computer Vision, 2017. 745–753
Zhou X, Yao C, Wen H, et al. EAST: an efficient and accurate scene text detector. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017. 5551–5560
Liao M, Zhu Z, Shi B, et al. Rotation-sensitive regression for oriented scene text detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018. 5909–5918
Liao M, Lyu P, He M, et al. Mask TextSpotter: an end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Trans Pattern Anal Mach Intell, 2019
Ren S, He K, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks. In: Proceedings of Advances in Neural Information Processing Systems, 2015. 91–99
Liu W, Anguelov D, Erhan D, et al. SSD: single shot multibox detector. In: Proceedings of European Conference on Computer Vision, 2016. 21–37
Liao M, Shi B, Bai X. TextBoxes++: a single-shot oriented scene text detector. IEEE Trans Image Process, 2018, 27: 3676–3690
Article MathSciNet MATH Google Scholar
Shi B, Bai X, Belongie S. Detecting oriented text in natural images by linking segments. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017. 2550–2558
Wu Y, Natarajan P. Self-organized text detection with minimal post-processing via border learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017. 5000–5009
Long S, Ruan J, Zhang W, et al. Textsnake: a flexible representation for detecting text of arbitrary shapes. In: Proceedings of European Conference on Computer Vision, 2018. 20–36
Deng D, Liu H, Li X, et al. Pixellink: detecting scene text via instance segmentation. In: Proceedings of the AAAI Conference on Artificial Intelligence, 2018. 6773–6780
Lyu P, Yao C, Wu W, et al. Multi-oriented scene text detection via corner localization and region segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018. 7553–7563
Chen J, Lian Z H, Wang Y Z, et al. Irregular scene text detection via attention guided border labeling. Sci China Inf Sci, 2019, 62: 220103
Article Google Scholar
Arbeláez P, Maire M, Fowlkes C, et al. Contour detection and hierarchical image segmentation. IEEE Trans Pattern Anal Mach Intell, 2011, 33: 898–916
Article Google Scholar
Lu S J, Tan C, Lim J-H. Robust and efficient saliency modeling from image co-occurrence histograms. IEEE Trans Pattern Anal Mach Intell, 2014, 36: 195–201
Article Google Scholar
Lin Y-T, Maire M, Belongie S, et al. Microsoft COCO: common objects in context. In: Proceedings of European Conference on Computer Vision, 2014. 740–755
Roth S D. Ray casting for modeling solids. Comput Graph Image Process, 1982, 18: 109–144
Article Google Scholar
Karatzas D, Shafait F, Uchida S, et al. ICDAR 2013 robust reading competition. In: Proceedings of International Conference on Document Analysis and Recognition, 2013. 1484–1493
Karatzas D, Gomez-Bigorda L, Nicolaou A, et al. ICDAR 2015 competition on robust reading. In: Proceedings of International Conference on Document Analysis and Recognition, 2015. 1156–1160
He K, Zhang X, Ren S, et al. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016. 770–778

Download references

Acknowledgements

This work was supported by National Natural Science Foundation of China (Grant No. 61733007). Xiang BAI was supported by National Program for Support of Top-notch Young Professionals and the Program for HUST Academic Frontier Youth Team (Grant No. 2017QYTD08).

Author information

Liao M H, Song B Y, and Long S B have the same contribution to this work.

Authors and Affiliations

School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, 430074, China
Minghui Liao, Minghang He & Xiang Bai
School of Electronics Engineering and Computer Science, Peking University, Beijing, 100871, China
Boyu Song
School of Economics, Peking University, Beijing, 100871, China
Shangbang Long
MEG VII, Beijing, 100080, China
Cong Yao

Authors

Minghui Liao
View author publications
You can also search for this author in PubMed Google Scholar
Boyu Song
View author publications
You can also search for this author in PubMed Google Scholar
Shangbang Long
View author publications
You can also search for this author in PubMed Google Scholar
Minghang He
View author publications
You can also search for this author in PubMed Google Scholar
Cong Yao
View author publications
You can also search for this author in PubMed Google Scholar
Xiang Bai
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Xiang Bai.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Liao, M., Song, B., Long, S. et al. SynthText3D: synthesizing scene text images from 3D virtual worlds. Sci. China Inf. Sci. 63, 120105 (2020). https://doi.org/10.1007/s11432-019-2737-0

Download citation

Received: 15 October 2019
Revised: 25 November 2019
Accepted: 03 December 2019
Published: 15 January 2020
DOI: https://doi.org/10.1007/s11432-019-2737-0

Keywords

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

SynthText3D: synthesizing scene text images from 3D virtual worlds

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Text to Image Synthesis Using Bridge Generative Adversarial Network and Char CNN Model

UDiffText: A Unified Framework for High-Quality Text Synthesis in Arbitrary Images via Character-Aware Diffusion Models

Controllable 3D Object Generation with Single Image Prompt

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

SynthText3D: synthesizing scene text images from 3D virtual worlds

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Text to Image Synthesis Using Bridge Generative Adversarial Network and Char CNN Model

UDiffText: A Unified Framework for High-Quality Text Synthesis in Arbitrary Images via Character-Aware Diffusion Models

Controllable 3D Object Generation with Single Image Prompt

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.