Abstract
This paper presents advances in zone classification for printed document image analysis. It first introduces an entropic heuristic for the text separation problem. It then briefly recalls the texture and geometric discriminant parameters proposed in previous research. Several of these parameters are selected and modified to perform statistical pattern recognition. Experiments are carried out for each of these two aspects, using a document image database with ground truth. The available results are discussed.
Keywords
- Support Vector Machine
- Linear Discriminant Analysis
- Document Image
- Radial Basis Function Kernel
- Horizontal Projection
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Duong, J., Côté, M., Emptoz, H. (2002). Feature Approach for Printed Document Image Analysis. In: Caelli, T., Amin, A., Duin, R.P.W., de Ridder, D., Kamel, M. (eds) Structural, Syntactic, and Statistical Pattern Recognition. SSPR/SPR 2002. Lecture Notes in Computer Science, vol 2396. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-70659-3_16
DOI: https://doi.org/10.1007/3-540-70659-3_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44011-6
Online ISBN: 978-3-540-70659-5
eBook Packages: Springer Book Archive