Abstract
Writer identification from historical manuscripts is a challenging problem with significant implications for understanding the authorship of ancient texts. In this paper, we propose a novel deep learning framework tailored to writer identification in historical manuscripts. Our approach relies on data-driven features, using neural networks to extract and learn discriminative patterns from handwritten historical documents. The key innovation of the framework is its ability to automatically discover and use relevant features from the data to profile the writer, eliminating the need for manual feature engineering. Our methodology comprises three steps. First, manuscript preprocessing applies image denoising with non-local means and total-variation techniques, followed by binarization using a Canny edge detector. Second, we employ the Harris corner detector for automatic key-point detection and clustering, which allows us to identify the regions of interest within the documents. Third, the patches extracted from these regions are classified through transfer learning, using a deep learning model trained specifically on the extracted patches. To obtain the final document-level identification, we improve system accuracy with a majority vote scheme in which the decisions from multiple patches are aggregated into the final classification. We validate our approach on the ICDAR 2017 dataset, which spans different periods and writing styles of historical manuscripts. Experimental results show that our method identifies the authors of historical documents more accurately than existing techniques. Moreover, the framework remains robust when only limited training data is available.
This work contributes to the field of historical manuscript analysis and highlights the potential of deep learning for solving intricate problems in document analysis and authorship attribution. Our framework offers scholars and historians a promising way to gain deeper insight into the authors of historical texts, opening new directions for historical research and preservation.
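The final aggregation step described in the abstract can be illustrated with a minimal sketch. It assumes that each patch of a document has already been assigned a writer label by the patch-level classifier; the function name `majority_vote`, the example labels, and the tie-breaking rule are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_vote(patch_predictions: Sequence[Hashable]) -> Hashable:
    """Return the writer label predicted for the most patches of one document.

    Ties are broken in favour of the label that appears first in the input.
    This tie rule is an assumption of the sketch, not something the paper states.
    """
    if not patch_predictions:
        raise ValueError("at least one patch prediction is required")
    counts = Counter(patch_predictions)
    # most_common orders equal counts by first occurrence,
    # so the earliest-seen label wins a tie.
    return counts.most_common(1)[0][0]


# Hypothetical per-patch outputs of the patch classifier for a single page.
patch_labels = ["writer_07", "writer_12", "writer_07", "writer_07", "writer_31"]
document_label = majority_vote(patch_labels)
print(document_label)
```

In this toy example three of the five patches vote for the same writer, so a single misclassified patch does not change the document-level decision. This is the intuition behind aggregating patch decisions rather than relying on any one patch.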
Data availability
This paper uses a publicly available dataset: the ICDAR2017 Competition on Historical Document Writer Identification (Historical-WI), published in the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) [31].
Change history
06 March 2024
A Correction to this paper has been published: https://doi.org/10.1007/s11042-024-18856-y
References
Javidi M, Jampour M (2020) A deep learning framework for text-independent writer identification. Eng Appl Artif Intell 95:103912
Chahal A, Gulia P (2019) Machine learning and deep learning. Int J Innov Technol Explor Eng 8(12):4910–4914
Rehman A, Naz S, Razzak MI (2019) Writer identification using machine learning approaches: a comprehensive review. Multimed Tools Appl 78:10889–10931
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
LeCun Y, Bengio Y (1995) Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks 3361(10):1995
Boudraa M, Bennour A (2024) Combination of local features and deep learning to historical manuscripts dating. In: Bennour A, Bouridane A, Chaari L (eds) Intelligent systems and pattern recognition. ISPR 2023. Communications in computer and information science, vol 1940. Springer, Cham. https://doi.org/10.1007/978-3-031-46335-8_11
Buades A, Coll B, Morel JM (2011) Non-local means denoising. Image Process On Line 1:208–212
Canny J (1986) A computational approach to edge detection. IEEE Trans Pattern Anal Mach Intell 8(6):679–698
Harris C, Stephens M (1988) A combined corner and edge detector, Proceedings of the 4th Alvey Vision Conference, pp 147–151
Jin X, Han J (2011) K-means clustering. In: Sammut C, Webb GI (eds) Encyclopedia of machine learning. Springer, Boston
Abbas F, Gattal A, Djeddi C, Bensefia A, Jamil A, Saoudi K (2020) Offline writer identification based on CLBP and VLBP. In: Mediterranean conference on pattern recognition and artificial intelligence. Switzerland: Springer. pp 188–99
Abbas F, Gattal A, Djeddi C, Siddiqi I, Bensefia A, Saoudi K (2021) Texture feature column scheme for single-and multi-script writer identification. IET Biometrics 10(2):179–193
Chammas M, Makhoul A, Demerjian J (2020) Writer identification for historical handwritten documents using a single feature extraction method. In: 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, pp 1–6
Bennour A (2018) Automatic handwriting analysis for writer identification and verification. In: Proceedings of the 7th International Conference on Software Engineering and New Technologies, pp 1–7
Bennour A et al (2019) Handwriting based writer recognition using implicit shape codebook. Forensic Sci Int 301:91–100
Fecker D, Asit A, Märgner V, El-Sana J, Fingscheidt T (2014) Writer identification for historical Arabic documents. In: 2014 22nd International conference on pattern recognition. IEEE, pp 3050–3055
Asi A, Abdalhaleem A, Fecker D, Märgner V, El-Sana J (2017) On writer identification for Arabic historical manuscripts. Int J Doc Anal Recogn (IJDAR) 20:173–187
Dhali MA, He S, Popović M, Tigchelaar E, Schomaker L (2017) A digital palaeographic approach towards writer identification in the dead sea scrolls. In: Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods-Volume 1: ICPRAM (vol 2017, pp 693–702). Scitepress; Setúbal
Lai S, Zhu Y, Jin L (2020) Encoding pathlet and SIFT features with bagged VLAD for historical writer identification. IEEE Trans Inform Forensics Secur 15:3553–3566
Bennour A (2018) Clonal selection classification algorithm applied to arabic writer identification. In: Proceedings of the 8th International Conference on Information Systems and Technologies, pp 1–5
Chammas M, Makhoul A, Demerjian J, Dannaoui E (2022) A deep learning based system for writer identification in handwritten Arabic historical manuscripts. Multimedia Tools Appl 81:30769–30784
He S, Schomaker L (2021) GR-RNN: Global-context residual recurrent neural networks for writer identification. Pattern Recogn 117:107975
Semma A, Hannad Y, Siddiqi I, Djeddi C, El Kettani MEY (2021) Writer identification using deep learning with FAST key-points and Harris corner detector. Expert Syst Appl 184:115473
Rehman A, Naz S, Razzak MI, Hameed IA (2019) Automatic visual features for writer identification: a deep learning approach. IEEE Access 7:17149–17157
Chammas M, Makhoul A, Demerjian J (2020) Writer identification for historical handwritten documents using a single feature extraction method. In: 19th International Conference on Machine Learning and Applications (ICMLA 2020). IEEE, USA
Christlein V, Gropp M, Fiel S, Maier A (2017) Unsupervised feature learning for writer identification and writer retrieval. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). vol 1. IEEE, USA. pp 991–7
Jordan S, Seuret M, Král P, Lenc L, Martínek J, Wiermann B et al (2020) Re-ranking for writer identification and writer retrieval. In: International Workshop on Document Analysis Systems. Springer, Switzerland. pp 572–86
Cilia N, De Stefano C, Fontanella F, Marrocco C, Molinara M, DiFreca AS (2020) An end-to-end deep learning system for medieval writer identification. Pattern Recognition Lett 129:137–143
Mohammed H, Märgner V, Stiehl HS (2018) Writer identification for historical manuscripts: analysis and optimisation of a classifier as an easy-to-use tool for scholars from the humanities. In: 2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR). IEEE. pp 534–9
Christlein V, Nicolaou A, Seuret M, Stutzmann D, Maier A (2019) ICDAR 2019 competition on image retrieval for historical handwritten documents. In: 2019 International Conference on Document Analysis and Recognition (ICDAR). IEEE, pp 1505–1509
Fiel S, Kleber F, Diem M, Christlein V, Louloudis G, Nikos S, Gatos B (2017) Icdar2017 competition on historical document writer identification (historical-wi). In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) (vol 1, pp 1377–1382). IEEE
Perona P, Shiota T, Malik J (1994) Anisotropic diffusion. Geometry-driven diffusion in computer vision, 73–92
Paris S, Kornprobst P, Tumblin J, Durand F (2009) Bilateral filtering: Theory and applications. Foundations and Trends® in Computer Graphics and Vision 4(1):1–73
Vogel CR, Oman ME (1996) Iterative methods for total variation denoising. SIAM J Sci Comput 17(1):227–238
Otsu N (1979) A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern 9(1):62–66
Lowe DG (2004) Distinctive image features from scale-invariant key-points. Int J Comput Vis 60(2):91–110
Rublee E, Rabaud V, Konolige K, Bradski G (2011) ORB: An efficient alternative to SIFT or SURF. In: 2011 International conference on computer vision. IEEE, pp 2564–2571
Rosten E, Porter R, Drummond T (2010) Faster and better: A machine learning approach to corner detection. IEEE Trans Pattern Anal Mach Intell 32(1):105–119
Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 4700–4708
Gattal A, Djeddi C, Siddiqi I, Al-Maadeed S (2018) Writer identification on historical documents using oriented basic image features. In: 2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR). IEEE, USA. pp 369–73
Christlein V et al (2022) Writer retrieval and writer identification in Greek papyri. In: Intertwining Graphonomics with Human Movements: 20th International Conference of the International Graphonomics Society, IGS 2021, Las Palmas de Gran Canaria, Spain, June 7–9
Seuret M et al (2020) ICFHR 2020 competition on image retrieval for historical handwritten fragments. In: 2020 17th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp 216–221
Acknowledgements
The authors extend their appreciation to the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia, for funding this research work through project number 444-9-507.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The original online version of this article was revised: The fourth and fifth author names contain errors in the original publication of this article. Mohammad Al-Sarem should be Mohammed Al-Sarem. Mohammad Al-Shaby should be Mohammed Al-Shabi.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Bennour, A., Boudraa, M., Siddiqi, I. et al. A deep learning framework for historical manuscripts writer identification using data-driven features. Multimed Tools Appl 83, 80075–80101 (2024). https://doi.org/10.1007/s11042-024-18187-y
DOI: https://doi.org/10.1007/s11042-024-18187-y