A deep learning framework for historical manuscripts writer identification using data-driven features

  • 1238: Recent Advances in Biometrics Based on Biomedical Information
Multimedia Tools and Applications

A Correction to this article was published on 06 March 2024

This article has been updated

Abstract

Writer identification from historical manuscripts is a challenging problem with significant implications for understanding the authorship of ancient texts. In this paper, we propose a deep learning framework tailored to historical manuscript writer identification. Our approach relies on data-driven features, using neural networks to extract and learn discriminative patterns from handwritten historical documents. The key innovation of the framework is its ability to automatically discover and use relevant features from the data to profile the writer, eliminating the need for manual feature engineering. The methodology comprises three steps. First, manuscript preprocessing denoises the image using non-local means and total-variation filtering, followed by binarization using a Canny edge detector. Second, we employ the Harris corner detector for automatic key-point detection and clustering, which identifies the regions of interest within the documents. Third, the patches extracted from these regions are classified through transfer learning, using a deep learning model trained specifically on the extracted patches. To obtain the final document-level identification, we improve the system accuracy with a majority vote scheme, in which the decisions from multiple patches are aggregated into the final classification. We validate our approach on the ICDAR 2017 dataset, which spans different periods and writing styles of historical manuscripts. Experimental results show that our method identifies the authors of historical documents more accurately than existing techniques. Moreover, the framework remains robust when only limited training data is available.
This work contributes to the field of historical manuscript analysis and highlights the potential of deep learning for difficult problems in document analysis and authorship attribution. The framework offers scholars and historians a promising way to gain deeper insight into the authors of historical texts, supporting historical research and preservation.
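
The document-level decision described in the abstract can be illustrated with a minimal sketch of majority voting over patch-level predictions. This is not the authors' implementation: the function name `majority_vote`, the writer labels, and the tie-breaking rule (the label seen first wins) are assumptions made for illustration only.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_vote(patch_predictions: Sequence[Hashable]) -> Hashable:
    """Return the writer label predicted for the most patches of one document.

    Ties are broken in favour of the label that appears first in the input;
    this tie-break is an assumption, not something the abstract specifies.
    """
    if not patch_predictions:
        raise ValueError("at least one patch prediction is required")
    counts = Counter(patch_predictions)
    # Counter.most_common orders equal counts by first insertion (Python 3.7+).
    return counts.most_common(1)[0][0]


# Hypothetical patch-level outputs of the classifier for a single page.
patch_labels = ["writer_07", "writer_03", "writer_07", "writer_07", "writer_12"]
document_label = majority_vote(patch_labels)
print(document_label)  # writer_07
```

Because each patch contributes one vote, a few misclassified patches do not change the document label as long as the correct writer still receives the most votes.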

Data availability

This paper uses a publicly available dataset: the ICDAR2017 competition on historical document writer identification (Historical-WI), presented at the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) [31].

Change history

References

  1. Javidi M, Jampour M (2020) A deep learning framework for text-independent writer identification. Eng Appl Artif Intell 95:103912

  2. Chahal A, Gulia P (2019) Machine learning and deep learning. Int J Innov Technol Explor Eng 8(12):4910–4914

  3. Rehman A, Naz S, Razzak MI (2019) Writer identification using machine learning approaches: a comprehensive review. Multimed Tools Appl 78:10889–10931

  4. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444

  5. LeCun Y, Bengio Y (1995) Convolutional networks for images, speech, and time series. In: Arbib MA (ed) The handbook of brain theory and neural networks. MIT Press, Cambridge

  6. Boudraa M, Bennour A (2024) Combination of local features and deep learning to historical manuscripts dating. In: Bennour A, Bouridane A, Chaari L (eds) Intelligent systems and pattern recognition. ISPR 2023. Communications in computer and information science, vol 1940. Springer, Cham. https://doi.org/10.1007/978-3-031-46335-8_11

  7. Buades A, Coll B, Morel JM (2011) Non-local means denoising. Image Process On Line 1:208–212

  8. Canny J (1986) A computational approach to edge detection. IEEE Trans Pattern Anal Mach Intell 8(6):679–698

  9. Harris C, Stephens M (1988) A combined corner and edge detector, Proceedings of the 4th Alvey Vision Conference, pp 147–151

  10. Jin X, Han J (2011) K-means clustering. In: Sammut C, Webb GI (eds) Encyclopedia of machine learning. Springer, Boston

  11. Abbas F, Gattal A, Djeddi C, Bensefia A, Jamil A, Saoudi K (2020) Offline writer identification based on CLBP and VLBP. In: Mediterranean conference on pattern recognition and artificial intelligence. Switzerland: Springer. pp 188–99

  12. Abbas F, Gattal A, Djeddi C, Siddiqi I, Bensefia A, Saoudi K (2021) Texture feature column scheme for single-and multi-script writer identification. IET Biometrics 10(2):179–193

  13. Chammas M, Makhoul A, Demerjian J (2020) Writer identification for historical handwritten documents using a single feature extraction method. In: 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, pp 1–6

  14. Bennour A (2018) Automatic handwriting analysis for writer identification and verification. In: Proceedings of the 7th International Conference on Software Engineering and New Technologies, pp 1–7

  15. Bennour A et al (2019) Handwriting based writer recognition using implicit shape codebook. Forensic Sci Int 301:91–100

  16. Fecker D, Asi A, Märgner V, El-Sana J, Fingscheidt T (2014) Writer identification for historical Arabic documents. In: 2014 22nd International conference on pattern recognition. IEEE, pp 3050–3055

  17. Asi A, Abdalhaleem A, Fecker D, Märgner V, El-Sana J (2017) On writer identification for Arabic historical manuscripts. Int J Doc Anal Recogn (IJDAR) 20:173–187

  18. Dhali MA, He S, Popović M, Tigchelaar E, Schomaker L (2017) A digital palaeographic approach towards writer identification in the dead sea scrolls. In: Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods-Volume 1: ICPRAM (vol 2017, pp 693–702). Scitepress; Setúbal

  19. Lai S, Zhu Y, Jin L (2020) Encoding pathlet and SIFT features with bagged VLAD for historical writer identification. IEEE Trans Inform Forensics Secur 15:3553–3566

  20. Bennour A (2018) Clonal selection classification algorithm applied to arabic writer identification. In: Proceedings of the 8th International Conference on Information Systems and Technologies, pp 1–5

  21. Chammas M, Makhoul A, Demerjian J, Dannaoui E (2022) A deep learning based system for writer identification in handwritten Arabic historical manuscripts. Multimedia Tools Appl 81:30769–30784

  22. He S, Schomaker L (2021) GR-RNN: Global-context residual recurrent neural networks for writer identification. Pattern Recogn 117:107975

  23. Semma A, Hannad Y, Siddiqi I, Djeddi C, El Kettani MEY (2021) Writer identification using deep learning with FAST key-points and Harris corner detector. Expert Syst Appl 184:115473

  24. Rehman A, Naz S, Razzak MI, Hameed IA (2019) Automatic visual features for writer identification: a deep learning approach. IEEE access 7:17149–17157

  25. Chammas M, Makhoul A, Demerjian J (2020) Writer identification for historical handwritten documents using a single feature extraction method. In: 19th International Conference on Machine Learning and Applications (ICMLA 2020). IEEE, USA

  26. Christlein V, Gropp M, Fiel S, Maier A (2017) Unsupervised feature learning for writer identification and writer retrieval. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). vol 1. IEEE, USA. pp 991–7

  27. Jordan S, Seuret M, Král P, Lenc L, Martínek J, Wiermann B et al (2020) Re-ranking for writer identification and writer retrieval. In: International Workshop on Document Analysis Systems. Springer, Switzerland. pp 572–86

  28. Cilia N, De Stefano C, Fontanella F, Marrocco C, Molinara M, DiFreca AS (2020) An end-to-end deep learning system for medieval writer identification. Pattern Recognition Lett 129:137–143

  29. Mohammed H, Märgner V, Stiehl HS (2018) Writer identification for historical manuscripts: analysis and optimisation of a classifier as an easy-to-use tool for scholars from the humanities. In: 2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR). IEEE. pp 534–9

  30. Christlein V, Nicolaou A, Seuret M, Stutzmann D, Maier A (2019) ICDAR 2019 competition on image retrieval for historical handwritten documents. In: 2019 International Conference on Document Analysis and Recognition (ICDAR). IEEE, pp 1505–1509

  31. Fiel S, Kleber F, Diem M, Christlein V, Louloudis G, Nikos S, Gatos B (2017) Icdar2017 competition on historical document writer identification (historical-wi). In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) (vol 1, pp 1377–1382). IEEE

  32. Perona P, Shiota T, Malik J (1994) Anisotropic diffusion. Geometry-driven diffusion in computer vision, 73–92

  33. Paris S, Kornprobst P, Tumblin J, Durand F (2009) Bilateral filtering: Theory and applications. Foundations and Trends® in Computer Graphics and Vision 4(1):1–73

  34. Vogel CR, Oman ME (1996) Iterative methods for total variation denoising. SIAM J Sci Comput 17(1):227–238

  35. Otsu N (1979) A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern 9(1):62–66

  36. Lowe DG (2004) Distinctive image features from scale-invariant key-points. Int J Comput Vis 60(2):91–110

  37. Rublee E, Rabaud V, Konolige K, Bradski G (2011) ORB: An efficient alternative to SIFT or SURF. In: 2011 International conference on computer vision. IEEE, pp 2564–2571

  38. Rosten E, Porter R, Drummond T (2010) Faster and better: A machine learning approach to corner detection. IEEE Trans Patt Anal Mach Intell 32(1):105–119

  39. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 4700–4708

  40. Gattal A, Djeddi C, Siddiqi I, Al-Maadeed S (2018) Writer identification on historical documents using oriented basic image features. In: 2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR). IEEE, USA. pp 369–73

  41. Christlein V et al (2022) Writer retrieval and writer identification in Greek papyri. In: Intertwining graphonomics with human movements: 20th International Conference of the International Graphonomics Society, IGS 2021, Las Palmas de Gran Canaria, Spain, June 7–9

  42. Seuret M et al (2020) ICFHR 2020 competition on image retrieval for historical handwritten fragments. In: 2020 17th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp 216–221

Acknowledgements

The authors extend their appreciation to the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia, for funding this research work through project number 444-9-507.

Author information

Corresponding author

Correspondence to Akram Bennour.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised: The fourth and fifth author names contain errors in the original publication of this article. Mohammad Al-Sarem should be Mohammed Al-Sarem. Mohammad Al-Shaby should be Mohammed Al-Shabi.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bennour, A., Boudraa, M., Siddiqi, I. et al. A deep learning framework for historical manuscripts writer identification using data-driven features. Multimed Tools Appl 83, 80075–80101 (2024). https://doi.org/10.1007/s11042-024-18187-y

  • DOI: https://doi.org/10.1007/s11042-024-18187-y
