A deep learning framework for historical manuscripts writer identification using data-driven features

Bennour, Akram; Boudraa, Merouane; Siddiqi, Imran; Al-Sarem, Mohammed; Al-Shabi, Mohammed; Ghabban, Fahad

doi:10.1007/s11042-024-18187-y

1238: Recent Advances in Biometrics Based on Biomedical Information
Published: 29 January 2024

Volume 83, pages 80075–80101, (2024)

Multimedia Tools and Applications

A Correction to this article was published on 06 March 2024

This article has been updated

Abstract

Writer identification from historical manuscripts is a challenging problem with significant implications for understanding the authorship of ancient texts. In this paper, we propose a novel deep learning framework tailored to writer identification in historical manuscripts. Our approach relies on data-driven features, using neural networks to extract and learn discriminative patterns from handwritten historical documents. The key innovation of the framework is its ability to automatically discover and use relevant features from the data to profile the writer, eliminating the need for manual feature engineering. The methodology comprises three steps. First, manuscript preprocessing denoises the image using non-local means and total-variation techniques, followed by binarization using a Canny edge detector. Second, we employ the Harris corner detector for automatic key-point detection and cluster the key-points to identify regions of interest within the documents. Third, the features extracted from these regions are classified through transfer learning, using a deep learning model trained on the extracted patches. To obtain the final document-level identification, we apply a majority vote scheme in which the decisions from multiple patches are aggregated into the final classification, which improves system accuracy. We validate our approach on the ICDAR 2017 dataset, which spans different periods and writing styles of historical manuscripts. Experimental results show that our method identifies the writers of historical documents more accurately than existing techniques. Moreover, the framework remains robust when limited training data is available.
This work contributes to the field of historical manuscript analysis and highlights the potential of deep learning for solving intricate problems in document analysis and authorship attribution. The framework offers scholars and historians a promising way to gain deeper insight into the authors of historical texts, opening new avenues for historical research and preservation.
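
The document-level majority vote described in the abstract can be illustrated with a minimal sketch. It assumes each patch of a document has already been assigned a writer label by some patch classifier; the function name `majority_vote`, the label strings, and the tie-breaking rule (first label encountered wins) are hypothetical choices made for illustration and are not taken from the article.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_vote(patch_predictions: Sequence[Hashable]) -> Hashable:
    """Return the writer label predicted for the most patches of one document."""
    if not patch_predictions:
        raise ValueError("need at least one patch prediction")
    counts = Counter(patch_predictions)
    # most_common(1) yields the label with the highest count; labels with
    # equal counts keep the order in which they were first encountered
    # (an assumed tie-breaking rule, not specified in the article).
    label, _ = counts.most_common(1)[0]
    return label


# Hypothetical patch-level predictions for a single manuscript page.
patch_labels = ["writer_07", "writer_12", "writer_07", "writer_07", "writer_31"]
document_label = majority_vote(patch_labels)
print(document_label)
```

Voting over many patches means that a few misclassified patches do not change the document-level decision as long as the correct writer still receives the most votes.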

Data availability

The publicly available dataset used in this paper is from the ICDAR 2017 competition on historical document writer identification (Historical-WI), presented at the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) [31].

Change history

06 March 2024
A Correction to this paper has been published: https://doi.org/10.1007/s11042-024-18856-y

References

[1] Javidi M, Jampour M (2020) A deep learning framework for text-independent writer identification. Eng Appl Artif Intell 95:103912
[2] Chahal A, Gulia P (2019) Machine learning and deep learning. Int J Innov Technol Explor Eng 8(12):4910–4914
[3] Rehman A, Naz S, Razzak MI (2019) Writer identification using machine learning approaches: a comprehensive review. Multimed Tools Appl 78:10889–10931
[4] LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
[5] LeCun Y, Bengio Y (1995) Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks 3361(10):1995
[6] Boudraa M, Bennour A (2024) Combination of local features and deep learning to historical manuscripts dating. In: Bennour A, Bouridane A, Chaari L (eds) Intelligent systems and pattern recognition. ISPR 2023. Communications in computer and information science, vol 1940. Springer, Cham. https://doi.org/10.1007/978-3-031-46335-8_11
[7] Buades A, Coll B, Morel JM (2011) Non-local means denoising. Image Process On Line 1:208–212
[8] Canny J (1986) A computational approach to edge detection. IEEE Trans Pattern Anal Mach Intell 8(6):679–698
[9] Harris C, Stephens M (1988) A combined corner and edge detector. In: Proceedings of the 4th Alvey Vision Conference, pp 147–151
[10] Jin X, Han J (2011) K-means clustering. In: Sammut C, Webb GI (eds) Encyclopedia of machine learning. Springer, Boston
[11] Abbas F, Gattal A, Djeddi C, Bensefia A, Jamil A, Saoudi K (2020) Offline writer identification based on CLBP and VLBP. In: Mediterranean conference on pattern recognition and artificial intelligence. Springer, Switzerland, pp 188–99
[12] Abbas F, Gattal A, Djeddi C, Siddiqi I, Bensefia A, Saoudi K (2021) Texture feature column scheme for single-and multi-script writer identification. IET Biometrics 10(2):179–193
[13] Chammas M, Makhoul A, Demerjian J (2020) Writer identification for historical handwritten documents using a single feature extraction method. In: 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, pp 1–6
[14] Bennour A (2018) Automatic handwriting analysis for writer identification and verification. In: Proceedings of the 7th International Conference on Software Engineering and New Technologies, pp 1–7
[15] Bennour A et al (2019) Handwriting based writer recognition using implicit shape codebook. Forensic Sci Int 301:91–100
[16] Fecker D, Asit A, Märgner V, El-Sana J, Fingscheidt T (2014) Writer identification for historical Arabic documents. In: 2014 22nd International conference on pattern recognition. IEEE, pp 3050–3055
[17] Asi A, Abdalhaleem A, Fecker D, Märgner V, El-Sana J (2017) On writer identification for Arabic historical manuscripts. Int J Doc Anal Recogn (IJDAR) 20:173–187
[18] Dhali MA, He S, Popović M, Tigchelaar E, Schomaker L (2017) A digital palaeographic approach towards writer identification in the dead sea scrolls. In: Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods-Volume 1: ICPRAM (vol 2017, pp 693–702). Scitepress; Setúbal
[19] Lai S, Zhu Y, Jin L (2020) Encoding pathlet and SIFT features with bagged VLAD for historical writer identification. IEEE Trans Inform Forensics Secur 15:3553–3566
[20] Bennour A (2018) Clonal selection classification algorithm applied to arabic writer identification. In: Proceedings of the 8th International Conference on Information Systems and Technologies, pp 1–5
[21] Chammas M, Makhoul A, Demerjian J, Dannaoui E (2022) A deep learning based system for writer identification in handwritten Arabic historical manuscripts. Multimedia Tools Appl 81:30769–30784
[22] He S, Schomaker L (2021) GR-RNN: Global-context residual recurrent neural networks for writer identification. Pattern Recogn 117:107975
[23] Semma A, Hannad Y, Siddiqi I, Djeddi C, El Kettani MEY (2021) Writer identification using deep learning with FAST key-points and Harris corner detector. Expert Syst Appl 184:115473
[24] Rehman A, Naz S, Razzak MI, Hameed IA (2019) Automatic visual features for writer identification: a deep learning approach. IEEE Access 7:17149–17157
[25] Chammas M, Makhoul A, Demerjian J (2020) Writer identification for historical handwritten documents using a single feature extraction method. In: 19th International Conference on Machine Learning and Applications (ICMLA 2020). IEEE, USA
[26] Christlein V, Gropp M, Fiel S, Maier A (2017) Unsupervised feature learning for writer identification and writer retrieval. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). vol 1. IEEE, USA. pp 991–7
[27] Jordan S, Seuret M, Král P, Lenc L, Martínek J, Wiermann B et al (2020) Re-ranking for writer identification and writer retrieval. In: International Workshop on Document Analysis Systems. Springer, Switzerland. pp 572–86
[28] Cilia N, De Stefano C, Fontanella F, Marrocco C, Molinara M, DiFreca AS (2020) An end-to-end deep learning system for medieval writer identification. Pattern Recognition Lett 129:137–143
[29] Mohammed H, Märgner V, Stiehl HS (2018) Writer identification for historical manuscripts: analysis and optimisation of a classifier as an easy-to-use tool for scholars from the humanities. In: 2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR). IEEE. pp 534–9
[30] Christlein V, Nicolaou A, Seuret M, Stutzmann D, Maier A (2019) ICDAR 2019 competition on image retrieval for historical handwritten documents. In: 2019 International Conference on Document Analysis and Recognition (ICDAR). IEEE, pp 1505–1509
[31] Fiel S, Kleber F, Diem M, Christlein V, Louloudis G, Nikos S, Gatos B (2017) ICDAR2017 competition on historical document writer identification (Historical-WI). In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) (vol 1, pp 1377–1382). IEEE
[32] Perona P, Shiota T, Malik J (1994) Anisotropic diffusion. Geometry-driven diffusion in computer vision, 73–92
[33] Paris S, Kornprobst P, Tumblin J, Durand F (2009) Bilateral filtering: Theory and applications. Foundations and Trends® in Computer Graphics and Vision 4(1):1–73
[34] Vogel CR, Oman ME (1996) Iterative methods for total variation denoising. SIAM J Sci Comput 17(1):227–238
[35] Otsu N (1979) A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern 9(1):62–66
[36] Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110
[37] Rublee E, Rabaud V, Konolige K, Bradski G (2011) ORB: An efficient alternative to SIFT or SURF. In: 2011 International conference on computer vision. IEEE, pp 2564–2571
[38] Rosten E, Porter R, Drummond T (2010) Faster and better: A machine learning approach to corner detection. IEEE Trans Patt Anal Mach Intell 32(1):105–119
[39] Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 4700–4708
[40] Gattal A, Djeddi C, Siddiqi I, Al-Maadeed S (2018) Writer identification on historical documents using oriented basic image features. In: 2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR). IEEE, USA. pp 369–73
[41] Christlein V et al (2022) Writer retrieval and writer identification in Greek papyri. In: Intertwining Graphonomics with Human Movements: 20th International Conference of the International Graphonomics Society, IGS 2021, Las Palmas de Gran Canaria, Spain, June 7–9
[42] Seuret M et al (2020) ICFHR 2020 competition on image retrieval for historical handwritten fragments. In: 2020 17th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp 216–221

Acknowledgements

The authors extend their appreciation to the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia, for funding this research work through project number 444-9-507.

Author information

Authors and Affiliations

Laboratory of Mathematics, Informatics and Systems (LAMIS), Echahid Cheikh Larbi Tebessi University, Tebessa, Algeria
Akram Bennour & Merouane Boudraa
Xynoptik Pty Limited, Victoria, Australia
Imran Siddiqi
College of Computer Science and Engineering, Taibah University, Medina, Saudi Arabia
Mohammed Al-Sarem & Fahad Ghabban
Department of Management Information System, College of Business Administration, Taibah University, Medina, Saudi Arabia
Mohammed Al-Shabi

Corresponding author

Correspondence to Akram Bennour.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised: The fourth and fifth author names contain errors in the original publication of this article. Mohammad Al-Sarem should be Mohammed Al-Sarem. Mohammad Al-Shaby should be Mohammed Al-Shabi.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bennour, A., Boudraa, M., Siddiqi, I. et al. A deep learning framework for historical manuscripts writer identification using data-driven features. Multimed Tools Appl 83, 80075–80101 (2024). https://doi.org/10.1007/s11042-024-18187-y

Received: 09 October 2023
Revised: 04 December 2023
Accepted: 05 January 2024
Published: 29 January 2024
Issue Date: October 2024
DOI: https://doi.org/10.1007/s11042-024-18187-y
