Detecting Forged Receipts with Domain-Specific Ontology-Based Entities & Relations

Tornés, Beatriz Martínez; Boros, Emanuela; Doucet, Antoine; Gomez-Krämer, Petra; Ogier, Jean-Marc

doi:10.1007/978-3-031-41682-8_12

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14189)

Included in the following conference series: International Conference on Document Analysis and Recognition

Abstract

In this paper, we tackle the task of document fraud detection, which we argue can be addressed with natural language processing techniques. We treat it as a regression problem, using a pre-trained language model to represent the textual content and enriching that representation with domain-specific ontology-based entities and relations. We emulate an entity-based approach by comparing different types of input: raw text, extracted entities, and a triple-based reformulation of the document content. For our experimental setup, we use the only freely available dataset of forged receipts, and we provide an in-depth analysis of our results with regard to the efficiency of our methods. Our findings show interesting correlations between the types of ontology relations (e.g., has_address, amounts_to), the types of entities (product, company, etc.), and the performance of a regression-based language model. These correlations could help in studying how transfer learning from natural language processing (NLP) methods can boost the performance of existing fraud detection systems.
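
The three input types compared in the abstract (raw text, extracted entities, and a triple-based reformulation) can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the sample receipt, the `Entity` and `Triple` classes, the helper names, and the serialization formats are hypothetical, and the paper's actual encoding may differ. Only the relation names `has_address` and `amounts_to` and the entity types `company` and `product` come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    kind: str  # entity type, e.g. "company" or "product" (named in the abstract)
    text: str  # surface form found on the receipt


@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str  # ontology relation, e.g. "has_address" or "amounts_to"
    obj: str


def entity_input(entities):
    """Serialize only the extracted entities, each tagged with its type (assumed format)."""
    return " ".join(f"[{e.kind}] {e.text}" for e in entities)


def triple_input(triples):
    """Rewrite the document as 'subject relation object' statements (assumed format)."""
    return " . ".join(f"{t.subject} {t.relation} {t.obj}" for t in triples)


# Hypothetical receipt content, not taken from the dataset.
raw_text = "SuperMarche 12 rue des Fleurs Baguette 1.20 TOTAL 1.20"

entities = [
    Entity("company", "SuperMarche"),
    Entity("address", "12 rue des Fleurs"),
    Entity("product", "Baguette"),
    Entity("amount", "1.20"),
]

triples = [
    Triple("SuperMarche", "has_address", "12 rue des Fleurs"),
    Triple("Baguette", "amounts_to", "1.20"),
]

# Three alternative textual inputs that a language model could be fed.
inputs = {
    "raw": raw_text,
    "entities": entity_input(entities),
    "triples": triple_input(triples),
}

for name, text in inputs.items():
    print(f"{name}: {text}")
```

Each string would then be tokenized and passed to the pre-trained language model with a single-output regression head; that step is omitted here because the abstract does not specify the model configuration.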

Notes

1. The platform is available at https://receipts.univ-lr.fr/.
2. https://owlready2.readthedocs.io/en/v0.37/.
3. This strategy has been previously explored in research for different NLP tasks [10, 11, 35].

References

1. Abramova, S., et al.: Detecting copy-move forgeries in scanned text documents. Electron. Imaging 2016(8), 1–9 (2016)
2. Ahmed, A.G.H., Shafait, F.: Forgery detection based on intrinsic document contents. In: 2014 11th IAPR International Workshop on Document Analysis Systems, pp. 252–256 (2014)
3. Artaud, C., Doucet, A., Ogier, J.M., d’Andecy, V.P.: Receipt dataset for fraud detection. In: First International Workshop on Computational Document Forensics (2017)
4. Artaud, C., Sidère, N., Doucet, A., Ogier, J.M., Yooz, V.P.D.: Find it! fraud detection contest report. In: 2018 24th International Conference on Pattern Recognition (ICPR), pp. 13–18 (2018)
5. Artaud, C.: Détection des fraudes : de l’image à la sémantique du contenu. Application à la vérification des informations extraites d’un corpus de tickets de caisse, PhD Thesis, University of La Rochelle (2019)
6. Behera, T.K., Panigrahi, S.: Credit card fraud detection: a hybrid approach using fuzzy clustering & neural network. In: 2015 Second International Conference on Advances in Computing and Communication Engineering (2015)
7. Benchaji, I., Douzi, S., El Ouahidi, B.: Using genetic algorithm to improve classification of imbalanced datasets for credit card fraud detection. In: International Conference on Advanced Information Technology, Services and Systems (2018)
8. Bertrand, R., Gomez-Krämer, P., Terrades, O.R., Franco, P., Ogier, J.M.: A system based on intrinsic features for fraudulent document detection. In: 2013 12th International Conference on Document Analysis and Recognition, pp. 106–110. Washington, DC (2013)
9. Bertrand, R., Terrades, O.R., Gomez-Krämer, P., Franco, P., Ogier, J.M.: A conditional random field model for font forgery detection. In: 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 576–580 (2015)
10. Boros, E., Moreno, J., Doucet, A.: Event detection with entity markers. In: European Conference on Information Retrieval, pp. 233–240 (2021)
11. Boros, E., Moreno, J.G., Doucet, A.: Exploring entities in event detection as question answering. In: Hagen, M., et al. (eds.) ECIR 2022. LNCS, vol. 13185, pp. 65–79. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-99736-6_5
12. Carta, S., Fenu, G., Recupero, D.R., Saia, R.: Fraud detection for e-commerce transactions by employing a prudential multiple consensus model. J. Inf. Secur. Appl. 46, 13–22 (2019)
13. Cozzolino, D., Gragnaniello, D., Verdoliva, L.: Image forgery detection through residual-based local descriptors and block-matching. In: 2014 IEEE International Conference on Image Processing (ICIP) (2014)
14. Cozzolino, D., Poggi, G., Verdoliva, L.: Efficient dense-field copy-move forgery detection. IEEE Trans. Inf. Forensics Secur. 10(11), 2284–2297 (2015)
15. Cozzolino, D., Verdoliva, L.: Camera-based image forgery localization using convolutional neural networks. In: 2018 26th European Signal Processing Conference (EUSIPCO) (2018)
16. Cozzolino, D., Verdoliva, L.: Noiseprint: A CNN-based camera model fingerprint. IEEE Trans. Inf. Forensics Secur. 15, 144–159 (2020)
17. Cruz, F., Sidere, N., Coustaty, M., d’Andecy, V.P., Ogier, J.M.: Local binary patterns for document forgery detection. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1 (2017)
18. Cruz, F., Sidère, N., Coustaty, M., Poulain D’Andecy, V., Ogier, J.: Categorization of document image tampering techniques and how to identify them. In: Pattern Recognition and Information Forensics - ICPR 2018 International Workshops, CVAUI, IWCF, and MIPPSNA, Revised Selected Papers, pp. 117–124 (2018)
19. Elkasrawi, S., Shafait, F.: Printer identification using supervised learning for document forgery detection. In: 2014 11th IAPR International Workshop on Document Analysis Systems, pp. 146–150 (2014)
20. Fridrich, J., Kodovsky, J.: Rich models for steganalysis of digital images. IEEE Trans. Inf. Forensics Secur. 7(3), 868–882 (2012)
21. Gomez-Krämer, P.: Verifying document integrity. Multimedia Security 2: Biometrics, Video Surveillance and Multimedia Encryption, pp. 59–89 (2022)
22. Guo, H., Yuan, S., Wu, X.: Logbert: log anomaly detection via bert. In: 2021 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2021)
23. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition (2015). 10.48550/ARXIV.1512.03385, https://arxiv.org/abs/1512.03385
24. James, H., Gupta, O., Raviv, D.: OCR graph features for manipulation detection in documents (2020)
25. Kim, J., Kim, H.-J., Kim, H.: Fraud detection for job placement using hierarchical clusters-based deep neural networks. Appl. Intell. 49(8), 2842–2861 (2019)
26. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
27. Kowshalya, G., Nandhini, M.: Predicting fraudulent claims in automobile insurance. In: 2018 Second International Conference on Inventive Communication and Computational Technologies (ICICCT) (2018)
28. Lee, Y., Kim, J., Kang, P.: Lanobert: system log anomaly detection based on bert masked language model. arXiv preprint arXiv:2111.09564 (2021)
29. Li, P., et al.: Selfdoc: self-supervised document representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5652–5660 (2021)
30. Li, Y., Yan, C., Liu, W., Li, M.: Research and application of random forest model in mining automobile insurance fraud. In: 2016 12th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD) (2016)
31. Liu, Y., et al.: Roberta: A robustly optimized bert pretraining approach. ArXiv abs/1907.11692 (2019)
32. Martin, L., et al.: CamemBERT: a tasty French language model. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 7203–7219. Association for Computational Linguistics (2020). 10.18653/v1/2020.acl-main.645, https://aclanthology.org/2020.acl-main.645
33. Mikkilineni, A.K., Chiang, P.J., Ali, G.N., Chiu, G.T., Allebach, J.P., Delp III, E.J.: Printer identification based on graylevel co-occurrence features for security and forensic applications. In: Security, Steganography, and Watermarking of Multimedia Contents VII, vol. 5681, pp. 430–440. International Society for Optics and Photonics (2005)
34. Mishra, A., Ghorpade, C.: Credit card fraud detection on the skewed data using various classification and ensemble techniques. In: 2018 IEEE International Students’ Conference on Electrical, Electronics and Computer Science (SCEECS) (2018)
35. Moreno, J.G., Boros, E., Doucet, A.: TLR at the NTCIR-15 FinNum-2 task: improving text classifiers for numeral attachment in financial social data. In: Proceedings of the 15th NTCIR Conference on Evaluation of Information Access Technologies, Tokyo Japan, pp. 8–11 (2020)
36. Nadim, A.H., Sayem, I.M., Mutsuddy, A., Chowdhury, M.S.: Analysis of machine learning techniques for credit card fraud detection. In: 2019 International Conference on Machine Learning and Data Engineering (iCMLDE), pp. 42–47 (2019)
37. Nigrini, M.J.: Benford’s Law: Applications for Forensic Accounting, Auditing, and Fraud Detection, vol. 586. Wiley (2012)
38. Rabah, C.B., Coatrieux, G., Abdelfattah, R.: The supatlantique scanned documents database for digital image forensics purposes. In: 2020 IEEE International Conference on Image Processing (ICIP) (2020)
39. Rizki, A.A., Surjandari, I., Wayasti, R.A.: Data mining application to detect financial fraud in indonesia’s public companies. In: 2017 3rd International Conference on Science in Information Technology (ICSITech) (2017)
40. Rossi, A., Firmani, D., Matinata, A., Merialdo, P., Barbosa, D.: Knowledge graph embedding for link prediction: a comparative analysis. ACM Trans. Knowl. Discov. Data 15(2), 14:1–14:49 (2021)
41. Shang, S., Kong, X., You, X.: Document forgery detection using distortion mutation of geometric parameters in characters. J. Electron. Imaging 24(2), 023008 (2015)
42. Sidere, N., Cruz, F., Coustaty, M., Ogier, J.M.: A dataset for forgery detection and spotting in document images. In: 2017 Seventh International Conference on Emerging Security Technologies (EST) (2017)
43. Tornés, B.M., Boros, E., Doucet, A., Gomez-Krämer, P., Ogier, J.M., d’Andecy, V.P.: Knowledge-based techniques for document fraud detection: a comprehensive study. In: Gelbukh, A. (eds.) Computational Linguistics and Intelligent Text Processing. CICLing 2019. Lecture Notes in Computer Science, vol. 13451, pp. 17–33. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-24337-0_2
44. Van Beusekom, J., Shafait, F., Breuel, T.M.: Text-line examination for document forgery detection. Int. J. Doc. Anal. Recogn. (IJDAR) 16(2), 189–207 (2013)
45. Vaswani, A., et al.: Attention is all you need. Advances in Neural Information Processing Systems 30 (2017)
46. Vidros, S., Kolias, C., Kambourakis, G., Akoglu, L.: Automatic detection of online recruitment frauds: characteristics, methods, and a public dataset. Future Internet 9(1), 6 (2017)
47. Xu, Y., et al.: Layoutlmv2: multi-modal pre-training for visually-rich document understanding. In: ACL-IJCNLP 2021 (2021)
48. Xu, Y., Li, M., Cui, L., Huang, S., Wei, F., Zhou, M.: Layoutlm: pre-training of text and layout for document image understanding. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2020)

Acknowledgements

This work was supported by the French defence innovation agency (AID) and by the VERINDOC project, funded by the Nouvelle-Aquitaine Region.

Author information

Authors and Affiliations

University of La Rochelle, L3i, 17000, La Rochelle, France
Beatriz Martínez Tornés, Emanuela Boros, Antoine Doucet, Petra Gomez-Krämer & Jean-Marc Ogier

Corresponding author

Correspondence to Beatriz Martínez Tornés.

Editor information

Editors and Affiliations

TU Dortmund University, Dortmund, Germany
Gernot A. Fink
Adobe, College Park, MD, USA
Rajiv Jain
Osaka Metropolitan University, Osaka, Japan
Koichi Kise
Rochester Institute of Technology, Rochester, NY, USA
Richard Zanibbi

Cite this paper

Tornés, B.M., Boros, E., Doucet, A., Gomez-Krämer, P., Ogier, JM. (2023). Detecting Forged Receipts with Domain-Specific Ontology-Based Entities & Relations. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14189. Springer, Cham. https://doi.org/10.1007/978-3-031-41682-8_12

DOI: https://doi.org/10.1007/978-3-031-41682-8_12
Published: 19 August 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41681-1
Online ISBN: 978-3-031-41682-8
eBook Packages: Computer Science, Computer Science (R0)
