Detecting Forged Receipts with Domain-Specific Ontology-Based Entities & Relations

  • Conference paper
Document Analysis and Recognition - ICDAR 2023 (ICDAR 2023)

Abstract

In this paper, we tackle the task of document fraud detection, which we argue can be addressed with natural language processing (NLP) techniques. We frame it as a regression problem: a pre-trained language model represents the textual content, and this representation is enriched with domain-specific ontology-based entities and relations. We emulate an entity-based approach by comparing three types of input: the raw text, the extracted entities, and a triple-based reformulation of the document content. For our experiments, we use the only freely available dataset of forged receipts, and we provide an in-depth analysis of our results regarding the efficiency of our methods. Our findings show notable correlations between the types of ontology relations (e.g., has_address, amounts_to), the types of entities (product, company, etc.) and the performance of a regression-based language model. These correlations could help in studying transfer learning from NLP methods to boost the performance of existing fraud detection systems.
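To make the three input types concrete, the following is a minimal Python sketch of how a single receipt could be serialized as raw text, as a list of typed entities, and as verbalized ontology triples before being passed to a language model. The `Entity` and `Receipt` classes, the `build_inputs` helper, the separators, the sample receipt values, and the verbalization rule (underscores in relation names replaced by spaces) are all hypothetical illustrations, not the authors' preprocessing; only the relation names `has_address` and `amounts_to` and the entity types `company` and `product` come from the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    text: str
    label: str  # entity type, e.g. "company" or "product" (hypothetical schema)


@dataclass
class Receipt:
    raw_text: str
    entities: list[Entity] = field(default_factory=list)
    # (subject, relation, object) triples, e.g. ("SHOP A", "has_address", "5 rue X")
    triples: list[tuple[str, str, str]] = field(default_factory=list)


def as_raw_text(receipt: Receipt) -> str:
    """Input type 1: the document's full text, unchanged."""
    return receipt.raw_text


def as_entities(receipt: Receipt) -> str:
    """Input type 2: only the extracted entities, each prefixed by its type."""
    return " | ".join(f"{e.label}: {e.text}" for e in receipt.entities)


def as_triples(receipt: Receipt) -> str:
    """Input type 3: triples verbalized as short pseudo-sentences (assumed rule)."""
    sentences = [
        f"{subj} {rel.replace('_', ' ')} {obj}" for subj, rel, obj in receipt.triples
    ]
    return " . ".join(sentences)


def build_inputs(receipt: Receipt) -> dict[str, str]:
    """Return the three alternative model inputs for one document."""
    return {
        "raw": as_raw_text(receipt),
        "entities": as_entities(receipt),
        "triples": as_triples(receipt),
    }


if __name__ == "__main__":
    # Invented receipt used only to show the three serializations side by side.
    receipt = Receipt(
        raw_text="SHOP A 5 rue X\nMILK 1.20\nTOTAL 1.20",
        entities=[Entity("SHOP A", "company"), Entity("MILK", "product")],
        triples=[
            ("SHOP A", "has_address", "5 rue X"),
            ("receipt", "amounts_to", "1.20"),
        ],
    )
    for name, text in build_inputs(receipt).items():
        print(f"{name}: {text}")
```

In a setup like this, each string would be tokenized and fed to the same pre-trained model with a regression head, so only the serializer changes between runs. Keeping the serializers as separate functions is one way to compare input types on equal footing.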


Notes

  1. The platform is available at https://receipts.univ-lr.fr/.

  2. Owlready2 documentation: https://owlready2.readthedocs.io/en/v0.37/.

  3. This strategy has previously been explored for other NLP tasks [10, 11, 35].

References

  1. Abramova, S., et al.: Detecting copy-move forgeries in scanned text documents. Electron. Imaging 2016(8), 1–9 (2016)

  2. Ahmed, A.G.H., Shafait, F.: Forgery detection based on intrinsic document contents. In: 2014 11th IAPR International Workshop on Document Analysis Systems, pp. 252–256 (2014)

  3. Artaud, C., Doucet, A., Ogier, J.M., d’Andecy, V.P.: Receipt dataset for fraud detection. In: First International Workshop on Computational Document Forensics (2017)

  4. Artaud, C., Sidère, N., Doucet, A., Ogier, J.M., Yooz, V.P.D.: Find it! fraud detection contest report. In: 2018 24th International Conference on Pattern Recognition (ICPR), pp. 13–18 (2018)

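  5. Artaud, C.: Détection des fraudes : de l’image à la sémantique du contenu. Application à la vérification des informations extraites d’un corpus de tickets de caisse [Fraud detection: from the image to the semantics of the content. Application to the verification of information extracted from a corpus of receipts]. PhD thesis, University of La Rochelle (2019)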

  6. Behera, T.K., Panigrahi, S.: Credit card fraud detection: a hybrid approach using fuzzy clustering & neural network. In: 2015 Second International Conference on Advances in Computing and Communication Engineering (2015)

  7. Benchaji, I., Douzi, S., El Ouahidi, B.: Using genetic algorithm to improve classification of imbalanced datasets for credit card fraud detection. In: International Conference on Advanced Information Technology, Services and Systems (2018)

  8. Bertrand, R., Gomez-Krämer, P., Terrades, O.R., Franco, P., Ogier, J.M.: A system based on intrinsic features for fraudulent document detection. In: 2013 12th International Conference on Document Analysis and Recognition, pp. 106–110. Washington, DC (2013)

  9. Bertrand, R., Terrades, O.R., Gomez-Krämer, P., Franco, P., Ogier, J.M.: A conditional random field model for font forgery detection. In: 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 576–580 (2015)

  10. Boros, E., Moreno, J., Doucet, A.: Event detection with entity markers. In: European Conference on Information Retrieval, pp. 233–240 (2021)

  11. Boros, E., Moreno, J.G., Doucet, A.: Exploring entities in event detection as question answering. In: Hagen, M., et al. (eds.) ECIR 2022. LNCS, vol. 13185, pp. 65–79. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-99736-6_5

  12. Carta, S., Fenu, G., Recupero, D.R., Saia, R.: Fraud detection for e-commerce transactions by employing a prudential multiple consensus model. J. Inf. Secur. Appl. 46, 13–22 (2019)

  13. Cozzolino, D., Gragnaniello, D., Verdoliva, L.: Image forgery detection through residual-based local descriptors and block-matching. In: 2014 IEEE International Conference on Image Processing (ICIP) (2014)

  14. Cozzolino, D., Poggi, G., Verdoliva, L.: Efficient dense-field copy-move forgery detection. IEEE Trans. Inf. Forensics Secur. 10(11), 2284–2297 (2015)

  15. Cozzolino, D., Verdoliva, L.: Camera-based image forgery localization using convolutional neural networks. In: 2018 26th European Signal Processing Conference (EUSIPCO) (2018)

  16. Cozzolino, D., Verdoliva, L.: Noiseprint: A CNN-based camera model fingerprint. IEEE Trans. Inf. Forensics Secur. 15, 144–159 (2020)

  17. Cruz, F., Sidere, N., Coustaty, M., d’Andecy, V.P., Ogier, J.M.: Local binary patterns for document forgery detection. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1 (2017)

  18. Cruz, F., Sidère, N., Coustaty, M., Poulain D’Andecy, V., Ogier, J.: Categorization of document image tampering techniques and how to identify them. In: Pattern Recognition and Information Forensics - ICPR 2018 International Workshops, CVAUI, IWCF, and MIPPSNA, Revised Selected Papers, pp. 117–124 (2018)

  19. Elkasrawi, S., Shafait, F.: Printer identification using supervised learning for document forgery detection. In: 2014 11th IAPR International Workshop on Document Analysis Systems, pp. 146–150 (2014)

  20. Fridrich, J., Kodovsky, J.: Rich models for steganalysis of digital images. IEEE Trans. Inf. Forensics Secur. 7(3), 868–882 (2012)

  21. Gomez-Krämer, P.: Verifying document integrity. Multimedia Security 2: Biometrics, Video Surveillance and Multimedia Encryption, pp. 59–89 (2022)

  22. Guo, H., Yuan, S., Wu, X.: LogBERT: log anomaly detection via BERT. In: 2021 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2021)

  23. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385 (2015). https://doi.org/10.48550/arXiv.1512.03385

  24. James, H., Gupta, O., Raviv, D.: OCR graph features for manipulation detection in documents (2020)

  25. Kim, J., Kim, H.-J., Kim, H.: Fraud detection for job placement using hierarchical clusters-based deep neural networks. Appl. Intell. 49(8), 2842–2861 (2019)

  26. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)

  27. Kowshalya, G., Nandhini, M.: Predicting fraudulent claims in automobile insurance. In: 2018 Second International Conference on Inventive Communication and Computational Technologies (ICICCT) (2018)

  28. Lee, Y., Kim, J., Kang, P.: LAnoBERT: system log anomaly detection based on BERT masked language model. arXiv preprint arXiv:2111.09564 (2021)

  29. Li, P., et al.: SelfDoc: self-supervised document representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5652–5660 (2021)

  30. Li, Y., Yan, C., Liu, W., Li, M.: Research and application of random forest model in mining automobile insurance fraud. In: 2016 12th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD) (2016)

  31. Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019)

  32. Martin, L., et al.: CamemBERT: a tasty French language model. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 7203–7219. Association for Computational Linguistics (2020). https://doi.org/10.18653/v1/2020.acl-main.645

  33. Mikkilineni, A.K., Chiang, P.J., Ali, G.N., Chiu, G.T., Allebach, J.P., Delp III, E.J.: Printer identification based on graylevel co-occurrence features for security and forensic applications. In: Security, Steganography, and Watermarking of Multimedia Contents VII, vol. 5681, pp. 430–440. International Society for Optics and Photonics (2005)

  34. Mishra, A., Ghorpade, C.: Credit card fraud detection on the skewed data using various classification and ensemble techniques. In: 2018 IEEE International Students’ Conference on Electrical, Electronics and Computer Science (SCEECS) (2018)

  35. Moreno, J.G., Boros, E., Doucet, A.: TLR at the NTCIR-15 FinNum-2 task: improving text classifiers for numeral attachment in financial social data. In: Proceedings of the 15th NTCIR Conference on Evaluation of Information Access Technologies, Tokyo, Japan, pp. 8–11 (2020)

  36. Nadim, A.H., Sayem, I.M., Mutsuddy, A., Chowdhury, M.S.: Analysis of machine learning techniques for credit card fraud detection. In: 2019 International Conference on Machine Learning and Data Engineering (iCMLDE), pp. 42–47 (2019)

  37. Nigrini, M.J.: Benford’s Law: Applications for Forensic Accounting, Auditing, and Fraud Detection, vol. 586. Wiley (2012)

  38. Rabah, C.B., Coatrieux, G., Abdelfattah, R.: The SUPATLANTIQUE scanned documents database for digital image forensics purposes. In: 2020 IEEE International Conference on Image Processing (ICIP) (2020)

  39. Rizki, A.A., Surjandari, I., Wayasti, R.A.: Data mining application to detect financial fraud in Indonesia’s public companies. In: 2017 3rd International Conference on Science in Information Technology (ICSITech) (2017)

  40. Rossi, A., Firmani, D., Matinata, A., Merialdo, P., Barbosa, D.: Knowledge graph embedding for link prediction: a comparative analysis. ACM Trans. Knowl. Discov. Data 15(2), 14:1-14:49 (2021)

  41. Shang, S., Kong, X., You, X.: Document forgery detection using distortion mutation of geometric parameters in characters. J. Electron. Imaging 24(2), 023008 (2015)

  42. Sidere, N., Cruz, F., Coustaty, M., Ogier, J.M.: A dataset for forgery detection and spotting in document images. In: 2017 Seventh International Conference on Emerging Security Technologies (EST) (2017)

  43. Tornés, B.M., Boros, E., Doucet, A., Gomez-Krämer, P., Ogier, J.M., d’Andecy, V.P.: Knowledge-based techniques for document fraud detection: a comprehensive study. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2019. Lecture Notes in Computer Science, vol. 13451, pp. 17–33. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-24337-0_2

  44. Van Beusekom, J., Shafait, F., Breuel, T.M.: Text-line examination for document forgery detection. Int. J. Doc. Anal. Recogn. (IJDAR) 16(2), 189–207 (2013)

  45. Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems 30 (2017)

  46. Vidros, S., Kolias, C., Kambourakis, G., Akoglu, L.: Automatic detection of online recruitment frauds: characteristics, methods, and a public dataset. Future Internet 9(1), 6 (2017)

  47. Xu, Y., et al.: LayoutLMv2: multi-modal pre-training for visually-rich document understanding. In: ACL-IJCNLP 2021 (2021)

  48. Xu, Y., Li, M., Cui, L., Huang, S., Wei, F., Zhou, M.: LayoutLM: pre-training of text and layout for document image understanding. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2020)

Acknowledgements

This work was supported by the French defence innovation agency (AID) and by the VERINDOC project, funded by the Nouvelle-Aquitaine Region.

Author information

Corresponding author: Beatriz Martínez Tornés.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Tornés, B.M., Boros, E., Doucet, A., Gomez-Krämer, P., Ogier, JM. (2023). Detecting Forged Receipts with Domain-Specific Ontology-Based Entities & Relations. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14189. Springer, Cham. https://doi.org/10.1007/978-3-031-41682-8_12


  • DOI: https://doi.org/10.1007/978-3-031-41682-8_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-41681-1

  • Online ISBN: 978-3-031-41682-8

  • eBook Packages: Computer Science; Computer Science (R0)
