AI - Azure AI services Blog

Generate searchable PDFs with Azure Form Recognizer

anatolip MICROSOFT

Oct 17, 2022

PDF documents are widely used in business processes. Digitally created PDFs are convenient to work with: their text can be searched, highlighted, and annotated. Many PDFs, however, are produced by scanning paper or by converting images, so they contain no digital text and cannot be searched. This blog post demonstrates how to convert such files into searchable PDFs with a short Python script and Azure Form Recognizer. The resulting file can be stored anywhere, searched, and used for copy and paste. Blog content:

Azure Form Recognizer overview

Searchable vs non-searchable PDFs
How to generate a searchable PDF
Pre-requirement installation
How to run searchable PDF script
Searchable PDF Python script

Azure Form Recognizer overview

Azure Form Recognizer is a cloud-based Azure Applied AI Service that uses deep machine-learning models to extract text, key-value pairs, tables, and form fields from your documents. In this blog post, we take the text that Form Recognizer extracts and add it to the PDF to make it searchable.

Searchable vs non-searchable PDFs

If a PDF contains text information, users can select, copy and paste, and annotate that text. In a searchable PDF (example), text can be searched and selected; see the text highlighting below:

PDF with digital text

If a PDF is image-based (example), text cannot be searched or selected. Zooming in typically reveals image compression artifacts around the text:

Image based PDF
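One way to tell the two kinds of PDF apart programmatically is to look at the text each page yields when extracted. The sketch below is only an illustration under that assumption: the helper names pages_missing_text and is_searchable are hypothetical, and the page texts are passed in as plain strings so that the logic can be tried without any PDF library.

```python
def pages_missing_text(page_texts, min_chars=1):
    """Return the 0-based indexes of pages whose extracted text is effectively empty."""
    return [
        index
        for index, text in enumerate(page_texts)
        if len((text or "").strip()) < min_chars
    ]


def is_searchable(page_texts):
    """Treat a document as searchable only if it has pages and every page yields text."""
    texts = list(page_texts)
    return bool(texts) and not pages_missing_text(texts)


# With pypdf installed, the page texts could be collected like this:
#   from pypdf import PdfReader
#   texts = [page.extract_text() for page in PdfReader("input.pdf").pages]

if __name__ == "__main__":
    print(is_searchable(["Transition plan", "Page two"]))
    print(pages_missing_text(["Transition plan", "", "   "]))
```

A page that yields no text is a candidate for OCR. The check is a heuristic: a scanned page can still carry a few stray text elements, so a higher min_chars threshold may suit some document sets better.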

How to generate a searchable PDF
PDFs contain different types of elements, such as text and images. Image-based PDFs contain only image elements. The goal of this blog post is to add invisible text elements to the PDF so that users can search and select them. The text is invisible so that the searchable PDF looks identical to the original PDF. In the example below, the word “Transition” is now selectable thanks to the invisible text layer:

Invisible text layer
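At the PDF level, invisible text comes down to text rendering mode 3 (neither fill nor stroke), set with the Tr operator inside a text object. The following sketch builds such a content-stream fragment by hand to show what an invisible word looks like. It is an illustration rather than the method used by the script in this post: the function names and the font resource name F1 are assumptions, and only ASCII text is handled.

```python
def pdf_string(text):
    """Escape text as a PDF literal string (backslash and parentheses are special)."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def invisible_text_ops(text, x, y, font_size, font_name="F1"):
    """Build the content-stream operators that place one invisible word at (x, y).

    font_name is a hypothetical resource name; a real page must define it
    in its font resources for the fragment to be valid.
    """
    return "\n".join(
        [
            "BT",                                # begin text object
            f"/{font_name} {font_size:g} Tf",    # select font and size
            "3 Tr",                              # rendering mode 3: invisible
            f"1 0 0 1 {x:g} {y:g} Tm",           # text matrix: translate to (x, y)
            f"{pdf_string(text)} Tj",            # show the string
            "ET",                                # end text object
        ]
    )


if __name__ == "__main__":
    print(invisible_text_ops("Transition", 72, 700, 12))
```

A PDF viewer can still select and search text drawn this way, even though nothing is painted on the page.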

Pre-requirement installation
Please install the following packages before running the searchable PDF script:

1. Python packages (the version specifiers are quoted so that the shell does not treat ">" as output redirection):

    pip install --upgrade "azure-ai-formrecognizer>=3.3" "pypdf>=3.0" reportlab pillow pdf2image

2. The pdf2image package requires Poppler. Follow the instructions at https://pypi.org/project/pdf2image/ for your platform, or install it with Conda:

    conda install -c conda-forge poppler
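pdf2image calls Poppler's command-line tools, so a quick way to confirm the installation is to look for them on PATH. The sketch below assumes the tools are installed under the names pdfinfo and pdftoppm and that they are reachable through PATH. The helper names are hypothetical. If Poppler lives elsewhere, pdf2image also accepts a poppler_path argument, which this check does not cover.

```python
import shutil


def find_poppler_tools(tools=("pdfinfo", "pdftoppm")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in tools}


def missing_tools(found):
    """List the tool names that could not be resolved."""
    return [name for name, path in found.items() if path is None]


if __name__ == "__main__":
    found = find_poppler_tools()
    missing = missing_tools(found)
    if missing:
        print("Poppler tools not found on PATH:", ", ".join(missing))
    else:
        print("Poppler looks available:", found)
```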

How to run searchable PDF script

1. Create a Python file from the code below and save it on your local machine as fr_generate_searchable_pdf.py.
2. Update the key and endpoint variables with the values from your Form Recognizer instance in the Azure portal (see Quickstart: Form Recognizer SDKs for more details).
3. Run the script, passing the input file (PDF or image) as a parameter:

    python fr_generate_searchable_pdf.py <input.pdf/jpg>

   Sample script output:

    (base) C:\temp>python fr_generate_searchable_pdf.py input.jpg
    Loading input file input.jpg
    Starting Azure Form Recognizer OCR process...
    Azure Form Recognizer finished OCR text for 1 pages.
    Generating searchable PDF...
    Searchable PDF is created: input.jpg.ocr.pdf

4. The script writes the searchable PDF to a file named after the input file with the suffix .ocr.pdf.
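The steps above can be sketched as three small helpers: one that reads the endpoint and key, one that classifies the input file by extension, and one that derives the output name. Reading the credentials from environment variables is an alternative to editing the script, not what the script itself does, and the variable names FORM_RECOGNIZER_ENDPOINT and FORM_RECOGNIZER_KEY are hypothetical. The helper names are likewise only for illustration.

```python
import os

# Extensions accepted by the script in this post.
IMAGE_EXTENSIONS = (".tif", ".tiff", ".jpg", ".jpeg", ".png", ".bmp")


def load_credentials(environ=os.environ):
    """Read endpoint and key from (hypothetically named) environment variables."""
    endpoint = environ.get("FORM_RECOGNIZER_ENDPOINT", "")
    key = environ.get("FORM_RECOGNIZER_KEY", "")
    if not endpoint or not key:
        raise RuntimeError("Set FORM_RECOGNIZER_ENDPOINT and FORM_RECOGNIZER_KEY first.")
    return endpoint, key


def input_kind(input_file):
    """Classify the input as 'pdf' or 'image' by its extension, case-insensitively."""
    lower = input_file.lower()
    if lower.endswith(".pdf"):
        return "pdf"
    if lower.endswith(IMAGE_EXTENSIONS):
        return "image"
    raise ValueError(f"Unsupported input file extension: {input_file}")


def default_output_name(input_file, output=""):
    """Use the explicit output name if given, else append the .ocr.pdf suffix."""
    return output if output else input_file + ".ocr.pdf"


if __name__ == "__main__":
    print(input_kind("input.jpg"), default_output_name("input.jpg"))
```

Keeping the key out of the source file makes it harder to commit it to version control by accident.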

Searchable PDF Python script
Copy the code below into a Python script on your local machine. The script takes a scanned PDF or an image as input and uses Form Recognizer to generate a corresponding searchable PDF. It adds an invisible text layer to the PDF so that you can search, copy, and paste the text within it.

fr_generate_searchable_pdf.py

# Script to create searchable PDF from scan PDF or images using Azure Form Recognizer
# Required packages
# pip install --upgrade "azure-ai-formrecognizer>=3.3" "pypdf>=3.0" reportlab pillow pdf2image
import sys
import io
import math
import argparse
from pdf2image import convert_from_path
from reportlab.pdfgen import canvas
from reportlab.lib import pagesizes
from reportlab import rl_config
from PIL import Image, ImageSequence
from pypdf import PdfWriter, PdfReader
from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import DocumentAnalysisClient

# Please provide your Azure Form Recognizer endpoint and key
endpoint = "YOUR_FORM_RECOGNIZER_ENDPOINT"
key = "YOUR_FORM_RECOGNIZER_KEY"

def dist(p1, p2):
    # Euclidean distance between two polygon points
    return math.sqrt((p1.x - p2.x) * (p1.x - p2.x) + (p1.y - p2.y) * (p1.y - p2.y))

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('input_file', type=str, help="Input PDF or image (jpg, jpeg, tif, tiff, bmp, png) file name")
    parser.add_argument('-o', '--output', type=str, required=False, default="", help="Output PDF file name. Default: input_file + .ocr.pdf")
    args = parser.parse_args()

    input_file = args.input_file
    if args.output:
        output_file = args.output
    else:
        output_file = input_file + ".ocr.pdf"

    # Loading input file
    print(f"Loading input file {input_file}")
    if input_file.lower().endswith('.pdf'):
        # read existing PDF as images
        image_pages = convert_from_path(input_file)
    elif input_file.lower().endswith(('.tif', '.tiff', '.jpg', '.jpeg', '.png', '.bmp')):
        # read input image (potential multi page Tiff)
        image_pages = ImageSequence.Iterator(Image.open(input_file))
    else:
        sys.exit(f"Error: Unsupported input file extension {input_file}. Supported extensions: PDF, TIF, TIFF, JPG, JPEG, PNG, BMP.")

    # Running OCR using Azure Form Recognizer Read API
    print("Starting Azure Form Recognizer OCR process...")
    document_analysis_client = DocumentAnalysisClient(endpoint=endpoint, credential=AzureKeyCredential(key), headers={"x-ms-useragent": "searchable-pdf-blog/1.0.0"})

    with open(input_file, "rb") as f:
        poller = document_analysis_client.begin_analyze_document("prebuilt-read", document=f)

    ocr_results = poller.result()
    print(f"Azure Form Recognizer finished OCR text for {len(ocr_results.pages)} pages.")

    # Generate OCR overlay layer
    print("Generating searchable PDF...")
    output = PdfWriter()
    default_font = "Times-Roman"
    for page_id, page in enumerate(ocr_results.pages):
        ocr_overlay = io.BytesIO()

        # Calculate overlay PDF page size: the longer image side maps to the
        # long side of a letter page (pagesizes.letter[1])
        if image_pages[page_id].height > image_pages[page_id].width:
            page_scale = float(image_pages[page_id].height) / pagesizes.letter[1]
        else:
            page_scale = float(image_pages[page_id].width) / pagesizes.letter[1]

        page_width = float(image_pages[page_id].width) / page_scale
        page_height = float(image_pages[page_id].height) / page_scale

        # Factor that converts OCR coordinates into PDF page coordinates
        scale = (page_width / page.width + page_height / page.height) / 2.0
        pdf_canvas = canvas.Canvas(ocr_overlay, pagesize=(page_width, page_height))

        # Add image into PDF page
        pdf_canvas.drawInlineImage(image_pages[page_id], 0, 0, width=page_width, height=page_height, preserveAspectRatio=True)

        text = pdf_canvas.beginText()
        # Set text rendering mode to invisible
        text.setTextRenderMode(3)
        for word in page.words:
            # Calculate optimal font size
            desired_text_width = max(dist(word.polygon[0], word.polygon[1]), dist(word.polygon[3], word.polygon[2])) * scale
            desired_text_height = max(dist(word.polygon[1], word.polygon[2]), dist(word.polygon[0], word.polygon[3])) * scale
            font_size = desired_text_height
            actual_text_width = pdf_canvas.stringWidth(word.content, default_font, font_size)
            if actual_text_width == 0:
                # Nothing to stretch (for example, an empty word); avoid dividing by zero
                continue

            # Calculate text rotation angle
            text_angle = math.atan2((word.polygon[1].y - word.polygon[0].y + word.polygon[2].y - word.polygon[3].y) / 2.0,
                                    (word.polygon[1].x - word.polygon[0].x + word.polygon[2].x - word.polygon[3].x) / 2.0)
            text.setFont(default_font, font_size)
            text.setTextTransform(math.cos(text_angle), -math.sin(text_angle), math.sin(text_angle), math.cos(text_angle), word.polygon[3].x * scale, page_height - word.polygon[3].y * scale)
            # Stretch or squeeze the word so that it spans the width detected by OCR
            text.setHorizScale(desired_text_width / actual_text_width * 100)
            text.textOut(word.content + " ")

        pdf_canvas.drawText(text)
        pdf_canvas.save()

        # Move to the beginning of the buffer
        ocr_overlay.seek(0)

        # Create a new PDF page
        new_pdf_page = PdfReader(ocr_overlay)
        output.add_page(new_pdf_page.pages[0])

    # Save output searchable PDF file
    with open(output_file, "wb") as outputStream:
        output.write(outputStream)
    print(f"Searchable PDF is created: {output_file}")
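The geometry in the script can be tried in isolation. The sketch below restates the page-size and per-word calculations as standalone functions. The names overlay_page_size and word_geometry are hypothetical, Point is a stand-in for the SDK's polygon points, and 792.0 is the long side of a US letter page in points (the value of reportlab's pagesizes.letter[1]).

```python
import math
from collections import namedtuple

# Stand-in for the polygon points returned by the OCR result (assumption).
Point = namedtuple("Point", "x y")

LETTER_LONG_SIDE = 792.0  # points; 11 inches at 72 points per inch


def dist(p1, p2):
    """Euclidean distance between two points."""
    return math.hypot(p1.x - p2.x, p1.y - p2.y)


def overlay_page_size(image_width, image_height):
    """Scale the image so that its longer side equals the letter page's long side."""
    page_scale = max(image_width, image_height) / LETTER_LONG_SIDE
    return image_width / page_scale, image_height / page_scale


def word_geometry(polygon, scale):
    """Return (width, height, angle in radians) of a word box in page units.

    polygon lists the corners in the order top-left, top-right,
    bottom-right, bottom-left, as the script above assumes.
    """
    width = max(dist(polygon[0], polygon[1]), dist(polygon[3], polygon[2])) * scale
    height = max(dist(polygon[1], polygon[2]), dist(polygon[0], polygon[3])) * scale
    # Average direction of the top and bottom edges gives the rotation.
    angle = math.atan2(
        (polygon[1].y - polygon[0].y + polygon[2].y - polygon[3].y) / 2.0,
        (polygon[1].x - polygon[0].x + polygon[2].x - polygon[3].x) / 2.0,
    )
    return width, height, angle


if __name__ == "__main__":
    # A 1700 x 2200 pixel scan (letter size at 200 dpi) and one upright word box.
    page_width, page_height = overlay_page_size(1700, 2200)
    scale = (page_width / 1700 + page_height / 2200) / 2.0
    box = [Point(100, 200), Point(300, 200), Point(300, 250), Point(100, 250)]
    print(page_width, page_height, word_geometry(box, scale))
```

The word height becomes the font size, and the ratio of the desired width to the width the font would naturally produce becomes the horizontal scale, so that the invisible word covers the same area as the word in the image.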

Updated Jan 25, 2024 VERSION 8.0
