API Endpoints
1. Imports:
fitz: The import name of PyMuPDF, used to open PDF files and extract their text.
Django Imports: Django classes for building HTTP responses (HttpResponse, JsonResponse, HttpResponseBadRequest, etc.).
Django REST Framework Imports: For defining the API view (the api_view decorator).
File Handling Imports: For wrapping raw content as Django files and saving them to storage (ContentFile, default_storage).
Model Import: License, the model that stores the extracted data.
@api_view(http_method_names=['POST']): This decorator restricts the view to POST requests; a request using any other method receives a 405 Method Not Allowed response.
Validation:
If the uploaded file is not a PDF (its name does not end with .pdf), an error response is returned.
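A minimal sketch of the extension check, assuming the uploaded file's name is available as a string (in Django, an uploaded file exposes it through its .name attribute). The helper name is_pdf_filename is hypothetical, and this version compares case-insensitively so that SCAN.PDF is accepted; the endpoint's own check may be case-sensitive.

```python
from typing import Optional


def is_pdf_filename(name: Optional[str]) -> bool:
    """Return True if the file name ends with ".pdf", ignoring case.

    Hypothetical helper: the original endpoint may compare case-sensitively.
    """
    if not name:
        # A missing or empty name cannot be a valid PDF upload.
        return False
    return name.lower().endswith(".pdf")


print(is_pdf_filename("license.pdf"))   # True
print(is_pdf_filename("SCAN.PDF"))      # True
print(is_pdf_filename("license.docx"))  # False
```

An extension check only looks at the name; a renamed non-PDF file still passes it, which is why the later fitz.open() error handling matters.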
The file is then saved temporarily with Django's default_storage.save(), in a temp/ directory within the configured storage backend.
fitz.open() opens the saved PDF. If the PDF cannot be opened (e.g., the file is corrupt), a JsonResponse with error details is returned.
The text of every page is extracted with page.get_text("text") and joined into a single string (text).
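The open-and-extract step can be sketched as follows. This assumes PyMuPDF is installed (import fitz), that a fitz Document can be iterated page by page, and that the saved file is reachable through a local filesystem path; other storage backends would need adapting. The function name extract_pdf_text, the open_pdf parameter (which lets a fake opener be injected for testing), and the newline separator between pages are illustrative assumptions, not details taken from the original code.

```python
from typing import Callable, Optional, Tuple


def extract_pdf_text(
    path: str, open_pdf: Optional[Callable] = None
) -> Tuple[Optional[str], Optional[str]]:
    """Return (text, None) on success or (None, error_message) on failure."""
    if open_pdf is None:
        import fitz  # PyMuPDF; imported lazily so a fake opener can be injected
        open_pdf = fitz.open
    try:
        doc = open_pdf(path)
    except Exception as exc:  # e.g. a corrupt or non-PDF file
        return None, f"Could not open PDF: {exc}"
    try:
        # Each page's plain text, joined with a newline (assumed separator).
        text = "\n".join(page.get_text("text") for page in doc)
    finally:
        doc.close()  # release the document even if extraction fails
    return text, None
```

In the view, a (None, error) result would then become something like JsonResponse({"error": error}, status=400); the key name and status code here are assumptions.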
7. Extracting License Information:
Three license types are handled: Driving License, Business License, and Tourism License.
The script searches the extracted text for specific keywords (e.g., "Driving License:", "Business Name:") to locate each value.
Once the license_type is identified, the relevant fields (such as business_name and license_no) are extracted from the PDF text, and the data is saved as a License record through the Django ORM.
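A minimal sketch of keyword-based extraction, assuming each value appears in the extracted text as a "Label: value" pair on a single line. The three license types come from the list above, but the marker strings, the short type codes, and the label-to-field mapping in FIELD_LABELS are hypothetical placeholders; the real labels and License field names should be taken from the endpoint's source. The returned dict is shaped so it could be passed to License.objects.create(**fields), assuming the model's field names match the keys.

```python
import re
from typing import Dict, Optional

# Hypothetical markers used to decide which license type a PDF contains.
LICENSE_TYPES = {
    "Driving License": "driving",
    "Business License": "business",
    "Tourism License": "tourism",
}

# Hypothetical mapping of text labels to License model fields, per type.
FIELD_LABELS = {
    "driving": {"License No": "license_no"},
    "business": {"Business Name": "business_name", "License No": "license_no"},
    "tourism": {"Business Name": "business_name", "License No": "license_no"},
}


def extract_field(text: str, label: str) -> Optional[str]:
    """Return the value after 'label:' on the same line, or None."""
    pattern = rf"^[ \t]*{re.escape(label)}[ \t]*:[ \t]*(.+?)[ \t]*$"
    match = re.search(pattern, text, re.MULTILINE)
    return match.group(1) if match else None


def detect_license_type(text: str) -> Optional[str]:
    """Return the code of the first license marker found in the text."""
    for marker, license_type in LICENSE_TYPES.items():
        if marker in text:
            return license_type
    return None


def parse_license(text: str) -> Optional[Dict[str, Optional[str]]]:
    """Build a dict of License fields, or None if no type is recognized."""
    license_type = detect_license_type(text)
    if license_type is None:
        return None  # unrecognized document: nothing to save
    fields: Dict[str, Optional[str]] = {"license_type": license_type}
    for label, field_name in FIELD_LABELS[license_type].items():
        fields[field_name] = extract_field(text, label)
    return fields
```

Missing labels yield None values rather than exceptions, so the view can decide whether an incomplete record should still be saved.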
9. Cleaning Up:
After processing, the uploaded file is deleted from storage using default_storage.delete(file_path).
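If an earlier step returns an error response or raises, the temporary file may be left in storage unless deletion is guaranteed. One way to guarantee it is a try/finally around the processing step. This sketch relies only on the save(name, content) and delete(name) calls that Django's default_storage provides; DictStorage is a hypothetical in-memory stand-in so the example runs without Django, and process_upload is an illustrative name. With the real default_storage, raw bytes should be wrapped first, e.g. ContentFile(data).

```python
from typing import Any, Callable, Dict


class DictStorage:
    """Hypothetical stand-in exposing the save/delete/exists calls used here."""

    def __init__(self) -> None:
        self.files: Dict[str, Any] = {}

    def save(self, name: str, content: Any) -> str:
        # Django's storage.save() returns the name actually used, which can
        # differ from the requested one; this stand-in keeps it unchanged.
        self.files[name] = content
        return name

    def delete(self, name: str) -> None:
        self.files.pop(name, None)

    def exists(self, name: str) -> bool:
        return name in self.files


def process_upload(
    storage: Any, filename: str, content: Any, process: Callable[[str], Any]
) -> Any:
    """Save the upload under temp/, run `process` on it, and always delete it."""
    file_path = storage.save(f"temp/{filename}", content)
    try:
        return process(file_path)
    finally:
        # Runs on normal return and when `process` raises.
        storage.delete(file_path)
```

Using the name returned by save() for the later delete() matters, because the storage backend may have renamed the file to avoid a collision.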
If every step succeeds, the API returns an HttpResponse confirming that the file was uploaded and processed successfully.