fix: enables detection of pdf in URLs, and parsing of pdf content via URL #30

Merged
merged 1 commit into from
Jul 4, 2025

Conversation

lzamparo
Contributor

Closes #25

Modifies ingest.py to dispatch correctly to PDFParser based on the header of the retrieved content, and modifies PDFParser to retrieve the content again and parse it from a tempfile. This isn't super clean, but it should be a good starting point for caching and parsing PDFs, as the issue identified.
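
For readers skimming the thread, here is a rough sketch of the kind of dispatch described above. It is not the code from this PR: the function names (`is_pdf`, `choose_parser`, `save_to_tempfile`), the use of the `Content-Type` header, and the fallbacks to the `%PDF-` file signature and a `.pdf` URL suffix are all assumptions made for illustration. It uses only the standard library and does no network I/O, so the response metadata is passed in by the caller.

```python
import os
import tempfile
from urllib.parse import urlparse

# Every PDF file begins with this signature.
PDF_MAGIC = b"%PDF-"


def is_pdf(url: str, content_type: str = "", body_prefix: bytes = b"") -> bool:
    """Guess whether a fetched URL holds a PDF (hypothetical helper)."""
    # Content-Type may carry parameters, e.g. "application/pdf; charset=binary".
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/pdf":
        return True
    # Some servers send a generic type; fall back to the file signature.
    if body_prefix.startswith(PDF_MAGIC):
        return True
    # Last resort: the URL path (ignoring any query string) ends in .pdf.
    return urlparse(url).path.lower().endswith(".pdf")


def choose_parser(url: str, content_type: str = "", body_prefix: bytes = b"") -> str:
    """Return a parser key; "pdf" and "html" are placeholder names."""
    return "pdf" if is_pdf(url, content_type, body_prefix) else "html"


def save_to_tempfile(data: bytes, suffix: str = ".pdf") -> str:
    """Write downloaded bytes to a temp file and return its path.

    delete=False keeps the file after the handle closes so a parser can
    reopen it by path; the caller is responsible for removing it.
    """
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(data)
        return tmp.name
```

Checking the header first means a PDF served from a URL without a `.pdf` suffix is still routed to the PDF parser, and writing the bytes to a named temp file lets a path-based parser be reused unchanged.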

facebook-github-bot added the CLA Signed label May 21, 2025
@lzamparo
Contributor Author

lzamparo commented Jun 3, 2025

@init27 any chance you could have a quick look at this? I don't want it to get too stale.

init27 merged commit 1903098 into meta-llama:main Jul 4, 2025
1 check passed
Labels
CLA Signed This label is managed by the Meta Open Source bot.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Issue on ingest PDF
3 participants