
Commit 67c4c1d

Merge pull request larymak#125 from Sdccoding/main
Created a PDF_Downloader using Python Web Scraping
2 parents 405bc8c + b50178b commit 67c4c1d


4 files changed: 33 additions & 0 deletions


PDF_Downloader/Readme.MD

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+This is the README for this project.
+It is a basic downloader that saves every PDF linked from a given web page.
+
+
+Install the required dependencies:
+
+python -m pip install -r requirements.txt
+
+How to run:
+
+python pdf.py
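As committed, `python pdf.py` takes no arguments, so the target page has to be typed into the script's `url` variable before each run. A minimal sketch of reading it from the command line instead, using the standard-library `argparse` module; the positional `url` argument, the `-o/--output` option, and the `parse_args` helper are hypothetical and not part of this commit:

```python
import argparse


def parse_args(argv=None):
    """Hypothetical CLI: page URL as a positional argument, output folder as an option."""
    parser = argparse.ArgumentParser(
        description="Download every PDF linked from a web page."
    )
    parser.add_argument("url", help="page whose linked PDFs should be downloaded")
    parser.add_argument(
        "-o", "--output",
        default="./NewFolder",  # same default folder that pdf.py uses
        help="folder to save the PDFs into (created if missing)",
    )
    # argv=None makes argparse read sys.argv[1:]; pass a list to parse explicitly
    return parser.parse_args(argv)
```

With such a helper, a run could look like `python pdf.py https://example.com/reports/ -o reports`, and `args.url` / `args.output` would replace the hard-coded `url` and `folder_location`.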

PDF_Downloader/pdf.py

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+import os
+import requests
+from urllib.parse import urljoin
+from bs4 import BeautifulSoup
+
+# URL of the page whose linked PDFs should be downloaded (must be filled in before running)
+url = ""
+
+# Folder to save the PDFs into; it is created automatically if it does not exist
+folder_location = r'./NewFolder'
+os.makedirs(folder_location, exist_ok=True)
+
+response = requests.get(url)
+soup = BeautifulSoup(response.text, "html.parser")
+for link in soup.select("a[href$='.pdf']"):
+    # Name each PDF after the last segment of its link (assumed unique on this page)
+    filename = os.path.join(folder_location, link['href'].split('/')[-1])
+    with open(filename, 'wb') as f:
+        f.write(requests.get(urljoin(url, link['href'])).content)
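For each matching link, the script keeps anchors whose `href` ends in `.pdf` (the CSS selector `a[href$='.pdf']`), resolves the `href` against the page URL with `urljoin`, and names the saved file after the last path segment. The sketch below reproduces that logic with only the standard library (`html.parser` swapped in for BeautifulSoup, `urllib.request` swapped in for `requests`) and adds a request timeout, HTTP error propagation, and streamed writes. It is an illustration under those assumptions, not the committed code; `PdfLinkCollector`, `pdf_targets`, and `download_pdfs` are hypothetical names, and unlike the script (which overwrites on a name clash) it keeps only the first link per file name.

```python
import os
import posixpath
import shutil
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class PdfLinkCollector(HTMLParser):
    """Collect href values of <a> tags that end in '.pdf', mirroring a[href$='.pdf']."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")  # None if the attribute is missing or valueless
        if href and href.endswith(".pdf"):
            self.hrefs.append(href)


def pdf_targets(html, page_url, folder):
    """Return (local_path, absolute_url) pairs, keeping the first link per file name."""
    collector = PdfLinkCollector()
    collector.feed(html)
    collector.close()
    seen = set()
    targets = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links against the page
        name = posixpath.basename(urlparse(absolute).path)
        if not name or name in seen:  # skip empty names and repeated file names
            continue
        seen.add(name)
        targets.append((os.path.join(folder, name), absolute))
    return targets


def download_pdfs(page_url, folder="./NewFolder", timeout=30):
    """Fetch page_url and save each linked PDF into folder (needs network access)."""
    os.makedirs(folder, exist_ok=True)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx instead of saving an error page
    with urlopen(page_url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    for path, pdf_url in pdf_targets(html, page_url, folder):
        with urlopen(pdf_url, timeout=timeout) as pdf_resp, open(path, "wb") as f:
            shutil.copyfileobj(pdf_resp, f)  # stream to disk instead of buffering in memory
```

Splitting link extraction (`pdf_targets`) from the network calls (`download_pdfs`) means the selection and naming rules can be checked against a sample HTML string without fetching anything.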

PDF_Downloader/requirements.txt

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+beautifulsoup4==4.10.0
+requests==2.18.4
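Both dependencies are pinned to exact versions with `==`. A small sketch for comparing those pins with what is actually installed in the current environment, using the standard-library `importlib.metadata` (Python 3.8+); `parse_pins` and `check_pins` are hypothetical helpers that only understand plain `name==version` lines, not the full pip requirements-file syntax.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pins(text):
    """Map package name -> pinned version for plain 'name==version' lines."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if "==" in line:
            name, pinned = line.split("==", 1)
            pins[name.strip()] = pinned.strip()
    return pins


def check_pins(text):
    """Return {name: (pinned, installed_or_None)} for every pin that is not satisfied."""
    mismatches = {}
    for name, pinned in parse_pins(text).items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

An empty result from `check_pins(open("requirements.txt").read())` would mean the environment matches the pins exactly.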

README.md

Lines changed: 1 addition & 0 deletions
@@ -92,3 +92,4 @@ The contribution guidelines are as per the guide [HERE](https://github.com/larym
 | 49 | [Pomodoro App](https://github.com/HarshitRV/Python-project-Scripts/tree/main/Pomodoro-App) | [HarshitRV](https://github.com/HarshitRV)
 | 49 | [BullsAndCows](https://github.com/HarshitRV/Python-project-Scripts/tree/main/BullsAndCows) | [JerryChen](https://github.com/jerrychen1990)
 | 50 | [Minesweeper AI](https://github.com/nrp114/Minsweeper_AI) | [Nisarg Patel](https://github.com/nrp114)
+| 51 | [PDF Downloader](https://github.com/Sdccoding/Python-project-Scripts/tree/main/PDF_Downloader) | [Souhardya Das Chowdhury](https://github.com/Sdccoding)
