Module 4
Python Web Scraping
Why do we scrape the web, and how do we get the data we need? The answer
to the first question is 'data': data is indispensable for any programmer, and
the basic requirement of every programming project is a large amount of useful data.
The answer to the second question is a bit tricky, because there are many ways to
get data. In general, we may get data from a database, a data file or other such sources.
But what if we need a large amount of data that is only available online? One way to get
such data is to manually search for it (clicking away in a web browser) and save
the required data (copy-pasting into a spreadsheet or file). This method is quite
tedious and time consuming. Another way to get such data is web scraping.
Web scraping, also called web data mining or web harvesting, is the process of
constructing an agent which can extract, parse, download and organize useful
information from the web automatically. In other words, instead of us
manually saving data from websites, web scraping software will
automatically load and extract data from multiple websites as per our requirements.
Web Crawling vs. Web Scraping
The terms web crawling and web scraping are often used interchangeably,
since the basic purpose of both is to extract data. However, they are
different from each other. We can understand the basic difference from
their definitions.
Web crawling is basically used to index the information on a page
using bots, also known as crawlers. It is also called indexing. Web
scraping, on the other hand, is an automated way of extracting information using bots,
also known as scrapers. It is also called data extraction.
Scraper bots are tools or pieces of code used to extract data from web
pages. These bots are like tiny spiders that run through the different web
pages of a website to extract the specific data they were created to get.
The process of extracting data with a scraper bot is called web scraping.
In short, crawling downloads and indexes pages on a large scale, typically for
search engines, while scraping extracts specific data elements from particular pages.
Uses of Web Scraping
The uses of and reasons for web scraping are as endless as the uses of the World
Wide Web. Web scrapers can do anything a human can do, such as ordering food online,
scanning a shopping website for you, or buying tickets for a match the moment they
become available. Some of the important uses of web scraping are discussed
here −
E-commerce Websites − Web scrapers can collect data related to the price
of a specific product from various e-commerce websites for comparison.
Content Aggregators − Web scraping is widely used by content aggregators, such as news
aggregators and job aggregators, to provide updated data to their users.
Marketing and Sales Campaigns − Web scrapers can be used to gather data like email
addresses, phone numbers etc. for sales and marketing campaigns.
Search Engine Optimization (SEO) − Web scraping is widely used by SEO tools like
SEMRush, Majestic etc. to tell businesses how they rank for the search keywords that matter to
them.
Data for Machine Learning Projects − The retrieval of data for machine learning projects
often depends upon web scraping.
Data for Research − Researchers can collect useful data for their research
work, saving time through this automated process.
Components of a Web Scraper
A web scraper consists of the following components −
Web Crawler Module
An essential component of a web scraper, the web crawler module is used to navigate the
target website by making HTTP or HTTPS requests to its URLs. The crawler downloads the
unstructured data (HTML content) and passes it to the extractor, the next module.
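As a minimal sketch of such a module (assuming the requests library, which is introduced later in this module; the URL is a placeholder), a crawler can be as simple as:

import requests

def crawl(urls):
    # fetch the raw, unstructured HTML of each URL and
    # hand it over to the next module (the extractor)
    for url in urls:
        r = requests.get(url)        # HTTP GET request to the target URL
        r.raise_for_status()         # stop early on HTTP errors (404, 500, ...)
        yield url, r.text            # the downloaded HTML content

for url, html in crawl(["https://example.com"]):
    print(url, len(html), "characters of HTML")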
Extractor
The extractor processes the fetched HTML content and extracts the data into a semi-structured
format. It is also called a parser module and uses different parsing techniques such as
regular expressions, HTML parsing, DOM parsing or artificial intelligence.
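To illustrate the HTML-parsing approach, the sketch below uses Python's built-in html.parser module to pull every hyperlink out of a page; the HTML string fed to it here is a made-up example:

from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    # collect the href attribute of every <a> tag
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

extractor = LinkExtractor()
extractor.feed('<a href="/page1">One</a> <a href="/page2">Two</a>')
print(extractor.links)   # ['/page1', '/page2']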
Data Transformation and Cleaning Module
The data extracted above is not suitable for immediate use. It must pass through a cleaning
module before we can use it. Methods like string manipulation or regular expressions can
be used for this purpose. Note that extraction and transformation can also be performed in a
single step.
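For example, here is a small sketch of a regular-expression cleaning step that turns a scraped price string into a number (the input format shown is an assumption):

import re

def clean_price(raw):
    # keep only digits and the decimal point, dropping
    # currency symbols, thousands separators and whitespace
    digits = re.sub(r"[^\d.]", "", raw)
    return float(digits)

print(clean_price("  $1,299.99 "))   # 1299.99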
Storage Module
After extracting the data, we need to store it as per our requirements. The storage module
outputs the data in a standard format, such as a database, JSON or CSV.
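A minimal sketch of such a storage module, writing the same extracted records to both CSV and JSON with Python's standard library (the field names are illustrative):

import csv
import json

records = [
    {"product": "widget", "price": 9.99},
    {"product": "gadget", "price": 19.95},
]

# store the records as CSV
with open("products.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["product", "price"])
    writer.writeheader()
    writer.writerows(records)

# store the same records as JSON
with open("products.json", "w") as f:
    json.dump(records, f, indent=2)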
Working of a Web Scraper
A web scraper may be defined as software or a script used to download
the contents of multiple web pages and extract data from them.
Step 1: Downloading Contents from Web Pages
In this step, a web scraper will download the requested contents from multiple web
pages.
Step 2: Extracting Data
The data on websites is HTML and mostly unstructured. Hence, in this step, the web scraper
will parse and extract structured data from the downloaded contents.
Step 3: Storing the Data
Here, a web scraper will store and save the extracted data in a format such as CSV or
JSON, or in a database.
Step 4: Analyzing the Data
After all these steps are successfully done, the web scraper will analyze the data thus
obtained.
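Putting the first three steps together, here is a compact sketch of the whole flow; the target URL, the regular expression and the output file name are all placeholders:

import re
import json
import requests

# Step 1: download the contents of a web page
html = requests.get("https://example.com").text

# Step 2: extract structured data (here, just the page title) from the HTML
titles = re.findall(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)

# Step 3: store the extracted data in JSON format
with open("scraped.json", "w") as f:
    json.dump({"titles": titles}, f, indent=2)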
Why Python for Web Scraping?
Python is a popular tool for implementing web scraping. The Python programming language is also
used for other useful projects related to cyber security, penetration testing and digital forensics.
Web scraping can be performed with base Python alone, without
any third-party tool.
The Python programming language enjoys huge popularity, and the reasons that make it a good
fit for web scraping projects are listed below −
Syntax Simplicity
Python has one of the simplest syntaxes among programming languages. This feature of
Python makes testing easier, and a developer can focus more on the programming itself.
Inbuilt Modules
Another reason for using Python for web scraping is the useful libraries it possesses, both
inbuilt and external. We can perform many implementations related to web scraping by using
Python as the base for programming.
Open Source Programming Language
Python has huge support from the community because it is an open source programming language.
Wide range of Applications
Python can be used for various programming tasks ranging from small shell scripts to enterprise
web applications.
Downloading Files from the Web Using Python
Requests is a versatile HTTP library in Python with various applications.
One of its applications is downloading a file from the web using the file's URL.
Installation: First of all, you need to install the requests
library. You can install it directly using pip by typing the following
command:
pip install requests
# import the requests library
import requests

# URL of the image to be downloaded
image_url = "https://www.python.org/static/community_logos/python-logo-master-v3-TM.png"

# send an HTTP GET request to the server and save
# the HTTP response in a response object called r
r = requests.get(image_url)

# write the contents of the response (r.content)
# to a new file opened in binary mode
with open("python_logo.png", 'wb') as f:
    f.write(r.content)
This small piece of code will download the Python logo image from the web.
Now check your local directory (the folder where this script resides), and you will find
the saved image file python_logo.png.
The with statement in Python is used in exception handling to make the code cleaner and much
more readable. It simplifies the management of common resources like file streams.
# without using with statement
file = open('file_path', 'w')
file.write('hello world')
file.close()

# without using with statement, guarded by try/finally
file = open('file_path', 'w')
try:
    file.write('hello world')
finally:
    file.close()
# using with statement
with open('file_path', 'w') as file:
    file.write('hello world !')
Notice that, unlike the first two implementations, there is no need to call file.close() when using the
with statement. The with statement itself ensures proper acquisition and release of resources. An
exception during the file.write() call in the first implementation can prevent the file from closing
properly, which may introduce several bugs, because many changes to a file do not take
effect until the file is properly closed.
Python provides different modules, like urllib and requests, to download files
from the web.
1. Import module
import requests
2. Get the link or URL
url = 'https://www.facebook.com/favicon.ico'
r = requests.get(url, allow_redirects=True)
3. Save the content to a file.
open('facebook.ico', 'wb').write(r.content)
This saves the file as facebook.ico.
Example
import requests
url = 'https://www.facebook.com/favicon.ico'
r = requests.get(url, allow_redirects=True)
open('facebook.ico', 'wb').write(r.content)
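Since urllib is mentioned above alongside requests, here is the equivalent download using only the standard library's urllib module:

import urllib.request

url = 'https://www.facebook.com/favicon.ico'

# download the file at the URL and save it locally as facebook.ico
urllib.request.urlretrieve(url, 'facebook.ico')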