UNIT 3 Notes

The document discusses the history and architecture of the World Wide Web (WWW). It was created by Tim Berners-Lee in 1989 to allow researchers to access one another's work. Today, the web is a distributed system of linked documents stored on servers around the world. A web client uses a browser to request documents from servers via protocols like HTTP. Documents can be static files or dynamically generated content. The uniform resource locator (URL) uniquely identifies documents and consists of the protocol, host, port, and path.

Uploaded by

Hanuma Naik
Copyright
© © All Rights Reserved

Nitte Education Trust

Nitte Meenakshi Institute of Technology


Yelahanka, Bangalore- 560064, Karnataka- India.
Department of Electronics and Communication Engineering
Semester: 6th C Staff In charge: Dr. B.S. Pavan
Subject Name: Computer Network and Applications Subject Code: 18EC61

UNIT 3 Notes:
Source
TEXT BOOKS
Text Book - Data Communications and Networking, 5/e By Behrouz A Forouzan, published
by McGraw-Hill

WORLD WIDE WEB (WWW)

The idea of the Web was first proposed by Tim Berners-Lee in 1989 at CERN, the European
Organization for Nuclear Research, to allow researchers at different locations
throughout Europe to access one another's research. The commercial Web started in the early
1990s.
The Web today is a repository of information in which the documents, called web
pages, are distributed all over the world and related documents are linked together. The
popularity and growth of the Web can be related to two terms in the above statement:
distributed and linked. Distribution allows the growth of the Web. Each web server in the world
can add a new web page to the repository and announce it to all Internet users without
overloading a few servers. Linking allows one web page to refer to another web page stored in
another server somewhere else in the world. The linking of web pages was achieved using a
concept called hypertext, which was introduced many years before the advent of the Internet.
The idea was to use a machine that automatically retrieved another document stored in the
system when a link to it appeared in the document. The Web implemented this idea
electronically to allow the linked document to be retrieved when the link was clicked by the
user. Today, the term hypertext, coined to mean linked text documents, has been changed to
hypermedia, to show that a web page can be a text document, an image, an audio file, or a
video file.
The purpose of the Web has gone beyond the simple retrieving of linked documents.
Today, the Web is used to provide electronic shopping and gaming. One can use the Web to
listen to radio programs or view television programs whenever one desires without being forced
to listen to or view these programs when they are broadcast.

Architecture
The WWW today is a distributed client-server service, in which a client using a browser can
access a service using a server. However, the service provided is distributed over many
locations called sites. Each site holds one or more web pages. Each web page, however, can
contain some links to other web pages in the same or other sites. In other words, a web page
can be simple or composite. A simple web page has no links to other web pages; a composite
web page has one or more links to other web pages. Each web page is a file with a name and
address.
Suppose we need to retrieve a scientific document that contains one reference
to another text file and one reference to a large image. Figure 26.1 shows the situation. The
main document and the image are stored in two separate files (file A and file B) in the same
site; the referenced text file (file C) is stored in another site. Since three different files are
involved, three transactions are needed to see the whole document. The first
transaction (request/response) retrieves a copy of the main document (file A), which has
references (pointers) to the second and third files. When a copy of the main document is
retrieved and browsed, the user can click on the reference to the image to invoke the second
transaction and retrieve a copy of the image (file B). If the user needs to see the contents of the
referenced text file, she can click on its reference (pointer), invoking the third transaction and
retrieving a copy of file C. Note that although files A and B are both stored in site I, they are
independent files with different names and addresses. Two transactions are needed to retrieve
them. An important point to remember is that file A, file B, and file C in Example
26.1 are independent web pages, each with independent names and addresses. Although
references to file B or C are included in file A, this does not mean that these files cannot
be retrieved independently. A second user can retrieve file B with one transaction. A third user
can retrieve file C with one transaction.

Web Client (Browser)


A variety of vendors offer commercial browsers that interpret and display a web page, and all
of them use nearly the same architecture. Each browser usually consists of three parts: a
controller, client protocols, and interpreters (see Figure 26.2).
The controller receives input from the keyboard or the mouse and uses the client protocols to
access the document. After the document has been accessed, the controller uses one of the
interpreters to display the document on the screen. The client protocol can be one of the
protocols described later, such as HTTP or FTP. The interpreter can be HTML, Java, or
JavaScript, depending on the type of document. Some commercial browsers include Internet
Explorer, Netscape Navigator, and Firefox.

Web Server
The web page is stored at the server. Each time a request arrives, the corresponding document
is sent to the client. To improve efficiency, servers normally store requested files in a cache in
memory; memory is faster to access than a disk. A server can also become more efficient
through multithreading or multiprocessing. In this case, a server can answer more than one
request at a time. Some popular web servers include Apache and Microsoft Internet
Information Server.

Uniform Resource Locator (URL)


A web page, as a file, needs to have a unique identifier to distinguish it from other web pages.
To define a web page, three identifiers are needed: host, port, and path. However, before
defining the web page, the browser must be told which client-server application to
use, which is called the protocol. This means four identifiers are needed to define the
web page. The first is the type of vehicle to be used to fetch the web page; the last three make
up the combination that defines the destination object (web page).

❑ Protocol. The first identifier is the abbreviation for the client-server program that is needed
in order to access the web page. Most of the time the protocol is HTTP (HyperText
Transfer Protocol), but other protocols such as FTP (File Transfer Protocol) can also be used.

❑ Host. The host identifier can be the IP address of the server or the unique name given to the
server. IP addresses can be defined in dotted decimal notation (such as 64.23.56.17); the name
is normally the domain name that uniquely defines the host.

❑ Port. The port, a 16-bit integer, is normally predefined for the client-server application. For
example, if the HTTP protocol is used for accessing the web page, the well-known port number
is 80. However, if a different port is used, the number can be explicitly given.

❑ Path. The path identifies the location and the name of the file in the underlying operating
system. The format of this identifier normally depends on the operating system. In UNIX, a
path is a set of directory names followed by the file name, all separated by a slash. For example,
/top/next/last/myfile is a path that uniquely defines a file named myfile, stored in the directory
last, which itself is part of the directory next, which itself is under the directory top. In other
words, the path lists the directories from the top to the bottom, followed by the file name.
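To combine these four pieces together, the uniform resource locator (URL) has been
designed; it uses three different separators (://, :, and /) between the four pieces:

protocol://host/path          (used most of the time)
protocol://host:port/path     (used when the port number is needed)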
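The four identifiers can be pulled back out of a URL with Python's standard urllib.parse module. This is only a minimal sketch: the host name www.example.com and the helper split_url are assumptions made for illustration, and the table of default ports covers just the two protocols named above.

```python
from urllib.parse import urlsplit

# Well-known ports for the two protocols named in the notes.
DEFAULT_PORTS = {"http": 80, "ftp": 21}

def split_url(url):
    """Return the four identifiers (protocol, host, port, path) of a URL."""
    parts = urlsplit(url)
    # When no port is written in the URL, fall back to the well-known one.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return parts.scheme, parts.hostname, port, parts.path

# Hypothetical host; the path is the one used in the notes.
print(split_url("http://www.example.com/top/next/last/myfile"))
print(split_url("http://www.example.com:8080/top/next/last/myfile"))
```

The first call reports port 80 because the URL does not name one; the second reports the explicitly given port 8080.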

Web Documents
The documents in the WWW can be grouped into three broad categories: static, dynamic, and
active.
Static Documents: Static documents are fixed-content documents that are created and stored
in a server. The client can get a copy of the document only. In other words, the contents of the
file are determined when the file is created, not when it is used. Of course, the contents in the
server can be changed, but the user cannot change them. When a client accesses the document,
a copy of the document is sent. The user can then use a browser to see the document. Static
documents are prepared using one of several languages: HyperText Markup Language
(HTML), Extensible Markup Language (XML), Extensible Stylesheet Language (XSL), and
Extensible Hypertext Markup Language (XHTML).

Dynamic Documents: A dynamic document is created by a web server whenever a browser
requests the document. When a request arrives, the web server runs an application program or
a script that creates the dynamic document. The server returns the result of the program or
script as a response to the browser that requested the document. Because a fresh document is
created for each request, the contents of a dynamic document may vary from one request to
another. A very simple example of a dynamic document is the retrieval of the time and date
from a server. Time and date are kinds of information that are dynamic in that they change
from moment to moment. The client can ask the server to run a program such as the date
program in UNIX and send the result of the program to the client. Although the Common
Gateway Interface (CGI) was used to retrieve a dynamic document in the past, today’s options
include one of the scripting languages such as Java Server Pages (JSP), which uses the Java
language for scripting, or Active Server Pages (ASP), a Microsoft product that uses Visual
Basic language for scripting, or ColdFusion, which embeds queries in a Structured Query
Language (SQL) database in the HTML document.
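The time-and-date example can be sketched as a small function that a server might call on each request. The function name make_dynamic_document and the HTML layout are assumptions for illustration; the sketch does not use CGI, JSP, ASP, or ColdFusion.

```python
from datetime import datetime

def make_dynamic_document(now=None):
    """Build a fresh HTML document whose content depends on the request time."""
    # A real server would read the clock on every request; the optional
    # argument only makes the sketch easy to try with a fixed time.
    now = now or datetime.now()
    stamp = now.strftime("%Y-%m-%d %H:%M:%S")
    return "<html><body><p>Server time: " + stamp + "</p></body></html>"

print(make_dynamic_document())
```

Two requests made at different moments receive different documents, which is what distinguishes this from a static file.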

Active Documents: For many applications, a program or a script needs to be run at the
client site. These are called active documents. For example, suppose we want to run a program
that creates animated graphics on the screen or a program that interacts with the user. The
program definitely needs to be run at the client site where the animation or interaction takes
place. When a browser requests an active document, the server sends a copy of the document
or a script. The document is then run at the client (browser) site. One way to create an active
document is to use a Java applet, a program written in Java and stored on the server. It is compiled and
ready to be run. The document is in bytecode (binary) format. Another way is to use JavaScript,
in which case the script is downloaded and run at the client site.
HyperText Transfer Protocol (HTTP)

The HyperText Transfer Protocol (HTTP) is used to define how the client-server programs
can be written to retrieve web pages from the Web. An HTTP client sends a request; an HTTP
server returns a response. The server uses the port number 80; the client uses a temporary port
number. HTTP uses the services of TCP, which is a connection-oriented and reliable protocol. This
means that, before any transaction between the client and the server can take place, a
connection needs to be established between them. After the transaction, the connection should
be terminated. The client and server, however, do not need to worry about errors in messages
exchanged or loss of any message, because TCP is reliable.

Nonpersistent versus Persistent Connections

The hypertext concept embedded in web page documents may require several requests and
responses. If the web pages (the objects to be retrieved) are located on different servers, there is
no choice but to create a new TCP connection for retrieving each object. However,
if some of the objects are located on the same server, there are two choices: to retrieve each
object using a new TCP connection, or to make one TCP connection and retrieve them all. The
first method is referred to as a nonpersistent connection, the second as a persistent connection.
HTTP, prior to version 1.1, specified nonpersistent connections, while persistent connections
are the default in version 1.1, although this can be changed by the user.

Nonpersistent Connections
In a nonpersistent connection, one TCP connection is made for each request/response.
The following lists the steps in this strategy:
1. The client opens a TCP connection and sends a request.
2. The server sends the response and closes the connection.
3. The client reads the data until it encounters an end-of-file marker; it then closes the
connection.

Example 26.3
Figure 26.3 shows an example of a nonpersistent connection. The client needs to access a file
that contains one link to an image. The text file and image are located on the same server. Two
connections are needed here. For each connection, TCP requires at least three handshake messages
to establish the connection, but the request can be sent with the third one. After the connection
is established, the object can be transferred. After receiving an object, another three handshake
messages are needed to terminate the connection. This means that the client and server are
involved in two connection establishments and two connection terminations. If the transaction
involves retrieving 10 or 20 objects, the round trip times spent for these handshakes add up to
a big overhead.
Persistent Connections
HTTP version 1.1 specifies a persistent connection by default. In a persistent connection, the
server leaves the connection open for more requests after sending a response. The server can
close the connection at the request of a client or if a time-out has been reached. The sender
usually sends the length of the data with each response. However, there are some occasions
when the sender does not know the length of the data. This is the case when a document is
created dynamically or actively. In these cases, the server informs the client that the length is
not known and closes the connection after sending the data, so the client knows that the end of
the data has been reached. Time and resources are saved using persistent connections. Only
one set of buffers and variables needs to be set for the connection at each site. The round trip
time for connection establishment and connection termination is saved.

Example 26.4
Figure 26.4 shows the same scenario as in Example 26.3, but using a persistent connection.
Only one connection establishment and connection termination are used, but the request for the
image is sent separately.
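The difference between Examples 26.3 and 26.4 can be tallied with a small counting model. This is a rough sketch, not a measurement: it assumes all objects are on the same server and uses the count given in these notes of three handshake messages to open and three to close each TCP connection. The function name connection_cost is hypothetical.

```python
def connection_cost(num_objects, persistent):
    """Count TCP connections and handshake messages for objects on one server."""
    # Persistent: one connection carries every request.
    # Nonpersistent: each object needs its own connection.
    connections = 1 if persistent else num_objects
    # Per the notes: three messages to establish, three to terminate.
    handshake_messages = connections * (3 + 3)
    return connections, handshake_messages

# The text file plus one image, as in Examples 26.3 and 26.4.
print(connection_cost(2, persistent=False))  # (2, 12)
print(connection_cost(2, persistent=True))   # (1, 6)
# A page with 20 objects shows how the overhead grows.
print(connection_cost(20, persistent=False))  # (20, 120)
```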
Message Formats
The HTTP protocol defines the format of the request and response messages, as shown in
Figure 26.5. Each message is made of four sections. The first section in the request message is
called the request line; the first section in the response message is called the status line. The
other three sections have the same names in the request and response messages. However, the
similarities between these sections are only in the names; they may have different contents.

Request Message
The first line in a request message is called a request line. There are three fields in this line
separated by one space and terminated by two characters (carriage return and line feed) as
shown in Figure 26.5. The fields are called method, URL, and version.
The method field defines the request types. In version 1.1 of HTTP, several methods are
defined, as shown in Table 26.1. Most of the time, the client uses the GET method to send a
request. In this case, the body of the message is empty. The HEAD method is used when the
client needs only some information about the web page from the server, such as the last time it
was modified. It can also be used to test the validity of a URL. The response message in this
case has only the header section; the body section is empty. The PUT method is the inverse of
the GET method; it allows the client to post a new web page on the server (if permitted). The
POST method is similar to the PUT method, but it is used to send some information to the
server to be added to the web page or to modify the web page. The TRACE method is used for
debugging; the client asks the server to echo back the request to check whether the server is
getting the requests. The DELETE method allows the client to delete a web page on the server
if the client has permission to do so. The CONNECT method was originally made as a reserve
method; it may be used by proxy servers. Finally, the OPTIONS method allows the client to
ask about the properties of a web page.

The second field, URL, defines the address and name of the corresponding web page. The
third field, version, gives the version of the protocol; the version described in these notes is HTTP 1.1.

After the request line, there can be zero or more request header lines. Each header line sends additional
information from the client to the server. For example, the client can request that the document
be sent in a special format. Each header line has a header name, a colon, a space, and a header
value (see Figure 26.5). Table 26.2 shows some header names commonly used in a request.
The value field defines the values associated with each header name. The list of values can be
found in the corresponding RFCs. The body can be present in a request message. Usually, it
contains the comment to be sent or the file to be published on the website when the method is
PUT or POST.
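The layout of a request message (request line, header lines, blank line, optional body) can be sketched as a string builder. The helper build_request is an assumption for illustration; the path /usr/bin/image1 and the Accept headers follow Example 26.5 later in these notes.

```python
CRLF = "\r\n"  # carriage return and line feed end every line

def build_request(method, url, headers=(), body=""):
    """Assemble an HTTP/1.1 request message as text."""
    # Request line: method, URL, and version separated by single spaces.
    lines = [f"{method} {url} HTTP/1.1"]
    # Each header line: name, colon, space, value.
    for name, value in headers:
        lines.append(f"{name}: {value}")
    # A blank line separates the headers from the (possibly empty) body.
    return CRLF.join(lines) + CRLF + CRLF + body

message = build_request(
    "GET",
    "/usr/bin/image1",
    headers=[("Accept", "image/gif"), ("Accept", "image/jpeg")],
)
print(repr(message))
```

Headers are passed as a list of pairs rather than a dictionary so that the same header name can appear on more than one line.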
Response Message
The format of the response message is also shown in Figure 26.5. A response message consists
of a status line, header lines, a blank line, and sometimes a body. The first line in a response
message is called the status line. There are three fields in this line separated by spaces and
terminated by a carriage return and line feed. The first field defines the version of HTTP
protocol, currently 1.1. The status code field defines the status of the request. It consists of
three digits. Whereas the codes in the 100 range are only informational, the codes in the 200
range indicate a successful request. The codes in the 300 range redirect the client to another
URL, and the codes in the 400 range indicate an error at the client site. Finally, the codes in the
500 range indicate an error at the server site. The status phrase explains the status code in text
form.
After the status line, there can be zero or more response header lines. Each header line
sends additional information from the server to the client. For example, the sender can send
extra information about the document. Each header line has a header name, a colon, a space,
and a header value. Table 26.3 shows some header names commonly used in a response
message.

The body contains the document to be sent from the server to the client. The body is present
unless the response is an error message.
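The status line and the meaning of the code ranges can be sketched as follows. The function names parse_status_line and status_class are hypothetical; the five ranges are the ones listed above.

```python
def parse_status_line(line):
    """Split a status line into (version, status code, status phrase)."""
    # Split on the first two spaces only: the phrase may contain spaces.
    version, code, phrase = line.rstrip("\r\n").split(" ", 2)
    return version, int(code), phrase

def status_class(code):
    """Map a three-digit status code to the range it belongs to."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return classes.get(code // 100, "unknown")

version, code, phrase = parse_status_line("HTTP/1.1 404 Not Found\r\n")
print(version, code, phrase, "->", status_class(code))
```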
Example 26.5
This example retrieves a document (see Figure 26.6). It uses the GET method to retrieve an
image with the path /usr/bin/image1. The request line shows the method (GET), the URL, and
the HTTP version (1.1). The header has two lines that show that the client can accept images
in the GIF or JPEG format. The request does not have a body. The response message contains
the status line and four lines of header. The header lines define the date, server, content
encoding (MIME version, which will be described in electronic mail), and length of the
document. The body of the document follows the header.

Example 26.6
In this example, the client wants to send a web page to be posted on the server. It uses the PUT
method. The request line shows the method (PUT), URL, and HTTP version (1.1). There are
four lines of headers. The request body contains the web page to be posted. The response
message contains the status line and four lines of headers. The created document, which is a
CGI document, is included as the body (see Figure 26.7).
Conditional Request
A client can add a condition in its request. In this case, the server will send the requested web
page if the condition is met or inform the client otherwise. One of the most common conditions
imposed by the client is the time and date the web page is modified. The client can send the
header line If-Modified-Since with the request to tell the server that it needs the page only if it
is modified after a certain point in time.

Example 26.7
The following shows how a client imposes the modification date and time condition on a
request.
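The server-side decision behind If-Modified-Since can be sketched as below. This is an illustration under assumptions, not the messages from the original example: the function respond_conditionally and the dates are made up, and the sketch returns only the status line. The 304 Not Modified status is how an HTTP server tells the client that the page has not changed.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def respond_conditionally(if_modified_since, last_modified):
    """Return the status line a server might send for a conditional GET."""
    # Parse the date carried in the If-Modified-Since header line.
    threshold = parsedate_to_datetime(if_modified_since)
    if last_modified > threshold:
        return "HTTP/1.1 200 OK"           # page changed: send it in the body
    return "HTTP/1.1 304 Not Modified"     # page unchanged: empty body

# Hypothetical dates, formatted the way HTTP header lines carry them.
header_value = format_datetime(datetime(2024, 1, 1, tzinfo=timezone.utc), usegmt=True)
print("If-Modified-Since:", header_value)
print(respond_conditionally(header_value, datetime(2023, 12, 31, tzinfo=timezone.utc)))
print(respond_conditionally(header_value, datetime(2024, 1, 2, tzinfo=timezone.utc)))
```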

Cookies
The World Wide Web was originally designed as a stateless entity. A client sends a request; a
server responds. Their relationship is over. The original purpose of the Web, retrieving publicly
available documents, exactly fits this design. Today the Web has other functions that need to
remember some information about the clients; some are listed below:
❑ Websites are being used as electronic stores that allow users to browse through the
store, select wanted items, put them in an electronic cart, and pay at the end with a
credit card.
❑ Some websites need to allow access to registered clients only.
❑ Some websites are used as portals: the user selects the web pages he wants to see.
❑ Some websites are just advertising agencies.
For these purposes, the cookie mechanism was devised.

Creating and Storing Cookies


The creation and storing of cookies depend on the implementation; however, the principle
is the same.
1. When a server receives a request from a client, it stores information about the client
in a file or a string. The information may include the domain name of the client, the
contents of the cookie (information the server has gathered about the client such as
name, registration number, and so on), a timestamp, and other information depending
on the implementation.
2. The server includes the cookie in the response that it sends to the client.
3. When the client receives the response, the browser stores the cookie in the cookie
directory, which is sorted by the server domain name.
Using Cookies
When a client sends a request to a server, the browser looks in the cookie directory to see if it
can find a cookie sent by that server. If found, the cookie is included in the request. When the
server receives the request, it knows that this is an old client, not a new one. Note that the
contents of the cookie are never read by the browser or disclosed to the user. It is a cookie made
by the server and eaten by the server. The following explains how a cookie is used:

❑ An electronic store (e-commerce) can use a cookie for its client shoppers. When a client
selects an item and inserts it in a cart, a cookie that contains information about the item, such
as its number and unit price, is sent to the browser. If the client selects a second item, the cookie
is updated with the new selection information, and so on. When the client finishes shopping
and wants to check out, the last cookie is retrieved and the total charge is calculated.

❑ The site that restricts access to registered clients only sends a cookie to the client when the
client registers for the first time. For any repeated access, only those clients that send the
appropriate cookie are allowed.

❑ A web portal uses the cookie in a similar way. When a user selects her favorite pages, a
cookie is made and sent. If the site is accessed again, the cookie is sent to the server to show
what the client is looking for.

❑ A cookie is also used by advertising agencies. An advertising agency can place banner ads
on some main website that is often visited by users. The advertising agency supplies only a
URL that gives the advertising agency’s address instead of the banner itself. When a user visits
the main website and clicks the icon of a corporation, a request is sent to the advertising agency.
The advertising agency sends the requested banner, but it also includes a cookie with the ID of
the user. The advertising agency has compiled the interests of the user and can sell this
information to other parties. This use of cookies has made them very controversial. Hopefully,
some new regulations will be devised to preserve the privacy of users.

Example 26.8
Figure 26.8 shows a scenario in which an electronic store can benefit from the use of cookies.
Assume a shopper wants to buy a toy from an electronic store named BestToys. The shopper
browser (client) sends a request to the BestToys server. The server creates an empty shopping
cart (a list) for the client and assigns an ID to the cart (for example, 12343). The server then
sends a response message, which contains the images of all toys available, with a link under
each toy that selects the toy if it is being clicked. This response message also includes the Set-
Cookie header line whose value is 12343. The client displays the images and stores the cookie
value in a file named BestToys. The cookie is not revealed to the shopper. Now the shopper
selects one of the toys and clicks on it. The client sends a request, but includes the ID 12343 in
the Cookie header line. Although the server may have been busy and forgotten about this
shopper, when it receives the request and checks the header, it finds the value 12343 as the
cookie. The server knows that the customer is not new; it searches for a shopping cart with ID
12343. The shopping cart (list) is opened and the selected toy is inserted in the list. The server
now sends another response to the shopper to tell her the total price and ask her to provide
payment. The shopper provides information about her credit card and sends a new request with
the ID 12343 as the cookie value. When the request arrives at the server, it again sees the ID
12343, and accepts the order and the payment and sends a confirmation in a response. Other
information about the client is stored in the server. If the shopper accesses the store sometime
in the future, the client sends the cookie again; the store retrieves the file and has all the
information about the client.
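The BestToys exchange can be simulated in a few lines. This is a simplified sketch: the class ToyStore, its handle method, and the toy name are assumptions for illustration, headers are modelled as plain dictionaries, and the cookie is the bare cart ID as in the example (real Set-Cookie header lines carry name=value pairs).

```python
import itertools

class ToyStore:
    """Toy model of a server that tracks shopping carts through a cookie."""

    def __init__(self):
        self.carts = {}                       # cart ID -> list of selected toys
        self._ids = itertools.count(12343)    # first ID matches the example

    def handle(self, request_headers, selected_toy=None):
        """Process one request; return (response headers, cart contents)."""
        cart_id = request_headers.get("Cookie")
        response_headers = {}
        if cart_id not in self.carts:
            # New client: create an empty cart and hand out its ID as a cookie.
            cart_id = str(next(self._ids))
            self.carts[cart_id] = []
            response_headers["Set-Cookie"] = cart_id
        if selected_toy is not None:
            self.carts[cart_id].append(selected_toy)
        return response_headers, list(self.carts[cart_id])

store = ToyStore()
headers, cart = store.handle({})                        # first visit
print(headers, cart)
cookie = headers["Set-Cookie"]                          # browser stores the cookie
headers, cart = store.handle({"Cookie": cookie}, "robot")
print(headers, cart)
```

On the second request the server finds the cart from the cookie alone, even though nothing else in the request identifies the shopper.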

Web Caching: Proxy Servers


HTTP supports proxy servers. A proxy server is a computer that keeps copies of responses to
recent requests. The HTTP client sends a request to the proxy server. The proxy server checks
its cache. If the response is not stored in the cache, the proxy server sends the request to the
corresponding server. Incoming responses are sent to the proxy server and stored for future
requests from other clients. The proxy server reduces the load on the original server, decreases
traffic, and improves latency. However, to use the proxy server, the client must be configured
to access the proxy instead of the target server.
Note that the proxy server acts as both server and client. When it receives a request from a
client for which it has a response, it acts as a server and sends the response to the client. When
it receives a request from a client for which it does not have a response, it first acts as a client
and sends a request to the target server. When the response has been received, it acts again as
a server and sends the response to the client.

Proxy Server Location


The proxy servers are normally located at the client site. This means that there can be a hierarchy
of proxy servers, as shown below:
1. A client computer can also be used as a proxy server, in a small capacity, that stores
responses to requests often invoked by the client.
2. In a company, a proxy server may be installed on the computer LAN to reduce the
load going out of and coming into the LAN.
3. An ISP with many customers can install a proxy server to reduce the load going out
of and coming into the ISP network.

Example 26.9
Figure 26.9 shows an example of a use of a proxy server in a local network, such as the network
on a campus or in a company. The proxy server is installed in the local network. When an
HTTP request is created by any of the clients (browsers), the request is first directed to the
proxy server. If the proxy server already has the corresponding web page, it sends the response
to the client. Otherwise, the proxy server acts as a client and sends the request to the web server
in the Internet. When the response is returned, the proxy server makes a copy and stores it in
its cache before sending it to the requesting client.
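The behaviour in Example 26.9 can be sketched with a dictionary as the cache. The class ProxyServer, the stand-in origin_fetch function, and the URL are assumptions for illustration; no real network traffic is involved.

```python
class ProxyServer:
    """Toy caching proxy: acts as a server to clients, as a client to origins."""

    def __init__(self, fetch_from_origin):
        self.fetch_from_origin = fetch_from_origin  # callable: url -> response
        self.cache = {}                             # url -> stored response

    def get(self, url):
        if url in self.cache:
            # Cache hit: answer directly, acting as a server.
            return self.cache[url]
        # Cache miss: act as a client toward the target server.
        response = self.fetch_from_origin(url)
        self.cache[url] = response                  # keep a copy for later requests
        return response

origin_requests = []

def origin_fetch(url):
    """Stand-in for the real web server; records each request it receives."""
    origin_requests.append(url)
    return "contents of " + url

proxy = ProxyServer(origin_fetch)
proxy.get("http://www.example.com/page")   # first client: forwarded to the origin
proxy.get("http://www.example.com/page")   # second client: served from the cache
print(len(origin_requests))                # 1
```

The origin server sees one request even though two clients asked for the page, which is the load reduction described above.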

Cache Update
A very important question is how long a response should remain in the proxy server before
being deleted and replaced. Several different strategies are used for this purpose. One solution
is to store the list of sites whose information remains the same for a while. For example, a news
agency may change its news page every morning. This means that a proxy server can get the
news early in the morning and keep it until the next day.
Another recommendation is to add some headers to show the last modification time of the
information. The proxy server can then use the information in this header to guess how long
the information would be valid.
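One way a proxy might turn the last modification time into such a guess is sketched below. The rule used here (treat a response as valid for a fixed fraction of the time it had gone unmodified when it was fetched) and the value 0.1 are assumptions chosen for illustration, not something specified in these notes. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def heuristic_freshness(last_modified, fetched_at, fraction=0.1):
    """Guess a validity period from how long the page had been unchanged."""
    age_at_fetch = fetched_at - last_modified
    # Never return a negative period, even if the clocks disagree.
    return max(age_at_fetch * fraction, timedelta(0))

def is_fresh(last_modified, fetched_at, now, fraction=0.1):
    """True while the cached copy is still within its guessed validity period."""
    return now - fetched_at < heuristic_freshness(last_modified, fetched_at, fraction)

# Hypothetical timeline: page last changed 10 days before the proxy fetched it,
# so the guessed validity period is about one day.
modified = datetime(2024, 1, 1, tzinfo=timezone.utc)
fetched = datetime(2024, 1, 11, tzinfo=timezone.utc)
print(is_fresh(modified, fetched, fetched + timedelta(hours=12)))  # True
print(is_fresh(modified, fetched, fetched + timedelta(days=2)))    # False
```

A page that has not changed for a long time is assumed to stay valid longer than one that changed recently.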

HTTP Security
HTTP per se does not provide security. However, HTTP can be run over the Secure Sockets
Layer (SSL) or its successor, Transport Layer Security (TLS). In this case, HTTP is referred to as HTTPS. HTTPS provides confidentiality,
client and server authentication, and data integrity.
