World Wide Web and HTTP
ARCHITECTURE
The WWW is a distributed client-server service.
A client using a browser can access a service using a server.
The service provided is distributed over many locations called sites.
Each site holds one or more documents, referred to as web pages.
A web page can be simple or composite.
A simple web page has no link to other Web pages.
A composite web page has one or more links to other web pages.
Each web page is a file with a name and address.
Assume we need to retrieve a web page that contains text with pictures. Since the pictures are
not stored as separate files, the whole document is a simple web page. It can be retrieved using
one single request/response transaction, as shown in the figure.
Now assume we need to retrieve a document that contains one reference to another text file
and one reference to a large image.
The main document and the image are stored in two separate files in the same site (file A and
file B); the referenced text file is stored in another site (file C).
Since we are dealing with three different files, we need three transactions if we want to see the
whole document.
The first transaction (request/response) retrieves a copy of the main document (file A), which
has a reference (pointer) to the second and the third files.
When a copy of the main document is retrieved and browsed, the user can click on the
reference to the image to invoke the second transaction and retrieve a copy of the image (file
B).
If the user further needs to see the contents of the referenced text file, she can click on its
reference (pointer) invoking the third transaction and retrieving a copy of the file C.
Note that although files A and B both are stored in site I, they are independent files with
different names and addresses. Two transactions are needed to retrieve them.
In the browser (the web client), the controller receives input from the keyboard or the mouse
and uses a client protocol such as FTP, TELNET, or HTTP to access the document. After the
document has been accessed, the controller uses one of the interpreters to display the document
on the screen.
Web Server
The web page is stored at the server.
Each time a client request arrives, the corresponding document is sent to the client.
To improve efficiency, servers normally store requested files in a cache in memory; memory is
faster to access than disk.
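The in-memory cache idea can be illustrated with a minimal Python sketch. The function name load_document and the cache size of 128 are assumptions made for illustration; a real server would also need to notice when a file changes on disk.

```python
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=128)  # keep at most 128 documents in memory (arbitrary limit)
def load_document(path: str) -> bytes:
    """Read a document from disk once; later calls for the same path hit the cache."""
    return Path(path).read_bytes()
```

The first request for a path reads the disk; repeated requests for the same path are answered from memory.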
HTTP Transaction
The figure illustrates the HTTP transaction between the client and server.
Although HTTP uses the services of TCP, HTTP itself is a stateless protocol, which means that
the server does not keep information about the client.
The client initializes the transaction by sending a request. The server replies by sending a
response.
Request Message
A request message consists of a request line, a header, and sometimes a body.
Request Line
The first line in a request message is called the request line. It has three fields, called
method, URL, and version, separated by a space character, as shown in the figure. Two
characters, a carriage return followed by a line feed, terminate the line. The method field
defines the request type. In version 1.1 of HTTP, several methods are defined, as shown in the
table.
The second field, URL, defines the address and name of the corresponding web page.
The third field, version, gives the version of the protocol; the most recent version is HTTP/3
(2022).
In each header line, the value field defines the values associated with the header name. The list
of values can be found in the corresponding RFCs.
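As a rough illustration of the layout just described, the following Python sketch assembles a request message from a request line, header lines, and an optional body. The function name build_request, the path /index.html, and the host www.example.com are assumptions chosen for the example.

```python
CRLF = "\r\n"  # carriage return + line feed terminates every line


def build_request(method, url, headers, body="", version="HTTP/1.1"):
    """Assemble request line, header lines, a blank line, and an optional body."""
    request_line = f"{method} {url} {version}"  # three fields separated by spaces
    header_lines = [f"{name}: {value}" for name, value in headers]
    # The empty string produces the blank line that ends the header section.
    return CRLF.join([request_line, *header_lines, "", body])


message = build_request("GET", "/index.html", [("Host", "www.example.com")])
```

A request without a body simply ends after the blank line.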
Status Line
The first line in a response message is called the status line.
There are three fields in this line separated by spaces and terminated by a carriage return and
line feed.
The first field defines the version of the HTTP protocol, for example HTTP/1.1.
The status code field defines the status of the request. It consists of three digits.
Codes in the 100 range are only informational.
Codes in the 200 range indicate a successful request.
Codes in the 300 range redirect the client to another URL.
Codes in the 400 range indicate an error at the client site.
Codes in the 500 range indicate an error at the server site.
The status phrase explains the status code in text form.
The possible values for the status code and status phrase are shown in the table.
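The three fields of the status line and the five code ranges can be sketched in Python as follows. The names parse_status_line and STATUS_CLASSES are assumptions for illustration, and the sketch expects a status phrase to be present.

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_status_line(line):
    """Split a status line into version, numeric code, phrase, and code class."""
    # Split on the first two spaces only; the phrase itself may contain spaces.
    version, code, phrase = line.rstrip("\r\n").split(" ", 2)
    code = int(code)
    return version, code, phrase, STATUS_CLASSES.get(code // 100, "unknown")
```

For example, parse_status_line("HTTP/1.1 404 Not Found\r\n") yields the version, the code 404, the phrase, and the class "client error".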
Header Lines in Response Message
After the status line, we can have zero or more response header lines.
Each header line sends additional information from the server to the client. For example, the
sender can send extra information about the document.
Each header line has a header name, a colon, a space, and a header value.
The table shows some header names commonly used in a response message.
Body
The body contains the document to be sent from the server to the client.
The body is usually present; some responses, such as 304 (Not Modified), carry only the status
line and headers.
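To make the structure of a response concrete, here is a small Python sketch that separates a raw response into status line, header lines, and body. The function name parse_response is an assumption, and repeated header names are not handled (a later line with the same name overwrites the earlier one).

```python
def parse_response(raw: bytes):
    """Split a raw response into (status line, header dict, body bytes)."""
    # The first blank line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line, header_lines = lines[0], lines[1:]
    headers = {}
    for line in header_lines:
        # Each header line is: name, colon, space, value.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body
```

Header names are lowercased here because they are case-insensitive in HTTP.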
Example 1
This example retrieves a document.
We use the GET method to retrieve an image with the path /usr/bin/image1.
The request line shows the method (GET), the URL, and the HTTP version (1.1).
The header has two lines that show that the client can accept images in the GIF or JPEG format.
The request does not have a body.
The response message contains the status line and four lines of header.
The header lines define the date, server, MIME version, and length of the document.
The body of the document follows the header.
Example 2
In this example, the client wants to send data to the server.
We use the POST method.
The request line shows the method (POST), URL, and HTTP version (1.1).
There are four lines of headers.
The request body contains the input information.
The response message contains the status line and four lines of headers.
The created document, which is a CGI document, is included as the body.
Example 3
HTTP uses ASCII characters. The following figure shows how a client can directly connect to a
server using TELNET, which logs into port 80.
The first three lines show that the connection is successful.
We then type three lines. The first shows the request line (GET method), the second is the
header (defining the host), the third is a blank terminating the request.
The server response is seven lines starting with the status line.
The blank line at the end terminates the server response.
The file of 14,230 lines is received after the blank line (not shown here).
The last line is the output by the client.
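The same exchange can be approximated in Python with a plain TCP socket instead of TELNET. This is only a sketch: the function names build_get and fetch_raw are assumptions, and a Connection: close header (not part of the example above) is added so that the server ends the response by closing the connection.

```python
import socket


def build_get(host, path="/"):
    """Return the request: request line, Host header, and a terminating blank line."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"  # assumption: ask the server to close when done
        "\r\n"
    ).encode("ascii")


def fetch_raw(host, path="/", port=80, timeout=5.0):
    """Open a TCP connection to port 80, send the request, return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_get(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty read: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling fetch_raw with the name of a reachable web server returns the status line, the headers, the blank line, and the body as one byte string.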
Conditional Request
A client can add a condition in its request.
In this case, the server will send the requested Web page if the condition is met or inform the
client otherwise.
One of the most common conditions imposed by the client is the time and date the Web page is
modified.
The client can add the header line If-Modified-Since to the request to tell the server that it
needs the page only if it has been modified after a certain point in time.
Example:
The following shows how a client imposes the modification date and time condition on a
request.
The status line in the response shows that the file has not been modified after the defined
point in time.
The body of the response message is also empty.
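The server-side decision behind a conditional request can be sketched in Python as follows. The function name conditional_status is an assumption, and the sketch assumes the header value carries a GMT time zone so that it can be compared with a time-zone-aware modification time.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_status(if_modified_since, last_modified):
    """Return 200 if the page changed after the given date, otherwise 304."""
    threshold = parsedate_to_datetime(if_modified_since)  # parse the HTTP date
    if last_modified > threshold:
        return 200  # condition met: send the page in the body
    return 304  # Not Modified: the response has no body


status = conditional_status(
    "Thu, 01 Jan 2015 00:00:00 GMT",
    datetime(2014, 6, 1, tzinfo=timezone.utc),
)
```

Here the page was last changed before the date in the header, so status is 304.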
Persistence
HTTP, prior to version 1.1, specified a nonpersistent connection, while a persistent connection
is the default in version 1.1.
Nonpersistent Connection
In a nonpersistent connection, one TCP connection is made for each request/response.
The following lists the steps in this strategy:
1. The client opens a TCP connection and sends a request.
2. The server sends the response and closes the connection.
3. The client reads the data until it encounters an end-of-file marker; it then closes the
connection.
In this strategy, if a file contains links to N different pictures in different files (all located
on the same server), the connection must be opened and closed N + 1 times.
The nonpersistent strategy imposes high overhead on the server because the server needs N +
1 different buffers and requires a slow start procedure each time a connection is opened.
The figure shows an example of a nonpersistent connection.
The client needs to access a file that contains two links to images.
The text file and images are located on the same server.
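The N + 1 count can be checked with a few lines of Python. The function name connections_needed is an assumption; the persistent case assumes that all files are on the same server and that the connection does not time out between requests.

```python
def connections_needed(n_linked_files, persistent):
    """Count TCP connections needed for one main file plus its linked files."""
    requests = n_linked_files + 1  # the main file plus every linked file
    # Nonpersistent: one connection per request. Persistent: one for all.
    return 1 if persistent else requests


nonpersistent = connections_needed(2, persistent=False)
persistent = connections_needed(2, persistent=True)
```

For the file with two image links described above, the nonpersistent strategy needs 3 connections, while a persistent connection needs 1.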
Persistent Connection
HTTP version 1.1 specifies a persistent connection by default.
In a persistent connection, the server leaves the connection open for more requests after
sending a response.
The server can close the connection at the request of a client or if a time-out has been reached.
The sender usually sends the length of the data with each response.
However, there are some occasions when the sender does not know the length of the data. This
is the case when a document is created dynamically or actively. In these cases, the server
informs the client that the length is not known and closes the connection after sending the data
so the client knows that the end of the data has been reached.
The figure shows an example of a persistent connection.
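How a client finds the end of the data in the two cases can be sketched in Python. The function name read_body and the lowercase header key are assumptions; an in-memory stream stands in for the TCP connection, and other framing methods used by HTTP/1.1 are not covered.

```python
import io


def read_body(stream, headers):
    """Return (body, connection_reusable) for a response read from `stream`."""
    length = headers.get("content-length")
    if length is not None:
        # Length known: read exactly that many bytes; the connection stays open.
        return stream.read(int(length)), True
    # Length unknown: read until the server closes the connection (end of file).
    return stream.read(), False


body, reusable = read_body(io.BytesIO(b"helloNEXT"), {"content-length": "5"})
```

With a known length the client stops after 5 bytes and can send its next request on the same connection.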
Cookies
The World Wide Web was originally designed as a stateless entity.
A client sends a request; a server responds. Their relationship is over.
The original design of WWW exactly fits this purpose.
Today the web has other functions; some are listed below:
1. Websites are being used as electronic stores that allow users to browse through the
store, select wanted items, put them in an electronic cart, and pay at the end with a
credit card.
2. Some websites need to allow access to registered clients only.
3. Some websites are used as portals: The user selects the web pages he wants to see.
4. Some websites are just advertising.
For these purposes, the cookie mechanism was devised.
HTTP cookies (also called web cookies, Internet cookies, browser cookies, or simply cookies) are
small blocks of data created by a web server while a user is browsing a website and placed on
the user's computer or other device by the user's web browser. Cookies are placed on the
device used to access a website, and more than one cookie may be placed on a user's device
during a session.
Cookies serve useful and sometimes essential functions on the web. They enable web servers to
store stateful information (such as items added in the shopping cart in an online store) on the
user's device or to track the user's browsing activity (including clicking particular
buttons, logging in, or recording which pages were visited in the past). They can also be used to
save for subsequent use information that the user previously entered into form fields, such as
names, addresses, passwords, and payment card numbers.
Creating and Storing Cookies
The creation and storing of cookies depend on the implementation; however, the principle
is the same.
1. When a server receives a request from a client, it stores information about the client in a file
or a string. The information may include the domain name of the client, the contents of the
cookie (information the server has gathered about the client such as name, registration
number, and so on), a timestamp, and other information depending on the implementation.
2. The server includes the cookie in the response that it sends to the client.
3. When the client receives the response, the browser stores the cookie in the cookie directory,
which is sorted by the domain server name.
Using Cookies
When a client sends a request to a server, the browser looks in the cookie directory to see if it
can find a cookie sent by that server.
If found, the cookie is included in the request. When the server receives the request, it knows
that this is an old client, not a new one. Note that the browser does not interpret the contents
of the cookie; it only stores them and returns them to the server. It is a cookie made by the
server and eaten by the server.
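The browser side of this mechanism can be sketched as a small Python class. The name CookieDirectory, the domain www.example.com, and the single-cookie-per-domain simplification are assumptions for illustration; real browsers keep several cookies per domain along with attributes such as expiry.

```python
class CookieDirectory:
    """Toy cookie store keyed by server domain name."""

    def __init__(self):
        self._by_domain = {}

    def store(self, domain, set_cookie_value):
        """Save the value a server sent in its Set-Cookie header line."""
        self._by_domain[domain] = set_cookie_value

    def headers_for(self, domain):
        """Return the Cookie header to attach to a request, if one is stored."""
        value = self._by_domain.get(domain)
        return {"Cookie": value} if value is not None else {}


directory = CookieDirectory()
directory.store("www.example.com", "12343")
```

A later request to the same domain gets the Cookie header attached; a request to any other domain gets none.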
Now let us see how a cookie is used for the four previously mentioned purposes:
An electronic store (e-commerce) can use a cookie for its client shoppers. When a client selects
an item and inserts it into a cart, a cookie that contains information about the item, such as its
number and unit price, is sent to the browser. If the client selects a second item, the cookie is
updated with the new selection information. And so on. When the client finishes shopping and
wants to check out, the last cookie is retrieved and the total charge is calculated.
The site that restricts access to registered clients only sends a cookie to the client when the
client registers for the first time. For any repeated access, only those clients that send the
appropriate cookie are allowed.
A Web portal uses the cookie in a similar way. When a user selects her favorite pages, a cookie
is made and sent. If the site is accessed again, the cookie is sent to the server to show what the
client is looking for.
A cookie is also used by advertising agencies. An advertising agency can place banner ads on
some main website that is often visited by users. The advertising agency supplies only a URL
that gives the banner address instead of the banner itself. When a user visits the main website
and clicks the icon of an advertised corporation, a request is sent to the advertising agency. The
advertising agency sends the banner, a GIF file for example, but it also includes a cookie with
the ID of the user. Any future use of the banners adds to the database that profiles the Web
behavior of the user. The advertising agency has compiled the interests of the user and can sell
this information to other parties. This use of cookies has made them very controversial.
Hopefully, some new regulations will be devised to preserve the privacy of users.
Example
The figure shows a scenario in which an electronic store can benefit from the use of cookies.
Assume a shopper wants to buy a toy from an electronic store named BestToys. The shopper's
browser (client) sends a request to the BestToys server. The server creates an empty shopping
cart (a list) for the client and assigns an ID to the cart (for example, 12343). The server then
sends a response message, which contains the images of all toys available with a link under
each toy that selects the toy when it is clicked. This response message also includes the
Set-Cookie header line whose value is 12343. The client displays the images and stores the
cookie value in a file named BestToys. The cookie is not revealed to the shopper.
Now the shopper selects one of the toys and clicks on it. The client sends a request, but
includes the ID 12343 in the Cookie header line. Although the server may have been busy and
has forgotten about this shopper, when it receives the request and checks the header it finds
the value 12343 as the cookie. The server knows that the customer is not new, so it searches for
a shopping cart with ID 12343. The shopping cart (list) is opened and the selected toy is
inserted into the list. The server now sends another response to the shopper to tell her the
total price and ask her to provide payment. The shopper provides information about her credit
card and sends a new request with the ID 12343 as the cookie value. When the request arrives at
the server, it again sees the ID 12343, accepts the order and the payment, and sends a
confirmation in a response. Other information about the client, such as the credit card number,
name, and address, is stored on the server. If the shopper accesses the store sometime in the
future, the client sends the cookie again; the store retrieves the file and has all information
about the client.
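The server side of the BestToys scenario can be sketched in Python. The class name ToyStore, the toy names, and the prices are assumptions made for illustration; only the cart ID 12343 comes from the example above. Payment and confirmation are left out.

```python
import itertools


class ToyStore:
    """Toy server that keeps one shopping cart (a list) per cookie ID."""

    def __init__(self, prices, first_id=12343):
        self.prices = prices  # hypothetical price list: toy name -> price
        self.carts = {}  # cart ID -> list of selected toys
        self._ids = itertools.count(first_id)

    def handle(self, cookie=None, selected=None):
        """Process one request and report the Set-Cookie value and cart total."""
        set_cookie = None
        if cookie is None or cookie not in self.carts:
            # New shopper: create an empty cart and hand out its ID as a cookie.
            cookie = str(next(self._ids))
            self.carts[cookie] = []
            set_cookie = cookie
        if selected is not None:
            self.carts[cookie].append(selected)
        total = sum(self.prices[toy] for toy in self.carts[cookie])
        return {"set_cookie": set_cookie, "total": total}


store = ToyStore({"robot": 20, "kite": 5})
first = store.handle()  # no cookie yet: the store creates cart 12343
second = store.handle(cookie="12343", selected="robot")
```

The second request carries the cookie, so the store finds the existing cart instead of creating a new one.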
Web Caching: Proxy Server
HTTP supports proxy servers. A proxy server is a computer that keeps copies of responses to
recent requests.
The HTTP client sends a request to the proxy server. The proxy server checks its cache. If the
response is not stored in the cache, the proxy server sends the request to the corresponding
server. Incoming responses are sent to the
proxy server and stored for future requests from other clients.
The proxy server reduces the load on the original server, decreases traffic, and improves
latency. However, to use the proxy server, the client must be configured to access the proxy
instead of the target server.
Note that the proxy server acts both as a server and client. When it receives a request from a
client for which it has a response, it acts as a server and sends the response to the client. When
it receives a request from a client for which it does not have a response, it first acts as a client
and sends a request to the target server. When the response has been received, it acts again as
a server and sends the response to the client.
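The two roles of the proxy can be seen in a minimal Python sketch. The class name CachingProxy and the fetch_from_origin callable (standing in for a real request to the target server) are assumptions; entries are never expired here.

```python
class CachingProxy:
    """Toy proxy that remembers responses by URL."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin  # callable: url -> response
        self._cache = {}

    def handle(self, url):
        """Answer from the cache if possible, otherwise ask the target server."""
        if url in self._cache:
            return self._cache[url]  # server role: reply directly
        response = self._fetch(url)  # client role: forward the request
        self._cache[url] = response  # keep a copy for future requests
        return response
```

Only the first request for a URL reaches the target server; later requests for it are answered by the proxy.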
Cache Update
A very important question is how long a response should remain in the proxy server before
being deleted and replaced. Several different strategies are used for this purpose.
One solution is to store the list of sites whose information remains the same for a while. For
example, a news agency may change its news page every morning. This means that a proxy
server can get the news early in the morning and keep it until the next day. Another
recommendation is to add some headers to show the last modification time of
the information. The proxy server can then use the information in this header to guess how
long the information would be valid. There are more recommendations for web caching, but we
leave them to more specific books on this subject.
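One way a proxy might guess validity from the last modification time is sketched below in Python. The rule used here, treating a response as fresh for a fraction (10%) of the time since it was last modified, is a common heuristic and an assumption of this sketch, not something the text above prescribes; the function names are also assumptions.

```python
from datetime import datetime, timedelta, timezone


def heuristic_lifetime(fetched_at, last_modified, fraction=0.1):
    """Guess how long a response stays valid from how long it was unchanged."""
    unchanged_for = max(fetched_at - last_modified, timedelta(0))
    return unchanged_for * fraction


def is_fresh(now, fetched_at, last_modified, fraction=0.1):
    """True while the cached copy is younger than its guessed lifetime."""
    age = now - fetched_at
    return age < heuristic_lifetime(fetched_at, last_modified, fraction)
```

A page that had not changed for ten days when it was fetched would be kept for about one day before the proxy asks the target server again.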
HTTP Security
HTTP per se does not provide security.
HTTP can be run over the Secure Sockets Layer (SSL) or its successor, Transport Layer Security
(TLS).
In this case, HTTP is referred to as HTTPS. HTTPS provides confidentiality, client and server
authentication, and data integrity.
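A client-side setup for HTTPS can be sketched with the Python standard library. The function names make_tls_context and open_https are assumptions. The default context checks the server's certificate and host name, so this sketch covers server authentication only; client authentication would need a client certificate.

```python
import http.client
import ssl


def make_tls_context():
    """Create a TLS context that verifies the server certificate and host name."""
    return ssl.create_default_context()


def open_https(host, port=443, timeout=5.0):
    """Prepare an HTTPS connection; nothing is sent until a request is made."""
    return http.client.HTTPSConnection(
        host, port, timeout=timeout, context=make_tls_context()
    )
```

After calling request("GET", "/") on the returned object, the usual HTTP request and response travel inside the encrypted channel.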