Web Developer's Virtual Library: Encyclopedia of Web Design Tutorials, Articles and Discussions



HyperText Transfer Protocol

The innovations that Berners-Lee added to the Internet to create the World Wide Web had two fundamental dimensions: connectivity and interface. He invented a new protocol for the computers to speak as they exchanged hypermedia documents. This Hypertext Transfer Protocol (HTTP) made it very easy for any computer on the Internet to safely offer up its collection of documents into the greater whole; using HTTP, a computer that asked for a file from another computer would know, when it received the file, if it was a picture, a movie, or a spoken word. With this feature of HTTP, the Internet began to reflect an important truth - retrieving a file's data is almost useless unless you know what kind of data it is. In a sea of Web documents, it's impossible to know in advance what a document is - it could be almost anything - but the Web understands "data types" and passes that information along.
- Mark Pesce, "VRML - Browsing and Building Cyberspace", New Riders Publishing, 1995.


Although an understanding of HTTP is not strictly necessary for the development of CGI applications, some appreciation of "what's under the hood" will certainly help you to develop them with more fluency and confidence. As with any field of endeavour, a grasp of the fundamental underlying principles allows you to visualise the structures and processes involved in the CGI transactions between clients and servers - giving you a more comprehensive mental model on which to base your programming.

Beneath the user interface presented by browsers lie the network and the protocols that travel the wires to the servers, or "engines", that process requests and return the various media. The protocol of the Web is HTTP, the HyperText Transfer Protocol. HTTP is the underlying mechanism on which CGI operates, and it directly determines what you can and cannot send or receive via CGI.

Tim Berners-Lee implemented HTTP in 1990-91 at CERN, the European Laboratory for Particle Physics in Geneva, Switzerland. HTTP stands at the very core of the World Wide Web. According to the HTTP 1.0 specification: "The Hypertext Transfer Protocol (HTTP) is an application-level protocol with the lightness and speed necessary for distributed, collaborative, hypermedia information systems. It is a generic, stateless, object-oriented protocol which can be used for many tasks, such as name servers and distributed object management systems, through extension of its request methods (commands). A feature of HTTP is the typing and negotiation of data representation, allowing systems to be built independently of the data being transferred."

HTTP Properties

A comprehensive addressing scheme

HTTP uses the concept of reference provided by the Uniform Resource Identifier (URI), as a location (URL) or name (URN), to indicate the resource on which a method is to be applied. When an HTML hyperlink is composed, the URL (Uniform Resource Locator) is of the general form http://host:port-number/path/file.html. More generally, a URL reference is of the type service://host/file.file-extension, and in this way HTTP can subsume the more basic Internet services.
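The general URL form can be taken apart mechanically. The following is a minimal sketch using Python's standard urllib.parse module; the function name describe_url and the host www.example.com are assumptions made for illustration, not part of the original text.

```python
from urllib.parse import urlsplit

def describe_url(url):
    """Split a URL into the parts of the form service://host:port/path."""
    parts = urlsplit(url)
    return {
        "service": parts.scheme,
        "host": parts.hostname,
        # urlsplit reports no port when none is written; HTTP defaults to 80.
        "port": parts.port or (80 if parts.scheme == "http" else None),
        "path": parts.path,
    }

# Hypothetical example URL following the general form.
print(describe_url("http://www.example.com:8080/path/file.html"))
```

The same function accepts other services, such as ftp:// URLs, which is what lets a single addressing scheme cover the older Internet protocols.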

HTTP/1.0 is also used for communication between user agents and various gateways, allowing hypermedia access to existing Internet protocols like SMTP, NNTP, FTP, Gopher, and WAIS. HTTP/1.0 is designed to allow communication with such gateways, via proxy servers, without any loss of the data conveyed by those earlier protocols.

Client-Server Architecture

The HTTP protocol is based on a request/response paradigm. The communication generally takes place over a TCP/IP connection on the Internet. The default port is 80, but other ports can be used. This does not preclude the HTTP/1.0 protocol from being implemented on top of any other protocol on the Internet, so long as reliability can be guaranteed.

A requesting program (a client) establishes a connection with a receiving program (a server) and sends a request to the server in the form of a request method, URI, and protocol version, followed by a message containing request modifiers, client information, and possible body content. The server responds with a status line, including its protocol version and a success or error code, followed by a message containing server information, entity metainformation, and possible body content.
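To make the shape of a request concrete, here is a small sketch that splits a raw request into its request line (method, URI, protocol version), header fields, and body. The function name parse_request is an assumption for illustration, and the parser is deliberately simplified: it expects CRLF line endings and does not handle folded header lines.

```python
def parse_request(raw):
    """Split a raw HTTP/1.0 request into request line, headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, uri, version = lines[0].split(" ", 2)
    headers = []  # a list, because a header such as Accept may repeat
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return {"method": method, "uri": uri, "version": version,
            "headers": headers, "body": body}

raw = ("GET /Tutorial/HTTP/index.html HTTP/1.0\r\n"
       "User-Agent: CERN-LineMode/2.15 libwww/2.17b3\r\n"
       "From: Stars@WDVL.com\r\n"
       "\r\n")
request = parse_request(raw)
print(request["method"], request["uri"], request["version"])
```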

The HTTP protocol is connectionless

Although we have just said that the client establishes a connection with a server, the protocol is called connectionless because once the single request has been satisfied, the connection is dropped. Other protocols typically keep the connection open, e.g. in an FTP session you can move around in remote directories, and the server keeps track of who you are, and where you are.

While this greatly simplifies server construction and relieves the server of the performance penalties of session housekeeping, it makes tracking user behaviour, e.g. navigation paths between local documents, difficult without extra mechanisms. Many, if not most, web documents contain one or more inline images, each of which must be retrieved individually, incurring the overhead of repeated connections.

The HTTP protocol is stateless

After the server has responded to the client's request, the connection between client and server is dropped and forgotten. There is no "memory" between client connections. A pure HTTP server implementation treats every request as if it were brand-new, i.e. without context.

CGI applications get around this by encoding the state or a state identifier in hidden fields, the path information, or URLs in the form being returned to the browser. The first two methods return the state or its id when the form is submitted back by the user; the method of encoding state into hyperlinks (URLs) in the form only returns the state (or id) if the user clicks on the link and the link is back to the originating server.

It's often advisable to not encode the whole state but to save it, e.g. in a file, and identify it by means of a unique identifier, such as a sequential integer. Visitor counter programs can be adapted very nicely for this - and thereby become useful. You then only have to send the state identifier in the form, which is advisable if the state vector becomes large - saving network traffic. However you then have to take care of housekeeping the state files, e.g. by periodic clean-up tasks.
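The save-and-identify approach can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the storage directory, the JSON file format, the field name state_id, and all three function names are hypothetical, and the id allocation is not safe against two requests arriving at the same moment.

```python
import json
import os
import tempfile

# Hypothetical storage location; a real script would use a fixed directory.
STATE_DIR = tempfile.mkdtemp(prefix="cgi-state-")

def save_state(state):
    """Store the state in a file and return a sequential integer id."""
    used = (int(name.split(".")[0]) for name in os.listdir(STATE_DIR))
    state_id = max(used, default=0) + 1
    with open(os.path.join(STATE_DIR, f"{state_id}.json"), "w") as fh:
        json.dump(state, fh)
    return state_id

def load_state(state_id):
    """Read back the state saved under the id submitted with the form."""
    # int() rejects anything that is not a plain number, e.g. "../etc".
    with open(os.path.join(STATE_DIR, f"{int(state_id)}.json")) as fh:
        return json.load(fh)

def hidden_field(state_id):
    """HTML hidden field that carries the id back on form submission."""
    return f'<input type="hidden" name="state_id" value="{state_id}">'

state_id = save_state({"step": 2, "answers": ["yes", "no"]})
print(hidden_field(state_id))
```

Only the short identifier travels in the form; the periodic clean-up task mentioned above would delete old files from the state directory.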

An extensible and open representation for data types

HTTP uses Internet Media Types (formerly referred to as MIME Content-Types) to provide open and extensible data typing and type negotiation. For mail applications, where there is no type negotiation between sender and receiver, it's reasonable to put strict limits on the set of allowed media types. With HTTP, where the sender and recipient can communicate directly, applications are allowed more freedom in the use of non-registered types.

When the client sends a transaction to the server, headers are attached that conform to standard Internet e-mail specifications (RFC 822). Most client requests expect an answer either in plain text or HTML. When the HTTP server transmits information back to the client, it includes a MIME-like (Multipurpose Internet Mail Extensions) header to inform the client what kind of data follows the header. Translation then depends on the client possessing the appropriate utility (image viewer, movie player, etc.) corresponding to that data type.

HTTP Header Fields

An HTTP transaction consists of a header followed optionally by an empty line and some data. The header will specify such things as the action required of the server, or the type of data being returned, or a status code. The use of header fields sent in HTTP transactions gives the protocol great flexibility. These fields allow descriptive information to be sent in the transaction, enabling authentication, encryption, and/or user identification. The header is a block of data preceding the actual data, and is often referred to as meta information, because it is information about information.

The header lines received from the client, if any, are placed by the server into the CGI environment variables with the prefix HTTP_ followed by the header name. Any - characters in the header name are changed to _ characters. The server may exclude any headers which it has already processed, such as Authorization, Content-type, and Content-length. If necessary, the server may choose to exclude any or all of these headers if including them would exceed any system environment limits.

Two examples are the HTTP_ACCEPT and HTTP_USER_AGENT variables, derived from the Accept and User-Agent headers.

  • HTTP_ACCEPT

    The MIME types which the client will accept, as given by HTTP headers. Other protocols may need to get this information from elsewhere. Items in this list should be separated by commas, as per the HTTP spec.

    Format: type/subtype, type/subtype

  • HTTP_USER_AGENT

    The browser the client is using to send the request. General format: software/version library/version.
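The naming rule described above (prefix HTTP_, hyphens turned into underscores, already-processed headers left out) can be sketched in a few lines. The function names and the choice to join repeated headers with ", " are assumptions for illustration; a real server may differ in detail.

```python
# Headers the server has typically already processed itself.
EXCLUDED = {"authorization", "content-type", "content-length"}

def cgi_variable_name(header_name):
    """Map a header name such as User-Agent to HTTP_USER_AGENT."""
    return "HTTP_" + header_name.upper().replace("-", "_")

def headers_to_environ(headers):
    """Build CGI-style environment variables from (name, value) pairs."""
    environ = {}
    for name, value in headers:
        if name.lower() in EXCLUDED:
            continue
        key = cgi_variable_name(name)
        # Repeated headers (e.g. several Accept lines) become one list.
        environ[key] = environ[key] + ", " + value if key in environ else value
    return environ

print(headers_to_environ([
    ("Accept", "text/html"),
    ("Accept", "image/gif"),
    ("User-Agent", "Lynx/2.2 libwww/2.14"),
    ("Content-length", "150"),
]))
```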

The server sends back to the client:
  • A status code that indicates whether the request was successful or not. Typical error codes indicate that the requested file was not found, that the request was malformed, or that authentication is required to access the file.
  • The data itself. Since HTTP is liberal about sending documents of any format, it is ideal for transmitting multimedia such as graphics, audio, and video files. This complete freedom to transmit data of any format is one of the most significant advantages of HTTP and the Web.
  • Information about the object being returned, carried in header fields. Note that the following is not a complete list of header fields, and that some of them only make sense in one direction.

Content-Type

The Content-Type header field indicates the media type of the data sent to the recipient or, in the case of the HEAD method, the media type that would have been sent had the request been a GET. This field is used by browsers to know how to deal with the data. The client uses this information to determine how to handle a video file or an inline graphic. An example:
       Content-Type: text/html

Date

The Date header represents the date and time at which the message was originated. An example is
       Date: Tue, 15 Nov 1994 08:12:31 GMT

Expires

The Expires field gives the date after which the information in the document ceases to be valid. Caching clients, including proxies, must not cache this copy of the resource beyond the date given, unless its status has been updated by a later check of the origin server.
       Expires: Thu, 01 Dec 1994 16:00:00 GMT

From

The From header field, if given, should contain an Internet e-mail address for the human user who controls the requesting user agent. An example is:
       From: Stars@WDVL.com
This header field may be used for logging purposes and as a means for identifying the source of invalid or unwanted requests. It should not be used as an insecure form of access protection. The interpretation of this field is that the request is being performed on behalf of the person given, who accepts responsibility for the method performed. In particular, robot agents should include this header so that the person responsible for running the robot can be contacted if problems occur on the receiving end.

If-Modified-Since

The If-Modified-Since header field is used with the GET method to make it conditional: if the requested resource has not been modified since the time specified in this field, a copy of the resource will not be returned from the server; instead, a 304 (not modified) response will be returned without any data. An example of the field is:
       If-Modified-Since: Sat, 29 Oct 1994 19:43:31 GMT
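A server-side decision for the conditional GET might look like the sketch below. It uses email.utils.parsedate_to_datetime from the Python standard library to read the date; the function name conditional_get and its return shape (status code, whether to send a body) are assumptions made for this illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def conditional_get(last_modified, if_modified_since=None):
    """Return (status, send_body) for a GET, honouring If-Modified-Since."""
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable date: treat as an ordinary GET
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                return 304, False  # not modified: no data is returned
    return 200, True

# Resource last changed at the time used in the example header above.
changed = datetime(1994, 10, 29, 19, 43, 31, tzinfo=timezone.utc)
print(conditional_get(changed, "Sat, 29 Oct 1994 19:43:31 GMT"))
```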

Last-Modified

The Last-Modified header field indicates the date and time at which the sender believes the resource was last modified. The Last-Modified field is useful for clients that eliminate unnecessary transfers by using caching. The exact semantics of this field are defined in terms of how the receiver should interpret it: if the receiver has a copy of this resource which is older than the date given by the Last-Modified field, that copy should be considered stale. An example of its use is
       Last-Modified: Tue, 15 Nov 1994 12:45:26 GMT

Location

The Location response header field defines the exact location of the resource that was identified by the request URI. If the value is a full URL, the server returns a "redirect" to the client, which then retrieves the specified object directly.
       Location: http://WWW.Stars.com/Tutorial/HTTP/index.html
If you want to reference another file on your own server, output a partial URL instead:
       Location: /Tutorial/HTTP/index.html

The server will act as if the client had requested http://yourserver/Tutorial/HTTP/index.html rather than your script, and will take care of access control, determining the file's type, and so on. In this case the client does not perform the redirection; the server does it "on the fly". Important: only a full URL in the Location field can contain the #label part of a URL (the fragment), because the fragment is meaningful only on the client side and the server cannot act on it.

As an example of actual use, the "Ask Dr.Web" form has a Yes/No toggle after the question "Did you search the library and read the FAQ?". The default is No, so if the user doesn't reset this to Yes they will simply be redirected to the FAQ and their question will not be sent.

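# Redirect to the FAQ unless the visitor changed the toggle to Yes.
if ($input{'YN'} eq "No") {
    print "Location: http://WWW.Stars.com/Dr.Web/FAQ.html\n\n";
}
else {
    print "Content-type: text/html\n\n";
    &Feedback;    # site-specific routine that handles the question
}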

Referer

The Referer request header field allows the client to specify, for the server's benefit, the address (URI) of the resource from which the request URI was obtained. This allows a server to generate lists of back-links to resources for interest, logging, optimized caching, etc. It also allows obsolete or mistyped links to be traced for maintenance. Example:
       Referer: http://WWW.Stars.com/index.html
If a partial URI is given, it should be interpreted relative to the request URI. The URI must not include a fragment (#label within a document).

Server

The Server response header field contains information about the software used by the origin server to handle the request. The field can contain multiple product tokens and comments identifying the server and any significant subproducts. By convention, the product tokens are listed in order of their significance for identifying the application. Example:
       Server: CERN/3.0 libwww/2.17

User-Agent

The User-Agent field contains information about the user agent originating the request. This is for statistical purposes, the tracing of protocol violations, and automated recognition of user agents for the sake of tailoring responses to avoid particular user agent limitations - such as inability to support HTML tables. By convention, the product tokens are listed in order of their significance for identifying the application. Example:
       User-Agent: CERN-LineMode/2.15 libwww/2.17b3

HTTP Methods

HTTP/1.0 allows an open-ended set of methods to be used to indicate the purpose of a request. The three most often used methods are GET, HEAD, and POST.

The GET method

The GET method is used to ask for a specific document - when you click on a hyperlink, GET is being used. GET should probably be used when a URL access will not change the state of a database (by, for example, adding or deleting information) and POST should be used when an access will cause a change. The semantics of the GET method changes to a "conditional GET" if the request message includes an If-Modified-Since header field. A conditional GET method requests that the identified resource be transferred only if it has been modified since the date given by the If-Modified-Since header. The conditional GET method is intended to reduce network usage by allowing cached entities to be refreshed without requiring multiple requests or transferring unnecessary data.

The HEAD method

The HEAD method is used to ask only for information about a document, not for the document itself. HEAD is much faster than GET, as a much smaller amount of data is transferred. It's often used by clients that cache documents, to see whether a document has changed since it was last accessed. If it has not, the local copy can be reused; otherwise the updated version must be retrieved with a GET. The metainformation contained in the HTTP headers in response to a HEAD request should be identical to the information sent in response to a GET request. This method can be used for obtaining metainformation about the resource identified by the request URI without transferring the data itself. This method is often used for testing hypertext links for validity, accessibility, and recent modification.

The POST method

The POST method is used to transfer data from the client to the server; it's designed to allow a uniform method to cover functions like: annotation of existing resources; posting a message to a bulletin board, newsgroup, mailing list, or similar group of articles; providing a block of data (usually a form) to a data-handling process; extending a database through an append operation.

POST /cgi-bin/post-query HTTP/1.0
Accept: text/html
Accept: video/mpeg
Accept: image/gif
Accept: application/postscript
User-Agent:  Lynx/2.2  libwww/2.14
From:  Stars@WDVL.com
Content-type: application/x-www-form-urlencoded
Content-length: 150
     * a blank line *
org=CyberWeb%20SoftWare
&users=10000
&browsers=lynx
  • This is a "POST" query addressed to the program residing in the file at "/cgi-bin/post-query", which simply echoes the values it receives.

  • The client lists the MIME-types it is capable of accepting, and identifies itself and the version of the WWW library it is using.

  • Finally, it indicates the MIME-type it has used to encode the data it is sending, the number of characters included, and the list of variables and their values it has collected from the user.

  • MIME-type application/x-www-form-urlencoded means that the variable name-value pairs will be encoded the same way a URL is encoded. Any special characters, including punctuation, will be encoded as %nn, where nn is the character's ASCII code in hexadecimal; a space, for example, becomes %20.
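The encoding described in the last point can be reproduced with Python's standard urllib.parse module. This is a small sketch using the three fields from the example; passing quote_via=quote makes a space come out as %20 rather than the "+" that urlencode produces by default.

```python
from urllib.parse import parse_qsl, quote, urlencode

# The name-value pairs collected from the user, as in the example above.
fields = [("org", "CyberWeb SoftWare"), ("users", "10000"), ("browsers", "lynx")]

# Encode as application/x-www-form-urlencoded.
body = urlencode(fields, quote_via=quote)

# The Content-length header is the byte count of the encoded body.
content_length = len(body.encode("ascii"))

# The receiving program reverses the encoding to recover the pairs.
decoded = parse_qsl(body)

print(body)
print(decoded)
```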

HTTP Response

Here is an example of an HTTP response from a server to a client request:
HTTP/1.0 200 OK
Date: Thursday, 02-Feb-95 23:04:12 GMT
Server: NCSA/1.3
MIME-version: 1.0
Last-modified: Monday, 15-Nov-93 23:33:16 GMT
Content-type: text/html
Content-length: 2345
     * a blank line *
<HTML> ...
  • The server agrees to use HTTP version 1.0 for communication and sends the status 200 indicating it has successfully processed the client's request.

  • It then sends the date and identifies itself as an NCSA HTTP server.

  • It also indicates it is using MIME version 1.0 to describe the information it is sending, and includes the MIME-type of the information about to be sent in the "Content-type:" header.

  • Finally, it sends the number of characters it is going to send, followed by a blank line and the data itself.

  • Client and server headers are RFC 822 compliant mail headers. A Client may send any number of Accept: headers and the server is expected to convert the data into a form the client can accept.
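Reading such a response on the client side can be sketched as follows. The function name parse_response and the tiny sample body are assumptions for illustration, and the parser is simplified: it expects CRLF line endings, keeps one value per header name, and ignores folded header lines.

```python
def parse_response(raw):
    """Split a raw HTTP/1.0 response into status line, headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        # Split on the first colon only; date values contain colons too.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body

sample_body = "<HTML></HTML>"  # hypothetical stand-in for the document
raw = ("HTTP/1.0 200 OK\r\n"
       "Server: NCSA/1.3\r\n"
       "MIME-version: 1.0\r\n"
       "Last-modified: Monday, 15-Nov-93 23:33:16 GMT\r\n"
       "Content-type: text/html\r\n"
       f"Content-length: {len(sample_body)}\r\n"
       "\r\n" + sample_body)

version, code, reason, headers, body = parse_response(raw)
print(version, code, reason, headers["content-type"])
```

The client would use the content-type value to decide how to present the body, and the status code to decide whether there is a body worth presenting at all.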

The HyperText Transfer Protocol - Next Generation

The essential simplicity of HTTP has been a major factor in its rapid adoption, but this very simplicity has become its main drawback. The next generation of HTTP, dubbed "HTTP-NG", is intended as a replacement for HTTP 1.0 with much higher performance and some extra features needed for commercial applications. It's designed to make it easy to implement the basic functionality needed by all browsers, whilst making the addition of more powerful features such as security and authentication much simpler.

The current HTTP 1.0 often causes performance problems on the server side, and on the network, since it sets up a new connection for every request. Simon Spero has published a progress report on what the W3C calls "HTTP Next Generation", or HTTP-NG. HTTP-NG "divides up the connection [between client and server] into lots of different channels ... each object is returned over its own channel." HTTP-NG allows many different requests to be sent over a single connection. These requests are asynchronous - there's no need for the client to wait for a response before sending out a new request. The server can also respond to requests in any order it sees fit - it can even interweave the data from multiple objects, allowing several images to be transferred in "parallel".

To make these multiple data streams easy to work with, HTTP-NG sends all its messages and data using a "session layer". This divides the connection up into lots of different channels. HTTP-NG sends all control messages (GET requests, meta-information, etc.) over a control channel. Each object is returned over its own channel. This also makes redirection much more powerful - for example, if the object is a video, the server can return the meta-information over the same connection, together with a URL pointing to a dedicated video transfer protocol that will fetch the data for the relevant object. This becomes very important when working with multimedia-aware networking technologies, such as ATM or RSVP. The HTTP-NG protocol will permit complex data types such as video to redirect the URL to a video transfer protocol, and only then will the data be fetched for the client.


