Web Foundations (URI & HTTP)

Web Architecture (INFO 290-03)

Erik Wilde, UC Berkeley School of Information
2007-09-06
Creative Commons License

This work is licensed under a CC Attribution 3.0 Unported License

Abstract

The Web assumes an underlying network infrastructure providing a reliable, connection-oriented, flow-controlled, end-to-end transport service. Based on such a network service (today provided by the Internet), the Web's transport protocol moves representations of resources identified by a Uniform Resource Identifier (URI) between Web servers and clients. The most important protocol for data transfer on the Web is the Hypertext Transfer Protocol (HTTP).

Web Server Service

Outline (Uniform Resource Identifier (URI))

  1. Uniform Resource Identifier (URI) [5]
  2. Hypertext Transfer Protocol (HTTP) [12]
    1. HTTP Basics [7]
    2. Content Negotiation [3]
  3. Proxies [3]
  4. Conclusions [1]

Resource Identification

URI Schemes

URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
[…] the URI syntax is a federated and extensible naming system wherein each scheme's specification may further restrict the syntax and semantics of identifiers using that scheme.

Uniform Resource Identifier (URI): Generic Syntax, RFC 3986, January 2005
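The generic syntax above can be explored with a small sketch using Python's standard `urllib.parse.urlsplit`. The example URI and the helper name `uri_components` are hypothetical and chosen only for illustration; the hier-part shows up here as the authority plus the path.

```python
from urllib.parse import urlsplit

def uri_components(uri: str) -> dict:
    """Split a URI into the components named by the RFC 3986 generic syntax."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,   # first part of the hier-part
        "path": parts.path,          # second part of the hier-part
        "query": parts.query,
        "fragment": parts.fragment,
    }

# Hypothetical example URI (not taken from the slides).
example = uri_components("http://www.example.com/courses/web?term=fall#outline")
for name, value in example.items():
    print(f"{name:10} {value}")
```

Scheme-specific rules (for example, what an `http` authority may look like) are layered on top of this generic split, which is the "federated" aspect the RFC describes.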

Query Information

The query component contains non-hierarchical data that, along with data in the path component […], serves to identify a resource within the scope of the URI's scheme and naming authority […].

Uniform Resource Identifier (URI): Generic Syntax, RFC 3986, January 2005
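As a rough illustration of non-hierarchical query data, the sketch below extracts the query component and parses it into name/value pairs. The `name=value&name=value` layout is a widespread convention (it comes from HTML forms), not something RFC 3986 mandates, and the URI and the helper name `query_parameters` are assumptions made for this example.

```python
from urllib.parse import urlsplit, parse_qs

def query_parameters(uri: str) -> dict:
    """Return the query component parsed into a name -> list-of-values mapping.

    Assumes the common name=value&name=value convention for the query string.
    """
    return parse_qs(urlsplit(uri).query)

# Hypothetical example URI (not taken from the slides).
params = query_parameters("http://www.example.com/search?q=uri&lang=en&lang=de")
print(params)
```

Because the query is part of the identifier, two URIs that differ only in their query component are different URIs.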

Processing URIs

Resources vs. Representations

Outline (Hypertext Transfer Protocol (HTTP))

The Web's Protocol

Figure: internet-traffic-trends.png (provided by CacheLogic Inc.)

DNS & HTTP

The two basic protocols that every Web browser needs are DNS and HTTP. However, most operating systems provide an API for DNS access, so the browser can use this service locally and only has to implement HTTP itself. TCP, which is required as the foundation for HTTP, is also usually provided by the operating system.
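The division of labor described here can be sketched in a few lines: name resolution goes through the operating system's resolver API, and HTTP runs over a TCP connection that the operating system also provides. To keep the sketch runnable without Internet access, it talks to a tiny stand-in server on the local machine; the handler, the function names, and the use of `localhost` are assumptions for illustration only.

```python
import http.client
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Stand-in Web server so the sketch runs locally."""
    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging

def resolve(host: str, port: int) -> str:
    """Step 1: ask the operating system's resolver for an IPv4 address."""
    infos = socket.getaddrinfo(host, port, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    return infos[0][4][0]  # address part of the first sockaddr

def fetch(host: str, port: int, path: str = "/"):
    """Step 2: speak HTTP over a TCP connection to the resolved address."""
    address = resolve(host, port)
    conn = http.client.HTTPConnection(address, port, timeout=5)
    try:
        # The Host header still carries the name, not the resolved address.
        conn.request("GET", path, headers={"Host": f"{host}:{port}"})
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()

server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
status, body = fetch("localhost", server.server_address[1])
server.shutdown()
server.server_close()
print(status, body)
```

A real browser follows the same two steps, only against a remote host name and usually port 80.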

browser-dns-http.png

Outline (HTTP Basics)

HTTP Messages

start-line
message-header *

message-body ?
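The message layout above (a start-line, zero or more header fields, an empty line, and an optional body) can be taken apart with a short sketch. The function name `split_message` and the sample message are hypothetical, and the sketch deliberately ignores details such as folded header lines.

```python
def split_message(raw: bytes):
    """Split a raw HTTP/1.x message into start-line, header fields, and body."""
    # The first empty line (CRLF CRLF) separates the header section from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line, header_lines = lines[0], lines[1:]
    headers = []
    for line in header_lines:
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return start_line, headers, body

# Hypothetical sample message, written out by hand for illustration.
raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/plain\r\n"
       b"Content-Length: 5\r\n"
       b"\r\n"
       b"hello")
start_line, headers, body = split_message(raw)
print(start_line)
print(headers)
print(body)
```

The same split works for requests and responses, because both share this generic message format and differ only in the start-line.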

HTTP Header Fields

HTTP Requests

Method Request-URI HTTP/Major.Minor
[Header]*

[Entity]?

HTTP GET

GET / HTTP/1.1
Host: ischool.berkeley.edu
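As a minimal sketch, the request shown above can be assembled byte for byte as follows. The helper name `build_get_request` is hypothetical; the key details are that each line ends in CRLF and that an empty line terminates the header section.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request with only the mandatory Host header."""
    request_line = f"GET {path} HTTP/1.1\r\n"
    headers = f"Host: {host}\r\n"
    # The trailing empty line signals the end of the headers; GET carries no body here.
    return (request_line + headers + "\r\n").encode("ascii")

request = build_get_request("ischool.berkeley.edu")
print(request.decode("ascii"))
```

Sending these bytes over a TCP connection to port 80 of the named host would be enough to obtain a response from an HTTP/1.1 server.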

HTTP Responses

HTTP/Major.Minor Status-Code Text
[Header]*

[Entity]?
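The first line of a response can be parsed along the lines of the pattern above. This is a simplified sketch: the function name `parse_status_line` is hypothetical, and it assumes the status text is present. The first digit of the status code gives its class (1xx informational, 2xx success, 3xx redirection, 4xx client error, 5xx server error).

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def parse_status_line(line: str):
    """Parse 'HTTP/Major.Minor Status-Code Text' into version, code, and text."""
    version, status_code, text = line.split(" ", 2)
    protocol, _, numbers = version.partition("/")
    if protocol != "HTTP":
        raise ValueError(f"not an HTTP status line: {line!r}")
    major, minor = (int(n) for n in numbers.split("."))
    return (major, minor), int(status_code), text

version, code, text = parse_status_line("HTTP/1.1 404 Not Found")
print(version, code, text, STATUS_CLASSES[code // 100])
```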

HTTP Performance

HTTP Connection Handling

http-phttp-pipelining.png

Outline (Content Negotiation)

What is Content Negotiation?

Three Different Variants

Server Side Content Negotiation

Outline (Proxies)

Proxies

Browsers & Proxies

A proxy is configured in the browser (manually or automatically), so that the browser sends all requests to the proxy instead of the target Web server. The proxy then forwards each request towards the target server and relays the response back to the browser. Proxies can be chained, so that requests and responses travel through a number of HTTP systems.
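One visible consequence of this setup is the form of the request line: when talking to the origin server directly, an HTTP/1.1 client sends only the path, whereas a request sent to a proxy carries the absolute URI so that the proxy knows where to forward it. The sketch below illustrates the difference; the helper name `request_head` is hypothetical.

```python
from urllib.parse import urlsplit

def request_head(url: str, via_proxy: bool) -> str:
    """Build the request line and Host header for a GET, direct or via a proxy."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    if via_proxy:
        # Proxies receive the absolute URI (without any fragment).
        target = f"{parts.scheme}://{parts.netloc}{path}"
    else:
        # Origin servers receive only the path and query.
        target = path
    return f"GET {target} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"

direct = request_head("http://ischool.berkeley.edu/", via_proxy=False)
proxied = request_head("http://ischool.berkeley.edu/", via_proxy=True)
print(direct)
print(proxied)
```

In a chain of proxies, each intermediary but the last forwards the request in the absolute form to the next one.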

proxy.png

Firewalls

Outline (Conclusions)

Web Server Service