Web Foundations (URI & HTTP)

Web Architecture [./]
Fall 2008 — INFO 290-03 (CCN 42584)

Erik Wilde, UC Berkeley School of Information
2008-09-09

Creative Commons License [http://creativecommons.org/licenses/by/3.0/]

This work is licensed under a CC
Attribution 3.0 Unported License
[http://creativecommons.org/licenses/by/3.0/]


(2) Abstract

The Web assumes an underlying network infrastructure providing a reliable, connection-oriented, flow-controlled, end-to-end transport service. Based on such a network service (today provided by the Internet), the Web's transport protocol moves representations of resources identified by a Uniform Resource Identifier (URI) between Web servers and clients. The most important protocol for data transfer on the Web is the Hypertext Transfer Protocol (HTTP).




(3) Web Server Service



Uniform Resource Identifier (URI)

Outline (Uniform Resource Identifier (URI))

  1. Uniform Resource Identifier (URI) [5]
  2. Hypertext Transfer Protocol (HTTP) [12]
    1. HTTP Basics [7]
    2. Content Negotiation [3]
  3. Proxies [3]
  4. Conclusions [1]

(5) Resource Identification




(6) URI Schemes

URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
[…] the URI syntax is a federated and extensible naming system wherein each scheme's specification may further restrict the syntax and semantics of identifiers using that scheme.

Uniform Resource Identifier (URI): Generic Syntax, RFC 3986, January 2005 [http://dret.net/rfc-index/reference/RFC3986]
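As a rough illustration of the generic syntax, the following sketch splits a URI into the components named in the grammar, using Python's standard urllib.parse module. The path, query, and fragment of the sample URI are made up for illustration; only the host and port appear elsewhere in these slides.

```python
from urllib.parse import urlsplit


def uri_components(uri):
    """Return the generic-syntax components of a URI as a dict."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,  # first half of hier-part
        "path": parts.path,         # second half of hier-part
        "query": parts.query,
        "fragment": parts.fragment,
    }


if __name__ == "__main__":
    # Hypothetical URI: only host and port are taken from the slides.
    sample = "http://rosetta.sims.berkeley.edu:8085/docs/index?lang=en#top"
    for name, value in uri_components(sample).items():
        print(f"{name:10} {value}")
```

Scheme-specific rules (for example, what an http authority may look like) are not checked here; the sketch only covers the scheme-independent split.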




(7) Query Information

The query component contains non-hierarchical data that, along with data in the path component […], serves to identify a resource within the scope of the URI's scheme and naming authority […].

Uniform Resource Identifier (URI): Generic Syntax, RFC 3986, January 2005 [http://dret.net/rfc-index/reference/RFC3986]
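Query components are very often structured as name=value pairs, although RFC 3986 itself does not require this. A minimal sketch, assuming that common form-style encoding and using a hypothetical example.org URI, shows how such a query can be decoded and built with urllib.parse:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def query_pairs(uri):
    """Decode the query component into a list of (name, value) pairs."""
    return parse_qsl(urlsplit(uri).query, keep_blank_values=True)


def with_query(base, pairs):
    """Append an encoded query component to a URI that has none yet."""
    return base + "?" + urlencode(pairs)


if __name__ == "__main__":
    # example.org and the parameter names are placeholders.
    print(query_pairs("http://example.org/search?q=web+architecture&lang=en"))
    print(with_query("http://example.org/search", [("q", "web architecture")]))
```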




(8) Processing URIs




(9) Resources vs. Representations



Hypertext Transfer Protocol (HTTP)


(11) The Web's Protocol

Figure: internet-traffic-trends.png (provided by CacheLogic Inc. [http://www.cachelogic.com/])




(12) DNS & HTTP

The two basic protocols which every Web browser must implement are DNS access and HTTP. However, most operating systems provide an API for DNS access, so the browser can use this service locally and only has to implement HTTP. TCP (which is required as the foundation for HTTP) is usually provided by the operating system.

browser-dns-http.png
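The division of labor described above can be sketched in a few lines: the program asks the operating system's resolver for addresses and leaves the DNS protocol itself to the OS. The function name resolve is an illustrative choice, not part of any standard API.

```python
import socket


def resolve(host, port=80):
    """Ask the OS resolver for the TCP endpoints of a host name."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    endpoints = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        endpoint = (sockaddr[0], sockaddr[1])  # (IP address, port)
        if endpoint not in endpoints:
            endpoints.append(endpoint)
    return endpoints


if __name__ == "__main__":
    # A numeric address needs no DNS lookup and is returned as-is.
    print(resolve("127.0.0.1", 8085))
```

Each returned endpoint could then be handed to socket.create_connection, which is the point where the OS-provided TCP implementation takes over.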

HTTP Basics


(14) HTTP Messages

  • HTTP needs a reliable connection
    • the foundation for HTTP is the Transmission Control Protocol (TCP) [Internet Foundations; Transmission Control Protocol (TCP) (1)]
    • DNS resolution of the host name in the URI yields an IP address
    • the client opens a TCP connection to port 80, or to the port specified in the URI (e.g., http://rosetta.sims.berkeley.edu:8085/)
  • HTTP is a text-based protocol
    • the connection is used to transmit text messages
    • all HTTP messages are human-readable (not all entities, though)
    • basic HTTP operations can be carried out by hand
start-line
message-header *

message-body ?
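Because messages are plain text with this simple layout, splitting one into its three parts takes very little code. The following is a minimal sketch that ignores details such as folded header lines and message framing; parse_message is a name chosen for illustration.

```python
def parse_message(raw):
    """Split raw HTTP message bytes into (start_line, headers, body)."""
    # The first empty line separates the header section from the body.
    head, _sep, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = []
    for line in lines[1:]:
        name, _colon, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return start_line, headers, body


if __name__ == "__main__":
    raw = b"GET / HTTP/1.1\r\nHost: ischool.berkeley.edu\r\n\r\n"
    print(parse_message(raw))
```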



(15) HTTP Header Fields

  • Header fields contain information about the message
    • general header: Date gives the message origination date
    • request header: Accept-Language indicates language preferences
    • response header: Server contains information about the server software
    • entity header: Content-Type specifies the media type of the entity
  • HTTP defines a number of header fields [http://www.cs.tut.fi/~jkorpela/http.html]
    • unknown fields must be ignored (extensibility)
    • unstandardized fields should use an X- prefix (a convention later deprecated by RFC 6648)
  • HTTP is about acting on these fields
    • HTTP defines what HTTP implementations must or should do
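The extensibility rule (act on the fields you know, ignore the rest) can be sketched as a lookup table of handlers. The handler table and the X-Example field below are hypothetical; field names are compared case-insensitively, as HTTP requires.

```python
def parse_fields(lines):
    """Turn 'Name: value' lines into a dict keyed by lower-case name."""
    fields = {}
    for line in lines:
        name, colon, value = line.partition(":")
        if colon:
            fields[name.strip().lower()] = value.strip()
    return fields


def dispatch(fields, handlers):
    """Call the handler of every known field; return the ignored names."""
    ignored = []
    for name, value in fields.items():
        handler = handlers.get(name)
        if handler is None:
            ignored.append(name)  # unknown field: skip it, do not fail
        else:
            handler(value)
    return ignored


if __name__ == "__main__":
    handlers = {"content-type": lambda v: print("media type:", v)}
    fields = parse_fields(["Content-Type: text/html", "X-Example: 1"])
    print("ignored:", dispatch(fields, handlers))
```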



(16) HTTP Requests

  • After opening a connection, the client sends a request
    • the method indicates the action to be performed on the resource
    • HTTP's most interesting methods are: GET, HEAD, POST
    • other interesting methods are: PUT, DELETE
  • The URI identifies the resource to which the request should be applied
    • absolute URIs are required when contacting Proxies [Proxies (1)]
    • absolute paths are required when contacting a server directly
    • the URI may contain Query Information [Query Information (1)]
    • fragment identifiers are not sent (they are interpreted on the client side)
  • The Host header field must be included in every HTTP/1.1 request
Method Request-URI HTTP/Major.Minor
[Header]*

[Entity]?
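The rules above for the request URI can be put together in a small sketch: the fragment is dropped, a direct request carries the absolute path, a request through a proxy carries the absolute URI, and Host is always set. The function build_request and its via_proxy flag are illustrative names, and the path and query in the usage example are made up.

```python
from urllib.parse import urlsplit


def build_request(method, uri, via_proxy=False):
    """Build the text of a header-only HTTP/1.1 request for a URI."""
    parts = urlsplit(uri)
    # The fragment (parts.fragment) is never sent to the server.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    if via_proxy:
        # Proxies need the absolute URI to know where to forward to.
        target = f"{parts.scheme}://{parts.netloc}{target}"
    lines = [
        f"{method} {target} HTTP/1.1",
        f"Host: {parts.netloc}",
        "",  # empty line ends the header section
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    uri = "http://rosetta.sims.berkeley.edu:8085/a?x=1#frag"
    print(build_request("GET", uri))
    print(build_request("GET", uri, via_proxy=True))
```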



(17) HTTP GET

  • Retrieval action based on the URI
    • may be implemented by reading a file
    • may be implemented by processing a file (PHP)
    • may be implemented by invoking a process
  • Semantics may change based on header fields
    • If-*: only reply with the entity if necessary
    • Range: only reply with the requested part of the entity
  • Cacheability depends on header fields of the response
GET / HTTP/1.1
Host: ischool.berkeley.edu
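Since requests are plain text, a GET like the one above can be typed "by hand" into a TCP connection. The sketch below does this against a small local demo server so that it runs without network access; the server, its handler, and the response body are stand-ins invented for this example, not part of the lecture's setup.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Answers every GET with a tiny plain-text entity."""

    def do_GET(self):
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def start_demo_server():
    """Start the demo server on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def get_by_hand(host, port, path="/"):
    """Send a hand-written GET request and return the raw response bytes."""
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    chunks = []
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(request)
        while True:  # read until the server closes the connection
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


if __name__ == "__main__":
    server = start_demo_server()
    host, port = server.server_address
    print(get_by_hand(host, port).decode("iso-8859-1"))
    server.shutdown()
```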



(18) HTTP Responses

  • The server's response to interpreting a request
    • the status code is given numerically and as text
    • 2** for variations of success (200: ok)
    • 3** for redirections
    • 4** for client-side problems (404: not found)
    • 5** for server-side problems
  • Header fields specify additional information
    • information about the server
    • information about the entity (media type, encoding, language)
HTTP/Major.Minor Status-Code Text
[Header]*

[Entity]?
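A client usually branches on the first digit of the status code rather than on the text. As a small sketch, the following parses a status line and maps the code to its class; the 1** (informational) class is not listed on the slide but is part of HTTP/1.1, and the function names are illustrative.

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_status_line(line):
    """Split 'HTTP/x.y code text' into (version, code, text)."""
    version, code, *text = line.split(" ", 2)
    return version, int(code), text[0] if text else ""


def status_class(code):
    """Map a numeric status code to the name of its class."""
    return STATUS_CLASSES.get(code // 100, "unknown")


if __name__ == "__main__":
    version, code, text = parse_status_line("HTTP/1.1 404 Not Found")
    print(version, code, text, "->", status_class(code))
```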



(19) HTTP Performance

  • HTTP/1.0 allowed one transaction per connection
    • TCP connection setup and teardown are expensive
    • TCP's slow start slows down the initial phase of data transfer
    • typical Web pages use between 10 and 20 resources (HTML + images)
    • typically, these resources are stored on the same server
  • HTTP/1.1 introduces persistent connections
    • the TCP connection stays open for some time (10 sec is a popular choice)
    • additional requests to the same server use the same TCP connection
  • HTTP/1.1 introduces pipelined connections
    • instead of waiting for a response, requests can be queued
    • the server responds as fast as possible
    • responses must be sent in the order of the requests (there is no sequence number to match them up)
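Persistent connections can be observed with a small experiment: send two requests over one client connection and check on the server side that both arrived from the same client port, which means the same TCP connection. The local server below is a stand-in written for this sketch; it does not demonstrate pipelining, which Python's http.client does not perform.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class KeepAliveHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep connections open

    def do_GET(self):
        # Remember the client's TCP port for every request we see.
        self.server.client_ports.append(self.client_address[1])
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def start_server():
    """Start the demo server on a free local port in a background thread."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), KeepAliveHandler)
    server.client_ports = []
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_twice(server):
    """Issue two GETs on one connection; return the ports the server saw."""
    host, port = server.server_address
    conn = HTTPConnection(host, port)
    try:
        for path in ("/", "/again"):
            conn.request("GET", path)
            conn.getresponse().read()  # finish one response before the next
    finally:
        conn.close()
    return list(server.client_ports)


if __name__ == "__main__":
    server = start_server()
    print(fetch_twice(server))  # two identical port numbers
    server.shutdown()
```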



(20) HTTP Connection Handling

http-phttp-pipelining.png

Content Negotiation


(22) What is Content Negotiation?

  • Negotiation between two HTTP peers
    • resources may be available in different representations
    • possible dimensions are language, graphics format, character encoding, …
    • using one URI, it should be possible to get the best representation
  • Negotiation requires knowledge about the resource user
    • languages depend on humans reading pages
    • graphics formats depend on the browser's functionality
  • Content negotiation is a form of a Web-based service
    • clients request a URI and have some constraints
    • using these constraints, the best representation should be served
    • ideally, content negotiation should not be too expensive



(23) Three Different Variants

  • Server Side Content Negotiation
    • the server has a set of representations and information from the request
    • the server returns the best representation based on the request
  • Client Side Content Negotiation
    • the server responds with a list of different representations
    • the client (browser or user) makes a choice and sends a second request
  • Transparent Content Negotiation
    • Caches act as in client side negotiation and thus know the available representations
    • Clients contacting the cache can be served by the cache as in server side negotiation



(24) Server Side Content Negotiation

  • Clients usually tell something about themselves
    • Accept, Accept-Charset, Accept-Encoding, Accept-Language
    • the server also knows their IP address
    • the server may also use additional information (Cookies [State Management; Cookie (1)])
  • The server needs to find the best representation
    • most easily by matching the request with available representations
    • could also be implemented more dynamically by generating new representations
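Matching a request against the available representations can be sketched for one dimension, language. The following simplified matcher reads the q-values of an Accept-Language header and returns the most preferred language the server actually has; it only does exact and wildcard matches and is not a complete implementation of HTTP's matching rules. The function names and sample headers are illustrative.

```python
def parse_accept(header):
    """Parse an Accept-* header into (value, q) pairs, best first."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        value, *params = [part.strip() for part in item.split(";")]
        q = 1.0  # default quality when no q parameter is given
        for param in params:
            name, _eq, number = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(number)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs.append((value.lower(), q))
    # sorted() is stable, so equal q-values keep their header order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def choose(header, available, default=None):
    """Pick the best available variant for the given Accept-* header."""
    lookup = {name.lower(): name for name in available}
    for value, q in parse_accept(header):
        if q <= 0:
            continue  # q=0 means the client refuses this value
        if value == "*" and available:
            return available[0]
        if value in lookup:
            return lookup[value]
    return default


if __name__ == "__main__":
    print(choose("da, en-gb;q=0.8, en;q=0.7", ["en", "de"]))
```

A server that generates representations dynamically would replace the fixed available list with whatever it is able to produce for the request.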


Proxies


(26) Proxies




(27) Browsers & Proxies

A proxy is configured in the browser (manually or automatically), so that the browser sends all requests to the proxy instead of the target Web server. The proxy then forwards the request. Proxies can be chained, so that the requests and responses travel through a number of HTTP systems.

proxy.png
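Manual proxy configuration is not specific to browsers; most HTTP client libraries offer the same setting. As a sketch, Python's urllib.request can be told to send all http requests to a proxy. The proxy address below is a placeholder, and nothing is sent over the network until the opener is actually used.

```python
import urllib.request


def opener_via_proxy(proxy_url):
    """Build an opener that routes http requests through one proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


if __name__ == "__main__":
    # proxy.example:3128 is a hypothetical proxy, not a real service.
    opener = opener_via_proxy("http://proxy.example:3128")
    print(opener.handlers)
```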


(28) Firewalls



Conclusions


(30) Web Server Service


