Web Foundations (URIs & HTTP)

Web Architecture and Information Management [./]
Spring 2010 — INFO 190-02 (CCN 42509)

Erik Wilde and Ryan Shaw, UC Berkeley School of Information

Creative Commons License [http://creativecommons.org/licenses/by/3.0/]

This work is licensed under a CC Attribution 3.0 Unported License

Contents Erik Wilde and Ryan Shaw: Web Foundations (URIs & HTTP)



(2) Abstract

The Web's architecture rests on a few simple principles: a consistent, global identification mechanism for resources; a standardized way of retrieving resource representations; and standardized media types that make those representations usable. Built on the Internet, the Web's transport protocol transmits representations of resources identified by a Uniform Resource Identifier (URI) between Web servers and clients. The most important protocol for data transfer on the Web is the Hypertext Transfer Protocol (HTTP).


(3) Web Server Service

Uniform Resource Identifier (URI)

Outline (Uniform Resource Identifier (URI))

  1. Uniform Resource Identifier (URI) [7]
  2. Hypertext Transfer Protocol (HTTP) [14]
    1. HTTP Basics [7]
    2. HTTP Authentication [5]

(5) Resource Identification

Global naming leads to global network effects... the value of an identifier increases the more it is used consistently

Architecture of the World Wide Web, Volume One [http://www.w3.org/TR/webarch/]

(6) URIs & Resources

(7) URIs & Resources

(8) URI Schemes

URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
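
As an illustration of the generic syntax above, here is a minimal sketch that splits a URI into its components with Python's standard urllib.parse module. The function name uri_components is made up for this example, and the path, query, and fragment in the sample URI are invented; only the host and port appear elsewhere in these slides.

```python
from urllib.parse import urlsplit


def uri_components(uri: str) -> dict:
    """Split a URI into the parts named by the generic URI syntax."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        # For hierarchical schemes such as http, the hier-part is
        # "//" authority path; other schemes may have a path only.
        "authority": parts.netloc,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


# Hypothetical URI: path, query, and fragment are invented for illustration.
example = "http://rosetta.sims.berkeley.edu:8085/docs/index.html?lang=en#intro"
print(uri_components(example))
```

A non-hierarchical URI such as mailto:user@example.org yields an empty authority and the address as the path.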

(9) Resources & Representations

(10) 1 Resource, 2 Representations


(11) 2 Resources, 1 Representation


Hypertext Transfer Protocol (HTTP)


(13) DNS & HTTP

The two basic protocols every Web browser depends on are DNS [Internet Architecture; Domain Name System (DNS) (1)] and HTTP [Hypertext Transfer Protocol (HTTP) (1)]. Most operating systems provide an API for DNS access, so the browser can use this service locally and only has to implement HTTP itself. TCP [Internet Architecture; Transmission Control Protocol (TCP) (1)], which is required as the foundation for HTTP, is also usually provided by the operating system.
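
To make the division of labor concrete, here is a minimal sketch of a client asking the operating system's resolver for a host's addresses instead of speaking DNS itself. It uses Python's standard socket.getaddrinfo; the helper name resolve is an assumption of this example.

```python
import socket


def resolve(host: str, port: int = 80) -> list[str]:
    """Ask the operating system's resolver for the IP addresses of a host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the textual IP address.
    return sorted({info[4][0] for info in infos})
```

Calling resolve("ischool.berkeley.edu") would return whatever addresses the local resolver currently reports for that name; the result depends on the network the code runs on.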


(14) The Web's Protocol


provided by CacheLogic Inc. [http://www.cachelogic.com/]

HTTP Basics

(16) HTTP Messages

  • HTTP needs a reliable connection
    • the foundation for HTTP is the Transmission Control Protocol (TCP) [Internet Architecture; Transmission Control Protocol (TCP) (1)]
    • DNS resolution yields an IP address
    • open TCP connection to port 80 or port specified in URI (http://rosetta.sims.berkeley.edu:8085/)
  • HTTP is a text-based protocol
    • the connection is used to transmit text messages
    • all HTTP messages are human-readable (not all entities, though)
    • basic HTTP operations can be carried out by hand
          start-line
          message-header *

          message-body ?
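
Because messages are plain text, a request can be assembled and a response taken apart with a few string operations. The following is a sketch only; the function names are invented for this example, and fetch_by_hand assumes the server closes the connection after responding (requested here with Connection: close).

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    """Assemble a GET request: start-line, header fields, empty line."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # ask the server to close after one response
        "",                   # the empty line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def split_message(raw: bytes):
    """Split a raw message into start-line, header lines, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    start_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    return start_line, header_lines, body


def fetch_by_hand(host: str, port: int = 80, path: str = "/"):
    """Open a TCP connection, send the text request, read until close."""
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(build_request(host, path))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return split_message(b"".join(chunks))
```

The body is kept as bytes because, as noted above, entities are not necessarily human-readable even though the message framing is.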

(17) HTTP Header Fields

  • Header fields contain information about the message
    • general header: Date as the message origination date
    • request header: Accept-Language indicates language preferences
    • response header: Server contains system information
    • entity header: Content-Type specifies the media type of the entity
  • HTTP defines a number of header fields [http://www.cs.tut.fi/~jkorpela/http.html]
    • unknown fields must be ignored (extensibility)
    • unstandardized fields should use an X- prefix
  • HTTP is about acting on these fields
    • HTTP defines what HTTP implementations must or should do
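
The extensibility rule can be sketched as follows: parse every field, act on the ones the implementation understands, and skip the rest without failing. The set KNOWN_FIELDS and both function names are assumptions of this example, not a list taken from the specification.

```python
def parse_header_fields(lines):
    """Parse 'Name: value' lines into a dict keyed by lower-cased field name.

    Field names are case-insensitive, so they are normalized here.
    Simplification: a repeated field overwrites the earlier value.
    """
    fields = {}
    for line in lines:
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a header field line
        fields[name.strip().lower()] = value.strip()
    return fields


# Fields this hypothetical implementation knows how to act on.
KNOWN_FIELDS = {"date", "accept-language", "server", "content-type"}


def split_known_unknown(fields):
    """Separate fields to act on from fields that are silently ignored."""
    known = {k: v for k, v in fields.items() if k in KNOWN_FIELDS}
    ignored = sorted(k for k in fields if k not in KNOWN_FIELDS)
    return known, ignored
```

An unknown field such as X-Example ends up in the ignored list, and processing of the message continues normally.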

(18) HTTP Requests

  • After opening a connection, the client sends a request
    • the method indicates the action to be performed on the resource
    • HTTP's most interesting methods are: GET, HEAD, POST
    • other interesting methods are: PUT, DELETE
  • The URI identifies the resource to which the request should be applied
    • absolute URIs are required when contacting proxies
    • absolute paths are required when contacting a server directly
    • the URI may contain query information
  • The Host header field must be included in every request
Method Request-URI HTTP/Major.Minor
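
The difference between the two forms of the Request-URI can be sketched like this. The helper request_head is invented for the example; it assumes an http URI with a host, and it drops the fragment because fragments are not sent to the server.

```python
from urllib.parse import urlsplit


def request_head(method: str, uri: str, via_proxy: bool = False) -> str:
    """Build the request line and Host header field for a URI."""
    parts = urlsplit(uri)
    path = parts.path or "/"
    if via_proxy:
        # A proxy needs the absolute URI to know which server to contact.
        target = parts._replace(path=path, fragment="").geturl()
    else:
        # An origin server contacted directly gets the absolute path,
        # including any query information.
        target = path + ("?" + parts.query if parts.query else "")
    host = parts.hostname
    if parts.port:
        host += f":{parts.port}"
    return f"{method} {target} HTTP/1.1\r\nHost: {host}\r\n\r\n"


print(request_head("GET", "http://rosetta.sims.berkeley.edu:8085/"))
```

The Host header field is produced in both cases, since it must be included in every request.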



(19) HTTP GET

  • Retrieval action based on the URI
    • may be implemented by reading a file
    • may be implemented by processing a file (PHP)
    • may be implemented by invoking a process
  • Semantics may change based on header fields
    • If-*: only reply with the entity if necessary
    • Range: only reply with the requested part of the entity
  • Cacheability depends on header fields of the response
          GET / HTTP/1.1
          Host: ischool.berkeley.edu
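
How header fields change the semantics of a GET can be sketched from the server's point of view. This is a simplified illustration, not a complete implementation: respond_to_get is a made-up name, header lookup is case-sensitive here, only If-None-Match stands in for the If-* family, and only the single "bytes=first-last" form of Range is handled.

```python
import re


def respond_to_get(entity: bytes, etag: str, headers: dict):
    """Return (status, body) for a GET, honoring two request header fields."""
    # If-None-Match: skip the entity when the client's cached copy is current.
    if headers.get("If-None-Match") == etag:
        return 304, b""
    # Range: reply with only the requested part of the entity.
    match = re.fullmatch(r"bytes=(\d+)-(\d+)", headers.get("Range", ""))
    if match:
        first, last = int(match.group(1)), int(match.group(2))
        if first <= last:
            if first >= len(entity):
                return 416, b""  # requested range not satisfiable
            return 206, entity[first:last + 1]
    # No applicable header field: plain retrieval of the whole entity.
    return 200, entity
```

With a ten-byte entity, a request carrying Range: bytes=2-4 yields status 206 and three bytes, while a matching If-None-Match yields 304 and no body.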

(20) HTTP Responses

  • The server's response to interpreting a request
    • the status code is given numerically and as text
    • 2** for variations of OK
    • 3** for redirections
    • 4** for client-side problems (404: not found)
    • 5** for server-side problems
  • Header fields specify additional information
    • information about the server
    • information about the entity (media type, encoding, language)
HTTP/Major.Minor Status-Code Text
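
A client can act on the class of a status code even when it does not know the specific code. A minimal sketch, with invented names, that parses the status line and derives the class from the first digit (the 1** class, informational, is added here for completeness):

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_status_line(line: str):
    """Split 'HTTP/Major.Minor Status-Code Text' and classify the code."""
    # The reason text may itself contain spaces, so split at most twice.
    version, code, reason = line.split(" ", 2)
    code = int(code)
    return version, code, reason, STATUS_CLASSES.get(code // 100, "unknown")


print(parse_status_line("HTTP/1.1 404 Not Found"))
```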


(21) HTTP Performance

  • HTTP/1.0 allowed one transaction per connection
    • TCP connection setup and teardown are expensive
    • TCP's slow start slows down the initial phase of data transfer
    • typical Web pages use between 10 and 20 resources (HTML + images + CSS + scripts)
    • typically, these resources are stored on the same server
  • HTTP/1.1 introduces persistent connections
    • the TCP connection stays open for some time (10 sec is a popular choice)
    • additional requests to the same server use the same TCP connection
  • HTTP/1.1 introduces pipelined connections
    • instead of waiting for a response, requests can be queued
    • the server responds as fast as possible
    • responses must be sent in the same order as the requests (there is no sequence number to match them up)
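
The effect of persistent connections can be observed with a small local experiment. This is a sketch under several assumptions: it starts a throwaway server on 127.0.0.1 using Python's standard http.server, the handler and function names are invented, and the client's TCP source port is used as a stand-in for "which connection was this". Pipelining is not shown, because the standard http.client module sends one request at a time.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoPortHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets http.server keep connections open

    def do_GET(self):
        # Report the client's TCP source port; it identifies the connection.
        body = str(self.client_address[1]).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the experiment quiet


def client_ports(n_requests: int, persistent: bool) -> list[int]:
    """Issue n GET requests and return the source port seen for each."""
    server = HTTPServer(("127.0.0.1", 0), EchoPortHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    ports, conn = [], None
    try:
        for _ in range(n_requests):
            if conn is None:
                conn = http.client.HTTPConnection(host, port, timeout=5)
            conn.request("GET", "/")
            ports.append(int(conn.getresponse().read()))
            if not persistent:
                conn.close()  # HTTP/1.0 style: one transaction per connection
                conn = None
        if conn is not None:
            conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return ports
```

With persistent=True all requests report the same source port, meaning one TCP connection carried them all; with persistent=False each request pays for its own connection setup and teardown.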

(22) HTTP Connection Handling


HTTP Authentication


(24) HTTP Access Control

  • HTTP servers can deny access [http://en.wikipedia.org/wiki/List_of_HTTP_status_codes#4xx_Client_Error] because of access control
    • 401 Unauthorized means the resource is access controlled
    • 403 Forbidden means the resource is inaccessible
    • 405 Method Not Allowed signals a request using the wrong request method [HTTP Requests (1)]
  • Two different approaches to unauthorized access are possible
    • repeat the HTTP request with the proper authentication credentials
    • redirect to a Login Page [Login Page (1)] and establish an authenticated Session [State Management (Cookies); Session (1)]
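
The three status codes above can be told apart with a short decision sketch. Everything here is hypothetical: the function name, its parameters, and the default realm are invented for illustration, and user stands for an already verified user name (or None if the request carried no valid credentials).

```python
def access_decision(method, user, allowed_methods, permitted_users,
                    realm="example"):
    """Return (status, extra_headers) for a request to a protected resource."""
    if method not in allowed_methods:
        # Wrong request method: tell the client which methods are allowed.
        return 405, {"Allow": ", ".join(sorted(allowed_methods))}
    if user is None:
        # Access controlled: challenge the client to authenticate.
        return 401, {"WWW-Authenticate": f'Basic realm="{realm}"'}
    if user not in permitted_users:
        # Authenticated, but this user may not access the resource.
        return 403, {}
    return 200, {}
```

A 401 invites the client to repeat the request with credentials, whereas a 403 tells it that repeating the request will not help.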

(25) HTTP Authentication


(26) Basic HTTP Authentication

(27) Repeated Access

  • Clients typically access more than one protected resource
    • a perfectly stateless client would always request authentication from the user
    • using the realm, clients can identify repeated accesses
  • Web interactions by default are perfectly stateless
    • each request is completely independent from other requests
    • stateless interactions make the Web loosely coupled and scalable
    • concepts like the realm or State Management (Cookies) [State Management (Cookies)] introduce state
  • Clients remember the authentication and replay it automatically
    • browsers provide little control over this feature
    • logging out of HTTP authenticated sessions is hard
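
The replay behavior can be sketched as a small client-side cache keyed by host and realm. The class and method names are assumptions of this example and do not describe any particular browser. Note that Basic credentials are only base64-encoded, not encrypted.

```python
import base64


def basic_authorization(username: str, password: str) -> str:
    """Build the value of the Authorization header field for Basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


class CredentialCache:
    """Remember credentials per (host, realm) and replay them on request."""

    def __init__(self):
        self._by_realm = {}

    def remember(self, host, realm, username, password):
        self._by_realm[(host, realm)] = basic_authorization(username, password)

    def replay(self, host, realm):
        # Returns None when the user has to be asked for credentials.
        return self._by_realm.get((host, realm))

    def forget(self, host, realm):
        # "Logging out" means discarding the remembered credentials; the
        # server cannot trigger this, which is why logging out is hard.
        self._by_realm.pop((host, realm), None)
```

Once remember has been called for a realm, every later request to a resource in that realm can carry the replayed header without prompting the user again.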

(28) Login Page

  • Basic HTTP Authentication [Basic HTTP Authentication (1)] works with browser controls (including the window)
    • no possibility to log out without using browser-specific controls
    • client side security depends on browser security measures
  • Using forms gives more freedom in session management
    • Authentication [Security & Privacy; Authentication (1)] and Authorization [Security & Privacy; Authorization (1)] are completely application-based
    • if there were secure personal browsers this would not work very well
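
What "completely application-based" means can be sketched with a tiny session store. This is an illustration under stated assumptions, not a secure implementation: the class name, the cookie name session, and the plain-text password table are all invented for the example.

```python
import secrets
from http.cookies import SimpleCookie


class SessionStore:
    """Application-managed login sessions identified by a cookie."""

    def __init__(self, users):
        self._users = users    # user name -> password (plain text: sketch only)
        self._sessions = {}    # session id -> user name

    def login(self, username, password):
        """Return a Set-Cookie value on success, or None on failure."""
        if self._users.get(username) != password:
            return None
        session_id = secrets.token_urlsafe(16)
        self._sessions[session_id] = username
        cookie = SimpleCookie()
        cookie["session"] = session_id
        cookie["session"]["httponly"] = True
        return cookie["session"].OutputString()

    def user_for(self, cookie_header):
        """Map the Cookie header field of a request to a user, if any."""
        morsel = SimpleCookie(cookie_header).get("session")
        return self._sessions.get(morsel.value) if morsel else None

    def logout(self, cookie_header):
        """End the session on the server side; the cookie becomes useless."""
        morsel = SimpleCookie(cookie_header).get("session")
        if morsel:
            self._sessions.pop(morsel.value, None)
```

Because the application owns the session table, logging out is a single server-side deletion and does not depend on browser-specific controls.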


(29) Conclusions

2010-02-22 Web Architecture and Information Management [./]