Web Foundations (URI & HTTP)

Web Architecture and Information Management [./]
Spring 2009 — INFO 190-02 (CCN 42509)

Erik Wilde, UC Berkeley School of Information
2009-02-25

This work is licensed under a CC Attribution 3.0 Unported License [http://creativecommons.org/licenses/by/3.0/]

Contents E. Wilde: Web Foundations (URI & HTTP)


(2) Abstract

The Web's architecture rests on a few simple principles: a consistent, global identification mechanism for resources; a standardized way of retrieving resource representations; and standardized media types that make those representations usable. Built on the Internet, the Web's transfer protocol transmits representations of resources identified by a Uniform Resource Identifier (URI) between Web servers and clients. The most important protocol for data transfer on the Web is the Hypertext Transfer Protocol (HTTP).




(3) Web Server Service



Uniform Resource Identifier (URI)

Outline (Uniform Resource Identifier (URI))

  1. Uniform Resource Identifier (URI) [2]
  2. Hypertext Transfer Protocol (HTTP) [14]
    1. HTTP Basics [7]
    2. HTTP Authentication [5]

(5) Resource Identification




(6) URI Schemes

URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
http://dret.net/lectures/web-spring09/foundations#uri-schemes
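
The generic syntax above can be explored with a small sketch. It uses Python's standard urllib.parse module; the function name split_uri is an assumption made for this illustration, and the library reports the authority part of hier-part separately from the path.

```python
from urllib.parse import urlsplit

def split_uri(uri: str) -> dict:
    """Break a URI into the components named in the generic syntax."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,      # before the first ":"
        "authority": parts.netloc,   # host (and optional port) inside hier-part
        "path": parts.path,          # rest of hier-part
        "query": parts.query,        # after "?", empty if absent
        "fragment": parts.fragment,  # after "#", empty if absent
    }

if __name__ == "__main__":
    example = "http://dret.net/lectures/web-spring09/foundations#uri-schemes"
    for name, value in split_uri(example).items():
        print(f"{name:10} {value!r}")
```

The fragment is interpreted by the client only; it is not part of what a client sends to the server when retrieving the resource.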


Hypertext Transfer Protocol (HTTP)


(8) DNS & HTTP

The two basic protocols which every Web browser must implement are DNS [Internet Architecture; Domain Name System (DNS) (1)] access and HTTP [Hypertext Transfer Protocol (HTTP) (1)]. However, most operating systems provide an API for DNS access, so the browser can use this service locally and only has to implement HTTP. TCP [Internet Architecture; Transmission Control Protocol (TCP) (1)] (which is required as the foundation for HTTP) is usually provided by the operating system.

browser-dns-http.png


(9) The Web's Protocol

internet-traffic-trends.png

provided by CacheLogic Inc. [http://www.cachelogic.com/]



HTTP Basics


(11) HTTP Messages

  • HTTP needs a reliable connection
    • the foundation for HTTP is the Transmission Control Protocol (TCP) [Internet Architecture; Transmission Control Protocol (TCP) (1)]
    • DNS resolution yields an IP address
    • open TCP connection to port 80 or port specified in URI (http://rosetta.sims.berkeley.edu:8085/)
  • HTTP is a text-based protocol
    • the connection is used to transmit text messages
    • the start line and header fields of every HTTP message are human-readable (entities may be binary, though)
    • basic HTTP operations can be carried out by hand
start-line
message-header *

message-body ?
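
A minimal sketch of the two ideas on this slide, assuming nothing beyond Python's standard library: picking the TCP endpoint from a URI (port 80 unless the URI names another port) and splitting a raw message into start-line, header lines, and body. The function names connection_target and split_message are hypothetical.

```python
from urllib.parse import urlsplit

def connection_target(uri: str) -> tuple[str, int]:
    """Return the (host, port) a client would open a TCP connection to."""
    parts = urlsplit(uri)
    # Port 80 is the default for the http scheme when the URI names no port.
    return parts.hostname, parts.port or 80

def split_message(raw: bytes) -> tuple[str, list[str], bytes]:
    """Split a raw HTTP message into start-line, header lines, and body."""
    # The first empty line (CRLF CRLF) separates the header section from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    return lines[0], lines[1:], body

if __name__ == "__main__":
    print(connection_target("http://rosetta.sims.berkeley.edu:8085/"))
    print(split_message(b"GET / HTTP/1.1\r\nHost: ischool.berkeley.edu\r\n\r\n"))
```

The host name returned by connection_target still has to be resolved through DNS before the TCP connection can be opened.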



(12) HTTP Header Fields

  • Header fields contain information about the message
    • general header: Date as the message origination date
    • request header: Accept-Language indicates language preferences
    • response header: Server contains system information
    • entity header: Content-Type specifies the media type of the entity
  • HTTP defines a number of header fields [http://www.cs.tut.fi/~jkorpela/http.html]
    • unknown fields must be ignored (extensibility)
    • unstandardized fields should use an X- prefix
  • HTTP is about acting on these fields
    • HTTP defines what HTTP implementations must or should do
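
The extensibility rule (unknown fields are ignored) can be sketched as follows. This is an illustration only: the set KNOWN and the function names are assumptions, and a real implementation acts on many more fields than the four named on this slide.

```python
def parse_headers(lines):
    """Map lower-cased field names to values (field names are case-insensitive)."""
    fields = {}
    for line in lines:
        # Split at the first colon only; values such as dates contain colons too.
        name, _, value = line.partition(":")
        fields[name.strip().lower()] = value.strip()
    return fields

# Hypothetical set of fields this toy implementation knows how to act on.
KNOWN = {"date", "accept-language", "server", "content-type"}

def understood(fields):
    """Keep the fields this implementation acts on and silently skip the rest."""
    return {name: value for name, value in fields.items() if name in KNOWN}

if __name__ == "__main__":
    parsed = parse_headers([
        "Content-Type: text/html",
        "X-Example: 1",
        "DATE: Wed, 25 Feb 2009 10:00:00 GMT",
    ])
    print(understood(parsed))
```

Skipping X-Example instead of rejecting the message is what allows new fields to be introduced without breaking existing implementations.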



(13) HTTP Requests

  • After opening a connection, the client sends a request
    • the method indicates the action to be performed on the resource
    • HTTP's most interesting methods are: GET, HEAD, POST
    • other interesting methods are: PUT, DELETE
  • The URI identifies the resource to which the request should be applied
    • absolute URIs are required when contacting proxies
    • absolute paths are required when contacting a server directly
    • the URI may contain query information
  • The Host header field must be included in every HTTP/1.1 request
Method Request-URI HTTP/Major.Minor
[Header]*

[Entity]?
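
The request format can be sketched as a small builder, assuming HTTP/1.1 and a request without an entity. The function name build_request and its via_proxy parameter are hypothetical; the sketch shows the difference between the absolute URI sent to a proxy and the absolute path sent to a server.

```python
from urllib.parse import urldefrag, urlsplit

def build_request(method: str, uri: str, via_proxy: bool = False) -> bytes:
    """Build a request message without an entity for the given URI."""
    uri = urldefrag(uri).url               # fragments are never sent to the server
    parts = urlsplit(uri)
    if via_proxy:
        target = uri                       # proxies need the absolute URI
    else:
        target = parts.path or "/"         # servers contacted directly get the absolute path
        if parts.query:
            target += "?" + parts.query    # query information stays in the Request-URI
    lines = [f"{method} {target} HTTP/1.1", f"Host: {parts.netloc}"]
    # Each line ends with CRLF; an empty line closes the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

if __name__ == "__main__":
    print(build_request("GET", "http://ischool.berkeley.edu/").decode("ascii"))
```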



(14) HTTP GET

  • Retrieval action based on the URI
    • may be implemented by reading a file
    • may be implemented by processing a file (PHP)
    • may be implemented by invoking a process
  • Semantics may change based on header fields
    • If-*: only reply with the entity if necessary
    • Range: only reply with the requested part of the entity
  • Cacheability depends on header fields of the response
GET / HTTP/1.1
Host: ischool.berkeley.edu
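
A GET request like the one above can be sent by hand over a plain TCP connection. The following sketch avoids contacting any real server: it starts a hypothetical stand-in server (HelloHandler) on the loopback interface using Python's http.server module and then sends the request as text through a socket. All class and function names are assumptions made for this illustration.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Stand-in server: here GET is implemented by running code, not reading a file."""

    def do_GET(self):
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet

def get_by_hand(host: str, port: int, path: str = "/") -> bytes:
    """Send a GET request as plain text over TCP and return the raw response."""
    request = f"GET {path} HTTP/1.1\r\nHost: {host}:{port}\r\nConnection: close\r\n\r\n"
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:          # the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)

def demo() -> bytes:
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return get_by_hand("127.0.0.1", server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()

if __name__ == "__main__":
    print(demo().decode("iso-8859-1"))
```

The Connection: close header asks the server to close the connection after the response, so the client can simply read until the end of the stream.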



(15) HTTP Responses

  • The server's response to interpreting a request
    • the status code is given numerically and as text
    • 2** for variations of ok
    • 3** for redirections
    • 4** are different client side problems (404: not found)
    • 5** are different server side problems
  • Header fields specify additional information
    • information about the server
    • information about the entity (media type, encoding, language)
HTTP/Major.Minor Status-Code Text
[Header]*

[Entity]?
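
The status line and the status code classes listed above can be sketched in a few lines. The function names and the short class labels are assumptions for this illustration; codes outside the four classes on this slide are labeled "other".

```python
def parse_status_line(line: str) -> tuple[str, int, str]:
    """Split 'HTTP/Major.Minor Status-Code Text' into its three parts."""
    # The text may contain spaces, so split at the first two spaces only.
    version, code, text = line.split(" ", 2)
    return version, int(code), text

def status_class(code: int) -> str:
    """Classify a status code by its first digit."""
    classes = {2: "ok", 3: "redirection", 4: "client problem", 5: "server problem"}
    return classes.get(code // 100, "other")

if __name__ == "__main__":
    version, code, text = parse_status_line("HTTP/1.1 404 Not Found")
    print(version, code, text, "->", status_class(code))
```

Clients that do not know a specific code can still react sensibly based on its class, which is why the first digit matters more than the text.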



(16) HTTP Performance

  • HTTP/1.0 allowed one transaction per connection
    • TCP connection setup and teardown are expensive
    • TCP's slow start slows down the initial phase of data transfer
    • typical Web pages use 10 to 20 resources (HTML + images + CSS + scripts)
    • typically, these resources are stored on the same server
  • HTTP/1.1 introduces persistent connections
    • the TCP connection stays open for some time (10 sec is a popular choice)
    • additional requests to the same server use the same TCP connection
  • HTTP/1.1 introduces pipelined connections
    • instead of waiting for a response, requests can be queued
    • the server responds as fast as possible
    • responses must be sent in the same order as the requests (messages carry no sequence number)
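
Persistent connections can be observed with a small experiment. The sketch below is an illustration under stated assumptions: a hypothetical loopback server (KeepAliveHandler) reports the client's TCP port in each response, and Python's http.client sends several requests over one HTTPConnection. If all responses report the same port, the requests shared a single TCP connection. Pipelining is not shown, because http.client waits for each response before sending the next request.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class KeepAliveHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"   # lets http.server keep connections open

    def do_GET(self):
        # Report the client's TCP port so reuse of the connection is visible.
        body = str(self.client_address[1]).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))  # needed for keep-alive
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet

def client_ports(paths):
    """Request each path over one HTTPConnection; return the ports the server saw."""
    server = HTTPServer(("127.0.0.1", 0), KeepAliveHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
    try:
        ports = []
        for path in paths:
            conn.request("GET", path)
            ports.append(conn.getresponse().read().decode("ascii"))
        return ports
    finally:
        conn.close()            # closing first lets the server's handler finish
        server.shutdown()
        server.server_close()

if __name__ == "__main__":
    print(client_ports(["/page", "/style.css", "/logo.png"]))
```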



(17) HTTP Connection Handling

http-phttp-pipelining.png

HTTP Authentication


(19) HTTP Access Control

  • HTTP servers can deny access [http://en.wikipedia.org/wiki/List_of_HTTP_status_codes#4xx_Client_Error] because of access control
    • 401 Unauthorized means the resource is access controlled
    • 403 Forbidden means the resource is inaccessible
    • 405 Method Not Allowed signals a request using the wrong request method [HTTP Requests (1)]
  • Two different approaches to unauthorized access are possible
    • repeat the HTTP request with the proper authentication credentials
    • redirect to a Login Page [Login Page (1)] and establish an authenticated Session [State Management (Cookies); Session (1)]
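
For the first approach, repeating the request with credentials, a minimal sketch of the Basic authentication scheme looks as follows. The function name basic_authorization is an assumption; the encoding itself (user name and password joined by a colon, then Base64-encoded) is what the Basic scheme defines. Base64 is an encoding, not encryption, so the password is readable by anyone who sees the request.

```python
import base64

def basic_authorization(user: str, password: str) -> str:
    """Build the Authorization header field for the Basic scheme."""
    # Credentials are "user:password", Base64-encoded (not encrypted).
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Authorization: Basic {token}"

if __name__ == "__main__":
    # Example credentials, chosen only for illustration.
    print(basic_authorization("Aladdin", "open sesame"))
```

A client would add this header field when repeating a request that was answered with 401 Unauthorized.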



(20) HTTP Authentication


(21) Basic HTTP Authentication




(22) Repeated Access

  • Clients typically access more than one protected resource
    • a perfectly stateless client would always request authentication from the user
    • using the realm, clients can recognize repeated accesses to the same protected area
  • Web interactions by default are perfectly stateless
    • each request is completely independent from other requests
    • stateless interactions make the Web loosely coupled and scalable
    • concepts like the realm or State Management (Cookies) [State Management (Cookies)] introduce state
  • Clients remember the authentication and replay it automatically
    • browsers provide little control over this feature
    • logging out of HTTP authenticated sessions is hard
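
How a client might remember and replay credentials per realm can be sketched as a small cache. This is an assumption about client behavior made for illustration, not a description of any particular browser; the names parse_realm and CredentialCache are hypothetical, and the realm parsing only handles the simple quoted form.

```python
import re

def parse_realm(www_authenticate: str):
    """Extract the realm from a challenge such as: Basic realm="Course Notes"."""
    match = re.search(r'realm="([^"]*)"', www_authenticate)
    return match.group(1) if match else None

class CredentialCache:
    """Hypothetical client-side store keyed by (host, realm)."""

    def __init__(self):
        self._store = {}

    def remember(self, host, realm, credentials):
        self._store[(host, realm)] = credentials

    def lookup(self, host, realm):
        # None means the user has to be asked for credentials.
        return self._store.get((host, realm))

    def forget(self, host, realm):
        # A manual "log out"; browsers rarely expose this directly.
        self._store.pop((host, realm), None)

if __name__ == "__main__":
    cache = CredentialCache()
    realm = parse_realm('Basic realm="Course Notes"')
    cache.remember("example.org", realm, ("alice", "secret"))
    print(cache.lookup("example.org", realm))
```

The cache is state held by the client; each request on the wire remains independent and carries the credentials again.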



(23) Login Page

  • Basic HTTP Authentication [Basic HTTP Authentication (1)] relies on browser-provided controls (including the login window)
    • no possibility to log out without using browser-specific controls
    • client side security depends on browser security measures
  • Using HTML Forms [HTML Forms] gives more freedom in session management
    • Authentication [Security & Privacy; Authentication (1)] and Authorization [Security & Privacy; Authorization (1)] are completely application-based
    • if there were secure personal browsers this would not work very well



(24) Conclusions


