Web Foundations (URI & HTTP)

Mobile Application Design and Development [./]
Spring 2010 — INFO 152 (CCN 42504)

Erik Wilde, UC Berkeley School of Information
2010-02-12

This work is licensed under a CC Attribution 3.0 Unported License [http://creativecommons.org/licenses/by/3.0/]

Contents Erik Wilde: Web Foundations (URI & HTTP)

Contents


(2) Abstract

The Web's architecture rests on a few simple principles: a consistent and global identification mechanism for resources, a standardized way of retrieving resource representations, and a standardized way of making those representations usable through standardized media types. Built on top of the Internet, the Web's transfer protocol moves representations of resources identified by Uniform Resource Identifiers (URIs) between Web servers and clients. The most important protocol for data transfer on the Web is the Hypertext Transfer Protocol (HTTP).



Uniform Resource Identifier (URI)

Outline (Uniform Resource Identifier (URI))

  1. Uniform Resource Identifier (URI) [2]
  2. Hypertext Transfer Protocol (HTTP) [15]
    1. HTTP Basics [8]
    2. HTTP Authentication [5]

(4) Resource Identification




(5) URI Schemes

URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
http://dret.net/lectures/web-fall09/foundations#uri-schemes
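To see how the generic syntax maps onto a concrete URI, here is a minimal Python sketch that uses the standard library's urllib.parse.urlsplit to take the example URI apart. The function name uri_parts is only an illustration, and urlsplit follows the generic syntax closely but not in every edge case.

```python
from urllib.parse import urlsplit


def uri_parts(uri: str) -> dict:
    """Split a URI into the components of the generic syntax."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,      # before the first ":"
        "authority": parts.netloc,   # the "//" authority at the start of hier-part
        "path": parts.path,          # remainder of hier-part
        "query": parts.query,        # after "?" ('' if absent)
        "fragment": parts.fragment,  # after "#" ('' if absent)
    }


if __name__ == "__main__":
    print(uri_parts("http://dret.net/lectures/web-fall09/foundations#uri-schemes"))
```

The example URI has no query part, so the query component comes back empty, while the fragment is uri-schemes.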


Hypertext Transfer Protocol (HTTP)


(7) DNS & HTTP

The two basic protocols that every Web browser needs are DNS access and HTTP [Hypertext Transfer Protocol (HTTP) (1)]. However, most operating systems provide an API for DNS access, so the browser can use this local service and only has to implement HTTP itself. TCP, which HTTP requires as its foundation, is usually provided by the operating system as well.

browser-dns-http.png
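A minimal Python sketch of the steps a browser takes before it can speak HTTP: extract host and port from the URI (defaulting to port 80), then ask the operating system's resolver through socket.getaddrinfo for an address. The name connection_target is hypothetical, and restricting the lookup to IPv4 (AF_INET) is a simplifying assumption.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit


def connection_target(uri: str) -> Tuple[str, int]:
    """Return the (IP address, port) a browser would open a TCP connection to."""
    parts = urlsplit(uri)
    port = parts.port or 80  # HTTP's default port when the URI names none
    # The operating system's resolver API performs the DNS lookup (if one is needed).
    infos = socket.getaddrinfo(parts.hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    address, resolved_port = infos[0][4]  # sockaddr of the first result
    return address, resolved_port
```

With a numeric host such as http://127.0.0.1:8085/ no DNS query is needed, which is convenient for testing; a host name such as dret.net would be resolved through DNS.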


(8) The Web's Protocol

internet-traffic-trends.png

provided by CacheLogic Inc. [http://www.cachelogic.com/]



HTTP Basics


(10) HTTP Messages

  • HTTP needs a reliable connection
    • the foundation for HTTP is TCP
    • DNS resolution yields an IP address [Geolocation; Internet Addressing (1)]
    • open a TCP connection to port 80 or to the port specified in the URI (http://rosetta.ischool.berkeley.edu:8085/)
  • HTTP is a text-based protocol
    • the connection is used to transmit text messages
    • all HTTP messages are human-readable (not all entities, though)
    • basic HTTP operations can be carried out by hand
start-line
message-header *

message-body ?
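Because HTTP messages are plain text, this structure can be taken apart with simple string operations. The sketch below (the function name parse_message is hypothetical) splits a message at the first empty line into start-line, header fields, and body; it assumes CRLF line endings and ignores details such as folded header lines.

```python
from typing import List, Tuple


def parse_message(raw: str) -> Tuple[str, List[Tuple[str, str]], str]:
    """Split an HTTP message into start-line, header fields, and body."""
    head, _, body = raw.partition("\r\n\r\n")  # an empty line ends the header section
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = []
    for line in lines[1:]:
        name, _, value = line.partition(":")     # "Name: value"
        headers.append((name.strip(), value.strip()))
    return start_line, headers, body


if __name__ == "__main__":
    raw = ("HTTP/1.1 200 OK\r\n"
           "Content-Type: text/plain\r\n"
           "Content-Length: 5\r\n"
           "\r\n"
           "hello")
    print(parse_message(raw))
```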



(11) HTTP Header Fields

  • Header fields contain information about the message
    • general header: Date as the message origination date
    • request header: Accept-Language indicates language preferences
    • response header: Server contains system information
    • entity header: Content-Type specifies the media type of the entity
  • HTTP defines a number of header fields [http://www.cs.tut.fi/~jkorpela/http.html]
    • unknown fields must be ignored (extensibility)
    • unstandardized fields should use an X- prefix
  • HTTP is about acting on these fields
    • HTTP defines what HTTP implementations must or should do
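The rules about acting on known fields and ignoring unknown ones can be illustrated with a small dispatcher. This is a sketch under assumptions: the handler table and function names are invented for illustration, and only two fields are "understood". Field names are compared case-insensitively, as HTTP requires, and any field without a handler is skipped instead of causing an error.

```python
from typing import Callable, Dict, List, Tuple


def media_type(value: str) -> Tuple[str, object]:
    # Drop parameters such as "; charset=utf-8".
    return ("media type", value.split(";")[0].strip().lower())


def entity_size(value: str) -> Tuple[str, object]:
    return ("entity size", int(value))


# Hypothetical table of the fields this implementation understands (lowercase names).
HANDLERS: Dict[str, Callable[[str], Tuple[str, object]]] = {
    "content-type": media_type,
    "content-length": entity_size,
}


def act_on_headers(headers: List[Tuple[str, str]]) -> List[Tuple[str, object]]:
    actions = []
    for name, value in headers:
        handler = HANDLERS.get(name.lower())  # field names are case-insensitive
        if handler is None:
            continue  # unknown fields, including X- extensions, are ignored
        actions.append(handler(value))
    return actions
```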



(12) HTTP Content Negotiation

  • HTTP Header Fields [HTTP Header Fields (1)] have interaction semantics
    • depending on the header type they convey different information
  • HTTP Content Negotiation [http://www.w3.org/Protocols/rfc2616/rfc2616-sec12.html] allows representation negotiation
    • the client specifies a number of preferred content properties
    • the server responds with the representation that fits best
  • Server-driven negotiation can use a number of request header fields
    • Accept uses a list of media types [../web-fall09/mediatypes]
    • Accept-Charset uses a list of character sets
    • Accept-Encoding uses a list of content encodings
    • Accept-Language uses a list of language codes
    • User-Agent specifies the client's identification
  • Agent-driven (client-driven) negotiation lets the server send a list of alternative URIs
    • two requests are required to get the best alternate representation

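Server-driven negotiation with the Accept header can be sketched as follows. The code assumes a simplified reading of the Accept syntax (media ranges with an optional q parameter, other parameters ignored) and picks the available media type with the highest q-value, using the most specific matching range for each type. A None result corresponds to the case where a server could answer 406 Not Acceptable. All function names are hypothetical.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media range, q-value) pairs."""
    ranges = []
    for item in header.split(","):
        params = [p.strip() for p in item.split(";")]
        media_range, q = params[0].lower(), 1.0  # q defaults to 1
        for param in params[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                q = float(value)
        if media_range:
            ranges.append((media_range, q))
    return ranges


def specificity(media_range: str) -> int:
    """Rank ranges: */* < type/* < type/subtype."""
    if media_range == "*/*":
        return 0
    return 1 if media_range.endswith("/*") else 2


def matches(media_range: str, media_type: str) -> bool:
    if media_range == "*/*":
        return True
    range_main, _, range_sub = media_range.partition("/")
    type_main, _, type_sub = media_type.partition("/")
    return range_main == type_main and range_sub in ("*", type_sub)


def negotiate(accept: str, available: List[str]) -> Optional[str]:
    """Return the best available media type, or None if nothing is acceptable."""
    ranges = parse_accept(accept)
    best, best_q = None, 0.0
    for media_type in available:
        matching = [(r, q) for r, q in ranges if matches(r, media_type.lower())]
        if not matching:
            continue
        # The most specific matching range determines the q-value.
        _, q = max(matching, key=lambda rq: specificity(rq[0]))
        if q > best_q:  # q=0 means "not acceptable"; ties keep the server's order
            best, best_q = media_type, q
    return best


if __name__ == "__main__":
    print(negotiate("text/html;q=0.9, application/json", ["text/html", "application/json"]))
```

Here application/json wins because its implicit q-value of 1 beats the 0.9 given for text/html.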


(13) HTTP Requests

  • After opening a connection, the client sends a request
    • the method indicates the action to be performed on the resource
    • HTTP's most interesting methods are GET, HEAD, and POST
    • other interesting methods are PUT and DELETE
  • The URI identifies the resource to which the request should be applied
    • absolute URIs are required when contacting proxies
    • absolute paths are required when contacting a server directly
    • the URI may contain query information
  • The Host header field must be included in every HTTP/1.1 request
Method Request-URI HTTP/Major.Minor
[Header]*

[Entity]?
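The request format can be produced mechanically from a URI. This hypothetical build_request helper shows both forms of Request-URI mentioned above: an absolute path plus Host header for direct server access, and the absolute URI for proxies. It assumes http URIs without user information, and it never sends the fragment, which stays on the client.

```python
from urllib.parse import urlsplit


def build_request(method: str, uri: str, via_proxy: bool = False) -> str:
    """Return the request line and Host header of a request for an http URI."""
    parts = urlsplit(uri)
    host = parts.hostname or ""
    if parts.port is not None:
        host += f":{parts.port}"
    if via_proxy:
        target = parts._replace(fragment="").geturl()  # absolute URI, minus the fragment
    else:
        target = parts.path or "/"                     # absolute path for the origin server
        if parts.query:
            target += "?" + parts.query                # query information is sent along
    # Request line, Host header, then the empty line that ends the header section.
    return f"{method} {target} HTTP/1.1\r\nHost: {host}\r\n\r\n"
```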



(14) HTTP GET

  • Retrieval action based on the URI
    • may be implemented by reading a file
    • may be implemented by processing a file (PHP)
    • may be implemented by invoking a process
  • Semantics may change based on header fields
    • If-*: only reply with the entity if necessary
    • Range: only reply with the requested part of the entity
  • Cacheability depends on header fields of the response
GET / HTTP/1.1
Host: ischool.berkeley.edu
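On the server side, the If-* and Range semantics can be sketched like this. It is a simplified, hypothetical handler: it supports only an exact If-None-Match comparison against a single ETag and a single bytes=first-last range, and it falls back to sending the full entity for anything it does not understand (a real server might answer 416 for an unsatisfiable range).

```python
from typing import Dict, Tuple


def respond_to_get(entity: bytes, etag: str, request_headers: Dict[str, str]) -> Tuple[int, bytes]:
    """Return (status code, body) for a GET on an entity with the given ETag."""
    headers = {name.lower(): value for name, value in request_headers.items()}
    # If-None-Match: the client already holds this version, so send no entity.
    if headers.get("if-none-match") == etag:
        return 304, b""
    # Range: send only the requested bytes (single "bytes=first-last" range only).
    range_spec = headers.get("range", "")
    if range_spec.startswith("bytes="):
        first, _, last = range_spec[len("bytes="):].partition("-")
        if first.isdigit():
            start = int(first)
            end = int(last) if last.isdigit() else len(entity) - 1  # "6-" means to the end
            if start < len(entity) and start <= end:
                return 206, entity[start:end + 1]
    return 200, entity
```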



(15) HTTP Responses

  • The server's response after interpreting a request
    • the status code is given numerically and as text
    • 2xx for variations of OK
    • 3xx for redirections
    • 4xx for client-side problems (404: Not Found)
    • 5xx for server-side problems
  • Header fields specify additional information
    • information about the server
    • information about the entity (media type, encoding, language)
HTTP/Major.Minor Status-Code Text
[Header]*

[Entity]?
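Reading the status line and mapping the code to its class only needs the first digit, as this small sketch shows. The function and table names are illustrative, and it assumes a well-formed status line that includes a reason phrase.

```python
from typing import Tuple

# Status classes keyed by the first digit of the status code.
STATUS_CLASSES = {1: "informational", 2: "success", 3: "redirection",
                  4: "client error", 5: "server error"}


def parse_status_line(line: str) -> Tuple[str, int, str, str]:
    """Split e.g. 'HTTP/1.1 404 Not Found' and classify the status code."""
    version, code, reason = line.split(" ", 2)  # reason phrases may contain spaces
    status = int(code)
    return version, status, reason, STATUS_CLASSES.get(status // 100, "unknown")
```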



(16) HTTP Performance

  • HTTP/1.0 allowed one transaction per connection
    • TCP connection setup and teardown are expensive
    • TCP's slow start slows down the initial phase of data transfer
    • typical Web pages use 10 to 20 resources (HTML + images + CSS + scripts)
    • typically, these resources are stored on the same server
  • HTTP/1.1 introduces persistent connections
    • the TCP connection stays open for some time (10 sec is a popular choice)
    • additional requests to the same server use the same TCP connection
  • HTTP/1.1 introduces pipelining
    • instead of waiting for each response, the client can queue several requests
    • the server responds as fast as possible
    • responses must come back in request order (there is no sequence number)

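Persistent connections can be observed with Python's standard library. In this sketch a local http.server handler answers each GET with the client's TCP source port; if http.client.HTTPConnection reuses one connection for several requests, all answers carry the same port. The server has to speak HTTP/1.1 and send Content-Length for the connection to stay open. Python's http.client does not pipeline, so only persistence (not pipelining) is shown, and the handler and function names are hypothetical.

```python
import http.client
import http.server
import threading
from typing import List


class PortEchoHandler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep connections open unless the client closes them

    def do_GET(self) -> None:
        # Answer with the client's TCP source port, which identifies the connection.
        body = str(self.client_address[1]).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))  # delimits the response
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # silence the default per-request logging


def source_ports(paths: List[str]) -> List[int]:
    """Send one GET per path over a single HTTPConnection; return the ports the server saw."""
    server = http.server.HTTPServer(("127.0.0.1", 0), PortEchoHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
    try:
        ports = []
        for path in paths:
            conn.request("GET", path)
            response = conn.getresponse()
            ports.append(int(response.read()))  # read fully before reusing the connection
        return ports
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
```

Calling source_ports(["/", "/a", "/b"]) should return the same port three times; with protocol_version left at its HTTP/1.0 default, the server would close the connection after every response.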


(17) HTTP Connection Handling

http-phttp-pipelining.png

HTTP Authentication


(19) HTTP Access Control

  • HTTP servers can deny access [http://en.wikipedia.org/wiki/List_of_HTTP_status_codes#4xx_Client_Error] through access control
    • 401 Unauthorized means the resource is access controlled
    • 403 Forbidden means the resource is inaccessible
    • 405 Method Not Allowed signals a request using the wrong request method [HTTP Requests (1)]
  • Two different approaches to unauthorized access are possible
    • repeat the HTTP request with the proper authentication credentials
    • redirect to a Login Page [Login Page (1)] and establish an authenticated session
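Repeating a request with credentials means, in the Basic scheme, adding an Authorization header that carries the Base64 encoding of user:password. A minimal sketch follows; the helper names and the retry policy are assumptions, and encoding the credentials as UTF-8 is a simplification. Base64 is an encoding, not encryption, so Basic credentials are readable by anyone who sees the request unless the connection itself is protected.

```python
import base64
from typing import Dict, Optional


def basic_authorization(user: str, password: str) -> str:
    """Value of the Authorization header in the Basic scheme."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def retry_headers(status: int, user: str, password: str) -> Optional[Dict[str, str]]:
    """Extra headers for repeating a denied request, or None if repeating cannot help."""
    if status == 401:
        return {"Authorization": basic_authorization(user, password)}
    return None  # 403 and 405 are not fixed by sending credentials
```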



(20) HTTP Authentication

HTTP Authentication


(21) Basic HTTP Authentication




(22) Repeated Access

  • Clients typically access more than one protected resource
    • a perfectly stateless client would always request authentication from the user
    • using the realm, clients can identify repeated accesses
  • Web interactions by default are perfectly stateless
    • each request is completely independent of other requests
    • stateless interactions make the Web loosely coupled and scalable
    • concepts like the realm or cookies introduce state
  • Clients remember the authentication and replay it automatically
    • browsers provide little control over this feature
    • logging out of HTTP authenticated sessions is hard
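Replaying credentials per realm can be sketched as a small client-side cache keyed by host and realm. The class, its method names, and the regular expression that reads the realm parameter from WWW-Authenticate are assumptions for illustration; real browsers also track protection spaces by URI path. The sketch shows why the user is asked only once per realm, and why "logging out" depends on the client forgetting what it stored.

```python
import re
from typing import Callable, Dict, Tuple

REALM = re.compile(r'realm="([^"]*)"', re.IGNORECASE)


def realm_of(www_authenticate: str) -> str:
    """Extract the realm parameter from a WWW-Authenticate challenge."""
    match = REALM.search(www_authenticate)
    return match.group(1) if match else ""


class CredentialCache:
    """Remembers one Authorization value per (host, realm) pair."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], str] = {}

    def on_unauthorized(self, host: str, www_authenticate: str,
                        ask_user: Callable[[str], str]) -> str:
        key = (host, realm_of(www_authenticate))
        if key not in self._store:
            # Only a realm not seen before leads to a prompt.
            self._store[key] = ask_user(key[1])
        return self._store[key]

    def forget(self, host: str) -> None:
        """Drop everything remembered for a host (one way a client could 'log out')."""
        for key in [k for k in self._store if k[0] == host]:
            del self._store[key]
```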



(23) Login Page

  • Basic HTTP Authentication [Basic HTTP Authentication (1)] works with browser controls (including the browser's login window)
    • no possibility to log out without using browser-specific controls
    • client side security depends on browser security measures
  • Using HTML forms gives more freedom in session management
    • authentication and authorization are completely application-based
    • if there were secure personal browsers, this would not work very well

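With form-based login the application itself issues and revokes the session, which is what makes logging out straightforward. A minimal sketch of such a server-side session store follows; the class and method names are hypothetical, and it keeps plain-text passwords only to stay short, whereas a real application would store password hashes.

```python
import secrets
from typing import Dict, Optional


class SessionStore:
    """Form-based login with server-side sessions identified by a cookie."""

    def __init__(self, users: Dict[str, str]) -> None:
        self._users = users                   # user name -> password (demo only)
        self._sessions: Dict[str, str] = {}   # session id -> user name

    def login(self, user: str, password: str) -> Optional[str]:
        """Check submitted form fields; on success return a Set-Cookie header value."""
        expected = self._users.get(user)
        if expected is None or not secrets.compare_digest(expected.encode(), password.encode()):
            return None
        session_id = secrets.token_urlsafe(16)
        self._sessions[session_id] = user
        return f"session={session_id}; Path=/; HttpOnly"

    def user_for(self, session_id: str) -> Optional[str]:
        """Which user, if any, a request carrying this session cookie belongs to."""
        return self._sessions.get(session_id)

    def logout(self, session_id: str) -> None:
        """Logging out is just deleting the server-side session."""
        self._sessions.pop(session_id, None)
```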


(24) Conclusions


