Media Types

Web Architecture and Information Management [./]
Spring 2009 — INFO 190-02 (CCN 42509)

Erik Wilde, UC Berkeley School of Information
2009-03-11

Creative Commons License [http://creativecommons.org/licenses/by/3.0/]

This work is licensed under a CC
Attribution 3.0 Unported License
[http://creativecommons.org/licenses/by/3.0/]

Contents E. Wilde: Media Types

(2) Abstract

One of the most important aspects of computer-based communications is the concept of media types: the question of what type of information a digital artifact represents and how it is encoded. The most common standard for this information is the scheme introduced by Multipurpose Internet Mail Extensions (MIME). Media types can be negotiated by peers communicating through HTTP. Some media types allow fragment identifiers, which let a reference to a resource identify a fragment of the complete resource.



(3) Multipurpose Internet Mail Extensions (MIME)



(4) Windows File Type Handling

Media Types and the Web

Outline (Media Types and the Web)

  1. Media Types and the Web [2]
  2. Media Types [10]
    1. Text Content Types [3]
    2. Image Content Types [3]
  3. Fragment Identifiers [2]
(6) Browsers and Resources



(7) Firefox Media Type Handling

Controlling Media Type Handling in Firefox

Media Types

(9) Content Types



(10) Subtypes



(11) Media Type Registration



(12) application/msword Media Type


SECURITY CONSIDERATIONS:
None known.


PUBLISHED SPECIFICATION:

Specification by example:

   From any microsoft word application select "Save As..." from the
   "File" menu.  Enter a filename, make sure that "Normal" is specified
   for the file type, and click "Save".

Company Contact:

   Microsoft Inc.

   16011 NE 36th Way
   Box 97017
   Redmond WA, 98073-9717


Text Content Types

(14) Plain Text

  • RFC 2046 [http://dret.net/rfc-index/reference/RFC2046] defines plain text files as a basic media type
    • any text file that does not contain structures intended for machine-based processing
    • even Comma-Separated Values (CSV) [Comma-Separated Values (CSV) (1)] does not count as plain text
  • Guessing the character encoding is hard and unreliable and should be avoided
    • the character encoding can be specified with an additional parameter: text/plain; charset=iso-8859-1
    • if no such parameter is present, ASCII should be assumed as the character encoding
  • For more specific text subtypes, various other subtypes exist [http://www.iana.org/assignments/media-types/text/]
    • calendar for information about calendar entries
    • javascript for JavaScript code (should now be marked as application/javascript)
    • sgml and xml for text with additional markup
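
As a minimal sketch of how a receiver might apply the charset rules above, the following Python snippet splits a Content-Type value into media type and character encoding and falls back to ASCII when no charset parameter is present. The function name parse_content_type is made up for this illustration; the parsing itself is delegated to the standard library's email.message.Message.

```python
from email.message import Message


def parse_content_type(header_value):
    """Return (media_type, charset) for a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = header_value
    media_type = msg.get_content_type()  # e.g. "text/plain"
    # Fall back to ASCII when no charset parameter is given.
    charset = msg.get_content_charset() or "us-ascii"
    return media_type, charset


media_type, charset = parse_content_type("text/plain; charset=iso-8859-1")
# Byte 0xE9 is the letter e with an acute accent in ISO-8859-1.
text = b"caf\xe9".decode(charset)
```

Decoding with the declared encoding, instead of guessing it, is what keeps the last byte of the example from turning into a wrong character.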


(15) HTML

  • RFC 2854 [http://dret.net/rfc-index/reference/RFC2854] registers text/html for HTML documents
    • as with Plain Text [Plain Text (1)], the character encoding can be specified as a parameter
    • it is not specific to any particular version of HTML (version information can be found in the HTML document)
  • HTML Fragment Identifiers [HTML Fragment Identifiers (1)] are also defined by the media type registration
  • HTML in many cases needs additional resources to be self-contained
    • images which are referenced by img elements (and possibly external image maps)
    • other media referenced by object or applet (or the deprecated embed)
    • stylesheets or scripts which are referenced in the document head (they may reference other files …)
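
To make the list of additional resources more concrete, here is a rough sketch that collects the external references of an HTML document with Python's standard html.parser module. The class name ResourceCollector and the attribute table are assumptions made for this illustration; the table only covers the elements named above and is not a complete list of everything an HTML page can reference.

```python
from html.parser import HTMLParser


class ResourceCollector(HTMLParser):
    """Collect (element, reference) pairs for externally referenced resources."""

    # Attribute that carries the reference for each element of interest.
    REF_ATTRS = {
        "img": "src",
        "script": "src",
        "embed": "src",
        "object": "data",
        "applet": "code",
        "link": "href",
    }

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attr_name = self.REF_ATTRS.get(tag)
        if attr_name is None:
            return
        attrs = dict(attrs)
        # Only link elements that point to a stylesheet are of interest here.
        if tag == "link" and "stylesheet" not in (attrs.get("rel") or "").lower().split():
            return
        value = attrs.get(attr_name)
        if value:
            self.resources.append((tag, value))


page = (
    '<html><head><link rel="stylesheet" href="style.css">'
    '<script src="app.js"></script></head>'
    '<body><img src="logo.png"></body></html>'
)
collector = ResourceCollector()
collector.feed(page)
```

After feeding the page, collector.resources holds the stylesheet, the script, and the image. Each of these may in turn reference further files, which this sketch does not follow.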


(16) Comma-Separated Values (CSV)

  • RFC 4180 [http://dret.net/rfc-index/reference/RFC4180] defines a textual format for spreadsheet data
  • CSV has been used for a long time, but implementations have handled some of the details differently
  • Defining a media type makes it easier for implementations to know what to expect
    • the registration not only registers the type, but also defines it
  • CSV is not overly complex, but some issues have to be solved
    • how to separate lines (CRLF)
    • how to end the file (CRLF is allowed but optional)
    • are headers allowed (yes, but they are not marked as such)
    • may different lines use different numbers of fields (no)
    • are spaces significant (yes)
    • are quotes significant (no, they are delimiters, so quotes inside values must be escaped by doubling them)
    • how to treat fields with CRLF, commas, or quotes (enclose the value in quotes)
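
The rules above can be tried out with Python's standard csv module, whose default dialect is close to RFC 4180. This is only a sketch: the helper names to_csv and from_csv and the sample rows are invented for this illustration.

```python
import csv
import io

rows = [
    ["name", "comment"],                   # header row, not marked as such
    ["Smith, John", 'said "hello"'],       # comma and quotes inside values
    [" padded ", "line one\r\nline two"],  # significant spaces, embedded CRLF
]


def to_csv(rows):
    """Serialize rows with CRLF line endings."""
    buffer = io.StringIO(newline="")  # keep CRLF exactly as written
    csv.writer(buffer, lineterminator="\r\n").writerows(rows)
    return buffer.getvalue()


def from_csv(text):
    """Parse CSV text back into a list of rows."""
    return list(csv.reader(io.StringIO(text, newline="")))


text = to_csv(rows)
```

The value containing quotes is written as "said ""hello""", and parsing the text again with from_csv returns the original rows, including the leading and trailing spaces and the line break inside the last field.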


Image Content Types

(18) Graphics Interchange Format (GIF)

  • RFC 2046 [http://dret.net/rfc-index/reference/RFC2046] registers the oldest graphics format on the Web
  • GIF was the subject of a long patent debate
    • the compression technique of GIF (LZW [http://en.wikipedia.org/wiki/Lzw]) had been patented by Unisys (1983)
    • Unisys wanted to get licensing fees from all commercial online uses of GIF
    • Portable Network Graphics (PNG) [Multimedia Content; Portable Network Graphics (PNG) (1)] was developed to provide a patent-free format
    • in 1999, Unisys changed its tactics and wanted to collect one-time fees ($5000-$7500) from all users
    • all GIF-related LZW patents expired in 2003/2004, so GIF is freely available now
  • GIF's limited feature set makes PNG the better choice anyway
    • 8 bit color (requires dithering for photographs), binary transparency
    • GIF's animation feature is the only thing that is not available in PNG
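
The 8 bit color limit is visible in the GIF file header itself. The sketch below reads the signature and the logical screen descriptor of a GIF file. The function name gif_screen_info and the hand-built sample header are assumptions for this illustration, and the code looks only at the global color table, not at any local ones.

```python
import struct


def gif_screen_info(data):
    """Return (width, height, palette_size) from the start of a GIF file."""
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF file")
    # Width and height are little-endian 16-bit values, followed by a packed byte.
    width, height, packed = struct.unpack("<HHB", data[6:11])
    has_palette = bool(packed & 0x80)
    # The low three bits N encode a palette of 2**(N + 1) colors,
    # so a palette can never have more than 256 entries.
    palette_size = 2 ** ((packed & 0x07) + 1) if has_palette else 0
    return width, height, palette_size


# Hand-built header for illustration: 320x200 pixels, full 256-color palette.
header = b"GIF89a" + struct.pack("<HHBBB", 320, 200, 0xF7, 0, 0)
info = gif_screen_info(header)
```

Because the palette size field is only three bits wide, 256 colors is the upper bound, which is why photographs have to be dithered.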


(19) Joint Photographic Experts Group (JPEG)

  • RFC 2046 [http://dret.net/rfc-index/reference/RFC2046] registers the second major image format for the Web
  • JPEG has been specifically designed for photographs
    • it is always lossy (it cannot preserve the complete information from a random bitmap)
    • it uses perception-based compression (for example, color precision is sacrificed for brightness)
  • Example images at three quality settings
    • Average Quality JPEG: Q = 50, filesize 15,138 bytes
    • Low Quality JPEG: Q = 10, filesize 4,787 bytes
    • Lowest Quality JPEG: Q = 1, filesize 1,523 bytes
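
One common way in which JPEG encoders trade color precision for brightness is chroma subsampling: pixels are converted from RGB to a brightness value (Y) and two color values (Cb, Cr), and the color values are then stored at a lower resolution. The following sketch shows only this one step for a single 2x2 block, using the JFIF conversion coefficients. It is not a JPEG encoder, and the function names and sample pixels are invented for this illustration.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to YCbCr (JFIF coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def subsample_block(pixels):
    """Keep brightness for every pixel, but only one averaged color pair."""
    converted = [rgb_to_ycbcr(*pixel) for pixel in pixels]
    luma = [y for y, _, _ in converted]
    cb = sum(c for _, c, _ in converted) / len(converted)
    cr = sum(c for _, _, c in converted) / len(converted)
    return luma, cb, cr


# A 2x2 block of pixels: two reddish, two bluish.
block = [(200, 30, 30), (190, 40, 35), (60, 60, 200), (70, 55, 210)]
luma, cb, cr = subsample_block(block)
# 12 input values (4 pixels x 3 channels) become 6 (4 brightness + 2 color).
```

The block keeps its four brightness values, while the color difference between the reddish and the bluish pixels is averaged away.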


(20) Portable Network Graphics (PNG)

  • PNG is registered as image/png and is the third major image format
    • PNG was intended to be a royalty- and patent-free replacement of GIF [Multimedia Content; Graphics Interchange Format (GIF) (1)]
    • image formats need to be supported by browsers and thus take a long time until they are established
    • IE6 implements PNG in a very rudimentary form, IE7 handles PNG correctly
  • PNG has some advantages over GIF and JPEG
    • lossless, compressed palette, grayscale, or true color images
    • 8 bit alpha channel for gradual opacity (blending into the background)
  • JPEG still is the preferred format for photographic pictures
  • GIF still is the preferred format for animated images
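
Which of the image kinds listed above a PNG file contains can be read from its first chunk (IHDR). The following sketch checks the PNG signature and decodes the image size, bit depth, and color type. The function name png_header_info and the hand-built sample bytes are assumptions for this illustration; the sample omits the chunk CRC and all image data.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Color type codes from the PNG specification.
COLOR_TYPES = {
    0: "grayscale",
    2: "true color",
    3: "palette",
    4: "grayscale with alpha",
    6: "true color with alpha",
}


def png_header_info(data):
    """Return (width, height, bit_depth, color_type_name) from the IHDR chunk."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # Each chunk starts with a big-endian length and a four-byte type.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("IHDR chunk expected first")
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return width, height, bit_depth, COLOR_TYPES.get(color_type, "unknown")


# Hand-built signature and IHDR data for illustration (CRC omitted):
# 640x480 pixels, 8 bits per channel, true color with alpha channel.
sample = PNG_SIGNATURE + struct.pack(
    ">I4sIIBBBBB", 13, b"IHDR", 640, 480, 8, 6, 0, 0, 0
)
info = png_header_info(sample)
```

A color type with alpha and a bit depth of 8 corresponds to the 8 bit alpha channel mentioned above.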


Fragment Identifiers

(22) Identification of Resource Fragments



(23) HTML Fragment Identifiers



(24) Conclusions


