Content Syndication

Web Architecture
Fall 2010 — INFO 290 (CCN 42605)

Erik Wilde, UC Berkeley School of Information

Creative Commons License

This work is licensed under a CC Attribution 3.0 Unported License

Contents E. Wilde: Content Syndication


E. Wilde: Content Syndication

(2) Abstract

For many information sources on the Web, it is useful to have some standardized way of subscribing to information updates. Syndication formats such as RSS and Atom can be used by these information sources to publish a feed of updated information items. Feeds follow a simple and unified model for representing information items, and thus can be easily aggregated, filtered and re-published across a wide variety of applications, as long as those applications publish information in feed-based formats.

(3) Content Feeds

Syndication Formats

Outline (Syndication Formats)

  1. Syndication Formats [18]
    1. RSS [11]
    2. Atom [7]
  2. Syndication Aggregation [5]
    1. FeedBurner [3]
  3. Conclusions [1]


RSS E. Wilde: Content Syndication

(6) RSS History

  • "The Myth of RSS Compatibility" provides a good overview
  • RSS is a textbook example of why standards are a good thing
    • RSS 0.9 was created for the My Netscape portal in March 1999
    • RSS 0.91 (a simplification) was introduced in July 1999 (as an interim solution)
    • the AOL/Netscape merger removed the format from the company's portal
    • RSS was without an owner, and different parties claimed/denied ownership
    • RSS 1.0 was created by an informal developer group
    • RSS 0.92 (and 0.93 and 0.94) were published without acknowledging RSS 1.0
    • finally, RSS 2.0 was released as a follow-up to the RSS 0.9x versions
  • Using RSS has become an exercise in managing a menagerie of versions

(7) RSS 0.9

  • RSS means RDF Site Summary (or Rich Site Summary?)
    • based on an RDF draft and not compatible with the final RDF specification
    • RDF was considered too cumbersome and unstable
    • 0.90 (proto-RDF) was quickly replaced by the non-RDF 0.91 version
  • RSS 0.92+ versions were developed as unilateral specifications
    • starting with RSS 0.91, RSS means Rich Site Summary
    • it is no longer built on RDF, instead it simply uses XML
    • the 0.9x branch eventually was renamed to RSS 2.0

(9) RSS 1.0

  • RSS means RDF Site Summary (this time for real)
    • based on the final RDF specification and thus incompatible with every RSS 0.9x version
    • developed when the Semantic Web and RDF were first heavily marketed (1999)
    • RDF was expected to become the format for metadata on the Web
  • RSS 1.0 makes heavy use of XML Namespaces
  • RSS 1.0 introduces features which were not present in 0.91
    • date information for published items (very relevant for news feeds)
    • individual authors for various items in a feed
  • RSS 1.0 is the latest version of RDF-based RSS
    • the Semantic Web wave is not over yet, but RDF has lost its novelty appeal
    • for a more XML-oriented encoding, the non-RDF RSS 0.9x branch provides a better foundation

(11) RSS 2.0

  • RSS now means Really Simple Syndication
    • RSS 2.0 is the continuation of the 0.91 branch (which dropped RDF)
    • together with RSS 1.0 it is the most popular version of RSS
    • migration from 0.91 to 2.0 is easily possible
  • RSS 2.0 tries to avoid the use of XML Namespaces
  • RSS 2.0 is increasingly used with extensions for vendor-specific information
    • the RSS core is minimal, so many applications need extensions
    • many extensions have overlapping functionality
    • most extensions have unclear semantics and unclear versioning policies

(13) The Case for Content Management

  • RSS is very rarely produced by hand
    • by definition, RSS contains redundant information for a specific purpose
  • If a Content Management System (CMS) is used, RSS can be generated
    • basic metadata can be generated by the CMS (title, author, date)
    • better tagging of content results in better tagging of feeds
    • well-tagged feeds are better foundations for large-scale reuse of feed items
  • Blogging is simply a specialized case of a CMS
    • Web-based interface for controlling everything
    • strictly time-ordered sequence of published items
    • navigation features primarily based on the time-specific facets of the blog (maybe tags)
    • all blogging tools include feed support

(14) Consuming RSS

  • RSS feeds often have quality problems
    • surprisingly often feeds do not even deliver well-formed XML
    • the use of embedded markup in RSS is not well-defined
  • Writing an RSS reader from scratch is not a good idea
  • RSS readers must handle three major tasks
    1. accept RSS feeds that are not well-formed XML and repair them into well-formed XML
    2. look at the feed contents and bring them into a unified form
    3. produce a unified view of feeds regardless of the RSS version
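Tasks 2 and 3 above can be sketched in a few lines of Python. This is a minimal sketch, not a full reader: it assumes the feed is already well-formed (task 1 is not attempted), it handles only RSS 2.0 and Atom, and the function name `parse_feed`, the result keys, and the `example.org` sample feeds are all made up for illustration.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def parse_feed(xml_text):
    """Return a list of {'title', 'link', 'date'} dicts for RSS 2.0 or Atom."""
    # ET.fromstring raises ParseError if the feed is not well-formed XML.
    root = ET.fromstring(xml_text)
    items = []
    if root.tag == "rss":
        # RSS 2.0: un-namespaced <item> elements inside <channel>.
        for item in root.iter("item"):
            items.append({
                "title": item.findtext("title"),
                "link": item.findtext("link"),
                "date": item.findtext("pubDate"),
            })
    elif root.tag == ATOM_NS + "feed":
        # Atom: namespaced <entry> elements, link target in an attribute.
        for entry in root.iter(ATOM_NS + "entry"):
            link = entry.find(ATOM_NS + "link")
            items.append({
                "title": entry.findtext(ATOM_NS + "title"),
                "link": link.get("href") if link is not None else None,
                "date": entry.findtext(ATOM_NS + "updated"),
            })
    else:
        raise ValueError("unsupported feed format: " + root.tag)
    return items


# Hypothetical sample feeds carrying the same item in both formats.
RSS_SAMPLE = """<rss version="2.0"><channel><title>Example</title>
<item><title>First post</title><link>http://example.org/1</link>
<pubDate>Thu, 28 Oct 2010 09:00:00 GMT</pubDate></item></channel></rss>"""

ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"><title>Example</title>
<entry><title>First post</title><link href="http://example.org/1"/>
<updated>2010-10-28T09:00:00Z</updated></entry></feed>"""

for sample in (RSS_SAMPLE, ATOM_SAMPLE):
    print(parse_feed(sample))
```

Both samples yield the same title and link, but the dates still differ in syntax (RFC 822 style in RSS, RFC 3339 in Atom), so a real reader would also have to normalize them.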

(15) RSS Technical Problems

  • What to put into an item's description
    • the fundamental question is whether a description is text or HTML
    • if there is no well-defined way, then interpretation is client-specific
      <description>This is a <em>very important</em> blog post …
      <description>This is a &lt;em>very important&lt;/em> blog post …
      <description>This is a blog post about <em> in RSS feeds …
      <description>This is a blog post about &lt;em> in RSS feeds …
      <description>This is a blog post about &amp;lt;em> in RSS feeds …
  • Underspecified and not very robust in various other areas
    • broken RSS is accepted by most readers (but fixing it can change the interpretation)
    • the interpretation of relative URIs is not mentioned in the specifications
    • some minimal semantics (classification) for items would be very useful
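The ambiguity in the description examples above can be made concrete with a short Python sketch. It assumes closed `<description>` elements with made-up text and shows only what an XML parser hands to the client. Whether the client then treats that string as text or as HTML is the part RSS leaves undefined.

```python
import html
import xml.etree.ElementTree as ET


def description_text(xml_fragment):
    """Return the character data of a <description> after XML parsing."""
    return ET.fromstring(xml_fragment).text


# Escaped once: the XML parser delivers the characters "<em>".
single = description_text(
    "<description>a post about &lt;em> in feeds</description>")

# Escaped twice: the XML parser delivers the characters "&lt;em>".
double = description_text(
    "<description>a post about &amp;lt;em> in feeds</description>")

print(single)
print(double)
# A client that treats the description as HTML unescapes one more level.
print(html.unescape(double))
```

A client that treats descriptions as plain text shows `single` correctly and `double` with a stray `&lt;`. A client that treats them as HTML shows `double` correctly but reads `single` as the start of emphasized text. Neither client can tell from the feed which reading the author intended.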

(16) RSS Political Problems

  • Multiple incompatible RSS versions are still in widespread use
    • RSS 1.0 and RSS 2.0 are incompatible by design (RDF vs. non-RDF)
    • none of the RSS versions is maintained by a universally accepted standards body
  • None of the specifications is being updated or fixed
    • some of the lessons learned by RSS deployment are not used in a new version
    • it is unlikely that a new version will be produced which merges the RSS landscape
  • Invent something new instead of trying to fix RSS
    • Atom started in 2003 (called Echo at first)
    • W3C or IETF would have been promising candidates for a new RSS
    • W3C is more formal, IETF is more developer-centered
    • IETF was chosen over W3C because of the Atom community's preferences


Atom E. Wilde: Content Syndication

(18) Atom History

  • RSS's shortcomings were very apparent and could not be fixed
  • In mid-2003, discussions started about an improved format
  • It also became apparent that the format should have a protocol
  • Atom 0.3 was released in December 2003 but had no formal home
  • IETF was chosen as the new home with a working group in June 2004
  • RFC 4287 was published in December 2005
  • AtomPub (the Atom Publishing Protocol) was published as RFC 5023 in October 2007

(19) Atom vs. RSS

  • Standardized by the IETF (well-defined process)
  • Classification of entries (user-defined categories)
  • More XML-like markup design (more nesting)
  • Namespaces are used and supported as standard mechanism
  • Atom feeds must be well-formed XML (there is even a schema)
  • Interpretation of content is well-defined (various content types)
  • Support for xml:lang and xml:base
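To illustrate the xml:base point, here is a minimal Python sketch that resolves relative link targets in an Atom feed. It is a simplification under stated assumptions: it looks at `xml:base` only on the feed and on each entry (not on deeper elements), and the function name `entry_links`, the sample feed, and the `example.org` URL are hypothetical.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
# ElementTree reports xml:base under the reserved XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def entry_links(feed_xml, feed_url):
    """Return absolute link targets for all entries of an Atom feed."""
    root = ET.fromstring(feed_xml)
    # The URL the feed was fetched from is the outermost base URI.
    feed_base = urljoin(feed_url, root.get(XML_BASE, ""))
    links = []
    for entry in root.iter(ATOM + "entry"):
        # An entry-level xml:base is itself resolved against the feed base.
        base = urljoin(feed_base, entry.get(XML_BASE, ""))
        for link in entry.findall(ATOM + "link"):
            links.append(urljoin(base, link.get("href", "")))
    return links


SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
 <entry xml:base="2010/10/28/">
  <link href="first-post"/>
 </entry>
</feed>"""

print(entry_links(SAMPLE, "http://example.org/blog/feed.atom"))
```

RSS has no equivalent rule, which is why the interpretation of relative URIs in RSS feeds is left to each client.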

(20) Atom Example

<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-us">
 <link rel='self' href=""/>
 <author>
  <name>Tim Bray</name>
 </author>
 <subtitle>ongoing fragmented essay by Tim Bray</subtitle>
 <entry xml:base="When/200x/2007/04/02/">
  <title>Atom Publishing Protocol Interop!</title>
  <category scheme="" term="Technology/Atom"/>
  <category scheme="" term="Technology"/>
  <category scheme="" term="Atom"/>
  <content type="xhtml">
   <div xmlns="http://www.w3.org/1999/xhtml">
    <p>Mark your calendar: <a href="">April 16-17 at Google</a>. <em>Everybody</em> is invited, provided they bring along an APP implementation, client or server. This was just announced a couple of days ago, and as I write this there are already <s>six</s> twelve client and <s>seven</s> fourteen server implementations signed up to be there and try to <a href="">fill in the grid</a>. Let’s drop some names, in alphabetical order: AOL, Flock, Google, IBM, Lotus, Microsoft, Oracle, O’Reilly, Six Apart, Sun, WordPress. Um, have I mentioned that the APP is going to be huge?</p>
   </div>
  </content>
 </entry>
</feed>

(21) Atom Content

  • RSS had no safe way of finding out what an entry's content is
    • this led different implementations to guess, each in its own way, what the RSS author really wanted
    • one of Atom's main goals was to improve this in a well-defined way
    • Atom allows escaped markup (the only way to include non-XML HTML in an XML format)
  • Each content element should have a type (the default is text)
  • Atom's content interpretation algorithm (use first applicable rule):
    1. if type is text, no child elements are allowed (plain text content)
    2. if type is html then RSS's method of escaped markup is used
    3. if type is xhtml then there must be a div containing XHTML markup
    4. if type is an XML media type then the content should be treated as this type
    5. if type starts with text/ then no child elements are allowed
    6. for all other values, the content must be a Base64-encoded entity of the specified MIME type
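The "first applicable rule" ordering above can be sketched as a small Python function. This is an illustrative simplification: the function name and the returned labels are made up, and rule 4 is approximated by checking whether the type ends in `+xml` or `/xml` rather than consulting a full list of XML media types.

```python
def classify_content(type_attr=None):
    """Map an Atom content type attribute to the rule that applies to it."""
    # A missing type attribute defaults to "text".
    t = (type_attr or "text").strip().lower()
    if t == "text":
        return "plain text, no child elements"          # rule 1
    if t == "html":
        return "escaped HTML markup"                    # rule 2
    if t == "xhtml":
        return "XHTML inside a single div"              # rule 3
    if t.endswith("+xml") or t.endswith("/xml"):
        return "inline XML of the given media type"     # rule 4 (approximated)
    if t.startswith("text/"):
        return "text of the given media type"           # rule 5
    return "Base64-encoded entity"                      # rule 6


for value in (None, "html", "xhtml", "text/xml", "text/plain", "image/png"):
    print(value, "->", classify_content(value))
```

The order matters: `text/xml` starts with `text/` but is caught by rule 4 first, so it is treated as inline XML rather than as plain text content.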

(23) Atom Categories

  • Atom allows categories to be assigned to entries
    • each category element must have a term attribute for the category
    • an optional scheme identifies the categorization scheme (ontology, taxonomy, …)
    • an optional label attribute provides a human-readable label for the category
  • AtomPub defines a document format for Category Documents
  • Three different cases of categorization can be distinguished
    1. use a well-known scheme (such as Dublin Core)
    2. use a private but well-designed scheme (which has a URI and can be reused reliably)
    3. use tags without schemes, which then are little more than content labels
  • Widely-known tags are not easy to handle
    • they are more than just privately assigned tags
    • there is no formal scheme for them, just an emerging consensus
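As a rough illustration of the category attributes described above, the following Python sketch reads the categories of one Atom entry and separates scheme-based categories from plain tags. The function name, the sample entry, and the `example.org` scheme URI are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def entry_categories(entry_xml):
    """Return (term, scheme, label) tuples for an Atom entry's categories."""
    entry = ET.fromstring(entry_xml)
    return [
        # term is required; scheme and label are optional and may be None.
        (cat.get("term"), cat.get("scheme"), cat.get("label"))
        for cat in entry.findall(ATOM + "category")
    ]


SAMPLE_ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
 <title>Example entry</title>
 <category scheme="http://example.org/categories" term="Technology/Atom"
           label="Atom technology"/>
 <category term="atom"/>
</entry>"""

categories = entry_categories(SAMPLE_ENTRY)
with_scheme = [c for c in categories if c[1] is not None]
plain_tags = [c for c in categories if c[1] is None]
print(with_scheme)
print(plain_tags)
```

Only the first category can be compared reliably across feeds, because its scheme URI says which vocabulary the term belongs to. The second is a bare tag whose meaning depends on convention.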

(24) Switching from RSS to Atom

  • Generate both feeds but serve RSS with an HTTP redirect (301)
    • old subscribers with broken clients can still use the RSS feed
    • old subscribers with correct clients will use the Atom feed
  • Atom exposes more information than RSS (category for tags)
    • the mapping of publishing info to the feed has to be changed/extended
    • for standard metadata use Atom's built-in metadata elements
    • for application-specific metadata consider reusing an existing metadata schema
  • Atom can be used to publish snippets as well as full content
    • content allows any type of content to be used and may contain a complete entry
    • summary allows only textual content (plain text, HTML, or XHTML) and should provide a condensed version of an entry
    • some Atom sources publish two feeds for summaries and content
  • Generate good Atom and downgrade it to RSS 1.0 & 2.0
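The redirect strategy from the first bullet can be sketched as a tiny WSGI application in Python. Everything concrete here is an assumption for illustration: the paths `/feed.rss` and `/feed.atom`, the `example.org` URL, and the placeholder feed bodies. The point is only that the old RSS URL answers with a 301 pointing at the Atom feed while still carrying the RSS document for clients that ignore redirects.

```python
# Hypothetical location of the new Atom feed.
ATOM_URL = "http://example.org/feed.atom"

# Placeholder feed bodies; a real site would generate these from its content.
RSS_BODY = b'<rss version="2.0"><channel><title>Example</title></channel></rss>'
ATOM_BODY = b'<feed xmlns="http://www.w3.org/2005/Atom"><title>Example</title></feed>'


def app(environ, start_response):
    """Serve Atom normally and RSS as a permanent redirect with a body."""
    path = environ.get("PATH_INFO", "")
    if path == "/feed.rss":
        # Correct clients follow Location; broken ones still get the RSS body.
        start_response("301 Moved Permanently", [
            ("Location", ATOM_URL),
            ("Content-Type", "application/rss+xml"),
        ])
        return [RSS_BODY]
    if path == "/feed.atom":
        start_response("200 OK", [("Content-Type", "application/atom+xml")])
        return [ATOM_BODY]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

The application can be tried locally with the standard library, for example `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`.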

Syndication Aggregation

Syndication Aggregation E. Wilde: Content Syndication

(26) End-User Aggregation

<link rel="alternate" type="application/rdf+xml" title="…" href="…" />
<link rel="alternate" type="application/rss+xml" title="…" href="…" />
<link rel="alternate" type="application/atom+xml" title="…" href="…" />
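The link elements above are what a browser or feed reader looks for when it discovers the feeds of a page. A minimal Python sketch of that discovery step follows. The class name, the sample page, and its href values are hypothetical, and the sketch only checks the three media types listed above.

```python
from html.parser import HTMLParser

FEED_TYPES = {
    "application/rdf+xml",
    "application/rss+xml",
    "application/atom+xml",
}


class FeedLinkFinder(HTMLParser):
    """Collect (type, href) pairs of feed links found in an HTML page."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <link ... /> tags by the base class.
        if tag != "link":
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        media_type = (attributes.get("type") or "").lower()
        if "alternate" in rels and media_type in FEED_TYPES:
            self.feeds.append((media_type, attributes.get("href")))


# Hypothetical page with one stylesheet link and two feed links.
PAGE = """<html><head>
<link rel="stylesheet" type="text/css" href="/style.css" />
<link rel="alternate" type="application/rss+xml" title="RSS" href="/feed.rss" />
<link rel="alternate" type="application/atom+xml" title="Atom" href="/feed.atom" />
</head><body></body></html>"""

finder = FeedLinkFinder()
finder.feed(PAGE)
print(finder.feeds)
```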

(27) Aggregation Intermediaries


FeedBurner E. Wilde: Content Syndication

(29) Fixing Feeds

Cleaning Up Feeds

(30) Load Balancing

Providing Feed Load Balancing

(31) Statistics/Analytics

Providing Feed Statistics


Conclusions E. Wilde: Content Syndication

(33) Semantic Web Light

2010-10-28 Web Architecture