Microformats
Fall 2009 — INFO 290 (CCN 42593)
[http://creativecommons.org/licenses/by/3.0/]
(2) Abstract
HTML pages are for human users and describe a resource in structural terms (headings, lists, tables, …). For machine-based interaction, it is often necessary to have more information about the application concepts. XML is a popular language for representing application structures, but is targeted at machine-based processing alone. Microformats and more formal approaches such as the Resource Description Framework (RDF), RDF in Attributes (RDFa), and the Web Ontology Language (OWL) are often used to describe Web content semantically.
(3) HTML vs. XML
- HTML describes structures in a very general way
- HTML elements describe logical page structures such as headings, lists, tables, …
- useful for dynamic and adaptive page rendering, but not for understanding contents
- Good HTML may have more information available
- classes in HTML elements may represent underlying concepts (CSS may use this)
- HTML containers [Advanced HTML; All-Purpose Elements (1)] may represent aggregation of some basic information items
- Very good HTML
- some guidelines/rules/methods for understanding class names
- some model for the underlying schema (what may appear in which combination)
- Excellent HTML is dynamically generated from XML
- the model is exposed as structured XML data that is available to the client
- there is a stylesheet for producing the HTML version of the XML
- but even XML does not provide semantics (it is just a structured syntax)
(4) Plain HTML
<html>
<head>
<title>Cannondale 2007 System Six 2</title>
</head>
<body>
<h1>Cannondale 2007 System Six 2</h1>
<ul>
<li>Mavic Ksyrium ES wheelset</li>
<li>Maxxis Xenith Hors Categorie tires</li>
<li>Fi'zi:k Arione Titanium saddle</li>
<li>SRAM Force components</li>
</ul>
<p>Sizes: 48cm, 50cm, 52cm, 54cm, 56cm, 58cm, 60cm, 63cm</p>
<p>Dealers: </p>
<ul>
<li>Mike's Bikes of Berkeley, 2161 University Avenue, Berkeley, CA 94704; +1-510-8452453</li>
</ul>
</body>
</html>
(5) Good HTML
<html>
<head>
<title>Cannondale 2007 System Six 2</title>
</head>
<body>
<h1 class="bike"><span class="manufacturer">Cannondale</span> <span class="year">2007</span> <span class="model">System Six</span> <span class="type">2</span></h1>
<ul class="components">
<li class="component"><span class="manufacturer">Mavic</span> <span class="type">Ksyrium ES</span> wheelset</li>
<li class="component"><span class="manufacturer">Maxxis</span> <span class="type">Xenith Hors Categorie</span> tires</li>
<li class="component"><span class="manufacturer">Fi'zi:k</span> <span class="type">Arione Titanium</span> saddle</li>
<li class="component"><span class="manufacturer">SRAM</span> <span class="type">Force</span> components</li>
</ul>
<p>Sizes: <span class="size">48cm</span>, <span class="size">50cm</span>, <span class="size">52cm</span>, <span class="size">54cm</span>, <span class="size">56cm</span>, <span class="size">58cm</span>, <span class="size">60cm</span>, <span class="size">63cm</span></p>
<p>Dealers: </p>
<ul>
<li class="dealer"><span class="name">Mike's Bikes of Berkeley</span>, <span class="adr"><span class="street-address">2161 University Avenue</span>, <span class="locality">Berkeley</span>, <span class="region">CA</span> <span class="postal-code">94704</span></span>; <a href="tel:+1-510-8452453">+1-510-8452453</a></li>
</ul>
</body>
</html>
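The class names above are what a program can hook into. The following is a minimal sketch, assuming only Python's standard `html.parser` module; the names `ClassTextCollector` and `texts_by_class` are made up for this illustration, and the sketch expects well-formed markup like the example page.

```python
from html.parser import HTMLParser

# Elements without a closing tag; they must not be tracked as open elements.
VOID = {"br", "hr", "img", "input", "link", "meta"}


class ClassTextCollector(HTMLParser):
    """Collects the text inside each element, keyed by its class names."""

    def __init__(self):
        super().__init__()
        self.open = []    # (class names, text parts) for every open element
        self.found = []   # (class name, text) pairs, recorded at each end tag

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        self.open.append((classes, []))

    def handle_data(self, data):
        # Text belongs to every element that is currently open.
        for _, parts in self.open:
            parts.append(data)

    def handle_endtag(self, tag):
        if tag in VOID or not self.open:
            return
        classes, parts = self.open.pop()
        text = " ".join("".join(parts).split())  # normalize whitespace
        for name in classes:
            self.found.append((name, text))


def texts_by_class(html):
    """Return a dict mapping each class name to the texts marked with it."""
    collector = ClassTextCollector()
    collector.feed(html)
    collector.close()
    result = {}
    for name, text in collector.found:
        result.setdefault(name, []).append(text)
    return result


SAMPLE = """<ul class="components">
<li class="component"><span class="manufacturer">Mavic</span> <span class="type">Ksyrium ES</span> wheelset</li>
<li class="component"><span class="manufacturer">SRAM</span> <span class="type">Force</span> components</li>
</ul>"""

data = texts_by_class(SAMPLE)
print(data["manufacturer"])
print(data["component"])
```

The program can find "manufacturer" values, but it still has to know what that class name is supposed to mean; nothing in the page says so.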
(6) Excellent HTML
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="bike2html.xsl" type="text/xsl"?>
<bike manufacturer="Cannondale" year="2007">
<model>System Six</model>
<type>2</type>
<sizes>
<size unit="cm">48</size>
<size unit="cm">50</size>
<size unit="cm">52</size>
<size unit="cm">54</size>
<size unit="cm">56</size>
<size unit="cm">58</size>
<size unit="cm">60</size>
<size unit="cm">63</size>
</sizes>
<parts>
<wheelset manufacturer="Mavic">Ksyrium ES</wheelset>
<tires manufacturer="Maxxis">Xenith Hors Categorie</tires>
<saddle manufacturer="Fi'zi:k">Arione Titanium</saddle>
<components manufacturer="SRAM">Force</components>
</parts>
<dealers>
<dealer>
<name>Mike's Bikes of Berkeley</name>
<address>2161 University Avenue</address>
<city>Berkeley</city>
<zip>94704</zip>
<state>CA</state>
<phone>+1-510-8452453</phone>
</dealer>
</dealers>
</bike>
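With the model exposed as XML, a client no longer has to guess at class names. A minimal sketch of reading such a document, assuming Python's standard `xml.etree.ElementTree` and a shortened copy of the bike document above:

```python
import xml.etree.ElementTree as ET

# Shortened copy of the bike document, for illustration only.
BIKE_XML = """<bike manufacturer="Cannondale" year="2007">
  <model>System Six</model>
  <type>2</type>
  <sizes>
    <size unit="cm">48</size>
    <size unit="cm">50</size>
  </sizes>
  <parts>
    <wheelset manufacturer="Mavic">Ksyrium ES</wheelset>
    <saddle manufacturer="Fi'zi:k">Arione Titanium</saddle>
  </parts>
</bike>"""

bike = ET.fromstring(BIKE_XML)

# Attributes and child elements are addressed by name, not by position.
title = " ".join([
    bike.get("manufacturer"),
    bike.get("year"),
    bike.findtext("model"),
    bike.findtext("type"),
])

# Each size keeps its unit as structured data instead of display text.
sizes = [size.text + size.get("unit") for size in bike.iter("size")]

# The element name of each part doubles as the kind of component.
parts = [(part.tag, part.get("manufacturer"), part.text)
         for part in bike.find("parts")]

print(title)
print(sizes)
print(parts)
```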
(7) XML → HTML Stylesheet
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<head>
<title>
<xsl:value-of select="bike/@manufacturer"/>
<xsl:text> </xsl:text>
<xsl:value-of select="bike/@year"/>
<xsl:text> </xsl:text>
<xsl:value-of select="bike/model"/>
<xsl:text> </xsl:text>
<xsl:value-of select="bike/type"/>
</title>
</head>
<body>
<h1 class="bike">
<span class="manufacturer">
<xsl:value-of select="bike/@manufacturer"/>
</span>
<xsl:text> </xsl:text>
<span class="year">
<xsl:value-of select="bike/@year"/>
</span>
<xsl:text> </xsl:text>
<span class="model">
<xsl:value-of select="bike/model"/>
</span>
<xsl:text> </xsl:text>
<span class="type">
<xsl:value-of select="bike/type"/>
</span>
</h1>
<ul class="components">
<xsl:for-each select="//parts/*">
<li class="component">
<span class="manufacturer">
<xsl:value-of select="@manufacturer"/>
</span>
<xsl:text> </xsl:text>
<span class="type">
<xsl:value-of select="text()"/>
</span>
<xsl:text> </xsl:text>
<xsl:value-of select="local-name()"/>
</li>
</xsl:for-each>
</ul>
<p>
<xsl:text>Sizes: </xsl:text>
<xsl:for-each select="//size">
<span class="size">
<xsl:value-of select="concat(text(), @unit)"/>
</span>
<xsl:if test="not(position() = last())">
<xsl:text>, </xsl:text>
</xsl:if>
</xsl:for-each>
</p>
<p>Dealers: </p>
<ul>
<xsl:for-each select="//dealer">
<li class="dealer">
<span class="name">
<xsl:value-of select="name"/>
</span>
<xsl:text>, </xsl:text>
<span class="adr">
<span class="street-address">
<xsl:value-of select="address"/>
</span>
<xsl:text>, </xsl:text>
<span class="locality">
<xsl:value-of select="city"/>
</span>
<xsl:text>, </xsl:text>
<span class="region">
<xsl:value-of select="state"/>
</span>
<xsl:text> </xsl:text>
<span class="postal-code">
<xsl:value-of select="zip"/>
</span>
</span>
<xsl:text>; </xsl:text>
<a href="tel:{phone}">
<xsl:value-of select="phone"/>
</a>
</li>
</xsl:for-each>
</ul>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
(8) Graceful Degradation
- XML was designed as a language for Web content
- the idea was that XML documents would be delivered to the browser
- stylesheets (CSS/XSL) would take care of the client-side rendering
- CSS [Cascading Style Sheets (CSS)] is good at supporting graceful degradation
- viewing an HTML page with CSS turned off works fine most of the time
- XSLT is not good at supporting graceful degradation
- the browser just displays the raw XML when XSLT is not supported
- Serving XML on the Web is not a good idea
- in closed scenarios (intranet applications) this is a viable solution
- in open scenarios, HTML should be served as the default representation
- alternate versions can be provided by supporting HTTP Content Negotiation [Web Foundations (URI & HTTP); HTTP Content Negotiation (1)]
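The server-side choice between HTML and XML can be sketched as follows. This is a simplified illustration, not a complete implementation of HTTP content negotiation: it ignores the precedence of more specific media ranges, and the function names and the fallback to HTML (instead of a 406 response) are assumptions made for this example.

```python
def parse_accept(header):
    """Parse an Accept header into (media type, q) pairs, best first."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so equal q values keep their header order.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def choose_representation(accept,
                          available=("text/html", "application/xml"),
                          default="text/html"):
    """Pick a media type to serve; HTML is the default representation."""
    for media_type, q in parse_accept(accept):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        if media_type in available:
            return media_type
        if media_type == "*/*":
            return default
        if media_type.endswith("/*"):
            main_type = media_type[:-1]  # e.g. "application/"
            for candidate in available:
                if candidate.startswith(main_type):
                    return candidate
    return default


print(choose_representation("application/xml"))
print(choose_representation("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
```

A client that asks for nothing in particular gets HTML, so browsers are never handed raw XML by accident.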
(9) Excellent HTML
<html>
<head>
<title>Cannondale 2007 System Six 2</title>
<link title="XML version" rel="alternate" type="application/xml" href="systemsix.xml"/>
</head>
<body>
<h1 class="bike"><span class="manufacturer">Cannondale</span> <span class="year">2007</span> <span class="model">System Six</span> <span class="type">2</span></h1>
<ul class="components">
<li class="component"><span class="manufacturer">Mavic</span> <span class="type">Ksyrium ES</span> wheelset</li>
<li class="component"><span class="manufacturer">Maxxis</span> <span class="type">Xenith Hors Categorie</span> tires</li>
<li class="component"><span class="manufacturer">Fi'zi:k</span> <span class="type">Arione Titanium</span> saddle</li>
<li class="component"><span class="manufacturer">SRAM</span> <span class="type">Force</span> components</li>
</ul>
<p>Sizes: <span class="size">48cm</span>, <span class="size">50cm</span>, <span class="size">52cm</span>, <span class="size">54cm</span>, <span class="size">56cm</span>, <span class="size">58cm</span>, <span class="size">60cm</span>, <span class="size">63cm</span></p>
<p>Dealers: </p>
<ul>
<li class="dealer"><span class="name">Mike's Bikes of Berkeley</span>, <span class="adr"><span class="street-address">2161 University Avenue</span>, <span class="locality">Berkeley</span>, <span class="region">CA</span> <span class="postal-code">94704</span></span>; <a href="tel:+1-510-8452453">+1-510-8452453</a></li>
</ul>
</body>
</html>
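A client interested in the XML version can discover it through the `link` element in the head. A minimal sketch, assuming Python's standard `html.parser`; the class and function names are made up for this illustration:

```python
from html.parser import HTMLParser


class AlternateLinkFinder(HTMLParser):
    """Records (media type, href) for every link with rel="alternate"."""

    def __init__(self):
        super().__init__()
        self.alternates = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel is a space-separated list of link types.
        rels = (attributes.get("rel") or "").lower().split()
        if "alternate" in rels and attributes.get("href"):
            self.alternates.append((attributes.get("type"), attributes["href"]))


def find_alternates(html):
    finder = AlternateLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.alternates


PAGE = """<html><head>
<title>Cannondale 2007 System Six 2</title>
<link title="XML version" rel="alternate" type="application/xml" href="systemsix.xml"/>
</head><body></body></html>"""

print(find_alternates(PAGE))
```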
(10) From Information, Knowledge
- XML is often said to be "self-describing"
- many people think this is the same as "self-explanatory"
- the catch is what exactly it is you refer to by "describing"
- Database data cannot live without a database
- database data is simply content, the structure is provided by a DBMS
- XML documents have their structure encoded within them
- compared to database data, XML in fact is "self-describing"
- What is the gap between "self-describing" and "self-explanatory"?
- it is impossible to find out how the document could be modified
- there are no semantics associated with structure or content
- so "self-describing" means you can guess a lot, but you may be wrong
(11) The Semantic Web Hype
1965, H. A. Simon: [http://en.wikipedia.org/wiki/Artificial_intelligence#_note-11]
1967, Marvin Minsky: [http://en.wikipedia.org/wiki/Artificial_intelligence#_note-12]
- How to get past the limitations of HTML?
- a machine-friendly Web must make Web resources machine-processable
- XML solved the problem on the syntax level
- how could the problem be solved on the level of semantics?
- As in the 1970s, description logic was declared to be the solution
- there was a need for the Web to move towards semantics
- there was a community of AI researchers with a long history
- the Semantic Web was born and is currently repeating AI history
(12) Semantic Web Layer Cake

Microformats
(14) Islands of Semantics
- Microformats solve very specific problems in a very specific way
- encoding address information on a Web page
- encoding a location of something represented by a Web resource
- Microformats can be compared to tagging
- a very simple mechanism with a minimal barrier to entry
- little flexibility in adapting the mechanism to slightly different uses
- often underspecified, which leaves interpretation implementation-dependent
- no unified rules across different platforms, which makes processing hard
- nice and easy to start with, but questionable for robust long-term solutions
- Currently there are about 10 reasonably popular microformats
- [http://microformats.org/wiki/Main_Page]
(15) Microformat Syntax
- HTML has some underspecified and underused elements
- dfn, code, samp, kbd, var, cite, abbr, acronym
- they can be reused and augmented with additional information
- HTML allows non-HTML content in HTML pages
- unknown elements and attributes must be ignored
- HTML allows class attributes to carry semantics
- HTML has a head which contains page metadata
- for example, the link element specifies connections to other resources
(16) Magic Names
- A syntax defines where and how to embed information
- what is embedded and how well is it defined semantically?
- is there an underlying model for specifying dependencies?
- how many assumptions does it take to implement a microformat?
- Names are never self-explanatory, they always represent concepts
- nothing can remove the burden of defining a conceptual model
- if this is not done, models evolve and there will be more than one
- Microformats and tagging share the same folklore
- define simple things and good things will happen
- this works by supporting a quickly growing ecosystem of diverging semantics
- semantics are most useful when they are well-defined
- loose semantics also have some utility
(17) Microformats on the Web
- Easy to embed for generated content
- some of the very basic formats may even appear in browsers one day
- combining well-designed URIs with document relationships is better than any site map
- Hard to rely on for applications that need dependable semantics
- useful as a hint and as a starting point
- microformats are not a good idea for complex information management tasks
- Use as foundation for representing common concepts
- when formatting addresses, use adr [http://microformats.org/wiki/adr] class names
- for structured documents use XOXO [http://microformats.org/wiki/xoxo]
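For generated content, emitting adr class names costs almost nothing. The sketch below assumes a hypothetical helper `adr_markup` that turns a dictionary into markup using the class names from the dealer example (street-address, locality, region, postal-code); the comma-separated layout is a choice made for this illustration.

```python
from html import escape

# adr property names, in the order they are written out.
ADR_FIELDS = ("street-address", "locality", "region", "postal-code")


def adr_markup(address):
    """Wrap the known parts of an address in adr class names."""
    spans = []
    for field in ADR_FIELDS:
        if field in address:
            # Escape the value so that "&" or "<" cannot break the page.
            spans.append('<span class="%s">%s</span>'
                         % (field, escape(address[field])))
    return '<span class="adr">%s</span>' % ", ".join(spans)


dealer = {
    "street-address": "2161 University Avenue",
    "locality": "Berkeley",
    "region": "CA",
    "postal-code": "94704",
}
print(adr_markup(dealer))
```

Because the template is the only place that knows the class names, switching to a different vocabulary later means changing one function.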
Resource Description Framework (RDF)
(19) Describing Resources
- RDF describes everything in triples
- making a statement about a resource (identified by a URI [Web Foundations (URI & HTTP); Uniform Resource Identifier (URI) (1)])
- describing a certain property of the resource (identified by a URI [Web Foundations (URI & HTTP); Uniform Resource Identifier (URI) (1)])
- specifying a value for that property (a URI [Web Foundations (URI & HTTP); Uniform Resource Identifier (URI) (1)] or a literal)
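The triple idea can be sketched without any RDF library. In the illustration below, the `http://example.org/bikes/` namespace, the `dealer` property, and the `Literal` and `describe` names are assumptions made for this example; only the Dublin Core title URI is an existing vocabulary term.

```python
class Literal(str):
    """A plain string value, as opposed to a URI naming a resource."""


EX = "http://example.org/bikes/"   # hypothetical namespace
DC_TITLE = "http://purl.org/dc/elements/1.1/title"

# Each statement is a (subject, property, value) triple.
graph = {
    (EX + "systemsix", DC_TITLE, Literal("Cannondale 2007 System Six 2")),
    (EX + "systemsix", EX + "dealer", EX + "mikesbikes"),
    (EX + "mikesbikes", DC_TITLE, Literal("Mike's Bikes of Berkeley")),
}


def describe(graph, subject):
    """Collect all property values stated about one resource."""
    description = {}
    for s, p, o in sorted(graph):
        if s == subject:
            description.setdefault(p, []).append(o)
    return description


bike = describe(graph, EX + "systemsix")
print(bike[DC_TITLE])
print(bike[EX + "dealer"])
```

The dealer value is itself a URI, so it can be the subject of further triples; that is how a set of statements turns into a graph.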
[http://www.w3.org/TR/REC-rdf-syntax/#intro]
(20) RDF Graphs

(21) RDF is Simple and Complex
- RDF [http://www.w3.org/TR/rdf-concepts/] is the idea of descriptive triples
- the actual RDF model is rooted in description logic
- RDF itself can only describe individuals (something identified by URI)
- RDF/XML [http://www.w3.org/TR/rdf-syntax-grammar/] is an XML syntax for encoding triples
- the syntax allows a variety of ways to represent the same RDF statements
- processing RDF/XML with XML tools is likely to fail
- use RDF parsers to parse all variations of RDF/XML into an abstract RDF graph
- RDF Schema [http://www.w3.org/TR/rdf-schema/] supports the creation of RDF vocabularies
- describe the classes of things that can be used in statements
- describe the properties which can be used for each of these classes
- describe the allowed values for the supported properties
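What a vocabulary description buys a processor can be sketched with two RDF Schema terms, `rdfs:subClassOf` and `rdfs:domain`. The sketch below is a naive fixpoint loop over plain tuples, not a full RDF Schema reasoner, and the `http://example.org/bikes#` vocabulary is hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
SUBCLASS_OF = RDFS + "subClassOf"
DOMAIN = RDFS + "domain"
EX = "http://example.org/bikes#"   # hypothetical vocabulary


def infer(triples):
    """Add the triples implied by subClassOf and domain statements."""
    graph = set(triples)
    while True:
        new = set()
        for s, p, o in graph:
            if p == SUBCLASS_OF:
                # Subclass relationships are transitive.
                new |= {(s, SUBCLASS_OF, o2) for s2, p2, o2 in graph
                        if p2 == SUBCLASS_OF and s2 == o}
                # Instances of a class are instances of its superclasses.
                new |= {(s2, RDF_TYPE, o) for s2, p2, o2 in graph
                        if p2 == RDF_TYPE and o2 == s}
            elif p == DOMAIN:
                # Whatever uses property s is an instance of class o.
                new |= {(s2, RDF_TYPE, o) for s2, p2, o2 in graph
                        if p2 == s}
        if new <= graph:
            return graph   # nothing new was derived
        graph |= new


schema = {
    (EX + "RoadBike", SUBCLASS_OF, EX + "Bike"),
    (EX + "Bike", SUBCLASS_OF, EX + "Product"),
    (EX + "wheelset", DOMAIN, EX + "Bike"),
}
data = {
    (EX + "systemsix", RDF_TYPE, EX + "RoadBike"),
    (EX + "caad9", EX + "wheelset", "Mavic Aksium"),
}
inferred = infer(schema | data)
print((EX + "systemsix", RDF_TYPE, EX + "Product") in inferred)
```

Note that the schema is used here to derive additional statements, not to reject data: nothing was ever stated about the type of `caad9`, yet it ends up being a `Bike` because it has a `wheelset`.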
(22) RDF Schema Graph

(23) RDF in Attributes (RDFa)
- Microformats can use any kind of markup design
- this makes it hard to detect microformats when processing a Web page
- combining microformats can become complicated and ill-designed
- RDFa defines a syntax for embedding RDF into HTML
- the vocabulary must be described by some RDF schema language
<p>This document is licensed under a <a xmlns:cc="http://creativecommons.org/ns#" rel="cc:license" href="http://creativecommons.org/licenses/by-nc-nd/3.0/">Creative Commons License</a>.</p>
- RDFa uses and extends HTML for embedding RDF
- it uses HTML's rel, rev, href, and src attributes
- it defines a number of additional attributes [http://www.w3.org/TR/rdfa-syntax/#s_metaAttributes]
- it defines a processing model [http://www.w3.org/TR/rdfa-syntax/#s_model] for deriving RDF triples from these attributes
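How a triple falls out of `rel` and `href` can be sketched in a few lines. This covers only a tiny corner of RDFa: prefixed `rel` values on elements with an `href`, with the page itself as the subject. It ignores prefix scoping and every other RDFa attribute. The page URI and the class name `RelTripleExtractor` are assumptions, and the sample markup declares the `cc` prefix as `http://creativecommons.org/ns#`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class RelTripleExtractor(HTMLParser):
    """Derives (page, property, target) triples from prefixed rel values."""

    def __init__(self, base_uri):
        super().__init__()
        self.base_uri = base_uri
        self.prefixes = {}   # prefix -> namespace URI (scoping is ignored)
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        for name, value in attributes.items():
            if name.startswith("xmlns:") and value:
                self.prefixes[name[len("xmlns:"):]] = value
        href = attributes.get("href")
        if not href:
            return
        for curie in (attributes.get("rel") or "").split():
            prefix, sep, reference = curie.partition(":")
            if sep and prefix in self.prefixes:
                # Expand prefix:reference into a full property URI.
                predicate = self.prefixes[prefix] + reference
                target = urljoin(self.base_uri, href)
                self.triples.append((self.base_uri, predicate, target))


MARKUP = ('<p>This document is licensed under a '
          '<a xmlns:cc="http://creativecommons.org/ns#" rel="cc:license" '
          'href="http://creativecommons.org/licenses/by-nc-nd/3.0/">'
          'Creative Commons License</a>.</p>')

extractor = RelTripleExtractor("http://example.org/page")  # hypothetical page URI
extractor.feed(MARKUP)
extractor.close()
print(extractor.triples)
```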
More Languages
(25) SPARQL
- RDF graphs can be large and hard to handle
- querying RDF graphs using XML technologies is hard and slow
- special data structures need special query languages
- SPARQL is a query language for querying RDF graphs
- Using RDF without using SPARQL does not make a lot of sense
- if the data is simple and restricted, why use RDF?
- processing unrestricted RDF without a special language is very hard
- Semantic Web search engines can harvest the Web for RDF
- the result is a huge graph of RDF describing all semantic Web resources
- querying into this graph retrieves all formalized semantics on the Web
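The core of a SPARQL query is a set of triple patterns with shared variables. The sketch below is not SPARQL and uses no SPARQL engine; it only imitates that matching step over plain tuples, treating strings that start with `?` as variables. The example vocabulary is hypothetical.

```python
EX = "http://example.org/bikes/"   # hypothetical namespace

graph = [
    (EX + "systemsix", EX + "dealer", EX + "mikesbikes"),
    (EX + "mikesbikes", EX + "name", "Mike's Bikes of Berkeley"),
    (EX + "systemsix", EX + "name", "Cannondale 2007 System Six 2"),
]


def match_pattern(pattern, triple, bindings):
    """Extend bindings so that pattern equals triple, or return None."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def query(graph, patterns):
    """Return all variable bindings that satisfy every pattern."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in graph
            for extended in [match_pattern(pattern, triple, bindings)]
            if extended is not None
        ]
    return solutions


# Roughly: SELECT ?name WHERE { ?bike ex:dealer ?dealer . ?dealer ex:name ?name }
results = query(graph, [
    ("?bike", EX + "dealer", "?dealer"),
    ("?dealer", EX + "name", "?name"),
])
print([solution["?name"] for solution in results])
```

The shared variable `?dealer` is what joins the two patterns; expressing the same join over the many possible RDF/XML serializations with XML tools would be far more awkward.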
(26) Web Ontology Language (OWL)
- RDF and RDF Schema are rather basic languages
- OWL adds more sophisticated features to RDF Schema
- constructions of classes using existing ones
- characterize relationships (e.g., whether they are transitive, symmetric, functional, etc.)
- Formal semantics are hard to write and compute
- no property expressions or datatypes in RDF Schema
- not all set operators, restricted cardinality in OWL Lite
- some restrictions, but a computational guarantee in OWL DL
- full expressive power in OWL Full (but no computational guarantee)
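What it means to characterize a relationship as transitive or symmetric can be sketched as a closure computation. This is a naive illustration over pairs of names, not an OWL reasoner; the function name `close` and the `part_of` and `compatible_with` relations are made up for this example.

```python
def close(pairs, transitive=False, symmetric=False):
    """Add every pair implied by the declared property characteristics."""
    closed = set(pairs)
    while True:
        new = set()
        if symmetric:
            # a R b implies b R a
            new |= {(b, a) for a, b in closed}
        if transitive:
            # a R b and b R c imply a R c
            new |= {(a, d) for a, b in closed for c, d in closed if b == c}
        if new <= closed:
            return closed   # fixpoint reached
        closed |= new


part_of = {("rim", "wheelset"), ("wheelset", "bike")}
compatible_with = {("SRAM Force", "Mavic Ksyrium ES")}

print(("rim", "bike") in close(part_of, transitive=True))
print(("Mavic Ksyrium ES", "SRAM Force") in close(compatible_with, symmetric=True))
```

Even this toy version loops until nothing new can be derived, which hints at why richer languages have to trade expressive power against computational guarantees.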
(27) Vocabulary Taxonomy

Conclusions
(29) Some Questions
- Is the world something that can be objectively formalized and described?
- If the conceptualization of the world changes, what about the ontology?
- How can ontology users understand a large ontology?
- Should users trust ontologies which are based on strict categorization?
- How much responsibility should we delegate to formalisms?
- Can [http://video.google.com/videoplay?docid=-7704388615049492068]?
(30) Semantics are Important and Hard
- Semantics must be captured somewhere
- Most semantic definitions use prose combined with some formalism
- Completely formal semantics are hard to define and hard to use
- Semantic Web technologies may share the fate of AI