<?xml version="1.0" encoding="UTF-8"?>
<!-- $Id: xml-fall07.xml 755 2007-12-07 03:47:05Z dret $ -->
<?xslidy counter-separator=":&#160;" ?>
<?xslidy counter-format="full" ?>
<?xslidy extension-file="html" ?>
<?xslidy extension-link="" ?>
<?xslidy img-path="img" ?>
<?xslidy link-author="http://dret.net/netdret/" ?>
<?xslidy link-contents="./" ?>
<?xslidy link-glossary="http://dret.net/glossary/" ?>
<?xslidy link-home="./" ?>
<?xslidy listing-class="listing" ?>
<?xslidy listing-path="src" ?>
<?xslidy outline-class="outline" ?>
<?xslidy outline-title="Outline" ?>
<?xslidy outlink-mark="a" ?>
<?xslidy outlink-style="class(outlink)" ?>
<?xslidy part-slide-count="all" ?>
<?xslidy part-slide-text=" [*]" ?>
<?xslidy layout="ischool" ?>
<?xslidy xslidy-prefix="xslidy" ?>
<xslidy xmlns="http://dret.net/xmlns/xslidy/1" xmlns:xslidy="http://dret.net/xmlns/xslidy/1">
	<title short="XML Foundations"><a href="./" title="Course Homepage">XML Foundations</a> (INFO 242)</title>
	<author short="E. Wilde"><a href="http://dret.net/netdret/" title="dret.net">Erik Wilde</a></author>
	<affiliation short="UC Berkeley ISchool"><a href="http://www.berkeley.edu/" title="University of California, Berkeley">UC Berkeley</a> <a href="http://ischool.berkeley.edu/" title="ISchool">School of Information</a></affiliation>
	<date short="Fall 2007">Fall Semester 2007</date>
	<copyright>2007 Erik Wilde</copyright>
	<style type="text/css" src="xslidy-fall07.css"/>
	<index name="index.html">
		<category element="xml" class="xml"/>
		<category element="elem" class="xml elem"/>
		<category element="xpathf" class="xpath"/>
		<category element="xpath" class="xpath"/>
		<category element="xslte" class="xslt elem"/>
		<category element="xslta" class="xslt"/>
		<category element="xslt" class="xslt"/>
		<category element="xq" class="xq"/>
		<category element="xsde" class="xsd elem"/>
		<category element="xsda" class="xsd"/>
		<category element="xsd" class="xsd"/>
		<category element="xsdtype" class="xsd xsdprefix"/>
		<category element="http" class="http"/>
	</index>
	<toc name="toc.html">
		<table rules="all" cellspacing="0" cellpadding="5" width="100%">
			<thead>
				<tr>
					<th>Date</th>
					<th>Subject</th>
					<th>Slides</th>
					<th>Resources</th>
				</tr>
			</thead>
			<tbody>
				<xslidy:for-each-presentation>
					<tr>
						<td align="right" valign="top"><xslidy:date/></td>
						<td valign="top"><b><xslidy:title/><span class="toggle">:</span></b> <span class="toggle"><span class="abstract"><xslidy:toc class="abstract"/></span></span></td>
						<td align="center"><xslidy:presentation-link title="Lecture Slides"><xslidy:title form="short"/></xslidy:presentation-link> <xslidy:slides>(*&#160;Slides)</xslidy:slides></td>
						<td align="center"><xslidy:toc class="resources"/></td>
					</tr>
				</xslidy:for-each-presentation>
			</tbody>
		</table>
	</toc>
	<toc name="242.xml">
		<course xmlns="urn:publicid:IDN+www.sims.berkeley.edu:schema:syllabusapp:syllabus:200404:en" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:publicid:IDN+www.sims.berkeley.edu:schema:syllabusapp:syllabus:200404:en syllabus_schema.xsd">
			<generalInformation>
				<title>XML Foundations</title>
				<units>2</units>
				<website>http://dret.net/lectures/xml-fall07/</website>
				<departmentListing>
					<name>SIMS</name>
					<code>INFO</code>
					<courseNumber>242</courseNumber>
				</departmentListing>
				<schedule>
					<year>2007</year>
					<semester>F</semester>
					<startDate>2007-08-28</startDate>
					<endDate>2007-12-06</endDate>
				</schedule>
				<teachingTeam>
					<teacher>
						<typeCode>Professor</typeCode>
						<initials>EW</initials>
						<name>
							<givenName>Erik</givenName>
							<familyName>Wilde</familyName>
						</name>
						<contact>
							<email>dret@berkeley.edu</email>
							<phone>
								<type>Office</type>
								<number>+1-510-6432253</number>
							</phone>
							<website>http://dret.net/netdret/</website>
						</contact>
					</teacher>
				</teachingTeam>
				<gradingOptionCode>LG</gradingOptionCode>
				<description>
					<p>Three hours of lecture, one hour of laboratory per week. The Extensible Markup Language (XML), with its ability to define formal structural and semantic definitions for metadata and information models, is the key enabling technology for information services and document-centric business models that use the Internet and its family of protocols. This course introduces XML syntax, styles and transformations, and schema languages. It balances conceptual topics with practical skills for designing and implementing conceptual models as XML schemas.</p>
				</description>
			</generalInformation>
			<syllabus>
				<instructionFormatCode>LEC</instructionFormatCode>
				<dayPattern>
					<dayTime>
						<dayOfWeek>Tu</dayOfWeek>
						<timeSpan>
							<startTime>14:00:00</startTime>
							<endTime>15:30:00</endTime>
						</timeSpan>
					</dayTime>
					<dayTime>
						<dayOfWeek>Th</dayOfWeek>
						<timeSpan>
							<startTime>14:00:00</startTime>
							<endTime>15:30:00</endTime>
						</timeSpan>
					</dayTime>
				</dayPattern>
				<location>110 South Hall</location>
				<classes>
					<xslidy:for-each-presentation>
						<class>
							<title><xslidy:title/></title>
							<date><xslidy:date form="short"/></date>
							<xslidy:if-toc class="abstract"><description><xslidy:toc class="abstract"/></description></xslidy:if-toc>
							<resourceList>
								<resource>
									<title>Lecture Notes</title>
									<url><xslidy:presentation-link element="" prefix="http://dret.net/lectures/xml-fall07/"/></url>
								</resource>
								<xslidy:if-toc class="resources"><resource><comment><xslidy:toc class="resources"/></comment></resource></xslidy:if-toc>
							</resourceList>
						</class>
					</xslidy:for-each-presentation>
				</classes>
			</syllabus>
			<updated>
				<updateDate>Fall 2007</updateDate>
				<updateBy>dret</updateBy>
			</updated>
		</course>
	</toc>
	<presentation id="intro">
		<title short="Introduction">Overview and Introduction</title>
		<date>2007-08-28</date>
		<toc class="resources"><a href="http://www.w3.org/Press/1998/XML10-REC">XML 1.0 Press Release</a></toc>
		<toc class="abstract">The <em>Extensible Markup Language (XML)</em> was introduced in 1998 to enable content providers to publish their content on the Web in an application-specific format. HTML was considered to convey too little semantics, since its only purpose was (and is) the preparation of content for Web-based publishing. XML was the first step towards machine-readable data formats for the Web, a trend that has since been taken further with the idea of the <em>Semantic Web</em>. XML appeared when the Web was in the steepest part of its success curve, and has since become the globally accepted format for the exchange of machine-readable structured data.</toc>
		<slide>
			<title>Abstract</title>
			<p class="abstract"><toc class="abstract"/></p>
		</slide>
		<slide>
			<title>XML Executive Summary</title>
			<ul>
				<li>More and more value shifts from goods to information</li>
				<li>Information sharing needs well-defined structures</li>
				<li>Business agility and flexibility are critical success factors</li>
				<li>Standardized formats prevent lock-in and incompatibilities</li>
				<li>XML is the most successful format for structured data</li>
				<li>XML technologies are widely used and universally available</li>
				<li>XML for B2B enables better workflow engineering</li>
				<li>XML for B2C is a good interface between B2B and Web interfaces</li>
				<li><em>XML is a mission-critical success factor for optimizing ROI and minimizing interoperability risks in today's fast-moving globalized fragmented business landscape …</em></li>
			</ul>
		</slide>
		<slide>
			<title>What's the Plan?</title>
			<ul>
				<li><link href="basics">XML Basics</link> and <link href="bestpractices">how to apply them</link></li>
				<li><link href="dtd">Describing classes of XML documents</link></li>
				<li><link href="xmlns">Combining different vocabularies of XML documents</link></li>
				<li><link href="xpath">Selecting parts of an XML document</link></li>
				<li><link href="xslt-1">Transforming XML into something else (or XML again)</link></li>
				<li><link href="xsdl-1">A more complicated way to describe classes of XML documents</link></li>
				<li><link href="schemalanguages">Even more ways of describing classes of XML documents</link></li>
				<li><link href="xquery-1">How does all of this relate to databases?</link></li>
				<li><link href="trends">What to expect as future developments</link></li>
			</ul>
		</slide>
		<slide>
			<title>What are we doing?</title>
			<img src="altova-partner.gif" style="float : right ; margin : 1em ; " href="http://www.altova.com/" title="Altova XML Spy"/>
			<ul>
				<li>Assignments</li>
				<ul>
					<li>blogs as the common theme (the perfect XML application example)</li>
					<li>how to create an XML document representing a blog</li>
					<li>how to write a schema describing this document's structure</li>
					<li>how to select parts of the blog (posts, titles, comments, …)</li>
					<li>how to transform blogs (into HTML, RSS, Atom, …)</li>
					<li>how to extract blog information from an XML database</li>
				</ul>
				<li>Tools</li>
				<ul>
					<li>XML editor such as <a href="http://www.altova.com/">Altova XML Spy</a> (XSLT and XQuery included)</li>
					<li>XSLT Processor such as <a href="http://www.saxonica.com/">Saxon</a></li>
					<li>XQuery Processor such as <a href="http://www.saxonica.com/">Saxon</a></li>
					<li>XML database such as <a href="http://www.marklogic.com/">MarkLogic</a> or <a href="http://exist.sourceforge.net/">eXist</a></li>
				</ul>
			</ul>
		</slide>
		<part>
			<title>Varia</title>
			<slide>
				<title>About Me</title>
				<ul>
					<li>Computer Science at <a href="http://www.tu-berlin.de/eng/">Technical University of Berlin (TUB)</a> (88-91)</li>
					<li>Ph.D. at <a href="http://www.ethz.ch/index_EN">ETH Zürich</a> (92-97)</li>
					<li>Post-Doc at <a href="http://www.icsi.berkeley.edu/" title="International Computer Science Institute">ICSI</a>, Berkeley (97/98)</li>
					<li>Various activities back in Switzerland (98-06)</li>
					<ul>
						<li>teaching at <a href="http://www.ethz.ch/index_EN">ETH Zürich</a> and <a href="http://www.fhnw.ch/">FHNW</a></li>
						<li>working as independent consultant (training, courses, consulting)</li>
						<li>research in <a href="http://dret.net/projects/">various XML-related areas</a></li>
					</ul>
					<li>Visiting Assistant Professor at the <a href="http://ischool.berkeley.edu/">School of Information</a> (since fall 2006)</li>
					<ul>
						<li>technical director of the <a href="http://isd.ischool.berkeley.edu/">Information and Service Design (ISD) program</a></li>
					</ul>
				</ul>
			</slide>
			<slide>
				<title>About this Course</title>
				<ul>
					<li>Course Web page: <code><a href="./">http://dret.net/lectures/xml-fall07/</a></code></li>
					<li>Course mailing list: subscribe at <code><a href="mailto:majordomo@ischool.berkeley.edu">majordomo@ischool.berkeley.edu</a></code></li>
					<ul>
						<li>no subject (leave blank)</li>
						<li>body of message: <code>subscribe i242</code></li>
					</ul>
					<li>Letter grade based on final exam (30-minute oral exam)</li>
				</ul>
			</slide>
			<slide>
				<title>About these Slides</title>
					<ul>
						<li>Generated from <a href="http://dret.net/projects/xslidy/">XSLidy</a> <a href="./xml-fall07.xml">XML</a></li>
						<ul>
							<li>all <a href="http://www.w3.org/Talks/Tools/Slidy/">Slidy</a> presentations are generated from this source</li>
							<li><code><a href="./242.xml">242.xml</a></code> for importing the syllabus into <a href="http://rosetta.sims.berkeley.edu:8085/sylvia/f07/view/242.complete">SylViA</a></li>
							<li><code><a href="./toc.html">toc.html</a></code> for displaying the summary on the <a href="./">course's Web page</a></li>
						</ul>
						<li>Designed for online presentation and use (lots of links!)</li>
						<ul>
							<li>for printing, use <q>a</q> (all slides), and then <q>s</q> (smaller font) a couple of times</li>
						</ul>
						<li>A good real-world example for XML applications</li>
						<ul>
							<li>XSLidy is useful, but there is no interface (XML editing only)</li>
							<li>SylViA is useful, but there is no interface (XML editing or XSLidy export)</li>
						</ul>
					</ul>
			</slide>
			<slide>
				<title>Additional Resources</title>
				<ul>
					<li>My <a href="http://dret.net/glossary/">Online Glossary at <code>http://dret.net/glossary/</code></a></li>
						<ul>
							<li>suggestions, updates, corrections are very welcome (really!)</li>
							<li>XML-based and XSLT-generated HTML pages</li>
						</ul>
					<li>My <a href="http://dret.net/biblio/">bibliography at <code>http://dret.net/biblio/</code></a></li>
						<ul>
							<li>suggestions, updates, corrections are very welcome (really!)</li>
						</ul>
					<li>The <a href="http://www.w3.org/"><em>World Wide Web Consortium (W3C)</em></a></li>
					<ul>
						<li>the organization which invented XML</li>
						<li>as well as (almost) all other technologies covered in this course</li>
					</ul>
				</ul>
			</slide>
		</part>
		<part>
			<title>What is XML?</title>
			<slide>
				<title>XML Yin &amp; Yang</title>
				<img src="yin-yang.png" style="float : right ; margin : 1em ; "/>
				<ul>
					<li>XML is …</li>
					<ul>
						<li>… great for exchanging trees (if this is what you want to do)</li>
						<li>… platform-independent (even your mobile phone processes XML)</li>
						<li>… a foundation for other technologies (some of which we will look at)</li>
					</ul>
				</ul>
				<ul>
					<li>XML is not …</li>
					<ul>
						<li>… a programming language (ever programmed comma-separated values?)</li>
						<li>… capturing semantics (without higher-layer consensus, XML is worthless)</li>
						<li>… ensuring interoperability (we both use bits! we can interoperate!)</li>
					</ul>
				</ul>
			</slide>
			<part>
				<title>What is XML Good for?</title>
				<slide>
					<title>Why Use XML?</title>
					<ul>
						<li>Because you want to share data</li>
						<ul>
							<li>share it in a format which is widely used and easy to use</li>
							<li>enable others to use it on various platforms with existing tools</li>
						</ul>
						<li>Because you want to share data cheaply</li>
						<ul>
							<li>it is easier to use XML than to invent something new</li>
							<li>it is even easier to use an existing XML schema than to invent a new one</li>
						</ul>
						<li>Because you want to share data openly</li>
						<ul>
							<li>if you invent new formats, people must process them</li>
							<li>avoid applying the <q>security through obscurity</q> principle inadvertently</li>
							<li>application-specific processing should be deferred to higher layers</li>
						</ul>
					</ul>
				</slide>
				<slide>
					<title>Is XML Self-Describing?</title>
					<ul>
						<li>XML is often said to be <q>self-describing</q></li>
						<ul>
							<li>many people think this is the same as <q>self-explanatory</q></li>
							<li>the catch is what exactly it is you refer to by <q>describing</q></li>
						</ul>
						<li>Database data cannot live without a database</li>
						<ul>
							<li>database data is simply content, the structure is provided by a DBMS</li>
							<li>XML documents have their structure encoded within them</li>
							<li>compared to database data, XML in fact is <q>self-describing</q></li>
						</ul>
						<li>What is the gap between <q>self-describing</q> and <q>self-explanatory</q>?</li>
						<ul>
							<li>it is impossible to find out how the document could be modified</li>
							<li>there are no semantics associated with either structure or content</li>
							<li>so <q>self-describing</q> means you can guess a lot, but you may be wrong</li>
						</ul>
					</ul>
				</slide>
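				<slide>
					<title>Self-Describing vs. Self-Explanatory: A Sketch</title>
					<p>A minimal sketch of the gap between <q>self-describing</q> and <q>self-explanatory</q>. The element and attribute names are hypothetical and not taken from any real vocabulary:</p>
					<pre>&lt;!-- hypothetical vocabulary, for illustration only --&gt;
&lt;order id="4711"&gt;
  &lt;item code="A7"&gt;3&lt;/item&gt;
&lt;/order&gt;</pre>
					<ul>
						<li>The structure can be read from the document itself</li>
						<ul>
							<li>an <code>order</code> element with an <code>id</code> attribute contains one <code>item</code> element</li>
						</ul>
						<li>The meaning and the rules cannot</li>
						<ul>
							<li>is <q>3</q> a quantity, a price, or a priority?</li>
							<li>may an <code>order</code> contain more than one <code>item</code>? is <code>code</code> required?</li>
						</ul>
					</ul>
				</slide>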
			</part>
			<part>
				<title>What is XML not Good for?</title>
				<slide>
					<title>XML is Character-Based</title>
					<ul>
						<li>XML is <u>not</u> a binary format, it is <link href="unicode">based on Unicode</link></li>
						<ul>
							<li><q>binary structures</q> cannot (or rather should not) be described using XML</li>
						</ul>
						<li>Multimedia formats often are binary</li>
						<ul>
							<li>image formats such as GIF, JPEG, and PNG</li>
							<li>audio formats such as MP3 and AAC</li>
							<li>video formats such as MPEG4 and H.264</li>
						</ul>
						<li>But: multimedia also uses many XML formats</li>
						<ul>
							<li>vector graphics formats such as <em>Scalable Vector Graphics (SVG)</em></li>
							<li><em>Synchronized Multimedia Integration Language (SMIL)</em> for describing presentations</li>
						</ul>
					</ul>
				</slide>
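				<slide>
					<title>Graphics as Characters: A Sketch</title>
					<p>As a small illustration of an XML-based multimedia format, the following SVG document describes a red circle purely in characters. The sizes and the color are arbitrary values chosen for this sketch:</p>
					<pre>&lt;!-- arbitrary example values --&gt;
&lt;svg xmlns="http://www.w3.org/2000/svg" width="100" height="100"&gt;
  &lt;circle cx="50" cy="50" r="40" fill="red"/&gt;
&lt;/svg&gt;</pre>
					<ul>
						<li>A raster image of the same circle (GIF, JPEG, PNG) would be stored as binary pixel data instead</li>
					</ul>
				</slide>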
				<slide>
					<title>XML is a Syntax for Trees</title>
					<ul>
						<li>Not all data is easily represented by trees</li>
						<ul>
							<li>overlapping markup (multiple <q>views</q> of the same content)</li>
							<li>graph-like structures which are less constrained than trees</li>
						</ul>
						<li>What is it that you have in your tree?</li>
						<ul>
							<li>XML encodes a structure purely on the syntactic level</li>
							<li>what the structures <u>mean</u> is in no way described by XML</li>
							<li>XML structures must be accompanied by semantic descriptions</li>
						</ul>
					</ul>
				</slide>
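				<slide>
					<title>Overlapping Markup: A Sketch</title>
					<p>A sketch of why overlapping <q>views</q> do not fit into a tree, using hypothetical element names (<code>line</code>, <code>s</code>, <code>sStart</code>, <code>sEnd</code>). One view marks lines, the other marks a sentence that crosses a line boundary:</p>
					<pre>&lt;!-- NOT well-formed: s starts in one line and ends in the next --&gt;
&lt;line&gt;The sun went down. &lt;s&gt;The wind&lt;/line&gt;
&lt;line&gt;picked up again.&lt;/s&gt;&lt;/line&gt;</pre>
					<p>One possible workaround (an assumption of this sketch, not the only option) keeps the lines as the tree and marks the sentence boundaries with empty elements:</p>
					<pre>&lt;!-- well-formed: the sentence boundaries are empty marker elements --&gt;
&lt;line&gt;The sun went down. &lt;sStart id="s1"/&gt;The wind&lt;/line&gt;
&lt;line&gt;picked up again.&lt;sEnd ref="s1"/&gt;&lt;/line&gt;</pre>
					<ul>
						<li>The sentence is no longer an element of its own, so working with it requires extra application logic</li>
					</ul>
				</slide>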
				<slide>
					<title>XML Usages</title>
					<ul>
						<li>XML can be used <link href="bestpractices">in different ways</link></li>
						<ul>
							<li>people should be able to use your XML directly using standard tools</li>
							<li>if they <em>absolutely need</em> a set of special tools, something is wrong</li>
						</ul>
						<li>XML is hip, so everybody wants to use it</li>
						<ul>
							<li>many things have been created ad-hoc and without much planning</li>
							<li>if you start something which is XML-based, use XML responsibly</li>
							<li>if you have to use some <q>bad XML</q>, complain about it</li>
						</ul>
						<li>Finding the balance can be hard</li>
						<ul>
							<li>XML is great for prototyping and experiments</li>
							<li>once you decide to redesign your XML, it may be too late</li>
							<li><em>XML documents</em> may be short-lived, <em>XML schemas</em> are definitely not</li>
						</ul>
					</ul>
				</slide>
			</part>
		</part>
		<part>
			<title>Why XML?</title>
			<slide>
				<title>Web Technologies</title>
				<ul>
					<li>Early Web: URI+HTTP+HTML</li>
					<ul>
						<li>URIs identify resources (in a human-readable way)</li>
						<li>HTTP retrieves resources (using a simple protocol)</li>
						<li>HTML is the resource format (using a simple data format)</li>
					</ul>
					<li>The early Web was a distributed hypermedia system</li>
					<ul>
						<li>not designed by hypermedia researchers or companies</li>
						<li>simple enough to be adopted very fast</li>
					</ul>
					<li>The Web today uses many different technologies</li>
					<ul>
						<li>URI+HTTP+HTML for basic Web publishing</li>
						<li>CSS &amp; JavaScript (maybe even Ajax) for advanced publishing</li>
					</ul>
					<li>JavaScript &amp; XML (a.k.a. Ajax)</li>
					<ul>
						<li>scripts dynamically loading data from a server</li>
						<li>machine-to-machine interaction: the server and the script</li>
					</ul>
				</ul>
			</slide>
			<slide>
				<title>From Humans to Machines</title>
				<ul>
					<li>The Web was designed for humans</li>
					<ul>
						<li>HTML is a language for describing page layout and links</li>
						<li>machines were only used for implementing it</li>
					</ul>
					<li>Search engines were the first machine users on the Web</li>
					<ul>
						<li>they made the Web's success possible</li>
						<li>they demonstrated how hard it is to <q>understand</q> HTML pages</li>
						<li>search engines are still a very active field of research</li>
					</ul>
					<li>A bigger Web needs more automation</li>
				</ul>
			</slide>
			<part>
				<title>Pre-XML Problems</title>
				<slide>
					<title>HTML is for Humans</title>
					<ul>
						<li>HTML is a format for <q>dead ends</q></li>
						<ul>
							<li>HTML is good for rendering Web pages</li>
							<li>HTML is bad for understanding Web pages</li>
							<li>the browser is a <q>dead end</q> (from a machine's point of view)</li>
						</ul>
						<li>Web growth in the late 1990s was enormous</li>
						<ul>
							<li>everybody was putting information <q>online</q></li>
							<li>but this information was inaccessible to machines</li>
						</ul>
						<li>How can this information be made accessible to machines?</li>
						<ul>
							<li>HTML is not the right format (slightly better than fax machines)</li>
							<li>there was no other widely accepted format for structured data</li>
						</ul>
					</ul>
				</slide>
				<slide>
					<title>A Machine-Friendly Web</title>
					<ul>
						<li>Information should be published in a machine-understandable format</li>
						<ul>
							<li>HTML is good for rendering Web pages</li>
							<li>HTML is bad for understanding Web pages</li>
							<li><q>understanding</q> is the key term here: <u>application</u> semantics!</li>
						</ul>
						<li>Information should be published in application-specific formats</li>
						<ul>
							<li>HTML is one application: rendering documents for humans</li>
							<li>machines need other structures to process Web content</li>
						</ul>
						<li>1996: W3C Working Group <q>SGML on the Web</q></li>
						<ul>
							<li>HTML is just one document type defined with SGML</li>
							<li>SGML is a very complex and expensive technology</li>
							<li>how can SGML be made easily and widely usable?</li>
						</ul>
					</ul>
				</slide>
			</part>
			<part>
				<title>XML on the Web</title>
				<slide>
					<title>SGML, HTML, and XML</title>
					<ul>
						<li>Standard Generalized Markup Language (SGML)</li>
						<ul>
							<li>a language for designing <em>document types</em></li>
							<li>a very complex standard with many expensive and non-interoperable implementations</li>
						</ul>
						<li>Hypertext Markup Language (HTML)</li>
						<ul>
							<li>implements <a href="http://www.w3.org/TR/REC-html40/sgml/loosedtd.html">a simple SGML <em>document type</em></a></li>
							<li>its syntax is <a href="http://www.oasis-open.org/cover/sgmlsyn/sgmlsyn.htm">SGML syntax</a>, it is not defined by HTML itself</li>
							<li>uses very few SGML features, dedicated processors are rather easy to build</li>
						</ul>
						<li>Extensible Markup Language (XML)</li>
						<ul>
							<li>a language for designing <em>document types</em> (i.e., classes of documents)</li>
							<li>a greatly simplified version of SGML, omitting many obscure features</li>
							<li>a specification with <u>no optional parts!</u></li>
						</ul>
					</ul>
				</slide>
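				<slide>
					<title>SGML Features in HTML: A Sketch</title>
					<p>A small illustration of SGML features that HTML 4 uses and XML leaves out. The list content is made up for this sketch:</p>
					<pre>&lt;!-- HTML 4 (SGML): li end tags may be omitted, simple attribute values may be unquoted --&gt;
&lt;ul type=square&gt;
  &lt;li&gt;first
  &lt;li&gt;second
&lt;/ul&gt;

&lt;!-- XML: every element is closed explicitly, every attribute value is quoted --&gt;
&lt;ul type="square"&gt;
  &lt;li&gt;first&lt;/li&gt;
  &lt;li&gt;second&lt;/li&gt;
&lt;/ul&gt;</pre>
				</slide>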
				<slide>
					<title>XML Documents on the Web</title>
					<ul>
						<li>XML's idea was that content should be published as XML</li>
						<ul>
							<li>stylesheets could then be used to render human-readable views</li>
							<li>machines could simply use the underlying XML</li>
						</ul>
						<li>There are (almost) no XML documents on the Web</li>
						<ul>
							<li>stylesheet support depends on browsers (software has a long life!)</li>
							<li>many content providers do not want to publish machine-readable data</li>
						</ul>
						<li>There are many XML documents behind HTML documents</li>
						<ul>
							<li>content does not have to be made public in a machine-readable way</li>
							<li>browser-independent HTML can be produced from XML</li>
							<li>XML technologies can be leveraged on the server-side</li>
						</ul>
					</ul>
				</slide>
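				<slide>
					<title>Linking a Stylesheet: A Sketch</title>
					<p>A sketch of the original idea: the XML is published as is, and a processing instruction points browsers that support XSLT to a stylesheet for the human-readable view. The <code>blog</code>, <code>post</code>, and <code>title</code> names, the values, and the choice of stylesheet file are assumptions made for this illustration:</p>
					<pre>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;?xml-stylesheet type="text/xsl" href="blog2html.xsl"?&gt;
&lt;!-- hypothetical document; element names and values are assumptions --&gt;
&lt;blog&gt;
  &lt;post date="2007-05-15"&gt;
    &lt;title&gt;First post&lt;/title&gt;
  &lt;/post&gt;
&lt;/blog&gt;</pre>
					<ul>
						<li>Machines can ignore the stylesheet link and work with the XML directly</li>
					</ul>
				</slide>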
				<slide>
					<title>XML Documents Elsewhere</title>
					<ul>
						<li>XML is not used as intended, but it is very successful</li>
						<ul>
							<li>as a server-side foundation for Web publishing</li>
							<li>as a B2B-focused format with no Web publishing in mind</li>
						</ul>
						<li>XML has been successful for several reasons</li>
						<ul>
							<li>being there at the right time (Internet bubble)</li>
							<li>politically correct (the W3C is OS-agnostic)</li>
							<li>technically sound (simple and no optional parts)</li>
							<li>human-readable based on a well-known syntax</li>
							<li>great for rapid prototyping and experiments</li>
						</ul>
					</ul>
				</slide>
			</part>
			<part>
				<title>XML Today</title>
				<slide>
					<title>Used Everywhere</title>
					<ul>
						<li>Very small: Messages from sensors</li>
						<ul>
							<li>e.g., building automation or car electronics</li>
							<li>mostly implemented in hardware or firmware</li>
						</ul>
						<li>Very large: Genome sequences</li>
						<ul>
							<li>encoding the results of genome analyses</li>
							<li>yields very large XML documents (several gigabytes)</li>
						</ul>
						<li>Very different processing requirements</li>
						<ul>
							<li>very fast processing (time-critical applications)</li>
							<li>memory-conserving processing (very large documents)</li>
							<li>incremental processing (streaming)</li>
							<li>random access (only small parts required)</li>
						</ul>
					</ul>
				</slide>
				<slide>
					<title>This Course and XML</title>
					<ul>
						<li><q>XML is the ASCII for the 21<sup>st</sup> century</q></li>
						<ul>
							<li>information professionals should know and use XML</li>
							<li>you will see it in many projects</li>
							<li>you will hopefully use it in many projects</li>
							<li>you will be able to build and test prototypes very rapidly</li>
						</ul>
						<li>What do you need for using XML?</li>
						<ul>
							<li>XML and some kind of schema language</li>
							<li>XSLT for processing it</li>
						</ul>
					</ul>
				</slide>
			</part>
		</part>
		<part>
			<title>Beyond XML</title>
			<slide>
				 <title>Sharing Concepts</title>
				 <ul>
					<li>XML is a syntax for trees</li>
					 <ul>
						<li>trees are just structured data</li>
						<li>for doing something useful, you must <em>understand the trees</em></li>
					 </ul>
					<li>Schema-based sharing of concepts is possible</li>
					 <ul>
						<li>HTML works great because everybody is using it</li>
						<li>anything beyond HTML's capabilities needs a new schema</li>
					 </ul>
					<li>General sharing of concepts is hard</li>
					 <ul>
						<li>the AI community tried for decades and failed</li>
						<li>micro-formats are a more humble approach to <q>reusable shared concepts</q></li>
						<li>agreement in communities gets exponentially harder with their size</li>
					 </ul>
				 </ul>
			</slide>
			<slide id="intro-semweb">
				<title>The Semantic Web</title>
				<ul>
					<li>Technologies for describing concepts</li>
					<ul>
						<li>the foundation of successful interaction is <em>mutual understanding</em></li>
						<li>describe your XML using Semantic Web technologies</li>
					</ul>
					<li>XML core technologies do not convey any meaning</li>
					<ul>
						<li>XML is a language for exchanging trees</li>
						<li>XML schema languages describe what trees may be exchanged</li>
						<li>XML schema languages are for <em>markup design</em></li>
					</ul>
					<li>Semantic Web technologies have received a lot of attention</li>
					<ul>
						<li>and a lot of research funding</li>
						<li>success for the most general approaches is questionable</li>
						<li>proven failure as demonstrated by <a href="http://technetcast.ddj.com/tnc_play_stream.html?stream_id=526">AI's failure</a></li>
						<li>modest approaches are much more promising and likely to succeed</li>
					</ul>
				</ul>
			</slide>
		</part>
	</presentation>
	<presentation id="blogxml">
		<title>Blogging in XML</title>
		<date>2007-08-30</date>
		<toc class="resources"></toc>
		<toc class="abstract">XML is used in a wide variety of application scenarios, resulting in a wide variety of requirements. This lecture introduces the application example used in this course, which is the representation of blog data in XML. Blogs are a good example for XML because of their mix of structured data (blog post metadata) and textual data (the actual blog post), the requirement to derive different views (such as weekly and monthly summaries) from the same set of data, and the requirement to make the data available in various output formats (such as HTML and RSS).</toc>
		<slide>
			<title>Abstract</title>
			<p class="abstract"><toc class="abstract"/></p>
		</slide>
		<part id="xmlblogging-xml">
			<title>BlogXML</title>
			<slide>
				<title>Blog Structures</title>
				<ul>
					<li>Blogs have a number of recurring features</li>
					<ul>
						<li>they are a sequentially ordered series of blog posts</li>
						<li>a blog has an owner and a permanent URI</li>
						<li>posts have a date and content</li>
						<li>content can be anything from plain text to complex HTML structures</li>
					</ul>
					<li>Blog posts can be regarded as individual documents</li>
					<ul>
						<li>the complete blog (the collection of all posts) is one <em>big compound document</em></li>
						<li>for advanced publishing, the blog is more useful than isolated posts</li>
					</ul>
				</ul>
			</slide>
			<slide>
				<title><code>dretblog.xml</code></title>
				<listing src="dretblog.xml"/>
			</slide>
		</part>
		<part id="xmlblogging-dtd">
			<title>Rules for BlogXML</title>
			<slide>
				<title>Structural Constraints</title>
				<listing src="blogxml.dtd"/>
			</slide>
			<slide>
				<title>Adding Datatype Constraints</title>
				<listing src="blogxml.xsd"/>
			</slide>
			<slide>
				<title>A Clearer View</title>
				<img style="width : 90% ; margin : 4% ;" src="blogxml-xsd.png" title="BlogXML XSDL"/>
			</slide>
			<slide>
				<title>Fewer Constraints</title>
				<img style="width : 90% ; margin : 4% ;" src="blogxml-xsd-unbounded.png" title="BlogXML XSDL (Repeatable Images)"/>
			</slide>
		</part>
		<part id="xmlblogging-xpath">
			<title>Selecting BlogXML Content</title>
			<slide>
				<title>Using XML Structures</title>
				<ul>
					<li>How many blog posts?</li>
					<pre>count(//post)</pre>
					<li>The title of the second post?</li>
					<pre>//post[2]/title</pre>
					<li>How many days after the preceding post?</li>
					<pre>for $i in //post return days-from-duration(xs:date($i/@date) - xs:date($i/preceding-sibling::post[1]/@date))</pre>
					<li>How many days before the last post?</li>
					<pre>for $i in //post return days-from-duration(xs:date(//post[last()]/@date) - xs:date($i/@date))</pre>
				</ul>
			</slide>
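			<slide>
				<title>Using XML Structures: A Worked Sketch</title>
				<p>To make the expressions on the previous slide concrete, assume the following small document. It is a hypothetical stand-in, not the actual <code>dretblog.xml</code>: only <code>post</code>, <code>title</code>, and the <code>date</code> attribute are taken from the expressions, while the <code>blog</code> root element and all values are made up:</p>
				<pre>&lt;!-- hypothetical stand-in document; root element and values are assumptions --&gt;
&lt;blog&gt;
  &lt;post date="2007-05-15"&gt;&lt;title&gt;First post&lt;/title&gt;&lt;/post&gt;
  &lt;post date="2007-05-18"&gt;&lt;title&gt;Second post&lt;/title&gt;&lt;/post&gt;
  &lt;post date="2007-05-25"&gt;&lt;title&gt;Third post&lt;/title&gt;&lt;/post&gt;
&lt;/blog&gt;</pre>
				<ul>
					<li>Evaluated against this document with an XPath 2.0 processor</li>
					<ul>
						<li><code>count(//post)</code> would return <code>3</code></li>
						<li><code>//post[2]/title</code> would select the <code>title</code> element containing <q>Second post</q></li>
						<li>the <q>days after the preceding post</q> expression would return <code>(3, 7)</code>; the first post has no preceding sibling and contributes nothing</li>
						<li>the <q>days before the last post</q> expression would return <code>(10, 7, 0)</code></li>
					</ul>
				</ul>
			</slide>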
		</part>
		<part id="xmlblogging-html">
			<title>Publishing BlogXML</title>
			<slide>
				<title>Generating HTML from BlogXML</title>
				<listing src="blog2html.xsl"/>
			</slide>
			<slide>
				<title>One Page Blog</title>
				<listing src="dretblog.html" line="1-10"/>
			</slide>
			<slide>
				<title>Generating a Blog from BlogXML</title>
				<listing src="blog2html2.xsl"/>
			</slide>
			<slide>
				<title>Hyperlinked Blog</title>
				<listing src="dretblog2.html"/>
				<listing src="2007-05-15.html"/>
			</slide>
		</part>
		<part id="xmlblogging-atom">
			<title>Syndicating BlogXML</title>
			<slide>
				<title>Generating Atom from BlogXML</title>
				<listing src="blog2atom.xsl"/>
			</slide>
			<slide>
				<title>dretblog Atom Feed</title>
				<listing src="dretblog.atom" line="2-26"/>
			</slide>
		</part>
		<part id="xmlblogging-xdbms">
			<title>Managing BlogXML</title>
			<slide>
				<title>Files vs. Databases</title>
				<ul>
					<li>XML typically is managed in documents</li>
					<ul>
						<li>XML has its roots in the document processing area</li>
					</ul>
					<li>What is the best granularity for XML documents?</li>
					<ul>
						<li>each post as a document that is individually managed</li>
						<li>each blog as a document that is individually managed</li>
						<li>all blogs as one big document containing all data</li>
						<li>additional data such as users, groups, access rights, …</li>
					</ul>
				</ul>
			</slide>
		</part>
		<part id="xmlblogging-conclusions">
			<title>Conclusions</title>
			<slide>
				<title>XML Blogs!</title>
				<ul>
					<li>XML as the starting point for handling structured data</li>
					<li>Designing a data model (and the schema) is a key issue</li>
					<li>Working with XML is supported by various technologies</li>
					<li>Transformations produce new structures for data reuse</li>
					<li>Large amounts of XML data can be stored in XML databases</li>
				</ul>
			</slide>
		</part>
	</presentation>
	<presentation id="basics">
		<title short="Basics">XML Basics</title>
		<date>2007-09-04</date>
		<toc class="resources"><a href="http://www.w3.org/TR/REC-xml/" title="W3C XML 1.0 Specification">Spec</a></toc>
		<toc class="abstract">The <em>Extensible Markup Language (XML)</em> defines a simple way for structuring data. The power and popularity of XML can be explained by its versatility, its platform independence, the standards and technologies leveraging it, and the number of tools and products supporting it. Understanding XML itself is rather simple because it depends only on a very small set of other technologies. Unicode and URIs are the most important foundations of XML. XML itself specifies two different things: on the one hand the format for structured data, which are called <em>XML documents</em>, and on the other hand a constraint language for XML documents, which is called <em>Document Type Definition (DTD)</em>.</toc>
		<slide>
			<title>Abstract</title>
			<p class="abstract"><toc class="abstract"/></p>
		</slide>
		<part>
			<title>Foundations for XML</title>
			<slide>
				<title>Identifications</title>
				<ul>
					<li>Identification of Character Encodings</li>
					<ul>
						<li>text can be encoded using different character sets and encodings</li>
						<li>IANA maintains the <a href="http://www.iana.org/assignments/character-sets">official list of character encodings</a></li>
						<li>character encoding is about <em>characters</em>, not about <em>text</em></li>
					</ul>
					<li>Identification of Languages</li>
					<ul>
						<li>textual content should be tagged with language information</li>
						<li>specification based on <a href="http://www.loc.gov/standards/iso639-2/langhome.html">ISO 639 language tags</a></li>
						<li>language identification is about <em>text</em>, not about <em>characters</em></li>
					</ul>
				</ul>
			</slide>
			<part id="unicode">
				<title>Unicode</title>
				<slide>
					<title>XML's Idea of Content and Names</title>
					<p>XML documents can use a wide array of characters. They are defined by <a href="http://www.unicode.org/">Unicode</a>, which currently (Version 5.0) defines more than 100,000 characters (character number 100,000 was added in 2005).</p>
					<listing src="japanese1.xml"/>
					<listing src="japanese2.xml"/>
				</slide>
				<slide>
					<title>XML and Unicode</title>
					<ul>
						<li>XML is based on Unicode</li>
						<ul>
							<li>XML is defined in terms of <a href="http://www.w3.org/TR/xml/#sec-starttags">character structures</a></li>
							<li>how these characters are encoded is not part of XML</li>
						</ul>
						<li>How are XML documents encoded?</li>
						<ul>
							<li>applications can use any character encoding they like</li>
							<li>XML processors <em>must</em> support UTF-8 and UTF-16</li>
							<li>XML processors <em>may</em> support any number of additional encodings</li>
						</ul>
						<li>How is the encoding <q>encoded</q>?</li>
						<ul>
							<li>part of the XML document: <code>&lt;?xml version="1.0" encoding="UTF-8"?></code></li>
							<li>bootstrap problem solved heuristically or by out-of-band information</li>
						</ul>
					</ul>
				</slide>
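				<slide>
					<title>Declaring an Encoding</title>
					<p>A minimal sketch of an encoding declaration naming an encoding other than UTF-8 or UTF-16. The <code>greeting</code> element is a made-up example, and a processor is not required to support ISO-8859-1. Independent of the encoding, the same character could also be written as the character reference <code>&amp;#252;</code>.</p>
					<pre><![CDATA[<?xml version="1.0" encoding="ISO-8859-1"?>
<greeting>Grüezi</greeting>]]></pre>
					<p>The declaration only names the encoding; the file still has to be stored in that encoding for the document to be read correctly.</p>
				</slide>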
			</part>
			<part id="uri">
				<title>Uniform Resource Identifier (URI)</title>
				<slide>
					<title>Identifiers are Essential</title>
					<ul>
						<li><em>Uniform Resource Locator (URL)</em> is the old concept</li>
						<ul>
							<li>introduced to distinguish between <em>locating</em> and <em>naming</em></li>
							<li><em>locating</em> and <em>naming</em> are two ways of <em>identification</em></li>
							<li>URLs have been replaced by URIs; technically, URLs do not exist anymore</li>
						</ul>
						<li>URIs identify resources</li>
						<ul>
							<li>some resources may be retrieved using a protocol: <code href="http://dret.net/netdret/">http://dret.net/netdret/</code></li>
							<li>not all resource access is retrieval: <code href="mailto:dret@berkeley.edu">mailto:dret@berkeley.edu</code></li>
							<li>sometimes computers are not required: <code href="tel:+1-510-6432253">tel:+1-510-6432253</code></li>
							<li>or resources cannot be located: <code href="urn:ietf:rfc:2648">urn:ietf:rfc:2648</code></li>
							<li>or location is the only means of identification: <code href="http://maps.google.com/maps?hl=en&amp;ie=UTF8&amp;om=1&amp;ll=27.988262,86.925277&amp;t=k">geo:27.988056;86.925278</code></li>
						</ul>
					</ul>
				</slide>
			</part>
		</part>
		<part>
			<title>XML</title>
			<slide>
				<title>XML Use Cases</title>
				<ul>
					<li>XML is a metalanguage supporting application-specific vocabularies</li>
					<li><em>RSS</em> (and <em>Atom</em>) are XML vocabularies for newsfeeds</li>
					<ul>
						<li><a href="http://docordie.blogspot.com/">Doc or Die</a>: <a href="http://docordie.blogspot.com/rss.xml">RSS feed</a> vs. <a href="http://docordie.blogspot.com/atom.xml">Atom feed</a></li>
						<li>browsers now incorporate and/or integrate newsfeed readers</li>
					</ul>
					<li><em>OpenDocument (ODF)</em> is a language for office application documents</li>
					<ul>
						<li>designed for open and interoperable exchange</li>
						<li>standardized by ISO (which now also standardizes Microsoft's <em>Open XML</em>)</li>
					</ul>
					<li><em>Scalable Vector Graphics (SVG)</em> for portable vector graphics</li>
					<ul>
						<li>designed for embedding in Web pages</li>
						<li>good example of compound documents: <a href="http://www.carto.net/papers/svg/animated_weather_symbols/">HTML containing SVG</a></li>
					</ul>
				</ul>
			</slide>
			<part>
				<title>XML Documents</title>
				<slide>
					<title>Markup?</title>
					<ul>
						<li>Structures are encoded using special characters</li>
						<ul>
							<li>a fundamental difference compared to binary formats</li>
							<li>markup languages can be read and modified using text-based tools</li>
							<li>programs must treat markup characters in a special way</li>
						</ul>
						<li>Documents are content interspersed with markup (i.e., structures)</li>
						<ul>
							<li>XML-aware software interprets the markup</li>
							<li>XML-unaware software just sees a text file</li>
							<li>modifications must be made XML-aware (e.g., inserting <q>AT&amp;T</q> as <q>AT&amp;amp;T</q>)</li>
						</ul>
						<li>You have to pay the <link href="markup-price"/></li>
					</ul>
				</slide>
				<slide>
					<title>Basic Concepts</title>
					<ul>
						<li>XML Documents have an <em>XML declaration</em> (optional)</li>
						<li>There is exactly one <em>document element</em> (a.k.a. <em>root element</em>)</li>
						<li>Elements may be nested (there is no conceptual limit)</li>
						<ul>
							<li>elements may be repeated (they can be identified by position)</li>
						</ul>
						<li>Elements are marked up using <em>tags</em></li>
						<ul>
							<li>most elements have content, surrounded by <em>start</em> and <em>end tags</em></li>
							<li>empty elements are allowed and may use a special notation</li>
						</ul>
						<li>Elements may have attributes (zero to any number)</li>
						<ul>
							<li>attributes can only occur once on an element (i.e., they cannot be repeated)</li>
						</ul>
					</ul>
					<listing src="my-first.xml"/>
				</slide>
				<slide id="xmltree">
					<title>Tree Syntax</title>
					<ul>
						<li>Markup is important, but only a notation</li>
						<li>XML documents are trees with different node types</li>
						<ul>
							<li>nodes so far: document, element, attribute, text</li>
						</ul>
						<img style="width : 90% ; margin : 4% ;" src="document-tree.png" title="XML document tree"/>
					</ul>
				</slide>
				<slide id="xmlelements">
					<title>Elements</title>
					<ul>
						<li>Elements can use a <a href="http://www.w3.org/TR/xml/#NT-Name">wide variety of names</a></li>
						<ul>
							<li>Allowed: <elem>html</elem>, <elem>id9832798472</elem>, <elem>_</elem>, <elem>:</elem>, <elem>こんにちは</elem></li>
							<li>Disallowed: leading digits, spaces, control characters</li>
						</ul>
						<li>Element names usually convey some information about the content</li>
						<ul>
							<li>this is not reliable and highly language-dependent</li>
							<li>it is <em>very useful</em> when working with a known vocabulary</li>
							<li>it is <em>potentially harmful</em> when working with an unknown vocabulary</li>
						</ul>
						<li>Elements are the foundation for XML's versatility</li>
						<ul>
							<li>they can be nested (<code>&lt;address>&lt;city>Berkeley&lt;/city>&lt;zip>94709&lt;/zip>…</code>)</li>
							<li>they can be repeated (<code>&lt;givenname>Erik&lt;/givenname>&lt;givenname>Thomas&lt;/givenname></code>)</li>
							<li>their sequence can convey additional information (given names have a sequence)</li>
						</ul>
					</ul>
				</slide>
				<slide>
					<title>Attributes</title>
					<ul>
						<li>Additional information pertaining to elements</li>
						<li>Traditionally, anything that is not considered <q>content</q></li>
						<ul>
							<li>SGML is a document markup language</li>
							<li>XML uses SGML's concepts</li>
							<li>XML has its roots in the document world</li>
						</ul>
						<li>Elements: Content (i.e., Data); Attributes: Metadata</li>
						<li>Documents often draw this distinction by what is textual content</li>
					</ul>
					<listing src="section.xml" line="12-20"/>
				</slide>
				<slide>
					<title>Attribute Syntax</title>
					<ul>
						<li>Naming rules are the same as for <link href="xmlelements"/></li>
						<li>Attributes always appear within an element's <em>start tag</em></li>
						<li>Attributes are <a href="http://www.w3.org/TR/xml/#NT-Attribute">name/value-pairs</a></li>
						<ul>
							<li>the value is enclosed in single or double quotes</li>
						</ul>
						<li>Attribute with a single-quote value: <elem>elem attr="Single: '"/</elem></li>
						<li>Attribute with a double-quote value: <elem>elem attr='Double: "'/</elem></li>
						<li>How can attribute values contain both?</li>
					</ul>
				</slide>
				<slide id="markup-price">
					<title>The Price for Markup</title>
					<ul>
						<li>Markup characters have a special meaning</li>
						<ul>
							<li><q>&lt;</q> opens a tag</li>
							<li>for attribute values, quotes delimit the value</li>
						</ul>
						<li>The literal use of a markup character requires escaping</li>
						<ul>
							<li>XML's <em>entities</em> can refer to pieces of content</li>
							<li>entity syntax is <code>&amp;name;</code> for referring to the entity <q><code>name</code></q></li>
							<li>XML has 5 <a href="http://www.w3.org/TR/xml/#sec-predefined-ent">predefined entities</a>: <code>&amp;lt;</code>, <code>&amp;gt;</code>, <code>&amp;amp;</code>, <code>&amp;apos;</code>, <code>&amp;quot;</code></li>
						</ul>
						<li>Attribute using both kinds of quotes: <code>&lt;elem attr="Single ' and Double &amp;quot;"/></code></li>
					</ul>
					<pre><![CDATA[<li>Attribute using both kinds of quotes: <code>&lt;elem attr="Single ' and Double &amp;quot;"/></code></li>]]></pre>
				</slide>
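				<slide>
					<title>Escaping in Practice</title>
					<p>A small sketch that uses all five predefined entities; the <code>quote</code> element and its <code>source</code> attribute are invented for illustration. In regular content and attribute values, <code>&amp;lt;</code> and <code>&amp;amp;</code> are the ones that cannot be avoided, while the quote entities mainly matter inside attribute values delimited by the same kind of quote.</p>
					<pre><![CDATA[<quote source="AT&amp;T &quot;annual&quot; report">
	Revenue &lt; costs &amp; nobody&apos;s happy: is 1 &gt; 2?
</quote>]]></pre>
				</slide>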
				<slide id="mixed-content">
					<title>Mixed Content</title>
					<p>The term <em>Mixed content</em> in XML refers to elements <a href="http://www.w3.org/TR/xml/#sec-mixed-content">which have text content mixed with elements</a>. What these elements do depends on the elements <img style="height : 1em" src="smiley.gif"/>, but the important point is that they are on the same level as the text nodes of the mixed content.</p>
					<pre><![CDATA[<p>The term <em>Mixed content</em> in XML refers to elements <a href="http://www.w3.org/TR/xml/#sec-mixed-content">which have text content mixed with elements</a>. What these elements do depends on the elements <img style="height : 1em" src="smiley.gif"/>, but the important point is that they are on the same level as the text nodes of the mixed content.</p>]]></pre>
					<img style="width : 90% ; margin : 4% ;" src="mixed-content.png" title="XML tree for mixed content"/>
				</slide>
				<slide>
					<title>Mixed Content Usage</title>
					<ul>
						<li>Database people find mixed content irritating</li>
						<ul>
							<li>cannot be easily mapped to relational structures</li>
							<li>is more <em>document-like</em> than <em>data-like</em></li>
							<li>much harder to optimize for query analysis and query processing</li>
						</ul>
						<li>Document people find mixed content very intriguing</li>
						<ul>
							<li>textual content can still be used as simple text</li>
							<li>markup provides additional information for rich text</li>
							<li>start with a text-only document and use markup to add structure to it</li>
						</ul>
					</ul>
				</slide>
				<slide id="whitespace">
					<title>Whitespace</title>
					<ul>
						<li>XML documents often are pretty-printed</li>
						<li><em>Whitespace text nodes</em> often are <q>not really content</q></li>
						<ul>
							<li>XML whitespace characters are <em>space</em>, <em>tab</em>, <em>newline</em>, and <em>carriage return</em></li>
							<li>whitespace text nodes are text nodes containing <em>only</em> whitespace characters</li>
						</ul>
						<img style="width : 90% ; margin : 4% ;" src="document-tree-whitespace.png" title="XML tree with whitespace text nodes"/>
					</ul>
				</slide>
				<slide>
					<title>Significant Whitespace</title>
					<ul>
						<li>Some whitespace text nodes are relevant</li>
						<li>Usually text nodes in <em>mixed content</em> elements</li>
					</ul>
					<p>Whitespace <i>can be</i> <u>very</u> <b>important</b>!</p>
					<pre><![CDATA[<p>Whitespace <i>can be</i> <u>very</u> <b>important</b>!</p>]]></pre>
					<img style="height : 40% ; margin : 2% ;" src="significant-whitespace.png" title="XML tree containing significant whitespace"/>
				</slide>
			</part>
			<part id="wellformed">
				<title>Processing XML</title>
				<slide>
					<title>Observing XML Syntax</title>
					<ul>
						<li>XML's syntax requires you to use the right characters</li>
						<ul>
							<li><a href="http://www.w3.org/TR/xml/#NT-element">the grammar alone</a> allows many XML errors</li>
							<li><a href="http://www.w3.org/TR/xml/#GIMatch">additional constraints</a> ensure that everything is used correctly</li>
						</ul>
						<li><em>XML processors</em> (a.k.a. <em>XML parsers</em>) check for these rules</li>
						<ul>
							<li>if there are problems, the document cannot be interpreted as XML</li>
							<li>otherwise, the document is said to be <em>well-formed</em></li>
						</ul>
						<li>Only well-formed documents can be regarded as a tree</li>
						<ul>
							<li>other documents are not XML at all, even though they may be close</li>
							<li>XML processors must report problems to the application (no <em>silent recovery</em>)</li>
						</ul>
					</ul>
				</slide>
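				<slide>
					<title>Grammar vs. Constraints</title>
					<p>Two sketches of inputs that are not well-formed (the element and attribute names are made up, and each snippet stands for a separate document). Both follow the general tag pattern described by the grammar, but each violates one of the additional well-formedness constraints, so a processor has to report an error instead of producing a tree.</p>
					<pre><![CDATA[<!-- first document: the end tag does not match the start tag -->
<city>Berkeley</town>

<!-- second document: the same attribute appears twice in one start tag -->
<city zip="94709" zip="94720">Berkeley</city>]]></pre>
				</slide>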
				<slide>
					<title>Validity</title>
					<ul>
						<li><em>Well-formed documents</em> observe XML rules</li>
						<ul>
							<li>they observe the XML syntax</li>
							<li>they observe all well-formedness constraints</li>
						</ul>
						<li>Applications require the right elements and attributes</li>
						<li><em>Validity</em> is a more comprehensive concept</li>
						<li><em>Valid documents</em> observe additional rules</li>
						<ul>
							<li>they must be well-formed documents</li>
							<li>they must adhere to the constraints defined in a <link href="dtd"/></li>
						</ul>
					</ul>
				</slide>
				<slide>
					<title>Semantics</title>
					<ul>
						<li>XML is a language for encoding trees</li>
						<ul>
							<li>elements and attributes are labeled nodes in this tree</li>
							<li>the labels can be chosen freely by document authors</li>
						</ul>
						<li>XML is not concerned with the tree's meaning</li>
						<ul>
							<li>peers must have a mutual understanding of the semantics</li>
							<li>XML without mutual understanding is almost useless</li>
							<li>reverse engineering often is possible, but it is risky and brittle</li>
						</ul>
					</ul>
				</slide>
			</part>
		</part>
		<part>
			<title>Conclusions</title>
			<slide>
				<title>XML Documents</title>
				<ul>
					<li>XML documents are structured data using markup</li>
					<li>Elements and Attributes are the main structuring mechanisms</li>
					<li>Elements and Attributes have names, but have no inherent semantics</li>
					<li>For using XML successfully, <em>shared semantics</em> are essential</li>
					<li><a href="a/1/">Assignment 1</a> asks you to think about semantics</li>
				</ul>
			</slide>
		</part>		
	</presentation>
	<presentation id="processing">
		<title>Processing XML</title>
		<date>2007-09-06</date>
		<toc class="resources"><a href="http://www.w3.org/DOM/" title="W3C DOM Home">DOM</a>&#160;· <a href="http://sax.sourceforge.net/">SAX</a></toc>
		<toc class="abstract">XML is a format for structured data, but it does not prescribe any way of processing these structures. In practice, XML data has to be processed by using XML-specific support in some programming environment. In this lecture, the most popular ways of processing XML data are discussed: the <em>Document Object Model (DOM)</em> as a tree-based data model, the <em>Simple API for XML (SAX)</em> as an event-based programming model, and <em>XSL Transformations (XSLT)</em> as a dedicated programming language for transforming XML.</toc>
		<slide>
			<title>Abstract</title>
			<p class="abstract"><toc class="abstract"/></p>
		</slide>
        <part id="xml-processing">
			<title>Processing XML</title>
			<slide>
				<title>XML and Programming</title>
				<ul>
					<li>XML is a format for structured data</li>
					<ul>
						<li>trees do not map very well to most programming languages</li>
						<li>for working with XML, some mapping into the language is required</li>
					</ul>
					<li>There are two basic approaches for programming with XML:</li>
					<ol>
						<li>use special functions to work on XML documents as external data objects</li>
						<li>map XML documents to native data structures of the programming language</li>
					</ol>
					<li>A third approach is to have an <q>XML programming language</q></li>
					<ul>
						<li><link href="xslt-1">XSLT</link> is an example of an XML programming language</li>
						<li><link href="xsdl-1">XSDL</link> and <link href="xpath">XPath</link> become an integral part of Java in <a href="http://www.research.ibm.com/xj/">XJ</a></li>
					</ul>
				</ul>
			</slide>
			<slide>
				<title>XML and Programming Languages</title>
				<ul>
					<li>Most programming languages do not support XML natively</li>
					<ul>
						<li>a certain impedance mismatch between both models is unavoidable</li>
					</ul>
					<li>Function libraries (or their equivalent) can provide XML processing facilities</li>
					<ul>
						<li><link href="sax">SAX</link> as an event-based API for accessing XML documents</li>
						<li><link href="dom">DOM</link> as a tree-based API for accessing XML documents</li>
					</ul>
					<li>Mapping between XML and the programming language can take two forms</li>
					<ul>
						<li>using hand-crafted code (based on XML functions) that performs the mapping</li>
						<li>generating code using an XML schema and/or target data structures in the language</li>
					</ul>
					<li>Generating mapping code can be done in two ways</li>
					<ul>
						<li>using a generic <link href="databinding"/> framework for the mapping</li>
						<li>using hand-crafted code that can be better tailored to the schemas</li>
					</ul>
				</ul>
			</slide>
			<slide>
				<title>Typical XML &amp; Programming Problem</title>
				<ul>
					<li><a href="../web-fall07/ajax">Asynchronous JavaScript and XML (Ajax)</a> is based on HTTP &amp; XML</li>
					<ul>
						<li>JavaScript code can communicate with the server using <code href="http://www.w3.org/TR/XMLHttpRequest/" title="W3C XMLHttpRequest Spec">XMLHttpRequest</code></li>
						<li>in theory, the server sends XML data which is processed by the script</li>
					</ul>
					<li>XML parsing and processing is inconvenient in JavaScript</li>
					<ul>
						<li>there is an impedance mismatch between JavaScript and XML</li>
						<li>if the client is slow and the XML is big, parsing can be time-consuming</li>
						<li>if all clients are JavaScript, sending XML is not really necessary</li>
					</ul>
					<li><em>JavaScript Object Notation (JSON)</em> is a JavaScript-centric data model</li>
					<ul>
						<li>JavaScript code can directly instantiate JSON structures as runtime objects</li>
						<li>any non-JavaScript client (if there are any) will have to use JSON as well</li>
					</ul>
				</ul>
			</slide>
        </part>
        <part id="sax">
			<title>Simple API for XML (SAX)</title>
			<slide>
				<title>Lightweight XML Processing</title>
				<ul>
					<li>SAX is an event-based API for accessing XML documents</li>
					<li>SAX allows users to use event handlers for parsing-related events</li>
					<ul>
						<li>the parser reads a document and recognizes markup structures</li>
						<li>for each recognized structure, a user-supplied function can be called</li>
					</ul>
					<li>SAX parsing requires little memory and can handle very large documents</li>
					<ul>
						<li>the breadth of the XML document tree is irrelevant to SAX parsing</li>
						<li>the depth of the tree is relevant for checking for well-formed documents</li>
					</ul>
					<li>SAX parsing does not allow random access or backward movement</li>
					<ul>
						<li>saving context and history is something the application has to manage</li>
						<li>at a certain complexity, SAX parsing requires a lot of additional code</li>
					</ul>
				</ul>
			</slide>
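			<slide>
				<title>SAX Events for a Small Document</title>
				<p>A sketch of the events a SAX parser would report for a tiny document (the address example is an assumption). The comments name the corresponding callbacks of the SAX <code>ContentHandler</code> interface and serve only as annotations here. The whitespace between tags is typically reported as character data as well, and a parser may split character data into several <code>characters</code> events.</p>
				<pre><![CDATA[<address><!-- startDocument, startElement "address" -->
<city>Berkeley</city><!-- startElement "city", characters "Berkeley", endElement "city" -->
<zip>94709</zip><!-- startElement "zip", characters "94709", endElement "zip" -->
</address><!-- endElement "address", endDocument -->]]></pre>
				<p>An application that needs to know that a zip code belongs to a particular address has to remember that context itself while the events arrive.</p>
			</slide>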
			<slide>
				<title>SAX Parser</title>
				<img style="width : 90% ; margin : 2% ; " src="sax-parser.png" title="SAX Parser"/>
			</slide>
        </part>
        <part id="dom">
			<title>Document Object Model (DOM)</title>
			<slide>
				<title>XML Trees Everywhere</title>
				<ul>
					<li>DOM is a tree-based API for accessing XML documents</li>
					<ul>
						<li>the specification uses a <a href="http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/idl-definitions.html">language-independent <em>Interface Definition Language (IDL)</em></a></li>
						<li><q>language bindings</q> map IDL to specific languages such as <a href="http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/java-binding.html">Java</a> or <a href="http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/ecma-script-binding.html">JavaScript</a></li>
					</ul>
					<li>DOM is based on an in-memory representation of an XML document</li>
					<ul>
						<li>random document access using <a href="http://www.w3.org/TR/DOM-Level-3-Core/core.html#ID-1950641247">the tree's node structure</a></li>
						<li>more specific tasks such as <a href="http://www.w3.org/TR/DOM-Level-3-Core/core.html#ID-217A91B8">getting an element's attribute by name</a></li>
					</ul>
					<li>DOM parsers have an additional layer for building the tree</li>
					<ul>
						<li>an underlying SAX parser reports structures for tree building</li>
						<li>the memory representation is heavily interlinked (requiring substantial memory)</li>
						<li>DOM calls query or modify the memory representation of the tree</li>
					</ul>
					<li>DOM processing is not appropriate for all tasks</li>
					<ul>
						<li>very large documents may not fit into memory (risk of <em>thrashing</em>)</li>
						<li>for isolated tasks, the parsing overhead is prohibitive</li>
					</ul>
				</ul>
			</slide>
			<slide>
				<title>DOM Parser</title>
				<img style="width : 90% ; margin : 2% ; " src="dom-parser.png" title="DOM Parser"/>
			</slide>
			<slide id="jdom">
				<title>JDOM</title>
				<ul>
					<li>DOM is not optimized for a specific programming language</li>
					<ul>
						<li>DOM knowledge can be easily transferred between programming languages</li>
						<li>programming with DOM in a given language often is not very convenient</li>
					</ul>
					<li>JDOM is a Java-specific version of a tree-based XML API</li>
					<ul>
						<li>represents the same concepts as DOM (XML structures)</li>
						<li>represents XML concepts <a href="http://www-128.ibm.com/developerworks/java/library/j-jdom/#h2">in a more Java-friendly way</a></li>
						<li>JDOM has no relationship with the W3C's DOM API</li>
					</ul>
					<li>JDOM can be built on top of almost any parser</li>
					<ul>
						<li>SAX is a pretty common choice for a foundation for JDOM</li>
						<li>SAX events are then used to build the JDOM tree</li>
					</ul>
				</ul>
			</slide>
        </part>
        <part id="databinding">
			<title>XML Data Binding</title>
			<slide>
				<title>Mapping XML into Languages</title>
				<ul>
					<li>XML data binding connects XML with language-specific structures</li>
					<ul>
						<li>for OO languages this often means mapping schemas and classes</li>
						<li>code for serialization and deserialization can then be generated</li>
					</ul>
					<li>Typical problems of data binding are schema changes</li>
					<ul>
						<li>if the schema is updated, can the code be migrated easily?</li>
						<li>can instances of different versions be handled by the same code?</li>
						<li>most data binding frameworks do not fully support XSDL anyway</li>
					</ul>
					<li>Several XML data binding frameworks are in widespread use</li>
					<ul>
						<li><a href="https://jaxb.dev.java.net/">Java Architecture for XML Binding (JAXB)</a></li>
						<li>Castor, another Java-based data binding framework</li>
					</ul>
				</ul>
			</slide>
        </part>
        <part id="xslt-intro">
			<title>XSL Transformations (XSLT)</title>
			<slide>
				<title>An XML Programming Language</title>
				<ul>
					<li>XSLT is not practical as a general-purpose programming language</li>
					<ul>
						<li>input and output are limited to handling documents (XML and plain text)</li>
						<li>system programming is not part of the language model</li>
					</ul>
					<li>XSLT is a very natural choice for XML-centric tasks</li>
					<ul>
						<li>XML is the data model of XSLT (technically, it now is <link href="xdm">XDM</link>)</li>
						<li>simple values use <link href="xsdl-1">XSDL</link>'s <link href="xsdl-simple-types"/></li>
						<li>structured values are XML trees</li>
					</ul>
					<li>XSLT and XML data binding do not always work well together</li>
					<ul>
						<li>XML data binding often is regarded as <q>just a serialization</q> of language structures</li>
						<li>in these cases, the XML is hard to use outside of the language where it originated</li>
						<li>these scenarios are a telling sign of <em>poor use of XML</em></li>
					</ul>
				</ul>
			</slide>
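			<slide>
				<title>A Minimal Stylesheet</title>
				<p>A minimal XSLT 1.0 sketch, not one of the stylesheets used in this course. It assumes an input document with an <code>address</code> document element containing <code>city</code> and <code>zip</code> children, and it produces a single HTML paragraph from them.</p>
				<pre><![CDATA[<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
	<xsl:output method="html"/>
	<!-- matches the document element of the assumed input -->
	<xsl:template match="/address">
		<p><xsl:value-of select="city"/>, <xsl:value-of select="zip"/></p>
	</xsl:template>
</xsl:stylesheet>]]></pre>
				<p>The stylesheet is itself an XML document, and both its input and the result it constructs are trees.</p>
			</slide>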
        </part>
        <part>
			<title>Conclusions</title>
			<slide>
				<title>Document Engineering</title>
				<ul>
					<li>Documents are more important than programs</li>
					<li>Programs must be able to easily work with documents</li>
					<li>XML APIs make XML structures available for programs</li>
					<li>XML data binding maps XML to language structures</li>
					<li>XSLT uses XML as its native data model</li>
				</ul>
			</slide>
			<slide>
				<title>Assignment 2</title>
				<ul>
					<li>Implement <a href="a/1/">Assignment 1</a> using simple XML structures</li>
					<li><a href="a/2/">Assignment 2</a> asks for a number of sample entries</li>
					<li>Conceptual models can be represented in XML in many different ways</li>
				</ul>
			</slide>
        </part>
	</presentation>
	<presentation id="dtd">
		<title short="DTD">Document Type Definition (DTD)</title>
		<date>2007-09-11</date>
		<toc class="resources"><a href="xml-quickref.pdf">XML QuickRef</a></toc>
		<toc class="abstract">The XML specification defines a format for structured data (XML documents) and a grammar-based constraint language for these (DTD). In SGML-based systems, DTDs were often very complex and feature-rich constructs, which controlled a lot of the processing of SGML documents. XML greatly simplified DTDs, and de-facto usage of DTDs today simplified them even more. In many systems today, DTDs are not used at all or generated from sample documents. In this lecture, it is argued that DTDs (or schemas, to be more general) should be taken seriously in any non-trivial XML application, because they are a representation of the underlying (and often underspecified) data model of the application.</toc>
		<slide>
			<title>Abstract</title>
			<p class="abstract"><toc class="abstract"/></p>
		</slide>
		<part>
			<title>Schema Languages</title>
			<slide>
				<title>XML Validation</title>
				<ul>
					<li>XML knows two <q>states</q> of documents, <em>well-formed</em> and <em>valid</em></li>
					<li><em>well-formed</em> documents satisfy all basic constraints of the XML specification</li>
					<ul>
						<li>they can be parsed according to the XML grammar</li>
						<li>they satisfy the additional constraints (e.g., start and end tags match)</li>
						<li>together, this means they can be translated into a <link href="xmltree">tree</link></li>
					</ul>
					<li><em>valid</em> documents have been validated against a DTD</li>
					<ul>
						<li>a document must be well-formed before it can be validated</li>
						<li>all elements and attributes have been defined</li>
						<li>elements and attributes are used according to their definition</li>
					</ul>
				</ul>
			</slide>
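			<slide>
				<title>Well-Formed vs. Valid: A Sketch</title>
				<p>A minimal sketch of a valid document that carries its declarations inside the document itself. The element names and the content model are assumptions for illustration and not the <code>address.dtd</code> used in this lecture.</p>
				<pre><![CDATA[<!DOCTYPE address [
	<!ELEMENT address (city, zip)>
	<!ELEMENT city (#PCDATA)>
	<!ELEMENT zip (#PCDATA)>
]>
<address><city>Berkeley</city><zip>94709</zip></address>]]></pre>
				<p>Swapping <code>city</code> and <code>zip</code> keeps the document well-formed, but it is no longer valid because the content model prescribes the order.</p>
			</slide>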
			<slide>
				<title>Validation and Applications</title>
				<img src="valid-documents.png" style="width : 90% ; margin : 4% ; "/>
			</slide>
			<slide>
				<title>Non-XML, Well-Formed, and Valid</title>
				<listing src="non.xml" line="3-9"/>
				<listing src="address-invalid.xml" line="3-9"/>
				<listing src="address-valid.xml" line="3-9"/>
			</slide>
			<slide>
				<title>DTD Example</title>
				<listing src="address-valid.xml" line="1-2"/>
				<listing src="address.dtd"/>
				<ul>
					<li>The DTD defines constraints on element and attribute usage</li>
					<li>The DTD only partially constrains textual content</li>
				</ul>
			</slide>
			<slide>
				<title>XML Schema Languages</title>
				<ul>
					<li>DTDs are part of XML itself</li>
					<ul>
						<li>XML specifies the document format <u>and</u> one schema language</li>
						<li>DTD support is provided by most XML processors (<a href="http://www.w3.org/TR/REC-xml/#proc-types" title="XML specification">validating processors</a>)</li>
					</ul>
					<li>Other schema languages are available</li>
					<ul>
						<li><link href="xsdl-1">XSDL</link> as the W3C's recommendation</li>
						<li><link href="schematron"/> as a rule-based alternative</li>
						<li>various <a href="http://dret.net/glossary/xmlschemalanguage" title="XML glossary">other research projects and products</a></li>
					</ul>
					<li>Choosing appropriate schema language(s) is important</li>
					<ul>
						<li>we look at DTDs because they are part of XML itself</li>
						<li>we look at XSDL because it is widely used</li>
						<li>we look at Schematron because it is simple and powerful</li>
						<li>you may even invent your own schema language (a.k.a. <em>meta-programming</em>)</li>
					</ul>
				</ul>
			</slide>
		</part>
		<part>
			<title>DTD Basics</title>
			<slide>
				<title>XML is SGML light</title>
				<ul>
					<li>XML is a subset of SGML</li>
					<ul>
						<li>XML documents have been greatly simplified</li>
						<li>XML DTDs have retained more of SGML's peculiarities</li>
					</ul>
					<li>DTD design should be left to XML experts</li>
					<ul>
						<li>simple DTDs (for prototypes) are easy to define (or generate)</li>
						<li>serious DTDs for complex data models are hard to define</li>
					</ul>
					<li>XML is a useful tool for experiments and prototypes</li>
					<ul>
						<li>basic knowledge of DTDs is required</li>
						<li>serious XML schemas often use <link href="xsdl-1">XSDL</link> anyway</li>
					</ul>
				</ul>
			</slide>
			<slide>
				<title>Connecting Documents and DTDs</title>
				<ul>
					<li>A DTD is a schema for a set of documents</li>
					<ul>
						<li>there may be just one document for a DTD, or there may be billions (HTML)</li>
						<li>in most cases, DTDs are managed as a separate resource</li>
					</ul>
					<li>The <a href="http://www.w3.org/TR/xml#sec-prolog-dtd"><em>Document Type Declaration</em></a> <q>contains or points to markup declarations that provide a grammar for a class of documents</q></li>
					<ul>
						<li>the part which is contained is called the <em>Internal Subset</em></li>
						<li>the part which is pointed to is called the <em>External Subset</em></li>
						<li>internal and external subset together are the <em>Document Type Definition (DTD)</em></li>
					</ul>
					<li>External subsets are identified by <em>Public</em> and <em>System Identifiers</em></li>
					<ul>
						<li><em>public identifiers</em> use a special notation</li>
						<li><em>system identifiers</em> are URIs (relative or absolute)</li>
						<li>applications use (i.e., know or retrieve) the DTD for validation</li>
					</ul>
				</ul>
				<listing src="address-valid.xml" line="1-2"/>
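				<p>As a minimal sketch of how the two subsets combine, a document type declaration may point to an external subset and contain an internal subset at the same time. The public identifier, the file name <code>address.dtd</code> as used here, and the entity <code>office</code> are hypothetical and only serve as illustration:</p>
				<pre><![CDATA[<!DOCTYPE address PUBLIC "-//Example//DTD Address 1.0//EN" "address.dtd" [
  <!-- internal subset: declarations contained in the document itself -->
  <!ENTITY office "South Hall">
]>]]></pre>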
			</slide>
			<part>
				<title>DTD Syntax</title>
				<slide>
					<title>DTDs are not XML Documents</title>
					<ul>
						<li>DTDs use a special syntax</li>
						<ul>
							<li>somewhat ironic when everything else is XMLized</li>
							<li>DTDs cannot be processed with standard XML tools</li>
							<li>more compact than XML syntax</li>
						</ul>
						<li>Definition of elements and attribute lists</li>
						<ul>
							<li>elements are defined by the content they allow</li>
							<li>attribute lists are sets of allowed attributes on elements</li>
						</ul>
					</ul>
					<listing src="address.dtd"/>
				</slide>
				<slide>
					<title>Syntax Rules</title>
					<ul>
						<li>There is no container containing the whole DTD</li>
						<ul>
							<li><code>&lt;!ELEMENT xml EMPTY></code> thus is a complete DTD</li>
						</ul>
						<li>Definitions (officially called <em>declarations</em>) use <code>&lt;!… ></code> syntax</li>
						<ul>
							<li><code>ELEMENT</code> is used to <link href="dtd-element">define an element</link></li>
							<li><code>ATTLIST</code> is used to <link href="dtd-attlist">define an attribute list</link></li>
							<li><code>ENTITY</code> is used to <link href="dtd-entity">define an entity</link></li>
						</ul>
						<li>The document element is not marked explicitly</li>
						<ul>
							<li>but it must be declared in the document type declaration</li>
							<li>this means the document element is defined by the document, not by the DTD</li>
						</ul>
					</ul>
					<listing src="address-valid.xml" line="1-3"/>
				</slide>
			</part>
			<part id="dtd-element">
				<title>Defining Elements</title>
				<slide id="element-only-declaration">
					<title>Element Only Content</title>
					<ul>
						<li>Element content is defined by a grammar for the children</li>
						<ul>
							<li>sequences are indicated with a comma: <q><code>,</code></q></li>
							<li>choices are indicated with a vertical bar: <q><code>|</code></q></li>
							<li>optional parts are indicated with a question mark: <q><code>?</code></q></li>
							<li>repeatable parts are indicated with a plus: <q><code>+</code></q></li>
							<li>optional and repeatable parts are indicated with an asterisk: <q><code>*</code></q></li>
							<li>parentheses can be used for grouping and nesting</li>
						</ul>
					</ul>
					<listing src="xhtml1-transitional.dtd" line="1064-1074"/>
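					<p>The operators can be combined freely. The following sketch uses hypothetical element names (it is not one of the course's DTDs) and allows a name, then either a street or a P.O. box, then a city, followed by any number of phones, at least one email, and an optional note:</p>
					<pre><![CDATA[<!-- sequence (,), choice (|), and the occurrence indicators *, +, ? -->
<!ELEMENT address (name, (street | pobox), city, phone*, email+, note?)>]]></pre>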
				</slide>
				<slide id="mixed-content-declaration">
					<title>Mixed Content</title>
					<ul>
						<li><link href="mixed-content"/> allows text content and elements to be mixed</li>
						<ul>
							<li><link href="whitespace"/> characters are allowed in <link href="element-only-declaration"/> (this does not need to be declared)</li>
							<li>for non-whitespace characters, character data must be allowed explicitly</li>
						</ul>
						<li>The allowed child elements may be constrained, but not their order or their number of occurrences</li>
						<li>Mixed content is always defined as <code>&lt;!ELEMENT x (#PCDATA | a | b | …)* ></code></li>
					</ul>
					<listing src="xhtml1-transitional.dtd" line="568-568"/>
					<ul>
						<li><em>Character only</em> content is a special case of mixed content</li>
						<ul>
							<li>the element may only contain characters (no other elements)</li>
							<li>the repetition is not necessary because there is no choice</li>
						</ul>
					</ul>
					<listing src="xhtml1-transitional.dtd" line="355-355"/>
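					<p>A small sketch with hypothetical element names may help: <code>para</code> has mixed content, while <code>em</code> and <code>code</code> have character only content. The last line is an instance that these declarations would accept:</p>
					<pre><![CDATA[<!ELEMENT para (#PCDATA | em | code)*>
<!ELEMENT em   (#PCDATA)>
<!ELEMENT code (#PCDATA)>

<para>Text may be <em>mixed</em> with <code>elements</code> in any order.</para>]]></pre>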
				</slide>
				<slide>
					<title>Empty Content</title>
					<ul>
						<li>Empty elements can be useful</li>
						<ul>
							<li>they may contain all information in attributes</li>
							<li>their presence may carry semantics without the need for additional information</li>
						</ul>
					</ul>
					<listing src="xhtml1-transitional.dtd" line="833-848"/>
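					<p>Both uses can be sketched with two hypothetical elements: <code>pagebreak</code> carries meaning by its presence alone, while <code>image</code> carries all its information in an attribute:</p>
					<pre><![CDATA[<!ELEMENT pagebreak EMPTY>
<!ELEMENT image     EMPTY>
<!ATTLIST image src CDATA #REQUIRED>

<pagebreak/>
<image src="logo.png"/>]]></pre>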
				</slide>
			</part>
			<part id="dtd-attlist">
				<title>Defining Attribute Lists</title>
				<slide>
					<title>Attributes belong to Elements</title>
					<ul>
						<li>Attributes are specified in an element's <em>Attribute List</em></li>
						<ul>
							<li>an element definition may have any number of attributes associated with it</li>
							<li>attributes may occur at most once on an element</li>
						</ul>
						<li>Attribute definitions have a <em>name</em>, a <em>type</em>, and a <em>default declaration</em></li>
						<ul>
							<li>the attribute appears according to the default declaration</li>
							<li>if the attribute is present, its value must conform to the type</li>
						</ul>
					</ul>
					<listing src="xhtml1-transitional.dtd" line="794-801"/>
				</slide>
				<slide id="dtd-attr-type">
					<title>Attribute Types</title>
					<ul>
						<li>Attribute values can be constrained (which is not possible for element content)</li>
						<ul>
							<li><code>CDATA</code> means any character string (but no markup)</li>
							<li>enumerated types list allowed values: <code>(data|ref|object)</code> (list of XML names)</li>
							<li><code>ID</code> for identifying elements (part of <code><link href="ididref"/></code>)</li>
							<li><code>IDREF</code> for referencing identified elements (part of <code><link href="ididref"/></code>)</li>
						</ul>
						<li>Application-oriented attribute types are often <q>simulated</q></li>
						<ul>
							<li>using <link href="param-entity"/>, modeling information can be preserved</li>
						</ul>
					</ul>
					<listing src="xhtml1-transitional.dtd" line="894-894"/>
					<listing src="xhtml1-transitional.dtd" line="52-53"/>
					<ul>
						<li>The default declaration specifies the attribute's presence</li>
						<ul>
							<li><code>#REQUIRED</code> means the attribute has to be specified (on every element)</li>
							<li><code>#IMPLIED</code> marks an optional attribute (no default is provided, the application may imply a value)</li>
							<li><code>"…"</code> specifies a default value (and the attribute is optional)</li>
						</ul>
					</ul>
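					<p>The types and default declarations can be seen together in one attribute list. This is only a sketch; the element <code>item</code> and its attribute names are hypothetical:</p>
					<pre><![CDATA[<!ELEMENT item EMPTY>
<!ATTLIST item
  id     ID                #IMPLIED
  kind   (data|ref|object) "data"
  src    CDATA             #REQUIRED
  target IDREF             #IMPLIED>

<!-- kind is not specified, so a validating processor supplies the default "data" -->
<item id="i1" src="a.xml"/>
<item src="b.xml" kind="ref" target="i1"/>]]></pre>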
				</slide>
			</part>
		</part>
		<part>
			<title>Advanced DTDs</title>
			<part id="ididref">
				<title>ID/IDREF</title>
				<slide>
					<title>References in Documents</title>
					<ul>
						<li>Without Validation, there are no IDs</li>
						<ul>
							<li><code>ID</code> is an <link href="dtd-attr-type">attribute type</link> declared in the DTD</li>
							<li><code>xml:id</code> is an attempt to support schema-independent IDs</li>
						</ul>
						<li>IDs are used to assign identities to elements</li>
						<ul>
							<li>the XML processor reports duplicate IDs as errors (<a href="http://www.w3.org/TR/xml/#id">part of validation</a>)</li>
						</ul>
						<li>IDREFs are used to reference existing IDs</li>
						<ul>
							<li>the XML processor reports references to non-existing IDs as errors (<a href="http://www.w3.org/TR/xml/#idref">part of validation</a>)</li>
						</ul>
						<li>IDs must be XML Names (in particular, they may not start with a digit)</li>
					</ul>
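					<p>The rules can be illustrated with a fragment that uses the hypothetical elements <code>chapter</code> and <code>xref</code> (element declarations are omitted for brevity). The comments describe what a validating processor would be expected to report:</p>
					<pre><![CDATA[<!ATTLIST chapter id ID    #REQUIRED>
<!ATTLIST xref    to IDREF #REQUIRED>

<chapter id="c1">...</chapter>  <!-- valid -->
<xref to="c1"/>                 <!-- valid: the ID c1 exists -->
<chapter id="c1">...</chapter>  <!-- invalid: duplicate ID -->
<chapter id="2">...</chapter>   <!-- invalid: "2" is not an XML Name -->
<xref to="c9"/>                 <!-- invalid: there is no ID c9 -->]]></pre>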
				</slide>
				<slide>
					<title>ID/IDREF in a Document</title>
					<listing src="section.xml" line="3-18"/>
					<listing src="section.dtd" line="2-12"/>
				</slide>
				<slide>
					<title>References within the Tree</title>
					<img src="section.png" style="width : 90% ; margin : 4% ; "/>
				</slide>
				<slide>
					<title>Formatting Example</title>
					<p>XSLidy can generate links to sections such as the section about <link href="ididref"/>, this link is then translated into the appropriate HTML code, meaning a link with the target being a fragment identifier to the slide number.</p>
					<pre><![CDATA[<p>XSLidy can generate links to sections such as the section about <link href="ididref"/>, this link is then translated into the appropriate HTML code, meaning a link with the target being a fragment identifier to the slide number.</p>]]></pre>
					<p>After running XSLidy, the following HTML is generated:</p>
					<pre><![CDATA[<p>XSLidy can generate links to sections such as the section about <a href="#(23)">ID/IDREF</a>, this link is then translated into the appropriate HTML code, meaning a link with the target being a fragment identifier to the slide number.</p>]]></pre>
				</slide>
				<slide>
					<title>ID/IDREF Semantics</title>
					<ul>
						<li>Rooted in the document world</li>
						<ul>
							<li>all parts are assembled before processing</li>
							<li>names are symbolic and assigned as required</li>
							<li>mixed syntax and semantics</li>
						</ul>
						<li>Good idea, but many shortcomings</li>
						<ul>
							<li>constraints apply to one document only</li>
							<li>IDs and IDREFs are global instead of scoped</li>
							<li>identifiers should be allowed to use any type</li>
							<li>identifier processing should be type-specific (2 &#x225F; 02)</li>
						</ul>
						<li>Applications must know how to process ID/IDREF</li>
						<ul>
							<li>for HTML export, links can be generated</li>
							<li>for databases, keys should be used</li>
						</ul>
					</ul>
				</slide>
			</part>
			<part id="dtd-entity">
				<title>Entities</title>
				<slide>
					<title>General Entities</title>
					<ul>
						<li>XML's core concept of physical data structures</li>
						<ul>
							<li>an entity is a named unit of data which can be referenced</li>
							<li>within documents, it is referenced by the markup <code>&amp;entity-name;</code></li>
						</ul>
						<li>Entities can be used to name and reuse document content</li>
					</ul>
					<listing src="xhtml-lat1.ent" line="135-142"/>
					<ul>
						<li><em>Character References</em> look like entities: <code>&amp;#9786;</code> or <code>&amp;#x263A;</code> = &#x263A;</li>
						<ul>
							<li>they can be used to represent any Unicode character, they are processed as single characters</li>
						</ul>
					</ul>
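					<p>A minimal sketch of declaring and referencing a general entity, here in the internal subset of a hypothetical <code>memo</code> document. The XML processor replaces <code>&amp;course;</code> with the entity's replacement text:</p>
					<pre><![CDATA[<!DOCTYPE memo [
  <!ELEMENT memo (#PCDATA)>
  <!ENTITY course "XML Foundations (INFO 242)">
]>
<memo>Welcome to &course;.</memo>]]></pre>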
				</slide>
				<slide id="param-entity">
					<title>Parameter Entities</title>
					<ul>
						<li>Parameter entities are parsed entities for use within the DTD</li>
						<ul>
							<li>a parameter entity must be specifically declared as such</li>
							<li>within DTDs, it is referenced by the markup <code>%entity-name;</code></li>
							<li>outside of DTDs, parameter entities cannot be used</li>
						</ul>
						<li>Like general entities, parameter entities are meant for reuse</li>
						<ul>
							<li>in a DTD, reuse is mostly about reusing structures</li>
							<li>parameter entities are the <q>duct tape</q> of DTDs: not elegant, but effective</li>
						</ul>
					</ul>
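					<p>The following sketch shows both kinds of reuse with hypothetical names: <code>%inline;</code> reuses part of a content model, and <code>%coreattrs;</code> reuses a set of attribute definitions. Parameter entity references inside declarations are only allowed in the external subset, so this fragment is assumed to live in a separate DTD file:</p>
					<pre><![CDATA[<!ENTITY % inline    "em | code">
<!ENTITY % coreattrs "id ID #IMPLIED class CDATA #IMPLIED">

<!ELEMENT para (#PCDATA | %inline;)*>
<!ATTLIST para %coreattrs;>
<!ELEMENT em (#PCDATA)>
<!ATTLIST em %coreattrs;>
<!ELEMENT code (#PCDATA)>
<!ATTLIST code %coreattrs;>]]></pre>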
				</slide>
				<slide>
					<title>XHTML Parameter Entities (Attributes)</title>
					<listing src="xhtml1-transitional.dtd" line="433-437"/>
					<listing src="xhtml1-transitional.dtd" line="188-188"/>
					<listing src="xhtml1-transitional.dtd" line="133-138"/>
					<listing src="xhtml1-transitional.dtd" line="145-149"/>
					<listing src="xhtml1-transitional.dtd" line="55-56"/>
					<listing src="xhtml1-transitional.dtd" line="193-193"/>
				</slide>
				<slide>
					<title>XHTML Parameter Entities (Content)</title>
					<listing src="xhtml1-transitional.dtd" line="433-437"/>
					<listing src="xhtml1-transitional.dtd" line="230-230"/>
					<listing src="xhtml1-transitional.dtd" line="227-227"/>
					<listing src="xhtml1-transitional.dtd" line="203-204"/>
					<listing src="xhtml1-transitional.dtd" line="200-201"/>
					<listing src="xhtml1-transitional.dtd" line="197-198"/>
					<listing src="xhtml1-transitional.dtd" line="222-222"/>
				</slide>
			</part>
		</part>
		<part>
			<title>More Advanced DTDs</title>
			<slide>
				<title>Additional Mechanisms</title>
				<ul>
					<li>DTDs have more advanced mechanisms</li>
					<ul>
						<li>used in few applications, mostly by SGML veterans</li>
						<li>should not be used in new projects</li>
					</ul>
					<li><em>Conditional Sections</em> for configurable DTDs</li>
					<ul>
						<li>parts of a DTD can be enclosed in special constructs</li>
						<li>based on parameter entity settings, these parts can be switched <q>on</q> or <q>off</q></li>
					</ul>
					<li><em>External Entities</em> for referencing external resources</li>
					<ul>
						<li><em>parsed entities</em> contain content parsed by the XML processor</li>
						<li>inclusion should be done with <em>XInclude</em></li>
						<li><em>unparsed entities</em> contain non-XML content (e.g., images or plain text)</li>
						<li>referring to non-XML content is handled on the application level</li>
					</ul>
				</ul>
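				<p>A conditional section can be sketched as follows; the parameter entity <code>draft</code> and the element <code>comment</code> are hypothetical. Conditional sections are only allowed in the external subset. Setting <code>draft</code> to <code>IGNORE</code> instead of <code>INCLUDE</code> switches the enclosed declaration off:</p>
				<pre>&lt;!ENTITY % draft "INCLUDE">
&lt;![%draft;[
&lt;!ELEMENT comment (#PCDATA)>
]]&gt;</pre>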
			</slide>
		</part>
		<part>
			<title>Conclusions</title>
			<slide>
				<title>DTD for XML Schemas</title>
				<ul>
					<li>XML documents are processed by applications</li>
					<li>Applications have assumptions about XML documents</li>
					<li>DTDs allow some of these assumptions to be formalized as constraints</li>
					<li>Part of the constraint checking must still be programmed</li>
				</ul>
			</slide>
			<slide>
				<title>Modeling DTDs</title>
				<ul>
					<li>Data models can be mapped to many different DTDs</li>
					<li>What is a good DTD? What is a bad DTD?</li>
					<li>How does the DTD affect further processing?</li>
				</ul>
			</slide>
		</part>
	</presentation>
	<presentation id="bestpractices">
		<title short="Best Practices">The Good, the Bad, and the Ugly</title>
		<date>2007-09-13</date>
		<toc class="resources"><a href="http://dret.net/netdret/docs/wilde-elpub2006-xml.pdf">Structuring Content with XML</a>&#160;· <a href="http://www.tbray.org/ongoing/When/200x/2006/01/09/On-XML-Language-Design">On XML Language Design</a></toc>
		<toc class="abstract">While XML is rather easy to understand and use, it is also rather easy to use XML in ways which either produce <q>ugly</q> XML, or which may lead to problems in components further processing the XML. The topic of this lecture thus is to look at design guidelines for XML schemas, leading to <q>good</q> XML. Some of the simpler topics cover basic questions of how to map a data model to XML markup (e.g., when to use elements or attributes). The next question is how data should be represented in XML so that applications can process it efficiently. We also look at what part of the markup an application will actually have access to, and this is defined by the <em>XML Information Set (Infoset)</em>, the specification underlying many XML technologies.</toc>
		<slide>
			<title>Abstract</title>
			<p class="abstract"><toc class="abstract"/></p>
		</slide>
		<slide>
			<title>XML Best Practices</title>
			<ul>
				<li><link href="goodxml">Good</link>: What you should do when using XML</li>
				<li><link href="badxml">Bad</link>: What you should not do when using XML</li>
				<li><link href="uglyxml">Ugly<sup>1</sup></link>: What you maybe have to do when using XML</li>
				<li><link href="infoset">Ugly<sup>2</sup></link>: XML's ugly little secret …</li>
			</ul>
		</slide>
		<part id="goodxml">
			<title short="Good XML">XML Best Practices</title>
			<slide>
				<title>Markup and Schemas</title>
				<ul>
					<li>XML can be encountered in different ways</li>
					<ol>
						<li>as somebody having to process XML documents</li>
						<li>as somebody having to understand XML documents</li>
						<li>as somebody having to generate XML documents</li>
						<li>as somebody having to design XML schemas</li>
					</ol>
				</ul>
			</slide>
			<part id="good-documents">
				<title>XML Documents</title>
				<slide>
					<title>Generating XML</title>
					<ul>
						<li>Character encoding</li>
						<ul>
							<li>use one of XML's standard encodings (UTF-8 or UTF-16)</li>
							<li>if you are using mostly Latin characters, UTF-8 is much more compact</li>
							<li>any other character encoding may cause interoperability issues</li>
						</ul>
						<li>Pretty-printing (adding line feeds and indentation)</li>
						<ul>
							<li>pretty-printed XML is easier to read for humans</li>
							<li>pretty-printed XML contains unnecessary whitespace</li>
							<li>pretty-printing is good for experiments and prototypes</li>
							<li>pretty-printing should be switched off for production systems</li>
						</ul>
					</ul>
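					<p>As an illustration with a hypothetical document, the first version declares its encoding and is pretty-printed, while the second version carries the same elements and text without the added whitespace:</p>
					<pre><![CDATA[<?xml version="1.0" encoding="UTF-8"?>
<address>
  <name>Jane Doe</name>
  <city>Berkeley</city>
</address>]]></pre>
					<pre><![CDATA[<?xml version="1.0" encoding="UTF-8"?>
<address><name>Jane Doe</name><city>Berkeley</city></address>]]></pre>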
				</slide>
				<slide id="xml-views">
					<title>XML Views</title>
					<ul>
						<li>Other people may use different tools</li>
						<ul>
							<li>XML is a character-based format, so every character counts</li>
							<li>other people may choose different technologies</li>
							<li>even your XML editor may choose to see things differently</li>
						</ul>
						<li>Many XML technologies use abstractions</li>
						<ul>
							<li>useful for concentrating on the <em>tree view</em></li>
							<li>no full control of markup usage (automatic serialization)</li>
							<li>think about working with a tree rather than working with a text file</li>
						</ul>
					</ul>
				</slide>
			</part>
			<part id="good-dtd">
				<title>XML DTDs</title>
				<slide id="model-to-markup">
					<title>From Model to Markup</title>
					<ul>
						<li>There should be a conceptual model of the data</li>
						<ul>
							<li>formal conceptual models for XML are an active field of research</li>
							<li>informal models may use any notation</li>
						</ul>
						<li>Model design should omit questions of markup design</li>
						<ul>
							<li>element/attribute decisions are not a model question</li>
							<li>hierarchy/reference decisions are not a model question</li>
							<li>identifying the relevant entities and their relationships is a good idea</li>
						</ul>
						<li>Document engineering never invented modeling tools</li>
						<ul>
							<li>for document modelers, <q>the markup is the model</q></li>
							<li>there are no established notations for modeling documents</li>
							<li>document-type parts (e.g., mixed content) are hard to include in models</li>
						</ul>
					</ul>
				</slide>
				<slide>
					<title>From Graphs to Trees</title>
					<ul>
						<li>In the model, <em>n:m</em> relationships may appear</li>
						<ul>
							<li>in an address database, an address should be reusable</li>
							<li>in a résumé, an organization's information should be reusable</li>
						</ul>
						<li>XML documents are trees</li>
						<ul>
							<li>all non-tree structures must be represented by tree structures</li>
							<li>in most cases, this will be done by introducing references</li>
						</ul>
					</ul>
				</slide>
				<slide>
					<title>From Markup to Model</title>
					<ul>
						<li>Start with a sample instance</li>
						<ol>
							<li>start with a sample instance</li>
							<li>generate a schema for the instance with some tool</li>
							<li>open up the schema where necessary</li>
							<li>try creating more example instances <em>as different as possible/required</em></li>
							<li>write code for manipulating your test set of instances</li>
						</ol>
						<li>Restarting may be hard, but should be done</li>
						<ul>
							<li>view the initial design as a test bed, not as the <q>first version</q></li>
							<li>after you have learned some lessons, <em>throw everything away</em></li>
							<li>restart by designing everything from scratch</li>
							<li>content may be salvaged by writing small XSLT programs</li>
						</ul>
					</ul>
				</slide>
				<slide>
					<title>Top-Down or Bottom-Up?</title>
					<ul>
						<li>Both strategies have strengths and shortcomings</li>
						<ul>
							<li><em>top-down</em> tends to result in markup which looks <q>generated</q></li>
							<li><em>bottom-up</em> tends to result in markup which is less consistent</li>
						</ul>
						<li>Consistency is an important consideration</li>
						<ul>
							<li>if you dislike attributes, avoid them wherever possible</li>
							<li>if you like attributes, use them wherever possible</li>
							<li>don't mix these two styles of markup design</li>
						</ul>
					</ul>
				</slide>
				<slide id="reuse">
					<title>Reuse is Good</title>
					<ul>
						<li>Elements can be reused in different contexts</li>
						<ul>
							<li>elements then appear in the content model of more than one element</li>
							<li>an <code>address</code> may be used for <code>employee</code> as well as for <code>customer</code></li>
						</ul>
						<li>Content can be reused in different contexts</li>
						<ul>
							<li>(parts of) a content model may be useful in different contexts</li>
							<li>this only reuses an element's content, but not its name</li>
						</ul>
						<li>Attributes can be reused in different contexts</li>
						<ul>
							<li>technically, attributes are element-specific and have no relations when appearing on different elements</li>
							<li>when reusing attribute names, they should represent the same concept</li>
						</ul>
					</ul>
					<listing src="reuse.xml" line="3-16"/>
				</slide>
				<slide>
					<title>Reuse is Hard (in DTDs)</title>
					<ul>
						<li>Element reuse simply lists the element in more than one content model</li>
						<li>Content reuse requires parameter entities</li>
						<li>Attribute reuse requires parameter entities</li>
						<li>Nested parameter entities for multi-level reuse</li>
					</ul>
					<listing src="reuse.dtd"/>
				</slide>
			</part>
			<part>
				<title>General XML Issues</title>
				<slide id="element-vs-attribute">
					<title>Element vs. Attribute</title>
					<ul>
						<li>Elements and attributes are containers</li>
						<ul>
							<li>both contain character content</li>
						</ul>
						<li>Elements may carry attributes and may contain other elements</li>
						<ul>
							<li>for nested structures, elements must be chosen</li>
							<li>if the content needs to be annotated with an attribute, an element must be chosen</li>
							<li>if the item should be repeatable, an element must be chosen</li>
						</ul>
						<li>Attributes use less markup and have types</li>
						<ul>
							<li>if the content is (unstructured) <q>metadata</q>, an attribute may be a good choice</li>
							<li>for special types (ID/IDREF and enumerations), attributes are required</li>
							<li>if simple markup is an issue, attributes may be preferable</li>
						</ul>
						<li>Be consistent in your markup design style!</li>
					</ul>
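					<p>The trade-off can be sketched with a hypothetical <code>person</code> record. The attribute version is compact, while the element version allows repetition and annotation:</p>
					<pre><![CDATA[<!-- as attributes: little markup, but flat and not repeatable -->
<person id="p1" name="Jane Doe" phone="+1-555-0100"/>

<!-- as elements: repeatable, and each phone can carry its own attribute -->
<person id="p1">
  <name>Jane Doe</name>
  <phone type="work">+1-555-0100</phone>
  <phone type="home">+1-555-0101</phone>
</person>]]></pre>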
				</slide>
				<slide>
					<title>Hierarchy vs. Reference</title>
					<ul>
						<li>Hierarchies are only possible with <em>1:n</em> relationships</li>
						<ul>
							<li>for <em>n:m</em> relationships, references are the only possible representation</li>
						</ul>
						<li>Containment should be represented as hierarchy</li>
						<ul>
							<li>containment limits the lifetime of the contained part to that of the container</li>
						</ul>
					</ul>
					<listing src="address-hierarchy.xml" line="2-11"/>
					<listing src="address-reference.xml" line="2-11"/>
				</slide>
				<slide id="granularity">
					<title>Granularity</title>
					<ul>
						<li>XML structures should identify the relevant information</li>
						<ul>
							<li>what exactly does <q>relevant</q> mean?</li>
							<li>very high granularity makes data acquisition hard</li>
							<li>very high granularity makes data processing easy</li>
						</ul>
						<li>Granularity is a general problem of data modeling</li>
						<ul>
							<li>XML is simply a syntax for representing structured data</li>
							<pre>&lt;phone>+1-510-6432253&lt;/phone></pre>
							<pre>&lt;phone cc="1" area="510" local="6432253"/></pre>
						</ul>
					</ul>
				</slide>
			</part>
		</part>
		<part id="badxml">
			<title>Bad XML</title>
			<slide>
				<title>Consistent Markup</title>
				<ul>
					<li>Decide on a strategy and stick to it</li>
					<li>Inconsistent markup is hard to work with</li>
					<li>Do not try to use markup itself for data representation</li>
					<ul>
						<li><q>attribute values in single quotes should be ignored</q></li>
						<li><q>empty elements using empty element tags have a special meaning</q></li>
					</ul>
				</ul>
			</slide>
			<slide>
				<title>Simple Markup</title>
				<ul>
					<li>XML can be read and edited by hand</li>
					<ul>
						<li>this depends on the application scenario and markup design</li>
						<li>human-accessible XML should be a markup design goal</li>
					</ul>
					<li>Tool requirements</li>
					<ul>
						<li>if your documents can only be used with tool xyz, something is wrong</li>
						<li>XML should be used for open data formats in open environments</li>
					</ul>
					<li>Undocumented side-effects</li>
					<ul>
						<li>data models may include more dependencies than encoded in the schema</li>
						<li>clearly document these side-effects so that users are warned</li>
						<li>if possible, document them in a machine readable way using <link href="schematron">a schema language</link></li>
					</ul>
				</ul>
			</slide>
		</part>
		<part id="uglyxml">
			<title>Ugly XML</title>
			<slide id="redundant-data">
				<title>Redundant Data</title>
				<ul>
					<li>Redundant data is bad</li>
					<ul>
						<li>database design emphasizes <em>normalization</em> to eliminate redundant data</li>
						<li>normalization is difficult, creates complex structures, and makes data access slower</li>
						<li>real-life models and databases always contain redundancies</li>
					</ul>
					<li>Redundant data is used very frequently</li>
					<ul>
						<li>the <a href="http://zip4.usps.com/zip4/citytown_zip.jsp">ZIP code identifies state and city/cities</a></li>
						<li>very few address databases normalize street names (or numbers)</li>
					</ul>
					<li>Redundancy can be used for error detection/correction</li>
				</ul>
			</slide>
			<slide id="schema-redundancy">
				<title>Redundancy in the Schema</title>
				<ul>
					<li>Redundant data in schemas is very bad</li>
					<ul>
						<li>schema inspection cannot reveal the <q>same objective</q> behind the same markup</li>
						<li>further schema development will introduce inconsistencies</li>
					</ul>
					<li>Redundant data in schemas should be avoided</li>
					<ul>
						<li>schemas are a small and well-designed dataset</li>
						<li>schema design and maintenance are important issues</li>
					</ul>
				</ul>
			</slide>
			<slide>
				<title>Generically Generated Markup</title>
				<ul>
					<li>Some XML designers generate their schemas</li>
					<ul>
						<li>generated schemas are less likely to be well designed</li>
						<li>the schema generation process may be poorly implemented</li>
					</ul>
					<li>Some schemas are based on a very generic markup</li>
					<ul>
						<li>the structure actually is in the content, not in the markup</li>
						<li>XML tools will not be very useful when working with these documents</li>
					</ul>
				</ul>
				<listing src="generic.xml" line="2-14"/>
			</slide>
		</part>
		<part id="infoset">
			<title short="Infoset">XML Information Set (XML Infoset)</title>
			<slide>
				<title>What is the Content of an XML Document?</title>
				<ul>
					<li>An interesting (and fruitless) discussion</li>
					<ul>
						<li>the content is whatever you consider it to be</li>
						<li>agreement between peers is necessary for data exchange</li>
						<li>agreement between specification writers and toolmakers is necessary to provide tools</li>
					</ul>
					<li>DOM and XSLT were two early arrivals</li>
					<ul>
						<li>both had an idea (and a model) of what the content of an XML document is</li>
						<li>they did not have the exact same idea</li>
					</ul>
					<li>Set a normative standard for an XML document's content</li>
					<ul>
						<li>the Infoset defines what is represented in the tree</li>
						<li>people should be confident to get this information when using XML technologies</li>
					</ul>
				</ul>
			</slide>
			<slide>
				<title>Infoset Example</title>
				<img src="infoset-example.png" style="width : 90% ; margin : 4% ; "/>
			</slide>
			<slide id="not-infoset">
				<title>What is <u>Not</u> in the Infoset</title>
				<ul>
					<li>Do not rely on <a href="http://www.w3.org/TR/xml-infoset/#omitted">information not available in the Infoset</a></li>
					<ul>
						<li>order of attributes</li>
						<li>type of quotes around attribute values</li>
						<li>notation of empty elements (<code>&lt;elem>&lt;/elem></code> vs. <code>&lt;elem/></code>)</li>
						<li>how lines are terminated</li>
						<li>entities and character references</li>
					</ul>
					<li>XML contains all this information if used as an XML document</li>
					<li>Many XML technologies are in fact Infoset technologies</li>
					<ul>
						<li>XSDL, XSLT, XQuery, SOAP, …</li>
					</ul>
				</ul>
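				<p>As a sketch with hypothetical element names, the following two documents differ in attribute order, quote style, empty element notation, and the use of a character reference, yet they are represented by the same Infoset:</p>
				<pre><![CDATA[<doc a="1" b='2'><empty/>&#x41;</doc>]]></pre>
				<pre><![CDATA[<doc b="2" a="1"><empty></empty>A</doc>]]></pre>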
			</slide>
		</part>
		<part>
			<title>Conclusions</title>
			<slide>
				<title>XML and Modeling</title>
				<ul>
					<li>XML is about representing structured data</li>
					<li>XML is a format for representing trees</li>
					<li>Data models often are not trees</li>
					<li>Mapping data models to trees can be done in many ways</li>
				</ul>
			</slide>
			<slide>
				<title>Assignment</title>
				<ul>
					<li><a href="a/2/">Assignment 2</a> is a simple modeling task</li>
					<ul>
						<li>we provide a sample instance and some requirements</li>
						<li>create an XML version of the sample instance</li>
						<li>create a DTD which is more versatile than just working for the sample instance</li>
					</ul>
				</ul>
			</slide>
		</part>		
	</presentation>
	<presentation id="xmlns">
		<title short="Namespaces">XML Namespaces</title>
		<date>2007-09-18</date>
		<toc class="resources"><a href="http://www.rpbourret.com/xml/NamespacesFAQ.htm#p1">XML Namespaces FAQ (Part I)</a>&#160;· <a href="http://www.w3.org/TR/REC-xml-names/" title="W3C XML Namespaces Specification">Spec</a></toc>
		<toc class="abstract">XML is successful because it can be used in many different scenarios, and because it is easy to define a schema (such as a DTD) for new scenarios, producing a tailored XML data model for this scenario. This means that names in XML documents must be interpreted as belonging to a certain schema. As long as a document uses names from only one schema, this can be done rather easily. However, in many scenarios today documents combine names from different schemas, and <em>XML Namespaces</em> provide a mechanism for associating the names in an XML document with a namespace.</toc>
		<slide>
			<title>Abstract</title>
			<p class="abstract"><toc class="abstract"/></p>
		</slide>
		<part>
			<title>How to think about Namespaces</title>
			<slide>
				<title>Namespaces are Simple</title>
				<ul>
					<li>XML Namespaces are often misunderstood</li>
					<ul>
						<li>the biggest problem is to get rid of some assumptions</li>
						<li>XML Namespaces are too simple and thus confusing</li>
					</ul>
					<li>Instincts of Web users</li>
					<ol>
						<li>URIs identify something that can be retrieved by a browser</li>
						<li>URIs identify something that can be displayed by a browser</li>
						<li>if I cannot get it and cannot look at it, what good can it be?</li>
					</ol>
					<li>However, these assumptions are not always true</li>
					<ol>
						<li>URIs identify <em>resources</em> which often, but not always, can be accessed over the Web</li>
						<li>URIs identify <em>resources</em> which often, but not always, have a Web-accessible representation</li>
						<li>sharing URIs means sharing an identity, which can mean sharing semantics (associated with this identity)</li>
					</ol>
				</ul>
			</slide>
			<slide>
				<title>Simple Examples</title>
				<listing src="mathml1.xml" line="2-6"/>
				<listing src="mathml2.xml" line="2-6"/>
				<listing src="mathml3.xml" line="2-6"/>
				<listing src="mathml4.xml" line="2-6"/>
			</slide>
			<slide>
				<title>Name Spaces</title>
				<ul>
					<li>Names are one form of identification</li>
					<li>Identification is essential for communications</li>
					<li>Names in XML are not suitable for identification</li>
					<ul>
						<li>they are local to their context (where they are defined)</li>
						<li>if the context is uniquely identified, the names would be, too</li>
					</ul>
					<li>Name Spaces: <em>Put names into spaces</em></li>
					<ul>
						<li>how to identify the space? Web things are identified by URIs</li>
					</ul>
				</ul>
			</slide>
			<slide>
				<title>URI Philosophy</title>
				<ul>
					<li><link href="uri"/> uniquely identify resources</li>
					<li>URIs often provide access information</li>
					<ul>
						<li>pretty clear in <code>http://dret.net/lectures/xml-fall07/</code></li>
						<li>less clear in <code>urn:ietf:rfc:2648</code>  (<a href="http://dret.net/rfc-index/reference/RFC2648">RFC 2648</a>)</li>
						<li>very (and purposely) unclear in <code>tag:9327493874329</code>  (<a href="http://dret.net/rfc-index/reference/RFC4151">RFC 4151</a>)</li>
					</ul>
					<li>URIs often return <em>resource representations</em></li>
					<ul>
						<li>the resource itself is never returned (how to return a <em>lecture</em>?)</li>
						<li>some representation often is useful (HTML, PDF, maybe video/audio)</li>
						<li>the resource exists and is useful without a representation!</li>
					</ul>
					<li>URIs are much more than just addresses of HTML pages</li>
				</ul>
			</slide>
			<slide>
				<title>The Namespace Problem</title>
				<ul>
					<li>People assume that URIs point to Web pages</li>
					<ul>
						<li>a <em>namespace name</em> (a URI) may point to a Web page</li>
						<li>it may also have no Web page associated with it</li>
						<li>it may even use a URI scheme which cannot be retrieved</li>
						<li>but it is still possible to compare URIs!</li>
					</ul>
					<li>People assume some standardized content format</li>
					<ul>
						<li>friendly namespaces provide HTML portals (<a href="http://www.w3.org/1999/xhtml">XHTML</a>)</li>
						<li>some namespaces just give you the schema (<a href="http://www.w3.org/2001/12/soap-envelope">SOAP</a>)</li>
						<li>less friendly namespaces provide minimal information (<a href="http://www.w3.org/1999/XSL/Transform">XSLT</a>)</li>
						<li>very unfriendly namespaces may return a 404 or even use inaccessible schemes</li>
						<li>but they all are valid, because no resource representation is required!</li>
					</ul>
					<li>Namespaces are used by comparing URIs</li>
					<ul>
						<li>anything else may be useful, but is not strictly required</li>
						<li>when searching for a namespace definition, use Google (string search)</li>
					</ul>
				</ul>
			</slide>
		</part>
		<part>
			<title>Using Namespaces</title>
			<slide>
				<title>Declaring Namespaces</title>
				<ul>
					<li>Using a namespace means referencing names from it</li>
					<ul>
						<li>unfortunately, there is no really standard way of writing these names</li>
						<li>(the <q><a href="http://www.jclark.com/xml/xmlns.htm">Clark notation</a></q> is useful: <code>{http://www.w3.org/1999/xhtml}html</code>)</li>
						<li>Namespaces are declared and then used</li>
					</ul>
					<li><xml>xmlns</xml>-prefixed attributes are used for declaring namespaces</li>
					<ul>
						<li>Default: <elem>html xmlns="http://www.w3.org/1999/xhtml"</elem></li>
						<li>Prefix: <elem>xhtml:html xmlns:xhtml="http://www.w3.org/1999/xhtml"</elem></li>
					</ul>
					<li>Namespace declarations are inherited and can be overwritten</li>
					<ul>
						<li>the default namespace can be undeclared</li>
						<li>Namespace declarations can be used in a myriad of ways</li>
					</ul>
				</ul>
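				<p>A minimal sketch of these declaration rules, assuming a hypothetical vocabulary named by <code>http://example.com/ns/notes</code> (an illustrative URI that is not part of the course material):</p>
				<pre><![CDATA[<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:n="http://example.com/ns/notes">
  <!-- html, body, and p are in the inherited default namespace (XHTML) -->
  <body>
    <p>XHTML paragraph</p>
    <!-- the n prefix is inherited from the html element -->
    <n:note>element in the hypothetical notes namespace</n:note>
    <!-- the default namespace is overwritten for this element and its content -->
    <note xmlns="http://example.com/ns/notes">same name as n:note</note>
    <!-- xmlns="" undeclares the default namespace -->
    <plain xmlns="">element in no namespace</plain>
  </body>
</html>]]></pre>
				<p>In Clark notation, <code>n:note</code> and the unprefixed <code>note</code> are both <code>{http://example.com/ns/notes}note</code>; only the syntax differs.</p>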
			</slide>
			<slide>
				<title>Unhealthy Namespace Usages</title>
				<ul>
					<li>Namespaces can be (and are) used in very weird ways</li>
					<ul>
						<li>these are syntax variations of identical structures</li>
						<li>without a good (i.e., conforming) parser, interpretation is very hard</li>
						<li>copy/paste can become hard or impossible</li>
					</ul>
					<li>Namespaces can be <a href="http://lists.xml.org/archives/xml-dev/200204/msg00170.html">neurotic, psychotic, borderline, or normal</a></li>
					<li>Each of the insane cases complicates processing</li>
					<li>None of them is technically incorrect</li>
					<li>XML should be used with humans in mind</li>
				</ul>
			</slide>
			<slide>
				<title>Unhealthy Namespace Usages in Practice</title>
				<listing src="neurotic.xml" line="2-9"/>
				<listing src="borderline.xml" line="2-9"/>
				<listing src="psychotic.xml" line="2-9"/>
			</slide>
			<slide>
				<title>Elements and Attributes</title>
				<ul>
					<li>Namespaces often apply to elements and attributes</li>
					<ul>
						<li>if an element name has no prefix, it has no namespace or the default namespace associated</li>
						<li>if a name has a prefix, the prefix must be bound to a namespace name</li>
						<li>names like this are called <em>Qualified Names (QNames)</em></li>
					</ul>
					<li>Elements and Attributes are treated differently</li>
					<ul>
						<li>the default namespace only applies to unprefixed element names</li>
						<li>unprefixed attribute names are in no namespace</li>
						<li><link href="xsdl-1">XSDL</link> deals with this by <link href="xsdl-names">keeping attributes <q>local</q></link></li>
					</ul>
					<li>Applications should interpret QNames</li>
					<ul>
						<li>naïve implementations will break when processing unhealthy instances</li>
						<li>the mechanics of implementing namespaces are not very hard</li>
					</ul>
				</ul>
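				<p>The different treatment of element and attribute names might be illustrated as follows; the combination of XHTML and XLink is chosen only for illustration, and the expanded names are written in Clark notation:</p>
				<pre><![CDATA[<a xmlns="http://www.w3.org/1999/xhtml"
   xmlns:xlink="http://www.w3.org/1999/xlink"
   href="a/2/" xlink:title="Assignment 2">...</a>

a            {http://www.w3.org/1999/xhtml}a       unprefixed element: default namespace
href         href                                  unprefixed attribute: no namespace
xlink:title  {http://www.w3.org/1999/xlink}title   prefixed attribute: namespace bound to the prefix]]></pre>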
			</slide>
			<slide>
				<title>Other Usages</title>
				<ul>
					<li>Increasingly, QNames are used in content</li>
					<ul>
						<li><link href="xslt-1">XSLT</link> was the first specification using this</li>
						<li>many other technologies have followed</li>
					</ul>
				</ul>
				<pre><![CDATA[<xsl:template match="section" xmlns:mathml="http://www.w3.org/1998/Math/MathML">
<xsl:if test="exists(.//mathml:*)">]]></pre>
				<ul>
					<li>Technically, everything is well-defined</li>
					<ul>
						<li>for processing, the namespace bindings must be known</li>
						<li>copy/paste on a textual basis may fail or, worse, silently produce wrong results</li>
					</ul>
				</ul>
			</slide>
		</part>
		<part>
			<title>Defining Namespaces</title>
			<slide>
				<title>Any URI is Possible</title>
				<ul>
					<li>A namespace name is a URI, that's all!</li>
					<ul>
						<li>it may not be accessible (because of the URI scheme)</li>
						<li>when retrieving it, nothing may be returned</li>
						<li>when retrieving it, something may be returned</li>
					</ul>
					<li>The only important thing is <em>the name</em></li>
					<ul>
						<li>the name is mentioned in the documentation</li>
						<li>if you know the documentation, you know the name</li>
						<li>shared names mean shared knowledge</li>
					</ul>
				</ul>
			</slide>
			<slide>
				<title>Namespace Definitions</title>
				<ul>
					<li>Namespaces can be defined by a DTD (<a href="http://www.w3.org/TR/xhtml1/#strict">XHTML</a>)</li>
					<li>Namespaces can be defined by XSDL (<a href="http://www.w3.org/TR/soap12-part1/#tabnsprefixes">SOAP</a>)</li>
					<li>Namespaces can be defined by RELAX NG (<a href="http://www.w3.org/TR/xhtml2/conformance.html#strict">XHTML 2.0</a>)</li>
					<li>Namespaces can be defined by prose (<a href="http://www.w3.org/TR/xslt#xslt-namespace">XSLT</a>)</li>
					<li>If schemas are provided, additional information is required</li>
					<ul>
						<li>it is unlikely that a namespace can be fully described by a schema</li>
						<li>additional constraints and semantics are specified in prose</li>
					</ul>
				</ul>
			</slide>
			<slide>
				<title>Structured Namespaces</title>
				<ul>
					<li>Namespaces have no structure</li>
					<ul>
						<li>a collection of names grouped by their namespace name</li>
						<li>inside the namespace, names have local meaning</li>
					</ul>
					<li>Namespace definitions are free to make up their own rules</li>
					<ul>
						<li>but then they must also define rules for dealing with conflicts</li>
					</ul>
					<li>XSDL <a href="http://www.w3.org/TR/xmlschema-1/#concepts-nameSymbolSpaces">structures the namespace defined by a schema</a></li>
					<ul>
						<li>the different <q>parts</q> of the namespace are called <em>symbol spaces</em></li>
						<li>all XSDL components have their own symbol space</li>
						<li><em>simple</em> and <em>complex types</em> share the same symbol space</li>
						<li>locally defined elements/attributes are in <q>sub symbol spaces</q></li>
					</ul>
				</ul>
			</slide>
			<slide>
				<title>Fixed or Extensible?</title>
				<ul>
					<li>Can a namespace change over time?</li>
					<ul>
						<li>may the namespace description become outdated? extended? replaced?</li>
						<li>this should be clearly documented in the namespace description</li>
					</ul>
					<li>The XML Namespace was widely believed <a href="http://www.w3.org/XML/1998/namespace">to be defined by XML</a></li>
					<ul>
						<li><xml>xml:lang</xml> and <xml>xml:space</xml> defined by XML</li>
						<li><xml>xml:base</xml> was added by <em>XML Base</em></li>
						<li><xml>xml:id</xml> was added by <em>xml:id</em></li>
					</ul>
					<li>When defining namespaces, plan ahead and publish everything</li>
					<ul>
						<li>dependencies, change management, and versioning issues are important</li>
						<li>there still is no accepted standard for namespace descriptions</li>
					</ul>
				</ul>
			</slide>
			<slide>
				<title>Namespace Descriptions</title>
				<img style="width : 90% ; margin : 2% ; " src="ns-description.png"/>
				<p class="quotenote"><a href="http://dret.net/netdret/publications#wil06h">Erik Wilde, <q>Structuring Namespace Descriptions</q>, 15th International World Wide Web Conference (WWW2006), Edinburgh, UK, May 2006.</a></p>
			</slide>
		</part>
		<part>
			<title>Processing Namespaces</title>
			<slide id="namespace-validity">
				<title>Namespaces and Validity</title>
				<ul>
					<li>Namespaces define an additional layer on top of XML</li>
					<ul>
						<li>they define additional semantics (assignment to namespaces)</li>
						<li>they define additional constraints (declaration and usage of namespaces)</li>
					</ul>
					<li>Namespace-awareness is a basic requirement for XML tools</li>
					<ul>
						<li>XML not compliant with XML Namespaces will break most tools</li>
						<li>processing namespaces should be done by tools</li>
						<li>a namespace-aware parser translates namespace declarations into nodes</li>
					</ul>
				</ul>
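				<p>As a sketch of the additional constraints, both of the following fragments are well-formed XML 1.0 but violate XML Namespaces (the namespace URI in the second fragment is a made-up example):</p>
				<pre><![CDATA[<!-- the prefix m is used but never declared -->
<m:math><m:mi>x</m:mi></m:math>

<!-- a name with two colons is a legal XML name, but not a QName -->
<a:b:c xmlns:a="http://example.com/ns/a"/>]]></pre>
				<p>A parser that is not namespace-aware accepts both fragments, whereas a namespace-aware parser is expected to report an error.</p>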
			</slide>
			<slide>
				<title>Namespaces in the Document</title>
				<listing src="mathml4.xml"/>
			</slide>
			<slide>
				<title>Namespaces in the Tree</title>
				<img src="xmlns-tree.png" style="width : 90% ; margin : 4% ; "/>
			</slide>
		</part>
		<part>
			<title>Conclusions</title>
			<slide>
				<title>Name Spaces</title>
				<ul>
					<li><q>Bags of Names</q> with a URI as a label</li>
					<li>The URI does not necessarily return anything</li>
					<li>Namespaces can be defined in any way (e.g., schemas)</li>
				</ul>
			</slide>
		</part>
	</presentation>
	<presentation id="xpath">
		<title short="XPath">XML Path Language (XPath)</title>
		<date>2007-09-20</date>
		<toc class="resources"><a href="xpath-chapter.pdf">XPath Chapter</a>&#160;· <a href="xpath-quickref.pdf">XPath QuickRef</a></toc>
		<toc class="abstract">XML structures data into a rather small number of different constructs, most notably elements and attributes. The <em>XML Path Language (XPath)</em> defines a way to select parts of XML documents, so that they can be used for further processing. XPath's primary use is in <em>XSL Transformations (XSLT)</em>, but other XML technologies use it as well, e.g., XSDL. XPath is a very compact language with a syntax that resembles path expressions well-known from file systems. These path expressions, however, are generalized and therefore much more powerful than the rather simple path expressions in file systems. Because of its use in different XML technologies, XPath is one of the most important XML core technologies.</toc>
		<slide>
			<title>Abstract</title>
			<p class="abstract"><toc class="abstract"/></p>
		</slide>
		<part>
			<title>Why XPath?</title>
			<slide>
				<title>Selecting Parts of XML Documents</title>
				<ul>
					<li>XML is a syntax for trees</li>
					<ul>
						<li>it defines how trees can be exchanged</li>
					</ul>
					<li>XML technologies should provide support for working with trees</li>
					<ul>
						<li>when receiving trees, access to the tree should be easy (DOM)</li>
						<li>validating trees should be easy (<link href="xsdl-1">XSDL</link>)</li>
						<li>mapping trees should be easy (<link href="xslt-1">XSLT</link>)</li>
						<li>querying tree collections should be easy (<link href="xquery-1">XQuery</link>)</li>
						<li>XPath is for trees what regular expressions are for text-based information</li>
					</ul>
				</ul>
			</slide>
			<slide>
				<title>Making Selection Reusable</title>
				<ul>
					<li>Different XML technologies need selection</li>
					<ul>
						<li><link href="xslt-1">XSLT</link> needs it for selecting parts and manipulating them</li>
						<li><link href="xsdl-1">XSDL</link> needs it for applying identity constraints</li>
						<li>DOM needs it for extracting parts from an XML tree</li>
						<li>XQuery needs it for writing XML-oriented queries</li>
					</ul>
					<li>XPath was created to be reusable</li>
					<ul>
						<li>XML experts should only learn one selection language</li>
						<li>this knowledge can be reused when learning new technologies</li>
						<li>implementations can reuse code libraries</li>
					</ul>
				</ul>
			</slide>
			<slide>
				<title>How XPath Evolved</title>
				<ul>
					<li>XSL was designed as the new XML stylesheet language</li>
					<ol>
						<li><link href="xslt-1">XSL Transformations (XSLT)</link> transform the input document</li>
						<li><em>XSL Formatting Objects (XSL-FO)</em> is what they will transform it to</li>
					</ol>
					<li>XSLT was designed to work on arbitrary XML input documents</li>
					<ul>
						<li>started as a part of XSL (<a href="http://www.w3.org/TR/1998/WD-xsl-19981216">WD-xsl-19981216</a> → <a href="http://www.w3.org/TR/1999/WD-xslt-19990421">WD-xslt-19990421</a>)</li>
						<li>for selecting parts of the transformation input, a selection mechanism had to be provided</li>
					</ul>
					<li>XPath was turned into a standalone specification</li>
					<ul>
						<li>started as a part of XSLT (<a href="http://www.w3.org/TR/1999/WD-xslt-19990421">WD-xslt-19990421</a> → <a href="http://www.w3.org/1999/07/WD-xslt-19990709">WD-xslt-19990709</a>)</li>
						<li>reused in a number of other W3C specifications (XSDL, DOM)</li>
					</ul>
					<li>Complete overhaul for XSLT 2.0 and XQuery</li>
					<ul>
						<li><a href="http://www.w3.org/TR/xpath20/">XPath 2.0</a> as the core language</li>
						<li>a much larger set of <a href="http://www.w3.org/TR/xpath-functions/">functions and operators</a></li>
						<li>the underlying <a href="http://www.w3.org/TR/xpath-datamodel/">data model</a> which describes the foundation</li>
					</ul>
				</ul>
			</slide>
		</part>
		<part>
			<title>How XPath Works</title>
			<part id="xpath-tree">
				<title>The XPath Tree Model</title>
				<slide>
					<title>Starting from the Infoset</title>
					<ul>
						<li>XPath operates on an abstract data model</li>
						<ul>
							<li>a tree derived from the <link href="infoset"/></li>
							<li>a simplification (another one!) of the underlying XML</li>
						</ul>
						<li>The Infoset is turned into an <em>XPath node tree</em></li>
						<ul>
							<li>11 infoset item types → 7 XPath node tree node types</li>
							<li>character items are merged into text nodes</li>
							<li>namespace declarations are no longer visible as attributes</li>
						</ul>
					</ul>
				</slide>
				<slide id="not-xpath">
					<title>What is <u>Not</u> in the XPath Tree</title>
					<ul>
						<li>The same things which are <link href="not-infoset">not in the Infoset</link></li>
						<ul>
							<li>the order of attributes in a start tag</li>
							<li>the types of quotes around attribute values</li>
							<li>character references and entities (<code>&amp;#xFC;</code>/<code>&amp;uuml;</code> → <code>ü</code>)</li>
						</ul>
						<li>And some more …</li>
						<ul>
							<li>namespace declarations are no longer visible as attributes</li>
							<li>notations and unexpanded entity references</li>
						</ul>
					</ul>
				</slide>
			</part>
			<part>
				<title>XPath Evaluation</title>
				<slide>
					<title>Tree In / Selection Out</title>
					<ul>
						<li>XPath evaluates an expression based on a tree</li>
						<li>Where the tree comes from is out of XPath's scope</li>
						<li>The result of the evaluation is a selection</li>
						<ul>
							<li><code>//img[not(@alt)]</code> → select all images which have no <code>alt</code> attribute</li>
							<li><code>count(//img)</code> → return the number of images</li>
							<li><code>/descendant::img[3]/@src</code> → return the third image's <code>src</code> URI</li>
							<li><code>starts-with(/html/@lang, 'en')</code> → test whether the document's language is English</li>
						</ul>
						<li>Syntax errors may occur</li>
						</ul>
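					<p>To make the four expressions concrete, they can be evaluated against a small, made-up document (no namespaces, so the name tests match directly); the results follow the XPath 1.0 rules:</p>
					<pre><![CDATA[<html lang="en-US">
  <body>
    <img src="a.png" alt="first"/>
    <p><img src="b.png"/></p>
    <img src="c.png" alt="third"/>
  </body>
</html>

//img[not(@alt)]                  node set: the img element with src="b.png"
count(//img)                      number:   3
/descendant::img[3]/@src          node set: the src attribute with the value "c.png"
starts-with(/html/@lang, 'en')    boolean:  true]]></pre>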
				</slide>
			</part>
		</part>
		<part>
			<title short="Location Paths">XPath Location Paths</title>
			<slide>
				<title>Location Path Structure</title>
				<ul>
					<li>Each location path consists of <em>Location Steps</em></li>
					<ul>
						<li>location steps are separated by <q><code>/</code></q>, like path names in file systems</li>
					</ul>
					<li>Similarities between XPath location paths and file systems</li>
					<ol>
						<li>nodes in the <link href="xpath-tree">XPath tree</link> have different types</li>
						<li>the <link href="xpath-nodetest">type and number of nodes selected by one step</link></li>
						<li>the <link href="xpath-axes">direction in which each step moves</link></li>
						<li>additional <link href="xpath-predicates">filters for selecting specific nodes</link></li>
					</ol>
					<li>Differences between XPath location paths and file systems</li>
					<ol>
						<li>XPaths may return <link href="xpath-expressions">other data types than nodes</link></li>
						<li>XPath provides a <link href="xpath-functions">built-in function library</link></li>
					</ol>
				</ul>
			</slide>
			<part>
				<title short="Node Tests">XPath Node Tests</title>
				<slide>
					<title>File System vs. XPath Paths</title>
					<table style="margin : 5% ; " width="85%">
						<tr>
							<th>File System Path:</th>
							<td align="center"><code>/</code></td>
							<td align="center"><code>usr</code></td>
							<td align="center"><code>/</code></td>
							<td align="center"><code>local</code></td>
							<td align="center"><code>/</code></td>
							<td align="center"><code>apache</code></td>
							<td align="center"><code>/</code></td>
							<td align="center"><code>bin</code></td>
							<td align="center"><code>/</code></td>
						</tr>
						<tr>
							<th># Selected Nodes:</th>
							<td align="center">1</td>
							<td align="center">→ 1</td>
							<td align="center">→</td>
							<td align="center">1</td>
							<td align="center">→</td>
							<td align="center">1</td>
							<td align="center">→</td>
							<td align="center">1</td>
						</tr>
					</table>
					<table style="margin : 5% ; " width="85%">
						<tr>
							<th>XPath:</th>
							<td align="center"><code>/</code></td>
							<td align="center"><code>html</code></td>
							<td align="center"><code>/</code></td>
							<td align="center"><code>body</code></td>
							<td align="center"><code>/</code></td>
							<td align="center"><code>table</code></td>
							<td align="center"><code>/</code></td>
							<td align="center"><code>thead</code></td>
							<td align="center"><code>/</code></td>
							<td align="center"><code>tr</code></td>
						</tr>
						<tr>
							<th># Selected Nodes:</th>
							<td align="center">1</td>
							<td align="center">→ 1</td>
							<td align="center">→</td>
							<td align="center">1</td>
							<td align="center">→</td>
							<td align="center">6</td>
							<td align="center">→</td>
							<td align="center">4</td>
							<td align="center">→</td>
							<td align="center">12</td>
						</tr>
					</table>
				</slide>
				<slide id="xpath-nodetest">
					<title>Tests for Nodes</title>
					<ul>
						<li>Name tests</li>
						<ul>
							<li>testing for a particular name (elements/attributes): <code>/html/head/title</code></li>
							<li>wildcards (testing for any name): <code>/html/head/*</code></li>
						</ul>
						<li>Node type tests</li>
						<ul>
							<li>text nodes: <code>text()</code></li>
							<li>comment nodes: <code>comment()</code></li>
							<li>any nodes: <code>node()</code></li>
						</ul>
						<li>Processing instruction tests</li>
						<ul>
							<li>any PI: <code>processing-instruction()</code></li>
							<li>specific PI: <code>processing-instruction("xml-stylesheet")</code></li>
						</ul>
					</ul>
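					<p>A sketch of these tests applied to a small, made-up document (the stylesheet file name is only an assumption for illustration):</p>
					<pre><![CDATA[<?xml-stylesheet type="text/css" href="s.css"?>
<html>
  <head>
    <!-- metadata -->
    <title>XPath</title>
    <meta name="author" content="dret"/>
  </head>
</html>

/html/head/title                             the title element
/html/head/*                                 the title and meta elements
/html/head/title/text()                      the text node "XPath"
/html/head/comment()                         the comment node
/html/head/node()                            comment, title, meta, and the whitespace text nodes around them
/processing-instruction()                    the xml-stylesheet PI
/processing-instruction("xml-stylesheet")    the same PI, selected by its target]]></pre>
					<p>Whether the whitespace-only text nodes are present depends on how the tree was built; XPath itself does not strip them.</p>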
				</slide>
			</part>
			<part id="xpath-axes">
				<title short="Axes">XPath Axes</title>
				<slide>
					<title>Where Do You Want to Go Today?</title>
					<ul>
						<li>File system paths go in one direction only</li>
						<ul>
							<li>always one level down in the file system hierarchy</li>
							<li><code>.</code> and <code>..</code> are clever directory shortcuts</li>
							<li>other directions supported by tools (e.g., <code>find</code>)</li>
						</ul>
						<li>XPath allows steps in different directions</li>
						<ul>
							<li>the default direction is <code>child</code></li>
							<li>other directions are explicitly specified: <code>descendant::a</code></li>
						</ul>
					</ul>
				</slide>
				<slide>
					<title>Axis Peculiarities</title>
					<ul>
						<li>Attributes and Namespaces are <u>not</u> the children of elements, but …</li>
						<li>… elements are their attributes' parent!</li>
						<ul>
							<li>very counter-intuitive</li>
							<li>very convenient</li>
						</ul>
						<li>Attributes and Namespaces are always leaves in the node tree</li>
						<li>Attribute nodes <u>have</u> the attribute value as their value</li>
						<li>Namespace nodes <u>have</u> the namespace name (i.e., a URI) as their value</li>
						<li>Namespace nodes exist because of namespace declarations</li>
						<ul>
							<li>in the XPath node tree, only the namespace nodes are visible</li>
							<li>the namespace declaration attributes (<code>xmlns</code>) are invisible</li>
							<li>one namespace declaration potentially creates many namespace nodes</li>
						</ul>
					</ul>
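					<p>These peculiarities can be observed in a small, made-up document; the results below follow the XPath 1.0 data model (some implementations do not support the <code>namespace</code> axis):</p>
					<pre><![CDATA[<doc xmlns:m="http://www.w3.org/1998/Math/MathML">
  <sec id="s1"><p/></sec>
</doc>

/doc/sec/@id/..          the sec element (the parent of its id attribute)
/doc/sec/node()          only the p element (the id attribute is not a child)
count(//namespace::m)    3 (one namespace node each on doc, sec, and p)]]></pre>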
				</slide>
				<slide>
					<title>Axes</title>
					<img style="height : 75% ; margin : 2% ; " src="xpath-axes.png" title="XPath Axes"/>
				</slide>
				<slide>
					<title>Putting it all Together</title>
					<ul>
						<li>XPath location paths use a simple syntax</li>
						<ul>
							<li>sequence of location steps, separated by <q><code>/</code></q></li>
						</ul>
						<li>Each location step uses a simple structure (<code>preceding::p[@class="warning"]</code>)</li>
						<ol>
							<li>an axis followed by <q><code>::</code></q> (if no axis is given, the default axis <code>child</code> is used)</li>
							<li>a <link href="xpath-nodetest">node test</link></li>
							<li><em>0-n</em> <link href="xpath-predicates"/> enclosed in <q><code>[]</code></q></li>
						</ol>
						<li>Location paths can be abbreviated</li>
						<ul>
							<li><code>child::</code> can be omitted (default axis)</li>
							<li><code>attribute::</code> can be written as <q><code>@</code></q></li>
							<li><q><code>.</code></q> is an abbreviation for <code>self::node()</code></li>
							<li><q><code>..</code></q> is an abbreviation for <code>parent::node()</code></li>
							<li><q><code>//</code></q> is an abbreviation for <code>/descendant-or-self::node()/</code></li>
						</ul>
					</ul>
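					<p>As a sketch of how the abbreviations expand, each abbreviated path below is followed by its unabbreviated equivalent:</p>
					<pre><![CDATA[//img/@src
/descendant-or-self::node()/child::img/attribute::src

../p[@class="warning"]
parent::node()/child::p[attribute::class="warning"]

.//a
self::node()/descendant-or-self::node()/child::a]]></pre>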
				</slide>
			</part>
			<part id="xpath-predicates">
				<title>Predicates</title>
				<slide>
					<title>Location Step Filters</title>
					<ul>
						<li>Predicates are filters for each location step</li>
						<ul>
							<li>there can be any number of filters (<em>0-n</em>)</li>
							<li>each filter is applied to each selected node individually</li>
						</ul>
						<li>Each predicate is an XPath expression and is evaluated as a boolean</li>
						<ul>
							<li>the context of this evaluation is the node for which the filter is evaluated</li>
							<li>if the result is a number, it is compared with the <code>position()</code> function (<code>/descendant::a[5]</code>)</li>
						</ul>
						<li>Predicates always reduce the set of selected nodes</li>
						<ul>
							<li>as corner cases, the set of selected nodes does not change or is empty</li>
							<li>predicates are used in the majority of non-trivial XPath location paths</li>
						</ul>
					</ul>
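					<p>Because a predicate is applied per location step, numeric predicates depend on the step they are attached to. A sketch with a made-up document:</p>
					<pre><![CDATA[<body>
  <div><p>A</p><p>B</p></div>
  <div><p>C</p><p class="warning">D</p></div>
</body>

//p[1]                      A and C (the first p child of each parent)
/descendant::p[1]           A (the first p among all descendants)
//p[2][@class="warning"]    D (second p child, then filtered by class)
//p[@class="warning"][1]    D (warning paragraphs, then the first of these per parent)
//p[@class="warning"][2]    empty node set]]></pre>
					<p>The last three lines show that predicates are applied in sequence, so their order can change the result.</p>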
				</slide>
				<slide>
					<title>Location Path Processing</title>
					<ul>
						<li>Location paths are processed in a very simple way</li>
						<ol>
							<li>start with a given context</li>
							<li>for each location step, repeat the following steps:</li>
							<li>based on the context and the axis, select the nodes on this axis</li>
							<li>reduce this selection to the nodes identified by the node test</li>
							<li>sequentially apply all filters to each of these nodes</li>
							<li>take the remaining node set as the context for the next location step</li>
						</ol>
					</ul>
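					<p>The processing steps can be traced by hand for a simple path; the following sketch uses a made-up document and the path <code>/body/div/p[2]</code>:</p>
					<pre><![CDATA[<body>
  <div><p>A</p><p>B</p></div>
  <div><p>C</p></div>
</body>

start     context: the root node
/body     child axis of the root: body            node test body: body
/div      child axis of body: whitespace text     node test div: both div elements
          nodes and two div elements
/p[2]     first div:  child axis and node test p: A, B    predicate [2]: B
          second div: child axis and node test p: C       predicate [2]: nothing
result    the p element containing B]]></pre>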
				</slide>
			</part>
		</part>
		<part id="xpath-expressions">
			<title>XPath Expressions</title>
			<slide>
				<title>Beyond Location Paths</title>
				<ul>
					<li>XPath is a full expression language</li>
					<ul>
						<li>any evaluated expression in XSLT is an XPath</li>
						<li>XPath must be able to operate on non-XML data types</li>
					</ul>
					<li>XPath uses a very simple data model</li>
					<ol>
						<li>node sets: <code>//img[not(@alt)]</code></li>
						<li>number: <code>count(//img)</code></li>
						<li>string: <code>string(/descendant::img[3]/@src)</code></li>
						<li>boolean: <code>starts-with(/html/@lang, 'en')</code></li>
					</ol>
				</ul>
			</slide>
			<slide>
				<title>XPath Usages</title>
				<ul>
					<li>XPath is used in different technologies</li>
					<ul>
						<li>XSLT uses XPath as its expression language</li>
						<li>XSDL uses XPath for selecting identity constraint nodes</li>
						<li>DOM uses XPath as a way to select DOM nodes</li>
					</ul>
					<li>Depending on the environment, expressions must yield certain results</li>
					<ul>
						<li>for conditionals, a boolean must be returned</li>
						<li>iterations (in XSLT) only loop over nodes</li>
						<li>when printing out text, a string must be produced</li>
					</ul>
					<li>XPath has built-in rules for casting types</li>
					<ul>
						<li>node set → boolean: empty is false, non-empty is true</li>
						<li>node → string: take the <em>string value</em> (i.e., concatenate all text node descendants)</li>
						<li>string → number: interpret as decimal notation (otherwise return <q><code>NaN</code></q>)</li>
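						<li>XPaths often return surprising results: in <code>//a[starts-with(@href, https)]</code>, the unquoted <code>https</code> is a location path selecting <code>https</code> child elements (converted to a string), not the string <code>'https'</code></li>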
					</ul>
				</ul>
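				<p>The conversion rules can also be invoked explicitly through the <code>boolean</code>, <code>string</code>, and <code>number</code> functions. A sketch of what XPath 1.0 returns for a few typical cases (the element names are only examples):</p>
				<pre><![CDATA[boolean(//img)              true if the document contains at least one img element
boolean(//nosuchelement)    false (empty node set)
boolean('false')            true (any non-empty string is true)
string(//title)             the string value of the first title element in document order
number('42')                42
number('forty-two')         NaN
'1' + 2                     3 (the string is converted to a number first)]]></pre>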
			</slide>
		</part>
		<part id="xpath-functions">
			<title>XPath Functions</title>
			<slide>
				<title>Function Library</title>
				<ul>
					<li>XPath has a small library of built-in functions</li>
					<ul>
						<li>useful for basic XPath-level functions</li>
						<li>other specs are allowed to extend it (XSLT does it)</li>
					</ul>
					<li>XPath functions return results of various data types</li>
					<ul>
						<li>boolean: <code>boolean, contains, false, lang, not, starts-with, true</code></li>
						<li>number: <code>ceiling, count, floor, last, number, position, round, string-length, sum</code></li>
						<li>string: <code>concat, local-name, name, namespace-uri, normalize-space, string, substring, substring-after, substring-before, translate</code></li>
						<li>node set: <code>id</code></li>
					</ul>
				</ul>
			</slide>
			<slide>
				<title>Using Functions</title>
				<ul>
					<li>Functions and location paths are orthogonal</li>
					<ul>
						<li>each construct may be based on the other</li>
						<li>it is possible to nest them arbitrarily</li>
						<li>predicates often contain functions</li>
						<pre>//a[substring(@href,string-length(@href)-2)='pdf']</pre>
					</ul>
					<li>XPaths can become powerful and complex</li>
					<ul>
						<li>writing some code or thinking about an XPath?</li>
						<li>XPaths are more declarative</li>
						<li>they may be more robust against changes in the XML schema</li>
						<li>they can be optimized by a smart XPath implementation</li>
					</ul>
				</ul>
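				<p>XPath 1.0 has <code>starts-with</code> but no corresponding function for suffixes, which is why the predicate above combines <code>substring</code> and <code>string-length</code>. A sketch of its evaluation for an assumed <code>href</code> value of <code>report.pdf</code>:</p>
				<pre><![CDATA[@href                          report.pdf (10 characters)
string-length(@href)           10
string-length(@href) - 2       8
substring(@href, 8)            pdf (characters 8 to 10)
substring(@href, 8) = 'pdf'    true]]></pre>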
			</slide>
		</part>
		<part>
			<title>Limitations of XPath</title>
			<slide>
				<title>XPath Selects</title>
				<ul>
					<li>Query languages select and recombine</li>
					<ol>
						<li>look up all addresses by zip code</li>
						<li>for each zip code, count the number of addresses</li>
					</ol>
					<li>XSLT fills in the missing parts (as a programming language)</li>
					<ul>
						<li>XSLT can construct XML and re-apply XPath</li>
					</ul>
					<li>XQuery fills in the missing parts (query-wise)</li>
					<ul>
						<li>80% of XQuery is XPath (in version 2.0, though)</li>
						<li>the remaining 20% are bindings, constructors, and glue</li>
					</ul>
				</ul>
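				<p>As an illustration of how XSLT fills in the missing part, the following XSLT 1.0 sketch counts addresses per zip code. It assumes an input document with <code>address</code> elements that each have a <code>zip</code> child; these names and the output vocabulary are hypothetical. The grouping uses an <code>xsl:key</code> together with <code>generate-id()</code>, a common XSLT 1.0 idiom:</p>
				<pre><![CDATA[<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- index all address elements by the value of their zip child -->
  <xsl:key name="by-zip" match="address" use="zip"/>
  <xsl:template match="/">
    <zips>
      <!-- visit only the first address of each zip code group -->
      <xsl:for-each select="//address[generate-id() = generate-id(key('by-zip', zip)[1])]">
        <!-- one result element per zip code, with the size of its group -->
        <zip code="{zip}" count="{count(key('by-zip', zip))}"/>
      </xsl:for-each>
    </zips>
  </xsl:template>
</xsl:stylesheet>]]></pre>
				<p>XPath alone can select the addresses for one given zip code, but constructing one result element per distinct zip code requires the surrounding language.</p>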
			</slide>
		</part>
		<part>
			<title>Conclusions</title>
			<slide>
				<title>XPath is Important</title>
				<ul>
					<li>XPath is a basic tool of the XML toolbox</li>
					<li>XPath is reused in various XML technologies</li>
					<li>XPath selects parts of an XML document</li>
					<li>XPath can do more general things by using expressions</li>
				</ul>
			</slide>
		</part>
	</presentation>
	<presentation id="xslt-1">
		<title short="XSLT 1">XSL Transformations (XSLT) – Part I</title>
		<date>2007-09-25</date>
		<toc class="resources"/>
		<toc class="abstract">Because XML can be used to represent any vocabulary (often defined by some schema), the question is how these different vocabularies can be processed and maybe transformed into something else. This <q>something else</q> may be another XML vocabulary (a common requirement in B2B scenarios), or it may be HTML (a common scenario for Web publishing). Using <em>XSL Transformations (XSLT)</em>, mapping tasks can be implemented easily. XSLT leverages XPath's expressive power in a rather simple programming language; its programs are often called <em>stylesheets</em>. For easy tasks, XSLT mappings can be specified without much real <q>programming</q> going on, by simply specifying how components of the source markup are mapped to components of the target markup.</toc>
		<slide>
			<title>Abstract</title>
			<p class="abstract"><toc class="abstract"/></p>
		</slide>
		<slide>
			<title>XPath and XSLT</title>
			<ul>
				<li>XPath is an expression language</li>
				<ul>
					<li>location paths let you select parts of an XML document tree</li>
					<li>expressions in general may have other data types as well (string, number, boolean)</li>
				</ul>
				<li>XSLT is a programming language based on XPath</li>
				<ul>
					<li>XSLT defines the structures for the control flow within the program</li>
					<li>in all the places where something is evaluated, XPaths are being used</li>
					<li>sometimes, one can substitute for the other</li>
				</ul>
			</ul>
			<listing src="xslt-vs-xpath.xsl" line="5-13"/>
		</slide>
		<slide>
			<title>XSLT Syntax</title>
			<img src="xml-technology-syntaxes.png" style="width : 90% ; margin : 4% ; "/>
		</slide>
		<slide>
			<title>XSLT Executive Summary</title>
			<ul>
				<li>XSLT is an XML-oriented programming language</li>
				<li>XSLT uses XML as its syntax</li>
				<li>XSLT is a weakly typed language</li>
				<li>XSLT is not designed for large programming tasks</li>
				<li>XSLT is the standard language for XML-to-XML transformations</li>
				<li>XSLT is very simple and often too simple</li>
				<li><link href="xslt20-1">XSLT 2.0</link> is much more complex and powerful</li>
			</ul>
		</slide>
		<sli