Extensible Markup Language (XML)

Information Systems and the World Wide Web

International School of New Media
University of Lübeck

Erik Wilde, UC Berkeley School of Information
2007-01-04
Creative Commons License

This work is licensed under a Creative Commons
Attribution-NonCommercial-ShareAlike 2.5 License.

Abstract

The Extensible Markup Language (XML) defines a simple way of structuring data. The power and popularity of XML can be explained by its versatility, its platform independence, the standards and technologies that build on it, and the number of tools and products supporting it. Understanding XML itself is rather simple, because it depends on only a very small set of other technologies; Unicode and URIs are its most important foundations. XML itself specifies two different things: a format for structured data, whose instances are called XML documents, and a constraint language for XML documents, called Document Type Definition (DTD).

Outline (Why XML?)

  1. Why XML? [9]
    1. Pre-XML Problems [2]
    2. XML on the Web [3]
    3. XML Today [2]
  2. What is XML? [12]
    1. What is XML Good for? [8]
    2. What is XML not Good for? [3]
  3. Foundations for XML [3]
  4. XML [15]
    1. XML Documents [11]
    2. Processing XML [3]
  5. Conclusions [1]

Web Technologies

From Humans to Machines

Outline (Pre-XML Problems)

HTML is for Humans

A Machine-Friendly Web

Outline (XML on the Web)

SGML, HTML, and XML

XML Documents on the Web

XML Documents Elsewhere

Outline (XML Today)

Used Everywhere

This Course and XML

Outline (What is XML?)

XML Yin & Yang

Outline (What is XML Good for?)

Why Use XML?

Case Study

Pre-XML Data

@misc{xml10fourth,
    author =    "Tim Bray and Jean Paoli and C. Michael Sperberg-McQueen and Eve Maler and Fran\c{c}ois Yergeau",
    title =     "Extensible Markup Language (XML) 1.0 (Fourth Edition)",
    howpublished =  "World Wide Web Consortium, Recommendation REC-xml-20060816",
    month =     aug,
    year =      2006,
    uri =       "http://www.w3.org/TR/2006/REC-xml-20060816",
    abstract =  "The Extensible Markup Language (XML) is a subset of SGML that is completely described in this document. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML."
}

XMLized Data (Bad Idea)

<?xml version="1.0" encoding="UTF-8"?>
<bibtex>
@misc{xml10fourth,
    author =    "Tim Bray and Jean Paoli and C. Michael Sperberg-McQueen and Eve Maler and Fran\c{c}ois Yergeau",
    title =     "Extensible Markup Language (XML) 1.0 (Fourth Edition)",
    howpublished =  "World Wide Web Consortium, Recommendation REC-xml-20060816",
    month =     aug,
    year =      2006,
    uri =       "http://www.w3.org/TR/2006/REC-xml-20060816",
    abstract =  "The Extensible Markup Language (XML) is a subset of SGML that is completely described in this document. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML."
}
</bibtex>

XMLized Data

 <reference id="xml10fourth" type="misc">
  <abstract>The Extensible Markup Language (XML) is a subset of SGML that is completely described in this document. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML.</abstract>
  <author>
   <person>
    <first>Tim</first>
    <last>Bray</last>
   </person>
   <person>
    <first>Jean</first>
    <last>Paoli</last>
   </person>
   <person>
    <first>C. Michael</first>
    <last>Sperberg-McQueen</last>
   </person>
   <person>
    <first>Eve</first>
    <last>Maler</last>
   </person>
   <person>
    <first>Fran\c{c}ois</first>
    <last>Yergeau</last>
   </person>
  </author>
  <howpublished>World Wide Web Consortium, Recommendation REC-xml-20060816</howpublished>
  <month>
   <macro ref="aug"/>
  </month>
  <title>Extensible Markup Language (XML) 1.0 (Fourth Edition)</title>
  <uri>http://www.w3.org/TR/2006/REC-xml-20060816</uri>
  <year>2006</year>
 </reference>
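
The difference between the two XMLized variants above can be sketched in a few lines of Python. This is only an illustration using the standard library's `xml.etree.ElementTree`; the shortened records, the `</bibtex>` end tag, and the helper name `last_names` are assumptions made for the example and are not part of the original slides.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the structured record (two of the five authors).
STRUCTURED = """<reference id="xml10fourth" type="misc">
 <author>
  <person><first>Tim</first><last>Bray</last></person>
  <person><first>Jean</first><last>Paoli</last></person>
 </author>
 <year>2006</year>
</reference>"""

# The "bad idea" variant: the whole BibTeX entry is one opaque text node.
WRAPPED = """<bibtex>
@misc{xml10fourth,
    author = "Tim Bray and Jean Paoli",
    year = 2006
}
</bibtex>"""


def last_names(xml_text):
    """Return the content of every <last> element, in document order."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter("last")]


print(last_names(STRUCTURED))       # names are addressable as elements
print(last_names(WRAPPED))          # no elements to address, only text
print(len(ET.fromstring(WRAPPED)))  # number of child elements of <bibtex>
```

With the wrapped variant, an application would still need a BibTeX parser to get at the authors; the XML layer adds nothing but a container.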

XML Data

 <reference name="xml10fourth" type="bibtex:misc">
  <names type="sharef:author">
   <person>
    <givenname>Tim</givenname>
    <surname>Bray</surname>
   </person>
   <person>
    <givenname>Jean</givenname>
    <surname>Paoli</surname>
   </person>
   <person>
    <givenname>C. Michael</givenname>
    <surname>Sperberg-McQueen</surname>
   </person>
   <person>
    <givenname>Eve</givenname>
    <surname>Maler</surname>
   </person>
   <person>
    <givenname>François</givenname>
    <surname>Yergeau</surname>
   </person>
  </names>
  <date value="2006-08"/>
  <abstract>
   <richtext>
    <p>The Extensible Markup Language (XML) is a subset of SGML that is completely described in this document. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML.</p>
   </richtext>
  </abstract>
  <howpublished>World Wide Web Consortium, Recommendation REC-xml-20060816</howpublished>
  <title type="sharef:primaryTitle">Extensible Markup Language (XML) 1.0 (Fourth Edition)</title>
  <identifier type="sharef:uri">http://www.w3.org/TR/2006/REC-xml-20060816</identifier>
 </reference>

Other XML Data

  <record>
   <work-type>
    <style face="normal" font="default" size="100%">World Wide Web Consortium, Recommendation REC-xml-20060816</style>
   </work-type>
   <ref-type>13</ref-type>
   <contributors>
    <authors>
     <author>
      <style face="normal" font="default" size="100%">Bray, Tim</style>
     </author>
     <author>
      <style face="normal" font="default" size="100%">Paoli, Jean</style>
     </author>
     <author>
      <style face="normal" font="default" size="100%">Sperberg-McQueen, C. Michael</style>
     </author>
     <author>
      <style face="normal" font="default" size="100%">Maler, Eve</style>
     </author>
     <author>
      <style face="normal" font="default" size="100%">Yergeau, François</style>
     </author>
    </authors>
   </contributors>
   <titles/>
   <dates>
    <year>
     <style face="normal" font="default" size="100%">2006</style>
    </year>
    <pub-dates>
     <date>
      <style face="normal" font="default" size="100%">2006-08</style>
     </date>
    </pub-dates>
   </dates>
   <abstract>
    <style face="normal" font="default" size="100%">The Extensible Markup Language (XML) is a subset of SGML that is completely described in this document. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML.</style>
   </abstract>
   <urls/>
  </record>

Is XML Self-Describing?

Outline (What is XML not Good for?)

XML is Character-Based

XML is a Syntax for Trees

XML Usages

Outline (Foundations for XML)

Identifications

XML's Idea of Content and Names

XML documents can use a wide array of characters. These characters are defined by Unicode, which currently (Version 5.0) defines more than 100,000 characters (character number 100,000 was added in 2005).

<?xml version="1.0" encoding="UTF-8"?>
<JAPANESE>
 <TITLE>専門家リスト </TITLE>
 <ITEM>アシム・アブドゥラー氏(コマースネット事務局長)</ITEM>
 <ITEM>アラン・A・メッコラー氏(メッコラーメディア会長兼CEO)</ITEM>
 <ITEM>アラン・サルディッチ氏(メトリコムディレクター)</ITEM>
 <ITEM>ウィスター・ウォルコット氏(パイロットネットワーク・サービシズ副社長)</ITEM>
 <ITEM>・エリック・リンゲワルド氏(ビー・インク副社長)</ITEM>
 <ITEM>ジェームス・L・バークスデール氏(ネットスケープ・コミュニケーションズ社長)</ITEM>
</JAPANESE>

<?xml version="1.0" encoding="UTF-8"?>
<文書 改訂日付="1999年3月1日">
 <題>サンプル</題>
 <段落>これはサンプル文書です。</段落>
 <!-- コメント -->
 <段落>会社名</段落>
 <図面 図面実体名="サンプル" />
</文書>
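
As a small check that names as well as content may use non-ASCII characters, the following sketch parses a shortened form of the second document above with Python's standard `xml.etree.ElementTree`. The variable names and the shortening of the document are choices made for this example.

```python
import xml.etree.ElementTree as ET

# Shortened form of the second document, encoded as UTF-8 bytes so that
# the bytes match the encoding declaration.
doc = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<文書 改訂日付="1999年3月1日">\n'
    ' <題>サンプル</題>\n'
    '</文書>\n'
).encode("utf-8")

root = ET.fromstring(doc)
title = root[0]  # first child element of the document element

print(root.tag)     # element name
print(root.attrib)  # attribute name and value
print(title.tag, title.text)
```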

XML and Unicode

Outline (XML)

XML Use Cases

Outline (XML Documents)

Markup?

Basic Concepts

<?xml version="1.0" encoding="UTF-8"?>
<element>
 <subelement attribute="value">Content</subelement>
 <subelement a2="value2">More Content</subelement>
 <empty-element a3="v3"></empty-element>
 <empty-element a4="v4" a5="v5"/>
</element>
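
The document above can be read into a tree and inspected with a few lines of Python. This is a minimal sketch using the standard library's `xml.etree.ElementTree`; the XML declaration is left out to keep the string literal simple, and the variable names are chosen for this example.

```python
import xml.etree.ElementTree as ET

doc = """<element>
 <subelement attribute="value">Content</subelement>
 <subelement a2="value2">More Content</subelement>
 <empty-element a3="v3"></empty-element>
 <empty-element a4="v4" a5="v5"/>
</element>"""

root = ET.fromstring(doc)

# One (name, attributes, text) triple per child element, in document order.
children = [(child.tag, dict(child.attrib), child.text) for child in root]

for name, attributes, text in children:
    print(name, attributes, repr(text))
```

Both spellings of the empty element produce an element without text and without children, so an application working on the tree cannot tell them apart.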

Tree Syntax

Elements

Attributes

 <section id="xml" author="bob">
  <title>Extensible Markup Language (XML)</title>
  <p>XML is based on SGML (Section <ref name="sgml"/>) ...</p>
  <p type="example">XML can be used ...</p>
  <section id="xml-syntax" author="dret">
   <title>XML Syntax</title>
   <p>Section <ref name="sgml-syntax"/> describes ...</p>
  </section>
 </section>
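
Attributes are typically read by name on the element that carries them. The following sketch, again assuming Python's standard `xml.etree.ElementTree`, collects the `id` and `author` attributes of all sections and the `name` attributes of all references in the fragment above; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

doc = """<section id="xml" author="bob">
 <title>Extensible Markup Language (XML)</title>
 <p>XML is based on SGML (Section <ref name="sgml"/>) ...</p>
 <p type="example">XML can be used ...</p>
 <section id="xml-syntax" author="dret">
  <title>XML Syntax</title>
  <p>Section <ref name="sgml-syntax"/> describes ...</p>
 </section>
</section>"""

root = ET.fromstring(doc)

# iter() visits the starting element itself and all of its descendants.
authors_by_id = {s.get("id"): s.get("author") for s in root.iter("section")}
ref_names = [r.get("name") for r in root.iter("ref")]

# An attribute that is not present yields the supplied default value.
first_p_type = root.find("p").get("type", "(no type)")

print(authors_by_id)
print(ref_names)
print(first_p_type)
```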

Attribute Syntax

The Price for Markup

<li>Attribute using both kinds of quotes: <code>&lt;elem attr="Single ' and Double &amp;quot;"/></code></li>
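
The list item above is markup that talks about markup, so it is escaped twice: once for the attribute it shows and once more for the page that displays it. The sketch below peels off both layers with Python's standard library and then re-creates the inner attribute with `xml.sax.saxutils.quoteattr`. The variable names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# The list item, copied verbatim.
outer = (
    "<li>Attribute using both kinds of quotes: "
    "<code>&lt;elem attr=\"Single ' and Double &amp;quot;\"/></code></li>"
)

# First parse: the parser undoes one level of escaping (&lt; and &amp;).
inner = ET.fromstring(outer).find("code").text

# Second parse: the displayed text is itself a small XML document.
value = ET.fromstring(inner).get("attr")

print(inner)             # what a reader of the page sees
print(value)             # the actual attribute value
print(quoteattr(value))  # the value escaped again for use in an attribute
```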

Mixed Content

The term Mixed content in XML refers to elements which have text content mixed with elements. What these elements do depends on the elements, but the important point is that they are on the same level as the text nodes of the mixed content.

<p>The term <em>Mixed content</em> in XML refers to elements <a href="http://www.w3.org/TR/xml/#sec-mixed-content">which have text content mixed with elements</a>. What these elements do depends on the elements <img style="height : 1em" src="smily.gif"/>, but the important point is that they are on the same level as the text nodes of the mixed content.</p>
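
How mixed content shows up in a tree depends on the API. As one illustration, Python's `xml.etree.ElementTree` has no separate text nodes: text before the first child is stored in the parent's `text`, and text following a child is stored in that child's `tail`. The sketch below uses a shortened version of the paragraph above; the `pieces` list is just a convenience for this example.

```python
import xml.etree.ElementTree as ET

# Shortened version of the paragraph above.
p = ET.fromstring(
    '<p>The term <em>Mixed content</em> refers to elements '
    '<a href="http://www.w3.org/TR/xml/#sec-mixed-content">which have text '
    'content mixed with elements</a>.</p>'
)

# Walk the children of <p> in document order, interleaving text and elements.
pieces = [("text", p.text)]
for child in p:
    pieces.append((child.tag, child.text))
    pieces.append(("text", child.tail))

for kind, content in pieces:
    print(kind, repr(content))

# itertext() concatenates all text in document order, ignoring the markup.
plain = "".join(p.itertext())
print(plain)
```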

Mixed Content Usage

Whitespace

Significant Whitespace

Whitespace can be very important!

<p>Whitespace <i>can be</i> <u>very</u> <b>important</b>!</p>
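
The following sketch shows what happens when whitespace-only text between elements is discarded, as some tools do when they treat it as mere indentation. It uses Python's standard `xml.etree.ElementTree`; the stripping loop is an assumption made for this example and does not describe any particular tool.

```python
import xml.etree.ElementTree as ET

p = ET.fromstring(
    "<p>Whitespace <i>can be</i> <u>very</u> <b>important</b>!</p>"
)

# Text that follows each child element (stored in the child's tail).
tails = [child.tail for child in p]
original = "".join(p.itertext())

# Drop every tail that consists of whitespace only.
for child in p:
    if child.tail is not None and child.tail.strip() == "":
        child.tail = None
stripped = "".join(p.itertext())

print(tails)
print(original)
print(stripped)
```

In mixed content the single spaces between the inline elements are the only thing separating the words, so dropping them changes the text.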

Outline (Processing XML)

Observing XML Syntax

Validity

Semantics

Outline (Conclusions)

XML Documents