The Extensible Markup Language (XML) defines a simple way of structuring data. The power and popularity of XML can be explained by its versatility, its platform independence, the standards and technologies that build on it, and the number of tools and products that support it. Understanding XML itself is rather simple because it depends on only a very small set of other technologies, the most important of which are Unicode and URIs. XML itself specifies two different things: a format for structured data, whose instances are called XML documents, and a constraint language for XML documents, called Document Type Definition (DTD).
"understand" HTML pages
dead ends
dead end (from a machine's point of view)
online
"understanding" is the key term here: application semantics!
SGML on the Web
XML is the ASCII for the 21st century
"security through obscurity" principle inadvertently
@misc{xml10fourth,
  author = "Tim Bray and Jean Paoli and C. Michael Sperberg-McQueen and Eve Maler and Fran\c{c}ois Yergeau",
  title = "Extensible Markup Language (XML) 1.0 (Fourth Edition)",
  howpublished = "World Wide Web Consortium, Recommendation REC-xml-20060816",
  month = aug,
  year = 2006,
  uri = "http://www.w3.org/TR/2006/REC-xml-20060816",
  abstract = "The Extensible Markup Language (XML) is a subset of SGML that is completely described in this document. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML."
}
<reference id="xml10fourth" type="misc">
  <abstract>The Extensible Markup Language (XML) is a subset of SGML that is completely described in this document. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML.</abstract>
  <author>
    <person>
      <first>Tim</first>
      <last>Bray</last>
    </person>
    <person>
      <first>Jean</first>
      <last>Paoli</last>
    </person>
    <person>
      <first>C. Michael</first>
      <last>Sperberg-McQueen</last>
    </person>
    <person>
      <first>Eve</first>
      <last>Maler</last>
    </person>
    <person>
      <first>Fran\c{c}ois</first>
      <last>Yergeau</last>
    </person>
  </author>
  <howpublished>World Wide Web Consortium, Recommendation REC-xml-20060816</howpublished>
  <month>
    <macro ref="aug"/>
  </month>
  <title>Extensible Markup Language (XML) 1.0 (Fourth Edition)</title>
  <uri>http://www.w3.org/TR/2006/REC-xml-20060816</uri>
  <year>2006</year>
</reference>
<reference name="xml10fourth" type="bibtex:misc">
  <names type="sharef:author">
    <person>
      <givenname>Tim</givenname>
      <surname>Bray</surname>
    </person>
    <person>
      <givenname>Jean</givenname>
      <surname>Paoli</surname>
    </person>
    <person>
      <givenname>C. Michael</givenname>
      <surname>Sperberg-McQueen</surname>
    </person>
    <person>
      <givenname>Eve</givenname>
      <surname>Maler</surname>
    </person>
    <person>
      <givenname>François</givenname>
      <surname>Yergeau</surname>
    </person>
  </names>
  <date value="2006-08"/>
  <abstract>
    <richtext>
      <p>The Extensible Markup Language (XML) is a subset of SGML that is completely described in this document. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML.</p>
    </richtext>
  </abstract>
  <howpublished>World Wide Web Consortium, Recommendation REC-xml-20060816</howpublished>
  <title type="sharef:primaryTitle">Extensible Markup Language (XML) 1.0 (Fourth Edition)</title>
  <identifier type="sharef:uri">http://www.w3.org/TR/2006/REC-xml-20060816</identifier>
</reference>
"understood" to make the mapping
<record>
  <work-type>
    <style face="normal" font="default" size="100%">World Wide Web Consortium, Recommendation REC-xml-20060816</style>
  </work-type>
  <ref-type>13</ref-type>
  <contributors>
    <authors>
      <author>
        <style face="normal" font="default" size="100%">Bray, Tim</style>
      </author>
      <author>
        <style face="normal" font="default" size="100%">Paoli, Jean</style>
      </author>
      <author>
        <style face="normal" font="default" size="100%">Sperberg-McQueen, C. Michael</style>
      </author>
      <author>
        <style face="normal" font="default" size="100%">Maler, Eve</style>
      </author>
      <author>
        <style face="normal" font="default" size="100%">Yergeau, François</style>
      </author>
    </authors>
  </contributors>
  <titles/>
  <dates>
    <year>
      <style face="normal" font="default" size="100%">2006</style>
    </year>
    <pub-dates>
      <date>
        <style face="normal" font="default" size="100%">2006-08</style>
      </date>
    </pub-dates>
  </dates>
  <abstract>
    <style face="normal" font="default" size="100%">The Extensible Markup Language (XML) is a subset of SGML that is completely described in this document. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML.</style>
  </abstract>
  <urls/>
</record>
self-describing
self-explanatory
describing
self-describing
"self-describing" and
self-explanatory?
"self-describing" means you can guess a lot, but you may be wrong
"views" of the same content
bad XML, complain about it
XML documents can use a wide array of characters. These characters are defined by Unicode, which currently (Version 5.0) defines more than 100'000 characters (#100'000 added in 2005).
<?xml version="1.0" encoding="UTF-8"?>
<JAPANESE>
  <TITLE>専門家リスト </TITLE>
  <ITEM>アシム・アブドゥラー氏(コマースネット事務局長)</ITEM>
  <ITEM>アラン・A・メッコラー氏(メッコラーメディア会長兼CEO)</ITEM>
  <ITEM>アラン・サルディッチ氏(メトリコムディレクター)</ITEM>
  <ITEM>ウィスター・ウォルコット氏(パイロットネットワーク・サービシズ副社長)</ITEM>
  <ITEM>・エリック・リンゲワルド氏(ビー・インク副社長)</ITEM>
  <ITEM>ジェームス・L・バークスデール氏(ネットスケープ・コミュニケーションズ社長)</ITEM>
</JAPANESE>
<?xml version="1.0" encoding="UTF-8"?>
<文書 改訂日付="1999年3月1日">
  <題>サンプル</題>
  <段落>これはサンプル文書です。</段落>
  <!-- コメント -->
  <段落>会社名</段落>
  <図面 図面実体名="サンプル" />
</文書>
encoded?
<?xml version="1.0" encoding="UTF-8"?>
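What the encoding declaration promises can be checked with a short sketch. Python and its standard `xml.etree.ElementTree` module are assumptions made for illustration, as are the variable names; the sample is a shortened form of the document with Japanese element names shown earlier. The characters themselves are Unicode, and the encoding only determines how they are serialized into bytes.

```python
import xml.etree.ElementTree as ET

# Shortened sample document, held as Unicode text.
DOC = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<文書 改訂日付="1999年3月1日">'
    '<題>サンプル</題>'
    '<段落>これはサンプル文書です。</段落>'
    '</文書>'
)

# Serializing to bytes is where the encoding matters:
# the bytes must actually be in the encoding the declaration names.
data = DOC.encode("utf-8")

# The parser reads the declaration and decodes the bytes back to characters.
root = ET.fromstring(data)
title = root.find("題").text

# Characters and bytes are different things: each character of the
# title takes three bytes in UTF-8.
chars = len(title)
octets = len(title.encode("utf-8"))
print(root.tag, root.get("改訂日付"), title, chars, octets)
```

Element names, attribute names, and content can all use non-ASCII characters; only the markup characters themselves are fixed.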
<?xml version="1.0" encoding="UTF-8"?>
<element>
  <subelement attribute="value">Content</subelement>
  <subelement a2="value2">More Content</subelement>
  <empty-element a3="v3"></empty-element>
  <empty-element a4="v4" a5="v5"/>
</element>
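A parser makes the structure of this sample explicit. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module (the choice of Python and the variable names are assumptions for illustration). It lists the child elements with their attributes and shows that the two ways of writing an empty element are equivalent after parsing.

```python
import xml.etree.ElementTree as ET

DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<element>
  <subelement attribute="value">Content</subelement>
  <subelement a2="value2">More Content</subelement>
  <empty-element a3="v3"></empty-element>
  <empty-element a4="v4" a5="v5"/>
</element>"""

root = ET.fromstring(DOC)

# One (name, attributes, text) triple per child element.
summary = [(child.tag, dict(child.attrib), child.text) for child in root]
for entry in summary:
    print(entry)

# <e></e> and <e/> both yield an element without text and without children.
first_empty, second_empty = root.findall("empty-element")
same_shape = (first_empty.text, len(first_empty)) == (second_empty.text, len(second_empty))
```

Attribute order is not significant in XML, which is why the attributes are exposed as a mapping and not as a list.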
<address><city>Berkeley</city><zip>94709</zip>...
<givenname>Erik</givenname><givenname>Thomas</givenname>
content
<section id="xml" author="bob">
  <title>Extensible Markup Language (XML)</title>
  <p>XML is based on SGML (Section <ref name="sgml"/>) ...</p>
  <p type="example">XML can be used ...</p>
  <section id="xml-syntax" author="dret">
    <title>XML Syntax</title>
    <p>Section <ref name="sgml-syntax"/> describes ...</p>
  </section>
</section>
< opens a tag
&name; for referring to the entity named name
&lt;, &gt;, &amp;, &apos;, &quot; (for <, >, &, ', ")
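The effect of the markup characters and the predefined entities can be tried out with a small sketch. It assumes Python and its standard `xml.sax.saxutils` and `xml.etree.ElementTree` modules; the sample string and variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

# Text that contains markup characters cannot be written into a document as-is.
raw = "if a < b && b > c"

# escape() replaces &, < and > with the corresponding entity references.
escaped = escape(raw)

# Once escaped, the text can be embedded; the parser resolves the references again.
elem = ET.fromstring(f"<code>{escaped}</code>")
round_trip = elem.text

# All five predefined entities are resolved by any XML parser.
predefined = ET.fromstring("<t>&lt;&gt;&amp;&apos;&quot;</t>").text

print(escaped)
print(round_trip)
print(predefined)
```

Only `<` and `&` must always be escaped in element content; the other three entities exist mainly for attribute values and for symmetry.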
<elem attr="Single ' and Double &quot;"/>
<li>Attribute using both kinds of quotes: <code>&lt;elem attr="Single ' and Double &amp;quot;"/&gt;</code></li>
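An attribute value that contains both kinds of quotes needs at least one of them written as an entity reference. The sketch below assumes Python and its standard `xml.sax.saxutils.quoteattr` helper, which picks the delimiters and escapes as needed; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# The attribute value as the application sees it.
value = "Single ' and Double \""

# quoteattr() returns the value including its delimiters; since both kinds
# of quotes occur, it delimits with double quotes and escapes the inner one.
quoted = quoteattr(value)
markup = f"<elem attr={quoted}/>"

# Parsing the markup returns the original value.
parsed = ET.fromstring(markup).get("attr")

print(markup)
print(parsed)
```

The escaping is purely a matter of syntax: after parsing, the application never sees the entity reference.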
The term Mixed content in XML refers to elements which have text content mixed with elements. What these elements do depends on the elements, but the important point is that they are on the same level as the text nodes of the mixed content.
<p>The term <em>Mixed content</em> in XML refers to elements <a href="http://www.w3.org/TR/xml/#sec-mixed-content">which have text content mixed with elements</a>. What these elements do depends on the elements <img style="height : 1em" src="smily.gif"/>, but the important point is that they are on the same level as the text nodes of the mixed content.</p>
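How a parser exposes mixed content depends on the API. As one possible illustration, Python's standard `xml.etree.ElementTree` (an assumption here, not something the text prescribes) stores text that precedes the first child in `text` and text that follows a child in that child's `tail`. The sample is a shortened form of the paragraph above.

```python
import xml.etree.ElementTree as ET

P = (
    '<p>The term <em>Mixed content</em> in XML refers to elements '
    '<a href="http://www.w3.org/TR/xml/#sec-mixed-content">which have '
    'text content mixed with elements</a>.</p>'
)

p = ET.fromstring(P)

# Text before the first child element.
leading = p.text

# Each child with its own text and the text that follows it.
children = [(child.tag, child.text, child.tail) for child in p]

# Text and element content interleaved in document order.
flat = "".join(p.itertext())

print(repr(leading))
for entry in children:
    print(entry)
print(flat)
```

The text pieces and the child elements are siblings, which is why dropping the elements still leaves a readable sentence.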
Whitespace can be very important!
<p>Whitespace <i>can be</i> <u>very</u> <b>important</b>!</p>
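Why the whitespace matters here can be shown with a short sketch, again assuming Python's standard `xml.etree.ElementTree`; the helper `flatten` is hypothetical and written only for this illustration. The spaces between the inline elements are text nodes of their own, and a tool that discards whitespace-only text changes the content.

```python
import xml.etree.ElementTree as ET

P = "<p>Whitespace <i>can be</i> <u>very</u> <b>important</b>!</p>"
p = ET.fromstring(P)

def flatten(elem, drop_whitespace_only=False):
    """Concatenate all text below elem, optionally skipping whitespace-only pieces."""
    parts = []
    for piece in elem.itertext():
        if drop_whitespace_only and piece.strip() == "":
            continue
        parts.append(piece)
    return "".join(parts)

kept = flatten(p)
dropped = flatten(p, drop_whitespace_only=True)

print(kept)     # Whitespace can be very important!
print(dropped)  # Whitespace can beveryimportant!
```

In data-oriented documents such whitespace is usually just indentation, but in mixed content it is part of the text, and a parser cannot tell the two cases apart without a schema or application knowledge.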