This work is licensed under a CC Attribution 3.0 Unported License [http://creativecommons.org/licenses/by/3.0/]
SGML light
The XML specification defines a format for structured data (XML documents) and a grammar-based constraint language for such documents (DTDs). In SGML-based systems, DTDs were often very complex, feature-rich constructs that controlled much of the processing of SGML documents. XML greatly simplified DTDs, and de facto usage of DTDs today has simplified them even more. In many systems today, DTDs are either not used at all or are generated from sample documents. This lecture argues that DTDs (or schemas, to be more general) should be taken seriously in any non-trivial XML application, because they represent the application's underlying (and often underspecified) data model.
types of documents: well-formed and valid
<address> <name short="iSchool">School of Information</name> <voice>(510) 642-1464</phone> <fax>(510) 642-1464</fax> <website>http://ischool.berkeley.edu/</website> <postal>...</postal> </address>
<address> <name short="iSchool">School of Information</name> <voice>(510) 642-1464</voice> <fax>(510) 642-1464</fax> <website>http://ischool.berkeley.edu/</website> <postal>...</postal> </address>
<address> <name short="iSchool">School of Information</name> <phone type="voice">(510) 642-1464</phone> <phone type="fax">(510) 642-1464</phone> <website>http://ischool.berkeley.edu/</website> <postal>...</postal> </address>
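The difference between the first address example and the other two can be checked with any XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` and the shortened documents are assumptions made for illustration. The parser is non-validating, so it can only tell whether a document is well-formed, not whether it is valid against `address.dtd`.

```python
import xml.etree.ElementTree as ET

# Shortened versions of the three address examples (illustrative only).
MISMATCHED = ('<address> <name short="iSchool">School of Information</name> '
              '<voice>(510) 642-1464</phone> </address>')
MATCHED = ('<address> <name short="iSchool">School of Information</name> '
           '<voice>(510) 642-1464</voice> </address>')
TYPED_PHONE = ('<address> <name short="iSchool">School of Information</name> '
               '<phone type="voice">(510) 642-1464</phone> </address>')


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


for label, doc in [("mismatched tags", MISMATCHED),
                   ("matched tags", MATCHED),
                   ("typed phone", TYPED_PHONE)]:
    print(label, is_well_formed(doc))
```

Only the first document is rejected. The second and third both pass, because deciding between them needs the grammar in the DTD, which a well-formedness check never consults.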
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE address SYSTEM "address.dtd">
<!ELEMENT address (name, phone*, website*, postal?)> <!ELEMENT name (#PCDATA)> <!ATTLIST name short CDATA #REQUIRED > <!ELEMENT phone (#PCDATA)> <!ATTLIST phone type ( voice | fax ) #REQUIRED > <!ELEMENT postal (#PCDATA)> <!ELEMENT website (#PCDATA)>
SGML light
The document type declaration contains or points to markup declarations that provide a grammar for a class of documents
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE address SYSTEM "address.dtd">
<!ELEMENT address (name, phone*, website*, postal?)> <!ELEMENT name (#PCDATA)> <!ATTLIST name short CDATA #REQUIRED > <!ELEMENT phone (#PCDATA)> <!ATTLIST phone type ( voice | fax ) #REQUIRED > <!ELEMENT postal (#PCDATA)> <!ELEMENT website (#PCDATA)>
<!ELEMENT example EMPTY>
thus is a complete DTD
<!… > syntax
ELEMENT is used to define an element [Defining Elements (1)]
ATTLIST is used to define an attribute list [Defining Attribute Lists (1)]
ENTITY is used to define an entity [Entities (1)]
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE address SYSTEM "address.dtd"> <address>
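, sequence (the parts must appear in the given order)
| choice (exactly one of the alternatives)
? optional (zero or one occurrence)
+ one or more occurrences
* zero or more occurrences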
<!ELEMENT table (caption?, (col*|colgroup*), thead?, tfoot?, (tbody+|tr+))> <!ELEMENT caption %Inline;> <!ELEMENT thead (tr)+> <!ELEMENT tfoot (tr)+> <!ELEMENT tbody (tr)+> <!ELEMENT colgroup (col)*> <!ELEMENT col EMPTY> <!ELEMENT tr (th|td)+> <!ELEMENT th %Flow;> <!ELEMENT td %Flow;>
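Element content models are regular expressions over child element names, which suggests one way to see how they constrain a document. The sketch below translates a content model into a Python regular expression and matches it against a sequence of child names. The helpers `model_to_regex` and `children_match` are hypothetical names introduced here; the sketch covers element content only (no `#PCDATA`, `EMPTY`, or parameter entities) and is not how a real validating parser is implemented.

```python
import re

NAME = re.compile(r"[A-Za-z_][\w.\-]*")


def model_to_regex(model):
    """Translate an element content model into a compiled regular expression."""
    compact = re.sub(r"\s+", "", model)
    # Each element name becomes a group that matches the name plus one space,
    # so that "col" can never match the beginning of "colgroup".
    body = NAME.sub(lambda m: "(?:" + re.escape(m.group()) + " )", compact)
    # A comma means sequence, which in a regex is plain concatenation.
    return re.compile(body.replace(",", ""))


def children_match(model, names):
    """Check whether a list of child element names satisfies the model."""
    sequence = "".join(name + " " for name in names)
    return model_to_regex(model).fullmatch(sequence) is not None


ADDRESS = "(name, phone*, website*, postal?)"
TABLE = "(caption?, (col*|colgroup*), thead?, tfoot?, (tbody+|tr+))"

print(children_match(ADDRESS, ["name", "phone", "phone", "website", "postal"]))
print(children_match(ADDRESS, ["name", "voice", "fax", "website", "postal"]))
print(children_match(TABLE, ["colgroup", "tr", "tr"]))
print(children_match(TABLE, ["caption"]))
```

The second and fourth calls fail: `voice` and `fax` are not part of the address model, and a table needs at least one `tbody` or `tr`.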
<!ELEMENT x (#PCDATA | a | b | …)* >
<!ELEMENT address (#PCDATA | %inline; | %misc.inline; | p)*>
<!ELEMENT style (#PCDATA)>
<!ELEMENT img EMPTY> <!ATTLIST img %attrs; src %URI; #REQUIRED alt %Text; #REQUIRED name NMTOKEN #IMPLIED longdesc %URI; #IMPLIED height %Length; #IMPLIED width %Length; #IMPLIED usemap %URI; #IMPLIED ismap (ismap) #IMPLIED align %ImgAlign; #IMPLIED border %Length; #IMPLIED hspace %Pixels; #IMPLIED vspace %Pixels; #IMPLIED >
<!ELEMENT param EMPTY> <!ATTLIST param id ID #IMPLIED name CDATA #REQUIRED value CDATA #IMPLIED valuetype (data|ref|object) "data" type %ContentType; #IMPLIED >
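The `valuetype` attribute of `param` declares the default value `"data"`. The following sketch, assuming Python's standard `xml.etree.ElementTree` (which uses the non-validating Expat parser), shows how a default declared in the internal subset reaches the application. The sample document and the shortened attribute list are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Internal subset with a shortened version of the param attribute list.
DTD = """<!DOCTYPE param [
<!ELEMENT param EMPTY>
<!ATTLIST param
  name      CDATA             #REQUIRED
  value     CDATA             #IMPLIED
  valuetype (data|ref|object) "data">
]>
"""

# valuetype is not written in the document; the parser supplies the default.
param = ET.fromstring(DTD + '<param name="movie"/>')
print(param.attrib)

# The parser does not validate: a missing #REQUIRED attribute is not reported.
incomplete = ET.fromstring(DTD + "<param/>")
print(incomplete.get("name"), incomplete.get("valuetype"))
```

The default changes what the application sees, so the same document can look different depending on whether the parser has read the declarations.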
simulated
accept-charset %Charsets; #IMPLIED
<!ENTITY % Charsets "CDATA"> <!-- a space separated list of character encodings, as per [RFC2045] -->
#REQUIRED means the attribute has to be specified (on every element)
#IMPLIED marks an optional attribute (the parser may imply a value)
"…" specifies a default value (and the attribute is optional)
ID is an attribute type [Attribute Types (1)] declared in the DTD
xml:id [XQuery – Part IV; XML IDs with xml:id (1)] is an attempt to support schema-independent IDs
<document> <section id="sgml" author="dret"> <title>Standard Generalized Markup Language (SGML)</title> <p>SGML is an ISO standard ...</p> <section id="sgml-syntax" author="bob"> <title>SGML Syntax</title> <p>SGML uses markup, which is ...</p> </section> </section> <section id="xml" author="bob"> <title>Extensible Markup Language (XML)</title> <p>XML is based on SGML (Section <ref name="sgml"/>) ...</p> <p type="example">XML can be used ...</p> <section id="xml-syntax" author="dret"> <title>XML Syntax</title> <p>Section <ref name="sgml-syntax"/> describes ...</p>
<!ELEMENT section ( title, p+, section* ) > <!ATTLIST section id ID #REQUIRED author CDATA #REQUIRED > <!ELEMENT title ( #PCDATA )> <!ELEMENT p ( #PCDATA | ref )*> <!ATTLIST p type CDATA #IMPLIED > <!ELEMENT ref EMPTY > <!ATTLIST ref name IDREF #REQUIRED >
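The two constraints a validating parser enforces for ID and IDREF attributes are that ID values are unique within the document and that every IDREF value matches some ID. A minimal sketch of these checks in Python follows. It hard-codes what the DTD above declares (`section/@id` is an ID, `ref/@name` is an IDREF), because a non-validating parser has no knowledge of attribute types. The helper `check_ids` and the shortened document, including its dangling reference to `xslt`, are hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical document following the section/ref DTD.
DOC = """<document>
  <section id="sgml" author="dret">
    <title>SGML</title>
    <p>SGML is an ISO standard ...</p>
  </section>
  <section id="xml" author="bob">
    <title>XML</title>
    <p>XML is based on SGML (Section <ref name="sgml"/>) ...</p>
    <p>See also Section <ref name="xslt"/>.</p>
  </section>
</document>"""


def check_ids(root):
    """Return a list of ID/IDREF violations found in the document."""
    errors = []
    seen = set()
    # ID values must be unique across the whole document.
    for section in root.iter("section"):
        value = section.get("id")
        if value in seen:
            errors.append("duplicate ID: " + value)
        seen.add(value)
    # Every IDREF value must match an ID somewhere in the document.
    for ref in root.iter("ref"):
        target = ref.get("name")
        if target not in seen:
            errors.append("dangling IDREF: " + target)
    return errors


print(check_ids(ET.fromstring(DOC)))
```

The check needs the complete document before it can report a dangling reference, since an IDREF may point forward to an ID that appears later.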
Hotspot can generate links to sections such as the section about ID/IDREF [ID/IDREF (1)], this link is then translated into the appropriate HTML code, meaning a link with the target being a fragment identifier to the slide number.
<p>Hotspot can generate links to sections such as the section about <link href="ididref"/>, this link is then translated into the appropriate HTML code, meaning a link with the target being a fragment identifier to the slide number.</p>
After running Hotspot, the following HTML is generated:
<p>Hotspot can generate links to sections such as the section about <a href="#ididref">ID/IDREF</a>, this link is then translated into the appropriate HTML code, meaning a link with the target being a fragment identifier to the slide number.</p>
&entity-name;
<!ENTITY aacute "á"> <!-- latin small letter a with acute, U+00E1 ISOlat1 --> <!ENTITY acirc "â"> <!-- latin small letter a with circumflex, U+00E2 ISOlat1 --> <!ENTITY atilde "ã"> <!-- latin small letter a with tilde, U+00E3 ISOlat1 --> <!ENTITY auml "ä"> <!-- latin small letter a with diaeresis, U+00E4 ISOlat1 -->
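To see how a parser resolves general entities and character references, the following sketch uses Python's standard `xml.etree.ElementTree`. The sample document and its internal subset are assumptions made for illustration. The entity values are written as character references, and the last two references in the paragraph are the decimal and hexadecimal forms of U+263A.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE p [
<!ENTITY aacute "&#225;"> <!-- latin small letter a with acute, U+00E1 -->
<!ENTITY auml   "&#228;"> <!-- latin small letter a with diaeresis, U+00E4 -->
]>
<p>&aacute; &auml; &#9786; &#x263A;</p>"""

# The application sees only the replacement text, not the references.
p = ET.fromstring(DOC)
print(p.text)

# Referencing an entity that was never declared is a well-formedness error.
try:
    ET.fromstring("<p>&nbsp;</p>")
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False
print(undeclared_ok)
```

Character references work without any declarations, which is why they remain usable in documents that have no DTD at all.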
&#9786; or &#x263A; = ☺
%entity-name;
duct tape, not elegant, but effective
<!ELEMENT p %Inline;> <!ATTLIST p %attrs; %TextAlign; >
<!ENTITY % attrs "%coreattrs; %i18n; %events;">
<!ENTITY % coreattrs "id ID #IMPLIED class CDATA #IMPLIED style %StyleSheet; #IMPLIED title %Text; #IMPLIED" >
<!ENTITY % i18n "lang %LanguageCode; #IMPLIED xml:lang %LanguageCode; #IMPLIED dir (ltr|rtl) #IMPLIED" >
<!ENTITY % LanguageCode "NMTOKEN"> <!-- a language code, as per [RFC3066] -->
<!ENTITY % TextAlign "align (left|center|right|justify) #IMPLIED">
<!ELEMENT p %Inline;> <!ATTLIST p %attrs; %TextAlign; >
<!ENTITY % Inline "(#PCDATA | %inline; | %misc.inline;)*">
<!ENTITY % inline "a | %special; | %fontstyle; | %phrase; | %inline.forms;">
<!ENTITY % special "%special.basic; | %special.extra;">
<!ENTITY % special.basic "br | span | bdo">
<!ENTITY % special.extra "object | applet | img | map | iframe">
<!ENTITY % misc.inline "ins | del | script">
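Parameter entities are resolved by textual substitution, which can be imitated in a few lines of Python. The sketch below expands nested `%name;` references using the declarations for `special`, `special.basic`, `special.extra`, and `misc.inline` shown above. The function `expand` and the sample content model are hypothetical, and the sketch skips details of real DTD processing, such as the spaces a parser adds around replacement text.

```python
import re

# Replacement texts copied from the parameter entity declarations above.
PARAM_ENTITIES = {
    "special": "%special.basic; | %special.extra;",
    "special.basic": "br | span | bdo",
    "special.extra": "object | applet | img | map | iframe",
    "misc.inline": "ins | del | script",
}

PE_REFERENCE = re.compile(r"%([\w.\-]+);")


def expand(text, entities):
    """Recursively replace every %name; reference with its replacement text."""
    def replace(match):
        # A reference to an undeclared entity raises KeyError.
        return expand(entities[match.group(1)], entities)
    return PE_REFERENCE.sub(replace, text)


model = "(#PCDATA | %special; | %misc.inline;)*"
print(expand(model, PARAM_ENTITIES))
```

Printing the expanded model makes it visible how a short declaration built from layered entities stands for a long flat list of element names.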