XML Schema — Part I

XML Foundations (INFOSYS 242)

Erik Wilde, UC Berkeley iSchool
Tuesday, October 3, 2006
Creative Commons License

This work is licensed under a Creative Commons
Attribution-NonCommercial-ShareAlike 2.5 License.

Abstract

XML Schema is the most popular schema language for XML today. It was introduced to overcome some of the commonly observed limitations of DTDs, most notably the lack of typing. Simple types describe content that is not structured by XML markup, which means they describe attribute values and element content consisting only of character data. Simple types can be defined by deriving new types from existing types through restriction. Complex types describe element content that uses attributes and/or child elements, i.e., anything beyond plain character data. Using XML Schema's type concepts, it is easier to represent model-level information in a schema, because type hierarchies can represent model-level specializations.

Bad Names

XML Schema is a language for describing an XML schema.
An XML schema can be defined using XML Schema.
I would like to use XML Schema for my XML schema.

What's Wrong With DTDs?

Different Levels of Semantics

Schema-Validation and Applications

Validation and Typing

  1. Validation checks for structural integrity (is the document schema-valid?)
    • checking elements and attributes for proper usage (as with DTDs)
    • checking element contents and attribute values for proper values
  2. Type annotations make the types available to applications
    • instead of having to consult the schema themselves, applications get type information from the Post-Schema-Validation Infoset (PSVI)
    • type-based applications (such as XSLT 2.0) can work on the typed instance
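As a minimal sketch of both steps, assuming a hypothetical "meeting" element: a DTD could only declare its content as character data, while the schema below types it as xs:date. A validating processor accepts the first instance and reports the second as invalid, even though both are well-formed; for the valid instance, the PSVI annotates the content with the type xs:date. The two instances are shown together for brevity but would be separate documents.

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <!-- hypothetical element, typed as a date rather than as plain character data -->
 <xs:element name="meeting" type="xs:date"/>
</xs:schema>

<!-- instance 1: schema-valid, the content is a lexically valid xs:date -->
<meeting>2006-10-03</meeting>

<!-- instance 2: well-formed, but invalid because month 13 and day 45 do not exist -->
<meeting>2006-13-45</meeting>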

Outline (XML Schema Types)

  1. XML Schema Types
  2. Simple Types
    1. Simple Type Restrictions
  3. Complex Types
    1. Content Models
  4. Conclusions

What is a Type?

XML Schema vs. DTD

                       DTD                         XML Schema
Concepts               some conceptual model (formal/informal), common to both
Types                  ID/IDREF and (#P)CDATA      Hierarchy of Simple and Complex Types
Markup Constructs      Element Type Declarations   Element Definitions
                       <!ELEMENT order ...         <xs:element name="order"> ...
Instances (Documents)  <order date=""> [ order content ] </order>, common to both
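To make the markup-constructs row concrete, here is a hedged sketch of both declaration styles for the order element. The table elides the order content, so the "item" children below are a hypothetical stand-in; the DTD version is kept in a comment for comparison. The main visible difference is typing: the DTD can only say that date is CDATA, while the schema can require an xs:date.

<?xml version="1.0" encoding="UTF-8"?>
<!-- DTD equivalent, for comparison:
     <!ELEMENT order (item+)>
     <!ATTLIST order date CDATA #REQUIRED> -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xs:element name="order">
  <xs:complexType>
   <xs:sequence>
    <!-- hypothetical content: one or more item elements -->
    <xs:element name="item" type="xs:string" maxOccurs="unbounded"/>
   </xs:sequence>
   <!-- unlike CDATA, xs:date rejects values such as "tomorrow" -->
   <xs:attribute name="date" type="xs:date" use="required"/>
  </xs:complexType>
 </xs:element>
</xs:schema>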

Document/Data Perspectives

Outline (Simple Types)

What are Simple Types?

Named vs. Anonymous

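<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <!-- named type: defined once at the top level and reusable by any number of elements -->
 <xs:element name="home" type="phoneType"/>
 <xs:element name="office" type="phoneType"/>
 <xs:simpleType name="phoneType">
  <xs:restriction base="xs:string">
   <xs:maxLength value="30"/>
  </xs:restriction>
 </xs:simpleType>
 <!-- anonymous type: defined inline and usable only by this one element -->
 <xs:element name="business">
  <xs:simpleType>
   <xs:restriction base="xs:string">
    <xs:maxLength value="30"/>
   </xs:restriction>
  </xs:simpleType>
 </xs:element>
</xs:schema>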

Type Definitions

Type Hierarchy

Outline (Simple Type Restrictions)

Built-In Types

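<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <!-- Simplified copies of the built-in chain decimal > integer > nonNegativeInteger > positiveInteger.
      There is no targetNamespace, so unprefixed names such as base="integer" refer to
      the definitions in this schema, not to the real xs:integer. -->
 <!-- integer: a decimal without fraction digits; fixed="true" stops derived types from changing this facet -->
 <xs:simpleType name="integer">
  <xs:restriction base="xs:decimal">
   <xs:fractionDigits value="0" fixed="true"/>
  </xs:restriction>
 </xs:simpleType>
 <!-- nonNegativeInteger: an integer that is at least 0 -->
 <xs:simpleType name="nonNegativeInteger">
  <xs:restriction base="integer">
   <xs:minInclusive value="0"/>
  </xs:restriction>
 </xs:simpleType>
 <!-- positiveInteger: a nonNegativeInteger that is at least 1 -->
 <xs:simpleType name="positiveInteger">
  <xs:restriction base="nonNegativeInteger">
   <xs:minInclusive value="1"/>
  </xs:restriction>
 </xs:simpleType>
</xs:schema>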
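The same derivation mechanism is open to schema authors. As a sketch (the type name "quantityType" is hypothetical), a user-defined type can continue the chain by restricting the real built-in xs:positiveInteger. Note the xs: prefix, which refers to the built-in type rather than to the unprefixed local copies in the schema above; the resulting type accepts the values 1 to 99.

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <!-- hypothetical user-defined type: positive integers below 100 -->
 <xs:simpleType name="quantityType">
  <xs:restriction base="xs:positiveInteger">
   <xs:maxExclusive value="100"/>
  </xs:restriction>
 </xs:simpleType>
 <xs:element name="quantity" type="quantityType"/>
</xs:schema>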

How to Restrict

Facets

Facet Applicability

Primitive type  Applicable facets
string          length, minLength, maxLength, pattern, enumeration, whiteSpace
boolean         pattern, whiteSpace
float           pattern, enumeration, whiteSpace, maxInclusive, maxExclusive, minInclusive, minExclusive
double          pattern, enumeration, whiteSpace, maxInclusive, maxExclusive, minInclusive, minExclusive
decimal         totalDigits, fractionDigits, pattern, whiteSpace, enumeration, maxInclusive, maxExclusive, minInclusive, minExclusive
duration        pattern, enumeration, whiteSpace, maxInclusive, maxExclusive, minInclusive, minExclusive
dateTime        pattern, enumeration, whiteSpace, maxInclusive, maxExclusive, minInclusive, minExclusive
time            pattern, enumeration, whiteSpace, maxInclusive, maxExclusive, minInclusive, minExclusive
date            pattern, enumeration, whiteSpace, maxInclusive, maxExclusive, minInclusive, minExclusive
gYearMonth      pattern, enumeration, whiteSpace, maxInclusive, maxExclusive, minInclusive, minExclusive
gYear           pattern, enumeration, whiteSpace, maxInclusive, maxExclusive, minInclusive, minExclusive
gMonthDay       pattern, enumeration, whiteSpace, maxInclusive, maxExclusive, minInclusive, minExclusive
gDay            pattern, enumeration, whiteSpace, maxInclusive, maxExclusive, minInclusive, minExclusive
gMonth          pattern, enumeration, whiteSpace, maxInclusive, maxExclusive, minInclusive, minExclusive
hexBinary       length, minLength, maxLength, pattern, enumeration, whiteSpace
base64Binary    length, minLength, maxLength, pattern, enumeration, whiteSpace
anyURI          length, minLength, maxLength, pattern, enumeration, whiteSpace
QName           length, minLength, maxLength, pattern, enumeration, whiteSpace
NOTATION        length, minLength, maxLength, pattern, enumeration, whiteSpace
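As a sketch of how the table constrains restrictions (both type names are hypothetical): totalDigits and fractionDigits are listed for decimal, so they can bound a price, while whiteSpace, minLength and maxLength are listed for string. Using a facet outside a type's row, such as totalDigits on a type derived from xs:string, makes the schema itself invalid.

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <!-- hypothetical price: at most 8 digits in total, 2 of them after the decimal point, not negative -->
 <xs:simpleType name="priceType">
  <xs:restriction base="xs:decimal">
   <xs:totalDigits value="8"/>
   <xs:fractionDigits value="2"/>
   <xs:minInclusive value="0"/>
  </xs:restriction>
 </xs:simpleType>
 <!-- hypothetical code: whitespace is collapsed first, then the length facets are checked -->
 <xs:simpleType name="codeType">
  <xs:restriction base="xs:string">
   <xs:whiteSpace value="collapse"/>
   <xs:minLength value="1"/>
   <xs:maxLength value="10"/>
  </xs:restriction>
 </xs:simpleType>
</xs:schema>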

Patterns

([a-zA-Z]{2}|[iI]-[a-zA-Z]+|[xX]-[a-zA-Z]{1,8})(-[a-zA-Z]{1,8})*
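The pattern above appears to be a language-tag pattern: it accepts values such as en, en-US, i-klingon or x-private, but not english. A minimal sketch of using such a regular expression in a restriction, with a hypothetical type name. XML Schema patterns are implicitly anchored, so the whole value must match and no ^ or $ is needed.

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <!-- hypothetical type; the pattern must match the entire (whitespace-normalized) value -->
 <xs:simpleType name="languageTagType">
  <xs:restriction base="xs:token">
   <xs:pattern value="([a-zA-Z]{2}|[iI]-[a-zA-Z]+|[xX]-[a-zA-Z]{1,8})(-[a-zA-Z]{1,8})*"/>
  </xs:restriction>
 </xs:simpleType>
</xs:schema>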

Simple Type Examples

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <!-- range restriction: integers from 10000 to 99999 inclusive -->
 <xs:simpleType name="myIntegerType">
  <xs:restriction base="xs:integer">
   <xs:minInclusive value="10000"/>
   <xs:maxInclusive value="99999"/>
  </xs:restriction>
 </xs:simpleType>
 <!-- pattern restriction: three digits, a hyphen, two uppercase letters (e.g. 123-AB) -->
 <xs:simpleType name="stockKeepingUnitType">
  <xs:restriction base="xs:string">
   <xs:pattern value="\d{3}-[A-Z]{2}"/>
  </xs:restriction>
 </xs:simpleType>
 <!-- enumeration restriction: only the listed values are allowed -->
 <xs:simpleType name="USStateType">
  <xs:restriction base="xs:string">
   <xs:enumeration value="AK"/>
   <xs:enumeration value="AL"/>
   <xs:enumeration value="AR"/>
   <!-- and so on ... -->
  </xs:restriction>
 </xs:simpleType>
</xs:schema>

Facet Limitations

Outline (Complex Types)

What is a Complex Type?

Complex Type Example

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xs:element name="billingAddress" type="addressType"/>
 <xs:element name="shippingAddress" type="addressType"/>
 <xs:complexType name="addressType">
  <xs:sequence>
   <xs:element name="name" type="xs:string"/>
   <xs:element name="street" type="xs:string"/>
   <xs:element name="city" type="xs:string"/>
   <xs:element name="state" type="xs:string" minOccurs="0"/>
   <xs:element name="zip" type="xs:decimal"/>
  </xs:sequence>
  <xs:attribute name="country" type="xs:NMTOKEN"/>
 </xs:complexType>
</xs:schema>
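For illustration, an instance that should validate against the schema above (the address data is made up). Either declared element may be used as the root; the child elements must appear in the order given by the xs:sequence, and state could be left out because of minOccurs="0".

<?xml version="1.0" encoding="UTF-8"?>
<shippingAddress country="US">
 <name>Alice Smith</name>
 <street>123 Maple Street</street>
 <city>Mill Valley</city>
 <!-- optional element, may be omitted -->
 <state>CA</state>
 <zip>90952</zip>
</shippingAddress>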

Complex Types & Content Types

Simple Types
Complex Types
  Simple Content
  Complex Content
    Element Only
    Mixed
    Empty
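The Simple Content branch covers elements that have attributes but only character data as content. A minimal sketch, with hypothetical names "price" and "currency": the complex type extends xs:decimal with an attribute, so the element content is still validated as a decimal.

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xs:element name="price">
  <xs:complexType>
   <!-- simple content: character data of type xs:decimal, plus an attribute -->
   <xs:simpleContent>
    <xs:extension base="xs:decimal">
     <xs:attribute name="currency" type="xs:string" use="required"/>
    </xs:extension>
   </xs:simpleContent>
  </xs:complexType>
 </xs:element>
</xs:schema>

<!-- matching instance -->
<price currency="USD">19.99</price>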

Outline (Content Models)

DTD Content Models
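DTD content models map fairly directly onto XML Schema particles. As a sketch with a hypothetical memo vocabulary: the comma becomes xs:sequence, the bar becomes xs:choice, and the occurrence indicators ? and * become minOccurs/maxOccurs attributes (+ would be maxOccurs="unbounded" with the default minOccurs of 1). Unlike the DTD, the schema can also give the leaf elements types.

<?xml version="1.0" encoding="UTF-8"?>
<!-- DTD equivalent: <!ELEMENT memo (to, from?, (para | list)*)> -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xs:element name="memo">
  <xs:complexType>
   <!-- "," : elements in this order -->
   <xs:sequence>
    <xs:element name="to" type="xs:string"/>
    <!-- "?" : optional -->
    <xs:element name="from" type="xs:string" minOccurs="0"/>
    <!-- "(para | list)*" : a choice repeated zero or more times -->
    <xs:choice minOccurs="0" maxOccurs="unbounded">
     <xs:element name="para" type="xs:string"/>
     <xs:element name="list" type="xs:string"/>
    </xs:choice>
   </xs:sequence>
  </xs:complexType>
 </xs:element>
</xs:schema>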

Mixed Content

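<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <!-- global declarations for the references below; their types are assumptions -->
 <xs:element name="b" type="xs:string"/>
 <xs:attribute name="class" type="xs:string"/>
 <xs:element name="p" type="mixedType"/>
 <!-- mixed="true" allows character data between the child elements -->
 <xs:complexType name="mixedType" mixed="true">
  <xs:choice maxOccurs="unbounded" minOccurs="0">
   <xs:element ref="b"/>
   <xs:element name="i" type="xs:string"/>
   <xs:element name="u" type="xs:string"/>
  </xs:choice>
  <xs:attribute ref="class"/>
 </xs:complexType>
</xs:schema>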
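Because mixedType sets mixed="true" and its choice may occur zero or more times, character data can appear before, between and after the child elements. A sketch of a matching instance, using only the locally declared i and u elements (whose types are given as xs:string); an empty p element would also be valid.

<p>This paragraph has <i>italic</i> and <u>underlined</u> parts mixed with plain text.</p>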

Empty Content
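Empty content is a complex type that has attributes but no content particle (no xs:sequence, xs:choice or xs:all). A minimal sketch with a hypothetical img element; instances may use either the empty-element tag or a start/end tag pair with nothing in between.

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xs:element name="img">
  <xs:complexType>
   <!-- no particle: the element must be empty, only attributes are allowed -->
   <xs:attribute name="src" type="xs:anyURI" use="required"/>
   <xs:attribute name="alt" type="xs:string"/>
  </xs:complexType>
 </xs:element>
</xs:schema>

<!-- valid: <img src="logo.png" alt="Logo"/> and <img src="logo.png"></img> -->
<!-- invalid: <img src="logo.png">Logo</img> -->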

Outline (Conclusions)

Typed XML Structures