Alternative Schema Languages — Schematron

XML Foundations (INFOSYS 242)

Erik Wilde, UC Berkeley iSchool
Tuesday, October 17, 2006
Creative Commons License

This work is licensed under a Creative Commons
Attribution-NonCommercial-ShareAlike 2.5 License.

Abstract

While XML Schema is the most popular schema language today and will likely remain so for the foreseeable future, it is only one member of a class of languages that are all designed to test whether an XML document satisfies a set of constraints. Such a test could also be implemented in program code, but that code would be neither portable nor easy to maintain. Schema languages therefore usually take a declarative approach to specifying validation. A simple yet powerful language of this kind is Schematron, which uses the expressive power of XPath to test whether a document satisfies a set of conditions. Schematron is rule-based, in contrast to the more traditional grammar-based schema languages, and complements them very well.

XML Schema Languages

Schema-Validation and Applications

Validation Pipelines

Validation Pipeline Example

Outline (RELAX NG)

  1. RELAX NG
    1. Principles
    2. Example
  2. Document Schema Definition Languages (DSDL)
  3. Schematron
    1. Implementation
    2. Patterns
    3. Rules
    4. Assertions
  4. Conclusions

Design by Committee

RELAX NG +/-

Outline (Principles)

Validation

Grammars

Outline (Example)

DTD and XML Schema

<!ELEMENT document (heading, chapter) >
<!ELEMENT heading  (#PCDATA) >
<!ELEMENT chapter  (heading, para+) >
<!ATTLIST chapter  id ID #REQUIRED >
<!ELEMENT para     (#PCDATA) >

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xs:element name="document">
  <xs:complexType>
   <xs:sequence>
    <xs:element ref="heading"/>
    <xs:element ref="chapter"/>
   </xs:sequence>
  </xs:complexType>
 </xs:element>
 <xs:element name="heading" type="xs:string"/>
 <xs:element name="chapter">
  <xs:complexType>
   <xs:sequence>
    <xs:element ref="heading"/>
    <xs:element name="para" type="xs:string" maxOccurs="unbounded"/>
   </xs:sequence>
   <xs:attribute name="id" type="xs:ID" use="required"/>
  </xs:complexType>
 </xs:element>
</xs:schema>

RELAX NG

<grammar xmlns="http://relaxng.org/ns/structure/1.0">
 <start><ref name="document"/></start>
 <define name="document">
  <element name="document">
   <ref name="heading"/>
   <ref name="chapter"/>
  </element>
 </define>
 <define name="heading">
  <element name="heading"><text/></element>
 </define>
 <define name="chapter">
  <element name="chapter">
   <attribute name="id"><text/></attribute>
   <ref name="heading"/>
   <oneOrMore>
    <element name="para"><text/></element>
   </oneOrMore>
  </element>
 </define>
</grammar>
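
The grammar above types the id attribute as plain text, so unlike the DTD it accepts any string. A sketch of the chapter definition using the ID datatype from the W3C XML Schema datatype library is shown here. Note that checking ID values for uniqueness is not part of core RELAX NG validation but of the separate RELAX NG DTD Compatibility specification, so whether duplicate ids are reported depends on the validator in use.

<define name="chapter" xmlns="http://relaxng.org/ns/structure/1.0">
 <element name="chapter">
  <attribute name="id">
   <!-- ID from the W3C XML Schema datatypes: the value must be an NCName;
        uniqueness checking depends on DTD compatibility support -->
   <data type="ID" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"/>
  </attribute>
  <ref name="heading"/>
  <oneOrMore>
   <element name="para"><text/></element>
  </oneOrMore>
 </element>
</define>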

RELAX NG Compact Syntax

start    = document

document = element document { heading, chapter }

heading  = element heading { text }

chapter  = element chapter {
    attribute id { text },
    heading,
    element para { text }+
  }

Outline (Document Schema Definition Languages (DSDL))

Modular Validation

DSDL Master Plan

Outline (Schematron)

XPath Again

Basics

<schema xmlns="http://www.ascc.net/xml/schematron">
 <title>Address Checking</title>
 <pattern name="Phone Number Checking">
  <rule context="address">
   <assert test="(count(phone[@type = 'voice']) > 0) and (count(phone[@type = 'fax']) >  0)">there must be at least one voice and one fax number</assert>
  </rule>
 </pattern>
</schema>
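
To see what this rule checks, consider a small instance document (the addresses root element and the phone numbers are made up for illustration). The rule's context matches both address elements; the assertion holds for the first one and fails for the second, so its message is reported once.

<addresses>
 <!-- assertion holds: one voice and one fax number -->
 <address>
  <phone type="voice">+1 510 555 0100</phone>
  <phone type="fax">+1 510 555 0101</phone>
 </address>
 <!-- assertion fails: no fax number, so the message is reported -->
 <address>
  <phone type="voice">+1 510 555 0102</phone>
 </address>
</addresses>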

Outline (Implementation)

Performing Validation

XSLT-Generated XSLT

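<xsl:template match="rule">
 <!-- axsl: is mapped to xsl: by xsl:namespace-alias, so this literal
      result element becomes an xsl:template in the generated stylesheet,
      and {@context} is an attribute value template -->
 <axsl:template match="{@context}">
  <xsl:apply-templates select="assert"/>
 </axsl:template>
</xsl:template>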

XSLT-Based Schematron

Compiling Assertions

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:axsl="http://www.w3.org/1999/XSL/TransformAlias"
 xmlns:sch="http://www.ascc.net/xml/schematron">
 <xsl:namespace-alias stylesheet-prefix="axsl" result-prefix="xsl"/>
 <!-- ASSERT and REPORT -->
 <xsl:template match="sch:assert | assert">
  <xsl:if test="not(@test)">
   <xsl:message>Markup Error: no test attribute in &lt;assert></xsl:message>
  </xsl:if>
  <axsl:choose>
   <axsl:when test="{@test}"/>
   <axsl:otherwise>
    <xsl:call-template name="process-assert">
     <xsl:with-param name="role" select="@role"/>
     <xsl:with-param name="id" select="@id"/>
     <xsl:with-param name="test" select="normalize-space(@test)" />
     <xsl:with-param name="icon" select="@icon"/>
     <xsl:with-param name="subject" select="@subject"/>
     <xsl:with-param name="diagnostics" select="@diagnostics"/>
    </xsl:call-template>  
   </axsl:otherwise>
  </axsl:choose>
 </xsl:template>
 <xsl:template match="sch:report | report">
  <xsl:if test="not(@test)">
   <xsl:message>Markup Error: no test attribute in &lt;report></xsl:message>
  </xsl:if>
  <axsl:if test="{@test}">
   <xsl:call-template name="process-report">
    <xsl:with-param name="role" select="@role"/>
    <xsl:with-param name="test" select="normalize-space(@test)" />
    <xsl:with-param name="icon" select="@icon"/>
    <xsl:with-param name="id" select="@id"/>
    <xsl:with-param name="subject" select="@subject"/>
    <xsl:with-param name="diagnostics" select="@diagnostics"/>
   </xsl:call-template>
  </axsl:if>
 </xsl:template>
</xsl:stylesheet>
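
The templates above call process-assert and process-report, which are not part of this excerpt. A minimal, hypothetical definition only needs to copy the assertion's text into the generated stylesheet, which yields the plain message text seen in the compiled example. Because XSLT 1.0 silently ignores xsl:with-param values that the called template does not declare, this sketch leaves the parameters out; a real front end would use them (for example @role or @diagnostics) to produce richer output.

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <!-- hypothetical minimal front end: xsl:call-template keeps the current
      node, so "." is still the assert or report element; its normalized
      text (any markup inside it is flattened) becomes literal text in the
      generated validation stylesheet -->
 <xsl:template name="process-assert">
  <xsl:value-of select="normalize-space(.)"/>
 </xsl:template>
 <xsl:template name="process-report">
  <xsl:value-of select="normalize-space(.)"/>
 </xsl:template>
</xsl:stylesheet>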

Compiled Example

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:sch="http://www.ascc.net/xml/schematron" version="1.0">
 <xsl:template match="*|@*" mode="schematron-get-full-path">
  <xsl:apply-templates select="parent::*" mode="schematron-get-full-path"/>
  <xsl:text>/</xsl:text>
  <xsl:if test="count(. | ../@*) = count(../@*)">@</xsl:if>
  <xsl:value-of select="name()"/>
  <xsl:text>[</xsl:text>
  <xsl:value-of select="1+count(preceding-sibling::*[name()=name(current())])"/>
  <xsl:text>]</xsl:text>
 </xsl:template>
 <xsl:template match="/">
  <xsl:apply-templates select="/" mode="M1"/>
 </xsl:template>
 <xsl:template match="address" priority="4000" mode="M1">
  <xsl:choose>
   <xsl:when test="(count(phone[@type = 'voice']) &gt; 0) and (count(phone[@type = 'fax']) &gt;  0)"/>
   <xsl:otherwise>there must be at least one voice and one fax number</xsl:otherwise>
  </xsl:choose>
  <xsl:apply-templates mode="M1"/>
 </xsl:template>
 <xsl:template match="text()" priority="-1" mode="M1"/>
 <xsl:template match="text()" priority="-1"/>
</xsl:stylesheet>

Outline (Patterns)

Grouping Tests
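
A pattern groups rules that belong together. In this minimal sketch, which assumes that address elements may contain website children (the Web site check is made up for illustration), both patterns contain a rule for address, and both rules are applied: each pattern is checked independently, as the per-pattern mode M1 in the compiled example suggests.

<schema xmlns="http://www.ascc.net/xml/schematron">
 <title>Address Checking</title>
 <!-- each pattern is checked independently of the others -->
 <pattern name="Phone Number Checking">
  <rule context="address">
   <assert test="phone[@type = 'voice'] and phone[@type = 'fax']">there must be at least one voice and one fax number</assert>
  </rule>
 </pattern>
 <pattern name="Web Site Checking">
  <!-- also matches address; fires because it is in a different pattern -->
  <rule context="address">
   <assert test="count(website) &lt;= 1">there must be at most one Web site</assert>
  </rule>
 </pattern>
</schema>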

Outline (Rules)

Setting the Context
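
A rule sets the context, an XSLT pattern, for the assertions it contains, and every test is an XPath expression evaluated relative to that context node. Within one pattern, a node is checked only by the first rule whose context matches it, which is why the more specific rule comes first in this sketch (the rules and messages are illustrative and assume that phone elements are children of address).

<schema xmlns="http://www.ascc.net/xml/schematron">
 <title>Phone Checking</title>
 <pattern name="Phone Type Checking">
  <!-- more specific context first: fax numbers are handled here -->
  <rule context="phone[@type = 'fax']">
   <!-- ".." is the parent of the context node, assumed to be address -->
   <assert test="../phone[@type = 'voice']">a fax number requires a voice number in the same address</assert>
  </rule>
  <!-- fax numbers never reach this rule; all other phone elements do -->
  <rule context="phone">
   <assert test="@type = 'voice'">a phone number must be of type voice or fax</assert>
  </rule>
 </pattern>
</schema>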

Outline (Assertions)

Assertions with assert

<!ELEMENT ENTRY (NAME, ADDRESS, PHONENUM+, EMAIL) >

( count(NAME) = 1 and count(ADDRESS) = 1 and count(EMAIL) = 1 )
and ( NAME[following-sibling::ADDRESS] and ADDRESS[following-sibling::PHONENUM] and PHONENUM[following-sibling::EMAIL] )
and ( count(NAME|ADDRESS|PHONENUM|EMAIL) = count(*) )
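
The expression above only approximates the content model: it checks the counts and that some NAME precedes some ADDRESS and so on, but it accepts, for example, the sequence NAME, PHONENUM, ADDRESS, PHONENUM, EMAIL, which the DTD rejects. A stricter sketch fixes the positions instead: NAME first, ADDRESS second, EMAIL last, and every remaining child a PHONENUM. A second assertion covers the DTD's element content, which allows only whitespace between the children.

<schema xmlns="http://www.ascc.net/xml/schematron">
 <title>ENTRY Content Model</title>
 <pattern name="ENTRY Structure">
  <rule context="ENTRY">
   <!-- positions 1, 2 and last are fixed; the count of PHONENUM elements
        then forces every child in between to be a PHONENUM -->
   <assert test="count(*) >= 4 and *[1][self::NAME] and *[2][self::ADDRESS]
                 and *[last()][self::EMAIL] and count(PHONENUM) = count(*) - 3">ENTRY must contain NAME, ADDRESS, one or more PHONENUM, and EMAIL, in that order</assert>
   <!-- element content: only whitespace text may appear between children -->
   <assert test="not(text()[normalize-space()])">ENTRY must not contain character data</assert>
  </rule>
 </pattern>
</schema>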

Assertions with report

Report Example

<schema xmlns="http://www.ascc.net/xml/schematron">
 <title>Address Checking</title>
 <pattern name="Contact Details Checking">
  <rule context="address">
   <assert test="(count(phone[@type = 'voice']) > 0) and (count(phone[@type = 'fax']) >  0)">ERROR: There must be at least one voice and one fax number</assert>
  </rule>
  <rule context="website">
   <report test="substring(text(), string-length(text())-3) = '.edu'">REPORT: There is an .edu Web site</report>
  </rule>
 </pattern>
</schema>
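
Every report can be rewritten as an assert with a negated test, because assert issues its message when its test is false and report issues it when its test is true. This sketch puts both forms of the .edu check in separate patterns, since within a single pattern only the first rule for website would fire; each pattern produces the message for exactly the same website elements.

<schema xmlns="http://www.ascc.net/xml/schematron">
 <title>Report versus Assert</title>
 <!-- report: the message is issued when the test is true -->
 <pattern name="Using report">
  <rule context="website">
   <report test="substring(text(), string-length(text()) - 3) = '.edu'">REPORT: There is an .edu Web site</report>
  </rule>
 </pattern>
 <!-- assert: the message is issued when the test is false, so the
      negated test fires under exactly the same condition -->
 <pattern name="Using assert">
  <rule context="website">
   <assert test="not(substring(text(), string-length(text()) - 3) = '.edu')">REPORT: There is an .edu Web site</assert>
  </rule>
 </pattern>
</schema>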

Outline (Conclusions)

Validation is Good