Alternative Schema Languages – Schematron

XML Foundations
Fall 2010 — INFO 242 (CCN 42593)

Erik Wilde, UC Berkeley School of Information
2010-11-16

This work is licensed under a CC Attribution 3.0 Unported License [http://creativecommons.org/licenses/by/3.0/]

E. Wilde: Alternative Schema Languages – Schematron

(2) Abstract

XSD is only one representative of a class of languages designed for testing whether an XML document satisfies a set of constraints. Such a test could also be implemented programmatically, but hand-written validation code is neither portable nor easy to maintain. Schema languages therefore often take a declarative approach to specifying validation. A very simple yet very powerful language of this kind is Schematron, which uses the expressive power of XPath to test whether a document satisfies a set of conditions. Schematron is rule-based, in contrast to the more traditional grammar-based schema languages, and complements them very well.



(3) XML Schema Languages



(4) Schema-Validation and Applications

schema-valid-documents.png

(5) Validation Pipelines



(6) Validation Pipeline Example

validation-pipeline.png

RELAX NG

Outline (RELAX NG)

  1. RELAX NG [8]
    1. Principles [2]
    2. Example [3]
  2. Document Schema Definition Languages (DSDL) [2]
  3. Schematron [12]
    1. Implementation [5]
    2. Patterns [1]
    3. Rules [1]
    4. Assertions [3]
  4. Conclusions [1]
(8) Design by Committee



(9) RELAX NG +/-



(10) RELAX NG Syntaxes

xml-technology-syntaxes.png

Principles

(12) Validation

  • Validation should not change the document
    • there are no default values
  • Only schema↔instance tests are supported
    • there is no type hierarchy as in XSD (schema↔schema)
    • there are no identity constraints (instance↔instance)
  • Grammars should not be restricted
    • DTDs and XSD do not allow non-determinism
    • RELAX NG allows non-deterministic content models
      chess = white, (black, white)*, black?
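A sketch of how this content model could be written in RELAX NG's XML syntax. The element names chess, white, and black are assumptions made for illustration; the slide uses them only as abstract symbols.

```xml
<element name="chess" xmlns="http://relaxng.org/ns/structure/1.0">
 <element name="white"><text/></element>
 <!-- repeated pairs of moves -->
 <zeroOrMore>
  <element name="black"><text/></element>
  <element name="white"><text/></element>
 </zeroOrMore>
 <!-- the game may end on a black move -->
 <optional>
  <element name="black"><text/></element>
 </optional>
</element>
```

After a white element, a validator cannot decide from the next black element alone whether it belongs to the repeated group or is the optional final move. RELAX NG accepts such a model, whereas the determinism rules of DTDs and XSD reject it.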


(13) Grammars

  • RELAX NG grammars have a start symbol
    • DTDs and XSD do not have start symbols
  • Attributes are defined as part of the content model
    • a more homogeneous view of the XML document tree
    • this allows alternatives of elements and attributes
  • Grammars are a set of named rules
    • rules define how an element is composed
    • local definitions (nested specifications of content models) are possible
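Because attributes are part of the content model, an attribute and an element can appear as alternatives of the same choice. The following is a minimal sketch only; allowing id as either an attribute or a child element is an assumption made for illustration and is not part of the example schema used in these slides.

```xml
<element name="chapter" xmlns="http://relaxng.org/ns/structure/1.0">
 <!-- the identifier may be given as an attribute or as a child element -->
 <choice>
  <attribute name="id"><text/></attribute>
  <element name="id"><text/></element>
 </choice>
 <element name="heading"><text/></element>
</element>
```

Neither DTDs nor XSD content models can express this kind of alternative directly, because both declare attributes separately from element content.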


Example

(15) DTD and XSD

<!-- DTD -->
<!ELEMENT document (heading, chapter) >
<!ELEMENT heading  (#PCDATA) >
<!ELEMENT chapter  (heading, para+) >
<!ATTLIST chapter  id ID #REQUIRED >
<!ELEMENT para     (#PCDATA) >

<!-- XSD -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xs:element name="document">
  <xs:complexType>
   <xs:sequence>
    <xs:element ref="heading"/>
    <xs:element ref="chapter"/>
   </xs:sequence>
  </xs:complexType>
 </xs:element>
 <xs:element name="heading" type="xs:string"/>
 <xs:element name="chapter">
  <xs:complexType>
   <xs:sequence>
    <xs:element ref="heading"/>
    <xs:element name="para" type="xs:string" maxOccurs="unbounded"/>
   </xs:sequence>
   <!-- use="required" mirrors #REQUIRED in the DTD -->
   <xs:attribute name="id" type="xs:ID" use="required"/>
  </xs:complexType>
 </xs:element>
</xs:schema>


(16) RELAX NG

<grammar xmlns="http://relaxng.org/ns/structure/1.0">
 <start><ref name="document"/></start>
 <define name="document">
  <element name="document">
   <ref name="heading"/>
   <ref name="chapter"/>
  </element>
 </define>
 <define name="heading">
  <element name="heading"><text/></element>
 </define>
 <define name="chapter">
  <element name="chapter">
   <attribute name="id"><text/></attribute>
   <ref name="heading"/>
   <oneOrMore>
    <element name="para"><text/></element>
   </oneOrMore>
  </element>
 </define>
</grammar>


(17) RELAX NG Compact Syntax

start    = document

document = element document { heading, chapter }

heading  = element heading { text }

chapter  = element chapter {
    attribute id { text },
    heading,
    element para { text }+
  }
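For reference, a small instance document that should satisfy the schemas shown for this example. The text content and the id value are made up for illustration.

```xml
<document>
 <heading>Alternative Schema Languages</heading>
 <chapter id="c1">
  <heading>RELAX NG</heading>
  <para>RELAX NG grammars have a start symbol.</para>
  <para>Attributes are part of the content model.</para>
 </chapter>
</document>
```

Removing the id attribute or the last para still leaves a valid document only in the second case, since all three schema languages require at least one para and the id.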


Document Schema Definition Languages (DSDL)

(19) Modular Validation



(20) DSDL Master Plan



Schematron

(22) XPath Again



(23) Basics

<schema xmlns="http://www.ascc.net/xml/schematron">
 <title>Address Checking</title>
 <pattern name="Phone Number Checking">
  <rule context="address">
   <assert test="(count(phone[@type = 'voice']) > 0) and (count(phone[@type = 'fax']) >  0)">there must be at least one voice and one fax number</assert>
  </rule>
 </pattern>
</schema>
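To make the rule concrete, here is a hypothetical instance document with one address that should pass the assertion and one that should fail it. The addressbook root element and the phone numbers are assumptions made for illustration.

```xml
<addressbook>
 <!-- passes: one voice and one fax number -->
 <address>
  <phone type="voice">510-555-0100</phone>
  <phone type="fax">510-555-0101</phone>
 </address>
 <!-- fails: no fax number, so the assertion message is output -->
 <address>
  <phone type="voice">510-555-0102</phone>
 </address>
</addressbook>
```

The rule's context matches both address elements, and the test is evaluated once per matched element, relative to that element.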


Implementation

(25) Performing Validation

  • Schema languages are declarative inputs for validation
    • schema languages are not executable programming languages
    • to perform validation, some software component must process documents and schemas
  • Schema languages require supporting software
    • DTDs are part of XML; a validating XML processor must perform DTD validation
    • XSD is a separate specification; an XSD processor is required
  • Schematron is built around XPaths
    • any technology supporting XPath evaluation would be a good foundation
    • XSLT is a technology supporting XPath evaluation
    • XSLT's program flow control is good enough to support Schematron
    • XSLT processors are available for a large number of platforms


(26) XSLT-Generated XSLT

  • XSLT uses XML as its syntax
    • this is inconvenient because XSLT programs are very verbose
    • processing XSLT with XSLT is supported very well
    • for power users, the benefits outweigh the discomforts
  • How is it possible to generate XSLT from XSLT?
    • literal result elements in the XSLT namespace cannot be used directly (they would be executed as instructions)
    • it would be against XSLT's idea to write the resulting XSLT as text
    • there must be a distinction between executable and output XSLT elements
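<!-- naive attempt, not valid XSLT: the inner xsl:template is read as an instruction, not as output -->
<xsl:template match="rule">
 <xsl:template match="{@context}">
  <xsl:apply-templates select="assert"/>
 </xsl:template>
</xsl:template>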
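One way to separate executable from output elements is XSLT's xsl:namespace-alias. The following minimal sketch assumes an alias prefix axsl and, to keep it short, unprefixed rule and assert elements in the Schematron source.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:axsl="http://www.w3.org/1999/XSL/TransformAlias">
 <!-- axsl elements are literal result elements in this stylesheet;
      in the output they are written in the XSLT namespace -->
 <xsl:namespace-alias stylesheet-prefix="axsl" result-prefix="xsl"/>
 <xsl:template match="rule">
  <!-- {@context} is an attribute value template, evaluated now -->
  <axsl:template match="{@context}">
   <!-- executed now: compiles the rule's assertions into the template body -->
   <xsl:apply-templates select="assert"/>
  </axsl:template>
 </xsl:template>
</xsl:stylesheet>
```

Elements with the xsl prefix run when the Schematron schema is compiled, while elements with the axsl prefix are copied to the result and only run when the generated stylesheet is applied to an instance document.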


(27) XSLT-Based Schematron

schematron-xslt.png

(28) Compiling Assertions

<xsl:stylesheet version="1.0" 
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
 xmlns:axsl="http://www.w3.org/1999/XSL/TransformAlias" 
 xmlns:sch="http://www.ascc.net/xml/schematron"
  >
<xsl:namespace-alias stylesheet-prefix="axsl" result-prefix="xsl"/>
 <!-- ASSERT and REPORT -->
 <xsl:template match="sch:assert | assert">
  <xsl:if test="not(@test)">
   <xsl:message>Markup Error: no test attribute in &lt;assert></xsl:message>
  </xsl:if>
  <axsl:choose>
   <axsl:when test="{@test}"/>
   <axsl:otherwise>
    <xsl:call-template name="process-assert">
     <xsl:with-param name="role" select="@role"/>
     <xsl:with-param name="id" select="@id"/>
     <xsl:with-param name="test" select="normalize-space(@test)" />
     <xsl:with-param name="icon" select="@icon"/>
     <xsl:with-param name="subject" select="@subject"/>
     <xsl:with-param name="diagnostics" select="@diagnostics"/>
    </xsl:call-template>  
   </axsl:otherwise>
  </axsl:choose>
 </xsl:template>
 <xsl:template match="sch:report | report">
  <xsl:if test="not(@test)">
   <xsl:message>Markup Error: no test attribute in &lt;report></xsl:message>
  </xsl:if>
  <axsl:if test="{@test}">
   <xsl:call-template name="process-report">
    <xsl:with-param name="role" select="@role"/>
    <xsl:with-param name="test" select="normalize-space(@test)" />
    <xsl:with-param name="icon" select="@icon"/>
    <xsl:with-param name="id" select="@id"/>
    <xsl:with-param name="subject" select="@subject"/>
    <xsl:with-param name="diagnostics" select="@diagnostics"/>
   </xsl:call-template>
  </axsl:if>
 </xsl:template>


(29) Compiled Example

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:sch="http://www.ascc.net/xml/schematron" version="1.0">
 <xsl:template match="*|@*" mode="schematron-get-full-path">
  <xsl:apply-templates select="parent::*" mode="schematron-get-full-path"/>
  <xsl:text>/</xsl:text>
  <xsl:if test="count(. | ../@*) = count(../@*)">@</xsl:if>
  <xsl:value-of select="name()"/>
  <xsl:text>[</xsl:text>
  <xsl:value-of select="1+count(preceding-sibling::*[name()=name(current())])"/>
  <xsl:text>]</xsl:text>
 </xsl:template>
 <xsl:template match="/">
  <xsl:apply-templates select="/" mode="M1"/>
 </xsl:template>
 <xsl:template match="address" priority="4000" mode="M1">
  <xsl:choose>
   <xsl:when test="(count(phone[@type = 'voice']) &gt; 0) and (count(phone[@type = 'fax']) &gt;  0)"/>
   <xsl:otherwise>there must be at least one voice and one fax number</xsl:otherwise>
  </xsl:choose>
  <xsl:apply-templates mode="M1"/>
 </xsl:template>
 <xsl:template match="text()" priority="-1" mode="M1"/>
 <xsl:template match="text()" priority="-1"/>
</xsl:stylesheet>


Patterns

(31) Grouping Tests

  • Patterns are containers for a set of rules
    • patterns are used for representing goal-oriented parts of the validation
    • achieving one goal may require checking within various contexts
  • Patterns are described by a title and additional text
    • Schematron is geared towards human users
    • title and text are documentation only, they are never used for validation
  • Patterns can be grouped by phases for different validation tasks
    • patterns group a set of rules specific for one validation goal
    • depending on the application, different validation phases may require different sets of patterns
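A sketch of how phases might select patterns, written in the Schematron 1.5 vocabulary used elsewhere in these slides. The phase and pattern identifiers and the website test are assumptions made for illustration.

```xml
<schema xmlns="http://www.ascc.net/xml/schematron">
 <title>Address Checking</title>
 <!-- a quick check only activates the phone pattern -->
 <phase id="quick">
  <active pattern="phones"/>
 </phase>
 <!-- a full check activates both patterns -->
 <phase id="full">
  <active pattern="phones"/>
  <active pattern="websites"/>
 </phase>
 <pattern name="Phone Number Checking" id="phones">
  <rule context="address">
   <assert test="count(phone) > 0">there must be at least one phone number</assert>
  </rule>
 </pattern>
 <pattern name="Web Site Checking" id="websites">
  <rule context="website">
   <assert test="starts-with(., 'http')">a Web site address should start with http</assert>
  </rule>
 </pattern>
</schema>
```

Which phase is run is chosen when validation is invoked; how that choice is passed in depends on the Schematron implementation.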


Rules

(33) Setting the Context

  • Setting the context is essential for XPath expressions
    • within patterns, rules group context-specific assertions
    • assertion XPaths are evaluated relative to a rule's context
  • Abstract rules make it possible to reuse assertions
    • abstract rules are not evaluated (they do not have a context)
    • other rules may import assertions by extending an abstract rule
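A sketch of an abstract rule and two rules extending it, again in the Schematron 1.5 vocabulary. The rule identifier has-phone and the person element are assumptions made for illustration.

```xml
<schema xmlns="http://www.ascc.net/xml/schematron">
 <pattern name="Phone Number Checking">
  <!-- abstract: no context, never evaluated on its own -->
  <rule abstract="true" id="has-phone">
   <assert test="count(phone) > 0">there must be at least one phone number</assert>
  </rule>
  <!-- imports the assertion and evaluates it relative to address -->
  <rule context="address">
   <extends rule="has-phone"/>
  </rule>
  <!-- imports the same assertion and adds one of its own -->
  <rule context="person">
   <extends rule="has-phone"/>
   <assert test="name">a person must have a name</assert>
  </rule>
 </pattern>
</schema>
```

The imported assertion is evaluated relative to the context of the extending rule, so the same test checks phone children of address in one rule and phone children of person in the other.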


Assertions

(35) Assertions with assert

  • assert is used to specify assertions
    • if the XPath evaluates to false, the assertion's content is output
    • assertions are always evaluated as boolean (type casting will be applied)
  • Assertion XPaths are evaluated relative to the containing rule's context
    • moving an assertion from one rule to another will change its meaning
  • XPath is not good for expressing grammar rules
    • grammar checking should be left to grammar-oriented languages
<!ELEMENT ENTRY (NAME, ADDRESS, PHONENUM+, EMAIL) >

( count(NAME) = 1 and count(ADDRESS) = 1 and count(EMAIL) = 1 )
and ( NAME[following-sibling::ADDRESS] and ADDRESS[following-sibling::PHONENUM] and PHONENUM[following-sibling::EMAIL] )
and ( count(NAME|ADDRESS|PHONENUM|EMAIL) = count(*) )
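The boolean conversion mentioned above follows XPath's rules for node-sets, numbers, and strings. The following sketch shows one assertion of each kind; the zip and city elements are assumptions made for illustration.

```xml
<schema xmlns="http://www.ascc.net/xml/schematron">
 <pattern name="Boolean Conversion">
  <rule context="address">
   <!-- node-set: true if at least one phone child exists -->
   <assert test="phone">there must be a phone element</assert>
   <!-- number: true if the length is not zero -->
   <assert test="string-length(zip)">the zip element must exist and must not be empty</assert>
   <!-- string: true if the normalized text is not empty -->
   <assert test="normalize-space(city)">the city element must contain text</assert>
  </rule>
 </pattern>
</schema>
```

Relying on implicit conversion keeps tests short, but an explicit comparison such as count(phone) > 0 is often easier to read.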


(36) Assertions with report

  • report is used to generate reports
    • if the XPath evaluates to true, the report's content is output
    • report tests are always evaluated as boolean (type casting will be applied)
  • Logically, assert and report are inverse
    • assert is used to test conformance (it outputs errors)
    • report is used to report observations (it outputs messages)
    • Schematron's processing model is underspecified (check assertions, print outputs)
  • Schematron is useful for reporting to humans
    • machine-oriented environments need a better processing model
    • Schematron could be a good starting point for this


(37) Report Example

<schema xmlns="http://www.ascc.net/xml/schematron">
 <title>Address Checking</title>
 <pattern name="Contact Details Checking">
  <rule context="address">
   <assert test="(count(phone[@type = 'voice']) > 0) and (count(phone[@type = 'fax']) >  0)">ERROR: There must be at least one voice and one fax number</assert>
  </rule>
  <rule context="website">
   <report test="substring(text(), string-length(text())-3) = '.edu'">REPORT: There is an .edu Web site</report>
  </rule>
 </pattern>
</schema>
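A hypothetical instance document that exercises both rules. The phone number and the Web site address are assumptions made for illustration.

```xml
<address>
 <!-- no fax number: the assert in the address rule fails -->
 <phone type="voice">510-555-0100</phone>
 <!-- ends in .edu: the report in the website rule fires -->
 <website>http://www.berkeley.edu</website>
</address>
```

Validating this document should output both messages: the ERROR text because the assertion's test is false for the address, and the REPORT text because the report's test is true for the website.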


Conclusions

(39) Validation is Good


