A Compact XML Schema Syntax [ Erik Wilde, Kilian Stillhard ]

Abstract:

The new schema language defined by the W3C, XML Schema, is used in a number of applications, such as Web Services and XQuery, and will probably be used by an increasing number of users in the near future. Currently, XML Schema's data model, the "XML Schema Components", can only be represented in the rather verbose XML syntax defined in the XML Schema specification itself. We propose an alternative non-XML syntax, which is (1) much more compact than the XML syntax, (2) defined by EBNF productions, (3) re-uses well-known syntactic concepts where appropriate, and (4) is easy to implement using standard parser-generating tools. Our approach is comparable to the approach of the RELAX NG schema language, which also supports two alternative syntaxes, an XML-based one, and a more compact non-XML one. We believe that XML Schema could be made easier to use by supporting a compact syntax. Currently, complex schemas are very hard to read due to the large amount of XML markup, and the various tools and GUIs that are on the market differ widely and in all cases support only a subset of the features of XML Schema. We believe that there should be a compact syntax, optimized for human users, which makes it easy to read and write XML Schemas, and which supports the full feature set of XML Schema. Obviously, a non-XML syntax makes it necessary to introduce new tools. However, generating parsers from EBNF productions is rather simple and well-supported by standard tools (such as yacc and JavaCC), and the other direction (i.e., generating non-XML syntax) can be implemented by using XML tools. Our XML Schema Compact Syntax (XSCS) is geared towards human users, by re-using language constructs known from other application areas, such as DTDs and programming languages, and making them available for XML Schema component representation. Examples for this re-use of syntactic constructs are DTD-style content models, number ranges ("[a,b]" or "(a,b]" as in standard mathematical notation), and qualifying attributes like "abstract" or "final" known from programming languages ("final abstract type { ... }"). We also believe that graphical representations of complex structures such as schemas are not always suitable because some people prefer textual representations, editing might be faster when using keyboard input instead of using click-and-point operations, and graphical representations (usually) hide some information. We fully integrate the processing of our syntax into the existing pipeline of XML-based tools by creating a parser that generates SAX events or DOM trees from the compact syntax documents. This way, we can use the existing XML Schema validation engines and XML Schema error checking facilities already implemented in validation engines like the Xerces parser. In addition, we have a serialization module to generate compact syntax documents from XML Schema DOM trees. Our overall goal is to improve XML Schema acceptance by providing a syntax that is easier to work with than the XML syntax, and tools to process this syntax.

Annotation:

Keywords: XSD (XML Schema Definition Language)_0.8; XSCS (XML Schema Compact Syntax)_0.9;