Validation of Character Repertoires for XML Documents

Erik Wilde

Abstract:

XML is based on Unicode, so XML documents may use the full Unicode character repertoire. However, XML-based applications often use XML interfaces to legacy software, which in many cases cannot handle the full Unicode character repertoire. We therefore propose a schema language for XML that can limit the character repertoire of XML documents. This schema language, called Character Repertoire Validation for XML (CRVX), can permit or disallow character repertoire subsets in specific parts of an XML document, for example only in element and attribute names. CRVX uses information from the Unicode Character Database (UCD) to make character repertoire specification as easy as possible. CRVX is not intended to be the only schema language in an XML application scenario; rather, it provides additional schema-based validation that protects applications from unsupported characters. XML applications typically combine several schema languages before processing XML documents, and CRVX is intended to complement other schema languages such as grammar-based languages (DTD, XML Schema) or rule-based languages (Schematron). CRVX can be implemented in various ways. One simple solution is to use XSLT to transform a CRVX schema into an XSLT program, which is then used to validate XML documents. We briefly describe such an implementation. Other (and more efficient) implementations could be based on DOM or SAX parsers.
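
To make the idea of context-specific repertoire checks concrete, the following is a minimal Python sketch of a SAX-based checker. It is not CRVX and does not use CRVX schema syntax, which the abstract does not show. The class `RepertoireHandler`, the function `validate`, and the two repertoires chosen here (ASCII for names, Latin-1 for attribute values and text) are assumptions made purely for illustration.

```python
import unicodedata
import xml.sax


class RepertoireHandler(xml.sax.ContentHandler):
    """Hypothetical handler: records characters outside the allowed repertoire."""

    def __init__(self, name_allowed, text_allowed):
        super().__init__()
        self.name_allowed = name_allowed  # predicate for element/attribute names
        self.text_allowed = text_allowed  # predicate for attribute values and text
        self.violations = []

    def _check(self, context, value, allowed):
        # Record every character the predicate rejects, with its UCD name.
        for ch in value:
            if not allowed(ch):
                self.violations.append(
                    (context, ch, unicodedata.name(ch, "UNNAMED"))
                )

    def startElement(self, name, attrs):
        self._check("element-name", name, self.name_allowed)
        for attr_name in attrs.getNames():
            self._check("attribute-name", attr_name, self.name_allowed)
            self._check("attribute-value", attrs.getValue(attr_name),
                        self.text_allowed)

    def characters(self, content):
        self._check("text", content, self.text_allowed)


def is_ascii(ch):
    return ord(ch) < 0x80


def is_latin1(ch):
    return ord(ch) < 0x100


def validate(xml_bytes, name_allowed=is_ascii, text_allowed=is_latin1):
    """Parse the document and return a list of (context, char, name) tuples."""
    handler = RepertoireHandler(name_allowed, text_allowed)
    xml.sax.parseString(xml_bytes, handler)
    return handler.violations


if __name__ == "__main__":
    doc = '<prix devise="€">café</prix>'.encode("utf-8")
    for context, ch, name in validate(doc):
        print(f"{context}: {ch!r} ({name})")
```

In this sketch the euro sign in the attribute value is reported because it lies outside Latin-1, while the "é" in the text passes. A real CRVX processor would presumably derive such predicates from a schema and from UCD properties instead of hard-coding them.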
