SGML and XML Document Grammars and Exceptions

Pekka Kilpeläinen, Derick Wood

Pekka Kilpeläinen, Derick Wood, SGML and XML Document Grammars and Exceptions, Information and Computation, 169(2):230-251, September 2001.

The Standard Generalized Markup Language (SGML) and the Extensible Markup Language (XML) allow users to define document-type definitions (DTDs), which are essentially extended context-free grammars expressed in a notation that is similar to extended Backus-Naur form. The right-hand side of a production, called a content model, is both an extended and a restricted regular expression. The semantics of content models for SGML DTDs can be modified by exceptions (XML does not allow exceptions). Inclusion exceptions allow named elements to appear anywhere within the content of a content model, and exclusion exceptions preclude named elements from appearing in the content of a content model. We give precise definitions of the semantics of exceptions and prove that they do not increase the expressive power of SGML DTDs when we restrict DTDs according to accepted SGML practice. We prove the following results:

1. Exceptions do not increase the expressive power of extended context-free grammars.
2. For each DTD with exceptions, we can obtain a structurally equivalent extended context-free grammar.
3. For each DTD with exceptions, we can construct a structurally equivalent DTD when we restrict the DTD to adhere to accepted SGML practice.
4. Exceptions are a powerful shorthand notation: eliminating them may cause exponential growth in the size of an extended context-free grammar or of a DTD.
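To make the two kinds of exceptions concrete, here is a minimal Python sketch that checks the children of a single element only. It assumes a simplified content-model syntax (names, ',', '|', '?', '*', '+' and parentheses; no '&' connector and no #PCDATA). It reads an inclusion as allowing any number of included elements before, between, or after the elements matched by the model, and an exclusion as rejecting any child sequence that contains an excluded element. The functions model_to_regex, add_inclusions and accepts are hypothetical names. Letting exclusions override inclusions is an assumption of this sketch, and SGML's propagation of exceptions to descendant elements is not modelled. This is an illustration, not the paper's formal construction.

```python
import re

# Element names in this simplified notation: a letter followed by letters, digits, '.' or '-'.
NAME = re.compile(r"[A-Za-z][A-Za-z0-9.-]*")


def token(name):
    """Regex for one element; the input word encodes each child as '<name>'."""
    return "<" + re.escape(name) + ">"


def model_to_regex(model):
    """Translate a simplified SGML content model such as '(title, (p | list)*)'
    into a Python regex over a word of '<name>' tokens.
    Assumption: only names, ',', '|', '?', '*', '+' and parentheses are supported."""
    out = []
    i = 0
    while i < len(model):
        ch = model[i]
        if ch.isspace() or ch == ",":
            i += 1  # ',' (sequence) becomes plain concatenation
        elif ch in "()|?*+":
            out.append(ch)
            i += 1
        else:
            m = NAME.match(model, i)
            if m is None:  # e.g. the '&' connector or #PCDATA
                raise ValueError(f"unsupported syntax at {model[i:]!r}")
            # Group each name so a following '?', '*' or '+' applies to the whole token.
            out.append("(?:" + token(m.group()) + ")")
            i = m.end()
    return "".join(out)


def add_inclusions(regex, inclusions):
    """Interleaving reading of inclusions: any number of included elements may
    occur before, between and after the elements matched by the model."""
    if not inclusions:
        return regex
    incl = "(?:" + "|".join(token(n) for n in sorted(inclusions)) + ")*"
    # Follow every element token by incl*, then prefix the whole model with incl*.
    body = re.sub(r"<[^<>]+>", lambda m: m.group() + incl, regex)
    return incl + "(?:" + body + ")"


def accepts(model, children, inclusions=(), exclusions=()):
    """Do the child element names satisfy the content model as modified
    by the given inclusion and exclusion exceptions?"""
    # Exclusions: reject any child sequence that contains an excluded element.
    # Checking this first makes exclusion win over inclusion (an assumption here).
    if any(c in exclusions for c in children):
        return False
    regex = add_inclusions(model_to_regex(model), set(inclusions))
    word = "".join("<" + c + ">" for c in children)
    return re.fullmatch(regex, word) is not None
```

For example, accepts("(title, (p | list)*)", ["note", "title", "p", "note"], inclusions={"note"}) holds, while the same call without the inclusion does not. Because the rewritten pattern is again a regular expression, the sketch shows, for a single content model, that these exceptions keep the content language regular; the results listed above concern whole DTDs and grammars, which this sketch does not attempt to cover.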
