Balanced Context-Free Grammars, Hedge Grammars and Pushdown Caterpillar Automata

Anne Brüggemann-Klein, Derick Wood


The XML community generally takes trees and hedges as the model for XML document instances and element content. In contrast, Berstel and Boasson have discussed XML documents in the framework of extended context-free grammar, modeling XML documents as Dyck strings and schemas as balanced grammars. How can these two models be brought closer together? We examine the close relationship between Dyck strings and hedges, observing that trees and hedges are higher level abstractions than are Dyck primes and Dyck strings. We then argue that hedge grammars are effecively identical to balanced grammars and that balanced languages are identical to regular hedge languages, modulo encoding. From the close relationship between Dyck strings and hedges, we obtain a two-phase architecture for the parsing of balanced languages. We propose caterpillar automata with an additional pushdown stack as a new computational model for the second phase; that is, for the validation of XML documents.


Bibliography Navigation: Reference List; Author Index; Title Index; Keyword Index

Generated by sharef2html on 2011-04-15, 02:00:41.