Restoring the Primacy of PCDATA

Patrick Durusau, Matthew O'Donnell


Current markup languages and processing tools assume and even impose a hierarchical, tree-based approach to the data encoded in documents. This paper explores the benefits of processing documents marked up in XML syntax as definitions of sets and the relations between them. This change of understanding has implications for the relationship between data and metadata, without requiring either a new syntax or a new set of processing tools. The use of milestones, or empty elements, in XML documents has traditionally been advocated as a solution to the problem of multiple and potentially overlapping structures. One difficulty of processing milestones with tree-based tools is that it requires extracting a node (an element with child PCDATA) from a flat representation in which the milestones are siblings of the PCDATA. This procedure is made more difficult by the presence of intervening elements and structures. A set-based understanding of markup syntax treats all elements as milestones, thereby 'flattening' the document and raising the PCDATA to the primary level. The virtual milestones then serve to mark the boundaries of a set.
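
The set-based reading described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation; it assumes that a "set" can be modelled as a half-open range of character offsets into the concatenated PCDATA. The function names (flatten, span_between, intersects) and the milestone element names qs and qe are hypothetical and chosen only for this example.

```python
import xml.etree.ElementTree as ET


def flatten(xml_text):
    """Return (pcdata, spans).

    pcdata is all character data joined in document order. spans lists one
    (tag, start, end) triple per element, in document order, where start and
    end are offsets into pcdata. An empty element (a milestone) has
    start == end.
    """
    root = ET.fromstring(xml_text)
    chunks = []
    spans = []
    pos = 0

    def visit(elem):
        nonlocal pos
        index = len(spans)
        spans.append(None)  # reserve the slot so spans stay in document order
        start = pos
        if elem.text:
            chunks.append(elem.text)
            pos += len(elem.text)
        for child in elem:
            visit(child)
            if child.tail:  # text that follows the child inside elem
                chunks.append(child.tail)
                pos += len(child.tail)
        spans[index] = (elem.tag, start, pos)

    visit(root)
    return "".join(chunks), spans


def span_between(spans, start_tag, end_tag):
    """Range bounded by the first milestone named start_tag and the first
    milestone named end_tag (assumed naming convention)."""
    start = next(s for tag, s, _ in spans if tag == start_tag)
    end = next(s for tag, s, _ in spans if tag == end_tag)
    return (start, end)


def intersects(a, b):
    """True if two (start, end) ranges share at least one character."""
    return max(a[0], b[0]) < min(a[1], b[1])


# A quotation, marked by the milestones <qs/> and <qe/>, crosses a paragraph
# boundary, so it cannot be a single element in the tree.
doc = "<doc><p>one <qs/>two </p><p>three<qe/> four</p></doc>"
pcdata, spans = flatten(doc)
quote = span_between(spans, "qs", "qe")
print(pcdata[quote[0]:quote[1]])  # prints "two three"
```

Once every element is reduced to a pair of boundaries over the same character data, the quotation and the two paragraphs are simply three ranges, and relations such as overlap or containment become comparisons of offsets rather than tree navigation.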


