Towards Conceptual Modeling for XML

Erik Wilde

ETH Zürich, TIK



Today, XML is primarily regarded as a syntax for exchanging structured data, and therefore the question of how to develop well-designed XML models has not been studied extensively. As XML technologies find their way into more and more applications, and as query and programming languages provide native XML support, it would be beneficial to use these features to work with well-designed XML models. To better support XML-oriented technologies in systems engineering and programming languages, an XML modeling language should be used that focuses more on modeling and structure than typical XML schema languages do. In this paper, we examine the current state of the art in XML schema languages and XML modeling, and present a list of requirements for an XML conceptual modeling language.

Model Views of Potential XML Users

  1. Document Processing
    • modeling documents as XML documents
    • XML schema languages are made for this
  2. Information Management/Database Design
    • mostly relational models (ER and derivations)
    • mismatches between ER and XML's hierarchical model
  3. Software Engineering
    • software engineering combines data and behavior
    • classes are more than just documents
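
The mismatch between ER models and XML's hierarchical model becomes visible as soon as a relationship is many-to-many. The following is a minimal Python sketch, using hypothetical author/paper data and element names that do not come from the talk, of the two usual encodings: nesting, which duplicates data, and references, which step outside the hierarchy.

```python
import xml.etree.ElementTree as ET

# Hypothetical many-to-many relationship: authors write papers.
wrote = [("alice", "p1"), ("bob", "p1"), ("alice", "p2")]

def nested(pairs):
    """Encode the relationship by nesting papers inside authors."""
    root = ET.Element("authors")
    by_author = {}
    for author, paper in pairs:
        if author not in by_author:
            by_author[author] = ET.SubElement(root, "author", name=author)
        ET.SubElement(by_author[author], "paper", id=paper)
    return root

def referenced(pairs):
    """Encode the relationship with flat elements plus ID-style references."""
    root = ET.Element("library")
    for paper in sorted({p for _, p in pairs}):
        ET.SubElement(root, "paper", id=paper)
    for author in sorted({a for a, _ in pairs}):
        el = ET.SubElement(root, "author", name=author)
        for a, p in pairs:
            if a == author:
                ET.SubElement(el, "wrote", ref=p)
    return root

nested_doc = nested(wrote)
ref_doc = referenced(wrote)
# Nesting repeats paper p1 under both of its authors.
nested_p1 = [p for p in nested_doc.iter("paper") if p.get("id") == "p1"]
# Referencing stores p1 once and points at it from each author.
ref_p1 = [p for p in ref_doc.iter("paper") if p.get("id") == "p1"]
```

Neither encoding is wrong, but an ER diagram does not say which one to choose, and that choice is exactly the kind of decision a conceptual model for XML would need to make explicit.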

Usages of XML

  1. Document Processing
    • many document-oriented systems are XML-based
    • XML import/export is a must
    • XML's restrictions are (in most cases) acceptable
  2. Information Management/Database Design
    • XML support in DBMSs (SQL/XML); XDBMSs are available
    • "modeling" using XSD or DTDs (or simply well-formed XML)
    • no good support for combining RDBMS/XDBMS
  3. Software Engineering
    • XML as a way for persisting objects
    • UML as a way to model software systems
    • CASE tools for generating classes and their attributes
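
As an illustration of "XML as a way for persisting objects", here is a minimal Python sketch that writes a flat object to XML and reads it back. The `Person` class, its fields, and the one-element-per-field mapping are assumptions made for this example only; they do not reflect any particular tool.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields

@dataclass
class Person:
    # Hypothetical class; the field names are illustrative only.
    name: str
    email: str

def to_xml(obj):
    """Serialize a flat dataclass instance: one child element per field."""
    el = ET.Element(type(obj).__name__)
    for f in fields(obj):
        ET.SubElement(el, f.name).text = str(getattr(obj, f.name))
    return el

def from_xml(cls, el):
    """Rebuild an instance from an element produced by to_xml."""
    return cls(**{f.name: el.findtext(f.name) for f in fields(cls)})

original = Person("Ada", "ada@example.org")
xml_text = ET.tostring(to_xml(original), encoding="unicode")
restored = from_xml(Person, ET.fromstring(xml_text))
```

The round trip preserves state only. The XML structure is a by-product of the class definition rather than a designed model, and the behavior of the class is not captured at all.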

Missing: XML Conceptual Modeling

Example: XML in Programming Languages

Conceptual Models for XML

XML Conceptual Modeling Languages

Example XER Schema (1)

Example XER Schema

Example XER Schema (2)

Weaknesses of Conceptual Languages

  1. Targeted at a specific schema language
  2. No or weak support for mixed content
  3. Lack of formal foundation
  4. No support for reference/hierarchy mix of XML
  5. No support for multi-document scenarios
  6. No support for non-deterministic content
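
To make the mixed content weakness concrete: in mixed content, text and elements are interleaved, and the position of each text run carries meaning. The Python sketch below, using a made-up `<p>` fragment, recovers that interleaving from the standard library's `ElementTree`, which stores text runs in the `text` and `tail` attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment with mixed content (text interleaved with elements).
doc = ET.fromstring("<p>XML is <em>primarily</em> a <b>syntax</b>.</p>")

def mixed_sequence(el):
    """Return the text runs and child element names of el in document order."""
    seq = []
    if el.text:
        seq.append(("text", el.text))
    for child in el:
        seq.append(("element", child.tag))
        if child.tail:
            seq.append(("text", child.tail))
    return seq

sequence = mixed_sequence(doc)
# sequence is:
# [("text", "XML is "), ("element", "em"), ("text", " a "),
#  ("element", "b"), ("text", ".")]
```

A model that describes `p` only as "has an `em` and a `b`" loses the three text runs and their positions, which is why weak mixed content support matters for document-oriented uses.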

Formal Models for XML

XML Formal Models

Weaknesses of Formal Languages

State of the Art

List of Requirements (1)

  1. Formal Foundation
  2. Graphical Notation
  3. Hierarchical and Referential Structures
  4. Schema Language Mappings
  5. Exceptions (Inclusions and Exclusions)
  6. Non-deterministic Content
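
As background for the non-deterministic content requirement: DTDs and XML Schema expect content models in which a parser can always decide, by looking at the next element alone, which part of the model it matches. The Python sketch below is a deliberately simplified check, restricted to a choice between plain sequences of required elements. It is an assumption-laden illustration, not the full rule applied by any schema processor.

```python
def is_deterministic_choice(alternatives):
    """Simplified check for a choice between sequences of required elements.

    The choice is treated as deterministic if no two alternatives begin
    with the same element name, so one element of lookahead is enough.
    """
    seen = set()
    for seq in alternatives:
        first = seq[0]
        if first in seen:
            return False
        seen.add(first)
    return True

# (a, b) | (a, c): after reading "a", the branch is still undecided.
ambiguous = is_deterministic_choice([["a", "b"], ["a", "c"]])
# (a, b) | (c, d): the first element already selects the branch.
unambiguous = is_deterministic_choice([["a", "b"], ["c", "d"]])
```

The ambiguous model can be rewritten as a, (b | c), which accepts the same documents. A conceptual modeling language that permits the first form leaves this rewriting to the mapping onto a concrete schema language.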

List of Requirements (2)

  1. Treating XML Nodes Consistently
  2. Model Groups
  3. Reuse of Content
  4. Generalized Mixed Content
  5. Open Content
  6. Intra- and Inter-Document Relationships


Thank You! Questions?