From Model to Markup

XML Foundations (INFO 242)

Erik Wilde, UC Berkeley School of Information
2007-11-06
Creative Commons License

This work is licensed under a CC Attribution 3.0 Unported License.

Abstract

While XML is very useful for representing and manipulating structured data, the question remains where these structures come from. They are usually some kind of encoding of a conceptual model, but there is no established, universally accepted way to connect the modeling world with XML markup. This lecture presents some of the challenges of, and approaches to, XML and modeling. Its goal is to raise awareness of the current gap between models and markup, and of practical approaches to bridging that gap.

Outline (Motivation)

  1. Motivation [3]
  2. Modeling Layers – Layering Models? [9]
    1. Modeling [4]
    2. Layering [4]
  3. Data Modeling [18]
    1. An example case: Harry again [7]
    2. Excursus: Data Modeling in the World of Relational Databases [2]
    3. Conceptual Modeling for XML Data [7]
  4. Conclusions [1]

Writing schemas is hard & tedious

Writing 5cHEMa$ is cool & g33ky!

We Need a Conceptual Modeling Layer

What is a Model?

The cardinality of the model-instance relationship can be many-to-one! Think of a toy vessel, e.g., a Titanic! But wait … which one is the model now?

What is a Model? (Natural Language)

  1. From http://www.markrobertwahlberg.com/
  2. From http://www.mocpages.com/
  3. From http://www.imdb.com/

Modeling

Within a given field, realm, or universe of discourse, there is usually some agreement on how modeling is to be done. This agreement is an essential prerequisite for using models as a subject of discussion, negotiation, or evaluation. It may have been reached implicitly or through standardization. In the example above, Derek Zoolander does not know the conventions that are implicitly agreed on when dealing with architectural models.

Why modeling?

Layering in Computer Science

  1. Encapsulating / hiding details / internals
  2. Enabling work at a simpler / more goal-oriented level
  3. Reusing frequent structures / procedures (patterns; see footnote)
  4. Gaining independence from specific technologies / media

  Footnote: Patterns are models that are sufficiently general, adaptable, and worthy of imitation that we can reuse them. (From: Glushko, Robert J. and McGrath, Tim: Document Engineering, p. 90.) Identifying such patterns is a modeling task!

Layering in Computer Science: Protocol stacks

  1. Encapsulation
  2. Goal-orientation
  3. Pattern reuse
  4. Independence

Layering in Computer Science: Compilers

  1. Encapsulation
  2. Goal-orientation
  3. Pattern reuse
  4. Independence

Layering in Human Physiology

The Combination: Model Layers!

  1. Chen, Peter Pin-Shan: The Entity-Relationship Model – Towards a Unified View of Data, Cambridge MA, 1976.

Who's a Data Modeler?

Quality Criteria

Harry returns

  1. The fact that a person's name and contact information usually appear in a head section of a résumé does not necessarily mean that such a head section is a relevant structural element. It is just a representational convention and therefore should not be part of the data model. When creating a view of the data, we can apply our knowledge of appropriate representational conventions by rendering personal information in a head section.
  2. If our vocabulary contains dedicated elements for education and experience, it is not necessary to include attributes or elements specifying section titles such as education or experience. This information can be derived from the structure and added to a specific view when that view is generated.
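
The second point can be illustrated in code. The following Python sketch is not part of the lecture material; it only shows, under stated assumptions, how a view might derive section titles and a head section from the structure alone. The sample document (with an ASCII root element name), the `SECTION_TITLES` mapping, and the `render_view` function are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: no element or attribute stores a section title.
RESUME = """
<resume>
  <person><name><first>Felix</first><last>Michel</last></name></person>
  <education><degree>Swiss Matura</degree></education>
  <experience><task>Software Engineering</task></experience>
</resume>
"""

# The view, not the data model, decides how sections are titled.
SECTION_TITLES = {"education": "Education", "experience": "Experience"}


def render_view(xml_text: str) -> str:
    """Render a plain-text view; head section and titles are view conventions."""
    root = ET.fromstring(xml_text)
    first = root.findtext("person/name/first", default="")
    last = root.findtext("person/name/last", default="")
    lines = [f"{first} {last}".strip()]  # personal data is rendered as the head
    for child in root:
        title = SECTION_TITLES.get(child.tag)
        if title is None:
            continue  # not a titled section in this view
        lines.append(title)
        for item in child:
            lines.append(f"  {(item.text or '').strip()}")
    return "\n".join(lines)


print(render_view(RESUME))
```

A different view could use other titles or another ordering without any change to the data.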

Retrieving the résumé's structure

A well-designed DTD (1)

<!ELEMENT résumé (date, person, education*, experience*, essay, skill*) >

<!ELEMENT date (day?, month, year) >
<!ELEMENT person (name, address) >
<!ELEMENT education (degree?, institution, startDate, endDate) >
<!ELEMENT experience (task*, company, startDate, endDate) >
<!ELEMENT essay (#PCDATA) >
<!ELEMENT skill (#PCDATA) >
<!ATTLIST skill proficiency (low | medium | high) #IMPLIED >

A well-designed DTD (2)

<!ELEMENT name (first+, middle*, last+) >
<!ELEMENT address (street+, city, zip*, state?, country) >

<!ELEMENT startDate (date) >
<!ELEMENT endDate (date) >

A good instance from a well-designed DTD

 <person>
  <name>
   <first>Felix</first>
   <last>Michel</last>
  </name>
  <!-- … -->
 </person>
 <education>
  <degree>Swiss Matura</degree>
  <institution>
   <fullName>Literargymnasium Rämibühl</fullName>
  </institution>
  <!-- … -->
 </education>
 <experience>
  <task>Software Engineering</task>
  <task>Network Deployment</task>
  <company>
   <fullName>tegoro solutions</fullName>
  </company>
  <!-- … -->
 </experience>
 <skill proficiency="high">C</skill>
 <skill proficiency="high">XML Schema</skill>
 <skill proficiency="low">English</skill>

A look at the essay section

 <essay>
  I entered the Literargymnasium Rämibühl in 1994 where I obtained
  my Swiss Matura in January 2001. Starting from fall 2001 I have
  been studying Electrical Engineering at the ETHZ, where I
  learned C and XML Schema. After my final exams I stayed with
  tegoro solutions in Basel for an internship before I had the
  opportunity to do my Master Thesis at UC Berkeley. I hope that I
  will be able to improve my english skills while staying at
  iSchool.
 </essay>
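
One way to prepare for a critical review of this design is to check how much of the structured data recurs as plain text inside the essay. The Python sketch below is an illustration under stated assumptions, not the lecture's method; the `repeated_in_essay` helper and the abbreviated sample document are hypothetical, and the substring test is deliberately crude.

```python
import xml.etree.ElementTree as ET
from typing import List

# Abbreviated, hypothetical instance based on the résumé example.
DOC = """
<resume>
  <education>
    <degree>Swiss Matura</degree>
    <institution><fullName>Literargymnasium Rämibühl</fullName></institution>
  </education>
  <experience>
    <company><fullName>tegoro solutions</fullName></company>
  </experience>
  <skill proficiency="high">C</skill>
  <skill proficiency="high">XML Schema</skill>
  <essay>I entered the Literargymnasium Rämibühl in 1994 where I obtained
  my Swiss Matura in January 2001. I learned C and XML Schema and stayed
  with tegoro solutions in Basel for an internship.</essay>
</resume>
"""


def repeated_in_essay(xml_text: str) -> List[str]:
    """Return values of structured leaf elements that recur verbatim in the essay."""
    root = ET.fromstring(xml_text)
    essay = " ".join((root.findtext("essay") or "").split())  # normalize whitespace
    repeated = []
    for elem in root.iter():
        if elem.tag == "essay" or len(elem) > 0:
            continue  # skip the essay itself and all non-leaf elements
        value = (elem.text or "").strip()
        if value and value in essay:  # crude substring match, fine for a sketch
            repeated.append(value)
    return repeated


print(repeated_in_essay(DOC))
```

Every value reported here is stored twice, once as markup and once as prose, so the two copies can drift apart when the résumé is updated.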

Critical Review: A well-designed DTD?

Quality Criteria: Normal Forms

  1. However, the strictest normal forms (4NF, 5NF) are hardly ever used in practice, for the reasons mentioned earlier.

Conceptual Modeling Formalism: Entity-Relationship Diagrams

[Figure: entity-relationship diagram (ER.png)]

Is there anything similar for XML?

Why is it so hard to create a suitable formalism?

An informal formalism

Determine …         | Phase         | Question               | Example                     | Action
--------------------+---------------+------------------------+-----------------------------+-------------------------------------------
1. Entities         | Inventory     | What's there?          | person, company             | Sketch boxes
2. Reusable Objects | Analysis      |                        | address, date               | Perhaps include some model libraries (UBL)
3. Reusable Tags    | Markup design | What do we need?       | lists, hyperlinks, headings | Perhaps include some schemas (XHTML)
4. Relations        | Assembly      | What's the connection? | has-a, contains, references | Draw arcs and arrows
The résumé structure, informally formalized

[Figure: the résumé structure in the informal formalism (informal.png)]

An even better DTD

<!ENTITY % date "(day?, month, year)" > 
<!ENTITY % address "(street, street?, city, zip*, state?, country)" >
<!ELEMENT startDate %date; >
<!ELEMENT endDate %date; >
<!ELEMENT institution (fullName, address) >
<!ATTLIST institution id ID #REQUIRED >
<!ELEMENT education (degree?, startDate, endDate) >
<!ATTLIST education 
 id     ID  #REQUIRED
 ref    IDREF #IMPLIED
>
<!ELEMENT skills (skill*) >
<!ELEMENT skill (#PCDATA) >
<!ATTLIST skill
 id      ID  #REQUIRED
 proficiency  (low | medium | high) #IMPLIED
 origin    IDREF #IMPLIED
>
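
The `ID` and `IDREF` attributes declared above are only useful if every reference points at an existing identifier, which is something a validating parser checks. As a rough illustration, the following Python sketch performs that check by hand, because the standard-library parser used here does not validate against DTDs. The tuples `ID_ATTRS` and `IDREF_ATTRS`, the function `dangling_references`, and the sample document are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from typing import List

ID_ATTRS = ("id",)               # attributes declared as ID in the DTD excerpt
IDREF_ATTRS = ("ref", "origin")  # attributes declared as IDREF in the DTD excerpt

# Hypothetical instance; origin="edu2" deliberately points nowhere.
SAMPLE = """
<resume>
  <institution id="inst1"><fullName>Literargymnasium Rämibühl</fullName></institution>
  <education id="edu1" ref="inst1"><degree>Swiss Matura</degree></education>
  <skills>
    <skill id="sklC" proficiency="high" origin="edu2">C</skill>
  </skills>
</resume>
"""


def dangling_references(xml_text: str) -> List[str]:
    """Return IDREF values that do not match any ID in the document."""
    root = ET.fromstring(xml_text)
    ids = set()
    for elem in root.iter():
        for attr in ID_ATTRS:
            if attr in elem.attrib:
                value = elem.attrib[attr]
                if value in ids:
                    raise ValueError(f"duplicate ID: {value}")  # IDs must be unique
                ids.add(value)
    return [
        elem.attrib[attr]
        for elem in root.iter()
        for attr in IDREF_ATTRS
        if attr in elem.attrib and elem.attrib[attr] not in ids
    ]


print(dangling_references(SAMPLE))
```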

An even better instance

  <education id="edu1" ref="inst1">
   <degree id="degMat">Swiss Matura</degree>
   <startDate>
    <day>22</day>
    <month>August</month>
    <year>1994</year>
   </startDate>
   <!-- … -->
  </education>
  <!-- … -->
 <essay>
  After having obtained my <link ref="degMat"/> from <link ref="inst1"/>, I have
  been studying Electrical Engineering at the <link ref="inst2"/>, where I
  learned <link ref="sklC"/> and <link ref="sklXSD"/>. After my final exams I stayed with
  <link ref="comp1"/> for an internship before I had the
  opportunity to do my Master Thesis at <link ref="inst3"/>. I hope that I
  will be able to improve my <link ref="sklEN"/> skills while staying at
  <link ref="inst3"/>.
 </essay>
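
To read this essay, each empty `link` element has to be replaced by text taken from the element it references. The Python sketch below shows one way this could work; the `resolve_links` and `label` functions, the abbreviated sample document, and the choice to display `fullName` for institutions are assumptions, not part of the lecture.

```python
import xml.etree.ElementTree as ET

# Abbreviated, hypothetical instance in the style of the example above.
SAMPLE = """
<resume>
  <institution id="inst1"><fullName>Literargymnasium Rämibühl</fullName></institution>
  <education id="edu1" ref="inst1"><degree id="degMat">Swiss Matura</degree></education>
  <essay>After having obtained my <link ref="degMat"/> from <link ref="inst1"/>, I went on.</essay>
</resume>
"""


def label(elem: ET.Element) -> str:
    """Text shown for a referenced element: its fullName child if present, else its own text."""
    full_name = elem.findtext("fullName")
    return full_name if full_name is not None else (elem.text or "").strip()


def resolve_links(xml_text: str) -> str:
    """Return the essay text with every link replaced by the label of its target."""
    root = ET.fromstring(xml_text)
    by_id = {e.get("id"): e for e in root.iter() if e.get("id") is not None}
    essay = root.find("essay")
    parts = [essay.text or ""]
    for child in essay:
        if child.tag == "link":
            # A dangling reference raises KeyError here, which is acceptable for a sketch.
            parts.append(label(by_id[child.get("ref")]))
        parts.append(child.tail or "")  # text that follows the child element
    return "".join(parts)


print(resolve_links(SAMPLE))
```

Because the names are stored only once, renaming an institution changes every sentence that links to it.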

Generating Views

 <!-- Renders an info box for a degree. The select attribute on xsl:attribute requires XSLT 2.0. -->
 <xsl:template match="training/education/degree" mode="info">
  <div class="infoBox">
   <xsl:attribute name="id" select="@id"/>
   <h3><xsl:apply-templates select="."/></h3>
   Obtained from
   <!-- Follow the reference on the parent education element to the institution. -->
   <xsl:variable name="REF" select="../@ref"/>
   <xsl:value-of select="//*[@id=$REF]/fullName"/>
  </div>
 </xsl:template>

XML and Modeling