XML and Databases

XML Foundations [./]
Fall 2008 — INFO 242 (CCN 42572)

Erik Wilde, UC Berkeley School of Information
2008-12-04

Creative Commons License [http://creativecommons.org/licenses/by/3.0/]

This work is licensed under a CC
Attribution 3.0 Unported License
[http://creativecommons.org/licenses/by/3.0/]

Contents E. Wilde: XML and Databases

Contents

E. Wilde: XML and Databases

(2) Abstract

While XML databases are a good solution for managing XML content, frequently it is necessary to uses non-XML databases for managing XML content. In most cases, these databases will be relational databases. There a two major approaches of how to manage XML content in a relational database. The first approach is to define a mapping between XML and relational structures and work with this mapping. The second approach is to use the XML-specific functionality, which is increasingly provided by relational databases, turning them into XML-aware databases.



E. Wilde: XML and Databases

(3) XML is Trees



E. Wilde: XML and Databases

(4) Storing XML



Relational Databases

Outline (Relational Databases)

  1. Relational Databases [2]
  2. Database Support for XML [4]
  3. XML Storage in Databases [9]
  4. XML in Relational Databases [5]
    1. SQL/XML [3]
  5. XML Databases [7]
    1. Files vs. Databases [4]
    2. Updating XDBMS [3]
  6. Conclusions [3]
Relational Databases E. Wilde: XML and Databases

(6) Generic XML Storage



Relational Databases E. Wilde: XML and Databases

(7) Tree Table

tree-table.png
ID Type Name Value Parent Left
1 Root
2 Element a 1
3 Element b 2
4 Element c 2 3
5 Text Text 3
6 Attribute att 42 4


Database Support for XML

Outline (Database Support for XML)

  1. Relational Databases [2]
  2. Database Support for XML [4]
  3. XML Storage in Databases [9]
  4. XML in Relational Databases [5]
    1. SQL/XML [3]
  5. XML Databases [7]
    1. Files vs. Databases [4]
    2. Updating XDBMS [3]
  6. Conclusions [3]
Database Support for XML E. Wilde: XML and Databases

(9) Why XML and Databases?



Database Support for XML E. Wilde: XML and Databases

(10) XML Interchange

xml-dbms-application.png

Database Support for XML E. Wilde: XML and Databases

(11) XML Support in DBMS

xml-dbms-xmlsupport.png

Database Support for XML E. Wilde: XML and Databases

(12) XML DBMS

xml-dbms-xdbms.png

XML Storage in Databases

Outline (XML Storage in Databases)

  1. Relational Databases [2]
  2. Database Support for XML [4]
  3. XML Storage in Databases [9]
  4. XML in Relational Databases [5]
    1. SQL/XML [3]
  5. XML Databases [7]
    1. Files vs. Databases [4]
    2. Updating XDBMS [3]
  6. Conclusions [3]
XML Storage in Databases E. Wilde: XML and Databases

(14) Model Mapping



XML Storage in Databases E. Wilde: XML and Databases

(15) XML is Text



XML Storage in Databases E. Wilde: XML and Databases

(16) XML → ∗LOB

xml-storage-lob.png

XML Storage in Databases E. Wilde: XML and Databases

(17) XML as a Datatype



XML Storage in Databases E. Wilde: XML and Databases

(18) XML Datatype

xml-storage-datatype.png

XML Storage in Databases E. Wilde: XML and Databases

(19) Mapping XML to Models



XML Storage in Databases E. Wilde: XML and Databases

(20) Shredding (XML → Columns)

xml-storage-shredding.png

XML Storage in Databases E. Wilde: XML and Databases

(21) XML as First-Class Citizen



XML Storage in Databases E. Wilde: XML and Databases

(22) XML DBMS

xml-storage-xdbms.png

XML in Relational Databases

Outline (XML in Relational Databases)

  1. Relational Databases [2]
  2. Database Support for XML [4]
  3. XML Storage in Databases [9]
  4. XML in Relational Databases [5]
    1. SQL/XML [3]
  5. XML Databases [7]
    1. Files vs. Databases [4]
    2. Updating XDBMS [3]
  6. Conclusions [3]
XML in Relational Databases E. Wilde: XML and Databases

(24) RDBish XML



XML in Relational Databases E. Wilde: XML and Databases

(25) Problematic XML



SQL/XML

SQL/XML E. Wilde: XML and Databases

(27) SQL/XML:2003

  • SQL/XML provides XML Support in DBMS [XML Support in DBMS (1)]s
    • it introduces XML as a Datatype [XML as a Datatype (1)]
    • it introduces a number of operations for generating XML from query results
    • it defines mappings to bridge both worlds (SQL and XML)
  • SQL/XML does not change anything about the database model
    • data is still stored in tables only
    • a column of a table may use the XML type
    • queries may return results in XML rather than as SQL result sets


SQL/XML E. Wilde: XML and Databases

(28) SQL/XML Example

SELECT
  e.EmpId,
  e.FirstName,
  e.LastName,
  e.StartDate,
  e.EndDate
FROM Employees e WHERE e.EmpId = 12
SELECT
  XMLELEMENT(NAME "employee",
    XMLATTRIBUTES(e.EmpId as "id"),
    XMLELEMENT(NAME "names",
      XMLELEMENT(NAME "first", e.FirstName),
      XMLELEMENT(NAME "last", e.LastName)),
    XMLELEMENT(NAME "hire-dates",
      XMLATTRIBUTES(e.StartDate as "start", e.EndDate as "end")))
FROM Employees e WHERE e.EmpId = 12


SQL/XML E. Wilde: XML and Databases

(29) SQL/XML:2007

  • Adds the concept of XML Tables
  • XML Tables are not tables, they are containers for XML
  • SQL/XML:2007 changes the database's data model
    • it is now possible to have a database with no tables
    • likely use cases are to have both: traditional and XML tables
  • SQL/XML:2007 defines a hybrid database: relational and XML database


XML Databases

Outline (XML Databases)

  1. Relational Databases [2]
  2. Database Support for XML [4]
  3. XML Storage in Databases [9]
  4. XML in Relational Databases [5]
    1. SQL/XML [3]
  5. XML Databases [7]
    1. Files vs. Databases [4]
    2. Updating XDBMS [3]
  6. Conclusions [3]

Files vs. Databases

Files vs. Databases E. Wilde: XML and Databases

(32) Abstraction Layers

  • Files systems are general-purpose mechanisms for managing data
    • files may contain any data that can be encoded as a sequence of bytes
    • file systems maintain some metadata about files (owner, dates, permissions)
    • data management is limited to reading or writing streams of bytes
  • Databases are specialized tools for managing data
    • they prescribe a logical model which defines the type of data to work with
    • they provide operations on this logical model only (and not on the physical model)
    • the physical model can be optimized to provide better performance/security/reliability
    • the physical model can be stored in files or as raw data without a file system
  • Relational databases (RDBMS) use tables as their logical model
  • XML databases (XDBMS) use XDM [XQuery 1.0 and XPath 2.0 Data Model (XDM)] (typed Infosets) as their document model


Files vs. Databases E. Wilde: XML and Databases

(33) File-Based XQuery

File-based XQuery Processing

Files vs. Databases E. Wilde: XML and Databases

(34) Database-Based XQuery

DB-based XQuery Processing

Files vs. Databases E. Wilde: XML and Databases

(35) XDBMS Database Management

  • Databases are optimized data management systems
    • data must be structured according to the Data Definition Language (DDL)
    • it can only be manipulated using the Data Manipulation Language (DML)
    • DDL and DML allow databases to implement optimized storage and retrieval
  • XML is a new DDL, and relational databases cannot handle XML natively
  • XML documents do not have to be stored as text-based XML document files
    • XML is the data model an application expects when working with XML
    • XML storage can be optimized for various purposes, one example is Persistent DOM (PDOM)
    • database data structures always depend on the expected write vs. read ratio


Updating XDBMS

Updating XDBMS E. Wilde: XML and Databases

(37) XQuery

  • XQuery [XML Query (XQuery) – Part I] is a read-only language
    • queries a collection of XML documents (XDM [XQuery 1.0 and XPath 2.0 Data Model (XDM)] instances)
    • returns an XDM instance, serialized as XML or something else
  • Updating XML databases currently is not covered by a widely accepted standard
    • XUpdate [XUpdate (1)] is a simple and rather old solution (04/2000)
    • various XQuery Update Facility [XQuery Update Facility (1)] have been proposed for XQuery
    • XML database implementers often introduce proprietary update facilities
    • the W3C is working on an XQuery Update Facility [http://www.w3.org/TR/xquery-update-10/]


Updating XDBMS E. Wilde: XML and Databases

(38) XUpdate

  • XUpdate [http://xmldb-org.sourceforge.net/xupdate/xupdate-wd.html] defines an language for specifying XML updates
    • the data model is based on XPath 1.0
    • the syntax is based on XML
  • XUpdate has no connections with a query language, it is for updates only
    <addresses version="1.0">
      <address id="1"> 
        <fullname>Andreas Laux</fullname> 
        <born day='1' month='12' year='1978'/> 
      </address> 
    </addresses>
    <xupdate:modifications version="1.0" xmlns:xupdate="http://www.xmldb.org/xupdate"> 
      <xupdate:insert-after select="/addresses/address[1]" > 
        <xupdate:element name="address">
          <xupdate:attribute name="id">2</xupdate:attribute>
            <fullname>Lars Martin</fullname> 
            <born day='2' month='12' year='1974'/> 
          </xupdate:element> 
        </xupdate:insert-after> 
    </xupdate:modifications>


Updating XDBMS E. Wilde: XML and Databases

(39) XQuery Update Facility

  • XQuery 1.0 [XML Query (XQuery) – Part I] had been planned to be a read-only language
    • creating a fully functional language would have been to ambitious
    • with a solid formal foundation, XQuery can be upgraded to also provide update features
  • Several XQuery update extensions have been proposed
    • updating goes through a consolidation phase similar to querying
    • the eventual XQuery update facility will be integrated with XPath
  • W3C's XQuery Update Facility [http://www.w3.org/TR/xqupdate/] is in early draft status
do insert <year>2005</year> after fn:doc("bib.xml")/books/book[1]/publisher
do delete fn:doc("bib.xml")/books/book[1]/author[last()]
do replace fn:doc("bib.xml")/books/book[1]/publisher with fn:doc("bib.xml")/books/book[2]/publisher


Conclusions

Outline (Conclusions)

  1. Relational Databases [2]
  2. Database Support for XML [4]
  3. XML Storage in Databases [9]
  4. XML in Relational Databases [5]
    1. SQL/XML [3]
  5. XML Databases [7]
    1. Files vs. Databases [4]
    2. Updating XDBMS [3]
  6. Conclusions [3]
Conclusions E. Wilde: XML and Databases

(41) Tables and Trees don't Mix



Conclusions E. Wilde: XML and Databases

(42) Database Technologies do Mix



Conclusions E. Wilde: XML and Databases

(43) XML Databases for XML Scenarios



2008-12-04 XML Foundations [./]
Fall 2008 — INFO 242 (CCN 42572)