XML Databases
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.
Abstract
XML databases are specialized databases for handling XML data. They often use XQuery as their query language, but they need additional technologies for updating and storing data. XQuery is currently a read-only language, so update facilities must be provided in addition to XQuery's querying capabilities. One of the big advantages of databases over file systems is optimized storage (and thus access) structures; for XML databases, this means storing XML documents in a form other than text files.
Abstraction Layers
- File systems are general-purpose mechanisms for managing data
- files may contain any data that can be encoded as a sequence of bytes
- file systems maintain some metadata about files (owner, dates, permissions)
- data management is limited to reading or writing streams of bytes
- Databases are specialized tools for managing data
- they prescribe a logical model which defines the type of data to work with
- they provide operations on this logical model only (and not on the physical model)
- the physical model can be optimized to provide better performance/security/reliability
- the physical model can be stored in files or as raw data without a file system
- Relational databases (RDBMS) use tables as their logical model
- XML databases (XDBMS) use XDM (typed Infosets) as their document model
File-Based XQuery
Database-Based XQuery
Database Management
- Databases are optimized data management systems
- data must be structured according to the Data Definition Language (DDL)
- it can only be manipulated using the Data Manipulation Language (DML)
- DDL and DML allow databases to implement optimized storage and retrieval
- XML is a new DDL, and relational databases cannot handle XML natively
- XML documents do not have to be stored as text-based XML document files
- XML is the data model an application expects when working with XML
- XML storage can be optimized for various purposes; one example is Persistent DOM (PDOM)
- database data structures always depend on the expected write vs. read ratio
Sequential Numbers
Dewey Decimal Classification (DDC)
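The two slide titles above most likely refer to schemes for numbering the nodes of a stored XML tree; that reading is an assumption, since only the titles survive. Under that assumption, the following minimal Java sketch shows Dewey-style labels, where each node's label extends its parent's label, so that an ancestor test becomes a prefix test on the label. The class `DeweyLabels` and its methods are hypothetical and not part of any XML database API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
class DeweyLabels {

    // Minimal tree node: an element name plus its child nodes.
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name) { this.name = name; }

        // Appends a child and returns this node to allow chaining.
        Node add(Node child) { children.add(child); return this; }
    }

    // Assigns labels in document order: the root is "1",
    // its children are "1.1", "1.2", and so on.
    static List<String> label(Node root) {
        List<String> out = new ArrayList<>();
        label(root, "1", out);
        return out;
    }

    private static void label(Node node, String label, List<String> out) {
        out.add(label + " " + node.name);
        for (int i = 0; i < node.children.size(); i++) {
            label(node.children.get(i), label + "." + (i + 1), out);
        }
    }

    // Ancestor test as a prefix test; the trailing dot keeps
    // "1.1" from being treated as an ancestor of "1.10".
    static boolean isAncestor(String ancestor, String descendant) {
        return descendant.startsWith(ancestor + ".");
    }

    public static void main(String[] args) {
        Node doc = new Node("addresses")
            .add(new Node("address").add(new Node("fullname")).add(new Node("born")))
            .add(new Node("address").add(new Node("fullname")));
        label(doc).forEach(System.out::println);
        System.out.println(isAncestor("1.1", "1.1.2"));
    }
}
```

With plain sequential numbers, inserting a node generally forces renumbering of every node that follows it in document order; with hierarchical labels of this kind, only the following siblings and their subtrees are affected.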
JDBC
- Database systems are stand-alone applications
- they provide the service of storing, querying, and updating data
- they are often accessed from various applications in an IT landscape
- JDBC is the standard Java technology to connect to a database
- JDBC allows standardized access from Java programs to relational databases
- database vendors provide a JDBC driver for their database product
- JDBC accepts SQL statements and sends them to the database system
- SELECT returns a result set, i.e. the rows generated by the query
- INSERT, UPDATE, and DELETE return a simple count (database rows affected)
XML:DB API (XAPI)
- XML:DB was an initiative of XDBMS providers and supporters
- it was founded when XDBMS was not a mainstream concept
- none of the big players ever participated in the group
- it is no longer active and some of its members have already disappeared
- XML:DB has published some influential draft documents
- XML:DB API (XAPI) was their proposal for what an XDBMS API could look like
- XUpdate was their proposal for an update language for XDBMS
- The latest XAPI Draft is dated 09/2001
- it uses XPath as the query language
- it uses XUpdate as the update language
- it predates XDM and XQuery, the two essential XDBMS technologies today
Database Connection
- Currently there is no JDBC for XDBMS
- JDBC is not applicable, XAPI is outdated
- XQuery for Java (XQJ) is in the pipeline of the Java Community Process (JCP)
- it is still in its early draft stage (version 0.5) and will continue to change
- it has been in development since 06/2003 and the main concepts seem to be stable now
XQJ Example
// establish a connection to the XQuery engine
// (xqds is an XQDataSource supplied by the XQJ implementation in use)
XQConnection conn = xqds.getConnection();
// create an expression object that is later used to execute an XQuery expression
XQExpression expr = conn.createExpression();
// The XQuery expression to be executed
String es = "for $n in fn:doc('catalog.xml')//item" + " return fn:data($n/name)";
// execute the XQuery expression
XQResultSequence result = expr.executeQuery(es);
// process the result (sequence) iteratively
while (result.next()) {
    // retrieve the current item of the sequence as a String
    String str = result.getAtomicValue();
    System.out.println("Product name: " + str);
}
// free all resources allocated for the result
result.close();
// free all resources allocated for the expression
expr.close();
// free all resources allocated for the connection
conn.close();
HTTP Access
- XDBMS access can be regarded as a REST service
- resource formats are whatever the database manages, but always XML
- the usual CRUD operations of databases can be mapped to HTTP methods
- HTTP access provides an elegant and flexible interface
- database access can be managed in the same way as any other Web-based service
- many applications have access to the database system
- HTTP database access is not appropriate for all scenarios
- high-throughput, high-volume applications need a better optimized solution
- plain HTTP access does not provide support for transactions or security
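As a rough illustration of how CRUD operations can be mapped to HTTP methods, the following Java sketch builds one request per operation without sending it. The mapping shown (POST to create, GET to read, PUT to update, DELETE to delete) is one common convention and an assumption here, not something a particular XDBMS prescribes; the base URI and the class `RestCrudSketch` are hypothetical. It uses the JDK's `java.net.http` API (Java 11 or later).

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;

// Hypothetical helper: builds (but does not send) one HTTP request per CRUD operation.
class RestCrudSketch {

    // Assumed base URI of the database's HTTP interface.
    static final String BASE = "http://localhost:8080/db/";

    // Create: POST a new XML document to a collection.
    static HttpRequest create(String collection, String xml) {
        return xmlRequest(collection).POST(BodyPublishers.ofString(xml)).build();
    }

    // Read: GET the document stored at the given path.
    static HttpRequest read(String path) {
        return HttpRequest.newBuilder(URI.create(BASE + path)).GET().build();
    }

    // Update: PUT a replacement document at the given path.
    static HttpRequest update(String path, String xml) {
        return xmlRequest(path).PUT(BodyPublishers.ofString(xml)).build();
    }

    // Delete: DELETE the document at the given path.
    static HttpRequest delete(String path) {
        return HttpRequest.newBuilder(URI.create(BASE + path)).DELETE().build();
    }

    // Shared builder for requests that carry an XML body.
    private static HttpRequest.Builder xmlRequest(String path) {
        return HttpRequest.newBuilder(URI.create(BASE + path))
            .header("Content-Type", "application/xml");
    }

    public static void main(String[] args) {
        HttpRequest[] requests = {
            create("books/", "<book/>"),
            read("books/a.xml"),
            update("books/a.xml", "<book/>"),
            delete("books/a.xml")
        };
        for (HttpRequest request : requests) {
            System.out.println(request.method() + " " + request.uri());
        }
    }
}
```

The sketch deliberately stops at building requests: transactions and access control, which the list above notes are missing from plain HTTP access, would have to be layered on top.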
XQuery
- XQuery is a read-only language
- queries a collection of XML documents (XDM instances)
- returns an XDM instance, serialized as XML or something else
- Updating XML databases currently is not covered by a widely accepted standard
- XUpdate is a simple and rather old solution (04/2000)
- various XQuery Update Extensions have been proposed for XQuery
- the W3C is working on an XQuery Update Facility, but this will not be finished for some time
- XML database implementers often introduce proprietary update facilities
XUpdate
- XUpdate defines a language for specifying XML updates
- the data model is based on XPath 1.0
- the syntax is based on XML
- XUpdate has no connections with a query language, it is for updates only
<addresses version="1.0">
  <address id="1">
    <fullname>Andreas Laux</fullname>
    <born day='1' month='12' year='1978'/>
  </address>
</addresses>

<xupdate:modifications version="1.0" xmlns:xupdate="http://www.xmldb.org/xupdate">
  <xupdate:insert-after select="/addresses/address[1]" >
    <xupdate:element name="address">
      <xupdate:attribute name="id">2</xupdate:attribute>
      <fullname>Lars Martin</fullname>
      <born day='2' month='12' year='1974'/>
    </xupdate:element>
  </xupdate:insert-after>
</xupdate:modifications>
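To make the effect of the insert-after modification above concrete, the following Java sketch performs the equivalent change by hand with the JDK's DOM and XPath APIs. It does not parse or execute XUpdate; it only illustrates the resulting tree change. The class `InsertAfterSketch` and its helper methods are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: applies the insert-after change manually via DOM.
class InsertAfterSketch {

    // The source document from the example above, without whitespace.
    static final String ADDRESSES =
        "<addresses version=\"1.0\">"
      + "<address id=\"1\"><fullname>Andreas Laux</fullname>"
      + "<born day=\"1\" month=\"12\" year=\"1978\"/></address>"
      + "</addresses>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Evaluates an XPath 1.0 expression and returns its string value.
    static String eval(Document doc, String expr) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Inserts the second address after the node selected by /addresses/address[1].
    static Document insertSecondAddress(Document doc) {
        Node target;
        try {
            target = (Node) XPathFactory.newInstance().newXPath()
                .evaluate("/addresses/address[1]", doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }

        Element address = doc.createElement("address");
        address.setAttribute("id", "2");
        Element fullname = doc.createElement("fullname");
        fullname.setTextContent("Lars Martin");
        Element born = doc.createElement("born");
        born.setAttribute("day", "2");
        born.setAttribute("month", "12");
        born.setAttribute("year", "1974");
        address.appendChild(fullname);
        address.appendChild(born);

        // DOM has no insertAfter; inserting before the next sibling is equivalent
        // (a null next sibling makes insertBefore append at the end).
        target.getParentNode().insertBefore(address, target.getNextSibling());
        return doc;
    }

    public static void main(String[] args) {
        Document doc = insertSecondAddress(parse(ADDRESSES));
        System.out.println(eval(doc, "/addresses/address[1]/fullname"));
        System.out.println(eval(doc, "/addresses/address[2]/fullname"));
    }
}
```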
XQuery Update Extensions
- XQuery 1.0 was planned as a read-only language
- creating a fully functional language would have been too ambitious
- with a solid formal foundation, XQuery can be upgraded to also provide update features
- Several XQuery update extensions have been proposed
- updating goes through a consolidation phase similar to querying
- the eventual XQuery update facility will be integrated with XPath
- W3C's XQuery Update Facility is in early draft status
do insert <year>2005</year> after fn:doc("bib.xml")/books/book[1]/publisher
do delete fn:doc("bib.xml")/books/book[1]/author[last()]
do replace fn:doc("bib.xml")/books/book[1]/publisher with fn:doc("bib.xml")/books/book[2]/publisher
Java XDBMS
- eXist is a Java-based XML database with its own data storage
- not the fastest implementation choice
- can be run on any platform providing a 1.4 JRE
- Usable as standalone or embedded in Cocoon
- standalone provides HTTP access and is usable by any application
- embedded integrates eXist into Cocoon and turns Cocoon into a CMS
- Management through Java client or Web-based management console
- Java client is the older management tool and has more features
- Web-based management tool is work in progress and has some (Web-specific) limitations
- XML documents are stored in a proprietary format
- structurally indexed trees of the XML document
- additional indices can be managed to enable faster querying
Collections
- XML databases can store and retrieve XML documents
- in file systems, directories are used to organize the storage of files
- many XML databases use collections to organize the storage of XML documents
- Like directories, collections can be nested
- Queries affect all documents of a collection
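The directory analogy can be sketched in a few lines of Java. This toy model keeps documents as strings in nested collections and runs a query over every document of a collection; that nested collections are included in the query is an assumption of the sketch, and a simple string predicate stands in for a real XQuery evaluation. The class `CollectionSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical toy model of nested collections, for illustration only.
class CollectionSketch {
    final String name;
    // Nested collections, like subdirectories.
    final Map<String, CollectionSketch> children = new LinkedHashMap<>();
    // Document name -> XML text, like files in a directory.
    final Map<String, String> documents = new LinkedHashMap<>();

    CollectionSketch(String name) { this.name = name; }

    // Returns the nested collection with the given name, creating it if needed.
    CollectionSketch child(String childName) {
        return children.computeIfAbsent(childName, CollectionSketch::new);
    }

    void store(String docName, String xml) {
        documents.put(docName, xml);
    }

    // Applies the predicate to every document in this collection and,
    // by assumption, in all nested collections; returns matching paths.
    List<String> query(Predicate<String> matches) {
        List<String> hits = new ArrayList<>();
        collect(name, matches, hits);
        return hits;
    }

    private void collect(String path, Predicate<String> matches, List<String> hits) {
        documents.forEach((doc, xml) -> {
            if (matches.test(xml)) hits.add(path + "/" + doc);
        });
        children.forEach((childName, child) ->
            child.collect(path + "/" + childName, matches, hits));
    }

    public static void main(String[] args) {
        CollectionSketch db = new CollectionSketch("db");
        db.child("books").store("a.xml", "<book><title>XML</title></book>");
        db.child("books").child("old").store("b.xml", "<book><title>SGML</title></book>");
        db.child("music").store("c.xml", "<cd><title>XML</title></cd>");
        db.query(xml -> xml.contains("<title>XML</title>")).forEach(System.out::println);
    }
}
```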
Indexing
- Indexing makes XQueries much more efficient
- DBMS indices cost storage space and are expensive to update
- DBMS indices can be used for faster query processing
- eXist has three kinds of indices
- Structural Index for the document structures (cannot be modified by users)
- Fulltext Index indexes the content of elements and attributes (can be disabled)
- Range Index for type-specific indexing of specific elements or attributes (must be enabled)
- Disabling fulltext indexing in eXist can yield surprising results
- rather than searching the collection, the database returns an empty result
- XQuery execution depends on fulltext indices (not only in terms of performance)
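The trade-off described above can be illustrated with a toy inverted index in Java. This is a sketch under stated assumptions, not how eXist implements its indices: the class `FulltextIndexSketch` is hypothetical, documents are plain strings, and `search` consults only the index, which mimics the behavior noted above where a disabled fulltext index yields an empty result instead of a scan of the collection.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy fulltext index, for illustration only.
class FulltextIndexSketch {
    private final boolean indexingEnabled;
    private final Map<String, String> documents = new HashMap<>();
    // Word -> names of the documents containing it (the extra storage cost).
    private final Map<String, Set<String>> index = new HashMap<>();

    FulltextIndexSketch(boolean indexingEnabled) {
        this.indexingEnabled = indexingEnabled;
    }

    // Storing a document also updates the index (the extra update cost).
    void store(String name, String text) {
        documents.put(name, text);
        if (!indexingEnabled) return;
        // Drop stale entries in case the document is being replaced.
        index.values().forEach(names -> names.remove(name));
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                index.computeIfAbsent(word, w -> new TreeSet<>()).add(name);
            }
        }
    }

    // Index-only lookup: fast, but empty if nothing was indexed.
    Set<String> search(String word) {
        return index.getOrDefault(word.toLowerCase(), Collections.emptySet());
    }

    // Full scan over all documents, shown for comparison.
    Set<String> scan(String word) {
        String needle = word.toLowerCase();
        Set<String> hits = new TreeSet<>();
        documents.forEach((name, text) -> {
            if (Arrays.asList(text.toLowerCase().split("\\W+")).contains(needle)) {
                hits.add(name);
            }
        });
        return hits;
    }

    public static void main(String[] args) {
        FulltextIndexSketch indexed = new FulltextIndexSketch(true);
        indexed.store("a.xml", "XML databases store XML");
        indexed.store("b.xml", "relational databases store tables");
        System.out.println(indexed.search("databases"));

        FulltextIndexSketch unindexed = new FulltextIndexSketch(false);
        unindexed.store("a.xml", "XML databases store XML");
        System.out.println(unindexed.search("xml").isEmpty());
    }
}
```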
XML Collections
- XDBMS are useful for managing large document collections
- Managing a DBMS requires a minimum amount of expertise
- Performance benefits can be tremendous
- XDBMS are still in their early years, expect some surprises …