Select Structures from Your Blog

Assignment 4 — XML Foundations Fall 2008

Assigned: Thursday, September 25th, 2008
Due: Thursday, October 9th, 2008

Introduction:

Think about how to select interesting structures from your blog. These structures should be usable for three purposes: generating statistics about your blog, creating accessible representations of it, and deriving data that is only peripherally connected to the blog itself.

Instructions:

XPath is the language that many XML-oriented technologies use to select parts of an XML document. XPath is supported in DOM, and it is the expression language of both XSLT and XQuery. In this assignment, we apply XPath to the XML produced in Assignment 2, which conforms to the DTD created in Assignment 3. Writing XPaths requires knowledge of the schema (e.g., the DTD), because XML structures can only be selected reliably if you know what to expect.

This also means that a larger set of test data (i.e., a non-trivial XML document produced in Assignment 2) is a much better starting point for writing XPaths and for seeing how they work or why they fail.
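As a rough illustration of testing XPaths against sample data, the sketch below runs a few path expressions over a small hypothetical blog document and reports how many nodes each one selects. A count of zero usually means the path does not fit the schema. The element and attribute names (`blog`, `entry`, `title`, `tag`, `date`) and the helper `try_paths` are assumptions made for illustration. The sketch uses Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath (relative paths and simple predicates), so the full XPath 1.0 equivalents are given in comments.

```python
import xml.etree.ElementTree as ET

# Hypothetical test document; a real one would come from Assignment 2
# and conform to your own DTD from Assignment 3.
TEST_DATA = """<blog>
  <entry id="e1" date="2008-09-01">
    <title>First post</title>
    <tag>xml</tag>
  </entry>
  <entry id="e2" date="2008-09-15">
    <title>Second post</title>
    <tag>xpath</tag>
    <tag>xml</tag>
  </entry>
</blog>"""

# ElementTree paths are evaluated relative to the root element (<blog>),
# so each comment shows the equivalent absolute XPath 1.0 expression.
PATHS = [
    "entry",                            # /blog/entry
    "entry/title",                      # /blog/entry/title
    "entry[tag='xpath']/title",         # /blog/entry[tag='xpath']/title
    "entry[@date='2008-09-15']/title",  # /blog/entry[@date='2008-09-15']/title
    "post/title",                       # wrong: this DTD has no <post> element
]

def try_paths(xml_text, paths):
    """Return (path, number of matching nodes) pairs for each path."""
    root = ET.fromstring(xml_text)
    return [(p, len(root.findall(p))) for p in paths]

for path, n in try_paths(TEST_DATA, PATHS):
    # Flag paths that select nothing: they probably do not match the schema.
    flag = "" if n else "   <-- no match, check against the DTD"
    print(f"{n:2d}  {path}{flag}")
```

With a trivial test document, many wrong paths still return plausible-looking results. A larger and more varied document makes mismatches like `post/title` much easier to spot.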

  • Create two trivial statistical use cases, such as counting the number of blog entries or calculating the average number of goals per game (if your blog covers soccer games and has an XML structure representing goals).
  • For more challenging scenarios, include at least three advanced use cases that make use of the ID/IDREF structures of your blog. Taking the Supreme Court rulings scenario as an example, one could find all cases in which a certain justice abstained, or all cases in which two given justices voted differently.
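The following sketch shows what such use cases might look like for the Supreme Court rulings scenario. The instance document, its element and attribute names (`court`, `justice`, `case`, `vote`, and the IDREF attribute `justice` on `vote`), and the helper functions are all hypothetical. ElementTree lacks XPath functions such as `count()` and `id()`, so the sketch computes counts with `len()` and emulates `id()` with a dictionary that maps ID values to elements. The corresponding full XPath 1.0 expressions, which an XSLT or XQuery processor would evaluate directly, appear in the comments and docstrings.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: <justice> and <case> carry ID attributes, and each
# <vote> refers to a justice through the IDREF attribute "justice".
SAMPLE = """<court>
  <justice id="j1"><name>Justice A</name></justice>
  <justice id="j2"><name>Justice B</name></justice>
  <justice id="j3"><name>Justice C</name></justice>
  <case id="c1">
    <title>Case One</title>
    <vote justice="j1" value="majority"/>
    <vote justice="j2" value="dissent"/>
    <vote justice="j3" value="abstain"/>
  </case>
  <case id="c2">
    <title>Case Two</title>
    <vote justice="j1" value="majority"/>
    <vote justice="j2" value="majority"/>
    <vote justice="j3" value="dissent"/>
  </case>
</court>"""

root = ET.fromstring(SAMPLE)

# Trivial statistic 1. XPath 1.0: count(/court/case)
case_count = len(root.findall("case"))

# Trivial statistic 2. XPath 1.0: count(/court/case/vote) div count(/court/case)
avg_votes = len(root.findall("case/vote")) / case_count

# Emulates XPath's id() function, which only works if the DTD declares
# the attribute as type ID: map every ID value to its element.
by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}

def vote_of(case, justice_id):
    """Value of the given justice's vote in a case, or None if absent."""
    vote = case.find(f"vote[@justice='{justice_id}']")
    return None if vote is None else vote.get("value")

def abstained(justice_id):
    """Titles of cases in which the justice abstained.
    XPath 1.0 ($j bound by the host):
      /court/case[vote[@justice=$j and @value='abstain']]/title"""
    return [c.findtext("title") for c in root.findall("case")
            if vote_of(c, justice_id) == "abstain"]

def disagreed(a, b):
    """Titles of cases in which two justices voted differently.
    XPath 1.0 ($a, $b bound by the host):
      /court/case[vote[@justice=$a]/@value != vote[@justice=$b]/@value]/title"""
    titles = []
    for c in root.findall("case"):
        va, vb = vote_of(c, a), vote_of(c, b)
        if va is not None and vb is not None and va != vb:
            titles.append(c.findtext("title"))
    return titles

def voter_names(case_id):
    """Follow each vote's IDREF to the justice's name.
    XPath 1.0: id('c1')/vote/id(@justice)/name"""
    return [by_id[v.get("justice")].findtext("name")
            for v in by_id[case_id].findall("vote")]

print(case_count, avg_votes)   # 2 3.0
print(abstained("j3"))         # ['Case One']
print(disagreed("j1", "j2"))   # ['Case One']
print(voter_names("c1"))       # ['Justice A', 'Justice B', 'Justice C']
```

The IDREF lookups are what make these use cases more than simple filtering. A vote only stores a reference, and the justice's name and other details must be reached by following that reference back to the element with the matching ID.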

This assignment reveals how much interesting information you can extract from the data represented in your XML: the more structural information you have, the more interesting things you can derive from it. It might also be a good opportunity to revise your XML and your DTD. Ideally, the representations we create later (using XSLT) should provide much richer access to the data than just presenting the bare-bones data itself. They can enrich the data with connections and context derived from its structural richness, so the more structures you have, the richer the representation you can provide for individual parts of your data.

Submit your XPath expressions together with the XML instance document on which you have tested them.


Creative Commons License. Please send comments to dret@berkeley.edu
Last modification: Monday, 02-Feb-2009 01:00:13 EST