XML Path Language (XPath)

Database Management [./]
Fall 2012 — INFO 257

Erik Wilde, EMC IIG
2012-09-27

Creative Commons License [http://creativecommons.org/licenses/by/3.0/]

This work is licensed under a CC
Attribution 3.0 Unported License
[http://creativecommons.org/licenses/by/3.0/]

Contents E. Wilde: XML Path Language (XPath)

(2) Abstract

XML structures data into a rather small number of different constructs, most notably elements and attributes. The XML Path Language (XPath) defines a way to select parts of XML documents so that they can be used for further processing. XPath is a very compact language with a syntax that resembles the path expressions well known from file systems. These path expressions, however, are generalized and therefore much more powerful than the rather simple path expressions in file systems. Because of its use in different XML technologies, XPath is one of the most important XML core technologies. With XPath 2.0, the language has been greatly extended; the new version of XPath is the foundation for XSLT 2.0 and XQuery. XPath 2.0 provides support for regular expression matching and typed expressions, and it contains language constructs for conditional and repeated evaluation.



Why XPath?

Outline (Why XPath?)

  1. Why XPath? [3]
  2. How XPath Works [3]
    1. The XPath Tree Model [2]
    2. XPath Evaluation [1]
  3. XPath Location Paths [9]
    1. XPath Node Tests [2]
    2. XPath Axes [4]
    3. Predicates [2]
  4. XPath Expressions [2]
  5. XPath Functions [2]
  6. Limitations of XPath 1.0 [1]
  7. XPath 2.0 [8]
    1. Conditional Expressions [2]
    2. Iterations [2]
    3. Quantified Expressions [1]
    4. Sequences [2]
  8. Applications [3]
  9. Conclusions [1]

(4) Selecting Parts of XML Documents



(5) Making Selection Reusable



(6) How XPath Evolved



How XPath Works

The XPath Tree Model

(9) Starting from the Infoset

  • XPath operates on an abstract data model
    • a tree derived from the Infoset
    • a simplification (another one!) of the underlying XML
  • The Infoset is turned into an XPath node tree
    • 11 infoset item types → 7 XPath node tree node types
    • character items are merged into text nodes
    • namespace declarations are no longer visible as attributes


(10) What is Not in the XPath Tree

  • The same things which are not in the Infoset
    • the order of attributes in a start tag
    • the types of quotes around attribute values
    • character references and entities (for example, &#252; or &uuml; for ü)
  • And some more …
    • namespace declarations are no longer visible as attributes
    • notations and unexpanded entity references
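The effect of these omissions can be observed with any parser that exposes a tree rather than the raw markup. The following is a minimal sketch using Python's standard xml.etree.ElementTree module, which is not an XPath node tree but discards the same lexical details; the two sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Two serializations that differ only in attribute order, quote style,
# and the use of a character reference instead of a literal character.
doc_a = ET.fromstring('<p xmlns:x="urn:example" lang="de" id=\'p1\'>M&#252;nchen</p>')
doc_b = ET.fromstring("<p id=\"p1\" lang='de' xmlns:x='urn:example'>M\u00fcnchen</p>")

same_text = doc_a.text == doc_b.text            # character reference is resolved
same_attributes = doc_a.attrib == doc_b.attrib  # order and quotes are not recorded
# The namespace declaration is consumed by the parser, not kept as an attribute.
xmlns_visible = any(name.startswith("xmlns") for name in doc_a.attrib)

print(same_text, same_attributes, xmlns_visible)
```

Both documents yield the same tree, so an expression evaluated against either one cannot tell them apart.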


XPath Evaluation

(12) Tree In / Selection Out

  • XPath evaluates an expression based on a tree
  • Where the tree comes from is out of XPath's scope
  • The result of the evaluation is a selection
    • //img[not(@alt)] → select all images which have no alt attribute
    • count(//img) → return the number of images
    • /descendant::img[3]/@src → return the third image's src URI
    • starts-with(/html/@lang, 'en') → test whether the document's language is English
  • Syntax errors may occur
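The four kinds of results listed above can be imitated in Python. This is only a sketch: the standard xml.etree.ElementTree module supports a limited subset of XPath (no count(), not(), or explicit axes), so the path selects the img elements and plain Python does the rest. The sample document is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

HTML = """<html lang="en-US"><body>
  <p><img src="a.png" alt="A"/><img src="b.png"/></p>
  <p><img src="c.png" alt="C"/></p>
</body></html>"""

root = ET.fromstring(HTML)        # the html element
images = root.findall(".//img")   # all img descendants, in document order

# //img[not(@alt)]: a set of nodes (represented here by their src values)
missing_alt = [img.get("src") for img in images if img.get("alt") is None]
# count(//img): a number
image_count = len(images)
# /descendant::img[3]/@src: the third img in the whole document
third_src = images[2].get("src") if len(images) >= 3 else None
# starts-with(/html/@lang, 'en'): a boolean
is_english = (root.get("lang") or "").startswith("en")

print(missing_alt, image_count, third_src, is_english)
```

Note that /descendant::img[3] is not the same as //img[3]: the latter selects every img that is the third img child of its parent, which matches nothing in this sample document.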


XPath Location Paths

(14) Location Path Structure



XPath Node Tests

(16) File System vs. XPath Paths

File System Path:    /    usr    local    apache    bin
# Selected Nodes:    1    1      1        1         1

XPath:               /    html   body     table     thead    tr
# Selected Nodes:    1    1      1        6         4        12
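The growing node counts can be reproduced on a small scale. The sketch below uses Python's standard xml.etree.ElementTree module and a made-up document with two tables; paths are written relative to the html root element because that is how this module evaluates them.

```python
import xml.etree.ElementTree as ET

HTML = """<html><body>
  <table><thead><tr/><tr/></thead></table>
  <table><thead><tr/></thead></table>
</body></html>"""

root = ET.fromstring(HTML)  # the html element

# Each additional step is evaluated for every node selected so far,
# so a single path can fan out to many nodes.
steps = ["body", "body/table", "body/table/thead", "body/table/thead/tr"]
counts = {path: len(root.findall(path)) for path in steps}

for path, count in counts.items():
    print(path, count)
```

Unlike a file system path, where every step names exactly one directory entry, each XPath step here matches all children with the given name.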


(17) Tests for Nodes

  • Name tests
    • testing for a particular name (elements/attributes): /html/head/title
    • wildcards (testing for any name): /html/head/*
  • Node type tests
    • text nodes: text()
    • comment nodes: comment()
    • any nodes: node()
  • Processing instruction tests
    • any PI: processing-instruction()
    • specific PI: processing-instruction("xml-stylesheet")


XPath Axes

(19) Where Do You Want to Go Today?

  • File system paths are one direction only
    • always one level down in the file system hierarchy
    • . and .. are clever directory shortcuts
    • other directions supported by tools (e.g., find)
  • XPath allows steps in different directions
    • the default direction is child
    • other directions are explicitly specified: descendant::a


(20) Axis Peculiarities

  • Attributes and Namespaces are not the children of elements, but …
  • … elements are their attributes' parent!
    • very counter-intuitive
    • very convenient
  • Attributes and Namespaces are always leaves in the node tree
  • Attribute nodes have the attribute value as their value
  • Namespace nodes have the namespace name (i.e., a URI) as their value
  • Namespace nodes exist because of namespace declarations
    • in the XPath node tree, only the namespace nodes are visible
    • the namespace declaration attributes (xmlns) are invisible
    • one namespace declaration potentially creates many namespace nodes


(21) Axes

[Figure: XPath Axes]

(22) Putting it all Together

  • XPath location paths use a simple syntax
    • sequence of location steps, separated by /
  • Each location step uses a simple structure (preceding::p[@class="warning"])
    1. an axis followed by :: (if no axis is given, the default axis child is used)
    2. a node test [Tests for Nodes (1)]
    3. 0-n Predicates [Predicates (1)] enclosed in []
  • Location paths can be abbreviated
    • child:: can be omitted (default axis)
    • attribute:: can be written as @
    • . is an abbreviation for self::node()
    • .. is an abbreviation for parent::node()
    • // is an abbreviation for /descendant-or-self::node()/


Predicates

(24) Location Step Filters

  • Predicates are filters for each location step
    • there can be any number of filters (0-n)
    • each filter is applied to each selected node individually
  • Each predicate is an XPath expression and is evaluated as a boolean
    • the context of this evaluation is the node for which the filter is evaluated
    • if the result is a number, it is compared with the position() function (/descendant::a[5])
  • Predicates can only narrow the set of selected nodes, never extend it
    • as corner cases, the set of selected nodes does not change or becomes empty
    • predicates are used in the majority of non-trivial XPath location paths
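A few simple predicate forms can be tried with Python's standard xml.etree.ElementTree module, as sketched below. The module supports only a small set of predicates (attribute tests, child element tests, and positions), and the sample list is an assumption.

```python
import xml.etree.ElementTree as ET

DOC = """<ul>
  <li id="i1"><a href="x"/></li>
  <li id="i2"><a name="n"/></li>
  <li id="i3"><a href="y"/></li>
</ul>"""

root = ET.fromstring(DOC)

with_href = root.findall(".//a[@href]")  # boolean predicate: the attribute exists
with_link = root.findall("li[a]")        # boolean predicate: a child element a exists
second = root.findall("li[2]")           # numeric predicate: position() = 2
last = root.findall("li[last()]")        # numeric predicate: position() = last()

print(len(with_href), len(with_link),
      [li.get("id") for li in second], [li.get("id") for li in last])
```

The predicate li[a] leaves the selection unchanged here, which is the corner case in which a filter removes nothing.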


(25) Location Path Processing

  • Location paths are processed in a very simple way
    1. start with a given context
    2. for each location step, repeat the following:
      a. based on the context and the axis, select the nodes on this axis
      b. reduce this selection to the nodes identified by the node test
      c. sequentially apply all filters to each of these nodes
      d. take the remaining node set as the context for the next location step
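The processing steps above can be sketched as a small Python function. This is a simplified model, not how any particular XPath engine is implemented: the names evaluate_step, child_axis, and descendant_axis are hypothetical, only element nodes and name tests are handled, predicates are plain Python callables receiving (node, position, size), and results are not re-sorted into document order.

```python
import xml.etree.ElementTree as ET

def child_axis(node):
    return list(node)

def descendant_axis(node):
    # iter() yields the node itself first, so it is skipped here
    return list(node.iter())[1:]

def evaluate_step(context_nodes, axis, name, predicates=()):
    """Evaluate one location step for every node of the current context."""
    selected = []
    for context in context_nodes:
        # select the nodes on the axis, then apply the node test
        candidates = [n for n in axis(context) if name == "*" or n.tag == name]
        # apply the filters one after the other; positions are recomputed each time
        for predicate in predicates:
            size = len(candidates)
            candidates = [n for pos, n in enumerate(candidates, start=1)
                          if predicate(n, pos, size)]
        # the result is a set: skip nodes that are already selected
        for n in candidates:
            if not any(n is s for s in selected):
                selected.append(n)
    return selected

DOC = """<html><body>
  <table><tr id="r1"/><tr id="r2"/></table>
  <table><tr id="r3"/><tr id="r4"/><tr id="r5"/></table>
</body></html>"""
root = ET.fromstring(DOC)

# /html/body/table/tr[2], starting from the html element as context
context = [root]
for step_name, step_predicates in [("body", []), ("table", []),
                                   ("tr", [lambda n, pos, size: pos == 2])]:
    context = evaluate_step(context, child_axis, step_name, step_predicates)
second_rows = [tr.get("id") for tr in context]

# /descendant::tr[3], also relative to the html element
third_row = [tr.get("id") for tr in
             evaluate_step([root], descendant_axis, "tr",
                           [lambda n, pos, size: pos == 3])]
print(second_rows, third_row)
```

Because the position is counted separately for each context node, tr[2] yields one row per table, while the single descendant step counts across the whole document.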


XPath Expressions

(27) Beyond Location Paths



(28) XPath Usages



XPath Functions

(30) Function Library



(31) Using Functions



Limitations of XPath 1.0

(33) XPath Selects



XPath 2.0

(35) Easier to Understand

<listing src="xlinked-class.xml" line="81-98"/>
string-join(
  tokenize(
    if ( exists(@encoding) )
      then unparsed-text($fileuri, @encoding)
      else unparsed-text($fileuri),
    '\r?\n'
  )[ (position() ge number(tokenize(current()/@line, '\-')[1])) and
     (position() le number(tokenize(current()/@line, '\-')[2])) ],
  '&#xa;'
)


Conditional Expressions

(37) Control Flow in XPath

  • In XPath 1.0 expressions, control flow is based on predicates
    • the results of location path steps are filtered by predicates
    • this can be used to emulate control flow
    • this technique is limited because it can only be applied to nodes
  • XPath 2.0 introduces conditional expressions
    • a condition is given which is interpreted as a boolean
    • based on the result, either the then or the else part is evaluated
    • the else part is mandatory and cannot be omitted
if ( … ) then … else …
if ( @sex eq 'm' ) then 'Sir' else 'Madam'
if ( @sex eq 'm' ) then 'Sir' else if ( @sex eq 'f' ) then 'Madam' else 'Whatever'


(38) Less XSLT

<names>
 <name>
  <first>Erik</first>
  <last>Wilde</last>
 </name>
 <name>
  <last>Hasan</last>
 </name>
</names>

first | last[not(../first)]

<xsl:variable name="name">
  <xsl:choose>
    <xsl:when test="first">
      <xsl:value-of select="first"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="last"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:variable>

if ( exists(first) ) then first else last
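For comparison, the same choice can be written as a conditional expression in a host language. The sketch below uses Python's standard xml.etree.ElementTree module on the names document shown above; the helper display_name is a hypothetical name introduced for illustration.

```python
import xml.etree.ElementTree as ET

NAMES = """<names>
 <name>
  <first>Erik</first>
  <last>Wilde</last>
 </name>
 <name>
  <last>Hasan</last>
 </name>
</names>"""

root = ET.fromstring(NAMES)

def display_name(name):
    """Mirror 'if ( exists(first) ) then first else last' for one name element."""
    first = name.find("first")
    # Compare with None explicitly: find() returns None when nothing matches,
    # and the truth value of an element should not be used as an existence test.
    chosen = first if first is not None else name.find("last")
    return chosen.text

labels = [display_name(name) for name in root.findall("name")]
print(labels)
```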


Iterations

(40) Repeating Expression Evaluation

  • Iteration repeatedly applies an expression to a sequence of items
    • the notion of Sequences [Sequences (1)] is central to this concept
    • this requires variables for binding and evaluation
  • Iterations clearly demonstrate the change in expressiveness
    • they introduce functionality which previously was limited to host languages
for $… in … return …
for $i in //name return $i/last
for $i in //name return if ( exists($i/first) ) then $i/first else $i/last


(41) Iterations vs. Location Paths

  • Every location path can be written using iterations
    /names/name/last
    for $i in /names return for $j in $i/name return $j/last
  • Iterations are a more generalized way of evaluation
    • path expressions work on nodes only
      for $i in 1 to 10 return $i
    • path expressions sort by document order and eliminate duplicates
      //last/../..
      for $i in //last return for $j in $i/.. return $j/..
    • location steps change the context, iterations use the variable for this purpose
  • Location paths are a useful syntax and method for tree navigation
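The difference in duplicate handling can be observed in Python, with a list comprehension standing in for the for expression. This is a sketch under assumptions: it uses the standard xml.etree.ElementTree module, whose path support removes duplicate parents much like an XPath path expression does, and a hand-built parent_of map because elements in this module carry no parent pointer.

```python
import xml.etree.ElementTree as ET

NAMES = """<names>
 <name><first>Erik</first><last>Wilde</last></name>
 <name><last>Hasan</last></name>
</names>"""

root = ET.fromstring(NAMES)
parent_of = {child: parent for parent in root.iter() for child in parent}

# Path version, corresponding to //last/../.. : the names element is selected once
path_result = root.findall(".//last/../..")

# Iteration version, corresponding to
#   for $i in //last return for $j in $i/.. return $j/..
# One result is returned per last element, so names appears twice.
loop_result = [parent_of[parent_of[last]] for last in root.iter("last")]

# Iterations are not limited to nodes: for $i in 1 to 10 return $i
numbers = [i for i in range(1, 11)]

print(len(path_result), len(loop_result), numbers)
```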


Quantified Expressions

(43) Testing Sequences

  • Testing whether some or all items of a sequence satisfy a condition
    • the notion of Sequences [Sequences (1)] is central to this concept
    • this requires variables for binding and evaluation
  • Quantifiers are well-known from query languages
    • some iterates over items and succeeds after the first success
    • every iterates over items and fails after the first failure
    • both constructs are good candidates for optimization
( some | every ) $… in … satisfies …
some $i in //*[@xlink:type='locator']/@xlink:href satisfies $i eq $query-uri
every $i in //li/@id satisfies //*[@xlink:type='locator'][@xlink:href=concat('#', $i)]
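The two quantifiers behave like Python's built-in any() and all(), which also stop at the first success or the first failure. The following sketch imitates both expressions with the standard xml.etree.ElementTree module; the sample document and the value of query_uri are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

XLINK = "{http://www.w3.org/1999/xlink}"  # namespace-qualified attribute prefix

DOC = """<doc xmlns:xlink="http://www.w3.org/1999/xlink">
  <ul><li id="a"/><li id="b"/></ul>
  <link xlink:type="locator" xlink:href="#a"/>
  <link xlink:type="locator" xlink:href="#b"/>
  <link xlink:type="resource" xlink:href="#c"/>
</doc>"""

root = ET.fromstring(DOC)

# //*[@xlink:type='locator']/@xlink:href
locator_hrefs = [e.get(XLINK + "href") for e in root.iter()
                 if e.get(XLINK + "type") == "locator"]

# some $i in ... satisfies $i eq $query-uri
query_uri = "#b"
some_match = any(href == query_uri for href in locator_hrefs)

# every $i in //li/@id satisfies (a locator pointing to '#' + $i exists)
li_ids = [li.get("id") for li in root.iter("li") if li.get("id") is not None]
every_li_linked = all("#" + li_id in locator_hrefs for li_id in li_ids)

print(some_match, every_li_linked)
```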


Sequences

(45) Major Changes

  • XPath 1.0 has a very simple data model
    1. node sets: //img[not(@alt)]
    2. number: count(//img)
    3. string: /descendant::img[3]/@src
    4. boolean: starts-with(/html/@lang, 'en')
  • XPath 2.0 needs a more powerful model for its advanced functionality
    • everything in XPath 2.0 is a sequence
    • sequences can contain a mix of items of various types
    • sequences cannot be nested (there are no sequences of sequences)
every $i in ( 11, 22, 33, 'string' ) satisfies string(number($i)) ne 'NaN'
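The expression above evaluates to false, because number('string') is NaN. A rough Python analogue is sketched below; xpath_number is a hypothetical helper that only approximates number(), since Python's float() accepts some inputs (such as 'inf') that XPath treats differently.

```python
import math

def xpath_number(item):
    """Rough stand-in for XPath number(): NaN when the item is not numeric."""
    try:
        return float(item)
    except (TypeError, ValueError):
        return math.nan

# A sequence may mix items of different types
items = (11, 22, 33, "string")

# every $i in ( 11, 22, 33, 'string' ) satisfies string(number($i)) ne 'NaN'
every_numeric = all(not math.isnan(xpath_number(i)) for i in items)
print(every_numeric)
```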


(46) Divide and Conquer

  • Sequences are part of the XDM [http://dret.net/lectures/xml-fall08/xdm]
    • data models are separate entities from evaluation languages
    • a data model can be reused in different evaluation languages
  • XDM is far more complex than its predecessor, the Infoset
    • XSD datatypes have been integrated into the data model
    • Sequences allow more complex structures to exist
  • Understanding the data model is key to understanding the language
    • for simple XPaths, the mental model of XPath 1.0 works
    • more advanced XPaths can only be understood with an understanding of XDM


Applications

(48) Standalone

for $i in ( 11, 22, 33, 'string' ) return ($i, number($i))
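Since sequences cannot be nested, each ($i, number($i)) pair returned above is merged into one flat result of eight items. The Python sketch below imitates this with a nested comprehension; to_number is a hypothetical helper that only approximates number().

```python
import math

def to_number(item):
    """Approximate number(): a float, or NaN for non-numeric items."""
    try:
        return float(item)
    except (TypeError, ValueError):
        return math.nan

items = (11, 22, 33, "string")

# for $i in ( 11, 22, 33, 'string' ) return ($i, number($i))
# The inner pair is flattened into the result instead of being kept as a nested item.
result = [value for i in items for value in (i, to_number(i))]
print(len(result), result[:7])
```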


(49) XQuery

declare variable $firstName external;
<videos featuring="{$firstName}"> {
  let $doc := .
  for $v in $doc//video, $a in $doc//actors/actor
  where ends-with($a, $firstName) and $v/actorRef = $a/@id
  order by $v/year
  return
    <video year="{$v/year}"> { $v/title } </video> }
</videos>


(50) XSLT 2.0



Conclusions

(52) Easy Transition


