XML Path Language (XPath) 2.0

XML Foundations [./]
Fall 2013 — INFO 242 (CCN 41613)

Erik Wilde, UC Berkeley School of Information
2013-09-18

This work is licensed under a CC Attribution 3.0 Unported License [http://creativecommons.org/licenses/by/3.0/]

Contents

E. Wilde: XML Path Language (XPath) 2.0

(2) Abstract

The XML Path Language (XPath) is one of the most useful and frequently used languages in the area of XML technologies. Version 1.0 is used in technologies such as XSLT, XSD, DOM, and XML tools. XPath 2.0 greatly extends the language and is the foundation for XSLT 2.0 and XQuery. It supports regular expression matching and typed expressions, and it contains language constructs for conditional and repeated evaluation.



Why XPath?

Outline (Why XPath?)

  1. Why XPath? [3]
  2. How XPath Works [3]
    1. The XPath Tree Model [2]
    2. XPath Evaluation [1]
  3. XPath 1.0 Revisited [3]
  4. Ease of Use [6]
    1. Conditional Expressions [2]
    2. Iterations [2]
    3. Quantified Expressions [1]
  5. Sequences [2]
  6. Applications [3]
  7. Conclusions [1]

(4) Selecting Parts of XML Documents



(5) Making Selection Reusable



(6) How XPath Evolved



How XPath Works

The XPath Tree Model

(9) Starting from the Infoset

  • XPath operates on an abstract data model [XML Path Language (XPath); The XPath Tree Model (1)]
  • The Infoset is turned into an XPath node tree
    • 11 infoset item types → 7 XPath node tree node types
    • character items are merged into text nodes
    • namespace declarations are no longer visible as attributes


(10) What is Not in the XPath Tree

  • The same things which are not in the Infoset [XML Varia; What is Not in the Infoset (1)]
    • the order of attributes in a start tag
    • the types of quotes around attribute values
    • character references and entities (ü / &#252; / &uuml;)
  • And some more …
    • namespace declarations are no longer visible as attributes
    • notations and unexpanded entity references
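
The loss of lexical detail can be illustrated with a small sketch in Python. It uses the standard library's xml.etree.ElementTree, which is not an implementation of the XPath data model but discards the same details when parsing. The two sample documents and the namespace URI are assumptions made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Two serializations that differ only in quote style, character
# references vs. literal characters, and a namespace declaration.
A = '<p xmlns:x="http://example.org/x" lang="de">M&#252;nchen &amp; Z&#xFC;rich</p>'
B = "<p lang='de'>München &amp; Zürich</p>"

a = ET.fromstring(A)
b = ET.fromstring(B)

# Character references are resolved and merged into one text value.
text_a = a.text

# The quote style is gone, and the xmlns:x declaration is not
# reported as an attribute.
attrs_a = dict(a.attrib)

# After parsing, both serializations yield the same element content.
same_tree = (a.tag, a.text, dict(a.attrib)) == (b.tag, b.text, dict(b.attrib))

print(text_a, attrs_a, same_tree)
```

An expression evaluated against such a tree therefore cannot distinguish the two serializations.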


XPath Evaluation

(12) Tree In / Selection Out

  • XPath evaluates an expression based on a tree
  • Where the tree comes from is out of XPath's scope
  • The result of the evaluation is a selection
    • //img[not(@alt)] → select all images which have no alt attribute
    • count(//img) → return the number of images
    • /descendant::img[3]/@src → return the third image's src URI
    • starts-with(/html/@lang, 'en') → test whether the document's language is English
  • Syntax errors may occur
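
The four expressions above return different kinds of results: a set of nodes, a number, an attribute value, and a boolean. The following Python sketch emulates them on a small made-up HTML-like document. ElementTree supports only a subset of XPath 1.0 (for example, neither not() nor count()), so the expressions are re-expressed in plain Python rather than evaluated as XPath; the sample markup is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical input tree; where it comes from is outside XPath's scope.
HTML = """<html lang="en-US"><body>
<img src="a.png" alt="A"/><div><img src="b.png"/></div><img src="c.png"/>
</body></html>"""
root = ET.fromstring(HTML)

# iter() walks the tree in document order, comparable to //img.
images = list(root.iter("img"))

# //img[not(@alt)]  -> the images without an alt attribute (nodes)
no_alt = [img for img in images if "alt" not in img.attrib]

# count(//img)  -> a number
image_count = len(images)

# /descendant::img[3]/@src  -> the third image's src value
third_src = images[2].get("src")

# starts-with(/html/@lang, 'en')  -> a boolean
is_english = root.get("lang", "").startswith("en")

print([img.get("src") for img in no_alt], image_count, third_src, is_english)
```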


XPath 1.0 Revisited

(14) Source Document

 <body>
  <div class="header">
   <h1><a href="http://dret.net/lectures/publishing-spring07/">Web-Based Publishing</a> – Class List</h1>
   <h2><a href="http://www.berkeley.edu/" title="UC Berkeley">UCB</a> <a href="http://ischool.berkeley.edu/" title="School of Information">iSchool</a> – Spring 2007</h2>
  </div>
  <ul>
   <li id="jeff">Jeff Decker</li>
   <li id="michael">Michael Lee</li>
   <li id="yiming">Yiming Liu</li>
   <li id="matty">Matthew Ochmanek</li>
   <li id="igor">Igor Pesenson</li>
   <li id="ryan">Ryan Shaw</li>
   <li id="libby">Libby Smith</li>
   <li id="john">John Ward</li>
   <li id="lois">Lois Wei</li>
   <li id="dret">Erik Wilde</li>
  </ul>
 </body>


(15) XPath Expressions



(16) Axes

xpath-axes.png

Ease of Use

(18) Easier to Understand

<listing src="xlinked-class.xml" line="81-98"/>
string-join(
  tokenize(
    if ( exists(@encoding) )
      then unparsed-text($fileuri, @encoding)
      else unparsed-text($fileuri),
    '\r?\n'
  )[ (position() ge number(tokenize(current()/@line, '\-')[1]))
     and (position() le number(tokenize(current()/@line, '\-')[2])) ],
  '&#xa;'
)
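
Read step by step, the expression above splits a text file into lines, keeps the lines whose position lies in the range given by the line attribute, and joins them again with newlines. A rough Python sketch of the same steps follows; the function name select_lines and the sample text are hypothetical, and reading the file with an optional encoding (the unparsed-text part) is left out.

```python
import re


def select_lines(text: str, line_range: str) -> str:
    """Return lines first..last (1-based, inclusive), joined by newlines."""
    # tokenize(current()/@line, '\-') -> the two bounds of the range
    first, last = (int(part) for part in line_range.split("-"))
    # tokenize(..., '\r?\n') -> split on Unix or Windows line ends
    lines = re.split(r"\r?\n", text)
    # [position() ge first and position() le last], then string-join
    return "\n".join(lines[first - 1:last])


sample = "one\r\ntwo\nthree\nfour\n"
print(select_lines(sample, "2-3"))
```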


Conditional Expressions

(20) Control Flow in XPath

  • In XPath 1.0, control flow within expressions is based on predicates
    • the results of location path steps are filtered by predicates
    • this can be used to emulate control flow
    • this technique is limited because it can only be applied to nodes
  • XPath 2.0 introduces conditional expressions
    • a condition is given which is interpreted as a boolean
    • based on the result, either the then or the else part is evaluated
    • the else part may not be omitted
if ( … ) then … else …
if ( @sex eq 'm' ) then 'Sir' else 'Madam'
if ( @sex eq 'm' ) then 'Sir' else if ( @sex eq 'f' ) then 'Madam' else 'Whatever'
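
For readers coming from general-purpose languages, the chained conditional above behaves much like Python's conditional expression, which also cannot omit its else branch. The sketch below is only an analogy; the people document and the function name salutation are made up. When the sex attribute is missing, the XPath comparison yields an empty sequence, which counts as false, so the final else branch is taken in both versions.

```python
import xml.etree.ElementTree as ET


def salutation(person: ET.Element) -> str:
    sex = person.get("sex")  # None when the attribute is missing
    # if (@sex eq 'm') then 'Sir' else if (@sex eq 'f') then 'Madam' else 'Whatever'
    return "Sir" if sex == "m" else "Madam" if sex == "f" else "Whatever"


# Hypothetical input: one element per case, the last without @sex.
people = ET.fromstring('<people><p sex="m"/><p sex="f"/><p/></people>')
greetings = [salutation(p) for p in people]
print(greetings)
```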


(21) Less XSLT

<names>
 <name>
  <first>Erik</first>
  <last>Wilde</last>
 </name>
 <name>
  <last>Hasan</last>
 </name>
</names>
first | last[not(../first)]
<xsl:variable name="name">
	<xsl:choose>
		<xsl:when test="first">
			<xsl:value-of select="first"/>
		</xsl:when>
		<xsl:otherwise>
			<xsl:value-of select="last"/>
		</xsl:otherwise>
	</xsl:choose>
</xsl:variable>
if ( exists(first) ) then first else last


Iterations

(23) Repeating Expression Evaluation

  • Iteration repeatedly applies an expression to a sequence of items
    • the notion of Sequences [Sequences (1)] is central to this concept
    • this requires variables for binding and evaluation
  • Iterations clearly demonstrate the change in expressiveness
    • they introduce functionality which previously was limited to host languages
for $… in … return …
for $i in //name return $i/last
for $i in //name return if ( exists($i/first) ) then $i/first else $i/last
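
The two for expressions above can be mirrored with Python list comprehensions, which likewise bind a variable to each item of a sequence and evaluate an expression per item. This is a sketch of the semantics using ElementTree on the names document from the "Less XSLT" slide, not an XPath 2.0 evaluation; the helper name first_or_last is hypothetical.

```python
import xml.etree.ElementTree as ET

names = ET.fromstring(
    "<names>"
    "<name><first>Erik</first><last>Wilde</last></name>"
    "<name><last>Hasan</last></name>"
    "</names>"
)

# for $i in //name return $i/last
lasts = [last.text for i in names.iter("name") for last in i.findall("last")]


# for $i in //name return if ( exists($i/first) ) then $i/first else $i/last
def first_or_last(i: ET.Element) -> ET.Element:
    first = i.find("first")
    # Compare with None explicitly: truth-testing an Element is
    # unreliable because it has reflected whether it has children.
    return first if first is not None else i.find("last")


preferred = [first_or_last(i).text for i in names.iter("name")]
print(lasts, preferred)
```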


(24) Iterations vs. Location Paths

  • Every location path can be written using iterations
    /names/name/last
    for $i in /names return for $j in $i/name return $j/last
  • Iterations are a more generalized way of evaluation
    • path expressions work on nodes only
      for $i in 1 to 10 return $i
    • path expressions sort by document order and eliminate duplicates
      //last/../..
      for $i in //last return for $j in $i/.. return $j/..
    • location steps change the context, iterations use the variable for this purpose
  • Location paths are a useful syntax and method for tree navigation
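
The difference in duplicate handling can be made concrete with a Python sketch over the names document from the "Less XSLT" slide. ElementTree keeps no parent pointers, so a parent map is built first; the variable names are hypothetical, and the code emulates the two expressions rather than evaluating them.

```python
import xml.etree.ElementTree as ET

names = ET.fromstring(
    "<names>"
    "<name><first>Erik</first><last>Wilde</last></name>"
    "<name><last>Hasan</last></name>"
    "</names>"
)

# Map every element to its parent, to emulate the ".." step.
parent = {child: node for node in names.iter() for child in node}

# for $i in //last return for $j in $i/.. return $j/..
# One result per <last>, so the <names> element appears twice.
iterated = [parent[parent[last]] for last in names.iter("last")]

# //last/../..
# Same navigation, but duplicates are removed and the result is
# sorted in document order.
order = {node: pos for pos, node in enumerate(names.iter())}
path_result = sorted(set(iterated), key=order.get)

# for $i in 1 to 10 return $i  (iteration over non-node items)
numbers = list(range(1, 11))

print(len(iterated), len(path_result), numbers)
```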


Quantified Expressions

(26) Testing Sequences

  • Testing whether some or all items of a sequence satisfy a condition
    • the notion of Sequences [Sequences (1)] is central to this concept
    • this requires variables for binding and evaluation
  • Quantifiers are well-known from query languages
    • some iterates over items and succeeds after the first success
    • every iterates over items and fails after the first failure
    • both constructs are good candidates for optimization
( some | every ) $… in … satisfies …
some $i in //*[@xlink:type='locator']/@xlink:href satisfies $i eq $query-uri
every $i in //li/@id satisfies //*[@xlink:type='locator'][@xlink:href=concat('#', $i)]
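
Python's any() and all() are close analogues of some and every, including the early exit on the first success or failure. The sketch below assumes the attribute values have already been extracted into plain lists; the lists and the query URI are made-up values for illustration.

```python
hrefs = ["#jeff", "#michael", "#dret"]  # hypothetical xlink:href values
ids = ["jeff", "michael", "dret"]       # some of the li/@id values
query_uri = "#dret"                     # stands in for $query-uri

# some $i in ... satisfies $i eq $query-uri
some_match = any(href == query_uri for href in hrefs)

# every $i in //li/@id satisfies (a locator with href '#' + $i exists)
every_linked = all("#" + i in hrefs for i in ids)

# One id without a locator makes the "every" test fail.
every_linked_with_ryan = all("#" + i in hrefs for i in ids + ["ryan"])

# Over an empty sequence, "some" is false and "every" is true.
some_empty, every_empty = any([]), all([])

print(some_match, every_linked, every_linked_with_ryan, some_empty, every_empty)
```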


Sequences

(28) Major Changes

every $i in ( 11, 22, 33, 'string' ) satisfies string(number($i)) ne 'NaN'
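
The expression above tests a sequence that mixes integers and a string, and it evaluates to false because number('string') is NaN. A rough Python equivalent is sketched below. The helper number is a hypothetical stand-in for the XPath number() function, and Python's float() accepts a somewhat different set of lexical forms than XPath does.

```python
import math


def number(value) -> float:
    """Rough stand-in for XPath number(): NaN instead of an error."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan


items = (11, 22, 33, "string")

# every $i in ( 11, 22, 33, 'string' ) satisfies string(number($i)) ne 'NaN'
all_numeric = all(not math.isnan(number(i)) for i in items)
print(all_numeric)
```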


(29) Divide and Conquer



Applications

(31) Standalone

for $i in ( 11, 22, 33, 'string' ) return ($i, number($i))
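
Because XPath 2.0 sequences never nest, the pairs built in the return clause above are flattened into a single sequence of eight items. The Python sketch below makes the flattening explicit with itertools.chain, since Python would otherwise keep the pairs as nested tuples; to_number is a hypothetical approximation of the XPath number() function.

```python
import math
from itertools import chain


def to_number(value) -> float:
    """Hypothetical approximation of XPath number(): NaN on failure."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan


items = (11, 22, 33, "string")

# for $i in ( 11, 22, 33, 'string' ) return ($i, number($i))
# Each step yields a pair; chaining flattens the pairs into one list.
result = list(chain.from_iterable((i, to_number(i)) for i in items))
print(len(result), result)
```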


(32) XQuery

declare variable $firstName external;
<videos featuring="{$firstName}"> {
  let $doc := .
  for $v in $doc//video, $a in $doc//actors/actor
  where ends-with($a, $firstName) and $v/actorRef = $a/@id
  order by $v/year
  return
	<video year="{$v/year}"> { $v/title } </video> }
</videos>
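
To show what the query above computes, here is a sketch of the same join in Python with ElementTree. The input document is made up: only the element and attribute names (video, year, title, actorRef, actors, actor, id) are taken from the query, while the wrapper element and all data values are assumptions. The year values are sorted as strings here.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; the data values are invented for this sketch.
doc = ET.fromstring("""
<result>
  <videos>
    <video><title>B</title><year>2001</year><actorRef>a1</actorRef></video>
    <video><title>A</title><year>1999</year><actorRef>a1</actorRef><actorRef>a2</actorRef></video>
    <video><title>C</title><year>2000</year><actorRef>a2</actorRef></video>
  </videos>
  <actors>
    <actor id="a1">Doe, Jane</actor>
    <actor id="a2">Roe, Rick</actor>
  </actors>
</result>""")
first_name = "Jane"  # stands in for the external variable $firstName

matches = [
    v
    for v in doc.iter("video")                   # for $v in $doc//video
    for a in doc.findall(".//actors/actor")      # , $a in $doc//actors/actor
    if (a.text or "").endswith(first_name)       # where ends-with($a, $firstName)
    and a.get("id") in [r.text for r in v.findall("actorRef")]  # and $v/actorRef = $a/@id
]
matches.sort(key=lambda v: v.findtext("year"))   # order by $v/year

# Build the <videos> result element around the selected titles.
out = ET.Element("videos", featuring=first_name)
for v in matches:
    video = ET.SubElement(out, "video", year=v.findtext("year"))
    video.append(v.find("title"))

print(ET.tostring(out, encoding="unicode"))
```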


(33) XSLT 2.0



Conclusions

(35) Easy Transition


