XML Path Language (XPath)

XML Foundations [./]
Fall 2011 — INFO 242 (CCN 42596)

Erik Wilde, UC Berkeley School of Information
2011-10-04

This work is licensed under a CC Attribution 3.0 Unported License [http://creativecommons.org/licenses/by/3.0/]

(2) Abstract

XML structures data into a rather small number of different constructs, most notably elements and attributes. The XML Path Language (XPath) defines a way to select parts of XML documents so that they can be used for further processing. XPath's primary use is in XSL Transformations (XSLT) and XQuery, but other XML technologies use it as well, e.g., XSD. XPath is a compact language with a syntax that resembles the path expressions familiar from file systems. These path expressions, however, are generalized and therefore more powerful than the rather simple path expressions in file systems. Because of its use in different XML technologies, XPath is one of the most important XML core technologies.



Why XPath?

Outline (Why XPath?)

  1. Why XPath? [3]
  2. How XPath Works [3]
    1. The XPath Tree Model [2]
    2. XPath Evaluation [1]
  3. XPath Location Paths [9]
    1. XPath Node Tests [2]
    2. XPath Axes [4]
    3. Predicates [2]
  4. XPath Expressions [2]
  5. XPath Functions [2]
  6. Conclusions [4]

(4) Selecting Parts of XML Documents



(5) Making Selection Reusable



(6) How XPath Evolved



How XPath Works

The XPath Tree Model

(9) Starting from the Infoset

  • XPath operates on an abstract data model
    • a tree derived from the XML Information Set (Infoset)
    • a simplification of the underlying XML
  • The Infoset is turned into an XPath node tree
    • 11 infoset item types → 7 XPath node tree node types
    • character items are merged into text nodes
    • namespace declarations are no longer visible as attributes
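
The simplifications above can be observed with any tree-building parser. The following is a minimal sketch using Python's xml.etree.ElementTree. ElementTree's tree is not the XPath node tree, and the sample document, the ex prefix, and the variable names are assumptions made for illustration, but the parser makes comparable simplifications.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: one namespace declaration, one ordinary
# attribute, and character data that is partly wrapped in a CDATA section.
doc = ET.fromstring(
    '<doc xmlns:ex="http://example.org/ns" id="d1">'
    'one<![CDATA[ two]]> three'
    '</doc>'
)

# The xmlns:ex declaration does not show up among the attributes.
attributes = dict(doc.attrib)

# Plain characters and the CDATA section end up in one run of text.
text = doc.text

print(attributes)
print(text)
```

Only the id attribute remains visible as an attribute, and the character data is reported as a single string.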


(10) What is Not in the XPath Tree

  • The same things which are not in the Infoset
    • the order of attributes in a start tag
    • the types of quotes around attribute values
    • character references and entity references (e.g., &#252; vs. &uuml; for ü)
  • And some more …
    • namespace declarations are no longer visible as attributes
    • notations and unexpanded entity references
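
A short sketch of what this means in practice, again using Python's xml.etree.ElementTree as a stand-in for an XPath node tree (an assumption; the two sample serializations are made up). Two documents that differ only in attribute order, quote style, and the use of a character reference produce the same tree content.

```python
import xml.etree.ElementTree as ET

# Same element, serialized in two different ways (hypothetical samples).
first = ET.fromstring('<p class="warning" id=\'p1\'>M&#252;ller</p>')
second = ET.fromstring('<p id="p1" class="warning">M\u00fcller</p>')

# Attribute order and quote style are gone after parsing.
same_attributes = first.attrib == second.attrib

# The character reference has been replaced by the character itself.
same_text = first.text == second.text

print(same_attributes, same_text)
```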


XPath Evaluation

(12) Tree In / Selection Out

  • XPath evaluates an expression based on a tree
  • Where the tree comes from is out of XPath's scope
  • The result of the evaluation is a selection
    • //img[not(@alt)] → select all images which have no alt attribute
    • count(//img) → return the number of images
    • /descendant::img[3]/@src → return the third image's src URI
    • starts-with(/html/@lang, 'en') → test whether the document's language is English
  • Syntax errors may occur
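
The four kinds of results above (node set, number, string value, boolean) can be sketched in Python. Python's standard xml.etree.ElementTree implements only a small subset of XPath and has no count(), not(), or starts-with(), so this sketch emulates each expression with plain Python. The sample document and all variable names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document with three images, one of them without alt.
html = ET.fromstring(
    '<html lang="en-US"><body>'
    '<img src="a.png" alt="A"/>'
    '<p><img src="b.png"/></p>'
    '<img src="c.png" alt="C"/>'
    '</body></html>'
)

images = list(html.iter("img"))  # all img elements in document order

# //img[not(@alt)]  -> a set of nodes
without_alt = [img for img in images if "alt" not in img.attrib]

# count(//img)  -> a number
image_count = len(images)

# /descendant::img[3]/@src  -> an attribute (here: its string value)
third_src = images[2].get("src")

# starts-with(/html/@lang, 'en')  -> a boolean
is_english = html.get("lang", "").startswith("en")

print([img.get("src") for img in without_alt], image_count, third_src, is_english)
```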


XPath Location Paths

(14) Location Path Structure



XPath Node Tests

(16) File System vs. XPath Paths

File System Path:   /     usr    local   apache   bin
# Selected Nodes:   1  →  1      1       1        1
XPath:              /     html   body    table    thead   tr
# Selected Nodes:   1  →  1      1       6        4       12
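
The difference can be made concrete by counting the nodes selected after each step. This is a minimal sketch with Python's xml.etree.ElementTree, which supports simple child paths. The sample document is an assumption and is smaller than the one behind the numbers above.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two tables, with two and one header rows.
html = ET.fromstring(
    '<html><body>'
    '<table><thead><tr/><tr/></thead></table>'
    '<table><thead><tr/></thead></table>'
    '</body></html>'
)

# Each path adds one more child step, evaluated relative to the html element.
paths = ["body", "body/table", "body/table/thead", "body/table/thead/tr"]
counts = [len(html.findall(path)) for path in paths]

for path, count in zip(paths, counts):
    print(path, count)
```

Unlike a file system path, where each step names exactly one directory, a single XPath step can select several nodes, and the next step starts from all of them.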


(17) Tests for Nodes

  • Name tests
    • testing for a particular name (elements/attributes): /html/head/title
    • wildcards (testing for any name): /html/head/*
  • Node type tests
    • text nodes: text()
    • comment nodes: comment()
    • any nodes: node()
  • Processing instruction tests
    • any PI: processing-instruction()
    • specific PI: processing-instruction("xml-stylesheet")


XPath Axes

(19) Where Do You Want to Go Today?

  • File system paths are one direction only
    • always one level down in the file system hierarchy
    • . and .. are clever directory shortcuts
    • other directions supported by tools (e.g., find)
  • XPath allows steps in different directions
    • the default direction is child
    • other directions are explicitly specified: descendant::a


(20) Axis Peculiarities

  • Attributes and Namespaces are not the children of elements, but …
  • … elements are their attributes' parents!
    • counter-intuitive but very convenient
  • Attributes and Namespaces are always leaves in the node tree
  • Attribute nodes have the attribute value as their value
  • Namespace nodes have the namespace name (i.e., a URI) as their value
  • Namespace nodes exist because of namespace declarations
    • in the XPath node tree, only the namespace nodes are visible
    • the namespace declaration attributes (xmlns) are invisible
    • one namespace declaration potentially creates many namespace nodes


(21) Axes

XPath Axes

(22) Putting it all Together

  • XPath location paths use a simple syntax
    • sequence of location steps, separated by /
  • Each location step uses a simple structure (preceding::p[@class="warning"])
    1. an axis followed by :: (if no axis is given, the default axis child is used)
    2. a node test (see Tests for Nodes)
    3. zero or more predicates (see Predicates), each enclosed in []
  • Location paths can be abbreviated
    • child:: can be omitted (default axis)
    • attribute:: can be written as @
    • . is an abbreviation for self::node()
    • .. is an abbreviation for parent::node()
    • // is an abbreviation for /descendant-or-self::node()/


Predicates

(24) Location Step Filters

  • Predicates are filters for each location step
    • there can be any number of filters (0-n)
    • each filter is applied to each selected node individually
  • Each predicate is an XPath expression that is evaluated as a boolean
    • the context of this evaluation is the node for which the filter is evaluated
    • if the result is a number, it is compared with the result of the position() function (/descendant::a[5])
  • Predicates never enlarge the set of selected nodes
    • as corner cases, the set of selected nodes does not change or becomes empty
    • predicates are used in the majority of non-trivial XPath location paths


(25) Location Path Processing

  • Location paths are processed in a very simple way
    1. start with a given context
    2. for each location step, repeat the following steps:
      1. based on the context and the axis, select the nodes on this axis
      2. reduce this selection to the nodes identified by the node test
      3. sequentially apply all filters to each of these nodes
      4. take the remaining node set as the context for the next location step
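
The processing steps above can be sketched as a tiny evaluator over an ElementTree tree. Everything here is an assumption made for illustration: the function evaluate, the (axis, name, predicates) step format, and the restriction to the self, child, and descendant axes with name tests only. It is not how any particular XPath engine is implemented.

```python
import xml.etree.ElementTree as ET

# Supported axes: each maps a context node to the nodes on that axis.
AXES = {
    "self": lambda n: [n],
    "child": lambda n: list(n),
    "descendant": lambda n: [d for d in n.iter() if d is not n],
}


def evaluate(context, steps):
    """Evaluate location steps given as (axis, name, predicates) tuples."""
    for axis, name, predicates in steps:
        next_context = []
        for node in context:
            # select the nodes on this axis
            selected = AXES[axis](node)
            # reduce the selection with the node test (name or *)
            selected = [n for n in selected if name == "*" or n.tag == name]
            # apply all predicates, one after the other
            for predicate in predicates:
                selected = [
                    n for position, n in enumerate(selected, start=1)
                    if predicate(n, position)
                ]
            # collect the results without duplicates (order of first
            # appearance, which is not always document order)
            for n in selected:
                if all(n is not seen for seen in next_context):
                    next_context.append(n)
        # the remaining nodes are the context for the next step
        context = next_context
    return context


# Hypothetical sample table with two rows.
table = ET.fromstring(
    '<table><tr><td>1</td><td>2</td></tr><tr><td>3</td></tr></table>'
)

# child::tr/child::td[1] -> the first cell of every row
first_cells = evaluate(
    [table],
    [("child", "tr", []), ("child", "td", [lambda n, position: position == 1])],
)
print([td.text for td in first_cells])
```

Note that the predicate is applied per context node, so td[1] selects the first cell of each row rather than the first cell overall.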


XPath Expressions

(27) Beyond Location Paths



(28) XPath Usages



XPath Functions

(30) Function Library



(31) Using Functions



Conclusions

(33) XPath is Important



(34) Limitations of XPath



(35) Getting Started



(36) Some Exercises


