XQuery 1.0 and XPath 2.0 Data Model (XDM)

Outline (XQuery 1.0 and XPath 2.0 Data Model (XDM))

XQuery 1.0 and XPath 2.0 Data Model (XDM)
1. Sets vs. Sequences
2. Comparisons
3. Working with Sequences
4. Working with Values
Readable Syntax
1. Conditional Expressions
2. Iterations
3. Quantified Expressions
Conclusions

Sets vs. Sequences

Sets vs. Sequences E. Wilde: XML Path Language (XPath) 2.0

(5) XPath 1.0 Sets

XPath 1.0 has a very simple data model of four types
1. node sets: //img[not(@alt)]
2. number: count(//img)
3. string: string(/descendant::img[3]/@src)
4. boolean: starts-with(/html/@lang, 'en')
When XPath 1.0 was created, the XML world was untyped
- XML documents contain content in text nodes and attribute values
- XPath introduced its humble world of three simple datatypes (number, string, boolean) alongside node sets
Dealing with types in XSLT 1.0 is very unpleasant
- all datatypes beyond the basic types must be implemented by hand
- all operations on these types must be implemented as well
- EXSLT [http://www.exslt.org/] collects modules for frequently used datatypes

(6) XPath 2.0 Sequences

XSD introduces the concept of typed data to the XML world
- one part of XSD is its ability to validate documents
- the other part of XSD is the fact that validation produces type annotations
Sequences are the XPath mechanism through which these types become visible
XPath 2.0 needs a more powerful model for its advanced functionality
- everything in XPath 2.0 is a sequence (of typed items)
- sequences can contain a mix of items of various types
- sequences cannot be nested (there are no sequences of sequences)
Sequences replace node sets, which no longer exist in XDM
- Sequences replace node-sets from XPath 1.0. In XPath 1.0, node-sets do not contain duplicates. In generalizing node-sets to sequences in XPath 2.0, duplicate removal is provided by functions on node sequences. (XDM [http://www.w3.org/TR/xpath-datamodel/#sequences])
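
The rule that sequences cannot be nested can be illustrated outside of XPath. The following is a minimal Python sketch, not part of the original slides: the helper name `flatten` is hypothetical, and Python lists merely stand in for XDM sequences.

```python
def flatten(*items):
    """Return one flat list; any nested list is spliced into its parent."""
    flat = []
    for item in items:
        if isinstance(item, list):
            flat.extend(flatten(*item))   # splice nested sequences recursively
        else:
            flat.append(item)
    return flat


# Mirrors the XPath expression (1, (2, 3), (), ((4)))
print(flatten(1, [2, 3], [], [[4]]))      # [1, 2, 3, 4]
# Items of different types can share one sequence.
print(flatten("a", [1, 2.5], [True]))     # ['a', 1, 2.5, True]
```

Python itself allows nested lists, so the flattening has to be done explicitly here; in XPath 2.0 it happens whenever sequences are combined.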

Comparisons

(8) General Comparisons

= != < <= > >=

XPath 1.0 only has these operators
- they are defined to work on any of the four datatypes
- node set comparisons are defined in a rather complex way [http://www.w3.org/TR/xpath#booleans]
- in particular, XPath 1.0 comparisons often involve type casting
XPath 2.0 introduces Value Comparisons for comparing atomic values
- they are introduced to provide a set of operators with fewer surprises
- the original XPath 1.0 operators are redefined to work on sequences
General comparisons can be expressed using Quantified Expressions
- one general comparison can stand for a large number of item comparisons (up to one per pair of items from the two operands)
```
$X = $Y
```
```
some $x in $X, $y in $Y satisfies $x eq $y
```
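
The existential reading of general comparisons can be modelled in a few lines of Python. This is a sketch under simplifying assumptions: `general_compare` is a hypothetical helper, Python lists stand in for sequences, and atomization and the XPath type conversion rules are ignored.

```python
import operator
from itertools import product


def general_compare(xs, ys, op=operator.eq):
    """True if at least one pair (x, y) satisfies op.

    In the worst case this tests len(xs) * len(ys) pairs.
    """
    return any(op(x, y) for x, y in product(xs, ys))


print(general_compare([1, 2], [2, 3]))               # True: the pair (2, 2) matches
print(general_compare([1, 2], [3, 4]))               # False: no pair matches
print(general_compare([1, 2], [3, 4], operator.lt))  # True: 1 < 3
```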

(9) Value Comparisons

eq ne lt le gt ge

These operators have been introduced by XPath 2.0
- they work on single values only
- they should be preferred unless an operand may be a sequence of more than one item
The value comparison operators also have built-in type conversion rules
- prior to anything else, both operands are atomized
- comparing with an empty sequence always yields an empty sequence
- comparing with a sequence with more than one item yields an error
- after that, the values are converted to a common type
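
The cardinality rules listed above can be sketched in Python as follows. The function `value_eq` is a hypothetical stand-in for `eq`; it assumes already atomized operands, represents the empty sequence as an empty list, and skips the conversion to a common type.

```python
def value_eq(xs, ys):
    """Model of 'eq' on atomized operands given as lists of zero or more items."""
    if not xs or not ys:
        return []                      # an empty operand yields the empty sequence
    if len(xs) > 1 or len(ys) > 1:
        # XPath raises a type error here; TypeError is this sketch's analogue
        raise TypeError("value comparison needs at most one item per operand")
    return [xs[0] == ys[0]]            # singleton sequence holding a boolean


print(value_eq([42], [42]))            # [True]
print(value_eq([], [42]))              # []
try:
    value_eq([1, 2], [1])
except TypeError as err:
    print("error:", err)
```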

(10) Node Comparisons

is << >>

Comparing nodes by identity or document order
- node identity is very cumbersome to test in XPath 1.0
- XQuery 1.0 makes support for some axes optional (the Full Axis Feature)
- some XQuery implementations therefore do not support preceding* and following*
$a is $b is true only if both variables identify the same node
- when processing documents, identity often is more relevant than equality
- much better than XPath 1.0's generate-id($a) = generate-id($b)
$a << $b is true if $a precedes $b in document order
- unlike the preceding axis, document order does not exclude containment: an ancestor precedes its descendants
- if the nodes are in different documents, the result is implementation-dependent (but stable)
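
Identity and document order can be mimicked with Python's standard `xml.etree.ElementTree`. This is only an analogy: Python's `is` plays the role of the XPath `is` operator, the hypothetical helper `precedes` plays the role of `<<`, and the sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<names>"
    "<name><last>Wilde</last></name>"
    "<name><last>Wilde</last></name>"
    "</names>"
)

# iter() walks the tree in document order, so the index can serve as a sort key.
order = {node: i for i, node in enumerate(doc.iter())}


def precedes(a, b):
    """Stand-in for $a << $b within a single document."""
    return order[a] < order[b]


first_name, second_name = doc.findall("name")
first_last = first_name.find("last")
second_last = second_name.find("last")

print(first_last.text == second_last.text)  # True: equal values
print(first_last is second_last)            # False: two different nodes
print(precedes(first_name, first_last))     # True: an ancestor precedes its descendant
print(precedes(first_last, second_name))    # True
```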

(11) Some Surprises

Sequences make some things more complicated than atomic values
$X = $X is not always true
- if $X is the empty sequence, there are no equal items
$X != 'test' and not($X = 'test') are not the same
- $X != 'test' is true if one item in $X is not equal to 'test'
- not($X = 'test') is true if no item in $X is equal to 'test'
- the classic case is optional parts: @mode != 'test' is false if there is no @mode!
- it is generally a good idea to avoid != and use not() and =
$X = $Y and $Y = $Z does not imply $X = $Z
- the reason is that sequences are compared pairwise (one general comparison is a set of item comparisons)
- (1, 2), (2, 3), and (3, 4) illustrate this behavior
- = only tests for partial equality (one item must be equal)
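
All three surprises can be reproduced with a small Python model of the existential semantics. The helpers `general_eq` and `general_ne` are hypothetical names, lists stand in for sequences, and type conversion is ignored.

```python
from itertools import product


def general_eq(xs, ys):
    """Model of $X = $Y: some pair of items is equal."""
    return any(x == y for x, y in product(xs, ys))


def general_ne(xs, ys):
    """Model of $X != $Y: some pair of items is unequal."""
    return any(x != y for x, y in product(xs, ys))


empty = []
print(general_eq(empty, empty))            # False: $X = $X fails for ()

# A missing @mode behaves like the empty sequence.
print(general_ne(empty, ["test"]))         # False: @mode != 'test'
print(not general_eq(empty, ["test"]))     # True:  not(@mode = 'test')

x, y, z = [1, 2], [2, 3], [3, 4]
print(general_eq(x, y), general_eq(y, z))  # True True
print(general_eq(x, z))                    # False: no transitivity
```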

Working with Sequences

(13) Testing Sequence Cardinality

Testing for empty sequences
```
empty(()) = true()
```
Testing for non-empty sequences
```
exists((1, 2, 3)) = true()
```
Cleaner code for conditional expressions
- good code should not rely on implicit type conversions
```
if ( exists(@email) ) then …
```
```
if ( empty(@email) ) then …
```

(14) Set Operations on Sequences

Merging two node sets (no duplicates, document order)
```
() | ()
```
Intersecting two node sets (no duplicates, document order)
```
() intersect ()
```
Subtracting two node sets (no duplicates, document order)
```
() except ()
```
Comparing sequences item by item for deep equality
```
deep-equal((1, 2, 3), (1, 3, 2)) ≡ false()
```
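
What "no duplicates, document order" means for the three node operators can be sketched in Python. Everything here is an assumption made for illustration: the `Node` class uses a document-order position as a stand-in for node identity, and `union`, `intersect`, and `except_` are hypothetical helper names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    order: int   # position in document order, used here as node identity
    name: str


def _normalize(nodes):
    """Drop duplicates and sort by document order."""
    return sorted(set(nodes), key=lambda n: n.order)


def union(a, b):
    return _normalize(a + b)


def intersect(a, b):
    return _normalize([n for n in a if n in b])


def except_(a, b):
    return _normalize([n for n in a if n not in b])


first, last, name = Node(2, "first"), Node(3, "last"), Node(1, "name")

print([n.name for n in union([last, first], [name, first])])      # ['name', 'first', 'last']
print([n.name for n in intersect([last, first], [name, first])])  # ['first']
print([n.name for n in except_([last, first], [name, first])])    # ['last']
```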

(15) Manipulating Sequences (I)

Concatenating sequences

((1, 2, 3), (4, 5, 6)) ≡ (1, 2, 3, 4, 5, 6)

Reversing sequences
```
reverse((1, 2, 3, 4)) ≡ (4, 3, 2, 1)
```
Finding items in sequences
```
index-of((1, 2, 3, 1), 1) ≡ (1, 4)
```

Cutting sub-sequences out of sequences

subsequence((1, 2, 3, 4, 5, 6, 7), 5, 2) ≡ (5, 6)

(16) Manipulating Sequences (II)

Inserting items into sequences

insert-before(("one", "two", "four"), 3, "three") ≡ ("one", "two", "three", "four")

Removing items from sequences

remove(("white", "white", "black", "white"), 3) ≡ ("white", "white", "white")

Removing duplicates from a sequence

distinct-values((1, 2, 3, 1, 2, 6, 7)) ≡ (1, 2, 3, 6, 7)

Help your optimizer! (the result order is implementation-dependent; the one shown is only one possibility)

unordered((1, 2, 3, 4, 5)) ≡ (3, 4, 1, 2, 5)
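
XPath positions are 1-based, which is easy to get wrong when moving between XPath and a host language. The following Python sketch re-implements the sequence functions from the last two slides for integer positions only; the function names are hypothetical translations of the XPath names, and `distinct_values` keeps first occurrences, which is just one of the orders XPath permits.

```python
def subsequence(seq, start, length=None):
    """Items at 1-based positions p with start <= p < start + length."""
    begin = max(start, 1)
    end = len(seq) + 1 if length is None else start + length
    return seq[begin - 1:max(end - 1, 0)]


def insert_before(seq, position, inserts):
    """Insert before the 1-based position; out-of-range positions are clamped."""
    position = min(max(position, 1), len(seq) + 1)
    return seq[:position - 1] + inserts + seq[position - 1:]


def remove(seq, position):
    """Drop the item at the 1-based position, if it exists."""
    if position < 1 or position > len(seq):
        return list(seq)
    return seq[:position - 1] + seq[position:]


def index_of(seq, value):
    """All 1-based positions at which value occurs."""
    return [i for i, item in enumerate(seq, start=1) if item == value]


def distinct_values(seq):
    """Remove duplicates, keeping the first occurrence of each value."""
    return list(dict.fromkeys(seq))


print(subsequence([1, 2, 3, 4, 5, 6, 7], 5, 2))                # [5, 6]
print(insert_before(["one", "two", "four"], 3, ["three"]))     # ['one', 'two', 'three', 'four']
print(remove(["white", "white", "black", "white"], 3))         # ['white', 'white', 'white']
print(index_of([1, 2, 3, 1], 1))                               # [1, 4]
print(distinct_values([1, 2, 3, 1, 2, 6, 7]))                  # [1, 2, 3, 6, 7]
```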

(17) Aggregating Sequences

Counting the number of items in a sequence
```
count((1, 2, 3, 4, 5, 6)) ≡ 6
```
Calculating the average (the types must be compatible)
```
avg((1, 2, 3, 4, 5, 6)) ≡ 3.5
```
Getting maximum or minimum values from a sequence (the types must be compatible)
```
max($seq) ge min($seq)
```
Calculating the sum of sequence items (the types must be compatible)
```
sum(1 to 42) ≡ 903
```

Working with Values

(19) Type Casting

Values often are untyped
- they may be part of a schema-less document
- they may be extracted as substring of some text value
- XSLT 2.0 allows reading text files (these texts are never typed)
For intermediate results, typed values may be advantageous
- certain operations are only possible on typed values
- code using typed values usually is more robust

XPath 2.0 has several ways to handle types and instances

42 instance of xs:integer

'2007-02-13' castable as xs:date

'2007-02-13' cast as xs:date

if ( $i castable as xs:… ) then $i cast as xs:… else ()
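
The "test before you cast" pattern of the last expression has a direct analogue in most languages. The Python sketch below models it for dates only; `castable_as_date` and `cast_or_empty` are hypothetical names, and `date.fromisoformat` is only an approximation of the `xs:date` lexical rules (it does not accept a timezone suffix, and newer Python versions accept additional ISO 8601 forms).

```python
from datetime import date


def castable_as_date(text):
    """Rough analogue of: $i castable as xs:date"""
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False


def cast_or_empty(text):
    """Analogue of: if ( castable ) then cast else ()  (empty list for ())."""
    return [date.fromisoformat(text)] if castable_as_date(text) else []


print(isinstance(42, int))            # True, compare: 42 instance of xs:integer
print(castable_as_date("2007-02-13")) # True
print(cast_or_empty("2007-02-13"))    # [datetime.date(2007, 2, 13)]
print(cast_or_empty("2007-02-30"))    # []: not a valid calendar date
```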

Readable Syntax

(21) Easier to Understand

XPath 2.0 provides better ways to write XPaths
- some constructs make existing XPaths easier to read and write
- some constructs allow things previously impossible in XPath
XPath usually is embedded in another language (XQuery, XSLT)
- even in XSLT 1.0, there was always a trade-off between XPath and XSLT
- with XPath 2.0, even more powerful XPaths can be written
Finding a good balance between XPath and the host language is an art
- very complex XPaths can become almost undecipherable
- there is no final answer; coding styles vary based on language preference

<listing src="xlinked-class.xml" line="81-98"/>

string-join(tokenize( if ( exists(@encoding) ) then unparsed-text($fileuri, @encoding) else unparsed-text($fileuri), '\r?\n')[(position() ge number(tokenize(current()/@line, '\-')[1])) and (position() le number(tokenize(current()/@line, '\-')[2]))], '&#xa;')

Conditional Expressions

(23) Control Flow in XPath

Control flow in XPath 1.0 expressions is based on predicates
- the results of location path steps are filtered by predicates
- this can be used to emulate control flow
- this technique is limited because it can only be applied to nodes
XPath 2.0 introduces conditional expressions
- a condition is given which is interpreted as a boolean
- based on the result, either the then or the else part is evaluated
- the else part may not be omitted

if ( … ) then … else …

if ( @sex eq 'm' ) then 'Sir' else 'Madam'

if ( @sex eq 'm' ) then 'Sir' else if ( @sex eq 'f' ) then 'Madam' else 'Whatever'

(24) Less XSLT

<names>
 <name>
  <first>Erik</first>
  <last>Wilde</last>
 </name>
 <name>
  <last>Hasan</last>
 </name>
</names>

names.xml

first | last[not(../first)]

<xsl:variable name="name">
	<xsl:choose>
		<xsl:when test="first">
			<xsl:value-of select="first"/>
		</xsl:when>
		<xsl:otherwise>
			<xsl:value-of select="last"/>
		</xsl:otherwise>
	</xsl:choose>
</xsl:variable>

if ( exists(first) ) then first else last

Iterations

(26) Repeating Expression Evaluation

Iteration repeatedly applies an expression to a sequence of items
- the notion of sequences is central to this concept
- this requires variables for binding and evaluation
Iterations clearly demonstrate the change in expressiveness
- they introduce functionality which previously was limited to host languages

for $… in … return …

for $i in //name return $i/last

for $i in //name return if ( exists($i/first) ) then $i/first else $i/last
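
For comparison, here is roughly what the last expression does, written in Python against the names.xml sample with the standard `xml.etree.ElementTree` module. It is a sketch rather than a translation: `display_name` is a hypothetical helper, and it returns text values where the XPath expression returns element nodes.

```python
import xml.etree.ElementTree as ET

NAMES_XML = """
<names>
 <name>
  <first>Erik</first>
  <last>Wilde</last>
 </name>
 <name>
  <last>Hasan</last>
 </name>
</names>
"""


def display_name(name):
    """if ( exists(first) ) then first else last, reduced to the text value."""
    first = name.find("first")
    chosen = first if first is not None else name.find("last")
    return chosen.text


root = ET.fromstring(NAMES_XML)
# for $i in //name return ...
result = [display_name(name) for name in root.iter("name")]
print(result)   # ['Erik', 'Hasan']
```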

(27) Iterations vs. Location Paths

Every location path can be written using iterations

/names/name/last

for $i in /names return for $j in $i/name return $j/last

Iterations are a more generalized way of evaluation
- path expressions work on nodes only
```
for $i in 1 to 10 return $i
```
- path expressions sort by document order and eliminate duplicates
```
//last/../..
```
```
for $i in //last return for $j in $i/.. return $j/..
```
- location steps change the context item; iterations bind a variable for this purpose
Location paths are a useful syntax and method for tree navigation
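
The difference between `//last/../..` and its iteration counterpart can be made concrete in Python. This sketch uses the names.xml structure with the standard `xml.etree.ElementTree` module; the `parent` and `order` lookup tables and the `step_parent` helper are assumptions of the sketch, since ElementTree has no parent axis of its own.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<names>"
    "<name><first>Erik</first><last>Wilde</last></name>"
    "<name><last>Hasan</last></name>"
    "</names>"
)

# Lookup tables for the parent of each element and its document-order position.
parent = {child: elem for elem in root.iter() for child in elem}
order = {elem: i for i, elem in enumerate(root.iter())}


def step_parent(nodes):
    """One '..' step: collect parents, drop duplicates, sort in document order."""
    unique = {parent[n] for n in nodes if n in parent}
    return sorted(unique, key=order.__getitem__)


lasts = list(root.iter("last"))

# //last/../..
by_path = step_parent(step_parent(lasts))
# for $i in //last return for $j in $i/.. return $j/..
by_iteration = [parent[parent[last]] for last in lasts]

print([n.tag for n in by_path])        # ['names']
print([n.tag for n in by_iteration])   # ['names', 'names']
```

The path version returns the `names` element once, while the iteration returns it once per `last` element because nothing removes the duplicate.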

Quantified Expressions

(29) Testing Sequences

Testing whether some or all items of a sequence satisfy a condition
- the notion of sequences is central to this concept
- this requires variables for binding and evaluation
Quantifiers are well known from query languages
- some iterates over items and succeeds after the first success
- every iterates over items and fails after the first failure
- both constructs are good candidates for optimization

( some | every ) $… in … satisfies …

some $i in //*[@xlink:type='locator']/@xlink:href satisfies $i eq $query-uri

every $i in //li/@id satisfies //*[@xlink:type='locator'][@xlink:href=concat('#', $i)]
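
Python's built-in `any` and `all` behave much like the two quantifiers and make the early exit visible. This is a sketch: `some` and `every` are hypothetical wrappers, and the strict left-to-right short-circuiting shown here is a property of Python, whereas an XPath processor is free to choose its own evaluation strategy.

```python
def some(items, predicate):
    """some $i in items satisfies predicate($i)"""
    return any(predicate(i) for i in items)


def every(items, predicate):
    """every $i in items satisfies predicate($i)"""
    return all(predicate(i) for i in items)


calls = []


def is_big(n):
    calls.append(n)          # record which items were actually tested
    return n > 2


print(some([1, 2, 3, 4, 5], is_big))   # True
print(calls)                           # [1, 2, 3]: stopped at the first success

calls.clear()
print(every([1, 2, 3, 4, 5], is_big))  # False
print(calls)                           # [1]: stopped at the first failure

# On an empty sequence, 'some' is false and 'every' is true.
print(some([], is_big), every([], is_big))   # False True
```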

XML Path Language (XPath) 2.0

XML Foundations, Fall 2011, INFO 242 (CCN 42596)

Erik Wilde, UC Berkeley School of Information, 2011-10-06
