Overview and Introduction

XML Foundations
Fall 2013 — INFO 242 (CCN 41613)

Erik Wilde, UC Berkeley School of Information
2013-09-04

This work is licensed under a CC Attribution 3.0 Unported License [http://creativecommons.org/licenses/by/3.0/]

E. Wilde: Overview and Introduction

(2) Abstract

The Extensible Markup Language (XML) was introduced in 1998 to enable content providers to publish their content on the Web in application-specific formats. HTML was considered to convey too little semantics, since its only purpose was (and is) the preparation of content for Web-based publishing. XML was the first step towards machine-readable data formats for the Web, a trend that has since been taken further with the idea of the Semantic Web. XML appeared when the Web was in the steepest part of its success curve, and it has since become the globally accepted format for the exchange of machine-readable structured data.



Varia

Outline (Varia)

  1. Varia [4]
  2. Why/How/Where XML is Useful [5]
  3. Data Formats? Databases? [8]
    1. Alternatives to XML [3]
    2. XML Big Data [5]
  4. Conclusions [1]

(4) About Me



(5) About this Course



(6) About these Slides



(7) Additional Resources



Why/How/Where XML is Useful

(9) XML is a Metalanguage



(10) GPS Track Visualization

marin-run-map.png

(11) GPS Track in XML

<gpx version="1.1" creator="Garmin Connect" xmlns="http://www.topografix.com/GPX/1/1">
  <metadata>
    <link href="connect.garmin.com">
      <text>Garmin Connect</text>
    </link>
    <time>2010-01-15T20:21:18.000Z</time>
  </metadata>
  <trk>
    <name>Fire Trail</name>
    <trkseg>
      <trkpt lon="-122.256273524836" lat="37.8699341509491">
        <time>2010-01-15T20:21:18.000Z</time>
      </trkpt>
      <trkpt lon="-122.256267238408" lat="37.8699422813952">
        <ele>98.4</ele>
        <time>2010-01-15T20:21:21.000Z</time>
      </trkpt>
      <trkpt lon="-122.256179982796" lat="37.8701049741358">
        <ele>100.4</ele>
        <time>2010-01-15T20:21:26.000Z</time>
      </trkpt>
      <!-- remaining track points omitted -->
    </trkseg>
  </trk>
</gpx>
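
A minimal sketch of how a program might read a track like the one on this slide, using Python's standard xml.etree.ElementTree. The namespace URI is taken from the excerpt; the prefix gpx, the function names, and the shortened, closed-off copy of the document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URI from the GPX excerpt; the prefix "gpx" is our own choice.
NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

# Shortened copy of the excerpt, with closing tags added so it parses.
GPX = """<gpx version="1.1" creator="Garmin Connect" xmlns="http://www.topografix.com/GPX/1/1">
  <trk>
    <name>Fire Trail</name>
    <trkseg>
      <trkpt lon="-122.256273524836" lat="37.8699341509491">
        <time>2010-01-15T20:21:18.000Z</time>
      </trkpt>
      <trkpt lon="-122.256267238408" lat="37.8699422813952">
        <ele>98.4</ele>
        <time>2010-01-15T20:21:21.000Z</time>
      </trkpt>
    </trkseg>
  </trk>
</gpx>"""


def track_name(gpx_text):
    """Return the name of the first track in the document."""
    root = ET.fromstring(gpx_text)
    return root.findtext("gpx:trk/gpx:name", namespaces=NS)


def track_points(gpx_text):
    """Return (lon, lat, elevation-or-None) for every trkpt element."""
    root = ET.fromstring(gpx_text)
    points = []
    for pt in root.findall("gpx:trk/gpx:trkseg/gpx:trkpt", NS):
        ele = pt.find("gpx:ele", NS)  # elevation is optional in the excerpt
        points.append((float(pt.get("lon")),
                       float(pt.get("lat")),
                       float(ele.text) if ele is not None else None))
    return points
```

Because the elements are in the GPX namespace, every path step needs the prefix; an unprefixed search for trkpt would find nothing.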


(12) GPS Track Combination

heatmap-golden-gate.png

(13) Finding Activities (Saxon)

(: GPX elements live in this namespace, so the unprefixed path steps below match them :)
declare default element namespace "http://www.topografix.com/GPX/1/1";
declare namespace saxon="http://saxon.sf.net/";
(: emit plain text rather than XML :)
declare option saxon:output "method=text";

(: all files named 2010-*.gpx in one directory, read as a collection :)
declare variable $dir := '/Users/dret/Desktop/Dropbox/training/';
declare variable $file := '2010-*.gpx';
declare variable $files := collection(concat($dir,'?select=',$file));

(: center of the search area and half-width of the bounding box, in degrees :)
declare variable $lon := -122.48;
declare variable $lat := 37.82;
declare variable $box := 0.01;

(: edges of the bounding box :)
declare variable $lonlower := $lon - $box;
declare variable $lonupper := $lon + $box;
declare variable $latlower := $lat - $box;
declare variable $latupper := $lat + $box;

( "Searching", count($files), "files with", count($files/gpx/trk/trkseg/trkpt), "track points:&#xa;", 

(: list date and name of every activity with at least one track point inside the box :)
for $activity in $files
    where exists($activity/gpx/trk/trkseg/trkpt[ (@lon > $lonlower) and
                                                 (@lon < $lonupper) and
                                                 (@lat > $latlower) and
                                                 (@lat < $latupper) ])
    return ( saxon:format-date(xs:date(substring($activity/gpx/metadata/time/text(),1,10)), "[MNn] [D], [Y]"),
             ": ",
             $activity/gpx/trk/name/text(),
             "&#xa;") )
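
For readers less familiar with XQuery, the same bounding-box search can be sketched in Python. This is an illustration under assumptions, not a translation the course provides: the helper make_gpx, the function names, and the second activity ("Bridge Run" and its coordinates) are made up, and the documents are built in memory instead of being read from a directory.

```python
import xml.etree.ElementTree as ET

NS = {"gpx": "http://www.topografix.com/GPX/1/1"}


def make_gpx(name, day, points):
    """Hypothetical helper: build a tiny GPX document for the demo."""
    pts = "".join('<trkpt lon="%s" lat="%s"/>' % p for p in points)
    return ('<gpx xmlns="http://www.topografix.com/GPX/1/1">'
            '<metadata><time>%sT20:21:18.000Z</time></metadata>'
            '<trk><name>%s</name><trkseg>%s</trkseg></trk></gpx>'
            % (day, name, pts))


# The first point comes from the GPX slide; the second activity is invented.
DOCS = [
    make_gpx("Fire Trail", "2010-01-15", [(-122.256273524836, 37.8699341509491)]),
    make_gpx("Bridge Run", "2010-02-01", [(-122.478, 37.819)]),
]


def find_activities(docs, lon=-122.48, lat=37.82, box=0.01):
    """Return (date, name) for each document with a point inside the box."""
    hits = []
    for text in docs:
        root = ET.fromstring(text)
        inside = any(
            lon - box < float(pt.get("lon")) < lon + box
            and lat - box < float(pt.get("lat")) < lat + box
            for pt in root.iterfind("gpx:trk/gpx:trkseg/gpx:trkpt", NS)
        )
        if inside:
            # Same idea as the query: first ten characters of the timestamp.
            day = root.findtext("gpx:metadata/gpx:time", default="", namespaces=NS)[:10]
            name = root.findtext("gpx:trk/gpx:name", default="", namespaces=NS)
            hits.append((day, name))
    return hits
```

The XQuery version states the path and the predicate in a single expression; the Python version has to spell out parsing, iteration, and type conversion by hand.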


Data Formats? Databases?

Alternatives to XML

(16) XML and CSV

  • CSV is easier to understand and use
  • CSV tools (such as Excel) are widely used and understood
  • Structures beyond tables are hard to represent
  • Document structures are impossible to represent
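
A small sketch of the third point, using Python's standard csv module. The nested track data is hypothetical. To fit a table, the track name has to be repeated on every row, and the nesting has to be rebuilt by grouping on that column; nothing in CSV itself records the hierarchy.

```python
import csv
import io

# Hypothetical nested data: each track has a name and a list of (lon, lat) points.
TRACKS = [
    {"name": "Fire Trail", "points": [(-122.2563, 37.8699), (-122.2562, 37.8701)]},
    {"name": "Bridge Run", "points": [(-122.478, 37.819)]},
]


def to_csv(tracks):
    """Flatten tracks into rows; the track name is repeated on every row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["track", "lon", "lat"])
    for track in tracks:
        for lon, lat in track["points"]:
            writer.writerow([track["name"], lon, lat])
    return out.getvalue()


def from_csv(text):
    """Rebuild the nesting by grouping rows on the repeated track column."""
    grouped = {}
    for row in csv.DictReader(io.StringIO(text)):
        point = (float(row["lon"]), float(row["lat"]))
        grouped.setdefault(row["track"], []).append(point)
    return grouped
```

The grouping convention lives in the code, not in the file, so every consumer of the CSV has to know it.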


(17) XML and JSON

  • JSON [XML Varia; JavaScript Object Notation (JSON) (1)] maps more directly onto the data structures of most programming languages
  • JSON objects can be readily used as language objects
  • Structures beyond nested objects/arrays are hard to represent
  • Document structures are impossible to represent
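
The contrast in these bullets can be sketched in Python with the standard json and xml.etree.ElementTree modules. The sample values and the function mixed_content are made up for illustration. A JSON object arrives as a ready-to-use dictionary, while document-style XML interleaves text and elements in an order that JSON has no built-in construct for.

```python
import json
import xml.etree.ElementTree as ET

# JSON parses straight into native objects (here: a dict of floats).
POINT = json.loads('{"lon": -122.256, "lat": 37.87, "ele": 98.4}')

# Document-style XML: text and elements are interleaved (mixed content).
PARA = ET.fromstring("<p>Run the <em>Fire Trail</em> before noon.</p>")


def mixed_content(elem):
    """List the text runs and child elements of elem in document order."""
    parts = []
    if elem.text:
        parts.append(elem.text)
    for child in elem:
        parts.append((child.tag, child.text))
        if child.tail:  # text that follows the child inside the parent
            parts.append(child.tail)
    return parts
```

Storing the result of mixed_content as JSON would require inventing a convention, for example a list that alternates strings and tagged pairs, which is exactly what the markup already expresses.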


(18) XML and RDF

  • RDF [XML Varia; Semantic Web & Linked Data (1)] does not have the built-in tree bias of XML
  • RDF can be combined more easily across documents
  • Data that has some natural coherence is hard to manage
  • Document structures are impossible to represent
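
A plain-Python stand-in for the second point, not an RDF library: statements are modeled as (subject, predicate, object) tuples, and all identifiers with the ex: prefix, as well as the location value, are invented. Because a graph is just a set of statements, combining two sources is a set union, with no decision needed about where one tree attaches to another.

```python
# Two hypothetical sources describing the same activity.
DOC_A = {
    ("ex:fire-trail", "ex:name", "Fire Trail"),
    ("ex:fire-trail", "ex:date", "2010-01-15"),
}
DOC_B = {
    ("ex:fire-trail", "ex:name", "Fire Trail"),   # same statement as in DOC_A
    ("ex:fire-trail", "ex:location", "Berkeley"),  # made-up value
}

# Merging is a set union; the duplicate statement collapses automatically.
MERGED = DOC_A | DOC_B


def describe(graph, subject):
    """Return the sorted (predicate, object) pairs known about subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)
```

The flip side, as the third bullet notes, is that nothing in the set groups the statements that belong together; that grouping has to be recovered by querying.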


XML Big Data

(20) XML and Databases

  • A data format defines a framework for certain kinds of data
    • some formats are metalanguages such as XML, but most are not
  • A database is based on a data model and manages data as well as access
  • XML does not talk about databases at all
  • It is possible to build databases that support XML's model
    • data storage can be scaled beyond what file systems can manage
    • access to XML can be much better optimized in XML databases
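
To make the difference between a data format and a database concrete, here is a sketch that uses SQLite from Python's standard library. This is a relational stand-in chosen because it ships with Python, not an XML database: the table and index names are assumptions, and the "Bridge Run" row is invented. The point is only that a database manages both the data and the access path to it, here through an index that lets the engine avoid scanning every stored point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trkpt (activity TEXT, lon REAL, lat REAL)")
# The index is the database's access path for range lookups on coordinates.
conn.execute("CREATE INDEX trkpt_lon_lat ON trkpt (lon, lat)")
conn.executemany(
    "INSERT INTO trkpt VALUES (?, ?, ?)",
    [
        ("Fire Trail", -122.256273524836, 37.8699341509491),
        ("Fire Trail", -122.256267238408, 37.8699422813952),
        ("Bridge Run", -122.478, 37.819),  # made-up point
    ],
)


def activities_near(connection, lon, lat, box):
    """Return names of activities with a point inside the bounding box."""
    rows = connection.execute(
        "SELECT DISTINCT activity FROM trkpt "
        "WHERE lon > ? AND lon < ? AND lat > ? AND lat < ? "
        "ORDER BY activity",
        (lon - box, lon + box, lat - box, lat + box),
    )
    return [row[0] for row in rows]
```

An XML database applies the same idea to XML's tree model, so queries can stay in XQuery while storage and indexing are handled by the database.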


(21) XQuery using Files

File-based XQuery Processing

(22) XQuery using an XML Database

DB-based XQuery Processing

(23) Moving Data into a Database

xDB-import.png

(24) Finding Activities (xDB)

(: GPX elements live in this namespace, so the unprefixed path steps below match them :)
declare default element namespace "http://www.topografix.com/GPX/1/1";

(: the documents now come from the database instead of a file collection :)
declare variable $files := doc("/");

(: center of the search area and half-width of the bounding box, in degrees :)
declare variable $lon := -122.48;
declare variable $lat := 37.82;
declare variable $box := 0.01;

(: edges of the bounding box :)
declare variable $lonlower := $lon - $box;
declare variable $lonupper := $lon + $box;
declare variable $latlower := $lat - $box;
declare variable $latupper := $lat + $box;

( "Searching", count($files), "files with", count($files/gpx/trk/trkseg/trkpt), "track points:&#xa;", 

(: list date and name of every activity with at least one track point inside the box :)
for $activity in $files
    where exists($activity/gpx/trk/trkseg/trkpt[ (@lon > $lonlower) and
                                                 (@lon < $lonupper) and
                                                 (@lat > $latlower) and
                                                 (@lat < $latupper) ])
    return ( substring($activity/gpx/metadata/time/text(),1,10),
             ": ",
             $activity/gpx/trk/name/text(),
             "&#xa;") )


Conclusions

(26) XML is a Hammer


