XML Foundations

INFO 242 (CCN 41613) — Fall 2013
School of Information, UC Berkeley

Instructor: Erik Wilde

Lecture: Mon & Wed 2:00–3:30, 202 South Hall

Description: The Extensible Markup Language (XML), with its ability to define formal structural and semantic definitions for metadata and information models, is the key enabling technology for information services and document-centric business models that use the Internet and its family of protocols. This course introduces XML syntax, transformations, schema languages, and the querying of XML databases. It balances conceptual topics with practical skills for designing, implementing, and handling conceptual models as XML schemas.

Date Subject Slides Additional Resources
2013-09-04 Overview and Introduction: The Extensible Markup Language (XML) was introduced in 1998 to enable content providers to publish their content on the Web in an application-specific format. HTML was considered to convey too little semantics, since its only purpose was (and is) the preparation of content for Web-based publishing. XML was the first step towards machine-readable data formats for the Web, a trend that has since been taken further with the idea of the Semantic Web. XML appeared when the Web was in the steepest part of its success curve, and since then has taken over as the globally accepted format for the exchange of machine-readable structured data.
2013-09-04T09:00 2013-09-04T10:30 205 South Hall, UC Berkeley
Introduction (26 Slides) XML 1.0 Press Release [http://www.w3.org/Press/1998/XML10-REC] · XML People [http://www.tbray.org/ongoing/When/200x/2008/02/10/XML-People]
2013-09-09 XML Basics: The Extensible Markup Language (XML) defines a simple way of structuring data. The power and popularity of XML can be explained by its versatility, its platform independence, the standards and technologies leveraging it, and the number of tools and products supporting it. Understanding XML itself is rather simple, as it depends on only a very small set of other technologies. Unicode is the most important foundation of XML. XML itself specifies two different things: a format for structured data, whose instances are called XML documents, and a constraint language for XML documents, called Document Type Definition (DTD).
2013-09-09T09:00 2013-09-09T10:30 205 South Hall, UC Berkeley
Basics (26 Slides) XML 1.0 Spec [http://www.w3.org/TR/REC-xml/] · XML Fever [http://dret.net/netdret/docs/wilde-cacm2008-xml-fever.html]
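The basic constructs named in the abstract above (elements, attributes, character data, and the Unicode foundation) can be tried out with a few lines of Python. This is only a minimal sketch: the document, its element names, and the helper is_well_formed are made up for illustration, and Python's xml.etree.ElementTree is just one of many XML parsers that could be used.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: one root element, an attribute,
# nested child elements, and a numeric character reference (&#233;)
# that resolves to a Unicode character.
DOC = """<course code="INFO 242">
  <title>XML Foundations</title>
  <instructor>Erik Wilde</instructor>
  <note>caf&#233;</note>
</course>"""

root = ET.fromstring(DOC)

course_code = root.get("code")              # attribute value
child_names = [child.tag for child in root]  # element names in document order
title = root.findtext("title")               # character data of a child element
note = root.findtext("note")                 # character reference already resolved


def is_well_formed(text):
    """Return True if text parses as XML, False otherwise (illustrative helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False
```

Calling is_well_formed("<a><b></a>") returns False because the tags are not properly nested. Well-formedness is checked by every XML parser, whereas checking a document against a DTD is a separate, optional validation step.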
2013-09-11 Document Type Definition (DTD): The XML specification defines a format for structured data (XML documents) and a grammar-based constraint language for these (DTD). In SGML-based systems, DTDs were often very complex and feature-rich constructs, which controlled a lot of the processing of SGML documents. XML greatly simplified DTDs, and de facto usage of DTDs today has simplified them even more. In many systems today, DTDs are either not used at all or are generated from sample documents. This lecture argues that DTDs (or schemas, to be more general) should be taken seriously in any non-trivial XML application, because they are a representation of the underlying (and often underspecified) data model of the application.
2013-09-11T09:00 2013-09-11T10:30 205 South Hall, UC Berkeley
DTD (36 Slides)
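To illustrate what "grammar-based" means here: a DTD content model is essentially a regular expression over the names of an element's children. The sketch below is not a DTD validator (Python's standard library does not include one). It assumes a hypothetical book vocabulary and hand-translates one made-up element declaration into a Python regular expression; the names BOOK_MODEL and children_match are illustrative only.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical DTD declaration being illustrated:
#   <!ELEMENT book (title, author+, year?)>
# Hand-translated into a regex over space-terminated child element names.
BOOK_MODEL = re.compile(r"(title )(author )+(year )?")


def children_match(element, model):
    """Check the sequence of child element names against a content-model regex."""
    names = "".join(child.tag + " " for child in element)
    return model.fullmatch(names) is not None


# One title followed by two authors: matches the content model.
valid = ET.fromstring(
    "<book><title>T</title><author>A</author><author>B</author></book>"
)
# Author before title: violates the required order.
invalid = ET.fromstring("<book><author>A</author><title>T</title></book>")

valid_ok = children_match(valid, BOOK_MODEL)
invalid_ok = children_match(invalid, BOOK_MODEL)
```

A real validating parser also checks attributes, entities, and ID/IDREF constraints. The sketch only shows why a schema makes the otherwise implicit data model of an application explicit.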
2013-09-16 XML Path Language (XPath): XML structures data into a rather small number of different constructs, most notably elements and attributes. The XML Path Language (XPath) defines a way to select parts of XML documents, so that they can be used for further processing. XPath's primary use is in XSL Transformations (XSLT), but other XML technologies, such as XSD, use it as well. XPath is a very compact language with a syntax that resembles the path expressions well known from file systems. These path expressions, however, are generalized and therefore much more powerful than the rather simple path expressions in file systems. Because of its use in different XML technologies, XPath is one of the most important XML core technologies.
2013-09-16T09:00 2013-09-16T10:30 205 South Hall, UC Berkeley
XPath (35 Slides) XPath 1.0 Spec [http://www.w3.org/TR/xpath] · XPath Chapter [xpath-chapter.pdf]
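The following sketch shows a few path expressions of the kind described above. It uses Python's xml.etree.ElementTree, which implements only a limited subset of XPath 1.0 (no functions, no axes other than child, descendant, and parent), so it should be read as an illustration of the path-expression idea and not as a full XPath engine. The catalog document is a made-up example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data.
CATALOG = ET.fromstring("""<catalog>
  <book year="1998"><title>XML 1.0</title></book>
  <book year="1999"><title>XPath 1.0</title></book>
  <journal year="1999"><title>Markup Notes</title></journal>
</catalog>""")

# Child steps, much like a file-system path: the titles of all books.
book_titles = [t.text for t in CATALOG.findall("./book/title")]

# Descendant step (//), wildcard (*), and an attribute predicate:
# titles of anything published in 1999, regardless of element name.
titles_1999 = [t.text for t in CATALOG.findall(".//*[@year='1999']/title")]

# Positional predicate: the title of the second book (positions start at 1).
second_book = CATALOG.findtext("./book[2]/title")
```

The wildcard and predicate steps are where XPath goes beyond file-system paths: a single expression can select nodes by name, by position, or by a condition on their attributes.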
2013-09-18 XML Path Language (XPath) 2.0: The XML Path Language (XPath) is one of the most useful and frequently used languages in the area of XML technologies. In its version 1.0, it is used in technologies such as XSLT, XSD, DOM, and XML Tools. With XPath 2.0, the language has been greatly extended, and the new version of XPath is the foundation for XSLT 2.0 and XQuery. XPath 2.0 provides support for regular expression matching and typed expressions, and it contains language constructs for conditional and repeated evaluation.
2013-09-18T09:00 2013-09-18T10:30 205 South Hall, UC Berkeley
XPath 2.0 (35 Slides) XPath 2.0 Spec [http://www.w3.org/TR/xpath20]
2013-09-23 Guest Lecture by Eric Kansa: XML in Application: Eric Kansa will discuss "open data" and some implementation strategies for realizing the goals of the open data movement. He will be drawing examples from Open Context, a research data sharing system focusing on archaeology. The discussion will explore pragmatic approaches to using XML and other structured data formats for publishing data on the Web to serve different communities with different levels of technical skills and capabilities.
2013-09-23T09:00 2013-09-23T10:30 205 South Hall, UC Berkeley
XML in Application Open Context [http://opencontext.org/]
2013-09-25 XML Namespaces: XML is successful because it can be used in many different scenarios, and because it is easy to define a schema (such as a DTD) for a new scenario, producing a tailored XML data model for that scenario. This means that names in XML documents must be interpreted as belonging to a certain schema. As long as a document uses names from only one schema, this can be done rather easily. However, in many scenarios today documents combine names from different schemas, and XML Namespaces provide a mechanism by which the names in an XML document can be associated with a namespace.
2013-09-25T09:00 2013-09-25T10:30 205 South Hall, UC Berkeley
Namespaces (23 Slides) XML Namespaces FAQ (Part I) [http://www.rpbourret.com/xml/NamespacesFAQ.htm#p1] · XML Namespaces Spec [http://www.w3.org/TR/REC-xml-names/]
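As a small illustration of the problem described above, the sketch below parses a document in which two elements share the local name title but belong to different namespaces. The namespace URIs and element names are made-up examples, and the prefixes o and a in the NS mapping are arbitrary local choices: a name is identified by its namespace URI, not by the prefix used in the document.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mixing two vocabularies: a default namespace for
# the order vocabulary and a prefixed namespace for the address vocabulary.
DOC = """<order xmlns="http://example.com/ns/order"
       xmlns:addr="http://example.com/ns/address">
  <title>Books</title>
  <addr:title>Dr.</addr:title>
</order>"""

root = ET.fromstring(DOC)

# ElementTree reports each name as {namespace-URI}local-name, so the two
# "title" elements are clearly distinguishable.
tags = [child.tag for child in root]

# Prefixes used for lookup are local to this code and need not match
# the prefixes used in the document.
NS = {"o": "http://example.com/ns/order", "a": "http://example.com/ns/address"}
order_title = root.findtext("o:title", namespaces=NS)
address_title = root.findtext("a:title", namespaces=NS)
```

After parsing, tags holds the two expanded names, which differ only in their namespace URI.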
2013-09-30 XML Schema (XSD) – Part I: The XML Schema Definition Language (XSD) is the most popular schema language for XML today. It was introduced to overcome some of the commonly observed limitations of DTDs, most notably the lack of typing. Simple types describe content that is not structured by XML markup, which means they describe attribute values and element content consisting only of character data. New simple types can be defined by deriving them from existing types using type restriction.
2013-09-30T09:00 2013-09-30T10:30 205 South Hall, UC Berkeley
XSD 1 (27 Slides) XML Schema [http://www.w3.org/XML/Schema]
2013-10-02 XML Schema (XSD) – Part II: XSD complex types describe elements that carry attributes and/or have content other than plain character data. Thus, complex types are used to define the allowed markup structures for a class of documents. Using XSD's type concepts, it is easier to represent model-level information in a schema, because type hierarchies can represent model-level specializations.
2013-10-02T09:00 2013-10-02T10:30 205 South Hall, UC Berkeley
XSD 2 (36 Slides)
2013-10-07 XML Schema (XSD) – Part III: XSD allows greater flexibility in defining constraints on intra-document references than the ID/IDREF construct of DTDs. XSD's Identity Constraints are scoped, typed, and can be used for elements or attributes. They are more powerful than the DTD's limited ID/IDREF mechanism, but still lack the generality needed to express a really wide set of model constraints. XSD complex types can be derived by restriction or extension. Complex type restriction defines the derived type to be a more restricted version of the base type. Complex type extension makes it possible to extend the base type by adding either attributes or content (only by appending new content to the content model). Complex type derivation allows XSD to express type hierarchies of complex types, which can be aligned with more or less specialized code for processing instances of these types.
2013-10-07T09:00 2013-10-07T10:30 205 South Hall, UC Berkeley
XSD 3 (27 Slides) XSD Identity Constraints [http://www.awprofessional.com/articles/printerfriendly.asp?p=31477&rl=1]
2013-10-09 Project Presentation Preparation: This slot is reserved for the project groups to meet and prepare their project presentations.
2013-10-09T09:00 2013-10-09T10:30 205 South Hall, UC Berkeley
Project Presentation Preparation
2013-10-14 Project Presentations: This slot is reserved for the project presentations, with each group getting an opportunity to present.
2013-10-14T09:00 2013-10-14T10:30 205 South Hall, UC Berkeley
Project Presentations
2013-10-16 Project Presentation Feedback: This slot is reserved for the project groups to incorporate the feedback from the project presentations.
2013-10-16T09:00 2013-10-16T10:30 205 South Hall, UC Berkeley
Project Presentation Feedback
2013-10-21 XML Transformations (XSLT) – Part I: Because XML can be used to represent any vocabulary (often defined by some schema), the question is how these different vocabularies can be processed and possibly transformed into something else. This something else may be another XML vocabulary (a common requirement in B2B scenarios), or it may be HTML (a common scenario for Web publishing). Using XSL Transformations (XSLT), mapping tasks can be implemented easily. XSLT leverages XPath's expressive power in a rather simple programming language, whose programs are often called stylesheets. For easy tasks, XSLT mappings can be specified without much real programming, by simply specifying how components of the source markup are mapped to components of the target markup.
2013-10-21T09:00 2013-10-21T10:30 205 South Hall, UC Berkeley
XSLT 1 (27 Slides) XSLT 1.0 Spec [http://www.w3.org/TR/xslt]
2013-10-23 XML Transformations (XSLT) – Part II: XSLT's template matching mechanism lets the XSLT processor find the best match to process a selected node. XSLT also supports a more traditional way of using templates, where they are called in a way very similar to function calls in most programming languages. Another interesting area of XSLT is variables and parameters, which are used for storing or passing values within XSLT code. One special property of XSLT variables is that they cannot be changed, which is a result of the functional design of the language.
2013-10-23T09:00 2013-10-23T10:30 205 South Hall, UC Berkeley
XSLT 2 (29 Slides) XSLT Parameters [http://www-128.ibm.com/developerworks/xml/library/x-tipxsltrun/]
2013-10-28 XML Transformations (XSLT) 2.0 – Part I: While XML Transformations (XSLT) 1.0 has become a successful programming language widely used for transforming XML documents, its limitations sometimes make it difficult to use XSLT well. An important reason for many of these limitations is the fact that XSLT 1.0 was designed as a client-side language. Building on XSLT 1.0 and XPath 2.0, XML Transformations (XSLT) 2.0 improves the language in a variety of ways.
2013-10-28T09:00 2013-10-28T10:30 205 South Hall, UC Berkeley
XSLT 2.0 1 (27 Slides) XSLT 2.0 Spec [http://www.w3.org/TR/xslt20/]
2013-10-30 XML Transformations (XSLT) 2.0 – Part II: Many of the new features of XSLT 2.0 have their roots in XPath 2.0 and the underlying new data model of sequences. But some features of XSLT 2.0 really are part of the language itself, such as support for user-defined functions, and the ability to group items and then iterate over these groups. In addition, XSLT now can be used as a typed programming language, which consumes and produces typed trees instead of just well-formed XML trees.
2013-10-30T09:00 2013-10-30T10:30 205 South Hall, UC Berkeley
XSLT 2.0 2 (16 Slides) Reevaluating XSLT 2.0 [http://www.oreillynet.com/xml/blog/2007/03/reevaluating_xslt_20.html]
2013-11-13 Representational State Transfer (REST): Representational State Transfer (REST) is defined as an architectural style, which means that it is not a concrete systems architecture, but instead a set of constraints that are applied when designing a systems architecture. We briefly discuss these constraints, but then focus on explaining how the Web is one such systems architecture that implements REST, in particular through the mechanisms of Uniform Resource Identifiers (URIs), the Hypertext Transfer Protocol (HTTP), media types, and markup languages such as the Hypertext Markup Language (HTML) and the Extensible Markup Language (XML). We also introduce Atom and the Atom Publishing Protocol (AtomPub) as two established ways in which RESTful services are already provided and used on today's Web.
2013-11-13T09:00 2013-11-13T10:30 205 South Hall, UC Berkeley
REST (36 Slides) Fielding Dissertation [http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm] · REST Paper [http://portal.acm.org/citation.cfm?doid=337180.337228]
2013-11-18 Atom Syndication Format: REST's level of abstraction and its simplicity as a small set of constraints can make it hard to grasp how it can be applied to real-world projects. This presentation introduces real-world REST by looking at how REST can be used by reusing existing RESTful designs in terms of representations and interaction protocols; Atom and the Atom Publishing Protocol (AtomPub) are used as examples of existing RESTful designs. In addition, we take a brief look at how to go beyond using these canned REST approaches, and how existing programming frameworks provide support for designing and implementing RESTful services.
2013-11-18T09:00 2013-11-18T10:30 205 South Hall, UC Berkeley
Atom (28 Slides) Atom Spec [http://tools.ietf.org/html/rfc4287] · AtomPub Spec [http://tools.ietf.org/html/rfc5023] · Feed Validator [http://validator.w3.org/feed/]
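To make the idea of reusing Atom as a representation format more concrete, the sketch below reads a small Atom feed and extracts, for each entry, its title, its identifier, and the URI of its link with rel="edit", which AtomPub uses to identify the editable member resource. The feed content, the example.org URIs, and the helper name read_entries are made up for illustration, and the feed has not been run through the Feed Validator.

```python
import xml.etree.ElementTree as ET

# The Atom namespace as defined in RFC 4287; the "atom" prefix is a local choice.
ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed with a single entry.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Feed</title>
  <id>http://example.org/feed</id>
  <updated>2013-11-18T09:00:00Z</updated>
  <author><name>Example Author</name></author>
  <entry>
    <title>First entry</title>
    <id>http://example.org/entries/1</id>
    <updated>2013-11-18T09:00:00Z</updated>
    <content type="text">Hello, Atom.</content>
    <link rel="edit" href="http://example.org/entries/1"/>
  </entry>
</feed>"""


def read_entries(feed_xml):
    """Return a list of dicts describing the entries of an Atom feed."""
    feed = ET.fromstring(feed_xml)
    entries = []
    for entry in feed.findall("atom:entry", ATOM):
        edit = entry.find("atom:link[@rel='edit']", ATOM)
        entries.append({
            "title": entry.findtext("atom:title", namespaces=ATOM),
            "id": entry.findtext("atom:id", namespaces=ATOM),
            # Entries without an edit link are read-only for this client.
            "edit_uri": edit.get("href") if edit is not None else None,
        })
    return entries


entries = read_entries(FEED)
```

A client following the AtomPub interaction model would send GET, PUT, or DELETE requests to the edit URI. That part involves HTTP and is deliberately left out of this sketch.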
2013-11-20 XML Query (XQuery) – Part I: The XML Query (XQuery) language was designed to query collections of XML documents. It is thus different from XSLT, which primarily transforms one document at a time. However, the core of both languages is XPath 2.0, which means that learning XQuery (and XSLT 2.0) is not very hard when starting with a solid knowledge of XPath 2.0. XQuery's main concept is an expression language which supports iteration and the binding of variables to intermediate results. Because both languages build on the same foundation, they overlap considerably. Features such as user-defined functions and schema-awareness bring XQuery even closer to XSLT 2.0, so that choosing one over the other depends mostly on the XML task at hand and on personal preference.
2013-11-20T09:00 2013-11-20T10:30 205 South Hall, UC Berkeley
XQuery 1 (34 Slides) XQuery Spec [http://www.w3.org/TR/xquery] · XQuery/XSLT Comparison [http://www.ibm.com/developerworks/xml/library/x-wxxm34.html]
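The idea of iterating over a collection of documents while binding variables to intermediate results can be approximated in Python. The sketch below is an analogue only, not XQuery: it assumes a made-up collection of two shelf documents and mirrors, with a generator expression, roughly what an XQuery expression with for, where, order by, and return clauses would do. The XQuery text in the comment is an illustration and is not executed.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection of two documents.
COLLECTION = [
    ET.fromstring(
        "<shelf><book year='1998'><title>XML</title></book>"
        "<book year='2007'><title>XQuery</title></book></shelf>"
    ),
    ET.fromstring("<shelf><book year='1999'><title>XSLT</title></book></shelf>"),
]

# Rough XQuery counterpart (illustration only, not run here):
#   for $b in collection("shelves")//book
#   where $b/@year >= 1999
#   order by $b/title
#   return string($b/title)
titles = sorted(                      # order by
    book.findtext("title")            # return
    for doc in COLLECTION             # for: iterate over the collection
    for book in doc.iter("book")      # ... and bind each book to a variable
    if int(book.get("year")) >= 1999  # where
)
```

The result spans both documents, which is the main contrast with XSLT, where a transformation primarily processes one source document at a time.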
2013-11-25 Setting Up xDB: XML databases are not fundamentally different from other database engines: They provide support to store and query potentially large amounts of data. xDB is EMC's XML Database product, and is implemented in Java. It therefore can be used on any computer with Java runtime support. xDB has a database server, which has to be started and then waits for incoming connection requests. xDB also has a Java client which can be used to connect to the server, and then provides administrative as well as user functionality. In this lecture, we will install xDB on all student computers, and ingest some XML data into an XML database managed by xDB.
2013-11-25T09:00 2013-11-25T10:30 205 South Hall, UC Berkeley
xDB (2 Slides) EMC XML Home [https://community.emc.com/community/edn/xmltech/content] · xDB Manuals [https://community.emc.com/docs/DOC-7711]
2013-12-02 XML Query (XQuery) – Part II: XML and relational databases are not entirely different things: they share the basic model of providing support for storing and processing (potentially large amounts of) data. SQL/XML is an approach to bridging the two worlds of relational and XML databases, by allowing relational databases to produce XML, and even to store and query XML. In the second half of this lecture we take a second look at XQuery, specifically its processing model and the way queries specify the input documents to XQuery expressions.
2013-12-02T09:00 2013-12-02T10:30 205 South Hall, UC Berkeley
XQuery 2 (36 Slides) SQL/XML [http://en.wikipedia.org/wiki/SQL/XML]
2013-12-04 RELAX NG: Schema languages as a general concept in XML are used to (1) prescribe the allowed document structure, and/or (2) validate a document against a description of what is allowed in a document and what isn't. DTDs and XSDs are particularly important schema languages because DTDs are part of XML itself, and XSD was the first major improvement on the rather limited capabilities of DTDs. Recently, however, XSD has been increasingly criticized for its complexity, and the RELAX NG schema language is gaining popularity instead. RELAX NG is a grammar-based schema language (like DTD and XSD), but it adds a human-friendly syntax and the ability to use datatypes, and it removes the ability of validation to change a document.
2013-12-04T09:00 2013-12-04T10:30 205 South Hall, UC Berkeley
RELAX NG (12 Slides) RELAX NG Home Page [http://relaxng.org/] · The Design of RELAX NG [http://www.thaiopensource.com/relaxng/design.html]
2013-12-09 XML Varia: The first half of the lecture compares XML to alternatives that are also used to represent, manage, exchange, and/or process data. The most relevant approaches in this space are RDF, JSON, and tabular/relational models such as SQL or NoSQL. One of the reasons for using XML-based approaches for data representation, management, interchange, and processing is that there is a large landscape of existing standards, technologies, and tools, so that many problems can be approached by reusing existing solutions. In the second half of this lecture, we look at a small set of additional standards that were not yet covered.
2013-12-09T09:00 2013-12-09T10:30 205 South Hall, UC Berkeley
XML Varia (37 Slides) Data, Models, Metamodels, Cosmologies [http://dret.typepad.com/dretblog/2009/08/data-models-metamodels-cosmologies.html] · xml:id [http://www.w3.org/TR/xml-id/] · XInclude [http://www.w3.org/TR/xinclude/]
2013-12-11 Course Summary: Q&A with a short discussion of the course topics, followed by questions about topics, standards, technologies, and exam issues.
2013-12-11T09:00 2013-12-11T10:30 205 South Hall, UC Berkeley
Summary (4 Slides)
Creative Commons License. Please send comments to dret@berkeley.edu
Last modification on Tuesday, 27-Jan-2015 00:56:57 CET