XML Foundations (Fall 2011, INFO 242)

Description: The Extensible Markup Language (XML), with its ability to define formal structural and semantic definitions for metadata and information models, is the key enabling technology for information services and document-centric business models that use the Internet and its family of protocols. This course introduces XML syntax, transformations, schema languages, and the querying of XML databases. It balances conceptual topics with practical skills for designing, implementing, and handling conceptual models as XML schemas.

Date	Subject	Slides	Additional Resources
2011-08-25	Overview and Introduction: The Extensible Markup Language (XML) was introduced in 1998 to enable content providers to publish their content on the Web in application-specific formats. HTML was considered to convey too little semantics, since its only purpose was (and is) the preparation of content for Web-based publishing. XML was the first step towards machine-readable data formats for the Web, a trend that has since been taken further by the Semantic Web. XML appeared when the Web was in the steepest part of its success curve, and it has since become the globally accepted format for exchanging machine-readable structured data.	Introduction (35 Slides)	XML 1.0 Press Release [http://www.w3.org/Press/1998/XML10-REC] · XML People [http://www.tbray.org/ongoing/When/200x/2008/02/10/XML-People]
2011-08-30	XML Basics: The Extensible Markup Language (XML) defines a simple way of structuring data. The power and popularity of XML can be explained by its versatility, its platform independence, the standards and technologies built on it, and the number of tools and products supporting it. XML itself is rather simple to understand, and it depends on only a very small set of other technologies, the most important of which are Unicode and URIs. The XML specification defines two different things: a format for structured data, called XML documents, and a constraint language for XML documents, called the Document Type Definition (DTD).	Basics (30 Slides)	Spec [http://www.w3.org/TR/REC-xml/] · XML Fever [http://dret.net/netdret/docs/wilde-cacm2008-xml-fever.html]
2011-09-01	Document Type Definition (DTD): The XML specification defines a format for structured data (XML documents) and a grammar-based constraint language for these documents (DTD). In SGML-based systems, DTDs were often very complex and feature-rich constructs that controlled much of the processing of SGML documents. XML greatly simplified DTDs, and de facto usage has simplified them even further. In many systems today, DTDs are either not used at all or are generated from sample documents. This lecture argues that DTDs (or, more generally, schemas) should be taken seriously in any non-trivial XML application, because they represent the underlying (and often underspecified) data model of the application.	DTD (36 Slides)	XML QuickRef [xml-quickref.pdf]
2011-09-06	XSD – Part I: The XML Schema Definition Language (XSD) is the most popular schema language for XML today. It was introduced to overcome some commonly observed limitations of DTDs, most notably the lack of data typing. Simple types describe content that is not structured by XML markup, i.e., attribute values and element content consisting only of character data. New simple types can be derived from existing types by restriction.	XSD 1 (27 Slides)	XSD QuickRef [xsd-quickref.pdf] · XML Schema [http://www.w3.org/XML/Schema]
2011-09-08	XSD – Part II: XSD complex types describe element content that uses attributes and/or child elements rather than only character data. Complex types are thus used to define the allowed markup structures for a class of documents. XSD's type concepts make it easier to represent model-level information in a schema, because type hierarchies can represent model-level specializations.	XSD 2 (27 Slides)
2011-09-13	XSD – Part III: XSD allows greater flexibility than the ID/IDREF construct of DTDs in defining constraints on intra-document references. XSD's identity constraints are scoped, typed, and can be used for elements or attributes. They are more powerful than the DTD's limited ID/IDREF mechanism, but still not general enough to express a really wide set of model constraints. XSD complex types can be derived by restriction or extension. Complex type restriction defines the derived type as a more restricted version of the base type. Complex type extension makes it possible to extend the base type by adding attributes or content (new content can only be appended to the content model). Complex type derivation allows XSD to express type hierarchies of complex types, which can be aligned with more or less specialized code for processing instances of these types.	XSD 3 (36 Slides)	XSD Identity Constraints [http://www.awprofessional.com/articles/printerfriendly.asp?p=31477&rl=1]
2011-09-15	Processing XML: XML is a format for structured data, but it does not prescribe any way of processing these structures. In practice, XML data has to be processed using XML-specific support in some programming environment. This lecture discusses the most popular ways of processing XML data: the Document Object Model (DOM) as a tree-based data model, the Simple API for XML (SAX) as an event-based programming model, and XSL Transformations (XSLT) as a dedicated programming language for transforming XML.	Processing XML (19 Slides)	DOM [http://www.w3.org/DOM/] · SAX [http://sax.sourceforge.net/]
2011-09-20	XML Namespaces: XML is successful because it can be used in many different scenarios, and because it is easy to define a schema (such as a DTD) for a new scenario, producing a tailored XML data model for it. This means that names in XML documents must be interpreted as belonging to a certain schema. As long as a document uses names from only one schema, this is rather easy. However, many documents today combine names from different schemas, and XML Namespaces provide a mechanism for associating the names in an XML document with a namespace.	Namespaces (25 Slides)	XML Namespaces FAQ (Part I) [http://www.rpbourret.com/xml/NamespacesFAQ.htm#p1] · Spec [http://www.w3.org/TR/REC-xml-names/]
2011-09-22	Examples of XML DTDs and XSDs	Schema Examples
2011-09-27	XQuery – Part I: The XML Query Language (XQuery) has been designed to query collections of XML documents. It thus differs from XSLT, which primarily transforms one document at a time. However, both languages share XPath 2.0 as their core, so learning XQuery (and XSLT 2.0) is not very hard for someone with a solid knowledge of XPath 2.0. XQuery's main concept is an expression language that supports iteration and the binding of variables to intermediate results. The two languages overlap considerably, and features such as user-defined functions and schema awareness bring XQuery even closer to XSLT 2.0, so the choice between them often depends mostly on the task at hand and on personal preference.	XQuery – Part I	XQuery/XSLT Comparison [http://www.ibm.com/developerworks/xml/library/x-wxxm34.html]
2011-09-29	XQuery – Part II	XQuery – Part II	Spec [http://www.w3.org/TR/xquery] · XQuery QuickRef [xquery-quickref.pdf]
2011-10-04	XML Path Language (XPath): XML structures data into a rather small number of different constructs, most notably elements and attributes. The XML Path Language (XPath) defines a way to select parts of XML documents so that they can be used for further processing. XPath's primary use is in XSL Transformations (XSLT) and XQuery, but other XML technologies, e.g. XSD, use it as well. XPath is a compact language with a syntax that resembles the path expressions well known from file systems. XPath's path expressions, however, are generalized and therefore more powerful than file system paths. Because it is used in many XML technologies, XPath is one of the most important XML core technologies.	XPath (36 Slides)	XPath Chapter [xpath-chapter.pdf] · XPath QuickRef [xpath-quickref.pdf]
2011-10-06	XML Path Language (XPath) 2.0: The XML Path Language (XPath) is one of the most useful and frequently used languages in the area of XML technologies. XPath 1.0 is used in technologies such as XSLT, XSD, DOM, and XML tools. XPath 2.0 greatly extends the language and is the foundation for XSLT 2.0 and XQuery. It provides support for regular expression matching and typed expressions, and contains language constructs for conditional and repeated evaluation.	XPath 2.0 (31 Slides)	Spec [http://www.w3.org/TR/xpath20] · XPath 2.0 QuickRef [xpath2-quickref.pdf] · XPath 2.0 Functions QuickRef [xpath20-functions-quickref.pdf] · XPath 2.0 RegEx QuickRef [xpath20-regex-quickref.pdf]
2011-10-11	XQuery – Part III: In this continuation of the XPath/XQuery theme, we look at how XPath 2.0 fits into the greater picture of XML technologies, and how XPath 2.0 and XQuery 1.0 provide language constructs that go far beyond the rather limited facilities of XPath 1.0.	XQuery – Part III (22 Slides)
2011-10-13	XQuery – Part IV: This last part of the XQuery lectures covers some additional XQuery topics and briefly looks at how to create user-defined functions. After that, XML linking methods are considered, specifically `xml:id`, XML Inclusions (XInclude), and the XML Linking Language (XLink). These methods serve as a starting point for looking at XML processing in XQuery, specifically at how XQuery code can identify and dereference the links used in the XInclude vocabulary.	XQuery – Part IV (27 Slides)	xml:id [http://www.w3.org/TR/xml-id/] · XInclude [http://www.w3.org/TR/xinclude/] · XLink [http://www.w3.org/TR/xlink/]
2011-10-18	From Model to Markup: XML is very useful for representing and manipulating structured data, but the basic question remains where these structures come from. They usually encode some kind of conceptual model, but there is no established and universally accepted way to connect the modeling world with XML markup. This lecture presents some of the challenges of, and approaches to, modeling for XML. Its goal is to raise awareness of the current gap between models and markup, and of practical approaches to bridging that gap.	Modeling (19 Slides)	Document Design Matters [http://dret.net/netdret/docs/wilde-cacm2008-document-design-matters]
2011-10-20	XML Transformations (XSLT) – Part I: Because XML can be used to represent any vocabulary (often defined by some schema), the question is how these different vocabularies can be processed and perhaps transformed into something else. This something else may be another XML vocabulary (a common requirement in B2B scenarios), or it may be HTML (a common scenario for Web publishing). XSL Transformations (XSLT) make it easy to implement such mapping tasks. XSLT leverages XPath's expressive power in a rather simple programming language whose programs are often called stylesheets. For simple tasks, XSLT mappings can be written with little real programming, by specifying how components of the source markup map to components of the target markup.	XSLT 1 (22 Slides)	Spec [http://www.w3.org/TR/xslt] · XSLT/XPath QuickRef [xslt-quickref.pdf]
2011-10-25	XML Transformations (XSLT) – Part II: XSLT processes documents by matching nodes in the document tree to templates, which are then executed to process these nodes. This process of matching and executing templates is the core of XSLT's processing model. XSLT has built-in templates that complement the user-supplied templates, so that the XSLT processor always finds a template to execute. When several templates match the same node, this conflict must be resolved by selecting the best match among them, and this conflict resolution process is also a very important component of the XSLT processing model.	XSLT 2 (25 Slides)
2011-10-27	XML Transformations (XSLT) – Part III: XSLT's template matching mechanism lets the XSLT processor find the best match for processing a selected node. XSLT also supports a more traditional way of using templates, in which they are called much like functions in most programming languages. Other interesting areas of XSLT are variables and parameters, which are used for storing or passing values within XSLT code. One special property of XSLT variables is that they cannot be changed, which is a result of the language's functional design.	XSLT 3 (24 Slides)	XSLT Parameters [http://www-128.ibm.com/developerworks/xml/library/x-tipxsltrun/]
2011-11-01	XProc: XProc is a language for describing operations to be performed on XML documents. An XML pipeline specifies a sequence of operations to be performed on zero or more XML documents. Pipelines generally accept zero or more XML documents as input and produce zero or more XML documents as output. Pipelines are made up of simple steps, which perform atomic operations on XML documents, and of constructs similar to conditionals, iteration, and exception handlers, which control which steps are executed.	XProc	Spec [http://www.w3.org/TR/xproc] · EMC Community Network: XML Technologies [https://community.emc.com/community/edn/xmltech]
2011-11-03	XML Transformations (XSLT) – Part IV: Advanced XSLT processing includes finer control over the input and output documents, for example over how whitespace is treated. Another interesting XSLT feature is keys, which provide shorthand notations for frequently used access paths to nodes and give XSLT processors more information for performance optimizations. Instructions for creating all possible kinds of nodes in the output tree make it possible to write code that generates element or attribute names based on runtime evaluations.	XSLT 4 (26 Slides)
2011-11-08	XML Transformations (XSLT) 2.0 – Part I: While XSL Transformations (XSLT) 1.0 has become a successful programming language that is widely used for transforming XML documents, its limitations sometimes make it difficult to use well. An important reason for many of these limitations is that XSLT 1.0 was designed as a client-side language. Building on XSLT 1.0 and XPath 2.0, XSLT 2.0 improves the language in a variety of ways.	XSLT 2.0 1 (27 Slides)	Spec [http://www.w3.org/TR/xslt20/] · XSLT 2.0 QuickRef [xslt2-quickref.pdf]
2011-11-10	XML Transformations (XSLT) 2.0 – Part II: Many of the new features of XSLT 2.0 have their roots in XPath 2.0 and its underlying new data model of sequences. But some features really are part of XSLT 2.0 itself, such as support for user-defined functions and the ability to group items and then iterate over these groups. In addition, XSLT 2.0 can be used as a typed programming language, which consumes and produces typed trees instead of just well-formed XML trees.	XSLT 2.0 2 (17 Slides)	Reevaluating XSLT 2.0 [http://www.oreillynet.com/xml/blog/2007/03/reevaluating_xslt_20.html]
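The XML Basics session distinguishes well-formed XML documents from text that merely looks like XML. As a minimal sketch (Python is used here only for illustration, and the sample documents are made up), the standard library's `xml.etree.ElementTree` parser rejects any input that violates the well-formedness rules:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


good = '<?xml version="1.0" encoding="UTF-8"?><course id="info242"><title>XML Foundations</title></course>'
bad = "<course><title>XML Foundations</course></title>"  # improperly nested elements
also_bad = '<course id="a" id="b"/>'                     # duplicate attribute name

print(is_well_formed(good))      # True
print(is_well_formed(bad))       # False
print(is_well_formed(also_bad))  # False
```

This checks only well-formedness; validity against a DTD is a separate step that this parser does not perform.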
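The derivation of simple types by restriction from XSD Part I can be pictured outside of any schema processor. Python's standard library has no XSD validator, so the following is a hypothetical sketch that models a simple type as a validity predicate and restriction as extra facets layered on the base type's check. The helper names `xs_integer` and `restrict` are illustrative assumptions, not part of any XSD API, and the integer check is only a rough stand-in for the real `xs:integer` lexical space.

```python
import re


def xs_integer(lexical: str) -> bool:
    """Rough stand-in for the lexical space of the built-in xs:integer type."""
    return re.fullmatch(r"[+-]?[0-9]+", lexical) is not None


def restrict(base, *, pattern=None, enumeration=None,
             min_inclusive=None, max_inclusive=None):
    """Derive a new 'type' (here: a validity predicate) by restricting `base`.

    The numeric facets assume an integer-based type; enumeration compares
    lexical forms, a simplification of XSD's value-based comparison.
    """
    def derived(lexical: str) -> bool:
        if not base(lexical):  # a restricted type never accepts more than its base
            return False
        if pattern is not None and re.fullmatch(pattern, lexical) is None:
            return False  # XSD patterns must match the entire value
        if enumeration is not None and lexical not in enumeration:
            return False
        if min_inclusive is not None and int(lexical) < min_inclusive:
            return False
        if max_inclusive is not None and int(lexical) > max_inclusive:
            return False
        return True
    return derived


# Roughly: <xs:restriction base="xs:integer">
#            <xs:minInclusive value="1"/><xs:maxInclusive value="5"/></xs:restriction>
rating = restrict(xs_integer, min_inclusive=1, max_inclusive=5)
# Restricting a restricted type again: only odd ratings remain.
odd_rating = restrict(rating, enumeration={"1", "3", "5"})
# A pattern facet on the integer type: exactly four digits.
year = restrict(xs_integer, pattern=r"[0-9]{4}")

print(rating("4"), rating("7"), rating("four"))  # True False False
print(odd_rating("3"), odd_rating("4"))          # True False
print(year("2011"), year("11"))                  # True False
```

The key design point is that `derived` always consults `base` first, so a restricted type can only accept a subset of its base type's values, which is the guarantee derivation by restriction provides.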
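For the identity constraints of XSD Part III, the following hypothetical sketch emulates an `xs:key`/`xs:keyref` pair using ElementTree's path support: selectors pick elements relative to a scope element, fields (here simply attribute names) supply the values, keys must be unique, and every reference must match a key. The `library`/`author`/`book` vocabulary and the function `check_key_keyref` are invented for the example; a real XSD processor evaluates these constraints itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; element and attribute names are illustrative only.
DOC = """<library>
  <authors>
    <author id="a1"><name>Author One</name></author>
    <author id="a2"><name>Author Two</name></author>
    <author id="a1"><name>Author Three</name></author>
  </authors>
  <books>
    <book author="a1"><title>Book A</title></book>
    <book author="a3"><title>Book B</title></book>
  </books>
</library>"""


def check_key_keyref(scope, key_selector, key_field, ref_selector, ref_field):
    """Emulate an xs:key plus an xs:keyref declared on the element `scope`.

    Selectors are evaluated relative to `scope`; fields are attribute names.
    Returns (duplicate_keys, dangling_refs). Unlike xs:key, this sketch does
    not report key elements that lack the key field; they are skipped.
    """
    keys, duplicates = set(), []
    for el in scope.iterfind(key_selector):
        value = el.get(key_field)
        if value is None:
            continue
        if value in keys:
            duplicates.append(value)  # key values must be unique within the scope
        keys.add(value)
    dangling = [el.get(ref_field) for el in scope.iterfind(ref_selector)
                if el.get(ref_field) is not None and el.get(ref_field) not in keys]
    return duplicates, dangling


root = ET.fromstring(DOC)
result = check_key_keyref(root, "authors/author", "id", "books/book", "author")
print(result)  # (['a1'], ['a3'])
```

Passing a different `scope` element is how this sketch mirrors the scoping of XSD identity constraints: the same check declared on a subtree only sees keys and references inside that subtree.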
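The contrast between the tree-based DOM and the event-based SAX from the Processing XML session can be seen with Python's standard `xml.dom.minidom` and `xml.sax` modules. This is a minimal sketch over a made-up `schedule` document; both approaches extract the same lecture titles.

```python
import xml.dom.minidom
import xml.sax

DOC = b"<schedule><lecture>XML Basics</lecture><lecture>DTD</lecture><lecture>XSD</lecture></schedule>"

# DOM: parse the whole document into an in-memory tree, then navigate it.
dom = xml.dom.minidom.parseString(DOC)
dom_titles = [node.firstChild.data for node in dom.getElementsByTagName("lecture")]


# SAX: the parser pushes events to a handler; no tree is built.
class LectureHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_lecture = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "lecture":
            self._in_lecture = True
            self._buffer = []

    def characters(self, content):
        # Character data may arrive in several chunks, so buffer it.
        if self._in_lecture:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "lecture":
            self.titles.append("".join(self._buffer))
            self._in_lecture = False


handler = LectureHandler()
xml.sax.parseString(DOC, handler)

print(dom_titles)      # ['XML Basics', 'DTD', 'XSD']
print(handler.titles)  # ['XML Basics', 'DTD', 'XSD']
```

The DOM version is shorter because the whole tree is available at once; the SAX version keeps only the state it needs, which is why event-based processing suits very large documents.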
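To illustrate the XML Namespaces session, the sketch below parses a document that mixes a default namespace with the Dublin Core element namespace. The `http://example.org/catalog` URI is a placeholder assumption. Python's ElementTree expands each name to `{namespace-URI}local-name`, and lookups go through a prefix map chosen by the program, so two `title` elements from different vocabularies stay distinct.

```python
import xml.etree.ElementTree as ET

DOC = """<catalog xmlns="http://example.org/catalog"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <item>
    <dc:title>XML Foundations</dc:title>
    <title>INFO 242</title>
  </item>
</catalog>"""

root = ET.fromstring(DOC)
# ElementTree reports names as {namespace-URI}local-name.
print(root.tag)  # {http://example.org/catalog}catalog

# The prefixes here are chosen by the program; only the URIs must match the document.
ns = {"c": "http://example.org/catalog", "dc": "http://purl.org/dc/elements/1.1/"}
dc_title = root.find("c:item/dc:title", ns).text
catalog_title = root.find("c:item/c:title", ns).text
print(dc_title)       # XML Foundations
print(catalog_title)  # INFO 242
```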
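Python's `xml.etree.ElementTree` supports only a limited subset of XPath 1.0 path syntax, but that subset is enough to show the file-system-like path expressions and predicates described in the XPath session. A minimal sketch over a made-up `course` document:

```python
import xml.etree.ElementTree as ET

DOC = """<course>
  <lecture date="2011-10-04" topic="xpath"><title>XML Path Language (XPath)</title></lecture>
  <lecture date="2011-10-06" topic="xpath"><title>XML Path Language (XPath) 2.0</title></lecture>
  <lecture date="2011-10-20" topic="xslt"><title>XSLT Part I</title></lecture>
</course>"""

root = ET.fromstring(DOC)

# Child steps, much like a file-system path.
all_titles = [t.text for t in root.findall("lecture/title")]
# A predicate on an attribute value.
xpath_titles = [t.text for t in root.findall("lecture[@topic='xpath']/title")]
# Descendant step (//) and a positional predicate (1-based, as in XPath).
first = root.find(".//lecture[1]/title").text

print(all_titles)
print(xpath_titles)  # ['XML Path Language (XPath)', 'XML Path Language (XPath) 2.0']
print(first)         # XML Path Language (XPath)
```

Full XPath 1.0 or 2.0 (functions such as `count()`, the complete set of axes, regular expressions) requires a dedicated XPath processor.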
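The XInclude and `xml:id` mechanisms from XQuery Part IV can also be resolved outside XQuery. As a minimal sketch, Python's `xml.etree.ElementInclude` module replaces `xi:include` elements using a loader function; here the loader serves hypothetical in-memory documents (`xpath.xml`, `xslt.xml`) instead of reading files, and the included elements are then indexed by their `xml:id` values.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory "files"; a real loader would fetch each href from disk or the Web.
FILES = {
    "xpath.xml": '<lecture xml:id="xpath"><title>XML Path Language (XPath)</title></lecture>',
    "xslt.xml": '<lecture xml:id="xslt"><title>XSLT Part I</title></lecture>',
}


def loader(href, parse, encoding=None):
    """Resolve an xi:include href, following ElementInclude's loader protocol."""
    if parse == "xml":
        return ET.fromstring(FILES[href])
    return FILES[href]  # parse="text" expects a string


MAIN = """<course xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="xpath.xml"/>
  <xi:include href="xslt.xml"/>
</course>"""

root = ET.fromstring(MAIN)
ElementInclude.include(root, loader=loader)  # replaces each xi:include in place

# xml:id lives in the predefined XML namespace, so ElementTree reports it in {URI}name form.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID) is not None}

print([child.tag for child in root])     # ['lecture', 'lecture']
print(by_id["xslt"].find("title").text)  # XSLT Part I
```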
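The template-matching processing model from XSLT Part II (best-matching user template, built-in templates as a fallback, conflict resolution by priority) can be mimicked in a few lines. Python's standard library has no XSLT processor (third-party packages such as lxml provide one), so this is a hypothetical emulation, not XSLT itself: templates are invented (match, priority, action) triples, results are built as strings rather than as a result tree, and import precedence is ignored.

```python
import xml.etree.ElementTree as ET


def apply_templates(elem, templates):
    """Process one element roughly the way an XSLT processor would.

    `templates` is a list of (match, priority, action) triples, a hypothetical
    stand-in for <xsl:template match="..." priority="...">.
    """
    best = None
    for match, priority, action in templates:
        # Conflict resolution: the highest priority wins; on a tie, the template
        # declared last wins (the recovery behaviour XSLT 1.0 permits).
        if match(elem) and (best is None or priority >= best[0]):
            best = (priority, action)
    if best is None:
        # Built-in template for elements: just process the children.
        return apply_children(elem, templates)
    return best[1](elem, lambda e: apply_children(e, templates))


def apply_children(elem, templates):
    """Like <xsl:apply-templates/>: process child nodes in document order.

    Text is copied through, as XSLT's built-in template for text nodes does.
    """
    parts = [elem.text or ""]
    for child in elem:
        parts.append(apply_templates(child, templates))
        parts.append(child.tail or "")
    return "".join(parts)


# Priorities follow XSLT's defaults: a plain name pattern is 0, a pattern with a predicate 0.5.
templates = [
    (lambda e: e.tag == "lecture", 0.0,
     lambda e, apply: f"<section>{apply(e)}</section>"),
    (lambda e: e.tag == "lecture" and e.get("topic") == "xslt", 0.5,
     lambda e, apply: f"<section class='xslt'>{apply(e)}</section>"),
    (lambda e: e.tag == "title", 0.0,
     lambda e, apply: f"<h2>{apply(e)}</h2>"),
]

doc = ET.fromstring('<course><lecture topic="xpath"><title>XPath</title></lecture>'
                    '<lecture topic="xslt"><title>XSLT</title></lecture></course>')
html = apply_templates(doc, templates)
print(html)
```

The second `lecture` element matches two templates; the one with priority 0.5 beats the plain name pattern with priority 0, while `course` matches no user template and is handled by the built-in rule.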

XML Foundations

INFO 242 (CCN 42596) — Fall 2011
School of Information, UC Berkeley

Instructor: Ray Larson
Secondary Instructors: Jeroen van Rotterdam and Erik Wilde
TA: Yiming Liu

Lecture: Tue&Thu 9.00–10.30, 205 South Hall
