XML Foundations

UCB iSchool INFOSYS 242 (2 units)

Instructor: Erik Wilde
TA: Katrina Rhoads Lindholm

Lecture: Tu & Th 14.00–15.30, 110 South Hall
Lab: Mo 12.30–14.00, 210 South Hall

Description: Three hours of lecture, one hour of laboratory per week. The Extensible Markup Language (XML), with its ability to define formal structural and semantic definitions for metadata and information models, is the key enabling technology for information services and document-centric business models that use the Internet and its family of protocols. This course introduces XML syntax, styles and transformations, and schema languages. It balances conceptual topics with practical skills for designing and implementing conceptual models as XML schemas.

Date Subject Slides Required Reading Resources
Tuesday, August 29, 2006 Overview and Introduction: The Extensible Markup Language (XML) was introduced in 1998 to enable content providers to publish their content on the Web in an application-specific format. HTML was considered to convey too little semantics, since its only purpose was (and is) the preparation of content for Web-based publishing. XML was the first step towards machine-readable data formats for the Web, a trend that has since been taken to higher levels with the idea of the Semantic Web. XML appeared when the Web was in the steepest part of its success curve, and since then has taken over as the globally accepted format for the exchange of machine-readable structured data. Introduction (41 Slides) XML 1.0 Press Release
Thursday, August 31, 2006 XML Basics: The Extensible Markup Language (XML) defines a simple way of structuring data. The power and popularity of XML can be explained by its versatility, its platform independence, the standards and technologies that build on it, and the number of tools and products that support it. Understanding XML itself is rather simple because it depends on only a very small set of other technologies, of which Unicode and URIs are the most important. XML itself specifies two different things: a format for structured data, called XML documents, and a constraint language for XML documents, called Document Type Definition (DTD). Basics (31 Slides) Chapters 1.3 (pp. 16-28) & 2.1-2.4 (pp. 49-66) W3C's XML Specification
Tuesday, September 5, 2006 Document Type Definition (DTD): The XML specification defines a format for structured data (XML documents) and a grammar-based constraint language for these documents (DTD). In SGML-based systems, DTDs were often very complex and feature-rich constructs, which controlled a lot of the processing of SGML documents. XML greatly simplified DTDs, and de facto usage of DTDs today has simplified them even more. In many systems today, DTDs are either not used at all or are generated from sample documents. This lecture argues that DTDs (or schemas, to be more general) should be taken seriously in any non-trivial XML application, because they are a representation of the underlying (and often underspecified) data model of the application. DTD (38 Slides) Chapter 4-4.2 (pp. 108-132) XML QuickRef
Thursday, September 7, 2006 The Good, the Bad, and the Ugly: While XML is rather easy to understand and use, it is also rather easy to use XML in ways which either produce ugly XML or lead to problems in components that process the XML further. The topic of this lecture thus is to look at design guidelines for XML schemas, leading to good XML. Some of the simpler topics cover basic questions of how to map a data model to XML markup (e.g., when to use elements or attributes). The next question is how data should be represented in XML so that applications can process it efficiently. We also look at what part of the markup an application will actually have access to; this is defined by the XML Information Set (Infoset), the specification underlying many XML technologies. Best Practices (32 Slides) Chapter 3-3.4 (pp. 18-25) On XML Language Design
Tuesday, September 12, 2006 Cascading Style Sheets (CSS): Cascading Style Sheets (CSS) were designed as a language for better separating presentation-specific issues from the structuring of documents as provided by HTML. However, CSS can be applied to XML as well, either directly (by applying a CSS stylesheet to an XML document), or as a supplement to basic HTML layout structures generated from an XML document. CSS uses a simple model of selectors and declarations. Selectors specify to which elements of a document a set of declarations (each being a value assigned to a property) applies; in addition, there is a model of how property values are inherited and cascaded. The biggest limitation of CSS is that it cannot change the structure of the displayed document. CSS (39 Slides) Chapter 5 (pp. 164-204) W3C CSS Specs; W3C CSS Validator
Thursday, September 14, 2006 XML Namespaces: XML is successful because it can be used in many different scenarios, and because it is easy to define a schema (such as a DTD) for a new scenario, producing an XML data model tailored to that scenario. This means that names in XML documents must be interpreted as belonging to a certain schema. As long as a document uses names from only one schema, this can be done rather easily. However, in many scenarios today documents combine names from different schemas, and XML Namespaces provide a mechanism by which the names in an XML document can be associated with a namespace. Namespaces (27 Slides) XML Namespaces FAQ (Part I) W3C's XML Namespaces Specification
Tuesday, September 19, 2006 XML Path Language (XPath): XML structures data into a rather small number of different constructs, most notably elements and attributes. The XML Path Language (XPath) defines a way to select parts of XML documents, so that they can be used for further processing. XPath's primary use is in XSL Transformations (XSLT), but other XML technologies use it as well, e.g. XML Schema. XPath is a very compact language with a syntax that resembles the path expressions which are well-known from file systems. These path expressions, however, are generalized and therefore much more powerful than the rather simple path expressions in file systems. Because of its use in different XML technologies, XPath is one of the most important XML core technologies. XPath (46 Slides) XPath Chapter XPath QuickRef
Thursday, September 21, 2006 XML Transformations (XSLT) — Part I: Because XML can be used to represent any vocabulary (often defined by some schema), the question is how these different vocabularies can be processed and possibly transformed into something else. This something else may be another XML vocabulary (a common requirement in B2B scenarios), or it may be HTML (a common scenario for Web publishing). Using XSL Transformations (XSLT), mapping tasks can be implemented easily. XSLT leverages XPath's expressive power in a rather simple programming language. For easy tasks, an XSLT mapping can be specified without much real programming, by simply specifying how components of the source markup are mapped to components of the target markup. XSLT 1 (21 Slides)
Tuesday, September 26, 2006 XML Transformations (XSLT) — Part II: XSLT processes documents by matching nodes in the document tree to templates, which then are executed to process these nodes. This process of matching and executing templates is the core of XSLT's processing model. XSLT has built-in templates which complement the user-supplied templates, so that the XSLT processor always finds a template to execute. Templates can conflict, and it is then necessary to resolve the conflict by finding the best match among all matching templates. This conflict resolution process is also a very important component of the XSLT processing model. XSLT 2 (33 Slides)
Thursday, September 28, 2006 XML Transformations (XSLT) — Part III: Advanced XSLT processing includes better control of the input and output documents, which can be finely controlled in terms of how whitespace is treated. Keys are another interesting feature of XSLT; they allow shorthand notations for frequently used access paths to nodes, and provide XSLT processors with more information for performance optimizations. Instructions for creating all possible kinds of nodes in the output tree make it possible to write code which generates element or attribute names based on runtime evaluations. XSLT 3 (39 Slides) XSLT Parameters
Tuesday, October 3, 2006 XML Schema — Part I: XML Schema is the most popular schema language for XML today. It was introduced to overcome some of the commonly observed limitations of DTDs, most notably the lack of typing. Simple types describe content which is not structured by XML markup, which means they describe attribute values and character-only element content. New simple types can be derived from existing types by type restriction. Complex types describe elements that carry attributes and/or have content other than character data alone. Using XML Schema's type concepts, it is easier to represent model-level information in a schema, because type hierarchies can represent model-level specializations. XSD 1 (34 Slides) Chapters 4.3 & 4.4 (pp. 132-159) XML Schema QuickRef
Thursday, October 5, 2006 XML Schema — Part II: XML Schema allows greater flexibility in defining constraints on intra-document references than the ID/IDREF construct of DTDs. XML Schema's Identity Constraints are scoped, typed, and can be used for elements or attributes. The second aspect of XML Schema discussed today is the derivation of complex types. Complex types can be derived by restriction or extension. Complex type restriction defines the derived type to be a more constrained version of the base type. Complex type extension makes it possible to extend the base type by adding either attributes or content (content can only be appended to the end of the content model). XSD 2 (35 Slides) XML Schema Identity Constraints
Tuesday, October 10, 2006 From Model to Markup: While XML is very useful for representing and manipulating structured data, the question remains where these structures come from. They are usually some kind of encoding for a conceptual model, but there is no established and universally accepted way to connect the modeling world with XML markup. Some of the challenges and approaches to XML and modeling will be presented in this lecture. The goal of this lecture is to raise awareness of the current gap between models and markup, and of practical approaches to bridging that gap. Modeling (43 Slides)
Tuesday, October 17, 2006 Alternative Schema Languages — Schematron: While XML Schema is the most popular schema language in use today and for the foreseeable future, it is only one representative of a class of languages which are all designed for the purpose of testing whether some XML document satisfies a set of constraints. This test could of course also be conducted programmatically, but such code is not portable and not easily maintainable. Schema languages thus often use a declarative approach to specifying how to conduct validation. A very simple yet very powerful language for this is Schematron, which uses the expressive power of XPath for testing whether a document satisfies a set of conditions. Schematron is rule-based, in contrast to the more traditional grammar-based schema languages, and complements these very well. Schema Languages (38 Slides) Chapter 4.5 (pp. 159-163) The Design of RELAX NG; Schematron
Thursday, October 19, 2006 XML and Database Systems: XML is the most popular data format for exchanging data, but the majority of data within applications and closed systems is still stored in Relational Database Management Systems (RDBMS). This leads to two main issues. The first is how data can be moved between XML formats and RDBMS easily and efficiently. The second is how to map the data models between these two worlds. Relational data can easily be represented in XML, because tables can be easily represented as trees. Things can be more complicated in the other direction, because arbitrary XML can be hard to store in a relational database. For XML-centric scenarios, XML Database Management Systems (XDBMS) are an interesting alternative, which provide XML-specific query capabilities with XML Query (XQuery). XML & DB (39 Slides) XML and Databases XML Programming with SQL/XML and XQuery; SQL/XML; XML Query
Tuesday, October 24, 2006 XML Trends & Developments: XML is a very basic technology for representing trees using a standardized markup-based syntax. An increasing number of technologies are building on this foundation, creating an expanding field of XML-based technologies for interoperability in many different fields. Application-specific XML-based data formats are used in many different settings, and the best data format for a given scenario depends on the existing formats in this area and the exact requirements. More interestingly, generic XML technologies which can be applied in many different settings make it easier for developers and system integrators to achieve their goal of making systems interoperate. XML Trends (27 Slides) W3C XML Activity Statement
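
Several of the topics in the schedule (XML documents, namespaces, and XPath-style selection) can be illustrated in a few lines. The following is only a minimal sketch, not part of the course materials: it assumes Python's standard xml.etree.ElementTree module, which supports a limited subset of XPath, and the sample document, the namespace URI http://example.org/course, and the helper names lecture_titles and lecture_on are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the namespace URI and all element and
# attribute names are made up for illustration.
DOC = """\
<c:course xmlns:c="http://example.org/course" code="INFOSYS 242">
  <c:lecture date="2006-09-19"><c:title>XPath</c:title></c:lecture>
  <c:lecture date="2006-09-21"><c:title>XSLT 1</c:title></c:lecture>
</c:course>
"""

# Prefix-to-URI mapping used when evaluating path expressions. The prefix
# chosen here is independent of the prefix used inside the document.
NS = {"c": "http://example.org/course"}


def lecture_titles(xml_text):
    """Return the text of every title that is a child of a lecture element."""
    root = ET.fromstring(xml_text)
    return [title.text for title in root.findall("c:lecture/c:title", NS)]


def lecture_on(xml_text, date):
    """Return the title of the lecture whose date attribute equals `date`."""
    root = ET.fromstring(xml_text)
    # Attribute predicate: selects only lecture elements with a matching date.
    match = root.find(f"c:lecture[@date='{date}']/c:title", NS)
    return None if match is None else match.text


if __name__ == "__main__":
    print(lecture_titles(DOC))
    print(lecture_on(DOC, "2006-09-19"))
```

The path expressions read much like file system paths, while the namespace mapping ensures that names are matched by namespace URI rather than by the prefix that happens to appear in the document.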

Creative Commons License. Please send comments to dret@berkeley.edu.
last modification on Tuesday, 23-Jan-2007 18:15:52 EST