E. Wilde: Content Management System (CMS)

(3) Content on the Web

Web technologies describe how to get content into browsers
- HTML [Hypertext Markup Language (HTML)] is the universally supported representation for content
- HTTP [Web Foundations (URI & HTTP); Hypertext Transfer Protocol (HTTP) (1)] allows browsers to GET information from servers
Resources are never transmitted or displayed
- browsers only display resource representations
- how a representation is produced is entirely up to the server
Managing resources and producing representations is a core Web task
- resource management often is done using some proprietary system
- mapping resources to representations should be done by rules
- in many scenarios, resources are content units
A Content Management System (CMS) manages any kind of content
- and it does not necessarily provide Web access
A Web Content Management System (WCMS) has Web-specific functionality
- support for Web representations (HTML & CSS)
- support for Web patterns (navigation bars)
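
The distinction between a resource and its representations can be illustrated with a small sketch. It assumes a hypothetical `Article` resource and a rule table keyed by media type; none of these names come from the slides, and a real server would apply such rules during HTTP content negotiation.

```python
from dataclasses import dataclass
from html import escape
import json


@dataclass
class Article:
    """A resource: a content unit that is never sent to the client directly."""
    title: str
    body: str


def to_html(article: Article) -> str:
    # HTML representation for browsers.
    return f"<h1>{escape(article.title)}</h1><p>{escape(article.body)}</p>"


def to_json(article: Article) -> str:
    # JSON representation for programmatic clients.
    return json.dumps({"title": article.title, "body": article.body})


# Rules mapping a requested media type to a representation function.
RULES = {
    "text/html": to_html,
    "application/json": to_json,
}


def represent(article: Article, media_type: str) -> str:
    """Pick a representation by rule; fall back to HTML if no rule matches."""
    return RULES.get(media_type, to_html)(article)


article = Article("CMS & WCMS", "Content first.")
print(represent(article, "text/html"))
print(represent(article, "application/json"))
```

The resource itself stays the same; only the rule that is selected decides what the client receives.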

Content on the Web

Outline (Content on the Web)

Content on the Web
Content in CMS
Management in CMS
System in CMS
Conclusions

(5) Content and Structure

Content is what matters most
- content itself often has some internal structure
- content may explicitly link to other content
Macro-structure often is more representation than content
- displaying the current context of the content
- displaying related content
- displaying some overall structure (site navigation)
Content often is reusable across application scenarios
- publishers have used Content Management Systems for a long time
- adding a new publication channel to a CMS should be evolutionary
Content may require very different support depending on the channel
- newspapers need fine-tuned layout and good control over content size
- Web needs good interlinking and navigation

(6) CMS Evolution

Web servers reading from files
Web servers implementing primitive content management (SSI)
Scripting languages implementing better management
Management code getting hooked up to databases
Better handling of client-specific behavior
Databases getting more diverse (RDB, XML, RDF)

Content in CMS

(8) Serving Content from Files

(9) The Rise of the CMS

File-based content management works well for small sites
- simple site structure and small number of files
- redundant parts can be manually synchronized
- no software is required other than a Web server
Web servers soon developed rudimentary CMS functions (SSI [http://httpd.apache.org/docs/2.2/howto/ssi.html])
- rudimentary support is better than no support
- managing a non-trivial setup with SSI still is a challenge
- SSI allows includes but no backlinks and thus hides dependencies
Content management is very similar to code management
- simple setups require little or no tool support
- serious projects need tools to manage dependencies and changes
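
A minimal sketch of the dependency problem mentioned above: it expands SSI-style include directives and then computes the backlinks that SSI itself does not provide. The `<!--#include virtual="..." -->` directive is real SSI syntax, but the in-memory `FILES` dictionary, the file names, and the helper functions are assumptions made for illustration. The sketch does not detect circular includes.

```python
import re
from collections import defaultdict

# SSI directive shape: <!--#include virtual="/path" -->
INCLUDE = re.compile(r'<!--#include\s+virtual="([^"]+)"\s*-->')

# Hypothetical site content, held in memory instead of on disk.
FILES = {
    "/index.html": '<!--#include virtual="/nav.html" --><p>Home</p>',
    "/about.html": '<!--#include virtual="/nav.html" --><p>About</p>',
    "/nav.html": "<nav>Home | About</nav>",
}


def expand(path: str, files: dict[str, str]) -> str:
    """Recursively replace include directives with the included content."""
    return INCLUDE.sub(lambda m: expand(m.group(1), files), files[path])


def backlinks(files: dict[str, str]) -> dict[str, list[str]]:
    """Map each included file to the pages that include it."""
    used_by = defaultdict(list)
    for path, text in sorted(files.items()):
        for target in INCLUDE.findall(text):
            used_by[target].append(path)
    return dict(used_by)


print(expand("/index.html", FILES))
print(backlinks(FILES))
```

The backlink map answers the question a maintainer needs before editing `/nav.html`: which pages will change as a result.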

(10) Serving Content from Files with SSI

(11) Files (Opaque Chunks)

All major operating systems have file systems
Files are typically treated as opaque chunks of data
Applications may have special knowledge of file contents
Advantages of files:
- universally supported across major operating systems
- storage and user management come for free
- all that is needed for a Web site is a Web server
Disadvantages of files:
- content access requires file system access
- setting up parallel servers requires additional effort
- no support for managing structure, everything handcoded

(12) File Systems are Databases

A file system is a simple hierarchical database
- it does not know data types and simply stores any content
- its structure is a tree with a few extra tricks (such as symlinks)
Many scenarios have much more structured data models
- products, people, financial institutions all have complex data models
- content should be stored and queried based on these models
Databases are better optimized for storing structured content
- better methods for structured storage and retrieval
- better strategies for managing large datasets
- sophisticated tools for access control, backup, and versioning

(13) Tables (Relational Model)

Most widely used model for large collections of structured data
Very mature products and many skilled people available
The biggest advantage is that it is not hierarchical (no structure bias)
Advantages of relations:
- well-understood model that maps well to existing data
- the non-hierarchical model allows views from different perspectives
- highly scalable solutions available
Disadvantages of relations:
- bad for sequences and variable structures (choices, repetitions, …)
- very bad for structured documents
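
The point about views from different perspectives can be sketched with Python's built-in `sqlite3` module. The `author` and `article` tables and their rows are made up for illustration; the same two tables are queried once starting from articles and once starting from authors, with neither perspective privileged by the storage model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT,
                          author_id INTEGER REFERENCES author(id));
    INSERT INTO author  VALUES (1, 'Ada'), (2, 'Ben');
    INSERT INTO article VALUES (1, 'Files', 1), (2, 'Tables', 1), (3, 'Trees', 2);
""")

# Perspective 1: start from articles and look up their authors.
by_article = conn.execute("""
    SELECT article.title, author.name
    FROM article JOIN author ON author.id = article.author_id
    ORDER BY article.id
""").fetchall()

# Perspective 2: start from authors and count their articles.
by_author = conn.execute("""
    SELECT author.name, COUNT(article.id)
    FROM author LEFT JOIN article ON article.author_id = author.id
    GROUP BY author.id ORDER BY author.id
""").fetchall()

print(by_article)
print(by_author)
```

Note that the article body would have to be stored as one opaque text column here, which is the weakness for structured documents named in the list above.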

(14) ER Model

(15) Ordered Trees (XML)

XML [Extensible Markup Language (XML)] has a heritage of document processing
XML tools can be used standalone and are widely supported
XML and HTML have a very similar foundation
XML has two built-in directions: hierarchy and ordered children
Advantages of XML:
- maps well to HTML and XHTML
- well-suited for document-oriented content
Disadvantages of XML:
- not good at representing non-tree data
- databases not as mature as relational products

(16) XML Content

The term mixed content in XML refers to elements which have text content mixed with child elements [http://www.w3.org/TR/xml/#sec-mixed-content]. What these child elements do depends on the elements themselves, but the important point is that they are on the same level as the text nodes of the mixed content.
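
As a small illustration of mixed content, the following sketch parses a made-up paragraph with Python's standard `xml.etree.ElementTree` module and lists text and child elements in document order. The sample markup is an assumption chosen for the example, not taken from the slides.

```python
import xml.etree.ElementTree as ET

# A paragraph with mixed content: text interleaved with child elements.
p = ET.fromstring("<p>Files are <em>opaque</em> chunks of <code>data</code>.</p>")

# ElementTree stores text before the first child in .text and text
# following each child in that child's .tail, preserving document order.
parts = [("text", p.text)]
for child in p:
    parts.append((child.tag, child.text))
    parts.append(("text", child.tail))

print(parts)
print("".join(p.itertext()))
```

The text runs and the `em` and `code` elements alternate as siblings, which is the ordered-children property that makes XML a good fit for documents.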

(17) Directed Graphs (RDF)

RDF is the metamodel of the Semantic Web [Semantic Web]
Highly granular, less rigid than tables, less ordered than trees
Advantages of RDF:
- any structure can be mapped to RDF triples
- support still limited but getting better
Disadvantages of RDF:
- no model for document boundaries and self-contained units
- bad for sequences
- very bad for structured documents
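
The triple model can be sketched without any RDF library by treating each statement as a (subject, predicate, object) tuple. The `ex:` identifiers below are invented for illustration and do not refer to a real vocabulary, and `match` is a hypothetical helper, not part of any RDF toolkit.

```python
# Each statement is a (subject, predicate, object) triple.
# The identifiers are made up for illustration and are not real vocabularies.
TRIPLES = {
    ("ex:article1", "ex:title", "Files"),
    ("ex:article1", "ex:author", "ex:ada"),
    ("ex:article2", "ex:author", "ex:ada"),
    ("ex:ada", "ex:name", "Ada"),
}


def match(triples, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Everything written by ex:ada, found by following the graph edge backwards.
print(match(TRIPLES, p="ex:author", o="ex:ada"))
```

The triples live in an unordered set, so nothing in the data says which article comes first or where one document ends and the next begins, which reflects the disadvantages listed above.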

(18) Choose a Matching Metamodel

Content has some inherent metamodel properties
- forcing that into a different metamodel is possible but unwise
Using a metamodel which best matches a model is crucial
- if you have large collections of rigid and highly-structured data: Tables
- if you have structured documents with rich text: XML
- if you have fine-granular graph-structured data: RDF
Mapping is always possible but has severe limitations
- things that work effortlessly in one metamodel may be awkward in another
- there is no such thing as the one metamodel for all needs
- RDF's claim to be the one metamodel for everything is not backed by facts

Management in CMS

(20) Deconstructing Management

What is managed?
Who is managing it?
What are the management support functions?
Are there workflows and processes?
Is the management integrated with other processes?
Is it likely that processes will be followed?

(21) Managing Content with Files

(22) Integrated Management Functions

Separating management and publishing does not work well
- typical examples of workflows are review and release processes
- oftentimes publishing-specific roles are required
Integrated management takes over all tasks
- What is managed? Database of structured content.
- Who is managing it? Registered users based on roles.
- What are the management support functions? Building a site around the content.
- Are there workflows and processes? Can be based on users/roles/content.
- Is the management integrated with other processes? APIs allow extension/integration.
- Is it likely that processes will be followed? Easier to use than homegrown methods.
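
A review and release workflow based on users, roles, and content can be sketched as a small state machine. The states, actions, and role names below are hypothetical examples and do not describe any particular CMS product.

```python
from dataclasses import dataclass

# Allowed transitions: (current state, action) -> (required role, next state).
# States, actions, and roles are hypothetical examples.
WORKFLOW = {
    ("draft", "submit"): ("author", "in_review"),
    ("in_review", "approve"): ("reviewer", "approved"),
    ("in_review", "reject"): ("reviewer", "draft"),
    ("approved", "release"): ("publisher", "published"),
}


@dataclass
class ContentItem:
    title: str
    state: str = "draft"


def perform(item: ContentItem, action: str, role: str) -> None:
    """Apply an action if the workflow allows it for the given role."""
    rule = WORKFLOW.get((item.state, action))
    if rule is None:
        raise ValueError(f"cannot {action} an item in state {item.state}")
    required_role, next_state = rule
    if role != required_role:
        raise PermissionError(f"{action} requires role {required_role}")
    item.state = next_state


item = ContentItem("CMS Evolution")
perform(item, "submit", "author")
perform(item, "approve", "reviewer")
perform(item, "release", "publisher")
print(item.state)
```

Because the rules live in one table, the system can enforce the process itself instead of relying on people to remember it.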

System in CMS

(24) System Platform

Systems need runtime environments
A CMS is a program that is installed on some OS
- integrated Web server or connection to a Web server
- integrated database or connection to a database
- integrated user management or connection to user management
Typical steps for setting up a CMS
1. setting up the runtime environment
2. installing the CMS software
3. initializing the CMS installation
4. migrating existing data into the CMS installation

(25) Drupal

(26) MarkLogic

XML-based content management
- content is stored in XML documents in an XML database
- non-XML content can be stored as well
- programming is done in XQuery using extension functions
XML-based content management works well for documents
- XML's ordered trees are well-suited to represent documents
- MarkMail [http://markmail.org/] provides access to well-indexed data [http://markmail.org/search/?q=erik%20wilde] of 7,227 mailing lists
XML databases are more efficient than XML files
- XML content is indexed and access is faster
- building server farms is supported by the database management system

Content Management System (CMS)

Web Architecture [./]
Fall 2009 — INFO 290 (CCN 42593)

Erik Wilde, UC Berkeley School of Information
2009-09-24

Contents

(2) Abstract

(3) Content on the Web

Content on the Web

Outline (Content on the Web)

(5) Content and Structure

(6) CMS Evolution

Content in CMS

Outline (Content in CMS)

(8) Serving Content from Files

(9) The Rise of the CMS

(10) Serving Content from Files with SSI

(11) Files (Opaque Chunks)

(12) File Systems are Databases

(13) Tables (Relational Model)

(14) ER Model

(15) Ordered Trees (XML)

(16) XML Content

(17) Directed Graphs (RDF)

(18) Choose a Matching Metamodel

Management in CMS

Outline (Management in CMS)

(20) Deconstructing Management

(21) Managing Content with Files

(22) Integrated Management Functions

System in CMS

Outline (System in CMS)

(24) System Platform

(25) Drupal

(26) MarkLogic

Conclusions

Outline (Conclusions)

(28) Content vs. Web Pages
