Semantic Web

Web Architecture [./]
Fall 2010 — INFO 290 (CCN 42605)

Erik Wilde, UC Berkeley School of Information
2010-11-16

This work is licensed under a CC Attribution 3.0 Unported License [http://creativecommons.org/licenses/by/3.0/]

Contents

E. Wilde: Semantic Web

(2) Abstract

The Semantic Web can be understood either as a prepackaged set of languages and technologies for representing and working with semantics, or as a more general idea of Web Semantics, which does not predefine languages and technologies but instead looks at the various options for representing more semantics on the Web. Taking the latter approach, this lecture looks at the various ways in which semantics can be introduced on the Web, and at what each of these scenarios requires in terms of technology and information sharing.



Information About the Web

Outline (Information About the Web)

  1. Information About the Web [10]
    1. Describing Resources with Microformats [2]
    2. Publishing Resources as XML [2]
    3. Transforming Resources into RDF [2]
    4. Transforming Services into RDF [2]
  2. Conclusions [1]

(4) Web Resources



(5) Resources on the Web

Linked Resources on the Web

Describing Resources with Microformats

(7) Metadata as Markup Overlay

Embedding Microformats in Web Resources

(8) Surfacing Concepts

  • Microformats [Microformats] are embedded into HTML Web content
    • there is no need for alternative representations
    • machines use the same resources as humans (and extract microformat data)
  • Many larger Web sites are using structured back-ends
    • for pure HTML publishing, semantics are translated into HTML/CSS
    • more semantics can be represented by also producing microformats
  • Microformats can also help in decentralized service scenarios
    • centralized: data has to be submitted to a hub
    • decentralized: crawlers search for microformats and use what they find
    • advantage of decentralization: loose coupling and independent control
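
The decentralized scenario above can be sketched as a crawler-side extraction step. This is a minimal sketch, not a complete microformats parser: it assumes a page fragment marked up with the hCard class names `vcard`, `fn`, and `org`, and the `PAGE` content, the `HCardExtractor` class, and the `extract_hcards` helper are hypothetical names introduced for illustration. It only uses Python's standard `html.parser` module.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: ordinary HTML for humans, with hCard class
# names layered on top for machines.
PAGE = """
<div class="vcard">
  <span class="fn">Jane Doe</span> works at
  <span class="org">Example Corp</span>.
</div>
"""


class HCardExtractor(HTMLParser):
    """Collect the text of elements whose class names are hCard properties."""

    PROPERTIES = {"fn", "org"}  # small subset chosen for this sketch

    def __init__(self):
        super().__init__()
        self.cards = []   # one dict per vcard found
        self._open = []   # stack of (tag, property-or-None, starts_card)
        self._card = None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        starts_card = "vcard" in classes
        if starts_card:
            self._card = {}
        prop = None
        if self._card is not None:
            prop = next((c for c in classes if c in self.PROPERTIES), None)
        self._open.append((tag, prop, starts_card))

    def handle_data(self, data):
        # Text belongs to the innermost open element that names a property.
        for _tag, prop, _starts_card in reversed(self._open):
            if prop:
                self._card[prop] = self._card.get(prop, "") + data
                break

    def handle_endtag(self, tag):
        # Pop back to the matching start tag; this tolerates unclosed
        # elements such as <br> but assumes otherwise balanced markup.
        while self._open:
            open_tag, _prop, starts_card = self._open.pop()
            if starts_card:
                self.cards.append(self._card)
                self._card = None
            if open_tag == tag:
                break


def extract_hcards(html):
    parser = HCardExtractor()
    parser.feed(html)
    return parser.cards


print(extract_hcards(PAGE))
```

The publisher keeps full control over the page, and the crawler needs nothing beyond the HTML that browsers already receive, which is the loose coupling the last bullet refers to.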


Publishing Resources as XML

(10) Different Representations

Publishing Resources as XML

(11) XML as Custom Markup

  • XML [../xml-fall10] can represent arbitrary (textual) data
    • works very well for tree-structured and document-style data
    • works less well for graph-like data with no inherent order
  • XML only defines a syntax for representing ordered trees
    • what the elements and attributes mean requires agreement on the vocabulary
    • XML is a good tool for data exchange, but only for the syntax part
  • Agreement can be based on three different approaches
    1. everybody always uses a universally useful vocabulary (XHTML)
    2. user (groups) agree on vocabularies based on mutual interest
    3. user (groups) agree on modules and build vocabularies with these modules (UBL [http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=ubl])
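
The split between syntax and vocabulary can be illustrated with a small sketch. It assumes two made-up vocabularies (`person` and `contact`) that carry the same fact; the documents, the `VOCABULARIES` table, and the `interpret` function are hypothetical and exist only for this example. The XML parser handles both documents equally well, but their meaning only lines up once a mapping has been agreed on outside of XML.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents carrying the same fact in different vocabularies.
DOC_A = "<person><name>Jane Doe</name><employer>Example Corp</employer></person>"
DOC_B = "<contact><fn>Jane Doe</fn><org>Example Corp</org></contact>"

# The agreement that XML syntax alone cannot provide: which element names
# carry which meaning. These mappings are assumptions made up for this sketch.
VOCABULARIES = {
    "person": {"name": "full_name", "employer": "organization"},
    "contact": {"fn": "full_name", "org": "organization"},
}


def interpret(xml_text):
    """Parse a document and translate it using the agreed vocabulary."""
    root = ET.fromstring(xml_text)  # syntax: always works for well-formed XML
    mapping = VOCABULARIES.get(root.tag)
    if mapping is None:
        # Well-formed, but nobody has agreed on what the elements mean.
        raise ValueError(f"no agreed vocabulary for <{root.tag}>")
    return {mapping[child.tag]: child.text
            for child in root if child.tag in mapping}


print(interpret(DOC_A))
print(interpret(DOC_A) == interpret(DOC_B))
```

A document with an unknown root element still parses, but `interpret` rejects it, which mirrors the point that XML solves only the syntax part of data exchange.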


Transforming Resources into RDF

(13) Translation into a Universal Metamodel

GRDDL for Transforming Markup into RDF

(14) The Semantic Web Vision

  • RDF [Microformats; Resource Description Framework (RDF) (1)] is a more general model for structured data
    • it is very fine-granular and can represent (almost) everything
    • its granularity becomes problematic in scenarios with coarse granularity
  • Transforming resources can be done in two basic ways
    1. using well-defined mappings between HTML and RDF (GRDDL [http://www.w3.org/TR/grddl-primer/])
    2. extracting information by analysis such as NLP [http://en.wikipedia.org/wiki/Natural_language_processing]
  • Transforming resources does not require cooperation of the service
    • any service can be crawled and the resources can be transformed
  • Transforming resources does not allow access to all data
    • crawling can have limits and thus the RDF data is limited as well
    • with everything as RDF, the complete RDF graph could be queried


Transforming Services into RDF

(16) Bypassing Web Publishing

Using Structured Data Sources

(17) Trillions of Triples

  • Turn the structured data from the back-end into RDF
    • the complete dataset can be transformed in one processing step
    • relationships between resources may be better preserved than by crawling
    • keeping the snapshot current can become a significant problem
  • With everything in one database (a triple store [http://en.wikipedia.org/wiki/Triplestore]), queries across the complete dataset become possible
    • SPARQL queries can run against the graph of the complete service data
    • large datasets easily translate into several billion triples
    • triple store implementations are moving toward native RDF databases
  • Large triple stores can process complex SPARQL queries
    • performance is a problem with large datasets
    • RDF's simplicity bites back in the form of reification [http://en.wikipedia.org/wiki/Reification_%28computer_science%29#Reification_on_Semantic_Web]
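
The back-end route can be sketched in a few lines. This is a minimal sketch under stated assumptions: the `employee` table, the `http://example.org/` namespace, and the functions `rows_to_triples`, `match`, and `reify` are hypothetical, an in-memory set of tuples stands in for a triple store, and the `match` function stands in for a single SPARQL triple pattern rather than a real SPARQL engine. For brevity, literals and IRIs are both plain strings here. Only the `rdf:Statement`, `rdf:subject`, `rdf:predicate`, and `rdf:object` terms are taken from the RDF reification vocabulary.

```python
import sqlite3

EX = "http://example.org/"  # hypothetical namespace for this sketch
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def rows_to_triples(conn):
    """Turn each row of a hypothetical `employee` table into triples."""
    triples = set()
    for emp_id, name, dept in conn.execute(
            "SELECT id, name, dept FROM employee"):
        subject = f"{EX}employee/{emp_id}"
        triples.add((subject, f"{EX}name", name))
        triples.add((subject, f"{EX}worksIn", f"{EX}dept/{dept}"))
    return triples


def match(triples, s=None, p=None, o=None):
    """Return the triples matching one pattern; None acts as a variable."""
    return sorted(t for t in triples
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))


def reify(triple, statement_id):
    """Describe one triple as a resource so statements can be made about it."""
    s, p, o = triple
    return {
        (statement_id, f"{RDF}type", f"{RDF}Statement"),
        (statement_id, f"{RDF}subject", s),
        (statement_id, f"{RDF}predicate", p),
        (statement_id, f"{RDF}object", o),
    }


# Structured back-end data, transformed in one processing step.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER, name TEXT, dept TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Jane Doe", "sales"), (2, "John Roe", "sales")])
graph = rows_to_triples(conn)

in_sales = match(graph, p=f"{EX}worksIn", o=f"{EX}dept/sales")
print([s for s, _p, _o in in_sales])

# Saying anything about a single statement (source, date, confidence)
# first costs four extra triples.
statement = reify(in_sales[0], f"{EX}statement/1")
print(len(graph), len(statement))
```

Two rows with two columns each already yield four triples, and reifying a single one of them adds four more, which gives a feel for how quickly large datasets reach billions of triples.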


Conclusions

(19) Exposing Structured Data


