The Semantic Web: Introduction and Overview
ETH Zürich, TIK
The Web was designed as a source of information for humans, and for many it has become indispensable. As a vision for the future, the Web could and should be extended with information that machines can understand. This would be the foundation for a new class of applications and would also improve the interconnection of available information. The new possibilities of this "Semantic Web" are demonstrated with prototypes and simple examples.
- Do computers understand the Web?
- Computers and intelligence
- Semantic Web — How it started
- Semantic Web visions
- Semantic Web applications
- Final remarks
Do Computers Understand Humans?
Do Computers Understand the Web?
- Web content is multimedia content
- Media types label the type of the content
- The content has to conform to this type (HTML, pixels)
- Almost all media types are ultimately intended for humans
- Content for human perception (text, images, sound)
- Descriptive metadata (Dublin Core in HTML, XMP in PDF)
- Computers have a lot of trouble understanding content
- Understanding general text is very hard ("I see the man with the telescope.")
- Understanding images is even harder (CAPTCHA™)
- Understanding sound may be harder still
Computers and Today's Web
- Computers are the infrastructure of the Web
- Essential for the distribution of content
- Essential for the presentation of content
- Essential for navigating through content
- But they have no idea what they are doing...
- They do not understand the content they are managing
- Even search is mostly based on pattern matching
- The old idea of intelligent machines that offer:
- A deeper understanding of content
- A whole new field of applications
Can Computers Understand Humans?
System error 487644: define "understand"
- Firstly, what does it mean to "understand"?
- Concepts are the foundation of all knowledge
- Categorizations based on abstractions
- Assertions describe and interrelate concepts
- And thus connect them to form a network
- Describing observed facts by using given concepts
- The assertions allow reasoning about the facts
- True intelligence: making up new concepts/assertions when necessary
- ... and verifying them because of confirmations
- ... or falsifying them because of inconsistencies
- This is (relatively) easy at first sight, but in practice ...
- ... humans live comfortably with inconsistent models
- ... formalisms have problems with inconsistent models
AI — Theory and Practice
- AI — the unkept promise of computer science
- Concepts and facts: Anna is female and Bob is her father
daughter(Child, Parent) :- father(Parent, Child), female(Child).
- Reasoning (new fact): Anna is Bob's daughter
- Concepts are often fuzzy (humans are either men or women)
- Assertions are often fuzzy (the sex of a person is always the same)
- Observed facts do not fit into the given concepts/assertions
- No more funding, neural networks, silence...
- Some steadfast believers: Cycorp and opencyc.org
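The daughter rule on this slide can be sketched in Python as a single rule application over stored facts. This is a minimal illustration under stated assumptions, not how a logic programming engine works internally: the fact sets and the function name `derive_daughters` are invented for the example.

```python
# Facts, stored as plain Python sets (an illustrative encoding).
female = {"anna"}
father = {("bob", "anna")}  # (parent, child): Bob is Anna's father


def derive_daughters(father_facts, female_facts):
    """Apply the rule: a child is a daughter of a parent if that
    parent is the child's father and the child is female."""
    return {
        (child, parent)
        for (parent, child) in father_facts
        if child in female_facts
    }


# The new fact is not stored anywhere; it is derived from the rule.
daughters = derive_daughters(father, female)
print(daughters)
```

The fuzziness problems listed above show up immediately in such an encoding: a fact that does not fit the two given sets simply cannot be expressed.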
What Does Cyc Know?
- OpenCyc 0.9 (February 2005): 47,000 concepts and 306,000 assertions about them
The Semantic Web — What is it?
- Machine-readable description (conceptualization) of resources
- Machine-readable assertions about concepts
- If both are present, AI-style applications are possible
"Register me at the EINIRAS 2005 conference, look for a hotel, a flight, and a rental car. Travel is possible for me between 9/10 and 9/19. I want the cheapest offer, but using a rental car or public transport to get to the conference venue should take no more than 20 minutes."
Semantic Web Stone Age
- The Web is the first "totally free" medium
- Web content is accessible world-wide without limitations
- Web content is neither censored nor controlled
- Inherent risks for control freaks
- "Good guys": protecting children and others in need
- "Bad guys": prohibiting/filtering unwanted information
- Something must be done to enable control over this medium!
Semantic Web in Action
- Platform for Internet Content Selection (PICS) syntax
<meta http-equiv="pics-label" content='(pics-1.1 "http://www.icra.org/ratingsv02.html" l gen true for "http://www.playboy.com" r (ca 1 la 1 lb 1 lc 1 nc 1 nd 1 ne 1 ng 1 ni 1 oa 1 ob 1 od 1 vz 1))' />
- Resource Description Framework (RDF) syntax
<link rel="meta" href="/labels.rdf" type="application/rdf+xml" title="ICRA labels" />
labels.rdf contains RDF statements
- Both labels say the same thing in different syntaxes
- Statements about "Nudity and sexual material"
- Statements about "Language"
- Statements about "Other topics" (e.g., smoking & drinking)
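The rating part of the PICS label is a flat list of category/value pairs, which makes it easy for a machine to read. The following is a rough sketch of that step, assuming the label has the shape shown in the meta element on this slide. The helper name `parse_pics_ratings` is hypothetical, and this is not a complete PICS parser.

```python
import re

# Label text as in the meta element on this slide (balanced parentheses assumed).
LABEL = (
    '(pics-1.1 "http://www.icra.org/ratingsv02.html" l gen true '
    'for "http://www.playboy.com" '
    'r (ca 1 la 1 lb 1 lc 1 nc 1 nd 1 ne 1 ng 1 ni 1 oa 1 ob 1 od 1 vz 1))'
)


def parse_pics_ratings(label):
    """Return the category/value pairs that follow the 'r' keyword."""
    match = re.search(r'\br\s*\(([^()]*)\)', label)
    if match is None:
        return {}
    tokens = match.group(1).split()
    # Tokens alternate between a category code and its numeric value.
    return {code: int(value) for code, value in zip(tokens[0::2], tokens[1::2])}


ratings = parse_pics_ratings(LABEL)
print(ratings)
```

What each category code means is defined by the rating service named in the label, not by the label itself.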
W3C's Semantic Web Vision
- Resources describe themselves or are described by others
- The Resource Description Framework (RDF) allows statements of the form (Resource, Property, Value), e.g. (ETHZ, location, Zürich)
- RDF can make statements about statements (important for trust and security issues)
- How to reason about these statements?
- Searching for descriptions satisfying given criteria
- SPARQL is a query language for RDF
- Find all resources matching the pattern (*, location, Zürich)
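The wildcard query on this slide can be sketched with plain tuples. This is neither SPARQL nor an RDF library. Every triple except (ETHZ, location, Zürich) and the function name `match` are made up for illustration.

```python
# Each statement is a (resource, property, value) triple.
triples = [
    ("ETHZ", "location", "Zürich"),
    ("ETHZ", "type", "University"),    # hypothetical extra statement
    ("UZH", "location", "Zürich"),     # hypothetical extra statement
    ("EPFL", "location", "Lausanne"),  # hypothetical extra statement
]

ANY = None  # plays the role of "*" in the pattern


def match(pattern, statements):
    """Return all statements that agree with the pattern on every
    slot that is not a wildcard."""
    return [
        s for s in statements
        if all(p is ANY or p == v for p, v in zip(pattern, s))
    ]


in_zurich = [s[0] for s in match((ANY, "location", "Zürich"), triples)]
print(in_zurich)
```

In SPARQL the wildcard becomes a variable in the resource position, and the property and value are normally identified by URIs rather than bare words.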
Who do you Trust? And Why?
- The size and extension of the Web raise serious questions:
- Statements about resources are interpretations of these resources
- Interpretations are subjective and context-dependent
- Seen globally, subjectivity and context are very important
- Complex statements are very hard to check
- Important questions about the Semantic Web are social/cultural:
- Do I trust the statements about a resource?
- Do I trust the reasoning based on these statements?
- And am I responsible for the actions that are ultimately taken?
Opportunities, Challenges and Limits
- ⊕ Clearly defined application areas
- Manageable set of concepts and assertions
- Costs for adding new concepts/assertions and creating statements
- ⊖ General information available on the Web
- Huge and unmanaged set of concepts and resources
- Sizeable costs for new resources and ontology changes
- No clearly identifiable pay-off
Application Areas of the Semantic Web
- Clearly definable area
- Clearly defined area
- Willingness to introduce self-limitations
- Willingness to bear the starting costs
- Willingness to accept the learning curve
ShaRef & Semantic Web
- ShaRef describes (bibliographic) resources
- Resources have different types (book, article, web page)
- Resources have properties of given types (title, author, date)
- ShaRef connects resources
- Resources can refer to other resources (updates, refutes)
- Resources can refer to a common vocabulary (keywords)
- ShaRef's statements about resources can be exported as RDF, ...
- ... or as HTML, or as BibTeX, or as EndNote, ...
- ... with clearly defined semantics; syntax is a technical detail
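The point that syntax is only a technical detail can be illustrated by exporting one record in two ways. This is a sketch under assumptions, not ShaRef's actual data model or export code: the record, its field names, and the functions `to_triples` and `to_bibtex` are all hypothetical.

```python
# One bibliographic record; the field names and values are illustrative only.
record = {
    "id": "example2005",
    "type": "article",
    "title": "An Example Title",
    "author": "A. Author",
    "date": "2005",
}


def to_triples(rec):
    """Express every property as a (resource, property, value) statement."""
    return [(rec["id"], key, value) for key, value in rec.items() if key != "id"]


def to_bibtex(rec):
    """Express the same properties in BibTeX syntax."""
    fields = [("title", rec["title"]), ("author", rec["author"]), ("year", rec["date"])]
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
    return f"@{rec['type']}{{{rec['id']},\n{body}\n}}"


print(to_triples(record))
print(to_bibtex(record))
```

Both outputs are derived from the same record, so they carry the same information; only the surface form differs.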
Joining of Statements
- Dublin Core (DC) describes networked resources
- Small set of properties to describe resources (RDF example)
- vCard describes virtual business cards
- Small set of properties to describe contact information (RDF example)
- Friend of a Friend (FOAF) describes social networks
- Small set of properties to describe social contacts (RDF example)
- RDF is the common data model used by these different vocabularies
- This makes it easy to join the statements into a common graph
- DC metadata extended with contact information (RDF example)
- ... and then extended by the social network information (RDF example)
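Joining statements from different vocabularies can be sketched as a set union over triples. The URIs, names, and values below are invented, and the prefixed property names are written as plain strings as shorthand. A real application would use an RDF library and full property URIs.

```python
# Statements from three vocabularies about the same hypothetical URIs.
DOC = "http://example.org/report"
PERSON = "http://example.org/people/alice"
FRIEND = "http://example.org/people/bob"

dc_statements = {(DOC, "dc:title", "Annual Report"), (DOC, "dc:creator", PERSON)}
vcard_statements = {(PERSON, "vcard:fn", "Alice Example")}
foaf_statements = {(PERSON, "foaf:knows", FRIEND)}

# Joining is a set union: identical URIs become the same node of the graph.
graph = dc_statements | vcard_statements | foaf_statements


def describe(resource, statements):
    """Collect every property/value pair stated about one resource."""
    return sorted((p, v) for s, p, v in statements if s == resource)


# Follow dc:creator from the document to the person, then read what the
# other two vocabularies say about that person.
creator = next(v for s, p, v in graph if s == DOC and p == "dc:creator")
print(describe(creator, graph))
```

The join works only because all three sources identify the person by the same URI; with differing identifiers the graphs would stay disconnected.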
- Thanks for listening!
- Some interesting resources:
- Questions or comments: Send me e-mail