The Semantic Web: Introduction and Overview
ETH Zürich, TIK
http://dret.net/netdret/docs/wilde-einiras2005-semweb/
Abstract
The Web was designed as a source of information for humans, and for many people it has become indispensable. As a vision for the future, the Web could and should be extended with information that machines can understand. This would be the foundation for a new class of applications and would also improve the interconnection of available information. The new possibilities of this "Semantic Web" are demonstrated using prototypes and simple examples.
Overview
- Do computers understand the Web?
- Computers and intelligence
- Semantic Web — How it started
- Semantic Web visions
- Semantic Web applications
- Final remarks
Do Computers Understand Humans?
Do Computers Understand the Web?
- Web content is multimedia content
- Media types label the type of the content (text/html, image/gif)
- The content has to conform to this type (HTML, pixels)
- Almost all media types are ultimately intended for humans
- Content for human perception (text, images, sound)
- Descriptive metadata (Dublin Core in HTML, XMP in PDF)
- Computers have a lot of trouble understanding content
- Understanding general text is very hard ("I see the man with the telescope.")
- Understanding images is even harder (CAPTCHA™)
- Understanding sound maybe harder still
Computers and Today's Web
- Computers are the infrastructure of the Web
- Essential for the distribution of contents
- Essential for the presentation of contents
- Essential for navigating through content
- But they have no idea what they are doing...
- They do not understand the content they are managing
- Even search is mostly based on pattern matching
- The old idea of intelligent machines promises:
- Deeper understanding of contents
- A whole new field of applications
Can Computers Understand Humans?
System error 487644: define "understand"
- Firstly, what does it mean to "understand"?
- Concepts are the foundation of all knowledge
- Categorizations based on abstractions
- Assertions describe and interrelate concepts
- And thus connect them to form a network
- Describing observed facts by using given concepts
- The assertions allow reasoning about the facts
- True intelligence: making up new concepts/assertions when necessary
- ... and verifying them because of confirmations
- ... or falsifying them because of inconsistencies
- This is (relatively) easy at first sight, but in practice ...
- ... humans live comfortably with inconsistent models
- ... formalisms have problems with inconsistent models
AI — Theory and Practice
- AI — the unkept promise of computer science
- Theory:
- Concepts and facts: Anna is female and Bob is her father
- Assertion: daughter(Child, Parent) :- father(Parent, Child), female(Child).
- Reasoning (new fact): Anna is Bob's daughter
- Practice:
- Concepts are often fuzzy (humans are either men or women)
- Assertions are often fuzzy (the sex of a person is always the same)
- Observed facts do not fit into the given concepts/assertions
- No more funding, neural networks, silence...
- Some steadfast believers: Cycorp and opencyc.org
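The theory example above (two facts, one assertion, one derived fact) can be sketched in a few lines of Python. This is only an illustration of the reasoning step, not a rule engine; the function name derive_daughters and the lowercase identifiers are assumptions made for this sketch.

```python
# Facts from the slide: Anna is female, and Bob is her father.
females = {"anna"}
fathers = {("bob", "anna")}  # (parent, child) pairs


def derive_daughters(fathers, females):
    """Apply the daughter rule: a child is a daughter of a parent
    if the parent is the child's father and the child is female."""
    return {(child, parent) for (parent, child) in fathers if child in females}


# Reasoning step: a new fact that was never stated explicitly.
daughters = derive_daughters(fathers, females)
print(daughters)  # {('anna', 'bob')}
```

The practice bullets show where such a sketch stops working: as soon as a fact does not fit the fixed concepts (for example, a person who is not listed in females for reasons the model cannot express), the rule silently derives nothing.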
What Does Cyc Know?
- OpenCyc 0.9 (2/2005): 47'000 concepts and 306'000 assertions about them
The Semantic Web — What is it?
- Machine-readable description (conceptualization) of resources
- Machine-readable assertions about concepts
- If both are present, AI-style applications are possible
"Register me at the EINIRAS 2005 conference, look for a hotel, a flight, and a rental car. Travel is possible for me between 9/10 and 9/19. I want the cheapest offer, but using a rental car or public transport to get to the conference venue should take no more than 20 minutes."
Semantic Web Stone Age
- The Web is the first "totally free" medium
- Web content is accessible world-wide without limitations
- Web content is neither censored nor controlled
- Inherent risks for control freaks
- "Good guys": protecting children and others in need
- "Bad guys": prohibiting/filtering unwanted information
- Something must be done to enable control over this medium!
Semantic Web in Action
- Platform for Internet Content Selection (PICS) syntax
<meta http-equiv="pics-label" content='(pics-1.1 "http://www.icra.org/ratingsv02.html" l gen true for "http://www.playboy.com" r (ca 1 la 1 lb 1 lc 1 nc 1 nd 1 ne 1 ng 1 ni 1 oa 1 ob 1 od 1 vz 1))' />
- Resource Description Framework (RDF) syntax
<link rel="meta" href="/labels.rdf" type="application/rdf+xml" title="ICRA labels" />
labels.rdf contains RDF statements
- Both labels mean the same using different syntaxes
- Statements about "Nudity and sexual material"
- Statements about "Language"
- Statements about "Other topics" (e.g., smoking & drinking)
W3C's Semantic Web Vision
- Resources describe themselves or are described by others
- The Resource Description Framework (RDF) allows statements (Resource, Property, Value), e.g. (ETHZ, location, Zürich)
- RDF can make statements about statements (important for trust and security issues)
- How to reason about these statements?
- Searching for descriptions satisfying given criteria
- SPARQL is a query language for RDF
- Find all resources where (*, location, Zürich)
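The statement model and the wildcard query above can be sketched in plain Python. This is a minimal illustration under stated assumptions: it uses neither an RDF library nor SPARQL, the match function is a hypothetical helper, and only the ETHZ statement comes from the slide; the other statements are added for illustration.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (resource, property, value)

# Only the first statement is from the slide; the rest are illustrative.
STATEMENTS: List[Triple] = [
    ("ETHZ", "location", "Zürich"),
    ("ETHZ", "type", "University"),
    ("UZH", "location", "Zürich"),
    ("EPFL", "location", "Lausanne"),
]


def match(statements: List[Triple],
          resource: Optional[str] = None,
          prop: Optional[str] = None,
          value: Optional[str] = None) -> List[Triple]:
    """Return all statements that fit the pattern; None acts as the '*' wildcard."""
    pattern = (resource, prop, value)
    return [
        s for s in statements
        if all(want is None or want == have for want, have in zip(pattern, s))
    ]


# The query from the slide: (*, location, Zürich)
in_zurich = [resource for resource, _, _ in
             match(STATEMENTS, prop="location", value="Zürich")]
print(in_zurich)  # ['ETHZ', 'UZH']
```

A real RDF store would identify resources and properties by URI rather than by short strings, which is what allows statements from independent sources to refer to the same thing.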
Who Do You Trust? And Why?
- The size and reach of the Web raise serious questions:
- Statements about resources are interpretations of these resources
- Interpretations are subjective and context-dependent
- Seen globally, subjectivity and context are very important
- Complex statements are very hard to check
- Important questions about the Semantic Web are social/cultural:
- Do I trust the statements about a resource?
- Do I trust the reasoning based on these statements?
- And am I responsible for the actions finally being taken?
Opportunities, Challenges and Limits
- ⊕ Clearly defined application areas
- Manageable set of concepts and assertions
- Costs for adding new concepts/assertions and creating statements
- ⊖ General information available on the Web
- Huge and unmanaged set of concepts and resources
- Sizeable costs for new resources and ontology changes
- No clearly identifiable pay-off
Application Areas of the Semantic Web
- Clearly definable area
- Clearly defined area
- Willingness to introduce self-limitations
- Willingness to bear the starting costs
- Willingness to accept the learning curve
ShaRef & Semantic Web
- ShaRef describes (bibliographic) resources
- Resources have different types (book, article, web page)
- Resources have properties of given types (title, author, date)
- ShaRef connects resources
- Resources can refer to other resources (updates, refutes)
- Resources can refer to a common vocabulary (keywords)
- ShaRef's statements about resources can be exported as RDF, ...
- ... or as HTML, or as BibTeX, or as EndNote, ...
- ... with clearly defined semantics, syntax is a technical detail
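The point that semantics matter and syntax is a technical detail can be illustrated with one record rendered in two syntaxes. This sketch is not ShaRef's implementation or data model; the record, its identifier example2005, and the functions to_triples and to_bibtex are hypothetical and exist only for this example.

```python
# A hypothetical bibliographic record (not taken from ShaRef).
record = {
    "id": "example2005",
    "type": "article",
    "title": "An Example Article",
    "author": "A. Author",
    "date": "2005",
}


def to_triples(rec):
    """Render the record as (resource, property, value) statements."""
    subject = rec["id"]
    return [(subject, key, value) for key, value in rec.items() if key != "id"]


def to_bibtex(rec):
    """Render the same record as a BibTeX entry."""
    fields = {"title": rec["title"], "author": rec["author"], "year": rec["date"]}
    body = ",\n".join(f"  {key} = {{{value}}}" for key, value in fields.items())
    return f"@{rec['type']}{{{rec['id']},\n{body}\n}}"


print(to_triples(record))
print(to_bibtex(record))
```

Both outputs carry the same statements about the same resource; only the surface syntax differs.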
Joining of Statements
- Dublin Core (DC) describes networked resources
- Small set of properties to describe resources (RDF example)
- vCard describes virtual business cards
- Small set of properties to describe contact information (RDF example)
- Friend of a Friend (FOAF) describes social networks
- Small set of properties to describe social contacts (RDF example)
- RDF is the common data model shared by these different applications
- This makes it easy to join the statements to a common graph
- DC metadata extended with contact information (RDF example)
- ... and then extended by the social network information (RDF example)
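Joining statements into a common graph can be sketched as a set union over triples. The resources (ex:report, ex:alice, ex:bob) and all values are made up for illustration, and the prefixed property names stand in for the full Dublin Core, vCard, and FOAF property URIs; the helper contacts_of_creator is likewise an assumption of this sketch.

```python
# Three independent descriptions, each using its own small vocabulary.
dc_statements = {
    ("ex:report", "dc:title", "The Semantic Web"),
    ("ex:report", "dc:creator", "ex:alice"),
}
vcard_statements = {
    ("ex:alice", "vcard:FN", "Alice Example"),
    ("ex:alice", "vcard:EMAIL", "alice@example.org"),
}
foaf_statements = {
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", "Bob Example"),
}

# Because all three are sets of triples, joining them is a plain union.
graph = dc_statements | vcard_statements | foaf_statements


def contacts_of_creator(graph, document):
    """Follow dc:creator, then foaf:knows: a question that no single
    source could answer on its own."""
    creators = {v for (s, p, v) in graph if s == document and p == "dc:creator"}
    return sorted(v for (s, p, v) in graph if s in creators and p == "foaf:knows")


print(contacts_of_creator(graph, "ex:report"))  # ['ex:bob']
```

The join only works because the sources use the same identifier for the same resource (here ex:alice); agreeing on identifiers is the part that the shared data model alone does not solve.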
That's it!
- Thanks for listening!
- Some interesting resources:
- Questions or comments: Send me e-mail