A Conceptual Model and Rule-Based Query Language for HTML

Mengchi Liu, Tok Wang Ling

Mengchi Liu, Tok Wang Ling, A Conceptual Model and Rule-Based Query Language for HTML, World Wide Web, Springer-Verlag, 1(1-2):49-77, March 2001.

Most documents available over the Web conform to the HTML specification. Such documents are hierarchically structured in nature. The existing data models for the Web either fail to capture the hierarchical structure within the documents or can only provide a very low-level representation of that structure. How to represent and query HTML documents at a higher level is an important issue. In this paper, we first propose a novel conceptual model for HTML. This conceptual model has only a few simple constructs but is able to represent the complex hierarchical structure within HTML documents at a level that is close to human conceptualization/visualization of the documents. We also describe how to convert HTML documents based on this conceptual model. Using the conceptual model and conversion method, one can capture the essence (i.e., semistructure) of HTML documents in a natural and simple way. Based on this conceptual model, we then present a rule-based language to query HTML documents over the Internet. This language provides a simple but very powerful way to query both intra-document structures and inter-document structures and allows the query results to be restructured. Being rule-based, it naturally supports negation and recursion and is therefore more expressive than SQL-based languages. A logical semantics is also provided.

