Structured Data on the Web

Michael J. Cafarella, Alon Halevy, Jayant Madhavan

Michael J. Cafarella, Alon Halevy, Jayant Madhavan, Structured Data on the Web, Communications of the ACM, 54(2):72-79, February 2011.

Though the Web is best known as a vast repository of shared documents, it also contains a significant amount of structured data covering a broad range of topics, including product, financial, public-record, scientific, hobby-related, and government data. Structured data on the Web shares many similarities with the kind of data traditionally managed by commercial database systems, but it also has some unusual characteristics of its own. For example, it is embedded in textual Web pages and must be extracted before use; there is no centralized data design, as there is in a traditional database; and, unlike traditional databases, which focus on a single domain, it spans essentially every domain. Existing data-management systems do not address these challenges; they assume that their data is modeled within a well-defined domain.
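To make the point about extraction concrete, here is a minimal sketch of pulling the rows of HTML tables out of a Web page, using only Python's standard `html.parser` module. It is an illustration under stated assumptions, not the authors' extraction method: the class `TableExtractor` and the function `extract_tables` are hypothetical names, the markup is assumed to close every `<td>`, `<th>`, and `<tr>` explicitly, and nested tables are not handled. Even this toy version shows that the surrounding prose has to be discarded and that the result is only rows of strings, with no schema attached.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Hypothetical helper: collects cell text from every <table> on a page.

    Assumes well-formed markup (explicit </td>, </th>, </tr>) and no nested tables.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # each table is a list of rows; each row is a list of cell strings
        self._row = None   # cells of the row currently being read, or None outside a row
        self._cell = None  # text fragments of the cell currently being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the cell's text fragments and drop surrounding whitespace.
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text outside table cells (the page's prose) is ignored.
        if self._cell is not None:
            self._cell.append(data)


def extract_tables(html):
    """Return every table in `html` as a list of rows of cell strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


if __name__ == "__main__":
    # Hypothetical page: a sentence of prose followed by a small product table.
    page = (
        "<p>Our current prices are listed below.</p>"
        "<table><tr><th>Product</th><th>Price</th></tr>"
        "<tr><td>Widget A</td><td>$10</td></tr></table>"
    )
    print(extract_tables(page))
```

The header row comes back as ordinary data; deciding that "Product" and "Price" are attribute names rather than values is left to later processing, which is one reason extracted Web tables lack the explicit, centralized schema of a traditional database.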


