Design of an RSS Crawler with Adaptive Revisit Manager

Bum-Suk Lee, Jin Woo Im, Byung-Yeon Hwang, Du Zhang

Abstract:

RSS (Rich Site Summary, or Really Simple Syndication) is widely used to notify readers of updates to blogs and to deliver news to readers quickly. Because RSS is very simple, it is mostly used as a web service. However, there is no satisfactory search engine for RSS. The reason is that RSS feeds are modified continuously, and the structure of general-purpose search engines is ill-suited to collecting information from RSS sources. In this paper, we discuss a web crawling algorithm and propose a structure for an RSS crawler geared toward collecting and updating RSS in the Web 2.0 environment. The proposed method (1) uses the history of visited domain names to predict the location of the RSS feed for a new seed URL, and (2) updates RSS information adaptively, based on update-checking heuristics. These approaches can serve as cornerstones for an efficient and effective RSS search engine.
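
The two ideas named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual heuristics, so everything below is an assumption made for illustration: the class name `FeedLocationPredictor`, the function `next_interval`, the halve-or-double rule, and the interval bounds are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict
from urllib.parse import urljoin, urlparse


class FeedLocationPredictor:
    """Hypothetical helper: remembers feed paths previously seen per domain."""

    def __init__(self):
        # Maps a domain name to a count of feed paths found on that domain.
        self._history = defaultdict(Counter)

    def record(self, feed_url):
        """Store the path of a feed that the crawler has actually found."""
        parts = urlparse(feed_url)
        self._history[parts.netloc][parts.path] += 1

    def predict(self, seed_url, limit=3):
        """Guess feed URLs for a new seed URL from the domain's history."""
        parts = urlparse(seed_url)
        base = f"{parts.scheme}://{parts.netloc}"
        seen = self._history.get(parts.netloc, Counter())
        return [urljoin(base, path) for path, _ in seen.most_common(limit)]


def next_interval(current, changed, lo=300.0, hi=86400.0):
    """Assumed rule: halve the wait after a change, double it otherwise.

    The result is clamped to the range [lo, hi], in seconds.
    """
    proposed = current / 2 if changed else current * 2
    return max(lo, min(hi, proposed))
```

In this sketch a feed that changes often is revisited more frequently, while a quiet feed backs off toward a daily check. A real crawler would likely combine such a rule with other signals, which the abstract does not specify.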
