
The Stanford WebBase Project

Our DLI2 WebBase project builds on our previous Google activity. It has the following goals:

  1. Provide a storage infrastructure for Web-like content
  2. Store a sizeable portion of the Web
  3. Enable researchers to easily build indexes of page features across large sets of pages
  4. Distribute WebBase content via multicast channels

Some of the challenges we are addressing include:

The project is developing the following facilities:

  1. Smart crawling technology. This will allow us to crawl 'valuable' sites more frequently or more deeply. How to measure 'value' is, of course, itself an important research topic; Google's PageRank is one such measure.
  2. Web repository. This infrastructure will hold large numbers of Web pages, and will allow experimental search and analysis over that information. We are working on the following components:
  3. Wide-Area Web Data Distribution. Our distribution machinery will allow researchers everywhere to take advantage of our collected data. Rather than forcing all index creation and data analysis to run at the site where the data is located, our wide-area data distribution facility will stream the archive's content over multicast channels. Channels may vary in bandwidth and content, and subscribers specify the parameters of their own channels.
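PageRank is named above as one possible measure of a page's 'value'. The following is a minimal sketch, not the WebBase crawler itself, of how such a measure could drive smart crawling: it computes PageRank by power iteration over a small link graph and then orders a crawl frontier so that higher-ranked pages are fetched first. The function names `pagerank` and `crawl_order`, the damping factor of 0.85, the fixed iteration count, and the choice to spread rank from pages without outlinks evenly over all pages are assumptions made for illustration.

```python
from typing import Dict, Iterable, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 100) -> Dict[str, float]:
    """Compute PageRank by power iteration over a small, non-empty link graph.

    `links` maps each page to the pages it links to; pages that appear only
    as link targets are treated as pages with no outlinks.
    """
    # Every page mentioned as a source or a target becomes a node.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Rank held by pages without outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {page: base for page in pages}
        # Each page passes a damped share of its rank to every page it links to.
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


def crawl_order(links: Dict[str, List[str]], frontier: Iterable[str]) -> List[str]:
    """Order frontier URLs so that higher-PageRank pages are fetched first."""
    scores = pagerank(links)
    # URLs missing from the graph get score 0 and go to the back of the queue.
    return sorted(frontier, key=lambda url: scores.get(url, 0.0), reverse=True)


links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
print(crawl_order(links, ["d", "b", "a", "c"]))  # prints ['c', 'a', 'b', 'd']
```

Page "c" comes first because three pages link to it, while "d" has no inlinks and keeps only the baseline share. A real crawler would also weigh freshness and politeness, which this sketch ignores.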
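The distribution item says that channels vary in bandwidth and content and that subscribers choose those parameters. As a rough illustration only, the sketch below models one channel's parameters as a domain filter plus a byte rate, and computes which archived pages the channel would carry and when each would start transmitting. `ChannelSpec`, `plan_channel`, and the two parameters are hypothetical; the code does not open sockets or use IP multicast, and it is not how WebBase actually defines its channels.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple
from urllib.parse import urlsplit


@dataclass(frozen=True)
class ChannelSpec:
    """Hypothetical parameters a subscriber chooses for one channel."""
    domain_suffix: str      # content filter, e.g. ".edu"
    bytes_per_second: int   # bandwidth allotted to the channel


def plan_channel(spec: ChannelSpec,
                 pages: Iterable[Tuple[str, int]]) -> List[Tuple[float, str]]:
    """Select pages for a channel and compute when each one starts sending.

    `pages` yields (url, size_in_bytes) in archive order. The result lists
    (start_time_in_seconds, url) for every page that passes the filter.
    """
    schedule: List[Tuple[float, str]] = []
    clock = 0.0
    for url, size in pages:
        host = urlsplit(url).hostname or ""
        # Content parameter: skip pages outside the subscriber's domain.
        if not host.endswith(spec.domain_suffix):
            continue
        schedule.append((clock, url))
        # Bandwidth parameter: a page of `size` bytes occupies the channel
        # for size / bytes_per_second seconds.
        clock += size / spec.bytes_per_second
    return schedule


archive = [
    ("http://www-db.stanford.edu/a.html", 2000),
    ("http://example.com/", 500),
    ("http://cs.berkeley.edu/", 1000),
]
print(plan_channel(ChannelSpec(".edu", 1000), archive))
# prints [(0.0, 'http://www-db.stanford.edu/a.html'), (2.0, 'http://cs.berkeley.edu/')]
```

Because each channel is planned independently from the same archive stream, a subscriber who wants the same content faster simply asks for a higher byte rate, and a subscriber interested in other sites asks for a different filter.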

See Junghoo Cho's PowerPoint presentation for more detail.