InfoBus Proxy Development Kit

DLIOP-Compliant Repository Proxies

This Document

- Document the DocCursor() call and concept.


The Stanford InfoBus is a facility that makes information repositories and services accessible via remote CORBA method calls. Examples of information repositories include databases, commercial information brokers, Web search engines, and metadata repositories. Examples of information processing services include document summarizers, indexers, visualization services, and filtering tools.


Each repository or service is represented on the InfoBus by a 'Library Service Proxy' (LSP), often simply called a 'proxy'. At one end, a proxy is a CORBA object that responds to standard method calls; at the other, it knows how to communicate with its particular service. The advantage of this proxy-based arrangement is that differences in how these services must be accessed are masked, which makes it much easier to write applications that communicate with many repositories or services.


This ease of use relies on some standardization of the proxy interfaces: Which methods can be called on a proxy? Which parameters do they take? We have addressed this issue for proxies to repositories by developing a standard protocol for communicating with repositories on the InfoBus, the Digital Library Interoperability Protocol (DLIOP). This protocol specifies how clients submit a query, how results are returned synchronously or asynchronously, and how clients may ask for details of a document, such as the values of individual properties (author, title, etc.). The DLIOP is documented on the Web. A CORBA interface definition language (IDL) specification is at http://www-diglib.stanford.edu/diglib/pub/software/IDLInterchange.idl.


The DLIOP is very flexible: it allows stateful as well as stateless implementations, synchronous or asynchronous operation, and dynamic load balancing even during a session. While we need this flexibility for our research, not every application does. We therefore prepared a software development kit that allows proxies to be built quickly and without an understanding of DLIOP. This document explains that development kit. An appendix provides a very quick summary of how DLIOP functions.



The development kit documented here contains two parts: one to easily build new proxies, and one to easily build a client.

Basic Model

Figure 2 shows the model of the proxy kit.

Figure 2: New Sources Only Need a Simple 'CollectionPlug' to Operate on the InfoBus

The InfoBus is accessed from different clients via DLIOP, as exemplified by the oval at the top. Some sources interact with clients via DLIOP directly (top source). The sources this document is concerned with are instead accessed through two modules. The DLIOPWrapper is a module that comes with the development kit; it handles all of the DLIOP protocol. The only module the proxy developer writes is the proxy's collection plug. At its bottom, a collection plug interacts with its source in whatever manner is appropriate (SQL, HTTP, Telnet, etc.). At its top, the plug provides an extremely simple API we call the Simple Collection API (SCAPI). The methods in this API are documented below.

From the collection plug's point of view, interactions with the client (via the DLIOP wrapper) follow this simple protocol:

  1. Wrapper tells plug to perform a query on the underlying repository. The wrapper tells the plug how many documents to retrieve right away as part of the querying process.
  2. Plug performs the query and retrieves the documents. It (at least conceptually) stores them in a private result array and tells the wrapper how many results it retrieved. The plug maintains enough state that it can retrieve more result documents from the repository later on.
  3. The wrapper may ask the plug how many total hits were found.
  4. Eventually, the wrapper asks the plug for the result hits, one at a time. It does this by asking for a set of document properties, such as 'author' or 'title', to be returned for each document. Documents are identified by their index in the result array. One special document property, 'content', addresses the whole document rather than particular fields.
  5. By asking for a document with an index beyond the end of the current result array, the wrapper can cause the plug to retrieve more documents.
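The five steps can be illustrated with a small in-memory sketch. The class names `ToyPlug` and `WrapperDemo`, the substring "query language", and the single title property are all hypothetical; a real plug talks to its source (SQL, HTTP, etc.) and returns whichever properties were selected. What the sketch shows is the shape of the exchange: an initial batch from `Search()`, an optional `TotalItems()` call, and per-index requests that pull in more results when the wrapper asks past the end of the result array.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory plug modeling only steps 1-5. Documents are plain
// title strings; the "query" is a case-sensitive substring match.
class ToyPlug {
    private final List<String> repository;                  // stands in for the real source
    private final List<String> hits = new ArrayList<>();    // all matches (a real plug would keep a cursor instead)
    private final List<String> results = new ArrayList<>(); // the plug's private result array

    ToyPlug(List<String> repository) {
        this.repository = repository;
    }

    // Steps 1-2: run the query, retrieve up to 'initial' documents right away,
    // and report how many were actually retrieved.
    int Search(String query, int initial) {
        hits.clear();
        results.clear();
        for (String doc : repository) {
            if (doc.contains(query)) {
                hits.add(doc);
            }
        }
        fetchUpTo(initial);
        return results.size();
    }

    // Step 3: total number of hits.
    short TotalItems() {
        return (short) hits.size();
    }

    // Steps 4-5: return the (single) property of one hit. An index beyond the
    // current result array makes the plug retrieve more documents first.
    String[] GetPropertyValues(int index) {
        fetchUpTo(index + 1);
        if (index < 0 || index >= results.size()) {
            throw new ArrayIndexOutOfBoundsException(index);
        }
        return new String[] { results.get(index) };
    }

    // Move hits into the result array until it holds n documents or hits run out.
    private void fetchUpTo(int n) {
        while (results.size() < n && results.size() < hits.size()) {
            results.add(hits.get(results.size()));
        }
    }
}

// The wrapper's side: search with a small first batch, ask for the total,
// then request hits one index at a time.
class WrapperDemo {
    static List<String> drive(ToyPlug plug, String query, int initial) {
        List<String> titles = new ArrayList<>();
        plug.Search(query, initial);       // steps 1-2
        int total = plug.TotalItems();     // step 3
        for (int i = 0; i < total; i++) {  // steps 4-5
            titles.add(plug.GetPropertyValues(i)[0]);
        }
        return titles;
    }
}
```

Because the plug fills its result array on demand, the wrapper can ask for a small first batch and still walk every hit; the plug only retrieves what is actually requested.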

We are keeping SCAPI as simple as possible. It consists of the following methods, all of which are invoked by the DLIOP wrapper:

Collection Plug SCAPI Specification

Methods:

short TotalItems(): Returns how many documents the collection plug has found to match the query. The value is -1 if the total number of items cannot be determined. This is sometimes the case for proxies to Web search engines that don't provide an explicit total hit count.

MoreAvailable(): Returns true if the collection plug has found more hits than the client has requested so far, and false otherwise.
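The -1 convention means code driving a plug should not assume a total is known. The following is a sketch of one way to report hit counts under that constraint; the `HitCounts` interface and `HitSummary` class are hypothetical, and only the two method names and the -1 sentinel come from the specification.

```java
// Hypothetical view of the two SCAPI calls discussed above.
interface HitCounts {
    short TotalItems();       // -1 when the total cannot be determined
    boolean MoreAvailable();  // true if more hits were found than requested so far
}

class HitSummary {
    // Describes the hits delivered so far without assuming a total is known.
    static String describe(HitCounts plug, int delivered) {
        short total = plug.TotalItems();
        if (total >= 0) {
            return delivered + " of " + total + " hits";
        }
        // No total available: fall back on MoreAvailable().
        return delivered + " hits" + (plug.MoreAvailable() ? ", more available" : "");
    }
}
```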

TheQuery(): Returns the query the collection plug is working on, i.e., the one passed in the Search() call.

GetDefaultPropertyNames(): Returns the names of the document properties the collection plug is able to extract from documents and return to the client. For example, if the collection plug supports ['title', 'author'], then the client may ask that the title, the author, or both be included with every result hit. Document objects placed in the client's result collection would then have these properties set. (See also SetProperties() and GetPropertyNames().)

GetPropertyNames(int): Returns the names of the properties available for the document at the given index in the result array.

GetPropertyNames(): Returns the list of document property names the plug will include in result objects (i.e., the value set by a previous SetProperties() call).

SetProperties(String[]): Tells the collection plug the names of the document properties to include in result objects. (See also GetDefaultPropertyNames() and GetPropertyNames().) If any of the properties is not legal for this collection plug (i.e., is not in the set returned by GetDefaultPropertyNames()), a java.lang.IllegalArgumentException is raised.
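The following sketch shows one way a plug might keep track of its property names. `PropertySettings` is a hypothetical helper, and starting with all default properties selected is an assumption the specification does not settle. The names are validated before anything is changed, so a rejected `SetProperties()` call leaves the previous selection in place.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for the property-name methods of a plug.
class PropertySettings {
    private final String[] defaults; // reported by GetDefaultPropertyNames()
    private String[] selected;       // reported by GetPropertyNames()

    PropertySettings(String... defaults) {
        this.defaults = defaults.clone();
        this.selected = defaults.clone(); // assumption: all defaults selected initially
    }

    String[] GetDefaultPropertyNames() {
        return defaults.clone();
    }

    String[] GetPropertyNames() {
        return selected.clone();
    }

    // Validate every name first, so a rejected call changes nothing.
    void SetProperties(String[] names) {
        Set<String> legal = new HashSet<>(Arrays.asList(defaults));
        for (String name : names) {
            if (!legal.contains(name)) {
                throw new IllegalArgumentException("unsupported property: " + name);
            }
        }
        selected = names.clone();
    }
}
```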

Search(String, int): This method performs an actual query on the underlying collection. The first parameter is the query, in a query language appropriate to the underlying collection. The second parameter is the number of items to retrieve right away. In response to this call, the method performs the query and (at least conceptually) places the resulting documents in a result array. It returns the number of items retrieved. Note that this number may be smaller than the number requested in the second parameter if fewer documents are found. If the query is not well-formed, a java.text.ParseException is raised.
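A sketch of `Search()` for a hypothetical keyword collection. The query language here (letters, digits, and whitespace only; every word must occur in a title, ignoring case) is an assumption made for illustration. It shows the points the specification makes: the return value may be smaller than the requested count, and a malformed query raises `java.text.ParseException`, whose error offset can point at the offending character. It also records the query for `TheQuery()`. A real plug would additionally keep enough state to fetch further hits later.

```java
import java.text.ParseException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical plug over an in-memory list of titles. Assumed query language:
// letters, digits, and whitespace only; every word must occur in a title
// (case-insensitive).
class KeywordPlug {
    private final List<String> repository;
    private final List<String> results = new ArrayList<>();
    private String query = "";

    KeywordPlug(List<String> repository) {
        this.repository = repository;
    }

    String TheQuery() {
        return query;
    }

    // Places up to 'count' matches in the result array and returns how many
    // were retrieved, which may be fewer than 'count'.
    int Search(String query, int count) throws ParseException {
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (!Character.isLetterOrDigit(c) && !Character.isWhitespace(c)) {
                throw new ParseException("unexpected character '" + c + "'", i);
            }
        }
        String[] words = query.trim().toLowerCase().split("\\s+");
        if (words[0].isEmpty()) { // blank query splits to a single empty string
            throw new ParseException("empty query", 0);
        }
        this.query = query; // only recorded once the query is known to be valid
        results.clear();
        for (String title : repository) {
            if (results.size() >= count) {
                break;
            }
            String lower = title.toLowerCase();
            boolean allWordsPresent = true;
            for (String w : words) {
                if (!lower.contains(w)) {
                    allWordsPresent = false;
                    break;
                }
            }
            if (allWordsPresent) {
                results.add(title);
            }
        }
        return results.size();
    }
}
```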

GetPropertyValues(int): This method is how the software using the plug retrieves the values of the document properties. The parameter is an index into the result array. For example, assume that a previous SetProperties() call instructed the plug to return 'author' and 'title'. Assume further that a previous query returned 7 items. The call GetPropertyValues(3) would return an array of two strings, the author and title of the 4th document in the result array (indexes start at 0). If the index lies beyond the end of the result array, the plug attempts to retrieve more documents from the underlying collection. These documents are appended to the result array, and the desired information is returned. If the requested index is out of range, that is, if no document exists at that index even after all matching documents have been retrieved, a java.lang.ArrayIndexOutOfBoundsException is raised.
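The lazy retrieval behavior can be sketched as follows. `LazyPlug`, its batch size of 3, and the `retrieved()` helper are hypothetical; the list of matches stands in for a repository-side result set that can only be read a batch at a time. Values come back in the order of the selected properties (here passed to the constructor rather than through `SetProperties()`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical plug showing lazy retrieval in GetPropertyValues(int).
// 'matches' stands in for a repository result set read in batches of BATCH.
class LazyPlug {
    private static final int BATCH = 3; // assumed batch size
    private final List<Map<String, String>> matches;
    private final List<Map<String, String>> results = new ArrayList<>(); // result array
    private final String[] properties; // as if set by SetProperties()

    LazyPlug(List<Map<String, String>> matches, String[] properties) {
        this.matches = matches;
        this.properties = properties.clone();
    }

    // Number of documents currently in the result array.
    int retrieved() {
        return results.size();
    }

    String[] GetPropertyValues(int index) {
        // Past the end of the result array: pull more batches from the source.
        while (index >= results.size() && results.size() < matches.size()) {
            int end = Math.min(results.size() + BATCH, matches.size());
            results.addAll(matches.subList(results.size(), end));
        }
        if (index < 0 || index >= results.size()) {
            throw new ArrayIndexOutOfBoundsException(index);
        }
        Map<String, String> doc = results.get(index);
        String[] values = new String[properties.length];
        for (int i = 0; i < properties.length; i++) {
            values[i] = doc.get(properties[i]); // same order as the selected properties
        }
        return values;
    }
}
```

With seven matches and properties {"author", "title"}, `GetPropertyValues(3)` retrieves two batches (six documents) and returns the author and title of the fourth document; asking for index 7 retrieves the seventh document and then throws `ArrayIndexOutOfBoundsException`.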

GetPropertyValues(int, String[]): This is like GetPropertyValues(int), but the properties to be returned are explicitly included in the call. In addition to java.lang.ArrayIndexOutOfBoundsException, this method throws java.lang.IllegalArgumentException if unsupported properties are requested.

CheckPropAvailability(String): Returns true if the given property is supported by the collection plug.

CheckPropAvailability(String[]): Returns true if all of the given properties are supported by the collection plug.


If the collection plug provides these methods, the source will be accessible from any InfoBus client.
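For reference, the SCAPI methods can be collected into a Java interface. The interface name `CollectionPlug` and every signature except `short TotalItems()` are assumptions inferred from the descriptions above, and the default methods (Java 8 or later) are a convenience of this sketch, not part of the kit. `TablePlug` is a hypothetical plug over a fixed table that treats every row as a match; it shows the minimum a plug has to supply.

```java
import java.text.ParseException;
import java.util.Arrays;
import java.util.HashSet;

// SCAPI as a Java interface (signatures other than TotalItems() are assumed).
interface CollectionPlug {
    short TotalItems();
    boolean MoreAvailable();
    String TheQuery();
    String[] GetDefaultPropertyNames();
    String[] GetPropertyNames(int index);
    String[] GetPropertyNames();
    void SetProperties(String[] names);
    int Search(String query, int count) throws ParseException;
    String[] GetPropertyValues(int index, String[] names);

    // The one-argument form uses the names from the last SetProperties() call.
    default String[] GetPropertyValues(int index) {
        return GetPropertyValues(index, GetPropertyNames());
    }

    default boolean CheckPropAvailability(String name) {
        return Arrays.asList(GetDefaultPropertyNames()).contains(name);
    }

    // True only if every requested name is supported.
    default boolean CheckPropAvailability(String[] names) {
        return new HashSet<>(Arrays.asList(GetDefaultPropertyNames()))
                .containsAll(Arrays.asList(names));
    }
}

// Hypothetical plug over a fixed table; each row is {author, title}.
class TablePlug implements CollectionPlug {
    private static final String[] DEFAULTS = {"author", "title"};
    private final String[][] rows;
    private String[] selected = DEFAULTS.clone();
    private String query = "";
    private int requested = 0; // how many hits the client has asked for so far

    TablePlug(String[][] rows) { this.rows = rows; }

    public short TotalItems() { return (short) rows.length; }
    public boolean MoreAvailable() { return rows.length > requested; }
    public String TheQuery() { return query; }
    public String[] GetDefaultPropertyNames() { return DEFAULTS.clone(); }
    public String[] GetPropertyNames() { return selected.clone(); }

    // Every row carries the same properties in this sketch.
    public String[] GetPropertyNames(int index) {
        if (index < 0 || index >= rows.length) throw new ArrayIndexOutOfBoundsException(index);
        return DEFAULTS.clone();
    }

    public void SetProperties(String[] names) {
        if (!CheckPropAvailability(names)) throw new IllegalArgumentException(Arrays.toString(names));
        selected = names.clone();
    }

    // Assumption: every row matches; a real plug would query its source here.
    public int Search(String q, int count) {
        query = q;
        requested = Math.min(count, rows.length);
        return requested;
    }

    public String[] GetPropertyValues(int index, String[] names) {
        if (index < 0 || index >= rows.length) throw new ArrayIndexOutOfBoundsException(index);
        if (!CheckPropAvailability(names)) throw new IllegalArgumentException(Arrays.toString(names));
        requested = Math.max(requested, index + 1);
        String[] values = new String[names.length];
        for (int i = 0; i < names.length; i++) {
            values[i] = rows[index][Arrays.asList(DEFAULTS).indexOf(names[i])];
        }
        return values;
    }
}
```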

A Simple Java InfoBus Client

Appendix: Quick DLIOP Summary

For the interested reader, Figure 1 shows the basic interaction between a DLIOP client and a server. It is quite simple.

Figure 1: A basic client/proxy interaction via DLIOP


The client creates a local 'result collection' object that will eventually hold the results. The client then issues a query to the server, passing the query string, a reference to the result collection, the number of results requested in the initial return batch, and the document attributes to be included for each document. For example, the client could ask for the titles and authors of the first 10 documents that contain the words 'digital library'. The server processes the query and adds ten new document objects to the client's result collection. Each object contains two properties: 'author' and 'title'. The client can then communicate with the result collection to retrieve these objects.


Document objects have properties that can be retrieved with the getProperty() method, similar to the JavaBean convention.
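The client-side flow from Figure 1 can be mimicked locally. `DocumentObject`, `ResultCollection`, and `ToyServer` are hypothetical stand-ins; in DLIOP these are CORBA objects described by the IDL file mentioned earlier, and the server reaches the client's collection through a remote reference rather than a Java object. The sketch shows the division of labor: the client owns the collection, the server fills it with documents carrying only the requested properties, and the client reads them with `getProperty()`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical local stand-ins for DLIOP's document and result collection
// objects, which in the real protocol are CORBA objects.
class DocumentObject {
    private final Map<String, String> props = new HashMap<>();

    void setProperty(String name, String value) {
        props.put(name, value);
    }

    // Returns null if the property was not included.
    String getProperty(String name) {
        return props.get(name);
    }
}

class ResultCollection {
    private final List<DocumentObject> docs = new ArrayList<>();

    void add(DocumentObject d) { docs.add(d); } // invoked by the server
    int size() { return docs.size(); }
    DocumentObject get(int i) { return docs.get(i); }
}

// Hypothetical server: adds up to 'count' matching documents to the client's
// collection, each carrying only the requested properties.
class ToyServer {
    private final List<Map<String, String>> catalog; // each entry: "text" plus metadata

    ToyServer(List<Map<String, String>> catalog) {
        this.catalog = catalog;
    }

    void query(String words, ResultCollection into, int count, String[] props) {
        int added = 0;
        for (Map<String, String> entry : catalog) {
            if (added == count) {
                break;
            }
            if (!entry.getOrDefault("text", "").contains(words)) {
                continue;
            }
            DocumentObject d = new DocumentObject();
            for (String p : props) {
                d.setProperty(p, entry.get(p));
            }
            into.add(d);
            added++;
        }
    }
}
```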