Digital Libraries Research Agenda Report - Appendix 3: Working Group Reports

Report of the Library Perspective Working Group

IITA Digital Libraries Workshop

Bruce Schatz

Appendix 3-3 Library Perspectives

Introduction
1. Philosophical Issues: What is a Digital Library and How Does it Relate to Traditional Libraries?
2. Research Issues.
3. Priorities and Recommendations.

Introduction

This report briefly summarizes the discussions of the "Library" group. The group took the perspective of "librarians," who might be seen as the custodians of large digital repositories in the future. There was therefore a strong concentration on retrieval from existing collections rather than on the generation of new materials. This focus produced substantial discussion of search problems and largely omitted problems of publishing. The library perspective might be summarized as building and maintaining large distributed repositories. Repositories are the technology that supports users in search sessions across collections. In an information infrastructure such as digital libraries, the technology and systems cannot exist in isolation from users and collections.

Most of the discussion in this group centered on the important research issues for repositories, in the hope of encouraging research on, and funding for, solutions to these issues. The issues are discussed below, preceded by a brief philosophical discussion and followed by a prioritization of the most pressing research issues. The short-term issues concentrate on syntax, while the more important long-term issues concentrate on semantics.

1. Philosophical Issues: What is a Digital Library and How Does it Relate to Traditional Libraries?

An extended discussion of digital libraries made it clear that the defining factor is searching large collections. The key to a digital library is not the digitization of physical materials, but the organization of an electronic collection for better access. The organization provides coherence to a massive amount of shared knowledge, while the access provides convenient retrieval for a wide range of users distributed across a network. Therefore,

a digital library deals with organization and access of a large information repository.

The issues in digital libraries are thus quite similar to those in traditional libraries. Most of the major problems are the same, with some change in orientation due to electronic rather than physical materials. For the foreseeable future, digital libraries are likely to augment physical libraries, much as an on-line card catalog augments, rather than strictly replaces, a book collection. A user's information needs will nearly always be satisfied by some combination of digital and physical materials, each relying on the availability of collections and the appropriateness of access. As a rule of thumb, the digital medium tends to be better for searching, and the physical medium tends to be better for reading. In the digital medium, remote items are also more easily available, and relationships between items can be followed more easily.

2. Research Issues.

After much discussion, the research issues for digital libraries were divided into four major categories. The first two deal with the technologies and systems at the syntactic (interoperability) and the semantic (description) levels. The last two deal with support for users and for collections.

2.1. Interoperability. These issues deal with the global architecture necessary to deploy digital libraries widely. They are primarily at the syntactic level, dealing with the mechanisms for passing digital objects and operations around the network between collections and users. Thus, these issues concentrate mostly on access:

Naming of digital objects. Giving a unique and invariant name to information objects.

Protocols for object transmission. Executing operations across the network (e.g., issuing a search query to multiple collections).

Types of digital objects. Keeping track of the class definitions for information objects.

Metadata (syntax-level). Registering and reconciling the object schema.
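To make the interplay of naming and protocols concrete, the following is a minimal sketch, assuming in-memory stand-ins for networked collections. The class Collection, the function federated_search, and the "obj:NNN" names are all hypothetical illustrations, not part of any real protocol. It issues one search query to multiple collections and merges the hits; because object names are assumed to be unique and invariant, the same object held by two collections collapses into a single result.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Collection:
    """Hypothetical in-memory stand-in for one networked repository."""
    name: str
    # Maps a globally unique, invariant object name to that object's metadata.
    records: dict[str, dict[str, str]] = field(default_factory=dict)

    def search(self, term: str) -> list[str]:
        """Return names of objects whose title contains the term (case-insensitive)."""
        term = term.lower()
        return [obj_name for obj_name, meta in self.records.items()
                if term in meta.get("title", "").lower()]


def federated_search(collections: list[Collection], term: str) -> dict[str, list[str]]:
    """Send the same query to every collection and merge the results.

    Results are keyed by object name, so an object returned by several
    collections appears once, listing every collection that holds it.
    """
    merged: dict[str, list[str]] = {}
    for coll in collections:
        for obj_name in coll.search(term):
            merged.setdefault(obj_name, []).append(coll.name)
    return merged


if __name__ == "__main__":
    a = Collection("A", {"obj:001": {"title": "Digital Libraries"},
                         "obj:002": {"title": "Card Catalogs"}})
    b = Collection("B", {"obj:001": {"title": "Digital Libraries"},
                         "obj:003": {"title": "Library Networks"}})
    print(federated_search([a, b], "librar"))
    # {'obj:001': ['A', 'B'], 'obj:003': ['B']}
```

Without invariant names, the merge step could not tell that the two copies of obj:001 are the same object, which is one reason naming is listed first.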

2.2. Descriptions. These issues deal with the resources necessary to retrieve objects adequately from digital libraries. They are primarily at the semantic level, dealing with the mechanisms for describing the meaning of the objects in the collections. Thus, these issues concentrate mostly on organization:

Metadata (semantics-level). Defining the value and meaning of the object substructure.

Computed descriptions. Extracting meaning deduced from object content (rather than recorded in static metadata fields).

Unification. Merging the semantics of the metadata across descriptions (e.g., interpreting an author search "properly" across multiple collections with different definitions).

Organization. Clustering the descriptions to facilitate navigation (e.g., building indexes at multiple levels to categorize the networked information).
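The unification example (an author search interpreted "properly" across collections with different definitions) can be sketched as follows. This is a minimal illustration under stated assumptions: collection "A" is assumed to store authors in a "creator" field as "Last, First", and collection "B" in an "author" field as "First Last". SCHEMA_MAP, the normalizer functions, and unified_author_search are hypothetical names, and real metadata mappings would be far richer than a single field.

```python
from __future__ import annotations


def last_first(value: str) -> str:
    """Normalize 'Doe, Jane' to 'jane doe' (assumed convention of collection A)."""
    last, _, first = value.partition(",")
    return " ".join(f"{first} {last}".split()).lower()


def first_last(value: str) -> str:
    """Normalize 'Jane  Doe' to 'jane doe' (assumed convention of collection B)."""
    return " ".join(value.split()).lower()


# Hypothetical mapping: for each collection, which local field carries the
# shared "author" concept and how to normalize its values.
SCHEMA_MAP = {
    "A": ("creator", last_first),
    "B": ("author", first_last),
}


def unified_author_search(collections: dict[str, dict[str, dict[str, str]]],
                          query: str) -> list[tuple[str, str]]:
    """Find objects by author across collections with different metadata schemas."""
    target = first_last(query)
    hits: list[tuple[str, str]] = []
    for coll_name, records in collections.items():
        field_name, normalize = SCHEMA_MAP[coll_name]
        for obj_name, meta in records.items():
            if field_name in meta and normalize(meta[field_name]) == target:
                hits.append((coll_name, obj_name))
    return hits


if __name__ == "__main__":
    collections = {
        "A": {"a:1": {"creator": "Doe, Jane"}, "a:2": {"creator": "Roe, Rick"}},
        "B": {"b:7": {"author": "Jane  Doe"}},
    }
    print(unified_author_search(collections, "Jane Doe"))
    # [('A', 'a:1'), ('B', 'b:7')]
```

A purely syntactic search on the string "Jane Doe" would miss the record in collection A; the semantic mapping is what makes the two descriptions comparable.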

2.3. Users. These issues deal with the interaction required for users to adequately access a digital collection.

Needs. Understanding what the users need and how to provide it (user assessment, user interface, and new information types).

Contributions. Enabling the users to organize the digital collections for better personal access (annotation, groupwork, and authoring).

2.4. Collections. These issues deal with the management required for collections to be adequately organized.

Archiving. Ensuring that access to the digital collection is possible on a "permanent" basis (preservation of objects, conservation of operations).

Virtual collection development. Providing tools to organize a collection consisting of objects distributed across the network.

Repository management. Providing tools to update and maintain a digital collection.
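One way to picture virtual collection development is a collection that stores only object names and resolves them against repositories at access time. The sketch below assumes each repository can be modeled as a set of the object names it holds; VirtualCollection, resolve, and the repository labels are hypothetical. It also reports names that no repository can resolve, the kind of dangling reference that archiving and repository management aim to prevent.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class VirtualCollection:
    """Hypothetical curated collection holding names, not the objects themselves."""
    title: str
    members: list[str]

    def resolve(self, repositories: dict[str, set[str]]) -> tuple[dict[str, str], list[str]]:
        """Locate each member in some repository.

        Returns (object name -> label of the first repository holding it,
        list of names that no repository holds).
        """
        located: dict[str, str] = {}
        missing: list[str] = []
        for obj_name in self.members:
            for label, holdings in repositories.items():
                if obj_name in holdings:
                    located[obj_name] = label
                    break
            else:  # no repository held this name
                missing.append(obj_name)
        return located, missing


if __name__ == "__main__":
    repos = {"east": {"obj:001", "obj:002"}, "west": {"obj:003"}}
    reading_list = VirtualCollection("Course reading list", ["obj:003", "obj:001", "obj:404"])
    print(reading_list.resolve(repos))
    # ({'obj:003': 'west', 'obj:001': 'east'}, ['obj:404'])
```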

3. Priorities and Recommendations.

From the library perspective, all of these issues are important. Users and collections must be served, and the underlying technology must be available for organization and access. However, some of the issues are prerequisites for the others. In particular, digital objects must be available before collections can be built. So object naming (to reference the objects) and object archiving (to preserve them) are of immediate importance. Once the collections can exist, the adequacy of organization and access becomes the most immediate concern. Metadata (both syntactic and semantic descriptions) and needs (understanding the users) are the next most immediate issues. After these critical topics, the remaining issues were judged to be roughly equally urgent. The largest technology pay-off is in the semantic description issues, notably unification and organization, but much research remains to be done before there is a comprehensive solution to mapping semantics across repositories.

A final recommendation is a plea for information systems research rather than computer technology research. Digital libraries need to be tested with large collections and real users, since the value of the technology cannot be evaluated in isolation. Large testbeds of systems with new functionality are necessary to prototype new digital libraries. Deployment monies therefore become as important as development monies for digital library funding.