Digital Libraries Research Agenda Report - Appendix 3: Working Group Reports

Report of the Publishing Perspective Working Group

IITA Digital Libraries Workshop

William Arms

A. Introduction
B. The Need for Research in Digital Libraries
C. Needs of Originators, Creators, and Publishers
D. The Top Research Topics
E. Areas Not Recommended for Research
F. Fragmentation and Coordination of Research

A. Introduction

The working group considered research in digital libraries from the perspective of all creators, originators, editors, rights-holders, and publishers of material in the digital library. This first section describes some underlying issues behind the group's recommendations in later sections.

1. Organizations and publishing. The boundaries between authors, publishers, libraries, and readers evolved partly in response to technology, particularly the difficulty and expense of creating and storing paper documents. New technologies can shift the balance and blur the boundaries.

Publishers and libraries perform many functions that go far beyond the creation and management of physical items. Examples range from editing and refereeing, to abstracting and indexing. We believe, therefore, that the roles of libraries and publishers will continue even as their specific practices change with the technology. The new forms of publishing and library organizations that will emerge are open to speculation, but we believe they will be shaped by natural market forces and are not a topic for research.

2. The social, economic, and legal frameworks. The research agenda in digital libraries should not be restricted to technical areas. The social, economic, and legal questions are too important to ignore.

Publishing and libraries exist in a social and economic framework, where the operating rules are codified by a network of laws and business relationships. One of the greatest forces inhibiting the rapid deployment of digital libraries is the need to modify this framework. Two key topics are understanding how copyright functions in digital libraries and how the various costs will be covered.

3. Ease of use. The benefits of digital libraries will not be appreciated unless they are easy to use effectively. Some experienced people already meet a high percentage of their library needs with networked information. These individuals have developed heuristics for finding and evaluating information, but their practices are difficult to describe and teach to less-experienced users.

Ease of use will improve naturally as the rate of change slows down, conventions develop, and less successful systems are withdrawn. However, this natural progression is unlikely to be sufficient by itself.

B. The Need for Research in Digital Libraries

1. What is a digital library? A library is a system in which large volumes of information from many sources are assembled, organized, and made accessible without detailed prior knowledge of that information's use.

A digital library is a library where the information is stored and processed in digital formats. (The World Wide Web is a simple example.) The digital library system will contain many components, with different technical underpinnings, managed by many organizations.

2. Why is research in digital libraries important? Libraries are important because: (1) they retain the social, scientific, legal, and other records of our culture; (2) they provide wide, inexpensive access; and (3) they provide access to this record, supporting economic and cultural development.

Digital libraries are important because: (1) they have the potential to provide library services more effectively; (2) they can store information that exists only in digital form; and (3) they provide new opportunities to organize and disseminate information.

Research in digital libraries is needed to tackle the hard technical questions that must be resolved for the essential functions of libraries to continue into the digital age and to realize their new potential. It will be essential, for example, to develop ways for independently developed digital libraries to interoperate.

In addition to supporting the development of digital libraries, research in this area will necessarily address core problems in network computing that are key to the development of many other areas of national interest, such as electronic commerce.

C. Needs of Originators, Creators, and Publishers

In this section, the value and potential of digital libraries are explored through the needs of "originators." This word is used to describe all forms of creators and publishers: people or organizations who generate, organize, or otherwise create material that they wish to distribute in digital form.

  1. Dissemination. The basic need of originators is an infrastructure that supports widespread distribution of digital library objects within a simple framework.

  2. Access. The second need is a library system that provides access to these objects. This requires tools for finding material, such as catalogs and indexes, and systems for managing access, such as authentication and payment tools.

  3. Archiving. Originators usually expect that their material will be preserved over long periods of time. They require systems that will ensure access despite changes in organizations and technology.

  4. Control. When originators distribute their material, they usually require some control over how it is used. This control varies from placing the material in the public domain to tight restrictions on access. It includes decisions about who can alter the material and other considerations of integrity.

  5. Legal and social. A legal and social framework that enables orderly dissemination is crucial. Legal areas include copyright and other intellectual property, privacy, obscenity, and libel. Business practices include acceptable use policies, codes of practice, and standard contracts.

  6. Tools. Originators need computers, networks, and software tools for the straightforward and orderly creation and distribution of, and access to, all types of information.

D. The Top Research Topics

This section lists key topics where research can contribute to the development of digital libraries, satisfying the needs described in Section C.

1. General

a) Scale and complexity. Many of the problems faced by digital libraries are already solved on a small scale, but deployment on a large scale is more difficult. It is a deep research problem to create widely dispersed, distributed systems, developed by many organizations, across national boundaries, with technology from many sources.

b) Integration of digital and conventional libraries. Digital libraries and conventional libraries will coexist indefinitely. Two major research topics are: (1) how to build integrated libraries where some of the material is in conventional formats, notably paper, and some is digital; and (2) how to build indexing and abstracting services that combine the effectiveness of human and computer systems.

c) Measurement of effectiveness. Research in libraries, including digital libraries, lacks measures of effectiveness. For example, the classical measures of recall and precision are widely disliked, yet no generally accepted alternatives exist.
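For reference, the two classical measures can be stated in a few lines. The sketch below is a minimal illustration, assuming a single query, a set of document IDs returned by the system, and a set of documents judged relevant; the IDs are made up, and returning 0.0 for an empty set is one common convention, not the only one.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for a single query.

    retrieved: iterable of document IDs returned by the system
    relevant:  iterable of document IDs judged relevant by a person
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)  # relevant documents actually retrieved
    # Recall: share of all relevant documents that were found.
    recall = hits / len(relevant) if relevant else 0.0
    # Precision: share of retrieved documents that are relevant.
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


r, p = recall_precision(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
print(f"recall={r:.2f} precision={p:.2f}")  # recall=0.67 precision=0.50
```

Recall requires knowing every relevant document in the collection, which is rarely practical for a large, distributed library; this is one reason the measures are hard to apply there.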

d) Tools for creating and managing digital libraries. At present, digital libraries are very labor intensive, as are traditional libraries. Tools are needed to simplify the tasks of creating, managing, and using them.

2. Content

a) Text. Text retains a special place in the digital library, because it is the primary medium of human communication. There are many complex research questions about creating digital text, organizing it for retrieval and display, and combining it with other material.

b) Active library objects. The digital medium allows for new types of library objects such as software, simulations, animations, movies, slide shows, and sound tracks, with new ways to structure material, such as hypertext. Active library objects enable the form of an object with which the user interacts to be very different from the stored form.

c) Integration of mixed media. Much of the development of multimedia and mixed media is happening independently. Digital library research will integrate these materials and develop systems to provide access to them.

3. The long term

a) Preservation. To preserve material in the digital library is to retain its content over long periods, without necessarily retaining the media, the format, or other methods of representing the content. Preservation of material over very long periods is one of the defining characteristics of libraries, archives, and museums.

To preserve digital library material, more than bits must be retained. The library must be able to recognize formats and have the technical ability to display, perform, or otherwise interact with materials originally developed for long-dead computer systems or written in forgotten programming languages.
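Recognizing a format is the first of these steps. One widely used technique is to compare the leading bytes of a file against known "magic number" signatures. The sketch below assumes a small, hand-picked signature table; a real preservation system would need a far larger, maintained registry, and recognition alone does nothing toward the harder requirement of displaying or interacting with the material.

```python
# A few well-known file signatures ("magic numbers"); deliberately incomplete.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"II*\x00", "TIFF image (little-endian)"),
    (b"MM\x00*", "TIFF image (big-endian)"),
    (b"%!PS", "PostScript document"),
]


def identify_format(data: bytes) -> str:
    """Return a format name based on the leading bytes, or 'unknown'."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


print(identify_format(b"%PDF-1.1\n..."))  # PDF document
print(identify_format(b"plain text"))     # unknown
```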

b) Naming. Naming systems are a key component of libraries. They need to support access to materials long after their creators cease to exist. The problems divide into two parts: naming individual digital objects, and naming works, which may be composed of many digital objects. Each part of the problem has to deal with both static and dynamic objects and to resolve questions of equivalence.
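The distinction between naming objects and naming works can be sketched as two linked name spaces. The registry below is hypothetical: the class, its methods, and the "work:" and "obj:" identifiers are invented for illustration. It adopts one simple equivalence rule (two objects are equivalent if they manifest the same work) and deliberately ignores dynamic objects and versioning, which the report notes must also be handled.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NameRegistry:
    """Hypothetical registry keeping work names separate from object names."""

    # work name -> names of the digital objects that manifest it
    works: dict[str, set[str]] = field(default_factory=dict)
    # object name -> the single work it belongs to (reverse index)
    object_to_work: dict[str, str] = field(default_factory=dict)

    def register(self, work: str, obj: str) -> None:
        """Record that digital object `obj` is a manifestation of `work`."""
        if obj in self.object_to_work:
            raise ValueError(f"{obj} already belongs to {self.object_to_work[obj]}")
        self.works.setdefault(work, set()).add(obj)
        self.object_to_work[obj] = work

    def objects_of(self, work: str) -> set[str]:
        """Return a copy of the object names recorded for a work."""
        return set(self.works.get(work, set()))

    def equivalent(self, a: str, b: str) -> bool:
        """Simplistic rule: objects are equivalent if they manifest the same work."""
        work_a = self.object_to_work.get(a)
        return work_a is not None and work_a == self.object_to_work.get(b)


reg = NameRegistry()
reg.register("work:moby-dick", "obj:moby-dick-ascii")
reg.register("work:moby-dick", "obj:moby-dick-page-images")
reg.register("work:walden", "obj:walden-ascii")
print(reg.equivalent("obj:moby-dick-ascii", "obj:moby-dick-page-images"))  # True
print(reg.equivalent("obj:moby-dick-ascii", "obj:walden-ascii"))           # False
```

Whether a corrected edition or a translation should count as "the same work" is exactly the kind of equivalence question this rule leaves open.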

4. Computer systems

The research problems here concern library and publishing functions in distributed systems.

a) Repository access protocols. No existing protocol for communication between library clients and the various types of repositories and archives is adequate.

b) Security and authentication. Security and authentication are essential. General developments for the National Information Infrastructure (NII) may create services that digital libraries can use, but protocols that deal with the specific practices of publishers and libraries must be developed.

c) Mixed environments. Users of the digital library will have a huge variety of computers, connected over widely differing communications channels, operating in different social and legal frameworks. The digital library must adapt to these mixed environments, providing suitable services with good performance.

5. Social

The social aspects of digital libraries are some of the most difficult. Here are two vital topics:

a) Human-computer interaction. The problem in human-computer interaction lies in the structuring of information sources and services. Users must not be obliged to serve a long apprenticeship before they can make effective use of the digital library.

b) Rights management. Rights management is a key part of control. Rights in intellectual property must be identified and tracked. Rights management can be linked with questions of payment.

E. Areas Not Recommended for Research

The list of research topics in Section D is long, but many important topics have been omitted.

1. Scope. Some important areas can be left to normal developments. In these areas, the main concern is that the interests of existing organizations should not inhibit entrepreneurs and innovation.

Some topics fall cleanly within other research fields. For example, we do not recommend specific research in networks or multimedia, except in areas where digital libraries have special needs.

2. Difficulty. Some areas are important but so complex that we see little hope of successful research. Some of the social and economic questions fall in this area. Other areas are so straightforward that they do not justify specific research.

3. Unserved public. Libraries have been a great contributor to the continuing openness of society. Although we do not recommend specific research in this area, digital libraries must serve the nation and the world as broadly as possible and not be confined to people who have advanced equipment and resources.

4. Transfer from research. The research topics proposed have a bias in favor of long-term, fundamental research. This must be combined with more effective methods of technology transfer.

F. Fragmentation and Coordination of Research

Research into digital libraries is poorly served by the existing and planned conferences and journals. The community needs a small number of high-quality forums for exchanging research ideas and developing standards.

Each of the federal funding agencies sponsoring digital library research has its own mission. These do not always map cleanly onto the needs for research and advanced development. Interagency cooperation will be as valuable as it has been in the overall HPCC program.

We encourage the continuing development of a framework for organizing, coordinating, and communicating digital library research. This will act as a bridge with organizations engaged in implementation.