Digital Libraries Research Agenda Report
Executive Summary
Interoperability, Scaling, and the
Digital Libraries Research Agenda:
A Report on the May 18-19, 1995
IITA Digital Libraries Workshop
- I. Infrastructure
- II. Research Agendas and Priorities
  - A. Research in Digital Library Interoperability
  - B. Research in Describing Objects and Repositories
  - C. Research in Collection Management and Organization
  - D. Research in User Interfaces and Human-Computer Interaction
  - E. Economic, Social, and Legal Research Issues
- III. Scaling of Digital Library Experiments
This report summarizes the results of a workshop on Digital Libraries held
under the auspices of the U.S. Government's Information Infrastructure
Technology and Applications (IITA) Working Group in Reston, Virginia on May
18-19, 1995. The objective of the workshop was to refine the research
agenda for digital libraries with specific emphasis on issues of scaling
and interoperability, and to identify the infrastructure developments
needed to make progress on these issues.
Key Findings
I. Infrastructure
In the near term, investment is needed to support both infrastructure
development and research in the following areas, with emphasis on rapid
deployment of:
- A. Common schemes for the naming of digital objects, and the linking of
these schemes to protocols for object transmission, metadata, and object
type classifications, are essential. Naming schemes that allow globally unique
reference represent one of the most important infrastructural components to
support large-scale development of digital library systems and resource
sharing and interoperation among these systems.
- B. A deployed public key cryptosystem infrastructure -- including key
servers and appropriate standards -- is essential to progress in digital
libraries. This is needed for authentication, privacy, rights management,
and payments for the use of intellectual property. Without these services,
commercial publishers will remain reluctant to participate in digital
library development in the Internet environment.
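To make the naming requirement in item A more concrete, the following is a minimal sketch of a location-independent naming scheme: a name pairs a naming authority with a locally unique identifier, and a resolver maps that name to one or more places where the object can be fetched. The class names, the "authority/local_id" syntax, and the example hosts are all hypothetical and are not drawn from the workshop; a deployed scheme would also need distributed resolution and administrative procedures for assigning authorities.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ObjectName:
    """A globally unique name: a naming authority plus a locally unique id.

    Hypothetical structure, used here only for illustration.
    """
    authority: str
    local_id: str

    def __str__(self) -> str:
        return f"{self.authority}/{self.local_id}"


@dataclass
class Resolver:
    """Maps location-independent names to the locations holding the object."""
    _locations: dict = field(default_factory=dict)

    def register(self, name: ObjectName, location: str) -> None:
        # One name may resolve to several copies (for example, mirrors).
        self._locations.setdefault(name, []).append(location)

    def resolve(self, name: ObjectName) -> list:
        # Return a copy so callers cannot mutate the registry.
        return list(self._locations.get(name, []))


resolver = Resolver()
report = ObjectName("example.library", "report-1995-05")
resolver.register(report, "repo-a.example/objects/17")
resolver.register(report, "repo-b.example/mirror/17")
print(str(report), "->", resolver.resolve(report))
```

Because the name carries no location, an object can be moved or replicated without breaking references to it; only the resolver's records change.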
II. Research Agendas and Priorities
A specific research agenda for digital libraries continues to evolve. Five
key areas were identified. The "grand challenge" is interoperability at a
deep semantic level, providing digital library users with a coherent view
of heterogeneous, autonomously managed resources, and many research priorities
relate to this long-term objective. Other key themes involve the
relationship between traditional library missions and practices and digital
libraries, and ways in which traditional library functions will migrate to
the digital library environment.
A. Research in Digital Library Interoperability
There is a spectrum of interoperability goals, ranging from those that can
be achieved in the near term to longer-term "grand challenge" objectives.
At the near-term end of the spectrum is the use of common tools and
interfaces that provide a superficial uniformity for navigation and access,
but rely almost entirely on human intelligence to provide any coherence of
content. At the opposite end of the spectrum is deep semantic
interoperability -- the ability of a user to access, consistently and
coherently, similar (though autonomously defined and managed) classes of
digital objects and services, distributed across heterogeneous
repositories, with federating or mediating software compensating for
site-by-site variations. It also extends beyond passive digital objects to
actual services offered by specific digital library systems. An
intermediate position between these two extremes advocates primarily
syntactic interoperability (the interchange of metadata and the use of
digital object transmission protocols and formats based on this metadata
rather than simply common navigation, query and viewing interfaces) as a
means of providing limited coherence of content, supplemented by human
interpretation. Definition of levels of interoperability and the challenges
in achieving them is itself a key research problem.
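The intermediate, syntactic position described above can be illustrated with a small sketch: each repository keeps its own field names, and a thin mediating layer translates records into a shared vocabulary before searching. The site names, field names, shared vocabulary, and sample records below are invented for illustration. The sketch deliberately shows only the exchange of metadata; it does nothing to reconcile differences in meaning, which is where the deep semantic problem begins.

```python
# Hypothetical per-site field names mapped onto a shared vocabulary.
FIELD_MAPS = {
    "site_a": {"ttl": "title", "auth": "creator", "yr": "date"},
    "site_b": {"Title": "title", "Author": "creator", "Published": "date"},
}


def to_common(site: str, record: dict) -> dict:
    """Translate one site's record into the shared field names."""
    mapping = FIELD_MAPS[site]
    # Fields with no agreed mapping are dropped rather than guessed at.
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def search(records_by_site: dict, field: str, term: str) -> list:
    """Case-insensitive substring search across all sites via the shared schema."""
    hits = []
    for site, records in records_by_site.items():
        for record in records:
            common = to_common(site, record)
            if term.lower() in str(common.get(field, "")).lower():
                hits.append((site, common))
    return hits


# Illustrative records only.
records = {
    "site_a": [{"ttl": "Digital Libraries", "auth": "Smith", "yr": "1995"}],
    "site_b": [
        {"Title": "Library Scaling", "Author": "Jones", "Published": "1994"},
        {"Title": "Networks", "Author": "Lee", "Published": "1993"},
    ],
}
for site, rec in search(records, "title", "librar"):
    print(site, rec)
```

A user still has to judge whether "creator" means the same thing at both sites, which is the human interpretation the intermediate position relies on.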
More technical research questions involve protocol design that supports a
broad range of interaction types, inter-repository protocols, distributed
search protocols and technologies (including the ability to search across
heterogeneous databases with some level of semantic consistency), and
object interchange protocols. Interoperability is not simply a matter of
providing coherence among passive object repositories. Digital library
systems offer a range of services, and these services must be projected in
an interoperable fashion as well. Existing Internet protocols (such as
HTTP, the basis of the World Wide Web) are clearly inadequate. Research
must move beyond the current base of deployed protocols and systems. This
raises complex questions about how to deploy prototype systems and the
tradeoffs between advanced capabilities and ubiquity of access.
Finally, users in the networked environment will have access to personal,
workgroup, organizational, and public information spaces. Digital libraries
can exist at multiple levels in this hierarchy. Users will demand a
coherent view across these multiple information spaces, and controlled
interoperability among them.
B. Research in Describing Objects and Repositories
In order to provide a coherent view of collections of digital objects, they
must be described in a consistent fashion which can facilitate the use of
mechanisms such as protocols that support distributed search and retrieval
from disparate sources. Research in description of objects and collections
of objects provides the foundation for effective interoperability.
Interoperability at the level of deep semantics will require breakthroughs
in description as well as retrieval, object interchange, and object
retrieval protocols.
Issues here include the definition and use of metadata and its capture or
computation from objects (both textual and multimedia), the use of computed
descriptions of objects, federation and integration of heterogeneous
repositories with disparate semantics, clustering and automatic
hierarchical organization of information, and algorithms for automatic
rating, ranking, and evaluation of information quality, genre, and other
properties. Other key issues involve knowledge representation and
interchange, and the definition and interchange of ontologies for
information context.
Research is also needed to understand the strengths and limitations of
purely computer-based technologies for describing objects and repositories.
Defining the appropriate roles for human librarians and subject experts in
the digital library context, as a complement to these technology-based
approaches, is also clearly a central problem.
C. Research in Collection Management and Organization
Collection management and organization research is the area where
traditional library missions and practices are reinterpreted for the
digital library environment. Progress in this area is essential if digital
library collections are to meet successfully the needs of their user
communities.
Policies and methods for incorporating information resources on the network
into managed collections, rights management, payment, and control issues
were all identified as central problems in the management of digital
collections. Approaches to replication and caching of information and
their relationship to collection management in a distributed environment
need careful examination. The authority and quality of content in digital
libraries is of central concern to the user community. Ensuring and
identifying these attributes of content calls for research that spans both
technical and organizational issues. Research is also needed to clarify the
roles of librarians and institutions in defining and managing collections
in the networked environment. The preservation of digital content across
multiple generations of hardware and software technologies and standards is
essential in the creation of effective digital libraries. This is an
extraordinarily difficult research problem which has not received
sufficient attention.
D. Research in User Interfaces and Human-Computer Interaction
While user interfaces and human-computer interaction issues are an
extensive field of research in their own right, there are some specific
problems that are central to progress in digital libraries.
Display of information, visualization, and navigation of large information
collections, and linkages to information manipulation/analysis tools were
identified as key areas for research. The use of more sophisticated models
of user behavior in long-term interactions with digital library systems is
a potentially fruitful area for research. The necessity for a more
comprehensive understanding of user needs, objectives, and behavior in
employing digital library systems was stressed repeatedly as a basis for
designing effective systems. Finally, digital library systems must become
far more effective in adapting to variations in the capabilities of user
workstations and network connections (bandwidth) in presenting appropriate
user interfaces. New technologies such as personal digital assistants and
nomadic computing models will emphasize this need.
E. Economic, Social, and Legal Research Issues
Digital libraries are not simply technological constructs; they exist
within a rich legal, social, and economic context, and will succeed only to
the extent that they meet these broader needs. Rights management, economic
models for the use of electronic information, and billing systems to support
these economic models will be needed. User privacy needs to be carefully
considered. There are complex policy issues related to collection
development and management, and preservation and archiving. Existing
library practice may shed some light on these questions. The social context
of digital documents, including authorship, ownership, the act of
publication, versions, authenticity, and integrity, requires a better
understanding. Research in all of these areas will also be needed if
digital libraries are to be successful.
III. Scaling of Digital Library Experiments
The understanding of digital libraries requires large-scale experiments.
These must be enabled by funding, infrastructure development, and software
deployment strategies. Further, it is vital that support be provided to
study the operation and effectiveness of large-scale experiments once they
are deployed.
The common vision is one of tens of thousands of repositories of digital
information that are autonomously managed yet integrated into what users
view as a coherent digital library system. We must move rapidly towards an
infrastructure that can support and facilitate research towards this common
vision. The Internet as a context for deploying digital library systems
offers an unprecedented opportunity -- not only technically by providing
connectivity to an enormous potential user base, but also culturally, given
the Internet community's models and traditions of technology diffusion
through the distribution of publicly available prototype software -- to
move ahead with large-scale experiments.
We don't know how to approach scaling as a research question, other than to
build upon experience with the Internet. However, attention to scaling as a
research theme is essential and may help in further clarifying
infrastructure needs and priorities, as well as informing work in all areas
of the research agenda outlined above. For example, reliability questions
are poorly understood. In a sufficiently large system, some components will
inevitably be out of service during the processing of any given query.
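A rough calculation shows why this follows from scale alone. If each repository is available with probability a, and failures are independent, the chance that all n repositories are available at once is a^n. Both the independence assumption and the 99.9% availability figure used below are illustrative assumptions, not measurements from the workshop.

```python
def p_all_available(availability: float, n_repositories: int) -> float:
    """Probability that every repository is up, assuming independent failures."""
    return availability ** n_repositories


def p_some_unavailable(availability: float, n_repositories: int) -> float:
    """Probability that at least one repository is down."""
    return 1.0 - p_all_available(availability, n_repositories)


# Hypothetical per-repository availability of 99.9%.
for n in (10, 1_000, 10_000):
    print(f"{n:>6} repositories: "
          f"P(at least one down) = {p_some_unavailable(0.999, n):.4f}")
```

Under these assumptions, the chance that ten thousand repositories are all available at the same moment is well under one in a thousand, so a query spanning that many sites has to be designed to return useful partial results rather than to wait for every site to respond.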
There was consensus on the need to enable large-scale deployment projects
(in terms of size of user community, number of objects, and number of
repositories) and subsequently to fund study of the effectiveness and use of
such systems. It is clear that limited deployment of prototype systems will
not suffice if we are to fully understand the research questions involved
in digital libraries.