|
|
Kenneth R. Abbott and Sunil K. Sarin.
Experiences with workflow management: Issues for the next generation.
In Richard Furuta and Christine Neuwirth, editors, CSCW '94,
New York, 1994. ACM.
Workflow management is a technology that is considered
strategically important by many businesses, and its
market growth shows no signs of abating. It is,
however, often viewed with skepticism by the
research community, conjuring up visions of
oppressed workers performing rigidly-defined tasks
on an assembly line. Although the potential for
abuse no doubt exists, workflow management can
instead be used to help individuals manage their
work and to provide a clear context for performing
that work. A key challenge in the realization of
this ideal is the reconciliation of workflow process
models and software with the rich variety of
activities and behaviors that comprise ``real''
work. Our experiences with the InConcert workflow
management system are used as a basis for outlining
several issues that will need to be addressed in
meeting this challenge. This is intended as an
invitation to CSCW researchers to influence this
important technology in a constructive manner by
drawing on research and experience.
|
|
|
Tarek F. Abdelzaher and Nina Bhatti.
Web content adaptation to improve server overload behavior.
In Proceedings of the Eighth International World-Wide Web
Conference, 1999.
This paper presents a study of Web content
adaptation to improve server overload
performance, as well as an implementation of a Web content adaptation
software prototype. When the request rate on a
Web server increases beyond server capacity, the server becomes
overloaded and unresponsive. The TCP listen
queue of the server's socket overflows exhibiting a drop-tail behavior. As a
result, clients experience service outages.
Since clients typically issue multiple requests over the duration of a
session with the server, and since requests are
dropped indiscriminately, all clients
connecting to the server at overload are likely
to experience connection failures, even though
there may be enough capacity on the server to deliver all responses properly
for a subset of clients. In this paper, we
propose to resolve the overload problem by adapting delivered content to load
conditions to alleviate overload. The premise
is that successful delivery of less resource-intensive content under overload
is more desirable to clients than connection
rejection or failures.
|
|
|
Serge Abiteboul, Sophie Cluet, and Tova Milo.
Querying and updating the file.
In Proceedings of the Nineteenth International Conference on
Very Large Databases, pages 73-84, Dublin, Ireland, 1993. VLDB Endowment,
Saratoga, Calif.
|
|
|
Serge Abiteboul, Sophie Cluet, and Tova Milo.
Correspondence and translation for heterogeneous data.
In Proceedings of the 6th International Conference on Database
Theory, Delphi, Greece, 1997. Springer, Berlin.
|
|
|
Marc Abrams, Constantinos Phanouriou, Alan L. Batongbacal, Stephen M. Williams,
and Jonathan E. Shuster.
Uiml: An appliance-independent xml user interface language.
In Proceedings of the Eighth International World-Wide Web
Conference, 1999.
Today's Internet appliances feature user
interface technologies almost unknown a few
years ago: touch screens, styli,
handwriting and voice recognition, speech
synthesis, tiny screens, and more. This
richness creates problems. First, different
appliances use different languages: WML for
cell phones; SpeechML, JSML, and VoxML for
voice enabled devices such as phones; HTML and XUL for desktop computers, and
so on. Thus, developers must maintain multiple
source code families to deploy interfaces to one information system
on multiple appliances. Second, user interfaces
differ dramatically in complexity (e.g., PC versus cell phone
interfaces). Thus, developers must also manage
interface content. Third, developers
risk writing appliance-specific interfaces for
an appliance that might not be on the market
tomorrow. A solution is to build
interfaces with a single, universal language
free of assumptions about appliances and
interface technology. This paper
introduces such a language, the User Interface
Markup Language (UIML), an XML-compliant
language. UIML insulates the interface designer from the peculiarities
of different appliances through style sheets. A
measure of the power of UIML is that it can replace hand-coding of Java AWT
or Swing user interfaces.
|
|
|
Mark S. Ackerman.
Providing social interaction in the digital library.
In Proceedings of the First Annual Conference on the Theory and
Practice of Digital Libraries, 1994.
Format: HTML Document (12K).
Audience: Non-technical, digital library researchers/funders.
References: 13.
Links: 2.
Relevance: Low-medium.
Abstract: Argues that social aspects of collaboration must be included
in a Digital Library for the informal, organizational things that aren't always
available in information sources. Mentions a TCL based system called CAFE
that adds functionality of messages, bulletin boards, and talk.
|
|
|
Mark S. Ackerman and Roy T. Fielding.
Collection maintenance in the digital library.
In Proceedings of the Second Annual Conference on the Theory and
Practice of Digital Libraries, 1995.
Format: HTML Document (39K + pictures).
Audience: Librarians, web masters.
References: 27.
Links: 2.
Relevance: Low.
Abstract: Discusses the problem of collection maintenance in the digital
domain, and argues that while some traditional practices will carry over, new
methods will have to be created, esp. for dynamic and informal resources.
Suggests that some maintenance can be done automatically by agents, and gives 2
examples: MOMSpider, which checks to make sure links are still current, and
Web:Lookout, which notifies the user when interesting changes are made to a watched
page.
|
|
|
Michael J. Ackerman.
Accessing the visible human project.
D-Lib Magazine, October 1995.
Format: HTML Document(11K).
Audience: Medical professionals.
References: 1.
Links: 5.
Relevance: None.
Abstract: Describes the Visible Human Project (1 mm
cross sections of two cadavers), how to obtain the
images, how large they are, what IP agreements need
to be signed.
|
|
|
R. Acuff, L. Fagan, T. Rindfleisch, B. Levitt, and P. Ford.
Lightweight, mobile e-mail for intra-clinic communication.
In Proceedings of the 1997 AMIA Annual Fall Symposium, pages
729-33, Oct 1997.
|
|
|
N. Adam, Y. Yesha, B. Awerbuch, K. Bennet, B. Blaustein, A. Brodsky, R. Chen,
O. Dogramaci, B. Grossman, R. Holowczak, J. Johnson, K. Kalpakis, C. McCollum,
A.-L. Neches, B. Neches, A. Rosenthal, J. Slonim, H. Wactlar, and O. Wolfson.
Strategic directions in electronic commerce and digital libraries:
towards a digital agora.
ACM Computing Surveys, 28(4):818-35, December 1996.
The paper examines the research requirements of electronic
commerce and digital libraries in six key areas. It provides
case studies that describe three electronic commerce research
projects (USC-ISI, CommerceNet, First Virtual) and six
digital libraries projects sponsored by an NSF/ARPA/NASA
initiative. The paper focuses on the following common areas
of EC and DL research: acquiring and storing information;
finding and filtering information; securing information and
auditing access; universal access; cost management and
financial instruments; and socio-economic impact.
|
|
|
Anne Adams and Ann Blandford.
Digital libraries’ support for the user’s ‘information journey’.
In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2005.
The temporal elements of users’ information requirements are a continually confounding aspect of digital library design. No sooner have users’ needs been identified and supported than they change. This paper evaluates the changing information requirements of users through their ‘information journey’ in two different domains (health and academia). In-depth analysis of findings from interviews, focus groups and observations of 150 users has identified three stages to this journey: information initiation, facilitation (or gathering) and interpretation. The study shows that, although digital libraries are supporting aspects of users’ information facilitation, there are still requirements for them to better support users’ overall information work in context. Users are poorly supported in the initiation phase, as they recognize their information needs, especially with regard to resource awareness; in this context, interactive press-alerts are discussed. Some users (especially clinicians and patients) also required support in the interpretation of information, both satisfying themselves that the information is trustworthy and understanding what it means for a particular individual.
|
|
|
Eytan Adar and Jeremy Hylton.
On-the-fly hyperlink creation for page images.
In Proceedings of the Second Annual Conference on the Theory and
Practice of Digital Libraries, 1995.
Format: HTML Document () .
Audience: Digital library researchers.
References: 9.
Links: 0.
Relevance: Low.
Abstract: Store pages as bitmaps, and retrieve a cite when user clicks on
it, by doing OCR, then passing relevant line to library catalog, as 12 queries
of 3 words each (randomly selected from the line) and returning the best scoring
results. Somewhat robust to typos in cites, but not too slow.
|
|
|
Paul S. Adler and Terry Winograd, editors.
Usability: turning technologies into tools.
Oxford University Press, 1992.
|
|
|
Eugene Agichtein and Luis Gravano.
Snowball: Extracting relations from large plain-text collections.
In Proceedings of the Fifth ACM International Conference on
Digital Libraries, 2000.
Text documents often contain valuable structured data
that is hidden in regular English sentences. This data is best
exploited if available as a relational table that we could use for
answering precise queries or for running data mining tasks. We
explore a technique for extracting such tables from
document collections that requires only a handful of training examples
from users. These examples are used to generate extraction patterns,
that in turn result in new tuples being extracted from the document
collection. We build on this idea and present our Snowball system.
Snowball introduces novel strategies for generating patterns and
extracting tuples from plain-text documents. At each iteration of the
extraction process, Snowball evaluates the quality of these patterns
and tuples without human intervention, and keeps only the most
reliable ones for the next iteration. In this paper we also develop a
scalable evaluation methodology and metrics for our task, and present
a thorough experimental evaluation of Snowball and comparable techniques
over a collection of more than 300,000 newspaper documents.
|
|
|
Maristella Agosti, Nicola Ferro, and Nicola Orio.
Annotating illuminated manuscripts: an effective tool for research
and education.
In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2005.
The aim of this paper is to report the research results of an ongoing project that deals with the exploitation of a digital archive of drawings and illustrations of historic documents for research and education purposes. According to the results of a study of user requirements, we designed tools to provide researchers with novel ways for accessing the digital manuscripts, sharing, and transferring knowledge in a collaborative environment. Annotations are proposed for making explicit the results of scientific research on the relationships between images belonging to manuscripts produced in a time span of centuries. For this purpose, a taxonomy for linking annotation is proposed, together with a conceptual schema for representing annotations and for linking them to digital objects.
|
|
|
Rakesh Agrawal, Tomasz Imielinski, and Arun Swami.
Mining association rules between sets of items in large databases.
In Proceedings of the International Conference on Management of
Data, pages 207-216. ACM Press, 1993.
|
|
|
Alfred Aho, John Hopcroft, and Jeffrey Ullman.
Data Structures and Algorithms.
Addison-Wesley, 1983.
|
|
|
T. Alanko, M. Kojo, M. Liljeberg, and K. Raatikainen.
Mowgli: improvements for internet applications using slow wireless
links.
In Waves of the Year 2000+ PIMRC '97. The 8th IEEE International
Symposium on Personal, Indoor and Mobile Radio Communications. Technical
Program, Proceedings (Cat. No.97TH8271), volume 3, pages 1038-42, 1997.
Modern cellular telephone systems extend the usability of
portable personal computers enormously. A nomadic user can be
given ubiquitous access to remote information stores and
computing services. However, the behavior of wireless links
creates severe inconveniences within the traditional data
communication paradigm. We give an overview of the problems
related to wireless mobility. We also present a new software
architecture for mastering the problems and discuss a new
paradigm for designing mobile distributed applications. The
key idea in the architecture is to place a mediator, a
distributed intelligent agent, between the mobile node and
the wireline network.
|
|
|
Reka Albert, Albert-Laszlo Barabasi, and Hawoong Jeong.
Diameter of the World Wide Web.
Nature, 401(6749), September 1999.
|
|
|
Alexa internet inc.
http://www.alexa.com.
|
|
|
R. B. Allen.
Interface issues for interactive multimedia documents.
In Advances in Digital Libraries '95, 1995.
Format: Not Yet Online.
|
|
|
Robert B. Allen.
Navigating and searching in hierarchical digital library catalogs.
In Proceedings of the First Annual Conference on the Theory and
Practice of Digital Libraries, 1994.
Format: HTML Document (21K).
Audience: non technical, users.
References: 15.
Links: 2.
Relevance: Low.
Abstract: Describes a particular user interface based on a book shelf
metaphor. Tries to use an a priori classification (Dewey Decimal System) as an
organization tool (in addition to results of electronic searches).
|
|
|
Robert B. Allen.
Two digital library interfaces which exploit hierarchical structure.
In DAGS '95, 1995.
Format: HTML Document (33K + pictures).
Audience: General Computer scientists, HCI.
References: 22.
Links: 1.
Relevance: Low-Medium.
Abstract: Uses metaphor of hierarchical Dewey Decimal system or
faceted (implying a DAG) ACM literature categories to aid
UI. Shows graphically where in the hierarchy hits were
found for a search.
|
|
|
Robert B. Allen.
A query interface for an event gazetteer.
In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2004.
We introduce the idea of an ``event gazetteer''
that stores and presents locations in time. Each event is coded as a
schema with attributes of event type, location, actor, and beginning
and ending times. Sets of events can be collected as timelines and the
events on these timelines can be linked by annotations. The system has
been built with JSP and Oracle. Systematic metadata is essential for
effective interaction with this system. For instance, the actors may be
described by the roles in which they participate. In this paper, we
focus on the construction of queries for this complex metadata.
Ultimately, we envision a flexible, broad-based service that is a
resource for users ranging from students to genealogists interested in
events.
|
|
|
Robert B. Allen.
A multi-timeline interface for historical newspapers.
In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2005.
Events are best understood in the context of other events. Because of the temporal ordering, we can call a set of related events a timeline. Even such timelines are best understood in the context of other timelines. To facilitate the exploration of a collection of timelines and events, a visualization tool has been developed that structures the user's browsing. In this model, each event is accompanied by a text description and links to related resources. In particular, this system can provide a browsing interface of digitized historical newspapers.
|
|
|
Robert B. Allen and Jane Acheson.
Browsing the structure of multimedia stories.
In Proceedings of the Fifth ACM International Conference on
Digital Libraries, 2000.
Stories may be analyzed as sequences of causally-related events
and reactions to those events by the characters. We employ a notation of plot
elements, similar to one developed by Lehnert, and we extend that by forming
higher level story threads. This notation requires that events and
reactions be linked and that the chains of links be terminated back to the
beginning of the story. Furthermore, we have built a browser for the plot
elements, the story threads, and associated multimedia. We apply the browser
to Corduroy, a children's short feature which was analyzed in detail. We
provide additional illustrations with analysis of Kiss of Death, a Film
Noir classic. Effectively, the browser provides a framework for
interactive summaries of the narrative.
|
|
|
Open Mobile Alliance.
Wireless application protocol.
http://www.openmobilealliance.org/tech/affiliates/wap/wapindex.html#wap20,
2001.
The WAP Web site, from which the specs are available.
|
|
|
Virgilio Almeida, Azer Bestavros, Mark Crovella, and Adriana de Oliveira.
Characterizing reference locality in the www.
In Proceedings of PDIS'96: The IEEE Conference on Parallel and
Distributed Information Systems, 1996.
|
|
|
Virgilio A.F. Almeida, Wagner Meira Jr., Victor F. Ribeiro, and Nivio Ziviani.
Efficiency analysis of brokers in the electronic marketplace.
In Proceedings of the Eighth International World-Wide Web
Conference, 1999.
In this paper we analyze the behavior of e-commerce
users based on actual logs from two large non-English e-brokers.
We start by presenting a quantitative study of the behavior of e-brokers and
discuss the influence of regional and cultural
issues on them. We then discuss a model that
quantifies the efficiency of the results
provided by brokers in the electronic
marketplace. This model is a function of
factors such as server response time and
regional factors. Our findings clearly
indicate that e-commerce is strongly tied to
local language, national customs and
regulations, currency conversion and
logistics, and Internet infrastructure. We
found that the behavior of customers of online
bookstores is strongly affected
by brand and regional factors. Music CD
shoppers show a different behavior that might
stem from the fact that music is
universal and not so language dependent.
|
|
|
Altavista incorporated.
http://www.altavista.com.
|
|
|
Amazon inc.
http://www.amazon.com.
|
|
|
Jose-Luis Ambite and Craig A. Knoblock.
Reconciling distributed information sources.
In AAAI Spring Symposium on Information Gathering, 1995.
Format: Compressed PostScript().
|
|
|
B. Amento, L. Terveen, and W. Hill.
Does authority mean quality? Predicting expert quality ratings of
web documents.
In Proceedings of the Twenty-Third Annual International ACM
SIGIR Conference on Research and Development in Information Retrieval. ACM,
2000.
Evaluates different link-based ranking techniques.
|
|
|
Einat Amitay, Nadav Har'El, Ron Sivan, and Aya Soffer.
Web-a-where: geotagging web content.
In SIGIR '04: Proceedings of the 27th annual international
conference on Research and development in information retrieval, pages
273-280. ACM Press, 2004.
|
|
|
E. Amoroso.
Fundamentals of Computer Security Technology.
Prentice Hall, Englewood Cliffs, NJ, 1994.
|
|
|
H. Anan, X. Liu, K. Maly, M. Nelson, M. Zubair, J. C. French, E. Fox, and
P. Shivakumar.
Preservation and transition of ncstrl using an oai-based
architecture.
In Proceedings of the Second ACM/IEEE-CS Joint Conference on
Digital Libraries, 2002.
NCSTRL (Networked Computer Science Technical Reference Library) is
a federation of digital libraries providing computer science materials. The
architecture of the original NCSTRL was based largely on the Dienst software.
It was implemented and maintained by the digital library group at Cornell
University until September 2001. At that time, we had an immediate goal of
preserving the existing NCSTRL collection and a long-term goal of providing a
framework where participating organizations could continue to disseminate
technical publications. Moreover, we wanted the new NCSTRL to be based on
OAI (Open Archives Initiative) principles that provide a framework to facilitate
the discovery of content in distributed archives. In this paper, we describe our
experience in moving towards an OAI-based NCSTRL.
|
|
|
Dan Ancona, Jim Frew, Greg Janée, and Dave Valentine.
Accessing the alexandria digital library from geographic information
systems.
In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2004.
We describe two experimental desktop library
clients that offer improved access to geospatial data via the
Alexandria Digital Library (ADL): ArcADL, an extension to ESRI's
ArcView GIS, and vtADL, an extension to the Virtual Terrain Project's
Enviro terrain visualization package. ArcADL provides a simplified user
interface to ADL's powerful underlying distributed geospatial search
technology. Both clients use the ADL Access Framework to access library
data that is available in multiple formats and retrievable by multiple
methods. Issues common to both clients and future scenarios are also
considered.
|
|
|
Kenneth M. Anderson, Aaron Andersen, Neet Wadhwani, and Laura M. Bartolo.
Metis: Lightweight, flexible, and web-based workflow services for
digital libraries.
In Proceedings of the Third ACM/IEEE-CS Joint Conference on
Digital Libraries, 2003.
The Metis project is developing workflow technology
designed for use in digital libraries by avoiding the assumptions
made by traditional workflow systems. In particular, digital
libraries have highly distributed sets of stakeholders who
nevertheless must work together to perform shared activities.
Hence, traditional assumptions that all members of a workflow belong
to the same organization, work in the same fashion, or have access to
similar computing platforms are invalid. The Metis approach makes use
of event-based workflows to support the distributed nature of digital
library workflow and employs techniques to make the resulting
technology lightweight, flexible, and integrated with the Web.
This paper describes the conceptual framework behind the Metis approach
as well as a prototype which implements the framework. The prototype
is evaluated based on its ability to model and execute a workflow drawn
from a real-world digital library. After describing related work, the
paper concludes with a discussion of future research opportunities in
the area of digital library workflow and outlines how Metis is being
deployed to a small set of digital libraries for additional evaluation.
|
|
|
R. Anderson and M. Kuhn.
Tamper resistance - a cautionary note.
In Proceedings of the Second USENIX Workshop on Electronic
Commerce, Berkeley, CA, USA, 1996. USENIX Assoc.
An increasing number of systems, from pay-TV to electronic
purses, rely on the tamper resistance of smartcards and other
security processors. We describe a number of attacks on such
systems, some old, some new and some that are simply little
known outside the chip testing community. We conclude that
trusting tamper resistance is problematic; smartcards are
broken routinely, and even a device that was described by a
government signals agency as `the most secure processor
generally available' turns out to be vulnerable. Designers of
secure systems should consider the consequences with care.
|
|
|
R. Anderson, C. Manifavas, and C. Sutherland.
Netcard - a practical electronic cash system.
In Fourth Cambridge Workshop on Security Protocols, 1996.
|
|
|
R.C. Angell, G.E. Freund, and P. Willett.
Automatic spelling correction using a trigram similarity measure.
Information Processing and Management, 19(4):255-261, 1983.
|
|
|
ANSI/NISO.
Information Retrieval: Application Service Definition and
Protocol Specification, April 1995.
Available at http://lcweb.loc.gov/z3950/agency/document.html.
|
|
|
Vinod Anupam, Alain Mayer, Kobbi Nissim, Benny Pinkas, and Michael K. Reiter.
On the security of pay-per-click and other web advertising schemes.
In Proceedings of the Eighth International World-Wide Web
Conference, 1999.
We present a hit inflation attack on pay-per-
click Web advertising schemes. Our attack is
virtually impossible for the
program provider to detect conclusively,
regardless of whether the provider is a third-party
``ad network'' or the target of
the click itself. If practiced widely, this
attack could accelerate a move away from
pay-per-click programs, and toward
programs in which referrers are paid only if
the referred user subsequently makes a
purchase (pay-per-sale) or engages in
other substantial activity at the target site
(pay-per-lead). We also briefly discuss the
lack of auditability inherent in these
schemes.
|
|
|
Kyoichi Arai, Teruo Yokoyama, and Yutaka Matsushita.
A window system with leafing through mode: Bookwindow.
In Proceedings of the Conference on Human Factors in Computing
Systems CHI'92, 1992.
|
|
|
Avi Arampatzis, Marc van Kreveld, Iris Reinbacher, Paul Clough, Hideo Joho,
Mark Sanderson, Christopher B. Jones, Subodh Vaid, Marc Benkert, and
Alexander Wolff.
Web-based delineation of imprecise regions.
In Proceedings of the Workshop on Geographic Information
Retrieval, 2004.
|
|
|
Arvind Arasu, Junghoo Cho, Hector Garcia-Molina, Andreas Paepcke, and Sriram
Raghavan.
Searching the web.
ACM Transactions on Internet Technology, 2001.
Submitted for publication. Available at
http://dbpubs.stanford.edu/pub/2000-37.
We offer an overview of current Web search engine
design. After introducing a generic search engine
architecture, we examine each engine component in
turn. We cover crawling, local Web page storage,
indexing, and the use of link analysis for boosting
search performance. The most common design and
implementation techniques for each of these components
are presented. We draw for this presentation from the
literature, and from our own experimental search engine
testbed. Emphasis is on introducing the fundamental
concepts, and the results of several performance
analyses we conducted to compare different designs.
|
|
|
William Y. Arms.
Key concepts in the architecture of the digital library.
D-Lib Magazine, Jul 1995.
Format: HTML Document(18K + pictures).
Audience: computer scientists, digital library
researchers.
References: 1.
Links: 3.
Relevance: Medium-low.
Abstract: Outlines 8 principles that are important to
DLs, a combination of social/economic issues (avoid
using words like ``copy'' and ``publish'') and
technical ones (basically a sales pitch for the
Kahn/Wilensky model of handles, maintenance, and
access control.)
|
|
|
R. Armstrong, D. Freitag, T. Joachims, and T. Mitchell.
Webwatcher: A learning apprentice for the world wide web.
In AAAI Spring Symposium on Information Gathering, 1995.
We describe an information seeking assistant for the world
wide web. This agent, called WebWatcher, interactively helps users locate
desired information by employing learned knowledge about which hyperlinks
are likely to lead to the target information.
|
|
|
Robert Armstrong, Dayne Freitag, Thorsten Joachims, and Tom Mitchell.
Webwatcher: A learning apprentice for the world wide web.
In AAAI Spring Symposium on Information Gathering, 1995.
Format: Compressed PostScript().
|
|
|
Kenneth Arnold.
The body in the virtual library: Rethinking scholarly communication.
In JEP.
Format: HTML Document (41K).
Audience: Scholars, publishers (esp. university press), librarians.
References: 10.
Links: 1.
Relevance: Low-Medium.
Abstract: Discusses the future of university presses, in pretty grim
terms. Suggests that they lack the capital, staff, and quick reaction time to
survive in an electronic world. Considers the Mellon report on scholarly
communication (which suggests universities get copyrights on books their faculty
produce) unreasonable. Thinks that relying on commercial network providers
(esp. cable, telecom) would be disastrous. Advocates a non-profit distribution
network for scholarly publication.
|
|
|
Kenneth Arnold.
The electronic librarian is a verb/the electronic library is not a
sentence.
In JEP, 1994.
Format: HTML Document (49K).
Audience: Librarians, policy makers.
References: 10.
Links: 1.
Relevance: Low.
Abstract: A vision of the networked library. Sees the real value of
librarians as creating attention structures which anticipate the way clients
search.
|
|
|
Dennis S. Arnon.
Scrimshaw: a language for document queries and transformations.
Electronic Publishing: Origination, Dissemination and Design,
6(4):361-372, December 1993.
|
|
|
J. Ashley, M. Flickner, J. Hafner, D. Lee, W. Niblack, and D. Petkovic.
The query by image content (QBIC) system.
In Proceedings of the International Conference on Management of
Data (SIGMOD). ACM Press, 1995.
|
|
|
N. Asokan, P.A. Janson, M. Steiner, and M. Waidner.
The state of the art in electronic payment systems.
Computer, 30(9):28-35, September 1997.
The exchange of goods conducted face-to-face between two
parties dates back to before the beginning of recorded
history. Traditional means of payment have always had
security problems, but now electronic payments retain the
same drawbacks and add some risks. Unlike paper, digital
documents can be copied perfectly and arbitrarily often,
digital signatures can be produced by anybody who knows the
secret cryptographic key, and a buyer's name can be
associated with every payment, eliminating the anonymity of
cash. Without new security measures, widespread electronic
commerce is not viable. On the other hand, properly designed
electronic payment systems can actually provide better
security than traditional means of payments, in addition to
flexibility. This article provides an overview of electronic
payment systems, focusing on issues related to security.
|
|
|
Active Server Pages technology.
http://msdn.microsoft.com/workshop/server/asp/aspfeat.asp.
|
|
|
R. Atkinson, A. Demers, C. Hauser, C. Jacobi, P. Kessler, and M. Weiser.
Experiences creating a portable cedar.
SIGPLAN Notices, 24(7):322-8, 1989.
The authors have recently re-implemented the Cedar language to
make it portable across many different architectures. The
strategy was, first, to use machine-dependent C code as an
intermediate language, second, to create a
language-independent layer known as the Portable Common
Runtime, and third, to write a relatively large amount of
Cedar-specific runtime code in a subset of Cedar itself. The
paper presents a brief description of the Cedar language, the
portability strategy for the compiler and runtime, the manner
of making connections to other languages and the Unix
operating system, and some performance measures of the
Portable Cedar.
|
|
|
Neal Audenaert, Richard Furuta, Eduardo Urbina, Jie Deng, Carlos Monroy, Rosy
Sáenz, and Doris Careaga.
Integrating collections at the cervantes project.
In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2005.
Unlike many efforts that focus on supporting scholarly research by developing large-scale, general resources for a wide range of audiences, we at the Cervantes Project have chosen to focus more narrowly on developing resources in support of ongoing research about the life and works of a single author, Miguel de Cervantes Saavedra (1547-1616). This has led to a group of hypertextual archives, tightly integrated around the narrative and thematic structure of Don Quixote. This project is typical of many humanities research efforts and we discuss how our experiences inform the broader challenge of developing resources to support humanities research.
|
|
|
Cyrus Azarbod and William Perrizo.
Building concept hierarchies for schema integration in hddbs using
incremental concept formation.
In B. Bhargava, T. Finin, and Y. Yesha, editors, CIKM 93.
Proceedings of the Second International Conference on Information and
Knowledge Management, pages 732-734, Washington, D.C., November 1993. ACM.
|
|
|
Sulin Ba, Aimo Hinkkanen, and Andre B. Whinston.
Digital library as a foundation for decision support systems.
In Proceedings of the First Annual Conference on the Theory and
Practice of Digital Libraries, 1994.
Format: HTML Document (43K) .
Audience: Semi-technical, business slant, funding proposal.
References: 14.
Links: 1.
Relevance: Low.
Abstract: Sees a DL as an enterprise wide collection of
*executable* documents. SGML and Mathematica suggested
as integration tools. Search for data representation
which will allow automatic combination of separate
documents to solve problems.
|
|
|
D. Bachiochi, M. Berstene, E. Chouinard, N. Conlan, M. Danchak, T. Furey,
C. Neligon, and D. Way.
Usability studies and designing navigational aids for the world wide
web.
In Proceedings of the Sixth International World-Wide Web
Conference, 1997.
|
|
|
B. R. Badrinath.
Distributed computing in mobile environments.
Computers & Graphics, 20(5):615-17, 1996.
Rapid progress in hardware has led to the availability of
portable personal computers ranging from laptops to hand-held
computers (PDAs and Internet terminals). The presence of
wireless connectivity gives these hand-held units the
capability of accessing information anywhere, at any time.
These mobile units can be considered to be part of a
worldwide distributed information system. Distributed
computing in mobile environments faces new challenges as more
and more mobile hosts become an integral part of a
distributed system. Problems in distributed computing in
mobile environments are due to: (1) mobility, (2) wireless
and (3) resource constraints at the mobile host. In this
paper, we discuss the impact of these factors and research
issues that need to be addressed in mobile distributed
systems.
|
|
|
Ricardo Baeza-Yates and Berthier Ribeiro-Neto.
Modern Information Retrieval.
Addison-Wesley-Longman, May 1999.
The chapters of the book are:
Introduction
Modeling
Retrieval Evaluation
Query Languages (with Gonzalo Navarro)
Query Operations
Text and Multimedia Languages and Properties
Text Operations (with Nivio Ziviani)
Indexing and Searching (with Gonzalo Navarro)
Parallel and Distributed IR (by Eric Brown)
User Interfaces and Visualization (by Marti Hearst)
Multimedia IR: Models and Languages
(by Elisa Bertino, Barbara Catania and Elena Ferrari)
Multimedia IR: Indexing and Searching (by Christos Faloutsos)
Searching the Web
Libraries and Bibliographic Systems (by Edie Rasmussen)
Digital Libraries (by Edward Fox and Ohm Sornil)
Appendix: Porter's Algorithm
Glossary
References (more than 800)
Index
More information can be found in:
http://www.sims.berkeley.edu/~hearst/irbook
|
|
|
David Bainbridge, Craig G. Nevill-Manning, Ian H. Witten, Lloyd A. Smith, and
Rodger J. McNab.
Towards a digital library of popular music.
In Proceedings of the Fourth ACM International Conference on
Digital Libraries, 1999.
Digital libraries of music have the potential to capture popular
imagination in ways that more scholarly libraries cannot.
We are working towards a comprehensive digital library of
musical material, including popular music. We have developed
new ways of collecting musical material, accessing it
through searching and browsing, and presenting the results
to the user. We work with different representations of music:
facsimile images of scores, the internal representation of a
music editing program, page images typeset by a music editor,
MIDI files, audio files representing sung user input, and
textual metadata such as title, composer and arranger, and
lyrics. This paper describes a comprehensive suite of tools that we
have built for this project. These tools gather musical material,
convert between many of these representations, allow
searching based on combined musical and textual criteria,
and help present the results of searching and browsing. Although
we do not yet have a single fully-blown digital music
library, we have built several exploratory prototype collections
of music, some of them very large (100,000 tunes), and
critical components of the system have been evaluated.
|
|
|
David Bainbridge, John Thompson, and Ian H. Witten.
Assembling and enriching digital library collections.
In Proceedings of the Third ACM/IEEE-CS Joint Conference on
Digital Libraries, 2003.
People who create digital libraries need to gather together the raw material,
add metadata as necessary, and design and build new collections. This paper sets out the
requirements for these tasks and describes a new tool that supports them interactively, making
it easy for users to create their own collections from electronic files of all types. The process
involves selecting documents for inclusion, coming up with a suitable metadata set, assigning
metadata to each document or group of documents, designing the form of the collection in terms
of document formats, searchable indexes, and browsing facilities, building the necessary indexes
and data structures, and putting the collection in place for others to use. All these tasks are
supported within a modern point-and-click interaction paradigm. Although the tool is specific to the
Greenstone digital library software, the underlying ideas should prove useful in more general contexts.
|
|
|
M. Baker.
Changing communication environments in mosquitonet.
In Proceedings of the IEEE Workshop on Mobile Computing Systems
and Applications, Dec 1994.
|
|
|
M. Baker, X. Zhao, S. Cheshire, and J. Stone.
Supporting mobility in mosquitonet.
In Proceedings of the 1996 USENIX Conference, Jan 1996.
|
|
|
Scott Baker and John H. Hartman.
The gecko nfs web proxy.
In Proceedings of the Eighth International World-Wide Web
Conference, 1999.
The World-Wide Web provides remote access to
pages using its own naming scheme (URLs),
transfer protocol (HTTP),
and cache algorithms. Not only does using these
special-purpose mechanisms have performance
implications, but they
make it impossible for standard Unix
applications to access the Web. Gecko is a
system that provides access to the Web
via the NFS protocol. URLs are mapped to Unix
file names, providing unmodified applications
access to Web pages; pages
are transferred from the Gecko server to the
clients using NFS instead of HTTP,
significantly improving performance; and
NFS's cache consistency mechanism ensures that
all clients have the same version of a page.
Applications access pages as
they would Unix files. A client-side proxy
translates HTTP requests into file accesses,
allowing existing Web applications
to use Gecko. Experiments performed on our
prototype show that Gecko is able to provide
this additional functionality at a
performance level that exceeds that of HTTP.
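The URL-to-file-name mapping mentioned in this abstract can be illustrated
with a minimal sketch. The abstract does not give Gecko's actual mapping
rules, so the mount point `/gecko`, the default document name `index.html`,
and the function name `url_to_path` are all assumptions made for
illustration only.

```python
from urllib.parse import urlsplit, quote


def url_to_path(url, mount="/gecko"):
    """Map an http URL to a file path under a hypothetical NFS mount point.

    Illustrative only: the mount point and the index.html default are
    assumptions, not the mapping used by the Gecko prototype.
    """
    parts = urlsplit(url)
    if parts.scheme != "http" or not parts.hostname:
        raise ValueError("this sketch handles absolute http URLs only")

    # Keep a non-default port as part of the host directory name.
    host = parts.hostname
    if parts.port not in (None, 80):
        host = f"{host}:{parts.port}"

    # A directory-style URL is mapped to an assumed default document.
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"

    segments = []
    for seg in path.split("/"):
        if not seg:
            continue
        if seg in (".", ".."):
            raise ValueError("relative path segments are not allowed")
        # Percent-encode each segment so it is safe as a single file name.
        segments.append(quote(seg, safe=""))

    return "/".join([mount.rstrip("/"), host, *segments])
```

With such a mapping, an unmodified Unix tool could read a page with an
ordinary file open on the returned path. Query strings are ignored in this
sketch.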
|
|
|
Scott M. Baker and Bongki Moon.
Distributed cooperative web servers.
In Proceedings of the Eighth International World-Wide Web
Conference, 1999.
Traditional techniques for a distributed web
server design rely on manipulation of
central resources, such as routers or
DNS services, to distribute requests
designated for a single IP address to
multiple web servers. The goal of the
distributed
cooperative Web server (DCWS) system
development is to explore application-level
techniques for distributing web
content. We achieve this by dynamically
manipulating the hyperlinks stored within
the web documents themselves. The
DCWS system effectively eliminates the
bottleneck of centralized resources, while
balancing the load among distributed
web servers. DCWS servers may be located in
different networks, or even different
continents and still balance load
effectively. DCWS system design is fully
compatible with existing HTTP protocol
semantics and existing web client
software products.
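The hyperlink manipulation described in this abstract can be sketched as
follows. This is only an illustration of the general idea of redirecting
load by rewriting links inside served documents; the `migrated` table, the
regular expression, and the function name are hypothetical and are not taken
from the DCWS implementation.

```python
import re

# Matches site-relative links of the form href="/path". Real HTML would need
# a proper parser; a regular expression keeps the sketch short.
HREF = re.compile(r'href="(/[^"]*)"')


def rewrite_links(html, migrated):
    """Rewrite site-relative hrefs whose documents are hosted elsewhere.

    `migrated` maps a document path to the base URL of the co-operating
    server that currently hosts it (a hypothetical data structure).
    """
    def repl(match):
        path = match.group(1)
        base = migrated.get(path)
        if base is None:
            return match.group(0)      # document is still served locally
        target = base.rstrip("/") + path
        return 'href="' + target + '"'

    return HREF.sub(repl, html)
```

Because the client simply follows an ordinary absolute link, no router or
DNS change is needed, which is consistent with the abstract's claim of
compatibility with existing HTTP semantics and client software.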
|
|
|
M. Balabanovic and Y. Shoham.
Learning information retrieval agents: Experiments with automated web
browsing.
In AAAI spring symposium on Information Gathering, 1995.
The current exponential growth of the Internet precipitates
a need for new tools to help people cope with the volume of
information. To complement recent work on creating searchable indexes of
the World-Wide Web and systems for filtering incoming e-mail and Usenet
news articles, we describe a system which helps users keep abreast of new
and interesting information. Every day it presents a selection of
interesting web pages. The user evaluates each page, and given this
feedback the system adapts and attempts to produce better pages the
following day. We present some early results from an AI programming class
to whom this was set as a project, and then describe our current
implementation. Over the course of 24 days the output of our system was
compared to both randomly-selected and human-selected pages. It
consistently performed better than the random pages, and was better than
the human-selected pages half of the time.
|
|
|
M. Balabanovic and Y. Shoham.
Fab: content-based collaborative recommendation.
Communications of the ACM, 40(3):66-72, March 1997.
Online readers are in need of tools to help them cope with the
mass of content that is available on the World Wide Web. In
traditional media, readers are provided assistance in making
selections. This includes both implicit assistance in the
form of editorial oversight and explicit assistance in the
form of recommendation services such as movie reviews and
restaurant guides. The electronic medium offers new
opportunities to create recommendation services, ones that
adapt over time to track users' evolving interests. Fab is
such a recommendation system for the Web, and has been
operational in several versions since December 1994. By
combining both collaborative and content-based filtering
systems, Fab may eliminate many of the weaknesses found in
each approach.
|
|
|
M. Balabanovic, Y. Shoham, and Y. Yun.
An adaptive agent for automated web browsing.
Journal of Visual Communication and Image Representation, 6(4),
December 1995.
|
|
|
Marko Balabanovic.
An adaptive web page recommendation service.
In Proceedings of the First International Conference on
Autonomous Agents, pages 378-385, February 1997.
|
|
|
Marko Balabanovic.
Exploring versus exploiting when learning user models for text
recommendation.
User Modeling and User-Adapted Interaction (to appear), 8(1),
1998.
|
|
|
Marko Balabanovic.
An interface for learning multi-topic user profiles from implicit
feedback.
Technical Report SIDL-WP-1998-0089, Stanford University, 1998.
|
|
|
Marko Balabanovic.
The ``slider'' interface.
IBM interVisions, 11, February 1998.
|
|
|
Marko Balabanovic, Lonny L. Chu, and Gregory J. Wolff.
Storytelling with digital photographs.
In CHI '00: Proceedings of the SIGCHI conference on Human
factors in computing systems, pages 564-571, New York, NY, USA, 2000. ACM
Press.
|
|
|
Marko Balabanovic and Yoav Shoham.
Learning information retrieval agents: Experiments with automated web
browsing.
In Proceedings of the AAAI Spring Symposium on Information
Gathering from Heterogenous, Distributed Resources, 1995.
Format: Compressed PostScript
|
|
|
Marko Balabanovic and Yoav Shoham.
Combining content-based and collaborative recommendation.
Communications of the ACM, 40(3), March 1997.
|
|
|
Marko Balabanovic, Yoav Shoham, and Yeogirl Yun.
An adaptive agent for automated web browsing.
Journal of Visual Communication and Image Representation, 6(4),
December 1995.
You give the agent a profile. It looks at the Web for things of
interest and reports back. You give feedback.
|
|
|
Michelle Baldonado.
Searching, browsing, and metasearching with sensemaker.
Web Techniques Magazine, May 1997.
|
|
|
Michelle Baldonado, Chen-Chuan K. Chang, Luis Gravano, and Andreas Paepcke.
Metadata for digital libraries: Architecture and design rationale.
Technical Report SIDL-WP-1997-0055; 1997-26, Stanford University,
1997.
Accessible at http://dbpubs.stanford.edu/pub/1997-26.
In a distributed, heterogeneous, proxy-based digital
library, autonomous services and collections are accessed
indirectly via proxies. To facilitate metadata
compatibility and interoperability in such a digital
library, we have designed a metadata architecture that
includes four basic component classes: attribute model
proxies, attribute model translators, metadata facilities
for search proxies, and metadata repositories. Attribute
model proxies elevate both attribute sets and the
attributes they define to first-class objects. They also
allow relationships among attributes to be captured.
Attribute model translators map attributes and attribute
values from one attribute model to another (where
possible). Metadata facilities for search proxies provide
structured descriptions both of the collections to which
the search proxies provide access and of the search
capabilities of the proxies. Finally, metadata repositories
accumulate selected metadata from local instances of the
other three component classes in order to facilitate global
metadata queries and local metadata caching. In this paper,
we outline further the roles of these component classes,
discuss our design rationale, and analyze related work.
|
|
|
Michelle Baldonado, Chen-Chuan K. Chang, Luis Gravano, and Andreas Paepcke.
Metadata for digital libraries: Architecture and design rationale.
In Proceedings of the Second ACM International Conference on
Digital Libraries, pages 47-56, 1997.
At http://dbpubs.stanford.edu/pub/1997-26.
In a distributed, heterogeneous, proxy-based digital
library, autonomous services and collections are accessed
indirectly via proxies. To facilitate metadata compatibility
and interoperability in such a digital library, we have
designed a metadata architecture that includes four basic
component classes: attribute model proxies, attribute model
translators, metadata facilities for search proxies, and
metadata repositories. Attribute model proxies elevate both
attribute sets and the attributes they define to first-class
objects. They also allow relationships among attributes to
be captured. Attribute model translators map attributes and
attribute values from one attribute model to another (where
possible). Metadata facilities for search proxies provide
structured descriptions both of the collections to which the
search proxies provide access and of the search capabilities
of the proxies. Finally, metadata repositories accumulate
selected metadata from local instances of the other three
component classes in order to facilitate global metadata
queries and local metadata caching. In this paper, we
outline further the roles of these component classes,
discuss our design rationale, and analyze related work.
|
|
|
Michelle Baldonado, Chen-Chuan K. Chang, Luis Gravano, and Andreas Paepcke.
The Stanford Digital Library metadata architecture.
International Journal of Digital Libraries, 1(2), February
1997.
See also http://dbpubs.stanford.edu/pub/1997-56.
|
|
|
Michelle Baldonado, Steve Cousins, B. Lee, and Andreas Paepcke.
Notable: An annotation system for networked handheld devices.
In Proceedings of the Conference on Human Factors in Computing
Systems CHI'99, pages 210-211, 1999.
|
|
|
Michelle Baldonado, Seth Katz, Andreas Paepcke, Chen-Chuan K. Chang, Hector
Garcia-Molina, and Terry Winograd.
An extensible constructor tool for the rapid, interactive design of
query synthesizers.
In Proceedings of the Third ACM International Conference on
Digital Libraries, 1998.
Accessible at http://dbpubs.stanford.edu/pub/1998-48.
We describe an extensible constructor tool that helps
information experts (e.g., librarians) create
specialized query synthesizers for heterogeneous
digital-library environments. A query synthesizer
provides a graphical user interface in which a
digital-library patron can specify a high-level,
fielded, multi-source query. Furthermore, a query
synthesizer interacts with a query translator and an
attribute translator to transform high-level queries
into sets of source-specific queries. We discuss how
the constructor can facilitate discovery of available
attributes (e.g., title), collation of schemas from
different sources, selection of input widgets for a
synthesizer (e.g., a text box or a drop-down list
widget to support input of controlled vocabulary), and
other design aspects. We also describe a prototype
constructor we implemented, based on the Stanford
InfoBus and metadata architecture.
|
|
|
Michelle Q Wang Baldonado and Steve B. Cousins.
Addressing heterogeneity in the networked information environment.
New Review of Information Networking, 2:83-102, 1996.
Several ongoing Stanford University Digital Library projects address the
issue of heterogeneity in networked information environments. A networked
information environment has the following components: users, information
repositories, information services, and payment mechanisms. This paper
describes three of the heterogeneity-focused Stanford projects: InfoBus,
REACH, and DLITE. The InfoBus project is at the protocol level, while the
REACH and DLITE projects are both at the conceptual model level. The InfoBus
project provides the infrastructure necessary for accessing heterogeneous
services and utilizing heterogeneous payment mechanisms. The REACH project
sets forth a uniform conceptual model for finding information in networked
information repositories. The DLITE project presents a general task-based
strategy for building user interfaces to heterogeneous networked
information services.
|
|
|
Michelle Q Wang Baldonado and Terry Winograd.
Techniques and tools for making sense out of heterogeneous search
service results.
Technical Report SIDL-WP-1995-0019; 1995-59, Stanford University,
1995.
|
|
|
Michelle Q Wang Baldonado and Terry Winograd.
A user interaction model for browsing based on category-level
operations.
Technical Report SIDL-WP-1996-0029; 1996-75, Stanford University,
1996.
We propose a user interaction model for browsing based on iterative
category-level operations. The motivation comes from two observations:
1) people naturally think in terms of categories, and 2) in browsing, the
types of categories that are salient to users change as they browse. We
define a set of category-level operations that lets users iteratively view
and find results in terms of these changing category types. We also show
that we can express some standard IR operations as iteratively applied
sequences of a fundamental category-level operation (thus unifying them).
Finally, we describe SenseMaker, a prototype interface for browsing
heterogeneous sources.
|
|
|
Michelle Q Wang Baldonado and Terry Winograd.
SenseMaker: An information-exploration interface supporting the
contextual evolution of a user's interests.
In Proceedings of the Conference on Human Factors in Computing
Systems CHI'97, pages 11-18, Atlanta, Ga., March 1997. ACM Press, New York.
|
|
|
Sujata Banerjee and Vibhu O. Mittal.
On the use of linguistic ontologies for accessing and indexing
distributed digital libraries.
In Proceedings of the First Annual Conference on the Theory and
Practice of Digital Libraries, 1994.
Format: HTML Document.
Audience: Non-technical, on-line searchers.
References: 16.
Links: 1.
Relevance: Low.
Abstract: Addresses problem of finding correct keywords to search for
by using WordNet. If a search doesn't turn up the hits needed, it modifies
query by using synonyms, generalizing, or replacing with a set of more specific
words. Searcher is asked to approve modified queries, which are then re-sent to
content providers.
|
|
|
Gaurav Banga, Fred Douglis, and Michael Rabinovich.
Optimistic deltas for www latency reduction.
In Proceedings of USENIX Technical Conference, pages 289-303,
1997.
|
|
|
Ziv Bar-Yossef, Alexander Berg, Steve Chien, Jittat Fakcharoenphol, and Dror
Weitz.
Approximating aggregate queries about web pages via random walks.
In Proceedings of the Twenty-sixth International Conference on
Very Large Databases, 2000.
|
|
|
Ziv Bar-Yossef, Andrei Z. Broder, Ravi Kumar, and Andrew Tomkins.
Sic transit gloria telae: towards an understanding of the web's
decay.
In WWW '04: Proceedings of the 13th international conference on
World Wide Web, pages 328-337, New York, NY, USA, 2004. ACM Press.
The rapid growth of the web has been noted and tracked
extensively. Recent studies have however documented
the dual phenomenon: web pages have small half
lives, and thus the web exhibits rapid death as
well. Consequently, page creators are faced with an
increasingly burdensome task of keeping links
up-to-date, and many are falling behind. In addition
to just individual pages, collections of pages or
even entire neighborhoods of the web exhibit
significant decay, rendering them less effective as
information resources. Such neighborhoods are
identified only by frustrated searchers, seeking a
way out of these stale neighborhoods, back to more
up-to-date sections of the web; measuring the decay
of a page purely on the basis of dead links on the
page is too naive to reflect this frustration. In
this paper we formalize a strong notion of a decay
measure and present algorithms for computing it
efficiently. We explore this measure by presenting a
number of validations, and use it to identify
interesting artifacts on today's web. We then
describe a number of applications of such a measure
to search engines, web page maintainers,
ontologists, and individual users.
|
|
|
Albert-Laszlo Barabasi and Reka Albert.
Emergence of scaling in random networks.
Science, 286(5439):509-512, October 1999.
|
|
|
David Bargeron, Anoop Gupta, Jonathan Grudin, and Elizabeth Sanocki.
Annotations for streaming video on the web: System design and usage
studies.
In Proceedings of the Eighth International World-Wide Web
Conference, 1999.
Streaming video on the World Wide Web is being widely
deployed, and workplace training and distance education
are key applications. The ability to annotate video on
the Web can provide significant added value in these and
other areas. Written and spoken annotations can provide
`in context' personal notes and can enable asynchronous
collaboration among groups of users. With annotations,
users are no longer limited to viewing content passively
on the Web, but are free to add and share commentary and
links, thus transforming the Web into an interactive
medium. We discuss design considerations in constructing
a collaborative video annotation system, and we
introduce our prototype, called MRAS. We present
preliminary data on the use of Web-based annotations
for personal note-taking and for sharing notes in a
distance education scenario. Users showed a strong
preference for MRAS over pen-and-paper for taking notes,
despite taking longer to do so. They also indicated that
they would make more comments and questions with MRAS
than in a `live' situation, and that sharing added
substantial value.
|
|
|
Bruce R. Barkstrom, Melinda Finch, Michelle Ferebee, and Calvin Mackey.
Adapting digital libraries to continual evolution.
In Proceedings of the Second ACM/IEEE-CS Joint Conference on
Digital Libraries, 2002.
In this paper, we describe five investment streams (data
storage infrastructure, knowledge management, data production control,
data transport and security, and personnel skill mix) that need to be balanced
against short-term operating demands in order to maximize the probability of
long-term viability of a digital library. Because of the rapid pace of
information technology change, a digital library cannot be a static institution.
Rather, it has to become a flexible organization adapted to continuous evolution
of its infrastructure.
|
|
|
Kobus Barnard, Pinar Duygulu, David Forsyth, Nando de Freitas, David M. Blei,
and Michael I. Jordan.
Matching words and pictures.
J. Mach. Learn. Res., 3:1107-1135, 2003.
We present a new approach for modeling multi-modal data sets,
focusing on the specific case of segmented images with associated text.
Learning the joint distribution of image regions and words has many applications.
We consider in detail predicting words associated with whole images
(auto-annotation) and corresponding to particular image regions (region
naming). Auto-annotation might help organize and access large collections
of images. Region naming is a model of object recognition as a process of
translating image regions to words, much as one might translate from one
language to another. Learning the relationships between image regions and
semantic correlates (words) is an interesting example of multi-modal data
mining, particularly because it is typically hard to apply data mining
techniques to collections of images. We develop a number of models for the
joint distribution of image regions and words, including several which
explicitly learn the correspondence between regions and words. We study
multi-modal and correspondence extensions to Hofmann's hierarchical
clustering/aspect model, a translation model adapted from statistical
machine translation (Brown et al.), and a multi-modal extension to mixture
of latent Dirichlet allocation (MoM-LDA). All models are assessed using a
large collection of annotated images of real scenes. We study in depth the
difficult problem of measuring performance. For the annotation task, we
look at prediction performance on held out data. We present three alternative
measures, oriented toward different types of task. Measuring the performance
of correspondence methods is harder, because one must determine whether
a word has been placed on the right region of an image. We can use annotation
performance as a proxy measure, but accurate measurement requires hand labeled
data, and thus must occur on a smaller scale. We show results using both an
annotation proxy, and manually labeled data.
|
|
|
Kobus Barnard and David A. Forsyth.
Learning the semantics of words and pictures.
In Proceedings of the IEEE International Conference on Computer
Vision, July 2001.
|
|
|
Rob Barrett, Paul P. Maglio, and Daniel C. Kellem.
How to personalize the web.
In Proceedings of the Conference on Human Factors in Computing
Systems CHI'97, 1997.
|
|
|
Laura M. Bartolo, Cathy S. Lowe, Adam C. Powell IV, Donald R. Sadoway, Jorges
Vieyra, and Kyle Stemen.
Use of matml with software applications for e-learning.
In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2004.
This pilot project investigates facilitating the
development of the Semantic Web for e-learning through a practical
example, using Materials Property Data Markup Language (MatML) to
provide materials property data to a web-based application program.
Property data for 100 materials is marked up with MatML and used as an
input format for an application program. Students use the program to
generate graphs showing selected properties for different materials.
Selected graphs are submitted to the Materials Digital Library (MatDL)
so that successive classes may be informed by earlier work to encourage
new discoveries.
|
|
|
C. Batini, M. Lenzerini, and S. Navathe.
A comparative analysis of methodologies for database schema
integration.
ACM Computing Surveys, 18(4), 1986.
|
|
|
Patrick Baudisch and Ruth Rosenholtz.
Halo: a technique for visualizing off-screen objects.
In CHI '03: Proceedings of the SIGCHI conference on Human
factors in computing systems, pages 481-488, New York, NY, USA, 2003. ACM
Press.
|
|
|
E. Bauer, D. Koller, and Y. Singer.
Update rules for parameter estimation in Bayesian networks.
In Proceedings of the 13th Annual Conference on Uncertainty in
AI (UAI), 1997.
|
|
|
M. Bearman.
Odp-trader.
Open Distributed Processing, 2:19 - 33, 1994.
|
|
|
Herb Becker.
The role of the library of congress in the national digital library.
In Proceedings of DL'96, 1996.
Format: Not yet online.
|
|
|
Benjamin B. Bederson.
Photomesa: a zoomable image browser using quantum treemaps and
bubblemaps.
In Proceedings of the 14th annual ACM symposium on User
interface software and technology, pages 71-80. ACM Press, 2001.
|
|
|
Benjamin B. Bederson, Ben Shneiderman, and Martin Wattenberg.
Ordered and quantum treemaps: Making effective use of 2D space to
display hierarchies.
ACM Transactions on Graphics, 21(4):833-854, 2002.
|
|
|
Doug Beeferman, Adam Berger, and John D. Lafferty.
Statistical models for text segmentation.
Machine Learning, 34(1-3):177-210, 1999.
|
|
|
Alireza Behreman.
Generic electronic payment services.
In The Second USENIX Workshop on Electronic Commerce
Proceedings, 1996.
|
|
|
Alireza Behreman and Rajkumar Narayanaswamy.
Payment method negotiation service.
In The Second USENIX Workshop on Electronic Commerce
Proceedings, 1996.
|
|
|
M. Beigl and R. Rudisch.
System support for mobile computing.
Computers & Graphics, 20(5):619-625, 1996.
Today a mobile user wants to connect his portable
computer: remotely to the central database at home,
locally to the printer on the spot and globally to
the world-wide-web. To achieve this, different
connection lines are available: wireless networks
for connecting out in the fields, ISDN or analogue
telephone lines when residing in a hotel, Ethernet
access at the customer's site. But this
connectivity raises a lot of questions, about
technical, security or accounting issues. This
paper presents the architecture of an environment
aiming to support mobile users and dealing with the
given problems.
|
|
|
N.J. Belkin and W. Bruce Croft.
Information filtering and information retrieval: two sides of the same
coin?
Communications of the ACM, 35(12):29-38, December 1992.
A comparison is made between information retrieval and
information filtering. The authors determine that information
filtering is a well defined process. By examining its
foundations and comparing it to the foundations of the IR
enterprise, the authors find there is very little difference
between filtering and retrieval at an abstract level. They
conclude that the two enterprises have the same goal; namely
they are both concerned with getting information to people
who need it. However, the authors emphasize that IR research
has ignored some aspects of the general problem which both IR
and information filtering address, and that these aspects are
precisely those which are especially relevant to the specific
contexts of filtering.
|
|
|
Timothy C. Bell, Alistair Moffat, and Ian H. Witten.
Compressing the digital library.
In Proceedings of the First Annual Conference on the Theory and
Practice of Digital Libraries, 1994.
Format: HTML Document (32K) .
Audience: Semi-technical, general computer scientists.
References: 8.
Links: 1.
Relevance: Medium (but not mainstream DL).
Abstract: Discusses the interaction of compression and indexing.
Suggests a Huffman encoding applied to words & non-words. Inverted bitmap for
indexing, enhanced with Golomb encoding. Compressed 266 Mb Wall Street Journal
article database by 50%, including creating the index. Queries were processed
in less than .1 sec.
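The Golomb encoding mentioned in this note is commonly applied to the gaps
between successive document numbers in an inverted list. The sketch below
shows the standard Golomb code with truncated-binary remainders; the
function names and the choice of parameter `b` are illustrative
assumptions, and nothing here is taken from the paper's own implementation.

```python
def golomb_encode(gap, b):
    """Return the Golomb code (as a bit string) of a gap >= 1 with parameter b >= 1."""
    if gap < 1 or b < 1:
        raise ValueError("gap and b must be positive integers")
    q, r = divmod(gap - 1, b)
    bits = "1" * q + "0"              # quotient in unary: q ones, then a zero
    k = (b - 1).bit_length()          # ceil(log2(b))
    u = (1 << k) - b                  # number of short (k-1 bit) remainder codes
    if r < u:
        bits += format(r, "b").zfill(k - 1)
    elif k > 0:
        bits += format(r + u, "b").zfill(k)
    return bits


def golomb_decode(bits, b):
    """Decode a concatenation of Golomb codes back into the list of gaps."""
    k = (b - 1).bit_length()
    u = (1 << k) - b
    gaps, i = [], 0
    while i < len(bits):
        q = 0
        while bits[i] == "1":         # read the unary quotient
            q += 1
            i += 1
        i += 1                        # skip the terminating zero
        r = 0
        if k > 0:
            if k > 1:
                r = int(bits[i:i + k - 1], 2)
                i += k - 1
            if r >= u:                # long code: one more bit follows
                r = ((r << 1) | int(bits[i])) - u
                i += 1
        gaps.append(q * b + r + 1)
    return gaps


def postings_to_gaps(postings):
    """Turn a sorted list of document numbers (starting at 1) into gaps."""
    return [cur - prev for prev, cur in zip([0] + postings[:-1], postings)]
```

For example, the posting list 3, 7, 8, 20 becomes the gaps 3, 4, 1, 12,
which are then encoded one after another. Small gaps receive short codes,
which is why the scheme suits frequent terms when `b` is chosen to match the
term's density.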
|
|
|
M. Bellare, J.A. Garay, R. Hauser, A. Herzberg, H. Krawczyk, M. Steiner,
G. Tsudik, and M. Waidner.
iKP - a family of secure electronic payment protocols.
In Proceedings of the First USENIX Workshop on Electronic
Commerce, Berkeley, CA, USA, 1995. USENIX Assoc.
This paper proposes a family of protocols, iKP (i=1,2,3), for
secure electronic payments over the Internet. The protocols
implement credit card-based transactions between the customer
and the merchant while using the existing financial network
for clearing and authorization. The protocols can be extended
to apply to other payment models, such as debit cards and
electronic checks. They are based on public-key cryptography
and can be implemented in either software or hardware.
Individual protocols differ in key management complexity and
degree of security. It is intended that their deployment be
gradual and incremental. The iKP protocols are presented
herein with the intention to serve as a starting point for
eventual standards on secure electronic payment.
|
|
|
Jezekiel Ben-Arie, Purvin Pandit, and ShyamSundar Rajaram.
Design of a digital library for human movement.
In Proceedings of the First ACM/IEEE-CS Joint Conference on
Digital Libraries, 2001.
This paper is focused on a central aspect in the design of our
planned digital library for human movement, i.e. on the aspect of representation
and recognition of human activity from video data. The method of representation
is important since it has a major impact on the design of all the other building
blocks of our system such as the user interface/query block or the activity
recognition/storage block. In this paper we evaluate a representation method
for human movement that is based on sequences of angular poses and angular
velocities of the human skeletal joints, for storage and retrieval of human
actions in video databases. The choice of a representation method plays an
important role in the database structure, search methods, storage efficiency
etc. For this representation, we develop a novel approach for complex human
activity recognition by employing multidimensional indexing combined with
temporal or sequential correlation. This scheme is then evaluated with respect
to its efficiency in storage and retrieval.
For the indexing we use postures of humans in videos that are decomposed into
a set of multidimensional tuples which represent the poses/velocities of human
body parts such as arms, legs and torso. Three novel methods for human activity
recognition are theoretically and experimentally compared. The methods require
only a few sparsely sampled human postures. We also achieve speed invariant
recognition of activities by eliminating the time factor and replacing it with
sequence information. The indexing approach also provides robust recognition
and an efficient storage/retrieval of all the activities in a small set of hash
tables.
|
|
|
Israel Ben-Shaul, Michael Herscovici, Michal Jacovi, Yoelle S. Maarek, Dan
Pelleg, Menachem Shtalhaim, Vladimir Soroka, and Sigalit Ur.
Adding support for dynamic and focused search with fetuccino.
In Proceedings of the Eighth International World-Wide Web
Conference, 1999.
This paper proposes two enhancements to
existing search services over the Web. One
enhancement is the addition
of limited dynamic search around results
provided by regular Web search services, in
order to correct part of the
discrepancy between the actual Web and its
static image as stored in search repositories.
The second enhancement is
an experimental two-phase paradigm that allows
the user to distinguish between a domain query
and a focused query
within the dynamically identified domain. We
present Fetuccino, an extension of the
Mapuccino system that implements
these two enhancements. Fetuccino provides an
enhanced user-interface for visualization of
search results, including
advanced graph layout, display of structural
information and support for standards (such as
XML). While Fetuccino
has been implemented on top of existing search
services, its features could easily be
integrated into any search engine
for better performance. A light version of
Fetuccino is available on the Internet at
http://www.ibm.com/java/fetuccino.
|
|
|
Tamara L. Berg, Alexander C. Berg, Jaety Edwards, Michael Maire, Ryan White,
Yee-Whye Teh, Erik Learned-Miller, and D.A. Forsyth.
Names and faces in the news.
In CVPR 2004: Conference on Computer Vision and Pattern
Recognition. IEEE Computer Society, 2004.
|
|
|
Donna Bergmark.
Collection synthesis.
In Proceedings of the Second ACM/IEEE-CS Joint Conference on
Digital Libraries, 2002.
The invention of the hyperlink and the HTTP transmission protocol
caused an amazing new structure to appear on the Internet - the World Wide Web.
With the Web, there came spiders, robots, and Web crawlers, which go from one
link
to the next checking Web health, ferreting out information and resources, and
imposing organization on the huge collection of information (and dross)
residing on the net. This paper reports on the use of one such crawler to
synthesize document collections on various topics in science, mathematics,
engineering and technology. Such collections could be part of a digital
library.
|
|
|
Howard Besser.
Mesl project description.
In Proceedings of DL'96, 1996.
Format: Not yet online.
|
|
|
Krishna Bharat and Andrei Broder.
Mirror, mirror on the web: A study of host pairs with replicated
content.
In Proceedings of the Eighth International World-Wide Web
Conference, 1999.
Two previous studies, one done at Stanford in 1997 based on data
collected by the Google
search engine, and one done at Digital in 1996 based on AltaVista data,
revealed that almost a third of the Web consists of duplicate pages. Both
studies
identified mirroring, that is, the systematic
replication of content over a pair of hosts, as
the principal cause of duplication, but did not further investigate this
phenomenon. The main aim of this paper is to
present a clearer picture of mirroring
on the Web. As input we used a set of 179
million URLs found during a Web crawl done in
the summer of 1998. We looked at all hosts with more than 100 URLs in
our input (about 238,000), and discovered that
about 10the prevalence of mirroring based on a
mirroring classification scheme that we define. There are numerous reasons for
mirroring: technical (e.g., to improve access
time), commercial (e.g., different intermediaries offering the same products),
cultural (e.g., same content in two languages),
social (e.g., sharing of research data), and so forth. Although we have not done
an exhaustive study of the causes of replication, we discuss and provide
examples for several representative cases. Our
technique for detecting mirrored hosts from
large sets of collected URLs depends mostly on the syntactic analysis of URL
strings, and requires retrieval and content
analysis only for a small number of pages. We are able to detect both
partial and total mirroring, and handle cases
where the content is not byte-wise identical. Furthermore, our technique is
computationally very efficient and does not
assume that the initial set of URLs gathered from each host is comprehensive.
Hence, this approach has practical uses beyond our study, and can be applied in
other settings. For instance, for Web crawlers
and caching proxies, detecting mirrors can be
valuable to avoid redundant fetching, and knowledge of mirroring can be
used to compensate for broken links.
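
The abstract above says that mirror detection relies mostly on syntactic analysis of URL strings. The sketch below shows one simple way such an analysis could look: group the collected paths by host and score host pairs by the overlap of their path sets. The Jaccard score, the threshold, and the function names are assumptions for illustration and are not the classification scheme defined in the paper.

```python
from collections import defaultdict
from itertools import combinations
from urllib.parse import urlsplit


def paths_by_host(urls):
    """Map each host name to the set of URL paths seen on it."""
    hosts = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        if parts.hostname:
            hosts[parts.hostname].add(parts.path or "/")
    return hosts


def candidate_mirrors(urls, threshold=0.5):
    """Return (host_a, host_b, score) for host pairs with similar path sets.

    The score is the Jaccard overlap of the two path sets; the threshold
    is an assumed cut-off, not a value from the paper.
    """
    hosts = paths_by_host(urls)
    pairs = []
    for a, b in combinations(sorted(hosts), 2):
        shared = hosts[a] & hosts[b]
        score = len(shared) / len(hosts[a] | hosts[b])
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs
```

Comparing every pair of hosts is quadratic, so a collection on the scale described in the abstract would need some way of narrowing the candidate pairs first; the sketch only illustrates the purely syntactic comparison, which needs no page retrieval.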
|
|
|
Krishna Bharat, Andrei Broder, Monika Henzinger, Puneet Kumar, and Suresh
Venkatasubramanian.
The connectivity server: Fast access to linkage information on the
web.
In Proceedings of the Seventh International World-Wide Web
Conference, April 1998.
|
|
|
B. Bhushan et al.
Managing heterogeneous networks - integrator-based approach.
In IFIP Transactions C (Communication Systems), 1993.
The authors discuss an object oriented approach to
network management. Their goal is to briefly explain
a real example of an integrated network management
(INM) system. One of the major requirements when
looking at information transfer between the managed
network and the management system is to mask the
heterogeneity of the underlying resources. As an
example of the unification of heterogeneous
networks, software called the Integrator has been
designed and implemented. The Integrator is a
mechanism that provides an object oriented interface
to the user (human or network management application
programs) to offer a homogeneous view of a world
(set of heterogeneous domains) through a model
(depicting a formal information view). The
Integrator uses two agents to communicate with
underlying network elements: an SNMP agent accessing
TCP/IP parameters for an Ethernet network through a
SNMP agent, and an X.25 interface program doing the
same for X.25 parameters through proprietary
management software. The concepts of the Integrator
have been applied in the EC project PEMMON.
|
|
|
Timothy W. Bickmore and Bill N. Schilit.
Digestor: Device-independent access to the world wide web.
In Proceedings of the Sixth International World-Wide Web
Conference, 1997.
|
|
|
Eric Bier, Lance Good, Kris Popat, and Alan Newberger.
A document corpus browser for in-depth reading.
In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2004.
Software tools, including Web browsers, e-books,
electronic document formats, search engines, and digital libraries are
changing the way that people read, making it easier for them to find
and view documents. However, while these tools provide significant help
with short-term reading projects involving small numbers of documents,
they fall short of supporting readers engaged in longer-term reading
projects, in which a topic is to be understood in-depth by reading many
documents. Such readers need to find and manage many documents and
citations, remember what they have read, and prioritize what to read
next. In this paper, we describe three integrated software tools that
facilitate in-depth reading. A first tool extracts citation information
from documents. A second finds on-line documents from their citations.
The last is a document corpus browser that uses a zoomable user
interface to show a corpus at multiple granularities while supporting
reading tasks that take days, weeks, or longer. We describe these tools
and the design principles that motivated them.
|
|
|
Eric A. Bier and Adam Perer.
Icon abacus and ghost icons.
In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2005.
We present two techniques that make document collection visualizations more informative. Icon abacus uses the horizontal position of icon groups to communicate document attributes. Ghost icons show linked documents by adding temporary icons and by highlighting or dimming existing ones.
|
|
|
William P. Birmingham.
An agent-based architecture for digital libraries.
D-Lib Magazine, July 1995.
Format: HTML Document().
|
|
|
William P. Birmingham, Karen M. Drabenstott, Carolyn O. Frost, Amy J. Warner,
and Katherine Willis.
The university of michigan digital library: This is not your father's
library.
In Proceedings of the First Annual Conference on the Theory and
Practice of Digital Libraries, 1994.
Format: HTML Document (36K).
Audience: slightly technical, generalist comfortable with technology,
funders.
References: 13.
Links: 1.
Relevance: Medium-High.
Abstract: Describes the UMichigan Digital Libraries proposal, including
some detail about their agent architecture. User agents, Collection-interface
agents, and mediators all play a role. Network resources are allocated on
a market-based mechanism, and proposal mentions need to protect intellectual
property & handle payment issues.
|
|
|
William P. Birmingham, Edmund H. Durfee, Tracy Mullen, and Michael P. Wellman.
The distributed agent architecture of the university of michigan
digital library (extended abstract).
In AAAI Spring Symposium on Information Gathering, 1995.
Format: Compressed PostScript().
|
|
|
Ann Peterson Bishop.
Working towards an understanding of digital library use: A report on
the user research efforts of the nsf/arpa/nasa dli projects.
D-Lib Magazine, October 1995.
Format: HTML Document().
|
|
|
Ann Peterson Bishop.
Making digital libraries go: Comparing use across genres.
In Proceedings of the Fourth ACM International Conference on
Digital Libraries, 1999.
A new federal initiative called Information Technology for the
Twenty-First Century (IT2) recognizes the need to bridge
research across domains in order to bring computing benefits to
society at large. One implication for digital library (DL)
research is that we should start looking at projects that span the
spectrum from basic computer science to the implementation of
working systems and consider links among findings on
information system use from a variety of arenas in life. In this
paper, I integrate findings from my research on people's
encounters with DLs in two different arenas: academia and low-income
neighborhoods. The point is to see how concepts and
conclusions related to use do, in fact, cross these arenas. The
paper also aims to help bring results from studies of local
community information practices into the realm of DLs, since
community networking represents one particular genre and
audience that has not yet received a great deal of attention from
those engaged in DL research. Beginning with a discussion of
DL use as an assemblage of infrastructure, norms, knowledge,
and practice, the paper explores a number of insights gleaned
from user studies associated with two separate research projects:
1) the recently completed University of Illinois Digital
Libraries Initiative (DLI) project; and 2) the Community
Networking Initiative (CNI) currently in progress under the
auspices of the University of Illinois, the Urban League of
Champaign County and Prairienet, the community network
serving East Central Illinois. Insights about DL use discussed
in this paper include: the way in which trivial barriers are
magnified until they effectively cut off use on a large scale; the
difficulties faced by outsiders whose information worlds are
impoverished; the primacy of comfort and relevant content in
encouraging use; and the importance of informal social
networks for providing help related to system use.
|
|
|
Barclay Blair and John Boyer.
Xfdl: Creating electronic commerce transaction records using xml.
In Proceedings of the Eighth International World-Wide Web
Conference, 1999.
In the race to transform the World Wide Web
from a medium for information presentation to a
medium for information
exchange, the development of practices for
ensuring the security, auditability, and non-
repudiation of transactions that are
well established in the paper-based world has
not kept pace in the digital world. Existing
Internet technology provides
no easy way to create a valid `digital receipt'
that meets the requirements of both complex
distributed networks and the
business community. In addition, an improved
articulation of digital signatures is needed.
Extensible Forms Description
Language (XFDL), developed by UWI.Com and Tim
Bray, is an application of XML that allows
organizations to move
their paper-based forms systems to the Internet
while maintaining the necessary attributes of
paper-based transaction
records. XFDL was designed for implementation
in business-to-business electronic commerce and
intra-organizational
information transactions.
|
|
|
Catherine Blake.
Information synthesis: A new approach to explore secondary
information in scientific literature.
In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2005.
Advances in both technology and publishing practices continue to increase the quantity of scientific literature that is available electronically. In this paper, we introduce the Information Synthesis process, a new approach that enables scientists to visualize, explore, and resolve contradictory findings that are inevitable when multiple empirical studies explore the same natural phenomena. Central to the Information Synthesis approach is a cyber-infrastructure that provides a scientist with both secondary information from an article and structured information resources. To demonstrate this approach, we have developed the Multi-User, Information Extraction for Information Synthesis (METIS) System. METIS is an interactive system that automates critical tasks within the Information Synthesis process. We provide two case-studies that demonstrate the utility of the Information Synthesis approach.
|
|
|
J.A. Blakeley, W.J. McKenna, and G. Graefe.
Experiences building the open oodb query optimizer.
In Proceedings of the International Conference on Management of
Data, 1993.
The authors report their experiences building the query
optimizer for TI's Open OODB system. It is probably the
first working object query optimizer to be based on a
complete extensible optimization framework including logical
algebra, execution algorithms, property enforcers, logical
transformation rules, implementation rules, and selectivity
and cost estimation. Their algebra incorporates a new
materialize operator with its corresponding logical
transformation and implementation rules that enable the
optimization of path expressions. The Open OODB query
optimizer was constructed using the Volcano Optimizer
Generator, demonstrating that this second-generation
optimizer generator enables rapid development of efficient
and effective query optimizers for non-standard data models
and systems.
|
|
|
Ann Blandford, Suzette Keith, Iain Connell, and Helen Edwards.
Analytical usability evaluation for digital libraries: a case study.
In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2004.
There are two main kinds of approach to
considering usability of any system: empirical and analytical.
Empirical techniques involve testing systems with users, whereas
analytical techniques involve usability personnel assessing systems
using established theories and methods. We report here on a set of
studies in which four different techniques were applied to various
digital libraries, focusing on the strengths, limitations and scope of
each approach. Two of the techniques, Heuristic Evaluation and
Cognitive Walkthrough, were applied in text-book fashion, because there
was no obvious way to contextualize them to the Digital Libraries (DL)
domain. For the third, Claims Analysis, it was possible to develop a
set of re-usable scenarios and personas that relate the approach
specifically to DL development. The fourth technique, CASSM, relates
explicitly to the DL domain by combining empirical data with an
analytical approach. We have found that Heuristic Evaluation and
Cognitive Walkthrough only address superficial aspects of interface
design (but are good for that), whereas Claims Analysis and CASSM can
help identify deeper conceptual difficulties (but demand greater skill
of the analyst). However, none fit seamlessly within the fragmented
function-oriented design practices that typify much digital library
development, highlighting an important area for further work to support
improved usability.
|
|
|
Ann Blandford, Hanna Stelmaszewska, and Nick Bryan-Kinns.
Use of multiple digital libraries: A case study.
In Proceedings of the First ACM/IEEE-CS Joint Conference on
Digital Libraries, 2001.
The aim of the work reported here was to better understand
the usability issues raised when digital libraries are used in a natural
setting. The method used was a protocol analysis of users working on a
task of their own choosing to retrieve documents from publicly available
digital libraries. Various classes of usability difficulties were found.
Here, we focus on use in context - that is, usability concerns that arise
from the fact that libraries are accessed in particular ways, under
technically and organisationally imposed constraints, and that use of
any particular resource is discretionary. The concepts from an Interaction
Framework, which provides support for reasoning about patterns of
interaction between users and systems, are applied to understand interaction
issues.
|
|
|
R. Boisvert, S. Browne, J. Dongarra, and E. Grosse.
Digital software and data repositories for support of scientific
computing.
In Advances in Digital Libraries '95, 1995.
Format: Not Yet Online.
|
|
|
Kurt D. Bollacker, Steve Lawrence, and C. Lee Giles.
A system for automatic personalized tracking of scientific literature
on the web.
In Proceedings of the Fourth ACM International Conference on
Digital Libraries, 1999.
We introduce a system as part of the CiteSeer digital library
project for automatic tracking of scientific literature that is
relevant to a user's research interests. Unlike previous systems
that use simple keyword matching, CiteSeer is able to
track and recommend topically relevant papers even when
keyword based query profiles fail. This is made possible
through the use of a heterogeneous profile to represent user
interests. These profiles include several representations, including
content based relatedness measures. The CiteSeer
tracking system is well integrated into the search and browsing
facilities of CiteSeer, and provides the user with great
flexibility in tuning a profile to better match his or her interests.
The software for this system is available, and a sample
database is online as a public service.
|
|
|
Leslie Bondaryk.
Calculus modules online: An internet multimedia application.
In DAGS'95, 1995.
Format: HTML Document(21K + pictures)
Audience: Calculus Instructors.
References: 13.
Links: 16.
Abstract: Discusses an architecture for a system that aids in the
teaching of calculus.
|
|
|
J. Bonigk and A. Lubinski.
A basic architecture for mobile information access.
Computers & Graphics, 20(5):683-91, 1996.
As the development of 'pen computing' continues, more and more
of today's computers are likely to move gradually away from
people's desktops and into their pockets. The development of
personal digital assistants (PDAs) has initiated this move.
As these devices move into people's pockets, they need the
ability to access information on the move. This article
describes a generic view of a client server mobile computing
architecture. It also sheds some light on the basic network
topologies that have been considered previously for such
systems. The scenario used is a hospital ward. Each doctor is
equipped with a PDA and each ward or a group of wards with a
server providing patient records. As a doctor visits a
patient in a ward, the patient's record is accessed from the
server onto the PDA. The doctor updates the record and sends
the update back to the server.
|
|
|
José Borbinha, Nuno Freire, and João Neves.
Bnd: A national digital library as a jigsaw.
In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2004.
This paper describes the architecture and
components of the infrastructure in construction for the National
Digital Library in Portugal. The requirements emerged from the
definition of the services to support, with a special focus on
scalability, and from the decision to give a special attention to
community building standards, open solutions, and reusable and cost
effective components. The generic bibliographic metadata format in this
project is UNIMARC, and the structural metadata is METS. The URN
identifiers are processed and resolved as simple but very effective
PURL identifiers, and the storage is provided by the new emerging
LUSTRE file system, for immediate access, and by a locally developed
GRID architecture, ARCO, for long term preservation. All these
components run on Linux servers, as does the middleware for access,
which is based on the FEDORA framework.
|
|
|
N. Borenstein and N. Freed.
MIME (Multipurpose Internet Mail Extensions) Part One:
Mechanisms for specifying and describing the format of Internet message
bodies, September 1993.
Internet RFC 1521.
|
|
|
Nathaniel Borenstein.
Cooperative work in the andrew message system.
In Proceedings of the Conference on Computer-Supported
Cooperative Work, CSCW'88, 1988.
Describes collab-related aspects of Andrew.
|
|
|
Christine L. Borgman, Gregory Leazer, Anne Gilliland-Swetland, Kelli Millwood,
Leslie Champeny, Jason Finley, and Laura J. Smart.
How geography professors select materials for classroom lectures:
Implications for the design of digital libraries.
In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2004.
A goal of the Alexandria Digital Earth Prototype
(ADEPT) project is to make primary resources in geography useful for
undergraduate instruction in ways that will promote inquiry learning.
The ADEPT education and evaluation team interviewed professors about
their use of geography information as they prepare for class lectures,
as compared to their research activities. We found that professors
desired the ability to search by concept (erosion, continental drift,
etc.) as well as geographic location, and that personal research
collections were an important source of instructional materials.
Resources in geo-spatial digital libraries are typically described by
location, but are rarely described by concept or educational
application. This paper presents implications for the design of an
educational digital library from our observations of the lecture
preparation process. Findings include functionality requirements for
digital libraries and implications for the notion of digital libraries
as a shared information environment. The functional requirements
include definitions and enhancements of searching capabilities, the
ability to contribute and to share personal collections of resources,
and the capability to manipulate data and images.
|
|
|
Katy Borner, Ying Feng, and Tamara McMahon.
Collaborative visual interfaces to digital libraries.
In Proceedings of the Second ACM/IEEE-CS Joint Conference on
Digital Libraries, 2002.
This paper argues for the design of collaborative visual
interfaces
to digital libraries that support social navigation. As an illustrative example
we present work in progress on the design of a three-dimensional document space
for a scholarly community - namely faculty, staff, and students at the School of
Library and Information Science, Indiana University. We conclude with a set of
research challenges.
|
|
|
C. Mic Bowman, Peter B. Danzig, Darren R. Hardy, Udi Manber, Michael F.
Schwartz, and Duane P. Wessels.
Harvest: A scalable, customizable discovery and access system.
Technical Report CU-CS-732-94, Dept. of Computer Science, Univ. of
Colorado, Boulder, Colo., August 1994.
Accessible at http://harvest.transarc.com/.
|
|
|
C.M. Bowman, Peter B. Danzig, Darren R. Hardy, Udi Manber, and Michael F.
Schwartz.
The harvest information discovery and access system.
Computer Networks and ISDN Systems, 28(1-2):119-125, December
1995.
It is increasingly difficult to make effective use of
Internet information, given the rapid growth in data
volume, user base, and data diversity. We introduce
Harvest, a system that provides a scalable, customizable
architecture for gathering, indexing, caching,
replicating, and accessing Internet information.
|
|
|
Claus Brabrand, Anders Moller, Anders Sandholm, and Michael I. Schwartzbach.
A runtime system for interactive web services.
In Proceedings of the Eighth International World-Wide Web
Conference, 1999.
Interactive Web services are increasingly
replacing traditional static Web pages.
Producing Web services seems to
require a tremendous amount of laborious
low-level coding due to the primitive nature
of CGI programming. We present
ideas for an improved runtime system for
interactive Web services built on top of CGI
running on virtually every
combination of browser and HTTP/CGI server.
The runtime system has been implemented and
used extensively in
<bigwig>, a tool for producing interactive
Web services.
|
|
|
Onn Brandman, Junghoo Cho, Hector Garcia-Molina, and Narayanan Shivakumar.
Crawler-friendly web servers.
In Proceedings of the Workshop on Performance and Architecture
of Web Servers (PAWS), Santa Clara, California, June 2000.
Held in conjunction with ACM SIGMETRICS 2000. Available at
http://dbpubs.stanford.edu/pub/2000-25.
In this paper we study how to make web servers (e.g.,
Apache) more crawler friendly. Current web servers
offer the same interface to crawlers and regular web
surfers, even though crawlers and surfers have very
different performance requirements. We evaluate simple
and easy-to-incorporate modifications to web servers so
that there are significant bandwidth
savings. Specifically, we propose that web servers
export meta-data archives describing their content.
|
|
|
Onn Brandman, Hector Garcia-Molina, and Andreas Paepcke.
Where have you been? a comparison of three web tracking technologies.
Submitted for publication, 1999.
Available at http://dbpubs.stanford.edu/pub/1999-61.
Web searching and browsing can be improved if browsers and search
engines know which pages users frequently visit. 'Web
tracking' is the process of gathering that
information. The goal for Web tracking is to obtain a
database describing Web page download times and users'
page traversal patterns. The database can then be used
for data mining or for suggesting popular or relevant
pages to other users. We implemented three Web tracking
systems, and compared their performance. In the first
system, rather than connecting directly to Web sites, a
client issues URL requests to a proxy. The proxy
connects to the remote server and returns the data to
the client, keeping a log of all transactions. The
second system uses sniffers to log all HTTP traffic
on a subnet. The third system periodically collects
browser log files and sends them to a central
repository for processing. Each of the systems differs
in its advantages and pitfalls. We present a comparison
of these techniques.
|
|
|
Jack Brassil.
September - secure electronic publishing trial.
In Proceedings of DL'96, 1996.
Format: Not yet online.
|
|
|
Lee Breslau, Pei Cao, Li Fan, Graham Phillips, and Scott Shenker.
Web caching and zipf-like distributions: Evidence and implications.
In Proceedings of Infocom, 1999.
|
|
|
Allen Brewer, Wei Ding, Karla Hahn, and Anita Komlodi.
The role of intermediary services in emerging digital libraries.
In Proceedings of DL'96, 1996.
Format: Not yet online.
|
|
|
M.W. Bright, A.R. Hurson, and S. Pakzad.
Automated resolution of semantic heterogeneity in multidatabases.
ACM Transactions on Database Systems, 19(2):212-253, June 1994.
|
|
|
M.W. Bright, A.R. Hurson, and Simin H. Pakzad.
A taxonomy and current issues in multidatabase systems.
IEEE Computer, 25(3):51-60, March 1992.
This article presents a taxonomy of global
information-sharing systems and discusses where
multidatabase systems fit in the spectrum of
solutions. The authors use this taxonomy as a basis
for defining multidatabase systems, then discuss the
issues associated with them. In particular, the
paper focuses on two major design approaches:
global schema systems and multidatabase language systems.
|
|
|
Brightplanet.com.
http://www.brightplanet.com.
|
|
|
The Deep Web: Surfacing Hidden Value.
http://www.completeplanet.com/Tutorials/DeepWeb/.
|
|
|
S. Brin and L. Page.
The anatomy of a large-scale hypertextual web search engine.
In Proceedings of 7th World Wide Web Conference, 1998.
In this paper, we present Google, a prototype of a
large-scale search engine which makes heavy use of
the structure present in hypertext. Google is designed
to crawl and index the Web efficiently and produce
much more satisfying search results than existing
systems. The prototype with a full text and hyperlink
database of at least 24 million pages is available at
http://google.stanford.edu/
To engineer a search engine is a challenging task.
Search engines index tens to hundreds of millions of
web pages involving a comparable number of distinct
terms. They answer tens of millions of queries every
day. Despite the importance of large-scale search
engines on the web, very little academic research has
been done on them. Furthermore, due to rapid advance
in technology and web proliferation, creating a web
search engine today is very different from three years
ago. This paper provides an in-depth description of
our large-scale web search engine - the first such
detailed public description we know of to date.
Apart from the problems of scaling traditional
search techniques to data of this magnitude, there are
new technical challenges involved with using the
additional information present in hypertext to
produce better search results. This paper addresses
this question of how to build a practical large-scale
system which can exploit the additional information
present in hypertext. Also we look at the problem of
how to effectively deal with uncontrolled hypertext
collections where anyone can publish anything they
want.
|
|
|
Sergey Brin, James Davis, and Hector Garcia-Molina.
Copy detection mechanisms for digital documents.
SIGMOD, pages 398-409, 1995.
In a digital library system, documents are available in
digital form and therefore are more easily copied and
their copyrights are more easily violated. This is a
very serious problem, as it discourages owners of
valuable information from sharing it with authorized
users. There are two main philosophies for addressing
this problem: prevention and detection. The former
actually makes unauthorized use of documents difficult
or impossible while the latter makes it easier to
discover such activity. We propose a system for
registering documents and then detecting copies, either
complete copies or partial copies. We describe
algorithms for such detection, and metrics required for
evaluating detection mechanisms (covering accuracy,
efficiency, and security). We also describe a working
prototype, called COPS, describe implementation issues,
and present experimental results that suggest the proper
settings for copy detection parameters.
|
|
|
Sergey Brin.
Extracting patterns and relations from the world wide web.
In WebDB Workshop at 6th International Conference on Extending
Database Technology, EDBT'98, 1998.
Available at http://www-db.stanford.edu/~sergey/extract.ps.
Seed a search with examples of a pattern, such as
citations to books. Let the engine run over Web pages
and learn. Get back more books.
|
|
|
Sergey Brin and Lawrence Page.
The anatomy of a large-scale hypertextual web search engine.
In Proceedings of the Seventh International World-Wide Web
Conference, 1998.
Shows architecture of Google.
|
|
|
Sergey Brin and Lawrence Page.
Dynamic data mining: A new architecture for data with high
dimensionality.
Technical report, Stanford University, 1998.
Describes a new architecture for data mining. It
makes use of some of the dynamic itemset counting
technology.
|
|
|
Andrei Broder, Ravi Kumar, Farzin Maghoul, Prabhakar Raghavan, Sridhar
Rajagopalan, Raymie Stata, Andrew Tomkins, and Janet Wiener.
Graph structure in the web: experiments and models.
In Proceedings of the Ninth International World-Wide Web
Conference, 2000.
|
|
|
Eric W. Brown, James P. Callan, and W. Bruce Croft.
Fast incremental indexing for full-text information retrieval.
In Proceedings of the Twentieth International Conference on Very
Large Databases, pages 192-202, September 1994.
|
|
|
Eric W. Brown, James P. Callan, W. Bruce Croft, and J. Eliot B. Moss.
Supporting full-text information retrieval with a persistent object
store.
In Proceedings of the Fourth International Conference on
Extending Database Technology-EDBT'94, pages 365-378, March 1994.
|
|
|
Michael S. Brown and W. Brent Seales.
Beyond 2d images: Effective 3d imaging for library materials.
In Proceedings of the Fifth ACM International Conference on
Digital Libraries, 2000.
Significant efforts are being made to digitize rare and
valuable library materials, with the goal of providing patrons and
historians digital facsimiles that capture the look and feel of the
original materials. This is often done by digitally photographing
the materials and making high resolution 2D images available. The
underlying assumption is that the objects are flat. However,
older materials may not be flat in practice, being warped and
crinkled due to decay, neglect, accident and the passing of time.
In such cases, 2D imaging is insufficient to capture the look
and feel of the original. For these materials, 3D acquisition is
necessary to create a realistic facsimile. This paper outlines
a technique for capturing an accurate 3D representation of library
materials which can be integrated directly into current digitization
setups. This will allow digitization efforts to provide patrons
with more realistic digital facsimiles of library materials.
|
|
|
Michael S. Brown and W. Brent Seales.
The digital atheneum: New approaches for preserving, restoring and
analyzing damaged manuscripts.
In Proceedings of the First ACM/IEEE-CS Joint Conference on
Digital Libraries, 2001.
This paper presents research focused on developing new
techniques and algorithms for the digital acquisition, restoration, and
study of damaged manuscripts. We present results from an acquisition
effort in partnership with the British Library, funded through the NSF
DLI-2 program, designed to capture 3-D models of old and damaged
manuscripts. We show how these 3-D facsimiles can be analyzed and
manipulated in ways that are tedious or even impossible if confined to
the physical manuscript. In particular, we present results from a
restoration framework we have developed for flattening the 3-D
representation of badly warped manuscripts. We expect these research
directions to give scholars more sophisticated methods to preserve,
restore, and better understand the physical objects they study.
|
|
|
Michael S. Brown and Desmond Tsoi.
Correcting common image distortions in library materials acquired by
a camera [short paper].
In Proceedings of the Third ACM/IEEE-CS Joint Conference on
Digital Libraries, 2003.
We present a technique to correct image distortion that can occur when
library materials are imaged by cameras. Our approach provides a general framework to
undo a variety of common distortions, including binder curl, fold distortion, and
combinations of the two. Our algorithm is described and demonstrated on several examples.
|
|
|
Shirley Browne, Jack Dongarra, Eric Grosse, and Tom Rowan.
The netlib mathematical software repository.
D-Lib Magazine, Sep 1995.
Format: HTML Document.
|
|
|
Shirley Browne, Jack Dongarra, Ken Kennedy, and Tom Rowan.
Management of the national hpcc software exchange - a virtual
distributed digital library.
In Proceedings of the Second Annual Conference on the Theory and
Practice of Digital Libraries, 1995.
Format: HTML Document (35K + pictures).
Audience: Computer scientists, mathematicians, librarians.
References: 15.
Links: 13.
Relevance: Low-Medium.
Abstract: Describes the NHSE software repository, with files kept at
authors' sites, but a central index in a common form (prepared manually now, but
hopefully automatically later). Includes a special process for submission
and revision via web forms, with digital signatures (PGP) required for
authentication. Accepted files are fingerprinted using MD5 so that
modifications can be detected. A scheme of LIFNs (Location Independent
FileNames) is essentially a precursor to URNs.
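The MD5 fingerprinting step mentioned in this abstract can be illustrated with a minimal sketch. It is not the NHSE implementation: the function names and the sample file contents are hypothetical, and only Python's standard `hashlib` module is assumed.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the hex MD5 digest of a file's contents."""
    return hashlib.md5(data).hexdigest()


def is_modified(data: bytes, recorded: str) -> bool:
    """True if the current contents no longer match the recorded fingerprint."""
    return fingerprint(data) != recorded


# Hypothetical accepted file: the repository records its fingerprint once.
accepted = b"example library source v1.0\n"
recorded = fingerprint(accepted)

# Later, any change to the bytes yields a different digest.
tampered = accepted + b"extra line\n"
unchanged_ok = not is_modified(accepted, recorded)
tamper_detected = is_modified(tampered, recorded)
```

MD5 is used here only because the abstract names it; it detects accidental changes but is no longer considered safe against deliberate collisions.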
|
|
|
Peter Brusilovsky, Rosta Farzan, and Jaewook Ahn.
Comprehensive personalized information access in an educational
digital library.
In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2005.
This paper explores two ways to help students locate the most relevant resources in educational digital libraries. One is more comprehensive access to educational resources through several modes of information access, including browsing and information visualization. The other is personalized information access through social navigation support. The paper presents the details of the Knowledge Sea III system for comprehensive personalized access to educational resources and presents the results of a classroom study. The study delivered a convincing argument for the importance of providing several ways of accessing information, showing that only about 10% of all resource accesses were made through the traditional search interface. We have also collected some good evidence in favor of the social navigation support.
|
|
|
George Buchanan, David Bainbridge, Katherine Don, and Ian H. Witten.
A new framework for building digital library collections.
In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2005.
This paper introduces a new framework for building digital library collections and contrasts it with existing systems. It describes a significant new step in the development of a widely-used open-source digital library system, Greenstone, which has evolved over many years. It is supported by a fresh implementation, which forced us to rethink the entire design rather than making incremental improvements. The redesign capitalizes on the best ideas from the existing system, which have been refined and developed to open new avenues through which digital librarians can tailor their collections. We demonstrate its flexibility by showing how digital library collections can be extended and altered to satisfy new requirements.
|
|
|
George Buchanan and Annika Hinze.
A generic alerting service for digital libraries.
In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2005.
Users of modern digital libraries (DLs) can keep themselves up-to-date by searching and browsing their favorite collections, or more conveniently by resorting to an alerting service. The alerting service notifies its clients about new or changed documents. So far, no sophisticated service has been proposed that covers heterogeneous and distributed collections and is integrated with the digital library software.
This paper analyses the conceptual requirements of this much-sought-after service for digital libraries. We demonstrate that the differing concepts of digital libraries and their underlying technical designs have extensive influence (a) on the expectations, needs and interests of users regarding an alerting service, and (b) on the technical possibilities of the implementation of the service.
Our findings will show that the range of issues surrounding alerting services for digital libraries, their design and use is greater than one may anticipate. We also show that, conversely, the requirements for an alerting service have considerable impact on the concepts of DL design. Our findings should be of interest for librarians as well as system designers. We highlight and discuss the far-reaching implications for the design of, and interaction with, libraries. This paper discusses the lessons learned from building such a distributed alerting service. We present our prototype implementation as a proof-of-concept for an alerting service for open DL software.
|
|
|
John Buford.
Evaluation of a query language for structured hypermedia documents.
In DAGS '95, 1995.
Format: Not Yet On-line.
Audience: Technical. HyTime developers.
References: 17.
Links: .
Relevance: Low.
Abstract: HyTime is an ISO standard for hypermedia time based
documents. This paper discusses an implementation of a database and search
engine operating in that language. Examples of queries, optimizations, etc.
|
|
|
J. Bumiller and S. Rather.
Electronic meeting assistance.
In Human Computer Interaction. Vienna Conference, VCHCI '93 Fin
de Siecle Proceedings, pages 425-6, Sep 1993.
The Electronic Meeting Assistance (EMA) is a virtually
co-located mixed system. That means that all participants of
the meeting are present at the same time but not necessarily at
the same location (for example some people meet in a room and
an external, remote expert is included via a local area
network). During the meeting the personal Notepads of the
participants are linked together using a radio LAN. In
addition an interactive white-board e.g. the Xerox LiveBoard
is used for visualisation and manipulation of common data. To
assist cooperative work, the EMA system supports the exchange
of information during meetings. Various information can be
exchanged between meeting members, for example contact
information, prepared notes and diagrams; electronic
presentations could be given or a paper could be edited by
the group.
|
|
|
P. Buneman, S.B. Davidson, K. Hart, C. Overton, and L. Wong.
A data transformation system for biological data sources.
In Proceedings of the Twenty-first International Conference on
Very Large Databases, Zurich, Switzerland, 1995. VLDB Endowment, Saratoga,
Calif.
|
|
|
Robin Burke and Kristian J. Hammond.
Combining databases and knowledge bases for assisted browsing.
In AAAI Spring Symposium on Information Gathering, 1995.
Format: Compressed PostScript.
|
|
|
Vannevar Bush.
As we may think.
The Atlantic Monthly, July 1945.
|
|
|
Christoph Bussler, Stefan Jablonski, Thomas Kirsche, Hans Schuster, and Hartmut
Wedekind.
Architectural issues of distributed workflow management systems.
In V. Malyshkin, editor, Parallel Computing Technologies. Third
International Conference, PACT-95, Proceedings., Berlin, Germany, 1995.
Springer-Verlag.
A specific task of distributed and parallel information
systems is workflow management. In particular,
workflow management systems execute business
processes that run on top of distributed and
parallel information systems. Parallelism is due to
performance requirements and involves data and
applications that are spread across a heterogeneous,
distributed computing environment. Heterogeneity and
distribution of the underlying computing
infrastructure should be made transparent in order
to alleviate programming and use. We introduce an
implementation architecture for workflow management
systems that meets these requirements. Scalability
(through transparent parallelism) and transparency
with respect to distribution and heterogeneity are
the major characteristics of this architecture. A
generic client/server class library in an
object-oriented environment demonstrates the
feasibility of the approach.
|
|
|
Sasa Buvac and Richard Fikes.
A declarative formalization of knowledge translation.
In AAAI Spring Symposium on Information Gathering, 1995.
Format: Compressed PostScript.
|
|
|
Orkut Buyukkokten, Hector Garcia-Molina, and Andreas Paepcke.
Accordion summarization for end-game browsing on pdas and cellular
phones.
In Proceedings of the Conference on Human Factors in Computing
Systems CHI'01, 2000.
We demonstrate a new browsing technique for devices
with small displays such as PDAs or cellular phones. We
concentrate on end-game browsing, where the user is
close to or on the target page. We make browsing more
efficient and easier by Accordion Summarization. In
this technique the Web page is first represented as a
short summary. The user can then drill down to discover
relevant parts of the page. If desired, keywords can be
highlighted and exposed automatically. We discuss our
techniques, architecture, interface facilities, and the
result of user evaluations. We measured a 57% improvement in browsing speed and a 75% reduction in input effort.
|
|
|
Orkut Buyukkokten, Hector Garcia-Molina, and Andreas Paepcke.
Focused web searching with pdas.
In Proceedings of the Ninth International World-Wide Web
Conference, 2000.
The Stanford Power Browser project addresses the
problems of interacting with the World-Wide Web through
wirelessly connected Personal Digital Assistants
(PDAs). These problems include bandwidth limitations,
screen real-estate shortage, battery capacity, and the
time costs of pen-based search keyword input. As a way
to address bandwidth and battery life limitations, we
provide local site search facilities for all sites. We
incrementally index Web sites in real time as the PDA
user visits them. These indexes have narrow scope at
first, and improve as the user dwells on the site, or
as more users visit the site over time. We address the
keyword input problem by providing site specific
keyword completion, and indications of keyword
selectivity within sites. The system is implemented on
the Palm Pilot platform, using a Metricom radio
link. We describe the user level experience, and then
present the analyses that informed our technical
decisions.
|
|
|
Orkut Buyukkokten, Hector Garcia-Molina, and Andreas Paepcke.
Seeing the whole in parts: Text summarization for web browsing on
handheld devices.
In 10th International WWW Conference, 2000.
Available at http://dbpubs.stanford.edu/pub/2001-45.
We introduce five methods for summarizing parts of Web
pages on handheld devices, such as personal digital
assistants (PDAs), or cellular phones. Each Web page is
broken into text units that can each be hidden,
partially displayed, made fully visible, or
summarized. The methods accomplish summarization by
different means. One method extracts significant
keywords from the text units, another attempts to find
each text unit's most significant sentence to act as a
summary for the unit. We use information retrieval
techniques, which we adapt to the World-Wide Web
context. We tested the relative performance of our five
methods by asking human subjects to accomplish
single-page information search tasks using each
method. We found that the combination of keywords and
single-sentence summaries works best for a variety of
search tasks.
|
|
|
Orkut Buyukkokten, Hector Garcia-Molina, and Andreas Paepcke.
Seeing the whole in parts: Text summarization for web browsing on
handheld devices.
In Proceedings of the Tenth International World-Wide Web
Conference, 2001.
Available at http://dbpubs.stanford.edu/pub/2001-45.
We introduce five methods for summarizing parts of Web
pages on handheld devices, such as personal digital
assistants (PDAs), or cellular phones. Each Web page is
broken into text units that can each be hidden,
partially displayed, made fully visible, or
summarized. The methods accomplish summarization by
different means. One method extracts significant
keywords from the text units, another attempts to find
each text unit's most significant sentence to act as a
summary for the unit. We use information retrieval
techniques, which we adapt to the World-Wide Web
context. We tested the relative performance of our five
methods by asking human subjects to accomplish
single-page information search tasks using each
method. We found that the combination of keywords and
single-sentence summaries works best for a variety of
search tasks.
|
|
|
Orkut Buyukkokten, Hector Garcia-Molina, Andreas Paepcke, and Terry Winograd.
Power browser: Efficient web browsing for pdas.
In Proceedings of the Conference on Human Factors in
Computing Systems CHI'00, 2000.
We have designed and implemented new Web browsing
facilities to support effective navigation on Personal
Digital Assistants (PDAs) with limited capabilities:
low bandwidth, small display, and slow CPU. The
implementation supports wireless browsing from 3Com's
Palm Pilot. An HTTP proxy fetches web pages on the
client's behalf and dynamically generates summary views
to be transmitted to the client. These summaries
represent both the link structure and contents of a set
of web pages, using information about link
importance. We discuss the architecture, user interface
facilities, and the results of comparative performance
evaluations. We measured a 45% gain in browsing speed and a 42% reduction in required pen movements.
|
|
|
Orkut Buyukkokten, Junghoo Cho, Hector Garcia-Molina, Luis Gravano, and
Narayanan Shivakumar.
Exploiting geographical location information of web pages.
In Proceedings of Workshop on Web Databases (WebDB'99), June
1999.
Held in conjunction with ACM SIGMOD'99. Available at
http://dbpubs.stanford.edu/pub/1999-4.
Many information sources on the web are relevant
primarily to specific geographical communities. For
instance, web sites containing information on
restaurants, theatres and apartment rentals are
relevant primarily to web users in geographical
proximity to these locations. We make the case for
identifying and exploiting the geographical location
information of web sites so that web applications can
rank information in a geographically sensitive
fashion. For instance, when a user in Palo Alto issues
a query for Italian Restaurants, a web search engine
can rank results based on how close such restaurants
are to the user's physical location rather than based
on traditional IR measures. In this paper, we first
consider how to compute the geographical location of
web pages. Subsequently, we consider how to exploit
such information in one specific proof-of-concept
application we implemented in Java.
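The idea of ranking results by proximity to the user, as in the Palo Alto example in this abstract, can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: it assumes each page already has a known latitude and longitude (the paper itself addresses how to compute them), uses great-circle (haversine) distance as the ranking key, and the URLs and coordinates are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    radius_km = 6371.0  # mean Earth radius (approximation)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def rank_by_distance(user, pages):
    """Sort (url, lat, lon) tuples by distance from the user's (lat, lon)."""
    return sorted(pages, key=lambda p: haversine_km(user[0], user[1], p[1], p[2]))


# Hypothetical data: a user near Palo Alto and three restaurant pages.
user_location = (37.44, -122.14)
pages = [
    ("http://nyc.example/restaurant", 40.71, -74.01),
    ("http://sf.example/restaurant", 37.77, -122.42),
    ("http://paloalto.example/restaurant", 37.45, -122.16),
]
ranked = rank_by_distance(user_location, pages)
```

A real system would presumably combine such a distance score with traditional IR relevance rather than replace it outright.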
|
|
|
Donald Byrd.
A scrollbar-based visualization for document navigation.
In Proceedings of the Fourth ACM International Conference on
Digital Libraries, 1999.
We are interested in questions of improving user control in best-match
text-retrieval systems, specifically questions as to whether
simple visualizations that nonetheless go beyond the minimal
ones generally available can significantly help users. Recently, we
have been investigating ways to help users decide, given a set of
documents retrieved by a query, which documents and passages
are worth closer examination. We built a document viewer incorporating
a visualization centered around a novel content-displaying scrollbar and color
term highlighting, and studied whether the visualization is helpful
to non-expert searchers. Participants' reaction to the visualization
was very positive, while the objective results were inconclusive.
|
|
|
Donald Byrd.
Music-notation searching and digital libraries.
In Proceedings of the First ACM/IEEE-CS Joint Conference on
Digital Libraries, 2001.
Almost all work on music information retrieval to date
has concentrated on music in the audio and event (normally MIDI) domains.
However, music in the form of notation, especially Conventional Music
Notation (CMN), is of much interest to musically-trained persons, both
amateurs and professionals, and searching CMN has great value for digital
music libraries. One obvious reason little has been done on music retrieval
in CMN form is the overwhelming complexity of CMN, which requires a very
substantial investment in programming before one can even begin studying
music IR. This paper reports on work adding music-retrieval capabilities
to Nightingale, an existing professional-level music-notation editor.
|
|
|
Donald Byrd and Eric Isaacson.
Music representation in a digital music library [short paper].
In Proceedings of the Third ACM/IEEE-CS Joint Conference on
Digital Libraries, 2003.
The Variations2 digital music library currently supports music in audio
and score-image formats. In a future version, we plan to add music in a symbolic form.
This paper describes our work defining a music representation suitable for the needs of
our users.
|
|
|
Deng Cai, Xiaofei He, Ji-Rong Wen, and Wei-Ying Ma.
Block-level link analysis.
In SIGIR '04: Proceedings of the 27th annual international ACM
SIGIR conference on Research and development in information retrieval, pages
440-447, New York, NY, USA, 2004. ACM Press.
Link Analysis has shown great potential in improving the
performance of web search. PageRank and HITS are two
of the most popular algorithms. Most of the existing
link analysis algorithms treat a web page as a
single node in the web graph. However, in most
cases, a web page contains multiple semantics and
hence the web page might not be considered as the
atomic node. In this paper, the web page is
partitioned into blocks using the vision-based page
segmentation algorithm. By extracting the
page-to-block, block-to-page relationships from link
structure and page layout analysis, we can construct
a semantic graph over the WWW such that each node
exactly represents a single semantic topic. This
graph can better describe the semantic structure of
the web. Based on block-level link analysis, we
proposed two new algorithms, Block Level PageRank
and Block Level HITS, whose performances we study
extensively using web data.
|
|
|
Pável P. Calado, Marcos A. Gonçalves, Edward A. Fox, Berthier Ribeiro-Neto,
Alberto H. F. Laender, Altigran S. da Silva, Davi C. Reis, Pablo A. Roberto,
Monique V. Vieira, and Juliano P. Lage.
The web-dl environment for building digital libraries from the web.
In Proceedings of the Third ACM/IEEE-CS Joint Conference on
Digital Libraries, 2003.
The Web contains a huge volume of unstructured data, which is difficult to manage. In digital
libraries, on the other hand, information is explicitly organized, described, and managed. Community-oriented
services are built to attend specific information needs and tasks. In this paper, we describe an environment,
Web-DL, that allows the construction of digital libraries from the Web. The Web-DL environment will allow us
to collect data from the Web, standardize it and publish it through a digital library system. It provides support
to services and organizational structure normally available in digital libraries, but benefiting from the breadth
of the Web contents. We experimented with applying the Web-DL environment to the Networked Digital Library of
Theses and Dissertations (NDLTD), thus demonstrating that the rapid construction of DLs from the Web is possible.
Also, Web-DL provides an alternative as a large-scale solution for interoperability between independent digital libraries.
|
|
|
Jinwei Cao and Jay F. Nunamaker, Jr.
Question answering on lecture videos: A multifaceted approach.
In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2004.
In this paper, we introduce a multifaceted
approach for question answering on lecture videos. Text extracted from
PowerPoint slides associated with the lecture videos is used as a
source of domain knowledge to boost the answer extraction performance
on these domain specific videos. The three steps of this approach are
described and the evaluation plan is discussed.
|
|
|
Pei Cao, Jin Zhang, and Kevin Beach.
Active cache: Caching dynamic contents on the web.
In Proceedings of IFIP International Conference on Distributed
Systems Platforms and Open Distributed Processing (Middleware '98), pages
373-388, 1998.
|
|
|
Stuart K. Card, George G. Robertson, and William York.
The webbook and the web forager: An information workspace for the
world-wide web.
In Proceedings of the Conference on Human Factors in Computing
Systems CHI'96, 1996.
|
|
|
Michael J. Carey and Donald Kossmann.
On saying enough already! in sql.
In Proceedings of the International Conference on Management of
Data, pages 219-230, Tucson, Arizona, 1997. ACM Press, New York.
|
|
|
Michael J. Carey and Donald Kossmann.
Reducing the braking distance of an sql query engine.
In Proceedings of the Twenty-fourth International Conference on
Very Large Databases, pages 158-169, New York City, USA, 1998. VLDB
Endowment, Saratoga, Calif.
|
|
|
Jeromy Carriere and Rick Kazman.
Webquery: Searching and visualizing the web through connectivity.
In Proceedings of the Sixth International World-Wide Web
Conference, 1997.
|
|
|
Chad Carson, Megan Thomas, Serge Belongie, Joseph M. Hellerstein, and Jitendra
Malik.
Blobworld: A system for region-based image indexing and retrieval.
In Proceedings of the Third International Conference on Visual
Information Systems, June 1999.
|
|
|
Silvana Castano, Maria Grazia Fugini, Giancarlo Martella, and Pierangela
Samarati.
Database Security.
Addison-Wesley, 1994.
This is a comprehensive book on Database security.
Chapters 1, 2, and 3 describe information security,
security models and security mechanisms and software
from a general point of view. Chapter 4 gives a detailed
survey of Database security design. Chapter 5 explores
the problem of security on statistical
databases. Chapter 6 describes different approaches in
intrusion detection. Chapter 7 explores security
models for next-generation databases (active db, oodb).
|
|
|
Nohema Castellanos and Alfredo Sánchez.
Pops: Mobile access to digital library resources [short paper].
In Proceedings of the Third ACM/IEEE-CS Joint Conference on
Digital Libraries, 2003.
Mobile devices represent new opportunities for accessing digital libraries
(DLs) but also pose a number of challenges given the diversity of their hardware and
software features. We describe a framework aimed at facilitating the generation
of interfaces for access to DL resources from a wide range of mobile devices.
|
|
|
Donatella Castelli and Pasquale Pagano.
A system for building expandable digital libraries.
In Proceedings of the Third ACM/IEEE-CS Joint Conference on
Digital Libraries, 2003.
Expandability is one of the main requirements of future digital libraries. This paper
introduces a digital library service system, OpenDLib, that has been designed to be highly expandable
in terms of content, services and usage. The paper illustrates the mechanisms that enable
expandability and discusses their impact on the development of the system architecture.
|
|
|
Duncan Cavens, Stephen Sheppard, and Michael Meitner.
Image database extension to arcview: How to find the photograph you
want.
In Proceedings of ESRI Users Conference, 2001.
|
|
|
William B. Cavnar and Andrew M. Gillies.
Data retrieval and the realities of document conversion.
In Proceedings of the First Annual Conference on the Theory and
Practice of Digital Libraries, 1994.
Format: HTML Document (9K).
Audience: Semi-technical, general computer science.
References: 5.
Links: 1.
Relevance: Low.
Abstract: Discusses the need for inexact matching, e.g. to cope with OCR
recognition errors. Proposes using N-grams, overlapping sequences of N adjacent
letters, as the search target. Also covers research on matching within images of
scanned documents (without doing OCR). Some results on mail sorting & census data.
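The N-gram idea in this abstract can be sketched briefly. The code below is only an illustration: the padding with spaces, the choice of N = 3, and the Dice coefficient used as the similarity score are assumptions made for the example, not details taken from the paper.

```python
def char_ngrams(text: str, n: int = 3) -> set:
    """Return the set of overlapping character n-grams of a space-padded string."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Dice coefficient over the two n-gram sets (1.0 means identical sets)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


# An OCR-style error ("l" read as "1") still shares most trigrams with the
# intended word, while an unrelated word shares none.
score_ocr_error = ngram_similarity("library", "1ibrary")
score_unrelated = ngram_similarity("library", "census")
```

Because a single misrecognized character only disturbs the few n-grams that contain it, the damaged word still scores well above unrelated terms, which is what makes the representation tolerant of OCR noise.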
|
|
|
Augusto Celentano et al.
Knowledge-based document retrieval in office environments: The
kabiria system.
ACM Transactions on Information Systems, 13(3):237-268, July
1995.
In the office environment, the retrieval of documents
is performed using the concepts contained in the
documents, information about the procedural context
where the documents are used, and information about
the regulations and laws that discipline the life of
documents within a given application domain. To
fulfill the requirements of such a sophisticated
retrieval, we propose a document retrieval model and
system based on the representation of knowledge
describing the semantic contents of documents, the
way in which the documents are managed by procedures
and by people in the office, and the application
domain where the office operates. The article
describes the knowledge representation issues needed
for the document retrieval system and presents a
document retrieval model that captures these issues.
The effectiveness of the approach is illustrated by
describing a system, named Kabiria, built on
top of such a model. The article describes the
querying and browsing environments, and the
architecture of the system.
|
|
|
Stefano Ceri, Sara Comai, Ernesto Damiani, Piero Fraternali, Stefano
Paraboschi, and Letizia Tanca.
Xml-gl: A graphical language for querying and restructuring xml
documents.
In Proceedings of the Eighth International World-Wide Web
Conference, 1999.
The growing acceptance of XML as a standard for
semi-structured documents on the Web opens up
challenging opportunities for Web query languages. In
this paper we introduce XML-GL, a graphical query
language for XML documents. The use of a visual
formalism for representing both the content of XML
documents (and of their DTDs) and the syntax and
semantics of queries enables an intuitive expression of
queries, even when they are rather complex. XML-GL is
inspired by G-log, a general purpose, logic-based
language for querying structured and semi-structured
data. The paper presents the basic capabilities of
XML-GL through a sequence of examples of increasing
complexity.
|
|
|
Stefano Ceri and Giuseppe Pelagatti.
Distributed Databases.
McGraw-Hill, Inc., 1984.
Textbook
|
|
|
Common Gateway Interface (CGI).
http://hoohoo.ncsa.uiuc.edu/cgi/overview.html.
|
|
|
Wei Chai and Barry Vercoe.
Structural analysis of musical signals for indexing and thumbnailing.
In Proceedings of the Third ACM/IEEE-CS Joint Conference on
Digital Libraries, 2003.
A musical piece typically has a repetitive structure. Analysis of this structure
will be useful for music segmentation, indexing and thumbnailing. This paper
presents an algorithm that can automatically analyze the repetitive structure
of musical signals. First, the algorithm detects the repetition of each segment
of fixed length in a piece using dynamic programming. Second, the algorithm
summarizes this repetition information and infers the structure based on
heuristic rules. The performance of the approach is demonstrated visually
using figures for qualitative evaluation, and by two structural similarity
measures for quantitative evaluation. Based on the structural analysis result,
this paper also proposes a method for music thumbnailing. The preliminary
results obtained using a corpus of Beatles' songs show that automatic structural
analysis and thumbnailing of music are possible.
|
|
|
S. Chakrabarti and S. Muthukrishnan.
Resource scheduling for parallel database and scientific
applications.
In 8th ACM Symposium on Parallel Algorithms and Architectures,
pages 329-335, June 1996.
|
|
|
Soumen Chakrabarti, Byron Dom, David Gibson, Ravi Kumar, Prabhakar Raghavan,
Sridhar Rajagopalan, and Andrew Tomkins.
Spectral filtering for resource discovery.
In ACM SIGIR workshop on Hypertext Information Retrieval on the
Web, 1998.
|
|
|
Soumen Chakrabarti, Byron Dom, and Piotr Indyk.
Enhanced hypertext categorization using hyperlinks.
In Proceedings of the International Conference on Management of
Data, 1998.
|
|
|
Soumen Chakrabarti, Byron Dom, Prabhakar Raghavan, Sridhar Rajagopalan, David
Gibson, and Jon Kleinberg.
Automatic resource compilation by analyzing hyperlink structure and
associated text.
In Proceedings of the Seventh International World-Wide Web
Conference, 1998.
|
|
|
Soumen Chakrabarti, David A. Gibson, and Kevin S. McCurley.
Surfing the web backwards.
In Proceedings of the Eighth International World-Wide Web
Conference, 1999.
From a user's perspective, hypertext links on
the Web form a directed graph between distinct
information sources. We
investigate the effects of discovering
`backlinks' from Web resources, namely links
pointing to the resource. We describe
tools for backlink navigation on both the
client and server side, using an applet for the
client and a module for the Apache
Web server. We also discuss possible extensions
to the HTTP protocol to facilitate the
collection and navigation of backlink
information in the World Wide Web.
|
|
|
Soumen Chakrabarti, Martin van den Berg, and Byron Dom.
Focused crawling: A new approach to topic-specific web resource
discovery.
In Proceedings of the Eighth International World-Wide Web
Conference, 1999.
The rapid growth of the World-Wide Web poses
unprecedented scaling challenges for general-purpose
crawlers and search engines. In this paper we describe a new
hypertext resource discovery system called a
Focused Crawler. The goal of a focused crawler is to selectively seek out
pages that are relevant to a pre-defined set of topics. The topics are specified
not using keywords, but using exemplary documents. Rather than collecting and
indexing
all accessible Web documents to be able to answer all possible ad-hoc queries,
a focused crawler analyzes its crawl boundary
to find the links that are likely to be most relevant for the crawl, and avoids
irrelevant regions of the Web. This leads to
significant savings in hardware
and network resources, and helps keep the crawl
more up-to-date. To achieve such goal-directed crawling, we
designed two hypertext mining programs that
guide our crawler: a classifier that evaluates the relevance of a hypertext
document with respect to the focus topics, and
a distiller that identifies hypertext
nodes that are great access points to many
relevant pages within a few links. We report on
extensive focused-crawling experiments using several topics at different
levels of specificity. Focused crawling acquires relevant pages steadily while
standard crawling quickly loses its way, even
though they are started from the same root set.
Focused crawling is robust against large perturbations in the starting set
of URLs. It discovers largely overlapping sets
of resources in spite of these perturbations. It is also capable of exploring
out and discovering valuable resources that are
dozens of links away from the start set, while carefully pruning the millions
of pages that may lie within this same radius.
Our anecdotes suggest that focused crawling is very effective for building
high-quality collections of Web documents on
specific topics, using modest desktop hardware.
|
|
|
Jim Challenger, Paul Dantzig, and Arun Iyengar.
A scalable system for consistently caching dynamic web data.
In Proceedings of the 18th Annual Joint Conference of the IEEE
Computer and Communications Societies, New York, New York, 1999.
|
|
|
Jim Challenger, Arun Iyengar, Karen Witting, Cameron Ferstat, and Paul Reed.
A publishing system for efficiently creating dynamic web content.
In Proceedings of IEEE INFOCOM 2000, Tel Aviv, Israel, 2000.
|
|
|
M. Chalmers, K. Rodden, and D. Brodbeck.
The order of things: activity-centered information access.
In Proceedings of the 7th World Wide Web Conference, 1998.
This paper focuses on the representation and access of
Web-based information, and how to make such a representation adapt to the
activities or interests of individuals within a community of users. The
heterogeneous mix of information on the Web restricts the coverage of
traditional indexing techniques and so limits the power of search
engines. In contrast to traditional methods, and in a way that extends
collaborative filtering approaches, the path model centers representation
on usage histories rather than content analysis. By putting activity at
the center of representation and not the periphery, the path model
concentrates on the reader not the author and the browser not the site. We
describe metrics of similarity based on the path model, and their
application in a URL recommender tool and in visualising sets of URLs.
|
|
|
Leslie Champeny, Christine L. Borgman, Patricia Mautone, Richard E. Mayer,
Richard A. Johnson, Gregory H. Leazer, Anne J. Gilliland-Swetland, Kelli A.
Millwood, Leonard D'Avolio, Jason Finley, and Laura J. Smart.
Developing a digital learning environment: an evaluation of design
and implementation processes.
In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2004.
The Alexandria Digital Earth Prototype (ADEPT)
Project (1999-2004) builds upon the Alexandria Digital Library Project
(1994-99) to add functions and services for undergraduate teaching to a
digital library of geospatial resources. The Digital Learning
Environment (DLE) services are being developed and evaluated
iteratively over the course of this research project. In the 2002-2003
academic year, the DLE was implemented in stages during the fall and
spring terms in undergraduate geography courses at the University of
California, Santa Barbara (UCSB). Evaluation of the fall term
implementation identified design issues of time and complexity of use
in the services for creating and organizing course domain knowledge. By
the time of the spring term implementation, these issues were addressed
and new services added for integrating selected course content into a
variety of class presentation formats. The implementation was evaluated
via interviews with the course instructor, development staff, and
students, and by observations (in person and videotaped) of the course.
Results of the iterative evaluation indicated that usability and
functionality for the instructor had increased between the two course
offerings. Students found classroom presentations to be useful for
understanding concepts, and Web access to the presentations useful for
study and review. Assessments of student learning suggest modest
improvements over time. Developers are now applying lessons learned
during these implementations to improve the system for subsequent
implementation in the 2003-04 academic year.
|
|
|
Alvin T.S. Chan.
Web-enabled smart card for ubiquitous access of patient's medical
record.
In Proceedings of the Eighth International World-Wide Web
Conference, 1999.
The combined benefits of smart card to support
mobility in a pocket coupled with the
ubiquitous access of Web
technology, present a new paradigm for medical
information access systems. The paper describes
the framework of Java
Card Web Servlet (JCWS) that is being developed
to provide seamless access interface between a
Web browser and a Java-enabled smart card. Importantly, the smart
card is viewed as a mobile repository of Web
objects comprised of HTML
pages, medical data objects, and record
browsing and updating applet. As the patient
moves between hospitals, clinics and
countries, the mobility of the smart-card
database dynamically binds to the JCWS
framework to facilitate a truly ubiquitous
access and updating of medical information via
a standard Web-browser interface.
|
|
|
Chen-Chuan K. Chang and Hector Garcia-Molina.
Evaluating the cost of boolean query mapping.
In Proceedings of the Second ACM International Conference on
Digital Libraries, 1997.
At http://dbpubs.stanford.edu/pub/1997-25.
|
|
|
Chen-Chuan K. Chang and Hector Garcia-Molina.
Conjunctive constraint mapping for data translation.
Technical Report SIDL-WP-1998-0083; 1998-47, Stanford University,
January 1998.
Accessible at http://dbpubs.stanford.edu/pub/1998-47.
|
|
|
Chen-Chuan K. Chang and Héctor García-Molina.
Mind your vocabulary: Query mapping across heterogeneous information
sources.
In Proceedings of the International Conference on Management of
Data, pages 335-346, Philadelphia, Pa., June 1999. ACM Press, New York.
|
|
|
Chen-Chuan K. Chang, Héctor García-Molina, and Andreas Paepcke.
Boolean query mapping across heterogeneous information sources.
IEEE Transactions on Knowledge and Data Engineering,
8(4):515-521, Aug 1996.
Very technical, formal description of query translation. But
has the architecture picture.
|
|
|
Chen-Chuan K. Chang, Héctor García-Molina, and Andreas Paepcke.
Boolean query mapping across heterogeneous information sources
(extended version).
Technical Report SIDL-WP-1996-0044; 1996-1, Dept. of Computer
Science, Stanford Univ., Stanford, California, Sep 1996.
Accessible at http://dbpubs.stanford.edu/pub/1996-1.
Extended version of the paper of the same title that appeared in
TKDE, Aug. 1996.
|
|
|
Chen-Chuan K. Chang, Héctor García-Molina, and Andreas Paepcke.
Predicate rewriting for translating boolean queries in a
heterogeneous information system.
ACM Transactions on Information Systems, 17(1):1-39, January
1999.
Available at http://dbpubs.stanford.edu/pub/1999-34.
Searching over heterogeneous information sources is
difficult in part because of the nonuniform query
languages. Our approach is to allow users to compose
Boolean queries in one rich front-end language. For
each user query and target source, we transform the
user query into a subsuming query that can be supported
by the source but that may return extra documents. The
results are then processed by a filter query to yield
the correct final results. In this article we introduce
the architecture and associated mechanism for query
translation. In particular, we discuss techniques for
rewriting predicates in Boolean queries into native
subsuming forms, which is a basis of translating
complex queries. In addition, we present experimental
results for evaluating the cost of postfiltering. We
also discuss the drawbacks of this approach and cases
when it may not be effective. We have implemented
prototype versions of these mechanisms and demonstrated
them on heterogeneous Boolean systems.
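
The subsuming-query-plus-filter idea in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the article: the source is imagined to support only a conjunction of single-word title matches, the user asks for an exact phrase, and the names `native_and_query`, `has_phrase`, and `translate_and_run` are hypothetical.

```python
def tokens(text):
    """Lowercase whitespace tokenization (an illustrative simplification)."""
    return text.lower().split()


def native_and_query(titles, words):
    """Hypothetical source capability: AND of single-word title matches only."""
    wanted = [w.lower() for w in words]
    return [t for t in titles if all(w in tokens(t) for w in wanted)]


def has_phrase(title, phrase):
    """Front-end predicate: the phrase words occur adjacently in the title."""
    t, p = tokens(title), tokens(phrase)
    return any(t[i:i + len(p)] == p for i in range(len(t) - len(p) + 1))


def translate_and_run(titles, phrase):
    # Subsuming query: every title containing the phrase also contains all of
    # its words, so the source returns a superset of the correct answer.
    superset = native_and_query(titles, tokens(phrase))
    # Filter query: post-process the superset to discard the extra documents.
    return [t for t in superset if has_phrase(t, phrase)]


titles = ["boolean query mapping", "mapping a query plan", "image coding"]
result = translate_and_run(titles, "query mapping")
```

In this toy setting the source returns two titles for the subsuming query, and the filter step keeps only the one that actually contains the phrase; the discarded title is the kind of extra document whose postfiltering cost the abstract refers to.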
|
|
|
Chew-Hung Chang, John G Hedberg, Yin-Leng Theng, Ee-Peng Lim, Tiong-Sa Teh, and
Dion Hoe-Lian Goh.
Evaluating g-portal for geography learning and teaching.
In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2005.
This paper describes G-Portal, a geospatial digital library of geographical assets, providing an interactive platform to engage students in active manipulation and analysis of information resources and collaborative learning activities. Using a G-Portal application in which students conducted a field study of an environmental problem of beach erosion and sea level rise, we described a pilot study to evaluate usefulness and usability issues in supporting geography learning, and in turn teaching.
|
|
|
Chia-Hui Chang and Ching-Chi Hsu.
Customizable multi-engine search tool with clustering.
In Proceedings of the Sixth International World-Wide Web
Conference, 1997.
|
|
|
Edward Chang.
An image coding and reconstruction scheme for mobile computing.
In Proceedings of the 5th IDMS (Springer-Verlag LNCS 1483),
pages 137-148, Oslo, Norway, September 1998.
Accessible at http://dbpubs.stanford.edu/pub/1997-10.
An asynchronous transfer mode (ATM) wireless network has bursty
and high error rates. To combat the contiguous bit loss due to damaged or
dropped packets, this paper presents a code packetization and image
reconstruction scheme. The packetization method distributes the loss
in both frequency and spatial domains to reduce the chance that
adjacent DCT blocks lose the same frequency components. The image
reconstruction takes into consideration the spatial characteristics
represented by the frequency components. Combining these two approaches
is able to reconstruct the damaged images more accurately, even under
very high loss rates. In addition, since the reconstruction technique is
computationally efficient, it conserves system resources and power
consumption, which are restrictive in mobile computers.
|
|
|
Edward Chang and Hector Garcia-Molina.
Minimizing memory requirements in media servers.
Technical Report SIDL-WP-1996-0045; 1996-4, Stanford University,
December 1996.
|
|
|
Edward Chang and Héctor García-Molina.
Reducing initial latency in a multimedia storage system.
In Third International Workshop of Multimedia Database Systems,
1996.
A multimedia server delivers presentations (e.g., videos, movies),
providing high bandwidth and continuous real-time delivery. In this paper
we present techniques for reducing the initial latency of presentations,
i.e., for reducing the time between the arrival of a request and the
start of the presentation. Traditionally, initial latency has not received
much attention. This is because one major application of multimedia
servers is movies on demand, where a delay of a few minutes before a new
multi-hour movie starts is acceptable. However, latency reduction is
important in interactive applications such as video games and browsing of
multimedia documents. Various latency reduction schemes are proposed and
analyzed, and their performance compared. We show that our techniques can
significantly reduce (almost eliminate in some cases) initial latency
without adversely affecting throughput. Moreover, a novel on-disk partial
data replication scheme that we propose proves to be far more cost
effective than any other previous attempts at reducing initial latency.
Keywords: multimedia, data placement, data replication.
|
|
|
Edward Chang and Hector Garcia-Molina.
Effective memory use in a media server.
In Proceedings of the 23rd Very Large Data Base (VLDB)
Conference, 1997.
|
|
|
Edward Chang and Hector Garcia-Molina.
Medic: A memory & disk cache for multimedia clients.
Technical Report SIDL-WP-1997-0076; 1997-9, Stanford University,
October 1997.
|
|
|
Edward Chang and Hector Garcia-Molina.
Reducing initial latency in media servers.
In IEEE Multimedia, volume 4, 1997.
|
|
|
Edward Chang and Hector Garcia-Molina.
Cost-based media server design.
To appear in the Proceedings of the 8th Research Issues in
Data Engineering, Feb 1998.
|
|
|
Michelle Chang, John J. Leggett, Richard Furuta, Andruid Kerne, J. Patrick
Williams, Samuel A. Burns, and Randolph G. Bias.
Collection understanding.
In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2004.
Collection understanding shifts the traditional
focus of retrieval in large collections from locating specific
artifacts to gaining a comprehensive view of the collection.
Visualization tools are critical to the process of efficient collection
understanding. By presenting simple visual interfaces and intuitive
methods of interacting with a collection, users come to understand the
essence of the collection by focusing on the artifacts. This paper
discusses a practical approach for enhancing collection understanding
in image collections.
|
|
|
Yee-Hsiang Chang and Ellis Chi.
Htgraph: A new method for information access over the world wide web.
In DAGS '95, 1995.
Format: HTML Document (27K + pictures).
Audience: Web surfers and computer scientists.
References: 11.
Links: 0.
Relevance: Low.
Abstract: Describes a browser which prefetches pages, and builds a graph
showing the relationships of those pages, allows you to jump down in the
hierarchy. User specified cutoff for how many nodes should be expanded. No
attempt to automatically cluster. Describes naive data structures to implement a
breadth first search of the space.
|
|
|
Mitchell N. Charity.
Multiple standards? no problem.
In Proceedings of the First Annual Conference on the Theory and
Practice of Digital Libraries, 1994.
Format: HTML Document (6K).
Audience: Non-technical, standards committee members.
References: 0.
Links: 1.
Relevance: Medium-low.
Abstract: Argues for an IETF rather than ISO model of standards
committee. Encouraging several different protocols with gateways being
constructed as needed, and generally letting the marketplace determine what
survives.
|
|
|
Michael Chau.
Personalized spiders for web search and analysis.
In Proceedings of the First ACM/IEEE-CS Joint Conference on
Digital Libraries, 2001.
Searching for useful information on the World Wide Web
has become increasingly difficult. While Internet search engines have
been helping people to search on the web, low recall rate and outdated
indexes have become more and more problematic as the web grows. In
addition, search tools usually present to the user only a list of search
results, failing to provide further personalized analysis which could help
users identify useful information and comprehend these results. To alleviate
these problems, we propose a client-based architecture that incorporates noun
phrasing and self-organizing map techniques. Two systems, namely CI Spider
and Meta Spider, have been built based on this architecture. User evaluation
studies have been conducted and the findings suggest that the proposed
architecture can effectively facilitate web search and analysis.
|
|
|
Michael Chau, Hsinchun Chen, Jialun Qin, Yilu Zhou, Yi Qin, Wai-Ki Sung, and
Daniel McDonald.
Comparison of two approaches to building a vertical search tool: A
case in the nanotechnology domain.
In Proceedings of the Second ACM/IEEE-CS Joint Conference on
Digital Libraries, 2002.
As the Web has been growing exponentially, it has become
increasingly difficult to search for desired information. In recent years,
many domain-specific (vertical) search tools have been developed to serve
the information needs of specific fields. This paper describes two approaches
to building a domain-specific search tool. We report our experience in building
two different tools in the nanotechnology domain - (1) a server-side search
engine, and (2) a client-side search agent. The designs of the two search systems
are presented and discussed, and their strengths and weaknesses are compared. To
our knowledge, this paper is the first to compare prototype vertical search
systems built by the two different approaches. Some future research directions
are also discussed.
|
|
|
Michael Chau, Jialun Qin, Yilu Zhou, Chunju Tseng, and Hsinchun Chen.
Spidersrus: Automated development of vertical search engines in
different domains and languages.
In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2005.
In this paper we discuss the architecture of a tool designed to help users develop vertical search engines in different domains and different languages. The design of the tool is presented and an evaluation study was conducted, showing that the system is easier to use than other existing tools.
|
|
|
Surajit Chaudhuri.
Finding nonrecursive envelopes for datalog predicates.
In Proceedings of the 12th ACM SIGACT-SIGMOD-SIGART Symposium on
Principles of Database Systems, pages 135-146, Washington, D.C., 1993. ACM
Press, New York.
|
|
|
Surajit Chaudhuri and Phokion G. Kolaitis.
Can datalog be approximated?
In Proceedings of the 13th ACM SIGACT-SIGMOD-SIGART Symposium on
Principles of Database Systems, pages 86-96, Minneapolis, Minn., 1994. ACM
Press, New York.
|
|
|
Francine Chen, Marti Hearst, Julian Kupiec, Jan Pedersen, and Lynn Wilcox.
Mixed-media access.
In Proceedings of the First Annual Conference on the Theory and
Practice of Digital Libraries, 1994.
Format: HTML Document (8K) .
Audience: Researchers, esp. in the area of multi-media searching.
References: 8.
Links: 1.
Relevance: Low-Medium.
Abstract: Essentially a set of pointers to Xerox PARC reports.
Describes projects related to scatter/gather, automatic segmenting, keyword
search equivalents for audio & video, and summarization.
|
|
|
Guanling Chen and David Kotz.
A survey of context-aware mobile computing research.
Technical Report TR2000-381, Dartmouth College, 2000.
|
|
|
M. Chen, M. Hearst, J. Hong, and J. Lin.
Cha-cha: A system for organizing intranet search results.
In Proceedings of the second USENIX Symposium on Internet
Technologies and Systems (USITS), 1999.
|
|
|
Yixin Chen and James Z. Wang.
A region-based fuzzy feature matching approach to content-based image
retrieval.
IEEE Trans. Pattern Anal. Mach. Intell., 24(9):1252-1267,
2002.
|
|
|
Yuan Chen, Jan Edler, Andrew Goldberg, Allan Gottlieb, Sumeet Sobti, and Peter
Yianilos.
A prototype implementation of archival intermemory.
In Proceedings of the Fourth ACM International Conference on
Digital Libraries, 1999.
An Archival Intermemory solves the problem of highly
survivable digital data storage in the spirit of the Internet.
In this paper we describe a prototype implementation of
Intermemory, including an overall system
architecture and implementations of key system components.
The result is a working Intermemory that tolerates
up to 17 simultaneous node failures, and includes
a Web gateway for browser-based access to data. Our
work demonstrates the basic feasibility of Intermemory
and represents significant progress towards a deployable
system.
|
|
|
S. Cheshire and M. Baker.
Internet mobility 4x4.
In Proceedings of the ACM SIGCOMM'96 Conference, Aug 1996.
|
|
|
S. Cheshire and M. Baker.
A wireless network in mosquitonet.
IEEE Micro, Feb 1996.
|
|
|
David Chesnutt.
The model editions partnership: Historical editions in the digital
age.
D-Lib Magazine, Nov 1995.
Format: HTML Document().
|
|
|
Ed H. Chi, James Pitkow, Jock Mackinlay, Peter Pirolli, Rich Gossweiler, and
Stuart K. Card.
Visualizing the evolution of web ecologies.
In Proceedings of the Conference on Human Factors in Computing
Systems CHI'98, 1998.
|
|
|
Boris Chidlovskii, Claudia Roncancio, and Marie-Luise Schneider.
Semantic cache mechanism for heterogeneous web querying.
In Proceedings of the Eighth International World-Wide Web
Conference, 1999.
In Web-based searching systems that access
distributed information providers, efficient
query processing requires an
advanced caching mechanism to reduce the
query response time. The keyword-based
querying is often the only way to
retrieve data from Web providers, and
therefore standard page-based and tuple-based
caching mechanisms turn out to be
improper for such a task. In this work, we
develop a mechanism for efficient caching of
Web queries and the answers
received from heterogeneous Web providers.
We also report results of experiments and
show how the caching mechanism
is implemented in the Knowledge Broker
system.
|
|
|
Boris Chidlovskii.
Schema extraction from xml collections.
In Proceedings of the Second ACM/IEEE-CS Joint Conference on
Digital Libraries, 2002.
The XML Schema language has been proposed to replace Document Type
Definitions (DTDs) as the schema mechanism for XML data. This language
consistently extends grammar-based constructions with constraint- and
pattern-based ones and has a higher expressive power than DTDs. As schemas
remain optional for XML, we address the problem of XML Schema extraction.
We model the XML schema as extended context-free grammars and develop a
novel extraction algorithm inspired by methods of grammatical inference.
The algorithm copes also with the schema determinism requirement imposed
by XML DTDs and XML Schema languages.
|
|
|
R. Chimera, K. Wolman, S. Mark, and B. Shneiderman.
An exploratory evaluation of three interfaces for browsing large
hierarchical tables of contents.
ACM Transactions on Information Systems, 12(4):383-406, 1994.
|
|
|
Junghoo Cho and Hector Garcia-Molina.
Estimating frequency of change.
Submitted for publication, 2000.
Available at http://dbpubs.stanford.edu/pub/2000-4.
|
|
|
Junghoo Cho and Hector Garcia-Molina.
The evolution of the web and implications for an incremental crawler.
In Proceedings of the Twenty-sixth International Conference on
Very Large Databases, 2000.
Available at http://dbpubs.stanford.edu/pub/1999-22.
In this paper we study how to build an effective
incremental crawler. The crawler selectively and
incrementally updates its index and/or local collection
of web pages, instead of periodically refreshing the
collection in batch mode. The incremental crawler can
improve the ``freshness'' of the collection
significantly and bring in new pages in a more timely
manner. We first present results from an experiment
conducted on more than half a million web pages over 4
months, to estimate how web pages evolve over
time. Based on these experimental results, we compare
various design choices for an incremental crawler and
discuss their trade-offs. We propose an architecture
for the incremental crawler, which combines the best
design choices.
|
|
|
Junghoo Cho and Hector Garcia-Molina.
Synchronizing a database to improve freshness.
In Proceedings of the International Conference on Management of
Data, 2000.
Available at http://dbpubs.stanford.edu/pub/1999-40.
In this paper we study how to refresh a local copy of
an autonomous data source to maintain the copy
up-to-date. As the size of the data grows, it becomes
more difficult to maintain the copy fresh, making it
crucial to synchronize the copy effectively. We define
two freshness metrics, change models of the underlying
data, and synchronization policies. We analytically
study how effective the various policies are. We also
experimentally verify our analysis, based on data
collected from 270 web sites for more than 4 months,
and we show that our new policy improves the
freshness very significantly compared to current
policies in use.
|
|
|
Junghoo Cho, Hector Garcia-Molina, and Lawrence Page.
Efficient crawling through url ordering.
In Proceedings of the Seventh International World-Wide Web
Conference, 1998.
Available at http://dbpubs.stanford.edu/pub/1998-51.
In this paper we study in what order a crawler should visit
the URLs it has seen, in order to obtain more important
pages first. Obtaining important pages rapidly can be very
useful when a crawler cannot visit the entire Web in a
reasonable amount of time. We define several importance
metrics, ordering schemes, and performance evaluation
measures for this problem. We also experimentally evaluate
the ordering schemes on the Stanford University Web. Our
results show that a crawler with a good ordering scheme can
obtain important pages significantly faster than one without.
|
|
|
Junghoo Cho, Narayanan Shivakumar, and Hector Garcia-Molina.
Computing document clusters on the web.
In Proceedings of the International Conference on Management of
Data, 1998.
They crawl the Web and automatically find out which sites
completely or partially mirror each other.
|
|
|
Junghoo Cho, Narayanan Shivakumar, and Hector Garcia-Molina.
Finding replicated web collections.
In Proceedings of the International Conference on Management of
Data, 2000.
Available at http://dbpubs.stanford.edu/pub/1999-64.
Many web documents (such as JAVA FAQs) are being
replicated on the Internet. Often entire document
collections (such as hyperlinked Linux manuals) are
being replicated many times. In this paper, we make the
case for identifying replicated documents and
collections to improve web crawlers, archivers, and
ranking functions used in search engines. The paper
describes how to efficiently identify replicated
documents and hyperlinked document collections. The
challenge is to identify these replicas from an input
data set of several tens of millions of web pages and
several hundreds of gigabytes of textual data. We also
present two real-life case studies where we used
replication information to improve a crawler and a
search engine. We report these results for a data set
of 25 million web pages (about 150 gigabytes of HTML
data) crawled from the web.
|
|
|
Michael G. Christel and Ronald M. Conescu.
Addressing the challenge of visual information access from digital
image and video libraries.
In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2005.
While it would seem that digital video libraries should benefit from access mechanisms directed to their visual contents, years of TREC Video Retrieval Evaluation (TRECVID) research have shown that text search against transcript narrative text provides almost all the retrieval capability, even with visually oriented generic topics. A within-subjects study involving 24 novice participants on TRECVID 2004 tasks again confirms this result. The study shows that satisfaction is greater and performance is significantly better on specific and generic information retrieval tasks from news broadcasts when transcripts are available for search. Additional runs with 7 expert users reveal different novice and expert interaction patterns with the video library interface, helping explain the novices’ lack of success with image search and visual feature browsing for visual information needs. Analysis of TRECVID visual features well suited for particular generic tasks provides additional insights into the role of automated feature classification for digital image and video libraries.
|
|
|
Michael G. Christel, Bryan Maher, and Andrew Begun.
Xslt for tailored access to a digital video library.
In Proceedings of the First ACM/IEEE-CS Joint Conference on
Digital Libraries, 2001.
Surrogates, summaries, and visualizations have been developed
and evaluated for accessing a digital video library containing thousands of
documents and terabytes of data. These interfaces, formerly implemented
within a monolithic stand-alone application, are being migrated to XML
and XSLT for delivery through web browsers. The merits of these interfaces
are presented, along with a discussion of the benefits in using W3C
recommendations such as XML and XSLT for delivering tailored access to
video over the web.
|
|
|
V. Christophides, S. Abiteboul, S. Cluet, and M. Scholl.
From structured documents to novel query facilities.
In Proceedings of the International Conference on Management of
Data, pages 313-324. ACM Press, New York, 1994.
|
|
|
Wesley W. Chu, M. A. Merzbacher, and L. Berkovich.
The design and implementation of cobase.
In Proceedings of the International Conference on Management of
Data, pages 517-522, Washington, D.C., 1993. ACM Press, New York.
|
|
|
Yi-Chun Chu, David Bainbridge, Matt Jones, and Ian H. Witten.
Realistic books: A bizarre homage to an obsolete medium?
In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2004.
For many readers, handling a physical book is an
enjoyably exquisite part of the information seeking process. Many
physical characteristics of a book (its size, heft, the patina of use on
its pages, and so on) communicate ambient qualities of the document it
represents. In contrast, the experience of accessing and exploring
digital library documents is dull. The emphasis is utilitarian;
technophile rather than bibliophile. We have extended the page-turning
algorithm we reported at last year's JCDL into a scaleable, systematic
approach that allows users to view and interact with realistic
visualizations of any textual-based document in a Greenstone
collection. Here, we further motivate the approach, illustrate the
system in use, discuss the system architecture and present a user
evaluation. Our work leads us to believe that far from being a
whimsical gimmick, physical book models can usefully complement
conventional document viewers and increase the perceived value of a
digital library system.
|
|
|
Yi-Chun Chu, Ian H. Witten, Richard Lobb, and David Bainbridge.
How to turn the page [short paper].
In Proceedings of the Third ACM/IEEE-CS Joint Conference on
Digital Libraries, 2003.
Can digital libraries provide a reading experience that more
closely resembles a real book than a scrolled or paginated electronic display?
This paper describes a prototype page-turning system that realistically animates
full three-dimensional page-turns. The dynamic behavior is generated by a mass-spring
model defined on a rectangular grid of particles. The prototype takes a PDF or E-book
file, renders it into a sequence of PNG images representing individual pages, and
animates the page-turns under user control. The simulation behaves fairly naturally,
although more computer graphics work is required to perfect it.
|
|
|
Lian-Heong Chua, Dion Hoe-Lian Goh, Ee-Peng Lim, Zehua Liu, and Rebecca Pei-Hui
Ang.
A digital library for geography examination resources.
In Proceedings of the Second ACM/IEEE-CS Joint Conference on
Digital Libraries, 2002.
We describe a Web-based application developed above a digital
library of geographical resources for Singapore students preparing to take a
national examination in geography. The application provides an interactive,
non-sequential approach to learning that supplements textbooks.
|
|
|
Y-Ming Chung, Qin He, Kevin Powell, and Bruce Schatz.
Semantic indexing for a complete subject discipline.
In Proceedings of the Fourth ACM International Conference on
Digital Libraries, 1999.
As part of the Illinois Digital Library Initiative (DLI) project
we developed scalable semantics technologies. These statistical
techniques enabled us to index large collections for
deeper search than word matching. Through the auspices of
the DARPA Information Management program, we are developing
an integrated analysis environment, the Interspace
Prototype, that uses semantic indexing as the foundation
for supporting concept navigation. These semantic indexes
record the contextual correlation of noun phrases, and are
computed generically, independent of subject domain.
Using this technology, we were able to compute semantic indexes
for a subject discipline. In particular, in the summer
of 1998, we computed concept spaces for 9.3M MEDLINE
bibliographic records from the National Library of Medicine
(NLM) which extensively covered the biomedical literature
for the period from 1966 to 1997. In this experiment, we
first partitioned the collection into smaller collections (repositories)
by subject, extracted noun phrases from titles and
abstracts, then performed semantic indexing on these subcollections
by creating a concept space for each repository.
The computation required 2 days on a 128-node SGI/CRAY
Origin 2000 at the National Center for Supercomputing
Applications (NCSA). This experiment demonstrated the feasibility
of scalable semantics techniques for large collections.
With the rapid increase in computing power, we believe this
indexing technology will shortly be feasible on personal computers.
|
|
|
Marcus Tullius Cicero.
De Oratore.
Loeb Classical Library, 55 B.C.
Book II, sec. 350ff.
|
|
|
W. V. Citrin and M. D. Gross.
PDA-based graphical interchange for field service and repair workers.
Computers & Graphics, 20(5):641-9,
1996.
We present an ongoing project to develop a system to provide
field service workers with timely and accurate service
information. The system will allow workers to download
diagrams or photographs from a host computer's central
database onto a PDA. The workers will be able to annotate the
diagrams to reflect work performed, and later upload the
annotations to the host computer, where they will be
integrated into an updated database. Diagram recognition
functionality is distributed between the PDA (which performs
low-level shape and handwriting recognition) and the host
computer (which performs high-level domain-based diagram
recognition). Distributing the functionality offers a number
of advantages: it allows the relatively resource-poor PDA to
be part of a powerful diagram recognition environment, it
allows the use of standardized hardware-based recognition
facilities in a domain-based recognition system, and it
allows off-line drawing recognition and storage of diagrams,
thereby avoiding excessive use of slow or expensive
communications channels.
|
|
|
Edith Cohen, Haim Kaplan, and Jeffrey Oldham.
Managing TCP connections under persistent HTTP.
In Proceedings of the Eighth International World-Wide Web
Conference, 1999.
Hyper Text Transfer Protocol (HTTP) traffic
dominates Internet traffic. The exchange of
HTTP messages is implemented
using the connection-oriented TCP. HTTP/1.0
establishes a new TCP connection for each HTTP
request, resulting in many consecutive short-lived TCP connections.
The emerging HTTP/1.1 reduces latencies and
overhead from closing and re-establishing connections by supporting
persistent connections as a default. A TCP
connection which is kept open and reused for the next HTTP request reduces
overhead and latency. Open connections,
however, consume sockets and memory
for socket-buffers. This trade-off establishes
a need for connection-management policies. We
propose policies that exploit embedded information in the HTTP request
messages, e.g., senders' identities and
requested URLs, and compare them to the fixed-timeout policy used in the current
implementation of the Apache Web server. An
experimental evaluation of connection management policies at Web
servers, conducted using Web server logs, shows
that our URL-based policy consistently outperforms other policies and
achieves a significant 15-25% improvement with respect to the fixed-timeout
policy, hence allowing Web servers and clients
to more fully reap the benefits of persistent HTTP.
|
|
|
William W. Cohen and Wei Fan.
Learning page-independent heuristics for extracting data from web
pages.
In Proceedings of the Eighth International World-Wide Web
Conference, 1999.
One bottleneck in implementing a system that
intelligently queries the Web is developing
`wrappers' - programs that
extract data from Web pages. Here we describe a
method for learning general, page-independent
heuristics for extracting
data from HTML documents. The input to our
learning system is a set of working wrapper
programs, paired with HTML
pages they correctly wrap. The output is a
general procedure for extracting data that
works for many formats and many
pages. In experiments with a collection of 84
constrained but realistic extraction problems,
we demonstrate that 30% of the problems can be handled perfectly by learned
extraction heuristics, and around 50% can be handled acceptably. We also
demonstrate that learned page-independent
extraction heuristics can substantially improve
the performance of methods for
learning page-specific wrappers.
|
|
|
Tammara T.A. Combs and Benjamin B. Bederson.
Does zooming improve image browsing?
In Proceedings of the Fourth ACM International Conference on
Digital Libraries, 1999.
We describe an image retrieval system we built based on a
Zoomable User Interface (ZUI). We also discuss the
design, results and analysis of a controlled experiment we
performed on the browsing aspects of the system. The
experiment resulted in a statistically significant difference
in the interaction between number of images (25, 75, 225)
and style of browser (2D, ZUI, 3D). The 2D and ZUI
browser systems performed equally, and both performed
better than the 3D systems. The image browsers tested
during the experiment include
Cerious Software's Thumbs Plus, TriVista Technology's
Simple Landscape and Photo GoRound, and our Zoomable
Image Browser based on Pad++.
|
|
|
Jeff Conklin and Michael L. Begeman.
gIBIS: A hypertext tool for exploratory policy discussion.
In Proceedings of the Conference on Computer-Supported
Cooperative Work, CSCW'88, 1988.
|
|
|
Paul Conway.
Yale University Library's Project Open Book: Preliminary research
findings.
D-Lib Magazine, Feb 1996.
Format: HTML Document.
|
|
|
Brian Cooper, Mayank Bawa, Neil Daswani, and Hector Garcia-Molina.
Protecting the PIPE from malicious peers.
Technical Report 2002-97, Stanford University, 2002.
Digital materials can be protected from failures by replicating
them at multiple autonomous, distributed sites. A significant challenge in such
a distributed system is ensuring that documents are replicated and accessible
despite malicious sites. Such sites may hinder the replication of documents
in a variety of ways, including agreeing to store a copy but erasing it
instead, refusing to serve a document, or serving an altered version of the
document. We describe the design of a Peer-to-peer Information Preservation
and Exchange (PIPE) network: a distributed replication system that protects
documents both from failures and from malicious nodes. We present the design
of a PIPE system, discuss a threat model for malicious sites, and propose
basic solutions for managing these malicious sites.
|
|
|
Brian Cooper, Mayank Bawa, Neil Daswani, and Hector Garcia-Molina.
Protecting the PIPE from malicious peers.
Technical Report 2002-03, Stanford University, 2002.
Digital materials can be protected from failures by replicating them
at multiple autonomous, distributed sites. A Peer-to-peer Information
Preservation and Exchange (PIPE) network is a good way to build a
distributed replication system. A significant challenge in such networks
is ensuring that documents are replicated and accessible despite malicious
sites. Such sites may hinder the replication of documents in a variety of ways,
including agreeing to store a copy but erasing it instead, refusing to serve a
document, or serving an altered version of the document. We define a model of
PIPE networks and a threat model for malicious sites, and propose basic solutions
for managing these malicious sites. The basic solutions are inefficient, but
demonstrate that a secure system can be built. We also sketch ways to improve
the efficiency of the system.
|
|
|
Brian Cooper, Arturo Crespo, and Hector Garcia-Molina.
Implementing a reliable digital object archive.
Submitted for publication, 2000.
Available at http://dbpubs.stanford.edu/pub/2000-27.
An Archival Repository reliably stores digital objects
for long periods of time (decades or centuries). The
archival nature of the system requires new techniques
for storing, indexing, and replicating digital
objects. In this paper we discuss the specialized
indexing needs of a write-once archive. We also present
a reliability algorithm for effectively replicating
sets of related objects. We describe an administrative
user interface and a data import utility for archival
repositories. Finally, we discuss and evaluate a
prototype repository we have built, the Stanford
Archival Vault, SAV.
|
|
|
Brian Cooper and Hector Garcia-Molina.
InfoMonitor: Unobtrusively archiving a World Wide Web server.
Submitted for publication, 2000.
Available at http://dbpubs.stanford.edu/pub/2000-15.
It may be important to provide long-term preservation
of digital data even when that data is stored in an
unreliable system, such as a filesystem, a legacy
database, or even the World Wide Web. In this research
paper we focus on the problem of archiving the contents
of a web site without disrupting users who maintain the
site. We propose an archival storage system, the
InfoMonitor, in which a reliable archive is integrated
with an unmodified existing store. Implementing such a
system presents various challenges related to the
mismatch of features between the components, such as
differences in naming and data manipulation
operations. We examine each of these issues as well as
solutions for the conflicts that arise. We also discuss
our experience using the InfoMonitor to archive the
Stanford Database Group's web site.
|
|
|
Brian Cooper and Hector Garcia-Molina.
Peer to peer data trading to preserve information.
Technical Report 2000-33, Stanford University, 2000.
Data archiving systems rely on replication to preserve
information. In this paper, we discuss how a network of autonomous
archiving sites can trade data to achieve the most reliable replication.
A series of binary trades between sites produces a peer to peer
archiving network. We examine two trading algorithms, one based
on trading collections (even if they are different sizes) and another
based on trading equal-sized blocks of space (which can then store
collections). We introduce the concept of deeds, which track the sites
that own space at other sites. We then discuss policies for tuning these
algorithms to provide the highest reliability, for example by changing
the order in which sites are contacted and offered trades. Finally, we
present simulation results that reveal which policies are most reliable.
|
|
|
Brian Cooper and Hector Garcia-Molina.
Peer to peer data trading to preserve information (extended version).
Technical Report 2000-38, Stanford University, 2000.
Data archiving systems rely on replication to preserve
information. In this paper, we discuss how a network of autonomous
archiving sites can trade data to achieve the most reliable replication.
A series of binary trades between sites produces a peer to peer archiving
network. We examine two trading algorithms, one based on trading
collections (even if they are different sizes) and another based on trading
equal-sized blocks of space (which can then store collections). We introduce
the concept of deeds, which track the sites that own space at other sites.
We then discuss policies for tuning these algorithms to provide the highest
reliability, for example by changing the order in which sites are contacted
and offered trades. Finally, we present simulation results that reveal which
policies are most reliable.
|
|
|
Brian Cooper and Hector Garcia-Molina.
Bidding for storage space in a peer-to-peer data preservation system.
Technical Report 2001-52, Stanford University, 2001.
Digital archives protect important data collections from
failures by making multiple copies at other archives, so that there are
always several good copies of a collection. In a cooperative replication
network, sites ``trade'' space, so that each site contributes storage
resources to the system and uses storage resources at other sites. Here,
we examine bid trading: a mechanism where sites conduct auctions to determine
who to trade with. A local site wishing to make a copy of a collection
announces how much remote space is needed, and accepts bids for how much
of its own space the local site must ``pay'' to acquire that remote space.
We examine the best policies for determining when to call auctions and how
much to bid, as well as the effects of ``maverick'' sites that attempt to
subvert the bidding system. Simulations of auction and trading sessions
indicate that bid trading can allow sites to achieve higher reliability
than the alternative: a system where sites trade equal amounts of space
without bidding.
|
|
|
Brian Cooper and Hector Garcia-Molina.
Creating trading networks of digital archives.
In Proceedings of the First ACM/IEEE-CS Joint Conference on
Digital Libraries, 2001.
Digital archives can best survive failures if they have made several
copies of their collections at remote sites. In this paper, we discuss
how autonomous sites can cooperate to provide preservation by trading
data. We examine the decisions that an archive must make when forming
trading networks, such as the amount of storage space to provide and
the best number of partner sites. We also deal with the fact that some
sites may be more reliable than others. Experimental results from a
data trading simulator illustrate which policies are most
reliable. Our techniques focus on preserving the ``bits'' of digital
collections; other services that focus on other archiving concerns
(such as preserving meaningful metadata) can be built on top of the
system we describe here.
|
|
|
Brian Cooper and Hector Garcia-Molina.
Creating trading networks of digital archives.
Technical Report 2001-04, Stanford University, 2001.
Digital archives can best survive failures if they have made
several copies of their collections at remote sites. In this paper, we
discuss how autonomous sites can cooperate to provide preservation by
trading data. We examine the decisions that an archive must make when forming
trading networks, such as the amount of storage space to provide and the best
number of partner sites. We also deal with the fact that some sites may be
more reliable than others. Experimental results from a data trading simulator
illustrate which policies are most reliable.
|
|
|
Brian Cooper and Hector Garcia-Molina.
Creating trading networks of digital archives.
Technical Report 2001-23, Stanford University, 2001.
Digital archives can best survive failures if they have made
several copies of their collections at remote sites. In this paper, we discuss
how autonomous sites can cooperate to provide preservation by trading data. We
examine the decisions that an archive must make when forming trading networks,
such as the amount of storage space to provide and the best number of partner
sites. We also deal with the fact that some sites may be more reliable than
others. Experimental results from a data trading simulator illustrate which
policies are most reliable. Our techniques focus on preserving the ``bits''
of digital collections; other services that focus on other archiving concerns
(such as preserving meaningful metadata) can be built on top of the system we
describe here.
|
|
|
Brian Cooper and Hector Garcia-Molina.
Peer to peer data trading to preserve information.
Technical Report 2001-7, Stanford University, 2001.
Data archiving systems rely on replication to preserve
information. In this paper, we discuss how a network of autonomous
archiving sites can trade data to achieve the most reliable replication.
A series of binary trades between sites produces a peer to peer archiving
network. We examine two trading algorithms, one based on trading collections
(even if they are different sizes) and another based on trading equal-sized
blocks of space (which can then store collections). We introduce
the concept of deeds, which track the sites that own space at other
sites. We then discuss policies for tuning these algorithms to provide
the highest reliability, for example by changing the order in which sites
are contacted and offered trades. Finally, we present simulation results
that reveal which policies are most reliable.
|
|
|
Brian Cooper and Hector Garcia-Molina.
Peer to peer data trading to preserve information (extended version).
Technical Report 2001-6, Stanford University, 2001.
Data archiving systems rely on replication to preserve
information. In this paper, we discuss how a network of autonomous
archiving sites can trade data to achieve the most reliable replication.
A series of binary trades between sites produces a peer to peer archiving
network. We examine two trading algorithms, one based on trading
collections (even if they are different sizes) and another based on
trading equal-sized blocks of space (which can then store collections).
We introduce the concept of deeds, which track the sites that own space
at other sites. We then discuss policies for tuning these algorithms to
provide the highest reliability, for example by changing the order in which
sites are contacted and offered trades. Finally, we present simulation
results that reveal which policies are most reliable.
|
|
|
Brian Cooper and Hector Garcia-Molina.
Bidding for storage space in a peer-to-peer data preservation system
(extended version).
Technical Report 2002-22, Stanford University, 2002.
Digital archives protect important data collections from failures by
making multiple copies at other archives, so that there are always several good
copies of a collection. In a cooperative replication network, sites ``trade''
space, so that each site contributes storage resources to the system and uses
storage resources at other sites. Here, we examine bid trading: a mechanism
where sites conduct auctions to determine who to trade with. A local site
wishing to make a copy of a collection announces how much remote space is
needed, and accepts bids for how much of its own space the local site
must ``pay'' to acquire that remote space. We examine the best policies for
determining when to call auctions and how much to bid, as well as the effects
of ``maverick'' sites that attempt to subvert the bidding system. Simulations
of auction and trading sessions indicate that bid trading can allow sites to
achieve higher reliability than the alternative: a system where sites trade
equal amounts of space without bidding.
|
|
|
Brian F. Cooper and Hector Garcia-Molina.
Modeling and measuring scalable peer-to-peer search networks.
Technical Report 2002-44, Stanford University, 2002.
The popularity of peer-to-peer search networks grows, even as
the limitations to the scalability of existing systems become apparent. We
propose a simple model for search networks, called the search/index links
(SIL) model. The SIL model describes existing networks while also yielding
organizations not previously studied. Using simulation results, we argue that
a new organization, parallel search clusters, is superior to existing
supernode networks in many cases.
|
|
|
Brian F. Cooper and Hector Garcia-Molina.
Modeling and measuring scalable peer-to-peer search networks
(extended version).
Technical Report 2002-43, Stanford University, 2002.
The popularity of peer-to-peer search networks grows, even as
the limitations to the scalability of existing systems become apparent. We propose
a simple model for search networks, called the search/index links (SIL) model.
The SIL model describes existing networks while also yielding organizations
not previously studied. Using simulation results, we argue that a new
organization, parallel search clusters, is superior to existing supernode
networks in many cases.
|
|
|
James W. Cooper, Mahesh Viswanathan, Donna Byron, and Margaret Chan.
Building searchable collections of enterprise speech data.
In Proceedings of the First ACM/IEEE-CS Joint Conference on
Digital Libraries, 2001.
We have applied speech recognition and text-mining
technologies to a set of recorded outbound marketing calls and
analyzed the results. Since speaker-independent speech recognition
technology results in a significantly lower recognition rate than that
found when the recognizer is trained for a particular speaker, we
applied a number of post-processing algorithms to the output of the
recognizer to render it suitable for the Textract text mining system.
We indexed the call transcripts using a search engine and used Textract
and associated Java technologies to place the relevant terms for each
document in a relational database. Following a search query, we generated
a thumbnail display of the results of each call with the salient terms
highlighted. We illustrate these results and discuss their utility. We
took the results of these experiments and continued this analysis on a
set of talks and presentations. We describe a distinct document genre
based on the note-taking concept of document content, and propose a
significant new method for measuring speech recognition accuracy. This
procedure is generally relevant to the problem of capturing meetings and
talks and providing a searchable index of these presentations on the web.
|
|
|
Matthew Cooper, Jonathan Foote, Andreas Girgensohn, and Lynn Wilcox.
Temporal event clustering for digital photo collections.
In Proceedings of the Eleventh ACM International Conference on
Multimedia, pages 364-373. ACM Press, 2003.
|
|
|
Antony Corfield, Matthew Dovey, Richard Mawby, and Colin Tatham.
JAFER ToolKit project - interfacing Z39.50 and XML.
In Proceedings of the Second ACM/IEEE-CS Joint Conference on
Digital Libraries, 2002.
In this paper, we describe the JAFER ToolKit project which is
developing a simplified XML based API above the Z39.50 protocol. The ToolKit
allows the development of Z39.50 based applications (both clients and
servers) without detailed knowledge of the complexities of the protocol.
|
|
|
Digital Equipment Corporation.
Millicent.
MilliCent website: http://www.millicent.digital.com/.
|
|
|
Microsoft Corporation.
Microsoft wallet.
Microsoft wallet website: http://www.microsoft.com/wallet/.
|
|
|
Steve Cousins.
Reification and Affordances in a User Interface for Interacting
with Heterogeneous Distributed Applications.
PhD thesis, Stanford University, 1997.
Steve Cousins's Ph.D. thesis
|
|
|
Steve B. Cousins.
A task-oriented interface to a digital library.
In CHI 96 Conference Companion, pages 103-104, 1996.
|
|
|
Steve B. Cousins, Scott W. Hassan, Andreas Paepcke, and Terry Winograd.
Towards wide-area distributed interfaces.
Technical Report SIDL-WP-1996-0037; 1997-67, Stanford University,
1996.
Available at http://dbpubs.stanford.edu/pub/1997-67.
Describes how the DLITE design enables shifting of
functionality among distributed components.
|
|
|
Steve B. Cousins, Steven P. Ketchpel, Andreas Paepcke, Héctor
García-Molina, Scott W. Hassan, and Martin Roescheisen.
InterPay: Managing multiple payment mechanisms in digital libraries.
In Proceedings of the Second Annual Conference on the Theory and
Practice of Digital Libraries, 1995.
Format: HTML Document (39K + pictures).
Audience: Computer Scientists.
References: 10.
Links: 8.
Relevance: High.
Abstract: Describes an architecture called InterPay for allowing
heterogeneous payment mechanisms to interoperate. Defines three levels (a task
level, payment policy level, and payment mechanism level) that may be modified
independently. Describes a working prototype using the ILU distributed object
system from Xerox. Shows a sample transaction using the architecture, and how
the components of the architecture (payment agents, collection agents, and
payment and collection capabilities) can be used in more complex transactions.
|
|
|
Steve B. Cousins, Steven P. Ketchpel, Andreas Paepcke, Hector Garcia-Molina,
Scott W. Hassan, and Martin Röscheisen.
InterPay: Managing multiple payment mechanisms in digital libraries.
Digital Library, 1995.
InterPay paper
|
|
|
Steve B. Cousins, Andreas Paepcke, Scott W. Hassan, and Terry Winograd.
Towards wide-area distributed interfaces.
Technical Report SIDL-WP-1996-0037; 1997-67, Stanford University,
1997.
At http://dbpubs.stanford.edu/pub/1997-67.
We have designed and prototyped a series of interfaces for
Digital Libraries. These interfaces use CORBA objects to
distribute interface modeling and rendering across
machines. We describe the design tensions arising in the
context of such distribution, locate existing UI technology
in the resulting design space, and explain the location of
our final prototype in that space. We view Digital Libraries
as collections of repositories and publication-related
services that may be distributed over large distances and
must be accessible from many locations and through multiple
hardware, software, and networking platforms. We describe
our use of CORBA and briefly introduce a drag-and-drop
interface developed to provide unified access to
heterogeneous Digital Library resources.
|
|
|
Steve B. Cousins, Andreas Paepcke, Terry Winograd, Eric A. Bier, and Ken Pier.
The digital library integrated task environment (DLITE).
In Proceedings of the Second ACM International Conference on
Digital Libraries, pages 142-151, 1997.
Accessible at http://dbpubs.stanford.edu/pub/1997-69.
|
|
|
B. Cox, D. Tygar, and M. Sirbu.
NetBill security and transaction protocol.
In First USENIX Workshop on Electronic Commerce Proceedings,
1995.
|
|
|
Gregory Crane.
Building a digital library: The Perseus Project as a case study in
the humanities.
In Proceedings of DL'96, 1996.
Format: Not yet online.
|
|
|
Gregory Crane, David A. Smith, and Clifford E. Wulfman.
Building a hypertextual digital library in the humanities: A case
study on London.
In Proceedings of the First ACM/IEEE-CS Joint Conference on
Digital Libraries, 2001.
This paper describes the creation of a new humanities
digital library collection: 11,000,000 words and 10,000 images representing
books, images and maps on pre-twentieth century London and its environs.
The London collection contained far more dense and precise information than the
materials from the Greco-Roman world on which we had previously concentrated.
The London collection thus allowed us to explore new problems of data
structure, manipulation, and visualization. This paper contrasts our model
for how humanities digital libraries are best used with the assumptions that
underlie many academic digital libraries on the one hand and more literary
hypertexts on the other. Since encoding guidelines such as those from the
TEI provide collection designers with far more options than any one
project can realize, this paper describes what structures we used to
organize the collection and why. We particularly emphasize the importance of
mining historical authority lists (encyclopedias, gazetteers, etc.) and
then generating automatic span-to-span links within the collection.
|
|
|
Gregory Crane, Clifford E. Wulfman, Lisa M. Cerrato, Anne Mahoney, Thomas L.
Milbank, David Mimno, Jeffrey A. Rydberg-Cox, David A. Smith, and Christopher
York.
Towards a cultural heritage digital library.
In Proceedings of the Third ACM/IEEE-CS Joint Conference on
Digital Libraries, 2003.
This paper surveys research areas relevant to cultural heritage
digital libraries. The emerging National Science Digital Library promises to
establish the foundation on which those of us beyond the scientific and
engineering community will likely build. This paper thus articulates the
particular issues that we have encountered in developing cultural heritage
collections. We provide a broad overview of audiences, collections, and services.
|
|
|
Nick Craswell and Peter Bailey.
Server selection on the World Wide Web.
In Proceedings of the Fifth ACM International Conference on
Digital Libraries, 2000.
We evaluate server selection methods in a Web
environment, modeling a digital library which makes use of
existing Web search servers rather than building its own index.
The evaluation framework portrays the Web realistically in
several ways. Its search servers index real Web documents, are
of various sizes, cover different topic areas and employ different
retrieval methods. Selection is based on statistics extracted from
the results of probe queries submitted to each server. We
evaluate published selection methods and a new method for enhancing
selection based on expected search server effectiveness.
Results show CORI to be the most effective of three published
selection methods.
|
|
|
Arturo Crespo and Eric A. Bier.
WebWriter: A browser-based editor for constructing web
applications.
In Proceedings of the Fifth International World-Wide Web
Conference, 1996.
|
|
|
Arturo Crespo, Orkut Buyukkokten, and Hector Garcia-Molina.
Efficient query subscription processing in a multicast environment.
In Proceedings of the 16th International Conference on Data
Engineering, 2000.
Available at http://dbpubs.stanford.edu/pub/2000-54.
This paper introduces techniques for reducing data
dissemination costs of query subscriptions. The
reduction is achieved by merging queries with
overlapping, but not necessarily equal, answers. The
paper formalizes the query-merging problem and
introduces a general cost model for it. We prove that
the problem is NP-hard and propose exhaustive
algorithms and three heuristic algorithms: the Pair
Merging Algorithm, the Directed Search Algorithm and
the Clustering Algorithm. We develop a simulator for
evaluating the different heuristics and show that the
performance of our heuristics is close to optimal.
|
|
|
Arturo Crespo, Bay-Wei Chang, and Eric A. Bier.
Responsive interaction for a large web application: The Meteor Shower
architecture in the WebWriter II editor.
In Proceedings of the Sixth International World-Wide Web
Conference, 1997.
Traditional server-based web applications allow access to
server-hosted resources, but often exhibit poor
responsiveness due to server load and network delays.
Client-side web applications, on the other hand, provide
excellent interactivity at the expense of limited access to
server resources. The WebWriter II Editor, a direct
manipulation HTML editor that runs in a web browser, uses
both server-side and client-side processing in order to
achieve the advantages of both. In particular, this editor
downloads the document data structure to the browser and
performs all operations locally. The user interface is
based on HTML frames and includes individual frames for
previewing the document and displaying general and specific
control panels. All editing is done by JavaScript code
residing in roughly twenty HTML pages that are downloaded
into these frames as needed. Such a client-server
architecture, based on frames, client-side data structures,
and multiple JavaScript-enhanced HTML pages, appears
promising for a wide variety of applications. This paper
describes this architecture, the Meteor Shower Application
Architecture, and its use in the WebWriter II Editor.
|
|
|
Arturo Crespo and Hector Garcia-Molina.
Awareness services for digital libraries.
In Lecture Notes in Computer Science, volume 1324, 1997.
|
|
|
Arturo Crespo and Hector Garcia-Molina.
Archival storage for digital libraries.
In Proceedings of the Third ACM International Conference on
Digital Libraries, 1998.
Accessible at http://dbpubs.stanford.edu/pub/1998-49.
We propose an architecture for Digital Library
Repositories that assures long-term archival storage of
digital objects. The architecture is formed by a
federation of independent but collaborating sites, each
managing a collection of digital objects. The
architecture is based on the following key components:
use of signatures as object handles, no deletions of
digital objects, functional layering of services, the
presence of an awareness service in all layers, and use
of disposable auxiliary structures. Long-term
persistence of digital objects is achieved by creating
replicas at several sites.
|
|
|
Arturo Crespo and Hector Garcia-Molina.
Modeling archival repositories for digital libraries.
Submitted for publication, 2000.
Available at http://dbpubs.stanford.edu/pub/1999-23.
This paper studies the archival problem: how a digital
library can preserve electronic documents over long
periods of time. We analyze how an archival repository
can fail and we present different strategies that help
solve the problem. We introduce ArchSim, a simulation
tool for evaluating an implementation of an
archival repository system and compare options such as
different disk reliabilities, error detection and
correction algorithms, preventive maintenance, etc. We
use ArchSim to analyze a case study of an Archival
Repository for Computer Science Technical Reports.
|
|
|
Arturo Crespo and Hector Garcia-Molina.
Cost-driven design for archival repositories.
In Proceedings of the First ACM/IEEE-CS Joint Conference on
Digital Libraries, 2001.
Designing an archival repository is a complex task because there
are many alternative configurations, each with different reliability levels and
costs. In this paper we study the costs involved in an Archival Repository and
we introduce a design framework for evaluating alternatives and choosing the best
configuration in terms of reliability and cost. We also present a new version of
our simulation tool, ArchSim/C, that aids in the decision process. The design
framework and the usage of ArchSim/C are illustrated with a case study of a
hypothetical (yet realistic) archival repository shared between two
universities.
|
|
|
Arturo Crespo and Hector Garcia-Molina.
Routing indices for peer-to-peer systems.
Technical Report 2001-48, Stanford University, 2001.
Finding information in a peer-to-peer system currently requires
either a costly and vulnerable central index, or flooding the network with queries.
In this paper we introduce the concept of Routing Indices (RIs), which allow nodes
to forward queries to neighbors that are more likely to have answers. If a node
cannot answer a query, it forwards the query to a subset of its neighbors, based
on its local RI, rather than by selecting neighbors at random or by flooding the
network by forwarding the query to all neighbors. We present three RI schemes:
the compound, the hop-count, and the exponential routing indices. We evaluate
their performance via simulations, and find that RIs can improve performance
by one or two orders of magnitude vs. a flooding-based system, and by up to
100% vs. a random forwarding system. We also discuss the trade-offs between
the different RI schemes and highlight the effects of key design variables
on system performance.
|
|
|
Fabio Crestani.
Vocal access to a newspaper archive: Design issues and preliminary
investigations.
In Proceedings of the Fourth ACM International Conference on
Digital Libraries, 1999.
This paper presents the design and the current prototype
implementation of an interactive vocal Information Retrieval
system that can be used to access articles of a large newspaper
archive using a telephone. The results of preliminary
investigation into the feasibility of such a system are also
presented.
|
|
|
William T. Crocca and William L. Anderson.
Delivering technology for digital libraries: Experiences as vendors.
In DL '95, 1995.
Format: HTML Document (39K + picture).
Audience: Computer scientists and librarians.
References: 10.
Links: 1.
Relevance: Low.
Abstract: Argues that many of the problems in DL development are not
technical, but social and political, as the nature of the work is transformed.
Describes two Xerox collaborations with academia, one on scanned documents, a
second with a web-based system. Lists some assumptions that are sometimes
made, and to what extent they are borne out. Special concerns are standards,
which ones to support, and ensuring access to acquisitions in old standards.
|
|
|
W. Bruce Croft.
What do people want from information retrieval? (the top 10 research
issues for companies that use and sell IR systems).
D-Lib Magazine, Nov 1995.
Format: HTML Document().
|
|
|
W. Bruce Croft, Robert Cook, and Dean Wilder.
Providing government information on the internet: Experiences with
THOMAS.
In Proceedings of the Second Annual Conference on the Theory and
Practice of Digital Libraries, 1995.
Format: HTML Document(29K) .
Audience: Information retrieval specialists.
References: 12.
Links: 1.
Relevance: Low.
Abstract: Describes use of THOMAS, the on-line source of congressional
information. Based on the INQUERY engine, offers keyword searches with other
advanced features (proximity, weighted averaging, synonyms) which are largely
ignored by the user population. The tendency is for short (three words or fewer)
queries about a single topic. Describes domain-dependent performance
enhancements with the ranking algorithms to ensure that relevant hits appear
near the top of the ranking.
|
|
|
Isabel Cruz.
Effective abstractions in multimedia.
In DAGS '95, 1995.
Format: PostScript () .
|
|
|
M. I. Crystal and G. E. Jakobson.
FRED, a front end for databases.
Online, 6(5):27-30, September 1982.
|
|
|
Pierre Cubaud, Pascal Stokowski, and Alexandre Topol.
Binding browsing and reading activities in a 3D digital library.
In Proceedings of the Second ACM/IEEE-CS Joint Conference on
Digital Libraries, 2002.
Browsing through collections and reading activities are separated
in most present WWW-based user interfaces of digitalized libraries. This
context break induces longer apprenticeship and navigation time within the interface.
We study in this paper how 3D interaction metaphors can be used to provide a
continuous navigation space for these two tasks.
|
|
|
Hong Cui, P. Bryan Heidorn, and Hong Zhang.
An approach to automatic classification of text for information
retrieval.
In Proceedings of the Second ACM/IEEE-CS Joint Conference on
Digital Libraries, 2002.
In this paper, we explore an approach to make better use of
semi-structured documents in information retrieval in the domain of biology.
Using machine learning techniques, we make those inherent structures explicit by
XML markup. This markup has great potential for improving task performance
in specimen identification and the usability of online flora and fauna.
|
|
|
Sally Jo Cunningham, David Bainbridge, and Masood Masoodian.
How people describe their image information needs: A grounded theory
analysis of visual arts queries.
In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2004.
When people are looking for visual arts
information - information related to images - how do they
characterize their needs? We analyze a set of 405 queries to identify
the attributes that people provide to the Google Answers 'ask an
expert' online reference system. The results suggest directions to take
in developing an effective organization and features for an image
digital library.
|
|
|
Sally Jo Cunningham, Chris Knowles, and Nina Reeves.
An ethnographic study of technical support workers: Why we didn't
build a tech support digital library.
In Proceedings of the First ACM/IEEE-CS Joint Conference on
Digital Libraries, 2001.
In this paper we describe the results of an ethnographic
study of the information behaviours of university technical support workers
and their information needs. The study looked at how the group identified,
located and used information from a variety of sources to solve problems
arising in the course of their work. The results of the investigation are
discussed in the context of the feasibility of developing a potential
information base that could be used by all members of the group. Whilst a
number of their requirements would easily be fulfilled by the use of a
digital library, other requirements would not. The paper illustrates the
limitations of a digital library with respect to the information behaviours
of this group of subjects and focuses on why a digital library would not
appear to be the ideal support tool for their work.
|
|
|
Sally Jo Cunningham and Nina Reeves.
An ethnographic study of music information seeking: Implications for
the design of a music digital library.
In Proceedings of the Third ACM/IEEE-CS Joint Conference on
Digital Libraries, 2003.
At present, music digital library systems are being
developed based on anecdotal evidence of user needs, intuitive
feelings for user information seeking behavior, and a priori
assumptions of typical usage scenarios. Emphasis has been
placed on basic research into music document representation,
efficient searching, and audio-based searching, rather than on
exploring the music information needs or information behavior of
a target user group. This paper focuses on eliciting the 'native'
music information strategies employed by people searching for
popular music (that is, music sought for recreational or enjoyment
purposes rather than to support a 'serious' or scientific exploration
of some aspect of music). To this end, we conducted an ethnographic
study of the searching/browsing techniques employed by people in the
researchers' local communities, as they use two common sources of music:
the public library and music stores. We argue that the insights provided
by this type of study can inform the development of searching/browsing
support for music digital libraries.
|
|
|
Te Taka Keegan and Sally Jo Cunningham.
Language preference in a bi-language digital library.
In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2005.
This paper examines user choice of interface language in a bi-language digital library (English and Māori, the language of the indigenous people of New Zealand). The majority of collection documents are in Māori, and the interface is available in both Māori and English. Log analysis shows three categories of preference for interface language: primarily English, primarily Māori, and bilingual (switching back and forth between the two). As digital libraries increase in number, content, and potential user base, interest has grown in ‘multilingual’ or ‘multi-language’ collections - that is, digital libraries in which the collection documents and the collection interface include more than one language. Research in multilingual/multi-language digital libraries and web-based document collections has primarily focused on fundamental implementation issues and functionality, principles for design, and small-scale usability tests; at present no analysis exists of how these systems are used, or how the presence of more than one language in a digital library affects user interactions - presumably because multilingual/multi-language digital libraries are only recently moving from research lab prototypes to fielded systems, and few have built up a significant usage history. This paper describes the application of log analysis to examine interface language preference in a bi-language (English/Māori) digital library - the Niupepa Collection (Section 2). Web log data was collected for a year (Section 3), and log analysis indicates three categories of interface language preferences: English, Māori, and ‘bilingual’ (Section 4). A fine-grained analysis of activities within user sessions indicates different patterns of document access and information gathering strategy between these three categories (Section 5).
|
|
|
Doug Cutting, Bill Janssen, Mike Spreitzer, and Farrell Wymore.
ILU Reference Manual.
Xerox Palo Alto Research Center, December 1993.
Accessible at ftp://ftp.parc.xerox.com/pub/ilu/ilu.html.
Reference manual. Tech report at cour94
|
|
|
Douglass R. Cutting, David Karger, and Jan Pedersen.
Constant interaction-time scatter/gather browsing of very large
document collections.
In Proceedings of the Sixteenth Annual International ACM SIGIR
Conference on Research and Development in Information Retrieval, pages
126-135, 1993.
|
|
|
Douglass R. Cutting, Jan O. Pedersen, David Karger, and John W. Tukey.
Scatter/gather: A cluster-based approach to browsing large document
collections.
In Proceedings of the Fifteenth Annual International ACM SIGIR
Conference on Research and Development in Information Retrieval, pages
318-329, 1992.
|
|
|
CyberCash.
Cybercash home page.
CyberCash website: http://www.cybercash.com/.
|
|
|
Gordon Dahlquist, Brian Hoffman, and David Millman.
Integrating digital libraries and electronic publishing in the DART
project.
In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2005.
The Digital Anthropology Resources for Teaching (DART) project integrates the content acquisition and cataloging initiatives of a federated digital repository with the development of scholarly publications and the creation of digital tools to facilitate classroom teaching. The project's technical architecture and unique publishing model create a teaching context where students move easily between primary and secondary source material and between authored environments and independent research, and raise specific issues with regard to metadata, object referral, rights, and exporting content. The model also addresses the loss of provenance and catalog information for digital objects embedded in born-digital publications. The DART project presents a practical methodology to combine repository and publication that is both exportable and discipline-neutral.
|
|
|
Zubin Dalal, Suvendu Dash, Pratik Dave, Luis Francisco-Revilla, Richard Furuta,
Unmil Karadkar, and Frank Shipman.
Managing distributed collections: Evaluating web page changes,
movement, and replacement.
In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2004.
Distributed collections of Web materials are
common. Bookmark lists, paths, and catalogs such as Yahoo! Directories
require human maintenance to keep up to date with changes to the
underlying documents. The Walden's Paths Path Manager is a tool to
support the maintenance of distributed collections. Earlier efforts
focused on recognizing the type and degree of change within Web pages
and identifying pages no longer accessible. We now extend this work
with algorithms for evaluating drastic changes to page content based on
context. Additionally, we expand on previous work to locate moved pages
and apply the modified approach to suggesting page replacements when
the original page cannot be found. Based on these results we are
redesigning the Path Manager to better support the range of assessments
necessary to manage distributed collections.
|
|
|
Zubin Dalal, Suvendu Dash, Pratik Dave, Luis Francisco-Revilla, Richard Furuta,
Unmil Karadkar, and Frank Shipman.
Managing distributed collections: evaluating web page changes,
movement, and replacement.
In JCDL '04: Proceedings of the 4th ACM/IEEE-CS joint conference
on Digital libraries, pages 160-168, New York, NY, USA, 2004. ACM Press.
Distributed collections of Web materials are
common. Bookmark lists, paths, and catalogs such as
Yahoo! Directories require human maintenance to keep
up to date with changes to the underlying
documents. The Walden's Paths Path Manager is a tool
to support the maintenance of distributed
collections. Earlier efforts focused on recognizing
the type and degree of change within Web pages and
identifying pages no longer accessible. We now
extend this work with algorithms for evaluating
drastic changes to page content based on
context. Additionally, we expand on previous work to
locate moved pages and apply the modified approach
to suggesting page replacements when the original
page cannot be found. Based on these results we are
redesigning the Path Manager to better support the
range of assessments necessary to manage distributed
collections.
|
|
|
Raymond J. D'Amore, Daniel J. Helm, Puck-Fai Yan, and Stephen A. Glanowski.
Mitre information discovery system.
In Proceedings of DL'96, 1996.
Format: Not yet online.
|
|
|
B. C. Dasai and S. Swiercz.
Webjournal: Visualization of a web journey.
In Advances in Digital Libraries '95, 1995.
Format: Not Yet Online.
|
|
|
Neil Daswani, Dan Boneh, Hector Garcia-Molina, Steven Ketchpel, and Andreas
Paepcke.
A generalized digital wallet architecture.
In Proceedings of the 3rd USENIX Workshop on Electronic
Commerce, pages 121-139, 1998.
Publishers wishing to distribute text online fear that
customers will download their product and redistribute it
illegally. Although constraining the users to access the data
only through proprietary software that does not allow
downloading helps, it still leaves the possibility that users
could take screen dumps of the material to capture it. The
technique described in the paper relies on the perceptual
properties of the human eye, using two unreadable images
interleaved quickly to create a readable image, which cannot
be screen-dumped since the readability depends on averaging
in the human eye. Our program flickers two images of the text
each with an admixture of grey noise. Your eye sorts out the
letters and reads them, not paying close attention to the
grey background; but any screen dump captures the item at one
instant including the noise. The text is also scrolled up and
down slowly, which again your eye can track, but which would
frustrate a program trying to average out the flickering.
|
|
|
Neil Daswani, Hector Garcia-Molina, and Beverly Yang.
Open problems in data-sharing peer-to-peer systems.
In Proceedings of the 9th International Conference on Database
Theory, 2003.
In a Peer-To-Peer (P2P) system, autonomous computers pool
their resources (e.g., files, storage, compute cycles) in order to
inexpensively handle tasks that would normally require large costly
servers. The scale of these systems, their open nature, and the lack
of centralized control pose difficult performance and security challenges.
Much research has recently focused on tackling some of these challenges; in
this paper, we propose future directions for research in P2P systems, and
highlight problems that have not yet been studied in great depth. We focus
on two particular aspects of P2P systems - search and security - and
suggest several open and important research problems for the community
to address.
|
|
|
Mayur Datar.
Butterflies and peer-to-peer networks.
In Proceedings of the 10th European Symposium on Algorithms,
2002.
Research in Peer-to-peer systems has focussed on building
efficient Content Addressable Networks (CANs), which are essentially distributed
hash tables (DHT) that support location of resources based on unique keys.
While most proposed schemes are robust to a large number of random faults,
there are very few schemes that are robust to a large number of adversarial
faults. In a recent paper Fiat and Saia have proposed such a solution that
is robust to adversarial faults. We propose a new solution based on
multi-butterflies that improves upon the previous solution by Fiat and Saia.
Our new network, multi-hypercube, is a fault tolerant version of the hypercube,
and may find applications to other problems as well. We also demonstrate how
this network can be maintained dynamically. This addresses the first open
problem in the paper by Fiat and Saia.
|
|
|
Mayur Datar.
Butterflies and peer-to-peer networks.
Technical Report 2002-5, Stanford University, 2002.
The popularity of systems like Napster, Gnutella, etc. has
spurred recent interest in Peer-to-peer systems. A central problem in all
these systems is efficient location of resources based on their keys. A
network that supports such queries is referred to as Content Addressable
Network (CAN). Many solutions have been proposed to building CANs. However
most of these solutions do not focus on adversarial faults, which might be
critical to building a censorship resistant peer-to-peer system. In a
recent paper Fiat and Saia have proposed a solution to building such a
system. We propose a new solution based on multi-butterflies that improves
upon the previous solution by Fiat and Saia. Our new network,
multi-hypercube, is a fault-tolerant version of the hypercube.
We also demonstrate how this network can be maintained dynamically. This
addresses the first open problem in the paper by Fiat and Saia.
|
|
|
Winton H. E. Davies and Pete Edwards.
Agent-based knowledge discovery.
In AAAI Spring Symposium on Information Gathering, 1995.
Format: Compressed PostScript().
|
|
|
Hugh Davis.
Using Microcosm to access digital libraries.
In Proceedings of the First Annual Conference on the Theory and
Practice of Digital Libraries, 1994.
Format: HTML Document (6K) .
Audience: UK funders .
References: 4.
Links: 1.
Relevance: Low.
Abstract: A description of the Microcosm system (campus document
delivery), a hypermedia system allowing links to 3rd party viewers.
|
|
|
Hugh Davis and Jessie Hey.
Automatic extraction of hypermedia bundles from the digital library.
In Proceedings of the Second Annual Conference on the Theory and
Practice of Digital Libraries, 1995.
Format: HTML Document(34K + pictures) .
Audience: Digital library developers and users.
References: 21.
Links: 1.
Relevance: Low.
Abstract: Rather than just retrieving a list of hits for a query, the
system can bundle them, generate hyperlinks on keywords, and offer interactive
query expansion or contraction. Suggests the addition of a length (in minutes
to comprehend) and a reader level field of meta-information.
|
|
|
J. Davis, D. Krafft, and C. Lagoze.
Dienst: Building a production technical report server.
In Advances in Digital Libraries '95, 1995.
Format: Not Yet Online.
|
|
|
James R. Davis.
Creating a networked computer science technical report library.
D-Lib Magazine, Sep 1995.
Format: HTML Document().
|
|
|
James R. Davis.
Creating a networked computer science technical report library.
In Proceedings of DL'96, 1996.
Format: Not yet online.
|
|
|
Marc Davis, Simon King, Nathan Good, and Risto Sarvas.
From context to content: leveraging context to infer media metadata.
In Proceedings of the 12th International Conference on
Multimedia (MM2004), pages 188-195. ACM Press, 2004.
|
|
|
Peter T. Davis, David K. Elson, and Judith L. Klavans.
Methods for precise named entity matching in digital collections
[short paper].
In Proceedings of the Third ACM/IEEE-CS Joint Conference on
Digital Libraries, 2003.
In this paper, we describe an interactive system, built within
the context of the CLiMB project, which permits a user to locate the occurrences of
named entities within a given text. The named entity tool was developed to identify
references to a single art object (e.g. a particular building) with high precision
in text related to images of that object in a digital collection. We start with an
authoritative list of art objects, and seek to match variants of these named entities
in related text. Our approach is to decay entities into progressively more
general variants while retaining high precision. As variants become more general,
and thus more ambiguous, we propose methods to disambiguate intermediate results.
Our results will be used to select records into which automatically generated
metadata will be loaded.
|
|
|
Colin Day.
Economics of electronic publishing.
In JEP, 1994.
Format: HTML Document (31K) .
Audience: Generalist, academic.
References: 1.
Links: 0.
Relevance: low-medium.
Abstract: Discusses the 4 services of publisher and library: Gathering,
Selecting, Enhancing, and Informing in terms of benefits provided to academics
and society. Argues that distribution of ideas is too important to be
exclusively at the mercy of the market place, and should (like theater or public
TV) be subsidized, but the majority of cost recovery should still be from users.
Argues that the producers and consumers (university presses and faculty) are
largely part of the same institution, so there should be gains, but presses have
evolved to be largely independent.
|
|
|
Colin Day.
Pricing electronic products.
In JEP, 1994.
Format: HTML Document (21K) .
Audience: publishers, librarians.
References: 0.
Links: 0 .
Relevance: low-medium .
Abstract: Economic discussion of publishing. Looks at first copy and
incremental copy costs. Considers ways that publishers can recover
first copy costs while still distributing to all for whom it is
economically rational (value is greater than incremental cost).
Possible models: 1) country club, where one pays high up-front dues,
but then low per-transaction cost; 2) differentiated costs where
different products are provided, one at a higher cost with certain
features, a second at lower (marginal) cost, e.g., more expensive
hardcover comes out first, followed by cheap paperback months later.
Mentions 3 specific examples: Project Muse, Chicago Journal of Theoretical
Computer Science, and Mathematical Reviews.
|
|
|
J.D. Day and H. Zimmermann.
The OSI reference model.
Proc. of the IEEE, 71:1334-1340, December 1983.
|
|
|
D. Choy, R. Dievendorff, C. Dwork, J. B. Lotspiech, R. T. Morris, L. C.
Anderson, A. E. Bell, S. K. Boyer, T. D. Griffin, B. A. Hoenig, J. M.
McCrossin, A. M. Miller, N. J. Pass, F. Pestoni, and D. S. Picciano.
The Almaden distributed digital library system.
In Advances in Digital Libraries '95, 1995.
Format: Not Yet Online.
|
|
|
O. de Bruijn, R. Spence, and M. Y. Chong.
RSVP browser: Web browsing on small screen devices.
Personal Ubiquitous Comput., 6(4):245-252, 2002.
Abstract: In this paper, we illustrate the use of space-time
trade-offs for information presentation on small screens. We
propose the use of Rapid Serial Visual Presentation (RSVP) to
provide a rich set of navigational information for Web
browsing. The principle of RSVP browsing is applied to the
development of a Web browser for small screen devices, the
RSVP browser. The results of an experiment in which Web
browsing with the RSVP browser is compared with that of a
typical WAP browser suggest that RSVP browsing may indeed
offer an alternative to other forms of Web browsing on small
screen devices.
|
|
|
Jeffrey Dean and Monika R. Henzinger.
Finding related pages in the world wide web.
In Proceedings of the Eighth International World-Wide Web
Conference, 1999.
When using traditional search engines, users
have to formulate queries to describe their
information need. This paper discusses a different approach to Web searching
where the input to the search process is not a
set of query terms, but instead is the URL of a page, and the output is
a set of related Web pages. A related Web page
is one that addresses the same topic as the original page. For example,
www.washingtonpost.com is a page related to
www.nytimes.com, since both are online newspapers.
We describe two algorithms to identify related
Web pages. These algorithms use only the
connectivity information in the Web (i.e., the links between pages) and
not the content of pages or usage information.
We have implemented both algorithms and measured their
runtime performance. To evaluate the effectiveness of our algorithms,
we performed a user study comparing our
algorithms with Netscape's `What's Related'
service (http://home.netscape.com/escapes/related/). Our study
showed that the precision at 10 for our two
algorithms are 73% better and 51% better than that of Netscape, despite the fact that Netscape uses both
content and usage pattern information
in addition to connectivity information.
|
|
|
Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and
Richard Harshman.
Indexing by latent semantic analysis.
Journal of the American Society for Information Science, 41,
June 1990.
|
|
|
A.J. Demers, K. Petersen, M.J. Spreitzer, D.B. Terry, M.M. Theimer, and B.B.
Welch.
The Bayou architecture: Support for data sharing among mobile users.
In Proceedings IEEE Workshop on Mobile Computing Systems &
Applications, pages 2-7, Santa Cruz, California, December 8-9 1994.
At http://www.parc.xerox.com/bayou/.
|
|
|
Robert Demolombe and Andrew Jones.
A common logical framework to retrieve information and meta
information.
In AAAI Spring Symposium on Information Gathering, 1995.
Format: Compressed PostScript().
|
|
|
Dorothy E. Denning and Peter J. Denning.
Data security.
ACM Computing Surveys, 11(3):227-249, September 1979.
This paper discusses four kinds of security controls: access
control, flow control, inference control, and
data encryption. It describes the general nature of
controls of each type, the kinds of problems they
can and cannot solve, and their inherent limitations
and weakness.
|
|
|
Jack B. Dennis and Earl C. Van Horn.
Programming semantics for multiprogrammed computations.
In Communications of the ACM, 1966.
|
|
|
Mark Derthick.
Interfaces for palmtop image search.
In Proceedings of the Second ACM/IEEE-CS Joint Conference on
Digital Libraries, 2002.
We expect that people will want to search for video news or
entertainment on mobile platforms as soon as the technology
is ready. An Ipaq palmtop version of the Informedia Digital
Video Library interface has already been developed at the
Chinese University of Hong Kong. Separately, we used the
Desktop Informedia interface for the interactive part of the
Trec10 video track competition. The lesson we learned is
that automated image search is so poor that the best
interactive results come from showing the user many images
quickly, and allowing flexible drill down to images from
nearby shots. Here we report on an effort to apply this
lesson to palmtop platforms, where showing a large grid of
images in parallel is not feasible. Perceptual psychology
experiments suggest that time-multiplexing may be as
effective as space-multiplexing for this kind of primed
recognition task. In fact, it has been specifically
suggested that image retrieval interfaces using Rapid Serial
Visual Presentation (RSVP) may perform significantly better
than parallel presentation even on a desktop computer [2].
In our experiments, we did not find this to be true. An
important difference between previous experiments and our
own, we discovered, is that image search engines rank
retrievals, and correct answers are more likely to occur
early in the list of results. Thus we found that scrolling
(and low RSVP presentation rates) led to better recognition
of answers that occur early, but worse for answers that occur
far down the list.
This split confounded the global effects that we had
hypothesized, yet in itself is an important consideration for
future interface designs, which must adapt as search
technology improves.
|
|
|
J. P. Deschrevel.
The ANSA model for trading and federation.
Technical Report APM.1005.01, APM, Cambridge, 1989.
|
|
|
Hrishikesh Deshpande, Mayank Bawa, and Hector Garcia-Molina.
Streaming live media over a peer-to-peer network.
Technical Report 2001-31, Stanford University, 2001.
The high bandwidth required by live streaming video greatly
limits the number of clients that can be served by a source. In this work,
we discuss and evaluate an architecture, called SpreadIt, for
streaming live media over a network of clients, using the resources of
the clients themselves. Using SpreadIt, we can distribute bandwidth
requirements over the network. The key challenge is to allow an application
level multicast tree to be easily maintained over a network of transient
peers, while ensuring that quality of service does not degrade. We propose
a basic peering infrastructure layer for streaming applications, which
uses a redirect primitive to meet the challenge successfully. Through
empirical and simulation studies, we show that SpreadIt provides a
good quality of service, which degrades gracefully with increasing
number of clients. Perhaps more significantly, existing applications
can be made to work with SpreadIt, without any change to their code base.
|
|
|
Hrishikesh Deshpande, Mayank Bawa, and Hector Garcia-Molina.
Streaming live media over a peer-to-peer network.
Technical Report 2001-30, Stanford University, 2001.
The high bandwidth required by live streaming video greatly limits the
number of clients that can be served by a source. In this work, we
discuss and evaluate an architecture, called SpreadIt, for streaming
live media over a network of clients, using the resources of the clients
themselves. Using SpreadIt, we can distribute bandwidth requirements over
the network. The key challenge is to allow an application level multicast
tree to be easily maintained over a network of transient peers, while
ensuring that quality of service does not degrade. We propose a basic
peering infrastructure layer for streaming applications, which uses a
redirect primitive to meet the challenge successfully. Through empirical
and simulation studies, we show that SpreadIt provides a good quality of
service, which degrades gracefully with increasing number of clients.
Perhaps more significantly, existing applications can be made to work
with SpreadIt, without any change to their code base.
|
|
|
Hrishikesh Deshpande, Mayank Bawa, and Hector Garcia-Molina.
Streaming live media over peers.
Technical Report 2002-21, Stanford University, 2002.
The high bandwidth required by live streaming video greatly
limits the number of clients that can be served by a source using unicast.
An efficient solution is IP-multicast, but it suffers from poor deployment.
Application-level multicast is being increasingly recognized as a viable
alternative. In this work, we discuss and evaluate a tree-based overlay network
called PeerCast that uses clients to forward the stream to their peers.
PeerCast is designed as a live-media streaming solution for peer-to-peer systems
that are populated by hundreds of autonomous, short-lived nodes. Further, we
argue for the need to take end-host behavior into account while evaluating an
application-level multicast architecture. An end-host behavior model is
proposed that allows us to capture a range of realistic peer behavior. Using
this model, we develop robust, yet simple, tree-maintenance policies. Through
empirical runs and extensive simulations, we show that PeerCast provides good
QoS, which gracefully degrades with the number of clients. We have implemented
a PeerCast prototype, which is available for download.
|
|
|
Alin Deutsch, Mary Fernandez, Daniela Florescu, Alon Levy, and Dan Suciu.
A query language for XML.
In Proceedings of the Eighth International World-Wide Web
Conference, 1999.
An important application of XML is the interchange of
electronic data (EDI) between multiple data sources on
the Web. As XML data proliferates on the Web,
applications will need to integrate and aggregate data
from multiple sources and to clean and transform data to
facilitate exchange. Data extraction, conversion,
transformation, and integration are all well-understood
database problems, and their solutions rely on a query
language. We present a query language for XML, called
XML-QL, which we argue is suitable for performing the
above tasks. XML-QL is a declarative, `relational
complete' query language and is simple enough that it
can be optimized. XML-QL can extract data from existing
XML documents and construct new XML documents.
|
|
|
Ann S. Devlin.
Mind and maze: spatial cognition and environmental behavior.
Praeger, 2001.
|
|
|
Prasun Dewan, Kevin Jeffay, John Smith, David Stotts, and William Oliver.
Early prototypes of the repository for patterned injury data.
In Proceedings of the Second Annual Conference on the Theory and
Practice of Digital Libraries, 1995.
Format: HTML Document (34K + pictures).
Audience: Medical forensics, computer scientists.
References: 15.
Links: 1.
Relevance: Low-Medium.
Abstract: Describes a system for collaboration among coroners. Focuses
on issues of access rights: people in different roles (lead examiner,
toxicologist, judge) see different views of the same data (some fields are
read-protected). Initial prototype was under the ABC system of UNC, but new
prototypes will be web-based. Also hope to incorporate tele-conferencing
capabilities.
|
|
|
Anind K. Dey.
Understanding and using context.
Personal Ubiquitous Comput., 5(1):4-7, 2001.
|
|
|
Anind K. Dey and Gregory D. Abowd.
Towards a better understanding of context and context-awareness.
In Workshop on The What, Who, Where, When, and How of
Context-Awareness, as part of the 2000 Conference on Human Factors in
Computing Systems (CHI 2000), April 2000.
|
|
|
Anne R. Diekema and Jiangping Chen.
Experimenting with the automatic assignment of educational standards
to digital library content.
In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2005.
This paper describes exploratory research concerning the automatic assignment of educational standards to lesson plans. An information retrieval based solution was proposed, and the results of several experiments are discussed. Results suggest the optimal solution would be a recommender tool where catalogers receive suggestions from the system but humans make the final decision.
|
|
|
DigiCash.
Digicash: Solutions for security and privacy.
DigiCash website: http://www.digicash.com/.
|
|
|
M. Diligenti, F. M. Coetzee, S. Lawrence, C. L. Giles, and M. Gori.
Focused crawling using context graphs.
In Proceedings of the Twenty-sixth International Conference on
Very Large Databases, 2000.
|
|
|
Michelangelo Diligenti, Frans Coetzee, Steve Lawrence, C. Lee Giles, and Marco
Gori.
Focused crawling using context graphs.
In Proceedings of the Twenty-sixth International Conference on
Very Large Databases, pages 527-534, September 2000.
|
|
|
Junyan Ding, Luis Gravano, and Narayanan Shivakumar.
Computing geographical scopes of web resources.
In Proceedings of the Twenty-sixth International Conference on
Very Large Databases, pages 545-556. Morgan Kaufmann Publishers Inc., 2000.
|
|
|
Wei Ding, Gary Marchionini, and Dagobert Soergel.
Multimodal surrogates for video browsing.
In Proceedings of the Fourth ACM International Conference on
Digital Libraries, 1999.
Three types of video surrogates - visual (keyframes),
verbal (keywords/phrases), and visual
and verbal - were designed and studied in a
qualitative investigation of user cognitive processes.
The results favor the combined surrogates in which
verbal information and images
reinforce each other, lead to better comprehension,
and may actually require less processing
time. The results also highlight image features
users found most helpful. These findings will
inform the interface design and video representation
for video retrieval and browsing.
|
|
|
D. Koller and Y. Shoham.
Information agents: A new challenge for AI.
IEEE Expert, pages 8-10, June 1996.
|
|
|
R. Dolin, D. Agrawal, and A. El Abbadi.
Scalable collection summarization and selection.
In Proceedings of the Fourth ACM International Conference on
Digital Libraries, 1999.
Information retrieval over the Internet increasingly requires
the filtering of thousands of information sources.
As the number and variety of sources increases, new
ways of automatically summarizing, discovering, and
selecting sources relevant to a user's query are needed.
Pharos is a highly scalable distributed architecture for
locating heterogeneous information sources. Its design
is hierarchical, thus allowing it to scale well as the number
of information sources increases. We demonstrate
the feasibility of the Pharos architecture using 2500
Usenet newsgroups as separate collections. Each newsgroup
is summarized via automated Library of Congress
classification. We show that using Pharos as an intermediate
retrieval mechanism provides acceptable accuracy
of source selection compared to selecting sources
using complete classification information, while maintaining
good scalability. This implies that hierarchical
distributed metadata and automated classification
are potentially useful paradigms to address scalability
problems in large-scale distributed information retrieval
applications.
|
|
|
Document Object Model Level 1 specification.
http://www.w3.org/TR/REC-DOM-Level-1/.
|
|
|
Peter Domel.
Webmap: a graphical hypertext navigation tool.
In Proceedings of the Second International World-Wide Web
Conference, 1994.
|
|
|
Andy Dong and Alice M. Agogino.
Design principles for the information architecture of a SMET
education digital library.
In Proceedings of the First ACM/IEEE-CS Joint Conference on
Digital Libraries, 2001.
This implementation paper introduces principles for the
information architecture of an educational digital library, principles
that address the distinction between designing digital libraries for
education and designing digital libraries for information retrieval in general.
Design is a key element of any successful product. Good designers and their
designs put technology into the hands of the user, making the product's focus
comprehensible and tangible through design. As straightforward as this may
appear, the design of learning technologies is often masked by the enabling
technology.
In fact, they often lack an explicitly stated instructional design methodology.
While the technologies are important hurdles to overcome, we advocate learning
systems that empower education-driven experiences rather than technology-driven
experiences. This work describes a concept for a digital library for science,
mathematics, engineering and technology education (SMETE), a library with an
information architecture designed to meet learners' and educators' needs.
Utilizing a constructivist model of learning, the authors present practical
approaches to implementing the information architecture and its technology
underpinnings. The authors propose the specifications for the information
architecture and a visual design of a digital library for communicating learning
to the audience. The design methodology indicates that a scenario-driven design
technique sensitive to the contextual nature of learning offers a useful
framework for tailoring technologies that help empower, not hinder, the
educational sector.
|
|
|
Jim Dorward, Derek Reinke, and Mimi Recker.
An evaluation model for a digital library services tool.
In Proceedings of the Second ACM/IEEE-CS Joint Conference on
Digital Libraries, 2002.
This paper describes an evaluation model for a digital library tool,
the Instructional Architect, which enables users to discover, select, reuse,
sequence, and annotate digital library learning objects. By documenting our
rapid-prototyping, iterative, and user-centered approach for evaluating a
digital library service, we provide a model and set of methods that other
developers may wish to employ. In addition, we provide preliminary results
from our studies.
|
|
|
Fred Douglis, Thomas Ball, Yih-Farn Chen, and Eleftherios Koutsofios.
Webguide: Querying and navigating changes in web repositories.
In Proceedings of the Fifth International World-Wide Web
Conference, May 1996.
|
|
|
Fred Douglis, Thomas Ball, Yih-Farn Chen, and Eleftherios Koutsofios.
The AT&T Internet Difference Engine: Tracking and viewing changes on
the web.
World Wide Web, 1(1):27-44, January 1998.
|
|
|
Fred Douglis, Anja Feldmann, and Balachander Krishnamurthy.
Rate of change and other metrics: a live study of the world wide web.
In USENIX Symposium on Internetworking Technologies and
Systems, 1999.
|
|
|
Fred Douglis, Antonio Haro, and Michael Rabinovich.
HPP: HTML macro-preprocessing to support dynamic document caching.
In Proceedings of the USENIX Symposium on Internet Technologies
and Systems, Monterey, California, 1997.
|
|
|
Michael Droettboom.
Correcting broken characters in the recognition of historical printed
documents [short paper].
In Proceedings of the Third ACM/IEEE-CS Joint Conference on
Digital Libraries, 2003.
This paper presents a new technique for dealing with broken characters,
one of the major challenges in the optical character recognition (OCR) of degraded
historical printed documents. A technique based on graph combinatorics is used to rejoin
the appropriate connected components. It has been applied to real data with successful
results.
|
|
|
Michael Droettboom, Karl MacMillan, Ichiro Fujinaga, G. Sayeed Choudhury, Tim
DiLauro, Mark Patton, and Teal Anderson.
Using the Gamera framework for the recognition of cultural heritage
materials.
In Proceedings of the Second ACM/IEEE-CS Joint Conference on
Digital Libraries, 2002.
This paper presents a new toolkit for the creation of customized
structured document recognition applications by domain experts. This
open-source system, called Gamera, allows a user, with particular knowledge of
the documents to be recognized, to combine image processing and recognition
tools in an easy to use, interactive, graphical scripting environment. Gamera
is one of the key technology components in a proposed international project
for the digitization of diverse types of humanities documents.
|
|
|
Steven M. Drucker, Curtis Wong, Asta Roseway, Steven Glenner, and Steven De
Mar.
Mediabrowser: reclaiming the shoebox.
In AVI '04: Proceedings of the working conference on Advanced
visual interfaces, pages 433-436, New York, NY, USA, 2004. ACM Press.
|
|
|
Allison Druin, Benjamin B. Bederson, Juan Pablo Hourcade, Lisa Sherman, Glenda
Revelle, Michele Platner, and Stacy Weng.
Designing a digital library for young children: An intergenerational
partnership.
In Proceedings of the First ACM/IEEE-CS Joint Conference on
Digital Libraries, 2001.
As more information resources become accessible using computers,
our digital interfaces to those resources need to be appropriate for all
people. However, when it comes to digital libraries, the interfaces have
typically been designed for older children or adults. Therefore, we have begun
to develop a digital library interface developmentally appropriate for young
children (ages 5-10 years old). Our prototype system, which we now call
SearchKids, offers a graphical interface for querying, browsing and reviewing
search results. This paper describes our motivation for the research, the
design partnership we established between children and adults, our design
process, the technology outcomes of our current work, and the lessons we have
learned.
|
|
|
D. Dubois and H. Prade.
Fuzzy Sets and Systems: Theory and Applications.
Academic Press, New York, 1980.
|
|
|
Monica Duke, Michael Day, Rachel Heery, Leslie A. Carr, and Simon J. Coles.
Enhancing access to research data: the challenge of crystallography.
In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2005.
This paper describes an ongoing collaborative effort across digital library and scientific communities in the UK to improve access to research data. A prototype demonstrator service supporting the discovery and retrieval of detailed results of crystallography experiments has been deployed within an Open Archives digital library service model. Early challenges include the understanding of requirements in this specialized area of chemistry and reaching consensus on the design of a metadata model and schema. Future plans encompass the exploration of commonality and overlap with other schemas and across disciplines, working with publishers to develop mutually beneficial service models, and investigation of the pedagogical benefits. The potential improved access to experimental data to enrich scholarly communication from the perspective of both research and learning provides the driving force to continue exploring these issues.
|
|
|
Susan Dumais, Edward Cutrell, JJ Cadiz, Gavin Jancke, Raman Sarin, and
Daniel C. Robbins.
Stuff I've seen: a system for personal information retrieval and
re-use.
In Proceedings of the Twenty-Sixth Annual International ACM
SIGIR Conference on Research and Development in Information Retrieval, pages
72-79. ACM Press, 2003.
|
|
|
Susan T. Dumais, George W. Furnas, Thomas K. Landauer, Scott Deerwester, and
Richard Harshman.
Using latent semantic analysis to improve access to textual
information.
In Proceedings of the Conference on Human Factors in Computing
Systems CHI'88, 1988.
A main citation for LSI. It explains roughly how it works.
|
|
|
M. H. Dunham and A. Helal.
A mobile transaction model that captures both the data and movement
behavior.
Mobile Networks and Applications, 2(2):149-62, 1997.
Unlike distributed transactions, mobile transactions do not
originate and end at the same site. The implication of the
movement of such transactions is that classical atomicity,
concurrency and recovery solutions must be revisited to
capture the movement behavior. As an effort in this
direction, we define a model of mobile transactions by
building on the concepts of split transactions and global
transactions in a multidatabase environment. Our view of
mobile transactions, called kangaroo transactions,
incorporates the property that transactions in a mobile
computing system hop from one base station to another as the
mobile unit moves through cells. Our model is the first to
capture this movement behavior as well as the data behavior
which reflects the access to data located in databases
throughout the static network. The mobile behavior is dynamic
and is realized in our model via the use of split operations.
The data access behavior is captured by using the idea of
global and local transactions in a multidatabase system.
|
|
|
Elke Duncker.
Cross-cultural usability of the library metaphor.
In Proceedings of the Second ACM/IEEE-CS Joint Conference on
Digital Libraries, 2002.
Computing metaphors have become an intricate part of information
systems design. Yet, they are deeply rooted in cultural practices. This paper
presents an investigation of the cross-cultural use and usability of the
library metaphor in digital libraries. The study examines the relevant
features of the
Maori culture in New Zealand, their form of knowledge transfer and their use of
real world and digital libraries. On this basis the paper points out why and
when the library metaphor fails Maori and other indigenous users and how
this knowledge can contribute to the improvement of future designs.
|
|
|
Hayley Dunlop, Matt Jones, and Sally Jo Cunningham.
A digital library of conversational expressions: A communication aid
for people with profound physical disabilities.
In Proceedings of the Second ACM/IEEE-CS Joint Conference on
Digital Libraries, 2002.
This paper describes the development of a communication aid for
people with profound physical disabilities, people who cannot communicate
verbally, and who cannot use conventional communication tools. The Greenstone
digital library software has been used to construct a digital library of
common conversational expressions. A case study approach was adopted, and
the target user for this particular digital library was a local high school
student. Tailoring the digital library's contents to this user entailed
identifying physical accessibility considerations for her, developing a suitable
mode of interaction with the digital library software, populating the digital
library with appropriate expressions for her, and evaluating the digital
library interface. Evaluation involved both a qualitative user evaluation
session and a quantitative analysis of the time and effort required to use
each of three proposed searching interfaces.
|
|
|
Jon W. Dunn and Constance A. Mayer.
Variations: A digital music library system at Indiana University.
In Proceedings of the Fourth ACM International Conference on
Digital Libraries, 1999.
The field of music provides an interesting context for the
development of digital library systems due to the variety of
information formats used by music students and scholars. The
VARIATIONS digital library project at Indiana University
currently delivers online access to sound recordings from the
collections of IU's William and Gayle Cook Music Library and is
developing access to musical score images and other formats.
This paper covers the motivations for the creation of
VARIATIONS, an overview of its operation and implementation,
user reactions to the system, and future plans for development.
|
|
|
Oliver M. Duschka.
Query Planning and Optimization in Information Integration.
PhD thesis, Stanford University, December 1997.
|
|
|
Naomi Dushay.
Localizing experience of digital content via structural metadata.
In Proceedings of the Second ACM/IEEE-CS Joint Conference on
Digital Libraries, 2002.
With the increasing technical sophistication of both information
consumers and providers, there is increasing demand for more meaningful
experiences of digital information. We present a framework that separates
digital object experience, or rendering, from digital object storage and
manipulation, so the rendering can be tailored to particular communities of
users. Our framework also accommodates extensible digital object behaviors
and interoperability. The two key components of our approach are 1) exposing
structural metadata associated with digital objects - metadata about labeled
access points within a digital object and 2) information intermediaries called
context brokers that match structural characteristics of digital objects with
mechanisms that produce behaviors. These context brokers allow for localized
rendering of digital information stored externally.
|
|
|
Naomi Dushay, James C. French, and Carl Lagoze.
Using query mediators for distributed searching in federated digital
libraries.
In Proceedings of the Fourth ACM International Conference on
Digital Libraries, 1999.
Resource discovery in a distributed digital library poses many
challenges, one of which is how to choose search engines for
query distribution. In this paper, we describe a federated,
distributed digital library architecture and introduce the notion of
a query mediator as a digital library service responsible for
selecting among available search engines, routing queries to those
search engines, and aggregating results. We examine operational
data from the NCSTRL digital library, focusing on two
characteristics of distributed resource discovery: availability
(will a search engine respond within a time limit) and response time
(how quickly will a search engine respond, given that it does
respond) and distinguishing between the query mediator view of
these characteristics and the indexer view. We also examine the
accuracy of predictions we made of QM-view availability and
response times of search engines.
|
|
|
P. Duygulu, Kobus Barnard, J. F. G. de Freitas, and David A. Forsyth.
Object recognition as machine translation: Learning a lexicon for a
fixed image vocabulary.
In Proceedings of the 7th European Conference on Computer Vision
(ECCV '02), pages 97-112. Springer-Verlag, 2002.
|
|
|
Lena Veiga e Silva, Alberto H. F. Laender, and Marcos Andre Goncalves.
A usability evaluation study of a digital library self-archiving
service.
In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2005.
In this paper, we describe an evaluation study of a self-archiving service for the Brazilian Digital Library of Computing (BDBComp). We conducted an extensive usability experiment with several potential users, including graduate students, professors, and archivists/librarians. The results of the study are described and analyzed, following sound statistical principles.
|
|
|
D. Eastlake.
Universal payment preamble specification.
W3C website: http://www.w3.org/ECommerce/specs/upp.txt.
|
|
|
Joseph L. Ebersole.
Response to Dr. Linn's paper.
In IP Workshop Proceedings, 1994.
Format: HTML Document (15K).
Audience: Readers of Dr. Linn's article, lawyers.
References: 5.
Links: 0.
Relevance: Low-medium.
Abstract: Discusses the differences between a common carrier,
distributor, and publisher. Also discusses trade secrets, fair use.
|
|
|
Judith Edwards.
The electronic world and Central Queensland University.
In Proceedings of the First Annual Conference on the Theory and
Practice of Digital Libraries, 1994.
Format: HTML Document (6K).
Audience: DL '94 officials & attendees.
References: 0.
Links: 1.
Relevance: Low.
Abstract: Queensland U's interest in attending DL '94. Some statistics
on current and expected use of networked information servers.
|
|
|
Miles Efron, Jonathan Elsas, Gary Marchionini, and Junliang Zhang.
Machine learning for information architecture in a large governmental
website.
In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2004.
This paper describes ongoing research into the
application of machine learning techniques for improving access to
governmental information in complex digital libraries. Under the
auspices of the GovStat Project (http://www.ils.unc.edu/govstat), our
goal is to identify a small number of semantically valid concepts that
adequately span the intellectual domain of a collection. The goal of
this discovery is twofold. First, we desire a principled aid to
information architects. Second, automatically derived document-concept
relationships are a necessary precondition for real-world deployment of
many dynamic interfaces. The current study compares concept learning
strategies based on three document representations: keywords, titles,
and full-text. In statistical and user-based studies, human-created
keywords provide significant improvements in concept learning over both
title-only and full-text representations.
|
|
|
Miles Efron and Donald Sizemore.
Link attachment (preferential and otherwise) in contributor-run
digital libraries [short paper].
In Proceedings of the Third ACM/IEEE-CS Joint Conference on
Digital Libraries, 2003.
Ibiblio is a digital library whose materials are submitted and maintained by
volunteer contributors. This study analyzes the emergence of hyperlinked structures within the
ibiblio collection. In the context of ibiblio, we analyze the suitability of Barabasi's model
of preferential attachment to describe the distribution of incoming links. We find that the
degree of maintainer activity for a given site (as measured by the voluntary development of
descriptive metadata) is a stronger link count predictor for ibiblio than is a site's age,
as the standard model predicts. Thus we argue that the efforts of ibiblio's contributors
positively affect the popularity of their materials.
|
|
|
E. G. Coffman, Jr., Zhen Liu, and Richard R. Weber.
Optimal robot scheduling for web search engines.
Technical report, INRIA, 1997.
|
|
|
Dennis E. Egan, Joel R. Remde, and Thomas K. Landauer.
Behavioral evaluation and analysis of a hypertext browser.
In Proceedings of the Conference on Human Factors in Computing
Systems CHI'89, 1989.
|
|
|
L. Egghe and R. Rousseau.
Introduction to Informetrics.
Elsevier, 1990.
|
|
|
Kate Ehrlich and Debra Cash.
Turning information into knowledge: Information finding as a
collaborative activity.
In Proceedings of the First Annual Conference on the Theory and
Practice of Digital Libraries, 1994.
Format: HTML Document (30K).
Audience: Non-technical, social science, work flow.
References: 16.
Links: 1.
Relevance: Low.
Abstract: Case study of customer service organization that uses Lotus
Notes. Discusses importance of face-to-face, informal communication, human
information mediators.
|
|
|
Thomas Ellman.
Approximation and abstraction techniques for generating concise
answers to database queries.
In AAAI Spring Symposium on Information Gathering, 1995.
Format: Compressed PostScript.
|
|
|
Ahmed K. Elmagarmid.
Database Transaction Models for Advanced Applications.
Morgan Kaufmann, San Mateo, CA, 1992.
|
|
|
Sara Elo.
Augmenting text: Good news on disasters.
In DAGS '95, 1995.
Format: HTML Document (30K + picture).
Audience: General.
References: 12.
Links: 3.
Relevance: Medium.
Abstract: News wire stories on disasters are annotated with facts that
relate to the reader's local region (e.g., casualties are cast as a multiple of
the hometown population). Readers from different locales see different
augmentations. Frames triggered by disaster keywords are filled in with
relevant material, which is then personalized.
|
|
|
T. Todd Elvins, David R. Nadeau, Rina Schul, and David Kirsh.
Worldlets: 3D thumbnails for 3D browsing.
In Proceedings of the Conference on Human Factors in Computing
Systems CHI'98, 1998.
|
|
|
D.W. Embley.
NFQL: The natural forms query language.
ACM Transactions on Database Systems, 14(2):168-211, June
1989.
They go beyond retrieval to include updates and other operations.
|
|
|
Robert Engelmore and Tony Morgan.
Blackboard Systems.
Addison-Wesley, 1988.
A collection of papers that introduce blackboard
systems, that provide a historical perspective
of blackboard systems, that evaluate the
contributions made by different systems, and that
illustrate by example the range of blackboard
applications and implementations.
|
|
|
John S. Erickson.
A copyright management system for networked interactive multimedia.
In DAGS '95, 1995.
Format: HTML Document (13K + pictures).
Audience: Multimedia developers, computer scientists.
References: 8.
Links: 1.
Relevance: Medium-Low.
Abstract: Describes a rights management system for multimedia objects
called LicensIt. Wrapper around object includes information about author,
rights required, and digital signature to verify authenticity. Viewing objects
is through special LicensIt viewers or through commercial applications with
LicensIt plug-ins.
|
|
|
D. Faensen, L. Faulstich, H. Schweppe, A. Hinze, and A. Steidinger.
Hermes - a notification service for digital libraries.
In Proceedings of the First ACM/IEEE-CS Joint Conference on
Digital Libraries, 2001.
The high publication rate of scholarly material makes searching
and browsing an inconveninet way to keep oneself up-to-date.
Instead of being the active part in information access, researchers want
to be notified whenever a new paper in one's research area is published.
While more and more publishing houses or portal sites offer notification
services this approach has several disadvantages. We introduce the Hermes
alerting service, a service that integrates a variety of different
information providers making their heterogeneity transparent for the users.
Hermes offers sophisticated filtering capabilities preventing the user
from drowning in a flood of irrelevant information. From the user's point of
view it integrates the providers into a single source. Its simple provider
interface makes it easy for publishers to join the service and thus reach
potential readers directly. This paper presents the architecture of the
Hermes service and discusses the issues of heterogeneity of information sources.
Furthermore, we discuss the benefits and disadvantages of message-oriented
middleware for implementing such a service for digital libraries.
|
|
|
C. Faloutsos and S. Christodoulakis.
Signature files: An access method for documents and its analytical
performance evaluation.
ACM Transactions on Office Information Systems, 2(4):267-288,
October 1984.
|
|
|
C. Faloutsos and D. Oard.
A survey of information retrieval and filtering methods.
Technical report, Dept. of Computer Science, University of Maryland,
1995.
|
|
|
Christos Faloutsos.
Access methods for text.
ACM Computing Surveys, 17(1):49-74, March 1985.
|
|
|
Jianping Fan, Hangzai Luo, and Lide Wu.
Semantic video classification and feature subset selection under
context and concept uncertainty.
In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2004.
As large-scale collection of medical videos comes
into view, there is an urgent need to develop semantic medical video
classification techniques and enable video retrieval at the semantic
level. However, most existing batch-based classifier training
techniques still suffer from context and concept uncertainty problems
when only a limited number of labeled training samples are available.
To address the context and concept uncertainty problems, we have
proposed a novel framework by integrating large-scale unlabeled samples
with a limited number of labeled samples to enable more effective
feature subset selection, parameter estimation and model selection.
Specifically, this framework includes: (a) A novel multimodal context
integration and semantic video concept interpretation framework; (b) A
novel classifier training technique by integrating feature subset
selection, parameter estimation and model selection seamlessly in a
single algorithm to address the context uncertainty problem over time;
(c) A cost-sensitive semantic video classification framework to address
the concept uncertainty problem. Our experimental results in a certain
medical education video domain also provide convincing
proof of our conclusions.
|
|
|
Adam Farquhar, Angela Dappert, Richard Fikes, and Wanda Pratt.
Integrating information sources using context logic.
In AAAI Spring Symposium on Information Gathering, 1995.
Format: Compressed PostScript().
|
|
|
S. Feiner.
Seeing the forest for the trees: Hierarchical display of hypertext
structure.
In Conference on Office Information Systems, New York:ACM,
pages 205-212, 1988.
|
|
|
Jean-Daniel Fekete and Nicole Dufournaud.
Compus: Visualization and analysis of structured documents for
understanding social life in the 16th century.
In Proceedings of the Fifth ACM International Conference on
Digital Libraries, 2000.
This article describes the Compus visualization system that
assists in the exploration and analysis of structured document corpora
encoded in XML. Compus has been developed for and applied to a corpus of
100 French manuscript letters of the 16th century, transcribed and encoded
for scholarly analysis using the recommendations of the Text Encoding
Initiative. By providing a synoptic visualization of a corpus and
allowing for dynamic queries and structural transformations, Compus assists
researchers in finding regularities or discrepancies, leading to a higher
level analysis of historic sources. Compus can be used with other
richly encoded text corpora as well.
|
|
|
An Feng and Toshiro Wakayama.
SIMON: a grammar-based transformation system for structured
documents.
Electronic Publishing: Origination, Dissemination and Design,
6(4):361-372, December 1993.
|
|
|
Michelle Ferebee, Greg Boeshaar, Kathryn Bush, and Judy Hertz.
A scientific digital library in context: An earth radiation budget
experiment collection in the atmospheric sciences data center digital library
[short paper].
In Proceedings of the Third ACM/IEEE-CS Joint Conference on
Digital Libraries, 2003.
At the NASA Langley Research Center, the Earth Radiation Budget
Experiment (ERBE) Data Management Team and the Atmospheric Sciences Data Center are
developing a digital collection for the ERBE project. The main goal is long-term
preservation of a comprehensive information environment. The secondary goal is to provide
a context for these data products by centralizing the 25-year research project's scattered
information elements. The development approach incorporates elements of rapid
prototyping and user-centered design in a standards-based implementation. A working
prototype is in testing with a small number of users.
|
|
|
E. Fernandez, R. Summers, and C. Wood.
Database Security and Integrity.
Addison-Wesley, 1981.
|
|
|
Mary F. Fernandez, Daniela Florescu, Jaewoo Kang, Alon Y. Levy, and Dan Suciu.
STRUDEL: A web-site management system.
In Proceedings of the International Conference on Management of
Data, pages 549-552, 1997.
|
|
|
Richard Fikes, Robert Engelmore, Adam Farquhar, and Wanda Pratt.
Network-based information brokers.
In AAAI Spring Symposium on Information Gathering, 1995.
Format: Compressed PostScript().
|
|
|
Laura Fillmore.
How we must think.
In JEP.
Format: HTML Version (25K).
Audience: Publishers.
References: 0.
Links: 0.
Relevance: Low-Medium.
Abstract: The president of the Online Bookstore gives her suggestions
for other publishers to succeed in the digital age. Think creatively about
things that were not possible in paper. Add value by licensing content, then
giving people a framework through which to think about it. Quote from Gregory
Rawlins, a computer science professor at Indiana University: "If you're not
part of the steamroller, you're part of the road."
|
|
|
Laura Fillmore.
Internet publishing in a borderless environment: Bookworms into
butterflies.
In JEP, 1994.
Format: HTML Document (18K).
Audience: Publishers.
References: 0.
Links: 0.
Relevance: Low.
Abstract: Electronic publishing will need to account for the distributed
nature of the Internet. Roles for a publisher include: imprimatur of quality
and content filter; verifying authenticity of the files; creating context
around core content; developing and maintaining an equitable royalty system based
on number of accesses; customizing the content for the readers.
|
|
|
Laura Fillmore.
Online publishing: Threat or menace?
In JEP, 1994.
Format: HTML Document (30K).
Audience: General public, publishers.
References: 0.
Links: 0.
Relevance: Low.
Abstract: One person's view on the future of publishing and books. Time
to press has decreased; non-linear thinking is encouraged; people use on-line
resources differently than traditional books; piracy is not likely to be a
big problem; publishers still needed to publicize.
|
|
|
Janet Fisher.
Copyright: The glue of the system.
In JEP, 1994.
Format: HTML Document (15K).
Audience: Publishers, scholars, authors.
References: 0.
Links: 0.
Relevance: Low.
Abstract: An MIT Press director gives her position on copyrights:
Journal publishers are essential because they 1) take
care of requests for reprints, etc., and 2) provide the
filter which full-text on-line services need to determine
quality. Individual authors or their institutions could
not do these economically. Does suggest some changes to
current law, like allowing authors to copy for their own
classes without fee.
|
|
|
G. W. Fitzmaurice.
Situated information spaces and spatially aware palmtop computers.
Communications of the ACM, 36(7):38-49, Jul 1993.
Explores and uncovers a wide range of issues surrounding
computer-augmented environments. The Chameleon prototype and
a set of computer-augmented applications are described.
Chameleon is a prototype system under development at the
University of Toronto. It is part of an investigation on how
palmtop computers designed with a high-fidelity monitor can
become spatially aware of their location and orientation and
serve as bridges or portholes between computer-synthesized
information spaces and physical objects. In this prototype
design, a 3D input controller and an output display are
combined into one integrated unit.
|
|
|
George W. Fitzmaurice, Shumin Zhai, and Mark H. Chignell.
Virtual reality for palmtop computers.
ACM Trans. Inf. Syst., 11(3):197-218, 1993.
We are exploring how virtual reality theories can be applied
toward palmtop computers. In our prototype, called the
Chameleon, a small 4-inch hand-held monitor acts as a palmtop
computer with the capabilities of a Silicon Graphics
workstation. A 6D input device and a response button are
attached to the small monitor to detect user gestures and
input selections for issuing commands. An experiment was
conducted to evaluate our design and to see how well depth
could be perceived in the small screen compared to a large
21-inch screen, and the extent to which movement of the small
display (in a palmtop virtual reality condition) could improve
depth perception. Results show that with very little training,
perception of depth in the palmtop virtual reality condition
is about as good as corresponding depth perception in a large
(but static) display. Variations to the initial design are
also discussed, along with issues to be explored in future
research. Our research suggests that palmtop virtual reality
may support effective navigation and search and retrieval, in
rich and portable information spaces.
|
|
|
Flickr.com.
http://www.flickr.com.
|
|
|
Daniela Florescu, Daphne Koller, and Alon Levy.
Awareness services for digital libraries.
In Proceedings of the Twenty-third International Conference on
Very Large Databases, 1997.
Deals with prioritizing queries to information sources on the
web, i.e., by first querying the ones more likely to be
relevant.
|
|
|
Daniela Florescu, Alon Y. Levy, and Alberto O. Mendelzon.
Database techniques for the world-wide web: A survey.
SIGMOD Record, 27(3):59-74, 1998.
|
|
|
Kathleen M. Flynn.
The knowledge manager as a digital librarian: An overview of the
knowledge management pilot program at the mitre corporation.
In Proceedings of the Second Annual Conference on the Theory and
Practice of Digital Libraries, 1995.
Format: HTML Document (10K).
Audience: Corporate librarians.
References: 0.
Links: 1.
Relevance: Low.
Abstract: Discusses the role of a Knowledge Manager (formerly
corporate librarian), in particular finding and
organizing new networked information resources.
|
|
|
Peter W. Foltz.
Using latent semantic indexing for information filtering.
In Proceedings of the Conference on Office Information Systems,
1990.
LSI study to show how well it predicts interestingness of
newsgroup articles.
|
|
|
P.W. Foltz and S.T. Dumais.
Personalized information delivery: an analysis of information
filtering methods.
Communications of the ACM, 35(12):51-60, December 1992.
With the increasing availability of information in electronic
form, it becomes more important and feasible to have
automatic methods to filter information. The results of an
experiment aimed at determining the effectiveness of four
information-filtering methods in the domain of technical
reports are presented. The experiment was conducted over a
six-month period with 34 users and over 150 new reports
published each month. Overall, the authors conclude that
filtering methods show promise for presenting personalized
information.
|
|
|
Leonard N. Foner.
Clustering and information sharing in an ecology of cooperating
agents.
In AAAI Spring Symposium on Information Gathering, 1995.
Format: Compressed PostScript().
|
|
|
Consortium for University Printing and Information Distribution (CUPID).
Protocols and services (version 1): An architectural overview.
In IP Workshop Proceedings, 1994.
Format: HTML Document (52K).
Audience: Publishers, layperson, non-technical.
References: 0.
Links: 0.
Relevance: Low.
Abstract: Discussion of the CUPID project, basically just-in-time
printing (e.g., of textbooks) at trusted
printshops. No terminal display, hardcopy only.
|
|
|
Muriel Foulonneau, Timothy W. Cole, Thomas G. Habing, and Sarah L. Shreeves.
Using collection descriptions to enhance an aggregation of harvested
item-level metadata.
In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2005.
As an increasing number of digital library projects embrace the harvesting of item-level descriptive metadata, issues of description granularity and concerns about potential loss of context when harvesting item-level metadata take on greater significance. Collection-level description can provide added context for item-level metadata records harvested from disparate and heterogeneous providers. This paper describes an ongoing experiment using collection-level description in concert with item-level metadata to improve quality of search and discovery across an aggregation of metadata describing resources held by a consortium of large academic research libraries. We present details of approaches implemented so far and preliminary analyses of the potential utility of these approaches. The paper concludes with a brief discussion of related issues and future work plans.
|
|
|
A. Fox, S. D. Gribble, Y. Chawathe, A. S. Polite, A. Huang, B. Ling, and E. A.
Brewer.
Orthogonal extensions to the www user interface using client-side
technologies.
In Proceedings of the ACM Symposium on User Interface Software
and Technology. 10th Annual Symposium. UIST '97, pages 83-4, Oct 1997.
We describe our experience implementing orthogonal extensions
to the existing WWW user interface, to support user control
of intelligent services. Our extensions are orthogonal in
that they provide an interface to a service, which
complements the Web browsing experience but is independent of
the content of any particular site. We base our experiments
on the TranSend service at UC Berkeley, which performs lossy
compression on inline images to accelerate dialup Web access
for a community of 25,000 subscribers. The service keeps a
separate preferences profile for each user, which allows each
user to vary the aggressiveness of lossy compression,
selectively turn off the service for certain pages, and
select the type of interface provided for refinement of
degraded (lossily compressed) content. We are exploring three
technologies for implementing the TranSend service interface:
HTML decoration, Java and JavaScript.
|
|
|
Armando Fox and Eric A. Brewer.
Reducing www latency and bandwidth requirements by real-time
distillation.
Computer Networks and ISDN Systems, 28(7-11):1445-56, May 1996.
cache storage; client-server systems; computer communications
software; data compression; Internet; network servers; real-time systems;
network latency; bandwidth requirements; real-time distillation; Pythia proxy
mechanism; World Wide Web; real-time refinement; statistical models; metered
cellular phone service; transcoding; client-side rendering; data
representation; client display constraints; content optimization; PPP;
Point-to-Point Protocol; image loading; added value.
The Pythia proxy mechanism provides three important orthogonal
benefits to World Wide Web (WWW) clients. (1) Real-time
distillation and refinement, guided by statistical models,
allow the user to bound latency and exercise explicit control
over bandwidth that may be scarce and expensive (e.g. a
metered cellular phone service). (2) Transcoding to a
representation understood directly by the client system may
improve rendering on the client or result in a representation
that can be transmitted more efficiently. (3) Knowledge of
client display constraints allows content to be optimized for
rendering on the client. Users have commented that even the
prototype version of Pythia provides a qualitative speedup
of about 5 times when surfing the World Wide Web over PPP
(Point-to-Point Protocol) with a 14.4 kbit/s modem. These are
the same users that previously turned image loading off
completely in order to make surfing bearable. With the
continued growth of the WWW, the benefits afforded by proxied
services like Pythia will represent increasingly significant
added value to end users and content providers alike. Pythia
is the first fruit of a comprehensive research agenda aimed
at implementing and deploying such services.
|
|
|
Armando Fox, Ian Goldberg, Steven D. Gribble, and David C. Lee.
Experience with top gun wingman: A proxy-based graphical web browser
for the 3com palmpilot.
In Proceedings of Middleware '98, Lake District, England,
September 1998.
|
|
|
Armando Fox, Steven D. Gribble, Eric A. Brewer, and Elan Amir.
Adapting to network and client variability via on-demand dynamic
distillation.
SIGPLAN Notices, 31(9):160-70, Sep 1996.
Also Seventh Intl. Conf. on Arch. Support for Prog. Lang. and Oper.
Sys. (ASPLOS-VII).
The explosive growth of the Internet and the proliferation of
smart cellular phones and handheld wireless devices is
widening an already large gap between Internet clients.
Clients vary in their hardware resources, software
sophistication, and quality of connectivity, yet server
support for client variation ranges from relatively poor to
none at all. In this paper we introduce some design
principles that we believe are fundamental to providing
meaningful Internet access for the entire range of clients.
In particular, we show how to perform on-demand
datatype-specific lossy compression on semantically typed
data, tailoring content to the specific constraints of the
client. We instantiate our design principles in a proxy
architecture that further exploits typed data to enable
application-level management of scarce network resources. Our
proxy architecture generalizes previous work addressing all
three aspects of client variation by applying well-understood
techniques in a novel way, resulting in quantitatively better
end-to-end performance, higher quality display output, and
new capabilities for low-end clients.
|
|
|
W. B. Frakes and R. Baeza-Yates.
Information Retrieval Data Structures & Algorithms.
Prentice Hall, Englewood Cliffs, N.J., 1992.
|
|
|
L. Francis.
Mobile computing - a fact in your future.
In 15th Annual International Conference on Computer
Documentation Conference Proceedings. SIGDOC '97. Crossroads in
Communication, pages 63-7, 1997.
Mobile computing is now at the stage where cell
phones were 5-7 years ago. Laptops are frequently
the choice of telecommuters who put in significant
amounts of time both at home and at the office, but
there is a growing group of mobile users who work
from more than two locations and who expect to
perform their full job responsibilities using a
laptop that rarely returns to the main
office. Although a mobile PC can be used without
ever connecting to a network, it is typically
connected with or without wires. Wired systems are
most common and generally use modems with the
dial-up lines found in homes or hotels. Wireless
connections are increasing in popularity and use
cellphone-like radio links to send and receive
information, but mobile computing is not just about
wireless connections; it is also about using your
laptop in a hotel (in any country), at home, in a
branch office or at a customer site. Using a laptop
in those locations frequently reduces your
interaction speed and range of functions to an
unacceptable level, but recent improvements have
attacked these problems. To really feel the freedom
offered by mobile computing, imagine setting up
overseas in your customer's spare office and working
as if you were in your own office. Imagine being in
a foreign country and not having to load printer
drivers for each printer and load US fonts for each
job in order to produce properly printed output. If
this sort of future appeals to you, you're not
alone. A likely and growing group is people who
already use laptops. In 1996, laptops comprised
30-35 are forecast to be manufactured.
|
|
|
Luis Francisco-Revilla, Frank M. Shipman III, Richard Furuta, Unmil Karadkar,
and Avital Arora.
Perception of content, structure, and presentation changes in
web-based hypertext.
In HYPERTEXT '01: Proceedings of the twelfth ACM conference on
Hypertext and Hypermedia, pages 205-214, New York, NY, USA, 2001. ACM
Press.
The Web provides access to a wide variety of information
but much of this information is fluid; it changes,
moves, and occasionally disappears. Bookmarks, paths
over Web pages, and catalogs like Yahoo! are
examples of page collections that can become
out-of-date as changes are made to their
components. Maintaining these collections requires
that they be updated continuously. Tools to help in
this maintenance require an understanding of what
changes are important, such as when pages no longer
exist, and what changes are not, such as when a
visit counter changes. We performed a study to look
at the effect of the type and quantity of change on
people's perception of its importance. Subjects were
presented pairs of Web pages with changes to either
content (e.g., text), structure (e.g., links), or
presentation (e.g., colors, layout). While changes
in content were the most closely connected to
subjects' perceptions of the overall change to a
page, subjects indicated a strong desire to be
notified of structural changes. Subjects only
considered the simultaneous change of many
presentation characteristics as important.
|
|
|
Luis Francisco-Revilla, Frank Shipman, Richard Furuta, Unmil Karadkar, and
Avital Arora.
Managing change on the web.
In Proceedings of the First ACM/IEEE-CS Joint Conference on
Digital Libraries, 2001.
Increasingly, digital libraries are being defined that
collect pointers to World-Wide Web based resources rather than hold
the resources themselves. Maintaining these collections is challenging
due to distributed document ownership and high fluidity. Typically a
collection's maintainer has to assess the relevance of changes with
little system aid. In this paper, we describe the Walden's Paths Path
Manager, which assists a maintainer in discovering when relevant changes
occur to linked resources. The approach and system design were informed by
a study of how humans perceive changes of Web pages. The study indicated
that structural changes are key in determining the overall change and
that presentation changes are considered irrelevant.
|
|
|
Paolo Frasconi, Giovanni Soda, and Alessandro Vullo.
Text categorization for multi-page documents: A hybrid naive bayes
hmm approach.
In Proceedings of the First ACM/IEEE-CS Joint Conference on
Digital Libraries, 2001.
Text categorization is typically formulated as a concept
learning problem where each instance is a single isolated document.
In this paper we are interested in a more general formulation where
documents are organized as page sequences, as naturally occurring in
digital libraries of scanned books and magazines. We describe a method
for classifying pages of sequential OCR text documents into one of several
assigned categories and suggest that taking into account contextual
information provided by the whole page sequence can significantly improve
classification accuracy. The proposed architecture relies on hidden Markov
models whose emissions are bag-of-words according to a multinomial word event
model, as in the generative portion of the Naive Bayes classifier. Our results
on a collection of scanned journals from the Making of America project
confirm the importance of using whole page sequences. Empirical evaluation
indicates that the error rate (as obtained by running a plain Naive Bayes
classifier on isolated pages) can be roughly reduced by half if contextual
information is incorporated.
|
|
|
Lisa Freeman.
Testimony prepared on behalf of the association of american
university presses for the national information infrastructure task force
working group on intellectual property.
In JEP, 1994.
Format: HTML Document (10K).
Audience: Legislators.
References: 0.
Links: 0.
Relevance: Low.
Abstract: AAUP believes the current copyright law is sufficient for use
in the networked world. The copyright provides a valuable service in
designating the final (peer-reviewed) copy. Reprint fees, contracts, and copy
protection should not be mandated but handled by the copyright holders.
|
|
|
James C. French.
Modeling web data.
In Proceedings of the Second ACM/IEEE-CS Joint Conference on
Digital Libraries, 2002.
We have created three testbeds of web data for use in controlled
experiments in collection modeling. This short paper examines the applicability
of Zipf's and Heaps' laws as applied to web data. We find extremely close
agreement between observed vocabulary growth and Heaps' law. We find reasonable
agreement with Zipf's law for medium to low frequency terms. Zipf's law is a
poor predictor for high frequency terms. These findings hold for all three
testbeds although we restrict ourselves to one here due to space limitations.
|
|
|
James C. French, A. C. Chapin, and Worthy N. Martin.
An application of multiple viewpoints to content-based image
retrieval [short paper].
In Proceedings of the Third ACM/IEEE-CS Joint Conference on
Digital Libraries, 2003.
Content-based image retrieval uses features that can be extracted
from the images themselves. Using more than one representation of the images in a
collection can improve the results presented to a user without changing the
underlying feature extraction or search technologies. We present an example of
this multiple viewpoint approach, multiple image channels, and discuss
its advantages for an image-seeking user. This approach has also been shown to
dramatically improve retrieval effectiveness in content-based image retrieval
systems.
|
|
|
Jim French, Ed Fox, Kurt Maly, and Alan Selman.
Wide area technical report service: Technical reports online.
Communications of the ACM, 38(4):45, April 1995.
This is the WATERS paper.
|
|
|
J. Frew, M. Aurand, B. Buttenfield, L. Carver, P. Chang, R. Ellis, C. Fischer,
M. Gardner, M. Goodchild, G. Hajic, M. Larsgaard, K. Park, M. Probert,
T. Smith, and Q. Zheng.
The alexandria rapid prototype: Building a digital library for
spatial information.
In Advances in Digital Libraries '95, 1995.
Format: Not Yet Online.
|
|
|
Fred Friedman, Arthur M. Keller, Gio Wiederhold, Mike R. Berkowitz, John
Salasin, and David L. Spooner.
Reference model for ADA interfaces to database management systems.
In Proceedings Second IEEE Computer Society Data Engineering
Conference, 1986.
|
|
|
David Frohlich, Allan Kuchinsky, Celine Pering, Abbe Don, and Steven Ariss.
Requirements for photoware.
In Proceedings of the 2002 ACM conference on Computer supported
cooperative work, 2002.
|
|
|
Yueyu Fu, Weimao Ke, and Javed Mostafa.
Automated text classification using a multi-agent framework.
In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2005.
Automatic text classification is an important operational problem in digital library practice. Most text classification efforts so far concentrated on developing centralized solutions. However, centralized classification approaches often are limited due to constraints on knowledge and computing resources. In addition, centralized approaches are more vulnerable to attacks or system failures and less robust in dealing with them. We present a de-centralized approach and system implementation (named MACCI) for text classification using a multi-agent framework. Experiments are conducted to compare our multi-agent approach with a centralized approach. The results show multi-agent classification can achieve promising classification results while maintaining its other advantages.
|
|
|
Yueyu Fu and Javed Mostafa.
Integration of biomedical text and sequence oai repositories.
In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2004.
Archived biomedical literature and sequence data
are growing rapidly. OAI-PMH provides a convenient way for data
sharing, but it has not been tested in the biomedical domain,
especially in dealing with different types of data, such as protein,
and gene sequences. We built four individual OAI-PMH repositories based
on different biomedical resources. Using the harvested data from the
four repositories we created an integrated OAI-PMH repository, which
hosts the linked literature and sequence data in a single place.
|
|
|
Yueyu Fu and Javed Mostafa.
Toward information retrieval web services for digital libraries.
In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2004.
Information retrieval (IR) functions serve a
critical role in many digital library systems. There are numerous
mature IR algorithms that have been implemented and it will be a waste
of resources and time to re-implement them. Those IR algorithms can be
modulated and composed through the framework of web services. Web
services in IR domain have not been widely tested. Concept extraction
is an important area in traditional IR. We demonstrated that it can be
easily adopted as IR web services and can be accessed in multiple ways.
For the IR web services, we take advantage of a term representation
database which was created as a result of a previous digital library
project containing 31,928,892 terms found on 49,602,191 pages of the
Web.
|
|
|
M. Fuchs.
The user interface as document: Sgml and distributed applications.
Computer Standards & Interfaces, 18(1):79-92, January 1996.
Multi-user distributed applications running on
heterogeneous networks must be able to display user
interface components on several platforms. In
wide-area public networks, such as the Internet, the
mix of platforms and participants in an application
will occur dynamically; the user interface will need
to coexist with environments completely uncontrolled
by the designer. We have dealt with this issue by
considering user interfaces as a kind of document
specifying the application's requirements and
adopting SGML technology to process them
locally. This approach provides new flexibility,
with implications for the design of network
browsers, such as those of the World Wide Web, and
leads to an interesting class of active documents.
|
|
|
George W. Furnas.
Generalized fisheye views.
In Proceedings of the Conference on Human Factors in Computing
Systems CHI'86, 1986.
|
|
|
Kenneth Furuta.
Librarianship in the digital library.
In Proceedings of the First Annual Conference on the Theory and
Practice of Digital Libraries, 1994.
Format: HTML Document (4K).
Audience: Librarians and Digital Library researchers.
References: 0.
Links: 1.
Relevance: Low.
Abstract: A view on the role in classification, reference, ensuring
access, and collection development for the librarian of a digital library.
|
|
|
Richard Furuta.
Defining and using structure in digital documents.
In Proceedings of the First Annual Conference on the Theory and
Practice of Digital Libraries, 1994.
Format: HTML Document (31K).
Audience: Authors, Developers, slightly technical.
References: 43.
Links: 1.
Relevance: Low-Medium.
Abstract: Discussion of SGMLs, their motivation, research issues, how
they might be extended to non-text objects. Distinction between content &
presentation.
|
|
|
Richard Furuta, Catherine C. Marshall, Frank M. Shipman III, and John J.
Leggett.
Physical objects in the digital library.
In Proceedings of DL'96, 1996.
Format: Not yet online.
|
|
|
Robert P. Futrelle and Xiaolan Zhang.
Large-scale persistent object systems for corpus linguistics and
information retrieval.
In Proceedings of the First Annual Conference on the Theory and
Practice of Digital Libraries, 1994.
Format: HTML Document (40K + picture) .
Audience: technical, computer scientists with some knowledge of
computational linguistics.
References: 31.
Links: 1.
Relevance: Medium-Low.
Abstract: Discusses the challenges in indexing/searching large
databases. Argues for a bootstrapping/machine learning
approach to locate words in related contexts (surrounding
words). Suggests specific data structures. Discusses
tradeoffs between accuracy & speed, and scaling problems.
|
|
|
Prasanna Ganesan, Qixiang Sun, and Hector Garcia-Molina.
YAPPERS: A peer-to-peer lookup service over arbitrary topology.
Technical Report 2002-24, Stanford University, 2002.
Existing peer-to-peer search networks generally fall into two
categories: Gnutella-style systems that use arbitrary topology and rely on
controlled flooding for search, and systems that explicitly build an underlying
topology to efficiently support a distributed hash table (DHT). In this paper,
we propose a hybrid scheme for building a peer-to-peer lookup service over
arbitrary network topology. Specifically, for each node in the search network,
we build a small DHT consisting of nearby nodes and then provide an intelligent
search mechanism that can traverse all the small DHTs. Our hybrid approach can
reduce the nodes contacted for a lookup by an order of magnitude compared to
Gnutella, allows rapid searching of nearby nodes through quick fan-out, does
not reorganize the underlying overlay, and isolates the effect of topology
changes to small areas for better scalability and stability.
|
|
|
Y. J. Gao, J.J. Lim, and A.D. Narasimhalu.
Fuzzy multilinkage thesaurus builder in multimedia information
systems.
In Advances in Digital Libraries '95, 1995.
Format: Not Yet Online.
|
|
|
H. Garcia-Molina, J. Hammer, J. Widom, W. Labio, and Y. Zhuge.
The Stanford data warehousing project.
IEEE Data Engineering Bulletin, 18(2):41-48, June 1995.
|
|
|
H. Garcia-Molina, J. Ullman, and J. Widom.
Database System Implementation.
Prentice-Hall, 2000.
|
|
|
Hector Garcia-Molina, Luis Gravano, and Narayanan Shivakumar.
dSCAM: Finding document copies across multiple databases.
Proceedings of 4th International Conference on Parallel and
Distributed Information Systems, 1996.
|
|
|
Héctor García-Molina, Joachim Hammer, Kelly Ireland, Yannis
Papakonstantinou, Jeffrey Ullman, and Jennifer Widom.
Integrating and accessing heterogeneous information sources in
TSIMMIS.
In AAAI Spring Symposium on Information Gathering, 1995.
Format: Compressed PostScript.
|
|
|
Hector Garcia-Molina, Steven Ketchpel, and Narayanan Shivakumar.
Safeguarding and charging for information on the Internet.
In Proceedings of the Fourteenth International Conference on
Data Engineering, 1998.
Available at http://dbpubs.stanford.edu/pub/1998-26.
|
|
|
Héctor García-Molina, Wilburt Labio, and Ramana Yerneni.
Capability sensitive query processing on internet sources.
In Proceedings of the 15th International Conference on Data
Engineering, Sydney, Australia, March 1999.
Accessible at http://dbpubs.stanford.edu/pub/1998-40.
|
|
|
E. Garfield.
Citation analysis as a tool in journal evaluation.
Science, 178:471-479, 1972.
|
|
|
Ullas Gargi.
Managing and searching personal photo collections.
Technical Report HPL-2002-67, HP Laboratories, March 2002.
|
|
|
Ullas Gargi.
Consumer media capture: Time-based analysis and event clustering.
Technical Report HPL-2003-165, HP Laboratories, August 2003.
|
|
|
John R. Garrett.
Task force on archiving of digital information.
D-Lib Magazine, Sep 1995.
Format: HTML Document.
|
|
|
Susan Gauch, Ron Aust, Joe Evans, John Gauch, Gary Minden, Doug Niehaus, and
James Roberts.
The digital video library system: Vision and design.
In Proceedings of the First Annual Conference on the Theory and
Practice of Digital Libraries, 1994.
Format: HTML Document (29K) .
Audience: slightly technical, generalist comfortable with technology.
References: 23.
Links: 1.
Relevance: Medium-Low (but not mainstream DL).
Abstract: Describes architecture of a system to retrieve & deliver video
on demand. Indexing done by audio track or transcript. 100 hours of video to
20-30 users. Different compression modes depending on bandwidth of user
connection.
|
|
|
Geri Gay and June P. Mead.
The common ground surrounding access: Theoretical and practical
perspectives.
In Proceedings of DL'96, 1996.
Format: Not yet online.
|
|
|
Maayan Geffet and Dror G. Feitelson.
Hierarchical indexing and document matching in BoW.
In Proceedings of the First ACM/IEEE-CS Joint Conference on
Digital Libraries, 2001.
BoW is an on-line bibliographical repository based on a
hierarchical concept index to which entries are linked. Searching in
the repository should therefore return matching topics from the
hierarchy, rather than just a list of entries. Likewise, when new entries
are inserted, a search for relevant topics to which they should be linked
is required. We develop a vector-based algorithm that creates keyword
vectors for the set of competing topics at each node in the hierarchy,
and show how its performance improves when domain-specific features are
added (such as special handling of topic titles and author names). The
results of a 7-fold cross validation on a corpus of some 3,500 entries
with a 5-level index are hit ratios in the range of 89-95% [...] the
misclassifications are indeed ambiguous to begin with.
|
|
|
Gary Geisler, Sarah Giersch, David McArthur, and Marty McClelland.
Creating virtual collections in digital libraries: Benefits and
implementation issues.
In Proceedings of the Second ACM/IEEE-CS Joint Conference on
Digital Libraries, 2002.
Digital libraries have the potential to not only duplicate many
of the services provided by traditional libraries but to extend them. Basic
finding aids such as search and browse are common in most of today's digital
libraries. But just as a traditional library provides more than a card catalog
and browseable shelves of books, an effective digital library should offer a
wider range of services. Using the traditional library concept of special
collections as a model, in this paper we propose that explicitly defining
sub-collections in the digital library (virtual collections) can benefit both
the library's users and contributors and increase its viability. We first
introduce the concept of a virtual collection, outline the costs and benefits
for defining such collections, and describe an implementation of
collection-level metadata to create virtual collections for two different
digital libraries. We
conclude by discussing the implications of virtual collections for enhancing
interoperability and sharing across digital libraries, such as those that are
part of the National SMETE Digital Library.
|
|
|
Hans W. Gellersen, Albrecht Schmidt, and Michael Beigl.
Multi-sensor context-awareness in mobile devices and smart artifacts.
Mob. Netw. Appl., 7(5):341-351, 2002.
|
|
|
Jim Gemmell, Gordon Bell, Roger Lueder, Steven Drucker, and Curtis Wong.
MyLifeBits: fulfilling the Memex vision.
In Proceedings of the tenth ACM international conference on
Multimedia, pages 235-238. ACM Press, 2002.
|
|
|
Michael R. Genesereth, Arthur M. Keller, and Oliver M. Duschka.
Infomaster: An information integration system.
In Proceedings of the International Conference on Management of
Data, Tucson, Ariz., 1997. ACM Press, New York.
|
|
|
Michael R. Genesereth and Steven P. Ketchpel.
Software agents.
Communications of the ACM, 37(7), July 1994.
Discusses important issues related to agent-based
software engineering, which was developed to create
interoperable software.
|
|
|
M.R. Genesereth, A.M. Keller, and O.M. Duschka.
Infomaster: an information integration system.
In SIGMOD Record, New York, 1997. ACM Press.
Infomaster is an information integration system that provides
integrated access to multiple distributed heterogeneous
information sources on the Internet, thus giving the illusion
of a centralized, homogeneous information system. We say that
Infomaster creates a virtual data warehouse. The core of
Infomaster is a facilitator that dynamically determines an
efficient way to answer the user's query using as few sources
as necessary and harmonizes the heterogeneities among these
sources. Infomaster handles both structural and content
translation to resolve differences between multiple data
sources and the multiple applications for the collected data.
Infomaster connects to a variety of databases using wrappers,
such as for Z39.50, SQL databases through ODBC, EDI
transactions, and other World Wide Web (WWW) sources. There
are several WWW user interfaces to Infomaster, including
forms based and textual. Infomaster also includes a
programmatic interface and it can download results in
structured form onto a client computer. Infomaster has been
in production use for integrating rental housing
advertisements from several newspapers (since fall 1995), and
for meeting room scheduling (since winter 1996). Infomaster
is also being used to integrate heterogeneous electronic
product catalogs.
|
|
|
Don Gentner, Frank Ludolph, and Chris Ryan.
Simplified applications for network computers.
In Proceedings of the Conference on Human Factors in Computing
Systems CHI'97, 1997.
|
|
|
D. Georgakopoulos, M. Hornick, and A. Sheth.
An overview of workflow management: From process modeling to
infrastructure for automation.
Journal on Distributed and Parallel Database Systems, 3(2),
1995.
|
|
|
Branko Gerovac and Richard J. Solomon.
Protect revenues, not bits: Identify your intellectual property.
In IP Workshop Proceedings, 1994.
Format: HTML Document (40K).
Audience: Standards committees, general technologists, technical
sections.
References: 14 footnotes.
Links: 0 .
Relevance: low-medium.
Abstract: Discusses a header-based approach to identifying data
streams, focusing on video domains. Gives a brief history of copyrights. Gives
desiderata for standards/design to ensure interoperability, flexibility,
extensibility, etc. Gives concrete examples of encoding used for certain
applications.
|
|
|
N. Gershon, W. Ruh, J. LeVasseur, J. Winstead, and A. Kleiboemer.
Searching and discovery of resources in digital libraries.
In Advances in Digital Libraries '95, 1995.
Format: Not Yet Online.
|
|
|
Stefan Gessler and Andreas Kotulla.
PDAs as mobile WWW browsers.
In Proceedings of the Second International World-Wide Web
Conference, 1994.
|
|
|
Paul Gherman.
Image vision: Forging a national image alliance.
In JEP, 1994.
Format: HTML Document (11K) .
Audience: Image catalogers & users, politicians.
References: 0.
Links: 0.
Relevance: low.
Abstract: Argues that all of the search & retrieval issues for
bibliographic records are worse for images. There are
more of them, and there is no standard for representation or
indexing. Calls for creation of a universal image
database, a single standard for representation, and a
standard license agreement for image owners.
|
|
|
David Gibson, Jon M. Kleinberg, and Prabhakar Raghavan.
Inferring web communities from link topology.
In HyperText, 1998.
|
|
|
Aristides Gionis and Heikki Mannila.
Finding recurrent sources in sequences.
In Proceedings of the seventh annual international conference on
Computational molecular biology, pages 123-130. ACM Press, 2003.
|
|
|
Richard Giordano.
Digital libraries and impacts on scientific careers.
In Proceedings of DL'96, 1996.
Format: Not yet online.
|
|
|
Andreas Girgensohn, John Adcock, Matthew Cooper, Jonathan Foote, and Lynn
Wilcox.
Simplifying the management of large photo collections.
In INTERACT '03: Ninth IFIP TC13 International Conference on
Human-Computer Interaction, pages 196-203. IOS Press, September 2003.
|
|
|
Andreas Girgensohn, John Adcock, and Lynn Wilcox.
Leveraging face recognition technology to find and organize photos.
In MIR '04: Proceedings of the 6th ACM SIGMM international
workshop on Multimedia information retrieval, pages 99-106. ACM Press,
2004.
|
|
|
Giovanni Giuffrida, Eddie C. Shek, and Jihoon Yang.
Knowledge-based metadata extraction from PostScript files.
In Proceedings of the Fifth ACM International Conference on
Digital Libraries, 2000.
The automatic document metadata extraction process is an
important task in a world where thousands of documents are just one
click away. Thus, powerful indices are necessary to support effective
retrieval. The upcoming XML standard represents an important step in this
direction as its semistructured representation conveys document
metadata together with the text of the document. For example, retrieval
of scientific papers by authors or affiliations would be a straightforward
task if papers were stored in XML. Unfortunately, today, the vast
majority of documents on the web are available in forms that do not carry
additional semantics. Converting existing documents to a semistructured
representation is time consuming and no automatic process can be easily
applied. In this paper we discuss a system, based on a novel spatial/visual
knowledge principle, for extracting metadata from scientific papers
stored as PostScript files. Our system embeds the general knowledge
about the graphic layout of a scientific paper to guide the metadata
extraction process. Our system can effectively assist the automatic
index creation for digital libraries.
|
|
|
Henry M. Gladney, Edward A. Fox, Zahid Ahmed, Ron Ashany, Nicholas J. Belkin,
Michael Lesk, Richard Tong, and Maria Zemankova.
Digital library: Gross structure and requirements (report from a
March 1994 workshop).
In Proceedings of the First Annual Conference on the Theory and
Practice of Digital Libraries, 1994.
Format: HTML Document (35K) .
Audience: DL researchers, workgroup attendees.
References: 22.
Links: 1.
Relevance: Low-Medium.
Abstract: The report of a working group on digital libraries. Defines
terms, discusses a possible architecture in terms of resource managers and
application enablers.
|
|
|
Steven Glassman.
A caching relay for the World Wide Web.
In Proceedings of the First International World-Wide Web
Conference, 1994.
|
|
|
Dion Goh and John Leggett.
Patron-augmented digital libraries.
In Proceedings of the Fifth ACM International Conference on
Digital Libraries, 2000.
Digital library research is mostly focused on the generation
of large collections of multimedia resources and state-of-the-art tools
for their indexing and retrieval. However, digital libraries should provide
more than advanced collection maintenance and retrieval services since
the ultimate goal of any academic library is to serve the scholarly needs of
its users. This paper begins by presenting a case for digital scholarship
in which patrons perform all scholarly work electronically. A proposal is
then made for patron-augmented digital libraries (PADLs), a class of
digital libraries that supports the digital scholarship of its patrons.
Finally, a prototype PADL (called Synchrony) providing access to video
segments and associated textual transcripts is described. Synchrony
allows patrons to search the library for artifacts, create
annotations/original compositions, integrate these artifacts to form
synchronized mixed text and video presentations and, after suitable
review, publish these presentations into the digital library if desired.
A study to evaluate the PADL concept and usability of Synchrony is also
discussed. The study revealed that participants were able to use
Synchrony for the authoring and publishing of presentations and that
attitudes toward PADLs were generally positive.
|
|
|
Anna Keller Gold, Karen Baker, Kim Baldridge, and Jean-Yves Le Meur.
Building flow: Federating libraries on the web.
In Proceedings of the Second ACM/IEEE-CS Joint Conference on
Digital Libraries, 2002.
An individual scientist, a collaborative team and a research network
have a variety of document management needs in common. The levels of research
organization, when viewed as nested tiers, represent boundaries across which
information can flow openly if technology and metadata standards are partnered
to provide an accessible, interoperable digital framework. The CERN Document
System (CDS), implemented by a research partnership at the San Diego
Supercomputer Center (SDSC), establishes a prototype tiered repository system.
An ongoing exploration of existing scientific research information
infrastructure suggests modifications to enable cross-tier and cross-domain
information flow across what could be represented as a metadata grid.
|
|
|
D. Goldberg, D. Nichols, B.M. Oki, and D. Terry.
Using collaborative filtering to weave an information tapestry.
Communications of the ACM, 35(12):61-70, December 1992.
Uses user annotations to help with filtering.
|
|
|
D. Goldberg, D. Nichols, B.M. Oki, and D. Terry.
Using collaborative filtering to weave an information tapestry.
Communications of the ACM, 35(12):61-70, 1992.
Tapestry is an experimental mail system developed at the Xerox
Palo Alto Research Center. The system manages an in-coming
stream of electronic documents, including E-mail, newswire
stories and NetNews articles. The system implements a novel
mechanism for collaborative filtering in which users annotate
documents before the documents are filtered. Because
annotations are not available at the time a new document
arrives, the system supports continuous queries that examine
the entire database of documents and take into account newly
introduced annotations during the filtering process.
|
|
|
Charles F. Goldfarb.
The SGML Handbook.
Oxford University Press, New York, 1990.
|
|
|
T. Goldstein.
The gateway security model in the Java Electronic Commerce Framework.
In R. Hirschfeld, editor, Financial Cryptography First
International Conference, FC'97. Proceedings., Berlin, Germany, 1997.
Springer-Verlag.
This paper describes an extension to the current Java security
model called the Gateway and why it was necessary to create
it. This model allows secure applications, such as those used
in electronic commerce, to safely exchange data and
interoperate without compromising each individual
application's security. The Gateway uses digital signatures
to enable application programming interfaces to authenticate
their caller. JavaSoft is using the Gateway to create a new
integrated open platform for financial applications called
Java Electronic Commerce Framework. The JECF will be the
foundation for electronic wallets, point of sale terminals,
electronic merchant servers and other financial software. The
Gateway model can also be used for access control in many
multiple application environments that require trusted
interaction between applications from multiple vendors. These
applications include browsers, servers, operating systems,
medical systems and smartcards.
|
|
|
Moises Goldszmidt and Mehran Sahami.
A probabilistic approach to full-text document clustering.
Technical report, SRI International, 1998.
|
|
|
Gene Golovchinsky.
Queries? Links? Is there a difference?
In Proceedings of the Conference on Human Factors in Computing
Systems CHI'97, 1997.
|
|
|
G. Golub and C. Van Loan.
Matrix Computations.
Johns Hopkins University Press, 1989.
|
|
|
Marcos Andre Goncalves and Edward A. Fox.
5SL - a language for declarative specification and generation of
digital libraries.
In Proceedings of the Second ACM/IEEE-CS Joint Conference on
Digital Libraries, 2002.
Digital libraries (DLs) are among the most complex kinds of
information systems, due in part to their intrinsic multi-disciplinary nature.
Nowadays DLs are built within monolithic, tightly integrated, and generally
inflexible systems - or by assembling disparate components together in an
ad-hoc way, with resulting problems in interoperability and adaptability.
More importantly, conceptual modeling, requirements analysis, and software
engineering approaches are rarely supported, making it extremely difficult to
tailor DL content and behavior to the interests, needs, and preferences of
particular communities. In this paper, we address these problems. In particular,
we present 5SL, a declarative language for specifying and generating
domain-specific digital libraries. 5SL is based on the 5S formal theory for
digital libraries and enables high-level specification of DLs in five
complementary dimensions,
including: the kinds of multimedia information the DL supports (Stream Model);
how that information is structured and organized (Structural Model); different
logical and presentational properties and operations of DL components
(Spatial Model); the behavior of the DL (Scenario Model); and the different
societies of actors and managers of services that act together to carry out
the DL behavior (Societal Model). The practical feasibility of the approach is
demonstrated by the presentation of a 5SL digital library generator for the
MARIAN digital library system.
|
|
|
Marcos Andre Goncalves, Edward A. Fox, Aaron Krowne, Pavel Calado, Alberto H.F.
Laender, Altigran S. da Silva, and Berthier Ribeiro-Neto.
The effectiveness of automatically structured queries in digital
libraries.
In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2004.
Structured or fielded metadata is the basis for
many digital library services, including searching and browsing. Yet,
little is known about the impact of using structure in the
effectiveness of such services. In this paper, we investigate a key
research question: do structured queries improve effectiveness in DL
searching? To answer this question, we empirically compared the use of
unstructured queries to the use of structured queries. We then tested
the capability of a simple Bayesian network system, built on top of a
DL retrieval engine, to infer the best structured queries from the
keywords entered by the user. Experiments performed with 20 users
working with a DL containing a large collection of computer science
literature clearly indicate that structured queries, either manually
constructed or automatically generated, perform better than their
unstructured counterparts, in the majority of cases. Also, automatic
structuring of queries appears to be an effective and viable
alternative to manual structuring that may significantly reduce the
burden on users.
|
|
|
Marcos André Gonçalves, Ganesh Panchanathan, Unnikrishnan Ravindranathan, Aaron
Krowne, Edward A. Fox, Filip Jagodzinski, and Lillian Cassel.
The XML log standard for digital libraries: Analysis, evolution, and
deployment [short paper].
In Proceedings of the Third ACM/IEEE-CS Joint Conference on
Digital Libraries, 2003.
We describe current efforts and developments building on our proposal for
an XML log standard format for digital library (DL) logging analysis and companion tools.
Focus is given to the evolution of formats and tools based on analysis of deployment in
several DL systems and testbeds. Recent development of analysis tools also is discussed.
|
|
|
Google Inc.
http://www.google.com.
|
|
|
Chetan Gopal and Roger Price.
Multimedia information delivery and the MHEG standard.
In DAGS '95, 1995.
Format: Not Yet On-line.
Audience: Multimedia standards setters, developers - technical.
References: 11.
Relevance: Low.
Abstract: Describes the MHEG standard being developed for
multimedia objects and applications. Designed to deliver
real-time interchange of multimedia objects over wide area
networks.
|
|
|
D. A. Grossman and J. R. Driscoll.
Structuring text within a relational system.
In Proc. of the 3rd Intl. Conf. on Database and Expert System
Applications, pages 72-77, September 1992.
|
|
|
Adrian Graham, Hector Garcia-Molina, Andreas Paepcke, and Terry Winograd.
Time as essence for photo browsing through personal digital
libraries.
In Proceedings of the Second ACM/IEEE-CS Joint Conference on
Digital Libraries, 2002.
We developed two photo browsers for collections with thousands
of time-stamped digital images. Modern digital cameras record photo shoot times,
and semantically related photos tend to occur in bursts. Our browsers exploit the
timing information to structure the collections and to automatically generate
meaningful summaries. The browsers differ in how users navigate and view the
structured collections. We conducted user studies to compare the two browsers
and a commercial image browser. Our results show that exploiting the time
dimension and appropriately summarizing collections can lead to significant
improvements. For example, for one task category, one of our browsers
enabled a 33% [...] commercial browser. Similarly, users were able to complete
29% [...] when using this same browser.
|
|
|
Peter S. Graham.
Intellectual preservation and electronic intellectual property.
In IP Workshop Proceedings, 1994.
Format: HTML Document (43K).
Audience: Non-technical, librarians.
References: 13 notes.
Links: 0.
Relevance: Low.
Abstract: Discussion of ensuring authenticity of documents, essentially
just notarization.
|
|
|
Peter S. Graham.
The digital research library: Tasks and commitments.
In Proceedings of the Second Annual Conference on the Theory and
Practice of Digital Libraries, 1995.
Format: HTML Document(36K) .
Audience: Librarians.
References: 23.
Links: 8.
Relevance: Low.
Abstract: Discusses technical and organizational challenges which must
be met to have a real digital research library. A significant one is obtaining
the institutional commitments to ensure longevity of the collection and access
to it. Discusses some of the required tasks (like cataloging, backup,
authentication) at a high level.
|
|
|
Karen D. Grant, Adrian Graham, Tom Nguyen, Andreas Paepcke, and Terry Winograd.
Beyond the shoe box: Foundations for flexibly organizing photographs
on a computer.
Technical Report 2002-45, Stanford University, 2002.
As a foundation for designing computer-supported photograph
management tools, we have been conducting focused experiments. Here, we
describe our analysis of how people initially organize batches of familiar
images. We asked 26 subjects in pairs to organize 50 images on a common
horizontal table. Each pair then organized a different 50-image set on a
computer table of identical surface area. The bottom-projected computer
tabletop displayed our interface to several online, pile-based affordances
we wished to evaluate. Subjects used pens to interact with the system. We
highlight aspects of the computer environment that were notably important to
subjects, and others that they cared about less than we had hypothesized.
For example, a strong majority preferred computer-generated representations
of piles to be grid-shaped over several alternatives, some of which mimicked
the physical world closely, and others that used transparency to save space.
|
|
|
Luis Gravano, Chen-Chuan K. Chang, Héctor García-Molina, and Andreas
Paepcke.
STARTS: Stanford protocol proposal for Internet retrieval and
search.
Technical Report SIDL-WP-1996-0043; 1997-68, Stanford University,
August 1996.
Accessible at http://dbpubs.stanford.edu/pub/1997-68.
|
|
|
Luis Gravano, Chen-Chuan K. Chang, Héctor García-Molina, and Andreas
Paepcke.
STARTS: Stanford proposal for Internet meta-searching.
In Proceedings of the International Conference on Management of
Data, 1997.
|
|
|
Luis Gravano, Chen-Chuan K. Chang, Hector Garcia-Molina, and Andreas Paepcke.
STARTS: Stanford proposal for Internet meta-searching.
In Proc. of the 1997 ACM SIGMOD International Conference On
Management of Data, 1997.
|
|
|
Luis Gravano and Héctor García-Molina.
Generalizing GlOSS to vector-space databases and broker
hierarchies.
In Proceedings of the Twenty-first International Conference on
Very Large Databases, pages 78-89, September 1995.
|
|
|
Luis Gravano and Hector Garcia-Molina.
Merging ranks from heterogeneous internet sources.
In Proceedings of the Twenty-third International Conference on
Very Large Databases, 1997.
|
|
|
Luis Gravano, Héctor García-Molina, and Anthony Tomasic.
The effectiveness of GlOSS for the text-database discovery
problem.
In Proceedings of the International Conference on Management of
Data, May 1994.
The popularity of on-line document databases has led to a
new problem: finding which text databases (out of many candidate choices)
are the most relevant to a user. Identifying the relevant databases for a
given query is the text database discovery problem. The first part of this
paper presents a practical solution based on estimating the
result size of a query and a database. The method is termed
GLOSS (Glossary of Servers Server). The second part of this
paper evaluates the effectiveness of GLOSS based on a trace
of real user queries. In addition, we analyze the storage
cost of our approach.
|
|
|
Jim Gray, Pat Helland, Patrick E. O'Neil, and Dennis Shasha.
Dangers of replication and a solution.
SIGMOD, pages 173-82, June 1996.
Update anywhere-anytime-anyway transactional replication has unstable
behavior as the workload scales up: a ten-fold increase in nodes and
traffic gives a thousand-fold increase in deadlocks or
reconciliations. Master copy replication (primary copy) schemes
reduce this problem. A simple analytic model demonstrates these
results. A new two-tier replication algorithm is proposed that allows
mobile (disconnected) applications to propose tentative update
transactions that are later applied to a master copy. Commutative
update transactions avoid the instability of other replication
schemes.
|
|
|
Jim Gray and Andreas Reuter.
Transaction Processing: Concepts and Techniques.
Morgan Kaufmann Publishers, Inc., 1993.
This is a comprehensive book on transaction processing. Chapter
3 introduces the concept of fault tolerance. Chapter 4 presents
different transaction models. Chapters 5 and 6 give an overview of
the functionality of the TP monitor. Chapters 7 and 8 describe
concurrency control and its implementation. Chapter 9 gives an
overview of recovery and how to implement logs. Chapters 10 and 11
define a transaction manager and how to implement it. Chapter 12
is a compendium of advanced transaction manager topics including
heterogeneous commit coordinators, non-blocking commit coordinators,
transfer of commit, optimization of 2-phase commit, and disaster
recovery. The rest of the book describes in detail the low-level
implementation of a transaction processing system, and provides
a survey of TP systems in the market.
|
|
|
Robert Gray.
Content-based image retrieval: Color and edges.
In DAGS '95, 1995.
Format: Not Yet On-line.
Audience: Vision/graphics researchers.
References: 15.
Links: .
Relevance: Low.
Abstract: Technical description of implementation of two techniques to
retrieve images based on color histograms and edge maps. Implemented and tested
on a small (48 image) database. Results mixed at best. Weaknesses identified
for future work.
|
|
|
Noah Green, Panagiotis G. Ipeirotis, and Luis Gravano.
SDLIP + STARTS = SDARTS: A protocol and toolkit for metasearching.
In Proceedings of the First ACM/IEEE-CS Joint Conference on
Digital Libraries, 2001.
In this paper we describe how we combined SDLIP and STARTS,
two complementary protocols for searching over distributed document
collections. The resulting protocol, which we call SDARTS, is simple yet
expressive enough to enable building sophisticated metasearch engines.
SDARTS can be viewed as an instantiation of SDLIP with metasearch-specific
elements from STARTS. We also report on our experience building three
SDARTS-compliant wrappers: for locally available plain-text document
collections, for locally available XML document collections, and for
external web-accessible collections. These wrappers were developed to
be easily customizable for new collections. Our work was developed as
part of Columbia University's Digital Libraries Initiative-Phase 2 (DLI2)
project, which involves the departments of Computer Science, Medical
Informatics, and Electrical Engineering, the Columbia University libraries,
and a large number of industrial partners. The main goal of the project is
to provide personalized access to a distributed patient-care digital library.
|
|
|
Stephen J. Green.
Automated link generation: can we do better than term repetition?
In Proceedings of the Seventh International World-Wide Web
Conference, 1998.
|
|
|
Saul Greenberg and David Marwood.
Real time groupware as a distributed system: Concurrency control and
its effect on the interface.
In Richard Furuta and Christine Neuwirth, editors, CSCW '94,
New York, 1994. ACM.
This paper exposes the concurrency control problem in
groupware when it is implemented as a distributed
system. Traditional concurrency control methods
cannot be applied directly to groupware because
system interactions include people as well as
computers. Methods, such as locking, serialization,
and their degree of optimism, are shown to have
quite different impacts on the interface and how
operations are displayed and perceived by group
members. The paper considers both human and
technical considerations that designers should
ponder before choosing a particular concurrency
control method. It also reviews the authors'
work-in-progress designing and implementing a
library of concurrency schemes in GROUPKIT, a
groupware toolkit.
|
|
|
Adrienne GreenHeart.
Making multimedia work for women.
In DAGS '95, 1995.
Format: HTML Document (7K)
Audience: Women writers, readers.
References: 6. (though not in on-line version)
Links: 1.
Abstract: Argues that the non-linear nature of multimedia fits better
with the more cyclical nature of female life, and the non-linear way that many
women authors write. The new medium offers women a chance to fight the
patriarchy of tradition.
|
|
|
Philip Greenspun.
We have chosen shame and will get war.
In JEP, 1994.
Format: HTML Document (22K) .
Audience: Browser developers, content publishers.
References: 7.
Links: 13.
Relevance: low.
Abstract: Quote from conclusion: HTML is inadequate. It lacks sufficient
structural and formatting tags to even render certain kinds of fiction
comprehensible much less aesthetic. HTML needs style sheets or improved
formatting capabilities so that document designers can spare 20 million
Internet users from adjusting everything themselves. The META tag in HTML
level 2 can be exploited to implement a document typing system. We need to
develop a hierarchy of document types to facilitate implementation of
programs that automatically process Web documents. This type system must
support multiple inheritance.
|
|
|
José-Marie Griffiths and Kimberly K. Kertis.
Access to large digital libraries of scientific information across
networks.
In Proceedings of the First Annual Conference on the Theory and
Practice of Digital Libraries, 1994.
Format: HTML Document (34K) .
Audience: slightly technical, funders, general technology.
References: 14.
Links: 1.
Relevance: Low-Medium.
Abstract: Describes U. of Tennessee's Digital Library proposal.
Focuses on: representation, navigation, retrieval,
display of information; performance & scalability;
different user interfaces for different user groups.
Semantic net concept representation.
|
|
|
Gary N. Griswold.
A method for protecting copyright on networks.
In IP Workshop Proceedings, 1994.
Format: HTML Document(29K).
Audience: Computer scientists, specific & somewhat technical.
References: 10.
Links: 0.
Relevance: Medium.
Abstract: Secure copyrighted documents by transmitting them in an
envelope which is the only way to view, print, etc. Periodic & per-use
reverification with a server, possible chargeback info.
PATENTS APPLIED FOR.
|
|
|
Kaj Gronbaek, Lennert Sloth, and Peter Orbaek.
Webvise: Browser and proxy support for open hypermedia structuring
mechanisms on the world wide web.
In Proceedings of the Eighth International World-Wide Web
Conference, 1999.
This paper discusses how to augment the
World Wide Web with an open hypermedia service (Webvise) that provides
structures such as contexts, links, annotations, and guided tours stored in
hypermedia databases external to the Web pages. This includes the ability for
users collaboratively to create links from parts of HTML Web pages they do
not own and support for creating links to parts of Web pages without writing
HTML target tags. The method for locating parts of Web pages can locate parts
of pages across frame hierarchies and it also supports certain repairs of
links that break due to modified Web pages.
Support for providing links to/from parts of non-HTML data, such
as sound and movie, will be possible via interfaces to plug-ins and
Java-based media players. The hypermedia structures are stored in a hypermedia
database, developed from the Devise Hypermedia framework, and the service is
available on the Web via an ordinary URL. The best user interface for
creating and manipulating the structures is currently provided for the
Microsoft Internet Explorer 4.x browser through COM integration that utilizes
the Explorer's DOM representation of Web pages. But the structures
can also be manipulated and used via special Java applets, and a pure proxy
server solution is provided for users who only need to browse the structures.
A user can create and use the external structures as `transparency' layers on
top of arbitrary Web pages; the user can switch between viewing pages with one
or more layers (contexts) of structures or without any external structures
imposed on them.
|
|
|
R. L. Grossman, A. Sundaram, H. Ramamoorthy, M. Wu, S. Hogan, J. Shuler, and
O. Wolfson.
Viewing the U.S. government budget as a digital library.
In Proceedings of the First Annual Conference on the Theory and
Practice of Digital Libraries, 1994.
Format: HTML Document (24K) .
Audience: Computer scientists, funders .
References: 7.
Links: 1.
Relevance: low.
Abstract: Describes a prototype system built using tools to access
data from the federal budget. Argues that statistical, numerical data is
fundamentally different from text and multimedia.
|
|
|
Tao Guan and Kam-Fai Wong.
KPS: a web information mining algorithm.
In Proceedings of the Eighth International World-Wide Web
Conference, 1999.
The Web mostly contains semi-structured
information. It is, however, not easy to search
and extract structural
data hidden in a Web page. Current practices
address this problem by (1) syntax analysis
(i.e. HTML tags); or (2)
wrappers or user-defined declarative languages.
The former is only suitable for highly
structured Web sites and the latter is
time-consuming and offers low scalability.
Wrappers could handle tens, but certainly not
thousands, of information sources.
In this paper, we present a novel information
mining algorithm, namely KPS, over semi-structured
information on the Web.
KPS employs keywords, patterns and/or samples
to mine the desired information. Experimental
results show that KPS is
more efficient than existing Web extracting
methods.
|
|
|
François Guimbretière and Terry Winograd.
FlowMenu: combining command, text, and data entry.
In UIST '00: Proceedings of the 13th annual ACM symposium on
User interface software and technology, pages 213-216, New York, NY, USA,
2000. ACM Press.
|
|
|
Oliver Gunther, Rudolf Muller, and Andreas S. Wiegand.
The design of MMM: A model management system for time series
analysis.
In DAGS '95, 1995.
Format: HTML Document(46K + pictures) .
Audience: Mathematicians, economists, statisticians, computer
scientists.
References: 37.
Links: 13.
Relevance: Low.
Abstract: Proposes a web-based repository for software implementing
time series analysis methods. Such a system would facilitate collaboration,
verification of results, and would help build an experience base of which models
worked well under which circumstances. Briefly describes the architecture,
which requires method implementors to specify the methods in terms of Ypsilon
(an abstract class system) classes.
|
|
|
A. Gupta, V. Harinarayan, and A. Rajaraman.
Virtual database technology.
In Proceedings of the Fourteenth International Conference on
Data Engineering, pages 23-27, February 1998.
|
|
|
Amarnath Gupta, Bertram Ludaescher, and Reagan W. Moore.
Ontology services for curriculum development in NSDL.
In Proceedings of the Second ACM/IEEE-CS Joint Conference on
Digital Libraries, 2002.
We describe our effort to develop an ontology service on top of an
educational digital library. The ontology is developed by relating library
holdings to the educational concepts they refer to. The ontology system
supports basic services like ontology-based search and complex services such
as comparison of multiple curricula.
|
|
|
Aparna Gurijala and J. R. Deller, Jr.
A quantified fidelity criterion for parameter-embedded watermarking
of audio archives [short paper].
In Proceedings of the Third ACM/IEEE-CS Joint Conference on
Digital Libraries, 2003.
A novel algorithm for speech watermarking through parametric modeling
is enhanced by inclusion of a quantified fidelity criterion. Watermarking is effected
through solution of a set-membership filtering (SMF) problem, subject to an l(infinity)
fidelity criterion in the signal space. The SMF approach provides flexibility in obtaining
watermark solutions that trade off watermark robustness and stegosignal fidelity.
|
|
|
Samuel Gustman, Dagobert Soergel, Douglas Oard, William Byrne, Michael Picheny,
Bhuvana Ramabhadran, and Douglas Greenberg.
Supporting access to large digital oral history archives.
In Proceedings of the Second ACM/IEEE-CS Joint Conference on
Digital Libraries, 2002.
This paper describes our experience with creating, indexing,
and providing access to a very large archive of videotaped oral histories
(116,000 hours of digitized interviews in 32 languages from 52,000 survivors,
liberators, rescuers and witnesses of the Nazi Holocaust) and identifies a set
of critical research issues in user requirement studies, automatic speech
recognition, automatic classification, segmentation, and summarization,
retrieval, and user interfaces that must be addressed if we are to provide
full and detailed access to collections of this size.
|
|
|
Marc Gyssens, Jan Paredaens, and Dirk Van Gucht.
A grammar-based approach towards unifying hierarchical data models.
In Proceedings of the International Conference on Management of
Data, Portland, Oreg., June 1989. ACM Press, New York.
|
|
|
H.-J. Zimmermann.
Fuzzy Set Theory.
Kluwer Academic Publishers, 1996.
|
|
|
Laura M. Haas, Donald Kossmann, Edward L. Wimmers, and Jun Yang.
Optimizing queries across diverse data sources.
In Proceedings of the Twenty-third International Conference on
Very Large Databases, pages 276-285, Athens, Greece, August 1997. VLDB
Endowment, Saratoga, Calif.
|
|
|
Robert J. Hall.
Agents helping agents: Issues in sharing how-to knowledge.
In AAAI Spring Symposium on Information Gathering, 1995.
Format: Compressed PostScript().
|
|
|
Joseph Y. Halpern and Carl Lagoze.
The computing research repository: Promoting the rapid dissemination
and archiving of computer science research.
In Proceedings of the Fourth ACM International Conference on
Digital Libraries, 1999.
We describe the Computing Research Repository (CoRR), a new
electronic archive for rapid dissemination and archiving of
computer science research results. CoRR was initiated in
September 1998 through the cooperation of ACM, LANL (Los
Alamos National Laboratory) e-Print archive, and NCSTRL
(Networked Computer Science Technical Reference Library).
Through its implementation of the Dienst protocol, CoRR
combines the open and extensible architecture of NCSTRL with
the reliable access and well-established management practices of
the LANL XXX e-Print repository. This architecture will allow
integration with other e-Print archives and provides a foundation
for a future broad-based scholarly digital library. We describe the
decisions that were made in creating CoRR, the architecture of the
CoRR/NCSTRL interoperation, and issues that have arisen during
the operation of CoRR.
|
|
|
Joachim Hammer, Hector Garcia-Molina, Junghoo Cho, Arturo Crespo, and Rohan
Aranha.
Extracting semistructured information from the web.
In Proceedings of the Workshop on Management of Semistructured
Data, 1997.
|
|
|
Kristian Hammond, Robin Burke, Charles Martin, and Steven Lytinen.
FAQ Finder: A case-based approach to knowledge navigation.
In AAAI Spring Symposium on Information Gathering, 1995.
Format: Compressed PostScript().
|
|
|
Hui Han, C. Lee Giles, Eren Manavoglu, Hongyuan Zha, Zhenyue Zhang, and
Edward A. Fox.
Automatic document metadata extraction using support vector machines.
In Proceedings of the Third ACM/IEEE-CS Joint Conference on
Digital Libraries, 2003.
Automatic metadata generation provides scalability and
usability for digital libraries and their collections. Machine learning
methods offer robust and adaptable automatic metadata extraction. We
describe a support vector machine classification-based method for
metadata extraction from the header part of the research papers and show
that it outperforms other machine learning methods on the same task. The
method first classifies each line of the header into one or more of the
15 classes. An iterative convergence procedure is then used to improve the
line classification by using the predicted class labels of its neighbor
lines in the previous round. Further metadata extraction is done by seeking
the best chunk boundaries of each line. We found that discovery and use of
the structural patterns of the data and domain based feature selection can
improve the metadata extraction performance. An appropriate feature
normalization also greatly improves the classification performance.
|
|
|
Hui Han, C. Lee Giles, Hongyuan Zha, Cheng Li, and Kostas Tsioutsiouliklis.
Two supervised learning approaches for name disambiguation in author
citations.
In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2004.
Due to name abbreviations, identical names, or
name misspellings in publications or bibliographies (citations), an
author may have multiple names and multiple authors may share the same
name. Such name ambiguity affects the performance of document
retrieval, web search, database integration, and may cause improper
attribution to authors. This paper investigates two supervised learning
approaches to disambiguate authors in the citations. One approach uses
the naive Bayes probability model, a generative model; the other uses
Support Vector Machines (SVMs) and vector space
representation of citations, a discriminative model. Both approaches
utilize three types of citation attributes: co-author names, the title
of the paper, and the title of the journal or proceeding. We illustrate
these two approaches on two types of data, one collected from the web,
mainly publication lists from homepages, the other collected from the
DBLP citation databases.
|
|
|
Hui Han, Hongyuan Zha, and C. Lee Giles.
Name disambiguation in author citations using a k-way spectral
clustering method.
In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2005.
An author may have multiple names and multiple authors may share the
same name simply due to name abbreviations, identical names, or name
misspellings in publications or bibliographies (citations). This can
produce name ambiguity which can affect the performance of document
retrieval, web search, and database integration, and may cause improper
attribution of credit. Proposed here is an unsupervised learning approach
using K-way spectral clustering that disambiguates authors in citations.
The approach utilizes three types of citation attributes: co-author names,
paper titles, and publication venue titles. The approach is illustrated
with 16 name datasets with citations collected from the DBLP database
bibliography and author home pages and shows that name disambiguation can
be achieved using these citation attributes.
|
|
|
Handango, Inc.
http://www.handango.com.
|
|
|
Wei-Hao Lin and Alex Hauptmann.
A wearable digital library of personal conversations.
In Proceedings of the Second ACM/IEEE-CS Joint Conference on
Digital Libraries, 2002.
We have developed a wearable, personalized digital library system,
which unobtrusively records the wearer's part of a conversation, recognizes the
face of the current dialog partner and remembers his/her voice. The next time
the system sees the same person's face and hears the same voice, it can replay
the audio from the last conversation in compressed form summarizing the names
and major issues mentioned. Experiments with a prototype system show that a
combination of face recognition and speaker identification can be effective
for retrieving conversations.
|
|
|
Masanori Harada, Shin-ya Sato, and Kazuhiro Kazama.
Finding authoritative people from the web.
In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2004.
Today's web is so huge and diverse
that it arguably reflects the real world. For this reason, searching the web is
a promising approach to find things in the real world. This paper presents
NEXAS, an extension to web search engines that attempts to
find real-world entities relevant to a topic. Its basic idea is to
extract proper names from the web pages retrieved for the topic. A main
advantage of this approach is that users can query any topic and
learn about relevant real-world entities without dedicated databases for the topic.
In particular, we focus on an application for finding
authoritative people from the web. This application is practically
important because once personal names are obtained, they can
lead users from the web to managed information stored in digital libraries.
To explore effective ways of finding
people, we first examine the distribution of Japanese personal names by
analyzing about 50 million Japanese web pages. We observe that personal
names appear frequently on the web, but the distribution is
highly influenced by automatically generated texts. To remedy the
bias and find widely acknowledged people accurately, we utilize
the number of servers containing a name instead of the number of web pages.
We show its effectiveness by
an experiment covering a wide range of topics. Finally, we demonstrate
two examples and discuss possible applications.
|
|
|
Susumu Harada, Mor Naaman, Yee Jiun Song, QianYing Wang, and Andreas Paepcke.
Lost in memories: Interacting with photo collections on PDAs.
In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2004.
We developed two browsers to support large
personal photo collections on PDAs. Our first browser is based on a
traditional, folder-based layout that utilizes either the user's
manually created organization structure, or a system-generated
structure. Our second browser uses a novel interface that is based on a
vertical, zoomable timeline. This timeline browser does not require
users to organize their photos, but instead, relies solely on
system-generated structure. Our system creates a hierarchical structure
of the user's photos by applying time-based clustering to identify
subsets of photos that are likely to be related. In a user experiment,
we compared users' searching and browsing performance across these
browsers, using each user's own photo collection. Photo collection
sizes varied between 500 and 3000 photographs. Our results show that
our timeline browser is at least as effective for searching and
browsing tasks as a traditional browser that requires users to manually
organize their photos.
|
|
|
Darren R. Hardy, Michael F. Schwartz, and Duane Wessels.
Harvest user's manual, January 1996.
Accessible at
http://harvest.transarc.com/afs/transarc.com/public/trg/Harvest/user-manual.
|
|
|
Donna Harman.
Document detection overview.
In Proceedings TIPSTER Text Program (Phase I), Fredericksburg,
Va., September 1993. Morgan Kaufmann, San Francisco, Calif.
|
|
|
David J Harper, Sara Coluthard, Raja Kalpana, and Sun Yixing.
A language modelling approach to relevance profiling for document
browsing.
In Proceedings of the Second ACM/IEEE-CS Joint Conference on
Digital Libraries, 2002.
This paper describes a novel tool, SmartSkim, for content-based
browsing or skimming of documents. The tool integrates concepts from passage
retrieval and from interfaces, such as TileBars, which provide a compact
overview of query term hits within a document. We base our tool on the concept
of relevance profiling, in which a plot of retrieval status values at each word
position of a document is generated. A major contribution of this paper is
applying language modelling to the task of relevance profiling. We describe
in detail the design of the SmartSkim tool, and provide a critique of the
design. Possible applications of the tool are described, and we consider how
an operational version of SmartSkim might be architected.
|
|
|
Terry L. Harrison, Michael L. Nelson, and Mohammad Zubair.
The Dienst-OAI gateway [short paper].
In Proceedings of the Third ACM/IEEE-CS Joint Conference on
Digital Libraries, 2003.
Though the Open Archives Initiative Protocol for Metadata Harvesting
(OAI-PMH) is becoming the de facto standard for digital libraries, some of its
predecessors are still in use. Although a limited number of Dienst repositories continue
to be populated, others are precariously unsupported. The Dienst Open Archive Gateway
(DOG) is a gateway between the OAI-PMH and the Dienst (version 4.1) protocol. DOG allows
OAI-PMH harvesters to extract metadata records (in RFC-1807 or Dublin Core) from Dienst
servers.
|
|
|
Scott W. Hassan and Andreas Paepcke.
Stanford Digital Library Interoperability Protocol.
Technical Report SIDL-WP-1997-0054; 1997-73, Stanford University,
1997.
Accessible at http://dbpubs.stanford.edu/pub/1997-73.
|
|
|
Franz J. Hauck.
Supporting hierarchical guided tours in the world wide web.
In Proceedings of the Fifth International World-Wide Web
Conference, 1996.
|
|
|
Alex Hauptmann, Rong Jin, and Tobun Dorbin Ng.
Multi-modal information retrieval from broadcast video using OCR and
speech recognition.
In Proceedings of the Second ACM/IEEE-CS Joint Conference on
Digital Libraries, 2002.
We examine multi-modal information retrieval from broadcast video
where text can be read on the screen through OCR and speech recognition can be
performed on the audio track. OCR and speech recognition are compared on the
2001 TREC Video Retrieval evaluation corpus. Results show that OCR is more
important than speech recognition for video retrieval. OCR retrieval can
further improve through dictionary-based post-processing. We demonstrate how
to utilize imperfect multi-modal metadata results to benefit multi-modal
information retrieval.
|
|
|
Alex Hauptmann and Norman Papernick.
Video-cuebik: Adapting image search to video shots.
In Proceedings of the Second ACM/IEEE-CS Joint Conference on
Digital Libraries, 2002.
We propose a new analysis for searching images in video libraries
that goes beyond simple image search, which compares one still image frame to
another. The key idea is to expand the definition of an image to account for
the variability in the sequence of video frames that comprise a shot. A first
implementation of this method for a QBIC-like image search engine shows a clear
improvement over still image search. A combination of the traditional still
image search and the new video image search provided the overall best results
on the TREC video retrieval evaluation data.
|
|
|
Alexander G. Hauptmann, Michael J. Witbrock, and Michael G. Christel.
News-on-demand: An application of Informedia technology.
D-Lib Magazine, Sep 1995.
Format: HTML Document().
|
|
|
Taher Haveliwala.
Efficient computation of PageRank.
Technical Report 1999-31, Database Group, Computer Science
Department, Stanford University, February 1999.
Available at http://dbpubs.stanford.edu/pub/1999-31.
|
|
|
Taher H. Haveliwala.
Topic-sensitive PageRank.
In WWW '02: Proceedings of the 11th international conference on
World Wide Web, pages 517-526, New York, NY, USA, 2002. ACM Press.
In the original PageRank algorithm for improving the
ranking of search-query results, a single PageRank
vector is computed, using the link structure of the
Web, to capture the relative importance of Web
pages, independent of any particular search
query. To yield more accurate search results, we
propose computing a set of PageRank vectors, biased
using a set of representative topics, to capture
more accurately the notion of importance with
respect to a particular topic. By using these
(precomputed) biased PageRank vectors to generate
query-specific importance scores for pages at query
time, we show that we can generate more accurate
rankings than with a single, generic PageRank
vector. For ordinary keyword search queries, we
compute the topic-sensitive PageRank scores for
pages satisfying the query using the topic of the
query keywords. For searches done in context (e.g.,
when the search query is performed by highlighting
words in a Web page), we compute the topic-sensitive
PageRank scores using the topic of the context in
which the query appeared.
|
|
|
D. Hawking and N. Craswell.
Overview of TREC-7 very large collection track.
In Proc. of the Seventh Text Retrieval Conf., pages 91-104,
November 1998.
|
|
|
David Hawking, Nick Craswell, Paul Thistlewaite, and Donna Harman.
Results and challenges in web search evaluation.
In Proceedings of the Eighth International World-Wide Web
Conference, 1999.
A frozen 18.5 million page snapshot of part
of the Web has been created to enable and
encourage meaningful and
reproducible evaluation of Web search
systems and techniques. This collection is
being used in an evaluation framework
within the Text Retrieval Conference (TREC)
and will hopefully provide convincing
answers to questions such as "Can
link information result in better
rankings?", "Do longer queries result in
better answers?", and "Do TREC systems
work well on Web data?" The snapshot and
associated evaluation methods are described
and an invitation is extended to
participate. Preliminary results are
presented for an effective comparison of
six TREC systems working on the snapshot
collection against five well-known Web
search systems working over the current Web.
These suggest that the standard of
document rankings produced by public Web
search engines is by no means state-of-the-art.
|
|
|
D. T. Hawkins and L. R. Levy.
Front end software for online database searching Part 1:
Definitions, system features, and evaluation.
Online, 9(6):30-37, November 1985.
|
|
|
Marti A. Hearst.
TileBars: Visualization of term distribution information in full text
information access.
In Proceedings of the Conference on Human Factors in Computing
Systems CHI'95, 1995.
The field of information retrieval has traditionally
focused on textbases consisting of titles and
abstracts. As a consequence, many underlying
assumptions must be altered for retrieval from
full-length text collections. This paper argues for
making use of text structure when retrieving from full
text documents, and presents a visualization paradigm,
called TileBars, that demonstrates the usefulness of
explicit term distribution information in Boolean-type
queries. TileBars simultaneously and compactly indicate
relative document length, query term frequency, and
query term distribution. The patterns in a column of
TileBars can be quickly scanned and deciphered, aiding
users in making judgments about the potential relevance
of the retrieved documents.
|
|
|
Sandra Heiler.
Semantic interoperability.
ACM Computing Surveys, 27(2):271-273, June 1995.
Discusses the issues related to semantic
interoperability. The purposes are to indicate why
semantic interoperability is so hard to achieve, and
to suggest that repository technology can provide the
beginnings of help to make it easier.
|
|
|
Albert Henning.
Dynamic authoring and retrieval of textbook information: Dartext.
In DAGS '95, 1995.
Format: HTML Document (34K + pictures) .
Audience: Instructors, students, textbook authors and publishers.
References: 15.
Links: 6.
Relevance: Low-medium.
Abstract: A very broad but shallow description of the textbook
production business. Argues for a distributed author model, but with publishers
that still piece together textbooks from the contributions of instructors,
students, etc. CD-ROM versions in addition to on-line. Authors paid according
to their contribution. Briefly mentions administration, intellectual property
issues. Longer example of physics/engineering systems demo.
|
|
|
Henry H. Perritt, Jr.
Permission headers and contract law.
In IP Workshop Proceedings, 1994.
Format: HTML Document (71K).
Audience: Public policy, lawyers, and developers.
References: 47 notes.
Links: 0.
Relevance: Medium.
Abstract: Focusing primarily on intellectual property, this article
covers a lot of ground. Briefly describes the CNRI copyright management
project, argues for permission headers that can describe each of the various
protected rights (viewing, copying, preparing derivative works, etc.), along
with economic information. Describes whether digitally signed contracts
are likely to be legally enforceable (they probably are), and under what
circumstances electronic records are court-admissible (when they are
generated as a regular course of business, and there's no reason to doubt
them). Argues against general encryption as too expensive and inconsistent
with the open market of ideas. Seeks legal protection commensurate with the
value of a transaction.
|
|
|
Monika R. Henzinger, Allan Heydon, Michael Mitzenmacher, and Marc Najork.
Measuring index quality using random walks on the web.
In Proceedings of the Eighth International World-Wide Web
Conference, 1999.
Recent research has studied how to measure
the size of a search engine, in terms of the
number of pages indexed. In this
paper, we consider a different measure for
search engines, namely the quality of the
pages in a search engine index. We
provide a simple, effective algorithm for
approximating the quality of an index by
performing a random walk on the Web,
and we use this methodology to compare the
index quality of several major search
engines.
|
|
|
Ralf G. Herrtwich and Thomas Kaeppner.
Network computers - ubiquitous computing or dumb multimedia?
In Third International Symposium on Autonomous Decentralized
Systems. IEEE Computer Society Press, 1997.
Introduces the NC spec and discusses its chances.
|
|
|
M. Hersovici, M. Jacovi, Y. Maarek, D. Pelleg, M. Shtalhaim, and S. Ur.
The shark-search algorithm - an application: tailored web site
mapping.
In Proceedings of the 7th World Wide Web Conference, 1998.
This paper introduces the shark search algorithm, a refined
version of one of the first dynamic Web search algorithms, the fish search. The
shark-search has been embodied into a dynamic Web site mapping that
enables users to tailor Web maps to their interests. Preliminary
experiments show significant improvement over the original fish-search
algorithm.
|
|
|
Walter B. Hewlett and Eleanor Selfridge-Field, editors.
Melodic Similarity. Concepts, Procedures, and Applications.
MIT Press and Center for Computing in the Humanities (CCARH),
Stanford University, 1998.
|
|
|
Allan Heydon and Marc Najork.
Mercator: A scalable, extensible Web crawler.
World Wide Web, 2(4):219-229, December 1999.
|
|
|
Linda L. Hill, James Frew, and Qi Zheng.
Geographic names - the implementation of a gazetteer in a
georeferenced digital library.
CNRI D-Lib Magazine, January 1999.
|
|
|
W. Hill and J. Hollan.
History-enriched digital objects: Prototypes and policy issues.
The Information Society, 10(2), April-June 1994.
Recording on digital objects (e.g., reports, forms, contracts,
mail-order catalogs, source code, manual pages, email,
spreadsheets, menus) the interaction events that comprise their use
makes it possible on future occasions, when the objects are used
again, to display graphical abstractions of the accrued histories as
parts of the objects themselves. For example, co-authors of a
report can see stable and unstable sections (lines of text are
marked by recency of changes or amount of editing) and identify
who has written what and when. In the case of reading
documentation, a reader can see who else has previously read a
particular section of interest. While using a spreadsheet to refine a
budget, the count of edits per spreadsheet cell can be mapped onto
grayscale to give an impression of which budget numbers have
been reworked the most and least. Or in the context of learning
unfamiliar menu selections in a new piece of software, the menu
itself can depict the distribution statistics of colleagues' previous
menu selections in the same or similar contexts. There are many
existing computational devices that hint at the prospect of
history-enriched digital objects. Automatic change-bars, citation
indices, and download counts on computer bulletin boards are
examples. In fact, for the last thirteen years, members of our lab
have been able to request AP News articles by specifying a
minimum number of previous readers and thus easily retrieve
articles that colleagues have chosen to read.
|
|
|
W. Hill, L. Stead, M. Rosenstein, and G. Furnas.
Recommending and evaluating choices in a virtual community of use.
In Proceedings of the Conference on Human Factors in Computing
Systems CHI'95, New York, 1995. ACM.
When making a choice in the absence of decisive first-hand
knowledge, choosing as other like-minded, similarly-situated
people have successfully chosen in the past is a good
strategy, in effect using other people as filters and guides:
filters to strain out potentially bad choices and guides to
point out potentially good choices. Current human-computer
interfaces largely ignore the power of the social strategy.
For most choices within an interface, new users are left to
fend for themselves and if necessary, to pursue help outside
of the interface. We present a general history-of-use method
that automates a social method for informing choice and
report on how it fares in the context of a fielded test case:
the selection of videos from a large set. The positive
results show that communal history-of-use data can serve as a
powerful resource for use in interfaces.
|
|
|
Jun Hirai, Sriram Raghavan, Hector Garcia-Molina, and Andreas Paepcke.
Webbase: A repository of web pages.
In Proceedings of the Ninth International World-Wide Web
Conference, pages 277-293, May 2000.
Available at http://dbpubs.stanford.edu/pub/2000-51.
In this paper, we study the problem of constructing and
maintaining a large shared repository of web pages. We
discuss the unique characteristics of such a
repository, propose an architecture, and identify its
functional modules. We focus on the storage manager
module, and illustrate how traditional techniques for
storage and indexing can be tailored to meet the
requirements of a web repository. To evaluate design
alternatives, we also present experimental results from
a prototype repository called WebBase, that is
currently being developed at Stanford
University. Keywords: Repository, WebBase,
Architecture, Storage management
|
|
|
Steve Hitchcock, Les Carr, Zhuoan Jiao, Donna Bergmark, Wendy Hall, Carl
Lagoze, and Stevan Harnad.
Developing services for open eprint archives: Globalisation,
integration and the impact of links.
In Proceedings of the Fifth ACM International Conference on
Digital Libraries, 2000.
The rapid growth of scholarly information resources
available in electronic form and their organisation by digital
libraries is proving fertile ground for the development of
sophisticated new services, of which citation linking will be
one indispensable example. Many new projects, partnerships
and commercial agreements have been announced to build citation
linking applications. This paper describes the Open Citation
(OpCit) project, which will focus on linking papers held in
freely accessible eprint archives such as the Los Alamos physics
archives and other distributed archives, and which will build on the
work of the Open Archives initiative to make the data held in such
archives available to compliant services. The paper emphasises
the work of the project in the context of emerging digital library
information environments, explores how a range of new linking tools
might be combined and identifies ways in which different linking
applications might converge. Some early results of linked pages from
the OpCit project are reported.
|
|
|
E. Hjelmas and B. K. Low.
Face detection: a survey.
Computer Vision and Image Understanding, 83(3):236-274, September
2001.
|
|
|
Patrick Hochstenbach, Henry Jerez, and Herbert Van de Sompel.
The oai-pmh static repository and static repository gateway.
In Proceedings of the Third ACM/IEEE-CS Joint Conference on
Digital Libraries, 2003.
Although the OAI-PMH specification is focused on making it straightforward for
data providers to expose metadata, practice shows that in certain significant situations deployment
of OAI-PMH conformant repository software remains problematic. In this paper, we report on research
aimed at devising solutions to further lower the barrier to make metadata collections harvestable. We
provide an in-depth description of an approach in which a data provider makes a metadata collection
available as an XML file with a specific format (an OAI Static Repository), which is made OAI-PMH
harvestable through the intermediation of software (an OAI Static Repository Gateway) operated by a
third party. We describe the properties of both components, and provide insights into our experience with
an experimental implementation of a Gateway.
|
|
|
H. Ulrich Hoppe and Jian Zhao.
C-tori: An interface for cooperative database retrieval.
In Dimitris Karagiannis, editor, 5th International Conference,
DEXA '94, Database and Expert Systems Applications, Berlin, Germany, 1994.
Springer-Verlag.
C-TORI (Cooperative TORI), a cooperative version of
TORI (Task-Oriented Database Retrieval Interface),
is presented in this paper. It extends interactive
query formulation and result browsing by supporting
cooperation between multiple users. In the
cooperative environment, three basic additional
operations are provided: copying, merging and
coupling for three types of TORI objects (query
forms, result forms, and query history
windows). Cooperation with query forms allows end
users to jointly formulate queries; cooperation with
result forms supports users in jointly browsing
through results and in sharing retrieved data
without re-accessing the database; cooperative use
of query histories yields a specific mechanism to
share memory between users. The implementation is
based on the concept of shared UI objects as an
application-independent cooperation and
communication model.
|
|
|
Ikumi Horie, Kazunori Yamaguchi, and Kenji Kashiwabara.
Higher-order rank analysis for web structure.
In HYPERTEXT '05: Proceedings of the sixteenth ACM conference on
Hypertext and hypermedia, pages 98-106, New York, NY, USA, 2005. ACM Press.
In this paper, we propose a method for the structural
analysis of Web sites. The Web has become one of the
most widely used media for electronic information
because of its great flexibility. However, this
flexibility has led to complicated structures. A
structure that differs from the typical structures
in a Web site might confuse readers, thus reducing
the effectiveness of the site. A method for
detecting unusual structures would be useful for
identifying such structures so that their impact can
be studied and ways to improve Web site
effectiveness developed. We viewed the Web as a
directed graph, and introduced a higher-order rank
based on the non-well-founded set theory. We then
developed higher-order rank analysis for detecting
irregularities, defined as structures which differ
from the typical structure of a target site. To test
the effectiveness of our method, we applied it to
several Web sites in actual use, and succeeded in
identifying irregular structures in the sites.
|
|
|
Nancy A. Van House.
User needs assessment and evaluation for the uc berkeley electronic
environmental library project.
In Proceedings of the Second Annual Conference on the Theory and
Practice of Digital Libraries, 1995.
Format: HTML Document (28K).
Audience: HCI people, librarians.
References: 16.
Links: 3.
Relevance: Low-Medium.
Abstract: Starts off with a description of Berkeley's NSF/ARPA/NASA
project, focusing on environmental data, particularly water planning for
California. Diverse data types, bitmapped pages with OCR text. Describes the
properties of the task, and the users of the system. Talks about methods of
assessing users' needs, like interviews, observation, focus groups, etc. Claims
that most users' expectations are too low, so user input doesn't provide
appropriate goals.
|
|
|
Nancy A. Van House, Mark H. Butler, Virginia Ogle, and Lisa Schiff.
User-centered iterative design for digital libraries: The cypress
experience.
D-Lib Magazine, Feb 1996.
Format: HTML Document().
|
|
|
Nancy Van House.
Trust and epistemic communities in biodiversity data sharing.
In Proceedings of the Second ACM/IEEE-CS Joint Conference on
Digital Libraries, 2002.
All knowledge work is, in some sense, collaborative. Trust is a
key element of knowledge work: what we know depends largely on others. A better
understanding of the epistemic machineries of knowledge communities and
especially their practices of trust would be useful for designing effective
digital libraries. This paper discusses the concepts of communities of practice
and epistemic cultures, and their implication for design of digital libraries
that support data sharing, with particular reference to practices of trust and
credibility. It uses an empirical study of a biodiversity data system that
collects and distributes data from a variety of sources to illustrate the
implications of these concepts of knowledge communities for digital library
design and operation. It concludes that diversity and uncomfortable boundary
areas typify not only digital library user groups, but the design and operation
of digital libraries.
|
|
|
B.C. Housel and D.B. Lindquist.
Webexpress: A system for optimizing web browsing in a wireless
environment.
In Proceedings of the Second Annual International Conference
on Mobile Computing and Networking, pages 108-116, November 1996.
|
|
|
Sherry Hsi and Holly Fait.
From playful exhibits to lom: Lessons from building an exploratorium
digital library.
In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2005.
This paper describes several challenges that arise in designing a digital library for K-12 education audiences when using the Learning Object Metadata (LOM) standard. These problems were multiplied when attempting to catalog the wide variety of informal learning and teaching resources from our museum's ever-growing website and exhibit-based resource collections. This paper shares key challenges and early solutions for the creation of an educational metadata scheme based upon LOM, new vocabularies, and strategies for retrofitting existing informal learning science resources into learning objects.
|
|
|
Tony Hsieh, QianYing Wang, and Andreas Paepcke.
Piles across space - breaking the real-estate barrier on pdas.
Submitted for publication, 2005.
Available at http://dbpubs.stanford.edu/pub/2005-17.
We describe an implementation that has users `flick' notes,
images, audio, and video files into piles beyond the screen
of the PDA. This scheme allows the PDA user to keep information
close at hand without sacrificing valuable screen
real estate. It also obviates the need to browse complex file
trees during a working session. Multiple workspaces can be
maintained in persistent store. Each workspace preserves one
configuration of off-screen piles. The system allows multiple
PDA owners within ad hoc radio range to share off-screen
piles. They point out to each other where a shared pile is to
reside in space. Once established, all sharing partners may
add to the pile and see its contents. One application is to
support biodiversity researchers in the field, where they generate
data on their PDA and need to keep it organized until
they return to their field station. We conducted an experiment
where participants used our system with up to ten simultaneous
piles. Not only were they able to operate the application,
but they remembered the location of piles when placed in
different physical environments and when asked to recall the
locations several days after the experiment. We describe gender
differences that suggest particular design choices for the
system.
|
|
|
Forms in HTML documents - W3C HTML 4.01 recommendation.
http://www.w3.org/TR/html401/interact/forms.html.
|
|
|
Hypertext transfer protocol - HTTP/1.1.
ftp://ftp.isi.edu/in-notes/rfc2616.txt.
|
|
|
Michael J. Hu and Ye Jian.
Multimedia description framework (mdf) for content description of
audio/video documents.
In Proceedings of the Fourth ACM International Conference on
Digital Libraries, 1999.
MPEG is undertaking a new initiative to standardize content
description of audio and video data/documents. When it is
finalized in 2001, MPEG-7 is expected to provide standardized
description language and schemes for concise and unambiguous
content description of data/documents of complex media types.
Meanwhile, other meta-data or description schemes, such as
Dublin Core, XML, RDF, etc., are becoming popular in different
application domains. In this paper, we propose Multimedia
Description Framework (MDF), which is designated to
accommodate multiple description (meta-data) schemes, MPEG-7
and non-MPEG-7, into an integrated architecture. We will use
examples to show how MDF description makes use of combined
strength of different description schemes to enhance its expression
power and flexibility. We conclude the paper with discussion of
using MDF description of MPEG-7 Content Set to search/retrieve
required audio and video documents from the set utilizing an
MDF prototype system we have implemented.
|
|
|
Ning Hu and Roger B. Dannenberg.
A comparison of melodic database retrieval techniques using sung
queries.
In Proceedings of the Second ACM/IEEE-CS Joint Conference on
Digital Libraries, 2002.
Query-by-humming systems search a database of music for good
matches to a sung, hummed, or whistled melody. Errors in transcription and
variations in pitch and tempo can cause substantial mismatch between queries
and targets. Thus, algorithms for measuring melodic similarity in
query-by-humming systems should be robust. We compare several variations of
search algorithms in an effort to improve search precision. In particular, we
describe a new frame-based algorithm that significantly outperforms
note-by-note algorithms in tests using sung queries and a database of
MIDI-encoded music. Keywords: dynamic programming, melodic comparison, melodic
searching, Music Information Retrieval (MIR), sung query.
|
|
|
Yunhua Hu, Hang Li, Yunbo Cao, Dmitriy Meyerzon, and Qinghua Zheng.
Automatic extraction of titles from general documents using machine
learning.
In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2005.
In this paper, we propose a machine learning approach to title extraction from general documents. By general documents, we mean documents that do not belong to any specific genre, including presentations, book chapters, technical papers, brochures, reports, and letters. Previously, methods were proposed mainly for title extraction from research papers. It was not clear whether it is possible to conduct automatic title extraction from general documents. As a case study, we consider extraction from Office documents, including Word and PowerPoint. In our approach, we annotate titles in sample documents (for Word and PowerPoint respectively) and take them as training data, train machine learning models, and perform title extraction using the trained models. Our method is unique in that we mainly utilize format information such as font size as features in the models. It turns out that the use of format information is the key to a successful extraction from general documents. Precision and recall for title extraction from Word were 0.810 and 0.837 respectively, and precision and recall for title extraction from PowerPoint were 0.875 and 0.895 respectively in an experiment on intranet data. Other important new findings in this work include that we can train models in one domain and apply them to another domain, and more surprisingly we can even train models in one language and apply them to another language. Moreover, we can significantly improve search ranking results in document retrieval by using the extracted titles.
|
|
|
Mao Lin Huang, Peter Eades, and Robert F. Cohen.
Webofdav - navigating and visualizing the web on-line with animated
context swapping.
In Proceedings of the Seventh International World-Wide Web
Conference, 1998.
|
|
|
Yongqiang Huang and Hector Garcia-Molina.
Exactly-once semantics in a replicated messaging system.
Submitted for publication, 2000.
Available at http://dbpubs.stanford.edu/pub/2000-7.
|
|
|
Zan Huang, Wingyan Chung, Thian-Huat Ong, and Hsinchun Chen.
A graph-based recommender system for digital library.
In Proceedings of the Second ACM/IEEE-CS Joint Conference on
Digital Libraries, 2002.
Research shows that recommendations comprise a valuable service
for users of a digital library. While most existing recommender systems rely
either on a purely content-based approach or purely collaborative approach to
make recommendations, there is a need for digital libraries to use a
combination of both approaches (a hybrid approach) to improve recommendations.
In this paper, we report how we tested the idea of using a graph-based
recommender system that naturally combines the content-based and collaborative
approaches. Due to the similarity between our problem and a concept retrieval
task, a Hopfield net algorithm was used to exploit high-degree book-book,
user-user and book-user associations. Sample hold-out testing and preliminary
subject testing were conducted to evaluate the system, by which it was found
that the system gained improvement with respect to both precision and recall by
combining content-based and collaborative approaches. But no significant
improvement was observed by exploiting high-degree associations.
|
|
|
Zan Huang, Xin Li, and Hsinchun Chen.
Link prediction approach to collaborative filtering.
In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2005.
Recommender systems can provide valuable services in a digital library environment, as demonstrated by their commercial success in the book, movie, and music industries. One of the most successful recommendation algorithms is collaborative filtering, which explores the correlations within user-item interactions to infer user interests and preferences. However, the recommendation quality of collaborative filtering approaches is greatly limited by the data sparsity problem. To alleviate this problem we have previously proposed graph-based algorithms that explore transitive user-item associations. In this paper, we extend from the idea of analyzing user-item interactions as graphs and employ link prediction approaches proposed in the recent network modeling literature for making collaborative filtering recommendations. We have adapted a wide range of linkage measures for making recommendations. Our preliminary experimental results based on a book recommendation dataset show that some of these measures achieved significantly better performance than standard collaborative filtering algorithms.
|
|
|
Bernardo A. Huberman and Lada A. Adamic.
Growth dynamics of the World-Wide Web.
Nature, 401(6749), September 1999.
|
|
|
Scott B. Huffman and David Steier.
Heuristic joins to integrate structured heterogeneous data.
In AAAI Spring Symposium on Information Gathering, 1995.
Format: Compressed PostScript().
|
|
|
M.N. Huhns and M.P. Singh.
Automating workflows for service provisioning: Integrating AI and
database technologies.
In Proceedings of the Tenth Conference on Artificial
Intelligence for Applications, Los Alamitos, CA, 1994. IEEE Computer Society
Press.
Workflows are the structured activities that take place
in information systems in typical business
environments. These activities frequently involve
several database systems, user interfaces, and
application programs. Traditional database systems
do not support workflows to any reasonable
extent. Usually, human beings must intervene to
ensure their proper execution. We have developed an
architecture based on AI technology that
automatically manages workflows. This architecture
executes on top of a distributed computing
environment. It has been applied to automating
service provisioning workflows; an implementation
that operates on one such workflow has been
developed. This work advances the Camel Project's
goal of developing technologies for integrating
heterogeneous database systems. It is notable in its
marriage of AI approaches with standard distributed
database techniques.
|
|
|
Jonathan T. Hujsak.
Digital libraries and corporate technology reuse.
D-Lib Magazine, Jan 1996.
Format: HTML Document().
|
|
|
David A. Hull and Gregory Grefenstette.
Querying across languages: A dictionary-based approach to
multilingual information retrieval.
In Proceedings of the Nineteenth Annual International ACM SIGIR
Conference on Research and Development in Information Retrieval, 1996.
This paper presents cross-language multilingual
information retrieval using translated queries and a
bilingual transfer dictionary. The experiments show
that multilingual IR is feasible, although performance
lags considerably behind the monolingual standard.
|
|
|
James J. Hunt, Kiem-Phong Vo, and Walter F. Tichy.
Delta algorithms: An empirical analysis.
ACM Transactions on Software Engineering and Methodology,
7:192-214, 1998.
|
|
|
Jane Hunter and Sharmin Choudhury.
A semi-automated digital preservation system based on semantic web
services.
In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2004.
This paper describes a Web-services-based system
which we have developed to enable organizations to semi-automatically
preserve their digital collections by dynamically discovering and
invoking the most appropriate preservation service, as it is required.
By periodically comparing preservation metadata for digital objects in
a collection with a software version registry, potential object
obsolescence can be detected and a notification message sent to the
relevant agent. By making preservation software modules available as
Web services and describing them semantically using a
machine-processable ontology (OWL-S), the most appropriate preservation
service(s) for each object can then be automatically discovered,
composed and invoked by software agents (with optional human input at
critical decision-making steps). We believe that this approach
represents a significant advance towards providing a viable,
cost-effective solution to the long term preservation of large-scale
collections of digital objects.
|
|
|
W.J. Hutchins and H.L. Somers.
An Introduction to Machine Translation.
Academic Press, 1992.
Recent textbook on natural language translation.
|
|
|
Arwen Hutt and Jenn Riley.
Semantics and syntax of dublin core usage in open archives initiative
data providers of cultural heritage materials.
In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2005.
This study analyzes metadata shared by cultural heritage institutions via the Open Archives Initiative Protocol for Metadata Harvesting. The syntax and semantics of metadata appearing in the Dublin Core fields creator, contributor, and date are examined. Preliminary conclusions are drawn regarding the effectiveness of Dublin Core in the Open Archives Initiative environment for cultural heritage materials.
|
|
|
Jason J. Hyon and Rosana Bisciotti Borgen.
Data archival and retrieval enhancement (DARE) metadata modeling
and its user interface.
In Proceedings of the First IEEE Metadata Conference, Silver
Spring, Md., April 1996. IEEE.
|
|
|
Ionut E. Iacob and Alex Dekhtyar.
xtagger: a new approach to authoring document-centric xml.
In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2005.
The process of authoring document-centric XML documents in humanities disciplines is very different from the approach espoused by the standard XML editing software with the data-centric view of XML. Where data-centric XML is generated by first describing a tree structure of the encoding and then providing the content for the leaf elements, document-centric encodings start with content which is then marked up. In the paper we describe our approach to authoring document-centric XML documents and the tool, xTagger, originally developed for this purpose within the Electronic Boethius project and subsequently enhanced within the ARCHway project, an interdisciplinary project devoted to the development of methods and software for preparation of image-based electronic editions of historic manuscripts.
|
|
|
Frank M. Shipman III, Haowei Hsieh, J. Michael Moore, and Anna Zacchi.
Supporting personal collections across digital libraries in spatial
hypertext.
In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2004.
Creating, maintaining, or using a digital library
requires the manipulation of digital documents. Information workspaces
provide a visual representation allowing users to collect, organize,
annotate, and author information. The Visual Knowledge Builder (VKB)
helps users access, collect, annotate, and combine materials from
digital libraries and other sources into a personal information
workspace. VKB has been enhanced to include direct search interfaces
for NSDL and Google. Users create a visualization of search results
while selecting and organizing materials for their current activity.
Additionally, metadata applicators have been added to VKB. This
interface allows the rapid addition of metadata to documents and aids
the user in the extraction of existing metadata for application to
other documents. A study was performed to compare the selection and
organization of documents in VKB to the commonly used tools of a Web
browser and a word processor. This study shows the value of visual
workspaces for such effort but points to the need for sub-document
level objects, ephemeral visualizations, and support for moving from
visual representations to metadata.
|
|
|
The URN Implementors.
Uniform resource names: A progress report.
D-Lib Magazine, Feb 1996.
Format: HTML Document().
|
|
|
Palm Inc.
Palm os emulator.
Palm website: http://www.palm.com/dev/tech/tools/emulator/.
|
|
|
Palm Inc.
Web clipping development.
Palm website: http://www.palm.com/dev/tech/webclipping/.
|
|
|
The internet archive.
http://www.archive.org/.
|
|
|
Invisibleweb.com.
http://www.invisibleweb.com.
|
|
|
Panagiotis G. Ipeirotis, Tom Barry, and Luis Gravano.
Extending sdarts: Extracting metadata from web databases and
interfacing with the open archives initiative.
In Proceedings of the Second ACM/IEEE-CS Joint Conference on
Digital Libraries, 2002.
SDARTS is a protocol and toolkit designed to facilitate
metasearching. SDARTS combines two complementary existing protocols, SDLIP and
STARTS, to define a uniform interface that collections should support for
searching and exporting metasearch-related metadata. SDARTS also includes a
toolkit with wrappers that are easily customized to make both local and remote
document collections SDARTS-compliant. This paper describes two significant
ways in which we have extended the SDARTS toolkit. First, we have added a tool
that automatically builds rich content summaries for remote web collections by
probing the collections with appropriate queries. These content summaries can
then be used by a metasearcher to select over which collections to evaluate a
given query. Second, we have enhanced the SDARTS toolkit so that all
SDARTS-compliant collections export their metadata under the emerging Open
Archives Initiative (OAI) protocol. Conversely, the SDARTS toolkit now also
allows all OAI-compliant collections to be made SDARTS-compliant with minimal
effort. As a result, we implemented a bridge between SDARTS and OAI, which will
facilitate easy interoperability among a potentially large number of
collections. The SDARTS toolkit, with all related documentation and source
code, is publicly available at http://sdarts.cs.columbia.edu.
|
|
|
ISO.
ISO 8777:1993 Information and Documentation - Commands for
Interactive Text Searching.
Int'l Organization for Standardization, Geneva, Switzerland, first
edition, 1993.
|
|
|
ISO/IEC.
ITU/ISO ODP Trading Function, 1997.
ISO/IEC IS 13235-1, ITU/T Draft Rec X950-1.
|
|
|
Melody Y. Ivory and Marti A. Hearst.
Statistical profiles of highly-rated web sites.
In CHI '02: Proceedings of the SIGCHI conference on Human
factors in computing systems, pages 367-374, New York, NY, USA, 2002. ACM
Press.
We are creating an interactive tool to help
non-professional web site builders create high
quality designs. We have previously reported that
quantitative measures of web page structure can
predict whether a site will be highly or poorly
rated by experts, with accuracies ranging from
67-80%. In this paper we extend that work in several ways. First, we compute a much larger set of
measures (157 versus 11), over a much larger
collection of pages (5300 vs. 1900), achieving much
higher overall accuracy (94%) when contrasting good, average, and poor pages. Second,
we introduce new classes of measures that can make
assessments at the site level and according to page
type (home page, content page, etc.). Finally, we
create statistical profiles of good sites, and apply
them to an existing design, showing how that design
can be changed to better match high-quality designs.
|
|
|
Jack Kustanowitz and Ben Shneiderman.
Meaningful presentations of photo libraries: Rationale and
applications of bi-level radial quantum layouts.
In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2005.
|
|
|
A. K. Jain, M. N. Murty, and P. J. Flynn.
Data clustering: a review.
ACM Computing Surveys, 31(3):264-323, 1999.
|
|
|
Markus Jakobsson, Philip D. MacKenzie, and Julien P. Stern.
Secure and lightweight advertising on the web.
In Proceedings of the Eighth International World-Wide Web
Conference, 1999.
We consider how to obtain a safe and efficient
scheme for Web advertising. We introduce to
cryptography the market model, a common concept from economics.
This corresponds to an assumption of rational
behavior of protocol participants. Making this assumption allows us
to design schemes that are highly efficient in
the common case - which is, when participants behave rationally. We
demonstrate such a scheme for Web advertising,
employing the concept of e-coupons. We prove that our proposed scheme
is safe and meets our stringent security requirements.
|
|
|
Frankie James.
Presenting html structure in audio: User satisfaction with audio
hypertext.
In ICAD '96 Proceedings, pages 97-103. ICAD, Xerox PARC,
November 1996.
Also appeared in Web Techniques entitled Experimenting with Audio
Interfaces, February 1998, vol 3 no 2, pages 55-58.
|
|
|
Frankie James.
Aha: Audio html access.
In Michael R. Genesereth and Anna Patterson, editors,
Proceedings of the Sixth International World Wide Web Conference, pages
129-140, Santa Clara, CA, April 1997. IW3C.
Published in Computer Networks and ISDN Systems, vol. 29, no. 8-13,
pp. 1395-1404, Elsevier, September 1997 (ISSN 0169-7552).
|
|
|
Frankie James.
Distinguishability vs. distraction in audio html interfaces.
Submitted to International Journal on Digital Libraries,
1997.
Analyzes results from a user study related to the AHA
(Audio HTML Access) framework, which tested three
audio browsers to determine the appropriateness of
certain types of audio markings for various HTML
structures. The results added another dimension to the
AHA framework, so that the principles outlined in it
for choosing sounds to use in an audio presentation of
HTML are now: (1) Vocal Source Identity (when to use
speaker changes to mark structures), (2)
Recognizability, and (3) Distraction (new).
|
|
|
Frankie James.
Presenting html structure in audio: User satisfaction with audio
hypertext.
CSLI Technical Report 97-201, Stanford University, 1997.
Available at http://dbpubs.stanford.edu/pub/1996-83.
|
|
|
Frankie James.
Lessons from developing audio html interfaces.
ACM SIGCAPH Conference on Assistive Technologies Proceedings of
the third international ACM conference on Assistive technologies. April
15-17, 1998, Marina del Rey, CA, USA, pages 27-34, 1998.
Discusses application of the principles in the AHA framework to
the actual choice of sounds in scenario interfaces. By
looking at scenarios, we can see that other factors
related to users (such as musical ability, culture,
reading style, etc.) are needed in combination with the
AHA principles to select specific sounds.
|
|
|
Frankie James.
Lessons from developing audio html interfaces.
In ASSETS 98, pages 27-34, Marina del Rey, CA, April 1998. ACM
SIGCAPH.
|
|
|
Greg Janée and James Frew.
The adept digital library architecture.
In Proceedings of the Second ACM/IEEE-CS Joint Conference on
Digital Libraries, 2002.
The Alexandria Digital Earth ProtoType (ADEPT) architecture is a
framework for building distributed digital libraries of georeferenced
information.
An ADEPT system comprises one or more autonomous libraries, each of which
provides a uniform interface to one or more collections, each of which manages
metadata for one or more items. The primary standard on which the architecture
is based is the ADEPT bucket framework, which defines uniform client-level
metadata query services that are compatible with heterogeneous underlying
collections. ADEPT functionality strikes a balance between the simplicity of
Web document delivery and the richness of Z39.50. The current ADEPT
implementation runs as servlet-based middleware and supports collections
housed in arbitrary relational databases.
|
|
|
Greg Janée, James Frew, and David Valentine.
Content access characterization in digital libraries [short paper].
In Proceedings of the Third ACM/IEEE-CS Joint Conference on
Digital Libraries, 2003.
To support non-trivial clients, such as data exploration and analysis
environments, digital libraries must be able to describe the access modes that their
contents support. We present a simple scheme that distinguishes four content accessibility
classes: download (byte-stream retrieval), service (API), web interface (interactive),
and alternative (semantically equivalent) or multipart (component) hierarchies. This
scheme is simple enough to be easily supported by DL content providers, yet rich enough
to allow programmatic clients to automatically identify appropriate access point(s).
|
|
|
William C. Janssen.
Collaborative extensions for the uplib system.
In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2004.
The UpLib personal digital library system is
specifically designed for secure use by a single individual. However,
collaborative operation of multiple UpLib repositories is still
possible. This paper describes two mechanisms that have been added to
UpLib to facilitate community building around individual document
collections.
|
|
|
Charlotte Jenkins, Mike Jackson, Peter Burden, and Jon Wallis.
Automatic rdf metadata generation for resource discovery.
In Proceedings of the Eighth International World-Wide Web
Conference, 1999.
Automatic metadata generation may provide a
solution to the problem of inconsistent,
unreliable metadata describing
resources on the Web. The Resource
Description Framework (RDF) provides a
domain-neutral foundation on which
extensible element sets can be defined and
expressed in a standard notation. This paper
describes how an automatic
classifier that classifies HTML documents
according to Dewey Decimal Classification
can be used to extract
context-sensitive metadata, which is then represented
using RDF. The process of automatic
classification is described and an
appropriate metadata element set is
identified comprising those elements that
can be extracted during classification. An
RDF data model and an RDF schema are defined
representing the element set and the
classifier is configured to output
the elements in RDF syntax according to the
defined schema.
|
|
|
Michael Jensen.
Need-based intellectual property protection and networked university
press publishing.
In IP Workshop Proceedings, 1994.
Format: HTML Document (22K).
Audience: Publishers, slightly technical.
References: 0.
Links: 0.
Relevance: Low-medium.
Abstract: A publisher's view on why publishers won't be irrelevant.
Also, a description of the type of security that a publisher would expect
(i.e., let's not worry about making it perfect, just reasonable):
a header-based system, without details of how to ensure the restrictions
are obeyed. Gives a specific list of information that he thinks should be in
the header.
|
|
|
B-S. Jeong and E. Omiecinski.
Inverted file partitioning schemes in multiple disk systems.
IEEE Transactions on Parallel and Distributed Systems,
6(2):142-153, February 1995.
|
|
|
Henry N. Jerez, Xiaoming Liu, Patrick Hochstenbach, and Herbert Van de Sompel.
The multi-faceted use of the oai-pmh in the lanl repository.
In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on
Digital Libraries, 2004.
This paper focuses on the multifaceted use of the
OAI-PMH in a repository architecture designed to store digital assets
at the Research Library of the Los Alamos National Laboratory (LANL),
and to make the stored assets available in a uniform way to various
downstream applications. In the architecture, the MPEG-21 Digital Item
Declaration Language is used as the XML-based format to represent
complex digital objects. Upon ingestion, these objects are stored in a
multitude of autonomous OAI-PMH repositories. An OAI-PMH compliant
Repository Index keeps track of the creation and location of all those
repositories, whereas an Identifier Resolver keeps track of the
location of individual objects. An OAI-PMH Federator is introduced as a
single-point-of-access to downstream harvesters. It hides the
complexity of the environment to those harvesters, and allows them to
obtain transformations of stored objects. While the proposed
architecture is described in the context of the LANL library, the paper
will also touch on its more general applicability.
|
|
|
T. Joachims, D. Freitag, and T. Mitchell.
Webwatcher: A tour guide for the world wide web.
In Proceedings of IJCAI97, 1997.
We describe WebWatcher as a tour guide agent
for the web, the learning algorithms used by WebWatcher,
experimental results based on learning from thousands of
users, and lessons learned from this case study of tour guide
agents.
|
|
|
Robert Johansen.
Teleconferencing and beyond : communications in the office of
the future.
McGraw-Hill data communications book series. McGraw-Hill, 1984.
|
|
|
David B. Johnson and Willy E. Zwaenepoel.
Recovery in distributed systems using optimistic message logging and
checkpointing.
Journal of Algorithms, 11(3):462-491, September 1990.
|
|
|
Eric H. Johnson and Pauline A. Cochrane.
A hypertextual interface for a searcher's thesaurus.
In Proceedings of the Second Annual Conference on the Theory and
Practice of Digital Libraries, 1995.
Format: HTML Document (34K + pictures).
Audience: Searchers, HCI people.
References: 10.
Links: 2.
Relevance: Low.
Abstract: Describes an MS Windows interface for searching using a
thesaurus of related terms. Has 3 parts: a hierarchical organization of the
terms, a cloud of related terms, and a keyword-in-context that tries to match
what you type incrementally. The cloud and hierarchy are point and click, and the
hierarchy can be expanded and collapsed à la MS Word outline mode. Also capable
of handling multi-hierarchies, where a term has multiple roots.
|
|
|
Heidi Johnson.
Graded access to sensitive materials at the archive of the indigenous
languages of latin america [short paper].
In Proceedings of the Third ACM/IEEE-CS Joint Conference on
Digital Libraries, 2003.
The Archive of the Indigenous Languages of Latin America (AILLA) is a
web-accessible repository of multi-media resources in and about the indigenous
languages of Latin America. In this paper, I describe the Graded Access System
developed at AILLA to protect sensitive materials by allowing resource producers -
academics and indigenous people - finely-grained control over the resources they
house in the archive.
|
|
|
Christopher B. Jones, R. Purves, A. Ruas, M. Sanderson, M. Sester, M. van
Kreveld, and R. Weibel.
Spatial information retrieval and geographical ontologies: an overview
of the spirit project.
In Proceedings of the 25th annual international ACM SIGIR
conference on Research and development in information retrieval, pages
387-388. ACM Press, 2002.
|
|
|
Matt Jones, Gary Marsden, Norliza Mohd-Nasir, Kevin Boone, and George Buchanan.
Improving web interaction on small displays.
In Proceedings of the Eighth International World-Wide Web
Conference, 1999.
Soon many people will retrieve information
from the Web using handheld, palmsized or even
smaller computers. Although these computers have dramatically
increased in sophistication, their display
size is - and will remain - much smaller than their conventional, desktop
counterparts. Currently, browsers for these
devices present Web pages without taking account of the very different
display capabilities. As part of a collaborative
project with Reuters, we carried out a study into the usability impact of small
displays for retrieval tasks. Users of the
small screen were 50% less effective in completing tasks than the large screen subjects. Small screen users used a very
substantial number of scroll activities in attempting to complete the tasks. Our
study also provided us with interesting insights into the shifts in approach
users seem to make when using a small screen device for retrieval. These results
suggest that the metaphors useful in a full screen desktop environment are not
the most appropriate for the new devices. Design guidelines are discussed here,
proposing directed access methods for effective small screen interaction. In our
ongoing work, we are developing such 'meta-interfaces', which will sit between
the small screen user and the 'conventional' Web page.
|
|
|
Michael L.W. Jones, Robert H. Rieger, Paul Treadwell, and Geri K. Gay.
Live from the stacks: User feedback on mobile computers and wireless
tools for library patrons.
In Proceedings of the Fifth ACM International Conference on
Digital Libraries, 2000.
Digital library research is made more robust and effective
when end-user opinions and viewpoints inform the research, design and
development process. A rich understanding of user tasks and contexts
is especially necessary when investigating the use of mobile computers
in traditional and digital library environments, since the nature and scope
of the research questions at hand remain relatively undefined. This
paper outlines findings from a library technologies user survey and on-site
mobile library access prototype testing, and presents future research
directions that can be derived from the results of these two studies.
|
|
|
Steve Jones and Gordon Paynter.
Topic-based browsing within a digital library using keyphrases.
In Proceedings of the Fourth ACM International Conference on
Digital Libraries, 1999.
Many digital libraries are comprised of documents from disparate
sources that are independent of the rest of the collection in which
they reside. A user's ability to explore is severely curtailed when
each document stands in isolation; there is no way to navigate to
other, related, documents, or even to tell if such documents exist.
We describe a method for automatically introducing topic-based
links into documents to support browsing in digital libraries.
Automatic keyphrase extraction is exploited to identify link
anchors, and keyphrase-based similarity measures are used to
select and rank destinations. Two implementations are described:
one that applies these techniques to existing WWW-based digital
library collections using standard HTML, and one that uses a
wider range of interface techniques to provide more sophisticated
linking capabilities. An evaluation shows that keyphrase-based
similarity measures work as well as a popular full-text retrieval
system for finding relevant destination documents.
|
|
|
Steve Jones and Gordon W. Paynter.
Human evaluation of kea, an automatic keyphrasing system.
In Proceedings of the First ACM/IEEE-CS Joint Conference on
Digital Libraries, 2001.
This paper describes an evaluation of the Kea automatic
keyphrase extraction algorithm. Tools that automatically identify
keyphrases are desirable because document keyphrases have numerous
applications in digital library systems, but are costly and time-consuming
to manually assign. Keyphrase extraction algorithms are usually evaluated
by comparison to author-specified keywords, but this methodology has several
well-known shortcomings. The results presented in this paper are based on
subjective evaluations of the quality and appropriateness of keyphrases by
human assessors, and make a number of contributions. First, they validate
previous evaluations of Kea that rely on author keywords. Second, they show
Kea's performance is comparable to that of similar systems that have been
evaluated by human assessors. Finally, they justify the use of author
keyphrases as a performance metric by showing that authors generally choose
good keywords.
|
|
|
Javaserver pages (jsp) technology.
http://java.sun.com/products/jsp/.
|
|
|
Jesper Juhne, Anders T. Jensen, and Kaj Gronbaek.
Ariadne: a java-based guided tour system for the world wide web.
In Proceedings of the Seventh International World-Wide Web
Conference, 1998.
|
|
|
Volker Jung.
Metaviz: Visual interaction with geospatial digital libraries.
Technical Report TR-99-017, International Computer Science Institute,
1999.
|
|
|
Eija Kaasinen, Matti Aaltonen, Juha Kolari, Suvi Melakoski, and Timo Laakko.
Two approaches to bringing internet services to wap devices.
In Proceedings of the Ninth International World-Wide Web
Conference, 2000.
|
|
|
Charles Kacmar, Susan Hruska, Chris Lacher, Dean Jue, Christie Koontz, Myke
Gluck, and Stuart Weibel.
An architecture and operation model for a spatial digital library.
In Proceedings of the First Annual Conference on the Theory and
Practice of Digital Libraries, 1994.
Format: HTML Document (32K).
Audience: Mostly non-technical, funders, Geographic IS people.
|