Digital Library Bibliography

	Kenneth R. Abbott and Sunil K. Sarin. Experiences with workflow management: Issues for the next generation. In Richard Furuta and Christine Neuwirth, editors, CSCW '94, New York, 1994. ACM. Workflow management is a technology that is considered strategically important by many businesses, and its market growth shows no signs of abating. It is, however, often viewed with skepticism by the research community, conjuring up visions of oppressed workers performing rigidly-defined tasks on an assembly line. Although the potential for abuse no doubt exists, workflow management can instead be used to help individuals manage their work and to provide a clear context for performing that work. A key challenge in the realization of this ideal is the reconciliation of workflow process models and software with the rich variety of activities and behaviors that comprise "real" work. Our experiences with the InConcert workflow management system are used as a basis for outlining several issues that will need to be addressed in meeting this challenge. This is intended as an invitation to CSCW researchers to influence this important technology in a constructive manner by drawing on research and experience.
	Tarek F. Abdelzaher and Nina Bhatti. Web content adaptation to improve server overload behavior. In Proceedings of the Eighth International World-Wide Web Conference, 1999. This paper presents a study of Web content adaptation to improve server overload performance, as well as an implementation of a Web content adaptation software prototype. When the request rate on a Web server increases beyond server capacity, the server becomes overloaded and unresponsive. The TCP listen queue of the server's socket overflows, exhibiting drop-tail behavior. As a result, clients experience service outages. Since clients typically issue multiple requests over the duration of a session with the server, and since requests are dropped indiscriminately, all clients connecting to the server at overload are likely to experience connection failures, even though there may be enough capacity on the server to deliver all responses properly for a subset of clients. In this paper, we propose to resolve the overload problem by adapting delivered content to load conditions. The premise is that successful delivery of less resource-intensive content under overload is more desirable to clients than connection rejection or failures.
	Serge Abiteboul, Sophie Cluet, and Tova Milo. Querying and updating the file. In Proceedings of the Nineteenth International Conference on Very Large Databases, pages 73-84, Dublin, Ireland, 1993. VLDB Endowment, Saratoga, Calif.
	Serge Abiteboul, Sophie Cluet, and Tova Milo. Correspondence and translation for heterogeneous data. In Proceedings of the 6th International Conference on Database Theory, Delphi, Greece, 1997. Springer, Berlin.
	Marc Abrams, Constantinos Phanouriou, Alan L. Batongbacal, Stephen M. Williams, and Jonathan E. Shuster. UIML: An appliance-independent XML user interface language. In Proceedings of the Eighth International World-Wide Web Conference, 1999. Today's Internet appliances feature user interface technologies almost unknown a few years ago: touch screens, styli, handwriting and voice recognition, speech synthesis, tiny screens, and more. This richness creates problems. First, different appliances use different languages: WML for cell phones; SpeechML, JSML, and VoxML for voice-enabled devices such as phones; HTML and XUL for desktop computers, and so on. Thus, developers must maintain multiple source code families to deploy interfaces to one information system on multiple appliances. Second, user interfaces differ dramatically in complexity (e.g., PC versus cell phone interfaces). Thus, developers must also manage interface content. Third, developers risk writing appliance-specific interfaces for an appliance that might not be on the market tomorrow. A solution is to build interfaces with a single, universal language free of assumptions about appliances and interface technology. This paper introduces such a language, the User Interface Markup Language (UIML), an XML-compliant language. UIML insulates the interface designer from the peculiarities of different appliances through style sheets. A measure of the power of UIML is that it can replace hand-coding of Java AWT or Swing user interfaces.
	Mark S. Ackerman. Providing social interaction in the digital library. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document (12K). Audience: Non-technical, digital library researchers/funders. References: 13. Links: 2. Relevance: Low-medium. Abstract: Argues that social aspects of collaboration must be included in a digital library to capture the informal, organizational knowledge that is not always available in information sources. Mentions a Tcl-based system called CAFE that adds messaging, bulletin-board, and talk functionality.
	Mark S. Ackerman and Roy T. Fielding. Collection maintenance in the digital library. In Proceedings of the Second Annual Conference on the Theory and Practice of Digital Libraries, 1995. Format: HTML Document (39K + pictures). Audience: Librarians, web masters. References: 27. Links: 2. Relevance: Low. Abstract: Discusses the problem of collection maintenance in the digital domain, and argues that while some traditional practices will carry over, new methods will have to be created, especially for dynamic and informal resources. Suggests that some maintenance can be done automatically by agents, and gives two examples: MOMSpider, which checks that links are still current, and Web:Lookout, which notifies the user when "interesting" changes are made to a watched page.
	Michael J. Ackerman. Accessing the visible human project. D-Lib Magazine, October 1995. Format: HTML Document (11K). Audience: Medical professionals. References: 1. Links: 5. Relevance: None. Abstract: Describes the Visible Human Project (1 mm cross sections of two cadavers), how to obtain the images, how large they are, and what IP agreements need to be signed.
	R. Acuff, L. Fagan, T. Rindfleisch, B. Levitt, and P. Ford. Lightweight, mobile e-mail for intra-clinic communication. In Proceedings of the 1997 AMIA Annual Fall Symposium, pages 729-33, Oct 1997.
	N. Adam, Y. Yesha, B. Awerbuch, K. Bennet, B. Blaustein, A. Brodsky, R. Chen, O. Dogramaci, B. Grossman, R. Holowczak, J. Johnson, K. Kalpakis, C. McCollum, A.-L. Neches, B. Neches, A. Rosenthal, J. Slonim, H. Wactlar, and O. Wolfson. Strategic directions in electronic commerce and digital libraries: towards a digital agora. ACM Computing Surveys, 28(4):818-35, December 1996. The paper examines the research requirements of electronic commerce and digital libraries in six key areas. It provides case studies that describe three electronic commerce research projects (USC-ISI, CommerceNet, First Virtual) and six digital library projects sponsored by an NSF/ARPA/NASA initiative. The paper focuses on the following common areas of EC and DL research: acquiring and storing information; finding and filtering information; securing information and auditing access; universal access; cost management and financial instruments; and socio-economic impact.
	Anne Adams and Ann Blandford. Digital libraries’ support for the user’s ‘information journey’. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. The temporal elements of users’ information requirements are a continually confounding aspect of digital library design. No sooner have users’ needs been identified and supported than they change. This paper evaluates the changing information requirements of users through their ‘information journey’ in two different domains (health and academia). In-depth analysis of findings from interviews, focus groups and observations of 150 users have identified three stages to this journey: information initiation, facilitation (or gathering) and interpretation. The study shows that, although digital libraries are supporting aspects of users’ information facilitation, there are still requirements for them to better support users’ overall information work in context. Users are poorly supported in the initiation phase, as they recognize their information needs, especially with regard to resource awareness; in this context, interactive press-alerts are discussed. Some users (especially clinicians and patients) also required support in the interpretation of information, both satisfying themselves that the information is trustworthy and understanding what it means for a particular individual.
	Eytan Adar and Jeremy Hylton. On-the-fly hyperlink creation for page images. In Proceedings of the Second Annual Conference on the Theory and Practice of Digital Libraries, 1995. Format: HTML Document. Audience: Digital library researchers. References: 9. Links: 0. Relevance: Low. Abstract: Stores pages as bitmaps and, when a user clicks on a citation, retrieves it by running OCR, passing the relevant line to the library catalog as 12 queries of 3 words each (randomly selected from the line), and returning the best-scoring results. Somewhat robust to typos in citations, and not too slow.
	Paul S. Adler and Terry Winograd, editors. Usability: turning technologies into tools. Oxford University Press, 1992.
	Eugene Agichtein and Luis Gravano. Snowball: Extracting relations from large plain-text collections. In Proceedings of the Fifth ACM International Conference on Digital Libraries, 2000. Text documents often contain valuable structured data that is hidden in regular English sentences. This data is best exploited if available as a relational table that we could use for answering precise queries or for running data mining tasks. We explore a technique for extracting such tables from document collections that requires only a handful of training examples from users. These examples are used to generate extraction patterns, which in turn result in new tuples being extracted from the document collection. We build on this idea and present our Snowball system. Snowball introduces novel strategies for generating patterns and extracting tuples from plain-text documents. At each iteration of the extraction process, Snowball evaluates the quality of these patterns and tuples without human intervention, and keeps only the most reliable ones for the next iteration. In this paper we also develop a scalable evaluation methodology and metrics for our task, and present a thorough experimental evaluation of Snowball and comparable techniques over a collection of more than 300,000 newspaper documents.
	Maristella Agosti, Nicola Ferro, and Nicola Orio. Annotating illuminated manuscripts: an effective tool for research and education. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. The aim of this paper is to report the research results of an ongoing project that deals with the exploitation of a digital archive of drawings and illustrations of historic documents for research and education purposes. According to the results of a study of user requirements, we designed tools to provide researchers with novel ways of accessing the digital manuscripts, sharing, and transferring knowledge in a collaborative environment. Annotations are proposed for making explicit the results of scientific research on the relationships between images belonging to manuscripts produced over a span of centuries. For this purpose, a taxonomy for linking annotations is proposed, together with a conceptual schema for representing annotations and for linking them to digital objects.
	Rakesh Agrawal, Tomasz Imielinski, and Arun Swami. Mining association rules between sets of items in large databases. In Proceedings of the International Conference on Management of Data, pages 207-216. ACM Press, 1993.
	Alfred Aho, John Hopcroft, and Jeffrey Ullman. Data Structures and Algorithms. Addison-Wesley, 1983.
	T. Alanko, M. Kojo, M. Liljeberg, and K. Raatikainen. Mowgli: improvements for internet applications using slow wireless links. In Waves of the Year 2000+ PIMRC '97. The 8th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications. Technical Program, Proceedings (Cat. No.97TH8271), volume 3, pages 1038-42, 1997. Modern cellular telephone systems extend the usability of portable personal computers enormously. A nomadic user can be given ubiquitous access to remote information stores and computing services. However, the behavior of wireless links creates severe inconveniences within the traditional data communication paradigm. We give an overview of the problems related to wireless mobility. We also present a new software architecture for mastering the problems and discuss a new paradigm for designing mobile distributed applications. The key idea in the architecture is to place a mediator, a distributed intelligent agent, between the mobile node and the wireline network.
	Reka Albert, Albert-Laszlo Barabasi, and Hawoong Jeong. Diameter of the World Wide Web. Nature, 401(6749), September 1999.
	Alexa Internet Inc. http://www.alexa.com.
	R. B. Allen. Interface issues for interactive multimedia documents. In Advances in Digital Libraries '95, 1995. Format: Not Yet Online.
	Robert B. Allen. Navigating and searching in hierarchical digital library catalogs. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document (21K). Audience: Non-technical, users. References: 15. Links: 2. Relevance: Low. Abstract: Describes a particular user interface based on a bookshelf metaphor. Uses an a priori classification (the Dewey Decimal System) as an organizational tool, in addition to the results of electronic searches.
	Robert B. Allen. Two digital library interfaces which exploit hierarchical structure. In DAGS '95, 1995. Format: HTML Document (33K + pictures). Audience: General computer scientists, HCI. References: 22. Links: 1. Relevance: Low-Medium. Abstract: Uses the metaphor of the hierarchical Dewey Decimal System or the faceted (implying a DAG) ACM literature categories to aid the UI. Shows graphically where in the hierarchy hits were found for a search.
	Robert B. Allen. A query interface for an event gazetteer. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. We introduce the idea of an "event gazetteer" that stores and presents locations in time. Each event is coded as a schema with attributes of event type, location, actor, and beginning and ending times. Sets of events can be collected as timelines and the events on these timelines can be linked by annotations. The system has been built with JSP and Oracle. Systematic metadata is essential for effective interaction with this system. For instance, the actors may be described by the roles in which they participate. In this paper, we focus on the construction of queries for this complex metadata. Ultimately, we envision a flexible, broad-based service that is a resource for users ranging from students to genealogists interested in events.
	Robert B. Allen. A multi-timeline interface for historical newspapers. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. Events may be best understood in the context of other events. Because of the temporal ordering, we can call a set of related events a timeline. Even such timelines are best understood in the context of other timelines. To facilitate the exploration of a collection of timelines and events, a visualization tool has been developed that structures the user's browsing. In this model, each event is accompanied by a text description and links to related resources. In particular, this system can provide a browsing interface to digitized historical newspapers.
	Robert B. Allen and Jane Acheson. Browsing the structure of multimedia stories. In Proceedings of the Fifth ACM International Conference on Digital Libraries, 2000. Stories may be analyzed as sequences of causally-related events and reactions to those events by the characters. We employ a notation of plot elements, similar to one developed by Lehnert, and we extend that by forming higher level "story threads". This notation requires that events and reactions be linked and that the chains of links be terminated back to the beginning of the story. Furthermore, we have built a browser for the plot elements, the story threads, and associated multimedia. We apply the browser to Corduroy, a children's short feature which was analyzed in detail. We provide additional illustrations with analysis of Kiss of Death, a Film Noir classic. Effectively, the browser provides a framework for interactive summaries of the narrative.
	Open Mobile Alliance. Wireless application protocol. http://www.openmobilealliance.org/tech/affiliates/wap/wapindex.html#wap20, 2001. The WAP Web site, from which the specifications are available.
	Virgilio Almeida, Azer Bestavros, Mark Crovella, and Adriana de Oliveira. Characterizing reference locality in the WWW. In Proceedings of PDIS'96: The IEEE Conference on Parallel and Distributed Information Systems, 1996.
	Virgilio A.F. Almeida, Wagner Meira Jr., Victor F. Ribeiro, and Nivio Ziviani. Efficiency analysis of brokers in the electronic marketplace. In Proceedings of the Eighth International World-Wide Web Conference, 1999. In this paper we analyze the behavior of e-commerce users based on actual logs from two large non-English e-brokers. We start by presenting a quantitative study of the behavior of e-brokers and discuss the influence of regional and cultural issues on them. We then discuss a model that quantifies the efficiency of the results provided by brokers in the electronic marketplace. This model is a function of factors such as server response time and regional factors. Our findings clearly indicate that e-commerce is strongly tied to local language, national customs and regulations, currency conversion and logistics, and Internet infrastructure. We found that the behavior of customers of online bookstores is strongly affected by brand and regional factors. Music CD shoppers show a different behavior that might stem from the fact that music is universal and not so language dependent.
	AltaVista Incorporated. http://www.altavista.com.
	Amazon Inc. http://www.amazon.com.
	Jose-Luis Ambite and Craig A. Knoblock. Reconciling distributed information sources. In AAAI Spring Symposium on Information Gathering, 1995. Format: Compressed PostScript.
	B. Amento, L. Terveen, and W. Hill. Does authority mean quality? Predicting expert quality ratings of web documents. In Proceedings of the Twenty-Third Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2000. Evaluates different link-based ranking techniques.
	Einat Amitay, Nadav Har'El, Ron Sivan, and Aya Soffer. Web-a-where: geotagging web content. In SIGIR '04: Proceedings of the 27th annual international conference on Research and development in information retrieval, pages 273-280. ACM Press, 2004.
	E. Amoroso. Fundamentals of Computer Security Technology. Prentice Hall, Englewood Cliffs, NJ, 1994.
	H. Anan, X. Liu, K. Maly, M. Nelson, M. Zubair, J. C. French, E. Fox, and P. Shivakumar. Preservation and transition of NCSTRL using an OAI-based architecture. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. NCSTRL (Networked Computer Science Technical Reference Library) is a federation of digital libraries providing computer science materials. The architecture of the original NCSTRL was based largely on the Dienst software. It was implemented and maintained by the digital library group at Cornell University until September 2001. At that time, we had an immediate goal of preserving the existing NCSTRL collection and a long-term goal of providing a framework where participating organizations could continue to disseminate technical publications. Moreover, we wanted the new NCSTRL to be based on OAI (Open Archives Initiative) principles that provide a framework to facilitate the discovery of content in distributed archives. In this paper, we describe our experience in moving towards an OAI-based NCSTRL.
	Dan Ancona, Jim Frew, Greg Janée, and Dave Valentine. Accessing the Alexandria Digital Library from geographic information systems. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. We describe two experimental desktop library clients that offer improved access to geospatial data via the Alexandria Digital Library (ADL): ArcADL, an extension to ESRI's ArcView GIS, and vtADL, an extension to the Virtual Terrain Project's Enviro terrain visualization package. ArcADL provides a simplified user interface to ADL's powerful underlying distributed geospatial search technology. Both clients use the ADL Access Framework to access library data that is available in multiple formats and retrievable by multiple methods. Issues common to both clients and future scenarios are also considered.
	Kenneth M. Anderson, Aaron Andersen, Neet Wadhwani, and Laura M. Bartolo. Metis: Lightweight, flexible, and web-based workflow services for digital libraries. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. The Metis project is developing workflow technology designed for use in digital libraries by avoiding the assumptions made by traditional workflow systems. In particular, digital libraries have highly distributed sets of stakeholders who nevertheless must work together to perform shared activities. Hence, traditional assumptions that all members of a workflow belong to the same organization, work in the same fashion, or have access to similar computing platforms are invalid. The Metis approach makes use of event-based workflows to support the distributed nature of digital library workflow and employs techniques to make the resulting technology lightweight, flexible, and integrated with the Web. This paper describes the conceptual framework behind the Metis approach as well as a prototype which implements the framework. The prototype is evaluated based on its ability to model and execute a workflow drawn from a "real-world" digital library. After describing related work, the paper concludes with a discussion of future research opportunities in the area of digital library workflow and outlines how Metis is being deployed to a small set of digital libraries for additional evaluation.
	R. Anderson and M. Kuhn. Tamper resistance: a cautionary note. In Proceedings of the Second USENIX Workshop on Electronic Commerce, Berkeley, CA, USA, 1996. USENIX Assoc. An increasing number of systems, from pay-TV to electronic purses, rely on the tamper resistance of smartcards and other security processors. We describe a number of attacks on such systems: some old, some new, and some that are simply little known outside the chip testing community. We conclude that trusting tamper resistance is problematic; smartcards are broken routinely, and even a device that was described by a government signals agency as 'the most secure processor generally available' turns out to be vulnerable. Designers of secure systems should consider the consequences with care.
	R. Anderson, C. Manifavas, and C. Sutherland. Netcard - a practical electronic cash system. In Fourth Cambridge Workshop on Security Protocols, 1996.
	R.C. Angell, G.E. Freund, and P. Willett. Automatic spelling correction using a trigram similarity measure. Information Processing and Management, 19(4):255-261, 1983.
	ANSI/NISO. Information Retrieval: Application Service Definition and Protocol Specification, April 1995. Available at http://lcweb.loc.gov/z3950/agency/document.html.
	Vinod Anupam, Alain Mayer, Kobbi Nissim, Benny Pinkas, and Michael K. Reiter. On the security of pay-per-click and other web advertising schemes. In Proceedings of the Eighth International World-Wide Web Conference, 1999. We present a hit inflation attack on pay-per-click Web advertising schemes. Our attack is virtually impossible for the program provider to detect conclusively, regardless of whether the provider is a third-party "ad network" or the target of the click itself. If practiced widely, this attack could accelerate a move away from pay-per-click programs, and toward programs in which referrers are paid only if the referred user subsequently makes a purchase (pay-per-sale) or engages in other substantial activity at the target site (pay-per-lead). We also briefly discuss the lack of auditability inherent in these schemes.
	Kyoichi Arai, Teruo Yokoyama, and Yutaka Matsushita. A window system with leafing through mode: BookWindow. In Proceedings of the Conference on Human Factors in Computing Systems CHI'92, 1992.
	Avi Arampatzis, Marc van Kreveld, Iris Reinbacher, Paul Clough, Hideo Joho, Mark Sanderson, Christopher B. Jones, Subodh Vaid, Marc Benkert, and Alexander Wolff. Web-based delineation of imprecise regions. In Proceedings of the Workshop on Geographic Information Retrieval, 2004.
	Arvind Arasu, Junghoo Cho, Hector Garcia-Molina, Andreas Paepcke, and Sriram Raghavan. Searching the web. ACM Transactions on Internet Technology, 2001. Submitted for publication. Available at http://dbpubs.stanford.edu/pub/2000-37. We offer an overview of current Web search engine design. After introducing a generic search engine architecture, we examine each engine component in turn. We cover crawling, local Web page storage, indexing, and the use of link analysis for boosting search performance. The most common design and implementation techniques for each of these components are presented. We draw for this presentation from the literature, and from our own experimental search engine testbed. Emphasis is on introducing the fundamental concepts, and the results of several performance analyses we conducted to compare different designs.
	William Y. Arms. Key concepts in the architecture of the digital library. D-Lib Magazine, July 1995. Format: HTML Document (18K + pictures). Audience: Computer scientists, digital library researchers. References: 1. Links: 3. Relevance: Medium-low. Abstract: Outlines eight principles that are important to DLs, a combination of social/economic issues (avoid using words like "copy" and "publish") and technical ones (basically a sales pitch for the Kahn/Wilensky model of handles, maintenance, and access control).
	Robert Armstrong, Dayne Freitag, Thorsten Joachims, and Tom Mitchell. WebWatcher: A learning apprentice for the world wide web. In AAAI Spring Symposium on Information Gathering, 1995. Format: Compressed PostScript. We describe an information seeking assistant for the world wide web. This agent, called WebWatcher, interactively helps users locate desired information by employing learned knowledge about which hyperlinks are likely to lead to the target information.
	Kenneth Arnold. The body in the virtual library: Rethinking scholarly communication. In JEP. Format: HTML Document (41K). Audience: Scholars, publishers (especially university presses), librarians. References: 10. Links: 1. Relevance: Low-Medium. Abstract: Discusses the future of university presses, in pretty grim terms. Suggests that they lack the capital, staff, and quick reaction time to survive in an electronic world. Considers the Mellon report on scholarly communication (which suggests universities get copyrights on books their faculty produce) unreasonable. Thinks that relying on commercial network providers (especially cable and telecom) would be disastrous. Advocates a non-profit distribution network for scholarly publication.
	Kenneth Arnold. The electronic librarian is a verb/the electronic library is not a sentence. In JEP, 1994. Format: HTML Document (49K). Audience: Librarians, policy makers. References: 10. Links: 1. Relevance: Low. Abstract: A vision of the networked library. Sees the real value of librarians as creating "attention structures" which anticipate the way clients search.
	Dennis S. Arnon. Scrimshaw: a language for document queries and transformations. Electronic Publishing: Origination, Dissemination and Design, 6(4):361-372, December 1993.
	J. Ashley, M. Flickner, J. Hafner, D. Lee, W. Niblack, and D. Petkovic. The query by image content (QBIC) system. In Proceedings of the International Conference on Management of Data (SIGMOD). ACM Press, 1995.
	N. Asokan, P.A. Janson, M. Steiner, and M. Waidner. The state of the art in electronic payment systems. Computer, 30(9):28-35, September 1997. The exchange of goods conducted face-to-face between two parties dates back to before the beginning of recorded history. Traditional means of payment have always had security problems, but now electronic payments retain the same drawbacks and add some risks. Unlike paper, digital documents can be copied perfectly and arbitrarily often, digital signatures can be produced by anybody who knows the secret cryptographic key, and a buyer's name can be associated with every payment, eliminating the anonymity of cash. Without new security measures, widespread electronic commerce is not viable. On the other hand, properly designed electronic payment systems can actually provide better security than traditional means of payment, in addition to flexibility. This article provides an overview of electronic payment systems, focusing on issues related to security.
	Active Server Pages technology. http://msdn.microsoft.com/workshop/server/asp/aspfeat.asp.
	R. Atkinson, A. Demers, C. Hauser, C. Jacobi, P. Kessler, and M. Weiser. Experiences creating a portable Cedar. SIGPLAN Notices, 24(7):322-8, 1989. The authors have recently re-implemented the Cedar language to make it portable across many different architectures. The strategy was, first, to use machine-dependent C code as an intermediate language, second, to create a language-independent layer known as the Portable Common Runtime, and third, to write a relatively large amount of Cedar-specific runtime code in a subset of Cedar itself. The paper presents a brief description of the Cedar language, the portability strategy for the compiler and runtime, the manner of making connections to other languages and the Unix operating system, and some performance measures of the Portable Cedar.
	Neal Audenaert, Richard Furuta, Eduardo Urbina, Jie Deng, Carlos Monroy, Rosy Sáenz, and Doris Careaga. Integrating collections at the Cervantes Project. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. Unlike many efforts that focus on supporting scholarly research by developing large-scale, general resources for a wide range of audiences, we at the Cervantes Project have chosen to focus more narrowly on developing resources in support of ongoing research about the life and works of a single author, Miguel de Cervantes Saavedra (1547-1616). This has led to a group of hypertextual archives, tightly integrated around the narrative and thematic structure of Don Quixote. This project is typical of many humanities research efforts and we discuss how our experiences inform the broader challenge of developing resources to support humanities research.
	Cyrus Azarbod and William Perrizo. Building concept hierarchies for schema integration in HDDBS using incremental concept formation. In B. Bhargava, T. Finin, and Y. Yesha, editors, CIKM 93. Proceedings of the Second International Conference on Information and Knowledge Management, pages 732-734, Washington, D.C., November 1993. ACM.
	Sulin Ba, Aimo Hinkkanen, and Andre B. Whinston. Digital library as a foundation for decision support systems. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document (43K). Audience: Semi-technical, business slant, funding proposal. References: 14. Links: 1. Relevance: Low. Abstract: Sees a DL as an enterprise-wide collection of executable documents. Suggests SGML and Mathematica as integration tools. Searches for a data representation which will allow automatic combination of separate documents to solve problems.
	D. Bachiochi, M. Berstene, E. Chouinard, N. Conlan, M. Danchak, T. Furey, C. Neligon, and D. Way. Usability studies and designing navigational aids for the world wide web. In Proceedings of the Sixth International World-Wide Web Conference, 1997.
	B. R. Badrinath. Distributed computing in mobile environments. Computers & Graphics, 20(5):615-17, 1996. Rapid progress in hardware has led to the availability of portable personal computers ranging from laptops to hand-held computers (PDAs and Internet terminals). The presence of wireless connectivity gives these hand-held units the capability of accessing information anywhere, at any time. These mobile units can be considered to be part of a worldwide distributed information system. Distributed computing in mobile environments faces new challenges as more and more mobile hosts become an integral part of a distributed system. Problems in distributed computing in mobile environments are due to: (1) mobility, (2) wireless and (3) resource constraints at the mobile host. In this paper, we discuss the impact of these factors and research issues that need to be addressed in mobile distributed systems.
	Ricardo Baeza-Yates and Berthier Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley-Longman, May 1999. The chapters of the book are: Introduction; Modeling; Retrieval Evaluation; Query Languages (with Gonzalo Navarro); Query Operations; Text and Multimedia Languages and Properties; Text Operations (with Nivio Ziviani); Indexing and Searching (with Gonzalo Navarro); Parallel and Distributed IR (by Eric Brown); User Interfaces and Visualization (by Marti Hearst); Multimedia IR: Models and Languages (by Elisa Bertino, Barbara Catania, and Elena Ferrari); Multimedia IR: Indexing and Searching (by Christos Faloutsos); Searching the Web; Libraries and Bibliographic Systems (by Edie Rasmussen); Digital Libraries (by Edward Fox and Ohm Sornil); Appendix: Porter's Algorithm; Glossary; References (more than 800); Index. More information can be found at http://www.sims.berkeley.edu/~hearst/irbook
	David Bainbridge, Craig G. Nevill-Manning, Ian H. Witten, Lloyd A. Smith, and Rodger J. McNab. Towards a digital library of popular music. In Proceedings of the Fourth ACM International Conference on Digital Libraries, 1999. Digital libraries of music have the potential to capture popular imagination in ways that more scholarly libraries cannot. We are working towards a comprehensive digital library of musical material, including popular music. We have developed new ways of collecting musical material, accessing it through searching and browsing, and presenting the results to the user. We work with different representations of music: facsimile images of scores, the internal representation of a music editing program, page images typeset by a music editor, MIDI files, audio files representing sung user input, and textual metadata such as title, composer and arranger, and lyrics. This paper describes a comprehensive suite of tools that we have built for this project. These tools gather musical material, convert between many of these representations, allow searching based on combined musical and textual criteria, and help present the results of searching and browsing. Although we do not yet have a single full-blown digital music library, we have built several exploratory prototype collections of music, some of them very large (100,000 tunes), and critical components of the system have been evaluated.
	David Bainbridge, John Thompson, and Ian H. Witten. Assembling and enriching digital library collections. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. People who create digital libraries need to gather together the raw material, add metadata as necessary, and design and build new collections. This paper sets out the requirements for these tasks and describes a new tool that supports them interactively, making it easy for users to create their own collections from electronic files of all types. The process involves selecting documents for inclusion, coming up with a suitable metadata set, assigning metadata to each document or group of documents, designing the form of the collection in terms of document formats, searchable indexes, and browsing facilities, building the necessary indexes and data structures, and putting the collection in place for others to use. All these tasks are supported within a modern point-and-click interaction paradigm. Although the tool is specific to the Greenstone digital library software, the underlying ideas should prove useful in more general contexts.
	M. Baker. Changing communication environments in MosquitoNet. In Proceedings of the IEEE Workshop on Mobile Computing Systems and Applications, Dec 1994.
	M. Baker, X. Zhao, S. Cheshire, and J. Stone. Supporting mobility in MosquitoNet. In Proceedings of the 1996 USENIX Conference, Jan 1996.
	Scott Baker and John H. Hartman. The Gecko NFS Web proxy. In Proceedings of the Eighth International World-Wide Web Conference, 1999. The World-Wide Web provides remote access to pages using its own naming scheme (URLs), transfer protocol (HTTP), and cache algorithms. Not only do these special-purpose mechanisms have performance implications, but they also make it impossible for standard Unix applications to access the Web. Gecko is a system that provides access to the Web via the NFS protocol. URLs are mapped to Unix file names, giving unmodified applications access to Web pages; pages are transferred from the Gecko server to the clients using NFS instead of HTTP, significantly improving performance; and NFS's cache consistency mechanism ensures that all clients have the same version of a page. Applications access pages as they would Unix files. A client-side proxy translates HTTP requests into file accesses, allowing existing Web applications to use Gecko. Experiments performed on our prototype show that Gecko is able to provide this additional functionality at a performance level that exceeds that of HTTP.
	Scott M. Baker and Bongki Moon. Distributed cooperative web servers. In Proceedings of the Eighth International World-Wide Web Conference, 1999. Traditional techniques for a distributed web server design rely on manipulation of central resources, such as routers or DNS services, to distribute requests designated for a single IP address to multiple web servers. The goal of the distributed cooperative Web server (DCWS) system development is to explore application-level techniques for distributing web content. We achieve this by dynamically manipulating the hyperlinks stored within the web documents themselves. The DCWS system effectively eliminates the bottleneck of centralized resources, while balancing the load among distributed web servers. DCWS servers may be located in different networks, or even different continents and still balance load effectively. DCWS system design is fully compatible with existing HTTP protocol semantics and existing web client software products.
	M. Balabanovic and Y. Shoham. Learning information retrieval agents: Experiments with automated web browsing. In AAAI spring symposium on Information Gathering, 1995. The current exponential growth of the Internet precipitates a need for new tools to help people cope with the volume of information. To complement recent work on creating searchable indexes of the World-Wide Web and systems for filtering incoming e-mail and Usenet news articles, we describe a system which helps users keep abreast of new and interesting information. Every day it presents a selection of interesting web pages. The user evaluates each page, and given this feedback the system adapts and attempts to produce better pages the following day. We present some early results from an AI programming class to whom this was set as a project, and then describe our current implementation. Over the course of 24 days the output of our system was compared to both randomly selected and human-selected pages. It consistently performed better than the random pages, and was better than the human-selected pages half of the time.
	M. Balabanovic and Y. Shoham. Fab: content-based collaborative recommendation. Communications of the ACM, 40(3):66-72, March 1997. Online readers are in need of tools to help them cope with the mass of content that is available on the World Wide Web. In traditional media, readers are provided assistance in making selections. This includes both implicit assistance in the form of editorial oversight and explicit assistance in the form of recommendation services such as movie reviews and restaurant guides. The electronic medium offers new opportunities to create recommendation services, ones that adapt over time to track users' evolving interests. Fab is such a recommendation system for the Web, and has been operational in several versions since December 1994. By combining both collaborative and content-based filtering systems, Fab may eliminate many of the weaknesses found in each approach.
	M. Balabanovic, Y. Shoham, and Y. Yun. An adaptive agent for automated web browsing. Journal of Visual Communication and Image Representation, 6(4), December 1995.
	Marko Balabanovic. An adaptive web page recommendation service. In Proceedings of the First International Conference on Autonomous Agents, pages 378-385, February 1997.
	Marko Balabanovic. Exploring versus exploiting when learning user models for text recommendation. User Modeling and User-Adapted Interaction (to appear), 8(1), 1998.
	Marko Balabanovic. An interface for learning multi-topic user profiles from implicit feedback. Technical Report SIDL-WP-1998-0089, Stanford University, 1998.
	Marko Balabanovic. The ``slider'' interface. IBM interVisions, 11, February 1998.
	Marko Balabanovic, Lonny L. Chu, and Gregory J. Wolff. Storytelling with digital photographs. In CHI '00: Proceedings of the SIGCHI conference on Human factors in computing systems, pages 564-571, New York, NY, USA, 2000. ACM Press.
	Marko Balabanovic and Yoav Shoham. Learning information retrieval agents: Experiments with automated web browsing. In Proceedings of the AAAI Spring Symposium on Information Gathering from Heterogeneous, Distributed Resources, 1995. Format: Compressed PostScript.
	Marko Balabanovic and Yoav Shoham. Combining content-based and collaborative recommendation. Communications of the ACM, 40(3), March 1997.
	Marko Balabanovic, Yoav Shoham, and Yeogirl Yun. An adaptive agent for automated web browsing. Journal of Visual Communication and Image Representation, 6(4), December 1995. You give the agent a profile; it searches the Web for things of interest and reports back, and you give it feedback.
	Michelle Baldonado. Searching, browsing, and metasearching with SenseMaker. Web Techniques Magazine, May 1997.
	Michelle Baldonado, Chen-Chuan K. Chang, Luis Gravano, and Andreas Paepcke. Metadata for digital libraries: Architecture and design rationale. Technical Report SIDL-WP-1997-0055; 1997-26, Stanford University, 1997. Accessible at http://dbpubs.stanford.edu/pub/1997-26. In a distributed, heterogeneous, proxy-based digital library, autonomous services and collections are accessed indirectly via proxies. To facilitate metadata compatibility and interoperability in such a digital library, we have designed a metadata architecture that includes four basic component classes: attribute model proxies, attribute model translators, metadata facilities for search proxies, and metadata repositories. Attribute model proxies elevate both attribute sets and the attributes they define to first-class objects. They also allow relationships among attributes to be captured. Attribute model translators map attributes and attribute values from one attribute model to another (where possible). Metadata facilities for search proxies provide structured descriptions both of the collections to which the search proxies provide access and of the search capabilities of the proxies. Finally, metadata repositories accumulate selected metadata from local instances of the other three component classes in order to facilitate global metadata queries and local metadata caching. In this paper, we outline further the roles of these component classes, discuss our design rationale, and analyze related work.
	Michelle Baldonado, Chen-Chuan K. Chang, Luis Gravano, and Andreas Paepcke. Metadata for digital libraries: Architecture and design rationale. In Proceedings of the Second ACM International Conference on Digital Libraries, pages 47-56, 1997. At http://dbpubs.stanford.edu/pub/1997-26. In a distributed, heterogeneous, proxy-based digital library, autonomous services and collections are accessed indirectly via proxies. To facilitate metadata compatibility and interoperability in such a digital library, we have designed a metadata architecture that includes four basic component classes: attribute model proxies, attribute model translators, metadata facilities for search proxies, and metadata repositories. Attribute model proxies elevate both attribute sets and the attributes they define to first-class objects. They also allow relationships among attributes to be captured. Attribute model translators map attributes and attribute values from one attribute model to another (where possible). Metadata facilities for search proxies provide structured descriptions both of the collections to which the search proxies provide access and of the search capabilities of the proxies. Finally, metadata repositories accumulate selected metadata from local instances of the other three component classes in order to facilitate global metadata queries and local metadata caching. In this paper, we outline further the roles of these component classes, discuss our design rationale, and analyze related work.
	Michelle Baldonado, Chen-Chuan K. Chang, Luis Gravano, and Andreas Paepcke. The Stanford Digital Library metadata architecture. International Journal of Digital Libraries, 1(2), February 1997. See also http://dbpubs.stanford.edu/pub/1997-56.
	Michelle Baldonado, Steve Cousins, B. Lee, and Andreas Paepcke. Notable: An annotation system for networked handheld devices. In Proceedings of the Conference on Human Factors in Computing Systems CHI'99, pages 210-211, 1999.
	Michelle Baldonado, Seth Katz, Andreas Paepcke, Chen-Chuan K. Chang, Hector Garcia-Molina, and Terry Winograd. An extensible constructor tool for the rapid, interactive design of query synthesizers. In Proceedings of the Third ACM International Conference on Digital Libraries, 1998. Accessible at http://dbpubs.stanford.edu/pub/1998-48. We describe an extensible constructor tool that helps information experts (e.g., librarians) create specialized query synthesizers for heterogeneous digital-library environments. A query synthesizer provides a graphical user interface in which a digital-library patron can specify a high-level, fielded, multi-source query. Furthermore, a query synthesizer interacts with a query translator and an attribute translator to transform high-level queries into sets of source-specific queries. We discuss how the constructor can facilitate discovery of available attributes (e.g., title), collation of schemas from different sources, selection of input widgets for a synthesizer (e.g., a text box or a drop-down list widget to support input of controlled vocabulary), and other design aspects. We also describe a prototype constructor we implemented, based on the Stanford InfoBus and metadata architecture.
	Michelle Q Wang Baldonado and Steve B. Cousins. Addressing heterogeneity in the networked information environment. New Review of Information Networking, 2:83-102, 1996. Several ongoing Stanford University Digital Library projects address the issue of heterogeneity in networked information environments. A networked information environment has the following components: users, information repositories, information services, and payment mechanisms. This paper describes three of the heterogeneity-focused Stanford projects: InfoBus, REACH, and DLITE. The InfoBus project is at the protocol level, while the REACH and DLITE projects are both at the conceptual model level. The InfoBus project provides the infrastructure necessary for accessing heterogeneous services and utilizing heterogeneous payment mechanisms. The REACH project sets forth a uniform conceptual model for finding information in networked information repositories. The DLITE project presents a general task-based strategy for building user interfaces to heterogeneous networked information services.
	Michelle Q Wang Baldonado and Terry Winograd. Techniques and tools for making sense out of heterogeneous search service results. Technical Report SIDL-WP-1995-0019; 1995-59, Stanford University, 1995.
	Michelle Q Wang Baldonado and Terry Winograd. A user interaction model for browsing based on category-level operations. Technical Report SIDL-WP-1996-0029; 1996-75, Stanford University, 1996. We propose a user interaction model for browsing based on iterative category-level operations. The motivation comes from two observations: 1) people naturally think in terms of categories, and 2) in browsing, the types of categories that are salient to users change as they browse. We define a set of category-level operations that lets users iteratively view and find results in terms of these changing category types. We also show that we can express some standard IR operations as iteratively applied sequences of a fundamental category-level operation (thus unifying them). Finally, we describe SenseMaker, a prototype interface for browsing heterogeneous sources.
	Michelle Q Wang Baldonado and Terry Winograd. SenseMaker: An information-exploration interface supporting the contextual evolution of a user's interests. In Proceedings of the Conference on Human Factors in Computing Systems CHI'97, pages 11-18, Atlanta, Ga., March 1997. ACM Press, New York.
	Sujata Banerjee and Vibhu O. Mittal. On the use of linguistic ontologies for accessing and indexing distributed digital libraries. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document. Audience: Non-technical, on-line searchers. References: 16. Links: 1. Relevance: Low. Abstract: Addresses the problem of finding the correct keywords to search for by using WordNet. If a search doesn't turn up the hits needed, it modifies the query by using synonyms, by generalizing, or by replacing terms with a set of more specific words. The searcher is asked to approve modified queries, which are then re-sent to content providers.
	Gaurav Banga, Fred Douglis, and Michael Rabinovich. Optimistic deltas for WWW latency reduction. In Proceedings of USENIX Technical Conference, pages 289-303, 1997.
	Ziv Bar-Yossef, Alexander Berg, Steve Chien, Jittat Fakcharoenphol, and Dror Weitz. Approximating aggregate queries about web pages via random walks. In Proceedings of the Twenty-sixth International Conference on Very Large Databases, 2000.
	Ziv Bar-Yossef, Andrei Z. Broder, Ravi Kumar, and Andrew Tomkins. Sic transit gloria telae: towards an understanding of the web's decay. In WWW '04: Proceedings of the 13th international conference on World Wide Web, pages 328-337, New York, NY, USA, 2004. ACM Press. The rapid growth of the web has been noted and tracked extensively. Recent studies have however documented the dual phenomenon: web pages have small half lives, and thus the web exhibits rapid death as well. Consequently, page creators are faced with an increasingly burdensome task of keeping links up-to-date, and many are falling behind. In addition to just individual pages, collections of pages or even entire neighborhoods of the web exhibit significant decay, rendering them less effective as information resources. Such neighborhoods are identified only by frustrated searchers, seeking a way out of these stale neighborhoods, back to more up-to-date sections of the web; measuring the decay of a page purely on the basis of dead links on the page is too naive to reflect this frustration. In this paper we formalize a strong notion of a decay measure and present algorithms for computing it efficiently. We explore this measure by presenting a number of validations, and use it to identify interesting artifacts on today's web. We then describe a number of applications of such a measure to search engines, web page maintainers, ontologists, and individual users.
	Albert-Laszlo Barabasi and Reka Albert. Emergence of scaling in random networks. Science, 286(5439):509-512, October 1999.
	David Bargeron, Anoop Gupta, Jonathan Grudin, and Elizabeth Sanocki. Annotations for streaming video on the web: System design and usage studies. In Proceedings of the Eighth International World-Wide Web Conference, 1999. Streaming video on the World Wide Web is being widely deployed, and workplace training and distance education are key applications. The ability to annotate video on the Web can provide significant added value in these and other areas. Written and spoken annotations can provide `in context' personal notes and can enable asynchronous collaboration among groups of users. With annotations, users are no longer limited to viewing content passively on the Web, but are free to add and share commentary and links, thus transforming the Web into an interactive medium. We discuss design considerations in constructing a collaborative video annotation system, and we introduce our prototype, called MRAS. We present preliminary data on the use of Web-based annotations for personal note-taking and for sharing notes in a distance education scenario. Users showed a strong preference for MRAS over pen-and-paper for taking notes, despite taking longer to do so. They also indicated that they would make more abstract and questions with MRAS than in a `live' situation, and that sharing added substantial value.
	Bruce R. Barkstrom, Melinda Finch, Michelle Ferebee, and Calvin Mackey. Adapting digital libraries to continual evolution. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. In this paper, we describe five investment streams (data storage infrastructure, knowledge management, data production control, data transport and security, and personnel skill mix) that need to be balanced against short-term operating demands in order to maximize the probability of long-term viability of a digital library. Because of the rapid pace of information technology change, a digital library cannot be a static institution. Rather, it has to become a flexible organization adapted to continuous evolution of its infrastructure.
	Kobus Barnard, Pinar Duygulu, David Forsyth, Nando de Freitas, David M. Blei, and Michael I. Jordan. Matching words and pictures. J. Mach. Learn. Res., 3:1107-1135, 2003. We present a new approach for modeling multi-modal data sets, focusing on the specific case of segmented images with associated text. Learning the joint distribution of image regions and words has many applications. We consider in detail predicting words associated with whole images (auto-annotation) and corresponding to particular image regions (region naming). Auto-annotation might help organize and access large collections of images. Region naming is a model of object recognition as a process of translating image regions to words, much as one might translate from one language to another. Learning the relationships between image regions and semantic correlates (words) is an interesting example of multi-modal data mining, particularly because it is typically hard to apply data mining techniques to collections of images. We develop a number of models for the joint distribution of image regions and words, including several which explicitly learn the correspondence between regions and words. We study multi-modal and correspondence extensions to Hofmann's hierarchical clustering/aspect model, a translation model adapted from statistical machine translation (Brown et al.), and a multi-modal extension to mixture of latent Dirichlet allocation (MoM-LDA). All models are assessed using a large collection of annotated images of real scenes. We study in depth the difficult problem of measuring performance. For the annotation task, we look at prediction performance on held-out data. We present three alternative measures, oriented toward different types of task. Measuring the performance of correspondence methods is harder, because one must determine whether a word has been placed on the right region of an image. We can use annotation performance as a proxy measure, but accurate measurement requires hand-labeled data, and thus must occur on a smaller scale. We show results using both an annotation proxy and manually labeled data.
	Kobus Barnard and David A. Forsyth. Learning the semantics of words and pictures. In Proceedings of the IEEE International Conference on Computer Vision, July 2001.
	Rob Barrett, Paul P. Maglio, and Daniel C. Kellem. How to personalize the web. In Proceedings of the Conference on Human Factors in Computing Systems CHI'97, 1997.
	Laura M. Bartolo, Cathy S. Lowe, Adam C. Powell IV, Donald R. Sadoway, Jorges Vieyra, and Kyle Stemen. Use of MatML with software applications for e-learning. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. This pilot project investigates facilitating the development of the Semantic Web for e-learning through a practical example, using Materials Property Data Markup Language (MatML) to provide materials property data to a web-based application program. Property data for 100 materials is marked up with MatML and used as an input format for an application program. Students use the program to generate graphs showing selected properties for different materials. Selected graphs are submitted to the Materials Digital Library (MatDL) so that successive classes may be informed by earlier work to encourage new discoveries.
	C. Batini, M. Lenzerini, and S. Navathe. A comparative analysis of methodologies for database schema integration. ACM Computing Surveys, 18(4), 1986.
	Patrick Baudisch and Ruth Rosenholtz. Halo: a technique for visualizing off-screen objects. In CHI '03: Proceedings of the SIGCHI conference on Human factors in computing systems, pages 481-488, New York, NY, USA, 2003. ACM Press.
	E. Bauer, D. Koller, and Y. Singer. Update rules for parameter estimation in Bayesian networks. In Proceedings of the 13th Annual Conference on Uncertainty in AI (UAI), 1997.
	M. Bearman. ODP-Trader. Open Distributed Processing, 2:19-33, 1994.
	Herb Becker. The role of the Library of Congress in the National Digital Library. In Proceedings of DL'96, 1996. Format: Not yet online.
	Benjamin B. Bederson. PhotoMesa: a zoomable image browser using quantum treemaps and bubblemaps. In Proceedings of the 14th annual ACM symposium on User interface software and technology, pages 71-80. ACM Press, 2001.
	Benjamin B. Bederson, Ben Shneiderman, and Martin Wattenberg. Ordered and quantum treemaps: Making effective use of 2D space to display hierarchies. ACM Transactions on Graphics, 21(4):833-854, 2002.
	Doug Beeferman, Adam Berger, and John D. Lafferty. Statistical models for text segmentation. Machine Learning, 34(1-3):177-210, 1999.
	Alireza Behreman. Generic electronic payment services. In The Second USENIX Workshop on Electronic Commerce Proceedings, 1996.
	Alireza Behreman and Rajkumar Narayanaswamy. Payment method negotiation service. In The Second USENIX Workshop on Electronic Commerce Proceedings, 1996.
	M. Beigl and R. Rudisch. System support for mobile computing. Computers & Graphics, 20(5):619-625, 1996. Today a mobile user wants to connect his portable computer remotely to the central database at home, locally to the printer on the spot, and globally to the World Wide Web. To achieve this, different connection lines are available: wireless networks for connecting out in the field, ISDN or analogue telephone lines when residing in a hotel, and Ethernet access at the customer's site. But this connectivity raises many technical, security, and accounting questions. This paper presents the architecture of an environment that aims to support mobile users and to address these problems.
	N.J. Belkin and W. Bruce Croft. Information filtering and information retrieval: two sides of the same coin? Communications of the ACM, 35(12):29-38, December 1992. A comparison is made between information retrieval and information filtering. The authors determine that information filtering is a well-defined process. By examining its foundations and comparing them to the foundations of the IR enterprise, the authors find there is very little difference between filtering and retrieval at an abstract level. They conclude that the two enterprises have the same goal; namely, they are both concerned with getting information to people who need it. However, the authors emphasize that IR research has ignored some aspects of the general problem which both IR and information filtering address, and that these aspects are precisely those which are especially relevant to the specific contexts of filtering.
	Timothy C. Bell, Alistair Moffat, and Ian H. Witten. Compressing the digital library. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document (32K). Audience: Semi-technical, general computer scientists. References: 8. Links: 1. Relevance: Medium (but not mainstream DL). Abstract: Discusses the interaction of compression and indexing. Suggests a Huffman encoding applied to words & non-words. Inverted bitmap for indexing, enhanced with Golomb encoding. Compressed a 266 MB Wall Street Journal article database by 50%, including creating the index. Queries were processed in less than 0.1 sec.
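The annotation above mentions Golomb encoding of the index without details. As a minimal sketch of the general textbook technique, assuming nothing about the authors' actual implementation, a postings list can be stored as d-gaps (differences between successive document numbers), and each gap x >= 1 written as a Golomb code with parameter b: the quotient (x-1) div b in unary, then the remainder in truncated binary. All function names here are hypothetical, and b is left to the caller; a frequently cited choice under a Bernoulli model of term occurrence is b ≈ 0.69 × N/f_t, where N is the number of documents and f_t the number containing term t.

```python
from typing import List


def truncated_binary(r: int, b: int) -> str:
    """Encode a remainder 0 <= r < b using the truncated binary code for b symbols."""
    if b == 1:
        return ""  # only one possible remainder, so no bits are needed
    k = (b - 1).bit_length()  # k = ceil(log2(b)) for b >= 2
    cutoff = (1 << k) - b     # the first `cutoff` remainders get k-1 bits
    if r < cutoff:
        return format(r, f"0{k - 1}b")
    return format(r + cutoff, f"0{k}b")


def golomb_encode(x: int, b: int) -> str:
    """Golomb codeword for an integer x >= 1 with parameter b >= 1."""
    if x < 1 or b < 1:
        raise ValueError("x and b must both be >= 1")
    q, r = divmod(x - 1, b)
    # Quotient in unary (q one-bits and a terminating zero), then the remainder.
    return "1" * q + "0" + truncated_binary(r, b)


def golomb_decode_stream(bits: str, b: int) -> List[int]:
    """Decode a concatenation of Golomb codewords back into integers >= 1."""
    k = (b - 1).bit_length() if b > 1 else 0
    cutoff = (1 << k) - b if b > 1 else 0
    values: List[int] = []
    i = 0
    while i < len(bits):
        q = 0
        while bits[i] == "1":  # unary quotient
            q += 1
            i += 1
        i += 1  # skip the terminating zero
        r = 0
        if b > 1:
            for _ in range(k - 1):  # read the short (k-1 bit) form first
                r = 2 * r + int(bits[i])
                i += 1
            if r >= cutoff:  # long form: one more bit, then undo the offset
                r = 2 * r + int(bits[i]) - cutoff
                i += 1
        values.append(q * b + r + 1)
    return values


def encode_postings(doc_ids: List[int], b: int) -> str:
    """Golomb-code the d-gaps of a strictly increasing list of document numbers >= 1."""
    gaps = [d - prev for prev, d in zip([0] + doc_ids[:-1], doc_ids)]
    return "".join(golomb_encode(g, b) for g in gaps)


def decode_postings(bits: str, b: int) -> List[int]:
    """Invert encode_postings by decoding the gaps and taking running sums."""
    doc_ids: List[int] = []
    total = 0
    for gap in golomb_decode_stream(bits, b):
        total += gap
        doc_ids.append(total)
    return doc_ids
```

Small gaps, which dominate the postings of frequent terms, get short codewords: with b = 3, a gap of 1 costs two bits (`00`) and a gap of 4 costs three (`100`), while `decode_postings(encode_postings(ids, b), b)` recovers the original document numbers.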
	M. Bellare, J.A. Garay, R. Hauser, A. Herzberg, H. Krawczyk, M. Steiner, G. Tsudik, and M. Waidner. iKP: a family of secure electronic payment protocols. In Proceedings of the First USENIX Workshop on Electronic Commerce, Berkeley, CA, USA, 1995. USENIX Assoc. This paper proposes a family of protocols, iKP (i=1,2,3), for secure electronic payments over the Internet. The protocols implement credit card-based transactions between the customer and the merchant while using the existing financial network for clearing and authorization. The protocols can be extended to apply to other payment models, such as debit cards and electronic checks. They are based on public-key cryptography and can be implemented in either software or hardware. Individual protocols differ in key management complexity and degree of security. It is intended that their deployment be gradual and incremental. The iKP protocols are presented herein with the intention to serve as a starting point for eventual standards on secure electronic payment.
	Jezekiel Ben-Arie, Purvin Pandit, and ShyamSundar Rajaram. Design of a digital library for human movement. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001. This paper is focused on a central aspect in the design of our planned digital library for human movement, i.e., the representation and recognition of human activity from video data. The method of representation is important since it has a major impact on the design of all the other building blocks of our system, such as the user interface/query block or the activity recognition/storage block. In this paper we evaluate a representation method for human movement that is based on sequences of angular poses and angular velocities of the human skeletal joints, for storage and retrieval of human actions in video databases. The choice of a representation method plays an important role in the database structure, search methods, storage efficiency, etc. For this representation, we develop a novel approach for complex human activity recognition by employing multidimensional indexing combined with temporal or sequential correlation. This scheme is then evaluated with respect to its efficiency in storage and retrieval. For the indexing we use postures of humans in videos that are decomposed into a set of multidimensional tuples which represent the poses/velocities of human body parts such as arms, legs, and torso. Three novel methods for human activity recognition are theoretically and experimentally compared. The methods require only a few sparsely sampled human postures. We also achieve speed-invariant recognition of activities by eliminating the time factor and replacing it with sequence information. The indexing approach also provides robust recognition and an efficient storage/retrieval of all the activities in a small set of hash tables.
	Israel Ben-Shaul, Michael Herscovici, Michal Jacovi, Yoelle S. Maarek, Dan Pelleg, Menachem Shtalhaim, Vladimir Soroka, and Sigalit Ur. Adding support for dynamic and focused search with fetuccino. In Proceedings of the Eighth International World-Wide Web Conference, 1999. This paper proposes two enhancements to existing search services over the Web. One enhancement is the addition of limited dynamic search around results provided by regular Web search services, in order to correct part of the discrepancy between the actual Web and its static image as stored in search repositories. The second enhancement is an experimental two-phase paradigm that allows the user to distinguish between a domain query and a focused query within the dynamically identified domain. We present Fetuccino, an extension of the Mapuccino system that implements these two enhancements. Fetuccino provides an enhanced user-interface for visualization of search results, including advanced graph layout, display of structural information and support for standards (such as XML). While Fetuccino has been implemented on top of existing search services, its features could easily be integrated into any search engine for better performance. A light version of Fetuccino is available on the Internet at http://www.ibm.com/java/fetuccino.
	Tamara L. Berg, Alexander C. Berg, Jaety Edwards, Michael Maire, Ryan White, Yee-Whye Teh, Erik Learned-Miller, and D.A. Forsyth. Names and faces in the news. In CVPR 2004: Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 2004.
	Donna Bergmark. Collection synthesis. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. The invention of the hyperlink and the HTTP transmission protocol caused an amazing new structure to appear on the Internet - the World Wide Web. With the Web, there came spiders, robots, and Web crawlers, which go from one link to the next checking Web health, ferreting out information and resources, and imposing organization on the huge collection of information (and dross) residing on the net. This paper reports on the use of one such crawler to synthesize document collections on various topics in science, mathematics, engineering and technology. Such collections could be part of a digital library.
	Howard Besser. Mesl project description. In Proceedings of DL'96, 1996. Format: Not yet online.
	Krishna Bharat and Andrei Broder. Mirror, mirror on the web: A study of host pairs with replicated content. In Proceedings of the Eighth International World-Wide Web Conference, 1999. Two previous studies, one done at Stanford in 1997 based on data collected by the Google search engine, and one done at Digital in 1996 based on AltaVista data, revealed that almost a third of the Web consists of duplicate pages. Both studies identified mirroring, that is, the systematic replication of content over a pair of hosts, as the principal cause of duplication, but did not further investigate this phenomenon. The main aim of this paper is to present a clearer picture of mirroring on the Web. As input we used a set of 179 million URLs found during a Web crawl done in the summer of 1998. We looked at all hosts with more than 100 URLs in our input (about 238,000), and discovered that about 10% [...] the prevalence of mirroring based on a mirroring classification scheme that we define. There are numerous reasons for mirroring: technical (e.g., to improve access time), commercial (e.g., different intermediaries offering the same products), cultural (e.g., same content in two languages), social (e.g., sharing of research data), and so forth. Although we have not done an exhaustive study of the causes of replication, we discuss and provide examples for several representative cases. Our technique for detecting mirrored hosts from large sets of collected URLs depends mostly on the syntactic analysis of URL strings, and requires retrieval and content analysis only for a small number of pages. We are able to detect both partial and total mirroring, and handle cases where the content is not byte-wise identical. Furthermore, our technique is computationally very efficient and does not assume that the initial set of URLs gathered from each host is comprehensive. Hence, this approach has practical uses beyond our study, and can be applied in other settings.
For instance, for Web crawlers and caching proxies, detecting mirrors can be valuable to avoid redundant fetching, and knowledge of mirroring can be used to compensate for broken links.
	Krishna Bharat, Andrei Broder, Monika Henzinger, Puneet Kumar, and Suresh Venkatasubramanian. The connectivity server: Fast access to linkage information on the web. In Proceedings of the Seventh International World-Wide Web Conference, April 1998.
	B. Bhushan et al. Managing heterogeneous networks: Integrator-based approach. In IFIP Transactions C (Communication Systems), 1993. The authors discuss an object-oriented approach to network management. Their goal is to briefly explain a real example of an integrated network management (INM) system. One of the major requirements when looking at information transfer between the managed network and the management system is to mask the heterogeneity of the underlying resources. As an example of the unification of heterogeneous networks, a software component called the Integrator has been designed and implemented. The Integrator is a mechanism that provides an object-oriented interface to the user (human or network management application programs) to offer a homogeneous view of a world (set of heterogeneous domains) through a model (depicting a formal information view). The Integrator uses two agents to communicate with underlying network elements: an SNMP agent accessing TCP/IP parameters for an Ethernet network through an SNMP agent, and an X.25 interface program doing the same for X.25 parameters through proprietary management software. The concepts of the Integrator have been applied in the EC project PEMMON.
	Timothy W. Bickmore and Bill N. Schilit. Digestor: Device-independent access to the world wide web. In Proceedings of the Sixth International World-Wide Web Conference, 1997.
	Eric Bier, Lance Good, Kris Popat, and Alan Newberger. A document corpus browser for in-depth reading. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. Software tools, including Web browsers, e-books, electronic document formats, search engines, and digital libraries are changing the way that people read, making it easier for them to find and view documents. However, while these tools provide significant help with short-term reading projects involving small numbers of documents, they fall short of supporting readers engaged in longer-term reading projects, in which a topic is to be understood in-depth by reading many documents. Such readers need to find and manage many documents and citations, remember what they have read, and prioritize what to read next. In this paper, we describe three integrated software tools that facilitate in-depth reading. A first tool extracts citation information from documents. A second finds on-line documents from their citations. The last is a document corpus browser that uses a zoomable user interface to show a corpus at multiple granularities while supporting reading tasks that take days, weeks, or longer. We describe these tools and the design principles that motivated them.
	Eric A. Bier and Adam Perer. Icon abacus and ghost icons. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. We present two techniques that make document collection visualizations more informative. Icon abacus uses the horizontal position of icon groups to communicate document attributes. Ghost icons show linked documents by adding temporary icons and by highlighting or dimming existing ones.
	William P. Birmingham. An agent-based architecture for digital libraries. D-Lib Magazine, July 1995. Format: HTML Document.
	William P. Birmingham, Karen M. Drabenstott, Carolyn O. Frost, Amy J. Warner, and Katherine Willis. The University of Michigan digital library: This is not your father's library. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document (36K). Audience: slightly technical, generalists comfortable with technology, funders. References: 13. Links: 1. Relevance: Medium-High. Abstract: Describes the UMichigan Digital Libraries proposal, including some detail about their agent architecture. User agents, collection-interface agents, and mediators all play a role. Network resources are allocated by a market-based mechanism, and the proposal mentions the need to protect intellectual property and handle payment issues.
	William P. Birmingham, Edmund H. Durfee, Tracy Mullen, and Michael P. Wellman. The distributed agent architecture of the University of Michigan digital library (extended abstract). In AAAI Spring Symposium on Information Gathering, 1995. Format: Compressed PostScript.
	Ann Peterson Bishop. Working towards an understanding of digital library use: A report on the user research efforts of the NSF/ARPA/NASA DLI projects. D-Lib Magazine, October 1995. Format: HTML Document.
	Ann Peterson Bishop. Making digital libraries go: Comparing use across genres. In Proceedings of the Fourth ACM International Conference on Digital Libraries, 1999. A new federal initiative called Information Technology for the Twenty-First Century (IT2) recognizes the need to bridge research across domains in order to bring computing benefits to society at large. One implication for digital library (DL) research is that we should start looking at projects that span the spectrum from basic computer science to the implementation of working systems and consider links among findings on information system use from a variety of arenas in life. In this paper, I integrate findings from my research on people's encounters with DLs in two different arenas: academia and low-income neighborhoods. The point is to see how concepts and conclusions related to use do, in fact, cross these arenas. The paper also aims to help bring results from studies of local community information practices into the realm of DLs, since community networking represents one particular genre and audience that has not yet received a great deal of attention from those engaged in DL research. Beginning with a discussion of DL use as an "assemblage" of infrastructure, norms, knowledge, and practice, the paper explores a number of insights gleaned from user studies associated with two separate research projects: 1) the recently completed University of Illinois Digital Libraries Initiative (DLI) project; and 2) the Community Networking Initiative (CNI) currently in progress under the auspices of the University of Illinois, the Urban League of Champaign County and Prairienet, the community network serving East Central Illinois.
Insights about DL use discussed in this paper include: the way in which trivial barriers are magnified until they effectively cut off use on a large scale; the difficulties faced by "outsiders" whose information worlds are impoverished; the primacy of comfort and relevant content in encouraging use; and the importance of informal social networks for providing help related to system use.
	Barclay Blair and John Boyer. XFDL: Creating electronic commerce transaction records using XML. In Proceedings of the Eighth International World-Wide Web Conference, 1999. In the race to transform the World Wide Web from a medium for information presentation to a medium for information exchange, the development of practices for ensuring the security, auditability, and non-repudiation of transactions that are well established in the paper-based world has not kept pace in the digital world. Existing Internet technology provides no easy way to create a valid "digital receipt" that meets the requirements of both complex distributed networks and the business community. In addition, an improved articulation of digital signatures is needed. Extensible Forms Description Language (XFDL), developed by UWI.Com and Tim Bray, is an application of XML that allows organizations to move their paper-based forms systems to the Internet while maintaining the necessary attributes of paper-based transaction records. XFDL was designed for implementation in business-to-business electronic commerce and intra-organizational information transactions.
	Catherine Blake. Information synthesis: A new approach to explore secondary information in scientific literature. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. Advances in both technology and publishing practices continue to increase the quantity of scientific literature that is available electronically. In this paper, we introduce the Information Synthesis process, a new approach that enables scientists to visualize, explore, and resolve contradictory findings that are inevitable when multiple empirical studies explore the same natural phenomena. Central to the Information Synthesis approach is a cyber-infrastructure that provides a scientist with both secondary information from an article and structured information resources. To demonstrate this approach, we have developed the Multi-User, Information Extraction for Information Synthesis (METIS) System. METIS is an interactive system that automates critical tasks within the Information Synthesis process. We provide two case-studies that demonstrate the utility of the Information Synthesis approach.
	J.A. Blakeley, W.J. McKenna, and G. Graefe. Experiences building the Open OODB query optimizer. In Proceedings of the International Conference on Management of Data, 1993. The authors report their experiences building the query optimizer for TI's Open OODB system. It is probably the first working object query optimizer to be based on a complete extensible optimization framework including logical algebra, execution algorithms, property enforcers, logical transformation rules, implementation rules, and selectivity and cost estimation. Their algebra incorporates a new materialize operator with its corresponding logical transformation and implementation rules that enable the optimization of path expressions. The Open OODB query optimizer was constructed using the Volcano Optimizer Generator, demonstrating that this second-generation optimizer generator enables rapid development of efficient and effective query optimizers for non-standard data models and systems.
	Ann Blandford, Suzette Keith, Iain Connell, and Helen Edwards. Analytical usability evaluation for digital libraries: a case study. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. There are two main kinds of approach to considering usability of any system: empirical and analytical. Empirical techniques involve testing systems with users, whereas analytical techniques involve usability personnel assessing systems using established theories and methods. We report here on a set of studies in which four different techniques were applied to various digital libraries, focusing on the strengths, limitations and scope of each approach. Two of the techniques, Heuristic Evaluation and Cognitive Walkthrough, were applied in textbook fashion, because there was no obvious way to contextualize them to the Digital Libraries (DL) domain. For the third, Claims Analysis, it was possible to develop a set of re-usable scenarios and personas that relate the approach specifically to DL development. The fourth technique, CASSM, relates explicitly to the DL domain by combining empirical data with an analytical approach. We have found that Heuristic Evaluation and Cognitive Walkthrough only address superficial aspects of interface design (but are good for that), whereas Claims Analysis and CASSM can help identify deeper conceptual difficulties (but demand greater skill of the analyst). However, none fit seamlessly within the fragmented function-oriented design practices that typify much digital library development, highlighting an important area for further work to support improved usability.
	Ann Blandford, Hanna Stelmaszewska, and Nick Bryan-Kinns. Use of multiple digital libraries: A case study. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001. The aim of the work reported here was to better understand the usability issues raised when digital libraries are used in a natural setting. The method used was a protocol analysis of users working on a task of their own choosing to retrieve documents from publicly available digital libraries. Various classes of usability difficulties were found. Here, we focus on use in context - that is, usability concerns that arise from the fact that libraries are accessed in particular ways, under technically and organisationally imposed constraints, and that use of any particular resource is discretionary. The concepts from an Interaction Framework, which provides support for reasoning about patterns of interaction between users and systems, are applied to understand interaction issues.
	R. Boisvert, S. Browne, J. Dongarra, and E. Grosse. Digital software and data repositories for support of scientific computing. In Advances in Digital Libraries '95, 1995. Format: Not Yet Online.
	Kurt D. Bollacker, Steve Lawrence, and C. Lee Giles. A system for automatic personalized tracking of scientific literature on the web. In Proceedings of the Fourth ACM International Conference on Digital Libraries, 1999. We introduce a system as part of the CiteSeer digital library project for automatic tracking of scientific literature that is relevant to a user's research interests. Unlike previous systems that use simple keyword matching, CiteSeer is able to track and recommend topically relevant papers even when keyword based query profiles fail. This is made possible through the use of a heterogeneous profile to represent user interests. These profiles combine several representations, including content based relatedness measures. The CiteSeer tracking system is well integrated into the search and browsing facilities of CiteSeer, and provides the user with great flexibility in tuning a profile to better match his or her interests. The software for this system is available, and a sample database is online as a public service.
	Leslie Bondaryk. Calculus modules online: An internet multimedia application. In DAGS'95, 1995. Format: HTML Document (21K + pictures). Audience: Calculus Instructors. References: 13. Links: 16. Abstract: Discusses an architecture for a system that aids in the teaching of calculus.
	J. Bonigk and A. Lubinski. A basic architecture for mobile information access. Computers & Graphics, 20(5):683-91, 1996. As the development of pen computing continues, more and more of today's computers are likely to move gradually away from people's desktops and into their pockets. The development of personal digital assistants (PDAs) has initiated this move. As these devices move into people's pockets, they need the ability to access information on the move. This article describes a generic view of a client server mobile computing architecture. It also sheds some light on the basic network topologies that have been considered previously for such systems. The scenario used is a hospital ward. Each doctor is equipped with a PDA and each ward or a group of wards with a server providing patient records. As a doctor visits a patient in a ward, the patient's record is accessed from the server onto the PDA. The doctor updates the record and sends the update back to the server.
	José Borbinha, Nuno Freire, and João Neves. BND: A national digital library as a jigsaw. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. This paper describes the architecture and components of the infrastructure under construction for the National Digital Library in Portugal. The requirements emerged from the definition of the services to be supported, with a special focus on scalability, and from the decision to pay special attention to community building standards, open solutions, and reusable and cost-effective components. The generic bibliographic metadata format in this project is UNIMARC, and the structural metadata is METS. The URN identifiers are processed and resolved as simple but very effective PURL identifiers, and storage is provided by the newly emerging LUSTRE file system, for immediate access, and by a locally developed GRID architecture, ARCO, for long-term preservation. All these components run on Linux servers, as does the access middleware, which is based on the FEDORA framework.
	N. Borenstein and N. Freed. MIME (Multipurpose Internet Mail Extensions) Part One: Mechanisms for specifying and describing the format of Internet message bodies, September 1993. Internet RFC 1521.
	Nathaniel Borenstein. Cooperative work in the Andrew Message System. In Proceedings of the Conference on Computer-Supported Cooperative Work, CSCW'88, 1988. Describes collaboration-related aspects of Andrew.
	Christine L. Borgman, Gregory Leazer, Anne Gilliland-Swetland, Kelli Millwood, Leslie Champeny, Jason Finley, and Laura J. Smart. How geography professors select materials for classroom lectures: Implications for the design of digital libraries. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. A goal of the Alexandria Digital Earth Prototype (ADEPT) project is to make primary resources in geography useful for undergraduate instruction in ways that will promote inquiry learning. The ADEPT education and evaluation team interviewed professors about their use of geography information as they prepare for class lectures, as compared to their research activities. We found that professors desired the ability to search by concept (erosion, continental drift, etc.) as well as geographic location, and that personal research collections were an important source of instructional materials. Resources in geo-spatial digital libraries are typically described by location, but are rarely described by concept or educational application. This paper presents implications for the design of an educational digital library from our observations of the lecture preparation process. Findings include functionality requirements for digital libraries and implications for the notion of digital libraries as a shared information environment. The functional requirements include definitions and enhancements of searching capabilities, the ability to contribute and to share personal collections of resources, and the capability to manipulate data and images.
	Katy Borner, Ying Feng, and Tamara McMahon. Collaborative visual interfaces to digital libraries. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. This paper argues for the design of collaborative visual interfaces to digital libraries that support social navigation. As an illustrative example we present work in progress on the design of a three-dimensional document space for a scholarly community - namely faculty, staff, and students at the School of Library and Information Science, Indiana University. We conclude with a set of research challenges.
	C. Mic Bowman, Peter B. Danzig, Darren R. Hardy, Udi Manber, Michael F. Schwartz, and Duane P. Wessels. Harvest: A scalable, customizable discovery and access system. Technical Report CU-CS-732-94, Dept. of Computer Science, Univ. of Colorado, Boulder, Colo., August 1994. Accessible at `http://harvest.transarc.com/`.
	C.M. Bowman, Peter B. Danzig, Darren R. Hardy, Udi Manber, and Michael F. Schwartz. The harvest information discovery and access system. Computer Networks and ISDN Systems, 28(1-2):119-125, December 1995. It is increasingly difficult to make effective use of Internet information, given the rapid growth in data volume, user base, and data diversity. We introduce Harvest, a system that provides a scalable, customizable architecture for gathering, indexing, caching, replicating, and accessing Internet information.
	Claus Brabrand, Anders Moller, Anders Sandholm, and Michael I. Schwartzbach. A runtime system for interactive web services. In Proceedings of the Eighth International World-Wide Web Conference, 1999. Interactive Web services are increasingly replacing traditional static Web pages. Producing Web services seems to require a tremendous amount of laborious low-level coding due to the primitive nature of CGI programming. We present ideas for an improved runtime system for interactive Web services built on top of CGI running on virtually every combination of browser and HTTP/CGI server. The runtime system has been implemented and used extensively in <bigwig>, a tool for producing interactive Web services.
	Onn Brandman, Junghoo Cho, Hector Garcia-Molina, and Narayanan Shivakumar. Crawler-friendly web servers. In Proceedings of the Workshop on Performance and Architecture of Web Servers (PAWS), Santa Clara, California, June 2000. Held in conjunction with ACM SIGMETRICS 2000. Available at http://dbpubs.stanford.edu/pub/2000-25. In this paper we study how to make web servers (e.g., Apache) more crawler friendly. Current web servers offer the same interface to crawlers and regular web surfers, even though crawlers and surfers have very different performance requirements. We evaluate simple and easy-to-incorporate modifications to web servers that yield significant bandwidth savings. Specifically, we propose that web servers export meta-data archives describing their content.
	Onn Brandman, Hector Garcia-Molina, and Andreas Paepcke. Where have you been? A comparison of three web tracking technologies. Submitted for publication, 1999. Available at http://dbpubs.stanford.edu/pub/1999-61. Web searching and browsing can be improved if browsers and search engines know which pages users frequently visit. 'Web tracking' is the process of gathering that information. The goal for Web tracking is to obtain a database describing Web page download times and users' page traversal patterns. The database can then be used for data mining or for suggesting popular or relevant pages to other users. We implemented three Web tracking systems, and compared their performance. In the first system, rather than connecting directly to Web sites, a client issues URL requests to a proxy. The proxy connects to the remote server and returns the data to the client, keeping a log of all transactions. The second system uses "sniffers" to log all HTTP traffic on a subnet. The third system periodically collects browser log files and sends them to a central repository for processing. The systems differ in their advantages and pitfalls. We present a comparison of these techniques.
	Jack Brassil. September - secure electronic publishing trial. In Proceedings of DL'96, 1996. Format: Not yet online.
	Lee Breslau, Pei Cao, Li Fan, Graham Phillips, and Scott Shenker. Web caching and Zipf-like distributions: Evidence and implications. In Proceedings of Infocom, 1999.
	Allen Brewer, Wei Ding, Karla Hahn, and Anita Komlodi. The role of intermediary services in emerging digital libraries. In Proceedings of DL'96, 1996. Format: Not yet online.
	M.W. Bright, A.R. Hurson, and S. Pakzad. Automated resolution of semantic heterogeneity in multidatabases. ACM Transactions on Database Systems, 19(2):212-253, June 1994.
	M.W. Bright, A.R. Hurson, and Simin H. Pakzad. A taxonomy and current issues in multidatabase systems. IEEE Computer, 25(3):51-60, March 1992. This article presents a taxonomy of global information-sharing systems and discusses where multidatabase systems fit in the spectrum of solutions. The authors use this taxonomy as a basis for defining multidatabase systems, then discuss the issues associated with them. In particular, the paper focuses on two major design approaches: global schema systems and multidatabase language systems.
	Brightplanet.com. http://www.brightplanet.com.
	The Deep Web: Surfacing Hidden Value. http://www.completeplanet.com/Tutorials/DeepWeb/.
	S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. In Proceedings of the 7th World Wide Web Conference, 1998. In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype with a full text and hyperlink database of at least 24 million pages is available at http://google.stanford.edu/. To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them. Furthermore, due to rapid advances in technology and web proliferation, creating a web search engine today is very different from three years ago. This paper provides an in-depth description of our large-scale web search engine - the first such detailed public description we know of to date. Apart from the problems of scaling traditional search techniques to data of this magnitude, there are new technical challenges involved with using the additional information present in hypertext to produce better search results. This paper addresses the question of how to build a practical large-scale system which can exploit the additional information present in hypertext. Also we look at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.
	Sergey Brin, James Davis, and Hector Garcia-Molina. Copy detection mechanisms for digital documents. SIGMOD, pages 398-409, 1995. In a digital library system, documents are available in digital form and therefore are more easily copied and their copyrights are more easily violated. This is a very serious problem, as it discourages owners of valuable information from sharing it with authorized users. There are two main philosophies for addressing this problem: prevention and detection. The former actually makes unauthorized use of documents difficult or impossible while the latter makes it easier to discover such activity. We propose a system for registering documents and then detecting copies, either complete copies or partial copies. We describe algorithms for such detection, and metrics required for evaluating detection mechanisms (covering accuracy, efficiency, and security). We also describe a working prototype, called COPS, describe implementation issues, and present experimental results that suggest the proper settings for copy detection parameters.
	Sergey Brin. Extracting patterns and relations from the world wide web. In WebDB Workshop at 6th International Conference on Extending Database Technology, EDBT'98, 1998. Available at http://www-db.stanford.edu/~sergey/extract.ps. Seed a search with examples of a pattern, such as citations to books. Let the engine run over Web pages and learn. Get back more books.
	Sergey Brin and Lawrence Page. The anatomy of a large-scale hypertextual web search engine. In Proceedings of the Seventh International World-Wide Web Conference, 1998. Shows architecture of Google.
	Sergey Brin and Lawrence Page. Dynamic data mining: A new architecture for data with high dimensionality. Technical report, Stanford University, 1998. Describes a new architecture for data mining. It makes use of some of the dynamic itemset counting technology.
	Andrei Broder, Ravi Kumar, Farzin Maghoul, Prabhakar Raghavan, Sridhar Rajagopalan, Raymie Stata, Andrew Tomkins, and Janet Wiener. Graph structure in the web: experiments and models. In Proceedings of the Ninth International World-Wide Web Conference, 2000.
	Eric W. Brown, James P. Callan, and W. Bruce Croft. Fast incremental indexing for full-text information retrieval. In Proceedings of the Twentieth International Conference on Very Large Databases, pages 192-202, September 1994.
	Eric W. Brown, James P. Callan, W. Bruce Croft, and J. Eliot B. Moss. Supporting full-text information retrieval with a persistent object store. In Proceedings of the Fourth International Conference on Extending Database Technology-EDBT'94, pages 365-378, March 1994.
	Michael S. Brown and W. Brent Seales. Beyond 2d images: Effective 3d imaging for library materials. In Proceedings of the Fifth ACM International Conference on Digital Libraries, 2000. Significant efforts are being made to digitize rare and valuable library materials, with the goal of providing patrons and historians digital facsimiles that capture the look and feel of the original materials. This is often done by digitally photographing the materials and making high resolution 2D images available. The underlying assumption is that the objects are flat. However, older materials may not be flat in practice, being warped and crinkled due to decay, neglect, accident and the passing of time. In such cases, 2D imaging is insufficient to capture the look and feel of the original. For these materials, 3D acquisition is necessary to create a realistic facsimile. This paper outlines a technique for capturing an accurate 3D representation of library materials which can be integrated directly into current digitization setups. This will allow digitization efforts to provide patrons with more realistic digital facsimiles of library materials.
	Michael S. Brown and W. Brent Seales. The digital atheneum: New approaches for preserving, restoring and analyzing damaged manuscripts. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001. This paper presents research focused on developing new techniques and algorithms for the digital acquisition, restoration, and study of damaged manuscripts. We present results from an acquisition effort in partnership with the British Library, funded through the NSF DLI-2 program, designed to capture 3-D models of old and damaged manuscripts. We show how these 3-D facsimiles can be analyzed and manipulated in ways that are tedious or even impossible if confined to the physical manuscript. In particular, we present results from a restoration framework we have developed for flattening the 3-D representation of badly warped manuscripts. We expect these research directions to give scholars more sophisticated methods to preserve, restore, and better understand the physical objects they study.
	Michael S. Brown and Desmond Tsoi. Correcting common image distortions in library materials acquired by a camera [short paper]. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. We present a technique to correct image distortion that can occur when library materials are imaged by cameras. Our approach provides a general framework to undo a variety of common distortions, including binder curl, fold distortion, and combinations of the two. Our algorithm is described and demonstrated on several examples.
	Shirley Browne, Jack Dongarra, Eric Grosse, and Tom Rowan. The netlib mathematical software repository. D-Lib Magazine, Sep 1995. Format: HTML Document.
	Shirley Browne, Jack Dongarra, Ken Kennedy, and Tom Rowan. Management of the national hpcc software exchange: a virtual distributed digital library. In Proceedings of the Second Annual Conference on the Theory and Practice of Digital Libraries, 1995. Format: HTML Document (35K + pictures). Audience: Computer scientists, mathematicians, librarians. References: 15. Links: 13. Relevance: Low-Medium. Abstract: Describes the NHSE software repository, with files kept at authors' sites but a central index in a common form (prepared manually now, but hopefully automatically later). Includes a special process for submission and revision via web forms, with digital signatures (PGP) required for authentication. Accepted files are fingerprinted using MD5 so that modifications can be detected. A scheme of LIFNs (Location-Independent File Names) is essentially a precursor to URNs.
	Peter Brusilovsky, Rosta Farzan, and Jaewook Ahn. Comprehensive personalized information access in an educational digital library. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. This paper explores two ways to help students locate the most relevant resources in educational digital libraries. One is more comprehensive access to educational resources through several modes of information access, including browsing and information visualization. The other is personalized information access through social navigation support. The paper presents the details of the Knowledge Sea III system for comprehensive personalized access to educational resources and reports the results of a classroom study. The study delivered a convincing argument for the importance of providing several modes of information access, showing that only about 10% of all resource accesses were made through the traditional search interface. We have also collected some good evidence in favor of the social navigation support.
	George Buchanan, David Bainbridge, Katherine Don, and Ian H. Witten. A new framework for building digital library collections. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. This paper introduces a new framework for building digital library collections and contrasts it with existing systems. It describes a significant new step in the development of a widely-used open-source digital library system, Greenstone, which has evolved over many years. It is supported by a fresh implementation, which forced us to rethink the entire design rather than making incremental improvements. The redesign capitalizes on the best ideas from the existing system, which have been refined and developed to open new avenues through which digital librarians can tailor their collections. We demonstrate its flexibility by showing how digital library collections can be extended and altered to satisfy new requirements.
	George Buchanan and Annika Hinze. A generic alerting service for digital libraries. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. Users of modern digital libraries (DLs) can keep themselves up-to-date by searching and browsing their favorite collections, or more conveniently by resorting to an alerting service. The alerting service notifies its clients about new or changed documents. So far, no sophisticated service has been proposed that covers heterogeneous and distributed collections and is integrated with the digital library software. This paper analyses the conceptual requirements of this much sought-after service for digital libraries. We demonstrate that the differing concepts of digital libraries and their underlying technical designs have extensive influence on (a) the expectations, needs and interests of users regarding an alerting service, and (b) the technical possibilities for implementing the service. Our findings show that the range of issues surrounding alerting services for digital libraries, their design and use is greater than one may anticipate. We also show that, conversely, the requirements for an alerting service have considerable impact on the concepts of DL design. Our findings should be of interest to librarians as well as system designers. We highlight and discuss the far-reaching implications for the design of, and interaction with, libraries. This paper discusses the lessons learned from building such a distributed alerting service. We present our prototype implementation as a proof-of-concept for an alerting service for open DL software.
	John Buford. Evaluation of a query language for structured hypermedia documents. In DAGS '95, 1995. Format: Not Yet On-line. Audience: Technical, HyTime developers. References: 17. Links: . Relevance: Low. Abstract: HyTime is an ISO standard for time-based hypermedia documents. This paper discusses an implementation of a database and search engine operating in that language, with examples of queries, optimizations, etc.
	J. Bumiller and S. Rather. Electronic meeting assistance. In Human Computer Interaction. Vienna Conference, VCHCI '93 Fin de Siecle Proceedings, pages 425-6, Sep 1993. The Electronic Meeting Assistance (EMA) is a virtually co-located mixed system. That means that all participants of the meeting are present at the same time but not necessarily at the same location (for example, some people meet in a room and an external, remote expert is included via a local area network). During the meeting the personal Notepads of the participants are linked together using a radio LAN. In addition, an interactive whiteboard, e.g. the Xerox LiveBoard, is used for visualisation and manipulation of common data. To assist cooperative work, the EMA system supports the exchange of information during meetings. Various kinds of information can be exchanged between meeting members, for example contact information, prepared notes and diagrams; electronic presentations can be given or a paper can be edited by the group.
	P. Buneman, S.B. Davidson, K. Hart, C. Overton, and L. Wong. A data transformation system for biological data sources. In Proceedings of the Twenty-first International Conference on Very Large Databases, Zurich, Switzerland, 1995. VLDB Endowment, Saratoga, Calif.
	Robin Burke and Kristian J. Hammond. Combining databases and knowledge bases for assisted browsing. In AAAI Spring Symposium on Information Gathering, 1995. Format: Compressed PostScript.
	Vannevar Bush. As we may think. The Atlantic Monthly, July 1945.
	Christoph Bussler, Stefan Jablonski, Thomas Kirsche, Hans Schuster, and Hartmut Wedekind. Architectural issues of distributed workflow management systems. In V. Malyshkin, editor, Parallel Computing Technologies. Third International Conference, PACT-95, Proceedings, Berlin, Germany, 1995. Springer-Verlag. A specific task of distributed and parallel information systems is workflow management. In particular, workflow management systems execute business processes that run on top of distributed and parallel information systems. Parallelism is due to performance requirements and involves data and applications that are spread across a heterogeneous, distributed computing environment. Heterogeneity and distribution of the underlying computing infrastructure should be made transparent in order to simplify programming and use. We introduce an implementation architecture for workflow management systems that meets these requirements. Scalability (through transparent parallelism) and transparency with respect to distribution and heterogeneity are the major characteristics of this architecture. A generic client/server class library in an object-oriented environment demonstrates the feasibility of the approach.
	Sasa Buvac and Richard Fikes. A declarative formalization of knowledge translation. In AAAI Spring Symposium on Information Gathering, 1995. Format: Compressed PostScript.
	Orkut Buyukkokten, Hector Garcia-Molina, and Andreas Paepcke. Accordion summarization for end-game browsing on pdas and cellular phones. In Proceedings of the Conference on Human Factors in Computing Systems CHI'01, 2001. We demonstrate a new browsing technique for devices with small displays such as PDAs or cellular phones. We concentrate on end-game browsing, where the user is close to or on the target page. We make browsing more efficient and easier by Accordion Summarization. In this technique the Web page is first represented as a short summary. The user can then drill down to discover relevant parts of the page. If desired, keywords can be highlighted and exposed automatically. We discuss our techniques, architecture, interface facilities, and the result of user evaluations. We measured a 57% improvement in browsing speed and a 75% reduction in input effort.
	Orkut Buyukkokten, Hector Garcia-Molina, and Andreas Paepcke. Focused web searching with pdas. In Proceedings of the Ninth International World-Wide Web Conference, 2000. The Stanford Power Browser project addresses the problems of interacting with the World-Wide Web through wirelessly connected Personal Digital Assistants (PDAs). These problems include bandwidth limitations, screen real-estate shortage, battery capacity, and the time costs of pen-based search keyword input. As a way to address bandwidth and battery life limitations, we provide local site search facilities for all sites. We incrementally index Web sites in real time as the PDA user visits them. These indexes have narrow scope at first, and improve as the user dwells on the site, or as more users visit the site over time. We address the keyword input problem by providing site specific keyword completion, and indications of keyword selectivity within sites. The system is implemented on the Palm Pilot platform, using a Metricom radio link. We describe the user level experience, and then present the analyses that informed our technical decisions.
	Orkut Buyukkokten, Hector Garcia-Molina, and Andreas Paepcke. Seeing the whole in parts: Text summarization for web browsing on handheld devices. In Proceedings of the Tenth International World-Wide Web Conference, 2001. Available at http://dbpubs.stanford.edu/pub/2001-45. We introduce five methods for summarizing parts of Web pages on handheld devices, such as personal digital assistants (PDAs), or cellular phones. Each Web page is broken into text units that can each be hidden, partially displayed, made fully visible, or summarized. The methods accomplish summarization by different means. One method extracts significant keywords from the text units, another attempts to find each text unit's most significant sentence to act as a summary for the unit. We use information retrieval techniques, which we adapt to the World-Wide Web context. We tested the relative performance of our five methods by asking human subjects to accomplish single-page information search tasks using each method. We found that the combination of keywords and single-sentence summaries works best for a variety of search tasks.
	Orkut Buyukkokten, Hector Garcia-Molina, Andreas Paepcke, and Terry Winograd. Power browser: Efficient web browsing for pdas. In Proceedings of the Conference on Human Factors in Computing Systems CHI'00, 2000. We have designed and implemented new Web browsing facilities to support effective navigation on Personal Digital Assistants (PDAs) with limited capabilities: low bandwidth, small display, and slow CPU. The implementation supports wireless browsing from 3Com's Palm Pilot. An HTTP proxy fetches web pages on the client's behalf and dynamically generates summary views to be transmitted to the client. These summaries represent both the link structure and contents of a set of web pages, using information about link importance. We discuss the architecture, user interface facilities, and the results of comparative performance evaluations. We measured a 45% gain in browsing speed and a 42% reduction in required pen movements.
	Orkut Buyukkokten, Junghoo Cho, Hector Garcia-Molina, Luis Gravano, and Narayanan Shivakumar. Exploiting geographical location information of web pages. In Proceedings of Workshop on Web Databases (WebDB'99), June 1999. Held in conjunction with ACM SIGMOD'99. Available at http://dbpubs.stanford.edu/pub/1999-4. Many information sources on the web are relevant primarily to specific geographical communities. For instance, web sites containing information on restaurants, theatres and apartment rentals are relevant primarily to web users in geographical proximity to these locations. We make the case for identifying and exploiting the geographical location information of web sites so that web applications can rank information in a geographically sensitive fashion. For instance, when a user in Palo Alto issues a query for "Italian Restaurants," a web search engine can rank results based on how close such restaurants are to the user's physical location rather than based on traditional IR measures. In this paper, we first consider how to compute the geographical location of web pages. Subsequently, we consider how to exploit such information in one specific "proof-of-concept" application we implemented in Java.
	Donald Byrd. A scrollbar-based visualization for document navigation. In Proceedings of the Fourth ACM International Conference on Digital Libraries, 1999. We are interested in questions of improving user control in best-match text-retrieval systems, specifically questions as to whether simple visualizations that nonetheless go beyond the minimal ones generally available can significantly help users. Recently, we have been investigating ways to help users decide, given a set of documents retrieved by a query, which documents and passages are worth closer examination. We built a document viewer incorporating a visualization centered around a novel content-displaying scrollbar and color term highlighting, and studied whether the visualization is helpful to non-expert searchers. Participants' reaction to the visualization was very positive, while the objective results were inconclusive.
	Donald Byrd. Music-notation searching and digital libraries. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001. Almost all work on music information retrieval to date has concentrated on music in the audio and event (normally MIDI) domains. However, music in the form of notation, especially Conventional Music Notation (CMN), is of much interest to musically-trained persons, both amateurs and professionals, and searching CMN has great value for digital music libraries. One obvious reason little has been done on music retrieval in CMN form is the overwhelming complexity of CMN, which requires a very substantial investment in programming before one can even begin studying music IR. This paper reports on work adding music-retrieval capabilities to Nightingale, an existing professional-level music-notation editor.
	Donald Byrd and Eric Isaacson. Music representation in a digital music library [short paper]. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. The Variations2 digital music library currently supports music in audio and score-image formats. In a future version, we plan to add music in a symbolic form. This paper describes our work defining a music representation suitable for the needs of our users.
	Deng Cai, Xiaofei He, Ji-Rong Wen, and Wei-Ying Ma. Block-level link analysis. In SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, pages 440-447, New York, NY, USA, 2004. ACM Press. Link Analysis has shown great potential in improving the performance of web search. PageRank and HITS are two of the most popular algorithms. Most of the existing link analysis algorithms treat a web page as a single node in the web graph. However, a web page often contains multiple semantic topics, so it should not necessarily be treated as an atomic node. In this paper, the web page is partitioned into blocks using the vision-based page segmentation algorithm. By extracting the page-to-block, block-to-page relationships from link structure and page layout analysis, we can construct a semantic graph over the WWW such that each node exactly represents a single semantic topic. This graph can better describe the semantic structure of the web. Based on block-level link analysis, we propose two new algorithms, Block Level PageRank and Block Level HITS, whose performance we study extensively using web data.
	Pável P. Calado, Marcos A. Gonçalves, Edward A. Fox, Berthier Ribeiro-Neto, Alberto H. F. Laender, Altigran S. da Silva, Davi C. Reis, Pablo A. Roberto, Monique V. Vieira, and Juliano P. Lage. The web-dl environment for building digital libraries from the web. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. The Web contains a huge volume of unstructured data, which is difficult to manage. In digital libraries, on the other hand, information is explicitly organized, described, and managed. Community-oriented services are built to meet specific information needs and tasks. In this paper, we describe an environment, Web-DL, that allows the construction of digital libraries from the Web. The Web-DL environment will allow us to collect data from the Web, standardize it and publish it through a digital library system. It provides support to services and organizational structure normally available in digital libraries, but benefiting from the breadth of the Web contents. We experimented with applying the Web-DL environment to the Networked Digital Library of Theses and Dissertations (NDLTD), thus demonstrating that the rapid construction of DLs from the Web is possible. Also, Web-DL provides an alternative as a large-scale solution for interoperability between independent digital libraries.
	Jinwei Cao and Jay F. Nunamaker Jr. Question answering on lecture videos: A multifaceted approach. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. In this paper, we introduce a multifaceted approach for question answering on lecture videos. Text extracted from PowerPoint slides associated with the lecture videos is used as a source of domain knowledge to boost the answer extraction performance on these domain specific videos. The three steps of this approach are described and the evaluation plan is discussed.
	Pei Cao, Jin Zhang, and Kevin Beach. Active cache: Caching dynamic contents on the web. In Proceedings of IFIP International Conference on Distributed Systems Platforms and Open Distributed Processing (Middleware '98), pages 373-388, 1998.
	Stuart K. Card, George G. Robertson, and William York. The webbook and the web forager: An information workspace for the world-wide web. In Proceedings of the Conference on Human Factors in Computing Systems CHI'96, 1996.
	Michael J. Carey and Donald Kossmann. On saying "enough already!" in sql. In Proceedings of the International Conference on Management of Data, pages 219-230, Tucson, Arizona, 1997. ACM Press, New York.
	Michael J. Carey and Donald Kossmann. Reducing the braking distance of an sql query engine. In Proceedings of the Twenty-fourth International Conference on Very Large Databases, pages 158-169, New York City, USA, 1998. VLDB Endowment, Saratoga, Calif.
	Jeromy Carriere and Rick Kazman. Webquery: Searching and visualizing the web through connectivity. In Proceedings of the Sixth International World-Wide Web Conference, 1997.
	Chad Carson, Megan Thomas, Serge Belongie, Joseph M. Hellerstein, and Jitendra Malik. Blobworld: A system for region-based image indexing and retrieval. In Proceedings of the Third International Conference on Visual Information Systems, June 1999.
	Silvana Castano, Maria Grazia Fugini, Giancarlo Martella, and Pierangela Samarati. Database Security. Addison-Wesley, 1994. This is a comprehensive book on database security. Chapters 1, 2, and 3 describe information security, security models, and security mechanisms and software from a general point of view. Chapter 4 gives a detailed survey of database security design. Chapter 5 explores the problem of security in statistical databases. Chapter 6 describes different approaches to intrusion detection. Chapter 7 explores security models for next-generation databases (active db, oodb).
	Nohema Castellanos and Alfredo Sánchez. Pops: Mobile access to digital library resources [short paper]. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. Mobile devices represent new opportunities for accessing digital libraries (DLs) but also pose a number of challenges given the diversity of their hardware and software features. We describe a framework aimed at facilitating the generation of interfaces for access to DL resources from a wide range of mobile devices.
	Donatella Castelli and Pasquale Pagano. A system for building expandable digital libraries. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. Expandability is one of the main requirements of future digital libraries. This paper introduces a digital library service system, OpenDLib, that has been designed to be highly expandable in terms of content, services, and usage. The paper illustrates the mechanisms that enable expandability and discusses their impact on the development of the system architecture.
	Duncan Cavens, Stephen Sheppard, and Michael Meitner. Image database extension to arcview: How to find the photograph you want. In Proceedings of ESRI Users Conference, 2001.
	William B. Cavnar and Andrew M. Gillies. Data retrieval and the realities of document conversion. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document (9K). Audience: Semi-technical, general computer science. References: 5. Links: 1. Relevance: Low. Abstract: Discusses the need for inexact matching, e.g., to tolerate OCR recognition errors. Proposes using N-grams, overlapping sequences of N adjacent letters, as search targets. Also describes research on matching directly in images of scanned documents (without OCR). Some results on mail sorting & census data.
	Augusto Celentano et al. Knowledge-based document retrieval in office environments: The kabiria system. ACM Transactions on Information Systems, 13(3):237-268, July 1995. In the office environment, the retrieval of documents is performed using the concepts contained in the documents, information about the procedural context where the documents are used, and information about the regulations and laws that govern the life of documents within a given application domain. To fulfill the requirements of such a sophisticated retrieval, we propose a document retrieval model and system based on the representation of knowledge describing the semantic contents of documents, the way in which the documents are managed by procedures and by people in the office, and the application domain where the office operates. The article describes the knowledge representation issues needed for the document retrieval system and presents a document retrieval model that captures these issues. The effectiveness of the approach is illustrated by describing a system, named Kabiria, built on top of such a model. The article describes the querying and browsing environments, and the architecture of the system.
	Stefano Ceri, Sara Comai, Ernesto Damiani, Piero Fraternali, Stefano Paraboschi, and Letizia Tanca. Xml-gl: A graphical language for querying and restructuring xml documents. In Proceedings of the Eighth International World-Wide Web Conference, 1999. The growing acceptance of XML as a standard for semi-structured documents on the Web opens up challenging opportunities for Web query languages. In this paper we introduce XML-GL, a graphical query language for XML documents. The use of a visual formalism for representing both the content of XML documents (and of their DTDs) and the syntax and semantics of queries enables an intuitive expression of queries, even when they are rather complex. XML-GL is inspired by G-log, a general purpose, logic-based language for querying structured and semi-structured data. The paper presents the basic capabilities of XML-GL through a sequence of examples of increasing complexity.
	Stefano Ceri and Giuseppe Pelagatti. Distributed Databases. McGraw-Hill, Inc., 1984. Textbook.
	Common Gateway Interface (CGI). http://hoohoo.ncsa.uiuc.edu/cgi/overview.html.
	Wei Chai and Barry Vercoe. Structural analysis of musical signals for indexing and thumbnailing. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. A musical piece typically has a repetitive structure. Analysis of this structure will be useful for music segmentation, indexing and thumbnailing. This paper presents an algorithm that can automatically analyze the repetitive structure of musical signals. First, the algorithm detects the repetition of each segment of fixed length in a piece using dynamic programming. Second, the algorithm summarizes this repetition information and infers the structure based on heuristic rules. The performance of the approach is demonstrated visually using figures for qualitative evaluation, and by two structural similarity measures for quantitative evaluation. Based on the structural analysis result, this paper also proposes a method for music thumbnailing. The preliminary results obtained using a corpus of Beatles' songs show that automatic structural analysis and thumbnailing of music are possible.
	S. Chakrabarti and S. Muthukrishnan. Resource scheduling for parallel database and scientific applications. In 8th ACM Symposium on Parallel Algorithms and Architectures, pages 329-335, June 1996.
	Soumen Chakrabarti, Byron Dom, David Gibson, Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, and Andrew Tomkins. Spectral filtering for resource discovery. In ACM SIGIR workshop on Hypertext Information Retrieval on the Web, 1998.
	Soumen Chakrabarti, Byron Dom, and Piotr Indyk. Enhanced hypertext categorization using hyperlinks. In Proceedings of the International Conference on Management of Data, 1998.
	Soumen Chakrabarti, Byron Dom, Prabhakar Raghavan, Sridhar Rajagopalan, David Gibson, and Jon Kleinberg. Automatic resource compilation by analyzing hyperlink structure and associated text. In Proceedings of the Seventh International World-Wide Web Conference, 1998.
	Soumen Chakrabarti, David A. Gibson, and Kevin S. McCurley. Surfing the web backwards. In Proceedings of the Eighth International World-Wide Web Conference, 1999. From a user's perspective, hypertext links on the Web form a directed graph between distinct information sources. We investigate the effects of discovering `backlinks' from Web resources, namely links pointing to the resource. We describe tools for backlink navigation on both the client and server side, using an applet for the client and a module for the Apache Web server. We also discuss possible extensions to the HTTP protocol to facilitate the collection and navigation of backlink information in the World Wide Web.
	Soumen Chakrabarti, Martin van den Berg, and Byron Dom. Focused crawling: A new approach to topic-specific web resource discovery. In Proceedings of the Eighth International World-Wide Web Conference, 1999. The rapid growth of the World-Wide Web poses unprecedented scaling challenges for general-purpose crawlers and search engines. In this paper we describe a new hypertext resource discovery system called a Focused Crawler. The goal of a focused crawler is to selectively seek out pages that are relevant to a pre-defined set of topics. The topics are specified not using keywords, but using exemplary documents. Rather than collecting and indexing all accessible Web documents to be able to answer all possible ad-hoc queries, a focused crawler analyzes its crawl boundary to find the links that are likely to be most relevant for the crawl, and avoids irrelevant regions of the Web. This leads to significant savings in hardware and network resources, and helps keep the crawl more up-to-date. To achieve such goal-directed crawling, we designed two hypertext mining programs that guide our crawler: a classifier that evaluates the relevance of a hypertext document with respect to the focus topics, and a distiller that identifies hypertext nodes that are great access points to many relevant pages within a few links. We report on extensive focused-crawling experiments using several topics at different levels of specificity. Focused crawling acquires relevant pages steadily while standard crawling quickly loses its way, even though they are started from the same root set. Focused crawling is robust against large perturbations in the starting set of URLs. It discovers largely overlapping sets of resources in spite of these perturbations. It is also capable of exploring out and discovering valuable resources that are dozens of links away from the start set, while carefully pruning the millions of pages that may lie within this same radius. Our anecdotes suggest that focused crawling is very effective for building high-quality collections of Web documents on specific topics, using modest desktop hardware.
	Jim Challenger, Paul Dantzig, and Arun Iyengar. A scalable system for consistently caching dynamic web data. In Proceedings of the 18th Annual Joint Conference of the IEEE Computer and Communications Societies, New York, New York, 1999.
	Jim Challenger, Arun Iyengar, Karen Witting, Cameron Ferstat, and Paul Reed. A publishing system for efficiently creating dynamic web content. In Proceedings of IEEE INFOCOM 2000, Tel Aviv, Israel, 2000.
	M. Chalmers, K. Rodden, and D. Brodbeck. The order of things: activity-centered information access. In Proceedings of the 7th World Wide Web Conference, 1998. This paper focuses on the representation and access of Web-based information, and how to make such a representation adapt to the activities or interests of individuals within a community of users. The heterogeneous mix of information on the Web restricts the coverage of traditional indexing techniques and so limits the power of search engines. In contrast to traditional methods, and in a way that extends collaborative filtering approaches, the path model centers representation on usage histories rather than content analysis. By putting activity at the center of representation and not the periphery, the path model concentrates on the reader not the author and the browser not the site. We describe metrics of similarity based on the path model, and their application in a URL recommender tool and in visualising sets of URLs.
	Leslie Champeny, Christine L. Borgman, Patricia Mautone, Richard E. Mayer, Richard A. Johnson, Gregory H. Leazer, Anne J. Gilliland-Swetland, Kelli A. Millwood, Leonard D'Avolio, Jason Finley, and Laura J. Smart. Developing a digital learning environment: an evaluation of design and implementation processes. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. The Alexandria Digital Earth Prototype (ADEPT) Project (1999-2004) builds upon the Alexandria Digital Library Project (1994-99) to add functions and services for undergraduate teaching to a digital library of geospatial resources. The Digital Learning Environment (DLE) services are being developed and evaluated iteratively over the course of this research project. In the 2002-2003 academic year, the DLE was implemented in stages during the fall and spring terms in undergraduate geography courses at the University of California, Santa Barbara (UCSB). Evaluation of the fall term implementation identified design issues of time and complexity of use in the services for creating and organizing course domain knowledge. By the time of the spring term implementation, these issues were addressed and new services added for integrating selected course content into a variety of class presentation formats. The implementation was evaluated via interviews with the course instructor, development staff, and students, and by observations (in person and videotaped) of the course. Results of the iterative evaluation indicated that usability and functionality for the instructor had increased between the two course offerings. Students found classroom presentations to be useful for understanding concepts, and Web access to the presentations useful for study and review. Assessments of student learning suggest modest improvements over time. Developers are now applying lessons learned during these implementations to improve the system for subsequent implementation in the 2003-04 academic year.
	Alvin T.S. Chan. Web-enabled smart card for ubiquitous access of patient's medical record. In Proceedings of the Eighth International World-Wide Web Conference, 1999. The combined benefits of the smart card, which supports mobility in a pocket, and the ubiquitous access of Web technology present a new paradigm for medical information access systems. The paper describes the framework of Java Card Web Servlet (JCWS) that is being developed to provide a seamless access interface between a Web browser and a Java-enabled smart card. Importantly, the smart card is viewed as a mobile repository of Web objects comprising HTML pages, medical data objects, and record browsing and updating applets. As the patient moves between hospitals, clinics and countries, the mobile smart-card database dynamically binds to the JCWS framework to facilitate truly ubiquitous access and updating of medical information via a standard Web-browser interface.
	Chen-Chuan K. Chang and Hector Garcia-Molina. Evaluating the cost of Boolean query mapping. In Proceedings of the Second ACM International Conference on Digital Libraries, 1997. Available at http://dbpubs.stanford.edu/pub/1997-25.
	Chen-Chuan K. Chang and Hector Garcia-Molina. Conjunctive constraint mapping for data translation. Technical Report SIDL-WP-1998-0083; 1998-47, Stanford University, January 1998. Accessible at http://dbpubs.stanford.edu/pub/1998-47.
	Chen-Chuan K. Chang and Héctor García-Molina. Mind your vocabulary: Query mapping across heterogeneous information sources. In Proceedings of the International Conference on Management of Data, pages 335-346, Philadelphia, Pa., June 1999. ACM Press, New York.
	Chen-Chuan K. Chang, Héctor García-Molina, and Andreas Paepcke. Boolean query mapping across heterogeneous information sources. IEEE Transactions on Knowledge and Data Engineering, 8(4):515-521, Aug 1996. A very technical, formal description of query translation, but it includes the architecture diagram.
	Chen-Chuan K. Chang, Héctor García-Molina, and Andreas Paepcke. Boolean query mapping across heterogeneous information sources (extended version). Technical Report SIDL-WP-1996-0044; 1996-1, Dept. of Computer Science, Stanford Univ., Stanford, California, Sep 1996. Accessible at http://dbpubs.stanford.edu/pub/1996-1. Extended version of the paper of the same title that appeared in TKDE, Aug. 1996.
	Chen-Chuan K. Chang, Héctor García-Molina, and Andreas Paepcke. Predicate rewriting for translating Boolean queries in a heterogeneous information system. ACM Transactions on Information Systems, 17(1):1-39, January 1999. Available at http://dbpubs.stanford.edu/pub/1999-34. Searching over heterogeneous information sources is difficult in part because of the nonuniform query languages. Our approach is to allow users to compose Boolean queries in one rich front-end language. For each user query and target source, we transform the user query into a subsuming query that can be supported by the source but that may return extra documents. The results are then processed by a filter query to yield the correct final results. In this article we introduce the architecture and associated mechanism for query translation. In particular, we discuss techniques for rewriting predicates in Boolean queries into native subsuming forms, which is the basis for translating complex queries. In addition, we present experimental results for evaluating the cost of postfiltering. We also discuss the drawbacks of this approach and cases when it may not be effective. We have implemented prototype versions of these mechanisms and demonstrated them on heterogeneous Boolean systems.
	Chew-Hung Chang, John G Hedberg, Yin-Leng Theng, Ee-Peng Lim, Tiong-Sa Teh, and Dion Hoe-Lian Goh. Evaluating G-Portal for geography learning and teaching. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. This paper describes G-Portal, a geospatial digital library of geographical assets, providing an interactive platform to engage students in active manipulation and analysis of information resources and collaborative learning activities. Using a G-Portal application in which students conducted a field study of an environmental problem of beach erosion and sea level rise, we describe a pilot study to evaluate usefulness and usability issues in supporting geography learning, and in turn teaching.
	Chia-Hui Chang and Ching-Chi Hsu. Customizable multi-engine search tool with clustering. In Proceedings of the Sixth International World-Wide Web Conference, 1997.
	Edward Chang. An image coding and reconstruction scheme for mobile computing. In Proceedings of the 5th IDMS (Springer-Verlag LNCS 1483), pages 137-148, Oslo, Norway, September 1998. Accessible at http://dbpubs.stanford.edu/pub/1997-10. An asynchronous transfer mode (ATM) wireless network has bursty and high error rates. To combat the contiguous bit loss due to damaged or dropped packets, this paper presents a code packetization and image reconstruction scheme. The packetization method distributes the loss in both frequency and spatial domains to reduce the chance that adjacent DCT blocks lose the same frequency components. The image reconstruction takes into consideration the spatial characteristics represented by the frequency components. Combining these two approaches makes it possible to reconstruct the damaged images more accurately, even under very high loss rates. In addition, since the reconstruction technique is computationally efficient, it conserves system resources and power, which are limited in mobile computers.
	Edward Chang and Hector Garcia-Molina. Minimizing memory requirements in media servers. Technical Report SIDL-WP-1996-0045; 1996-4, Stanford University, December 1996.
	Edward Chang and Héctor García-Molina. Reducing initial latency in a multimedia storage system. In Third International Workshop of Multimedia Database Systems, 1996. A multimedia server delivers presentations (e.g., videos, movies), providing high bandwidth and continuous real-time delivery. In this paper we present techniques for reducing the initial latency of presentations, i.e., for reducing the time between the arrival of a request and the start of the presentation. Traditionally, initial latency has not received much attention. This is because one major application of multimedia servers is movies on demand, where a delay of a few minutes before a new multi-hour movie starts is acceptable. However, latency reduction is important in interactive applications such as video games and browsing of multimedia documents. Various latency reduction schemes are proposed and analyzed, and their performance compared. We show that our techniques can significantly reduce (almost eliminate in some cases) initial latency without adversely affecting throughput. Moreover, a novel on-disk partial data replication scheme that we propose proves to be far more cost-effective than any previous attempt at reducing initial latency. Keywords: multimedia, data placement, data replication.
	Edward Chang and Hector Garcia-Molina. Effective memory use in a media server. In Proceedings of the 23rd Very Large Data Base (VLDB) Conference, 1997.
	Edward Chang and Hector Garcia-Molina. Medic: A memory & disk cache for multimedia clients. Technical Report SIDL-WP-1997-0076; 1997-9, Stanford University, October 1997.
	Edward Chang and Hector Garcia-Molina. Reducing initial latency in media servers. In IEEE Multimedia, volume 4, 1997.
	Edward Chang and Hector Garcia-Molina. Cost-based media server design. In Proceedings of the 8th International Workshop on Research Issues in Data Engineering, Feb 1998.
	Michelle Chang, John J. Leggett, Richard Furuta, Andruid Kerne, J. Patrick Williams, Samuel A. Burns, and Randolph G. Bias. Collection understanding. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. Collection understanding shifts the traditional focus of retrieval in large collections from locating specific artifacts to gaining a comprehensive view of the collection. Visualization tools are critical to the process of efficient collection understanding. When such tools present simple visual interfaces and intuitive methods of interacting with a collection, users come to understand the essence of the collection by focusing on the artifacts. This paper discusses a practical approach for enhancing collection understanding in image collections.
	Yee-Hsiang Chang and Ellis Chi. HTGraph: A new method for information access over the World Wide Web. In DAGS '95, 1995. Format: HTML Document (27K + pictures). Audience: Web surfers and computer scientists. References: 11. Links: 0. Relevance: Low. Abstract: Describes a browser which prefetches pages, builds a graph showing the relationships among those pages, and allows you to jump down in the hierarchy. User-specified cutoff for how many nodes should be expanded. No attempt to automatically cluster. Describes naive data structures to implement a breadth-first search of the space.
	Mitchell N. Charity. Multiple standards? No problem. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document (6K). Audience: Non-technical, standards committee members. References: 0. Links: 1. Relevance: Medium-low. Abstract: Argues for an IETF rather than an ISO model of standards committees. Encourages several different protocols, with gateways constructed as needed, and generally letting the marketplace determine what survives.
	Michael Chau. Personalized spiders for web search and analysis. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001. Searching for useful information on the World Wide Web has become increasingly difficult. While Internet search engines have been helping people to search on the web, low recall rates and outdated indexes have become more and more problematic as the web grows. In addition, search tools usually present to the user only a list of search results, failing to provide further personalized analysis which could help users identify useful information and comprehend these results. To alleviate these problems, we propose a client-based architecture that incorporates noun phrasing and self-organizing map techniques. Two systems, namely CI Spider and Meta Spider, have been built based on this architecture. User evaluation studies have been conducted and the findings suggest that the proposed architecture can effectively facilitate web search and analysis.
	Michael Chau, Hsinchun Chen, Jialun Qin, Yilu Zhou, Yi Qin, Wai-Ki Sung, and Daniel McDonald. Comparison of two approaches to building a vertical search tool: A case in the nanotechnology domain. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. As the Web has been growing exponentially, it has become increasingly difficult to search for desired information. In recent years, many domain-specific (vertical) search tools have been developed to serve the information needs of specific fields. This paper describes two approaches to building a domain-specific search tool. We report our experience in building two different tools in the nanotechnology domain - (1) a server-side search engine, and (2) a client-side search agent. The designs of the two search systems are presented and discussed, and their strengths and weaknesses are compared. To our knowledge, this paper is the first to compare prototype vertical search systems built by the two different approaches. Some future research directions are also discussed.
	Michael Chau, Jialun Qin, Yilu Zhou, Chunju Tseng, and Hsinchun Chen. SpidersRUs: Automated development of vertical search engines in different domains and languages. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. In this paper we discuss the architecture of a tool designed to help users develop vertical search engines in different domains and different languages. The design of the tool is presented, and an evaluation study shows that the system is easier to use than other existing tools.
	Surajit Chaudhuri. Finding nonrecursive envelopes for Datalog predicates. In Proceedings of the 12th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 135-146, Washington, D.C., 1993. ACM Press, New York.
	Surajit Chaudhuri and Phokion G. Kolaitis. Can Datalog be approximated? In Proceedings of the 13th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 86-96, Minneapolis, Minn., 1994. ACM Press, New York.
	Francine Chen, Marti Hearst, Julian Kupiec, Jan Pedersen, and Lynn Wilcox. Mixed-media access. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document (8K). Audience: Researchers, esp. in the area of multi-media searching. References: 8. Links: 1. Relevance: Low-Medium. Abstract: Essentially a set of pointers to Xerox PARC reports. Describes projects related to scatter/gather, automatic segmenting, ``keyword'' search equivalents for audio & video, and summarization.
	Guanling Chen and David Kotz. A survey of context-aware mobile computing research. Technical Report TR2000-381, Dartmouth College, 2000.
	M. Chen, M. Hearst, J. Hong, and J. Lin. Cha-Cha: A system for organizing intranet search results. In Proceedings of the Second USENIX Symposium on Internet Technologies and Systems (USITS), 1999.
	Yixin Chen and James Z. Wang. A region-based fuzzy feature matching approach to content-based image retrieval. IEEE Trans. Pattern Anal. Mach. Intell., 24(9):1252-1267, 2002.
	Yuan Chen, Jan Edler, Andrew Goldberg, Allan Gottlieb, Sumeet Sobti, and Peter Yianilos. A prototype implementation of archival intermemory. In Proceedings of the Fourth ACM International Conference on Digital Libraries, 1999. An Archival Intermemory solves the problem of highly survivable digital data storage in the spirit of the Internet. In this paper we describe a prototype implementation of Intermemory, including an overall system architecture and implementations of key system components. The result is a working Intermemory that tolerates up to 17 simultaneous node failures, and includes a Web gateway for browser-based access to data. Our work demonstrates the basic feasibility of Intermemory and represents significant progress towards a deployable system.
	S. Cheshire and M. Baker. Internet mobility 4x4. In Proceedings of the ACM SIGCOMM'96 Conference, Aug 1996.
	S. Cheshire and M. Baker. A wireless network in MosquitoNet. IEEE Micro, Feb 1996.
	David Chesnutt. The model editions partnership: Historical editions in the digital age. D-Lib Magazine, Nov 1995. Format: HTML Document.
	Ed H. Chi, James Pitkow, Jock Mackinlay, Peter Pirolli, Rich Gossweiler, and Stuart K. Card. Visualizing the evolution of web ecologies. In Proceedings of the Conference on Human Factors in Computing Systems CHI'98, 1998.
	Boris Chidlovskii, Claudia Roncancio, and Marie-Luise Schneider. Semantic cache mechanism for heterogeneous web querying. In Proceedings of the Eighth International World-Wide Web Conference, 1999. In Web-based searching systems that access distributed information providers, efficient query processing requires an advanced caching mechanism to reduce the query response time. Keyword-based querying is often the only way to retrieve data from Web providers, and therefore standard page-based and tuple-based caching mechanisms turn out to be unsuitable for such a task. In this work, we develop a mechanism for efficient caching of Web queries and the answers received from heterogeneous Web providers. We also report results of experiments and show how the caching mechanism is implemented in the Knowledge Broker system.
	Boris Chidlovskii. Schema extraction from XML collections. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. XML Schema language has been proposed to replace Document Type Definitions (DTDs) as the schema mechanism for XML data. This language consistently extends grammar-based constructions with constraint- and pattern-based ones and has higher expressive power than DTDs. As schemas remain optional for XML, we address the problem of XML Schema extraction. We model the XML schema as extended context-free grammars and develop a novel extraction algorithm inspired by methods of grammatical inference. The algorithm also copes with the schema determinism requirement imposed by XML DTDs and XML Schema languages.
	R. Chimera, K. Wolman, S. Mark, and B. Shneiderman. An exploratory evaluation of three interfaces for browsing large hierarchical tables of contents. ACM Transactions on Information Systems, 12(4):383-406, 1994.
	Junghoo Cho and Hector Garcia-Molina. Estimating frequency of change. Submitted for publication, 2000. Available at http://dbpubs.stanford.edu/pub/2000-4.
	Junghoo Cho and Hector Garcia-Molina. The evolution of the web and implications for an incremental crawler. In Proceedings of the Twenty-sixth International Conference on Very Large Databases, 2000. Available at http://dbpubs.stanford.edu/pub/1999-22. In this paper we study how to build an effective incremental crawler. The crawler selectively and incrementally updates its index and/or local collection of web pages, instead of periodically refreshing the collection in batch mode. The incremental crawler can improve the ``freshness'' of the collection significantly and bring in new pages in a more timely manner. We first present results from an experiment conducted on more than half a million web pages over 4 months, to estimate how web pages evolve over time. Based on these experimental results, we compare various design choices for an incremental crawler and discuss their trade-offs. We propose an architecture for the incremental crawler, which combines the best design choices.
	Junghoo Cho and Hector Garcia-Molina. Synchronizing a database to improve freshness. In Proceedings of the International Conference on Management of Data, 2000. Available at http://dbpubs.stanford.edu/pub/1999-40. In this paper we study how to refresh a local copy of an autonomous data source to keep the copy up-to-date. As the size of the data grows, it becomes more difficult to keep the copy ``fresh,'' making it crucial to synchronize the copy effectively. We define two freshness metrics, change models of the underlying data, and synchronization policies. We analytically study how effective the various policies are. We also experimentally verify our analysis, based on data collected from 270 web sites for more than 4 months, and we show that our new policy improves ``freshness'' very significantly compared to current policies in use.
	Junghoo Cho, Hector Garcia-Molina, and Lawrence Page. Efficient crawling through URL ordering. In Proceedings of the Seventh International World-Wide Web Conference, 1998. Available at http://dbpubs.stanford.edu/pub/1998-51. In this paper we study in what order a crawler should visit the URLs it has seen, in order to obtain more ``important'' pages first. Obtaining important pages rapidly can be very useful when a crawler cannot visit the entire Web in a reasonable amount of time. We define several importance metrics, ordering schemes, and performance evaluation measures for this problem. We also experimentally evaluate the ordering schemes on the Stanford University Web. Our results show that a crawler with a good ordering scheme can obtain important pages significantly faster than one without.
	Junghoo Cho, Narayanan Shivakumar, and Hector Garcia-Molina. Computing document clusters on the web. In Proceedings of the International Conference on Management of Data, 1998. They crawl the Web and automatically find out which sites completely or partially mirror each other.
	Junghoo Cho, Narayanan Shivakumar, and Hector Garcia-Molina. Finding replicated web collections. In Proceedings of the International Conference on Management of Data, 2000. Available at http://dbpubs.stanford.edu/pub/1999-64. Many web documents (such as Java FAQs) are being replicated on the Internet. Often entire document collections (such as hyperlinked Linux manuals) are being replicated many times. In this paper, we make the case for identifying replicated documents and collections to improve web crawlers, archivers, and ranking functions used in search engines. The paper describes how to efficiently identify replicated documents and hyperlinked document collections. The challenge is to identify these replicas from an input data set of several tens of millions of web pages and several hundreds of gigabytes of textual data. We also present two real-life case studies where we used replication information to improve a crawler and a search engine. We report these results for a data set of 25 million web pages (about 150 gigabytes of HTML data) crawled from the web.
	Michael G. Christel and Ronald M. Conescu. Addressing the challenge of visual information access from digital image and video libraries. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. While it would seem that digital video libraries should benefit from access mechanisms directed to their visual contents, years of TREC Video Retrieval Evaluation (TRECVID) research have shown that text search against transcript narrative text provides almost all the retrieval capability, even with visually oriented generic topics. A within-subjects study involving 24 novice participants on TRECVID 2004 tasks again confirms this result. The study shows that satisfaction is greater and performance is significantly better on specific and generic information retrieval tasks from news broadcasts when transcripts are available for search. Additional runs with 7 expert users reveal different novice and expert interaction patterns with the video library interface, helping explain the novices’ lack of success with image search and visual feature browsing for visual information needs. Analysis of TRECVID visual features well suited for particular generic tasks provides additional insights into the role of automated feature classification for digital image and video libraries.
	Michael G. Christel, Bryan Maher, and Andrew Begun. XSLT for tailored access to a digital video library. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001. Surrogates, summaries, and visualizations have been developed and evaluated for accessing a digital video library containing thousands of documents and terabytes of data. These interfaces, formerly implemented within a monolithic stand-alone application, are being migrated to XML and XSLT for delivery through web browsers. The merits of these interfaces are presented, along with a discussion of the benefits in using W3C recommendations such as XML and XSLT for delivering tailored access to video over the web.
	V. Christophides, S. Abiteboul, S. Cluet, and M. Scholl. From structured documents to novel query facilities. In Proceedings of the International Conference on Management of Data, pages 313-324. ACM Press, New York, 1994.
	Wesley W. Chu, M. A. Merzbacher, and L. Berkovich. The design and implementation of CoBase. In Proceedings of the International Conference on Management of Data, pages 517-522, Washington, D.C., 1993. ACM Press, New York.
	Yi-Chun Chu, David Bainbridge, Matt Jones, and Ian H. Witten. Realistic books: A bizarre homage to an obsolete medium? In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. For many readers, handling a physical book is an enjoyably exquisite part of the information seeking process. Many physical characteristics of a book (its size, heft, the patina of use on its pages, and so on) communicate ambient qualities of the document it represents. In contrast, the experience of accessing and exploring digital library documents is dull. The emphasis is utilitarian; technophile rather than bibliophile. We have extended the page-turning algorithm we reported at last year's JCDL into a scalable, systematic approach that allows users to view and interact with realistic visualizations of any text-based document in a Greenstone collection. Here, we further motivate the approach, illustrate the system in use, discuss the system architecture and present a user evaluation. Our work leads us to believe that far from being a whimsical gimmick, physical book models can usefully complement conventional document viewers and increase the perceived value of a digital library system.
	Yi-Chun Chu, Ian H. Witten, Richard Lobb, and David Bainbridge. How to turn the page [short paper]. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. Can digital libraries provide a reading experience that more closely resembles a real book than a scrolled or paginated electronic display? This paper describes a prototype page-turning system that realistically animates full three-dimensional page-turns. The dynamic behavior is generated by a mass-spring model defined on a rectangular grid of particles. The prototype takes a PDF or E-book file, renders it into a sequence of PNG images representing individual pages, and animates the page-turns under user control. The simulation behaves fairly naturally, although more computer graphics work is required to perfect it.
	Lian-Heong Chua, Dion Hoe-Lian Goh, Ee-Peng Lim, Zehua Liu, and Rebecca Pei-Hui Ang. A digital library for geography examination resources. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. We describe a Web-based application built on top of a digital library of geographical resources for Singapore students preparing to take a national examination in geography. The application provides an interactive, non-sequential approach to learning that supplements textbooks.
	Yi-Ming Chung, Qin He, Kevin Powell, and Bruce Schatz. Semantic indexing for a complete subject discipline. In Proceedings of the Fourth ACM International Conference on Digital Libraries, 1999. As part of the Illinois Digital Library Initiative (DLI) project we developed ``scalable semantics'' technologies. These statistical techniques enabled us to index large collections for deeper search than word matching. Through the auspices of the DARPA Information Management program, we are developing an integrated analysis environment, the Interspace Prototype, that uses ``semantic indexing'' as the foundation for supporting concept navigation. These semantic indexes record the contextual correlation of noun phrases, and are computed generically, independent of subject domain. Using this technology, we were able to compute semantic indexes for a subject discipline. In particular, in the summer of 1998, we computed concept spaces for 9.3M MEDLINE bibliographic records from the National Library of Medicine (NLM) which extensively covered the biomedical literature for the period from 1966 to 1997. In this experiment, we first partitioned the collection into smaller collections (repositories) by subject, extracted noun phrases from titles and abstracts, then performed semantic indexing on these subcollections by creating a concept space for each repository. The computation required 2 days on a 128-node SGI/CRAY Origin 2000 at the National Center for Supercomputing Applications (NCSA). This experiment demonstrated the feasibility of scalable semantics techniques for large collections. With the rapid increase in computing power, we believe this indexing technology will shortly be feasible on personal computers.
	Marcus Tullius Cicero. De Oratore. Loeb Classical Library, 55 B.C. Book II, sec. 350ff.
	W. V. Citrin and M. D. Gross. PDA-based graphical interchange for field service and repair workers. Computers & Graphics, 20(5):641-649, 1996. We present an ongoing project to develop a system to provide field service workers with timely and accurate service information. The system will allow workers to download diagrams or photographs from a host computer's central database onto a PDA. The workers will be able to annotate the diagrams to reflect work performed, and later upload the annotations to the host computer, where they will be integrated into an updated database. Diagram recognition functionality is distributed between the PDA (which performs low-level shape and handwriting recognition) and the host computer (which performs high-level domain-based diagram recognition). Distributing the functionality offers a number of advantages: it allows the relatively resource-poor PDA to be part of a powerful diagram recognition environment, it allows the use of standardized hardware-based recognition facilities in a domain-based recognition system, and it allows off-line drawing recognition and storage of diagrams, thereby avoiding excessive use of slow or expensive communications channels.
	Edith Cohen, Haim Kaplan, and Jeffrey Oldham. Managing TCP connections under persistent HTTP. In Proceedings of the Eighth International World-Wide Web Conference, 1999. HyperText Transfer Protocol (HTTP) traffic dominates Internet traffic. The exchange of HTTP messages is implemented using the connection-oriented TCP. HTTP/1.0 establishes a new TCP connection for each HTTP request, resulting in many consecutive short-lived TCP connections. The emerging HTTP/1.1 reduces latencies and overhead from closing and re-establishing connections by supporting persistent connections as a default. A TCP connection which is kept open and reused for the next HTTP request reduces overhead and latency. Open connections, however, consume sockets and memory for socket buffers. This trade-off establishes a need for connection-management policies. We propose policies that exploit embedded information in the HTTP request messages, e.g., senders' identities and requested URLs, and compare them to the fixed-timeout policy used in the current implementation of the Apache Web server. An experimental evaluation of connection management policies at Web servers, conducted using Web server logs, shows that our URL-based policy consistently outperforms other policies and achieves a significant 15-25% [...] with respect to the fixed-timeout policy, thus allowing Web servers and clients to more fully reap the benefits of persistent HTTP.
	William W. Cohen and Wei Fan. Learning page-independent heuristics for extracting data from web pages. In Proceedings of the Eighth International World-Wide Web Conference, 1999. One bottleneck in implementing a system that intelligently queries the Web is developing `wrappers' - programs that extract data from Web pages. Here we describe a method for learning general, page-independent heuristics for extracting data from HTML documents. The input to our learning system is a set of working wrapper programs, paired with HTML pages they correctly wrap. The output is a general procedure for extracting data that works for many formats and many pages. In experiments with a collection of 84 constrained but realistic extraction problems, we demonstrate that 30% of the problems can be handled perfectly by learned extraction heuristics, and around 50% can be handled acceptably. We also demonstrate that learned page-independent extraction heuristics can substantially improve the performance of methods for learning page-specific wrappers.
	Tammara T.A. Combs and Benjamin B. Bederson. Does zooming improve image browsing? In Proceedings of the Fourth ACM International Conference on Digital Libraries, 1999. We describe an image retrieval system we built based on a Zoomable User Interface (ZUI). We also discuss the design, results and analysis of a controlled experiment we performed on the browsing aspects of the system. The experiment resulted in a statistically significant difference in the interaction between number of images (25, 75, 225) and style of browser (2D, ZUI, 3D). The 2D and ZUI browser systems performed equally, and both performed better than the 3D systems. The image browsers tested during the experiment include Cerious Software's Thumbs Plus, TriVista Technology's Simple Landscape and Photo GoRound, and our Zoomable Image Browser based on Pad++.
	Jeff Conklin and Michael L. Begeman. gIBIS: A hypertext tool for exploratory policy discussion. In Proceedings of the Conference on Computer-Supported Cooperative Work, CSCW'88, 1988.
	Paul Conway. Yale University Library's Project Open Book: Preliminary research findings. D-Lib Magazine, Feb 1996. Format: HTML Document.
	Brian Cooper, Mayank Bawa, Neil Daswani, and Hector Garcia-Molina. Protecting the PIPE from malicious peers. Technical Report 2002-97, Stanford University, 2002. Digital materials can be protected from failures by replicating them at multiple autonomous, distributed sites. A significant challenge in such a distributed system is ensuring that documents are replicated and accessible despite malicious sites. Such sites may hinder the replication of documents in a variety of ways, including agreeing to store a copy but erasing it instead, refusing to serve a document, or serving an altered version of the document. We describe the design of a Peer-to-peer Information Preservation and Exchange (PIPE) network: a distributed replication system that protects documents both from failures and from malicious nodes. We present the design of a PIPE system, discuss a threat model for malicious sites, and propose basic solutions for managing these malicious sites.
	Brian Cooper, Mayank Bawa, Neil Daswani, and Hector Garcia-Molina. Protecting the PIPE from malicious peers. Technical Report 2002-03, Stanford University, 2002. Digital materials can be protected from failures by replicating them at multiple autonomous, distributed sites. A Peer-to-peer Information Preservation and Exchange (PIPE) network is a good way to build a distributed replication system. A significant challenge in such networks is ensuring that documents are replicated and accessible despite malicious sites. Such sites may hinder the replication of documents in a variety of ways, including agreeing to store a copy but erasing it instead, refusing to serve a document, or serving an altered version of the document. We define a model of PIPE networks, a threat model for malicious sites, and propose basic solutions for managing these malicious sites. The basic solutions are inefficient, but demonstrate that a secure system can be built. We also sketch ways to improve the efficiency of the system.
	Brian Cooper, Arturo Crespo, and Hector Garcia-Molina. Implementing a reliable digital object archive. Submitted for publication, 2000. Available at http://dbpubs.stanford.edu/pub/2000-27. An Archival Repository reliably stores digital objects for long periods of time (decades or centuries). The archival nature of the system requires new techniques for storing, indexing, and replicating digital objects. In this paper we discuss the specialized indexing needs of a write-once archive. We also present a reliability algorithm for effectively replicating sets of related objects. We describe an administrative user interface and a data import utility for archival repositories. Finally, we discuss and evaluate a prototype repository we have built, the Stanford Archival Vault, SAV.
	Brian Cooper and Hector Garcia-Molina. InfoMonitor: Unobtrusively archiving a World Wide Web server. Submitted for publication, 2000. Available at http://dbpubs.stanford.edu/pub/2000-15. It may be important to provide long-term preservation of digital data even when that data is stored in an unreliable system, such as a filesystem, a legacy database, or even the World Wide Web. In this research paper we focus on the problem of archiving the contents of a web site without disrupting users who maintain the site. We propose an archival storage system, the InfoMonitor, in which a reliable archive is integrated with an unmodified existing store. Implementing such a system presents various challenges related to the mismatch of features between the components, such as differences in naming and data manipulation operations. We examine each of these issues as well as solutions for the conflicts that arise. We also discuss our experience using the InfoMonitor to archive the Stanford Database Group's web site.
	Brian Cooper and Hector Garcia-Molina. Peer to peer data trading to preserve information. Technical Report 2000-33, Stanford University, 2000. Data archiving systems rely on replication to preserve information. In this paper, we discuss how a network of autonomous archiving sites can trade data to achieve the most reliable replication. A series of binary trades between sites produces a peer to peer archiving network. We examine two trading algorithms, one based on trading collections (even if they are different sizes) and another based on trading equal-sized blocks of space (which can then store collections). We introduce the concept of deeds, which track the sites that own space at other sites. We then discuss policies for tuning these algorithms to provide the highest reliability, for example by changing the order in which sites are contacted and offered trades. Finally, we present simulation results that reveal which policies are most reliable.
	Brian Cooper and Hector Garcia-Molina. Peer to peer data trading to preserve information (extended version). Technical Report 2000-38, Stanford University, 2000. Data archiving systems rely on replication to preserve information. In this paper, we discuss how a network of autonomous archiving sites can trade data to achieve the most reliable replication. A series of binary trades between sites produces a peer to peer archiving network. We examine two trading algorithms, one based on trading collections (even if they are different sizes) and another based on trading equal-sized blocks of space (which can then store collections). We introduce the concept of deeds, which track the sites that own space at other sites. We then discuss policies for tuning these algorithms to provide the highest reliability, for example by changing the order in which sites are contacted and offered trades. Finally, we present simulation results that reveal which policies are most reliable.
	Brian Cooper and Hector Garcia-Molina. Bidding for storage space in a peer-to-peer data preservation system. Technical Report 2001-52, Stanford University, 2001. Digital archives protect important data collections from failures by making multiple copies at other archives, so that there are always several good copies of a collection. In a cooperative replication network, sites ``trade'' space, so that each site contributes storage resources to the system and uses storage resources at other sites. Here, we examine bid trading: a mechanism where sites conduct auctions to determine who to trade with. A local site wishing to make a copy of a collection announces how much remote space is needed, and accepts bids for how much of its own space the local site must ``pay'' to acquire that remote space. We examine the best policies for determining when to call auctions and how much to bid, as well as the effects of ``maverick'' sites that attempt to subvert the bidding system. Simulations of auction and trading sessions indicate that bid trading can allow sites to achieve higher reliability than the alternative: a system where sites trade equal amounts of space without bidding.
	Brian Cooper and Hector Garcia-Molina. Creating trading networks of digital archives. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001. Digital archives can best survive failures if they have made several copies of their collections at remote sites. In this paper, we discuss how autonomous sites can cooperate to provide preservation by trading data. We examine the decisions that an archive must make when forming trading networks, such as the amount of storage space to provide and the best number of partner sites. We also deal with the fact that some sites may be more reliable than others. Experimental results from a data trading simulator illustrate which policies are most reliable. Our techniques focus on preserving the ``bits'' of digital collections; other services that focus on other archiving concerns (such as preserving meaningful metadata) can be built on top of the system we describe here.
	Brian Cooper and Hector Garcia-Molina. Creating trading networks of digital archives. Technical Report 2001-04, Stanford University, 2001. Digital archives can best survive failures if they have made several copies of their collections at remote sites. In this paper, we discuss how autonomous sites can cooperate to provide preservation by trading data. We examine the decisions that an archive must make when forming trading networks, such as the amount of storage space to provide and the best number of partner sites. We also deal with the fact that some sites may be more reliable than others. Experimental results from a data trading simulator illustrate which policies are most reliable.
	Brian Cooper and Hector Garcia-Molina. Creating trading networks of digital archives. Technical Report 2001-23, Stanford University, 2001. Digital archives can best survive failures if they have made several copies of their collections at remote sites. In this paper, we discuss how autonomous sites can cooperate to provide preservation by trading data. We examine the decisions that an archive must make when forming trading networks, such as the amount of storage space to provide and the best number of partner sites. We also deal with the fact that some sites may be more reliable than others. Experimental results from a data trading simulator illustrate which policies are most reliable. Our techniques focus on preserving the ``bits'' of digital collections; other services that focus on other archiving concerns (such as preserving meaningful metadata) can be built on top of the system we describe here.
	Brian Cooper and Hector Garcia-Molina. Peer to peer data trading to preserve information. Technical Report 2001-7, Stanford University, 2001. Data archiving systems rely on replication to preserve information. In this paper, we discuss how a network of autonomous archiving sites can trade data to achieve the most reliable replication. A series of binary trades between sites produces a peer to peer archiving network. We examine two trading algorithms, one based on trading collections (even if they are different sizes) and another based on trading equal-sized blocks of space (which can then store collections). We introduce the concept of deeds, which track the sites that own space at other sites. We then discuss policies for tuning these algorithms to provide the highest reliability, for example by changing the order in which sites are contacted and offered trades. Finally, we present simulation results that reveal which policies are most reliable.
	Brian Cooper and Hector Garcia-Molina. Peer to peer data trading to preserve information (extended version). Technical Report 2001-6, Stanford University, 2001. Data archiving systems rely on replication to preserve information. In this paper, we discuss how a network of autonomous archiving sites can trade data to achieve the most reliable replication. A series of binary trades between sites produces a peer to peer archiving network. We examine two trading algorithms, one based on trading collections (even if they are different sizes) and another based on trading equal-sized blocks of space (which can then store collections). We introduce the concept of deeds, which track the sites that own space at other sites. We then discuss policies for tuning these algorithms to provide the highest reliability, for example by changing the order in which sites are contacted and offered trades. Finally, we present simulation results that reveal which policies are most reliable.
	Brian Cooper and Hector Garcia-Molina. Bidding for storage space in a peer-to-peer data preservation system (extended version). Technical Report 2002-22, Stanford University, 2002. Digital archives protect important data collections from failures by making multiple copies at other archives, so that there are always several good copies of a collection. In a cooperative replication network, sites ``trade'' space, so that each site contributes storage resources to the system and uses storage resources at other sites. Here, we examine bid trading: a mechanism where sites conduct auctions to determine who to trade with. A local site wishing to make a copy of a collection announces how much remote space is needed, and accepts bids for how much of its own space the local site must ``pay'' to acquire that remote space. We examine the best policies for determining when to call auctions and how much to bid, as well as the effects of ``maverick'' sites that attempt to subvert the bidding system. Simulations of auction and trading sessions indicate that bid trading can allow sites to achieve higher reliability than the alternative: a system where sites trade equal amounts of space without bidding.
	Brian F. Cooper and Hector Garcia-Molina. Modeling and measuring scalable peer-to-peer search networks. Technical Report 2002-44, Stanford University, 2002. The popularity of peer-to-peer search networks continues to grow, even as the limitations to the scalability of existing systems become apparent. We propose a simple model for search networks, called the search/index links (SIL) model. The SIL model describes existing networks while also yielding organizations not previously studied. Using simulation results, we argue that a new organization, parallel search clusters, is superior to existing supernode networks in many cases.
	Brian F. Cooper and Hector Garcia-Molina. Modeling and measuring scalable peer-to-peer search networks (extended version). Technical Report 2002-43, Stanford University, 2002. The popularity of peer-to-peer search networks continues to grow, even as the limitations to the scalability of existing systems become apparent. We propose a simple model for search networks, called the search/index links (SIL) model. The SIL model describes existing networks while also yielding organizations not previously studied. Using simulation results, we argue that a new organization, parallel search clusters, is superior to existing supernode networks in many cases.
	James W. Cooper, Mahesh Viswanathan, Donna Byron, and Margaret Chan. Building searchable collections of enterprise speech data. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001. We have applied speech recognition and text-mining technologies to a set of recorded outbound marketing calls and analyzed the results. Since speaker-independent speech recognition technology results in a significantly lower recognition rate than that found when the recognizer is trained for a particular speaker, we applied a number of post-processing algorithms to the output of the recognizer to render it suitable for the Textract text mining system. We indexed the call transcripts using a search engine and used Textract and associated Java technologies to place the relevant terms for each document in a relational database. Following a search query, we generated a thumbnail display of the results of each call with the salient terms highlighted. We illustrate these results and discuss their utility. We took the results of these experiments and continued this analysis on a set of talks and presentations. We describe a distinct document genre based on the note-taking concept of document content, and propose a significant new method for measuring speech recognition accuracy. This procedure is generally relevant to the problem of capturing meetings and talks and providing a searchable index of these presentations on the web.
	Matthew Cooper, Jonathan Foote, Andreas Girgensohn, and Lynn Wilcox. Temporal event clustering for digital photo collections. In Proceedings of the Eleventh ACM International Conference on Multimedia, pages 364-373. ACM Press, 2003.
	Antony Corfield, Matthew Dovey, Richard Mawby, and Colin Tatham. JAFER ToolKit project: Interfacing Z39.50 and XML. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. In this paper, we describe the JAFER ToolKit project, which is developing a simplified XML-based API above the Z39.50 protocol. The ToolKit allows the development of Z39.50-based applications (both clients and servers) without detailed knowledge of the complexities of the protocol.
	Digital Equipment Corporation. MilliCent. MilliCent website: http://www.millicent.digital.com/.
	Microsoft Corporation. Microsoft Wallet. Microsoft Wallet website: http://www.microsoft.com/wallet/.
	Steve Cousins. Reification and Affordances in a User Interface for Interacting with Heterogeneous Distributed Applications. PhD thesis, Stanford University, 1997. Steve Cousins's Ph.D. thesis.
	Steve B. Cousins. A task-oriented interface to a digital library. In CHI 96 Conference Companion, pages 103-104, 1996.
	Steve B. Cousins, Scott W. Hassan, Andreas Paepcke, and Terry Winograd. Towards wide-area distributed interfaces. Technical Report SIDL-WP-1996-0037; 1997-67, Stanford University, 1996. Available at http://dbpubs.stanford.edu/pub/1997-67. Describes how the DLITE design enables shifting of functionality among distributed components.
	Steve B. Cousins, Steven P. Ketchpel, Andreas Paepcke, Héctor García-Molina, Scott W. Hassan, and Martin Roescheisen. InterPay: Managing multiple payment mechanisms in digital libraries. In Proceedings of the Second Annual Conference on the Theory and Practice of Digital Libraries, 1995. Format: HTML Document (39K + pictures). Audience: Computer Scientists. References: 10. Links: 8. Relevance: High. Abstract: Describes an architecture called InterPay for allowing heterogeneous payment mechanisms to interoperate. Defines three levels (a task level, payment policy level, and payment mechanism level) that may be modified independently. Describes a working prototype using the ILU distributed object system from Xerox. Shows a sample transaction using the architecture, and how the components of the architecture (payment agents, collection agents, and payment and collection capabilities) can be used in more complex transactions.
	Steve B. Cousins, Steven P. Ketchpel, Andreas Paepcke, Hector Garcia-Molina, Scott W. Hassan, and Martin Röscheisen. InterPay: Managing multiple payment mechanisms in digital libraries. Digital Library, 1995. InterPay paper.
	Steve B. Cousins, Andreas Paepcke, Scott W. Hassan, and Terry Winograd. Towards wide-area distributed interfaces. Technical Report SIDL-WP-1996-0037; 1997-67, Stanford University, 1997. At http://dbpubs.stanford.edu/pub/1997-67. We have designed and prototyped a series of interfaces for Digital Libraries. These interfaces use CORBA objects to distribute interface modeling and rendering across machines. We describe the design tensions arising in the context of such distribution, locate existing UI technology in the resulting design space, and explain the location of our final prototype in that space. We view Digital Libraries as collections of repositories and publication-related services that may be distributed over large distances and must be accessible from many locations and through multiple hardware, software, and networking platforms. We describe our use of CORBA and briefly introduce a drag-and-drop interface developed to provide unified access to heterogeneous Digital Library resources.
	Steve B. Cousins, Andreas Paepcke, Terry Winograd, Eric A. Bier, and Ken Pier. The digital library integrated task environment (DLITE). In Proceedings of the Second ACM International Conference on Digital Libraries, pages 142-151, 1997. Accessible at http://dbpubs.stanford.edu/pub/1997-69.
	B. Cox, D. Tygar, and M. Sirbu. NetBill security and transaction protocol. In First USENIX Workshop on Electronic Commerce Proceedings, 1995.
	Gregory Crane. Building a digital library: The Perseus Project as a case study in the humanities. In Proceedings of DL'96, 1996. Format: Not yet online.
	Gregory Crane, David A. Smith, and Clifford E. Wulfman. Building a hypertextual digital library in the humanities: A case study on London. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001. This paper describes the creation of a new humanities digital library collection: 11,000,000 words and 10,000 images representing books, images and maps on pre-twentieth century London and its environs. The London collection contained far more dense and precise information than the materials from the Greco-Roman world on which we had previously concentrated. The London collection thus allowed us to explore new problems of data structure, manipulation, and visualization. This paper contrasts our model for how humanities digital libraries are best used with the assumptions that underlie many academic digital libraries on the one hand and more literary hypertexts on the other. Since encoding guidelines such as those from the TEI provide collection designers with far more options than any one project can realize, this paper describes what structures we used to organize the collection and why. We particularly emphasize the importance of mining historical `authority lists` (encyclopedias, gazetteers, etc.) and then generating automatic `span-to-span` links within the collection.
	Gregory Crane, Clifford E. Wulfman, Lisa M. Cerrato, Anne Mahoney, Thomas L. Milbank, David Mimno, Jeffrey A. Rydberg-Cox, David A. Smith, and Christopher York. Towards a cultural heritage digital library. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. This paper surveys research areas relevant to cultural heritage digital libraries. The emerging National Science Digital Library promises to establish the foundation on which those of us beyond the scientific and engineering community will likely build. This paper thus articulates the particular issues that we have encountered in developing cultural heritage collections. We provide a broad overview of audiences, collections, and services.
	Nick Craswell and Peter Bailey. Server selection on the world wide web. In Proceedings of the Fifth ACM International Conference on Digital Libraries, 2000. We evaluate server selection methods in a Web environment, modeling a digital library which makes use of existing Web search servers rather than building its own index. The evaluation framework portrays the Web realistically in several ways. Its search servers index real Web documents, are of various sizes, cover different topic areas and employ different retrieval methods. Selection is based on statistics extracted from the results of probe queries submitted to each server. We evaluate published selection methods and a new method for enhancing selection based on expected search server effectiveness. Results show CORI to be the most effective of three published selection methods.
	Arturo Crespo and Eric A. Bier. WebWriter: A browser-based editor for constructing web applications. In Proceedings of the Fifth International World-Wide Web Conference, 1996.
	Arturo Crespo, Orkut Buyukkokten, and Hector Garcia-Molina. Efficient query subscription processing in a multicast environment. In Proceedings of the 16th International Conference on Data Engineering, 2000. Available at http://dbpubs.stanford.edu/pub/2000-54. This paper introduces techniques for reducing data dissemination costs of query subscriptions. The reduction is achieved by merging queries with overlapping, but not necessarily equal, answers. The paper formalizes the query-merging problem and introduces a general cost model for it. We prove that the problem is NP-hard and propose exhaustive algorithms and three heuristic algorithms: the Pair Merging Algorithm, the Directed Search Algorithm and the Clustering Algorithm. We develop a simulator for evaluating the different heuristics and show that the performance of our heuristics is close to optimal.
	Arturo Crespo, Bay-Wei Chang, and Eric A. Bier. Responsive interaction for a large web application: The meteor shower architecture in the WebWriter II editor. In Proceedings of the Sixth International World-Wide Web Conference, 1997. Traditional server-based web applications allow access to server-hosted resources, but often exhibit poor responsiveness due to server load and network delays. Client-side web applications, on the other hand, provide excellent interactivity at the expense of limited access to server resources. The WebWriter II Editor, a direct manipulation HTML editor that runs in a web browser, uses both server-side and client-side processing in order to achieve the advantages of both. In particular, this editor downloads the document data structure to the browser and performs all operations locally. The user interface is based on HTML frames and includes individual frames for previewing the document and displaying general and specific control panels. All editing is done by JavaScript code residing in roughly twenty HTML pages that are downloaded into these frames as needed. Such a client-server architecture, based on frames, client-side data structures, and multiple JavaScript-enhanced HTML pages appears promising for a wide variety of applications. This paper describes this architecture, the Meteor Shower Application Architecture, and its use in the WebWriter II Editor.
	Arturo Crespo and Hector Garcia-Molina. Awareness services for digital libraries. In Lecture Notes in Computer Science, volume 1324, 1997.
	Arturo Crespo and Hector Garcia-Molina. Archival storage for digital libraries. In Proceedings of the Third ACM International Conference on Digital Libraries, 1998. Accessible at http://dbpubs.stanford.edu/pub/1998-49. We propose an architecture for Digital Library Repositories that assures long-term archival storage of digital objects. The architecture is formed by a federation of independent but collaborating sites, each managing a collection of digital objects. The architecture is based on the following key components: use of signatures as object handles, no deletions of digital objects, functional layering of services, the presence of an awareness service in all layers, and use of disposable auxiliary structures. Long-term persistence of digital objects is achieved by creating replicas at several sites.
	Arturo Crespo and Hector Garcia-Molina. Modeling archival repositories for digital libraries. In Submitted for publication, 2000. Available at http://dbpubs.stanford.edu/pub/1999-23. This paper studies the archival problem: how a digital library can preserve electronic documents over long periods of time. We analyze how an archival repository can fail and we present different strategies that help solve the problem. We introduce ArchSim, a simulation tool that for evaluating an implementation of an archival repository system and compare options such as different disk reliabilities, error detection and correction algorithms, preventive maintenance, etc. We use ArchSim to analyze a case study of an Archival Repository for Computer Science Technical Reports.
	Arturo Crespo and Hector Garcia-Molina. Cost-driven design for archival repositories. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001. Designing an archival repository is a complex task because there are many alternative configurations, each with different reliability levels and costs. In this paper we study the costs involved in an Archival Repository and we introduce a design framework for evaluating alternatives and choosing the best configuration in terms of reliability and cost. We also present a new version of our simulation took, ArchSim/C that aids in the decision process. The design framework and the usage of ArchSim/C are illustrated with a case study of a hypothetical (yet realistic) archival repository shared between two universities.
	Arturo Crespo and Hector Garcia-Molina. Routing indices for peer-to-peer systems. Technical Report 2001-48, Stanford University, 2001. Finding information in a peer-to-peer system currently requires either a costly and vulnerable central index, or flooding the network with queries. In this paper we introduce the concept of Routing Indices (RIs), which allow nodes to forward queries to neighbors that are more likely to have answers. If a node cannot answer a query, it forwards the query to a subset of its neighbors, based on its local RI, rather than by selecting neighbors at random or by flooding the network by forwarding the query to all neighbors. We present three RI schemes: the compound, the hop-count, and the exponential routing indices. We evaluate their performance via simulations, and find that RIs can improve performance by one or two orders of magnitude vs. a flooding-based system, and by up to 100the different RI schemes and highlight the effects of key design variables on system performance.
	Fabio Crestani. Vocal access to a newspaper archive: Design issues and preliminary investigations. In Proceedings of the Fourth ACM International Conference on Digital Libraries, 1999. This paper presents the design and the current prototype implementation of an interactive vocal Information Retrieval system that can be used to access articles of a large newspaper archive using a telephone. The results of a preliminary investigation into the feasibility of such a system are also presented.
	William T. Crocca and William L. Anderson. Delivering technology for digital libraries: Experiences as vendors. In DL '95, 1995. Format: HTML Document (39K + picture). Audience: Computer scientists and librarians. References: 10. Links: 1. Relevance: Low. Abstract: Argues that many of the problems in DL development are not technical, but social and political, as the nature of the work is transformed. Describes two Xerox collaborations with academia, one on scanned documents, a second with a web-based system. Lists some assumptions that are sometimes made, and to what extent they are borne out. Special concerns are standards, which ones to support, and ensuring access to acquisitions in old standards.
	W. Bruce Croft. What do people want from information retrieval? (The top 10 research issues for companies that use and sell IR systems). D-Lib Magazine, Nov 1995. Format: HTML Document.
	W. Bruce Croft, Robert Cook, and Dean Wilder. Providing government information on the Internet: Experiences with THOMAS. In Proceedings of the Second Annual Conference on the Theory and Practice of Digital Libraries, 1995. Format: HTML Document (29K). Audience: Information retrieval specialists. References: 12. Links: 1. Relevance: Low. Abstract: Describes use of THOMAS, the on-line source of congressional information. Based on the INQUERY engine, it offers keyword searches with other advanced features (proximity, weighted averaging, synonyms) which are largely ignored by the user population. The tendency is for short queries (three words or fewer) about a single topic. Describes domain-dependent performance enhancements to the ranking algorithms to ensure that relevant hits appear near the top of the ranking.
	Isabel Cruz. Effective abstractions in multimedia. In DAGS '95, 1995. Format: PostScript.
	M. I. Crystal and G. E. Jakobson. FRED, a front end for databases. Online, 6(5):27-30, September 1982.
	Pierre Cubaud, Pascal Stokowski, and Alexandre Topol. Binding browsing and reading activities in a 3D digital library. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. Browsing through collections and reading are separate activities in most current WWW-based user interfaces of digitized libraries. This context break induces longer learning and navigation times within the interface. In this paper we study how 3D interaction metaphors can be used to provide a continuous navigation space for these two tasks.
	Hong Cui, P. Bryan Heidorn, and Hong Zhang. An approach to automatic classification of text for information retrieval. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. In this paper, we explore an approach to make better use of semi-structured documents in information retrieval in the domain of biology. Using machine learning techniques, we make those inherent structures explicit with XML markup. This markup has great potential for improving task performance in specimen identification and the usability of online flora and fauna.
	Sally Jo Cunningham, David Bainbridge, and Masood Masoodian. How people describe their image information needs: A grounded theory analysis of visual arts queries. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. When people are looking for visual arts information - information related to images - how do they characterize their needs? We analyze a set of 405 queries to identify the attributes that people provide to the Google Answers' `ask an expert` online reference system. The results suggest directions to take in developing an effective organization and features for an image digital library.
	Sally Jo Cunningham, Chris Knowles, and Nina Reeves. An ethnographic study of technical support workers: Why we didn't build a tech support digital library. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001. In this paper we describe the results of an ethnographic study of the information behaviours of university technical support workers and their information needs. The study looked at how the group identified, located and used information from a variety of sources to solve problems arising in the course of their work. The results of the investigation are discussed in the context of the feasibility of developing a potential information base that could be used by all members of the group. Whilst a number of their requirements would easily be fulfilled by the use of a digital library, other requirements would not. The paper illustrates the limitations of a digital library with respect to the information behaviours of this group of subjects and focuses on why a digital library would not appear to be the ideal support tool for their work.
	Sally Jo Cunningham and Nina Reeves. An ethnographic study of music information seeking: Implications for the design of a music digital library. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. At present, music digital library systems are being developed based on anecdotal evidence of user needs, intuitive feelings for user information seeking behavior, and a priori assumptions of typical usage scenarios. Emphasis has instead been placed on basic research into music document representation, efficient searching, and audio-based searching, rather than on exploring the music information needs or information behavior of a target user group. This paper focuses on eliciting the 'native' music information strategies employed by people searching for popular music (that is, music sought for recreational or enjoyment purposes rather than to support a 'serious' or scientific exploration of some aspect of music). To this end, we conducted an ethnographic study of the searching/browsing techniques employed by people in the researchers' local communities, as they use two common sources of music: the public library and music stores. We argue that the insights provided by this type of study can inform the development of searching/browsing support for music digital libraries.
	Te Taka Keegan and Sally Jo Cunningham. Language preference in a bi-language digital library. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. This paper examines user choice of interface language in a bi-language digital library (English and Māori, the language of the indigenous people of New Zealand). The majority of collection documents are in Māori, and the interface is available in both Māori and English. Log analysis shows three categories of preference for interface language: primarily English, primarily Māori, and bilingual (switching back and forth between the two). As digital libraries increase in number, content, and potential user base, interest has grown in ‘multilingual’ or ‘multi-language’ collections, that is, digital libraries in which the collection documents and the collection interface include more than one language. Research in multilingual/multi-language digital libraries and web-based document collections has primarily focused on fundamental implementation issues and functionality, principles for design, and small-scale usability tests; at present no analysis exists of how these systems are used, or how the presence of more than one language in a digital library affects user interactions, presumably because multilingual/multi-language digital libraries are only recently moving from research lab prototypes to fielded systems, and few have built up a significant usage history. This paper describes the application of log analysis to examine interface language preference in a bi-language (English/Māori) digital library, the Niupepa Collection (Section 2). Web log data was collected for a year (Section 3), and log analysis indicates three categories of interface language preferences: English, Māori, and ‘bilingual’ (Section 4). A fine-grained analysis of activities within user sessions indicates different patterns of document access and information gathering strategy between these three categories (Section 5).
	Doug Cutting, Bill Janssen, Mike Spreitzer, and Farrell Wymore. ILU Reference Manual. Xerox Palo Alto Research Center, December 1993. Accessible at `ftp://ftp.parc.xerox.com/pub/ilu/ilu.html`. Reference manual. Tech report at cour94.
	Douglass R. Cutting, David Karger, and Jan Pedersen. Constant interaction-time Scatter/Gather browsing of very large document collections. In Proceedings of the Sixteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 126-135, 1993.
	Douglass R. Cutting, Jan O. Pedersen, David Karger, and John W. Tukey. Scatter/Gather: A cluster-based approach to browsing large document collections. In Proceedings of the Fifteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 318-329, 1992.
	CyberCash. CyberCash home page. CyberCash website: http://www.cybercash.com/.
	Gordon Dahlquist, Brian Hoffman, and David Millman. Integrating digital libraries and electronic publishing in the DART project. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. The Digital Anthropology Resources for Teaching (DART) project integrates the content acquisition and cataloging initiatives of a federated digital repository with the development of scholarly publications and the creation of digital tools to facilitate classroom teaching. The project's technical architecture and unique publishing model create a teaching context where students move easily between primary and secondary source material and between authored environments and independent research, and raise specific issues with regard to metadata, object referral, rights, and exporting content. The model also addresses the loss of provenance and catalog information for digital objects embedded in `born-digital` publications. The DART project presents a practical methodology to combine repository and publication that is both exportable and discipline-neutral.
	Zubin Dalal, Suvendu Dash, Pratik Dave, Luis Francisco-Revilla, Richard Furuta, Unmil Karadkar, and Frank Shipman. Managing distributed collections: evaluating web page changes, movement, and replacement. In JCDL '04: Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries, pages 160-168, New York, NY, USA, 2004. ACM Press. Distributed collections of Web materials are common. Bookmark lists, paths, and catalogs such as Yahoo! Directories require human maintenance to keep up to date with changes to the underlying documents. The Walden's Paths Path Manager is a tool to support the maintenance of distributed collections. Earlier efforts focused on recognizing the type and degree of change within Web pages and identifying pages no longer accessible. We now extend this work with algorithms for evaluating drastic changes to page content based on context. Additionally, we expand on previous work to locate moved pages and apply the modified approach to suggesting page replacements when the original page cannot be found. Based on these results we are redesigning the Path Manager to better support the range of assessments necessary to manage distributed collections.
	Raymond J. D'Amore, Daniel J. Helm, Puck-Fai Yan, and Stephen A. Glanowski. MITRE information discovery system. In Proceedings of DL'96, 1996. Format: Not yet online.
	B. C. Desai and S. Swiercz. WebJournal: Visualization of a Web journey. In Advances in Digital Libraries '95, 1995. Format: Not Yet Online.
	Neil Daswani, Dan Boneh, Hector Garcia-Molina, Steven Ketchpel, and Andreas Paepcke. A generalized digital wallet architecture. In Proceedings of the 3rd USENIX Workshop on Electronic Commerce, pages 121-139, 1998.
	Neil Daswani, Hector Garcia-Molina, and Beverly Yang. Open problems in data-sharing peer-to-peer systems. In Proceedings of the 9th International Conference on Database Theory, 2003. In a Peer-To-Peer (P2P) system, autonomous computers pool their resources (e.g., files, storage, compute cycles) in order to inexpensively handle tasks that would normally require large costly servers. The scale of these systems, their `open nature`, and the lack of centralized control pose difficult performance and security challenges. Much research has recently focused on tackling some of these challenges; in this paper, we propose future directions for research in P2P systems, and highlight problems that have not yet been studied in great depth. We focus on two particular aspects of P2P systems, search and security, and suggest several open and important research problems for the community to address.
	Mayur Datar. Butterflies and peer-to-peer networks. In Proceedings of the 10th European Symposium on Algorithms, 2002. Research in peer-to-peer systems has focussed on building efficient Content Addressable Networks (CANs), which are essentially distributed hash tables (DHTs) that support location of resources based on unique keys. While most proposed schemes are robust to a large number of random faults, there are very few schemes that are robust to a large number of adversarial faults. In a recent paper, Fiat and Saia have proposed such a solution that is robust to adversarial faults. We propose a new solution based on multi-butterflies that improves upon the previous solution by Fiat and Saia. Our new network, the multi-hypercube, is a fault-tolerant version of the hypercube, and may find applications to other problems as well. We also demonstrate how this network can be maintained dynamically. This addresses the first open problem in the paper by Fiat and Saia.
	Mayur Datar. Butterflies and peer-to-peer networks. Technical Report 2002-5, Stanford University, 2002. The popularity of systems like Napster and Gnutella has spurred recent interest in peer-to-peer systems. A central problem in all these systems is efficient location of resources based on their keys. A network that supports such queries is referred to as a Content Addressable Network (CAN). Many solutions have been proposed for building CANs. However, most of these solutions do not focus on adversarial faults, which might be critical to building a censorship-resistant peer-to-peer system. In a recent paper, Fiat and Saia have proposed a solution to building such a system. We propose a new solution based on multi-butterflies that improves upon the previous solution by Fiat and Saia. Our new network, the multi-hypercube, is a fault-tolerant version of the hypercube. We also demonstrate how this network can be maintained dynamically. This addresses the first open problem in the paper by Fiat and Saia.
	Winton H. E. Davies and Pete Edwards. Agent-based knowledge discovery. In AAAI Spring Symposium on Information Gathering, 1995. Format: Compressed PostScript.
	Hugh Davis. Using Microcosm to access digital libraries. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document (6K). Audience: UK funders. References: 4. Links: 1. Relevance: Low. Abstract: A description of the Microcosm system (campus document delivery), a hypermedia system allowing links to 3rd party viewers.
	Hugh Davis and Jessie Hey. Automatic extraction of hypermedia bundles from the digital library. In Proceedings of the Second Annual Conference on the Theory and Practice of Digital Libraries, 1995. Format: HTML Document (34K + pictures). Audience: Digital library developers and users. References: 21. Links: 1. Relevance: Low. Abstract: Rather than just retrieving a list of `hits` for a query, the system can bundle them, generate hyperlinks on keywords, and offer interactive query expansion or contraction. Suggests the addition of a `length` (in minutes to comprehend) and a `reader level` field of meta-information.
	J. Davis, D. Krafft, and C. Lagoze. Dienst: Building a production technical report server. In Advances in Digital Libraries '95, 1995. Format: Not Yet Online.
	James R. Davis. Creating a networked computer science technical report library. D-Lib Magazine, Sep 1995. Format: HTML Document.
	James R. Davis. Creating a networked computer science technical report library. In Proceedings of DL'96, 1996. Format: Not yet online.
	Marc Davis, Simon King, Nathan Good, and Risto Sarvas. From context to content: leveraging context to infer media metadata. In Proceedings of the 12th International Conference on Multimedia (MM2004), pages 188-195. ACM Press, 2004.
	Peter T. Davis, David K. Elson, and Judith L. Klavans. Methods for precise named entity matching in digital collections [short paper]. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. In this paper, we describe an interactive system, built within the context of the CLiMB project, which permits a user to locate the occurrences of named entities within a given text. The named entity tool was developed to identify references to a single art object (e.g. a particular building) with high precision in text related to images of that object in a digital collection. We start with an authoritative list of art objects, and seek to match variants of these named entities in related text. Our approach is to `decay` entities into progressively more general variants while retaining high precision. As variants become more general, and thus more ambiguous, we propose methods to disambiguate intermediate results. Our results will be used to select records into which automatically generated metadata will be loaded.
	Colin Day. Economics of electronic publishing. In JEP, 1994. Format: HTML Document (31K). Audience: Generalist, academic. References: 1. Links: 0. Relevance: low-medium. Abstract: Discusses the 4 services of publisher and library: Gathering, Selecting, Enhancing, and Informing in terms of benefits provided to academics and society. Argues that distribution of ideas is too important to be exclusively at the mercy of the market place, and should (like theater or public TV) be subsidized, but the majority of cost recovery should still be from users. Argues that the producers and consumers (university presses and faculty) are largely part of the same institution, so there should be gains, but presses have evolved to be largely independent.
	Colin Day. Pricing electronic products. In JEP, 1994. Format: HTML Document (21K). Audience: publishers, librarians. References: 0. Links: 0. Relevance: low-medium. Abstract: Economic discussion of publishing. Looks at `first copy` and `incremental copy` costs. Considers ways that publishers can recover first copy costs while still distributing to all for whom it is economically rational (value is greater than incremental cost). Possible models: 1) `country club`, where one pays high up-front dues, but then low per-transaction cost; 2) `differentiated costs` where different products are provided, one at a higher cost with certain features, a second at lower (marginal) cost, e.g., more expensive hardcover comes out first, followed by cheap paperback months later. Mentions 3 specific examples: Project Muse, Chicago Journal of Theoretical Computer Science, and Mathematical Reviews.
	J.D. Day and H. Zimmermann. The OSI reference model. Proc. of the IEEE, 71:1334-1340, December 1983.
	D. Choy, R. Dievendorff, C. Dwork, J. B. Lotspiech, R. T. Morris, L. C. Anderson, A. E. Bell, S. K. Boyer, T. D. Griffin, B. A. Hoenig, J. M. McCrossin, A. M. Miller, N. J. Pass, F. Pestoni, and D. S. Picciano. The Almaden distributed digital library system. In Advances in Digital Libraries '95, 1995. Format: Not Yet Online.
	O. de Bruijn, R. Spence, and M. Y. Chong. RSVP browser: Web browsing on small screen devices. Personal Ubiquitous Comput., 6(4):245-252, 2002. Abstract: In this paper, we illustrate the use of space-time trade-offs for information presentation on small screens. We propose the use of Rapid Serial Visual Presentation (RSVP) to provide a rich set of navigational information for Web browsing. The principle of RSVP browsing is applied to the development of a Web browser for small screen devices, the RSVP browser. The results of an experiment in which Web browsing with the RSVP browser is compared with that of a typical WAP browser suggest that RSVP browsing may indeed offer an alternative to other forms of Web browsing on small screen devices.
	Jeffrey Dean and Monika R. Henzinger. Finding related pages in the World Wide Web. In Proceedings of the Eighth International World-Wide Web Conference, 1999. When using traditional search engines, users have to formulate queries to describe their information need. This paper discusses a different approach to Web searching where the input to the search process is not a set of query terms, but instead is the URL of a page, and the output is a set of related Web pages. A related Web page is one that addresses the same topic as the original page. For example, www.washingtonpost.com is a page related to www.nytimes.com, since both are online newspapers. We describe two algorithms to identify related Web pages. These algorithms use only the connectivity information in the Web (i.e., the links between pages) and not the content of pages or usage information. We have implemented both algorithms and measured their runtime performance. To evaluate the effectiveness of our algorithms, we performed a user study comparing our algorithms with Netscape's `What's Related' service (http://home.netscape.com/escapes/related/). Our study showed that the precision at 10 for our two algorithms are 73% [...], despite the fact that Netscape uses both content and usage pattern information in addition to connectivity information.
	Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41, June 1990.
	A.J. Demers, K. Petersen, M.J. Spreitzer, D.B. Terry, M.M. Theimer, and B.B. Welch. The Bayou architecture: Support for data sharing among mobile users. In Proceedings IEEE Workshop on Mobile Computing Systems & Applications, pages 2-7, Santa Cruz, California, December 8-9 1994. At http://www.parc.xerox.com/bayou/.
	Robert Demolombe and Andrew Jones. A common logical framework to retrieve information and meta information. In AAAI Spring Symposium on Information Gathering, 1995. Format: Compressed PostScript.
	Dorothy E. Denning and Peter J. Denning. Data security. ACM Computing Surveys, 11(3):227-249, September 1979. This paper discusses four kinds of security controls: access control, flow control, inference control, and data encryption. It describes the general nature of controls of each type, the kinds of problems they can and cannot solve, and their inherent limitations and weaknesses.
	Jack B. Dennis and Earl C. Van Horn. Programming semantics for multiprogrammed computations. In Communications of the ACM, 1966.
	Mark Derthick. Interfaces for palmtop image search. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. We expect that people will want to search for video news or entertainment on mobile platforms as soon as the technology is ready. An iPAQ palmtop version of the Informedia Digital Video Library interface has already been developed at the Chinese University of Hong Kong. Separately, we used the Desktop Informedia interface for the interactive part of the TREC-10 video track competition. The lesson we learned is that automated image search is so poor that the best interactive results come from showing the user many images quickly, and allowing flexible drill down to images from nearby shots. Here we report on an effort to apply this lesson to palmtop platforms, where showing a large grid of images in parallel is not feasible. Perceptual psychology experiments suggest that time-multiplexing may be as effective as space-multiplexing for this kind of primed recognition task. In fact, it has been specifically suggested that image retrieval interfaces using Rapid Serial Visual Presentation (RSVP) may perform significantly better than parallel presentation even on a desktop computer [2]. In our experiments, we did not find this to be true. An important difference between previous experiments and our own, we discovered, is that image search engines rank retrievals, and correct answers are more likely to occur early in the list of results. Thus we found that scrolling (and low RSVP presentation rates) led to better recognition of answers that occur early, but worse for answers that occur far down the list. This split confounded the global effects that we had hypothesized, yet in itself is an important consideration for future interface designs, which must adapt as search technology improves.
	J. P. Deschrevel. The ANSA model for trading and federation. Technical Report APM.1005.01, APM, Cambridge, 1989.
	Hrishikesh Deshpande, Mayank Bawa, and Hector Garcia-Molina. Streaming live media over a peer-to-peer network. Technical Report 2001-31, Stanford University, 2001. The high bandwidth required by live streaming video greatly limits the number of clients that can be served by a source. In this work, we discuss and evaluate an architecture, called SpreadIt, for streaming live media over a network of clients, using the resources of the clients themselves. Using SpreadIt, we can distribute bandwidth requirements over the network. The key challenge is to allow an application level multicast tree to be easily maintained over a network of transient peers, while ensuring that quality of service does not degrade. We propose a basic peering infrastructure layer for streaming applications, which uses a redirect primitive to meet the challenge successfully. Through empirical and simulation studies, we show that SpreadIt provides a good quality of service, which degrades gracefully with increasing number of clients. Perhaps more significantly, existing applications can be made to work with SpreadIt, without any change to their code base.
	Hrishikesh Deshpande, Mayank Bawa, and Hector Garcia-Molina. Streaming live media over a peer-to-peer network. Technical Report 2001-30, Stanford University, 2001. The high bandwidth required by live streaming video greatly limits the number of clients that can be served by a source. In this work, we discuss and evaluate an architecture, called SpreadIt, for streaming live media over a network of clients, using the resources of the clients themselves. Using SpreadIt, we can distribute bandwidth requirements over the network. The key challenge is to allow an application level multicast tree to be easily maintained over a network of transient peers, while ensuring that quality of service does not degrade. We propose a basic peering infrastructure layer for streaming applications, which uses a redirect primitive to meet the challenge successfully. Through empirical and simulation studies, we show that SpreadIt provides a good quality of service, which degrades gracefully with increasing number of clients. Perhaps more significantly, existing applications can be made to work with SpreadIt, without any change to their code base.
	Hrishikesh Deshpande, Mayank Bawa, and Hector Garcia-Molina. Streaming live media over peers. Technical Report 2002-21, Stanford University, 2002. The high bandwidth required by live streaming video greatly limits the number of clients that can be served by a source using unicast. An efficient solution is IP-multicast, but it suffers from poor deployment. Application-level multicast is being increasingly recognized as a viable alternative. In this work, we discuss and evaluate a tree-based overlay network called PeerCast that uses clients to forward the stream to their peers. PeerCast is designed as a live-media streaming solution for peer-to-peer systems that are populated by hundreds of autonomous, short-lived nodes. Further, we argue for the need to take end-host behavior into account while evaluating an application-level multicast architecture. An end-host behavior model is proposed that allows us to capture a range of realistic peer behavior. Using this model, we develop robust, yet simple, tree-maintenance policies. Through empirical runs and extensive simulations, we show that PeerCast provides good QoS, which gracefully degrades with the number of clients. We have implemented a PeerCast prototype, which is available for download.
	Alin Deutsch, Mary Fernandez, Daniela Florescu, Alon Levy, and Dan Suciu. A query language for XML. In Proceedings of the Eighth International World-Wide Web Conference, 1999. An important application of XML is the interchange of electronic data (EDI) between multiple data sources on the Web. As XML data proliferates on the Web, applications will need to integrate and aggregate data from multiple sources and to clean and transform data to facilitate exchange. Data extraction, conversion, transformation, and integration are all well-understood database problems, and their solutions rely on a query language. We present a query language for XML, called XML-QL, which we argue is suitable for performing the above tasks. XML-QL is a declarative, `relationally complete' query language and is simple enough that it can be optimized. XML-QL can extract data from existing XML documents and construct new XML documents.
	Ann S. Devlin. Mind and maze: spatial cognition and environmental behavior. Praeger, 2001.
	Prasun Dewan, Kevin Jeffay, John Smith, David Stotts, and William Oliver. Early prototypes of the repository for patterned injury data. In Proceedings of the Second Annual Conference on the Theory and Practice of Digital Libraries, 1995. Format: HTML Document (34K + pictures). Audience: Medical forensics, computer scientists. References: 15. Links: 1. Relevance: Low-Medium. Abstract: Describes a system for collaboration among coroners. Focuses on issues of access rights: people in different roles (lead examiner, toxicologist, judge) see different views of the same data (some fields are read-protected). The initial prototype was built under the ABC system of UNC, but new prototypes will be web-based. The authors also hope to incorporate teleconferencing capabilities.
	Anind K. Dey. Understanding and using context. Personal Ubiquitous Comput., 5(1):4-7, 2001.
	Anind K. Dey and Gregory D. Abowd. Towards a better understanding of context and context-awareness. In Workshop on The What, Who, Where, When, and How of Context-Awareness, as part of the 2000 Conference on Human Factors in Computing Systems (CHI 2000), April 2000.
	Anne R. Diekema and Jiangping Chen. Experimenting with the automatic assignment of educational standards to digital library content. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. This paper describes exploratory research concerning the automatic assignment of educational standards to lesson plans. An information retrieval based solution was proposed, and the results of several experiments are discussed. Results suggest the optimal solution would be a recommender tool where catalogers receive suggestions from the system but humans make the final decision.
	DigiCash. Digicash: Solutions for security and privacy. DigiCash website: http://www.digicash.com/.
	Michelangelo Diligenti, Frans Coetzee, Steve Lawrence, C. Lee Giles, and Marco Gori. Focused crawling using context graphs. In Proceedings of the Twenty-sixth International Conference on Very Large Databases, pages 527-534, September 2000.
	Junyan Ding, Luis Gravano, and Narayanan Shivakumar. Computing geographical scopes of web resources. In Proceedings of the Twenty-sixth International Conference on Very Large Databases, pages 545-556. Morgan Kaufmann Publishers Inc., 2000.
	Wei Ding, Gary Marchionini, and Dagobert Soergel. Multimodal surrogates for video browsing. In Proceedings of the Fourth ACM International Conference on Digital Libraries, 1999. Three types of video surrogates - visual (keyframes), verbal (keywords/phrases), and visual and verbal - were designed and studied in a qualitative investigation of user cognitive processes. The results favor the combined surrogates, in which verbal information and images reinforce each other; these lead to better comprehension and may actually require less processing time. The results also highlight the image features users found most helpful. These findings will inform the interface design and video representation for video retrieval and browsing.
	D. Koller and Y. Shoham. Information agents: A new challenge for AI. IEEE Expert, pages 8-10, June 1996.
	R. Dolin, D. Agrawal, and A. El Abbadi. Scalable collection summarization and selection. In Proceedings of the Fourth ACM International Conference on Digital Libraries, 1999. Information retrieval over the Internet increasingly requires the filtering of thousands of information sources. As the number and variety of sources increases, new ways of automatically summarizing, discovering, and selecting sources relevant to a user's query are needed. Pharos is a highly scalable distributed architecture for locating heterogeneous information sources. Its design is hierarchical, thus allowing it to scale well as the number of information sources increases. We demonstrate the feasibility of the Pharos architecture using 2500 Usenet newsgroups as separate collections. Each newsgroup is summarized via automated Library of Congress classification. We show that using Pharos as an intermediate retrieval mechanism provides acceptable accuracy of source selection compared to selecting sources using complete classification information, while maintaining good scalability. This implies that hierarchical distributed metadata and automated classification are potentially useful paradigms to address scalability problems in large-scale distributed information retrieval applications.
	Document Object Model Level 1 specification. http://www.w3.org/TR/REC-DOM-Level-1/.
	Peter Domel. Webmap: a graphical hypertext navigation tool. In Proceedings of the Second International World-Wide Web Conference, 1994.
	Andy Dong and Alice M. Agogino. Design principles for the information architecture of a SMET education digital library. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001. This implementation paper introduces principles for the information architecture of an educational digital library, principles that address the distinction between designing digital libraries for education and designing digital libraries for information retrieval in general. Design is a key element of any successful product. Good designers and their designs put technology into the hands of the user, making the product's focus comprehensible and tangible through design. As straightforward as this may appear, the design of learning technologies is often masked by the enabling technology. In fact, such technologies often lack an explicitly stated instructional design methodology. While the technologies are important hurdles to overcome, we advocate learning systems that empower education-driven experiences rather than technology-driven experiences. This work describes a concept for a digital library for science, mathematics, engineering and technology education (SMETE), a library with an information architecture designed to meet learners' and educators' needs. Utilizing a constructivist model of learning, the authors present practical approaches to implementing the information architecture and its technology underpinnings. The authors propose the specifications for the information architecture and a visual design of a digital library for communicating learning to the audience. The design methodology indicates that a scenario-driven design technique sensitive to the contextual nature of learning offers a useful framework for tailoring technologies that help empower, not hinder, the educational sector.
	Jim Dorward, Derek Reinke, and Mimi Recker. An evaluation model for a digital library services tool. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. This paper describes an evaluation model for a digital library tool, the Instructional Architect, which enables users to discover, select, reuse, sequence, and annotate digital library learning objects. By documenting our rapid-prototyping, iterative, and user-centered approach for evaluating a digital library service, we provide a model and set of methods that other developers may wish to employ. In addition, we provide preliminary results from our studies.
	Fred Douglis, Thomas Ball, Yih-Farn Chen, and Eleftherios Koutsofios. WebGUIDE: Querying and navigating changes in web repositories. In Proceedings of the Fifth International World-Wide Web Conference, May 1996.
	Fred Douglis, Thomas Ball, Yih-Farn Chen, and Eleftherios Koutsofios. The AT&T Internet Difference Engine: Tracking and viewing changes on the web. World Wide Web, 1(1):27-44, January 1998.
	Fred Douglis, Anja Feldmann, and Balachander Krishnamurthy. Rate of change and other metrics: a live study of the World Wide Web. In USENIX Symposium on Internetworking Technologies and Systems, 1999.
	Fred Douglis, Antonio Haro, and Michael Rabinovich. HPP: HTML macro-preprocessing to support dynamic document caching. In Proceedings of the USENIX Symposium on Internet Technologies and Systems, Monterey, California, 1997.
	Michael Droettboom. Correcting broken characters in the recognition of historical printed documents [short paper]. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. This paper presents a new technique for dealing with broken characters, one of the major challenges in the optical character recognition (OCR) of degraded historical printed documents. A technique based on graph combinatorics is used to rejoin the appropriate connected components. It has been applied to real data with successful results.
	Michael Droettboom, Karl MacMillan, Ichiro Fujinaga, G. Sayeed Choudhury, Tim DiLauro, Mark Patton, and Teal Anderson. Using the Gamera framework for the recognition of cultural heritage materials. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. This paper presents a new toolkit for the creation of customized structured document recognition applications by domain experts. This open-source system, called Gamera, allows a user with particular knowledge of the documents to be recognized to combine image processing and recognition tools in an easy-to-use, interactive, graphical scripting environment. Gamera is one of the key technology components in a proposed international project for the digitization of diverse types of humanities documents.
	Steven M. Drucker, Curtis Wong, Asta Roseway, Steven Glenner, and Steven De Mar. MediaBrowser: reclaiming the shoebox. In AVI '04: Proceedings of the working conference on Advanced visual interfaces, pages 433-436, New York, NY, USA, 2004. ACM Press.
	Allison Druin, Benjamin B. Bederson, Juan Pablo Hourcade, Lisa Sherman, Glenda Revelle, Michele Platner, and Stacy Weng. Designing a digital library for young children: An intergenerational partnership. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001. As more information resources become accessible using computers, our digital interfaces to those resources need to be appropriate for all people. However, when it comes to digital libraries, the interfaces have typically been designed for older children or adults. Therefore, we have begun to develop a digital library interface developmentally appropriate for young children (ages 5-10 years old). Our prototype system, which we now call `SearchKids`, offers a graphical interface for querying, browsing and reviewing search results. This paper describes our motivation for the research, the design partnership we established between children and adults, our design process, the technology outcomes of our current work, and the lessons we have learned.
	D. Dubois and H. Prade. Fuzzy Sets and Systems: Theory and Applications. Academic Press, New York, 1980.
	Monica Duke, Michael Day, Rachel Heery, Leslie A. Carr, and Simon J. Coles. Enhancing access to research data: the challenge of crystallography. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. This paper describes an ongoing collaborative effort across digital library and scientific communities in the UK to improve access to research data. A prototype demonstrator service supporting the discovery and retrieval of detailed results of crystallography experiments has been deployed within an Open Archives digital library service model. Early challenges include the understanding of requirements in this specialized area of chemistry and reaching consensus on the design of a metadata model and schema. Future plans encompass the exploration of commonality and overlap with other schemas and across disciplines, working with publishers to develop mutually beneficial service models, and investigation of the pedagogical benefits. The potential improved access to experimental data to enrich scholarly communication from the perspective of both research and learning provides the driving force to continue exploring these issues.
	Susan Dumais, Edward Cutrell, JJ Cadiz, Gavin Jancke, Raman Sarin, and Daniel C. Robbins. Stuff I've seen: a system for personal information retrieval and re-use. In Proceedings of the Twenty-Sixth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 72-79. ACM Press, 2003.
	Susan T. Dumais, George W. Furnas, Thomas K. Landauer, Scott Deerwester, and Richard Harshman. Using latent semantic analysis to improve access to textual information. In Proceedings of the Conference on Human Factors in Computing Systems CHI'88, 1988. A main citation for LSI. It explains roughly how it works.
	M. H. Dunham and A. Helal. A mobile transaction model that captures both the data and movement behavior. Mobile Networks and Applications, 2(2):149-162, 1997. Unlike distributed transactions, mobile transactions do not originate and end at the same site. The implication of the movement of such transactions is that classical atomicity, concurrency and recovery solutions must be revisited to capture the movement behavior. As an effort in this direction, we define a model of mobile transactions by building on the concepts of split transactions and global transactions in a multidatabase environment. Our view of mobile transactions, called kangaroo transactions, incorporates the property that transactions in a mobile computing system hop from one base station to another as the mobile unit moves through cells. Our model is the first to capture this movement behavior as well as the data behavior which reflects the access to data located in databases throughout the static network. The mobile behavior is dynamic and is realized in our model via the use of split operations. The data access behavior is captured by using the idea of global and local transactions in a multidatabase system.
	Elke Dunker. Cross-cultural usability of the library metaphor. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. Computing metaphors have become an intricate part of information systems design. Yet, they are deeply rooted in cultural practices. This paper presents an investigation of the cross-cultural use and usability of the library metaphor in digital libraries. The study examines the relevant features of the Maori culture in New Zealand, their form of knowledge transfer and their use of real world and digital libraries. On this basis the paper points out why and when the library metaphor fails Maori and other indigenous users and how this knowledge can contribute to the improvement of future designs.
	Hayley Dunlop, Matt Jones, and Sally Jo Cunningham. A digital library of conversational expressions: A communication aid for people with profound physical disabilities. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. This paper describes the development of a communication aid for people with profound physical disabilities, people who cannot communicate verbally, and who cannot use conventional communication tools. The Greenstone digital library software has been used to construct a digital library of common conversational expressions. A case study approach was adopted, and the target user for this particular digital library was a local high school student. Tailoring the digital library's contents to this user entailed identifying physical accessibility considerations for her, developing a suitable mode of interaction with the digital library software, populating the digital library with appropriate expressions for her, and evaluating the digital library interface. Evaluation involved both a qualitative user evaluation session and a quantitative analysis of the time and effort required to use each of three proposed searching interfaces.
	Jon W. Dunn and Constance A. Mayer. Variations: A digital music library system at Indiana University. In Proceedings of the Fourth ACM International Conference on Digital Libraries, 1999. The field of music provides an interesting context for the development of digital library systems due to the variety of information formats used by music students and scholars. The VARIATIONS digital library project at Indiana University currently delivers online access to sound recordings from the collections of IU's William and Gayle Cook Music Library and is developing access to musical score images and other formats. This paper covers the motivations for the creation of VARIATIONS, an overview of its operation and implementation, user reactions to the system, and future plans for development.
	Oliver M. Duschka. Query Planning and Optimization in Information Integration. PhD thesis, Stanford University, December 1997.
	Naomi Dushay. Localizing experience of digital content via structural metadata. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. With the increasing technical sophistication of both information consumers and providers, there is increasing demand for more meaningful experiences of digital information. We present a framework that separates digital object experience, or rendering, from digital object storage and manipulation, so the rendering can be tailored to particular communities of users. Our framework also accommodates extensible digital object behaviors and interoperability. The two key components of our approach are 1) exposing structural metadata associated with digital objects - metadata about labeled access points within a digital object and 2) information intermediaries called context brokers that match structural characteristics of digital objects with mechanisms that produce behaviors. These context brokers allow for localized rendering of digital information stored externally.
	Naomi Dushay, James C. French, and Carl Lagoze. Using query mediators for distributed searching in federated digital libraries. In Proceedings of the Fourth ACM International Conference on Digital Libraries, 1999. Resource discovery in a distributed digital library poses many challenges, one of which is how to choose search engines for query distribution. In this paper, we describe a federated, distributed digital library architecture and introduce the notion of a query mediator as a digital library service responsible for selecting among available search engines, routing queries to those search engines, and aggregating results. We examine operational data from the NCSTRL digital library, focusing on two characteristics of distributed resource discovery: availability (will a search engine respond within a time limit) and response time (how quickly will a search engine respond, given that it does respond) and distinguishing between the query mediator view of these characteristics and the indexer view. We also examine the accuracy of predictions we made of QM-view availability and response times of search engines.
	P. Duygulu, Kobus Barnard, J. F. G. de Freitas, and David A. Forsyth. Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. In Proceedings of the 7th European Conference on Computer Vision (ECCV '02), pages 97-112. Springer-Verlag, 2002.
	Lena Veiga e Silva, Alberto H. F. Laender, and Marcos Andre Goncalves. A usability evaluation study of a digital library self-archiving service. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. In this paper, we describe an evaluation study of a self-archiving service for the Brazilian Digital Library of Computing (BDBComp). We conducted an extensive usability experiment with several potential users, including graduate students, professors, and archivists/librarians. The results of the study are described and analyzed, following sound statistical principles.
	D. Eastlake. Universal payment preamble specification. W3C website: http://www.w3.org/ECommerce/specs/upp.txt.
	Joseph L. Ebersole. Response to Dr. Linn's paper. In IP Workshop Proceedings, 1994. Format: HTML Document (15K). Audience: Readers of Dr. Linn's article, lawyers. References: 5. Links: 0. Relevance: Low-medium. Abstract: Discusses the differences between a common carrier, distributor, and publisher. Also discusses trade secrets and fair use.
	Judith Edwards. The electronic world and Central Queensland University. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document (6K). Audience: DL '94 officials & attendees. References: 0. Links: 1. Relevance: Low. Abstract: Queensland U's interest in attending DL '94. Some statistics on current and expected use of networked information servers.
	Miles Efron, Jonathan Elsas, Gary Marchionini, and Junliang Zhang. Machine learning for information architecture in a large governmental website. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. This paper describes ongoing research into the application of machine learning techniques for improving access to governmental information in complex digital libraries. Under the auspices of the GovStat Project (http://www.ils.unc.edu/govstat), our goal is to identify a small number of semantically valid concepts that adequately span the intellectual domain of a collection. The goal of this discovery is twofold. First, we desire a principled aid to information architects. Second, automatically derived document-concept relationships are a necessary precondition for real-world deployment of many dynamic interfaces. The current study compares concept learning strategies based on three document representations: keywords, titles, and full-text. In statistical and user-based studies, human-created keywords provide significant improvements in concept learning over both title-only and full-text representations.
	Miles Efron and Donald Sizemore. Link attachment (preferential and otherwise) in contributor-run digital libraries [short paper]. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. Ibiblio is a digital library whose materials are submitted and maintained by volunteer contributors. This study analyzes the emergence of hyperlinked structures within the ibiblio collection. In the context of ibiblio, we analyze the suitability of Barabasi's model of preferential attachment to describe the distribution of incoming links. We find that the degree of maintainer activity for a given site (as measured by the voluntary development of descriptive metadata) is a stronger link count predictor for ibiblio than a site's age, which is the predictor the standard model emphasizes. Thus we argue that the efforts of ibiblio's contributors positively affect the popularity of their materials.
	E. G. Coffman, Jr., Zhen Liu, and Richard R. Weber. Optimal robot scheduling for web search engines. Technical report, INRIA, 1997.
	Dennis E. Egan, Joel R. Remde, and Thomas K. Landauer. Behavioral evaluation and analysis of a hypertext browser. In Proceedings of the Conference on Human Factors in Computing Systems CHI'89, 1989.
	L. Egghe and R. Rousseau. Introduction to Informetrics. Elsevier, 1990.
	Kate Ehrlich and Debra Cash. Turning information into knowledge: Information finding as a collaborative activity. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document (30K). Audience: Non-technical, social science, `work flow`. References: 16. Links: 1. Relevance: Low. Abstract: Case study of a customer service organization that uses Lotus Notes. Discusses the importance of face-to-face, informal communication and of human `information mediators`.
	Thomas Ellman. Approximation and abstraction techniques for generating concise answers to database queries. In AAAI Spring Symposium on Information Gathering, 1995. Format: Compressed PostScript.
	Ahmed K. Elmagarmid. Database Transaction Models for Advanced Applications. Morgan Kaufmann, San Mateo, CA, 1992.
	Sara Elo. Augmenting text: Good news on disasters. In DAGS '95, 1995. Format: HTML Document (30K + picture). Audience: General. References: 12. Links: 3. Relevance: Medium. Abstract: News wire stories on disasters are annotated with facts that relate to the reader's local region (e.g., casualties are cast as a multiple of the hometown population). Readers from different locales see different augmentations. Frames triggered by disaster keywords are filled in with relevant material, which is then `personalized`.
	T. Todd Elvins, David R. Nadeau, Rina Schul, and David Kirsh. Worldlets: 3D thumbnails for 3D browsing. In Proceedings of the Conference on Human Factors in Computing Systems CHI'98, 1998.
	D. W. Embley. NFQL: The natural forms query language. ACM Transactions on Database Systems, 14(2):168-211, June 1989. Goes beyond retrieval to include updates and other operations.
	Robert Engelmore and Tony Morgan. Blackboard Systems. Addison-Wesley, 1988. A collection of papers that introduce blackboard systems, that provide a historical perspective of blackboard systems, that evaluate the contributions made by different systems, and that illustrate by example the range of blackboard applications and implementations.
	John S. Erickson. A copyright management system for networked interactive multimedia. In DAGS '95, 1995. Format: HTML Document (13K + pictures). Audience: Multimedia developers, computer scientists. References: 8. Links: 1. Relevance: Medium-Low. Abstract: Describes a rights management system for multimedia objects called LicensIt. A wrapper around each object includes information about the author, the rights required, and a digital signature to verify authenticity. Objects are viewed through special LicensIt viewers or through commercial applications with LicensIt plug-ins.
	D. Faensen, L. Faulstich, H. Schweppe, A. Hinze, and A. Steidinger. Hermes - a notification service for digital libraries. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001. The high publication rate of scholarly material makes searching and browsing an inconvenient way to keep oneself up-to-date. Instead of being the active part in information access, researchers want to be notified whenever a new paper in their research area is published. While more and more publishing houses or portal sites offer notification services, this approach has several disadvantages. We introduce the Hermes alerting service, a service that integrates a variety of different information providers, making their heterogeneity transparent for the users. Hermes offers sophisticated filtering capabilities that prevent the user from drowning in a flood of irrelevant information. From the user's point of view, it integrates the providers into a single source. Its simple provider interface makes it easy for publishers to join the service and thus reach potential readers directly. This paper presents the architecture of the Hermes service and discusses the issues of heterogeneity of information sources. Furthermore, we discuss the benefits and disadvantages of message-oriented middleware for implementing such a service for digital libraries.
	C. Faloutsos and S. Christodoulakis. Signature files: An access method for documents and its analytical performance evaluation. ACM Transactions on Office Information Systems, 2(4):267-288, October 1984.
	C. Faloutsos and D. Oard. A survey of information retrieval and filtering methods. Technical report, Dept. of Computer Science, University of Maryland, 1995.
	Christos Faloutsos. Access methods for text. ACM Computing Surveys, 17(1):49-74, March 1985.
	Jianping Fan, Hangzai Luo, and Lide Wu. Semantic video classification and feature subset selection under context and concept uncertainty. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. As large-scale collections of medical videos come into view, there is an urgent need to develop semantic medical video classification techniques and enable video retrieval at the semantic level. However, most existing batch-based classifier training techniques still suffer from context and concept uncertainty problems when only a limited number of labeled training samples are available. To address the context and concept uncertainty problems, we have proposed a novel framework that integrates large-scale unlabeled samples with a limited number of labeled samples to enable more effective feature subset selection, parameter estimation and model selection. Specifically, this framework includes: (a) a novel multimodal context integration and semantic video concept interpretation framework; (b) a novel classifier training technique that integrates feature subset selection, parameter estimation and model selection seamlessly in a single algorithm to address the context uncertainty problem over time; (c) a cost-sensitive semantic video classification framework to address the concept uncertainty problem. Our experimental results in a medical education video domain provide convincing support for our conclusions.
	Adam Farquhar, Angela Dappert, Richard Fikes, and Wanda Pratt. Integrating information sources using context logic. In AAAI Spring Symposium on Information Gathering, 1995. Format: Compressed PostScript.
	S. Feiner. Seeing the forest for the trees: Hierarchical display of hypertext structure. In Conference on Office Information Systems, New York: ACM, pages 205-212, 1988.
	Jean-Daniel Fekete and Nicole Dufournaud. Compus: Visualization and analysis of structured documents for understanding social life in the 16th century. In Proceedings of the Fifth ACM International Conference on Digital Libraries, 2000. This article describes the Compus visualization system that assists in the exploration and analysis of structured document corpora encoded in XML. Compus has been developed for and applied to a corpus of 100 French manuscript letters of the 16th century, transcribed and encoded for scholarly analysis using the recommendations of the Text Encoding Initiative. By providing a synoptic visualization of a corpus and allowing for dynamic queries and structural transformations, Compus assists researchers in finding regularities or discrepancies, leading to a higher level analysis of historic sources. Compus can be used with other richly encoded text corpora as well.
	An Feng and Toshiro Wakayama. SIMON: a grammar-based transformation system for structured documents. Electronic Publishing: Origination, Dissemination and Design, 6(4):361-372, December 1993.
	Michelle Ferebee, Greg Boeshaar, Kathryn Bush, and Judy Hertz. A scientific digital library in context: An earth radiation budget experiment collection in the atmospheric sciences data center digital library [short paper]. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. At the NASA Langley Research Center, the Earth Radiation Budget Experiment (ERBE) Data Management Team and the Atmospheric Sciences Data Center are developing a digital collection for the ERBE project. The main goal is long-term preservation of a comprehensive information environment. The secondary goal is to provide a context for these data products by centralizing the 25-year research project's scattered information elements. The development approach incorporates elements of rapid prototyping and user-centered design in a standards-based implementation. A working prototype is in testing with a small number of users.
	E. Fernandez, R. Summers, and C. Wood. Database Security and Integrity. Addison-Wesley, 1981.
	Mary F. Fernandez, Daniela Florescu, Jaewoo Kang, Alon Y. Levy, and Dan Suciu. STRUDEL: A web-site management system. In Proceedings of the International Conference on Management of Data, pages 549-552, 1997.
	Richard Fikes, Robert Engelmore, Adam Farquhar, and Wanda Pratt. Network-based information brokers. In AAAI Spring Symposium on Information Gathering, 1995. Format: Compressed PostScript.
	Laura Fillmore. How we must think. In JEP. Format: HTML Version (25K). Audience: Publishers. References: 0. Links: 0. Relevance: Low-Medium. Abstract: The president of the Online Bookstore gives her suggestions for other publishers to succeed in the digital age: think creatively about things that were not possible in paper; add value by licensing content, then giving people a framework through which to think about it. Quote from Gregory Rawlins, a computer science professor at Indiana University: `If you're not part of the steamroller, you're part of the road.`
	Laura Fillmore. Internet publishing in a borderless environment: Bookworms into butterflies. In JEP, 1994. Format: HTML Document (18K) . Audience: Publishers. References: 0. Links: 0. Relevance: Low. Abstract: Electronic publishing will need to account for the distributed nature of the Internet. Roles for a publisher include: providing an imprimatur of quality and acting as a content filter; verifying the authenticity of files; creating context around core content; developing and maintaining an equitable royalty system based on number of accesses; and customizing content for readers.
	Laura Fillmore. Online publishing: Threat or menace? In JEP, 1994. Format: HTML Document (30K) . Audience: General public, publishers. References: 0. Links: 0. Relevance: Low. Abstract: One person's view on the future of publishing and books. Time to press has decreased; `non-linear` thinking is encouraged; people use online resources differently from traditional books; piracy is not likely to be a big problem; publishers are still needed for publicity.
	Janet Fisher. Copyright: The glue of the system. In JEP, 1994. Format: HTML Document (15K) . Audience: Publishers, scholars, authors. References: 0. Links: 0. Relevance: Low. Abstract: An MIT Press director gives her position on copyright: journal publishers are essential because they 1) handle requests for reprints, etc., and 2) provide the filter that full-text online services need to determine quality. Individual authors or their institutions could not do these things economically. She does suggest some changes to current law, such as allowing authors to make copies for their own classes without a fee.
	G. W. Fitzmaurice. Situated information spaces and spatially aware palmtop computers. Communications of the ACM, 36(7):38-49, Jul 1993. Explores and uncovers a wide range of issues surrounding computer-augmented environments. The Chameleon prototype and a set of computer-augmented applications are described. Chameleon is a prototype system under development at the University of Toronto. It is part of an investigation on how palmtop computers designed with a high-fidelity monitor can become spatially aware of their location and orientation and serve as bridges or portholes between computer-synthesized information spaces and physical objects. In this prototype design, a 3D input controller and an output display are combined into one integrated unit.
	George W. Fitzmaurice, Shumin Zhai, and Mark H. Chignell. Virtual reality for palmtop computers. ACM Trans. Inf. Syst., 11(3):197-218, 1993. We are exploring how virtual reality theories can be applied toward palmtop computers. In our prototype, called the Chameleon, a small 4-inch hand-held monitor acts as a palmtop computer with the capabilities of a Silicon Graphics workstation. A 6D input device and a response button are attached to the small monitor to detect user gestures and input selections for issuing commands. An experiment was conducted to evaluate our design and to see how well depth could be perceived in the small screen compared to a large 21-inch screen, and the extent to which movement of the small display (in a palmtop virtual reality condition) could improve depth perception. Results show that with very little training, perception of depth in the palmtop virtual reality condition is about as good as corresponding depth perception in a large (but static) display. Variations to the initial design are also discussed, along with issues to be explored in future research. Our research suggests that palmtop virtual reality may support effective navigation and search and retrieval in rich and portable information spaces.
	Flickr.com. http://www.flickr.com.
	Daniela Florescu, Daphne Koller, and Alon Levy. Awareness services for digital libraries. In Proceedings of the Twenty-third International Conference on Very Large Databases, 1997. Deals with prioritizing queries to information sources on the web, i.e., by first querying the ones more likely to be relevant.
	Daniela Florescu, Alon Y. Levy, and Alberto O. Mendelzon. Database techniques for the world-wide web: A survey. SIGMOD Record, 27(3):59-74, 1998.
	Kathleen M. Flynn. The knowledge manager as a digital librarian: An overview of the knowledge management pilot program at the MITRE corporation. In Proceedings of the Second Annual Conference on the Theory and Practice of Digital Libraries, 1995. Format: HTML Document (10K) . Audience: Corporate librarians. References: 0. Links: 1. Relevance: Low. Abstract: Discusses the role of a Knowledge Manager (formerly a corporate librarian), in particular finding and organizing new networked information resources.
	Peter W. Foltz. Using latent semantic indexing for information filtering. In Proceedings of the Conference on Office Information Systems, 1990. A study of how well LSI predicts the interestingness of newsgroup articles.
	P.W. Foltz and S.T. Dumais. Personalized information delivery: an analysis of information filtering methods. Communications of the ACM, 35(12):51-60, December 1992. With the increasing availability of information in electronic form, it becomes more important and feasible to have automatic methods to filter information. The results of an experiment aimed at determining the effectiveness of four information-filtering methods in the domain of technical reports are presented. The experiment was conducted over a six-month period with 34 users and over 150 new reports published each month. Overall, the authors conclude that filtering methods show promise for presenting personalized information.
	Leonard N. Foner. Clustering and information sharing in an ecology of cooperating agents. In AAAI Spring Symposium on Information Gathering, 1995. Format: Compressed PostScript.
	Consortium for University Printing and Information Distribution (CUPID). Protocols and services (version 1): An architectural overview. In IP Workshop Proceedings, 1994. Format: HTML Document (52K). Audience: Publishers, layperson, non-technical. References: 0. Links: 0. Relevance: Low. Abstract: Discussion of the CUPID project: essentially just-in-time printing (e.g., of textbooks) at trusted print shops. No terminal display; hardcopy only.
	Muriel Foulonneau, Timothy W. Cole, Thomas G. Habing, and Sarah L. Shreeves. Using collection descriptions to enhance an aggregation of harvested item-level metadata. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. As an increasing number of digital library projects embrace the harvesting of item-level descriptive metadata, issues of description granularity and concerns about potential loss of context when harvesting item-level metadata take on greater significance. Collection-level description can provide added context for item-level metadata records harvested from disparate and heterogeneous providers. This paper describes an ongoing experiment using collection-level description in concert with item-level metadata to improve quality of search and discovery across an aggregation of metadata describing resources held by a consortium of large academic research libraries. We present details of approaches implemented so far and preliminary analyses of the potential utility of these approaches. The paper concludes with a brief discussion of related issues and future work plans.
	A. Fox, S. D. Gribble, Y. Chawathe, A. S. Polite, A. Huang, B. Ling, and E. A. Brewer. Orthogonal extensions to the www user interface using client-side technologies. In Proceedings of the ACM Symposium on User Interface Software and Technology. 10th Annual Symposium. UIST '97, pages 83-4, Oct 1997. We describe our experience implementing orthogonal extensions to the existing WWW user interface, to support user control of intelligent services. Our extensions are orthogonal in that they provide an interface to a service, which complements the Web browsing experience but is independent of the content of any particular site. We base our experiments on the TranSend service at UC Berkeley, which performs lossy compression on inline images to accelerate dialup Web access for a community of 25,000 subscribers. The service keeps a separate preferences profile for each user, which allows each user to vary the aggressiveness of lossy compression, selectively turn off the service for certain pages, and select the type of interface provided for refinement of degraded (lossily compressed) content. We are exploring three technologies for implementing the TranSend service interface: HTML decoration, Java and JavaScript.
	Armando Fox and Eric A. Brewer. Reducing www latency and bandwidth requirements by real-time distillation. Computer Networks and ISDN Systems, 28(7-11):1445-56, May 1996. cache storage; client-server systems; computer communications software; data compression; Internet; network servers; real-time systems; network latency; bandwidth requirements; real-time distillation; Pythia proxy mechanism; World Wide Web; real-time refinement; statistical models; metered cellular phone service; transcoding; client-side rendering; data representation; client display constraints; content optimization; PPP; Point-to-Point Protocol; image loading; added value. The Pythia proxy mechanism provides three important orthogonal benefits to World Wide Web (WWW) clients. (1) Real-time distillation and refinement, guided by statistical models, allow the user to bound latency and exercise explicit control over bandwidth that may be scarce and expensive (e.g. a metered cellular phone service). (2) Transcoding to a representation understood directly by the client system may improve rendering on the client or result in a representation that can be transmitted more efficiently. (3) Knowledge of client display constraints allows content to be optimized for rendering on the client. Users have commented that even the prototype version of Pythia provides a qualitative increase of about 5 times when surfing the World Wide Web over PPP (Point-to-Point Protocol) with a 14.4 kbit/s modem. These are the same users that previously turned image loading off completely in order to make surfing bearable. With the continued growth of the WWW, the benefits afforded by proxied services like Pythia will represent increasingly significant added value to end users and content providers alike. Pythia is the first fruit of a comprehensive research agenda aimed at implementing and deploying such services.
	Armando Fox, Ian Goldberg, Steven D. Gribble, and David C. Lee. Experience with top gun wingman: A proxy-based graphical web browser for the 3Com PalmPilot. In Proceedings of Middleware '98, Lake District, England, September 1998.
	Armando Fox, Steven D. Gribble, Eric A. Brewer, and Elan Amir. Adapting to network and client variability via on-demand dynamic distillation. SIGPLAN Notices, 31(9):160-70, Sep 1996. Also Seventh Intl. Conf. on Arch. Support for Prog. Lang. and Oper. Sys. (ASPLOS-VII). The explosive growth of the Internet and the proliferation of smart cellular phones and handheld wireless devices is widening an already large gap between Internet clients. Clients vary in their hardware resources, software sophistication, and quality of connectivity, yet server support for client variation ranges from relatively poor to none at all. In this paper we introduce some design principles that we believe are fundamental to providing `meaningful` Internet access for the entire range of clients. In particular, we show how to perform on-demand datatype-specific lossy compression on semantically typed data, tailoring content to the specific constraints of the client. We instantiate our design principles in a proxy architecture that further exploits typed data to enable application-level management of scarce network resources. Our proxy architecture generalizes previous work addressing all three aspects of client variation by applying well-understood techniques in a novel way, resulting in quantitatively better end-to-end performance, higher quality display output, and new capabilities for low-end clients.
	W. B. Frakes and R. Baeza-Yates. Information Retrieval Data Structures & Algorithms. Prentice Hall, Englewood Cliffs, N.J., 1992.
	L. Francis. Mobile computing: a fact in your future. In 15th Annual International Conference on Computer Documentation Conference Proceedings. SIGDOC '97. Crossroads in Communication, pages 63-7, 1997. Mobile computing is now at the stage where cell phones were 5-7 years ago. Laptops are frequently the choice of telecommuters who put in significant amounts of time both at home and at the office, but there is a growing group of mobile users who work from more than two locations and who expect to perform their full job responsibilities using a laptop that rarely returns to the main office. Although a mobile PC can be used without ever connecting to a network, it is typically connected with or without wires. Wired systems are most common and generally use modems with the dial-up lines found in homes or hotels. Wireless connections are increasing in popularity and use cellphone-like radio links to send and receive information, but mobile computing is not just about wireless connections; it is also about using your laptop in a hotel (in any country), at home, in a branch office or at a customer site. Using a laptop in those locations frequently reduces your interaction speed and range of functions to an unacceptable level, but recent improvements have attacked these problems. To really feel the freedom offered by mobile computing, imagine setting up overseas in your customer's spare office and working as if you were in your own office. Imagine being in a foreign country and not having to load printer drivers for each printer and load US fonts for each job in order to produce properly printed output. If this sort of future appeals to you, you're not alone. A likely and growing group is people who already use laptops. In 1996, laptops comprised 30-35% [...] are forecast to be manufactured.
	Luis Francisco-Revilla, Frank M. Shipman III, Richard Furuta, Unmil Karadkar, and Avital Arora. Perception of content, structure, and presentation changes in web-based hypertext. In HYPERTEXT '01: Proceedings of the twelfth ACM conference on Hypertext and Hypermedia, pages 205-214, New York, NY, USA, 2001. ACM Press. The Web provides access to a wide variety of information but much of this information is fluid; it changes, moves, and occasionally disappears. Bookmarks, paths over Web pages, and catalogs like Yahoo! are examples of page collections that can become out-of-date as changes are made to their components. Maintaining these collections requires that they be updated continuously. Tools to help in this maintenance require an understanding of what changes are important, such as when pages no longer exist, and what changes are not, such as when a visit counter changes. We performed a study to look at the effect of the type and quantity of change on people's perception of its importance. Subjects were presented pairs of Web pages with changes to either content (e.g., text), structure (e.g., links), or presentation (e.g., colors, layout). While changes in content were the most closely connected to subjects' perceptions of the overall change to a page, subjects indicated a strong desire to be notified of structural changes. Subjects considered presentation changes important only when many presentation characteristics changed simultaneously.
	Luis Francisco-Revilla, Frank Shipman, Richard Furuta, Unmil Karadkar, and Avital Arora. Managing change on the web. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001. Increasingly, digital libraries are being defined that collect pointers to World-Wide Web based resources rather than hold the resources themselves. Maintaining these collections is challenging due to distributed document ownership and high fluidity. Typically a collection's maintainer has to assess the relevance of changes with little system aid. In this paper, we describe the Walden's Paths Path Manager, which assists a maintainer in discovering when relevant changes occur to linked resources. The approach and system design were informed by a study of how humans perceive changes of Web pages. The study indicated that structural changes are key in determining the overall change and that presentation changes are considered irrelevant.
	Paolo Frasconi, Giovanni Soda, and Alessandro Vullo. Text categorization for multi-page documents: A hybrid naive bayes hmm approach. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001. Text categorization is typically formulated as a concept learning problem where each instance is a single isolated document. In this paper we are interested in a more general formulation where documents are organized as page sequences, as naturally occurring in digital libraries of scanned books and magazines. We describe a method for classifying pages of sequential OCR text documents into one of several assigned categories and suggest that taking into account contextual information provided by the whole page sequence can significantly improve classification accuracy. The proposed architecture relies on hidden Markov models whose emissions are bag-of-words according to a multinomial word event model, as in the generative portion of the Naive Bayes classifier. Our results on a collection of scanned journals from the Making of America project confirm the importance of using whole page sequences. Empirical evaluation indicates that the error rate (as obtained by running a plain Naive Bayes classifier on isolated pages) can be roughly reduced by half if contextual information is incorporated.
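The hybrid model in the abstract above (HMM states are page categories; each page is emitted as a multinomial bag of words, as in Naive Bayes) can be illustrated with a small decoder. This is a minimal sketch, not the authors' implementation: it assumes the start, transition, and per-category word log-probabilities have already been estimated, uses hand-set toy values, and applies standard Viterbi decoding so each page is labeled in the context of the whole sequence. The names `viterbi`, `log_emission`, and the `unk_logprob` floor for unseen words are hypothetical.

```python
import math
from collections import Counter

def log_emission(page_words, word_logprob, unk_logprob):
    """Multinomial bag-of-words log-likelihood of one page under one category.

    The multinomial coefficient is omitted: it is identical for every
    category, so it cannot change which category sequence wins.
    """
    counts = Counter(page_words)
    return sum(c * word_logprob.get(w, unk_logprob) for w, c in counts.items())

def viterbi(pages, states, log_start, log_trans, log_word, unk_logprob=math.log(1e-6)):
    """Most likely category for each page, using the whole page sequence."""
    # delta[s]: best log-score of any category path ending in state s at this page
    delta = {s: log_start[s] + log_emission(pages[0], log_word[s], unk_logprob)
             for s in states}
    backptrs = []
    for page in pages[1:]:
        new_delta, ptr = {}, {}
        for s in states:
            # Best predecessor r for state s, including the r -> s transition.
            prev, score = max(((r, delta[r] + log_trans[r][s]) for r in states),
                              key=lambda pair: pair[1])
            new_delta[s] = score + log_emission(page, log_word[s], unk_logprob)
            ptr[s] = prev
        delta = new_delta
        backptrs.append(ptr)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: delta[s])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]

if __name__ == "__main__":
    L = math.log
    states = ["article", "ad"]
    log_start = {"article": L(0.5), "ad": L(0.5)}
    # Hypothetical "sticky" transitions: consecutive pages usually share a category.
    log_trans = {"article": {"article": L(0.9), "ad": L(0.1)},
                 "ad": {"article": L(0.1), "ad": L(0.9)}}
    log_word = {"article": {"study": L(0.4), "results": L(0.4), "price": L(0.1), "buy": L(0.1)},
                "ad": {"study": L(0.1), "results": L(0.1), "price": L(0.4), "buy": L(0.4)}}
    pages = [["study", "results"], ["price"], ["study", "results"]]
    print(viterbi([pages[1]], states, log_start, log_trans, log_word))  # ['ad']
    print(viterbi(pages, states, log_start, log_trans, log_word))       # ['article', 'article', 'article']
```

In the toy run, the page containing only "price" is labeled "ad" in isolation but "article" when it sits between two article pages, which is the kind of contextual correction the abstract attributes to using whole page sequences.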
	Lisa Freeman. Testimony prepared on behalf of the association of american university presses for the national information infrastructure task force working group on intellectual property. In JEP, 1994. Format: HTML Document (10K) . Audience: Legislators. References: 0. Links: 0. Relevance: Low. Abstract: AAUP believes the current copyright law is sufficient for use in the networked world. Copyright provides a valuable service in designating the `final` (peer-reviewed) copy. Reprint fees, contracts, and copy protection should not be mandated but handled by the copyright holders.
	James C. French. Modeling web data. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. We have created three testbeds of web data for use in controlled experiments in collection modeling. This short paper examines the applicability of Zipf's and Heaps' laws to web data. We find extremely close agreement between observed vocabulary growth and Heaps' law. We find reasonable agreement with Zipf's law for medium to low frequency terms. Zipf's law is a poor predictor for high frequency terms. These findings hold for all three testbeds although we restrict ourselves to one here due to space limitations.
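The two laws tested in the entry above are power laws: Heaps' law models vocabulary size after n tokens as V(n) = K * n^beta, and Zipf's law models the frequency of the term at rank r as f(r) proportional to r^(-s). The abstract does not say how the fits were made, so the following is only a sketch under that caveat: both laws are fit by ordinary least squares on log-log data. The helpers `vocabulary_growth`, `rank_frequency`, and `loglog_fit`, and the sampling `step`, are hypothetical.

```python
import math
from collections import Counter

def vocabulary_growth(tokens, step=1000):
    """Sample (n, V(n)): tokens read so far vs. distinct terms seen so far."""
    seen, points = set(), []
    for n, tok in enumerate(tokens, start=1):
        seen.add(tok)
        if n % step == 0:
            points.append((n, len(seen)))
    return points

def rank_frequency(tokens):
    """Return (rank, frequency) pairs, most frequent term first (rank 1)."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    return list(enumerate(freqs, start=1))

def loglog_fit(points):
    """Least-squares fit of y = K * x**b on (log x, log y); returns (K, b)."""
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(y) for _, y in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return math.exp(my - b * mx), b
```

For Heaps' law, `loglog_fit(vocabulary_growth(tokens))` yields (K, beta); for Zipf's law, `loglog_fit(rank_frequency(tokens))` yields an exponent b = -s. Since the abstract reports that Zipf's law fits high-frequency terms poorly, one could fit only lower-ranked terms, e.g. by dropping the first few pairs from `rank_frequency(tokens)`; the cutoff would be a judgment call, not something the paper specifies.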
	James C. French, A. C. Chapin, and Worthy N. Martin. An application of multiple viewpoints to content-based image retrieval [short paper]. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. Content-based image retrieval uses features that can be extracted from the images themselves. Using more than one representation of the images in a collection can improve the results presented to a user without changing the underlying feature extraction or search technologies. We present an example of this `multiple viewpoint` approach, multiple image channels, and discuss its advantages for an image-seeking user. This approach has also been shown to dramatically improve retrieval effectiveness in content-based image retrieval systems.
	Jim French, Ed Fox, Kurt Maly, and Alan Selman. Wide area technical report service: Technical reports online. Communications of the ACM, 38(4):45, April 1995. This is the WATERS paper.
	J. Frew, M. Aurand, B. Buttenfield, L. Carver, P. Chang, R. Ellis, C. Fischer, M. Gardner, M. Goodchild, G. Hajic, M. Larsgaard, K. Park, M. Probert, T. Smith, and Q. Zheng. The alexandria rapid prototype: Building a digital library for spatial information. In Advances in Digital Libraries '95, 1995. Format: Not Yet Online.
	Fred Friedman, Arthur M. Keller, Gio Wiederhold, Mike R. Berkowitz, John Salasin, and David L. Spooner. Reference model for ADA interfaces to database management systems. In Proceedings Second IEEE Computer Society Data Engineering Conference, 1986.
	David Frohlich, Allan Kuchinsky, Celine Pering, Abbe Don, and Steven Ariss. Requirements for photoware. In Proceedings of the 2002 ACM conference on Computer supported cooperative work, 2002.
	Yueyu Fu, Weimao Ke, and Javed Mostafa. Automated text classification using a multi-agent framework. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. Automatic text classification is an important operational problem in digital library practice. Most text classification efforts so far have concentrated on developing centralized solutions. However, centralized classification approaches are often limited by constraints on knowledge and computing resources. In addition, centralized approaches are more vulnerable to attacks or system failures and less robust in dealing with them. We present a decentralized approach and system implementation (named MACCI) for text classification using a multi-agent framework. Experiments are conducted to compare our multi-agent approach with a centralized approach. The results show multi-agent classification can achieve promising classification results while maintaining its other advantages.
	Yueyu Fu and Javed Mostafa. Integration of biomedical text and sequence oai repositories. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. Archived biomedical literature and sequence data are growing rapidly. OAI-PMH provides a convenient way for data sharing, but it has not been tested in the biomedical domain, especially in dealing with different types of data, such as protein and gene sequences. We built four individual OAI-PMH repositories based on different biomedical resources. Using the harvested data from the four repositories we created an integrated OAI-PMH repository, which hosts the linked literature and sequence data in a single place.
	Yueyu Fu and Javed Mostafa. Toward information retrieval web services for digital libraries. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. Information retrieval (IR) functions serve a critical role in many digital library systems. There are numerous mature IR algorithms that have been implemented, and it would be a waste of resources and time to re-implement them. Those IR algorithms can be modularized and composed through the framework of web services. Web services in the IR domain have not been widely tested. Concept extraction is an important area in traditional IR. We demonstrated that it can easily be adapted as an IR web service and accessed in multiple ways. For the IR web services, we take advantage of a term representation database which was created as a result of a previous digital library project containing 31,928,892 terms found on 49,602,191 pages of the Web.
	M. Fuchs. The user interface as document: Sgml and distributed applications. Computer Standards & Interfaces, 18(1):79-92, January 1996. Multi-user distributed applications running on heterogeneous networks must be able to display user interface components on several platforms. In wide-area public networks, such as the Internet, the mix of platforms and participants in an application will occur dynamically; the user interface will need to coexist with environments completely uncontrolled by the designer. We have dealt with this issue by considering user interfaces as a kind of document specifying the application's requirements and adopting SGML technology to process them locally. This approach provides new flexibility, with implications for the design of network browsers, such as those of the World Wide Web, and leads to an interesting class of active documents.
	George W. Furnas. Generalized fisheye views. In Proceedings of the Conference on Human Factors in Computing Systems CHI'86, 1986.
	Kenneth Furuta. Librarianship in the digital library. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document (4K) . Audience: Librarians and Digital Library researchers. References: 0. Links: 1. Relevance: Low. Abstract: A view of the librarian's role in a digital library: classification, reference, ensuring access, and collection development.
	Richard Furuta. Defining and using structure in digital documents. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document (31K) . Audience: Authors, Developers, slightly technical. References: 43. Links: 1. Relevance: Low-Medium. Abstract: Discussion of SGML-style markup languages: their motivation, research issues, and how they might be extended to non-text objects. Distinction between content and presentation.
	Richard Furuta, Catherine C. Marshall, Frank M. Shipman III, and John J. Leggett. Physical objects in the digital library. In Proceedings of DL'96, 1996. Format: Not yet online.
	Robert P. Futrelle and Xiaolan Zhang. Large-scale persistent object systems for corpus linguistics and information retrieval. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document (40K + picture) . Audience: technical, computer scientists with some knowledge of computational linguistics. References: 31. Links: 1. Relevance: Medium-Low. Abstract: Discusses the challenges in indexing/searching large databases. Argues for a bootstrapping/machine learning approach to locate words in related contexts (surrounding words). Suggests specific data structures. Discusses tradeoffs between accuracy & speed, and scaling problems.
	Prasanna Ganesan, Qixiang Sun, and Hector Garcia-Molina. Yappers: A peer-to-peer lookup service over arbitrary topology. Technical Report 2002-24, Stanford University, 2002. Existing peer-to-peer search networks generally fall into two categories: Gnutella-style systems that use arbitrary topology and rely on controlled flooding for search, and systems that explicitly build an underlying topology to efficiently support a distributed hash table (DHT). In this paper, we propose a hybrid scheme for building a peer-to-peer lookup service over arbitrary network topology. Specifically, for each node in the search network, we build a small DHT consisting of nearby nodes and then provide an intelligent search mechanism that can traverse all the small DHTs. Our hybrid approach can reduce the nodes contacted for a lookup by an order of magnitude compared to Gnutella, allows rapid searching of nearby nodes through quick fan-out, does not reorganize the underlying overlay, and isolates the effect of topology changes to small areas for better scalability and stability.
	Y. J. Gao, J.J. Lim, and A.D. Narasimhalu. Fuzzy multilinkage thesaurus builder in multimedia information systems. In Advances in Digital Libraries '95, 1995. Format: Not Yet Online.
	H. Garcia-Molina, J. Hammer, J. Widom, W. Labio, and Y. Zhuge. The stanford data warehousing project. IEEE Data Engineering Bulletin, 18(2):41-48, June 1995.
	H. Garcia-Molina, J. Ullman, and J. Widom. Database System Implementation. Prentice-Hall, 2000.
	Hector Garcia-Molina, Luis Gravano, and Narayanan Shivakumar. dSCAM: Finding document copies across multiple databases. In Proceedings of the 4th International Conference on Parallel and Distributed Information Systems, 1996.
	Héctor García-Molina, Joachim Hammer, Kelly Ireland, Yannis Papakonstantinou, Jeffrey Ullman, and Jennifer Widom. Integrating and accessing heterogeneous information sources in TSIMMIS. In AAAI Spring Symposium on Information Gathering, 1995. Format: Compressed PostScript.
	Hector Garcia-Molina, Steven Ketchpel, and Narayanan Shivakumar. Safeguarding and charging for information on the Internet. In Proceedings of the Fourteenth International Conference on Data Engineering, 1998. Available at http://dbpubs.stanford.edu/pub/1998-26.
	Héctor García-Molina, Wilburt Labio, and Ramana Yerneni. Capability sensitive query processing on internet sources. In Proceedings of the 15th International Conference on Data Engineering, Sydney, Australia, March 1999. Accessible at http://dbpubs.stanford.edu/pub/1998-40.
	E. Garfield. Citation analysis as a tool in journal evaluation. Science, 178:471-479, 1972.
	Ullas Gargi. Managing and searching personal photo collections. Technical Report HPL-2002-67, HP Laboratories, March 2002.
	Ullas Gargi. Consumer media capture: Time-based analysis and event clustering. Technical Report HPL-2003-165, HP Laboratories, August 2003.
	John R. Garrett. Task force on archiving of digital information. D-Lib Magazine, Sep 1995. Format: HTML Document.
	Susan Gauch, Ron Aust, Joe Evans, John Gauch, Gary Minden, Doug Niehaus, and James Roberts. The digital video library system: Vision and design. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document (29K) . Audience: slightly technical, generalist comfortable with technology. References: 23. Links: 1. Relevance: Medium-Low (but not mainstream DL). Abstract: Describes architecture of a system to retrieve & deliver video on demand. Indexing done by audio track or transcript. 100 hours of video to 20-30 users. Different compression modes depending on bandwidth of user connection.
	Geri Gay and June P. Mead. The common ground surrounding access: Theoretical and practical perspectives. In Proceedings of DL'96, 1996. Format: Not yet online.
	Maayan Geffet and Dror G. Feitelson. Hierarchical indexing and document matching in bow. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001. BoW is an on-line bibliographical repository based on a hierarchical concept index to which entries are linked. Searching in the repository should therefore return matching topics from the hierarchy, rather than just a list of entries. Likewise, when new entries are inserted, a search for relevant topics to which they should be linked is required. We develop a vector-based algorithm that creates keyword vectors for the set of competing topics at each node in the hierarchy, and show how its performance improves when domain-specific features are added (such as special handling of topic titles and author names). The results of a 7-fold cross validation on a corpus of some 3,500 entries with a 5-level index are hit ratios in the range of 89-95% [...] the misclassifications are indeed ambiguous to begin with.
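The entry above describes building keyword vectors for the competing topics at each node of a concept hierarchy. A minimal sketch of that idea follows, under assumptions not taken from the paper: a topic's vector is the raw term counts of its linked entries and its whole subtree, title words get a fixed extra weight (a hypothetical stand-in for the paper's special handling of topic titles), similarity is cosine, and a new entry is placed by greedy descent from the root. `Topic`, `classify`, and `title_weight` are hypothetical names.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class Topic:
    """A node in a hypothetical concept hierarchy."""

    def __init__(self, title, entries=(), children=()):
        self.title = title
        self.entries = list(entries)    # keyword lists of entries linked here
        self.children = list(children)  # competing sub-topics

    def vector(self, title_weight=3):
        """Term counts from this topic's entries and whole subtree,
        plus an assumed fixed boost for words in the topic title."""
        vec = Counter()
        for words in self.entries:
            vec.update(words)
        for child in self.children:
            vec.update(child.vector(title_weight))
        for w in self.title.lower().split():
            vec[w] += title_weight
        return vec

def classify(entry_words, root):
    """Greedy descent: at each node, follow the most similar competing child."""
    query = Counter(w.lower() for w in entry_words)
    node, path = root, []
    while node.children:
        node = max(node.children, key=lambda c: cosine(query, c.vector()))
        path.append(node.title)
    return path

if __name__ == "__main__":
    root = Topic("root", children=[
        Topic("Operating Systems", entries=[["scheduling", "kernel"]], children=[
            Topic("File Systems", entries=[["disk", "inode"]]),
            Topic("Scheduling", entries=[["scheduler", "latency"]]),
        ]),
        Topic("Databases", entries=[["query", "index"]]),
    ])
    print(classify(["inode", "disk", "journaling"], root))  # ['Operating Systems', 'File Systems']
```

Vectors are recomputed on every comparison for brevity; a repository of thousands of entries would cache them per node. Greedy descent is also only one possible search strategy; returning several candidate topics per level would fit the abstract's goal of returning matching topics rather than a single placement.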
	Gary Geisler, Sarah Giersch, David McArthur, and Marty McClelland. Creating virtual collections in digital libraries: Benefits and implementation issues. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. Digital libraries have the potential to not only duplicate many of the services provided by traditional libraries but to extend them. Basic finding aids such as search and browse are common in most of today's digital libraries. But just as a traditional library provides more than a card catalog and browseable shelves of books, an effective digital library should offer a wider range of services. Using the traditional library concept of special collections as a model, in this paper we propose that explicitly defining sub-collections in the digital library (virtual collections) can benefit both the library's users and contributors and increase its viability. We first introduce the concept of a virtual collection, outline the costs and benefits for defining such collections, and describe an implementation of collection-level metadata to create virtual collections for two different digital libraries. We conclude by discussing the implications of virtual collections for enhancing interoperability and sharing across digital libraries, such as those that are part of the National SMETE Digital Library.
	Hans W. Gellersen, Albrecht Schmidt, and Michael Beigl. Multi-sensor context-awareness in mobile devices and smart artifacts. Mob. Netw. Appl., 7(5):341-351, 2002.
	Jim Gemmell, Gordon Bell, Roger Lueder, Steven Drucker, and Curtis Wong. MyLifeBits: fulfilling the Memex vision. In Proceedings of the tenth ACM international conference on Multimedia, pages 235-238. ACM Press, 2002.
	Michael R. Genesereth, Arthur M. Keller, and Oliver M. Duschka. Infomaster: An information integration system. In Proceedings of the International Conference on Management of Data, Tucson, Ariz., 1997. ACM Press, New York.
	Michael R. Genesereth and Steven P. Ketchpel. Software agents. Communications of the ACM, 37(7), July 1994. Discusses important issues related to agent-based software engineering, which was developed to create interoperable software.
	M.R. Genesereth, A.M. Keller, and O.M. Duschka. Infomaster: an information integration system. In SIGMOD Record, New York, 1997. ACM Press. Infomaster is an information integration system that provides integrated access to multiple distributed heterogeneous information sources on the Internet, thus giving the illusion of a centralized, homogeneous information system. We say that Infomaster creates a virtual data warehouse. The core of Infomaster is a facilitator that dynamically determines an efficient way to answer the user's query using as few sources as necessary and harmonizes the heterogeneities among these sources. Infomaster handles both structural and content translation to resolve differences between multiple data sources and the multiple applications for the collected data. Infomaster connects to a variety of databases using wrappers, such as for Z39.50, SQL databases through ODBC, EDI transactions, and other World Wide Web (WWW) sources. There are several WWW user interfaces to Infomaster, including forms based and textual. Infomaster also includes a programmatic interface and it can download results in structured form onto a client computer. Infomaster has been in production use for integrating rental housing advertisements from several newspapers (since fall 1995), and for meeting room scheduling (since winter 1996). Infomaster is also being used to integrate heterogeneous electronic product catalogs.
	Don Gentner, Frank Ludolph, and Chris Ryan. Simplified applications for network computers. In Proceedings of the Conference on Human Factors in Computing Systems CHI'97, 1997.
	D. Georgakopoulos, M. Hornick, and A. Sheth. An overview of workflow management: From process modeling to infrastructure for automation. Journal on Distributed and Parallel Database Systems, 3(2), 1995.
	Branko Gerovac and Richard J. Solomon. Protect revenues, not bits: Identify your intellectual property. In IP Workshop Proceedings, 1994. Format: HTML Document (40K). Audience: Standards committees, general technologists, technical sections. References: 14 footnotes. Links: 0. Relevance: low-medium. Abstract: Discusses a header-based approach to identifying data streams, focusing on video domains. Gives a brief history of copyrights. Gives desiderata for standards/design to ensure interoperability, flexibility, extensibility, etc. Gives concrete examples of encoding used for certain applications.
	N. Gershon, W. Ruh, J. LeVasseur, J. Winstead, and A. Kleiboemer. Searching and discovery of resources in digital libraries. In Advances in Digital Libraries '95, 1995. Format: Not Yet Online.
	Stefan Gessler and Andreas Kotulla. PDAs as mobile WWW browsers. In Proceedings of the Second International World-Wide Web Conference, 1994.
	Paul Gherman. Image vision: Forging a national image alliance. In JEP, 1994. Format: HTML Document (11K). Audience: Image catalogers & users, politicians. References: 0. Links: 0. Relevance: low. Abstract: Argues that all of the search & retrieval issues for bibliographic records are worse for images. There are more of them, and there is no standard for representation or indexing. Calls for creation of a universal image database, a single standard for representation, and a standard license agreement for image owners.
	David Gibson, Jon M. Kleinberg, and Prabhakar Raghavan. Inferring web communities from link topology. In HyperText, 1998.
	Aristides Gionis and Heikki Mannila. Finding recurrent sources in sequences. In Proceedings of the seventh annual international conference on Computational molecular biology, pages 123-130. ACM Press, 2003.
	Richard Giordano. Digital libraries and impacts on scientific careers. In Proceedings of DL'96, 1996. Format: Not yet online.
	Andreas Girgensohn, John Adcock, Matthew Cooper, Jonathan Foote, and Lynn Wilcox. Simplifying the management of large photo collections. In INTERACT '03: Ninth IFIP TC13 International Conference on Human-Computer Interaction, pages 196-203. IOS Press, September 2003.
	Andreas Girgensohn, John Adcock, and Lynn Wilcox. Leveraging face recognition technology to find and organize photos. In MIR '04: Proceedings of the 6th ACM SIGMM international workshop on Multimedia information retrieval, pages 99-106. ACM Press, 2004.
	Giovanni Giuffrida, Eddie C. Shek, and Jihoon Yang. Knowledge-based metadata extraction from PostScript files. In Proceedings of the Fifth ACM International Conference on Digital Libraries, 2000. The automatic document metadata extraction process is an important task in a world where thousands of documents are just one click away. Thus, powerful indices are necessary to support effective retrieval. The upcoming XML standard represents an important step in this direction as its semistructured representation conveys document metadata together with the text of the document. For example, retrieval of scientific papers by authors or affiliations would be a straightforward task if papers were stored in XML. Unfortunately, today, the large majority of documents on the web are available in forms that do not carry additional semantics. Converting existing documents to a semistructured representation is time consuming and no automatic process can be easily applied. In this paper we discuss a system, based on a novel spatial/visual knowledge principle, for extracting metadata from scientific papers stored as PostScript files. Our system embeds the general knowledge about the graphic layout of a scientific paper to guide the metadata extraction process. Our system can effectively assist the automatic index creation for digital libraries.
	Henry M. Gladney, Edward A. Fox, Zahid Ahmed, Ron Ashany, Nicholas J. Belkin, Michael Lesk, Richard Tong, and Maria Zemankova. Digital library: Gross structure and requirements (report from a March 1994 workshop). In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document (35K). Audience: DL researchers, workgroup attendees. References: 22. Links: 1. Relevance: Low-Medium. Abstract: The report of a working group on digital libraries. Defines terms, discusses a possible architecture in terms of resource managers and application enablers.
	Steven Glassman. A caching relay for the world wide web. In Proceedings of the First International World-Wide Web Conference, 1994.
	Dion Goh and John Leggett. Patron-augmented digital libraries. In Proceedings of the Fifth ACM International Conference on Digital Libraries, 2000. Digital library research is mostly focused on the generation of large collections of multimedia resources and state-of-the-art tools for their indexing and retrieval. However, digital libraries should provide more than advanced collection maintenance and retrieval services since the ultimate goal of any academic library is to serve the scholarly needs of its users. This paper begins by presenting a case for digital scholarship in which patrons perform all scholarly work electronically. A proposal is then made for patron-augmented digital libraries (PADLs), a class of digital libraries that supports the digital scholarship of its patrons. Finally, a prototype PADL (called Synchrony) providing access to video segments and associated textual transcripts is described. Synchrony allows patrons to search the library for artifacts, create annotations/original compositions, integrate these artifacts to form synchronized mixed text and video presentations and, after suitable review, publish these presentations into the digital library if desired. A study to evaluate the PADL concept and usability of Synchrony is also discussed. The study revealed that participants were able to use Synchrony for the authoring and publishing of presentations and that attitudes toward PADLs were generally positive.
	Anna Keller Gold, Karen Baker, Kim Baldridge, and Jean-Yves Le Meur. Building flow: Federating libraries on the web. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. An individual scientist, a collaborative team and a research network have a variety of document management needs in common. The levels of research organization, when viewed as nested tiers, represent boundaries across which information can flow openly if technology and metadata standards are partnered to provide an accessible, interoperable digital framework. The CERN Document System (CDS), implemented by a research partnership at the San Diego Supercomputer Center (SDSC), establishes a prototype tiered repository system. An ongoing exploration of existing scientific research information infrastructure suggests modifications to enable cross-tier and cross-domain information flow across what could be represented as a metadata grid.
	D. Goldberg, D. Nichols, B.M. Oki, and D. Terry. Using collaborative filtering to weave an information tapestry. Communications of the ACM, 35(12):61-70, December 1992. Uses user annotations to help with filtering.
	D. Goldberg, D. Nichols, B.M. Oki, and D. Terry. Using collaborative filtering to weave an information tapestry. Communications of the ACM, 35(12):61-70, 1992. Tapestry is an experimental mail system developed at the Xerox Palo Alto Research Center. The system manages an incoming stream of electronic documents, including E-mail, newswire stories and NetNews articles. The system implements a novel mechanism for collaborative filtering in which users annotate documents before the documents are filtered. Because annotations are not available at the time a new document arrives, the system supports continuous queries that examine the entire database of documents and take into account newly introduced annotations during the filtering process.
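	The continuous-query behavior described above (re-examining the whole store so that annotations added after a document arrives can still trigger delivery) can be sketched as follows. This is a hypothetical illustration, not Tapestry's query language or its incremental evaluation strategy: the Doc and ContinuousQuery names are invented here, and the query is simply re-run over all documents on each poll while remembering what was already delivered.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stored document with user-supplied annotations."""
    doc_id: int
    text: str
    annotations: set = field(default_factory=set)


class ContinuousQuery:
    """Re-evaluates a predicate over the entire store on every poll and
    returns only documents that newly satisfy it, so a late annotation
    can make an old document match."""

    def __init__(self, predicate):
        self.predicate = predicate
        self.delivered = set()  # ids already returned to the user

    def poll(self, store):
        hits = [d for d in store
                if self.predicate(d) and d.doc_id not in self.delivered]
        self.delivered.update(d.doc_id for d in hits)
        return hits
```

	A query such as "messages mentioning budget that a colleague has marked endorsed" returns nothing until the annotation appears, then returns the document exactly once.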
	Charles F. Goldfarb. The SGML Handbook. Oxford University Press, New York, 1990.
	T. Goldstein. The gateway security model in the Java electronic commerce framework. In R. Hirschfeld, editor, Financial Cryptography First International Conference, FC'97 Proceedings, Berlin, Germany, 1997. Springer-Verlag. This paper describes an extension to the current Java security model called the "Gateway" and why it was necessary to create it. This model allows secure applications, such as those used in electronic commerce, to safely exchange data and interoperate without compromising each individual application's security. The Gateway uses digital signatures to enable application programming interfaces to authenticate their caller. JavaSoft is using the Gateway to create a new integrated open platform for financial applications called Java Electronic Commerce Framework. The JECF will be the foundation for electronic wallets, point of sale terminals, electronic merchant servers and other financial software. The Gateway model can also be used for access control in many multiple application environments that require trusted interaction between applications from multiple vendors. These applications include browsers, servers, operating systems, medical systems and smartcards.
	Moises Goldszmidt and Mehran Sahami. A probabilistic approach to full-text document clustering. Technical report, SRI International, 1998.
	Gene Golovchinsky. Queries? Links? Is there a difference? In Proceedings of the Conference on Human Factors in Computing Systems CHI'97, 1997.
	G. Golub and C. Van Loan. Matrix Computations. Johns Hopkins University Press, 1989.
	Marcos Andre Goncalves and Edward A. Fox. 5SL: a language for declarative specification and generation of digital libraries. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. Digital libraries (DLs) are among the most complex kinds of information systems, due in part to their intrinsic multi-disciplinary nature. Nowadays DLs are built within monolithic, tightly integrated, and generally inflexible systems, or by assembling disparate components together in an ad-hoc way, with resulting problems in interoperability and adaptability. More importantly, conceptual modeling, requirements analysis, and software engineering approaches are rarely supported, making it extremely difficult to tailor DL content and behavior to the interests, needs, and preferences of particular communities. In this paper, we address these problems. In particular, we present 5SL, a declarative language for specifying and generating domain-specific digital libraries. 5SL is based on the 5S formal theory for digital libraries and enables high-level specification of DLs in five complementary dimensions, including: the kinds of multimedia information the DL supports (Stream Model); how that information is structured and organized (Structural Model); different logical and presentational properties and operations of DL components (Spatial Model); the behavior of the DL (Scenario Model); and the different societies of actors and managers of services that act together to carry out the DL behavior (Societal Model). The practical feasibility of the approach is demonstrated by the presentation of a 5SL digital library generator for the MARIAN digital library system.
	Marcos Andre Goncalves, Edward A. Fox, Aaron Krowne, Pavel Calado, Alberto H.F. Laender, Altigran S. da Silva, and Berthier Ribeiro-Neto. The effectiveness of automatically structured queries in digital libraries. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. Structured or fielded metadata is the basis for many digital library services, including searching and browsing. Yet, little is known about the impact of using structure in the effectiveness of such services. In this paper, we investigate a key research question: do structured queries improve effectiveness in DL searching? To answer this question, we empirically compared the use of unstructured queries to the use of structured queries. We then tested the capability of a simple Bayesian network system, built on top of a DL retrieval engine, to infer the best structured queries from the keywords entered by the user. Experiments performed with 20 users working with a DL containing a large collection of computer science literature clearly indicate that structured queries, either manually constructed or automatically generated, perform better than their unstructured counterparts, in the majority of cases. Also, automatic structuring of queries appears to be an effective and viable alternative to manual structuring that may significantly reduce the burden on users.
	Marcos André Gonçalves, Ganesh Panchanathan, Unnikrishnan Ravindranathan, Aaron Krowne, Edward A. Fox, Filip Jagodzinski, and Lillian Cassel. The XML log standard for digital libraries: Analysis, evolution, and deployment [short paper]. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. We describe current efforts and developments building on our proposal for an XML log standard format for digital library (DL) logging analysis and companion tools. Focus is given to the evolution of formats and tools based on analysis of deployment in several DL systems and testbeds. Recent development of analysis tools also is discussed.
	Google Inc. http://www.google.com.
	Chetan Gopal and Roger Price. Multimedia information delivery and the MHEG standard. In DAGS '95, 1995. Format: Not Yet On-line. Audience: Multimedia standards setters, developers - technical. References: 11. Relevance: Low. Abstract: Describes the MHEG standard being developed for multimedia objects and applications. Designed to deliver real-time interchange of multimedia objects over wide area networks.
	D. A. Grossman and J. R. Driscoll. Structuring text within a relational system. In Proc. of the 3rd Intl. Conf. on Database and Expert System Applications, pages 72-77, September 1992.
	Adrian Graham, Hector Garcia-Molina, Andreas Paepcke, and Terry Winograd. Time as essence for photo browsing through personal digital libraries. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. We developed two photo browsers for collections with thousands of time-stamped digital images. Modern digital cameras record photo shoot times, and semantically related photos tend to occur in bursts. Our browsers exploit the timing information to structure the collections and to automatically generate meaningful summaries. The browsers differ in how users navigate and view the structured collections. We conducted user studies to compare the two browsers and a commercial image browser. Our results show that exploiting the time dimension and appropriately summarizing collections can lead to significant improvements. For example, for one task category, one of our browsers enabled a 33% [...] commercial browser. Similarly, users were able to complete 29% [...] when using this same browser.
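	Because related photos arrive in bursts, a simple way to structure a time-stamped collection is to sort by capture time and start a new group whenever the gap to the previous photo is large. The sketch below assumes a single fixed gap threshold (one hour here, a hypothetical choice); it illustrates the burst idea only and is not the clustering method used in the authors' browsers.

```python
from datetime import datetime, timedelta


def group_into_bursts(timestamps, max_gap=timedelta(hours=1)):
    """Sort capture times and open a new burst whenever the gap to the
    previous photo exceeds max_gap (a hypothetical fixed threshold)."""
    bursts = []
    for ts in sorted(timestamps):
        if bursts and ts - bursts[-1][-1] <= max_gap:
            bursts[-1].append(ts)  # close in time: same event
        else:
            bursts.append([ts])    # long gap: start a new event
    return bursts
```

	Photos taken at 10:00, 10:05, 10:20, 14:00, and 14:10 on the same day fall into two bursts of three and two photos. An adaptive threshold (for example, one relative to typical gaps in the collection) would likely group real collections better than a constant.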
	Peter S. Graham. Intellectual preservation and electronic intellectual property. In IP Workshop Proceedings, 1994. Format: HTML Document (43K). Audience: Non-technical, librarians. References: 13 notes. Links: 0. Relevance: Low. Abstract: Discussion of ensuring authenticity of documents, essentially just notarization.
	Peter S. Graham. The digital research library: Tasks and commitments. In Proceedings of the Second Annual Conference on the Theory and Practice of Digital Libraries, 1995. Format: HTML Document (36K). Audience: Librarians. References: 23. Links: 8. Relevance: Low. Abstract: Discusses technical and organizational challenges which must be met to have a real digital research library. A significant one is obtaining the institutional commitments to ensure longevity of the collection and access to it. Discusses some of the required tasks (like cataloging, backup, authentication) at a high level.
	Karen D. Grant, Adrian Graham, Tom Nguyen, Andreas Paepcke, and Terry Winograd. Beyond the shoe box: Foundations for flexibly organizing photographs on a computer. Technical Report 2002-45, Stanford University, 2002. As a foundation for designing computer-supported photograph management tools, we have been conducting focused experiments. Here, we describe our analysis of how people initially organize batches of familiar images. We asked 26 subjects in pairs to organize 50 images on a common horizontal table. Each pair then organized a different 50-image set on a computer table of identical surface area. The bottom-projected computer tabletop displayed our interface to several online, pile-based affordances we wished to evaluate. Subjects used pens to interact with the system. We highlight aspects of the computer environment that were notably important to subjects, and others that they cared about less than we had hypothesized. For example, a strong majority preferred computer-generated representations of piles to be grid-shaped over several alternatives, some of which mimicked the physical world closely, and others that used transparency to save space.
	Luis Gravano, Chen-Chuan K. Chang, Héctor García-Molina, and Andreas Paepcke. STARTS: Stanford protocol proposal for Internet retrieval and search. Technical Report SIDL-WP-1996-0043; 1997-68, Stanford University, August 1996. Accessible at http://dbpubs.stanford.edu/pub/1997-68.
	Luis Gravano, Chen-Chuan K. Chang, Héctor García-Molina, and Andreas Paepcke. STARTS: Stanford proposal for Internet meta-searching. In Proceedings of the International Conference on Management of Data, 1997.
	Luis Gravano, Chen-Chuan K. Chang, Hector Garcia-Molina, and Andreas Paepcke. STARTS: Stanford proposal for Internet meta-searching. In Proc. of the 1997 ACM SIGMOD International Conference On Management of Data, 1997.
	Luis Gravano and Héctor García-Molina. Generalizing GlOSS to vector-space databases and broker hierarchies. In Proceedings of the Twenty-first International Conference on Very Large Databases, pages 78-89, September 1995.
	Luis Gravano and Hector Garcia-Molina. Merging ranks from heterogeneous internet sources. In Proceedings of the Twenty-third International Conference on Very Large Databases, 1997.
	Luis Gravano, Héctor García-Molina, and Anthony Tomasic. The effectiveness of GlOSS for the text-database discovery problem. In Proceedings of the International Conference on Management of Data, May 1994. The popularity of on-line document databases has led to a new problem: finding which text databases (out of many candidate choices) are the most relevant to a user. Identifying the relevant databases for a given query is the text database discovery problem. The first part of this paper presents a practical solution based on estimating the result size of a query and a database. The method is termed GlOSS (Glossary-of-Servers Server). The second part of this paper evaluates the effectiveness of GlOSS based on a trace of real user queries. In addition, we analyze the storage cost of our approach.
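	The abstract describes choosing databases by estimating each one's result size for a query. A minimal sketch of such an estimator, assuming conjunctive (AND) queries, per-database document frequencies, and statistically independent terms, estimates |db| multiplied by the product of df(t)/|db| over the query terms and ranks databases by that value. The function names and the sample statistics are hypothetical; the paper's own estimators and evaluation should be consulted for the exact definitions.

```python
from math import prod


def estimate_result_size(query_terms, db_size, doc_freq):
    """Estimated number of documents matching all query terms, assuming
    terms occur independently: |db| * prod(df(t) / |db|)."""
    if db_size == 0:
        return 0.0
    return db_size * prod(doc_freq.get(t, 0) / db_size for t in query_terms)


def rank_databases(query_terms, stats):
    """stats maps a database name to (size, {term: document frequency});
    returns names ordered by decreasing estimated result size."""
    scores = {name: estimate_result_size(query_terms, size, df)
              for name, (size, df) in stats.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

	With a 1,000-document database where "digital" occurs in 100 documents and "library" in 50, the estimate for "digital AND library" is 1000 x 0.1 x 0.05, about 5 documents. Only the summary statistics (sizes and document frequencies) need to be stored, which is what keeps the storage cost of this kind of approach low.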
	Jim Gray, Pat Helland, Patrick E. O'Neil, and Dennis Shasha. Dangers of replication and a solution. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 173-182, June 1996. Update anywhere-anytime-anyway transactional replication has unstable behavior as the workload scales up: a ten-fold increase in nodes and traffic gives a thousand-fold increase in deadlocks or reconciliations. Master copy replication (primary copy) schemes reduce this problem. A simple analytic model demonstrates these results. A new two-tier replication algorithm is proposed that allows mobile (disconnected) applications to propose tentative update transactions that are later applied to a master copy. Commutative update transactions avoid the instability of other replication schemes.
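	The figures quoted in this abstract correspond to cubic growth in the scale factor. Writing s for the common factor by which nodes and traffic grow and R for the rate of deadlocks or reconciliations, and assuming only what the two quoted numbers imply (not the paper's full analytic model):

```latex
R(s) \propto s^{3}
\quad\Longrightarrow\quad
\frac{R(10)}{R(1)} = 10^{3} = 1000
```

	Under the same assumption, merely doubling the system would multiply the rate by 2^3 = 8, which is why the abstract calls such replication unstable as the workload scales up.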
	Jim Gray and Andreas Reuter. Transaction Processing: concepts and techniques. Morgan Kaufmann Publishers, Inc., 1993. This is a comprehensive book on transaction processing. Chapter 3 introduces the concept of Fault Tolerance. Chapter 4 presents different transaction models. Chapters 5 and 6 give an overview of the functionality of the TP monitor. Chapters 7 and 8 describe concurrency control and its implementation. Chapter 9 gives an overview of recovery and how to implement logs. Chapters 10 and 11 define a transaction manager and how to implement it. Chapter 12 is a compendium of advanced transaction manager topics including heterogeneous commit coordinators, non-blocking commit coordinators, transfer of commit, optimization of 2-phase commit, and disaster recovery. The rest of the book describes in detail the low-level implementation of a transaction processing system, and provides a survey of TP systems in the market.
	Robert Gray. Content-based image retrieval: Color and edges. In DAGS '95, 1995. Format: Not Yet On-line. Audience: Vision/graphics researchers. References: 15. Links: . Relevance: Low. Abstract: Technical description of implementation of two techniques to retrieve images based on color histograms and edge maps. Implemented and tested on a small (48 image) database. Results mixed at best. Weaknesses identified for future work.
	Noah Green, Panagiotis G. Ipeirotis, and Luis Gravano. SDLIP + STARTS = SDARTS: A protocol and toolkit for metasearching. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001. In this paper we describe how we combined SDLIP and STARTS, two complementary protocols for searching over distributed document collections. The resulting protocol, which we call SDARTS, is simple yet expressive enough to enable building sophisticated metasearch engines. SDARTS can be viewed as an instantiation of SDLIP with metasearch-specific elements from STARTS. We also report on our experience building three SDARTS-compliant wrappers: for locally available plain-text document collections, for locally available XML document collections, and for external web-accessible collections. These wrappers were developed to be easily customizable for new collections. Our work was developed as part of Columbia University's Digital Libraries Initiative-Phase 2 (DLI2) project, which involves the departments of Computer Science, Medical Informatics, and Electrical Engineering, the Columbia University libraries, and a large number of industrial partners. The main goal of the project is to provide personalized access to a distributed patient-care digital library.
	Stephen J. Green. Automated link generation: can we do better than term repetition? In Proceedings of the Seventh International World-Wide Web Conference, 1998.
	Saul Greenberg and David Marwood. Real time groupware as a distributed system: Concurrency control and its effect on the interface. In Richard Furuta and Christine Neuwirth, editors, CSCW '94, New York, 1994. ACM. This paper exposes the concurrency control problem in groupware when it is implemented as a distributed system. Traditional concurrency control methods cannot be applied directly to groupware because system interactions include people as well as computers. Methods, such as locking, serialization, and their degree of optimism, are shown to have quite different impacts on the interface and how operations are displayed and perceived by group members. The paper considers both human and technical considerations that designers should ponder before choosing a particular concurrency control method. It also reviews the authors' work-in-progress designing and implementing a library of concurrency schemes in GROUPKIT, a groupware toolkit.
	Adrienne GreenHeart. Making multimedia work for women. In DAGS '95, 1995. Format: HTML Document (7K). Audience: Women writers, readers. References: 6 (though not in on-line version). Links: 1. Abstract: Argues that the non-linear nature of multimedia fits better with the more cyclical nature of female life, and the non-linear way that many women authors write. The new medium offers women a chance to fight the patriarchy of tradition.
	Philip Greenspun. We have chosen shame and will get war. In JEP, 1994. Format: HTML Document (22K). Audience: Browser developers, content publishers. References: 7. Links: 13. Relevance: low. Abstract: Quote from conclusion: "HTML is inadequate. It lacks sufficient structural and formatting tags to even render certain kinds of fiction comprehensible much less aesthetic. HTML needs style sheets or improved formatting capabilities so that document designers can spare 20 million Internet users from adjusting everything themselves. The META tag in HTML level 2 can be exploited to implement a document typing system. We need to develop a hierarchy of document types to facilitate implementation of programs that automatically process Web documents. This type system must support multiple inheritance."
	José-Marie Griffiths and Kimberly K. Kertis. Access to large digital libraries of scientific information across networks. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document (34K). Audience: slightly technical, funders, general technology. References: 14. Links: 1. Relevance: Low-Medium. Abstract: Describes U. of Tennessee's Digital Library proposal. Focuses on: representation, navigation, retrieval, display of information; performance & scalability; different user interfaces for different user groups. Semantic net concept representation.
	Gary N. Griswold. A method for protecting copyright on networks. In IP Workshop Proceedings, 1994. Format: HTML Document (29K). Audience: Computer scientists, specific & somewhat technical. References: 10. Links: 0. Relevance: Medium. Abstract: Secure copyrighted documents by transmitting them in an "envelope", which is the only way to view, print, etc. Periodic & per-use reverification with a server, possible chargeback info. PATENTS APPLIED FOR.
	Kaj Gronbaek, Lennert Sloth, and Peter Orbaek. Webvise: Browser and proxy support for open hypermedia structuring mechanisms on the world wide web. In Proceedings of the Eighth International World-Wide Web Conference, 1999. This paper discusses how to augment the World Wide Web with an open hypermedia service (Webvise) that provides structures such as contexts, links, annotations, and guided tours stored in hypermedia databases external to the Web pages. This includes the ability for users collaboratively to create links from parts of HTML Web pages they do not own and support for creating links to parts of Web pages without writing HTML target tags. The method for locating parts of Web pages can locate parts of pages across frame hierarchies and it also supports certain repairs of links that break due to modified Web pages. Support for providing links to/from parts of non-HTML data, such as sound and movie, will be possible via interfaces to plug-ins and Java-based media players. The hypermedia structures are stored in a hypermedia database, developed from the Devise Hypermedia framework, and the service is available on the Web via an ordinary URL. The best user interface for creating and manipulating the structures is currently provided for the Microsoft Internet Explorer 4.x browser through COM integration that utilizes the Explorer's DOM representation of Web-pages. But the structures can also be manipulated and used via special Java applets and a pure proxy server solution is provided for users who only need to browse the structures. A user can create and use the external structures as "transparency" layers on top of arbitrary Web pages; the user can switch between viewing pages with one or more layers (contexts) of structures or without any external structures imposed on them.
	R. L. Grossman, A. Sundaram, H. Ramamoorthy, M. Wu, S. Hogan, J. Shuler, and O. Wolfson. Viewing the U.S. government budget as a digital library. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document (24K). Audience: Computer scientists, funders. References: 7. Links: 1. Relevance: low. Abstract: Describes a prototype system built using tools to access data from the federal budget. Argues that statistical, numerical data is fundamentally different from text and multimedia.
	Tao Guan and Kam-Fai Wong. KPS: a web information mining algorithm. In Proceedings of the Eighth International World-Wide Web Conference, 1999. The Web mostly contains semi-structured information. It is, however, not easy to search and extract structural data hidden in a Web page. Current practices address this problem by (1) syntax analysis (i.e. HTML tags); or (2) wrappers or user-defined declarative languages. The former is only suitable for highly structured Web sites and the latter is time-consuming and offers low scalability. Wrappers could handle tens, but certainly not thousands, of information sources. In this paper, we present a novel information mining algorithm, namely KPS, over semi-structured information on the Web. KPS employs keywords, patterns and/or samples to mine the desired information. Experimental results show that KPS is more efficient than existing Web extracting methods.
	François Guimbretière and Terry Winograd. FlowMenu: combining command, text, and data entry. In UIST '00: Proceedings of the 13th annual ACM symposium on User interface software and technology, pages 213-216, New York, NY, USA, 2000. ACM Press.
	Oliver Gunther, Rudolf Muller, and Andreas S. Wiegand. The design of MMM: a model management system for time series analysis. In DAGS '95, 1995. Format: HTML Document (46K + pictures). Audience: Mathematicians, economists, statisticians, computer scientists. References: 37. Links: 13. Relevance: Low. Abstract: Proposes a web-based repository for software implementing time series analysis methods. Such a system would facilitate collaboration, verification of results, and would help build an experience base of which models worked well under which circumstances. Briefly describes the architecture, which requires method implementors to specify the methods in terms of Ypsilon (an abstract class system) classes.
	A. Gupta, V. Harinarayan, and A. Rajaraman. Virtual database technology. In Proceedings of the Fourteenth International Conference on Data Engineering, pages 23-27, February 1998.
	Amarnath Gupta, Bertram Ludaescher, and Reagan W. Moore. Ontology services for curriculum development in NSDL. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. We describe our effort to develop an ontology service on top of an educational digital library. The ontology is developed by relating library holdings to the educational concepts they refer to. The ontology system supports basic services like ontology-based search and complex services such as comparison of multiple curricula.
	Aparna Gurijala and J. R. Deller, Jr. A quantified fidelity criterion for parameter-embedded watermarking of audio archives [short paper]. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. A novel algorithm for speech watermarking through parametric modeling is enhanced by inclusion of a quantified fidelity criterion. Watermarking is effected through solution of a set-membership filtering (SMF) problem, subject to an ℓ∞ fidelity criterion in the signal space. The SMF approach provides flexibility in obtaining watermark solutions that trade off watermark robustness and stegosignal fidelity.
	Samuel Gustman, Dagobert Soergel, Douglas Oard, William Byrne, Michael Picheny, Bhuvana Ramabhadran, and Douglas Greenberg. Supporting access to large digital oral history archives. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. This paper describes our experience with creating, indexing, and providing access to a very large archive of videotaped oral histories (116,000 hours of digitized interviews in 32 languages from 52,000 survivors, liberators, rescuers and witnesses of the Nazi Holocaust) and identifies a set of critical research issues in user requirement studies, automatic speech recognition, automatic classification, segmentation, and summarization, retrieval, and user interfaces that must be addressed if we are to provide full and detailed access to collections of this size.
	Marc Gyssens, Jan Paredaens, and Dirk Van Gucht. A grammar-based approach towards unifying hierarchical data models. In Proceedings of the International Conference on Management of Data, Portland, Oreg., June 1989. ACM Press, New York.
	H.-J. Zimmermann. Fuzzy Set Theory. Kluwer Academic Publishers, 1996.
	Laura M. Haas, Donald Kossmann, Edward L. Wimmers, and Jun Yang. Optimizing queries across diverse data sources. In Proceedings of the Twenty-third International Conference on Very Large Databases, pages 276-285, Athens, Greece, August 1997. VLDB Endowment, Saratoga, Calif.
	Robert J. Hall. Agents helping agents: Issues in sharing how-to knowledge. In AAAI Spring Symposium on Information Gathering, 1995. Format: Compressed PostScript.
	Joseph Y. Halpern and Carl Lagoze. The computing research repository: Promoting the rapid dissemination and archiving of computer science research. In Proceedings of the Fourth ACM International Conference on Digital Libraries, 1999. We describe the Computing Research Repository (CoRR), a new electronic archive for rapid dissemination and archiving of computer science research results. CoRR was initiated in September 1998 through the cooperation of ACM, the LANL (Los Alamos National Laboratory) e-Print archive, and NCSTRL (Networked Computer Science Technical Reference Library). Through its implementation of the Dienst protocol, CoRR combines the open and extensible architecture of NCSTRL with the reliable access and well-established management practices of the LANL XXX e-Print repository. This architecture will allow integration with other e-Print archives and provides a foundation for a future broad-based scholarly digital library. We describe the decisions that were made in creating CoRR, the architecture of the CoRR/NCSTRL interoperation, and issues that have arisen during the operation of CoRR.
	Joachim Hammer, Hector Garcia-Molina, Junghoo Cho, Arturo Crespo, and Rohan Aranha. Extracting semistructured information from the web. In Proceedings of the Workshop on Management of Semistructured Data, 1997.
	Kristian Hammond, Robin Burke, Charles Martin, and Steven Lytinen. FAQ Finder: A case-based approach to knowledge navigation. In AAAI Spring Symposium on Information Gathering, 1995. Format: Compressed PostScript.
	Hui Han, C. Lee Giles, Eren Manavoglu, Hongyuan Zha, Zhenyue Zhang, and Edward A. Fox. Automatic document metadata extraction using support vector machines. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. Automatic metadata generation provides scalability and usability for digital libraries and their collections. Machine learning methods offer robust and adaptable automatic metadata extraction. We describe a support vector machine classification-based method for metadata extraction from the headers of research papers and show that it outperforms other machine learning methods on the same task. The method first classifies each line of the header into one or more of 15 classes. An iterative convergence procedure is then used to improve the line classification by using the class labels predicted for neighboring lines in the previous round. Further metadata extraction is done by seeking the best chunk boundaries of each line. We found that discovery and use of the structural patterns of the data and domain-based feature selection can improve metadata extraction performance. An appropriate feature normalization also greatly improves the classification performance.
	Hui Han, C. Lee Giles, Hongyuan Zha, Cheng Li, and Kostas Tsioutsiouliklis. Two supervised learning approaches for name disambiguation in author citations. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. Due to name abbreviations, identical names, or name misspellings in publications or bibliographies (citations), an author may have multiple names and multiple authors may share the same name. Such name ambiguity affects the performance of document retrieval, web search, and database integration, and may cause improper attribution to authors. This paper investigates two supervised learning approaches to disambiguate authors in the citations. One approach uses the naive Bayes probability model, a generative model; the other uses Support Vector Machines (SVMs) and the vector space representation of citations, a discriminative model. Both approaches utilize three types of citation attributes: co-author names, the title of the paper, and the title of the journal or proceedings. We illustrate these two approaches on two types of data, one collected from the web, mainly publication lists from homepages, the other collected from the DBLP citation databases.
	Hui Han, Hongyuan Zha, and C. Lee Giles. Name disambiguation in author citations using a k-way spectral clustering method. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. An author may have multiple names and multiple authors may share the same name simply due to name abbreviations, identical names, or name misspellings in publications or bibliographies (citations). This can produce name ambiguity which can affect the performance of document retrieval, web search, and database integration, and may cause improper attribution of credit. Proposed here is an unsupervised learning approach using K-way spectral clustering that disambiguates authors in citations. The approach utilizes three types of citation attributes: co-author names, paper titles, and publication venue titles. The approach is illustrated with 16 name datasets with citations collected from the DBLP database bibliography and author home pages and shows that name disambiguation can be achieved using these citation attributes.
	Handango, Inc. http://www.handango.com.
	Wei-hao Lin and Alex Hauptmann. A wearable digital library of personal conversations. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. We have developed a wearable, personalized digital library system, which unobtrusively records the wearer's part of a conversation, recognizes the face of the current dialog partner and remembers his/her voice. The next time the system sees the same person's face and hears the same voice, it can replay the audio from the last conversation in compressed form summarizing the names and major issues mentioned. Experiments with a prototype system show that a combination of face recognition and speaker identification can be effective for retrieving conversations.
	Masanori Harada, Shin-ya Sato, and Kazuhiro Kazama. Finding authoritative people from the web. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. Today's web is so huge and diverse that it arguably reflects the real world. For this reason, searching the web is a promising approach to find things in the real world. This paper presents NEXAS, an extension to web search engines that attempts to find real-world entities relevant to a topic. Its basic idea is to extract proper names from the web pages retrieved for the topic. A main advantage of this approach is that users can query any topic and learn about relevant real-world entities without dedicated databases for the topic. In particular, we focus on an application for finding authoritative people from the web. This application is practically important because once personal names are obtained, they can lead users from the web to managed information stored in digital libraries. To explore effective ways of finding people, we first examine the distribution of Japanese personal names by analyzing about 50 million Japanese web pages. We observe that personal names appear frequently on the web, but the distribution is highly influenced by automatically generated texts. To remedy the bias and find widely acknowledged people accurately, we utilize the number of servers containing a name instead of the number of web pages. We show its effectiveness by an experiment covering a wide range of topics. Finally, we demonstrate two examples and discuss possible applications.
	Susumu Harada, Mor Naaman, Yee Jiun Song, QianYing Wang, and Andreas Paepcke. Lost in memories: Interacting with photo collections on PDAs. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. We developed two browsers to support large personal photo collections on PDAs. Our first browser is based on a traditional, folder-based layout that utilizes either the user's manually created organization structure, or a system-generated structure. Our second browser uses a novel interface that is based on a vertical, zoomable timeline. This timeline browser does not require users to organize their photos, but instead, relies solely on system-generated structure. Our system creates a hierarchical structure of the user's photos by applying time-based clustering to identify subsets of photos that are likely to be related. In a user experiment, we compared users' searching and browsing performance across these browsers, using each user's own photo collection. Photo collection sizes varied between 500 and 3000 photographs. Our results show that our timeline browser is at least as effective for searching and browsing tasks as a traditional browser that requires users to manually organize their photos.
	Darren R. Hardy, Michael F. Schwartz, and Duane Wessels. Harvest user's manual, January 1996. Accessible at `http://harvest.transarc.com/afs/transarc.com/public/trg/Harvest/user-manual`.
	Donna Harman. Document detection overview. In Proceedings TIPSTER Text Program (Phase I), Fredericksburg, Va., September 1993. Morgan Kaufmann, San Francisco, Calif.
	David J Harper, Sara Coluthard, Raja Kalpana, and Sun Yixing. A language modelling approach to relevance profiling for document browsing. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. This paper describes a novel tool, SmartSkim, for content-based browsing or skimming of documents. The tool integrates concepts from passage retrieval and from interfaces, such as TileBars, which provide a compact overview of query term hits within a document. We base our tool on the concept of relevance profiling, in which a plot of retrieval status values at each word position of a document is generated. A major contribution of this paper is applying language modelling to the task of relevance profiling. We describe in detail the design of the SmartSkim tool, and provide a critique of the design. Possible applications of the tool are described, and we consider how an operational version of SmartSkim might be architected.
	Terry L. Harrison, Michael L. Nelson, and Mohammad Zubair. The Dienst-OAI gateway [short paper]. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. Though the Open Archive Initiative Protocol for Metadata Harvesting (OAI-PMH) is becoming the de facto standard for digital libraries, some of its predecessors are still in use. Although a limited number of Dienst repositories continue to be populated, others are precariously unsupported. The Dienst Open Archive Gateway (DOG) is a gateway between the OAI-PMH and the Dienst (version 4.1) protocol. DOG allows OAI-PMH harvesters to extract metadata records (in RFC-1807 or Dublin Core) from Dienst servers.
	Scott W. Hassan and Andreas Paepcke. Stanford Digital Library Interoperability Protocol. Technical Report SIDL-WP-1997-0054; 1997-73, Stanford University, 1997. Accessible at http://dbpubs.stanford.edu/pub/1997-73.
	Franz J. Hauck. Supporting hierarchical guided tours in the world wide web. In Proceedings of the Fifth International World-Wide Web Conference, 1996.
	Alex Hauptmann, Rong Jin, and Tobun Dorbin Ng. Multi-modal information retrieval from broadcast video using OCR and speech recognition. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. We examine multi-modal information retrieval from broadcast video where text can be read on the screen through OCR and speech recognition can be performed on the audio track. OCR and speech recognition are compared on the 2001 TREC Video Retrieval evaluation corpus. Results show that OCR is more important than speech recognition for video retrieval. OCR retrieval can be further improved through dictionary-based post-processing. We demonstrate how to utilize imperfect multi-modal metadata results to benefit multi-modal information retrieval.
	Alex Hauptmann and Norman Papernick. Video-Cuebik: Adapting image search to video shots. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. We propose a new analysis for searching images in video libraries that goes beyond simple image search, which compares one still image frame to another. The key idea is to expand the definition of an image to account for the variability in the sequence of video frames that comprise a shot. A first implementation of this method for a QBIC-like image search engine shows a clear improvement over still image search. A combination of the traditional still image search and the new video image search provided the overall best results on the TREC video retrieval evaluation data.
	Alexander G. Hauptmann, Michael J. Witbrock, and Michael G. Christel. News-on-demand: An application of informedia technology. D-Lib Magazine, Sep 1995. Format: HTML Document.
	Taher Haveliwala. Efficient computation of PageRank. Technical Report 1999-31, Database Group, Computer Science Department, Stanford University, February 1999. Available at http://dbpubs.stanford.edu/pub/1999-31.
	Taher H. Haveliwala. Topic-sensitive PageRank. In WWW '02: Proceedings of the 11th international conference on World Wide Web, pages 517-526, New York, NY, USA, 2002. ACM Press. In the original PageRank algorithm for improving the ranking of search-query results, a single PageRank vector is computed, using the link structure of the Web, to capture the relative `importance` of Web pages, independent of any particular search query. To yield more accurate search results, we propose computing a set of PageRank vectors, biased using a set of representative topics, to capture more accurately the notion of importance with respect to a particular topic. By using these (precomputed) biased PageRank vectors to generate query-specific importance scores for pages at query time, we show that we can generate more accurate rankings than with a single, generic PageRank vector. For ordinary keyword search queries, we compute the topic-sensitive PageRank scores for pages satisfying the query using the topic of the query keywords. For searches done in context (e.g., when the search query is performed by highlighting words in a Web page), we compute the topic-sensitive PageRank scores using the topic of the context in which the query appeared.
	D. Hawking and N. Craswell. Overview of TREC-7 very large collection track. In Proc. of the Seventh Text Retrieval Conf., pages 91-104, November 1998.
	David Hawking, Nick Craswell, Paul Thistlewaite, and Donna Harman. Results and challenges in web search evaluation. In Proceedings of the Eighth International World-Wide Web Conference, 1999. A frozen 18.5 million page snapshot of part of the Web has been created to enable and encourage meaningful and reproducible evaluation of Web search systems and techniques. This collection is being used in an evaluation framework within the Text Retrieval Conference (TREC) and will hopefully provide convincing answers to questions such as `Can link information result in better rankings?`, `Do longer queries result in better answers?`, and `Do TREC systems work well on Web data?` The snapshot and associated evaluation methods are described and an invitation is extended to participate. Preliminary results are presented for an effectiveness comparison of six TREC systems working on the snapshot collection against five well-known Web search systems working over the current Web. These suggest that the standard of document rankings produced by public Web search engines is by no means state-of-the-art.
	D. T. Hawkins and L. R. Levy. Front end software for online database searching Part 1: Definitions, system features, and evaluation. Online, 9(6):30-37, November 1985.
	Marti A. Hearst. TileBars: Visualization of term distribution information in full text information access. In Proceedings of the Conference on Human Factors in Computing Systems CHI'95, 1995. The field of information retrieval has traditionally focused on textbases consisting of titles and abstracts. As a consequence, many underlying assumptions must be altered for retrieval from full-length text collections. This paper argues for making use of text structure when retrieving from full text documents, and presents a visualization paradigm, called TileBars, that demonstrates the usefulness of explicit term distribution information in Boolean-type queries. TileBars simultaneously and compactly indicate relative document length, query term frequency, and query term distribution. The patterns in a column of TileBars can be quickly scanned and deciphered, aiding users in making judgments about the potential relevance of the retrieved documents.
	Sandra Heiler. Semantic interoperability. ACM Computing Surveys, 27(2):271-273, June 1995. Discusses the issues related to semantic interoperability. The purposes are to indicate why semantic interoperability is so hard to achieve, and to suggest that repository technology can provide the beginnings of help to make it easier.
	Albert Henning. Dynamic authoring and retrieval of textbook information: Dartext. In DAGS '95, 1995. Format: HTML Document (34K + pictures). Audience: Instructors, students, textbook authors and publishers. References: 15. Links: 6. Relevance: Low-medium. Abstract: A very broad but shallow description of the textbook production business. Argues for a distributed author model, but with publishers that still piece together textbooks from the contributions of instructors, students, etc. CD-ROM versions in addition to on-line. Authors paid according to their contribution. Briefly mentions administration and intellectual property issues. Longer example of a physics/engineering systems demo.
	Henry H. Perritt, Jr. Permission headers and contract law. In IP Workshop Proceedings, 1994. Format: HTML Document (71K). Audience: Public policy, lawyers, and developers. References: 47 notes. Links: 0. Relevance: Medium. Abstract: Focusing primarily on intellectual property, this article covers a lot of ground. Briefly describes the CNRI copyright management project, and argues for `permission headers` in which the terms for each of the various protected rights (viewing, copying, preparing derivative works, etc.) can be stated, along with economic information. Discusses whether digitally signed contracts are likely to be legally enforceable (they probably are), and under what circumstances electronic records are admissible in court (when they are generated in the regular course of business, and there's no reason to doubt them). Argues against general encryption as too expensive and inconsistent with the open market of ideas. Seeks legal protection commensurate with the value of a transaction.
	Monika R. Henzinger, Allan Heydon, Michael Mitzenmacher, and Marc Najork. Measuring index quality using random walks on the web. In Proceedings of the Eighth International World-Wide Web Conference, 1999. Recent research has studied how to measure the size of a search engine, in terms of the number of pages indexed. In this paper, we consider a different measure for search engines, namely the quality of the pages in a search engine index. We provide a simple, effective algorithm for approximating the quality of an index by performing a random walk on the Web, and we use this methodology to compare the index quality of several major search engines.
	Ralf G. Herrtwich and Thomas Kaeppner. Network computers - ubiquitous computing or dumb multimedia? In Third International Symposium on Autonomous Decentralized Systems. IEEE Computer Society Press, 1997. Introduces the NC specification and discusses its prospects.
	M. Hersovici, M. Jacovi, Y. Maarek, D. Pelleg, M. Shtalhaim, and S. Ur. The shark-search algorithm - an application: tailored web site mapping. In Proceedings of the 7th World Wide Web Conference, 1998. This paper introduces the shark-search algorithm, a refined version of one of the first dynamic Web search algorithms, the fish-search. The shark-search has been embodied in a dynamic Web site mapping application that enables users to tailor Web maps to their interests. Preliminary experiments show significant improvement over the original fish-search algorithm.
	Walter B. Hewlett and Eleanor Selfridge-Field, editors. Melodic Similarity: Concepts, Procedures, and Applications. MIT Press and Center for Computer Assisted Research in the Humanities (CCARH), Stanford University, 1998.
	Allan Heydon and Marc Najork. Mercator: A scalable, extensible Web crawler. World Wide Web, 2(4):219-229, December 1999.
	Linda L. Hill, James Frew, and Qi Zheng. Geographic names - the implementation of a gazetteer in a georeferenced digital library. CNRI D-Lib Magazine, January 1999.
	W. Hill and J. Hollan. History-enriched digital objects: Prototypes and policy issues. The Information Society, 10(2), April-June 1994. Recording on digital objects (e.g., reports, forms, contracts, mail-order catalogs, source code, manual pages, email, spreadsheets, menus) the interaction events that comprise their use makes it possible on future occasions, when the objects are used again, to display graphical abstractions of the accrued histories as parts of the objects themselves. For example, co-authors of a report can see stable and unstable sections (lines of text are marked by recency of changes or amount of editing) and identify who has written what and when. In the case of reading documentation, a reader can see who else has previously read a particular section of interest. While using a spreadsheet to refine a budget, the count of edits per spreadsheet cell can be mapped onto grayscale to give an impression of which budget numbers have been reworked the most and least. Or in the context of learning unfamiliar menu selections in a new piece of software, the menu itself can depict the distribution statistics of colleagues' previous menu selections in the same or similar contexts. There are many existing computational devices that hint at the prospect of history-enriched digital objects. Automatic change-bars, citation indices, and download counts on computer bulletin boards are examples. In fact, for the last thirteen years, members of our lab have been able to request AP News articles by specifying a minimum number of previous readers and thus easily retrieve articles that colleagues have chosen to read.
	W. Hill, L. Stead, M. Rosenstein, and G. Furnas. Recommending and evaluating choices in a virtual community of use. In Proceedings of the Conference on Human Factors in Computing Systems CHI'95, New York, 1995. ACM. When making a choice in the absence of decisive first-hand knowledge, choosing as other like-minded, similarly-situated people have successfully chosen in the past is a good strategy; in effect, it uses other people as filters and guides: filters to strain out potentially bad choices and guides to point out potentially good choices. Current human-computer interfaces largely ignore the power of this social strategy. For most choices within an interface, new users are left to fend for themselves and, if necessary, to pursue help outside of the interface. We present a general history-of-use method that automates a social method for informing choice and report on how it fares in the context of a fielded test case: the selection of videos from a large set. The positive results show that communal history-of-use data can serve as a powerful resource for use in interfaces.
	Jun Hirai, Sriram Raghavan, Hector Garcia-Molina, and Andreas Paepcke. WebBase: A repository of web pages. In Proceedings of the Ninth International World-Wide Web Conference, pages 277-293, May 2000. Available at http://dbpubs.stanford.edu/pub/2000-51. In this paper, we study the problem of constructing and maintaining a large shared repository of web pages. We discuss the unique characteristics of such a repository, propose an architecture, and identify its functional modules. We focus on the storage manager module, and illustrate how traditional techniques for storage and indexing can be tailored to meet the requirements of a web repository. To evaluate design alternatives, we also present experimental results from a prototype repository called WebBase, which is currently being developed at Stanford University. Keywords: Repository, WebBase, Architecture, Storage management.
	Steve Hitchcock, Les Carr, Zhuoan Jiao, Donna Bergmark, Wendy Hall, Carl Lagoze, and Stevan Harnad. Developing services for open eprint archives: Globalisation, integration and the impact of links. In Proceedings of the Fifth ACM International Conference on Digital Libraries, 2000. The rapid growth of scholarly information resources available in electronic form and their organisation by digital libraries is proving fertile ground for the development of sophisticated new services, of which citation linking will be one indispensable example. Many new projects, partnerships and commercial agreements have been announced to build citation linking applications. This paper describes the Open Citation (OpCit) project, which will focus on linking papers held in freely accessible eprint archives such as the Los Alamos physics archives and other distributed archives, and which will build on the work of the Open Archives initiative to make the data held in such archives available to compliant services. The paper emphasises the work of the project in the context of emerging digital library information environments, explores how a range of new linking tools might be combined and identifies ways in which different linking applications might converge. Some early results of linked pages from the OpCit project are reported.
	E. Hjelmas and B. K. Low. Face detection: a survey. Computer Vision and Image Understanding, 83(3):236-274, September 2001.
	Patrick Hochstenbach, Henry Jerez, and Herbert Van de Sompel. The OAI-PMH static repository and static repository gateway. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. Although the OAI-PMH specification is focused on making it straightforward for data providers to expose metadata, practice shows that in certain significant situations deployment of OAI-PMH conformant repository software remains problematic. In this paper, we report on research aimed at devising solutions to further lower the barrier to making metadata collections harvestable. We provide an in-depth description of an approach in which a data provider makes a metadata collection available as an XML file with a specific format (an OAI Static Repository), which is made OAI-PMH harvestable through the intermediation of software (an OAI Static Repository Gateway) operated by a third party. We describe the properties of both components, and provide insights into our experience with an experimental implementation of a Gateway.
	H. Ulrich Hoppe and Jian Zhao. C-TORI: An interface for cooperative database retrieval. In Dimitris Karagiannis, editor, 5th International Conference, DEXA '94, Database and Expert Systems Applications, Berlin, Germany, 1994. Springer-Verlag. C-TORI (Cooperative TORI), a cooperative version of TORI (Task-Oriented Database Retrieval Interface), is presented in this paper. It extends interactive query formulation and result browsing by supporting cooperation between multiple users. In the cooperative environment, three basic additional operations are provided: copying, merging and coupling for three types of TORI objects (query forms, result forms, and query history windows). Cooperation with query forms allows end users to jointly formulate queries; cooperation with result forms supports users in jointly browsing through results and in sharing retrieved data without re-accessing the database; cooperative use of query histories yields a specific mechanism to share `memory` between users. The implementation is based on the concept of shared UI objects as an application-independent cooperation and communication model.
	Ikumi Horie, Kazunori Yamaguchi, and Kenji Kashiwabara. Higher-order rank analysis for web structure. In HYPERTEXT '05: Proceedings of the sixteenth ACM conference on Hypertext and hypermedia, pages 98-106, New York, NY, USA, 2005. ACM Press. In this paper, we propose a method for the structural analysis of Web sites. The Web has become one of the most widely used media for electronic information because of its great flexibility. However, this flexibility has led to complicated structures. A structure that differs from the typical structures in a Web site might confuse readers, thus reducing the effectiveness of the site. A method for detecting unusual structures would be useful for identifying such structures so that their impact can be studied and ways to improve Web site effectiveness developed. We viewed the Web as a directed graph, and introduced a higher-order rank based on the non-well-founded set theory. We then developed higher-order rank analysis for detecting irregularities, defined as structures which differ from the typical structure of a target site. To test the effectiveness of our method, we applied it to several Web sites in actual use, and succeeded in identifying irregular structures in the sites.
	Nancy A. Van House. User needs assessment and evaluation for the UC Berkeley electronic environmental library project. In Proceedings of the Second Annual Conference on the Theory and Practice of Digital Libraries, 1995. Format: HTML Document (28K). Audience: HCI people, librarians. References: 16. Links: 3. Relevance: Low-Medium. Abstract: Starts off with a description of Berkeley's NSF/ARPA/NASA project, focusing on environmental data, particularly water planning for California. Diverse data types, bitmapped pages with OCR text. Describes the properties of the task, and the users of the system. Talks about methods of assessing users' needs, like interviews, observation, focus groups, etc. Claims that most users' expectations are too low, so user input doesn't provide appropriate goals.
	Nancy A. Van House, Mark H. Butler, Virginia Ogle, and Lisa Schiff. User-centered iterative design for digital libraries: The Cypress experience. D-Lib Magazine, Feb 1996. Format: HTML Document.
	Nancy Van House. Trust and epistemic communities in biodiversity data sharing. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. All knowledge work is, in some sense, collaborative. Trust is a key element of knowledge work: what we know depends largely on others. A better understanding of the epistemic machineries of knowledge communities and especially their practices of trust would be useful for designing effective digital libraries. This paper discusses the concepts of communities of practice and epistemic cultures, and their implication for design of digital libraries that support data sharing, with particular reference to practices of trust and credibility. It uses an empirical study of a biodiversity data system that collects and distributes data from a variety of sources to illustrate the implications of these concepts of knowledge communities for digital library design and operation. It concludes that diversity and uncomfortable boundary areas typify not only digital library user groups but also the design and operation of digital libraries.
	B.C. Housel and D.B. Lindquist. WebExpress: A system for optimizing web browsing in a wireless environment. In Proceedings of the Second Annual International Conference on Mobile Computing and Networking, pages 108-116, November 1996.
	Sherry Hsi and Holly Fait. From playful exhibits to LOM: Lessons from building an Exploratorium digital library. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. This paper describes several challenges that arise in designing a digital library for K-12 education audiences when using the Learning Object Metadata (LOM) standard. These problems were multiplied when attempting to catalog the wide variety of informal learning and teaching resources from our museum's ever-growing website and exhibit-based resource collections. This paper shares key challenges and early solutions for the creation of an educational metadata scheme based upon LOM, new vocabularies, and strategies for retrofitting existing informal learning science resources into learning objects.
	Tony Hsieh, QianYing Wang, and Andreas Paepcke. Piles across space: Breaking the real-estate barrier on PDAs. Submitted for publication, 2005. Available at http://dbpubs.stanford.edu/pub/2005-17. We describe an implementation that has users 'flick' notes, images, audio, and video files into piles beyond the screen of the PDA. This scheme allows the PDA user to keep information close at hand without sacrificing valuable screen real estate. It also obviates the need to browse complex file trees during a working session. Multiple workspaces can be maintained in persistent store. Each workspace preserves one configuration of off-screen piles. The system allows multiple PDA owners within ad hoc radio range to share off-screen piles. They point out to each other where a shared pile is to reside in space. Once established, all sharing partners may add to the pile and see its contents. One application is to support biodiversity researchers in the field, where they generate data on their PDA and need to keep it organized until they return to their field station. We conducted an experiment where participants used our system with up to ten simultaneous piles. Not only were they able to operate the application, but they remembered the location of piles when placed in different physical environments and when asked to recall the locations several days after the experiment. We describe gender differences that suggest particular design choices for the system.
	Forms in HTML documents - W3C HTML 4.01 recommendation. http://www.w3.org/TR/html401/interact/forms.html.
	Hypertext transfer protocol - HTTP/1.1. ftp://ftp.isi.edu/in-notes/rfc2616.txt.
	Michael J. Hu and Ye Jian. Multimedia description framework (MDF) for content description of audio/video documents. In Proceedings of the Fourth ACM International Conference on Digital Libraries, 1999. MPEG is undertaking a new initiative to standardize content description of audio and video data/documents. When it is finalized in 2001, MPEG-7 is expected to provide a standardized description language and schemes for concise and unambiguous content description of data/documents of complex media types. Meanwhile, other meta-data or description schemes, such as Dublin Core, XML, RDF, etc., are becoming popular in different application domains. In this paper, we propose the Multimedia Description Framework (MDF), which is designed to accommodate multiple description (meta-data) schemes, MPEG-7 and non-MPEG-7, in an integrated architecture. We will use examples to show how an MDF description makes use of the combined strengths of different description schemes to enhance its expressive power and flexibility. We conclude the paper with a discussion of using the MDF description of the MPEG-7 Content Set to search for and retrieve required audio and video documents from the set, utilizing an MDF prototype system we have implemented.
	Ning Hu and Roger B. Dannenberg. A comparison of melodic database retrieval techniques using sung queries. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. Query-by-humming systems search a database of music for good matches to a sung, hummed, or whistled melody. Errors in transcription and variations in pitch and tempo can cause substantial mismatch between queries and targets. Thus, algorithms for measuring melodic similarity in query-by-humming systems should be robust. We compare several variations of search algorithms in an effort to improve search precision. In particular, we describe a new frame-based algorithm that significantly outperforms note-by-note algorithms in tests using sung queries and a database of MIDI-encoded music. Keywords: dynamic programming, melodic comparison, melodic searching, Music Information Retrieval (MIR), sung query.
	Yunhua Hu, Hang Li, Yunbo Cao, Dmitriy Meyerzon, and Qinghua Zheng. Automatic extraction of titles from general documents using machine learning. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. In this paper, we propose a machine learning approach to title extraction from general documents. By general documents, we mean documents that do not belong to any specific genre, including presentations, book chapters, technical papers, brochures, reports, and letters. Previously, methods were proposed mainly for title extraction from research papers. It was not clear whether it is possible to conduct automatic title extraction from general documents. As a case study, we consider extraction from Office documents, including Word and PowerPoint. In our approach, we annotate titles in sample documents (for Word and PowerPoint respectively) and take them as training data, train machine learning models, and perform title extraction using the trained models. Our method is unique in that we mainly utilize format information such as font size as features in the models. It turns out that the use of format information is the key to a successful extraction from general documents. Precision and recall for title extraction from Word were 0.810 and 0.837 respectively, and precision and recall for title extraction from PowerPoint were 0.875 and 0.895 respectively in an experiment on intranet data. Other important new findings in this work include that we can train models in one domain and apply them to another domain, and more surprisingly we can even train models in one language and apply them to another language. Moreover, we can significantly improve search ranking results in document retrieval by using the extracted titles.
	Mao Lin Huang, Peter Eades, and Robert F. Cohen. WebOFDAV: Navigating and visualizing the Web on-line with animated context swapping. In Proceedings of the Seventh International World-Wide Web Conference, 1998.
	Yongqiang Huang and Hector Garcia-Molina. Exactly-once semantics in a replicated messaging system. Submitted for publication, 2000. Available at http://dbpubs.stanford.edu/pub/2000-7.
	Zan Huang, Wingyan Chung, Thian-Huat Ong, and Hsinchun Chen. A graph-based recommender system for digital library. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. Research shows that recommendations comprise a valuable service for users of a digital library. While most existing recommender systems rely either on a purely content-based approach or purely collaborative approach to make recommendations, there is a need for digital libraries to use a combination of both approaches (a hybrid approach) to improve recommendations. In this paper, we report how we tested the idea of using a graph-based recommender system that naturally combines the content-based and collaborative approaches. Due to the similarity between our problem and a concept retrieval task, a Hopfield net algorithm was used to exploit high-degree book-book, user-user and book-user associations. Sample hold-out testing and preliminary subject testing were conducted to evaluate the system; they showed that combining content-based and collaborative approaches improved both precision and recall. However, no significant improvement was observed from exploiting high-degree associations.
	Zan Huang, Xin Li, and Hsinchun Chen. Link prediction approach to collaborative filtering. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. Recommender systems can provide valuable services in a digital library environment, as demonstrated by their commercial success in the book, movie, and music industries. One of the most successful recommendation algorithms is collaborative filtering, which explores the correlations within user-item interactions to infer user interests and preferences. However, the recommendation quality of collaborative filtering approaches is greatly limited by the data sparsity problem. To alleviate this problem we have previously proposed graph-based algorithms that explore transitive user-item associations. In this paper, we extend the idea of analyzing user-item interactions as graphs and employ link prediction approaches proposed in the recent network modeling literature for making collaborative filtering recommendations. We have adapted a wide range of linkage measures for making recommendations. Our preliminary experimental results based on a book recommendation dataset show that some of these measures achieved significantly better performance than standard collaborative filtering algorithms.
	Bernardo A. Huberman and Lada A. Adamic. Growth dynamics of the World-Wide Web. Nature, 401(6749), September 1999.
	Scott B. Huffman and David Steier. Heuristic joins to integrate structured heterogeneous data. In AAAI Spring Symposium on Information Gathering, 1995. Format: Compressed PostScript.
	M.N. Huhns and M.P. Singh. Automating workflows for service provisioning: Integrating AI and database technologies. In Proceedings of the Tenth Conference on Artificial Intelligence for Applications, Los Alamitos, CA, 1994. IEEE Computer Society Press. Workflows are the structured activities that take place in information systems in typical business environments. These activities frequently involve several database systems, user interfaces, and application programs. Traditional database systems do not support workflows to any reasonable extent. Usually, human beings must intervene to ensure their proper execution. We have developed an architecture based on AI technology that automatically manages workflows. This architecture executes on top of a distributed computing environment. It has been applied to automating service provisioning workflows; an implementation that operates on one such workflow has been developed. This work advances the Camel Project's goal of developing technologies for integrating heterogeneous database systems. It is notable in its marriage of AI approaches with standard distributed database techniques.
	Jonathan T. Hujsak. Digital libraries and corporate technology reuse. D-Lib Magazine, Jan 1996. Format: HTML Document.
	David A. Hull and Gregory Grefenstette. Querying across languages: A dictionary-based approach to multilingual information retrieval. In Proceedings of the Nineteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1996. This paper presents cross-language multilingual information retrieval using translated queries and a bilingual transfer dictionary. The experiments show that multilingual IR is feasible, although performance lags considerably behind the monolingual standard.
	James J. Hunt, Kiem-Phong Vo, and Walter F. Tichy. Delta algorithms: An empirical analysis. ACM Transactions on Software Engineering and Methodology, 7:192-214, 1998.
	Jane Hunter and Sharmin Choudhury. A semi-automated digital preservation system based on semantic web services. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. This paper describes a Web-services-based system which we have developed to enable organizations to semi-automatically preserve their digital collections by dynamically discovering and invoking the most appropriate preservation service, as it is required. By periodically comparing preservation metadata for digital objects in a collection with a software version registry, potential object obsolescence can be detected and a notification message sent to the relevant agent. By making preservation software modules available as Web services and describing them semantically using a machine-processable ontology (OWL-S), the most appropriate preservation service(s) for each object can then be automatically discovered, composed and invoked by software agents (with optional human input at critical decision-making steps). We believe that this approach represents a significant advance towards providing a viable, cost-effective solution to the long term preservation of large-scale collections of digital objects.
	W.J. Hutchins and H.L. Somers. An Introduction to Machine Translation. Academic Press, 1992. Recent textbook on natural language translation.
	Arwen Hutt and Jenn Riley. Semantics and syntax of Dublin Core usage in Open Archives Initiative data providers of cultural heritage materials. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. This study analyzes metadata shared by cultural heritage institutions via the Open Archives Initiative Protocol for Metadata Harvesting. The syntax and semantics of metadata appearing in the Dublin Core fields creator, contributor, and date are examined. Preliminary conclusions are drawn regarding the effectiveness of Dublin Core in the Open Archives Initiative environment for cultural heritage materials.
	Jason J. Hyon and Rosana Bisciotti Borgen. Data archival and retrieval enhancement (DARE) metadata modeling and its user interface. In Proceedings of the First IEEE Metadata Conference, Silver Spring, Md., April 1996. IEEE.
	Ionut E. Iacob and Alex Dekhtyar. xTagger: A new approach to authoring document-centric XML. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. The process of authoring document-centric XML documents in humanities disciplines is very different from the approach espoused by the standard XML editing software with the data-centric view of XML. Where data-centric XML is generated by first describing a tree structure of the encoding and then providing the content for the leaf elements, document-centric encodings start with content which is then marked up. In the paper we describe our approach to authoring document-centric XML documents and the tool, xTagger, originally developed for this purpose within the Electronic Boethius project and further enhanced within the ARCHway project, an interdisciplinary project devoted to development of methods and software for preparation of image-based electronic editions of historic manuscripts.
	Frank M. Shipman III, Haowei Hsieh, J. Michael Moore, and Anna Zacchi. Supporting personal collections across digital libraries in spatial hypertext. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. Creating, maintaining, or using a digital library requires the manipulation of digital documents. Information workspaces provide a visual representation allowing users to collect, organize, annotate, and author information. The Visual Knowledge Builder (VKB) helps users access, collect, annotate, and combine materials from digital libraries and other sources into a personal information workspace. VKB has been enhanced to include direct search interfaces for NSDL and Google. Users create a visualization of search results while selecting and organizing materials for their current activity. Additionally, metadata applicators have been added to VKB. This interface allows the rapid addition of metadata to documents and aids the user in the extraction of existing metadata for application to other documents. A study was performed to compare the selection and organization of documents in VKB to the commonly used tools of a Web browser and a word processor. This study shows the value of visual workspaces for such efforts but points to the need for sub-document level objects, ephemeral visualizations, and support for moving from visual representations to metadata.
	The URN Implementors. Uniform resource names: A progress report. D-Lib Magazine, Feb 1996. Format: HTML Document.
	Palm Inc. Palm OS Emulator. Palm website: http://www.palm.com/dev/tech/tools/emulator/.
	Palm Inc. Web clipping development. Palm website: http://www.palm.com/dev/tech/webclipping/.
	The Internet Archive. http://www.archive.org/.
	Invisibleweb.com. http://www.invisibleweb.com.
	Panagiotis G. Ipeirotis, Tom Barry, and Luis Gravano. Extending SDARTS: Extracting metadata from web databases and interfacing with the Open Archives Initiative. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. SDARTS is a protocol and toolkit designed to facilitate metasearching. SDARTS combines two complementary existing protocols, SDLIP and STARTS, to define a uniform interface that collections should support for searching and exporting metasearch-related metadata. SDARTS also includes a toolkit with wrappers that are easily customized to make both local and remote document collections SDARTS-compliant. This paper describes two significant ways in which we have extended the SDARTS toolkit. First, we have added a tool that automatically builds rich content summaries for remote web collections by probing the collections with appropriate queries. These content summaries can then be used by a metasearcher to select the collections over which to evaluate a given query. Second, we have enhanced the SDARTS toolkit so that all SDARTS-compliant collections export their metadata under the emerging Open Archives Initiative (OAI) protocol. Conversely, the SDARTS toolkit now also allows all OAI-compliant collections to be made SDARTS-compliant with minimal effort. As a result, we implemented a bridge between SDARTS and OAI, which will facilitate easy interoperability among a potentially large number of collections. The SDARTS toolkit, with all related documentation and source code, is publicly available at http://sdarts.cs.columbia.edu.
	ISO. ISO 8777:1993 Information and Documentation - Commands for Interactive Text Searching. Int'l Organization for Standardization, Geneva, Switzerland, first edition, 1993.
	ISO/IEC. ITU/ISO ODP Trading Function, 1997. ISO/IEC IS 13235-1, ITU/T Draft Rec X950-1.
	Melody Y. Ivory and Marti A. Hearst. Statistical profiles of highly-rated web sites. In CHI '02: Proceedings of the SIGCHI conference on Human factors in computing systems, pages 367-374, New York, NY, USA, 2002. ACM Press. We are creating an interactive tool to help non-professional web site builders create high quality designs. We have previously reported that quantitative measures of web page structure can predict whether a site will be highly or poorly rated by experts, with accuracies ranging from 67-80% [...] several ways. First, we compute a much larger set of measures (157 versus 11), over a much larger collection of pages (5300 vs. 1900), achieving much higher overall accuracy (94% [...]) contrasting good, average, and poor pages. Second, we introduce new classes of measures that can make assessments at the site level and according to page type (home page, content page, etc.). Finally, we create statistical profiles of good sites, and apply them to an existing design, showing how that design can be changed to better match high-quality designs.
	Jack Kustanowitz and Ben Shneiderman. Meaningful presentations of photo libraries: Rationale and applications of bi-level radial quantum layouts. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005.
	A. K. Jain, M. N. Murty, and P. J. Flynn. Data clustering: a review. ACM Computing Surveys, 31(3):264-323, 1999.
	Markus Jakobsson, Philip D. MacKenzie, and Julien P. Stern. Secure and lightweight advertising on the web. In Proceedings of the Eighth International World-Wide Web Conference, 1999. We consider how to obtain a safe and efficient scheme for Web advertising. We introduce to cryptography the market model, a common concept from economics. This corresponds to an assumption of rational behavior of protocol participants. Making this assumption allows us to design schemes that are highly efficient in the common case, which is when participants behave rationally. We demonstrate such a scheme for Web advertising, employing the concept of e-coupons. We prove that our proposed scheme is safe and meets our stringent security requirements.
	Frankie James. Presenting HTML structure in audio: User satisfaction with audio hypertext. In ICAD '96 Proceedings, pages 97-103. ICAD, Xerox PARC, November 1996. Also appeared in Web Techniques as ``Experimenting with Audio Interfaces,'' vol. 3, no. 2, pages 55-58, February 1998.
	Frankie James. AHA: Audio HTML access. In Michael R. Genesereth and Anna Patterson, editors, Proceedings of the Sixth International World Wide Web Conference, pages 129-140, Santa Clara, CA, April 1997. IW3C. Also published in Computer Networks and ISDN Systems, vol. 29, no. 8-13, pages 1395-1404, Elsevier, September 1997 (ISSN 0169-7552).
	Frankie James. Distinguishability vs. distraction in audio HTML interfaces. Submitted to International Journal on Digital Libraries, 1997. Analyzes results from a user study related to the AHA (Audio HTML Access) framework, which tested three audio browsers to determine the appropriateness of certain types of audio markings for various HTML structures. The results added another dimension to the AHA framework, so that the principles outlined in it for choosing sounds to use in an audio presentation of HTML are now: (1) Vocal Source Identity (when to use speaker changes to mark structures), (2) Recognizability, and (3) Distraction (new).
	Frankie James. Presenting HTML structure in audio: User satisfaction with audio hypertext. CSLI Technical Report 97-201, Stanford University, 1997. Available at http://dbpubs.stanford.edu/pub/1996-83.
	Frankie James. Lessons from developing audio HTML interfaces. In Proceedings of the Third International ACM Conference on Assistive Technologies (ACM SIGCAPH), pages 27-34, Marina del Rey, CA, USA, April 15-17, 1998. Discusses application of the principles in the AHA framework to the actual choice of sounds in scenario interfaces. By looking at scenarios, we can see that other factors related to users (such as musical ability, culture, reading style, etc.) are needed in combination with the AHA principles to select specific sounds.
	Frankie James. Lessons from developing audio HTML interfaces. In ASSETS 98, pages 27-34, Marina del Rey, CA, April 1998. ACM SIGCAPH.
	Greg Janée and James Frew. The ADEPT digital library architecture. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. The Alexandria Digital Earth ProtoType (ADEPT) architecture is a framework for building distributed digital libraries of georeferenced information. An ADEPT system comprises one or more autonomous libraries, each of which provides a uniform interface to one or more collections, each of which manages metadata for one or more items. The primary standard on which the architecture is based is the ADEPT bucket framework, which defines uniform client-level metadata query services that are compatible with heterogeneous underlying collections. ADEPT functionality strikes a balance between the simplicity of Web document delivery and the richness of Z39.50. The current ADEPT implementation runs as servlet-based middleware and supports collections housed in arbitrary relational databases.
	Greg Janée, James Frew, and David Valentine. Content access characterization in digital libraries [short paper]. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. To support non-trivial clients, such as data exploration and analysis environments, digital libraries must be able to describe the access modes that their contents support. We present a simple scheme that distinguishes four content accessibility classes: download (byte-stream retrieval), service (API), web interface (interactive), and alternative (semantically equivalent) or multipart (component) hierarchies. This scheme is simple enough to be easily supported by DL content providers, yet rich enough to allow programmatic clients to automatically identify appropriate access point(s).
	William C. Janssen. Collaborative extensions for the UpLib system. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. The UpLib personal digital library system is specifically designed for secure use by a single individual. However, collaborative operation of multiple UpLib repositories is still possible. This paper describes two mechanisms that have been added to UpLib to facilitate community building around individual document collections.
	Charlotte Jenkins, Mike Jackson, Peter Burden, and Jon Wallis. Automatic RDF metadata generation for resource discovery. In Proceedings of the Eighth International World-Wide Web Conference, 1999. Automatic metadata generation may provide a solution to the problem of inconsistent, unreliable metadata describing resources on the Web. The Resource Description Framework (RDF) provides a domain-neutral foundation on which extensible element sets can be defined and expressed in a standard notation. This paper describes how an automatic classifier that classifies HTML documents according to Dewey Decimal Classification can be used to extract context-sensitive metadata which is then represented using RDF. The process of automatic classification is described and an appropriate metadata element set is identified comprising those elements that can be extracted during classification. An RDF data model and an RDF schema are defined representing the element set and the classifier is configured to output the elements in RDF syntax according to the defined schema.
	Michael Jensen. Need-based intellectual property protection and networked university press publishing. In IP Workshop Proceedings, 1994. Format: HTML Document (22K). Audience: Publishers, slightly technical. References: 0. Links: 0. Relevance: Low-medium. Abstract: A publisher's view on why publishers won't be irrelevant. Also, a description of the type of security that a publisher would expect (i.e., let's not worry about making it perfect, just reasonable): a header-based system, without details of how to ensure the restrictions are obeyed. Gives a specific list of information that he thinks should be in the header.
	B-S. Jeong and E. Omiecinski. Inverted file partitioning schemes in multiple disk systems. IEEE Transactions on Parallel and Distributed Systems, 6(2):142-153, February 1995.
	Henry N. Jerez, Xiaoming Liu, Patrick Hochstenbach, and Herbert Van de Sompel. The multi-faceted use of the OAI-PMH in the LANL repository. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. This paper focuses on the multifaceted use of the OAI-PMH in a repository architecture designed to store digital assets at the Research Library of the Los Alamos National Laboratory (LANL), and to make the stored assets available in a uniform way to various downstream applications. In the architecture, the MPEG-21 Digital Item Declaration Language is used as the XML-based format to represent complex digital objects. Upon ingestion, these objects are stored in a multitude of autonomous OAI-PMH repositories. An OAI-PMH compliant Repository Index keeps track of the creation and location of all those repositories, whereas an Identifier Resolver keeps track of the location of individual objects. An OAI-PMH Federator is introduced as a single-point-of-access to downstream harvesters. It hides the complexity of the environment from those harvesters, and allows them to obtain transformations of stored objects. While the proposed architecture is described in the context of the LANL library, the paper will also touch on its more general applicability.
	T. Joachims, D. Freitag, and T. Mitchell. WebWatcher: A tour guide for the World Wide Web. In Proceedings of IJCAI97, 1997. We describe WebWatcher as a tour guide agent for the web, the learning algorithms used by WebWatcher, experimental results based on learning from thousands of users, and lessons learned from this case study of tour guide agents.
	Robert Johansen. Teleconferencing and beyond: Communications in the office of the future. McGraw-Hill data communications book series. McGraw-Hill, 1984.
	David B. Johnson and Willy E. Zwaenepoel. Recovery in distributed systems using optimistic message logging and checkpointing. Journal of Algorithms, 11(3):462-491, September 1990.
	Eric H. Johnson and Pauline A. Cochrane. A hypertextual interface for a searcher's thesaurus. In Proceedings of the Second Annual Conference on the Theory and Practice of Digital Libraries, 1995. Format: HTML Document (34K + pictures). Audience: Searchers, HCI people. References: 10. Links: 2. Relevance: Low. Abstract: Describes a MS Windows interface for searching using a thesaurus of related terms. Has 3 parts: a hierarchical organization of the terms, a ``cloud'' of related terms, and a keyword-in-context that tries to match what you type incrementally. The cloud and hierarchy are point and click, and the hierarchy can be expanded and collapsed à la MS Word outline mode. Also capable of handling multi-hierarchies, where a term has multiple roots.
	Heidi Johnson. Graded access to sensitive materials at the Archive of the Indigenous Languages of Latin America [short paper]. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. The Archive of the Indigenous Languages of Latin America (AILLA) is a web-accessible repository of multi-media resources in and about the indigenous languages of Latin America. In this paper, I describe the Graded Access System developed at AILLA to protect sensitive materials by allowing resource producers - academics and indigenous people - fine-grained control over the resources they house in the archive.
	Christopher B. Jones, R. Purves, A. Ruas, M. Sanderson, M. Sester, M. van Kreveld, and R. Weibel. Spatial information retrieval and geographical ontologies: an overview of the SPIRIT project. In Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, pages 387-388. ACM Press, 2002.
	Matt Jones, Gary Marsden, Norliza Mohd-Nasir, Kevin Boone, and George Buchanan. Improving web interaction on small displays. In Proceedings of the Eighth International World-Wide Web Conference, 1999. Soon many people will retrieve information from the Web using handheld, palm-sized or even smaller computers. Although these computers have dramatically increased in sophistication, their display size is, and will remain, much smaller than that of their conventional desktop counterparts. Currently, browsers for these devices present Web pages without taking account of the very different display capabilities. As part of a collaborative project with Reuters, we carried out a study into the usability impact of small displays for retrieval tasks. Users of the small screen were 50% less effective in completing tasks than large screen subjects. Small screen users used a very substantial number of scroll activities in attempting to complete the tasks. Our study also provided us with interesting insights into the shifts in approach users seem to make when using a small screen device for retrieval. These results suggest that the metaphors useful in a full screen desktop environment are not the most appropriate for the new devices. Design guidelines are discussed here, proposing directed access methods for effective small screen interaction. In our ongoing work, we are developing such 'meta-interfaces', which will sit between the small screen user and the 'conventional' Web page.
	Michael L.W. Jones, Robert H. Rieger, Paul Treadwell, and Geri K. Gay. Live from the stacks: User feedback on mobile computers and wireless tools for library patrons. In Proceedings of the Fifth ACM International Conference on Digital Libraries, 2000. Digital library research is made more robust and effective when end-user opinions and viewpoints inform the research, design and development process. A rich understanding of user tasks and contexts is especially necessary when investigating the use of mobile computers in traditional and digital library environments, since the nature and scope of the research questions at hand remain relatively undefined. This paper outlines findings from a library technologies user survey and on-site mobile library access prototype testing, and presents future research directions that can be derived from the results of these two studies.
	Steve Jones and Gordon Paynter. Topic-based browsing within a digital library using keyphrases. In Proceedings of the Fourth ACM International Conference on Digital Libraries, 1999. Many digital libraries consist of documents from disparate sources that are independent of the rest of the collection in which they reside. A user's ability to explore is severely curtailed when each document stands in isolation; there is no way to navigate to other, related, documents, or even to tell if such documents exist. We describe a method for automatically introducing topic-based links into documents to support browsing in digital libraries. Automatic keyphrase extraction is exploited to identify link anchors, and keyphrase-based similarity measures are used to select and rank destinations. Two implementations are described: one that applies these techniques to existing WWW-based digital library collections using standard HTML, and one that uses a wider range of interface techniques to provide more sophisticated linking capabilities. An evaluation shows that keyphrase-based similarity measures work as well as a popular full-text retrieval system for finding relevant destination documents.
	Steve Jones and Gordon W. Paynter. Human evaluation of Kea, an automatic keyphrasing system. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001. This paper describes an evaluation of the Kea automatic keyphrase extraction algorithm. Tools that automatically identify keyphrases are desirable because document keyphrases have numerous applications in digital library systems, but are costly and time consuming to assign manually. Keyphrase extraction algorithms are usually evaluated by comparison to author-specified keywords, but this methodology has several well-known shortcomings. The results presented in this paper are based on subjective evaluations of the quality and appropriateness of keyphrases by human assessors, and make a number of contributions. First, they validate previous evaluations of Kea that rely on author keywords. Second, they show that Kea's performance is comparable to that of similar systems that have been evaluated by human assessors. Finally, they justify the use of author keyphrases as a performance metric by showing that authors generally choose good keywords.
	JavaServer Pages (JSP) technology. http://java.sun.com/products/jsp/.
	Jesper Juhne, Anders T. Jensen, and Kaj Gronbaek. Ariadne: a Java-based guided tour system for the World Wide Web. In Proceedings of the Seventh International World-Wide Web Conference, 1998.
	Volker Jung. Metaviz: Visual interaction with geospatial digital libraries. Technical Report TR-99-017, International Computer Science Institute, 1999.
	Eija Kaasinen, Matti Aaltonen, Juha Kolari, Suvi Melakoski, and Timo Laakko. Two approaches to bringing internet services to WAP devices. In Proceedings of the Ninth International World-Wide Web Conference, 2000.
	Charles Kacmar, Susan Hruska, Chris Lacher, Dean Jue, Christie Koontz, Myke Gluck, and Stuart Weibel. An architecture and operation model for a spatial digital library. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document (32K). Audience: Mostly non-technical, funders, Geographic IS people. References: 18. Links: 1. Relevance: Low-Medium. Abstract: Discusses value and problems of spatial (geographic) data. Proposes a distributed hierarchy of metadata to assist in location of relevant data.
	Charles Kacmar, Dean Jue, David Stage, and Christie Koontz. Automatic creation and maintenance of an organizational spatial metadata and document digital library. In Proceedings of the Second Annual Conference on the Theory and Practice of Digital Libraries, 1995. Format: HTML Document (37K + pictures). Audience: Librarians, computer scientists, slight govt/business slant. References: 1. Links: 25. Relevance: Low. Abstract: Discusses a geographic information system for environmental data about Florida. Information is organized around 'information zones', which are geographic territories. Prototype system in use.
	Jose Kahan. WDAI: A simple World Wide Web distributed authorization infrastructure. In Proceedings of the Eighth International World-Wide Web Conference, 1999. The World Wide Web (W3) has the potential to link different kinds of documents into hypertext collections and to distribute such collections among many document servers. Distributed collections can bring forth new W3 applications in extranets and expand the concept of content reuse. However, they also bring new authorization problems, such as the need for coordinated user administration, user authentication, and revocation of rights. This paper proposes WDAI, a simple and general infrastructure for distributed authorization on the World Wide Web. Under WDAI, browsers and servers exchange authorization information using X.509v3-based authorization certificates. WDAI is designed to be open to a wide variety of security policies and, being compatible with existing W3 technology, can be implemented without modifying existing browsers.
	Brian Kahin. Institutional and policy issues in the development of the digital library. In JEP, 1994. Format: HTML Document (31K). Audience: net users, librarians, publishers. References: 1. Links: 0. Relevance: low-medium. Abstract: Argues that publishers and libraries will fill the same niche in the digital library. Both will have to change. Talks about the Chicago Journal of Theoretical CS, and says that MIT Press is making the effort to change the perception of the online journal so it is considered as prestigious as a traditional journal. Worries that, drained of traditional revenue sources, publishers will charge much higher reprint fees. Also concerned that there may be a patent in the pipeline that the Patent Office will stupidly allow, causing major problems for the field.
	Brian Kahin. The strategic environment for protecting multimedia. In IP Workshop Proceedings, 1994. Format: HTML Document (23K + pictures). Audience: Non-technical, general public. References: 0. Links: 0. Relevance: medium. Abstract: A very good big picture overview that covers a broad range of issues, from desirability of technological protection of IP, to the government's role, to past models of protection.
	Robert Kahn and Robert Wilensky. A framework for distributed digital object services. Technical Report tn95-01, CNRI, May 1995. At http://WWW.CNRI.Reston.VA.US/home/cstr/arch/k-w.html. This paper provides a method for naming, identifying and/or invoking digital objects in a system of distributed repositories. It provides the foundation for the naming system of the CS-TR project. See also http://WWW.CNRI.Reston.VA.US/home/cstr/arch/slides.html for a slide presentation.
	Robert E. Kahn. Deposit, registration and recordation in an electronic copyright management system. In IP Workshop Proceedings, 1994. Format: HTML Document (25K + picture). Audience: Good technical introduction. References: 0. Links: 0. Relevance: Medium. Abstract: Reasonable intro to the ideas of encryption, digital signatures, notarization, and a possible mechanism for copyright requests by law-abiding users. Doesn't deal with protection against caching/passing documents.
	Oliver Kaljuvee, Orkut Buyukkokten, Hector Garcia-Molina, and Andreas Paepcke. Efficient web form entry on PDAs. In Proceedings of the Tenth International World-Wide Web Conference, 2001. Available at http://dbpubs.stanford.edu/pub/2001-44. We propose a design for displaying and manipulating HTML forms on small PDA screens. The form input widgets are not shown until the user is ready to fill them in. At that point, only one widget is shown at a time. The form is summarized on the screen by displaying just the text labels that prompt the user for each widget's information. The challenge of this design is to automatically find the match between each text label in a form and the input widget for which it is the prompt. We developed eight algorithms for performing such label/widget matches. Some of the algorithms are based on n-gram comparisons, while others are based on common form layout conventions. We applied a combination of these algorithms to 100 simple HTML forms with an average of four input fields per form. These experiments achieved 95% accuracy. We developed a scheme that combines all algorithms into a matching system. This system did well even on complex forms, achieving 80% accuracy in experiments involving 330 input fields spread over 48 complex forms.
	Kenichi Kamiya, Martin Röscheisen, and Terry Winograd. Grassroots: A system providing a uniform framework for communicating, structuring, sharing information, and organizing people. In Proceedings of the Fifth International World-Wide Web Conference, 1996. Also published in part as a short paper for CHI'96 (conference companion). People keep pieces of information in diverse collections such as folders, hotlists, e-mail inboxes, newsgroups, and mailing lists. These collections mediate various types of collaborations including communicating, structuring, sharing information, and organizing people. Grassroots is a system that provides a uniform framework to support people's collaborative activities mediated by collections of information. The system seamlessly integrates functionalities currently found in such disparate systems as e-mail, newsgroups, shared hotlists, hierarchical indexes, hypermail, etc. Grassroots co-exists with these systems in that its users benefit from the uniform image provided by Grassroots, but other people can continue using other mechanisms, and Grassroots leverages them. The current Grassroots prototype is based on an HTTP-proxy implementation, and can be used with any Web browser. In the context of the design of a next-generation version of the Web, Grassroots demonstrates the utility of a uniform notification infrastructure.
	Min-Yen Kan and Judith L. Klavans. Using librarian techniques in automatic text summarization for information retrieval. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. A current application of automatic text summarization is to provide an overview of relevant documents coming from an information retrieval (IR) system. This paper examines how Centrifuser, one such summarization system, was designed with respect to methods used in the library community. We have reviewed these librarian expert techniques to assist information seekers and codified them into eight distinct strategies. We detail how we have operationalized six of these strategies in our computational system and present a preliminary evaluation.
	Min-Yen Kan and Danny C. C. Poo. Detecting and supporting known item queries in online public access catalogs. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. When users seek to find specific resources in a digital library, they often use the library catalog to locate them. These catalog queries are defined as known item queries. As known item queries search for specific resources, it is important to manage them differently from other search types, such as area searches. We study how to identify known item queries in the context of a large academic institution’s online public access catalog (OPAC). We also examine how to recognize when a known item query has retrieved the item in question. Our approach combines techniques in machine learning, language modeling and machine translation evaluation metrics to build a classifier that distinguishes known item queries with an accuracy of 72% and correctly classifies whether retrieved titles are the known item sought with an accuracy of 86%. To our knowledge, this is the first report of such work, which has the potential to streamline the user interface of both OPACs and digital libraries in support of known item searches.
	ByungHoon Kang and Robert Wilensky. Toward a model of self-administering data. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001. We describe a model of self-administering data. In this model, a declarative description of how a data object should behave is attached to the object, either by a user or by a data input device. A widespread infrastructure of self-administering data handlers is presumed to exist; these handlers are responsible for carrying out the specifications attached to the data. Typically, the specifications express how and to whom the data should be transferred, how it should be incorporated when it is received, what rights recipients of the data will have with respect to it, and the kind of relation that should exist between distributed copies of the object. Functions such as distributed version control can be implemented on top of the basic handler functions. We suggest that this model can provide superior support for common cooperative functions. Because the model is declarative, users need only express their intentions once in creating a self-administering description, and need not be concerned with manually performing subsequent repetitious operations. Because the model is peer-to-peer, users are less dependent on additional, perhaps costly resources, at least when these are not critical. An initial implementation of the model has been created. We are experimenting with the model both as a tool to aid in digital library functions, and as a possible replacement for some server oriented functions.
	Hyunmo Kang and Ben Shneiderman. Visualization methods for personal photo collections: Browsing and searching in the PhotoFinder. In IEEE International Conference on Multimedia and Expo, 2000.
	Hyunmo Kang and Ben Shneiderman. MediaFinder: an interface for dynamic personal media management with semantic regions. In CHI '03: CHI '03 extended abstracts on Human factors in computing systems, pages 764-765, New York, NY, USA, 2003. ACM Press.
	Hyunmo Kang and Ben Shneiderman. Exploring personal media: A spatial interface supporting user-defined semantic regions. Technical Report ISR 2005-51, University of Maryland, 2005. Media Finder
	C. K. Kantarjiev, A. Demers, R. Frederick, R. T. Krivacic, and M. Weiser. Experiences with X in a wireless environment. In Proceedings of the USENIX Mobile and Location-Independent Computing Symposium, pages 117-128, Aug 1993. Wireless computing is all the rage; the X Window System seems to be inescapable. We have been experimenting with the cross-product, and have had mixed results. The network may not be the computer any more, but it certainly influences the way the computer performs, and adding a fairly atypical network layer cannot help but expose some underlying assumptions. We discuss a few that we found and go on to speculate about how best to push X in the direction of mobile and location-independent computing.
	Olga Kapitskaia, Anthony Tomasic, and Patrick Valduriez. Dealing with discrepancies in wrapper functionality. Technical Report RR-3138, INRIA, 1997.
	Nancy Kaplan and Yoram Chisik. In the company of readers: The digital library book as practiced place. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. In this article we describe a system for annotating digital books in a digital library for young adult readers and discuss a first field test of the system with a small group of young adults together with their families and a few friends. We argue that most studies of digital libraries and their patrons' needs or desires look at adult users pursuing work-related goals such as the collaborative writing tasks encountered in college or in professional work places. Most studies also examine the problems and issues of research libraries rather than public libraries. Yet children who are growing up digital may not only have different and heretofore unrecognized needs but also may have insights into the needs future library patrons may have. And public libraries, which support a broader public's reading habits, may also need different kinds of tools for their patrons. Because we are working with younger readers, we have focused on supporting active reading but in the context of reading for pleasure. Our results show that the digital library book for young adults can become a 'practiced place', by which we mean a site of shared, constructed meaning through the traces that individuals' reading and writing create.
	Unmil P. Karadkar, Richard Furuta, and Jin-Cheon Na. Exploring user perceptions of digital image similarity. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. The MIDAS project is developing infrastructure and policies for optimal display of digital information on devices with diverse characteristics. In this paper we present the preliminary results of a study that explored the effects of scaling and color-depth variation in digital photographs on user perceptions of similarity. Our results indicate general trends in user preferences and can serve as guidelines for designing policies and systems that display digital images optimally on various information devices.
	David Karger, Alex Sherman, Andy Berkheimer, Bill Bogstad, Rizwan Dhanidina, Ken Iwamoto, Brian Kim, Luke Matkins, and Yoav Yerushalmi. Web caching with consistent hashing. In Proceedings of the Eighth International World-Wide Web Conference, 1999. A key performance measure for the World Wide Web is the speed with which content is served to users. As traffic on the Web increases, users are faced with increasing delays and failures in data delivery. Web caching is one of the key strategies that has been explored to improve performance. An important issue in many caching systems is how to decide what is cached where at any given time. Solutions have included multicast queries and directory schemes. In this paper, we offer a new Web caching strategy based on consistent hashing. Consistent hashing provides an alternative to multicast and directory schemes, and has several other advantages in load balancing and fault tolerance. Its performance was analyzed theoretically in previous work; in this paper we describe the implementation of a consistent-hashing-based system and experiments that support our thesis that it can provide performance improvements.
	Ahmad Rafee Che Kassim and Thomas R. Kochtanek. Interactive digital library resource information system: A web portal for digital library education. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. This paper describes a collaborative database project that focuses on access to materials on topics relating to digital libraries that are organized within an educational framework.
	H. Kautz, B. Selman, and M. Shah. Referral web: Combining social networks and collaborative filtering. Communications of the ACM, 40(3):63-65, March 1997. One of the most effective channels for disseminating information and expertise within an organization is its informal social network of collaborators, colleagues and friends. Manually searching for a referral chain can be a frustrating and time-consuming task. One is faced with the trade-off of contacting a large number of individuals at each step, and thus straining both the time and goodwill of the possible respondents, or of contacting a smaller, more focused set, and being more likely to fail to locate an appropriate expert. In response to these problems, we are building ReferralWeb, an interactive system for reconstructing, visualizing and searching social networks on the World Wide Web. Simulation experiments we ran before we began construction of ReferralWeb showed that automatically generated referrals can be highly successful in locating experts in a large network.
	Henry Kautz, Al Milewski, and Bart Selman. Agent amplified communication. In AAAI Spring Symposium on Information Gathering, 1995. Format: Compressed PostScript. Audience: AI researchers, sociologists. References: 12. Relevance: Low. Abstract: Many information requests cannot be answered via on-line resources, but must be answered by a human expert. Finding the right person is hard, and usually works by contacting a friend or friend of a friend. This system is designed to automatically generate chains of referrals: each user encodes areas of expertise, and lists colleagues who may be questioned. The system records all mail, incoming and outgoing, and uses it to determine the areas of expertise of communicators, storing an inverted index of the mail. Then, when a question comes up, the system automatically selects relevant people and sends queries. Their agents intercept the queries and either route them to the user, send other referrals, or delete them. Simulation results show the number of steps needed to answer a question on random graphs, as a function of the accuracy and responsiveness of referrals.
	Kiyokuni Kawachiya and Hiroshi Ishikawa. NaviPoint: An input device for mobile information browsing. In Proceedings of the Conference on Human Factors in Computing Systems CHI'98, 1998.
	Roland Kaye and Stephen Little. Strategies and standards for cultural interoperability in global business systems. In Proceedings of the 29th Annual Hawaii International Conference on System Sciences. IEEE Computer Society Press, 1996. Discusses the dynamics of standardization processes for achieving interoperability and compatibility necessary for global business systems.
	A. Keller, O. Densmore, Wei Huang, and B. Razavi. Zippering: Managing intermittent connectivity in DIANA. Mobile Networks and Applications, 2(2):357-364, 1997. This paper describes an approach for handling intermittent connectivity between mobile clients and network-resident applications, which we call zippering. When the client connects with the application, communication between the client and the application is synchronous. When the client intermittently connects with the application, communication becomes asynchronous. The DIANA (Device-Independent, Asynchronous Network Access) approach allows the client to perform a variety of operations while disconnected. Finally, when the client reconnects with the application, the operations performed independently on the client are replayed to the application in the order they were originally done. Zippering allows the user at the client to fix errors detected during reconciliation and to continue the transaction gracefully instead of aborting the whole transaction when errors are detected.
	Robert B. Kellogg and Madhan Subhas. Text to hypertext: Can clustering solve the problem in digital libraries? In Proceedings of DL'96, 1996. Format: Not yet online.
	Diane Kelly and Colleen Cool. The effects of topic familiarity on information search behavior. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. We describe results from a preliminary investigation of the relationship between topic familiarity and information search behavior. Two types of information search behaviors are considered: reading time and efficacy. Our results indicate that as one's familiarity with a topic increases, one's searching efficacy increases and one's reading time decreases. These results suggest that it may be possible to infer topic familiarity from information search behavior.
	Robert E. Kent and Christian Neuss. Creating a web analysis and visualization environment. In Proceedings of the Second International World-Wide Web Conference, 1994.
	Eamonn J. Keogh, Selina Chu, David Hart, and Michael J. Pazzani. An online algorithm for segmenting time series. In IEEE International Conference on Data Mining, pages 289-296, 2001.
	Brett Kessler, Geoffrey Nunberg, and Hinrich Schuetze. Automatic detection of text genre. In Proceedings ACL/EACL, 1997. Describes an experiment in which they detected genre from surface features alone, not from tags marking structures. They show that they can do at least as well with this low-overhead approach as others can with tagging. They also detect level of sophistication.
	Jack Kessler. The French Minitel: Is there digital life outside of the US ASCII Internet? A challenge or convergence? D-Lib Magazine, Dec 1995. Format: HTML Document.
	Steven Ketchpel. Transaction protection for information buyers and sellers. In DAGS '95, 1995. Format: HTML Document (28K). Audience: Computer scientists. References: 6. Links: 7. Relevance: High. Abstract: Describes protocols and mechanisms which allow parties in an information sale to ensure that neither side is taken advantage of (e.g., giving information without getting paid, or paying without getting the promised information). Uses digital signatures to verify the source, and relies on a trusted intermediary to act as either a delivery service or an escrow agent (which determines whether it works to prevent problems or just to punish people who commit them).
	Steven Ketchpel. Distributed commerce transactions with timing deadlines and direct trust. Poster at International Joint Conference on AI'97, 1997.
	Steven Ketchpel and Héctor García-Molina. Making trust explicit in distributed commerce transactions. In Proceedings of the International Conference on Distributed Computing Systems, 1996.
	Steven Ketchpel and Hector Garcia-Molina. Competitive sourcing for internet commerce. In Proceedings of International Conference on Distributed Computing Systems, 1998. Abstract: In electronic commerce on the Internet, a customer can choose among several competitive suppliers, but because of the nature of the Internet, the reliability and trustworthiness of suppliers may vary significantly. The customer's goal is to maximize its utility, by minimizing the expense required to fulfill its request, and maximizing its probability of success by some deadline. To this end, the customer creates a request strategy, describing which suppliers to contact under what conditions. In this paper we describe models for representing request strategies complete with supplier reliabilities, delivery timeliness profiles, and customer deadlines. We also develop decision procedures for selecting request strategies that maximize expected utility under certain scenarios, and more efficient heuristics that approximate the optimal solution.
	Steven Ketchpel and Hector Garcia-Molina. A sound and complete algorithm for distributed commerce transactions. Distributed Computing, 12(1), 1999. Available at http://dbpubs.stanford.edu/pub/2000-44. In a multi-party transaction such as fulfilling an information request from multiple sources (also called a distributed commerce transaction), agents face risks from dealing with untrusted agents. These risks are compounded in the face of deadlines, e.g., an agent may fail to deliver purchased goods by the time the goods are needed. We present a distributed algorithm that mitigates these risks by generating a safe sequence of actions (when possible) that completes a commerce transaction with no risk. We show that the algorithm is sound (produces only safe multi-agent action sequences) and complete (finds a safe sequence whenever one exists). We also show how the algorithm may be extended so that agents may interact directly with other participants rather than through a trusted intermediary.
	Steven Ketchpel, Hector Garcia-Molina, Andreas Paepcke, Scott Hassan, and Steve Cousins. UPAI: A universal payment application interface. In USENIX 2nd e-commerce workshop, 1996.
	Steven P. Ketchpel, Hector Garcia-Molina, and Andreas Paepcke. Shopping models: A flexible architecture for information commerce. In Proceedings of the Second ACM International Conference on Digital Libraries, 1997. At http://dbpubs.stanford.edu/pub/1997-52. In a digital library, there are many different interaction models between customers and information providers or merchants. Subscriptions, sessions, pay-per-view, shareware, and pre-paid vouchers are different models that each have different properties. A single merchant may use several of them. Yet if a merchant wants to support multiple models, there is a substantial amount of work to implement each one. In this paper, we formalize the shopping models which represent these different modes of consumer to merchant interaction. In addition to developing the overall architecture, we define the application program interfaces (API) to interact with the models. We show how a small number of primitives can be used to construct a wide range of shopping models that a digital library can support, and provide examples of the shopping models in operation, demonstrating their flexibility.
	Amir Khella and Benjamin B. Bederson. Pocket PhotoMesa: a zoomable image browser for PDAs. In MUM '04: Proceedings of the 3rd international conference on Mobile and ubiquitous multimedia, pages 19-24, New York, NY, USA, 2004. ACM Press.
	Mohamed Kholief, Kurt Maly, and Stewart Shen. Event-based retrieval from a digital library containing medical streams [short paper]. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. We describe a digital library that contains streams and supports event-based retrieval. Streams used in the digital library are CT scan, medical text, and audio streams. Events, such as 'tumor appeared', were generated and represented in the user interface to enable doctors to retrieve and playback segments of the streams. This paper concentrates on describing the data organization and the user interface.
	Michael Khoo. Community design of DLESE's collections review policy: A technological frames analysis. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001. In this paper, I describe the design of a collection review policy for the Digital Library for Earth System Education (DLESE). A distinctive feature of DLESE as a digital library is the 'DLESE community,' composed of voluntary members who contribute metadata and resource reviews to DLESE. As the DLESE community is open, the question of how to evaluate community contributions is a crucial part of the review policy design process. In this paper, technological frames theory is used to analyse this design process by looking at how the designers work with two differing definitions of the 'peer reviewer': (a) peer reviewer as arbiter or editor, and (b) peer reviewer as colleague. Content analysis of DLESE documents shows that these frames can in turn be related to two definitions that DLESE offers of itself: DLESE as a library, and DLESE as a digital artifact. The implications of the presence of divergent technological frames for the design process are summarised, and some suggestions for future research are outlined.
	Michael Khoo. Tacit user and development frames in user-led collection development: The case of the Digital Water Education Library. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. This paper discusses the impact of developers’ and users’ tacit forms of understanding on digital library development. It draws on three years of ethnographic research with the Digital Water Education Library (DWEL) that focused on the observation, collection, and analysis of the project’s face-to-face and electronic organizational communication. The DWEL project involved formal and informal educators in the development of its collection, and experienced problems at the start of the project with getting these educators to complete their cataloguing tasks. The research showed that despite having spent several days in face-to-face workshops, the project’s PIs and the educators had different tacit understandings of what digital libraries were, which were impeding the project’s organizational communication and workflow. I describe how these differences were identified and analyzed, and subsequently addressed and mediated through the design and development of online tools that acted as boundary objects between the PIs and the educators.
	Gregor Kiczales, Jim des Rivières, and Daniel G. Bobrow. The Art of the Metaobject Protocol. MIT Press, 1991. This is the official citation for the MOP spec. It can also be used as a citation for an intro to the idea of a MOP.
	Gregor Kiczales and Andreas Paepcke. Open implementations and metaobject protocols. Expanded tutorial notes, 1994. At http://db.stanford.edu/~paepcke/shared-documents/Tutorial.ps.
	Hyunki Kim, Hakgene Shin, and Jaewoo Chang. An object-oriented hypermedia system for structured documents. In Proceedings of DL'96, 1996. Format: Not yet online.
	Gary King, H. T. Kung, Barbara Grosz, Sidney Verba, Dale Flecker, and Brian Kahin. The Harvard Self-Enriching Library Facilities (SELF) project. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document (23K). Audience: Non-technical, funders. References: 0. Links: 1. Relevance: Low-Medium. Abstract: Digital library proposal that focuses on two-way information flow. Allows users to enter annotations, reviews, dataset descriptions, etc. Emphasis on user interface; some intellectual property. Combination of Z39.50 & HTTP.
	Richard P. King, Nagui Halim, Hector Garcia-Molina, and Christos A. Polyzois. Management of a remote backup copy for disaster recovery. TODS, 16(2):338-68, 1991. A remote backup database system tracks the state of a primary system, taking over transaction processing when disaster hits the primary site. The primary and backup sites are physically isolated so that failures at one site are unlikely to propagate to the other. For correctness, the execution schedule at the backup must be equivalent to that at the primary. When the primary and backup sites contain a single processor, it is easy to achieve this property. However, this is harder to do when each site contains multiple processors and sites are connected via multiple communication lines. The authors present an efficient transaction processing mechanism for multiprocessor systems that guarantees this and other important properties. They also present a database initialization algorithm that copies the database to a backup site while transactions are being processed.
	Thomas Kirk, Alon Y. Levy, Yehoshua Sagiv, and Divesh Srivastava. The information manifold. In AAAI Spring Symposium on Information Gathering, 1995. Format: Compressed PostScript.
	T. Kirste and U. Rauschenbach. A presentation model for mobile information visualization. Computers & Graphics, 20(5):669-81, 1996. One of the visions of mobile computing is to put 'all information at the user's fingertips': to allow a user to operate on any data, any time, anywhere. The idea is to create an information environment providing homogeneous access to all data and services available in the distributed, mobile computing infrastructure. A fundamental requirement for access to such an open, distributed information system is an intelligent selection of methods for information visualization based on user requirements and available display functionality. A flexible concept is proposed that allows one to enrich the nodes of an information structure with information about which alternative display methods can be used for what parts of the node. These 'facets' are then used by a recursive view generation process for selecting suitable display methods while creating a visualization of an information structure. Influence parameters such as user characteristics, display resources, and data properties can be used to guide the selection process in order to create a presentation that optimally meets the user's goals.
	Jon Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604-632, November 1999.
	Vickie L. Kline. Spirit guides of cyberspace. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document (5K). Audience: Non-technical. Librarians and futurologists. References: 0. Links: 1. Relevance: Low. Abstract: A futuristic view of what librarians might be once virtual reality is commonplace. Told as a sci-fi story.
	Rob Kling and Margaret Elliott. Digital library design for usability. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document (49K + picture). Audience: Non-technical, developers, human factors people. References: 16. Links: 1. Relevance: Low. Abstract: Distinguishes between usability in terms of user interface and 'organizational usability' (integration into existing working environment). Claims both are ignored, the latter more so. Presents 5 traditional models of software development (primarily from point of view of end-user inclusion) and a 6th based on organizational usability.
	Craig A. Knoblock. Integrating planning and execution for information gathering. In AAAI Spring Symposium on Information Gathering, 1995. Format: Compressed PostScript.
	Craig A. Knoblock and Alon Y. Levy. Exploiting run-time information for efficient processing of queries. In AAAI Spring Symposium on Information Gathering, 1995. Format: Compressed PostScript.
	Judson Knott and Paul Jones. SunSITE: Serving your Internet needs since 1992. D-Lib Magazine, Feb 1996. Format: HTML Document.
	Rajiv Kochumman, Carlos Monroy, Jie Deng, Richard Furuta, and Eduardo Urbina. Tools for a new generation of scholarly edition unified by a TEI-based interchange format. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. We report on experience gained from our ongoing multi-year project to produce an Electronic Variorum Edition of Cervantes' Don Quixote de la Mancha. Initially designed around a custom database representation, the project's evolution has led to the inclusion of a TEI-based format for information interchange among the project's major components. We discuss the mechanics of this approach and its benefits.
	Rajiv Kochumman, Carlos Monroy, Richard Furuta, Arpita Goenka, Eduardo Urbina, and Erendira Melgoza. Towards an electronic variorum edition of Cervantes' Don Quixote: Visualizations that support preparation. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. The Cervantes Project is creating an Electronic Variorum Edition (EVE) of Cervantes' well-known Don Quixote de la Mancha, published beginning in 1605. In this paper, we report on visualizations of features of a text collection that help us validate our text transcriptions and understand the relationships among the different printings of an edition.
	Ron Kohavi and Mehran Sahami. Error-based and entropy-based discretization of continuous features. In Second International Conference on Knowledge Discovery in Databases, 1996. At ftp://starry.stanford.edu/pub/sahami/papers/kdd96-disc.ps.
	Pavlos Kokosis, Vlassis Krikos, Sofia Stamou, and Dimitris Christodoulakis. HiBO: A system for automatically organizing bookmarks. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. In this paper, we introduce the HiBO bookmark management system. HiBO aims at extending the populated personal repositories (aka bookmarks) by automatically organizing their contents into topics, through the use of a built-in subject hierarchy. HiBO offers customized personalized services, such as the meaningful grouping and ordering of bookmarks within the hierarchy’s topics in terms of the bookmarks’ conceptual similarity to each other. HiBO also provides a framework that allows the user to customize and assist the categorization process.
	D. Koller and M. Sahami. Hierarchically classifying documents using very few words. In Proceedings of the Fourteenth International Conference on Machine Learning (ICML-97), 1997. The proliferation of topic hierarchies for text documents has resulted in a need for tools that automatically classify new documents within such hierarchies. One can use existing classifiers by ignoring the hierarchical structure, treating the topics as separate classes. Unfortunately, in the context of text categorization, we are faced with a large number of classes and a huge number of relevant features needed to distinguish between them. Consequently, we are restricted to using only very simple classifiers, both because of computational cost and the tendency of complex models to overfit. We propose an approach that utilizes the hierarchical topic structure to decompose the classification task into a set of simpler problems, one at each node in the classification tree. As we show, each of these smaller problems can be solved accurately by focusing only on a very small set of features, those relevant to the task at hand. This set of relevant features varies widely throughout the hierarchy, so that, while the overall relevant feature set may be large, each classifier only examines a small subset. The use of reduced feature sets allows us to utilize more complex (probabilistic) models, without encountering the computational and robustness difficulties described above.
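The decomposition described above can be pictured as a tree in which each internal node routes a document to one child using only its own small feature set. The sketch below is a minimal illustration under stated assumptions: the `Node` class, the `classify` function, and the keyword sets are hypothetical, and a plain keyword-overlap count stands in for the probabilistic classifiers the paper actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A topic in the hierarchy; each internal node keeps its own small feature set."""
    name: str
    # child topic name -> the few words this node uses to recognise that child (hypothetical)
    child_features: dict[str, set[str]] = field(default_factory=dict)
    children: dict[str, "Node"] = field(default_factory=dict)

    def add(self, child: "Node", features: set[str]) -> "Node":
        """Attach a child topic together with the local features that select it."""
        self.children[child.name] = child
        self.child_features[child.name] = features
        return child


def classify(root: Node, words: set[str]) -> list[str]:
    """Route a document top-down, one small decision per level; return the path taken."""
    path, node = [root.name], root
    while node.children:
        # Stand-in scorer: overlap with THIS node's features only, so each
        # decision looks at a tiny subset of the overall vocabulary.
        best = max(node.child_features,
                   key=lambda c: len(words & node.child_features[c]))
        node = node.children[best]
        path.append(node.name)
    return path
```

For example, a root with children `science` and `sports`, where `science` further splits into `physics` and `biology`, sends a document containing "quantum", "experiment", and "particle" down the path `all`, `science`, `physics`; only two or so words are consulted at each level.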
	Daphne Koller and Mehran Sahami. Toward optimal feature selection. Submitted for publication, 1996.
	J.A. Konstan, B.N. Miller, D. Maltz, J.L. Herlocker, L.R. Gordon, and J. Riedl. GroupLens: Applying collaborative filtering to Usenet news. Communications of the ACM, 40(3):77-87, March 1997. The GroupLens project designed, implemented, and evaluated a collaborative filtering system for Usenet news, a high-volume, high-turnover discussion list service on the Internet. Usenet newsgroups (the individual discussion lists) may carry hundreds of messages each day. The combination of high volume and personal taste made Usenet news a promising candidate for collaborative filtering. More formally, we determined that the potential predictive utility for Usenet news was very high. GroupLens has proved to be an experimental success and it shows promise as a viable service for all Usenet news users.
	Theodorich Kopetzky and Max Muhlhauser. Visual preview for link traversal on the World Wide Web. In Proceedings of the Eighth International World-Wide Web Conference, 1999. This paper demonstrates a technique for augmenting current World Wide Web browser implementations with features found in classical hypertext applications but previously unknown to the World Wide Web community. An example implementation for Netscape Navigator 4.x is shown, using JavaScript, dynamic HTML, and Java. The implementation follows an architecture based on a proxy server that acts as a gateway between the Internet and the browsing client. Based on the detailed example, support for further features is discussed.
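Collaborative filtering of the kind described above is commonly illustrated with user-based prediction: weight each other user by the Pearson correlation of the items both have rated, then add the weighted rating deviations to the target user's mean. This is a minimal sketch in that spirit, not the GroupLens implementation; the `pearson` and `predict` names, the nested-dictionary layout, and the `EXAMPLE_RATINGS` data are hypothetical.

```python
from math import sqrt

# Hypothetical layout: user -> {item: rating}
Ratings = dict[str, dict[str, float]]


def mean(r: dict[str, float]) -> float:
    """Average of all ratings a user has given."""
    return sum(r.values()) / len(r)


def pearson(a: dict[str, float], b: dict[str, float]) -> float:
    """Pearson correlation over co-rated items (0.0 when undefined)."""
    common = a.keys() & b.keys()
    if len(common) < 2:
        return 0.0
    ma = sum(a[i] for i in common) / len(common)
    mb = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - ma) * (b[i] - mb) for i in common)
    den = (sqrt(sum((a[i] - ma) ** 2 for i in common))
           * sqrt(sum((b[i] - mb) ** 2 for i in common)))
    return num / den if den else 0.0


def predict(ratings: Ratings, user: str, item: str) -> float:
    """User's mean plus correlation-weighted deviations of users who rated the item."""
    base = mean(ratings[user])
    num = den = 0.0
    for other, r in ratings.items():
        if other == user or item not in r:
            continue
        w = pearson(ratings[user], r)
        num += w * (r[item] - mean(r))
        den += abs(w)
    return base + num / den if den else base


EXAMPLE_RATINGS: Ratings = {
    "alice": {"a": 5.0, "b": 3.0, "c": 4.0},
    "bob": {"a": 4.0, "b": 2.0, "c": 3.0, "d": 5.0},
    "carol": {"a": 1.0, "b": 5.0, "c": 2.0, "d": 1.0},
}
```

Here `bob` agrees with `alice` and rated item `d` above his average, while `carol` disagrees with her and rated `d` below hers, so both push the prediction for `alice` on `d` above her mean of 4.0. If nobody else has rated an item, the sketch falls back to the user's own mean.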
	Eckhart Koppen and Gustaf Neumann. A practical approach towards active hyperlinked documents. In Proceedings of the Seventh International World-Wide Web Conference, 1998.
	David G. Korn and Kiem-Phong Vo. Engineering a differencing and compression data format. In Proceedings of Usenix 2002. USENIX, 2002.
	Martijn Koster. Robots in the web: threat or treat? ConneXions, 4(4), April 1995.
	Menno-Jan Kraak. Integrating multimedia in geographical information systems. IEEE MultiMedia, 3(2):59-65, 1996.
	Balachander Krishnamurthy, Jeffrey C. Mogul, and David M. Kristol. Key differences between HTTP/1.0 and HTTP/1.1. In Proceedings of the Eighth International World-Wide Web Conference, 1999. The HTTP/1.1 protocol is the result of four years of discussion and debate among a broad group of Web researchers and developers. It improves upon its phenomenally successful predecessor, HTTP/1.0, in numerous ways. We discuss the differences between HTTP/1.0 and HTTP/1.1, as well as some of the rationale behind these changes.
	Anders Kristensen. Formsheets and the XML forms language. In Proceedings of the Eighth International World-Wide Web Conference, 1999. This paper presents XForm, a proposal for a general and powerful mechanism for handling forms in XML. XForm defines form-related constructs independent of any particular XML language and set of form controls. It defines the notion of formsheets as a mechanism for computing form values on the client, form values being arbitrary, typed XML documents. This enables a symmetrical exchange of data between clients and servers, which is useful, for example, for database and workflow applications. Formsheets can be written in a variety of languages; we argue that the document transformation capabilities of XSL stylesheets make them an elegant choice.
	Aaron Krowne and Martin Halbert. Combined searching of web and OAI digital library resources. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. In this paper, we describe an experiment in combined searching of web pages and digital library resources, exposed via an Open Archives metadata provider and web gateway service. We utilize only free/open source software components for our investigation, in order to demonstrate feasibility of deployment for all institutions.
	Aaron Krowne and Martin Halbert. An evaluation of automatic ontologies for digital library browsing. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. In this article we present an evaluation of the use of text clustering and classification methods to improve digital library browse interfaces over metadata lacking a unified ontological basis. This situation is common in ``portal'' style digital libraries, which are built by harvesting content from many disparate sources, typically using the Open Archives protocol for metadata harvesting (OAI-PMH).
	Charles W. Krueger. Software reuse. ACM Computing Surveys, 24(2):131-183, June 1992. Surveys the different approaches to software reuse and uses a taxonomy to describe and compare the approaches.
	Bruce Krulwich. Learning user interests across heterogeneous document databases. In AAAI Spring Symposium on Information Gathering, 1995. Format: Compressed PostScript.
	Allan Kuchinsky, Celine Pering, Michael L. Creech, Dennis Freeze, Bill Serra, and Jacek Gwizdka. FotoFile: a consumer multimedia organization and retrieval system. In Proceedings of the Conference on Human Factors in Computing Systems CHI'99, pages 496-503, 1999.
	Bill Kules, Hyunmo Kang, Catherine Plaisant, Anne Rose, and Ben Shneiderman. Immediate usability: Kiosk design principles from the CHI 2001 photo library. Technical Report CS-TR-4293, University of Maryland, 2003.
	Anoop Kumar, Ranjani Saigal, Robert Chavez, and Nikolai Schwertner. Architecting an extensible digital repository. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. The Digital Collection and Archives (DCA) in partnership with Academic Technology (AT) at Tufts University developed a digital library solution for long-term storage and integration of existing digital collections, like Perseus, TUSK, Bolles and Artifact. In this paper, we describe the Tufts Digital Library (TDL) architecture. TDL is an extensible, modular, flexible and scalable architecture that uses FEDORA at its core. The extensible nature of the TDL architecture allows for seamless integration of collections that may be developed in the future, while leveraging the extensive tools that are available as part of individual digital library applications at Tufts. We describe the functionality and implementation details of the individual components of TDL. Two applications that have successfully interfaced with TDL are presented. We conclude with some remarks about the future development of TDL.
	Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, D. Sivakumar, Andrew Tomkins, and Eli Upfal. The web as a graph. In Proceedings of the 19th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, 2000.
	Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, and Andrew Tomkins. Trawling the web for emerging cyber-communities. In Proceedings of the Eighth International World-Wide Web Conference, 1999. The Web harbors a large number of communities (groups of content creators sharing a common interest), each of which manifests itself as a set of interlinked Web pages. Newsgroups and commercial Web directories together contain on the order of 20,000 such communities; our particular interest here is in emerging communities, those that have little or no representation in such fora. The subject of this paper is the systematic enumeration of over 100,000 such emerging communities from a Web crawl: we call our process trawling. We motivate a graph-theoretic approach to locating such communities, and describe the algorithms and the algorithmic engineering necessary to find structures that subscribe to this notion, the challenges in handling such a huge data set, and the results of our experiment.
	Fang-Fei Kuo and Man-Kwan Shan. Looking for new, not known music only: Music retrieval by melody style. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. With the growth of digital music, content-based music retrieval (CBMR) has attracted increasing attention. For most CBMR systems, the task is to return music objects similar to the query in syntactic properties such as pitch and interval contour sequence. These approaches let users look for music they have already heard. However, listeners are sometimes looking not for music they already know, but for music that is new to them. Moreover, people sometimes want to retrieve music that feels like another music object or a music style. To the best of our knowledge, no published work investigates content-based music style retrieval. This paper describes an approach for CBMR by melody style. We propose four types of query specification for melody style queries. The output of a melody style query is a music list ranked by the degree of relevance, in terms of music style, to the query. We developed a melody style mining algorithm to obtain melody style classification rules, which determine the style ranking. The experiment showed that the proposed approach provides a satisfactory way to query by melody style.
	Pei-Jeng Kuo, Terumasa Aoki, and Hiroshi Yasuda. Building personal digital photograph libraries: An approach with ontology-based MPEG-7 dozen dimensional digital content architecture. In CGI '04: Proceedings of the Computer Graphics International (CGI'04), pages 482-489, Washington, DC, USA, 2004. IEEE Computer Society.
	Daniel Kuokka and Larry Harada. Supporting information retrieval via matchmaking. In AAAI Spring Symposium on Information Gathering, 1995. Format: Compressed PostScript.
	Andrew J. Kurtz and Javed Mostafa. Topic detection and interest tracking in a dynamic online news source [short paper]. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. Digital libraries in the news domain may contain frequently updated data. Providing personalized access to such dynamic resources is an important goal. In this paper, we investigate the area of filtering online dynamic news sources based on personal profiles. We experimented with an intelligent news-sifting system that tracks topic development in a dynamic online news source. Vocabulary discovery and clustering are used to expose current news topics. User interest profiles, generated from explicit and implicit feedback, are used to customize the news retrieval system's interface.
	Jack Kustanowitz and Ben Shneiderman. Meaningful presentations of photo libraries: Rationale and applications of bi-level radial quantum layouts. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. Searching photo libraries can be made more satisfying and successful if search results are presented in a way that allows users to gain an overview of the photo categories. Since photo layouts on computer displays are the primary way that users get an overview, we propose a novel approach to show more photos in meaningful groupings. Photo layouts can be linear strips, or zoomable three-dimensional arrangements, but the most common form is the flat two-dimensional grid. This paper introduces a novel bi-level hierarchical layout with motivating examples. In a bi-level hierarchy, one region is designated for primary content, which can be a single photo, text, graphic, or combination. Adjacent to that primary region, groups of photos are placed radially in an ordered fashion, such that the relationship of the single primary region to its many secondary regions is immediately apparent. A compelling aspect is the interactive experience in which the layout is dynamically resizable, allowing users to rapidly, incrementally, and reversibly alter the dimensions and content. It can accommodate hundreds of photos in dozens of regions, can be customized in a corner or center layout, and can scale from an element on a web page to a large poster size. On typical displays (1024 x 1280 or 1200 x 1600 pixels), bi-level radial quantum layouts can conveniently accommodate 2-20 regions with tens or hundreds of photos per region.
	Martha Kyrillidou and Sarah Giersch. Developing the DigiQUAL protocol for digital library evaluation. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. The distributed, project-oriented nature of digital libraries (DLs) has made them difficult to evaluate in aggregate. By modifying the methods and tools used to evaluate traditional libraries’ content and services, measures can be developed whose results can be used across a variety of DLs. The DigiQUAL protocol being developed by the Association of Research Libraries (ARL) has the potential to provide the National Science Digital Library (NSDL) with a standardized methodology and survey instrument with which not only to evaluate its distributed projects but also to gather data to assess the value and impact of the NSDL.
	Wilburt Juan Labio and Hector Garcia-Molina. Efficient snapshot differential algorithms for data warehousing. In Proceedings of the 22nd International Conference on Very Large Data Bases, pages 63-74, September 1996. Detecting and extracting modifications from information sources is an integral part of data warehousing. For unsophisticated sources, in practice it is often necessary to infer modifications by periodically comparing snapshots of data from the source. Although this snapshot differential problem is closely related to traditional joins and outerjoins, there are significant differences, which lead to simple new algorithms. In particular, we present algorithms that perform (possibly lossy) compression of records. We also present a window algorithm that works very well if the snapshots are not ``very different.'' The algorithms are studied via analysis and an implementation of two of them; the results illustrate the potential gains achievable with the new algorithms.
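The snapshot differential problem can be illustrated with the baseline the abstract relates it to: a sort-merge outer join of the old and new snapshots on a record key. This sketch is not the compression or window algorithm the paper proposes; the `(key, value)` record layout and the `snapshot_diff` name are assumptions, and keys are assumed unique within each snapshot.

```python
from typing import Iterable

Record = tuple[str, str]  # (key, value); hypothetical flat record layout


def snapshot_diff(old: Iterable[Record], new: Iterable[Record]):
    """Compare two snapshots with a sort-merge outer join on the key.

    Returns (inserts, deletes, updates), each in key order. Assumes keys
    are unique within each snapshot.
    """
    a, b = sorted(old), sorted(new)
    i = j = 0
    inserts, deletes, updates = [], [], []
    while i < len(a) and j < len(b):
        (ka, va), (kb, vb) = a[i], b[j]
        if ka == kb:
            if va != vb:
                updates.append((ka, va, vb))  # same key, changed value
            i += 1
            j += 1
        elif ka < kb:
            deletes.append(a[i])  # key vanished from the new snapshot
            i += 1
        else:
            inserts.append(b[j])  # key appears only in the new snapshot
            j += 1
    deletes.extend(a[i:])  # leftover old records were deleted
    inserts.extend(b[j:])  # leftover new records were inserted
    return inserts, deletes, updates
```

Sorting both snapshots dominates the cost here; the paper's point is that snapshot differentials tolerate approaches (compression, windowing) that an exact join does not, so this baseline mainly makes the input and output of the problem concrete.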
	Alberto H. F. Laender, Marcos André Gonçalves, and Pablo A. Roberto. BDBComp: Building a digital library for the Brazilian computer science community. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. This paper reports initial efforts towards building BDBComp, a digital library for the Brazilian computer science community. BDBComp is based on a number of standards (e.g., OAI, Dublin Core, SQL) as well as on new technologies (e.g., Web data extraction tools), which allowed fast and easy prototyping. The paper focuses on architectural issues and specific challenges faced during the construction of this digital library as well as on proposed solutions.
	Carl Lagoze. A secure repository design for digital libraries. D-Lib Magazine, Dec 1995. Format: HTML Document.
	Carl Lagoze and Jim Davis. Dienst: An architecture for distributed document libraries. Communications of the ACM, 38(4):47, April 1995. This is the DIENST paper.
	Carl Lagoze and Herbert Van de Sompel. The open archives initiative: Building a low-barrier interoperability framework. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001. The Open Archives Initiative (OAI) develops and promotes interoperability solutions that aim to facilitate the efficient dissemination of content. The roots of the OAI lie in the E-Print community. Over the last year its focus has been extended to include all content providers. This paper describes the recent history of the OAI: its origins in promoting E-Prints, the broadening of its focus, the details of its technical standard for metadata harvesting, the applications of this standard, and future plans.
	Carl Lagoze and David Ely. Implementation issues in an open architectural framework for digital object services. Technical Report TR95-1590, Cornell University, September 1995. We provide high level designs for implementing some key aspects of the Kahn/Wilensky Framework for Distributed Digital Object Services. We focus on five aspects of the architecture: (1) negotiation of terms and conditions initiated by requests for stored digital objects; (2) replication of handle server data and the notion of a primary handle server; (3) the mechanisms for replicating digital objects in multiple repositories and the assertions concerning such replication; (4) the meaning of mutable and immutable states for digital objects and the mechanisms for changing these states; and (5) the basic services that the Repository Access Protocol (RAP) needs to support the infrastructure.
	Carl Lagoze, Walter Hoehn, David Millman, William Arms, Stoney Gan, Diane Hillmann, Christopher Ingram, Dean Krafft, Richard Marisa, Jon Phipps, John Saylor, Carol Terrizzi, James Allen, Sergio Guzman-Lara, and Tom Kalt. Core services in the architecture of the national digital library for science education. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. We describe the core components of the architecture for the National Science, Mathematics, Engineering, and Technology Education Digital Library (NSDL). Over time the NSDL will include heterogeneous users, content, and services. To accommodate this, a design for a technical and organizational infrastructure has been formulated based on the notion of a spectrum of interoperability. This paper describes the first phase of the interoperability infrastructure, including the metadata repository, search and discovery services, rights management services, and user interface portal facilities.
	Carl Lagoze, Clifford A. Lynch, and Ron Daniel, Jr. The Warwick Framework: A container architecture for aggregating sets of metadata. Technical Report TR96-1593, Cornell University, Computer Science Department, June 1996. We describe a result of the June 1996 Warwick Metadata II Workshop. The Warwick Framework is a container architecture for aggregating logically, and perhaps physically, distinct packages of metadata. This architecture allows separate administration of and access to metadata packages, provides for varying syntax in each package in conformance with semantic requirements, and promotes interoperability and extensibility by allowing tools and agents to selectively access and manipulate individual packages and ignore others. At the conclusion of the paper we propose implementations of the Framework in HTML, MIME, SGML, and distributed objects.
	Carl Lagoze, Robert McGrath, Ed Overly, and Nancy Yeager. Implementation issues in an open architectural framework for digital object services. Technical Report TR95-1558, Cornell University, November 1995. We describe a distributed object-based design for repositories in a digital library infrastructure. This design for Inter-operable Secure Object Stores, ISOS, defines the interfaces to secure repositories that inter-operate with each other, clients, and other services in the infrastructure. We define the interfaces to ISOS as class definitions in a distributed object system. We also define an extension to CORBA security that is used by repositories to secure access to themselves and their contained objects.
	Catherine Lai, Ichiro Fujinaga, and Cynthia Leive. The challenges in developing digital collections of phonograph records. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. To facilitate long-term preservation and to sustain the utility of phonograph record albums, an efficient and economical workflow management system for digitization of these important cultural heritage artifacts is needed. In this paper, we describe the digitization process of creating an online digital collection and our procedure for creating the ground-truth data, which is essential for developing an efficient metadata and content capturing system. We also discuss the challenges of defining metadata for phonograph records and its packaging to facilitate new forms of online access and preservation.
	Kevin Lai, Mema Roussopoulos, Diane Tang, Xinhua Zhao, and Mary Baker. Experiences with a mobile testbed. In Proceedings of The Second International Conference on Worldwide Computing and its Applications (WWCA'98), Mar 1998.
	Jean R. Laleuf and Anne Morgan Spalter. A component repository for learning objects: A progress report. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001. We believe that an important category of SMET digital library content will be highly interactive, explorable microworlds for teaching science, mathematics, and engineering concepts. Such environments have proved extraordinarily time-consuming and difficult to produce, however, threatening the goals of widespread creation and use. One proposed solution for accelerating production has been the creation of repositories of reusable software components or learning objects. Programmers would use such components to rapidly assemble larger-scale environments. Although many agree on the value of this approach, few repositories of such components have been successfully created. We suggest some reasons for the lack of expected results and propose two strategies for developing such repositories. We report on a case study that provides a proof of concept of these strategies.
	W. Lam, S. Mukhopadhyay, J. Mostafa, and M. Palakal. Detection of shifts in user interests for personalized information filtering. In Proceedings of the Nineteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 1996.
	Wai Lam, Pik-Shan Cheung, and Ruizhang Huang. Mining events and new name translation from online daily news. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. We develop a system for mining events and unseen name translations from online daily Web news. This system first automatically discovers bilingual events by analyzing the content of the news stories. The discovered event can be treated as comparable bilingual news and can be used for generating name cognates. A name matching algorithm is developed to discover new unseen name translations based on phonetic and context clues. The experimental results show that our system is effective for mining new knowledge and information from online Web news.
	Wang Lam and Hector Garcia-Molina. Multicasting a web repository. Submitted for publication, 2000. Available at http://dbpubs.stanford.edu/pub/2000-58. Web crawlers generate significant loads on Web servers, and are difficult to operate. Instead of running crawlers at many client sites, we propose a central crawler and Web repository that then multicasts appropriate subsets of the central repository to clients. Loads at Web servers are reduced because a single crawler visits the servers, as opposed to all the client crawlers. In this paper we model and evaluate such a central Web multicast facility. We develop multicast algorithms for the facility, comparing them with ones for ``broadcast disks.'' We also evaluate performance as several factors, such as object granularity and client batching, are varied.
	Anthony LaMarca, Yatin Chawathe, Sunny Consolvo, Jeffrey Hightower, Ian Smith, James Scott, Tim Sohn, James Howard, Jeff Hughes, Fred Potter, Jason Tabert, Pauline Powledge, Gaetano Borriello, and Bill Schilit. Place lab: Device positioning using radio beacons in the wild. In Pervasive 2005: Third International Conference on Pervasive Computing, pages 116-133. Springer-Verlag, 2005.
	Roberta Lamb. Using online information resources: Reaching for the .'s. In Proceedings of the Second Annual Conference on the Theory and Practice of Digital Libraries, 1995. Format: HTML Document (54K). Audience: Social scientists, HCI people, slight business slant. References: 39. Links: 1. Relevance: Low-medium. Abstract: A literature review of usage of on-line information resources, filling in blocks in the matrix where one axis is research outlook (includes rational, human relations, institutional, and postmodern) and the other axis is usability factors (includes HCI usability, content usability, organizational usability, interorganizational usability). Some of the reviewed studies involve situated users with tasks. Shows that online information is used much less than predicted.
	Edward Lank and Son Phan. Focus+context sketching on a Pocket PC. In CHI '04: CHI '04 extended abstracts on Human factors in computing systems, pages 1275-1278, New York, NY, USA, 2004. ACM Press.
	Mark Lansdale and Ernest Edmonds. Using memory for events in the design of personal filing systems. International Journal of Man-Machine Studies, 36(1):97-126, 1992.
	Leah S. Larkey. A patent search and classification system. In Proceedings of the Fourth ACM International Conference on Digital Libraries, 1999. We present a system for searching and classifying U.S. patent documents, based on Inquery. Patents are distributed through hundreds of collections, divided up by general area. The system selects the best collections for the query. Users can search for patents or classify patent text. The user interface helps users search in fields without requiring the knowledge of Inquery query operators. The system includes a unique `phrase help` facility, which helps users find and add phrases and terms related to those in their query.
	Leah S. Larkey, Paul Ogilvie, M. Andrew Price, and Brenden Tamilio. Acrophile: An automated acronym extractor and server. In Proceedings of the Fifth ACM International Conference on Digital Libraries, 2000. We implemented a web server for acronym and abbreviation lookup, containing a collection of acronyms and their expansions gathered from a large number of web pages by a heuristic extraction process. Several different extraction algorithms were evaluated and compared. The corpus resulting from the best algorithm is comparable to a high-quality hand-crafted site, but has the potential to be much more inclusive as data from more web pages are processed.
	Ray R. Larson and Patricia Frontiera. Geographic information retrieval (GIR) ranking methods for digital libraries. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, pages 415-415. ACM Press, 2004.
	Ray R. Larson and Patricia Frontiera. Spatial ranking methods for geographic information retrieval (GIR) in digital libraries. In Proceedings of the 8th European Conference on Research and Advanced Technology for Digital Libraries (ECDL 2004), 2004.
	Ray R. Larson, Fredric Gey, Aitao Chen, and Michael Buckland. Harvesting translingual vocabulary mappings for multilingual digital libraries. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. This paper presents a method of information harvesting and consolidation to support the multilingual information requirements for cross-language information retrieval within digital library systems. We describe a way to create both customized bilingual dictionaries and multilingual query mappings from a source language to many target languages. We will describe a multilingual conceptual mapping resource with broad coverage (over 100 written languages can be supported) that is truly multilingual as opposed to bilingual pairings usually derived from machine translation. This resource is derived from the 10+ million title online library catalog of the University of California. It is created statistically via maximum likelihood associations from words and phrases in book titles of many languages to human assigned subject headings in English. The 150,000 subject headings can form interlingua mappings between pairs of languages or from one language to several languages. While our current demonstration prototype maps between ten languages (English, Arabic, Chinese, French, German, Italian, Japanese, Portuguese, Russian, Spanish), extensions to additional languages are straightforward. We also describe how this resource is being expanded for languages where linguistic coverage is limited in our initial database, by automatically harvesting new information from international online library catalogs using the Z39.50 networked library search protocol.
	Ray R. Larson and Robert Sanderson. Grid-based digital libraries: Cheshire3 and distributed retrieval. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. Work at the University of California, Berkeley and the University of Liverpool in the UK is developing an Information Retrieval and Digital Library service system (Cheshire3) that will operate in both single-processor and ``Grid'' distributed computing environments. This short paper discusses the object architecture of the Cheshire3 system and discusses how it can be used for a variety of Digital Library tasks, and how it performs in a Grid processing environment.
	Steve Lawrence and C. Lee Giles. Searching the World Wide Web. Science, 280(5360):98, 1998.
	Steve Lawrence and C. Lee Giles. Accessibility of information on the web. Nature, 400:107-109, 1999.
	Gregory H. Leazer and Richard P. Smiraglia. Toward the bibliographic control of works: Derivative bibliographic relationships in an online union catalog. In Proceedings of DL'96, 1996. Format: Not yet online.
	Yvan Leclerc, Martin Reddy, Lee Iverson, and Michael Eriksen. The GeoWeb: A new paradigm for finding data on the web. In International Cartographic Conference (ICC2001), 2001.
	Jintae Lee and Thomas W. Malone. Partially shared views: A scheme for communicating among groups that use different type hierarchies. ACM Transactions on Information Systems, 8(1), January 1990. They use the type hierarchies to find maximally compatible types in interpreting mail messages. If both parties know about the fields in an `equipment-request', message recognition proceeds on that basis. If one of them only knows about `request' types, that coarser understanding is used.
	Wang-Chien Lee and Dik Lun Lee. Information filtering in wireless and mobile environments. In Conference Proceedings of the 1996 IEEE Fifteenth Annual International Phoenix Conference on Computers and Communications (Cat. No.96CH35917), pages 508-14, 1996. The paper describes the issue of power conservation on mobile clients, e.g., palmtop computers, and suggests that signature methods are suitable for real-time information filtering on wireless communication services. Two signature-based approaches, namely simple signature and multi-level signature schemes, are presented. Cost models for access time and tune-in time of these two approaches are developed.
	Yong Kyu Lee, Seong-Joon Yoo, Kyoungro Yoon, and P. Bruce Berra. Index structures for structured documents. In Proceedings of DL'96, 1996. Format: Not yet online.
	U. Leonhardt, J. Magee, and P. Dias. Location service in mobile computing environments. Computers & Graphics, 20(5):627-32, 1996. With the advent of mobile computing devices and cheap location sensing systems, location information has become an important resource both for mobile and desktop users. In this paper, we describe some key concepts that a scalable ubiquitous location service should be based on. Firstly, we show how such a service can accommodate multiple location sensing systems. Secondly, we discuss hierarchy-based access control policies as a flexible and powerful mechanism to protect users' privacy. Thirdly, we address some issues concerning the visualization of location information.
	Gondy Leroy, Hsinchun Chen, Jesse D. Martinez, Shauna Eggers, Ryan Falsey, Kerri Kislin, Zan Huang, Jiexun Li, Jie Xu, Daniel McDonald, and Gavin Ng. Genescene: Biomedical text and data mining [short paper]. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. To access the content of digital texts efficiently, it is necessary to provide more sophisticated access than keyword based searching. Genescene provides biomedical researchers with research findings and background relations automatically extracted from text and experimental data. These provide a more detailed overview of the information available. The extracted relations were evaluated by qualified researchers and are precise. A qualitative ongoing evaluation of the current online interface indicates that this method to search the literature is more useful and efficient than keyword based searching.
	Alon Y. Levy, Alberto O. Mendelzon, Yehoshua Sagiv, and Divesh Srivastava. Answering queries using views. In Proceedings of the 14th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 95-104, San Jose, Calif., May 1995.
	Alon Y. Levy, Anand Rajaraman, and Joann J. Ordille. Query-answering algorithms for information agents. In Proceedings of the 13th National Conference on Artificial Intelligence, AAAI-96, Portland, Oreg., August 1996. AAAI Press, Menlo Park, Calif.
	Alon Y. Levy, Anand Rajaraman, and Joann J. Ordille. Querying heterogeneous information sources using source descriptions. In Proceedings of the Twenty-second International Conference on Very Large Databases, pages 251-262, Bombay, India, 1996. VLDB Endowment, Saratoga, Calif.
	Alon Y. Levy, Anand Rajaraman, and Jeffrey D. Ullman. Answering queries using limited external query processors. In Proceedings of the 15th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 27-37, Montreal, Canada, June 1996.
	Alon Y. Levy, Divesh Srivastava, and Thomas Kirk. Data model and query evaluation in global information systems. Journal of Intelligent Information Systems, 1995. Special Issue on Networked Information Discovery and Retrieval, 1995.
	David M. Levy. Cataloging in the digital order. In Proceedings of the Second Annual Conference on the Theory and Practice of Digital Libraries, 1995. Format: HTML Document (34K). Audience: Net culturists, computer scientists, librarians. References: 25. Links: 1. Relevance: Low. Abstract: Describes the role of catalogers in the traditional library and their prospects in the future. Argues that catalogers impose an order, and their job requires skills that will not be mechanizable: software agents cannot do what catalogers do. Raises questions about the nature of digital material with respect to permanence, maintenance, etc., and how digital materials fit in with traditional definitions (What's a publisher or edition?)
	David M. Levy. To grow in wisdom: Vannevar Bush, information overload, and the life of leisure. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. It has been nearly sixty years since Vannevar Bush’s essay, “As We May Think,” was first published in The Atlantic Monthly, an article that foreshadowed and possibly invented hypertext. While much has been written about this seminal piece, little has been said about the argument Bush presented to justify the creation of the memex, his proposed personal information device. This paper revisits the article in light of current technological and social trends. It notes that Bush’s argument centered around the problem of information overload and observes that in the intervening years, despite massive technological innovation, the problem has only become more extreme. It goes on to argue that today’s manifestation of information overload will require not just better management of information but the creation of space and time for thinking and reflection, an objective that is consonant with Bush’s original aims.
	David M. Levy and Catherine C. Marshall. What color was George Washington's white horse? A look at the assumptions underlying digital libraries. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document (). Audience: Non technical, digital library researchers. References: 18. Links: 2. Relevance: Medium. Abstract: Raises a number of interesting questions about directions for DL research. E.g., How can DL be integrated with paper documents? What support will there be for collaboration? Where does the DL fall on the permanent, fixed document vs. fluid, short-lived memo spectrum? Description of typical analysts' working methods. Problem is not finding relevant information, but determining an appropriate subset to read to solve problem at hand. Much more collaboration than is typically acknowledged.
	Wei Li, Susan Gauch, John Gauch, and Kok Meng Pua. Vision: A digital video library. In Proceedings of DL'96, 1996. Format: Not yet online.
	Wen-Syan Li. Knowledge gathering and matching in heterogeneous databases. In AAAI Spring Symposium on Information Gathering, 1995. Format: Compressed PostScript().
	Wen-Syan Li, Quoc Vu, Divakant Agrawal, Yoshinori Hara, and Hajime Takano. PowerBookmarks: A system for personalizable web information organization, sharing, and management. In Proceedings of the Eighth International World-Wide Web Conference, 1999. We extend the notion of bookmark management by introducing the functionalities of hypermedia databases. PowerBookmarks is a Web information organization, sharing, and management tool, which parses metadata from bookmarked URLs and uses it to index and classify the URLs. PowerBookmarks supports advanced query, classification, and navigation functionalities on collections of bookmarks. PowerBookmarks monitors and utilizes users' access patterns to provide many useful personalized services, such as automated URL bookmarking, document refreshing, and bookmark expiration. It also allows users to specify their preference in bookmark management, such as ranking schemes and classification tree structures. Subscription services for new or updated documents of users' interests are also supported.
	Yalun Li and V. Leung. Supporting personal mobility for nomadic computing over the internet. Mobile computing and communications review, 1(1):22-31, 1997. This paper presents a new paradigm for nomadic computing over the internet called universal personal computing (UPC), where mobile users can access computing resources, network services, and personalized computing environments anywhere using any available terminals. The concept of UPC and system design issues are discussed, and the required system architecture capable of managing different mobile objects, i.e. users and terminals, in the UPC environment is presented. Modifications of connection setup procedures between user application programs to enable addressing based on a global user identity are considered.
	Library of Congress. Z39.50 Profile for Simple Distributed Search and Ranked Retrieval, March 1997. Accessible at http://lcweb.loc.gov/z3950/agency/profiles/zdsr.html. Z39.50 profile based on STARTS. The profile specializes to search on metadata about documents and about search engines.
	Library of Congress. About Profiles, January 1998. Accessible at http://lcweb.loc.gov/z3950/agency/profiles/about.html. Defines concisely what a protocol 'profile' is.
	Elizabeth D. Liddy, Michael B. Eisenberg, Charles R. McClure, Kim Mills, Susan Mernit, and James D. Luckett. Research agenda for the intelligent digital library. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document (40K). Audience: Mostly non-technical, librarians, funders. References: Not included. Links: 1. Relevance: Low-Medium. Abstract: Describes an ambitious project proposal to create a digital librarian to handle natural language requests. Documents understood via TIPSTER-like system. Proposal for implementing parallel, connectionist models of the NLP module. Also interested in impact of such a system for K-12 education.
	H. Lieberman and Hugo Liu. Adaptive linking between text and photos using common sense reasoning. In Adaptive Hypermedia and Adaptive Web-Based Systems. Second International Conference, AH 2002. Proceedings, 29-31 May 2002, Malaga, Spain, pages 2-11. Springer-Verlag, 2002.
	Henry Lieberman. Letizia: An agent that assists web browsing. In C.S. Mellish, editor, Proceedings of 14th International Joint Conference on Artificial Intelligence, 1995. Letizia is a user interface agent that assists a user browsing the World Wide Web. As the user operates a conventional Web browser such as Netscape, the agent tracks user behaviour and attempts to anticipate items of interest by doing concurrent, autonomous exploration of links from the user's current position. The agent automates a browsing strategy consisting of a best first search augmented by heuristics inferring user interest from browsing behaviour.
	Henry Lieberman. Autonomous interface agents. In Proceedings of the Conference on Human Factors in Computing Systems CHI'97, 1997.
	Henry Lieberman, Elizabeth Rosenzweig, and Push Singh. Aria: An agent for annotating and retrieving images. Computer, 34(7):57-62, 2001.
	Ee-Peng Lim, Dion Hoe-Lian Goh, Zehua Liu, Wee-Keong Ng, Christopher Soo-Guan Khoo, and Susan Ellen Higgins. G-portal: A map-based digital library for distributed geospatial and georeferenced resources. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. As the World Wide Web evolves into an immense information network, it is tempting to build new digital library services and expand existing digital library services to make use of web content. In this paper, we present the design and implementation of G-Portal, a web portal that aims to provide digital library services over geospatial and georeferenced content found on the World Wide Web. G-Portal adopts a map-based user interface to visualize and manipulate the distributed geospatial and georeferenced content. Annotation capabilities are supported, allowing users to contribute geospatial and georeferenced objects as well as their associated metadata. The other features included in G-Portal's design are query support, content classification, and content maintenance. This paper will mainly focus on the architecture design, visualization and annotation capabilities of G-Portal.
	Joo-Hwee Lim. Learnable visual keywords for image classification. In Proceedings of the Fourth ACM International Conference on Digital Libraries, 1999. Automatic categorization of multimedia documents is an important function for a digital library system. While text categorization has received much attention from IR researchers, classification of visual data is still in its infancy. In this paper, we propose a notion of visual keywords for similarity matching between visual contents. Visual keywords can be constructed automatically from samples of visual data through supervised/unsupervised learning. Given a visual content, the occurrences of visual keywords are detected, summarized spatially, and coded via singular value decomposition to arrive at a concise coded description. The methods to create, detect, summarize, select, and code visual keywords will be detailed. Last but not least, we describe an evaluation experiment that classifies professional nature scenery photographs to demonstrate the effectiveness and efficiency of visual keywords for automatic categorization of images in digital libraries.
	Joo-Hwee Lim, Philippe Mulhem, and Qi Tian. Home photo content modeling for personalized event-based retrieval. IEEE Multimedia, 10(4):28-37, 2003.
	Xia Lin. Graphical table of contents. In Proceedings of DL'96, 1996. Format: Not yet online.
	R.J. (Jerry) Linn. Copyright and information services in the context of the national research and education network. In IP Workshop Proceedings, 1994. Format: HTML Document (34K + pictures). Audience: Computer scientists, politicians. References: 0. Links: 0. Relevance: Medium-Low. Abstract: Argues that the 1991 High Performance Computing act (which stipulates that the `Network` shall ensure copyright laws are obeyed) is unenforceable. Suggests that the responsibility for enforcement should be at the application level. Outlines a system of software envelopes, digital signatures, time stamps, and special purpose, limited capability viewers to ensure protection. Also proposes an amendment to HPC 1991.
	Curtis Lisle. Modeling for interaction in virtual worlds. In DAGS '95, 1995. Format: Not Yet On-line. Audience: computer scientists, VR, DB, semi-technical. References: 13. Links: . Relevance: Low. Abstract: Argues that VR representations should no longer depend on data structures for efficient rendering, but on structures that support efficient interactions with the virtual world; namely, an object-oriented database. Having multiple participants in a VR experience is crucial, even if they're interacting with different pieces of the environment.
	Witold Litwin, Leo Mark, and Nick Roussopoulos. Interoperability of multiple autonomous databases. ACM Computing Surveys, 22(3):267-293, September 1990. Discusses the approach of multidatabase or federated systems, which make databases interoperable, i.e., usable without a globally integrated schema.
	Bede Liu, Wayne Wolf, Sanjeev Kulkarni, Andrew Wolfe, Hisashi Kobayashi, Fred Greenstein, Ira Fuchs, Arding Hsu, Farshid Arman, and Yiqing Liang. The Princeton video library of politics. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document (8K). Audience: Digital library funders. References: 0. Links: 1. Relevance: Low. Abstract: Discusses the problems with searching, browsing and indexing video.
	Bin Liu, Wen Gao, Tiejun Huang, Ling Zhang, and Jun Che. Toward a distributed terabyte text retrieval system in China-US Million Book Digital Library. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. In China-US Million Book Digital Library, output of the digitalization process is more than one terabyte of text in OEB and PDF format. To access these data quickly and accurately, we are developing a distributed terabyte text retrieval system. With the query cache, the system can search less data while maintaining acceptable retrieval accuracy. From the OEB package, we get its metadata and structural information to implement multi-scale indexing and retrieval. We plan to explore some new ranking and relevance feedback approaches.
	Jyi-Shane Liu, Mu-Hsi Tseng, and Tse-Kai Huang. Mediating team work for digital heritage archiving. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. Building digital heritage requires substantial resources in materials, expertise, tools, and cost. Projects supported by governments and academics can only cover a small part of the world's heritage in both time and space dimensions. The preservation coverage problem is most serious in domains where sources of intellectual and cultural heritage may diminish or disappear over time. A central notion that helps resolve these issues is to facilitate global reach of digital technology to sources of valuable heritage. We propose an approach to exploit non-institutional resources for wider participation and coverage in digital heritage endeavors. The approach attempts to replicate institutional digital heritage work by teaming up non-institutional resources and providing standard practice.
	Xiaoming Liu, Kurt Maly, Mohammad Zubair, and Michael Nelson. Repository synchronization in the OAI framework. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) began as an alternative to distributed searching of scholarly eprint repositories. The model embraced by the OAI-PMH is that of metadata harvesting, where value-added services (by a ``service provider'') are constructed on cached copies of the metadata extracted from the repositories of the harvester's choosing. While this model dispenses with the well known problems of distributed searching, it introduces the problem of synchronization. Stated simply, this problem arises when the service provider's copy of the metadata does not match the metadata currently at the constituent repositories. We define some metrics for describing the synchronization problem in the OAI-PMH. Based on these metrics, we study the synchronization problem of the OAI-PMH framework and propose several approaches for harvesters to implement better synchronization. In particular, if a repository knows its update frequency, it can publish it in an OAI-PMH Identify response using an optional About container that borrows from RDF Site Syndication (RSS) Format.
	Xiaoming Liu, Kurt Maly, Mohammad Zubair, and Michael L. Nelson. DP9: An OAI gateway service for web crawlers. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. Many libraries and databases are closed to general-purpose web crawlers, and they expose their content only to their own search engines. At the same time many use general-purpose search engines to locate research papers. DP9 is an open source gateway service that allows general search engines (e.g. Google, Inktomi) to index OAI-compliant archives. DP9 does this by providing consistent URLs for repository records, and converting them to OAI queries against the appropriate repository when the URL is requested. This allows search engines that do not support the OAI protocol to index the `deep web` contained within OAI compliant repositories.
	Zehua Liu, Ee-Peng Lim, Wee-Keong Ng, and Dion H. Goh. On querying geospatial and georeferenced metadata resources in G-Portal. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. G-Portal is a web portal system providing a range of digital library services to access geospatial and georeferenced resources on the Web. Among them are the storage and query subsystems that provide a central repository of metadata resources organized under different projects. In G-Portal, all metadata resources are represented in XML (Extensible Markup Language) and they are compliant with some resource schemas defined by their creators. The resource schemas are extended versions of a basic resource schema making it easy to accommodate all kinds of metadata resources while maintaining the portability of resource data. To support queries over the geospatial and georeferenced metadata resources, an XQuery-like query language known as RQL (Resource Query Language) has been designed. In this paper, we present the RQL language features and provide some experimental findings about the storage design and query evaluation strategies for RQL queries.
	S. Loeb. Architecting personalized delivery of multimedia information. Communications of the ACM, 35(12):39-48, December 1992. Information filters are essential mediators between information sources and their users. In most cases, both the information sources and the information users possess no mutual knowledge that can guide them in finding the information most relevant for the users' momentary and long-term needs. Filters, which are positioned logically as 'third parties' to the communication between users and sources, should possess both the knowledge and the functionality to examine the information in the sources and to forward the information 'they judge' as relevant to individual users. The author views the information-filtering process as dependent on the application domain in which it operates and on the context in which it is used. An introduction is given to some of the dimensions which can help distinguish the variety of known filtering applications and usage scenarios and a description is given of a novel filtering model for casual users and its implementation in the Lyric-Times personalized music system. This model utilizes a stored long-term user profile and involves time explicitly in its selection criteria.
	Shoshana Loeb and Douglas Terry. Information Filtering. Communications of the ACM, December 1992. This entry is here to allow citing of this CACM issue as a whole.
	Arjan Loeffen. Text databases: A survey of text models and systems. SIGMOD Record, 23(1):97-106, March 1994.
	Paul Longley, Michael Goodchild, David Maguire, and David Rhind. Geographic Information Systems and Science. John Wiley & Sons, 2001.
	Daniel Lopresti and Andrew Tomkins. Block edit models for approximate string matching. Theoretical Computer Science, 181(1):159-179, July 1997.
	Raymond A. Lorie. Long term preservation of digital information. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001. The preservation of digital data for the long term presents a variety of challenges, from technical to social and organizational. The technical challenge is to ensure that the information generated today can survive long-term changes in storage media, devices and data formats. This paper presents a novel approach to the problem. It distinguishes between archiving of data files and archiving of programs (so that their behavior may be reenacted in the future). For the archiving of a data file, the proposal consists of specifying the processing that needs to be performed on the data (as physically stored) in order to return the information to a future client (according to a logical view of the data). The process specification and the logical view definition are archived with the data. For the archiving of a program behavior, the proposal consists of saving the original executable object code together with the specification of the processing that needs to be performed for each machine instruction of the original computer (emulation). In both cases, the processing specification is based on a Universal Virtual Computer that is general yet basic enough to remain relevant in the future.
	Raymond A. Lorie. A methodology and system for preserving digital data. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. This paper refers to a previous proposal made at the 1st Joint Conference on Digital Libraries, on a novel approach to the problem of the long-term archival of digital data. It reports on ongoing work in refining the methodology and building an initial prototype. The method is based on the use of a Universal Virtual Computer (UVC) to specify the process that needs to be applied to the archived data in order to make it understandable for a future client. There is a certain amount of information (a Convention) that must be preserved for an indefinite time, to make sure that the client will be able to recover the information. A first version of this Convention is given here; it includes the architecture of the UVC. The paper also briefly mentions our current activities in implementation and evaluation.
	A. Loui and A. E. Savakis. Automatic image event segmentation and quality screening for albuming applications. In IEEE International Conference on Multimedia and Expo, 2000.
	Alexander C. Loui and Mark D. Wood. A software system for automatic albuming of consumer pictures. In MULTIMEDIA '99: Proceedings of the seventh ACM international conference on Multimedia (Part 2), pages 159-162, New York, NY, USA, 1999. ACM Press.
	Louis W. G. Barton, John A. Caldwell, and Peter G. Jeavons. E-library of medieval chant manuscript transcriptions. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. In this paper we present our rationale and design principles for a distributed e-library of medieval chant manuscript transcriptions. We describe the great variety in early neumatic notations, in order to motivate a standardised data representation that is lossless and universal with respect to these musical artefacts. We present some details of the data representation and an XML Schema for describing and delivering transcriptions via the Web. We argue against proposed data formats that look simpler, on the grounds that they will inevitably lead to fragmentation of digital libraries. We plan to develop applications software that will allow users to take full advantage of the carefully designed representation we describe, while shielding users from its complexity. We argue that a distributed e-library of this kind will greatly facilitate scholarship, education, and public appreciation of these artefacts.
	J. B. Lovins. Development of a stemming algorithm. Mechanical Translation and Computational Linguistics, 11(1-2):22-31, 1968.
	Richard E. Lucier and Peter Brantley. The Red Sage project: An experimental digital journal library for the health sciences, a descriptive overview. D-Lib Magazine, Aug 1995. Format: HTML Document.
	Bertram Ludäscher and Amarnath Gupta. Modeling interactive web sources for information mediation. In Intl. Workshop on the World-Wide Web and Conceptual Modeling, Paris, France (WWWCM'99). Springer-Verlag, 1999.
	H. P. Luhn. Keyword-in-context index for technical literature (KWIC index). American Documentation, 11(4):288-295, 1960.
	J. Luo and A. Savakis. Indoor vs. outdoor classification of consumer photographs. In Proceedings of the international conference on Image Processing (ICIP 2001), 2001.
	Clifford Lynch and Héctor García-Molina. IITA digital libraries workshop report. Available on request, May 1995.
	Patrice A. Lyons. Access to digital objects: A communications law strategy. D-Lib Magazine, Oct 1995. Format: HTML Document.
	Michael R. Lyu, Edward Yau, and Sam Sze. A multilingual, multimodal digital video library system. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. We describe a multilingual, multimodal digital video content management system, iVIEW, for intelligent searching and access of English and Chinese video contents. The iVIEW system allows full content indexing, searching and retrieval of multilingual text, audio and video material. iVIEW integrates image processing techniques for scene and scene-change analysis, speech processing techniques for audio signal transcription, and multilingual natural language processing techniques for word relevance determination. iVIEW is composed of three subsystems: a Video Information Processing (VIP) Subsystem for multi-modal information processing of rich video media, a Searching and Indexing Subsystem for constructing an XML-based multimedia representation that enhances multi-modal indexing and searching capabilities, and a Visualization and Presentation Subsystem for flexible and seamless delivery of multimedia contents in various browsing tools and devices.
	Yoelle S. Maarek and Israel Z. Ben-Shaul. WebCutter: A system for dynamic and tailorable site mapping. In Proceedings of the Sixth International World-Wide Web Conference, 1997.
	Wendy E. Mackay, Thomas W. Malone, Kevin Crowston, Ramana Rao, David Rosenblitt, and Stuart K. Card. How do experienced Information Lens users use rules? In Proceedings of the Conference on Human Factors in Computing Systems CHI'89, 1989. A study of 13 users showing that rules work for presorting mail.
	Jock D. Mackinlay, Ramana Rao, and Stuart K. Card. An organic user interface for searching citation links. In Proceedings of the Conference on Human Factors in Computing Systems CHI'95, 1995. Crawls a citation index database and builds a 3D graph; includes an application for exploring the graph.
	Benoît Macq and Jean-Jacques Quisquater. Digital images multiresolution encryption. In IP Workshop Proceedings, 1994. Format: HTML Document (11K). Audience: Computer Scientists, very technical. References: 6. Links: 0. Relevance: low. Abstract: A description of good properties of encryption algorithms applied to image data. A proposal for a technical means to achieve some of those desirable properties.
	P. Maes. Agents that reduce work and information overload. Communications of the ACM, 37(7):30-40, July 1994. The currently dominant interaction metaphor of direct manipulation requires the user to initiate all tasks explicitly and to monitor all events. This metaphor will have to change if untrained users are to make effective use of the computers and networks of tomorrow. Techniques from the field of AI, in particular so-called "autonomous agents", can be used to implement a complementary style of interaction, which has been referred to as indirect management. Instead of user-initiated interaction via commands and/or direct manipulation, the user is engaged in a cooperative process in which human and computer agents both initiate communication, monitor events and perform tasks. The metaphor used is that of a personal assistant who is collaborating with the user in the same work environment. The assistant becomes gradually more effective as it learns the user's interests, habits and preferences (as well as those of his or her community). Notice that the agent is not necessarily an interface between the computer and the user. In fact, the most successful interface agents are those that do not prohibit the user from taking actions and fulfilling tasks personally. This article focuses on a novel approach to building interface agents. It presents results from several prototype agents that have been built using this approach, including agents that provide personalized assistance with meeting scheduling, e-mail handling, electronic news filtering and selection of entertainment.
	P. Maes. Social interface agents: Acquiring competence by learning from users and other agents. In Proceedings of the AAAI Spring Symposium Series, 1994. Interface agents are computer programs that employ artificial intelligence techniques in order to provide assistance to a user dealing with a particular computer application. The paper discusses an interface agent which has been modeled closely after the metaphor of a personal assistant. The agent learns how to assist the user by (i) observing the user's actions and imitating them, (ii) receiving user feedback when it takes a wrong action, (iii) being trained by the user on the basis of hypothetical examples, and (iv) learning from other agents that assist other users with the same task. The paper discusses how this learning agent was implemented using memory-based learning and reinforcement learning techniques. It presents actual results from two prototype agents built using these techniques: one for a meeting scheduling application and one for electronic mail. It argues that the machine learning approach to building interface agents is a feasible one which has several advantages over other approaches: it provides a customized and adaptive solution which is less costly and ensures better user acceptability.
	C. Magerkurth, M. Memisoglu, R. Stenzel, and N. Streitz. Towards the next generation of tabletop gaming experiences. In Proceedings of Graphics Interface 2004, pages 73-80, 2004.
	Thomas W. Malone, Kenneth R. Grant, Kum-Yew Lai, Ramana Rao, and David Rosenblitt. Semistructured messages are surprisingly useful for computer-supported coordination. ACM Transactions on Information Systems, 5(2):115-131, April 1987. Shows how the Information Lens lets users write rules about email, and explains what makes semi-structured data special.
	T.W. Malone, K.R. Grant, and F. A. Turbak. The Information Lens: An intelligent system for information sharing in organizations. In Proceedings of the Conference on Human Factors in Computing Systems CHI'86, 1986. This is the original Information Lens paper.
	D. Maltz and K. Ehrlich. Pointing the way: Active collaborative filtering. In Proceedings of the Conference on Human Factors in Computing Systems CHI'95, New York, 1995. ACM. Collaborative filtering is based on the premise that people looking for information should be able to make use of what others have already found and evaluated. Current collaborative filtering systems provide tools for readers to filter documents based on aggregated ratings over a changing group of readers. Motivated by the results of a study of information sharing, we describe a different type of collaborative filtering system in which people who find interesting documents actively send "pointers" to those documents to their colleagues. A "pointer" contains a hypertext link to the source document as well as contextual information to help the recipient determine the interest and relevance of the document prior to accessing it. Preliminary data suggest that people are using the system in anticipated and unanticipated ways, as well as creating information digests.
	K. Maly, M. Nelson, M. Zubair, A. Amrou, S. Kothamasa, L. Wang, and Rick Luce. Light-weight communal digital libraries. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. We describe Kepler, a collection of light-weight utilities that allow for simple and quick digital library construction. Kepler bridges the gap between established, organization-backed digital libraries and groups of researchers that wish to publish their findings under their control, anytime, anywhere, yet have the advantage of their personal libraries. The personal libraries, or archivelets, are Open Archives Initiative (OAI) compliant and thus available for harvesting from OAI service providers. A Kepler archivelet can be installed by an author on a personal machine on the order of minutes, and a Kepler group server on the order of hours.
	M.S. Manasse. The Millicent protocols for electronic commerce. In Proceedings of the First USENIX Workshop on Electronic Commerce, Berkeley, CA, USA, 1995. USENIX Assoc. Cryptography can be useful to send yourself a message via an untrusted intermediary. Pushing the cost of storing infrequently accessed information off to the client allows the servers to run quickly: the network acts as a perfect prefetch unit for loading your memory with the necessary state for the next operation. By way of example, a server that has to store a lot of information about subscribers might buckle under the seek time of accessing subscriber records. Pushing it off to clients allows us to have the salient parts of the database arrive in memory precisely when needed, at the small cost of verifying that the information hasn't been tampered with. A service can help aggregate the transactions into a grain large enough to be acceptable to more conventional transaction handlers, without providing complete information about the transaction to the service. By making fund transfer explicit in each retrieval, all parties involved can instantly tell when they're being cheated; by limiting the scale of transaction, consumers can control the level of risk they accept.
	U. Manber, M. Smith, and B. Gopal. WebGlimpse: Combining browsing and searching. In Proceedings of the 1997 USENIX Technical Conference, 1997. The two paradigms of searching and browsing are currently almost always used separately. One can either look at the library card catalog, or browse the shelves; one can either search large WWW sites (or the whole web), or browse page by page. In this paper we describe a software tool we developed, called WebGlimpse, that combines the two paradigms. It allows the search to be limited to a neighborhood of the current document. WebGlimpse automatically analyzes collections of web pages and computes those neighborhoods (at indexing time). With WebGlimpse users can browse at will, using the same pages; they can also jump from any page, through a search, to nearby pages related to their needs.
	Udi Manber and Gene Myers. Suffix arrays: A new method for on-line string searches. In Proc. of the 1st ACM-SIAM Symposium on Discrete Algorithms, pages 319-327, 1990.
	Richard Mander, Gitta Salomon, and Yin Yin Wong. A pile metaphor for supporting casual organization of information. In CHI '92: Proceedings of the SIGCHI conference on Human factors in computing systems, pages 627-634, New York, NY, USA, 1992. ACM Press.
	B.S. Manjunath. Image browsing in the Alexandria Digital Library (ADL) project. D-Lib Magazine, Sep 1995. Format: HTML Document.
	R. Manmatha, Chengfeng Han, E. M. Riseman, and W. B. Croft. Indexing handwriting using word matching. In Proceedings of DL'96, 1996. Format: Not yet online.
	Frank Manola. Interoperability issues in large-scale distributed object systems. ACM Computing Surveys, 27(2):268-270, June 1995. Focuses on enterprise-wide client/server systems being developed to support operational computing within large organizations to illustrate interoperability issues.
	Massimo Marchiori. The quest for correct information on the web: Hyper search engines. In Proceedings of the Sixth International World-Wide Web Conference, pages 265-276, 1997.
	R. S. Marcus. User assistance in bibliographic retrieval networks through a computer intermediary. IEEE Trans. on Systems, Man, and Cybernetics, SMC-12(2):116-133, 1982.
	Richard J. Marisa. HeinOnline: An online archive of law journals. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001. HeinOnline is a new online archive of law journals. Development of HeinOnline began in late 1997 through the cooperation of Cornell Information Technologies, William S. Hein & Co., Inc. of Buffalo, NY, and the Cornell Law Library. Built upon the familiar Dienst and new Open Archives Initiative protocols, HeinOnline extends the reliable and well-established management practices of open access archives like NCSTRL and CoRR to a subscription-based collection. The decisions made in creating HeinOnline, Dienst architectural extensions, and issues which have arisen during operation of HeinOnline are described.
	Byron Marshall and Therani Madhusudan. Element matching in concept maps. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. Concept maps (CM) are informal, semantic, node-link conceptual graphs used to represent knowledge in a variety of applications. Algorithms that compare concept maps would be useful in supporting educational processes and in leveraging indexed digital collections of concept maps. Map comparison begins with element matching and faces computational challenges arising from vocabulary overlap, informality, and organizational variation. Our implementation of an adapted similarity flooding algorithm improves matching of CM knowledge elements over a simple string matching approach.
	Byron Marshall, Hua Su, Shauna Eggers, Karin Quiñones, and Hsinchun Chen. Visualizing aggregated biological pathway relations. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. The Genescene development team has constructed an aggregation interface for automatically-extracted biomedical pathway relations that is intended to help researchers identify and process relevant information from the vast digital library of abstracts found in the National Library of Medicine’s PubMed collection. Users view extracted relations at various levels of relational granularity in an interactive and visual node-link interface. Anecdotal feedback reported here suggests that this multi-granular visual paradigm aligns well with various research tasks, helping users find relevant articles and discover new information.
	Byron Marshall, Yiwen Zhang, Hsinchun Chen, Ann Lally, and Edward Fox. Convergence of knowledge management and e-learning: the GetSmart experience. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. The National Science Digital Library (NSDL), launched in December 2002, is emerging as a center of innovation in digital libraries as applied to education. As a part of this extensive project, the GetSmart system was created to apply knowledge management techniques in a learning environment. The design of the system is based on an analysis of learning theory and the information search process. Its key notion is the integration of search tools and curriculum support with concept mapping. More than 100 students at the University of Arizona and Virginia Polytechnic Institute used the system in the fall of 2002. A database of more than one thousand student-prepared concept maps has been collected with more than forty thousand relationships expressed in semantic, graphical, node-link representations. Preliminary analysis of the collected data is revealing interesting knowledge representation patterns.
	Catherine C. Marshall and Sara Bly. Sharing encountered information: Digital libraries get a social life. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. People often share information they encounter in their everyday reading in the form of paper or electronic clippings. As part of a more extensive study of clipping practices, we have performed twenty contextual interviews in a variety of home and workplace settings to investigate how and why people share this encountered material with each other. Specifically, we pursued three kinds of questions: (1) how sharing encountered items fits into the broader spectrum of clipping practices; (2) the function and value of the shared information; and (3) the social role of sharing the encountered information. What we found is that sharing forms a significant use for encountered material. Furthermore, the function of these clippings extends far beyond a simple exchange of content to inform the recipient; in fact, the content itself may have little immediate value to the recipient. We also found the practice to be ubiquitous: all of our participants had both shared clippings with others and received them themselves. Our findings suggest that, from a technological standpoint, we should think beyond an email model for sharing encountered information and, from a social perspective, we should attend to how sharing this sort of material contributes to the strength of social ties outside of a traditional information needs framework.
	Catherine C. Marshall and Sara Bly. Turning the page on navigation. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. In this paper, we discuss the findings of an in-depth ethnographic study of reading and within-document navigation and add to these findings the results of a second analysis of how people read comparable digital materials on the screen, given limited navigational functionality. We chose periodicals as our initial foil since they represent a type of material that invites many different kinds of reading and strategies for navigation. Using multiple sources of evidence from the data, we first characterize readers’ navigation strategies and specific practices as they make their way through the magazines. We then focus on two observed phenomena that occur when people read paper magazines, but are absent in their digital equivalents: the lightweight navigation that readers use unselfconsciously when they are reading a particular article and the approximate navigation readers engage in when they flip multiple pages at a time. Because page-turning is so basic and seems deceptively simple, we dissect the turn of a page, and use it to illustrate the importance and invisibility of lightweight navigation. Finally, we explore the significance of our results for navigational interfaces to digital library materials.
	Catherine C. Marshall and A.J. Bernheim Brush. Exploring the relationship between personal and public annotations. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. Today people typically read and annotate printed documents even if they are obtained from electronic sources like digital libraries. If there is a reason for them to share these personal annotations online, they must re-enter them. Given the advent of better computer support for reading and annotation, including tablet interfaces, will people ever share their personal digital ink annotations as is, or will they make substantial changes to them? What can we do to anticipate and support the transition from personal to public annotations? To investigate these questions, we performed a study to characterize and compare students' personal annotations as they read assigned papers with those they shared with each other using an online system. By analyzing over 1,700 annotations, we confirmed three hypotheses: (1) only a small fraction of annotations made while reading are directly related to those shared in discussion; (2) some types of annotations, namely those that consist of anchors in the text coupled with margin notes, are more apt to be the basis of public commentary than other types of annotations; and (3) personal annotations undergo dramatic changes when they are shared in discussion, both in content and in how they are anchored to the source document. We then use these findings to explore ways to support the transition from personal to public annotations.
	Catherine C. Marshall, Frank M. Shipman III, and Raymond J. McCall. Putting digital libraries to work: Issues from experience with community memories. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document (43K + picture). Audience: Non-technical, Social Scientists, CSCW developers. References: 27. Links: 1. Relevance: Low. Abstract: Discusses the notion of "community memories", how they are created (seeding, evolutionary growth, re-seeding), and how they are used and searched. Some specific discussion of particular systems like XNetwork, VIKI, JANUS, NoteCards, VNS.
	Catherine C. Marshall, Morgan N. Price, Gene Golovchinsky, and Bill N. Schilit. Introducing a digital library reading appliance into a reading group. In Proceedings of the Fourth ACM International Conference on Digital Libraries, 1999. How will we read digital library materials? This paper describes the reading practices of an on-going reading group, and how these practices changed when we introduced XLibris, a digital library reading appliance that uses a pen tablet computer to provide a paper-like interface. We interviewed group members about their reading practices, observed their meetings, and analyzed their annotations, both when they read a paper document and when they read using XLibris. We use these data to characterize their analytic reading, reference use, and annotation practices. We also describe the use of the Reader's Notebook, a list of clippings that XLibris computes from a reader's annotations. Implications for digital libraries stem from our findings on reading and mobility, the complexity of analytic reading, the social nature of reference following, and the unselfconscious nature of readers' annotations.
	Catherine C. Marshall, Morgan N. Price, Gene Golovchinsky, and Bill N. Schilit. Designing e-books for legal research. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001. In this paper we report the findings from a field study of legal research in a first-tier law school and on the resulting redesign of XLibris, a next-generation e-book. We first characterize a work setting in which we expected an e-book to be a useful interface for reading and otherwise using a mix of physical and digital library materials, and explore what kinds of reading-related functionality would bring value to this setting. We do this by describing important aspects of legal research in a heterogeneous information environment, including mobility, reading, annotation, link following and writing practices, and their general implications for design. We then discuss how our work with a user community and an evolving e-book prototype allowed us to examine tandem issues of usability and utility, and to redesign an existing e-book user interface to suit the needs of law students. The study caused us to move away from the notion of a stand-alone reading device and toward the concept of a document laptop, a platform that would provide wireless access to information resources, as well as support a fuller spectrum of reading-related activities.
	Catherine C. Marshall and Christine Ruotolo. Reading-in-the-small: A study of reading on small form factor devices. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. The growing ubiquity of small form factor devices such as Palm Pilots and Pocket PCs, coupled with widespread availability of digital library materials and users' increasing willingness to read on the screen, raises the question of whether people can and will read digital library materials on handhelds. We investigated this question by performing a field study based on a university library's technology deployment in an academic environment: two classes were conducted using materials that were available in e-book format on Pocket PCs in addition to other electronic and paper formats. The handheld devices, the course materials, and technical support were all provided to students in the courses to use as they saw fit. We found that the handhelds were a good platform for reading secondary materials, excerpts, and shorter readings; they were used in a variety of circumstances where portability is important, including collaborative situations such as the classroom. We also discuss the effectiveness of annotation, search, and navigation functionality on the small form factor devices. We conclude by defining a set of focal areas and issues for digital library efforts designed for access by handheld computers.
	Patrick Martin, Ian A. Macleod, and Brent Nordin. A design of a distributed full text retrieval system. In Proceedings of the Ninth International Conference on Research and Development in Information Retrieval, pages 131-137, September 1986.
	Philippe Martin and Peter Eklund. Embedding knowledge in web documents. In Proceedings of the Eighth International World-Wide Web Conference, 1999. The paper argues for the use of general and intuitive knowledge representation languages (and simpler notational variants, e.g. subsets of natural languages) for indexing the content of Web documents and representing knowledge within them. We believe that these languages have advantages over metadata languages based on the Extensible Mark-up Language (XML). Indeed, the retrieval of precise information is better supported by languages designed to represent semantic content and support logical inference, and the readability of such a language eases its exploitation, presentation and direct insertion within a document (thus also avoiding information duplication). We advocate the use of Conceptual Graphs and simpler notational variants that enhance knowledge readability. To further ease the representation process, we propose techniques allowing users to leave some knowledge terms undeclared. We also show how lexical, structural and knowledge-based techniques may be combined to retrieve or generate knowledge or Web documents. To support and guide the knowledge modeling approach, we present a top-level ontology of 400 concept and relation types. We have implemented these features in a Web-accessible tool named WebKB, and show examples to illustrate them.
	T. H. Martin. A feature analysis of interactive retrieval systems. Report SU-COMM-ICR-74-1, Institute of Communication Research, Stanford Univ., Stanford, Calif., September 1974.
	David Martland. Developing and using documentation tools for Setext. In DAGS '95, 1995. Format: Not Yet On-line. Audience: Authors. References: 14. Relevance: Low-Medium. Abstract: Describes an alternative markup language called Setext. Simpler than HTML and LaTeX, it relies on special formatting characters to indicate italics, directives (which allow specifying the destinations of links), and formatting by position (e.g., a character in position 1). LaTeX and HTML may be automatically generated from Setext, and plain Setext is more readable than either when viewed as plain ASCII.
	Barry M. Massarsky. The operating dynamics behind ASCAP, BMI and SESAC, the U.S. performing rights societies. In IP Workshop Proceedings, 1994. Format: HTML Document (25K). Audience: Business, IP people, non-technical. References: 0. Links: 0. Relevance: Low-Medium. Abstract: Discusses the historical convergence to the current mechanism of music royalties, including some statistics about the monitoring procedure of ASCAP. Argues that the licensing model is to be preferred to the transaction model for multimedia.
	Toshiyuki Masui. An efficient text input method for pen-based computers. In Proceedings of the Conference on Human Factors in Computing Systems CHI'98, pages 328-335, 1998.
	Kineo Matsui and Kiyoshi Tanaka. Video-steganography: How to secretly embed a signature in a picture. In IP Workshop Proceedings, 1994. Format: ARTICLE NOT AVAILABLE.
	Hermann Maurer and Klaus Schmaranz. J.UCS and extensions as paradigm for electronic publishing. In DAGS '95, 1995. Format: Not Yet On-line. Audience: Journal authors, referees, readers and editors. References: 9. Relevance: Low-Medium. Abstract: Describes an implemented system for on-line journal publication. It is based primarily on Hyper-G, a web browser with support for access control and a separate link database, rather than having links in the text (thereby allowing links on PostScript). The journal is replicated on servers world-wide. All articles are refereed and may only be changed by adding refereed annotations. Supports both hypertext and PostScript formats. Uses ACM Computing Reviews keywords.
	James Mayfield, Yannis Labrou, and Tim Finin. Desiderata for agent communication languages. In AAAI Spring Symposium on Information Gathering, 1995. Format: Compressed PostScript.
	A. McCallum, K. Nigam, J. Rennie, and K. Seymore. Building domain-specific search engines with machine learning techniques. In Proceedings of the AAAI Spring Symposium on Intelligent Agents in Cyberspace, 1999.
	Alexa T. McCray. Knowledge-based biomedical information retrieval. In Proceedings of DL'96, 1996. Format: Not yet online.
	Daniel McDonald and Hsinchun Chen. Using sentence-selection heuristics to rank text segments in TXTRACTOR. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. TXTRACTOR is a tool that uses established sentence-selection heuristics to rank text segments, producing summaries that contain a user-defined number of sentences. We hypothesize that ranking text segments via traditional sentence-selection heuristics produces a balanced summary with more useful information than one produced by using segmentation alone. The proposed summary is created in a three-step process: 1) sentence evaluation, 2) segment identification, and 3) segment ranking. As the required length of the summary changes, low-ranking segments can then be dropped from (or higher ranking segments added to) the summary. We compare the output of TXTRACTOR to the output of a segmentation tool based on the TextTiling algorithm to validate the approach.
	Verne E. McFarland and Steven Wyman. Public access to EPA Superfund records - a digital alternative. In Proceedings of the Second Annual Conference on the Theory and Practice of Digital Libraries, 1995. Format: HTML Document (19K). Audience: Government, not too technical, except for system specs. References: 14. Links: 4. Relevance: Low. Abstract: Describes the benefits of putting government records on CD-ROM: lower storage costs, greater access, and less risk of missing records.
	Robert E. McGrath. Caching for large scale systems: Lessons from the WWW. D-Lib Magazine, Jan 1996. Format: HTML Document.
	Robert E. McGrath, Joe Futrelle, and Ray Plante. Digital library technology for locating and accessing scientific data. In Proceedings of the Fourth ACM International Conference on Digital Libraries, 1999. In this paper we describe our efforts to bring scientific data into the digital library. This has required extension of the standard WWW, and also the extension of metadata standards far beyond the Dublin Core. Our system demonstrates this technology for real scientific data from astronomy.
	Kathleen McKeown, David Millman, Brian Donnelly, James Hoover, Robert McClintock, Willem Scholten, Dimitris Anastassiou, Shih-Fu Chang, Alan Croswell, Mukesh Dalal, Steven Feiner, Paul Kantor, Judith Klavans, and Mischa Schwartz. The JANUS digital library. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document (33K). Audience: non-technical, funders. References: 26. Links: 1. Relevance: Low-Medium. Abstract: Project description of the JANUS library centered at Columbia. Focuses on the user interface, generation of natural-language summaries, multimedia searching, and some intellectual property issues.
	Kathleen R. McKeown, Shih-Fu Chang, James Cimino, Steven K. Feiner, Carol Friedman, Luis Gravano, Vasileios Hatzivassiloglou, Steven Johnson, Desmond A. Jordan, Judith L. Klavans, Andre Kushniruk, Vimla Patel, and Simone Teufel. PERSIVAL, a system for personalized search and summarization over multimedia healthcare information. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001. In healthcare settings, patients need access to online information that can help them understand their medical situation. Physicians need information that is clinically relevant to an individual patient. In this paper, we present our progress on developing a system, PERSIVAL, that is designed to provide personalized access to a distributed patient care digital library. Using the secure, online patient records at New York Presbyterian Hospital as a user model, PERSIVAL's components tailor search, presentation and summarization of online multimedia information to both patients and healthcare providers.
	Kathleen R. McKeown, Noemie Elhadad, and Vasileios Hatzivassiloglou. Leveraging a common representation for personalized search and summarization in a medical digital library. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. Despite the large amount of online medical literature, it can be difficult for clinicians to find relevant information at the point of patient care. In this paper, we present techniques to personalize the results of search, making use of the online patient record as a sophisticated, pre-existing user model. Our work in PERSIVAL, a medical digital library, includes methods for re-ranking the results of search to prioritize those that better match the patient record. It also generates summaries of the re-ranked results which highlight information that is relevant to the patient under the physician's care. We focus on the use of a common representation for the articles returned by search and the patient record which facilitates both the re-ranking and the summarization tasks. Taken together, this common approach to both tasks has a strong positive effect on the amount of personalization which is possible.
	Cliff McKnight. Digital library research at Loughborough: The last fifteen years. In Proceedings of the Second Annual Conference on the Theory and Practice of Digital Libraries, 1995. Format: HTML Document (27K). Audience: Librarians, Journal readers, authors, and publishers. References: 26. Links: 2. Relevance: Low. Abstract: Describes several projects (BLEND, EVELYN, QUARTET, ADONIS, TEJ) taking place in the UK, going back to 1981. Results were rather pessimistic, with computing resources (esp. screen technology) typically not being sufficient to support what was really necessary. On-line publication was no faster than paper publication (referee reports still took as long) and there was the question of whether on-line publication counts toward tenure. One benefit was the immediacy with which on-line discussion of articles occurred.
	Cliff McKnight, Jack Meadows, David Pullinger, and Fytton Rowland. ELVYN - publisher and library working towards the electronic distribution and use of journals. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document (24K). Audience: Non-technical, librarians (though some jargon is used). References: 9. Links: 1. Relevance: Low-Medium. Abstract: Describes an existing project of journal distribution to libraries via electronic formats. Claims that libraries and publishers still have a role in the new model.
	Flora McMartin and Youki Terada. Digital library services for authors of learning materials. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. Digital libraries, particularly those designed to meet the needs of educators and students, focus their primary services on the needs of their end users [1]. In this paper, we introduce and discuss the types of services that authors of the materials cataloged within this type of digital library expect or may find useful. Results from a study of authors cataloged in NEEDS, a national engineering education digital library, guide this discussion and illustrate the challenges associated with meeting the needs of authors who are critical to the continued growth of educational digital libraries.
	Rodger J. McNab, Lloyd A. Smith, Ian H. Witten, Clare L. Henderson, and Sally Jo Cunningham. Towards the digital music library: Tune retrieval from acoustic input. In Proceedings of DL'96, 1996. Format: Not yet online.
	Giansalvatore Mecca, Paolo Atzeni, Alessandro Masci, Paolo Merialdo, and Giuseppe Sindoni. The ARANEUS web-base management system. In Proceedings of the International Conference on Management of Data, pages 544-546, 1998.
	Richard A. Medina, Lloyd A. Smith, and Debra R. Wagner. Content-based indexing of musical scores. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. Score-based music information retrieval (MIR) systems search databases of musical scores to find occurrences of a musical query pattern. Several such systems have been developed based on exhaustive approximate search of the database. Exhaustive search does not scale well, however, to large databases. This paper describes a method of automatically creating a content-based index of musical scores. The goal is to capture the themes, or motifs, that appear in the music. User queries, then, would be matched only over the index, rather than over the entire database. The method was tested by building an index of 25 orchestral movements from the classical music literature. For every movement, the system captured the primary theme, or a variation of the primary theme. In addition, it captured 13 of 28 secondary themes. The resulting index was 14 [...] By deleting musical patterns not found early in the music, it is possible to reduce the size of the index to 2 [...] discards secondary themes. A listening experiment using five symphonic movements showed that people can reliably recognize secondary themes after listening to a piece of music; therefore, it may be necessary to retain secondary themes in a score index.
	R. Medina-Mora and K.W. Cartron. ActionWorkflow in use: Clark County Department of Business License. In S.Y.W. Su, editor, Proceedings of the Twelfth International Conference on Data Engineering, Los Alamitos, CA, 1996. IEEE Computer Society Press. We present the basic concepts of ActionWorkflow and a study of a successful implementation in Clark County Department of Business License. The image/workflow system reengineers a labyrinthine licensing system into simplistic processes that are more customer oriented, yield superior productivity, establish a work-in-progress tracking mechanism, and archive the resulting licensing processes permanently on an unalterable optical storage system.
	R. Medina-Mora, Terry Winograd, Rodrigo Flores, and Fernando Flores. The ActionWorkflow approach to workflow management technology. In J. Turner and R. Kraut, editors, CSCW '92, New York, 1992. ACM. Describes the ActionWorkflow approach to workflow management technology: this is a design methodology and associated computer software for the support of work in organizations. The approach is based on theories of communicative activity as language/action, and has been developed in a series of systems for coordination among users of networked computers. This paper describes the approach, gives an example of its application, and shows the architecture of a workflow management system based on it.
	G. Medvinsky and C. Neuman. NetCash: A design for practical electronic currency on the internet. In Proceedings of the Second ACM Conference on Computer and Communication Security, 1994.
	Sergey Melnik. Declarative mediation in distributed systems. In Proceedings of the International Conference on Conceptual Modeling (ER'00), Salt Lake City, October 2000.
	Sergey Melnik, Hector Garcia-Molina, and Andreas Paepcke. A mediation infrastructure for digital library services. In Proceedings of the Fifth ACM International Conference on Digital Libraries, 2000. Available at http://dbpubs.stanford.edu/pub/1999-25. Digital library mediators allow interoperation between diverse information services. In this paper we describe a flexible and dynamic mediator infrastructure that allows mediators to be composed from a set of modules (``blades''). Each module implements a particular mediation function, such as protocol translation, query translation, or result merging. All the information used by the mediator, including the mediator logic itself, is represented by an RDF graph. We illustrate our approach using a mediation scenario involving a Dienst and a Z39.50 server, and we discuss the potential advantages and weaknesses of our framework.
	Sergey Melnik, Sriram Raghavan, Beverly Yang, and Hector Garcia-Molina. Building a distributed full-text index for the web. Technical Report SIDL-WP-2000-0140; 2000-29, Stanford Digital Library Project, Computer Science Department, Stanford University, July 2000. Available at http://dbpubs.stanford.edu/pub/2000-29. We identify crucial design issues in building a distributed inverted index for a large collection of web pages. We introduce a novel pipelining technique for structuring the core index-building system that substantially reduces the index construction time. We also propose a storage scheme for creating and managing inverted files using an embedded database system. We propose and compare different strategies for addressing various issues relevant to distributed index construction. Finally, we present performance results from experiments on a testbed distributed indexing system that we have implemented.
	Sergey Melnik, Sriram Raghavan, Beverly Yang, and Hector Garcia-Molina. Building a distributed full-text index for the web. In Proceedings of the Tenth International World-Wide Web Conference, 2001.
	Massimo Melucci and Nicola Orio. Musical information retrieval using melodic surface. In Proceedings of the Fourth ACM International Conference on Digital Libraries, 1999. The automatic best-match and content-based retrieval of musical documents against musical queries is addressed in this paper. By ``musical documents'' we mean scores or performances, while musical queries are assumed to be entered by end users through a musical interface (GUI or MIDI keyboard). Musical documents lack the separators needed to detect ``lexical units'' such as text words. Moreover, there are many variants of a musical phrase between different works. The paper presents a technique to automatically detect musical phrases to be used as content descriptors, and to conflate musical phrase variants by extracting a common stem. An experimental study reports on the results of indexing and retrieval tests using the vector-space model. The technique can complement catalogue-based access whenever the user is unable to use fixed values, or wants to find performances or scores that are ``similar'' in content to known ones.
	Massimo Melucci and Nicola Orio. Evaluating automatic melody segmentation aimed at music information retrieval. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. The main contribution of this paper is an investigation of the effects of exploiting melodic features for automatic melody segmentation aimed at content-based music retrieval. We argue that segmentation based on melodic features is more effective than random or n-gram-based segmentation, which ignore any context. We have carried out an experiment employing experienced subjects. The manual segmentation result has been processed to detect the most probable boundaries in the melodic surface, using a probabilistic decision function. The detected boundaries have then been compared with the boundaries detected by an automatic procedure implementing an algorithm for melody segmentation, as well as by a random segmenter and by an n-gram-based segmenter. Results showed that automatic segmentation based on melodic features is closer to manual segmentation than algorithms that do not use such information.
	Filippo Menczer, Richard K. Belew, and Wolfram Willuhn. Artificial life applied to adaptive information agents. In AAAI Spring Symposium on Information Gathering, 1995. Format: Compressed PostScript.
	T. Miah and O. Bashir. Mobile workers: access to information on the move. Computing & Control Engineering Journal, 8(5):215-23, Oct 1997. As the development of pen computing continues, more and more of today's computers are likely to move gradually away from people's desktops and into their pockets. The development of personal digital assistants (PDAs) has initiated this move. As these devices move into people's pockets, they need the ability to access information on the move. This article describes a generic view of a client server mobile computing architecture. It also sheds some light on the basic network topologies that have been considered previously for such systems. The scenario used is a hospital ward. Each doctor is equipped with a PDA and each ward or a group of wards with a server providing patient records. As a doctor visits a patient in a ward, the patient's record is accessed from the server onto the PDA. The doctor updates the record and sends the update back to the server.
	Giorgio De Michelis and M. Antonietta Grasso. Situating conversations within the language/action perspective: The Milan conversation model. In Proceedings of the Conference on Computer-Supported Cooperative Work, CSCW'94, 1994.
	Michelle Baldonado, Chen-Chuan K. Chang, Luis Gravano, and Andreas Paepcke. The Stanford Digital Library metadata architecture. International Journal of Digital Libraries, 1997. See also http://dbpubs.stanford.edu/pub/1997-71.
	Michelle Q. Wang Baldonado and Terry Winograd. Hi-cites: dynamically created citations with active highlighting. In Proceedings of the Conference on Human Factors in Computing Systems CHI'98, 1998. The original SenseMaker interface for information exploration [2] used tables to present heterogeneous document descriptions. In contrast, printed bibliographies and World Wide Web (WWW) search engines use formatted citations to convey this information. In this paper, we discuss hi-cites, a new interface construct developed for SenseMaker that combines the benefits of tables (which encourage the comparison of descriptions) and citations (which facilitate browsing). Hi-cites are dynamically created citations with active highlighting. They are useful in environments where heterogeneous structured descriptions must be browsed and compared with ease. Examples beyond digital libraries include product catalogs, classified advertisements, and WWW search engines. We have performed an evaluation of hi-cites, tables, and citations for tasks involving single attribute comparisons in the digital-library domain. This evaluation supports our claim that hi-cites are valuable for both comparison and skimming tasks in this environment.
	Sun Microsystems. Java Commerce home page. JavaSoft website: http://java.sun.com/commerce/.
	Francis Miksa and Philip Doty. Intellectual realities and the digital library. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document (24K). Audience: Non-technical, librarians. References: 4. Links: 1. Relevance: Low. Abstract: Asks in what sense a digital library is a library. Considers different aspects of libraries and the extent to which a DL matches them.
	Brett Milash, Catherine Plaisant, and Anne Rose. LifeLines: visualizing personal histories. In Conference Companion on Human Factors in Computing Systems, pages 392-393. ACM Press, 1996.
	Thomas L. Milbank. Extracting geometry from digital models in a cultural heritage digital library [short paper]. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. This paper describes research to enhance the integration between digital models and the services provided by the document management systems of digital libraries. Processing techniques designed for XML texts are applied to X3D models, allowing specific geometry to be automatically retrieved and displayed. The research demonstrates that models designed on object-oriented paradigms are most easily exploited by XML document management systems.
	B. Miller, J. Riedl, and J. Konstan. Experiences with GroupLens: Making Usenet useful again. In Proceedings of the USENIX Winter 1997 Technical Conference, 1997.
	Eric Miller. An introduction to the resource description framework. D-Lib Magazine, May 1998. http://www.dlib.org/dlib/may98/miller/05miller.html.
	George A. Miller. The magical number seven, plus or minus two: Some limits on our capacity for processing information. The Psychological Review, 63:81-97, 1956.
	John A. Miller, Amit P. Sheth, Krys J. Kochut, and Xuzhong Wang. CORBA-based run-time architectures for workflow management systems. Journal of Database Management, Special Issue on Multidatabases, 7(1):16-27, 1996.
	R. Miller and K. Bharat. SPHINX: a framework for creating personal site-specific web crawlers. In Proceedings of the 7th World Wide Web Conference, 1998. Crawlers, also called robots and spiders, are programs that browse the World Wide Web autonomously. This paper describes SPHINX, a Java toolkit and interactive development environment for Web crawlers. Unlike other crawler development systems, SPHINX is geared towards developing crawlers that are Web-site specific, personally customized, and relocatable. SPHINX allows site-specific crawling rules to be encapsulated and reused in content analyzers, known as classifiers. Personal crawling tasks can be performed in the Crawler Workbench, an interactive environment for crawler development and testing. For efficiency, relocatable crawlers developed using SPHINX can be uploaded and executed on a remote Web server.
	Robert C. Miller and Krishna Bharat. SPHINX: a framework for creating personal, site-specific web crawlers. In Proceedings of the Seventh International World-Wide Web Conference, 1998.
	S. Milliner and A. Bouguettaya. Data discovery in large scale heterogeneous and autonomous databases. In Advances in Digital Libraries '95, 1995. Format: Not Yet Online.
	Timothy J. Mills, David Pye, David Sinclair, and Kenneth R. Wood. Shoebox: A digital photo management system. Technical Report 2000.10, AT&T Laboratories Cambridge, 2000.
	David Mimno, Alison Jones, and Gregory Crane. Finding a catalog: Generating analytical catalog records from well-structured digital texts. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. One of the criticisms library users often make of catalogs is that they rarely include information below the bibliographic level. It is generally impossible to search a catalog for the titles and subjects of particular chapters or volumes. There has been no way to add this information to catalog records without exponentially increasing the workload of catalogers. This paper describes how initial investments in full text digitization and structural markup combined with current named entity extraction technology can efficiently generate the detailed level of catalog data that users want, at no significant additional cost. This system is demonstrated on an existing digital collection within the Perseus Digital Library.
	Natalia Minibayeva and Jon W. Dunn. A digital library data model for music. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. In this paper, we introduce a data and metadata model for music content. We discuss motivations for creation of the model, outline its basic structure, and discuss how it will be used in a music digital library system which is presently being built.
	William H. Mischo, Thomas G. Habing, and Timothy W. Cole. Integration of simultaneous searching and reference linking across bibliographic resources on the web. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. Libraries and information providers are actively developing customized portals and gateway software designed to integrate secondary information resources such as A & I services, online catalogs, and publishers' full-text repositories. This paper reports on a project carried out at the Grainger Engineering Library at the University of Illinois at Urbana-Champaign to provide web-based asynchronous simultaneous searching of multiple secondary information resources and integrated reference linking between bibliographic resources. The project has tested two different approaches to simultaneous broadcast searching. One approach utilizes custom distributed searchbots and shared blackboard databases. The other approach uses event-driven asynchronous HTTP queries within a single web script. The reference linking implementation is built around the application of OpenURL and Digital Object Identifier (DOI) technologies and the CrossRef metadata database within a proxy server environment.
	Patrick C. Mitchell. A note about the proximity operators in information retrieval. In Proceedings of ACM SIGPLAN-SIGIR Interface Meeting, pages 177-180, Gaithersburg, Md., November 1973. ACM Press, New York.
	Stefano Mizzaro. Relevance: The whole history. Journal of the American Society for Information Science, 48(9):810-832, 1997. Historical roundup of thoughts on relevance. Nicely written. Covers three eras: the 17th century to 1958, 1958-1976, and 1977-1997. Lists the essentials of about 160 papers, summarizing their respective contributions to several aspects of relevance.
	P. Mockapetris. Domain names - concepts and facilities. Technical Report RFC 1034, Network Working Group, November 1987. At http://info.internet.isi.edu/in-notes/rfc/files/rfc1034.txt. Introduction to domain names, their use for Internet mail and host address support, and the protocols and servers used to implement domain name facilities. RFC 1035 provides implementation detail. RFC 1101 updates it; however, without reading this, it is hard to understand RFC 1101.
	P. Mockapetris. Domain names - implementation and specification. Technical Report RFC 1035, Network Working Group, November 1987. At http://info.internet.isi.edu/in-notes/rfc/files/rfc1035.txt. Describes the details of the domain system and protocol (assumes one is familiar with RFC 1034).
	P. Mockapetris. DNS encoding of network names and other types. Technical Report RFC 1101, Network Working Group, April 1989. At http://info.internet.isi.edu/in-notes/rfc/files/rfc1101.txt. Presents two extensions to DNS: a specific method for entering and retrieving data records that map between network names and numbers, and ideas for a general method for describing mappings between arbitrary identifiers and numbers. One cannot understand it without reading RFC 1034. Other RFCs related to DNS: 973, 1123, 1348.
	William E. Moen and John Perkins. The cultural heritage information online project: Demonstrating access to distributed cultural heritage museum information. In Proceedings of DL'96, 1996. Format: Not yet online.
	A. Moffat and T. Bell. In situ generation of compressed inverted files. Journal of the American Society for Information Science, 46(7):537-550, 1995.
	A. Moffat and J. Zobel. Self-indexing inverted files for fast text retrieval. ACM Transactions on Information Systems, 14(4):349-379, October 1996.
	Baback Moghaddam, Qi Tian, Neal Lesh, Chia Shen, and Thomas S. Huang. Visualization and user-modeling for browsing personal photo libraries. International Journal of Computer Vision, 56(1-2):109-130, 2004.
	Jeffrey C. Mogul, Fred Douglis, Anja Feldmann, and Balachander Krishnamurthy. Potential benefits of delta encoding and data compression for HTTP. In Proceedings of ACM SIGCOMM, pages 181-194, 1997. An extended version appears as Research Report 97/4, Digital Equipment Corporation Western Research Laboratory, July 1997.
	Chuang-Hue Moh, Ee-Peng Lim, and Wee-Keong Ng. Re-engineering structures from web documents. In Proceedings of the Fifth ACM International Conference on Digital Libraries, 2000. To realise a wide range of applications (including digital libraries) on the Web, a more structured way of accessing the Web is required, and this requirement can be met by using the XML standard. In this paper, we propose a general framework for reverse engineering (or re-engineering) the underlying structures (i.e., the DTD) from a collection of similarly structured XML documents when they share some common but unknown DTDs. We present the essential data structures and report experiments on real Web collections that demonstrate their feasibility. In addition, we propose a method of imposing a constraint on the repetitiveness of elements in a DTD rule to further simplify the generated DTD without compromising its correctness.
	Carlos Monroy, Richard Furuta, and Enrique Mallen. Visualizing and exploring Picasso's world [short paper]. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. We discuss the preliminary use of a visualization tool called Interactive Timeline Viewer (ItLv) in visualizing and exploring a collection of art works by Pablo Ruiz Picasso. Our data set is composed of a subset of the Online Picasso Project, a significantly-sized on-line art repository of the renowned Spanish artist. We also include a brief discussion about how this visualization tool can help art scholars to study and analyze an artist's life and works.
	Raymond J. Mooney. Content-based book recommending using learning for text categorization. In Proceedings of the Fifth ACM International Conference on Digital Libraries, 2000. Recommender systems improve access to relevant products and information by making personalized suggestions based on previous examples of a user's likes and dislikes. Most existing recommender systems use collaborative filtering methods that base recommendations on other users' preferences. By contrast, content-based methods use information about an item itself to make suggestions. This approach has the advantage of being able to recommend previously unrated items to users with unique interests and to provide explanations for its recommendations. We describe a content-based book recommending system that utilizes information extraction and a machine-learning algorithm for text categorization. Initial experimental results demonstrate that this approach can produce accurate recommendations.
	T. P. Moran, P. Chiu, W. van Melle, and G. Kurtenbach. Implicit structures for pen-based systems within a freeform interaction paradigm. In Human Factors in Computing Systems. CHI'95 Conference Proceedings, pages 487-94, 1995. Presents a scheme for extending an informal, pen-based whiteboard system (Tivoli on the Xerox LiveBoard) to provide a structured editing capability without violating its free expression and ease of use. The scheme supports list, text, table and outline structures over handwritten scribbles and typed text. The scheme is based on the system temporarily perceiving the implicit structure that humans see in the material, which is called a WYPIWYG (what you perceive is what you get) capability. The design techniques, principles, trade-offs and limitations of the scheme are discussed. A notion of ``freeform interaction'' is proposed to position the system with respect to current user interface techniques.
	Neema Moraveji. Improving video browsing with an eye-tracking evaluation of feature-based color bars. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. This paper explains a method for leveraging the standard video timeline widget as an interactive visualization of image features. An eye-tracking experiment is described with results that indicate that such a widget increases task efficiency without increasing complexity while being easily learned by experiment participants.
	M. Morita and Y. Shinoda. Information filtering based on user behavior analysis and best match text retrieval. In Proceedings of the Seventeenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New York, 1994. ACM. Information filtering systems have potential power that may provide an efficient means of navigating through a large and diverse data space. However, current information filtering technology heavily depends on a user's active participation for describing his interest in information items, forcing him to accept an extra load to overcome the already loaded situation. Furthermore, because the user's interests are often expressed in a discrete format, such as a set of keywords, sometimes augmented with if-then rules, it is difficult to express ambiguous interests, which users often want to do. We propose a technique that uses user behavior monitoring to transparently capture the user's information interests, and a technique to use these interests to filter incoming information in a very efficient way. A field experiment and a series of simulations verify that the proposed techniques perform very well.
	M. R. Morris, Andreas Paepcke, and Terry Winograd. TeamTag: Exploring centralized versus replicated controls for co-located groupware. Technical report, Stanford University, 2005.
	M. R. Morris, Andreas Paepcke, and Terry Winograd. TeamSearch: Comparing techniques for co-present collaborative search of digital media. In The First IEEE International Workshop on Horizontal Interactive Human-Computer Systems (TableTop2006), 2006. Available at http://dbpubs.stanford.edu/pub/2006-1. Interactive tables can enhance small-group co-located collaborative work in many domains. One application enabled by this new technology is co-present, collaborative search for digital content. For example, a group of students could sit around an interactive table and search for digital images to use in a report. We have developed TeamSearch, an application that enables this type of activity by supporting group specification of Boolean-style queries. We explore whether TeamSearch should consider all group members' activities as contributing to a single query or should interpret them as separate, parallel search requests. The results reveal that both strategies are similarly efficient, but that collective query formation has advantages in terms of enhancing group collaboration and awareness, allowing users to bootstrap query-specification skills, and personal preference. This suggests that team-centric UIs may offer benefits beyond the staples of efficiency and result quality that are usually considered when designing search interfaces.
	Meredith Ringel Morris, Anqi Huang, Andreas Paepcke, and Terry Winograd. Cooperative gestures: Multi-user gestural interactions for co-located groupware. Submitted for publication, 2005. Multi-user, touch-sensing input devices [7] create opportunities for the use of cooperative gestures: multi-user gestural interactions for single display groupware. Cooperative gestures are interactions where the system interprets the gestures of more than one user as contributing to a single, combined command. Cooperative gestures can be used to enhance users' sense of teamwork, increase awareness of important system events, facilitate reachability and access control on large, shared displays, or add a unique touch to an entertainment-oriented activity. This paper discusses motivating scenarios for the use of cooperative gesturing and describes some initial experiences with CollabDraw, a system for collaborative art and photo manipulation. We identify design issues relevant to cooperative gesturing interfaces, and present a preliminary design framework. We conclude by identifying directions for future research on cooperative gesturing interaction techniques.
	Scott Morris, Alan Morris, and Kobus Barnard. Digital trail libraries. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. We propose the idea of an online, user-submitted digital library of recreation trails. Digital libraries of trails offer advantages over paper guidebooks in that they are more accurate, dynamic and not limited to the experience of the author(s). The basic representation of a trail is a GPS track log, recorded as recreators travel on trails. As users complete trips, the GPS track logs of their trips are submitted to the central library voluntarily. A major problem is that track logs will overlap and intersect each other. We present a method for the combination of overlapping and intersecting GPS track logs into a network of GPS trails. Each trail segment in the network can then be attributed by automatic and manual means, producing a digital library of trails. We also describe the TopoFusion system, which creates, manages and visualizes GPS data, including GPS networks.
	Yueyu Fu, Javed Mostafa, and Kazuhiro Seki. Protein association discovery in biomedical literature [short paper]. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. Protein association discovery can directly contribute toward developing protein pathways; hence it is a significant problem in bioinformatics. LUCAS (Library of User-Oriented Concepts for Access Services) was designed to automatically extract and determine associations among proteins from biomedical literature. Such a tool has notable potential to automate database construction in biomedicine, instead of relying on experts' analysis. This paper reports on the mechanisms for automatically generating clusters of proteins. A formal evaluation of the system, based on a subset of 2000 MEDLINE titles and abstracts, has been conducted against the Swiss-Prot database, in which the associations among concepts are entered manually by experts.
	Rajeev Motwani and Prabhakar Raghavan. Randomized Algorithms. Cambridge University Press, 1995.
	A. Moukas and P. Maes. Amalthaea: An evolving information filtering and discovery system for the WWW. Journal of Autonomous Agents and Multi-Agent Systems, 1998. Amalthaea is an evolving, multiagent ecosystem for personalized filtering, discovery and monitoring of information sites. Amalthaea's primary application domain is the World-Wide-Web and its main purpose is to assist its users in finding interesting information. Two different categories of agents are introduced in the system: filtering agents that model and monitor the interests of the user and discovery agents that model the information sources. A market-like ecosystem where the agents evolve, compete and collaborate is presented: agents that are useful to the user or other agents reproduce while low-performing agents are destroyed. Results from various experiments with different system configurations and varying ratios of user interests versus agents in the system are presented. Finally, issues like fine-tuning the initial parameters of the system and establishing and maintaining equilibria in the ecosystem are discussed.
	Steven Moyer. N.O.D.E.S. report. WWW, 1996. There is an explosion of information QUANTITY without the capacity to identify the QUALITY of that information. NODES is a planned software system which helps you identify the people and information which fit your needs. It does this by keeping records of which pieces of information on the network are valuable to you, based on your judgments, and making correlations with other users. A public domain database of ratings is essential for this concept to work. Projections are made on a daily basis for each user, based on what other users who share the same interests rate as ``good'' or ``excellent.'' Each projection can itself be rated. Therefore, over time NODES becomes better and better at finding quality information for you. NODES can be directly configured to operate the way you wish. You can also answer any number of questionnaires, interest surveys, and personality tests to build the database which NODES works with.
	Xiangming Mu, Gary Marchionini, and Amy Pattee. The interactive shared educational environment: User interface, system architecture and field study. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. The user interface and system architecture of a novel Interactive Shared Educational Environment (ISEE) are presented. Based on a lightweight infrastructure, ISEE enables users on relatively low-bandwidth networks to share videos as well as text messages. Smartlink is a new concept introduced in this paper. Individual information presentation components, like the video player and text chat room, are smartly linked together through video timestamps and hyperlinks. A field study related to children's book selections using ISEE was conducted. The results indicated that the combination of three information presentation components, including video player with storyboard, shared browser, and text chat room, provided an effective and more comfortable collaboration and learning environment for the given tasks than text reviews or text chat alone or in combination. The video player was the most preferred information component. Text in the chat room that was not synchronized with the video content distracted some participants due to limited cognitive capacity. Using smartlink to synchronize various information components or channels is our attempt to reduce the user's working memory load in information-enriched distance learning environments made possible by digital libraries.
	Adrienne Muir. Legal deposit of digital publications: A review of research and development activity. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001. There is a global trend towards extending legal deposit to include digital publications in order to maintain comprehensive national archives. However, including digital publications in legal deposit regulation is not enough to ensure the long-term preservation of these publications. Concepts, principles and practices accepted and understood in the print environment may have new meanings or no longer be appropriate in a networked environment. Mechanisms for identifying, selecting and depositing digital material either do not exist, or are inappropriate, for some kinds of digital publication. Work on developing digital preservation strategies is at an early stage. National and other deposit libraries are at the forefront of research and development in this area, often working in partnership with other libraries, publishers and technology vendors. Most work is of a technical nature. There is some work on developing policies and strategies for managing digital resources. However, not all management issues or users' needs are being addressed.
	Kazunori Muraki and Kenji Satoh. Penstation: Easy access to relevant facts without retrieving. In Proceedings of the Second Annual Conference on the Theory and Practice of Digital Libraries, 1995. Format: PostScript Document. Audience: Digital library researchers, esp. HCI & natural language. References: 16. Links: 0. Relevance: Low-medium. Abstract: Describes a work-group based document editing facility which provides for ``episodic indexing'', allowing search & retrieval based on editing events (such as previous cutting & pasting, searching, etc.). Also notifies an author when someone from the workgroup cites his work. Finally, there is a feature which does predictive searching: as you type, it does approximate parsing (in Japanese) and searches for information related to what you're typing.
	Lisa D. Murphy. Information product evaluation as asynchronous communication in context: A model for organizational research. In Proceedings of DL'96, 1996. Format: Not yet online.
	Ray L. Murray. Toward a metadata standard for digitized historical newspapers. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. This paper is a case study of metadata development in the early stages of the National Digital Newspaper Program, a twenty-year digital initiative to expand access to historical newspapers in support of research and education. Some of the issues involved in newspaper metadata are examined, and a new XML-based standard is described that is suited to the large volume of data, while remaining flexible into the future.
	B. Myers, H. Stiel, and R. Gargiulo. Collaboration using multiple PDAs connected to a PC. In Proceedings of the Conference on Computer-Supported Cooperative Work, CSCW'98, pages 285-294, 1998.
	Brad A. Myers, Juan P. Casares, Scott Stevens, Laura Dabbish, Dan Yocum, and Albert Corbett. A multi-view intelligent editor for digital video libraries. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001. Silver is an authoring tool that aims to allow novice users to edit digital video. The goal is to make editing of digital video as easy as text editing. Silver provides multiple coordinated views, including project, source, outline, subject, storyboard, textual transcript and timeline views. Selections and edits in any view are synchronized with all other views. A variety of recognition algorithms are applied to the video and audio content and then are used to aid in the editing tasks. The Informedia Digital Library supplies the recognition algorithms and metadata used to support intelligent editing, and Informedia also provides search and a repository. The metadata includes shot boundaries and a time-synchronized transcript, which are used to support intelligent selection and intelligent cut/copy/paste.
	A. Myka and U. Guntzer. Fuzzy full-text searches in OCR databases. In Advances in Digital Libraries '95, 1995. Format: Not Yet Online.
	Barbee T. Mynatt, Laura Marie Leventhal, Keith Instone, John Farhat, and Diane S. Rohlman. Hypertext or book: Which is better for answering questions? In Proceedings of the Conference on Human Factors in Computing Systems CHI'92, 1992.
	Jin-Cheon Na, Christopher S.G. Khoo, Syin Chan, and Norraihan Bte Hamzah. Sentiment-based search in digital libraries. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. Several researchers have developed tools for classifying/clustering Web search results into different topic areas, and to help users identify relevant results quickly in the area of interest. They mainly focused on topical categorization, such as sports, movies, travel, computers, etc. This study is in the area of sentiment classification: automatically classifying on-line review documents according to the overall sentiment expressed in them. A challenging aspect is that while topics are often identifiable by keywords alone, sentiment can be expressed in a more subtle manner, which means sentiment requires more natural language understanding than the usual topic-based classification. A prototype system has been developed to assist users to quickly focus on recommended (or non-recommended) information by automatically classifying Web search results into four categories: positive, negative, neutral, and non-review documents, by using an automatic classifier based on a supervised machine learning algorithm, Support Vector Machine (SVM).
	Mor Naaman, Hector Garcia-Molina, and Andreas Paepcke. Evaluation of delivery techniques for dynamic web content (extended version). Technical report, Stanford University, June 2002. Available at http://dbpubs.stanford.edu/pub/2002-31.
	Mor Naaman, Hector Garcia-Molina, and Andreas Paepcke. Evaluation of delivery techniques for dynamic web content. Technical Report 2003-7, Stanford University, 2003. Available at http://dbpubs.stanford.edu/pub/2003-7.
	Mor Naaman, Hector Garcia-Molina, and Andreas Paepcke. Evaluation of ESI and class-based delta encoding. In 8th International Workshop on Web Content Caching and Distribution (IWCW 2003), 2003. Available at http://dbpubs.stanford.edu/pub/2003-61.
	Mor Naaman, Hector Garcia-Molina, Andreas Paepcke, and Ron B. Yeh. Leveraging context to resolve identity in photo albums. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, pages 178-187, New York, NY, USA, 2005. ACM Press. Available at http://dbpubs.stanford.edu/pub/2005-10. Our system suggests likely identity labels for photographs in a personal photo collection. Instead of using face recognition techniques, the system leverages automatically-available context, like the time and location where the photos were taken. Based on time and location, the system automatically computes event and location groupings of photos. As the user annotates some of the identities of people in their collection, patterns of re-occurrence and co-occurrence of different people in different locations and events emerge. The system uses these patterns to generate label suggestions for identities that were not yet annotated. These suggestions can greatly accelerate the process of manual annotation. We obtained ground-truth identity annotation for four different photo albums, and used them to test our system. The system proved effective, making very accurate label suggestions, even when the number of suggestions for each photo was limited to five names, and even when only a small subset of the photos was annotated.
	Mor Naaman, Susumu Harada, QianYing Wang, Hector Garcia-Molina, and Andreas Paepcke. Context data in geo-referenced digital photo collections. In Proceedings of the 12th International Conference on Multimedia (MM2004). ACM Press, October 2004.
	Mor Naaman, Susumu Harada, QianYing Wang, and Andreas Paepcke. Adventures in space and time: Browsing personal collections of geo-referenced digital photographs. Technical report, Stanford University, April 2004. Available at http://dbpubs.stanford.edu/pub/2004-26.
	Mor Naaman, Andreas Paepcke, and Hector Garcia-Molina. From where to what: Metadata sharing for digital photographs with geographic coordinates. In 10th International Conference on Cooperative Information Systems (CoopIS), 2003.
	Mor Naaman, Yee Jiun Song, Andreas Paepcke, and Hector Garcia-Molina. Automatic organization for digital photographs with geographic coordinates. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. We describe PhotoCompas, a system that utilizes the time and location information embedded in digital photographs to automatically organize a personal photo collection. PhotoCompas produces browseable location and event hierarchies for the collection. This organization is created using algorithms that interleave time and location to produce an organization that mimics the way people think about their photo collections. In addition, our algorithm annotates the generated hierarchy with geographical names. We tested our approach on several real-world collections and verified that the results are meaningful and useful for the collection owners.
	Mor Naaman, Yee Jiun Song, Andreas Paepcke, and Hector Garcia-Molina. Automatically generating metadata for digital photographs with geographic coordinates. In Proceedings of the Thirteenth International World-Wide Web Conference, 2004.
	Marc Nanard and Jocelyne Nanard. Cumulating and sharing end users knowledge to improve video indexing in a video digital library. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001. In this paper, we focus on a user-driven approach to improve video indexing. It consists of cumulating the large amount of small, individual efforts made by the users who access information, and providing a community management mechanism to let users share the elicited knowledge. This technique is currently being developed in the ``OPALES'' environment and tuned up at the ``Institut National de l'Audiovisuel'' (INA), a National Video Library in Paris, to increase the value of its patrimonial video archive collections. It relies on a portal providing private workspaces to end users, so that a large part of their work can be shared between them. The effort of interpreting documents is made directly by the expert users who work on the archives for their own jobs. OPALES provides an original notion of ``point of view'' to enable the elicitation and the sharing of knowledge between communities of users, without leading to messy structures. The overall result consists in linking exportable private metadata to archive documents and managing the sharing of the elicited knowledge between user communities.
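	Shiva Narayanan and Hector Garcia-Molina. Computing iceberg queries efficiently. Technical report, Stanford University, 1998. Iceberg queries compute an aggregate (such as a count) over the groups of a large set and return only the small number of groups whose aggregate exceeds a threshold: the ``tip of the iceberg.''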
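For orientation, a minimal Python sketch of what an iceberg query returns, assuming the common count-based form (in SQL terms, GROUP BY target HAVING COUNT(*) >= T). The function name iceberg and the sample data are hypothetical. This is only a naive exact baseline that counts every distinct value in one pass; it is not the efficient technique the report is concerned with, whose point is to avoid keeping a counter for every value.

```python
from collections import Counter
from typing import Hashable, Iterable


def iceberg(items: Iterable[Hashable], threshold: int) -> dict:
    """Return {value: count} for values occurring at least `threshold` times.

    Hypothetical naive baseline: one pass with an exact hash-based counter.
    Memory grows with the number of distinct values, which is exactly the
    cost efficient iceberg algorithms try to avoid.
    """
    counts = Counter(items)
    # Keep only the "tip of the iceberg": groups at or above the threshold.
    return {value: n for value, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    purchases = ["milk", "bread", "milk", "eggs", "milk", "bread"]
    print(iceberg(purchases, threshold=2))
```

With threshold 2, only the two frequent values survive; "eggs" is the submerged part of the iceberg.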
	Chandra Narayanaswami and M. T. Raghunath. Expanding the reach of the digital camera. IEEE Computer, 37(12):65-73, December 2004.
	S. Narayanaswamy, S. Seshan, E. Amir, E. Brewer, R. W. Brodersen, F. Burghardt, A. Burstein, Yuan-Chi Chang, A. Fox, J. M. Gilbert, R. Han, R. H. Katz, A. C. Long, D. G. Messerschmitt, and J. M. Rabaey. A low-power, lightweight unit to provide ubiquitous information access application and network support for InfoPad. IEEE Personal Communications, 3(2):4-17, Apr 1996. Some of the most important trends in computer systems are the emerging use of multimedia Internet services, the popularity of portable computing, and the development of wireless data communications. The primary goal of the InfoPad project is to combine these trends to create a system that provides ubiquitous information access. The system is built around a low-power, lightweight wireless multimedia terminal that operates in indoor environments and supports a high density of users. The InfoPad system uses a number of innovative techniques to provide the high-bandwidth connectivity, portability, and user interface needed for this environment. The article describes the design, implementation, and evaluation of the software network and application services that support the InfoPad terminal. Special applications, type servers, and recognizers are developed for the InfoPad system. This software is designed to take advantage of the multimedia capabilities of the portable terminal and the additional computational resources available on the servers. The InfoNet system provides low-latency, high-bandwidth connectivity between the computation and the portable terminal. It also provides the routing and handoff support that allows users to roam freely. The performance measurements of the system show that this design is a viable alternative, especially in the indoor environment.
	Bonnie A. Nardi and Vicki L. O'Day. Intelligent agents: What we learned in the library. Libri, 46(2), 1996.
	David A. Nation, Catherine Plaisant, Gary Marchionini, and Anita Komlodi. Visualizing web sites using a hierarchical table of contents browser: WebTOC. In Proceedings of the Third Conference on Human Factors and the Web, 1997.
	National Information Standards Organization. Z39.58-1992 Common Command Language for Online Interactive Information Retrieval. NISO Press, Bethesda, Md., 1993.
	National Information Standards Organization. Information Retrieval (Z39.50): Application Service Definition and Protocol Specification (ANSI/NISO Z39.50-1995). NISO Press, Bethesda, Md., 1995. Accessible at ``http://lcweb.loc.gov/z3950/agency/''.
	Gonzalo Navarro and Ricardo Baeza-Yates. A language for queries on structure and contents of textual databases. In Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 93-101, Seattle, Wash., July 1995. ACM Press, New York.
	Douglas D. Nebert and James Fullton. Use of the Isite Z39.50 software to search and retrieve spatially-referenced data. In Proceedings of the Second Annual Conference on the Theory and Practice of Digital Libraries, 1995. Format: HTML Document (38K + pictures). Audience: Technical, Geographic Information Systems. References: 7. Links: 11. Relevance: Low. Abstract: Discusses the problems of geographic data (tough to index using text keywords), and an approach based on using fields of a Z39.50 database corresponding to bounding rectangle coordinates. By simple comparisons of these coordinates, a document can be judged to be contained within, overlapping, or disjoint with a target area. Describes a prototype, and debates the merits of a server-based or client-based approach to supporting geographic data.
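The ``simple comparisons'' in the annotation above can be illustrated with a short Python sketch. It is not taken from the Isite software: the BBox class, the relate function, and the field names are hypothetical, and it assumes axis-aligned rectangles in longitude/latitude that do not cross the antimeridian, with shared edges counted as overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Hypothetical axis-aligned bounding rectangle (degrees)."""
    west: float
    south: float
    east: float
    north: float


def relate(doc: BBox, target: BBox) -> str:
    """Classify a document's rectangle against a target search area."""
    # Disjoint: the rectangles are separated along at least one axis.
    if (doc.east < target.west or doc.west > target.east
            or doc.north < target.south or doc.south > target.north):
        return "disjoint"
    # Contained: every edge of the document lies within the target.
    if (doc.west >= target.west and doc.east <= target.east
            and doc.south >= target.south and doc.north <= target.north):
        return "contained"
    # Otherwise the rectangles intersect without full containment.
    return "overlapping"


if __name__ == "__main__":
    area = BBox(west=-125.0, south=32.0, east=-114.0, north=42.0)
    print(relate(BBox(-122.0, 37.0, -121.0, 38.0), area))  # contained
    print(relate(BBox(-130.0, 30.0, -120.0, 35.0), area))  # overlapping
    print(relate(BBox(0.0, 0.0, 10.0, 10.0), area))        # disjoint
```

Checking disjointness first keeps each test a handful of coordinate comparisons, which is what makes the approach practical on ordinary numeric fields of a database record.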
	A. E. Negus. Development of the Euronet-Diane Common Command Language. In Proceedings 3rd Int'l Online Information Meeting, pages 95-98. Learned Information, Oxford, U.K., 1979.
	Theodor Holm Nelson. A publishing and royalty model for networked documents. In IP Workshop Proceedings, 1994. Format: HTML Document (6K). Audience: Authors, publishers, users. References: 1. Links: 0. Relevance: Low-medium. Abstract: Describes the ideas behind Xanadu. Works are available at a pay-per-byte rate set by the publisher. Embedding of documents (called transclusion) results in the original author being paid.
	Raja Neogi and Arindam Saha. High performance IDCT-based video decompression algorithm and architecture. In DAGS '95, 1995. Format: Not Yet Online; not included in the proceedings.
	B.C. Neuman and G. Medvinsky. Requirements for network payment: The NetCheque perspective. In Proceedings of IEEE COMPCON, Mar 1995. Secure methods of payment are needed before we will see widespread commercial use of the Internet. Recently proposed and implemented payment methods follow one of three models: electronic currency, credit-debit, and secure credit card transactions. Such payment services have different strengths and weaknesses with respect to the requirements of security, reliability, scalability, anonymity, acceptability, customer base, flexibility, convertibility, efficiency, ease of integration with applications, and ease of use. NetCheque is a payment system based on the credit-debit model. NetCheque is described and its strengths with respect to these requirements are discussed.
	Christine M. Neuwirth et al. Computer support for distributed collaborative writing: Defining parameters of interaction. In Richard Furuta and Christine Neuwirth, editors, CSCW '94, New York, 1994. ACM. This paper reports research to define a set of interaction parameters that collaborative writers will find useful. Our approach is to provide parameters of interaction and to locate the decision of how to set the parameters with the users. What is new is the progress we have made outlining task management parameters, notification, scenarios of use, as well as some implementation architectures.
	Fernando A. Das Neves and Edward A. Fox. A study of user behavior in an immersive virtual environment for digital libraries. In Proceedings of the Fifth ACM International Conference on Digital Libraries, 2000. In this paper we present a 2x3 factorial design study evaluating the limits and differences in the behavior of 10 users when searching in a virtual reality representation that mimics the arrangement of a traditional library. The focus of this study was the effect of clustering techniques and query highlighting on the search strategies users develop in the virtual environment, and whether position or spatial arrangement influenced user behavior. We found several particularities that can be attributed to the differences in the VR environment. This study's results identify: 1) the need for co-designing both spatial arrangement and interaction method; 2) a difficulty novice users faced when using clusters to identify common topics; 3) the influence of position and distance on users' selection of collection items to inspect; and 4) that users did not search until they found the best match, but only until they found a satisfactory match.
	Gregory B. Newby and Charles Franks. Distributed proofreading [short paper]. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. Distributed proofreading allows many people working individually across the Internet to contribute to the proofreading of a new electronic book. This paper describes Project Gutenberg's Distributed Proofreading project, along with our general procedures for creating an electronic book from a physical book. Distributed proofreading has promise for the future of Project Gutenberg, and is likely to be a useful strategy for other digital library projects.
	Paula S. Newman. Exploring discussion lists: Steps and directions. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. This paper describes some new facilities for exploring archived email-based discussion lists. The facilities exploit some specific properties of email messages to obtain improved archive overviews, and then use new tree visualizations, developed for the purpose, to obtain thread overviews and mechanisms to aid in the coherent reading of threads. We consider these approaches to be limited, but useful, approximations to more ideal facilities; a final section suggests directions for further work in this area.
	Vo Ngoc Anh and Alistair Moffat. Compressed inverted files with reduced decoding overheads. In Proceedings of the Twenty-First Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 290-297, August 1998.
	Daniel Siaw Weng Ngu and Xindong Wu. SiteHelper: A localized agent that helps incremental exploration of the World Wide Web. In Proceedings of the Sixth International World-Wide Web Conference, 1997.
	A. Nica and E. A. Rundensteiner. Uniform structured document handling using a constraint-based object approach. In Advances in Digital Libraries '95, 1995. Format: Not Yet Online.
	Oscar Nierstrasz. Composing active objects. In G. Agha, P. Wegner, and A. Yonezawa, editors, Research Directions in Concurrent Object-Oriented Programming, pages 151-171. MIT Press, 1993.
	Mark H. Nodine. Language learning via the world wide web. In DAGS '95, 1995. Format: HTML Document (29K + pictures). Audience: People interested in learning Welsh, serving a multimedia course. References: 8. Links: 9. Abstract: Describes a web-based self-paced Welsh language course. Lessons written in setext, mailed in ASCII or converted to HTML. Conversion process automated to add index, etc using Perl scripts.
	Cathie Norris, Elliot Soloway, and June M. Abbas. Middle school children's use of the Artemis digital library. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. Middle school students' interaction within a digital library is explored. Research into the differential use of interface features by students is presented. Issues of representation and retrieval obstacles are examined. A mechanism for evaluating users' search terms and questions is explained. Findings of current research indicate that students' interaction with the system varied between individual classes and between different achievement levels. Terms used by the system to represent the resources do not adequately represent the user groups' information needs.
	Chris North, Ben Shneiderman, and Catherine Plaisant. User controlled overviews of an image library: A case study of the visible human. In Proceedings of DL'96, 1996. Format: Not yet online.
	David Notkin, Norman Hutchinson, Jan Sanislo, and Michael Schwartz. Heterogeneous computing environments: Report on the ACM SIGOPS workshop on accommodating heterogeneity. Communications of the ACM, 30(2):24-32, February 1987. This paper reports a workshop conducted in December 1985 as a forum for an international group of fifty researchers to discuss the technical issues surrounding heterogeneous computing environments. In particular, it discusses five basic topics of heterogeneity: interconnection, filing, authentication, naming, and user interfaces.
	Alexandros Ntoulas, Petros Zerfos, and Junghoo Cho. Downloading textual hidden web content through keyword queries. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. An ever-increasing amount of information on the Web today is available only through search interfaces: the users have to type in a set of keywords in a search form in order to access the pages from certain Web sites. These pages are often referred to as the Hidden Web or the Deep Web. Since there are no static links to the Hidden Web pages, search engines cannot discover and index such pages and thus do not return them in the results. However, according to recent studies, the content provided by many Hidden Web sites is often of very high quality and can be extremely valuable to many users. In this paper, we study how we can build an effective Hidden Web crawler that can autonomously discover and download pages from the Hidden Web. Since the only ``entry point'' to a Hidden Web site is a query interface, the main challenge that a Hidden Web crawler has to face is how to automatically generate meaningful queries to issue to the site. Here, we provide a theoretical framework to investigate the query generation problem for the Hidden Web and we propose effective policies for generating queries automatically. Our policies proceed iteratively, issuing a different query in every iteration. We experimentally evaluate the effectiveness of these policies on 4 real Hidden Web sites and our results are very promising. For instance, in one experiment, one of our policies downloaded more than 90% of a Hidden Web site (that contains 14 million documents) after issuing fewer than 100 queries.
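The iterative query policy described in the abstract above can be illustrated with a small Python sketch. It assumes an idealized crawler that already knows which document ids each keyword returns (the hypothetical `index` dictionary) and greedily issues the unused keyword that adds the most new documents. The paper's own policies must estimate these counts from the pages downloaded so far, so this is an upper-bound illustration under stated assumptions, not the authors' algorithm.

```python
from typing import Dict, List, Set, Tuple


def greedy_crawl(index: Dict[str, Set[int]], max_queries: int) -> Tuple[List[str], Set[int]]:
    """Issue one keyword query per iteration, choosing the unused keyword
    that returns the most not-yet-downloaded documents.

    Assumption (hypothetical): `index` maps each keyword to the ids of the
    documents the site would return for it. A real Hidden Web crawler does
    not know this in advance and has to estimate it."""
    downloaded: Set[int] = set()
    issued: List[str] = []
    for _ in range(max_queries):
        candidates = [term for term in index if term not in issued]
        if not candidates:
            break
        # Greedy step: pick the query that adds the most new documents.
        best = max(candidates, key=lambda term: len(index[term] - downloaded))
        if not index[best] - downloaded:
            break  # every remaining query only returns pages we already have
        issued.append(best)
        downloaded |= index[best]
    return issued, downloaded


site = {"web": {1, 2, 3}, "crawler": {3, 4}, "library": {5}, "search": {1, 2}}
queries, pages = greedy_crawl(site, max_queries=10)
print(queries, sorted(pages))  # ['web', 'crawler', 'library'] [1, 2, 3, 4, 5]
```

Ties are broken by dictionary order, and `max_queries` stands in for the crawler's query budget.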
	Peter J. Nuernberg, Richard Furuta, John J. Leggett, Catherine C. Marshall, and Frank M. Shipman III. Digital libraries: Issues and architectures. In Proceedings of the Second Annual Conference on the Theory and Practice of Digital Libraries, 1995. Format: HTML Document (33K + pictures). Audience: Computer scientists, somewhat technical. References: 13. Links: 1. Relevance: Low-medium. Abstract: Argues that digital libraries are an intersection of a number of different fields, with a research agenda distinct from any of them. Divides areas of study into objects, meta-objects, and processes, three categories that may apply either to translations of traditional library things into digital library things or to exclusively new digital library things. Describes a client/server architecture, and shows how it is instantiated for the problem of personalizing information display in a web-based environment.
	Tim Oates, M.V. Nagendra Prasad, Victor R. Lesser, and Keith Decker. A distributed problem solving approach to cooperative information gathering. In AAAI Spring Symposium on Information Gathering, 1995. Format: Compressed PostScript.
	Object Management Group. The Common Object Request Broker: Architecture and specification. Accessible at `ftp://omg.org/pub/CORBA`, Dec 1993.
	Virginia E. Ogle and Michael Stonebraker. Chabot: Retrieval from a relational database of images. Computer, 28(9):40-48, 1995.
	Kazuho Oku. Palmscape. Palmscape website: http://palmscape.ilinx.co.jp/.
	Frank Oldenettel and Michael Malachinski. Integrating digital libraries into learning environments: The LEBONED approach. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. This paper presents the project LEBONED that focuses on the integration of digital libraries and their contents into web-based learning environments. We describe in general how the architecture of a standard learning management system has to be modified to enable the integration of digital libraries. An important part of this modification is the 'LEBONED Metadata Architecture' which depicts the handling of metadata and documents imported from digital libraries. The main components of this architecture and their interrelation are presented in detail. Afterwards we show a practical application of the concepts described before: The integration of the digital library 'eVerlage' into the learning environment 'Blackboard'.
	T.W. Olle. Impact of standardization work on the future of information technology. In Nobuyoshi Terashima and Edward Altman, editors, IFIP World Conference on IT Tools, pages 97-105. Chapman & Hall, September 1996. This paper presents the way in which international standards for information technology are organized, and the driving forces behind such standards. The paper discusses the criteria for success of IT standards and identifies shortcomings in the current approach to standardization that need to be rectified to enable complete interoperability in the future.
	M. Olson, K. Bostic, and M. Seltzer. Berkeley DB. In Proc. of the 1999 Summer Usenix Technical Conf., June 1999.
	Erik Oltmans, Raymond J. van Diessen, and Hilde van Wijngaarden. Preservation functionality in a digital archive. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. In early 2003, the digital archiving system of the National Library of the Netherlands (KB) was put into production. This system is called the e-Depot, and its technical heart is the IBM system called Digital Information Archiving System (DIAS). The e-Depot is built according to the recommendations in the OAIS reference model and is dedicated to the long-term storage of, and access to, large quantities of digital publications. To control safe storage and provide for future rendering of the digital documents, extra functionality was needed. Therefore, at the same time the system was put into production, a joint KB/IBM project group began the design, development, and implementation of the Preservation Manager. This system provides the functionality for monitoring the technical environment needed to render the electronic resources stored in DIAS. In this paper we present the design of the Preservation Manager, its rationale, and the way it is used within the operational digital archiving environment of the KB e-Depot.
	Byung-Won On, Dongwon Lee, Jaewoo Kang, and Prasenjit Mitra. Comparative study of name disambiguation problem using a scalable blocking-based framework. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. In this paper, we consider the problem of ambiguous author names in bibliographic citations, and comparatively study alternative approaches to identify name variants (e.g., ``Vannevar Bush'' and ``V. Vush''). We use a two-step framework, in which step 1 substantially reduces the number of candidates via blocking, and step 2 measures the distance between two names via coauthor information. Combining four blocking methods and seven distance measures on four data sets, we present extensive experimental results and identify a few combinations that are scalable and effective.
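The two-step framework in the entry above can be sketched in Python. This is a minimal illustration under assumptions: the blocking key (first letter of the name) and the distance measure (Jaccard similarity of coauthor sets) are hypothetical choices for the example and are not claimed to be among the paper's four blocking methods or seven distance measures. The coauthor labels "A", "B", and so on are placeholders.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple


def blocking_key(name: str) -> str:
    # Hypothetical blocking key: the lowercased first character of the name.
    return name.strip()[:1].lower()


def jaccard(a: Set[str], b: Set[str]) -> float:
    # Jaccard similarity of two coauthor sets (0.0 when both are empty).
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def candidate_variants(
    coauthors: Dict[str, Set[str]], threshold: float
) -> List[Tuple[str, str, float]]:
    # Step 1: blocking, so only names that share a key are ever compared.
    blocks: Dict[str, List[str]] = defaultdict(list)
    for name in coauthors:
        blocks[blocking_key(name)].append(name)
    # Step 2: within each block, score name pairs by coauthor overlap.
    matches: List[Tuple[str, str, float]] = []
    for names in blocks.values():
        for a, b in combinations(sorted(names), 2):
            score = jaccard(coauthors[a], coauthors[b])
            if score >= threshold:
                matches.append((a, b, score))
    return matches


matches = candidate_variants(
    {
        "Vannevar Bush": {"A", "B", "C"},
        "V. Vush": {"A", "B"},
        "Victor Lesser": {"D"},
        "Hector Garcia-Molina": {"A"},
    },
    threshold=0.5,
)
print(matches)  # one pair: ('V. Vush', 'Vannevar Bush') with score 2/3
```

Blocking keeps the number of pairwise comparisons small: "Hector Garcia-Molina" lands in its own block and is never compared with the three "v" names.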
	Open directory. http://www.dmoz.org.
	Joann J. Ordille. Information gathering and distribution in Nomenclator. In AAAI Spring Symposium on Information Gathering, 1995. Format: Compressed PostScript.
	Jason Orendorf and Charles Kacmar. A spatial approach to organizing and locating digital libraries and their content. In Proceedings of DL'96, 1996. Format: Not yet online.
	François Pacull et al. Duplex: A distributed collaborative editing environment in large scale. In Richard Furuta and Christine Neuwirth, editors, CSCW '94, New York, 1994. ACM. DUPLEX is a distributed collaborative editor for users connected through a large-scale environment such as the Internet. Large-scale implies heterogeneity, unpredictable communication delays and failures, and inefficient implementations of techniques traditionally used for collaborative editing in local area networks. To cope with these unfavorable conditions, DUPLEX proposes a model based on splitting the document into independent parts, maintained individually and replicated by a kernel. Users act on document parts and interact with co-authors using a local environment providing a safe store and recovery mechanisms against failures or divergence with co-authors. Communication is reduced to a minimum, allowing disconnected operation. Atomicity, concurrency, and replica control are confined to a manageable small context.
	V. N. Padmanabhan and L. Qiu. The content and access dynamics of a busy web site: Findings and implications. In Proceedings of ACM SIGCOMM, Stockholm, Sweden, 2000.
	Andreas Paepcke. PCLOS: A Flexible Implementation of CLOS Persistence. In S. Gjessing and K. Nygaard, editors, Proceedings of the European Conference on Object-Oriented Programming, Lecture Notes in Computer Science, pages 374-389. Springer Verlag, 1988.
	Andreas Paepcke. PCLOS: A Critical Review. In Proceedings of the Conference on Object-Oriented Programming Systems, Languages and Applications, 1989. Uses PCLOS as a roadmap through issues of object persistence. This replaces paep89b, which is the techreport edition.
	Andreas Paepcke. An object-oriented view onto public, heterogeneous text databases. In Proceedings of the Ninth International Conference on Data Engineering, pages 484-493, Vienna, Austria, April 1993. IEEE Computer Society, Washington, D.C. This describes the model of the interface to Dialog.
	Andreas Paepcke. Information needs in technical work settings and their implications for the design of computer tools. Computer Supported Cooperative Work: The Journal of Collaborative Computing, 5:63-92, 1996.
	Andreas Paepcke. Searching is not enough: What we learned on-site. D-Lib Magazine, May 1996.
	Andreas Paepcke, Michelle Baldonado, Chen-Chuan K. Chang, Steve Cousins, and Hector Garcia-Molina. Using distributed objects to build the Stanford Digital Library InfoBus. IEEE Computer, 32(2):80-87, February 1999. A similar version, titled ``Building the InfoBus'', is available at http://dbpubs.stanford.edu/pub/2000-50. We review selected technical challenges addressed in our digital library project. Our InfoBus, a CORBA-based distributed object infrastructure, unifies access to heterogeneous document collections and information processing services. We organize search access using a protocol (DLIOP) that is tailored for use with distributed objects. A metadata architecture supports novel user interfaces and query translation facilities. We briefly explain these components and then describe how technology choices such as distributed objects, commercial cataloguing schemes and Java, helped and hindered our progress. We also describe the evolution of our design tradeoffs.
	Andreas Paepcke, Robert Brandriff, Greg Janee, Ray Larson, Bertram Ludaescher, Sergey Melnik, and Sriram Raghavan. Search middleware and the simple digital library interoperability protocol. D-Lib Magazine, March 2000. Available at http://dbpubs.stanford.edu/pub/2000-53. We describe our Simple Digital Library Interoperability Protocol (SDLIP), which allows clients to query information sources in a uniform syntax. The protocol was developed in a collaboration between Stanford, the Universities of California at Berkeley, and Santa Barbara, the San Diego Supercomputer Center, and the California Digital Library. In addition to introducing the protocol, we describe several of our design choices, and compare them with the choices made in other search middleware approaches. The protocol allows for both stateful and stateless operation, supports multiple query languages, and defines a simple XML-based return format. A default query language that is included in SDLIP follows the evolving IETF DASL 'basicsearch'. This is an XML-encoded language reminiscent of SQL, but adjusted for use in full-text environments. SDLIP can be used with CORBA or HTTP.
	Andreas Paepcke, Robert Brandriff, Greg Janee, Ray Larson, Bertram Ludaescher, Sergey Melnik, and Sriram Raghavan. Search middleware and the simple digital library interoperability protocol (long version). Technical Report SIDL-WP-2000-0134, Stanford University, February 2000. Available at http://dbpubs.stanford.edu/pub/2000-52. We describe our Simple Digital Library Interoperability Protocol (SDLIP), which allows clients to query information sources in a uniform syntax. The protocol was developed in a collaboration between Stanford, the Universities of California at Berkeley, and Santa Barbara, the San Diego Supercomputer Center, and the California Digital Library. In addition to introducing the protocol, we describe several of our design choices, and compare them with the choices made in other search middleware approaches. The protocol allows for both stateful and stateless operation, supports multiple query languages, and defines a simple XML-based return format. A default query language that is included in SDLIP follows the evolving IETF DASL 'basicsearch'. This is an XML-encoded language reminiscent of SQL, but adjusted for use in full-text environments. SDLIP can be used with CORBA or HTTP.
	Andreas Paepcke, Chen-Chuan K. Chang, Hector Garcia-Molina, and Terry Winograd. Interoperability for digital libraries worldwide. Communications of the ACM, 41(4), April 1998. Accessible at http://dbpubs.stanford.edu/pub/1998-24. Discusses the history and current directions of interoperability in the parts of computing systems that are relevant to digital libraries.
	Andreas Paepcke, Steve B. Cousins, Héctor García-Molina, Scott W. Hassan, Steven K. Ketchpel, Martin Röscheisen, and Terry Winograd. Using distributed objects for digital library interoperability. IEEE Computer Magazine, 29(5):61-68, May 1996. Standard citation for InfoBus
	Andreas Paepcke, Hector Garcia-Molina, Gerard Rodriguez, and Junghoo Cho. Beyond document similarity: Understanding value-based search and browsing technologies. SIGMOD Record, 29(1), March 2000. Available at http://dbpubs.stanford.edu/pub/2000-5. High volumes of diverse documents on the Web are overwhelming search and ranking technologies that are based on document similarity measures. The increase of multimedia data within documents sharply exacerbates the shortcomings of these approaches. Recently, research prototypes and commercial experiments have added techniques that augment similarity-based search and ranking. These techniques rely on judgments about the value of documents. Judgments are obtained directly from users, are derived by conjecture based on observations of user behavior, or are surmised from analyses of documents and collections. All these systems have been pursued independently, and no common understanding of the underlying processes has been presented. We survey existing value-based approaches, develop a reference architecture that helps compare the approaches, and categorize the constituent algorithms. We explain the options for collecting value metadata, and for using that metadata to improve search, ranking of results, and the enhancement of information browsing. Based on our survey and analysis, we then point to several open problems.
	Andreas Paepcke, Hector Garcia-Molina, and Gerard Rodriguez. Collaborative value filtering on the web. In Proceedings of the Seventh International World-Wide Web Conference, 1998. Also published in Computer Networks and ISDN Systems, 30(1-7), April 1998.
	Andreas Paepcke, QianYing Wang, Sheila Patel, Matthew Wang, and Susumu Harada. A cost-effective three-in-one PDA input control. Technical Report 2003-60, Stanford University, August 2003. Available at http://dbpubs.stanford.edu/pub/2003-60. We attach an inexpensive pressure sensor to the side of a PDA and use it as three input devices at once. Users can squeeze the device to provide near-continuous input to applications. At the same time, the drivers interpret a sudden full squeeze as the push of a virtual button. A user's sudden pressure release while squeezing is detected as the push of a second virtual button. We briefly describe our hardware and signal processing techniques. The remainder of the paper describes an experiment that explores whether users can cope cognitively with the 3-in-1 control. We compare it against a three-control setup consisting of a jog wheel and two physical buttons. We show that the 3-in-1 control enables a 13% improvement in time over the three-control setup, but that the 3-in-1 suffers a 4% decrease in the accuracy of users choosing between the two buttons in response to cues from an application. We show that a good choice of application cue is more important for assuring accuracy in the 3-in-1 than in the more traditional set of separate controls. In particular, we examined four types of cues: one abstract, one with clear semantic relevance to the application, one symbolic, and one textual.
	Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Computer Science Department, Stanford University, 1998.
	Casey Palowitch and Darin Stewart. Automating the structural markup process in the conversion of print documents to electronic texts. In Proceedings of the Second Annual Conference on the Theory and Practice of Digital Libraries, 1995. Format: HTML Document (20K). Audience: Librarians, SGML translators. References: 12. Links: 1. Relevance: Low. Abstract: Describes a project for generating an SGML version of an 1849 medical text. Commercial OCR software that also reports geometric position and font size is augmented with Perl to infer section boundaries. Yields a 40 erroneous tags.
	Bing Pan, Geri Gay, John M. Saylor, Helene Hembrooke, and David Henderson. Usability, learning, and subjective experience: User evaluation of K-MODDL in an undergraduate class. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. This paper describes an evaluation effort of the use of the Kinematic Model for Design Digital Library (K-MODDL) in an undergraduate mathematics class. Based on activity theory and the CIAO! framework, multiple methods were used to evaluate students' learning and subjective experience, as well as usability problems they encountered while using various physical and digital objects. The research confirmed the value of K-MODDL and the usefulness of the digital objects in facilitating learning and revealed interesting relationships among usability, learning, and subjective experience.
	Georgia Panagopoulou, Spiros Sirmakessis, and Athanasios Tsakalidis. Ph model: A persistent approach to versioning in hypertext systems. In DAGS '95, 1995. Format: Not Yet On-line. Audience: Hypertext users and developers. References: 20. Relevance: Medium-High. Abstract: Presents an approach for full persistence in hypertext systems. Not only are old versions kept around, but you can also start modifications from any version. Works by keeping information about each link: the version number it was associated with, whether it has been updated, access rights, and whether it is part of an aggregation. There is also a version tree which shows which versions result from modifications of other versions. Then, based on the version you are starting with and the preorder traversal of the version tree, the system determines which links are ``current'' for you. Some of the analysis (e.g., O(1) worst case per access step) seems suspect (it seems like you need to traverse the version tree), but the ideas are interesting.
	Ashwini Pande, Malini Kothapalli, Ryan Richardson, and Edward A. Fox. Mirroring an OAI archive on the I2-DSI channel. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. The Open Archives Initiative (OAI) promotes interoperability among digital libraries and has created a protocol for data providers to easily export their metadata. One problem with this approach is that some of the more popular servers quickly become heavily loaded. The obvious solution is replication. Fortunately, the Internet-2 Distributed Storage Infrastructure (I2-DSI) has begun to develop technology for highly distributed transparent replication of servers. This paper presents our solution for transparent mirroring of OAI repositories within the I2-DSI.
	Gautam Pant, Kostas Tsioutsiouliklis, Judy Johnson, and C. Lee Giles. Panorama: Extending digital libraries with topical crawlers. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. A large number of research, technical, and professional documents are available today in digital formats. Digital libraries are created to facilitate search and retrieval of information supplied by the documents. These libraries may span an entire area of interest (e.g., computer science) or be limited to documents within a small organization. While tools that index, classify, rank and retrieve documents from such libraries are important, it would be worthwhile to complement these tools with information available on the Web. We propose one such technique that uses a topical crawler driven by the information extracted from a research document. The goal of the crawler is to harvest a collection of Web pages that are focused on the topical subspaces associated with the given document. The collection created through Web crawling is further processed using lexical and linkage analysis. The entire process is automated and uses machine learning techniques both to guide the crawler and to analyze the collection it fetches. A report is generated at the end that provides visual cues and information to the researcher.
	Yannis Papakonstantinou, Serge Abiteboul, and Hector Garcia-Molina. Object fusion in mediator systems. In Proceedings of the Twenty-second International Conference on Very Large Databases, pages 413-424, 1996.
	Yannis Papakonstantinou, Hector Garcia-Molina, Ashish Gupta, and Jeffrey Ullman. A query translation scheme for rapid implementation of wrappers. In Fourth International Conference on Deductive and Object-Oriented Databases, pages 161-186, National University of Singapore (NUS), Singapore, 1995.
	Yannis Papakonstantinou, Héctor García-Molina, and Jeffrey Ullman. Medmaker: A mediation system based on declarative specifications. In Proceedings of the 12th International Conference on Data Engineering, New Orleans, La., 1996.
	Yannis Papakonstantinou, Hector Garcia-Molina, and Jennifer Widom. Object exchange across heterogeneous information sources. In Proceedings of the Eleventh International Conference on Data Engineering, pages 251-260, Taiwan, 1995.
	Yannis Papakonstantinou, Ashish Gupta, and Laura Haas. Capabilities-based query rewriting in mediator systems. In Proceedings of 4th International Conference on Parallel and Distributed Information Systems, Miami Beach, Flor., 1996.
	Ian Parberry. The Internet and the aspiring games programmer. In DAGS '95, 1995. Format: Not Yet On-line. Audience: College instructors. References: 12. Abstract: Describes a game-programming course offered at the University of North Texas and argues that it is a legitimate means to teach students a lot about computer science while providing them with a practical skill. Describes different distribution models (shareware, freeware, nagware, etc.).
	Ian Parberry and David S. Johnson. The SIGACT theoretical computer science genealogy: Preliminary report. In DAGS '95, 1995. Format: Not Yet On-line. Audience: Computer science theoreticians, trivia buffs. References: 6. Abstract: A web-based application that shows the advisor-student relations among the theoretical CS community. Also allows indexing by letter, university, and country.
	Joonah Park and Jinwoo Kim. Effects of contextual navigation aids on browsing diverse web systems. In Proceedings of the Conference on Human Factors in Computing Systems CHI'00, 2000.
	Soyeon Park. User preferences when searching individual and integrated full-text databases. In Proceedings of the Fourth ACM International Conference on Digital Libraries, 1999. This paper addresses a crucial issue in the digital library environment: how to support effective interaction of users with heterogeneous and distributed information resources. We compared users' preference for systems which implement interaction with multiple databases through a common interface and with multiple databases as if they were one (integrated interaction) in an experiment in the Text REtrieval Conference (TREC) environment. Twenty-eight volunteers were recruited from the graduate students of School of Communication, Information, & Library Studies at Rutgers University. Significantly more subjects preferred the common interface (HERMES) to the integrated interface (HERA). For most of the subjects in this study, the greater control in HERMES outweighed the advantages of HERA such as convenience, efficiency, and ease of use. These results suggest that: (1) the general assumption of the information retrieval (IR) literature that an integrated interaction is best needs to be revisited; (2) it is important to allow for more user control in various ways in the distributed environment; and (3) for digital library purposes, it is important to characterize different databases to support user choice for integration.
	Charles Parker. A tree-based method for fast melodic retrieval. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. The evolution of aurally queryable melodic databases (so-called query-by-humming systems) has reached a point where retrieval accuracy is relatively high, even at large database sizes. With this accuracy has come a decrease in retrieval speed as methods have become more sophisticated and computationally expensive. In this paper, we turn our attention to heuristically culling songs from our database that are unlikely given a sung query, in hopes that we can increase speed by reducing the number of matching computations necessary to reach the proper target song.
	David Patterson, Garth Gibson, and Randy H. Katz. A case for redundant arrays of inexpensive disks (RAID). SIGMOD Record, 17(3):109-116, September 1988. This is the paper that introduced RAID. The paper proposes arrays of inexpensive disks as a cheaper, faster, and more scalable alternative to a single large disk. The problem is that an array of inexpensive disks is less reliable than a single large disk. The authors propose five levels of RAID to solve that problem: level 1 uses mirrored disks, level 2 uses Hamming codes for ECC, level 3 uses only a single check disk per group, level 4 allows independent reads/writes to the disks, and level 5 distributes the check information across all disks.
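The check disk used by levels 3 through 5 in the entry above can be illustrated with a minimal Python sketch. It assumes the check information is bytewise XOR parity over equal-length data blocks (the standard single-parity scheme); the function names `parity` and `reconstruct` are hypothetical, and the sketch ignores striping and the placement of check blocks that distinguish levels 3, 4, and 5.

```python
from functools import reduce
from typing import List


def parity(blocks: List[bytes]) -> bytes:
    # XOR equal-length blocks byte by byte; zip(*blocks) yields one tuple
    # of byte values per position across all blocks.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def reconstruct(surviving: List[bytes], check: bytes) -> bytes:
    # XOR is its own inverse, so XOR-ing the check block with every
    # surviving data block recovers the single lost block.
    return parity(surviving + [check])


data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]  # three data disks
check = parity(data)                              # contents of the check disk
# Simulate losing disk 1 and rebuilding it from the others plus parity.
print(reconstruct([data[0], data[2]], check) == data[1])  # True
```

A single parity block can repair exactly one failed disk per group; tolerating two simultaneous failures needs more check information than this sketch provides.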
	Gordon W. Paynter and Steve Mitchell. Developing practical automatic metadata assignment and evaluation tools for internet resources. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. This paper describes the development of practical automatic metadata assignment tools to support automatic record creation for virtual libraries, metadata repositories and digital libraries, with particular reference to library-standard metadata. The development process is incremental in nature, and depends upon an automatic metadata evaluation tool to objectively measure its progress. The evaluation tool is based on and informed by the metadata created and maintained by librarian experts at the INFOMINE Project, and uses different metrics to evaluate different metadata fields. In this paper, we describe the form and function of common metadata fields, and identify appropriate performance measures for these fields. The automatic metadata assignment tools in the iVia virtual library software are described, and their performance is measured. Finally, we discuss the limitations of automatic metadata evaluation, and cases where we choose to ignore its evidence in favor of human judgment.
	Gordon W. Paynter, Ian H. Witten, Sally Jo Cunningham, and George Buchanan. Scalable browsing for large collections: A case study. In Proceedings of the Fifth ACM International Conference on Digital Libraries, 2000. Phrase browsing techniques use phrases extracted automatically from a large information collection as a basis for browsing and accessing it. This paper describes a case study that uses an automatically constructed phrase hierarchy to facilitate browsing of an ordinary large Web site. Phrases are extracted from the full text using a novel combination of rudimentary syntactic processing and sequential grammar induction techniques. The interface is simple, robust and easy to use.
	Virginia A. Peck and Bonnie E. John. Browser-soar: A computational model of a highly interactive task. In Proceedings of the Conference on Human Factors in Computing Systems CHI'92, 1992.
	Nikos Pediotakis and Mountaz Zizi. Visual relevance analysis. In Proceedings of DL'96, 1996. Format: Not yet online.
	J.L. Abad Peiroa, N. Asokan, M. Steiner, and M. Waidner. IBM Systems Journal. Technical Report 37, IBM, 1998. The growing importance of electronic commerce has resulted in the introduction of a variety of different and incompatible payment systems. For business application developers, this variety implies the need to understand the details of different systems, to adapt the code as soon as new payment systems are introduced and also to provide a way of picking a suitable payment instrument for every transaction. We unify the different mechanisms in a common framework with application programming interfaces. Our framework provides services for transparent negotiation and selection of payment instruments as well. This allows applications to be developed independent of specific payment systems with the additional benefit of providing a central point of control for payment information and policies.
	M. Perkowitz and O. Etzioni. Adaptive web sites: an AI challenge. In Fifteenth International Joint Conference on Artificial Intelligence, 1997. The creation of a complex web site is a thorny problem in user interface design. First, different visitors have distinct goals. Second, even a single visitor may have different needs at different times. Much of the information at the site may also be dynamic or time-dependent. Third, as the site grows and evolves, its original design may no longer be appropriate. Finally, a site may be designed for a particular purpose but used in unexpected ways. Web servers record data about user interactions and accumulate this data over time. We believe that AI techniques can be used to examine user access logs in order to automatically improve the site. We challenge the AI community to create adaptive web sites: sites that automatically improve their organization and presentation based on user access data. Several unrelated research projects in plan recognition, machine learning, knowledge representation, and user modeling have begun to explore aspects of this problem. We hope that posing this challenge explicitly will bring these projects together and stimulate fundamental AI research. Success would have a broad and highly visible impact on the web and the AI community.
	M. Perkowitz and O. Etzioni. Adaptive web sites: Automatically synthesizing web pages. In Fifteenth National Conference on Artificial Intelligence, 1998. The creation of a complex web site is a thorny problem in user interface design. In IJCAI '97, we challenged the AI community to address this problem by creating adaptive web sites: sites that automatically improve their organization and presentation by mining visitor access data collected in Web server logs. In this paper we introduce our own approach to this broad challenge. Specifically, we investigate the problem of index page synthesis - the automatic creation of pages that facilitate a visitor's navigation of a Web site. First, we formalize this problem as a clustering problem and introduce a novel approach to clustering, which we call cluster mining: Instead of attempting to partition the entire data space into disjoint clusters, we search for a small number of cohesive (and possibly overlapping) clusters. Next, we present PageGather, a cluster mining algorithm that takes Web server logs as input and outputs the contents of candidate index pages. Finally, we show experimentally that PageGather is both faster (by a factor of three) and more effective than traditional clustering algorithms on this task. Our experiment relies on access logs collected over a month from an actual web site.
	Mike Perkowitz and Oren Etzioni. Category translation: Learning to understand information on the internet. In AAAI Spring Symposium on Information Gathering, 1995. Format: Compressed PostScript.
	Saverio Perugini, Kate McDevitt, Ryan Richardson, Manuel Pérez-Quiñones, Rao Shen, Naren Ramakrishnan, Chris Williams, and Edward A. Fox. Enhancing usability in CITIDEL: Multimodal, multilingual, and interactive visualization interfaces. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. The CITIDEL digital library is a modular system designed to support education in computer science and related fields. By taking advantage of its modular nature, we have made enhancements that draw from different fields in computer science. Examples are the incorporation of interactive visualizations, usability enhancements, multimodal interactions, and community multilingual translation. Pilot studies show improvements in quality as measured across a number of metrics.
	Yves Petinot, C. Lee Giles, Vivek Bhatnagar, Pradeep B. Teregowda, and Hui Han. Enabling interoperability for autonomous digital libraries: An API to CiteSeer services. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. We introduce CiteSeer-API, a public API to CiteSeer-like services. CiteSeer-API is SOAP/WSDL-based and allows easy programmatic access to all the specific functionalities offered by CiteSeer services, including full-text search of documents and citations and citation-based information access. CiteSeer-API is currently demonstrated on SMEALSearch, a digital library search engine for academic business publications.
	Yves Petinot, Pradeep B. Teregowda, Hui Han, C. Lee Giles, Steve Lawrence, and Arvind Rangaswamy. eBizSearch: An OAI-compliant digital library for eBusiness. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. Niche search engines offer an efficient alternative to traditional search engines when the results returned by general-purpose search engines do not provide a sufficient degree of relevance and when nontraditional search features are required. Niche search engines can take advantage of their domain of concentration to achieve higher relevance and offer enhanced features. We discuss a new digital library niche search engine, eBizSearch, dedicated to e-business and e-business documents. The underlying technology for eBizSearch is CiteSeer, a special-purpose automatic indexing document digital library and search engine developed at NEC Research Institute. We present here the integration of CiteSeer in the framework of eBizSearch and the process necessary to tune the whole system towards the specific area of e-business. We also discuss how we use machine learning algorithms to generate metadata that make eBizSearch Open Archives-compliant. eBizSearch is a publicly available service and can be reached at [EBIZ].
	Robert Pettengill and Guillermo Arango. Four lessons learned from managing world wide web digital libraries. In Proceedings of the Second Annual Conference on the Theory and Practice of Digital Libraries, 1995. Format: HTML Document (23K + pictures). Audience: Web publishers, esp. corporate. References: 12. Links: 1. Relevance: Low-medium. Abstract: Describes the application of software engineering techniques to web maintenance: keeping separate development and production areas, using automated tools like UNIX's make to process, test, and install new documents, and using a version control system like VCS to maintain a document history. Suggests a convention in URLs to specify the version number after a comma.
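	The comma convention for versioned URLs could be handled by a small parser on the server side. This Python sketch assumes the version is a non-negative integer after the last comma of the path (for example /docs/guide.html,3); that exact syntax and the name split_versioned_path are hypothetical and not taken from the paper.

```python
def split_versioned_path(path):
    """Split '/docs/guide.html,3' into ('/docs/guide.html', 3).

    Paths without a trailing ',<digits>' suffix are returned unchanged with
    version None, which a server might treat as "latest" (an assumption).
    """
    base, sep, version = path.rpartition(",")
    # isascii() guards against non-ASCII digit characters that int() rejects.
    if sep and version.isascii() and version.isdigit():
        return base, int(version)
    return path, None
```

	A request for /docs/guide.html,3 would then be served from version 3 of the document history, while /docs/guide.html falls through to the current version.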
	Lev Pevzner and Marti Hearst. A critique and improvement of an evaluation metric for text segmentation. Computational Linguistics, 28(1):19-36, 2002.
	Constantinos Phanouriou, Neill A. Kipp, Ohm Sornil, Paul Mather, and Edward A. Fox. A digital library for authors: Recent progress of the networked digital library of theses and dissertations. In Proceedings of the Fourth ACM International Conference on Digital Libraries, 1999. The Networked Digital Library of Theses and Dissertations (NDLTD) is more than an online collection of Electronic Theses and Dissertations (ETDs). It is a scalable project that has an impact on thousands of graduate students in many countries as well as diverse researchers worldwide. By May 1999 it had 59 official members representing 13 countries and integrated some of the world's newest research works, including ETD collections at Virginia Tech and West Virginia University, where ETD submission is now required. The number of accesses to the Virginia Tech collection has grown by more than half in the last year. NDLTD is committed to authors, aiming to improve graduate education for the more than 100,000 students who prepare a thesis or dissertation each year. It encourages them to be more expressive by facilitating incorporation of multimedia components into their theses. NDLTD activities include: applying automation methods to simplify submission of ETDs over the WWW; specifying the application of the Dublin Core to guarantee that metadata can satisfy needs of searching and browsing; selecting open standards and procedures to facilitate interoperability and preservation; and demonstrating a variety of interfaces, both 2D and 3D, along with exploring their usability.
	Thomas A. Phelps and Robert Wilensky. Toward active, extensible, networked documents: Multivalent architecture and applications. In Proceedings of the First ACM International Conference on Digital Libraries, pages 100-108, Bethesda, Maryland, 1996. Format: Not yet online.
	Kenneth L. Phillips. Meta-information, the network of the future and intellectual property protection. In IP Workshop Proceedings, 1994. Format: HTML Document (24K). Audience: Non-technical (some jargon), somewhat business-oriented. References: 4. Links: 0. Relevance: Low. Abstract: Tries to apply information theory ideas of half-life. Description of ATM header packet. Discussion of value of marketing info from charge card & 800 calls. (Claims marketers would pay $3 per name/address for access to 800-lists, $3-7 if you could add queries about charge cards.)
	Photopals photo album. http://photopals2002.com.
	PHP: Hypertext Preprocessor. http://www.php.net.
	Jeffrey S. Pierce, Matthew Conway, Maarten van Dantzich, and George Robertson. Toolspaces and glances: storing, accessing, and retrieving objects in 3D desktop applications. In SI3D '99: Proceedings of the 1999 symposium on Interactive 3D graphics, pages 163-168, New York, NY, USA, 1999. ACM Press.
	Pierre Cubaud, Jérôme Dupire, and Alexandre Topol. Digitization and 3D modeling of movable books. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. Movable books provide interesting challenges for digitization and user interface design. We report in this paper some preliminary results in the building of a 3D visualization workbench for such books.
	Ka-Ping Yee. Peephole displays: Pen interaction on spatially aware handheld computers. In Proceedings of the Conference on Human Factors in Computing Systems CHI'03, pages 1-8, 2003.
	G. Pinski and F. Narin. Citation influence for journal aggregates of scientific publications: Theory, with application to the literature of physics. Inf. Proc. and Management, 12, 1976.
	David Pinto, Michael Branstein, Ryan Coleman, Matthew King, Xing Wei, Wei Li, and W. Bruce Croft. QuASM: A system for question answering using semi-structured data. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. This paper describes a system for question answering using semi-structured metadata, QuASM (pronounced `chasm`). Question answering systems aim to improve search performance by providing users with specific answers, rather than having users scan retrieved documents for these answers. Our goal is to answer factual questions by exploiting the structure inherent in documents found on the World Wide Web (WWW). Based on this structure, documents are indexed into smaller units and associated with metadata. Transforming table cells into smaller units associated with metadata is an important part of this task. In addition, we report on work to improve question classification using language models. The domain used to develop this system is documents retrieved from a crawl of www.fedstats.gov.
	Peter Pirolli. Exploring browser design trade-offs using a dynamical model of optimal information foraging. In Proceedings of the Conference on Human Factors in Computing Systems CHI'98, 1998.
	James Pitkow and Peter Pirolli. Life, death, and lawfulness on the electronic frontier. In Proceedings of the Conference on Human Factors in Computing Systems CHI'97, 1997.
	Harry Plantinga. Digital libraries and large text documents on the world wide web. In DAGS '95, 1995. Format: HTML Document (17K). Audience: HTML publishers, Webmasters, Browser builders. References: 6 notes. Links: 8. Relevance: Low. Abstract: Considers the problem of serving reference materials over HTTP. Most people want to access only a particular section, but the HTTP model requires transferring the whole document. Describes a `Pager` (CGI script) that finds the desired section and returns it as its own document. Documents could be preprocessed to index them and parsed to ensure they are valid HTML. Closes with concrete suggestions for UI features to ease reading books over the Web.
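	The Pager idea, returning one section of a large HTML document as a standalone page, can be sketched in Python. This sketch assumes sections start at h2 headings that carry an id attribute, and it uses simple regular expressions rather than a full HTML parser; both the markup convention and the name extract_section are assumptions, not details from the paper.

```python
import re

# A level-2 heading that carries an id attribute, e.g. <h2 id="ch3">.
TARGET_HEADING = re.compile(r'<h2\b[^>]*\sid="([^"]+)"[^>]*>', re.IGNORECASE)
# Any level-2 heading; used to find where the requested section ends.
ANY_HEADING = re.compile(r"<h2\b", re.IGNORECASE)


def extract_section(html, section_id):
    """Return the section starting at <h2 id=section_id> as its own document."""
    for match in TARGET_HEADING.finditer(html):
        if match.group(1) == section_id:
            following = ANY_HEADING.search(html, match.end())
            end = following.start() if following else len(html)
            return "<html><body>" + html[match.start():end] + "</body></html>"
    return None  # no such section
```

	A CGI wrapper would read the section name from the query string, call extract_section on the stored book, and send only that fragment, so the client never downloads the whole document.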
	John C. Platt. AutoAlbum: Clustering digital photographs using probabilistic model merging. In CBAIVL '00: Proceedings of the IEEE Workshop on Content-based Access of Image and Video Libraries (CBAIVL'00), page 96, Washington, DC, USA, 2000. IEEE Computer Society.
	John C. Platt, Mary Czerwinski, and Brent A. Field. PhotoTOC: Automatic clustering for browsing personal photographs. Technical Report MSR-TR-2002-17, Microsoft Research, February 2002.
	E. Poger and M. Baker. Secure public internet access handler (SPINACH). In Proceedings of the USENIX Symposium on Internet Technologies and Systems, Dec 1997.
	Stephen Pollock. A rule-based message filtering system. ACM Transactions on Office Information Systems, 6(3), July 1988. You write rules and the system filters your mail.
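	The write-rules-then-filter idea can be sketched in Python. The rule format here (a header field, a regular expression, and an action, applied first-match-wins with a default) is an assumption for illustration, not Pollock's actual rule language.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    field: str    # message header to test, e.g. "from" or "subject"
    pattern: str  # regular expression searched case-insensitively
    action: str   # what to do with a matching message (hypothetical labels)


def filter_message(message, rules, default="inbox"):
    """Return the action of the first rule that matches, else the default."""
    for rule in rules:
        if re.search(rule.pattern, message.get(rule.field, ""), re.IGNORECASE):
            return rule.action
    return default
```

	Because rules are tried in order, a user controls precedence simply by reordering the list; a message matching no rule gets the default action.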
	Jeffrey Pomerantz and R. David Lankes. Taxonomies for automated question triage in digital reference [short paper]. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. This study identifies (1) several taxonomies of questions at different levels of linguistic analysis, according to which questions received by digital reference services are classified, and (2) a simple categorization of triage recipients. The utility of these taxonomies and categorizations is discussed as the basis for systems for automating triage and other steps in the digital reference process.
	A. Poon, L. Fagan, and E. Shortliffe. The PEN-Ivory project: Exploring user-interface design for the selection of items from large controlled vocabularies of medicine. Journal of the American Medical Informatics Association, 3(2):168-183, 1996.
	Viswanath Poosala and Venkatesh Ganti. Fast approximate query answering using precomputed statistics. In Proceedings of the 15th International Conference on Data Engineering, page 252, Sydney, Australia, 1999. ACM Press, New York.
	M. F. Porter. An algorithm for suffix stripping. Program, 14(3):130-137, July 1980.
	Michael Potmesil. Maps alive: Viewing geospatial information on the WWW. In Proceedings of the Sixth International World-Wide Web Conference, 1997.
	Atul Prakash and Hyong Sop Shim. DistView: Support for building efficient collaborative applications using replicated objects. In Richard Furuta and Christine Neuwirth, editors, CSCW '94, New York, 1994. ACM. The ability to share synchronized views of interactions with an application is critical to supporting synchronous collaboration. The paper suggests a simple synchronous collaboration paradigm in which the sharing of the views of user/application interactions occurs at the window level within a multi-user, multi-window application. The paradigm is incorporated in a toolkit, DistView, that allows some of the application windows to be shared at a fine level of granularity, while still allowing other application windows to be private. The toolkit is intended for supporting synchronous collaboration over wide-area networks. To keep bandwidth requirements and interactive response time low in such networks, DistView uses an object-level replication scheme, in which the application and interface objects that need to be shared among users are replicated. We discuss the design of DistView and present our preliminary experience with a prototype version of the system.
	S.E. Preece and M.E. Williams. Software for the searcher's workbench. In Proceedings of the 43rd American Society for Information Science Annual Meeting, volume 17, pages 403-405, Anaheim, Calif., October 1980. Knowledge Industry Publications, White Plains, N.Y.
	Christopher J. Prom and Thomas G. Habing. Using the open archives initiative protocols with EAD. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. The Open Archives Initiative Protocols present a promising opportunity to make metadata about archives, manuscript collections, and cultural heritage resources easier to locate and search. However, several technical barriers must be overcome before useful OAI records can be produced from the disparate metadata formats used to describe these resources. This paper examines the case of Encoded Archival Description (EAD) as a test case of the issues to be addressed in mapping cultural heritage metadata to OAI. Encoding guidelines and selected EAD files are analyzed, and a suggested mapping from EAD to OAI is provided. The paper suggests that in some cases it may be necessary to create numerous OAI records from one source file. In addition, the findings indicate that further standardization of EAD markup practices would enhance interoperability.
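	The observation that one EAD file may yield many OAI records can be illustrated with a Python sketch using the standard xml.etree.ElementTree module. It assumes un-namespaced EAD markup with top-level components in c01 elements under archdesc, and it emits simple Dublin Core-style dictionaries (one for the collection, one per c01 component). The field choices are illustrative and are not the mapping proposed in the paper.

```python
import xml.etree.ElementTree as ET


def ead_to_dc_records(ead_xml):
    """Map one EAD document to several Dublin Core-style record dicts."""
    root = ET.fromstring(ead_xml)
    archdesc = root.find("archdesc")
    if archdesc is None:
        return []
    collection = archdesc.findtext("did/unittitle", default="").strip()
    # One record describing the collection as a whole.
    records = [{"title": collection, "type": "collection"}]
    # One further record per top-level component (<c01>), linked to the collection.
    for component in archdesc.iter("c01"):
        record = {
            "title": component.findtext("did/unittitle", default="").strip(),
            "relation": collection,
            "type": "component",
        }
        date = component.findtext("did/unitdate")
        if date:
            record["date"] = date.strip()
        records.append(record)
    return records
```

	Emitting a record per component makes individual series findable through OAI harvesting, at the cost of multiplying records per finding aid, which is the trade-off the abstract points to.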
	ProxiNet. ProxiWeb. ProxiNet website: http://www.proxinet.com/.
	Konstantinos Psounis. Class-based delta-encoding: a scalable scheme for caching dynamic web content. In 22nd International Conference on Distributed Computing Systems Workshops, 2002.
	Varna Puvvada and Roy H. Campbell. Inverse mapping in the handle management system. In Proceedings of DL'96, 1996. Format: Not yet online.
	Jialun Qin, Yilu Zhou, and Michael Chau. Building domain-specific web collections for scientific digital libraries: A meta-search enhanced focused crawling method. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. Collecting domain-specific documents from the Web using focused crawlers has been considered one of the most important strategies to build digital libraries that serve the scientific community. However, because most focused crawlers use local search algorithms to traverse the Web space, they could be easily trapped within a limited sub-graph of the Web that surrounds the starting URLs and build domain-specific collections that are not comprehensive and diverse enough for scientists and researchers. In this study, we investigated the problems of traditional focused crawlers caused by local search algorithms and proposed a new crawling approach, meta-search enhanced focused crawling, to address the problems. We conducted two user evaluation experiments to examine the performance of our proposed approach and the results showed that our approach could build domain-specific collections with higher quality than traditional focused crawling techniques.
	Michael Rabinovich and Amit Aggarwal. RaDaR: a scalable architecture for a global web hosting service. In Proceedings of the Eighth International World-Wide Web Conference, 1999. As commercial interest in the Internet grows, more and more companies are offering the service of hosting and providing access to information that belongs to third-party information providers. In the future, successful hosting services may host millions of objects on thousands of servers deployed around the globe. To provide reasonable access performance to popular resources, these resources will have to be mirrored on multiple servers. In this paper, we identify some challenges due to the scale that a platform for such global services would face, and propose an architecture capable of handling this scale. The proposed architecture has no bottleneck points. A trace-driven simulation using an access trace from AT&T's hosting service shows very promising results for our approach.
	Michael Rabinovich, Zhen Xiao, Fred Douglis, and Chuck Kalmanek. Moving edge-side includes to the real edge - the clients. In USENIX Symposium on Internet Technologies and Systems, 2003.
	Prabhakar Raghavan. Information retrieval algorithms: A survey. In Proceedings of ACM Symposium on Discrete Algorithms, 1997.
	Sriram Raghavan and Hector Garcia-Molina. Crawling the hidden web. Technical Report 2000-36, Computer Science Department, Stanford University, December 2000. Available at http://dbpubs.stanford.edu/pub/2000-36.
	Sriram Raghavan and Hector Garcia-Molina. Crawling the hidden web. In Proceedings of the Twenty-seventh International Conference on Very Large Databases, September 2001.
	Anand Rajaraman, Yehoshua Sagiv, and Jeffrey D. Ullman. Answering queries using templates with binding patterns. In Proceedings of the 14th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 105-112, San Jose, Calif., May 1995.
	R. Ram and B. Block. Development of a portable information system: connecting palmtop computers with medical records systems and clinical reference resources. In Seventeenth Annual Symposium on Computer Applications in Medical Care. Patient-Centered Computing, pages 125-128. McGraw-Hill, 1994. The portability of palmtop computers makes them an ideal platform to maintain communication between busy physicians and medical information systems (MIS). In our academic FHC (Family Health Center), we have developed software that runs on a palmtop computer allowing access to information in the hospital information system and our FHC's AAMRS (Automated Ambulatory Medical Record System). Using a Hewlett-Packard 95LX palmtop computer, custom software has been developed to access summary data on in-patients and out-patients. Data is downloaded into a database on a palmtop computer memory card. ASCII data from a MIS is transformed into a database format readable on the palmtop. Our hospital MIS department transmits information daily on our in-patient service. We also download, weekly, a patient summary on all of our active out-patients in our MUMPS-based AAMRS. Each morning, the resident in the Family Practice program updates his palmtop memory card at a central workstation. We have made the palmtop computer even more valuable to physicians by providing an integrated software package. This package includes information management software with to-do lists, reference software such as drug formularies and decision support software. The downloading of patient information creates two important problems: security and reliability. To assure the confidentiality of downloaded patient information, the palmtop system uses password protection.
	Roberta Y. Rand. The global change data and information system-assisted search for knowledge (GC-ASK) project. D-Lib Magazine, Aug 1995. Format: HTML Document.
	Roberta Y. Rand and Betty Coyle-Friedman. GC-ASK: A prototype information discovery project for the global change data and information system. In Proceedings of DL'96, 1996. Format: Not yet online.
	R. Rao, B. Janssen, and A. Rajaraman. GAIA technical overview. Technical report, Xerox PARC, 1994.
	R. Rao, D. M. Russell, and J. D. Mackinlay. System components for embedded information retrieval from multiple disparate information sources. In Proceedings of the ACM UIST '93, pages 23-33, Atlanta, Ga., November 1993. ACM Press, New York.
	Andreas Rauber and Alexander Müller-Kögler. Integrating automatic genre analysis into digital libraries. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001. With the number and types of documents in digital library systems increasing, tools for automatically organizing and presenting the content have to be found. While many approaches focus on topic-based organization and structuring, hardly any system incorporates automatic structural analysis and representation. Yet, genre information (unconsciously) forms one of the most distinguishing features in conventional libraries and in information searches. In this paper we present an approach to automatically analyze the structure of documents and to integrate this information into an automatically created content-based organization. In the resulting visualization, documents on similar topics, yet representing different genres, are depicted as books in differing colors. This representation supports users intuitively in locating relevant information presented in a relevant form.
	Unni Ravindranathan, Rao Shen, Marcos Andre Goncalves, Weiguo Fan, Edward A. Fox, and James W. Flanagan. ETANA-DL: A digital library for integrated handling of heterogeneous archaeological data. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. Archaeologists have to deal with vast quantities of information, generated both in the field and laboratory. That information is heterogeneous in nature, and different projects have their own systems to store and use it. This adds to the challenges regarding collaborative research between such projects as well as information retrieval for other more general purposes. This paper describes our approach towards creating ETANA-DL, a digital library (DL) to help manage these vast quantities of information and to provide various kinds of services. The 5S framework for modeling a DL gives us an edge in understanding this vast and complex information space, as well as in designing and prototyping a DL to satisfy information needs of archaeologists and other user communities.
	Mimi Recker, Jim Dorward, Deonne Dawson, Sam Halioris, Ye Liu, Xin Mao, Bart Palmer, and Jaeyang Park. You can lead a horse to water: Teacher development and use of digital library resources. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. Much effort has been expended on infrastructures, technologies, and standards for digital libraries of learning resources. Key objectives of these initiatives are to improve teacher and learner access to high-quality learning resources and to increase their use in order to improve education. In this article, we take a broader view by proposing a framework that incorporates inputs and process variables affecting these desired outcomes. Inputs include variables such as audience characteristics. Outcomes include increased teacher and student use of learning resources from digital libraries. In particular, two key variables are examined in this framework. The first is a professional development program aimed at educators on the topic of using educational digital libraries. The second is a simple end-user authoring service, called the Instructional Architect (IA). The IA helps users, particularly teachers, discover, select, sequence, annotate, and reuse learning resources stored in digital libraries.
	Saveen Reddy, Dale Lowry, Surenda Reddy, Rick Henderson, Jim Davis, and Alan Babich. DAV searching & locating, internet draft. Technical report, IETF, June 1999. Available at http://www.webdav.org/dasl/protocol/draft-dasl-protocol-00.html. Draft of the DASL document.
	J. Redi and Y. Bar-Yam. InterJournal: A distributed refereed electronic journal. In DAGS '95, 1995. Format: HTML Document (30K). Audience: Journal authors, referees, readers, and editors. References: 1. Links: 5. Relevance: Low-medium. Abstract: Describes an implemented system for on-line journal publication. Based on WWW and forms, the system calls for each author to maintain his own articles, while there is a centralized index for searching. The refereeing process is conducted on-line, and there is an option for a public abstract as well. A checksum is generated at submission to ensure the article hasn't been changed.
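	Fixing an article's content at submission time can be sketched with a cryptographic digest. This Python example uses SHA-256 from the standard hashlib module; the entry does not say which checksum InterJournal used, so the algorithm choice and the function names are assumptions.

```python
import hashlib
import hmac


def submission_checksum(article: bytes) -> str:
    """Digest recorded in the central index when the article is submitted."""
    return hashlib.sha256(article).hexdigest()


def is_unchanged(article: bytes, recorded: str) -> bool:
    """True if the author-held copy still matches the digest recorded at submission."""
    return hmac.compare_digest(submission_checksum(article), recorded)
```

	Since each author hosts his own articles, the index can re-fetch a copy and call is_unchanged to detect edits made after refereeing.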
	Xiaona Ren, Lloyd A. Smith, and Richard A. Medina. Discovery of retrograde and inverted themes for indexing musical scores. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. This paper describes extensions to a musical score indexing program that enable it to discover sequences of notes that appear in retrograde and/or inverted form. The program was tested over a set of 25 orchestral movements by several composers of the Baroque, Classical, and Romantic periods. The retrograde and inversion discovery algorithm added an average of 3.7 patterns per movement to the index, increasing the number of notes in the index by 6% [...] entries by 6.8% [...] of 2 themes in retrograde, 4 themes in inversion, and 3 themes in retrograde inversion.
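	Detecting a theme in retrograde or inverted form can be sketched by comparing interval sequences, which also makes matching independent of transposition. This Python example assumes themes and melodies are lists of MIDI pitch numbers and does exact matching only; it illustrates the four transformations and is not the indexing program described in the paper.

```python
def intervals(pitches):
    """Successive pitch differences in semitones."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def variants(theme):
    """Interval patterns for the theme and its three classical transformations."""
    forward = intervals(theme)
    backward = intervals(theme[::-1])
    return {
        "original": forward,
        "retrograde": backward,                          # notes in reverse order
        "inversion": [-x for x in forward],              # each interval mirrored
        "retrograde inversion": [-x for x in backward],
    }


def find_forms(theme, melody):
    """List (form, start note index) for every exact occurrence in the melody."""
    melody_intervals = intervals(melody)
    hits = []
    for name, pattern in variants(theme).items():
        n = len(pattern)
        for i in range(len(melody_intervals) - n + 1):
            if melody_intervals[i:i + n] == pattern:
                hits.append((name, i))
    return hits
```

	For the theme C-D-E-G (60, 62, 64, 67), the melody 72, 69, 67, 65 contains the retrograde a fifth higher, and find_forms reports it as ("retrograde", 0).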
	Allen Renear, Dave Dubin, Michael Sperberg-McQueen, and Claus Huitfeldt. XML semantics and digital libraries [short paper]. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. The lack of a standard formalism for expressing the semantics of an XML vocabulary is a major obstacle to the development of high-function interoperable digital libraries. XML document type definitions (DTDs) provide a mechanism for specifying the syntax of an XML vocabulary, but there is no comparable mechanism for specifying the semantics of that vocabulary - where semantics simply means the basic facts and relationships represented by the occurrence of XML constructs. A substantial loss of functionality and interoperability in digital libraries results from not having a common machine-readable formalism for expressing these relationships for the XML vocabularies currently being used to encode content. Recently a number of projects and standards have begun taking up related topics. We describe the problem and our own project.
	Resco picture viewer for Pocket PC. http://www.resco-net.com/.
	P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl. GroupLens: an open architecture for collaborative filtering of netnews. In Proceedings of the Conference on Computer-Supported Cooperative Work, CSCW'94, 1994.
	P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl. GroupLens: an open architecture for collaborative filtering of netnews. In Proceedings of the Conference on Computer Supported Cooperative Work. Transcending Boundaries, 1994. Collaborative filters help people make choices based on the opinions of other people. GroupLens is a system for collaborative filtering of netnews, to help people find articles they will like in the huge stream of available articles. News reader clients display predicted scores and make it easy for users to rate articles after they read them. Rating servers, called Better Bit Bureaus, gather and disseminate the ratings. The rating servers predict scores based on the heuristic that people who agreed in the past will probably agree again. Users can protect their privacy by entering ratings under pseudonyms, without reducing the effectiveness of the score prediction. The entire architecture is open: alternative software for news clients and Better Bit Bureaus can be developed independently and can interoperate with the components we have developed.
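	The heuristic that people who agreed in the past will probably agree again can be sketched as user-based collaborative filtering: each other user's deviation from their own mean rating is weighted by their Pearson correlation with the target user. This Python sketch follows that general scheme; the data layout (a dict of per-user rating dicts) and the function names are assumptions, and it is not the Better Bit Bureau code.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two users over the items both have rated."""
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = sqrt(sum((a[i] - mean_a) ** 2 for i in common)
               * sum((b[i] - mean_b) ** 2 for i in common))
    return num / den if den else 0.0


def predict(ratings, user, item):
    """Predict `user`'s score for `item` from correlated users' ratings."""
    mine = ratings[user]
    base = sum(mine.values()) / len(mine)
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        weight = pearson(mine, theirs)
        if weight == 0.0:
            continue
        other_mean = sum(theirs.values()) / len(theirs)
        # Agreeing users pull the prediction toward their deviation;
        # disagreeing users (negative weight) push it the other way.
        num += weight * (theirs[item] - other_mean)
        den += abs(weight)
    return base + num / den if den else base
```

	A user with no rated neighbours for an article simply gets their own mean rating as the prediction; pseudonymous ratings work unchanged, since only the rating vectors matter.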
	P. Resnick and J. Miller. PICS: Internet access controls without censorship. Communications of the ACM, 39(10):87-93, 1996. With its recent explosive growth, the Internet now faces a problem inherent in all media that serve diverse audiences: not all materials are appropriate for every audience. Societies have tailored their responses to the characteristics of the media [1, 3]: in most countries, there are more restrictions on broadcasting than on the distribution of printed materials. Any rules about distribution, however, will be too restrictive from some perspectives, yet not restrictive enough from others. We can do better: we can meet diverse needs by controlling reception rather than distribution. In the TV industry, this realization has led to the V-chip, a system for blocking reception based on labels embedded in the broadcast stream.
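	Controlling reception from labels amounts to a client-side check of a page's rating label against limits the user chooses. This Python sketch does not implement the PICS label syntax: it assumes labels have already been parsed into a dict of category ratings, and the category names, the treatment of missing categories as 0, and the option to block unlabeled pages are all hypothetical choices.

```python
def allow_page(label, policy, block_unlabeled=True):
    """Decide on the receiving side whether a labelled page may be shown.

    label:  dict mapping category -> rating, or None if the page has no label.
    policy: dict mapping category -> highest rating the user accepts.
    """
    if label is None:
        return not block_unlabeled
    # Categories absent from the label are treated as rating 0 (an assumption).
    return all(label.get(category, 0) <= limit for category, limit in policy.items())
```

	Different households can apply different policies to the same labelled content, which is the point of filtering at reception rather than restricting distribution.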
	Paul Resnick and Hal R. Varian. Recommender Systems. Communications of the ACM, March 1997. This entry is here to allow citing of this CACM issue as a whole.
	Joan A. Reyes. The electronic reserve system at Penn State University. In Proceedings of DL'96, 1996. Format: Not yet online.
	Natalia Reyes-Farfán and Alfredo Sánchez. Personal spaces in the context of the OAI initiative [short paper]. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. We describe MiBiblio 2.0, a highly personalizable user interface for a federation of digital libraries under the OAI Protocol for Metadata Harvesting (OAI-PMH). MiBiblio 2.0 allows users to personalize their personal space by choosing the resources and services they need, as well as to organize, classify and manage their workspaces including resources from any of the federated libraries. Results can be kept in personal spaces and organized into categories using a drag-and-drop interface.
	Michael Ribaudo, Colette Wagner, Michael Kress, and Bernard Rous. The challenges to designing viable digital libraries. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document (19K). Audience: Mostly non-technical, funders & business people. References: 5. Links: 1. Relevance: Low-Medium. Abstract: Lists 6 `major` areas that need to be addressed for DL: 1) Is the Internet sufficient in terms of topology & bandwidth? 2) UI for disabled users? 3) Economic model of publishing 4) Production model for electronic publishing (work flow) 5) Electronic tools to support publishing 6) Intellectual Property. Doesn't make concrete suggestions beyond suggesting various committees.
	B. Ribeiro-Neto and R. Barbosa. Query performance for tightly coupled distributed digital libraries. In Proceedings of the Third ACM International Conference on Digital Libraries, pages 182-190, June 1998.
	Berthier Ribeiro-Neto, Edleno S. Moura, Marden S. Neubert, and Nivio Ziviani. Efficient distributed algorithms to build inverted files. In Proceedings of the Twenty-Second Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 105-112, August 1999.
	David Ribes, Karen Baker, Geoffrey Bowker, and Florence Millerand. Comparative interoperability project: Configurations of community, technology, and organization. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. In this paper we describe the methods, goals and early findings of the research endeavor ‘Comparative Interoperability Project’ (CIP). The CIP is an extended interdisciplinary collaboration of information and social scientists with the shared goal of understanding the diverse range of interoperability strategies within information infrastructure building activities. We take interoperability strategies to be the simultaneous mobilization of community, organizational and technical resources to enable data integration. The CIP draws together work with three ongoing collaborative scientific projects (GEON, LTER, Ocean Informatics) that are building information infrastructures for the natural sciences.
	Tracy Riggs and Robert Wilensky. An algorithm for automated rating of reviewers. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001. The current system for scholarly information dissemination may be amenable to significant improvement. In particular, going from the current system of journal publication to one of self-distributed documents offers significant cost and timeliness advantages. A major concern with such alternatives is how to provide the value currently afforded by the peer review system. Here we propose a mechanism that could plausibly supply such value. In the peer review system, papers are judged meritorious if good reviewers give them good reviews. In its place, we propose a collaborative filtering algorithm which automatically rates reviews, and incorporates the quality of the reviewer into the metric of merit for the paper. Such a system seems to provide all the benefits of the current peer review system, while at the same time being much more flexible. We have implemented a number of parameterized variations of this algorithm, and tested them on data available from a quite different application. Our initial experiments suggest that the algorithm is in fact ranking reviewers reasonably.
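	One way to let reviewer quality feed into paper merit is a mutually reinforcing iteration: a paper's merit is the quality-weighted mean of its review scores, and a reviewer's quality reflects how closely their scores track those merits. This Python sketch illustrates that general idea only; the update rules, the score scale of 0 to 1, and the iteration count are assumptions, not the algorithm evaluated by Riggs and Wilensky.

```python
def rate_reviewers(reviews, iterations=20):
    """reviews: list of (reviewer, paper, score) tuples with scores in [0, 1].

    Returns (merit per paper, quality per reviewer).
    """
    reviewers = {r for r, _, _ in reviews}
    papers = {p for _, p, _ in reviews}
    quality = {r: 1.0 for r in reviewers}  # start by trusting everyone equally
    merit = {}
    for _ in range(iterations):
        # Paper merit: review scores weighted by current reviewer quality.
        for p in papers:
            scored = [(r, s) for r, paper, s in reviews if paper == p]
            total = sum(quality[r] for r, _ in scored)
            merit[p] = sum(quality[r] * s for r, s in scored) / total
        # Reviewer quality: 1 minus mean absolute distance from paper merit,
        # floored so that every weight stays positive.
        for r in reviewers:
            errors = [abs(s - merit[p]) for who, p, s in reviews if who == r]
            quality[r] = max(1e-6, 1.0 - sum(errors) / len(errors))
    return merit, quality
```

	With two reviewers who broadly agree and a third who consistently contradicts them, the third reviewer's quality drops below the others', so that reviewer's scores count for less in each paper's merit.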
	C. J. Van Rijsbergen. Information Retrieval, 2nd edition. Butterworths, London, 1979.
	Chris Roadknight, Ian Marshall, and Debbie Vearer. File popularity characterisation. SIGMETRICS Perform. Eval. Rev., 27(4):45-50, 2000.
	Daniel C. Robbins, Edward Cutrell, Raman Sarin, and Eric Horvitz. ZoneZoom: map navigation for smartphones with recursive view segmentation. In AVI '04: Proceedings of the working conference on Advanced visual interfaces, pages 231-234, New York, NY, USA, 2004. ACM Press.
	Scott Robertson, Cathleen Wharton, Catherine Ashworth, and Marita Franzke. Dual device user interface design: PDAs and interactive television. In Proceedings of the Conference on Human Factors in Computing Systems CHI'96, 1996.
	Robots exclusion protocol. http://info.webcrawler.com/mak/projects/robots/exclusion.html.
	Kerry Rodden. How do people organise their photographs? In 21st Annual BCS-IRSG Colloquium on IR, 1999. Available at http://www.rodden.org/kerry/irsg.pdf.
	Kerry Rodden, Wojciech Basalaj, David Sinclair, and Kenneth Wood. Does organisation by similarity assist image browsing? In CHI '01: Proceedings of the SIGCHI conference on Human factors in computing systems, pages 190-197. ACM Press, 2001.
	Kerry Rodden and Kenneth R. Wood. How do people manage their digital photographs? In Proceedings of the conference on Human factors in computing systems, pages 409-416. ACM Press, 2003.
	M. E. Rorvig, M. W. Hutchison, R. O. Shelton, S. L. Smith, and M. E. Yazbeck. An intelligent agent for the K-12 educational community. In Advances in Digital Libraries '95, 1995. Format: Not Yet Online.
	M. Röscheisen, C. Mogensen, and T. Winograd. Beyond browsing: Shared comments, SOAPs, trails and on-line communities. In Proceedings of the Fourth International World-Wide Web Conference, pages 739-749, Darmstadt, Germany, 1995. The paper describes a system we have implemented that enables people to share structured in-place annotations attached to material in arbitrary documents on the WWW. The basic conceptual decisions are laid out, and a prototypical example of the client-server interaction is given. We then explain the usage perspective, describe our experience with using the system, and discuss other experimental usages of our prototype implementation, such as collaborative filtering, seals of approval, and value-added trails. We show how this is a specific instantiation of a more general `virtual document` architecture in which, with the help of lightweight distributed meta information, viewed documents can incorporate material that is dynamically integrated from multiple distributed sources. Development of that architecture is part of a larger project on Digital Libraries that we are engaged in.
	M. Röscheisen, C. Mogensen, and T. Winograd. Interaction design for shared world-wide web annotations. In Proceedings of the Conference on Human Factors in Computing Systems CHI'95, 1995.
	M. Röscheisen, C. Mogensen, and T. Winograd. A platform for third-party value-added information providers: Architecture, protocols, and usage examples. Technical report, Stanford University, May 1995.
	M. Röscheisen, T. Winograd, and A. Paepcke. Content rating and other third-party value-added applications for the world-wide web. D-Lib Magazine, August 1995. Imagine you want to know what your colleagues in the interest group DLissues have found worth seeing lately. In your browser, you select `Tour annotation set DLissues`, with the filter set to `annotations newly created since yesterday`. You get a report containing pointers to annotated locations in various documents; you inspect some of these links with a comment previewer. Sara evidently appreciated a paper on security in the proceedings of a conference last year; she gave it the highest ranking on her personal scale. You click on the link and jump to the annotated section in the paper. You scan it up and down and wonder whether the security research group you know at another university has any opinion on this paper. You turn on their annotation set SecurityPapers, which you can access free of charge since your school has a site licensing agreement. You see that they have made a `trailmarker annotation` to the top of the paper. You inspect the annotation icon with the previewer: it says that the paper you are viewing is really subsumed now by the one at a more recent conference which the trail marker points to. With another click you jump to this more recent paper, which turns out to be written even more clearly. You go back to reply to Sara's original comment and include a pointer to the SecurityPapers set.
	Martin Röscheisen, Christian Mogensen, and Terry Winograd. Shared web annotations as a platform for third-party value-added information providers: Architecture, protocols, and usage examples. Technical report, Computer Science Department, Stanford University, Nov 1994.
	Martin Röscheisen and Terry Winograd. A communication agreement framework of access/action control. In Proceedings of the 1996 IEEE Symposium on Research in Security and Privacy, 1996. Format: PostScript (286K)
	Martin Röscheisen and Terry Winograd. A network-centric design for relationship-based rights management. Journal of Computer Security, 5(3):249-254, 1997. Available at http://dbpubs.stanford.edu/pub/2000-46. The author's dissertation with the same title was completed at Stanford University. Main article for Röscheisen's work on the use of contract notions for digital library rights management.
	Martin Röscheisen, Terry Winograd, and Andreas Paepcke. Content ratings and other third-party value-added information: Defining an enabling platform. CNRI D-Lib Magazine, August 1995.
	Daniel E. Rose, Richard Mander, Tim Oren, Dulce B. Poncéleon, Gitt Salomon, and Yin Yin Wong. Content awareness in a file system interface: implementing the metaphor for organizing information. In SIGIR '93: Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval, pages 260-269, New York, NY, USA, 1993. ACM Press.
	Mendel Rosenblum and John K. Ousterhout. The design and implementation of a log-structured file system. In Proc. of the 13th Intl. ACM Symposium on Operating Systems Principles, pages 1-15, October 1991.
	Ruth A. Ross, Lois F. Kelso, Gary R. Broughton, and Edward J. Hopkins. Providing multiple levels of difficulty in Earthlab's digital library. In Proceedings of DL'96, 1996. Format: Not yet online.
	Gustavo Rossi, Daniel Schwabe, and Fernando Lyardet. Improving web information systems with navigational patterns. In Proceedings of the Eighth International World-Wide Web Conference, 1999. In this paper we show how to improve the architecture of Web information systems (WIS) using design patterns, in particular navigational patterns. We first present a framework to reason about the process of designing and implementing these applications. Then we introduce navigational patterns and show some prototypical patterns. We next show how these patterns have been used in some successful WIS. Finally, we explain how patterns are integrated into the development process of WIS.
	Mary Tork Roth and Peter M. Schwarz. Don't scrap it, wrap it! A wrapper architecture for legacy data sources. In Proceedings of the Twenty-third International Conference on Very Large Databases, pages 266-275, Athens, Greece, August 1997. VLDB Endowment, Saratoga, Calif.
	Dmitri Roussinov and Jose Robles. Web question answering through automatically learned patterns. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. While being successful in providing keyword-based access to web pages, commercial search portals, such as Google, Yahoo, AltaVista, and AOL, still lack the ability to answer questions expressed in a natural language. We explore the feasibility of a completely trainable approach to the automated question answering on the Web or large scale digital libraries. By using the inherent redundancy of large scale collections, each candidate answer found by the system is triangulated (confirmed or disconfirmed) against other possible answers. Since our approach is entirely self-learning and does not involve any linguistic resources, it can be easily implemented within digital libraries or Web search portals.
	Jeremy Rowe, Anshuman Razdan, and Arleyn Simon. Acquisition, representation, query and analysis of spatial data: A demonstration 3D digital library. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. The increasing power of techniques to model complex geometry and extract meaning from 3D information creates complex data that must be described, stored, and displayed to be useful to researchers. Responding to the limitations of two-dimensional (2D) data representations perceived by discipline scientists, the Partnership for Research in Spatial Modeling (PRISM) project at Arizona State University (ASU) developed modeling and analytic tools that raise the level of abstraction and add semantic value to 3D data. The goals are to improve scientific communication, and to assist in generating new knowledge, particularly for natural objects whose asymmetry limits study using 2D representations. The tools simplify analysis of surface and volume using curvature and topology to help researchers understand and interact with 3D data. The tools produced automatically extract information about features and regions of interest to researchers, calculate quantifiable, replicable metric data, and generate metadata about the object being studied. To help researchers interact with the information, the project developed prototype interactive, sketch-based interfaces that permit researchers to remotely search, identify and interact with the detailed, highly accurate 3D models of the objects. The results support comparative analysis of contextual and spatial information, and extend research about asymmetric man-made and natural objects.
	Neil C. Rowe. Virtual multimedia libraries built from the web. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. We have developed a tool MARIE-4 for building virtual libraries of multimedia (images, video, and audio) by automatically exploring (crawling) a specified subdomain of the World Wide Web to create an index based on caption keywords. Our approach uses carefully-researched criteria to identify and rate caption text, and employs both an expert system and a neural network. We have used it to create a keyword-based interface to nearly all nontrivial captioned publicly-accessible U.S. Navy images (667,573), video (8,290), and audio (2,499), called the Navy Virtual Multimedia Library (NAVMULIB).
	J. Rucker and M.J. Polanco. Siteseer: personalized navigation for the web. Communications of the ACM, 40(3):73-75, March 1997. Siteseer is a World Wide Web page recommendation system that uses an individual's bookmarks and the organization of bookmarks within folders for predicting and recommending relevant pages. Siteseer utilizes each user's bookmarks as an implicit declaration of interest in the underlying content, and the user's grouping behavior (such as the placement of subjects in folders) as an indication of semantic coherency or relevant groupings between subjects. In addition, Siteseer treats folders as a personal classification system which enables it to contextualize recommendations into classes defined by the user. Over time, Siteseer learns each user's preferences and the categories through which they view the world, and at the same time it learns, for each Web page, how different communities or affinity-based clusters of users regard it. Siteseer then delivers personalized recommendations of online content and Web pages, organized according to each user's folders.
	Daniela Rus and James Allan. Structural queries in electronic corpora. In DAGS '95, 1995. Format: Not Yet On-line. Audience: Information Retrieval, computer scientists. References: 22. Relevance: Low-Medium. Abstract: Automatic construction of hyperlinks, based on structure of document (inferred from LaTeX source or PostScript image). So, for example, a query relating to a figure would link to definitions, theorems, and proofs related to the figure (automatically deduced), possibly over many documents. TexTile-like interface, curved into a circle to allow intra-document links.
	Ryan Richardson and Edward A. Fox. Using concept maps in digital libraries as a cross-language resource discovery tool. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. The concept map, first suggested by Joseph Novak, has been extensively studied as a way for learners to increase understanding. We are automatically generating and translating concept maps from electronic theses and dissertations, for both English and Spanish, as a DL aid to discovery and summarization.
	Jeffrey A. Rydberg-Cox. Automatic disambiguation of Latin abbreviations in early modern texts for humanities digital libraries [short paper]. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. Early modern books written in Latin contain many abbreviations of common words that are derived from earlier manuscript practice. While these abbreviations are usually easily deciphered by a reader well-versed in Latin, they pose technical problems for full-text digitization: they are difficult to OCR or to have typed, and, if they are not expanded correctly, they limit the effectiveness of information retrieval and reading support tools in the digital library. In this paper, I will describe a method for the automatic expansion and disambiguation of these abbreviations.
	M. Sahami, M. Hearst, and E. Saund. Applying the multiple cause mixture model to text categorization. In Proceedings of the Thirteenth International Conference on Machine Learning, pages 435-443. Morgan Kaufmann, 1996. At http://dbpubs.stanford.edu/pub/1996-78.
	Mehran Sahami. Learning limited dependence Bayesian classifiers. In Second International Conference on Knowledge Discovery in Databases, 1996. At ftp://starry.stanford.edu/pub/sahami/papers/kdd96-learn-bn.ps.
	Mehran Sahami. Applications of machine learning to information access. In AAAI-97, Proceedings of the Fourteenth National Conference on Artificial Intelligence, 1997.
	Mehran Sahami. Using Machine Learning to Improve Information Access. PhD thesis, Stanford University, Computer Science Department, 1998.
	Mehran Sahami, Salim Yusufali, and Michelle Baldonado. SONIA: A service for organizing networked information autonomously. In Proceedings of the Third ACM International Conference on Digital Libraries, 1998.
	Mehran Sahami, Salim Yusufali, and Michelle Q. Wang Baldonado. Real-time full-text clustering of networked documents. In AAAI-97, Proceedings of the Fourteenth National Conference on Artificial Intelligence, 1997. We describe initial results with a service for clustering networked documents that has been successfully integrated into the Stanford Digital Libraries Testbed.
	J. Sairamesh, C. Nikolaou, D. Ferguson, and Y. Yemini. Economic framework for pricing and charging in digital libraries. D-Lib Magazine, Feb 1996. Format: HTML Document.
	J. Sairamesh, Y. Yemini, D. F. Ferguson, and C. Nikolaou. A framework for pricing services in digital libraries. In Proceedings of DL'96, 1996. Format: Not yet online.
	Tetsuo Sakaguchi, Akira Maeda, Takehisa Fujita, Shigeo Sugimoto, and Koichi Tabata. A browsing tool of multi-lingual documents for users without multi-lingual fonts. In Proceedings of DL'96, 1996. Format: Not yet online.
	Airi Salminen et al. From text to hypertext by indexing. ACM Transactions on Information Systems, 13(1):69-99, January 1995. A model is presented for converting a collection of documents to hypertext by means of indexing. The documents are assumed to be semistructured, i.e., their text is a hierarchy of parts, and some of the parts consist of natural language. The model is intended as a framework for specifying hypertextual reading capabilities for specific application areas and for developing new automated tools for the conversion of semistructured text to hypertext. In the model, two well-known paradigms, formal grammars and document indexing, are combined. The structure of the source text is defined by a schema that is a constrained context-free grammar. The hierarchic structure of the source may thus be modeled by a parse tree for the grammar. The effect of indexing is described by the grammar transformations. The new grammar, called an indexing schema, is associated with a new parse tree where some text parts are index elements. The indexing schema may hide some parts of the original documents or the structure of some parts. For information retrieval, parts of the indexed text are considered to be nodes of a hypergraph. In the hypergraph-based information access, the navigation capabilities of hypertext systems are combined with the querying capabilities of information retrieval systems.
	Gerard Salton. Introduction to modern information retrieval. McGraw-Hill, New York, 1983. Math & Comp Sci/Green Z699.S313
	Gerard Salton. Automatic Text Processing. Addison-Wesley, Reading, Mass., 1989.
	Jerome H. Saltzer. Technology, networks, and the library of the year 2000. In A. Bensoussan and J.-P. Verjus, editors, Future Tendencies in Computer Science, Control, and Applied Mathematics. Proceedings of the International Conference on the Occasion of the 25th Anniversary of INRIA, pages 51-67, New York, 1992. Springer-Verlag. An under-appreciated revolution in the technology of on-line storage, display, and communications will, by the year 2000, make it economically possible to place the entire contents of a library on-line, in image form, accessible from computer workstations located anywhere, with a hardware storage cost comparable to one year's operational budget of that library. In this paper we describe a vision in which one can look at any book, journal, paper, thesis, or report in the library without leaving the office, and can follow citations by pointing; the item selected pops up immediately in an adjacent window. To bring this vision to reality, research with special attention to issues of modularity and scale will be needed, on applying the client/server model, on linking data, and on the implications of storage that must persist for decades.
	J.H. Saltzer, D.P. Reed, and D.D. Clark. End-to-end arguments in system design. ACM Transactions on Computer Systems, 2(4):277-288, November 1984. Argues that functionality should be placed at the higher application layers rather than at low layers. Includes a security example.
	J. Alfredo Sánchez. User agents in the interface to digital libraries. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document (7K). Audience: Non-technical. References: 3. Links: 1. Relevance: Low-medium. Abstract: Suggests user or interface agents are a valuable organizational tool. Examples of missions that can be delegated to agents in a digital library include notifying the user when information of interest is added or updated, filtering retrieved information according to the user's needs or preferences, and handling routine administrative procedures in the library (such as copyright and billing procedures). Agents may also provide hints to the user based on their knowledge of the library or on observed usage by other users, or contact other users (or user agents) to obtain needed information. Goes on to list properties that agents should have like: security, inspectability, adaptivity, etc.
	David Sankoff and J.B. Kruskal, editors. Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison. Addison-Wesley, 1983. Found this in hewl98 (Smith/McNab/Witten) as a reference for closest editing distance.
	Manojit Sarkar and Marc H. Brown. Graphical fisheye views of graphs. In Proceedings of the Conference on Human Factors in Computing Systems CHI'92, 1992.
	Manojit Sarkar, Scott S. Snibbe, Oren J. Tversky, and Steven P. Reiss. Stretching the rubber sheet: a metaphor for viewing large layouts on small screens. In UIST '93: Proceedings of the 6th annual ACM symposium on User interface software and technology, pages 81-91, New York, NY, USA, 1993. ACM Press.
	Risto Sarvas, Erick Herrarte, Anita Wilhelm, and Marc Davis. Metadata creation system for mobile images. In Proceedings of the 2nd international conference on Mobile systems, applications, and services, pages 36-48. ACM Press, 2004.
	M. Satyanarayanan. Mobile computing: Where's the tofu? Mobile Computing and Communications Review, 1(1):17-21, 1997. A general article on recent research in mobile computing, focuses on the challenges to be faced and any possible insights to offer the field of computer science. Covers constraints of mobility, adaptation strategies and several other areas.
	Linda Schamber. Relevance and information behavior. Annual Review of Information Science and Technology (ARIST), 29:3-48, 1994. This is a long survey of what people have said/thought about information relevance. (See also Mizz97)
	Bruce Schatz, Ann Bishop, William Mischo, and Joseph Hardin. Digital library infrastructure for a university engineering community. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document. Audience: Non-technical, librarians, funders (some jargon). References: Not included. Links: 1. Relevance: Low. Abstract: This proposal describes the DL project at U. of Illinois. It will focus on engineering documents. One idea contained within is searching `concept space` rather than object space, where concepts are a graph of co-occurring terms.
	Bruce R. Schatz, Eric H. Johnson, Pauline A. Cochrane, and Hsinchun Chen. Interactive term suggestion for users of digital libraries: Using subject thesauri and co-occurrence lists for information retrieval. In Proceedings of DL'96, 1996. Format: Not yet online.
	S. Schechter, M. Krishnan, and M. Smith. Using path profiles to predict HTTP requests. In Proceedings of 7th International World Wide Web Conference, 1998. Webmasters often use the following rule of thumb to ensure that HTTP server performance does not degrade when traffic is at its heaviest: provide twice the server capacity required to handle your site's average load. As a result, the server will spend half of its CPU cycles idle during normal operation. These cycles could be used to reduce the latency of a significant subset of HTTP transactions handled by the server. In this paper we introduce the use of path profiles for describing HTTP request behavior and describe an algorithm for efficiently creating these profiles. We then show that we can predict request behavior using path profiles with high enough probability to justify generating dynamic content before the client requests it. If requests are correctly predicted and pre-generated by the server, the end user will witness significantly lower latencies for these requests.
	Bill N. Schilit, Anthony LaMarca, Gaetano Borriello, William G. Griswold, David McDonald, Edward Lazowska, Anand Balachandran, Jason Hong, and Vaughn Iverson. Challenge: ubiquitous location-aware computing and the ``place lab'' initiative. In WMASH '03: Proceedings of the 1st ACM international workshop on Wireless mobile applications and services on WLAN hotspots, pages 29-35. ACM Press, 2003.
	G.A. Schloss and M. Stonebraker. Highly redundant management of distributed data. In Proceedings of Workshop on the Management of Replicated Data, pages 91-95. IEEE Computer Society, November 1990. Introduces the idea of RADDs (Redundant Array of Distributed Disks). This paper expands the idea of RAIDs to disks that are connected by a reliable high speed data communication network. Main results on storage space utilization, I/O performance and reliability are outlined.
	J. L. Schnase and E. L. Cunnius. The studyspace project: collaborative hypermedia in nomadic computing environments. Communications of the ACM, 38(8):72-3, Aug 1995. The StudySpace Project at Washington University School of Medicine is bringing together an assortment of computing and communications technologies to address the challenges of health sciences education. Collaborative hypermedia is the key integrating technology, and our goal is to provide effective any-time/any-place use of information. In StudySpace, we are using LiveBoards, mobile computers, ATM networks and wireless LANs. Lotus Notes is the primary software system. Spatial and temporal boundaries are reduced by using Notes shared and private databases, while interdocument linking allows the structuring, personalization and continued evolution of information. Asynchronous interactions over this material among developers, students and teachers are supported by Notes' discussion group databases and integrated hypermedia mail facility. However, our experience with current releases of Lotus Notes raises two important design issues that future systems must address in multiplatform, nomadic computing environments: (1) mobile interfaces, and (2) synchronous personalization.
	John L. Schnase, John J. Leggett, Edward S. Metcalfe, Nancy R. Morin, Edward L. Cunnius, Jonathan S. Turner, Richard K. Furuta, Leland Ellis, Michael S. Pilant, Richard E. Ewing, Scott W. Hassan, and Mark E. Frisse. The CoLib project: Enabling digital botany for the 21st century. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document (49K). Audience: Non-technical, funders, botanists. References: 30. Links: 1. Relevance: Low. Abstract: U. of Missouri's digital libraries proposal, in the botany domain. Points: need for collaboration among DL users, ATM as network platform.
	Werner Schoggl. Establishing computer-based information services in the school library. In Proceedings of DL'96, 1996. Format: Not yet online.
	C. Scholefield. Mobile telecommunications and nomadic computing in Asia. In 1997 IEEE 6th International Conference on Universal Personal Communications Record. Bridging the Way to the 21st Century, ICUPC '97. Conference Record (Cat. No.97TH8255), volume 2, pages 908-12, 1997. In November 1996 a number of companies joined together with a research consortium known as the Microelectronics and Computer Technology Corporation (MCC) to form a North American team on a strategic technology benchmarking tour of Asia. The team met with several companies and government authorities. This paper describes the state of the art and anticipated future trends observed during the tour. In particular we found high growth in both personal digital cellular (PDC) and the personal handyphone system (PHS) which are projected to reach 50 million subscribers by 2000. We also gained insights into third generation research activities and a road map to fourth generation wireless networks.
	M. Schunter and M. Waidner. Architecture and design of a secure electronic marketplace. In H. Lubich, editor, Proceedings of JENC8. 8th Joint European Networking Conference (JENC8). Diversity and Integration: The New European Networking Landscape, Amsterdam, Netherlands, 1997. TERENA. Backed by the European Commission, a consortium of partners from European industry, financial institutions, and academia has embarked on a research project to develop the fundamentals of secure electronic commerce. The goal of the ACTS Project SEMPER (Secure Electronic Marketplace for Europe) is to provide the first open and comprehensive solution for secure commerce over the Internet and other public information networks. SEMPER's flexible open architecture is based on a model of electronic commerce which comprehends a business scenario as a sequence of transfers and fair exchanges of business items, which are payments, data, or rights. This is reflected in the architecture: The exchange and transfer layer handles transfers and fair exchanges of items. The commerce layer provides methods for downloading certified commerce services and the necessary trust management. The commerce services implement the terms of business of a seller using the exchange and transfer layer services. A prototype of this architecture implemented in the Java programming language will be trialed for sales of multimedia courseware (EUROCOM, Athens, GR), on-line consultancy and subscriptions (FOGRA, München, D) as well as mail-order retailing (Otto-Versand, Hamburg, D). It will integrate the payment systems SET (provided by IBM), Chipper (provided by KPN Research), and ecash (provided by DigiCash). The prototype uses a distinguished user interface for trustworthy user input and output, which enables SEMPER to be used on secure hardware.
	Michael Schwartz. Report of the distributed indexing/searching workshop. May 1996. At http://www.w3.org/pub/WWW/Search/9605-Indexing-Workshop/.
	Gideon Schwarz. Estimating the dimension of a model. The Annals of Statistics, 6:461-464, 1978.
	Edward Sciore, Michael Siegel, and Arnon Rosenthal. Using semantic values to facilitate interoperability among heterogeneous information systems. ACM Transactions on Database Systems, 19(2):254-290, June 1994. Provides a theory of `semantic values` as a unit of exchange that facilitates semantic interoperability between heterogeneous information systems.
	Michael Seadle, J. R. Deller, and Aparna Gurijala. Why watermark? The copyright need for an engineering solution. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. An important research component in the creation of the National Gallery of the Spoken Word (NGSW) is the development of watermarking technologies for the audio library. In this paper we argue that audio watermarking is a particularly desirable means of intellectual property protection. There is evidence that the courts consider watermarks to be a legitimate form of copyright protection. Watermarking facilitates redress, and represents a form of copyright protection that universities can use without being inconsistent in their mission to disseminate knowledge.
	W. Brent Seales and Yun Lin. Digital restoration using volumetric scanning. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. In this paper we present a new, nondestructive method for revealing inaccessible text buried within damaged books and scrolls. The method is based on volumetric scanning followed by data modeling and physically-based simulation. We show by experiment that it is possible to recover readable text from objects without physically opening or damaging them. In handling damaged collections, conservators often face a choice between two frustrating alternatives: indefinite preservation without analysis, or irreversible physical harm for the sake of potential discovery. We believe that this work creates a new opportunity that embraces both the need to preserve and the possibility for complete analysis.
	Searchlight Software, Inc. Searchlight Software's 'Project Odessa' brings BBS-like features to the web. http://www.searchlight.com/odessa/odessa.htm, 1995.
	Glen M. Secor. Legal aspects of electronic publishing: Look both ways before crossing this street. In DAGS '95, 1995. Format: Not Yet On-line. Audience: Publishers and authors. References: 9. Relevance: Low. Abstract: An attorney's point of view on the problems of electronic rights for published works. Problems include: Moral rights (much easier for a work to be incorporated into something else in a way the author doesn't like); duration of rights and what rights are granted: publishers have typically asked for rights in `all media hereinafter discovered`, something authors are wary of giving away, esp. when many publishers aren't really in the position to exploit electronic rights. How are royalties granted in the electronic age? Should they be higher because costs are lower, or lower because of new costs of transferring work? Should the publisher have the right to sublicense? Bottom line: it's a very unclear area. Forethought can help, as well as an understanding of other parties' positions.
	Erik Selberg and Oren Etzioni. Multi-service search and comparison using the MetaCrawler. In Proceedings of the 4th International WWW Conference, Boston, Mass., December 1995.
	Java servlet technology. http://java.sun.com/products/servlet/.
	Mark England and Melissa Shaffer. Librarians in the digital library. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document (6K). Audience: Librarians and Digital Library researchers. References: 0. Links: 1. Relevance: Medium. Abstract: Offers predictions/suggestions for the role of librarians as `teaching, consulting, researching, preserving intellectual and access freedom, and collaborating in the design, application, and maintenance of information access systems.`
	U. Shardanand and P. Maes. Social information filtering: Algorithms for automating 'word of mouth'. In Proceedings of the Conference on Human Factors in Computing Systems CHI'95, New York, 1995. ACM. We describe a technique for making personalized recommendations from any type of database to a user based on similarities between the interest profile of that user and those of other users. In particular, we discuss the implementation of a networked system called Ringo, which makes personalized recommendations for music albums and artists. Ringo's database of users and artists grows dynamically as more people use the system and enter more information. Four different algorithms for making recommendations by using social information filtering were tested and compared. We present quantitative and qualitative results obtained from the use of Ringo by more than 2000 people.
	Upendra Shardanand and Pattie Maes. Social information filtering: Algorithms for automating 'word of mouth'. In Proceedings of the Conference on Human Factors in Computing Systems CHI'95. Addison-Wesley, 1995.
	C. Shen, K. Everitt, and K. Ryall. UbiTable: Impromptu face-to-face collaboration on horizontal interactive surfaces. In Proceedings of UbiComp 2003, pages 281-288, 2003.
	Chia Shen, Neal Lesh, and Frédéric Vernier. Personal digital historian: story sharing around the table. interactions, 10(2):15-22, 2003.
	S. Shen, R. Mukkamala, A. Wadaa, C. Zhang, H. Abdel-Wahab, K. Maly, A. Liu, and M. Yuan. An interoperable architecture for digital information repositories. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document (29K + Picture). Audience: Slightly technical, funders. References: 11. Links: 1. Relevance: High. Abstract: Basically a proposal that mirrors the Stanford digital library project. They present a three-layer architecture (User Interface Layer, Interoperability Layer, Resource Repository Layer) that corresponds closely with the interface clients, InfoBus, and Information Source model. Also includes a brief description of mechanisms for Gopher, WAIS, and Archie. Suggests a protocol using a minimal set of efficient primitives that sources would have to provide to be part of the library, but also expects the set to be extensible.
	M. A. Shepherd, C.R. Watters, and F.J. Burkowski. Digital libraries for electronic news. In Advances in Digital Libraries '95, 1995. Format: Not Yet Online.
	D. Sherertz, M. Tuttle, R. Carlson, R. Acuff, and L. Fagan. Mobile pen-based access to knowledge: Prototype for pen-based, handheld, wireless PC access to PDQ and CANCERLIT databases. Technical report, Lexical Technology, Inc. & Stanford University, 1997. National Cancer Institute SBIR Contract N43-CO-33066 Phase I final report. SBIR topic no. 165: Prototype for Pen-Based, Handheld, Wireless PC Access to PDQ and CANCERLIT Databases.
	Paraic Sheridan and Jean Paul Ballerini. Experiments in multilingual information retrieval using the SPIDER system. In Proceedings of the Nineteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 58-65, Zurich, Switzerland, 1996. This paper introduces an approach to multilingual information retrieval based on the use of thesaurus-based query expansion techniques applied over a collection of comparable multilingual documents. It shows that the SPIDER system retrieves Italian documents in response to user queries written in German with better effectiveness than a baseline system evaluating Italian queries against Italian documents.
	Amit Sheth, editor. Proceedings NSF Workshop on Workflow and Process Automation in Information Systems: State-of-the-art and Future Directions, Athens, Georgia, May 1996. At http://lsdis.cs.uga.edu/activities/NSF-workflow/.
	Amit P. Sheth and James A. Larson. Federated database systems for managing distributed, heterogeneous, and autonomous databases. ACM Computing Surveys, 22(3):183-236, September 1990. This paper defines a reference architecture for distributed database management systems from system and schema viewpoints and shows how various federated database systems (FDBS) can be developed. It then defines a methodology for developing one of the popular architectures of an FDBS. Finally, it discusses critical issues related to developing and operating FDBS.
	B. Sheth and P. Maes. Evolving agents for personalized information filtering. In Proceedings of the Ninth Conference on Artificial Intelligence for Applications. IEEE Computer Society Press, 1993. Describes how techniques from artificial life can be used to evolve a population of personalized information filtering agents. The technique of artificial evolution and the technique of learning from feedback are combined to develop a semi-automated information filtering system which dynamically adapts to the changing interests of the user. Results of a set of experiments are presented in which a small population of information filtering agents was evolved to make a personalized selection of news articles from the USENET newsgroups. The results show that the artificial evolution component of the system is responsible for improving the recall rate of the selected set of articles, while the learning-from-feedback component improves the precision rate.
	Jonah Shifrin, Bryan Pardo, Colin Meek, and William Birmingham. HMM-based musical query retrieval. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. We have created a system for music search and retrieval. A user sings a theme from the desired piece of music. Pieces in the database are represented as hidden Markov models (HMMs). The query is treated as an observation sequence and a piece is judged similar to the query if its HMM has a high likelihood of generating the query. The top pieces are returned to the user in rank-order. This paper reports the basic approach for the construction of the target database of themes, encoding and transcription of user queries, and the results of initial experimentation with a small set of sung queries.
	N. Shivakumar and H. Garcia-Molina. The SCAM approach to copy detection in digital libraries. CNRI D-Lib Magazine, November 1995.
	Narayanan Shivakumar and Héctor García-Molina. SCAM: A copy detection mechanism for digital documents. In Proceedings of the Second Annual Conference on the Theory and Practice of Digital Libraries, 1995. Format: PostScript Version. Audience: Computer scientists, information retrieval experts, technical. References: 26. Links: 0. Relevance: High. Abstract: Describes SCAM, a word-based registration mechanism for copy detection. Considers traditional IR similarity measures (cosine, vector) and shows how they fall short for copy detection. Introduces a new measure and contrasts it with the sentence-based COPS approach. Experimental results on comparison of Netnews articles.
	Narayanan Shivakumar and Héctor García-Molina. Building a scalable and accurate copy detection mechanism. In Proceedings of the First ACM International Conference on Digital Libraries, 1996. Format: pdf, ps
	Narayanan Shivakumar, Hector Garcia-Molina, and Chandra Chekuri. Filtering with approximate predicates. In Proceedings of the Twenty-fourth International Conference on Very Large Databases, pages 263-274, New York City, USA, 1998. VLDB Endowment, Saratoga, Calif.
	Ben Shneiderman, David Felsman, Anne Rose, and Xavier Ferre Grau. Visualizing digital library search results with categorical and hierarchical axes. In Proceedings of the Fifth ACM International Conference on Digital Libraries, 2000. Digital library search results are usually shown as a textual list, with 10-20 items per page. Viewing several thousand search results at once on a two-dimensional display with continuous variables is a promising alternative. Since these displays can overwhelm some users, we created a simplified two-dimensional display that uses categorical and hierarchical axes, called hieraxes. Users appreciate the meaningful and limited number of terms on each hieraxis. At each grid point of the display we show a cluster of color-coded dots or a bar chart. Users see the entire result set and can then click on labels to move down a level in the hierarchy. Handling broad hierarchies and arranging for imposed hierarchies led to additional design innovations. We applied hieraxes to a digital video library of science topics used by middle school teachers, a legal information system, and a technical library using the ACM Computing Classification System. Feedback from usability testing with 32 subjects revealed strengths and weaknesses.
	Ben Shneiderman and Hyunmo Kang. Direct annotation: A drag-and-drop strategy for labeling photos. In Proceedings of the International Conference on Information Visualization, May 2000.
	Ben Shneiderman, Azriel Rosenfeld, Gary Marchionini, William G. Holliday, Glenn Ricart, Christos Faloutsos, and Judith P. Dick. QUEST: Query environment for science teaching. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document (31K). Audience: non-technical, funders. References: 14. Links: 1. Relevance: Low-Medium. Abstract: Describes U. of Maryland digital libraries proposal. Focused on user interface, search engines, multimedia, information capture (e.g., page segmentation). In the context of science education.
	Sarah L. Shreeves, Christine Kirkham, Joanne Kaczmarek, and Timothy W. Cole. Utility of an OAI service provider search portal [short paper]. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. The Open Archives Initiative (OAI) Protocol for Metadata Harvesting (PMH) facilitates efficient interoperability between digital collections, in particular by enabling service providers to construct, with relatively modest effort, search portals that present aggregated metadata to specific communities. This paper describes the experiences of the University of Illinois at Urbana-Champaign Library as an OAI service provider. We discuss the creation of a search portal to an aggregation of metadata describing cultural heritage resources. We examine several key challenges posed by the aggregated metadata and present preliminary findings of a pilot study of the utility of the portal for a specific community (student teachers). We also comment briefly on the potential for using text analysis tools to uncover themes and relationships within the aggregated metadata.
	Howard Jay Siegel, Henry G. Dietz, and John K. Antonio. Software support for heterogeneous computing. ACM Computing Surveys, 28(1):237-239, March 1996. Part of ACM Computing Surveys' special issue on Perspectives in Computer Science. Describes the support necessary for executing subtasks on different machines with diverse execution environments.
	A. Silberschatz, J. Peterson, and P. Galvin. Operating System Concepts. Addison-Wesley, 1991. Textbook.
	Mario J. Silva, Bruno Martins, Marcirio Chaves, and Nuno Cardoso. Adding geographic scope to web resources. In Proceedings of the Workshop on Geographic Information Retrieval, 2004.
	Sergiu S. Simmel and Ivan Godard. Metering and licensing of resources: Kala's general purpose approach. In IP Workshop Proceedings, 1994. Format: HTML Document (72K). Audience: information/software users & producers on the network. References: 9. Links: 0. Relevance: medium. Abstract: Describes a revenue collection mechanism for software/data in a networked environment. The scheme enables either pay-per-use or licensed arrangements. Stresses its recursive nature, so that components may be made up of other components. Primarily a non-technical introduction, followed by a specification of the API and the resource-acquiring algorithm in pseudo-code. Mentions concerns about people trying to break the system, and discusses a `cookie` algorithm, but it didn't seem like a complete answer.
	M.P. Singh and M.N. Huhns. Automating workflows for service order processing: Integrating AI and database technologies. IEEE Expert, 9(5):19-23, October 1994. We have developed an AI-based architecture that automatically manages workflows, and we have implemented a prototype that executes on top of a distributed computing environment to help a telecommunications company better provide a service that requires coordination among many operation support systems and network elements. The activities involve several database systems, user interfaces, and application programs.
	Marvin A. Sirbu. Internet billing service design and prototype implementation. In IP Workshop Proceedings, 1994. Format: HTML Document (33K + 5 pictures). Audience: Service providers and users. References: 4 notes. Links: 0. Relevance: Medium-High. Abstract: Details an account-based billing server. Lists design requirements, and motivates the need for such a service. Describes the steps involved in a transaction. The buyer sends a purchase agreement (including price) to the seller, and the seller sends an independent copy to the billing server. If both match, the server checks that the buyer has sufficient funds, then tells the service to go ahead. The service does the work, then sends an invoice to the buyer via the billing server. The billing server reconciles accounts monthly. Other features: access control, hierarchical organization of corporate accounts, price negotiation & spending caps.
	E. Sloan and A. Okerson. Columbia working group on electronic texts. In JEP, 1994. Format: HTML Document (20K). Audience: Academicians, esp. university librarians and presses. References: 0. Links: 0. Relevance: low. Abstract: Report from a meeting of university representatives on the need to make electronic publishing viable (primarily economic and timeliness) and the prerequisites for doing so (getting critical mass of papers in a field, ease of use, tenure considerations). Suggests universities encourage their faculty to publish on their own university pre-print servers, and have differing levels of status for discussion, pre-print, accepted, etc.
	Arnold W. M. Smeulders, Marcel Worring, Simone Santini, Amarnath Gupta, and Ramesh Jain. Content-based image retrieval at the end of the early years. IEEE Trans. Pattern Anal. Mach. Intell., 22(12):1349-1380, 2000.
	G. J. M. Smit, P. J. M. Havinga, and D. van Os. The harpoon security system for helper programs on a pocket companion. In Proceedings. 23rd Euromicro Conference: New Frontiers of Information Technology (Cat. No.97TB100167), pages 231-8, 1997. We present a security framework for executing foreign programs, called helpers, on a Pocket Companion: a wireless hand-held computer. A helper program as proposed in this paper is a service program that can migrate once from a server to a Pocket Companion or vice-versa. A helper program is convenient, provides environment awareness and allows asynchronous interaction. Moreover, helpers can be used to save processing power and to reduce communication. By migrating to the location of a resource, a helper can access the resource more efficiently. This is particularly attractive for mobile computing, where the network conditions can be poor and unreliable, and because it does not require a permanent connectivity. Security is a significant concern for helpers, as the user of a Pocket Companion receiving a piece of code for execution may require strong assurances about the helper's behaviour. The best way to achieve high security is to use a combination of several methods. We are designing a prototype of a helper system, called Harpoon, on top of the Inferno operating system.
	Brian K. Smith, Erik Blankinship, Alfred Ashford III, Michael Baker, and Timothy Hirzel. Inquiry with imagery: historical archive retrieval with digital cameras. In MULTIMEDIA '99: Proceedings of the seventh ACM international conference on Multimedia (Part 1), pages 405-408, New York, NY, USA, 1999. ACM Press.
	David A. Smith. Detecting events with date and place information in unstructured text. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. Digital libraries of historical documents provide a wealth of information about past events, often in unstructured form. Once dates and place names are identified and disambiguated, using methods that can differ by genre, we examine collocations to detect events. Collocations can be ranked by several measures, which vary in effectiveness according to type of events, but the log-likelihood measure offers a reasonable balance between frequently and infrequently mentioned events and larger and smaller spatial and temporal ranges. Significant date-place collocations can be displayed on timelines and maps as an interface to digital libraries. More detailed displays can highlight key names and phrases associated with a given event.
	David A. Smith, Anne Mahoney, and Gregory Crane. Integrating harvesting into digital library content. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. The Open Archives Initiative has gained success by aiming between complex federation schemes and low functionality web crawling. Much information still remains hidden inside documents catalogued by OAI metadata. We discuss how subdocument information can be exposed by data providers and exploited by service providers. We demonstrate services for citation reversal and name and term linking with harvested data in the Perseus Project's document management system.
	John Miles Smith, Philip A. Bernstein, Umeshwar Dayal, Nathan Goodman, Terry Landers, Ken W.T. Lin, and Eugene Wong. Multibase - integrating heterogeneous distributed database systems. In AFIPS National Computer Conf., 1981.
	Lloyd A. Smith, Elaine F. Chiu, and Brian L. Scott. A speech interface for building musical score collections. In Proceedings of the Fifth ACM International Conference on Digital Libraries, 2000. Building machine readable collections of musical scores is a tedious and time consuming task. The most common interface for performing music data entry is a mouse and toolbar system; using the mouse, the user selects a rhythm (note shape) from a toolbar, then drags the note to the correct position on the staff. We compare the usability of a hybrid speech and mouse-driven interface to a traditional mouse-driven one. The speech-enhanced interface allows users to enter note rhythms by voice, while still using the mouse to indicate pitches. While task completion time is nearly the same, users (N=13) significantly preferred the speech-augmented interface. A second study using the first two authors of this paper (N=2) indicates that experienced users can enter music 11 speech interface. Many users expressed a desire to enter pitches, as well as rhythms, by speech. A third study, however, shows that the recognizer is unable to reliably distinguish among A, B, C, D, E, F and G (N=10).
	Lloyd A. Smith, Rodger J. McNab, and Ian H. Witten. Sequence-based melodic comparison: A dynamic-programming approach. In Walter B. Hewlett and Eleanor Selfridge-Field, editors, Melodic Similarity: Concepts, Procedures, and Applications, pages 101-117. MIT Press and Center for Computing in the Humanities (CCARH), Stanford University, 1998. Nice summary of dynamic programming and edit distance. Shows how to apply the concept to music.
	Terence R. Smith. A digital library for geographically referenced materials. Computer, 29(5):54-60, May 1996.
	Terence R. Smith, Steven Geffner, and Jonathan Gottsegen. A general framework for the meta-information and catalogs in digital libraries, 1996. At http://www.nml.org/resources/misc/metadata/proceedings/smith/ieee.html. We present a general framework to support the modeling of digital documents and user queries in the context of digital libraries (DLs). The basis of the framework is a four-component model of a DL catalog involving a document modeling component, a query modeling component, a match component, and a catalog interoperability component. Meta-information in such a catalog provides models of library documents and facilitates efficient access to information represented in the documents. In particular, meta-information is conceptualized in terms of sets of relations between nominal representations of library documents and their properties, and sets of relations between document properties. The properties of the documents are modeled in meta-information in terms of a multiplicity of languages which vary between the catalog components and between catalogs. Each of the catalog components is modeled in terms of a set of formal systems related to the languages employed in the component. Using this framework, we discuss the two critical issues of catalog intraoperability and catalog interoperability. The framework provides a basis both for the rational design of meta-information and catalogs in DL contexts, and for an analysis and resolution of the intraoperability and interoperability issues. We provide examples of the issues discussed in terms of the Alexandria Digital Library.
	Snakefeet. Snakeeyes. Snakefeet website: http://www.snakefeet.com/.
	MMM Snyman and M Jansen van Rensburg. Revolutionizing name authority control. In Proceedings of the Fifth ACM International Conference on Digital Libraries, 2000. A new model has been developed for the standardization of names in bibliographic databases. This paper describes the model and its implementation and also compares it with an existing model. The results show that the new model will revolutionize name authority control and will also improve on the existing NACO model. A prototype that was developed also indicates the technical feasibility of the model's implementation.
	Caj Sodergard, Matti Aaltonen, Sari Hagman, Mikko Hiirsalmi, Timo Jarvinen, Eija Kaasinen, Timo Kinnunen, Juha Kolari, Jouko Kunnas, and Antti Tammela. Integrated multimedia publishing: Combining TV and newspaper content on personal channels. In Proceedings of the Eighth International World-Wide Web Conference, 1999. Fast networks enable the delivery of TV and newspaper content over an Internet connection. This enables new types of integrated publications that include features from both media. The IMU system, described in this paper, automatically integrates newspaper and TV content into a continuously updated World Wide Web-multimedia publication. An active proxy server pursues the integration and delivers the publication through an ATM fibre link to fast networks, such as the bi-directional cable TV network and the ADSL telephone network, providing near-TV quality. The users read the IMU publication from the Internet on their PCs with normal World Wide Web-browsers. You can also watch the publication on your Internet TV set. The proxy server captures metadata from the Web sites and from the editorial systems of the IMU content providers. In addition, the system keeps track of the choices of the user and proposes what news the user and his/her social group would most probably be interested in. The user interface is based on personalisable channels, which gather news material according to the priorities defined by the editors and the users. For ease of use the proxy server automatically paginates the articles into a sequence of browsable pages. News articles and TV news are linked to each other through automatic association. In a field trial lasting eight months, 62 people used the service through the bi-directional cable TV network in their homes. The average IMU session was brief, focusing on a few fresh articles, and took place in the evening at prime time or in the morning. Both TV and newspaper content interested the users. Personalisation was not too attractive: only some of the users created their own channels. In the user interviews, the integration of content was viewed as the key feature.
	Smartcode Software. Handweb. Smartcode Software website: http://www.smartcodesoft.com/.
	K. Sollins and L. Masinter. Functional requirements for uniform resource names. Technical Report RFC 1737, Network Working Group, December 1994. At http://info.internet.isi.edu/in-notes/rfc/files/rfc1737.txt. This RFC introduces URNs.
	Sang Hyuk Son. Replicated data management in distributed database systems. SIGMOD Record, 17(4):62-9, December 1988. This paper classifies different synchronization methods. The dimensions explored are: optimistic vs. pessimistic algorithms, syntactic vs. semantic approaches, majority algorithms, weighted voting algorithms, quorum and ADTs, and special copy approaches. The descriptions cover the underlying mechanisms and the type of information they use in ordering the operations of the transactions.
	Susan Sontag. On Photography. Picador, New York, NY, 1977.
	Von-Wun Soo, Chen-Yu Lee, Chung-Cheng Lin, Shu Lei Chen, and Ching-chih Chen. Automated semantic annotation and retrieval based on sharable ontology and case-based learning techniques. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. Effective information retrieval (IR) using domain knowledge and semantics is one of the major challenges in IR. In this paper we propose a framework that can facilitate image retrieval based on the sharable domain ontology and thesaurus. In particular, a case-based learning (CBL) using natural language phrase parser is proposed to convert a natural language query into a resource description framework (RDF) format, a semantic-web standard of metadata description that supports machine readable semantic representation. This same parser is also extended to perform the semantic annotation on the descriptive metadata of images and convert metadata automatically into the same RDF representation. The retrieval of images can then be conducted by matching the semantic and structural descriptions of the user query with those of the annotated descriptive metadata of images. We tested our problem domain by retrieving the historical and cultural images taken from Dr. Ching-chih Chen's `First Emperor of China` CD-ROM as part of our productive international digital library collaboration. We have constructed and implemented the domain ontology, a Mandarin Chinese thesaurus, as well as the similarity match and retrieval algorithms in order to test our proposed framework. Our experiments have shown the feasibility and usability of these approaches.
	Von-Wun Soo, Chen-Yu Lee, Chao-Chun Yeh, and Ching-chih Chen. Using sharable ontology to retrieve historical images. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. We present a framework of utilizing sharable domain ontology and thesaurus to help the retrieval of historical images of the First Emperor of China's terracotta warriors and horses. Incorporating the sharable domain ontology in RDF and RDF schemas of semantic web and a thesaurus, we implement methods to allow easily annotating images into RDF instances and parsing natural language like queries into the query schema in XML format. We also implement a partial structural matching algorithm to match the query schema with images at the level of semantic schemas. Therefore the historical images can be retrieved by native users of domain specific history in terms of natural language like queries.
	Von-Wun Soo, Shih-Yao Yang, Shu-Lei Chen, and Yi-Ting Fu. Ontology acquisition and semantic retrieval from semantic annotated Chinese poetry. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. This research aims to apply semantic web technology to the semantic annotation of classical Chinese poetry. We investigate the feasibilities and advantages of semantic retrieval and automatic ontology acquisition from semantically annotated poems based on a Chinese thesaurus. We have induced a set of semantic composition rules for pair-wise character (word) patterns that can be used to parse the poem sentences and recursively generate RDF triple relations among the pair of characters (words). We have also defined a scoring scheme to assess semantic similarity for semantic retrieval. We showed that the semantic retrieval method significantly outperformed the keyword-based retrieval method.
	Sílvia Barcellos Southwick and Richard Southwick. Learning digital library technology across borders [short paper]. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. This paper describes the background context and initial findings from an ongoing case study of an electronic theses and dissertations (ETD) digital library (DL) project in Brazil. The specific focus of the case study centers on the activities of a Brazilian government agency acting as a mediator between software developers - primarily academic institutions in the United States - and university clients in Brazil. The authors highlight the loosely integrated nature of the DL technology, and the uncertain relationship between developers and users in terms of support. These circumstances reinforce a view of technology transfer as a process of organizational learning. As a consequence, the mediating institution in the study is viewed as assuming multiple roles in advancing the project.
	Michael Southworth and Susan Southworth. Maps: A visual survey and design guide. Little, Brown, 1982.
	Stephan M. Spencer, Jean-Yves Sgro, and Max L. Nibert. Electronic publishing of virus structures in novel, multimedia formats on the world wide web. In DAGS '95, 1995. Format: HTML Document (10K + pictures). Audience: Molecular virologists. References: 13. Links: 7. Abstract: Describes the advantages of using color and animation to display complex molecules like viruses. Talks about system at U. of Wisconsin-Madison.
	Ellen Spertus. ParaSite: Mining structural information on the web. In Proceedings of the Sixth International World-Wide Web Conference, 1997.
	Diomidis Spinellis. Position-annotated photographs: A geotemporal web. IEEE Pervasive Computing, 2(2):72-79, 2003.
	Amanda Spink. Digital libraries and sustainable development? in dl '95 proceedings. 1995. Format: HTML Document (25K) . Audience: Social scientists. References: 50. Links: 1. Relevance: None. Abstract: Quote from paper: `Second, we need to consider whether we are encouraging and participating in the development of an unsustainable vision of a global information infrastructure and possibly contributing to a future crisis of human survival? Is the current imperative is toward global industrialization, the development of national and global information infrastructures, the`information society, digital libraries, and the technological development of LDCs sustainable? We need to consider the possible role of digital libraries within alternate futures for humanity? ... We need to understanding the informational dimensions, impacts and implications of sustainable development for digital libraries research. What are the implications for digital libraries if social change and movements diverge away from modernity? What is the relationship between digital libraries and the sustained development of global industrialization? Will our contribution to the solution of global problems through digital libraries evolve or disappear - if the neoclassical view proves unsustainable? What could be the role of digital libraries in down scaling industrial economies to a sustainable society within a basic needs approach?
	Gordon K. Springer and Timothy B. Patrick. Translating data to knowledge in digital libraries. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document (8K). Audience: Non-technical, funders. References: 3. Links: 1. Relevance: Low. Abstract: Argues for the need for `filters' programs which will turn the use of the web from document retrieval to information retrieval. A large number of user and task specific filters will be required.
	Sargur N. Srihari, Stephen W. Lam, Jonathan J. Hull, Rohini K. Srihari, and Venugopal Govindaraju. Intelligent data retrieval from raster images of documents. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document (24K + pictures). Audience: Semi-technical, general computer scientists. References: 12. Links: 1. Relevance: Medium (but not mainstream DL). Abstract: Describes a method for getting information from raster images of documents. OCR aided by appealing to word frequencies in similar documents; some processing of graphics (sort by type: bar chart, pie chart, photo, table, schematic drawing); builds upon a related system to find faces in photos.
	Mark A. Stairmand. Textual content analysis for information retrieval. In Proceedings of the Twentieth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1997. Shows that a WordNet-based search facility does not work for text segmentation and word-sense disambiguation, but that it does work for improved indexing.
	Stanford University, University of California Berkeley, University of California Santa Barbara, San Diego Supercomputing Center, and California Digital Library. The Simple Digital Library Interoperability Protocol, 2000. Available at http://www-diglib.stanford.edu/testbed/doc2/SDLIP/. Main citation for SDLIP.
	Dominic Stanyer and Rob Procter. Improving web usability with the link lens. In Proceedings of the Eighth International World-Wide Web Conference, 1999. A number of factors may influence Web users' choice of which links to follow. These include assumptions about document quality and anticipated retrieval times. The present generation of World Wide Web browsers, however, provide only minimal support to assist users in making informed decisions. Web browser `link user interfaces' typically only display a document's Universal Resource Locator (URL), whilst a simple binary colour change in the URL's anchor is used to indicate its activation history. The question then is, how do users deal with the problem of having to make such decisions when the information at hand is insufficient? We have been conducting an investigation of how users make link selections. The results show users often are forced to fall back on heuristics and improvising strategies drawn from past experience. Based upon these results, we present a prototype of the `link lens', an enhanced link user interface designed to make such decisions easier and more productive for all users and help less experienced ones gain a better understanding of Web behaviour.
	David Steier. Comparable datasets in performance benchmarking. In AAAI Spring Symposium on Information Gathering, 1995. Format: Compressed PostScript.
	M. Stemm and R. H. Katz. Measuring and reducing energy consumption of network interfaces in hand-held devices. IEICE Transactions on Communications, E80-B(8):1125-31, 1997. Next generation hand-held devices must provide seamless connectivity while obeying stringent power and size constraints. We examine this issue from the point of view of the network interface (NI). We measure the power usage of two PDAs, the Apple Newton Messagepad and Sony Magic Link, and four NIs, the Metricom Ricochet Wireless Modem, the AT&T Wavelan operating at 915 MHz and 2.4 GHz, and the IBM Infrared Wireless LAN Adapter. These measurements clearly indicate that the power drained by the network interface constitutes a large fraction of the total power used by the PDA. We then examine two classes of optimizations that can be used to reduce network interface energy consumption on these devices: transport-level strategies and application-level strategies. Simulation experiments of transport-level strategies show that the dominant cost comes not from the number of packets sent or received by a particular transport protocol but from the amount of time that the NI is in an active but idle state. Simulation experiments of application-level strategies show that significant energy savings can be made with a minimum of user-visible latency.
	Amanda Stent and Alexander Loui. Using event segmentation to improve indexing of consumer photographs. In 24th annual international ACM SIGIR conference on Research and development in information retrieval, pages 59-65. ACM Press, 2001.
	B. Sterzbach and W.A. Halang. A mobile vehicle on-board computing and communication system. Computers & Graphics, 20(5):659-67, 1996. The DuO vehicle tracking and fleet management system is presented as an example of a mobile computing application. The paper focuses on the visualization of geographical information at a control centre, on the design of the on-board unit and on the interaction and communication between the on-board units and the control centre. The technologies of GPS positioning and GSM data communication are presented as they are used within the system.
	J. Stewart, B. Bederson, and A. Druin. Single display groupware: A model for co-present collaboration. In Proceedings of the Conference on Human Factors in Computing Systems CHI'99, pages 286-293, 1999.
	Peter Stone and Manuela Veloso. User-guided interleaving of planning and execution. In AAAI Spring Symposium on Information Gathering, 1995. Format: Compressed PostScript.
	David Stotts, John Smith, Prasun Dewan, Kevin Jeffay, F. Donelson Smith, Dana Smith, Steven Weiss, James Coggins, and William Oliver. A patterned injury digital library for collaborative forensic medicine. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document (44K). Audience: Mostly non-technical, pathologists, funders. References: 26. Links: 1. Relevance: Low. Abstract: Considers the benefits and research issues related to the construction of a collection of forensic data (image, video). Short description of some related technologies (WWW, Trellis, Dexter, ABC, Hypersets).
	J. Strain, R. Felciano, A. Seiver, R. Acuff, and L. Fagan. Optimizing physician access to surgical intensive care unit laboratory information through mobile computing. In Proceedings of the 1996 AMIA Annual Fall Symposium, pages 812-6, October 1996.
	Norbert A. Streitz, Jörg M. Haake, and Jeroen Hol. DOLPHIN: Integrated meeting support across local and remote desktop environments and liveboards. In Proceedings of the Conference on Computer-Supported Cooperative Work, CSCW'94, 1994.
	William S. Strong. Copyright in the new world of electronic publishing. In JEP, 1994. Format: HTML Document (29K). Audience: Publishers. References: 0. Links: 0. Relevance: Low. Abstract: A lawyer talks about the future of copyright and electronic publishing. Argues that people will obey laws and not make lots of illicit copies. Publishers should aid that by keeping prices low, educating the public on copyright and fair use, and offering simple licensing agreements.
	Tomek Strzalkowski, Jose Perez-Carballo, and Mihnea Marinescu. Natural language information retrieval in digital libraries. In Proceedings of DL'96, 1996. Format: Not yet online.
	Shigeo Sugimoto, Seiki Gotou, Yanchun Zhao, Tetsuo Sakaguchi, and Koichi Tabata. Enhancing usability of network-based library information system: experimental studies on user interface for OPAC and of a collaboration tool for library services. In Proceedings of the Second Annual Conference on the Theory and Practice of Digital Libraries, 1995. Format: HTML Document (27K + pictures). Audience: HCI people, computer scientists. References: 11. Links: 6. Relevance: Low. Abstract: Describes two systems: one, an on-line catalog that uses a bookshelf metaphor (actually displaying spines of books with their titles, height and width from bibliographic data) with ordering by the Dewey decimal system. The second system is a collaborative one, between librarian and user. Includes video linkage (at 2 frames/sec) and half-duplex audio. Also, a shared workspace where either person can control the cursor and button clicks. Runs over Ethernet at 10Mbps.
	Klaus Sullow and Rainer Page. Hypermedia browsing and the online publishing process. In DAGS '95, 1995. Format: Not Yet Online. Audience: Web users, computer scientists. References: 13. Links: . Relevance: Low-Medium. Abstract: Describes a browser (BWON) with a proxy that caches pages locally, but also represents web structure graphically. Graph algorithm adds new nodes leaving old ones in their current locations; it is possible to add a filter (based on URL) to determine which nodes are added to the graph. Paper also suggests allowing retrieval of HEAD information separately from full contents. Can apply different (stronger) filters as you move farther from the focus node, resulting in a `fish-eye' view.
	Kristen Summers. Logical structure types for documents. In DAGS '95, 1995. Format: HTML Document (41K + pictures). Audience: Computer scientists. References: 26. Links: 4. Relevance: Low. Abstract: Attempts to automatically capture the structure of a document from a bitmap/PostScript image. Uses geometric distinctions (contours & indentation), marking observables (font type and weight, bullets, rule-lines), linguistic observables (combinations of alphabetic and numeric characters), and contextual observables (presence of other blocks around the target block).
	Tamara Sumner, Sonal Bhushan, and Faisal Ahmad. Designing a language for creating conceptual browsing interfaces for digital libraries [short paper]. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. Conceptual browsing interfaces can help educators and learners to locate and use learning resources in educational digital libraries; in particular, resources that are aligned with nationally-recognized learning goals. Towards this end, we are developing a Strand Map Library Service, based on the maps published by the American Association for the Advancement of Science (AAAS). This service includes two public interfaces: (1) a graphical user interface for use by teachers and learners and (2) a programmatic interface that enables developers to construct conceptual browsing interfaces using dynamically generated components. Here, we describe our iterative, rapid prototyping design methodology, and the initial round of language type components that have been implemented and evaluated.
	Tamara Sumner and Melissa Dawe. Looking at digital library usability from a reuse perspective. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001. The need for information systems to support the dissemination and reuse of educational resources has sparked a number of large-scale digital library efforts. This article describes usability findings from one such project - the Digital Library for Earth System Education (DLESE) - focusing on its role in the process of educational resource reuse. Drawing upon a reuse model developed in the domain of software engineering, the reuse cycle is broken down into five stages: formulation of a reuse intention, location, comprehension, modification, and sharing. Using this model to analyze user studies in the DLESE project, several implications for library system design and library outreach activities are highlighted. One finding is that resource reuse occurs at different stages in the educational design process, and each stage imposes different and possibly conflicting requirements on digital library design. Another finding is that reuse is a distributed process across several artifacts, both within and outside of the library itself. In order for reuse to be successful, a usability line cannot be drawn at the library boundary, but instead must encompass both the library system and the educational resources themselves.
	Tamara Sumner, Michael Khoo, Mimi Recker, and Mary Marlino. Understanding educator perceptions of 'quality' in digital libraries. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. The purpose of the study was to identify educators' expectations and requirements for the design of educational digital collections for classroom use. A series of five focus groups was conducted with practicing teachers, pre-service teachers, and science librarians, drawn from different educational contexts (i.e., K-5, 6-12, College). Participants expect that the added value of educational digital collections is the provision of: (1) 'high quality' teaching and learning resources, and (2) additional contextual information beyond that in the resource. Key factors that influence educators' perceptions of quality were identified: scientific accuracy, bias, advertising, design and usability, and the potential for student distraction. The data showed that participants judged these criteria along a continuum of tolerance, combining consideration of several factors in their final judgements. Implications for collection accessioning policies, peer review, and digital library service design are discussed.
	Tamara Sumner and Mary Marlino. Digital libraries and educational practice: A case for new models. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. Educational digital libraries can benefit from theoretical and methodological approaches that enable lessons learned from design and evaluation projects performed in one particular setting to be applied to other settings within the library network. Three promising advances in design theory are reviewed: reference tasks, design experiments, and design genres. Each approach advocates the creation of intermediate constructs as vehicles for knowledge building and knowledge sharing across design and research projects. One purpose of an intermediate construct is to formulate finer-grained models that describe and explain the relationship between key design features and the cognitive and social dimensions of the context of use. Three models are proposed and used as thought experiments to analyze the utility of these approaches to educational digital library design and evaluation: digital libraries as cognitive tools, component repositories, and community centers.
	Yanfeng Sun, Hongjiang Zhang, Lei Zhang, and Mingjing Li. MyPhotos: a system for home photo management and processing. In MULTIMEDIA '02: Proceedings of the tenth ACM international conference on Multimedia, pages 81-82. ACM Press, 2002.
	Katia Sycara and Dajun Zeng. Task-based multi-agent coordination for information gathering. In AAAI Spring Symposium on Information Gathering, 1995. Format: Compressed PostScript.
	J. Alfredo Sánchez and Aníbal Arias. Fourth-phase digital libraries: Pacing, linking, annotating and citing in multimedia collections [short paper]. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. We discuss the implications of the use of current multimedia collections and posit that it is possible to build what we term fourth-phase digital libraries (4PDLs). In 4PDLs users can take advantage of both the powerful audiovisual channels and the proven practices developed for media such as text. We demonstrate how various technologies can be integrated to produce a 4PDL.
	Kenji Takahashi and Eugene Liang. Analysis and design of web-based information systems. In Proceedings of the Sixth International World-Wide Web Conference, 1997.
	Atsuhiro Takasu. Bibliographic attribute extraction from erroneous references based on a statistical model. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. This paper proposes a method for extracting bibliographic attributes from OCR processed reference strings using an extended hidden Markov model. Bibliographic attribute extraction can be used in two ways. One is reference parsing in which attribute values are extracted from OCR processed references for bibliographic matching. The other is reference alignment in which attribute values are aligned to the bibliographic record for enriching the vocabulary of the bibliographic database. In this paper, we first propose a statistical model for the attribute extraction which represents both syntactical structure of references and Optical Character Recognition (OCR) error patterns. Then, we perform experiments using bibliographic references obtained from scanned images of papers in journals and transactions and show that useful attribute values are extracted from OCR processed references with the accuracy of 93.9% [...] reducing the cost of preparing training data, which is a critical problem in rule-based systems.
	Hideaki Takeda, Kenji Iino, and Toyoaki Nishida. Ontology-supported agent communication. In AAAI Spring Symposium on Information Gathering, 1995. Format: Compressed PostScript.
	Kian-Lee Tan, Cheng Hian Goh, and Beng Chin Ooi. On getting some answers quickly, and perhaps more later. In Proceedings of the 15th International Conference on Data Engineering, pages 32-39, Sydney, Australia, 1999. ACM Press, New York.
	Andrew S. Tanenbaum. Computer Networks, 2nd Ed. Prentice-Hall, Englewood Cliffs, NJ, 1989. Chapter 1 (Introduction) provides a good discussion of the OSI reference model and of network standardization issues.
	Andrew S. Tanenbaum. Distributed Operating Systems. Prentice Hall, Englewood Cliffs, NJ, 1995.
	Robert Tansley, Mick Bass, David Stuve, Margret Branschofsky, Daniel Chudnov, Greg McClellan, and MacKenzie Smith. DSpace: An institutional digital repository system. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. In this paper we describe DSpace, an open source system that acts as a repository for digital research and educational material produced by an organization or institution. DSpace was developed during two years' collaboration between the Hewlett-Packard Company and MIT Libraries. The development team worked closely with MIT Libraries staff and early adopter faculty members to produce a `breadth-first' system, providing all of the basic features required by a digital storage and preservation service. We describe the functionality of the current DSpace system, and briefly describe its technical architecture. We conclude with some remarks about the future development and operation of the DSpace system.
	Linda Tauscher and Saul Greenberg. Revisitation patterns in world wide web navigation. In Proceedings of the Conference on Human Factors in Computing Systems CHI'97, 1997.
	Dario Teixeira and Yassine Faihe. In-home access to multimedia content. In MULTIMEDIA '02: Proceedings of the tenth ACM international conference on Multimedia, pages 49-56, New York, NY, USA, 2002. ACM Press.
	Roy Tennant. The Berkeley Digital Library SunSITE. D-Lib Magazine, Feb 1996. Format: HTML Document.
	Y. Teranishi, F. Tanemo, and Y. Umemoto. Dynamic object recomposition for active information system on mobile environment. In Proceedings IDEAS '97. International Database Engineering and Applications Symposium (Cat. No.97TB100166), pages 220-8, 1997. Location-dependent information service requires an ability to change both the information and functions provided to the mobile client according to user location and time of access. The paper describes a framework for location dependent information service, called the mobile object model. In the mobile object model, a service consists of objects that contain both information and functions. These objects, called mobile objects, are managed as distributed objects. Mobile objects are dynamically recomposed into a composite object according to user status. This composite object is then executed to realize location-dependent information service. We also propose an architecture based on the model. The architecture reduces the load of network transmission and the amount of CPU power used by the mobile client. Finally, we present a prototype system developed on the WWW called LODIS.
	D.B. Terry, A.J. Demers, K. Petersen, M.J. Spreitzer, M.M. Theimer, and B.B. Welch. Session guarantees for weakly consistent replicated data. In Proceedings Third International Conference on Parallel and Distributed Information Systems, pages 140-149, Austin, Texas, September 1994. At http://www.parc.xerox.com/bayou/.
	D.B. Terry, M.M. Theimer, K. Petersen, A.J. Demers, M.J. Spreitzer, and C.H. Hauser. Managing update conflicts in Bayou, a weakly connected replicated storage system. In Proceedings 15th Symposium on Operating Systems Principles, pages 172-183, Copper Mountain, Colorado, December 1995. http://www.parc.xerox.com/bayou/. Main Bayou reference for Doug Terry's system.
	L. Terveen, W. Hill, B. Amento, D. McDonald, and J. Creter. PHOAKS: a system for sharing recommendations. Communications of the ACM, 40(3):59-62, March 1997. Finding relevant, high-quality information on the World Wide Web is a difficult problem. PHOAKS (People Helping One Another Know Stuff) is an experimental system that addresses this problem through a collaborative filtering approach. PHOAKS works by automatically recognizing, tallying and redistributing recommendations of Web resources mined from Usenet news messages.
	Loren Terveen and Will Hill. Finding and visualizing inter-site clan graphs. In Proceedings of the Conference on Human Factors in Computing Systems CHI'98, 1998.
	Yin Leng Theng, Norliza Mohd-Nasir, George Buchanan, Bob Fields, Harold Thimbleby, and Noel Cassidy. Dynamic digital libraries for children. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001. The majority of current digital libraries (DLs) are not designed for children. For DLs to be popular with children, they need to be fun, easy-to-use and empower them, whether as readers or authors. This paper describes a new children's DL emphasizing its design and evaluation, working with the children (11-14 year olds) as design partners and testers. A truly participatory process was used, and observational study was used as a means of refinement to the initial design of the DL prototype. In contrast with current DLs, the children's DL provides both a static as well as a dynamic environment to encourage active engagement of children in using it. Design, implementation and security issues are also raised.
	Gomer Thomas, Glenn R. Thompson, Chin-Wan Chung, Edward Barkmeyer, Fred Carter, Marjorie Templeton, Stephen Fox, and Berl Hartman. Heterogeneous distributed database systems for production use. ACM Computing Surveys, 22(3):237-266, September 1990. A survey of the state of the art in heterogeneous distributed database systems targeted for production environments.
	J. Thomas, K. Pennock, T. Fiegel, J. Wise, M. Pottier, A. Schur, D. Lantrip, and V. Crow. The visual analysis of textual information: Browsing large document sets. In Advances in Digital Libraries '95, 1995. Format: Not Yet Online.
	P. J. Thomas, J. F. Meech, and J. Williams. Multimedia information using mobile computers: accessing the digital campus and the digital library. New Review of Hypermedia and Multimedia, Applications and Research, 2:17-23, 1996. The role of the information resource is changing. Publishers have been slow to adapt to the emergence of a global digital medium, but there are now signs that a great deal of information will be delivered online. However, digital publishing on the Internet with services for libraries will be a driving force in creating the global digital medium. One issue that will become increasingly relevant is how the individual user accesses rich multimedia data in the most appropriate way. The digital university campus and the digital library are becoming important concepts, with the aim that users of information services will receive information online supported by a `ubiquistructure' of IT. For the digital campus, this means that scholarly and teaching activities are based on interactive access to information, and that the digital bookshop and the digital classroom are becoming possible with the development of 140 Mb/s SuperJANET links. However, libraries will not be truly digital for the foreseeable future, and they will maintain traditional and digital media side by side. We look at the digital library and the digital campus from the perspective of the individual user and his information needs. We are particularly interested in the use of small, mobile computers as access points to the global digital medium. In an environment of change (where the traditional campus and library exist alongside the digital campus and library), the most appropriate form of access technology is based on personal technology, which allows linking between digital information and traditional paper-based information.
	Richard E. Thompson. Agricultural network information center (AgNIC): A model for access to distributed resources. In Proceedings of DL'96, 1996. Format: Not yet online.
	Helen R. Tibbo. Primarily history: Historians and the search for primary source materials. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. This paper describes the first phase of an international project that is exploring how historians locate primary resource materials in the digital age, what they are teaching their Ph.D. students about finding research materials, and what archivists are doing to facilitate access to these materials. Preliminary findings are presented from a survey of 300 historians studying American History from leading institutions of higher education in the U.S. Tentative conclusions indicate the need to provide multiple pathways of access to historical research materials including paper-based approaches and newer digital ones. The need for user education, especially in regard to electronic search methodologies is indicated.
	D. E. Toliver. OL'SAM: An intelligent front-end for bibliographic information retrieval. Information Technology and Libraries, 1(4):317-326, 1982.
	Anthony Tomasic and Hector Garcia-Molina. Performance of inverted indices in shared-nothing distributed text document information retrieval systems. In Proceedings of 2nd International Conference on Parallel and Distributed Information Systems, pages 8-17, January 1993.
	Anthony Tomasic and Hector Garcia-Molina. Query processing and inverted indices in shared-nothing document information retrieval systems. VLDB Journal, 2(3):243-275, 1993.
	Anthony Tomasic, Hector Garcia-Molina, and Kurt Shoens. Incremental updates of inverted lists for text document retrieval. In Proceedings of the International Conference on Management of Data, pages 289-300, May 1994.
	Elaine Toms, Christine Dufour, Jonathan Lewis, and Ron Baecker. Assessing tools for use with webcasts. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. This research assessed the effectiveness of selected interface tools for helping people respond to classic information tasks with webcasts. Webcasts, another form of multi-media, have little public research. Rather than focus on classic search/browse task to locate an appropriate webcast to view, our work takes place at the level of an individual webcast to assess interactivity with the contents of a single webcast. The questions guiding our work are: 1) Which tool(s) are the most effective in achieving the best response? 2) What types of tools are needed for optimum response? In this study, 16 participants responded to a standard set of information tasks using ePresence, a webcasting system that handles both live and stored video, and that provides multiple techniques for accessing content. Using questionnaires, screen capture and interviews, we evaluated the interaction holistically and found that the tools in place were not as useful as was expected.
	Elaine G. Toms, Christine Dufour, and Susan Hesemeier. Measuring the user's experience with digital libraries. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. In this paper, we propose a method for assessing user experience. Normally evaluation is based on usability or on the efficiency of the search or effectiveness of focused information search tasks. Yet all experiences with libraries (whether physical or virtual) need not be for the explicit purpose of finding, acquiring and using information. The experience and its playfulness and pleasure have equal value. To assess this experience, we modified an experiential value scale developed for online shopping and will be testing it in the context of culture and heritage websites.
	Richard M. Tong and David H. Holtzman. Knowledge-based access to heterogeneous information sources. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document (24K). Audience: slightly technical, generalist comfortable with technology, funders, slight business slant. References: 4. Links: 1. Relevance: Low-Medium. Abstract: Describes the MINERVA architecture developed at Booz Allen. Two levels of mediators between users and content sources. Describes `Text Reference Language' to describe queries.
	Simon Tong and Edward Chang. Support vector machine active learning for image retrieval. In MULTIMEDIA '01: Proceedings of the ninth ACM international conference on Multimedia, pages 107-118, New York, NY, USA, 2001. ACM Press.
	Kentaro Toyama, Ron Logan, and Asta Roseway. Geographic location tags on digital images. In Proceedings of the 11th International Conference on Multimedia (MM2003), pages 156-166. ACM Press, 2003.
	Masashi Toyoda and Masaru Kitsuregawa. Extracting evolution of web communities from a series of web archives. In HYPERTEXT '03: Proceedings of the fourteenth ACM conference on Hypertext and hypermedia, pages 28-37, New York, NY, USA, 2003. ACM Press. Recent advances in storage technology make it possible to store a series of large Web archives. It is now an exciting challenge for us to observe evolution of the Web. In this paper, we propose a method for observing evolution of web communities. A web community is a set of web pages created by individuals or associations with a common interest on a topic. So far, various link analysis techniques have been developed to extract web communities. We analyze evolution of web communities by comparing four Japanese web archives crawled from 1999 to 2002. Statistics of these archives and community evolution are examined, and the global behavior of evolution is described. Several metrics are introduced to measure the degree of web community evolution, such as growth rate, novelty, and stability. We developed a system for extracting detailed evolution of communities using these metrics. It allows us to understand when and how communities emerged and evolved. Some evolution examples are shown using our system.
	Wei-Ho Tsai and Hsin-Min Wang. On the extraction of vocal-related information to facilitate the management of popular music collections. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. With the explosive growth of networked collections of musical material, there is a need to establish a mechanism like a digital library to manage music data. This paper presents a content-based processing paradigm of popular song collections to facilitate the realization of a music digital library. The paradigm is built on the automatic extraction of information of interest from music audio signals. Because the vocal part is often the heart of a popular song, we focus on developing techniques to exploit the solo vocal signals underlying an accompanied performance. This supports the necessary functions of a music digital library, namely, music data organization, music information retrieval/recommendation, and copyright protection.
	Douglas Tudhope, Ceri Binding, Dorothee Blocks, and Daniel Cunliffe. Compound terms in context: A matching function for search thesauri. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. In faceted thesauri, application guidelines encourage the synthesis of individual terms when indexing. These faceted compound strings have potential for high precision in retrieval. However, the lack of flexible retrieval tools that yield ranked matches of strings hinders the application of thesauri and related Knowledge Organization Systems to domains requiring precision in searching. Previous work has tended to approach the problem as involving unstructured lists. This paper extends the notion of similarity between strings of terms by considering the possibilities afforded by faceted thesauri, focusing on the Art and Architecture Thesaurus. The work reported is part of the ongoing 'FACET' project in collaboration with the National Museum of Science and Industry and its collections database. The paper discusses a matching function for faceted subject headings that does not rely on exact matching but incorporates term expansion via a measure of semantic closeness. Ranked results are produced for strings of terms that take account of facet structure and deal intuitively with missing or partially matching terms.
	M.B. Twidale, D.M. Nichols, and C.D. Paice. Browsing is a collaborative process. Technical Report CSEG/1/96, Computing Dept., Lancaster University, 1996. Interfaces to databases have traditionally been designed as single-user systems that hide other users and their activity. This paper aims to show that collaboration is an important aspect of searching online information stores that requires explicit computerised support. The claim is made that a truly user-centred system must acknowledge and support collaborative interactions between users. Collaborative working implies a need to share information: both the search product and the search process. Searches need not be restricted to inanimate resources; people can also search for other people. The ARIADNE system is introduced as an example of computerised support for collaboration between browsers. A number of systems offering varied approaches to supporting collaboration are surveyed and a structure for analysing the various aspects of collaboration is applied.
	J.D. Tygar and Bennet Yee. Dyad: A system for using physically secure coprocessors. In IP Workshop Proceedings, 1994. Format: HTML Document (97K). Audience: Computer Scientists, reasonably technical. References: 9 notes, 65 references. Links: 0. Relevance: Medium-Low. Abstract: Discusses the possibility of secure co-processors, so that clear text is never available on a non-secure co-processor. Also applicable for contracts, authentication, audit trails, and digital cash (which wouldn't require access to a central bank server.) Seems a cumbersome hardware solution, but something similar may be necessary in a `total protection` model. Good glossary of cryptography terms and good bibliography.
	Jeffrey D. Ullman. Information integration using logical views. In Proceedings of the 6th International Conference on Database Theory, Delphi, Greece, January 1997. Springer, Berlin.
	K.T. Unruh, K.E. Pettigrew, and J.C. Durrance. Evaluation of digital community information systems. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. Community information systems provide a critical link between local resources and residents. While online versions of these systems have potential benefits, a systematic evaluation framework is needed to analyze and document realized impacts. Based on data from a nation-wide study of digital community information systems, an evaluation framework is proposed.
	Luella Upthegrove and Tom Roberts. Intellectual property header descriptors: A dynamic approach. In IP Workshop Proceedings, 1994. Format: HTML Document. Audience: DL Researchers, librarians. References: 0. Links: 0. Relevance: Low-Medium. Abstract: Very sketchy outline of an approach for managing IP by using trusted systems, in which local repositories contact remote (IP owner's) repositories. Documents have headers with `global header descriptor contain a set of data elements that identify intellectual property: Ownership, Permitted Uses, Royalty Compensation, and IP Attributes.`
	Mark G.K.M. van Doorn and Arjen P. de Vries. The psychology of multimedia databases. In Proceedings of the Fifth ACM International Conference on Digital Libraries, 2000. Multimedia information retrieval in digital libraries is a difficult task for computers in general. Humans on the other hand are experts in perception, concept representation, knowledge organization and memory retrieval. Cognitive psychology and science describe how cognition works in humans, but can offer valuable clues to information retrieval researchers as well. Cognitive psychologists view the human mind as a general-purpose symbol-processing system that interacts with the world. A multimedia information retrieval system can also be regarded as a symbol-processing system that interacts with the environment. Its underlying information retrieval model can be seen as a cognitive framework that describes how the various aspects of cognition are related to each other.
	Gregg C. Vanderheiden. Anywhere, anytime (+anyone) access to the next-generation WWW. In Proceedings of the Sixth International World-Wide Web Conference, 1997.
	Marc VanHeyningen. The unified computer science technical report index: Lessons in indexing diverse resources. 2nd International World Wide Web Conference, WWW'94, pages 535-543, October 1994.
	Aravindan Veerasamy and Shamkant Navathe. Querying, navigating, and visualizing a digital library catalog. In Proceedings of the Second Annual Conference on the Theory and Practice of Digital Libraries, 1995. Format: HTML Document. Audience: HCI people. References: 4. Links: 1. Relevance: Low-medium. Abstract: Describes a system similar to TexTiles. A bar graph for each search term (with the retrieved documents along the x-axis) indicates the importance of the term in each document; the graphs are overlaid, so the user can see at a glance which documents contain which words. Also handles thesaurus expansion and GUI query generation, and supports retrieval of related results (other papers from the proceedings or by the author).
	Remco C. Veltkamp and Mirela Tanase. Content-based image retrieval systems: A survey. Technical Report TR UU-CS-2000-34 (revised version), Department of Computing Science, Utrecht University, October 2002.
	Rodrigo C. Vieira, Pavel Calado, Altigran S. Silva, Alberto H. F. Laender, and Berthier A. Ribeiro-Neto. Structuring keyword-based queries for web databases. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. This paper describes a framework, based on Bayesian belief networks, for querying Web databases using keywords only. According to this framework, the user inputs a query through a simple search-box interface. From the input query, one or more plausible structured queries are derived and submitted to Web databases. Since the query structuring is done automatically, the resulting queries might be imprecise (i.e., they may not exactly match a structured query built manually). The results are then retrieved and presented to the user as ranked answers. To evaluate our framework, an experiment using 38 example queries was carried out. We found that 97% [...] queries is the proper one. Further, when the user selects one of these three top queries for processing, the ranked answers present average precision figures of 92%.
	Charles L. Viles. Maintaining state in a distributed information retrieval system. In 32nd Southeast Conf. of the ACM, pages 157-161, 1994.
	Charles L. Viles and James C. French. Dissemination of collection wide information in a distributed information retrieval system. In Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 12-20, July 1995.
	Radek Vingralek, Yuri Breitbart, and Gerhard Weikum. Distributed file organization with scalable cost/performance. In Proceedings of the International Conference on Management of Data, pages 253-64, June 1994. This paper is perhaps too OS-oriented; nevertheless, it is a great paper. The paper presents a distributed file organization for record-structured, disk-resident files with key-based exact-match access. The file is organized into buckets that are spread across multiple servers, where a server may hold multiple buckets. Client requests are serviced by mapping keys onto buckets and looking up the corresponding server in an address table. Dynamic growth in terms of file size and access load is supported by bucket splits and bucket migration onto other existing or newly acquired servers. The significant and challenging problem addressed is how to achieve scalability so that both the file size and the client throughput can be scaled up by linearly increasing the number of servers and dynamically redistributing data. Unlike previous work with similar objectives, our data redistribution considers explicitly the cost/performance ratio of the system by aiming to minimize the number of servers that are acquired to provide the required performance. A new server is acquired only if the overall server utilization in the system does not drop below a specified threshold. Preliminary simulation results show that the goal of scalability with controlled cost/performance is indeed achieved to a large extent.
	Visa and MasterCard. MasterCard International - SET Secure Electronic Transaction (TM). MasterCard website: http://www.mastercard.com/set/.
	Luis von Ahn and Laura Dabbish. Labeling images with a computer game. In Proceedings of the Conference on Human Factors in Computing Systems CHI'04, pages 319-326, New York, NY, USA, 2004. ACM Press.
	Nina Wacholder, David K. Evans, and Judith L. Klavans. Automatic identification and organization of index terms for interactive browsing. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001. The potential of automatically generated indexes for information access has been recognized for several decades (e.g., Bush 1945, Edmundson and Wyllys, 1961), but the quantity of text and the ambiguity of natural language processing have made progress at this task more difficult than was originally foreseen. Recently, a body of work on development of interactive systems to support phrase browsing has begun to emerge (e.g., Anick and Vaithyanathan 1997, Gutwin et al., Nevill-Manning et al. 1997, Godby and Reighart 1998). In this paper, we consider two issues related to the use of automatically identified phrases as index terms in a dynamic text browser (DTB), a user-centered system for navigating and browsing index terms: 1) What criteria are useful for assessing the usefulness of automatically identified index terms? and 2) Is the quality of the terms identified by automatic indexing such that they provide useful access to document content?
	Howard D. Wactlar and Ching-chih Chen. Enhanced perspectives for historical and cultural documentaries using Informedia technologies. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. Speech recognition, image processing, and language understanding technologies have successfully been applied to broadcast news corpora to automate the extraction of metadata and make use of it in building effective video news retrieval interfaces. This paper discusses how these technologies can be adapted to cultural documentaries as represented by the award-winning First Emperor of China videodisc and multimedia CD. Through automated means, efficient interfaces into documentary contents can be built dynamically based on user needs. Such interfaces enable the assemblage of large video documentary libraries from component videodisc, CD, and videotape projects, with alternate views into the material complementing the original sequences authored by the materials' producers.
	W.A. Wagenaar. My memory: A study of autobiographical memory over six years. Cognitive Psychology, 18:225-252, 1986.
	Ernest Wan, Philip Robertson, John Brook, Stephen Bruce, and Kristine Armitage. Retaining hyperlinks in printed hypermedia document. In Proceedings of the Eighth International World-Wide Web Conference, 1999. In this paper, we describe a method that allows a hypermedia document to retain its hyperlinks in the printed copy. The method associates the hyperlinks with cut-out tabs on the edges of the printed pages. A method for modelling the cut-out tabs and optimizing their assignment to the hyperlinks is discussed. We also describe a prototype authoring system that implements the method.
	James Z. Wang and Yanping Du. Scalable integrated region-based image retrieval using IRM and statistical clustering. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001. Statistical clustering is critical in designing scalable image retrieval systems. In this paper, we present a scalable algorithm for indexing and retrieving images based on region segmentation. The method uses statistical clustering on region features and IRM (Integrated Region Matching), a measure developed to evaluate overall similarity between images that incorporates properties of all the regions in the images by a region-matching scheme. Compared with retrieval based on individual regions, our overall similarity approach (a) reduces the influence of inaccurate segmentation, (b) helps to clarify the semantics of a particular region, and (c) enables a simple querying interface for region-based image retrieval systems. The algorithm has been implemented as a part of our experimental SIMPLIcity image retrieval system and tested on large-scale image databases of both general-purpose images and pathology slides. Experiments have demonstrated that this technique maintains the accuracy and robustness of the original system while reducing the matching time significantly.
	Jenq-Haur Wang, Jei-Wen Teng, Wen-Hsiang Lu, and Lee-Feng Chien. Translating unknown cross-lingual queries using a web-based approach. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. Users' cross-lingual queries to a digital library system might be short and not included in a common translation dictionary (unknown terms). In this paper, we investigate the feasibility of exploiting the Web as the corpus source to translate unknown query terms for cross-language information retrieval (CLIR) in digital libraries. We propose a Web-based term translation approach to determining effective translations for an unknown query term via mining of bilingual search-result pages obtained from a real Web search engine. This approach can enrich the construction of a domain-specific bilingual lexicon and benefit CLIR services in a digital library with only monolingual document collections. Very promising results have been obtained in generating effective translation equivalents of many unknown terms, including proper nouns, technical terms and Web query terms, and in assisting bilingual lexicon construction for a real digital library system.
	Jenq-Haur Wang, De-Ming Zhuang, Ching-Chun Hsieh, and Lee-Feng Chien. Resolving the unencoded character problem for Chinese digital libraries. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. Constructing a Chinese digital library, especially for archiving historical articles, is often hampered by the small character sets supported by current computer systems. This paper is aimed at resolving the unencoded character problem with a practical and composite approach for Chinese digital libraries. The proposed approach consists of the glyph expression model, the glyph structure database, and supporting tools. With this approach, the following problems can be resolved. First, the extensibility of Chinese characters can be preserved. Second, it would be as easy to generate, input, display, and search unencoded characters as existing ones. Third, it is compatible with existing coding schemes that most computers use. This approach has been utilized by organizations and projects in various application domains including archeology, linguistics, ancient texts, calligraphy and paintings, and stone and bronze rubbings. For example, in Academia Sinica, a very large full-text database of ancient texts called Scripta Sinica has been created using this approach. The Union Catalog of National Digital Archives Project (NDAP) dealt with the unencoded characters encountered when merging the metadata of 12 different thematic domains from various organizations. Also, in the Bronze Inscriptions Research Team (BIRT) of Academia Sinica, 3,459 Bronze Inscriptions were added, which is very helpful for education and research in historical linguistics.
	Jun Wang, Abhishek Agrawal, Anil Bazaz, Supriya Angle, Edward A. Fox, and Chris North. Enhancing the Envision interface for digital libraries. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. To enhance the ENVISION interface and facilitate user interaction, various techniques were considered for better rendering of search results with improved scalability. In this paper we discuss the challenges we encountered and our solutions to those problems. The Envision interface helps visualize search results from the MARIAN digital library system. The old Envision interface featured a Query Screen, Result Visualization Screen, and Result List Screen. Envision presented each document in the result set graphically as an icon. The user could select several documents on the Visualization Screen and see corresponding details in the Result List Screen. Envision used a rigid matrix to display search results. As with various other visualization tools, it provided no overview, thus no context information to users. Scrollbars were needed for data outside the viewable area. In working to enhance Envision, we considered other efforts. Some approaches (e.g., ThemeScapes) are almost exactly opposite, displaying data in a completely flexible manner. The data is not bound to any axis but is arranged according to relationships between documents. We now explore a middle ground with an overview as well as loose matrix arrangement, with related documents mapped together.
	QianYing Wang, Susumu Harada, Tony Hsieh, and Andreas Paepcke. Visual interface and control modality: An experiment about fast photo browsing on mobile devices. In Proceedings of the IFIP INTERACT Conference, Lecture Notes in Computer Science, volume 3585/2005, 2005. Available at http://dbpubs.stanford.edu/pub/2005-29. We examined the strengths and weaknesses of three diverse scroll control modalities for photo browsing on personal digital assistants (PDAs). This exploration covered nine alternatives in a design space that consisted of three visual interfaces and three control modalities. The three interfaces were a traditional thumbnail layout, a layout that placed a single picture on the screen at a time, and a hybrid that placed one large photo in the center of the display, while also displaying a row of neighboring thumbnails at the top and bottom of the screen. In a user experiment we paired each of these interfaces with each of the following three scroll control modalities: a jog dial, a squeeze sensor, and an on-screen control that was activated by tapping with a stylus. We offer a simple model that classifies our experiment's interfaces by how much they provide visual context within the photo collection. The model also classifies the scroll modalities by how tightly they correlate scroll input actions to effects on the screen. Performance and attitudinal results from the user experiment are presented and discussed.
	QianYing Wang, Tony Hsieh, Meredith Ringel Morris, and Andreas Paepcke. Multi-user piles across space. Submitted for publication, 2005. Available at http://dbpubs.stanford.edu/pub/2005-28. We introduce Multi-User Piles Across Space, a technique that allows co-located individuals with PDAs to share and organize information items (e.g., photos, text, sound clips, etc.) by placing these items in shared, imaginary off-screen piles. This technique relies on human capacities to remember spatial layouts, and allows small co-located groups with limited screen real estate to collaboratively manage information. Each participant can use their PDA's stylus to flick information to shared off-screen piles and view their contents. Connections are implemented through ad hoc WiFi. Optimistic concurrency control provides long term data consistency. We also describe an extension that allows PDA owners to transfer information items and piles to and from a tabletop display.
	Yuhang Wang, Fillia Makedon, James Ford, Li Shen, and Dina Goldin. Generating fuzzy semantic metadata describing spatial relations from images using the R-histogram. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. Automatic generation of semantic metadata describing spatial relations is highly desirable for image digital libraries. Relative spatial relations between objects in an image convey important information about the image. Because the perception of spatial relations is subjective, we propose a novel framework for automatic metadata generation based on fuzzy k-nn classification that generates fuzzy semantic metadata describing spatial relations between objects in an image. For each pair of objects of interest, the corresponding R-histogram is computed and used as input for a set of fuzzy k-nn classifiers. The R-histogram is a quantitative representation of spatial relations between two objects. The outputs of the classifiers are soft class labels for each of the following eight spatial relations: 1) Left of, 2) Right of, 3) Above, 4) Below, 5) Near, 6) Far, 7) Inside, 8) Outside. Because the classifier-training stage involves annotating the training images manually, it is desirable to use as few training images as possible. To address this issue, we applied existing prototype selection techniques and also devised two new extensions. We evaluated the performance of different fuzzy k-nn algorithms and prototype selection algorithms empirically on both synthetic and real images. Preliminary experimental results show that our system is able to obtain good annotation accuracy (92% [...] synthetic images and 82% [...] training set (4-5 images)).
	R. Want, B. N. Schilit, N. I. Adams, R. Gold, K. Petersen, D. Goldberg, J. R. Ellis, and M. Weiser. An overview of the PARCTAB ubiquitous computing experiment. IEEE Personal Communications, 2(6):28-33, Dec 1995. The PARCTAB system integrates a palm-sized mobile computer into an office network. The PARCTAB project serves as a preliminary testbed for ubiquitous computing, a philosophy originating at Xerox PARC that aims to enrich our computing environment by emphasizing context sensitivity, casual interaction and the spatial arrangement of computers. This article describes the ubiquitous computing philosophy, the PARCTAB system, user interface issues for small devices, and our experience in developing and testing a variety of mobile applications.
	Jewel H. Ward. A quantitative analysis of Dublin Core Metadata Element Set usage within OAI data providers [short paper]. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. This research describes an empirical study of how the unqualified Dublin Core Metadata Element Set (DC or DCMES) is used by 100 Data Providers (DPs) registered with the Open Archives Initiative (OAI). The research was conducted to determine whether or not the DCMES is used to its full capabilities. Eighty-two of 100 DPs have metadata records available for analysis. DCMES usage varies by type of DP. The average number of Dublin Core elements per record is eight, with an average of 91,785 Dublin Core elements in each DP. Five of the 15 elements of the DCMES are used 71% [...] DCMES is not used to its fullest extent within DPs registered with the OAI.
	Colin Ware. Information visualization: perception for design. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2000.
	Andrew Waugh, Ross Wilkinson, Brendan Hills, and Jon Dell'oro. Preserving digital information forever. In Proceedings of the Fifth ACM International Conference on Digital Libraries, 2000. Well within our lifetime we can expect to see most information being created, stored and used digitally. Despite the growing importance of digital data, the wider community pays almost no attention to the problems of preserving this digital information for the future. Even within the archival and library communities most work on digital preservation has been theoretical, not practical, and highlights the problems rather than giving solutions. Physical libraries have to preserve information for long periods and this is no less true of their digital equivalents. This paper describes the preservation approach adopted in the Victorian Electronic Record Strategy (VERS), which is currently being trialled within the government of Victoria, one of the states of Australia. We review the various preservation approaches that have been suggested and describe in detail encapsulation, the approach which underlies the VERS format. A key difference between the VERS project and previous digital preservation projects is the focus within VERS on the construction of an actual system to test and implement the proposed technology. VERS is not a theoretical study in preservation.
	Digest of Papers. First International Symposium on Wearable Computers (Cat. No.97TB100199). Oct 1997.
	John Weatherley. A web service framework for embedding discovery services in distributed library interfaces. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. Significant barriers deter web page designers and developers from incorporating dynamic content from web services into their page designs. Web services typically require designers to learn service protocols and have access to and knowledge of dynamic application servers or CGI in order to incorporate dynamic content into their pages. This paper describes a framework for embedding discovery services in distributed interfaces that seeks to simplify this process and eliminate these barriers, making the use of the dynamic content available to a wider audience and increasing its potential for adoption and use in educational design.
	John Weatherley, Tamara Sumner, Michael Khoo, Michael Wright, and Marcel Hoffman. Partnership reviewing: A cooperative approach for peer review of complex educational resources. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. Review of digital educational resources, such as course modules, simulations, and data analysis tools, can differ from review of scholarly articles, in the heterogeneity and complexity of the resources themselves. The Partnership Review Model, as demonstrated in two cases, appears to promote cooperative interactions between distributed resource reviewers, enabling reviewers to effectively divide up the task of reviewing complex resources with little explicit coordination. The shared structural outline of the resource made visible in the review environment enables participants to monitor other reviewers' actions and to thus target their efforts accordingly. This reviewing approach may be effective in educational digital libraries that depend on community volunteers for most of their reviewing.
	The Inktomi webmap. http://www.inktomi.com/webmap/.
	Martin Wechsler and Peter Schäuble. A new ranking principle for multimedia information retrieval. In Proceedings of the Fourth ACM International Conference on Digital Libraries, 1999. A theoretic framework for multimedia information retrieval is introduced which guarantees optimal retrieval effectiveness. In particular, a Ranking Principle for Distributed Multimedia-Documents (RPDM) is described together with an algorithm that satisfies this principle. Finally, the RPDM is shown to be a generalization of the Probability Ranking Principle (PRP), which guarantees optimal retrieval effectiveness in the case of text document retrieval. The PRP justifies theoretically the relevance ranking adopted by modern search engines. In contrast to the classical PRP, the new RPDM takes into account transmission and inspection time, and most importantly, aspectual recall rather than simple recall.
	Peter Wegner. Interoperability. ACM Computing Surveys, 28(1):285-287, March 1996. Part of ACM Computing Surveys' special issue on Perspectives in Computer Science. Discusses various aspects of interoperability: the ability of two or more software components to cooperate despite differences in language, interface, and execution platforms. In particular, this paper focuses on client-server interoperability.
	Stuart Weibel. Metadata: the foundations of resource description. D-Lib Magazine, Jul 1995. Format: HTML Document.
	Stuart Weibel, Jean Godby, Eric Miller, and Ron Daniel. OCLC/NCSA metadata workshop report, March 1995. This report defines the Dublin Core. The Dublin Core is a set of 13 metadata attributes that should be present in all documents. The attributes are: subject, title, author, publisher, other agent, date, object type, form, identifier, relation, source, language, coverage.
	Peter C. Weinstein. Ontology-based metadata: Transforming the MARC legacy. In Proceedings of the Third ACM International Conference on Digital Libraries, 1998. Discusses how MARC data could be transformed into a logic-based ontological model of bibliographic relations.
	M. Weiser. The computer for the 21st century. Scientific American (International Edition), 265(3):66-75, 1991. The arcane aura that surrounds personal computers is not just a 'user interface' problem. The idea of a 'personal' computer itself is misplaced, and the vision of laptop machines, dynabooks and knowledge navigators is only a transitional step toward achieving the real potential of information technology. Such machines cannot truly make computing an integral, invisible part of people's lives. The author and his colleagues are therefore trying to conceive a new way of thinking about computers, one that takes into account the human world and allows the computers themselves to vanish into the background.
	M. Weiser. Hot topics: ubiquitous computing. Computer, 26(10):71-2, Oct 1993. The author suggests that, due to the trends of unobtrusive technology and more intrusive information, the next phase of computing technology will develop nonlinearly. He states that, in the long run, the personal computer and the workstation will become practically obsolete because computing access will be everywhere: in the walls, on your wrist, and in 'scrap' computers (i.e., like scrap paper) lying about to be used as needed. The current research on ubiquitous computing is reviewed.
	M. Weiser. Some computer science issues in ubiquitous computing. Communications of the ACM, 36(7):74-84, 1993. Ubiquitous computing enhances computer use by making many computers available throughout the physical environment, while making them effectively invisible to the user. This article explains what is new and different about the computer science involved in ubiquitous computing. First, it provides a brief overview of ubiquitous computing, then elaborates through a series of examples drawn from various subdisciplines of computer science: hardware components (e.g. chips), network protocols, interaction substrates (e.g. software for screens and pens), applications, privacy, and computational methods. Ubiquitous computing offers a framework for new and exciting research across the spectrum of computer science.
	M. Weiser, B. Welch, A. Demers, and S. Shenker. Scheduling for reduced CPU energy. In Proceedings of the First USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 13-23, 1994. The energy usage of computer systems is becoming more important, especially for battery operated systems. Displays, disks, and CPUs, in that order, use the most energy. Reducing the energy used by displays and disks has been studied elsewhere; this paper considers a new method for reducing the energy used by the CPU. We introduce a new metric for CPU energy performance, millions-of-instructions-per-joule (MIPJ). We examine a class of methods to reduce MIPJ that are characterized by dynamic control of system clock speed by the operating system scheduler. Reducing clock speed alone does not reduce MIPJ, since to do the same work the system must run longer. However, a number of methods are available for reducing energy with reduced clock-speed, such as reducing the voltage (Chandrakasan et al., 1992) (Horowitz, 1993) or using reversible (Younis and Knight, 1993) or adiabatic logic (Athas et al., 1994). What are the right scheduling algorithms for taking advantage of reduced clock-speed, especially in the presence of applications demanding ever more instructions-per-second? We consider several methods for varying the clock speed dynamically under control of the operating system, and examine the performance of these methods against workstation traces. The primary result is that by adjusting the clock speed at a fine grain, substantial CPU energy can be saved with a limited impact on performance.
	B. Welch, S. Elrod, T. Moran, K. McCall, F. Halasz, and R. Bruce. Applications of a computerized whiteboard. In 1994 SID International Symposium Digest of Technical Papers. SID, pages 591-3, 1994. A computerized whiteboard has been built using an infrared pen technology in combination with a 67-inch display. The image is formed from a rear-projected liquid crystal light valve. The system was designed to support several different applications: a whiteboard that enables both capture and organization of information for informal creative meetings, a group station where computational tools can be used through the whiteboard metaphor, a communication device employing a remotely shared work surface, and a multimedia presentation tool.
	Liu Wenyin, Susan Dumais, Yanfeng Sun, HongJiang Zhang, Mary Czerwinski, and Brent Field. Semi-automatic image annotation. In 8th International Conference on Human-Computer Interaction (INTERACT 2001), July 2001.
	Liu Wenyin, Yanfeng Sun, and Hongjiang Zhang. MiAlbum: a system for home photo management using the semi-automatic image annotation approach. In MULTIMEDIA '00: Proceedings of the eighth ACM international conference on Multimedia, pages 479-480, New York, NY, USA, 2000. ACM Press.
	Thomas E. White and Layna Fischer. New Tools for New Times: The Workflow Paradigm. Future Strategies, Inc., Alameda, CA, 1994.
	Whizbang! labs. http://www.whizbanglabs.com.
	Gio Wiederhold. Mediators in the architecture of future information systems. IEEE Computer, 25(3):51-60, March 1992. Describes a mediator architecture for accessing multiple information systems and discusses related research.
	Gio Wiederhold. Mediation in information systems. ACM Computing Surveys, 27(2):265-267, June 1995. This paper introduces a mediated architecture for information systems as a logical evolution of client-server architecture.
	S. Wiesener, W. Kowarschick, P. Vogel, and R. Bayer. Semantic hypermedia retrieval in digital libraries. In Advances in Digital Libraries '95, 1995. Format: Not Yet Online.
	Barbara M. Wildemuth, Gary Marchionini, Meng Yang, Gary Geisler, Todd Wilkens, Anthony Hughes, and Richard Gruss. How fast is too fast? Evaluating fast forward surrogates for digital video. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. Because the number of libraries incorporating digital video is rapidly increasing, there is a crucial need for more effective user interfaces to access these materials. To support effective browsing, such interfaces should include video surrogates (i.e., smaller objects that can stand in for the videos in the collection, analogous to abstracts standing in for documents). The current study investigated four variations (i.e., speeds) of one form of video surrogate: a fast forward of the video created by selecting every Nth frame from the full video. In addition, it provided a field test of the validity of six measures of user performance when interacting with video surrogates. Forty-five study participants interacted with all four versions of the fast forward surrogate, and completed all six performance tasks with each surrogate. Surrogate speed affected performance on four measures: object recognition (graphical), action recognition, linguistic gist comprehension (full text), and visual gist comprehension. Based on these results, we recommend that the default speed for fast forward surrogates should be about 1:64 of the original video keyframes. In addition, users should control the choice of fast forward speed to adjust for content characteristics and personal preferences.
	M. E. Williams. Transparent information systems through gateways, front ends, intermediaries, and interfaces. Journal of the American Society for Information Science, 37(4):204-214, July 1986.
	Matthew Williams. Direct metaphor and user interaction in the electronic libraries of the future. In DAGS '95, 1995. Format: Not Yet On-line. Audience: General population, HCI people. References: 10. Relevance: High. Abstract: Focuses on a user's interaction with a digital library, arguing for a direct metaphor (books, tables & bookshelves, highlighters, note cards, post-its). Also states the importance of information having multiple forms: what appears as a magazine in one context might be note cards in another. Suggests having a `time-based memory of user interaction with the printed word` so that every time you reference something, a trail of your current research thread is appended. Allows you to search for things like `the magazine article on whales I read last week.`
	Joseph Willihnganz. Debating mass communication during the rise and fall of broadcasting. In DAGS '95, 1995. Format: Not Yet On-line.
	Craig E. Wills and Mikhail Mikhailov. Towards a better understanding of web resources and server responses for improved caching. In Proceedings of the Eighth International World-Wide Web Conference, 1999. This work focuses on characterizing information about Web resources and server responses that is relevant to Web caching. The approach is to study a set of URLs at a variety of sites and gather statistics about the rate and nature of changes compared with the resource type. In addition, we gather response header information reported by the servers with each retrieved resource. Results from the work indicate that there is potential to reuse more cached resources than is currently being realized due to inaccurate and nonexistent cache directives. In terms of implications for caching, the relationships between resources used to compose a page must be considered. Embedded images are often reused, even in pages that change frequently. This result both points to the need to cache such images and to discard them when they are no longer included as part of any page. Finally, while the results show that HTML resources frequently change, these changes can be in a predictable and localized manner. Separating out the dynamic portions of a page into their own resources allows relatively static portions to be cached, while retrieval of the dynamic resources can trigger retrieval of new resources along with any invalidation of already cached resources.
	Brian Wingenroth, Mark Patton, and Tim DiLauro. Enhancing access to the Levy sheet music collection: Reconstructing full-text lyrics from syllables. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. The goal of the Lester S. Levy Sheet Music Collection, Phase Two project is to develop tools, processes, and systems that facilitate collection ingestion through automated processes that reduce, but do not necessarily eliminate, human intervention. One of the major components of this project is an optical music recognition (OMR) system that extracts musical information and lyric text from the page images that comprise each piece in a collection. It is often the case, as it is with the Levy Collection, that lyrics embedded in music notation are written in a syllabicated form so that each syllable lines up with the note or notes to which it corresponds. Searching the syllabicated form of words, however, would be counterintuitive and cumbersome for end-users. This paper describes a tool that, using a simple algorithm, rebuilds complete words from lyric syllables and, in ambiguous cases, provides feedback to the collection builder. This system will be integrated into the workflow of the Levy Sheet Music Collection, but has broad applicability for any project ingesting musical scores with lyrics.
	Terry Winograd. A language/action perspective on the design of cooperative work. Human-Computer Interaction, 3(1), 1995. In creating computer-based systems, one works within a perspective that shapes the design questions that will be asked and the kinds of solutions that are sought. The article introduces a perspective based on language as action, and explores its consequences for system design. The author describes a communication tool called The Coordinator, which was designed from a language/action perspective, and suggests how further aspects of coordinated work might be addressed in a similar style. The language/action perspective is illustrated with an example based on studies of nursing work in a hospital ward and contrasted with other currently prominent perspectives.
	Terry Winograd, editor. Bringing Design to Software. Addison-Wesley, 1996.
	Terry Winograd. The design of interaction. In Peter Denning and Bob Metcalfe, editors, Beyond Calculation, The Next 50 Years of Computing, pages 149-162. Springer-Verlag, 1997.
	Terry Winograd and Fernando Flores. Understanding Computers and Cognition: A New Foundation for Design. Addison-Wesley, 1987.
	Niklaus Wirth. What can we do about the unnecessary diversity of notation for syntactic definitions? Communications of the ACM, 20(11):822-823, November 1977.
	I. H. Witten, A. Moffat, and T. C. Bell. Managing Gigabytes: Compressing and Indexing Documents and Images. Morgan Kaufmann Publishers, San Francisco, 2nd edition, 1999.
	Ian H. Witten. Managing gigabytes: compressing and indexing documents and images. Van Nostrand Reinhold, New York, 1994. A very good book on database compression. It first describes the mathematical foundations of compression, then applies these concepts to data. This is followed by a study of how to compress indices. Finally, compression of image databases is studied. The book includes the description of a database system with the same name as the book.
	Ian H. Witten, David Bainbridge, and Stefan J. Boddie. Power to the people: End-user building of digital library collections. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001. Naturally, digital library systems focus principally on the reader: the consumer of the material that constitutes the library. In contrast, this paper describes an interface that makes it easy for people to build their own library collections. Collections may be built and served locally from the user's own web server, or (given appropriate permissions) remotely on a shared digital library host. End users can easily build new collections styled after existing ones from material on the Web or from their local files - or both, and collections can be updated and new ones brought on-line at any time. The interface, which is intended for non-professional end users, is modeled after widely used commercial software installation packages. Lest one quail at the prospect of end users building their own collections on a shared system, we also describe an interface for the administrative user who is responsible for maintaining a digital library installation.
	Ian H. Witten, David Bainbridge, Gordon Paynter, and Stefan Boddie. The Greenstone plugin architecture. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002. Flexible digital library systems need to be able to accept documents and metadata in a variety of forms. This paper describes an architecture based on plugins that allows one to import documents and metadata in different formats, and associate metadata with the appropriate documents. Plugins that import documents can perform their own format conversion internally, or take advantage of existing conversion programs. Metadata can be read from the input documents, or from separate metadata files, or can in some cases be computed from the documents themselves. It is easy to write new plugins for novel situations.
	Ian H. Witten, Sally Jo Cunningham, Mahendra Vallabh, and Timothy C. Bell. A New Zealand digital library for computer science research. In Proceedings of the Second Annual Conference on the Theory and Practice of Digital Libraries, 1995. Format: HTML Document (27K + pictures). Audience: Computer scientists. References: 18. Links: 13. Relevance: Low-medium. Abstract: Describes a CS Tech Report project in New Zealand. The mg system stores a full text index (in about 5% of the space of the original!). Documents stay on home servers; only the index is centralized. Ability to automatically extract ASCII text from PostScript etc. Able to limit search to first page (usually author, title, abstract). Issues: scalability, ability to minimize communication traffic, transparency to providers, largely automatic indexing.
	Ian H. Witten, Rodger J. McNab, Stefan J. Boddie, and David Bainbridge. Greenstone: A comprehensive open-source digital library software system. In Proceedings of the Fifth ACM International Conference on Digital Libraries, 2000. This paper describes the Greenstone digital library software, a comprehensive, open-source system for the construction and presentation of information collections. Collections built with Greenstone offer effective full-text searching and metadata-based browsing facilities that are attractive and easy to use. Moreover, they are easily maintainable and can be augmented and rebuilt entirely automatically. The system is extensible: software plugins accommodate different document and metadata types.
	Darrell W. Woelk. Carnot intelligent agents and digital libraries. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994. Format: HTML Document. Audience: Digital Library researchers. References: 3. Links: 1. Relevance: Medium-Low. Abstract: Suggests the use of agents which are `automatically programmed with the knowledge necessary to map among different data models, query languages, and database schemas.` Relies on having a pre-defined `enterprise model` (common ontology) and mappings for all the individual databases.
	W. Wolf, B. Liu, M. Yeung, B. Yeo, and D. Markham. Video as scholarly material in the digital library. In Advances in Digital Libraries '95, 1995. Format: Not Yet Online.
	Joanna L. Wolfe. Effects of annotations on student readers and writers. In Proceedings of the Fifth ACM International Conference on Digital Libraries, 2000. Recent research on annotations has focused on how readers annotate texts, ignoring the question of how reading annotations might affect subsequent readers of a text. This paper reports on a study of persuasive essays written by 123 undergraduates receiving primary source materials annotated in various ways. Findings indicate that annotations improve recall of emphasized items, influence how specific arguments in the source materials are perceived, and decrease students' tendencies to summarize unnecessarily. Of particular interest is that students' perceptions of the annotator appeared to greatly influence how they responded to the annotated material. Using this study as a basis, I discuss implications for the design and implementation of digitally annotated materials.
	Alec Wolman, Geoffrey M. Voelker, Nitin Sharma, Neal Cardwell, Anna Karlin, and Henry M. Levy. On the scale and performance of cooperative web proxy caching. In Proceedings of the 17th ACM symposium on Operating Systems Principles (SOSP ’99), 1999. Published as Operating Systems Review 33(5):16-31, December 1999.
	K. R. Wood, T. Richardson, and F. Bennett. Global teleporting with Java: toward ubiquitous personalized computing. Computer, 30(2):53-9, Feb 1997. The essence of mobile computing is having your personal computing environment available wherever you happen to be. Traditionally, this is achieved by physically carrying a computing device (say, a laptop or PDA) which may have some form of intermittent network connectivity, either wireless or tethered. However, at the Olivetti and Oracle Research Laboratory, we have introduced another form of mobility in which it is the user's applications that are mobile. Users do not carry any computing platform but instead bring up their applications on any nearby machine exactly as they appeared when last invoked. We call this form of mobility teleporting, and it has been used continuously and fruitfully by many members of our laboratory. We are extending this idea from our LAN to the entire Internet using Java as the common interface. It is still our personal X sessions that are made mobile, but now they can appear anywhere on the Internet within any Java-enabled browser.
	Kenneth R. Wood, Tristan Richardson, Frazer Bennett, Andy Harter, and Andy Hopper. Global teleporting with Java: Toward ubiquitous personalized computing. IEEE Computer, pages 53-59, February 1997. They use proxies to let you access your X sessions from anywhere. They also experimented with using Netscape/Java to maintain such long-lasting sessions.
	The Workflow Management Coalition home page. At http://www.aiim.org/wfmc/mainframe.htm.
	Victor Wu, R. Manmatha, and Edward M. Riseman. Finding text in images. In Proceedings of the Second ACM International Conference on Digital Libraries, 1997.
	Peter R. Wurman, Michael P. Wellman, and William E. Walsh. The Michigan Internet AuctionBot: A configurable auction server for human and software agents. In Proceedings of the Second International Conference on Autonomous Agents (Agents-98), 1998.
	Yahoo incorporated. http://www.yahoo.com.
	Tomoharu Yamaguchi, Itaru Hosomi, and Toshiaki Miyashita. Webstage: An active media enhanced world wide web browser. In Proceedings of the Conference on Human Factors in Computing Systems CHI'97, 1997.
	T. Yan and H. Garcia-Molina. SIFT: a tool for wide-area information dissemination. In Proc. 1995 USENIX Technical Conference, pages 177-186, New Orleans, 1995. http://dbpubs.stanford.edu/pub/1994-7.
	Tak Woon Yan, Matthew Jacobsen, Héctor García-Molina, and Umeshwar Dayal. From user access patterns to dynamic hypertext linking. In Proceedings of the Fifth International World-Wide Web Conference, 1996. This paper describes an approach for automatically classifying visitors of a web site according to their access patterns. User access logs are examined to discover clusters of users that exhibit similar information needs; e.g., users that access similar pages. This may result in a better understanding of how users visit the site, and lead to an improved organization of the hypertext documents for navigational convenience. More interestingly, based on what categories an individual user falls into, we can dynamically suggest links for him to navigate. In this paper, we describe the overall design of a system that implements these ideas, and elaborate on the preprocessing, clustering, and dynamic link suggestion tasks. We present some experimental results generated by analyzing the access log of a web site.
	T.W. Yan and H. Garcia-Molina. Distributed selective dissemination of information. In Proc. Parallel and Distributed Information Systems, pages 89-98, 1994.
	T.W. Yan and H. Garcia-Molina. Index structures for selective dissemination of information under the boolean model. ACM Transactions on Database Systems, 19(2):332-64, 1994.
	Beverly Yang and Hector Garcia-Molina. Comparing hybrid peer-to-peer systems. In Proceedings of the Twenty-seventh International Conference on Very Large Databases, 2001. `Peer-to-peer` systems like Napster and Gnutella have recently become popular for sharing information. In this paper, we study the relevant issues and tradeoffs in designing a scalable P2P system. We focus on a subset of P2P systems, known as `hybrid` P2P, where some functionality is still centralized. (In Napster, for example, indexing is centralized, and file exchange is distributed.) We model a file-sharing application, developing a probabilistic model to describe query behavior and expected query result sizes. We also develop an analytic model to describe system performance. Using experimental data collected from a running, publicly available hybrid P2P system, we validate both models. We then present several hybrid P2P system architectures and evaluate them using our model. We discuss the tradeoffs between the architectures and highlight the effects of key parameter values on system performance.
	Beverly Yang and Hector Garcia-Molina. Comparing hybrid peer-to-peer systems (25 pages). Technical Report 2001-37, Stanford University, 2001. `Peer-to-peer` systems like Napster and Gnutella have recently become popular for sharing information. In this paper, we study the relevant issues and tradeoffs in designing a scalable P2P system. We focus on a subset of P2P systems, known as `hybrid` P2P, where some functionality is still centralized. (In Napster, for example, indexing is centralized, and file exchange is distributed.) We model a file-sharing application, developing a probabilistic model to describe query behavior and expected query result sizes. We also develop an analytic model to describe system performance. Using experimental data collected from a running, publicly available hybrid P2P system, we validate both models. We then present several hybrid P2P system architectures and evaluate them using our model. We discuss the tradeoffs between the architectures and highlight the effects of key parameter values on system performance.
	Beverly Yang and Hector Garcia-Molina. Comparing hybrid peer-to-peer systems (extended). Technical Report 2001-35, Stanford University, 2001. `Peer-to-peer` systems like Napster and Gnutella have recently become popular for sharing information. In this paper, we study the relevant issues and tradeoffs in designing a scalable P2P system. We focus on a subset of P2P systems, known as `hybrid` P2P, where some functionality is still centralized. (In Napster, for example, indexing is centralized, and file exchange is distributed.) We model a file-sharing application, developing a probabilistic model to describe query behavior and expected query result sizes. We also develop an analytic model to describe system performance. Using experimental data collected from a running, publicly available hybrid P2P system, we validate both models. We then present several hybrid P2P system architectures and evaluate them using our model. We discuss the tradeoffs between the architectures and highlight the effects of key parameter values on system performance.
	Beverly Yang and Hector Garcia-Molina. Improving search in peer-to-peer systems. Technical Report 2001-47, Stanford University, 2001. Peer-to-peer systems have emerged as a popular way to share huge volumes of data. The usability of these systems depends on effective techniques to find and retrieve data; however, current techniques used in existing P2P systems are often very inefficient. In this paper, we present three techniques for efficient search in P2P systems. We present the design of these techniques, and then evaluate them using a combination of experiments over Gnutella, the largest open P2P system in operation, and analysis. We show that while our techniques maintain the same quality of results as currently used techniques, our techniques use up to 5 times fewer resources. In addition, we designed our techniques to be simple in design and implementation, so that they can be easily incorporated into existing systems for immediate impact.
	Beverly Yang and Hector Garcia-Molina. Designing a super-peer network. Technical Report 2002-13, Stanford University, 2002. A `super-peer` is a node in a peer-to-peer network that operates both as a server to a set of clients, and as an equal in a network of super-peers. Super-peer networks strike a balance between the inherent efficiency of centralized search, and the autonomy, load balancing and robustness to attacks provided by distributed search. Furthermore, they take advantage of the heterogeneity of capabilities (e.g., bandwidth, processing power) across peers, which recent studies have shown to be enormous. Hence, new and old P2P systems like Morpheus and Gnutella are adopting super-peers in their design. Despite their growing popularity, the behavior of super-peer networks is not well understood. For example, what are the potential drawbacks of super-peer networks? How can super-peers be made more reliable? How many clients should a super-peer take on to maximize efficiency? In this paper we examine super-peer networks in detail, gaining an understanding of their fundamental characteristics and performance tradeoffs. We also present practical guidelines and a general procedure for the design of an efficient super-peer network.
	Beverly Yang and Hector Garcia-Molina. Improving search in peer-to-peer systems. Technical Report 2002-28, Stanford University, 2002. Peer-to-peer systems have emerged as a popular way to share huge volumes of data. The usability of these systems depends on effective techniques to find and retrieve data; however, current techniques used in existing P2P systems are often very inefficient. In this paper, we present three techniques for efficient search in P2P systems. We present the design of these techniques, and then evaluate them using a combination of analysis and experiments over Gnutella, the largest open P2P system in operation. We show that while our techniques maintain the same quality of results as currently used techniques, they use up to 5 times fewer resources. In addition, we designed our techniques to be simple, so that they can be easily incorporated into existing systems for immediate impact.
	Christopher C. Yang and K. W. Li. Error analysis of Chinese text segmentation using statistical approach. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. Chinese text segmentation is important for the indexing of Chinese documents and has a significant impact on the performance of Chinese information retrieval. The statistical approach overcomes the limitations of the dictionary-based approach. It is developed by utilizing statistical information, collected from a Chinese corpus, about the association of adjacent characters in Chinese text. Both known words and unknown words can be segmented by the statistical approach. However, errors may occur due to the limitations of the corpus. In this work, we have conducted an error analysis of two Chinese text segmentation techniques that use the statistical approach, namely boundary detection and a heuristic method. Such error analysis is useful for the future development of automatic segmentation of Chinese text or text in other oriental languages. It is also helpful for understanding the impact of these errors on information retrieval systems in digital libraries.
	Jun Yang and Alexander Hauptmann. Video grammar for locating named people. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. Finding a named person in broadcast news video is important in video retrieval. When it relies on text information such as video transcripts and OCR text, this task suffers from the temporal mismatch between a person's visual appearance and the occurrence of his/her name in the text. By exploring video grammar regarding the concurrence pattern between faces and names, we propose an extended text-based IR method to overcome this problem, which yields superior performance.
	Ming-Hsuan Yang, David J. Kriegman, and Narendra Ahuja. Detecting faces in images: A survey. IEEE Trans. Pattern Anal. Mach. Intell., 24(1):34-58, 2002.
	Ke-Thia Yao, In-Young Ko, Ragy Eleish, and Robert Neches. Asynchronous information space analysis architecture using content and structure-based service brokering. In Proceedings of the Fifth ACM International Conference on Digital Libraries, 2000. Our project focuses on rapid formation and utilization of custom collections of information for groups focused on high-paced tasks. Assembling such collections, as well as organizing and analyzing the documents within them, is a complex and sophisticated task. It requires understanding what information management services and tools are provided by the system, when they are appropriate to use, and how those services can be composed together to perform more complex analyses. This paper describes the architecture of a prototype implementation for the information analysis management system that we have developed. The architecture uses metadata to describe collections of documents both in terms of their content and structure. This metadata allows the system to determine, dynamically and in a context-sensitive manner, the set of appropriate analysis services. To facilitate the invocation of those services, the architecture also provides an asynchronous and transparent service access mechanism.
	David Yaron, D. Jeff Milton, and Rebecca Freeland. Linked active content: A service for digital libraries for education. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001. A service is described to help enable digital libraries for education, such as the NSDL, to serve as collaboration spaces for the creation, modification and use of active learning experiences. The goal is to redefine the line between those activities that fall within the domain of computer programming and those that fall within the domain of content authoring. The current location of this line, as defined by web technologies, is such that far too much of the design and development process is in the domain of software creation. This paper explores the definition and use of `linked active content`, which builds on the hypertext paradigm by extending it to support active content. This concept has community development advantages, since it provides an authoring paradigm that supports contributions from a more diverse audience, including especially those who have substantial classroom and pedagogical expertise but lack programming expertise. It also promotes the extraction of content from software so that collections may be better organized and more easily repurposed to meet the needs of a diverse audience of educators and students.
	Song Ye, Fillia Makedon, Tilmann Steinberg, Li Shen, Yuhang Wang, Yan Zhao, and James Ford. SCENS: a system for the mediated sharing of sensitive data [short paper]. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. This paper introduces SCENS, a Secure Content Exchange Negotiation System suitable for the exchange of private digital data that reside in distributed digital repositories. SCENS is an open negotiation system with flexibility, security and scalability. SCENS is currently being designed to support data sharing in scientific research, by providing incentives and goals specific to a research community. However, it can easily be extended to apply to other communities, such as government, commercial and other types of exchanges. It is a trusted third party software infrastructure enabling independent entities to interact and conduct multiple forms of negotiation.
	Ka-Ping Yee, Kirsten Swearingen, Kevin Li, and Marti Hearst. Faceted metadata for image search and browsing. In Proceedings of the Conference on Human Factors in Computing Systems, pages 401-408. ACM Press, 2003. Introduces metadata 'facets'. Shows a simple interface for browsing image collections that have multiple categories of metadata.
	Ron Yeh, Chunyuan Liao, Scott Klemmer, François Guimbretière, Brian Lee, Boyko Kakaradov, Jeannie Stamberger, and Andreas Paepcke. ButterflyNet: A mobile capture and access system for field biology research. Submitted for publication, 2005. Available at http://dbpubs.stanford.edu/pub/2005-26.
	Li-Hsing Yen, Ting-Lu Huang, and Shu-Yuen Hwang. A protocol for causally ordered message delivery in mobile computing systems. Mobile Networks and Applications, 2(2):365-372, 1997. There is a growing trend in developing applications for mobile computing systems in which mobile host computers retain their network connections while in transit. This paper proposes an algorithm that enforces a useful property, namely causal ordering, when delivering messages among mobile hosts. This property ensures that causally related messages directed to the same destination will be delivered in an order consistent with their causality, which is important in applications that involve human interaction, such as mobile e-mail and mobile teleconferencing. Such applications are envisioned by the proponents of Personal Communications Services (PCS). Without this property, users may receive and read original messages and the corresponding replies out of order. Compared with previous proposals, our algorithm provides an alternative with a low handoff cost, medium message overhead, and low probability of unnecessary inhibition in delivering messages.
	Horn-yeu Shiaw, Robert J.K. Jacob, and Gregory R. Crane. The 3D Vase Museum: A new approach to context in a digital library. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. We present a new approach to displaying and browsing a digital library collection, a set of Greek vases in the Perseus digital library. Our design takes advantage of three-dimensional graphics to preserve context even while the user focuses in on a single item. In a typical digital library user interface, a user can either get an overview for context or else see a single selected item, sacrificing the context view. In our 3D Vase Museum, the user can navigate seamlessly from a high-level scatterplot-like plan view to a perspective overview of a subset of the collection, to a view of an individual item, to retrieval of data associated with that item, all within the same virtual room and without any mode change or special command. We present this as an example of a solution to the problem of focus-plus-context in information visualization. We developed 3D models from the 2D photographs in the collection and placed them in our 3D virtual room. We evaluated our approach by comparing it to the conventional interface in Perseus using tasks drawn from archaeology courses and found a clear improvement. Subjects who used our 3D Vase Museum performed the tasks 33% better and did so nearly three times faster.
	Ozgur Yilmazel, Christina M. Finneran, and Elizabeth D. Liddy. MetaExtract: An NLP system to automatically assign metadata. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004. At the Center for Natural Language Processing, we have developed MetaExtract, a system to automatically assign Dublin Core + GEM metadata using extraction techniques from our natural language processing research. MetaExtract comprises three distinct processes: eQuery and HTML-based Extraction modules and a Keyword Generator module. We conducted a Web-based survey to have users evaluate the quality of each metadata element. For the Title and Keyword elements, automatically and manually assigned metadata differed significantly in quality, with the manual metadata rated slightly higher. The remaining elements for which we had enough data to test showed no significant difference: Description, Grade, Duration, Essential Resources, Pedagogy-Teaching Method, and Pedagogy-Group.
	Clement Yu, Prasoon Sharma, Weiyi Meng, and Yan Qin. Database selection for processing k nearest neighbors queries in distributed environments. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001. We consider the processing of digital library queries, consisting of a text component and a structured component in distributed environments. The text component can be processed using techniques given in previous papers. In this paper, we concentrate on the processing of the structured component of a distributed query. Histograms are constructed and algorithms are given to provide estimates of the desirabilities of the databases with respect to the given query. Databases are selected in descending order of desirability. An algorithm is also given to select tuples from the selected databases. Experimental results are given to show that the techniques provided here are effective and efficient.
	Yuehong Yuan, Stephen Roehrig, and Marvin Sirbu. Service models, operational decisions and architecture of digital libraries. In Proceedings of the Second Annual Conference on the Theory and Practice of Digital Libraries, 1995. Format: PostScript Document.
	Z39.50 Maintenance Agency. Attribute set Bib-1 (Z39.50-1995): Semantics. Accessible at `ftp://ftp.loc.gov/pub/z3950/defs/bib1.txt`, September 1995.
	Oren Zamir and Oren Etzioni. Grouper: A dynamic clustering interface to web search results. In Proceedings of the Eighth International World-Wide Web Conference, 1999. Users of Web search engines are often forced to sift through the long ordered list of document 'snippets' returned by the engines. The IR community has explored document clustering as an alternative method of organizing retrieval results, but clustering has yet to be deployed on most major search engines. The NorthernLight search engine organizes its output into 'custom folders' based on pre-computed document labels, but does not reveal how the folders are generated or how well they correspond to users' interests. In this paper, we introduce Grouper, an interface to the results of the HuskySearch meta-search engine, which dynamically groups the search results into clusters labeled by phrases extracted from the snippets. In addition, we report on the first empirical comparison of user Web search behavior on a standard ranked-list presentation versus a clustered presentation. By analyzing HuskySearch logs, we are able to demonstrate substantial differences in the number of documents followed, and in the amount of time and effort expended by users accessing search results through these two interfaces.
	Lei Zhang, Longbin Chen, Mingjing Li, and Hongjiang Zhang. Automated annotation of human faces in family albums. In Proceedings of the 11th International Conference on Multimedia (MM2003), pages 355-358. ACM Press, 2003.
	Lei Zhang, Yuxiao Hu, Mingjing Li, Weiying Ma, and Hongjiang Zhang. Efficient propagation for face annotation in family albums. In Proceedings of the 12th International Conference on Multimedia (MM2004), pages 716-723. ACM Press, 2004.
	Meilan Zhang and Chris Quintana. Facilitating middle school students’ sense making process in digital libraries. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. Previous research on using digital libraries in science classrooms indicated that middle school students tend to passively find answers rather than actively make sense of the information they find in digital libraries. In response to this challenge, we designed a scaffolded software tool, the Digital IdeaKeeper, to support middle school students in making sense of digital library resources during online inquiry. This paper describes preliminary results from a study of how middle school students use different IdeaKeeper features. Participants include four eighth-grade science classes taught by two teachers. Multiple sources of data were collected, including video recordings of students’ computer activities and conversations, students’ artifacts, log files and students’ final writing. Initial data analysis indicates that IdeaKeeper can help online learners engage in the sense-making process during online inquiry.
	Xiaoni Zhang, Kellie B. Keeling, and Robert J. Pavur. Information quality of commercial web site home pages: an explorative analysis. In ICIS '00: Proceedings of the twenty first international conference on Information systems, pages 164-175, Atlanta, GA, USA, 2000. Association for Information Systems.
	D. Zhao. The ELINOR electronic library. In Advances in Digital Libraries '95, 1995. Format: Not Yet Online.
	W. Zhao, R. Chellappa, P. J. Phillips, and A. Rosenfeld. Face recognition: A literature survey. ACM Computing Surveys, 35(4):399-458, December 2003.
	Bin Zhu, Marshall Ramsey, Hsinchun Chen, Rosie V. Hauck, Tobun D. Ng, and Bruce Schatz. Create a large-scale digital library for geo-referenced information. In DL '99: Proceedings of the fourth ACM conference on Digital libraries, pages 260-261, New York, NY, USA, 1999. ACM Press.
	Ziming Zhuang, Rohit Wagle, and C. Lee Giles. What’s there and what’s not? Focused crawling for missing documents in digital libraries. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. Some large-scale topical digital libraries, such as CiteSeer, harvest online academic documents by crawling open-access archives, university and author homepages, and authors’ self-submissions. While these approaches have so far built reasonably sized libraries, they can suffer from having only a portion of the documents from specific publishing venues. Alternative online resources and techniques that maximally exploit other resources to build the complete document collection of any given publication venue are discussed here. We investigate the feasibility of using publication metadata to guide the crawler towards authors’ homepages to harvest what is missing from a digital library collection. We collect a real-world dataset from two Computer Science publishing venues, involving a total of 593 unique authors over a time frame of 1998 to 2004. We then identify the missing papers that are not indexed by CiteSeer. Using a fully automatic heuristic-based system that can locate authors’ homepages and then use focused crawling to download the desired papers, we demonstrate that it is practical to use a focused crawler to harvest academic papers that are missing from our digital library. Our harvester achieves an average recall of 0.82 overall and 0.75 for the missing documents. Evaluated by harvest rate, the crawler shows definite advantages over other crawling approaches and consistently outperforms a defined baseline crawler on a number of measures.
	Yue Zhuge, Hector Garcia-Molina, Joachim Hammer, and Jennifer Widom. View maintenance in a warehousing environment. In SIGMOD Conference, pages 316-327, May 1995.
	Shlomo Zilberstein. An anytime computation approach to information gathering. In AAAI Spring Symposium on Information Gathering, 1995. Format: Compressed PostScript.
	S. Zinn, M. Sellers, and D. Bohli. OCLC's intelligent gateway service: Online information access for libraries. Library Hi Tech, 4(3):25-29, 1986.
	George Kingsley Zipf. Relative frequency as a determinant of phonetic change. Reprinted from the Harvard Studies in Classical Philology, XL, 1929.
	M.M. Zloof. Query-by-Example: a data base language. IBM Systems Journal, 16(4):324-343, 1977. The basic citation for QBE.
	J. Zobel, A. Moffat, and R. Sacks-Davis. An efficient indexing technique for full-text database systems. In Proceedings of the Eighteenth International Conference on Very Large Databases, pages 352-362, August 1992.
	Wenbo Zong, Dan Wu, Aixin Sun, Ee-Peng Lim, and Dion H. Goh. On assigning place names to geography related web pages. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005. In this paper, we attempt to give spatial semantics to web pages by assigning them place names. The assignment task is divided into three sub-problems, namely place name extraction, place name disambiguation and place name assignment, and we propose an approach to each. In particular, we have modified GATE, a well-known named entity extraction tool, to perform place name extraction using a US Census gazetteer. We also propose a rule-based place name disambiguation method and a place name assignment method capable of assigning place names to web page segments. We have evaluated our disambiguation and assignment methods on a web page collection referenced by the DLESE metadata collection, comparing their output with manually disambiguated place names and manual place name assignment. Our place name disambiguation method works well for geo/geo ambiguities. Preliminary results for our place name assignment method are promising, given the existence of geo/non-geo ambiguities among place names.