Friday, October 27, 2006

TLA 2007

At ALA I enjoyed the blogger reception sponsored by OCLC. Maybe some company could do the same at the Texas Library Association annual conference. Enough folks to make it interesting.

Melvyl Recommender Project

Steve Toub of the California Digital Library has posted this to a couple of e-mail lists.

The Melvyl Recommender Project, which explored next-generation services for library catalogs, has reached its conclusion. This project was funded by the Andrew W. Mellon Foundation.

Popular commercial services such as Google, eBay, Amazon, and Netflix have evolved quickly over the last decade to help people find what they want, developing information retrieval strategies such as usefully ranked results, spelling correction, and recommendations. Library catalogs, in contrast, have changed little and are not well equipped to meet changing needs and expectations.

The Melvyl Recommender Project explored the methods and feasibility of closing this gap. An extension project carried out deeper explorations into the most interesting and promising questions raised during the original project and added obvious missing pieces of functionality. The principal area of investigation was the impact of adding full-text objects to what had previously been a metadata-only index.

Overall findings from both portions of the project include:
  • The eXtensible Text Framework (XTF), the text-based discovery application that was the backbone of the project's system (known as "Relvyl"), proved capable of scaling to millions of records and hundreds of concurrent users, indicating that this is an approach worth pursuing for providing ranking, recommendation and other types of functionality with an online catalog.
  • Use of an index-based, single-word spelling correction algorithm addressed 90 percent of misspelled single words.
  • Initial examination of faceted browsing and FRBR-like document groups indicated that each of these features could substantially improve the patron's experience of working with large result sets.
  • User assessment confirmed that users prefer relevance-ranked results over unranked results, although more investigation is required to determine whether content-based ranking with or without different types of weights (based on circulation or holdings) is more effective.
  • Two types of recommendation strategies were explored: circulation-based ("patrons who checked this out also checked out...") and text-similarity ("More like this..."). User assessment was conducted against the first type and showed that users like getting recommendations, which are useful for performing academic tasks and can also serve a unique query expansion function.
  • Adjustments to keyword searching strategies, document scoring and the index-based spelling correction dictionary allowed for an effective combination of full-text and metadata-only records into one system, in which neither type of record was privileged.
Much of the functionality explored in both phases of the project can be found in the Relvyl prototype.
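
For readers curious what a circulation-based recommender ("patrons who checked this out also checked out...") might look like in practice, here is a minimal sketch in Python. It is not the project's code. It simply counts how often two items appear in the same patron's checkout history, and the patron names, item labels and function names are all made up for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_co_checkouts(histories):
    """Count how often each pair of items shares a patron's checkout history."""
    co_counts = defaultdict(Counter)
    for items in histories.values():
        # Use a set so repeat checkouts by one patron count only once.
        for a, b in combinations(sorted(set(items)), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    return co_counts


def recommend(co_counts, item, limit=3):
    """Return up to `limit` items most often checked out alongside `item`."""
    neighbours = co_counts.get(item, {})
    # Highest co-checkout count first; ties broken alphabetically.
    ranked = sorted(neighbours.items(), key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in ranked[:limit]]


# Hypothetical circulation data, invented for this example.
histories = {
    "patron1": ["A", "B", "C"],
    "patron2": ["A", "B"],
    "patron3": ["A", "C", "D"],
    "patron4": ["B", "D"],
}

co_counts = build_co_checkouts(histories)
print(recommend(co_counts, "A"))  # ['B', 'C', 'D']
```

A real catalog would presumably need to anonymize circulation data and damp down very popular titles, neither of which this sketch attempts.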

More information about the entire project can be found on the CDL website.

Thursday, October 26, 2006

Library Journal

Library Journal is offering a free subscription to U.S. library school students.
That's right, free! We are pleased to offer a free one-year subscription to Library Journal to all library students in the United States (sorry, Canada). No strings attached, no hidden fees, nothing but a full year of Library Journal delivered straight to your door. All you need is a valid student ID. Sign up.

Wednesday, October 25, 2006

Typo of the Day

This is for those of you who get this via e-mail or an RSS feed. In the sidebar on the Web site I've added a box for the Typo of the Day. This is not a substitute for subscribing to their weblog, but a pointer and reminder of its existence.

Tuesday, October 24, 2006

Librarypages Podcast

Here is a new library science podcast, Librarypages Podcast. The initial episode is a discussion on cataloging with Dr. Shawne D. Miksa. That's a good start.
Are you curious about issues that go on in the world of library science, even if it isn't your field of expertise? Then this is one place you'll want to check out.

Every other week a podcast is posted to help investigate the different areas and issues in library science. Come and listen to those in the academic and professional world discuss the issues that affect us all.

This podcast is updated bi-weekly and it is completely free to listen to.

IWF Metadata Harvester

Posted to Code4lib.

I'd like to announce the publication of the IWF Metadata Harvester.

This package reads data from servers, writes it to databases, implements various kinds of searches, and writes HTML files to display the results. It currently handles data from two kinds of interface: OAI (Open Archives Initiative), which provides XML, and Z39.50, using the Pica format.
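
As a rough illustration of the OAI side of such a harvester, the Python sketch below builds a ListRecords request URL and pulls identifiers and Dublin Core titles out of an OAI-PMH response. It is not taken from the IWF package, which is written in C++. The base URL, the sample response and the function names are assumptions made up for the example, and nothing is fetched over the network.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL for a repository's base URL."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def extract_titles(xml_text):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iter(OAI_NS + "record"):
        identifier = record.findtext(OAI_NS + "header/" + OAI_NS + "identifier")
        title = record.findtext(".//" + DC_NS + "title")
        rows.append((identifier, title))
    return rows


# Hand-written sample response (hypothetical repository and record).
SAMPLE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()

print(list_records_url("http://example.org/oai"))
print(extract_titles(SAMPLE))
```

A full harvester would also follow resumption tokens and write the records to a database, which this sketch leaves out.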

The IWF Metadata Harvester is Free Software. The program code is published under the terms of the GNU General Public License, and the documentation under those of the GNU Free Documentation License. It is available from this FTP server. I am currently looking into the possibility of having the package hosted on a website for software developers.

The IWF Metadata Harvester has been developed on a system running Microsoft Windows Server 2003 and using Visual Studio, Visual C++ .NET and Microsoft SQL Server 2000. I have tried to make it as portable as possible under the circumstances by avoiding the ATL and MFC types, classes, etc., as much as I could. It has not been possible to do without them completely, though more recent code uses them less. I hope that it will be possible to use the files of SQL code with free database packages without too many alterations. I plan to port the package to GNU/Linux myself at the earliest opportunity.

I have used Donald Knuth and Silvio Levy's CWEB package for the C++ code, so that pretty-printed versions of the programs are included in the package. This directory also contains the manual, which has been written using the Texinfo package. It is available in the following formats: DVI, PostScript, PDF, and HTML.

Because the IWF Metadata Harvester is Free Software, libraries, archives, or any other providers of metadata could make it available to users with no purchase, registration, special license arrangements, or (subject to local laws) liability.

The IWF Metadata Harvester is a work-in-progress. I would be very interested to know whether any libraries, archives, other institutions, businesses, or private persons would find it useful.

Any feedback would be much appreciated.

Laurence Finston

IWF Wissen und Medien gGmbH
Nonnenstieg 72
37075 Goettingen

Monday, October 23, 2006

Changes at AUTOCAT

This message was posted to AUTOCAT Friday.

It is time for AUTOCAT to find a new home and new management.

During my recent visit to Buffalo I discussed the future of AUTOCAT with the Acting Vice-President for University Libraries and with an official in the computing center. The consensus was that we need to find a new site to host AUTOCAT and new owner(s). If that is not possible, AUTOCAT will go out of existence.

AUTOCAT: Library cataloging and discussion group began in October 1990 at the University of Vermont under the ownership of founder Nancy Keane. On April 28, 1993 it moved to the University at Buffalo and I became listowner. At that time it had 1880 subscribers in 24 countries; today it has 4600 subscribers in 42 countries.

AUTOCAT utilizes Eric Thomas' LISTSERV electronic discussion list software. Any host wishing to assume responsibility for AUTOCAT needs to have a Listserv application that will continue to support and maintain the list's database archival capabilities. The University at Buffalo currently runs Listserv version 14.5 but any version 1.8e or later (14.1, 14.2, 14.3, 14.4 or 14.5) should be OK.

Listserv is available on a wide range of computer operating systems. The hardware should be powerful enough to handle the volume of messages without getting bogged down. AUTOCAT averages around 25 posts per day with a distribution of 4600 subscribers, so we're looking at a daily traffic of about 115,000 messages.
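
The traffic estimate above is a simple multiplication, sketched here in Python for anyone sizing a host. The figures are the ones quoted in the message. The function name is made up, and the calculation assumes every subscriber receives each post as an individual message (no digests).

```python
def daily_deliveries(posts_per_day, subscribers):
    """Estimate outgoing messages per day if every post goes to every subscriber."""
    return posts_per_day * subscribers


# Figures quoted in the announcement.
print(daily_deliveries(25, 4600))  # 115000
```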

It is also desirable that the list's archive file (dating from January 2, 1991 and currently containing over 120,000 messages) be exported to the new host and mounted as a searchable resource. AUTOCAT also has a listfile of files on various frequently-discussed topics that subscribers can retrieve; these should also be moved to the new host.

The requirements for the listowner(s) are that you should have some experience in running an electronic discussion list, have your supervisor's approval, and, of course, be interested in being a listowner. The amount of time required will vary from day to day but you should expect to spend about an hour a day on this job.

Those interested in becoming the new AUTOCAT listowners should send me an offer that contains your name, job title, e-mail address, and a brief description of your experience with running a list. Also needed is a description of your hosting environment (operating system, Listserv version, etc.) and the name, title, and e-mail address of the person responsible for managing lists at your site. All offers will be reviewed by me and by Jim Serwinowski of the UB Computing Center.

Please address your offers, questions, etc. to Copy all offers to

The deadline for receiving offers is October 27, 2006.