Friday, May 16, 2008

MARC Online

More news from LOC.
The Network Development and MARC Standards Office is pleased to announce that the Full versions of all five MARC 21 formats are now available online, along with the Online Concise.
The "full" version of a format contains detailed descriptions of every data element, along with examples, input conventions, and history sections - all of the information from the printed formats. There are no textual differences between the Online Full and the printed documentation. The Concise still contains all of the elements and enough description to serve many lookup needs. Changes from the most recent update of the formats are indicated in the text of both the Online Concise and the Online Full.

Links in LC Records

News about 856 links from LOC.
I've received a couple of questions recently about the 856 links in LC records for the TOCs, descriptions, bios, sample texts, etc. and wanted to spread the word about what we did.

Every month, around the first of the month, folks run their link checkers to validate the links in their copies of LC records. The volume of traffic against our web server was tremendous. A couple of times it nearly brought the server down. We tried several things to minimize the impact if it looked like a link checker was running against the web server, but this didn't seem to help the problem. In the end, we moved all of the files that are in the 856 fields to a different, larger, more robust server. Apparently this is causing link checkers to report that there is a redirect and people are asking if they need to change the URL for the links. I would say that there is no need to change the 856 links from http://www.loc.gov... to http://catdir.loc.gov.... In fact, I am still adding the URLs as http://www.loc.gov...

LC is committed to maintaining these URLs; you should not be experiencing access problems with them except when running link checkers or maybe harvesters. I appreciate any reports of wrong connections or other serious problems with the files. By my count, we have over 710,000 links in the LC catalog now, so you can see this is a major commitment for LC.
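
For anyone tuning a link checker in light of the note above, here is a minimal sketch of the rule it implies: a redirect from www.loc.gov to catdir.loc.gov is expected and does not call for editing the 856 field. The two host names come from the LC message; the function name, the result labels, and the sample paths are hypothetical, and the status code and Location header are assumed to come from whatever checker you already run.

```python
from urllib.parse import urlsplit

# Hosts taken from the LC note above: links are recorded under www.loc.gov
# and the files are now served from catdir.loc.gov.
RECORDED_HOST = "www.loc.gov"
SERVING_HOST = "catdir.loc.gov"

REDIRECT_CODES = (301, 302, 303, 307, 308)


def classify_856_check(recorded_url, status, location=None):
    """Classify one link-checker result for an 856 URL.

    Returns "ok", "ok-redirect" (expected redirect, keep the recorded URL),
    or "report" (worth a closer look). The labels are illustrative only.
    """
    if 200 <= status < 300:
        return "ok"
    if status in REDIRECT_CODES and location:
        source = urlsplit(recorded_url)
        target = urlsplit(location)
        if source.hostname == RECORDED_HOST and target.hostname == SERVING_HOST:
            return "ok-redirect"
    return "report"


# Hypothetical example: the path is made up for illustration.
print(classify_856_check(
    "http://www.loc.gov/catdir/toc/example.html",
    301,
    "http://catdir.loc.gov/catdir/toc/example.html",
))
```

Only results labeled "report" would then need a human look, which also keeps repeat traffic against the LC server down.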

Wednesday, May 14, 2008

Manifestations and Near-Equivalents

Martha M. Yee continues to make her work readily available.
The two articles about 'manifestation' (the word everyone used to mean 'expression' until FRBR came along) that I published in 1994 are now available at the University of California eScholarship Repository, as follows:

Manifestations and Near-Equivalents: Theory, with Special Attention to Moving-Image Materials. Library Resources & Technical Services 1994; 38:227-256.

Manifestations and Near-Equivalents of Moving Image Works: a Research Project. Library Resources & Technical Services 1994; 38:355-372.

Re: Recommendation and Ranganathan

I hope everybody here is also reading Lorcan Dempsey's weblog. However, just in case there are some who don't, begin with the excellent post Recommendation and Ranganathan. I thought the description of the four types of metadata was a very good place to start thinking and discussing.

Tuesday, May 13, 2008

eXtensible Text Framework (XTF)

The California Digital Library (CDL) is pleased to announce a new release of its search and display technology, the eXtensible Text Framework (XTF) version 2.1. XTF is an open source, highly flexible software application that supports the search, browse and display of heterogeneous digital content. XTF offers efficient and practical methods for creating customized end-user interfaces for distinct digital content collections.

Highlights from the 2.1 release include:
  • Extensive interface improvements, including new search forms, built-in faceted browsing, and a new look and feel.
  • Increased support for document and information exchange formats.
    • XHTML and OAI-PMH output
    • NLM article format indexing and output
    • Microsoft Word indexing
  • Streamlined XSLT stylesheets for simpler deployment and adaptation.
  • Updated documentation that has been moved to the XTF project wiki, allowing XTF implementers to share solutions with the entire user community.
  • "Freeform" Boolean query language, offered as an experimental feature.
  • Backward compatibility with existing XTF implementations.
A complete list of changes is available on the XTF Project page on SourceForge, where the distribution (including documentation) can also be downloaded.

Since the first deployment of XTF in 2005, the development strategy has been to build and maintain an indexing and display technology that is not only customizable, but also draws upon tested components already in use by the digital library and search communities - in particular the Lucene text search engine, Java, XML, and XSLT. By coordinating these pieces in a single platform that can be used to create multiple unique applications, CDL has succeeded in dramatically reducing the investment in infrastructure, staff training and development for new digital content projects.

XTF offers a suite of customizable features that support diverse intellectual access to content. Interfaces can be designed to support the distinct tools and presentations that are useful and meaningful to specific audiences. In addition, XTF offers the following core features:
  • Easy to deploy: Drops directly into a Java application server such as Tomcat or Resin; has been tested on Solaris, Mac, Linux, and Windows operating systems.
  • Easy to configure: Can create indexes on any XML element or attribute; entire presentation layer is customizable via XSLT.
  • Robust: Optimized to perform well on large documents (e.g., a single text that exceeds 10MB of encoded text); scales to perform well on collections of millions of documents; provides full Unicode support.
  • Extensible:
    • Works well with a variety of authentication systems (e.g., IP address lists, LDAP, Shibboleth).
    • Provides an interface for external data lookups to support thesaurus-based term expansion, recommender systems, etc.
    • Can power other digital library services (e.g., XTF contains an OAI-PMH data provider that allows others to harvest metadata, and an SRU interface that exposes searches to federated search engines).
    • Can be deployed as separate, modular pieces of a third-party system (e.g., the module that displays snippets of matching text).
  • Powerful for the end user:
    • Spell checking of queries
    • Faceted displays for browsing
    • Dynamically updated browse lists
    • Session-based bookbags
These basic features can be tuned and modified. For instance, the same bookbag feature that allows users to store links to entire books, can also store links to citable elements of an object, such as a note or other reference.
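
The OAI-PMH data provider mentioned in the feature list is harvested with plain HTTP requests, so a harvester can start from nothing more than a URL. The sketch below builds a ListRecords request; the verb and the metadataPrefix and set arguments are part of the OAI-PMH protocol, while the base URL is a placeholder, since the announcement does not say where an XTF installation exposes its provider.

```python
from urllib.parse import urlencode


def oai_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    base_url is the repository's OAI-PMH endpoint; the one used below is a
    placeholder, not a real XTF path.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Placeholder endpoint for illustration only.
print(oai_list_records_url("http://example.org/xtf/oai"))
```

oai_dc (unqualified Dublin Core) is the one metadata format every OAI-PMH repository is required to support, which is why it is the default here.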

XTF was also used at the CDL as an experimental OPAC technology, to try out ranking and recommendation features with our catalog data.

Posted to many e-mail distribution lists.

Non-Latin Data in Name Authority Records

From LC:
As previously announced, MDS-Name Authority records will be enhanced with non-Latin script data in 4XX fields and selected notes beginning June 1, 2008 (see earlier announcements at http://www.loc.gov/catdir/cpso/nonroman_announce.pdf and http://www.loc.gov/catdir/cpso/nonlatin_whitepaper.html for additional information). An additional FAQ related to the project will be posted at http://www.loc.gov/aba/ shortly.

An effort to automatically pre-populate existing authority records with non-Latin references by OCLC, Inc. will also begin in early June 2008. The initial rate of pre-population will be limited to several hundred records per week, and will grow to a rate of approximately 25,000 records per week. Note that other clean-up projects that have recently increased the volume of name authority records (http://www.loc.gov/cds/notices/2008-02-14.pdf ) will be suspended during this pre-population effort. It is estimated that approximately 400,000 pre-population records will be distributed over a number of months.

CDS is making available a file of name authority test records containing non-Latin script data. The file of 110 test records can be found on the Library of Congress rs7 server under the /emds/test subdirectory with file names of names.nonlatintest.records for the MARC 8 version and names.nonlatintest.records.utf8 for the UTF8 version.
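
If the two test files get renamed or mixed up after download, the records themselves say which encoding they use: Leader position 09 is "a" for UCS/Unicode (carried as UTF-8 in MARC 21 exchange) and blank for MARC-8. The sketch below reads that one byte from a raw record; the function name and the sample leaders are made up for illustration and are not taken from the LC test file.

```python
def character_coding(record):
    """Return "UTF-8" or "MARC-8" based on Leader/09 of one raw MARC 21 record."""
    if len(record) < 24:
        raise ValueError("a MARC 21 leader is 24 bytes long")
    flag = record[9:10]
    if flag == b"a":
        return "UTF-8"
    if flag == b" ":
        return "MARC-8"
    raise ValueError("unexpected Leader/09 value: %r" % flag)


# Two made-up 24-byte leaders that differ only in position 09.
utf8_leader = b"00714cz  " + b"a" + b"2200205n  4500"
marc8_leader = b"00714cz  " + b" " + b"2200205n  4500"
print(character_coding(utf8_leader), character_coding(marc8_leader))
```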

Spam

I've been blasted with comment spam. So I've had to turn on the comment moderation function.

It is a shame how these few folks can ruin things for all. A few years back an e-card was a fun thing to receive and send. Now so many are spam that I've stopped sending and opening them. Open comments seem ready to go the same way.

Friday, May 09, 2008

Metadata for Learning Resources

Metadata for Learning Resources: An Update on Standards Activity for 2008 by Sarah Currier appears in the latest issue of Ariadne.
The major areas of development covered in this article are:
  1. LOM Next: plans for the next version of the IEEE LOM
  2. The Joint DCMI/IEEE LTSC (Learning Technology Standards Committee) Taskforce: bringing together the two major metadata standards used for learning resources, and providing an RDF translation for the LOM
  3. DC-Education Application Profile (DC-Ed AP): a modular application profile purely looking at educational aspects of resources, based on community requirements
  4. The United Kingdom’s Joint Information Systems Committee Learning Materials Application Profile (JISC LMAP) scoping study: working alongside a number of similar projects looking at application profiles for repositories in other areas, e.g. images.
  5. International Standards Organisation Metadata for Learning Resources (ISO MLR): based primarily in Canada, this international standards body is devising a new international standard for educational metadata, in response to perceived limitations of the IEEE LOM
  6. The European Commission’s PROLEARN Harmonisation of Metadata project: a study into the issues and challenges of achieving harmonisation in metadata, given the heterogeneous landscape

Thursday, May 08, 2008

Metadata Advocates

I had an Ah-Ha moment while listening to Jon Udell's show Interviews with Innovators. The episode was Working with Data Sources with Raymond Yee.
Raymond Yee is a lecturer at the UC Berkeley School of Information and the author of Pro Web 2.0 Mashups: Remixing Data and Web Services. In this conversation he talks about teaching students how to work with existing data sources, and speculates with Jon Udell on ways to expand the supply of available sources.
What struck me was that we should be advocates for metadata standards. If the local genealogy society puts up a calendar on their website, help them get it into iCal or hCal format. Then we could drop their info into a pathfinder. Or we could geocode the local bird-watchers' sightings, or the school district's lunch menu, or .... We could offer our understanding of the importance of standards and data reuse to our community. The library benefits by becoming the go-to place for information management. The community benefits because they get the word out more effectively. It would be a very different job description for a cataloger to become the community data standard outreach person. But, not a bad place to be.
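
To give a sense of how small the first step is, here is a rough sketch that turns one meeting into an iCalendar file a society could post next to its web calendar. The property names follow the iCalendar specification (RFC 5545); the function names, the PRODID string, the UID, and the event itself are invented for the example, and long lines are not folded at 75 octets as a production generator should do.

```python
from datetime import date, datetime, timezone


def escape_text(value):
    """Escape a TEXT value the way iCalendar (RFC 5545) requires."""
    return (value.replace("\\", "\\\\")
                 .replace(";", "\\;")
                 .replace(",", "\\,")
                 .replace("\n", "\\n"))


def make_event(uid, summary, day, stamp):
    """Return a one-event iCalendar document for an all-day event.

    stamp must be a UTC datetime; it records when the entry was created.
    """
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//Example Library//Community Calendar//EN",  # made-up id
        "BEGIN:VEVENT",
        "UID:" + uid,
        "DTSTAMP:" + stamp.strftime("%Y%m%dT%H%M%SZ"),
        "DTSTART;VALUE=DATE:" + day.strftime("%Y%m%d"),
        "SUMMARY:" + escape_text(summary),
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    # iCalendar content lines end with CRLF.
    return "\r\n".join(lines) + "\r\n"


ics = make_event(
    "meeting-2008-06@example.org",
    "Genealogy society meeting, main branch",
    date(2008, 6, 10),
    datetime(2008, 5, 8, 12, 0, tzinfo=timezone.utc),
)
print(ics)
```

The same event marked up in the page itself with hCal class names would serve people reading the site, while the .ics file serves their calendar programs.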

Resource Description and Access

Now available, Outcomes of the Meeting of the Joint Steering Committee Held in Chicago, USA, 13-22 April 2008.

Wednesday, May 07, 2008

Using Wikipedia

Two new reports from HP Labs show interesting uses of Wikipedia in information management.

Boosting Inductive Transfer for Text Classification using Wikipedia by Somnath Banerjee. HPL-2008-42
Inductive transfer is applying knowledge learned on one set of tasks to improve the performance of learning a new task. Inductive transfer is being applied in improving the generalization performance on a classification task using the models learned on some related tasks. In this paper, we show a method of making inductive transfer for text classification more effective using Wikipedia. We map the text documents of the different tasks to a feature space created using Wikipedia, thereby providing some background knowledge of the contents of the documents. It has been observed here that when the classifiers are built using the features generated from Wikipedia they become more effective in transferring knowledge. An evaluation on the daily classification task on the Reuters RCV1 corpus shows that our method can significantly improve the performance of inductive transfer. Our method was also able to successfully overcome a major obstacle observed in a recent work on a similar setting. Publication Info: Published and presented at ICMLA 2007, the Sixth International Conference on Machine Learning and Applications (ICMLA'07), 13-15 Dec. 2007 Cincinnati, Ohio, USA
Clustering Short Texts using Wikipedia by Somnath Banerjee, Krishnan Ramanathan, and Ajay Gupta. HPL-2008-41
Subscribers to the popular news or blog feeds (RSS/Atom) often face the problem of information overload as these feed sources usually deliver large number of items periodically. One solution to this problem could be clustering similar items in the feed reader to make the information more manageable for a user. Clustering items at the feed reader end is a challenging task as usually only a small part of the actual article is received through the feed. In this paper, we propose a method of improving the accuracy of clustering short texts by enriching their representation with additional features from Wikipedia. Empirical results indicate that this enriched representation of text items can substantially improve the clustering accuracy when compared to the conventional bag of words representation. Publication Info: Published and presented at SIGIR 2007, the 30th Annual International ACM SIGIR Conference, 23-27 July 2007, Amsterdam, Netherlands

Monday, May 05, 2008

Slick Deal

Here is a bargain offered by Amazon, OCLC - MARC Record. It has free shipping too! This was seen on Slick Deals.

Don't they know they can get all the free MARC records they want from their local library?

Thanks Walter.

Thursday, May 01, 2008

myLOC

I may have missed this news, maybe while I was at TxLA, but I've not seen it elsewhere; the Library of Congress now has a "my" portal, myLOC.

Statement of International Cataloging Principles

The Statement of International Cataloging Principles is available for worldwide review.
As Chair of the IFLA Meeting of Experts on an International Cataloging Code (IME ICC) I am pleased to invite comments from the worldwide library community on the final draft of the Statement of International Cataloguing Principles and its accompanying Glossary.

In order to provide the appropriate review period and to schedule adequate time to cumulate, analyze, and incorporate comments before the General Meeting of IFLA in August, the Statement is being posted today on a public Wiki. The IFLA Headquarters Office is closed for holiday April 30-May 5th, but as soon as they return we will move the files there and redirect from the Wiki. In the meantime please link to: http://catprinciples.pbwiki.com/ and view and/or download the Statement for your review; and please use the accompanying voting document for your response.

MARC Records

Ed Summers has "created a bittorrent of the concatenated MARC files donated to the Internet Archive by Scriblio (7,030,372 records)":

http://inkdroid.org/torrents/lc-bib.torrent
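
A concatenated file like this can be sanity-checked without any MARC library, because every MARC 21 record starts with its own length as five ASCII digits and ends with the record terminator byte 0x1D. The sketch below walks a binary stream record by record and counts them; it is only an illustration (the function name is made up, and the demo data is fake filler rather than real records), but run over the downloaded file it should give a count to compare with the figure quoted above.

```python
import io


def count_marc_records(stream):
    """Count records in a binary stream of concatenated MARC 21 records."""
    count = 0
    while True:
        head = stream.read(5)
        if not head:
            return count
        if len(head) < 5 or not head.isdigit():
            raise ValueError("bad record length after %d records" % count)
        length = int(head)
        if length < 24:
            raise ValueError("record shorter than a leader after %d records" % count)
        rest = stream.read(length - 5)
        # The last byte of every record is the record terminator.
        if len(rest) != length - 5 or rest[-1:] != b"\x1d":
            raise ValueError("truncated or misaligned record after %d records" % count)
        count += 1


# Demo on three fake 30-byte "records" (filler, not real MARC data).
fake = b"00030" + b"x" * 24 + b"\x1d"
print(count_marc_records(io.BytesIO(fake * 3)))  # prints 3
```

For a real file, open it in binary mode (open(path, "rb")) and pass the file object in place of the BytesIO demo.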

Wednesday, April 30, 2008

Library of Congress Subject Heading Suggestion Blog-a-Thon

The results for the Library of Congress Subject Heading Suggestion Blog-a-Thon are in. The effort resulted in 24 subject headings, 6 cross-references, and 2 subdivision suggestions.

Tuesday, April 29, 2008

Transparency

Get Satisfaction looks like a unique 2.0 tool to make the organization transparent.
Get Satisfaction is a direct connection between people and companies that fosters problem-solving, promotes sharing, and builds up relationships. Thousands of companies use this neutral space to support customers, exchange ideas, and get feedback about their products and services. Get Satisfaction is open, transparent, and free. You’re free to ask, free to answer, and free to start a new conversation. Everyone is invited and encouraged to participate: companies, employees, customers — anyone with an opinion, an answer, or something to say.
A few libraries are represented. Michael Stephens needs to see this.

Monday, April 28, 2008

Free Comic Book Day

Free Comic Book Day is this weekend, May 3.

Additions to the MARC Code Lists for Relators, Sources, Description Conventions

The codes listed below have been recently approved for use in MARC 21 records. The codes will be added to the online MARC Code Lists for Relators, Sources, Description Conventions.

The codes should not be used in exchange records until after June 25, 2008. This 60-day waiting period is required to provide MARC 21 implementers time to include newly defined codes in any validation tables they may apply to the MARC fields where the codes are used.

Category Code Sources
The following codes are for use in subfield $2 in field 072 in Authority and Bibliographic records (Subject Category Code) and in subfield $z in field 073 (Subdivision Usage) in Authority records.

Additions:

bisacsh
BISAC Subject Headings
(http://www.bisg.org/standards/bisac_subject/index.html) [use only after June 25, 2008]
bisacmt
BISAC Merchandising Themes
(http://www.bisg.org/standards/merchandising.html) [use only after June 25, 2008]
bisacrt
BISAC Regional Themes
(http://www.bisg.org/standards/region_codes.html) [use only after June 25, 2008]
Classification Sources
The following code is for use in subfield $2 in field 084 in Bibliographic and Community Information records (Other Classification Number), in subfield $2 in field 084 in Classification records (Classification Scheme and Edition) and in subfield $2 in field 065 in Authority records (Other Classification Number).

Addition:
blissc
British Library Inside service subject classification. (London: British Library) [use only after June 25, 2008]
Term, Name, Title Sources
The following codes are for use in subfield $2 in fields 600-657 and 662 in Bibliographic and Community Information records, and in subfield $f in field 040 (Cataloging Source) in Authority records.

Additions:
bisacsh
BISAC Subject Headings
(http://www.bisg.org/standards/bisac_subject/index.html) [use only after June 25, 2008]
bisacmt
BISAC Merchandising Themes
(http://www.bisg.org/standards/merchandising.html) [use only after June 25, 2008]
bisacrt
BISAC Regional Themes
(http://www.bisg.org/standards/region_codes.html) [use only after June 25, 2008]
quiding
Quiding, Nils Herman. Svenskt allmant forfattningsregister for tiden fran ar 1522 till och med ar 1862. (Stockholm: Norstedt) [use only after June 25, 2008]
skon
Att indexera skonlitteratur: Amnesordslista, vuxenlitteratur.
(Stockholm: Svensk biblioteksforening) [use only after June 25, 2008]
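
For implementers wondering what "include newly defined codes in any validation tables" might look like, here is one possible shape for such a table, with the waiting period stored next to each code. The five codes are the Term, Name, Title source codes announced above and the date is the one in the notice; the table layout and function name are assumptions for illustration, not how any particular system does it.

```python
from datetime import date

# Last day of the 60-day waiting period given in the notice.
WAITING_PERIOD_ENDS = date(2008, 6, 25)

# Hypothetical validation table for subfield $2 in fields 600-657 and 662:
# source code -> last day on which the code must NOT yet be used.
TERM_SOURCE_CODES = {
    "bisacsh": WAITING_PERIOD_ENDS,
    "bisacmt": WAITING_PERIOD_ENDS,
    "bisacrt": WAITING_PERIOD_ENDS,
    "quiding": WAITING_PERIOD_ENDS,
    "skon": WAITING_PERIOD_ENDS,
}


def source_code_allowed(code, on_date, table=TERM_SOURCE_CODES):
    """Return True if the code is in the table and its waiting period is over."""
    if code not in table:
        return False
    # "Use only after June 25, 2008" means strictly later than that day.
    return on_date > table[code]


print(source_code_allowed("bisacsh", date(2008, 6, 25)))  # prints False
print(source_code_allowed("bisacsh", date(2008, 6, 26)))  # prints True
```

Loading the codes ahead of time with their dates lets a system accept them automatically once the waiting period ends, instead of needing a second update on that day.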

Friday, April 25, 2008

More Comments on TLA

The drive from Houston to Dallas was beautiful. The bluebonnets had passed, except for a few scattered patches. However, the brown-eyed Susans, winecups, Indian paintbrushes, and a white flower (cow parsley?) were spectacular.

At the RDA preconference I had the pleasure of hearing Carol Seiler, from AMIGOS, speak. Great presenter.