Thursday, January 24, 2008

Metadata Object Description Schema Revision

MODS version 3.3 is now available. Changes from version 3.2 are documented online.

ISBN Service

LibraryThing has a new API, one that corrects ISBNs and returns both the 10- and 13-digit forms. Very RESTful. Just send an ISBN to the service and it will:
  • Take any old ISBN and do the math to return the ISBN10 and ISBN13 forms, if both exist.
  • Remove dashes and other junk.
  • Transparently fix missing initial zeroes. This is a common problem with data from Excel files, which turn 0765344629 into 765344629.
  • Return an error if the ISBN isn't valid and can't be easily fixed.
They ask that you not send more than 10 requests per second.
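The "math" here is the standard ISBN check-digit arithmetic, so it's easy to see what the service is doing. Below is a minimal local sketch, assuming nothing about LibraryThing's actual code or URL: `normalize_isbn` and the two check-digit helpers are hypothetical names. It strips junk, restores leading zeroes lost to spreadsheets, validates the check digit, and converts between the two forms (ISBN-13s beginning with 979 have no ISBN-10 equivalent).

```python
import re
from typing import Optional, Tuple


def isbn10_check_digit(first9: str) -> str:
    # ISBN-10: weights 10 down to 2; the check digit makes the weighted sum divisible by 11.
    total = sum((10 - i) * int(d) for i, d in enumerate(first9))
    r = (11 - total % 11) % 11
    return "X" if r == 10 else str(r)


def isbn13_check_digit(first12: str) -> str:
    # ISBN-13: alternating weights 1 and 3; the check digit makes the sum divisible by 10.
    total = sum((3 if i % 2 else 1) * int(d) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def normalize_isbn(raw: str) -> Tuple[Optional[str], str]:
    """Return (isbn10, isbn13); isbn10 is None for 979-prefixed ISBNs.

    Hypothetical helper, a local sketch and not LibraryThing's API.
    Raises ValueError if the input can't be repaired into a valid ISBN.
    """
    # Strip dashes, spaces and other junk; keep digits and a possible X.
    s = re.sub(r"[^0-9Xx]", "", raw).upper()
    # Restore leading zeroes dropped by spreadsheets (765344629 -> 0765344629).
    if 0 < len(s) < 10:
        s = s.zfill(10)
    if len(s) == 10:
        if not s[:9].isdigit() or isbn10_check_digit(s[:9]) != s[9]:
            raise ValueError(f"invalid ISBN-10: {raw!r}")
        body = "978" + s[:9]
        return s, body + isbn13_check_digit(body)
    if len(s) == 13:
        if (not s.isdigit() or not s.startswith(("978", "979"))
                or isbn13_check_digit(s[:12]) != s[12]):
            raise ValueError(f"invalid ISBN-13: {raw!r}")
        if s.startswith("978"):
            return s[3:12] + isbn10_check_digit(s[3:12]), s
        return None, s  # 979-prefixed ISBN-13s have no ISBN-10 form
    raise ValueError(f"not an ISBN: {raw!r}")


print(normalize_isbn("765344629"))  # prints ('0765344629', '9780765344625')
```

Doing this locally is a reasonable fallback if you need to stay under the request limit, though the service may apply fixes beyond what this sketch covers.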

Wednesday, January 23, 2008

Photo Preservation Metadata

Photoplus: Auxiliary Information for Printed Images Based on Distributed Source Coding by Ramin Samadani and Debargha Mukherjee (HPL-2008-2) discusses some metadata for photographs that may be useful for preservation.
A printed photograph is difficult to reuse because the digital information that generated the print may no longer be available. This paper describes a mechanism for approximating the original digital image by combining a scan of the printed photograph with small amounts of digital auxiliary information kept together with the print. The auxiliary information consists of a small amount of digital data to enable accurate registration and color-reproduction, followed by a larger amount of digital data to recover residual errors and lost frequencies by distributed Wyner-Ziv coding techniques. Approximating the original digital image enables many uses, including making good quality reprints from the original print, even when they are faded many years later. In essence, the print itself becomes the currency for archiving and repurposing digital images, without requiring computer infrastructure.

Publication Info: To be published and presented at VCIP 2008 - Visual Communications and Image Processing 2008, San Jose, CA
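The core trick in distributed source coding is that the decoder already holds a correlated copy (here, the scan), so the stored side data only has to describe how the copy differs from the original. A toy illustration of that idea, not the authors' Wyner-Ziv scheme: lossless binary syndrome coding with a Hamming(7,4) code, assuming the "scan" differs from the original in at most one bit per 7-bit block. Only 3 bits are kept alongside each 7-bit block.

```python
def syndrome(bits):
    # The Hamming(7,4) parity-check matrix has the binary forms of 1..7 as its
    # columns, so H*x (mod 2) equals the XOR of the 1-based positions of set bits.
    s = 0
    for pos, bit in enumerate(bits, start=1):
        if bit:
            s ^= pos
    return s  # a 3-bit value, 0..7


def decode(side_info, s):
    # The decoder holds a correlated copy assumed to differ from the original in
    # at most one of the 7 bits; the syndrome difference is that bit's position.
    diff = syndrome(side_info) ^ s
    x = list(side_info)
    if diff:
        x[diff - 1] ^= 1
    return x


original = [1, 0, 1, 1, 0, 0, 1]
aux = syndrome(original)   # 3 bits of auxiliary data kept with the "print"
scan = original.copy()
scan[4] ^= 1               # the scan got one bit wrong
print(decode(scan, aux) == original)  # prints True
```

Real images need lossy (Wyner-Ziv) coding and much stronger codes, but the division of labor is the same: the scan does most of the work, and the small auxiliary data corrects it.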

Cataloging Streaming Media

Good news from OLAC.
The Best Practices for Cataloging Streaming Media document is available on the OLAC website. Created by the CAPC Streaming Media Best Practices Task Force, it presents best practice guidelines and examples for cataloging both streaming video and audio, based on AACR2. It also presents definitions and examples of resources that can be considered as streaming media.

This document is available in both HTML and PDF formats.

Tuesday, January 22, 2008

Freebase Books Schema

Freebase is an interesting project: they accept data sets and then provide a platform to access them. Something like the Talis platform? They have a section for books, with a schema for book information. I'm not sure of the mechanics behind it all; I'd guess RDF would make the cross-data-set links easier. Here is an example of bibliographic data being just one type of data in a much larger system, with connections to other data sets. Interesting.
Freebase is an open database of the world’s information. It is built by the community and for the community--free for anyone to query, contribute to, build applications on top of, or integrate into their websites.

Already, Freebase covers millions of topics in hundreds of categories. Drawing from large open data sets like Wikipedia, MusicBrainz, and the SEC, it contains structured information on many popular topics, like movies, music, people and locations--all reconciled and freely available via an open API. This information is supplemented by the efforts of a passionate global community of users, who are working together to add structured information on everything from philosophy to European railway stations to the chemical properties of common food ingredients.

In fact, part of what makes Freebase unique is that it spans domains--but requires that a particular topic exist only once in Freebase, even if it might normally be found in multiple databases. For example, Arnold Schwarzenegger would appear in a movie database as an actor, a political database as a governor and a bodybuilder database as a Mr. Universe. In Freebase, there is only one topic for Arnold Schwarzenegger, with all three facets of his public persona brought together. The unified topic acts as an information hub, making it easy to find and contribute information about him.
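The "one topic, many facets" idea can be sketched as folding records from several sources into a single entry per entity. This is a minimal illustration, not Freebase's implementation: `merge_into_topics`, the topic ID `"arnold"`, and the type labels are hypothetical, and it assumes the hard part (reconciling which records refer to the same person) has already been done, so matching records share an ID.

```python
from collections import defaultdict


def merge_into_topics(records):
    """Fold (topic_id, type_name, facts) records into one topic per entity.

    Assumes reconciliation already happened: every source record for the same
    entity carries the same (hypothetical) topic_id.
    """
    topics = defaultdict(dict)
    for topic_id, type_name, facts in records:
        # Each source contributes one facet (type) of the single shared topic.
        topics[topic_id].setdefault(type_name, {}).update(facts)
    return dict(topics)


records = [
    ("arnold", "actor", {"film": "The Terminator"}),        # from a movie database
    ("arnold", "governor", {"state": "California"}),        # from a political database
    ("arnold", "bodybuilder", {"title": "Mr. Universe"}),   # from a bodybuilding database
]
topics = merge_into_topics(records)
print(len(topics), sorted(topics["arnold"]))  # prints 1 ['actor', 'bodybuilder', 'governor']
```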

For books they have a work-like idea, a bit FRBR-like.
"Book" represents the abstract notion of a particular book, rather than a particular edition. It is on this level that articles or discussion about a book should generally occur (e.g., the article about Mary Shelley's "Frankenstein" is on the book topic, rather than on one or more of the hundreds of editions it has gone through). The book topic should also be used for connections to other types, such as films that have been adapted from a book.
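The work/edition split reads naturally as two linked record types. A minimal sketch, assuming hypothetical field names rather than Freebase's actual schema properties: the article and the film adaptation hang off the abstract `Book`, while publisher and ISBN details live on each `Edition`. The publisher strings are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Edition:
    # One particular published edition (hypothetical fields).
    publisher: str
    year: int
    isbn: Optional[str] = None


@dataclass
class Book:
    # The abstract book: discussion and links such as film adaptations
    # attach here, not to any single edition.
    title: str
    author: str
    article: str = ""
    editions: List[Edition] = field(default_factory=list)
    adapted_as: List[str] = field(default_factory=list)


frankenstein = Book("Frankenstein", "Mary Shelley",
                    article="Gothic novel first published in 1818.")
frankenstein.editions.append(Edition("First-edition publisher (placeholder)", 1818))
frankenstein.editions.append(Edition("Revised-edition publisher (placeholder)", 1831))
frankenstein.adapted_as.append("Frankenstein (1931 film)")
```

Adding a new edition never touches the article or the adaptation link, which is the point of keeping them on the book level.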

Addition to the MARC Code Lists for Relators, Sources, Description Conventions

The code listed below has been recently approved for use in MARC 21 records. The code will be added to the online MARC Code Lists for Relators, Sources, Description Conventions.

This code should not be used in exchange records until after March 18, 2008. This 60-day waiting period is required to provide MARC 21 implementers time to include newly defined codes in any validation tables they may apply to the MARC fields where the codes are used.

Term, Name, Title Sources

The following code is for use in subfield $2 in fields 600-657 (Subject Added Entries) in Bibliographic and Community Information records, field 662 (Subject Added Entry) in Bibliographic records, fields 700-788 (Heading Linking Entries) in Authority records and in subfield $f in field 040 (Cataloging Source) in Authority records.

Queens Library Spanish language subject headings (Queens, NY: Queens Library) [use only after March 18, 2008]
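For implementers updating validation tables, the rules in the notice (an effective date plus a list of allowed field/subfield placements) fit in a few lines. A minimal sketch under stated assumptions: `SOURCE_CODES`, `placement_ok`, and `code_allowed` are hypothetical names, and `"example-code"` is a placeholder, not the actual MARC code. The placement ranges follow the notice's wording literally.

```python
from datetime import date

# Hypothetical validation table: source code -> first date it may appear in
# exchange records. "example-code" is a placeholder, not the real MARC code.
SOURCE_CODES = {"example-code": date(2008, 3, 19)}  # "use only after March 18, 2008"


def placement_ok(record_type: str, tag: str, subfield: str) -> bool:
    """Where a Term, Name, Title source code may appear, per the notice."""
    t = int(tag)
    if subfield == "2":
        if record_type in ("bibliographic", "community") and 600 <= t <= 657:
            return True
        if record_type == "bibliographic" and t == 662:
            return True
        if record_type == "authority" and 700 <= t <= 788:
            return True
    # Subfield $f of field 040 (Cataloging Source) in authority records.
    return subfield == "f" and record_type == "authority" and t == 40


def code_allowed(code: str, record_type: str, tag: str, subfield: str, on: date) -> bool:
    effective = SOURCE_CODES.get(code)
    return effective is not None and on >= effective and placement_ok(record_type, tag, subfield)


print(code_allowed("example-code", "bibliographic", "650", "2", date(2008, 3, 18)))  # prints False
```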