Friday, December 02, 2005

Identifiers and Subject Access

A while back I posted a criticism of David Weinberger's piece in the Boston Globe. He was kind enough to respond. Since many folks might miss the comments, I'm reposting them here.

Here's what I was trying to say, in a highly-compressed article.

Of course subject headings let us classify objects in more than one way. But the number of subject headings under which an object can fall is limited by the physical constraints of card catalogs and books. Further, the physical world requires us to shelve books in one spot and not another. (Multiple copies can be shelved in multiple spots, but that gets messy fast.) So, if we want a collection through which users can roam, we have to make a decision about the primary subject area within which the book will be physically shelved, and then a limited number of other subheadings under which it can be classified (with some number of see-also's). The limit (ten for the LoC, for example) is based not on the number of subject headings that might be relevant but on the awkwardness of physical material.

Digitizing the content as well as the metadata not only removes the limitation, it also allows for richer ways of identifying books one might want to read. Subjects, author and title are obvious ways we want to find books, but there are many more relationships that are useful for locating books we know or don't yet know we want to read. Cf. Amazon for a commercially-inspired -- and plain old inspired -- example.

But, to enable these richer ways of finding books, we need identifiers. IMO (and it's an uncertain opinion), semantics-free global unique IDs are the best choice. The minimal semantics and prevalence of ISBNs make them a good candidate, although there are some obvious problems with them (e.g., they only started in the 1960s). In any case, there's no reason to stick with a single set of GUIDs because computers are good at coordinating multiple sets of related data. So bring on the multiple ID schemes! (I hope Google Print publishes whatever IDs it's using internally.)

That's what my piece in the Globe intended to say. If it led readers to a different understanding, then I wrote it badly.

Libraries provide many more access points than authors, titles and subjects. Format, genre, geographic codes, publisher numbers, time codes, keywords, and dates of publication or content all spring readily to mind. The bibliographic record in a library catalog is a very rich source of metadata. How easy it is to access that richness is another story. Collocation by many different facets is possible with the current metadata. Users can roam through the search results as easily as through digital collections.

Due to concerns about patron privacy we have not implemented recommendation systems. I think we could do so and still protect an individual's personal data. I think we will move in that direction in the next few years.

Identifiers are a problem. There will, as you suggest, have to be many. There already are. Many records in a library catalog will contain an ISBN, EAN and UPC. Many other standard identifiers can be included in a bibliographic record.
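
To make the point about coexisting identifiers concrete, here is a minimal sketch in Python of how one record might be reachable under more than one scheme. The IdentifierIndex class, the scheme labels and the record key "record-1" are made up for illustration and do not describe any real catalog system. The one fixed fact it relies on is that the 13-digit "Bookland" EAN for a book is the prefix 978 plus the first nine digits of the ISBN-10, followed by a recomputed check digit.

```python
def isbn10_to_ean13(isbn10: str) -> str:
    """Convert an ISBN-10 to its 13-digit Bookland EAN (prefix 978)."""
    digits = isbn10.replace("-", "").replace(" ", "")
    if len(digits) != 10:
        raise ValueError("expected 10 characters after removing hyphens")
    body = "978" + digits[:9]  # the old ISBN-10 check character is dropped
    # EAN-13 check digit: weights alternate 1, 3, 1, 3, ... from the left.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(body))
    check = (10 - total % 10) % 10
    return body + str(check)


def normalize(value: str) -> str:
    """Strip hyphens and spaces so differently punctuated forms compare equal."""
    return value.replace("-", "").replace(" ", "").upper()


class IdentifierIndex:
    """Hypothetical lookup table: (scheme, identifier) -> record key."""

    def __init__(self):
        self._by_id = {}

    def add(self, record_key, identifiers):
        for scheme, value in identifiers:
            self._by_id[(scheme.lower(), normalize(value))] = record_key

    def lookup(self, scheme, value):
        return self._by_id.get((scheme.lower(), normalize(value)))


index = IdentifierIndex()
sample_isbn = "0-306-40615-2"  # sample ISBN; "record-1" is an invented key
index.add("record-1", [("isbn", sample_isbn),
                       ("ean", isbn10_to_ean13(sample_isbn))])

print(isbn10_to_ean13(sample_isbn))              # 9780306406157
print(index.lookup("EAN", "978-0-306-40615-7"))  # record-1
```

Either identifier leads to the same record, which is all "many identifiers" needs to mean at this level. The harder question is what that record stands for.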

A greater problem is what the identifiers identify. If I'm looking for Hamlet, do I want a particular format or edition? Would a book on CD do, or a large-print edition, or a film, or do I require the Everyman's edition with a particular introduction? ISBNs are acceptable for identifying a particular manifestation. Searching for an expression or for all manifestations of a work is a problem. OCLC has the xISBN service, which collects all the other ISBNs for a work and allows searching by all of them. That helps somewhat, but it is not a good long-term solution. Librarians are working on an identifier for works. Parts of a work will also need identifiers; maybe standard citations would work. The OpenURL is a possible solution since it uses citation data. The Functional Requirements for Bibliographic Records (FRBR) will be useful in pulling together all the different manifestations of a work and differentiating among them.
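
A rough sketch of the work / expression / manifestation distinction may help here. It is only an illustration under stated assumptions: the class names follow the FRBR entity names (the item level is left out), the identifiers such as "ISBN-A1" are placeholders rather than real ISBNs, and related_isbns is a hypothetical helper that imitates the idea of gathering every ISBN for a work. It is not the xISBN interface.

```python
from dataclasses import dataclass, field


@dataclass
class Manifestation:
    isbn: str   # placeholder identifier, not a real ISBN
    form: str   # e.g. "print", "large print", "audio CD"


@dataclass
class Expression:
    description: str
    manifestations: list = field(default_factory=list)


@dataclass
class Work:
    title: str
    expressions: list = field(default_factory=list)


def related_isbns(works, isbn):
    """Return the other ISBNs attached to the same work as `isbn`."""
    for work in works:
        isbns = [m.isbn for e in work.expressions for m in e.manifestations]
        if isbn in isbns:
            return sorted(i for i in isbns if i != isbn)
    return []  # unknown ISBN: nothing to relate it to


hamlet = Work("Hamlet", [
    Expression("English text with a particular introduction", [
        Manifestation("ISBN-A1", "print"),
        Manifestation("ISBN-A2", "large print"),
    ]),
    Expression("Spoken-word recording", [
        Manifestation("ISBN-B1", "audio CD"),
    ]),
])

print(related_isbns([hamlet], "ISBN-A1"))  # ['ISBN-A2', 'ISBN-B1']
```

An ISBN picks out exactly one leaf of this tree. A reader who just wants Hamlet is asking for the root, and a reader who wants a particular text in any format is asking for the middle level, which is why an identifier for the work itself would be useful.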

Folksonomies, trackbacks, and readers' comments will all enrich access to materials in the library (either physical or digital) in the not-too-distant future. RSS allows distribution of new-item lists and other information from libraries. This is already being done and will become more widespread.
