Friday, June 29, 2012

Jessamyn West Podcasts

Jessamyn West, always articulate and worth a listen, has been on a couple of podcasts recently. I've been meaning to post this, but she made it easy for me.
I was interviewed by Steve Thomas for his Circulating Ideas podcast a few weeks ago and interviewed by Kayhan B., Erin Anderson and Doug Mirams for their Bibliotech podcast a week earlier. I don’t listen to many professional-type podcasts but both of these conversations were a really good chance to talk over some of the issues facing the profession today in addition to just me going “bla bla…” about myself. Both shows have had a host of other guests and I’ve been digging around in the archives finding other stuff to listen to. If you’re podcast-oriented, these are two shows to put in regular rotation.
I do listen to podcasts on my drive to work, and both of these are on my podcatcher.

Monday, June 25, 2012

Code4Lib Journal Articles

The latest issue of Code4Lib Journal has a couple of interesting articles.
Using Semantic Web Technologies to Collaboratively Collect and Share User-Generated Content in Order to Enrich the Presentation of Bibliographic Records–Development of a Prototype Based on RDF, D2RQ, Jena, SPARQL and WorldCat’s FRBRization Web Service
Ragnhild Holgersen, Michael Preminger, David Massey

In this article we present a prototype of a semantic web-based framework for collecting and sharing user-generated content (reviews, ratings, tags, etc.) across different libraries in order to enrich the presentation of bibliographic records. The user-generated data is remodeled into RDF, utilizing established linked data ontologies. This is done in a semi-automatic manner using the Jena and D2RQ toolkits. For the remodeling, a SPARQL CONSTRUCT query is tailored for each data source.

In the data source used in our prototype, user-generated content is linked to the relevant books via their ISBN. By remodeling the data according to the FRBR model, and expanding the RDF graph with data returned by WorldCat’s FRBRization web service, we are able to greatly increase the number of entry points to each book. We make the social content available through a RESTful web service with ISBN as a parameter. The web service returns a graph of all user-generated data registered to any edition of the book in question in the RDF/XML format. Libraries using our framework would thus be able to present relevant social content in association with bibliographic records, even if they hold a different version of a book than the one that was originally accessed by users. Finally, we connect our RDF graph to the linked open data cloud through the use of Talis’ SPARQL endpoint.
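The service described above comes down to two steps: remodel each source into triples, then answer an ISBN query with everything attached to any edition of the same work. Here is a rough Python sketch of that idea using plain tuples instead of Jena, D2RQ, or a real triple store. The placeholder ISBNs, the `ISBN_TO_WORK` table (a hypothetical stand-in for WorldCat's FRBRization response), and the use of RDF Review Vocabulary terms are all assumptions for illustration, not the authors' code.

```python
# Toy sketch: remodel flat user-generated rows into triples, then serve every
# triple attached to any edition of the same work as the requested ISBN.
# ISBNs, the work mapping, and the vocabulary choice are illustrative assumptions.

REV = "http://purl.org/stuff/rev#"   # assumed review vocabulary namespace
DCT = "http://purl.org/dc/terms/"    # Dublin Core terms

# Hypothetical rows from one library's system: (isbn, user, rating, tag).
# The ISBN strings are placeholders, not real books.
ROWS = [
    ("9780000000001", "alice", 5, "classic"),
    ("9780000000002", "bob", 4, "novel"),
    ("9780000000003", "carol", 3, "translation"),
]

# Hypothetical stand-in for FRBRization output: ISBN -> work identifier.
ISBN_TO_WORK = {
    "9780000000001": "work:1",
    "9780000000002": "work:1",
    "9780000000003": "work:1",
    "9780000000004": "work:2",
}


def remodel(rows):
    """Turn each row into (subject, predicate, object) triples.

    Plays the role a per-source SPARQL CONSTRUCT query plays in the article:
    mapping one source's schema onto shared vocabulary terms.
    """
    triples = []
    for i, (isbn, user, rating, tag) in enumerate(rows):
        book = f"urn:isbn:{isbn}"
        review = f"_:review{i}"  # blank-node-style label for the review
        triples += [
            (book, REV + "hasReview", review),
            (review, REV + "reviewer", user),
            (review, REV + "rating", rating),
            (book, DCT + "subject", tag),
        ]
    return triples


def social_content_for_isbn(isbn, triples, isbn_to_work=ISBN_TO_WORK):
    """Return every triple about any edition sharing a work with `isbn`."""
    work = isbn_to_work.get(isbn)
    if work is None:
        editions = {isbn}  # unknown ISBN: fall back to that edition alone
    else:
        editions = {i for i, w in isbn_to_work.items() if w == work}
    books = {f"urn:isbn:{i}" for i in editions}
    # Review nodes hanging off those editions, so their details come along too.
    reviews = {o for s, p, o in triples if s in books and p == REV + "hasReview"}
    return [t for t in triples if t[0] in books or t[0] in reviews]


if __name__ == "__main__":
    graph = remodel(ROWS)
    # Asking about edition ...0003 also returns alice's and bob's content,
    # because all three ISBNs map to the same work.
    print(len(social_content_for_isbn("9780000000003", graph)))  # 12
```

The point of routing through a work identifier is that a library holding edition ...0003 still sees reviews written against ...0001 and ...0002, which is the "more entry points" effect the abstract describes.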

GLIMIR: Manifestation and Content Clustering within WorldCat
Janifer Gatenby, Richard O. Greene, W. Michael Oskins, Gail Thornburg

The GLIMIR project at OCLC clusters and assigns an identifier to WorldCat records representing the same manifestation. These include parallel records in different languages (e.g., a record with English descriptive notes and subject headings and one for the same book with French equivalents). It also clusters records that probably represent the same manifestation, but which could not be safely merged by OCLC’s Duplicate Detection and Resolution (DDR) program for various reasons. As the project progressed, it became clear that it would also be useful to create content-based clusters for groups of manifestations that are generally equivalent from the end-user perspective (e.g., the original print text with its microform, ebook and reprint versions, but not new editions). Lessons from the GLIMIR project have improved OCLC’s duplicate detection program through the introduction of new matching techniques. GLIMIR has also had unexpected benefits for OCLC’s FRBR algorithm by providing new methods for identifying outliers, thus enabling more records to be included in the correct work cluster.
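To picture the two cluster levels, here is a toy Python example. It is not OCLC's matching logic: the `Record` fields, the normalization, and both match keys are hypothetical simplifications. A manifestation key groups parallel records (the same book cataloged with notes in different languages). A coarser content key drops carrier and date so print, microform, and ebook versions fall together, but it keeps the edition statement so a new edition stays separate.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Hypothetical, heavily simplified bibliographic record.
    ocn: int              # record number
    title: str
    author: str
    edition: str          # edition statement; "" when none
    year: str
    carrier: str          # "print", "microform", "ebook", ...
    cataloging_lang: str  # language of the descriptive notes


def norm(s):
    """Lowercase and collapse punctuation so trivial differences don't block a match."""
    return re.sub(r"[^a-z0-9]+", " ", s.lower()).strip()


def manifestation_key(r):
    # Cataloging language is deliberately left out, so parallel
    # English/French records of the same book share a key.
    return (norm(r.title), norm(r.author), norm(r.edition), r.year, r.carrier)


def content_key(r):
    # Carrier and year are dropped so reproductions group with the original;
    # the edition statement is kept so new editions stay apart.
    return (norm(r.title), norm(r.author), norm(r.edition))


def cluster(records, key):
    """Group record numbers by key; return sorted lists for stable output."""
    groups = {}
    for r in records:
        groups.setdefault(key(r), []).append(r.ocn)
    return sorted(sorted(g) for g in groups.values())


RECORDS = [
    Record(1, "The Example Book", "Doe, Jane", "", "1990", "print", "eng"),
    Record(2, "The example book.", "Doe, Jane.", "", "1990", "print", "fre"),
    Record(3, "The Example Book", "Doe, Jane", "", "1990", "microform", "eng"),
    Record(4, "The Example Book", "Doe, Jane", "", "2005", "ebook", "eng"),
    Record(5, "The Example Book", "Doe, Jane", "2nd ed.", "2001", "print", "eng"),
]

if __name__ == "__main__":
    print(cluster(RECORDS, manifestation_key))  # [[1, 2], [3], [4], [5]]
    print(cluster(RECORDS, content_key))        # [[1, 2, 3, 4], [5]]
```

Exact-match keys like these are brittle (a single typo in a title splits a cluster), so treat this only as a picture of the manifestation-versus-content distinction, not of how GLIMIR actually decides a match.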