Tuesday, August 18, 2009

Dewey Classification as Linked Data

News from the Dewey office.
For a long time, we have wanted to do something with Linked Data. That is, apply Linked Data principles to parts of the Dewey Decimal Classification and present the data as a small “terminology service.” The service should respond to regular HTTP requests with either a machine- or a human-readable presentation of Dewey classes. There should be a URI (and, even better, a web page that delivers a useful description) for every Dewey concept, not just single classes. The data should be presented in a format that is capable of handling rich semantic information and in a way that allows users or user agents to just follow their nose to explore the data. For more complex needs, the service should offer API-like query access. Finally, the data that are presented should be reusable by anyone for non-commercial purposes.
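To make the "machine- or human-readable" requirement concrete, here is a minimal sketch of HTTP content negotiation in Python. It is only an illustration: the list of supported media types and the function names are assumptions made for this example, not a description of how dewey.info is actually implemented.

```python
# Illustrative sketch only: choose a representation of a Dewey class from the
# HTTP Accept header. The supported media types are assumptions for this example.

MACHINE_TYPES = ("application/rdf+xml", "text/turtle")
HUMAN_TYPE = "text/html"


def parse_accept(header):
    """Return the media types in an Accept header, highest q-value first."""
    entries = []
    for position, part in enumerate(header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0
        if q > 0:
            # Sort by descending q, then by original position in the header.
            entries.append((-q, position, media_type))
    return [media_type for _, _, media_type in sorted(entries)]


def choose_representation(accept_header):
    """Pick the media type to serve; fall back to HTML for wildcards and browsers."""
    for media_type in parse_accept(accept_header):
        if media_type in MACHINE_TYPES or media_type == HUMAN_TYPE:
            return media_type
    return HUMAN_TYPE


if __name__ == "__main__":
    print(choose_representation("application/rdf+xml"))
    print(choose_representation("text/html,application/xhtml+xml,*/*;q=0.8"))
```

With this kind of logic, a browser and an RDF-aware client can request the same URI and each receive a presentation it can use.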

Some results of these efforts are now available as dewey.info. First, we had to come up with a URI pattern for the DDC that would generate persistent identifiers for DDC concepts in a distributed environment. Second, we wanted to test out the RDF vocabulary SKOS for creating a representational model to express some of the best nuggets of DDC data (language-independent identifiers, multilingual terminology, and semantic relationships). And finally, because Linked Open Data is not really open when you have to ask someone before you can use it, we wanted to test out a Creative Commons license for easier reuse of DDC data for non-commercial purposes.
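As a rough illustration of what such a SKOS model can look like, the Python sketch below writes one class as Turtle with a notation, labels in two languages, and a link to its broader class. The base URI is a placeholder (the actual dewey.info URI pattern is not spelled out in this post), the helper names are made up for the example, and the captions are sample values rather than authoritative DDC data.

```python
# Illustrative sketch only. BASE is a hypothetical placeholder, not the real
# dewey.info URI pattern; the captions below are sample values.
BASE = "http://example.org/ddc/class/"
SKOS = "http://www.w3.org/2004/02/skos/core#"


def escape(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def concept_to_turtle(notation, labels, broader=None):
    """Serialize one class: notation, per-language labels, optional broader link."""
    lines = [
        f"<{BASE}{notation}> a skos:Concept ;",
        f'    skos:notation "{escape(notation)}" ;',
    ]
    for lang, caption in sorted(labels.items()):
        lines.append(f'    skos:prefLabel "{escape(caption)}"@{lang} ;')
    if broader is not None:
        lines.append(f"    skos:broader <{BASE}{broader}> ;")
    # Replace the trailing " ;" of the last statement with the closing " ."
    lines[-1] = lines[-1][:-2] + " ."
    return "\n".join(lines)


def to_document(concepts):
    """Join several concepts into one Turtle document with the SKOS prefix."""
    header = f"@prefix skos: <{SKOS}> .\n\n"
    return header + "\n\n".join(concept_to_turtle(*c) for c in concepts)


doc = to_document([
    ("510", {"en": "Mathematics", "de": "Mathematik"}, "500"),
])

if __name__ == "__main__":
    print(doc)
```

The identifier stays the same regardless of language, while the labels carry the multilingual terminology and skos:broader carries the hierarchy.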

We chose the DDC Summaries as a first data set to publish according to Linked Data principles. The latest version of the Summaries, i.e., the top three levels of DDC 22, has been available as a web document for some time. To broaden the possible applications of what now essentially is just tag soup (in only one language), every class had to be identified with a URI and the data had to be presented in a reusable way. Please give it a try at dewey.info. An extended overview of the service can be found here; a slightly more technical description is available on the OCLC Developer Network wiki.

What you see now is only the first step. The intention of dewey.info is to be a platform for Dewey data on the web; more is to come in terms of languages, deeper data, and links to other datasets. The DDC has been widely used as a knowledge organization tool, and the way the URIs have been set up should allow the construction of links based on existing metadata in resource descriptions like bibliographic records.

An example: As the World Digital Library is getting ready to deploy RDF views of its data, instead of just including the Dewey number as a literal (or pointing to some other data source), they could include URIs to dewey.info to tap into the SKOS relationships pointing to broader and narrower classes for retrieval interactions, etc. In turn, this establishes a link from a Dewey class to other vocabularies used concurrently for that resource, to Wikipedia, etc. Links can go both ways and benefit all participating data sets, establishing a graph of Linked Data.
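The idea of building links from Dewey numbers that already sit in existing metadata can be sketched as follows. This Python example takes a Dewey number literal from a record and derives URIs for its section, division, and main class, i.e., the three levels covered by the Summaries. The base URI is again a hypothetical placeholder, the function names are invented for the example, and deriving broader classes by truncating the notation is a simplifying assumption; the SKOS relationships published by the service are the authoritative hierarchy.

```python
import re

# Illustrative sketch only. BASE is a hypothetical placeholder, not the real
# dewey.info URI pattern.
BASE = "http://example.org/ddc/class/"


def summary_chain(dewey_number):
    """Return section, division, and main class notations, most specific first.

    Assumes the broader classes can be read off the notation itself, which is
    a simplification of the real DDC hierarchy.
    """
    match = re.match(r"^\s*(\d{3})(?:\.\d+)?\s*$", dewey_number)
    if match is None:
        raise ValueError(f"not a recognizable Dewey number: {dewey_number!r}")
    section = match.group(1)
    chain = [section, section[:2] + "0", section[0] + "00"]
    unique = []
    for notation in chain:
        # A division such as 640 is its own first section; keep it only once.
        if notation not in unique:
            unique.append(notation)
    return unique


def class_uris(dewey_number):
    """Turn a Dewey number literal into URIs for its Summaries-level classes."""
    return [BASE + notation for notation in summary_chain(dewey_number)]


if __name__ == "__main__":
    for uri in class_uris("641.5"):
        print(uri)
```

A record that so far carried only the literal "641.5" could then point to a class URI instead, and a client could follow that link to reach broader and narrower classes.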
