Monday, July 07, 2008

Universal Decimal Classification

Maintenance of the Universal Decimal Classification: overview of the past and preparations for the future by Aida Slavic, Maria Ines Cordeiro, and Gerhard Riesthuis appears in International Cataloguing and Bibliographic Control 37(2): 23-29.
The paper highlights some aspects of the UDC management policy for 2007 and onwards. Following an overview of the long history of modernization of the classification, which started in the 1960s and has influenced the scheme's revision and development since 1990, major changes and policies from the recent history of the UDC revision are summarized. The perspective of the new editorial team, established in 2007, is presented. The new policy focuses on the improved organization and efficiency of editorial work and the improvement of UDC products.

Better Targeted Ads

Computing Semantic Similarity Using Ontologies by Rajesh Thiagarajan, Geetha Manjunath, and Markus Stumptner is a new HP Lab Report.
Determining the semantic similarity of two sets of words that describe two entities is an important problem in web mining (search and recommendation systems), targeted advertisement, and domains that need semantic content matching. Traditional information retrieval approaches, even when extended to include semantics by performing the similarity comparison on concepts instead of words/terms, may not always determine the right matches when there is no direct overlap in the exact concepts that represent the semantics. Because the entity descriptions are treated as self-contained units, relationships that are not explicit in the descriptions are usually ignored. We extend this notion of semantic similarity to consider inherent relationships between concepts using ontologies. We propose simple metrics for computing semantic similarity using spreading activation networks with multiple mechanisms for activation (set-based spreading and graph-based spreading) and concept matching (using bipartite graphs). We evaluate these metrics in the context of matching two user profiles to determine overlapping interests between users. Our similarity computation results show an improvement in accuracy over other approaches when compared with human-computed similarity. Although the techniques presented here are used to compute similarity between two user profiles, they are applicable to any content matching scenario.
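
A back-of-the-envelope sketch of why spreading helps: two profiles with no direct concept overlap can still match once related concepts are activated. The tiny ontology and concept sets below are invented for illustration, and real spreading activation would weight activations by distance rather than treat all hops equally.

```python
# Toy set-based spreading activation over a hand-made ontology.
ONTOLOGY = {  # concept -> directly related concepts
    "jazz": {"music"},
    "blues": {"music"},
    "music": {"art"},
    "painting": {"art"},
}

def spread(concepts, hops=1):
    """Expand a concept set by following ontology links `hops` times."""
    active = set(concepts)
    for _ in range(hops):
        active |= {n for c in active for n in ONTOLOGY.get(c, ())}
    return active

def similarity(a, b, hops=1):
    """Jaccard similarity computed after spreading both sets."""
    sa, sb = spread(a, hops), spread(b, hops)
    return len(sa & sb) / len(sa | sb)

# Plain set overlap of {"jazz"} and {"blues"} is zero, but after one
# hop both activate "music", so the similarity becomes 1/3.
print(similarity({"jazz"}, {"blues"}))
```

The same mechanism also picks up longer chains: with `hops=2`, "jazz" and "painting" overlap through "art".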

Thursday, July 03, 2008

eXtensible Catalog & Koha

News from LibLime about Koha and the eXtensible Catalog.
LibLime, the leader in open-source solutions for libraries, and the eXtensible Catalog (XC) project -- an Andrew W. Mellon Foundation-funded project currently underway at the University of Rochester's River Campus Libraries -- have announced a new partnership agreement to ensure future compatibility between the XC project and Koha, the first open-source integrated library system.

The XC/LibLime partnership will ensure that the open-source software being developed as part of the XC project and the Koha open-source integrated library system will be fully compatible with each other, enabling current and future users of Koha to take advantage of the added capabilities for managing and distributing metadata that XC will offer. These benefits include facilitating the ability to combine legacy metadata with emerging schemas, and delivering library content to web content management and learning management systems.

Wednesday, July 02, 2008

Changes to MARC Code List for Languages

As a result of a formal request from the National Libraries of Serbia and Croatia and those countries' national standards bodies to the ISO 639 Joint Advisory Committee, the MARC language codes for Serbian and Croatian will be changed as below from the ISO 639-2 bibliographic codes (ISO 639-2/B) to the ISO 639-2 terminology codes (ISO 639-2/T). This change also supports established usage in bibliographic databases in Croatia. Because the codes are obsolete, rather than deleted, they may still appear in bibliographic records created before the implementation of this change.


New Code    Language Name    Previously Coded
srp         Serbian          scc
hrv         Croatian         scr
Subscribers can anticipate receiving MARC records reflecting these changes in all distribution services not earlier than September 1, 2008.
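
For anyone updating local records, the recode is a simple fixed-field substitution: the language code lives in positions 35-37 of the 008. A minimal sketch (the sample 008 string is invented, and a real cleanup would also need to check variable fields such as 041):

```python
# Map the obsolete ISO 639-2/B codes to the ISO 639-2/T codes.
RECODE = {"scc": "srp", "scr": "hrv"}

def recode_lang(field_008: str) -> str:
    """Replace an obsolete language code at 008 positions 35-37."""
    lang = field_008[35:38]
    return field_008[:35] + RECODE.get(lang, lang) + field_008[38:]

# A made-up 40-character 008; only positions 35-37 (language) matter here.
f008 = "080702s2008    xx " + " " * 17 + "scc" + " d"
assert len(f008) == 40
print(recode_lang(f008)[35:38])  # srp
```

Codes not in the map pass through unchanged, which matches the "obsolete, rather than deleted" behavior: old records remain readable.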

Martha Yee Articles

Some more articles by Martha Yee are now available.

Integration of Nonbook Materials in AACR2. Cataloging & Classification Quarterly 1983; 3:1-18.

Attempts to Deal With the Crisis in Cataloging at the Library of Congress in the 1940's. Library Quarterly 1987 Jan; 57:1-31.

What is a Work? In: The Principles and Future of AACR: Proceedings of the International Conference on the Principles and Future Development of AACR, Toronto, Ontario, Canada, October 23-25, 1997. Ed., Jean Weihs. Ottawa: Canadian Library Association; Chicago: American Library Association, 1998: 62-104.

Editions: Brainstorming for AACR2000. In: The Future of the Descriptive Cataloging Rules: Papers from the ALCTS Preconference, AACR2000, American Library Association Annual Conference, Chicago, June 22, 1995. Ed., Brian E.C. Schottlaender. (ALCTS Papers on Library Technical Services and Collections, no. 6) Chicago: American Library Association, 1998: 40-65.

Viewpoints: One Catalog or No Catalog? ALCTS Newsletter 1999; 10:4:13-17.

Lubetzky's Work Principle. In: The Future of Cataloging: Insights from the Lubetzky Symposium, April 18, 1998, University of California, Los Angeles. Ed., Tschera Harkness Connell, Robert L. Maxwell. Chicago: American Library Association, 2000.

Tuesday, July 01, 2008

RDA News

News from RDA.
The Co-Publishers of RDA Online (the American Library Association, the Canadian Library Association, and the Chartered Institute of Library and Information Professionals) have reached the conclusion that further time is required to complete the development of the new software that will be used for distributing the full draft of RDA for constituency review.

The full draft was originally scheduled for release on August 4, 2008. Instead, it will now be issued in October 2008. The three-month period allocated for comments on the full draft is unchanged and in the new schedule will extend from October into January 2009. More specific dates for RDA's final release will be forthcoming shortly.

Members of the Committee of Principals (CoP) and the Joint Steering Committee for Development of RDA (JSC) agree that the importance of distributing RDA content in a well-developed and tested version of the new software is such that a two-month delay is justified. They concluded that this extension is worthwhile given the ultimate value of the exceptional effort that is going into RDA and feel that the review by constituencies will be enhanced as a result.

OCLC Terminology Services

Terminology Services, an experimental service for controlled vocabularies and a project of OCLC Research, is now available.

Highlights

  • Search descriptions of controlled vocabularies
  • Search for concepts/headings in a controlled vocabulary
  • Retrieve a single concept/heading by its identifier
  • View relationships for a concept/heading including equivalence, hierarchical, and associative
  • Retrieve concepts/headings in multiple representations including HTML, MARC XML, SKOS, and Zthes.
  • Search using SRU CQL syntax
Vocabulary Resources include:
  • FAST subject headings
  • GSAFD Form and genre terms
  • Library of Congress AC Subject Headings
  • Library of Congress Subject Headings
  • Medical Subject Headings
  • Thesaurus for graphic materials: TGM I
  • Thesaurus for graphic materials: TGM II
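
An SRU request is just a URL with a CQL query in its parameters, so the service is easy to script against. A hedged sketch of building such a request (the base URL and the `oclcts.preferredTerm` index name here are placeholders for illustration, not the documented endpoint):

```python
from urllib.parse import urlencode

# Hypothetical SRU base URL for an LCSH vocabulary database.
base = "http://example.org/terminology/srw/lcsh"

params = {
    "operation": "searchRetrieve",
    "version": "1.1",
    "query": 'oclcts.preferredTerm = "Cataloging"',  # CQL query
    "recordSchema": "SKOS",       # ask for concepts as SKOS
    "maximumRecords": "10",
}
url = base + "?" + urlencode(params)
print(url)
```

Swapping `recordSchema` would request the same concepts as MARC XML or Zthes instead, per the multiple-representations feature listed above.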

New Union Catalog

The Avi Chai Foundation has announced a new tool for Judaica librarians — the Avi Chai Bookshelf Union Catalog. The union catalog contains the MARC bibliographic holdings of 31 Jewish high school libraries in the United States and Canada that have been recipients of Avi Chai's Bookshelf grant. The Avi Chai Union Catalog runs on the OPALS (open source) library automation system.

Monday, June 30, 2008

Discovery at Safari Books

Jeff Patterson, CEO of Safari Books Online LLC, spoke at the O'Reilly Tools of Change Conference on Valuing Content in a Web-enabled World.
To effectively market their wares, publishers need to understand how their content is valued by the audience. With the web turning traditional distribution models on their head, easy searchability and access to a variety of free and paid resources must be considered. Jeff Patterson shares research on the information seeking habits of his client base of IT professionals. As users weigh the worth of information in exchange for their time, money and attention, publishers must grasp not just what is sold, but what is read, used and reused....

Money is one part of the equation, but time, and willingness to share personal details, are also important forms of currency. Patterson's studies posed a number of scenarios which revealed different behaviors depending on the urgency of the information seeking. Subscribers researching a long term question tended to start with paid resources such as online subscriptions or print books. Those with urgent business questions were more likely to use search engines as their first tool. These different behaviors bring home the point that products must be discoverable within a sea of available options. Information consumers will place a value on different resources depending on their context. The burden is now on the publishers to understand how their information is being used.

Friday, June 27, 2008

2008 Midwinter MARBI Meeting Minutes

The 2008 Midwinter MARBI Meeting minutes are now available online.

Cataloging Principles and RDA

Cataloging Principles and RDA by Barbara Tillett is a newly available webcast from LC.
The second in a series on RDA: Resource Description and Access, the next generation cataloging code designed for the digital environment. This presentation deals with the cataloging principles that have influenced the development of RDA; the challenges they present to the international sharing of bibliographic and authority data; and the challenges they present to the developers of RDA.

Wednesday, June 25, 2008

Metadata for Resource Discovery

Metadata to Support Next-Generation Library Resource Discovery: Lessons from the eXtensible Catalog, Phase 1 by Jennifer Bowen has been published in the June 2008 issue of Information Technology and Libraries (pp. 6-19).

The slides for her upcoming talk at ALA as part of the ALCTS Program, Creating the Future of the Catalog and Cataloging (Sunday morning, June 29, 8 AM-12 PM, Anaheim Convention Center, Room 204B) are on the XC Shared Results Page.

The next time nominations roll around for Movers and Shakers, someone should nominate Jennifer. Her work on RDA and the eXtensible Catalog more than qualifies her.

Delay in Publication of 31st Edition of Library of Congress Subject Headings

News from LC.
Delay in publication of 31st edition of Library of Congress Subject Headings

Due to production problems, the 31st edition of the five-volume printed edition of the Library of Congress Subject Headings, commonly referred to as the Red Books, will not be available until the spring of 2009. The data cutoff date for the 31st edition will now be December 31, 2008.

Open Source OPAC

Rapi is yet another open-source OPAC project. Like many such projects, it uses Lucene and Ruby.
Rapi is an open-source project of the WING group in the School of Computing, National University of Singapore, licensed under the MIT license. Rapi provides an OPAC package that allows you to:
  1. Build a Lucene index from your MARC files
  2. Screen scrape live circulation data from your own iii OPAC
  3. Wrap your OPAC with a customizable user interface
The user interface packaged with Rapi has been tested with Firefox 2 and 3 as well as Internet Explorer 7. The user interface supports a variety of features including tabs, an overview+details view, and a suggestion bar among many others. Note that although the user interface supports query suggestions, the package currently does not provide any suggestion modules. With that said, if you do have query suggestion modules, they can be easily integrated with the package. As an example, our live demo incorporates a spelling suggestion module.
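
Step 1 is the part most relevant to catalogers. Rapi uses Lucene for this; as a language-neutral toy illustration of what "build an index from your records" means, here is an in-memory inverted index over a few invented record titles:

```python
# Toy inverted index: map each title word to the record ids containing it.
from collections import defaultdict

records = [
    {"id": "b1", "title": "Introduction to Cataloging"},
    {"id": "b2", "title": "Cataloging Audiovisual Materials"},
    {"id": "b3", "title": "Introduction to Indexing"},
]

index = defaultdict(set)
for rec in records:
    for token in rec["title"].lower().split():
        index[token].add(rec["id"])

def search(term):
    """Return the ids of records whose title contains `term`."""
    return sorted(index.get(term.lower(), set()))

print(search("cataloging"))  # ['b1', 'b2']
```

A real indexer would of course parse MARC fields, normalize punctuation, and handle phrase and fielded queries, which is exactly what Lucene provides.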

Distributed Metadata Control Systems

Distributed Version Control and Library Metadata by Galen M. Charlton.
Distributed version control systems (DVCSs) are effective tools for managing source code and other artifacts produced by software projects with multiple contributors. This article describes DVCSs and compares them with traditional centralized version control systems, then describes extending the DVCS model to improve the exchange of library metadata.
Interesting suggestion; network theory applies here. One node would be useless, and two or three nodes could be interesting depending on the institutions, something like the old Linked Systems Project. More widespread adoption would make it much more useful.
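
To make the idea concrete, here is a hypothetical field-level three-way merge of one record edited at two institutions, in the spirit of a DVCS merge. The fields and values are invented, and this is a sketch of the general technique, not Charlton's actual design:

```python
# Three-way merge of a record's fields against a common ancestor.
def merge(base, ours, theirs):
    """Keep any field changed by exactly one side; flag true conflicts."""
    merged, conflicts = {}, []
    for f in set(base) | set(ours) | set(theirs):
        b, o, t = base.get(f), ours.get(f), theirs.get(f)
        if o == t or t == b:      # both agree, or theirs unchanged
            merged[f] = o
        elif o == b:              # ours unchanged, theirs edited
            merged[f] = t
        else:                     # both edited the same field differently
            conflicts.append(f)
    return merged, conflicts

base   = {"245": "Cataloging /", "260": "Chicago : ALA, 2008."}
ours   = {"245": "Cataloging :", "260": "Chicago : ALA, 2008."}
theirs = {"245": "Cataloging /", "260": "Chicago : ALA, c2008."}

merged, conflicts = merge(base, ours, theirs)
print(merged["245"])  # Cataloging :
print(conflicts)      # []
```

Each institution's non-conflicting edit survives the merge, which is what would make metadata exchange between nodes attractive.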

Approved Books

The Open Library folks are considering adding information about banning to their bibliographic records. Other than MPAA ratings does anyone add approval by some body to their bibliographic records? I can remember seeing Nihil obstat and Imprimi potest on some books growing up. Is this still useful to some patrons for selecting an item?

Cross-concordances

Mayr, Philipp and Petras, Vivien (2008) Cross-concordances: terminology mapping and its effectiveness for information retrieval. World Library and Information Congress: 74th IFLA General Conference and Council, Québec, Canada.
The German Federal Ministry for Education and Research funded a major terminology mapping initiative, which found its conclusion in 2007. The task of this terminology mapping initiative was to organize, create and manage ‘cross-concordances’ between controlled vocabularies (thesauri, classification systems, subject heading lists) centred around the social sciences but quickly extending to other subject areas. 64 crosswalks with more than 500,000 relations were established. In the final phase of the project, a major evaluation effort to test and measure the effectiveness of the vocabulary mappings in an information system environment was conducted. The paper reports on the cross-concordance work and evaluation results.
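
At retrieval time a cross-concordance acts as a lookup table: a query term from one vocabulary is expanded with its mapped terms from another. A minimal sketch, with invented mappings that are not taken from the actual project data:

```python
# A toy cross-concordance: (source vocab, term) -> mapped terms.
CROSSWALK = {
    ("TheSoz", "Erziehung"): [("LCSH", "Education", "exact")],
    ("TheSoz", "Schule"): [("LCSH", "Schools", "exact"),
                           ("LCSH", "Education", "broader")],
}

def expand(vocab, term, relation="exact"):
    """Return the target terms mapped to (vocab, term) by `relation`."""
    mapped = CROSSWALK.get((vocab, term), [])
    return [t for _, t, rel in mapped if rel == relation]

print(expand("TheSoz", "Schule"))  # ['Schools']
```

Whether to also expand with broader or narrower mappings is a precision/recall trade-off, which is presumably what the project's evaluation phase measured.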

Script Codes

One of the issues being considered by MARBI, Discussion Paper No. 2008-DP05, is how to indicate the script used in the bibliographic record. There is strong support for using the ISO 15924 Code List, Codes for the representation of names of scripts or Codes pour la représentation des noms d’écritures.

Thursday, June 19, 2008

FireFox Problems

I got the new, improved Firefox, version 3, yesterday, and now I'm using MS Explorer. FF3 is SLOW. I can't get into Blogger. Several add-ons I liked (TinyURL Creator, Link Evaluator, Persistent URL Bookmarker, and Map+, which opens a map for any address) don't work. I'm going to have to investigate whether it is possible to roll back to the old version. I sure hope so. My advice, FWIW: wait.

This is the portable version of Firefox; maybe the regular version would not be so slow. It still wouldn't have the add-ons, though.

Operator+, an add-on that allows working with microformats, is not working properly. I can't seem to export hCal events to Outlook.

June 24: I've reverted to an older version of FF Portable, and all my tools are working again. At home I plan on moving to FF3; it will not be the portable version, and the add-on tools matter much less there.

Wednesday, June 18, 2008

OCLC Group Services

I've just heard of OCLC Group Services, a way for small libraries to participate in OCLC. Anyone have any experience with a group? Any group willing to have the Lunar and Planetary Institute Library become a member?

Tuesday, June 17, 2008

The Future of Cataloging: A PALINET Symposium

MP3s and slides from The Future of Cataloging: A PALINET Symposium are now available. The talks were:
  • Keynote Address, Karen Calhoun "Traveling Through Transitions in Technical Services: From Surviving to Thriving"
  • Response to Keynote, Panel Discussion / Beth Picknally Camden
  • Functional Requirements for Bibliographic Records (FRBR) and Current Development and Implementation Plans for Resource Description and Access (RDA) / John Attig
  • On the Record, One View of the Future – Library of Congress Report on the Future of Bibliographic Control / Nancy Fallgren
  • Making Special Collections Not So Special? The Implications for Archives and Special Collections of the Report of the Library of Congress Working Group on the Future of Bibliographic Control / Christine Di Bella
  • High Quality Discovery in a Web 2.0 World: Architectures for Next Generation Catalogs / John Mark Ockerbloom
  • Summary & Closing Remarks / Dina Giambi

Monday, June 16, 2008

Tagging

@toread and Cool : Subjective, Affective and Associative Factors in Tagging. In Proceedings Canadian Association for Information Science/L'Association canadienne des sciences de l'information (CAIS/ACSI), Vancouver, British Columbia (Canada).
This paper examines the use of non-subject-related tags in social bookmarking tools. Previous studies of tagging determined that many common tags are not directly subject-related but are in fact affective tags dwelling on a user's emotional response to a document, or time- and task-related tags tied to a user's current projects or activities. These tags have been analysed to examine their role in the tagging process.
While not an academic study, LibraryThing's experience in cleaning up tags for sale to libraries might be an interesting comparison. The study compares Del.icio.us, Connotea, and CiteULike; it would be interesting to see how other tagging sites compare. What is the difference between tagging books, articles, websites, and toasters? Is tagging different in different cultures? Do people in Japan tag differently than those in France? How about folks in economics and astrophysics? There is lots of room for more research here. The next step would be to use the findings to inform our construction of subject headings. The FRBR group working on subjects might have a new body of knowledge to use in their work.

Friday, June 13, 2008

MARBI @ ALA

The remainder of the June 2008 MARC Advisory Group proposals have been posted and linked to the agenda for the meeting.

Chopac.org

Chopac.org has some interesting cataloging tools. There is an Amazon to MARC converter, DDC22 summaries, Amazon review server, and some others. They also have an ILS to download. Runs in the LAMP environment. They seem to have it up and running on their site. It gets additional info from Amazon and Google Books to enrich the records.

Thursday, June 12, 2008

On Descript

When I started this weblog back in 2002, nobody was covering cataloging. There was AUTOCAT, a great place for discussion, but no one place was acting as a news source. Now there are plenty of other places to keep current in cataloging; check Planet Cataloging for a good list of weblogs in this space. Now another voice joins the chorus, On Descript, and we are richer for it.
On Descript is a forum dedicated to all things description in Library and Information Science (LIS). Here, you'll find information on subjects like cataloging, indexing, abstracting and the foundations of description practices in LIS. Please share your ideas!
Not yet covered by Planet Cataloging, so visit his site.

Tuesday, June 10, 2008

Functional Requirements for Bibliographic Records

A German translation of the text of Functional Requirements for Bibliographic Records (FRBR) as amended and Japanese translations of the recently published errata and the amendment to the expression entity have been made available through IFLANET.

Monday, June 09, 2008

DCMI Registry Task Group

From the DCMI page.

DCMI Registry Task Group: call for participation.

A DCMI Registry Task Group has been set up with the primary aims of developing shared functional requirements and inter-registry interoperability issues. This group is currently recruiting participants. Those with an interest in metadata schema registries, terminology registries, ontology registries and metadata vocabulary management are invited to visit the Task Group's Wiki for further information, news, upcoming events and opportunities to contribute.

OLAC-MOUG 2008 Conference

Registration for the OLAC-MOUG 2008 Conference is open.

The joint conference of OLAC (Online Audiovisual Catalogers) and MOUG (Music OCLC Users Group) will take place in Cleveland, Ohio, from Friday, September 26 through Sunday, September 28, 2008. Attendees will enjoy four workshops on cataloging various non-book materials, a keynote speech by Lynne Howarth (former Dean of the Faculty of Information Studies at the University of Toronto), a closing address by Janet Swan Hill (Associate Director for Technical Services, University of Colorado), and a session on RDA, to name just a few highlights.

Preconference: space is limited for the Map Cataloging preconference on Thursday, September 25, given by Paige Andrew.

Please see the conference website for more information and the registration form.

Posted to many distribution lists.

OAI-ORE Resource Maps

Posted to several lists.

The Foresite project is pleased to announce the initial code of two software libraries for constructing, parsing, manipulating and serialising OAI-ORE Resource Maps. These libraries are being written in Java and Python, and can be used generically to provide advanced functionality to OAI-ORE aware applications, and are compliant with the latest release (0.9) of the specification. The software is open source, released under a BSD licence, and is available from a Google Code repository.

You will find that the implementations are not absolutely complete yet, and are lacking good documentation for this early release, but we will be continuing to develop this software throughout the project and hope that it will be of use to the community immediately and beyond the end of the project.

Both libraries support parsing and serialising in: ATOM, RDF/XML, N3, N-Triples, Turtle and RDFa

Foresite is a JISC funded project which aims to produce a demonstrator and test of the OAI-ORE standard by creating Resource Maps of journals and their contents held in JSTOR, and delivering them as ATOM documents via the SWORD interface to DSpace. DSpace will ingest these resource maps, and convert them into repository items which reference content which continues to reside in JSTOR. The Python library is being used to generate the resource maps from JSTOR and the Java library is being used to provide all the ingest, transformation and dissemination support required in DSpace.
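
A Resource Map boils down to a handful of RDF triples: the map describes an aggregation, and the aggregation aggregates its parts. As a toy sketch in N-Triples form (one of the serializations listed above), with invented URIs; `ore:describes` and `ore:aggregates` are terms from the ORE vocabulary:

```python
# Hand-rolled N-Triples for a minimal OAI-ORE Resource Map.
ORE = "http://www.openarchives.org/ore/terms/"
rem = "http://example.org/rem/article1"          # the Resource Map
agg = "http://example.org/aggregation/article1"  # the aggregation
parts = ["http://example.org/article1.pdf",
         "http://example.org/article1/data.csv"]

triples = [(rem, ORE + "describes", agg)]
triples += [(agg, ORE + "aggregates", p) for p in parts]

ntriples = "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)
print(ntriples)
```

The Foresite libraries wrap exactly this kind of graph construction behind an API and add the other serializations (ATOM, RDF/XML, N3, Turtle, RDFa).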

Please feel free to download and play with the source code, and let us have your feedback via the Google group:

foresite@googlegroups.com

Friday, June 06, 2008

More MARBI News

Some more MARBI news.

The following papers are available for review by the MARC community:
  • Proposal No. 2008-04: Changes to Nature of entire work and nature of content codes in field 008 of the MARC 21 bibliographic format
  • Proposal No. 2008-09: Definition of Videorecording format codes in field 007/04 of the MARC 21 Bibliographic format
  • Proposal No. 2008-10: Definition of a subfield for Other standard number in field 534 of the MARC 21 bibliographic format
Additional proposals and discussion papers will be posted shortly.

The draft agenda for the 2008 ALA Annual MARBI meetings is available online.

Please note that there is a strong possibility that MARBI may meet during its Monday afternoon time slot of 1:30-3:30 for continuation of the discussion.

Skype News

Skype now lets you set your mobile number as your caller-id on outgoing calls. Very nice. I'm set up.

ALA Annual MARBI Meeting

Posted to many e-mail distribution lists.

The following papers are available for review by the MARC community:

  • Proposal No. 2008-06: Adding information associated with the Series Added Entry fields (800-830)
  • Proposal No. 2008-07: Making field 440 (Series Statement/Added Entry--Title) obsolete in the MARC 21 Bibliographic Format
  • Proposal No. 2008-08: Definition of subfield $z in field 017 of the MARC 21 Bibliographic and addition of the field to the MARC 21 Holdings formats
  • Discussion Paper 2008-DP06: Coding deposit programs as methods of acquisitions in field 008/07 of the MARC 21 holdings format
Additional proposals and discussion papers will be posted shortly.

The draft agenda for the 2008 ALA Annual MARBI meetings will be made available soon.

Wednesday, June 04, 2008

Yahoo Search Monkey

Another step towards the Semantic Web, Yahoo SearchMonkey.
SearchMonkey is fundamentally about transforming the way search results are compiled and displayed by leveraging the same structured data that powers the millions of pages indexed by Yahoo! Search. By sharing structured data with Yahoo!, site owners and content publishers can build more useful, relevant and visually appealing search results, which can increase the quantity and quality of traffic from Yahoo! Search....

You can share data by embedding microformats, using semantic web standards such as RDF, sharing an XML data feed directly with Yahoo! Search, or using the SearchMonkey developer tool to build custom data services that extract structured data from your pages.

LibriVox

LibriVox is becoming a valuable resource for free audio books. They just reached 1500 titles in the collection.
We’ve had a pretty extraordinary May. We cataloged our 1,500th book, James Baldwin’s children’s history book, Four Great Americans, which was a great accomplishment. (Considering seven months ago we were at 1,000).

But we also had an impressively productive month: we released 115 (!) audiobooks into the public domain, almost four per day. Our previous record for monthly production was 77, reached in July 2007.
Is anyone cataloging these and adding them to their collection? Burning them to CDs and adding those to the collection? A few months back the Nebraska Library Commission made news by adding a few books licensed under Creative Commons to their catalog. Anyone doing the same for the LibriVox materials? Adding the records to OCLC for sharing or making them available via OAI-PMH?

Code4Lib Conference

The video from the Code4Lib Conference is now on Archive.org. Note that you can get the MPEG2 high def format there. Some talks include:
  • MARCThing Casey Durfee discusses MARCThing, a self-contained web service which aims to do for MARC and Z39.50 what Solr did for searching.
  • OpenURL Ross Singer and Jonathan Rochkind describe Ümlaut, an open source OpenURL middleware layer intended to improve the link resolving chain by analyzing incoming citations and intelligently querying resources to better enable access to them.
  • Blacklight Bess Sadler describes Blacklight, a Solr based OPAC replacement being developed by University of Virginia Library.
  • Scriblio Casey Bisson describes Scriblio, the OPAC replacement based on the WordPress authoring system.
  • A Metadata Registry Jon Phipps gives an introduction to the Metadata Registry, an open source vocabulary, metadata schema, and DC application profile manager and registry.
And plenty more.

Tuesday, June 03, 2008

Object Reuse and Exchange (ORE) Specifications

The Open Archives Initiative has announced the public beta release of Object Reuse and Exchange Specifications.
Over the past eighteen months the Open Archives Initiative (OAI), in a project called Object Reuse and Exchange (OAI-ORE), has gathered international experts from the publishing, web, library, and eScience communities to develop standards for the identification and description of aggregations of online information resources. These aggregations, sometimes called compound digital objects, may combine distributed resources with multiple media types including text, images, data, and video. The goal of these standards is to expose the rich content in these aggregations to applications that support authoring, deposit, exchange, visualization, reuse, and preservation. Although a motivating use case for the work is the changing nature of scholarship and scholarly communication, and the need for cyberinfrastructure to support that scholarship, the intent of the effort is to develop standards that generalize across all web-based information, including the increasingly popular social networks of “web 2.0”.

Monday, June 02, 2008

FGDC Digital Cartographic Standard for Geologic Map Symbolization

Found this sitting in the draft folder for quite some time. Here it is at last. The PostScript version of the FGDC Digital Cartographic Standard for Geologic Map Symbolization is now available as a USGS Techniques and Methods publication.

Improving Subject Searching

Improving subject searching in databases through a combination of descriptors and UDC by Mariangels Granados and Anna Nicolau (2008). In Proceedings of BOBCATSSS'08: Providing Access for Everyone, Zadar (Croatia).
Problems with subject access to online catalogues and databases are not new. Studies on the use of OPACs have revealed two apparently endemic problems: on the one hand, the large number of searches with zero hits (failed searches) and on the other, the retrieval of an excessive amount of bibliographic records (information overload).

In this paper we describe a new information retrieval technique based on the combination of descriptor weighting and the use of the Universal Decimal Classification (UDC) call numbers.

The use of classification call numbers in order to search the catalogue has traditionally been very restricted. In most catalogues, call numbers are used only as topographical indicators and are not searchable. The new system described here makes much fuller use of them.

The system is based on the hypothesis that a set of descriptors corresponds to a UDC call number. Through analysis of the frequency distribution of descriptors and call numbers, we create a set of clusters that increase precision and recall. At the same time, these clusters offer alternative search modes, making it possible to systematize the indexing process and increase its consistency. Here we present a case study of the use of the system with the ERIC database.
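
A crude sketch of that hypothesis: count descriptor/call-number co-occurrences across records and treat the strongest associations for a call number as its cluster. The records and UDC numbers below are invented for illustration, not taken from the paper's ERIC data:

```python
# Associate descriptors with UDC call numbers by co-occurrence counts.
from collections import defaultdict

records = [
    {"udc": "37", "descriptors": {"education", "teaching"}},
    {"udc": "37", "descriptors": {"education", "curriculum"}},
    {"udc": "51", "descriptors": {"mathematics", "teaching"}},
]

clusters = defaultdict(lambda: defaultdict(int))
for rec in records:
    for d in rec["descriptors"]:
        clusters[rec["udc"]][d] += 1

# Strongest descriptor for UDC 37 (education in UDC):
top = sorted(clusters["37"].items(), key=lambda kv: -kv[1])
print(top[0])  # ('education', 2)
```

A query on the descriptor "education" could then also retrieve records classed at 37 that lack the descriptor, and vice versa, which is where the precision/recall gain would come from.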

Friday, May 30, 2008

Tag Cleaner

Bring some consistency to your tagging with Delicious Tag Cleaner
What would a "Delicious Tag Cleaner" be? It is a tool for removing unnecessary tags from your del.icio.us account....

If you're like me, you probably have thousands of bookmarks collected over years and years of web surfing and hundreds of tags used to describe them. But the thing is that over these months/years you haven't been able to come up with a consistent taxonomy for your tags.

I have, for example, dozens of different tags for expressing links related to software development: "dev", "devel", "development" etc.

So this tool can suggest tags to be merged, letting you choose them one by one and have the tool merge the chosen tags on your Delicious account.
As you clean up tags, doesn't that remove them from the stream-of-consciousness thing? Aren't they losing their value and becoming subject headings? Poor ones, at that.
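
The merge-suggestion step could be as simple as grouping tags by a normalized stem, as in the "dev"/"devel"/"development" example. A sketch under the assumption of a local tag list rather than the Delicious API (the tags and the crude three-letter stem are invented for illustration):

```python
# Group tags whose normalized stems match and suggest those groups
# as merge candidates.
from collections import defaultdict

tags = ["dev", "devel", "development", "cataloging", "catalogue", "music"]

def stem(tag, n=3):
    """Crude normalization: lowercase and keep the first n characters."""
    return tag.lower()[:n]

groups = defaultdict(list)
for t in tags:
    groups[stem(t)].append(t)

suggestions = [g for g in groups.values() if len(g) > 1]
print(suggestions)  # [['dev', 'devel', 'development'], ['cataloging', 'catalogue']]
```

A real tool would use something smarter than a fixed prefix (edit distance, plural folding), but the shape of the suggestion step is the same.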

Statement of International Cataloguing Principles

A reminder from IFLA about the Statement of International Cataloguing Principles.
This is a reminder announcement that the Statement of International Cataloguing Principles developed by the five IFLA Meetings of Experts on an International Cataloguing Code is now available for worldwide review and comment.

A vote form is also available there and can be used by anyone to indicate whether they approve the statement or not and to make comments. The form can be printed out, filled in, and faxed, or it can be filled in electronically and sent as an e-mail attachment.

Wednesday, May 28, 2008

2.0 Speaking Opportunities

Any folks who want to represent the library community in an education 2.0 setting should check out CR 2.0. They are having a series of 20 workshops around the U.S., using an unconference format. Go to their website and suggest a topic; the folks attending vote on what they want to hear. Even if you don't become a facilitator for a discussion, at least they will have seen that libraries are part of education 2.0. Just participating in the discussion might open some eyes to the role of libraries in education.

Tuesday, May 27, 2008

Tagging @ NASA

NASA is sporting a tag cloud on their home page, generated from words used to search the site. Look to the right, a bit down the page. It sports a nice star field background.

Friday, May 23, 2008

Web Ontology Language (OWL)

Some papers from HP Labs concerning the Web Ontology Language (OWL)
  • An OWL Full Interpretation by Jeremy Carroll HPL-2008-60

    This report is an appendix to report HPL-2008-59. It gives a worked example of the construction used in the proof from that report. For finiteness, a reduced datatype map consisting of only xsd:boolean is used. Each of the graphs in the construction is listed explicitly, with some redundancy eliminated. The final Herbrand graph contains about 15,000 triples.

  • The Consistency of OWL Full (with proofs) by Jeremy Carroll and Dave Turner HPL-2008-59

    We show that OWL1 Full without the comprehension principles is consistent, and does not break most RDF graphs that do not use the OWL vocabulary. We discuss the role of the comprehension principles in OWL semantics, and how to maintain the relationship between OWL Full and OWL DL by reinterpreting the comprehension principles as permitted steps when checking an entailment, rather than as model theoretic principles constraining the universe of interpretation. Starting with such a graph we build a Herbrand model, using, amongst other things, an RDFS ruleset, and syntactic analogs of the semantic "if and only if" conditions on the RDFS and OWL vocabulary. The ordering of these steps is carefully chosen, along with some initialization data, to break the cyclic dependencies between the various conditions. The normal Herbrand interpretation of this graph as its own model then suffices. The main result follows by using an empty graph in this construction. We discuss the relevance of our results, both to OWL2, and more generally to a future revision of the Semantic Web recommendations. This longer version contains the proofs.

  • The Consistency of OWL Full by Jeremy Carroll and Dave Turner HPL-2008-58

    We show that OWL1 Full without the comprehension principles is consistent, and does not break most RDF graphs that do not use the OWL vocabulary. We discuss the role of the comprehension principles in OWL semantics, and how to maintain the relationship between OWL Full and OWL DL by reinterpreting the comprehension principles as permitted steps when checking an entailment, rather than as model theoretic principles constraining the universe of interpretation. Starting with such a graph we build a Herbrand model, using, amongst other things, an RDFS ruleset, and syntactic analogs of the semantic "if and only if" conditions on the RDFS and OWL vocabulary. The ordering of these steps is carefully chosen, along with some initialization data, to break the cyclic dependencies between the various conditions. The normal Herbrand interpretation of this graph as its own model then suffices. The main result follows by using an empty graph in this construction. We discuss the relevance of our results, both to OWL2, and more generally to a future revision of the Semantic Web recommendations. Publication Info: Submitted to ISWC 2008, the 7th International Semantic Web Conference, Karlsruhe

MARC 2 MODS Tool

The Digital Library Federation announces a revision to their MARCXML to MODS tool.
The DLF Aquifer Metadata Working Group announces an update to the XML stylesheet they have developed for the Aquifer project, for conversion of MARCXML records to MODS. The current stylesheet, DLF_MARC2MODS_1.34.xsl, can be found from a link on our MARC to Aquifer MODS XSLT Stylesheet page. Changes are briefly documented in the comments at the beginning of the stylesheet. We have also updated the Introduction pages that give more detail about some of the changes.

The changes include:
  • re-added mapping of tag 510 citations to the note element, for monographs only;
  • added subject:hierarchicalGeographic element mapping of tag 662 (Subject - Hierarchical Place Name);
  • added mapping of tags 561 (ownership) and 581 (publications) to the note element;
  • removed mapping of the 007 specific material designation to the genre element when the value is "remote";
  • a correction to no longer repeat mapping of dates from the Leader to originInfo:date when the date type is "questionable".

Tuesday, May 20, 2008

MARC Update

Update No. 8 (October 2007) was recently released in multiple document formats. It includes changes made to the MARC 21 formats resulting from proposals which were considered by the ALA ALCTS/LITA/RUSA Machine-Readable Bibliographic Information Committee (MARBI), the Canadian Committee on MARC (CCM) and the BIC Bibliographic Standards Group in 2007.

The printed update is available through the Cataloging Distribution Service.
It includes pages for fields that have been changed, with changes marked with side lining. PDFs of those printed update pages are also available online.

D-Lib Magazine

The May/June 2008 issue of D-Lib Magazine is now available.

Some articles of interest include:
  • PREMIS With a Fresh Coat of Paint: Highlights from the Revision of the PREMIS Data Dictionary for Preservation Metadata Brian F. Lavoie, OCLC Online Computer Library Center
  • Adding Value to the Library Catalog by Implementing a Recommendation System Michael Moennich and Marcus Spiering, Karlsruhe University Library
I found the one on the recommendation system interesting. They are selling the service as an add-on to the OPAC. LibraryThing for Libraries is doing the same with their data. Syndetics has been doing this for quite some time with cover images and reviews. There seems to be a trend here: third-party additions to the OPAC supplying services based on data collected elsewhere. In the article world, there was some research done collecting OpenURL data to rate papers.

Monday, May 19, 2008

xOCLCnum

A new service from OCLC.
I'd like to announce and invite you to try xOCLCnum, the latest in the xIdentifier family of Web services from OCLC.

Just as xISBN allows you to find all related editions of a book by entering its ISBN, xOCLCnum does the same thing using an OCLC number.

xOCLCnum is queried using a simple URL format, and returns an XML response with both related OCLCnums and related ISBNs (if any). It is designed to be easily built into your library application, so you can expand queries, find all related editions, or do whatever creative thing you want to do.

Background:
ISBNs have been assigned since 1970 to most, but not all, books published.

OCLC numbers are assigned whenever a record is added to WorldCat, OCLC's global union catalog. These records cover a large portion of all books, old and new, held by any library in North America and, increasingly, other regions worldwide (most recently, the National Library of China).

So the coverage range of OCLC numbers is, not surprisingly, far greater than that of ISBNs: in WorldCat, for example, around 100 million OCLCnums compared to about 20 million ISBNs.

More Information on xOCLCnum
xOCLCnum API description
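The announcement doesn't reproduce the request URL or the response schema, so the XML below is a hypothetical xID-style payload (element names and OCLC numbers invented); the parsing itself is plain stdlib:

```python
import xml.etree.ElementTree as ET

# Hypothetical response to an xOCLCnum query; the real service's
# element names and attributes may differ.
sample = """<?xml version="1.0"?>
<rsp stat="ok">
  <oclcnum>12345678</oclcnum>
  <oclcnum>23456789</oclcnum>
  <isbn>0596000278</isbn>
</rsp>"""

root = ET.fromstring(sample)
oclcnums = [e.text for e in root.findall("oclcnum")]
isbns = [e.text for e in root.findall("isbn")]
print(oclcnums, isbns)  # → ['12345678', '23456789'] ['0596000278']
```

A library application would fetch the URL for a given OCLC number, parse the response like this, and use the related numbers to expand a catalog query.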

1:30 Ratio for Information

The post at Librarian.net about the book containing thirty tables-of-contents reminded me of the 1:30 rule for information.
Dolby and Resnikoff found these relationships:
  • A book title is 1/30 the length of a table of contents in characters, on average
  • A table of contents is 1/30 the length of a back of the book index, on average
  • A back of the book index is 1/30 the length of the text of a book, on average
  • An abstract is 1/30 the length of the technical paper it represents, on average
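Chaining the ratios gives a quick sanity check; the 30-character title below is a hypothetical starting point:

```python
# Each level is ~30x the previous, per Dolby and Resnikoff.
title_chars = 30                  # a short book title
toc_chars = title_chars * 30      # table of contents: ~900 characters
index_chars = toc_chars * 30      # back-of-book index: ~27,000
text_chars = index_chars * 30     # full text: ~810,000 characters

# ~810,000 characters is on the order of a 300-400 page book,
# which is at least the right ballpark.
print(toc_chars, index_chars, text_chars)  # → 900 27000 810000
```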
Is this the result of living in the material world, such that it won't hold true online? Or is it a function of the brain and how it deals with information, and likely to hold true wherever we function?

XML Workshop

A couple of years ago I had the pleasure of taking the XML workshop offered by Eric Lease Morgan. One of the best workshops I've experienced. Now the notes have been revised and are available online.
XML is about distributing data and information unambiguously. Through this hands-on workshop you will learn: 1) what XML is, and 2) how it can be used to build library collections and facilitate library services in our globally networked environment.
  • An introduction to XML
  • Activity - Beyond MARC
  • Indexes make search easier
  • Activity - Indexing/searching MODS
  • Activity - Writing XML
  • Flavors of XML
  • Activity - Writing XML, redux
  • Activity - Full-text indexes
  • Client/server computing
  • Databases for data storage and maintenance
  • OAI-PMH - a de-centralized OCLC
  • Activity - Being an OAI service provider
  • Activity - Being an OAI data repository
  • Web Services
  • Activity - Creating a "mash-up"
  • Workshop summary
  • External links

Friday, May 16, 2008

MARC Online

More news from LOC.
The Network Development and MARC Standards Office is pleased to announce that the Full versions of all five MARC 21 formats are now available online, along with the Online Concise.
The "full" version of a format contains detailed descriptions of every data element, along with examples, input conventions, and history sections - all of the information from the printed formats. There are no textual differences between the Online Full and the printed documentation. The Concise still contains all of the elements and enough description to serve many lookup needs. Changes from the most recent update of the formats are indicated in the text of both the Online Concise and the Online Full.

Links in LC Records

News about 856 links from LOC.
I've received a couple of questions recently about the 856 links in LC records for the TOCs, descriptions, bios, sample texts, etc. and wanted to spread the word about what we did.

Every month, around the first of the month, folks run their link checkers to validate the links in their copies of LC records. The volume of traffic against our web server was tremendous. A couple of times it nearly brought the server down. We tried several things to minimize the impact if it looked like a link checker was running against the web server, but this didn't seem to help the problem. In the end, we moved all of the files that are in the 856 fields to a different, larger, more robust server. Apparently this is causing link checkers to report that there is a redirect and people are asking if they need to change the URL for the links. I would say that there is no need to change the 856 links from http://www.loc.gov... to http://catdir.loc.gov.... In fact, I am still adding the URLs as http://www.loc.gov...

LC is committed to maintaining these URLs; you should not be experiencing access problems with them except when running link checkers or maybe harvesters. I appreciate any reports of wrong connections or other serious problems with the files. By my count, we have over 710,000 links in the LC catalog now, so you can see this is a major commitment for LC.

Wednesday, May 14, 2008

Manifestations and Near-Equivalents

Martha M. Yee continues to make her work readily available.
The two articles about 'manifestation' (the word everyone used to mean 'expression' until FRBR came along) that I published in 1994 are now available at the University of California eScholarship Repository, as follows:

Manifestations and Near-Equivalents: Theory, with Special Attention to Moving-Image Materials. Library Resources & Technical Services 1994; 38:227-256.

Manifestations and Near-Equivalents of Moving Image Works: a Research Project. Library Resources & Technical Services 1994; 38:355-372.

Re: Recommendation and Ranganathan

I hope everybody here is also reading Lorcan Dempsey's weblog. However, just in case there are some who don't, begin with the excellent post Recommendation and Ranganathan. I thought the description of the four types of metadata was a very good place to start thinking and discussion.

Tuesday, May 13, 2008

eXtensible Text Framework (XTF)

The California Digital Library (CDL) is pleased to announce a new release of its search and display technology, the eXtensible Text Framework (XTF) version 2.1. XTF is an open source, highly flexible software application that supports the search, browse and display of heterogeneous digital content. XTF offers efficient and practical methods for creating customized end-user interfaces for distinct digital content collections.

Highlights from the 2.1 release include:
  • Extensive interface improvements, including new search forms, built-in faceted browsing, and a new look and feel.
  • Increased support for document and information exchange formats.
    • XHTML and OAI-PMH output
    • NLM article format indexing and output
    • Microsoft Word indexing
  • Streamlined XSLT stylesheets for simpler deployment and adaptation.
  • Updated documentation that has been moved to the XTF project wiki, allowing XTF implementers to share solutions with the entire user community.
  • "Freeform" Boolean query language, offered as an experimental feature.
  • Backward compatibility with existing XTF implementations.
A complete list of changes is available on the XTF Project page on SourceForge, where the distribution (including documentation) can also be downloaded.

Since the first deployment of XTF in 2005, the development strategy has been to build and maintain an indexing and display technology that is not only customizable, but also draws upon tested components already in use by the digital library and search communities - in particular the Lucene text search engine, Java, XML, and XSLT. By coordinating these pieces in a single platform that can be used to create multiple unique applications, CDL has succeeded in dramatically reducing the investment in infrastructure, staff training and development for new digital content projects.

XTF offers a suite of customizable features that support diverse intellectual access to content. Interfaces can be designed to support the distinct tools and presentations that are useful and meaningful to specific audiences. In addition, XTF offers the following core features:
  • Easy to deploy: Drops directly into a Java application server such as Tomcat or Resin; has been tested on Solaris, Mac, Linux, and Windows operating systems.
  • Easy to configure: Can create indexes on any XML element or attribute; entire presentation layer is customizable via XSLT.
  • Robust: Optimized to perform well on large documents (e.g., a single text that exceeds 10MB of encoded text); scales to perform well on collections of millions of documents; provides full Unicode support.
  • Extensible:
    • Works well with a variety of authentication systems (e.g., IP address lists, LDAP, Shibboleth).
    • Provides an interface for external data lookups to support thesaurus-based term expansion, recommender systems, etc.
    • Can power other digital library services (e.g., XTF contains an OAI-PMH data provider that allows others to harvest metadata, and an SRU interface that exposes searches to federated search engines).
    • Can be deployed as separate, modular pieces of a third-party system (e.g., the module that displays snippets of matching text).
  • Powerful for the end user:
    • Spell checking of queries
    • Faceted displays for browsing
    • Dynamically updated browse lists
    • Session-based bookbags
These basic features can be tuned and modified. For instance, the same bookbag feature that allows users to store links to entire books can also store links to citable elements of an object, such as a note or other reference.

XTF was used as experimental OPAC technology at the CDL in an experiment with ranking and recommendation features using our catalog data.

Posted to many e-mail distribution lists.

Non-Latin Data in Name Authority Records

From LC:
As previously announced, MDS- Name Authority records will be enhanced with non-Latin script data in 4XX fields and selected notes beginning June 1, 2008 (see earlier announcements at http://www.loc.gov/catdir/cpso/nonroman_announce.pdf and http://www.loc.gov/catdir/cpso/nonlatin_whitepaper.html for additional information). An additional FAQ related to the project will be posted at http://www.loc.gov/aba/ shortly.

An effort by OCLC, Inc. to automatically pre-populate existing authority records with non-Latin references will also begin in early June 2008. The initial rate of pre-population will be limited to several hundred records per week, and will grow to a rate of approximately 25,000 records per week. Note that other clean-up projects that have recently increased the volume of name authority records (http://www.loc.gov/cds/notices/2008-02-14.pdf ) will be suspended during this pre-population effort. It is estimated that approximately 400,000 pre-population records will be distributed over a number of months.

CDS is making available a file of name authority test records containing non-Latin script data. The file of 110 test records can be found on the Library of Congress rs7 server under the /emds/test subdirectory with file names of names.nonlatintest.records for the MARC 8 version and names.nonlatintest.records.utf8 for the UTF8 version.

Spam

I've been blasted with comment spam. So I've had to turn on the comment moderation function.

It is a shame how these few folks can ruin things for all. A few years back an e-card was a fun thing to receive and send. Now so many are spam that I've stopped sending and opening them. Open comments seem ready to go the same way.

Friday, May 09, 2008

Metadata for Learning Resources

Metadata for Learning Resources: An Update on Standards Activity for 2008 by Sarah Currier appears in the latest issue of Ariadne.
The major areas of development covered in this article are:
  1. LOM Next: plans for the next version of the IEEE LOM
  2. The Joint DCMI/IEEE LTSC (Learning Technology Standards Committee) Taskforce: bringing together the two major metadata standards used for learning resources, and providing an RDF translation for the LOM
  3. DC-Education Application Profile (DC-Ed AP): a modular application profile purely looking at educational aspects of resources, based on community requirements
  4. The United Kingdom’s Joint Information Systems Committee Learning Materials Application Profile (JISC LMAP) scoping study: working alongside a number of similar projects looking at application profiles for repositories in other areas, e.g. images.
  5. International Standards Organisation Metadata for Learning Resources (ISO MLR): based primarily in Canada, this international standards body is devising a new international standard for educational metadata, in response to perceived limitations of the IEEE LOM
  6. The European Commission’s PROLEARN Harmonisation of Metadata project: a study into the issues and challenges of achieving harmonisation in metadata, given the heterogeneous landscape

Thursday, May 08, 2008

Metadata Advocates

I had an Ah-Ha moment while listening to John Udell's show Interviews with Innovators. The episode was Working with Data Sources with Raymond Yee.
Raymond Yee is a lecturer at the UC Berkeley School of Information and the author of Pro Web 2.0 Mashups: Remixing Data and Web Services. In this conversation he talks about teaching students how to work with existing data sources, and speculates with Jon Udell on ways to expand the supply of available sources.
What struck me was that we should be advocates for metadata standards. If the local genealogy society puts up a calendar on their website, help them get it into iCal or hCal format. Then we could drop their info into a pathfinder. Or geocode the local bird-watchers' sightings, or the school district's lunch menu, or .... We could offer our understanding of the importance of standards and data reuse to our community. The library benefits by becoming the go-to place for information management. The community benefits because they get the word out more effectively. It would be a very different job description for a cataloger to become the community data-standards outreach person. But not a bad place to be.

Resource Description and Access

Now available, Outcomes of the Meeting of the Joint Steering Committee Held in Chicago, USA, 13-22 April 2008.

Wednesday, May 07, 2008

Using Wikipedia

Two new reports from HP Labs show interesting uses of Wikipedia in information management.

Boosting Inductive Transfer for Text Classification using Wikipedia by Somnath Banerjee. HPL-2008-42
Inductive transfer is applying knowledge learned on one set of tasks to improve the performance of learning a new task. Inductive transfer is being applied in improving the generalization performance on a classification task using the models learned on some related tasks. In this paper, we show a method of making inductive transfer for text classification more effective using Wikipedia. We map the text documents of the different tasks to a feature space created using Wikipedia, thereby providing some background knowledge of the contents of the documents. It has been observed here that when the classifiers are built using the features generated from Wikipedia they become more effective in transferring knowledge. An evaluation on the daily classification task on the Reuters RCV1 corpus shows that our method can significantly improve the performance of inductive transfer. Our method was also able to successfully overcome a major obstacle observed in a recent work on a similar setting. Publication Info: Published and presented at ICMLA 2007, the Sixth International Conference on Machine Learning and Applications (ICMLA'07), 13-15 Dec. 2007 Cincinnati, Ohio, USA
Clustering Short Texts using Wikipedia by Somnath Banerjee, Krishnan Ramanathan, and Ajay Gupta. HPL-2008-41
Subscribers to popular news or blog feeds (RSS/Atom) often face the problem of information overload, as these feed sources usually deliver a large number of items periodically. One solution to this problem could be clustering similar items in the feed reader to make the information more manageable for the user. Clustering items at the feed-reader end is a challenging task, as usually only a small part of the actual article is received through the feed. In this paper, we propose a method of improving the accuracy of clustering short texts by enriching their representation with additional features from Wikipedia. Empirical results indicate that this enriched representation of text items can substantially improve the clustering accuracy when compared to the conventional bag-of-words representation. Publication Info: Published and presented at SIGIR 2007, the 30th Annual International ACM SIGIR Conference, 23-27 July 2007, Amsterdam, Netherlands
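The enrichment idea in both papers, adding Wikipedia concept features to the bag of words before comparing items, can be sketched with a toy concept lookup. The mini "Wikipedia" mapping and the sample texts below are invented for illustration, not taken from the reports:

```python
from collections import Counter
from math import sqrt

# Toy stand-in for a Wikipedia concept index: term -> related concepts.
wiki_concepts = {
    "ipod": ["Apple Inc.", "Portable media player"],
    "zune": ["Microsoft", "Portable media player"],
}

def enrich(text):
    """Bag of words plus any matching Wikipedia concept features."""
    words = text.lower().split()
    features = Counter(words)
    for w in words:
        for concept in wiki_concepts.get(w, []):
            features[concept] += 1
    return features

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    norm = lambda v: sqrt(sum(n * n for n in v.values()))
    return dot / (norm(a) * norm(b))

d1, d2 = "new ipod released", "zune update shipped"
plain = cosine(Counter(d1.split()), Counter(d2.split()))
enriched = cosine(enrich(d1), enrich(d2))
print(plain, enriched)  # no shared words, but a shared concept after enrichment
```

With no overlapping words the plain similarity is zero, while the shared "Portable media player" concept makes the enriched similarity nonzero, which is exactly why short feed snippets cluster better this way.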

Monday, May 05, 2008

Slick Deal

Here is a bargain offered by Amazon, OCLC - MARC Record. It has free shipping too! This was seen on Slick Deals.

Don't they know they can get all the free MARC records they want from their local library?

Thanks Walter.

Thursday, May 01, 2008

myLOC

I may have missed this news, maybe while I was at TxLA, but I've not seen it elsewhere; the Library of Congress now has a "my" portal, myLOC.

Statement of International Cataloging Principles

The Statement of International Cataloging Principles is available for worldwide review.
As Chair of the IFLA Meeting of Experts on an International Cataloging Code (IME ICC) I am pleased to invite comments from the worldwide library community on the final draft of the Statement of International Cataloguing Principles and its accompanying Glossary.

In order to provide the appropriate review period and to schedule adequate time to cumulate, analyze, and incorporate comments before the General Meeting of IFLA in August, the Statement is being posted today on a public Wiki. The IFLA Headquarters Office is closed for holiday April 30-May 5th, but as soon as they return we will move the files there and redirect from the Wiki. In the meantime please link to: http://catprinciples.pbwiki.com/ and view and/or download the Statement for your review; and please use the accompanying voting document for your response.

MARC Records

Ed Summers has "created a bittorrent of the concatenated MARC files donated to the Internet Archive by Scriblio (7,030,372 records)":

http://inkdroid.org/torrents/lc-bib.torrent

Wednesday, April 30, 2008

Library of Congress Subject Heading Suggestion Blog-a-Thon

The results for the Library of Congress Subject Heading Suggestion Blog-a-Thon are in. The effort resulted in suggestions for 24 subject headings, 6 cross-references, and 2 subdivisions.

Tuesday, April 29, 2008

Transparency

Get Satisfaction looks like a unique 2.0 tool to make the organization transparent.
Get Satisfaction is a direct connection between people and companies that fosters problem-solving, promotes sharing, and builds up relationships. Thousands of companies use this neutral space to support customers, exchange ideas, and get feedback about their products and services. Get Satisfaction is open, transparent, and free. You’re free to ask, free to answer, and free to start a new conversation. Everyone is invited and encouraged to participate: companies, employees, customers — anyone with an opinion, an answer, or something to say.
A few libraries are represented. Michael Stephens needs to see this.

Monday, April 28, 2008

Free Comic Book Day

Free Comic Book Day is this weekend, May 3.

Additions to the MARC Code Lists for Relators, Sources, Description Conventions

The codes listed below have been recently approved for use in MARC 21 records. The codes will be added to the online MARC Code Lists for Relators, Sources, Description Conventions.

The codes should not be used in exchange records until after June 25, 2008. This 60-day waiting period is required to provide MARC 21 implementers time to include newly defined codes in any validation tables they may apply to the MARC fields where the codes are used.

Category Code Sources
The following codes are for use in subfield $2 in field 072 in Authority and Bibliographic records (Subject Category Code) and in subfield $z in field 073 (Subdivision Usage) in Authority records.

Additions:

bisacsh
BISAC Subject Headings
(http://www.bisg.org/standards/bisac_subject/index.html) [use only after June 25, 2008]
bisacmt
BISAC Merchandising Themes
(http://www.bisg.org/standards/merchandising.html) [use only after June 25, 2008]
bisacrt
BISAC Regional Themes
(http://www.bisg.org/standards/region_codes.html) [use only after June 25, 2008]
Classification Sources
The following code is for use in subfield $2 in field 084 in Bibliographic and Community Information records (Other Classification Number), in subfield $2 in field 084 in Classification records (Classification Scheme and Edition) and in subfield $2 in field 065 in Authority records (Other Classification Number).

Addition:
blissc
British Library Inside service subject classification. (London: British Library) [use only after June 25, 2008]
Term, Name, Title Sources
The following codes are for use in subfield $2 in fields 600-657 and 662 in Bibliographic and Community Information records, and in subfield $f in field 040 (Cataloging Source) in Authority records.

Additions:
bisacsh
BISAC Subject Headings
(http://www.bisg.org/standards/bisac_subject/index.html) [use only after June 25, 2008]
bisacmt
BISAC Merchandising Themes
(http://www.bisg.org/standards/merchandising.html) [use only after June 25, 2008]
bisacrt
BISAC Regional Themes
(http://www.bisg.org/standards/region_codes.html) [use only after June 25, 2008]
quiding
Quiding, Nils Herman. Svenskt allmänt författningsregister för tiden från år 1522 till och med år 1862. (Stockholm: Norstedt) [use only after June 25, 2008]
skon
Att indexera skönlitteratur: Ämnesordslista, vuxenlitteratur.
(Stockholm: Svensk biblioteksförening) [use only after June 25, 2008]

Friday, April 25, 2008

More Comments on TLA

The drive from Houston to Dallas was beautiful. The bluebonnets had passed, except for a few scattered patches. However, the brown-eyed Susans, winecups, Indian paintbrushes, and a white flower (cow parsley?) were spectacular.

At the RDA preconference I had the pleasure of hearing Carol Seiler, from AMIGOS, speak. Great presenter.

Watch New Records Enter WorldCat

Watch new records enter WorldCat.

Wednesday, April 23, 2008

DCMI Abstract Model

At the RDA preconference I noticed that RDA seems to have been based, at least in part, on the DCMI Abstract Model. I knew RDA had some basis in FRBR, but this was something new to me. Getting to know the DCMI Abstract Model before RDA hits has been added to my to-do list.
This document specifies an abstract model for Dublin Core metadata. The primary purpose of this document is to specify the components and constructs used in Dublin Core metadata. It defines the nature of the components used and describes how those components are combined to create information structures. It provides an information model which is independent of any particular encoding syntax. Such an information model allows us to gain a better understanding of the kinds of descriptions that we are encoding and facilitates the development of better mappings and cross-syntax translations.

What is a Work?

Good news from Martha Yee.
...all of my "What is a Work?" articles published in Cataloging & Classification Quarterly in 1994-1995 are now available at the UC eScholarship repository, as follows:

"What is a Work? Part 1, The User and the Objects of the Catalog." Cataloging & Classification Quarterly 1994; 19:1:9-28.
http://repositories.cdlib.org/postprints/2709

"What is a Work? Part 2, The Anglo-American Cataloging Codes." Cataloging & Classification Quarterly 1994; 19:2:5-22.
http://repositories.cdlib.org/postprints/2710

"What is a Work? Part 3, The Anglo-American Cataloging Codes, Continued." Cataloging & Classification Quarterly 1995; 20:1:25-45.
http://repositories.cdlib.org/postprints/2755

"What is a Work? Part 4, Cataloging Theorists and a Definition." Cataloging & Classification Quarterly 1995; 20:2:3-23.
http://repositories.cdlib.org/postprints/2711

Another relevant article that I wrote about FRBR-izing OCLC is available as well:

"Musical Works on OCLC, or, What if OCLC Were Actually to Become a Catalog?" Music Reference Services Quarterly 2002: 8:1:1-26.

http://repositories.cdlib.org/postprints/2713

In addition, my recent article analyzing the differences among cataloging, metadata, descriptive bibliography, and abstracting and indexing services is now available:

"Cataloging Compared to Descriptive Bibliography, Abstracting and Indexing Services, and Metadata." Invited for Ruth Carter festschrift, Cataloging & Classification Quarterly 2007; 44:3/4:307-328.

http://repositories.cdlib.org/postprints/2721

LCSH Suggestion Blog-a-Thon

The Radical Reference folks are having a Library of Congress Subject Heading Suggestion Blog-a-Thon.
Do subject headings still matter? We say they do.

Does the Library of Congress always identify accessible and appropriately named headings and implement them in a timely manner? We say not always. All you have to do is spend one day behind a reference desk to see examples of biased, non-inclusive, and counterintuitive classifications that slow down, misdirect, or even obscure information from library users. As librarians and library workers, providing access to information is important, and classifying it in ways that are inclusive and intuitive strengthens our egalitarian mission.

Between now and Sunday, April 27, Radical Reference invites you to suggest subject headings and/or cross-references which will then be compiled and sent to the Library of Congress. You can either choose one previously suggested by Sandy Berman (pdf or spreadsheet) or propose your own.

This is a chance to positively impact the catalog of the de facto national library of the United States, which also impacts cataloging all over the world!

Tuesday, April 22, 2008

Recommender System for the DSpace

A Recommender System for the DSpace Open Repository Platform by Desmond Elliott, James Rutherford, and John Erickson. HPL-2008-21.
We present Quambo, a recommender system add-on for the DSpace open source repository platform. We explain how Quambo generates content recommendations based upon a user selected set of examples, our approach to presenting content recommendations to the user, and our experiences applying the system to a repository of technical reports. We consider how Quambo could be combined with the peer-federated DSpace add-on to extend the item-space from which recommendations can be generated; a larger item-space could improve the diversity of the set from which to make recommendations. We also consider how Quambo could be extended to add collaboration opportunities to DSpace. Publication Info: Submitted to Open Repositories 2008, Southampton, UK, April 1-4, 2008

Monday, April 21, 2008

TLA Recap

TLA is over for the year. Always an excellent conference. Here are a few observations. The RDA preconference had 135 registered, the most the room would hold; some had to be turned away. There is definitely an interest in this.

Walt Crawford showed that common sense is not so common, and in the right forum it is always interesting.

No graphic novel/comic vendors. No Marvel, DC, Antarctic Press, Strangers in Paradise. Missed them. Rod Espinosa did a presentation and autograph session. And the author of American Born Chinese did a presentation; very well-spoken. Have to check out his stuff.

The keynote panel was fun. Roy Tennant was a very good moderator.

OPALS looks like an open-source ILS worth investigating.

Post any failures at the Library Success wiki. Examples of things that did not work and even better info on why are important and useful to others.

The KIC copier looks interesting. Too expensive for us right now, $20,000 or so. But a flat scanner that produces a PDF or TIFF and then can email or move the file to a thumb drive looks like the future.

The Nasher Sculpture Center is a beautiful setting. The willows, irises, and water at the end of the row of oaks were stunning.

The District Caucuses were the same time as the alumni dinners. I went for the dinner. Nice view from the 69th floor.

TLA 2009

It looks like the Lunar and Planetary Institute (LPI) Education Dept. will be having a preconference at TLA 2009. Explore! Fun with Science. Never too early to get this penciled in your daytimer.

RDF Tool

RDFify your data with Triplify.
Triplify provides a building block for the "semantification" of Web applications. Triplify is a small plugin for Web applications, which reveals the semantic structures encoded in relational databases by making database content available as RDF, JSON or Linked Data.

Triplify is very lightweight: it consists of just a few files with less than 500 lines of code. For a typical Web application, a configuration for Triplify can be created in less than one hour, and if this Web application is deployed multiple times (as most open-source Web applications are), the configuration can be reused without modifications.

Triplify makes Web applications easier to mash up and lays the foundation for next-generation, semantics-based Web searches.
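Triplify itself is a small PHP plugin, but the core idea it describes can be sketched in a few lines of Python: run a SQL query and map each row to RDF-style triples. The table, URI pattern, and property mapping below are hypothetical placeholders, not Triplify's actual configuration format.

```python
import sqlite3

def rows_to_triples(conn, query, subject_pattern, properties):
    """Emit (subject, predicate, object) triples from a SQL query.

    subject_pattern is a URI template filled from each row;
    properties maps column names to predicate URIs.
    """
    cur = conn.execute(query)
    cols = [c[0] for c in cur.description]
    triples = []
    for row in cur:
        record = dict(zip(cols, row))
        subject = subject_pattern.format(**record)
        for col, predicate in properties.items():
            triples.append((subject, predicate, str(record[col])))
    return triples

# hypothetical blog database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER, title TEXT)")
conn.execute("INSERT INTO posts VALUES (1, 'Hello'), (2, 'World')")

triples = rows_to_triples(
    conn,
    "SELECT id, title FROM posts",
    "http://example.org/post/{id}",              # hypothetical URI pattern
    {"title": "http://purl.org/dc/terms/title"})
```

The real Triplify configuration is a set of SQL queries plus a vocabulary mapping; the one-hour configuration claim above amounts to writing exactly this kind of query-to-property table.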

23 Things

23 Things is all the rage among the Library 2.0 folks. I had an idea: how about 23 Things for the Semantic Web? COinS, microformats, RDF, Topic Maps, SKOS, etc. There would be plenty to investigate. Not sure the concept could be grasped quite as fast, though.

Friday, April 11, 2008

VALE OLS Materials

Video streaming, audio podcasts, and PowerPoint presentations from VALE's Next Generation Academic Library System Symposium on OLS (Open Library System) are now available on the VALE website.

Genre/Form Headings for Radio Programs

In August 2007, the Cataloging Policy and Support Office (CPSO) announced a project to begin issuing genre/form authority records (MARC 21 tag 155) for motion pictures, television programs, and videos. As the next step in the development of genre/form headings at the Library of Congress, CPSO has begun a project to create genre/form headings for radio programs. These headings are being created by catalogers in the Motion Picture, Broadcasting, and Recorded Sound Division (MBRS) and will join those already being established for moving images. They are based chiefly on the concepts represented in the Radio Form/Genre Terms Guide (RADFG). Existing LCSH headings in the area of radio programming (MARC 21 tag 150) will also be considered for inclusion.

To support the creation and application of these headings, CPSO and MBRS have drafted a Subject Cataloging Manual (SCM) instruction sheet, H 1969.5, which is available in PDF format on CPSO’s website. Interested parties are invited to send comments on this instruction sheet to Janis Young at jayo@loc.gov.

CPSO reminds SACO participants that change requests and proposals for genre/form headings are not being accepted at this time.

TLA Conference

Postings next week will be sporadic, at best, possibly non-existent. I'll be at TLA, and though I will have the laptop I may not feel like posting at the end of long, very full days. I'll start the week off at the preconference on RDA. Last count I heard for that was 135 registered, which blows my mind. Later on Tuesday I'll be at dinner with some catalogers, good folks all. Then, if time permits, I'll catch the end of the welcome party. Looking forward to seeing some folks I've not seen in too long and meeting some new people.

Thursday, April 10, 2008

TLA Conference News

Cali Lewis has been moved out of the NetFair location into a regular room. I think the time has stayed the same. Have to check when I get my conference schedule. I'm no longer the room host, but I plan on being there.

So far my conference Twitter experiment is a flop. I've got no one following, nor anyone to follow. I guess TLA is a bit different than CiL. I will keep it up for a bit just to make sure it simply isn't the right tool at this time.

Tuesday, April 08, 2008

OPAC Enhancement

Here is an interesting enhancement to an OPAC, Answer Tips. The American University of Rome Library did this. Now double clicking on any unlinked word brings up a short pop-up explanation. Quick and easy to do. How much value does it add? Interesting.

Monday, April 07, 2008

TechNet 2008

Looks like fun. "TechNet 2008 is the first annual North Texas conference focusing on technology in libraries" June 12, 2008.

TX Library Association Annual Conference

I've started my Twitter for the Texas Library Assoc. Conference. Follow along if you'll be there and want to keep in touch.

Friday, April 04, 2008

New Version of Omeka

News from Omeka.
Omeka 0.9.1 is our first release since the initial public launch on February 20, 2008. It fixes 20+ bugs, and we highly recommend that all users upgrade their existing Omeka installations. The API hasn’t changed since the 0.9.0 release, so existing themes and plugins should continue to work after the upgrade.
BTW
Omeka is a web platform for publishing collections and exhibitions online. Designed for cultural institutions, enthusiasts, and educators, Omeka is easy to install and modify and facilitates community-building around collections and exhibits. Omeka is free and open source.

PREMIS Data Dictionary for Preservation Metadata

The PREMIS Editorial Committee is pleased to announce the release of the PREMIS Data Dictionary for Preservation Metadata, version 2.0. This document is a revision of Data Dictionary for Preservation Metadata: Final report of the PREMIS Working Group, issued in May 2005. The PREMIS Data Dictionary and its supporting documentation is a comprehensive, practical resource for implementing preservation metadata in digital archiving systems. Preservation metadata is defined as information that preservation repositories need to know to support digital materials over the long term.

This document is a specification that emphasizes metadata that may be implemented in a wide range of repositories, supported by guidelines for creation, management and use, and oriented toward automated workflows. It is technically neutral in that no assumptions are made about preservation technologies, strategies, syntaxes, or metadata storage and management. Members of the PREMIS Editorial Committee revised the original data dictionary based on comments and experience from implementers and potential implementers since its release. The Editorial Committee kept the preservation community informed about issues being discussed, solicited comments on proposed revisions, and consulted outside experts where appropriate.

The international Editorial Committee is a part of the PREMIS Maintenance Activity sponsored by the Library of Congress. The Maintenance Activity also includes PREMIS tutorials and promotional activities, and an active PREMIS Implementers Group.

Major changes in this revision include:
  • Expanded rights metadata
  • More extensive significant properties and preservation level information
  • Mechanism for extensibility for a number of metadata units
The PREMIS Data Dictionary for Preservation Metadata, version 2.0 is now available, as is a draft XML schema to support implementation. This is an extensive revision of the earlier PREMIS version 1.1 schemas.

After a one month review, the schema will be finalized. Please send comments about the XML schema by April 24 to Ray Denenberg, rden@loc.gov.

Monday, March 31, 2008

NISO Website

The NISO website has a new look.

Friday, March 28, 2008

Additions to the MARC Code Lists for Relators, Sources, Description Conventions

The codes listed below have been recently approved for use in MARC 21 records. The codes will be added to the online MARC Code Lists for Relators, Sources, Description Conventions.

The codes should not be used in exchange records until after May 28, 2008.
This 60-day waiting period is required to provide MARC 21 implementers time to include newly defined codes in any validation tables they may apply to the MARC fields where the codes are used.

Category Code Sources
The following code is for use in subfield $2 in field 072 (Subject Category Code/Code Source) in Authority and Bibliographic records.

Addition:
ekz
Systematiken der ekz [use only after May 28, 2008]
Classification
The following codes are for use in subfield $2 in field 084 in Bibliographic and Community Information records (Other Classification Number), in subfield $2 in field 084 in Classification records (Classification Scheme and Edition) and in subfield $2 in field 065 in Authority records (Other Classification Number).

Additions:
dopaed
DOPAED der UB Erlangen [use only after May 28, 2008]
methepp
Methode Eppelsheimer [use only after May 28, 2008]
ssgn
Sondersammelgebiets-Nummer [use only after May 28, 2008]
Description Conventions
The following codes are for use in subfield $e in field 040 in Bibliographic and Authority records (Description Conventions).

Additions:
din1505
Titelangaben von Dokumenten (Berlin: Beuth) [use only after May 28, 2008]
vd16
Formalerschliessung nach dem Verzeichnis der Drucke des 16. Jahrhunderts (VD 16) [use only after May 28, 2008]
vd17
Formalerschliessung nach dem Verzeichnis der Drucke des 17.
Jahrhunderts (VD 17) [use only after May 28, 2008]
rakddb
Ansetzungsform gemaess der RAK - Anwendung Der Deutschen Bibliothek [use only after May 28, 2008]
Other codes
The following code is for use in subfield $2 in field 210 in Bibliographic records (Abbreviated Title).

Addition:
din1430
Key Title nach DIN 1430 (Berlin: Beuth) [use only after May 28, 2008]
The following code is for use in subfield $2 in field 044 (Country of Publishing/Producing Entity Code) in bibliographic records.

Addition:
swdl
Ländercode der Schlagwortnormdatei (SWD) (Leipzig, Frankfurt am Main, Berlin: Deutsche Nationalbibliothek) [use only after May 28, 2008]
Term, Name, Title Sources
The following code is for use in subfield $2 in fields 600-657 in Bibliographic and Community Information records, and in subfield $f in fields 040 (Cataloging Source) and subfield $2 in fields 700-788 (Heading Linking Entries / Source of heading or term) in Authority records.

Addition:
rswkaf
Alternativform zum Hauptschlagwort [use only after May 28, 2008]
The codes listed below were previously defined for use in subfield $2 in fields 600-651 in Bibliographic and Community Information records, and in subfield $f (Subject heading or thesaurus conventions) in field 040 in MARC 21 Authority records.

Usage has been expanded to subfield $2 in fields 654-657 and 662 in Bibliographic records (Subject Added Entries/Index Terms); subfield $2 in fields 654-657 in Community Information records (Subject Added Entries/Index Terms); and subfield $2 in fields 700-788 (Heading Linking Entries / Source of heading or term) in Authority records.
rswk
Regeln für den Schlagwortkatalog (Leipzig, Frankfurt am Main, Berlin: Deutsche Nationalbibliothek) (3MB PDF file) [use in new fields only after May 28, 2008]
swd
Schlagwortnormdatei (Leipzig, Frankfurt am Main, Berlin: Deutsche Nationalbibliothek) [use in new fields only after May 28, 2008]

TLA Conference

At TLA I'll be a room host for the session by Cali Lewis. Tuesday, 2 PM @ the NetFair. I'll also be at the RDA preconference.

I lost the election as councilor for the Digital Libraries group, but I do think the best person won. So I'll be passing on the DL business meeting, though I will most likely hit most of their sessions. I'll be starting a Twitter for the conference. I'm looking forward to seeing some folks soon.

This morning I restarted my Facebook account. I got bored with it about a year ago and shut it down. Today I reactivated it. Everything was still there. The information does not get erased when you close it down.

USEMARCON Plus

A new version of USEMARCON Plus, The Universal MARC Record Convertor, is available.
USEMARCON facilitates the conversion of catalogue records from one MARC format to another e.g. from UKMARC to UNIMARC. The software was designed as a toolbox-style application, allowing users with detailed knowledge of the source and target MARC formats to develop rules governing the behaviour of the conversion. Rules files may be supplemented by additional tables for more accurate conversion of MARC-specific character sets or coded information. The tables and rules files are simple ASCII text files and can be created using any standard text editor such as MS Windows Notepad.
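The rules-file idea above can be illustrated with a toy sketch: a table mapping source-format tags to target-format tags, applied field by field. The rule syntax and the specific tag mappings below are simplified placeholders, not actual UKMARC-to-UNIMARC rules or USEMARCON's real rule language.

```python
def convert_record(record, rules):
    """Rename fields of a {tag: value} record per a {source: target} rule table.

    Tags with no rule pass through unchanged, mirroring the
    toolbox approach: the user supplies the conversion knowledge.
    """
    return {rules.get(tag, tag): value for tag, value in record.items()}

# hypothetical source record and tag-mapping rules
source_record = {"245": "Example title", "100": "Example author"}
rules = {"245": "200", "100": "700"}

converted = convert_record(source_record, rules)
```

USEMARCON's actual rules also handle subfield reordering, conditional logic, and character-set tables; this sketch shows only the basic tag-remapping core.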

Wednesday, March 26, 2008

Microformats

Microformats University: 100+ Articles and Resources by Jessica Hupp.
Microformats are small formatting pieces designed to make your data easier to read by both users and software. Although their use is not widespread, it’s important that every web developer becomes familiar with them, as they’re sure to be an integral part of the web’s future. Because of this, there are a number of articles and resources out there devoted to microformats. We’ve compiled more than 100 of the best here.

Tuesday, March 25, 2008

Additions to the MARC Country and Geographic Area Code Lists

As the result of Kosovo declaring independence from Serbia in February 2008, new country and geographic area codes have been defined for use in MARC records.
  1. MARC country code change

    The new country code is:
    kv
    Kosovo
    Kosovo was previously coded rb for Serbia from February 2007-May 2008. From 1992-April 2007 it was coded yu for Serbia and Montenegro. Prior to October 1992, yu was used for Yugoslavia, which included the Socialist republics of Bosnia and Herzegovina, Croatia, Macedonia, Montenegro, Serbia, and Slovenia.
  2. MARC geographic area code change

    The new geographic area code is:
    e-kv
    Kosovo
    Kosovo was previously coded e-rb for Serbia from February 2007-May 2008.
Yugoslavia [e-yu] will be retained for works on Yugoslavia as a whole (including the Kingdom of Yugoslavia, the Federal Republic of Yugoslavia, and the Socialist Federal Republic of Yugoslavia) and former Yugoslav republics before they separated.

Code4Lib Journal

The 2nd issue of Code4Lib Journal is now available. Plenty of good articles.

LibraryThing API

Tim Spalding has released an API for LibraryThing.
I just released a Javascript/JSON API to LibraryThing core work data.

http://www.librarything.com/thingology/2008/03/first-cut-works-json-api.php

It's basically a riff on what Google did recently—a way to link to LibraryThing if we have a book, and not if we don't. It also includes copy and review counts, and the average rating. It takes ISBNs, LCCNs and OCLC numbers.

Next up will be a JSON API into member books, so members can design their own widgets and mash their library up with the contents of a page.

It's all very beta, and my ears are wide open.
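As a rough sketch of what using such a JSON API looks like from the client side: build a request URL keyed on an identifier, then read copy counts, review counts, and rating out of the response. The endpoint path, the "ids" parameter name, and the response fields below are assumptions for illustration, not LibraryThing's documented interface.

```python
import json
from urllib.parse import urlencode

def build_url(isbn):
    """Build a request URL for a JSON work-data lookup (hypothetical endpoint)."""
    base = "http://www.librarything.com/api/json/workinfo.js"
    return base + "?" + urlencode({"ids": isbn})

url = build_url("1234567890")  # placeholder ISBN

# A response might carry copy counts, review counts, and average rating,
# matching the data the announcement says the API exposes:
sample = json.loads('{"copies": 1342, "reviews": 27, "rating": 4.1}')
```

Since the announcement describes a Javascript/JSON API, in practice this would be consumed from a page script rather than server-side, but the request/response shape is the same.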

Monday, March 24, 2008

Organizing Without Organizations

Here Comes Everybody: The Power of Organizing Without Organizations is a talk by Clay Shirky discussing his new book, Here Comes Everybody: The Power of Organizing Without Organizations. (WorldCat Amazon) It is available in both video and audio. Seen on Thing-ology.

OAI Toolkit

Now available on Sourceforge, OAI4J a client library for PMH and ORE
OAI4J is an open-source client library for OAI-PMH and OAI-ORE created by the National Library of Sweden. The library is object-oriented in its design and written in Java. It can be used to harvest metadata from OAI-PMH compliant repositories. It can also be used to create new OAI-ORE Resource Maps from scratch, to parse existing ones, and to serialize them to XML.
This is the first tool I've noticed that works with the new OAI-ORE specs.
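OAI4J is a Java library, but the harvesting it automates is a plain HTTP protocol, so the core of it can be sketched in Python: issue a ListRecords request and pull record identifiers out of the XML response. The repository URL and the abbreviated sample response below are made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL."""
    return base_url + "?" + urlencode(
        {"verb": "ListRecords", "metadataPrefix": metadata_prefix})

url = list_records_url("http://example.org/oai")

# Abbreviated sample of a ListRecords response (real responses also
# carry datestamps, metadata payloads, and resumption tokens):
sample_response = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""

root = ET.fromstring(sample_response)
identifiers = [el.text for el in root.iter(OAI_NS + "identifier")]
```

A full harvester would loop on the resumptionToken element to page through large repositories, which is exactly the bookkeeping a library like OAI4J handles for you.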

Tagging Structures and the Organization of Information

Analyzing Communal Tag Relationships for Enhanced Navigation and User Modeling by Edwin Simpson and Mark H. Butler (HPL-2008-24)
The increasing amount of available information has created a demand for better, more automated methods of finding and organizing different types of information resource. This chapter investigates methods of improving navigation, personalization and recommendation of information resources using collaboratively generated tags to model resources and users. We discuss the advantages and limitations of tags, and describe using relationships between tags to discover latent structures that could be used to automatically organize a community's tags. We give a hierarchical clustering algorithm for extracting latent structure and explain methods for determining tag specificity. Next we explain how latent structure visualizations could enhance navigation. Finally we discuss future trends including using latent tag structures to model users and their current tasks for recommendation and user interface personalization. Publication Info: Submitted to (Book) Collaborative & Social Information Retrieval and Access: Techniques for Improved User Modeling, Edited by Max Chevalier, Christine Julien and Chantal Soule-Dupuy, published by IGI Global.
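The hierarchical clustering the abstract mentions can be illustrated with a minimal agglomerative sketch: measure tag similarity by the resources tags share, then repeatedly merge the most similar clusters. The algorithm details and the tag data below are illustrative, not the authors' method.

```python
from itertools import combinations

def jaccard(a, b):
    """Set similarity: shared elements over total elements."""
    return len(a & b) / len(a | b)

# hypothetical tag -> resources annotated with that tag
tag_resources = {
    "python":  {1, 2, 3},
    "coding":  {1, 2, 4},
    "travel":  {5, 6},
    "flights": {5, 6, 7},
}

# each cluster is (set of tag names, union of their resources)
clusters = [({t}, r) for t, r in tag_resources.items()]
merges = []
while len(clusters) > 2:
    # merge the most similar pair of clusters
    i, j = max(combinations(range(len(clusters)), 2),
               key=lambda p: jaccard(clusters[p[0]][1], clusters[p[1]][1]))
    names = clusters[i][0] | clusters[j][0]
    resources = clusters[i][1] | clusters[j][1]
    merges.append(frozenset(names))
    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
    clusters.append((names, resources))
```

The order of merges yields a latent hierarchy over the community's tags: here the travel-related tags group first, then the programming-related ones, without anyone having declared those categories explicitly.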

RDF Browsing Tool

A new HP report of possible interest: Humboldt: Exploring Linked Data by Georgi Kobilarov and Ian Dickinson (HPL-2008-23)
Abstract: We present Humboldt, a novel user interface for browsing RDF data. Current user interfaces for browsing RDF data are reviewed. We argue that browsing tasks require both a facet browser's ability to select and process groups of resources at a time and a 'resource at a time' browser's ability to navigate anywhere in a dataset. We describe Humboldt, which combines these two features in a single coherent interface. Our approach is based on the operation of pivoting, which enables the user to move the focus of browsing from one set of resources to a set of related resources. With repeated use of the pivot operation the user can browse anywhere in the data. We describe a preliminary evaluation of our approach and discuss its implications for further development. Publication Info: To be presented and published in Linked Data on the Web (LDOW 2008), Beijing, China, April 2008
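The pivot operation the abstract describes is simple to sketch: given a focus set of resources and a property, the new focus becomes the set of resources reachable from the old one via that property. The triples below are hypothetical illustration data.

```python
# toy RDF-like data: (subject, property, object) triples
triples = [
    ("paper1", "author", "alice"),
    ("paper2", "author", "alice"),
    ("paper3", "author", "bob"),
    ("alice",  "affiliation", "hp_labs"),
]

def pivot(focus, prop):
    """Move the browsing focus to resources related via prop."""
    return {o for s, p, o in triples if s in focus and p == prop}

# pivot from a set of papers to their authors, then to affiliations
authors = pivot({"paper1", "paper2", "paper3"}, "author")
labs = pivot(authors, "affiliation")
```

Repeated pivots chain like this, which is how the interface lets the user browse anywhere in the dataset while still operating on whole sets at a time, facet-browser style.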