Friday, April 01, 2005

xSiteable Publishing Framework

Alexander Johannesen writes
I have started some serious work on the xSiteable Publishing Framework which combines a lot of things I've been working on for the last few years. This is a framework that's been brewing for quite some time, and hence I've decided to can the xSiteable project as a stand-alone thing; the good stuff developed for xSiteable version 0.95+ will be part of this new framework. Here is the rundown:
  • Support for DocBook documents, xSiteable templates and content, MARC XML (which is exciting to all the librarians out there!), and Topic maps (using CSXTM).
  • Conversion tool: MARC XML to Topic maps, which I'm personally very excited about. I have basic support already finished.
  • Small-scale cache machine support for PHP (and Java, C# and Python close behind) making it really scalable and flexible.
The most important and exciting part of this framework is that I can now create fully dynamic sites based on sets of MARC XML records. The project that triggered this is a smaller folklore project where I was handed a set of MARC XML records with annotations. Not only can I now display info about this set, but I can also show off the synergetic effects of putting the records into a Topic map instead of a traditional RDBMS. My prototype is looking really good, and I'll share more info here as I go along.

For librarians: If anyone wants to join me in my quest for a MARC XML => Topic maps bridge, please drop me a line. What is needed the most is a MARC-compatible ontology.

A draft MARC XML => Topic Map schema is now available. Alexander is the author of Here is a How to Topic Maps, Sir, which is one of the better introductions to the topic.
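To give a feel for what such a bridge involves, here is a minimal Python sketch. It is not xSiteable code and it does not follow the draft schema; it assumes a simplified in-memory model where topics live in a dict and associations are plain tuples, and it only looks at three common MARC fields (100 $a, 245 $a, 650 $a). The sample record, the function names and the association names ("created-by", "is-about") are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARC XML (MARC 21 "slim" schema).
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical sample record, invented for this sketch.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Doe, Jane</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Folk tales of the north</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Folklore</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Tales</subfield>
  </datafield>
</record>"""


def subfield_values(record, tag, code):
    """Return the text of every matching subfield in one record."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [el.text.strip() for el in record.findall(path, MARC_NS) if el.text]


def record_to_topic_map(record):
    """Turn one MARC XML record into (topics, associations)."""
    topics = {}        # topic id -> {"type": ..., "name": ...}
    associations = []  # (association type, work topic id, other topic id)

    def add_topic(kind, name):
        topic_id = f"{kind}:{name.lower()}"
        topics.setdefault(topic_id, {"type": kind, "name": name})
        return topic_id

    titles = subfield_values(record, "245", "a")
    if not titles:
        return topics, associations
    work = add_topic("work", titles[0])
    for name in subfield_values(record, "100", "a"):
        associations.append(("created-by", work, add_topic("person", name)))
    for name in subfield_values(record, "650", "a"):
        associations.append(("is-about", work, add_topic("subject", name)))
    return topics, associations


topics, associations = record_to_topic_map(ET.fromstring(SAMPLE))
```

Because topic ids are derived from names, two records sharing the subject "Folklore" would point at the same topic, which is the kind of merging that makes a topic map more interesting than one table row per record. A real bridge would need authority control and a proper ontology rather than lowercased strings.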

Thursday, March 31, 2005

Off Topic

This is somewhat of a golden age for comic books. It has been quite some time since I've found so many excellent comics being published. I don't like books, of any type, with too much sex or violence. Sin City and Pulp Fiction are not my cup of tea, no matter how much the critics may like them. Unfortunately, too many in the comic industry, and the entertainment industry in general, think adult means sex and violence. Oh well, there are some excellent exceptions to the trend, and more than in the recent past.
  1. Astonishing X-Men. Wonderful writing and art. Sometimes there are whole pages without any word balloons; the art is enough, and the writer is not so fat-headed that he thinks he should have text in every panel. Has me reading the X-Men for the first time in 20 years.
  2. Soulfire. Beautiful art and an interesting story. A mix of science fiction and fantasy, no guys in tights.
  3. Other World. Again great art and a story that looks intriguing. This time science fiction and mythology mixed. Some interesting characters.
In the recent past there have been the Supergirl issues in Superman/Batman and Fathom: Dawn of War, which might still be on shelves in your local shop. A good time to be reading comics.

Wednesday, March 30, 2005

LC Test of Access Level Records

In November of 2004, the Library of Congress Bibliographic Access Divisions posted information related to efforts to define a new level of cataloging within the MARC/AACR context, called access level. More information on the background and development of the core data set and cataloging guidelines may be found online.

From December 2004 to January 2005, the Library of Congress conducted a test of the proposed access level core data set and cataloging guidelines to determine whether the resulting records would meet the objectives formulated for the project (functionality, cost-efficiency, and conformity with current standards). The link above includes a presentation summarizing the results of the test.

Future plans for implementing Access Level
After evaluating the results of the test, LC has determined that there are substantial cost savings to be derived from access level cataloging, with no appreciable loss of access for searchers. The Bibliographic Access Divisions are proposing to pursue the implementation of access level cataloging, using the following framework to define a preliminary phase to be carried out in the next year:
  1. Continue to apply access level cataloging for non-serial remote access electronic resources (with guideline modifications based on cataloger and reference feedback).
  2. Expand the group of trained catalogers from the five initial testers to include all catalogers trained to work on this category of material.
  3. Solicit feedback on the access level core data set, cataloging guidelines, and future plans, from internal and external constituencies.
  4. Collaborate with the Program for Cooperative Cataloging (see Objective 2.1.2 in the PCC Tactical Objectives).
  5. Distribute the records created as part of the test, as well as for the preliminary phase, via normal record distribution products (Cataloging Distribution Service).
  6. Consider additional tests of the functionality of the access level records in the catalog.
  7. Given the considerable savings derived from doing original cataloging at access level as opposed to adapting copy cataloging records at full level, perform only original cataloging at access level for the preliminary phase; re-assess this decision after one year.
  8. Work with other institutions testing the guidelines and core data set to decide on the optimal record identification indicia (e.g., encoding level, possible use of authentication code).
  9. Consider whether the access level model might also apply to other types of resources (Bibliographic Access Divisions Strategic Plan for 2005-2006, Goal IV, Objective 7).
More information
The results of the access level test at the Library of Congress (PowerPoint presentation), the original project report, and the core data set are available.

Tuesday, March 29, 2005

Automatic Metadata Generation Applications

The Library of Congress announces publication of the final report for the AMeGA (Automatic Metadata Generation Applications) project.

Greenberg, J., Spurgin, K., and Crystal, A. (2005). Final Report for the AMeGA (Automatic Metadata Generation Applications) Project. Submitted to the Library of Congress February 17, 2005.

The final report can also be found on the Library of Congress Web site for the Bicentennial Conference on Bibliographic Control for the New Millennium, which seeks to provide leadership to libraries and other information centers in confronting the challenges of networked resources and the Web.

Dr. Greenberg served as Principal Investigator (PI) for the AMeGA project, a research grant which lasted a full year. AMeGA stands for Automatic Metadata Generation Applications, and the project's goal was to identify and recommend functionalities for applications supporting automatic metadata generation in the library/bibliographic control community. The project was conducted in connection with Section 4.2 of the Library of Congress Bibliographic Control Action Plan. The Action Plan's charge for section 4.2 is to "Develop specifications for a tool that will enable libraries to extract [and harvest] metadata from Web-based resources in order to create catalog records and that will detect and report changes in resource content and bibliographic data in order to maintain those records. Communicate the specifications to the vendor community and encourage their adoption."
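The two capabilities named in the Section 4.2 charge, extracting metadata from a Web resource and detecting that the resource has changed, can be illustrated with a small Python sketch. This is not taken from the AMeGA report and is not a recommended design; it assumes the page carries its metadata in the title element and in meta name/content pairs, and it treats any change in the page text as a change worth reporting. The class name, function names and sample page are hypothetical.

```python
import hashlib
from html.parser import HTMLParser


class MetadataExtractor(HTMLParser):
    """Collects the page title and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = attributes.get("name")
            content = attributes.get("content")
            if name and content is not None:
                self.metadata[name.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
            self.metadata["title"] = "".join(self._title_parts).strip()

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)


def extract_metadata(html_text):
    """Return a dict of metadata found in the HTML text."""
    parser = MetadataExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.metadata


def fingerprint(html_text):
    """Hash of the page text; a different hash means the page changed."""
    return hashlib.sha256(html_text.encode("utf-8")).hexdigest()


# Hypothetical sample page, invented for this sketch.
PAGE = """<html><head>
<title>Folklore Collection</title>
<meta name="DC.creator" content="Doe, Jane">
<meta name="keywords" content="folklore, tales">
</head><body><p>Hello</p></body></html>"""

record = extract_metadata(PAGE)
changed = fingerprint(PAGE) != fingerprint(PAGE.replace("Hello", "Goodbye"))
```

A cataloging tool would store the fingerprint alongside the catalog record and recheck it on a schedule. Hashing the whole page is crude, since even a changed advertisement would trigger a report, which is one reason the harder parts of this problem were judged worth a year-long study.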

The AMeGA research project pursued three main goals:
  1. Evaluate the current automatic metadata generation applications (in the following categories: document presentation software, tools created specifically for metadata generation, and online library cataloging modules for creating metadata);
  2. Survey metadata professionals to get a consensus on which aspects of metadata generation are most amenable to automation and semi-automation; and
  3. Compile a final report of recommended functionalities for automatic metadata generation applications. The final report was reviewed and endorsed by the Metadata Generation Task Force (MGTF).
The report acknowledges the MGTF members for their participation and expert advice. In addition, because the final report was based partially on survey data gathered from participants recruited via a number of listservs, Dr. Greenberg also expressed her gratitude to the survey participants for the quality and depth of their responses.

To find out more about the AMeGA project, please go to the AMeGA Project Web site.

Posted to AUTOCAT by John D. Byrum, Jr.

Monday, March 28, 2005

TLA Talk

I've started a FURL topic for my upcoming TLA Talk. I don't give many talks, so I'm starting to get excited.

NAICS Classification Revision

The North American Industry Classification System is being revised. OMB seeks comment on the advisability of revising NAICS to incorporate the changes published in the notice. Please submit comments as soon as possible, but no later than June 9, 2005.


Kevin points out in a comment on the Friday posting about Planet that there is at least one library-related planet, code4lib. Since so many folks only get the e-mail or RSS feed and would miss the comment, I'm posting it here.

LCCN Sorting

sortLC for sorting LC call numbers. The sortLC Perl utilities are designed for sorting a list of Library of Congress (LC) call numbers. LC call numbers are based on the Library of Congress Classification system. This is an initial beta effort, not a finished product. Results may vary. Feedback is appreciated.
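Why call numbers need a special sort at all is easy to show. The sketch below is in Python and is not the sortLC algorithm (sortLC is a set of Perl utilities, and I have not reproduced its rules); it assumes a simplified call number shape of class letters, an optional class number, cutters, and an optional trailing four-digit year. The function name and the sample shelf list are hypothetical.

```python
import re

# Class letters, optional class number, then everything else (cutters, year).
CALL_NUMBER = re.compile(r"^\s*([A-Z]{1,3})\s*(\d+(?:\.\d+)?)?\s*(.*)$")
CUTTER = re.compile(r"([A-Z]+)(\d+)")
YEAR = re.compile(r"\b(\d{4})\s*$")


def lc_sort_key(call_number):
    """Build a tuple that sorts call numbers in rough shelf order."""
    text = call_number.upper()
    match = CALL_NUMBER.match(text)
    if match is None:
        # Unparseable input sorts first, ordered by its raw text.
        return ("", 0.0, (), 0, text)
    letters, number, rest = match.groups()
    # The class number is a true number: 9 comes before 76.
    class_number = float(number) if number else 0.0
    # Cutter digits are kept as strings so they compare like decimals:
    # "D3" sorts before "D25" would not hold, "D25" sorts before "D3".
    cutters = tuple(CUTTER.findall(rest))
    year_match = YEAR.search(rest)
    year = int(year_match.group(1)) if year_match else 0
    return (letters, class_number, cutters, year, text)


# Hypothetical shelf list, invented for this sketch.
SHELF = [
    "QA9 .B5",
    "QA76.9.D3",
    "QA76.73.P98 L45 2005",
    "Q175 .K95",
    "QA76.73.P98 L45 1999",
]

ordered = sorted(SHELF, key=lc_sort_key)
naive = sorted(SHELF)
```

A plain string sort puts QA76 ahead of QA9 because it compares character by character, while the key above compares the class number as a number. Real call numbers have many more wrinkles (volume numbers, dates inside the class portion, local suffixes), so treat this as a demonstration of the problem rather than a replacement for a tested utility.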

Free Comic Book Day

40 days until Free Comic Book Day. Free Comic Book Day will take place on May 7, 2005. Libraries should contact their local comic book shop to see if they can somehow participate.