Metadata extraction is a crucial module for domain-specific Web content discovery and management, because the accuracy and completeness of the extracted metadata directly affect the quality of subsequent domain information services. Our Online Course Organization project aims to build an online course portal that serves course information obtained from the Web. Since most course pages are irregularly structured, most existing approaches are not effective for extracting course metadata. In this paper, we propose a novel hierarchical clustering approach that generates a web page semantic structure model, called the Logical Structure Model, from the DOM tree, so that hidden patterns and knowledge can be revealed and used to help identify course metadata. The experimental results show that our solution achieves effective metadata extraction.
Wednesday, November 26, 2008
Effective Metadata Extraction from Irregularly Structured Web Content by Baoyao Zhou, Wei Liu, Yu Yang, Weichun Wang, Ming Zhang (HPL-2008-203)
Now available: The Liblog Landscape 2007-2008 by Walt Crawford.
Liblogs (blogs written by library people, as opposed to official library blogs) provide some of today's most interesting and useful library literature. This book offers a broad look at English-language liblogs as they are and as they've changed between 2007 and 2008. The book covers more than 600 blogs, with detailed analysis of 27 metrics for 2007 and 2008 and of changes from 2007 to 2008 (and, for 143 of them, 2006 as well). Through tables, charts and text, we explore the liblog landscape.
MODS users are collecting examples of tools that use MODS. One example is Tellico.
Tellico is a KDE application for organizing your collections. It provides default templates for books, bibliographies, videos, music, video games, coins, stamps, trading cards, comic books, and wines. Tellico allows you to enter your collection in a catalogue database, saving many different properties like title, author, etc. Two different views of your collection are shown. On the left, your entries are grouped together by any field you like, allowing you to see how many are in each group. On the right, selected fields are shown in column format, allowing you to sort by any field. At the bottom is a customizable HTML view of the current entry. The entry editor is a dialog box where you enter the data.
Tuesday, November 25, 2008
LibLime has announced the beta test of a suite of cataloging tools, ‡biblios.net.
‡biblios.net is a subscription-based, hosted version of the open-source ‡biblios metadata editor that we released earlier this year. In addition to the editor, ‡biblios.net includes some extended community features such as integrated real-time chat, forums, and private messaging.
‡biblios.net also provides access to the world's largest database of freely licensed library records. The database will be freely available to ‡biblios.net subscribers and non-subscribers alike via Z39.50, OAI, and direct download.
Furthermore, the database itself will be maintained by ‡biblios.net users, similar to the way that Wikipedia's database is maintained by users.
We're now looking for enthusiastic participants to help shape the final production release of ‡biblios.net. Ways you can help:
Become a beta tester for the ‡biblios.net platform by filling out the beta tester application form.
Donate your records to ‡biblios.net. Upload records to http://archive.org, and drop us an email at 'info AT liblime DOT com'.
Get involved in the ‡biblios open-source community: get your copy of ‡biblios and join the development team at http://biblios.org.
An aside: wouldn't it make more sense in that first paragraph to link "‡biblios metadata editor" rather than "released earlier this year"? Links are a form of mark-up, and clean mark-up matters.