The VIAF dataset is now available for public consumption! http://viaf.org/viaf/data describes and links to the files involved and describes how we expect the ODC-By license to be applied. We are not sure just how popular the files will be, so if the site appears slow, please stop downloading and come back later. From my machine here at OCLC my browser is estimating 20-30 minutes to download the larger files; from my home it was double that.
Friday, May 04, 2012
Thom Hickey has details about the Virtual International Authority File being publicly available on his Outgoing weblog.
Thursday, May 03, 2012
Adding semantic meaning to text can only help our users. High-recall extraction of acronym-definition pairs with relevance feedback by Anna Yarygina and Natalia Vassilieva has been published by HP Laboratories as HPL-2012-46.
This paper addresses the problem of extracting acronyms and their definitions from large documents in a setting, when high recall is required and user feedback is available. We propose a three step approach to deal with the problem. First, acronym candidates are extracted using a weak regular expression. This step results in a list of acronyms with high recall but low precision rates. Second, definitions are constructed for every acronym candidate from its surrounding text. And last, a classifier is used to select genuine acronym-definition pairs. At the last step we use relevance feedback mechanism to tune the classifier model for every particular document. This allows achieving reasonable precision without losing recall. As opposed to existing approaches, either created to be generic and domain independent or tuned to one particular domain, our method is adaptive to an input document. We evaluate the proposed approach using three datasets from different domains. The experiments prove the validity of the presented ideas.
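For readers who want a feel for the three steps in that abstract, here is a rough Python sketch. It is not the authors' method: the regular expression, the function names, and the sample sentence are all my own assumptions, and the third step uses a simple initial-letter check as a stand-in for the trained classifier and relevance feedback the paper describes.

```python
import re

# Step 1 (assumed pattern): a deliberately weak regex. It accepts any short
# token that starts with a capital letter, so recall is high and precision low.
CANDIDATE_RE = re.compile(r"\b[A-Z][A-Za-z0-9]{1,9}\b")


def find_candidates(text):
    """Return (token, offset) for tokens with at least two capital letters."""
    return [
        (m.group(), m.start())
        for m in CANDIDATE_RE.finditer(text)
        if sum(c.isupper() for c in m.group()) >= 2
    ]


def build_definition(text, start, acronym):
    """Step 2: take as many preceding words as the acronym has characters."""
    words = re.findall(r"[A-Za-z]+", text[:start])
    return words[-len(acronym):]


def looks_genuine(acronym, definition_words):
    """Step 3 stand-in: keep the pair only if word initials spell the acronym.

    The paper uses a classifier tuned per document with user feedback;
    this heuristic is only a placeholder for that step.
    """
    initials = "".join(w[0] for w in definition_words).upper()
    return initials == acronym.upper()


def extract_pairs(text):
    """Run the three steps and return (acronym, definition) pairs."""
    pairs = []
    for acronym, start in find_candidates(text):
        words = build_definition(text, start, acronym)
        if looks_genuine(acronym, words):
            pairs.append((acronym, " ".join(words)))
    return pairs


if __name__ == "__main__":
    sample = "The Virtual International Authority File (VIAF) is hosted by OCLC."
    print(extract_pairs(sample))
```

On the sample sentence, OCLC is picked up as a candidate in step 1 but dropped in step 3 because the words before it do not spell it out. The initial-letter check also misses definitions containing small words, such as National Library of Medicine (NLM), which is the kind of case a tuned classifier is meant to handle.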
at 9:47 AM
Wednesday, May 02, 2012
Just published: Guidelines for Subject Access in National Bibliographies, IFLA Series on Bibliographic Control 45.
In a networked and globalized world of information the form of national bibliographies may have changed, however their major function remains unchanged: to inform about a country’s publication landscape, its cultural and intellectual heritage. Subject access offers a major route into this landscape providing information about the dispersion of publications in specific fields of knowledge and topics contained in a particular national publishing output. The Guidelines for Subject Access in National Bibliographies give graded recommendations concerning subject indexing policies for national bibliographic agencies and illustrate various policies by providing best practice examples.
Thanks to Gary Price of INFOdocket for letting me know about this.
at 5:09 PM
Monday, April 30, 2012
The online National Library of Medicine Classification, available at http://www.nlm.nih.gov/class/, has been issued in a newly revised edition as of April 30, 2012.
Summary Statistics for the 2012 NLM Classification
- 37 class numbers added
- 178 class number captions or notes modified
- 4 class numbers canceled
- 2 Table G numbers (Geographic notation) added for South Korea (JK6) and North Korea (JK7); Korea was moved to the Historical Geographic Locations section retaining the same Table G (JK6)
- 88 index main headings added
- 578 index entries modified
- 218 index headings deleted
at 1:49 PM