The ParsCit team has also been updating the ParsCit package and is happy to announce a new version that improves classification accuracy, especially for general science journals. This version also adds a module that further processes the XML output of the commercial OmniPage OCR engine. It also benefits from a number of user-contributed fixes and training data, including separation of volume and issue numbers for journals and export of parsed reference strings to EndNote, MODS, BibTeX or other metadata formats via the BiblioScript library.
You can either download a copy of ParsCit for your own use or use it through a web services interface. We welcome your feedback, and we hope that if you use ParsCit or any other freely available reference string parsing tool, you will contribute annotated data to help make these models more robust.
ParsCit and its online demos are available from:
ParsCit is open source software used by many projects worldwide, not only in experimental, research and academic settings but in commercial enterprises as well. Mendeley uses ParsCit to parse references from contributed papers, as does the Citations in Economics (CitEc) project.
Tuesday, June 21, 2011
Seen on Code4Lib.