Tuesday, June 21, 2011

ParsCit Updated

The ParsCit team has also been updating the ParsCit package and is happy to announce a new version that improves classification accuracy, especially for general science journals. This version adds a module that further processes the XML files output by the commercial OmniPage OCR engine. It also benefits from a number of user-contributed fixes and training data, including separation of volume and issue numbers for journals, and export of parsed reference strings into EndNote, MODS, BibTeX or other metadata formats via the BiblioScript library.

You can either download a copy of ParsCit for your own use or use it through a web services interface. We welcome your feedback, and we hope that if you use ParsCit or any other freely available reference string parsing tool, you will contribute annotated data to help make these models more robust.

ParsCit (and its online demos) are available from:

ParsCit is open source software used by many projects worldwide, not only in experimental, research and academic settings but in commercial enterprises as well. Mendeley is using ParsCit to parse references from contributed papers, as is the Citations in Economics (CitEc) project.

Bruce said...

Do you have insight into this rather odd note on the source code - "We have also pushed a copy of the ParsCit current distribution into GitHub. While we'll strive to keep the GitHub version as updated as possible, the versions on this page will remain the most authoritative for major updates."?

Seems to me quite backward.