Wednesday, July 10, 2002

Metadata Extraction

Parser::Citation is a Perl module for extracting reference metadata from scholarly eprint papers.
Currently, Citation.pm attempts to parse the following metadata from references to other journal papers (it is not good at parsing references to books, conference proceedings, theses, etc.):
  • names of the authors
  • name of the first author
  • journal title
  • volume
  • issue or supplement
  • start page
  • year
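The fields listed above map naturally onto named groups in a regular expression. The following is a minimal Python sketch, not Parser::Citation's actual Perl code. It assumes one common layout, `Authors, Journal Volume(Issue), Page (Year)`, and the names `REF_RE`, `JournalRef` and `parse_reference` are hypothetical, chosen for illustration. References that do not fit the layout, such as books or theses, simply fail to match.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical pattern for ONE assumed layout:
#   "Authors, Journal Volume(Issue), Page (Year)"
REF_RE = re.compile(
    r"(?P<authors>.+?),\s+"                # author list, lazily up to the journal
    r"(?P<journal>[A-Z][A-Za-z. ]*?)\s+"   # journal abbreviation, e.g. "Phys. Rev. D"
    r"(?P<volume>\d+)"                     # volume number
    r"(?:\((?P<issue>[^)]+)\))?,\s*"       # optional issue/supplement in parentheses
    r"(?P<page>[A-Za-z]?\d+)\s*"           # start page, allowing letter pages like L12
    r"\((?P<year>\d{4})\)"                 # four-digit year in parentheses
)


@dataclass
class JournalRef:
    authors: List[str]
    first_author: str
    journal: str
    volume: str
    issue: Optional[str]
    page: str
    year: str


def parse_reference(text: str) -> Optional[JournalRef]:
    """Return the extracted fields, or None if the reference is not in the assumed layout."""
    m = REF_RE.match(text.strip())
    if m is None:
        return None  # e.g. a book or thesis reference
    # Split "A, B, and C" / "A and B" into individual author names.
    authors = [a for a in re.split(r",?\s+and\s+|,\s*", m["authors"]) if a]
    return JournalRef(
        authors=authors,
        first_author=authors[0],
        journal=m["journal"].strip(),
        volume=m["volume"],
        issue=m["issue"],
        page=m["page"],
        year=m["year"],
    )
```

Real reference strings vary much more than this single pattern allows (e.g. "et al.", page ranges, supplement markers), so a practical parser would need many such patterns.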

    Sometimes the title of the referenced paper is also extracted if it is in an easy-to-recognise form (e.g. enclosed in double quotes). These data are sufficient to identify a journal paper uniquely for reference linking purposes.
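The quoted-title case and the idea of a linking key can be sketched the same way. In this hypothetical Python example, `extract_title` picks up a title enclosed in straight or curly double quotes, and `link_key` combines the journal abbreviation, volume, start page and year into one string that could serve as a lookup key. The normalisation rule (lower-casing and dropping punctuation and spaces) is an assumption for illustration, not the scheme used by Citation.pm or arXiv.org.

```python
import re
from typing import Optional

# Title between straight ("...") or curly (\u201c...\u201d) double quotes.
TITLE_RE = re.compile(r'["\u201c]([^"\u201d]+)["\u201d]')


def extract_title(reference: str) -> Optional[str]:
    """Return a double-quoted title if present, without trailing punctuation."""
    m = TITLE_RE.search(reference)
    return m.group(1).strip(" ,.") if m else None


def link_key(journal: str, volume: str, page: str, year: str) -> str:
    """Hypothetical linking key: normalised journal + volume + start page + year."""
    # Drop case, spaces and punctuation so "Phys. Rev. D" and "Phys.Rev.D" agree.
    j = re.sub(r"[^a-z0-9]", "", journal.lower())
    return f"{j}:{volume}:{page}:{year}"
```

For example, `link_key("Phys. Rev. D", "65", "123", "2002")` yields `"physrevd:65:123:2002"`; the title is not needed for the key, which matches the point that journal, volume, page and year alone identify the paper.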

  • That last line is interesting: they have successfully linked references to arXiv.org.