Friday, February 01, 2008

Capturing Government Documents

Managing Web Harvested Content: Results from the EPA Harvesting Pilot Project describes the results of a crawl of the EPA site by the GPO. They have questions about their use of PURLs, whether to keep local copies of the harvested items, and the level of bibliographic detail considered useful. Comments are accepted through Feb. 8.
LSCM believes that providing access to the monographs and serials harvested as part of the EPA Pilot Project via the CGP best serves the needs of the depository community and the general public. As can be seen from the sample of 300 publications, making the content from the EPA Pilot Project accessible to the public is a multi-step process and involves the commitment of a significant amount of time. However, as staff become more familiar with the new brief bibliographic record format, the time required to create one of these records will decrease. The identification of complete publications, the identification of all the parts or issues of a title scattered within the results of the harvest, and the de-duplication of the contents will continue to require a significant amount of time and staff to complete.

Additionally, 1,000 monographs within scope of the FDLP have been identified from the EPA Pilot Project for inclusion in the Automated Metadata Extraction Project. This is a two-year project with the Defense Technical Information Center (DTIC) and Old Dominion University (ODU) to use automated metadata extraction software tools to create metadata for groups of electronic publications in GPO’s electronic collection. The results are not expected until near the end of the project.
