Thursday, December 06, 2007

Unicode and MARC

News from LC.
The revised character set specifications are now posted on the MARC site. They take into account the use of the full Unicode repertoire, as opposed to only the MARC-8 subset of Unicode, and also include the lossless and lossy techniques for converting full Unicode to the MARC-8 repertoire that were approved this year.
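
To make the difference between the two conversion techniques concrete, here is a minimal Python sketch. It rests on assumptions rather than on the text of the specification: that the lossless technique replaces an unmappable character with a hexadecimal numeric character reference of the form &#xXXXX;, and that the lossy technique substitutes a single placeholder character, taken here to be the vertical bar. The function names are hypothetical, and the repertoire test is an ASCII-only stand-in, not the real MARC-8 code tables.

```python
# Illustrative sketch only. The repertoire check is an ASCII-only stand-in;
# the real MARC-8 repertoire is defined by the LC code tables.
def in_marc8_repertoire(ch: str) -> bool:
    """Hypothetical membership test (assumption: ASCII only)."""
    return ord(ch) < 0x80


def to_marc8_lossless(text: str) -> str:
    """Replace each unmappable character with a hex numeric character
    reference (assumed form: &#xXXXX;), so the original is recoverable."""
    return "".join(
        ch if in_marc8_repertoire(ch) else f"&#x{ord(ch):04X};"
        for ch in text
    )


def to_marc8_lossy(text: str, placeholder: str = "|") -> str:
    """Replace each unmappable character with one placeholder character
    (assumed to be the vertical bar); the original cannot be recovered."""
    return "".join(
        ch if in_marc8_repertoire(ch) else placeholder
        for ch in text
    )


if __name__ == "__main__":
    sample = "snow\u2603man"  # U+2603 is outside the stand-in repertoire
    print(to_marc8_lossless(sample))
    print(to_marc8_lossy(sample))
```

The lossless form keeps the code point in the output, so a later conversion back to Unicode can restore the character; the lossy form keeps record lengths predictable but discards that information.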

The MARC-8 specifications are still part of the document, and the MARC-8 character code tables and mappings have some improved formatting, but no changes have been made to the MARC-8 to Unicode character set mappings. The XML (all MARC-8 repertoire) and comma-delimited (East Asian MARC-8 only) files are still downloadable, but we plan to improve the XML file in the near future. We are interested to know whether the comma-delimited file is used, as we may only need to offer the XML file for download.

