Showing posts with label XML. Show all posts

Wednesday, August 22, 2012

JATS: Journal Article Tag Suite

NISO has published the JATS: Journal Article Tag Suite standard.
The National Information Standards Organization (NISO) announces the publication of a new American National Standard, JATS: Journal Article Tag Suite, ANSI/NISO Z39.96-2012. JATS provides a common XML format in which publishers and archives can exchange journal content by preserving the intellectual content of journals independent of the form in which that content was originally delivered. In addition to the element and attribute descriptions, three journal article tag sets (the Archiving and Interchange Tag Set, the Journal Publishing Tag Set, and the Article Authoring Tag Set) are part of the standard. While designed to describe the textual and graphical content of journal articles, it can also be used for some other materials, such as letters, editorials, and book and product reviews....

The JATS standard is available as both an online XML document and a freely downloadable PDF from the NISO website ( Supporting documentation and schemas in DTD, RELAX NG, and W3C Schema formats are available at:

Tuesday, January 31, 2012

MARCXML to MODS 3.4 XSLT (Revision 1.75)

The revised version of MARCXML to MODS 3.4 XSLT has been announced.
The Library of Congress' MARCXML to MODS 3.4 XSLT stylesheet (Revision 1.75) is now available--it incorporates edits made in response to comments received since the release of Revision 1.74.

The MODS 3.4 XSLT is based on the MARC to MODS 3.4 mapping made available by the Library of Congress in July of 2010. The mapping and the XSLT are also available via the Library of Congress' MODS Web site. They will be revised periodically as users' comments are received and as subsequent MODS Editorial Committee analysis and decisions evolve.
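
To give a feel for what such a conversion does, here is a minimal sketch in Python rather than XSLT. It is not the Library of Congress stylesheet. It copies only the 245 $a title into a MODS titleInfo/title element, and the sample record, the function name `marc_title_to_mods`, and the simple trailing-punctuation trim are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
MODS_NS = "http://www.loc.gov/mods/v3"

# Hypothetical sample record, invented for this sketch.
MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Example title /</subfield>
    <subfield code="c">Example Author.</subfield>
  </datafield>
</record>"""


def marc_title_to_mods(marcxml: str) -> ET.Element:
    """Build a MODS element holding only the title taken from MARC 245 $a."""
    record = ET.fromstring(marcxml)
    title_text = record.findtext(
        f"{{{MARC_NS}}}datafield[@tag='245']/{{{MARC_NS}}}subfield[@code='a']",
        default="",
    )
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    title = ET.SubElement(title_info, f"{{{MODS_NS}}}title")
    # Simplification for this sketch: trim trailing ISBD punctuation.
    title.text = title_text.rstrip(" /:")
    return mods


ET.register_namespace("mods", MODS_NS)
print(ET.tostring(marc_title_to_mods(MARCXML), encoding="unicode"))
```

The real mapping covers far more fields and subfields than this; the sketch only shows the general shape of a field-to-element transformation.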

Friday, April 02, 2010

The Variations/FRBR project at Indiana University has announced the release of an initial set of XML Schemas for the encoding of FRBRized bibliographic data.
The Variations/FRBR project aims to provide a concrete testbed for the FRBR conceptual model, and these XML Schemas represent one step towards that goal by prescribing a concrete data format that instantiates the conceptual model. Our project has been watching recent work to represent the FRBR-based Resource Description and Access (RDA) element vocabulary in RDF; however, due to the fact that this work represents RDA data rather than FRBR data directly, and that much metadata work in libraries currently (though perhaps not permanently) operates in an XML rather than an RDF environment, we concluded an XML-based format for FRBR data directly was needed at this time. We view XML conforming to these Schemas to be one possible external representation of FRBRized data, and will be exploring other representations (including RDF) in the future. We define "implementing FRBR" as the conceptual models described in the companion FRBR and FRAD reports; at this time we are not actively working on the model defined in the draft FRSAD report. Perhaps the most notable feature of the Variations/FRBR XML Schemas is their existence at three "levels": frbr, which embodies faithfully only those features defined by the FRBR and FRAD reports; efrbr, which adds additional features we hope will make the data format more "useful"; and vfrbr, which both contracts and extends the FRBR and FRAD models to create a data representation optimized for the description of musical materials and we hope provides a model for other domain-specific applications of FRBR.

Friday, December 18, 2009


XForms4lib is a new e-mail list for discussing the use of W3C XForms in connection with library metadata.

Monday, April 06, 2009

Date and Time Format

Work being done at LC.
The Library of Congress, in conjunction with several partners, has initiated an effort to develop a simple XML date/time format that can be referenced by XML schemas. In the MODS schema, for example, date formats are restricted to 'w3cdtf' (defined as the W3C Date Time Format Note which is a profile of ISO 8601), 'iso8601' (defined as the alternative in ISO 8601 "basic" that specifies the form YYYYMMDD, etc. rather than the form with hyphens), and 'marc' (defined as the conventions used in the MARC 008/07-14 character positions). None of these really meets the requirements of a date time format for these schemas.

Please see which provides a rationale for this work and the requirements that it addresses. We would be pleased to hear of additional requirements or any other comments or suggestions.
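
As a small illustration of two of the encodings mentioned above, the following Python sketch parses a full calendar date in the hyphenated 'w3cdtf' form and in the 'iso8601' basic form. The encoding labels come from the announcement; the `parse_date` function and the format strings are assumptions of this sketch, and it handles complete dates only, not the partial or time-bearing values the formats also allow.

```python
from datetime import date, datetime

# Encoding labels are taken from the MODS description above; the format
# strings are this sketch's assumption and cover full calendar dates only.
FORMATS = {
    "w3cdtf": "%Y-%m-%d",  # hyphenated form, e.g. 2009-04-06
    "iso8601": "%Y%m%d",   # "basic" form without hyphens, e.g. 20090406
}


def parse_date(value: str, encoding: str) -> date:
    """Parse a full date string according to the named encoding."""
    return datetime.strptime(value, FORMATS[encoding]).date()


print(parse_date("2009-04-06", "w3cdtf"))
print(parse_date("20090406", "iso8601"))
```

Both calls describe the same day. Passing a value in one form with the other encoding's label raises a ValueError, which is the kind of strictness a schema-level date type would need to spell out.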

Wednesday, March 04, 2009

eXtensible Text Framework

News from the California Digital Library about a new tool.
The California Digital Library (CDL) is pleased to announce the availability of an extensive self-guided tutorial for its eXtensible Text Framework (XTF) application. XTF is an open source, highly customizable piece of software supporting the search, browse, and display of heterogeneous digital content and offering efficient and practical methods for creating customized end-user interfaces for distinct digital collections. The tutorial provides guidance for implementing and customizing XTF, from core functionality to overall look and feel. Downloads for the Mac and Windows operating systems are available from the XTF Project page on SourceForge along with the complete distribution and documentation.

The tutorial comes with a complete XTF package that is ready to run when uncompressed; no other installation is required. It contains nine modules spanning the most powerful and popular features, including how to:

  • Add new content
  • Change metadata
  • Change logo and colors
  • Increase significance of titles in ranking hits
  • Customize and enable default status of advanced search
  • Change fields displayed in search results
  • Enable structural searching
  • Create a hierarchical facet
  • Change footnote behavior

Monday, November 17, 2008


Adding semantic mark-up to text is something the cataloger in me always finds good. Microformats, XML, or RDF all make searches more precise. Lemon8-XML is a tool to change scholarly papers in MS Word or OpenOffice formats into XML. Sweet idea.
Lemon8-XML is a web-based application designed to make it easier for non-technical editors and authors to convert scholarly papers from typical word-processor editing formats such as MS-Word .DOC and OpenOffice .ODT, into publishing layout formats such as the open, industry-standard NLM Journal Publishing XML format.

To use Lemon8-XML, you don't need to understand XML, all you need is a little time and a general understanding of how scholarly articles are structured. In general, this means a document with:

  1. some information about the article and authors at the top
  2. usually an abstract
  3. several sections, often titled "introduction", "methods", "results", etc.
  4. optional figures or tables, either in-text or as appendices
  5. a list of references or citations in a standardized format (e.g. MLA, APA, etc.)
It is from the Public Knowledge Project.

Friday, October 17, 2008


The technical specification RDFa in XHTML Syntax and Processing was formally accepted as a Web Consortium Technical Recommendation by W3C Director Tim Berners-Lee.
The current Web is primarily made up of an enormous number of documents that have been created using HTML. These documents contain significant amounts of structured data, which is largely unavailable to tools and applications. When publishers can express this data more completely, and when tools can read it, a new world of user functionality becomes available, letting users transfer structured data between applications and web sites, and allowing browsing applications to improve the user experience: an event on a web page can be directly imported into a user's desktop calendar; a license on a document can be detected so that users can be informed of their rights automatically; a photo's creator, camera setting information, resolution, location and topic can be published as easily as the original photo itself, enabling structured search and sharing.

RDFa is a specification for attributes to express structured data in any markup language. This document specifies how to use RDFa with XHTML.
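
To make the idea of attribute-based structured data concrete, here is a rough Python sketch that walks a small XHTML fragment and collects the values attached to `property` attributes. It is not a conforming RDFa processor: it ignores CURIE expansion, `rel`, `rev`, `typeof`, datatypes, and most of the processing rules. The sample markup and the function name `extract_properties` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment, invented for this sketch.
XHTML = """<div xmlns="http://www.w3.org/1999/xhtml"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     about="/photos/sunset.jpg">
  <span property="dc:creator">Jane Doe</span>
  <span property="dc:date" content="2008-10-17">October 17, 2008</span>
</div>"""


def extract_properties(xhtml: str):
    """Return (subject, property, value) tuples for elements with @property."""
    triples = []

    def walk(element, subject):
        # An @about attribute sets the subject for this element and its children.
        subject = element.get("about", subject)
        prop = element.get("property")
        if prop is not None:
            # @content, when present, takes the place of the element text.
            value = element.get("content")
            if value is None:
                value = "".join(element.itertext()).strip()
            triples.append((subject, prop, value))
        for child in element:
            walk(child, subject)

    walk(ET.fromstring(xhtml), "")
    return triples


for triple in extract_properties(XHTML):
    print(triple)
```

The point is only that the human-readable page and the machine-readable statements live in the same markup.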

Wednesday, October 08, 2008


The latest issue of Nodalities has an interesting article, Anatomy Of A SearchMonkey by Peter Mika. It is a run-down of Yahoo's new Semantic Web search platform. The part that interested me was a flavor of Atom, DataRSS.
These considerations led to the development of DataRSS, an extension of Atom for carrying structured data as part of feeds. A standard based on Atom immediately opens up the option of submitting metadata as a feed. Atom is an XML-based format which can be both input and output of XML transformation. The extension provides the data itself as well as metadata such as which application generated the data and when it was last updated.

Tuesday, September 02, 2008

Dublin Core in XML

The Dublin Core folks are looking for comments.
"Expressing Dublin Core description sets using XML (DC-DS-XML)" by Pete Johnston and Andy Powell has been published as a DCMI Proposed Recommendation for public comment from 1 to 29 September 2008. A related document, "Notes on the DC-DS-XML XML Format", describes the development of the format and its relationship to the DCMI Recommendation "Guidelines for implementing Dublin Core in XML" of April 2003. The Proposed Recommendation supports the W3C specification Gleaning Resource Descriptions from Dialects of Languages (GRDDL) in the form of an XSLT transform for extracting RDF triples from instances of metadata in the DC-DS-XML format. The specification includes 21 examples together with their equivalent representations in the DC-Text and RDF/XML syntaxes. A W3C XML Schema for the DC-DS-XML format is provided. Interested members of the public are invited to post comments to the DC-ARCHITECTURE mailing list, including [Public Comment] in the subject line.

Monday, May 19, 2008

XML Workshop

A couple of years ago I had the pleasure of taking the XML workshop offered by Eric Lease Morgan. One of the best workshops I've experienced. Now the notes have been revised and are available online.
XML is about distributing data and information unambiguously. Through this hands-on workshop you will learn: 1) what XML is, and 2) how it can be used to build library collections and facilitate library services in our globally networked environment.
  • An introduction to XML
  • Activity - Beyond MARC
  • Indexes make search easier
  • Activity - Indexing/searching MODS
  • Activity - Writing XML
  • Flavors of XML
  • Activity - Writing XML, redux
  • Activity - Full-text indexes
  • Client/server computing
  • Databases for data storage and maintenance
  • OAI-PMH - a de-centralized OCLC
  • Activity - Being an OAI service provider
  • Activity - Being an OAI data repository
  • Web Services
  • Activity - Creating a "mash-up"
  • Workshop summary
  • External links

Thursday, September 13, 2007

W3C Completes Bridge Between HTML/Microformats and Semantic Web

Big news from the W3C, GRDDL.
Today, the World Wide Web Consortium completed an important link between Semantic Web and microformats communities. With "Gleaning Resource Descriptions from Dialects of Languages", or GRDDL (pronounced "griddle"), software can automatically extract information from structured Web pages to make it part of the Semantic Web. Those accustomed to expressing structured data with microformats in XHTML can thus increase the value of their existing data by porting it to the Semantic Web, at very low cost.

"Sometimes one line of code can make a world of difference," said Tim Berners-Lee, W3C Director. "Just as stylesheets make Web pages more readable to people, GRDDL makes Web pages, microformat tags, XML documents, and data more readable to Semantic Web applications, opening more data to new possibilities and creative reuse."

Tuesday, September 04, 2007

Telescope Metadata

More and more people are getting into the metadata game. Here is a proposed XML metadata schema for telescopes.
Earlier I described my idea for an RSS-like XML feed for telescopes. The idea was to allow anyone to keep up with what particular telescopes were doing. In this post I will try to describe my current idea.
He is looking for comments.

Tuesday, March 27, 2007


Schematron is an XML schema language that sounds better than what we have been using for our data. It allows for data validation. O'Reilly has a paid downloadable PDF on Schematron.
Schematron is a rule-based XML schema language, offering flexibility and power that W3C XML schema, RELAX NG, and DTDs simply can't match.

You need Schematron and can't settle for other languages if you have to check rules that go beyond checking the document structures (i.e., checking that an element bar is included in element foo) and their datatypes. Schematron is the right tool for checking conditions such as "startDate is earlier than or equal to endDate."

Schematron is also the right tool to use if you have to raise user-friendly error messages rather than depend on error messages that are generated by a schema processor and that are often obscure. Schematron builds on XPath. You will need to understand XPath to get the most from Schematron.
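
The "startDate is earlier than or equal to endDate" condition is the kind of cross-field rule a grammar-based schema cannot state. The Python sketch below is not Schematron; it hand-codes that one rule with the standard library to show what such a check involves and what a user-friendly message might look like. The `event` wrapper element, its `id` attribute, and the function name `check_date_order` are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical sample document, invented for this sketch.
SAMPLE = """<events>
  <event id="a"><startDate>2007-03-01</startDate><endDate>2007-03-27</endDate></event>
  <event id="b"><startDate>2007-04-10</startDate><endDate>2007-04-01</endDate></event>
</events>"""


def check_date_order(xml_text: str):
    """Return a readable message for every event whose dates are out of order."""
    errors = []
    for event in ET.fromstring(xml_text).iter("event"):
        # Assumes both child elements are present and hold YYYY-MM-DD dates.
        start = date.fromisoformat(event.findtext("startDate"))
        end = date.fromisoformat(event.findtext("endDate"))
        if start > end:
            errors.append(
                f"Event '{event.get('id')}': startDate {start} "
                f"is later than endDate {end}."
            )
    return errors


for message in check_date_order(SAMPLE):
    print(message)
```

In Schematron the same rule would be written declaratively as an XPath test with an attached message, which is why knowing XPath matters.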

Friday, February 09, 2007

Yahoo Pipes

Yahoo Pipes looks interesting. No time to play with it now, but how about running the RSS feeds from our catalogs through Flickr, Google Maps or .... Dead simple mash-ups. Accepts XML, so it is not limited to RSS feeds, those just seem to be the most commonly used.

Friday, July 19, 2002


A List Apart has the paper "Using XML" by J. David Eisenberg.
More than a rulebook for generating your own markup, XML is part of a family of technologies that work together in powerful ways. Eisenberg demonstrates some of that power by creating an XML-based markup language from scratch and transforming it for a variety of formats, using nothing but his noggin and some off-the-shelf tools.