The other day I discovered the Web Data Commons, which is building on top of the Common Crawl to extract Microformat, Microdata, and RDFa data and make it available for free download. This means that there is starting to be free structured data from a big portion of the Web available for anyone to play with at very low cost. Common Crawl takes care of the crawling and then Web Data Commons will do data extraction. This opens up new possibilities for services, specialized search, and aggregations of content. Big web data is being opened up for small startups and individuals.

Is your library being crawled? Does it have metadata able to be harvested? Should it? Just asking.
Tuesday, January 31, 2012
Jason Ronallo at Preliminary Inventory of Digital Collections writes about Common Crawl, Web Data Commons, and Microdata.