Drawing Context from the Linked Data Web: The Example of the 20th Century Press Archives

Joachim Neubert

Intended audience: Library people who are interested in the much-hyped Linked Data approach and wonder what it could do in practice for library applications.


Abstract: As an immense collection of 30 million press clippings which cover almost every issue under public discussion throughout the last century, the thematic dossiers of the 20th Century Press Archives bear value for historians as well as journalists and the general public. But without the guiding hand of an experienced librarian, the digitized folders of the archives, organised in large sequences or trees and bearing only short plain text labels, are difficult to discover and offer no navigation or thematic context.

Overcoming this with manual annotations is not affordable. But there is already a wealth of data accessible in the Linked Data Cloud. Manually assigned IDs from the German Personal Names Authority File for biographical dossiers, or automatically looked-up GeoNames IDs for, e.g., the locations of companies, provide a bridge into this cloud. From there, the dossiers can be joined to further information, e.g. abstracts and images from DBpedia or the publications recorded in national libraries via VIAF. Where newspaper archives provide Linked Data, as the "Chronicling America" project does, background information about long-vanished sources and sometimes whole digitized issues can be linked in.
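The bridging step described above can be sketched in a few lines of Python. This is a minimal illustration and not the code behind the archives' site: it assumes the commonly used URI patterns for GND/PND authority records and GeoNames features, and it assumes that DBpedia carries an owl:sameAs link to the identifier in question, which will not hold for every entity. The identifier values are placeholders, and the function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed URI patterns for the two identifier schemes mentioned in the text.
GND_BASE = "http://d-nb.info/gnd/"
GEONAMES_BASE = "http://sws.geonames.org/"


def gnd_uri(gnd_id: str) -> str:
    """Turn a manually assigned authority-file ID into a Linked Data URI."""
    return GND_BASE + gnd_id.strip()


def geonames_uri(geonames_id: int) -> str:
    """Turn a looked-up GeoNames ID into a Linked Data URI."""
    return f"{GEONAMES_BASE}{int(geonames_id)}/"


def dbpedia_abstract_query(linked_uri: str, lang: str = "en") -> str:
    """Build a SPARQL query asking DBpedia for the abstract of the resource
    that declares itself owl:sameAs the given URI (an assumption: such a
    link has to exist on the DBpedia side)."""
    return (
        "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?resource ?abstract WHERE {\n"
        f"  ?resource owl:sameAs <{linked_uri}> ;\n"
        "            dbo:abstract ?abstract .\n"
        f'  FILTER (lang(?abstract) = "{lang}")\n'
        "}"
    )


def dbpedia_request_url(query: str, endpoint: str = "http://dbpedia.org/sparql") -> str:
    """Encode the query as a GET request URL for a SPARQL endpoint."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urlencode(params)


# Placeholder identifier, not a real dossier's GeoNames ID.
place = geonames_uri(1234567)
url = dbpedia_request_url(dbpedia_abstract_query(place))
```

Nothing is fetched here; the sketch only builds the request URL, so the lookup itself (and caching of its results) is left to the application.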

A live demonstration of the archives' new web site (http://zbw.eu/beta/p20) shows the added value for human users. (For Semantic Web browsers and crawlers, the site is also accessible and navigable through embedded RDFa data.) Pulled-in metadata, for example about enclosing geographical entities, makes it possible to project additional navigational structure onto the dossiers. And even though the original metadata of the archives was German-only (as opposed to the often foreign-language newspaper clippings), context information in English from the LOD cloud made it easy to build an English version of the site.
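How pulled-in metadata about enclosing geographical entities can yield navigational structure may be illustrated roughly as follows. The records, field names and function are hypothetical and only stand in for dossiers whose places have already been enriched with the enclosing country from an external source.

```python
from collections import defaultdict

# Hypothetical dossier records; "country" stands for the enclosing
# geographical entity obtained from the linked data source.
dossiers = [
    {"label": "Company A", "place": "Hamburg", "country": "Germany"},
    {"label": "Company B", "place": "Kiel", "country": "Germany"},
    {"label": "Company C", "place": "Lyon", "country": "France"},
    {"label": "Company D", "place": "unidentified"},  # lookup failed
]


def group_by_enclosing_entity(records, key="country"):
    """Group dossier labels under their enclosing entity so that a flat
    sequence of folders can be browsed as a two-level hierarchy."""
    groups = defaultdict(list)
    for record in records:
        # Records without a resolved enclosing entity go to a fallback bucket.
        groups[record.get(key) or "unknown"].append(record["label"])
    return {name: sorted(labels) for name, labels in sorted(groups.items())}


navigation = group_by_enclosing_entity(dossiers)
```

The fallback bucket keeps dossiers reachable even when no enclosing entity could be determined for them.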


This proposal focuses on the consumption of linked data in a library application. Alternatively, the main focus could be placed on providing linked data and on how OAI-ORE and RDFa help make this data useful for others, if that would be a better fit for the overall conference programme.



Attached files

Drawing Context from the Linked Data Web: The 20th Century Press Archives (Neubert) (5,69 MB)

elag2011_Neubert (53 MB)

elag2011_Neubert.m4v (55,01 MB)
