Changes
Added "Codename Arctika" pitch
We will present an overview of currently available datasets and of what it takes to create and use snapshots of the data, and we will explore how the library community might push some of its own large stores of data and metadata into the cloud.
'''Talk Title:'''
Codename Arctika
'''Speaker name(s), affiliation(s), and email address(es):'''
Toke Eskildsen, The State and University Library of Denmark, te@statsbiblioteket.dk
'''Abstract:'''
There's something missing in the state of Denmark. Most of our web-based copyright deposit material is trapped in a dark archive. After a successful pilot, money and time have been allocated to open part of the data. We tried NutchWAX and it worked well, but we wanted more: proper integrated search with existing library material, extraction of names, etc. Therefore we propose the following recipe: Take a slice of a dark archive with copyright deposit material. Get permission to publish it (the tricky bit). Add an ARC reader to get the bits, Tika to get the text, and Summa to get large-scale indexing and faceting. We mixed it up and we will show what happened.