2015 Prepared Talk Proposals

750 bytes added, 03:14, 5 November 2014
More and more libraries and archives are creating web archiving programs. In both new and established programs, a single collection can contain hundreds of thousands, if not millions, of born-digital resources; as such, these archives are ideally suited for large-scale computational study and analysis. Yet current access methods for web archives consist largely of browsing the archived web in the same manner as the live web, and the size of these collections and the complexity of the WARC format can make aggregate analysis difficult. This talk will describe a project to create new ways for users and researchers to access and study web archives by offering extracted and post-processed datasets derived from web collections. Working with the 325+ institutions and their 2,600+ collections within the Archive-It service, the Internet Archive is building methods to deliver a variety of datasets culled from collections of web content, including extracted metadata packaged in JSON, longitudinal link graph data, named entities, and other types of data. The talk will cover the technical details of building dataset production pipelines with Apache Pig, Hadoop, and tools like Stanford NER; the programmatic aspects of building data services for archives and researchers; and ongoing work to create new ways to access and study web archives.
 
== Awesome Pi, LOL! ==
 
* Matt Connolly, mconnolly@cornell.edu, Cornell University Library
* Jennifer Colt, jrc88@cornell.edu, Cornell University Library
 
Inspired by Harvard Library Lab’s “Awesome Box” project, Cornell’s Library Outside the Library (LOL) group is piloting a more automated approach to letting our users tell us which materials they find particularly stunning. Armed with a Raspberry Pi, a barcode scanner, and some bits of kit that flash and glow, we have ventured into the foreign world of hardware development. This talk will discuss what it’s like for software developers and designers to get their hands dirty, how patrons are reacting to the Awesomizer, and LOL’s not-afraid-to-fail philosophy of experimentation.