

2015 Prepared Talk Proposals

1,558 bytes added, 02:24, 5 November 2014
added talk proposal
This talk will detail different metadata standards, including PBCore, PREMIS, and reVTMD, that can be used to record this information. Specifically, it will examine efforts to integrate this metadata into the Museum of Modern Art’s new digital repository, the DRMC. The talk will provide background on the DRMC and on MoMA’s specific institutional needs for process history metadata, then discuss the metadata implementations we have considered for documenting process history.
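As an illustration of one of the options named above, the sketch below records a single process-history step as a PREMIS event using Python's standard `xml.etree.ElementTree`. It assumes the PREMIS 3.0 event entity and namespace. The identifier type `"local"` and every value passed in are hypothetical placeholders, and this is not a description of how the DRMC actually stores process history. PBCore or reVTMD would express the same information with different elements.

```python
import xml.etree.ElementTree as ET

# Assumption: PREMIS 3.0 namespace and event structure.
PREMIS_NS = "http://www.loc.gov/premis/v3"
ET.register_namespace("premis", PREMIS_NS)


def _el(parent, tag, text=None):
    """Append a PREMIS-namespaced child element, optionally with text."""
    child = ET.SubElement(parent, f"{{{PREMIS_NS}}}{tag}")
    if text is not None:
        child.text = text
    return child


def process_history_event(event_id, event_type, when, detail, outcome):
    """Build one PREMIS event describing a single process-history step.

    The "local" identifier type and all argument values are hypothetical
    examples, not MoMA or DRMC conventions.
    """
    event = ET.Element(f"{{{PREMIS_NS}}}event")
    ident = _el(event, "eventIdentifier")
    _el(ident, "eventIdentifierType", "local")
    _el(ident, "eventIdentifierValue", event_id)
    _el(event, "eventType", event_type)
    _el(event, "eventDateTime", when)
    detail_info = _el(event, "eventDetailInformation")
    _el(detail_info, "eventDetail", detail)
    outcome_info = _el(event, "eventOutcomeInformation")
    _el(outcome_info, "eventOutcome", outcome)
    return event


if __name__ == "__main__":
    ev = process_history_event(
        "evt-0001",
        "capture",
        "2014-06-12T10:30:00",
        "Hypothetical step: videotape transferred to a digital file",
        "success",
    )
    print(ET.tostring(ev, encoding="unicode"))
```

Emitting one event per step (capture, transcoding, and so on) keeps the history ordered and queryable. The trade-off is that a detailed, technical description of each step has to be squeezed into free-text `eventDetail`.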
 
== Pig Kisses Elephant: Building Research Data Services for Web Archives ==
* Jefferson Bailey, jefferson@archive.org, Internet Archive
* Vinay Goel, vinay@archive.org, Internet Archive
 
More and more libraries and archives are creating web archiving programs. For both new and established programs, these archives can consist of hundreds of thousands, if not millions, of born-digital resources within a single collection; as such, they are ideally suited for large-scale computational study and analysis. Yet current access methods for web archives consist largely of browsing the archived web in the same manner as the live web, while the size of these collections and the complexity of the WARC format can make aggregate analysis difficult. This talk will describe a project to create new ways for users and researchers to access and study web archives by offering extracted and post-processed datasets derived from web collections. Working with the 325+ institutions and their 2600+ collections within the Archive-It service, the Internet Archive is building methods to deliver a variety of datasets culled from collections of web content, including extracted metadata packaged in JSON, longitudinal link graph data, named entities, and other types of data. The talk will cover the technical details of building dataset production pipelines with Apache Pig, Hadoop, and tools like Stanford NER; the programmatic aspects of building data services for archives and researchers; and ongoing work to create new ways to access and study web archives.
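To make the kinds of derived datasets described above more concrete, here is a single-machine Python sketch of two of them: per-capture metadata serialized as JSON, and a longitudinal host-to-host link graph keyed by capture year. It is not the Internet Archive's Pig/Hadoop pipeline, and it does not cover named-entity extraction. It assumes an uncompressed WARC file with `response` records, while real WARCs are usually gzip-compressed per record and are better read with a library such as warcio. The JSON field names and the `make_record` test helper are hypothetical.

```python
import io
import json
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


def iter_warc_records(stream):
    """Yield (headers, block) for each record in an UNCOMPRESSED WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return
        if not line.strip():
            continue  # blank CRLF separators between records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"unexpected line: {line!r}")
        headers = {}
        while True:
            raw = stream.readline()
            if raw in (b"\r\n", b"\n", b""):
                break
            name, _, value = raw.decode("utf-8").partition(":")
            headers[name.strip().lower()] = value.strip()
        yield headers, stream.read(int(headers["content-length"]))


class LinkExtractor(HTMLParser):
    """Collect <a href> values and the <title> text of one HTML page."""

    def __init__(self):
        super().__init__()
        self.links, self.title, self._in_title = [], "", False

    def handle_starttag(self, tag, attrs):
        if tag == "a" and dict(attrs).get("href"):
            self.links.append(dict(attrs)["href"])
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract(stream):
    """Return (per-capture metadata dicts, Counter of (year, src_host, dst_host))."""
    captures, edges = [], Counter()
    for headers, block in iter_warc_records(stream):
        if headers.get("warc-type") != "response":
            continue
        http_head, _, body = block.partition(b"\r\n\r\n")
        if b"text/html" not in http_head.lower():
            continue
        url, date = headers["warc-target-uri"], headers.get("warc-date", "")
        parser = LinkExtractor()
        parser.feed(body.decode("utf-8", errors="replace"))
        parser.close()
        outlinks = [urljoin(url, href) for href in parser.links]
        captures.append({"url": url, "timestamp": date,
                         "title": parser.title.strip(), "outlinks": outlinks})
        for link in outlinks:
            if urlparse(link).netloc:
                edges[(date[:4], urlparse(url).netloc, urlparse(link).netloc)] += 1
    return captures, edges


def make_record(uri, date, html):
    """Hypothetical helper: build one minimal WARC response record for testing."""
    http = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n" + html.encode()
    head = (f"WARC/1.0\r\nWARC-Type: response\r\nWARC-Target-URI: {uri}\r\n"
            f"WARC-Date: {date}\r\nContent-Length: {len(http)}\r\n\r\n").encode()
    return head + http + b"\r\n\r\n"


if __name__ == "__main__":
    warc = io.BytesIO(
        make_record("http://example.org/", "2014-03-01T00:00:00Z",
                    '<title>Home</title><a href="/about">About</a>'
                    '<a href="http://example.com/x">X</a>')
        + make_record("http://example.org/about", "2014-03-02T00:00:00Z",
                      "<title>About</title>"))
    captures, edges = extract(warc)
    for capture in captures:
        print(json.dumps(capture))  # one JSON object per capture
    for (year, src, dst), count in sorted(edges.items()):
        print(year, src, dst, count)
```

Each capture yields an independent JSON record, and each outlink yields a `(year, source host, target host)` edge. Because of this, the same logic maps naturally onto a distributed job: the per-record extraction runs in parallel, and the edge counts are summed in a group-by step.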
