2013 talks proposals

1,565 bytes added, 18:32, 8 November 2012
PostgreSQL for your data, you might want to try it after this
presentation; if you already do, you'll pick up some new tips and tricks.
 
 
== A Cure for Romnesia: Site Story Web-Archiving ==
 
* Harihar Shankar, Research Library, Los Alamos National Laboratory, harihar@lanl.gov
 
The web changes constantly, erasing both inconvenient facts and
fictions. At web scale, preservation organizations cannot be expected
to keep up by using traditional crawling, and they already miss many
important versions. The cure is to capture the interactions between
real browsers and the server and push them into an archive for
safekeeping, rather than trying to guess when pages change.
 
Every time the Apache Web Server sends data to a browser, SiteStory’s
Apache Module also pushes that data to the SiteStory Web Archive. The
same version of a resource is archived only once, no matter how many
times it is requested. The resulting archive effectively represents
the server's entire history, with one caveat: versions of resources
that no browser ever requests are never archived.
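The "archive each version once" rule can be illustrated with a toy model. This is not SiteStory's code (which is Java and C); it is a hypothetical Python class, `ToyVersionArchive`, which assumes that a "version" is identified by a SHA-256 digest of the response body and that a capture is stored only when its digest differs from the latest stored version of that URI. A URI that is never pushed simply never appears in the archive, mirroring the caveat above.

```python
import hashlib
from datetime import datetime, timezone


class ToyVersionArchive:
    """Hypothetical in-memory stand-in for a write-through web archive.

    Assumption: each response body pushed for a URI is stored only if it
    differs (by SHA-256 digest) from the most recently archived version.
    """

    def __init__(self):
        # URI -> list of (capture datetime, SHA-256 hex digest, body)
        self.versions = {}

    def push(self, uri, body, when=None):
        """Record one served response; return True if a new version was archived."""
        digest = hashlib.sha256(body).hexdigest()
        history = self.versions.setdefault(uri, [])
        if history and history[-1][1] == digest:
            # Same bytes as the latest stored version: skip it, however
            # many times the resource has been requested.
            return False
        when = when or datetime.now(timezone.utc)
        history.append((when, digest, body))
        return True


if __name__ == "__main__":
    archive = ToyVersionArchive()
    # Five requests, but only two distinct versions were served.
    for body in [b"v1", b"v1", b"v1", b"v2", b"v2"]:
        archive.push("http://example.org/", body)
    print(len(archive.versions["http://example.org/"]))  # prints 2
```

Comparing against the latest version, rather than against every digest ever seen, means a page that changes and later reverts to earlier content still gets a new entry in its history; whether SiteStory itself treats reverts this way is an assumption of the sketch.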
 
In this presentation I will give an overview of SiteStory, an
open-source project written in Java that runs as an application under
Tomcat 6 or later; SiteStory’s Apache Module is written in C. I will
also demonstrate the TimeMap tool, a Firefox browser extension that
plots the versions of a resource available in the SiteStory archive on
a SIMILE timeline. Because the tool uses the Memento protocol, it can
also display versions of resources held in other Memento-compliant web
archives and content management systems.
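To make the Memento side more concrete: a Memento TimeMap lists the archived versions (mementos) of an original resource. In the link format described in the Memento specification (later published as RFC 7089), each memento entry carries a rel value containing "memento" and a datetime attribute. The hypothetical helper `mementos()` below is a sketch rather than the extension's actual code. It extracts those (datetime, URI) pairs in chronological order, which is the data a timeline view needs. It assumes every link parameter value is double-quoted, and the example.org and arc.example URIs are made up.

```python
import re
from email.utils import parsedate_to_datetime

# One link-value: <URI>; name="value"; name="value" ...  (quoted values assumed)
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,=\s]+="[^"]*")*)')
PARAM_RE = re.compile(r';\s*([^;,=\s]+)="([^"]*)"')


def mementos(timemap_text):
    """Return (datetime, URI-M) pairs for each memento in a link-format TimeMap, oldest first."""
    result = []
    for match in LINK_RE.finditer(timemap_text):
        uri = match.group(1)
        params = dict(PARAM_RE.findall(match.group(2)))
        # rel may hold several tokens, e.g. "first memento" or "prev memento".
        rels = params.get("rel", "").split()
        if "memento" in rels and "datetime" in params:
            # Memento datetimes use the HTTP-date (RFC 1123) format.
            result.append((parsedate_to_datetime(params["datetime"]), uri))
    result.sort()
    return result


if __name__ == "__main__":
    sample = (
        '<http://example.org/>;rel="original",\n'
        '<http://arc.example/20120101/http://example.org/>;'
        'rel="first memento";datetime="Sun, 01 Jan 2012 00:00:00 GMT"'
    )
    for when, uri in mementos(sample):
        print(when.isoformat(), uri)
```

Sorting by datetime is done here, rather than trusting the TimeMap's order, because the sketch does not assume that an archive lists its mementos chronologically.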
[[Category:Code4Lib2013]]
