2010talks Submissions

Submissions for 20-Minute Talk Slots

Edit this page to submit your proposal for a 20-minute talk at the Code4Lib 2010 Conference. For more information, see the Call for submissions. Please follow the formatting guidelines:

Talk Title:

Speaker name(s), affiliation(s), and email address(es):

Abstract of no more than 500 words:

Place your submission at the bottom of the page below this line:

Talk Title:

Mobile Web App Design: Getting Started

Speaker name, affiliation, and email address:

Michael Doran, University of Texas at Arlington, doran@uta.edu, http://rocky.uta.edu/doran/

Abstract:

Creating or adapting library web applications for mobile devices such as the iPhone, Android, and Palm Pre is not hard, but it does require learning some new tools, new techniques, and new approaches. From the Tao of mobile web app design to using mobile device SDKs for their emulators, this presentation will give you a jump-start on mobile cross-platform design, development, and testing. And all illustrated with a real-world mobile library web application.

Talk Title:

Drupal 7: A more powerful platform for building library applications

Speaker name, affiliation, and email address:

Cary Gordon, The Cherry Hill Company, cgordon@chillco.com

Abstract:

The release of Drupal 7 brings with it a big increase in utility for this already very useful and well-accepted content management framework. Specifically, the addition of fields in core, the inclusion of RDFa, the use of the PDO database abstraction layer, and the promotion of files to first class objects facilitate the development of richer applications directly in Drupal without the need to integrate external products.

Talk Title:

Fiwalk with Me: Using Automatic Forensics Tools and Python for Digital Curation Triage

Speaker name, affiliation, and email address:

Mark Matienzo, The New York Public Library, mark@matienzo.org

Abstract of no more than 500 words:

Building on Simson Garfinkel's work in Automated Document and Media Exploitation (ADOMEX), this project investigates digital curation applications of open source tools used in digital forensics. Specifically, we will be using AFFLib's fiwalk ("file and inode walk") application and its corresponding Python library to develop a basic triage workflow for accessioned hard drives, removable media, or disk images. These tools will allow us to create a simple, Web-based "digital curation workbench" application to do preliminary analysis and processing of this data.

Talk Title:

Do it Yourself Cloud Computing with Apache and R

Speaker name, affiliation, and email address:

Harrison Dekker, University of California, Berkeley, hdekker@library.berkeley.edu

Abstract of no more than 500 words:

R is a powerful and extensible open source statistical analysis application. rApache, software developed at Vanderbilt University, allows web developers to leverage the numeric processing and graphical capabilities of R in real time through simple Apache server requests. This presentation will provide an overview of both R and rApache and will explore how these tools are relevant to the library community.

Talk Title:

Metadata editing - a truly extensible solution

Speaker name, affiliation and email address:

David Kennedy, Duke University, david.kennedy@duke.edu
David Chandek-Stark, Duke University, david.chandek.stark@duke.edu
http://library.duke.edu/trac/dc/wiki/Trident

Abstract of no more than 500 words:

We set out in the Trident project to create a metadata tool that scales. In doing so we have conceived of the metadata application profile, a profile which provides instructions for software on how to edit metadata. We have built a set of web services and some web-based tools for editing metadata. The metadata application profile allows these tools to extend across different metadata schemes, and allows for different rules to be established for editing items of different collections. Some features of the tools include integration with authority lists, auto-complete fields, validation and clean integration of batch editing with Excel. I know, I know, Excel, but in the right hands, this is a powerful tool for cleanup and batch editing.

In this talk, we want to introduce the concepts of the metadata application profile, and gather feedback on its merits, as well as demonstrate some of the tools we have developed and how they work together to manage the metadata in our Fedora repository.

Talk Title:

Flickr'ing the Switch

Speaker name, affiliation and email address:

Dianne Dietrich, Cornell University Library, dd388@cornell.edu

Abstract of no more than 500 words:

We started out with a simple dream – to pilot a handful of images from our collection in Flickr. Since June 2009, we've grown that dream from its humble beginnings into something bigger: we now have a Flickr collection of over two thousand images. We added geocoding and tags, repurposed our awesome structured metadata, and screenscraped the rest. This talk will focus on the code, which made most of this possible.

This includes (and is certainly not limited to) using the Python Flickr API, various geocoding tools, crafting Flickr metadata by restructuring XML data from Luna Insight, screenscraping any descriptive text we could get our hands on, negotiating naming conventions for thousands of images, thinking cleverly in order to batch update images on Flickr at a later point (we had to do this more than once), using digital forensic tools to save malformed tifs (that were digitized in 1998!), and, finally, our efforts at scaling everything up so we can integrate our Flickr project into the regular workflow at technical services.

Talk Title:

library/mobile: Developing a Mobile Catalog

Speaker name(s), affiliation(s), and email address(es):

Kim Griggs, Oregon State University Libraries, kim.griggs@oregonstate.edu

Abstract of no more than 500 words:

The increased use of mobile devices provides an untapped resource for delivering library resources to patrons. The mobile catalog is the next step for libraries in providing universal access to resources and information.

This talk will share Oregon State University (OSU) Libraries’ experience creating a custom mobile catalog. The discussion will first make the case for mobile catalogs, discuss the context of mobile search, and give an overview of vendor and custom mobile catalogs. The second half of the talk will look under the hood of OSU Libraries' custom mobile catalog to provide implementation strategies and discuss tools, techniques, requirements, and guidelines for creating an optimal mobile catalog experience that offers services that support time critical and location sensitive activities.

Talk Title:

Enhancing discoverability with virtual shelf browse

Speaker name(s), affiliation(s), and email address(es):

Andreas Orphanides, NCSU Libraries, andreas_orphanides@ncsu.edu
Cory Lown, NCSU Libraries, cory_lown@ncsu.edu
Emily Lynema, NCSU Libraries, emily_lynema@ncsu.edu

Abstract of no more than 500 words:

With collections turning digital, and libraries transforming into collaborative spaces, the physical shelf is disappearing. NCSU Libraries has implemented a virtual shelf browse tool, re-creating the benefits of physical browsing in an online environment and enabling users to explore digital and physical materials side by side. We hope that this is a first step towards enabling patrons familiar with Amazon and Netflix recommendations to "find more" in the library.

We will provide an overview of the architecture of the front-end application, which uses Syndetics cover images to provide a "cover flow" view and allows the entire "shelf" to be browsed dynamically. We will describe what we learned while wrangling multiple jQuery plugins, manipulating an ever-growing (and ever-slower) DOM, and dealing with unpredictable response times of third-party services. The front-end application is supported by a web service that provides access to a shelf-ordered index of our catalog. We will discuss our strategy for extracting data from the catalog, processing it, and storing it to create a queryable shelf order index.
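
As a rough illustration of what querying a shelf-ordered index can look like, the sketch below returns the entries shelved immediately before and after a starting point. It is not the NCSU web service: the in-memory list, the function name `shelf_window`, and the sample keys are all hypothetical stand-ins.

```python
from bisect import bisect_left


def shelf_window(shelf_keys, start_key, before=2, after=2):
    """Return the keys shelved around start_key from a sorted list of shelf keys."""
    # bisect_left finds where start_key sits (or would sit) in shelf order.
    position = bisect_left(shelf_keys, start_key)
    lower = max(0, position - before)
    upper = min(len(shelf_keys), position + after + 1)
    return shelf_keys[lower:upper]


# Hypothetical, already-normalized shelf keys standing in for a catalog extract.
index = sorted([
    "HD 0030.2", "QA 0009.0", "QA 0076.7",
    "QA 0076.9", "Z  0699.0", "Z  0711.0",
])

# One neighbor to the left of the starting item and two to the right.
window = shelf_window(index, "QA 0076.7", before=1, after=2)
```

A binary search keeps each lookup cheap even for a large index, which matters when the front end requests more of the "shelf" as the user scrolls.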

Talk Title:

Where do mobile apps go when they die? or, The app with a thousand faces.

Speaker name, affiliation, and email address:

Jason Casden, North Carolina State University Libraries, jason_casden@ncsu.edu

Abstract:

New capabilities in both native and web-based mobile platforms are rapidly expanding the possibilities for mobile library services. In addition to developing small-screen versions of our current services, at NCSU Libraries we attempt to develop new services that take unique advantage of the mobile user context. Some of these ideas may require capabilities that are not exposed to the mobile browser. Smart technical planning can help to make sound development decisions when experimenting with mobile-enhanced development, while remaining agile when faced with constantly changing technical and non-technical restraints and opportunities.

This talk will be based on my experience as a developer of both native iPhone and web-based mobile library apps at NCSU Libraries, and with the effort to port our geo-mobile WolfWalk iPhone app to the web. I will also discuss some opportunities being created by other platforms, particularly Android-based devices.

Talk Title:

Using Google Voice for Library SMS

Speaker name, affiliation, and email address:

Eric Sessoms, Nub Games, Inc., nubgames@gmail.com
Pam Sessoms, UNC Chapel Hill, psessoms@gmail.com

Abstract:

The LibraryH3lp Google Voice/SMS gateway (free, full AGPL source available at http://github.com/esessoms/gvgw, works with any XMPP server, LibraryH3lp subscription not required) enables libraries to easily integrate texting services into their normal IM workflow. This talk will review the challenges we faced, especially issues involved with interfacing to a Google service lacking a published API, and will outline the design of the software with particular emphasis on features that help the gateway to be more responsive to users. Because the gateway is written in the Clojure programming language, we'll close by highlighting which features of the language and available tools had the greatest positive and negative impacts on our development process.

Talk Title:

Building a discovery system with Meresco open source components

Speaker name, affiliation, and email address:

Karin Clavel, TU Delft Library, The Netherlands, c.l.clavel@tudelft.nl
Etienne Posthumus, TU Delft Library, The Netherlands, e.posthumus@tudelft.nl

Abstract:

TU Delft Library uses Meresco, an open source component library for metadata management, to implement a custom integrated search solution called Discover. In Discover, different Meresco components are configured to work together in an efficient observer pattern, defined in what is called Meresco DNA (written in Python). The process is as follows: metadata is harvested from different sources using the Meresco harvester. It is then cross-walked into (any format you like, but we chose) MODS, then normalized, stored and indexed in three distinct but integrated indexes: a full-text Lucene index, a facet index and an N-gram index for suggestions and fixing spelling mistakes. The facet index supports multiple algorithms: drilldown, Jaccard, Mutual Information (or Information Gain) and Χ². One of the facets is used to cluster the search results by subject by using the Jaccard and Mutual Information algorithms.

The query parser component automatically detects and supports Google-like, Boolean and field-specific queries. Different XML documents describing the same content item coalesce to provide the user interface with an easy way to access metadata from either the original or normalized metadata or from user generated metadata such as ratings or tags. Other Meresco components provide an SRU and a RSS interface.

Discover currently holds all catalogue records, the institutional repository metadata, an architecture bibliography and a test-set of Science Direct articles. In 2010, it is expected to grow to over 10 million records with content from Elsevier, IEEE and Springer (subject to negotiations with these publishers) and various open access resources. We will also add the university’s multimedia collection, ranging from digitized historical maps, drawings and photographs to recent (vod- and) podcasts.

In the proposed session, we would like to show you some examples of the above-mentioned functionality and explain how Meresco components work together to create this flexible system.
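
For readers unfamiliar with the Jaccard measure named in this abstract, here is a minimal sketch of the coefficient itself. It is not Meresco code: the function name and the sample record identifiers are hypothetical, and the idea of comparing the record sets behind two facet values is only one plausible way such a measure could feed subject clustering.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical record ids carrying each subject facet value.
hydrology = {1, 2, 3, 4}
water_management = {3, 4, 5}

# Two shared records out of five distinct records overall.
score = jaccard(hydrology, water_management)
```

Values close to 1 indicate facet values that mostly describe the same records, values close to 0 indicate little overlap.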

Talk Title:

Take control of library metadata and websites using the eXtensible Catalog

Speaker name(s), affiliation(s), and email address(es):

Jennifer Bowen, University of Rochester, jbowen@library.rochester.edu

Abstract of no more than 500 words:

The eXtensible Catalog Project has developed four open-source software toolkits that enable libraries to build and share their own web- and metadata-focused applications on top of a service-oriented architecture that incorporates Solr in Drupal, a robust metadata management platform, and OAI-PMH and NCIP-compatible tools that interact with legacy library systems in real-time.

XC’s robust metadata management platform allows libraries to orchestrate and sequence metadata processing services on large batches of metadata. Libraries can build their own services using the available “service-writers toolkit” or choose from our initial set of metadata services that clean up and “FRBRize” MARC metadata. Another service will aggregate metadata from multiple repositories to prepare it for use in unified discovery applications. XC software provides an RDA metadata test bed and a Solr-based metadata “navigator” that can aggregate and browse metadata (or data) in any XML format. XC’s user interface platform is the first suite of Drupal modules that treat both web content and library metadata as native Drupal nodes, allowing libraries to build web-applications that interact with metadata from library catalogs and institutional repositories as well as with library web pages. XC’s Drupal modules enable Solr in a FRBRized data environment, as a first step toward a full implementation of RDA. Other currently-available XC toolkits expose legacy ILS metadata, circulation, and patron functionality via web services for III, Voyager and Aleph (to date) using standard protocols (OAI-PMH and NCIP), allowing libraries to easily and regularly extract MARC data from an ILS in valid MARCXML and keep the metadata in their discovery applications “in sync” with source repositories.

This presentation will showcase XC’s metadata processing services, the metadata “navigator” and the Drupal user interface platform. The presentation will also describe how libraries and their developers can get started using and contributing to the XC code.

Talk Title:

I Am Not Your Mother: Write Your Test Code

Speaker name, affiliation, and email address:

Naomi Dushay, Stanford University, ndushay@stanford.edu

Abstract:

How is it worth it to slow down your code development to write tests? Won’t it take you a long time to learn how to write tests? Won’t it take longer if you have to write tests AND develop new features, fix bugs? Isn’t it hard to write test code? To maintain test code? I will try to answer these questions as I talk about how test code is crucial for our software. By way of illustration, I will show how it has played a vital role in making Blacklight a true community collaboration, as well as how it has positively impacted coding projects in the Stanford Libraries.

Talk Title:

How To Implement A Virtual Bookshelf With Solr

Speaker name, affiliation, and email address:

Naomi Dushay, Stanford University, ndushay@stanford.edu
Jessie Keck, Stanford University, jkeck@stanford.edu

Abstract:

Browsing bookshelves has long been a useful research technique as well as an activity many users enjoy. As larger and larger portions of our physical library materials migrate to offsite storage, having a browse-able virtual shelf organized by call number is a much-desired feature. I will talk about how we implemented nearby-on-shelf in Blacklight at Stanford, using Solr and SolrMarc:

the code to get shelfkeys out of call numbers
the code to lop volume data off the end of call numbers to avoid clutter in the browse
what I indexed in Solr given we have
1. multiple call numbers for a single bib record
2. multiple bib records for a single call number
Solr configuration, requests and responses to get call numbers before and after a given starting point as well as the desired information for display.
Other code needed to implement this feature in Blacklight (concepts easily ported to other UIs).

This virtual shelf is not only browsable across locations, but includes any item with a call number in our collection (digital or physical materials).

All code is available, or will be by Code4Lib 2010.
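
To make the first two items in the list above concrete, here is a small sketch of lopping volume data and deriving a sortable shelfkey. It is not the Stanford or SolrMarc code: the regular expressions cover only simple LC-style call numbers, and the names `lop_volume` and `shelfkey` are hypothetical.

```python
import re

# Trailing volume/part designations such as "v.2", "no. 14", "pt.3", "vol. 10".
# This pattern is an illustrative assumption, not the production rule set.
VOLUME_SUFFIX = re.compile(r"\s+(?:v|vol|no|pt|bd)\.?\s*\d+.*$", re.IGNORECASE)

# Class letters, class number, optional decimal part, then the remainder (cutters, year).
LC_PATTERN = re.compile(r"^\s*([A-Za-z]{1,3})\s*(\d{1,4})(?:\.(\d+))?\s*(.*)$")


def lop_volume(call_number):
    """Drop a trailing volume designation so all volumes share one browse entry."""
    return VOLUME_SUFFIX.sub("", call_number).strip()


def shelfkey(call_number):
    """Build a string that sorts in shelf order for simple LC-style call numbers."""
    match = LC_PATTERN.match(lop_volume(call_number))
    if match is None:
        # Fall back to an upper-cased copy for call numbers this sketch cannot parse.
        return call_number.strip().upper()
    letters, number, decimal, rest = match.groups()
    # Pad the letters to 3 characters and zero-pad the class number to 4 digits
    # so that plain string comparison matches numeric shelf order.
    key = f"{letters.upper():<3}{int(number):04d}.{decimal or '0'}"
    rest = " ".join(rest.upper().split())
    return f"{key} {rest}".strip()


call_numbers = ["QA76.9 .D3", "QA9 .A5", "QA76.73 .P98 L88 v.2", "Q175 .K95"]
in_shelf_order = sorted(call_numbers, key=shelfkey)
```

Sorting the raw strings would place QA76 before QA9; sorting on the padded key restores shelf order.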

Talk Title:

A Better Advanced Search?

Speaker name, affiliation, and email address:

Naomi Dushay, Stanford University, ndushay@stanford.edu
Jessie Keck, Stanford University, jkeck@stanford.edu

Abstract:

Even though we’d like to get basic searches working so well that advanced search wouldn’t be necessary, there will always be a small set of users that want it, and there will always be some library searching needs that basic searching can’t serve. Our user interface designer was dissatisfied with many aspects of advanced search as currently available in most library discovery software; the form she designed was excellent but challenging to implement (see http://searchworks.stanford.edu/advanced). We’ll share details of how we implemented Advanced Search in Blacklight:

thoughtfully designed html form for the user (NOT done by techies!)
boolean syntax while using Solr dismax magic (dismax does not speak Boolean)
checkbox facets (multiple facet value selection)
fielded searching while using Solr dismax magic (dismax allows complex weighting formulae across multiple author/title/subject/… fields, but does not allow “fielded” searching in the way lucene does)
1. easily configured in solrconfig.xml
manipulating user entered queries before sending them to Solr
making advanced search results look like other search results: breadcrumbs, selectable facets, and other fun.
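
One way to combine Boolean operators and fielded searching with dismax is Solr's nested query syntax, where each form field becomes its own `{!dismax}` sub-query inside a lucene-parsed query. The sketch below builds such a string; it is an illustration under assumptions, not the Blacklight implementation. The parameter names `qf_title` and `qf_author` are hypothetical and are assumed to be defined as request parameters in solrconfig.xml.

```python
def dismax_clause(field_config, terms):
    """Wrap user terms in a nested dismax query that uses the named qf parameter."""
    # Escape backslashes and double quotes because the sub-query sits inside "...".
    escaped = terms.replace("\\", "\\\\").replace('"', '\\"')
    return f'_query_:"{{!dismax qf=${field_config}}}{escaped}"'


def advanced_query(clauses, operator="AND"):
    """Join per-field dismax clauses with a Boolean operator, skipping empty fields."""
    parts = [dismax_clause(config, terms) for config, terms in clauses if terms.strip()]
    return f" {operator} ".join(parts)


# Hypothetical form input: (qf parameter name, user-entered terms).
q = advanced_query([("qf_title", "moby dick"), ("qf_author", "melville"), ("qf_subject", "")])
```

The resulting string would be sent as the `q` parameter with the default lucene parser, so the outer `AND`/`OR` is handled by lucene while each field keeps its dismax weighting.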

Talk Title:

Scholarly annotation services using AtomPub and Fedora

Speaker name, affiliation, and email address:

Andrew Ashton, Brown University, andrew_ashton@brown.edu

Abstract:

We are building a framework for doing granular annotations of objects housed in Brown’s Digital Repository. Beginning with our TEI-encoded text collections, and eventually expanding to other media, these scholarly annotations are themselves objects stored and preserved in the repository. They are linked to other resources via URI references, and deployed using AtomPub services as part of Fedora’s Service/Dissemination model.

This effort stems from the recognition that standard web annotation techniques (e.g. tagging, Google Sidebar, page-level commenting, etc.) are not flexible or persistent enough to handle scholarly annotations as an organic part of natively digital research collections. We are developing solutions to several challenges that arise with this approach; particularly, how do we address highly granular portions of digital objects in a way that is applicable to different types of media (encoded texts, images, video, etc.). This presentation will provide an overview of the architecture, a discussion of the possibilities and problems we face in implementing this framework, and a demo of a live project using Atom annotations with a digital research collection.

Talk Title:

With Great Power... Managing an Open-Source ILS in a state-wide consortium.

Speaker name(s), affiliation(s), and email address(es):

Emily A. Almond, Software Development Manager, PINES/Georgia Public Library Service, ealmond@georgialibraries.org

Abstract:

Using agile software development methodology + project management to achieve a balance of support and expertise. Lessons learned after implementation that inform how the consortium should evolve so that you can utilize your new ILS for the benefit of all stakeholders. Topics covered:
-- troubleshooting and help desk support
-- development project plans
-- roles and responsibility shifts
-- re-branding the ILS and related organizations

Talk Title:

Data Modeling; Logical Versus Physical; Why Do I Care?

Speaker name(s), affiliation(s), and email address(es):

Steve Dressler, Georgia Public Library Services, sdressler@georgialibraries.org

Abstract of no more than 500 words:

I am sure we have all been in the situation of having mountains of data stored in our database, needing a piece of information and yet being unable to determine how to get what we need. Computerized databases have been around for decades now and there are several architectures available; however, the ability of a database developer, regardless of the architecture, to store data in a format that is comprehensible to a businessperson yet readily accessible through software applications remains an impossible challenge.

Topics to be discussed include:
o Components comprising a logical model: how is it developed and how is it used?
o Components comprising a physical model: how is it developed and how is it used?
o What does a logical model look like?
o What does a physical model look like?
o Who works with a logical model and why?
o Who works with a physical model and why?
o What is the relationship between the logical model and the physical model?
o What kind of a time investment is required to develop and maintain logical and physical models?
o What are the challenges of keeping the two models in sync as the software application evolves?

Although data modeling is a huge discipline and presents research topics for millions of theses and dissertations, this twenty-minute snapshot view will allow anyone, technical or business, to sit through a development meeting and be able to grasp what is being discussed as well as gain a better understanding of logical and physical business flows.

Talk Title:

Media, Blacklight, and viewers like you.

Speaker name, affiliation, and email address:

Chris Beer, WGBH, chris_beer@wgbh.org

Abstract:

There are many shared problems (and solutions) for libraries and archives in the interest of helping the user. There are also many "new" developments in the archives world that the library communities have been working on for ages, including item-level cataloging, metadata standards, and asset management. Even with these similarities, media archives have additional issues that are less relevant to libraries: the choice of video players, large file sizes, proprietary file formats, challenges of time-based media, etc. In developing a web presence, many archives, including the WGBH Media Library and Archives, have created custom digital library applications to expose material online. In 2008, we began a prototyping phase for developing scholarly interfaces by creating a custom-written PHP front-end to our Fedora repository.

In late 2009, we finally saw the (black)light, and after some initial experimentation, decided to build a new, public website to support our IMLS-funded /Vietnam: A Television History/ archive (as well as existing legacy content). In this session, we will share our experience of and challenges with customizing Blacklight as an archival interface, including work in rights management, how we integrated existing Ruby on Rails user-generated content plugins, and the development of media components to support a rich user experience.

Talk Title:

DAMS PAS - Digital Asset Management System, Public Access System

Speaker name(s), affiliation(s), and email address(es):

Declan Fleming, University of California, San Diego, dfleming@ucsd.edu

Esmé Cowles, University of California, San Diego, ecowles@ucsd.edu

Abstract of no more than 500 words:

After years of describing our DAMS with PowerPoint, we finally have a public access system that we can show our mothers. And code4lib! The UCSD Libraries DAMS is an RDF-based asset repository containing over 250,000 items and their derivatives. We describe the core system, the metadata and storage challenges involved in managing hundreds of thousands of items, and the interesting political aspects involved in releasing subsets to the public. We also describe the caching approach we used to ensure performance and access control.

Talk Title:

You Either Surf or You Fight: Integrating Library Services with Google Wave

Speaker name(s), affiliation(s), and email address(es):

Sean Hannan, Sheridan Libraries, Johns Hopkins University, shannan@jhu.edu

Abstract of no more than 500 words:

So Google Wave is a new shiny web toy, but did you know that it's also a great platform for collaboration and research? (I bet you did.) ...And what platform for collaboration and research would not be complete without some library tools to aid and abet that process? I will talk about how to take your library web services and integrate them with Google Wave to create bots that users can interact with to get at your resources as part of their social and collaborative work.

Talk Title: The Linked Library Data Cloud: Stop talking and start doing

Speaker name, affiliation, and email address: Ross Singer, Talis, ross.singer@talis.com

Abstract: A year later and how far has Linked Library Data come? Outside of the Swedish National Library's LIBRIS (which already existed), the return of lcsh.info as http://id.loc.gov/authorities/ and LC's Chronicling America, not much. But entry to the Linked Data cloud might be easier than you think. This presentation will describe various projects that are out in the wild that can bridge the gap between our legacy data and the semantic web, incremental steps we can take modeling our data, why linked data matters and a demonstration of how small template changes can contribute to the Linked Data cloud.

Talk Title: A code4lib Manifesto

Speaker name(s), affiliation(s), and email address(es): Dan Chudnov, No Fixed Hairstyle, dchud at umich edu

Abstract of no more than 500 words: code4lib started with a half dozen library hackers and a list and it ain't like that anymore. I come to code4lib with strong opinions about why it's a positive force in my professional and personal life, but they're probably different from your opinions. I will share these opinions rudely yet succinctly to challenge everyone to think and argue about why code4lib works and what we need to do to keep it working.

Talk Title: Cloud4lib

Speaker name(s), affiliation(s), and email address(es): Jeremy Frumkin, University of Arizona, frumkinj at u library arizona edu
Terry Reese, Oregon State University, terry.reese at oregonstate edu

Abstract of no more than 500 words: Major library vendors are creating proprietary platforms for libraries. We will propose that the code4lib community pursue the cloud4lib, an open digital library platform based on open source software and open services. This platform would provide common service layers for libraries, not only via code, but also allow libraries to easily utilize tools and systems through cloud services. Instead of a variety of competing cloud services and proprietary platforms, cloud4lib will attempt to be a unifying force that will allow libraries to be consumers of the services built on top of it as well as allow developers / researchers / code4lib'ers to hack, extend, and enhance the platform as it matures.

Talk Title:

Iterative development done simply

Speaker name, affiliation, and email address:

Emily Lynema, North Carolina State University Libraries, emily_lynema@ncsu.edu

Abstract:

With a small IT unit and a wide array of projects to support, requests for development from business stakeholders in the library can quickly spiral out of control. To help make sense of the chaos, increase the transparency of the IT “black box,” and shorten time lag between requirements definition and functional releases, we have implemented a modified Agile/SCRUM methodology within the development group in the IT department at NCSU Libraries.

This presentation will provide a brief overview of the Agile methodology as an introduction to our simplified approach to iteratively handling multiple projects across a small team. This iterative approach allows us to regularly re-evaluate requested enhancements against institutional priorities and more accurately estimate timelines for specific units of functionality. The presentation will highlight how we approach each development cycle (from planning to estimating to re-aligning) as well as some of the actual tools and techniques we use to manage work (like JIRA and Greenhopper). It will identify some challenges faced in applying an established development methodology to a small team of multi-tasking developers, the outcomes we’ve seen, and the areas we’d like to continue improving. These types of iterative planning/development techniques could be adapted by even a single developer to help manage a chaotic workplace.

Talk Title:

Public Datasets in the Cloud

Speaker name, affiliation and email address:

Rosalyn Metz, Wheaton College, metz_rosalyn@wheatoncollege.edu

Michael B. Klein, Oregon State University, Michael.Klein@oregonstate.edu

Abstract:

When most people think about cloud computing (if they think about it at all), it usually takes one of two forms: Infrastructure Services, such as Amazon EC2 and GoGrid, which provide raw, elastic computing capacity in the form of virtual servers, and Platform Services, such as Google App Engine and Heroku, which provide preconfigured application stacks and specialized deployment tools.

Several providers, however, offer access to large public datasets that would be impractical for most organizations to download and work with locally. From a 67-gigabyte dump of DBpedia's structured information store to the 180-gigabyte snapshot of astronomical data from the Sloan Digital Sky Survey, and from chemistry and biology to economic and geographic data, these datasets are available instantly and backed by enough pay-as-you-go server capacity to make good use of them.

We will present an overview of currently-available datasets, what it takes to create and use snapshots of the data, and explore how the library community might push some of its own large stores of data and metadata into the cloud.

Talk Title:

Codename Arctika

Speaker name(s), affiliation(s), and email address(es):

Toke Eskildsen, The State and University Library of Denmark, te@statsbiblioteket.dk

Abstract:

There's something missing in the state of Denmark. Most of our web-based copyright deposit material is trapped in a dark archive. After a successful pilot, money and time have been allocated to open part of the data. We tried NutchWAX and it worked well, but we wanted more: proper integrated search with existing library material, extraction of names, and so on. Therefore we propose the following recipe: Take a slice of a dark archive with copyright deposit material. Get permission to publish it (the tricky bit). Add an ARC reader to get the bits, Tika to get the text and Summa to get large-scale indexing and faceting. We mixed it up and we will show what happened.