2013 talks proposals

From Code4Lib

Voting is complete. See results at: http://vote.code4lib.org/election/results/24

2013 Conference Schedule


Deadline for talk submission has closed. We ask that no changes be made after this point, so that every voter reads the same thing. You can update your description again after voting closes.

Prepared talks are 20 minutes (including setup and questions), and focus on one or more of the following areas:

  • tools (some cool new software, software library or integration platform)
  • specs (how to get the most out of some protocols, or proposals for new ones)
  • challenges (one or more big problems we should collectively address)

The community will vote on proposals using the criteria of:

  • usefulness
  • newness
  • geekiness
  • uniqueness
  • awesomeness

Please follow the formatting guidelines:

== Talk Title ==
 
* Speaker's name, affiliation, and email address
* Second speaker's name, affiliation, email address, if applicable

Abstract of no more than 500 words.


All Teh Metadatas Re-Revisited

  • Esme Cowles, UC San Diego Library, escowles AT ucsd DOT edu
  • Matt Critchlow, UC San Diego Library, mcritchlow AT ucsd DOT edu
  • Bradley Westbrook, UC San Diego Library, bdwestbrook AT ucsd DOT edu

Last year Declan Fleming presented ALL TEH METADATAS and reviewed our UC San Diego Library Digital Asset Management system and RDF data model. You may be shocked to hear that all that metadata wasn't quite enough to handle increasingly complex digital library and research data in an elegant way. Our ad-hoc, 8-year-old data model has also been added to in inconsistent ways and our librarians and developers have not always been perfectly in sync in understanding how the data model has evolved over time.


In this presentation we'll review our process of locking a team of librarians and developers in a room to figure out a new data model, from domain definition through building and testing an OWL ontology. We'll also cover the challenges we ran into, including the review of existing controlled vocabularies and ontologies, or lack thereof, and the decisions made to cover the gaps. Finally, we'll discuss how we engaged the digital library community for feedback and what we have to do next. We all know that Things Fall Apart; this is our attempt at Doing Better This Time.

Modernizing VuFind with Zend Framework 2

  • Demian Katz, Villanova University, demian DOT katz AT villanova DOT edu

When setting goals for a new major release of VuFind, use of an existing web framework was an important decision to encourage standardization and avoid reinvention of the wheel. Zend Framework 2 was selected as providing the best balance between the cutting-edge (ZF2 was released in 2012) and stability (ZF1 has a long history and many adopters). This talk will examine some of the architecture and features of the new framework and discuss how it has been used to improve the VuFind project.

Did You Really Say That Out Loud? Tools and Techniques for Safe Public WiFi Computing

Public WiFi networks, even those that have passwords, are nothing more than an old-time party line: whatever you say can be easily heard by anyone nearby. Remember Firesheep? It was an extension to Firefox that demonstrated how easy it was to snag session cookies and impersonate someone else. So what are you sending out over the airwaves, and what techniques are available to prevent eavesdropping? This talk will demonstrate tools and techniques for desktop and mobile operating systems that you should be using right now -- right here at Code4Lib -- to protect your data and your network activity.

Drupal 8 Preview — Symfony and Twig

  • Cary Gordon, The Cherry Hill Company, cgordon@chillco.com

Drupal is a great platform for building web applications. Last year, the core developers decided to adopt the Symfony PHP framework, because it would lay the groundwork for the modernization (and de-PHP4ification) of the Drupal codebase. As I write this, the Symfony ClassLoader and HttpFoundation libraries are committed to Drupal core, with more elements likely before Drupal 8 code freeze.

It seems almost certain that the Twig templating engine will supplant PHPtemplate as the core Drupal template engine. Twig is a powerful, secure theme building tool that removes PHP from the templating system, the result being a very concise and powerful theme layer.

Symfony and Twig have a common creator, Fabien Potencier, whose overall goal is to rid the world of the excesses of PHP 4.

Neat! But How Do We Do It? - The Real-world Problem of Digitizing Complex Corporate Digital Objects

  • Matthew Mariner, University of Colorado Denver, Auraria Library, matthew.mariner@ucdenver.edu

Isn't it neat when you discover that you are the steward of dozens of Sanborn Fire Insurance Maps, hundreds of issues of a city directory, and thousands of photographs of persons in either aforementioned medium? And it's even cooler when you decide, "Let's digitize these together and make them one big awesome project to support public urban history"? Unfortunately it's a far more difficult process than one imagines at inception and, sadly, doesn't always come to fruition. My goal here is to discuss the technological (and philosophical) problems librarians and archivists face when trying to create ultra-rich complex corporate digital projects, or, rather, projects consisting of at least three facets interrelated by theme. I intend to address these problems by suggesting management solutions, web workarounds, and, perhaps, a philosophy that might help in determining whether to even move forward or not. Expect a few case studies of "grand ideas crushed by technological limitations" and "projects on the right track" to follow.

ResCarta Tools building a standard format for audio archiving, discovery and display

The free ResCarta Toolkit has been used by libraries and archives around the world to host city directories, newspapers, and historic photographs and by aerospace companies to search and find millions of engineering documents. Now the ResCarta team has released audio additions to the toolkit.

Create full-text searchable oral histories, news stories, and interviews, or build an archive of lectures, all done to Library of Congress standards. The included transcription editor allows for accurate correction of the data conversion tool’s output. Build true archives of text, photos and audio. A single audio file carries the embedded Axml metadata, transcription, and word location information. The files can be checked with the FADGI BWF MetaEdit tool.

ResCarta-Web presents your audio to IE, Chrome, Firefox, Safari, and Opera browsers with full playback and word search capability. Display format is OGG!!

You have to see this tool in action. Twenty minutes from an audio file to transcribed, text-searchable website. Be there or be L seven (Yeah, I’m that old)

Format Designation in MARC Records: A Trip Down the Rabbit-Hole

  • Michael Doran, University of Texas at Arlington, doran@uta.edu

This presentation will use a seemingly simple data point, the "format" of the item being described, to illustrate some of the complexities and challenges inherent in the parsing of MARC records. I will talk about abstract vs. concrete forms; format designation in the Leader, 006, 007, and 008 fixed fields as well as the 245 and 300 variable fields; pseudo-formats; what is mandatory vs. optional in respect to format designation in cataloging practice; and the differences between cataloging theory and practice as observed via format-related data mining of a mid-size academic library collection.

I understand that most of us go to code4lib to hear about the latest sexy technologies. While MARC isn't sexy, many of the new tools being discussed still need to be populated with data gleaned from MARC records. MARC format designation has ramifications for search and retrieval, limits, and facets, both in the ILS and further downstream in next generation OPACs and web-scale discovery tools. Even veteran library coders will learn something from this session.
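As a rough illustration of the simplest of the data points named above, the sketch below reads only Leader positions 06 (type of record) and 07 (bibliographic level) from a raw leader string. The `format_from_leader` helper and the "Book"/"Serial" labels are hypothetical, the code table is a partial subset of the MARC 21 values, and a real format determination would also have to consult the 006, 007, 008, 245, and 300 data.

```ruby
# Hypothetical helper: guess a coarse format from a MARC 21 leader string.
# Only Leader/06 and Leader/07 are consulted; the table is deliberately partial.
TYPE_OF_RECORD = {
  'a' => 'Language material',
  'c' => 'Notated music',
  'e' => 'Cartographic material',
  'g' => 'Projected medium',
  'i' => 'Nonmusical sound recording',
  'j' => 'Musical sound recording',
  'k' => 'Two-dimensional nonprojectable graphic',
  'm' => 'Computer file'
}.freeze

def format_from_leader(leader)
  type  = leader[6] # Leader/06: type of record
  level = leader[7] # Leader/07: bibliographic level
  # Language material is the ambiguous case: the bibliographic level
  # is what separates a monograph from a serial.
  if type == 'a'
    return 'Book' if level == 'm'
    return 'Serial' if level == 's'
  end
  TYPE_OF_RECORD.fetch(type, 'Unknown')
end

puts format_from_leader('01142cam a2200301 a 4500') # prints "Book"
```

Even this two-byte lookup leaves most records only coarsely classified, which is one way to see why the other fixed and variable fields matter.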

Touch Kiosk 2: Piezoelectric Boogaloo

  • Andreas Orphanides, North Carolina State University Libraries, akorphan@ncsu.edu

At the NCSU Libraries, we provide realtime access to information on library spaces and services through an interactive touchscreen kiosk in our Learning Commons. In the summer of 2012, two years after its initial deployment, I redeveloped the kiosk application from the ground up, with an entirely new codebase and a completely redesigned user interface. The changes I implemented were designed to remedy previously identified shortcomings in the code and the interface design [1], and to enhance overall stability and performance of the application.

In this presentation I will outline my revision process, highlighting the lessons I learned and the practices I implemented in the course of redevelopment. I will highlight the key features of the HTML/Javascript codebase that allow for increased stability, flexibility, and ease of maintenance; and identify the changes to the user interface that resulted from the usability findings I uncovered in my previous research. Finally, I will compare the usage patterns of the new interface to the analysis of the previous implementation to examine the practical effect of the implemented changes.

I will also provide access to a genericized version of the interface code for others to build their own implementations of similar kiosk applications.

[1] http://journal.code4lib.org/articles/5832

Wayfinding in a Cloud: Location Service for libraries

  • Petteri Kivimäki, The National Library of Finland, petteri.kivimaki@helsinki.fi

Searching for books in large libraries can be a difficult task for a novice library user. This paper presents The Location Service, a software as a service (SaaS) wayfinding application developed and managed by The National Library of Finland and aimed at all libraries. The service provides additional information and map-based guidance to books and collections by showing their location on a map, and it can be integrated with any library management system, as the integration happens by adding a link to the service in the search interface. The service is being developed continuously based on the feedback received from the users.

The service has two user interfaces: one for the customers and one for the library staff for managing the information related to the locations. The UI for the customers is fully customizable by the libraries, and the customization is done via template files using HTML, CSS, and JavaScript/jQuery. The service supports multiple languages, and the libraries have full control over which languages they support in their environment.

The service is written in Java and uses the Spring and Hibernate frameworks. The data is stored in a PostgreSQL database, which is shared by all the libraries. The libraries do not have direct access to the database, but the service offers an interface that makes it possible to retrieve XML data over HTTP. Modification of the data via the admin UI, however, is restricted, and access to other libraries’ data is blocked.

Empowering Collection Owners with Automated Bulk Ingest Tools for DSpace

  • Terry Brady, Georgetown University, twb27@georgetown.edu

The Georgetown University Library has developed a number of applications to expedite the process of ingesting content into DSpace.

  • Automatically inventory a collection of documents or images to be uploaded
  • Generate a spreadsheet for metadata capture based on the inventory
  • Generate item-level ingest folders, contents files and dublin core metadata for the items to be ingested
  • Validate the contents of ingest folders prior to initiating the ingest to DSpace
  • Present users with a simple, web-based form to initiate the batch ingest process

The applications have eliminated a number of error-prone steps from the ingest workflow and have significantly reduced a number of tedious data editing steps. These applications have empowered content experts to be in charge of their own collections.
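The item-level folders mentioned in the list above follow the general shape of DSpace's Simple Archive Format: one directory per item holding the files, a `contents` file naming them, and a `dublin_core.xml` metadata file. The following is a minimal sketch of that one step and not Georgetown's tool; the `build_item_folder` helper, the metadata hash layout, and the sample values are all hypothetical.

```ruby
require 'fileutils'
require 'tmpdir'

# Escape the characters that are unsafe in XML text and attribute values.
def xml_escape(text)
  text.to_s.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;').gsub('"', '&quot;')
end

# Hypothetical helper: write one Simple Archive Format item folder.
# metadata maps [element, qualifier] pairs to a value,
# e.g. ['date', 'created'] => '1912'.
def build_item_folder(dir, files, metadata)
  FileUtils.mkdir_p(dir)
  files.each { |path| FileUtils.cp(path, dir) }

  # The contents file lists one bitstream filename per line.
  names = files.map { |path| File.basename(path) }
  File.write(File.join(dir, 'contents'), names.join("\n") + "\n")

  values = metadata.map do |(element, qualifier), value|
    %(  <dcvalue element="#{xml_escape(element)}" qualifier="#{xml_escape(qualifier)}">#{xml_escape(value)}</dcvalue>)
  end
  File.write(File.join(dir, 'dublin_core.xml'),
             "<dublin_core>\n#{values.join("\n")}\n</dublin_core>\n")
  dir
end

# Usage: build a single item folder in a scratch directory.
Dir.mktmpdir do |tmp|
  scan = File.join(tmp, 'page1.tif')
  File.write(scan, 'placeholder image bytes')
  item = build_item_folder(File.join(tmp, 'ingest', 'item_001'), [scan],
                           { %w[title none] => 'Campus map & guide',
                             %w[date created] => '1912' })
  puts Dir.children(item).sort.inspect
end
```

Generating these files from a spreadsheet row, rather than by hand, is the kind of step where filename typos and malformed XML tend to disappear.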

In this presentation, I will provide a demonstration of the tools that were built and discuss the development process that was followed.

Quality Assurance Reports for DSpace Collections

  • Terry Brady, Georgetown University, twb27@georgetown.edu

The Georgetown University Library has developed a collection of quality assurance reports to improve the consistency of the metadata in our DSpace collections. The report infrastructure permits the creation of query snippets to test for possible consistency errors within the repository such as items missing thumbnails, items with multiple thumbnails, items missing a creation date, items containing improperly formatted dates, items without duplicated metadata fields, and items recently added across the repository, a community, or a collection.

These reports have served to prioritize programmatic data cleanup tasks and manual data cleanup tasks. The reports have served as a progress tracker for data cleanup work and will provide on-going monitoring of the metadata consistency of the repository.
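The idea of small, named consistency checks can be mimicked in memory, as in the sketch below. It is only an analogy for the query snippets described above: the item hashes, the check names, and the assumption that a properly formatted date means `YYYY-MM-DD` are all hypothetical, and a real report would query the repository instead.

```ruby
require 'date'

# Hypothetical in-memory items standing in for repository records.
ITEMS = [
  { id: 1, thumbnails: ['a.jpg'], date_created: '2012-05-01' },
  { id: 2, thumbnails: [], date_created: '05/01/2012' },
  { id: 3, thumbnails: ['b.jpg', 'c.jpg'], date_created: nil }
].freeze

# True when the value is a YYYY-MM-DD string naming a real calendar date.
# Treating that pattern as the "proper" format is an assumption.
def iso_date?(value)
  return false unless value.is_a?(String) && value.match?(/\A\d{4}-\d{2}-\d{2}\z/)

  Date.valid_date?(*value.split('-').map(&:to_i))
end

# Each check is a small named predicate, standing in for one query snippet.
CHECKS = {
  'missing thumbnail' => ->(item) { item[:thumbnails].empty? },
  'multiple thumbnails' => ->(item) { item[:thumbnails].size > 1 },
  'missing creation date' => ->(item) { item[:date_created].nil? },
  'improperly formatted date' => lambda { |item|
    !item[:date_created].nil? && !iso_date?(item[:date_created])
  }
}.freeze

# Run every check and collect the ids of the items it flags.
def run_report(items, checks)
  checks.transform_values { |check| items.select(&check).map { |item| item[:id] } }
end

run_report(ITEMS, CHECKS).each { |name, ids| puts "#{name}: #{ids.inspect}" }
```

Keeping each check independent means a new consistency rule is one more entry in the table, and the per-check id lists double as a progress tracker as cleanup proceeds.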

In this presentation, I will provide a demonstration of the tools that were built and discuss the development process that was followed.

A Hybrid Solution for Improving Single Sign-On to a Proxy Service with Squid and EZproxy through Shibboleth and ExLibris’ Aleph X-Server

  • Alexander Jerabek, UQAM - Université du Québec à Montréal, jerabek.alexander_j@uqam.ca
  • Minh-Quang Nguyen, UQAM - Université du Québec à Montréal, nguyen.minh-quang@uqam.ca

In this talk, we will describe how we developed and implemented a hybrid solution for improving single sign-on in conjunction with the library’s proxy service. This hybrid solution consists of integrating the disparate elements of EZproxy, the Squid workflow, Shibboleth, and the Aleph X-Server. We will report how this new integrated service improves the user experience. To our knowledge, this new service is unique and has not been implemented anywhere else. We will also present some statistics after approximately one year in production.

See article: http://journal.code4lib.org/articles/7470

HTML5 Video Now!

  • Jason Ronallo, North Carolina State University Libraries, jnronall@ncsu.edu

Can you use HTML5 video now? Yes.

I'll show you how to get started using HTML5 video, including gotchas, tips, and tricks. Beyond the basics we'll see the power of having video integrated into HTML and the browser. We'll look at how to interact with video (and other time-based media) via JavaScript. Finally, we'll look at examples that push the limits and show the exciting future of video on the Web.

My experience comes from technical development of an oral history video clips project. I developed the technical aspects of the project, including video processing, server configuration, development of a public site, creation of an administrative interface, and video engagement analytics. Major portions of this work have been open sourced under an MIT license.

Hybrid Archival Collections Using Blacklight and Hydra

  • Adam Wead, Rock and Roll Hall of Fame and Museum, awead@rockhall.org

At the Library and Archives of the Rock and Roll Hall of Fame, we use available tools such as Archivists' Toolkit to create EAD finding aids of our collections. However, managing digital content created from these materials and the born-digital content that is also part of these collections represents a significant challenge. In my presentation, I will discuss how we solve the problem of our hybrid collections by using Hydra as a digital asset manager and Blacklight as a unified presentation and discovery interface for all our materials.

Our strategy centers around indexing EAD XML into Solr as multiple documents: one for each collection, and one for every series, sub-series and item contained within a collection. For discovery, we use this strategy to leverage item-level searching of archival collections alongside our traditional library content. For digital collections, we use this same technique to represent a finding aid in Hydra as a set of linked objects using RDF. New digital items are then linked to these parent objects at the collection and series level. Once this is done, the items can be exported back out to the Blacklight Solr index and the digital content appears along with the rest of the items in the collection.
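The "one finding aid, many documents" idea can be sketched without any EAD parsing or a running Solr. In the example below the finding aid is reduced to a nested hash, and field names such as `parent_id` and `collection_id` are illustrative assumptions, not the Rock Hall's actual schema.

```ruby
# Hypothetical finding aid reduced to a nested hash;
# real input would be parsed EAD XML.
FINDING_AID = {
  id: 'coll-001', level: 'collection', title: 'Sample Papers',
  children: [
    { id: 'ser-1', level: 'series', title: 'Correspondence',
      children: [
        { id: 'item-1', level: 'item', title: 'Letter, 1975', children: [] }
      ] },
    { id: 'ser-2', level: 'series', title: 'Photographs', children: [] }
  ]
}.freeze

# Walk the hierarchy depth-first and emit one flat, Solr-style document
# per component. Every document keeps a pointer to its parent and to the
# collection it belongs to, so the tree can be rebuilt at display time.
def to_solr_docs(node, collection_id = node[:id], parent_id = nil)
  doc = {
    'id' => node[:id],
    'level' => node[:level],
    'title' => node[:title],
    'collection_id' => collection_id,
    'parent_id' => parent_id
  }
  [doc] + node[:children].flat_map { |child| to_solr_docs(child, collection_id, node[:id]) }
end

docs = to_solr_docs(FINDING_AID)
docs.each { |d| puts d.values_at('id', 'level', 'parent_id').inspect }
```

Because each component is its own document, an item-level hit can sit in the same result list as a catalog record while still linking back to its series and collection.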

Making the Web Accessible through Solid Design

  • Cynthia Ng, Ryerson University Library & Archives

In libraries, we are always trying our best to be accessible to everyone and we make every effort to do so physically, but what about our websites? Web designers are great at talking about the user experience and how to improve it, but what sometimes gets overlooked is how to make a site more accessible and meet accessibility guidelines. While guidelines are necessary to cover a minimum standard, web accessibility should come from good web design without ‘sacrificing’ features. While it's difficult to make a website fully accessible to everyone, there are easy, practical ways to make a site as accessible as possible.

While the focus will be on websites and meeting the Web Content Accessibility Guidelines (WCAG), the presentation will also touch on how to make custom web interfaces accessible.

Getting People to What They Need Fast! A Wayfinding Tool to Locate Books & Much More

  • Steven Marsden, Ryerson University Library & Archives, steven dot marsden at ryerson dot ca
  • Cynthia Ng, Ryerson University Library & Archives

Having a bewildered, lost user in the building or stacks is a common occurrence, but we can help our users find their way through enhanced maps and floor plans. While not a new concept, these maps are integrated into the user’s flow of information without having to load a special app. The map not only highlights the location, but also provides all the related information with a link back to the detailed item view. During the first stage of the project, it has only been implemented for books (and other physical items), but the 'RULA Finder' is built to help users find just about anything and everything in the library including study rooms, computer labs, and staff. With a simple-to-use admin interface, it makes it easy for everyone, staff and users.

The application is written in PHP with data stored in a MySQL database. The end-user interface involves jQuery, JSON, and the library's discovery layer (Summon) API.

The presentation will not only cover the technical aspects, but also the implementation and usability findings.

De-sucking the Library User Experience

  • Jeremy Prevost, Northwestern University, j-prevost {AT} northwestern [DOT] edu

Have you ever thought that library vendors purposely create the worst possible user experience they can imagine because they just hate users? Have you ever thought that your own library website feels like it was created by committee rather than for users because, well, it was? I’ll talk about how we used vendor supplied APIs to our ILS and Discovery tool to create an experience for our users that sucks at least a little bit less.

The talk will provide specific examples of how inefficient or confusing vendor supplied solutions are from a user perspective, along with our specific streamlined solutions to the same problems. Code examples will be minimal, as the focus will be on improving user experience rather than on any one code solution for doing so. Examples may include the seemingly simple tasks of renewing a book or requesting an item from another campus library.

Solr Testing Is Easy with Rspec-Solr Gem

  • Naomi Dushay, Stanford University, ndushay AT stanford DOT edu

How do you know if

  • your idea for "left anchoring" searches actually works?
  • your field analysis for LC call numbers accommodates a suffix between the first and second cutter without breaking the rest of LC call number parsing?
  • tweaking Solr configs to improve, say, Chinese searching, won't break Turkish and Cyrillic?
  • changes to your solrconfig file accomplish what you wanted without breaking anything else?

Avoid the whole app stack when writing Solr acceptance/relevancy/regression tests! Forget cucumber and capybara. This gem lets you easily (only 4 short files needed!) write tests like this, passing arbitrary parameters to Solr:

 it "unstemmed author name Zare should precede stemmed variants" do
   resp = solr_response(author_search_args('Zare').merge({'fl'=>'id,author_person_display', 'facet'=>false}))
   resp.should include("author_person_display" => /\bZare\W/).in_each_of_first(3).documents
   resp.should_not include("author_person_display" => /Zaring/).in_each_of_first(20).documents
 end
     
 it "Cyrillic searching should work:  Восемьсoт семьдесят один день" do
   resp = solr_resp_doc_ids_only({'q'=>'Восемьсoт семьдесят один день'})
   resp.should include("9091779")
 end
  
 it "q of 'String quartets Parts' and variants should be plausible " do
   resp = solr_resp_doc_ids_only({'q'=>'String quartets Parts'})
   resp.should have_at_least(2000).documents
   resp.should have_the_same_number_of_results_as(solr_resp_doc_ids_only({'q'=>'(String quartets Parts)'}))
   resp.should have_more_results_than(solr_resp_doc_ids_only({'q'=>'"String quartets Parts"'}))
 end
  
 it "Traditional Chinese chars 三國誌 should get the same results as simplified chars 三国志" do
   resp = solr_response({'q'=>'三國誌', 'fl'=>'id', 'facet'=>false}) 
   resp.should have_at_least(240).documents
   resp.should have_the_same_number_of_results_as(solr_resp_doc_ids_only({'q'=>'三国志'})) 
 end

See http://rubydoc.info/github/sul-dlss/rspec-solr/frames and https://github.com/sul-dlss/rspec-solr

See also our production relevancy/acceptance/regression tests, which are slowly migrating from cucumber to: https://github.com/sul-dlss/sw_index_tests

Northwestern's Digital Image Library

  • Mike Stroming, Northwestern University Library, m-stroming AT northwestern DOT edu
  • Edgar Garcia, Northwestern University Library, edgar-garcia AT northwestern DOT edu

At Northwestern University Library, we are about to release a beta version of our Digital Image Library (DIL). DIL is an implementation of the Hydra technology that provides a Fedora repository solution for discovery of and access to over 100,000 images for staff, students, and scholars. Some important features are:

  • Build custom collection of images using drag-and-drop
  • Re-order images within a collection using drag-and-drop
  • Nest collections within other collections
  • Create details/crops of images
  • Zoom, rotate images
  • Upload personal images
  • Retrieve your own uploads and details from a collection
  • Export a collection to a PowerPoint presentation
  • Create a group of users and authorize access to your images
  • Batch edit image metadata

Our presentation will include a demo, explanation of the architecture, and a discussion of the benefits of being a part of the Hydra open-source community.

Two standards in a software (to say nothing of Normarc)

  • Zeno Tajoli, CINECA (Italy), z DOT tajoli AT cineca DOT it

With this presentation I want to show how the Koha ILS handles the support of three different MARC dialects: MARC21, Unimarc and Normarc. The main points of the presentation:

  • Three MARC at MySQL level
  • Three MARC at API level
  • Three MARC at display
  • Can I add a new format ?

Future Friendly Web Design for Libraries

  • Michael Schofield, Alvin Sherman Library, Research, and Information Technology Center, mschofied[dot]nova[dot]edu

Libraries on the web are afterthoughts. Often their design is stymied on one hand by red tape imposed by the larger institution and on the other by an overload of too democratic input from colleagues. Slashed budgets / staff stretched too thin foul up the R-word (that'd be "redesign") - but things are getting pretty strange. Notions about the Web (and where it can be accessed) are changing.

So libraries can only avoid refabbing their fixed-width desktop and jQuery Mobile m-dot websites for so long until desktop users evaporate and demand from patrons with web-ready refrigerators becomes deafening. Just when we have largely hopped on the bandwagon and gotten enthusiastic about being online, our users expect a library's site to look and perform great on everything.

Our presence on the web should be built to weather ever-increasing device complexity. To meet users at their point of need, libraries must start thinking Future Friendly.

This overview rehashes the approach and philosophy of library web design, re-orienting it for maximum accessibility and maximum efficiency of design. While just 20 minutes, we'll mull over techniques like mobile-first responsive web design, modular CSS, browser feature detection for progressive enhancement, and lots of nifty tricks.

BYU's discovery layer service aggregator

  • Curtis Thacker, Brigham Young University, curtis.thacker AT byu DOT edu

It is clear that libraries will continue to experience rapid change based on the speed of technology. To acknowledge this new reality and to provide rapid response to shifting end user paradigms BYU has developed a custom service aggregator. At first our vendors looked at us a bit funny; however, in the last year they have been astonished with the fluid implementation of new services – here’s the short list:

  • filmfinder - a tool for browsing and searching films
  • A custom book recommender service based on checkout data
  • Integrated library services like personnel, library hours, study room scheduler and database finder through a custom adwords system.
  • A very geeky and powerful utility used for converting MARC XML into Primo-compliant XML.
  • Embedded floormaps
  • A responsive web design
  • Bing did-you-mean
  • And many more.

I will demo the system, review the architecture and talk about future plans.

The Avalon Media System: A Next Generation Hydra Head For Audio and Video Delivery

  • Michael Klein, Senior Software Developer, Northwestern University Library, michael.klein AT northwestern DOT edu
  • Nathan Rogers, Programmer/Analyst, Indiana University, rogersna AT indiana DOT edu

Based on the success of the Variations digital music platform, Indiana University and Northwestern University have developed a next generation educational tool for delivering multimedia resources to the classroom. The Avalon Media System (formerly Variations on Video) supports the ingest, media processing, management, and access-controlled delivery of library-managed video and audio collections. To do so, the system draws on several existing, mature, open source technologies:

  • The ingest, search, and discovery functionality of the Hydra framework
  • The powerful multimedia workflow management features of Opencast Matterhorn
  • The flexible Engage audio/video player
  • The streaming capabilities of both Red5 Media Server (open source) and Adobe Flash Media Server (proprietary)

Extensive customization options are built into the framework for tailoring the application to the needs of a specific institution.

Our goal is to create an open platform that can be used by other institutions to serve the needs of the academic community. Release 1 is planned for a late February launch with future versions released every couple of months following. For more information visit http://avalonmediasystem.org/ and https://github.com/variations-on-video/hydrant.

The DH Curation Guide: Building a Community Resource

  • Robin Davis, John Jay College of Criminal Justice, robdavis AT jjay.cuny.edu
  • James Little, University of Illinois Urbana-Champaign, little9 AT illinois.edu

Data curation for the digital humanities is an emerging area of research and practice. The DH Curation Guide, launched in July 2012, is an educational resource that addresses aspects of humanities data curation in a series of expert-written articles. Each provides a succinct introduction to a topic with annotated lists of useful tools, projects, standards, and good examples of data curation done right. The DH Curation Guide is intended to be a go-to resource for data curation practitioners and learners in libraries, archives, museums, and academic institutions.

Because it's a growing field, we designed the DH Curation Guide to be a community-driven, living document. We developed a granular commenting system that encourages data curation community members to contribute remarks on articles, article sections, and article paragraphs. Moreover, we built in a way for readers to contribute and annotate resources for other data curation practitioners.

This talk will address how the DH Curation Guide is currently used and will include a sneak peek at the articles that are in store for the Guide’s future. We will talk about the difficulties and successes of launching a site that encourages community. We are all builders here, so we will also walk through developing the granular commenting/annotation system and the XSLT-powered publication workflow.

Solr Update

  • Erik Hatcher, LucidWorks, erik.hatcher AT lucidworks.com

Solr is continually improving. Solr 4 was recently released, bringing dramatic changes in the underlying Lucene library and Solr-level features. It's tough for us all to keep up with the various versions and capabilities.

This talk will blaze through the highlights of new features and improvements in Solr 4 (and up). Topics will include: SolrCloud, direct spell checking, surround query parser, and many other features. We will focus on the features library coders really need to know about.

Reports for the People

  • Kara Young, Keene State College, NH, kyoung1 at keene.edu
  • Dana Clark, Keene State College, NH, dclark5 at keene.edu

Libraries are increasingly being called upon to provide information on how our programs and services are moving our institutional strategic goals forward. In support of College and departmental Information Literacy learning outcomes, Mason Library Systems at Keene State College developed an assessment database to record and report assessment activities by Library faculty. Frustrated by the lack of freely available options for intuitively recording, accounting for, and outputting useful reports on instructional activities, Librarians requested a tool to make capturing and reporting activities (and their lives) easier. Library Systems was able to respond to this need by working with librarians to identify what information is necessary to capture, where other assessment tools had fallen short, and ultimately by developing an application that supports current reporting imperatives while providing flexibility for future changes.

The result of our efforts was an in-house, browser-based Assessment Database to improve the process of data collection and analysis. The application is written in PHP, with data stored in a MySQL database, and is presented in the browser, making extensive use of jQuery and jQuery plug-ins for data collection, manipulation, and presentation. The presentation will outline the process undertaken to build a successful collaboration with Library faculty from conception to implementation, as well as the technical aspects of our trial-and-error approach. Plus: cool charts and graphs!

Network Analyses of Library Catalog Data

  • Kirk Hess, University of Illinois at Urbana-Champaign, kirkhess AT illinois.edu
  • Harriett Green, University of Illinois at Urbana-Champaign, green19 AT illinois.edu

Library collections are all too often like icebergs: The amount exposed on the surface is only a fraction of the actual amount of content, and we’d like to recommend relevant items from deep within the catalog to users. With the assistance of an XSEDE Allocation grant (http://xsede.org), we’ve used R to reconstitute anonymous circulation data from the University of Illinois’s library catalog into separate user transactions. The transaction data is incorporated into subject analyses that use XSEDE supercomputing resources to generate predictive network analyses and visualizations of subject areas searched by library users using Gephi (https://gephi.org/). The test data set for developing the subject analyses consisted of approximately 38,000 items from the Literatures and Languages Library that contained 110,000 headings and 130,620 transactions. We’re currently working on developing a recommender system within VuFind to display the results of these analyses.

Pitfall! Working with Legacy Born Digital Materials in Special Collections

  • Donald Mennerich, The New York Public Library, don.mennerich AT gmail.com
  • Mark A. Matienzo, Yale University Library, mark AT matienzo.org

Archives and special collections are faced with a growing abundance of born digital material, as well as an abundance of promising tools for managing it. However, one must consider the potential problems that can arise when approaching a collection containing legacy materials (from roughly the pre-internet era). Many of the tried and true, "best of breed" tools for digital preservation don't always work as they do for more recent materials, requiring a fair amount of ingenuity and use of "word of mouth tradecraft and knowledge exchanged through serendipitous contacts, backchannel conversations, and beer" (Kirschenbaum, "Breaking badflag").

Our presentation will focus on some of the strange problems encountered and creative solutions devised by two digital archivists in the course of preserving, processing, and providing access to collections at their institutions. We'll be placing particular emphasis on the pitfalls and crocodiles we've learned to swing over safely, while collecting treasure in the process. We'll address working with CP/M disks in collections of authors' papers, reconstructing a multipart hard drive backup spread across floppy disks, and more.

Project foobar FUBAR

  • Becky Yoose, Grinnell College, yoosebec AT grinnell DOT edu

Be it mandated from Those In A Higher Pay Grade Than You or self-inflicted, many of us deal with managing major library-related technology projects [1]. It’s common nowadays to manage multiple technology projects, and generally external and internal issues can be planned for to minimize project timeline shifts and quality of deliverables. Life, however, has other plans for you, and all your major library technology infrastructure projects pile on top of each other at the same time. How do you and your staff survive a train wreck of technology projects and produce deliverables to project stakeholders without having to go into the library IT version of the United States Federal Witness Protection Program?

This session covers my experience with the collision of three major library technology projects - including a new institutional repository and an integrated library system migration - and how we dealt with external and internal factors, implemented damage control, and overall lessened the damage from the epic crash. You might laugh, you might cry, you will probably have flashbacks from previous projects, but you will come out of this session with a set of tools to use when you’re dealing with managing mission-critical projects.

[1] Past code4lib talks have covered specific project management strategies, such as Agile, for application development. I will be focusing on and discussing general project management practices in relation to various library technology projects, many of which these strategies include in their own structures.

Implementing RFID in an Academic Library

  • Scott Bacon, Coastal Carolina University, sbacon AT coastal DOT edu

Coastal Carolina University’s Kimbel Library recently implemented RFID to increase security, provide better inventory control over library materials and enable do-it-yourself patron services such as self checkout.

I’ll give a quick overview of RFID and the components involved and then will talk about how our library utilized the technology. It takes a lot of research, time, money and no small amount of resourcefulness to make your library RFID-ready. I’ll show how we developed our project timeline, how we assessed and evaluated vendors and how we navigated the bid process. I’ll also talk about hardware and software installation, configuration and troubleshooting and will discuss our book and media collection encoding process.

We encountered myriad issues with our vendor, the hardware and the software. Would we do it all over again? Should your library consider RFID? Caveats abound...

Coding an Academic Library Intranet in Drupal: Now We're Getting Organizized...

  • Scott Bacon, Coastal Carolina University, sbacon AT coastal DOT edu

The Kimbel Library Intranet is coded in Drupal 7, and was created to increase staff communication and store documentation. This presentation will contain an overview of our intranet project, including the modules we used, implementation issues, and possible directions in future development phases. I won’t forget to talk about the slew of tasty development issues we faced, including dealing with our university IT department, user buy-in, site navigation, user roles, project management, training and mobile modules (or the lack thereof). And some other fun (mostly) true anecdotes will surely be shared.

The main functions of Phase I of this project were to increase communication across departments and committees, facilitate project management and revise the library's shared drive. Another important function of this first phase was to host mission-critical documentation such as strategic goals, policies and procedures. Phase II of this project will focus on porting employee tasks into the centralized intranet environment. This development phase, which aims to replicate and automate the bulk of staff workflows within a content management system, will be a huge undertaking.

We chose Drupal as our intranet platform because of its extensibility, flexibility and community support. We are also moving our entire library web presence to Drupal in 2013 and will be soliciting any advice on which modules to use/avoid and which third-party services to wrangle into the Drupal environment. Should we use Drupal as the back-end to our entire Web presence? Why or why not?

Hands off! Best Practices and Top Ten Lists for Code Handoffs

  • Naomi Dushay, Stanford University Library, ndushay AT stanford DOT edu (as mouthpiece for multiple contributors)

Transition points in who is the primary developer on an actively developing code base can be a source of frustration for everyone involved. We've tried to minimize that pain point as much as possible through the use of agile methods like test driven development, continuous integration, and modular design. Has optimizing for developer happiness brought us happiness? What's worked, what hasn't, and what's worth adopting? How do you keep your project in a state where you can easily hand it off?

How to be an effective evangelist for your open source project

  • Bess Sadler, Stanford University Library, bess@stanford.edu

The difference between an open source software project that gets new adopters and new contributing community members (which is to say, a project that goes on existing for any length of time) and a project that doesn't, often isn't a question of superior design or technology. It's more often a question of whether the advocates for the project can convince institutional leaders AND front line developers that a project is stable and trustworthy. What are successful strategies for attracting development partners? I'll try to answer that and talk about what we could do as a community to make collaboration easier.

Thoughts from an open source vendor - What makes a "good" vendor in a meritocracy?

  • Matt Zumwalt, Data Curation Experts / MediaShelf / Hydra Project, matt@curationexperts.com

What is the role of vendors in open source? What should be the position of vendors in a meritocracy? What are the avenues for encouraging great vendors who contribute to open source communities in valuable ways? How you answer these questions has a huge impact on a community, and in order to formulate strong answers, you need to be well informed. Let’s glimpse at the business practicalities of this situation, beginning with 1) an overview of the viable profit models for open-source software, 2) some of the realities of vendor involvement in open source, and 3) an account of the ins & outs of compensation & equity structures within for-profit corporations.

The topics of power & influence, fairness, community participation, software quality, employment and personal profit are fair game, along with software licensing, support, sponsorship, closed source software and the role of sales people.

This presentation will draw on personal experience from the past seven years spent bootstrapping and running MediaShelf, a small but prolific for-profit consulting company that focuses entirely on open source digital repository software. MediaShelf has played an active role in creating the Hydra Framework and continuously contributes to maintenance of Fedora and Blacklight. Those contributions have been funded through consulting contracts for authoring & implementing open source software on behalf of organizations around the world.

Occam’s Reader: A system that allows the sharing of eBooks via Interlibrary Loan

  • Ryan Litsey, Texas Tech University, Ryan DOT Litsey AT ttu.edu
  • Kenny Ketner, Texas Tech University, Kenny DOT Ketner AT ttu.edu

Occam’s Reader is a software platform that allows the transfer and sharing of electronic books between libraries via existing interlibrary loan software. Occam’s Reader allows libraries to meet the growing need to be able to share our electronic resources. In the ever-increasing digital world, many of our collection development plans now include eBook platforms. The problem with eBooks, however, is that they are resources that are locked into the home library. With Occam’s Reader we can continue the centuries-old tradition of resource sharing and also keep up with the changing digital landscape.


Using Puppet for configuration management when no two servers look alike

  • Eugene Vilensky, Senior Systems Administrator, Northwestern University Library, evilensky northwestern edu

Configuration management is hot because it allows one to scale to thousands of machines, all of which look alike, and tightly manage changes across the nodes. Infrastructure as code, implement all changes programmatically, yadda yadda yadda.

Unfortunately, servers which have gone unmanaged for a long time do not look very similar to each other. Variables come in many forms, usually because of some or all of the following: Who installed the server, where it was installed, where the image was sourced from, when it was installed, where additional packages were sourced, and what kind of software was hosted on it.

Bringing such machines into your configuration management platform is no harder and no easier than some or all of the following options: 1) Blow such machines away, start from scratch, and migrate your data. 2) Find the lowest common baseline between the current state and the ideal state and start the work there. 3) Implement new features/services on existing unmanaged machines, but manage the new features/services.

I will describe our experiences at the library for all three options using the Puppet open-source tool on Enterprise Linux 5 and 6.

REST IS Your Mobile Strategy

  • Richard Wolf, University of Illinois at Chicago, richwolf@uic.edu

Mobile is the new hotness ... and you can't be one of the cool kids unless you've got your own mobile app ... but the road to mobility is daunting. I'll argue that it's actually easier than it seems ... and that the simplest way to mobility is to bring your data to the party, create a REST API around the data, tell developers about your API, and then let the magic happen. To make my argument concrete, I'll show (lord help me!) how to go from an interesting REST API to a fun iOS tool for librarians and the general public in twenty minutes.

ARCHITECTING ScholarSphere: How We Built a Repository App That Doesn't Feel Like Yet Another Janky Old Repository App

  • Dan Coughlin, Penn State University, danny@psu.edu
  • Mike Giarlo, Penn State University, michael@psu.edu

ScholarSphere is a web application that allows the Penn State research community to deposit, share, and manage its scholarly works. It is also, as some of our users and our peers have observed, a repository app that feels much more like Google Docs or GitHub than earlier-generation repository applications. ScholarSphere is built upon the Hydra framework (Fedora Commons, Solr, Blacklight, Ruby on Rails), MySQL, Redis, Resque, FITS, ImageMagick, jQuery, Bootstrap, and FontAwesome. We'll talk about techniques we used to:

  • eliminate Fedora-isms in the application
  • model and expose RDF metadata in ways that users find unobtrusive
  • manage permissions via a UI widget that doesn't stab you in the face
  • harvest and connect controlled vocabularies (such as LCSH) to forms
  • make URIs cool
  • keep the app snappy without venturing into the architectural labyrinth of YAGNI
  • build and queue background jobs
  • expose social features and populate activity streams
  • tie checksum verification, characterization, and version control to the UI
  • let users upload and edit multiple files at once

The application will be demonstrated; code will be shown; and we solemnly commit to showing ABSOLUTELY NO XML.

Coding with Mittens

  • Jim LeFager, DePaul University Library, jlefager@depaul.edu


Working in an environment where developers have restricted access to servers and development areas, or where you are primarily working in multiple hosted systems with limited access, can be a challenge when you are attempting to incorporate any new functionality or improve an existing one. Hosted web services offer the benefit that staff time is not dedicated to server maintenance and development, but customization can be difficult and at times impossible. In many cases, incorporating any current API functionality requires work beyond the original development effort, which can be frustrating and inefficient. The result can be a Frankenstein monster of web services that is confusing to the user and difficult to navigate.

This talk will focus on some effective best practices, and some maybe not so great but necessary practices, that we have adopted to develop and improve our users' experience using JavaScript/jQuery and CSS to manipulate our hosted environments. This will include a review of available tools that allow collaborative development in the cloud, as well as examples of jQuery methods that have allowed us to take additional control of these hosted environments as well as track them using Google Analytics. Included will be examples from Springshare Campus Guides, CONTENTdm and other hosted web spaces that have been ‘hacked’ to improve the UI.


Hacking the DPLA

  • Nate Hill, Chattanooga Public Library, nathanielhill AT gmail.com
  • Sam Klein, Wikipedia, metasj AT gmail.com

The Digital Public Library of America is a growing open-source platform to support digital libraries and archives of all kinds. DPLA-alpha is available for testing, with data from six initial Hubs. New APIs and data feeds are in development, with the next release scheduled for April.

Come learn what we are doing, how to contribute or hack the DPLA roadmap, and how you (or your favorite institution) can draw from and publish through it. Larger institutions can join as a (content or service) hub, helping to aggregate and share metadata and services from across their {region, field, archive-type}. We will discuss current challenges and possibilities (UI and API suggestions wanted!), apps being built on the platform, and related digitization efforts.

DPLA has a transparent community and planning process; new participants are always welcome. Half the time will be for suggestions and discussion. Please bring proposals, problems, partnerships and possible paradoxes to discuss.

Introduction to SilverStripe 3.0

  • Ian Walls, University of Massachusetts Amherst, iwalls AT library DOT umass DOT edu

SilverStripe is an open source Content Management System/development framework out of New Zealand, written in PHP, with a solid MVC structure. This presentation will cover everything you need to know to get started with SilverStripe, including

  • Features (and why you should consider SilverStripe)
  • Requirements & Installation
  • Model-View-Controller
  • Key data types & configuration settings
  • Modules
  • Where to start with customization
  • Community support and participation

Citation search in SOLR and second-order operators

  • Roman Chyla, Astrophysics Data System, roman.chyla AT (cfa.harvad.edu|gmail.com)

Citation search is basically about connections (Is the paper read by a friend of mine more important than others? Get me a paper read by somebody who cites many papers/is cited by many papers?), but the implementation of the citation search is surprisingly useful in many other areas.

I will show the 'guts' of the new citation search for astrophysics. It is generic and can be applied recursively to any Lucene query. Some people would call it a second-order operation because it works with the results of the previous (search) function. The talk will cover technical details of the special query class, its collectors, how to add a new search operator and how to influence relevance scores. Then you can type with me: friends_of(friends_of(cited_for(keyword:"black holes") AND keyword:"red dwarf"))
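
To show what "works with the results of the previous function" can mean in practice, here is a toy Python sketch. It is not the Lucene query class from the talk: the corpus is invented, and `citing` and `cited_by` are hypothetical operators whose semantics are assumed purely for illustration. Each operator takes a result set and returns a new one, so calls can be nested and combined with ordinary set logic.

```python
# Toy corpus: paper id -> keywords and the ids it cites (all invented).
PAPERS = {
    "A": {"keywords": {"black holes"}, "cites": set()},
    "B": {"keywords": {"red dwarf"}, "cites": {"A"}},
    "C": {"keywords": {"galaxies"}, "cites": {"A", "B"}},
    "D": {"keywords": {"red dwarf"}, "cites": {"C"}},
}


def keyword(term):
    """First-order query: ids of papers tagged with the term."""
    return {pid for pid, paper in PAPERS.items() if term in paper["keywords"]}


def citing(results):
    """Second-order operator: papers that cite anything in `results`."""
    return {pid for pid, paper in PAPERS.items() if paper["cites"] & results}


def cited_by(results):
    """Second-order operator: papers cited by anything in `results`."""
    cited = set()
    for pid in results:
        cited |= PAPERS[pid]["cites"]
    return cited


# Operators nest, and their results combine with set intersection (AND).
nested = citing(citing(keyword("black holes"))) & keyword("red dwarf")
print(sorted(nested))
```

Because every operator consumes and produces the same kind of value (a set of ids here), adding a new operator does not require changing the existing ones.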


Managing Segmented Images and Hierarchical Collections with Fedora-Commons and Solr

  • David Lacy, Villanova University, david DOT lacy AT villanova.edu

Many of the resources within our digital library are split into parts -- newspapers, scrapbooks and journals being examples of collections of individual scanned pages. In some cases, groups of pages within a collection, or segments within a particular page, may also represent chapters or articles.

We recently devised a procedure to extract these "segmented resources" into their own objects within our repository, and index them individually in our Discovery Layer.

In this talk I will explain how we dissected and organized these newly created resources with an extension to our Fedora Model, and how we make them discoverable through Solr configurations that facilitate browsable hierarchical relationships and field-collapsed results that group items within relevant resources.
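
As a hedged illustration of field-collapsed results, the Python sketch below builds a Solr query string that uses the result grouping parameters (`group`, `group.field`, `group.limit`). The Solr URL and the `parent_id` and `title` field names are assumptions invented for the example; they are not taken from the Villanova configuration.

```python
from urllib.parse import urlencode

# Hypothetical Solr core URL, for illustration only.
SOLR_SELECT = "http://localhost:8983/solr/digital_library/select"


def grouped_query_url(query, parent_field="parent_id", per_group=3):
    """Build a search URL whose hits are collapsed by parent resource."""
    params = {
        "q": query,
        "group": "true",              # turn on result grouping
        "group.field": parent_field,  # collapse segments under their parent
        "group.limit": per_group,     # how many segments to show per parent
        "wt": "json",
    }
    return SOLR_SELECT + "?" + urlencode(params)


print(grouped_query_url("title:gazette"))
```

With grouping on, a newspaper with many matching pages appears as one group rather than crowding out other resources.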

Google Analytics, Event Tracking and Discovery Tools

  • Emily Lynema, North Carolina State University Libraries, ejlynema AT ncsu DOT edu
  • Adam Constabaris, North Carolina State University Libraries, ajconsta AT ncsu DOT edu

The NCSU Libraries is using Google Analytics increasingly across its website as a replacement for usage tracking via Urchin. More recently, we have also begun to use the event tracking features in Google Analytics. This has allowed us to gather usage statistics for activities that don’t initiate new requests to the server, such as clicks that hide and show already-loaded content (as in many tabbed interfaces). Aggregating these events together with pageview tracking in Google Analytics presents a more unified picture of patron activity and can help improve design of tools like the library catalog. While assuming a basic understanding of the use of Google Analytics pageview tracking, this presentation will start with an introduction to the event tracking capabilities that may be less widely known.

We’ll share library catalog usage data pulled from Google Analytics, including information about features that are common across the newest wave of catalog interfaces, such as tabbed content, Google Preview, and shelf browse. We will also cover the approach taken for the technical implementation of this data-intensive JavaScript event tracking.

As a counterpart, we can demonstrate how we have begun to use Google Analytics event tracking in a proprietary vendor discovery tool (Serials Solutions Summon). While the same technical ideas govern this implementation, we can highlight the differences (read, challenges) inherent in utilizing this type of event tracking in a vendor-owned application vs. a locally developed application.

Along the way, hopefully you’ll learn a little about why you might (or might not) want to use Google Analytics event tracking yourself and see some interesting catalog usage stats.

Actions speak louder than words: Analyzing large-scale query logs to improve the research experience

  • Raman Chandrasekar, Serials Solutions, Raman DOT Chandrasekar AT serialssolutions DOT com
  • Ted Diamond, Serials Solutions, Ted DOT Diamond AT serialssolutions DOT com

Analyzing anonymized query and click through logs leads to a better understanding of user behaviors and intentions and provides great opportunities to respond to users with an improved search experience. A large-scale provider of SaaS services, Serials Solutions is uniquely positioned to learn from the dataset of queries aggregated from the Summon service generated by millions of users at hundreds of libraries around the world.

In this session, we will describe our Relevance Metrics Framework and provide examples of insights gained during its development and implementation. We will also cover recent product changes inspired by these insights. Chandra and Ted, from the Summon dev team, will share insights and outcomes from this ongoing process and highlight how analysis of large-scale query logs helps improve the academic research experience.

Supporting Gaming in the College Classroom

  • Megan O'Neill, Albion College, moneill AT albion DOT edu

Faculty are increasingly interested both in teaching with games and in gamifying their courses. Introducing digital games and game support for faculty through the library makes a lot of sense, but it comes with a thorny set of issues. This talk will discuss our library's initial steps toward creating a digital gamerspace and game support infrastructure in the library, including: 1) The scope and acquisitions decisions that make the most sense for us, and 2) Some difficulties we've discovered in trying to get our collection, physical- , digital- and head-space, and infrastructure up and going. There will also be an extremely brief overview of WHY we decided to teach with games and to support gamification, what (if anything) to do about mobile gaming, and where games in education might be going.

Codecraft

  • Devon Smith, OCLC Research, smithde@oclc.org

We can think of and talk about software development as science, engineering, and craft. In this presentation, I'll talk about the craft aspect of software. From Wikipedia[1]: "In English, to describe something as a craft is to describe it as lying somewhere between an art (which relies on talent and technique) and a science (which relies on knowledge). In this sense, the English word craft is roughly equivalent to the ancient Greek term techne." Of the questions who, what, where, why, when, and how, I will focus on why and how, with a minor in where.

N.B.: This will be a NON-TECHNICAL talk.

[1] https://en.wikipedia.org/wiki/Craft#Classification

KnowBot: A Tool to Manage Reference and Beyond

  • Sarah Park, Northwest Missouri State University
  • Hong Gyu Han, Northwest Missouri State University
  • Lori Mardis, Northwest Missouri State University

Northwest Missouri State University has developed and used RefPole for collecting and analyzing reference statistics since 2005. RefPole was a tool to answer librarians’ needs to manage reference statistics and knowledge among librarians. It was an analysis tool for the library leaders to make decisions on library operations. RefPole was adequate for the internal use; however, it was developed for local access which keeps the collective reference knowledge from being shared beyond the desktop and from being accessed by students and faculty.

In 2011, responding to growing internal and external need, the library developed a web-based knowledge base management system, KnowBot, in Ruby on Rails. KnowBot offers public searching, rating, cloud tagging, librarian, and reporting interfaces. With the additional public interfaces, it also extended reference services to 24/7. Librarians can record responses to questions with graphics and multimedia. The reporting interface features not only simple transactional data but also a multi-dimensional analytic tool that works in real time.

The presenters will demonstrate KnowBot; share the source code; and discuss the use of the knowledge base to answer the organizational and public need.

Creating a (mostly) integrated Patron Account with SirsiDynix Symphony and ILLiad

  • Emily Lynema, North Carolina State University Libraries, ejlynema AT ncsu DOT edu
  • Jason Raitz, North Carolina State University Libraries, jcraitz AT ncsu DOT edu

In 2012, the NCSU Libraries at long last replaced a vendor “my account” tool that had been running unsupported for years. With the opportunity to create something new, one of the initial goals was a user experience that more seamlessly combined ILS data from SirsiDynix Symphony with ILL data from ILLiad. As a Kuali OLE beta partner, the NCSU Libraries is looking at an ILS migration within the next few years, so another goal was to build the interface on top of a standard so it would not have to be re-written as part of the migration. And the icing on the cake was a transition from a local Perl-based authentication system to the newer campus-wide Shibboleth authentication.

This presentation will start with our design goals for a new user interface, include a demonstration, and describe the simple techniques used to provide a more integrated view of Symphony and ILLiad patron data. The backbone of the actual application is built using Zend’s PHP Framework and integrates eXtensible Catalog’s NCIP Toolkit to reach out to Symphony for patron data. In addition, we can talk about our successes (and difficulties) using jQuery Mobile to create a mobile view using the same underlying code as the web version. As one of our first Shibboleth applications here in the Libraries, this experience also taught us first-hand about some of the challenges of this type of single sign-on.

SKOS Name Authority in a DSpace Institutional Repository

  • Tom Johnson, Oregon State University, thomas.johnson@oregonstate.edu

Name ambiguity is widespread in institutional repositories. Searching by author, users are typically greeted by a variety of misspellings and permutations of initials, collision between contributors with similar names, and other problems inherent in uncontrolled (often user-submitted) data. While DSpace has the technical capacity to use controlled names, it relies on outside authority files (from LoC, for example) to do the heavy lifting. For institutional authors, this leaves a major coverage gap and creates namespace pollution on a vast scale (try searching authorities.loc.gov for "Johnson, John", sometime).

OSU is solving this problem with an institutionally scoped, low maintenance SKOS/FOAF "name authority file". People in the IR are assigned URIs, names are maintained as skos:prefLabel, altLabel, or hiddenLabel. We've developed a simple Python application allowing staff to update individual "records", and code on the DSpace side to access the dataset over SPARQL. This presentation will walk you through where we are now, limitations we've run into, and possibilities for the future.
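
The role of the three label types can be sketched in a few lines of Python. This is an illustrative in-memory stand-in, not the OSU application or its SPARQL access: the URI, the name variants, and the `build_lookup` and `display_name` helpers are all invented. Every variant resolves to the person's URI, while only the preferred label is shown to users.

```python
# Invented record: person URI -> SKOS-style labels.
AUTHORITY = {
    "http://example.org/people/1": {
        "prefLabel": "Doe, Jane",
        "altLabel": ["Doe, J.", "Doe, Jane Q."],
        "hiddenLabel": ["Deo, Jane"],  # known misspelling, matched but never displayed
    },
}


def build_lookup(authority):
    """Map every label variant (lowercased) to the URIs that carry it."""
    lookup = {}
    for uri, labels in authority.items():
        variants = [labels["prefLabel"]]
        variants += labels.get("altLabel", [])
        variants += labels.get("hiddenLabel", [])
        for variant in variants:
            lookup.setdefault(variant.lower(), set()).add(uri)
    return lookup


def display_name(uri, authority):
    """Return the one name form that should be shown for a person."""
    return authority[uri]["prefLabel"]


lookup = build_lookup(AUTHORITY)
for uri in lookup["deo, jane"]:
    print(uri, "->", display_name(uri, AUTHORITY))
```

Mapping each variant to a set of URIs leaves room for the case where two distinct people share a name form.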

Meta-Harvesting: Harvesting the Harvesters

  • Steven Anderson, Boston Public Library, sanderson AT bpl DOT org
  • Eben English, Boston Public Library, eenglish AT bpl DOT org

The emerging Digital Public Library of America (http://dp.la/) has proposed to aggregate digital content for search and discovery from several regional "service hubs" that will provide metadata via an as-yet-unspecified harvest process. As these service hubs are already harvesters of digital content from myriad sources themselves, the potential for "telephone game"-esque data loss and/or transmutation is a significant danger.

This talk will discuss the experience of Digital Commonwealth (http://www.digitalcommonwealth.org/), a statewide digital repository currently in the process of being revamped, refactored, and redesigned by the Boston Public Library using the Hydra Framework. The repository, which aggregates data from over 20 institutions (some of which are themselves aggregators), is also undergoing a massive metadata cleanup effort as records are prepared to be ingested into the DPLA as one of the regional service hubs. Topics will include automated and manual processes for data crosswalking and cleanup, advanced OAI-PMH chops, and the implications of the (at this time still-emerging) metadata standards and APIs being created by the DPLA.

Every crosswalk, transformation, migration, harvest, or export/ingest of metadata requires informed decision making and precise attention to detail. This talk will provide insight into key decision points and potential quagmires, as well as a discussion of the challenges of dealing with heterogeneous data from a wide variety of institutions.

Pay No More Than £3 // DIY Digital Curation

  • Chris Fitzpatrick, World Maritime University, cf AT wmu DOT se

Are you a small library or archive?
Do you feel you are being held back by limited technical resources?
Tired of waiting around for the Google Books Library people to reply to your emails?

Join the club. Open-source software, hackerspaces, dirt cheap storage, cloud computing, and social media make it possible for any institution to start curating digitally. Today. This talk will cover some of the guerrilla tactics being employed to drag a small university's large collection into the internet age.

Topics will include:

  • Cheap and effective document scanning methods.
  • Valuable resources found at your local hackerspace / makerspace / fablab.
  • Metadata enrichment for the not-so-rich and NLP for the people.
  • Utilizing social media to crowdsource your collection building.
  • How to post-process, OCR, PDF, and ePub your documents using Free software.
  • Ways to build out a digital repository with no servers, code, or large 2-year grants required. (ok, maybe some code).

IIIF: One Image Delivery API to Rule Them All

  • Willy Mene, Stanford University Libraries, wmene AT stanford DOT edu
  • Stuart Snydman, Stanford University Libraries, snydman AT stanford DOT edu

The International Image Interoperability Framework was conceived of by a group of research and national libraries determined to achieve the holy grail of seamless sharing and reuse of images in digital image repositories and applications. By converging on common APIs for image delivery, metadata transmission and search, it is catalyzing the development of a new wave of interoperable image delivery software that will surpass the current crop of image viewers, page turners, and navigation systems, and in so doing give scholars an unprecedented level of consistent and rich access to image-based resources across participating repositories.

The IIIF Image API (http://library.stanford.edu/iiif/image-api) specifies a web service that returns an image in response to a standard http or https request. The URL can specify the region, size, rotation, quality characteristics and format of the requested image. A URL can also be constructed to request basic technical information about the image to support client applications. The API could be adopted by any image repository or service, and can be used to retrieve static images in response to a properly constructed URL.
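
As a rough illustration of the URL pattern described above, the following Python sketch assembles a request URL from its parts. The segment order (identifier, region, size, rotation, quality, format) and the `native` quality value reflect our reading of version 1 of the Image API; the server name, prefix, identifier, and helper function names are made up for the example.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="full",
                   rotation=0, quality="native", fmt="jpg"):
    """Assemble an image request URL.

    Segment order: identifier/region/size/rotation/quality.format
    """
    # Identifiers may contain reserved characters, so percent-encode them.
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base, identifier):
    """Assemble the URL for basic technical information about an image."""
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}/info.json"


# Hypothetical server and identifier, for illustration only.
BASE = "http://images.example.org/iiif"
print(iiif_image_url(BASE, "ms-123/page 1",
                     region="0,0,1024,768", size="pct:50", rotation=90))
print(iiif_info_url(BASE, "ms-123/page 1"))
```

Because everything a client needs is expressed in the URL, any viewer that can build such a string can request images from any conforming server.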

In this presentation we will review version 1 of the IIIF Image API and validator, demonstrate applications by daring early adopters, and encourage widespread adoption.

Data-Driven Documents: Visualizing library data with D3.js

  • Bret Davidson, North Carolina State University Libraries, bret_davidson@ncsu.edu

Several JavaScript libraries have emerged over the past few years for creating rich, interactive visualizations using web standards. Few are as powerful and flexible as D3.js[1]. D3 stands apart by merging web standards with a rich API and a unique approach to binding data to DOM elements, allowing you to apply data-driven transformations to a document. This emphasis on data over presentation has made D3 very popular; D3 is used by several prominent organizations including the New York Times[2], GOV.UK[3], and Trulia[4].

Power usually comes at a cost, and D3 makes you pay with a steeper learning curve than many alternatives. In this talk, I will get you over the hump by introducing the core construct of D3, the Data-Join. I will also discuss when you might want to use D3.js, share some examples, and explore some advanced utilities like scales and shapes. I will close with a brief overview of how we are successfully using D3 at NCSU[5] and why investing time in learning D3 might make sense for your library.

n Characters in Search of an Author

  • Jay Luker, IT Specialist, Smithsonian Astrophysics Data System, jluker@cfa.harvard.edu

When it comes to author names the disconnect between our metadata and what a user might enter into a search box presents challenges when trying to maximize both precision and recall [0]. When indexing a paper written by "Wäterwheels, A" a goal should be to preserve as much as possible the original information. However, users searching by author name may frequently omit the diaeresis and search for simply, "Waterwheels". The reverse of this scenario is also possible, i.e., your decrepit metadata contains only the ASCII, "Supybot, Zoia", whereas the user enters, "Supybot, Zóia". If recall is your highest priority the simple solution is to always downgrade to ASCII when indexing and querying. However this strategy sacrifices precision, as you will be unable to provide an "exact" search, necessary in cases where "Hacker, J" and "Häcker, J" really are two distinct authors.
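The trade-off above can be sketched in a few lines of Python. This is not the ADS implementation, only a minimal illustration under the assumption that each name is indexed in two forms: an exact form for precision and an ASCII-folded form for recall. The function names are hypothetical.

```python
import unicodedata


def fold_to_ascii(name):
    """Strip combining marks: decompose, then drop the diacritics."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def index_forms(name):
    """Return both forms of a name that a search index could store."""
    return {
        # NFC keeps diacritics but makes composed/decomposed input comparable.
        "exact": unicodedata.normalize("NFC", name).lower(),
        "folded": fold_to_ascii(name).lower(),
    }


def matches(query, indexed_name, exact=False):
    """Compare a user query to an indexed name in exact or folded mode."""
    query_forms = index_forms(query)
    name_forms = index_forms(indexed_name)
    key = "exact" if exact else "folded"
    return query_forms[key] == name_forms[key]


print(matches("Waterwheels, A", "Wäterwheels, A"))      # folded search
print(matches("Hacker, J", "Häcker, J", exact=True))    # exact search
```

Folding by stripping combining marks only covers letters that decompose into a base letter plus a diacritic; characters such as "ø" or "ß" would need an explicit transliteration table.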

This talk will describe the strategy ADS[1] has devised for addressing common and edge-case problems faced when dealing with author name indexing and searching. I will cover our approach not only to the transliteration issue described above, but also to author initials vs. full first and/or middle names, authors who have published under different forms of their name, and authors who change their names (wha? people get married?!). Our implementation relies on Solr/Lucene[2], but my goal is an 80/20 mix of high- vs. low-level details to keep things both useful and stackgnostic [3].

But, does it all still work : Testing Drupal with simpletest and casperjs

  • David Kinzer - Lead Developer, Jenkins Law Library, dkinzer@jenkinslaw.org
  • Chad Nelson - Developer, Jenkins Law Library, cnelson@jenkinslaw.org

Most developers know that they should be writing tests along with their code, but not every developer knows how or where to get started. This talk will walk through the nuts and bolts of testing a medium-sized Drupal site with many integrated moving parts. We’ll talk about unit testing of individual functions with SimpleTest (and how that has changed how we write functions) and functional testing of the user interface with casperjs. We will also discuss automating deployment with phing, drush, jenkins-ci & github, which, combined with our tests, removes the “hold-your-breath” feeling before updating our live site.

Relations, Recommendations and PostgreSQL

  • William Denton, Web Librarian, York University, wdenton@yorku.ca
  • Dan Scott, Systems Librarian, Laurentian University, dscott@laurentian.ca

In 2012, a ragtag group of library hackers from various Ontario universities, funded with only train tickets and fueled with Tim Hortons coffee, assembled under the Scholars Portal banner to build a common circulation data repository and recommendation engine: the Scholars Portal Library Usage-based Recommendation Engine (SPLURGE). PostgreSQL, the emerging darling of the old-school relational database world, is the heart of SPLURGE, and the circulation data for Ontario's 400,000 university students is its blood. Two of the contributors to this effort explore the PostgreSQL features that SPLURGE uses to ease administration efforts, simplify application development, and deliver high performance results. If you don't use PostgreSQL for your data, you might want to try it after this presentation; if you already do, you'll pick up some new tips and tricks.


A Cure for Romnesia: Site Story Web-Archiving

  • Harihar Shankar, Research Library, Los Alamos National Laboratory, harihar@lanl.gov

The web changes constantly, erasing both inconvenient facts and fictions. At web-scale, preservation organizations cannot be expected to keep up by using traditional crawling, and they already miss many important versions. The cure for this is to capture the interactions between real browsers and the server, and push these into an archive for safe keeping rather than trying to guess when pages change.

Every time the Apache Web Server sends data to a browser, SiteStory’s Apache Module also pushes this data to the SiteStory Web Archive. The same version of a resource will not be archived more than once, no matter how many times it has been requested. The resulting archive is effectively representative of a server's entire history, although versions of resources that are never requested by a browser will also never be archived.
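The "archive each version only once" behavior can be pictured with a small Python sketch. How SiteStory actually detects an unchanged resource is not described here; comparing a content hash against the most recently stored version is an assumption made for illustration, and the class and method names are hypothetical.

```python
import hashlib


class VersionArchive:
    """Toy archive that stores a response only when its content has changed."""

    def __init__(self):
        # url -> list of (timestamp, digest, body), oldest first
        self._versions = {}

    def record(self, url, body, timestamp):
        """Called for every served response; returns True if a new version was stored."""
        digest = hashlib.sha256(body).hexdigest()
        history = self._versions.setdefault(url, [])
        if history and history[-1][1] == digest:
            return False  # same version as last time: nothing to archive
        history.append((timestamp, digest, body))
        return True

    def versions(self, url):
        """List (timestamp, digest) pairs for a resource, oldest first."""
        return [(ts, digest) for ts, digest, _ in self._versions.get(url, [])]


archive = VersionArchive()
archive.record("/news", b"<p>first draft</p>", timestamp=1)
archive.record("/news", b"<p>first draft</p>", timestamp=2)  # not stored again
archive.record("/news", b"<p>revised</p>", timestamp=3)
print(len(archive.versions("/news")))
```

The sketch also shows the limitation noted above: a resource that is never requested never reaches `record`, so it never appears in the archive.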

In this presentation I will give an overview of SiteStory, an Open-Source project written in Java that runs as an application under Tomcat 6 or greater. SiteStory’s Apache Module is written in C. I will also demonstrate the TimeMap tool that visualizes versions of a resource available in the SiteStory archive. The TimeMap tool is a Firefox browser extension that plots versions of a resource on a SIMILE timeline. Since the tool uses the Memento protocol, it can also display versions of resources available in Memento-compliant web archives and content management systems.

Practical Relevance Ranking for 10 million books.

  • Tom Burton-West, University of Michigan Library, tburtonw@umich.edu

HathiTrust Full-text search indexes the full-text and metadata for over 10 million books. There are many challenges in tuning relevance ranking for a collection of this size. This talk will discuss some of the underlying issues, some of our experiments to improve relevance ranking, and our ongoing efforts to develop a principled framework for testing changes to relevance ranking.

Some of the topics covered will include:

  • Length normalization for indexing the full-text of book-length documents
  • Indexing granularity for books
  • Testing new features in Solr 4.0:
    • New ranking formulas that should work better with book-length documents: BM25 and DFR.
    • Grouping/Field Collapsing. Can we index 3 billion pages and then use Solr's field collapsing feature to rank books according to the most relevant page(s)?
    • Finite State Automata/Block Trees for storing the in-memory index to the index. Will this let us support wildcards/truncation despite over 2 billion unique terms per index?
  • Relevance testing methodologies: Query log analysis, Click models, Interleaving, A/B testing, and Test collection based evaluation.
  • Testing of a new high-performance storage system to be installed in early 2013. We will report on any tests we are able to run prior to conference time.
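To make the length normalization issue concrete, here is a minimal Python sketch of the textbook Okapi BM25 score for a single term. It is not necessarily what Solr computes internally, and the collection statistics in the example are made-up numbers chosen only to contrast a short document with a book-length one.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, doc_freq, num_docs,
                    k1=1.2, b=0.75):
    """Textbook BM25 contribution of one query term to one document's score."""
    # Inverse document frequency (the variant that never goes negative).
    idf = math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: b=0 ignores length, b=1 normalizes fully.
    norm = 1 - b + b * (doc_len / avg_doc_len)
    return idf * (tf * (k1 + 1)) / (tf + k1 * norm)


# Hypothetical statistics: the same term frequency in a short document
# and in a book-length document.
common = dict(tf=5, avg_doc_len=100_000, doc_freq=1_000, num_docs=10_000_000)
print(bm25_term_score(doc_len=10_000, **common))
print(bm25_term_score(doc_len=500_000, **common))
```

With the default `b`, the longer document scores lower for the same term frequency, which is why the choice of normalization (and of indexing granularity, page versus book) matters so much for book-length text.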

Browser/Javascript Integration Testing with Ruby

  • Jessie Keck, Stanford University, jkeck at stanford dot edu

It's near impossible to build a rich web application without javascript. We have a lot of great patterns to follow, such as progressive enhancement, to make sure our rich web applications are usable, accessible, and testable. However, when javascript is involved the possibility exists that bugs can be introduced that won't get caught by most unit and integration testing frameworks.


This is where Watir (pronounced water) comes in. Watir can be used with popular ruby testing frameworks like RSpec and Capybara. This talk will show how to use the combination of these tools to write RSpec tests using Watir to spin up an application in a variety of browsers, navigate the application, and make assertions about the page using Capybara.


Tests using Watir are written in ruby but they don't necessarily need to test a ruby application. You can test any application that you can point a browser at, so there are a wide variety of potential uses for tests written with Watir.

Immanentizing the Google

  • Will Sexton, Duke University Libraries, will.sexton@duke.edu
  • Sean Aery, Duke University Libraries, sean.aery@duke.edu

We're using a "Google-as-a-Service" approach to reduce the complexity and cost of maintaining a structured-data discovery platform for digitized collections and other library-generated content. Our work picks up from a paper in the code4lib Journal by NCSU's Jason Ronallo [1], introducing the idea of embedded schema.org HTML microdata for library digital collections. We've extended our schema.org/RDFa Lite implementation by using Google Site Search to develop a customized interface. In our talk, we'll demonstrate how to set up an instance of Site Search, how to customize the display of results, and how to use the platform's filtering, sorting and other useful functions. We'll also report on our analysis of usage data, and discuss our strategy for scaling the system to support global site search in an upcoming library-wide CMS migration project.

[1] "HTML5 Microdata and Schema.org", code4lib #16

Evolving Towards a Consortium MARCR Redis Datastore

  • Jeremy Nelson, Colorado College, jeremy.nelson@coloradocollege.edu
  • Sheila Yeh, University of Denver, Sheila.Yeh@du.edu

The current state of technology in library automation is not keeping pace with the explosive growth in information storage and retrieval systems. The lag is costly both to institutions and to users’ resource discovery. To address this problem, we should look into how successful enterprises such as Craigslist and StackOverflow manage and scale their enormous volumes of data. The key lies in Redis, an open-source NoSQL key-value data structure server. Therefore, Colorado College and the University of Denver, along with the Colorado Alliance of Research Libraries, are exploring and co-developing a MARCR Redis Datastore. It is a peer-to-peer bibliographic datastore, modeled using the Library of Congress Bibliographic Framework's new Linked Data based MARC 21 replacement, called MARCR (MARC Resources). The structure of MARCR lends itself to an advanced Consortium catalog where a Work is cataloged once and multiple institutions have complete control over their own Instances of the Work, de-duplicating cataloging efforts while supporting real-time resource sharing between the Instances. Control, access, and discovery of records in the proposed MARCR Redis Datastore are provided through lightweight HTML5 responsive apps built with Django, Bootstrap, and KnockoutJS that also integrate with both open-source and commercial discovery products.

Redis offers many advantages for a shared MARCR bibliographic datastore, such as speed, scalability, and ease of deployment. In particular, it can support multiple cloud models, which benefits institutions of various sizes and budgets. We will demonstrate an MVP (Minimum Viable Product) iteration of this MARCR Datastore, built by transforming MARC 21 records from Colorado College and the University of Denver into Redis, with coordination by the Colorado Alliance of Research Libraries.

Take Your Content and Shove It

  • Eric Frierson*, EBSCO Publishing, efrierson@ebscohost.com

Public services librarians have experimented with getting out of the library. For example, the 'embedded librarian' model puts the librarian in class with students, offering help and advice throughout the semester at the point of need. Digital services have also found their way into virtual classrooms by way of links from the course management system (e.g., Blackboard, Moodle) and the occasional embedded search box that serves as a portal into the library's search solution.

With the release of discovery services and their associated APIs, we can do more. Rather than linking back to the library, we can take our resources and push them into the learning experience, allowing them to escape the library website silo altogether. Imagine a professor being able to search library resources and add items to their course website without ever leaving their CMS, or a student adding items to a folder that shows up in their campus dashboard. What if we could tie the use of library resources to student success in the classroom by leveraging user data from CMS tools? In this session, I will briefly describe how APIs might make these scenarios possible, but then facilitate a discussion on where else we could shove our resources. I hope to initiate a few development projects along these lines.

On Top of Discovery (All Covered with Customizations)

  • Scott Hanrath, University of Kansas Libraries, shanrath@ku.edu

How and why we've customized the front-end of our vendor library discovery system (Primo) to improve the user experience and integrate with local systems using dollops of JavaScript, a pinch of JSONP, and a smattering of both vendor and simple homegrown APIs. I'll talk about techniques for adding more AJAX to an already AJAX-intensive interface that you don't fully control (and how a few underlying changes could make it easier) and reflect on our meatball-retention odds in the event that somebody sneezes and the underlying interface changes.

Features to be discussed include improving the display of quasi-FRBRized records in search results through subtracting metadata here and adding metadata there, adding a 'did-you-mean' option in an attempt to steer users toward using Boolean operators in the way the system demands, adding fine-grained event tracking with Google Analytics, and porting existing add-ons like special collection requests, augmented stacks locations, and demand-driven acquisitions requests from our last-generation OPAC.

EAD without XSLT: A Practical New Approach to Web-Based Finding Aids

  • Trevor Thornton, New York Public Library, trevorthornton@nypl.org

The New York Public Library is reengineering its system for delivering archival finding aids on the Web. The foundation of this system is a data management application, written in Rails, within which collections and their components are managed as associated model instances, and descriptive data is stored natively as JSON and HTML. Front-end applications interact with the back-end via a flexible API that is capable of returning any part of the description at any level. This approach provides a number of benefits over the traditional XML/XSLT approach:

  • Data is stored natively in the format in which it is needed by the front-end application, making rendering much faster
  • Finding aid data can be lazy-loaded via AJAX requests
  • Enables presentation of the archival description beyond the traditional finding aid structure (alternate arrangements, visualizations, etc.)
  • Links to digital assets can be maintained independently of archival description
  • Data cleanup and normalization can be accomplished during and/or after ingest of original data into the system, ensuring data quality and consistency
  • Data is stored in a schema-neutral format, enabling easy transformation into other formats as required (e.g. RDF for semantic web applications, future version(s) of EAD schema for harvesting, etc.)

In this session I will describe the architecture of this system and its data model, and discuss the challenges presented in the design process.

Primo / Blackboard Plugin Adaptor Development at Northwestern

  • Michael North, Northwestern University Libraries, m-north@northwestern.edu

The two most visited websites on campus are the Blackboard Course Management System (CMS) site and the Library Discovery Webpage (powered by Primo). These two sites were perfect for a collaborative project to share functionality between themselves to the benefit of faculty and students.

This collaborative project (using Java, APIs, x-services) was successful in integrating the Library Primo resource records and e-Shelf folders with Blackboard's Course Documents webpages for faculty to use in organizing students' study resources. First we developed a "push" feature used to push individual resources from Primo "into" Blackboard. This is a static link. Second, we created a "pull" feature whereby an entire Primo e-Shelf folder (containing sub-folders and resource records) can be pulled "into" Blackboard. This is a dynamic link. These two functions result in the Blackboard Course Documents page having Primo functionality with either dynamic or static resource links.

This session will share an overview of the project, coding structure, and the technical hurdles that needed to be overcome to combine functionality between two major academically used application products.

Relishing Quality Assurance Testing with Cucumber

  • Joseph Dalton, The New York Public Library, josephdalton AT nypl DOT org

For those starting on a test-driven development path, the plethora of options for QA testing can also be overwhelming, ranging from writing user stories and simple acceptance tests, to running automated tests with Cucumber and Gherkin (and optionally making these more visible to stakeholders with Relish), to utilizing complex, enterprise-level tools like Quality Center to model business processes.

Although libraries are usually, and sometimes emphatically so, not profit-driven institutions, this doesn't have to mean there can't be a valid role for software quality assurance within our development environments. We've all heard "any test is better than no tests at all," but how do we effectively encourage our own institutions to embrace a test-driven development path and quality-assurance testing when, unlike businesses, our organizations generally aren't tasked with obvious quality-drivers like generating a profit, ROI, etc?

In this presentation I'll discuss some of the steps the New York Public Library has recently taken to define and develop a QA/Testing framework, in the context of the Library's recent adoption of Agile development practices for its Digital Repository and other project teams.

I woke up / fell out of bed / checked my mail / and what I read... : PHP to Java to NCIP to ...

  • John Bodfish, OCLC – bodfishj@oclc.org
  • Michelle Suranofsky, Lehigh University – mis306@lehigh.edu

The trailer: YouTube video

It's 10 a.m. and your inbox has an 'Urgent' message from the State Librarian asking for an update on the “NCIP thing” for the statewide project first mentioned (to you) yesterday. You know there’s an open source “NCIP Toolkit” which supports the variety of systems involved in your statewide project, but you’ve also heard it’s pure Java and that’s not your cuppa. Sure it supports discovery with multiple ILS types, as well as resource sharing, patron empowerment, etc. etc. but is it possible to bridge those worlds? After a few minutes of searching you have a plan for ticking-off the “multi-vendor NCIP support” box on the project requirements. We’ll demonstrate a proof-of-concept implementation for PHP developers and report on the issues we encountered and our solutions.

Powering Complicated Web Form in Rails Using XML

  • Kristopher Kelly, New York Public Library, kristopherkelly@nypl.org

The New York Public Library recently launched the first phase of its new Metadata Management System, created in-house to create MODS-based metadata for digital assets. Moving from an idiosyncratic database design, the NYPL wanted to use a more standard format. Adopting MODS and XML led to the question of how to store the data. We chose to attempt to store XML in the database and edit it through a web form. Storing bibliographic data in such a way might seem counter-intuitive, but it has proven to solve more problems than it has created.

In this session, I will discuss how we were able to power a complicated form with XML while improving usability and overall performance.

Message Queues: Event Driven Architecture for NYPL's repository platform

  • Jason Varghese, New York Public Library, jason dot varghese at nypl.org

At the New York Public Library, the digital repository continues to grow at an astonishing rate, with storage soon to reach the petabyte range. As an increasing amount of content is produced, generated, or acquired, workflow automation and scalability have become increasingly important. Workflow involves several organizational units using multiple systems. As a result, reducing the dependencies between our various systems was an important criterion. The message queue enables us to design an event-driven system built from a suite of lightweight and interoperable REST-based services. Benefits include traditional drivers such as loose coupling, interoperability between heterogeneous systems, and improved application scalability, along with others that will be explored in this talk.
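The loose coupling that a message queue provides can be sketched in plain Python. This is not NYPL's platform: an in-process `queue.Queue` stands in for a real message broker, and the event shape and the "derivatives" step are hypothetical. The point is only that the producer publishes events without knowing which service consumes them.

```python
import queue
import threading


def run_pipeline(events):
    """Publish events to a queue and let an independent worker react to them."""
    bus = queue.Queue()   # stand-in for a message broker
    processed = []

    def derivative_worker():
        # The consumer knows only the event format, not who produced the event.
        while True:
            event = bus.get()
            if event is None:          # sentinel: no more work
                bus.task_done()
                break
            processed.append(f"derivatives created for {event['asset_id']}")
            bus.task_done()

    worker = threading.Thread(target=derivative_worker)
    worker.start()

    for event in events:               # the producer just publishes and moves on
        bus.put(event)
    bus.put(None)

    bus.join()                         # wait until every message is handled
    worker.join()
    return processed


print(run_pipeline([
    {"type": "asset.ingested", "asset_id": "a1"},
    {"type": "asset.ingested", "asset_id": "a2"},
]))
```

Adding a second consumer, or moving the worker to another machine behind a real broker, would not require any change to the code that publishes the events.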

Synching up at Web Scale: the NISO/OAI ResourceSync Effort

  • Nettie Lagace, National Information Standards Organization (NISO), nettie AT niso DOT org

It's increasingly the case that to better serve users in a dynamic Web environment, it's desirable to synchronize large-scale web resources accurately, and in real time. However, many current system designs cope with the lack of a good available solution to this requirement by de-emphasizing current coverage or by using tools to manage crawl scheduling. The NISO/OAI ResourceSync effort, funded by the Sloan Foundation and JISC, is currently designing a solution that is aligned with general Web Architecture and is targeted at different communities, particularly those in the areas of cultural heritage and research.

The ResourceSync working group has been under way since early 2012, and expects to have its beta draft specification available for public review and testing by the time the Code4Lib conference takes place. This talk will outline the problem cases, the technical approach and reasoning taken by the working group, and invite feedback from the Code4Lib audience.

The Care and Feeding of a Crowd

  • Shawn Averkamp, University of Iowa, shawn-averkamp at uiowa.edu
  • Matthew Butler, University of Iowa, matthew-butler at uiowa.edu

After a low-tech experiment in crowdsourced transcription grew into a surprisingly successful library initiative and demanded new commitments to user engagement, we found ourselves looking for a more efficient and user-friendly solution. We customized CHNM’s Scripto community transcription tool and various other Omeka plugins to develop a new site: DIYHistory.

We often receive questions about the technical side of both platforms, usually (to our dismay) from libraries who already assume they don't have the IT resources to pursue their own crowdsourcing initiatives. But we found that the software makes up only half of the recipe for success. Do you have compelling content? A long-term commitment to engaging with your users? Are you ready to promote your project far and wide? If so, then deploying a crowdsourcing initiative may be easier than you think.

Our very small development team, which consisted of a healthy mix of technologists and other stakeholders, worked closely and collaboratively on all aspects of the site. We’ll talk about customizing open-source software--how we scaled up functionality and scaled back design to improve user experience and production-level workflows--and how that process served to gently introduce collaborative software practices, such as using Git for version control, into a small, but agile, organization ready to grow. Finally, we'll share our transcription starter kit of forked Scripto and Omeka code and associated documentation for those interested in doing it themselves.

Linked Open Communism: Better discovery through data dis- and re- aggregation

  • Corey A Harper, New York University, corey dot harper at nyu dot edu

Current library search interfaces focus on books, journals and articles but offer little access to related entities, such as people, places, and events. These entities are generally only represented as attributes of other metadata records. Linked data can power interfaces that surface these entities as first-class resources, integrating them into results alongside library materials.

This presentation will describe research into such an interface for exploring a particular subject area: the history of the Communist Party & labor movements in the US. A triple store was seeded by 1,600 EAD records from NYU's Tamiment Library and Wagner Labor Archives. Based on access points in the finding aids, the store was further populated with data from various sources, including MARC, id.loc, VIAF, and dbpedia. Identifiers are being assigned for a wide array of typed entities, and triples can then be re-assembled into new entity "records". These new records will be loaded into a discovery interface that will allow typical keyword searching across *all* contained entities, show links between entities, and include faceting on entity types.
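The re-aggregation step can be illustrated with a short Python sketch that groups triples by subject into entity "records" and counts entity types for faceting. The triples, prefixes, and labels below are invented placeholders, not data from the project, and a real implementation would query the triple store rather than a Python list.

```python
from collections import defaultdict

# Hypothetical triples as (subject, predicate, object); not real project data.
SAMPLE_TRIPLES = [
    ("ex:person/1", "rdf:type", "ex:Person"),
    ("ex:person/1", "rdfs:label", "Example Organizer"),
    ("ex:person/1", "ex:memberOf", "ex:org/1"),
    ("ex:org/1", "rdf:type", "ex:Organization"),
    ("ex:org/1", "rdfs:label", "Example Labor Union"),
]


def aggregate_entities(triples):
    """Re-assemble triples into one record per subject: {predicate: [objects]}."""
    records = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        records[subject][predicate].append(obj)
    return {subject: dict(props) for subject, props in records.items()}


def facet_by_type(records, type_predicate="rdf:type"):
    """Count records per entity type, as a discovery interface facet would."""
    counts = defaultdict(int)
    for props in records.values():
        for entity_type in props.get(type_predicate, []):
            counts[entity_type] += 1
    return dict(counts)


records = aggregate_entities(SAMPLE_TRIPLES)
print(records["ex:person/1"])
print(facet_by_type(records))
```

Each resulting record is a flat document that a keyword index can ingest, while the object values that are themselves identifiers (such as `ex:org/1`) preserve the links between entities.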

It is hoped that this prototype will be a model for a new kind of interface to library, archive & museum metadata targeted to particular subject domains, and could inform the development of a similar dis- and re- aggregation approach for entire library collections.

Building a Metadata Lab for LIS Students

  • Margaret Kipp, University of Wisconsin Milwaukee, kipp at uwm dot edu

Teaching metadata and linked data concepts to MLIS students requires more than creating basic metadata records; it also requires an understanding of how metadata fits into the library workflow and how data entry into metadata and cataloguing tools works in practice. We are developing a metadata lab for use in teaching information organisation related courses to MLIS students. Currently we are using open source software for the lab including Koha--ILS, Omeka--digital library tool and 4store--RDF triple store. The preliminary tools are hosted on LAMP servers and will be supplemented with additional software as we expand our lab. This presentation will report on the results of setting up the first few software packages for the lab and their use in teaching various courses including an introductory course in information organisation, a metadata course, and a course on linked data, Semantic Web and mashups. One of the goals of this session would be to discuss methods for bridging gaps between academic and practical work with metadata.


Feed - The HathiTrust Ingest Toolkit

  • Ryan Rotter, University of Michigan, rrotter AT umich DOT edu

HathiTrust has a mission of ensuring the long-term preservation and accessibility of materials in the archive. Ensuring consistency among materials from different sources is one way we do this; it ensures that tools such as large scale search and PageTurner don't need to be concerned with where the content originated from and that it will be possible to undertake format migrations in the future. To ensure consistency, we have very specific and stringent standards including (but not limited to) the following areas:

  • Item identifiers (i.e. how each individual submitted item is identified and named)
  • Package layout (file names, directory structure, etc.)
  • Image technical characteristics (file format, resolution, color depth, etc.)
  • Image metadata (scanning time, scanning artist, etc.)
  • Source METS file comprising MARC, PREMIS, package contents and structMap, optionally with page numbers and page tags

We have chosen not to accept submissions in arbitrary formats for a couple of reasons. Unfortunately we just don't have the resources to create custom transformations for all sources of content, and if we created generic transformations that could accept data in a wide variety of formats there would most likely be some data loss in the transformation.

Therefore we have chosen to provide the ingest tools to the library community as a set of building blocks to help you build and validate submission packages that meet the standards while at the same time allowing you to preserve images without loss of quality and include any metadata that you want to preserve.

Roses are ff0000, Violets are 0000ff DeLaMare is throwing a Hackathon and so should you!

  • Chrissy Klenke, University of Nevada, Reno, cklenke@unr.edu
  • Nick Crowl, University of Nevada, Reno, ncrowl@unr.edu

Hack 4 Reno is a 24-hour hackathon, where teams use local data to build applications that benefit the local community. Co-hosted by Reno Collective and the DeLaMare Science and Engineering Library, and sponsored by the City of Reno which generously provides the data, the teams, made up of coders, designers, writers, and more, get to hack away for 24 hours, creating, collaborating, and having fun with it all: http://hack4reno.com/

The Reno Collective is Reno’s premier co-working space for freelancers, designers, programmers, entrepreneurs, and startups. The DeLaMare Science and Engineering Library (DLM) at the University of Nevada, Reno is fast becoming the bridge between students, faculty, and members of its greater community of Reno Collective, Hack4Reno, Bridgewire Makerspace, and the Code for America Reno Brigade.

Come hear about the hackathon, the projects created out of this event, and a glimpse of a few of the innovative projects created in collaboration with the DeLaMare Library. Robotics kits, 3D printers, drone quadricopters, lockpicking workshops and kits, bootcamps and 24-hour hackathons are just the start!


Stuffing the Repository: An Advanced Dive Into Object Handling in Hydra

  • Steven Anderson, Boston Public Library, sanderson AT bpl DOT org
  • Eben English, Boston Public Library, eenglish AT bpl DOT org

This topic focuses on some advanced techniques for dealing with digital objects created for a repository. While all examples presented will be in the Hydra framework, the theory of what is presented is applicable to non-Hydra solutions. Specific topics include:

  • Client side MD5 checksumming: While an Ajax file upload is fairly simple nowadays, verifying that the file doesn't become corrupted during transmission to the server is often overlooked. A method to calculate the MD5 checksum via the client browser before the file is transmitted over the network will be presented.
  • Object Modeling Inheritance: There are many different theories regarding content modeling in the wild, from "one model to rule them all" to extreme granularity. Here we will outline an approach to modeling content inspired by OOP, using specific content type classes that inherit from a set of more generic content models.
  • Hydra Models as a Rails Engine: In order to facilitate sharing of content models between multiple Hydra code bases, a completely separate and independent Ruby on Rails Engine to express content models has been developed. This unique approach offers tremendous potential for easily sharing and re-using pre-configured content models in a Hydra Head simply by installing a gem.
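The checksum idea in the first topic above can be sketched independently of the browser. The talk concerns computing the MD5 in client-side JavaScript; the Python below only illustrates the underlying check, with hypothetical function names: hash the file in chunks before sending, hash it again after receipt, and compare.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=64 * 1024):
    """Compute an MD5 hex digest in chunks so large files never sit fully in memory."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_upload(received_stream, client_checksum):
    """Server-side check: does the received file match the checksum the client sent?"""
    return md5_of_stream(received_stream) == client_checksum.lower()


# The client hashes the file before upload and sends the digest with it.
original = b"example file contents"
client_checksum = md5_of_stream(io.BytesIO(original))

print(verify_upload(io.BytesIO(original), client_checksum))             # intact
print(verify_upload(io.BytesIO(original + b"\x00"), client_checksum))   # corrupted
```

MD5 is adequate here because the goal is to detect accidental corruption in transit; it should not be treated as protection against deliberate tampering.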