2014 Prepared Talk Proposals

From Code4Lib

Latest revision as of 19:45, 27 May 2016

Proposals for Prepared Talks:

Prepared talks are 20 minutes (including setup and questions), and should focus on one or more of the following areas:

  • Projects you've worked on which incorporate innovative implementation of existing technologies and/or development of new software
  • Tools and technologies – How to get the most out of existing tools, standards and protocols (and ideas on how to make them better)
  • Technical issues - Big issues in library technology that should be addressed or better understood
  • Relevant non-technical issues – Concerns of interest to the Code4Lib community which are not strictly technical in nature, e.g. collaboration, diversity, organizational challenges, etc.

To Propose a Talk

  • Log in to the wiki in order to submit a proposal. If you are not already registered, follow the instructions to do so.
  • Provide a title and brief (500 words or fewer) description of your proposed talk.
  • If you so choose, you may also indicate when, if ever, you have presented at a prior Code4Lib conference. This information is completely optional, but it may assist us in opening the conference to new presenters.

As in past years, the Code4Lib community will vote on proposals that they would like to see included in the program. This year, however, only the top 10 proposals will be guaranteed a slot at the conference. Additional presentations will be selected by the Program Committee in an effort to ensure diversity in program content. Community votes will, of course, still weigh heavily in these decisions.

Presenters whose proposals are selected for inclusion in the program will be guaranteed an opportunity to register for the conference. The standard conference registration fee will still apply.

Proposals can be submitted through Friday, November 8, 2013, at 5pm PST. Voting will commence on November 18, 2013 and continue through December 6, 2013. The final line-up of presentations will be announced in early January, 2014.

Talk Proposals


Creating a new Greek-Dutch dictionary

  • Caspar Treijtel, University of Amsterdam, c.treijtel@uva.nl

At present, no complete (Ancient) Greek-Dutch dictionary is available online. A new dictionary is currently under construction at Leiden University, with software being developed at the University of Amsterdam. The team in Leiden has already begun preparing the data, and about 6,000 lemmas have been approved so far. The ultimate goal is to produce both a print version and an online open access version from the same source documents. The software needed for this was developed in a project funded by CLARIN-NL.

Migrator

For the production of lemmas we have implemented an advanced workflow. The (generally non-technical) users create lemmas in MS Word, which is both familiar and easy to use. We have developed a custom software module that carefully migrates the Word documents into deeply structured XML by analyzing the structure and semantics of the lemmas, falling back on heuristics in ambiguous cases. Although we initially envisioned the oXygen XML Author component as the main tool for creating new lemmas, we obtained excellent results with the migrator module and therefore decided to continue using MS Word as the primary composition tool. The main advantage is that the editors are much more familiar with Word than with any other WYSIWYG editor. Lemmas that have been migrated to XML are stored in an XML database and can be further edited using oXygen XML Author.

Lemmatizer

Greek morphology is complicated. In order to use a dictionary effectively, a rather high level of initial language competence is necessary for the user to be able to relate the word form s/he finds in a text to the correct basic lemma form, where the definition of the word can be found. Using a Greek morphological database we have been able to facilitate the search for lemmas. A ‘lemmatizer’ module gives the possible parsings of the word forms and the lemmas they can be derived from. This enables the user to type in the word as found in the text and be redirected to the correct lemma.
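A minimal sketch of the lookup idea, assuming the morphological database has been exported as a mapping from word forms to (lemma, parse) pairs. The sample entries, the `lookup` function and the accent-stripping normalization are illustrative assumptions, not the project's actual code.

```python
import unicodedata

def normalize(form: str) -> str:
    """Lowercase and strip diacritics (accents, breathings) so that a form
    typed without accents still matches the database key."""
    decomposed = unicodedata.normalize("NFD", form.lower())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)

# Hypothetical excerpt of a morphological database: form -> [(lemma, parse)].
RAW_ENTRIES = {
    "λόγου": [("λόγος", "noun, genitive singular masculine")],
    "ἔλυσε": [("λύω", "verb, aorist indicative active, 3rd singular")],
}

# Index by normalized form; distinct forms may collapse onto one key,
# in which case all of their analyses are kept.
INDEX: dict = {}
for form, analyses in RAW_ENTRIES.items():
    INDEX.setdefault(normalize(form), []).extend(analyses)

def lookup(form: str) -> list:
    """Return every (lemma, parse) pair the typed form can be derived from."""
    return INDEX.get(normalize(form), [])
```

Returning a list rather than a single lemma mirrors the point above: an ambiguous form yields several parsings, and the user picks the one that fits the text.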

Visualization

For the online dictionary we have implemented a visualization module that allows the user to view multiple lemmas at once. The module was built with the JavaScript framework MooTools; the result is a viewer that performs well and runs on maintainable JavaScript code.

The online dictionary is still under development; the beta version is available at http://www.woordenboekgrieks.nl/. A newer test version with additional features can be found at http://angel.ic.uva.nl:8600/.

Credits

  • construction of the dictionary: Prof. Ineke Sluiter, Classics department of Leiden University; Prof. Albert Rijksbaron, University of Amsterdam
  • publisher of the dictionary: Amsterdam University Press
  • design/typesetting dictionary: TaT Zetwerk (http://www.tatzetwerk.nl/)
  • software development: Digital Production Center, University Library, University of Amsterdam
  • project funding: CLARIN-NL (http://www.clarin.nl/)
  • morphological database for use by the lemmatizer: courtesy of Prof. Helma Dik, University of Chicago (based on data of the Perseus Project)

Using Drupal to drive alternative presentation systems

  • Cary Gordon, The Cherry Hill Company, cgordon@chillco.com

Recently, we have been building systems that use angular.js, Rails, or other systems for presentation, while leveraging Drupal's sophisticated content management capabilities on the back end.

So far, these have been one-way systems, but as we move to Drupal 8 we are beginning to explore ways to further decouple the presentation and CMS functions.

A Book, a Web Browser and a Tablet: How Bibliotheca Alexandrina's Book Viewer Framework Makes It Possible

Many institutions around the world are engaged in digitization projects that aim to preserve the human knowledge contained in books and make it available through multiple channels to people across the globe. These efforts will surely help close the digital gap, particularly with the arrival of affordable e-readers, mobile phones and network coverage. However, the digital reading experience has not yet reached its full potential. Many readers miss features they like in their good old books and wish to find them in their digital counterparts. In an attempt to create a unique digital reading experience, the Bibliotheca Alexandrina (BA) created a flexible book viewing framework that is currently used to access its collection of more than 300,000 digital books in five languages, which includes the largest collection of digitized Arabic books.

Using open source tools, BA used the framework to develop a modular book viewer that can be deployed in different environments and is currently at the heart of various BA projects. The book viewer provides several features that create a more natural reading experience. As with physical books, readers can personalize the books they read by adding annotations such as highlights, underlines and sticky notes to capture their thoughts and ideas, and they can share a book with friends on social networks. Readers can also search across the content of a book and receive search results highlighted within its pages. More features can be added to the book viewer through its plugin architecture.

Structured data NOW: seeding schema.org in library systems

The semantic web, linked data, and structured data are all fantastic ideas, but implementation constraints stand in the way. If a library's system does not allow customization, or the institution lacks skilled staff, it does not matter how enthusiastic the library is about publishing structured data... it will not happen. However, if the software in use simply publishes structured data by default, then the web will be populated for free. Really! No extra resources necessary.

This presentation highlights Dan's work with systems such as Evergreen, Koha, and VuFind to enable the publication of schema.org structured data out of the box. Along the way, we reflect on the current state of the W3C Schema.org Bibliographic Extension community group's efforts to shape the evolution of the schema.org vocabulary. Finally, hold on tight as we contemplate next steps and the possibilities of a world where structured data is the norm on the web.
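As an illustration of what "structured data by default" can look like, here is a minimal sketch that turns a simplified bibliographic record into schema.org JSON-LD. The record dictionary and its keys are hypothetical. The systems discussed in the talk may instead emit RDFa or microdata directly in their HTML templates, and real records carry many more properties.

```python
import json

def record_to_jsonld(record: dict) -> str:
    """Map a simplified (hypothetical) catalogue record to schema.org JSON-LD."""
    data = {
        "@context": "http://schema.org",
        "@type": "Book",
        "name": record["title"],
        # schema.org models an author as a Person (or Organization) object.
        "author": [{"@type": "Person", "name": n} for n in record.get("authors", [])],
    }
    # Optional properties are only emitted when the record has them.
    if "isbn" in record:
        data["isbn"] = record["isbn"]
    if "year" in record:
        data["datePublished"] = str(record["year"])
    return json.dumps(data, ensure_ascii=False, indent=2)

if __name__ == "__main__":
    print(record_to_jsonld({"title": "Moby-Dick", "authors": ["Herman Melville"], "year": 1851}))
```

A catalogue page would typically embed the output in a `<script type="application/ld+json">` element, so crawlers pick it up without any change to the visible page.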

Towards Pasta Code Nirvana: Using JavaScript MVC to Fill Your Programming Ravioli

JavaScript MVC frameworks are ushering in a golden age of robust and responsive web applications that take advantage of evergreen browsers, performant JS engines, and the unprecedented reach provided by billions of personal computing devices. The web browser has emerged as the world’s most popular application runtime, and the complexity[1] and scope of JavaScript applications have exploded accordingly. Server-side web frameworks like Rails and Django have helped developers adhere to best practices like modularity, dependency injection, and unit testing for years; these practices are now being applied to JavaScript development through projects like Backbone[2], Ember[3], and Angular[4].

This talk will discuss the issues JavaScript MVC frameworks are trying to solve, common features like data binding, implications for the future of web development[5], and the appropriateness of JavaScript MVC for library applications.

WebSockets for Real-Time and Interactive Interfaces

Previous Code4Lib presentations:

Watching the Google Analytics Real-Time dashboard for the first time was mesmerizing. As soon as someone visited a site, I could see what page they were on. For a digital collections site with a lot of images, it was fun to see what visitors were looking at. But getting from Google Analytics to the image or other content of what was currently being viewed was cumbersome. The real-time experience was something I wanted to share with others. I'll show you how I used a WebSocket service to create a real-time interface to digital collections views and search queries.

In the Hunt Library at NCSU we have some large video walls. I wanted to make HTML-based exhibits that featured viewer interactions. I'll show you how I converted Listen to Wikipedia [1] into a bring-your-own-device interactive exhibit. With WebSockets, any HTML page can be remote-controlled by any Internet-connected device.

I will attempt to include real-time audience participation.

[1] http://listen.hatnote.com/

Rapid Development of Automated Tasks with the File Analyzer

  • Terry Brady, Georgetown University Libraries, twb27@georgetown.edu

The Georgetown University Libraries have customized the File Analyzer and Metadata Harvester application (https://github.com/Georgetown-University-Libraries/File-Analyzer) to solve a number of library automation challenges:

  • validating digitized and reformatted files
  • validating vendor statistics for COUNTER compliance
  • preparing collections of digital files for archiving and ingest
  • manipulating ILS import and export files

The File Analyzer application was used by the US National Archives to validate 3.5 million digitized images from the 1940 Census. After implementing a customized ingest workflow within the File Analyzer, the Georgetown University Libraries were able to process an ingest backlog of over a thousand files of digital resources into DigitalGeorgetown, the Libraries’ Digital Collections and Institutional Repository platform. Georgetown is currently developing customized workflows that integrate Apache Tika, BagIt, and MARC conversion utilities.

The File Analyzer is a desktop application with a powerful framework for implementing customized file validation and transformation rules. As new rules are deployed, they are presented to users within a user interface that is easy (and powerful) to use.
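To make the idea of a pluggable validation rule concrete, here is a minimal sketch of one such rule: checking files against a BagIt-style checksum manifest (lines of `<checksum> <relative path>`). It is written in Python for illustration and is not the File Analyzer's own API. The `validate_manifest` name and the choice of MD5 are assumptions.

```python
import hashlib
import tempfile
from pathlib import Path

def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash in chunks so large image files never need to fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def validate_manifest(base: Path, manifest: Path) -> dict:
    """Hypothetical rule: return {relative path: problem} for each failing file."""
    problems = {}
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        rel_path = rel_path.strip()
        target = base / rel_path
        if not target.is_file():
            problems[rel_path] = "missing"
        elif file_md5(target) != expected.lower():
            problems[rel_path] = "checksum mismatch"
    return problems

def demo() -> dict:
    """Build a tiny bag in a temp directory: one good, one altered, one missing file."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        (base / "good.tif").write_bytes(b"scan-1")
        (base / "bad.tif").write_bytes(b"scan-2 (edited)")
        manifest = base / "manifest-md5.txt"
        manifest.write_text(
            f"{hashlib.md5(b'scan-1').hexdigest()}  good.tif\n"
            f"{hashlib.md5(b'scan-2').hexdigest()}  bad.tif\n"
            f"{'0' * 32}  gone.tif\n",
            encoding="utf-8",
        )
        return validate_manifest(base, manifest)

if __name__ == "__main__":
    print(demo())
```

Returning a report instead of stopping at the first failure matches the batch style of validating thousands of digitized files at once.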

Learn about the functionality that is available for download and how you can use this tool to automate workflows ranging from digital collections to ILS ingests to electronic resource statistics, and join a discussion of opportunities to collaborate on enhancements to this application!

GeoHydra: How to Build a Geospatial Digital Library with Fedora

Geographically rich data are exploding, striking fear into those who must integrate them into existing digital library infrastructures. Building a spatial data infrastructure that integrates with your digital library infrastructure need not be a daunting task. We have successfully deployed a geospatial digital library infrastructure using Fedora and open-source geospatial software [1]. We'll discuss the primary design decisions and technologies that led to a production deployment within a few months. Briefly, our architecture revolves around discovery, delivery, and metadata pipelines using open-source OpenGeoPortal [2], Solr [3], GeoServer [4], PostGIS [5], and GeoNetwork [6] technologies, plus the proprietary ESRI ArcMap [7], the GIS industry's workhorse. Finally, we'll discuss the key skill sets needed to build and maintain a spatial data infrastructure.

[1] http://foss4g.org
[2] http://opengeoportal.org
[3] http://lucene.apache.org/solr
[4] http://geoserver.org
[5] http://postgis.net
[6] http://geonetwork-opensource.org
[7] http://esri.com

Under the Hood of Hadoop Processing at OCLC Research

Roy Tennant

  • Previous Code4Lib presentations: 2006: "The Case for Code4Lib 501c(3)"

Apache Hadoop is widely used by Yahoo!, Google, and many others to process massive amounts of data quickly. OCLC Research uses a 40-node compute cluster with Hadoop and HBase to process the 300 million MARC records of WorldCat in various ways. This presentation will explain how Hadoop MapReduce works and illustrate it with specific examples and code. The role of the JobTracker in both monitoring and reporting on processes will be explained. A string search across WorldCat will also be demonstrated live.
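For readers new to MapReduce, the following in-memory sketch shows the three phases a Hadoop job runs across a cluster: map each record to key/value pairs, shuffle the pairs so that all values for a key meet, then reduce each group. The toy records and the "count records per language" job are illustrative assumptions; OCLC's actual jobs run on Hadoop and HBase, not on this code.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical, heavily simplified records standing in for MARC data.
RECORDS = [
    {"id": "1", "lang": "eng"},
    {"id": "2", "lang": "fre"},
    {"id": "3", "lang": "eng"},
]

def map_phase(record: dict) -> Iterator:
    # Emit (language code, 1) per record, as a Hadoop Mapper would;
    # "und" is the MARC code for an undetermined language.
    yield record.get("lang", "und"), 1

def shuffle(pairs: Iterable) -> dict:
    # Group values by key; Hadoop performs this between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: list) -> tuple:
    # Combine all values for one key, as a Hadoop Reducer would.
    return key, sum(values)

def run_job(records: Iterable) -> dict:
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

if __name__ == "__main__":
    print(run_job(RECORDS))
```

On a cluster, the map calls run in parallel near the data and the shuffle moves pairs across the network, which is why the per-record logic can stay this small.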

Quick and Easy Data Visualization with Google Visualization API and Google Chart Libraries

Bohyun Kim, Florida International University, bohyun.kim@fiu.edu

  • No previous Code4Lib presentations

Does most of the data your library collects stay in spreadsheets, or get published as a static table with a series of boring numbers? Do your library stakeholders spend more time collecting data than using it as a decision-making tool, because the data is presented in a way that makes it hard to grasp its significance quickly?

This talk will provide an overview of the Google Visualization API [2] and Google Chart Libraries [3] to get you started quickly querying and visualizing your library data from remote data sources (e.g., a Google Spreadsheet or your own database), with or without cool-looking user controls, animation effects, and even a dashboard.
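As a sketch of the data side, the following turns rows of library statistics into the JSON "DataTable" literal (`cols` and `rows` with `{"v": ...}` cells) that the Google Charts JavaScript library can load, for example by passing it to `google.visualization.DataTable`. The column names and figures are made up, and the type inference here only distinguishes numbers from strings.

```python
import json

def to_datatable(header: list, rows: list) -> str:
    """Build a Google Charts DataTable JSON literal from a header and rows."""
    def col_type(index: int) -> str:
        # A column counts as numeric only if every value in it is a number.
        values = [row[index] for row in rows]
        numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool)
                      for v in values)
        return "number" if numeric else "string"

    table = {
        "cols": [{"id": f"c{i}", "label": label, "type": col_type(i)}
                 for i, label in enumerate(header)],
        "rows": [{"c": [{"v": value} for value in row]} for row in rows],
    }
    return json.dumps(table)

if __name__ == "__main__":
    # Made-up gate counts, for illustration only.
    print(to_datatable(["Month", "Gate count"], [["Sep", 41250], ["Oct", 45310]]))
```

Serving this JSON from a small endpoint lets the same query feed a chart, a table and a dashboard control without re-exporting spreadsheets.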

Leap Motion + Rare Books: A hands-free way to view and interact with rare books in 3D

Juan Denzer, Binghamton University, jdenzer@binghamton.edu

  • No previous Code4Lib presentations

As rare books become more delicate over time, making them available to the public becomes harder. We at Binghamton University Library have developed an application that makes it easier to view rare books without ever having to touch them. We have combined the Leap Motion hands-free device and 3D rendered models to create a new virtual experience for the viewer.

The application allows the user to rotate and zoom in on a 3D representation of a rare book. The user can also ‘open’ the virtual book and flip through it using a natural user interface, such as swiping a hand left or right to turn the page.
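The gesture logic itself can be small. A minimal sketch, assuming the tracking device delivers timestamped horizontal palm positions in millimetres: a swipe is reported when the hand travels far enough, fast enough, in one direction. The thresholds and the `detect_swipe` function are hypothetical, and the application described here is written in C# against the Leap Motion SDK, not in Python.

```python
from typing import Optional, Sequence, Tuple

def detect_swipe(samples: Sequence[Tuple[float, float]],
                 min_distance_mm: float = 120.0,
                 min_speed_mm_s: float = 600.0) -> Optional[str]:
    """samples: (timestamp in s, palm x in mm), oldest first (hypothetical input).
    Returns "next" for a right-to-left swipe, "previous" for left-to-right, else None."""
    if len(samples) < 2:
        return None
    (t0, x0), (t1, x1) = samples[0], samples[-1]
    distance = x1 - x0
    duration = t1 - t0
    if duration <= 0 or abs(distance) < min_distance_mm:
        return None
    if abs(distance) / duration < min_speed_mm_s:
        return None  # too slow: probably the reader just repositioning a hand
    # Assumed mapping: moving the hand leftward turns to the next page, as with paper.
    return "next" if distance < 0 else "previous"
```

Requiring both distance and speed is a simple way to keep a resting or drifting hand from flipping pages by accident.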

The application is built on the .NET Framework and written in C#. 3D models are created using simple 3D software such as SketchUp or Blender. Scans of the book cover and spine are made with simple flatbed scanners; the inside pages are scanned with overhead scanners.

This talk will discuss the technologies used in developing the application and how almost any library could implement it with virtually no coding at all. The presentation will include a demonstration of the software and a chance for audience members to try the Rare Book Leap Motion App themselves.


Course Reserves Unleashed!

  • Bobbi Fox, Library Technology Services, Harvard University, bobbi_fox@harvard.edu
  • Gloria Korsman, Andover-Harvard Theological Library
    • No previous Code4Lib presentations

Hey kids! Remember when SOAP was used for something other than washing? Our sophisticated (and highly functional) Course Reserves Request system does!

However, while the system is great for submitting and processing course reserve requests, the student-facing presentation through Harvard’s home-grown (and soon to be replaced) LMS leaves a lot to be desired.

Follow along as we leverage Solr 4 as a NoSQL database, along with more progressive RESTful API techniques, to release Reserves data into the wild without interfering with reserves request processing, and, in the process, open up the opportunity for other schools to feed their data in as well.

We Are All Disabled! Universal Web Design Making Web Services Accessible for Everyone

Cynthia Ng, Accessibility Librarian, CILS at Langara College

  • No previous Code4Lib presentations (not counting lightning talks)

We’re building and improving tools and services all the time, but do you develop only for the “average” user, or add things for “disabled” users? We all use “assistive” technology, accessing information in a multitude of ways with different platforms, devices, etc. Let’s focus on providing web services that are accessible to everyone without being onerous or ugly. The aim is to get you thinking about what you can do, whether or not you're a developer, to make web-based services and content more accessible for all, either from the beginning or with small amounts of effort.

The goal of the presentation is to provide both developers and content creators with simple, practical ways to make web content and web services more accessible. However, rather than thinking about putting in extra effort or making adjustments for those with disabilities, I want to help people think about how to make their websites more accessible for all users through universal web design.
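One of the simplest practical checks is making sure images carry text alternatives. A minimal sketch using Python's standard-library HTML parser that flags `<img>` elements with no `alt` attribute. This covers only one small piece of WCAG, and an empty `alt=""` is deliberately accepted because it is the correct markup for purely decorative images.

```python
from html.parser import HTMLParser

class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> that has no alt attribute at all."""

    def __init__(self) -> None:
        super().__init__()
        self.missing: list = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <img ... />, via handle_startendtag.
        if tag == "img":
            attributes = dict(attrs)
            if "alt" not in attributes:
                self.missing.append(attributes.get("src") or "(no src)")

def images_missing_alt(html: str) -> list:
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing
```

A check like this fits into a content workflow as easily as into a developer's test suite, which matches the talk's point that accessibility is not only a developer's job.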

Personalize your Google Analytics Data with Custom Events and Variables

Josh Wilson, Systems Integration Librarian, State Library of North Carolina - joshwilsonnc@gmail.com

At the State Library of North Carolina, we had more specific questions about the use of our digital collections than standard GA could provide. A few implementations of custom events and custom variables later, we have our answers.

I'll demonstrate how these analytics add-ons work, and why implementation can sometimes be more complicated than just adding a few lines of JavaScript to your ga.js. I'll discuss some specific examples in use at the SLNC:

  • Capturing the content of specific metadata fields in CONTENTdm as Custom Events
  • Recording Drupal taxonomy terms as Custom Variables

In both instances, this data deepened our understanding of how our sites and collections were being used, and in turn, we were able to report usage more accurately to content contributors and other stakeholders.

More on: GA Custom Events | GA Custom Variables


Behold Fedora 4: The Incredible Shrinking Repository!

Esmé Cowles, UC San Diego Library. Previous talk: All Teh Metadatas Re-Revisited (2013)

  • One repository contains untold numbers of digital objects and powers many Hydra and Islandora apps
  • It speaks RDF, but contains no triplestore! (triplestores sold separately, SPARQL Update may be involved, some restrictions apply)
  • Flexible enough to tie itself in knots implementing storage and access control policies
  • Witness feats of strength and scalability, with dramatically increased performance and clustering
  • Plumb the depths of bottomless hierarchies, and marvel at the metadata woven into the very fabric of the repository
  • Ponder the paradox of ingesting large files by not ingesting them
  • Be amazed as Fedora 4 swallows other systems whole (including Fedora 3 repositories)
  • Watch novice developers set up Fedora 4 from scratch, with just a handful of incantations to Git and Maven

The Fedora Commons Repository is the foundation of many digital collections, e-research, digital library, archives, digital preservation, institutional repository and open access publishing systems. This talk will focus on how Fedora 4 improves core repository functionality, adds new features, maintains backwards compatibility, and addresses the shortcomings of Fedora 3.

Organic Free-Range API Development - Making Web Services That You Will Actually Want to Consume

Steve Meyer and Karen Coombs, OCLC

Building web services can have great benefits by providing reusability of data and functionality. Underpinning your applications with a web service will allow you to write code once and support multiple environments: your library's web app, mobile applications, the embedded widget in your campus portal. However, building a web service is its own kind of artful programming. Doing it well requires attention to many of the same techniques and requirements as building web applications, though with different outcomes.

So what are the usability principles for web services? How do you build a web service that you (and others) will actually want to use? In this talk, we’ll share some of the lessons learned - the good, the bad, and the ugly - through OCLC's work on the WorldCat Metadata API. This web service is a sophisticated API that provides external clients with read and write access to WorldCat data. It provides a model to help aspiring API creators navigate the potential complications of crafting a web service. We'll cover:

  • Loose coupling of data assets and resource-oriented data modeling at the core
  • Coding to standards vs. exposure of an internal data model
  • Authentication and security for web services: API Keys, Digital Signing, OAuth Flows
  • Building web services that behave as a suite so it looks like the left hand knows what the right hand is doing
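To illustrate the "digital signing" item, here is a minimal sketch of HMAC request signing in general. The client signs a canonical string built from the method, path, timestamp and a nonce with a shared secret, and the server recomputes the signature. The canonical-string layout and function names are assumptions for illustration; this is not OCLC's WSKey signature format.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

# Hypothetical signing scheme for illustration; not OCLC's WSKey format.

def canonical_string(method: str, path: str, timestamp: int, nonce: str) -> str:
    # Client and server must build exactly the same string, field by field.
    return "\n".join([method.upper(), path, str(timestamp), nonce])

def sign(secret: bytes, method: str, path: str,
         timestamp: Optional[int] = None, nonce: Optional[str] = None) -> dict:
    """Return the values a client would send alongside its request."""
    timestamp = int(time.time()) if timestamp is None else timestamp
    nonce = nonce or secrets.token_hex(8)
    message = canonical_string(method, path, timestamp, nonce).encode("utf-8")
    signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return {"timestamp": timestamp, "nonce": nonce, "signature": signature}

def verify(secret: bytes, method: str, path: str, sent: dict,
           max_skew_s: int = 300, now: Optional[int] = None) -> bool:
    """Server side: recompute the signature and compare in constant time."""
    now = int(time.time()) if now is None else now
    if abs(now - sent["timestamp"]) > max_skew_s:
        return False  # stale request; narrows the window for replaying it
    expected = sign(secret, method, path, sent["timestamp"], sent["nonce"])
    return hmac.compare_digest(expected["signature"], sent["signature"])
```

A production scheme would also remember recently seen nonces, so that a captured request cannot be replayed inside the allowed time window.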

So at the end of the day, your team will know your API is a very good egg after all.

If accepted, the presenters intend to produce and share a Quick Guide for building a web service that will reflect content presented in the talk.

Lucene's Latest (for Libraries)

erik.hatcher@lucidworks.com

Lucene powers the search capabilities of practically all library discovery platforms, by way of Solr, etc. The Lucene project evolves rapidly, and it's a full-time job to keep up with the ever-improving features and scalability. This talk will distill and showcase the most relevant(!) advancements to date.

The Why and How of Very Large Displays in Libraries

  • Cory Lown, NCSU Libraries, cwlown@ncsu.edu

Previous Code4Lib Presentations:

Built into the walls of NC State's new Hunt Library are several Christie MicroTile Display Wall Systems. What does a library do with a display that's seven feet tall and over twenty feet wide? I'll talk about why libraries might want large displays like this, what we're doing with them right now, and what we might do with them in the future. I'll talk about how these displays factor into planning for new and existing web projects. And I'll get into the fun details of how you build web applications that scale from the very small browser window on a phone all the way up to a browser window with about 14 million pixels (about 10 million more than a dual 24" monitor desktop setup).

Discovering your Discovery System in Real Time

  • Godmar Back, Virginia Tech, gback@vt.edu
  • Annette Bailey, Virginia Tech, afbailey@vt.edu

Practically all libraries today provide web-based discovery systems to their users; users discover items and peruse or check them out by clicking on links. Unlike the traditional transaction of checking out a book at the circulation desk, this interaction is largely invisible. We have built a system that records users' interactions with Summon in real time, processes the resulting data with minimal delay, and visualizes it in various ways using Google Charts and d3.js modules such as word clouds, tree maps, and others.
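Feeding a live word cloud mostly comes down to keeping counts over a recent time window. A minimal sketch, assuming each captured interaction arrives as a timestamp plus a search query. The `RecentTerms` class, the whitespace tokenization and the five-minute default window are illustrative assumptions, not the system described in the talk.

```python
from collections import Counter, deque
from typing import Optional

class RecentTerms:
    """Hypothetical term counter over a sliding time window, for a word cloud."""

    def __init__(self, window_s: float = 300.0) -> None:
        self.window_s = window_s
        self.events: deque = deque()  # (timestamp, [terms]) in arrival order
        self.counts: Counter = Counter()

    def add(self, timestamp: float, query: str) -> None:
        terms = query.lower().split()
        self.events.append((timestamp, terms))
        self.counts.update(terms)
        self._expire(timestamp)

    def _expire(self, now: float) -> None:
        # Drop events older than the window and subtract their terms.
        while self.events and self.events[0][0] < now - self.window_s:
            _, old_terms = self.events.popleft()
            self.counts.subtract(old_terms)
        # Unary plus discards zero counts, so only live terms remain.
        self.counts = +self.counts

    def top(self, n: int = 10, now: Optional[float] = None) -> list:
        if now is not None:
            self._expire(now)
        return self.counts.most_common(n)
```

Updating counts incrementally, instead of recounting the whole window, keeps the cost per event constant, which matters when the display refreshes every few seconds.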

These visualizations can be embedded in web sites, but are also suitable for large-scale displays or projectors in the 'Learning Spaces' into which many libraries are being converted. The goal of this talk is to share the technology and advocate building a cloud-based infrastructure that would make it available to any library that uses a discovery system, not just those with the technological prowess to develop such systems and visualizations in-house.

Previous presentations at Code4Lib:

  • Talk: Code4Lib 2009 LibX 2.0
  • Preconference: LibX 2.0, 2009
  • Preconference: Code4Lib 2010, On Widgets and Web Services

Your Library, Anywhere: A Modern, Responsive Library Catalogue at University of Toronto Libraries

  • Bilal Khalid, Gordon Belray, Lisa Gayhart (lisa.gayhart@utoronto.ca)
  • No previous Code4Lib presentations

With the recent surge in the mobile device market and an ever expanding patron base with increasingly divergent levels of technical ability, the University of Toronto Libraries embarked on the development of a new catalogue discovery layer to fit the needs of its diverse users.

The result: a mobile-friendly, flexible and intuitive web application that brings the full power of a faceted library catalogue to users without compromising quality or performance, employing Responsive Web Design principles. This talk will discuss: application development; service improvements; interface design; and user outreach, testing, and project communications. Feedback and questions from the audience are very welcome. If time runs short, we will be available for questions and conversation after the presentation.

Note: A version of this content has been provisionally accepted as an article for the Code4Lib Journal, for January 2014 publication.

All Tiled Up

  • Mike Graves, MIT Libraries (mgraves@mit.edu)

You've got maps. You even scanned and georeferenced them. Now what? Running a full GIS stack can be expensive, and overkill in some cases. The good news is that you have a lot more options now than you did just a few years ago. I'd like to present some lighter-weight solutions for making georeferenced images available on the Web.

This talk will provide an introduction to MBTiles. I'll go over what they are, how you create them, how you use them and why you would use them.
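MBTiles is a SQLite database with a `metadata` table of name/value pairs and a `tiles` table keyed by zoom level, column and row, where rows follow the TMS convention (counted from the bottom of the map). A minimal sketch of serving one tile with Python's built-in `sqlite3`, converting from the XYZ scheme most web map libraries request. The in-memory database and fake tile bytes are for illustration only.

```python
import sqlite3
from typing import Optional

def create_mbtiles(conn: sqlite3.Connection) -> None:
    # Table layout as described in the MBTiles specification.
    conn.execute("CREATE TABLE metadata (name TEXT, value TEXT)")
    conn.execute("CREATE TABLE tiles (zoom_level INTEGER, tile_column INTEGER, "
                 "tile_row INTEGER, tile_data BLOB)")

def xyz_to_tms_row(z: int, y: int) -> int:
    # XYZ counts rows from the top of the map, TMS from the bottom.
    return (2 ** z - 1) - y

def get_tile(conn: sqlite3.Connection, z: int, x: int, y: int) -> Optional[bytes]:
    """Return the stored tile image for an XYZ request, or None if absent."""
    row = conn.execute(
        "SELECT tile_data FROM tiles "
        "WHERE zoom_level = ? AND tile_column = ? AND tile_row = ?",
        (z, x, xyz_to_tms_row(z, y)),
    ).fetchone()
    return row[0] if row else None

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_mbtiles(conn)
    conn.execute("INSERT INTO metadata VALUES ('name', 'Scanned county map')")
    # Fake tile at z=1, x=0, TMS row 1, which is the top row in XYZ terms.
    conn.execute("INSERT INTO tiles VALUES (1, 0, 1, ?)", (b"\x89PNG...",))
    print(get_tile(conn, 1, 0, 0))
```

Because the whole tileset is one file, a small web handler wrapped around `get_tile` can serve a georeferenced map without a GIS server.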

The Great War: Image Interoperability to Facebook

  • Rob Sanderson, Los Alamos National Laboratory (azaroth42@gmail.com)
  • Rob Warren, Carleton University
    • No previous presentations

Using a pipeline constructed from Linked Open Data and other interoperability specifications, it is possible to merge and re-use image and textual data from distributed library collections to build new, useful tools and applications. Starting with the OAI-PMH interface to CONTENTdm, we will take you on a tour through the International Image Interoperability Framework (IIIF) and Shared Canvas, to a cross-institutional viewer, and on to image analysis that finds and tags people in photographs in order to build a historical Facebook. The World War One collections are drawn from multiple institutions and merged by the machine learning code.

The presentation will focus on the (open source) toolchain and the benefits of using standards throughout: OAI-PMH to get the metadata, IIIF for interaction with the images, the Shared Canvas ontology for describing collections of digitized objects, Open Annotation for tagging things in the images, and specialized ontologies specific to the contents. The tools include standard RDF/OWL technologies, JSON-LD, and ImageMagick and OpenCV for image analysis.
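Part of IIIF's appeal for a pipeline like this is that every crop or resize is just a URL. A minimal sketch of building IIIF Image API 2.x request URLs (`{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`), for example to cut out a face that OpenCV found in a photograph. The server base URL and image identifier are hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import quote

def iiif_url(base: str, identifier: str,
             region: Optional[Tuple[int, int, int, int]] = None,
             width: Optional[int] = None, rotation: int = 0,
             quality: str = "default", fmt: str = "jpg") -> str:
    """Build an Image API 2.x URL; region is (x, y, w, h) in full-image pixels."""
    region_part = "full" if region is None else ",".join(str(v) for v in region)
    # "w," asks the server to scale to this width and keep the aspect ratio.
    size_part = "full" if width is None else f"{width},"
    # Identifiers must be URI-encoded, including any "/" characters.
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region_part}/{size_part}/{rotation}/{quality}.{fmt}"

if __name__ == "__main__":
    # Hypothetical server and identifier; the region would come from face detection.
    print(iiif_url("https://images.example.org/iiif", "ww1/photo 17",
                   region=(120, 80, 200, 240), width=100))
```

Since the image server does the cropping, the analysis code only has to store coordinates, and any IIIF-aware viewer can display the same regions.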

Visualizing Solr Search Results with D3.js for User-Friendly Navigation of Large Results Sets

  • Julia Bauder, Grinnell College Libraries (bauderj-at-grinnell-dot-edu)
  • No previous presentations at national Code4Lib conferences

As the corpus of articles, books, and other resources searched by discovery systems continues to grow, searchers are more and more frequently confronted with unmanageably large numbers of results. How can we help users make sense of 10,000 hits and find the ones they actually want? Facets help, but making sense of a gigantic sidebar of facets is not an easy task for users, either. During this talk, I will explain how we will soon be using Solr 4’s pivot facets and hierarchical visualizations (e.g., treemaps) from D3.js to let patrons view and manipulate search results. We will be doing this with our VuFind 2.0 catalog, but this technique will work with any system running Solr 4. I will also talk about early student reaction to our tests of these visualization features.
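The glue between Solr and D3 is mostly a reshaping step. A minimal sketch, assuming a request with `facet.pivot=format,language` (hypothetical field names) and Solr's JSON response layout, in which each pivot entry carries `field`, `value`, `count` and an optional nested `pivot` list. The output is the `name`/`children`/`value` nesting that `d3.hierarchy` can consume for a treemap.

```python
import json

def pivot_to_hierarchy(solr_response: dict, pivot_key: str,
                       root_name: str = "results") -> dict:
    """Reshape Solr facet_pivot output into {name, children | value} nodes."""
    def convert(entry: dict) -> dict:
        node = {"name": str(entry["value"])}
        if entry.get("pivot"):
            node["children"] = [convert(child) for child in entry["pivot"]]
        else:
            # Only leaves carry a value, so d3's .sum() does not double-count.
            node["value"] = entry["count"]
        return node

    entries = solr_response["facet_counts"]["facet_pivot"][pivot_key]
    return {"name": root_name, "children": [convert(e) for e in entries]}

# Made-up response excerpt for facet.pivot=format,language
SAMPLE = {"facet_counts": {"facet_pivot": {"format,language": [
    {"field": "format", "value": "Book", "count": 7, "pivot": [
        {"field": "language", "value": "English", "count": 5},
        {"field": "language", "value": "French", "count": 2}]},
    {"field": "format", "value": "Video", "count": 3},
]}}}

if __name__ == "__main__":
    print(json.dumps(pivot_to_hierarchy(SAMPLE, "format,language"), indent=2))
```

One caveat of this design: a parent's count can exceed the sum of its children when some documents lack the second field, and those documents drop out of the treemap unless an explicit "other" leaf is added.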

PeerLibrary – open-source, cloud-based collaborative library

  • Mitar Milutinovic, UC Berkeley, mitar.code4lib at tnode.com
  • Not presented or attended code4lib before

PeerLibrary is a new open source project and cloud service for collaborative reading, sharing and storing. Users can upload publications they want to read (currently in PDF format), read them in the browser in real time with others, and highlight, annotate and organize their own or a collaborative library. PeerLibrary provides a search engine over all uploaded open access publications. Additionally, it aims to collaboratively aggregate an open layer of knowledge on top of these publications through the public annotations and references users add to them. In this way, publications would not just be available to read, but accessible to the general public as well. It currently targets the scientific community and scientific publications.

See the screencast, and subscribe to the newsletter to become a beta tester when we open.

It is still in development, and a beta launch is planned for the end of November.

Who was where when, or finding biographical articles on Wikipedia by place and time

  • Emily Morton-Owens, The Seattle Public Library (presenting on work from NYU)
  • No previous c4l presentations

It's easy to answer the question "What important people were in Paris in 1939?" But what about Virginia in the 1750s or Scandinavia in the 14th century? I created a tool that allows you to search for biographies in a generally applicable way, using a map interface. I would like to present updates to my thesis project, which combines a Java crawler that extracts information from Wikipedia articles with a MongoDB data store and a Python frontend.

The input to the project is the free text of entire Wikipedia articles; this is important because it lets us pick up Benjamin Franklin not just in the single most obvious place, Philadelphia, but also in London, Paris, Boston, etc. I can talk about my experiments disambiguating place names (approaches pioneered on newspaper articles were actually unhelpful on this type of text) and about setting up a processing queue that does not become mired in the biographies of every human who ever played soccer. I also want to rework some of the implementation choices I made because of my academic deadline and improve accuracy and usability.
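To give a flavour of the kind of simple heuristic involved, here is a minimal sketch that pairs gazetteer place names with years or decades mentioned in the same sentence. The tiny gazetteer, the regular expressions and the sentence splitting are all illustrative assumptions; the actual crawler is written in Java and handles disambiguation far more carefully.

```python
import re

# Hypothetical gazetteer: surface form -> canonical place name.
GAZETTEER = {"Philadelphia": "Philadelphia", "London": "London", "Paris": "Paris"}

# Years 1000-2099, optionally written as a decade ("1750s").
YEAR = re.compile(r"\b(1\d{3}|20\d{2})(s?)\b")
PLACE = re.compile(r"\b(" + "|".join(map(re.escape, GAZETTEER)) + r")\b")

def place_year_mentions(text: str) -> list:
    """Return (place, year-or-decade) pairs that co-occur within one sentence."""
    pairs = []
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        years = [year + suffix for year, suffix in YEAR.findall(sentence)]
        for place in PLACE.findall(sentence):
            pairs.extend((GAZETTEER[place], year) for year in years)
    return pairs
```

Restricting pairs to a single sentence is the key design choice: it is crude, but it keeps a place mentioned early in a long article from being stamped with every date in the article.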

What I hope to show is that I was able to develop a novel and useful reference tool automatically, using fairly simple heuristics that are a far cry from the hand-cataloging familiar to many librarians.

You can try out the original version (note: this server is scheduled to be updated and rebooted on 11/8 and may be temporarily unavailable).

Good!, DRY, and Dynamic: Content Strategy for Libraries (Especially the Big Ones)

  • Michael Schofield, Nova Southeastern University Libraries, mschofield@nova.edu
  • No previous code4lib presentations.

The responsibilities of the #libweb are exploding [it’s a good thing], and it is no longer uncommon for libraries to manage or even home-grow multiple applications and sites. This is often the point at which web people begin to suffer from the absence of a content strategy: when, say, business hours need to be updated sitewide a half-dozen times.

We were already feeling this crunch when we decided to further complicate the Nova Southeastern University Libraries by splitting the main library website into two. The Alvin Sherman Library, Research, and Information Technology Center is a unique joint-use facility that serves not only the academic community but the public of Broward County - and marketing a hyperblend of content through one portal just wasn't cutting it. With a web team of two, we knew that managing all this rehashed, disparate content was totally unsustainable.

In this talk I want to share how I went about making our library content DRY (“don’t repeat yourself”). The approach is to input content in one place (blurbs, policies, featured events, featured databases, book reviews, business hours, and so on) and syndicate it everywhere, sometimes even dynamically targeting that content to specific audiences or contexts. The presentation is a little about workflow, a little more about browser and context detection, includes a tangent about content-modeling the CMS, and is a lot about APIs, syndication, and performance.
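Here is a minimal sketch of the "enter once, syndicate everywhere" idea. A single content store feeds both a JSON feed and an HTML fragment, and a simple audience tag handles targeting. The store, the item keys, and the audience names are hypothetical. A real setup would sit behind a CMS and an HTTP API rather than a Python list.

```python
import html
import json
from dataclasses import dataclass, field


@dataclass
class ContentItem:
    """One piece of content entered once, e.g. business hours or a blurb."""
    key: str
    body: str
    # Audiences this item targets; "all" means every site shows it.
    audiences: set = field(default_factory=lambda: {"all"})


# Hypothetical single content store that every site and app reads from.
STORE = [
    ContentItem("hours", "Mon-Fri 8am-8pm"),
    ContentItem("event", "Tax help workshop", {"public"}),
    ContentItem("database", "Featured: JSTOR", {"academic"}),
]


def items_for(audience):
    """Select the items a given audience (site or context) should see."""
    return [i for i in STORE if "all" in i.audiences or audience in i.audiences]


def as_json(audience):
    """JSON feed another site, widget, or app can syndicate."""
    return json.dumps({i.key: i.body for i in items_for(audience)})


def as_html(audience):
    """Escaped HTML fragment for embedding in a page template."""
    return "\n".join(
        f'<div class="{html.escape(i.key)}">{html.escape(i.body)}</div>'
        for i in items_for(audience)
    )
```

Because both outputs read from the same store, updating the hours once changes every site that consumes either format.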

No code, no root, no problem? Adventures in SaaS and library discovery

In 2012 VCU was an eager early adopter of Ex Libris' cloud service Alma as an ILS, ERM, link resolver, and single-stop, de-silo'd public-facing discovery tool. This has been a disruptive change that has shifted our systems staff's day-to-day work, relationships with others in the library, and relationships with vendors.

I'll share some of our experiences and takeaways from implementing and maintaining a cloud service:

  • Seeking disruption and finding it
  • Changing expectations of service and the reality of unplanned downtime
  • Communication and problem resolution with non-IT library staff
  • Working with a vendor that uses agile development methodology
  • Benefits and pitfalls of creating customizations and code workarounds
  • Changes in library IT/coders' roles with SaaS

...as well as thoughts on the philosophy of library discovery vs real-life experiences in moving to a single-search model.

Building for others (and ourselves): the Avalon Media System

Avalon Media System is a collaborative effort between development teams at Northwestern and Indiana Universities. Our goal is to produce an open source media management platform that works well for us but is also widely adopted and contributed to by other institutions. We believe that building a strong user and contributor community is vital to the success and longevity of the project, and we have developed the system with this goal in mind. We will share the lessons learned, pains, and successes we’ve had releasing two versions of the application since last year.

Our presentation will cover our experiences:

  • providing flexible, admin-friendly distribution and installation options
  • building with abstraction, customization and local integrations in mind
  • prioritizing features (user stories)
  • attracting code contributions from other institutions
  • gathering community feedback
  • creating a product rather than a bag of parts

How to check your data to provide a great data product? Data quality as a key product feature at Europeana

  • Péter Király, portal backend developer, Europeana
  • No previous C4L presentations

Europeana.eu (Europe's digital library, archive and museum) aggregates more than 30 million metadata records from more than 2,200 institutions. The records come from libraries, archives, museums, and every other kind of cultural institution, from very different systems and metadata schemas, and are typically transformed several times before they are ingested into the Europeana data repository. Europeana builds a consolidated database from these records, creating reliable and consistent services for end users (a search portal, search widget, mobile apps, thematic sites, etc.) and an API, which supports our strategic goal of data for reuse in education, the creative industries, and the cultural sector. A reliable "data product" is thus at the core of our own software products, as well as those of our API partners.

Much effort is needed to smooth out local differences in the metadata curation practice of our data providers. We need a solid framework to measure the consistency of our data and provide feedback to decision-makers inside and outside the organisation. We can also use this metrics framework to ask content providers to improve their own metadata. Of course, a data-quality-driven approach requires that we also improve the data transformation steps of the Europeana ingestion process itself. Data quality issues heavily define what new features we are able to create in our user interfaces and API, and might actually affect the design and implementation of our underlying data structure, the Europeana Data Model.

In the presentation I will briefly describe the Europeana metadata ingestion process and show the data quality metrics, the measuring techniques (using the Europeana API, Solr, and MongoDB queries), some typical problems (both trivial and difficult ones), and finally the feedback mechanism we propose to deploy.

Keywords: Europeana, data quality, EDM, API, Apache Solr, MongoDB, #opendata, #openglam
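Per-field completeness across a set of records is one simple example of the kind of metric described above. The sketch below is illustrative only. The EDM-style field names are chosen for the example, and records are assumed to be plain dictionaries, for instance parsed from API or MongoDB results. It is not Europeana's actual measurement code.

```python
from collections import Counter


def has_value(value):
    """Treat None, blank strings, and empty collections as missing."""
    if value is None:
        return False
    if isinstance(value, str):
        return value.strip() != ""
    if isinstance(value, (list, tuple, dict, set)):
        return len(value) > 0
    return True


def field_completeness(records, fields):
    """For each field, the share of records (0.0 to 1.0) with a non-empty value."""
    filled = Counter()
    for record in records:
        for name in fields:
            if has_value(record.get(name)):
                filled[name] += 1
    total = len(records)
    return {name: (filled[name] / total if total else 0.0) for name in fields}


def weak_fields(completeness, threshold=0.9):
    """Fields whose completeness falls below a (hypothetical) target."""
    return sorted(name for name, share in completeness.items() if share < threshold)
```

Output like this could feed the provider-facing feedback loop the talk describes, for example by flagging which fields a given data provider most often leaves empty.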

Teach your Fedora to Fly: scaling out a digital repository

  • Aaron Coburn, Software Developer, Amherst College
  • No previous C4L presentations

Fedora is a great repository system for managing large collections of digital objects, but what happens when a popular food magazine begins directing a large number of readers to a manuscript showing Emily Dickinson’s own recipe for doughnuts? While Fedora excels in its support of XML-based metadata, it doesn’t always perform well under a high volume of traffic. Nor is it especially tolerant of network or hardware failures.

This presentation will show how we make heavy use of a Fedora repository while insulating it almost entirely from web traffic. By building a distributed web front end with Node.js and caching most of the user-accessible content from Fedora in an elastic, fault-tolerant Riak (NoSQL) cluster, we have eliminated nearly all single points of failure in the system. This also means that our production system is spread across twelve separate servers, where asynchrony and Map-Reduce are king. And aside from being blazing fast, it is also entirely Hydra-compliant.

Furthermore, we will attempt to answer the question: if Fedora crashes and the visitors to your site don’t notice, did it really fail?
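The insulation strategy described above resembles a cache-aside pattern with a stale-content fallback. In the minimal sketch below, an in-memory dictionary stands in for the Riak cluster, and a hypothetical `fetch_from_origin` callable stands in for Fedora. The class name and TTL are illustrative, and this is not Amherst's Node.js implementation.

```python
import time


class FedoraShield:
    """Serve from cache, refresh from the origin when stale, and fall back
    to the stale copy if the origin (here, Fedora) is unavailable."""

    def __init__(self, fetch_from_origin, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # pid -> (value, stored_at)

    def get(self, pid):
        now = self._clock()
        cached = self._cache.get(pid)
        if cached is not None and now - cached[1] < self._ttl:
            return cached[0]  # fresh hit: the origin is never touched
        try:
            value = self._fetch(pid)
        except Exception:
            if cached is not None:
                return cached[0]  # origin down: visitors still get content
            raise  # nothing cached and no origin: surface the failure
        self._cache[pid] = (value, now)
        return value
```

If Fedora fails while a request is being served, previously cached objects keep being served. That is the "did it really fail?" effect, at the cost of possibly stale content.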


Using Open Source Software and Freeware to Preserve and Deliver Digital Videos

  • Wei Fang, Head of Digital Services, Rutgers University Law Library
  • Jiebei Luo, Digital Projects Initiative Intern, Rutgers University
  • No previous C4L presentations

The Rutgers University Law Library has been the official digital repository of the New Jersey Supreme Court oral arguments since 2002. This large collection contains approximately 3,000 videos, totaling 400 GB or 6,000 viewing hours. As the collection expanded, the existing database and static website could no longer efficiently support the library’s daily operations or meet its patrons’ search needs. Using open source software and freeware such as Ubuntu, FFmpeg, Solr, and Drupal, the library has been able to develop a complete solution. It re-encodes videos, embeds subtitles, and incorporates the Solr search engine and a content management system to support full-text subtitle search. It also automatically updates video metadata records in the library catalog system and provides a plug-in-free, HTML 5-based web interface for patrons to view the videos online. The following aspects will be presented in detail at the conference:

  • Video codecs comparison
  • Server-end batch video encoding/re-encoding
  • HTML 5 video tag and embedding subtitles
  • Incorporating search engine Solr and content management tool Drupal with the database to retrieve videos by full-text search especially in subtitle files
  • Incorporating video metadata with the library catalog system
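As a sketch of server-side batch re-encoding, the following builds one FFmpeg command per source file and either prints it (dry run) or runs it. The `.wmv` source pattern, the directory names, and the encoding settings (H.264 video, AAC audio, MP4 container for HTML 5 playback) are assumptions for illustration, not the library's actual settings. The output directory is assumed to exist.

```python
import shlex
import subprocess
from pathlib import Path


def ffmpeg_command(src: Path, out_dir: Path) -> list:
    """Build an ffmpeg command that re-encodes one video to H.264/AAC MP4."""
    dst = out_dir / (src.stem + ".mp4")
    return [
        "ffmpeg", "-y",                      # overwrite existing output
        "-i", str(src),
        "-c:v", "libx264", "-crf", "23", "-preset", "medium",
        "-c:a", "aac", "-b:a", "128k",
        "-movflags", "+faststart",           # index at file start: playback can begin early
        str(dst),
    ]


def batch_encode(src_dir: Path, out_dir: Path, pattern="*.wmv", dry_run=True):
    """Encode every matching file; with dry_run, only print the commands."""
    commands = [ffmpeg_command(p, out_dir) for p in sorted(src_dir.glob(pattern))]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop the batch on the first failure
    return commands
```

Running with `dry_run=True` first is a cheap way to review the full batch before committing server time to thousands of encodes.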

Shared Vision, Shared Resources: the Curate Institutional Repository

Curate is being collaboratively developed by several institutions in the Hydra community who share the need and vision for a Fedora-backed institutional repository. The first release of Curate was a collaboration between Notre Dame and Northwestern University, along with Data Curation Experts (DCE), a vendor hired jointly by our two institutions. Building on Sufia, a Hydra engine, the team worked quickly to release the first version of Curate in October 2013. It provides a basic self-deposit system with support for various content types, collection building, DOI minting, and user profile creation. From the very beginning we have built Curate to be easy to theme and extend, to ease installation and use by other institutions.

In December 2013, additional partners will join the project, including Indiana University, the University of Cincinnati, and the University of Virginia. Each institution contributes resources to the project to further our common goal: a product that fits our needs and has a sustainable future. Together we will tackle additional content types (such as complex data, software, and media), administrative collections, and more.

Our presentation will include:

  • a brief demonstration of Curate and technical overview
  • why and how we work together
  • why build Curate
  • the future of the project

Solr, Cloud and Blacklight

  • David Jiao, Library Information Systems, Indiana University at Bloomington, djiao@indiana.edu
    • No previous code4lib presentations

SolrCloud refers to the distributed capabilities in Solr4. It is designed to offer a highly available, fault tolerant environment by organizing data into multiple pieces that can be hosted on multiple machines with replicas, and providing a centralized cluster configuration and management.

At Indiana University, we are upgrading the Solr backend of our recently released Blacklight-based OPAC from Solr 1.4 to Solr4, and we are also building a private cloud of Solr4 servers. Our OPAC serves over 7 million bibliographic records to over 100,000 students and faculty members. In this talk, I will present several features of SolrCloud, including distributed requests, fault tolerance, near-real-time indexing and searching, and configuration management with ZooKeeper. I will also describe our experience using these features to give the OPAC better performance and a better architecture. Finally, I will discuss practical lessons learned from our SolrCloud setup and upgrade, and from integrating the new SolrCloud with our customized Blacklight system.
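As a hedged sketch of two of the features mentioned, the snippet below builds two requests. The first uses SolrCloud's Collections API to create a sharded, replicated collection whose configuration has already been uploaded to ZooKeeper. The second is an ordinary query, which any node in the cluster can accept and distribute across the shards. The host, collection, and config names are hypothetical. `action=CREATE`, `numShards`, `replicationFactor`, and `collection.configName` are standard Collections API parameters.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://solr1.example.edu:8983/solr"  # hypothetical cluster node


def create_collection_url(name, num_shards, replication_factor, config_name):
    """Collections API call: split the index into shards, each with replicas."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "collection.configName": config_name,  # config set stored in ZooKeeper
        "wt": "json",
    }
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


def select_url(collection, query, rows=10):
    """A normal search; the receiving node fans it out to all shards."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{SOLR_BASE}/{collection}/select?{urlencode(params)}"


print(create_collection_url("catalog", 4, 2, "blacklight"))
print(select_url("catalog", "title:ulysses"))
```

Because any node can answer a query, Blacklight can point at a load balancer in front of several nodes instead of a single Solr host.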

Leveraging XSD's for Reflective, Live Dataset Support in Institutional Repositories

  • Mark Sullivan, Library Information Technology, University of Florida
    • No previous code4lib presentations

The University of Florida Libraries are currently adding support for active datasets to our METS-based institutional repository software. This ongoing project enables the library to be a partner in current or long-running data-driven projects around the university by providing tangible short-term and long-term benefits to those projects. The system assists project teams by storing and providing access to their data, while supporting online filtering and sorting of the data, custom queries, and adding and editing of the data by authorized users. We are also exploring simple data visualizations that let users perform basic graphical and geographic queries. We explored several different schemas, including DDI and EML, but ultimately chose the streamlined approach of using XSDs with some custom attributes, with all other data residing in the relevant portions of the METS file. The system is currently being developed using XSDs that describe XML datasets, but this model should scale easily to SQL datasets or to large datasets supported by Hadoop or iRODS.
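To illustrate the "reflective" idea, here is a minimal sketch that parses a dataset's XSD to discover column names and types. The column types then drive sorting, so a numeric column sorts numerically rather than as text. The schema, the `row` element name, and the type-to-cast table are hypothetical, and SobekCM's custom attributes are not shown. Matching on the literal `xs:` prefix in `type` values is a simplification; a robust version would resolve namespace prefixes.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical dataset schema: each <row> has three typed columns.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="row">
    <xs:complexType><xs:sequence>
      <xs:element name="site" type="xs:string"/>
      <xs:element name="year" type="xs:integer"/>
      <xs:element name="count" type="xs:decimal"/>
    </xs:sequence></xs:complexType>
  </xs:element>
</xs:schema>"""

CASTS = {"xs:string": str, "xs:integer": int, "xs:decimal": float}


def columns_from_xsd(xsd_text, row_name="row"):
    """Return [(column_name, xsd_type), ...] for the row element's children."""
    root = ET.fromstring(xsd_text)
    for el in root.iter(f"{XS}element"):
        if el.get("name") == row_name:
            seq = el.find(f"{XS}complexType/{XS}sequence")
            return [(c.get("name"), c.get("type")) for c in seq.findall(f"{XS}element")]
    raise ValueError(f"no element named {row_name!r}")


def sort_rows(rows, columns, key):
    """Sort string-valued rows using the cast implied by the column's XSD type."""
    cast = CASTS[dict(columns)[key]]
    return sorted(rows, key=lambda r: cast(r[key]))
```

The same column list could drive filter widgets and edit-form validation, so supporting a new dataset would only require a new XSD.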

This work is being integrated into the open source SobekCM Digital Content Management System. SobekCM is built on a pair-tree structure of METS resources with rich metadata support, including DC, MODS, MARC, VRACore, DarwinCore, IEEE-LOM, GML/KML, schema.org microdata, and many other standard schemas. The system emphasizes online, distributed creation and maintenance of resources. This includes geo-placement and geographic searching of resources, building structure maps (tables of contents) visually online, and a broad suite of curator tools.

This work is presented as a model which could be implemented in other systems as well. We will demonstrate current support and discuss our upcoming roadmap to provide complete support.

Dead-simple Video Content Management: Let Your Filesystem Do The Work

  • Andreas Orphanides, NCSU Libraries (akorphan (at) ncsu.edu)
    • (never led or soloed a C4L presentation)

Content management is hard. To keep all the moving parts in order, and to maintain a layer of separation between the system and content creators (who are frequently not technical experts), we typically turn to content management systems like Drupal. But even Drupal and its kin require significant overhead and present a not inconsiderable learning curve for nontechnical users.

In some contexts it's possible -- and desirable -- to manage content in a more streamlined, lightweight way, with a minimum of fuss and technical infrastructure. In this presentation I'll share a simple MVC-like architecture for managing video content for playback on the web, which uses a combination of Apache's mod_rewrite module and your server's filesystem structure to provide an automated approach to video content management that's easy to implement and provides a low barrier to content updates: friendly to content creators and technology implementors alike. Even better, the basic method is HTML5-friendly, and can be integrated into your favorite content management system if you've got permissions for creating templates.
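The following is a hedged sketch of the "let the filesystem do the work" idea. It is not the presenter's implementation. It assumes one folder per video, named by a URL slug, and a hypothetical mod_rewrite rule (shown in a comment) that hands `/video/<slug>` to a renderer. The renderer emits an HTML5 `<video>` element from whichever files are present. The file-naming conventions (`poster.jpg`, `.vtt` captions) and URL prefix are assumptions.

```python
import html
import os
import re

# Hypothetical Apache rule mapping /video/<slug> to this renderer:
#   RewriteRule ^video/([A-Za-z0-9_-]+)/?$ /cgi-bin/video.py?slug=$1 [L,QSA]

MIME_TYPES = {".mp4": "video/mp4", ".webm": "video/webm", ".ogv": "video/ogg"}
SLUG_RE = re.compile(r"[A-Za-z0-9_-]+")  # rejects "..", slashes, etc.


def render_video(slug, filenames, url_prefix="/media"):
    """Build an HTML5 <video> element from the files in the slug's folder."""
    if not SLUG_RE.fullmatch(slug):
        raise ValueError(f"invalid slug: {slug!r}")
    base = f"{url_prefix}/{slug}"
    names = sorted(filenames)
    poster = f' poster="{base}/poster.jpg"' if "poster.jpg" in names else ""
    lines = [f"<video controls{poster}>"]
    for name in names:
        ext = os.path.splitext(name)[1].lower()
        src = f"{base}/{html.escape(name)}"
        if ext in MIME_TYPES:
            lines.append(f'  <source src="{src}" type="{MIME_TYPES[ext]}">')
        elif ext == ".vtt":
            lines.append(f'  <track kind="captions" src="{src}" srclang="en">')
    lines.append("</video>")  # unrelated files (notes, etc.) are simply ignored
    return "\n".join(lines)
```

In use, the renderer would be called as `render_video(slug, os.listdir(os.path.join(VIDEO_ROOT, slug)))`, where `VIDEO_ROOT` is a hypothetical media directory. Content creators update a video just by dropping files into its folder.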

In the presentation I'll go into detail about the system structure and logic required to implement this approach. I'll detail the benefits and limitations of the system, as well as the challenges I encountered in developing its implementation. Audience members should come away with sufficient background to implement a similar system on their own servers. Implementation documentation and genericized code will also be shared, as available.

Managing Discovery

  • Andrew Pasterfield, Senior Programmer/Systems Analyst, University of Calgary Library, ampaster@ucalgary.ca
    • No previous code4lib presentations

In fall 2012 the University of Calgary Library launched a new home page that incorporated a Summon-powered single search box with a customized “bento box” results display. Search at the U of C now combines a range of metadata sources for discovery, with customized mapping of a database recommender and LibGuides into a unified display. Further customizations include a method of logging clicks that uses neither Google Analytics nor a proxy.

This presentation will discuss the technical details of bringing the various systems together into one display interface to increase discovery at the U of C Library.
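A "bento box" display can be sketched generically as one query fanned out to several sources in parallel, with each source's results kept in its own labeled box. In the sketch below, the source functions and labels are hypothetical stand-ins for Summon, a database recommender, and LibGuides. It does not reflect the U of C's code.

```python
from concurrent.futures import ThreadPoolExecutor


def bento_search(query, sources, per_box=3):
    """Run one query against several sources in parallel; one box per source.

    sources maps a box label to a function that takes the query and returns
    a list of hits. A source that raises yields an empty box instead of
    breaking the whole results page.
    """
    boxes = {}
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {label: pool.submit(search, query) for label, search in sources.items()}
        for label, future in futures.items():
            try:
                boxes[label] = list(future.result())[:per_box]
            except Exception:
                boxes[label] = []
    return boxes
```

Keeping each source's failure isolated matters when one back end, for example a recommender service, is slower or less reliable than the others.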

http://library.ucalgary.ca


Sorting it out: a piece of the User Centered Design Process

  • Cindy Beggs, Akendi, cindy@akendi.com

This talk is about how to apply a user centered design methodology to the process of creating an information architecture. Participants learn the fundamentals of UCD and how card sorting and reverse card sorting enable us to isolate the content we present on screen from the layouts and visuals of those screens. We talk about ways to identify who will be using the information architecture you are creating and why we need to know how it will be used.

What will attendees take away from this talk? The criticality of involving “real” end users in the process of creating an information architecture, and the basics of following a user-centered design process to create best-in-class, content-rich digital products.
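One common way to analyze the open card sorts mentioned above is a co-occurrence count: how many participants put each pair of cards in the same group. Pairs that many participants group together suggest content that belongs together in the information architecture. The sketch below is a generic illustration with hypothetical card names, not Akendi's method.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(sorts):
    """Count how many participants placed each pair of cards in one group.

    sorts: one dict per participant, mapping group label -> list of cards.
    Pairs are stored alphabetically, e.g. ("Hours", "Location").
    """
    pairs = Counter()
    for groups in sorts:
        for cards in groups.values():
            for a, b in combinations(sorted(set(cards)), 2):
                pairs[(a, b)] += 1
    return pairs
```

Dividing each count by the number of participants gives a 0-to-1 similarity that is easy to read as a matrix or feed into clustering.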

Cindy Beggs has been working in the “information industry” for over 25 years. A librarian by profession, she has spent decades helping users figure out how to find their way through large bodies of content. Her insights into how people seek information, her empathy for those who find it a challenge and her practical experience helping organizations figure out how to best structure their content contribute to her success as an information architect with both clients and trainees. (http://www.akendi.com/aboutus/management/)


Implementation of ArchivesSpace at the University of Richmond

  • Birong Ho, bho@richmond.edu

The University of Richmond implemented ArchivesSpace, an archival collection management system, in fall 2013. Because the university is a charter member and its Head of Special Collections serves on the Board, implementing this open source software became a priority.

The talk will address several aspects of the implementation. Among them are collections and repositories, the storage layer (including data format), system resource requirements, technical architecture, customization, scaling, and integration with other systems in the library.

The talk will focus in particular on customization, scaling, and integration with other campus systems such as Archeon and Exist, all of which became concerns during implementation.

Easy Wins for Modern Web Technologies in Libraries

  • Trey Terrell, Analyst Programmer, Oregon State University
    • No previous Code4Lib presentations

Oregon State University is currently implementing an updated version of its room reservation system. During its development we've come across and implemented a variety of "easy wins" to make it more responsive, easier to maintain, less expensive to run, and just cooler to experience. While our particular system was built with Ruby on Rails, this talk will address general methods and example utilities that can be used no matter your stack.

I'll be talking about things like cache management, reverse proxies, publish/subscribe servers, WebSockets, responsive design, asynchronous processing, and keeping complicated stacks up and running with minimal effort.
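As one stack-neutral example of cache management, the sketch below shows HTTP conditional GET with ETags. The server hashes the response body and returns a bodiless `304 Not Modified` when the client's `If-None-Match` header already matches. It is a generic illustration, not OSU's code, and the function names are hypothetical. In practice a framework or reverse proxy usually handles this for you.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Strong validator derived from the response body (quoted, per HTTP)."""
    return '"%s"' % hashlib.sha256(body).hexdigest()[:16]


def conditional_get(body: bytes, if_none_match=None):
    """Return (status, headers, payload) for a GET, honoring If-None-Match."""
    etag = make_etag(body)
    if if_none_match:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            # Client's copy is current: send headers only, no body.
            return 304, {"ETag": etag}, b""
    # "no-cache" allows storing the response but requires revalidation first.
    return 200, {"ETag": etag, "Cache-Control": "no-cache"}, body
```

Revalidation still costs a request, but a 304 skips re-sending the body, which adds up for frequently polled pages like a room availability grid.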

Implementing Islandora at a Small Institution

  • Megan Kudzia, Albion College Library
  • Eddie Bachle, Albion College IT
    • No previous Code4Lib presentations

Albion College (and particularly the Library/Archives and Special Collections) has a variety of needs which could be met by an open-source Institutional Repository system. Several months and lots of conversations later, we’re continuing to troubleshoot our way through Islandora. We’d like to talk about what has worked for us, where our frustrations have been, whether it’s even possible to install and develop a system like this at a small institution, and where the process has stalled.

As of right now, we do have a semi-working installation. We’re not sure when it will be ready for our end users, but we'll talk about our development process and evaluate our progress. Additional contributions by Nicole Smeltekop, Albion College Archives & Special Collections.


PhantomJS+Selenium: Easy Automated Testing of AJAX-y UIs

  • Martin Haye, California Digital Library, martin.haye@ucop.edu
  • Mark Redar, California Digital Library, mark.redar@ucop.edu

Web user interfaces demand ever more dynamism and polish, combining HTML5, AJAX, lots of CSS, and jQuery (or its ilk) to create autocomplete drop-downs, intelligent buttons, stylish alert dialogs, etc. How can you write automated tests for these highly complex, interactive UIs?

Part of the answer is PhantomJS, a modern “headless” WebKit browser (meaning it has no display) that can be driven from command-line Selenium unit tests. PhantomJS is dead simple to install, and its blazing speed and server-friendliness make continuous integration testing easy. You can write UI unit tests in {language-of-your-choice} and run them not just in PhantomJS but also in Firefox and Chrome, plus a zillion browser/OS combinations at places like SauceLabs, TestingBot, and BrowserStack.

In this double-team live code talk, we’ll explain all that while we demonstrate the following in real time:

  • Start with nothing.
  • Install Selenium bindings for Ruby and Python.
  • In each language write a small test of an AJAX-y UI.
  • Run the tests in Firefox, and fix bugs (in the test or UI) as needed.
  • Install PhantomJS.
  • Show the same tests running headless as part of a server-friendly test suite.
  • (Wifi permitting) Show the same tests running on a couple of different browser/OS combinations in the SauceLabs cloud, talking through a tunnel to the local firewalled application.
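A key technique when testing AJAX-y UIs is the explicit wait. Instead of sleeping for a fixed time, the test polls until a condition holds, such as an autocomplete list becoming visible, or until a timeout expires. Selenium's Python bindings provide this as `WebDriverWait(driver, 10).until(expected_conditions.visibility_of_element_located((By.CSS_SELECTOR, ...)))`. The dependency-free sketch below only mirrors that polling idea so the logic can be shown and tested without a browser. `wait_until` is a hypothetical helper, not part of Selenium.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.25,
               clock=time.monotonic, sleep=time.sleep):
    """Poll condition() until it returns something truthy or time runs out.

    Returns the truthy value (e.g. a found element), like WebDriverWait.until.
    """
    deadline = clock() + timeout
    while True:
        result = condition()
        if result:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        sleep(poll)
```

Explicit waits make the same test reliable in a fast headless browser and in a slower remote one, where fixed sleeps tend to be either too short or wastefully long.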

New Technologies, Collaboration, & Entrepreneurship in Libraries: Harnessing Their Power to Help Your Library

  • Stephanie Walker – swalker@brooklyn.cuny.edu
  • Howard Spivak – howards@brooklyn.cuny.edu
  • Alex - Alex@brooklyn.cuny.edu

Academic libraries are caught in budget squeezes and often struggle to communicate their value to senior administration and others. At Brooklyn College Library, we have taken an unusual, possibly unique, approach to these issues. Our technology staff have long worked directly with librarians to develop products that meet library, faculty, and student needs, and we have shared many of our products with colleagues. These include an award-winning website, e-resource, and content management system we call 4MyLibrary, which we shared for free with 8 CUNY colleges, and an easy-to-use book scanner that has proven overwhelmingly popular with students, faculty, other librarians, and numerous campus offices. Recently, motivated by budget cuts, we decided that what worked for us might interest other libraries. Working with our Office of Technology Commercialization, we started selling two products: our book scanners (at half the price of commercial alternatives) and a hosting service through which we host and support 4MyLibrary for libraries with minimal technology staff. Both succeeded and yielded major benefits: a steady revenue stream and the admiration and serious goodwill of our senior administration and others.

However, this presentation is neither a basic how-to nor an advertisement. With it, we hope to spur a conversation about broader collaboration among libraries, especially regarding new technologies. We all have some level of technical expertise, most of us are struggling with rising prices and tight budgets, and many of us are unhappy with various technology products we use, from scanners to our ILS. We believe, and can demonstrate, that through collaboration we can solve many of our problems and provide better services to boot.

Identifiers, Data, and Norse Gods

ORCID and DataCite provide stable identifiers for researchers and data, respectively. Each system does a fine job of providing value to its users. But wouldn't it be great if they could link their systems to create something much more powerful? Perhaps even as powerful as a god?

Enter ODIN, The ORCID and DataCite Interoperability Network. ODIN is a two-year project to unleash the power of persistent identifiers for researchers and the research they create. This talk will present recent work from the ODIN project, including several tools that can unleash the godlike power of identifiers at your institution. Current tools include:

  • Metadata generator tool: allows repository staff to create DataCite metadata with embedded ORCIDs.
  • Claiming tool: assists researchers in claiming their work within the ORCID system.
  • ORCID-feed: includes a list of ORCID works on any web page.
  • ODIN's HAMR: assists in populating a DSpace repository with ORCIDs. Based on work from a Code4Lib hackathon!
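Tools like these handle many ORCID iDs, so a cheap local sanity check is useful before any API call. An ORCID iD ends in a check character computed with the ISO 7064 MOD 11-2 algorithm over its first 15 digits, and the value 10 is written as "X". The sketch below implements that published checksum. It is not code from the ODIN tools.

```python
def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    remainder = total % 11
    result = (12 - remainder) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check format (16 characters once hyphens are removed) and check character."""
    digits = orcid.replace("-", "")
    if len(digits) != 16 or not digits[:15].isdigit():
        return False
    return orcid_check_digit(digits[:15]) == digits[15].upper()
```

A passing checksum only means the iD is well formed. Confirming that it belongs to the intended researcher still requires the ORCID registry.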

Armed Bandits in the Digital Library

Many of us use the excellent Lucene library (or Solr) to provide search functionality. These systems contain a number of features for adjusting the relevancy ranking of hits, but we may not know how to use them. In this presentation, I'll present the available options: the default ranking model (the vector space model), the alternatives (e.g., BM25), and the other ways we can tweak and adjust the ranking of hits (e.g., boost factors and functions). But even if we know how to deploy these adjustments, we are still left in the dark. We do not know whether the change we've just rolled out had a statistically significant effect or was just a waste of time and resources. A/B testing is one option, but there may be a much better one: the so-called "multi-armed bandit" approach. In this talk I'd like to show how we are experimenting with this strategy to tune the ADS search engine.
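To illustrate the bandit idea, here is a minimal epsilon-greedy sketch, one of the simplest multi-armed bandit strategies. Each "arm" is a ranking configuration, and the reward is assumed to be whether the user clicked a result. Unlike a fixed 50/50 A/B split, traffic shifts toward the better configuration as evidence accumulates. The arm names and the click model are hypothetical, and the talk may well use a different bandit strategy.

```python
import random


class EpsilonGreedy:
    """Epsilon-greedy bandit over ranking configurations ("arms")."""

    def __init__(self, arms, epsilon=0.1, rng=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.pulls = {arm: 0 for arm in self.arms}
        self.reward_sum = {arm: 0.0 for arm in self.arms}

    def mean_reward(self, arm):
        # Untried arms score infinity so each one is tried at least once.
        if self.pulls[arm] == 0:
            return float("inf")
        return self.reward_sum[arm] / self.pulls[arm]

    def choose(self):
        """Pick the arm for the next search: usually the best so far."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)        # explore
        return max(self.arms, key=self.mean_reward)  # exploit

    def update(self, arm, reward):
        """Record the outcome (e.g. 1.0 for a click, 0.0 otherwise)."""
        self.pulls[arm] += 1
        self.reward_sum[arm] += reward
```

In a simulation where one configuration has a clearly higher click-through rate, most of the traffic ends up on that configuration while the other arm still receives occasional exploratory queries.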

Building Worker Queues with AWS and Resque

A common task in larger systems is processing large input files automatically. Users often drop those files into a shared location on AWS, NFS, or another shared drive, and the files then need to be processed and potentially integrated into a system. This task has come up recently at the University of Virginia Library in two places: allowing users to add GIS data to the system, and setting up a system for the Academic Preservation Trust (http://aptrust.org/) that ingests files and resources into the preservation system.

The system is built by loosely coupling a number of different technologies, which makes it easy to interoperate and communicate across different systems and programming environments. Because the interfaces are well defined, it’s also fairly simple to switch out technologies as the system's requirements change.

The process is fairly simple:

First, a Ruby daemon monitors an AWS S3 bucket that others can upload new files into. This daemon creates a Resque status task, adds a marker for the task in a database, and continues monitoring.

Second, Resque mediates incoming job requests and routes them to the appropriate workers, which may be written in Java, Go, or Ruby. The diversity of technologies that Resque can manage gives us great latitude to use the appropriate tool for each job. While a job is processing, Resque updates its status and coordinates it with other jobs.

Finally, a page that is integrated into a larger Rails app provides a novice-user-friendly view of the status of the workers and allows basic tasks such as restarting the job.
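The first step can be sketched as a polling watcher that enqueues each new file as a Resque job. Resque's Redis conventions are simple enough that any language can produce jobs: the queue name is added to the `resque:queues` set, and a JSON payload `{"class": ..., "args": [...]}` is appended to the `resque:queue:<name>` list. That shared format is what lets Java, Go, or Ruby workers consume the same queues. In this minimal sketch, the queue name `ingest`, the job class `IngestFile`, and the `list_keys` callable are hypothetical. `list_keys` could wrap boto3's `list_objects_v2`, and `redis` is assumed to be a redis-py-style client with `sadd` and `rpush`. The database marker and Resque status tracking are omitted.

```python
import json


def enqueue_resque_job(redis, queue, job_class, *args, namespace="resque"):
    """Push a job using Resque's Redis layout: register the queue name in
    the <namespace>:queues set, then append a JSON payload to its list."""
    redis.sadd(f"{namespace}:queues", queue)
    payload = json.dumps({"class": job_class, "args": list(args)})
    redis.rpush(f"{namespace}:queue:{queue}", payload)


def poll_once(list_keys, seen, redis):
    """One pass of the watcher: enqueue every uploaded key not seen before."""
    new_keys = [key for key in list_keys() if key not in seen]
    for key in new_keys:
        enqueue_resque_job(redis, "ingest", "IngestFile", key)
        seen.add(key)
    return new_keys
```

A daemon would call `poll_once` in a loop with a sleep between passes. Persisting `seen` (for example, in the database the proposal mentions) keeps a restart from re-enqueuing old files.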

This architecture allows us to swap in the technology that best fits each part of the process, and it makes it easier to maintain the system. We use this to integrate and coordinate between tasks handled in Java, Ruby, and Go, and it provides an effective way to interoperate with these programming languages and the respective strengths that they bring to this system.


Sustaining your Open Source project through training

  • Bess Sadler (Stanford University Libraries) and Mark Bussey (Data Curation Experts) will discuss their experiences developing and delivering training for Project Hydra.

Topics covered:

  • Working practices for developing training materials
  • Sharing the work when there are no dedicated resources
  • Inviting community (and student) input to create higher quality content
  • Strategies to keep training docs up to date
  • Strategies to make training materials available to the widest-possible audience
  • Using surveys (Survey Monkey) to assess the effectiveness of your training program


Piwik: Open source web analytics

While Google Analytics is synonymous with web analytics, today we fortunately have many other good options. One of them is Piwik (piwik.org), a simple-to-install, open source PHP/MySQL application whose tracking script can sit alongside Google Analytics and track the usual clicks, events, and variables. In this presentation, I'd like to cover the usual analytics topics and also what makes Piwik powerful. This includes importing and visualizing web logs from any system to incorporate both past and future data, easily tracking downloads, and writing your own reports or dashboards. The visitor log data is stored securely on your own server, so you control who looks at the data and how much or how little of it to keep. With an active and helpful developer community, Piwik has the potential to provide analytics that make sense for libraries, not e-commerce.
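As a small example of download tracking, the sketch below builds a request for Piwik's HTTP Tracking API (`piwik.php`). Such a request could record a download from a server-side script, for example when files are served by a system that cannot run the JavaScript tracker. `idsite`, `rec`, `url`, `download`, `rand`, and `apiv` are standard Tracking API parameters. The Piwik host, site ID, and URLs are hypothetical.

```python
import random
from urllib.parse import urlencode

PIWIK_URL = "https://analytics.example.edu/piwik.php"  # hypothetical install


def download_tracking_url(site_id, page_url, file_url):
    """Build a Tracking API request that records one file download."""
    params = {
        "idsite": site_id,                  # site ID configured in Piwik
        "rec": 1,                           # required: actually record the hit
        "url": page_url,                    # page where the download happened
        "download": file_url,               # marks this hit as a download
        "rand": random.randint(0, 2**31),   # cache buster
        "apiv": 1,
    }
    return f"{PIWIK_URL}?{urlencode(params)}"
```

Sending a GET request to the resulting URL (with `urllib.request.urlopen`, for example) records the download in your own Piwik database.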


Next Generation Catalogue - RDF as a Basis for New Services

  • Anne-Lena Westrum – digitalutvikling@gmail.com
  • Benjamin Rokseth
  • Asgeir Rekkavik
  • Petter Goksøyr Åsen

Oslo Public Library has converted its entire MARC catalogue to RDF using MARC2RDF, a conversion tool developed in-house.
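To show the general shape of a MARC-to-RDF conversion, here is a deliberately tiny sketch. It maps a few (tag, subfield) pairs to RDF properties and emits N-Triples. It is not MARC2RDF, whose mappings are far richer. The base URI, the mapping table, and the input format (already-parsed `(tag, subfield, value)` tuples) are all assumptions for illustration.

```python
BASE = "http://example.org/resource/"  # hypothetical base URI

# Illustrative mapping from (MARC tag, subfield) to RDF property URIs.
MAPPING = {
    ("245", "a"): "http://purl.org/dc/terms/title",
    # DC 1.1 creator takes a plain literal; linking to an authority URI
    # (as an enriched catalogue would) is left out of this sketch.
    ("100", "a"): "http://purl.org/dc/elements/1.1/creator",
    ("020", "a"): "http://purl.org/ontology/bibo/isbn",
}


def literal(value):
    """Quote a string as an N-Triples literal, escaping \\, quotes, newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def marc_to_ntriples(record_id, fields):
    """fields: list of (tag, subfield, value) tuples from a parsed MARC record."""
    subject = f"<{BASE}{record_id}>"
    lines = []
    for tag, code, value in fields:
        prop = MAPPING.get((tag, code))
        if prop:
            # Strip ISBD punctuation MARC often leaves on values ("Sult /").
            lines.append(f"{subject} <{prop}> {literal(value.strip(' /:;,'))} .")
    return "\n".join(lines)
```

Unmapped fields are skipped here. A production converter would need explicit decisions for every field it encounters.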

data.deichman.no, the enriched RDF version of the library catalogue including its authority files, forms the basis for two different mashups: the Active shelf and the Book recommendation database. The RDF catalogue is linked with various content, and the dataset is updated daily to account for additions, deletions, and changes made in the MARC catalogue.

The Active shelf is a physical touchscreen device that makes use of open source software, RFID technology, RDF data and external web service APIs to provide information about any library book a patron is curious to know more about.

The Book recommendations database stores book recommendations written by library staff from all over Norway and links them to the RDF-representation of the MARC-catalogue.

Economics of Scale: Thinking about Metadata Quality and Completeness for Fun and Profit

  • William Hicks, University of North Texas (William.hicks@unt.edu)

The UNT Libraries Digital Collections comprise three Internet gateways: The Portal to Texas History, the UNT Digital Library, and the Gateway to Oklahoma History. Together they make a wide range of materials available to the public, from photographs and newspapers to dissertations and recordings of music ensemble performances. The collections provide access to over 500,000 unique items, which were used over 9 million times last year, and growth trends in both areas show no signs of slowing.

As the size and scope of our collections have grown, so too has a pressing need to think clearly about the quality of our metadata, the completeness of our records, and the most efficient way of doing metadata entry. Not surprisingly, a few things have been written on the subject. Over the last few months, we’ve started writing new code and getting our metadata editing infrastructure to a place where we can begin testing these ideas on our ever-expanding dataset. What kinds of questions are we looking to answer, and what types of tools are we building? That’s what this talk is all about, but here are a few ideas to ponder:

  • What kinds of tools have we built, or can we employ to standardize data entry and aid the user in their input needs?
  • How close does a metadata record come to a “completeness” standard? What does that even look like? What are the implications when we look at such a standard at scale?
  • If we can identify what we think a “quality” metadata record “is”, historically speaking, how close do we get to that ideal?
  • Does an item’s history matter? Can we quantify it and locate value in change through time?
  • What are the economic costs of metadata entry? If we have enough quantifiable measures about the types of objects in our systems, and we can profile our data entry personnel, what can this say about optimizing staff time and return on investment?
  • What sort of priorities are we setting for ourselves when we treat all items as equal, when clearly some types of materials get vastly more use by the public?
  • Finally, what kinds of analysis tools might we develop to gauge our overall metadata “health,” to steer projects, or ultimately to improve our systems for our end users’ needs?
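One way to make the “completeness” question concrete is a weighted score: each expected field carries a weight, and a record's score is the weighted share of fields that are actually filled in. The sketch below is a minimal illustration under stated assumptions, not the UNT system; the field names and weights in `FIELD_WEIGHTS` are hypothetical, and a real profile would come from the collection's own metadata guidelines.

```python
from typing import Dict, Iterable, Mapping

# Hypothetical completeness profile: expected fields and how much each matters.
FIELD_WEIGHTS: Dict[str, float] = {
    "title": 3.0,
    "creator": 2.0,
    "date": 2.0,
    "subject": 1.5,
    "description": 1.0,
    "rights": 0.5,
}


def is_filled(value) -> bool:
    """Treat None, blank strings and lists of blanks as missing."""
    if value is None:
        return False
    if isinstance(value, str):
        return value.strip() != ""
    if isinstance(value, (list, tuple, set)):
        return any(is_filled(v) for v in value)
    return True


def completeness(record: Mapping[str, object],
                 weights: Mapping[str, float] = FIELD_WEIGHTS) -> float:
    """Weighted fraction (0.0 to 1.0) of expected fields that are present."""
    total = sum(weights.values())
    present = sum(w for field, w in weights.items() if is_filled(record.get(field)))
    return present / total if total else 0.0


def collection_health(records: Iterable[Mapping[str, object]]) -> float:
    """Mean completeness across a batch of records: a crude 'health' gauge."""
    scores = [completeness(r) for r in records]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, a record with only `title`, `creator` and `date` filled earns 7 of the 10 weight points, a score of 0.7. Averaging scores per collection or per content type is one simple way to track metadata “health” over time.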

Most of our questions are still quite open-ended, and honestly we are just getting started down this road. But as digital collections grow and library budgets realign or shrink, it becomes increasingly important to back up our assertions and opinions with numbers and to find more efficient ways to work with the resources we have.


A Different Kind of Search: Query Analysis of Map Search

  • Zoe Chao, University of New Mexico (zoechao@unm.edu)
  • No previous Code4Lib presentation

Map searches are an increasingly important part of university and library websites. In 2012, the University of New Mexico (UNM) replaced its original PDF-based campus maps (http://iss.unm.edu/PCD/campus-map.html) with an interactive map search based on the free Google Maps API. In addition to basic map information such as streets and building outlines, we added search capabilities and categories for browsing (http://search.unm.edu/maps/). From November 2012 to September 2013, we logged about six thousand search instances on the campus map search. These data suggest that map searching is a fundamentally different kind of search for users, one that produces a large number of failed searches returning empty or misleading result sets.

In this presentation we will briefly describe the development and current implementation of the UNM map search and our collection of search query data. We will then discuss some surprising findings from the data analysis. For instance, a large number of map queries include specific room numbers, which indicates that some users expect the search to cover buildings' floor plans. This result suggests that we need to strip numbers from queries in order to return correct building locations. Finally, we will talk about the insight we gained from the data and our next steps toward data-driven interface design.
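The room-number finding lends itself to a simple pre-processing step before a query reaches the map search. The sketch below is a hedged illustration only: the `ROOM_WORDS` set and the room-number pattern are assumptions rather than UNM's actual rules, and a building whose name genuinely contains a number would need an exception list.

```python
import re

# Hypothetical heuristics: words that introduce a room, and tokens such as
# "101", "B12", "2045A" or "#3" that look like room numbers.
ROOM_WORDS = {"room", "rm", "rm.", "suite", "ste", "ste."}
ROOM_NUMBER = re.compile(r"^#?[a-z]?\d+[a-z]?$", re.IGNORECASE)


def strip_room_numbers(query: str) -> str:
    """Remove room words and room-number tokens so only the building is searched."""
    kept = [
        token for token in query.split()
        if token.lower() not in ROOM_WORDS and not ROOM_NUMBER.match(token)
    ]
    # If every token looked like a room number, fall back to the original query.
    return " ".join(kept) if kept else query.strip()
```

With these rules, `strip_room_numbers("Science Hall room B12")` yields `"Science Hall"`, while a bare `"2045A"` is passed through unchanged rather than becoming an empty search.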


More Like This: Approaches to Recommending Related Items using Subject Headings

  • Kevin Beswick, NCSU Libraries (kdbeswic@ncsu.edu)
    • No previous code4lib presentations

With a significant portion of the collection at our new Hunt Library being housed in an automated storage and retrieval system, several of us at NCSU Libraries have begun looking at ways to replace and improve upon the classic shelf browsing experience in order to make it easier for patrons to browse related materials. Our goal is to mimic popular services like Amazon and Netflix, which utilize recommendation engines to make it easy for users to find items similar to a particular item of interest. While there have been previous efforts in libraries to recreate this experience using circulation or call number data, we are currently investigating algorithms that focus on use of subject headings. Use of subject headings as an alternative can be particularly helpful in the case of electronic materials that do not always have call numbers or circulation data. In this talk, I will share:

  • Details of the proposed algorithms
  • How these algorithms were quickly and easily implemented using Solr.
  • Our evaluation process and its outcomes in terms of the effectiveness of the algorithms.
  • How this has impacted (or could impact) the presentation of recommended items in our discovery layer.
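The proposal does not spell out its algorithms, so the following is only a baseline for the general idea: rank other records by how many subject headings they share with the item of interest, here using Jaccard similarity (shared headings divided by all distinct headings across both records). It is plain Python for readability; in Solr, the MoreLikeThis component is one existing way to do term-overlap matching against a subject field, though the talk's own Solr implementation may differ.

```python
from typing import Dict, Iterable, List, Set, Tuple


def normalize_heading(heading: str) -> str:
    """Lower-case and collapse whitespace so trivially different strings match."""
    return " ".join(heading.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def more_like_this(target_id: str,
                   catalog: Dict[str, Iterable[str]],
                   limit: int = 5) -> List[Tuple[str, float]]:
    """Rank other records by subject-heading overlap with the target record."""
    sets = {rid: {normalize_heading(h) for h in hs} for rid, hs in catalog.items()}
    target = sets[target_id]
    scored = [
        (rid, jaccard(target, headings))
        for rid, headings in sets.items()
        if rid != target_id
    ]
    # Keep only records sharing at least one heading; break ties by id for stability.
    scored = [pair for pair in scored if pair[1] > 0]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:limit]
```

Splitting headings at their subdivisions (for example, treating `Libraries -- Automation` as also implying `Libraries`) would be a natural refinement that gives partial credit to broader matches, which matters for e-books that lack call numbers.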

Questioning Authority: building a ruby gem to facilitate UI interactions with varied controlled vocabularies

  • Mark Bussey, Data Curation Experts, mark@curationexperts.com

At a recent Hydra meeting, developers from five different institutions realized that they all needed to support various types of UI fields based on multiple internal and external authorities and controlled vocabularies. Their goal was to develop a tool that met these needs while minimizing custom coding for each vocabulary. During an intense three-day working session, they minted the initial release of the Questioning Authority gem.

The talk will cover both how cross-institutional development sped up the work and how the gem can be used to access external vocabularies like LCSH and LCNA as well as to present internal vocabulary lists. Although the developing institutions are all Hydra implementers, the gem itself has no Hydra dependencies and can be used in any Rails- or Blacklight-based application.


Building Hydra, a framework; a community

  • Justin Coyne, Project Hydra contributor / Data Curation Experts

More than just a repository, the Hydra Project is a community of cultural heritage institutions dedicated to pooling knowledge and resources. It is a completely open source project that has grown continuously for over 5 years. Within this vibrant community, a number of conventions and practices have emerged that we believe will benefit others attempting to cultivate support for their community-oriented projects. The Hydra Project is now a mature initiative which is producing shareable, reusable and customizable components as well as complete repository solutions. In a time of tight budgets and growing demand for improved systems, we believe that "the Hydra way" is the exemplary case in the library community for how to work across institutions to deliver high quality services to our patrons. This talk will cover both the technical and human processes that have sustained Hydra's continued development and growth.

From Ohloh.net's "In a Nutshell" summary, Project Hydra...

  • has had 8,364 commits made by 64 contributors representing 60,733 lines of code
  • has a codebase with a long source history maintained by a very large development team with stable Y-O-Y commits
  • took an estimated 15 years of effort (COCOMO model) starting with its first commit in October, 2009 ending with its most recent commit 7 days ago

jQuery XML Editor

Presenter: Ben Pennell, UNC Chapel Hill Libraries (bbpennel@email.unc.edu); no previous C4L presentations

The jquery.xmleditor is a portable jQuery widget developed by the University of North Carolina at Chapel Hill Libraries to simplify the description workflow for existing objects in our digital repository. It does so by adding context and structure informed by an underlying XML schema. More generally, it creates and modifies XML documents in your web browser.

It can be found here, including a live demo: [1]

Features include:

  • Graphical editor mode for displaying and modifying XML elements
  • Text editor mode for directly modifying the underlying document (using the Cloud9 editor)
  • Contextual, schema driven menus for adding new elements, subelements and attributes in both the graphical and text editing modes
  • Fully JavaScript- and CSS-based jQuery widget
  • AJAX submission of document modifications
  • Ability to export XML document to a file in web browsers that support it
  • Keyboard shortcuts for navigation and other operations
  • Standalone tool for building JSON representations of XML schemas

In our own implementation, the tool communicates with a Fedora-based, SWORD 2-enabled repository to receive the starting MODS document and to submit changes. But it's all XML in the end, and the editor includes options for exporting to a file or submitting to any endpoint that accepts XML.

This presentation will include an overview of the development process, technologies and issues involved, as well as a brief demonstration of the editor in use. It will also touch on the tool backing the editor which constructs JSON objects from schemas.

Visualizing Library Resources as Networks

  • Matt Miller, New York Public Library, NYPL Labs
    • No previous C4L presentations

Library resources are typically presented linearly in the form of a catalog search results page or an iterative list of subjects, books, special collections, etc. This talk explores the possibilities created when thinking of library resources as interconnected networks. We will look at the progress of a project to visualize NYPL resources such as catalog subject headings[1][2] as a network. We will also look at moving beyond visualizations into building network interfaces, such as our archival access term explorer[3] prototype.

[1] Catalog Subject Headings Visualization

[2] Time lapsed catalog network

[3] Archival access term explorer prototype.
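A common way to turn catalog records into a network is to treat each subject heading as a node and to connect two headings whenever they appear on the same record, weighting the edge by how often that happens. This is a hedged sketch of that general technique, not NYPL's pipeline; the resulting weighted edge list could then be handed to a graph layout or visualization tool.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Tuple


def cooccurrence_edges(records: Iterable[Iterable[str]]) -> Counter:
    """Count how often each pair of headings appears on the same record."""
    edges: Counter = Counter()
    for headings in records:
        unique = sorted(set(headings))          # ignore duplicates within a record
        for a, b in combinations(unique, 2):    # each unordered pair once, a < b
            edges[(a, b)] += 1
    return edges


def neighbors(edges: Counter, heading: str) -> List[Tuple[str, int]]:
    """Headings linked to `heading`, strongest connections first."""
    linked: Dict[str, int] = {}
    for (a, b), weight in edges.items():
        if a == heading:
            linked[b] = weight
        elif b == heading:
            linked[a] = weight
    return sorted(linked.items(), key=lambda kv: (-kv[1], kv[0]))
```

A `neighbors` lookup like this is also the basic operation behind an explorer-style interface: clicking a heading shows the headings most often catalogued alongside it.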

Island or Archipelago? Reducing Repository Redundancy at University of Toronto Libraries

  • Sara Allain, Special Collections Librarian, University of Toronto Scarborough
  • Kelli Babcock, Special Projects Librarian, University of Toronto Scarborough
  • No previous Code4Lib presentations

This session will address a big issue in library technology – the creation of redundant repositories across large, multi-library institutions. We will discuss an ongoing collaboration at the University of Toronto: the development of Collections UofT, an Islandora/Drupal instance intended to support the special collections projects of UofT's community, faculty members, and 44 libraries. We will look at:

  • Successful communication strategies imperative to fostering collaboration among project stakeholders
  • Complications caused by legacy repositories and varying metadata standards
  • Negotiating branding and usability requirements for disparate projects
  • Focused outreach to generate community buy-in
  • Defining the roles and responsibilities of the repository's community
  • Generating a proactive response to the above issues through documentation, issue reporting, and standardized Memoranda of Understanding

As the University of Toronto Libraries continue to facilitate and develop digital projects, it is vital that our systems be both centralized and flexible, able to meet the needs of various collaborators across a wide range of subject areas. Collections UofT is our first step towards a brighter digital future for special collections at the University of Toronto.


So You Think You Want to Be a DPLA Service Hub?: Building a Statewide Repository System for the Commonwealth

  • Steven Anderson, Boston Public Library (sanderson@bpl.org)
    • No previous presentations at national Code4Lib conferences (excluding one lightning talk in 2013)
  • Eben English, Boston Public Library (eenglish@bpl.org)
    • No previous presentations at national Code4Lib conferences

Built upon the Hydra stack, the Digital Commonwealth repository system houses a variety of digital content from over a dozen Massachusetts libraries. We also harvest metadata via OAI-PMH from many other institutions throughout the state, and this harvested metadata lives alongside hosted content in (relative) harmony. This talk will discuss the development of our repository, with an emphasis on the specialized use cases that are involved in creating a system to serve as a DPLA service hub.

As a DPLA hub, we have many contributing institutions using many different systems (Omeka, DSpace, CONTENTdm, Fedora/Hydra, etc.) with OAI feeds that we need to harvest from and convert into our data storage format. Come hear about our journey into the madness of what people can put into their metadata records and our data normalization strategies for adding this content to our system.

We'll also cover:

  • Inherited design structure: Each OAI source has its own metadata nuances, and creating a "single script to rule them all" is out of the question (even if the records use the same schema and/or come from the same system). It is, however, possible to use good object-oriented principles to first cover general cases and then adjust for each institution's metadata style. In addition, our system uses content models that inherit from more basic implementations that make dealing with various types of heterogeneous content in our system much less painful.
  • Interface design: How do you create an online metadata editor for the world's widest user base, from septuagenarian volunteers to academic librarians? How do you design a search interface that keeps content from a small historical society from getting lost in a sea of material contributed by statewide organizations? We've got answers.
  • Useful libraries and techniques: > 120. That's how many date formats our system currently supports when reading from an OAI feed. What libraries did we use to help parse that information? How are we generating thumbnails for various types of content when none are provided? We'll cover useful libraries and gems that make the hub developer's life worth living again.
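The “inherited design structure” point can be sketched as a base mapper that handles the general case and per-institution subclasses that override only what differs, here date parsing. Everything below is hypothetical and written in Python for illustration (the project itself is built on the Ruby-based Hydra stack): the field names, the `TownHistoricalSocietyMapper` class, and the handful of date formats stand in for the 120-plus formats the talk mentions.

```python
import re
from datetime import datetime
from typing import Dict, List, Optional, Tuple

Record = Dict[str, List[str]]  # harvested OAI fields, e.g. {"title": [...], "date": [...]}


class BaseOAIMapper:
    """General case: map a harvested record to a simple local schema."""

    # (input format, output format) pairs, tried in order. The output keeps
    # only the precision that the source actually supplied.
    DATE_FORMATS: List[Tuple[str, str]] = [
        ("%Y-%m-%d", "%Y-%m-%d"),
        ("%Y-%m", "%Y-%m"),
        ("%Y", "%Y"),
    ]

    def map_record(self, oai: Record) -> Dict[str, Optional[str]]:
        return {
            "title": self.first(oai, "title"),
            "date": self.normalize_date(self.first(oai, "date")),
        }

    @staticmethod
    def first(oai: Record, key: str) -> Optional[str]:
        """First non-blank value for a field, or None."""
        values = [v.strip() for v in oai.get(key, []) if v.strip()]
        return values[0] if values else None

    def normalize_date(self, raw: Optional[str]) -> Optional[str]:
        """Return an ISO 8601-style date string, or None if no format matches."""
        if raw is None:
            return None
        for in_fmt, out_fmt in self.DATE_FORMATS:
            try:
                return datetime.strptime(raw, in_fmt).strftime(out_fmt)
            except ValueError:
                continue
        return None


class TownHistoricalSocietyMapper(BaseOAIMapper):
    """Hypothetical contributor that writes dates like '05/01/1920' or 'ca. 1920'."""

    DATE_FORMATS = BaseOAIMapper.DATE_FORMATS + [("%m/%d/%Y", "%Y-%m-%d")]

    def normalize_date(self, raw: Optional[str]) -> Optional[str]:
        if raw is not None:
            # Drop a leading "circa" marker, then reuse the general logic.
            raw = re.sub(r"^(?:ca\.?|circa)\s*", "", raw.strip(), flags=re.IGNORECASE)
        return super().normalize_date(raw)
```

The base class never needs to know about one contributor's conventions; supporting another institution means adding another small subclass rather than editing a single ever-growing script.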

Getting a New Website Without Losing the Old One

  • Angie Ballard, NCSU Libraries, (aballard@ncsu.edu)
    • No previous Code4Lib presentations
  • Charlie Morris, NCSU Libraries, (cdmorris@ncsu.edu)
  • Erik Olson, NCSU Libraries, (eolson@ncsu.edu)
    • No previous Code4Lib presentations

The NCSU Libraries' last website redesign launched in August 2010. The stated goal then was to position our website and our organization for a future of evolving through more iterative changes and agile workflows. This year’s move to a responsively designed website carried out this approach: we made incremental changes that retrofitted the face of the existing desktop website to be responsive-ready while simultaneously developing a fully responsive Drupal theme.

Staff and end-users saw incremental changes, starting with flattening the visual design, followed by font and spacing changes, modularizing existing page elements, and finally new responsive headers, footers and page layouts. This approach allowed us to re-use large portions of existing code and to provide a more gradual shift for staff and end-users. The iterative design process also allowed for testing and internal evaluation along the way, and it highlighted IA and Content Strategy issues to be addressed in later projects.

We will talk about how scoping the project to these technical changes, while largely maintaining the existing site IA, content, and visual design elements, brought a number of advantages along with a few challenges.

Solr faceted title/call-number/heading browse with inline cross-references

  • Michael Gibney, University of Pennsylvania (mgibney@pobox.upenn.edu)
  • No previous presentations at national Code4Lib conferences

I would like to present an overview of recent development at the University of Pennsylvania library leveraging Solr/Lucene data structures to allow true browse (e.g. for Call Number, Title, Author, and Subject) with inline cross-references, over arbitrary subsets of records (as restricted by filters/facets/queries). Challenges addressed in development include:

  • 1. Providing for efficient normalized term sorting (with highly-configurable normalization) while preserving term case and formatting for term-centric display.
  • 2. Allowing record-centric display of results retrieved via term index (effectively allowing sorting on multi-valued fields). This point applies mainly to Call Number and Title browse.
  • 3. Inline display (with associated record counts) of cross-references for heading terms (as of Nov. 8, 2013, implemented only for Author browse using LC authority file as represented in VIAF, but designed to be readily extended to apply to subject headings, and multiple, query-time configurable authority schemes).
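Challenge 1 rests on keeping two forms of each term: a normalized key used only for ordering, and the original string used for display. The presented solution is a Java extension of Solr/Lucene's UnInvertedField; the Python sketch below is not that code and uses no Solr API. It only illustrates the idea for titles, with hypothetical normalization rules (accent folding, case folding, punctuation removal, leading articles). Call numbers would need a different, padding-aware normalization.

```python
import re
import unicodedata
from typing import Iterable, List

# Hypothetical normalization rules for title browse.
LEADING_ARTICLES = ("the ", "a ", "an ")


def sort_key(term: str) -> str:
    """Normalized key: strip accents, case, punctuation and a leading article."""
    decomposed = unicodedata.normalize("NFKD", term)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    key = re.sub(r"[^\w\s]", " ", no_accents.casefold())
    key = " ".join(key.split())
    for article in LEADING_ARTICLES:
        if key.startswith(article):
            key = key[len(article):]
            break
    return key


def browse(terms: Iterable[str], start: str, page_size: int = 5) -> List[str]:
    """Display terms in normalized order, beginning at or after `start`."""
    ordered = sorted(set(terms), key=lambda t: (sort_key(t), t))
    start_key = sort_key(start)
    return [t for t in ordered if sort_key(t) >= start_key][:page_size]
```

Browsing from `"great"` over `"The Great Gatsby"`, `"great expectations"` and `"A Tale of Two Cities"` returns them in that normalized order while each keeps its original case and leading article for display.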

The solution that will be presented is native to Solr/Lucene (an extension of UnInvertedField), and is related to the approach suggested by Jonathan Rochkind at: http://bibwild.wordpress.com/2010/06/05/note-to-self-more-ideas-for-browse-search-in-solr/. It is extremely lightweight, with its only dependencies already supplied by Solr/Lucene on the classpath. It is flexible and easily configured via Solr configuration files. Being related strictly to Solr/Lucene, it should be front-end agnostic and equally applicable in VuFind, Blacklight, or any other framework using a Solr backend.

The resulting functionality is in production at http://franklin.library.upenn.edu/. It is still under heavy development, and questions, comments and criticism are welcome. The source code has not yet been released as open source, but hopefully that will change in the near future.


Queue Programming -- how using job queues can make the Library coding world a better place

  • Birkin James Diana, Brown University (birkin_diana@brown.edu)
    • I've given one or two C4L 20-minute talks and a few lightning ones over the years

In 2007 we built a system that dumped certain user web requests for books into a database for offline processing triggered via cron. We wanted to make the magic happen live, but knew it would take too long. Thus we created, sort of accidentally, a kind of old-fashioned static procedural job queue.

Over the years we've been repeatedly impressed with how useful and robust this unintended architecture has been, and it fostered thinking about using real job queues in Library workflows.

Fast-forward to the present. We are now using _real_ job queueing, in production, for parts of the Brown Digital Repository. We've also used it for ingestion scripts, and plan to move lots more code to this architecture.

I'd like to share & show:

  • our lightweight rq/redis job queueing setup
  • how using job queues can speed up workflows via using multiple workers
  • how job queueing can make workflows more robust, especially by simplifying failure handling
  • a way we've smoothly avoided race conditions that can occur in concurrent programming
  • a technique for using task-processing job queues to simplify complex workflows

rq: http://python-rq.org

redis (python): https://pypi.python.org/pypi/redis/
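An rq/redis setup needs a running Redis server, so the sketch below uses Python's standard-library `queue` and `threading` modules as a stand-in to show the same shape: jobs are enqueued, several workers pull from the queue concurrently, and a failing job is recorded rather than bringing the whole run down. `ingest_item` and the `bdr:` id prefix are hypothetical. In rq itself, jobs are added with `Queue(connection=Redis()).enqueue(func, *args)` and processed by one or more `rq worker` processes, and failed jobs are kept aside so they can be inspected and retried.

```python
import queue
import threading
from typing import Callable, List, Tuple

Job = Tuple[Callable, tuple]


def run_jobs(jobs: List[Job], num_workers: int = 4):
    """Run jobs on several worker threads; collect results and failures separately."""
    tasks: "queue.Queue[Job]" = queue.Queue()
    results, failures = [], []
    lock = threading.Lock()  # guards the shared result lists

    for job in jobs:
        tasks.put(job)

    def worker() -> None:
        while True:
            try:
                func, args = tasks.get_nowait()
            except queue.Empty:
                return  # all jobs were enqueued up front, so empty means done
            try:
                value = func(*args)
                with lock:
                    results.append((func.__name__, args, value))
            except Exception as exc:  # record the failure; keep the worker alive
                with lock:
                    failures.append((func.__name__, args, repr(exc)))
            finally:
                tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, failures


def ingest_item(item_id: str) -> str:
    """Hypothetical ingest step that rejects malformed ids."""
    if not item_id.startswith("bdr:"):
        raise ValueError(f"bad id {item_id}")
    return f"ingested {item_id}"
```

Because each failure is captured with its arguments, the failed jobs can simply be re-enqueued after the underlying problem is fixed, which is much of what makes queue-based workflows easier to recover than one long script.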

How Can a new NISO Recommended Practice Help Me?

  • Nettie Lagace, Associate Director of Programs, National Information Standards Organization (NISO)
  • No previous C4L presentations (except for lightning talks in 2012 and 2013)

Two new NISO recommended practices are on their way to publication and, hopefully, uptake and adoption: a specification for Open Access Metadata and Indicators (OAMI) and a Protocol for Exchanging Serial Content (PESC). Who are the stakeholders and potential users of these? How are they expected to be applied? This presentation will cover specification and implementation details for these two community-developed recommendations and use them as examples of consensus standards completed on a short turnaround.

The NISO Open Access Metadata and Indicators recommendations are a mechanism for transmitting the access status of scholarly works: peer reviewed articles published in subscription and hybrid journals, material available in institutional repositories, or any other such applicable material. Clear information regarding re-use rights must be included in this communication; “open access” on its own may not convey potential downstream uses. In addition, embargoes often come into play regarding availability of material.

The NISO Protocol for Exchanging Serial Content attempts to address an entirely different conundrum: how can digital files which make up serial content (which may well include text and images or other associated data) be successfully transmitted from partner to partner while including metadata requirements for description and organization of content? This information is needed for those who archive and preserve content, as well as those who may aggregate it, index it, or convert it to other uses. As more serial content is shipped to disparate stakeholders for all manner of potential uses, a common protocol will prevent local reinvention of the wheel.

Standards are entities that users in many communities often love to hate (http://xkcd.com/927/), but when projects need to be completed in a timely, cost-effective way and when interoperability with other entities is key, (almost) everyone will look to see whether an existing standard or best practice can help them get started. In order for standards and best practices to gain acceptance and adoption, it is critical for their development process to involve as many potential stakeholders and eventual user communities as possible.

A reusable application to enable self deposit of complex objects into a digital preservation environment

  • Jill Sexton jill@email.unc.edu, UNC Chapel Hill Libraries
  • Mike Daines daines@email.unc.edu, UNC Chapel Hill Libraries
  • Greg Jansen count0@email.unc.edu, UNC Chapel Hill Libraries

Jill gave a lightning talk once, otherwise no previous C4L presentations

Patron-initiated ingest of complex, multi-part objects into digital preservation environments remains a challenging problem for many libraries. In this talk we discuss how we approached this problem at UNC Chapel Hill.

UNC Chapel Hill Libraries is the developer of the Curator’s Workbench (download: http://www2.lib.unc.edu/software/; GitHub repo: https://github.com/UNC-Libraries/Curators-Workbench/wiki), an open-source collections preparation and workflow tool for digital materials. In response to the demand for patron-initiated ingest into our preservation repository, we extended the functionality of the Workbench, creating a module that enables easy creation of web deposit forms suitable for varying content types. The forms use dictionary and crosswalk mapping components to map the input fields to the MODS schema. Form designs also include explanatory text and designation of required fields. The forms work in tandem with a server-side form-hosting application, which can be configured to put uploads and MODS records onto a filesystem, or to deposit materials into a repository via SWORD. The forms feature simplifies the creation of deposit forms, shifting form design from software developers to curators, who have greater familiarity with both the depositor community and with descriptive standards. We also shift metadata creation to the content creators, who have the most knowledge of submitted materials.
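The dictionary-and-crosswalk idea can be illustrated with a small mapping from form field names to MODS element paths, plus a required-field check. This is a hypothetical sketch, not the Curator's Workbench code: the field names in `CROSSWALK`, the `REQUIRED` set, and the chosen MODS elements are assumptions picked to resemble an MFA thesis form.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

# Hypothetical crosswalk: form field name -> path of MODS elements to create.
CROSSWALK: Dict[str, List[str]] = {
    "title": ["titleInfo", "title"],
    "artist": ["name", "namePart"],
    "abstract": ["abstract"],
    "keyword": ["subject", "topic"],
}
REQUIRED = {"title", "artist"}  # hypothetical required fields


def q(tag: str) -> str:
    """Qualify a tag name with the MODS namespace."""
    return f"{{{MODS_NS}}}{tag}"


def form_to_mods(form: Dict[str, str]) -> ET.Element:
    """Validate required fields, then build a MODS record from the form values."""
    missing = sorted(f for f in REQUIRED if not form.get(f, "").strip())
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    root = ET.Element(q("mods"))
    for field, path in CROSSWALK.items():
        value = form.get(field, "").strip()
        if not value:
            continue  # skip empty optional fields instead of writing empty elements
        parent = root
        for tag in path[:-1]:
            parent = ET.SubElement(parent, q(tag))
        ET.SubElement(parent, q(path[-1])).text = value
    return root
```

`ET.tostring(form_to_mods(form), encoding="unicode")` then serializes the record, which could be written to a filesystem alongside the uploads or packaged for a SWORD deposit, matching the two options described above.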

We will demonstrate how this process works for the submission of Studio Art MFA theses. These complex deposits consist of a narrative description of the artwork in addition to up to 20 video- or image-based files documenting the work, and associated metadata for each file. In addition to preserving MFA projects in a stable environment, this procedure gives graduate students greater control over the submission and description process and provides online access to MFA Art Theses and supporting works. Additionally, the project has invited discussions with MFA students about the preservation of their personal archives.

Our talk will address how these tools could work within other digital preservation environments.


Leveling Up: Migrating Multiple DSpace Repositories to a Multi-tenant Configuration.

  • Aaron Collier, Digital Repository Services Manager, Systemwide Digital Library Services, California State University (acollier@calstate.edu)
    • No previous presentations at national Code4Lib conferences.
  • Carmen Mitchell, Institutional Repository Manager, California State University San Marcos (cmitchell@csusm.edu)
    • No previous presentations at national Code4Lib conferences (excluding Ask Anything sessions, 2012 & 2013)

In 2007 the California State University system started a project to provide a hosted institutional repository system for its individual campuses using the DSpace repository system. With limited technical staffing dedicated to the project, the result was a single server hosting seventeen individual and separate instances (including Tomcat, databases and indexes). This led to resource instability and a lack of parity between versions, features and support. To overcome the shortcomings of this structure, a custom multi-tenant configuration was developed using the DSpace platform. This posed several technical challenges related to campus branding, authentication and deposit workflows.

During the development and testing of the multi-tenant structure of DSpace for the California State University system, constituent campuses continued to digitize works and create metadata in anticipation of a reliable system to insert these works. This created a situation where several campuses have created a lot of content and are looking for time saving measures for DSpace ingestion in order to continue work on the digitization projects. Development of a SWORD interface for bulk submission presented an attractive opportunity to provide a portal for bulk submission while avoiding the bottleneck of the provided method of FTP and DSpace scripting. Aaron Collier will talk about the technical challenges, and Carmen Mitchell will discuss the institutional needs: captioning, access copies vs display copies, workflow issues like batch uploading, embargoes, etc.

Curate Cloud: The role of cloud computing in expanding the impact of digital curation

Digital curation skills are a multidisciplinary and pressing need in public, academic and corporate environments (Yakel, 2007, p. 336). By 2018, the United States will have a shortage of 140,000 to 190,000 people with the deep analytical skills needed to manage large holdings of digital assets (Manyika et al., 2011). At the same time, our information organizations will increasingly rely on digital assets in making effective decisions (Ibid.). Despite advances in digital curation technologies, institutions create far more information than they curate, in large part due to a gap in skills and perceived financial and technical barriers to entry (Heidorn, 2008). These barriers can seem insurmountable for smaller and under-represented information and cultural heritage institutions. However, new cloud computing based digital curation technologies reduce many of the financial and technical barriers, so that the greatest remaining challenge is a need for updated skills and digital curation competencies.

Our information and cultural memory institutions require a new generation of professionals engaged in the preservation of digital resources and prepared to deploy curation tools that are not dependent on local technology infrastructure. To develop these competencies, Curate Cloud, a project led by Dr. Jimmy Lin at the University of Maryland, College Park, seeks to educate the next generation of information professionals using a curriculum-integrated, cloud-based virtual learning environment.

The environment, designed using Amazon Web Services infrastructure and deployed as a “zero-configuration” environment, lowers barriers to entry for students learning about new technologies and cultivates a new level of cloud-based IT literacy in these students. This project draws on the successes of similar programs and pushes further by developing and deploying a novel cloud-based, open source virtual research and learning environment (VRLE) that embraces the on-demand, self-service model of cloud computing and features cloud-based curation tools that will enable the exploration of digital curation across the education, library, archive, and museum (LIS/LAM) community.

The presentation will focus on the research findings from the use of the VRLE in Library and Information Science education arenas as well as the challenges and opportunities that relate to delivering complex IT instruction using cloud computing platforms. The codebase for the VRLE is available at https://github.com/mitcheet.

This project is supported by the Institute of Museum and Library Services and Amazon Web Services through the Amazon Educational Research program.

Resources

  • Heidorn, P. B. (2008). Shedding Light on the Dark Data in the Long Tail of Science. Library Trends, 57(2), 280–299. doi:10.1353/lib.0.0036.
  • Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Byers, A. (2011). Big data: The next frontier for innovation, competition, and productivity. McKinsey Global Institute.
  • Yakel, E. (2007). Digital curation. OCLC Systems & Services, 23(4), 335–340. doi:10.1108/10650750710831466

Creating a better web experience

  • Katie Bertel, bertelks@buffalostate.edu, SUNY Buffalo State
  • Chris Parana, paranacj@buffalostate.edu, SUNY Buffalo State
    • No previous presentations at Code4Lib

The web has become much more dynamic and interactive in recent times. Sites more closely resemble full-blown applications, rather than static information resources. We see an opportunity for libraries to adhere to the same design principles used by popular websites, to create a more intuitive and enjoyable user experience.

In our presentation, we will discuss the results from usability testing after a website redesign in 2012 (library.buffalostate.edu), our guiding design principles, and showcase some of our solutions that enhance user experience, such as responsive web design, unified searching (Knowledge Base, Summon, website documents), and transitional interfaces.

Frameworks can be exploited to significantly reduce the time needed to develop powerful and engaging web applications. For example, we can use motion and transitional interfaces to help convey the sense of “space” in web design.

The goal is to create an engaging experience that draws our users in. When this is achieved, it encourages usage and creates an enjoyable space that is not just a tool but also a place for discovery.

Responsive Web Design - A Paradigm Shift

  • Jenny Brandon, Web Designer/Librarian, Michigan State University Libraries (jbrandon@msu.edu)

No previous presentations at Code4Lib

Responsive web design (RWD) is the biggest paradigm shift in web design in the last decade. This presentation will begin with a brief overview of RWD, its elements, what types of frameworks are available, and why you should choose one. Examples of library websites that have already implemented RWD will be analyzed to compare and contrast design methods. The remainder of the presentation will provide details on the Michigan State University Libraries' implementation of responsive web design using the Drupal Omega theme, and the solutions adopted to transform an existing, fixed-width library website into a responsive design.

Topics include:

  • flexible grids
  • media queries
  • mobile first
  • images
  • design considerations
  • collaboration

The Smithsonian Transcription Center

Ching-hsien Wang, Branch Manager, Library and Archives Systems Innovations, Office of the Chief Information Officer, Smithsonian Institution

In 2013, the Smithsonian Institution - the largest library, archive, museum and research center complex in the world - launched transcription.si.edu, the first release of the Smithsonian's Digital Volunteers platform. With the ambitious goal to engage varied audiences, enrich collections and enable discovery in ways never before imagined, the Transcription Center enlists the "crowd" to transcribe millions of pages of handwritten documents from across the Institution's vast and diverse collections. We will share our goals, strategies, and experiences as contributors and developers of this collaborative initiative among librarians, archivists and museum curators. Design, workflows, user analytics, templates, and discoveries will be demonstrated and discussed for formats as varied as botanical specimen files, diaries, ledgers, field notebooks, letters, and photographs. We will also showcase the benefit of using open source technology in building our system architecture and we will share our technical challenges and lessons learned along the way.

Ching-hsien Wang has not presented at a Code4Lib conference before but has participated in presentations at other conferences.