2015 Prepared Talk Proposals

Code4lib 2015 is a loosely-structured conference that provides people working at the intersection of libraries/archives/museums/cultural heritage and technology with a chance to share ideas, be inspired, and forge collaborations. For more information about the Code4lib community, please visit http://code4lib.org/about/. The conference will be held at the Portland Hilton & Executive Tower in Portland, Oregon, from February 9-12, 2015.

Proposals for Prepared Talks:

We encourage everyone to propose a talk.

Prepared talks are 20 minutes (including setup and questions), and should focus on one or more of the following areas:

Projects: work you've done that incorporates innovative implementation of existing technologies and/or development of new software
Tools and technologies: how to get the most out of existing tools, standards, and protocols (and ideas on how to make them better)
Technical issues: big issues in library technology that should be addressed or better understood
Relevant non-technical issues: concerns of interest to the Code4Lib community that are not strictly technical in nature, such as collaboration, diversity, and organizational challenges

Proposals can be submitted through Friday, November 7, 2014 at 5pm PST (GMT−8). Voting will start on November 11, 2014 and continue through November 25, 2014. The URL for voting will be announced on the Code4Lib website and mailing list, and voting will require an active code4lib.org account. The final list of presentations will be announced in early- to mid-December.

How to Submit a Proposal:

Log in to the Code4lib wiki and edit this wiki page using the prescribed format. If you are not already registered, follow the instructions to do so. Provide a title and brief (500 words or fewer) description of your proposed talk. If you so choose, you may also indicate when, if ever, you have presented at a prior Code4Lib conference. This information is completely optional, but it may assist voters in opening the conference to new presenters.

Please follow the formatting guidelines:


== Talk Title: ==
 
* Speaker's name, email address, and (optional) affiliation
* Second speaker's name, email address, and affiliation, if second speaker

Abstract of no more than 500 words.

Talk Proposals

Zines + Gamification = Awesomest Metadata Literacy Outreach Event Ever!

Jennifer Hecker, jenniferraehecker@gmail.com, University of Texas Libraries & Austin Fanzine Project
Lillian Karabaic, librarian@iprc.org, Independent Publishing Resource Center (Portland)

In academic libraries and elsewhere, zine collections (a zine being a magazine produced for love, not profit) are growing in popularity. At the same time, metadata literacy is becoming an increasingly important skill, helping people navigate and understand digital environments and interactions. We have found a way to teach metadata literacy to the general public that isn’t super-boring; in fact, we’ve made it downright fun!

First, volunteer zine librarian Lillian Karabaic of Portland’s Independent Publishing Resource Center facilitated the creation of a gamified cataloging interface for the IPRC’s annual Raiders of the Lost Archives backlog-busting 24-hour volunteer cataloging event.

Then, archivist Jennifer Hecker facilitated the adaptation of the IPRC’s game for use in a similar but also very different context: promoting UT Libraries’ newly acquired zine collections. The main goal of the academic-library-based event was to increase excitement around the collections, with the secondary goals of building metadata literacy and introducing an understanding of library cataloging issues.

The Texas modification also conforms to the xZINECOREx metadata schema developed by the national Zine Librarians Interest Group, and triggered interesting conversations with the Libraries’ cataloging department about evolving metadata standards and how to incorporate the products of crowd-sourcing projects into existing workflows.

Both games will be demoed.

Do the Semantic FRBRoo

Rosie Le Faive, rlefaive@upei.ca, University of Prince Edward Island

Islandora is great for creating repositories of any data type, but how can you model meaningful relationships between digital objects and use them to tell a story?

At UPEI, I’m assembling an ethnography of Prince Edward Island’s traditional fiddle music that includes musical clips, video clips, oral histories, musical notation, images, and ethnographic commentaries. In order to present an exhibition-style site, I’m tying these digital objects together via the people, places, events, tunes and topics that they share or describe.

To describe the relationships, I’m extending Islandora to use FRBRoo, a vocabulary that combines the FRBR model with CIDOC-CRM, the object-oriented museum documentation ontology. The modules under development will allow other researchers to create structured, navigable digital repositories of diverse object types that use Islandora as an exhibition platform.

Our $50,000 Problem: Why Library School?

Jennie Rose Halperin, jhalperin@mozilla.com, Mozilla Corporation

57 library schools in the United States are churning out approximately 100 graduates per year, many with debt upwards of $50,000. According to O*NET, 84% of library jobs in the US require an MLS. The library profession is 92% white and 82% female, and entry-level librarians can expect to make $32,500 per year.

Compared with those for developers, who are almost 90% male and can expect to make $70,000 in an entry-level position, these numbers are dismal.

According to a recent survey, the top skill that outgoing library students want to learn is “programming,” and yet many MLS programs still consider Microsoft Word an essential technology skill.

What is going on here? Why do we accept this fate, in which mostly female, debt-burdened professionals continue to be thrown into the workforce without the education their expensive degrees promised?

As a community, we need to come together to stop this cycle. We need to provide better support and mentorship to diversify the profession, keep it relevant, and help librarianship move into the future it deserves.

This talk will walk through the challenges of navigating a hostile employment environment and present models for better professional development and for imagining the profession's future.

No cataloging software? Need more than Dublin Core? No problem!: Experiences with CollectiveAccess

Sean Q. Hendricks, sqhendr@clemson.edu, Clemson University
Rachel Wittmann, rwittma@clemson.edu, Clemson University

Clemson University Libraries has implemented CollectiveAccess for its customized digital collection needs. CollectiveAccess is an open-source project that aims to provide a flexible way to manage and publish museum and archival collections. Several applications are associated with the project; the most widely used are Providence (for cataloging and entering metadata) and Pawtucket (for displaying a collection's objects to the public). Many ready-made installation profiles are available for existing library standards, such as Dublin Core, and a robust syntax lets you create your own profiles to fit custom-tailored metadata schemas. Plus, the user interface allows you to modify the metadata profile quickly and easily.

In this talk, we will discuss:

Our experiences with installing Providence and creating an installation profile that satisfies the needs of many of the Clemson Libraries digital archiving processes.
The stumbling blocks experienced in that process and how they were resolved.
The available plugins that draw on widely used authorities, such as Library of Congress thesauri and GeoNames.org, and how our projects have used them.
A brief overview of the export and import functions, as well as current workflow practices within Providence.
Future plans and the role of CollectiveAccess at Clemson University Libraries.

Getting CONTENTdm and WordPress to Play Together

Sean Q. Hendricks, sqhendr@clemson.edu, Clemson University

Clemson University Libraries has a very strong program for digitizing and archiving photographs, and the Digital Imaging team processes many hundreds of photographs every month. These images are managed using different methods, including CONTENTdm, a digital collection manager.

CONTENTdm provides various methods for searching and displaying photographs, along with their metadata. However, recent initiatives have created a need to feature those collections in exhibits displayed on other library-related websites, such as that of our Special Collections unit. Clemson Libraries has invested heavily in WordPress as our content management system of choice, and it seemed most efficient to avoid exporting images and re-importing them into our WordPress sites just to exhibit them.

Fortunately, CONTENTdm provides an API for many of its functions, allowing metadata and even rescaled images to be retrieved through URLs. This project is developing a WordPress plugin that integrates with CONTENTdm through shortcodes that WordPress editors can easily include in their content. These shortcodes let editors choose how many images to display, which images and collections to draw from, thumbnail sizes, and more, in a range of gallery styles. We plan to add integration with other plugins such as Fancybox and Masonry.

In this presentation, I will demonstrate the current state of the plugin and discuss future plans.

Refinery — An open source locally deployable web platform for the analysis of large document collections

Daeil Kim, The New York Times, daeil.kim@nytimes.com

Refinery is an open source web platform for the analysis of large unstructured document collections. It extracts meaningful semantic themes, also known as "topics," from documents; these can be thought of as word clouds composed of terms that frequently co-occur. Once this semantic index is formed, users can extract the documents related to these topics and further refine their contents through a summarization process that lets them search the corpus for phrases relevant to them. The goal of Refinery is to make this whole process easier and to offer some of the latest scalable versions of these learning algorithms in an intuitive web-based interface. Refinery is also meant to be run locally, so there is no need to secure document collections for transfer over the internet. The talk will cover some of the technologies involved and include a demo of the app.

For more info check out http://www.docrefinery.org.
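To make the idea of "terms that frequently co-occur" concrete, here is a minimal Python sketch that counts how many documents each pair of terms appears in together. It is not Refinery's code or algorithm: topic-modeling methods go well beyond raw pair counts, and the whitespace tokenizer and stopword handling here are simplifying assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs, stopwords=frozenset()):
    """Count how many documents each unordered pair of terms appears in together.

    Simplifying assumption: documents are tokenized by lowercasing and
    splitting on whitespace; a real pipeline would tokenize more carefully.
    """
    counts = Counter()
    for doc in docs:
        # A set means a term repeated within one document is counted once.
        terms = sorted({w for w in doc.lower().split() if w not in stopwords})
        # Sorting puts each pair in a canonical (alphabetical) order.
        counts.update(combinations(terms, 2))
    return counts


docs = ["fiddle tune dance", "fiddle tune", "dance hall"]
print(cooccurrence_counts(docs).most_common(1))  # [(('fiddle', 'tune'), 2)]
```

Pairs with high counts are the raw signal that a topic model organizes into coherent themes across a whole corpus.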

Drupal 8 — Evolution & Revolution

Cary Gordon, The Cherry Hill Company, cgordon@chillco.com

Drupal 8 is in beta and nearing release. Among its many features, it has notably become more developer-friendly through its adoption of components from the Symfony PHP framework, along with widely used PHP libraries (like Guzzle) and tools (like Composer). And, by implementing the Twig theming system, it can begin to escape PHPTemplate. These moves also make it easier to create headless systems that use Angular.js or other frameworks for presentation, or that forgo presentation entirely.

From the site builder's perspective, Drupal 8 provides a much smoother experience and makes it easier to build and implement site recipes.

Using GameSalad to Build a Gamified Information Literacy Mobile App for Higher Education

Stanislav 'Stan' Bogdanov, stan@stanrb.com, Adelphi University and Boglio LLC

GameSalad is a popular tool for developing mobile and desktop games with little actual programming. In this presentation, Stan Bogdanov breaks down the development process he followed while building mobiLit, a mobile app with the goal of being the first open-source gamified information literacy app to be used as part of a college-level information literacy curriculum. He will go through the basics of using GameSalad to create an app that can be easily customized by non-programmers and the instructional principles used to teach the material in a mobile medium. Stan will also go through two qualitative design studies he did on the app and discuss their results and the lessons learned from building mobiLit. The session will conclude with an overview of the next steps for the mobiLit project.

The Impossible Search: Pulling data from multiple unknown sources

Riley Childs, no official affiliation (currently a Senior in High School at Charlotte United Christian Academy), rchilds (AT) cucawarriors.com

It's easy to search data whose structure you know, but what if you need to pull in data from sources that don't have a standard structure? Searching community events alongside your standard catalog results is one example, but often the only way to pull in these events is through XML, JSON, (insert structured format here), or even raw HTML. But how do you get that structure? That simple question is what makes this impossible. Defining and processing this structure takes a lot of manual labor, especially if the data you are pulling is just HTML, and every time you add data to the index you have to run all of it through a script that converts it into a format Solr or another index can use. This talk will focus on Solr, but the principles explained will apply to many other indexes.
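The manual mapping step this abstract describes can be sketched in Python: each source gets its own small function that converts its native structure into one flat document shape an index can accept. The feed layouts, the field names (`id`, `title`, `date`, `source`), and the source names are hypothetical, and sending the documents to Solr is not shown.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical common document shape shared by every source.
FIELDS = ("id", "title", "date", "source")


def from_json_feed(text, source):
    """Map an assumed JSON layout: a list of {"id", "name", "start"} objects."""
    return [
        {
            "id": f"{source}:{item['id']}",
            "title": item.get("name", ""),
            "date": item.get("start", ""),
            "source": source,
        }
        for item in json.loads(text)
    ]


def from_xml_feed(text, source):
    """Map an assumed XML layout: <events><event id="..."><title/><when/></event></events>."""
    root = ET.fromstring(text)
    return [
        {
            "id": f"{source}:{event.get('id')}",
            "title": event.findtext("title", default=""),
            "date": event.findtext("when", default=""),
            "source": source,
        }
        for event in root.iter("event")
    ]


json_feed = '[{"id": 1, "name": "Story time", "start": "2015-02-10"}]'
xml_feed = '<events><event id="7"><title>Book sale</title><when>2015-02-11</when></event></events>'

docs = from_json_feed(json_feed, "parks") + from_xml_feed(xml_feed, "museum")
for doc in docs:
    assert set(doc) == set(FIELDS)  # every source now yields the same shape
    print(doc)
```

Each new source needs another mapper like these, which is the manual labor the talk describes; sources that only offer raw HTML would also need a scraping step before mapping.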

What! You're Not Using Docker?

Cary Gordon, The Cherry Hill Company, cgordon@chillco.com

Boring part: Docker[1] is a container system that provides benefits similar to virtualization with only a fraction of the overhead. Scintillating part: Docker can host four to six times as many service instances as systems such as Xen or VMware on a given piece of hardware. But that's not all! Docker also makes it simple(r) to create transportable instances, so you can spin up development servers on your laptop.

[1]https://www.docker.com/

Video Accessibility, WebVTT, and Timed Text Track Tricks

Jason Ronallo, jronallo@gmail.com, NCSU Libraries

Video on the Web presents new challenges and opportunities. How do you make your video more accessible to those with various disabilities and needs? I'll show you how. This presentation will focus on how to write and deliver captions, subtitles, audio descriptions, and timed metadata tracks for Web video using the WebVTT W3C standard. Encoding timed text tracks in this way opens up opportunities for new functionality on your websites beyond accessibility. The presentation will show some examples of the potential for using timed text tracks in creative ways. I'll cover all the HTML and JavaScript you will need to know as well as some of the CSS and other bits you could probably do without but are too fun to pass up.
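As a rough illustration of the file format involved, this Python sketch writes a minimal WebVTT caption file: the `WEBVTT` header, then cues made of a `start --> end` timing line and the caption text, separated by blank lines. The cue data and helper names are illustrative, and the sketch ignores optional features such as cue identifiers and cue settings.

```python
def format_timestamp(seconds):
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def to_webvtt(cues):
    """Build a WebVTT file body from (start_seconds, end_seconds, text) tuples.

    Assumes cue text contains no blank lines and no "-->" sequence,
    both of which would break cue parsing.
    """
    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line terminates each cue
    return "\n".join(lines)


print(to_webvtt([(1.0, 4.5, "[fiddle music]"), (4.5, 8.25, "Welcome to the library.")]))
```

The resulting file could then be attached to a video with a `<track kind="captions" src="captions.vtt" srclang="en" label="English">` element inside the `<video>` element; the file name is a placeholder.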

Categorizing Records with Random Forests

Geoffrey Boushey, geoffrey.boushey@ucsf.edu, UCSF Library

Academic libraries are increasingly responsible for providing ingest, search, discovery, and analysis for data sets. Emerging techniques from data science and machine learning can give librarians and developers an opportunity to generate new insights and services from these collections. This presentation will provide a brief overview of common machine learning classification techniques, then dive into a more detailed example that uses a random forest to assign keywords to research data sets. The talk will emphasize the insight that can be gained from machine learning rather than the inner workings of the algorithms. The overall goal of this presentation is to give librarians and developers the context to recognize opportunities to apply machine learning categorization techniques at their home campuses and organizations.

Data Science in Libraries

Devon Smith, smithde@oclc.org, OCLC

Data Science is increasing in buzz and hype. I'll go over what it is, what it isn't, and how it fits in libraries.

PDF metadata extraction for academic literature

Kevin Savage, kevin.savage at mendeley.com, Mendeley
Joyce Stack, joyce.stack at mendeley.com, Mendeley

Mendeley recently added a "document from file" endpoint to its API, which attempts to extract metadata such as title and authors directly from PDF files. This talk will describe at a high level the machine learning methods we used, including how we measured and tuned our model. We will then delve more deeply into our stack, the tools we used, some of the things that didn't work, and why PDFs are the worst thing ever to compute over.

Giving Users What They Want: Record Grouping in VuFind

Mark Noble, mark@marmot.org, Marmot Library Network

In 2013, Marmot did extensive usability studies with patrons to determine what was difficult in the catalog. Many patrons had problems sifting through all of the various formats and editions of a title. In 2014 we developed a method for grouping records so that only a single work is shown in search results, with all formats and editions listed under that work. We will discuss our definition of a 'work' based on FRBR principles; combining metadata from MARC records with metadata from other sources like OverDrive; the technical details of Record Grouping; the design decisions made during implementation; and the reaction from users and staff.
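A toy version of the grouping idea, in Python: build a normalized key from each record's title and author, and collect records that share a key into one work, listing their formats. This is an assumption-laden sketch, not Marmot's algorithm; the abstract describes a FRBR-based definition of a work, and a production key would likely weigh more fields and handle many more edge cases. The `work_key` rule and the sample records are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def work_key(record):
    """Hypothetical grouping key: normalized title (minus a leading article) plus author."""
    title = re.sub(r"^(the|a|an) ", "", normalize(record["title"]))
    return (title, normalize(record.get("author", "")))


def group_records(records):
    """Return a mapping from work key to the records (editions/formats) in that work."""
    works = defaultdict(list)
    for record in records:
        works[work_key(record)].append(record)
    return dict(works)


records = [
    {"title": "The Hobbit", "author": "Tolkien, J. R. R.", "format": "Book"},
    {"title": "Hobbit", "author": "Tolkien, J.R.R.", "format": "eBook (OverDrive)"},
    {"title": "The Hobbit", "author": "Tolkien, J. R. R.", "format": "Audiobook CD"},
]
for key, members in group_records(records).items():
    print(key, [r["format"] for r in members])
```

Here all three records collapse into a single work whose formats can be listed under one search result.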

Topic Space: a mobile augmented reality recommendation app

Jim Hahn, jimhahn@illinois.edu, University of Illinois at Urbana-Champaign

The Topic Space module (http://minrvaproject.org/modules_topicspace.php) was developed with an IMLS Sparks! grant to investigate augmented reality technologies for in-library recommendations. The funding allowed for sustained collaboration among the University Library, the Graduate School of Library and Information Science, and graduate student programmers recruited from the Department of Computer Science. Collaborators designed the app's functionality and identified relevant open source libraries that could power optical character recognition (OCR) on the mobile phone itself.

Topic Space allows a user to take a picture of an item's call number in the book stacks. The module then shows the user other relevant books that are not shelved nearby, as well as books normally shelved at that spot that are currently checked out. Recommendations are based on Library of Congress subject headings and on ILS circulation data, which identifies recommendation candidates by total checkouts.

Research questions included development of back end (server-side) pattern matching algorithms for recommendations, and a rapid formative evaluation of interface design that would provide optimal user experience for navigation of the book stacks as a context to recommendations.

Along with the Topic Space native app, grant collaborators prototyped web-based recommendations that could serve as a new way of providing readers' advisory and “more like this” recommendations from discovery interfaces accessed through desktop browsers. Outcomes of the grant include the availability of the Topic Space module within the Minrva app on the Google Play Store and an experimental Backbone.js-based Topic Space web app.
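The recommendation signals described for Topic Space (shared Library of Congress subject headings, with circulation counts) might be combined as in this Python sketch. The scoring rule, which ranks by number of shared headings and breaks ties by total checkouts, the in-memory catalog shape, and the call numbers are all hypothetical; the abstract does not specify Topic Space's actual algorithm, and the OCR step is omitted.

```python
def recommend(scanned_call_number, catalog, limit=3):
    """Rank other items by shared subject headings, then by total checkouts.

    catalog maps call number -> {"subjects": set of headings, "checkouts": int}.
    The scoring rule here is an illustrative assumption.
    """
    target_subjects = catalog[scanned_call_number]["subjects"]
    scored = []
    for call_number, item in catalog.items():
        if call_number == scanned_call_number:
            continue
        shared = len(target_subjects & item["subjects"])
        if shared:
            scored.append((shared, item["checkouts"], call_number))
    # Highest overlap first; more-circulated items win ties.
    scored.sort(key=lambda s: (s[0], s[1]), reverse=True)
    return [call_number for _, _, call_number in scored[:limit]]


# Hypothetical catalog entries.
catalog = {
    "ML3556 .A1": {"subjects": {"Fiddle tunes", "Folk music -- Prince Edward Island"}, "checkouts": 4},
    "ML3556 .B2": {"subjects": {"Fiddle tunes"}, "checkouts": 30},
    "GV1796 .C3": {"subjects": {"Folk dancing", "Fiddle tunes", "Folk music -- Prince Edward Island"}, "checkouts": 2},
    "QA76 .D4": {"subjects": {"Computer programming"}, "checkouts": 99},
}
print(recommend("ML3556 .A1", catalog))
```

Items with no overlapping headings are never suggested, however heavily they circulate; circulation only orders items that are already topically related.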

Leveling Up Your Git Workflow

Megan Kudzia, moneill@albion.edu, Albion College Library
Kate Sears, eks11@albion.edu, Albion College Library

Have you started experimenting with Git on your own, but now need to include others in your projects? Learn from our mistakes! Transitioning from a one-person Git workflow and repo structure to one that includes multiple people (including student workers) is not for the faint of heart. We'll talk about why we decided to work this way, how we developed a Git culture amongst ourselves, the conceptual and technical difficulties we've faced, what we learned, and where we are now. Also with pretty pictures (aka workflow drawings).

Drone Loaning Program: Because Laptops are so last century

* Uche Enwesi, uenwesi@umd.edu, University of Maryland Libraries
* Francis Kayiwa, fkayiwa@umd.edu, University of Maryland Libraries

At the University of Maryland we are in the very early stages of looking into allowing our student body to get their hands on a drone. Yes, that's right: we will let students check out a drone for n hours to work on projects of their choosing. This talk will cover the logistics of getting a program of this sort from concept to "Is the drone available?" If people sign waivers, we will also promise not to crash the drone into Code4lib attendees.

Got Git? Getting More Out of Your GitHub Repositories

* Terry Brady, twb27@georgetown.edu, Georgetown University Library

This presentation will discuss how librarians, developers, and system administrators at Georgetown University are maximizing their use of public and private GitHub repositories.

In addition to all of the great benefits of using Git for code management, the GitHub interface provides a powerful set of tools to showcase a project and keep your users informed of developments. These tools can assist with marketing and outreach, turning your code repository into a focus of conversation!

Style-able Project Pages
Project Wikis
Project Release Notes/Portfolios
Web Resources That Can Be Directly Requested
Gists for code sharing
Private Repositories and Organizational Groups
Pull Request Conversation Tracking
Customized Issue management

Quick Wins for Every Department in the Library - File Analyzer!

* Terry Brady, twb27@georgetown.edu, Georgetown University Library

The Georgetown University Library has customized workflows for nearly every department in our library with a single code base.

Analyzing MARC records for the Cataloging department
Transferring ILS invoices to the University Account System for the Acquisitions department
Delivering patron fines to the Bursar’s office for the Access Services department
Summarizing student worker timesheet data for the Finance department
Validating COUNTER-compliant reports for the Electronic Resources department
Generating ingest packages for the Digital Services department
Validating checksums for the Preservation department

Learn how you can customize the File Analyzer to become a hero in your library!
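As one concrete example of the tasks listed above, checksum validation can be sketched in Python: read a manifest of `<hex digest>  <file name>` lines (the layout written by tools such as `md5sum`), recompute each file's digest, and report missing or mismatched files. This is an illustrative sketch, not the File Analyzer's code; the manifest layout and the MD5 default are assumptions.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="md5", chunk_size=65536):
    """Hash a file in chunks so large preservation files need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_manifest(manifest_path, algorithm="md5"):
    """Return names of files listed in the manifest that are missing or fail validation.

    Assumed manifest layout: one '<hex digest>  <relative file name>' per line;
    a leading '*' on the name (md5sum's binary-mode marker) is ignored.
    """
    manifest = Path(manifest_path)
    failures = []
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.strip().lstrip("*")
        target = manifest.parent / name
        if not target.is_file() or file_digest(target, algorithm) != expected.lower():
            failures.append(name)
    return failures
```

An empty list means every listed file matched; a call might look like `validate_manifest("ingest/manifest-md5.txt")`, where the path is a placeholder.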

The Geospatial World is Moving from Maps on the Web to Maps of the Web. Libraries Can Too.

Mita Williams, mita@uwindsor.ca, User Experience Librarian, University of Windsor

The transition from paper maps to digital ones changed much more than the maps themselves; it changed the very foundation of how we work and how we find each other. Now maps are transforming again. The geospatial world is moving away from GIS systems that are institutionally focused, expensive, and feature-burdened, and that bind data into complicated, demanding, user-hostile interfaces. This transition to web-based geospatial tools has fostered growth and development in new forms of map-based investigative journalism, activism, scholarship, and business ventures. This talk will highlight the conditions and strategies that made these changes possible, as a way of charting a path that librarians may follow in our own work, dragons notwithstanding.

Building Your Own Federated Search

Rich Trott, Richard.Trott@ucsf.edu, UC San Francisco

Advances in modern browsers have created some interesting possibilities for federated search. This presentation will cover common techniques and pitfalls in building a federated search. We will discuss what principles guided our decisions when implementing our own federated search. We will show tools we've built and our findings from building and using experimental prototypes.

Your higher education institution likely offers dozens of online resources for educators, students, researchers, and the public. And each of these online resources likely has its own search tool. But users can't be expected to search in dozens of different interfaces to find what they're looking for. A typical solution for this issue is federated search.
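The core fan-out-and-merge pattern behind a federated search can be sketched in Python with `asyncio`: query every source concurrently, give each a time budget, and merge whatever comes back so that one slow or failing source does not hold up the rest. The talk's own approach builds on browser capabilities, so this server-side sketch is only an analogy; the sources are simulated stand-ins rather than real search APIs, and result ranking is not addressed.

```python
import asyncio


async def search_source(name, query, delay, titles):
    """Simulated backend: a real one would issue an HTTP request to a search API."""
    await asyncio.sleep(delay)  # stands in for network latency
    return [{"source": name, "title": t} for t in titles if query.lower() in t.lower()]


async def federated_search(query, sources, timeout=1.0):
    """Query all sources concurrently and merge results from those that answer in time."""
    tasks = [asyncio.wait_for(search_source(query=query, **src), timeout) for src in sources]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    merged = []
    for result in results:
        if isinstance(result, BaseException):
            continue  # timed out or failed; skip it rather than failing the whole search
        merged.extend(result)
    return merged


sources = [
    {"name": "catalog", "delay": 0.01, "titles": ["Python basics", "Java in depth"]},
    {"name": "repository", "delay": 0.02, "titles": ["Python for data curation"]},
    {"name": "slow-archive", "delay": 2.0, "titles": ["Python history"]},
]
print(asyncio.run(federated_search("python", sources, timeout=0.2)))
```

Here the simulated slow archive exceeds its budget and is dropped, while results from the two responsive sources are returned in source order.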

Indexing Linked Data with LDPath

Chris Beer, cabeer@stanford.edu, Stanford University Libraries

LDPath [1] is a simple query language for indexing linked open data, with support for caching, content negotiation, and integration with non-RDF endpoints. This talk will demonstrate the features and potential of the language and framework for indexing a resource with links into id.loc.gov, viaf.org, geonames.org, etc., to build an application-ready document.

[1] http://marmotta.apache.org/ldpath/language.html

Show Me the Money: Integrating an LMS with Payment Providers

Josh Weisman, Josh.Weisman@exlibrisgroup.com, Development Director-Resources Management, Ex Libris Group

In order to provide an easy and convenient way for patrons to pay fines, we are exploring ways to integrate the library management system with online payment providers such as PayPal. With many LMS systems being designed and developed for the cloud, we should be able to provide the frictionless user experience our patrons have come to expect from online transactions. In this session we'll discuss strategies for integration and review a sample application which uses REST APIs from a library management system to integrate with PayPal.

Shibboleth Federated Authentication for Library Applications

Scott Fisher, scott.fisher@ucop.edu, California Digital Library
Ken Weiss, ken.weiss@ucop.edu, California Digital Library

Shibboleth is the most widely used method for providing single sign-on authentication to academic applications whose users come from many different institutions. Shibboleth, the InCommon education and research trust framework, and the SAML protocol together make up a very powerful, but very complicated, solution to this very complicated problem. Scott and Ken have implemented Shibboleth for multiple library applications. They will share their understanding of the good, the bad, and the underlying spaghetti that makes it all work. Ken will discuss some of the technical aspects of the solution, touching on optimal and non-optimal use cases, administrative challenges, and authorization concerns. Scott will describe the implementation pattern for multi-institution single sign-on that the California Digital Library has evolved, using the recently released Dash application (http://dash.cdlib.org) as an example.

Scientific Data: A Needs Assessment Journey

Vicky Steeves, vsteeves@amnh.org, American Museum of Natural History

While surveying digital research and collections data in the research science divisions at the American Museum of Natural History in NYC (as part of my National Digital Stewardship Residency project), I have come across both the big data hogs (genome sequencing and CT scanning) and the little pieces of data (images, publications), all equally important, not only to scientific discovery but also as nodes in the history of science.

In this session, I will discuss the development of my needs assessment surveys for scientific datasets and the interview process with Museum curators and researchers as background, then move into an explanation of the results. I will then synthesize my findings into preliminary selection criteria for choosing digital preservation and management tools suited to scientific datasets. This will lead into a discussion of emerging standards, tools, and technologies in big data, specific to research science.

I will conclude with preliminary findings on emerging technology that can address concerns surrounding the management and digital preservation of these data. I hope the Q&A session can be used both to answer questions about my project and as a way for you (the larger tech-savvy library community) to discuss the tools I’ve touched on in this talk.

Feminist Human Computer Interaction (HCI) in Library Software

Bess Sadler, bess@stanford.edu, Stanford University Libraries

Libraries are not neutral repositories of knowledge. Library classification systems and search technologies tend to reflect the inequalities, biases, ethnocentrism, and power imbalances of the societies in which they are built [1]. How might we better resist these tendencies in the library software we create? This talk will examine some qualities of feminist HCI (pluralism, self-disclosure, participation, ecology, advocacy, and embodiment) [2] through the lens of library software.

[1] Olson, Hope A. (2002). The Power to Name: Locating the Limits of Subject Representation in Libraries. Dordrecht, The Netherlands: Kluwer Academic Publishers.

[2] Bardzell, Shaowen (2010). Feminist HCI: Taking Stock and Outlining an Agenda for Design. CHI 2010: HCI For All. http://dmrussell.net/CHI2010/docs/p1301.pdf

Heiðrún: DPLA's Metadata Harvesting, Mapping and Enhancement System

Audrey Altman, audrey at dp.la, Digital Public Library of America
Gretchen Gueguen, gretchen at dp.la, Digital Public Library of America
Mark Breedlove, mb at dp.la, Digital Public Library of America

The Digital Public Library of America aggregates metadata for over 8 million objects from more than 24 direct partners, or Hubs, using its Metadata Application Profile (MAP), an RDF application profile based on the Europeana Data Model. After working for a year with the initial system for harvesting, mapping and enhancing our Hubs’ metadata, we realized that it was inadequate for working with data at this scale. There were architectural issues; it was opaque to non-developer and partner staff; there were inadequate tools for quality assurance and analysis; and the system was unaware that it was working with RDF data. As the network of Hubs expanded and we ingested more metadata, it became harder and harder to know when or why a harvest, a mapping task, or an enrichment went wrong, largely because of those inadequate quality assurance tools.

The DPLA Content and Technology teams decided to develop a new system from the ground up to address these problems. Development of Heidrun, the internal version of the new system, started in October 2014. Heidrun’s goals are to make it easier for us to harvest metadata from various sources, in a variety of schemas, and map it to the DPLA MAP; to better enrich that metadata using external data sources; and to actively involve our partners in the ingestion process through access to better QA tools. Heidrun and its components are built on Ruby on Rails, Blacklight, and ActiveTriples. Our presentation will give some background on the design principles and processes used during development, the architecture of the system, and its functionality. We plan to release a version of Heidrun and its components as a generalized metadata aggregation system for use by DPLA Hubs and others working to aggregate cultural heritage metadata.

OS or GTFO: Program or Perish

Tessa Fallon, tessa.fallon@gmail.com

Description TBD

Creating Dynamic (and Cheap!) Digital Displays with HTML 5 Authoring Software

Chris Woodall, cmwoodall@salisbury.edu, Salisbury University Libraries

Would your library like to have large digital signage that displays dynamic information such as library hours, weather, room availability, and more? Have you looked into purchasing large digital signage, only to be turned off by the high price tag and lack of customization available with commercial solutions? Our library has developed a cheap and effective alternative to these systems using HTML 5 authoring software, a large TV, and freely-available APIs from Google, Springshare, and others. At this session, you’ll learn about the system that we have in place for displaying dynamic and easily-updatable information on our library’s large digital display, and how you can easily create something similar for your library.

REPOX: Metadata Blender

John Mignault, jmignault@metro.org, Empire State Digital Network

As the number of hubs providing metadata to the Digital Public Library of America grows, many of them are using REPOX, a tool originally created for the Europeana project, to aggregate disparate metadata feeds and transform them into formats suitable for ingest into DPLA. The Empire State Digital Network, the forthcoming DPLA service hub for New York State, is using it to prepare for our first ingest into DPLA in early 2015. We'll take a look at REPOX, its capabilities, and how it can be useful for ingesting and transforming metadata, and also discuss some things we've learned in massaging widely varied metadata feeds.

Beyond Open Source

Jason Casden, jmcasden@ncsu.edu, NCSU Libraries
Bret Davidson, bddavids@ncsu.edu, NCSU Libraries

The Code4Lib community has produced an increasingly impressive collection of open source software over the last decade, but much of this creative work remains out of reach for large portions of the library community. Do the relatively privileged institutions represented by a majority of Code4Lib participants have a professional responsibility to support the adoption of their innovations?

Drawing from old and new software packaging and distribution approaches (from freeware to Docker), we will propose extending the open source software values of collaboration and transparency to include the wide and affordable distribution of software. We believe this will not only simplify the process of sharing our applications within the Code4Lib community, but also make it possible for less well-resourced institutions to actually use our software. We will identify areas of need, present our experiences with the users of our own open source projects, discuss our attempts to go beyond open source, and make an argument for the internal value of supporting and encouraging a vibrant library ecosystem.

Making It Work: Problem Solving Using Open Source at a Small Academic Library

Adam Strohm, astrohm@iit.edu, Illinois Institute of Technology
Max King, mking9@iit.edu, Illinois Institute of Technology

The Illinois Institute of Technology campus was added to the National Register of Historic Places in 2005, and contains a building, Mies van der Rohe's S.R. Crown Hall, that was named a National Historic Landmark in 2001. Creating a digital resource that can adequately showcase the campus and its architecture is challenge enough in and of itself, but doing so as a two-person team of relative newcomers, at a university library without dedicated programmers on staff, ups the ante considerably. The challenges of technical know-how, staff time, and funding are nothing new to anyone working on digital projects at a university library, and they are amplified at a smaller institution. This talk covers the conception, development, and design of the campus map site we built, concentrating on the problem-solving strategies we developed to cope with limited technical and financial resources. We'll talk about our approach to development with open source software, including Omeka, along with the Neatline and SIMILE Timeline plugins. We'll also discuss the juggling act of designing for mobile mapping functionality without sacrificing desktop design, weighing the benefits of added functionality against the staff time needed to implement it, and the challenge of building a site that could be developed iteratively, with an eye towards future enhancement and sustainability. Finally, we’ll provide recommendations for other librarians at smaller institutions for their own efforts at digital development.

Recording Digitization History: Metadata Options for the Process History of Audiovisual Materials

Peggy Griesinger, peggy_griesinger@moma.org, Museum of Modern Art

The Museum of Modern Art has amassed a large collection of audiovisual materials over its many decades of existence. In order to preserve these materials, much of the audiovisual collection has been digitized. This is a complex process involving numerous steps and devices, and the methods used for digitization can have an effect on the quality of the file that is preserved. Therefore, knowing exactly how something was digitized is critical for future stewards of these objects to be able to properly care for and preserve them. However, detailed technical information about the processes involved in the digitization of audiovisual materials is not defined explicitly in most metadata schemas used for audiovisual materials. In order to record process history using existing metadata standards, some level of creativity is required to allow existing standards to express this information.

This talk will detail different metadata standards, including PBCore, PREMIS, and reVTMD, that can be implemented as methods of recording this information. Specifically, the talk will examine efforts to integrate this metadata into the Museum of Modern Art’s new digital repository, the DRMC. This talk will provide background on the DRMC as well as MoMA’s specific institutional needs for process history metadata, then discuss different metadata implementations we have considered to document process history.

Pig Kisses Elephant: Building Research Data Services for Web Archives

Jefferson Bailey, jefferson@archive.org, Internet Archive
Vinay Goel, vinay@archive.org, Internet Archive

More and more libraries and archives are creating web archiving programs. For both new and established programs, these archives can consist of hundreds of thousands, if not millions, of born-digital resources within a single collection; as such, they are ideally suited for large-scale computational study and analysis. Yet current access methods for web archives consist largely of browsing the archived web in the same manner as browsing the live web, and the size of these collections and the complexity of the WARC format can make aggregate analysis difficult. This talk will describe a project to create new ways for users and researchers to access and study web archives by offering extracted and post-processed datasets derived from web collections. Working with the 325+ institutions and their 2600+ collections within the Archive-It service, the Internet Archive is building methods to deliver a variety of datasets culled from collections of web content, including extracted metadata packaged in JSON, longitudinal link graph data, named entities, and other types of data. The talk will cover the technical details of building dataset production pipelines with Apache Pig, Hadoop, and tools like Stanford NER, the programmatic aspects of building data services for archives and researchers, and ongoing work to create new ways to access and study web archives.
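
The link graph datasets described above start from one simple step: pulling outbound links out of each archived page. As a single-machine illustration only (the project itself runs this kind of work in Apache Pig on Hadoop over WARC files, and this is not its code), a Python sketch using just the standard library might reduce one page's HTML to host-level link edges. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkExtractor(HTMLParser):
    """Collect absolute href targets from <a> tags in one archived page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's original (live-web) URL.
                self.links.append(urljoin(self.base_url, value))


def host_link_edges(page_url, html):
    """Reduce one page to a set of (source host, target host) edges."""
    parser = LinkExtractor(page_url)
    parser.feed(html)
    parser.close()
    source = urlsplit(page_url).netloc
    edges = set()
    for link in parser.links:
        parts = urlsplit(link)
        # Keep only web links; mailto:, javascript:, etc. are not graph edges.
        if parts.scheme in ("http", "https"):
            edges.add((source, parts.netloc))
    return edges
```

Counting these edges per capture date would give the kind of longitudinal, host-level link graph the talk describes; doing that across millions of captures is where Pig and Hadoop come in.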

Awesome Pi, LOL!

Matt Connolly, mconnolly@cornell.edu, Cornell University Library
Jennifer Colt, jrc88@cornell.edu, Cornell University Library

Inspired by Harvard Library Lab’s “Awesome Box” project, Cornell’s Library Outside the Library (LOL) group is piloting a more automated approach to letting our users tell us which materials they find particularly stunning. Armed with a Raspberry Pi, a barcode scanner, and some bits of kit that flash and glow, we have ventured into the foreign world of hardware development. This talk will discuss what it’s like for software developers and designers to get their hands dirty, how patrons are reacting to the Awesomizer, and LOL’s not-afraid-to-fail philosophy of experimentation.

You Gotta Keep 'em Separated: The Case for "Bento Box" Discovery Interfaces

Jason Thomale, jason.thomale@unt.edu, University of North Texas Libraries

I know, I know--proposing a talk about Resource Discovery is like, so 2010.

The thing is, practically all of us--in academic libraries at least--have a similar setup for discovery, with just a few variations, and so talking about it still seems useful. Stop me if this sounds familiar. You've got a single search box on the library homepage as a starting point for discovery. And it's probably a tabbed affair, with an option for searching the catalog for books, an option for searching a discovery service for articles, an option for searching databases, and maybe a few others. Maybe you have an option to search everything at once--probably the default, if you have it. And, if you're a crazy hepcat, maybe you only have your one search that searches everything, with no tabs.

Now, the question is, for your "everything" search, are you doing a combined list of results, or are you doing it bento-box style, with a short results list from each category displayed in its own compartment?

At UNT, we've been holding off on implementing an "everything" search, for various reasons. One reason is that the evidence for either style hasn't been very clear. There's this persistent paradox that we just can't reconcile: users tell us, through word and action, that they prefer searching Google, yet libraries aren't Google, and there are valid design reasons why we shouldn't try to oversimplify our discovery interfaces to be like Google. And there's user data that supports both sides.

Holding off on making this decision has granted us 2 years of data on how people use our tabbed search interface that does not include an "everything" search. Recently I conducted a thorough analysis of this data--specifically the usage and query data for our catalog and discovery system (Summon). And I think it helps make the case for a bento box style discovery interface. To be clear, it isn't exactly the smoking gun that I was hoping for, but the picture it paints I think is telling. At the very least, it points away from a combined-results approach.

I'm proposing a talk discussing the data we've collected, the trends we've seen, and what I think it all means--plus other reasons that we're jumping on the "bento box" discovery bandwagon and why I think "bento box" is at this point the path that least sells our souls.

Don’t know about you, but I’m feeling like SHA-2!: Checksumming with Taylor Swift

Ashley Blewer!, ashley.blewer@gmail.com

Checksum technology is used all over the place, from git commits to authenticating Linux packages. In the digital preservation field, it is most commonly used to monitor stored materials for changes that may occur over time, or to verify files as they are transmitted during duplication. But do you even checksum, bro? I want this talk to move checksums from a position of mysterious macho jargon to something everyone can understand and want to use. I think a lot of people have heard of checksums but don’t know where to begin when it comes to actually using them at their institution. And cryptography is hella intimidating! This talk will cover what checksums are, how they can be integrated into a library or archival workflow, protecting collections that require additional levels of security, the algorithms used to verify file fixity and how they differ, and other aspects of cryptographic technology. Oh, and please note that all points in this talk will be emphasized or lightly performed through Taylor Swift lyrics. Seriously, this talk will consist of at least 50% Taylor Swift. Can you, like, even?
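
For anyone wondering where to begin, a basic fixity check can be as small as the Python sketch below, which hashes a file in chunks with the standard `hashlib` module and compares the result with a previously recorded value. SHA-256 is one member of the SHA-2 family. The function names are hypothetical, and a real workflow would also need somewhere to keep the recorded checksums.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large media files never need to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, expected, algorithm="sha256"):
    """Return True if the file's current checksum matches the stored value."""
    # hexdigest() is lowercase, so normalize the stored value before comparing.
    return file_checksum(path, algorithm) == expected.lower()
```

Running `verify_fixity` on a schedule against the values recorded at ingest is the "monitor materials in storage" use case; running it after copying a file is the duplication use case.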

Level Up Your Coding with Code Club (yes, you can talk about it)

Coral Sheldon-Hess, coral@sheldon-hess.org

Reading code is a necessary part of becoming a better developer. It gives you more experience and more insight into How Things Are (or Aren't) Done; it builds your intuition about how to solve problems with code; and it increases your confidence that you, too, can tackle whatever technological problems you're facing.

But you don't have to read code alone! (Which is good. It's really not fun to read code alone.)

In late 2014, a group of librarians formed two Code Clubs, inspired by a talk by Saron (of Bloggytoons fame). I'd like to tell you about how we've structured our Code Clubs, what has gone well, what we've learned, and what you need to do to form your own Code Club. I'll share a list of the codebases we've looked at, too, to help you get your own Code Club off the ground!

The Growth of a Programmer

Joshua Gomez, Getty Research Institute, jgomez@getty.edu

Just like other creative workers, software developers can experience periods of great productivity or find themselves in a rut. After contemplating the alternating periods in my own career, I've noticed several factors that have affected my professional growth and happiness, including mentorship, structure, community, teamwork, environment, and formal education. Not all of the factors need to be present at all times, but some mixture of them is critical for continued growth. In this talk, I will articulate these factors, discuss how they can affect a developer's career, and suggest how they can be sought out when missing. This talk is aimed both at new developers looking to strike their own path and at the veterans who lead or mentor them.

Developing a Fedora 4.0 Content Model for Disk Images

Matthew Farrell, matthew.j.farrell@duke.edu, Duke University Libraries
Alexandra Chassanoff, achass@email.unc.edu, BitCurator Access Project Manager

As the acquisition of born-digital materials grows, institutions are seeking methods to facilitate easy ingest into their repositories and to provide access to disk images and to files derived or extracted from them. In this session, we describe our development of a Fedora 4.0 content model for disk images, including acceptable image file formats and the rationale behind those choices. We will also discuss efforts to integrate the disk image content model into the BitCurator Access environment. Unlike generalized, format-agnostic content models that might treat the disk image as a generic bitstream, a content model designed for disk images can express relationships among associated content in the collection, such as files extracted from images and other born-digital and digitized material associated with the same creator. It also enables capture of file-system attributes such as file paths, timestamps, and whether files are allocated or deleted. Finally, a disk image content model suggests further steps repositories can take to transform and re-use metadata generated during the creation and forensic analysis of the disk image.

Data acquisition and publishing tools in R

Scott Chamberlain, scott@ropensci.org, rOpenSci/UC Berkeley - first-time presenter

R is an open source programming environment that is widely used among researchers in many fields. R is powerful because it's free, increasingly robust, and facilitates reproducible research, an increasingly sought-after goal in academia. Although tools for data manipulation, visualization, and analysis are well developed in R, data acquisition and publishing tools are not. rOpenSci is a collaborative effort to create the tools necessary to complete the reproducible research workflow. This presentation discusses the need for these tools and gives examples, such as interacting with the repositories Mendeley, Dryad, DataONE, and Figshare. In addition, we are building tools for searching scholarly metadata and acquiring the full text of open access articles in a standardized way across metadata providers (e.g., Crossref, DataCite, DPLA) and publishers (e.g., PLOS, PeerJ, BMC, PubMed). Finally, we are building tools for reading and writing data in Ecological Metadata Language (EML).

SPLUNK: Log File Analysis

Jim LeFager, jlefager@depaul.edu, DePaul University Library

This past year, DePaul University Library took over monitoring and maintaining the library's EZproxy servers. Using Splunk, a machine data analysis tool, we are able to gather information and statistics on our electronic resource usage in addition to monitoring the servers. Splunk collects, analyzes, and visualizes log files and other machine data in real time, which has let us gather real-time usage statistics for our electronic resources and filter them by multiple facets, including IP range and group membership (student, faculty), so that we can see who is accessing our resources and from where. Splunk lets our library query our data and create rich custom dashboards, as well as alerts that are triggered when certain conditions are met (such as specific error codes) and send an email to a group of users. Going forward, we will use Splunk to monitor all library web applications. This talk will review setting up Splunk and best practices for using its features and customizations, including creating queries, alerts, and custom dashboards.
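
Splunk does this kind of work with its own search language, so the following is only a plain-Python illustration of the field extraction and faceting involved in splitting EZproxy traffic by IP range. It assumes the log is written in the NCSA common log format (`%h %l %u %t "%r" %s %b`), and the campus network range and function names are hypothetical placeholders.

```python
import ipaddress
import re
from collections import Counter

# Assumes NCSA common log format: %h %l %u %t "%r" %s %b
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+)'
)

# Hypothetical on-campus range; substitute your institution's real networks.
CAMPUS_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]


def location_of(ip):
    """Classify a client IP address as on-campus or off-campus."""
    addr = ipaddress.ip_address(ip)
    return "on-campus" if any(addr in net for net in CAMPUS_NETWORKS) else "off-campus"


def count_by_location(lines):
    """Tally successful (2xx/3xx) requests by location, skipping unparsable lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("status")[0] in "23":
            counts[location_of(match.group("ip"))] += 1
    return counts
```

In Splunk the equivalent facet would come from a search rather than a script; the sketch just shows what "filter by IP range" means at the level of individual log lines.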

Your code does not exist in a vacuum

Becky Yoose, yoosebec at grinnell dot edu, Grinnell College (Done a lightning talk, MC duties, but have not presented a prepared talk)

“If you have something to say, then say it in code…” - Sebastian Hammer, code4lib 2009

In its 10-year run, code4lib has covered the spectrum of libtech development, from search to repositories to interfaces. However, during this time there has been little discussion about this one little fact about development - code does not exist in a vacuum.

Like the comment above, code has something to say. A person’s or organization’s culture and beliefs influence code in all steps of the development cycle. Which development method you use, which tools, programming languages, and licenses - everything is interconnected with and influenced by the philosophies, economics, social structures, and cultural beliefs of the developer and their organization/community.

This talk will discuss these interconnections and influences when one develops code for libraries, focusing on several development practices (such as “Fail Fast, Fail Often” and Agile) and licensing choices (such as open source) that libtech has either tried to model or incorporate into mainstream libtech practices. It’ll only scratch the surface of the many influences present in libtech development, but it will give folks a starting point to further investigate these connections, both at their own organizations and as a community.

tl;dr - this will be a messy theoretical talk about technology and libraries. No shiny code slides, no live demos. You might come out of this talk feeling uncomfortable. Your code does not exist in a vacuum. Then again, you don’t exist in a vacuum either.

The Metadata Hopper: Mapping and Merging Metadata Standards for Simple, User-Friendly Access

Tracy Seneca, tjseneca@uic.edu, University of Illinois at Chicago
Esther Verreau, verreau1@uic.edu, University of Illinois at Chicago

The Chicago Collections Consortium: 15 institutions and growing! 8 distinct EAD standards! At least 3 permutations of MARC, and we lost count of the varieties of custom CONTENTdm image collections. Not to mention the 14,730 unique subject terms, nearly all of which lead our poor end-users to exactly one organization's content.

All large content aggregation projects have faced this challenge, and there are a few emerging tools to help us wrangle disparate metadata into new contexts. The Metadata Hopper is one such tool. The Metadata Hopper enables archivists to map their local metadata standards to standardized deposit records, and tags those materials using a shared vocabulary, integrating them into a user-friendly portal without disrupting local practices. In last year's Code4Lib lightning talk we described the challenges that the Chicago Collections Consortium faces in creating shared, in-depth access to archival and digital collections about Chicago history and culture across CCC member organizations. This year, thanks to the Andrew W. Mellon Foundation, we have a working Django application to demonstrate. In this talk we'll discuss the design that enables multiple layers of flexibility, from the ability to accept a variety of metadata standards to designing for an open source audience.
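
The core idea of mapping many local standards onto one deposit record can be illustrated with a small Python sketch. It is not the Metadata Hopper's Django code: the crosswalk tables, field names, and shared vocabulary entries are all hypothetical, and unmapped local fields are simply left behind in the source system so local practice is undisturbed.

```python
# Hypothetical crosswalks from each source format's field names to shared fields.
CROSSWALKS = {
    "ead": {"unittitle": "title", "unitdate": "date", "origination": "creator"},
    "contentdm": {"Title": "title", "Date": "date", "Creator": "creator", "Subject": "subjects"},
}

# Hypothetical mapping from local subject headings to shared vocabulary terms.
SHARED_VOCABULARY = {
    "Chicago (Ill.)--History": "Chicago History",
    "World's Columbian Exposition (1893 : Chicago, Ill.)": "World's Columbian Exposition",
}


def to_deposit_record(source_format, local_record, institution):
    """Map one local record onto shared deposit-record fields without altering it."""
    crosswalk = CROSSWALKS[source_format]
    record = {"institution": institution, "topics": []}
    for local_field, value in local_record.items():
        shared_field = crosswalk.get(local_field)
        if shared_field is None:
            continue  # unmapped local fields stay in the source system untouched
        if shared_field == "subjects":
            # Tag with shared-vocabulary terms; unmatched local headings are skipped.
            values = value if isinstance(value, list) else [value]
            record["topics"].extend(
                SHARED_VOCABULARY[s] for s in values if s in SHARED_VOCABULARY
            )
        else:
            record[shared_field] = value
    return record
```

Keeping the crosswalks as data rather than code is one way to let each member institution adjust its own mapping without touching the shared application.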

http://chicagocollectionsconsortium.org

Programmers are not projects: lessons learned from managing humans

Erin White, erwhite@vcu.edu, Virginia Commonwealth University - first-time presenter

Managing projects is one thing, but managing people is another. Whether we’re hired as managers or grow “organically” into management roles, sometimes technical people end up leading technical teams (gasp!). I’ll talk about lessons I’ve learned about hiring, retaining, and working long-term and day-to-day with highly tech-competent humans. I’ll also talk about navigating the politics of libraryland, juggling different types of projects, and working with constrained budgets to make good things and keep talented people engaged.

Practical Strategies for Picking Low-Hanging Fruits to Improve Your Library's Web Usability and UX

Bohyun Kim, bkim@hshsl.umaryland.edu, University of Maryland, Baltimore

Have you ever tried to fix an obvious (to you at least!) problem in Web usability or UX (user experience) only to face strong resistance from the library staff? Are you a strong advocate for making library resources, systems, services, and space as usable as possible, but do you often find yourself struggling to get the point across and/or obtain the crucial buy-in from colleagues and administrators?

There is no shortage of Web usability and UX guidelines. But applying them to a library and implementing desired changes often involve a long and slow process. To tackle this issue, this talk will focus on how to utilize the 'expert review' process (aka 'heuristic evaluation') as a preliminary or even preparatory step before embarking on more time-and-labor-intensive usability testing and user research. Several examples from simple fixes to more nuanced usability and UX issues in libraries will be discussed to your heart's content. The goal of this talk is to provide practical strategies for picking as many low-hanging fruits as possible to make a real (albeit small) difference to your library's Web usability and UX effectively and efficiently.

A Semantic Makeover for CMS Data

Bill Levay, wjlevay@gmail.com, Linked Jazz Project

How can we take semi-structured but messy metadata from a repository like CONTENTdm and transform it into rich linked data? Working with metadata from Tulane’s Hogan Jazz Archive Photography Collection, the Linked Jazz Project used OpenRefine and Python scripts to tease out proper names, match them with name authority URIs, and specify FOAF relationships between musicians who appear together in photographs. Additional RDF triples were created for any dates associated with the photos, and for those images with place information we employed GeoNames URIs. Historical images and data that were siloed can now interact with other datasets, like Linked Jazz’s rich set of names and personal relationships, and can be visualized [link to come] or otherwise presented on the web in any number of ways. I have not previously presented at a Code4Lib conference.
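
A Python sketch of the relationship step might look like the following, assuming the names in a photograph have already been matched to authority URIs. It writes N-Triples by hand to stay dependency-free. The example URIs are hypothetical, and using `foaf:knows` in both directions for people who appear together is an assumption made for illustration, not necessarily the property or modeling the Linked Jazz Project chose.

```python
from itertools import combinations

FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"


def co_appearance_triples(person_uris):
    """Emit N-Triples linking every pair of people who appear in the same photograph."""
    triples = []
    # Deduplicate and sort so the output is stable regardless of input order.
    for a, b in combinations(sorted(set(person_uris)), 2):
        # Co-appearance is treated as mutual, so each pair gets a triple in both directions.
        triples.append(f"<{a}> <{FOAF_KNOWS}> <{b}> .")
        triples.append(f"<{b}> <{FOAF_KNOWS}> <{a}> .")
    return triples
```

A photo with three identified musicians yields three pairs and therefore six triples; date and GeoNames place triples would be generated alongside these in the same way.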

Taking User Experience (UX) to new heights

Kayne Richens, kayne.richens@deakin.edu.au, Deakin University

User Experience, or "UX", is for more than just websites. At Deakin University Library we're exploring ways to improve the user experience inside our campus library spaces, by putting new technologies front and centre in the overall experience for our students. How are we doing this? We’re collaborating with the University's IT department and exploring the following Library-changing opportunities:

- Augmented Reality for Way-finding: We’re tackling that infamous problem no library seems to get right: way-finding. We're enhancing library tour information and way-finding experiences by introducing augmented reality solutions.

- Heat mapping the library with wi-fi: We’re using our existing wi-fi infrastructure to present "heat maps" of library space utilisation, allowing our users to easily locate the space that best suits their needs, whether it be busy spaces to collaborate, or quiet spaces to study. And by overlaying computer usage and group study room bookings, users can quickly locate the space they need.

- Video chat library service: We’re piloting video-conferencing facilities in our group study rooms and spaces, connecting users and librarians and other professionals.

This talk will look at how these different technologies will be brought together to provide improved user experiences, as well as some of the evidence and reasons that helped us identify our needs, so you can identify yours too.

How to Hack it as a Working Parent: or, Should Your Face be Bathed in the Blue Glow of a Phone at 2 AM?

Margaret Heller, Loyola University Chicago, mheller1@luc.edu
Christina Salazar, California State University Channel Islands, christina.salazar@csuci.edu
May Yan, Ryerson University, may.yan@ryerson.ca

Modern technology has made it easier than ever for parents employed in technical environments to keep up with work at all hours and in all locations. This makes it possible to work a flexible schedule, but also may lead to problems with work/life balance and furthering unreasonable expectations about working hours. Add to that shifting gender roles and limited paid parental leave in the United States and you have potential for burnout and a certainty for anxiety. It raises the additional question of whether the “always connected” mindset puts up a barrier to some populations who otherwise might be better represented in open source and library technology communities.

This presentation will address tools that are useful for working parents in technical library positions, and share some lessons learned about using these tools while maintaining a reasonable work/life balance. We will consider a question that Karen Coyle raised back in 1996: “What if the thousands of hours of graveyard shift amateur hacking wasn't really the best way to get the job done? That would be unthinkable.”

For those who are able to take an extended parental leave, we will present strategies for minimizing the impact on your career and your employer. Those (particularly in the United States) who can only take a short leave will need different strategies. Whatever the level of preparation, both cases are useful exercises in succession planning: reviewing workloads, cross-training personnel, hiring contract replacements, and dividing labor creatively all make for a stronger workplace and make it easier to work a flexible schedule in the future. Such preparation makes work better for everyone, kids or no kids.

Making your digital objects embeddable around the web

Jessie Keck, jkeck@stanford.edu, Stanford University Libraries
Jack Reed, pjreed@stanford.edu, Stanford University Libraries

As more and more content from our digital repositories makes its way into our discovery environments, we have realized that we keep reinventing the wheel when it comes to creating “viewers” for these digital objects. With many different types of viewers needed (books, images, audio, video, geospatial data, etc.), the burden of getting each of them into various environments (topic guides, blogs, catalogs, etc.) multiplies with every new viewer and every new environment.

In this talk we’ll discuss how Stanford University Libraries implemented an oEmbed service to create an extensible viewer framework for all of its digital content. This service has let us integrate viewers into various discovery applications with little effort, and it lets end users who discover our objects embed customized versions in their own websites and blogs.
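
For readers new to oEmbed: a consumer sends the provider a resource URL (plus optional `maxwidth` and `maxheight` limits), and the provider answers with JSON containing ready-to-embed HTML. A minimal provider-side sketch in Python, assuming a hypothetical viewer endpoint at `embed.example.edu` and fixed default dimensions, could build a `rich` response like this; it is not Stanford's implementation.

```python
import json
from urllib.parse import quote

# Hypothetical viewer endpoint and defaults; a real provider would use its own.
VIEWER_BASE = "https://embed.example.edu/viewer?url="
DEFAULT_WIDTH, DEFAULT_HEIGHT = 800, 600


def oembed_response(resource_url, maxwidth=None, maxheight=None):
    """Build an oEmbed 1.0 'rich' response that never exceeds the consumer's size limits."""
    width = min(DEFAULT_WIDTH, maxwidth) if maxwidth is not None else DEFAULT_WIDTH
    height = min(DEFAULT_HEIGHT, maxheight) if maxheight is not None else DEFAULT_HEIGHT
    # Percent-encode the whole resource URL so it is safe inside the iframe's query string.
    src = VIEWER_BASE + quote(resource_url, safe="")
    html = (f'<iframe src="{src}" width="{width}" height="{height}" '
            'frameborder="0" allowfullscreen></iframe>')
    return json.dumps({
        "version": "1.0",
        "type": "rich",  # 'rich' responses must carry html, width, and height
        "provider_name": "Example Library",
        "html": html,
        "width": width,
        "height": height,
    })
```

Because every viewer type sits behind the same response shape, a catalog, a topic guide, or a blog only needs to know how to consume oEmbed, not how each viewer works.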

So you want to make your geospatial data discoverable

Jack Reed, pjreed@stanford.edu, Stanford University Libraries

Finding data for research or coursework can be one of the most time-intensive tasks for a scholar or student. We introduce GeoBlacklight, an open source, multi-institutional software project focused on solving this common challenge at institutions across the world. GeoBlacklight prioritizes user experience, integrates with many GIS tools, and streamlines the use and organization of geospatial data. This talk will provide an introduction to the software, demonstrate current functionality, and provide a road map for future work.

Clueless-Driven Development: How I learned to migrate to Fedora 4

Adam Wead, awead@psu.edu, Penn State University

Recently I was tasked with migrating the content from our Fedora 3 repository to the new Fedora 4 repository architecture. Despite a wealth of community support, I had no idea how to approach, or even begin to solve, this problem. I knew I wanted to follow best practices and use test-driven development to build my solution, but had no idea where to start. Even so, I was able to start writing tests with only a vague understanding of the problem. As my tests exposed where my understanding of the problem was flawed, my code evolved, and within a week I had arrived at a working solution that exhibited all the hallmarks of good testing and software design.

This talk recounts the process I went through, from starting with practically nothing to arriving at a working solution. You can follow the rules of test-driven development while writing tests expressively, so that they describe the problem instead of just describing what the code should do. It was also essential to begin testing from an integration viewpoint rather than a unit one, because at the outset the units were unknown; they emerged only through further development. For the presentation, I will demonstrate using RSpec and Ruby. All the code examples will relate to the Hydra software stack; however, I hope to show that the processes at work are applicable in any context.

Designing and Leading a Kick A** Tech Team

Sibyl Schaefer, sschaefer@rockarch.org, Rockefeller Archive Center

New managers are often promoted without receiving management training, yet management is not something you just figure out. Being expected to know how to manage without being trained to do so often leaves new managers feeling isolated and unsure how to move from making to managing. In this talk I’ll focus on my own managerial experience of designing and leading an archival tech team in a small independent archives. Topics will include hiring, delegating, creating a team culture, and leading people whose specialized knowledge exceeds your own. The takeaways should be applicable to managers and employees at large and small institutions alike.

American (Archives) Horror Story: LTO Failure and Data Loss

Rebecca Fraimow, rebecca_fraimow@wgbh.org, NDSR Resident, WGBH
Casey Davis, casey_davis@wgbh.org, Project Manager, American Archive of Public Broadcasting, WGBH

Here’s a story to send shivers down archival spines: when transferring video files off LTO for the American Archive project, WGBH got an initial failure rate of 57%. After repeat tries, the rates improved; still, an unnervingly large percentage of files could never be transferred successfully. Even more unnerving, going public with our horror story got a big response from other archives using LTO -- it seems like many institutions are having similarly scary results. What are the real risks with LTO tape? Are there steps that archives should be taking to better mitigate those risks? This presentation will share information about LTO storage failures across the archives world and discuss how WGBH investigated the problem: testing different methods of data retrieval from LTO (direct and networked downloads, individual file retrieval and bulk data dump, use of LTO 4 and LTO 6 decks) and using checksum comparisons and file analysis and characterization tools such as ffprobe, mediainfo, and exiftool to analyze failed files. We'll also present whatever results we’ve managed to turn up by the time of Code4Lib!
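
The checksum comparison step can be sketched in Python as a comparison of two manifests: one recorded before files went to tape and one generated after retrieval. The sketch assumes manifests in the `<checksum>  <filename>` layout written by tools such as `md5sum` or `sha256sum`. The function names are hypothetical, and the sketch only flags which files failed, leaving deeper analysis to tools like ffprobe, mediainfo, and exiftool.

```python
def parse_manifest(text):
    """Parse '<checksum>  <filename>' lines (a leading '*' marks binary mode)."""
    manifest = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        checksum, name = line.split(None, 1)
        manifest[name.lstrip("*")] = checksum
    return manifest


def compare_manifests(expected, retrieved):
    """Compare {filename: checksum} maps from before storage and after retrieval."""
    missing = sorted(set(expected) - set(retrieved))
    unexpected = sorted(set(retrieved) - set(expected))
    # Hex digests may differ only in case between tools, so compare case-insensitively.
    corrupted = sorted(
        name for name in set(expected) & set(retrieved)
        if expected[name].lower() != retrieved[name].lower()
    )
    return {"missing": missing, "corrupted": corrupted, "unexpected": unexpected}


def failure_rate(expected, retrieved):
    """Fraction of expected files that did not come back intact."""
    report = compare_manifests(expected, retrieved)
    failed = len(report["missing"]) + len(report["corrupted"])
    return failed / len(expected) if expected else 0.0
```

Running the same comparison for each retrieval method (direct versus networked, single file versus bulk, LTO 4 versus LTO 6 deck) gives directly comparable failure rates.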

PBCore in Action: Three Words, Not Two!

Casey E. Davis, casey_davis@wgbh.org, Project Manager, American Archive of Public Broadcasting, WGBH
Andrew (Drew) Myers, andrew_myers@wgbh.org, Supervising Developer, WGBH

In 2001, public media representatives developed the PBCore XML schema to establish a common language for managing metadata about their analog and digital audio and video. Since then, PBCore has been adopted by a number of organizations and archivists in the moving image archival community. The schema has also undergone a few revisions, but on more than one occasion it was left orphaned and with little to no support.

Times have changed. You may have heard the news that PBCore is back in action as part of the American Archive of Public Broadcasting initiative and via the Association of Moving Image Archivists (AMIA) PBCore Advisory Subcommittee. A group of archivists, public media stakeholders, and engaged users have come together to provide necessary support for the standard and to see to its further development.

At this session, I'll discuss the scope and uses of PBCore in digital preservation and access, report on the progress and goals of the PBCore Advisory Subcommittee, and share how the group (by the time of the conference) will have transformed the XML schema into an RDF ontology, bringing PBCore into the second decade of the 21st century. #PBHardcore