2015 Prepared Talk Proposals

Code4lib 2015 is a loosely-structured conference that provides people working at the intersection of libraries/archives/museums/cultural heritage and technology with a chance to share ideas, be inspired, and forge collaborations. For more information about the Code4lib community, please visit http://code4lib.org/about/. The conference will be held at the Portland Hilton & Executive Tower in Portland, Oregon, from February 9-12, 2015.

Proposals for Prepared Talks:

We encourage everyone to propose a talk.

Prepared talks are 20 minutes (including setup and questions), and should focus on one or more of the following areas:

Projects you've worked on which incorporate innovative implementation of existing technologies and/or development of new software
Tools and technologies – How to get the most out of existing tools, standards and protocols (and ideas on how to make them better)
Technical issues – Big issues in library technology that should be addressed or better understood
Relevant non-technical issues – Concerns of interest to the Code4Lib community which are not strictly technical in nature, e.g. collaboration, diversity, organizational challenges, etc.

Proposals can be submitted through Friday, November 7, 2014 at 5pm PST (GMT−8). Voting will start on November 11, 2014 and continue through November 25, 2014. The URL to submit votes will be announced on the Code4Lib website and mailing list and will require an active code4lib.org account to participate. The final list of presentations will be announced in early- to mid-December.

Proposals for Prepared Talks:

Log in to the Code4lib wiki and edit this wiki page using the prescribed format. If you are not already registered, follow the instructions to do so. Provide a title and brief (500 words or fewer) description of your proposed talk. If you so choose, you may also indicate when, if ever, you have presented at a prior Code4Lib conference. This information is completely optional, but it may assist voters in opening the conference to new presenters.

Please follow the formatting guidelines:


== Talk Title: ==
 
* Speaker's name, email address, and (optional) affiliation
* Second speaker's name, email address, and affiliation, if second speaker

Abstract of no more than 500 words.

Talk Proposals

Refinery — An open source locally deployable web platform for the analysis of large document collections

Daeil Kim, The New York Times, daeil.kim@nytimes.com

Refinery is an open source web platform for the analysis of large unstructured document collections. It extracts meaningful semantic themes, also known as "topics", from documents; these can be thought of as word clouds composed of terms that frequently co-occur with one another. Once this semantic index is formed, one can extract relevant documents related to these topics and further refine their contents through a summarization process that allows users to search for phrases that are relevant to them within the corpus. The goal of Refinery is to make this whole process easier and to provide some of the latest scalable versions of these learning algorithms in an intuitive web-based interface. Refinery is also meant to be run locally, thus bypassing the need for securing document collections over the internet. The talk will go through some of the technologies involved and a demo of the app.

For more info check out http://www.docrefinery.org.

Drupal 8 — Evolution & Revolution

Cary Gordon, The Cherry Hill Company, cgordon@chillco.com

Drupal 8 is in beta and nearing release. Among its many features, it notably has become more developer friendly through its adoption of the Symfony PHP framework along with Symfony's outstanding set of libraries (like Guzzle) and tools (like Composer). And, in implementing the Twig theming system, it can begin to escape PHPTemplate. These moves also make it easier to create headless systems that use Angular.js and other systems for presentation, or even forgo presentation entirely.

From the site-builder's perspective, Drupal 8 provides a much smoother experience and makes it easier to build and implement site recipes.

Using GameSalad to Build a Gamified Information Literacy Mobile App for Higher Education

Stanislav 'Stan' Bogdanov, stan@stanrb.com, Adelphi University and Boglio LLC

GameSalad is a popular tool for developing mobile and desktop games with little actual programming. In this presentation, Stan Bogdanov breaks down the development process he followed while building mobiLit, a mobile app with the goal of being the first open-source gamified information literacy app to be used as part of a college-level information literacy curriculum. He will go through the basics of using GameSalad to create an app that can be easily customized by non-programmers and the instructional principles used to teach the material in a mobile medium. Stan will also go through two qualitative design studies he did on the app and discuss their results and the lessons learned from building mobiLit. The session will conclude with an overview of the next steps for the mobiLit project.

The Impossible Search: Pulling data from unknown sources

Riley Childs, no official affiliation (currently a Senior in High School at Charlotte United Christian Academy), rchilds (AT) cucawarriors.com

It's easy to search data you know the structure of, but what if you need to pull in data from sources that don't have a standard structure? The ability to search community events along with your standard catalog search results is an example, but often the only way to pull these events is through XML, JSON, (insert structured format here), or even just raw HTML. But how do you get that structure? That simple question is what makes this impossible. The process to define and process this structure takes a lot of manual labor, especially if the data you are pulling is just HTML, and then every time you add data to the index you have to run all the data through a script to pull in data in a format Solr or another index can use. This talk will focus on Solr, but the principles explained will apply to many other indexes.
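As a rough illustration of the kind of script described above, the following sketch pulls event text out of raw HTML and shapes it into dictionaries that an index could ingest. It is not the speaker's code: the `EventExtractor` class, the `html_to_docs` helper, the `class="event"` markup, and the `source_s`/`title_t` field names are all assumptions made for the example.

```python
from html.parser import HTMLParser


class EventExtractor(HTMLParser):
    """Collect the text of every <li class="event"> element (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._in_event = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "li" and ("class", "event") in attrs:
            self._in_event = True
            self.events.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_event = False

    def handle_data(self, data):
        # Nested tags such as <b> still contribute their text.
        if self._in_event:
            self.events[-1] += data


def html_to_docs(html, source):
    """Turn raw HTML into a list of index-ready documents (hypothetical fields)."""
    parser = EventExtractor()
    parser.feed(html)
    parser.close()
    return [
        {"id": f"{source}-{i}", "source_s": source, "title_t": text.strip()}
        for i, text in enumerate(parser.events)
    ]
```

Each new source needs its own extraction rule like the one in `handle_starttag`, which is exactly the manual labor the abstract refers to.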

What! You're Not Using Docker?

Cary Gordon, The Cherry Hill Company, cgordon@chillco.com

Boring part: Docker[1] is a container system that provides benefits similar to virtualization with only a fraction of the overhead. Scintillating part: Docker can host four to six times as many service instances as systems such as Xen or VMware on a given piece of hardware. But that's not all! Docker also makes it simple(r) to create transportable instances, so you can spin up development servers on your laptop.

[1]https://www.docker.com/

Video Accessibility, WebVTT, and Timed Text Track Tricks

Jason Ronallo, jronallo@gmail.com, NCSU Libraries

Video on the Web presents new challenges and opportunities. How do you make your video more accessible to those with various disabilities and needs? I'll show you how. This presentation will focus on how to write and deliver captions, subtitles, audio descriptions, and timed metadata tracks for Web video using the WebVTT W3C standard. Encoding timed text tracks in this way opens up opportunities for new functionality on your websites beyond accessibility. The presentation will show some examples of the potential for using timed text tracks in creative ways. I'll cover all the HTML and JavaScript you will need to know as well as some of the CSS and other bits you could probably do without but are too fun to pass up.
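For readers who have not seen the format, here is a minimal sketch of generating a WebVTT caption file. The `format_timestamp` and `build_webvtt` helpers and the sample cues are illustrative assumptions, not material from the talk.

```python
def format_timestamp(seconds: float) -> str:
    """Render a number of seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_webvtt(cues):
    """Build WebVTT text from (start_seconds, end_seconds, text) tuples."""
    blocks = ["WEBVTT"]
    for start, end, text in cues:
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{timing}\n{text}")
    # A blank line separates the header and each cue block.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    sample = [(0.0, 2.5, "Welcome to the library."), (2.5, 6.0, "[door closes]")]
    print(build_webvtt(sample))
```

A file produced this way can be attached to an HTML `<video>` element with a `<track>` child, for example one with `kind="captions"` and a `src` pointing at the `.vtt` file.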

Categorizing Records with Random Forests

Geoffrey Boushey, geoffrey.boushey@ucsf.edu, UCSF Library

Academic libraries are increasingly responsible for providing ingest, search, discovery, and analysis for data sets. Emerging techniques from data science and machine learning can provide librarians and developers with an opportunity to generate new insights and services from these document collections. This presentation will provide a brief overview of common machine learning classification techniques, then dive into a more detailed example using a random forest to assign keywords to research data sets. The talk will emphasize the insight that can be gained from machine learning rather than the inner workings of the algorithms. The overall goal of this presentation is to provide librarians and developers with the context to recognize an opportunity to apply machine learning categorization techniques at their home campuses and organizations.

Data Science in Libraries

Devon Smith, smithde@oclc.org, OCLC

Data Science is increasing in buzz and hype. I'll go over what it is, what it isn't, and how it fits in libraries.

PDF metadata extraction for academic literature

Kevin Savage, kevin.savage at mendeley.com, Mendeley
Joyce Stack, joyce.stack at mendeley.com, Mendeley

Mendeley recently added a "document from file" endpoint to its API which attempts to extract metadata such as title and authors directly from PDF files. This talk will describe at a high level the machine learning methods we used, including how we measured and tuned our model. We will then delve more deeply into our stack, the tools we used, some of the things that didn't work, and why PDFs are the worst thing ever to compute over.

Giving Users What They Want: Record Grouping in VuFind

Mark Noble, mark@marmot.org, Marmot Library Network

In 2013, Marmot did extensive usability studies with patrons to determine what was difficult in the catalog. Many patrons had problems sifting through all of the various formats and editions of a title. In 2014 we developed a method for grouping records so only a single work is shown in search results and all formats and editions are listed under that work. We will discuss our definition of a 'work' based on FRBR principles; combining metadata from MARC records with metadata from other sources like OverDrive; the technical details of Record Grouping; the design decisions made during implementation; and the reaction from users and staff.
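To make the idea of grouping concrete, here is a minimal sketch that collapses records sharing a normalized title and author into one work. This is not Marmot's algorithm, which the abstract does not describe; the `normalize`, `grouping_key`, and `group_records` helpers and the record fields are assumptions for illustration only.

```python
import re
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def grouping_key(record: dict) -> tuple:
    """Build a hypothetical work key from title and author."""
    return (normalize(record["title"]), normalize(record["author"]))


def group_records(records):
    """Map each work key to the list of formats available for that work."""
    works = defaultdict(list)
    for record in records:
        works[grouping_key(record)].append(record["format"])
    return dict(works)


if __name__ == "__main__":
    sample = [
        {"title": "Dune!", "author": "Herbert, Frank", "format": "Book"},
        {"title": "dune", "author": "Herbert Frank", "format": "eBook"},
    ]
    print(group_records(sample))
```

A real implementation would need a richer key (subtitles, translations, name variants), but the shape of the result, one entry per work with its formats listed beneath it, matches the search display described above.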

Topic Space: a mobile augmented reality recommendation app

Jim Hahn, jimhahn@illinois.edu, University of Illinois at Urbana-Champaign

The Topic Space module (http://minrvaproject.org/modules_topicspace.php) was developed with an IMLS Sparks! Grant to investigate augmented reality technologies for in-library recommendations. The funding allowed for sustained university community collaboration by the University Library, the Graduate School of Library and Information Science, as well as graduate student programmers sourced from the Department of Computer Science. Collaborators designed app functionality and identified relevant open source libraries that could power optical character recognition (OCR) functionality from within the mobile phone.

Topic Space allows a user to take a picture of an item's call number in the book stacks. The module will show the user other books that are relevant but that are not shelved nearby. It can also show users books that are normally shelved here but that are currently checked out. Recommendations are based on Library of Congress subject headings and ILS circulation data, which indicate recommendation candidates based on total check-outs.

Research questions included development of back end (server-side) pattern matching algorithms for recommendations, and a rapid formative evaluation of interface design that would provide optimal user experience for navigation of the book stacks as a context to recommendations.

Along with the Topic Space native app, grant collaborators prototyped web-based recommendations which could serve as a new way of providing readers' advisory and “more like this” recommendations from discovery interfaces accessed through desktop browsers. Outcomes of the grant include the availability of the Topic Space module within the Minrva app on the Android Play store and an experimental Backbone.js-based Topic Space web app.

Leveling Up Your Git Workflow

Megan Kudzia, moneill@albion.edu, Albion College Library
Kate Sears, eks11@albion.edu, Albion College Library

Have you started experimenting with Git on your own, but now you need to include others in your projects? Learn from our mistakes! Transitioning from a one-person git workflow and repo structure, to a structure that includes multiple people (including student workers), is not for the faint of heart. We'll talk about why we decided to work this way, our path to developing a git culture amongst ourselves, conceptual and technical difficulties we've faced, what we learned, and where we are now. Also with pretty pictures (aka workflow drawings).

Drone Loaning Program: Because Laptops are so last century

* Uche Enwesi, uenwesi@umd.edu, University of Maryland Libraries
* Francis Kayiwa, fkayiwa@umd.edu, University of Maryland Libraries

At the University of Maryland we are in the very early stages of looking into allowing our student body to get their hands on a drone. Yes, that's right: we will let students take out a drone for n hours to work on projects of their choosing. The talk will cover the logistics of getting a program of this sort from concept to "Is the drone available?". If people sign waivers, we will also promise not to crash the drone into code4lib attendees.

Got Git? Getting More Out of Your GitHub Repositories

* Terry Brady, twb27@georgetown.edu, Georgetown University Library

This presentation will discuss how librarians, developers, and system administrators at Georgetown University are maximizing their use of the public and private GitHub repositories.

In addition to all of the great benefits of using Git for code management, the GitHub interface provides a powerful set of tools to showcase a project and to keep your users informed of developments to your project. These tools can assist with marketing and outreach - turning your code repository into a focus of conversation!

* Style-able Project Pages
* Project Wikis
* Project Release Notes/Portfolios
* Web Resources That Can Be Directly Requested
* Gists for code sharing
* Private Repositories and Organizational Groups
* Pull Request Conversation Tracking
* Customized Issue management

Quick Wins for Every Department in the Library - File Analyzer!

* Terry Brady, twb27@georgetown.edu, Georgetown University Library

The Georgetown University Library has customized workflows for nearly every department in our library with a single code base.

* Analyzing MARC records for the Cataloging department
* Transferring ILS invoices for the University Account System for the Acquisitions department
* Delivering patron fines to the Bursar’s office for the Access Service department
* Summarizing student worker timesheet data for the Finance department
* Validating COUNTER compliant reports for the Electronic Resources department
* Generating ingest packages for the Digital Services department
* Validating checksums for the Preservation department

Learn how you can customize the File Analyzer to become a hero in your library!

The Geospatial World is Moving from Maps on the Web to Maps of the web. Libraries can too

Mita Williams, mita@uwindsor.ca, User Experience Librarian, University of Windsor

The transition from paper maps to digital ones changed much more than the maps themselves; it changed the very foundation of how we work and how we find each other. Now maps are transforming again. The Geospatial World is moving away from GIS systems that are institutionally focused, expensive, and feature-burdened, and that bind data into a complicated, demanding, and user-hostile interface. This transition from digital to web-based digital geospatial tools has brought growth and development in new forms of map-based investigative journalism, activism, scholarship, and business ventures. This talk will highlight the conditions and strategies that made these changes possible as a means to draw a path that librarians may follow through our own work, dragons notwithstanding.

Building Your Own Federated Search

Rich Trott, Richard.Trott@ucsf.edu, UC San Francisco

Advances in modern browsers have created some interesting possibilities for federated search. This presentation will cover common techniques and pitfalls in building a federated search. We will discuss what principles guided our decisions when implementing our own federated search. We will show tools we've built and our findings from building and using experimental prototypes.

Your higher education institution likely offers dozens of online resources for educators, students, researchers, and the public. And each of these online resources likely has its own search tool. But users can't be expected to search in dozens of different interfaces to find what they're looking for. A typical solution for this issue is federated search.
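The core pattern behind federated search, sending one query to several search tools and merging what comes back, can be sketched as follows. This is a server-side Python illustration rather than the browser-based approach the talk describes, and the `search_catalog` and `search_repository` backends, their result fields, and their scores are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor


def search_catalog(query):
    """Hypothetical stand-in for a catalog search backend."""
    return [{"source": "catalog", "title": f"{query} handbook", "score": 0.9}]


def search_repository(query):
    """Hypothetical stand-in for an institutional repository backend."""
    return [{"source": "repository", "title": f"{query} dataset", "score": 0.7}]


def federated_search(query, backends):
    """Query every backend concurrently and merge the results by score."""
    with ThreadPoolExecutor(max_workers=len(backends)) as pool:
        result_lists = list(pool.map(lambda backend: backend(query), backends))
    merged = [hit for hits in result_lists for hit in hits]
    return sorted(merged, key=lambda hit: hit["score"], reverse=True)


if __name__ == "__main__":
    for hit in federated_search("genomics", [search_catalog, search_repository]):
        print(hit["source"], hit["title"])
```

Sorting on a single `score` field is a simplification: relevance scores from independent systems are rarely comparable, which is one of the pitfalls a real federated search has to address.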

Indexing Linked Data with LDPath

Chris Beer, cabeer@stanford.edu, Stanford University Libraries

LDPath [1] is a simple query language for indexing linked open data, with support for caching, content negotiation, and integration with non-RDF endpoints. This talk will demonstrate the features and potential of the language and framework to index a resource with links into id.loc.gov, viaf.org, geonames.org, etc., to build an application-ready document.

[1] http://marmotta.apache.org/ldpath/language.html

Show Me the Money: Integrating an LMS with Payment Providers

Josh Weisman, Josh.Weisman@exlibrisgroup.com, Development Director-Resources Management, Ex Libris Group

In order to provide an easy and convenient way for patrons to pay fines, we are exploring ways to integrate the library management system with online payment providers such as PayPal. With many LMS systems being designed and developed for the cloud, we should be able to provide the frictionless user experience our patrons have come to expect from online transactions. In this session we'll discuss strategies for integration and review a sample application which uses REST APIs from a library management system to integrate with PayPal.

Shibboleth Federated Authentication for Library Applications

Scott Fisher, scott.fisher@ucop.edu, California Digital Library
Ken Weiss, ken.weiss@ucop.edu, California Digital Library

Shibboleth is the most widely-used method to provide single-sign-on authentication to academic applications where users come from many different institutions. Shibboleth, the InCommon education and research trust framework, and the SAML protocol comprise a very powerful - but very complicated - solution to this very complicated problem. Scott and Ken have implemented Shibboleth for multiple library applications. They will share their understanding of the good, the bad, and the underlying spaghetti that makes it all work. Ken will discuss some of the technical aspects of the solution, touching on optimal and non-optimal use cases, administrative challenges, and authorization concerns. Scott will describe the implementation pattern for multi-institution single-sign-on that the California Digital Library has evolved, using the recently released Dash application (http://dash.cdlib.org) as an example.

Scientific Data: A Needs Assessment Journey

Vicky Steeves, vsteeves@amnh.org, American Museum of Natural History

While surveying digital research and collections data in the research science divisions at the American Museum of Natural History in NYC (as a part of my National Digital Stewardship Residency project), I have come across the big data hogs (genome sequencing and CT scanning) and the little pieces of data (images, publications), all equally important, not only to scientific discovery but also as nodes in the history of science.

In this session, I will discuss the development of my needs assessment surveys for scientific datasets and the interview process with Museum curators and researchers as background, segueing into an explanation of the results. I will then combine my findings into preliminary selection criteria to choose tools for digital preservation and management unique to scientific datasets. This will broach a discussion on emerging standards, tools, and technologies in big data, specific to research science.

I will conclude with preliminary findings on emerging technology that can be used to answer concerns surrounding the management and digital preservation of these data. I am hoping the Q&A session can be used both to answer questions about my project and to function as a way for you (the larger tech-savvy library community) to discuss the tools I’ve touched on in this talk.