2013 talks proposals

Deadline for talk submission is Friday, November 2 at 5pm PT. We ask that no changes be made after this point, so that every voter reads the same thing. You can update your description again after voting closes.

Prepared talks are 20 minutes (including setup and questions), and focus on one or more of the following areas:

tools (some cool new software, software library or integration platform)
specs (how to get the most out of some protocols, or proposals for new ones)
challenges (one or more big problems we should collectively address)

The community will vote on proposals using the criteria of:

usefulness
newness
geekiness
uniqueness
awesomeness

Please follow the formatting guidelines:

== Talk Title ==
 
* Speaker's name, affiliation, and email address
* Second speaker's name, affiliation, email address, if applicable

Abstract of no more than 500 words.

1 Modernizing VuFind with Zend Framework 2
2 Did You Really Say That Out Loud? Tools and Techniques for Safe Public WiFi Computing
3 Drupal 8 Preview — Symfony and Twig
4 Neat! But How Do We Do It? - The Real-world Problem of Digitizing Complex Corporate Digital Objects
5 ResCarta Tools building a standard format for audio archiving, discovery and display
6 Format Designation in MARC Records: A Trip Down the Rabbit-Hole
7 Touch Kiosk 2: Piezoelectric Boogaloo
8 Wayfinding in a Cloud: Location Service for libraries
9 Empowering Collection Owners with Automated Bulk Ingest Tools for DSpace
10 Quality Assurance Reports for DSpace Collections
11 A Hybrid Solution for Improving Single Sign-On to a Proxy Service with Squid and EZproxy through Shibboleth and ExLibris’ Aleph X-Server
12 HTML5 Video Now!
13 Hybrid Archival Collections Using Blacklight and Hydra
14 Making the Web Accessible through Solid Design
15 Getting People to What They Need Fast! A Wayfinding Tool to Locate Books & Much More
16 De-sucking the Library User Experience
17 Solr Testing Is Easy with Rspec-Solr Gem
18 Northwestern's Digital Image Library
19 Two standards in a software (to say nothing of Normarc)
20 Future Friendly Web Design for Libraries
21 BYU's discovery layer service aggregator
22 The Avalon Media System: A Next Generation Hydra Head For Audio and Video Delivery
23 The DH Curation Guide: Building a Community Resource
24 Solr Update
25 Reports for the People
26 Network Analyses of Library Catalog Data
27 Pitfall! Working with Legacy Born Digital Materials in Special Collections

Modernizing VuFind with Zend Framework 2

Demian Katz, Villanova University, demian DOT katz AT villanova DOT edu

When setting goals for a new major release of VuFind, use of an existing web framework was an important decision to encourage standardization and avoid reinvention of the wheel. Zend Framework 2 was selected as providing the best balance between the cutting-edge (ZF2 was released in 2012) and stability (ZF1 has a long history and many adopters). This talk will examine some of the architecture and features of the new framework and discuss how it has been used to improve the VuFind project.

Did You Really Say That Out Loud? Tools and Techniques for Safe Public WiFi Computing

Peter Murray, LYRASIS, Peter.Murray@lyrasis.org

Public WiFi networks, even those that have passwords, are nothing more than an old-time party line: whatever you say can be easily heard by anyone nearby. Remember Firesheep? It was an extension to Firefox that demonstrated how easy it was to snag session cookies and impersonate someone else. So what are you sending out over the airwaves, and what techniques are available to prevent eavesdropping? This talk will demonstrate tools and techniques for desktop and mobile operating systems that you should be using right now -- right here at Code4Lib -- to protect your data and your network activity.

Drupal 8 Preview — Symfony and Twig

Cary Gordon, The Cherry Hill Company, cgordon@chillco.com

Drupal is a great platform for building web applications. Last year, the core developers decided to adopt the Symfony PHP framework, because it would lay the groundwork for the modernization (and de-PHP4ification) of the Drupal codebase. As I write this, the Symfony ClassLoader and HttpFoundation libraries are committed to Drupal core, with more elements likely before Drupal 8 code freeze.

It seems almost certain that the Twig templating engine will supplant PHPtemplate as the core Drupal template engine. Twig is a powerful, secure theme building tool that removes PHP from the templating system, the result being a very concise and powerful theme layer.

Symfony and Twig have a common creator, Fabien Potencier, whose overall goal is to rid the world of the excesses of PHP 4.

Neat! But How Do We Do It? - The Real-world Problem of Digitizing Complex Corporate Digital Objects

Matthew Mariner, University of Colorado Denver, Auraria Library, matthew.mariner@ucdenver.edu

Isn't it neat when you discover that you are the steward of dozens of Sanborn Fire Insurance Maps, hundreds of issues of a city directory, and thousands of photographs of persons in either aforementioned medium? And it's even cooler when you decide, "Let's digitize these together and make them one big awesome project to support public urban history"? Unfortunately it's a far more difficult process than one imagines at inception and, sadly, doesn't always come to fruition. My goal here is to discuss the technological (and philosophical) problems librarians and archivists face when trying to create ultra-rich complex corporate digital projects, or, rather, projects consisting of at least three facets interrelated by theme. I intend to address these problems by suggesting management solutions, web workarounds, and, perhaps, a philosophy that might help in determining whether to even move forward or not. Expect a few case studies of "grand ideas crushed by technological limitations" and "projects on the right track" to follow.

ResCarta Tools building a standard format for audio archiving, discovery and display

John Sarnowski, The ResCarta Foundation, john.sarnowski@rescarta.org

The free ResCarta Toolkit has been used by libraries and archives around the world to host city directories, newspapers, and historic photographs and by aerospace companies to search and find millions of engineering documents. Now the ResCarta team has released audio additions to the toolkit.

Create full text searchable oral histories, news stories, and interviews, or build an archive of lectures; all done to Library of Congress standards. The included transcription editor allows for accurate correction of the data conversion tool’s output. Build true archives of text, photos and audio. A single audio file carries the embedded Axml metadata, transcription, and word location information. Checks with the FADGI BWF Metaedit.

ResCarta-Web presents your audio to IE, Chrome, Firefox, Safari, and Opera browsers with full playback and word search capability. Display format is OGG!!

You have to see this tool in action. Twenty minutes from an audio file to transcribed, text-searchable website. Be there or be L seven (Yeah, I’m that old)

Format Designation in MARC Records: A Trip Down the Rabbit-Hole

Michael Doran, University of Texas at Arlington, doran@uta.edu

This presentation will use a seemingly simple data point, the "format" of the item being described, to illustrate some of the complexities and challenges inherent in the parsing of MARC records. I will talk about abstract vs. concrete forms; format designation in the Leader, 006, 007, and 008 fixed fields as well as the 245 and 300 variable fields; pseudo-formats; what is mandatory vs. optional in respect to format designation in cataloging practice; and the differences between cataloging theory and practice as observed via format-related data mining of a mid-size academic library collection.
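As a rough illustration of why even the Leader alone is not a complete answer, here is a minimal Ruby sketch that derives a coarse format label from Leader/06 (type of record) and Leader/07 (bibliographic level). The method name `coarse_format`, the label strings, and the sample leaders are hypothetical; the code table is deliberately partial, and the sketch ignores the 006, 007, 008, 245 and 300 fields that the talk also covers.

```ruby
# Partial table of MARC 21 Leader/06 (type of record) codes.
# The display labels are illustrative, not an official vocabulary.
TYPE_OF_RECORD = {
  'a' => 'Language material',
  'c' => 'Notated music',
  'e' => 'Cartographic material',
  'g' => 'Projected medium',
  'i' => 'Nonmusical sound recording',
  'j' => 'Musical sound recording',
  'k' => 'Two-dimensional nonprojectable graphic',
  'm' => 'Computer file'
}.freeze

# Hypothetical helper: guess a coarse format from the Leader only.
def coarse_format(leader)
  return 'Unknown' if leader.nil? || leader.length < 8

  type  = leader[6] # Leader/06, type of record
  level = leader[7] # Leader/07, bibliographic level

  # Language material splits into books and serials by bibliographic level.
  return 'Serial' if type == 'a' && level == 's'
  return 'Book'   if type == 'a' && %w[a c d m].include?(level)

  TYPE_OF_RECORD.fetch(type, 'Unknown')
end

puts coarse_format('01234cam a2200301 a 4500')
puts coarse_format('00000nas a2200000 a 4500')
puts coarse_format('00000njm a2200000 a 4500')
```

A real implementation would have to reconcile this value with the other fixed and variable fields, which is where the complexity described above comes in.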

I understand that most of us go to code4lib to hear about the latest sexy technologies. While MARC isn't sexy, many of the new tools being discussed still need to be populated with data gleaned from MARC records. MARC format designation has ramifications for search and retrieval, limits, and facets, both in the ILS and further downstream in next generation OPACs and web-scale discovery tools. Even veteran library coders will learn something from this session.

Touch Kiosk 2: Piezoelectric Boogaloo

Andreas Orphanides, North Carolina State University Libraries, akorphan@ncsu.edu

At the NCSU Libraries, we provide realtime access to information on library spaces and services through an interactive touchscreen kiosk in our Learning Commons. In the summer of 2012, two years after its initial deployment, I redeveloped the kiosk application from the ground up, with an entirely new codebase and a completely redesigned user interface. The changes I implemented were designed to remedy previously identified shortcomings in the code and the interface design [1], and to enhance overall stability and performance of the application.

In this presentation I will outline my revision process, highlighting the lessons I learned and the practices I implemented in the course of redevelopment. I will highlight the key features of the HTML/Javascript codebase that allow for increased stability, flexibility, and ease of maintenance; and identify the changes to the user interface that resulted from the usability findings I uncovered in my previous research. Finally, I will compare the usage patterns of the new interface to the analysis of the previous implementation to examine the practical effect of the implemented changes.

I will also provide access to a genericized version of the interface code for others to build their own implementations of similar kiosk applications.

[1] http://journal.code4lib.org/articles/5832

Wayfinding in a Cloud: Location Service for libraries

Petteri Kivimäki, The National Library of Finland, petteri.kivimaki@helsinki.fi

Searching for books in large libraries can be a difficult task for a novice library user. This paper presents The Location Service, software as a service (SaaS) wayfinding application developed and managed by The National Library of Finland, which is targeted for all the libraries. The service provides additional information and map-based guidance to books and collections by showing their location on a map, and it can be integrated with any library management system, as the integration happens by adding a link to the service in the search interface. The service is being developed continuously based on the feedback received from the users.

The service has two user interfaces: one for the customers and one for the library staff for managing the information related to the locations. The UI for the customers is fully customizable by the libraries, and the customization is done via template files using HTML, CSS, and JavaScript/jQuery. The service supports multiple languages, and the libraries have full control over which languages they support in their environment.

The service is written in Java and uses the Spring and Hibernate frameworks. The data is stored in a PostgreSQL database, which is shared by all the libraries. Libraries do not have direct access to the database, but the service offers an interface that makes it possible to retrieve XML data over HTTP. Modification of the data via the admin UI, however, is restricted, and access to other libraries’ data is blocked.

Empowering Collection Owners with Automated Bulk Ingest Tools for DSpace

Terry Brady, Georgetown University, twb27@georgetown.edu

The Georgetown University Library has developed a number of applications to expedite the process of ingesting content into DSpace.

Automatically inventory a collection of documents or images to be uploaded
Generate a spreadsheet for metadata capture based on the inventory
Generate item-level ingest folders, contents files and dublin core metadata for the items to be ingested
Validate the contents of ingest folders prior to initiating the ingest to DSpace
Present users with a simple, web-based form to initiate the batch ingest process
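The folder-generation and validation steps listed above might look roughly like the following Ruby sketch. It assumes the tools target DSpace's Simple Archive Format (one folder per item holding a `contents` file and a `dublin_core.xml` file); the abstract does not say so explicitly. The method names, the `item_001` folder naming, and the `:title`, `:date` and `:file` row keys are hypothetical stand-ins for spreadsheet columns, not the Georgetown tools' actual interface.

```ruby
require 'cgi'
require 'fileutils'
require 'tmpdir'

# Render minimal Dublin Core metadata for one item (hypothetical row keys).
def dublin_core_xml(row)
  <<~XML
    <?xml version="1.0" encoding="UTF-8"?>
    <dublin_core>
      <dcvalue element="title" qualifier="none">#{CGI.escapeHTML(row[:title])}</dcvalue>
      <dcvalue element="date" qualifier="issued">#{CGI.escapeHTML(row[:date])}</dcvalue>
    </dublin_core>
  XML
end

# Create one ingest folder per row; returns the folder paths.
def write_item_folders(rows, source_dir, ingest_dir)
  rows.each_with_index.map do |row, i|
    item_dir = File.join(ingest_dir, format('item_%03d', i + 1))
    FileUtils.mkdir_p(item_dir)
    FileUtils.cp(File.join(source_dir, row[:file]), item_dir)
    File.write(File.join(item_dir, 'contents'), "#{row[:file]}\n")
    File.write(File.join(item_dir, 'dublin_core.xml'), dublin_core_xml(row))
    item_dir
  end
end

# Pre-ingest validation: list folders that lack a required file or that
# name a bitstream in `contents` which is not present in the folder.
def invalid_item_folders(ingest_dir)
  folders = Dir.glob(File.join(ingest_dir, '*')).select { |d| File.directory?(d) }
  folders.reject do |dir|
    contents = File.join(dir, 'contents')
    File.exist?(File.join(dir, 'dublin_core.xml')) &&
      File.exist?(contents) &&
      File.readlines(contents, chomp: true).all? { |name| File.exist?(File.join(dir, name)) }
  end
end
```

Running the validation before the ingest starts is what removes the error-prone manual check: a folder with a missing bitstream is reported up front instead of failing partway through a batch.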

The applications have eliminated a number of error-prone steps from the ingest workflow and have significantly reduced a number of tedious data editing steps. These applications have empowered content experts to be in charge of their own collections.

In this presentation, I will provide a demonstration of the tools that were built and discuss the development process that was followed.

Quality Assurance Reports for DSpace Collections

Terry Brady, Georgetown University, twb27@georgetown.edu

The Georgetown University Library has developed a collection of quality assurance reports to improve the consistency of the metadata in our DSpace collections. The report infrastructure permits the creation of query snippets to test for possible consistency errors within the repository, such as items missing thumbnails, items with multiple thumbnails, items missing a creation date, items containing improperly formatted dates, items without duplicated metadata fields, and items recently added across the repository, a community or a collection.
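The idea of small, named checks run over every item can be sketched in Ruby as follows. This is not the Georgetown report infrastructure, which queries the repository directly; here the items are plain hashes with hypothetical keys (`:handle`, `:thumbnails`, `:date_created`), and the accepted date pattern (YYYY, YYYY-MM or YYYY-MM-DD) is an assumption.

```ruby
# Assumed date convention: YYYY, YYYY-MM or YYYY-MM-DD.
ISO_DATE = /\A\d{4}(-\d{2}(-\d{2})?)?\z/

# Each check is a named predicate that returns true for a problem item.
CHECKS = {
  'missing thumbnail' => ->(item) { item[:thumbnails].empty? },
  'multiple thumbnails' => ->(item) { item[:thumbnails].length > 1 },
  'missing creation date' => ->(item) { item[:date_created].nil? },
  'badly formatted date' => lambda { |item|
    !item[:date_created].nil? && item[:date_created] !~ ISO_DATE
  }
}.freeze

# Returns a hash of check name => handles of the items that fail it.
def qa_report(items)
  CHECKS.transform_values do |check|
    items.select { |item| check.call(item) }.map { |item| item[:handle] }
  end
end

items = [
  { handle: '1/1', thumbnails: ['t.jpg'], date_created: '2012-11-02' },
  { handle: '1/2', thumbnails: [], date_created: nil },
  { handle: '1/3', thumbnails: ['a.jpg', 'b.jpg'], date_created: 'Nov. 2012' }
]
qa_report(items).each { |name, handles| puts "#{name}: #{handles.join(', ')}" }
```

Keeping each check as an independent snippet means a new consistency rule can be added without touching the reporting code.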

These reports have served to prioritize programmatic data cleanup tasks and manual data cleanup tasks. The reports have served as a progress tracker for data cleanup work and will provide on-going monitoring of the metadata consistency of the repository.

In this presentation, I will provide a demonstration of the tools that were built and discuss the development process that was followed.

A Hybrid Solution for Improving Single Sign-On to a Proxy Service with Squid and EZproxy through Shibboleth and ExLibris’ Aleph X-Server

Alexander Jerabek, UQAM - Université du Québec à Montréal, jerabek.alexander_j@uqam.ca
Minh-Quang Nguyen, UQAM - Université du Québec à Montréal, nguyen.minh-quang@uqam.ca

In this talk, we will describe how we developed and implemented a hybrid solution for improving single sign-on in conjunction with the library’s proxy service. This hybrid solution consists of integrating the disparate elements of EZproxy, the Squid workflow, Shibboleth, and the Aleph X-Server. We will report how this new integrated service improves the user experience. To our knowledge, this new service is unique and has not been implemented anywhere else. We will also present some statistics after approximately one year in production.

See article: http://journal.code4lib.org/articles/7470

HTML5 Video Now!

Jason Ronallo, North Carolina State University Libraries, jnronall@ncsu.edu

Can you use HTML5 video now? Yes.

I'll show you how to get started using HTML5 video, including gotchas, tips, and tricks. Beyond the basics we'll see the power of having video integrated into HTML and the browser. Finally, we'll look at examples that push the limits and show the exciting future of video on the Web.

My experience comes from technical development of an oral history video clips project. I developed the technical aspects of the project, including video processing, server configuration, development of a public site, creation of an administrative interface, and video engagement analytics. Major portions of this work have been open sourced under an MIT license.

Hybrid Archival Collections Using Blacklight and Hydra

Adam Wead, Rock and Roll Hall of Fame and Museum, awead@rockhall.org

At the Library and Archives of the Rock and Roll Hall of Fame, we use available tools such as Archivists' Toolkit to create EAD finding aids of our collections. However, managing digital content created from these materials and the born-digital content that is also part of these collections represents a significant challenge. In my presentation, I will discuss how we solve the problem of our hybrid collections by using Hydra as a digital asset manager and Blacklight as a unified presentation and discovery interface for all our materials.

Our strategy centers around indexing ead xml into Solr as multiple documents: one for each collection, and one for every series, sub-series and item contained within a collection. For discovery, we use this strategy to leverage item-level searching of archival collections alongside our traditional library content. For digital collections, we use this same technique to represent a finding aid in Hydra as a set of linked objects using RDF. New digital items are then linked to these parent objects at the collection and series level. Once this is done, the items can be exported back out to the Blacklight solr index and the digital content appears along with the rest of the items in the collection.
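The one-document-per-component strategy can be sketched in Ruby as below. To stay self-contained, the sketch uses a nested hash as a stand-in for a parsed EAD finding aid rather than reading real EAD XML, and the Solr field names (`id`, `level`, `title`, `collection_id`, `parent_id`) and the ID scheme are hypothetical, not the Rock Hall's actual schema.

```ruby
# Flatten a finding aid (nested hash standing in for parsed EAD) into one
# Solr-style document per collection, series, sub-series and item.
def solr_docs(node, collection_id, parent_id = nil, path = [])
  # The collection keeps its own ID; components get a positional suffix.
  id = path.empty? ? collection_id : "#{collection_id}:#{path.join('.')}"
  doc = {
    'id' => id,
    'level' => node[:level],
    'title' => node[:title],
    'collection_id' => collection_id,
    'parent_id' => parent_id
  }
  child_docs = node.fetch(:children, []).each_with_index.flat_map do |child, i|
    solr_docs(child, collection_id, id, path + [i + 1])
  end
  [doc] + child_docs
end

finding_aid = {
  level: 'collection', title: 'Example Papers',
  children: [
    { level: 'series', title: 'Correspondence',
      children: [{ level: 'item', title: 'Letter, 1971' }] }
  ]
}
solr_docs(finding_aid, 'ARC-0001').each { |doc| puts doc.inspect }
```

Because every component carries both `collection_id` and `parent_id`, an item-level hit can still be displayed in the context of its series and collection.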

Making the Web Accessible through Solid Design

Cynthia Ng from Ryerson University Library & Archives

In libraries, we are always trying our best to be accessible to everyone and we make every effort to do so physically, but what about our websites? Web designers are great at talking about the user experience and how to improve it, but what sometimes gets overlooked is how to make a site more accessible and meet accessibility guidelines. While guidelines are necessary to cover a minimum standard, web accessibility should come from good web design without ‘sacrificing’ features. While it's difficult to make a website fully accessible to everyone, there are easy, practical ways to make a site as accessible as possible.

While the focus will be on websites and meeting the Web Content Accessibility Guidelines (WCAG), the presentation will also touch on how to make custom web interfaces accessible.

Getting People to What They Need Fast! A Wayfinding Tool to Locate Books & Much More

Steven Marsden, Ryerson University Library & Archives, steven dot marsden at ryerson dot ca
Cynthia Ng, Ryerson University Library & Archives

Having a bewildered, lost user in the building or stacks is a common occurrence, but we can help our users find their way through enhanced maps and floor plans. While not a new concept, these maps are integrated into the user’s flow of information without having to load a special app. The map not only highlights the location, but also provides all the related information with a link back to the detailed item view. During the first stage of the project, it has only been implemented for books (and other physical items), but the 'RULA Finder' is built to help users find just about anything and everything in the library including study rooms, computer labs, and staff. With a simple to use admin interface, it makes it easy for everyone, staff and users.

The application is written in PHP with data stored in a MySQL database. The end-user interface involves jQuery, JSON, and the library's discovery layer (Summon) API.

The presentation will not only cover the technical aspects, but also the implementation and usability findings.

De-sucking the Library User Experience

Jeremy Prevost, Northwestern University, j-prevost {AT} northwestern [DOT] edu

Have you ever thought that library vendors purposely create the worst possible user experience they can imagine because they just hate users? Have you ever thought that your own library website feels like it was created by committee rather than for users because, well, it was? I’ll talk about how we used vendor supplied APIs to our ILS and Discovery tool to create an experience for our users that sucks at least a little bit less.

The talk will provide specific examples of how inefficient or confusing vendor supplied solutions are from a user perspective along with our specific streamlined solutions to the same problems. Code examples will be minimal as the focus will be on improving user experience rather than on any one coded solution. Examples may include the seemingly simple tasks of renewing a book or requesting an item from another campus library.

Solr Testing Is Easy with Rspec-Solr Gem

Naomi Dushay, Stanford University, ndushay AT stanford DOT edu

How do you know if

your idea for "left anchoring" searches actually works?
your field analysis for LC call numbers accommodates a suffix between the first and second cutter without breaking the rest of LC call number parsing?
tweaking Solr configs to improve, say, Chinese searching, won't break Turkish and Cyrillic?
changes to your solrconfig file accomplish what you wanted without breaking anything else?

Avoid the whole app stack when writing Solr acceptance/relevancy/regression tests! Forget cucumber and capybara. This gem lets you easily (only 4 short files needed!) write tests like this, passing arbitrary parameters to Solr:

 it "unstemmed author name Zare should precede stemmed variants" do
   resp = solr_response(author_search_args('Zare').merge({'fl'=>'id,author_person_display', 'facet'=>false}))
   resp.should include("author_person_display" => /\bZare\W/).in_each_of_first(3).documents
   resp.should_not include("author_person_display" => /Zaring/).in_each_of_first(20).documents
 end
     
 it "Cyrillic searching should work:  Восемьсoт семьдесят один день" do
   resp = solr_resp_doc_ids_only({'q'=>'Восемьсoт семьдесят один день'})
   resp.should include("9091779")
 end
  
 it "q of 'String quartets Parts' and variants should be plausible " do
   resp = solr_resp_doc_ids_only({'q'=>'String quartets Parts'})
   resp.should have_at_least(2000).documents
   resp.should have_the_same_number_of_results_as(solr_resp_doc_ids_only({'q'=>'(String quartets Parts)'}))
   resp.should have_more_results_than(solr_resp_doc_ids_only({'q'=>'"String quartets Parts"'}))
 end
  
 it "Traditional Chinese chars 三國誌 should get the same results as simplified chars 三国志" do
   resp = solr_response({'q'=>'三國誌', 'fl'=>'id', 'facet'=>false}) 
   resp.should have_at_least(240).documents
   resp.should have_the_same_number_of_results_as(solr_resp_doc_ids_only({'q'=>'三国志'})) 
 end

See http://rubydoc.info/github/sul-dlss/rspec-solr/frames https://github.com/sul-dlss/rspec-solr

and our production relevancy/acceptance/regression tests slowly migrating from cucumber to: https://github.com/sul-dlss/sw_index_tests

Northwestern's Digital Image Library

Mike Stroming, Northwestern University Library, m-stroming AT northwestern DOT edu
Edgar Garcia, Northwestern University Library, edgar-garcia AT northwestern DOT edu

At Northwestern University Library, we are about to release a beta version of our Digital Image Library (DIL). DIL is an implementation of the Hydra technology that provides a Fedora repository solution for discovery of and access to over 100,000 images for staff, students, and scholars. Some important features are:

Build custom collection of images using drag-and-drop
Re-order images within a collection using drag-and-drop
Nest collections within other collections
Create details/crops of images
Zoom, rotate images
Upload personal images
Retrieve your own uploads and details from a collection
Export a collection to a PowerPoint presentation
Create a group of users and authorize access to your images
Batch edit image metadata

Our presentation will include a demo, explanation of the architecture, and a discussion of the benefits of being a part of the Hydra open-source community.

Two standards in a software (to say nothing of Normarc)

Zeno Tajoli, CINECA (Italy), z DOT tajoli AT cineca DOT it

With this presentation I want to show how the Koha ILS handles the support of three different MARC dialects: MARC21, Unimarc and Normarc. The main points of the presentation:

Three MARC at MySQL level
Three MARC at API level
Three MARC at display
Can I add a new format ?

Future Friendly Web Design for Libraries

Michael Schofield, Alvin Sherman Library, Research, and Information Technology Center, mschofied[dot]nova[dot]edu

Libraries on the web are afterthoughts. Often their design is stymied on one hand by red tape imposed by the larger institution and on the other by an overload of too democratic input from colleagues. Slashed budgets / staff stretched too thin foul-up the R-word (that'd be "redesign") - but things are getting pretty strange. Notions about the Web (and where it can be accessed) are changing.

So libraries can only avoid refabbing their fixed-width desktop and jQuery Mobile m-dot websites for so long until desktop users evaporate and demand from patrons with web-ready refrigerators becomes deafening. Just when we have largely hopped on the bandwagon and gotten enthusiastic about being online, our users expect a library's site to look and perform great on everything.

Our presence on the web should be built to weather ever-increasing device complexity. To meet users at their point of need, libraries must start thinking Future Friendly.

This overview rehashes the approach and philosophy of library web design, re-orienting it for maximum accessibility and maximum efficiency of design. While just 20 minutes, we'll mull over techniques like mobile-first responsive web design, modular CSS, browser feature detection for progressive enhancement, and lots of nifty tricks.

BYU's discovery layer service aggregator

Curtis Thacker, Brigham Young University, curtis.thacker AT byu DOT edu

It is clear that libraries will continue to experience rapid change based on the speed of technology. To acknowledge this new reality and to provide rapid response to shifting end user paradigms BYU has developed a custom service aggregator. At first our vendors looked at us a bit funny; however, in the last year they have been astonished with the fluid implementation of new services – here’s the short list:

filmfinder - a tool for browsing and searching films
A custom book recommender service based on checkout data
Integrated library services like personnel, library hours, study room scheduler and database finder through a custom adwords system.
A very geeky and powerful utility used for converting MARC XML into Primo-compliant XML.
Embedded floormaps
A responsive web design
Bing did-you-mean
And many more.

I will demo the system, review the architecture and talk about future plans.

The Avalon Media System: A Next Generation Hydra Head For Audio and Video Delivery

Michael Klein, Senior Software Developer, Northwestern University Library, michael.klein AT northwestern DOT edu
Nathan Rogers, Programmer/Analyst, Indiana University, rogersna AT indiana DOT edu

Based on the success of the Variations digital music platform, Indiana University and Northwestern University have developed a next generation educational tool for delivering multimedia resources to the classroom. The Avalon Media System (formerly Variations on Video) supports the ingest, media processing, management, and access-controlled delivery of library-managed video and audio collections. To do so, the system draws on several existing, mature, open source technologies:

The ingest, search, and discovery functionality of the Hydra framework
The powerful multimedia workflow management features of Opencast Matterhorn
The flexible Engage audio/video player
The streaming capabilities of both Red5 Media Server (open source) and Adobe Flash Media Server (proprietary)

Extensive customization options are built into the framework for tailoring the application to the needs of a specific institution.

Our goal is to create an open platform that can be used by other institutions to serve the needs of the academic community. Release 1 is planned for a late February launch with future versions released every couple of months following. For more information visit http://avalonmediasystem.org/ and https://github.com/variations-on-video/hydrant.

The DH Curation Guide: Building a Community Resource

Robin Davis, John Jay College of Criminal Justice, robdavis AT jjay.cuny.edu
James Little, University of Illinois Urbana-Champaign, little9 AT illinois.edu

Data curation for the digital humanities is an emerging area of research and practice. The DH Curation Guide, launched in July 2012, is an educational resource that addresses aspects of humanities data curation in a series of expert-written articles. Each provides a succinct introduction to a topic with annotated lists of useful tools, projects, standards, and good examples of data curation done right. The DH Curation Guide is intended to be a go-to resource for data curation practitioners and learners in libraries, archives, museums, and academic institutions.

Because it's a growing field, we designed the DH Curation Guide to be a community-driven, living document. We developed a granular commenting system that encourages data curation community members to contribute remarks on articles, article sections, and article paragraphs. Moreover, we built in a way for readers to contribute and annotate resources for other data curation practitioners.

This talk will address how the DH Curation Guide is currently used and will include a sneak peek at the articles that are in store for the Guide’s future. We will talk about the difficulties and successes of launching a site that encourages community. We are all builders here, so we will also walk through developing the granular commenting/annotation system and the XSLT-powered publication workflow.

Solr Update

Erik Hatcher, LucidWorks, erik.hatcher AT lucidworks.com

Solr is continually improving. Solr 4 was recently released, bringing dramatic changes in the underlying Lucene library and Solr-level features. It's tough for us all to keep up with the various versions and capabilities.

This talk will blaze through the highlights of new features and improvements in Solr 4 (and up). Topics will include: SolrCloud, direct spell checking, surround query parser, and many other features. We will focus on the features library coders really need to know about.

Reports for the People

Kara Young, Keene State College, NH, kyoung1 at keene.edu
Dana Clark, Keene State College, NH, dclark5 at keene.edu

Libraries are increasingly being called upon to provide information on how our programs and services are moving our institutional strategic goals forward. In support of College and departmental Information Literacy learning outcomes, Mason Library Systems at Keene State College developed an assessment database to record and report assessment activities by Library faculty. Frustrated by the lack of freely available options for intuitively recording, accounting for, and outputting useful reports on instructional activities, Librarians requested a tool to make capturing and reporting activities (and their lives) easier. Library Systems was able to respond to this need by working with librarians to identify what information is necessary to capture, where other assessment tools had fallen short, and ultimately by developing an application that supports current reporting imperatives while providing flexibility for future changes.

The result of our efforts was an in-house, browser-based Assessment Database to improve the process of data collection and analysis. The application is written in PHP, with data stored in a MySQL database, and presented via browser making extensive use of jQuery and jQuery plug-ins for data collection, manipulation, and presentation. The presentation will outline the process undertaken to build a successful collaboration with Library faculty from conception to implementation, as well as the technical aspects of our trial-and-error approach. Plus: cool charts and graphs!

Network Analyses of Library Catalog Data

Kirk Hess, University of Illinois at Urbana-Champaign, kirkhess AT illinois.edu
Harriett Green, University of Illinois at Urbana-Champaign, green19 AT illinois.edu

Library collections are all too often like icebergs: The amount exposed on the surface is only a fraction of the actual amount of content, and we’d like to recommend relevant items from deep within the catalog to users. With the assistance of an XSEDE Allocation grant (http://xsede.org), we’ve used R to reconstitute anonymous circulation data from the University of Illinois’s library catalog into separate user transactions. The transaction data is incorporated into subject analyses that use XSEDE supercomputing resources to generate predictive network analyses and visualizations of subject areas searched by library users using Gephi (https://gephi.org/). The test data set for developing the subject analyses consisted of approximately 38,000 items from the Literatures and Languages Library that contained 110,000 headings and 130,620 transactions. We’re currently working on developing a recommender system within VuFind to display the results of these analyses.
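One simple way to turn per-user transactions into a subject network is to count how often two subject headings occur in the same transaction. The Ruby sketch below illustrates only that co-occurrence step; it is not the authors' R and XSEDE pipeline, and the method name, the sample headings, and the input shape (one array of headings per transaction) are assumptions.

```ruby
# Count how often each pair of subject headings appears in the same
# transaction. Returns a hash of [heading_a, heading_b] => weight, with
# each pair stored in sorted order so A-B and B-A are the same edge.
def cooccurrence_edges(transactions)
  weights = Hash.new(0)
  transactions.each do |subjects|
    subjects.uniq.sort.combination(2) { |pair| weights[pair] += 1 }
  end
  weights
end

transactions = [
  %w[Poetry Drama],
  %w[Drama Poetry Fiction],
  %w[Fiction]
]
cooccurrence_edges(transactions).each do |(a, b), weight|
  puts "#{a} -- #{b} (#{weight})"
end
```

The resulting weighted edge list is the kind of input a graph tool such as Gephi can lay out, with heavier edges suggesting subject areas that users tend to borrow together.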

Pitfall! Working with Legacy Born Digital Materials in Special Collections

Donald Mennerich, The New York Public Library, don.mennerich AT gmail.com
Mark A. Matienzo, Yale University Library, mark AT matienzo.org

Archives and special collections are being faced with a growing abundance of born digital material, as well as an abundance of many promising tools for managing them. However, one must consider the potential problems that can arise when approaching a collection containing legacy materials (from roughly the pre-internet era). Many of the tried and true, "best of breed" tools for digital preservation don't always work as they do for more recent materials, requiring a fair amount of ingenuity and use of "word of mouth tradecraft and knowledge exchanged through serendipitous contacts, backchannel conversations, and beer" (Kirschenbaum, "Breaking badflag").

Our presentation will focus on some of the strange problems encountered and creative solutions devised by two digital archivists in the course of preserving, processing, and providing access to collections at their institutions. We'll be placing particular emphasis on the pitfalls and crocodiles we've learned to swing over safely, while collecting treasure in the process. We'll address working with CP/M disks in collections of authors' papers, reconstructing a multipart hard drive backup spread across floppy disks, and more.