Southeast 2019 Program

From Code4Lib
Revision as of 16:52, 2 May 2019 by Kbeswick (Talk | contribs) (15 minute talks)

  • Planning Committee: Kevin Beswick, Bret Davidson, Mike Kastellec, Mia Partlow, Hannah Rainey

15 minute talks

Dead Simple Catalog Indexer: Using Solr for better MARC searching

Dennis Christman (Duke University)
Are you ever frustrated searching MARC records in your ILS? Do you ever have complex searches or updates that are difficult or even impossible for your system to handle? The Dead Simple Catalog Indexer is an open source tool developed by NCSU Libraries that takes MARC records and puts them into Solr for you. Duke University Libraries (DUL) Technical Services has recently implemented this tool, opening up exciting new workflows for working with our data. Many of the complex searches we are now able to do would have previously required server level access, effectively creating a bottleneck where our projects had to work on another department’s timeline. Using this tool has helped to alleviate this bottleneck, allowing us to work through projects more quickly and freeing up the time of our colleagues. This session will briefly describe the tool and its implementation process, and then go over several projects where we utilized the tool. If you have ever needed to find every record with a certain combination of LDR position 06 and 337 values and haven’t been able to, this might be the tool for you.
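As a rough illustration of the kind of search described above, the sketch below builds a Solr select URL that filters on a leader/06 code and a 337 value together. The core name `catalog` and the field names `ldr_06` and `marc_337_a` are hypothetical; the real names depend on how the indexer maps MARC into Solr, which the abstract does not specify.

```python
from urllib.parse import urlencode


def build_solr_query(base_url, ldr06, field_337, rows=50):
    """Build a Solr select URL matching a leader/06 code and a 337 value.

    The Solr field names used here are hypothetical placeholders.
    """
    params = {
        "q": "*:*",
        # Each filter query narrows the result set independently
        "fq": [f'ldr_06:"{ldr06}"', f'marc_337_a:"{field_337}"'],
        "rows": rows,
        "wt": "json",
    }
    # doseq=True emits one fq parameter per list entry
    return f"{base_url}/select?{urlencode(params, doseq=True)}"


url = build_solr_query("http://localhost:8983/solr/catalog", "g", "unmediated")
print(url)
```

Passing each condition as its own `fq` parameter keeps the conditions separate, so a search like this one (projected-medium records carrying an "unmediated" media type) can be adjusted one filter at a time.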

Asserting for Success: Leveraging TravisCI, Bash, and Unit Tests to Ensure Metadata Transformations Do What We Expect

Mark Baggett (University of Tennessee)
Serving as a Digital Public Library of America (DPLA) service hub since 2015, the Digital Library of Tennessee uses Repox to aggregate our state’s cultural heritage materials and transforms each partner institution’s unique metadata mappings (DC, QDC, XOAI, MODS) to a shared format using XSLT. Over the years, testing metadata transforms before deployment to production has been time-consuming and frustrating for both the transform’s writer and its reviewer. It has also occasionally led to frantic moments before a scheduled ingest to repair a broken transform that went unnoticed during the review process. In this talk, I will go over our recent adoption of unit tests for this type of quality control, discuss what it's helped solve, and demonstrate how automated testing is not just for developers, but can help solve the work of librarians as well.

Metadata Cartography: MAPping metadata for a repository migration

Anna Goslen, Rebekah Kati (University of North Carolina at Chapel Hill)
The Carolina Digital Repository (CDR) at UNC-Chapel Hill will migrate from custom Fedora to Samvera Hyrax. As part of the content remediation process and preparation for storage and display in the new system, we need to migrate our MODS metadata to RDF. In this presentation, we will explain our repository and metadata use cases, describe the Metadata Application Profile creation process and offer advice and best practices for attendees who are contemplating their own Fedora to Hyrax content migration. We will discuss how legacy content, desired features, and system limitations each informed our decision making.

Annotation of IIIF resources

Niqui O'Neill (NC State University)
This presentation will discuss and demo a new open source JavaScript library for presenting annotations of IIIF resources. The library allows for the use of annotations for display and storytelling purposes. This rich display of annotations demonstrates the reuse value of annotations and provides the opportunity for new forms of scholarly output. This presentation will give an introduction to annotations, demonstrate the low barrier of entry to using the library, challenges around creating and using annotations of IIIF resources from multiple data models, potential use cases, and future development opportunities. Additionally, this talk will also touch on issues of annotations as scholarly output and demonstrate a local annotation server to help mediate some of the obstacles in creating annotations.

Getting Serious About Open Access Advocacy: Ten Practical Repository UI and Metadata Revisions to Help Your Library Champion the Cause

Maggie Dickson, Sean Aery (Duke University)
As the 2010s draw to a close, open access to scholarly work has become an integral theme throughout many libraries’ strategic plans, and Duke University Libraries is no exception. Ushering in this new era of openness will require libraries to take concerted action to improve the way their institutions' open scholarly publications are represented once collected in the platforms they support for curation, discovery, and access.

In a world where the open access copy of an article coexists with—and competes with—the published (often paywalled) version, how can libraries add value to the OA copy beyond merely making it accessible? How can we increase its impact? What can we do using our local metadata that can’t be done at scale by a publisher? And in the face of competing priorities, constrained resources, and a swiftly moving carousel of technology platforms, how can we make progress toward these ends without breaking the bank?

Over the past year, Duke Libraries decided to embrace—rather than replace—an aging DSpace platform for its open access publications, updating the core software from version 1.7 to 6.2. With renewed focus on metadata architecture and targeted user interface enhancements, Duke’s new DSpace system puts a modern spin on the software, and dares to break outside of the box of what an OA repository traditionally does.

We reconsidered how researchers can be presented alongside their research, displaying an author-provided photo and bio on item pages, and linking out to profiles in ORCID and VIVO using lightweight name string & ID pairing. We built copyable citations that vary by type, and took care to encourage citing the published version of the article where possible. We also illuminated usage, attention, and collection stats throughout the site.

Metadata has been the true key to unlocking the potential of these materials. Through metadata auditing, remodeling, and remediation, we built a solid foundation for developing a platform worthy of the research it holds. These changes have in effect turned a traditionally utilitarian platform into one that can appeal on an emotional level, and have helped to highlight the distinctive character of the Duke research community.

Come hear about Duke's approaches to addressing these challenges, and the tradeoffs and pitfalls encountered. No matter what platform your library's open access publications call home, you'll learn about ten ideas for practical metadata and interface changes you can make to help raise the profile of your institution's scholarly works.

Born-Digital Workflows at the Rose Library

Brenna Edwards (Emory University)
Creating workflows for preserving born-digital materials is a challenge, as technology and tools in the field are constantly being introduced or updated. At the Rose Manuscripts, Archives, and Rare Book Library at Emory University, the BitCurator environment has been adapted to create more efficient workflows for preserving born-digital media. While BitCurator has a wide menu of tools available, this talk will focus on a select few found to be the most useful when working with newly accessioned born-digital materials. These include FSLint, Bagger, BulkExtractor, Brunnhilde, and a toolset called CCA (Canadian Centre for Architecture) Tools. Through experimentation and documentation, these tools have improved the workflow for both accessioning and processing born-digital media. This, in turn, makes the born-digital holdings at the Rose more accessible to our researchers.

An evolving Hyrax: two approaches to adapting open-source digital repository software for mediated data deposit

Moira Downey, Jocelyn Triplett, Jennifer Darragh (Duke University)
The Samvera open source software community's Hyrax framework provides a user interface for digital repositories that incorporates a robust and growing set of features centered around the archiving, publishing, and sharing of digital content. Hyrax natively enables upload of files through direct user deposit, proxy deposit, and mediated deposit. This range of options represents a variety of possible workflows. However, none of them explicitly facilitate a workflow that allows for a review of the files to ensure their quality prior to ingest into the system. Over the past year, Duke University Libraries have adapted the Hyrax codebase to develop two data repositories with distinct approaches to pre-publication quality control--one human-centered and one system-based.

In 2017, Duke University Libraries introduced a data curation and publication program aimed at helping faculty and other campus scholars make their research data findable, accessible, interoperable and reusable (FAIR) [1]. The curation workflow established in support of this program is heavily reliant on staff intervention and involves a thorough review of a depositor's data to ensure that the dataset meets those FAIR standards. In the same spirit of openness that inspired the curation program, the libraries chose to build a local digital repository for researcher data using the Hyrax framework. The development team acknowledged that the software would require a number of customizations to allow the kind of human level audit that the program's curatorial procedures required. The end result--Duke's Research Data Repository [2]--is a system that allows researchers to submit files and accompanying metadata, while affording curatorial staff the opportunity to examine, rearrange, and potentially transform the files prior to ingest.

Also at Duke, the team behind MorphoSource [3], a publicly accessible web digital repository for 3D scans of biological specimens, saw in Hyrax a solid foundation on which to redevelop and expand the scope of the site to include museum and cultural heritage objects. The current MorphoSource site has 62 thousand files from over 900 contributors, and is experiencing exponential growth. In order to accommodate this volume of deposits on the new platform while ensuring that the user-submitted data and metadata are interoperable and support preservation activities as well as discovery and access, the MorphoSource team has undertaken several customizations to the Hyrax interface to guide users and validate files and metadata throughout the deposit process.

This presentation will take a closer look at how the two teams at Duke have bent the Hyrax codebase to build research data repositories using different workflows for pre-publication review and quality control. We will briefly trace the history of both archives, and explore the various ways in which each application implements the needs of its respective program.

[1] https://www.go-fair.org/fair-principles/
[2] https://research.repository.duke.edu/
[3] https://www.morphosource.org

Embracing the Mess: An Approach to Utilizing All Potential Data Sources

Luke Aeschleman (NC State University)
NC State University Libraries’ Citation Index is a central hub for researcher citation data, sourcing metadata from Web of Science, ORCID, Crossref, and faculty curricula vitae. One of the major accomplishments of the application is its ability to “intertwine” citation metadata into an enhanced, cohesive record. As opposed to standard ETL workflows (in which all data sources would be standardized, deduplicated, and stored), the Citation Index benefits from incomplete or duplicate records, as each source represents a single part of the larger whole. Some sources are better for author affiliation data and some are better for external identifiers. Two sources might be equally “as good” at supplying metadata but both lack 100% coverage. To ensure the best possible final record, the Citation Index “embraces the mess.” This approach allows the application to be more resilient to dirty data and more flexible in adding new data sources (and even more mess!).

This talk will use the Citation Index as a real-life example of how to approach multiple, open source data sources and the challenges of working in an environment that can be fraught with metadata inconsistencies. The talk will outline the benefits of “embracing the mess” as opposed to focusing on the creation of impeccably clean records.
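The idea of “intertwining” partial records can be sketched as a per-field merge, where each field is taken from whichever source is most trusted for it and actually supplied a value. This is only an illustration and not the Citation Index's code: the source names come from the abstract, but the field names, the priority order, and the sample values are assumptions.

```python
# Which source to prefer for each field. The ordering is an assumption made
# for illustration; the abstract only says sources differ in their strengths.
FIELD_PRIORITY = {
    "title": ["crossref", "web_of_science", "orcid", "cv"],
    "affiliation": ["web_of_science", "cv", "crossref", "orcid"],
    "doi": ["crossref", "orcid", "web_of_science", "cv"],
}


def intertwine(records_by_source):
    """Combine partial records for one citation into a single record.

    records_by_source maps a source name to a dict holding whatever fields
    that source supplied. Missing or empty fields are skipped, so an
    incomplete source still contributes the parts it does have.
    """
    merged = {}
    for field, sources in FIELD_PRIORITY.items():
        for source in sources:
            value = records_by_source.get(source, {}).get(field)
            if value:
                # Keep provenance so the origin of each value stays visible
                merged[field] = {"value": value, "source": source}
                break
    return merged


# Hypothetical partial records for the same article
partial = {
    "orcid": {"title": "An Example Article", "doi": "10.1234/example"},
    "web_of_science": {
        "title": "AN EXAMPLE ARTICLE",
        "affiliation": "NC State University",
    },
}
print(intertwine(partial))
```

Under this design, adding a new data source only means adding its name to the priority lists; nothing has to be cleaned or deduplicated before it can contribute.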

Improving Library Workflows with Serverless Technologies

Karen Coombs (OCLC)
This session will provide an overview of the concepts of serverless and discuss how utilizing serverless technologies can improve library workflows, potentially reduce costs and facilitate innovation. The session will review several use cases related to metadata maintenance, analytics and discovery, and examine using tools such as AWS Lambda, Step Functions, S3 and Elasticsearch.

Wrangling Images at the Digitization Rodeo

Jeremy D. Moore (University of Tennessee)
This presentation looks at digital image post-processing and quality control as data wrangling problems that can be solved by leveraging data science and developer tools, such as conda, Jupyter Notebooks, and GitHub. Based on my experience with millions of images in over a decade of managing digitization labs, I will share exploratory methods that fall somewhere between manually processing images in Adobe Photoshop and a fully automated Bash script. Due to the unique issues inherent in physical item digitization performed by an ever-changing cast of student digital imaging technicians from a wide variety of backgrounds, this is less of a workflow and more a mentality lending itself to creative problem-solving in a manner that is flexible, scalable, and teachable to the non-developer. Attendees will be shown examples of past projects and hopefully learn new techniques for tackling old problems with Python.

Longleaf: a repository-independent command-line utility for applying digital preservation processes to files

Benjamin Pennell, Jason Casden (University of North Carolina at Chapel Hill)
As our digital collections infrastructure has grown over the past 20 years, we’ve found it difficult to apply digital preservation plans consistently across system-defined content boundaries. Our institution has developed longleaf, a new portable, command-line, repository-agnostic, rules-based tool for monitoring, replicating, and applying preservation processes to files. We chose to develop this tool in order to address several ongoing technological preservation challenges that we feel are also common at other institutions:

  • Preservation activities being applied to files based on system affiliation (i.e. repository platform or lack thereof) rather than the needs of the content.
  • Difficulty maintaining an ideal schedule of fixity checks as the sizes of our collections grow.
  • Physical and computational costs to servers and storage devices imposed by ongoing cryptographic checksum algorithms.
  • Difficulty gradually introducing cloud storage services into our replication strategy for vulnerable files.

We argue that the complexity of digital preservation technologies and the manner in which they are coupled with repository management systems contribute significantly to these problems. In an attempt to address these issues, we have designed longleaf according to the principles of high “software availability” (Davidson & Casden, 2016) that prioritize ease of use by a broad set of users in a variety of environments. To that end, longleaf is an open source Unix-style utility that will run on any modern Linux operating system with only a Ruby interpreter. It is designed as a flexible tool that can be applied to any content storage system with a file system: longleaf requires no repository, no external database, and no storage system other than the file system. It can be run completely from the command line or triggered by arbitrary external systems (e.g. initiated on file ingest). We will be applying longleaf to files managed entirely on shared drives, files managed by Hyrax and our in-house Fedora-based repositories, as well as digitization masters managed by CONTENTdm.

Longleaf’s modular architecture and flexible configuration system will allow this tool to be used as a platform for evaluating and implementing varied preservation activities across subsets of larger collections. We are increasing coverage of ongoing and transactional fixity checks by implementing typical computationally expensive cryptographic checksums alongside far more scalable non-cryptographic checksums and filesystem checks, based on different schedules, events, and collection affiliations. We will integrate storage endpoints with different access costs (e.g. Amazon S3 Glacier and magnetic tape data storage) by setting appropriate replication and verification (i.e. fixity checking) schedules and techniques based on characteristics of both the source and destination locations. This approach can allow the fairly straightforward implementation of actions based on levels of digital preservation need regardless of repository system constraints.
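The trade-off between the kinds of fixity check mentioned above can be sketched in a few lines. This is not longleaf's code (longleaf is a Ruby utility), and the abstract does not name specific algorithms: SHA-256, CRC32, and a file-size comparison are assumed here as stand-ins for a cryptographic checksum, a non-cryptographic checksum, and a filesystem check.

```python
import hashlib
import os
import tempfile
import zlib

CHUNK_SIZE = 1024 * 1024  # read in 1 MiB chunks so large files fit in memory


def sha256_of(path):
    """Cryptographic checksum: typically costlier, detects tampering too."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def crc32_of(path):
    """Non-cryptographic checksum: cheap, catches accidental corruption."""
    crc = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(CHUNK_SIZE), b""):
            crc = zlib.crc32(chunk, crc)  # carry the running value forward
    return f"{crc:08x}"


def size_matches(path, expected_size):
    """Filesystem check: compares file size only, without reading content."""
    return os.path.getsize(path) == expected_size


# Demo on a throwaway file: record all three values, then re-verify
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example payload")
    demo_path = tmp.name

recorded = {
    "size": os.path.getsize(demo_path),
    "crc32": crc32_of(demo_path),
    "sha256": sha256_of(demo_path),
}
print(size_matches(demo_path, recorded["size"]))
print(crc32_of(demo_path) == recorded["crc32"])
print(sha256_of(demo_path) == recorded["sha256"])
os.remove(demo_path)
```

Because the three checks are independent functions, a scheduler could run the cheap ones frequently and reserve the cryptographic one for less frequent or event-driven verification, which is the kind of tiering the paragraph above describes.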

In this session, we will present the longleaf system design and demonstrate the ways that we are using longleaf to consistently implement digital preservation plans as defined by our institution’s digital preservation specialists.

Davidson, B., & Casden, J. (2016). Beyond open source. Code4Lib Journal, Issue 31. Retrieved from http://journal.code4lib.org/articles/11148

Drupal 7 to Drupal 8: Our Journey

Erik Olson, Meredith Wynn (NC State University)
Drupal 8 has some sorely needed improvements and many nice upgrades from Drupal 7. New tools like Symfony, Twig, and Composer alone make it worth the upgrade. Because of these new tools there are substantial changes in the way Drupal 8 is built and manages its data model.

Upgrading a Drupal 7 site to Drupal 8 is, unfortunately, not as simple as running a script. Templates and custom modules will need to be rebuilt and all of your content will have to be migrated to a new database model. If this sounds daunting, well that’s because it is.

The NC State University Libraries website had 25,000 nodes, 30 content types, 10 custom modules, 100+ custom views, and over 150 templates. I am very proud to say, without any evidence to back up this claim, it was the single largest website to attempt a Drupal 8 migration. The upgrade was very difficult, but we did it and we are glad we did.

In our talk we will discuss the right and wrong ways to go about a migration, the best tools we found, and tips we wish we knew before we began our migration.

Topics include:

  • Benefits of Drupal’s “Migrate” module and what it lacks
  • How to migrate views
  • How to structure your YAML files for your custom content types
  • Rewriting templates in Twig


Grant-funded open source software: lessons learned from the initial release

Sharon Luong, Erica Titkemeyer (University of North Carolina at Chapel Hill)
Jitterbug is a web application funded as part of an Andrew W. Mellon Foundation grant, supporting large-scale description, digitization, preservation, and access of archival audiovisual recordings across Wilson Special Collections Library at the University of North Carolina at Chapel Hill’s University Libraries. Launched in early 2017, Jitterbug has been successful in helping staff describe over 40,000 items and preserve and provide access to over 40% of the total archival recordings within the Southern Folklife Collection.

Solving a number of challenges for audiovisual collections and institutions involved with AV preservation, including the need for customized fields based on formats and batch data importing for various points in the digitization workflow, it has been a hope that Jitterbug could find adoption among UNC’s peer institutions. However, many practical and technical hurdles remain in the way of Jitterbug use outside of Wilson Library. Specifically, this presentation will highlight the difficulties in promoting Jitterbug, from limitations experienced in grant-funded open source software development, to the discovery of local application dependencies that may keep it from being truly reusable by other institutions.

Jitterbug’s Product Owner, Erica Titkemeyer, will share details on the initial development and use of the application. They will discuss, in hindsight, potential provisions to the grant proposal in order to allow for conceptualization of a simpler, more generalized version of Jitterbug to better meet the needs of a wider constituency.

Jitterbug’s developer, Sharon Luong, will talk about technical measures to enable and improve the re-usability of initial open source software releases. These include simplifying application dependencies, decreasing setup effort, and increasing documentation. They will also discuss prioritizing these and other retroactive improvements against new requested features.

5 minute lightning talks

Raspberry PioT: Teaching the Internet of Things with the Raspberry Pi

Colin Nickels (NC State University)
The Internet of Things is a complicated topic; it's an umbrella term that encompasses many technologies; it's a messy collection of gadgets and gizmos; it's a buzzword that conveys little solid meaning. Making sense of this is hard. This complexity makes IoT a particularly difficult topic to teach in a hands-on Makerspace workshop.

Over the years of struggling with this topic, we have adopted multiple different platforms, technologies and learning outcomes. Starting with Arduino and moving to Raspberry Pi, we have iterated on our workshop to make it more approachable and provide more time building an Internet of Things Thing.

This talk highlights our efforts to tackle IoT as a workshop in our library. I will discuss the advantages of the Pi as well as lessons learned through years of struggling to teach this topic.

Using Python’s Pandas Data Analysis Library for Digital Collections Assessment

Julia Gootzeit, Morgan McKeehan (University of North Carolina at Chapel Hill)
Assessing the metadata and content of large-scale digital collections necessitates careful analysis of the very large data sets about collections that accrue within digital asset management systems over time. Collections data sets can be messy and complicated to work with, however, posing challenges for comprehensive assessment efforts.

At our institution, we are currently conducting an assessment of metadata and content for our digital collections in preparation for migrating them out of CONTENTdm and into a new system. In this lightning talk, we will discuss how we used the open-source data analysis library pandas for the Python programming language to address some of the collections assessment challenges we have encountered. We have found pandas’ fine-grained and well-documented data analysis tools to be easy to work with and flexible for our needs in assessing large volumes of tabular metadata that well exceed the size limitations of commonly used spreadsheet software.

In our talk, we will briefly outline the pandas modules we found useful for working with collections data, and will show how we used them to perform specific assessment tasks such as:

  • merging data sets from various export sources according to specific parameters
  • running calculations across combined data sets to create collections snapshots for attributes such as image quality of content files

We will also provide recommendations and links for the sources we have found most helpful for learning pandas.

Heroku to the Rescue!

Meredith L. Hale (University of Tennessee)
What are your options when you need to host an application and don’t have access to a server? In this lightning talk, I’ll provide snapshots of two case studies that answer this question using Heroku with no resulting costs. Heroku is a cloud-based Platform as a Service (PaaS) that can be capitalized upon for a variety of projects. In one instance, Heroku is being used to minimize the technical knowledge staff need to use a Library of Congress reconciliation service and the time needed to install the requirements on individual computers. Hosting the app on Heroku makes it so that staff and students do not need to use the command line to run the reconciliation program in OpenRefine. In the second instance, Heroku is being used to host a Twitterbot that promotes the library’s digital collections through a daily image post with associated metadata on the Twitter application. Running a program periodically can be achieved in Heroku by using either apscheduler or the Heroku scheduler add-on. Once fired by Heroku, this program uses OAI-PMH to randomly select a digital collections item link and posts the associated image and title online using the affordances of OpenGraph tags and the Twitter API. Heroku proved to be a useful and approachable tool in these instances and provided me with my first experience using a PaaS. Through sharing my work with Heroku, I hope that attendees will use the platform to solve new problems of their own.
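The random-selection step can be sketched as follows, assuming the repository exposes Dublin Core records over OAI-PMH. This is not the speaker's bot: the sample response, the `example.org` links, and the function names are made up for illustration, and a real bot would fetch the XML from the repository's OAI-PMH endpoint and then post through the Twitter API.

```python
import random
import xml.etree.ElementTree as ET

# Standard OAI-PMH and Dublin Core namespaces
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up ListRecords response standing in for a real harvest
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample photograph one</dc:title>
          <dc:identifier>https://example.org/items/1</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample photograph two</dc:title>
          <dc:identifier>https://example.org/items/2</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_items(xml_text):
    """Extract title/link pairs from an OAI-PMH ListRecords response."""
    root = ET.fromstring(xml_text)
    items = []
    for record in root.findall(".//oai:record", NS):
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        link = record.findtext(".//dc:identifier", default="", namespaces=NS)
        if title and link:
            items.append({"title": title, "link": link})
    return items


def pick_item(xml_text, rng=random):
    """Choose one item at random, or None if the response has no usable records."""
    items = list_items(xml_text)
    return rng.choice(items) if items else None


print(pick_item(SAMPLE_RESPONSE))
```

A scheduled job would call something like `pick_item` once a day and hand the resulting link and title to the posting step.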

Integrating the Cantaloupe IIIF Server into Hyrax/Samvera

Kevin S. Clarke (UCLA)
Hyrax comes with RIIIF support, but it's easy to configure the Cantaloupe IIIF server to work with your Hyrax/Samvera installation. I propose to show how to accomplish this, and then to talk briefly about how we're planning to change the way that Hyrax and Cantaloupe interact. We envision using JP2 instead of TIFF images as the IIIF source image. We also want to further decouple Cantaloupe and Hyrax (which will involve storing manifests outside of Hyrax, as well as some other changes). I'll talk about the code we're writing to make this possible.

New Roots: Developing a Bilingual Omeka site

Emily Brassell (University of North Carolina at Chapel Hill)
In this lightning talk, I’ll give an overview of New Roots: Voices from Carolina del Norte!, a digital archive containing oral histories of Latin American migrants in North Carolina and the experiences of North Carolinians who have worked for the integration of new settlers into this southern state.

Built on the Omeka platform and funded by the National Endowment for the Humanities, the project is a collaboration of the Latino Migration Project, the Southern Oral History Program, and University of North Carolina at Chapel Hill Libraries. There were two major considerations in developing the site. First, since Spanish-speaking researchers and the Latino community in NC were important audiences, it was crucial to create a bilingual site. Second, to avoid the tedium and potential errors of duplicate data entry, we needed to sync data from our authoritative source (CONTENTdm) to Omeka.

To create a bilingual site, we forked the Omeka Multilanguage plugin and customized it heavily. Our solution also depends on a custom theme, a metadata schema with English and Spanish translations of most fields, and English and Spanish translations of Omeka “SimplePages.”

As for syncing data, we created an endpoint on our CONTENTdm server with XML data created according to the ResourceSync standard. We then created an Omeka plugin making heavy use of resync-php which translates the CONTENTdm data into fields usable by New Roots.

The development process was enriched by working closely with a faculty member. In weekly meetings we refined requirements and discussed design decisions, and the project benefited greatly from her domain knowledge.

In the talk, I’ll describe each of these aspects in more detail (focusing on the bilingual functionality) and will briefly highlight the site’s other features with a series of screenshots. Finally, I’ll discuss lessons learned and future aspirations for the site.

Replacing aging iPads with low-cost DIY solutions

Jason Fleming (University of North Carolina Wilmington)
We have been using Raspberry Pis for rotating displays in the library for about 2 years now with success, so when it was time to replace our aging iPad lookup stations we decided to explore the option of using a touch screen display coupled with a Raspberry Pi. We will describe the solution we came up with, the problems we had finding a case to fit, and how we ended up retrofitting the existing cases with new 3D printed parts.

Scaling Up R Instruction using RStudio Cloud + Zoom

Alison Blaine (NC State University)
In this lightning talk, I will discuss my experiences using cloud computing software, specifically RStudio Cloud, and webinar technology (Zoom) to scale up teaching R workshops at NC State to accommodate both in-person and online participants. This talk will focus on my 9-week R for Data Science workshop series that I'm currently teaching (Feb - April 2019), in which approximately 50 participants attend weekly hands-on coding sessions that are synchronous in-person at Hunt Library and online. The series is also being recorded and will be available as a self-directed non-credit course via the Moodle learning management software at NC State. My goal for the talk is to offer thoughts about how others might be able to successfully use these and similar technologies (such as Google Colab) to scale up data science instruction.

Building Bridges: The Triangle Digital Humanities Network as a Model for Navigating Inter-Institutional Collaboration

Nathan Kelber, Claire Cahoon (University of North Carolina at Chapel Hill)
Libraries across the country are striving to find the right people, knowledge, and resources to create and sustain successful digital projects, but often struggle to break out of silos and connect with outside sources. Large R1 institutions grapple with keeping connected while small institutions strive to get access to needed resources. We all benefit from collaborating across institutions, documenting our strengths, and building the capacities of our staffs. These are the broad goals of the Triangle Digital Humanities Network (TDHN), which aims to create a community of practice for digital humanities scholars, teachers, and practitioners from institutions of all kinds within the North Carolina Research Triangle. Our talk will discuss the benefits and challenges of developing such an organization, with a focus on inter-institutional collaboration.

Tackling these issues requires a variety of tactics. At TDHN, we have focused on these support methods:

  • Inter-institutional communication channels (email list, shared calendar)
  • Census data of local people, projects, organizations, workshops, spaces, communities, and repositories
  • A Triangle Digital Humanities Institute for training scholars, especially for incorporating new scholars and those at under-represented institutions

The challenge of any such community is to bring together people of diverse backgrounds, especially populations and institutions that are historically underrepresented in technology. We will address the challenges our team faces concerning decentralizing leadership and contribution as well as finding and including smaller institutions. Using TDHN as a case study, we will discuss the value of creating an interdisciplinary, inter-institutional, virtual community and how to take the first steps towards starting one in your area.

Back to Southeast 2019

Back to Southeast