2014 Prepared Talk Proposals

74,962 bytes added, 19:45, 27 May 2016
==2014 Prepared Talk Proposals==
 
'''Proposals for Prepared Talks:'''
'''To Propose a Talk'''
* Log in to the wiki in order to submit a proposal. If you are not already registered, follow the instructions to do so.
* Provide a title and a brief (500 words or fewer) description of your proposed talk.
* If you so choose, you may also indicate when, if ever, you have presented at a prior Code4Lib conference. This information is completely optional, but it may assist us in opening the conference to new presenters.
'''Talk Proposals'''
 
==Creating a new Greek-Dutch dictionary==
* Caspar Treijtel, University of Amsterdam, c.treijtel@uva.nl
 
At present, no complete dictionary of (ancient) Greek-Dutch is available online. A new dictionary is currently under construction at Leiden University, with software being developed at the University of Amsterdam. The team in Leiden has already begun preparing the data and currently has about 6,000 approved lemmas. The ultimate goal is to produce both a print version and an online open access version from the same source documents. The software needed for this was built in a project funded by CLARIN-NL.
 
Migrator
 
For the production of lemmas we have implemented an advanced workflow. The (generally non-technical) users create lemmas using MS Word, which is both familiar and easy to use. We have developed a custom software module that carefully migrates the Word documents into deeply structured XML by analyzing the structure and semantics of the lemmas, falling back on heuristics in ambiguous cases. Although we initially envisioned the oXygen XML Author component as the main tool for creating new lemmas, we obtained excellent results with the migrator module and therefore decided to continue using MS Word as the primary composition tool. The main advantage of this is that the editors are much more familiar with Word than with any other WYSIWYG editor. Lemmas that have been migrated to XML are stored in an XML database and can be further edited using oXygen XML Author.
 
Lemmatizer
 
Greek morphology is complicated. In order to use a dictionary effectively, a rather high level of initial language competence is necessary for the user to be able to relate the word form s/he finds in a text to the correct basic lemma form, where the definition of the word can be found. Using a Greek morphological database we have been able to facilitate the search for lemmas. A ‘lemmatizer’ module gives the possible parsings of the word forms and the lemmas they can be derived from. This enables the user to type in the word as found in the text and be redirected to the correct lemma.
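
The lookup described above can be sketched roughly as follows. This is a minimal illustration only: the sample rows, the names `MORPH_ROWS`, `build_index` and `lemmatize`, and the wording of the parsings are hypothetical stand-ins, not the project's morphological database or its API.

```python
import unicodedata
from collections import defaultdict

# Invented sample rows of (inflected form, lemma, parsing). The real
# morphological database is far larger and is not reproduced here.
MORPH_ROWS = [
    ("λόγου", "λόγος", "noun, genitive singular"),
    ("λόγοις", "λόγος", "noun, dative plural"),
    ("λύσω", "λύω", "verb, future indicative active, 1st singular"),
    ("λύσω", "λύω", "verb, aorist subjunctive active, 1st singular"),
]


def _normalize(form):
    # Accented Greek can be typed with precomposed or combining characters;
    # NFC normalization makes both spellings compare equal.
    return unicodedata.normalize("NFC", form.strip())


def build_index(rows):
    """Map each inflected form to every (lemma, parsing) it can represent."""
    index = defaultdict(list)
    for form, lemma, parsing in rows:
        index[_normalize(form)].append((lemma, parsing))
    return index


def lemmatize(index, form):
    """Return all candidate (lemma, parsing) pairs for a form as typed."""
    return list(index.get(_normalize(form), []))


INDEX = build_index(MORPH_ROWS)
for lemma, parsing in lemmatize(INDEX, "λύσω"):
    print(lemma, "-", parsing)
```

An ambiguous form such as the one in the usage lines returns more than one parsing, which is why the function returns a list rather than a single lemma.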
 
Visualization
 
For the online dictionary we have implemented a visualization module that allows the user to view multiple lemmas at once. This module is implemented with the JavaScript framework MooTools. The result is a viewer that performs well and runs on maintainable JavaScript code.
 
The online dictionary is still being worked on; have a look at http://www.woordenboekgrieks.nl/ for the beta version. A newer test version with additional features can be found here: http://angel.ic.uva.nl:8600/.
 
Credits
 
* construction of the dictionary: Prof. Ineke Sluiter, Classics department of Leiden University; Prof. Albert Rijksbaron, University of Amsterdam
* publisher of the dictionary: Amsterdam University Press
* design/typesetting dictionary: TaT Zetwerk (http://www.tatzetwerk.nl/)
* software development: Digital Production Center, University Library, University of Amsterdam
* project funding: CLARIN-NL (http://www.clarin.nl/)
* morphological database for use by the lemmatizer: courtesy of Prof. Helma Dik, University of Chicago (based on data of the Perseus Project)
----
* [http://code4lib.org/conference/2013/ronallo HTML5 Video Now!] 2013
Watching the Google Analytics Real-Time dashboard for the first time was mesmerizing. As soon as someone visited a site, I could see what page they were on. For a digital collections site with a lot of images, it was fun to see what visitors were looking at. But getting from Google Analytics to the image or other content of what was currently being viewed was cumbersome. The real-time experience was something I wanted to share with others. I'll show you how I used a WebSocket service to create a real-time interface to digital collections views and search queries.
In the Hunt Library at NCSU we have some large video walls. I wanted to make HTML-based exhibits that featured viewer interactions. I'll show you how I converted Listen to Wikipedia [1] into a bring-your-own-device interactive exhibit. With WebSockets any HTML page can be remote controlled by any internet-connected device.
Built into the walls of NC State's new Hunt Library are several [http://www.christiedigital.com/en-us/digital-signage/products/microtiles/pages/microtiles-digital-signage-video-wall.aspx Christie MicroTile Display Wall Systems]. What does a library do with a display that's seven feet tall and over twenty feet wide? I'll talk about why libraries might want large displays like this, what we're doing with them right now, and what we might do with them in the future. I'll talk about how these displays factor into planning for new and existing web projects. And I'll get into the fun details of how you build web applications that scale from the very small browser window on a phone all the way up to a browser window with about 14 million pixels (about 10 million more than a dual 24" monitor desktop setup).
 
== Discovering your Discovery System in Real Time. ==
 
* Godmar Back, Virginia Tech, gback@vt.edu
* Annette Bailey, Virginia Tech, afbailey@vt.edu
 
Practically all libraries today provide web-based discovery systems to their users;
users discover items and peruse or check them out by clicking on links. Unlike
the traditional transaction of checking out a book at the circulation desk, this
interaction is largely invisible. We have built a system that records users'
interactions with Summon in real time, processes the resulting data with minimal delay,
and visualizes it in various ways using Google Charts and using various d3.js modules,
such as word clouds, tree maps, and others.
 
These visualizations can be embedded in web sites, but are also suitable for
projection via large-scale displays or projectors right into the 'Learning Spaces'
many libraries are being converted into. The goal of this talk is to share the technology
and advocate the building of a cloud-based infrastructure that would make this
technology available to any library that uses a discovery system, rather than just
those who have the technological prowess for developing such systems and
visualizations in-house.
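
One way to picture the processing step that feeds a word cloud is a sliding-window term counter. The sketch below is an assumption-laden illustration, not the authors' system: the class `RecentTermCounter`, the window length, and the sample queries are all invented.

```python
from collections import Counter, deque


class RecentTermCounter:
    """Counts search terms seen within the last `window_seconds`."""

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, [terms]) in arrival order
        self.counts = Counter()

    def add(self, timestamp, query):
        """Record one search query and drop events older than the window."""
        terms = query.lower().split()
        self.events.append((timestamp, terms))
        self.counts.update(terms)
        self._expire(timestamp)

    def _expire(self, now):
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, old_terms = self.events.popleft()
            for term in old_terms:
                self.counts[term] -= 1
                if self.counts[term] == 0:
                    del self.counts[term]

    def top(self, n=10):
        """Most frequent recent terms, e.g. as input to a word cloud."""
        return self.counts.most_common(n)


counter = RecentTermCounter(window_seconds=60)
counter.add(0, "climate change")
counter.add(30, "Climate policy")
print(counter.top(1))
```

The list returned by `top` could be serialized and pushed to whichever charting layer renders the visualization.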
 
Previous presentations at Code4Lib:
* Talk: Code4Lib 2009 [http://code4lib.org/files/LibX2.0-Code4Lib-2009AsPresented.ppt LibX 2.0]
* Preconference: [http://wiki.code4lib.org/index.php/LibX_Preconference LibX 2.0, 2009]
* Preconference: Code4Lib 2010, On Widgets and Web Services
== Your Library, Anywhere: A Modern, Responsive Library Catalogue at University of Toronto Libraries ==
As the corpus of articles, books, and other resources searched by discovery systems continues to get bigger, searchers are more and more frequently confronted with unmanageably large numbers of results. How can we help users make sense of 10,000 hits and find the ones they actually want? Facets help, but making sense of a gigantic sidebar of facets is not an easy task for users, either.
During this talk, I will explain how we will soon be using Solr 4’s pivot queries and hierarchical visualizations (e.g., treemaps) from D3.js to let patrons view and manipulate search results. We will be doing this with our VuFind 2.0 catalog, but this technique will work with any system running Solr 4. I will also talk about early student reaction to our tests of these visualization features.
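
A rough sketch of how a pivot facet response might be reshaped for a hierarchical visualization follows. It assumes the nested `field`/`value`/`count`/`pivot` entries that Solr returns for `facet.pivot`; the Solr URL, the field names `format` and `language`, and the sample data are hypothetical, and the output shape (`name`, `children`, `value`) is one common input convention for D3 hierarchies rather than the speaker's actual code.

```python
from urllib.parse import urlencode


def pivot_query_url(solr_base, query, fields):
    """Build a select URL that asks Solr for a pivot facet over `fields`."""
    params = {
        "q": query,
        "rows": 0,  # only the facet counts are needed
        "wt": "json",
        "facet": "true",
        "facet.pivot": ",".join(fields),
    }
    return f"{solr_base}/select?{urlencode(params)}"


def pivot_to_tree(pivot_entries, name="results"):
    """Convert nested pivot entries into a name/children/value tree."""
    children = []
    for entry in pivot_entries:
        label = str(entry["value"])
        if entry.get("pivot"):
            # Inner nodes carry only children, so leaf values are not
            # double counted when the tree is summed.
            children.append(pivot_to_tree(entry["pivot"], name=label))
        else:
            children.append({"name": label, "value": entry["count"]})
    return {"name": name, "children": children}


# Invented sample of the list found under facet_counts -> facet_pivot.
SAMPLE_PIVOT = [
    {"field": "format", "value": "Book", "count": 7, "pivot": [
        {"field": "language", "value": "English", "count": 5},
        {"field": "language", "value": "French", "count": 2},
    ]},
    {"field": "format", "value": "Map", "count": 1},
]

print(pivot_query_url("http://localhost:8983/solr/catalog", "history",
                      ["format", "language"]))
print(pivot_to_tree(SAMPLE_PIVOT))
```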
 
== PeerLibrary – open source cloud based collaborative library ==
 
* [http://mitar.tnode.com/ Mitar Milutinovic], UC Berkeley, mitar.code4lib at tnode.com
* Not presented or attended code4lib before
 
[https://github.com/peerlibrary/peerlibrary PeerLibrary is a new open source project] and a cloud service providing collaborative reading, sharing and storing. Users can upload publications they want to read (currently in PDF format), read them in the browser in real-time with others, and highlight, annotate and organize their own or a collaborative library. PeerLibrary provides a search engine to search over all uploaded open access publications. Additionally, it aims to collaboratively aggregate an open layer of knowledge on top of these publications through public annotations and references that users will add to publications. In this way publications would not just be available to read, but accessible to the general public as well. Currently, it is aimed at the scientific community and scientific publications.
 
See [http://blog.peerlibrary.org/post/63458789185/screencast-previewing-the-peerlibrary-project screencast here]. [http://peerlibrary.org/ Subscribe to newsletter] to be a beta tester when we open.
 
It is still in development, and a beta launch is planned for the end of November.
 
== Who was where when, or finding biographical articles on Wikipedia by place and time ==
 
* [http://morton-owens.info Emily Morton-Owens], The Seattle Public Library (presenting on work from NYU)
* No previous c4l presentations
 
It's easy to answer the question "What important people were in Paris in 1939?" But what about Virginia in the 1750s or Scandinavia in the 14th century? I created a tool that allows you to search for biographies in a generally applicable way, using a map interface. I would like to present updates to my thesis project, which combines a crawler written in Java that extracts information from Wikipedia articles, with a MongoDB data store and a frontend in Python.
 
The input to the project is freetext of entire articles in Wikipedia; this is important to allow us to pick up Benjamin Franklin not just in the single most obvious place of Philadelphia but also in London, Paris, Boston, etc. I can talk about my experiments disambiguating place names (approaches pioneered on newspaper articles were actually unhelpful on this type of text) and setting up a processing queue that does not become mired in the biographies of every human who ever played soccer. I also want to mitigate some of the implementation choices I made due to my academic deadline and improve the accuracy/usability.
 
What I hope to show is that I was able to develop a novel and useful reference tool automatically, using fairly simple heuristics that are a far cry from the hand-cataloging familiar to many librarians.
 
You can try out [http://linserv1.cims.nyu.edu:48866/ the original version] (this server is inconveniently set to be updated/rebooted on 11/8--may be temporarily unavailable)
 
== Good!, DRY, and Dynamic: Content Strategy for Libraries (Especially the Big Ones) ==
 
*Michael Schofield, Nova Southeastern University Libraries, mschofield@nova.edu
*No previous code4lib presentations.
 
The responsibilities of the #libweb are exploding [it’s a good thing] and it is no longer uncommon for libraries to manage or even home-grow multiple applications and sites. Often it is at this point where the web people begin to suffer the absence of a content strategy when, say, business hours need to be updated sitewide a half-dozen times.
 
We were already feeling this crunch when we decided to further complicate the Nova Southeastern University Libraries by splitting the main library website into two. The Alvin Sherman Library, Research, and Information Technology Center is a unique joint-use facility that serves not only the academic community but the public of Broward County - and marketing a hyperblend of content through one portal just wasn't cutting it. With a web team of two, we knew that managing all this rehashed, disparate content was totally unsustainable.
 
I want to share in this talk how I went about making our library content DRY (“don’t repeat yourself”): input content in one place (blurbs, policies, featured events, featured databases, book reviews, business hours, and so on) and syndicate it everywhere, sometimes even dynamically targeting that content to specific audiences or contexts. It is a presentation that is a little about workflow, a little more about browser and context detection, a tangent about content-modeling the CMS, and a lot about APIs, syndication, and performance.
 
== No code, no root, no problem? Adventures in SaaS and library discovery ==
 
*[mailto:erwhite@vcu.edu Erin White, VCU]
*No previous C4L presentations
 
In 2012 VCU was an eager early adopter of Ex Libris' cloud service Alma as an ILS, ERM, link resolver, and single-stop, de-silo'd public-facing discovery tool. This has been a disruptive change that has shifted our systems staff's day-to-day work, relationships with others in the library, and relationships with vendors.
 
I'll share some of our experiences and takeaways from implementing and maintaining a cloud service:
* Seeking disruption and finding it
* Changing expectations of service and the reality of unplanned downtime
* Communication and problem resolution with non-IT library staff
* Working with a vendor that uses agile development methodology
* Benefits and pitfalls of creating customizations and code workarounds
* Changes in library IT/coders' roles with SaaS
 
...as well as thoughts on the philosophy of library discovery vs real-life experiences in moving to a single-search model.
 
== Building for others (and ourselves): the Avalon Media System ==
* [mailto:michael.klein@northwestern.edu Michael B Klein], Senior Software Developer, Northwestern University
** [http://code4lib.org/conference/2010/metz_klein Public Datasets in the Cloud] (code4lib 2010)
** [http://code4lib.org/conference/2013/klein-rogers The Avalon Media System: A Next Generation Hydra Head For Audio and Video Delivery] (code4lib 2013)
* [mailto:j-rudder@northwestern.edu Julie Rudder], Digital Initiatives Project Manager, Northwestern University
** no previous code4lib presentations
 
[http://www.avalonmediasystem.org/ Avalon Media System] is a collaborative effort between development teams at Northwestern and Indiana Universities. Our goal is to produce an open source media management platform that works well for us, but is also widely adopted and contributed to by other institutions. We believe that building a strong user and contributor community is vital to the success and longevity of the project, and have developed the system with this goal in mind. We will share lessons learned, pains and successes we’ve had releasing two versions of the application since last year.
 
Our presentation will cover our experiences:
* providing flexible, admin-friendly distribution and installation options
* building with abstraction, customization and local integrations in mind
* prioritizing features (user stories)
* attracting code contributions from other institutions
* gathering community feedback
* creating a product rather than a bag of parts
 
== How to check your data to provide a great data product? Data quality as a key product feature at Europeana ==
 
*[mailto:Peter.Kiraly@kb.nl Péter Király] portal backend developer, Europeana
*No previous C4L presentations
 
[http://Europeana.eu/ Europeana.eu] - Europe's digital library, archive and museum - aggregates more than 30 million metadata records from more than 2200 institutions. The records come from libraries, archives, museums and every other kind of cultural institution, from very different systems and metadata schemas, and are typically transformed several times until they are ingested into the Europeana data repository. Europeana builds a consolidated database from these records, creating reliable and consistent services for end-users (a search portal, search widget, mobile apps, thematic sites etc.) and an API, which supports our strategic goal of data for reuse in education, creative industries, and the cultural sector. A reliable "data product" is thus at the core of our own software products, as well as those of our API partners.
 
Much effort is needed to smooth out local differences in the metadata curation practice of our data providers. We need a solid framework to measure the consistency of our data and provide feedback to decision-makers inside and outside the organisation. We can also use this metrics framework to ask content providers to improve their own metadata. Of course, a data-quality-driven approach requires that we also improve the data transformation steps of the Europeana ingestion process itself. Data quality issues heavily define what new features we are able to create in our user interfaces and API, and might actually affect the design and implementation of our underlying data structure, the Europeana Data Model.
 
In the presentation I briefly describe the Europeana metadata ingestion process, show the data quality metrics, the measuring techniques (using the Europeana API, Solr and MongoDB queries), some typical problems (both trivial and difficult ones), and finally the feedback mechanism we propose to deploy.
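
As a hedged illustration of what a simple data quality metric can look like, the sketch below computes field completeness, the share of records in which a field carries a usable value. The function names, field names, and sample records are invented; the metrics actually used at Europeana may differ.

```python
def _has_value(value):
    """Treat None, blank strings, and empty collections as missing."""
    if value is None:
        return False
    if isinstance(value, str):
        return bool(value.strip())
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) > 0
    return True


def field_completeness(records, fields):
    """Return, per field, the fraction of records with a usable value."""
    total = len(records)
    scores = {}
    for field in fields:
        filled = sum(1 for record in records if _has_value(record.get(field)))
        scores[field] = filled / total if total else 0.0
    return scores


# Invented sample records, for illustration only.
SAMPLE_RECORDS = [
    {"title": "City map", "creator": "Unknown", "rights": "public domain"},
    {"title": "Letter", "creator": "", "rights": None},
    {"title": "Portrait", "creator": ["A. Example"]},
    {"title": "Poster", "creator": []},
]

print(field_completeness(SAMPLE_RECORDS, ["title", "creator", "rights"]))
```

Scores like these can be grouped by data provider to show where feedback would be most useful.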
 
Keywords: Europeana, data quality, EDM, API, Apache Solr, MongoDB, #opendata, #openglam
 
== Teach your Fedora to Fly: scaling out a digital repository ==
 
*[mailto:acoburn@amherst.edu Aaron Coburn], Software Developer, Amherst College
*No previous C4L presentations
 
Fedora is a great repository system for managing large collections of digital objects, but what happens when a popular food magazine begins directing a large number of readers to a manuscript showing Emily Dickinson’s own recipe for doughnuts? While Fedora excels in its support of XML-based metadata, it doesn’t always perform well under a high volume of traffic. Nor is it especially tolerant of network or hardware failures.
 
This presentation will show how we are making heavy use of a Fedora repository while at the same time insulating it almost entirely from any web traffic. Starting with a distributed web front-end built with Node.js, and caching most of the user-accessible content from Fedora in an elastic, fault-tolerant Riak (NoSQL) cluster, we have eliminated nearly all single points of failure in the system. It also means that our production system is spread across twelve separate servers, where asynchrony and Map-Reduce are king. And aside from being blazing fast, it is also entirely Hydra-compliant.
 
Furthermore, we will attempt to answer the question: if Fedora crashes and the visitors to your site don’t notice, did it really fail?
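
The idea of serving visitors from a cache so that a repository outage goes unnoticed can be sketched as below. This is a simplified, single-process illustration under stated assumptions: a plain dictionary stands in for the distributed cache, `make_repository` simulates a repository that can go down, and none of the names come from the system described in the proposal.

```python
class CachedRepository:
    """Serve objects from a cache, touching the repository only on a miss."""

    def __init__(self, fetch_from_repository, cache=None):
        self._fetch = fetch_from_repository
        # A dict stands in for a distributed key-value store here.
        self._cache = {} if cache is None else cache
        self.repository_calls = 0

    def get(self, object_id):
        if object_id in self._cache:
            return self._cache[object_id]
        self.repository_calls += 1
        content = self._fetch(object_id)  # may raise if the repository is down
        self._cache[object_id] = content
        return content


def make_repository(state):
    """Hypothetical repository client that fails while state['up'] is False."""
    def fetch(object_id):
        if not state["up"]:
            raise ConnectionError("repository unavailable")
        return f"<object id='{object_id}'/>"
    return fetch


state = {"up": True}
front = CachedRepository(make_repository(state))
front.get("doc:1")   # first request reaches the repository
state["up"] = False  # the repository goes down
print(front.get("doc:1"))  # still served, from the cache
```

Only requests for objects that were never cached reach the failed repository; a production design would also need cache invalidation when objects change.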
 
 
== Using Open Source Software and Freeware to Preserve and Deliver Digital Videos ==
* [mailto:wfang@kinoy.rutgers.edu Wei Fang], Head of Digital Services, Rutgers University Law Library
* Jiebei Luo, Digital Projects Initiative Intern, Rutgers University
*No previous C4L presentations
 
The Rutgers University Law Library is the official digital repository of the New Jersey Supreme Court oral arguments since 2002. This large video collection contains approximately 3,000 videos with a total of 400 GB or 6,000 viewing hours. With the expansion of this collection, the existing database and the static website could not efficiently support the library’s daily operations and meet its patrons’ search needs.
By using open source software and freeware such as Ubuntu, FFmpeg, Solr, and Drupal, the library has developed a complete solution that re-encodes videos, embeds subtitles, integrates the Solr search engine with a content management system to support full-text subtitle search, automatically updates video metadata records in the library catalog system, and ultimately provides a plug-in-free, HTML5-based web interface where patrons can view the videos online.
The aspects below will be presented in detail at the conference:
* Video codecs comparison
* Server-end batch video encoding/re-encoding
* HTML 5 video tag and embedding subtitles
* Incorporating search engine Solr and content management tool Drupal with the database to retrieve videos by full-text search especially in subtitle files
* Incorporating video metadata with the library catalog system
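
To make the subtitle search idea concrete, here is a minimal sketch that parses subtitle cues and finds the ones containing a phrase. It assumes the subtitles are WebVTT files, the format used by the HTML5 track element; the proposal does not say which subtitle format is used, and the sample cues and function names are invented. A production setup would index the cue text in a search engine rather than scan it in memory.

```python
import re


def parse_cues(vtt_text):
    """Return (start_time, text) pairs for each cue in a WebVTT document."""
    cues = []
    for block in re.split(r"\n\s*\n", vtt_text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            if "-->" in line:
                start = line.split("-->")[0].strip()
                text = " ".join(lines[i + 1:]).strip()
                cues.append((start, text))
                break  # blocks without a timing line (e.g. the header) are skipped
    return cues


def search_cues(cues, phrase):
    """Case-insensitive phrase search over cue text."""
    needle = phrase.lower()
    return [(start, text) for start, text in cues if needle in text.lower()]


# Invented sample subtitle file.
SAMPLE_VTT = """WEBVTT

00:00:01.000 --> 00:00:04.000
May it please the Court.

00:00:05.500 --> 00:00:09.000
The statute at issue was amended
in the following session.
"""

for start, text in search_cues(parse_cues(SAMPLE_VTT), "amended in"):
    print(start, text)
```

Returning the cue start time alongside the text is what would let a player jump to the matching moment in the video.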
 
== Shared Vision, Shared Resources: the Curate Institutional Repository ==
* Dan Brubaker Horst, University of Notre Dame
** [http://code4lib.org/conference/2011/JohnsonHorst A Community-Based Approach to Developing a Digital Exhibit at Notre Dame Using the Hydra Framework]
* Julie Rudder, Northwestern University
** no previous presentations
 
Curate is being collaboratively developed by several institutions in the Hydra community who share the need and vision for a Fedora-backed Institutional Repository. The first release of Curate was a collaboration between Notre Dame and Northwestern University, along with Digital Curation Experts (DCE) - a vendor hired jointly by our two institutions. Powered by the Hydra engine Sufia, the team worked quickly to release the first version of Curate in October 2013 which provides a basic self-deposit system that has support for various content types, collection building, DOI minting, and user profile creation. From the very beginning we have built Curate to be easy to theme and extend in order to ease the process of installation and use by other institutions.
 
In December 2013, additional partners will join the project, including Indiana University, the University of Cincinnati and the University of Virginia. Each institution contributes resources to the project in order to further our common goal to create a product that fits our needs and has a sustainable future. Together we will tackle additional content types (like complex data, software, media), administrative collections and more.
 
Our presentation will include:
* a brief demonstration of Curate and technical overview
* why and how we work together
* why build Curate
* the future of the project
 
== Solr, Cloud and Blacklight ==
* David Jiao, Library Information Systems, Indiana University at Bloomington, djiao@indiana.edu
** No previous code4lib presentations
 
SolrCloud refers to the distributed capabilities in Solr4. It is designed to offer a highly available, fault tolerant environment by organizing data into multiple pieces that can be hosted on multiple machines with replicas, and providing a centralized cluster configuration and management.
 
At Indiana University, we are upgrading the Solr backend of our recently released Blacklight-based OPAC system from Solr 1.4 to Solr4, and we are also working to build a private cloud of Solr4 servers. In this talk, I will present certain features of SolrCloud, including distributed requests, fault tolerance, near real time indexing/searching, and configuration management with Zookeeper, and our experiences using these features to provide better performance and architecture for our OPAC system, which serves over 7 million bibliographic records to over 100 thousand students and faculty members. I will also discuss some practical lessons learned from our SolrCloud setup/upgrade and the integration of the new SolrCloud with our customized Blacklight system.
 
== Leveraging XSD's for Reflective, Live Dataset Support in Institutional Repositories ==
* [mailto:msulliva@ufl.edu Mark Sullivan], Library Information Technology, University of Florida
** No previous code4lib presentations
 
The University of Florida Libraries are currently adding support for active datasets into our METS-based institutional repository software. This ongoing project enables the library to be a partner in current, or long-running, data-driven projects around the university by providing tangible short-term and long-term benefits to the projects. The system assists project teams by storing and providing access to their data, while supporting online filtering and sorting of the data, custom queries, and adding and editing of the data by authorized users. We are also exploring simple data visualizations to allow users to perform basic graphical and geographic queries. Several different schemas were explored including DDI and EML, but ultimately the streamlined approach of using XSD's with some custom attributes was chosen, with all other data residing in the METS file portions. Currently the system is being developed using XSD's describing XML datasets, but this model should easily scale to support SQL datasets or large datasets supported by Hadoop or iRODS.
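
The reflective use of an XSD can be sketched as follows: read the column names and types from the schema, then use the declared type to sort string data correctly. This is an illustration only. The sample schema, the element names, and the helper names are invented, only three XSD types are handled, and the custom attributes mentioned above are not modeled.

```python
import xml.etree.ElementTree as ET
from datetime import date

XS = "{http://www.w3.org/2001/XMLSchema}"

# Invented schema describing one row of a small dataset.
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="observation">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="site" type="xs:string"/>
        <xs:element name="observed_on" type="xs:date"/>
        <xs:element name="count" type="xs:integer"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

# Converters for the few XSD types this sketch understands.
CASTS = {"xs:string": str, "xs:integer": int, "xs:date": date.fromisoformat}


def columns_from_xsd(xsd_text, row_element):
    """Return (name, type) pairs for the child elements of `row_element`."""
    root = ET.fromstring(xsd_text)
    for element in root.findall(f"{XS}element"):
        if element.get("name") == row_element:
            return [(child.get("name"), child.get("type"))
                    for child in element.iter(f"{XS}element")
                    if child is not element]
    raise KeyError(row_element)


def sort_rows(rows, columns, by):
    """Sort rows of string values using the type declared in the schema."""
    cast = CASTS[dict(columns)[by]]
    return sorted(rows, key=lambda row: cast(row[by]))


columns = columns_from_xsd(SAMPLE_XSD, "observation")
rows = [
    {"site": "A", "observed_on": "2013-10-02", "count": "10"},
    {"site": "B", "observed_on": "2013-09-30", "count": "9"},
]
print(columns)
print(sort_rows(rows, columns, "count"))
```

Because the type comes from the schema, the count column sorts numerically (9 before 10) even though the stored values are strings.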
 
This work is being integrated in the open source [http://sobek.ufl.edu SobekCM Digital Content Management System] which is built on a pair-tree structure of METS resources with [http://ufdc.ufl.edu/design/webcontent/sobekcm/SobekCM_Resource_Object.pdf rich metadata support] including DC, MODS, MARC, VRACore, DarwinCore, IEEE-LOM, GML/KML, schema.org microdata, and many other standard schemas. The system has emphasized online, distributed creation and maintenance of resources including geo-placement and geographic searching of resources, building structure maps (table of contents) visually online, and a broad suite of curator tools.
 
This work is presented as a model which could be implemented in other systems as well. We will demonstrate current support and discuss our upcoming roadmap to provide complete support.
 
== Dead-simple Video Content Management: Let Your Filesystem Do The Work ==
 
* Andreas Orphanides, NCSU Libraries (akorphan (at) ncsu.edu)
** (never led or soloed a C4L presentation)
 
Content management is hard. To keep all the moving parts in order, and to maintain a layer of separation between the system and content creators (who are frequently not technical experts), we typically turn to content management systems like Drupal. But even Drupal and its kin require significant overhead and present a not inconsiderable learning curve for nontechnical users.
 
In some contexts it's possible -- and desirable -- to manage content in a more streamlined, lightweight way, with a minimum of fuss and technical infrastructure. In this presentation I'll share a simple MVC-like architecture for managing video content for playback on the web, which uses a combination of Apache's mod_rewrite module and your server's filesystem structure to provide an automated approach to video content management that's easy to implement and provides a low barrier to content updates: friendly to content creators and technology implementors alike. Even better, the basic method is HTML5-friendly, and can be integrated into your favorite content management system if you've got permissions for creating templates.
 
In the presentation I'll go into detail about the system structure and logic required to implement this approach. I'll detail the benefits and limitations of the system, as well as the challenges I encountered in developing its implementation. Audience members should come away with sufficient background to implement a similar system on their own servers. Implementation documentation and genericized code will also be shared, as available.
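
The general idea of letting the directory layout act as the content store can be sketched in Python, purely as a stand-in for illustration. The proposal's approach relies on Apache's mod_rewrite, which is not shown here; the folder convention (one folder per video, an optional `title.txt`), the `/videos/` URL prefix, and all function names are assumptions.

```python
from html import escape
from pathlib import Path

# File extensions this sketch treats as playable sources.
SOURCE_TYPES = {".mp4": "video/mp4", ".webm": "video/webm", ".ogv": "video/ogg"}


def describe_video(content_root, slug):
    """Look up a video by folder name; return None if it does not exist."""
    if not slug or "/" in slug or "\\" in slug or slug in {".", ".."}:
        return None  # refuse anything that could escape the content root
    folder = Path(content_root) / slug
    if not folder.is_dir():
        return None
    sources = [(f.name, SOURCE_TYPES[f.suffix.lower()])
               for f in sorted(folder.iterdir())
               if f.suffix.lower() in SOURCE_TYPES]
    title_file = folder / "title.txt"
    if title_file.is_file():
        title = title_file.read_text(encoding="utf-8").strip()
    else:
        title = slug
    return {"slug": slug, "title": title, "sources": sources}


def render_video_tag(info):
    """Render an HTML5 video element listing every available encoding."""
    lines = ["<video controls>"]
    for name, mime in info["sources"]:
        src = escape(f"/videos/{info['slug']}/{name}", quote=True)
        lines.append(f'  <source src="{src}" type="{mime}">')
    lines.append("</video>")
    return "\n".join(lines)
```

Under this convention a content creator publishes a video by dropping encoded files into a new folder, with no database or admin interface involved.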
 
== Managing Discovery ==
 
* Andrew Pasterfield, Senior Programmer/Systems Analyst, University of Calgary Library, ampaster@ucalgary.ca
**No previous code4lib presentations <br><br>
In fall 2012 the University of Calgary Library launched a new home page that incorporated a Summon powered
Single Search Box with customized “bento box” results display. Search at the U of C now combines a range of
metadata sources for discovery and customized mapping of a database recommender and LibGuide into a unified
display. Further customizations include a method for logging clicks that relies on neither Google Analytics nor a proxy.<br><br>
 
This presentation will discuss the technical details of bringing the various systems together into one display interface to increase discovery at the U of C Library.
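
A "bento box" display generally means querying several sources for the same search and showing the top few hits from each in its own box. The sketch below illustrates that pattern only; the source names and search callables are hypothetical placeholders, not the University of Calgary's implementation or the Summon API.

```python
from concurrent.futures import ThreadPoolExecutor


def bento_search(query, sources, per_box=3):
    """Query every source in parallel and keep the top hits for each box."""
    boxes = {}
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {name: pool.submit(search, query)
                   for name, search in sources.items()}
        for name, future in futures.items():
            try:
                hits = future.result()
            except Exception:
                hits = []  # a failing source yields an empty box, not an error page
            boxes[name] = hits[:per_box]
    return boxes


def search_articles(query):
    # Hypothetical stand-in for a call to a discovery service.
    return [f"{query} article {i}" for i in range(1, 6)]


def search_guides(query):
    # Hypothetical source that is currently failing.
    raise TimeoutError("guide service did not respond")


print(bento_search("copyright", {"articles": search_articles,
                                 "guides": search_guides}))
```

Running the searches in parallel keeps the page only as slow as the slowest source, and isolating failures keeps one broken service from blanking the whole results page.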
 
http://library.ucalgary.ca
 
 
== Sorting it out: a piece of the User Centered Design Process ==
 
* Cindy Beggs, [http://www.akendi.com/aboutus/management/ Akendi], cindy@akendi.com
 
This talk is about how to apply a user centered design methodology to the process of creating an information architecture. Participants learn the fundamentals of UCD and how card sorting and reverse card sorting enable us to isolate the content we present on screen from the layouts and visuals of those screens. We talk about ways to identify who will be using the information architecture you are creating and why we need to know how it will be used.
What will attendees take away from your talk?
The criticality of involving “real” end users in the process of creating an information architecture. The basics of following a user-centered-design process in the creation of best in class, content-rich, digital products.
 
Cindy Beggs has been working in the “information industry” for over 25 years. A librarian by profession, she has spent decades helping users figure out how to find their way through large bodies of content. Her insights into how people seek information, her empathy for those who find it a challenge and her practical experience helping organizations figure out how to best structure their content contribute to her success as an information architect with both clients and trainees. (http://www.akendi.com/aboutus/management/)
 
 
==Implementation of ArchivesSpace in University of Richmond==
 
*Birong Ho, bho@richmond.edu
 
The University of Richmond implemented ArchivesSpace, its archival collection management system, in fall 2013. Because the university is a charter member and its Head of Special Collections serves as a Board member, implementing this open source software became a priority.
 
The talk will address several aspects of the implementation: collections and repositories, the storage layer (including data formats), system resource requirements, technical architecture, customization, scaling, and integration with other systems in the library.
 
The talk will focus in particular on customization, scaling, and integration with other campus systems such as Archeon and Exist, which became areas of concern.
 
==Easy Wins for Modern Web Technologies in Libraries==
 
*[mailto:trey.terrell@oregonstate.edu Trey Terrell], Analyst Programmer, Oregon State University
** No previous Code4Lib presentations
 
Oregon State University is currently implementing an updated version of its room reservation system. In its development we've come across and implemented a variety of "easy wins" to make it more responsive, easier to maintain, less expensive to run, and just cooler to experience. While our particular system was in Ruby on Rails, this talk will address general methods and example utilities which can be used no matter your stack.
 
I'll be talking about things like cache management, reverse proxies, publish/subscribe servers, WebSockets, responsive design, asynchronous processing, and keeping complicated stacks up and running with minimal effort.
 
==Implementing Islandora at a Small Institution==
 
*Megan Kudzia, Albion College Library
*Eddie Bachle, Albion College IT
**No previous Code4Lib presentations
 
Albion College (and particularly the Library/Archives and Special Collections) has a variety of needs which could be met by an open-source Institutional Repository system. Several months and lots of conversations later, we’re continuing to troubleshoot our way through Islandora. We’d like to talk about what has worked for us, where our frustrations have been, whether it’s even possible to install and develop a system like this at a small institution, and where the process has stalled.
 
As of right now, we do have a semi-working installation. We’re not sure when it will be ready for our end users, but we'll talk about our development process and evaluate our progress.
''Contributions also by Nicole Smeltekop, Albion College Archives & Special Collections''
 
 
== PhantomJS+Selenium: Easy Automated Testing of AJAX-y UIs ==
 
* Martin Haye, California Digital Library, martin.haye@ucop.edu
** Previous Code4Lib Presentation: [http://code4lib.org/conference/2012/collett Beyond code: Versioning data with Git and Mercurial] at Code4Lib 2012 (Martin co-presenting with Charlie Collett)
* Mark Redar, California Digital Library, mark.redar@ucop.edu
 
Web user interfaces are demanding ever-more dynamism and polish, combining HTML5, AJAX, lots of CSS and jQuery (or ilk) to create autocomplete drop-downs, intelligent buttons, stylish alert dialogs, etc. How can you make automated tests for these highly complex and interactive UIs?
 
Part of the answer is PhantomJS. It’s a modern WebKit browser that’s “headless” (meaning it has no display) and can be driven from command-line Selenium unit tests. PhantomJS is dead simple to install, and its blazing speed and server-friendliness make continuous integration testing easy. You can write UI unit tests in {language-of-your-choice} and run them not just in PhantomJS but in Firefox and Chrome, plus a zillion browser/OS combinations at places like SauceLabs, TestingBot and BrowserStack.
 
In this double-team live code talk, we’ll explain all that while we demonstrate the following in real time:
 
* Start with nothing.
* Install Selenium bindings for Ruby and Python.
* In each language write a small test of an AJAX-y UI.
* Run the tests in Firefox, and fix bugs (in the test or UI) as needed.
* Install PhantomJS.
* Show the same tests running headless as part of a server-friendly test suite.
* (Wifi permitting) Show the same tests running on a couple different browser/OS combinations on the server cloud at SauceLabs – talking through a tunnel to the local firewalled application.
 
==New Technologies, Collaboration, & Entrepreneurship in Libraries: Harnessing Their Power to Help Your Library==
 
* Stephanie Walker – swalker@brooklyn.cuny.edu
* Howard Spivak – howards@brooklyn.cuny.edu
* Alex - Alex@brooklyn.cuny.edu
 
Academic libraries are caught in budget squeezes and often struggle to find ways to communicate value to senior administration and others. At Brooklyn College Library, we have taken an unusual, possibly unique, approach to these issues. Our technology staff have long worked directly with librarians to develop products that meet library, faculty, and student needs, and we have shared many of our products with colleagues, including an award-winning website, e-resource, and content management system we call 4MyLibrary, which we shared for free with 8 CUNY colleges, and also an easy-to-use book scanner, which has proven overwhelmingly popular with students, faculty, other librarians, and numerous campus offices. Recently, motivated by budget cuts, we decided that what worked for us might interest other libraries, and working with our Office of Technology Commercialization, we started selling 2 products: our book scanners (at half the price of commercial alternatives), and a hosting service, whereby we could host and support 4MyLibrary for libraries with minimal technology staff. Both succeeded, and yielded major benefits: a steady revenue stream and the admiration and serious goodwill of our senior administration and others. However, this presentation is neither a basic how-to, nor an advertisement. With this presentation, we hope to spur a conversation for broader collaboration, especially regarding new technologies, among libraries. We all have some level of technical expertise, most of us are struggling with rising prices and tight budgets, and many of us are unhappy with various technology products we use, from scanners to our ILS. We believe – and can demonstrate – that with collaboration, we can solve many of our problems, and provide better services to boot.
 
== Identifiers, Data, and Norse Gods ==
 
* Ryan Scherle, [http://datadryad.org Dryad Digital Repository], ryan@datadryad.org
** previous Code4Lib talk [http://ryan.scherle.org/papers/2010-2-code4lib-HIVE.ppt HIVE: A New Tool for Working With Vocabularies], at Code4Lib 2011.
 
ORCID and DataCite provide stable identifiers for researchers and data, respectively. Each system does a fine job of providing value to its users. But wouldn't it be great if they could link their systems to create something much more powerful? Perhaps even as powerful as a god?
 
Enter [http://odin-project.eu/ ODIN], The ORCID and DataCite Interoperability Network. ODIN is a two-year project to unleash the power of persistent identifiers for researchers and the research they create. This talk will present recent work from the ODIN project, including several tools that can unleash the godlike power of identifiers at your institution. Current tools include:
* Metadata generator tool: allows repository staff to create DataCite metadata with embedded ORCIDs.
* Claiming tool: assists researchers in claiming their work within the ORCID system.
* ORCID-feed: includes a list of ORCID works on any web page.
* ODIN's HAMR: assists in populating a DSpace repository with ORCIDs. Based on work from a Code4Lib hackathon!
 
== Armed Bandits in the Digital Library ==
 
* Roman Chyla, [http://labs.adsabs.harvard.edu/adsabs/ Astrophysics Data System], rchyla@cfa.harvard.edu
** Previous Code4Lib: [http://code4lib.org/conference/2013/chyla Citation search in SOLR and second-order operators]
 
Many of us are using the excellent Lucene library (or SOLR appliance) to provide search functionality. These systems contain a number of features to adjust the relevancy ranking of hits, but we may not know how to use them. In this presentation, I'll present the available options: what the default ranking is (the vector space model), what the alternatives are (e.g. BM25), and what other options we have to tweak and adjust the ranking of hits (e.g. boost factors, functions). But even if we know how to deploy these adjustments and tweaks, we are still left in the dark. We do not know whether the change we've just rolled out had a statistically significant effect or was just a waste of time and resources. A/B testing is one option, but there may be a much better one: the so-called "Multi-Armed Bandits Approach". In this talk I'd like to show how we are experimenting with this strategy to adjust the [http://labs.adsabs.harvard.edu/adsabs/ ADS search engine].
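
As a rough illustration of the multi-armed bandit idea (not the ADS implementation, which this proposal does not describe), the sketch below uses a simple epsilon-greedy strategy to choose between ranking variants. The <code>EpsilonGreedy</code> class, the <code>simulate</code> helper, and the click rates passed to it are all assumptions made up for this example.

```python
import random


class EpsilonGreedy:
    """Epsilon-greedy bandit over a fixed set of ranking variants (hypothetical)."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.pulls = {arm: 0 for arm in self.arms}
        self.mean_reward = {arm: 0.0 for arm in self.arms}

    def choose(self):
        # Explore with probability epsilon, otherwise exploit the best arm so far.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda arm: self.mean_reward[arm])

    def update(self, arm, reward):
        # Incremental running mean of observed rewards (e.g. 1.0 for a click).
        self.pulls[arm] += 1
        self.mean_reward[arm] += (reward - self.mean_reward[arm]) / self.pulls[arm]


def simulate(true_rates, rounds=5000, seed=42):
    """Simulate users clicking with made-up per-variant click rates."""
    bandit = EpsilonGreedy(true_rates, epsilon=0.1, seed=seed)
    clicks = random.Random(seed + 1)
    for _ in range(rounds):
        arm = bandit.choose()
        reward = 1.0 if clicks.random() < true_rates[arm] else 0.0
        bandit.update(arm, reward)
    return bandit
```

Unlike a fixed A/B split, a strategy of this kind shifts traffic toward the better-performing variant while the experiment is still running. Epsilon-greedy is only one of several bandit strategies and is used here because it is the shortest to write down.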
 
== Building Worker Queues with AWS and Resque ==
 
* Eric Rochester [http://scholarslab.org Scholars' Lab], erochest@virginia.edu
* Scott Turnbull [http://aptrust.org/ Academic Preservation Trust], scott.turnbull@aptrust.org
 
A common task in larger systems is to be able to process large input files automatically. Often users can drop those files into a shared directory on AWS or on NFS or another shared drive. Those files need to be processed and potentially integrated into a system. This task has come up recently in the University of Virginia libraries in allowing users to add GIS data to the system and in setting up a system for the Academic Preservation Trust (http://aptrust.org/) that ingests files and resources into the preservation system.
 
This system is built by loosely coupling a number of different technologies. This allows us to easily interoperate and communicate between different systems and programming environments. Because the interfaces are well defined, it’s also fairly simple to switch out technologies as the requirements of the system change.
 
The process is fairly simple:
 
First, a Ruby daemon monitors an AWS S3 bucket that others can upload new files into. This daemon creates a Resque status task, adds a marker for the task in a database, and continues monitoring.
 
Second, Resque mediates incoming job requests and routes them to the appropriate workers which may be in Java, Go, or Ruby. The diversity of technologies that Resque can manage allows great latitude to leverage the appropriate tool for a specific job. While processing, it updates the status for that job and coordinates processing with other jobs.
 
Finally, a page that is integrated into a larger Rails app provides a novice-user-friendly view of the status of the workers and allows basic tasks such as restarting the job.
 
This architecture allows us to swap in the technology that best fits each part of the process, and it makes it easier to maintain the system. We use this to integrate and coordinate between tasks handled in Java, Ruby, and Go, and it provides an effective way to interoperate with these programming languages and the respective strengths that they bring to this system.
 
 
== Sustaining your Open Source project through training ==
 
* Bess Sadler (Stanford University Libraries) and Mark Bussey (Data Curation Experts) will discuss their experiences developing and delivering training for Project Hydra.
 
Topics covered:
* Working practices for developing training materials
* Sharing the work when there are no dedicated resources
* Inviting community (and student) input to create higher quality content
* Strategies to keep training docs up to date
* Strategies to make training materials available to the widest-possible audience
* Using surveys (Survey Monkey) to assess the effectiveness of your training program
 
 
==Piwik: Open source web analytics==
* Kirk Hess, University of Illinois at Urbana-Champaign (kirkhess@illinois.edu)
** (Code4Lib 2012: [http://code4lib.org/conference/2012/hess Discovering Digital Library User Behavior with Google Analytics])
 
While Google Analytics is synonymous with Web Analytics, fortunately today we have many other good options. One of them is Piwik ([http://piwik.org piwik.org]), a simple-to-install, open-source PHP/MySQL application with a tracking script that will sit alongside Google Analytics, tracking the usual clicks, events and variables. In this presentation, I'd like to cover the usual analytics topics and also cover what makes Piwik powerful, such as importing and visualizing web logs from any system to incorporate both past and future data, easily tracking downloads, and the ability to write your own reports or dashboard. The visitor log data is stored securely on your own server, so you have control over who looks at the data and how much or how little to keep. With an active and helpful developer community, Piwik has the potential for analytics that make sense for libraries, not e-commerce.
 
 
== Next Generation Catalogue - RDF as a Basis for New Services ==
* Anne-Lena Westrum – digitalutvikling@gmail.com
* Benjamin Rokseth
* Asgeir Rekkavik
* Petter Goksøyr Åsen
 
Oslo Public Library has converted its entire MARC catalogue to RDF via the self-made conversion tool MARC2RDF.
[http://digital.deichman.no/data.deichman.no/ data.deichman.no], the enriched RDF version of the library catalogue including its authority files, forms the basis for two different mashups: the Active shelf and the Book recommendations database. The RDF catalogue is linked with various content, and the dataset is updated daily to account for additions, deletions and changes made in the MARC catalogue.
 
[http://vimeo.com/68687814 The Active shelf] is a physical touchscreen device that makes use of open source software, RFID technology, RDF data and external web service APIs to provide information about any library book a patron is curious to know more about.
 
The Book recommendations database stores book recommendations written by library staff from all over Norway and links them to the RDF-representation of the MARC-catalogue.
 
==Economics of Scale: Thinking about Metadata Quality and Completeness for Fun and Profit==
* William Hicks, University of North Texas (William.hicks@unt.edu)
 
The UNT Libraries Digital Collections constitute three internet gateways, The Portal to Texas History, UNT Digital Library, and the Gateway to Oklahoma History, making available to the public a wide range of materials, from photographs and newspapers to dissertations and recordings of music ensemble performances. The collections disseminate over 500,000 unique items that were used over 9 million times last year, and growth in both areas shows no sign of slowing.
As the size and scope of our collections has grown, so too has a pressing need to think clearly about the quality of our metadata, the completeness of our records, and the most efficient way of doing metadata entry. Not surprisingly, there have been a few things written on the subject, and so over the last few months we’ve started writing new code and getting the infrastructure of our metadata editing system to a place where we can begin to test these ideas on our ever-expanding dataset. What kinds of questions are we looking to answer, and what types of tools are we building? That’s what this talk will be all about, but here are a few ideas to ponder:
* What kinds of tools have we built, or can we employ to standardize data entry and aid the user in their input needs?
* How close does a metadata record come to a “completeness” standard? What does that even look like? What are the implications when we look at such a standard at scale?
* If we can identify what we think a “quality” metadata record “is”, historically speaking, how close do we get to that ideal?
* Does an item’s history matter? Can we quantify it and locate value in change through time?
* What are the economic costs of metadata entry? If we have enough quantifiable measures about the types of objects in our systems, and we can profile our data entry personnel, what can this say about optimizing staff time and return on investment?
* What sort of priorities are we setting for ourselves when we treat all items as equal, when clearly some types of materials get vastly more use by the public?
* Finally, what kinds of analysis tools might we develop to gauge our overall metadata “health,” to steer projects, or to ultimately improve our systems for our end users’ needs?
 
Most of our questions are still quite open ended, and honestly we are just getting started down this road. But as digital collections grow, and library budgets realign or shrink, it becomes increasingly important to back up our assertions and opinions with numbers, and find more efficient ways to work with the resources we have.
 
 
==A Different Kind of Search: Query Analysis of Map Search==
* Zoe Chao, University of New Mexico (zoechao@unm.edu)
* No previous Code4Lib presentation
 
Map searches are an increasingly important part of university and library websites. In 2012, The University of New Mexico (UNM) replaced its original PDF based campus maps (http://iss.unm.edu/PCD/campus-map.html) with an interactive map search based on the free Google Maps API. In addition to the basic map information such as streets and building outlines, we added search capabilities and categories for browsing (http://search.unm.edu/maps/). From November 2012 to September 2013, we logged about six thousand search instances on the campus map search. This data suggests that map searching presents a fundamentally different kind of search for users which results in a large number of failed searches that return empty or misleading result sets.
In this presentation we will briefly describe the development and current implementation of the UNM map search and our data collection of search queries. We will then discuss some surprising findings based on the data analysis. For instance, a large number of map queries include specific room numbers, which indicates that some users expect the search to cover buildings' floor plans. This result suggests that we need to truncate numbers from queries in order to return correct building locations. Finally, we will talk about the insight we gained from the data and our next steps toward data-driven interface design.
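
A minimal sketch of the number-truncation idea, assuming room numbers show up as a trailing number that may be preceded by "room" or "rm". The pattern and the <code>strip_room_number</code> function are illustrative guesses, not the rules used in the UNM map search.

```python
import re

# Hypothetical pattern: whitespace, an optional "room"/"rm" label, then a number
# (optionally with one trailing letter, e.g. "204B") at the end of the query.
ROOM_SUFFIX = re.compile(r"\s+(?:(?:room|rm)\.?\s*)?#?\d+[a-z]?\s*$", re.IGNORECASE)


def strip_room_number(query):
    """Return the query without a trailing room number.

    The pattern requires whitespace before the number, so a query that is
    nothing but a number (e.g. "204") is left untouched.
    """
    return ROOM_SUFFIX.sub("", query.strip())
```

One caveat with this approach: a building whose name legitimately ends in a number would also be shortened, so a real implementation would probably check the original query against known building names first.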
 
 
==More Like This: Approaches to Recommending Related Items using Subject Headings==
* Kevin Beswick, NCSU Libraries (kdbeswic@ncsu.edu)
** No previous code4lib presentations
With a significant portion of the collection at our new Hunt Library being housed in an automated storage and retrieval system, several of us at NCSU Libraries have begun looking at ways to replace and improve upon the classic shelf browsing experience in order to make it easier for patrons to browse related materials. Our goal is to mimic popular services like Amazon and Netflix, which utilize recommendation engines to make it easy for users to find items similar to a particular item of interest. While there have been previous efforts in libraries to recreate this experience using circulation or call number data, we are currently investigating algorithms that focus on use of subject headings. Use of subject headings as an alternative can be particularly helpful in the case of electronic materials that do not always have call numbers or circulation data. In this talk, I will share:
* Details of the proposed algorithms
* How these algorithms were quickly and easily implemented using Solr.
* Our evaluation process and its outcomes in terms of the effectiveness of the algorithms.
* How this has (or could) impact presentation of recommended items in our discovery layer.
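
To make the subject-heading idea concrete, here is a small in-memory sketch that ranks items by the overlap of their subject headings (Jaccard similarity). This is an assumption chosen for illustration: the proposal does not detail the NCSU algorithms, and the sketch does not use Solr. The <code>catalog</code> structure and function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of subject headings."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def more_like_this(item_id, catalog, limit=5):
    """Rank other items by subject-heading overlap with item_id.

    catalog maps item id -> iterable of subject headings (made-up structure).
    """
    target = catalog[item_id]
    scored = [
        (jaccard(target, headings), other_id)
        for other_id, headings in catalog.items()
        if other_id != item_id
    ]
    # Highest overlap first; drop items sharing no headings; ties broken by id.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [other_id for _, other_id in ranked[:limit]]
```

Because the score depends only on headings, it works equally for electronic items that have no call number or circulation history.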
 
== Questioning Authority: building a ruby gem to facilitate UI interactions with varied controlled vocabularies ==
* [[User:Mhbussey|Mark Bussey]], Data Curation Experts, mark@curationexperts.com
 
At a recent Hydra meeting, developers from five different institutions all realized that they had similar needs to support various types of UI fields based on a multitude of internal and external authorities and controlled vocabularies. Their goal was to develop a tool that let them meet these needs in ways that minimized the need for custom coding for each vocabulary. During an intense three-day working session, they minted the initial release of the [https://github.com/projecthydra/questioning_authority/blob/master/README.md questioning authority] gem.
 
The talk will cover both how cross-institutional development helped speed development and how the gem can be used for accessing both external vocabularies like LCSH and LCNA and for presenting internal vocabulary lists. Although the developing institutions are all Hydra implementers, the gem itself doesn't have any Hydra dependencies and can be used in any Rails or Blacklight based application.
 
 
== Building Hydra, a framework; a community ==
 
[mailto:justin@curationexperts.com Justin Coyne] Project Hydra contributor / Data Curation Experts
 
More than just a repository, the [http://projecthydra.org Hydra Project] is a community of cultural heritage institutions dedicated to pooling knowledge and resources. It is a completely open source project that has grown continuously for over 5 years. Within this vibrant community, a number of conventions and practices have emerged that we believe will benefit others attempting to cultivate support for their community oriented projects. The Hydra Project is now a mature initiative which is producing shareable, reusable and customizable components as well as complete repository solutions. In a time of tight budgets and growing demand for improved systems, we believe that "the Hydra way" is the exemplar case in the library community for how to work across institutions to deliver high quality services to our patrons. This talk will cover both the technical and human processes that have sustained Hydra's continued development and growth.
 
From [http://www.ohloh.net/p/projecthydra Ohloh.net]
In a Nutshell, Project Hydra...
* has had 8,364 commits made by 64 contributors representing 60,733 lines of code
* has a codebase with a long source history maintained by a very large development team with stable Y-O-Y commits
* took an estimated 15 years of effort (COCOMO model) starting with its first commit in October, 2009 ending with its most recent commit 7 days ago
 
== JQuery XML Editor ==
 
 
Presenter: Ben Pennell, UNC Chapel Hill Libraries (bbpennel@email.unc.edu)
no previous C4L presentations
 
The jquery.xmleditor is a portable jquery widget developed by the University of North Carolina at Chapel Hill Libraries for the purpose of simplifying the description workflow for existing objects in our digital repository. It does so by adding context and structure informed by an underlying XML schema. Even more generally, it creates and modifies XML documents in your web browser.
 
It can be found here, including a live demo:
[https://github.com/UNC-Libraries/jquery.xmleditor]
 
Features include:
* Graphical editor mode for displaying and modifying XML elements
* Text editor mode for directly modifying the underlying document (using the Cloud9 editor)
* Contextual, schema driven menus for adding new elements, subelements and attributes in both the graphical and text editing modes
* Fully javascript and CSS based, jquery widget
* AJAX submission of document modifications
* Ability to export XML document to a file in web browsers that support it
* Keyboard shortcuts for navigation and other operations
* Standalone tool for building JSON representations of XML schemas
 
In our own implementation, the tool communicates with a Fedora based SWORD 2 enabled repository to receive the starting MODS document and to submit changes. But it's all XML in the end, and includes options for exporting to file or submitting to any endpoint that accepts XML.
 
This presentation will include an overview of the development process, technologies and issues involved, as well as a brief demonstration of the editor in use. It will also touch on the tool backing the editor which constructs JSON objects from schemas.
 
== Visualizing Library Resources as Networks ==
* [mailto:matthewmiller@nypl.org Matt Miller] New York Public Library, NYPL Labs.
**No previous C4L presentations
 
Library resources are typically presented linearly in the form of a catalog search results page or an iterative list of subjects, books, special collections, etc. This talk explores the possibilities created when thinking of library resources as interconnected networks. We will look at the progress of a project to visualize NYPL resources such as catalog subject headings[1][2] as a network. We will also look at moving beyond visualizations into building network interfaces, such as our archival access term explorer[3] prototype.
 
[1] [https://dl.dropboxusercontent.com/u/4070829/catalog-viz-subjects/seadragon.html Catalog Subject Headings Visualization]
 
[2] [https://dl.dropboxusercontent.com/u/16562899/timelapse6.mp4 Time lapsed catalog network]
 
[3] [http://archives.nypl.org/terms Archival access term explorer prototype.]
 
== Island or Archipelago? Reducing Repository Redundancy at University of Toronto Libraries ==
 
*[mailto:sallain@utsc.utoronto.ca Sara Allain], Special Collections Librarian, University of Toronto Scarborough
*[mailto:kbabcock@utsc.utoronto.ca Kelli Babcock], Special Projects Librarian, University of Toronto Scarborough
*No previous Code4Lib presentations
 
This session will address a big issue in library technology – the creation of redundant repositories across large, multi-library institutions. We will discuss an ongoing collaboration at the University of Toronto: the development of Collections UofT, an Islandora/Drupal instance intended to support the special collections projects of UofT's community, faculty members, and 44 libraries. We will look at:
 
*Successful communication strategies imperative to fostering collaboration among project stakeholders
*Complications caused by legacy repositories and varying metadata standards
*Negotiating branding and usability requirements for disparate projects
*Focused outreach to generate community buy-in
*Defining the roles and responsibilities of the repository's community
*Generating a proactive response to the above issues through documentation, issue reporting, and standardized Memoranda of Understanding
 
As the University of Toronto Libraries continue to facilitate and develop digital projects, it is vital that our systems be both centralized and flexible, able to meet the needs of various collaborators across a wide range of subject areas. Collections UofT is our first step towards a brighter digital future for special collections at the University of Toronto.
 
 
== So You Think You Want to Be a DPLA Service Hub?: Building a Statewide Repository System for the Commonwealth ==
 
* Steven Anderson, Boston Public Library (sanderson@bpl.org)
**No previous presentations at national Code4Lib conferences (excluding one lightning talk in 2013)
* Eben English, Boston Public Library (eenglish@bpl.org)
**No previous presentations at national Code4Lib conferences
 
Built upon the Hydra stack, the [https://search.digitalcommonwealth.org Digital Commonwealth] repository system houses a variety of digital content from over a dozen Massachusetts libraries. In addition, we harvest metadata via OAI-PMH from many other institutions throughout the state; this metadata lives alongside hosted content in (relative) harmony. This talk will discuss the development of our repository, with an emphasis on the specialized use cases that are involved in creating a system to serve as a DPLA service hub.
 
As a DPLA hub, we have many contributing institutions using many different systems (Omeka, DSpace, CONTENTdm, Fedora/Hydra, etc.) with OAI feeds that we need to harvest from and convert into our data storage format. Come hear about our journey into the madness of what people can put into their metadata records and our data normalization strategies for adding this content to our system.
 
We'll also cover:
 
* Inherited design structure: Each OAI source has its own metadata nuances, and creating a "single script to rule them all" is out of the question (even if the records use the same schema and/or come from the same system). It is, however, possible to use good object-oriented principles to first cover general cases and then adjust for each institution's metadata style. In addition, our system uses content models that inherit from more basic implementations that make dealing with various types of heterogeneous content in our system much less painful.
 
* Interface design: How do you create an online metadata editor for the world's widest user base, from septuagenarian volunteers to academic librarians? How do you design a search interface that keeps content from a small historical society from getting lost in a sea of material contributed by statewide organizations? We've got answers.
 
* Useful libraries and techniques: '''> 120'''. That's how many date formats our system currently supports when reading from an OAI feed. What libraries did we use to help parse that information? How are we generating thumbnails for various types of content when none are provided? We'll cover useful libraries and gems that make the hub developer's life worth living again.
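
As a sketch of what date normalization can look like in its simplest form, the example below tries a short list of formats in turn using only the Python standard library. The format list and the <code>parse_date</code> function are assumptions for illustration; they are not the libraries or gems used in the Digital Commonwealth system.

```python
from datetime import datetime

# A tiny, made-up subset of formats; a real harvester would need many more.
CANDIDATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y", "%Y-%m", "%Y"]


def parse_date(raw):
    """Try each known format in turn; return a datetime or None if none match."""
    text = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None
```

Formats that omit the day or month fall back to the first of the month or year, which loses the original precision; a production system would likely record that precision separately.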
 
==Getting a New Website Without Losing the Old One==
 
*Angie Ballard, NCSU Libraries, (aballard@ncsu.edu)
**No previous Code4Lib presentations
*Charlie Morris, NCSU Libraries, (cdmorris@ncsu.edu)
*Erik Olson, NCSU Libraries, (eolson@ncsu.edu)
**No previous Code4Lib presentations
 
The NCSU Libraries' last website redesign launched in August 2010. The stated goal then was to position our website and our organization for a future of evolving through more iterative changes and agile workflows. This year’s latest evolution to a responsively designed website carried out this approach. We made incremental changes that retrofitted the face of the existing desktop website to be responsive-ready while simultaneously developing a fully responsive Drupal theme.
 
Staff and end-users saw incremental changes starting with flattening the visual design, followed by font and spacing changes, modularizing existing page elements, and finally new responsive headers, footers and page layouts. This approach allowed us to re-use large portions of existing code, and to provide a more gradual shift for staff and end-users. This iterative design process allows for testing and internal evaluation along the way. It also highlights IA and Content Strategy issues to be addressed in later projects.
 
We will talk about how scoping the project to these technical changes while largely maintaining the existing site IA, content, and visual design elements has a number of advantages with a few challenges.
 
==Solr faceted title/call-number/heading browse with inline cross-references==
 
* Michael Gibney, University of Pennsylvania (mgibney@pobox.upenn.edu)
* No previous presentations at national Code4Lib conferences
 
I would like to present an overview of recent development at the University of Pennsylvania library leveraging Solr/Lucene data structures to allow true browse (e.g. for Call Number, Title, Author, and Subject) with inline cross-references, over arbitrary subsets of records (as restricted by filters/facets/queries). Challenges addressed in development include:
 
* 1. Providing for efficient normalized term sorting (with highly-configurable normalization) while preserving term case and formatting for term-centric display.
* 2. Allowing record-centric display of results retrieved via term index (effectively allowing sorting on multi-valued fields). This point applies mainly to Call Number and Title browse.
* 3. Inline display (with associated record counts) of cross-references for heading terms (as of Nov. 8, 2013, implemented only for Author browse using LC authority file as represented in VIAF, but designed to be readily extended to apply to subject headings, and multiple, query-time configurable authority schemes).
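
The first challenge above, sorting on a normalized form while still displaying the original term, can be sketched outside Solr in a few lines. This is only an illustration of the general idea in plain Python; it is not the UnInvertedField extension described in this proposal, and the normalization rules (case folding, accent stripping, whitespace collapsing) are assumptions.

```python
import unicodedata


def normalize(term):
    """Build a sort key: strip accents, fold case, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", term)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


def browse_order(terms):
    """Sort by the normalized key but return the terms exactly as entered."""
    return sorted(terms, key=lambda term: (normalize(term), term))
```

The original string is kept as a secondary key so that terms with identical normalized forms still sort deterministically.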
 
The solution that will be presented is native to Solr/Lucene (an extension of UnInvertedField), and is related to the approach suggested by Jonathan Rochkind at: http://bibwild.wordpress.com/2010/06/05/note-to-self-more-ideas-for-browse-search-in-solr/. It is extremely lightweight, with the only dependencies being already supplied by Solr/Lucene on the classpath. It is flexible and easily configured via Solr configuration files. Being related strictly to Solr/Lucene, it should be front-end agnostic and equally applicable in VUFind, Blacklight, or any other framework using a Solr backend.
 
The resulting functionality is in production at http://franklin.library.upenn.edu/. It is still under heavy development, and questions/comments/criticism would be welcome. The source code has not been released open source, but hopefully that will change in the near future.
 
 
 
==Queue Programming -- how using job queues can make the Library coding world a better place==
 
*Birkin James Diana, Brown University (birkin_diana@brown.edu)
**I've given one or two C4L 20-minute talks and a few lightning ones over the years
 
In 2007 we built a system that dumped certain user web-requests for books into a database for offline-processing triggered via cron. We wanted to make the magic happen live, but knew it would take too long. Thus we created, sort of accidentally, a kind of old-fashioned static procedural job queue.
 
Over the years we've been repeatedly impressed with how useful and robust this unintended architecture has been, and it fostered thinking about using real job queues in Library workflows.
 
Fast-forward to the present. We now are using _real_ job queueing, in production, for parts of the functioning of Brown Digital Repository. We've also used it for ingestion scripts, and plan to move lots more code to this architecture.
 
I'd like to share & show:
* our lightweight rq/redis job queueing setup
* how using job queues can speed up workflows via using multiple workers
* how job queueing can make workflows more robust, especially by simplifying failure handling
* a way we've smoothly avoided race-conditions that can occur in concurrent-programming
* a technique for using task-processing job queues to simplify complex workflows
 
rq: http://python-rq.org
 
redis (python): https://pypi.python.org/pypi/redis/
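
To illustrate two of the points in the list above (multiple workers, and failure handling that does not bring down the whole run), here is a standard-library sketch using <code>queue</code> and <code>threading</code>. It is a stand-in for the concept only: it does not use rq or redis, and the <code>run_jobs</code> function and its parameters are hypothetical.

```python
import queue
import threading


def run_jobs(jobs, handler, worker_count=3):
    """Process hashable jobs with several worker threads.

    Failures are collected per job instead of stopping the other workers.
    """
    pending = queue.Queue()
    for job in jobs:
        pending.put(job)

    results, failures = {}, {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                job = pending.get_nowait()
            except queue.Empty:
                return  # no work left for this worker
            try:
                outcome = handler(job)
                with lock:
                    results[job] = outcome
            except Exception as exc:  # a failed job is recorded, not fatal
                with lock:
                    failures[job] = repr(exc)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, failures
```

A persistent queue such as the rq/redis setup mentioned above additionally survives process restarts and lets workers run on separate machines, which this in-process sketch does not attempt.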
 
== How Can a new NISO Recommended Practice Help Me? ==
* [mailto:nettie@niso.org Nettie Lagace], Associate Director of Programs, National Information Standards Organization (NISO)
* No previous C4L presentations (except for lightning talks in 2012 and 2013)
 
Two new NISO recommended practices are on their way to publication and, hopefully, uptake and adoption: a specification for Open Access Metadata and Indicators (OAMI) and a Protocol for Exchanging Serial Content (PESC). Who are the stakeholders and potential users of these? How are they expected to be applied? This presentation will cover specification and implementation details for these two community-developed recommendations and use them as examples of consensus standards completed in a short turnaround time.
 
The NISO Open Access Metadata and Indicators recommendations are a mechanism for transmitting the access status of scholarly works: peer reviewed articles published in subscription and hybrid journals, material available in institutional repositories, or any other such applicable material. Clear information regarding re-use rights must be included in this communication; “open access” on its own may not convey potential downstream uses. In addition, embargoes often come into play regarding availability of material.
 
The NISO Protocol for Exchanging Serial Content attempts to address an entirely different conundrum: how can digital files which make up serial content (which may well include text and images or other associated data) be successfully transmitted from partner to partner while including metadata requirements for description and organization of content? This information is needed for those who archive and preserve content, as well as those who may aggregate it, index it, or convert it to other uses. As more serial content is shipped to disparate stakeholders for all manner of potential uses, a common protocol will prevent local reinvention of the wheel.
 
Standards are entities that users in many communities often love to hate (http://xkcd.com/927/), but when projects need to be completed in a timely, cost-effective way and when interoperability with other entities is key, (almost) everyone will look to see if there is an existing standard or best practice to help them get started. In order for standards and best practices to gain acceptance and adoption, it is critical for their development process to involve as many potential stakeholders and eventual user communities as possible.
 
== A reusable application to enable self deposit of complex objects into a digital preservation environment==
 
* Jill Sexton jill@email.unc.edu, UNC Chapel Hill Libraries
* Mike Daines daines@email.unc.edu, UNC Chapel Hill Libraries
* Greg Jansen count0@email.unc.edu, UNC Chapel Hill Libraries
 
Jill gave a lightning talk once, otherwise no previous C4L presentations
 
Patron-initiated ingest of complex, multi-part objects into digital preservation environments remains a challenging problem for many libraries. In this talk we discuss how we approached this problem at UNC Chapel Hill.
 
UNC Chapel Hill Libraries is the developer of the Curator’s Workbench (download: http://www2.lib.unc.edu/software/; GitHub repo: https://github.com/UNC-Libraries/Curators-Workbench/wiki), an open-source collections preparation and workflow tool for digital materials. In response to the demand for patron-initiated ingest into our preservation repository, we extended the functionality of the Workbench, creating a module that enables easy creation of web deposit forms suitable for varying content types. The forms use dictionary and crosswalk mapping components to map the input fields to the MODS schema. Form designs also include explanatory text and designation of required fields. The forms work in tandem with a server-side form-hosting application, which can be configured to put uploads and MODS records onto a filesystem, or to deposit materials into a repository via SWORD. The forms feature simplifies the creation of deposit forms, shifting form design from software developers to curators, who have greater familiarity with both the depositor community and with descriptive standards. We also shift metadata creation to the content creators, who have the most knowledge of submitted materials.
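
To make the idea of a crosswalk from form fields to MODS concrete, here is a minimal sketch in Python. It is not the Curator's Workbench implementation: the <code>CROSSWALK</code> table, the form field names, and the <code>form_to_mods</code> function are assumptions made for illustration. Only the MODS namespace URI and the element names (<code>titleInfo</code>/<code>title</code>, <code>name</code>/<code>namePart</code>, <code>abstract</code>) come from the MODS schema itself.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

# Hypothetical crosswalk: form field name -> nested MODS element path.
CROSSWALK = {
    "title": ["titleInfo", "title"],
    "creator": ["name", "namePart"],
    "description": ["abstract"],
}


def form_to_mods(form):
    """Build a MODS record from a dict of submitted form values."""
    root = ET.Element(f"{{{MODS_NS}}}mods")
    for field, path in CROSSWALK.items():
        value = form.get(field)
        if not value:
            continue  # skip fields the depositor left empty
        parent = root
        for tag in path:
            parent = ET.SubElement(parent, f"{{{MODS_NS}}}{tag}")
        parent.text = value
    return root


record = form_to_mods({"title": "Untitled No. 3", "creator": "Doe, Jane"})
xml_text = ET.tostring(record, encoding="unicode")
```

Keeping the mapping in a data table rather than in code is what would let a curator, rather than a developer, adjust how a form field is described.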
 
We will demonstrate how this process works for the submission of Studio Art MFA theses. These complex deposits consist of a narrative description of the artwork in addition to up to 20 video- or image-based files documenting the work, and associated metadata for each file. In addition to preserving MFA projects in a stable environment, this procedure gives graduate students greater control over the submission and description process and provides online access to MFA Art Theses and supporting works. Additionally, the project has invited discussions with MFA students about the preservation of their personal archives.
 
Our talk will address how these tools could work within other digital preservation environments.
 
 
== Leveling Up: Migrating Multiple DSpace Repositories to a Multi-tenant Configuration. ==
 
* Aaron Collier, Digital Repository Services Manager, Systemwide Digital Library Services, California State University (acollier@calstate.edu)
**No previous presentations at national Code4Lib conferences.
* Carmen Mitchell, Institutional Repository Manager, California State University San Marcos (cmitchell@csusm.edu)
**No previous presentations at national Code4Lib conferences (excluding Ask Anything sessions, 2012 & 2013)
 
In 2007 the California State University system started a project to provide a hosted institutional repository system for its individual campuses using the DSpace repository system. With limited technical staffing dedicated to the project, the result was a single server hosting seventeen individual and separate instances (including Tomcat, databases and indexes). This led to resource instability and lack of parity between versions, features and support. In order to overcome the shortcomings of this structure, a custom multi-tenant configuration was developed using the DSpace platform. This posed several technical challenges related to campus branding, authentication and deposit workflows.
During the development and testing of the multi-tenant structure of DSpace for the California State University system, constituent campuses continued to digitize works and create metadata in anticipation of a reliable system to hold these works. As a result, several campuses have created a lot of content and are looking for time-saving measures for DSpace ingestion in order to continue work on the digitization projects. Development of a SWORD interface presented an attractive opportunity to provide a portal for bulk submission while avoiding the bottleneck of the provided method of FTP and DSpace scripting. Aaron Collier will talk about the technical challenges, and Carmen Mitchell will discuss the institutional needs: captioning, access copies vs display copies, workflow issues like batch uploading, embargoes, etc.
 
== Curate Cloud: The role of cloud computing in expanding the impact of digital curation ==
 
*Erik Mitchell (http://erikmitchell.info) University of California, Berkeley
*Jimmy Lin (http://www.umiacs.umd.edu/~jimmylin/) University of Maryland, College Park
 
Digital curation skills are a multidisciplinary and pressing need in public, academic and corporate environments (Yakel, 2007, p. 336). By 2018, the United States will have a shortage of 140,000-190,000 people with the deep analytical skills needed to manage large holdings of digital assets (Manyika et al., 2011). At the same time our information organizations will increasingly rely on digital assets in making effective decisions (Ibid.). Despite advances in digital curation technologies, institutions create far more information than they curate, in large part due to a gap in skills and perceived financial and technical barriers to entry (Heidorn, 2008). These barriers can seem insurmountable for smaller and under-represented information and cultural heritage institutions. However, new cloud computing based digital curation technologies reduce many of the financial and technical barriers so that the greatest challenge remaining is a need for updated skills and digital curation competencies.
 
Our information and cultural memory institutions require a new generation of professionals engaged in the preservation of digital resources and prepared to deploy curation tools that are not dependent on local technology infrastructure. In order to develop these competencies, Curate Cloud, a project being led by Dr. Jimmy Lin at the University of Maryland, College Park, seeks to educate the next generation of information professionals using a curriculum-integrated, cloud-based virtual learning environment.
 
The environment, designed using Amazon Web Services infrastructure and deployed in a “zero-configuration” environment, lowers barriers to entry for students learning about new technologies and cultivates a new level of cloud-based IT literacy in these students. This project draws on the successes of similar programs and pushes further by developing and deploying a novel cloud-based, open source virtual research and learning environment (VRLE) that embraces the on-demand, self-service model of cloud computing and features cloud-based curation tools that will enable the exploration of digital curation across the education, library, archive, and museum (LIS/LAM) community.
 
The presentation will focus on the research findings from the use of the VRLE in Library and Information Science education arenas as well as the challenges and opportunities that relate to delivering complex IT instruction using cloud computing platforms. The codebase for the VRLE is available at https://github.com/mitcheet.
 
This project is supported by the Institute of Museum and Library Services and Amazon Web Services through the Amazon Educational Research program.
 
Resources
*Heidorn, P. B. (2008). Shedding Light on the Dark Data in the Long Tail of Science. Library Trends, 57(2), 280–299. doi:10.1353/lib.0.0036.
*Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Byers, A. (2011). Big data: The next frontier for innovation, competition, and productivity. McKinsey Global Institute, 364(May), 156.
*Yakel, E. (2007). Digital curation. OCLC Systems & Services, 23(4), 335–340. doi:10.1108/10650750710831466.
 
== Creating a better web experience ==
 
* Katie Bertel, bertelks@buffalostate.edu, SUNY Buffalo State
* Chris Parana, paranacj@buffalostate.edu, SUNY Buffalo State
**No previous presentations at Code4Lib
 
The web has become much more dynamic and interactive in recent times. Sites more closely resemble full-blown applications, rather than static information resources. We see an opportunity for libraries to adhere to the same design principles used by popular websites, to create a more intuitive and enjoyable user experience.
 
In our presentation, we will discuss the results from usability testing after a website redesign in 2012 (library.buffalostate.edu), our guiding design principles, and showcase some of our solutions that enhance user experience, such as responsive web design, unified searching (Knowledge Base, Summon, website documents), and transitional interfaces.
 
Frameworks can be exploited to significantly reduce the time needed to develop powerful and engaging web applications. For example, we can use motion and transitional interfaces to help convey the sense of “space” in web design.
 
The goal is to create an engaging experience that draws our users in. When this is achieved, it encourages usage and makes the site more than just a tool: it becomes an enjoyable place for discovery.
 
== Responsive Web Design - A Paradigm Shift ==
 
* Jenny Brandon, Web Designer/Librarian, Michigan State University Libraries (jbrandon@msu.edu)
 
No previous presentations at Code4Lib
 
RWD is the biggest paradigm shift in web design in the last decade. This presentation will begin with a brief overview of responsive web design (RWD), elements of RWD, what types of frameworks are available and why you should choose one. Examples of library websites that have already implemented RWD will be analyzed to compare and contrast design methods. The remainder of the presentation will provide details on the Michigan State University Libraries' implementation of responsive web design using the Drupal Omega theme, and solutions adopted to transform an existing, fixed width library web site to a responsive design.
 
Topics include:
* flexible grids
* media queries
* mobile first
* images
* design considerations
* collaboration
 
== The Smithsonian Transcription Center ==
 
* Ching-hsien Wang, Branch Manager, Library and Archives Systems Innovations, Office of the Chief Information Officer, Smithsonian Institution
 
In 2013, the Smithsonian Institution - the largest library, archive, museum and research center complex in the world - launched transcription.si.edu, the first release of the Smithsonian's Digital Volunteers platform. With the ambitious goal to engage varied audiences, enrich collections and enable discovery in ways never before imagined, the Transcription Center enlists the "crowd" to transcribe millions of pages of handwritten documents from across the Institution's vast and diverse collections. We will share our goals, strategies, and experiences as contributors and developers of this collaborative initiative among librarians, archivists and museum curators. Design, workflows, user analytics, templates, and discoveries will be demonstrated and discussed for formats as varied as botanical specimen files, diaries, ledgers, field notebooks, letters, and photographs. We will also showcase the benefit of using open source technology in building our system architecture and we will share our technical challenges and lessons learned along the way.
 
Ching-hsien Wang has not presented at a Code4Lib conference before, but has participated in other conference presentations.
 
[[:Category:Code4Lib2014]][[Category:Talk Proposals]]