Difference between revisions of "2013 talks proposals"
I will show the 'guts' of the new citation search for astrophysics; it is generic and can be applied recursively to any Lucene query. Some people would call it a second-order operation because it works with the results of the previous (search) function. The talk will cover technical details of the special query class, its collectors, how to add a new search operator, and how to influence relevance scores. Then you can type with me: friends_of(friends_of(cited_for(keyword:"black holes") AND keyword:"red dwarf"))

== Managing Segmented Images and Hierarchical Collections with Fedora-Commons and Solr ==

* David Lacy, Villanova University, david DOT lacy AT villanova.edu

Many of the resources within our digital library are split into parts -- newspapers, scrapbooks and journals being examples of collections of individual scanned pages. In some cases, groups of pages within a collection, or segments within a particular page, may also represent chapters or articles.

We recently devised a procedure to extract these "segmented resources" into their own objects within our repository, and index them individually in our Discovery Layer.

In this talk I will explain how we dissected and organized these newly created resources with an extension to our Fedora Model, and how we make them discoverable through Solr configurations that facilitate browsable hierarchical relationships and field-collapsed results that group items within relevant resources.
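
Field-collapsed results of the kind described above map onto Solr's result grouping parameters. The following is a minimal sketch rather than the Villanova configuration: the endpoint URL and the `parent_id` field (the identifier of the resource a segment belongs to) are assumptions made for illustration, and only `group`, `group.field` and `group.limit` are Solr's own parameter names.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint and field name; adjust both to the real index.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"
PARENT_FIELD = "parent_id"


def grouped_query_url(user_query, segments_per_resource=3):
    """Build a Solr URL that groups matching segments under their parent resource."""
    params = {
        "q": user_query,
        "group": "true",                       # enable result grouping (field collapsing)
        "group.field": PARENT_FIELD,           # collapse on the parent resource identifier
        "group.limit": segments_per_resource,  # how many segments to return per group
        "wt": "json",
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


print(grouped_query_url("harvest festival"))
```

Each group in the response then stands for one newspaper, scrapbook or journal, with its best-matching pages or articles nested beneath it.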

[[Category:Code4Lib2013]]
Revision as of 20:11, 2 November 2012
Deadline has been extended by request due to the hurricane/storm.
Deadline for talk submission is Friday, November 9 at 11:59pm ET. We ask that no changes be made after this point, so that every voter reads the same thing. You can update your description again after voting closes.
Prepared talks are 20 minutes (including setup and questions), and focus on one or more of the following areas:
- tools (some cool new software, software library or integration platform)
- specs (how to get the most out of some protocols, or proposals for new ones)
- challenges (one or more big problems we should collectively address)
The community will vote on proposals using the criteria of:
- usefulness
- newness
- geekiness
- uniqueness
- awesomeness
Please follow the formatting guidelines:
== Talk Title ==
* Speaker's name, affiliation, and email address
* Second speaker's name, affiliation, email address, if applicable
Abstract of no more than 500 words.
Contents
- 1 Modernizing VuFind with Zend Framework 2
- 2 Did You Really Say That Out Loud? Tools and Techniques for Safe Public WiFi Computing
- 3 Drupal 8 Preview — Symfony and Twig
- 4 Neat! But How Do We Do It? - The Real-world Problem of Digitizing Complex Corporate Digital Objects
- 5 ResCarta Tools building a standard format for audio archiving, discovery and display
- 6 Format Designation in MARC Records: A Trip Down the Rabbit-Hole
- 7 Touch Kiosk 2: Piezoelectric Boogaloo
- 8 Wayfinding in a Cloud: Location Service for libraries
- 9 Empowering Collection Owners with Automated Bulk Ingest Tools for DSpace
- 10 Quality Assurance Reports for DSpace Collections
- 11 A Hybrid Solution for Improving Single Sign-On to a Proxy Service with Squid and EZproxy through Shibboleth and ExLibris’ Aleph X-Server
- 12 HTML5 Video Now!
- 13 Hybrid Archival Collections Using Blacklight and Hydra
- 14 Making the Web Accessible through Solid Design
- 15 Getting People to What They Need Fast! A Wayfinding Tool to Locate Books & Much More
- 16 De-sucking the Library User Experience
- 17 Solr Testing Is Easy with Rspec-Solr Gem
- 18 Northwestern's Digital Image Library
- 19 Two standards in a software (to say nothing of Normarc)
- 20 Future Friendly Web Design for Libraries
- 21 BYU's discovery layer service aggregator
- 22 The Avalon Media System: A Next Generation Hydra Head For Audio and Video Delivery
- 23 The DH Curation Guide: Building a Community Resource
- 24 Solr Update
- 25 Reports for the People
- 26 Network Analyses of Library Catalog Data
- 27 Pitfall! Working with Legacy Born Digital Materials in Special Collections
- 28 Project foobar FUBAR
- 29 Implementing RFID in an Academic Library
- 30 Coding an Academic Library Intranet in Drupal: Now We're Getting Organizized...
- 31 Hands off! Best Practices and Top Ten Lists for Code Handoffs
- 32 How to be an effective evangelist for your open source project
- 33 What does it mean to be a "good" vendor in an open source meritocracy?
- 34 Occam’s Reader: A system that allows the sharing of eBooks via Interlibrary Loan
- 35 Using Puppet for configuration management when no two servers look alike
- 36 REST IS Your Mobile Strategy
- 37 ScholarSphere: How We Built a Repository App That Doesn't Feel Like Yet Another Janky Old Repository App
- 38 Coding with Mittens
- 39 Hacking the DPLA
- 40 Introduction to SilverStripe 3.0
- 41 Citation search in SOLR and second-order operators
- 42 Managing Segmented Images and Hierarchical Collections with Fedora-Commons and Solr
Modernizing VuFind with Zend Framework 2
- Demian Katz, Villanova University, demian DOT katz AT villanova DOT edu
When setting goals for a new major release of VuFind, use of an existing web framework was an important decision to encourage standardization and avoid reinvention of the wheel. Zend Framework 2 was selected as providing the best balance between the cutting-edge (ZF2 was released in 2012) and stability (ZF1 has a long history and many adopters). This talk will examine some of the architecture and features of the new framework and discuss how it has been used to improve the VuFind project.
Did You Really Say That Out Loud? Tools and Techniques for Safe Public WiFi Computing
- Peter Murray, LYRASIS, Peter.Murray@lyrasis.org
Public WiFi networks, even those that have passwords, are nothing more than an old-time party line: whatever you say can be easily heard by anyone nearby. Remember Firesheep? It was an extension to Firefox that demonstrated how easy it was to snag session cookies and impersonate someone else. So what are you sending out over the airwaves, and what techniques are available to prevent eavesdropping? This talk will demonstrate tools and techniques for desktop and mobile operating systems that you should be using right now -- right here at Code4Lib -- to protect your data and your network activity.
Drupal 8 Preview — Symfony and Twig
- Cary Gordon, The Cherry Hill Company, cgordon@chillco.com
Drupal is a great platform for building web applications. Last year, the core developers decided to adopt the Symfony PHP framework, because it would lay the groundwork for the modernization (and de-PHP4ification) of the Drupal codebase. As I write this, the Symfony ClassLoader and HttpFoundation libraries are committed to Drupal core, with more elements likely before Drupal 8 code freeze.
It seems almost certain that the Twig templating engine will supplant PHPtemplate as the core Drupal template engine. Twig is a powerful, secure theme building tool that removes PHP from the templating system, the result being a very concise and powerful theme layer.
Symfony and Twig have a common creator, Fabien Potencier, whose overall goal is to rid the world of the excesses of PHP 4.
Neat! But How Do We Do It? - The Real-world Problem of Digitizing Complex Corporate Digital Objects
- Matthew Mariner, University of Colorado Denver, Auraria Library, matthew.mariner@ucdenver.edu
Isn't it neat when you discover that you are the steward of dozens of Sanborn Fire Insurance Maps, hundreds of issues of a city directory, and thousands of photographs of persons in either aforementioned medium? And it's even cooler when you decide, "Let's digitize these together and make them one big awesome project to support public urban history"? Unfortunately it's a far more difficult process than one imagines at inception and, sadly, doesn't always come to fruition. My goal here is to discuss the technological (and philosophical) problems librarians and archivists face when trying to create ultra-rich complex corporate digital projects, or, rather, projects consisting of at least three facets interrelated by theme. I intend to address these problems by suggesting management solutions, web workarounds, and, perhaps, a philosophy that might help in determining whether to even move forward or not. Expect a few case studies of "grand ideas crushed by technological limitations" and "projects on the right track" to follow.
ResCarta Tools building a standard format for audio archiving, discovery and display
- John Sarnowski, The ResCarta Foundation, john.sarnowski@rescarta.org
The free ResCarta Toolkit has been used by libraries and archives around the world to host city directories, newspapers, and historic photographs and by aerospace companies to search and find millions of engineering documents. Now the ResCarta team has released audio additions to the toolkit.
Create full text searchable oral histories, news stories, and interviews, or build an archive of lectures; all done to Library of Congress standards. The included transcription editor allows for accurate correction of the data conversion tool’s output. Build true archives of text, photos and audio. A single audio file carries the embedded Axml metadata, transcription, and word location information. Checks with the FADGI BWF Metaedit.
ResCarta-Web presents your audio to IE, Chrome, Firefox, Safari, and Opera browsers with full playback and word search capability. Display format is OGG!!
You have to see this tool in action. Twenty minutes from an audio file to transcribed, text-searchable website. Be there or be L seven (Yeah, I’m that old)
Format Designation in MARC Records: A Trip Down the Rabbit-Hole
- Michael Doran, University of Texas at Arlington, doran@uta.edu
This presentation will use a seemingly simple data point, the "format" of the item being described, to illustrate some of the complexities and challenges inherent in the parsing of MARC records. I will talk about abstract vs. concrete forms; format designation in the Leader, 006, 007, and 008 fixed fields as well as the 245 and 300 variable fields; pseudo-formats; what is mandatory vs. optional in respect to format designation in cataloging practice; and the differences between cataloging theory and practice as observed via format-related data mining of a mid-size academic library collection.
I understand that most of us go to code4lib to hear about the latest sexy technologies. While MARC isn't sexy, many of the new tools being discussed still need to be populated with data gleaned from MARC records. MARC format designation has ramifications for search and retrieval, limits, and facets, both in the ILS and further downstream in next generation OPACs and web-scale discovery tools. Even veteran library coders will learn something from this session.
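
As a first cut at the Leader-based format designation mentioned above, a parser can read Leader/06 (type of record) and Leader/07 (bibliographic level). The sketch below is illustrative and is not the speaker's code: the function name is hypothetical, the code tables follow the MARC 21 bibliographic format, and real format detection also has to consult the 006, 007, 008, 245 and 300 fields that the abstract lists.

```python
# Leader/06 (type of record) codes from the MARC 21 bibliographic format.
RECORD_TYPES = {
    "a": "language material",
    "c": "notated music",
    "d": "manuscript notated music",
    "e": "cartographic material",
    "f": "manuscript cartographic material",
    "g": "projected medium",
    "i": "nonmusical sound recording",
    "j": "musical sound recording",
    "k": "two-dimensional nonprojectable graphic",
    "m": "computer file",
    "o": "kit",
    "p": "mixed materials",
    "r": "three-dimensional artifact or naturally occurring object",
    "t": "manuscript language material",
}

# Leader/07 (bibliographic level) codes.
BIB_LEVELS = {
    "a": "monographic component part",
    "b": "serial component part",
    "c": "collection",
    "d": "subunit",
    "i": "integrating resource",
    "m": "monograph/item",
    "s": "serial",
}


def leader_format(leader):
    """Return (record type, bibliographic level) read from a 24-character MARC leader."""
    if len(leader) != 24:
        raise ValueError("a MARC leader is exactly 24 characters long")
    record_type = RECORD_TYPES.get(leader[6], "unknown type: " + repr(leader[6]))
    bib_level = BIB_LEVELS.get(leader[7], "unknown level: " + repr(leader[7]))
    return record_type, bib_level


print(leader_format("01142cam a2200301 a 4500"))
print(leader_format("00000njm a2200000 a 4500"))
```

Two codes are rarely enough to tell an e-book from a print book or a DVD from a VHS tape, which is where the fixed and variable fields come in.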
Touch Kiosk 2: Piezoelectric Boogaloo
- Andreas Orphanides, North Carolina State University Libraries, akorphan@ncsu.edu
At the NCSU Libraries, we provide realtime access to information on library spaces and services through an interactive touchscreen kiosk in our Learning Commons. In the summer of 2012, two years after its initial deployment, I redeveloped the kiosk application from the ground up, with an entirely new codebase and a completely redesigned user interface. The changes I implemented were designed to remedy previously identified shortcomings in the code and the interface design [1], and to enhance overall stability and performance of the application.
In this presentation I will outline my revision process, highlighting the lessons I learned and the practices I implemented in the course of redevelopment. I will highlight the key features of the HTML/Javascript codebase that allow for increased stability, flexibility, and ease of maintenance; and identify the changes to the user interface that resulted from the usability findings I uncovered in my previous research. Finally, I will compare the usage patterns of the new interface to the analysis of the previous implementation to examine the practical effect of the implemented changes.
I will also provide access to a genericized version of the interface code for others to build their own implementations of similar kiosk applications.
[1] http://journal.code4lib.org/articles/5832
Wayfinding in a Cloud: Location Service for libraries
- Petteri Kivimäki, The National Library of Finland, petteri.kivimaki@helsinki.fi
Searching for books in large libraries can be a difficult task for a novice library user. This paper presents The Location Service, a software as a service (SaaS) wayfinding application developed and managed by The National Library of Finland and aimed at all libraries. The service provides additional information and map-based guidance to books and collections by showing their location on a map. It can be integrated with any library management system, since integration only requires adding a link to the service in the search interface. The service is being developed continuously based on the feedback received from the users.
The service has two user interfaces: one for the customers and one for the library staff, who use it to manage the information related to the locations. The UI for the customers is fully customizable by the libraries, and the customization is done via template files using HTML, CSS, and JavaScript/jQuery. The service supports multiple languages, and the libraries have full control over which languages they support in their environment.
The service is written in Java and uses the Spring and Hibernate frameworks. The data is stored in a PostgreSQL database, which is shared by all the libraries. Libraries do not have direct access to the database, but the service offers an interface that makes it possible to retrieve XML data over HTTP. Modification of the data via the admin UI, however, is restricted, and access to other libraries’ data is blocked.
Empowering Collection Owners with Automated Bulk Ingest Tools for DSpace
- Terry Brady, Georgetown University, twb27@georgetown.edu
The Georgetown University Library has developed a number of applications to expedite the process of ingesting content into DSpace.
- Automatically inventory a collection of documents or images to be uploaded
- Generate a spreadsheet for metadata capture based on the inventory
- Generate item-level ingest folders, contents files and Dublin Core metadata for the items to be ingested
- Validate the contents of ingest folders prior to initiating the ingest to DSpace
- Present users with a simple, web-based form to initiate the batch ingest process
The applications have eliminated a number of error-prone steps from the ingest workflow and have significantly reduced a number of tedious data editing steps. These applications have empowered content experts to be in charge of their own collections.
In this presentation, I will provide a demonstration of the tools that were built and discuss the development process that was followed.
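
The folder-generation step in the list above can be sketched as follows. This is an illustration, not the Georgetown tools: it assumes the target is DSpace's Simple Archive Format (one folder per item holding the files, a `contents` list and a `dublin_core.xml`), and the function name, argument shapes and sample values are hypothetical.

```python
import shutil
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def write_item_folder(archive_dir, item_name, metadata, files):
    """Create one item folder with its files, a `contents` list and `dublin_core.xml`.

    `metadata` is a list of (element, qualifier, value) tuples, for example one
    spreadsheet row per item; `files` is a list of paths to copy into the folder.
    """
    item_dir = Path(archive_dir) / item_name
    item_dir.mkdir(parents=True, exist_ok=False)

    # One <dcvalue> per metadata tuple; an empty qualifier is written as "none".
    root = ET.Element("dublin_core")
    for element, qualifier, value in metadata:
        attributes = {"element": element, "qualifier": qualifier or "none"}
        ET.SubElement(root, "dcvalue", attributes).text = value
    ET.ElementTree(root).write(
        str(item_dir / "dublin_core.xml"), encoding="utf-8", xml_declaration=True
    )

    # Copy the files in and list their names, one per line, in `contents`.
    names = []
    for source in files:
        source = Path(source)
        shutil.copy(source, item_dir / source.name)
        names.append(source.name)
    (item_dir / "contents").write_text("\n".join(names) + "\n", encoding="utf-8")
    return item_dir


# Small demonstration inside a throwaway directory.
with tempfile.TemporaryDirectory() as workspace:
    scan = Path(workspace) / "page_001.tif"
    scan.write_bytes(b"placeholder image bytes")
    folder = write_item_folder(
        Path(workspace) / "archive",
        "item_000",
        [("title", None, "Sample scrapbook page"), ("date", "issued", "2012")],
        [scan],
    )
    print(sorted(path.name for path in folder.iterdir()))
```

Refusing to overwrite an existing item folder (`exist_ok=False`) is a deliberate choice here, so that a re-run cannot silently mix two batches.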
Quality Assurance Reports for DSpace Collections
- Terry Brady, Georgetown University, twb27@georgetown.edu
The Georgetown University Library has developed a collection of quality assurance reports to improve the consistency of the metadata in our DSpace collections. The report infrastructure permits the creation of query snippets to test for possible consistency errors within the repository, such as items missing thumbnails, items with multiple thumbnails, items missing a creation date, items containing improperly formatted dates, items without duplicated metadata fields, and items recently added across the repository, a community or a collection.
These reports have served to prioritize programmatic data cleanup tasks and manual data cleanup tasks. The reports have served as a progress tracker for data cleanup work and will provide on-going monitoring of the metadata consistency of the repository.
In this presentation, I will provide a demonstration of the tools that were built and discuss the development process that was followed.
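
Two of the checks named above, missing creation dates and improperly formatted dates, can be sketched in a few lines. This Python stand-in is not the report infrastructure itself, which the abstract describes in terms of query snippets. The record shape, the `date_created` key and the accepted date forms (YYYY, YYYY-MM, YYYY-MM-DD) are all assumptions.

```python
import re
from datetime import date

# Accept YYYY, YYYY-MM or YYYY-MM-DD; anything else counts as malformed.
ISO_DATE = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$")


def is_well_formed_date(value):
    """True if the value is an ISO-style date that also exists on the calendar."""
    match = ISO_DATE.match(value.strip())
    if not match:
        return False
    year, month, day = match.groups()
    try:
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:  # for example month 13 or 30 February
        return False
    return True


def date_report(items):
    """Return the ids of items whose creation date is missing or malformed."""
    missing = [item["id"] for item in items if not item.get("date_created")]
    malformed = [
        item["id"]
        for item in items
        if item.get("date_created") and not is_well_formed_date(item["date_created"])
    ]
    return {"missing_date": missing, "malformed_date": malformed}


sample_items = [
    {"id": "a", "date_created": "2012-11-02"},
    {"id": "b", "date_created": "11/02/2012"},
    {"id": "c"},
    {"id": "d", "date_created": "2012-13"},
]
print(date_report(sample_items))
```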
A Hybrid Solution for Improving Single Sign-On to a Proxy Service with Squid and EZproxy through Shibboleth and ExLibris’ Aleph X-Server
- Alexander Jerabek, UQAM - Université du Québec à Montréal, jerabek.alexander_j@uqam.ca
- Minh-Quang Nguyen, UQAM - Université du Québec à Montréal, nguyen.minh-quang@uqam.ca
In this talk, we will describe how we developed and implemented a hybrid solution for improving single sign-on in conjunction with the library’s proxy service. This hybrid solution consists of integrating the disparate elements of EZproxy, the Squid workflow, Shibboleth, and the Aleph X-Server. We will report how this new integrated service improves the user experience. To our knowledge, this new service is unique and has not been implemented anywhere else. We will also present some statistics after approximately one year in production.
See article: http://journal.code4lib.org/articles/7470
HTML5 Video Now!
- Jason Ronallo, North Carolina State University Libraries, jnronall@ncsu.edu
Can you use HTML5 video now? Yes.
I'll show you how to get started using HTML5 video, including gotchas, tips, and tricks. Beyond the basics we'll see the power of having video integrated into HTML and the browser. Finally, we'll look at examples that push the limits and show the exciting future of video on the Web.
My experience comes from technical development of an oral history video clips project. I developed the technical aspects of the project, including video processing, server configuration, development of a public site, creation of an administrative interface, and video engagement analytics. Major portions of this work have been open sourced under an MIT license.
Hybrid Archival Collections Using Blacklight and Hydra
- Adam Wead, Rock and Roll Hall of Fame and Museum, awead@rockhall.org
At the Library and Archives of the Rock and Roll Hall of Fame, we use available tools such as Archivists' Toolkit to create EAD finding aids of our collections. However, managing digital content created from these materials and the born-digital content that is also part of these collections represents a significant challenge. In my presentation, I will discuss how we solve the problem of our hybrid collections by using Hydra as a digital asset manager and Blacklight as a unified presentation and discovery interface for all our materials.
Our strategy centers around indexing EAD XML into Solr as multiple documents: one for each collection, and one for every series, sub-series and item contained within a collection. For discovery, we use this strategy to leverage item-level searching of archival collections alongside our traditional library content. For digital collections, we use this same technique to represent a finding aid in Hydra as a set of linked objects using RDF. New digital items are then linked to these parent objects at the collection and series level. Once this is done, the items can be exported back out to the Blacklight Solr index and the digital content appears along with the rest of the items in the collection.
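
The one-finding-aid-to-many-documents idea can be sketched as below. It is not the Rock Hall implementation: it assumes un-namespaced EAD 2002 with numbered components (`c01`, `c02`, ...), and the Solr field names, the positional id scheme and the sample finding aid are hypothetical.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up finding aid: one collection, one series, one item.
SAMPLE_EAD = """
<ead>
  <eadheader><eadid>ARC-0001</eadid></eadheader>
  <archdesc level="collection">
    <did><unittitle>Sample Papers</unittitle></did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 level="item">
          <did><unittitle>Letter to a promoter</unittitle></did>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""


def is_component(element):
    """True for <c> and for numbered components such as <c01> or <c02>."""
    tag = element.tag
    return tag == "c" or (len(tag) == 3 and tag[0] == "c" and tag[1:].isdigit())


def ead_to_solr_docs(ead_xml):
    """Flatten a finding aid into one dict per collection, series and item."""
    root = ET.fromstring(ead_xml.strip())
    ead_id = root.findtext("eadheader/eadid").strip()
    archdesc = root.find("archdesc")
    docs = [{
        "id": ead_id,
        "level": archdesc.get("level"),
        "title": archdesc.findtext("did/unittitle"),
        "parent_id": None,
        "collection_id": ead_id,
    }]

    def walk(parent_element, parent_id):
        components = [child for child in parent_element if is_component(child)]
        for position, component in enumerate(components, start=1):
            doc_id = "{}-{}".format(parent_id, position)  # hypothetical id scheme
            docs.append({
                "id": doc_id,
                "level": component.get("level"),
                "title": component.findtext("did/unittitle"),
                "parent_id": parent_id,
                "collection_id": ead_id,
            })
            walk(component, doc_id)

    dsc = archdesc.find("dsc")
    if dsc is not None:
        walk(dsc, ead_id)
    return docs


for doc in ead_to_solr_docs(SAMPLE_EAD):
    print(doc["id"], doc["level"], doc["title"])
```

Keeping `parent_id` and `collection_id` on every document is what lets a discovery interface show an item in the context of its series and collection.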
Making the Web Accessible through Solid Design
- Cynthia Ng from Ryerson University Library & Archives
In libraries, we are always trying our best to be accessible to everyone and we make every effort to do so physically, but what about our websites? Web designers are great at talking about the user experience and how to improve it, but what sometimes gets overlooked is how to make a site more accessible and meet accessibility guidelines. While guidelines are necessary to cover a minimum standard, web accessibility should come from good web design without ‘sacrificing’ features. While it's difficult to make a website fully accessible to everyone, there are easy, practical ways to make a site as accessible as possible.
While the focus will be on websites and meeting the Web Content Accessibility Guidelines (WCAG), the presentation will also touch on how to make custom web interfaces accessible.
Getting People to What They Need Fast! A Wayfinding Tool to Locate Books & Much More
- Steven Marsden, Ryerson University Library & Archives, steven dot marsden at ryerson dot ca
- Cynthia Ng, Ryerson University Library & Archives
Having a bewildered, lost user in the building or stacks is a common occurrence, but we can help our users find their way through enhanced maps and floor plans. While not a new concept, these maps are integrated into the user’s flow of information without having to load a special app. The map not only highlights the location, but also provides all the related information with a link back to the detailed item view. During the first stage of the project, it has only been implemented for books (and other physical items), but the 'RULA Finder' is built to help users find just about anything and everything in the library including study rooms, computer labs, and staff. With a simple-to-use admin interface, it makes things easy for everyone, staff and users alike.
The application is written in PHP with data stored in a MySQL database. The end-user interface involves jQuery, JSON, and the library's discovery layer (Summon) API.
The presentation will not only cover the technical aspects, but also the implementation and usability findings.
De-sucking the Library User Experience
- Jeremy Prevost, Northwestern University, j-prevost {AT} northwestern [DOT] edu
Have you ever thought that library vendors purposely create the worst possible user experience they can imagine because they just hate users? Have you ever thought that your own library website feels like it was created by committee rather than for users because, well, it was? I’ll talk about how we used vendor supplied APIs to our ILS and Discovery tool to create an experience for our users that sucks at least a little bit less.
The talk will provide specific examples of how inefficient or confusing vendor supplied solutions are from a user perspective, along with our specific streamlined solutions to the same problems. Code examples will be minimal, as the focus will be on improving user experience rather than on any one code solution for doing that. Examples may include the seemingly simple tasks of renewing a book or requesting an item from another campus library.
Solr Testing Is Easy with Rspec-Solr Gem
- Naomi Dushay, Stanford University, ndushay AT stanford DOT edu
How do you know if
- your idea for "left anchoring" searches actually works?
- your field analysis for LC call numbers accommodates a suffix between the first and second cutter without breaking the rest of LC call number parsing?
- tweaking Solr configs to improve, say, Chinese searching, won't break Turkish and Cyrillic?
- changes to your solrconfig file accomplish what you wanted without breaking anything else?
Avoid the whole app stack when writing Solr acceptance/relevancy/regression tests! Forget cucumber and capybara. This gem lets you easily (only 4 short files needed!) write tests like this, passing arbitrary parameters to Solr:
it "unstemmed author name Zare should precede stemmed variants" do
  resp = solr_response(author_search_args('Zare').merge({'fl'=>'id,author_person_display', 'facet'=>false}))
  resp.should include("author_person_display" => /\bZare\W/).in_each_of_first(3).documents
  resp.should_not include("author_person_display" => /Zaring/).in_each_of_first(20).documents
end

it "Cyrillic searching should work: Восемьсoт семьдесят один день" do
  resp = solr_resp_doc_ids_only({'q'=>'Восемьсoт семьдесят один день'})
  resp.should include("9091779")
end

it "q of 'String quartets Parts' and variants should be plausible " do
  resp = solr_resp_doc_ids_only({'q'=>'String quartets Parts'})
  resp.should have_at_least(2000).documents
  resp.should have_the_same_number_of_results_as(solr_resp_doc_ids_only({'q'=>'(String quartets Parts)'}))
  resp.should have_more_results_than(solr_resp_doc_ids_only({'q'=>'"String quartets Parts"'}))
end

it "Traditional Chinese chars 三國誌 should get the same results as simplified chars 三国志" do
  resp = solr_response({'q'=>'三國誌', 'fl'=>'id', 'facet'=>false})
  resp.should have_at_least(240).documents
  resp.should have_the_same_number_of_results_as(solr_resp_doc_ids_only({'q'=>'三国志'}))
end
See http://rubydoc.info/github/sul-dlss/rspec-solr/frames https://github.com/sul-dlss/rspec-solr
and our production relevancy/acceptance/regression tests slowly migrating from cucumber to: https://github.com/sul-dlss/sw_index_tests
Northwestern's Digital Image Library
- Mike Stroming, Northwestern University Library, m-stroming AT northwestern DOT edu
- Edgar Garcia, Northwestern University Library, edgar-garcia AT northwestern DOT edu
At Northwestern University Library, we are about to release a beta version of our Digital Image Library (DIL). DIL is an implementation of the Hydra technology that provides a Fedora repository solution for discovery of and access to over 100,000 images for staff, students, and scholars. Some important features are:
- Build custom collection of images using drag-and-drop
- Re-order images within a collection using drag-and-drop
- Nest collections within other collections
- Create details/crops of images
- Zoom, rotate images
- Upload personal images
- Retrieve your own uploads and details from a collection
- Export a collection to a PowerPoint presentation
- Create a group of users and authorize access to your images
- Batch edit image metadata
Our presentation will include a demo, explanation of the architecture, and a discussion of the benefits of being a part of the Hydra open-source community.
Two standards in a software (to say nothing of Normarc)
- Zeno Tajoli, CINECA (Italy), z DOT tajoli AT cineca DOT it
With this presentation I want to show how the Koha ILS supports three different MARC dialects: MARC21, Unimarc and Normarc. The main points of the presentation:
- Three MARC at MySQL level
- Three MARC at API level
- Three MARC at display
- Can I add a new format?
Future Friendly Web Design for Libraries
- Michael Schofield, Alvin Sherman Library, Research, and Information Technology Center, mschofied[dot]nova[dot]edu
Libraries on the web are afterthoughts. Often their design is stymied on one hand by red tape imposed by the larger institution and on the other by an overload of too democratic input from colleagues. Slashed budgets / staff stretched too thin foul-up the R-word (that'd be "redesign") - but things are getting pretty strange. Notions about the Web (and where it can be accessed) are changing.
So libraries can only avoid refabbing their fixed-width desktop and jQuery Mobile m-dot websites for so long until desktop users evaporate and demand from patrons with web-ready refrigerators becomes deafening. Just when we have largely hopped on the bandwagon and gotten enthusiastic about being online, our users expect a library's site to look and perform great on everything.
Our presence on the web should be built to weather ever-increasing device complexity. To meet users at their point of need, libraries must start thinking Future Friendly.
This overview rehashes the approach and philosophy of library web design, re-orienting it for maximum accessibility and maximum efficiency of design. While just 20 minutes, we'll mull over techniques like mobile-first responsive web design, modular CSS, browser feature detection for progressive enhancement, and lots of nifty tricks.
BYU's discovery layer service aggregator
- Curtis Thacker, Brigham Young University, curtis.thacker AT byu DOT edu
It is clear that libraries will continue to experience rapid change based on the speed of technology. To acknowledge this new reality and to provide rapid response to shifting end user paradigms BYU has developed a custom service aggregator. At first our vendors looked at us a bit funny; however, in the last year they have been astonished with the fluid implementation of new services – here’s the short list:
- filmfinder - a tool for browsing and searching films
- A custom book recommender service based on checkout data
- Integrated library services like personnel, library hours, study room scheduler and database finder through a custom adwords system.
- A very geeky and powerful utility used for converting MARC XML into Primo-compliant XML.
- Embedded floormaps
- A responsive web design
- Bing did-you-mean
- And many more.
I will demo the system, review the architecture and talk about future plans.
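
The abstract does not say how the checkout-based recommender works. One common approach, shown here purely as an illustration and not as BYU's method, is item-to-item co-occurrence: recommend the titles most often borrowed by the same patrons. The function names and the `(patron_id, item_id)` input shape are assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(checkouts):
    """Count how often two items were checked out by the same patron.

    `checkouts` is an iterable of (patron_id, item_id) pairs.
    """
    items_by_patron = defaultdict(set)
    for patron_id, item_id in checkouts:
        items_by_patron[patron_id].add(item_id)

    cooccurrence = defaultdict(Counter)
    for items in items_by_patron.values():
        for first, second in combinations(sorted(items), 2):
            cooccurrence[first][second] += 1
            cooccurrence[second][first] += 1
    return cooccurrence


def recommend(cooccurrence, item_id, limit=3):
    """Return up to `limit` items most often borrowed alongside `item_id`."""
    neighbours = cooccurrence.get(item_id, {})
    # Sort by descending count, then by item id so that ties are stable.
    ranked = sorted(neighbours.items(), key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _count in ranked[:limit]]


sample_checkouts = [
    ("p1", "A"), ("p1", "B"),
    ("p2", "A"), ("p2", "B"), ("p2", "C"),
    ("p3", "A"), ("p3", "C"), ("p3", "B"),
]
print(recommend(build_cooccurrence(sample_checkouts), "A"))
```

A production service would also need to anonymize or aggregate patron identifiers before computing anything like this.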
The Avalon Media System: A Next Generation Hydra Head For Audio and Video Delivery
- Michael Klein, Senior Software Developer, Northwestern University Library, michael.klein AT northwestern DOT edu
- Nathan Rogers, Programmer/Analyst, Indiana University, rogersna AT indiana DOT edu
Based on the success of the Variations digital music platform, Indiana University and Northwestern University have developed a next generation educational tool for delivering multimedia resources to the classroom. The Avalon Media System (formerly Variations on Video) supports the ingest, media processing, management, and access-controlled delivery of library-managed video and audio collections. To do so, the system draws on several existing, mature, open source technologies:
- The ingest, search, and discovery functionality of the Hydra framework
- The powerful multimedia workflow management features of Opencast Matterhorn
- The flexible Engage audio/video player
- The streaming capabilities of both Red5 Media Server (open source) and Adobe Flash Media Server (proprietary)
Extensive customization options are built into the framework for tailoring the application to the needs of a specific institution.
Our goal is to create an open platform that can be used by other institutions to serve the needs of the academic community. Release 1 is planned for a late February launch with future versions released every couple of months following. For more information visit http://avalonmediasystem.org/ and https://github.com/variations-on-video/hydrant.
The DH Curation Guide: Building a Community Resource
- Robin Davis, John Jay College of Criminal Justice, robdavis AT jjay.cuny.edu
- James Little, University of Illinois Urbana-Champaign, little9 AT illinois.edu
Data curation for the digital humanities is an emerging area of research and practice. The DH Curation Guide, launched in July 2012, is an educational resource that addresses aspects of humanities data curation in a series of expert-written articles. Each provides a succinct introduction to a topic with annotated lists of useful tools, projects, standards, and good examples of data curation done right. The DH Curation Guide is intended to be a go-to resource for data curation practitioners and learners in libraries, archives, museums, and academic institutions.
Because it's a growing field, we designed the DH Curation Guide to be a community-driven, living document. We developed a granular commenting system that encourages data curation community members to contribute remarks on articles, article sections, and article paragraphs. Moreover, we built in a way for readers to contribute and annotate resources for other data curation practitioners.
This talk will address how the DH Curation Guide is currently used and will include a sneak peek at the articles that are in store for the Guide’s future. We will talk about the difficulties and successes of launching a site that encourages community. We are all builders here, so we will also walk through developing the granular commenting/annotation system and the XSLT-powered publication workflow.
Solr Update
- Erik Hatcher, LucidWorks, erik.hatcher AT lucidworks.com
Solr is continually improving. Solr 4 was recently released, bringing dramatic changes in the underlying Lucene library and Solr-level features. It's tough for us all to keep up with the various versions and capabilities.
This talk will blaze through the highlights of new features and improvements in Solr 4 (and up). Topics will include: SolrCloud, direct spell checking, surround query parser, and many other features. We will focus on the features library coders really need to know about.
Reports for the People
- Kara Young, Keene State College, NH, kyoung1 at keene.edu
- Dana Clark, Keene State College, NH, dclark5 at keene.edu
Libraries are increasingly being called upon to provide information on how our programs and services are moving our institutional strategic goals forward. In support of College and departmental Information Literacy learning outcomes, Mason Library Systems at Keene State College developed an assessment database to record and report assessment activities by Library faculty. Frustrated by the lack of freely available options for intuitively recording, accounting for, and outputting useful reports on instructional activities, Librarians requested a tool to make capturing and reporting activities (and their lives) easier. Library Systems was able to respond to this need by working with librarians to identify what information is necessary to capture, where other assessment tools had fallen short, and ultimately by developing an application that supports current reporting imperatives while providing flexibility for future changes.
The result of our efforts was an in-house, browser-based Assessment Database to improve the process of data collection and analysis. The application is written in PHP, stores its data in a MySQL database, and is presented via the browser, making extensive use of jQuery and jQuery plug-ins for data collection, manipulation, and presentation. The presentation will outline the process undertaken to build a successful collaboration with Library faculty from conception to implementation, as well as the technical aspects of our trial-and-error approach. Plus: cool charts and graphs!
Network Analyses of Library Catalog Data
- Kirk Hess, University of Illinois at Urbana-Champaign, kirkhess AT illinois.edu
- Harriett Green, University of Illinois at Urbana-Champaign, green19 AT illinois.edu
Library collections are all too often like icebergs: The amount exposed on the surface is only a fraction of the actual amount of content, and we’d like to recommend relevant items from deep within the catalog to users. With the assistance of an XSEDE Allocation grant (http://xsede.org), we’ve used R to reconstitute anonymous circulation data from the University of Illinois’s library catalog into separate user transactions. The transaction data is incorporated into subject analyses that use XSEDE supercomputing resources to generate predictive network analyses and visualizations of subject areas searched by library users using Gephi (https://gephi.org/). The test data set for developing the subject analyses consisted of approximately 38,000 items from the Literatures and Languages Library that contained 110,000 headings and 130,620 transactions. We’re currently working on developing a recommender system within VuFind to display the results of these analyses.
Pitfall! Working with Legacy Born Digital Materials in Special Collections
- Donald Mennerich, The New York Public Library, don.mennerich AT gmail.com
- Mark A. Matienzo, Yale University Library, mark AT matienzo.org
Archives and special collections are being faced with a growing abundance of born digital material, as well as an abundance of promising tools for managing it. However, one must consider the potential problems that can arise when approaching a collection containing legacy materials (from roughly the pre-internet era). Many of the tried and true, "best of breed" tools for digital preservation don't always work as they do for more recent materials, requiring a fair amount of ingenuity and use of "word of mouth tradecraft and knowledge exchanged through serendipitous contacts, backchannel conversations, and beer" (Kirschenbaum, "Breaking badflag").
Our presentation will focus on some of the strange problems encountered and creative solutions devised by two digital archivists in the course of preserving, processing, and providing access to collections at their institutions. We'll be placing particular emphasis on the pitfalls and crocodiles we've learned to swing over safely, while collecting treasure in the process. We'll address working with CP/M disks in collections of authors' papers, reconstructing a multipart hard drive backup spread across floppy disks, and more.
Project foobar FUBAR
- Becky Yoose, Grinnell College, yoosebec AT grinnell DOT edu
Be it mandated from Those In A Higher Pay Grade Than You or self-inflicted, many of us deal with managing major library-related technology projects [1]. It’s common nowadays to manage multiple technology projects, and external and internal issues can generally be planned for to minimize timeline shifts and protect the quality of deliverables. Life, however, has other plans for you, and all your major library technology infrastructure projects pile on top of each other at the same time. How do you and your staff survive a train wreck of technology projects and produce deliverables to project stakeholders without having to go into the library IT version of the United States Federal Witness Protection Program?
This session covers my experience with the collision of three major library technology projects - including a new institutional repository and an integrated library system migration - and how we dealt with external and internal factors, implemented damage control, and overall lessened the damage from the epic crash. You might laugh, you might cry, you will probably have flashbacks from previous projects, but you will come out of this session with a set of tools to use when you’re dealing with managing mission-critical projects.
[1] Past code4lib talks have covered specific project management strategies, such as Agile, for application development. I will focus on general project management practices in relation to various library technology projects; these strategies incorporate many of those practices in their own structures.
Implementing RFID in an Academic Library
- Scott Bacon, Coastal Carolina University, sbacon AT coastal DOT edu
Coastal Carolina University’s Kimbel Library recently implemented RFID to increase security, provide better inventory control over library materials and enable do-it-yourself patron services such as self checkout.
I’ll give a quick overview of RFID and the components involved and then will talk about how our library utilized the technology. It takes a lot of research, time, money and no small amount of resourcefulness to make your library RFID-ready. I’ll show how we developed our project timeline, how we assessed and evaluated vendors and how we navigated the bid process. I’ll also talk about hardware and software installation, configuration and troubleshooting and will discuss our book and media collection encoding process.
We encountered myriad issues with our vendor, the hardware and the software. Would we do it all over again? Should your library consider RFID? Caveats abound...
Coding an Academic Library Intranet in Drupal: Now We're Getting Organizized...
- Scott Bacon, Coastal Carolina University, sbacon AT coastal DOT edu
The Kimbel Library Intranet is coded in Drupal 7, and was created to increase staff communication and store documentation. This presentation will contain an overview of our intranet project, including the modules we used, implementation issues, and possible directions in future development phases. I won’t forget to talk about the slew of tasty development issues we faced, including dealing with our university IT department, user buy-in, site navigation, user roles, project management, training and mobile modules (or the lack thereof). And some other fun (mostly) true anecdotes will surely be shared.
The main functions of Phase I of this project were to increase communication across departments and committees, facilitate project management and revise the library's shared drive. Another important function of this first phase was to host mission-critical documentation such as strategic goals, policies and procedures. Phase II of this project will focus on porting employee tasks into the centralized intranet environment. This development phase, which aims to replicate and automate the bulk of staff workflows within a content management system, will be a huge undertaking.
We chose Drupal as our intranet platform because of its extensibility, flexibility and community support. We are also moving our entire library web presence to Drupal in 2013 and will be soliciting any advice on which modules to use/avoid and which third-party services to wrangle into the Drupal environment. Should we use Drupal as the back-end to our entire Web presence? Why or why not?
Hands off! Best Practices and Top Ten Lists for Code Handoffs
- Naomi Dushay, Stanford University Library, ndushay@stanford.edu
- Bess Sadler, Stanford University Library, bess@stanford.edu
Transition points in who is the primary developer on an actively developing code base can be a source of frustration for everyone involved. We've tried to minimize that pain point as much as possible through the use of agile methods like test driven development, continuous integration, and modular design. Has optimizing for developer happiness brought us happiness? What's worked, what hasn't, and what's worth adopting? How do you keep your project in a state where you can easily hand it off?
How to be an effective evangelist for your open source project
- Bess Sadler, Stanford University Library, bess@stanford.edu
What separates an open source software project that gets new adopters and new contributing community members (which is to say, a project that goes on existing for any length of time) from one that doesn't is often not a question of superior design or technology. It's more often a question of whether the advocates for the project can convince institutional leaders AND front line developers that a project is stable and trustworthy. What are successful strategies for attracting development partners? I'll try to answer that and talk about what we could do as a community to make collaboration easier.
What does it mean to be a "good" vendor in an open source meritocracy?
- Matt Zumwalt, Data Curation Experts / MediaShelf / Hydra Project, matt@curationexperts.com
What is the role of vendors in open source? What should be the position of vendors in a meritocracy? What are the avenues for encouraging great vendors who contribute to open source communities in valuable ways? How you answer these questions has a huge impact on a community, and in order to formulate strong answers, you need to be well informed. Let’s take a look at the business practicalities of this situation, beginning with 1) an overview of the viable profit models for open-source software, 2) some of the realities of vendor involvement in open source, and 3) an account of the ins & outs of compensation & equity structures within for-profit corporations.
The topics of power & influence, fairness, community participation, software quality, employment and personal profit are fair game, along with software licensing, sponsorship, closed source software and the role of sales people.
This presentation will draw on personal experience from the past seven years spent bootstrapping and running MediaShelf, a small but prolific for-profit consulting company that focuses entirely on open source digital repository software. MediaShelf has played an active role in creating the Hydra Framework and continuously contributes to maintenance of Fedora. Those contributions have been funded through consulting contracts for authoring & implementing open source software on behalf of organizations around the world.
Occam’s Reader: A system that allows the sharing of eBooks via Interlibrary Loan
- Ryan Litsey, Texas Tech University, Ryan DOT Litsey AT ttu.edu
- Kenny Ketner, Texas Tech University, Kenny DOT Ketner AT ttu.edu
Occam’s Reader is a software platform that allows the transfer and sharing of electronic books between libraries via existing interlibrary loan software. Occam’s Reader allows libraries to meet the growing need to be able to share our electronic resources. In the ever-increasing digital world, many of our collection development plans now include eBook platforms. The problem with eBooks, however, is that they are resources that are locked into the home library. With Occam’s Reader we can continue the centuries-old tradition of resource sharing and also keep up with the changing digital landscape.
Using Puppet for configuration management when no two servers look alike
- Eugene Vilensky, Senior Systems Administrator, Northwestern University Library, evilensky northwestern edu
Configuration management is hot because it allows one to scale to thousands of machines, all of which look alike, and tightly manage changes across the nodes. Infrastructure as code, implement all changes programmatically, yadda yadda yadda.
Unfortunately, servers which have gone unmanaged for a long time do not look very similar to each other. Variables come in many forms, usually because of some or all of the following: Who installed the server, where it was installed, where the image was sourced from, when it was installed, where additional packages were sourced, and what kind of software was hosted on it.
Bringing such machines into your configuration management platform is no harder and no easier than some or all of the following options: 1) blow such machines away, start from scratch, and migrate your data; 2) find the lowest common baseline between the current state and the ideal state and start the work there; 3) implement new features/services on existing unmanaged machines but manage the new features/services.
I will describe our experiences at the library for all three options using the Puppet open-source tool on Enterprise Linux 5 and 6.
REST IS Your Mobile Strategy
- Richard Wolf, University of Illinois at Chicago, richwolf@uic.edu
Mobile is the new hotness ... and you can't be one of the cool kids unless you've got your own mobile app ... but the road to mobility is daunting. I'll argue that it's actually easier than it seems ... and that the simplest way to mobility is to bring your data to the party, create a REST API around the data, tell developers about your API, and then let the magic happen. To make my argument concrete, I'll show (lord help me!) how to go from an interesting REST API to a fun iOS tool for librarians and the general public in twenty minutes.
ScholarSphere: How We Built a Repository App That Doesn't Feel Like Yet Another Janky Old Repository App
- Dan Coughlin, Penn State University, danny@psu.edu
- Mike Giarlo, Penn State University, michael@psu.edu
ScholarSphere is a web application that allows the Penn State research community to deposit, share, and manage its scholarly works. It is also, as some of our users and our peers have observed, a repository app that feels much more like Google Docs or GitHub than earlier-generation repository applications. ScholarSphere is built upon the Hydra framework (Fedora Commons, Solr, Blacklight, Ruby on Rails), MySQL, Redis, Resque, FITS, ImageMagick, jQuery, Bootstrap, and FontAwesome. We'll talk about techniques we used to:
- eliminate Fedora-isms in the application
- model and expose RDF metadata in ways that users find unobtrusive
- manage permissions via a UI widget that doesn't stab you in the face
- harvest and connect controlled vocabularies (such as LCSH) to forms
- make URIs cool
- keep the app snappy without venturing into the architectural labyrinth of YAGNI
- build and queue background jobs
- expose social features and populate activity streams
- tie checksum verification, characterization, and version control to the UI
- let users upload and edit multiple files at once
The application will be demonstrated; code will be shown; and we solemnly commit to showing ABSOLUTELY NO XML.
Coding with Mittens
- Jim LeFager, DePaul University Library, jlefager@depaul.edu
Working in an environment where developers have restricted access to servers and development areas, or where you are primarily working in multiple hosted systems with limited access, can be a challenge when you are attempting to incorporate any new functionality or improve an existing one. Hosted web services offer the benefit that staff time is not dedicated to server maintenance and development, but customization can be difficult and at times impossible. In many cases, incorporating any current API functionality requires additional work beyond the original development work, which can be frustrating and inefficient. The result can be a Frankenstein monster of web services that is confusing to the user and difficult to navigate.
This talk will focus on some effective best practices, and some not-so-great but necessary practices, that we have adopted to develop and improve our users’ experience using JavaScript/jQuery and CSS to manipulate our hosted environments. This will include a review of available tools that allow collaborative development in the cloud, as well as examples of jQuery methods that have allowed us to take additional control of these hosted environments and track them using Google Analytics. Included will be examples from Springshare Campus Guides, CONTENTdm and other hosted web spaces that have been ‘hacked’ to improve the UI.
Hacking the DPLA
- Nate Hill, Chattanooga Public Library, nathanielhill AT gmail.com
- Sam Klein, Wikipedia, metasj AT gmail.com
The Digital Public Library of America is a growing open-source platform to support digital libraries and archives of all kinds. DPLA-alpha is available for testing, with data from six initial Hubs. New APIs and data feeds are in development, with the next release scheduled for April.
Come learn what we are doing, how to contribute or hack the DPLA roadmap, and how you (or your favorite institution) can draw from and publish through it. Larger institutions can join as a (content or service) hub, helping to aggregate and share metadata and services from across their {region, field, archive-type}. We will discuss current challenges and possibilities (UI and API suggestions wanted!), apps being built on the platform, and related digitization efforts.
DPLA has a transparent community and planning process; new participants are always welcome. Half the time will be for suggestions and discussion. Please bring proposals, problems, partnerships and possible paradoxes to discuss.
Introduction to SilverStripe 3.0
- Ian Walls, University of Massachusetts Amherst, iwalls AT library DOT umass DOT edu
SilverStripe is an open source Content Management System/development framework out of New Zealand, written in PHP, with a solid MVC structure. This presentation will cover everything you need to know to get started with SilverStripe, including
- Features (and why you should consider SilverStripe)
- Requirements & Installation
- Model-View-Controller
- Key data types & configuration settings
- Modules
- Where to start with customization
- Community support and participation
Citation search in SOLR and second-order operators
- Roman Chyla, Astrophysics Data System, roman.chyla AT (cfa.harvad.edu|gmail.com)
Citation search is basically about connections (Is the paper read by a friend of mine more important than others? Get me a paper read by somebody who cites many papers/is cited by many papers?), but the implementation of the citation search is surprisingly useful in many other areas.
I will show the 'guts' of the new citation search for astrophysics; it is generic and can be applied recursively to any Lucene query. Some people would call it a second-order operation because it works with the results of the previous (search) function. The talk will cover technical details of the special query class, its collectors, how to add a new search operator and how to influence relevance scores. Then you can type with me: friends_of(friends_of(cited_for(keyword:"black holes") AND keyword:"red dwarf"))
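To make the idea of a second-order operation concrete, here is a minimal in-memory sketch in Python. It is not the ADS implementation and does not use Lucene: the toy graph, the keyword index, and the operator names `references` and `citations` are all assumptions made for illustration, and they are not meant to reproduce the semantics of `friends_of` or `cited_for`. The point is only that each operator consumes the result of a previous search, produces scored results, and can therefore be nested.

```python
from collections import Counter

# Hypothetical toy citation graph: paper id -> ids of the papers it cites.
CITES = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": [],
    "D": ["A", "C"],
}

# Hypothetical keyword index: paper id -> set of keywords.
INDEX = {
    "A": {"black holes"},
    "B": {"red dwarf"},
    "C": {"black holes"},
    "D": {"red dwarf"},
}

def search(keyword):
    """First-order query: ids of papers tagged with the keyword."""
    return {doc for doc, words in INDEX.items() if keyword in words}

def references(results):
    """Second-order operator: papers cited by the input results.
    Score = number of input papers that cite the paper."""
    scores = Counter()
    for doc in results:
        for target in CITES.get(doc, []):
            scores[target] += 1
    return scores

def citations(results):
    """Second-order operator: papers that cite at least one input result.
    Score = number of input papers the paper cites."""
    scores = Counter()
    for doc, targets in CITES.items():
        hits = sum(1 for target in targets if target in results)
        if hits:
            scores[doc] = hits
    return scores

# Operators take any collection of ids, so they nest like the query above.
hits = search("black holes")
nested = citations(references(hits))
```

Because each operator returns a mapping from paper id to score, the outer operator can reuse or ignore the inner scores; this sketch ignores them and counts links only, which is one of several possible scoring choices.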
Managing Segmented Images and Hierarchical Collections with Fedora-Commons and Solr
- David Lacy, Villanova University, david DOT lacy AT villanova.edu
Many of the resources within our digital library are split into parts -- newspapers, scrapbooks and journals being examples of collections of individual scanned pages. In some cases, groups of pages within a collection, or segments within a particular page, may also represent chapters or articles.
We recently devised a procedure to extract these "segmented resources" into their own objects within our repository, and index them individually in our Discovery Layer.
In this talk I will explain how we dissected and organized these newly created resources with an extension to our Fedora Model, and how we make them discoverable through Solr configurations that facilitate browsable hierarchical relationships and field-collapsed results that group items within relevant resources.
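As a rough illustration of what a field-collapsed request can look like, the Python sketch below builds a Solr select URL using the standard result grouping parameters (`group`, `group.field`, `group.limit`). The core URL, the search term, and the field name `parent_id_str` are hypothetical placeholders, not Villanova's actual configuration.

```python
from urllib.parse import urlencode

def grouped_query_url(base_url, text, group_field, per_group=3):
    """Build a Solr select URL that groups hits by the value of one field.

    base_url and group_field are placeholders for illustration; the
    parameter names are Solr's result grouping parameters.
    """
    params = {
        "q": text,
        "group": "true",             # turn on result grouping
        "group.field": group_field,  # e.g. the id of the parent resource
        "group.limit": per_group,    # how many segments to show per parent
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"

url = grouped_query_url(
    "http://localhost:8983/solr/demo",  # hypothetical core
    "lincoln",
    "parent_id_str",                    # hypothetical field name
)
```

With a request of this shape, each matching page or article segment is returned under the resource it belongs to, rather than as an unrelated top-level hit.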