2013 talks proposals

51,195 bytes added, 02:38, 5 September 2014

'''Voting is complete. See results at: http://vote.code4lib.org/election/results/24'''

'''[http://code4lib.org/conference/2013/schedule 2013 Conference Schedule]''' Deadline for talk submission '''has closed'''. We ask that no changes be made after this point, so that every voter reads the same thing. You can update your description again after voting closes.

Prepared talks are 20 minutes (including setup and questions), and focus on one or more of the following areas:

Abstract of no more than 500 words.

</pre>

== All Teh Metadatas Re-Revisited ==

* Esme Cowles, UC San Diego Library, escowles AT ucsd DOT edu

* Matt Critchlow, UC San Diego Library, mcritchlow AT ucsd DOT edu

* Bradley Westbrook, UC San Diego Library, bdwestbrook AT ucsd DOT edu

Last year Declan Fleming presented ALL TEH METADATAS and reviewed our UC

San Diego Library Digital Asset Management system and RDF data model. You

may be shocked to hear that all that metadata wasn't quite enough to

handle increasingly complex digital library and research data in an

elegant way. Our ad-hoc, 8-year-old data model has also been added to in

inconsistent ways and our librarians and developers have not always been

perfectly in sync in understanding how the data model has evolved over

time.

In this presentation we'll review our process of locking a team of

librarians and developers in a room to figure out a new data model, from

domain definition through building and testing an OWL ontology. We'll also

cover the challenges we ran into, including the review of existing

controlled vocabularies and ontologies, or lack thereof, and the decisions

made to cover the gaps. Finally, we'll discuss how we engaged the digital

library community for feedback and what we have to do next. We all know

that Things Fall Apart; this is our attempt at Doing Better This Time.

== Modernizing VuFind with Zend Framework 2 ==

Can you use HTML5 video now? Yes.

I'll show you how to get started using HTML5 video, including gotchas, tips, and tricks. Beyond the basics we'll see the power of having video integrated into HTML and the browser. We'll look at how to interact with video (and other time-based media) via JavaScript. Finally, we'll look at examples that push the limits and show the exciting future of video on the Web.

My experience comes from technical development of an oral history video clips project. I developed the technical aspects of the project, including video processing, server configuration, development of a public site, creation of an administrative interface, and video engagement analytics. Major portions of this work have been open sourced under an MIT license.

== Hands off! Best Practices and Top Ten Lists for Code Handoffs ==

* Naomi Dushay, Stanford University Library, ndushay AT stanford DOT edu
* Bess Sadler, Stanford University Library, bess AT stanford DOT edu (as mouthpiece for multiple contributors)

Transition points in who is the primary developer on an actively developing code base can be a source of frustration for everyone involved. We've tried to minimize that pain point as much as possible through the use of agile methods like test driven development, continuous integration, and modular design. Has optimizing for developer happiness brought us happiness? What's worked, what hasn't, and what's worth adopting? How do you keep your project in a state where you can easily hand it off?

== How to be an effective evangelist for your open source project ==

The difference between an open source software project that gets new adopters and new contributing community members (which is to say, a project that goes on existing for any length of time) and a project that doesn't, often isn't a question of superior design or technology. It's more often a question of whether the advocates for the project can convince institutional leaders AND front line developers that a project is stable and trustworthy. What are successful strategies for attracting development partners? I'll try to answer that and talk about what we could do as a community to make collaboration easier.

== Thoughts from an open source vendor - What makes a "good" vendor in a meritocracy? ==

* Matt Zumwalt, Data Curation Experts / MediaShelf / Hydra Project, matt@curationexperts.com

What is the role of vendors in open source? What should be the position of vendors in a meritocracy? What are the avenues for encouraging great vendors who contribute to open source communities in valuable ways? How you answer these questions has a huge impact on a community, and in order to formulate strong answers, you need to be well informed. Let’s glimpse at the business practicalities of this situation, beginning with 1) an overview of the viable profit models for open-source software, 2) some of the realities of vendor involvement in open source, and 3) an account of the ins & outs of compensation & equity structures within for-profit corporations.

The topics of power & influence, fairness, community participation, software quality, employment and personal profit are fair game, along with software licensing, support, sponsorship, closed source software and the role of sales people.

This presentation will draw on personal experience from the past seven years spent bootstrapping and running MediaShelf, a small but prolific for-profit consulting company that focuses entirely on open source digital repository software. MediaShelf has played an active role in creating the Hydra Framework and continuously contributes to maintenance of Fedora and Blacklight. Those contributions have been funded through consulting contracts for authoring & implementing open source software on behalf of organizations around the world.

==Occam’s Reader: A system that allows the sharing of eBooks via Interlibrary Loan==

Mobile is the new hotness ... and you can't be one of the cool kids unless you've got your own mobile app ... but the road to mobility is daunting. I'll argue that it's actually easier than it seems ... and that the simplest way to mobility is to bring your data to the party, create a REST API around the data, tell developers about your API, and then let the magic happen. To make my argument concrete, I'll show (lord help me!) how to go from an interesting REST API to a fun iOS tool for librarians and the general public in twenty minutes.

== ARCHITECTING ScholarSphere: How We Built a Repository App That Doesn't Feel Like Yet Another Janky Old Repository App ==

* Dan Coughlin, Penn State University, danny@psu.edu

This talk will focus on some effective best practices, and some not-so-great but necessary practices, that we have adopted to develop and improve our users' experience using JavaScript/jQuery and CSS to manipulate our hosted environments. This will include a review of available tools that allow collaborative development in the cloud, as well as examples of jQuery methods that have allowed us to take additional control of these hosted environments and to track them using Google Analytics. Included will be examples from Springshare Campus Guides, CONTENTdm and other hosted web spaces that have been ‘hacked’ to improve the UI.

== Hacking the DPLA ==

* Nate Hill, Chattanooga Public Library, nathanielhill AT gmail.com

* Sam Klein, Wikipedia, metasj AT gmail.com

The Digital Public Library of America is a growing open-source platform to support digital libraries and archives of all kinds. DPLA-alpha is available for testing, with data from six initial Hubs. New APIs and data feeds are in development, with the next release scheduled for April.

Come learn what we are doing, how to contribute or hack the DPLA roadmap, and how you (or your favorite institution) can draw from and publish through it. Larger institutions can join as a (content or service) hub, helping to aggregate and share metadata and services from across their {region, field, archive-type}. We will discuss current challenges and possibilities (UI and API suggestions wanted!), apps being built on the platform, and related digitization efforts.

DPLA has a transparent community and planning process; new participants are always welcome. Half the time will be for suggestions and discussion. Please bring proposals, problems, partnerships and possible paradoxes to discuss.

== Introduction to SilverStripe 3.0 ==

* Ian Walls, University of Massachusetts Amherst, iwalls AT library DOT umass DOT edu

SilverStripe is an open source Content Management System/development framework out of New Zealand, written in PHP, with a solid MVC structure. This presentation will cover everything you need to know to get started with SilverStripe, including

* Features (and why you should consider SilverStripe)

* Requirements & Installation

* Model-View-Controller

* Key data types & configuration settings

* Modules

* Where to start with customization

* Community support and participation

== Citation search in SOLR and second-order operators ==

* Roman Chyla, Astrophysics Data System, roman.chyla AT (cfa.harvard.edu|gmail.com)

Citation search is basically about connections (Is the paper read by a friend of mine more important than others? Get me a paper read by somebody who cites many papers/is cited by many papers?), but the implementation of the citation search is surprisingly useful in many other areas.

I will show the 'guts' of the new citation search for astrophysics; it is generic and can be applied recursively to any Lucene query. Some people would call it a second-order operation because it works with the results of the previous (search) function. The talk will cover technical details of the special query class, its collectors, how to add a new search operator, and how to influence relevance scores. Then you can type with me: friends_of(friends_of(cited_for(keyword:"black holes") AND keyword:"red dwarf"))
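The idea of a second-order operator can be sketched in a few lines of plain Python. This is only an illustration of the concept, not the ADS or Lucene implementation: the toy citation graph, the keyword index, and the operator names <code>references</code> and <code>citations</code> are all hypothetical. A first-order search returns a set of papers, and a second-order operator takes that set as input and returns a new set, so operators can be nested.

```python
from typing import Dict, Set

# Hypothetical toy data: paper id -> ids of the papers it cites.
CITES: Dict[str, Set[str]] = {
    "p1": {"p2", "p3"},
    "p2": {"p3"},
    "p3": set(),
    "p4": {"p1"},
}

# Hypothetical keyword index standing in for a first-order Lucene query.
KEYWORDS: Dict[str, Set[str]] = {
    "black holes": {"p1", "p2"},
    "red dwarf": {"p3", "p4"},
}


def keyword(term: str) -> Set[str]:
    """First-order search: papers matching a keyword."""
    return set(KEYWORDS.get(term, set()))


def references(results: Set[str]) -> Set[str]:
    """Second-order operator: papers cited by any paper in `results`."""
    cited: Set[str] = set()
    for paper in results:
        cited |= CITES.get(paper, set())
    return cited


def citations(results: Set[str]) -> Set[str]:
    """Second-order operator: papers that cite any paper in `results`."""
    return {paper for paper, refs in CITES.items() if refs & results}
```

Because every operator consumes and produces a set of papers, calls compose freely: <code>citations(citations(keyword("red dwarf")))</code> first finds the papers citing the keyword matches, then the papers citing those.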

== Managing Segmented Images and Hierarchical Collections with Fedora-Commons and Solr ==

* David Lacy, Villanova University, david DOT lacy AT villanova.edu

Many of the resources within our digital library are split into parts -- newspapers, scrapbooks and journals being examples of collections of individual scanned pages. In some cases, groups of pages within a collection, or segments within a particular page, may also represent chapters or articles.

We recently devised a procedure to extract these "segmented resources" into their own objects within our repository, and index them individually in our Discovery Layer.

In this talk I will explain how we dissected and organized these newly created resources with an extension to our Fedora Model, and how we make them discoverable through Solr configurations that facilitate browsable hierarchical relationships and field-collapsed results that group items within relevant resources.

== Google Analytics, Event Tracking and Discovery Tools==

* Emily Lynema, North Carolina State University Libraries, ejlynema AT ncsu DOT edu

* Adam Constabaris, North Carolina State University Libraries, ajconsta AT ncsu DOT edu

The NCSU Libraries is using Google Analytics increasingly across its website as a replacement for usage tracking via Urchin. More recently, we have also begun to use the event tracking features in Google Analytics. This has allowed us to gather usage statistics for activities that don’t initiate new requests to the server, such as clicks that hide and show already-loaded content (as in many tabbed interfaces). Aggregating these events together with pageview tracking in Google Analytics presents a more unified picture of patron activity and can help improve design of tools like the library catalog. While assuming a basic understanding of the use of Google Analytics pageview tracking, this presentation will start with an introduction to the event tracking capabilities that may be less widely known.

We’ll share library catalog usage data pulled from Google Analytics, including information about features that are common across the newest wave of catalog interfaces, such as tabbed content, Google Preview, and shelf browse. We will also cover the approach taken for the technical implementation of this data-intensive JavaScript event tracking.

As a counterpart, we can demonstrate how we have begun to use Google Analytics event tracking in a proprietary vendor discovery tool (Serials Solutions Summon). While the same technical ideas govern this implementation, we can highlight the differences (read: challenges) inherent in using this type of event tracking in a vendor-owned application vs. a locally developed application.

Along the way, hopefully you’ll learn a little about why you might (or might not) want to use Google Analytics event tracking yourself and see some interesting catalog usage stats.

== Actions speak louder than words: Analyzing large-scale query logs to improve the research experience ==

* Raman Chandrasekar, Serials Solutions, Raman DOT Chandrasekar AT serialssolutions DOT com

* Ted Diamond, Serials Solutions, Ted DOT Diamond AT serialssolutions DOT com

Analyzing anonymized query and click-through logs leads to a better understanding of user behaviors and intentions and provides great opportunities to respond to users with an improved search experience. A large-scale provider of SaaS services, Serials Solutions is uniquely positioned to learn from the dataset of queries aggregated from the Summon service, generated by millions of users at hundreds of libraries around the world.

In this session, we will describe our Relevance Metrics Framework and provide examples of insights gained during its development and implementation. We will also cover recent product changes inspired by these insights. Chandra and Ted, from the Summon dev team, will share insights and outcomes from this ongoing process and highlight how analysis of large-scale query logs helps improve the academic research experience.

== Supporting Gaming in the College Classroom ==

*Megan O'Neill, Albion College, moneill AT albion DOT edu

Faculty are increasingly interested both in teaching with games and in gamifying their courses. Introducing digital games and game support for faculty through the library makes a lot of sense, but it comes with a thorny set of issues. This talk will discuss our library's initial steps toward creating a digital gamerspace and game support infrastructure in the library, including:

1) The scope and acquisitions decisions that make the most sense for us, and 2) Some difficulties we've discovered in trying to get our collection, physical- , digital- and head-space, and infrastructure up and going.

There will also be an extremely brief overview of WHY we decided to teach with games and to support gamification, what (if anything) to do about mobile gaming, and where games in education might be going.

== Codecraft ==

* Devon Smith, OCLC Research, smithde@oclc.org

We can think of and talk about software development as science, engineering, and craft. In this presentation, I'll talk about the craft aspect of software. From Wikipedia[1]: "In English, to describe something as a craft is to describe it as lying somewhere between an art (which relies on talent and technique) and a science (which relies on knowledge). In this sense, the English word craft is roughly equivalent to the ancient Greek term techne." Of the questions who, what, where, why, when, and how, I will focus on why and how, with a minor in where.

'''N.B.''': This will be a NON-TECHNICAL talk.

[1] https://en.wikipedia.org/wiki/Craft#Classification

== KnowBot: A Tool to Manage Reference and Beyond ==

* Sarah Park, Northwest Missouri State University

* Hong Gyu Han, Northwest Missouri State University

* Lori Mardis, Northwest Missouri State University

Northwest Missouri State University has developed and used RefPole for collecting and analyzing reference statistics since 2005. RefPole was a tool to answer librarians’ needs to manage reference statistics and knowledge among librarians. It was an analysis tool for the library leaders to make decisions on library operations. RefPole was adequate for internal use; however, it was developed for local access, which keeps the collective reference knowledge from being shared beyond the desktop and from being accessed by students and faculty.

In 2011, responding to growing internal and external need, the library developed a web-based knowledge base management system, KnowBot, in Ruby on Rails. KnowBot offers public searching, rating, cloud tagging, librarian, and reporting interfaces. With the additional public interfaces, it also extended reference services to 24/7. Librarians can record responses to questions with graphics and multimedia. The reporting interface features not only simple transactional data but also a multi-dimensional analytic tool in real time.

The presenters will demonstrate KnowBot; share the source code; and discuss the use of the knowledge base to answer the organizational and public need.

== Creating a (mostly) integrated Patron Account with SirsiDynix Symphony and ILLiad ==

* Emily Lynema, North Carolina State University Libraries, ejlynema AT ncsu DOT edu

* Jason Raitz, North Carolina State University Libraries, jcraitz AT ncsu DOT edu

In 2012, the NCSU Libraries at long last replaced a vendor “my account” tool that had been running unsupported for years. With the opportunity to create something new, one of the initial goals was a user experience that more seamlessly combined ILS data from SirsiDynix Symphony with ILL data from ILLiad. As a Kuali OLE beta partner, the NCSU Libraries is looking at an ILS migration within the next few years, so another goal was to build the interface on top of a standard so it would not have to be re-written as part of the migration. And the icing on the cake was a transition from a local Perl-based authentication system to the newer campus-wide Shibboleth authentication.

This presentation will start with our design goals for a new user interface, include a demonstration, and describe the simple techniques used to provide a more integrated view of Symphony and ILLiad patron data. The backbone of the actual application is built using Zend’s PHP Framework and integrates eXtensible Catalog’s NCIP Toolkit to reach out to Symphony for patron data. In addition, we can talk about our successes (and difficulties) using jQuery Mobile to create a mobile view using the same underlying code as the web version. As one of our first Shibboleth applications here in the Libraries, this experience also taught us first-hand about some of the challenges of this type of single sign-on.

== SKOS Name Authority in a DSpace Institutional Repository ==

* Tom Johnson, Oregon State University, thomas.johnson@oregonstate.edu

Name ambiguity is widespread in institutional repositories. Searching by author, users are typically greeted by a variety of misspellings and permutations of initials, collision between contributors with similar names, and other problems inherent in uncontrolled (often user-submitted) data. While DSpace has the technical capacity to use controlled names, it relies on outside authority files (from LoC, for example) to do the heavy lifting. For institutional authors, this leaves a major coverage gap and creates namespace pollution on a vast scale (try searching [http://authorities.loc.gov authorities.loc.gov] for "Johnson, John", sometime).

OSU is solving this problem with an institutionally scoped, low maintenance SKOS/FOAF "name authority file". People in the IR are assigned URIs, names are maintained as skos:prefLabel, altLabel, or hiddenLabel. We've developed a simple Python application allowing staff to update individual "records", and code on the DSpace side to access the dataset over SPARQL. This presentation will walk you through where we are now, limitations we've run into, and possibilities for the future.

== Meta-Harvesting: Harvesting the Harvesters ==

* Steven Anderson, Boston Public Library, sanderson AT bpl DOT org

* Eben English, Boston Public Library, eenglish AT bpl DOT org

The emerging Digital Public Library of America (http://dp.la/) has proposed to aggregate digital content for search and discovery from several regional "service hubs" that will provide metadata via an as-yet-unspecified harvest process. As these service hubs are already harvesters of digital content from myriad sources themselves, the potential for "telephone game"-esque data loss and/or transmutation is a significant danger.

This talk will discuss the experience of Digital Commonwealth (http://www.digitalcommonwealth.org/), a statewide digital repository currently in the process of being revamped, refactored, and redesigned by the Boston Public Library using the Hydra Framework. The repository, which aggregates data from over 20 institutions (some of which are themselves aggregators), is also undergoing a massive metadata cleanup effort as records are prepared to be ingested into the DPLA as one of the regional service hubs. Topics will include automated and manual processes for data crosswalking and cleanup, advanced OAI-PMH chops, and the implications of the (at this time still-emerging) metadata standards and APIs being created by the DPLA.

Every crosswalk, transformation, migration, harvest, or export/ingest of metadata requires informed decision making and precise attention to detail. This talk will provide insight into key decision points and potential quagmires, as well as a discussion of the challenges of dealing with heterogeneous data from a wide variety of institutions.

== Pay No More Than £3 // DIY Digital Curation ==

* Chris Fitzpatrick, World Maritime University, cf AT wmu DOT se

Are you a small library or archive? <br>

Do you feel you are being held back by limited technical resources?<br>

Tired of waiting around for the Google Books Library people to reply to your emails? <br>

Join the club. Open-source software, hackerspaces, dirt cheap storage, cloud computing, and social media make it possible for any institution to start curating digitally. Today.

This talk will cover some of the guerrilla tactics being employed to drag a small university's large collection into the internet age.

Topics will include:

*Cheap and effective document scanning methods.

*Valuable resources found at your local hackerspace / makerspace / fablab.

*Metadata enrichment for the not-so-rich and NLP for the people.

*Utilizing social media to crowdsource your collection building.

*How to post-process, OCR, PDF, and ePub your documents using Free software.

*Ways to build out a digital repository with no servers, code, or large 2-year grants required. (ok, maybe some code).

== IIIF: One Image Delivery API to Rule Them All ==

* Willy Mene, Stanford University Libraries, wmene AT stanford DOT edu

* Stuart Snydman, Stanford University Libraries, snydman AT stanford DOT edu

The International Image Interoperability Framework was conceived of by a group of research and national libraries determined to achieve the holy grail of seamless sharing and reuse of images in digital image repositories and applications. By converging on common APIs for image delivery, metadata transmission and search, it is catalyzing the development of a new wave of interoperable image delivery software that will surpass the current crop of image viewers, page turners, and navigation systems, and in so doing give scholars an unprecedented level of consistent and rich access to image-based resources across participating repositories.

The IIIF Image API (http://library.stanford.edu/iiif/image-api) specifies a web service that returns an image in response to a standard http or https request. The URL can specify the region, size, rotation, quality characteristics and format of the requested image. A URL can also be constructed to request basic technical information about the image to support client applications. The API could be adopted by any image repository or service, and can be used to retrieve static images in response to a properly constructed URL.
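As a rough sketch of what such a URL looks like, the helper below assembles an image request from the five parameters named above. The server base <code>http://example.org/iiif</code> and the function names are hypothetical, and the segment order (identifier/region/size/rotation/quality.format), the default values <code>full</code> and <code>native</code>, and the <code>info.json</code> document name reflect one reading of version 1 of the specification; check them against the spec before relying on them.

```python
from urllib.parse import quote


def iiif_image_url(base: str, identifier: str, region: str = "full",
                   size: str = "full", rotation: str = "0",
                   quality: str = "native", fmt: str = "jpg") -> str:
    """Build an image request URL from the five request parameters.

    Assumed layout: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    # Percent-encode the identifier so a "/" inside it is not read as a path separator.
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base: str, identifier: str) -> str:
    """Build the URL for the basic technical information about an image."""
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/info.json"
```

For example, <code>iiif_image_url("http://example.org/iiif", "abc123", region="0,0,100,100", size="pct:50", rotation="90")</code> yields <code>http://example.org/iiif/abc123/0,0,100,100/pct:50/90/native.jpg</code>.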

In this presentation we will review version 1 of the IIIF Image API and validator, demonstrate applications by daring early adopters, and encourage widespread adoption.

== Data-Driven Documents: Visualizing library data with D3.js ==

* Bret Davidson, North Carolina State University Libraries, bret_davidson@ncsu.edu

Several JavaScript libraries have emerged over the past few years for creating rich, interactive visualizations using web standards. Few are as powerful and flexible as D3.js[1]. D3 stands apart by merging web standards with a rich API and a unique approach to binding data to DOM elements, allowing you to apply data-driven transformations to a document. This emphasis on data over presentation has made D3 very popular; D3 is used by several prominent organizations including the New York Times[2], GOV.UK[3], and Trulia[4].

Power usually comes at a cost, and D3 makes you pay with a steeper learning curve than many alternatives. In this talk, I will get you over the hump by introducing the core construct of D3, the Data-Join. I will also discuss when you might want to use D3.js, share some examples, and explore some advanced utilities like scales and shapes. I will close with a brief overview of how we are successfully using D3 at NCSU[5] and why investing time in learning D3 might make sense for your library.

*[1]http://d3js.org/

*[2]http://www.nytimes.com/interactive/2012/08/24/us/drought-crops.html

*[3]https://www.gov.uk/performance/dashboard

*[4]http://trends.truliablog.com/vis/pricerange-boston/

*[5]http://www.lib.ncsu.edu/dli/projects/spaceassesstool

== ''n'' Characters in Search of an Author ==

* Jay Luker, IT Specialist, Smithsonian Astrophysics Data System, jluker@cfa.harvard.edu

When it comes to author names the disconnect between our metadata and what a user might enter into a search box presents challenges when trying to maximize both precision and recall [0]. When indexing a paper written by "Wäterwheels, A" a goal should be to preserve as much as possible the original information. However, users searching by author name may frequently omit the diaeresis and search for simply, "Waterwheels". The reverse of this scenario is also possible, i.e., your decrepit metadata contains only the ASCII, "Supybot, Zoia", whereas the user enters, "Supybot, Zóia". If recall is your highest priority the simple solution is to always downgrade to ASCII when indexing and querying. However this strategy sacrifices precision, as you will be unable to provide an "exact" search, necessary in cases where "Hacker, J" and "Häcker, J" really are two distinct authors.
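The trade-off described above can be illustrated with a small sketch that keeps both forms of a name: the original for "exact" matching and an ASCII-folded form for recall. This is not the ADS implementation; the function names are hypothetical, and the folding relies only on Unicode decomposition, so characters without a decomposition (for example "ø" or "ß") would need an additional mapping table.

```python
import unicodedata
from typing import Dict


def ascii_fold(name: str) -> str:
    """Strip combining marks after decomposition, e.g. 'ä' -> 'a'."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def index_forms(name: str) -> Dict[str, str]:
    """Keep the original (normalized) name and a folded, lowercased form."""
    return {
        "exact": unicodedata.normalize("NFC", name),
        "folded": ascii_fold(name).lower(),
    }


def matches(query: str, indexed: Dict[str, str], exact: bool = False) -> bool:
    """Exact mode preserves diacritics; the default mode favors recall."""
    if exact:
        return unicodedata.normalize("NFC", query) == indexed["exact"]
    return ascii_fold(query).lower() == indexed["folded"]
```

With this arrangement a default search for "Waterwheels, A" finds "Wäterwheels, A", and a search for "Supybot, Zóia" finds the ASCII-only record, while an exact search still distinguishes "Hacker, J" from "Häcker, J".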

This talk will describe the strategy ADS[1] has devised for addressing common and edge-case problems faced when dealing with author name indexing and searching. I will cover the approach we devised to not only the transliteration issue described above, but also how we deal with author initials vs. full first and/or middle names, authors who have published under different forms of their name, authors who change their names (wha? people get married?!). Our implementation relies on Solr/Lucene[2], but my goal is an 80/20 mix of high- vs. low-level details to keep things both useful and stackgnostic [3].

*[0] http://en.wikipedia.org/wiki/Precision_and_recall

*[1] http://www.adsabs.harvard.edu/

*[2] http://lucene.apache.org/solr/

*[3] http://en.wikipedia.org/wiki/Portmanteau

== But, does it all still work : Testing Drupal with simpletest and casperjs ==

* David Kinzer - Lead Developer, Jenkins Law Library, dkinzer@jenkinslaw.org

* Chad Nelson - Developer, Jenkins Law Library, cnelson@jenkinslaw.org

Most developers know that they should be writing tests along with their code, but not every developer knows how or where to get started. This talk will walk through the nuts and bolts of testing a medium-sized Drupal site with many integrated moving parts. We’ll talk about unit testing of individual functions with [http://www.simpletest.org/en/overview.html SimpleTest] (and how that has changed how we write functions), and functional testing of the user interface with [http://casperjs.org/ casperjs]. We will discuss automating deployment with [http://www.phing.info/ phing], [http://drupal.org/project/drush drush], [http://jenkins-ci.org/ jenkins-ci] & github, which, combined with our tests, removes the “hold-your-breath” feeling before updating our live site.

[[Category:Code4Lib2013]]

== Relations, Recommendations and PostgreSQL ==

* William Denton, Web Librarian, York University, wdenton@yorku.ca

* Dan Scott, Systems Librarian, Laurentian University, dscott@laurentian.ca

In 2012, a ragtag group of library hackers from various Ontario

universities, funded with only train tickets and fueled with Tim Hortons

coffee, assembled under the Scholars Portal banner to build a common

circulation data repository and recommendation engine: the Scholars

Portal Library Usage-based Recommendation Engine (SPLURGE). PostgreSQL,

the emerging darling of the old-school relational database world, is the

heart of SPLURGE, and the circulation data for Ontario's 400,000

university students is its blood. Two of the contributors to this effort explore the PostgreSQL features

that SPLURGE uses to ease administration efforts, simplify application

development, and deliver high performance results. If you don't use

PostgreSQL for your data, you might want to try it after this

presentation; if you already do, you'll pick up some new tips and tricks.

== A Cure for Romnesia: Site Story Web-Archiving ==

* Harihar Shankar, Research Library, Los Alamos National Laboratory, harihar@lanl.gov

The web changes constantly, erasing both inconvenient facts and

fictions. At web-scale, preservation organizations cannot be expected

to keep up by using traditional crawling, and they already miss many

important versions. The cure for this is to capture the interactions

between real browsers and the server, and push these into an archive

for safe keeping rather than trying to guess when pages change.

Every time the Apache Web Server sends data to a browser, SiteStory’s

Apache Module also pushes this data to the SiteStory Web Archive. The

same version of a resource will not be archived more than once, no

matter how many times it has been requested. The resulting archive is

effectively representative of a server's entire history, although

versions of resources that are never requested by a browser will also

never be archived.
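The archive-on-response idea can be sketched as follows. This is not SiteStory's code: the class and method names are hypothetical, and comparing a content digest against the most recently stored version is an assumption about how "the same version" might be detected.

```python
import hashlib
from typing import Dict, List, Tuple


class ToyArchive:
    """Stores a new version of a URL only when the served content changes."""

    def __init__(self) -> None:
        # url -> list of (timestamp, digest, body), oldest first
        self._versions: Dict[str, List[Tuple[float, str, bytes]]] = {}

    def record(self, url: str, body: bytes, timestamp: float) -> bool:
        """Called for every response sent to a browser; returns True if stored."""
        digest = hashlib.sha256(body).hexdigest()
        history = self._versions.setdefault(url, [])
        if history and history[-1][1] == digest:
            return False  # same version as last time: do not archive again
        history.append((timestamp, digest, body))
        return True

    def versions(self, url: str) -> List[float]:
        """Timestamps of the archived versions of a resource."""
        return [ts for ts, _, _ in self._versions.get(url, [])]
```

A resource that is never requested never reaches <code>record</code>, so it has no versions, which mirrors the limitation noted above.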

In this presentation I will give an overview of SiteStory, an

Open-Source project written in Java that runs as an application under

Tomcat 6 or greater. SiteStory’s Apache Module is written in C. I will

also demonstrate the TimeMap tool that visualizes versions of a

resource available in the SiteStory archive. The TimeMap tool is a

Firefox browser extension that plots versions of a resource on a

SIMILE timeline. Since the tool uses the Memento protocol, it can

also display versions of resources available in Memento compliant web

archives and content management systems.

== Practical Relevance Ranking for 10 million books. ==

* Tom Burton-West, University of Michigan Library, tburtonw@umich.edu

[http://www.hathitrust.org/ HathiTrust Full-text search] indexes the full-text and metadata for over 10 million books. There are many challenges in tuning relevance ranking for a collection of this size. This talk will discuss some of the underlying issues, some of our experiments to improve relevance ranking, and our ongoing efforts to develop a principled framework for testing changes to relevance ranking.

Some of the topics covered will include:

* Length normalization for indexing the full-text of book-length documents

* Indexing granularity for books

*Testing new features in Solr 4.0:

**New ranking formulas that should work better with book-length documents: BM25 and DFR.

**Grouping/Field Collapsing. Can we index 3 billion pages and then use Solr's field collapsing feature to rank books according to the most relevant page(s)?

**Finite State Automata/Block Trees for storing the in-memory index to the index. Will this let us support wildcards/truncation despite over 2 billion unique terms per index?

*Relevance testing methodologies: Query log analysis, Click models, Interleaving, A/B testing, and Test collection based evaluation.

*Testing of a new high-performance storage system to be installed in early 2013. We will report on any tests we are able to run prior to conference time.
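
To make the length normalization topic concrete, here is a minimal sketch of a common BM25 variant scoring a single query term. It is not the HathiTrust or Solr implementation; the function name, the example document lengths, and the default `k1` and `b` values are assumptions chosen for illustration.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Score one query term against one document using a common BM25 variant."""
    # Inverse document frequency: rarer terms contribute more.
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: b=0 ignores document length, b=1 normalizes fully.
    norm = 1 - b + b * (doc_len / avg_doc_len)
    return idf * (tf * (k1 + 1)) / (tf + k1 * norm)


# Same term frequency in a short pamphlet and in a very long book (made-up lengths).
common = dict(tf=5, avg_doc_len=100_000, n_docs=1_000, doc_freq=10)
pamphlet = bm25_term_score(doc_len=1_000, **common)
long_book = bm25_term_score(doc_len=300_000, **common)

# With b=0 the length penalty disappears and both documents score the same.
pamphlet_no_norm = bm25_term_score(doc_len=1_000, b=0, **common)
long_book_no_norm = bm25_term_score(doc_len=300_000, b=0, **common)
```

With the default `b`, the pamphlet outscores the long book for the same term frequency, which is the kind of effect that needs tuning when every document is book-length.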

== Browser/Javascript Integration Testing with Ruby ==

* Jessie Keck, Stanford University, jkeck at stanford dot edu

It's nearly impossible to build a rich web application without JavaScript. We have a lot of great patterns to follow, such as progressive enhancement, to make sure our rich web applications are usable, accessible, and testable. However, when JavaScript is involved, bugs can be introduced that won't get caught by most unit and integration testing frameworks.

This is where Watir (pronounced water) comes in. Watir can be used with popular Ruby testing frameworks like RSpec and Capybara. This talk will show how to use the combination of these tools to write RSpec tests using Watir to spin up an application in a variety of browsers, navigate the application, and make assertions about the page using Capybara.

Tests using Watir are written in Ruby, but they don't necessarily need to test a Ruby application. You can test any application that you can point a browser at, so there are a wide variety of potential uses for tests written with Watir.

== Immanentizing the Google ==

* Will Sexton, Duke University Libraries, will.sexton@duke.edu

* Sean Aery, Duke University Libraries, sean.aery@duke.edu

We're using a "Google-as-a-Service" approach to reduce the complexity and cost of maintaining a structured-data discovery platform for digitized collections and other library-generated content. Our work picks up from a paper in the code4lib Journal by NCSU's Jason Ronallo [1], introducing the idea of embedded schema.org HTML microdata for library digital collections. We've extended our schema.org/RDFa Lite implementation by using Google Site Search to develop a customized interface. In our talk, we'll demonstrate how to set up an instance of Site Search, how to customize the display of results, and how to use the platform's filtering, sorting and other useful functions. We'll also report on our analysis of usage data, and discuss our strategy for scaling the system to support global site search in an upcoming library-wide CMS migration project.

[1] [http://journal.code4lib.org/articles/6400 "HTML5 Microdata and Schema.org", code4lib #16]

== Evolving Towards a Consortium MARCR Redis Datastore ==

* Jeremy Nelson, Colorado College, jeremy.nelson@coloradocollege.edu

* Sheila Yeh, University of Denver, Sheila.Yeh@du.edu

The current state of technology in library automation is not keeping pace with the explosive growth in information storage and retrieval systems. The lag is costly both to institutions and to users' resource discovery. To address this problem, we should look at how successful enterprises such as Craigslist and StackOverflow manage and scale their enormous volumes of data. The key lies in Redis, an open source, advanced NoSQL key-value data structure server. Therefore, Colorado College and the University of Denver, along with the Colorado Alliance of Research Libraries, are exploring and co-developing a MARCR Redis Datastore. It is a peer-to-peer bibliographic datastore, modeled using the Library of Congress Bibliographic Framework's new Linked Data based MARC 21 replacement, called MARCR (MARC Resources). The structure of MARCR lends itself to an advanced Consortium catalog where a Work is cataloged once and multiple institutions have complete control over their own Instances of the Work, de-duplicating cataloging efforts while supporting real-time resource sharing between the Instances. Control, access, and discovery of records in the proposed MARCR Redis Datastore are provided through lightweight HTML5 responsive apps built with Django, Bootstrap, and KnockoutJS that also integrate with both open-source and commercial discovery products.

Redis offers many advantages for a shared MARCR bibliographic datastore, such as speed, scalability, and ease of deployment. In particular, it can support multiple cloud models that benefit institutions of various sizes and budgets. We will demonstrate an MVP (Minimum Viable Product) iteration of this MARCR Datastore, built by transforming MARC 21 records from Colorado College and the University of Denver into Redis, with coordination by the Colorado Alliance of Research Libraries.

== Take Your Content and Shove It ==

* Eric Frierson*, EBSCO Publishing, efrierson@ebscohost.com

Public services librarians have experimented with getting out of the library. For example, the 'embedded librarian' model puts the librarian in class with students, offering help and advice throughout the semester at the point of need. Digital services have also found their way into virtual classrooms by way of links from the course management system (e.g., Blackboard, Moodle) and the occasional embedded search box that serves as a portal into the library's search solution.

With the release of discovery services and their associated APIs, we can do more. Rather than linking back to the library, we can take our resources and push them into the learning experience, allowing them to escape the library website silo altogether. Imagine a professor being able to search library resources and add items to their course website without ever leaving their CMS, or a student adding items to a folder that shows up in their campus dashboard. What if we could tie the use of library resources to student success in the classroom by leveraging user data from CMS tools? In this session, I will briefly describe how APIs might make these scenarios possible, but then facilitate a discussion on where else we could shove our resources. I hope to initiate a few development projects along these lines.

== On Top of Discovery (All Covered with Customizations) ==

* Scott Hanrath, University of Kansas Libraries, shanrath@ku.edu

How and why we've customized the front-end of our vendor library discovery system (Primo) to improve the user experience and integrate with local systems using dollops of JavaScript, a pinch of JSONP, and a smattering of both vendor and simple homegrown APIs. I'll talk about techniques for adding more AJAX to an already AJAX-intensive interface that you don't fully control (and how a few underlying changes could make it easier) and reflect on our meatball-retention odds in the event that somebody sneezes and the underlying interface changes.

Features to be discussed include improving the display of quasi-FRBRized records in search results through subtracting metadata here and adding metadata there, adding a 'did-you-mean' option in an attempt to steer users toward using Boolean operators in the way the system demands, adding fine-grained event tracking with Google Analytics, and porting existing add-ons like special collection requests, augmented stacks locations, and demand-driven acquisitions requests from our last-generation OPAC.

== EAD without XSLT: A Practical New Approach to Web-Based Finding Aids ==

* Trevor Thornton, New York Public Library, trevorthornton@nypl.org

The New York Public Library is reengineering its system for delivering archival finding aids on the Web. The foundation of this system is a data management application, written in Rails, within which collections and their components are managed as associated model instances, and descriptive data is stored natively as JSON and HTML. Front-end applications interact with the back-end via a flexible API that is capable of returning any part of the description at any level. This approach provides a number of benefits over the traditional XML/XSLT approach:

* Data is stored natively in the format in which it is needed by the front-end application, making rendering much faster

* Finding aid data can be lazy-loaded via AJAX requests

* Enables presentation of the archival description beyond the traditional finding aid structure (alternate arrangements, visualizations, etc.)

* Links to digital assets can be maintained independently of archival description

* Data cleanup and normalization can be accomplished during and/or after ingest of original data into the system, ensuring data quality and consistency

* Data is stored in a schema-neutral format, enabling easy transformation into other formats as required (e.g. RDF for semantic web applications, future version(s) of EAD schema for harvesting, etc.)

In this session I will describe the architecture of this system and its data model, and discuss the challenges presented in the design process.
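
As a rough illustration of serving "any part of the description at any level" from natively stored JSON, the sketch below looks up one component in a nested structure and returns it trimmed to a requested depth, which is what a lazy-loading front end would ask for. The data, field names (`components`, `child_count`), and functions are hypothetical and are not NYPL's schema or API.

```python
import json

# Hypothetical collection description stored as nested JSON-like data.
finding_aid = {
    "id": "coll-1",
    "title": "Example family papers",
    "components": [
        {"id": "series-1", "title": "Correspondence", "components": [
            {"id": "file-1", "title": "Letters, 1901-1910", "components": []},
        ]},
        {"id": "series-2", "title": "Photographs", "components": []},
    ],
}


def find_component(node, component_id):
    """Depth-first search for the node with the given id; None if absent."""
    if node["id"] == component_id:
        return node
    for child in node.get("components", []):
        found = find_component(child, component_id)
        if found is not None:
            return found
    return None


def describe(node, depth=0):
    """Return the node trimmed to `depth` levels of children."""
    trimmed = {"id": node["id"], "title": node["title"]}
    if depth > 0:
        trimmed["components"] = [describe(c, depth - 1) for c in node.get("components", [])]
    else:
        # Tell the client how many children it could lazy-load later.
        trimmed["child_count"] = len(node.get("components", []))
    return trimmed


series = describe(find_component(finding_aid, "series-1"), depth=0)
payload = json.dumps(series)  # what an AJAX request for this series might receive
```

Because the stored form is already the form the front end consumes, no transformation step sits between the lookup and the response.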

== Primo / Blackboard Plugin Adaptor Development at Northwestern ==

* Michael North, Northwestern University Libraries, m-north@northwestern.edu

The two most visited websites on campus are the Blackboard Course Management System (CMS) site and the Library Discovery Webpage (powered by Primo). These two sites were perfect for a collaborative project to share functionality between themselves to the benefit of faculty and students.

This collaborative project (using Java, APIs, x-services) was successful in integrating the Library Primo resource records and e-Shelf folders with Blackboard's Course Documents webpages for faculty to use in organizing students' study resources. First we developed a "push" feature used to push individual resources from Primo "into" Blackboard. This is a static link. Second, we created a "pull" feature whereby an entire Primo e-Shelf folder (containing sub-folders and resource records) can be pulled "into" Blackboard. This is a dynamic link. These two functions result in the Blackboard Course Documents page having Primo functionality with either dynamic or static resource links.

This session will share an overview of the project, coding structure, and the technical hurdles that needed to be overcome to combine functionality between two major academically used application products.

== Relishing Quality Assurance Testing with Cucumber ==

*Joseph Dalton, The New York Public Library, josephdalton AT nypl DOT org

For those starting on a test-driven development path, the plethora of options for QA testing can be overwhelming, ranging from writing user stories and simple acceptance tests, to running automated tests with Cucumber and Gherkin (and optionally making these more visible to stakeholders with Relish), to utilizing complex, enterprise-level tools like Quality Center to model business processes.

Although libraries are usually, and sometimes emphatically so, not profit-driven institutions, this doesn't have to mean there can't be a valid role for software quality assurance within our development environments. We've all heard "any test is better than no tests at all," but how do we effectively encourage our own institutions to embrace a test-driven development path and quality-assurance testing when, unlike businesses, our organizations generally aren't tasked with obvious quality-drivers like generating a profit, ROI, etc.?

In this presentation I'll discuss some of the steps the New York Public Library has recently taken to define and develop a QA/Testing framework, in the context of the Library's recent adoption of Agile development practices for its Digital Repository and other project teams.

== I woke up / fell out of bed / checked my mail / and what I read... : PHP to Java to NCIP to ... ==

* John Bodfish, OCLC – bodfishj@oclc.org

* Michelle Suranofsky, Lehigh University – mis306@lehigh.edu

The trailer:

[http://www.youtube.com/watch?v=HCJ0dmW5YEs YouTube video]

It's 10 a.m. and your inbox has an 'Urgent' message from the State Librarian asking for an update on the “NCIP thing” for the statewide project first mentioned (to you) yesterday. You know there’s an open source “NCIP Toolkit” which supports the variety of systems involved in your statewide project, but you’ve also heard it’s pure Java and that’s not your cuppa. Sure, it supports discovery with multiple ILS types, as well as resource sharing, patron empowerment, etc., but is it possible to bridge those worlds? After a few minutes of searching you have a plan for ticking off the “multi-vendor NCIP support” box on the project requirements. We’ll demonstrate a proof-of-concept implementation for PHP developers and report on the issues we encountered and our solutions.

== Powering Complicated Web Form in Rails Using XML ==

* Kristopher Kelly, New York Public Library, kristopherkelly@nypl.org

The New York Public Library recently launched the first phase of its new Metadata Management System, built in-house to create MODS-based metadata for digital assets. Moving from an idiosyncratic database design, the NYPL wanted to use a more standard format. Adopting MODS and XML led to the question of how to store the data. We chose to attempt to store XML in the database and edit it through a web form. Storing bibliographic data in such a way might seem counter-intuitive, but it has proven to solve more problems than it has created.

In this session, I will discuss how we were able to power a complicated form with XML while improving usability and overall performance.

== Message Queues: Event Driven Architecture for NYPL's repository platform ==

* Jason Varghese, New York Public Library, jason dot varghese at nypl.org

At the New York Public Library, the digital repository continues to grow at an astonishing rate, with storage soon to reach petabyte range. As an increasing amount of content is produced, generated, or acquired, workflow automation and scalability have become increasingly important. Workflow involves several organizational units using multiple systems. As a result, reducing the dependencies between our various systems was an important criterion. The message queue enables us to design an event driven system built from a suite of lightweight and interoperable REST-based services. Benefits include traditional drivers such as loose coupling, interoperability between heterogeneous systems, improving application scalability, and many more benefits that will be explored in this talk.
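
The loose coupling that a message queue provides can be sketched with Python's standard library. This is only an in-process stand-in for a real message broker, and it is not NYPL's system: the event type `asset.ingested`, the two consumer functions, and the `SUBSCRIBERS` table are invented for illustration.

```python
import queue
import threading

events = queue.Queue()  # in-process stand-in for a real message broker
handled = []


def derivative_service(event):
    """Hypothetical consumer: reacts to an ingest event without knowing the producer."""
    handled.append(("derivatives", event["asset_id"]))


def index_service(event):
    """Hypothetical consumer: a second, independent reaction to the same event."""
    handled.append(("index", event["asset_id"]))


# Producers only publish event types; this table decides who reacts.
SUBSCRIBERS = {"asset.ingested": [derivative_service, index_service]}


def worker():
    """Deliver each queued event to its subscribers until a None sentinel arrives."""
    while True:
        event = events.get()
        if event is None:
            events.task_done()
            break
        for handler in SUBSCRIBERS.get(event["type"], []):
            handler(event)
        events.task_done()


thread = threading.Thread(target=worker)
thread.start()
events.put({"type": "asset.ingested", "asset_id": "abc123"})  # producer side
events.put(None)  # tell the worker to stop
thread.join()
```

Adding a third consumer means adding one entry to `SUBSCRIBERS`; the code that publishes the event does not change, which is the dependency reduction described above.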

== Synching up at Web Scale: the NISO/OAI ResourceSync Effort ==

* Nettie Lagace, National Information Standards Organization (NISO), nettie AT niso DOT org

It's increasingly the case that to better serve users in a dynamic Web environment, it's desirable to synchronize large-scale web resources accurately, and in real time. However, many current system designs cope with the lack of a good available solution to this requirement by de-emphasizing current coverage or by using tools to manage crawl scheduling. The NISO/OAI ResourceSync effort, funded by the Sloan Foundation and JISC, is currently designing a solution approach that is aligned with general Web Architecture and is targeted at different communities, particularly those in the areas of cultural heritage and research.

The ResourceSync working group has been under way since early 2012, and expects to have its beta draft specification available for public review and testing by the time the Code4Lib conference takes place. This talk will outline the problem cases, the technical approach and reasoning taken by the working group, and invite feedback from the Code4Lib audience.

== The Care and Feeding of a Crowd ==

* Shawn Averkamp, University of Iowa, shawn-averkamp at uiowa.edu

* Matthew Butler, University of Iowa, matthew-butler at uiowa.edu

After a low-tech experiment in crowdsourced transcription grew into a surprisingly successful library initiative and demanded new commitments to user engagement, we found ourselves looking for a more efficient and user-friendly solution. We customized CHNM’s Scripto community transcription tool and various other Omeka plugins to develop a new site: DIYHistory.

We often receive questions about the technical side of both platforms, usually (to our dismay) from libraries who already assume they don't have the IT resources to pursue their own crowdsourcing initiatives. But we found that the software makes up only half of the recipe for success. Do you have compelling content? A long-term commitment to engaging with your users? Are you ready to promote your project far and wide? If so, then deploying a crowdsourcing initiative may be easier than you think.

Our very small development team, which consisted of a healthy mix of technologists and other stakeholders, worked closely and collaboratively on all aspects of the site. We’ll talk about customizing open-source software--how we scaled up functionality and scaled back design to improve user experience and production-level workflows--and how that process served to gently introduce collaborative software practices, such as using Git for version control, into a small, but agile, organization ready to grow. Finally, we'll share our transcription starter kit of forked Scripto and Omeka code and associated documentation for those interested in doing it themselves.

== Linked Open Communism: Better discovery through data dis- and re- aggregation ==

* Corey A Harper, New York University, corey dot harper at nyu dot edu

Current library search interfaces focus on books, journals and articles but offer little access to related entities, such as people, places, and events. These entities are generally only represented as attributes of other metadata records. Linked data can power interfaces that surface these entities as first-class resources, integrating them into results alongside library materials.

This presentation will describe research into such an interface for exploring a particular subject area: the history of the Communist Party & labor movements in the US. A triple store was seeded by 1,600 EAD records from NYU's Tamiment Library and Wagner Labor Archives. Based on access points in the finding aids, the store was further populated with data from various sources, including MARC, id.loc, VIAF, and dbpedia. Identifiers are being assigned for a wide array of typed entities, and triples can then be re-assembled into new entity "records". These new records will be loaded into a discovery interface that will allow typical keyword searching across *all* contained entities, show links between entities, and include faceting on entity types.
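
The dis- and re-aggregation step can be sketched as grouping triples by subject and noting which other entities point at each one. The triples, prefixes, and record layout below are made up for illustration; they are not the project's data or identifiers, and a real implementation would query a triple store rather than a Python list.

```python
from collections import defaultdict

# Made-up (subject, predicate, object) triples standing in for the triple store.
triples = [
    ("ead:coll1", "dc:title", "Example union records"),
    ("ead:coll1", "dc:creator", "person:p1"),
    ("person:p1", "rdf:type", "Person"),
    ("person:p1", "foaf:name", "Jane Example"),
    ("ead:coll2", "dc:creator", "person:p1"),
]


def build_entity_records(triples):
    """Re-assemble triples into one record per entity, with inbound links."""
    records = defaultdict(lambda: {"properties": defaultdict(list), "linked_from": []})
    # First pass: collect each entity's own properties.
    for subject, predicate, obj in triples:
        records[subject]["properties"][predicate].append(obj)
    # Second pass: record which entities reference each known entity.
    known = set(records)
    for subject, predicate, obj in triples:
        if obj in known:
            records[obj]["linked_from"].append(subject)
    return records


records = build_entity_records(triples)
person = records["person:p1"]
```

Here the person, originally just an access point on two finding aids, becomes a record of its own that links back to both collections and could be faceted on its `rdf:type`.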

It is hoped that this prototype will be a model for a new kind of interface to library, archive & museum metadata targeted to particular subject domains, and could inform the development of a similar dis- and re- aggregation approach for entire library collections.

== Building a Metadata Lab for LIS Students ==

* Margaret Kipp, University of Wisconsin Milwaukee, kipp at uwm dot edu

Teaching metadata and linked data concepts to MLIS students requires more than creating basic metadata records; it also requires an understanding of how metadata fits into the library workflow and how data entry into metadata and cataloguing tools works in practice. We are developing a metadata lab for use in teaching information organisation related courses to MLIS students. Currently we are using open source software for the lab including Koha--ILS, Omeka--digital library tool and 4store--RDF triple store. The preliminary tools are hosted on LAMP servers and will be supplemented with additional software as we expand our lab. This presentation will report on the results of setting up the first few software packages for the lab and their use in teaching various courses including an introductory course in information organisation, a metadata course, and a course on linked data, Semantic Web and mashups. One of the goals of this session would be to discuss methods for bridging gaps between academic and practical work with metadata.

==Feed - The HathiTrust Ingest Toolkit==

* Ryan Rotter, University of Michigan, rrotter AT umich DOT edu

HathiTrust has a mission of ensuring the long-term preservation and accessibility of materials in the archive. Ensuring consistency among materials from different sources is one way we do this; it ensures that tools such as large scale search and PageTurner don't need to be concerned with where the content originated from and that it will be possible to undertake format migrations in the future. To ensure consistency, we have very specific and stringent standards including (but not limited to) the following areas:

* Item identifiers (i.e. how each individual submitted item is identified and named)

* Package layout (file names, directory structure, etc.)

* Image technical characteristics (file format, resolution, color depth, etc.)

* Image metadata (scanning time, scanning artist, etc.)

* Source METS file comprising MARC, PREMIS, package contents and structMap, optionally with page numbers and page tags

We have chosen not to accept submissions in arbitrary formats for a couple of reasons. Unfortunately we just don't have the resources to create custom transformations for all sources of content, and if we created generic transformations that could accept data in a wide variety of formats there would most likely be some data loss in the transformation.

Therefore we have chosen to provide the ingest tools to the library community as a set of building blocks to help you build and validate submission packages that meet the standards while at the same time allowing you to preserve images without loss of quality and include any metadata that you want to preserve.

==Roses are ff0000, Violets are 0000ff DeLaMare is throwing a Hackathon and so should you!==

* Chrissy Klenke, University of Nevada, Reno, cklenke@unr.edu

* Nick Crowl, University of Nevada, Reno, ncrowl@unr.edu

Hack 4 Reno is a 24-hour hackathon, where teams use local data to build applications that benefit the local community. Co-hosted by Reno Collective and the DeLaMare Science and Engineering Library, and sponsored by the City of Reno, which generously provides the data, the teams, made up of coders, designers, writers, and more, get to hack away for 24 hours, creating, collaborating, and having fun with it all: http://hack4reno.com/

The Reno Collective is Reno’s premiere co-working space for freelancers, designers, programmers, entrepreneurs, and startups. The DeLaMare Science and Engineering Library (DLM) at the University of Nevada, Reno is fast becoming the bridge between students, faculty, and members of its greater community of Reno Collective, Hack4Reno, Bridgewire Makerspace, and the Code for America Reno Brigade.

Come hear about the hackathon, the projects created out of this event, and a glimpse of a few of the innovative projects created in collaboration with the DeLaMare Library. Robotics kits, 3D printers, drone quadricopters, lockpicking workshops and kits, bootcamps and 24-hour hackathons are just the start!

== Stuffing the Repository: An Advanced Dive Into Object Handling in Hydra ==

* Steven Anderson, Boston Public Library, sanderson AT bpl DOT org

* Eben English, Boston Public Library, eenglish AT bpl DOT org

This topic focuses on some advanced techniques for dealing with digital objects created for a repository. While all examples presented will be in the Hydra framework, the theory of what is presented is applicable to non-Hydra solutions. Specific topics include:

* Client side MD5 checksumming: While an Ajax file upload is fairly simple nowadays, verifying that the file doesn't become corrupted during transmission to the server is often overlooked. A method to calculate the MD5 checksum via the client browser before the file is transmitted over the network will be presented.

* Object Modeling Inheritance: There are many different theories regarding content modeling in the wild, from "one model to rule them all" to extreme granularity. Here we will outline an approach to modeling content inspired by OOP, using specific content type classes that inherit from a set of more generic content models.

* Hydra Models as a Rails Engine: In order to facilitate sharing of content models between multiple Hydra code bases, a completely separate and independent Ruby on Rails Engine to express content models has been developed. This unique approach offers tremendous potential for easily sharing and re-using pre-configured content models in a Hydra Head simply by installing a gem.
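
The checksum idea in the first bullet can be sketched as follows. In the talk's setting the first digest would be computed in the browser before upload; here Python stands in for both sides, and the function names `md5_of_stream` and `verify_upload` are hypothetical rather than part of Hydra.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=64 * 1024):
    """Compute an MD5 hex digest by reading the stream in fixed-size chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_upload(received_bytes, client_checksum):
    """Server-side check: does the received file match the checksum the client sent?"""
    return md5_of_stream(io.BytesIO(received_bytes)) == client_checksum


original = b"example file contents"
client_checksum = md5_of_stream(io.BytesIO(original))  # computed before transmission

ok = verify_upload(original, client_checksum)
corrupted = verify_upload(original[:-1] + b"X", client_checksum)  # one byte changed in transit
```

Reading in chunks keeps memory use flat for large files. MD5 is used here only to detect accidental corruption, as in the abstract, not as a defense against deliberate tampering.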

[[Category:Code4Lib2013]]

[[Category:Talk Proposals]]
