2012 talks proposals

Deadline for talk submission is Sunday, November 20.

Prepared talks are 20 minutes (including setup and questions), and focus on one or more of the following areas:

* tools (some cool new software, software library or integration platform)
* specs (how to get the most out of some protocols, or proposals for new ones)
* challenges (one or more big problems we should collectively address)

The community will vote on proposals using the criteria of:

* usefulness
* newness
* geekiness
* diversity of topics

Please follow the formatting guidelines:


== Talk Title: ==
 
* Speaker's name, affiliation, and email address
* Second speaker's name, affiliation, email address, if second speaker

Abstract of no more than 500 words.

VuFind 2.0: Why and How?

Demian Katz, Villanova University, demian.katz@villanova.edu

A major new version of the VuFind discovery software is currently in development. While VuFind 1.x remains extremely popular, some of its components are beginning to show their age. VuFind 2.0 aims to retain all the strengths of the previous version of the software while making the architecture cleaner, more modern and more standards-based. This presentation will examine the motivation behind the update, preview some of the new features to look forward to, and discuss the challenges of creating a developer-friendly open source package in PHP.

Open Source Software Registry

Peter Murray, LYRASIS, Peter.Murray@lyrasis.org

LYRASIS is creating and shepherding a registry of library open source software as part of its grant from the Mellon Foundation to support the adoption of open source software by libraries. The goal of the grant is to help libraries of all types determine if open source software is right for them, and what combination of software, hosting, training, and consulting works for their situation. The registry is intended to become a community exchange point and stimulant for growth of the library open source ecosystem by connecting libraries with projects, service providers, and events.

The first half of this session will demonstrate the registry functions and describe how projects and providers can get involved. The second half will be a brainstorming session on how to expand the functionality and usefulness of the registry.

Property Graphs And TinkerPop Applications in Digital Libraries

Brian Tingle, California Digital Library, brian.tingle.cdlib.org@gmail.com

TinkerPop is an open source software development group focusing on technologies in the graph database space. This talk will provide a general introduction to the TinkerPop Graph Stack and the property graph model it uses. The introduction will include code examples and explanations of the property graph models used by the Social Networks in Archival Context project and show how the historical social graph is exposed as a JSON/REST API implemented by a TinkerPop rexster Kibble that contains the application's graph theory logic. Other graph database applications possible with TinkerPop, such as RDF support and citation analysis, will also be discussed.
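
As a rough illustration of the property graph model mentioned above, a minimal sketch in plain Python might look like the following. This is not TinkerPop's API and not the Social Networks in Archival Context data model; the class names, property keys, and the `correspondedWith` edge label are all hypothetical. It only shows the general idea: vertices and labeled, directed edges that each carry their own key/value properties.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    out_v: str  # id of the vertex the edge leaves
    label: str  # relationship type
    in_v: str   # id of the vertex the edge enters
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Tiny in-memory property graph (illustrative only)."""

    def __init__(self):
        self.vertices = {}
        self.edges = []

    def add_vertex(self, vid, **props):
        self.vertices[vid] = Vertex(vid, props)
        return self.vertices[vid]

    def add_edge(self, out_v, label, in_v, **props):
        if out_v not in self.vertices or in_v not in self.vertices:
            raise KeyError("both endpoints must exist before adding an edge")
        edge = Edge(out_v, label, in_v, props)
        self.edges.append(edge)
        return edge

    def out_neighbors(self, vid, label=None):
        # Follow outgoing edges, optionally restricted to one label.
        return [
            self.vertices[e.in_v]
            for e in self.edges
            if e.out_v == vid and (label is None or e.label == label)
        ]


# Hypothetical data: two people linked by a correspondence relationship.
g = PropertyGraph()
g.add_vertex("p1", name="Person A", kind="person")
g.add_vertex("p2", name="Person B", kind="person")
g.add_edge("p1", "correspondedWith", "p2", source="hypothetical finding aid")

names = [v.properties["name"] for v in g.out_neighbors("p1", "correspondedWith")]
print(names)
```

Keeping properties on edges as well as on vertices is what distinguishes this model from a plain adjacency list: the relationship itself can record where the assertion came from.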

Security in Mind

Erin Germ, United States Naval Academy, Nimitz Library, germ@usna.edu

I would like to talk about security of library software.

Over the summer, I discovered a critical vulnerability in a vendor’s software that (verified) allowed me to assume any user’s identity for that site, (verified) switch to any user, and (unverified, meaning I did not perform this, as I didn’t want to “hack” another library’s site) assume the role of any user for any other library that used this particular vendor's software.

Within a 3-hour period, I discovered 2 vulnerabilities: 1) a minor one allowing me to access any backups from any library site, and 2) a critical vulnerability. From start to finish, the examination, discovery of the vulnerability, and execution of a working exploit was done in less than 2 hours. The vulnerability was a result of poor cookie implementation. The exploit itself revolved around modifying the cookie, and then altering the browser’s permissions by assuming the role of another user.

I do not intend on stating which vendor it was, but I will show how I was able to perform this. If needed, I can do further research and “investigation” into other vendors' software to see what I can “find”.

If selected, I will contact the vendor to inform them that I will present about this at C4L2012. I do not intend on releasing the name of the vendor.
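
The abstract above does not describe how the vendor's cookie was built, so the following is only a generic sketch of one common defense against cookie tampering: signing the cookie value with an HMAC so that a modified user id no longer verifies. The secret key, cookie layout, and function names are assumptions made for illustration, not details of the software discussed in the talk.

```python
import hashlib
import hmac

# Hypothetical server-side secret; a real deployment would load this from
# protected configuration, never from source code.
SECRET_KEY = b"example-only-secret"


def sign_cookie(user_id: str) -> str:
    """Return 'user_id.signature' where the signature is an HMAC-SHA256."""
    mac = hmac.new(SECRET_KEY, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{user_id}.{mac}"


def verify_cookie(cookie: str):
    """Return the user id if the signature matches, otherwise None."""
    user_id, sep, mac = cookie.rpartition(".")
    if not sep:
        return None  # no signature present at all
    expected = hmac.new(SECRET_KEY, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    if hmac.compare_digest(mac.encode("utf-8"), expected.encode("utf-8")):
        return user_id
    return None


cookie = sign_cookie("alice")
# An attacker swaps the user id but cannot recompute the signature.
tampered = "bob." + cookie.rpartition(".")[2]

print(verify_cookie(cookie))
print(verify_cookie(tampered))
```

A cookie that stores only a bare user id gives the server nothing to check, which is why editing it in the browser can be enough to switch identities.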

Search Engines and Libraries

Greg Lindahl, blekko CTO, greg@blekko.com

blekko is a new web-scale search engine which enables end-users to create vertical search engines, through a feature called slashtags. Slashtags can contain as few as 1 or as many as tens of thousands of websites relevant to a narrow or broad topic. We have an extensive set of slashtags curated by a combination of volunteers and an in-house librarian team, or end-users can create and share their own. This talk will cover examples of slashtag creation relevant to libraries, and show how to embed this search into a library website, either using javascript or via our API.

We have exhibited at a couple of library conferences, and have received a lot of interest. blekko is a free service.

Beyond code. Versioning data with Git and Mercurial.

Stephanie Collett, California Digital Library, stephanie.collett@ucop.edu
Martin Haye, California Digital Library, martin.haye@ucop.edu

Within a relatively short time since their introduction, distributed version control systems (DVCS) like Git and Mercurial have enjoyed widespread adoption for versioning code. It didn’t take long for the library development community to start discussing the potential for using DVCS within our applications and repositories to version data. After all, many of the features that have made some of these systems popular in the open source community to version code (e.g. lightweight, file-based, compressed, reliable) also make them compelling options for versioning data. And why write an entire versioning system from scratch if a DVCS solution can be a drop-in solution? At the California Digital Library (CDL) we’ve started using Git and Mercurial in some of our applications to version data. This has proven effective in some situations and unworkable in others. This presentation will be a practical case study of CDL’s experiences with using DVCS to version data. We will explain how we’re incorporating Git and Mercurial in our applications, describe our successes and failures and consider the issues involved in repurposing these systems for data versioning.

Design for Developers

Lisa Kurt, University of Nevada, Reno, lkurt@unr.edu

Users expect good design. This talk will delve into what makes really great design, what to look for, and how to do it. Learn the principles of great design to take your applications, user interfaces, and projects to a higher level. With years of experience in graphic design and illustration, Lisa will discuss design principles, trends, process, tools, and development. Design examples will be from her own projects as well as a variety from industry. You’ll walk away with design knowledge that you can apply immediately to a variety of applications and a number of top notch go-to resources to get you up and running.

Building research applications with Mendeley

William Gunn, Mendeley, william.gunn@mendeley.com (@mrgunn)

This is partly a tool talk and partly a big idea one.

Mendeley has built the world's largest open database of research and we've now begun to collect some interesting social metadata around the document metadata. I would like to share with the Code4Lib attendees information about using this resource to do things within your application that have previously been impossible for the library community, or in some cases impossible without expensive database subscriptions. One thing that's now possible is to augment catalog search by surfacing information about content usage, allowing people to not only find things matching a query, but popular things or things read by their colleagues. In addition to augmenting search, you can also use this information to augment discovery. Imagine an online exhibit of artifacts from a newly discovered dig not just linking to papers which discuss the artifact, but linking to really good interesting papers about the place and the people who made the artifacts. So the big idea is, "How will looking at the literature from a broader perspective than simple citation analysis change how research is done and communicated? How can we build tools that make this process easier and faster?" I can show some examples of applications that have been built using the Mendeley and PLoS APIs to begin to address this question, and I can also present results from Mendeley's developer challenge which shows what kinds of applications researchers are looking for, what kind of applications people are building, and illustrates some interesting places where the two don't overlap.

Your UI can make or break the application (to the user, anyway)

Robin Schaaf, University of Notre Dame, schaaf.4@nd.edu

UI development is hard and too often ends up as an afterthought to computer programmers - if you were a CS major in college I'll bet you didn't have many, if any, design courses. I'll talk about how to involve the users upfront with design and some common pitfalls of this approach. I'll also make a case for why you should do the screen design before a single line of code is written. And I'll throw in some ideas for increasing usability and attractiveness of your web applications. I'd like to make a case study of the UI development of our open source ERMS.

Why Nobody Knows How Big The Library Really Is - Perspective of a Library Outsider Turned Insider

Patrick Berry, California State University, Chico, pberry@csuchico.edu

In this talk I would like to bring the perspective of an "outsider" (although an avowed IT insider) to let you know that people don't understand the full scope of the library. As we "rethink education", it is incumbent upon us to help educate our institutions as to the scope of the library. I will present some of the tactics I'm employing to help people outside, and in some cases inside, the library to understand our size and the value we bring to the institution.

Building a URL Management Module using the Concrete5 Package Architecture

David Uspal, Villanova University, david.uspal@villanova.edu

Keeping track of URLs utilized across a large website such as a university library, and keeping that content up to date for subject and course guides, can be a pain, and as an open source shop, we’d like to have an open source solution for this issue. For this talk, I intend to detail our solution to this issue by walking step-by-step through the building process for our URL Management module -- including why a new solution was necessary; a quick rundown of our CMS (Concrete5, a CMS that isn’t Drupal); utilizing the Concrete5 APIs to isolate our solution from core code (to avoid complications caused by core updates); how our solution was integrated into the CMS architecture for easy installation; and our future plans on the project.

Building an NCIP connector to OpenSRF to facilitate resource sharing

Jon Scott, Lyrasis, jon_scott@wsu.edu and Kyle Banerjee, Orbis Cascade Alliance, banerjek@uoregon.edu

How do you reverse engineer any protocol to provide a new service? Humans (and worse yet, committees) often design verbose protocols built around use cases that don't line up with current reality. To compound difficulties, the contents of protocol containers are not sufficiently defined/predictable and the only assistance available is sketchy documentation and kind individuals on the internet willing to share what they learned via trial by fire.

NCIP (NISO Circulation Interchange Protocol) is an open standard that defines a set of messages to support exchange of circulation data between disparate circulation, interlibrary loan, and related applications -- widespread adoption of NCIP would eliminate huge amounts of duplicate processing in separate systems.

This presentation discusses how we learned enough about NCIP and OpenSRF from scratch to build an NCIP responder for Evergreen to facilitate resource sharing in a large consortium that relies on over 20 different ILSes.

Practical Agile: What's Working for Stanford, Blacklight, and Hydra

Naomi Dushay, Stanford University Libraries, ndushay@stanford.edu

Agile development techniques can be difficult to adopt in the context of library software development. Maybe your shop has only one or two developers, or you always have too many simultaneous projects. Maybe your new projects can’t be started until 27 librarians reach consensus on the specifications.

This talk will present successful Agile- and Silicon-Valley-inspired practices we’ve adopted at Stanford and/or in the Blacklight and Hydra projects. We’ve targeted developer happiness as well as improved productivity with our recent changes. User stories, dead week, sight lines … it’ll be a grab bag of goodies to bring back to your institution, including some ideas on how to adopt these practices without overt management buy-in.

Quick and Dirty Clean Usability: Rapid Prototyping with Bootstrap

Shaun Ellis, Princeton University Libraries, shaune@princeton.edu

"The code itself is unimportant; a project is only as useful as people actually find it." - Linus Torvalds [1]

Usability has been a buzzword for some time now, but what is the process for making the transition toward a better user experience, and hence, better designed library sites? I will discuss one facet of the process my team is using to redesign the Finding Aids site for Princeton University Libraries (still in development). The approach involves the use of rapid prototyping, with Bootstrap [2], to make sure we are on track with what users and stakeholders expect up front, and throughout the development process.

Because Bootstrap allows for early and iterative user feedback, it is more effective than the historic Photoshop mockups/wireframe technique. The Photoshop approach allows stakeholders to test the look, but not the feel -- and often leaves developers scratching their heads. Being a CSS/HTML/Javascript grid-based framework, Bootstrap makes it easy for anyone with a bit of HTML/CSS chops to quickly build slick, interactive prototypes right in the browser -- tangible solutions which can be shared, evaluated, revised, and followed by all stakeholders (see Minimum Viable Products [3]). Efficiency is multiplied because the customized prototypes can flow directly into production use, as is the goal with iterative development approaches, such as the Agile methodology.

While Bootstrap is not the only framework that offers grid-based layout, development is expedited and usability is enhanced by Bootstrap's use of "prefabbed" conventional UI patterns, clean typography, and lean Javascript for interactivity. Furthermore, out-of-the box Bootstrap comes in a fairly neutral palette, so focus remains on usability, and does not devolve into premature discussions of color or branding choices. Finally, using Less can be a powerful tool in conjunction with Bootstrap, but is not necessary. I will discuss the pros and cons, and offer examples for how to get up and running with or without Less.

Search Engine Relevancy Tuning - A Static Rank Framework for Solr/Lucene

Mike Schultz, Amazon.com (formerly Summon Search Architect), mike.schultz@gmail.com

Solr/Lucene provides a lot of flexibility for adjusting relevancy scoring and improving search results. Roughly speaking there are two areas of concern: Firstly, a 'dynamic rank' calculation that is a function of the user query and document text fields. And secondly, a 'static rank' which is independent of the query and generally is a function of non-text document metadata. In this talk I will outline an easily understood, hand-tunable static rank system with a minimal number of parameters.

The obvious major feature of a search engine is to return results relevant to a user query. Perhaps less obvious is the huge role query independent document features play in achieving that. Google's PageRank is an example of a static ranking of web pages based on links and other secret sauce. In the Summon service, our 800 million documents have features like publication date, document type, citation count and Boolean features like the-article-is-peer-reviewed. These fields aren't textual and remain 'static' from query to query, but need to influence a document's relevancy score. In our search results, with all query related features being equal, we'd rather have more recent documents above older ones, Journals above Newspapers, and articles that are peer reviewed above those that are not. The static rank system I will describe achieves this and has the following features:

* Query-time only calculation - nothing is baked into the index - with parameters adjustable at query time.
* The system is based on a signal metaphor where components are 'wired' together. System components allow multiplexing, amplifying, summing, tunable band-pass filtering, string-to-value-mapping all with a bare minimum of parameters.
* An intuitive approach for mixing dynamic and static rank that is more effective than simple adding or multiplying.
* A way of equating disparate static metadata types that leads to understandable results ordering.
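
To make the signal metaphor above a little more concrete, here is a small sketch in plain Python. It is not the framework the talk describes and does not use any Solr/Lucene API: the component names, the lookup table values, the half-life decay, and the gains are all assumptions chosen for illustration. The final mixing function is deliberately the simple multiplicative baseline, since the abstract does not say how the talk's own mixing approach works.

```python
# Hypothetical string-to-value table for document types.
DOC_TYPE_VALUE = {"journal": 1.0, "newspaper": 0.3}


def map_string(value, table, default=0.0):
    """String-to-value mapping component."""
    return table.get(value, default)


def amplify(signal, gain):
    """Amplifier component: scale a signal by a tunable gain."""
    return signal * gain


def recency(year, current_year, half_life_years=10.0):
    """Assumed decay: the signal halves every `half_life_years` of age."""
    age = max(0, current_year - year)
    return 0.5 ** (age / half_life_years)


def static_rank(doc, current_year, gains):
    """Summing component: wire the per-field signals together."""
    signals = [
        amplify(recency(doc["year"], current_year), gains["recency"]),
        amplify(map_string(doc["type"], DOC_TYPE_VALUE), gains["type"]),
        amplify(1.0 if doc["peer_reviewed"] else 0.0, gains["peer_reviewed"]),
    ]
    return sum(signals)


def combined_score(dynamic, static):
    # Naive multiplicative baseline only; NOT the mixing method from the talk.
    return dynamic * (1.0 + static)


gains = {"recency": 1.0, "type": 1.0, "peer_reviewed": 1.0}
docs = [
    {"id": "b", "year": 1990, "type": "newspaper", "peer_reviewed": False},
    {"id": "a", "year": 2011, "type": "journal", "peer_reviewed": True},
]
dynamic_score = 2.0  # pretend both documents match the query equally well

ranked = sorted(
    docs,
    key=lambda d: combined_score(dynamic_score, static_rank(d, 2012, gains)),
    reverse=True,
)
ranked_ids = [d["id"] for d in ranked]
print(ranked_ids)
```

Because every gain is passed in at call time, nothing here needs to be stored with the documents, which mirrors the query-time-only property listed above.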

Submitting Digitized Book-like things to the Internet Archive

Joel Richard, Smithsonian Institution Libraries, richardjm@si.edu

The Smithsonian Libraries has submitted thousands of out-of-copyright items to the Internet Archive over the years. Specifically in relation to the Biodiversity Heritage Library, we have developed an in-house boutique scanning and upload process that became a learning experience in automated uploading to the Archive. As part of the software development, we created a whitepaper that details the combined learning experiences of the Smithsonian Libraries and the Missouri Botanical Garden. We will discuss some of the contents of this whitepaper in the context of our scanning process and the manner in which we upload items to the Archive.

Our talk will include a discussion of the types of files and their formats used by the Archive, processes that the Archive performs on uploaded items, ways of interacting and affecting those processes, potential pitfalls and solutions that you may encounter when uploading, and tools that the Archive provides to help monitor and manage your uploaded documents.

Finally, we'll wrap up with a brief summary of how to use things that are on the Internet Archive in your own websites.

So... you think you want to Host a Code4Lib National Conference, do you?

Elizabeth Duell, Orbis Cascade Alliance, eduell@uoregon.edu

Are you interested in hosting your own Code4Lib Conference? Do you know what it would take? What does BEO stand for? What does F&B Minimum mean? Who would you talk to for support/mentoring? There are so many things to think about: internet support, venue size, rooming blocks, contracts, dietary restrictions and coffee (can't forget the coffee!) just to name a few. Putting together a conference of any size can look daunting, so let's take the scary out of it and replace it with a can do attitude!

Be a step ahead of the game by learning from the people behind the curtain. Ask questions and be given templates/cheat sheets!

HTML5 Microdata and Schema.org

Jason Ronallo, North Carolina State University Libraries, jason_ronallo@ncsu.edu

When the big search engines announced support for HTML5 microdata and the schema.org vocabularies, the balance of power for semantic markup in HTML shifted.

* What is microdata?
* Where does microdata fit with regards to other approaches like RDFa and microformats?
* Where do libraries stand in the worldview of Schema.org and what can they do about it?
* How can implementing microdata and schema.org optimize your sites for search engines?
* What tools are available?
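
For readers who have not seen microdata, the sketch below generates a small HTML fragment describing a book with the schema.org `Book` type and its `name`, `author`, and `isbn` properties. The helper function, the sample values, and the choice of Python to build the markup are assumptions for illustration only; the talk itself may use different types and tools.

```python
from html import escape


def book_microdata(name, author, isbn):
    """Build an HTML fragment marked up with HTML5 microdata attributes."""
    return (
        '<div itemscope itemtype="http://schema.org/Book">\n'
        f'  <span itemprop="name">{escape(name)}</span>\n'
        f'  <span itemprop="author">{escape(author)}</span>\n'
        f'  <span itemprop="isbn">{escape(isbn)}</span>\n'
        "</div>"
    )


# Placeholder values, not a real catalog record.
snippet = book_microdata("An Example Title & Subtitle", "Example Author", "0000000000")
print(snippet)
```

The `itemscope` attribute marks the element as one item, `itemtype` names the vocabulary term, and each `itemprop` attaches a property to that item.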

Stack View: A Library Browsing Tool

Annie Cain, Harvard Library Innovation Lab, acain@law.harvard.edu
Jeff Goldenson, Harvard Library Innovation Lab, jgoldenson@law.harvard.edu

In an effort to recreate and build upon the traditional method of browsing a physical library, we used catalog data, including dimensions and page count, to create a virtual shelf.

This CSS and JavaScript backed visualization allows items to sit on any number of different shelves, really taking advantage of its digital nature. See how we built Stack View on top of our data and learn how you can create shelves of your own using our open source code.
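
As a rough idea of how catalog dimensions and page counts could drive a virtual shelf, the sketch below maps a record's page count and height to pixel sizes for a book spine. It is not taken from the Stack View code: the function names, scaling factors, and minimum and maximum sizes are all hypothetical.

```python
def clamp(value, low, high):
    """Keep a value within [low, high]."""
    return max(low, min(high, value))


def spine_size(pages, height_cm, px_per_page=0.1, px_per_cm=10):
    """Map catalog data to on-screen spine dimensions (assumed scaling)."""
    # More pages means a thicker spine, within sensible display limits.
    thickness = clamp(round(pages * px_per_page), 10, 80)
    # Taller books are drawn as longer spines.
    width = clamp(round(height_cm * px_per_cm), 150, 400)
    return {"thickness_px": thickness, "width_px": width}


typical = spine_size(pages=300, height_cm=24)
pamphlet = spine_size(pages=20, height_cm=10)
print(typical, pamphlet)
```

Clamping keeps very thin or very large items readable on screen while still letting relative size hint at the physical object.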

“Linked-Data-Ready” Software for Libraries

Jennifer Bowen, University of Rochester River Campus Libraries, jbowen@library.rochester.edu

Linked data is poised to replace MARC as the basis for the new library bibliographic framework. For libraries to benefit from linked data, they must learn about it, experiment with it, demonstrate its usefulness, and take a leadership role in its deployment.

The eXtensible Catalog Organization (XCO) offers open-source software for libraries that is “linked-data-ready.” XC software prepares MARC and Dublin Core metadata for exposure to the semantic web, incorporating FRBR Group 1 entities and registered vocabularies for RDA elements and roles. This presentation will include a software demonstration, proposed software architecture for creation and management of linked data, a vision for how libraries can migrate from MARC to linked data, and an update on XCO progress toward linked data goals.

How people search the library from a single search box

Cory Lown, North Carolina State University Libraries, cory_lown@ncsu.edu

Searching the library is complex. There's the catalog, article databases, journal title and database title look-ups, the library website, finding aids, knowledge bases, etc. How would users search if they could get to all of these resources from a single search box? I'll share what we've learned about single search at NCSU Libraries by tracking use of QuickSearch (http://www.lib.ncsu.edu/search/index.php?q=aerospace+engineering), our home-grown unified search application. As part of this talk I will suggest low-cost ways to collect real world use data that can be applied to improve search. I will try to convince you that data collection must be carefully planned and designed to be an effective tool to help you understand what your users are telling you through their behavior. I will talk about how the fragmented library resource environment challenges us to provide useful and understandable search environments. Finally, I will share findings from analyzing millions of user transactions about how people search the library from a production single search box at a large university library.

An Incremental Approach to Archival Description and Access

Chela Scott Weber, New York University Libraries, chelascott@gmail.com
Mark A. Matienzo, Yale University Library, mark@matienzo.org

This is placeholder text; description coming shortly

Making the Easy Things Easy: A Generic ILS API

Wayne Schneider, Hennepin County Library, wschneider@hclib.org

Some stuff we try to do is complicated, because, let's face it, library data is hard. Some stuff, on the other hand, should be easy. Given an item identifier, I should be able to look at item availability. Given a title identifier, I should be able to place a request. And no, I shouldn't have to parse through the NCIP specification or write a SIP client to do it.

This talk will present work we have done on a web services approach to an API for traditional library transactional data, including example applications.

Your Catalog in Linked Data

Tom Johnson, Oregon State University Libraries, thomas.johnson@oregonstate.edu

Linked Library Data activity over the last year has seen bibliographic data sets and vocabularies proliferating from traditional library sources. We've reached a point where regular libraries don't have to go it alone to be on the Semantic Web. There is a quickly growing pool of things we can actually link to, and everyone's existing data can be immediately enriched by participating.

This is a quick and dirty road to getting your catalog onto the Linked Data web. The talk will take you from start to finish, using Free Software tools to establish a namespace, put up a SPARQL endpoint, make a simple data model, convert MARC records to RDF, and link the results to major existing data sets (skipping conveniently over pesky processing time). A small amount of "why linked data?" content will be covered, but the primary goal is to leave you able to reproduce the process and start linking your catalog into the web of data. Appropriate documentation will be on the web.

Getting the Library into the Learning Management System using Basic LTI

David Walker, California State University, dwalker@calstate.edu

The integration of library resources into learning management systems (LMS) has long been something of a holy grail for academic libraries. The ability to deliver targeted library systems and services to students and faculty within the context of a specific course could greatly simplify access to library resources. Yet, the technical barriers to achieving that goal have to date been formidable.

The recently released Learning Tools Interoperability (LTI) protocol, developed by IMS, now greatly simplifies this process by allowing libraries (and others) to develop and maintain “tools” that function like a native plugin or building block within the LMS, but ultimately live outside of it. In this presentation, David will provide an overview of Basic LTI, a simplified subset (or profile) of the wider LTI protocol, showing how libraries can use this to easily integrate their external systems into any major LMS. He’ll showcase the work Cal State has done to do just that.

Turn your Library Proxy Server into a Honeypot

Calvin Mah, Simon Fraser University, calvinm@sfu.ca (@calvinmah)

Ezproxy has provided libraries with a useful tool for providing patrons with offsite online access to licensed electronic resources. This has not gone unnoticed by the unscrupulous users of the Internet who are either unwilling or unable to obtain legitimate access to these materials for themselves. Instead, they buy or share hacked university computing accounts for unauthorized access. When undetected, abuse of compromised university accounts can lead to abuse of vendor resources, which in turn leads to the blocking of the entire campus block of IP addresses from accessing that resource.

Simon Fraser University Library has been proactively detecting and thwarting unauthorized attempts through log analysis. Since SFU began analysing our ezproxy logs, the number of new SFU login credentials which are posted and shared in publicly accessible forums has been reduced to zero. Since our log monitoring began in 2008, the annual average number of SFU login credentials that are compromised or hacked is 140. Instead of being a single point of weakness in campus IT security, the library’s proxy server is a honeypot exposing weak passwords, keystroke logging trojans installed on patron PCs and campus network password sniffers.

This talk will discuss techniques such as geomapping login attempts, strategies such as seeding phishing attempts and tools such as statistical log analysis used in detecting compromised login credentials.
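
One simple form of the log analysis described above is to flag accounts that log in from an unusual number of countries within a short period. The sketch below is an assumption-laden illustration, not SFU's method: it works on already-parsed `(username, country, timestamp)` records rather than any real proxy log format, and the threshold, time window, and sample data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def flag_suspicious(logins, max_countries=2, window=timedelta(hours=24)):
    """Return usernames seen in more than `max_countries` within `window`."""
    by_user = defaultdict(list)
    for user, country, when in logins:
        by_user[user].append((when, country))

    flagged = set()
    for user, events in by_user.items():
        events.sort()  # chronological order
        start = 0
        for end in range(len(events)):
            # Slide the window start forward until it spans at most `window`.
            while events[end][0] - events[start][0] > window:
                start += 1
            countries = {country for _, country in events[start:end + 1]}
            if len(countries) > max_countries:
                flagged.add(user)
                break
    return sorted(flagged)


t0 = datetime(2011, 11, 1, 9, 0)
# Placeholder country codes and usernames.
logins = [
    ("alice", "CA", t0),
    ("alice", "CA", t0 + timedelta(hours=3)),
    ("bob", "CA", t0),
    ("bob", "XX", t0 + timedelta(hours=1)),
    ("bob", "YY", t0 + timedelta(hours=2)),
    ("carol", "CA", t0),
    ("carol", "XX", t0 + timedelta(days=10)),
    ("carol", "YY", t0 + timedelta(days=20)),
]

suspicious = flag_suspicious(logins)
print(suspicious)
```

The time window matters: a patron who travels over several weeks is not flagged, while an account used from three countries within a few hours is.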

Relevance Ranking in the Scholarly Domain

Tamar Sadeh, PhD, Ex Libris Group, tamar.sadeh@exlibrisgroup.com

The greatest challenge for discovery systems is how to provide users with the most relevant search results, given the immense landscape of available content. In a manner that is similar to human interaction between two parties, in which each person adjusts to the other in tone, language, and subject matter, discovery systems would ideally be sophisticated and flexible enough to adjust their algorithms to individual users and each user’s information needs.

When evaluating the relevance of an item to a specific user in a specific context, relevance-ranking algorithms need to take into account, in addition to the degree to which the item matches the query, information that is not embodied in the item itself. Such information, which includes the item’s scholarly value, the type of search that the user is conducting (e.g., an exploratory search or a known-item search), and other factors, enables a discovery system to fulfill user expectations that have been shaped by experience with Web search engines.

The session will focus on the challenges of developing and evaluating relevance-ranking algorithms for the scholarly domain. Examples will be drawn mainly from the relevance-ranking technology deployed by the Ex Libris Primo discovery solution.