2011talks Submissions

Deadline for talk submission is Saturday, November 13. See this mailing list post for more details, or the general Code4Lib 2011 page.

See the Call for Submissions for guidelines on appropriate talk topics and the criteria on which submissions are evaluated.

Please follow the formatting guidelines:

== Talk Title: ==
 
* Speaker's name, affiliation, and email address
* Second speaker's name, affiliation, email address, if second speaker

Abstract of no more than 500 words.

How "great" are the Great Books?

Eric Lease Morgan, University of Notre Dame (emorgan at nd.edu)

In the 1960s a set of books called the Great Books of the Western World was published. It was supposed to represent the best of Western literature and enable the reader to further their liberal arts education. Sixty volumes in all, it included works by Plato, Aristotle, Shakespeare, Milton, Galileo, Kepler, Melville, Darwin, etc. These great books were selected based on the way they discussed a set of 102 "great ideas" such as art, astronomy, beauty, evil, evolution, mind, nature, poetry, revolution, science, will, wisdom, etc. How "great" are these books, and how "great" are the ideas expressed in them?

Given full text versions of these books it is almost trivial to use the "great ideas" as input and apply relevancy ranking algorithms against the texts thus creating a sort of score -- a "Great Ideas Coefficient". Term Frequency/Inverse Document Frequency (TFIDF) is a well-established algorithm for computing just this sort of thing:

relevancy = ( c / t ) * log( d / f ) where:

c = number of times a given word appears in a document
t = total number of words in a document
d = total number of documents in a corpus
f = total number of documents containing a given word

Thus, to calculate our Great Ideas Coefficient I sum the relevancy score for each "great idea" for each "great book". Plato's Republic might have a cumulative score of 525 while Aristotle's On The History Of Animals might have a cumulative score of 251. Books with a larger Coefficient could be considered greater. Given such a score a person could measure a book's "greatness". We could then compare the score to the scores of other books. Which book is the "greatest"? We could compare the score to other measurable things such as a book's length or date to see if there were correlations. Are "great books" longer or shorter than others? Do longer books contain more "great ideas"? Are there other books that were not included in the set that maybe should have been included?
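As a rough illustration of the calculation described above, here is a minimal Python sketch. It applies the relevancy formula to each "great idea" and sums the results per book. The function names, the three-book toy corpus, and the whitespace tokenization are assumptions made for this example, not the pre-processing used in the talk, and the toy scores are not on the same scale as the 525 and 251 quoted above.

```python
import math

def tfidf(word, document, corpus):
    """relevancy = (c / t) * log(d / f) for one word in one document."""
    c = document.count(word)                     # times the word appears in the document
    t = len(document)                            # total number of words in the document
    d = len(corpus)                              # total number of documents in the corpus
    f = sum(1 for doc in corpus if word in doc)  # documents containing the word
    if t == 0 or f == 0:
        return 0.0                               # empty document, or word absent from the corpus
    return (c / t) * math.log(d / f)

def great_ideas_coefficient(document, corpus, ideas):
    """Sum the relevancy score of every 'great idea' for one book."""
    return sum(tfidf(idea, document, corpus) for idea in ideas)

# Hypothetical toy corpus: each "book" is a list of lowercase words.
books = {
    "book_a": "beauty and wisdom and beauty".split(),
    "book_b": "nature and science".split(),
    "book_c": "nature and wisdom".split(),
}
ideas = ["beauty", "wisdom", "nature", "science"]

corpus = list(books.values())
scores = {
    title: great_ideas_coefficient(words, corpus, ideas)
    for title, words in books.items()
}

# Rank the books from largest to smallest coefficient.
for title in sorted(scores, key=scores.get, reverse=True):
    print(f"{title}: {scores[title]:.4f}")
```

A word that appears in every document, such as "and" here, gets log(d / f) = log(1) = 0 and so contributes nothing. This is why the measure rewards ideas that distinguish one book from the rest of the corpus.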

The first part of this talk describes the different steps involved in the text pre-processing to calculate an accurate TFIDF value for each item of the corpus. The results and statistical analysis are discussed in the second part. Finally I will outline the remaining work such as refining the analysis and extending the current quantitative process to a web implementation.

UNR BookFinder: Leveraging Google Books to Move Beyond Catalog Search

Will Kurt, University of Nevada, Reno, (wkurt at unr.edu)

Google Books is a great tool, but it lacks an easy method allowing users to access the items they find through their library. The UNR BookFinder is a mashup of the Google Books and WorldCat APIs (and some ugly hacks) which allows users to search for items with the power of Google’s fulltext search while eliminating the need to search all of the library’s various resources to find an item. The UNR BookFinder automatically searches the catalog and consortial ILL for the item; if these fail, an ILLiad request form is automatically filled out. The end result is that the user can explore a universe of books and access them as fast as possible through the university library. A video of the alpha version can be found here.

Moving a large multi-tiered search architecture from dedicated hosts to the cloud

Peter Ciuffetti, Senior Software Engineer, Credo Reference Ltd. (pete at credoreference.com)

So you want to move a large production search service from dedicated hosts to the cloud? The flexibility is enticing, the costs are attractive, the geek cred is undeniable. Our cloud adventure came with many undocumented surprises ranging from mysterious server behavior to sales engineers suggesting that 'maybe the cloud isn't for you'. We eventually made it all work and our production service is now on the cloud. This talk will cover what the cloud product FAQs don't say, what their tech support doesn't know (or won't say) and mistakes you can avoid by talking to the guys with the arrows in their backs.

VuFind Beyond MARC: Discovering Everything Else

Demian Katz, Library Technology Development Specialist, Villanova University (demian dot katz at villanova dot edu)

The VuFind discovery layer has been providing a user-friendly interface to MARC records for several years now. However, library data consists of more than just MARC records, and VuFind has grown to accommodate just about anything you can throw at it. This presentation will examine the new workflows and tools that enable discovery of non-MARC resources and some of the non-traditional applications of VuFind that they make possible. Technologies covered will include OAI-PMH, XSLT, Aperture, Solr and, of course, VuFind itself.

Linked data apps for medical professionals

Rurik Thomas Greenall, NTNU Library, (rurik dot greenall at ub dot ntnu dot no)

The promise of linked data for libraries has yet to be realized. As a demonstration of the power of RDF, HTTP-URIs and SPARQL, NTNU Library, together with the Norwegian Electronic Health Library, produced a linked data representation of MeSH and created a small translation app that can be used to help health professionals identify the right term and apply it in their database searches. This talk presents the simple ways in which the core technologies and concepts in linked data provide a solid, time-saving way of developing usable applications.

fiwalk With Me: Building Emergent Pre-Ingest Workflows for Digital Archival Records using Open Source Forensic Software

Mark A. Matienzo, Manuscripts and Archives, Yale University Library (mark at matienzo dot org)

Many of the complications of born-digital records involve preparing them for transfer into a storage or preservation environment. Digital evidence of any kind is easily susceptible to unintentional and intentional modification. This presentation will describe the use of open source forensic software in pre-ingest workflows for digital archives. Digital archivists and other digital curation practitioners can develop emergent processes to prepare records for ingest and transfer using a combination of relatively simple tools. The granularity and simplicity of these tools and procedures provide the possibility for their smooth integration into a digital curation environment built on micro-services.

Why (Code4) Libraries Exist

Eric Hellman, President, Gluejar, Inc. (eric at hellman dot net)

Libraries have historically delivered value to society by facilitating the sharing of books. The library "brand" is built around the building and exploitation of their collections. These collections have been acquired and owned. As ebook readers become the preferred consumption platform for books, libraries are beginning to come to terms with the fact that they don't own their digital collections, and can't share books as they'd like to. Yet libraries continue to be valuable in many ways. In this transitional period, only one thing can save libraries from irrelevance and dissipation: Code.

The Story of TILE: Making Modular & Reusable Tools

Doug Reside, MITH, University of Maryland (dougreside at gmail dot com)

The Text Image Linking Environment (TILE) is a collaborative project between the Maryland Institute for Technology in the Humanities (MITH), the Digital Library Program at Indiana University, and the School of Library and Information Science at Indiana University Bloomington. Since May 2009, the TILE project team has been developing, with NEH Research & Development funding, a web-based, modular image markup tool for both semi-automated linking between encoded text and images of text, and image annotation. The software will be complete and ready for release in June 2011.

The basic functionality of TILE is to create links between an image and the text that relates to it – either annotations or transcriptions. We have paid particular attention to linking between image of text and transcription of text. These links may be made manually, but the project also includes an algorithm, written in JavaScript, for recognizing text within an image and automatically associating the coordinates with a Unicode transcription. Additionally, the tool can import and export transcriptions and links from and to a variety of metadata formats (TEI, METS, OWL) and will provide an API for developers to write mappings for additional formats. Of course, this functionality is immediately useful to a relatively limited set of editors of digital materials, but we have made modularity and extensibility primary goals of the project.

Many members of the TILE development team are also members of the Open Annotation Collaboration (OAC), and have therefore attempted to develop TILE’s annotation features to be OAC compliant. Like OAC, TILE assumes that the text and the images to be linked may exist at separate and completely unconnected servers. When a user starts the TILE tool for the first time, she is prompted to supply a URI to a TILE compliant JSON file.

TILE’s JSON is simple and thoroughly documented, and we provide several translators to map common existing metadata formats to the format. We have already created a PHP script that will generate TILE JSON from a TEI P5 document and are currently working to do the same for the METS files used in Indiana University’s METS navigator tool.

Additionally, TILE provides a modular exporting tool that allows users to run the work they’ve done in TILE through an external translator and then download the result to the client computer. For example, a user may import a set of images and transcripts from a METS file at the Library of Congress, use TILE to link images and text, and then export the result as a TEI file. The TEI file may then be reimported to TILE at a later date to further edit or convert the file.

At Code4Lib, we will demonstrate the functionality of TILE and display a poster and provide handouts that describe the thinking behind TILE, how it is intended to be used, and details on how TILE is built and functions.

We Don’t Server Their Kind : Managing E-resources with Flat-File Databases

Junior Tidal, Multimedia and Web Services Librarian, New York City College of Technology, CUNY (jtidal at citytech dot cuny dot edu)

Managing E-resources can be a daunting challenge. URLs, database names, and even vendors can change, go down, or simply cease to exist. My proposal involves the use of a PHP-based, flat-file database driven web tool for database management. The program was designed to fulfill two needs: ease of use for librarians without programming experience, and compliance with the security and technical restrictions imposed by the college’s IT department. My presentation will explore the development of this tool, challenges within its development, and future improvements. PHP code and the flat-file database will also be explained and provided to attendees. For a working demonstration feel free to visit the New York City College of Technology’s A-Z database page or the subject database page.

Drupal 7 as a Rapid Application Development Tool

Cary Gordon, President, The Cherry Hill Company & Board Member, The Drupal Association (cgordon at chillco dawt com)

Five years ago, I discovered that the Drupal CMS had a programming framework disguised as an API, and learned that I could use it to solve problems.

Drupal 7 builds on that to provide a powerful toolset for interfacing with, manipulating and presenting data. It empowers tool-builders by providing a minimal install option, along with a more powerful installation profile system that makes it easier for developers to package and distribute their applications.

Helping Open Source Succeed

Peter Murray, LYRASIS, Peter.Murray@lyrasis.org
Tim Daniels, LYRASIS, Tim.Daniels@lyrasis.org

Deciding if open source is an option for your institution, or what open source software matches your institution’s needs and capabilities, is a complex decision. LYRASIS is developing a new area of focus to assist libraries with decision tools and an open source software registry. We want to learn from the creators of open source software what questions institutions have when considering the adoption of open source software and what information you would like to see in a registry that compares various open source tools. A summary of topics discussed in this session will be openly published as part of LYRASIS’ program development plans and decision support resources.

The mission of the new and emerging LYRASIS Technology Services area is to serve members and the broader library community as a provider of expertise and capacity in open source based technology solutions. We think that viable roles for an organization supporting open source software are to: a) Increase understanding of open source technology within the library community, including value, benefits, risks, and costs; b) Assist in decision-making by providing resources to help libraries evaluate open source technologies, institutional readiness, and capacity for adoption; c) Support adoption and use of open source technologies and systems within libraries and consortia; d) Foster integration of open source software tools to expand the ability of existing programs to meet a range of library user needs; e) Develop and test new open source software programs, and contribute to the development of existing programs; f) Support long-term sustainability of viable, library-based open source software and systems. We recognize that these roles exist to some extent on a continuum, with latter services related to development and sustainability building on the knowledge and experience gained through deployment of existing open source systems. In turn, effective adoption and use depends on understanding open source systems and having resources to assist in decision-making and implementation.

With open source software in the “innovator” and “early adopter” stages in the library community, we intend to focus our initial efforts on roles A-D in the above list: increased understanding, decision-support, and effective adoption and integration of existing library-focused open source systems. This session is focused on the decision-support services area of activity.

The impact of this session is expected to be far reaching, if initially subtle. With most of the session time devoted to discussion and interaction among peers on questions surrounding the adoption of open source software, participants will take away a deeper understanding of topics each institution should consider when looking at open source software. These findings, along with those of similar sessions around the country, will inform the creation and expansion of the free decision support tools being developed by LYRASIS.

Letting in the light: using Solr as an external search component

Jay Luker, IT Specialist, ADS (jluker at cfa dot harvard dot edu)
Benoit Thiell, ADS (bthiell at cfa dot harvard dot edu)

It’s well-established that Solr provides an excellent foundation for building a faceted search engine. But what if your application’s foundation has already been constructed? How do you add Solr as a federated, fulltext search component to an existing system that already provides a full set of well-crafted scoring and ranking mechanisms?

This talk will describe a work-in-progress project at the Smithsonian/NASA Astrophysics Data System to migrate its aging search platform to Invenio, an open-source institutional repository and digital library system originally developed at CERN, while at the same time incorporating Solr as an external component for both faceting and fulltext search.

In this presentation we'll start with a short introduction of Invenio and then move on to the good stuff: an in-depth exploration of our use of Solr. We'll explain the challenges that we faced, what we learned about some particular Solr internals, interesting paths we chose not to follow, and the solutions we finally developed, including the creation of custom Solr request handlers and query parser classes.

This presentation will be quite technical and will show a measure of horrible Java code. Benoit will probably run away during that part.

Working with DuraCloud: How to preserve your data in the cloud

Bill Branan, DuraSpace, bbranan at duraspace dot org
Andrew Woods, DuraSpace, awoods at duraspace dot org

Ever expanding digital collections have become the norm in academic libraries. As the size of collections grows, the need for simple-to-deploy yet powerful preservation strategies becomes increasingly important. The DuraCloud project, a cloud-hosted service for data management and preservation, is committed to bringing the availability and elasticity of the cloud to bear on the issue of digital preservation. This session will discuss the APIs and tools which can be used to communicate and integrate with the DuraCloud platform, providing an immediate connection to scalable storage available from multiple cloud storage providers, configurable services which can be run over your content out-of-the-box, and a development platform which can serve as the basis for ongoing data mining and analysis.

Visualizing Library Data

Karen Coombs, OCLC, coombsk at oclc dot org

Visualizations can be powerful tools to give context to library users and to provide a clear picture for data-driven decision-making in libraries. Map mashups, tag clouds and timelines can be used to show information to users in new ways and help them locate materials to meet their needs. QR codes can help link users to materials that libraries have in their collections. Charts and graphs can be used to help analyze library collections (holdings) and compare them to other libraries. This session will show prototypes which combine tools like Google Chart API, Protovis and Simile Widgets with data from WorldCat, WorldCat Registry, Classify, Terminology Services, and Dewey.info to create vivid illustrations in library user interfaces and administration tools.

Kuali OLE: Architecture for Diverse and Linked Data

Tim McGeary, Lehigh University, Kuali OLE Functional Council, tim dot mcgeary at lehigh dot edu
Brad Skiles, Project Manager, Kuali OLE, Indiana University, bradskil at indiana dot edu

With programming scheduled to begin in January 2011 on the Kuali Open Library Environment (OLE), the Kuali OLE Functional Council is developing the requirements for an architecture for diverse data sets and linked data. With no frontrunner for one bibliographic data standard, and local requirements on what data will be accompanying or linked to the main record store, Kuali OLE needs to build a flexible environment for records management and access.

We will present the concepts of our planned architecture, a multi-repository framework, using a document repository, a semantic repository, and a relational repository, brokered on top of the enterprise service bus of Kuali Rice.

As a community source project, this is an opportunity for the Kuali OLE partners to present our plans for discussion with the community, and we look forward to feedback, questions, and comments.