2011talks Submissions

From Code4Lib
Revision as of 23:58, 28 October 2010 by Anarchivist (Talk | contribs) (fiwalk With Me: Building Emergent Pre-Ingest Workflows for Digital Archival Records using Open Source Forensic Software)

Jump to: navigation, search

Deadline for talk submission is Saturday, November 13. See this mailing list post for more details.

Please follow the formatting guidelines:

== Talk Title: ==
 
* Speaker's name, affiliation, and email address
* Second speaker's name, affiliation, email address, if second speaker

Abstract of no more than 500 words.

How "great" are the Great Books?

  • Eric Lease Morgan, University of Notre Dame (emorgan at nd.edu)

In the 1960s a set of books called the Great Books of the Western World was published. It was supposed to represent the best of Western literature and enable the reader to further their liberal arts education. Sixty volumes in all, it included works by Plato, Aristotle, Shakespeare, Milton, Galileo, Kepler, Melville, Darwin, etc. These great books were selected based on the way they discussed a set of 102 "great ideas" such as art, astronomy, beauty, evil, evolution, mind, nature, poetry, revolution, science, will, wisdom, etc. How "great" are these books, and how "great" are the ideas expressed in them?

Given full text versions of these books it is almost trivial to use the "great ideas" as input and apply relevancy ranking algorithms against the texts thus creating a sort of score -- a "Great Ideas Coefficient". Term Frequency/Inverse Document Frequency (TFIDF) is a well-established algorithm for computing just this sort of thing:

relevancy = ( c / t ) * log( d / f ) where:

  • c = number of times a given word appears in a document
  • t = total number of words in a document
  • d = total number of documents in a corpus
  • f = total number of documents containing a given word

Thus, to calculate our Great Ideas Coefficient I sum the relevancy score for each "great idea" for each "great book". Plato's Republic might have a cumulative score of 525 while Aristotle's On The History Of Animals might have a cumulative score of 251. Books with a larger Coefficient could be considered greater. Given such a score a person could measure a book's "greatness". We could then compare the score to the scores of other books. Which book is the "greatest"? We could compare the score to other measurable things such as book's length or date to see if there were correlations. Are "great books" longer or shorter than others? Do longer books contain more "great ideas"? Are there other books that were not included in the set that maybe should have been included?

The first part of this talk describes the different steps involved in the text pre-processing to calculate an accurate TFIDF value for each item of the corpus. The results and statistical analysis are discussed in the second part. Finally I will outline the remaining work such as refining the analysis and extending the current quantitative process to a web implementation.

UNR BookFinder: Leveraging Google Books to Move Beyond Catalog Search

  • Will Kurt, University of Nevada, Reno, (wkurt at unr.edu)

Google Books is a great tool, but it lacks an easy method allowing users to access the items they find through their library. The UNR BookFinder is a mashup of the Google Books and WorldCat APIs (and some ugly hacks) which allows users to search for items with the power of Google’s fulltext search while eliminating the need to search all of the library’s various resources to find an item. The UNR BookFinder automatically searches the catalog and consortial ILL for the item, if these fail an ILLiad request form as automatically filled out. The end result is that the user can explore an universe of books and access them as fast as possible through the university library. A video of the alpha version can be found here.

Moving a large multi-tiered search architecture from dedicated hosts to the cloud

  • Peter Ciuffetti, Senior Software Engineer, Credo Reference Ltd. (pete at credoreference.com)

So you want to move a large production search service from dedicated hosts to the cloud? The flexibility is enticing, the costs are attractive, the geek cred is undeniable. Our cloud adventure came with many undocumented surprises ranging from mysterious server behavior to sales engineers suggesting that 'maybe the cloud isn't for you'. We eventually made it all work and our production service is now on the cloud. This talk will cover what the cloud product FAQs don't say, what their tech support doesn't know (or won't say) and mistakes you can avoid by talking to the guys with the arrows in their backs.

VuFind Beyond MARC: Discovering Everything Else

  • Demian Katz, Library Technology Development Specialist, Villanova University (demian dot katz at villanova dot edu)

The VuFind[1] discovery layer has been providing a user-friendly interface to MARC records for several years now. However, library data consists of more than just MARC records, and VuFind has grown to accommodate just about anything you can throw at it. This presentation will examine the new workflows and tools that enable discovery of non-MARC resources and some of the non-traditional applications of VuFind that they make possible. Technologies covered will include OAI-PMH, XSLT, Aperture, Solr and, of course, VuFind itself.

Linked data apps for medical professionals

  • Rurik Thomas Greenall, NTNU Library, (rurik dot greenall at ub dot ntnu dot no)

The promise of linked data for libraries has yet to be realized, as a demonstration of the power of RDF, HTTP-URIs and SPARQL, NTNU Library together with the Norwegian Electronic Health Library produced a linked data representation of MeSH and created a small translation app that can be used to help health professionals identify the right term and apply it in their database searches. This talk presents the simple ways in which the core technologies and concepts in linked data provide a solid, time-saving way of developing usable applications.

fiwalk With Me: Building Emergent Pre-Ingest Workflows for Digital Archival Records using Open Source Forensic Software

  • Mark A. Matienzo, Manuscripts and Archives, Yale University Library (mark at matienzo dot org)

Many of the complications of born-digital records involve preparing them for transfer into a storage or preservation environment. Digital evidence of any kind is easily susceptible to unintentional and intentional modification. This presentation will describe the use of open source forensic software in pre-ingest workflows for digital archives. Digital archivists and other digital curation practitioners can develop emergent processes to prepare records for ingest and transfer using a combination of relatively simple tools. The granularity and simplicity of these tools and procedures provides the possibility for their smooth integration into a digital curation environment built on micro-services.