Code4Lib - User contributions [en]

2011talks Submissions

2010-10-21T17:50:46Z

Willkurt:

Deadline for talk submission is ''Saturday, November 13''. See [http://www.mail-archive.com/code4lib@listserv.nd.edu/msg08878.html this mailing list post for more details].

Please follow the formatting guidelines:

<pre>
== Talk Title: ==

* Speaker's name, affiliation, and email address
* Second speaker's name, affiliation, email address, if second speaker

Abstract of no more than 500 words.
</pre>

== How "great" are the Great Books? ==

* Eric Lease Morgan, University of Notre Dame (emorgan at nd.edu)

In the 1960s a set of books called the Great Books of the Western World was published. It was supposed to represent the best of Western literature and enable the reader to further their liberal arts education. Sixty volumes in all, it included works by Plato, Aristotle, Shakespeare, Milton, Galileo, Kepler, Melville, Darwin, etc. These great books were selected based on the way they discussed a set of 102 "great ideas" such as art, astronomy, beauty, evil, evolution, mind, nature, poetry, revolution, science, will, wisdom, etc. How "great" are these books, and how "great" are the ideas expressed in them?

Given full-text versions of these books, it is almost trivial to use the "great ideas" as input and apply relevancy ranking algorithms to the texts, thus creating a sort of score -- a "Great Ideas Coefficient". Term Frequency/Inverse Document Frequency (TFIDF) is a well-established algorithm for computing just this sort of thing:

relevancy = ( c / t ) * log( d / f ) where:

* c = number of times a given word appears in a document
* t = total number of words in a document
* d = total number of documents in a corpus
* f = total number of documents containing a given word

Thus, to calculate a book's Great Ideas Coefficient, I sum the relevancy scores of all the "great ideas" for that book. Plato's Republic might have a cumulative score of 525 while Aristotle's On The History Of Animals might have a cumulative score of 251. Books with a larger Coefficient could be considered greater. Given such a score, a person could measure a book's "greatness". We could then compare the score to the scores of other books. Which book is the "greatest"? We could compare the score to other measurable things, such as a book's length or date, to see if there are correlations. Are "great books" longer or shorter than others? Do longer books contain more "great ideas"? Are there other books that were not included in the set that maybe should have been included?
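
As a rough illustration of the calculation described above, here is a minimal sketch in Python. It is not the code used for the talk: the function names, the toy three-document corpus, the whitespace tokenization, and the choice of the natural logarithm (the formula does not name a base) are all assumptions made for the example.

```python
import math


def tfidf(word, document, corpus):
    """relevancy = (c / t) * log(d / f), with c, t, d, f as defined above.

    A document is a list of lowercase tokens; a corpus is a list of documents.
    """
    c = document.count(word)                     # times the word appears in the document
    t = len(document)                            # total words in the document
    d = len(corpus)                              # total documents in the corpus
    f = sum(1 for doc in corpus if word in doc)  # documents containing the word
    if t == 0 or f == 0:
        return 0.0                               # avoid division by zero
    return (c / t) * math.log(d / f)             # natural log: an assumption


def great_ideas_coefficient(document, corpus, ideas):
    """Sum the relevancy score of every "great idea" for one document."""
    return sum(tfidf(idea, document, corpus) for idea in ideas)


# Toy data, invented for illustration only.
ideas = ["art", "beauty", "mind"]
book_a = "art and beauty and art".split()
book_b = "the whale and the sea".split()
book_c = "beauty of the mind".split()
corpus = [book_a, book_b, book_c]

scores = {
    name: great_ideas_coefficient(book, corpus, ideas)
    for name, book in [("a", book_a), ("b", book_b), ("c", book_c)]
}
```

With real texts, the pre-processing (tokenizing, case folding, handling multi-word ideas) would matter far more than this sketch suggests.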

The first part of this talk describes the text pre-processing steps needed to calculate an accurate TFIDF value for each item in the corpus. The second part discusses the results and their statistical analysis. Finally, I will outline the remaining work, such as refining the analysis and extending the current quantitative process to a web implementation.

== UNR BookFinder: Leveraging Google Books to Move Beyond Catalog Search ==
* Will Kurt, University of Nevada, Reno (wkurt at unr.edu)
Google Books is a great tool, but it lacks an easy way for users to get the items they find through their library. The UNR BookFinder is a mashup of the Google Books and WorldCat APIs (and some ugly hacks) that lets users search for items with the power of Google’s full-text search while eliminating the need to search all of the library’s various resources to find an item. The UNR BookFinder automatically searches the catalog and consortial ILL for the item; if these fail, an ILLiad request form is automatically filled out. The end result is that the user can explore a universe of books and access them as fast as possible through the university library. A video of the alpha version can be found [http://www.youtube.com/watch?v=qaqcUSTtdVk here].

2011 Preconference Proposals

2010-10-20T18:38:39Z

Willkurt:

== Proposals for 2011 Code4LibCon Preconferences ==

Proposals will close Friday, November 19, so we can finalize the list and add them to registration!

We'll have space for up to 3 full-day pre-conferences and 3-6 half-day pre-conferences.

'''Please include a "Contact/Responsible Individual" name and email address so we know who is willing to put on the proposed precon.'''

== Text mining ==
* Description: This workshop will describe and demonstrate the principles of text mining and other digital humanities computing techniques. With the advent of so much full text content available in libraries, and with the increasing ease with which people can find content, the question to ask oneself is, "What do I do with all of this content?" Or, as Gregory Crane said, [http://www.dlib.org/dlib/march06/crane/03crane.html "What do you do with a million books?"] Text mining, visualization, and concordancing are some of the answers -- processes for making sense of large full text corpora -- something often called "distant reading". Participants will go away with a better understanding of what the digital humanities are and how they can be applied in a library setting.
* Duration: half-day
* Speaker Bio: Eric Lease Morgan considers himself a librarian first and a computer user second. His professional goal is to discover new ways to use computers to provide better library services. Some of his more notable projects include Mr. Serials, Index Morganagus, the Alex Catalogue of Electronic Texts, and MyLibrary. Currently he spends his time investigating the digital humanities and integrating them into VuFind.
* Contact: Eric Lease Morgan (emorgan at nd.edu)

== What's New In Solr ==
* Description: The library world is fired up about Solr. Practically every next-gen catalog is using it (via Blacklight, VuFind, or other technologies). Solr has continued improving in some dramatic ways, including geospatial support, field collapsing/grouping, extended dismax query parsing, pivot/grid/matrix/tree faceting, autosuggest, and more. This session will cover all of these new features, showcasing live examples of them all, including anything new that is implemented prior to the conference.
* Duration: half-day
* Speaker Bio: Erik has spoken at several code4lib conferences (keynoted Athens '07 along with the infamous pioneering Solr preconference, presented at Providence '09, and pre-conferenced Asheville '10). Erik co-authored "Lucene in Action", and he's a Lucene and Solr committer. His library world claims to fame are founding and naming Blacklight and being the original developer of Collex and the Rossetti Archive search.
* Contact: Erik Hatcher (erik.hatcher at lucidimagination.com)

== Intro to Functional Programming with JavaScript (and a little Haskell) ==
* Description: Functional programming is becoming increasingly important for programmers to be aware of. Unfortunately, it also has a reputation for being a particularly difficult and academic area of programming. Languages like Haskell, while very powerful, certainly live up to this reputation. However, many of the essential features of functional programming can be explored through a language as simple and commonplace as JavaScript.

:This preconference talk will cover what makes a language ‘functional’ and the usage and implementation of essential features of functional programming: first-class functions, lambda functions, higher order functions, closures, and function currying. It will show how many of the powerful abstractions in a language like Haskell can also be implemented in a language like JavaScript, including a discussion of the trade-offs between purity and performance.

:The aim of this talk is to prepare participants both to implement functional techniques in everyday programming and to start exploring the topic more academically. Even if you never plan on coding in a purely functional style, this workshop will give you an understanding of topics that should improve your programming in other languages with functional features such as Ruby, Python, and C#. At the very least, after this workshop you can go to the bar and throw around words like “lambda function”, “closure” and “currying” with confidence!
* Duration: half-day
* Speaker Bio: Will Kurt is the Applications Development Librarian at the University of Nevada, Reno, where he is also working on a master’s in Computer Science. He has spoken at several library conferences including Computers in Libraries and Internet Librarian on topics including the Microsoft Surface and Visualizing Information.
* Contact: Will Kurt (wkurt at unr.edu)
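
For readers unfamiliar with the terms in the description above, here is a minimal sketch of each one. It is written in Python (one of the languages with functional features that the description names) and is not the workshop's JavaScript or Haskell material; every function name is invented for illustration.

```python
from functools import reduce


def make_counter():
    """Closure: increment() keeps access to count after make_counter returns."""
    count = 0

    def increment():
        nonlocal count
        count += 1
        return count

    return increment


def curry_add(x):
    """Currying: a two-argument addition taken one argument at a time."""
    return lambda y: x + y


def compose(f, g):
    """Higher-order function: takes two functions and returns a new one."""
    return lambda x: f(g(x))


# First-class functions: functions are values that can be named and passed around.
square = lambda n: n * n      # a lambda (anonymous) function bound to a name
add_one = curry_add(1)        # partially applied addition

squares = list(map(square, [1, 2, 3]))               # map takes a function argument
total = reduce(lambda acc, n: acc + n, squares, 0)   # fold the list into a sum
square_then_add_one = compose(add_one, square)
```

Each call to make_counter() returns an independent counter, which is the usual way to show that a closure captures its own copy of the enclosing scope.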

[[Category:Code4Lib2011]]

Working with MARC

2010-04-08T00:09:17Z

Willkurt: /* MARC Programming Libraries */

== Working with MARC ==

MARC stands for Machine Readable Cataloging, and many folks in the code4lib community find themselves working with MARC records at some point. This page is meant to be a round-up of the tools for working with MARC. If you want a general introduction to the standard, [http://en.wikipedia.org/wiki/MARC_standards the Wikipedia article] is a good place to start.

=== Desktop tools ===
MarcEdit http://people.oregonstate.edu/~reeset/marcedit/html/index.php

=== Getting MARC Indexed for Search Engines ===

==== MARC in Solr ====

SolrMarc http://code.google.com/p/solrmarc/

Solr http://lucene.apache.org/solr

==== MARC in Zebra ====

Getting Started with Zebra http://wiki.code4lib.org/index.php/Getting_Started_with_Zebra

Zebra http://www.indexdata.com/zebra

=== MARC Programming Libraries ===

{| class="wikitable sortable"
|-valign="top"
! Project !! Language !! class="unsortable" | Links !! class="unsortable" | Notes
|-valign="top"
| MARC4J || Java || http://marc4j.tigris.org/ ||
|-valign="top"
| javamarc || Java || http://github.com/billdueber/javamarc || Fork of MARC4J
|-valign="top"
| MARC/pm || Perl || http://marcpm.sf.net || Umbrella project; see also [http://search.cpan.org/search?query=marc&mode=all CPAN]
|-valign="top"
| pymarc || Python || http://github.com/edsu/pymarc/ ||
|-valign="top"
| File_MARC || PHP || http://pear.php.net/package/File_MARC/ || PEAR package; sanctioned fork of PHP-MARC
|-valign="top"
| PHP-MARC || PHP || http://www.emilda.org/index.php?q=php-marc || Abandoned(?)
|-valign="top"
| ruby-marc || Ruby || http://rubyforge.org/projects/marc/ <br/> http://wiki.code4lib.org/index.php/Ruby-marc ||
|-valign="top"
| enhanced-marc || Ruby || http://github.com/rsinger/enhanced-marc || Convenience methods for ruby-marc
|-valign="top"
| marc21 || Scheme || http://code.google.com/p/marc21 ||
|-valign="top"
| marcerl || Erlang || svn://pubserv.oclc.org/marcerl|| Very alpha code
|-valign="top"
| Scala-MARC || Scala || http://github.com/achelous/Scala-MARC ||
|-valign="top"
| CSharp Marc || C# || http://bitbucket.org/mattgrayson/csharp-marc ||
|-valign="top"
| MARC.NET || C# || http://github.com/willkurt/MARC.NET || basic start, not thoroughly 'real world' tested
|}
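
To give a feel for what these libraries handle for you, here is a minimal sketch of the binary MARC 21 (ISO 2709) record layout using only the Python standard library. It is an illustration, not a replacement for the libraries above: it assumes UTF-8 data, does no error handling, ignores MARC-8, and the helper names and the sample record are invented for the example.

```python
FIELD_END = b"\x1e"    # field terminator
RECORD_END = b"\x1d"   # record terminator
SUBFIELD = b"\x1f"     # subfield delimiter


def build_record(fields):
    """Build a tiny record from (tag, body) pairs. Demo helper only."""
    directory, data = b"", b""
    for tag, body in fields:
        body += FIELD_END
        # Directory entry: 3-char tag, 4-digit field length, 5-digit start offset.
        directory += b"%s%04d%05d" % (tag.encode("ascii"), len(body), len(data))
        data += body
    directory += FIELD_END
    base = 24 + len(directory)                 # base address of the data
    total = base + len(data) + 1               # +1 for the record terminator
    leader = b"%05dnam a22%05d a 4500" % (total, base)   # 24-byte leader
    return leader + directory + data + RECORD_END


def split_records(blob):
    """Split a byte string holding several records at the record terminator."""
    return [chunk + RECORD_END for chunk in blob.split(RECORD_END) if chunk]


def parse_record(raw):
    """Return control fields as (tag, value) and data fields as
    (tag, indicators, [(code, value), ...])."""
    base = int(raw[12:17])                     # leader positions 12-16
    directory = raw[24:base - 1]               # drop the directory's terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        body = raw[base + start:base + start + length - 1]   # drop field terminator
        if tag < "010":                        # control fields have no subfields
            fields.append((tag, body.decode("utf-8")))
        else:
            subfields = [(chunk[:1].decode("ascii"), chunk[1:].decode("utf-8"))
                         for chunk in body[2:].split(SUBFIELD)[1:]]
            fields.append((tag, body[:2].decode("ascii"), subfields))
    return fields


# A made-up two-field record: a control number and a title statement.
sample = build_record([
    ("001", b"demo0001"),
    ("245", b"10" + SUBFIELD + b"aMoby Dick /" + SUBFIELD + b"cHerman Melville."),
])
parsed = parse_record(sample)
```

Real-world records break these assumptions often enough that one of the maintained libraries in the table is the safer choice for anything beyond experimentation.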

=== Getting Sample Data ===

One common question is where to get sample MARC records for testing or playing around with. If you work at a library, chances are good that you can get some records out of your ILS (go ask your systems librarian if you don't know how to do this yourself). If you don't work in a library, you can get MARC bibliographic records from the Internet Archive at [http://www.archive.org/details/marcrecords http://www.archive.org/details/marcrecords].

There is a nascent movement within the code4lib community to establish a test set of problematic MARC records, especially records that are representative of the kinds of weirdness encountered in real libraries. It is hoped that this could eventually become a test corpus against which to run various MARC processing implementations. For more information, watch [http://www.archive.org/details/MARCTHULU Simon Spero's excellent talk from Code4LibCon 2010].

MARC records for authority data are easier to come by. The [http://www.getty.edu/research/conducting_research/vocabularies/download.html Getty Vocabularies] make both the Art & Architecture Thesaurus (AAT) and the Union List of Artist Names (ULAN) freely available. The [http://www.library.northwestern.edu/public/gsafd/ Guidelines On Subject Access To Individual Works Of Fiction, Drama, Etc.] records are available from Northwestern University. The [http://www.nlm.nih.gov/mesh/filelist.html Medical Subject Headings (MeSH)] are available in many formats, one of them being MARC.