2010talks Submissions
The deadline for 20-minute talk submissions for the Code4Lib 2010 Conference was '''Friday, November 13'''. Edits to existing proposals are no longer allowed, as these are being processed for the voting system. For more information, see the [[2010talks_Call_for_Submissions|Call for submissions]].
== R and rapache ==
* Harrison Dekker, University of California, Berkeley, hdekker@library.berkeley.edu
[http://cran.r-project.org/ R] is a popular, powerful, and extensible open source statistical analysis application. [http://biostat.mc.vanderbilt.edu/rapache/ rapache], software developed at Vanderbilt University, allows web developers to leverage the data analysis and graphical visualization capabilities of R in real-time through simple Apache server requests. This presentation will provide an overview of both R and rapache and will explore how these tools might be used to develop applications for the library community.
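
A minimal sketch of the request-driven idea, from the client side (the URL, parameters, and PNG output are assumptions; the real interface depends entirely on how the rapache handler is configured):

<pre>
# Sketch: fetching an R-generated chart from a hypothetical rapache endpoint.
# The URL and query parameters are illustrative only; rapache maps HTTP
# requests to R scripts, so the real interface depends on the handler config.
import urllib.request

url = "http://example.org/r/histogram?dataset=circ_counts&bins=20"
with urllib.request.urlopen(url) as response:
    png_bytes = response.read()  # the R script would render and return a PNG

with open("histogram.png", "wb") as f:
    f.write(png_bytes)
</pre>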
== Metadata editing - a truly extensible solution ==
== Why write test code? ==
* Naomi Dushay, Stanford University, ndushay@stanford.edu
* Willy Mene, Stanford University, wmene@stanford.edu
* Jessie Keck, Stanford University, jkeck@stanford.edu
Is it worth it to slow down your code development to write tests? Won't it take you a long time to learn how to write tests? Won't it take longer if you have to write tests AND develop new features and fix bugs? Isn't it hard to write test code? To maintain test code? We will address these questions as we talk about how test code is crucial for our software. By way of illustration, we will show how it has played a vital role in making Blacklight a true community collaboration, as well as how it has positively impacted coding projects in the Stanford Libraries.
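
The talk's examples are Ruby, but the principle is language-neutral. A minimal sketch in Python (the function and values are invented for illustration) of a regression test that keeps a fixed bug fixed:

<pre>
# Sketch: a regression test that documents a bug fix.
# normalize_callnumber and the expected values are invented for illustration.
import unittest

def normalize_callnumber(raw: str) -> str:
    """Collapse internal whitespace and uppercase a call number."""
    return " ".join(raw.split()).upper()

class TestCallnumberNormalization(unittest.TestCase):
    def test_extra_whitespace_is_collapsed(self):
        # Guards the fix: mixed spacing once produced duplicate facet values.
        self.assertEqual(normalize_callnumber("ml  410 .b3"), "ML 410 .B3")

if __name__ == "__main__":
    unittest.main()
</pre>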
== How To Implement A Virtual Bookshelf With Solr ==
== A Better Advanced Search? ==
* Naomi Dushay, Stanford University, ndushay@stanford.edu
* Jessie Keck, Stanford University, jkeck@stanford.edu
Even though we'd love to get basic searches working so well that advanced search wouldn't be necessary, there will always be a small set of users that want it, and there will always be some library searching needs that basic searching can't serve. Our user interface designer was dissatisfied with many aspects of advanced search as currently available in most library discovery software; the form she designed was excellent but challenging to implement. See http://searchworks.stanford.edu/advanced
We'll share details of how we implemented Advanced Search in Blacklight:
# thoughtfully designed html form for the user (NOT done by techies!)
# boolean syntax while using Solr dismax magic (dismax does not speak Boolean)
# checkbox facets (multiple facet value selection)
## easily configured in solrconfig.xml
# manipulating user entered queries before sending them to Solr (see the sketch after this list)
# making advanced search results look like other search results: breadcrumbs, selectable facets, and other fun.
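
A minimal sketch of the boolean-plus-dismax idea (in Python; Blacklight's actual code is Ruby, and the qf parameter names are assumptions standing for sets defined in solrconfig.xml): each form clause becomes a nested dismax subquery via Solr's _query_ mechanism, with the Boolean operators kept outside the dismax clauses.

<pre>
# Sketch: translating fielded advanced-search input into a Solr query that
# wraps Boolean operators around nested dismax subqueries.
# title_qf / author_qf stand for qf parameter sets defined in solrconfig.xml.

def dismax_subquery(qf_param: str, user_input: str) -> str:
    # dismax itself does not speak Boolean, so each clause becomes its own
    # nested dismax query via Solr's _query_ hook.
    escaped = user_input.replace('"', '\\"')
    return f'_query_:"{{!dismax qf=${qf_param}}}{escaped}"'

def build_advanced_query(clauses, operator="AND"):
    # clauses: (qf_param, user_input) pairs collected from the HTML form
    parts = [dismax_subquery(f, q) for f, q in clauses if q.strip()]
    return f" {operator} ".join(parts)

print(build_advanced_query([("title_qf", "color purple"), ("author_qf", "walker")]))
# _query_:"{!dismax qf=$title_qf}color purple" AND _query_:"{!dismax qf=$author_qf}walker"
</pre>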
== Scholarly annotation services using AtomPub and Fedora ==
== The Linked Library Data Cloud ==
* Ross Singer, Talis, ross.singer@talis.com
A year later, how far has Linked Library Data come? With the emergence of the Swedish National Library's LIBRIS, the return of lcsh.info as http://id.loc.gov/authorities/, LC's Chronicling America, and viaf.org, among others, entry to the Linked Data cloud might be easier than you think. This presentation will describe various projects out in the wild that can bridge the gap between our legacy data and the semantic web, incremental steps we can take in modeling our data, why linked data matters, and a demonstration of how small template changes can contribute to the Linked Data cloud.
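
As a flavor of how little it can take to join the cloud (a sketch, not from the talk; the record URI, title, and VIAF URI are illustrative assumptions):

<pre>
# Sketch: exposing a catalogue record as a few RDF triples with rdflib.
# The record URI, title, and VIAF URI are illustrative assumptions.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import RDF, DCTERMS

g = Graph()
record = URIRef("http://catalog.example.org/record/12345")

g.add((record, RDF.type, DCTERMS.BibliographicResource))
g.add((record, DCTERMS.title, Literal("Weaving the Web")))
# Pointing at an external authority URI is what links us into the cloud:
g.add((record, DCTERMS.creator, URIRef("http://viaf.org/viaf/85312226")))

print(g.serialize(format="turtle"))
</pre>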
== A code4lib Manifesto ==
== JeromeDL - a social semantic digital library ==
* Jodi Schneider, DERI NUI Galway, jschneider@pobox.com
JeromeDL is an open source e-library with semantics. A fully functional digital library, JeromeDL uses linked data: using standard "Web 3.0" vocabularies such as SIOC, FOAF, and WordNet, it publishes RDF descriptions of the e-library contents. JeromeDL uses FOAF to describe users, meaning that access privileges can be naturally assigned to a social network, in addition to individuals or all WWW users. Users can also share annotations, including natural language query templates, supporting collaborative browsing and collaborative filtering. To encourage users to provide more meaningful annotations (beyond just tags), JeromeDL uses a WordNet-based vocabulary service. The system also leverages full-text indexing with Lucene and allows filtering with the SIMILE project's Exhibit. In short, JeromeDL is a social semantic digital library, allowing users to collect, publish, and share their library content with their social network on the semantic web.
*[http://www.jeromedl.org/ JeromeDL homepage]
*[http://bleedingedge.jeromedl.org/preview?show=techreport JeromeDL demo site]
== Kill the search button ==
== HIVE: a new tool for working with vocabularies ==
* Ryan Scherle, National Evolutionary Synthesis Center, rscherle@nescent.org
* Jose Aguera, University of North Carolina, jose.aguera@gmail.com
HIVE is a toolkit that assists users in selecting vocabulary and ontology terms to annotate digital content. The HIVE approach combines the ease of folksonomies with the rigor of traditional vocabularies. By combining semantic web standards with text mining techniques, HIVE will improve the effectiveness of subject metadata generation, allowing users to search and browse terms from a variety of vocabularies and ontologies in one integrated tool. Documents can be submitted to HIVE for automatic analysis, generating a set of suggested vocabulary terms.
Your system can interact with common vocabularies such as LCSH and MESH via the central HIVE server, or you can install a local copy of HIVE with your own custom set of vocabularies. This talk will give an overview of the current features of HIVE and describe how to build tools that use the HIVE services.
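
A sketch of what client code might look like (the endpoint path, parameters, and JSON response shape are assumptions for illustration, not HIVE's documented API):

<pre>
# Sketch: asking a HIVE-style server to suggest vocabulary terms for a text.
# The /suggest path, parameters, and JSON response shape are assumptions;
# consult the HIVE documentation for the real service interface.
import json
import urllib.parse
import urllib.request

def suggest_terms(server: str, text: str, vocabulary: str = "lcsh"):
    data = urllib.parse.urlencode({"text": text, "vocab": vocabulary}).encode()
    with urllib.request.urlopen(f"{server}/suggest", data) as resp:
        return json.load(resp)

# terms = suggest_terms("http://hive.example.org", open("abstract.txt").read())
</pre>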
 
== Implementing Metasearch and a Unified Index with Masterkey ==
* [[User:DataGazetteer|Peter Murray]], OhioLINK, peter@OhioLINK.edu
 
Index Data's suite of metasearch and local indexing tools, under the product name Masterkey, is a powerful way to provide access to a diverse set of databases. In 2009, OhioLINK contracted with Index Data to help build a new metasearch platform and a unified index of locally-loaded records.
 
By the time the conference rolls around, the user interface and the metasearch infrastructure will be set up and live. This part of the presentation will dive into the innards of the AJAX-powered end-user interface, the configuration back-end, and possibly a view of the Gecko-driven Index Data Connector Framework.
 
It is hard to predict at the point this talk is being proposed what the state of the unified index will be. At the very least, there will be broad system diagrams and a description of how we intend to eventually bring 250 million records into one index. With luck, there might even be running code to show.
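
Masterkey's metasearch layer is built on Index Data's Pazpar2 web service, whose session-based protocol looks roughly like this (a sketch; the endpoint is illustrative and the parameters are simplified):

<pre>
# Sketch: the Pazpar2 init/search/show cycle behind Masterkey-style metasearch.
# The base URL is illustrative; commands follow the published Pazpar2
# protocol, though parameters are simplified here.
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BASE = "http://metasearch.example.org/pazpar2/search.pz2"

def pz2(params):
    url = BASE + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return ET.fromstring(resp.read())

session = pz2({"command": "init"}).findtext("session")
pz2({"command": "search", "session": session, "query": "open source ils"})
time.sleep(2)  # targets answer asynchronously; real clients poll repeatedly
for hit in pz2({"command": "show", "session": session,
                "start": 0, "num": 5}).findall("hit"):
    print(hit.findtext("md-title"))
</pre>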
 
== Adding Solr-based Search to Evergreen's OPAC ==
 
* Alexander O'Neill, Robertson Library, University of Prince Edward Island, aoneill@upei.ca
 
The current Evergreen OPAC searches records using its database back-end's search system, with heavy use of caching layers to compensate for the relatively long wait to perform a new search.
 
This is a personal project to adapt the Evergreen search results page to use the Solr and Lucene search engine stack, integrating the external search function as closely as possible with Evergreen's existing look and feel. This offers an alternative to replacing an entire OPAC just to take advantage of the very desirable features of the Solr stack: Evergreen already has a well-designed, extensible JavaScript interface which we and others have had great results customizing, adding features such as integrated Google Books previews and LibraryThing's social features. Adapting the leading open source search technology into this very powerful stack adds one more feature to Evergreen's compelling list of selling points.
 
It is still possible to use Evergreen's OpenSRF messaging system to get live information about each book's current availability status without having to push all of this information into the Solr index.
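
A sketch of that division of labor (the Solr core, field names, and availability lookup are placeholders; in Evergreen the live status would come from an OpenSRF call):

<pre>
# Sketch: Solr answers the search; the ILS answers "is it available now?".
# The core name, field names, and availability lookup are placeholders.
import json
import urllib.parse
import urllib.request

def solr_search(q: str, rows: int = 10):
    params = urllib.parse.urlencode({"q": q, "rows": rows, "wt": "json"})
    url = f"http://localhost:8983/solr/biblio/select?{params}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["response"]["docs"]

def live_availability(record_id):
    # Placeholder: Evergreen would answer this over OpenSRF messaging
    # instead of denormalizing status into the Solr index.
    return "unknown"

for doc in solr_search("title:dublin"):
    print(doc.get("id"), doc.get("title"), live_availability(doc.get("id")))
</pre>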
 
I will show how I used SolrMarc to import records from Evergreen, taking advantage of the fact that the VuFind and Blacklight projects have collaborated to create a general import utility that is usable by third-party projects. I will discuss some of the hurdles I encountered while using SolrMarc and the resulting changes to SolrMarc's design that this use case helped to motivate.
 
I'll also measure performance when hosting both Solr and Evergreen on the same server compared with putting Solr on a separate server. It will also be informative to see how much of an Evergreen server's system load is devoted to processing user searches.
 
==Matching Dirty Data - Yet another wheel==
 
* Anjanette Young, University of Washington Libraries, younga3 at u washington edu
* Jeff Sherwood, University of Washington Libraries, jeffs3 at u washington edu
 
Regular expressions are a powerful tool for identifying matching data between similar files. When one or both of these files has inconsistent data due to differing character encodings or miskeying, the use of regular expressions to find matches becomes impractically complex.
 
The Levenshtein distance (LD) algorithm is a basic sequence comparison technique that can be used to measure word similarity more flexibly. Employing the LD to calculate difference eliminates the need to identify and code into regex patterns all of the ways in which otherwise matching strings might be inconsistent. Instead, a similarity threshold is tuned to identify close matches while eliminating false positives.
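
A sketch of the technique in pure Python (the threshold value is something to tune against your own data):

<pre>
# Sketch: thresholded Levenshtein similarity for matching messy titles.
# Pure-python dynamic programming; tune THRESHOLD against known pairs.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def similarity(a: str, b: str) -> float:
    """1.0 for identical strings, 0.0 for entirely different ones."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a.lower(), b.lower()) / max(len(a), len(b))

THRESHOLD = 0.85  # admit close matches while rejecting false positives
print(similarity("Zen and the art of motorcycles",
                 "Zen & the art of motorcycles") > THRESHOLD)  # True
</pre>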
 
Recently, the UW Libraries began an effort to store Electronic Theses and Dissertations (ETD) in our institutional repository, which runs on DSpace. We received 6,756 PDFs along with a file of UMI-created MARC records which needed to be matched to our library's custom MARC records (60,175 records). Once matched, merged information from both records would be used to create the dublin_core.xml file needed for batch ingest into DSpace. Unfortunately, records within the MARC data had no common unique identifiers to facilitate matching. Direct matching by title or author was impractical due to slight inconsistencies in data entry. Additionally, one of the files had the characters in its title and author fields "flattened" to ASCII. We successfully employed LD to match records between the two files before merging them.
 
This talk demonstrates one method of matching sets of MARC records that lack common unique identifiers and might contain slight differences in the matching fields. It will cover basic usage of several python tools. No large stack traces, just the comfort of pure python and basic computational algorithms in a step-by-step presentation on dealing with an old library task: matching dirty data. While much literature exists on matching/merging duplicate bibliographic records, most of this literature does not specify how to accomplish the task, just reports on the efficiency of the tools used to accomplish the task, often within a larger system such as an ILS.
 
==Automating Git to create your own open-source Dropbox clone==
 
* Ian Walls, System Integration Librarian, NYU Health Sciences Libraries, Ian.Walls at med.nyu.edu
 
Dropbox is a great tool for synchronizing files across pretty much any machine you’re working on. Unfortunately, it has some drawbacks:
# Monthly fees for more than 2GB
# The server isn’t yours
# The server-side scripting isn’t open source
However, using the [http://git-scm.com/ Git distributed version control system], file event APIs, and your favourite scripting language, it is possible to create a file synchronization system (with full replication and multiple histories) that connects all your computers to your own server.
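
A minimal polling sketch of such a script (the repository path, branch, and cadence are assumptions; a production version would hook file event APIs instead of polling):

<pre>
# Sketch: naive auto-sync loop around a Git working copy.
# Assumes the repository already exists and has a pushable remote "origin".
import subprocess
import time

REPO = "/home/me/synced"  # path is an assumption

def git(*args):
    return subprocess.run(["git", "-C", REPO, *args],
                          capture_output=True, text=True)

while True:
    git("add", "--all")
    # Commit only when something is actually staged:
    if git("diff", "--cached", "--quiet").returncode != 0:
        git("commit", "-m", "autosync")
    git("pull", "--rebase", "origin", "master")  # merge other machines' work
    git("push", "origin", "master")
    time.sleep(30)
</pre>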
 
These scripts would allow library developers to collaborate and work on multiple machines with ease, while benefiting from the robust version control of Git. An active internet connection is not required to have access to the full history of the repository, making it easier to work on the go. This also keeps your data more private and secure by only hosting it on machines you trust (important if you’re dealing with sensitive patron information).
 
== Becoming Truly Innovative: Migrating from Millennium to Koha==
 
* Ian Walls, System Integration Librarian, NYU Health Sciences Libraries, Ian.Walls at med.nyu.edu
 
On Sept. 1st, 2009, the NYU Health Sciences Libraries made the unprecedented move from their Millennium ILS to Koha. The migration was done over the course of 3 months, without assistance from either Innovative Interfaces, Inc. or any Koha vendor. The in-house script, written in Perl and XSLT, can be used with any Millennium installation, regardless of which modules have been purchased, and can be adapted to work for migration to systems other than Koha. Helper scripts were also developed to capture the current circulation state (checkouts, holds and fines), and do minor data cleanup.
 
This presentation will cover the planning and scheduling of the migration, as well as an overview of the code that was written for it. Opportunities for systems integration and development made newly available by having an open source platform will also be discussed.
 
== 7 Ways to Enhance Library Interfaces with OCLC Web Services ==
* Karen A. Coombs, librarywebchic@gmail.com
 
OCLC Web Services such as xISSN, WorldCat Search API, WorldCat Identities, and the WorldCat Registry provide a variety of data that can be used to enhance and improve current library interfaces. This talk will discuss several simple ideas to improve current user interfaces using data from these services.
 
Javascript and PHP code to add journal table of contents information, peer-reviewed journal designations, links to other libraries in the area that hold a book, "also available ..." suggestions, and information about authors will be discussed.
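
For instance, fetching other editions of a book via the xISBN sibling of xISSN takes one HTTP call (a sketch; the URL pattern follows OCLC's xID services, with response handling simplified):

<pre>
# Sketch: OCLC xISBN lookup for other editions of a title.
# Response handling is simplified; the service is subject to OCLC's
# usage terms and request limits.
import json
import urllib.request

def other_editions(isbn: str):
    url = (f"http://xisbn.worldcat.org/webservices/xid/isbn/{isbn}"
           "?method=getEditions&format=json")
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    return [entry["isbn"][0] for entry in data.get("list", [])]

# print(other_editions("0596002815"))
</pre>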
 
== Adventures with Facebook Open Platform ==
* Kenny Ketner, Texas Tech University Libraries, kenny.ketner@ttu.edu
 
Developing on the facebook platform can be both exciting and something that you wouldn’t wish on your worst enemy. This talk will chronicle the Texas Tech Libraries Development Team's experimentation with Facebook Open Platform (fbOpen) as we attempt to create a facebook-like social media application for Texas Tech University Libraries, hopefully expanding to the Texas Digital Library (TDL).
 
More than just a facebook app or page, fbOpen is a complete implementation of the facebook system on a LAMP stack – Linux, Apache, MySQL, PHP – which must be maintained by the institution itself. This project is at an early stage, so emphasis will be placed on the challenges of installation, configuration, and testing, as well as the pros and cons for institutions that are considering taking on a similar project.
 
== Kurrently Kochief ==
 
* Gabriel Farrell, Drexel University Libraries, gsf24@drexel.edu
 
Kochief is a discovery interface and catalogue manager. It rests on Solr and a
Python stack including Django, pymarc, and rdflib. We're using it to highlight
a few collections at Drexel. They live at http://sets.library.drexel.edu.
 
I'll talk about the latest and greatest, including advances in the install and
configuration, details considered in the searcher's experience, and the
sourcing and exposing of Linked Data.
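
A taste of the pymarc layer of that stack (a sketch; the file name is invented):

<pre>
# Sketch: reading a batch of MARC records with pymarc, as an indexing
# pipeline might before pushing documents into Solr.
from pymarc import MARCReader

with open("collection.mrc", "rb") as fh:  # file name is illustrative
    for record in MARCReader(fh):
        title_field = record["245"]
        if title_field:
            print(title_field["a"])
</pre>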
 
== Fedora Commons Repository Workflow with Drupal 6 and SCXML ==
 
* Scott Hammel, Clemson University, scott@clemson.edu
 
Clemson is building an enterprise architecture repository to support the Medicaid Information Technology Architecture framework. Using Drupal 6 and the Fedora Commons Repository, and inspired by Islandora, we've written a module for Drupal that supports artifact governance workflow. Workflow is represented as a state machine, stored as SCXML in datastreams on digital objects.
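
A sketch of how such a definition might look and be interpreted (state and event names are placeholders; in the real system the SCXML lives in a Fedora datastream and carries governance policy):

<pre>
# Sketch: driving a tiny approval workflow from an SCXML document, roughly
# as a Drupal module might interpret SCXML stored in a Fedora datastream.
# State and event names are placeholders.
import xml.etree.ElementTree as ET

SCXML = """
<scxml xmlns="http://www.w3.org/2005/07/scxml" initial="draft">
  <state id="draft">
    <transition event="submit" target="review"/>
  </state>
  <state id="review">
    <transition event="approve" target="published"/>
    <transition event="reject" target="draft"/>
  </state>
  <state id="published"/>
</scxml>
"""

NS = {"sc": "http://www.w3.org/2005/07/scxml"}
root = ET.fromstring(SCXML)
state = root.get("initial")

def fire(event: str) -> None:
    """Advance the current state if the event matches a transition."""
    global state
    current = root.find(f"sc:state[@id='{state}']", NS)
    for t in current.findall("sc:transition", NS):
        if t.get("event") == event:
            state = t.get("target")
            return  # events with no matching transition are ignored

fire("submit")
fire("approve")
print(state)  # published
</pre>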
 
I will talk about the solution, challenges, standards and how workflow, governance, state, and policy are stored and manipulated as content on digital objects.
 
== Forging Connections: Current uses of SRU ==
 
* T. Michael Silver, MLIS Student at the University of Alberta, michael.silver@ualberta.ca
 
Search / Retrieve via URL (SRU) has been touted as the next generation of the Z39.50 protocol. Its use of HTTP communication and XML data formats was designed to allow greater integration with other online resources. In October and November 2009, I interviewed seven SRU administrators from libraries and from not-for-profit and for-profit organizations to gain insights into their experiences with the protocol.
 
The results from this small study show that SRU is being used as more than a replacement for Z39.50. Instead, it is also being used to create connections between information resources and users by leveraging the protocol’s use of web standards. My presentation will focus on reporting the topics which emerged during the interviews, ranging from the history and future of information retrieval to differing views on SRU’s relationship with federated search, OpenSearch and other web protocols.
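
For reference, much of SRU's appeal is that a searchRetrieve operation is just a URL returning XML (a sketch; the endpoint is illustrative, and the parameters follow SRU 1.1):

<pre>
# Sketch: a minimal SRU 1.1 searchRetrieve request.
# The base URL is illustrative; parameters follow the SRU specification.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

base = "http://sru.example.org/catalog"  # illustrative endpoint
params = {
    "operation": "searchRetrieve",
    "version": "1.1",
    "query": 'dc.title = "open source"',  # CQL, SRU's companion query language
    "maximumRecords": "5",
}
with urllib.request.urlopen(base + "?" + urllib.parse.urlencode(params)) as r:
    tree = ET.fromstring(r.read())

NS = {"srw": "http://www.loc.gov/zing/srw/"}
print(tree.findtext("srw:numberOfRecords", namespaces=NS))
</pre>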
 
==Extending EZProxy for Fun and Profit==
 
* Brice Stacey, University of Massachusetts Boston, brice.stacey@umb.edu
 
EZProxy is much more than just an authentication tool for remote access to library resources. As middleware between electronic resources and patrons, EZProxy is the backbone on which many applications may be built. Potential uses include monitoring resource use to enhance collection development decisions, injecting context-sensitive information and links to tutorials in a branded toolbar for the duration of a session, and using EZProxy as a single sign-on server. These three ideas alone could streamline the user experience, allow for more granular library instruction and increase awareness of what is actually important to users.
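
As one example of the collection-development idea (a sketch; EZProxy's log format is configurable, so the NCSA-style layout assumed below may differ from yours):

<pre>
# Sketch: tallying which hosts patrons actually reach through EZProxy.
# Assumes an NCSA/common-style log line; EZProxy log formats vary.
import re
from collections import Counter
from urllib.parse import urlparse

LINE = re.compile(r'"(?:GET|POST) (?P<url>\S+)')

counts = Counter()
with open("ezproxy.log") as log:  # file name is illustrative
    for line in log:
        match = LINE.search(line)
        if match:
            counts[urlparse(match.group("url")).netloc] += 1

for host, n in counts.most_common(10):
    print(f"{n:7d}  {host}")
</pre>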
 
In this session I'd also like to initiate a discussion about the creation of a collaborative site for EZProxy administrators. The proposed site would feature a private workspace to manage EZProxy configurations, drawn from a public repository of database definitions and authentication schemes. Additionally, the site would be an ideal environment for developing additional applications as described above.
 
== Micro Library Apps: Building library functionality into the Google Gadget platform ==
 
* Jason A. Clark, Head of Digital Access and Web Services, Montana State University Libraries, jaclark@montana.edu
 
With implementations of the OpenSocial standard, complete functionality within Google Wave, and a huge user base actively using iGoogle, Google Gadgets and the Gadgets API can be used as an emerging platform for bite-sized pieces of library services and applications.
 
MSU Libraries has applied Google Gadget API technology to allow users to create their own dashboards or waves filled with library content modules. In this session we will demonstrate a wide range of gadgetry including, but not limited to: tabbed gateway searching of catalogs and databases, flash-animated library subject maps, a customized database gateway, a digital collections app gadget, a feed aggregator for library data streams, and a gadget for campus maps and street views.
 
[http://www.lib.montana.edu/tools/gadgets.php http://www.lib.montana.edu/tools/gadgets.php]
 
We'll talk through the anatomy of a Google Gadget, the possibilities for the API and its use in library settings, and the XML, Javascript, HTML, and occasional PHP that make it go.
 
== Can't We All Just Get Along? ==
 
* Ryan Scherle, National Evolutionary Synthesis Center, rscherle@nescent.org
 
One of the greatest challenges of a large project is bringing together people from different traditions and getting them to work together. Most Code4Lib attendees are accustomed to working with a team of librarians, technologists, and subject specialists. Working with teams from multiple institutions and multiple disciplines increases the level of complexity, particularly when some teams have a history of maintaining their own discipline-specific technology solutions.
 
[http://dataone.org DataONE] is a collaborative repository of scientific data being developed by a group of more than 20 organizations. It will combine contents from a diverse set of scientific repositories, covering many disciplines, metadata schemes, and usage policies.
 
I will give an overview of the DataONE project and its technical architecture, focusing on the architectural design process and techniques for overcoming the differences between the participating repositories. I will also outline the steps required if you want to connect a new repository to the DataONE system.
 
== Data for all: facilitating access to reference transaction data using web-based tools ==
 
* David Dahl, Emerging Technologies Librarian, Towson University, ddahl@towson.edu
 
Like many libraries, Towson University’s Albert S. Cook Library uses a homegrown web application to record reference transaction statistics into a Microsoft Access database. (Ours is informally called StatsTracker.) Previously this collected data was only available in a raw format within the database, limiting its usefulness to just 1 or 2 staff with knowledge of querying an Access database. These individuals were frequently asked to compile data to aid in the department’s decision-making. A recent initiative to make this data more publicly accessible (to internal staff) motivated the creation of a suite of web-based tools that aggregate and analyze collected data in order to make up-to-the-minute statistics available for use by the Reference Department. Using a combination of ASP.net, SQL, Microsoft Chart Controls, and the Visual Web Developer (VWD) application for development, the StatsTracker Analysis Toolkit makes reference transaction data accessible and usable by any member of the department.
 
This session will cover the development process, demonstrate how VWD facilitated development, and present possibilities for further use of this combination of tools.
 
[[Category: Code4Lib2010]]
