
2012 talks proposals

48,793 bytes added, 19:46, 27 May 2016


Deadline for talk submission was ''Sunday, November 20''. (The deadline for 2012 Talks proposals is now closed.)

Prepared talks are 20 minutes (including setup and questions), and focus on one or more of the following areas:

''We have exhibited at a couple of library conferences, and have received a lot of interest. blekko is a free service.''

== Beyond code: Versioning data with Git and Mercurial. ==

* Charlie Collett, California Digital Library, charlie.collett@ucop.edu

* Martin Haye, California Digital Library, martin.haye@ucop.edu

Mendeley has built the world's largest open database of research and we've now begun to collect some interesting social metadata around the document metadata. I would like to share with the Code4Lib attendees information about using this resource to do things within your application that have previously been impossible for the library community, or in some cases impossible without expensive database subscriptions. One thing that's now possible is to augment catalog search by surfacing information about content usage, allowing people to not only find things matching a query, but popular things or things read by their colleagues. In addition to augmenting search, you can also use this information to augment discovery. Imagine an online exhibit of artifacts from a newly discovered dig not just linking to papers which discuss the artifact, but linking to really good interesting papers about the place and the people who made the artifacts. So the big idea is, "How will looking at the literature from a broader perspective than simple citation analysis change how research is done and communicated? How can we build tools that make this process easier and faster?" I can show some examples of applications that have been built using the Mendeley and PLoS APIs to begin to address this question, and I can also present results from Mendeley's developer challenge which shows what kinds of applications researchers are looking for, what kind of applications people are building, and illustrates some interesting places where the two don't overlap.

Slides from my talk are here: http://db.tt/PMaqFoVw

==Your UI can make or break the application (to the user, anyway)==

==Search Engine Relevancy Tuning - A Static Rank Framework for Solr/Lucene==

* Mike Schultz (formerly Summon Search Architect), mike.schultz@gmail.com

Solr/Lucene provides a lot of flexibility for adjusting relevancy scoring and improving search results. Roughly speaking, there are two areas of concern: first, a 'dynamic rank' calculation that is a function of the user query and document text fields; and second, a 'static rank' that is independent of the query and is generally a function of non-text document metadata. In this talk I will outline an easily understood, hand-tunable static rank system with a minimal number of parameters.
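To make the dynamic/static split concrete, here is a minimal sketch in plain Python. It is not the framework from the talk: the metadata fields (<code>checkouts</code>, <code>pub_year</code>), the two weights, and the multiplicative way of combining the scores are all assumptions chosen for illustration.

```python
import math

def static_rank(doc, weights, current_year):
    """Query-independent score built from non-text metadata.

    The fields "checkouts" and "pub_year" are hypothetical examples.
    """
    # Log-damp raw counts so a few very popular items do not dominate.
    popularity = math.log1p(doc.get("checkouts", 0))
    # Recency is 1.0 for items from the current year and decays with age.
    age_years = max(0, current_year - doc.get("pub_year", current_year))
    recency = 1.0 / (1.0 + age_years)
    return weights["popularity"] * popularity + weights["recency"] * recency

def combined_score(dynamic_score, doc, weights, current_year):
    """Boost the query-dependent score by the static rank (assumed formula)."""
    return dynamic_score * (1.0 + static_rank(doc, weights, current_year))

# Two hand-tunable parameters; the values are arbitrary.
WEIGHTS = {"popularity": 0.5, "recency": 1.0}
new_doc = {"checkouts": 0, "pub_year": 2012}
old_doc = {"checkouts": 0, "pub_year": 2011}
```

With the same dynamic score, the newer document ranks higher here only because of its metadata, which is the point of a static rank.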

* Mark A. Matienzo, Yale University Library, mark@matienzo.org

An ongoing problem for many archives and special collections units is a lack of technological infrastructure and ongoing support. Funding for many archival programs arrives on a project-by-project basis, often in the form of grants. One of the largest concerns for archivists, therefore, is ensuring the sustainability of any solutions or processes that support core operations, such as archival description and access systems. The presenters will describe their experience developing an iterative and sustainable approach to archival description and access at the library of a small historical society. Starting with mostly OCRed legacy finding aids and no online access to collections, and ending with structured data about the entirety of their holdings available online over three years' time, we will detail the evolution of the work from problem-solving through to the resulting phases of descriptive work and development of a basic online access portal created in WordPress. We will discuss making reasonable and sustainable choices in an environment with little monetary and technical support, and how the organization's staff were able to build a system and processes that could leverage messy legacy metadata initially and grow to use structured, standardized data as it was created. We will also discuss the specific technical solutions we developed (the WordPress instance and supporting plugins) and our experience with how bugs and barriers outside of our control changed our insights.

== Making the Easy Things Easy: A Generic ILS API ==

== DMPTool: Guidance and resources to build a data management plan ==

* Marisa Strong, California Digital Library, marisa.strong@ucop.edu

A number of U.S. funding agencies such as the National Science Foundation require researchers to supply detailed, cost-effective plans for managing research data, called Data Management Plans. To help researchers with this requirement, several organizations such as the California Digital Library, University of Illinois, University of Virginia, Smithsonian Institution, the DataONE consortium and the (UK) Digital Curation Centre came together to develop the DMPTool. The goal of the DMPTool is to provide researchers with guidance, links to resources and help with writing data management plans.

A number of U.S. funding agencies such as the National Science Foundation require researchers to supply detailed plans for managing research data, called Data Management Plans. To help researchers with this requirement, the California Digital Library (CDL), along with several organizations, collaborated to develop the DMPTool. The goal is to provide researchers with guidance, links to resources and help with writing data management plans.

This open-source, Ruby on Rails software tool is hosted on a SLES VM by CDL. The tool is integrated with Shibboleth, federated single sign-on software, which allows users to log in via their home institutions. We had a geographically distributed development team sharing their code on Bitbucket.

This talk will demo features of the application, the Shibboleth login architecture, as well as highlight the agile development practices and methods used to successfully design and build the application on an aggressive schedule.

== Lies, Damned Lies, and Lines of Code Per Day ==

I will describe the technical underpinnings of Carrier, challenges that we’ve faced since its implementation, enhancements planned for the next release of the software, and discuss our plans for releasing this software for others to use '''for free'''.

== We Built It. They Came. Now What? ==

What can we do as a community to move beyond our build-first-ask-questions-later mentality and embed sustainability into our new and existing ideas and products without moving toward commercialization? I fully expect we’ll end up with more questions than answers, but let’s spend some time talking about our predicament and yours and think about how we can come out the other side.

== Contextually Rich Collections Without the Risk: Digital Forensics and Automated Data Triage for Digital Collections ==

* [[User:kamwoods|Kam Woods]], University of North Carolina at Chapel Hill, kamwoods@email.unc.edu

* Cal Lee, University of North Carolina at Chapel Hill, callee -- at -- ils -- unc -- edu

Digital libraries and archives are increasingly faced with a significant backlog of unprocessed data along with an accelerating stream of incoming material. These data often arrive from donor organizations, institutions, and individuals on hard drives, optical and magnetic disks, flash memory devices, and even complete hardware (traditional desktop computers and mobile systems).

Information on these devices may be sensitive, obscured by operating system arcana, or require specialized tools and procedures to parse. Furthermore, the sheer volume of materials being handled means that even simple tasks such as providing useful content reports can be impractical (or impossible) in current workflows.

Many of the tasks currently associated with data triage and analysis can be simplified and performed with improved coverage and accuracy through the use of open source digital forensics tools. In this talk we will discuss recent developments in providing digital librarians and archivists with simple, open source tools to accomplish these tasks. We will discuss tools and methods being tested, developed and packaged as part of the [http://bitcurator.net BitCurator] project. These tools can be used to reduce or eliminate laborious, error-prone tasks in existing workflows and put valuable time back into the hands of digital librarians and archivists -- time better used to identify and tackle complex tasks that ''cannot'' be solved by software.

== Finding Movies with FRBR and Facets ==

* Kelley McGrath, University of Oregon, kelleym@uoregon.edu

How might the Functional Requirements for Bibliographic Records (FRBR) model and faceted navigation improve access to film and video in libraries? I will describe the design and implementation of a FRBR-inspired prototype discovery interface ([http://blazing-sunset-24.heroku.com/ http://blazing-sunset-24.heroku.com/]) using Solr and Blacklight. This approach demonstrates how FRBR can enable a work-centric view that is focused on the original movie or program while supporting users in selecting an appropriate version.

The prototype features two sets of facets, which independently address two important information needs: (1) "What kind of movie or program do you want to watch?" (e.g., a 1970s TV sitcom, something directed by Kurosawa, or an early German horror film); (2) "How do you want to watch it? Where do you want to get it from?" (e.g., on Blu-ray, with Spanish subtitles, available at the local public library). This structure enables patrons to narrow, broaden and pivot across facet values instead of limiting them to the tree-structured hierarchy common with existing FRBR applications.
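The idea of two independent facet sets can be sketched in a few lines of Python. This is only an illustration, not the prototype's Solr/Blacklight implementation: the record layout, the field names and the version data are invented for the example.

```python
# Work-level fields describe the movie; version-level fields describe
# how a particular copy can be watched. All sample data is illustrative.
WORKS = [
    {"title": "Rashomon", "director": "Kurosawa", "decade": "1950s",
     "versions": [
         {"format": "Blu-ray", "subtitles": ["English", "Spanish"]},
         {"format": "DVD", "subtitles": ["English"]},
     ]},
    {"title": "Nosferatu", "director": "Murnau", "decade": "1920s",
     "versions": [
         {"format": "DVD", "subtitles": ["English"]},
     ]},
]

def _matches(record, facets):
    """True if the record satisfies every selected facet value."""
    for field, wanted in facets.items():
        value = record.get(field)
        if isinstance(value, list):
            # Multi-valued field: the wanted value must be one of them.
            if wanted not in value:
                return False
        elif value != wanted:
            return False
    return True

def find(works, work_facets=None, version_facets=None):
    """Filter on work facets and version facets independently."""
    results = []
    for work in works:
        if not _matches(work, work_facets or {}):
            continue
        versions = [v for v in work["versions"]
                    if _matches(v, version_facets or {})]
        if versions:  # keep a work only if some version is still usable
            results.append((work["title"], versions))
    return results
```

Because the two facet sets are applied separately, a patron can change "what to watch" without losing the "how to watch it" selections, and the other way round.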

This type of interface requires controlled data values mapped to FRBR group 1 entities, which in many cases are not available in existing MARC bibliographic records. I will discuss ongoing work using the XC Metadata Services Toolkit ([http://www.extensiblecatalog.org/ http://www.extensiblecatalog.org/]) to extract and normalize data from existing MARC records for videos in order to populate a FRBRized, faceted discovery interface.

==Escaping the Black Box — Building a Platform to Foster Collaborative Innovation==

* Karen Coombs, OCLC, coombsk@oclc.org

* Kathryn Harnish, OCLC, harnishk@oclc.org

Exposed Web services offer an unprecedented opportunity for collaborative innovation — that’s one of the hallmarks of Web-based services like Amazon, Google, and Facebook. These environments are popular not only for their native feature sets, but also for the array of community-developed apps that can run in them. The creativity of the development communities that work in these systems brings new value to all types of users.

What if the library community could realize this same level of collaborative innovation around its systems? What kinds of support would be necessary to transform library systems from “black boxes” to more open, accessible environments in which value is created and multiplied by the user community?

In this session, we’ll discuss the challenges and opportunities OCLC faced in creating just that kind of environment. The recently-released OCLC “cooperative platform” provides improved access to a wide variety of OCLC’s data and services, allowing library developers and other interested partners to collaborate, innovate, and share new solutions with fellow libraries. We’ll describe the open standards and technologies we’ve put in play as we:

* exposed robust Web services that provide access to both data and business logic;

* created an architecture for integrating community-built applications in OCLC (and other) products; and

* developed an infrastructure to support community development, collaboration, and app sharing

Learn how OCLC is helping to open the “black box” -- and give libraries the freedom to become true partners in the evolution of their library systems.

== Code inheritance; or, The Ghosts of Perls Past ==

* Jon Gorman, University of Illinois, jtgorman@illinois.ed

Any organization has a history not found in its archives or museums. Mysteries exist whose origins are lost to the collective institutional knowledge. Despite what has been forgotten by humans, our servers and computers still keep running. Instructions crafted long ago execute like digital ghosts following orders of masters who have long since left.

The University of Illinois has a fair amount of Perl code created by several different developers. This code includes software that handles our data feeds coming both in and out of campus, reports against our Voyager system, some web applications, and more.

I'll touch a little on the historical legacy and why Perl is used. From there I'll share some tips, best practices, and some of the mistakes I've made in trying to maintain this code. Most of the advice will transfer to any language, but the code and libraries discussed will be Perl. The presentation will also touch on some internal debate on whether or not to port parts of our Perl codebase.

== Recorded Radio/TV broadcasts streamed for library users ==

* Kåre Fiedler Christiansen, The State and University Library Denmark, kfc@statsbiblioteket.dk

* Mads Villadsen, The State and University Library Denmark, mv@statsbiblioteket.dk

"Provide online access to the Radio/TV collection," my boss said. About 500,000 hours of Danish broadcast radio and TV. Easy, right? Well, half a year later we'd done it, but it turned out to involve practically every IT employee in the library and quite a few non-technical people as well.

Combining our Fedora-based DOMS repository system with our Lucene-based Summa search system with our WAYF-based single-signon system with an upgrade of our SAN system for enough speed to deliver the content with an ffmpeg-based transcoding workflow system with a Wowza-based streaming server, and sprinkling it all with a nice user-friendly web frontend turned out to be quite a challenge, but also one of the most engaging experiences in a long time.

Of course we were immediately shut down, since the legal details weren't quite as clear as we thought they were, but take an exclusive preview at http://developer.statsbiblioteket.dk/kultur/ - username/password: code4lib.

== NoSQL Bibliographic Records: Implementing a Native FRBR Datastore with Redis ==

* Jeremy Nelson, Colorado College, jeremy.nelson@coloradocollege.edu

In October, the Library of Congress issued a news release, "A Bibliographic Framework for the Digital Age" outlining a list of requirements for a New Bibliographic Framework Environment. Responding to this challenge, this talk will demonstrate a Redis (http://redis.io) FRBR datastore proof-of-concept that, with a lightweight python-based interface, can meet these requirements.

Because FRBR is an Entity-Relationship model, it is easily implemented as key-value within the primitive data structures provided by Redis. Redis' flexibility makes it easy to associate arbitrary metadata and vocabularies, like MARC, METS, VRA or MODS, with FRBR entities and inter-operate with legacy and emerging standards and practices like RDA Vocabularies and Linked Data.
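As a rough illustration of how entities and relationships can map onto key-value structures, the sketch below uses a small in-memory stand-in rather than a live Redis server, so it runs anywhere. The method names mirror the Redis hash and set commands (HSET, HGETALL, SADD, SMEMBERS); the <code>frbr:kind:id</code> key scheme and the sample data are assumptions, not the schema of the proof-of-concept.

```python
from collections import defaultdict

class TinyStore:
    """In-memory stand-in exposing a few Redis-like hash and set operations."""

    def __init__(self):
        self.hashes = defaultdict(dict)
        self.sets = defaultdict(set)

    def hset(self, key, field, value):
        self.hashes[key][field] = value

    def hgetall(self, key):
        return dict(self.hashes.get(key, {}))

    def sadd(self, key, member):
        self.sets[key].add(member)

    def smembers(self, key):
        return set(self.sets.get(key, set()))

def add_entity(store, kind, ident, **attrs):
    """Store one FRBR entity as a hash under a hypothetical key scheme."""
    key = f"frbr:{kind}:{ident}"
    for field, value in attrs.items():
        store.hset(key, field, value)
    return key

def relate(store, parent_key, relation, child_key):
    """Record a relationship as a set of child keys hanging off the parent."""
    store.sadd(f"{parent_key}:{relation}", child_key)

store = TinyStore()
work = add_entity(store, "work", 1, title="Moby Dick")
expression = add_entity(store, "expression", 1, language="eng")
relate(store, work, "expressions", expression)
```

Entities become hashes and relationships become sets of keys, so following a relationship is one set lookup followed by one hash lookup per member.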

== Upgrading from Catalog to Discovery Environment: A Consortial Approach ==

* Spencer Lamm, Swarthmore College, slamm1@swarthmore.edu

* Chelsea Lobdell, Swarthmore College, clobdel1@swarthmore.edu

Almost two years ago the Tri-College Consortium of Haverford, Swarthmore, and Bryn Mawr Colleges embarked upon a journey to provide enhanced end-user experience and discoverability with our library applications. Our solution was to implement an integration of ExLibris's Primo Central into Villanova's VuFind for a dual-channel searching experience. We present a case study of the collaborative and technical aspects of our process.

At a high level we will describe our approach to project management and decision making. We used a multi-tiered structure of working groups with an iterative design-feedback implementation cycle. We will relay lessons learned from our experience: successes, failures, and unexpected hurdles.

At a lower, technical level we will discuss the VuFind search module architecture; the workflow of creating a new search channel; a Primo API parser; and the data structures of the Primo API response and the Primo SearchObject. Time permitting, we will also outline how we modified VuFind's Innovative driver to work with our ILS.

== Improving geospatial data access for researchers and students ==

* Dileshni Jayasinghe, Scholars Portal, University of Toronto, d.jayasinghe@utoronto.ca

* Sepehr Mavedati, Scholars Portal, University of Toronto, sepehr.mavedati@utoronto.ca

Scholars GeoPortal (http://geo.scholarsportal.info) was created as a platform for online delivery of geospatial data resources to the Ontario Council of University Libraries community. Prior to the start of this project, each institution was storing data locally, and had its own practice for distributing datasets to users. This ranged from home grown online data delivery systems to burning data on to DVDs for each individual request. Most institutions had limited resources and expertise to create and maintain a sophisticated delivery system on their own. Led by OCUL Map, GIS librarians, staff at Scholars Portal in partnership with the Government of Ontario, the GeoPortal project began in 2009.

Our talk will focus on the design and architecture of Scholars Portal's solution to support maps and geospatial data, and how we distribute these data collections to our users.

The system consists of 4 main components: metadata management system, map server, spatial database, and the web application.

*Metadata Management: customized metadata editor with data hosted in MarkLogic, providing text and spatial queries

*Map Server: ArcGIS Server

*Spatial database: MS SQL Server with spatial extension

*Web application: Javascript web application using Dojo and Esri’s Javascript API

For other code4libbers who are interested in a similar system, we will also discuss the open source alternatives for each component (GeoNetwork, MapServer, etc.), and challenges and limitations we faced trying to use some of these tools. We'd also like to pick your brains on how we can make this application better. What can we do differently?

== LibX 2.0 ==

* Godmar Back, Virginia Tech, godmar@gmail.com

We would like to provide the Code4Lib community with an update on what we've accomplished with LibX (which we last presented in 2009): where we've gone, what our users are thinking, and how both its technology and its adapter community can be included in the code4lib world. We've grown to our 200,00 users, and have a sleek, newly designed user interface and support for Google Chrome. We're now directly consuming many web services. Our Libapp Builders allows anyone to place results, cues, tutorials and other library-related information into pages.

== Introducing the DuraSpace Incubator ==

* Jonathan Markow, DuraSpace, jjmarkow@duraspace.org

DuraSpace is planning to launch a new incubation program for the benefit of open source projects that wish to become part of our organization, in the interest of helping them to become sustainable, community-driven projects and supporting them afterwards with umbrella services that help them to thrive. From time to time DuraSpace becomes aware of open source software projects in the preservation, archiving, or repository space that are in search of a community “home”. The motivation might be that the project is simply trying to attract more developers, that it would like to develop a more robust community of users and service providers, that its current organizational sponsorship is in question, or that it would like to take advantage of an existing and compatible organization's best practices and administrative infrastructure rather than create a new one of its own. DuraSpace is now prepared to leverage its resources, experience, and reputation in the community to help these projects become, or continue to be, successful. Projects emerging from incubation will become officially recognized as DuraSpace projects. This briefing presents highlights of the DuraSpace Incubator and invites questions and feedback from participants.

== In-browser data storage and me ==

* Jason Casden, North Carolina State University Libraries, jason_casden@ncsu.edu

When it comes to storing data in web browsers on a semi-persistent basis, there are several partially-adopted, semi-deprecated, product-specific, or even universally accepted options. These include models such as key-value stores, relational databases, and object stores. I will present some of these options and discuss possible applications of these technologies in library services. In addition to quoting heavily from Mark Pilgrim's excellent chapter on this topic, I will weave in my own experience utilizing in-browser data storage in an iPad-based data collection tool to successfully improve performance and data stability while reducing network dependence. See also: HTML5.

== Coding for the past, archiving for the future … and the Salman Rushdie Papers ==

* Peter Hornsby, Emory University Libraries, phornsb@emory.edu

Cultural heritage production is moving to the digital medium, and libraries' use of repository solutions such as Fedora Commons and DSpace is a solid response to this change. But how do we go from, for instance, a selection of 1990s computing technology to a collection of digital objects ready for ingest into your institution's local repository? Once you have ingested your digital objects, how are you going to provide access to these resources? The arrival of the Salman Rushdie Papers, which contain 10 years of Sir Salman Rushdie's digital life, gave Emory University Libraries the opportunity to explore these questions. I would like to talk about the approach the Emory University Libraries adopted, what we learned and the coding challenges that remain.

== Indexing big data with Tika, Solr & map-reduce ==

* Scott Fisher, California Digital Library, scott.fisher AT ucop BORK edu

* Erik Hetzner, California Digital Library, erik.hetzner AT ucop BORK edu

The Web Archiving Service at the California Digital Library has crawled a large amount of data, in every format found on the web: 30 TB, comprising about 600 million fetched URLs. In this talk we will discuss how we parsed this data using Tika and map-reduce, and how we indexed this data with Solr, tweaked the relevance ranking, and were able to provide our users with a better search experience.

== ALL TEH METADATAS! or How we use RDF to keep all of the digital object metadata formats thrown at us. ==

* Declan Fleming, University of California, San Diego, dfleming AT ucsd DING edu

What's the right metadata standard to use for a digital repository? There isn't just one standard that fits documents, videos, newspapers, audio files, local data, etc. And there is no standard to rule them all. So what do you do? At UC San Diego Libraries, we went down a conceptual level and attempted to hold every piece of metadata and give each holding place some context, hopefully in a common namespace. RDF has proven to be the ideal solution, and allows us to work with MODS, PREMIS, MIX, and just about anything else we've tried. It also opens up the potential for data re-use and authority control as other metadata owners start thinking about and expressing their data in the same way. I'll talk about our workflow which takes metadata from a stew of various sources (CSV dumps, spreadsheet data of varying richness, MARC data, and MODS data), normalizes them into METS by our Metadata Specialists who create an assembly plan, and then ingests them into our digital asset management system. The result is a [http://dl.dropbox.com/u/6923768/Work/DAMS%20object%20rdf%20graph.png beautiful graph] of RDF triples with metadata poised to be expressed as [https://libraries.ucsd.edu/digital/ HTML], RSS, METS, XML, and opens linked data possibilities that we are just starting to explore.
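A minimal sketch of the "normalize everything into triples" idea, using plain Python tuples instead of an RDF library. The column-to-predicate mapping, the <code>example.org</code> namespace and the sample row are illustrative assumptions and do not reflect the actual UC San Diego data model.

```python
DC = "http://purl.org/dc/terms/"      # Dublin Core terms namespace
EX = "http://example.org/dams/"       # hypothetical local namespace

def row_to_triples(subject, row, mapping):
    """Turn one flat source record into (subject, predicate, object) triples."""
    triples = []
    for column, value in row.items():
        predicate = mapping.get(column)
        if predicate and value:  # skip unmapped columns and empty cells
            triples.append((subject, predicate, value))
    return triples

def to_ntriples(triples):
    """Serialize triples with plain string literals in N-Triples style."""
    lines = []
    for s, p, o in triples:
        literal = o.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{s}> <{p}> "{literal}" .')
    return "\n".join(lines)

# One row from a hypothetical spreadsheet export.
ROW = {"Title": "Campus map", "Date": "1965", "Notes": ""}
MAPPING = {"Title": DC + "title", "Date": DC + "date"}
TRIPLES = row_to_triples(EX + "object/1", ROW, MAPPING)
```

Every source, whether a CSV dump or a MODS record, only needs its own mapping table; once the data is in triples, the downstream serializations no longer care where a statement came from.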

== HathiTrust Large Scale Search: Scalability meets Usability ==

* Tom Burton-West, DLPS, University of Michigan Library, tburtonw AT umich edu

[http://www.hathitrust.org/ HathiTrust Large-Scale search] provides full-text search services over nearly 10 million full-text books using Solr for the back-end. Our index is around 5-6 TB in size and each shard contains over 3 billion unique terms due to content in over 400 languages and dirty OCR.

Searching the full-text of 10 million books often results in very large result sets. By conference time a number of [http://www.hathitrust.org/full-text-search-features-and-analysis features] designed to help users narrow down large result sets and to do exploratory searching will either be in production or in preparation for release. There are often trade-offs between implementing desirable user features and keeping response time reasonable in addition to the traditional search trade-offs of precision versus recall.

We will discuss various [http://www.hathitrust.org/blogs/large-scale-search scalability] and usability issues including:

* Trade-offs between desirable user features and keeping response time reasonable and scalable

* Our solution to providing the ability to search within the 10 million books and also search within each book

* Migrating the [http://babel.hathitrust.org/cgi/mb personal collection builder application] from a separate Solr instance to an app which uses the same back-end as full-text search.

* Design of a scalable multilingual spelling suggester

* Providing advanced search features combining MARC metadata with OCR

** The dismax mm and tie parameters

** Weighting issues and tuning relevance ranking

* Displaying only the most "relevant" facets

* Tuning relevance ranking

* Dirty OCR issues

* CJK tokenizing and other multilingual issues.
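For readers unfamiliar with the dismax <code>tie</code> and <code>mm</code> parameters listed above, the following Python sketch shows the arithmetic behind them. It is a simplification: it covers only the positive-percentage form of <code>mm</code>, and the per-field scores are made-up numbers rather than real Lucene scores.

```python
def dismax_score(field_scores, tie=0.0):
    """Score of one query term across several fields.

    The best-matching field counts in full; the other fields contribute
    only in proportion to the tie parameter.
    """
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)

def satisfies_mm(matched_clauses, total_clauses, mm_percent):
    """Check the minimum-should-match rule for a positive percentage.

    The required number of optional clauses is the percentage of the
    total, rounded down.
    """
    required = (total_clauses * mm_percent) // 100
    return matched_clauses >= required
```

With <code>tie=0.0</code> only the best field matters, while <code>tie=1.0</code> simply sums all fields; values in between let a match in the MARC metadata break ties between documents whose OCR text scores similarly.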


== The Islandora Open Source Framework for Digital Asset Management ==

* Keith Folsom, Orbis Cascade Alliance, kfolsom@uoregon.edu

Managing digital content is a challenging task, becoming even more so as the volumes and types of content increase at what seems an exponential rate. Though there are good commercial management systems available, having competing and potentially more configurable open source options is ideal. One such option is Islandora, an open source framework that wraps a Drupal front-end around the Fedora digital object management and storage system.

My talk will serve as an introduction to the Islandora framework, including a discussion of Fedora’s digital object model and content model architecture; how Islandora exposes the power of Fedora for storage, discovery, and retrieval of data; and the wide variety of underlying open source software and technology that enables the system. I will also give a quick tour of a stock Islandora installation and provide tips on navigating the documentation for set-up and use of this powerful framework.

== What do the NISO IOTA OpenURL quality reports tell us about the future of OpenURL linking? ==

* Adam Chandler, Cornell University, alc28@cornell.edu

NISO IOTA (http://openurlquality.niso.org/) is an initiative that makes use of log files from various institutions and vendors to analyze element frequency and patterns contained within OpenURL requests. The reports created from this analysis inform vendors about where to make improvements to their OpenURLs. In this talk, the chair of the IOTA working group will share what the group has learned about the differences in quality across OpenURL sources.
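The kind of element-frequency analysis described here can be approximated with the Python standard library. This is only a sketch of the general technique, not IOTA's actual method; the resolver URLs below are invented, and the keys are ordinary OpenURL key/value elements such as <code>rft.issn</code>.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl

def element_frequency(openurls):
    """Count in how many OpenURL requests each element appears with a value."""
    counts = Counter()
    for url in openurls:
        query = urlsplit(url).query
        # parse_qsl drops keys with blank values by default, so an element
        # sent as "rft.atitle=" is treated as missing.
        keys = {key for key, _ in parse_qsl(query)}
        counts.update(keys)
    return counts

def completeness(counts, total, elements):
    """Share of requests that carried each element of interest."""
    return {element: counts.get(element, 0) / total for element in elements}

# Hypothetical log entries from a link resolver.
LOG = [
    "http://resolver.example.org/openurl?rft.issn=1234-5678&rft.volume=12&rft.atitle=Foo",
    "http://resolver.example.org/openurl?rft.issn=8765-4321&rft.atitle=",
]
COUNTS = element_frequency(LOG)
```

Comparing the completeness figures across sources is one simple way to see which vendors tend to omit the elements a resolver needs.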

== "CALIL.JP" Open Libraries by web-scraping. - Introducing Library API from Japan ==

* Ryuuji Yoshimoto, Nota Inc. Engineer, ryuuji@notaland.com

I am an engineer at Nota Inc., a start-up company for web services. "CALIL" (http://calil.jp/) is a web service for library users in Japan. (Not only for librarians but also for general patrons.)

CALIL lets users search for books across multiple nearby libraries and get real-time holding status. Our service supports over 5,800 libraries.

CALIL supports public, university, and many other special libraries in Japan. The service can search 88% of the collections of all public libraries in Japan.

Public libraries in Japan do not have a unified catalogue like OCLC.

Web OPACs in Japan are generally very slow and their usability is low.

We developed a comprehensive scraping service covering over 2,000 web OPACs, which also recognizes real-time holding status on them.

This service can be used as a substitute for the OPACs provided by libraries. It provides a more useful, faster and more open service.

Our scraping platform also provides an API for free.

Any developer can access real-time holding status at almost all the libraries in Japan through a single API.

Since the launch in 2010, many iPhone and Android apps have been developed by third-party developers.

It also allows many web services (bookshelf, review, etc.) to connect to libraries.

CALIL is written in 100% pure Python and runs on Google App Engine.

I will introduce "CALIL", the "CALIL Library API", and its methodology. Open Libraries in Japan to World-Coders!!

== Discovering Digital Library User Behavior with Google Analytics==

* Kirk Hess, Digital Humanities Specialist, University of Illinois Urbana-Champaign, kirkhess@illinois.edu

Digital library administrators are frequently asked questions like "How many times was that document downloaded?" or "What’s the most popular book in our collection?" Conventional web logging software, such as AWStats, can only answer those questions some of the time, and there’s always the question of whether or not the data is polluted by non-users, such as spiders and crawlers. Google Analytics (http://google.com/analytics/), a JavaScript-based solution that excludes most crawlers and bots, shows how users found your site and how they explored it.

The presentation will review tracking search queries, adding events such as clicking external links or downloading files, and custom variables, to track user behavior that is normally difficult to track. We'll also discuss using jQuery scripts to add tracking code to the page without having to modify the underlying web application. Once you've collected data, you may use the Google Analytics API to extract data and integrate it with data from your digital repository to show granular data about individual items in your Digital Library. Finally, we'll discuss how this information allows you to improve the user experience, and summarize some of the research we are doing with our digital repository and the data gathered from Google Analytics.
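
As a rough illustration of the integration step, the sketch below sums download events per repository item and attaches item titles. It assumes the event rows have already been exported as (URL path, count) pairs and that item files live under a hypothetical <code>/items/&lt;id&gt;/</code> path; the function name, the URL layout, and the sample data are all assumptions, not part of any real repository or of the Google Analytics API.

```python
import re
from collections import Counter

# Hypothetical URL layout: every file of an item lives under /items/<id>/
ITEM_PATH = re.compile(r"^/items/(?P<item_id>[^/]+)/")


def downloads_per_item(event_rows, titles):
    """Sum download events per repository item.

    event_rows: iterable of (url_path, count) pairs from an analytics export.
    titles: dict mapping item id -> title, taken from the repository.
    Returns (item_id, title, total) tuples, most downloaded first.
    """
    totals = Counter()
    for path, count in event_rows:
        match = ITEM_PATH.match(path)
        if match:  # ignore events that do not belong to an item
            totals[match.group("item_id")] += int(count)
    return [
        (item_id, titles.get(item_id, "(unknown item)"), total)
        for item_id, total in totals.most_common()
    ]


if __name__ == "__main__":
    rows = [("/items/42/report.pdf", "3"), ("/items/42/data.csv", 2), ("/about", 10)]
    for row in downloads_per_item(rows, {"42": "Annual Report"}):
        print(row)
```

Keeping the URL pattern in one place makes it easy to adapt the sketch to whatever path scheme a given repository actually uses.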

== Introducing Kuali OLE 0.3 ==

* Rich Slabach, Quality Assurance Manager, Kuali OLE, rlslabac at indiana dot edu

* Nianli Ma, Technical Architect, Kuali OLE, Indiana University, nianma at indiana dot edu

This research update will feature technical staff from the Kuali Open Library Environment (OLE) project, which is in its second year of building a community-source library management environment. Operating since July 2010, and supported by The Andrew W. Mellon Foundation, Kuali OLE is one of the largest academic library software collaborations in the United States. In this presentation we will discuss the Kuali OLE Year 2 Roadmap as well as key components of the system architecture. We will also demonstrate our Kuali OLE 0.3 release from November 2011 with our cloud-based test drive implementation and our well-documented driver's manual. This will lead to a better understanding of how this code base could support library management at your home institution.

We will also discuss opportunities for engagement with Kuali OLE and for adoption and use of the software, and share more about our plans for long-term sustainability. For more on our current software, see https://wiki.kuali.org/display/OLE/OLE+and+Docstore+Server+Installation

== UDFR: Building a Registry using Open-Source Semantic Software ==

* Stephen Abrams, Associate Director, UC3, California Digital Library, stephen.abrams AT ucop DING edu

* Lisa Dawn Colvin, UDFR Project Manager, California Digital Library, lisa.colvin AT ucop DING edu

Fundamental to effective long-term preservation analysis, planning, and intervention is the deep understanding of the diverse digital formats used to represent content. The Unified Digital Format Registry project (UDFR, https://bitbucket.org/udfr/main/wiki/Home) will provide an open source platform for an online, semantically-enabled registry of significant format representation information.

We will give an introduction to the UDFR tool and its use within a preservation process.

We will also discuss our experiences of integrating disparate data sources and models into RDF: describing our iterative data modeling process and decisions around integrating vocabularies, data sources and provenance representation.

Finally, we will share how we extended an existing open-source semantic wiki tool, OntoWiki, to create the registry.

== Sirsi Symphony: Developing a "web service" to provide real time bibliographic information to Blacklight. ==

* John Pillans, Enterprise Software, Library Systems, Configuration Manager Kuali OLE, Indiana University, jpillan@indiana.edu

Indiana University Libraries is currently in the process of implementing Blacklight as its discovery layer on top of Sirsi Symphony. One aspect of Blacklight that must be developed locally is providing circulation status and holdings information to the user. We have developed a "web service" which provides the bibliographic data, formatted MARC holdings data (if present), and item data with current circulation information to the Blacklight system in XML.

== Open Sourcing the Dream: Making the Read/Write Library ==

* Margaret Heller, Read/Write Library Chicago and Dominican University, mheller@dom.edu

You met the Chicago Underground Library last year; now meet The Read/Write Library Chicago. It's a new name, a new space, and new opportunities to develop our catalog. We are working on creating the open source version of our ideas with a distributed team of interested volunteers, plus experimenting with innovative partnerships with the Chicago technology community. This talk will share what the team and open source project look like, what we are doing with our data, and how we finally learned to stop worrying and love Git.

== Interactive maps: an easy-to-maintain and scalable approach ==

* Mariela Gunn, Oakland University, gunn@oakland.edu

Developing interactive maps of a library building presents a unique challenge in an institution with limited web services personnel. Our technical expectations are high: we want the maps to have engaging interactivity, to be modular so we can link to different services represented in them, and to be scalable so that we can integrate data-driven elements. Our content needs are ever changing: we want to have distributed authorship of content through a user-friendly interface that can be used by all librarians without a steep learning curve.

This talk will focus on the design of interactive maps by a group of our undergraduate student interns who selected a web application -- Maps Alive -- for the task with ease of use and scalability in mind and set up a structure that can grow and change. The pros and cons of the application will be discussed, as well as tips on how to evaluate potential tools and make the best use of them through a modular and flexible approach to interactive maps. Involving students as designers and decision-makers in technology-related projects will be highlighted too.

== Getting the Content out of CONTENTdm: Building a Modular UI Template for Digital Collections ==

* Devin Becker, University of Idaho, dbecker@uidaho.edu

With the advent of iterations 6 and 6.1, CONTENTdm redesigned the basic user interfaces for individual collections, improving on what was already a robust and reliable system for archiving and displaying digital items. The majority of the items in these collections, however, still rarely see the light of a user's screen. Moreover, the typical modes for browsing these collections within the system are geared primarily to those who are already familiar with such systems or who have a specific need to see certain items.

To invite more casual browsing and easier discovery of our collections, the University of Idaho Library's Digital Initiatives department designed a scalable and modular interface for all of our collections with an increased emphasis on the time, location, and larger display of our images and other digital items. To do so, we used free and easy-to-use JavaScript libraries and online applications (including jQuery, Google Fusion Tables, Simile Timeline, ImageFlow, and Tagcrowd.com), together with several simple XSL stylesheets that utilize the metadata and persistent linking capabilities of the CONTENTdm database, to design a basic template with several browsing options (timeline, map, tag cloud, etc.) that can be used for any collection. This talk will detail the coding, methods, and metadata implemented for the redesign.

== saveMLAK: How Librarians, Curators, Archivists and Library Engineers Work Together with Semantic MediaWiki after the Great Earthquake of Japan ==

* Yuka Egusa, Senior Researcher of National Institute of Educational Policy Research, yuka_at_nier.go.jp

* Makoto Okamoto, Chief Editor of Academic Resource Guide (ARG), arg.editor_at_gmail.com

On March 11, 2011, the biggest earthquake and tsunami in history struck a large area of the northeastern region of Japan. A lot of people have worked together to save people in the area. For the library community, a wiki named "savelibrary" was launched the day after the earthquake to share information on damage and rescue efforts. Later, museum curators, archivists, and people from community learning centers started similar projects. In April we joined together in a project, "saveMLAK", and launched a wiki site using Semantic MediaWiki at http://savemlak.jp/.

As of November 2011, 269 contributors have posted information on over 13,000 cultural organizations since the launch. The gathered information is organized by wiki categories for each type of facility, such as library, museum, school, etc. We have held eight edit-a-thons to encourage people to contribute to the wiki.

We will report on our activity, on how libraries and museums were damaged and have been recovering through great effort, and on how a new style of collaboration among the MLAK community, wiki contributors, and other volunteer communities can work in a crisis.

== Kill the search button II - the handheld devices are coming ==

* Jørn Thøgersen, Statsbiblioteket/State and University Library, Aarhus, Denmark. jt@statsbiblioteket.dk

* Michael Poltorak Nielsen, Statsbiblioteket/State and University Library, Aarhus, Denmark. mn@statsbiblioteket.dk, (aka the Danes - some of them).

Web based library search engines are traditionally operated using keys, input fields, buttons, and links. Being equipped with touch screens, accelerometers, GPS's, and cameras, smartphones and tablets offer a whole new range of input options.

In this talk we'll demonstrate some of our ideas of how to utilise these new input options when interacting with a search engine. The basic idea is to have no traditional GUI input elements, but only use touch interactions (pinch, zoom, swipe, long-press, etc.) and gestures (shake, tilt, turn, etc.). Using these interactions, we’ll demonstrate how to:

* do searches

* toggle search result views

* switch pages

* request materials, add to favourites

* interact with your stuff, renew items

We'll also show you some (conceptual) ideas about using the device camera for locating and checking out materials.

On a general level, what we are trying to achieve is a move away from a web based paradigm and establish new ways of interaction better suited to the new devices and on their own terms. The demonstration will feature working mobile prototypes including both native apps (iPhone) and web apps. In both cases they will run on live data from our OPAC on www.statsbiblioteket.dk/search/

This talk is also a continuation of our Code4Lib 2010 talk called "Kill The Search Button" (http://code4lib.org/conference/2010/schedule), which we unfortunately never got to give, due to a Danish blizzard.

==Speaking in code: talking tech with humans (and librarians)==

* Erin White, Virginia Commonwealth University Libraries, erwhite@vcu.edu

We do awesome work, right? But what's the best way to communicate that work to non-geek stakeholders within our organizations? I'll present some ideas on how to communicate tech with those who don't always speak the language fluently. This'll include pitching new projects; communicating about existing projects; and dealing with project maintenance and problem-solving. I'll share some tips for explaining systems changes and problems, using help tickets as teachable moments for you or librarians, updating documentation, etc.

== Building a Code4Lib 2012 Conference Mobile App with the Kuali Mobility Framework ==

* Michelle Suranofsky, Lehigh University, michelle dot suranofsky at lehigh dot edu

* Tod Olson, University of Chicago, tod at uchicago dot edu

Hot off the heels of the Kuali Days 2011 Conference, we thought it would be fun to take the newly released Kuali Mobility for Enterprise framework for a test drive by creating a Code4Lib Conference Mobile App.

[http://kuali.org/mobility Kuali Mobility for Enterprise (KME)] is an open source framework for developing and deploying applications to connect mobile devices to an institution's information resources. Applications may be deployed as mobile websites or as installable apps. The KME framework makes heavy use of HTML5, CSS, and JavaScript, and builds on other open source projects like PhoneGap and jQuery Mobile.

We will discuss the mechanics of the Kuali Mobility framework along with the experience of using it to create a mobile app for the Code4Lib conference.

== The ARCHIVEMATICA digital preservation system ==

* Peter Van Garderen, Archivematica Project Manager, [http://artefactual.com Artefactual Systems], peter at artefactual dot com

* Courtney Mumma, Archivematica Community Manager, courtney at artefactual dot com

The open source (AGPL3) [http://archivematica.org Archivematica] digital preservation system uses a micro-services architecture to integrate a suite of Linux utilities into workflow pipelines. It is designed as a backend tool for archivists and librarians managing digital collections and digital preservation responsibilities. We use Gearman for job scheduling and load balancing as well as Django (Python) for a web-based administration interface that monitors and controls the processing of files in the pipelines. The system creates standards-compliant (e.g. METS, PREMIS, BagIt) archival packages and provides a registry interface to monitor format policies. This system is designed to provide the technical component for ISO 14721 (OAIS) and ISO 16363 (TRAC) compliant Trustworthy Digital Repositories. The recent 0.8 release is the last alpha. Over winter 2012 we are continuing with scalability testing and tuning and adding ElasticSearch indexing, SWORD deposit support, and interfaces for DSpace, CONTENTdm, and XTF, all for inclusion in the 0.9-beta release sometime in Spring 2012. The presentation will give a quick demo of Archivematica's features as well as discuss technical architecture, APIs, development roadmap, user base, community building, project management, etc.

== Virtual Integrated Search - on-the-fly merging of relevancy ranked searches ==

* Mads Villadsen, The State and University Library Denmark, mv@statsbiblioteket.dk

What do you do when you have an integrated search system and the users want data at the article level? What we did was to try and get the data from the publishers - and when that failed we went with Summon for the article data while keeping our bibliographic records (and more) in our own system.

So how’s that working out for us?

We didn’t want to give up on our overall goal of having a single unified result set which meant we had to do something out of the ordinary.

We struck a deal with Serials Solutions that allowed us to apply our technical know-how and sprinkle fairy dust on our queries, thereby achieving a proper relevancy-ranked merging of results from our own index with the results from Summon. We gave a lightning talk about some of these ideas at last year's Code4Lib.

We have been running this "Virtual Integrated Search" in production since August, and the end users haven't come at us with their pitchforks yet, so we assume they are still able to find what they are looking for.

Just to be sure we will be performing a usability test in November 2011 that will hopefully guide our future development.

I will cover what goes into making fairy dust ("how it works", "what doesn't work") as well as some of the results from the usability test ("does it actually work?").
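
To make the merging problem concrete, here is a deliberately simple sketch. It is not the fairy dust described above: it swaps in plain min-max score normalisation, so that scores from two indexes with different scales become comparable before the lists are interleaved. The function names, the <code>remote_weight</code> parameter, and the sample scores are assumptions made for illustration.

```python
def normalise(hits):
    """Rescale (doc_id, score) pairs so that scores fall in [0, 1]."""
    if not hits:
        return []
    scores = [score for _, score in hits]
    lowest, highest = min(scores), max(scores)
    span = highest - lowest
    # If every score is identical, treat all hits as equally relevant.
    return [
        (doc_id, 1.0 if span == 0 else (score - lowest) / span)
        for doc_id, score in hits
    ]


def merge_ranked(local_hits, remote_hits, remote_weight=1.0):
    """Merge two ranked result lists into one list ordered by normalised score.

    remote_weight is a hypothetical tuning knob for damping or boosting
    the remote index relative to the local one.
    """
    merged = [(doc_id, score, "local") for doc_id, score in normalise(local_hits)]
    merged += [
        (doc_id, score * remote_weight, "remote")
        for doc_id, score in normalise(remote_hits)
    ]
    # sorted() is stable, so ties keep local hits ahead of remote ones.
    return sorted(merged, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    local = [("book-1", 12.0), ("book-2", 6.0), ("book-3", 3.0)]
    remote = [("article-1", 0.9), ("article-2", 0.5)]
    for hit in merge_ranked(local, remote, remote_weight=0.8):
        print(hit)
```

Min-max normalisation is easy to reason about but sensitive to outliers, which is one reason a production system would likely need something more careful.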

http://www.statsbiblioteket.dk/search/

== Kuali Rice and preparing for OLE ==

* Tod Olson, University of Chicago, tod at uchicago dot edu

* Michelle Suranofsky, Lehigh University, michelle dot suranofsky at lehigh dot edu

Kuali Rice provides some of the fundamental underlying services for Kuali OLE and other Kuali software, services such as workflows, a service bus, integration with campus identity management, and more. In preparation for OLE, some partner libraries are developing their own simple Rice-based applications to provide some useful automation now while gaining experience that will prepare us for running Rice as part of OLE. This talk will give a brief overview of Kuali Rice and then discuss the construction of a real-but-simple Rice application.

== Argo and DOR Services: The developer and administrative interfaces to Stanford's Digital Object Registry ==

* Michael B. Klein, Library Infrastructure Engineer, Stanford University Libraries, mbklein at stanford dot edu

Argo is the administrative interface for Stanford's Digital Object Registry (DOR), the central repository of information about digital assets owned or managed by Stanford University Libraries and Academic Information Resources (SULAIR). Built on Blacklight, with help from other pieces of the Hydra repository framework, Argo provides a top-down, source-independent, application-agnostic view of items working their way through various stages of registration, submission, description, digitization, accessioning, publication, shelving, and preservation.

Argo's functionality is provided through three separate layers:

* A traditional web application, which provides UI-based bulk and individual item registration, management, and reporting functions

* A web service, which provides RESTful access to several of the same functions

* A DOR services Ruby gem which opens most of this functionality to other Ruby code, from Rails applications to accessioning daemons to one-off scripts

This presentation will explore Argo's full stack, from the underlying DOR Services gem (encapsulating a number of other disparate library infrastructure functions) to its use by SULAIR developers, contractors, digitization lab staff, project managers, and SULAIR technical staff.

== The Way to Build C4L Activities in Your Homeland - Based on the Experience of Code4Lib JAPAN. ==

* Makoto Okamoto, Chief Editor of Academic Resource Guide (ARG) and Executive Officer of Code4Lib JAPAN, arg.editor_at_gmail.com

In August 2010, after six months of preparation, we launched "Code4Lib JAPAN", a local Code4Lib activity in Japan. Since then, Code4Lib JAPAN has enjoyed great success and growth. Roughly, the activities of Code4Lib JAPAN are divided into 4 parts, such as operating the organization and its activities, offering training programs, proposing guidelines, dispatching a delegation to the Code4Lib Conference, and selecting good practices.

In this presentation, the Executive Officer of Code4Lib JAPAN will explain some key factors in our success and growth. Those key factors, such as getting money from outside grants, industrial sponsors, and personal supporters, and operating the organization and its activities on a self-supporting basis, will be very helpful for those who wish to launch local activities in their homeland. We can offer various tips for spreading the value and activities of Code4Lib around the world.

== The Golden Road (To Unlimited Devotion): Building a Socially Constructed Archive of Grateful Dead Artifacts ==

* Robin Chandler, University of California (Santa Cruz), chandler [at] ucsc [dot] edu

* Susan Chesley Perry, University of California (Santa Cruz), chesley [at] ucsc [dot] edu

* Kevin S. Clarke, University of California (Santa Cruz), ksclarke [at] ucsc [dot] edu

The Grateful Dead Archive at the University of California (Santa Cruz) is a collection of over 600 linear feet of material, including: business records, photographs, posters, fan envelopes, tickets, video, audio (oral histories, interviews and music) and 3-d objects such as stage props and band merchandise. In addition, with the release of the ''Grateful Dead Archive Online'' website in 2012, the Archive will start actively collecting artifacts from an enthusiastic community of Grateful Dead fans.

This talk will discuss the challenges of merging a traditional archive with a socially constructed one. We will also present the first round of development and explain how we're using tools like Omeka, CONTENTdm, UC3 Merritt, djatoka, Kaltura, Google Maps, and Solr to lay the foundation for a robust and engaging site. Future directions, like the integration/development of better curation tools and what we hope to learn from opening the archive to contributions from a large community of fans, will also be discussed.

== Library News - A gathering place for library and tech news, and more ==

* Matt Phillips, Harvard Library Innovation Lab, mphillips@law.harvard.edu

[http://news.librarycloud.org Library News] is a gathering place for people to share and discuss news from the technology and library worlds. Think [http://news.ycombinator.com Hacker News], but for library dorks instead of startup dorks.

Library News is more than a news and discussion site: it analyzes submitted links and shares its observations. One example of this sharing is the exposure of popular blogs: Library News tracks submitted blog entries and tallies them up, creating a list of the most popular blogs in the community. This most-popular list is exposed as an HTML document and as an [http://en.wikipedia.org/wiki/OPML OPML] download. (The OPML file can be loaded directly into an RSS reader and be used as an always up-to-date "starter pack" of popular blogs in the library and tech spaces.)
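
The tallying idea can be sketched in a few lines. The example below is an illustration only, not the Library News code: it counts submitted links by host name and writes the top hosts as OPML <code>link</code> outlines. Function names and sample URLs are assumptions, and feed discovery is left out, so a real "starter pack" for an RSS reader would still need each blog's feed address (<code>type="rss"</code> with <code>xmlUrl</code>).

```python
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlparse


def popular_blogs(submitted_urls, top=10):
    """Tally submitted links by host name and return the most common hosts."""
    hosts = Counter(urlparse(url).netloc.lower() for url in submitted_urls)
    hosts.pop("", None)  # drop submissions that were not absolute URLs
    return hosts.most_common(top)


def to_opml(blogs, title="Popular blogs"):
    """Serialise (host, count) pairs as a minimal OPML 2.0 document."""
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for host, _count in blogs:
        # type="link" points at the blog itself; the feed URL is not known here.
        ET.SubElement(body, "outline", text=host, type="link", url="http://" + host + "/")
    return ET.tostring(opml, encoding="unicode")


if __name__ == "__main__":
    links = [
        "http://a.example/post1",
        "http://b.example/hello",
        "http://a.example/post2",
    ]
    print(to_opml(popular_blogs(links)))
```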

My rough talk outline:

* Demo Library News

* Present how Library News goes beyond normal discussion sites (the tools that let you explore community-submitted links)

* Discuss where Library News fits with the current library news ecosystem

Find more information about Library News at the [http://news.librarycloud.org/faq Library News FAQ]

== Data-Mining Repository Contents to Auto-populate Scholarly Research Repository Submission Metadata ==

* Mark Diggory, Head of U.S. Operations

The existing body of Open Access scholarly research is a well classified and described dataset. However, in Institutional Repositories it can be the case that there are insufficient resources to invest for cataloging and maintaining rich metadata descriptions of contributed content. This is especially the case when collections are populated and maintained by non-librarians. A great deal of classifiable detail preexists within files that are submitted to scholarly repositories. Utilizing existing Open Source technologies capable of extracting this information, a process can be provided to submitters and repository maintainers to suggest appropriate subject classifications and types for descriptive metadata during submission and update of repository items. This talk will provide an overview of an approach for utilizing machine learning as a tool for the auto population of subject classifications and content types.

== Mining Wikipedia for Book Articles ==

* Paul Deschner, Harvard Library Innovation Lab, deschner@law.harvard.edu

Suppose you were developing a browsing tool for library materials and wanted to include Wikipedia articles and categories whenever available -- how would you do it? There is no API or other data service which one can use to get a comprehensive listing of every page in Wikipedia devoted to the discussion of a book.

This talk will focus on the tools, workflows and data sources we have used to approach this problem. Tools and workflows include the use of infobox ISBNs and other standard identifiers, analysis of Wikipedia categories and category hierarchies, exploitation of article abstracts and titles, and Mechanical Turk resources. Data sources include DBpedia triple stores and Wikimedia XML/SQL dumps. So far, we have harvested around 60,000 book articles. This is an exploration in dealing with open, relatively unstructured Web content, and in aggregating answers to the same question using quite diverse techniques.
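
As one small example of the infobox technique, the sketch below pulls <code>isbn</code> fields out of raw article wikitext and keeps only values whose check digit is valid. It is a minimal illustration under assumptions, not our production workflow: the regular expression assumes one <code>| isbn = ...</code> parameter per line, and the function names and sample wikitext are made up for the example.

```python
import re

# Assumes the infobox parameter sits on its own line, e.g. "| isbn = 0-306-40615-2".
ISBN_FIELD = re.compile(
    r"^\s*\|\s*isbn\s*=\s*([0-9][0-9Xx \-]*)", re.IGNORECASE | re.MULTILINE
)


def is_valid_isbn(isbn):
    """Check the ISBN-10 or ISBN-13 check digit of a string without separators."""
    if len(isbn) == 10:
        total = 0
        for position, char in enumerate(isbn):
            if char == "X" and position == 9:
                value = 10  # "X" is only allowed as the final check digit
            elif char.isdigit():
                value = int(char)
            else:
                return False
            total += (10 - position) * value
        return total % 11 == 0
    if len(isbn) == 13 and isbn.isdigit():
        total = sum(
            int(char) * (1 if position % 2 == 0 else 3)
            for position, char in enumerate(isbn)
        )
        return total % 10 == 0
    return False


def infobox_isbns(wikitext):
    """Return the valid ISBNs found in infobox isbn fields, without separators."""
    found = []
    for raw in ISBN_FIELD.findall(wikitext):
        candidate = re.sub(r"[\s\-]", "", raw).upper()
        if is_valid_isbn(candidate):
            found.append(candidate)
    return found


if __name__ == "__main__":
    sample = "{{Infobox book\n| name = Example\n| isbn = 978-0-306-40615-7\n}}"
    print(infobox_isbns(sample))
```

Validating the check digit filters out many typos and placeholder values, which matters when the harvested identifiers are later matched against catalog records.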

[[Category: Code4Lib2012]]

[[Category:Talk Proposals]]
