<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
		<id>https://wiki.code4lib.org/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Ryscher</id>
		<title>Code4Lib - User contributions [en]</title>
		<link rel="self" type="application/atom+xml" href="https://wiki.code4lib.org/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Ryscher"/>
		<link rel="alternate" type="text/html" href="https://wiki.code4lib.org/Special:Contributions/Ryscher"/>
		<updated>2026-04-09T20:23:16Z</updated>
		<subtitle>User contributions</subtitle>
		<generator>MediaWiki 1.26.2</generator>

	<entry>
		<id>https://wiki.code4lib.org/index.php?title=2014_preconference_proposals&amp;diff=40117</id>
		<title>2014 preconference proposals</title>
		<link rel="alternate" type="text/html" href="https://wiki.code4lib.org/index.php?title=2014_preconference_proposals&amp;diff=40117"/>
				<updated>2013-12-09T19:51:11Z</updated>
		
		<summary type="html">&lt;p&gt;Ryscher: /* Managing Projects: Or I'm in charge, now what? (aka PM4Lib) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= PROPOSALS ARE CLOSED : PLEASE DO NOT ADD NEW PRECONFERENCES TO THIS PAGE =&lt;br /&gt;
&lt;br /&gt;
Proposals were accepted through December 6th, 2013.&lt;br /&gt;
&lt;br /&gt;
It would be really, super duper helpful if folks who think they might want to attend a pre-conference could indicate interest by adding their names to a session below. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Note===&lt;br /&gt;
Attendance at a pre-conference will require a small fee ''due at the time of conference registration''.&lt;br /&gt;
 &lt;br /&gt;
Although this was specified in the email announcements relating to pre-conferences, it was not added to this page until December 2nd.  I (Adam C.) apologize for the omission and I hope this will not cause any &amp;quot;sticker shock.&amp;quot;  Putting your name on this list does not incur any obligation on your part, but we'll be using it to gauge interest and work out room assignments.&lt;br /&gt;
&lt;br /&gt;
Please put your pre-conference on the list in the following format:&lt;br /&gt;
&lt;br /&gt;
=Code4Lib 2014 Pre-Conference Proposals=&lt;br /&gt;
&lt;br /&gt;
===Drupal4lib Sub-con Barcamp===&lt;br /&gt;
=====Full Day=====&lt;br /&gt;
&lt;br /&gt;
* Contact [[User:highermath|Cary Gordon]], cgordon@chillco.com&lt;br /&gt;
&lt;br /&gt;
This will be a full day of self-selected barcamp style sessions. Anyone who wants to present can write down the topic on an index card and, after the keynote, we will vote to choose what we want to see. Attendees can also pick a topic and attempt to talk someone else into presenting on it.&lt;br /&gt;
&lt;br /&gt;
This event is open to the library community. There will be a nominal fee (TBD) for non-Code4LibCon attendees (subject to organizer approval).&lt;br /&gt;
&lt;br /&gt;
[[resources to help you learn drupal]]&lt;br /&gt;
&lt;br /&gt;
====Interested in Attending:====&lt;br /&gt;
&lt;br /&gt;
=====All Day=====&lt;br /&gt;
&lt;br /&gt;
* Renna Tuten &lt;br /&gt;
&lt;br /&gt;
=====Morning=====&lt;br /&gt;
&lt;br /&gt;
* Kevin Reiss&lt;br /&gt;
* Charlie Morris (NCSU) - glad to see this again this year!&lt;br /&gt;
&lt;br /&gt;
=====Afternoon=====&lt;br /&gt;
&lt;br /&gt;
&amp;nbsp;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Open Refine Hackfest===&lt;br /&gt;
'''&amp;quot;Half-Day&amp;quot;'''&lt;br /&gt;
* Contact [[User:bibliotechy|Chad Nelson]], chadbnelson@gmail.com&lt;br /&gt;
&lt;br /&gt;
[http://openrefine.org/ Open Refine] is a powerful open source tool for wrangling messy data that can also be used to help create Linked Data via the [https://github.com/OpenRefine/OpenRefine/wiki/Reconciliation-Service-API Reconciliation API]. It is possible to write reconciliation services against APIs, like the [http://iphylo.blogspot.com/2013/04/reconciling-author-names-using-open.html VIAF service], or even just against local authority files to help maintain authority control.&lt;br /&gt;
&lt;br /&gt;
The session would first introduce Open Refine, then walk through building a reconciliation service, and the rest of the session would be a hackfest where we build new reconciliation services for public consumption or local use. &lt;br /&gt;
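To make the reconciliation step concrete, here is a minimal, illustrative sketch of the core matching logic such a service implements: it takes the parsed queries payload and a local authority list and returns candidates in the Reconciliation API response shape. The authority data, scoring method, and 0.95 threshold are assumptions for illustration, not part of any real service.&lt;br /&gt;

```python
# Minimal sketch of the matching core of an OpenRefine reconciliation
# service. Input: the parsed "queries" payload, e.g. {"q0": {"query": "..."}},
# and a local authority list of (id, name) pairs. Output: candidates in the
# Reconciliation API response shape. Scoring and threshold are illustrative.
import difflib

def reconcile(queries, authority, threshold=0.95):
    results = {}
    for key, q in queries.items():
        term = q["query"]
        candidates = []
        for rec_id, name in authority:
            # Simple string similarity; a real service might use a proper index.
            score = difflib.SequenceMatcher(None, term.lower(), name.lower()).ratio()
            if score > 0.5:
                candidates.append({
                    "id": rec_id,
                    "name": name,
                    "score": round(score, 3),
                    "match": score >= threshold,  # auto-match only near-exact hits
                    "type": [{"id": "/people/person", "name": "Person"}],
                })
        candidates.sort(key=lambda c: c["score"], reverse=True)
        results[key] = {"result": candidates}
    return results
```

A real service would wrap a function like this in a small web endpoint that OpenRefine can query, per the Reconciliation Service API wiki page linked above.&lt;br /&gt;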
&lt;br /&gt;
''Interested in Attending''&lt;br /&gt;
&lt;br /&gt;
If you would be interested in attending, please indicate by adding your name (but not email address, etc.) here&lt;br /&gt;
# Adam Constabaris&lt;br /&gt;
# Ray Schwartz&lt;br /&gt;
# Jason Stirnaman&lt;br /&gt;
# Joshua Gomez&lt;br /&gt;
# Sam Kome&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Responsive Design Hackfest===&lt;br /&gt;
'''&amp;quot;Half-Day [Afternoon]&amp;quot;''' &lt;br /&gt;
* Contact Jim Hahn, University of Illinois, jimfhahn@gmail.com&lt;br /&gt;
* Contact David Ward, University of Illinois, dh-ward@illinois.edu&lt;br /&gt;
&lt;br /&gt;
This structured hackfest will give attendees an opportunity to explore methods to create responsive mobile apps using the Bootstrap framework [http://getbootstrap.com/] and a set of APIs for accessing library data. We will start with an API template for creating space-based mobile tools that draws from work coming out of the IMLS-funded Student/Library Collaborative grant [http://www.library.illinois.edu/nlg_student_apps]. Available APIs will include a room reservation template and codebase that can be implemented at any campus, and the set of Minrva catalog APIs generating JSONP [http://minrvaproject.org/services.php].&lt;br /&gt;
&lt;br /&gt;
Hosts will give a brief report of a study on student hacking projects and interests in mobile library apps that are the basis for the templates utilized in this Hackathon. By the end of the pre-conference attendees will have a sample responsive mobile web app in Bootstrap 3 to bring back to their campus which can plug into their site-based content.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
''Interested in Attending''&lt;br /&gt;
&lt;br /&gt;
If you would be interested in attending, please indicate by adding your name (but not email address, etc.) here&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Intro to Blacklight ===&lt;br /&gt;
'''&amp;quot;Half-Day [Morning]&amp;quot;''' &lt;br /&gt;
* Contact: Chris Beer, Stanford University, cabeer@stanford.edu&lt;br /&gt;
* TA: Bess Sadler, Stanford University, bess@stanford.edu&lt;br /&gt;
&lt;br /&gt;
This session will be a walk-through of the Blacklight architecture and community, and an introduction to building a Blacklight-based application. Each participant will have the opportunity to build a simple Blacklight application and make basic customizations while using a test-driven approach.&lt;br /&gt;
&lt;br /&gt;
For more information about Blacklight see our wiki ( http://projectblacklight.org/ ) and our GitHub repo ( https://github.com/projectblacklight/blacklight ). We will also send out some brief instructions beforehand for those who would like to set up their environments to follow along and get Blacklight up and running on their local machines.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
''Interested in Attending''&lt;br /&gt;
&lt;br /&gt;
If you would be interested in attending, please indicate by adding your name (but not email address, etc.) here&lt;br /&gt;
&lt;br /&gt;
# Megan Kudzia&lt;br /&gt;
# Bret Davidson&lt;br /&gt;
# Coral Sheldon-Hess&lt;br /&gt;
# Cory Lown&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Blacklight Hackfest===&lt;br /&gt;
'''&amp;quot;Half-Day [Afternoon]&amp;quot;''' &lt;br /&gt;
* Contact Chris Beer, Stanford University, cabeer@stanford.edu&lt;br /&gt;
&lt;br /&gt;
This afternoon hackfest is both a follow-on to the Intro to Blacklight morning session to continue building Blacklight-based applications, and also an opportunity for existing Blacklight contributors and members of the Blacklight community to exchange common patterns and approaches into reusable gems or incorporate customizations into Blacklight itself.&lt;br /&gt;
&lt;br /&gt;
For more information about Blacklight see our wiki ( http://projectblacklight.org/ ) and our GitHub repo ( https://github.com/projectblacklight/blacklight ).&lt;br /&gt;
&lt;br /&gt;
''Interested in Attending''&lt;br /&gt;
&lt;br /&gt;
If you would be interested in attending, please indicate by adding your name (but not email address, etc.) here&lt;br /&gt;
&lt;br /&gt;
# Shaun Ellis&lt;br /&gt;
# Kevin Reiss&lt;br /&gt;
# Megan Kudzia&lt;br /&gt;
# Erik Hatcher&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
===RailsBridge: Intro to programming in Ruby on Rails===&lt;br /&gt;
'''&amp;quot;Half-Day&amp;quot; [morning]'''&lt;br /&gt;
* Contact Justin Coyne, Data Curation Experts, justin@curationexperts.com&lt;br /&gt;
&lt;br /&gt;
Interested in learning how to program? Want to build your own web application? Never written a line of code before and are a little intimidated? There's no need to be! RailsBridge is a friendly place to get together and learn how to write some code.&lt;br /&gt;
&lt;br /&gt;
RailsBridge is a great workshop that opens the doors to projects like Blacklight and Hydra.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
''Interested in Attending''&lt;br /&gt;
&lt;br /&gt;
If you would be interested in attending, please indicate by adding your name (but not email address, etc.) here&lt;br /&gt;
&lt;br /&gt;
1. Ayla Stein&lt;br /&gt;
&lt;br /&gt;
2. Heidi Dowding&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
===Managing Projects: Or I'm in charge, now what? (aka PM4Lib)===&lt;br /&gt;
'''Full-Day'''&lt;br /&gt;
&lt;br /&gt;
Contact: &lt;br /&gt;
* [[User:rosy1280|Rosalyn Metz]], rosalynmetz@gmail.com&lt;br /&gt;
* [[User:yoosebj|Becky Yoose]], yoosebec@grinnell.edu&lt;br /&gt;
&lt;br /&gt;
This will be a full day session on project management.  We'll cover&lt;br /&gt;
* '''Kicking off the Project''' -- project lifecycle, project constraints, scoping/goals, stakeholders, assessment&lt;br /&gt;
* '''Planning the Project''' -- project charters, work breakdown structures, responsibilities, estimating time, creating budgets&lt;br /&gt;
* '''Executing the Project''' -- status meeting, status reports, issue management&lt;br /&gt;
* '''Finishing the Project''' -- achieving the goal, post mortems, project v. product&lt;br /&gt;
This is a revival of rosy1280's LITA Forum Pre-Conference, but better (because iteration is good) and adapted to c4lib types.&lt;br /&gt;
&lt;br /&gt;
''Interested in Attending''&lt;br /&gt;
&lt;br /&gt;
If you would be interested in attending, please indicate by adding your name (but not email address, etc.) here&lt;br /&gt;
&lt;br /&gt;
# Robin Dean&lt;br /&gt;
# Erin White&lt;br /&gt;
# Andrew Darby&lt;br /&gt;
# Sam Kome&lt;br /&gt;
# Ryan Scherle&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Fail4Lib 2014===&lt;br /&gt;
'''Half Day [TBD, probably afternoon]'''&lt;br /&gt;
&lt;br /&gt;
Contacts: &lt;br /&gt;
* Andreas Orphanides, akorphan (at) ncsu.edu&lt;br /&gt;
* Jason Casden, jmcasden (at) ncsu.edu&lt;br /&gt;
&lt;br /&gt;
The task of design (and the work that we do as library coders) is intimately tied to failure. Failures, both big and small, motivate us to create and improve. Failures are also occasionally the result of our work. Understanding and embracing failure, encouraging enlightened risk-taking, and seeking out opportunities to fail and learn are essential to success in our field. At Fail4Lib, we'll talk about our own experiences with projects gone wrong, explore some famous design failures in the real world, and talk about how we can come to terms with the reality of failure, to make it part of our creative process -- rather than something to be feared.&lt;br /&gt;
&lt;br /&gt;
The schedule may include the following:&lt;br /&gt;
&lt;br /&gt;
* Case studies. We'll look at some classic failures from the literature: What can we learn from the mistakes of others?&lt;br /&gt;
* Confessionals, for those willing to share. Talk about your own experiences with rough starts, labor pains, and doomed projects in your own work: What can we learn from our own (and each others') failures?&lt;br /&gt;
* Group therapy. Let's talk about how to deal with risk management, failed projects, experimental endeavors, and more: How can we make ourselves, our colleagues, and our organizations more fault tolerant? How do we make sure we fail as productively as possible?&lt;br /&gt;
&lt;br /&gt;
''Interested in attending''&lt;br /&gt;
&lt;br /&gt;
If you would be interested in attending, please indicate by adding your name (but not email address, etc.) here&lt;br /&gt;
&lt;br /&gt;
#Bret Davidson&lt;br /&gt;
#Mike Graves&lt;br /&gt;
#Ray Schwartz&lt;br /&gt;
#Jason Stirnaman&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
===CLLAM @ code4lib===&lt;br /&gt;
'''(Computational Linguistics for Libraries, Archives and Museums)'''&lt;br /&gt;
&lt;br /&gt;
'''Full Day'''&lt;br /&gt;
&lt;br /&gt;
Contacts: &lt;br /&gt;
* Douglas W. Oard (primary), oard (at) umd.edu &lt;br /&gt;
* Corey Harper, corey (dot) harper (at) nyu.edu&lt;br /&gt;
* Robert Sanderson, azaroth42 (at) gmail.com &lt;br /&gt;
* Robert Warren, rwarren (at) math.carleton.ca&lt;br /&gt;
&lt;br /&gt;
We will hack at the intersection of diverse content from Libraries, Archives and Museums and bleeding edge tools from computational linguistics for slicing and dicing that content. Did you just acquire the email archives of a startup company? Maybe you can automatically build an org chart. Have you got metadata in a slew of languages? Perhaps you can search it all using one query. Is name authority control for e-resources getting too costly? Let’s see if entity linking techniques can help. These are just a few teasers. &lt;br /&gt;
&lt;br /&gt;
There’ll be plenty of content and tools supplied, but please bring your own [data] too -- you’ll hack with it in new ways throughout the day. We’ll get started with some lightning talks on what we’ve brought, then we’ll break up into groups to experiment and work on the ideas that appeal. Three guaranteed outcomes: you’ll walk away with new ideas, new tools, and new people you’ll have met.&lt;br /&gt;
&lt;br /&gt;
''Interested in attending''&lt;br /&gt;
&lt;br /&gt;
If you would be interested in attending, please indicate by adding your name (but not email address, etc.) here&lt;br /&gt;
&lt;br /&gt;
# Devon Smith&lt;br /&gt;
# Kevin S. Clarke&lt;br /&gt;
# Jason Stirnaman&lt;br /&gt;
# Joshua Gomez&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== GeoHydra: Managing geospatial content ===&lt;br /&gt;
&lt;br /&gt;
'''Half-day [Afternoon]'''&lt;br /&gt;
&lt;br /&gt;
* Contact: Darren Hardy, Stanford University, drh@stanford.edu&lt;br /&gt;
* Moderator: Bess Sadler, Stanford University, bess@stanford.edu&lt;br /&gt;
&lt;br /&gt;
Do you have digitized maps, GIS datasets like Shapefiles, aerial photography,&lt;br /&gt;
etc., all of which you want to integrate into your digital repository? In this&lt;br /&gt;
workshop, we will discuss how Hydra can provide discovery, delivery, and&lt;br /&gt;
management services for geospatial assets, as well as solicit questions about&lt;br /&gt;
your own GIS projects. We aim to help answer the following questions you might have about putting geospatial data into your Hydra-based digital library:&lt;br /&gt;
&lt;br /&gt;
* What are the types of geospatial data?&lt;br /&gt;
* How to dive into Hydra?&lt;br /&gt;
* How to model geospatial holdings with Hydra?&lt;br /&gt;
* How to discover and view geospatial data?&lt;br /&gt;
* How to build a geospatial data infrastructure?&lt;br /&gt;
* What are common approaches and problems?&lt;br /&gt;
&lt;br /&gt;
''Interested in Attending''&lt;br /&gt;
&lt;br /&gt;
If you would be interested in attending, please indicate by adding your name (but not email address, etc.) here&lt;br /&gt;
&lt;br /&gt;
# Esmé Cowles&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Technology, Librarianship, and Gender: Moving the conversation forward===&lt;br /&gt;
'''Full Day'''&lt;br /&gt;
&lt;br /&gt;
Contact: Lisa Rabey lisa @ biblyotheke dot net | [http://twitter.com/pnkrcklibrarian @pnkrcklibrarian]&lt;br /&gt;
&lt;br /&gt;
'''Description'''&lt;br /&gt;
&lt;br /&gt;
Librarianship is largely made up of women, yet women are significantly underrepresented in tech positions, on any level, within libraries themselves. Why? What are we doing to encourage women to become more involved in STEM within librarianship? What kind of message are we sending when library technology keynotes remain almost resolutely male? How are we changing the face of technology, not only within libraries, but within the field itself? How are we training our staff and colleagues in the areas of fairness and removal of bias? Our vendors?&lt;br /&gt;
&lt;br /&gt;
Lots of tough questions.&lt;br /&gt;
&lt;br /&gt;
While the conversation has been going on via various blogs and articles over the last few years, it was given a public face at [http://infotoday.com/il2013/day.asp?day=Monday#session_D105 Internet Librarian 2013], where a panel of seven (four women, three men) shared personal experiences on the above and then opened the conversation to the audience. As eye-opening and enriching as that conversation was, a 45-minute panel was not enough. One thing remains clear: we need to keep the conversation moving forward, start making some radical changes in the way we think and act, and harness this momentum to make real changes within librarianship itself.&lt;br /&gt;
&lt;br /&gt;
Topics to include: fairness, bias, impostor syndrome, codes of conduct, sexual harassment, training opportunities, support systems, mentoring, ally support, and more.&lt;br /&gt;
&lt;br /&gt;
Those attending should expect to begin by opening up a conversation about experiences and what is most needed, then to spend the remaining time putting together live, usable solutions to start implementing, as well as pushing the conversation forward at local levels.&lt;br /&gt;
&lt;br /&gt;
''Interested in Attending''&lt;br /&gt;
&lt;br /&gt;
If you would be interested in attending, please indicate by adding your name (but not email address, etc.) here&lt;br /&gt;
&lt;br /&gt;
=====All Day=====&lt;br /&gt;
1. Kate Kosturski&lt;br /&gt;
&lt;br /&gt;
2. Valerie Aurora&lt;br /&gt;
&lt;br /&gt;
3. Declan Fleming (I'd be good with a half day too)&lt;br /&gt;
&lt;br /&gt;
=====Morning=====&lt;br /&gt;
1. Shaun Ellis&lt;br /&gt;
&lt;br /&gt;
2. Jason Casden&lt;br /&gt;
&lt;br /&gt;
=====Afternoon=====&lt;br /&gt;
1. Ayla Stein&lt;br /&gt;
&lt;br /&gt;
2. Heidi Dowding&lt;br /&gt;
&lt;br /&gt;
3. Coral Sheldon-Hess&lt;br /&gt;
&lt;br /&gt;
4. Cory Lown&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===FileAnalyzer: Rapid Development of File Manipulation Tasks===&lt;br /&gt;
'''&amp;quot;Half-Day&amp;quot; [morning]'''&lt;br /&gt;
* Contact Terry Brady, twb27@georgetown.edu&lt;br /&gt;
&lt;br /&gt;
The FileAnalyzer (https://github.com/Georgetown-University-Libraries/File-Analyzer) is an application designed to solve a number of library automation challenges:&lt;br /&gt;
&lt;br /&gt;
* validating digitized and reformatted files&lt;br /&gt;
* validating vendor statistics for COUNTER compliance&lt;br /&gt;
* preparing collections of digital files for archiving and ingest&lt;br /&gt;
* manipulating ILS import and export files&lt;br /&gt;
&lt;br /&gt;
The File Analyzer application was used by the US National Archives to validate 3.5 million digitized images from the 1940 Census. After implementing a customized ingest workflow within the File Analyzer, the Georgetown University Libraries were able to process an ingest backlog of over a thousand files of digital resources into DigitalGeorgetown, the Libraries’ Digital Collections and Institutional Repository platform. Georgetown is currently developing customized workflows that integrate Apache Tika, BagIt, and MARC conversion utilities.&lt;br /&gt;
&lt;br /&gt;
The File Analyzer is a desktop application with a powerful framework for implementing customized file validation and transformation rules. As new rules are deployed, they are presented to users within a user interface that is easy (and powerful) to use.&lt;br /&gt;
&lt;br /&gt;
The first half of this session will be targeted to potential users and developers.  The second half of the session will be targeted towards developers who are interested in developing custom rules for the application.&lt;br /&gt;
&lt;br /&gt;
''Session Overview''&lt;br /&gt;
* Overview of the application&lt;br /&gt;
* Running sample file tests/transformations through the application&lt;br /&gt;
* Compiling and building the application&lt;br /&gt;
* Coding a custom file processing task&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
''Interested in Attending''&lt;br /&gt;
&lt;br /&gt;
If you would be interested in attending, please indicate by adding your name (but not email address, etc.) here&lt;br /&gt;
&lt;br /&gt;
#Ray Schwartz&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Collecting social media data with Social Feed Manager===&lt;br /&gt;
'''Half-Day [Morning]'''&lt;br /&gt;
&lt;br /&gt;
Contacts: &lt;br /&gt;
* Dan Chudnov, GW Libraries, dchud (at) gwu.edu&lt;br /&gt;
* Dan Kerchner, GW Libraries, kerchner (at) gwu.edu&lt;br /&gt;
* Laura Wrubel, GW Libraries, lwrubel (at) gwu.edu&lt;br /&gt;
&lt;br /&gt;
Social media data is a popular material for research and a new format for building collections.  What does it take to collect meaningfully from Twitter, Tumblr, YouTube, Weibo, Facebook, and other sites?  We will:&lt;br /&gt;
* Introduce options for collections, including both high- and low-end commercial offerings. Discuss what it means to collect these resources, covering boundaries, policies, and workflows required to develop a social media collection program in your institution.&lt;br /&gt;
* Explore the Twitter API in depth, with hands-on opportunities for those with laptops and others who want to team up with them&lt;br /&gt;
* Help you get started using the free [http://gwu-libraries.github.io/social-feed-manager Social Feed Manager] (SFM) app we're developing at GW to create your first collections. We’ll demo its use and demo a clean install (those with environments can follow along)&lt;br /&gt;
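As an illustration of the kind of processing step involved (this is a hypothetical sketch, not code from Social Feed Manager itself), flattening raw Twitter API v1.1 status objects into tabular rows might look like this:&lt;br /&gt;

```python
# Hypothetical helper: reduce raw Twitter API v1.1 status objects (as
# returned by endpoints like statuses/user_timeline) to CSV rows suitable
# for export or loading into a database. Field names match the v1.1 payload.
import csv
import io

FIELDS = ["id_str", "created_at", "screen_name", "text", "retweet_count"]

def tweets_to_csv(statuses):
    """statuses: list of dicts shaped like v1.1 status objects."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(FIELDS)
    for s in statuses:
        writer.writerow([
            s.get("id_str", ""),
            s.get("created_at", ""),
            s.get("user", {}).get("screen_name", ""),
            s.get("text", "").replace("\n", " "),  # keep one row per tweet
            s.get("retweet_count", 0),
        ])
    return buf.getvalue()
```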
&lt;br /&gt;
&lt;br /&gt;
''Interested in Attending''&lt;br /&gt;
&lt;br /&gt;
If you would be interested in attending, please indicate by adding your name (but not email address, etc.) here&lt;br /&gt;
&lt;br /&gt;
# Declan Fleming&lt;br /&gt;
# Esmé Cowles&lt;br /&gt;
# Jason Stirnaman&lt;br /&gt;
# Ray Schwartz&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Intro to Git ===&lt;br /&gt;
'''&amp;quot;Half-Day [tbd - probably afternoon]&amp;quot;''' &lt;br /&gt;
* Contact: Erin Fahy, Stanford University, efahy at stanford.edu&lt;br /&gt;
* TA: Michael Klein, Northwestern University, michael.klein at northwestern.edu&lt;br /&gt;
&lt;br /&gt;
This session will cover the fundamentals of git by discussing/going through (time allowing):&lt;br /&gt;
* what is a distributed version control system&lt;br /&gt;
* what is git and github&lt;br /&gt;
* initializing a repo on a remote server/github&lt;br /&gt;
* cloning an existing repo&lt;br /&gt;
* creating a branch&lt;br /&gt;
* contributing code to a repo&lt;br /&gt;
* how to handle merge conflicts&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
''Interested in Attending''&lt;br /&gt;
&lt;br /&gt;
If you would be interested in attending, please indicate by adding your name (but not email address, etc.) here&lt;br /&gt;
&lt;br /&gt;
# Ray Schwartz&lt;br /&gt;
# Sam Kome&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Archival discovery and use ===&lt;br /&gt;
'''Full Day''' &lt;br /&gt;
&lt;br /&gt;
Contacts: &lt;br /&gt;
* Tim Shearer, UNC Chapel Hill, tshearer at email.unc.edu, &lt;br /&gt;
* Will Sexton, Duke, will.sexton at duke.edu&lt;br /&gt;
&lt;br /&gt;
This is a full day pre-conference about archival collections and will cover the intersections of archives, workflows, technologies, discovery, and use.&lt;br /&gt;
&lt;br /&gt;
Morning agenda: focused talks around (but not limited to) issues such as:&lt;br /&gt;
* Crowd-sourcing description to enhance collections&lt;br /&gt;
* Linked data and authority&lt;br /&gt;
* Mass digitization and sustainable workflows&lt;br /&gt;
* Digitized objects in context (images and other objects in finding aids)&lt;br /&gt;
* Too many cooks in the kitchen: versioning&lt;br /&gt;
* Global-, intra-, and inter- discovery of archival materials via finding aids &lt;br /&gt;
* and more...&lt;br /&gt;
&lt;br /&gt;
Afternoon agenda:  Focused talks around specific tools followed by general discussion, connections, opportunities, aspirations, and planning.&lt;br /&gt;
&lt;br /&gt;
Tool examples:&lt;br /&gt;
* ArchivesSpace&lt;br /&gt;
* STEADy&lt;br /&gt;
* &amp;quot;RAMP&amp;quot; (Remixing Archival Metadata Project)&lt;br /&gt;
* OpenRefine&lt;br /&gt;
* Aeon&lt;br /&gt;
&lt;br /&gt;
''Interested in Attending''&lt;br /&gt;
&lt;br /&gt;
If you would be interested in attending, please indicate by adding your name (but not email address, etc.) here&lt;br /&gt;
&lt;br /&gt;
Morning:&lt;br /&gt;
* your name&lt;br /&gt;
&lt;br /&gt;
Afternoon:&lt;br /&gt;
* your name&lt;br /&gt;
&lt;br /&gt;
All day:&lt;br /&gt;
&lt;br /&gt;
# Josh Wilson&lt;br /&gt;
# Sam Kome&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===AV Content Slam===&lt;br /&gt;
'''Half-Day [morning]'''&lt;br /&gt;
Contacts:&lt;br /&gt;
* Kara Van Malssen, kara (at) avpreserve.com&lt;br /&gt;
* Lauren Sorenson, laurens (at) bavc.org&lt;br /&gt;
* Steven Villereal , villereal (at) gmail.com&lt;br /&gt;
A morning BarCamp/unconference for practitioners and coders who work with audiovisual content. The agenda will be attendee-driven, with a focus on sharing, synthesizing, and improving workflow strategies and documentation for software-based approaches to wrangling and providing access to audio and video content.&lt;br /&gt;
Possible topics of discussion might include:&lt;br /&gt;
* Use of format id and characterization/metadata extraction tools for AV&lt;br /&gt;
* Creating and using time-based metadata&lt;br /&gt;
* Managing (moving, fixity checking, etc.) massive files (like uncompressed video)&lt;br /&gt;
For a better idea of the topics and concerns that have informed some past AV-themed events, check out the event wikis for [http://wiki.curatecamp.org/index.php/CURATEcamp_AVpres_2013 CURATEcamp AVpres 2013] as well as the [http://wiki.curatecamp.org/index.php/Association_of_Moving_Image_Archivists_%26_Digital_Library_Federation_Hack_Day_2013 AMIA/DLF 2013 Hack Day] .&lt;br /&gt;
&lt;br /&gt;
If you would be interested in attending, please indicate by adding your name (but not email address, etc.) here:&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===OCLC Web Services Hackfest===&lt;br /&gt;
&lt;br /&gt;
&amp;quot;Half-Day&amp;quot; [afternoon]&lt;br /&gt;
&lt;br /&gt;
Contact: Shelley Hostetler, Community Manager, Developer Network hostetls[at]oclc.org&lt;br /&gt;
&lt;br /&gt;
This half-day hackfest will explore some of the OCLC Developer Network web services. We will provide an overview of some of the common topics such as the general REST-based architecture for most services and how to use some new authentication clients. The group can then decide to take a deep dive into a particular API and/or write a client library for the community.&lt;br /&gt;
&lt;br /&gt;
If you would be interested in attending, please indicate by adding your name (but not email address, etc.) here:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Obey the Testing Goat!: Test Driven Web Development From The Ground Up===&lt;br /&gt;
'''Half-Day [tbd - probably afternoon]'''&lt;br /&gt;
* Contact [[User:Mredar|Mark Redar]], mredar[at]gmail.com&lt;br /&gt;
&lt;br /&gt;
Test-driven development is a proven method for producing better quality code, but I've found it hard to follow a strict TDD methodology when starting new web projects. How do you write that first test when no code or web pages exist yet?&lt;br /&gt;
&lt;br /&gt;
In this session, we will follow the excellent book [http://shop.oreilly.com/product/0636920029533.do &amp;quot;Test-Driven Web Development with Python&amp;quot;] to create a simple web site in Django following TDD from the first character typed. Come ready to code and test. No prior knowledge of python or Django required.&lt;br /&gt;
&lt;br /&gt;
By the end of this session, you should be able to [http://www.obeythetestinggoat.com/ &amp;quot;Obey the Testing Goat&amp;quot;] from start to finish on your next project.&lt;br /&gt;
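To illustrate the rhythm (a standard-library sketch; the book itself drives a Django app with Selenium, and the names here are made up for illustration): the first test is written before any application code exists, fails, and then the smallest possible code is written to make it pass.&lt;br /&gt;

```python
# A standard-library sketch of the TDD rhythm. In the book the first test
# is a Selenium functional test against a Django app; here plain unittest
# stands in, with illustrative names.
import unittest

def home_page_title():
    # Step 2 of the TDD cycle: the smallest code that satisfies the test.
    # This function did not exist when the test below was first written.
    return "To-Do lists"

class HomePageTest(unittest.TestCase):
    # Step 1: written first. It failed with a NameError until
    # home_page_title was defined, which is exactly the point of TDD.
    def test_home_page_has_expected_title(self):
        self.assertEqual(home_page_title(), "To-Do lists")
```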
&lt;br /&gt;
If you would be interested in attending, please indicate by adding your name (but not email address, etc.) here:&lt;br /&gt;
&lt;br /&gt;
# Charlie Morris (NCSU)&lt;br /&gt;
# Jason Stirnaman&lt;br /&gt;
# Joshua Gomez&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Summon Camp===&lt;br /&gt;
Placeholder by Tim McGeary for Gillian Cain (Serials Solutions)&lt;br /&gt;
Description to be provided by Gillian after account issues resolved. &lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
[[:Category:Code4Lib2014]]&lt;/div&gt;</summary>
		<author><name>Ryscher</name></author>	</entry>

	<entry>
		<id>https://wiki.code4lib.org/index.php?title=2014_Prepared_Talk_Proposals&amp;diff=39856</id>
		<title>2014 Prepared Talk Proposals</title>
		<link rel="alternate" type="text/html" href="https://wiki.code4lib.org/index.php?title=2014_Prepared_Talk_Proposals&amp;diff=39856"/>
				<updated>2013-11-08T20:34:06Z</updated>
		
		<summary type="html">&lt;p&gt;Ryscher: /* Identifiers, Data, and Norse Gods */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Proposals for Prepared Talks:'''&lt;br /&gt;
&lt;br /&gt;
Prepared talks are 20 minutes (including setup and questions), and should focus on one or more of the following areas:&lt;br /&gt;
 &lt;br /&gt;
* ''Projects'' you've worked on which incorporate innovative implementation of existing technologies and/or development of new software&lt;br /&gt;
* ''Tools and technologies'' – How to get the most out of existing tools, standards and protocols (and ideas on how to make them better)&lt;br /&gt;
* ''Technical issues'' - Big issues in library technology that should be addressed or better understood&lt;br /&gt;
* ''Relevant non-technical issues'' – Concerns of interest to the Code4Lib community which are not strictly technical in nature, e.g. collaboration, diversity, organizational challenges, etc.&lt;br /&gt;
&lt;br /&gt;
'''To Propose a Talk'''&lt;br /&gt;
* Log in to the wiki in order to submit a proposal. If you are not already registered, follow the instructions to do so.&lt;br /&gt;
* Provide a title and brief (500 words or fewer) description of your proposed talk.&lt;br /&gt;
* If you so choose, you may also indicate when, if ever, you have presented at a prior Code4Lib conference. This information is completely optional, but it may assist us in opening the conference to new presenters.&lt;br /&gt;
&lt;br /&gt;
As in past years, the Code4Lib community will vote on proposals that they would like to see included in the program. This year, however, only the top 10 proposals will be guaranteed a slot at the conference. Additional presentations will be selected by the Program Committee in an effort to ensure diversity in program content. Community votes will, of course, still weigh heavily in these decisions.&lt;br /&gt;
&lt;br /&gt;
Presenters whose proposals are selected for inclusion in the program will be guaranteed an opportunity to register for the conference. The standard conference registration fee will still apply.&lt;br /&gt;
&lt;br /&gt;
''Proposals can be submitted through '''Friday, November 8, 2013, at 5pm PST'''''. Voting will commence on November 18, 2013 and continue through December 6, 2013. The final line-up of presentations will be announced in early January, 2014.&lt;br /&gt;
&lt;br /&gt;
'''Talk Proposals'''&lt;br /&gt;
&lt;br /&gt;
==Creating a new Greek-Dutch dictionary==&lt;br /&gt;
* Caspar Treijtel, University of Amsterdam, c.treijtel@uva.nl&lt;br /&gt;
&lt;br /&gt;
At present, no complete dictionary of (ancient) Greek-Dutch is available online. A new dictionary is currently under construction at Leiden University, with software being developed at the University of Amsterdam. The team in Leiden has already begun preparing the data, with about 6,000 lemmas approved so far. The ultimate goal is to produce both a print version and an online open-access version from the same source documents. The software needed for this was developed in a project funded by CLARIN-NL.&lt;br /&gt;
&lt;br /&gt;
'''Migrator'''&lt;br /&gt;
&lt;br /&gt;
For the production of lemmas we have implemented an advanced workflow. The (generally non-technical) users create lemmas using MS Word, which is both familiar and easy to use. We have developed a custom software module that carefully migrates the Word documents into deeply structured XML by analyzing the structure and semantics of the lemmas, falling back on heuristics in ambiguous cases. Although we initially envisioned the oXygen XML Author component as the main tool for creating new lemmas, the migrator module produced such good results that we decided to keep MS Word as the primary composition tool. The main advantage is that the editors are much more familiar with Word than with any other WYSIWYG editor. Lemmas that have been migrated to XML are stored in an XML database and can be further edited using oXygen XML Author.&lt;br /&gt;
&lt;br /&gt;
'''Lemmatizer'''&lt;br /&gt;
&lt;br /&gt;
Greek morphology is complicated. To use a dictionary effectively, a user needs a rather high level of language competence to relate a word form found in a text to the basic lemma form under which its definition appears. Using a Greek morphological database, we have been able to facilitate the search for lemmas. A ‘lemmatizer’ module gives the possible parsings of a word form and the lemmas it can be derived from. This enables the user to type in the word as found in the text and be redirected to the correct lemma.&lt;br /&gt;
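&lt;br /&gt;
In code, the lemmatizer amounts to a lookup from surface forms to candidate lemmas. A minimal sketch in Python, with a tiny hypothetical in-memory table standing in for the full morphological database:&lt;br /&gt;

```python
# Minimal sketch of a lemmatizer lookup. The MORPHOLOGY table is a
# hypothetical stand-in for the full morphological database.
MORPHOLOGY = {
    # surface form: list of (lemma, parse) pairs -- illustrative only
    "λόγου": [("λόγος", "noun gen sg masc")],
    "ἦν": [("εἰμί", "verb 3rd sg impf ind act")],
}

def lemmatize(form):
    """Return the candidate lemmas a word form may be derived from."""
    return sorted({lemma for lemma, _parse in MORPHOLOGY.get(form, [])})
```

A search front end can then redirect a query for an inflected form to the dictionary entry for each candidate lemma.&lt;br /&gt;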
&lt;br /&gt;
'''Visualization'''&lt;br /&gt;
&lt;br /&gt;
For the online dictionary we have implemented a visualization module that allows the user to view multiple lemmas at once. The module is built with the JavaScript framework MooTools. The result is a viewer that performs well and is backed by maintainable JavaScript code.&lt;br /&gt;
&lt;br /&gt;
The online dictionary is still under development; have a look at http://www.woordenboekgrieks.nl/ for the beta version. A newer test version with additional features can be found at http://angel.ic.uva.nl:8600/.&lt;br /&gt;
&lt;br /&gt;
'''Credits'''&lt;br /&gt;
&lt;br /&gt;
* construction of the dictionary: Prof. Ineke Sluiter, Classics department of Leiden University; Prof. Albert Rijksbaron, University of Amsterdam&lt;br /&gt;
* publisher of the dictionary: Amsterdam University Press&lt;br /&gt;
* design/typesetting dictionary: TaT Zetwerk (http://www.tatzetwerk.nl/)&lt;br /&gt;
* software development: Digital Production Center, University Library, University of Amsterdam&lt;br /&gt;
* project funding: CLARIN-NL (http://www.clarin.nl/)&lt;br /&gt;
* morphological database for use by the lemmatizer: courtesy of Prof. Helma Dik, University of Chicago (based on data of the Perseus Project)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Using Drupal to drive alternative presentation systems ==&lt;br /&gt;
 &lt;br /&gt;
* [[User:Highermath|Cary Gordon]], The Cherry Hill Company, cgordon@chillco.com&lt;br /&gt;
&lt;br /&gt;
Recently, we have been building systems that use angular.js, Rails, or other systems for presentation, while leveraging Drupal's sophisticated content management capabilities on the back end.&lt;br /&gt;
&lt;br /&gt;
So far, these have been one-way systems, but as we move to Drupal 8 we are beginning to explore ways to further decouple the presentation and CMS functions.&lt;br /&gt;
&lt;br /&gt;
== A Book, a Web Browser and a Tablet: How Bibliotheca Alexandrina's Book Viewer Framework Makes It Possible ==&lt;br /&gt;
 &lt;br /&gt;
* [[User:Mohammed.abuouda|Mohammed Abu ouda]], Bibliotheca Alexandrina (The new Library of Alexandria)&lt;br /&gt;
&lt;br /&gt;
Many institutions around the world are engaged in digitization projects aiming to preserve the human knowledge held in books and make it available through multiple channels to people around the globe. These efforts will surely help close the digital gap, particularly with the arrival of affordable e-readers, mobile phones and network coverage. However, the digital reading experience has not yet reached its full potential. Many readers miss features they love in their good old books and wish to find them in their digital counterparts. In an attempt to create a unique digital reading experience, Bibliotheca Alexandrina (BA) created a flexible book-viewing framework that is currently used to access its collection of more than 300,000 digital books in five different languages, which includes the largest collection of digitized Arabic books.&lt;br /&gt;
&lt;br /&gt;
Using open source tools, BA used the framework to develop a modular book viewer that can be deployed in different environments and is currently at the heart of various BA projects. The book viewer provides several features that create a more natural reading experience. As with physical books, readers can now personalize the books they read by adding annotations such as highlights, underlines and sticky notes to capture their thoughts and ideas, in addition to being able to share books with friends on social networks. The reader can perform a search across the content of the book, receiving highlighted search results within its pages. More features can be added to the book viewer through its plugin architecture.&lt;br /&gt;
&lt;br /&gt;
== Structured data NOW: seeding schema.org in library systems ==&lt;br /&gt;
 &lt;br /&gt;
* [http://coffeecode.net Dan Scott], Laurentian University&lt;br /&gt;
** Previous code4lib presentations: [https://archive.org/details/code4lib.conf.2008.pres.CouchDBsacrilege CouchDB is sacrilege... mmm, delicious sacrilege] at Code4Lib 2008&lt;br /&gt;
&lt;br /&gt;
The semantic web, linked data, and structured data are all fantastic ideas, but implementation constraints impose a barrier. No matter how enthusiastic a given library might be about publishing structured data, if its system does not allow customizations or the institution lacks skilled human resources, it will not happen. However, if the software in use simply publishes structured data by default, then the web will be populated for free. Really! No extra resources necessary.&lt;br /&gt;
&lt;br /&gt;
This presentation highlights Dan's work with systems such as Evergreen, Koha, and VuFind to enable the publication of schema.org structured data out of the box. Along the way, we reflect on the current state of the W3C Schema.org Bibliographic Extension community group's efforts to shape the evolution of the schema.org vocabulary. Finally, hold on tight as we contemplate next steps and the possibilities of a world where structured data is the norm on the web.&lt;br /&gt;
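&lt;br /&gt;
As a sketch of what "structured data by default" can look like, a record-display template might emit schema.org JSON-LD alongside its HTML. The field names below follow the schema.org Book type; the sample record itself is invented:&lt;br /&gt;

```python
import json

# Sketch: serialize a catalogue record as schema.org JSON-LD.
# The sample record is invented for illustration.
def book_to_jsonld(record):
    data = {
        "@context": "http://schema.org",
        "@type": "Book",
        "name": record["title"],
        "author": {"@type": "Person", "name": record["author"]},
    }
    if "isbn" in record:
        data["isbn"] = record["isbn"]
    return data

record = {"title": "An Example Title", "author": "A. Author", "isbn": "0000000000"}
jsonld = json.dumps(book_to_jsonld(record), indent=2)
```

Dropping a block like this into every record page is exactly the kind of default that populates the web without any per-library effort.&lt;br /&gt;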
&lt;br /&gt;
== Towards Pasta Code Nirvana: Using JavaScript MVC to Fill Your Programming Ravioli ==&lt;br /&gt;
&lt;br /&gt;
* Bret Davidson, North Carolina State University Libraries, bret_davidson@ncsu.edu&lt;br /&gt;
** Previous Code4Lib Presentations: [http://wiki.code4lib.org/index.php/2013_talks_proposals#Data-Driven_Documents:_Visualizing_library_data_with_D3.js Visualizing library data with D3.js] at Code4Lib 2013&lt;br /&gt;
&lt;br /&gt;
JavaScript MVC frameworks are ushering in a golden age of robust and responsive web applications that take advantage of evergreen browsers, performant JS engines, and the unprecedented reach provided by billions of personal computing devices. The web browser has emerged as the world’s most popular application runtime, and the complexity[1] and scope of JavaScript applications have exploded accordingly. Server-side web frameworks like Rails and Django have helped developers adhere to best practices like modularity, dependency injection, and unit testing for years, practices that are now being applied to JavaScript development through projects like Backbone[2], Ember[3], and Angular[4].&lt;br /&gt;
&lt;br /&gt;
This talk will discuss the issues JavaScript MVC frameworks are trying to solve, common features like data binding, implications for the future of web development[5], and the appropriateness of JavaScript MVC for library applications.&lt;br /&gt;
&lt;br /&gt;
*[1]http://en.wikipedia.org/wiki/Spaghetti_code&lt;br /&gt;
*[2]http://backbonejs.org&lt;br /&gt;
*[3]http://emberjs.com&lt;br /&gt;
*[4]http://angularjs.org&lt;br /&gt;
*[5]http://tomdale.net/2013/09/progressive-enhancement-is-dead/&lt;br /&gt;
&lt;br /&gt;
== WebSockets for Real-Time and Interactive Interfaces ==&lt;br /&gt;
&lt;br /&gt;
* [http://ronallo.com Jason Ronallo], NCSU Libraries, jason_ronallo@ncsu.edu&lt;br /&gt;
&lt;br /&gt;
Previous Code4Lib presentations:&lt;br /&gt;
* [http://code4lib.org/conference/2012/ronallo HTML5 Microdata and Schema.org] 2012&lt;br /&gt;
* [http://code4lib.org/conference/2013/ronallo HTML5 Video Now!] 2013&lt;br /&gt;
&lt;br /&gt;
Watching the Google Analytics Real-Time dashboard for the first time was mesmerizing. As soon as someone visited a site, I could see what page they were on. For a digital collections site with a lot of images, it was fun to see what visitors were looking at. But getting from Google Analytics to the image or other content currently being viewed was cumbersome. The real-time experience was something I wanted to share with others. I'll show you how I used a WebSocket service to create a real-time interface to digital collections.&lt;br /&gt;
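&lt;br /&gt;
The core of such a service is small: each page view becomes a message pushed to every connected client. A sketch of the message-building side, with field names that are illustrative assumptions rather than the actual protocol:&lt;br /&gt;

```python
import json

# Sketch: turn a page-view event into the JSON message a WebSocket
# server would broadcast to connected viewers. The field names are
# assumptions for illustration, not an actual service's protocol.
def view_event_message(item_id, title, thumbnail_url):
    return json.dumps({
        "event": "item-view",
        "id": item_id,
        "title": title,
        "thumbnail": thumbnail_url,
    })
```

A browser client subscribed to the socket can then swap in the thumbnail of whatever item was just viewed, in real time.&lt;br /&gt;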
&lt;br /&gt;
In the Hunt Library at NCSU we have some large video walls. I wanted to make HTML-based exhibits that featured viewer interactions. I'll show you how I converted Listen to Wikipedia [1] into a bring-your-own-device interactive exhibit. With WebSockets, any HTML page can be remote-controlled by any internet-connected device.&lt;br /&gt;
&lt;br /&gt;
I will attempt to include real-time audience participation.&lt;br /&gt;
&lt;br /&gt;
[1] http://listen.hatnote.com/&lt;br /&gt;
&lt;br /&gt;
== Rapid Development of Automated Tasks with the File Analyzer ==&lt;br /&gt;
&lt;br /&gt;
* Terry Brady, Georgetown University Libraries, twb27@georgetown.edu&lt;br /&gt;
&lt;br /&gt;
The Georgetown University Libraries have customized the File Analyzer and Metadata Harvester application (https://github.com/Georgetown-University-Libraries/File-Analyzer) to solve a number of library automation challenges:&lt;br /&gt;
* validating digitized and reformatted files&lt;br /&gt;
* validating vendor statistics for COUNTER compliance&lt;br /&gt;
* preparing collections of digital files for archiving and ingest&lt;br /&gt;
* manipulating ILS import and export files&lt;br /&gt;
&lt;br /&gt;
The File Analyzer application was used by the US National Archives to validate 3.5 million digitized images from the 1940 Census.  After implementing a customized ingest workflow within the File Analyzer, the Georgetown University Libraries were able to process an ingest backlog of over a thousand files of digital resources into DigitalGeorgetown, the Libraries’ Digital Collections and Institutional Repository platform.  Georgetown is currently developing customized workflows that integrate Apache Tika, BagIt, and MARC conversion utilities.&lt;br /&gt;
&lt;br /&gt;
The File Analyzer is a desktop application with a powerful framework for implementing customized file validation and transformation rules.  As new rules are deployed, they are presented to users within an easy-to-use (and powerful) interface.&lt;br /&gt;
&lt;br /&gt;
Learn about the functionality available for download, see how you can use this tool to automate workflows ranging from digital collections to ILS ingests to electronic resource statistics, and discuss opportunities to collaborate on enhancements to this application!&lt;br /&gt;
&lt;br /&gt;
== GeoHydra: How to Build a Geospatial Digital Library with Fedora ==&lt;br /&gt;
 &lt;br /&gt;
* [http://stanford.edu/~drh Darren Hardy], Stanford University, drh@stanford.edu&lt;br /&gt;
&lt;br /&gt;
Geographically rich data are exploding, daunting those trying to integrate them&lt;br /&gt;
into existing digital library infrastructures.&lt;br /&gt;
Building a spatial data infrastructure that integrates with your digital&lt;br /&gt;
library infrastructure need not be a daunting task. We have successfully&lt;br /&gt;
deployed a geospatial digital library infrastructure using Fedora and&lt;br /&gt;
open-source geospatial software [1]. We'll discuss the primary design&lt;br /&gt;
decisions and technologies that led to a production deployment within a few&lt;br /&gt;
months. Briefly, our architecture revolves around discovery, delivery, and&lt;br /&gt;
metadata pipelines using open-source OpenGeoPortal [2], Solr [3], GeoServer&lt;br /&gt;
[4], PostGIS [5], and GeoNetwork [6] technologies, plus the proprietary ESRI&lt;br /&gt;
ArcMap [7] -- the GIS industry's workhorse. Finally, we'll discuss the key&lt;br /&gt;
skillsets needed to build and maintain a spatial data infrastructure.&lt;br /&gt;
&lt;br /&gt;
[1] http://foss4g.org&lt;br /&gt;
[2] http://opengeoportal.org&lt;br /&gt;
[3] http://lucene.apache.org/solr&lt;br /&gt;
[4] http://geoserver.org&lt;br /&gt;
[5] http://postgis.net&lt;br /&gt;
[6] http://geonetwork-opensource.org&lt;br /&gt;
[7] http://esri.com&lt;br /&gt;
&lt;br /&gt;
==Under the Hood of Hadoop Processing at OCLC Research ==&lt;br /&gt;
&lt;br /&gt;
[http://roytennant.com/ Roy Tennant]&lt;br /&gt;
&lt;br /&gt;
* Previous Code4Lib presentations: 2006: &amp;quot;The Case for Code4Lib 501c(3)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
[http://hadoop.apache.org/ Apache Hadoop], inspired by Google's MapReduce, is widely used by Yahoo! and many others to process massive amounts of data quickly. OCLC Research uses a 40-node compute cluster with Hadoop and HBase to process the 300 million MARC records of WorldCat in various ways. This presentation will explain how Hadoop MapReduce works and illustrate it with specific examples and code. The role of the JobTracker in both monitoring and reporting on processes will be explained. String searching WorldCat will also be demonstrated live.&lt;br /&gt;
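&lt;br /&gt;
For readers new to the model, the shape of a MapReduce job fits in a few lines. This single-process Python toy (counting an invented field across records) only illustrates the pattern; Hadoop's value lies in distributing the same two phases across the cluster:&lt;br /&gt;

```python
from collections import defaultdict

# Toy single-process MapReduce: count records per language code.
# Hadoop runs these same two phases distributed over many nodes.
def map_phase(records):
    for record in records:
        yield (record["language"], 1)   # emit (key, value) pairs

def reduce_phase(pairs):
    totals = defaultdict(int)
    for key, value in pairs:            # group by key and sum values
        totals[key] += value
    return dict(totals)

records = [{"language": "eng"}, {"language": "ger"}, {"language": "eng"}]
counts = reduce_phase(map_phase(records))
```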
&lt;br /&gt;
== Quick and Easy Data Visualization with Google Visualization API and Google Chart Libraries ==&lt;br /&gt;
 &lt;br /&gt;
[http://bohyunkim.net/blog Bohyun Kim], Florida International University, bohyun.kim@fiu.edu&lt;br /&gt;
* No previous Code4Lib presentations &lt;br /&gt;
&lt;br /&gt;
Does most of the data that your library collects stay in spreadsheets or get published as static tables of boring numbers? Do your library stakeholders spend more time collecting the data than using it as a decision-making tool because the data is presented in a way that makes it hard for them to quickly grasp its significance?&lt;br /&gt;
&lt;br /&gt;
This talk will provide an overview of the [http://developers.google.com/chart/interactive/docs/reference Google Visualization API] and [http://developers.google.com/chart/ Google Chart Libraries] to get you started quickly querying and visualizing your library data from remote data sources (e.g. a Google Spreadsheet or your own database) with (or without) cool-looking user controls, animation effects, and even a dashboard.&lt;br /&gt;
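&lt;br /&gt;
The charts read their data from a DataTable, which a remote data source can serve as JSON. A sketch of building that structure server-side, with invented column names and figures:&lt;br /&gt;

```python
import json

# Sketch: build the DataTable JSON structure that Google Visualization
# API charts consume. The column names and numbers are invented.
def to_datatable(rows):
    return {
        "cols": [
            {"id": "month", "label": "Month", "type": "string"},
            {"id": "visits", "label": "Visits", "type": "number"},
        ],
        "rows": [{"c": [{"v": month}, {"v": visits}]} for month, visits in rows],
    }

table_json = json.dumps(to_datatable([("Jan", 12040), ("Feb", 11310)]))
```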
&lt;br /&gt;
== Leap Motion + Rare Books: A hands-free way to view and interact with rare books in 3D ==&lt;br /&gt;
 &lt;br /&gt;
[http://www.youtube.com/user/jpdenzer Juan Denzer], Binghamton University, jdenzer@binghamton.edu&lt;br /&gt;
* No previous Code4Lib presentations &lt;br /&gt;
&lt;br /&gt;
As rare books become more delicate over time, making them available to the public becomes harder.  We at Binghamton University Library have developed an application that makes it easier to view rare books without ever having to touch them.  We have combined the Leap Motion hands-free device and 3D rendered models to create a new virtual experience for the viewer.&lt;br /&gt;
&lt;br /&gt;
The application allows the user to rotate and zoom in on a 3D representation of a rare book.  The user is also able to ‘open’ the virtual book and flip through it using a natural user interface, such as swiping a hand left or right to turn the page.&lt;br /&gt;
&lt;br /&gt;
The application is built on the .NET framework and is written in C#.  3D models are created using simple 3D software such as SketchUp or Blender.  Scans of the book cover and spine are created using simple flatbed scanners.  The inside pages are scanned using overhead scanners. &lt;br /&gt;
&lt;br /&gt;
This talk will discuss the technologies used in developing the application and how virtually any library could implement it with little to no coding. The presentation will include a demonstration of the software and a chance for audience members to experience the Rare Book Leap Motion App themselves.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Course Reserves Unleashed! ==&lt;br /&gt;
 &lt;br /&gt;
* Bobbi Fox, Library Technology Services, Harvard University, bobbi_fox@harvard.edu&lt;br /&gt;
* Gloria Korsman, Andover-Harvard Theological Library&lt;br /&gt;
** No previous Code4Lib presentations &lt;br /&gt;
&lt;br /&gt;
Hey kids!  Remember when SOAP was used for something other than washing?  Our sophisticated (and highly functional) Course Reserves Request system does!&lt;br /&gt;
&lt;br /&gt;
However, while the system is great for submitting and processing course reserve requests, the student-facing presentation through Harvard’s home-grown -- and soon to be replaced -- LMS leaves a lot to be desired.  &lt;br /&gt;
&lt;br /&gt;
Follow along as we leverage Solr 4 as a No-SQL database, along with more progressive RESTful API techniques, to release Reserves data into the wild without interfering with reserves request processing -- and, in the process, open up the opportunity for other schools to feed their data in as well.&lt;br /&gt;
&lt;br /&gt;
== We Are All Disabled! Universal Web Design Making Web Services Accessible for Everyone ==&lt;br /&gt;
 &lt;br /&gt;
Cynthia Ng, Accessibility Librarian, CILS at Langara College&lt;br /&gt;
* No previous Code4Lib presentations (not counting lightning talks)&lt;br /&gt;
&lt;br /&gt;
We’re building and improving tools and services all the time, but do you develop only for the “average” user, adding things for “disabled” users as an afterthought? We all use “assistive” technology, accessing information in a multitude of ways on different platforms, devices, etc. Let’s focus on providing web services that are accessible to everyone without being onerous or ugly. The aim is to get you thinking about what you can do to make web-based services and content more accessible for all, from the beginning or with small amounts of effort, whether you're a developer or not.&lt;br /&gt;
&lt;br /&gt;
The goal of the presentation is to provide both developers and content creators with information on simple, practical ways to make web content and web services more accessible. Rather than thinking about putting in extra effort or making adjustments for those with disabilities, I want to help people think about how to make their websites more accessible for all users through universal web design.&lt;br /&gt;
&lt;br /&gt;
== Personalize your Google Analytics Data with Custom Events and Variables ==&lt;br /&gt;
&lt;br /&gt;
[http://joshwilson.net Josh Wilson], Systems Integration Librarian, State Library of North Carolina - joshwilsonnc@gmail.com&lt;br /&gt;
&lt;br /&gt;
At the State Library of North Carolina, we had more specific questions about the use of our digital collections than standard GA could provide. A few implementations of custom events and custom variables later, we have our answers.&lt;br /&gt;
&lt;br /&gt;
I'll demonstrate how these analytics add-ons work, and why implementation can sometimes be more complicated than just adding a few lines of JavaScript to your ga.js. I'll discuss some specific examples in use at the SLNC:&lt;br /&gt;
&lt;br /&gt;
* Capturing the content of specific metadata fields in CONTENTdm as Custom Events &lt;br /&gt;
* Recording Drupal taxonomy terms as Custom Variables&lt;br /&gt;
&lt;br /&gt;
In both instances, this data deepened our understanding of how our sites and collections were being used, and in turn, we were able to report usage more accurately to content contributors and other stakeholders.&lt;br /&gt;
&lt;br /&gt;
More on: [https://developers.google.com/analytics/devguides/collection/gajs/eventTrackerGuide GA Custom Events] | [https://developers.google.com/analytics/devguides/collection/gajs/gaTrackingCustomVariables GA Custom Variables]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Behold Fedora 4: The Incredible Shrinking Repository! ==&lt;br /&gt;
&lt;br /&gt;
Esmé Cowles, UC San Diego Library.  Previous talk: [http://code4lib.org/conference/2013/cowles-critchlow-westbrook All Teh Metadatas Re-Revisited] (2013)&lt;br /&gt;
&lt;br /&gt;
* One repository contains untold numbers of digital objects and powers many Hydra and Islandora apps&lt;br /&gt;
* It speaks RDF, but contains no triplestore! (triplestores sold separately, SPARQL Update may be involved, some restrictions apply)&lt;br /&gt;
* Flexible enough to tie itself in knots implementing storage and access control policies&lt;br /&gt;
* Witness feats of strength and scalability, with dramatically increased performance and clustering&lt;br /&gt;
* Plumb the depths of bottomless hierarchies, and marvel at the metadata woven into the very fabric of the repository&lt;br /&gt;
* Ponder the paradox of ingesting large files by not ingesting them&lt;br /&gt;
* Be amazed as Fedora 4 swallows other systems whole (including Fedora 3 repositories)&lt;br /&gt;
* Watch novice developers set up Fedora 4 from scratch, with just a handful of incantations to Git and Maven&lt;br /&gt;
&lt;br /&gt;
The Fedora Commons Repository is the foundation of many digital collections, e-research, digital library, archives, digital preservation, institutional repository and open access publishing systems.  This talk will focus on how Fedora 4 improves core repository functionality, adds new features, maintains backwards compatibility, and addresses the shortcomings of Fedora 3.&lt;br /&gt;
&lt;br /&gt;
== Organic Free-Range API Development - Making Web Services That You Will Actually Want to Consume ==&lt;br /&gt;
&lt;br /&gt;
Steve Meyer and Karen Coombs, OCLC&lt;br /&gt;
&lt;br /&gt;
Building web services can bring great benefits by making data and functionality reusable. Underpinning your applications with a web service allows you to write code once and support multiple environments: your library's web app, mobile applications, the embedded widget in your campus portal. However, building a web service is its own kind of artful programming. Doing it well requires attention to many of the same techniques and requirements as building web applications, though with different outcomes. &lt;br /&gt;
&lt;br /&gt;
So what are the usability principles for web services? How do you build a web service that you (and others) will actually want to use? In this talk, we’ll share some of the lessons learned - the good, the bad, and the ugly - through OCLC's work on the WorldCat Metadata API. This web service is a sophisticated API that provides external clients with read and write access to WorldCat data. It provides a model to help aspiring API creators navigate the potential complications of crafting a web service. We'll cover:&lt;br /&gt;
&lt;br /&gt;
* Loose coupling of data assets and resource-oriented data modeling at the core&lt;br /&gt;
* Coding to standards vs. exposure of an internal data model&lt;br /&gt;
* Authentication and security for web services: API Keys, Digital Signing, OAuth Flows&lt;br /&gt;
* Building web services that behave as a suite so it looks like the left hand knows what the right hand is doing&lt;br /&gt;
&lt;br /&gt;
So at the end of the day, your team will know your API is a very good egg after all. &lt;br /&gt;
&lt;br /&gt;
If accepted, the presenters intend to produce and share a Quick Guide for building a web service that will reflect content presented in the talk.&lt;br /&gt;
&lt;br /&gt;
== Lucene's Latest (for Libraries) ==&lt;br /&gt;
&lt;br /&gt;
erik.hatcher@lucidworks.com&lt;br /&gt;
&lt;br /&gt;
Lucene powers the search capabilities of practically all library discovery platforms, by way of Solr and similar layers.  The Lucene project evolves rapidly, and it's a full-time job to keep up with its ever-improving features and scalability.  This talk will distill and showcase the most relevant(!) advancements to date.&lt;br /&gt;
&lt;br /&gt;
== The Why and How of Very Large Displays in Libraries. ==&lt;br /&gt;
&lt;br /&gt;
* Cory Lown, NCSU Libraries, cwlown@ncsu.edu&lt;br /&gt;
&lt;br /&gt;
Previous Code4Lib Presentations:&lt;br /&gt;
* [http://code4lib.org/conference/2012/lown How People Search the Library from a Single Search Box]  2012&lt;br /&gt;
* [http://code4lib.org/conference/2010/orphanides_lown_lynema Enhancing Discoverability with Virtual Shelf Browse] 2010&lt;br /&gt;
&lt;br /&gt;
Built into the walls of NC State's new Hunt Library are several [http://www.christiedigital.com/en-us/digital-signage/products/microtiles/pages/microtiles-digital-signage-video-wall.aspx Christie MicroTile Display Wall Systems]. What does a library do with a display that's seven feet tall and over twenty feet wide? I'll talk about why libraries might want large displays like this, what we're doing with them right now, and what we might do with them in the future. I'll talk about how these displays factor into planning for new and existing web projects. And I'll get into the fun details of how you build web applications that scale from the very small browser window on a phone all the way up to a browser window with about 14 million pixels (about 10 million more than a dual 24&amp;quot; monitor desktop setup).&lt;br /&gt;
&lt;br /&gt;
== Discovering your Discovery System in Real Time. ==&lt;br /&gt;
&lt;br /&gt;
* Godmar Back, Virginia Tech, gback@vt.edu&lt;br /&gt;
* Annette Bailey, Virginia Tech, afbailey@vt.edu&lt;br /&gt;
&lt;br /&gt;
Practically all libraries today provide web-based discovery systems to their users;&lt;br /&gt;
users discover items and peruse or check them out by clicking on links.  Unlike&lt;br /&gt;
the traditional transaction of checking out a book at the circulation desk, this&lt;br /&gt;
interaction is largely invisible.  We have built a system that records users'&lt;br /&gt;
interactions with Summon in real time, processes the resulting data with minimal delay,&lt;br /&gt;
and visualizes it in various ways using Google Charts and various d3.js modules,&lt;br /&gt;
such as word clouds, tree maps, and others.&lt;br /&gt;
&lt;br /&gt;
These visualizations can be embedded in web sites, but are also suitable for&lt;br /&gt;
projection via large-scale displays or projectors right into the 'Learning Spaces'&lt;br /&gt;
many libraries are being converted into.  The goal of this talk is to share the&lt;br /&gt;
technology and advocate building a cloud-based infrastructure that would make this&lt;br /&gt;
technology available to any library that uses a discovery system, rather than just&lt;br /&gt;
those with the technological prowess to develop such systems and&lt;br /&gt;
visualizations in-house.  &lt;br /&gt;
&lt;br /&gt;
Previous presentations at Code4Lib:&lt;br /&gt;
* Talk: Code4Lib 2009 [http://code4lib.org/files/LibX2.0-Code4Lib-2009AsPresented.ppt LibX 2.0]&lt;br /&gt;
* Preconference: [http://wiki.code4lib.org/index.php/LibX_Preconference LibX 2.0, 2009]&lt;br /&gt;
* Preconference: Code4Lib 2010, On Widgets and Web Services&lt;br /&gt;
&lt;br /&gt;
== Your Library, Anywhere: A Modern, Responsive Library Catalogue at University of Toronto Libraries ==&lt;br /&gt;
&lt;br /&gt;
* Bilal Khalid, Gordon Belray, Lisa Gayhart (lisa.gayhart@utoronto.ca)&lt;br /&gt;
&lt;br /&gt;
* No previous Code4Lib presentations&lt;br /&gt;
&lt;br /&gt;
With the recent surge in the mobile device market and an ever-expanding patron base with increasingly divergent levels of technical ability, the University of Toronto Libraries embarked on the development of a new catalogue discovery layer to fit the needs of its diverse users. &lt;br /&gt;
&lt;br /&gt;
[http://search.library.utoronto.ca The result]: a mobile-friendly, flexible and intuitive web application that brings the full power of a faceted library catalogue to users without compromising quality or performance, employing Responsive Web Design principles. This talk will discuss: application development; service improvements; interface design; and user outreach, testing, and project communications. Feedback and questions from the audience are very welcome. If time runs short, we will be available for questions and conversation after the presentation.&lt;br /&gt;
&lt;br /&gt;
(Note: A version of this content has been provisionally accepted as an article for Code4Lib Journal, January 2014 publication.)&lt;br /&gt;
&lt;br /&gt;
== All Tiled Up ==&lt;br /&gt;
&lt;br /&gt;
* Mike Graves, MIT Libraries (mgraves@mit.edu)&lt;br /&gt;
&lt;br /&gt;
You've got maps. You even scanned and georeferenced them. Now what? Running a full GIS stack can be expensive, and overkill in some cases. The good news is that you have a lot more options now than you did just a few years ago. I'd like to present some lighter weight solutions to making georeferenced images available on the Web.&lt;br /&gt;
&lt;br /&gt;
This talk will provide an introduction to MBTiles. I'll go over what they are, how you create them, how you use them and why you would use them.&lt;br /&gt;
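&lt;br /&gt;
Part of what makes MBTiles lightweight is that it is just an SQLite file with a small fixed schema. A sketch of the two core tables from the spec, using an in-memory database and a fake tile blob:&lt;br /&gt;

```python
import sqlite3

# Sketch: the core MBTiles schema -- a metadata table plus a tiles
# table keyed by zoom/column/row. The tile blob here is fake.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE metadata (name TEXT, value TEXT)")
db.execute("CREATE TABLE tiles (zoom_level INTEGER, tile_column INTEGER, "
           "tile_row INTEGER, tile_data BLOB)")
db.execute("INSERT INTO metadata VALUES ('name', 'Scanned map, georeferenced')")
db.execute("INSERT INTO tiles VALUES (0, 0, 0, ?)", (b"\x89PNG fake tile",))

# A minimal tile server answers /z/x/y requests with one indexed lookup.
tile, = db.execute(
    "SELECT tile_data FROM tiles WHERE zoom_level=? AND tile_column=? "
    "AND tile_row=?", (0, 0, 0)).fetchone()
```

Because the whole tileset is one file, it can be copied, versioned, and served without running a full GIS stack.&lt;br /&gt;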
&lt;br /&gt;
== The Great War: Image Interoperability to Facebook ==&lt;br /&gt;
&lt;br /&gt;
* Rob Sanderson, Los Alamos National Laboratory (azaroth42@gmail.com)&lt;br /&gt;
** (Code4Lib 2006: [http://www.code4lib.org/2006/sanderson Library Text Mining])&lt;br /&gt;
* Rob Warren, Carleton University&lt;br /&gt;
** No previous presentations&lt;br /&gt;
&lt;br /&gt;
Using a pipeline constructed from Linked Open Data and other interoperability specifications, it is possible to merge and re-use image and textual data from distributed library collections to build new, useful tools and applications.  Starting with the OAI-PMH interface to CONTENTdm, we will take you on a tour through the International Image Interoperability Framework and Shared Canvas, to a cross-institutional viewer, and on to image analysis aimed at building a historical Facebook by finding and tagging people in photographs.  The World War One collections are drawn from multiple institutions and merged by the machine learning code.&lt;br /&gt;
&lt;br /&gt;
The presentation will focus on the (open source) toolchain and the benefits of using standards throughout:  OAI-PMH to get the metadata, IIIF for interaction with the images, the Shared Canvas ontology for describing collections of digitized objects, Open Annotation for tagging things in the images, and specialized ontologies specific to the contents.  The tools include standard RDF / OWL technologies, JSON-LD, ImageMagick, and OpenCV for image analysis.&lt;br /&gt;
&lt;br /&gt;
== Visualizing Solr Search Results with D3.js for User-Friendly Navigation of Large Results Sets ==&lt;br /&gt;
&lt;br /&gt;
*Julia Bauder, Grinnell College Libraries (bauderj-at-grinnell-dot-edu)&lt;br /&gt;
*No previous presentations at national Code4Lib conferences&lt;br /&gt;
&lt;br /&gt;
As the corpus of articles, books, and other resources searched by discovery systems continues to get bigger, searchers are more and more frequently confronted with unmanageably large numbers of results. How can we help users make sense of 10,000 hits and find the ones they actually want? Facets help, but making sense of a gigantic sidebar of facets is not an easy task for users, either.&lt;br /&gt;
During this talk, I will explain how we will soon be using Solr 4’s pivot queries and hierarchical visualizations (e.g., treemaps) from D3.js to let patrons view and manipulate search results. We will be doing this with our VuFind 2.0 catalog, but this technique will work with any system running Solr 4. I will also talk about early student reaction to our tests of these visualization features.&lt;br /&gt;
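&lt;br /&gt;
As a sketch of the glue involved: Solr's facet.pivot output is already a tree, and reshaping it into the nested {name, children/value} form that D3's hierarchical layouts (treemaps included) expect is a small recursive transform. The field names and counts below are invented:&lt;br /&gt;
&lt;br /&gt;

```python
def pivot_to_hierarchy(pivots, root="results"):
    """Convert a Solr 4 facet.pivot response fragment into the nested
    structure that D3 hierarchical layouts expect."""
    def node(p):
        n = {"name": str(p["value"])}
        if p.get("pivot"):
            n["children"] = [node(c) for c in p["pivot"]]
        else:
            n["value"] = p["count"]
        return n
    return {"name": root, "children": [node(p) for p in pivots]}

# Shape as returned under facet_counts/facet_pivot for "format,era":
sample = [{"field": "format", "value": "Books", "count": 120,
           "pivot": [{"field": "era", "value": "1800s", "count": 80},
                     {"field": "era", "value": "1900s", "count": 40}]}]
tree = pivot_to_hierarchy(sample)
```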
&lt;br /&gt;
== PeerLibrary – open source cloud based collaborative library ==&lt;br /&gt;
&lt;br /&gt;
[https://github.com/peerlibrary/peerlibrary PeerLibrary is a new open source project] and cloud service providing collaborative reading, sharing and storing of publications. Users can upload publications they want to read (currently in PDF format), read them in the browser in real time with others, highlight, annotate, and organize their own or a collaborative library. PeerLibrary provides a search engine over all uploaded open access publications. Additionally, it aims to collaboratively aggregate an open layer of knowledge on top of these publications through the public annotations and references users add to them. In this way publications will not just be available to read, but accessible to the general public as well. Currently, it is aimed at the scientific community and scientific publications.&lt;br /&gt;
&lt;br /&gt;
See [http://blog.peerlibrary.org/post/63458789185/screencast-previewing-the-peerlibrary-project screencast here].&lt;br /&gt;
&lt;br /&gt;
It is still in development and beta launch is planned at the end of November.&lt;br /&gt;
&lt;br /&gt;
== Who was where when, or finding biographical articles on Wikipedia by place and time ==&lt;br /&gt;
&lt;br /&gt;
* [http://morton-owens.info Emily Morton-Owens], The Seattle Public Library (presenting on work from NYU)&lt;br /&gt;
* No previous c4l presentations&lt;br /&gt;
&lt;br /&gt;
It's easy to answer the question &amp;quot;What important people were in Paris in 1939?&amp;quot; But what about Virginia in the 1750s or Scandinavia in the 14th century? I created a tool that allows you to search for biographies in a generally applicable way, using a map interface. I would like to present updates to my thesis project, which combines a crawler written in Java that extracts information from Wikipedia articles, with a MongoDB data store and a frontend in Python.&lt;br /&gt;
&lt;br /&gt;
The input to the project is the free text of entire articles in Wikipedia; this is important to allow us to pick up Benjamin Franklin not just in the single most obvious place of Philadelphia but also in London, Paris, Boston, etc. I can talk about my experiments disambiguating place names (approaches pioneered on newspaper articles were actually unhelpful on this type of text) and setting up a processing queue that does not become mired in the biographies of every human who ever played soccer. I also want to revisit some of the implementation choices I made under my academic deadline and improve the accuracy/usability.&lt;br /&gt;
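&lt;br /&gt;
A drastically simplified version of the underlying heuristic -- co-occurrence of a gazetteer place name and a year within one sentence -- might look like this. The three-city gazetteer and the regexes are illustrative only; real place-name disambiguation is the hard part the talk covers:&lt;br /&gt;
&lt;br /&gt;

```python
import re

PLACES = {"Paris", "London", "Philadelphia"}  # stand-in gazetteer

def place_year_pairs(text):
    """Emit (place, year) pairs for every sentence that mentions a
    gazetteer place name alongside a plausible four-digit year."""
    pairs = set()
    for sentence in re.split(r"[.!?]\s+", text):
        years = [int(y) for y in re.findall(r"\b(1[0-9]{3}|20[0-9]{2})\b", sentence)]
        for place in PLACES:
            if place in sentence:
                pairs.update((place, y) for y in years)
    return pairs

bio = ("Franklin arrived in Philadelphia in 1723. "
       "He later represented the colonies in London from 1757.")
```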
&lt;br /&gt;
What I hope to show is that I was able to develop a novel and useful reference tool automatically, using fairly simple heuristics that are a far cry from hand-cataloging familiar to many librarians.&lt;br /&gt;
&lt;br /&gt;
You can try out [http://linserv1.cims.nyu.edu:48866/ the original version] (this server is inconveniently set to be updated/rebooted on 11/8--may be temporarily unavailable)&lt;br /&gt;
&lt;br /&gt;
== Good!, DRY, and Dynamic: Content Strategy for Libraries (Especially the Big Ones) ==&lt;br /&gt;
&lt;br /&gt;
*Michael Schofield, Nova Southeastern University Libraries, mschofield@nova.edu&lt;br /&gt;
*No previous code4lib presentations.&lt;br /&gt;
&lt;br /&gt;
The responsibilities of the #libweb are exploding [it’s a good thing] and it is no longer uncommon for libraries to manage or even home-grow multiple applications and sites. Often it is at this point where the web people begin to suffer the absence of a content strategy when, say, business hours need to be updated sitewide a half-dozen times.&lt;br /&gt;
&lt;br /&gt;
We were already feeling this crunch when we decided to further complicate the Nova Southeastern University Libraries by splitting the main library website into two. The Alvin Sherman Library, Research, and Information Technology Center is a unique joint-use facility that serves not only the academic community but the public of Broward County - and marketing a hyperblend of content through one portal just wasn't cutting it. With a web team of two, we knew that managing all this rehashed, disparate content was totally unsustainable.&lt;br /&gt;
&lt;br /&gt;
I want to share in this talk how I went about making our library content DRY (&amp;quot;don't repeat yourself&amp;quot;): input content in one place--blurbs, policies, featured events, featured databases, book reviews, business hours, and so on--and syndicate it everywhere, even, sometimes, dynamically targeting that content for a specific audience or context. It is a presentation that is a little about workflow, a little more about browser and context detection, a tangent about content-modeling the CMS, and a lot about APIs, syndication, and performance.&lt;br /&gt;
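&lt;br /&gt;
The core DRY move is tiny: keep one canonical record and generate every surface -- page blurb, API feed, whatever -- from it. A toy sketch with invented field names:&lt;br /&gt;
&lt;br /&gt;

```python
import json

# One canonical record; every page and API response is generated from it,
# so updating hours sitewide means editing exactly one place.
HOURS = {"Mon-Thu": "8am-midnight", "Fri": "8am-6pm"}

def hours_text(hours):
    """Render the record for a web page sidebar."""
    return "; ".join(f"{days}: {times}" for days, times in hours.items())

def hours_feed(hours):
    """Syndicate the same record as JSON for other sites to consume."""
    return json.dumps(hours)

blurb = hours_text(HOURS)
```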
&lt;br /&gt;
== No code, no root, no problem? Adventures in SaaS and library discovery ==&lt;br /&gt;
&lt;br /&gt;
*[mailto:erwhite@vcu.edu Erin White, VCU]&lt;br /&gt;
*No previous C4L presentations&lt;br /&gt;
&lt;br /&gt;
In 2012 VCU was an eager early adopter of Ex Libris' cloud service Alma as an ILS, ERM, link resolver, and single-stop, de-silo'd public-facing discovery tool. This has been a disruptive change that has shifted our systems staff's day-to-day work, relationships with others in the library, and relationships with vendors.&lt;br /&gt;
&lt;br /&gt;
I'll share some of our experiences and takeaways from implementing and maintaining a cloud service:&lt;br /&gt;
* Seeking disruption and finding it&lt;br /&gt;
* Changing expectations of service and the reality of unplanned downtime&lt;br /&gt;
* Communication and problem resolution with non-IT library staff&lt;br /&gt;
* Working with a vendor that uses agile development methodology&lt;br /&gt;
* Benefits and pitfalls of creating customizations and code workarounds&lt;br /&gt;
* Changes in library IT/coders' roles with SaaS&lt;br /&gt;
&lt;br /&gt;
...as well as thoughts on the philosophy of library discovery vs real-life experiences in moving to a single-search model.&lt;br /&gt;
&lt;br /&gt;
== Building for others (and ourselves):  the Avalon Media System ==&lt;br /&gt;
* [mailto:michael.klein@northwestern.edu Michael B Klein], Senior Software Developer, Northwestern University &lt;br /&gt;
** [http://code4lib.org/conference/2010/metz_klein Public Datasets in the Cloud] (code4lib 2010)&lt;br /&gt;
** [http://code4lib.org/conference/2013/klein-rogers The Avalon Media System: A Next Generation Hydra Head For Audio and Video Delivery] (code4lib 2013)&lt;br /&gt;
* [mailto:j-rudder@northwestern.edu Julie Rudder], Digital Initiatives Project Manager, Northwestern University&lt;br /&gt;
** no previous code4lib presentations&lt;br /&gt;
&lt;br /&gt;
[http://www.avalonmediasystem.org/ Avalon Media System] is a collaborative effort between development teams at Northwestern and Indiana Universities. Our goal is to produce an open source media management platform that works well for us, but is also widely adopted and contributed to by other institutions. We believe that building a strong user and contributor community is vital to the success and longevity of the project, and have developed the system with this goal in mind. We will share lessons learned, pains and successes we’ve had releasing two versions of the application since last year.  &lt;br /&gt;
&lt;br /&gt;
Our presentation will cover our experiences:&lt;br /&gt;
* providing flexible, admin-friendly distribution and installation options&lt;br /&gt;
* building with abstraction, customization and local integrations in mind&lt;br /&gt;
* prioritizing features (user stories)&lt;br /&gt;
* attracting code contributions from other institutions&lt;br /&gt;
* gathering community feedback &lt;br /&gt;
* creating a product rather than a bag of parts&lt;br /&gt;
&lt;br /&gt;
== How to check your data to provide a great data product? Data quality as a key product feature at Europeana ==&lt;br /&gt;
&lt;br /&gt;
*[mailto:Peter.Kiraly@kb.nl Péter Király] portal backend developer, Europeana&lt;br /&gt;
*No previous C4L presentations&lt;br /&gt;
&lt;br /&gt;
[http://Europeana.eu/ Europeana.eu] - Europe's digital library, archive and museum - aggregates more than 30 million metadata records from more than 2200 institutions.  The records come from libraries, archives, museums and every other kind of cultural institution, from very different systems and metadata schemas, and are typically transformed several times until they are ingested into the Europeana data repository.  Europeana builds a consolidated database from these records, creating reliable and consistent services for end-users (a search portal, search widget, mobile apps, thematic sites etc.) and an API, which supports our strategic goal of data for reuse in education, creative industries, and the cultural sector.  A reliable &amp;quot;data product&amp;quot; is thus at the core of our own software products, as well as those of our API partners.&lt;br /&gt;
&lt;br /&gt;
Much effort is needed to smooth out local differences in the metadata curation practice of our data providers. We need a solid framework to measure the consistency of our data and provide feedback to decision-makers inside and outside the organisation. We can also use this metrics framework to ask content providers to improve their own metadata. Of course, a data-quality-driven approach requires that we also improve the data transformation steps of the Europeana ingestion process itself. Data quality issues heavily define what new features we are able to create in our user interfaces and API, and might actually affect the design and implementation of our underlying data structure, the Europeana Data Model.&lt;br /&gt;
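&lt;br /&gt;
One of the simplest metrics of this kind is per-record completeness: what fraction of the fields you care about are actually filled in. A sketch -- the field list and record are illustrative, not Europeana's actual metric:&lt;br /&gt;
&lt;br /&gt;

```python
def completeness(record, fields):
    """Fraction of the checked metadata fields that are non-empty --
    a simple per-record quality score that can be aggregated per
    data provider to drive feedback."""
    filled = sum(1 for f in fields if record.get(f))
    return filled / len(fields)

FIELDS = ["title", "creator", "date", "rights", "provider"]
rec = {"title": "Carte de Paris", "creator": "", "date": "1850",
       "rights": "PD", "provider": "BnF"}
score = completeness(rec, FIELDS)  # 4 of 5 fields filled
```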
&lt;br /&gt;
In the presentation I will briefly describe the Europeana metadata ingestion process, show our data quality metrics and measuring techniques (using the Europeana API, Solr and MongoDB queries), walk through some typical problems (both trivial and difficult ones), and finally present the feedback mechanism we propose to deploy.&lt;br /&gt;
&lt;br /&gt;
Keywords: Europeana, data quality, EDM, API, Apache Solr, MongoDB, #opendata, #openglam&lt;br /&gt;
&lt;br /&gt;
== Teach your Fedora to Fly: scaling out a digital repository ==&lt;br /&gt;
&lt;br /&gt;
*[mailto:acoburn@amherst.edu Aaron Coburn], Software Developer, Amherst College&lt;br /&gt;
*No previous C4L presentations&lt;br /&gt;
&lt;br /&gt;
Fedora is a great repository system for managing large collections of digital objects, but what happens when a popular food magazine begins directing a large number of readers to a manuscript showing Emily Dickinson’s own recipe for doughnuts? While Fedora excels in its support of XML-based metadata, it doesn’t always perform well under a high volume of traffic. Nor is it especially tolerant of network or hardware failures.&lt;br /&gt;
&lt;br /&gt;
This presentation will show how we make heavy use of a Fedora repository while insulating it almost entirely from web traffic. Starting with a distributed web front-end built with Node.js, and caching most of the user-accessible content from Fedora in an elastic, fault-tolerant Riak (NoSQL) cluster, we have eliminated nearly all single points of failure in the system. Our production system is spread across twelve separate servers, where asynchrony and Map-Reduce are king. And aside from being blazing fast, it is also entirely Hydra-compliant.&lt;br /&gt;
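&lt;br /&gt;
The insulation pattern described here is essentially cache-aside: reads are served from the cache layer, and only a miss ever falls through to the repository. A language-neutral sketch, with a plain dict standing in for the Riak cluster:&lt;br /&gt;
&lt;br /&gt;

```python
cache = {}        # stands in for the Riak cluster
backend_hits = 0  # counts how often the repository itself is touched

def fetch_from_fedora(pid):
    """Pretend repository call; in production this is the slow, fragile hop."""
    global backend_hits
    backend_hits += 1
    return {"pid": pid, "datastreams": ["descMetadata", "content"]}

def get_object(pid):
    """Cache-aside read: serve from the cache, and only fall through
    to Fedora on a miss."""
    if pid not in cache:
        cache[pid] = fetch_from_fedora(pid)
    return cache[pid]

get_object("emily:doughnuts")
get_object("emily:doughnuts")  # served from cache; Fedora never sees it
```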
&lt;br /&gt;
Furthermore, we will attempt to answer the question: if Fedora crashes and the visitors to your site don’t notice, did it really fail?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Using Open Source Software and Freeware to Preserve and Deliver Digital Videos ==&lt;br /&gt;
* [mailto:wfang@kinoy.rutgers.edu Wei Fang], Head of Digital Services, Rutgers University Law Library&lt;br /&gt;
* Jiebei Luo, Digital Projects Initiative Intern, Rutgers University&lt;br /&gt;
*No previous C4L presentations&lt;br /&gt;
&lt;br /&gt;
The Rutgers University Law Library is the official digital repository of the New Jersey Supreme Court oral arguments since 2002. This large video collection contains approximately 3,000 videos with a total of 400 GB or 6,000 viewing hours. With the expansion of this collection, the existing database and the static website could not efficiently support the library’s daily operations and meet its patrons’ search needs. &lt;br /&gt;
By utilizing open source software and freeware such as Ubuntu, FFmpeg, Solr and Drupal, the library is able to develop a complete solution for re-encoding videos, embedding subtitles, incorporating the Solr search engine and the Drupal content management system to support full-text subtitle search, automatically updating video metadata records in the library catalog system, and, finally, providing a plug-in-free, HTML 5-based Web interface for patrons to view the videos online.&lt;br /&gt;
The aspects below will be presented in detail at the conference:&lt;br /&gt;
* Video codecs comparison&lt;br /&gt;
* Server-end batch video encoding/re-encoding&lt;br /&gt;
* HTML 5 video tag and embedding subtitles&lt;br /&gt;
* Incorporating the search engine Solr and the content management tool Drupal with the database to retrieve videos by full-text search, especially in subtitle files&lt;br /&gt;
* Incorporating video metadata with the library catalog system&lt;br /&gt;
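&lt;br /&gt;
As a sketch of the server-end batch step, assuming hypothetical file names: assembling the ffmpeg invocation that re-encodes one video to H.264/AAC and muxes in an SRT subtitle track as mov_text. An actual batch job would loop over the collection and hand each command to subprocess.run:&lt;br /&gt;
&lt;br /&gt;

```python
def ffmpeg_cmd(src, subs, dest):
    """Assemble (not run) an ffmpeg command line that re-encodes video
    to H.264, audio to AAC, and embeds an SRT file as a mov_text
    subtitle track in the output MP4."""
    return ["ffmpeg", "-i", src, "-i", subs,
            "-c:v", "libx264", "-c:a", "aac", "-c:s", "mov_text", dest]

cmd = ffmpeg_cmd("argument.avi", "argument.srt", "argument.mp4")
# a batch job would do: subprocess.run(cmd, check=True)
```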
&lt;br /&gt;
== Shared Vision, Shared Resources: the Curate Institutional Repository ==&lt;br /&gt;
* Dan Brubaker Horst, University of Notre Dame &lt;br /&gt;
** [http://code4lib.org/conference/2011/JohnsonHorst A Community-Based Approach to Developing a Digital Exhibit at Notre Dame Using the Hydra Framework] &lt;br /&gt;
* Julie Rudder, Northwestern University&lt;br /&gt;
** no previous presentations&lt;br /&gt;
&lt;br /&gt;
Curate is being collaboratively developed by several institutions in the Hydra community who share the need and vision for a Fedora-backed Institutional Repository. The first release of Curate was a collaboration between Notre Dame and Northwestern University, along with Digital Curation Experts (DCE), a vendor hired jointly by our two institutions. Powered by the Hydra engine Sufia, the team worked quickly to release the first version of Curate in October 2013, which provides a basic self-deposit system with support for various content types, collection building, DOI minting, and user profile creation. From the very beginning we have built Curate to be easy to theme and extend, in order to ease installation and use by other institutions.&lt;br /&gt;
&lt;br /&gt;
In December 2013, additional partners will join the project, including Indiana University, the University of Cincinnati and the University of Virginia. Each institution contributes resources to the project in order to further our common goal: to create a product that fits our needs and has a sustainable future. Together we will tackle additional content types (like complex data, software, and media), administrative collections, and more. &lt;br /&gt;
&lt;br /&gt;
Our presentation will include:&lt;br /&gt;
* a brief demonstration of Curate and technical overview&lt;br /&gt;
* why and how we work together&lt;br /&gt;
* why build Curate&lt;br /&gt;
* the future of the project&lt;br /&gt;
&lt;br /&gt;
== Solr, Cloud and Blacklight ==&lt;br /&gt;
* David Jiao, Library Information Systems, Indiana University at Bloomington, djiao@indiana.edu&lt;br /&gt;
** No previous code4lib presentations&lt;br /&gt;
&lt;br /&gt;
SolrCloud refers to the distributed capabilities in Solr4. It is designed to offer a highly available, fault tolerant environment by organizing data into multiple pieces that can be hosted on multiple machines with replicas, and providing a centralized cluster configuration and management. &lt;br /&gt;
&lt;br /&gt;
At Indiana University, we are upgrading the Solr backend of our recently released Blacklight-based OPAC from Solr 1.4 to Solr4, and we are also working to build a private cloud of Solr4 servers. In this talk, I will present certain features of SolrCloud, including distributed requests, fault tolerance, near-real-time indexing/searching, and configuration management with Zookeeper, along with our experiences using these features to provide better performance and architecture for our OPAC, which serves over 7 million bibliographic records to over 100 thousand students and faculty members. I will also discuss some practical lessons learned from our SolrCloud setup/upgrade and the integration of the new SolrCloud with our customized Blacklight system.&lt;br /&gt;
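&lt;br /&gt;
One practical upshot of SolrCloud worth illustrating: clients address a collection rather than a specific core, and any live node (ZooKeeper tracks which ones are up) can route the request to the right shards, so simple client-side load balancing works. A toy sketch with invented node names:&lt;br /&gt;
&lt;br /&gt;

```python
import itertools

# In SolrCloud, ZooKeeper tracks the live replicas; a client can
# round-robin requests across them. A static list stands in here.
live_nodes = ["solr1:8983", "solr2:8983", "solr3:8983"]
_rr = itertools.cycle(live_nodes)

def query_url(collection, q):
    """Address the collection, not a core: whichever live node we pick
    will fan the request out to the right shards and merge results."""
    node = next(_rr)
    return f"http://{node}/solr/{collection}/select?q={q}"

u1 = query_url("catalog", "dickinson")
u2 = query_url("catalog", "dickinson")  # same query, different node
```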
&lt;br /&gt;
== Leveraging XSDs for Reflective, Live Dataset Support in Institutional Repositories ==&lt;br /&gt;
* [mailto:msulliva@ufl.edu Mark Sullivan], Library Information Technology, University of Florida&lt;br /&gt;
** No previous code4lib presentations&lt;br /&gt;
&lt;br /&gt;
The University of Florida Libraries are currently adding support for active datasets to our METS-based institutional repository software.  This ongoing project enables the library to be a partner in current, or long-running, data-driven projects around the university by providing tangible short-term and long-term benefits to the projects.  The system assists project teams by storing and providing access to their data, while supporting online filtering and sorting of the data, custom queries, and adding and editing of the data by authorized users.  We are also exploring simple data visualizations to allow users to perform basic graphical and geographic queries.  Several different schemas were explored, including DDI and EML, but ultimately the streamlined approach of using XSDs with some custom attributes was chosen, with all other data residing in the METS file portions.  Currently the system is being developed using XSDs describing XML datasets, but this model should easily scale to support SQL datasets or large datasets backed by Hadoop or iRODS.&lt;br /&gt;
&lt;br /&gt;
This work is being integrated in the open source [http://sobek.ufl.edu SobekCM Digital Content Management System] which is built on a pair-tree structure of METS resources with [http://ufdc.ufl.edu/design/webcontent/sobekcm/SobekCM_Resource_Object.pdf rich metadata support] including DC, MODS, MARC, VRACore, DarwinCore, IEEE LOM, GML/KML, schema.org microdata, and many other standard schemas.  The system has emphasized online, distributed creation and maintenance of resources including geo-placement and geographic searching of resources, building structure maps (table of contents) visually online, and a broad suite of curator tools.  &lt;br /&gt;
&lt;br /&gt;
This work is presented as a model which could be implemented in other systems as well.  We will demonstrate current support and discuss our upcoming roadmap to provide complete support.&lt;br /&gt;
&lt;br /&gt;
== Dead-simple Video Content Management: Let Your Filesystem Do The Work ==&lt;br /&gt;
&lt;br /&gt;
* Andreas Orphanides, NCSU Libraries (akorphan (at) ncsu.edu)&lt;br /&gt;
** (never led or soloed a C4L presentation)&lt;br /&gt;
&lt;br /&gt;
Content management is hard. To keep all the moving parts in order, and to maintain a layer of separation between the system and content creators (who are frequently not technical experts), we typically turn to content management systems like Drupal. But even Drupal and its kin require significant overhead and present a not inconsiderable learning curve for nontechnical users.&lt;br /&gt;
&lt;br /&gt;
In some contexts it's possible -- and desirable -- to manage content in a more streamlined, lightweight way, with a minimum of fuss and technical infrastructure. In this presentation I'll share a simple MVC-like architecture for managing video content for playback on the web, which uses a combination of Apache's mod_rewrite module and your server's filesystem structure to provide an automated approach to video content management that's easy to implement and provides a low barrier to content updates: friendly to content creators and technology implementors alike. Even better, the basic method is HTML5-friendly, and can be integrated into your favorite content management system if you've got permissions for creating templates.&lt;br /&gt;
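&lt;br /&gt;
The heart of the approach is a single rewrite rule that treats the filesystem layout as the content database. A pure-Python stand-in for such a rule -- the URL pattern and directory layout here are hypothetical, not the presenter's actual configuration:&lt;br /&gt;
&lt;br /&gt;

```python
import re

def resolve(url_path, docroot="/var/www/videos"):
    """Pure-Python analogue of a mod_rewrite rule: map a clean URL like
    /video/library-tour onto a predictable filesystem path, so adding a
    video is just dropping a directory into the docroot -- no database,
    no admin interface."""
    m = re.match(r"^/video/([a-z0-9-]+)/?$", url_path)
    if not m:
        return None  # no match: let the request fall through (or 404)
    return f"{docroot}/{m.group(1)}/index.html"

path = resolve("/video/library-tour")
```

Because content creators never touch the rule, updating content is just a file copy, which is exactly the low barrier the talk describes.&lt;br /&gt;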
&lt;br /&gt;
In the presentation I'll go into detail about the system structure and logic required to implement this approach. I'll detail the benefits and limitations of the system, as well as the challenges I encountered in developing its implementation. Audience members should come away with sufficient background to implement a similar system on their own servers. Implementation documentation and genericized code will also be shared, as available.&lt;br /&gt;
&lt;br /&gt;
== Managing Discovery ==&lt;br /&gt;
&lt;br /&gt;
* Andrew Pasterfield, Senior Programmer/Systems Analyst, University of Calgary Library, ampaster@ucalgary.ca&lt;br /&gt;
**No previous code4lib presentations&lt;br /&gt;
&lt;br /&gt;
In fall 2012 the University of Calgary Library launched a new home page that incorporated a Summon-powered Single Search Box with a customized “bento box” results display. Search at the U of C now combines a range of metadata sources for discovery with customized mapping of a database recommender and LibGuides into a unified display. Further customizations include a non-Google Analytics, non-proxy method to log clicks.&lt;br /&gt;
&lt;br /&gt;
This presentation will discuss the technical details of bringing the various systems together into one display interface to increase discovery at the U of C Library.&lt;br /&gt;
&lt;br /&gt;
http://library.ucalgary.ca&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Sorting it out: a piece of the User Centered Design Process ==&lt;br /&gt;
&lt;br /&gt;
* Cindy Beggs, [http://www.akendi.com/aboutus/management/ Akendi], cindy@akendi.com&lt;br /&gt;
&lt;br /&gt;
This talk is about how to apply a user centered design methodology to the process of creating an information architecture.  Participants learn the fundamentals of UCD and how card sorting and reverse card sorting enable us to isolate the content we present on screen from the layouts and visuals of those screens.  We talk about ways to identify who will be using the information architecture you are creating and why we need to know how it will be used.&lt;br /&gt;
 &lt;br /&gt;
What will attendees take away from your talk?&lt;br /&gt;
The criticality of involving “real” end users in the process of creating an information architecture.  The basics of following a user-centered-design process in the creation of best-in-class, content-rich digital products.&lt;br /&gt;
&lt;br /&gt;
Cindy Beggs has been working in the “information industry” for over 25 years.  A librarian by profession, she has spent decades helping users figure out how to find their way through large bodies of content.  Her insights into how people seek information, her empathy for those who find it a challenge and her practical experience helping organizations figure out how to best structure their content contribute to her success as an information architect with both clients and trainees.  (http://www.akendi.com/aboutus/management/)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Implementation of ArchivesSpace in University of Richmond==&lt;br /&gt;
&lt;br /&gt;
*Birong Ho, bho@richmond.edu&lt;br /&gt;
&lt;br /&gt;
The University of Richmond implemented ArchivesSpace, an archival collection management system, in fall 2013. As a charter member institution, with the Head of Special Collections serving on the ArchivesSpace Board, implementing this open source software became a priority. &lt;br /&gt;
&lt;br /&gt;
Several aspects of the implementation will be addressed in the talk, among them: collections and repository setup, the storage layer (including data formats), system resource requirements, technical architecture, customization, scaling, and integration with other systems in the library.&lt;br /&gt;
&lt;br /&gt;
Customization, scaling, and integration with other campus systems such as Archeon and Exist raised particular concerns; the talk will examine these in detail.&lt;br /&gt;
&lt;br /&gt;
==Easy Wins for Modern Web Technologies in Libraries==&lt;br /&gt;
&lt;br /&gt;
*[mailto:trey.terrell@oregonstate.edu Trey Terrell], Analyst Programmer, Oregon State University&lt;br /&gt;
** No previous Code4Lib presentations &lt;br /&gt;
&lt;br /&gt;
Oregon State University is currently implementing an updated version of its room reservation system. In its development we've come across and implemented a variety of &amp;quot;easy wins&amp;quot; to make it more responsive, easier to maintain, less expensive to run, and just cooler to experience. While our particular system was in Ruby on Rails, this talk will address general methods and example utilities which can be used no matter your stack.&lt;br /&gt;
&lt;br /&gt;
I'll be talking about things like cache management, reverse proxies, publish/subscribe servers, WebSockets, responsive design, asynchronous processing, and keeping complicated stacks up and running with minimal effort.&lt;br /&gt;
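&lt;br /&gt;
As one example of the cache-management &amp;quot;easy wins&amp;quot; in that list: conditional GETs with ETags let repeat visitors skip the response body entirely. A framework-free sketch of the idea (the resource and handler shape are invented for illustration):&lt;br /&gt;
&lt;br /&gt;

```python
import hashlib

def etag_for(body: bytes) -> str:
    """Derive a validator from the representation itself."""
    return '"' + hashlib.md5(body).hexdigest() + '"'

def respond(body, if_none_match=None):
    """Conditional GET: if the client already holds the current
    representation (its If-None-Match equals our ETag), answer
    304 Not Modified and send no body at all."""
    tag = etag_for(body)
    if if_none_match == tag:
        return 304, tag, b""
    return 200, tag, body

status, tag, _ = respond(b"room-list-v1")
status2, _, body2 = respond(b"room-list-v1", if_none_match=tag)
```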
&lt;br /&gt;
==Implementing Islandora at a Small Institution==&lt;br /&gt;
&lt;br /&gt;
*Megan Kudzia, Albion College Library&lt;br /&gt;
*Eddie Bachle, Albion College IT&lt;br /&gt;
**No previous Code4Lib presentations&lt;br /&gt;
&lt;br /&gt;
Albion College (and particularly the Library/Archives and Special Collections) has a variety of needs which could be met by an open-source Institutional Repository system. Several months and lots of conversations later, we’re continuing to troubleshoot our way through Islandora. We’d like to talk about what has worked for us, where our frustrations have been, whether it’s even possible to install and develop a system like this at a small institution, and where the process has stalled. &lt;br /&gt;
&lt;br /&gt;
As of right now, we do have a semi-working installation. We’re not sure when it will be ready for our end users, but we'll talk about our development process and evaluate our progress.&lt;br /&gt;
''Contributions also by Nicole Smeltekop, Albion College Archives &amp;amp; Special Collections''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== PhantomJS+Selenium: Easy Automated Testing of AJAX-y UIs ==&lt;br /&gt;
&lt;br /&gt;
* Martin Haye, California Digital Library, martin.haye@ucop.edu&lt;br /&gt;
** Previous Code4Lib Presentation: [http://code4lib.org/conference/2012/collett Beyond code: Versioning data with Git and Mercurial] at Code4Lib 2012 (Martin co-presenting with Stephanie Collett)&lt;br /&gt;
* Mark Redar, California Digital Library, mark.redar@ucop.edu&lt;br /&gt;
&lt;br /&gt;
Web user interfaces are demanding ever-more dynamism and polish, combining HTML5, AJAX, lots of CSS and jQuery (or their ilk) to create autocomplete drop-downs, intelligent buttons, stylish alert dialogs, etc. How can you make automated tests for these highly complex and interactive UIs?&lt;br /&gt;
&lt;br /&gt;
Part of the answer is PhantomJS. It’s a modern “headless” WebKit browser (it has no display) that can be driven from command-line Selenium unit tests. PhantomJS is dead simple to install, and its blazing speed and server-friendliness make continuous integration testing easy. You can write UI unit tests in {language-of-your-choice} and run them not just in PhantomJS but in Firefox and Chrome, plus a zillion browser/OS combinations at places like SauceLabs, TestingBot and BrowserStack.&lt;br /&gt;
&lt;br /&gt;
In this double-team live code talk, we’ll explain all that while we demonstrate the following in real time:&lt;br /&gt;
&lt;br /&gt;
* Start with nothing.&lt;br /&gt;
* Install Selenium bindings for Ruby and Python.&lt;br /&gt;
* In each language write a small test of an AJAX-y UI.&lt;br /&gt;
* Run the tests in Firefox, and fix bugs (in the test or UI) as needed.&lt;br /&gt;
* Install PhantomJS.&lt;br /&gt;
* Show the same tests running headless as part of a server-friendly test suite. &lt;br /&gt;
* (Wifi permitting) Show the same tests running on a couple different browser/OS combinations on the server cloud at SauceLabs – talking through a tunnel to the local firewalled application.&lt;br /&gt;
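&lt;br /&gt;
The trick that makes AJAX-y UIs testable at all is the explicit wait: poll for a condition instead of sleeping a fixed time. Selenium packages this as WebDriverWait, but the idea fits in a few lines, sketched here without a browser (a timed flag simulates an element appearing after an AJAX call):&lt;br /&gt;
&lt;br /&gt;

```python
import time

def wait_until(condition, timeout=5.0, interval=0.05):
    """Explicit-wait loop, the same idea as Selenium's WebDriverWait:
    poll until the condition holds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while deadline > time.monotonic():
        if condition():
            return True
        time.sleep(interval)
    return False

# Simulated AJAX: the "element" appears after a short delay.
appeared_at = time.monotonic() + 0.2
ok = wait_until(lambda: time.monotonic() >= appeared_at)
```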
&lt;br /&gt;
==New Technologies, Collaboration, &amp;amp; Entrepreneurship in Libraries:  Harnessing Their Power to Help Your Library==&lt;br /&gt;
&lt;br /&gt;
* Stephanie Walker – swalker@brooklyn.cuny.edu&lt;br /&gt;
* Howard Spivak – howards@brooklyn.cuny.edu&lt;br /&gt;
* Alex - Alex@brooklyn.cuny.edu&lt;br /&gt;
&lt;br /&gt;
Academic libraries are caught in budget squeezes and often struggle to find ways to communicate value to senior administration and others.  At Brooklyn College Library, we have taken an unusual, possibly unique, approach to these issues.  Our technology staff have long worked directly with librarians to develop products that meet library, faculty, and student needs, and we have shared many of our products with colleagues, including an award-winning website, e-resource, and content management system we call 4MyLibrary, which we shared for free with 8 CUNY colleges, and also an easy-to-use book scanner, which has proven overwhelmingly popular with students, faculty, other librarians, and numerous campus offices.  Recently, motivated by budget cuts, we decided that what worked for us might interest other libraries, and working with our Office of Technology Commercialization, we started selling 2 products:  our book scanners (at half the price of commercial alternatives), and a hosting service, whereby we host and support 4MyLibrary for libraries with minimal technology staff.  Both succeeded, and yielded major benefits:  a steady revenue stream and the admiration and serious goodwill of our senior administration and others.&lt;br /&gt;
&lt;br /&gt;
However, this presentation is neither a basic how-to, nor an advertisement.  With this presentation, we hope to spur a conversation about broader collaboration, especially regarding new technologies, among libraries.  We all have some level of technical expertise, most of us are struggling with rising prices and tight budgets, and many of us are unhappy with various technology products we use, from scanners to our ILS.  We believe - and can demonstrate - that with collaboration, we can solve many of our problems, and provide better services to boot. &lt;br /&gt;
&lt;br /&gt;
== Identifiers, Data, and Norse Gods ==&lt;br /&gt;
&lt;br /&gt;
* Ryan Scherle, [http://datadryad.org Dryad Digital Repository], ryan@datadryad.org&lt;br /&gt;
** previous Code4Lib talk [http://ryan.scherle.org/papers/2010-2-code4lib-HIVE.ppt  HIVE: A New Tool for Working With Vocabularies], at Code4Lib 2011.&lt;br /&gt;
&lt;br /&gt;
ORCID and DataCite provide stable identifiers for researchers and data, respectively. Each system does a fine job of providing value to its users. But wouldn't it be great if they could link their systems to create something much more powerful? Perhaps even as powerful as a god?&lt;br /&gt;
&lt;br /&gt;
Enter [http://odin-project.eu/ ODIN], The ORCID and DataCite Interoperability Network. ODIN is a two-year project to unleash the power of persistent identifiers for researchers and the research they create. This talk will present recent work from the ODIN project, including several tools that can unleash the godlike power of identifiers at your institution. Current tools include:&lt;br /&gt;
* Metadata generator tool: allows repository staff to create DataCite metadata with embedded ORCIDs.&lt;br /&gt;
* Claiming tool: assists researchers in claiming their work within the ORCID system.  &lt;br /&gt;
* ORCID-feed: embeds a list of ORCID works on any web page.&lt;br /&gt;
* ODIN's HAMR: assists in populating a DSpace repository with ORCIDs. Based on work from a Code4Lib hackathon!&lt;br /&gt;
&lt;br /&gt;
== Armed Bandits in the Digital Library ==&lt;br /&gt;
&lt;br /&gt;
* Roman Chyla, [http://labs.adsabs.harvard.edu/adsabs/ Astrophysics Data System], rchyla@cfa.harvard.edu&lt;br /&gt;
** Previous Code4Lib: [http://code4lib.org/conference/2013/chyla Citation search in SOLR and second-order operators]&lt;br /&gt;
&lt;br /&gt;
Many of us are using the excellent Lucene library (or the SOLR server) to provide search functionality. These systems contain a number of features for adjusting the relevancy ranking of hits, but we may not know how to use them. In this presentation, I'll cover the available options: what the default ranking model is (the vector space model), what the alternatives are (e.g. BM25), and what other means we have to tweak and adjust the ranking of hits (e.g. boost factors and functions). But even once we know how to deploy these adjustments and tweaks, we are still left in the dark: we do not know whether the change we've just rolled out had a statistically significant effect, or whether it was just a waste of time and resources. A/B testing is one option, but there may be a much better one, the so-called &amp;quot;multi-armed bandit&amp;quot; approach. In this talk I'd like to show how we are experimenting with this strategy to adjust the [http://labs.adsabs.harvard.edu/adsabs/ ADS search engine].&lt;br /&gt;
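As a minimal sketch of the multi-armed bandit idea (an illustration only, not ADS code; the arm names and click-through rates are invented for the simulation), an epsilon-greedy bandit that gradually shifts search traffic toward the better of two candidate ranking configurations might look like this:&lt;br /&gt;

```python
import random

class EpsilonGreedyBandit:
    """Epsilon-greedy selection among candidate ranking configurations ("arms")."""
    def __init__(self, arms, epsilon=0.1, seed=42):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.counts = {a: 0 for a in self.arms}    # times each arm was served
        self.values = {a: 0.0 for a in self.arms}  # running mean reward (e.g. CTR)
        self.rng = random.Random(seed)

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best arm so far.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Incremental mean: v_new = v + (r - v) / n
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Simulated experiment: the "bm25_tuned" arm has a higher true click-through rate.
true_ctr = {"vsm_default": 0.05, "bm25_tuned": 0.08}
bandit = EpsilonGreedyBandit(true_ctr.keys(), epsilon=0.1)
for _ in range(5000):
    arm = bandit.select()
    reward = 1.0 if bandit.rng.random() < true_ctr[arm] else 0.0
    bandit.update(arm, reward)

print(bandit.counts)
```

Unlike a 50/50 A/B test, the bandit reallocates traffic as evidence accumulates, so fewer searches are served by the weaker configuration.&lt;br /&gt;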
&lt;br /&gt;
== Building Worker Queues with AWS and Resque ==&lt;br /&gt;
&lt;br /&gt;
* Eric Rochester [http://scholarslab.org Scholars' Lab], erochest@virginia.edu&lt;br /&gt;
* Scott Turnbull [http://aptrust.org/ Academic Preservation Trust], scott.turnbull@aptrust.org &lt;br /&gt;
&lt;br /&gt;
A common task in larger systems is processing large input files automatically. Often users can drop those files into a shared directory on AWS, NFS, or another shared drive, and those files then need to be processed and potentially integrated into a system. This task has come up recently in the University of Virginia libraries, in allowing users to add GIS data, and in building an ingest service for the Academic Preservation Trust (http://aptrust.org/) that brings files and resources into the preservation system.&lt;br /&gt;
&lt;br /&gt;
This system is built by loosely coupling a number of different technologies. This allows us to easily interoperate and communicate between different systems and programming environments. Because the interfaces are well defined, it’s also fairly simple to switch out technologies as the requirements of the system change.&lt;br /&gt;
&lt;br /&gt;
The process is fairly simple:&lt;br /&gt;
&lt;br /&gt;
First, a Ruby daemon monitors an AWS S3 bucket that others can upload new files into. This daemon creates a Resque status task, adds a marker for the task in a database, and continues monitoring.&lt;br /&gt;
&lt;br /&gt;
Second, Resque mediates incoming job requests and routes them to the appropriate workers, which may be written in Java, Go, or Ruby.  The diversity of technologies that Resque can manage allows great latitude to leverage the appropriate tool for a specific job.  While processing, each worker updates the status for its job and coordinates processing with other jobs.&lt;br /&gt;
&lt;br /&gt;
Finally, a page that is integrated into a larger Rails app provides a novice-user-friendly view of the status of the workers and allows basic tasks such as restarting the job.&lt;br /&gt;
&lt;br /&gt;
This architecture allows us to swap in the technology that best fits each part of the process, and it makes it easier to maintain the system. We use this to integrate and coordinate between tasks handled in Java, Ruby, and Go, and it provides an effective way to interoperate with these programming languages and the respective strengths that they bring to this system.&lt;br /&gt;
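The three-step process above can be sketched with in-memory stand-ins for the real services (a toy illustration, not the actual UVa/APTrust code; the file names, queue, and status store are all invented):&lt;br /&gt;

```python
import queue
import uuid

# In-memory stand-ins for the real services (S3 bucket, Resque/Redis, status DB).
bucket = ["maps/county.shp", "bags/archive-001.tar"]  # files "uploaded" by users
job_queue = queue.Queue()                              # plays the role of Resque
status_db = {}                                         # job-id -> status marker

def monitor(bucket, seen):
    """Step 1: the daemon notices new files and enqueues one job per file,
    recording a status marker for each job."""
    for key in bucket:
        if key not in seen:
            job_id = str(uuid.uuid4())
            status_db[job_id] = "queued"
            job_queue.put((job_id, key))
            seen.add(key)

def worker():
    """Step 2: a worker (Ruby, Java, or Go in the real system) drains the queue
    and updates the shared status record as it goes. Step 3, the status page,
    would simply render status_db."""
    while not job_queue.empty():
        job_id, key = job_queue.get()
        status_db[job_id] = "processing"
        # ... validate / transform / ingest the file here ...
        status_db[job_id] = "done"

seen = set()
monitor(bucket, seen)
worker()
print(sorted(status_db.values()))
```

Because each component only touches the queue and the status store, any piece can be swapped out without the others noticing, which is the loose coupling the proposal describes.&lt;br /&gt;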
&lt;br /&gt;
==Piwik: Open source web analytics==&lt;br /&gt;
* Kirk Hess, University of Illinois at Urbana-Champaign (kirkhess@illinois.edu)&lt;br /&gt;
** (Code4Lib 2012: [http://code4lib.org/conference/2012/hess Discovering Digital Library User Behavior with Google Analytics])&lt;br /&gt;
&lt;br /&gt;
While Google Analytics is nearly synonymous with web analytics, today we have many other good options. One of them is [http://piwik.org Piwik], a simple-to-install, open-source PHP/MySQL application with a tracking script that can sit alongside Google Analytics, tracking the usual clicks, events, and variables. In this presentation, I'd like to cover the usual analytics topics and also what makes Piwik powerful, such as importing and visualizing web logs from any system to incorporate both past and future data, easily tracking downloads, and the ability to write your own reports or dashboards. The visitor log data is stored securely on your own server, so you control who looks at the data and how much or how little of it to keep. With an active and helpful developer community, Piwik has the potential for analytics that make sense for libraries, not e-commerce.&lt;br /&gt;
&lt;br /&gt;
[[:Category:Code4Lib2014]]&lt;/div&gt;</summary>
		<author><name>Ryscher</name></author>	</entry>

	<entry>
		<id>https://wiki.code4lib.org/index.php?title=2014_Prepared_Talk_Proposals&amp;diff=39855</id>
		<title>2014 Prepared Talk Proposals</title>
		<link rel="alternate" type="text/html" href="https://wiki.code4lib.org/index.php?title=2014_Prepared_Talk_Proposals&amp;diff=39855"/>
				<updated>2013-11-08T20:31:50Z</updated>
		
		<summary type="html">&lt;p&gt;Ryscher: /* Identifiers, Data, and Norse Gods */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Proposals for Prepared Talks:'''&lt;br /&gt;
&lt;br /&gt;
Prepared talks are 20 minutes (including setup and questions), and should focus on one or more of the following areas:&lt;br /&gt;
 &lt;br /&gt;
* ''Projects'' you've worked on which incorporate innovative implementation of existing technologies and/or development of new software&lt;br /&gt;
* ''Tools and technologies'' – How to get the most out of existing tools, standards and protocols (and ideas on how to make them better)&lt;br /&gt;
* ''Technical issues'' - Big issues in library technology that should be addressed or better understood&lt;br /&gt;
* ''Relevant non-technical issues'' – Concerns of interest to the Code4Lib community which are not strictly technical in nature, e.g. collaboration, diversity, organizational challenges, etc.&lt;br /&gt;
&lt;br /&gt;
'''To Propose a Talk'''&lt;br /&gt;
* Log in to the wiki in order to submit a proposal. If you are not already registered, follow the instructions to do so.&lt;br /&gt;
* Provide a title and brief (500 words or fewer) description of your proposed talk.&lt;br /&gt;
* If you so choose, you may also indicate when, if ever, you have presented at a prior Code4Lib conference. This information is completely optional, but it may assist us in opening the conference to new presenters.&lt;br /&gt;
&lt;br /&gt;
As in past years, the Code4Lib community will vote on proposals that they would like to see included in the program. This year, however, only the top 10 proposals will be guaranteed a slot at the conference. Additional presentations will be selected by the Program Committee in an effort to ensure diversity in program content. Community votes will, of course, still weigh heavily in these decisions.&lt;br /&gt;
&lt;br /&gt;
Presenters whose proposals are selected for inclusion in the program will be guaranteed an opportunity to register for the conference. The standard conference registration fee will still apply.&lt;br /&gt;
&lt;br /&gt;
''Proposals can be submitted through '''Friday, November 8, 2013, at 5pm PST'''''. Voting will commence on November 18, 2013 and continue through December 6, 2013. The final line-up of presentations will be announced in early January, 2014.&lt;br /&gt;
&lt;br /&gt;
'''Talk Proposals'''&lt;br /&gt;
&lt;br /&gt;
==Creating a new Greek-Dutch dictionary==&lt;br /&gt;
* Caspar Treijtel, University of Amsterdam, c.treijtel@uva.nl&lt;br /&gt;
&lt;br /&gt;
At present, no complete dictionary of (ancient) Greek-Dutch is available online. A new dictionary is currently under construction at Leiden University, with software being developed at the University of Amsterdam. The team in Leiden has already begun preparing the data, which at this moment comprises about 6,000 approved lemmas. The ultimate goal is to produce both a print version and an online open-access version from the same source documents. The software needed for this was built in a project funded by CLARIN-NL.&lt;br /&gt;
&lt;br /&gt;
'''Migrator'''&lt;br /&gt;
&lt;br /&gt;
For the production of lemmas we have implemented an advanced workflow. The (generally non-technical) users create lemmas using MS Word, which is both familiar and easy to use. We have developed a custom software module that carefully migrates the Word documents into deeply structured XML by analyzing the structure and semantics of the lemmas, falling back on heuristics in ambiguous cases. Although we had initially envisioned the oXygen XML Author component as the main tool for creating new lemmas, we obtained such excellent results with the migrator module that we decided to continue using MS Word as the primary composition tool. The main advantage is that the editors are much more familiar with Word than with any other WYSIWYG editor. Lemmas that have been migrated to XML are stored in an XML database and can be further edited using oXygen XML Author.&lt;br /&gt;
&lt;br /&gt;
'''Lemmatizer'''&lt;br /&gt;
&lt;br /&gt;
Greek morphology is complicated. In order to use a dictionary effectively, a rather high level of initial language competence is necessary for the user to be able to relate the word form s/he finds in a text to the correct basic lemma form, where the definition of the word can be found. Using a Greek morphological database we have been able to facilitate the search for lemmas. A ‘lemmatizer’ module gives the possible parsings of the word forms and the lemmas they can be derived from. This enables the user to type in the word as found in the text and be redirected to the correct lemma.&lt;br /&gt;
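The lemmatizer's behavior can be illustrated with a toy lookup table (the entries and parsings below are a hypothetical fragment written for this sketch, not the Perseus-derived morphological database the project actually uses):&lt;br /&gt;

```python
# Map an attested Greek word form to its candidate (lemma, parsing) pairs.
# A real morphological database holds hundreds of thousands of such entries.
morph_db = {
    "λόγου": [("λόγος", "noun, gen. sg.")],
    "ἔλεγον": [("λέγω", "verb, impf. ind. act., 1 sg. or 3 pl.")],
}

def lemmatize(form):
    """Return the lemmas a word form can be derived from, so the user can be
    redirected from the form found in a text to the correct dictionary entry."""
    return [lemma for lemma, parsing in morph_db.get(form, [])]

print(lemmatize("ἔλεγον"))
```

A form not in the database simply yields no candidates, at which point a real system might fall back to fuzzy matching or ask the user to refine the query.&lt;br /&gt;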
&lt;br /&gt;
'''Visualization'''&lt;br /&gt;
&lt;br /&gt;
For the online dictionary we have implemented a visualization module that allows the user to view multiple lemmas at once. This module was implemented using the JavaScript framework MooTools. The result is a viewer that performs well and is backed by maintainable JavaScript code.&lt;br /&gt;
&lt;br /&gt;
The online dictionary is still under development; have a look at http://www.woordenboekgrieks.nl/ for the beta version. A newer test version with additional features can be found at http://angel.ic.uva.nl:8600/.&lt;br /&gt;
&lt;br /&gt;
'''Credits'''&lt;br /&gt;
&lt;br /&gt;
* construction of the dictionary: Prof. Ineke Sluiter, Classics department of Leiden University; Prof. Albert Rijksbaron, University of Amsterdam&lt;br /&gt;
* publisher of the dictionary: Amsterdam University Press&lt;br /&gt;
* design/typesetting dictionary: TaT Zetwerk (http://www.tatzetwerk.nl/)&lt;br /&gt;
* software development: Digital Production Center, University Library, University of Amsterdam&lt;br /&gt;
* project funding: CLARIN-NL (http://www.clarin.nl/)&lt;br /&gt;
* morphological database for use by the lemmatizer: courtesy of Prof. Helma Dik, University of Chicago (based on data of the Perseus Project)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Using Drupal to drive alternative presentation systems ==&lt;br /&gt;
 &lt;br /&gt;
* [[User:Highermath|Cary Gordon]], The Cherry Hill Company, cgordon@chillco.com&lt;br /&gt;
&lt;br /&gt;
Recently, we have been building systems that use angular.js, Rails, or other systems for presentation, while leveraging Drupal's sophisticated content management capabilities on the back end.&lt;br /&gt;
&lt;br /&gt;
So far, these have been one-way systems, but as we move to Drupal 8 we are beginning to explore ways to further decouple the presentation and CMS functions.&lt;br /&gt;
&lt;br /&gt;
== A Book, a Web Browser and a Tablet: How Bibliotheca Alexandrina's Book Viewer Framework Makes It Possible ==&lt;br /&gt;
 &lt;br /&gt;
* [[User:Mohammed.abuouda|Mohammed Abu ouda]], Bibliotheca Alexandrina (The new Library of Alexandria)&lt;br /&gt;
&lt;br /&gt;
Many institutions around the world are engaged in digitization projects aiming at preserving the human knowledge present in books and making it available through multiple channels to people around the globe. These efforts will surely help close the digital gap, particularly with the arrival of affordable e-readers, mobile phones, and network coverage. However, the digital reading experience has not yet reached its full potential: many readers miss features they like in their good old books and wish to find them in their digital counterparts. In an attempt to create a unique digital reading experience, Bibliotheca Alexandrina (BA) created a flexible book-viewing framework that is currently used to access its collection of more than 300,000 digital books in five different languages, which includes the largest collection of digitized Arabic books.&lt;br /&gt;
&lt;br /&gt;
Using open source tools, BA used the framework to develop a modular book viewer that can be deployed in different environments and is currently at the heart of various BA projects. The book viewer provides several features that create a more natural reading experience. As with physical books, readers can personalize the books they read by adding annotations such as highlights, underlines, and sticky notes to capture their thoughts and ideas, in addition to being able to share a book with friends on social networks. The reader can perform a search across the content of the book, receiving highlighted search results within its pages. More features can be added to the book viewer through its plugin architecture.&lt;br /&gt;
&lt;br /&gt;
== Structured data NOW: seeding schema.org in library systems ==&lt;br /&gt;
 &lt;br /&gt;
* [http://coffeecode.net Dan Scott], Laurentian University&lt;br /&gt;
** Previous code4lib presentations: [https://archive.org/details/code4lib.conf.2008.pres.CouchDBsacrilege CouchDB is sacrilege... mmm, delicious sacrilege] at Code4Lib 2008&lt;br /&gt;
&lt;br /&gt;
The semantic web, linked data, and structured data are all fantastic ideas with a barrier imposed by implementation constraints. If a library's system does not allow customizations, or the institution lacks skilled human resources, it does not matter how enthused that library might be about publishing structured data... it will not happen. However, if the software in use simply publishes structured data by default, then the web will be populated for free. Really! No extra resources necessary.&lt;br /&gt;
&lt;br /&gt;
This presentation highlights Dan's work with systems such as Evergreen, Koha, and VuFind to enable the publication of schema.org structured data out-of-the-box. Along the way, we reflect on the current state of the W3C Schema.org Bibliographic Extension community group's efforts to shape the evolution of the schema.org vocabulary. Finally, hold on tight as we contemplate next steps and the possibilities of a world where structured data is the norm on the web.&lt;br /&gt;
&lt;br /&gt;
== Towards Pasta Code Nirvana: Using JavaScript MVC to Fill Your Programming Ravioli ==&lt;br /&gt;
&lt;br /&gt;
* Bret Davidson, North Carolina State University Libraries, bret_davidson@ncsu.edu&lt;br /&gt;
** Previous Code4Lib Presentations: [http://wiki.code4lib.org/index.php/2013_talks_proposals#Data-Driven_Documents:_Visualizing_library_data_with_D3.js Visualizing library data with D3.js] at Code4Lib 2013&lt;br /&gt;
&lt;br /&gt;
JavaScript MVC frameworks are ushering in a golden age of robust and responsive web applications that take advantage of evergreen browsers, performant JS engines, and the unprecedented reach provided by billions of personal computing devices. The web browser has emerged as the world’s most popular application runtime, and the complexity[1] and scope of JavaScript applications have exploded accordingly. Server-side web frameworks like Rails and Django have helped developers adhere to best practices like modularity, dependency injection, and unit testing for years, practices that are now being applied to JavaScript development through projects like Backbone[2], Ember[3], and Angular[4].&lt;br /&gt;
&lt;br /&gt;
This talk will discuss the issues JavaScript MVC frameworks are trying to solve, common features like data binding, implications for the future of web development[5], and the appropriateness of JavaScript MVC for library applications.&lt;br /&gt;
&lt;br /&gt;
*[1]http://en.wikipedia.org/wiki/Spaghetti_code&lt;br /&gt;
*[2]http://backbonejs.org&lt;br /&gt;
*[3]http://emberjs.com&lt;br /&gt;
*[4]http://angularjs.org&lt;br /&gt;
*[5]http://tomdale.net/2013/09/progressive-enhancement-is-dead/&lt;br /&gt;
&lt;br /&gt;
== WebSockets for Real-Time and Interactive Interfaces ==&lt;br /&gt;
&lt;br /&gt;
* [http://ronallo.com Jason Ronallo], NCSU Libraries, jason_ronallo@ncsu.edu&lt;br /&gt;
&lt;br /&gt;
Previous Code4Lib presentations:&lt;br /&gt;
* [http://code4lib.org/conference/2012/ronallo HTML5 Microdata and Schema.org] 2012&lt;br /&gt;
* [http://code4lib.org/conference/2013/ronallo HTML5 Video Now!] 2013&lt;br /&gt;
&lt;br /&gt;
Watching the Google Analytics Real-Time dashboard for the first time was mesmerizing. As soon as someone visited a site, I could see what page they were on. For a digital collections site with a lot of images, it was fun to see what visitors were looking at. But getting from Google Analytics to the image or other content currently being viewed was cumbersome. The real-time experience was something I wanted to share with others. I'll show you how I used a WebSocket service to create a real-time interface to digital collections.&lt;br /&gt;
&lt;br /&gt;
In the Hunt Library at NCSU we have some large video walls. I wanted to make HTML-based exhibits that featured viewer interactions. I'll show you how I converted Listen to Wikipedia [1] into a bring-your-own-device interactive exhibit. With WebSockets, any HTML page can be remote controlled by any internet-connected device.&lt;br /&gt;
&lt;br /&gt;
I will attempt to include real-time audience participation.&lt;br /&gt;
&lt;br /&gt;
[1] http://listen.hatnote.com/&lt;br /&gt;
&lt;br /&gt;
== Rapid Development of Automated Tasks with the File Analyzer ==&lt;br /&gt;
&lt;br /&gt;
* Terry Brady, Georgetown University Libraries, twb27@georgetown.edu&lt;br /&gt;
&lt;br /&gt;
The Georgetown University Libraries have customized the File Analyzer and Metadata Harvester application (https://github.com/Georgetown-University-Libraries/File-Analyzer) to solve a number of library automation challenges:&lt;br /&gt;
* validating digitized and reformatted files&lt;br /&gt;
* validating vendor statistics for COUNTER compliance&lt;br /&gt;
* preparing collections of digital files for archiving and ingest&lt;br /&gt;
* manipulating ILS import and export files&lt;br /&gt;
&lt;br /&gt;
The File Analyzer application was used by the US National Archives to validate 3.5 million digitized images from the 1940 Census.  After implementing a customized ingest workflow within the File Analyzer, the Georgetown University Libraries were able to process an ingest backlog of over a thousand files of digital resources into DigitalGeorgetown, the Libraries’ Digital Collections and Institutional Repository platform.  Georgetown is currently developing customized workflows that integrate Apache Tika, BagIt, and MARC conversion utilities.&lt;br /&gt;
&lt;br /&gt;
The File Analyzer is a desktop application with a powerful framework for implementing customized file validation and transformation rules.  As new rules are deployed, they are presented to users within a user interface that is easy (and powerful) to use.&lt;br /&gt;
&lt;br /&gt;
Learn about the functionality available for download, see how this tool can automate workflows ranging from digital collections to ILS ingests to electronic resource statistics, and discuss opportunities to collaborate on enhancements to this application!&lt;br /&gt;
&lt;br /&gt;
== GeoHydra: How to Build a Geospatial Digital Library with Fedora ==&lt;br /&gt;
 &lt;br /&gt;
* [http://stanford.edu/~drh Darren Hardy], Stanford University, drh@stanford.edu&lt;br /&gt;
&lt;br /&gt;
Geographically rich data are exploding, putting fear into those trying to integrate them into existing digital library infrastructures. Building a spatial data infrastructure that integrates with your digital library infrastructure need not be a daunting task. We have successfully deployed a geospatial digital library infrastructure using Fedora and open-source geospatial software [1]. We'll discuss the primary design decisions and technologies that led to a production deployment within a few months. Briefly, our architecture revolves around discovery, delivery, and metadata pipelines using the open-source OpenGeoPortal [2], Solr [3], GeoServer [4], PostGIS [5], and GeoNetwork [6] technologies, plus the proprietary ESRI ArcMap [7] -- the GIS industry's workhorse. Finally, we'll discuss the key skill sets needed to build and maintain a spatial data infrastructure.&lt;br /&gt;
&lt;br /&gt;
[1] http://foss4g.org&lt;br /&gt;
[2] http://opengeoportal.org&lt;br /&gt;
[3] http://lucene.apache.org/solr&lt;br /&gt;
[4] http://geoserver.org&lt;br /&gt;
[5] http://postgis.net&lt;br /&gt;
[6] http://geonetwork-opensource.org&lt;br /&gt;
[7] http://esri.com&lt;br /&gt;
&lt;br /&gt;
==Under the Hood of Hadoop Processing at OCLC Research ==&lt;br /&gt;
&lt;br /&gt;
[http://roytennant.com/ Roy Tennant]&lt;br /&gt;
&lt;br /&gt;
* Previous Code4Lib presentations: 2006: &amp;quot;The Case for Code4Lib 501c(3)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
[http://hadoop.apache.org/ Apache Hadoop] is widely used by Yahoo!, Google, and many others to process massive amounts of data quickly. OCLC Research uses a 40-node compute cluster with Hadoop and HBase to process the 300 million MARC records of WorldCat in various ways. This presentation will explain how Hadoop MapReduce works and illustrate it with specific examples and code. The role of the jobtracker in both monitoring and reporting on processes will be explained. String searching WorldCat will also be demonstrated live.&lt;br /&gt;
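As a toy illustration of the MapReduce model that Hadoop implements (not OCLC's actual code; the sample records are invented stand-ins for MARC-derived field values), a word count over a handful of records looks like this:&lt;br /&gt;

```python
from collections import defaultdict
from itertools import chain

# Toy records standing in for MARC-derived data; the real job runs over
# the 300 million records of WorldCat on a Hadoop cluster.
records = ["eng book", "fre book", "eng serial", "eng book"]

def mapper(record):
    # Emit a (key, 1) pair per token, like a Hadoop Mapper's map() call.
    for token in record.split():
        yield (token, 1)

def reducer(key, values):
    # Sum the counts for one key, like a Hadoop Reducer's reduce() call.
    return (key, sum(values))

# The "shuffle" phase groups mapper output by key before reducing;
# Hadoop does this across the cluster, with the jobtracker monitoring progress.
grouped = defaultdict(list)
for key, value in chain.from_iterable(mapper(r) for r in records):
    grouped[key].append(value)

result = dict(reducer(k, vs) for k, vs in grouped.items())
print(result)
```

The same map/shuffle/reduce shape scales from this four-record list to WorldCat because each phase only ever sees independent records or independent key groups.&lt;br /&gt;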
&lt;br /&gt;
== Quick and Easy Data Visualization with Google Visualization API and Google Chart Libraries ==&lt;br /&gt;
 &lt;br /&gt;
[http://bohyunkim.net/blog Bohyun Kim], Florida International University, bohyun.kim@fiu.edu&lt;br /&gt;
* No previous Code4Lib presentations&lt;br /&gt;
&lt;br /&gt;
Does most of the data that your library collects stay in spreadsheets or get published as static tables of boring numbers? Do your library stakeholders spend more time collecting the data than using it as a decision-making tool because the data is presented in a way that makes it hard for them to quickly grasp its significance (see the [http://developers.google.com/chart/interactive/docs/gallery chart gallery] for alternatives)?&lt;br /&gt;
&lt;br /&gt;
This talk will provide an overview of the [http://developers.google.com/chart/interactive/docs/reference Google Visualization API] and [http://developers.google.com/chart/ Google Chart Libraries] to get you started quickly querying and visualizing your library data from remote data sources (e.g. a Google Spreadsheet or your own database), with (or without) cool-looking user controls, animation effects, and even a dashboard.&lt;br /&gt;
&lt;br /&gt;
== Leap Motion + Rare Books: A hands-free way to view and interact with rare books in 3D ==&lt;br /&gt;
 &lt;br /&gt;
[http://www.youtube.com/user/jpdenzer Juan Denzer], Binghamton University, jdenzer@binghamton.edu&lt;br /&gt;
* No previous Code4Lib presentations&lt;br /&gt;
&lt;br /&gt;
As rare books become more delicate over time, making them available to the public becomes harder.  We at Binghamton University Library have developed an application that makes it easier to view rare books without ever having to touch them.  We have combined the Leap Motion hands-free device and 3D rendered models to create a new virtual experience for the viewer.&lt;br /&gt;
&lt;br /&gt;
The application allows the user to rotate and zoom in on a 3D representation of a rare book.  The user is also able to ‘open’ the virtual book and flip through it using a natural user interface, such as swiping a hand left or right to turn the page.&lt;br /&gt;
&lt;br /&gt;
The application is built on the .NET framework and is written in C#.  3D models are created using simple 3D software such as SketchUp or Blender.  Scans of the book cover and spine are created using simple flatbed scanners.  The inside pages are scanned using overhead scanners. &lt;br /&gt;
&lt;br /&gt;
This talk will discuss the technologies used in developing the application, which virtually any library could implement with little to no coding at all. The presentation will include a demonstration of the software and a chance for audience members to experience the Rare Book Leap Motion App themselves.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Course Reserves Unleashed! ==&lt;br /&gt;
 &lt;br /&gt;
* Bobbi Fox, Library Technology Services, Harvard University, bobbi_fox@harvard.edu&lt;br /&gt;
* Gloria Korsman, Andover-Harvard Theological Library&lt;br /&gt;
** No previous Code4Lib presentations &lt;br /&gt;
&lt;br /&gt;
Hey kids!  Remember when SOAP was used for something other than washing?  Our sophisticated (and highly functional) Course Reserves Request system does!&lt;br /&gt;
&lt;br /&gt;
However, while the system is great for submitting and processing course reserve requests, the student-facing presentation through Harvard’s home-grown -- and soon to be replaced -- LMS leaves a lot to be desired.  &lt;br /&gt;
&lt;br /&gt;
Follow along as we leverage Solr 4 as a NoSQL database, along with more progressive RESTful API techniques, to release Reserves data into the wild without interfering with reserves request processing -- and, in the process, open up the opportunity for other schools to feed their data in as well.&lt;br /&gt;
&lt;br /&gt;
== We Are All Disabled! Universal Web Design Making Web Services Accessible for Everyone ==&lt;br /&gt;
 &lt;br /&gt;
Cynthia Ng, Accessibility Librarian, CILS at Langara College&lt;br /&gt;
* No previous Code4Lib presentations (not counting lightning talks)&lt;br /&gt;
&lt;br /&gt;
We’re building and improving tools and services all the time, but do you only develop for the “average” user or add things for “disabled” users? We all use “assistive” technology accessing information in a multitude of ways with different platforms, devices, etc. Let’s focus on providing web services that are accessible to everyone without it being onerous or ugly. The aim is to get you thinking about what you can do to make web-based services and content more accessible for all from the beginning or with small amounts of effort whether you're a developer or not.&lt;br /&gt;
&lt;br /&gt;
The goal of the presentation is to provide both developers and content creators with information on simple, practical ways to make web content and web services more accessible. However, rather than thinking about putting in extra effort or making adjustment for those with disabilities, I want to help people think about how to make their websites more accessible for all users through universal web design.&lt;br /&gt;
&lt;br /&gt;
== Personalize your Google Analytics Data with Custom Events and Variables ==&lt;br /&gt;
&lt;br /&gt;
[http://joshwilson.net Josh Wilson], Systems Integration Librarian, State Library of North Carolina - joshwilsonnc@gmail.com&lt;br /&gt;
&lt;br /&gt;
At the State Library of North Carolina, we had more specific questions about the use of our digital collections than standard GA could provide. A few implementations of custom events and custom variables later, we have our answers.&lt;br /&gt;
&lt;br /&gt;
I'll demonstrate how these analytics add-ons work, and why implementation can sometimes be more complicated than just adding a few lines of JavaScript to your ga.js. I'll discuss some specific examples in use at the SLNC:&lt;br /&gt;
&lt;br /&gt;
* Capturing the content of specific metadata fields in CONTENTdm as Custom Events &lt;br /&gt;
* Recording Drupal taxonomy terms as Custom Variables&lt;br /&gt;
&lt;br /&gt;
In both instances, this data deepened our understanding of how our sites and collections were being used, and in turn, we were able to report usage more accurately to content contributors and other stakeholders.&lt;br /&gt;
&lt;br /&gt;
More on: [https://developers.google.com/analytics/devguides/collection/gajs/eventTrackerGuide GA Custom Events] | [https://developers.google.com/analytics/devguides/collection/gajs/gaTrackingCustomVariables GA Custom Variables]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Behold Fedora 4: The Incredible Shrinking Repository! ==&lt;br /&gt;
&lt;br /&gt;
Esmé Cowles, UC San Diego Library.  Previous talk: [http://code4lib.org/conference/2013/cowles-critchlow-westbrook All Teh Metadatas Re-Revisited] (2013)&lt;br /&gt;
&lt;br /&gt;
* One repository contains untold numbers of digital objects and powers many Hydra and Islandora apps&lt;br /&gt;
* It speaks RDF, but contains no triplestore! (triplestores sold separately, SPARQL Update may be involved, some restrictions apply)&lt;br /&gt;
* Flexible enough to tie itself in knots implementing storage and access control policies&lt;br /&gt;
* Witness feats of strength and scalability, with dramatically increased performance and clustering&lt;br /&gt;
* Plumb the depths of bottomless hierarchies, and marvel at the metadata woven into the very fabric of the repository&lt;br /&gt;
* Ponder the paradox of ingesting large files by not ingesting them&lt;br /&gt;
* Be amazed as Fedora 4 swallows other systems whole (including Fedora 3 repositories)&lt;br /&gt;
* Watch novice developers set up Fedora 4 from scratch, with just a handful of incantations to Git and Maven&lt;br /&gt;
&lt;br /&gt;
The Fedora Commons Repository is the foundation of many digital collections, e-research, digital library, archives, digital preservation, institutional repository and open access publishing systems.  This talk will focus on how Fedora 4 improves core repository functionality, adds new features, maintains backwards compatibility, and addresses the shortcomings of Fedora 3.&lt;br /&gt;
&lt;br /&gt;
== Organic Free-Range API Development - Making Web Services That You Will Actually Want to Consume ==&lt;br /&gt;
&lt;br /&gt;
Steve Meyer and Karen Coombs, OCLC&lt;br /&gt;
&lt;br /&gt;
Building web services can have great benefits by providing reusability of data and functionality. Underpinning your applications with a web service will allow you to write code once and support multiple environments: your library's web app, mobile applications, the embedded widget in your campus portal. However, building a web service is its own kind of artful programming. Doing it well requires attention to many of the same techniques and requirements as building web applications, though with different outcomes. &lt;br /&gt;
&lt;br /&gt;
So what are the usability principles for web services? How do you build a web service that you (and others) will actually want to use? In this talk, we’ll share some of the lessons learned - the good, the bad, and the ugly - through OCLC's work on the WorldCat Metadata API. This web service is a sophisticated API that provides external clients with read and write access to WorldCat data. It provides a model to help aspiring API creators navigate the potential complications of crafting a web service. We'll cover:&lt;br /&gt;
&lt;br /&gt;
* Loose coupling of data assets and resource-oriented data modeling at the core&lt;br /&gt;
* Coding to standards vs. exposure of an internal data model&lt;br /&gt;
* Authentication and security for web services: API Keys, Digital Signing, OAuth Flows&lt;br /&gt;
* Building web services that behave as a suite so it looks like the left hand knows what the right hand is doing&lt;br /&gt;
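As a concrete illustration of the digital-signing bullet above, here is a minimal HMAC-based request signature in the spirit of such schemes. The header names, message layout, and sign_request helper are invented for illustration; they are not the WorldCat Metadata API's actual algorithm.&lt;br /&gt;

```python
import hashlib
import hmac
import time

def sign_request(secret, method, path, body=""):
    """Sign an API request with an HMAC-SHA256 digest over the
    method, path, timestamp, and body (illustrative scheme only)."""
    timestamp = str(int(time.time()))
    message = "\n".join([method, path, timestamp, body])
    digest = hmac.new(secret.encode(), message.encode(),
                      hashlib.sha256).hexdigest()
    # The server recomputes the digest with its copy of the secret
    # and rejects requests whose signature or timestamp is off.
    return {"X-Timestamp": timestamp, "X-Signature": digest}

headers = sign_request("s3cret", "GET", "/bib/12345")
```

Schemes like this avoid sending the shared secret over the wire at all, which is one reason signed requests and OAuth flows tend to displace bare API keys as a service matures.&lt;br /&gt;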
&lt;br /&gt;
So at the end of the day, your team will know your API is a very good egg after all. &lt;br /&gt;
&lt;br /&gt;
If accepted, the presenters intend to produce and share a Quick Guide for building a web service that will reflect content presented in the talk.&lt;br /&gt;
&lt;br /&gt;
== Lucene's Latest (for Libraries) ==&lt;br /&gt;
&lt;br /&gt;
erik.hatcher@lucidworks.com&lt;br /&gt;
&lt;br /&gt;
Lucene powers the search capabilities of practically all library discovery platforms, by way of Solr, etc.  The Lucene project evolves rapidly, and it's a full-time job to keep up with the ever improving features and scalability.   This talk will distill and showcase the most relevant(!) advancements to date.&lt;br /&gt;
&lt;br /&gt;
== The Why and How of Very Large Displays in Libraries. ==&lt;br /&gt;
&lt;br /&gt;
* Cory Lown, NCSU Libraries, cwlown@ncsu.edu&lt;br /&gt;
&lt;br /&gt;
Previous Code4Lib Presentations:&lt;br /&gt;
* [http://code4lib.org/conference/2012/lown How People Search the Library from a Single Search Box]  2012&lt;br /&gt;
* [http://code4lib.org/conference/2010/orphanides_lown_lynema Enhancing Discoverability with Virtual Shelf Browse] 2010&lt;br /&gt;
&lt;br /&gt;
Built into the walls of NC State's new Hunt Library are several [http://www.christiedigital.com/en-us/digital-signage/products/microtiles/pages/microtiles-digital-signage-video-wall.aspx Christie MicroTile Display Wall Systems]. What does a library do with a display that's seven feet tall and over twenty feet wide? I'll talk about why libraries might want large displays like this, what we're doing with them right now, and what we might do with them in the future. I'll talk about how these displays factor into planning for new and existing web projects. And I'll get into the fun details of how you build web applications that scale from the very small browser window on a phone all the way up to a browser window with about 14 million pixels (about 10 million more than a dual 24&amp;quot; monitor desktop setup).&lt;br /&gt;
&lt;br /&gt;
== Discovering your Discovery System in Real Time. ==&lt;br /&gt;
&lt;br /&gt;
* Godmar Back, Virginia Tech, gback@vt.edu&lt;br /&gt;
* Annette Bailey, Virginia Tech, afbailey@vt.edu&lt;br /&gt;
&lt;br /&gt;
Practically all libraries today provide web-based discovery systems to their users;&lt;br /&gt;
users discover items and peruse or check them out by clicking on links.  Unlike&lt;br /&gt;
the traditional transaction of checking out a book at the circulation desk, this&lt;br /&gt;
interaction is largely invisible.  We have built a system that records user's&lt;br /&gt;
interaction with Summon in real-time, processes the resulting data with minimal delay,&lt;br /&gt;
and visualizes it in various ways using Google Charts and using various d3.js modules,&lt;br /&gt;
such as word clouds, tree maps, and others.&lt;br /&gt;
&lt;br /&gt;
These visualizations can be embedded in web sites, but are also suitable for&lt;br /&gt;
projection via large-scale displays or projectors right into the 'Learning Spaces'&lt;br /&gt;
many libraries are converted into.  The goal of this talk is to share the technology&lt;br /&gt;
and advocate the building of a cloud-based infrastructure that would make this&lt;br /&gt;
technology available to any library that uses a discovery system, rather than just&lt;br /&gt;
those who have the technological prowess for developing such systems and&lt;br /&gt;
visualizations in-house.  &lt;br /&gt;
&lt;br /&gt;
Previous presentations at Code4Lib:&lt;br /&gt;
* Talk: Code4Lib 2009 [http://code4lib.org/files/LibX2.0-Code4Lib-2009AsPresented.ppt LibX 2.0]&lt;br /&gt;
* Preconference: [http://wiki.code4lib.org/index.php/LibX_Preconference LibX 2.0, 2009]&lt;br /&gt;
* Preconference: Code4Lib 2010, On Widgets and Web Services&lt;br /&gt;
&lt;br /&gt;
== Your Library, Anywhere: A Modern, Responsive Library Catalogue at University of Toronto Libraries ==&lt;br /&gt;
&lt;br /&gt;
* Bilal Khalid, Gordon Belray, Lisa Gayhart (lisa.gayhart@utoronto.ca)&lt;br /&gt;
&lt;br /&gt;
* No previous Code4Lib presentations&lt;br /&gt;
&lt;br /&gt;
With the recent surge in the mobile device market and an ever expanding patron base with increasingly divergent levels of technical ability, the University of Toronto Libraries embarked on the development of a new catalogue discovery layer to fit the needs of its diverse users. &lt;br /&gt;
&lt;br /&gt;
[http://search.library.utoronto.ca The result]: a mobile-friendly, flexible and intuitive web application that brings the full power of a faceted library catalogue to users without compromising quality or performance, employing Responsive Web Design principles. This talk will discuss: application development; service improvements; interface design; and user outreach, testing, and project communications. Feedback and questions from the audience are very welcome. If time runs short, we will be available for questions and conversation after the presentation.&lt;br /&gt;
&lt;br /&gt;
Note: A version of this content has been provisionally accepted as an article for Code4Lib Journal, January 2014 publication.&lt;br /&gt;
&lt;br /&gt;
== All Tiled Up ==&lt;br /&gt;
&lt;br /&gt;
* Mike Graves, MIT Libraries (mgraves@mit.edu)&lt;br /&gt;
&lt;br /&gt;
You've got maps. You even scanned and georeferenced them. Now what? Running a full GIS stack can be expensive, and overkill in some cases. The good news is that you have a lot more options now than you did just a few years ago. I'd like to present some lighter weight solutions to making georeferenced images available on the Web.&lt;br /&gt;
&lt;br /&gt;
This talk will provide an introduction to MBTiles. I'll go over what they are, how you create them, how you use them and why you would use them.&lt;br /&gt;
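As a taste of how lightweight the format is, an MBTiles file is just a SQLite database with a tiles table (plus a metadata table), so a tileset can be read with nothing but the standard library. The sketch below builds a one-tile file in memory and reads it back; the table layout follows the MBTiles spec, but the helper names are our own.&lt;br /&gt;

```python
import sqlite3

def create_minimal_mbtiles(path):
    """Build a tiny MBTiles file from scratch; the schema below is
    the one required by the MBTiles specification."""
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE metadata (name TEXT, value TEXT)")
    con.execute("CREATE TABLE tiles (zoom_level INTEGER, "
                "tile_column INTEGER, tile_row INTEGER, tile_data BLOB)")
    con.execute("INSERT INTO metadata VALUES ('name', 'demo'), "
                "('format', 'png')")
    con.execute("INSERT INTO tiles VALUES (0, 0, 0, ?)",
                (b"fake-png-bytes",))
    con.commit()
    return con

def read_tile(con, z, x, y):
    # tile_row uses the TMS scheme (y axis flipped relative to the
    # common XYZ scheme), which trips up many first implementations.
    row = con.execute(
        "SELECT tile_data FROM tiles WHERE zoom_level=? "
        "AND tile_column=? AND tile_row=?", (z, x, y)).fetchone()
    return row[0] if row else None

con = create_minimal_mbtiles(":memory:")
tile = read_tile(con, 0, 0, 0)
```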
&lt;br /&gt;
== The Great War: Image Interoperability to Facebook ==&lt;br /&gt;
&lt;br /&gt;
* Rob Sanderson, Los Alamos National Laboratory (azaroth42@gmail.com)&lt;br /&gt;
** (Code4Lib 2006: [http://www.code4lib.org/2006/sanderson Library Text Mining])&lt;br /&gt;
* Rob Warren, Carleton University&lt;br /&gt;
** No previous presentations&lt;br /&gt;
&lt;br /&gt;
Using a pipeline constructed from Linked Open Data and other interoperability specifications, it is possible to merge and re-use image and textual data from distributed library collections to build new, useful tools and applications.  Starting with the OAI-PMH interface to CONTENTdm, we will take you on a tour through the International Image Interoperability Framework and Shared Canvas, to a cross-institutional viewer, and on to image analysis for building a historical Facebook by finding and tagging people in photographs.  The World War One collections are drawn from multiple institutions and merged by the machine learning code.&lt;br /&gt;
&lt;br /&gt;
The presentation will focus on the (open source) toolchain and the benefits of the use of standards throughout:  OAI-PMH to get the metadata, IIIF for interaction with the images, the Shared Canvas ontology for describing collections of digitized objects, Open Annotation for tagging things in the images and specialized ontologies that are specific to the contents.  The tools include standard RDF / OWL technologies, JSON-LD, imagemagick and OpenCV for image analysis.&lt;br /&gt;
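To illustrate the IIIF piece of the toolchain: the Image API addresses every derivative of an image with a URI of the form base/identifier/region/size/rotation/quality.format. Below is a minimal sketch of composing such a URI, assuming Image API 1.1 conventions (where the quality value is "native"); the helper function and server URL are hypothetical.&lt;br /&gt;

```python
def iiif_image_url(base, identifier, region="full", size="full",
                   rotation="0", quality="native", fmt="jpg"):
    """Compose an IIIF Image API request URI from its five
    path parameters (region/size/rotation/quality.format)."""
    return "/".join([base, identifier, region, size, rotation,
                     quality]) + "." + fmt

# Request a 600x400 pixel region, scaled to 300px wide, rotated 90 degrees.
url = iiif_image_url("https://iiif.example.org/images", "photo-001",
                     region="0,0,600,400", size="300,", rotation="90")
```

Because every client derives the same URI from the same parameters, any IIIF-compliant viewer can fetch tiles and derivatives from any compliant server, which is what makes the cross-institutional viewer possible.&lt;br /&gt;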
&lt;br /&gt;
== Visualizing Solr Search Results with D3.js for User-Friendly Navigation of Large Results Sets ==&lt;br /&gt;
&lt;br /&gt;
*Julia Bauder, Grinnell College Libraries (bauderj-at-grinnell-dot-edu)&lt;br /&gt;
*No previous presentations at national Code4Lib conferences&lt;br /&gt;
&lt;br /&gt;
As the corpus of articles, books, and other resources searched by discovery systems continues to get bigger, searchers are more and more frequently confronted with unmanageably large numbers of results. How can we help users make sense of 10,000 hits and find the ones they actually want? Facets help, but making sense of a gigantic sidebar of facets is not an easy task for users, either.&lt;br /&gt;
During this talk, I will explain how we will soon be using Solr 4’s pivot queries and hierarchical visualizations (e.g., treemaps) from D3.js to let patrons view and manipulate search results. We will be doing this with our VuFind 2.0 catalog, but this technique will work with any system running Solr 4. I will also talk about early student reaction to our tests of these visualization features.&lt;br /&gt;
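To make the idea concrete: Solr's facet.pivot output is a recursive list of field/value/count nodes, which maps naturally onto the nested name/children structure that D3's hierarchical layouts (treemaps included) consume. A minimal sketch of that transformation, with made-up sample data:&lt;br /&gt;

```python
def pivot_to_tree(pivots):
    """Convert Solr facet.pivot results into the nested
    {name, children} / {name, size} shape d3 hierarchies expect."""
    nodes = []
    for p in pivots:
        node = {"name": str(p["value"])}
        if p.get("pivot"):
            # Inner pivot level becomes a child branch.
            node["children"] = pivot_to_tree(p["pivot"])
        else:
            # Leaf counts become treemap cell sizes.
            node["size"] = p["count"]
        nodes.append(node)
    return nodes

# Shape mimics Solr's response for facet.pivot=format,lang.
sample = [{"field": "format", "value": "Book", "count": 9000,
           "pivot": [{"field": "lang", "value": "eng", "count": 7500},
                     {"field": "lang", "value": "fre", "count": 1500}]}]
tree = {"name": "results", "children": pivot_to_tree(sample)}
```

The resulting tree can be serialized to JSON and handed straight to a d3 treemap layout.&lt;br /&gt;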
&lt;br /&gt;
== PeerLibrary – open source cloud based collaborative library ==&lt;br /&gt;
&lt;br /&gt;
[https://github.com/peerlibrary/peerlibrary PeerLibrary is a new open source project] and a cloud service providing collaborative reading, sharing and storing. Users can upload publications they want to read (currently in PDF format), read them in the browser in real time with others, and highlight, annotate and organize their own or a collaborative library. PeerLibrary provides a search engine over all uploaded open access publications. Additionally, it aims to collaboratively aggregate an open layer of knowledge on top of these publications through the public annotations and references users add to them. In this way publications become not just available to read, but accessible to the general public as well. Currently, PeerLibrary is aimed at the scientific community and scientific publications.&lt;br /&gt;
&lt;br /&gt;
See [http://blog.peerlibrary.org/post/63458789185/screencast-previewing-the-peerlibrary-project screencast here].&lt;br /&gt;
&lt;br /&gt;
It is still in development, and a beta launch is planned for the end of November.&lt;br /&gt;
&lt;br /&gt;
== Who was where when, or finding biographical articles on Wikipedia by place and time ==&lt;br /&gt;
&lt;br /&gt;
* [http://morton-owens.info Emily Morton-Owens], The Seattle Public Library (presenting on work from NYU)&lt;br /&gt;
* No previous c4l presentations&lt;br /&gt;
&lt;br /&gt;
It's easy to answer the question &amp;quot;What important people were in Paris in 1939?&amp;quot; But what about Virginia in the 1750s or Scandinavia in the 14th century? I created a tool that allows you to search for biographies in a generally applicable way, using a map interface. I would like to present updates to my thesis project, which combines a crawler written in Java that extracts information from Wikipedia articles, with a MongoDB data store and a frontend in Python.&lt;br /&gt;
&lt;br /&gt;
The input to the project is the free text of entire Wikipedia articles; this is important because it allows us to pick up Benjamin Franklin not just in the single most obvious place of Philadelphia but also in London, Paris, Boston, etc. I can talk about my experiments disambiguating place names (approaches pioneered on newspaper articles were actually unhelpful on this type of text) and setting up a processing queue that does not become mired in the biographies of every human who ever played soccer. I also want to revisit some of the implementation choices I made under my academic deadline and improve the tool's accuracy and usability.&lt;br /&gt;
&lt;br /&gt;
What I hope to show is that I was able to develop a novel and useful reference tool automatically, using fairly simple heuristics that are a far cry from the hand-cataloging familiar to many librarians.&lt;br /&gt;
&lt;br /&gt;
You can try out [http://linserv1.cims.nyu.edu:48866/ the original version] (this server is inconveniently set to be updated/rebooted on 11/8--may be temporarily unavailable)&lt;br /&gt;
&lt;br /&gt;
== Good!, DRY, and Dynamic: Content Strategy for Libraries (Especially the Big Ones) ==&lt;br /&gt;
&lt;br /&gt;
*Michael Schofield, Nova Southeastern University Libraries, mschofield@nova.edu&lt;br /&gt;
*No previous code4lib presentations.&lt;br /&gt;
&lt;br /&gt;
The responsibilities of the #libweb are exploding [it’s a good thing] and it is no longer uncommon for libraries to manage or even home-grow multiple applications and sites. Often it is at this point where the web people begin to suffer the absence of a content strategy when, say, business hours need to be updated sitewide a half-dozen times.&lt;br /&gt;
&lt;br /&gt;
We were already feeling this crunch when we decided to further complicate the Nova Southeastern University Libraries by splitting the main library website into two. The Alvin Sherman Library, Research, and Information Technology Center is a unique joint-use facility that serves not only the academic community but the public of Broward County - and marketing a hyperblend of content through one portal just wasn't cutting it. With a web team of two, we knew that managing all this rehashed, disparate content was totally unsustainable.&lt;br /&gt;
&lt;br /&gt;
I want to share in this talk how I went about making our library content DRY (“don’t repeat yourself”): input content in one place--blurbs, policies, featured events, featured databases, book reviews, business hours, and so on--and syndicate it everywhere; even, sometimes, dynamically target that content for specific audiences or contexts. It is a presentation that is a little about workflow, a little more about browser and context detection, a tangent about content-modeling the CMS, and a lot about APIs, syndication, and performance.&lt;br /&gt;
&lt;br /&gt;
== No code, no root, no problem? Adventures in SaaS and library discovery ==&lt;br /&gt;
&lt;br /&gt;
*[mailto:erwhite@vcu.edu Erin White, VCU]&lt;br /&gt;
*No previous C4L presentations&lt;br /&gt;
&lt;br /&gt;
In 2012 VCU was an eager early adopter of Ex Libris' cloud service Alma as an ILS, ERM, link resolver, and single-stop, de-silo'd public-facing discovery tool. This has been a disruptive change that has shifted our systems staff's day-to-day work, relationships with others in the library, and relationships with vendors.&lt;br /&gt;
&lt;br /&gt;
I'll share some of our experiences and takeaways from implementing and maintaining a cloud service:&lt;br /&gt;
* Seeking disruption and finding it&lt;br /&gt;
* Changing expectations of service and the reality of unplanned downtime&lt;br /&gt;
* Communication and problem resolution with non-IT library staff&lt;br /&gt;
* Working with a vendor that uses agile development methodology&lt;br /&gt;
* Benefits and pitfalls of creating customizations and code workarounds&lt;br /&gt;
* Changes in library IT/coders' roles with SaaS&lt;br /&gt;
&lt;br /&gt;
...as well as thoughts on the philosophy of library discovery vs real-life experiences in moving to a single-search model.&lt;br /&gt;
&lt;br /&gt;
== Building for others (and ourselves):  the Avalon Media System ==&lt;br /&gt;
* [mailto:michael.klein@northwestern.edu Michael B Klein], Senior Software Developer, Northwestern University &lt;br /&gt;
** [http://code4lib.org/conference/2010/metz_klein Public Datasets in the Cloud] (code4lib 2010)&lt;br /&gt;
** [http://code4lib.org/conference/2013/klein-rogers The Avalon Media System: A Next Generation Hydra Head For Audio and Video Delivery] (code4lib 2013)&lt;br /&gt;
* [mailto:j-rudder@northwestern.edu Julie Rudder], Digital Initiatives Project Manager, Northwestern University&lt;br /&gt;
** no previous code4lib presentations&lt;br /&gt;
&lt;br /&gt;
[http://www.avalonmediasystem.org/ Avalon Media System] is a collaborative effort between development teams at Northwestern and Indiana Universities. Our goal is to produce an open source media management platform that works well for us, but is also widely adopted and contributed to by other institutions. We believe that building a strong user and contributor community is vital to the success and longevity of the project, and have developed the system with this goal in mind. We will share lessons learned, pains and successes we’ve had releasing two versions of the application since last year.  &lt;br /&gt;
&lt;br /&gt;
Our presentation will cover our experiences:&lt;br /&gt;
* providing flexible, admin-friendly distribution and installation options&lt;br /&gt;
* building with abstraction, customization and local integrations in mind&lt;br /&gt;
* prioritizing features (user stories)&lt;br /&gt;
* attracting code contributions from other institutions&lt;br /&gt;
* gathering community feedback &lt;br /&gt;
* creating a product rather than a bag of parts&lt;br /&gt;
&lt;br /&gt;
== How to check your data to provide a great data product? Data quality as a key product feature at Europeana ==&lt;br /&gt;
&lt;br /&gt;
*[mailto:Peter.Kiraly@kb.nl Péter Király] portal backend developer, Europeana&lt;br /&gt;
*No previous C4L presentations&lt;br /&gt;
&lt;br /&gt;
[http://Europeana.eu/ Europeana.eu] - Europe's digital library, archive and museum - aggregates more than 30 million metadata records from more than 2200 institutions.  The records come from libraries, archives, museums and every other kind of cultural institution, from very different systems and metadata schemas, and are typically transformed several times before they are ingested into the Europeana data repository.  Europeana builds a consolidated database from these records, creating reliable and consistent services for end-users (a search portal, search widget, mobile apps, thematic sites etc.) and an API, which supports our strategic goal of enabling data reuse in education, creative industries, and the cultural sector.  A reliable &amp;quot;data product&amp;quot; is thus at the core of our own software products, as well as those of our API partners.&lt;br /&gt;
&lt;br /&gt;
Much effort is needed to smooth out local differences in the metadata curation practice of our data providers. We need a solid framework to measure the consistency of our data and provide feedback to decision-makers inside and outside the organisation. We can also use this metrics framework to ask content providers to improve their own metadata. Of course, a data-quality-driven approach requires that we also improve the data transformation steps of the Europeana ingestion process itself. Data quality issues heavily define what new features we are able to create in our user interfaces and API, and might actually affect the design and implementation of our underlying data structure, the Europeana Data Model.&lt;br /&gt;
&lt;br /&gt;
In the presentation I briefly describe the Europeana metadata ingestion process, show the data quality metrics, the measuring techniques (using the Europeana API, Solr and MongoDB queries), some typical problems (both trivial and difficult ones), and finally the feedback mechanism we propose to deploy.&lt;br /&gt;
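One of the simplest metrics of this kind is field completeness: the share of records that carry a non-empty value for each field. The sketch below illustrates the idea on toy records; it is our own illustrative metric, not Europeana's actual measurement framework.&lt;br /&gt;

```python
def field_completeness(records, fields):
    """Fraction of records with a non-empty value for each field --
    a simple data-quality metric to report back to providers."""
    totals = {f: 0 for f in fields}
    for rec in records:
        for f in fields:
            if rec.get(f):  # missing or empty values do not count
                totals[f] += 1
    n = len(records)
    return {f: totals[f] / n for f in fields}

# Toy metadata records standing in for aggregated provider data.
records = [
    {"title": "Mona Lisa", "creator": "Leonardo", "subject": ""},
    {"title": "Untitled", "creator": "", "subject": "abstract art"},
]
scores = field_completeness(records, ["title", "creator", "subject"])
```

Run per provider, a report like this makes it easy to show a data partner exactly which fields need curation before ingestion.&lt;br /&gt;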
&lt;br /&gt;
Keywords: Europeana, data quality, EDM, API, Apache Solr, MongoDB, #opendata, #openglam&lt;br /&gt;
&lt;br /&gt;
== Teach your Fedora to Fly: scaling out a digital repository ==&lt;br /&gt;
&lt;br /&gt;
*[mailto:acoburn@amherst.edu Aaron Coburn], Software Developer, Amherst College&lt;br /&gt;
*No previous C4L presentations&lt;br /&gt;
&lt;br /&gt;
Fedora is a great repository system for managing large collections of digital objects, but what happens when a popular food magazine begins directing a large number of readers to a manuscript showing Emily Dickinson’s own recipe for doughnuts? While Fedora excels in its support of XML-based metadata, it doesn’t always perform well under a high volume of traffic. Nor is it especially tolerant of network or hardware failures.&lt;br /&gt;
&lt;br /&gt;
This presentation will show how we are making heavy use of a Fedora repository while insulating it almost entirely from web traffic. Starting with a distributed web front-end built with Node.js, and caching most of the user-accessible content from Fedora in an elastic, fault-tolerant Riak (NoSQL) cluster, we have eliminated nearly all single points of failure in the system. It also means that our production system is spread across twelve separate servers, where asynchrony and Map-Reduce are king. And aside from being blazing fast, it is also entirely Hydra-compliant.&lt;br /&gt;
&lt;br /&gt;
Furthermore, we will attempt to answer the question: if Fedora crashes and the visitors to your site don’t notice, did it really fail?&lt;br /&gt;
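The insulation described above is essentially the cache-aside pattern: reads are served from the cache layer, and only misses ever touch the repository. A toy sketch of the idea, with a plain dict standing in for Riak and a stub function standing in for Fedora:&lt;br /&gt;

```python
class CacheAsideFrontEnd:
    """Serve reads from a cache and fall back to the repository only
    on a miss -- a toy stand-in for the Node.js/Riak layer."""
    def __init__(self, repository_fetch):
        self.fetch = repository_fetch
        self.cache = {}      # stand-in for the Riak cluster
        self.repo_hits = 0   # how often the repository was touched

    def get(self, pid):
        if pid in self.cache:
            return self.cache[pid]
        self.repo_hits += 1  # only cache misses reach the repository
        value = self.fetch(pid)
        self.cache[pid] = value
        return value

def fedora_fetch(pid):
    # Stand-in for an actual request to the Fedora repository.
    return f"object-{pid}"

front = CacheAsideFrontEnd(fedora_fetch)
first = front.get("amherst:123")   # miss: hits the repository
second = front.get("amherst:123")  # hit: served from cache
```

Invalidating (or expiring) cached objects on update keeps the repository authoritative while the cache absorbs read traffic.&lt;br /&gt;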
&lt;br /&gt;
&lt;br /&gt;
== Using Open Source Software and Freeware to Preserve and Deliver Digital Videos ==&lt;br /&gt;
* [mailto:wfang@kinoy.rutgers.edu Wei Fang], Head of Digital Services, Rutgers University Law Library&lt;br /&gt;
* Jiebei Luo, Digital Projects Initiative Intern, Rutgers University&lt;br /&gt;
*No previous C4L presentations&lt;br /&gt;
&lt;br /&gt;
The Rutgers University Law Library is the official digital repository of the New Jersey Supreme Court oral arguments since 2002. This large video collection contains approximately 3,000 videos with a total of 400 GB or 6,000 viewing hours. With the expansion of this collection, the existing database and the static website could not efficiently support the library’s daily operations and meet its patrons’ search needs. &lt;br /&gt;
By utilizing open source software and freeware such as Ubuntu, FFmpeg, Solr and Drupal, the library was able to develop a complete solution for re-encoding videos, embedding subtitles, incorporating the Solr search engine and a content management system to support full-text subtitle search, automatically updating video metadata records in the library catalog system, and ultimately providing a plug-in-free HTML5-based web interface for patrons to view the videos online.&lt;br /&gt;
The aspects below will be presented in detail at the conference:&lt;br /&gt;
* Video codecs comparison&lt;br /&gt;
* Server-end batch video encoding/re-encoding&lt;br /&gt;
* HTML5 video tag and embedding subtitles&lt;br /&gt;
* Incorporating the Solr search engine and the content management tool Drupal with the database to retrieve videos by full-text search, especially in subtitle files&lt;br /&gt;
* Incorporating video metadata with the library catalog system&lt;br /&gt;
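For the batch encoding step, a driver script typically just assembles one ffmpeg command line per source file. The sketch below shows plausible flags for an HTML5-friendly H.264/AAC target with a soft subtitle track; the flags and file names are illustrative, not the library's actual settings.&lt;br /&gt;

```python
from pathlib import Path

def encode_command(src, out_dir, subtitle=None):
    """Build an ffmpeg command line for H.264/AAC re-encoding,
    optionally muxing in an SRT subtitle file as a mov_text track."""
    out = Path(out_dir) / (Path(src).stem + ".mp4")
    cmd = ["ffmpeg", "-i", str(src)]
    if subtitle:
        # Second input plus mov_text codec embeds soft subtitles in MP4.
        cmd += ["-i", str(subtitle), "-c:s", "mov_text"]
    # +faststart moves the moov atom up front so playback starts
    # before the whole file downloads -- important for the web.
    cmd += ["-c:v", "libx264", "-c:a", "aac",
            "-movflags", "+faststart", str(out)]
    return cmd

cmd = encode_command("oral_argument_2013.avi", "/var/www/videos",
                     subtitle="oral_argument_2013.srt")
```

A batch job then just loops this over every file in the collection, feeding each command to subprocess.&lt;br /&gt;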
&lt;br /&gt;
== Shared Vision, Shared Resources: the Curate Institutional Repository ==&lt;br /&gt;
* Dan Brubaker Horst, University of Notre Dame &lt;br /&gt;
** [http://code4lib.org/conference/2011/JohnsonHorst A Community-Based Approach to Developing a Digital Exhibit at Notre Dame Using the Hydra Framework] &lt;br /&gt;
* Julie Rudder, Northwestern University&lt;br /&gt;
** no previous presentations&lt;br /&gt;
&lt;br /&gt;
Curate is being collaboratively developed by several institutions in the Hydra community who share the need and vision for a Fedora-backed institutional repository. The first release of Curate was a collaboration between Notre Dame and Northwestern University, along with Digital Curation Experts (DCE), a vendor hired jointly by our two institutions. The team worked quickly to release the first version of Curate, powered by the Hydra engine Sufia, in October 2013; it provides a basic self-deposit system with support for various content types, collection building, DOI minting, and user profile creation. From the very beginning we have built Curate to be easy to theme and extend, in order to ease installation and use by other institutions.&lt;br /&gt;
&lt;br /&gt;
In December 2013, additional partners will join the project, including Indiana University, the University of Cincinnati and the University of Virginia. Each institution contributes resources to the project in order to further our common goal: to create a product that fits our needs and has a sustainable future. Together we will tackle additional content types (like complex data, software, and media), administrative collections and more.&lt;br /&gt;
&lt;br /&gt;
Our presentation will include:&lt;br /&gt;
* a brief demonstration of Curate and technical overview&lt;br /&gt;
* why and how we work together&lt;br /&gt;
* why build Curate&lt;br /&gt;
* the future of the project&lt;br /&gt;
&lt;br /&gt;
== Solr, Cloud and Blacklight ==&lt;br /&gt;
* David Jiao, Library Information Systems, Indiana University at Bloomington, djiao@indiana.edu&lt;br /&gt;
** No previous code4lib presentations&lt;br /&gt;
&lt;br /&gt;
SolrCloud refers to the distributed capabilities in Solr 4. It is designed to offer a highly available, fault-tolerant environment by organizing data into multiple pieces that can be hosted on multiple machines with replicas, and by providing centralized cluster configuration and management.&lt;br /&gt;
&lt;br /&gt;
At Indiana University, we are upgrading the Solr backend of our recently released Blacklight-based OPAC from Solr 1.4 to Solr 4, and we have also been building a private cloud of Solr 4 servers. In this talk, I will present key features of SolrCloud, including distributed requests, fault tolerance, near-real-time indexing/searching, and configuration management with ZooKeeper, along with our experiences utilizing these features to provide better performance and architecture for our OPAC, which serves over 7 million bibliographic records to over 100 thousand students and faculty members. I will also discuss some practical lessons learned from our SolrCloud setup/upgrade and the integration of the new SolrCloud with our customized Blacklight system.&lt;br /&gt;
&lt;br /&gt;
== Leveraging XSD's for Reflective, Live Dataset Support in Institutional Repositories ==&lt;br /&gt;
* [mailto:msulliva@ufl.edu Mark Sullivan], Library Information Technology, University of Florida&lt;br /&gt;
** No previous code4lib presentations&lt;br /&gt;
&lt;br /&gt;
The University of Florida Libraries are currently adding support for active datasets into our METS-based institutional repository software.  This ongoing project enables the library to be a partner in current, or long-running, data-driven projects around the university by providing tangible short-term and long-term benefits to the projects.  The system assists project teams by storing and providing access to their data, while supporting online filtering and sorting of the data, custom queries, and adding and editing of the data by authorized users.  We are also exploring simple data visualizations to allow users to perform basic graphical and geographic queries.  Several different schemas were explored including DDI and EML, but ultimately the streamlined approach of using XSD's with some custom attributes was chosen, with all other data residing in the METS file portions.  Currently the system is being developed using XSD's describing XML datasets, but this model should easily scale to support SQL datasets or large datasets supported by Hadoop or iRODS.&lt;br /&gt;
&lt;br /&gt;
This work is being integrated in the open source [http://sobek.ufl.edu SobekCM Digital Content Management System] which is built on a pair-tree structure of METS resources with [http://ufdc.ufl.edu/design/webcontent/sobekcm/SobekCM_Resource_Object.pdf rich metadata support] including DC, MODS, MARC, VRACore, DarwinCore, IEEE LOM, GML/KML, schema.org microdata, and many other standard schemas.  The system has emphasized online, distributed creation and maintenance of resources including geo-placement and geographic searching of resources, building structure maps (table of contents) visually online, and a broad suite of curator tools.&lt;br /&gt;
&lt;br /&gt;
This work is presented as a model which could be implemented in other systems as well.  We will demonstrate current support and discuss our upcoming roadmap to provide complete support.&lt;br /&gt;
&lt;br /&gt;
== Dead-simple Video Content Management: Let Your Filesystem Do The Work ==&lt;br /&gt;
&lt;br /&gt;
* Andreas Orphanides, NCSU Libraries (akorphan (at) ncsu.edu)&lt;br /&gt;
** (never led or soloed a C4L presentation)&lt;br /&gt;
&lt;br /&gt;
Content management is hard. To keep all the moving parts in order, and to maintain a layer of separation between the system and content creators (who are frequently not technical experts), we typically turn to content management systems like Drupal. But even Drupal and its kin require significant overhead and present a not inconsiderable learning curve for nontechnical users.&lt;br /&gt;
&lt;br /&gt;
In some contexts it's possible -- and desirable -- to manage content in a more streamlined, lightweight way, with a minimum of fuss and technical infrastructure. In this presentation I'll share a simple MVC-like architecture for managing video content for playback on the web, which uses a combination of Apache's mod_rewrite module and your server's filesystem structure to provide an automated approach to video content management that's easy to implement and provides a low barrier to content updates: friendly to content creators and technology implementors alike. Even better, the basic method is HTML5-friendly, and can be integrated into your favorite content management system if you've got permissions for creating templates.&lt;br /&gt;
&lt;br /&gt;
In the presentation I'll go into detail about the system structure and logic required to implement this approach. I'll detail the benefits and limitations of the system, as well as the challenges I encountered in developing its implementation. Audience members should come away with sufficient background to implement a similar system on their own servers. Implementation documentation and genericized code will also be shared, as available.&lt;br /&gt;
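The core trick can be mimicked in a few lines: treat the filesystem as the model, and resolve a clean URL slug to whatever video renditions actually exist on disk. The directory layout and helper below are hypothetical, a sketch of the idea rather than the presenter's implementation.&lt;br /&gt;

```python
from pathlib import Path
import tempfile

def resolve_video(root, slug, extensions=("mp4", "webm", "ogv")):
    """Map a clean URL slug to the video files present on disk,
    so adding a file is all it takes to publish a new rendition."""
    found = []
    for ext in extensions:
        candidate = Path(root) / slug / f"{slug}.{ext}"
        if candidate.exists():
            found.append(candidate)
    return found

# Simulate the content directory a creator would drop files into.
root = tempfile.mkdtemp()
(Path(root) / "orientation").mkdir()
(Path(root) / "orientation" / "orientation.mp4").write_bytes(b"")
sources = resolve_video(root, "orientation")
```

Each resolved path becomes a source element in an HTML5 video tag, and mod_rewrite performs the equivalent URL-to-path mapping on the server side.&lt;br /&gt;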
&lt;br /&gt;
== Managing Discovery ==&lt;br /&gt;
&lt;br /&gt;
* Andrew Pasterfield, Senior Programmer/Systems Analyst, University of Calgary Library, ampaster@ucalgary.ca&lt;br /&gt;
**No previous code4lib presentations&lt;br /&gt;
In fall 2012 the University of Calgary Library launched a new home page that incorporated a Summon powered&lt;br /&gt;
Single Search Box with customized “bento box” results display. Search at the U of C now combines a range of&lt;br /&gt;
metadata sources for discovery and customized mapping of a database recommender and LibGuide into a unified&lt;br /&gt;
display.  Further customizations include a non-Google Analytics, non-proxy method to log clicks.&lt;br /&gt;
&lt;br /&gt;
This presentation will discuss the technical details of bringing the various systems together into one display interface to increase discovery at the U of C Library.&lt;br /&gt;
&lt;br /&gt;
http://library.ucalgary.ca&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Sorting it out: a piece of the User Centered Design Process ==&lt;br /&gt;
&lt;br /&gt;
* Cindy Beggs, [http://www.akendi.com/aboutus/management/ Akendi], cindy@akendi.com&lt;br /&gt;
&lt;br /&gt;
This talk is about how to apply a user-centered design methodology to the process of creating an information architecture.  Participants will learn the fundamentals of UCD and how card sorting and reverse card sorting enable us to isolate the content we present on screen from the layouts and visuals of those screens.  We will talk about ways to identify who will be using the information architecture you are creating and why we need to know how it will be used.&lt;br /&gt;
 &lt;br /&gt;
What will attendees take away from this talk?&lt;br /&gt;
The criticality of involving “real” end users in the process of creating an information architecture, and the basics of following a user-centered design process in the creation of best-in-class, content-rich digital products.&lt;br /&gt;
&lt;br /&gt;
Cindy Beggs has been working in the “information industry” for over 25 years.  A librarian by profession, she has spent decades helping users figure out how to find their way through large bodies of content.  Her insights into how people seek information, her empathy for those who find it a challenge and her practical experience helping organizations figure out how to best structure their content contribute to her success as an information architect with both clients and trainees.  (http://www.akendi.com/aboutus/management/)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Implementation of ArchivesSpace in University of Richmond==&lt;br /&gt;
&lt;br /&gt;
*Birong Ho, bho@richmond.edu&lt;br /&gt;
&lt;br /&gt;
The University of Richmond implemented ArchivesSpace as its archival collection management system in fall 2013. As a charter member, with the Head of Special Collections serving as a Board member, implementing this open-source software became a priority. &lt;br /&gt;
&lt;br /&gt;
Several aspects of the implementation will be addressed in the talk, among them collections and repository setup, the storage layer (including data formats), system resource requirements, technical architecture, customization, scaling, and integration with other systems in the library.&lt;br /&gt;
&lt;br /&gt;
Customization, scaling, and integration with other campus systems such as Archon and eXist became particular concerns, and these will be the focus of the talk.&lt;br /&gt;
&lt;br /&gt;
==Easy Wins for Modern Web Technologies in Libraries==&lt;br /&gt;
&lt;br /&gt;
*[mailto:trey.terrell@oregonstate.edu Trey Terrell], Analyst Programmer, Oregon State University&lt;br /&gt;
** No previous Code4Lib presentations &lt;br /&gt;
&lt;br /&gt;
Oregon State University is currently implementing an updated version of its room reservation system. In its development we've come across and implemented a variety of &amp;quot;easy wins&amp;quot; to make it more responsive, easier to maintain, less expensive to run, and just cooler to experience. While our particular system was in Ruby on Rails, this talk will address general methods and example utilities which can be used no matter your stack.&lt;br /&gt;
&lt;br /&gt;
I'll be talking about things like cache management, reverse proxies, publish/subscribe servers, WebSockets, responsive design, asynchronous processing, and keeping complicated stacks up and running with minimal effort.&lt;br /&gt;
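&lt;br /&gt;
One such easy win, sketched here with invented names rather than anything from the actual OSU system: memoize an expensive per-request lookup so repeated calls within a page load don't redo the work.&lt;br /&gt;

```python
from functools import lru_cache

# Counter stands in for observing how often the "slow" backend is hit.
calls = {"count": 0}

@lru_cache(maxsize=256)
def room_availability(room_id, hour):
    """Pretend this is a slow database or API query; lru_cache ensures
    repeated lookups for the same (room, hour) are served from memory."""
    calls["count"] += 1
    return (room_id + hour) % 2 == 0  # toy availability rule

first = room_availability(4, 10)
second = room_availability(4, 10)  # served from cache; backend not hit again
```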
&lt;br /&gt;
==Implementing Islandora at a Small Institution==&lt;br /&gt;
&lt;br /&gt;
*Megan Kudzia, Albion College Library&lt;br /&gt;
*Eddie Bachle, Albion College IT&lt;br /&gt;
**No previous Code4Lib presentations&lt;br /&gt;
&lt;br /&gt;
Albion College (and particularly the Library/Archives and Special Collections) has a variety of needs which could be met by an open-source Institutional Repository system. Several months and lots of conversations later, we’re continuing to troubleshoot our way through Islandora. We’d like to talk about what has worked for us, where our frustrations have been, whether it’s even possible to install and develop a system like this at a small institution, and where the process has stalled. &lt;br /&gt;
&lt;br /&gt;
As of right now, we do have a semi-working installation. We’re not sure when it will be ready for our end users, but we'll talk about our development process and evaluate our progress.&lt;br /&gt;
''Contributions also by Nicole Smeltekop, Albion College Archives &amp;amp; Special Collections''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== PhantomJS+Selenium: Easy Automated Testing of AJAX-y UIs ==&lt;br /&gt;
&lt;br /&gt;
* Martin Haye, California Digital Library, martin.haye@ucop.edu&lt;br /&gt;
** Previous Code4Lib Presentation: [http://code4lib.org/conference/2012/collett Beyond code: Versioning data with Git and Mercurial] at Code4Lib 2012 (Martin co-presenting with Stephanie Collett)&lt;br /&gt;
* Mark Redar, California Digital Library, mark.redar@ucop.edu&lt;br /&gt;
&lt;br /&gt;
Web user interfaces are demanding ever more dynamism and polish, combining HTML5, AJAX, lots of CSS and jQuery (or its ilk) to create autocomplete drop-downs, intelligent buttons, stylish alert dialogs, etc. How can you make automated tests for these highly complex and interactive UIs?&lt;br /&gt;
&lt;br /&gt;
Part of the answer is PhantomJS. It’s a modern WebKit browser that’s “headless” (meaning it has no display) and can be driven from command-line Selenium unit tests. PhantomJS is dead simple to install, and its blazing speed and server-friendliness make continuous integration testing easy. You can write UI unit tests in {language-of-your-choice} and run them not just in PhantomJS but in Firefox and Chrome, plus a zillion browser/OS combinations at places like SauceLabs, TestingBot and BrowserStack.&lt;br /&gt;
&lt;br /&gt;
In this double-team live code talk, we’ll explain all that while we demonstrate the following in real time:&lt;br /&gt;
&lt;br /&gt;
* Start with nothing.&lt;br /&gt;
* Install Selenium bindings for Ruby and Python.&lt;br /&gt;
* In each language write a small test of an AJAX-y UI.&lt;br /&gt;
* Run the tests in Firefox, and fix bugs (in the test or UI) as needed.&lt;br /&gt;
* Install PhantomJS.&lt;br /&gt;
* Show the same tests running headless as part of a server-friendly test suite. &lt;br /&gt;
* (Wifi permitting) Show the same tests running on a couple different browser/OS combinations on the server cloud at SauceLabs – talking through a tunnel to the local firewalled application.&lt;br /&gt;
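&lt;br /&gt;
Not from the proposal, but a minimal sketch of the core idea behind testing AJAX-y UIs: assertions must wait for asynchronous state rather than fire immediately. Selenium's WebDriverWait does essentially this kind of polling; the FakePage below is an invented stand-in for a real page.&lt;br /&gt;

```python
import time

def wait_until(condition, timeout=5.0, interval=0.05):
    """Poll `condition` until it returns a truthy value or `timeout` elapses.
    This is the explicit-wait pattern at the heart of AJAX UI testing."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = condition()
        if result:
            return result
        time.sleep(interval)
    raise TimeoutError("condition not met within %.1fs" % timeout)

class FakePage:
    """Simulated page: an autocomplete list that fills in after a delay."""
    def __init__(self):
        self.suggestions = []
        self._filled_at = time.monotonic() + 0.2
    def visible_suggestions(self):
        if time.monotonic() >= self._filled_at:
            self.suggestions = ["apple", "apricot"]
        return self.suggestions

page = FakePage()
suggestions = wait_until(page.visible_suggestions, timeout=2.0)
```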
&lt;br /&gt;
==New Technologies, Collaboration, &amp;amp; Entrepreneurship in Libraries:  Harnessing Their Power to Help Your Library==&lt;br /&gt;
&lt;br /&gt;
* Stephanie Walker – swalker@brooklyn.cuny.edu&lt;br /&gt;
* Howard Spivak – howards@brooklyn.cuny.edu&lt;br /&gt;
* Alex - Alex@brooklyn.cuny.edu&lt;br /&gt;
&lt;br /&gt;
Academic libraries are caught in budget squeezes and often struggle to find ways to communicate value to senior administration and others.  At Brooklyn College Library, we have taken an unusual, possibly unique, approach to these issues.  Our technology staff have long worked directly with librarians to develop products that meet library, faculty, and student needs, and we have shared many of our products with colleagues, including an award-winning website, e-resource, and content management system we call 4MyLibrary, which we shared for free with 8 CUNY colleges, and an easy-to-use book scanner, which has proven overwhelmingly popular with students, faculty, other librarians, and numerous campus offices.  Recently, motivated by budget cuts, we decided that what worked for us might interest other libraries, and, working with our Office of Technology Commercialization, we started selling two products: our book scanners (at half the price of commercial alternatives) and a hosting service, whereby we could host and support 4MyLibrary for libraries with minimal technology staff.  Both succeeded, and yielded major benefits: a steady revenue stream and the admiration and serious goodwill of our senior administration and others.   However, this presentation is neither a basic how-to nor an advertisement.  With this presentation, we hope to spur a conversation about broader collaboration, especially regarding new technologies, among libraries.  We all have some level of technical expertise, most of us are struggling with rising prices and tight budgets, and many of us are unhappy with various technology products we use, from scanners to our ILS.  We believe, and can demonstrate, that with collaboration we can solve many of our problems and provide better services to boot. &lt;br /&gt;
&lt;br /&gt;
== Identifiers, Data, and Norse Gods ==&lt;br /&gt;
&lt;br /&gt;
* Ryan Scherle, Dryad Digital Repository, ryan@datadryad.org&lt;br /&gt;
** previous Code4Lib talk [http://ryan.scherle.org/papers/2010-2-code4lib-HIVE.ppt  HIVE: A New Tool for Working With Vocabularies], at Code4Lib 2011.&lt;br /&gt;
&lt;br /&gt;
ORCID and DataCite provide stable identifiers for researchers and data, respectively. Each system does a fine job of providing value to its users. But wouldn't it be great if they could link their systems to create something much more powerful? Perhaps even as powerful as a god?&lt;br /&gt;
&lt;br /&gt;
Enter [http://odin-project.eu/ ODIN], The ORCID and DataCite Interoperability Network. ODIN is a two-year project to unleash the power of persistent identifiers for researchers and the research they create. This talk will present recent work from the ODIN project, including several tools that can be used to unleash the godlike power of identifiers at your institution. Current tools include:&lt;br /&gt;
* Metadata generator tool: allows repository staff to create DataCite metadata with embedded ORCIDs.&lt;br /&gt;
* Claiming tool: assists researchers in claiming their work within the ORCID system.  &lt;br /&gt;
* ORCID-feed: includes a list of ORCID works on any web page.&lt;br /&gt;
* ODIN's HAMR: assists in populating a DSpace repository with ORCIDs. Based on work from a Code4Lib hackathon!&lt;br /&gt;
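&lt;br /&gt;
As an illustrative sketch (not ODIN's own code), here is roughly what "DataCite metadata with embedded ORCIDs" looks like: a creator element carrying a nameIdentifier of scheme ORCID, following the DataCite metadata kernel. The name and iD below are placeholders.&lt;br /&gt;

```python
import xml.etree.ElementTree as ET

def creator_with_orcid(name, orcid):
    """Build a DataCite-style <creator> element embedding an ORCID iD.
    Element and attribute names follow the DataCite metadata kernel
    (creatorName; nameIdentifier with nameIdentifierScheme="ORCID")."""
    creator = ET.Element("creator")
    ET.SubElement(creator, "creatorName").text = name
    ident = ET.SubElement(creator, "nameIdentifier",
                          nameIdentifierScheme="ORCID",
                          schemeURI="http://orcid.org/")
    ident.text = orcid
    return creator

# Placeholder person and iD, for illustration only.
xml_str = ET.tostring(creator_with_orcid("Doe, Jane", "0000-0000-0000-0000"),
                      encoding="unicode")
```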
&lt;br /&gt;
== Armed Bandits in the Digital Library ==&lt;br /&gt;
&lt;br /&gt;
* Roman Chyla, [http://labs.adsabs.harvard.edu/adsabs/ Astrophysics Data System], rchyla@cfa.harvard.edu&lt;br /&gt;
** Previous Code4Lib: [http://code4lib.org/conference/2013/chyla Citation search in SOLR and second-order operators]&lt;br /&gt;
&lt;br /&gt;
Many of us are using the excellent Lucene library (or the SOLR server) to provide search functionality. These systems contain a number of features for adjusting the relevancy ranking of hits, but we may not know how to use them. In this presentation, I'll cover the available options, e.g. what the default ranking model is (the vector space model), what the alternatives are (e.g. BM25), and what other options we have to tweak and adjust the ranking of hits (e.g. boost factors and functions). But even if we know how to deploy these adjustments and tweaks, we are still left in the dark: we do not know whether the change we've just rolled out had a statistically significant effect, or whether it was just a waste of time and resources. A/B testing is one option, but there may be a much better one, the so-called &amp;quot;multi-armed bandit&amp;quot; approach. In this talk I'd like to show how we are experimenting with this strategy to adjust the [http://labs.adsabs.harvard.edu/adsabs/ ADS search engine].&lt;br /&gt;
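&lt;br /&gt;
A minimal sketch of the multi-armed bandit idea, not the ADS implementation: epsilon-greedy mostly serves the configuration that looks best so far while still exploring the others. The two "arms" and their click rates below are simulated.&lt;br /&gt;

```python
import random

def epsilon_greedy(rewards_by_arm, epsilon=0.1, rounds=1000, seed=42):
    """Epsilon-greedy bandit: each arm stands for a candidate relevancy
    configuration; rewards_by_arm[i](rng) simulates whether a user
    clicked a result served under configuration i."""
    rng = random.Random(seed)
    counts = [0] * len(rewards_by_arm)   # pulls per arm
    values = [0.0] * len(rewards_by_arm) # running mean reward per arm
    for _ in range(rounds):
        if rng.random() < epsilon or 0 in counts:
            arm = rng.randrange(len(rewards_by_arm))   # explore
        else:
            arm = max(range(len(values)), key=values.__getitem__)  # exploit
        reward = rewards_by_arm[arm](rng)
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values

# Arm 1 yields clicks 30% of the time, arm 0 only 10%.
arms = [lambda rng: 1 if rng.random() < 0.10 else 0,
        lambda rng: 1 if rng.random() < 0.30 else 0]
counts, values = epsilon_greedy(arms)
```

Unlike a fixed A/B split, traffic shifts toward the better configuration while the experiment is still running.&lt;br /&gt;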
&lt;br /&gt;
== Building Worker Queues with AWS and Resque ==&lt;br /&gt;
&lt;br /&gt;
* Eric Rochester, [http://scholarslab.org Scholars' Lab], erochest@virginia.edu&lt;br /&gt;
* Scott Turnbull, [http://aptrust.org/ Academic Preservation Trust], scott.turnbull@aptrust.org &lt;br /&gt;
&lt;br /&gt;
A common task in larger systems is processing large input files automatically. Often users can drop those files into a shared directory on AWS, on NFS, or on another shared drive; those files then need to be processed and potentially integrated into a system. This task has come up recently at the University of Virginia libraries in allowing users to add GIS data to the system, and in setting up a system for the Academic Preservation Trust (http://aptrust.org/) that ingests files and resources into the preservation system.&lt;br /&gt;
&lt;br /&gt;
This system is built by loosely coupling a number of different technologies. This allows us to easily interoperate and communicate between different systems and programming environments. Because the interfaces are well defined, it’s also fairly simple to switch out technologies as the requirements of the system change.&lt;br /&gt;
&lt;br /&gt;
The process is fairly simple:&lt;br /&gt;
&lt;br /&gt;
First, a Ruby daemon monitors an AWS S3 bucket that others can upload new files into. This daemon creates a Resque status task, adds a marker for the task in a database, and continues monitoring.&lt;br /&gt;
&lt;br /&gt;
Second, Resque mediates incoming job requests and routes them to the appropriate workers which may be in Java, Go, or Ruby.  The diversity of technologies that Resque can manage allows great latitude to leverage the appropriate tool for a specific job.  While processing, it updates the status for that job and coordinates processing with other jobs.&lt;br /&gt;
&lt;br /&gt;
Finally, a page that is integrated into a larger Rails app provides a novice-user-friendly view of the status of the workers and allows basic tasks such as restarting the job.&lt;br /&gt;
&lt;br /&gt;
This architecture allows us to swap in the technology that best fits each part of the process, and it makes it easier to maintain the system. We use this to integrate and coordinate between tasks handled in Java, Ruby, and Go, and it provides an effective way to interoperate with these programming languages and the respective strengths that they bring to this system.&lt;br /&gt;
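&lt;br /&gt;
The queue-plus-status pattern described above can be sketched in a few lines. This is an in-memory stand-in with invented names; in the real system Resque, a database, and an S3-watching daemon play these roles.&lt;br /&gt;

```python
import queue

class JobTracker:
    """Jobs are enqueued with a status marker, a worker processes them,
    and a status view (like the Rails status page) can be polled."""
    def __init__(self):
        self.jobs = queue.Queue()
        self.status = {}
    def enqueue(self, job_id, payload):
        self.status[job_id] = "queued"
        self.jobs.put((job_id, payload))
    def work_one(self, handler):
        """Process a single queued job, recording its outcome."""
        job_id, payload = self.jobs.get_nowait()
        self.status[job_id] = "processing"
        try:
            handler(payload)
            self.status[job_id] = "done"
        except Exception:
            self.status[job_id] = "failed"  # a restart would re-enqueue it

tracker = JobTracker()
tracker.enqueue("bag-001", {"bucket_key": "uploads/bag-001.tar"})
tracker.work_one(lambda payload: None)  # stand-in for a real ingest handler
```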
&lt;br /&gt;
==Piwik: Open source web analytics==&lt;br /&gt;
* Kirk Hess, University of Illinois at Urbana-Champaign (kirkhess@illinois.edu)&lt;br /&gt;
** (Code4Lib 2012: [http://code4lib.org/conference/2012/hess Discovering Digital Library User Behavior with Google Analytics])&lt;br /&gt;
&lt;br /&gt;
While Google Analytics is practically synonymous with web analytics, today we have many other good options. One of them is [http://piwik.org Piwik], a simple-to-install, open-source PHP/MySQL application with a tracking script that will sit alongside Google Analytics, tracking the usual clicks, events and variables. In this presentation, I'd like to cover the usual analytics topics and also what makes Piwik powerful, such as importing and visualizing web logs from any system to incorporate both past and future data, easily tracking downloads, and the ability to write your own reports or dashboards. The visitor log data is stored securely on your own server, so you have control over who looks at the data and how much or how little to keep. With an active and helpful developer community, Piwik has the potential to deliver analytics that make sense for libraries, not e-commerce.&lt;br /&gt;
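&lt;br /&gt;
For a flavor of what sits beneath the JavaScript tracker, here is a sketch of a request to Piwik's HTTP tracking endpoint (piwik.php). The parameter names (idsite, rec, url, action_name) follow the Piwik tracking API; the hostnames below are invented.&lt;br /&gt;

```python
from urllib.parse import urlencode

def piwik_tracking_url(piwik_base, site_id, page_url, action_name):
    """Build a GET request for Piwik's HTTP tracking API. The bundled
    JavaScript tracker normally sends these parameters for you, but
    server-side code (e.g. a log importer) can hit the same endpoint."""
    params = {"idsite": site_id, "rec": 1,
              "url": page_url, "action_name": action_name}
    return "%s/piwik.php?%s" % (piwik_base.rstrip("/"), urlencode(params))

url = piwik_tracking_url("https://analytics.example.edu", 1,
                         "https://library.example.edu/item/42", "Item 42")
```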
&lt;br /&gt;
[[:Category:Code4Lib2014]]&lt;/div&gt;</summary>
		<author><name>Ryscher</name></author>	</entry>

	<entry>
		<id>https://wiki.code4lib.org/index.php?title=2014_Prepared_Talk_Proposals&amp;diff=39850</id>
		<title>2014 Prepared Talk Proposals</title>
		<link rel="alternate" type="text/html" href="https://wiki.code4lib.org/index.php?title=2014_Prepared_Talk_Proposals&amp;diff=39850"/>
				<updated>2013-11-08T20:06:27Z</updated>
		
		<summary type="html">&lt;p&gt;Ryscher: /* Identifiers, Data, and Norse Gods */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Proposals for Prepared Talks:'''&lt;br /&gt;
&lt;br /&gt;
Prepared talks are 20 minutes (including setup and questions), and should focus on one or more of the following areas:&lt;br /&gt;
 &lt;br /&gt;
* ''Projects'' you've worked on which incorporate innovative implementation of existing technologies and/or development of new software&lt;br /&gt;
* ''Tools and technologies'' – How to get the most out of existing tools, standards and protocols (and ideas on how to make them better)&lt;br /&gt;
* ''Technical issues'' - Big issues in library technology that should be addressed or better understood&lt;br /&gt;
* ''Relevant non-technical issues'' – Concerns of interest to the Code4Lib community which are not strictly technical in nature, e.g. collaboration, diversity, organizational challenges, etc.&lt;br /&gt;
&lt;br /&gt;
'''To Propose a Talk'''&lt;br /&gt;
* Log in to the wiki in order to submit a proposal. If you are not already registered, follow the instructions to do so.&lt;br /&gt;
* Provide a title and brief (500 words or fewer) description of your proposed talk.&lt;br /&gt;
* If you so choose, you may also indicate when, if ever, you have presented at a prior Code4Lib conference. This information is completely optional, but it may assist us in opening the conference to new presenters.&lt;br /&gt;
&lt;br /&gt;
As in past years, the Code4Lib community will vote on proposals that they would like to see included in the program. This year, however, only the top 10 proposals will be guaranteed a slot at the conference. Additional presentations will be selected by the Program Committee in an effort to ensure diversity in program content. Community votes will, of course, still weigh heavily in these decisions.&lt;br /&gt;
&lt;br /&gt;
Presenters whose proposals are selected for inclusion in the program will be guaranteed an opportunity to register for the conference. The standard conference registration fee will still apply.&lt;br /&gt;
&lt;br /&gt;
''Proposals can be submitted through '''Friday, November 8, 2013, at 5pm PST'''''. Voting will commence on November 18, 2013 and continue through December 6, 2013. The final line-up of presentations will be announced in early January, 2014.&lt;br /&gt;
&lt;br /&gt;
'''Talk Proposals'''&lt;br /&gt;
&lt;br /&gt;
==Creating a new Greek-Dutch dictionary==&lt;br /&gt;
* Caspar Treijtel, University of Amsterdam, c.treijtel@uva.nl&lt;br /&gt;
&lt;br /&gt;
At present, no complete dictionary of (ancient) Greek-Dutch is available online. A new dictionary is currently under construction at Leiden University, with software being developed at the University of Amsterdam. The team in Leiden has already begun preparation of the data, with about 6,000 lemmas approved at this moment. The ultimate goal is to produce both a print version and an online open-access version from the same source documents. The software needed for this was developed in a project funded by CLARIN-NL.&lt;br /&gt;
&lt;br /&gt;
Migrator&lt;br /&gt;
&lt;br /&gt;
For the production of lemmas we have implemented an advanced workflow. The (generally non-technical) users create lemmas using MS Word, which is both familiar and easy to use. We have developed a custom software module that carefully migrates the Word documents into deeply structured XML by analyzing the structure and semantics of the lemmas, and falling back on heuristics in ambiguous cases. Although we initially envisioned the oXygen XML Author component as the main tool for creating new lemmas, we obtained excellent results with the migrator module and therefore decided to continue using MS Word as the primary composition tool. The main advantage of this is that the editors are much more familiar with Word than with any other WYSIWYG editor. Lemmas that have been migrated to XML are stored in an XML database and can be further edited using oXygen XML Author.&lt;br /&gt;
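&lt;br /&gt;
The migration step can be sketched as mapping styled runs from a Word export onto entry XML, with a fallback heuristic for unstyled text. The style names and target elements here are invented for illustration, not taken from the actual migrator.&lt;br /&gt;

```python
import xml.etree.ElementTree as ET

def migrate(runs):
    """Map (style, text) runs exported from a Word document onto a
    dictionary-entry XML structure, guessing when no style matches."""
    entry = ET.Element("entry")
    for style, text in runs:
        if style == "Headword":
            ET.SubElement(entry, "form").text = text
        elif style == "Definition":
            ET.SubElement(entry, "sense").text = text
        else:  # fallback heuristic for unstyled or ambiguous runs
            ET.SubElement(entry, "note").text = text
    return entry

entry = migrate([("Headword", "λόγος"), ("Definition", "word, speech, reason")])
xml_str = ET.tostring(entry, encoding="unicode")
```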
&lt;br /&gt;
Lemmatizer&lt;br /&gt;
&lt;br /&gt;
Greek morphology is complicated. In order to use a dictionary effectively, a rather high level of initial language competence is necessary for the user to be able to relate the word form s/he finds in a text to the correct basic lemma form, where the definition of the word can be found. Using a Greek morphological database we have been able to facilitate the search for lemmas. A ‘lemmatizer’ module gives the possible parsings of the word forms and the lemmas they can be derived from. This enables the user to type in the word as found in the text and be redirected to the correct lemma.&lt;br /&gt;
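&lt;br /&gt;
The lemmatizer idea can be sketched as a lookup from inflected word forms to possible analyses. The three-entry table below is a hand-made toy, not the real morphological database.&lt;br /&gt;

```python
# Maps an inflected Greek form to (parsing, lemma) pairs, so a user can
# type the form found in a text and be redirected to the dictionary lemma.
MORPH_TABLE = {
    "λόγου": [("noun, genitive singular", "λόγος")],
    "λόγοι": [("noun, nominative plural", "λόγος")],
    "ἔλυσε": [("verb, aorist indicative active, 3rd singular", "λύω")],
}

def lemmatize(word_form):
    """Return the possible (parsing, lemma) analyses for a word form;
    an empty list means the form is unknown to the table."""
    return MORPH_TABLE.get(word_form, [])

analyses = lemmatize("λόγου")
```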
&lt;br /&gt;
Visualization&lt;br /&gt;
&lt;br /&gt;
For the online dictionary we have implemented a visualization module that allows the user to view multiple lemmas at once. The implementation of this module has been done using the JavaScript framework MooTools. The result is a viewer that performs really well and is powered by maintainable JavaScript code.&lt;br /&gt;
&lt;br /&gt;
The online dictionary is still being worked on; have a look at http://www.woordenboekgrieks.nl/ for the beta version. A newer test version with additional features can be found at http://angel.ic.uva.nl:8600/.&lt;br /&gt;
&lt;br /&gt;
Credits&lt;br /&gt;
&lt;br /&gt;
* construction of the dictionary: Prof. Ineke Sluiter, Classics department of Leiden University; Prof. Albert Rijksbaron, University of Amsterdam&lt;br /&gt;
* publisher of the dictionary: Amsterdam University Press&lt;br /&gt;
* design/typesetting dictionary: TaT Zetwerk (http://www.tatzetwerk.nl/)&lt;br /&gt;
* software development: Digital Production Center, University Library, University of Amsterdam&lt;br /&gt;
* project funding: CLARIN-NL (http://www.clarin.nl/)&lt;br /&gt;
* morphological database for use by the lemmatizer: courtesy of Prof. Helma Dik, University of Chicago (based on data of the Perseus Project)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Using Drupal to drive alternative presentation systems ==&lt;br /&gt;
 &lt;br /&gt;
* [[User:Highermath|Cary Gordon]], The Cherry Hill Company, cgordon@chillco.com&lt;br /&gt;
&lt;br /&gt;
Recently, we have been building systems that use angular.js, Rails, or other systems for presentation, while leveraging Drupal's sophisticated content management capabilities on the back end.&lt;br /&gt;
&lt;br /&gt;
So far, these have been one-way systems, but as we move to Drupal 8 we are beginning to explore ways to further decouple the presentation and CMS functions.&lt;br /&gt;
&lt;br /&gt;
== A Book, a Web Browser and a Tablet: How Bibliotheca Alexandrina's Book Viewer Framework Makes It Possible ==&lt;br /&gt;
 &lt;br /&gt;
* [[User:Mohammed.abuouda|Mohammed Abu ouda]], Bibliotheca Alexandrina (The new Library of Alexandria)&lt;br /&gt;
&lt;br /&gt;
Many institutions around the world are engaged in digitization projects aimed at preserving the human knowledge held in books and making it available through multiple channels to people around the globe. These efforts will surely help close the digital gap, particularly with the arrival of affordable e-readers, mobile phones and network coverage. However, the digital reading experience has not yet reached its full potential. Many readers miss features they like in their good old books and wish to find them in their digital counterparts. In an attempt to create a unique digital reading experience, Bibliotheca Alexandrina (BA) created a flexible book viewing framework that is currently used to access its collection of more than 300,000 digital books in five different languages, which includes the largest collection of digitized Arabic books.&lt;br /&gt;
&lt;br /&gt;
Using open source tools, BA used the framework to develop a modular book viewer that can be deployed in different environments and is currently at the heart of various BA projects. The book viewer provides several features that create a more natural reading experience. As with physical books, readers can now personalize the books they read by adding annotations such as highlights, underlines and sticky notes to capture their thoughts and ideas, in addition to being able to share the book with friends on social networks. The reader can perform a search across the content of the book, receiving highlighted search results within the pages of the book. More features can be added to the book viewer through its plugin architecture.&lt;br /&gt;
&lt;br /&gt;
== Structured data NOW: seeding schema.org in library systems ==&lt;br /&gt;
 &lt;br /&gt;
* [http://coffeecode.net Dan Scott], Laurentian University&lt;br /&gt;
** Previous code4lib presentations: [https://archive.org/details/code4lib.conf.2008.pres.CouchDBsacrilege CouchDB is sacrilege... mmm, delicious sacrilege] at Code4Lib 2008&lt;br /&gt;
&lt;br /&gt;
The semantic web, linked data, and structured data are all fantastic ideas with a barrier imposed by implementation constraints. It does not matter how enthused a given library might be about publishing structured data: if its system does not allow customizations, or the institution lacks skilled human resources, it will not happen. However, if the software in use simply publishes structured data by default, then the web will be populated for free. Really! No extra resources necessary.&lt;br /&gt;
&lt;br /&gt;
This presentation highlights Dan's work with systems such as Evergreen, Koha, and VuFind to enable the publication of schema.org structured data out of the box. Along the way, we reflect on the current state of the W3C Schema.org Bibliographic Extension community group's efforts to shape the evolution of the schema.org vocabulary. Finally, hold on tight as we contemplate next steps and the possibilities of a world where structured data is the norm on the web.&lt;br /&gt;
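&lt;br /&gt;
As a sketch of the kind of out-of-the-box structured data meant here, a catalogue template might emit schema.org JSON-LD for each record. The property names (Book, name, author, isbn) come from the schema.org vocabulary; the record below is just an example.&lt;br /&gt;

```python
import json

def book_jsonld(title, author, isbn):
    """Return schema.org JSON-LD for one bibliographic record."""
    return json.dumps({
        "@context": "http://schema.org",
        "@type": "Book",
        "name": title,
        "author": {"@type": "Person", "name": author},
        "isbn": isbn,
    }, ensure_ascii=False)

# A template would wrap this in a <script type="application/ld+json"> tag.
record = book_jsonld("Moby-Dick", "Herman Melville", "9780142437247")
parsed = json.loads(record)
```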
&lt;br /&gt;
== Towards Pasta Code Nirvana: Using JavaScript MVC to Fill Your Programming Ravioli ==&lt;br /&gt;
&lt;br /&gt;
* Bret Davidson, North Carolina State University Libraries, bret_davidson@ncsu.edu&lt;br /&gt;
** Previous Code4Lib Presentations: [http://wiki.code4lib.org/index.php/2013_talks_proposals#Data-Driven_Documents:_Visualizing_library_data_with_D3.js Visualizing library data with D3.js] at Code4Lib 2013&lt;br /&gt;
&lt;br /&gt;
JavaScript MVC frameworks are ushering in a golden age of robust and responsive web applications that take advantage of evergreen browsers, performant JS engines, and the unprecedented reach provided by billions of personal computing devices. The web browser has emerged as the world’s most popular application runtime, and the complexity[1] and scope of JavaScript applications have exploded accordingly. Server-side web frameworks like Rails and Django have helped developers adhere to best practices like modularity, dependency injection, and unit testing for years, practices that are now being applied to JavaScript development through projects like Backbone[2], Ember[3], and Angular[4].&lt;br /&gt;
&lt;br /&gt;
This talk will discuss the issues JavaScript MVC frameworks are trying to solve, common features like data binding, implications for the future of web development[5], and the appropriateness of JavaScript MVC for library applications.&lt;br /&gt;
&lt;br /&gt;
*[1]http://en.wikipedia.org/wiki/Spaghetti_code&lt;br /&gt;
*[2]http://backbonejs.org&lt;br /&gt;
*[3]http://emberjs.com&lt;br /&gt;
*[4]http://angularjs.org&lt;br /&gt;
*[5]http://tomdale.net/2013/09/progressive-enhancement-is-dead/&lt;br /&gt;
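&lt;br /&gt;
A toy sketch of data binding, one of the common features named above: when a model value changes, every bound view is re-rendered automatically. Framework specifics vary widely; this is only the bare idea, with invented names.&lt;br /&gt;

```python
class Observable:
    """Minimal one-way data binding: views subscribe to a model value
    and are re-run whenever it changes."""
    def __init__(self, value):
        self._value = value
        self._subscribers = []
    def bind(self, callback):
        self._subscribers.append(callback)
        callback(self._value)          # render once with the current value
    def set(self, value):
        self._value = value
        for callback in self._subscribers:
            callback(value)            # propagate the change to every view

rendered = []
title = Observable("Loading...")
title.bind(lambda v: rendered.append("<h1>%s</h1>" % v))
title.set("Search Results")            # the "view" updates by itself
```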
&lt;br /&gt;
== WebSockets for Real-Time and Interactive Interfaces ==&lt;br /&gt;
&lt;br /&gt;
* [http://ronallo.com Jason Ronallo], NCSU Libraries, jason_ronallo@ncsu.edu&lt;br /&gt;
&lt;br /&gt;
Previous Code4Lib presentations:&lt;br /&gt;
* [http://code4lib.org/conference/2012/ronallo HTML5 Microdata and Schema.org] 2012&lt;br /&gt;
* [http://code4lib.org/conference/2013/ronallo HTML5 Video Now!] 2013&lt;br /&gt;
&lt;br /&gt;
Watching the Google Analytics Real-Time dashboard for the first time was mesmerizing. As soon as someone visited a site, I could see what page they were on. For a digital collections site with a lot of images, it was fun to see what visitors were looking at. But getting from Google Analytics to the image or other content currently being viewed was cumbersome. The real-time experience was something I wanted to share with others. I'll show you how I used a WebSocket service to create a real-time interface to digital collections.&lt;br /&gt;
&lt;br /&gt;
In the Hunt Library at NCSU we have some large video walls. I wanted to make HTML-based exhibits that featured viewer interactions. I'll show you how I converted Listen to Wikipedia [1] into a bring-your-own-device interactive exhibit. With WebSockets, any HTML page can be remote-controlled by any internet-connected device.&lt;br /&gt;
&lt;br /&gt;
I will attempt to include real-time audience participation.&lt;br /&gt;
&lt;br /&gt;
[1] http://listen.hatnote.com/&lt;br /&gt;
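&lt;br /&gt;
The remote-control pattern can be sketched with an in-memory publish/subscribe hub standing in for the WebSocket service; the channel and message names below are invented.&lt;br /&gt;

```python
class Hub:
    """Any connected page subscribes to a channel; a message published
    from any device is broadcast to every subscriber on that channel."""
    def __init__(self):
        self.channels = {}
    def subscribe(self, channel, client):
        self.channels.setdefault(channel, []).append(client)
    def publish(self, channel, message):
        for client in self.channels.get(channel, []):
            client(message)

# A "video wall" page and a visitor's "phone" both join the exhibit channel.
wall_events, phone_events = [], []
hub = Hub()
hub.subscribe("exhibit", wall_events.append)
hub.subscribe("exhibit", phone_events.append)
hub.publish("exhibit", {"action": "toggle-language", "lang": "de"})
```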
&lt;br /&gt;
== Rapid Development of Automated Tasks with the File Analyzer ==&lt;br /&gt;
&lt;br /&gt;
* Terry Brady, Georgetown University Libraries, twb27@georgetown.edu&lt;br /&gt;
&lt;br /&gt;
The Georgetown University Libraries have customized the File Analyzer and Metadata Harvester application (https://github.com/Georgetown-University-Libraries/File-Analyzer) to solve a number of library automation challenges:&lt;br /&gt;
* validating digitized and reformatted files&lt;br /&gt;
* validating vendor statistics for COUNTER compliance&lt;br /&gt;
* preparing collections of digital files for archiving and ingest&lt;br /&gt;
* manipulating ILS import and export files&lt;br /&gt;
&lt;br /&gt;
The File Analyzer application was used by the US National Archives to validate 3.5 million digitized images from the 1940 Census.  After implementing a customized ingest workflow within the File Analyzer, the Georgetown University Libraries were able to process an ingest backlog of over a thousand files of digital resources into DigitalGeorgetown, the Libraries’ Digital Collections and Institutional Repository platform.  Georgetown is currently developing customized workflows that integrate Apache Tika, BagIt, and MARC conversion utilities.&lt;br /&gt;
&lt;br /&gt;
The File Analyzer is a desktop application with a powerful framework for implementing customized file validation and transformation rules.  As new rules are deployed, they are presented to users within a user interface that is easy (and powerful) to use.&lt;br /&gt;
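&lt;br /&gt;
A hypothetical sketch of the rule-based validation idea (not File Analyzer's actual API): each rule is a predicate over a file's name and bytes, and the report lists which rules each file fails.&lt;br /&gt;

```python
# Illustrative rules; a real deployment would register rules per workflow.
RULES = {
    "has-tif-extension": lambda name, data: name.lower().endswith(".tif"),
    "non-empty": lambda name, data: len(data) > 0,
    # TIFF files begin with "II*\0" (little-endian) or "MM\0*" (big-endian).
    "tiff-magic-number": lambda name, data: data[:4] in (b"II*\x00", b"MM\x00*"),
}

def validate(files):
    """Return {filename: [names of failed rules]} for a name -> bytes dict."""
    report = {}
    for name, data in files.items():
        report[name] = [rule for rule, check in RULES.items()
                        if not check(name, data)]
    return report

report = validate({
    "scan001.tif": b"II*\x00" + b"\x00" * 16,  # little-endian TIFF header
    "notes.txt": b"hello",
})
```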
&lt;br /&gt;
Learn about the functionality that is available for download and how you can use this tool to automate workflows, from digital collections to ILS ingests to electronic resource statistics, and discuss opportunities to collaborate on enhancements to this application!&lt;br /&gt;
&lt;br /&gt;
== GeoHydra: How to Build a Geospatial Digital Library with Fedora ==&lt;br /&gt;
 &lt;br /&gt;
* [http://stanford.edu/~drh Darren Hardy], Stanford University, drh@stanford.edu&lt;br /&gt;
&lt;br /&gt;
Geographically rich data are exploding in volume and striking fear into those trying to&lt;br /&gt;
integrate them into existing digital library infrastructures.&lt;br /&gt;
Building a spatial data infrastructure that integrates with your digital&lt;br /&gt;
library infrastructure need not be a daunting task. We have successfully&lt;br /&gt;
deployed a geospatial digital library infrastructure using Fedora and&lt;br /&gt;
open-source geospatial software [1]. We'll discuss the primary design&lt;br /&gt;
decisions and technologies that led to a production deployment within a few&lt;br /&gt;
months. Briefly, our architecture revolves around discovery, delivery, and&lt;br /&gt;
metadata pipelines using open-source OpenGeoPortal [2], Solr [3], GeoServer&lt;br /&gt;
[4], PostGIS [5], and GeoNetwork [6] technologies, plus the proprietary ESRI&lt;br /&gt;
ArcMap [7] -- the GIS industry's workhorse. Finally, we'll discuss the key&lt;br /&gt;
skillsets needed to build and maintain a spatial data infrastructure.&lt;br /&gt;
&lt;br /&gt;
[1] http://foss4g.org&lt;br /&gt;
[2] http://opengeoportal.org&lt;br /&gt;
[3] http://lucene.apache.org/solr&lt;br /&gt;
[4] http://geoserver.org&lt;br /&gt;
[5] http://postgis.net&lt;br /&gt;
[6] http://geonetwork-opensource.org&lt;br /&gt;
[7] http://esri.com&lt;br /&gt;
&lt;br /&gt;
==Under the Hood of Hadoop Processing at OCLC Research ==&lt;br /&gt;
&lt;br /&gt;
[http://roytennant.com/ Roy Tennant]&lt;br /&gt;
&lt;br /&gt;
* Previous Code4Lib presentations: 2006: &amp;quot;The Case for Code4Lib 501c(3)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
[http://hadoop.apache.org/ Apache Hadoop] is widely used by Yahoo!, Google, and many others to process massive amounts of data quickly. OCLC Research uses a 40-node compute cluster with Hadoop and HBase to process the 300 million MARC records of WorldCat in various ways. This presentation will explain how Hadoop MapReduce works and illustrate it with specific examples and code. The role of the jobtracker in both monitoring and reporting on processes will be explained. String searching WorldCat will also be demonstrated live.&lt;br /&gt;
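&lt;br /&gt;
The map/shuffle/reduce flow that Hadoop runs at cluster scale can be simulated in a few lines of Python; word count is the classic example (this sketch is illustrative, not OCLC's code).&lt;br /&gt;

```python
from collections import defaultdict
from itertools import chain

def map_phase(record):
    """Map: emit a (key, value) pair for every word in the input record."""
    for word in record.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: aggregate all values emitted for one key."""
    return (key, sum(values))

records = ["MARC record one", "marc record two"]
pairs = chain.from_iterable(map_phase(r) for r in records)
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

In Hadoop the same three steps run in parallel across the cluster, with the jobtracker monitoring progress.&lt;br /&gt;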
&lt;br /&gt;
== Quick and Easy Data Visualization with Google Visualization API and Google Chart Libraries ==&lt;br /&gt;
 &lt;br /&gt;
[http://bohyunkim.net/blog Bohyun Kim], Florida International University, bohyun.kim@fiu.edu&lt;br /&gt;
* 'No' previous Code4Lib presentations &lt;br /&gt;
&lt;br /&gt;
Does most of the data that your library collects stay in spreadsheets or get published as a static table with a series of boring numbers? Do your library stakeholders spend more time collecting the data than using it as a decision-making tool because it is presented in a way that makes it hard for them [http://developers.google.com/chart/interactive/docs/gallery to quickly grasp its significance]?&lt;br /&gt;
&lt;br /&gt;
This talk will provide an overview of the [http://developers.google.com/chart/interactive/docs/reference Google Visualization API] and [http://developers.google.com/chart/ Google Chart Libraries] to get you started on quickly querying and visualizing your library data from remote data sources (e.g. a Google Spreadsheet or your own database), with (or without) cool-looking user controls, animation effects, and even a dashboard.&lt;br /&gt;
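As one hedged illustration of the "remote data source" idea: the Visualization API exposes spreadsheets through a query endpoint whose JSON output arrives wrapped in a JavaScript callback. The spreadsheet key, query, and response below are invented; this only sketches building the query URL and unwrapping such a response.&lt;br /&gt;

```python
# Sketch of querying a Google Spreadsheet via the Visualization API's
# data source protocol ("tqx=out:json" asks for JSON wrapped in a
# JS callback). Key and query are hypothetical.
import json
from urllib.parse import urlencode

def gviz_query_url(spreadsheet_key, tq):
    params = urlencode({"tq": tq, "tqx": "out:json"})
    return (f"https://docs.google.com/spreadsheets/d/"
            f"{spreadsheet_key}/gviz/tq?{params}")

def unwrap_gviz(body):
    """Strip the JS callback wrapper and parse the embedded JSON."""
    start = body.index("(") + 1
    end = body.rindex(")")
    return json.loads(body[start:end])

url = gviz_query_url("FAKE_KEY", "select A, sum(B) group by A")
sample = 'google.visualization.Query.setResponse({"status":"ok","table":{"rows":[]}})'
print(url)
print(unwrap_gviz(sample)["status"])
```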
&lt;br /&gt;
== Leap Motion + Rare Books: A hands-free way to view and interact with rare books in 3D ==&lt;br /&gt;
 &lt;br /&gt;
[http://www.youtube.com/user/jpdenzer Juan Denzer], Binghamton University, jdenzer@binghamton.edu&lt;br /&gt;
* 'No' previous Code4Lib presentations &lt;br /&gt;
&lt;br /&gt;
As rare books become more delicate over time, making them available to the public becomes harder.  We at Binghamton University Library have developed an application that makes it easier to view rare books without ever having to touch them.  We have combined the Leap Motion hands-free device and 3D rendered models to create a new virtual experience for the viewer.&lt;br /&gt;
&lt;br /&gt;
The application allows the user to rotate and zoom in on a 3D representation of a rare book.  The user is also able to ‘open’ the virtual book and flip through it using a natural user interface, such as swiping a hand left or right to turn the page.&lt;br /&gt;
&lt;br /&gt;
The application is built on the .NET Framework and is written in C#.  3D models are created using simple 3D software such as SketchUp or Blender.  Scans of the book cover and spine are created using simple flatbed scanners.  The inside pages are scanned using overhead scanners. &lt;br /&gt;
&lt;br /&gt;
This talk will discuss the technologies used in developing the application and show how virtually any library could implement it with almost no coding at all. The presentation will include a demonstration of the software and a chance for audience members to experience the Rare Book Leap Motion App themselves.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Course Reserves Unleashed! ==&lt;br /&gt;
 &lt;br /&gt;
* Bobbi Fox, Library Technology Services, Harvard University, bobbi_fox@harvard.edu&lt;br /&gt;
* Gloria Korsman, Andover-Harvard Theological Library&lt;br /&gt;
** No previous Code4Lib presentations &lt;br /&gt;
&lt;br /&gt;
Hey kids!  Remember when SOAP was used for something other than washing?  Our sophisticated (and highly functional) Course Reserves Request system does!&lt;br /&gt;
&lt;br /&gt;
However, while the system is great for submitting and processing course reserve requests, the student-facing presentation through Harvard’s home-grown -- and soon to be replaced -- LMS leaves a lot to be desired.  &lt;br /&gt;
&lt;br /&gt;
Follow along as we leverage Solr 4 as a No-SQL database, along with more progressive RESTful API techniques, to release Reserves data into the wild without interfering with reserves request processing -- and, in the process, open up the opportunity for other schools to feed their data in as well.&lt;br /&gt;
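For a sense of what "Solr as a No-SQL database" can mean in practice: reserves records become flat JSON documents posted to Solr's update handler (Solr 4 accepts a JSON array of documents). The core name and field names below are hypothetical, not Harvard's schema.&lt;br /&gt;

```python
# Hedged sketch: flattening course-reserve records into Solr documents.
# The dynamic-field suffixes (_s, _t) follow common Solr conventions.
import json

SOLR_UPDATE_URL = "http://localhost:8983/solr/reserves/update/json?commit=true"

def to_solr_docs(reserves):
    """Turn reserve-request dicts into Solr-ready documents."""
    return [
        {
            "id": r["id"],
            "course_s": r["course"],
            "title_t": r["title"],
            "instructor_s": r.get("instructor", "unknown"),
        }
        for r in reserves
    ]

payload = json.dumps(to_solr_docs(
    [{"id": "r1", "course": "HIST101", "title": "The Medieval World"}]
))
print(payload)
# The payload would then be POSTed to SOLR_UPDATE_URL with
# Content-Type: application/json.
```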
&lt;br /&gt;
== We Are All Disabled! Universal Web Design Making Web Services Accessible for Everyone ==&lt;br /&gt;
 &lt;br /&gt;
Cynthia Ng, Accessibility Librarian, CILS at Langara College&lt;br /&gt;
* No previous Code4Lib presentations (not counting lightning talks)&lt;br /&gt;
&lt;br /&gt;
We’re building and improving tools and services all the time, but do you only develop for the “average” user or add things for “disabled” users? We all use “assistive” technology accessing information in a multitude of ways with different platforms, devices, etc. Let’s focus on providing web services that are accessible to everyone without it being onerous or ugly. The aim is to get you thinking about what you can do to make web-based services and content more accessible for all from the beginning or with small amounts of effort whether you're a developer or not.&lt;br /&gt;
&lt;br /&gt;
The goal of the presentation is to provide both developers and content creators with information on simple, practical ways to make web content and web services more accessible. However, rather than thinking about putting in extra effort or making adjustments for those with disabilities, I want to help people think about how to make their websites more accessible for all users through universal web design.&lt;br /&gt;
&lt;br /&gt;
== Personalize your Google Analytics Data with Custom Events and Variables ==&lt;br /&gt;
&lt;br /&gt;
[http://joshwilson.net Josh Wilson], Systems Integration Librarian, State Library of North Carolina - joshwilsonnc@gmail.com&lt;br /&gt;
&lt;br /&gt;
At the State Library of North Carolina, we had more specific questions about the use of our digital collections than standard GA could answer. A few implementations of custom events and custom variables later, we have our answers.&lt;br /&gt;
&lt;br /&gt;
I'll demonstrate how these analytics add-ons work, and why implementation can sometimes be more complicated than just adding a few lines of JavaScript to your ga.js. I'll discuss some specific examples in use at the SLNC:&lt;br /&gt;
&lt;br /&gt;
* Capturing the content of specific metadata fields in CONTENTdm as Custom Events &lt;br /&gt;
* Recording Drupal taxonomy terms as Custom Variables&lt;br /&gt;
&lt;br /&gt;
In both instances, this data deepened our understanding of how our sites and collections were being used, and in turn, we were able to report usage more accurately to content contributors and other stakeholders.&lt;br /&gt;
&lt;br /&gt;
More on: [https://developers.google.com/analytics/devguides/collection/gajs/eventTrackerGuide GA Custom Events] | [https://developers.google.com/analytics/devguides/collection/gajs/gaTrackingCustomVariables GA Custom Variables]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Behold Fedora 4: The Incredible Shrinking Repository! ==&lt;br /&gt;
&lt;br /&gt;
Esmé Cowles, UC San Diego Library.  Previous talk: [http://code4lib.org/conference/2013/cowles-critchlow-westbrook All Teh Metadatas Re-Revisited] (2013)&lt;br /&gt;
&lt;br /&gt;
* One repository contains untold numbers of digital objects and powers many Hydra and Islandora apps&lt;br /&gt;
* It speaks RDF, but contains no triplestore! (triplestores sold separately, SPARQL Update may be involved, some restrictions apply)&lt;br /&gt;
* Flexible enough to tie itself in knots implementing storage and access control policies&lt;br /&gt;
* Witness feats of strength and scalability, with dramatically increased performance and clustering&lt;br /&gt;
* Plumb the depths of bottomless hierarchies, and marvel at the metadata woven into the very fabric of the repository&lt;br /&gt;
* Ponder the paradox of ingesting large files by not ingesting them&lt;br /&gt;
* Be amazed as Fedora 4 swallows other systems whole (including Fedora 3 repositories)&lt;br /&gt;
* Watch novice developers set up Fedora 4 from scratch, with just a handful of incantations to Git and Maven&lt;br /&gt;
&lt;br /&gt;
The Fedora Commons Repository is the foundation of many digital collections, e-research, digital library, archives, digital preservation, institutional repository and open access publishing systems.  This talk will focus on how Fedora 4 improves core repository functionality, adds new features, maintains backwards compatibility, and addresses the shortcomings of Fedora 3.&lt;br /&gt;
&lt;br /&gt;
== Organic Free-Range API Development - Making Web Services That You Will Actually Want to Consume ==&lt;br /&gt;
&lt;br /&gt;
Steve Meyer and Karen Coombs, OCLC&lt;br /&gt;
&lt;br /&gt;
Building web services can have great benefits by providing reusability of data and functionality. Underpinning your applications with a web service will allow you to write code once and support multiple environments: your library's web app, mobile applications, the embedded widget in your campus portal. However, building a web service is its own kind of artful programming. Doing it well requires attention to many of the same techniques and requirements as building web applications, though with different outcomes. &lt;br /&gt;
&lt;br /&gt;
So what are the usability principles for web services? How do you build a web service that you (and others) will actually want to use? In this talk, we’ll share some of the lessons learned - the good, the bad, and the ugly - through OCLC's work on the WorldCat Metadata API. This web service is a sophisticated API that provides external clients with read and write access to WorldCat data. It provides a model to help aspiring API creators navigate the potential complications of crafting a web service. We'll cover:&lt;br /&gt;
&lt;br /&gt;
* Loose coupling of data assets and resource-oriented data modeling at the core&lt;br /&gt;
* Coding to standards vs. exposure of an internal data model&lt;br /&gt;
* Authentication and security for web services: API Keys, Digital Signing, OAuth Flows&lt;br /&gt;
* Building web services that behave as a suite so it looks like the left hand knows what the right hand is doing&lt;br /&gt;
&lt;br /&gt;
So at the end of the day, your team will know your API is a very good egg after all. &lt;br /&gt;
&lt;br /&gt;
If accepted, the presenters intend to produce and share a Quick Guide for building a web service that will reflect content presented in the talk.&lt;br /&gt;
&lt;br /&gt;
== Lucene's Latest (for Libraries) ==&lt;br /&gt;
&lt;br /&gt;
erik.hatcher@lucidworks.com&lt;br /&gt;
&lt;br /&gt;
Lucene powers the search capabilities of practically all library discovery platforms, by way of Solr, etc.  The Lucene project evolves rapidly, and it's a full-time job to keep up with the ever improving features and scalability.   This talk will distill and showcase the most relevant(!) advancements to date.&lt;br /&gt;
&lt;br /&gt;
== The Why and How of Very Large Displays in Libraries. ==&lt;br /&gt;
&lt;br /&gt;
* Cory Lown, NCSU Libraries, cwlown@ncsu.edu&lt;br /&gt;
&lt;br /&gt;
Previous Code4Lib Presentations:&lt;br /&gt;
* [http://code4lib.org/conference/2012/lown How People Search the Library from a Single Search Box]  2012&lt;br /&gt;
* [http://code4lib.org/conference/2010/orphanides_lown_lynema Enhancing Discoverability with Virtual Shelf Browse] 2010&lt;br /&gt;
&lt;br /&gt;
Built into the walls of NC State's new Hunt Library are several [http://www.christiedigital.com/en-us/digital-signage/products/microtiles/pages/microtiles-digital-signage-video-wall.aspx Christie MicroTile Display Wall Systems]. What does a library do with a display that's seven feet tall and over twenty feet wide? I'll talk about why libraries might want large displays like this, what we're doing with them right now, and what we might do with them in the future. I'll talk about how these displays factor into planning for new and existing web projects. And I'll get into the fun details of how you build web applications that scale from the very small browser window on a phone all the way up to a browser window with about 14 million pixels (about 10 million more than a dual 24&amp;quot; monitor desktop setup).&lt;br /&gt;
&lt;br /&gt;
== Discovering your Discovery System in Real Time. ==&lt;br /&gt;
&lt;br /&gt;
* Godmar Back, Virginia Tech, gback@vt.edu&lt;br /&gt;
* Annette Bailey, Virginia Tech, afbailey@vt.edu&lt;br /&gt;
&lt;br /&gt;
Practically all libraries today provide web-based discovery systems to their users;&lt;br /&gt;
users discover items and peruse or check them out by clicking on links.  Unlike&lt;br /&gt;
the traditional transaction of checking out a book at the circulation desk, this&lt;br /&gt;
interaction is largely invisible.  We have built a system that records users'&lt;br /&gt;
interactions with Summon in real time, processes the resulting data with minimal delay,&lt;br /&gt;
and visualizes it in various ways using Google Charts and various d3.js modules,&lt;br /&gt;
such as word clouds, tree maps, and others.&lt;br /&gt;
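The aggregation step behind a visualization like a word cloud is simple to sketch (this is illustrative, not the authors' pipeline): recorded click events are reduced to term frequencies in the {"text", "size"} shape that d3.js word-cloud layouts typically consume.&lt;br /&gt;

```python
# Hypothetical sketch: turn a stream of recorded discovery-system click
# events into term frequencies for a d3.js word cloud. Event fields are
# invented for illustration.
from collections import Counter

STOPWORDS = {"the", "of", "and", "a", "in"}

def wordcloud_data(events, limit=50):
    counts = Counter()
    for event in events:
        for word in event["title"].lower().split():
            if word not in STOPWORDS:
                counts[word] += 1
    return [{"text": w, "size": n} for w, n in counts.most_common(limit)]

events = [{"title": "The History of Rome"}, {"title": "Rome and Carthage"}]
print(wordcloud_data(events))
```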
&lt;br /&gt;
These visualizations can be embedded in web sites, but are also suitable for&lt;br /&gt;
projection via large-scale displays or projectors right into the 'Learning Spaces'&lt;br /&gt;
many libraries are being converted into.  The goal of this talk is to share the technology&lt;br /&gt;
and advocate the building of a cloud-based infrastructure that would make this&lt;br /&gt;
technology available to any library that uses a discovery system, rather than just&lt;br /&gt;
those who have the technological prowess for developing such systems and&lt;br /&gt;
visualizations in-house.  &lt;br /&gt;
&lt;br /&gt;
Previous presentations at Code4Lib:&lt;br /&gt;
* Talk: Code4Lib 2009 [http://code4lib.org/files/LibX2.0-Code4Lib-2009AsPresented.ppt LibX 2.0]&lt;br /&gt;
* Preconference: [http://wiki.code4lib.org/index.php/LibX_Preconference LibX 2.0, 2009]&lt;br /&gt;
* Preconference: Code4Lib 2010, On Widgets and Web Services&lt;br /&gt;
&lt;br /&gt;
== Your Library, Anywhere: A Modern, Responsive Library Catalogue at University of Toronto Libraries ==&lt;br /&gt;
&lt;br /&gt;
* Bilal Khalid, Gordon Belray, Lisa Gayhart (lisa.gayhart@utoronto.ca)&lt;br /&gt;
&lt;br /&gt;
* No previous Code4Lib presentations&lt;br /&gt;
&lt;br /&gt;
With the recent surge in the mobile device market and an ever expanding patron base with increasingly divergent levels of technical ability, the University of Toronto Libraries embarked on the development of a new catalogue discovery layer to fit the needs of its diverse users. &lt;br /&gt;
&lt;br /&gt;
[http://search.library.utoronto.ca The result]: a mobile-friendly, flexible and intuitive web application that brings the full power of a faceted library catalogue to users without compromising quality or performance, employing Responsive Web Design principles. This talk will discuss: application development; service improvements; interface design; and user outreach, testing, and project communications. Feedback and questions from the audience are very welcome. If time runs short, we will be available for questions and conversation after the presentation.&lt;br /&gt;
&lt;br /&gt;
Note: A version of this content has been provisionally accepted as an article for Code4Lib Journal, January 2014 publication.&lt;br /&gt;
&lt;br /&gt;
== All Tiled Up ==&lt;br /&gt;
&lt;br /&gt;
* Mike Graves, MIT Libraries (mgraves@mit.edu)&lt;br /&gt;
&lt;br /&gt;
You've got maps. You even scanned and georeferenced them. Now what? Running a full GIS stack can be expensive, and overkill in some cases. The good news is that you have a lot more options now than you did just a few years ago. I'd like to present some lighter weight solutions to making georeferenced images available on the Web.&lt;br /&gt;
&lt;br /&gt;
This talk will provide an introduction to MBTiles. I'll go over what they are, how you create them, how you use them and why you would use them.&lt;br /&gt;
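To demystify the format in advance: per the MBTiles spec, an MBTiles file is just a SQLite database with a metadata table and a tiles table keyed by zoom/column/row. The sketch below builds a tiny tileset in memory (names and tile bytes are invented).&lt;br /&gt;

```python
# Minimal MBTiles sketch: SQLite tables per the MBTiles specification.
import sqlite3

def create_mbtiles(conn, name):
    conn.executescript("""
        CREATE TABLE metadata (name TEXT, value TEXT);
        CREATE TABLE tiles (zoom_level INTEGER, tile_column INTEGER,
                            tile_row INTEGER, tile_data BLOB);
    """)
    conn.execute("INSERT INTO metadata VALUES ('name', ?)", (name,))
    conn.execute("INSERT INTO metadata VALUES ('format', 'png')")

def add_tile(conn, z, x, y, png_bytes):
    conn.execute("INSERT INTO tiles VALUES (?, ?, ?, ?)", (z, x, y, png_bytes))

def get_tile(conn, z, x, y):
    row = conn.execute(
        "SELECT tile_data FROM tiles WHERE zoom_level=? "
        "AND tile_column=? AND tile_row=?", (z, x, y)).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
create_mbtiles(conn, "scanned-maps")
add_tile(conn, 0, 0, 0, b"\x89PNG...")  # stand-in for real PNG bytes
print(get_tile(conn, 0, 0, 0))
```

A web tile server then only needs to map /z/x/y.png requests onto get_tile -- no full GIS stack required.&lt;br /&gt;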
&lt;br /&gt;
== The Great War: Image Interoperability to Facebook ==&lt;br /&gt;
&lt;br /&gt;
* Rob Sanderson, Los Alamos National Laboratory (azaroth42@gmail.com)&lt;br /&gt;
** (Code4Lib 2006: [http://www.code4lib.org/2006/sanderson Library Text Mining])&lt;br /&gt;
* Rob Warren, Carleton University&lt;br /&gt;
** No previous presentations&lt;br /&gt;
&lt;br /&gt;
Using a pipeline constructed from Linked Open Data and other interoperability specifications, it is possible to merge and re-use image and textual data from distributed library collections to build new, useful tools and applications.  Starting with the OAI-PMH interface to CONTENTdm, we will take you on a tour through the International Image Interoperability Framework and Shared Canvas, to a cross-institutional viewer, and on to image analysis that builds a historical Facebook by finding and tagging people in photographs.  The World War One collections are drawn from multiple institutions and merged by the machine learning code.&lt;br /&gt;
&lt;br /&gt;
The presentation will focus on the (open source) toolchain and the benefits of the use of standards throughout:  OAI-PMH to get the metadata, IIIF for interaction with the images, the Shared Canvas ontology for describing collections of digitized objects, Open Annotation for tagging things in the images and specialized ontologies that are specific to the contents.  The tools include standard RDF / OWL technologies, JSON-LD, imagemagick and OpenCV for image analysis.&lt;br /&gt;
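One concrete piece of this toolchain is easy to show: IIIF Image API requests follow the fixed URL pattern {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}. The server base and identifier below are invented; the helper is a hedged sketch, not the presenters' code.&lt;br /&gt;

```python
# Sketch of building IIIF Image API URLs from the standard parameter
# order: region / size / rotation / quality.format.
def iiif_image_url(base, identifier, region="full", size="full",
                   rotation="0", quality="native", fmt="jpg"):
    return f"{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}"

# Full image, then a 600x400 crop scaled to 300px wide:
print(iiif_image_url("http://example.org/iiif", "photo-1914-001"))
print(iiif_image_url("http://example.org/iiif", "photo-1914-001",
                     region="100,100,600,400", size="300,"))
```

Because every IIIF server honors the same pattern, a cross-institutional viewer can request crops (e.g. a tagged face in a photograph) without any per-repository code.&lt;br /&gt;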
&lt;br /&gt;
== Visualizing Solr Search Results with D3.js for User-Friendly Navigation of Large Results Sets ==&lt;br /&gt;
&lt;br /&gt;
*Julia Bauder, Grinnell College Libraries (bauderj-at-grinnell-dot-edu)&lt;br /&gt;
*No previous presentations at national Code4Lib conferences&lt;br /&gt;
&lt;br /&gt;
As the corpus of articles, books, and other resources searched by discovery systems continues to get bigger, searchers are more and more frequently confronted with unmanageably large numbers of results. How can we help users make sense of 10,000 hits and find the ones they actually want? Facets help, but making sense of a gigantic sidebar of facets is not an easy task for users, either.&lt;br /&gt;
During this talk, I will explain how we will soon be using Solr 4’s pivot queries and hierarchical visualizations (e.g., treemaps) from D3.js to let patrons view and manipulate search results. We will be doing this with our VuFind 2.0 catalog, but this technique will work with any system running Solr 4. I will also talk about early student reaction to our tests of these visualization features.&lt;br /&gt;
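The reshaping step can be sketched generically (this is an illustration, not the Grinnell code): Solr 4 returns facet.pivot results as nested lists of field/value/count nodes, and d3.js hierarchical layouts such as treemaps expect nested {"name", "children"} objects. The sample pivot data below is invented.&lt;br /&gt;

```python
# Sketch: convert a Solr 4 facet.pivot response fragment into the
# nested hierarchy shape used by d3.js treemap layouts.
def pivot_to_hierarchy(pivots):
    nodes = []
    for p in pivots:
        node = {"name": str(p["value"])}
        if p.get("pivot"):                       # inner pivot level
            node["children"] = pivot_to_hierarchy(p["pivot"])
        else:                                    # leaf: carry the count
            node["size"] = p["count"]
        nodes.append(node)
    return nodes

sample = [{"field": "format", "value": "Book", "count": 120,
           "pivot": [{"field": "era", "value": "Modern", "count": 80},
                     {"field": "era", "value": "Medieval", "count": 40}]}]
tree = {"name": "results", "children": pivot_to_hierarchy(sample)}
print(tree)
```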
&lt;br /&gt;
== PeerLibrary – open source cloud based collaborative library ==&lt;br /&gt;
&lt;br /&gt;
[https://github.com/peerlibrary/peerlibrary PeerLibrary is a new open source project] and a cloud service providing collaborative reading, sharing, and storing of publications. Users can upload publications they want to read (currently in PDF format), read them in the browser in real time with others, highlight, annotate, and organize their own or a collaborative library. PeerLibrary provides a search engine over all uploaded open access publications. Additionally, it aims to collaboratively aggregate an open layer of knowledge on top of these publications through the public annotations and references users add to them. In this way publications would not just be available to read, but accessible to the general public as well. Currently, it is aimed at the scientific community and scientific publications.&lt;br /&gt;
&lt;br /&gt;
See [http://blog.peerlibrary.org/post/63458789185/screencast-previewing-the-peerlibrary-project screencast here].&lt;br /&gt;
&lt;br /&gt;
It is still in development, and a beta launch is planned for the end of November.&lt;br /&gt;
&lt;br /&gt;
== Who was where when, or finding biographical articles on Wikipedia by place and time ==&lt;br /&gt;
&lt;br /&gt;
* [http://morton-owens.info Emily Morton-Owens], The Seattle Public Library (presenting on work from NYU)&lt;br /&gt;
* No previous c4l presentations&lt;br /&gt;
&lt;br /&gt;
It's easy to answer the question &amp;quot;What important people were in Paris in 1939?&amp;quot; But what about Virginia in the 1750s or Scandinavia in the 14th century? I created a tool that allows you to search for biographies in a generally applicable way, using a map interface. I would like to present updates to my thesis project, which combines a crawler written in Java that extracts information from Wikipedia articles, with a MongoDB data store and a frontend in Python.&lt;br /&gt;
&lt;br /&gt;
The input to the project is the free text of entire articles in Wikipedia; this is important to allow us to pick up Benjamin Franklin not just in the single most obvious place of Philadelphia but also in London, Paris, Boston, etc. I can talk about my experiments disambiguating place names (approaches pioneered on newspaper articles were actually unhelpful on this type of text) and setting up a processing queue that does not become mired in the biographies of every human who ever played soccer. I also want to revisit some of the implementation choices I made due to my academic deadline and improve the accuracy/usability.&lt;br /&gt;
&lt;br /&gt;
What I hope to show is that I was able to develop a novel and useful reference tool automatically, using fairly simple heuristics that are a far cry from the hand-cataloging familiar to many librarians.&lt;br /&gt;
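To give a flavor of what "fairly simple heuristics" can look like (this is illustrative, not the project's actual extraction code): a first approximation of the "when" question is pulling four-digit years out of an article's free text and reporting the span they cover.&lt;br /&gt;

```python
# Toy heuristic: extract the range of years mentioned in free text.
import re

YEAR = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")  # 1000-2099

def year_span(text):
    """Return (earliest, latest) year found, or None."""
    years = [int(y) for y in YEAR.findall(text)]
    return (min(years), max(years)) if years else None

bio = "Franklin arrived in London in 1757 and moved to Paris in 1776."
print(year_span(bio))
```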
&lt;br /&gt;
You can try out [http://linserv1.cims.nyu.edu:48866/ the original version] (this server is inconveniently set to be updated/rebooted on 11/8--may be temporarily unavailable)&lt;br /&gt;
&lt;br /&gt;
== Good!, DRY, and Dynamic: Content Strategy for Libraries (Especially the Big Ones) ==&lt;br /&gt;
&lt;br /&gt;
*Michael Schofield, Nova Southeastern University Libraries, mschofield@nova.edu&lt;br /&gt;
*No previous code4lib presentations.&lt;br /&gt;
&lt;br /&gt;
The responsibilities of the #libweb are exploding (it’s a good thing), and it is no longer uncommon for libraries to manage or even home-grow multiple applications and sites. Often it is at this point that the web people begin to suffer the absence of a content strategy when, say, business hours need to be updated sitewide a half-dozen times.&lt;br /&gt;
&lt;br /&gt;
We were already feeling this crunch when we decided to further complicate the Nova Southeastern University Libraries by splitting the main library website into two. The Alvin Sherman Library, Research, and Information Technology Center is a unique joint-use facility that serves not only the academic community but the public of Broward County - and marketing a hyperblend of content through one portal just wasn't cutting it. With a web team of two, we knew that managing all this rehashed, disparate content was totally unsustainable.&lt;br /&gt;
&lt;br /&gt;
In this talk I want to share how I went about making our library content DRY (“don’t repeat yourself”): input content in one place--blurbs, policies, featured events, featured databases, book reviews, business hours, and so on--and syndicate it everywhere, even, sometimes, dynamically targeting that content for specific audiences or contexts. It is a presentation that is a little about workflow, a little more about browser and context detection, a tangent about content-modeling the CMS, and a lot about APIs, syndication, and performance.&lt;br /&gt;
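The core of the DRY idea can be sketched in a few lines (purely illustrative, not the NSU implementation): one canonical record, rendered into whatever shape each consumer needs, so updating business hours once updates them everywhere.&lt;br /&gt;

```python
# Toy single-source-of-truth sketch: one hours record, many renderings.
import json

HOURS = {"monday": "8am-10pm", "saturday": "10am-6pm"}

def hours_json():
    """Rendering for an API consumer or widget."""
    return json.dumps(HOURS)

def hours_html():
    """Rendering for a page footer."""
    rows = "".join(f"<li>{day.title()}: {times}</li>"
                   for day, times in HOURS.items())
    return f"<ul>{rows}</ul>"

print(hours_json())
print(hours_html())
```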
&lt;br /&gt;
== No code, no root, no problem? Adventures in SaaS and library discovery ==&lt;br /&gt;
&lt;br /&gt;
*[mailto:erwhite@vcu.edu Erin White, VCU]&lt;br /&gt;
*No previous C4L presentations&lt;br /&gt;
&lt;br /&gt;
In 2012 VCU was an eager early adopter of Ex Libris' cloud service Alma as an ILS, ERM, link resolver, and single-stop, de-silo'd public-facing discovery tool. This has been a disruptive change that has shifted our systems staff's day-to-day work, relationships with others in the library, and relationships with vendors.&lt;br /&gt;
&lt;br /&gt;
I'll share some of our experiences and takeaways from implementing and maintaining a cloud service:&lt;br /&gt;
* Seeking disruption and finding it&lt;br /&gt;
* Changing expectations of service and the reality of unplanned downtime&lt;br /&gt;
* Communication and problem resolution with non-IT library staff&lt;br /&gt;
* Working with a vendor that uses agile development methodology&lt;br /&gt;
* Benefits and pitfalls of creating customizations and code workarounds&lt;br /&gt;
* Changes in library IT/coders' roles with SaaS&lt;br /&gt;
&lt;br /&gt;
...as well as thoughts on the philosophy of library discovery vs real-life experiences in moving to a single-search model.&lt;br /&gt;
&lt;br /&gt;
== Building for others (and ourselves):  the Avalon Media System ==&lt;br /&gt;
* [mailto:michael.klein@northwestern.edu Michael B Klein], Senior Software Developer, Northwestern University &lt;br /&gt;
** [http://code4lib.org/conference/2010/metz_klein Public Datasets in the Cloud] (code4lib 2010)&lt;br /&gt;
** [http://code4lib.org/conference/2013/klein-rogers The Avalon Media System: A Next Generation Hydra Head For Audio and Video Delivery] (code4lib 2013)&lt;br /&gt;
* [mailto:j-rudder@northwestern.edu Julie Rudder], Digital Initiatives Project Manager, Northwestern University&lt;br /&gt;
** no previous code4lib presentations&lt;br /&gt;
&lt;br /&gt;
[http://www.avalonmediasystem.org/ Avalon Media System] is a collaborative effort between development teams at Northwestern and Indiana Universities. Our goal is to produce an open source media management platform that works well for us, but is also widely adopted and contributed to by other institutions. We believe that building a strong user and contributor community is vital to the success and longevity of the project, and have developed the system with this goal in mind. We will share the lessons learned, and the pains and successes we’ve had, releasing two versions of the application since last year.  &lt;br /&gt;
&lt;br /&gt;
Our presentation will cover our experiences:&lt;br /&gt;
* providing flexible, admin-friendly distribution and installation options&lt;br /&gt;
* building with abstraction, customization and local integrations in mind&lt;br /&gt;
* prioritizing features (user stories)&lt;br /&gt;
* attracting code contributions from other institutions&lt;br /&gt;
* gathering community feedback &lt;br /&gt;
* creating a product rather than a bag of parts&lt;br /&gt;
&lt;br /&gt;
== How to check your data to provide a great data product? Data quality as a key product feature at Europeana ==&lt;br /&gt;
&lt;br /&gt;
*[mailto:Peter.Kiraly@kb.nl Péter Király] portal backend developer, Europeana&lt;br /&gt;
*No previous C4L presentations&lt;br /&gt;
&lt;br /&gt;
[http://Europeana.eu/ Europeana.eu] - Europe's digital library, archive and museum - aggregates more than 30 million metadata records from more than 2200 institutions.  The records come from libraries, archives, museums and every other kind of cultural institution, from very different systems and metadata schemas, and are typically transformed several times before they are ingested into the Europeana data repository.  Europeana builds a consolidated database from these records, creating reliable and consistent services for end-users (a search portal, search widget, mobile apps, thematic sites etc.) and an API, which supports our strategic goal of making data available for reuse in education, creative industries, and the cultural sector.  A reliable &amp;quot;data product&amp;quot; is thus at the core of our own software products, as well as those of our API partners.&lt;br /&gt;
&lt;br /&gt;
Much effort is needed to smooth out local differences in the metadata curation practice of our data providers. We need a solid framework to measure the consistency of our data and provide feedback to decision-makers inside and outside the organisation. We can also use this metrics framework to ask content providers to improve their own metadata. Of course, a data-quality-driven approach requires that we also improve the data transformation steps of the Europeana ingestion process itself. Data quality issues heavily define what new features we are able to create in our user interfaces and API, and might actually affect the design and implementation of our underlying data structure, the Europeana Data Model.&lt;br /&gt;
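The proposal does not spell out its metrics, but one common data-quality measure makes the idea concrete: per-field completeness, the share of records that actually fill each field. The sketch below is an invented illustration, not Europeana's framework.&lt;br /&gt;

```python
# Illustrative data-quality metric: per-field completeness across a
# batch of metadata records (empty strings count as missing).
from collections import Counter

def field_completeness(records, fields):
    filled = Counter()
    for rec in records:
        for f in fields:
            if rec.get(f):
                filled[f] += 1
    n = len(records)
    return {f: filled[f] / n for f in fields}

records = [{"title": "Map of Utrecht", "creator": "Blaeu"},
           {"title": "Untitled", "creator": ""}]
print(field_completeness(records, ["title", "creator"]))
```

Reported per data provider over time, a metric like this is the kind of feedback that lets providers see and improve their own metadata.&lt;br /&gt;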
&lt;br /&gt;
In the presentation I will briefly describe the Europeana metadata ingestion process, show the data quality metrics and measuring techniques (using the Europeana API, Solr and MongoDB queries), present some typical problems (both trivial and difficult ones), and finally describe the feedback mechanism we propose to deploy.&lt;br /&gt;
&lt;br /&gt;
Keywords: Europeana, data quality, EDM, API, Apache Solr, MongoDB, #opendata, #openglam&lt;br /&gt;
&lt;br /&gt;
== Teach your Fedora to Fly: scaling out your digital repository ==&lt;br /&gt;
&lt;br /&gt;
*[mailto:acoburn@amherst.edu Aaron Coburn], Software Developer, Amherst College&lt;br /&gt;
*No previous C4L presentations&lt;br /&gt;
&lt;br /&gt;
Fedora is a great repository system for managing large collections of digital objects, but what happens when a popular food magazine begins directing a large number of readers to a manuscript showing Emily Dickinson’s own recipe for doughnuts? While Fedora excels in its support of XML-based metadata, it doesn’t always perform well under a high volume of traffic. Nor is it especially tolerant of network or hardware failures.&lt;br /&gt;
&lt;br /&gt;
This presentation will show how we are making heavy use of a Fedora repository while at the same time insulating it almost entirely from any web traffic. Starting with a distributed web front-end built with Node.js, and caching most of the user-accessible content from Fedora in an elastic, fault-tolerant Riak (NoSQL) cluster, we have eliminated nearly all single points of failure in the system. It also means that our production system is spread across twelve separate servers, where asynchrony and Map-Reduce are king. And aside from being blazing fast, it is also entirely Hydra-compliant.&lt;br /&gt;
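The insulation pattern being described is classic cache-aside: requests hit the cache first and only fall through to the repository on a miss. A toy sketch (plain dicts standing in for Riak and Fedora, and nothing like the actual Node.js front-end):&lt;br /&gt;

```python
# Toy cache-aside sketch: serve from the cache when possible, fall
# through to the (slow, fragile) repository only on a miss.
def make_fetcher(cache, repository):
    def fetch(object_id):
        if object_id in cache:
            return cache[object_id]      # fast path: no repo traffic
        value = repository[object_id]    # fall through to the repository
        cache[object_id] = value         # populate for next time
        return value
    return fetch

cache, repo = {}, {"doughnuts": "Emily Dickinson's recipe"}
fetch = make_fetcher(cache, repo)
print(fetch("doughnuts"))
print(fetch("doughnuts"))  # second call is served from the cache
```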
&lt;br /&gt;
Furthermore, we will attempt to answer the question: if Fedora crashes and the visitors to your site don’t notice, did it really fail?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Using Open Source Software and Freeware to Preserve and Deliver Digital Videos ==&lt;br /&gt;
* [mailto:wfang@kinoy.rutgers.edu Wei Fang], Head of Digital Services, Rutgers University Law Library&lt;br /&gt;
* Jiebei Luo, Digital Projects Initiative Intern, Rutgers University&lt;br /&gt;
*No previous C4L presentations&lt;br /&gt;
&lt;br /&gt;
The Rutgers University Law Library is the official digital repository of the New Jersey Supreme Court oral arguments since 2002. This large video collection contains approximately 3,000 videos with a total of 400 GB or 6,000 viewing hours. With the expansion of this collection, the existing database and the static website could not efficiently support the library’s daily operations and meet its patrons’ search needs. &lt;br /&gt;
By utilizing open source software and freeware such as Ubuntu, FFmpeg, Solr and Drupal, the library was able to develop a complete solution for re-encoding videos, embedding subtitles, incorporating the Solr search engine and a content management system to support full-text subtitle search, automatically updating video metadata records in the library catalog system, and providing a plug-in-free HTML5-based Web interface for patrons to view the videos online.&lt;br /&gt;
The aspects below will be presented in detail at the conference:&lt;br /&gt;
* Video codec comparison&lt;br /&gt;
* Server-side batch video encoding/re-encoding&lt;br /&gt;
* The HTML5 video tag and embedded subtitles&lt;br /&gt;
* Incorporating the Solr search engine and the Drupal content management tool with the database to retrieve videos by full-text search, especially in subtitle files&lt;br /&gt;
* Incorporating video metadata with the library catalog system&lt;br /&gt;
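The batch re-encoding step above can be sketched as follows (a hedged illustration, not the library's scripts: paths and codec settings are invented, and the command is only constructed here, not executed):&lt;br /&gt;

```python
# Sketch: build an FFmpeg command per source file for H.264/AAC output
# suitable for playback in an HTML5 <video> element.
from pathlib import Path

def encode_command(src, out_dir):
    out = Path(out_dir) / (Path(src).stem + ".mp4")
    return ["ffmpeg", "-i", str(src),
            "-c:v", "libx264", "-crf", "23",   # H.264 video
            "-c:a", "aac", "-b:a", "128k",     # AAC audio
            str(out)]

cmd = encode_command("oral_argument_2002.avi", "/var/www/videos")
print(" ".join(cmd))
# In a batch job, each command would be run with subprocess.run(cmd).
```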
&lt;br /&gt;
== Shared Vision, Shared Resources: the Curate Institutional Repository ==&lt;br /&gt;
* Dan Brubaker Horst, University of Notre Dame &lt;br /&gt;
** [http://code4lib.org/conference/2011/JohnsonHorst A Community-Based Approach to Developing a Digital Exhibit at Notre Dame Using the Hydra Framework] &lt;br /&gt;
* Julie Rudder, Northwestern University&lt;br /&gt;
** no previous presentations&lt;br /&gt;
&lt;br /&gt;
Curate is being collaboratively developed by several institutions in the Hydra community who share the need and vision for a Fedora-backed Institutional Repository. The first release of Curate was a collaboration between Notre Dame and Northwestern University, along with Digital Curation Experts (DCE), a vendor hired jointly by our two institutions. Building on the Hydra engine Sufia, the team worked quickly to release the first version of Curate in October 2013, which provides a basic self-deposit system with support for various content types, collection building, DOI minting, and user profile creation. From the very beginning we have built Curate to be easy to theme and extend in order to ease installation and use by other institutions.&lt;br /&gt;
&lt;br /&gt;
In December 2013, additional partners will join the project, including Indiana University, the University of Cincinnati, and the University of Virginia. Each institution contributes resources to the project in order to further our common goal: to create a product that fits our needs and has a sustainable future. Together we will tackle additional content types (such as complex data, software, and media), administrative collections, and more. &lt;br /&gt;
&lt;br /&gt;
Our presentation will include:&lt;br /&gt;
* a brief demonstration of Curate and a technical overview&lt;br /&gt;
* why and how we work together&lt;br /&gt;
* why we built Curate&lt;br /&gt;
* the future of the project&lt;br /&gt;
&lt;br /&gt;
== Solr, Cloud and Blacklight ==&lt;br /&gt;
* David Jiao, Library Information Systems, Indiana University at Bloomington, djiao@indiana.edu&lt;br /&gt;
** No previous code4lib presentations&lt;br /&gt;
&lt;br /&gt;
SolrCloud refers to the distributed capabilities of Solr 4. It is designed to offer a highly available, fault-tolerant environment by organizing data into multiple shards that can be hosted on multiple machines with replicas, and by providing centralized cluster configuration and management. &lt;br /&gt;
&lt;br /&gt;
At Indiana University, we are upgrading the Solr backend of our recently released Blacklight-based OPAC from Solr 1.4 to Solr 4, and we have also been working to build a private cloud of Solr 4 servers. In this talk, I will present several features of SolrCloud, including distributed requests, fault tolerance, near-real-time indexing/searching, and configuration management with ZooKeeper, along with our experiences using these features to provide better performance and architecture for our OPAC, which serves over 7 million bibliographic records to over 100 thousand students and faculty members. I will also discuss practical lessons learned from our SolrCloud setup and upgrade, and from integrating the new SolrCloud with our customized Blacklight system.  &lt;br /&gt;
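For context, SolrCloud collections are created and managed through Solr's Collections API; a minimal sketch of building such a request follows (the base URL, collection name, and shard counts are hypothetical, not IU's actual configuration):&lt;br /&gt;

```python
from urllib.parse import urlencode

def create_collection_url(base, name, num_shards, replication_factor, config_name):
    """Build a SolrCloud Collections API CREATE request URL.

    base: e.g. "http://solr.example.edu:8983/solr" (hypothetical host).
    The named config set is assumed to already be stored in ZooKeeper.
    """
    params = urlencode({
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "collection.configName": config_name,
    })
    return base + "/admin/collections?" + params
```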
&lt;br /&gt;
== Leveraging XSDs for Reflective, Live Dataset Support in Institutional Repositories ==&lt;br /&gt;
* [mailto:msulliva@ufl.edu Mark Sullivan], Library Information Technology, University of Florida&lt;br /&gt;
** No previous code4lib presentations&lt;br /&gt;
&lt;br /&gt;
The University of Florida Libraries are currently adding support for active datasets to our METS-based institutional repository software.  This ongoing project enables the library to be a partner in current or long-running data-driven projects around the university by providing tangible short-term and long-term benefits to those projects.  The system assists project teams by storing and providing access to their data, while supporting online filtering and sorting of the data, custom queries, and adding and editing of the data by authorized users.  We are also exploring simple data visualizations to allow users to perform basic graphical and geographic queries.  Several different schemas were explored, including DDI and EML, but ultimately the streamlined approach of using XSDs with some custom attributes was chosen, with all other data residing in the METS file portions.  Currently the system is being developed using XSDs describing XML datasets, but this model should easily scale to support SQL datasets or large datasets backed by Hadoop or iRODS.&lt;br /&gt;
&lt;br /&gt;
This work is being integrated into the open source [http://sobek.ufl.edu SobekCM Digital Content Management System], which is built on a pair-tree structure of METS resources with [http://ufdc.ufl.edu/design/webcontent/sobekcm/SobekCM_Resource_Object.pdf rich metadata support] including DC, MODS, MARC, VRACore, DarwinCore, IEEE-LOM, GML/KML, schema.org microdata, and many other standard schemas.  The system has emphasized online, distributed creation and maintenance of resources, including geo-placement and geographic searching of resources, building structure maps (tables of contents) visually online, and a broad suite of curator tools.  &lt;br /&gt;
&lt;br /&gt;
This work is presented as a model which could be implemented in other systems as well.  We will demonstrate current support and discuss our upcoming roadmap to provide complete support.&lt;br /&gt;
&lt;br /&gt;
== Dead-simple Video Content Management: Let Your Filesystem Do The Work ==&lt;br /&gt;
&lt;br /&gt;
* Andreas Orphanides, NCSU Libraries (akorphan (at) ncsu.edu)&lt;br /&gt;
** (never led or soloed a C4L presentation)&lt;br /&gt;
&lt;br /&gt;
Content management is hard. To keep all the moving parts in order, and to maintain a layer of separation between the system and content creators (who are frequently not technical experts), we typically turn to content management systems like Drupal. But even Drupal and its kin require significant overhead and present a not inconsiderable learning curve for nontechnical users.&lt;br /&gt;
&lt;br /&gt;
In some contexts it's possible -- and desirable -- to manage content in a more streamlined, lightweight way, with a minimum of fuss and technical infrastructure. In this presentation I'll share a simple MVC-like architecture for managing video content for playback on the web, which uses a combination of Apache's mod_rewrite module and your server's filesystem structure to provide an automated approach to video content management that's easy to implement and provides a low barrier to content updates: friendly to content creators and technology implementors alike. Even better, the basic method is HTML5-friendly, and can be integrated into your favorite content management system if you've got permissions for creating templates.&lt;br /&gt;
&lt;br /&gt;
In the presentation I'll go into detail about the system structure and logic required to implement this approach. I'll detail the benefits and limitations of the system, as well as the challenges I encountered in developing its implementation. Audience members should come away with sufficient background to implement a similar system on their own servers. Implementation documentation and genericized code will also be shared, as available.&lt;br /&gt;
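To give a flavor of the approach (the paths, file layout, and script name here are hypothetical, not the NCSU implementation), a mod_rewrite rule can route a clean video URL to a player page only when a matching file actually exists on disk, letting the filesystem itself act as the content model:&lt;br /&gt;

```apache
# Hypothetical sketch: the filesystem is the "model", mod_rewrite the
# "controller", and a player page the "view".
RewriteEngine On

# /videos/orientation -> player.php?video=orientation,
# but only if /media/orientation.mp4 exists on disk.
RewriteCond %{DOCUMENT_ROOT}/media/$1.mp4 -f
RewriteRule ^videos/([A-Za-z0-9_-]+)$ player.php?video=$1 [L,QSA]
```

With a layout like this, content creators update the site simply by dropping or renaming files in the media directory; no database or admin interface is involved.&lt;br /&gt;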
&lt;br /&gt;
== Managing Discovery ==&lt;br /&gt;
&lt;br /&gt;
* Andrew Pasterfield, Senior Programmer/Systems Analyst, University of Calgary Library, ampaster@ucalgary.ca&lt;br /&gt;
**No previous code4lib presentations&lt;br /&gt;
In fall 2012 the University of Calgary Library launched a new home page that incorporated a Summon-powered single search box with a customized “bento box” results display. Search at the U of C now combines a range of metadata sources for discovery with customized mapping of a database recommender and LibGuides into a unified display. Further customizations include a non-Google-Analytics, non-proxy method of logging clicks.&lt;br /&gt;
&lt;br /&gt;
This presentation will discuss the technical details of bringing the various systems together into one display interface to increase discovery at the U of C Library.&lt;br /&gt;
&lt;br /&gt;
http://library.ucalgary.ca&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Sorting it out: a piece of the User Centered Design Process ==&lt;br /&gt;
&lt;br /&gt;
* Cindy Beggs, [http://www.akendi.com/aboutus/management/ Akendi], cindy@akendi.com&lt;br /&gt;
&lt;br /&gt;
This talk is about how to apply a user centered design methodology to the process of creating an information architecture.  Participants learn the fundamentals of UCD and how card sorting and reverse card sorting enable us to isolate the content we present on screen from the layouts and visuals of those screens.  We talk about ways to identify who will be using the information architecture you are creating and why we need to know how it will be used.&lt;br /&gt;
 &lt;br /&gt;
What will attendees take away from this talk?&lt;br /&gt;
The criticality of involving “real” end users in the process of creating an information architecture, and the basics of following a user-centered design process in the creation of best-in-class, content-rich digital products.&lt;br /&gt;
&lt;br /&gt;
Cindy Beggs has been working in the “information industry” for over 25 years.  A librarian by profession, she has spent decades helping users figure out how to find their way through large bodies of content.  Her insights into how people seek information, her empathy for those who find it a challenge and her practical experience helping organizations figure out how to best structure their content contribute to her success as an information architect with both clients and trainees.  (http://www.akendi.com/aboutus/management/)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Implementation of ArchivesSpace in University of Richmond==&lt;br /&gt;
&lt;br /&gt;
*Birong Ho, bho@richmond.edu&lt;br /&gt;
&lt;br /&gt;
The University of Richmond implemented ArchivesSpace, its archival collection management system, in fall 2013. As a charter member of the ArchivesSpace community, with the Head of Special Collections serving as a board member, the University made implementation of this open source software a priority. &lt;br /&gt;
&lt;br /&gt;
Several aspects of the implementation will be addressed in the talk, including collections and repository setup, the storage layer (including data formats), system resource requirements, technical architecture, customization, scaling, and integration with other systems in the library.&lt;br /&gt;
&lt;br /&gt;
Customization, scaling, and integration with other campus systems such as Archon and eXist became particular concerns, and these will be the focus of the talk.&lt;br /&gt;
&lt;br /&gt;
==Easy Wins for Modern Web Technologies in Libraries==&lt;br /&gt;
&lt;br /&gt;
*[mailto:trey.terrell@oregonstate.edu Trey Terrell], Analyst Programmer, Oregon State University&lt;br /&gt;
** No previous Code4Lib presentations &lt;br /&gt;
&lt;br /&gt;
Oregon State University is currently implementing an updated version of its room reservation system. In its development we've come across and implemented a variety of &amp;quot;easy wins&amp;quot; to make it more responsive, easier to maintain, less expensive to run, and just cooler to experience. While our particular system was in Ruby on Rails, this talk will address general methods and example utilities which can be used no matter your stack.&lt;br /&gt;
&lt;br /&gt;
I'll be talking about things like cache management, reverse proxies, publish/subscribe servers, WebSockets, responsive design, asynchronous processing, and keeping complicated stacks up and running with minimal effort.&lt;br /&gt;
&lt;br /&gt;
==Implementing Islandora at a Small Institution==&lt;br /&gt;
&lt;br /&gt;
*Megan Kudzia, Albion College Library&lt;br /&gt;
*Eddie Bachle, Albion College IT&lt;br /&gt;
**No previous Code4Lib presentations&lt;br /&gt;
&lt;br /&gt;
Albion College (and particularly the Library/Archives and Special Collections) has a variety of needs which could be met by an open-source Institutional Repository system. Several months and lots of conversations later, we’re continuing to troubleshoot our way through Islandora. We’d like to talk about what has worked for us, where our frustrations have been, whether it’s even possible to install and develop a system like this at a small institution, and where the process has stalled. &lt;br /&gt;
&lt;br /&gt;
As of right now, we do have a semi-working installation. We’re not sure when it will be ready for our end users, but we'll talk about our development process and evaluate our progress.&lt;br /&gt;
''Contributions also by Nicole Smeltekop, Albion College Archives &amp;amp; Special Collections''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== PhantomJS+Selenium: Easy Automated Testing of AJAX-y UIs ==&lt;br /&gt;
&lt;br /&gt;
* Martin Haye, California Digital Library, martin.haye@ucop.edu&lt;br /&gt;
** Previous Code4Lib Presentation: [http://code4lib.org/conference/2012/collett Beyond code: Versioning data with Git and Mercurial] at Code4Lib 2012 (Martin co-presenting with Stephanie Collett)&lt;br /&gt;
* Mark Redar, California Digital Library, mark.redar@ucop.edu&lt;br /&gt;
&lt;br /&gt;
Web user interfaces are demanding ever-more dynamism and polish, combining HTML5, AJAX, lots of CSS and jQuery (or ilk) to create autocomplete drop-downs, intelligent buttons, stylish alert dialogs, etc. How can you make automated tests for these highly complex and interactive UIs?&lt;br /&gt;
&lt;br /&gt;
Part of the answer is PhantomJS. It’s a modern WebKit browser that’s “headless” (meaning it has no display) and can be driven from command-line Selenium unit tests. PhantomJS is dead simple to install, and its blazing speed and server-friendliness make continuous integration testing easy. You can write UI unit tests in {language-of-your-choice} and run them not just in PhantomJS but in Firefox and Chrome, plus a zillion browser/OS combinations at places like SauceLabs, TestingBot and BrowserStack.&lt;br /&gt;
&lt;br /&gt;
In this double-team live code talk, we’ll explain all that while we demonstrate the following in real time:&lt;br /&gt;
&lt;br /&gt;
* Start with nothing.&lt;br /&gt;
* Install Selenium bindings for Ruby and Python.&lt;br /&gt;
* In each language write a small test of an AJAX-y UI.&lt;br /&gt;
* Run the tests in Firefox, and fix bugs (in the test or UI) as needed.&lt;br /&gt;
* Install PhantomJS.&lt;br /&gt;
* Show the same tests running headless as part of a server-friendly test suite. &lt;br /&gt;
* (Wifi permitting) Show the same tests running on a couple different browser/OS combinations on the server cloud at SauceLabs – talking through a tunnel to the local firewalled application.&lt;br /&gt;
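The heart of testing AJAX-y UIs is waiting for asynchronous conditions to come true. Selenium provides WebDriverWait for this; the underlying polling pattern can be sketched on its own like so (the function name and defaults are illustrative, not Selenium's API):&lt;br /&gt;

```python
import time

def wait_until(condition, tries=50, interval=0.1):
    """Poll a zero-argument condition until it returns a truthy value,
    the way Selenium's explicit waits poll the browser for AJAX results.
    Returns the truthy value, or raises TimeoutError."""
    for _ in range(tries):
        value = condition()
        if value:
            return value
        time.sleep(interval)
    raise TimeoutError("condition never became truthy")
```

In a real Selenium test, the condition would be a lambda querying the driver, e.g. waiting for a hypothetical autocomplete drop-down to render its result elements.&lt;br /&gt;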
&lt;br /&gt;
==New Technologies, Collaboration, &amp;amp; Entrepreneurship in Libraries:  Harnessing Their Power to Help Your Library==&lt;br /&gt;
&lt;br /&gt;
* Stephanie Walker – swalker@brooklyn.cuny.edu&lt;br /&gt;
* Howard Spivak – howards@brooklyn.cuny.edu&lt;br /&gt;
* Alex - Alex@brooklyn.cuny.edu&lt;br /&gt;
&lt;br /&gt;
Academic libraries are caught in budget squeezes and often struggle to find ways to communicate value to senior administration and others.  At Brooklyn College Library, we have taken an unusual, possibly unique, approach to these issues.  Our technology staff have long worked directly with librarians to develop products that meet library, faculty, and student needs, and we have shared many of our products with colleagues, including an award-winning website, e-resource, and content management system we call 4MyLibrary, which we shared for free with 8 CUNY colleges, and also an easy-to-use book scanner, which has proven overwhelmingly popular with students, faculty, other librarians, and numerous campus offices.  Recently, motivated by budget cuts, we decided that what worked for us might interest other libraries, and working with our Office of Technology Commercialization, we started selling 2 products:  our book scanners (at half the price of commercial alternatives), and a hosting service, whereby we could host and support 4MyLibrary for libraries with minimal technology staff.  Both succeeded, and yielded major benefits:  a steady revenue stream and the admiration and serious goodwill of our senior administration and others.   However, this presentation is neither a basic how-to, nor an advertisement.  With this presentation, we hope to spur a conversation for broader collaboration, especially regarding new technologies, among libraries.  We all have some level of technical expertise, most of us are struggling with rising prices and tight budgets, and many of us are unhappy with various technology products we use, from scanners to our ILS.  We believe – and can demonstrate – that with collaboration, we can solve many of our problems, and provide better services to boot. &lt;br /&gt;
&lt;br /&gt;
== Identifiers, Data, and Norse Gods ==&lt;br /&gt;
&lt;br /&gt;
* Ryan Scherle, Dryad Digital Repository, ryan@datadryad.org&lt;br /&gt;
&lt;br /&gt;
ORCID and DataCite provide stable identifiers for researchers and data, respectively. Each system does a fine job of providing value to its users. But wouldn't it be great if they could link their systems to create something much more powerful? Perhaps even as powerful as a god?&lt;br /&gt;
&lt;br /&gt;
Enter [http://odin-project.eu/ ODIN], The ORCID and DataCite Interoperability Network. ODIN is a two-year project to unleash the power of persistent identifiers for researchers and the research they create. This talk will present recent work from the ODIN project, including several tools that can be used to unleash the godlike power of identifiers at your institution.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[:Category:Code4Lib2014]]&lt;/div&gt;</summary>
		<author><name>Ryscher</name></author>	</entry>

	<entry>
		<id>https://wiki.code4lib.org/index.php?title=2014_Prepared_Talk_Proposals&amp;diff=39849</id>
		<title>2014 Prepared Talk Proposals</title>
		<link rel="alternate" type="text/html" href="https://wiki.code4lib.org/index.php?title=2014_Prepared_Talk_Proposals&amp;diff=39849"/>
				<updated>2013-11-08T20:03:02Z</updated>
		
		<summary type="html">&lt;p&gt;Ryscher: /* New Technologies, Collaboration, &amp;amp; Entrepreneurship in Libraries:  Harnessing Their Power to Help Your Library */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Proposals for Prepared Talks:'''&lt;br /&gt;
&lt;br /&gt;
Prepared talks are 20 minutes (including setup and questions), and should focus on one or more of the following areas:&lt;br /&gt;
 &lt;br /&gt;
* ''Projects'' you've worked on which incorporate innovative implementation of existing technologies and/or development of new software&lt;br /&gt;
* ''Tools and technologies'' – How to get the most out of existing tools, standards and protocols (and ideas on how to make them better)&lt;br /&gt;
* ''Technical issues'' - Big issues in library technology that should be addressed or better understood&lt;br /&gt;
* ''Relevant non-technical issues'' – Concerns of interest to the Code4Lib community which are not strictly technical in nature, e.g. collaboration, diversity, organizational challenges, etc.&lt;br /&gt;
&lt;br /&gt;
'''To Propose a Talk'''&lt;br /&gt;
* Log in to the wiki in order to submit a proposal. If you are not already registered, follow the instructions to do so.&lt;br /&gt;
* Provide a title and brief (500 words or fewer) description of your proposed talk.&lt;br /&gt;
* If you so choose, you may also indicate when, if ever, you have presented at a prior Code4Lib conference. This information is completely optional, but it may assist us in opening the conference to new presenters.&lt;br /&gt;
&lt;br /&gt;
As in past years, the Code4Lib community will vote on proposals that they would like to see included in the program. This year, however, only the top 10 proposals will be guaranteed a slot at the conference. Additional presentations will be selected by the Program Committee in an effort to ensure diversity in program content. Community votes will, of course, still weigh heavily in these decisions.&lt;br /&gt;
&lt;br /&gt;
Presenters whose proposals are selected for inclusion in the program will be guaranteed an opportunity to register for the conference. The standard conference registration fee will still apply.&lt;br /&gt;
&lt;br /&gt;
''Proposals can be submitted through '''Friday, November 8, 2013, at 5pm PST'''''. Voting will commence on November 18, 2013 and continue through December 6, 2013. The final line-up of presentations will be announced in early January, 2014.&lt;br /&gt;
&lt;br /&gt;
'''Talk Proposals'''&lt;br /&gt;
&lt;br /&gt;
==Creating a new Greek-Dutch dictionary==&lt;br /&gt;
* Caspar Treijtel, University of Amsterdam, c.treijtel@uva.nl&lt;br /&gt;
&lt;br /&gt;
At present, no complete dictionary of (ancient) Greek-Dutch is available online. A new dictionary is currently under construction at Leiden University, with software being developed at the University of Amsterdam. The team in Leiden has already begun preparing the data, with about 6,000 lemmas approved at this moment. The ultimate goal is to produce both a print version and an online open access version from the same source documents. The software needed for this was created in a project funded by CLARIN-NL.&lt;br /&gt;
&lt;br /&gt;
Migrator&lt;br /&gt;
&lt;br /&gt;
For the production of lemmas we have implemented an advanced workflow. The (generally non-technical) users create lemmas using MS Word, which is both familiar and easy to use. We have developed a custom software module that carefully migrates the Word documents into deeply structured XML by analyzing the structure and semantics of the lemmas, falling back on heuristics in ambiguous cases. Although we initially envisioned the oXygen XML Author component as the main tool for creating new lemmas, we obtained excellent results with the migrator module and therefore decided to continue using MS Word as the primary composition tool. The main advantage of this is that the editors are much more familiar with Word than with any other WYSIWYG editor. Lemmas that have been migrated to XML are stored in an XML database and can be further edited using oXygen XML Author.&lt;br /&gt;
&lt;br /&gt;
Lemmatizer&lt;br /&gt;
&lt;br /&gt;
Greek morphology is complicated. In order to use a dictionary effectively, a rather high level of initial language competence is necessary for the user to be able to relate the word form s/he finds in a text to the correct basic lemma form, where the definition of the word can be found. Using a Greek morphological database we have been able to facilitate the search for lemmas. A ‘lemmatizer’ module gives the possible parsings of the word forms and the lemmas they can be derived from. This enables the user to type in the word as found in the text and be redirected to the correct lemma.&lt;br /&gt;
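A minimal sketch of the lemmatizer idea follows. The real module is backed by a full morphological database; the sample entries below are illustrative only:&lt;br /&gt;

```python
# Map inflected Greek word forms to the lemmas (dictionary headwords)
# they can be derived from, with a morphological parse for each.
MORPHOLOGY = {
    "λόγου": [("λόγος", "noun, genitive singular")],
    "λόγοι": [("λόγος", "noun, nominative plural")],
    "ἔλυσε": [("λύω", "verb, aorist indicative active, 3rd singular")],
}

def lemmatize(form):
    """Return (lemma, parse) candidates for a surface form; an empty
    list means the form is unknown to the morphological database."""
    return MORPHOLOGY.get(form, [])
```

A reader can then type the word exactly as found in the text and be redirected to the headword where the definition lives.&lt;br /&gt;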
&lt;br /&gt;
Visualization&lt;br /&gt;
&lt;br /&gt;
For the online dictionary we have implemented a visualization module that allows the user to view multiple lemmas at once. This module was implemented using the JavaScript framework MooTools. The result is a viewer that performs well and is powered by maintainable JavaScript code.&lt;br /&gt;
&lt;br /&gt;
The online dictionary is still a work in progress; see http://www.woordenboekgrieks.nl/ for the beta version. A newer test version with additional features can be found at http://angel.ic.uva.nl:8600/.&lt;br /&gt;
&lt;br /&gt;
Credits&lt;br /&gt;
&lt;br /&gt;
* construction of the dictionary: Prof. Ineke Sluiter, Classics department of Leiden University; Prof. Albert Rijksbaron, University of Amsterdam&lt;br /&gt;
* publisher of the dictionary: Amsterdam University Press&lt;br /&gt;
* design/typesetting dictionary: TaT Zetwerk (http://www.tatzetwerk.nl/)&lt;br /&gt;
* software development: Digital Production Center, University Library, University of Amsterdam&lt;br /&gt;
* project funding: CLARIN-NL (http://www.clarin.nl/)&lt;br /&gt;
* morphological database for use by the lemmatizer: courtesy of Prof. Helma Dik, University of Chicago (based on data of the Perseus Project)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Using Drupal to drive alternative presentation systems ==&lt;br /&gt;
 &lt;br /&gt;
* [[User:Highermath|Cary Gordon]], The Cherry Hill Company, cgordon@chillco.com&lt;br /&gt;
&lt;br /&gt;
Recently, we have been building systems that use angular.js, Rails, or other systems for presentation, while leveraging Drupal's sophisticated content management capabilities on the back end.&lt;br /&gt;
&lt;br /&gt;
So far, these have been one-way systems, but as we move to Drupal 8 we are beginning to explore ways to further decouple the presentation and CMS functions.&lt;br /&gt;
&lt;br /&gt;
== A Book, a Web Browser and a Tablet: How Bibliotheca Alexandrina's Book Viewer Framework Makes It Possible ==&lt;br /&gt;
 &lt;br /&gt;
* [[User:Mohammed.abuouda|Mohammed Abu ouda]], Bibliotheca Alexandrina (The new Library of Alexandria)&lt;br /&gt;
&lt;br /&gt;
Many institutions around the world are engaged in digitization projects aimed at preserving the human knowledge present in books and making them available through multiple channels to people around the globe. These efforts will surely help close the digital gap, particularly with the arrival of affordable e-readers, mobile phones, and network coverage. However, the digital reading experience has not yet reached its full potential. Many readers miss features they like in their good old books and wish to find them in their digital counterparts. In an attempt to create a unique digital reading experience, Bibliotheca Alexandrina (BA) created a flexible book viewing framework that is currently used to access its collection of more than 300,000 digital books in five different languages, which includes the largest collection of digitized Arabic books.&lt;br /&gt;
&lt;br /&gt;
Using open source tools, BA used the framework to develop a modular book viewer that can be deployed in different environments and is currently at the heart of various BA projects. The book viewer provides several features that create a more natural reading experience. As with physical books, readers can now personalize the books they read by adding annotations such as highlights, underlines, and sticky notes to capture their thoughts and ideas, in addition to being able to share the book with friends on social networks. The reader can perform a search across the content of the book, receiving highlighted search results within the pages of the book. More features can be added to the book viewer through its plugin architecture.&lt;br /&gt;
&lt;br /&gt;
== Structured data NOW: seeding schema.org in library systems ==&lt;br /&gt;
 &lt;br /&gt;
* [http://coffeecode.net Dan Scott], Laurentian University&lt;br /&gt;
** Previous code4lib presentations: [https://archive.org/details/code4lib.conf.2008.pres.CouchDBsacrilege CouchDB is sacrilege... mmm, delicious sacrilege] at Code4Lib 2008&lt;br /&gt;
&lt;br /&gt;
The semantic web, linked data, and structured data are all fantastic ideas whose adoption is constrained by implementation barriers. No matter how enthused a given library might be about publishing structured data, if its system does not allow customization or the institution lacks skilled staff, it will not happen. However, if the software in use simply publishes structured data by default, then the web will be populated for free. Really! No extra resources necessary.&lt;br /&gt;
&lt;br /&gt;
This presentation highlights Dan's work with systems such as Evergreen, Koha, and VuFind to enable the publication of schema.org structured data out of the box. Along the way, we reflect on the current state of the W3C Schema.org Bibliographic Extension community group's efforts to shape the evolution of the schema.org vocabulary. Finally, hold on tight as we contemplate next steps and the possibilities of a world where structured data is the norm on the web.&lt;br /&gt;
&lt;br /&gt;
== Towards Pasta Code Nirvana: Using JavaScript MVC to Fill Your Programming Ravioli ==&lt;br /&gt;
&lt;br /&gt;
* Bret Davidson, North Carolina State University Libraries, bret_davidson@ncsu.edu&lt;br /&gt;
** Previous Code4Lib Presentations: [http://wiki.code4lib.org/index.php/2013_talks_proposals#Data-Driven_Documents:_Visualizing_library_data_with_D3.js Visualizing library data with D3.js] at Code4Lib 2013&lt;br /&gt;
&lt;br /&gt;
JavaScript MVC frameworks are ushering in a golden age of robust and responsive web applications that take advantage of evergreen browsers, performant JS engines, and the unprecedented reach provided by billions of personal computing devices. The web browser has emerged as the world’s most popular application runtime and the complexity[1] and scope of JavaScript applications has exploded accordingly. Server-side web frameworks like Rails and Django have helped developers adhere to best practices like modularity, dependency injection, and unit testing for years, practices that are now being applied to JavaScript development through projects like Backbone[2], Ember[3], and Angular[4].&lt;br /&gt;
&lt;br /&gt;
This talk will discuss the issues JavaScript MVC frameworks are trying to solve, common features like data binding, implications for the future of web development[5], and the appropriateness of JavaScript MVC for library applications.&lt;br /&gt;
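As a language-neutral sketch of the data-binding feature these frameworks share (this is illustrative Python, not code from Backbone, Ember, or Angular), one-way binding boils down to a model value notifying its bound view-update functions whenever it changes:&lt;br /&gt;

```python
class Observable:
    """Minimal sketch of one-way data binding: when the model value
    changes, every bound listener (a stand-in for a view update
    function) is called with the new value."""
    def __init__(self, value=None):
        self._value = value
        self._listeners = []

    def bind(self, listener):
        self._listeners.append(listener)

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        self._value = new_value
        for listener in self._listeners:
            listener(new_value)
```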
&lt;br /&gt;
*[1]http://en.wikipedia.org/wiki/Spaghetti_code&lt;br /&gt;
*[2]http://backbonejs.org&lt;br /&gt;
*[3]http://emberjs.com&lt;br /&gt;
*[4]http://angularjs.org&lt;br /&gt;
*[5]http://tomdale.net/2013/09/progressive-enhancement-is-dead/&lt;br /&gt;
&lt;br /&gt;
== WebSockets for Real-Time and Interactive Interfaces ==&lt;br /&gt;
&lt;br /&gt;
* [http://ronallo.com Jason Ronallo], NCSU Libraries, jason_ronallo@ncsu.edu&lt;br /&gt;
&lt;br /&gt;
Previous Code4Lib presentations:&lt;br /&gt;
* [http://code4lib.org/conference/2012/ronallo HTML5 Microdata and Schema.org] 2012&lt;br /&gt;
* [http://code4lib.org/conference/2013/ronallo HTML5 Video Now!] 2013&lt;br /&gt;
&lt;br /&gt;
Watching the Google Analytics Real-Time dashboard for the first time was mesmerizing. As soon as someone visited a site, I could see what page they were on. For a digital collections site with a lot of images, it was fun to see what visitors were looking at. But getting from Google Analytics to the image or other content currently being viewed was cumbersome. The real-time experience was something I wanted to share with others. I'll show you how I used a WebSocket service to create a real-time interface to digital collections.&lt;br /&gt;
&lt;br /&gt;
In the Hunt Library at NCSU we have some large video walls. I wanted to make HTML-based exhibits that featured viewer interaction. I'll show you how I converted Listen to Wikipedia [1] into a bring-your-own-device interactive exhibit. With WebSockets, any HTML page can be remote-controlled by any internet-connected device.&lt;br /&gt;
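The remote-control pattern rests on a publish/subscribe relay: display pages subscribe to a channel, and controller devices publish messages to it. A minimal sketch of that hub, independent of any particular WebSocket library (the class and method names are illustrative):&lt;br /&gt;

```python
class Channel:
    """Sketch of the publish/subscribe hub behind a WebSocket remote
    control: display pages subscribe, controller devices publish, and
    the hub relays each message to every subscriber's callback."""
    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, message):
        # Copy the list so a callback unsubscribing mid-broadcast
        # cannot disturb the iteration.
        for callback in list(self._subscribers):
            callback(message)
```

In a real deployment, each callback would push the message down an open WebSocket connection to a browser.&lt;br /&gt;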
&lt;br /&gt;
I will attempt to include real-time audience participation.&lt;br /&gt;
&lt;br /&gt;
[1] http://listen.hatnote.com/&lt;br /&gt;
&lt;br /&gt;
== Rapid Development of Automated Tasks with the File Analyzer ==&lt;br /&gt;
&lt;br /&gt;
* Terry Brady, Georgetown University Libraries, twb27@georgetown.edu&lt;br /&gt;
&lt;br /&gt;
The Georgetown University Libraries have customized the File Analyzer and Metadata Harvester application (https://github.com/Georgetown-University-Libraries/File-Analyzer) to solve a number of library automation challenges:&lt;br /&gt;
* validating digitized and reformatted files&lt;br /&gt;
* validating vendor statistics for COUNTER compliance&lt;br /&gt;
* preparing collections of digital files for archiving and ingest&lt;br /&gt;
* manipulating ILS import and export files&lt;br /&gt;
&lt;br /&gt;
The File Analyzer application was used by the US National Archives to validate 3.5 million digitized images from the 1940 Census.  After implementing a customized ingest workflow within the File Analyzer, the Georgetown University Libraries were able to process an ingest backlog of over a thousand files of digital resources into DigitalGeorgetown, the Libraries’ Digital Collections and Institutional Repository platform.  Georgetown is currently developing customized workflows that integrate Apache Tika, BagIt, and MARC conversion utilities.&lt;br /&gt;
&lt;br /&gt;
The File Analyzer is a desktop application with a powerful framework for implementing customized file validation and transformation rules.  As new rules are deployed, they are presented to users within a user interface that is easy (and powerful) to use.&lt;br /&gt;
&lt;br /&gt;
Learn about the functionality that is available for download and how you can use this tool to automate workflows, from digital collections to ILS ingests to electronic resource statistics. We will also discuss opportunities to collaborate on enhancements to this application!&lt;br /&gt;
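As a sketch of the kind of file-validation rule such a framework supports (the function and its checks are illustrative, not File Analyzer's actual API), a rule might confirm that a digitized file exists, is non-empty, and matches an expected checksum:&lt;br /&gt;

```python
import hashlib
from pathlib import Path

def validate_file(path, expected_md5=None):
    """Illustrative validation rule: confirm a file exists, is
    non-empty, and (optionally) matches an MD5 checksum.
    Returns a list of problems; an empty list means the file passed."""
    problems = []
    p = Path(path)
    if not p.is_file():
        return ["missing: " + str(path)]
    if p.stat().st_size == 0:
        problems.append("empty: " + str(path))
    if expected_md5 is not None:
        actual = hashlib.md5(p.read_bytes()).hexdigest()
        if actual != expected_md5:
            problems.append("checksum mismatch: " + str(path))
    return problems
```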
&lt;br /&gt;
== GeoHydra: How to Build a Geospatial Digital Library with Fedora ==&lt;br /&gt;
 &lt;br /&gt;
* [http://stanford.edu/~drh Darren Hardy], Stanford University, drh@stanford.edu&lt;br /&gt;
&lt;br /&gt;
Geographically-rich data are exploding and putting fear in those trying to&lt;br /&gt;
tackle integrating them into existing digital library infrastructures.&lt;br /&gt;
Building a spatial data infrastructure that integrates with your digital&lt;br /&gt;
library infrastructure need not be a daunting task. We have successfully&lt;br /&gt;
deployed a geospatial digital library infrastructure using Fedora and&lt;br /&gt;
open-source geospatial software [1]. We'll discuss the primary design&lt;br /&gt;
decisions and technologies that led to a production deployment within a few&lt;br /&gt;
months. Briefly, our architecture revolves around discovery, delivery, and&lt;br /&gt;
metadata pipelines using open-source OpenGeoPortal [2], Solr [3], GeoServer&lt;br /&gt;
[4], PostGIS [5], and GeoNetwork [6] technologies, plus the proprietary ESRI&lt;br /&gt;
ArcMap [7] -- the GIS industry's workhorse. Finally, we'll discuss the key&lt;br /&gt;
skillsets needed to build and maintain a spatial data infrastructure.&lt;br /&gt;
&lt;br /&gt;
[1] http://foss4g.org&lt;br /&gt;
[2] http://opengeoportal.org&lt;br /&gt;
[3] http://lucene.apache.org/solr&lt;br /&gt;
[4] http://geoserver.org&lt;br /&gt;
[5] http://postgis.net&lt;br /&gt;
[6] http://geonetwork-opensource.org&lt;br /&gt;
[7] http://esri.com&lt;br /&gt;
&lt;br /&gt;
==Under the Hood of Hadoop Processing at OCLC Research ==&lt;br /&gt;
&lt;br /&gt;
[http://roytennant.com/ Roy Tennant]&lt;br /&gt;
&lt;br /&gt;
* Previous Code4Lib presentations: 2006: &amp;quot;The Case for Code4Lib 501c(3)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
[http://hadoop.apache.org/ Apache Hadoop] is widely used by Yahoo!, Google, and many others to process massive amounts of data quickly. OCLC Research uses a 40-node compute cluster with Hadoop and HBase to process the 300 million MARC records of WorldCat in various ways. This presentation will explain how Hadoop MapReduce works and illustrate it with specific examples and code. The role of the JobTracker in both monitoring and reporting on processes will be explained. String searching WorldCat will also be demonstrated live.&lt;br /&gt;
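As background for the talk, the MapReduce pattern itself fits in a few lines: a mapper emits key/value pairs, the framework sorts and groups them by key, and a reducer folds each group. A toy sketch in the style of a Hadoop Streaming job (the records and the language field are invented examples, not OCLC's actual WorldCat processing):&lt;br /&gt;

```python
# Toy illustration of MapReduce: count records per language code.
# The "shuffle/sort" step that Hadoop performs between map and reduce
# is simulated here with sorted() + groupby().

from itertools import groupby
from operator import itemgetter

def mapper(record):
    # Emit one (language-code, 1) pair per record.
    yield (record["lang"], 1)

def reducer(key, values):
    # Sum the counts for one key.
    return (key, sum(values))

records = [{"lang": "eng"}, {"lang": "fre"}, {"lang": "eng"}]

# Shuffle/sort phase: group mapper output by key, as Hadoop does.
pairs = sorted(kv for r in records for kv in mapper(r))
counts = dict(
    reducer(k, [v for _, v in group])
    for k, group in groupby(pairs, key=itemgetter(0))
)
# counts == {"eng": 2, "fre": 1}
```

On a real cluster the mapper and reducer run in parallel across nodes, with HDFS supplying the input splits; the programming model is unchanged.&lt;br /&gt;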
&lt;br /&gt;
== Quick and Easy Data Visualization with Google Visualization API and Google Chart Libraries ==&lt;br /&gt;
 &lt;br /&gt;
[http://bohyunkim.net/blog Bohyun Kim], Florida International University, bohyun.kim@fiu.edu&lt;br /&gt;
* 'No' previous Code4Lib presentations &lt;br /&gt;
&lt;br /&gt;
Does most of the data that your library collects stay in spreadsheets, or get published as a static table of boring numbers? Do your library stakeholders spend more time collecting the data than using it as a decision-making tool, because the data is presented in a way that makes it hard for them to [http://developers.google.com/chart/interactive/docs/gallery quickly grasp its significance]?&lt;br /&gt;
&lt;br /&gt;
This talk will provide an overview of the [http://developers.google.com/chart/interactive/docs/reference Google Visualization API] and [http://developers.google.com/chart/ Google Chart Libraries] to get you started on the way to quickly querying and visualizing your library data from remote data sources (e.g. a Google Spreadsheet or your own database), with (or without) cool-looking user controls, animation effects, and even a dashboard.&lt;br /&gt;
&lt;br /&gt;
== Leap Motion + Rare Books: A hands-free way to view and interact with rare books in 3D ==&lt;br /&gt;
 &lt;br /&gt;
[http://www.youtube.com/user/jpdenzer Juan Denzer], Binghamton University, jdenzer@binghamton.edu&lt;br /&gt;
* 'No' previous Code4Lib presentations &lt;br /&gt;
&lt;br /&gt;
As rare books become more delicate over time, making them available to the public becomes harder.  We at Binghamton University Library have developed an application that makes it easier to view rare books without ever having to touch them.  We have combined the Leap Motion hands-free device and 3D rendered models to create a new virtual experience for the viewer.&lt;br /&gt;
&lt;br /&gt;
The application allows the user to rotate and zoom in on a 3D representation of a rare book.  The user is also able to ‘open’ the virtual book and flip through it using natural gestures, such as swiping a hand left or right to turn the page.&lt;br /&gt;
&lt;br /&gt;
The application is built on the .NET Framework and is written in C#.  3D models are created using simple 3D software such as SketchUp or Blender.  Scans of the book cover and spine are created using simple flatbed scanners.  The inside pages are scanned using overhead scanners. &lt;br /&gt;
&lt;br /&gt;
This talk will discuss the technologies used in developing the application; virtually any library could implement it with little to no coding at all. The presentation will include a demonstration of the software, and a chance for audience members to experience the Rare Book Leap Motion App themselves.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Course Reserves Unleashed! ==&lt;br /&gt;
 &lt;br /&gt;
* Bobbi Fox, Library Technology Services, Harvard University, bobbi_fox@harvard.edu&lt;br /&gt;
* Gloria Korsman, Andover-Harvard Theological Library&lt;br /&gt;
** No previous Code4Lib presentations &lt;br /&gt;
&lt;br /&gt;
Hey kids!  Remember when SOAP was used for something other than washing?  Our sophisticated (and highly functional) Course Reserves Request system does!&lt;br /&gt;
&lt;br /&gt;
However, while the system is great for submitting and processing course reserve requests, the student-facing presentation through Harvard’s home-grown -- and soon to be replaced -- LMS leaves a lot to be desired.  &lt;br /&gt;
&lt;br /&gt;
Follow along as we leverage Solr 4 as a No-SQL database, along with more progressive RESTful API techniques, to release Reserves data into the wild without interfering with reserves request processing -- and, in the process, open up the opportunity for other schools to feed their data in as well.&lt;br /&gt;
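For readers unfamiliar with the approach, indexing JSON documents into Solr over plain HTTP is what makes it usable as a lightweight No-SQL store. A minimal sketch of that interaction (the core name, URL, and field names are invented for illustration, not the actual reserves schema):&lt;br /&gt;

```python
# Sketch of treating Solr 4 as a JSON document store over its REST-style
# update handler: reserve records go in (and come back out) as plain
# HTTP requests. The URL and field names below are hypothetical.

import json
from urllib.request import Request

SOLR_UPDATE = "http://localhost:8983/solr/reserves/update?commit=true"

def index_request(docs):
    """Build the HTTP request that adds/updates documents in Solr."""
    body = json.dumps(docs).encode("utf-8")
    return Request(SOLR_UPDATE, data=body,
                   headers={"Content-Type": "application/json"})

req = index_request([
    {"id": "course-101-item-1", "course": "CS101", "title": "SICP"},
])
# urllib.request.urlopen(req) would send it to a running Solr instance;
# querying /select?q=course:CS101&wt=json reads the data back as JSON.
```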
&lt;br /&gt;
== We Are All Disabled! Universal Web Design Making Web Services Accessible for Everyone ==&lt;br /&gt;
 &lt;br /&gt;
Cynthia Ng, Accessibility Librarian, CILS at Langara College&lt;br /&gt;
* No previous Code4Lib presentations (not counting lightning talks)&lt;br /&gt;
&lt;br /&gt;
We’re building and improving tools and services all the time, but do you develop only for the “average” user, or add things for “disabled” users? We all use “assistive” technology, accessing information in a multitude of ways on different platforms, devices, etc. Let’s focus on providing web services that are accessible to everyone without being onerous or ugly. The aim is to get you thinking about what you can do to make web-based services and content more accessible for all -- from the beginning, or with small amounts of effort -- whether you're a developer or not.&lt;br /&gt;
&lt;br /&gt;
The goal of the presentation is to provide both developers and content creators with information on simple, practical ways to make web content and web services more accessible. However, rather than thinking about putting in extra effort or making adjustment for those with disabilities, I want to help people think about how to make their websites more accessible for all users through universal web design.&lt;br /&gt;
&lt;br /&gt;
== Personalize your Google Analytics Data with Custom Events and Variables ==&lt;br /&gt;
&lt;br /&gt;
[http://joshwilson.net Josh Wilson], Systems Integration Librarian, State Library of North Carolina - joshwilsonnc@gmail.com&lt;br /&gt;
&lt;br /&gt;
At the State Library of North Carolina, we had more specific questions about the use of our digital collections than standard GA could provide. A few implementations of custom events and custom variables later, we have our answers.&lt;br /&gt;
&lt;br /&gt;
I'll demonstrate how these analytics add-ons work, and why implementation can sometimes be more complicated than just adding a few lines of JavaScript to your ga.js. I'll discuss some specific examples in use at the SLNC:&lt;br /&gt;
&lt;br /&gt;
* Capturing the content of specific metadata fields in CONTENTdm as Custom Events &lt;br /&gt;
* Recording Drupal taxonomy terms as Custom Variables&lt;br /&gt;
&lt;br /&gt;
In both instances, this data deepened our understanding of how our sites and collections were being used, and in turn, we were able to report usage more accurately to content contributors and other stakeholders.&lt;br /&gt;
&lt;br /&gt;
More on: [https://developers.google.com/analytics/devguides/collection/gajs/eventTrackerGuide GA Custom Events] | [https://developers.google.com/analytics/devguides/collection/gajs/gaTrackingCustomVariables GA Custom Variables]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Behold Fedora 4: The Incredible Shrinking Repository! ==&lt;br /&gt;
&lt;br /&gt;
Esmé Cowles, UC San Diego Library.  Previous talk: [http://code4lib.org/conference/2013/cowles-critchlow-westbrook All Teh Metadatas Re-Revisited] (2013)&lt;br /&gt;
&lt;br /&gt;
* One repository contains untold numbers of digital objects and powers many Hydra and Islandora apps&lt;br /&gt;
* It speaks RDF, but contains no triplestore! (triplestores sold separately, SPARQL Update may be involved, some restrictions apply)&lt;br /&gt;
* Flexible enough to tie itself in knots implementing storage and access control policies&lt;br /&gt;
* Witness feats of strength and scalability, with dramatically increased performance and clustering&lt;br /&gt;
* Plumb the depths of bottomless hierarchies, and marvel at the metadata woven into the very fabric of the repository&lt;br /&gt;
* Ponder the paradox of ingesting large files by not ingesting them&lt;br /&gt;
* Be amazed as Fedora 4 swallows other systems whole (including Fedora 3 repositories)&lt;br /&gt;
* Watch novice developers set up Fedora 4 from scratch, with just a handful of incantations to Git and Maven&lt;br /&gt;
&lt;br /&gt;
The Fedora Commons Repository is the foundation of many digital collections, e-research, digital library, archives, digital preservation, institutional repository and open access publishing systems.  This talk will focus on how Fedora 4 improves core repository functionality, adds new features, maintains backwards compatibility, and addresses the shortcomings of Fedora 3.&lt;br /&gt;
&lt;br /&gt;
== Organic Free-Range API Development - Making Web Services That You Will Actually Want to Consume ==&lt;br /&gt;
&lt;br /&gt;
Steve Meyer and Karen Coombs, OCLC&lt;br /&gt;
&lt;br /&gt;
Building web services can have great benefits by providing reusability of data and functionality. Underpinning your applications with a web service will allow you to write code once and support multiple environments: your library's web app, mobile applications, the embedded widget in your campus portal. However, building a web service is its own kind of artful programming. Doing it well requires attention to many of the same techniques and requirements as building web applications, though with different outcomes. &lt;br /&gt;
&lt;br /&gt;
So what are the usability principles for web services? How do you build a web service that you (and others) will actually want to use? In this talk, we’ll share some of the lessons learned - the good, the bad, and the ugly - through OCLC's work on the WorldCat Metadata API. This web service is a sophisticated API that provides external clients with read and write access to WorldCat data. It provides a model to help aspiring API creators navigate the potential complications of crafting a web service. We'll cover:&lt;br /&gt;
&lt;br /&gt;
* Loose coupling of data assets and resource-oriented data modeling at the core&lt;br /&gt;
* Coding to standards vs. exposure of an internal data model&lt;br /&gt;
* Authentication and security for web services: API Keys, Digital Signing, OAuth Flows&lt;br /&gt;
* Building web services that behave as a suite so it looks like the left hand knows what the right hand is doing&lt;br /&gt;
&lt;br /&gt;
So at the end of the day, your team will know your API is a very good egg after all. &lt;br /&gt;
&lt;br /&gt;
If accepted, the presenters intend to produce and share a Quick Guide for building a web service that will reflect content presented in the talk.&lt;br /&gt;
&lt;br /&gt;
== Lucene's Latest (for Libraries) ==&lt;br /&gt;
&lt;br /&gt;
erik.hatcher@lucidworks.com&lt;br /&gt;
&lt;br /&gt;
Lucene powers the search capabilities of practically all library discovery platforms, by way of Solr, etc.  The Lucene project evolves rapidly, and it's a full-time job to keep up with the ever improving features and scalability.   This talk will distill and showcase the most relevant(!) advancements to date.&lt;br /&gt;
&lt;br /&gt;
== The Why and How of Very Large Displays in Libraries. ==&lt;br /&gt;
&lt;br /&gt;
* Cory Lown, NCSU Libraries, cwlown@ncsu.edu&lt;br /&gt;
&lt;br /&gt;
Previous Code4Lib Presentations:&lt;br /&gt;
* [http://code4lib.org/conference/2012/lown How People Search the Library from a Single Search Box]  2012&lt;br /&gt;
* [http://code4lib.org/conference/2010/orphanides_lown_lynema Enhancing Discoverability with Virtual Shelf Browse] 2010&lt;br /&gt;
&lt;br /&gt;
Built into the walls of NC State's new Hunt Library are several [http://www.christiedigital.com/en-us/digital-signage/products/microtiles/pages/microtiles-digital-signage-video-wall.aspx Christie MicroTile Display Wall Systems]. What does a library do with a display that's seven feet tall and over twenty feet wide? I'll talk about why libraries might want large displays like this, what we're doing with them right now, and what we might do with them in the future. I'll talk about how these displays factor into planning for new and existing web projects. And I'll get into the fun details of how you build web applications that scale from the very small browser window on a phone all the way up to a browser window with about 14 million pixels (about 10 million more than a dual 24&amp;quot; monitor desktop setup).&lt;br /&gt;
&lt;br /&gt;
== Discovering your Discovery System in Real Time. ==&lt;br /&gt;
&lt;br /&gt;
* Godmar Back, Virginia Tech, gback@vt.edu&lt;br /&gt;
* Annette Bailey, Virginia Tech, afbailey@vt.edu&lt;br /&gt;
&lt;br /&gt;
Practically all libraries today provide web-based discovery systems to their users;&lt;br /&gt;
users discover items and peruse or check them out by clicking on links.  Unlike&lt;br /&gt;
the traditional transaction of checking out a book at the circulation desk, this&lt;br /&gt;
interaction is largely invisible.  We have built a system that records users'&lt;br /&gt;
interactions with Summon in real time, processes the resulting data with minimal delay,&lt;br /&gt;
and visualizes it in various ways using Google Charts and using various d3.js modules,&lt;br /&gt;
such as word clouds, tree maps, and others.&lt;br /&gt;
&lt;br /&gt;
These visualizations can be embedded in web sites, but are also suitable for&lt;br /&gt;
projection via large-scale displays or projectors right into the 'Learning Spaces'&lt;br /&gt;
many libraries are being converted into.  The goal of this talk is to share the technology&lt;br /&gt;
and advocate the building of a cloud-based infrastructure that would make this&lt;br /&gt;
technology available to any library that uses a discovery system, rather than just&lt;br /&gt;
those who have the technological prowess for developing such systems and&lt;br /&gt;
visualizations in-house.  &lt;br /&gt;
&lt;br /&gt;
Previous presentations at Code4Lib:&lt;br /&gt;
* Talk: Code4Lib 2009 [http://code4lib.org/files/LibX2.0-Code4Lib-2009AsPresented.ppt LibX 2.0]&lt;br /&gt;
* Preconference: [http://wiki.code4lib.org/index.php/LibX_Preconference LibX 2.0, 2009]&lt;br /&gt;
* Preconference: Code4Lib 2010, On Widgets and Web Services&lt;br /&gt;
&lt;br /&gt;
== Your Library, Anywhere: A Modern, Responsive Library Catalogue at University of Toronto Libraries ==&lt;br /&gt;
&lt;br /&gt;
* Bilal Khalid, Gordon Belray, Lisa Gayhart (lisa.gayhart@utoronto.ca)&lt;br /&gt;
&lt;br /&gt;
* No previous Code4Lib presentations&lt;br /&gt;
&lt;br /&gt;
With the recent surge in the mobile device market and an ever expanding patron base with increasingly divergent levels of technical ability, the University of Toronto Libraries embarked on the development of a new catalogue discovery layer to fit the needs of its diverse users. &lt;br /&gt;
&lt;br /&gt;
[http://search.library.utoronto.ca The result]: a mobile-friendly, flexible and intuitive web application that brings the full power of a faceted library catalogue to users without compromising quality or performance, employing Responsive Web Design principles. This talk will discuss: application development; service improvements; interface design; and user outreach, testing, and project communications. Feedback and questions from the audience are very welcome. If time runs short, we will be available for questions and conversation after the presentation.&lt;br /&gt;
&lt;br /&gt;
Note: A version of this content has been provisionally accepted as an article for Code4Lib Journal, January 2014 publication.&lt;br /&gt;
&lt;br /&gt;
== All Tiled Up ==&lt;br /&gt;
&lt;br /&gt;
* Mike Graves, MIT Libraries (mgraves@mit.edu)&lt;br /&gt;
&lt;br /&gt;
You've got maps. You even scanned and georeferenced them. Now what? Running a full GIS stack can be expensive, and overkill in some cases. The good news is that you have a lot more options now than you did just a few years ago. I'd like to present some lighter weight solutions to making georeferenced images available on the Web.&lt;br /&gt;
&lt;br /&gt;
This talk will provide an introduction to MBTiles. I'll go over what they are, how you create them, how you use them and why you would use them.&lt;br /&gt;
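For context, an MBTiles file is just a single SQLite database with a "metadata" table and a "tiles" table keyed by zoom level, column, and row, per the MBTiles specification. A minimal sketch of the structure (the tile bytes here are fake placeholders):&lt;br /&gt;

```python
# Minimal sketch of what an MBTiles file is under the hood: one SQLite
# database holding a metadata table and a tiles table, per the MBTiles
# spec. An in-memory database stands in for a real .mbtiles file.

import sqlite3

conn = sqlite3.connect(":memory:")  # a real file would end in .mbtiles
conn.executescript("""
    CREATE TABLE metadata (name TEXT, value TEXT);
    CREATE TABLE tiles (zoom_level INTEGER, tile_column INTEGER,
                        tile_row INTEGER, tile_data BLOB);
""")
conn.execute("INSERT INTO metadata VALUES ('name', 'scanned-map')")
conn.execute("INSERT INTO metadata VALUES ('format', 'png')")
conn.execute("INSERT INTO tiles VALUES (0, 0, 0, ?)", (b"\x89PNG...",))

# A lightweight tile server answering /0/0/0.png just runs this lookup:
row = conn.execute(
    "SELECT tile_data FROM tiles WHERE zoom_level=? AND tile_column=? "
    "AND tile_row=?", (0, 0, 0)
).fetchone()
```

This is why MBTiles works without a full GIS stack: serving tiles is a single indexed SQLite lookup per request.&lt;br /&gt;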
&lt;br /&gt;
== The Great War: Image Interoperability to Facebook ==&lt;br /&gt;
&lt;br /&gt;
* Rob Sanderson, Los Alamos National Laboratory (azaroth42@gmail.com)&lt;br /&gt;
** (Code4Lib 2006: [http://www.code4lib.org/2006/sanderson Library Text Mining])&lt;br /&gt;
* Rob Warren, Carleton University&lt;br /&gt;
** No previous presentations&lt;br /&gt;
&lt;br /&gt;
Using a pipeline constructed from Linked Open Data and other interoperability specifications, it is possible to merge and re-use image and textual data from distributed library collections to build new, useful tools and applications.  Starting with the OAI-PMH interface to CONTENTdm, we will take you on a tour through the International Image Interoperability Framework and Shared Canvas, to a cross-institutional viewer, and on to image analysis for the purpose of building a historical Facebook by finding and tagging people in photographs.  The World War One collections are drawn from multiple institutions and merged by the machine learning code.&lt;br /&gt;
&lt;br /&gt;
The presentation will focus on the (open source) toolchain and the benefits of the use of standards throughout:  OAI-PMH to get the metadata, IIIF for interaction with the images, the Shared Canvas ontology for describing collections of digitized objects, Open Annotation for tagging things in the images and specialized ontologies that are specific to the contents.  The tools include standard RDF / OWL technologies, JSON-LD, imagemagick and OpenCV for image analysis.&lt;br /&gt;
&lt;br /&gt;
== Visualizing Solr Search Results with D3.js for User-Friendly Navigation of Large Results Sets ==&lt;br /&gt;
&lt;br /&gt;
*Julia Bauder, Grinnell College Libraries (bauderj-at-grinnell-dot-edu)&lt;br /&gt;
*No previous presentations at national Code4Lib conferences&lt;br /&gt;
&lt;br /&gt;
As the corpus of articles, books, and other resources searched by discovery systems continues to get bigger, searchers are more and more frequently confronted with unmanageably large numbers of results. How can we help users make sense of 10,000 hits and find the ones they actually want? Facets help, but making sense of a gigantic sidebar of facets is not an easy task for users, either.&lt;br /&gt;
During this talk, I will explain how we will soon be using Solr 4’s pivot queries and hierarchical visualizations (e.g., treemaps) from D3.js to let patrons view and manipulate search results. We will be doing this with our VuFind 2.0 catalog, but this technique will work with any system running Solr 4. I will also talk about early student reaction to our tests of these visualization features.&lt;br /&gt;
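The glue between the two pieces is small: Solr 4's pivot-facet response is already a tree, so reshaping it into the nested name/children/size objects that d3.js hierarchy layouts (such as treemap) expect is a short recursion. A sketch with invented sample data, not the actual catalog fields:&lt;br /&gt;

```python
# Convert a Solr 4 pivot-facet node (as returned for facet.pivot=
# format,language) into the nested dict shape that d3.js hierarchy
# layouts consume. Field names and counts are invented sample data.

def pivot_to_d3(pivot):
    """Turn one Solr pivot node into a d3 hierarchy node."""
    node = {"name": pivot["value"]}
    children = pivot.get("pivot")
    if children:
        node["children"] = [pivot_to_d3(c) for c in children]
    else:
        node["size"] = pivot["count"]  # leaf weight drives treemap area
    return node

# Invented sample pivot node: one format, faceted by language.
sample = {
    "value": "Book",
    "count": 9000,
    "pivot": [
        {"value": "English", "count": 7000},
        {"value": "German", "count": 2000},
    ],
}
tree = pivot_to_d3(sample)
# tree is ready to hand to a d3 treemap or similar layout.
```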
&lt;br /&gt;
== PeerLibrary – an open source, cloud-based collaborative library ==&lt;br /&gt;
&lt;br /&gt;
[https://github.com/peerlibrary/peerlibrary PeerLibrary is a new open source project] and a cloud service providing collaborative reading, sharing and storing. Users can upload publications they want to read (currently in PDF format), read them in the browser in real time with others, and highlight, annotate and organize their own or a collaborative library. PeerLibrary provides a search engine over all uploaded open access publications. Additionally, it aims to collaboratively aggregate an open layer of knowledge on top of these publications through the public annotations and references users add to them. In this way publications are not just available to read, but accessible to the general public as well. Currently it is aimed at the scientific community and scientific publications.&lt;br /&gt;
&lt;br /&gt;
See [http://blog.peerlibrary.org/post/63458789185/screencast-previewing-the-peerlibrary-project screencast here].&lt;br /&gt;
&lt;br /&gt;
It is still in development; a beta launch is planned for the end of November.&lt;br /&gt;
&lt;br /&gt;
== Who was where when, or finding biographical articles on Wikipedia by place and time ==&lt;br /&gt;
&lt;br /&gt;
* [http://morton-owens.info Emily Morton-Owens], The Seattle Public Library (presenting on work from NYU)&lt;br /&gt;
* No previous c4l presentations&lt;br /&gt;
&lt;br /&gt;
It's easy to answer the question &amp;quot;What important people were in Paris in 1939?&amp;quot; But what about Virginia in the 1750s or Scandinavia in the 14th century? I created a tool that allows you to search for biographies in a generally applicable way, using a map interface. I would like to present updates to my thesis project, which combines a crawler written in Java that extracts information from Wikipedia articles, with a MongoDB data store and a frontend in Python.&lt;br /&gt;
&lt;br /&gt;
The input to the project is the free text of entire articles in Wikipedia; this is important to allow us to pick up Benjamin Franklin not just in the single most obvious place of Philadelphia but also in London, Paris, Boston, etc. I can talk about my experiments disambiguating place names (approaches pioneered on newspaper articles were actually unhelpful on this type of text) and setting up a processing queue that does not become mired in the biographies of every human who ever played soccer. I also want to revisit some of the implementation choices I made due to my academic deadline and improve the tool's accuracy and usability.&lt;br /&gt;
&lt;br /&gt;
What I hope to show is that I was able to develop a novel and useful reference tool automatically, using fairly simple heuristics that are a far cry from the hand-cataloging familiar to many librarians.&lt;br /&gt;
&lt;br /&gt;
You can try out [http://linserv1.cims.nyu.edu:48866/ the original version] (this server is inconveniently set to be updated/rebooted on 11/8--may be temporarily unavailable)&lt;br /&gt;
&lt;br /&gt;
== Good!, DRY, and Dynamic: Content Strategy for Libraries (Especially the Big Ones) ==&lt;br /&gt;
&lt;br /&gt;
*Michael Schofield, Nova Southeastern University Libraries, mschofield@nova.edu&lt;br /&gt;
*No previous code4lib presentations.&lt;br /&gt;
&lt;br /&gt;
The responsibilities of the #libweb are exploding [it’s a good thing] and it is no longer uncommon for libraries to manage or even home-grow multiple applications and sites. Often it is at this point where the web people begin to suffer the absence of a content strategy when, say, business hours need to be updated sitewide a half-dozen times.&lt;br /&gt;
&lt;br /&gt;
We were already feeling this crunch when we decided to further complicate the Nova Southeastern University Libraries by splitting the main library website into two. The Alvin Sherman Library, Research, and Information Technology Center is a unique joint-use facility that serves not only the academic community but the public of Broward County - and marketing a hyperblend of content through one portal just wasn't cutting it. With a web team of two, we knew that managing all this rehashed, disparate content was totally unsustainable.&lt;br /&gt;
&lt;br /&gt;
I want to share in this talk how I went about making our library content DRY (“don’t repeat yourself”): input content in one place--blurbs, policies, featured events, featured databases, book reviews, business hours, and so on--and syndicate it everywhere; even, sometimes, dynamically targeting that content for specific audiences or contexts. It is a presentation that is a little about workflow, a little more about browser and context detection, a tangent about content-modeling the CMS, and a lot about APIs, syndication, and performance.&lt;br /&gt;
&lt;br /&gt;
== No code, no root, no problem? Adventures in SaaS and library discovery ==&lt;br /&gt;
&lt;br /&gt;
*[mailto:erwhite@vcu.edu Erin White, VCU]&lt;br /&gt;
*No previous C4L presentations&lt;br /&gt;
&lt;br /&gt;
In 2012 VCU was an eager early adopter of Ex Libris' cloud service Alma as an ILS, ERM, link resolver, and single-stop, de-silo'd public-facing discovery tool. This has been a disruptive change that has shifted our systems staff's day-to-day work, relationships with others in the library, and relationships with vendors.&lt;br /&gt;
&lt;br /&gt;
I'll share some of our experiences and takeaways from implementing and maintaining a cloud service:&lt;br /&gt;
* Seeking disruption and finding it&lt;br /&gt;
* Changing expectations of service and the reality of unplanned downtime&lt;br /&gt;
* Communication and problem resolution with non-IT library staff&lt;br /&gt;
* Working with a vendor that uses agile development methodology&lt;br /&gt;
* Benefits and pitfalls of creating customizations and code workarounds&lt;br /&gt;
* Changes in library IT/coders' roles with SaaS&lt;br /&gt;
&lt;br /&gt;
...as well as thoughts on the philosophy of library discovery vs real-life experiences in moving to a single-search model.&lt;br /&gt;
&lt;br /&gt;
== Building for others (and ourselves):  the Avalon Media System ==&lt;br /&gt;
* [mailto:michael.klein@northwestern.edu Michael B Klein], Senior Software Developer, Northwestern University &lt;br /&gt;
** [http://code4lib.org/conference/2010/metz_klein Public Datasets in the Cloud] (code4lib 2010)&lt;br /&gt;
** [http://code4lib.org/conference/2013/klein-rogers The Avalon Media System: A Next Generation Hydra Head For Audio and Video Delivery] (code4lib 2013)&lt;br /&gt;
* [mailto:j-rudder@northwestern.edu Julie Rudder], Digital Initiatives Project Manager, Northwestern University&lt;br /&gt;
** no previous code4lib presentations&lt;br /&gt;
&lt;br /&gt;
[http://www.avalonmediasystem.org/ Avalon Media System] is a collaborative effort between development teams at Northwestern and Indiana Universities. Our goal is to produce an open source media management platform that works well for us, but is also widely adopted and contributed to by other institutions. We believe that building a strong user and contributor community is vital to the success and longevity of the project, and have developed the system with this goal in mind. We will share lessons learned, pains and successes we’ve had releasing two versions of the application since last year.  &lt;br /&gt;
&lt;br /&gt;
Our presentation will cover our experiences:&lt;br /&gt;
* providing flexible, admin-friendly distribution and installation options&lt;br /&gt;
* building with abstraction, customization and local integrations in mind&lt;br /&gt;
* prioritizing features (user stories)&lt;br /&gt;
* attracting code contributions from other institutions&lt;br /&gt;
* gathering community feedback &lt;br /&gt;
* creating a product rather than a bag of parts&lt;br /&gt;
&lt;br /&gt;
== How to check your data to provide a great data product? Data quality as a key product feature at Europeana ==&lt;br /&gt;
&lt;br /&gt;
*[mailto:Peter.Kiraly@kb.nl Péter Király] portal backend developer, Europeana&lt;br /&gt;
*No previous C4L presentations&lt;br /&gt;
&lt;br /&gt;
[http://Europeana.eu/ Europeana.eu] - Europe's digital library, archive and museum - aggregates more than 30 million metadata records from more than 2200 institutions.  The records come from libraries, archives, museums and every other kind of cultural institution, from very different systems and metadata schemas, and are typically transformed several times until they are ingested into the Europeana data repository.  Europeana builds a consolidated database from these records, creating reliable and consistent services for end-users (a search portal, search widget, mobile apps, thematic sites etc.) and an API, which supports our strategic goal of data for reuse in education, creative industries, and the cultural sector.  A reliable &amp;quot;data product&amp;quot; is thus at the core of our own software products, as well as those of our API partners.&lt;br /&gt;
&lt;br /&gt;
Much effort is needed to smooth out local differences in the metadata curation practice of our data providers. We need a solid framework to measure the consistency of our data and provide feedback to decision-makers inside and outside the organisation. We can also use this metrics framework to ask content providers to improve their own metadata. Of course, a data-quality-driven approach requires that we also improve the data transformation steps of the Europeana ingestion process itself. Data quality issues heavily define what new features we are able to create in our user interfaces and API, and might actually affect the design and implementation of our underlying data structure, the Europeana Data Model.&lt;br /&gt;
&lt;br /&gt;
In the presentation I will briefly describe the Europeana metadata ingestion process, show the data quality metrics and measuring techniques (using the Europeana API, Solr and MongoDB queries), discuss some typical problems (both trivial and difficult ones), and finally present the feedback mechanism we propose to deploy.&lt;br /&gt;
&lt;br /&gt;
Keywords: Europeana, data quality, EDM, API, Apache Solr, MongoDB, #opendata, #openglam&lt;br /&gt;
&lt;br /&gt;
== Teach your Fedora to Fly: scaling out your digital repository ==&lt;br /&gt;
&lt;br /&gt;
*[mailto:acoburn@amherst.edu Aaron Coburn], Software Developer, Amherst College&lt;br /&gt;
*No previous C4L presentations&lt;br /&gt;
&lt;br /&gt;
Fedora is a great repository system for managing large collections of digital objects, but what happens when a popular food magazine begins directing a large number of readers to a manuscript showing Emily Dickinson’s own recipe for doughnuts? While Fedora excels in its support of XML-based metadata, it doesn’t always perform well under a high volume of traffic. Nor is it especially tolerant of network or hardware failures.&lt;br /&gt;
&lt;br /&gt;
This presentation will show how we are making heavy use of a Fedora repository while at the same time insulating it almost entirely from any web traffic. Starting with a distributed web front-end built with Node.js, and caching most of the user-accessible content from Fedora in an elastic, fault-tolerant Riak (NoSQL) cluster, we have eliminated nearly all single points of failure in the system. It also means that our production system is spread across twelve separate servers, where asynchrony and Map-Reduce are king. And aside from being blazing fast, it is also entirely Hydra-compliant.&lt;br /&gt;
&lt;br /&gt;
Furthermore, we will attempt to answer the question: if Fedora crashes and the visitors to your site don’t notice, did it really fail?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Using Open Source Software and Freeware to Preserve and Deliver Digital Videos ==&lt;br /&gt;
* [mailto:wfang@kinoy.rutgers.edu Wei Fang], Head of Digital Services, Rutgers University Law Library&lt;br /&gt;
* Jiebei Luo, Digital Projects Initiative Intern, Rutgers University&lt;br /&gt;
*No previous C4L presentations&lt;br /&gt;
&lt;br /&gt;
The Rutgers University Law Library is the official digital repository of the New Jersey Supreme Court oral arguments since 2002. This large video collection contains approximately 3,000 videos with a total of 400 GB or 6,000 viewing hours. With the expansion of this collection, the existing database and the static website could not efficiently support the library’s daily operations and meet its patrons’ search needs. &lt;br /&gt;
By utilizing open source software and freeware such as Ubuntu, FFmpeg, Solr and Drupal, the library developed a complete solution for re-encoding videos, embedding subtitles, incorporating the Solr search engine and a content management system to support full-text subtitle search, automatically updating video metadata records in the library catalog system, and ultimately providing a plug-in-free, HTML5-based Web interface for patrons to view the videos online.&lt;br /&gt;
The aspects below will be presented in detail at the conference:&lt;br /&gt;
* Video codecs comparison&lt;br /&gt;
* Server-end batch video encoding/re-encoding&lt;br /&gt;
* HTML5 video tag and embedding subtitles&lt;br /&gt;
* Incorporating the Solr search engine and the Drupal content management tool with the database to retrieve videos by full-text search, especially in subtitle files&lt;br /&gt;
* Incorporating video metadata with the library catalog system&lt;br /&gt;
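The server-side batch re-encoding step above might look something like this (directory layout and codec flags are illustrative assumptions, not the library's exact pipeline):

```shell
# Batch re-encode masters to H.264/AAC MP4 for plug-in-free HTML5
# playback. Paths and settings here are hypothetical examples.
src_dir=/srv/video/masters
out_dir=/srv/video/web
for f in "$src_dir"/*.wmv; do
  [ -e "$f" ] || continue          # skip if the glob matched nothing
  base=$(basename "$f" .wmv)
  ffmpeg -i "$f" -c:v libx264 -crf 23 -c:a aac \
    -movflags +faststart "$out_dir/$base.mp4"
done
```

The `-movflags +faststart` option moves the MP4 index to the front of the file so playback can begin before the download completes.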
&lt;br /&gt;
== Shared Vision, Shared Resources: the Curate Institutional Repository ==&lt;br /&gt;
* Dan Brubaker Horst, University of Notre Dame &lt;br /&gt;
** [http://code4lib.org/conference/2011/JohnsonHorst A Community-Based Approach to Developing a Digital Exhibit at Notre Dame Using the Hydra Framework] &lt;br /&gt;
* Julie Rudder, Northwestern University&lt;br /&gt;
** no previous presentations&lt;br /&gt;
&lt;br /&gt;
Curate is being collaboratively developed by several institutions in the Hydra community who share the need and vision for a Fedora-backed Institutional Repository. The first release of Curate was a collaboration between Notre Dame and Northwestern University, along with Digital Curation Experts (DCE), a vendor hired jointly by our two institutions. Building on the Hydra engine Sufia, the team worked quickly to release the first version of Curate in October 2013; it provides a basic self-deposit system with support for various content types, collection building, DOI minting, and user profile creation. From the very beginning we have built Curate to be easy to theme and extend in order to ease the process of installation and use by other institutions.&lt;br /&gt;
&lt;br /&gt;
In December 2013, additional partners will join the project, including Indiana University, the University of Cincinnati, and the University of Virginia. Each institution contributes resources to the project in order to further our common goal: to create a product that fits our needs and has a sustainable future. Together we will tackle additional content types (like complex data, software, and media), administrative collections, and more.&lt;br /&gt;
&lt;br /&gt;
Our presentation will include:&lt;br /&gt;
* a brief demonstration of Curate and technical overview&lt;br /&gt;
* why and how we work together&lt;br /&gt;
* why build Curate&lt;br /&gt;
* the future of the project&lt;br /&gt;
&lt;br /&gt;
== Solr, Cloud and Blacklight ==&lt;br /&gt;
* David Jiao, Library Information Systems, Indiana University at Bloomington, djiao@indiana.edu&lt;br /&gt;
** No previous code4lib presentations&lt;br /&gt;
&lt;br /&gt;
SolrCloud refers to the distributed capabilities in Solr4. It is designed to offer a highly available, fault-tolerant environment by organizing data into multiple shards that can be hosted on multiple machines with replicas, and by providing centralized cluster configuration and management.&lt;br /&gt;
&lt;br /&gt;
At Indiana University, we are upgrading the Solr backend of our recently released Blacklight-based OPAC from Solr 1.4 to Solr4, and we have also built a private cloud of Solr4 servers. In this talk, I will present several features of SolrCloud, including distributed requests, fault tolerance, near-real-time indexing/searching, and configuration management with ZooKeeper, along with our experiences using these features to provide better performance and architecture for our OPAC, which serves over 7 million bibliographic records to over 100 thousand students and faculty members. I will also discuss some practical lessons learned from our SolrCloud setup/upgrade and the integration of the new SolrCloud with our customized Blacklight system.&lt;br /&gt;
&lt;br /&gt;
== Leveraging XSDs for Reflective, Live Dataset Support in Institutional Repositories ==&lt;br /&gt;
* [mailto:msulliva@ufl.edu Mark Sullivan], Library Information Technology, University of Florida&lt;br /&gt;
** No previous code4lib presentations&lt;br /&gt;
&lt;br /&gt;
The University of Florida Libraries are currently adding support for active datasets into our METS-based institutional repository software.  This ongoing project enables the library to be a partner in current, or long-running, data-driven projects around the university by providing tangible short-term and long-term benefits to the projects.  The system assists project teams by storing and providing access to their data, while supporting online filtering and sorting of the data, custom queries, and adding and editing of the data by authorized users.  We are also exploring simple data visualizations to allow users to perform basic graphical and geographic queries.  Several different schemas were explored, including DDI and EML, but ultimately the streamlined approach of using XSDs with some custom attributes was chosen, with all other data residing in the METS file portions.  Currently the system is being developed using XSDs describing XML datasets, but this model should easily scale to support SQL datasets or large datasets supported by Hadoop or iRODS.&lt;br /&gt;
&lt;br /&gt;
This work is being integrated in the open source [http://sobek.ufl.edu SobekCM Digital Content Management System] which is built on a pair-tree structure of METS resources with [http://ufdc.ufl.edu/design/webcontent/sobekcm/SobekCM_Resource_Object.pdf rich metadata support] including DC, MODS, MARC, VRACore, DarwinCore, IEEE-LOM, GML/KML, schema.org microdata, and many other standard schemas.  The system has emphasized online, distributed creation and maintenance of resources including geo-placement and geographic searching of resources, building structure maps (table of contents) visually online, and a broad suite of curator tools.&lt;br /&gt;
&lt;br /&gt;
This work is presented as a model which could be implemented in other systems as well.  We will demonstrate current support and discuss our upcoming roadmap to provide complete support.&lt;br /&gt;
&lt;br /&gt;
== Dead-simple Video Content Management: Let Your Filesystem Do The Work ==&lt;br /&gt;
&lt;br /&gt;
* Andreas Orphanides, NCSU Libraries (akorphan (at) ncsu.edu)&lt;br /&gt;
** (never led or soloed a C4L presentation)&lt;br /&gt;
&lt;br /&gt;
Content management is hard. To keep all the moving parts in order, and to maintain a layer of separation between the system and content creators (who are frequently not technical experts), we typically turn to content management systems like Drupal. But even Drupal and its kin require significant overhead and present a not inconsiderable learning curve for nontechnical users.&lt;br /&gt;
&lt;br /&gt;
In some contexts it's possible -- and desirable -- to manage content in a more streamlined, lightweight way, with a minimum of fuss and technical infrastructure. In this presentation I'll share a simple MVC-like architecture for managing video content for playback on the web, which uses a combination of Apache's mod_rewrite module and your server's filesystem structure to provide an automated approach to video content management that's easy to implement and provides a low barrier to content updates: friendly to content creators and technology implementors alike. Even better, the basic method is HTML5-friendly, and can be integrated into your favorite content management system if you've got permissions for creating templates.&lt;br /&gt;
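As a sketch of the idea (the URL pattern, script name, and query parameter are hypothetical, not the NCSU implementation), the mod_rewrite piece might look like:

```apache
RewriteEngine On
# Map a friendly URL like /videos/orientation/ to a single player
# script, passing the directory name; the script then lists that
# directory on the filesystem to find the video and caption files.
RewriteRule ^videos/([A-Za-z0-9_-]+)/?$ /player.php?video=$1 [L,QSA]
```

Content creators then publish or update a video simply by dropping files into the right directory, with no database or admin interface involved.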
&lt;br /&gt;
In the presentation I'll go into detail about the system structure and logic required to implement this approach. I'll detail the benefits and limitations of the system, as well as the challenges I encountered in developing its implementation. Audience members should come away with sufficient background to implement a similar system on their own servers. Implementation documentation and genericized code will also be shared, as available.&lt;br /&gt;
&lt;br /&gt;
== Managing Discovery ==&lt;br /&gt;
&lt;br /&gt;
* Andrew Pasterfield, Senior Programmer/Systems Analyst, University of Calgary Library, ampaster@ucalgary.ca&lt;br /&gt;
**No previous code4lib presentations&lt;br /&gt;
In fall 2012 the University of Calgary Library launched a new home page that incorporated a Summon-powered single search box with a customized “bento box” results display. Search at the U of C now combines a range of metadata sources for discovery, along with customized mapping of a database recommender and LibGuide, into a unified display. Further customizations include a non-Google Analytics, non-proxy method for logging clicks.&lt;br /&gt;
&lt;br /&gt;
This presentation will discuss the technical details of bringing the various systems together into one display interface to increase discovery at the U of C Library.&lt;br /&gt;
&lt;br /&gt;
http://library.ucalgary.ca&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Sorting it out: a piece of the User Centered Design Process ==&lt;br /&gt;
&lt;br /&gt;
* Cindy Beggs, [http://www.akendi.com/aboutus/management/ Akendi], cindy@akendi.com&lt;br /&gt;
&lt;br /&gt;
This talk is about how to apply a user-centered design methodology to the process of creating an information architecture.  Participants will learn the fundamentals of UCD and how card sorting and reverse card sorting enable us to separate the content we present on screen from the layouts and visuals of those screens.  We will talk about ways to identify who will be using the information architecture you are creating and why we need to know how it will be used.&lt;br /&gt;
 &lt;br /&gt;
What will attendees take away from this talk?&lt;br /&gt;
The criticality of involving “real” end users in the process of creating an information architecture, and the basics of following a user-centered design process in the creation of best-in-class, content-rich digital products.&lt;br /&gt;
&lt;br /&gt;
Cindy Beggs has been working in the “information industry” for over 25 years.  A librarian by profession, she has spent decades helping users figure out how to find their way through large bodies of content.  Her insights into how people seek information, her empathy for those who find it a challenge and her practical experience helping organizations figure out how to best structure their content contribute to her success as an information architect with both clients and trainees.  (http://www.akendi.com/aboutus/management/)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Implementation of ArchivesSpace at the University of Richmond==&lt;br /&gt;
&lt;br /&gt;
*Birong Ho, bho@richmond.edu&lt;br /&gt;
&lt;br /&gt;
The University of Richmond implemented ArchivesSpace as its archival collection management system in the fall of 2013. As a charter member, with the Head of Special Collections serving on the Board, implementing this open source software became a priority.&lt;br /&gt;
&lt;br /&gt;
Several aspects of the implementation will be addressed in the talk, among them collections and repositories, the storage layer (including data formats), system resource requirements, technical architecture, customization, scaling, and integration with other systems in the library.&lt;br /&gt;
&lt;br /&gt;
Customization, scaling, and integration with other campus systems such as Archeon and Exist became particular concerns, and these will be the focus of the talk.&lt;br /&gt;
&lt;br /&gt;
==Easy Wins for Modern Web Technologies in Libraries==&lt;br /&gt;
&lt;br /&gt;
*[mailto:trey.terrell@oregonstate.edu Trey Terrell], Analyst Programmer, Oregon State University&lt;br /&gt;
** No previous Code4Lib presentations &lt;br /&gt;
&lt;br /&gt;
Oregon State University is currently implementing an updated version of its room reservation system. In its development we've come across and implemented a variety of &amp;quot;easy wins&amp;quot; to make it more responsive, easier to maintain, less expensive to run, and just cooler to experience. While our particular system was in Ruby on Rails, this talk will address general methods and example utilities which can be used no matter your stack.&lt;br /&gt;
&lt;br /&gt;
I'll be talking about things like cache management, reverse proxies, publish/subscribe servers, WebSockets, responsive design, asynchronous processing, and keeping complicated stacks up and running with minimal effort.&lt;br /&gt;
&lt;br /&gt;
==Implementing Islandora at a Small Institution==&lt;br /&gt;
&lt;br /&gt;
*Megan Kudzia, Albion College Library&lt;br /&gt;
*Eddie Bachle, Albion College IT&lt;br /&gt;
**No previous Code4Lib presentations&lt;br /&gt;
&lt;br /&gt;
Albion College (and particularly the Library/Archives and Special Collections) has a variety of needs which could be met by an open-source Institutional Repository system. Several months and lots of conversations later, we’re continuing to troubleshoot our way through Islandora. We’d like to talk about what has worked for us, where our frustrations have been, whether it’s even possible to install and develop a system like this at a small institution, and where the process has stalled. &lt;br /&gt;
&lt;br /&gt;
As of right now, we do have a semi-working installation. We’re not sure when it will be ready for our end users, but we'll talk about our development process and evaluate our progress.&lt;br /&gt;
''Contributions also by Nicole Smeltekop, Albion College Archives &amp;amp; Special Collections''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== PhantomJS+Selenium: Easy Automated Testing of AJAX-y UIs ==&lt;br /&gt;
&lt;br /&gt;
* Martin Haye, California Digital Library, martin.haye@ucop.edu&lt;br /&gt;
** Previous Code4Lib Presentation: [http://code4lib.org/conference/2012/collett Beyond code: Versioning data with Git and Mercurial] at Code4Lib 2012 (Martin co-presenting with Stephanie Collett)&lt;br /&gt;
* Mark Redar, California Digital Library, mark.redar@ucop.edu&lt;br /&gt;
&lt;br /&gt;
Web user interfaces demand ever more dynamism and polish, combining HTML5, AJAX, lots of CSS and jQuery (or its ilk) to create autocomplete drop-downs, intelligent buttons, stylish alert dialogs, etc. How can you write automated tests for these highly complex and interactive UIs?&lt;br /&gt;
&lt;br /&gt;
Part of the answer is PhantomJS. It’s a modern WebKit browser that’s “headless” (meaning it has no display) and can be driven from command-line Selenium unit tests. PhantomJS is dead simple to install, and its blazing speed and server-friendliness make continuous integration testing easy. You can write UI unit tests in {language-of-your-choice} and run them not just in PhantomJS but in Firefox and Chrome, plus a zillion browser/OS combinations at places like SauceLabs, TestingBot and BrowserStack.&lt;br /&gt;
&lt;br /&gt;
In this double-team live code talk, we’ll explain all that while we demonstrate the following in real time:&lt;br /&gt;
&lt;br /&gt;
* Start with nothing.&lt;br /&gt;
* Install Selenium bindings for Ruby and Python.&lt;br /&gt;
* In each language write a small test of an AJAX-y UI.&lt;br /&gt;
* Run the tests in Firefox, and fix bugs (in the test or UI) as needed.&lt;br /&gt;
* Install PhantomJS.&lt;br /&gt;
* Show the same tests running headless as part of a server-friendly test suite. &lt;br /&gt;
* (Wifi permitting) Show the same tests running on a couple different browser/OS combinations on the server cloud at SauceLabs – talking through a tunnel to the local firewalled application.&lt;br /&gt;
&lt;br /&gt;
==New Technologies, Collaboration, &amp;amp; Entrepreneurship in Libraries:  Harnessing Their Power to Help Your Library==&lt;br /&gt;
&lt;br /&gt;
* Stephanie Walker – swalker@brooklyn.cuny.edu&lt;br /&gt;
* Howard Spivak – howards@brooklyn.cuny.edu&lt;br /&gt;
* Alex - Alex@brooklyn.cuny.edu&lt;br /&gt;
&lt;br /&gt;
Academic libraries are caught in budget squeezes and often struggle to find ways to communicate value to senior administration and others.  At Brooklyn College Library, we have taken an unusual, possibly unique, approach to these issues.  Our technology staff have long worked directly with librarians to develop products that meet library, faculty, and student needs, and we have shared many of our products with colleagues, including an award-winning website, e-resource, and content management system we call 4MyLibrary, which we shared for free with 8 CUNY colleges, and also an easy-to-use book scanner, which has proven overwhelmingly popular with students, faculty, other librarians, and numerous campus offices.  Recently, motivated by budget cuts, we decided that what worked for us might interest other libraries, and working with our Office of Technology Commercialization, we started selling 2 products:  our book scanners (at half the price of commercial alternatives), and a hosting service, whereby we could host and support 4MyLibrary for libraries with minimal technology staff.  Both succeeded, and yielded major benefits:  a steady revenue stream and the admiration and serious goodwill of our senior administration and others.   However, this presentation is neither a basic how-to, nor an advertisement.  With this presentation, we hope to spur a conversation about broader collaboration, especially regarding new technologies, among libraries.  We all have some level of technical expertise, most of us are struggling with rising prices and tight budgets, and many of us are unhappy with various technology products we use, from scanners to our ILS.  We believe – and can demonstrate – that with collaboration, we can solve many of our problems, and provide better services to boot.&lt;br /&gt;
&lt;br /&gt;
== Identifiers, Data, and Norse Gods ==&lt;br /&gt;
&lt;br /&gt;
* Ryan Scherle, Dryad Digital Repository, ryan@datadryad.org&lt;br /&gt;
&lt;br /&gt;
ORCID and DataCite provide stable identifiers for researchers and data, respectively. Each system does a fine job of providing value to its users. But wouldn't it be great if they could link their systems to create something much more powerful? Perhaps even as powerful as a god?&lt;br /&gt;
&lt;br /&gt;
Enter ODIN, The ORCID and DataCite Interoperability Network. ODIN is a two-year project to unleash the power of persistent identifiers for researchers and the research they create. This talk will present recent work from the ODIN project, including several tools that can be used to unleash the godlike power of identifiers at your institution.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[:Category:Code4Lib2014]]&lt;/div&gt;</summary>
		<author><name>Ryscher</name></author>	</entry>

	<entry>
		<id>https://wiki.code4lib.org/index.php?title=HAMR:_Human/Authority_Metadata_Reconciliation&amp;diff=11551</id>
		<title>HAMR: Human/Authority Metadata Reconciliation</title>
		<link rel="alternate" type="text/html" href="https://wiki.code4lib.org/index.php?title=HAMR:_Human/Authority_Metadata_Reconciliation&amp;diff=11551"/>
				<updated>2012-03-09T20:29:10Z</updated>
		
		<summary type="html">&lt;p&gt;Ryscher: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[HAMR: Human/Authority Metadata Reconciliation]]&lt;br /&gt;
&lt;br /&gt;
Initial design/prototype by: Sean Chen, Tim Donohue, Joshua Gomez, Ranti Junus, Ryan Scherle&lt;br /&gt;
&lt;br /&gt;
A tool for a curator to determine whether the various fields of a metadata record are correct. Takes a metadata record, locates any identifiers (e.g., DOI, PMID). Retrieves a copy of the metadata record from an authoritative source (e.g., CrossRef, PubMed). Displays a human-readable page that compares fields in the initial record with fields in the authoritative record. Each field is color-coded based on how well it matches, so the curator can quickly identify discrepancies.&lt;br /&gt;
&lt;br /&gt;
== UI Prototype (uses static data) ==&lt;br /&gt;
http://dl.dropbox.com/u/9074989/code4lib/unverified.html&lt;br /&gt;
&lt;br /&gt;
== Basic design ==&lt;br /&gt;
&lt;br /&gt;
Narrowing the focus for an initial usable version:&lt;br /&gt;
* Dublin core (maybe qualified)&lt;br /&gt;
* framework that allows multiple authority sources&lt;br /&gt;
* NOT focusing on author names ([http://www.orcid.org/ ORCID] is already working on this), except the fact that they are strings, and we'll do basic string matching&lt;br /&gt;
* 1 to 1 matching.  Even if you want to eventually match with multiple authorities, you'd only do one at a time&lt;br /&gt;
&lt;br /&gt;
Possible authority sources:&lt;br /&gt;
* PubMed&lt;br /&gt;
** Sample pubmed query (in Java): [https://wiki.duraspace.org/display/DSPACE/PubMedPrefill-PubmedPrefillStep.java DSpace PubMedPrefillStep.java] (From [https://wiki.duraspace.org/display/DSPACE/PopulateMetadataFromPubMed Populate Metadata from PubMed])&lt;br /&gt;
*** See 'retrievePubmedXML()' in above java code for actual call to PubMed&lt;br /&gt;
*** Mapping happens here: See [https://wiki.duraspace.org/display/DSPACE/PubMedPrefill-pmid+dim.xsl pmid-to-dim.xsl] for a sample XSLT crosswalk to translate PubMed format to a qualified dublin core (internal DSpace metadata format)&lt;br /&gt;
** More examples of querying PubMed: http://www.my-whiteboard.com/how-to-automate-pubmed-search-using-perl-php-or-java/&lt;br /&gt;
** Useful tool for finding PubMed IDs: http://www.ncbi.nlm.nih.gov/entrez/getids.cgi&lt;br /&gt;
* CrossRef&lt;br /&gt;
** simply send the DOI to crossref, and get JSON/XML back&lt;br /&gt;
*** http://api.labs.crossref.org/10.1111/j.1558-5646.2009.00626.x.json&lt;br /&gt;
*** http://api.labs.crossref.org/10.2307/1935157.xml&lt;br /&gt;
*** [http://code.google.com/p/dryad/source/browse/trunk/dryad/dspace/modules/doi/dspace-doi-webapp/src/main/java/org/dspace/doi/DOIServlet.java java code that includes a lookup]&lt;br /&gt;
** [http://labs.crossref.org/site/crossref_metadata_search.html Metadata Search] -- send a text query, receive a list of matching records&lt;br /&gt;
** [http://labs.crossref.org/site/quick_and_dirty_api_guide.html OpenURL search]&lt;br /&gt;
* google scholar - does it have an API?&lt;br /&gt;
* [http://www.mendeley.com mendeley] - [http://dev.mendeley.com/ Mendeley API]&lt;br /&gt;
* [http://vivoweb.org/ vivo]&lt;br /&gt;
* [http://bibapp.org/ bibapp]&lt;br /&gt;
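The CrossRef option above amounts to a single HTTP GET; a minimal sketch (helper names are ours, and the URL shapes follow the labs examples listed above):

```python
import json
from urllib.request import urlopen

# Hypothetical helper: the labs API returns a record for DOI + ".json"
# (or ".xml"), per the example URLs above.
def crossref_url(doi, fmt="json"):
    return "http://api.labs.crossref.org/%s.%s" % (doi, fmt)

def lookup(doi):
    # One HTTP GET fetches the authority record for the DOI.
    with urlopen(crossref_url(doi)) as resp:
        return json.load(resp)
```

A plugin for another authority would only need to supply its own URL builder and a crosswalk of the response into simple DC.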
&lt;br /&gt;
Thoughts / Questions:&lt;br /&gt;
* Is there a way to do most/all of this via Javascript/AJAX/JQuery?  Could it be a simple Javascript framework you could &amp;quot;drop&amp;quot; into any metadata editing interface?&lt;br /&gt;
** Unfortunately, it seems this wouldn't work out.  In order to perform querying of external authorities, they'd all need to support [http://en.wikipedia.org/wiki/JSON#JSONP JSONP] or similar (and they don't)&lt;br /&gt;
&lt;br /&gt;
== Code ==&lt;br /&gt;
&lt;br /&gt;
* [http://gitref.org/ quick reference for Git]&lt;br /&gt;
* [https://github.com/ryscher/hamr Ryan's really stupid scratch implementation]&lt;br /&gt;
&lt;br /&gt;
=== Draft Matching Algorithm ===&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
function compareRecords(localDubCore, authDubCore)&lt;br /&gt;
    recordMatches = []&lt;br /&gt;
    for each element-type:&lt;br /&gt;
        loc = array of local values&lt;br /&gt;
        auth = array of authority values&lt;br /&gt;
        // arrays are actually lists of dictionaries&lt;br /&gt;
        // a1&lt;br /&gt;
        // 0    value=&amp;quot;Benson, Arnold&amp;quot;, match=&amp;quot;&amp;quot;, strength=&amp;quot;&amp;quot;&lt;br /&gt;
        // 1    value=&amp;quot;Terrence, D.&amp;quot;, match=&amp;quot;a2[3]&amp;quot;, strength=&amp;quot;100%&amp;quot;&lt;br /&gt;
        elementMatches = compareElements(loc, auth)&lt;br /&gt;
        recordMatches.add(elementMatches)&lt;br /&gt;
        &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
function compareElements(loc, auth)&lt;br /&gt;
    output = []&lt;br /&gt;
    //nested loops run through values and assigns strongest matches to each element&lt;br /&gt;
    for each element in loc&lt;br /&gt;
        for each element in auth&lt;br /&gt;
            strength = string distance between the two elements&lt;br /&gt;
            if strength = 100&lt;br /&gt;
                //if match is perfect go ahead pop each element and add their values to output array&lt;br /&gt;
                //output array is also list of dictionaries&lt;br /&gt;
                //0    loc=&amp;quot;Hector&amp;quot;, auth=&amp;quot;Hector&amp;quot;, strength=&amp;quot;100&amp;quot;&lt;br /&gt;
                //1    loc=&amp;quot;Albert&amp;quot;, auth=&amp;quot;Alberto&amp;quot;, strength=&amp;quot;90&amp;quot;&lt;br /&gt;
           if strength &amp;gt; auth element's current strength value&lt;br /&gt;
               overwrite auth element's strength and match values&lt;br /&gt;
           if strength &amp;gt; loc element's current strength value&lt;br /&gt;
               overwrite loc element's strength and match values&lt;br /&gt;
    //this second set of non-nested loops pull out the strongest matches&lt;br /&gt;
    for each item in auth&lt;br /&gt;
        //x = some arbitrary barrier for a decent enough match&lt;br /&gt;
        if element strength &amp;gt; x AND if matching element is still in the loc list&lt;br /&gt;
            pop each element and add their values to output array&lt;br /&gt;
    for each item in loc&lt;br /&gt;
        if element strength &amp;gt; x AND if matching element is still in the auth list&lt;br /&gt;
            pop each element and add their values to output array&lt;br /&gt;
    //now do cleanup and look for values that have no decent matches&lt;br /&gt;
    for each element in loc&lt;br /&gt;
        pop element and add to output array without match //x   loc=&amp;quot;Heyward&amp;quot;, auth=&amp;quot;&amp;quot;, strength=&amp;quot;&amp;quot;&lt;br /&gt;
    for each element in auth&lt;br /&gt;
        pop element and add to output array without match //x   loc=&amp;quot;&amp;quot;, auth=&amp;quot;Perry&amp;quot;, strength=&amp;quot;&amp;quot;&lt;br /&gt;
    return output           &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
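For a rough sense of how the draft above might behave, here is a minimal Python sketch of the greedy pairing step (the names and the 0.8 threshold are our assumptions, using difflib's ratio as the string distance):

```python
from difflib import SequenceMatcher

THRESHOLD = 0.8  # arbitrary barrier for a "decent enough" match

def strength(a, b):
    # string distance as a ratio between 0 and 1
    return SequenceMatcher(None, a, b).ratio()

def compare_elements(loc, auth):
    """Greedily pair local and authority values by best string match."""
    output = []
    loc, auth = list(loc), list(auth)
    # score every local/authority pair, strongest matches first
    pairs = sorted(
        ((strength(l, a), l, a) for l in loc for a in auth),
        reverse=True,
    )
    for s, l, a in pairs:
        if s >= THRESHOLD and l in loc and a in auth:
            loc.remove(l)
            auth.remove(a)
            output.append({"loc": l, "auth": a, "strength": s})
    # cleanup: values with no decent match on the other side
    for l in loc:
        output.append({"loc": l, "auth": "", "strength": ""})
    for a in auth:
        output.append({"loc": "", "auth": a, "strength": ""})
    return output
```

Unlike the nested-loop draft, this sorts all candidate pairs once and consumes them greedily, which gives the same "strongest match wins" behavior with less bookkeeping.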
&lt;br /&gt;
== Output Spec ==&lt;br /&gt;
&lt;br /&gt;
* We will use a simple XML output consisting of paired (and possibly unpaired) values.&lt;br /&gt;
* The root element will contain an attribute signifying the source of the authority metadata.&lt;br /&gt;
* The &amp;lt;match&amp;gt; element will be used to pair values, with a strength attribute to signify the string distance.&lt;br /&gt;
* Within each match element will be exactly 2 metadata elements with attributes signifying the source of each value: either the local input or the remote authority data.&lt;br /&gt;
* An &amp;lt;nonmatch&amp;gt; element will be used for unpaired values.&lt;br /&gt;
&lt;br /&gt;
=== Sample Output ===&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;lt;hamr authority=&amp;quot;PubMed&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;match strength=&amp;quot;100%&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;creator src=&amp;quot;input&amp;quot;&amp;gt;Trojan, Tommy&amp;lt;/creator&amp;gt;&lt;br /&gt;
        &amp;lt;creator src=&amp;quot;authority&amp;quot;&amp;gt;Trojan, Tommy&amp;lt;/creator&amp;gt;&lt;br /&gt;
    &amp;lt;/match&amp;gt;&lt;br /&gt;
    &amp;lt;match strength=&amp;quot;90%&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;title src=&amp;quot;input&amp;quot;&amp;gt;Great American Article&amp;lt;/title&amp;gt;&lt;br /&gt;
        &amp;lt;title src=&amp;quot;authority&amp;quot;&amp;gt;Great American Article, The&amp;lt;/title&amp;gt;&lt;br /&gt;
    &amp;lt;/match&amp;gt;&lt;br /&gt;
    &amp;lt;nonmatch&amp;gt;&lt;br /&gt;
        &amp;lt;subject src=&amp;quot;input&amp;quot;&amp;gt;Medical Stuff&amp;lt;/subject&amp;gt;&lt;br /&gt;
    &amp;lt;/nonmatch&amp;gt;&lt;br /&gt;
    &amp;lt;nonmatch&amp;gt;&lt;br /&gt;
        &amp;lt;type src=&amp;quot;authority&amp;quot;&amp;gt;text&amp;lt;/type&amp;gt;&lt;br /&gt;
    &amp;lt;/nonmatch&amp;gt;&lt;br /&gt;
&amp;lt;/hamr&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
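One way to emit this format, sketched with Python's ElementTree (the function name and tuple shapes are hypothetical; the element and attribute names follow the spec above):

```python
import xml.etree.ElementTree as ET

# Build the output document: matches are (field, local, remote, strength)
# tuples, nonmatches are (field, value, src) tuples.
def build_output(authority, matches, nonmatches):
    root = ET.Element("hamr", authority=authority)
    for field, local, remote, s in matches:
        m = ET.SubElement(root, "match", strength=s)
        ET.SubElement(m, field, src="input").text = local
        ET.SubElement(m, field, src="authority").text = remote
    for field, value, src in nonmatches:
        n = ET.SubElement(root, "nonmatch")
        ET.SubElement(n, field, src=src).text = value
    return root
```

Serializing the returned element with ET.tostring yields output in the shape of the sample above.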
&lt;br /&gt;
== Need to do ==&lt;br /&gt;
&lt;br /&gt;
# Implement metadata retrieval from authority ''(done for crossref in ryan's code)''&lt;br /&gt;
# Design structure of plugins&lt;br /&gt;
## crosswalk from authority format to simple dc&lt;br /&gt;
# Design matching algorithm&lt;/div&gt;</summary>
		<author><name>Ryscher</name></author>	</entry>

	<entry>
		<id>https://wiki.code4lib.org/index.php?title=HAMR:_Human/Authority_Metadata_Reconciliation&amp;diff=8581</id>
		<title>HAMR: Human/Authority Metadata Reconciliation</title>
		<link rel="alternate" type="text/html" href="https://wiki.code4lib.org/index.php?title=HAMR:_Human/Authority_Metadata_Reconciliation&amp;diff=8581"/>
				<updated>2011-06-03T20:21:35Z</updated>
		
		<summary type="html">&lt;p&gt;Ryscher: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[HAMR: Human/Authority Metadata Reconciliation]]&lt;br /&gt;
&lt;br /&gt;
Sean Chen, Tim Donohue, Joshua Gomez, Ranti Junus, Ryan Scherle&lt;br /&gt;
&lt;br /&gt;
A tool for a curator to determine whether the various fields of a metadata record are correct. Takes a metadata record, locates any identifiers (e.g., DOI, PMID). Retrieves a copy of the metadata record from an authoritative source (e.g., CrossRef, PubMed). Displays a human-readable page that compares fields in the initial record with fields in the authoritative record. Each field is color-coded based on how well it matches, so the curator can quickly identify discrepancies.&lt;br /&gt;
&lt;br /&gt;
== UI Prototype (uses static data) ==&lt;br /&gt;
http://dl.dropbox.com/u/9074989/code4lib/unverified.html&lt;br /&gt;
&lt;br /&gt;
== Basic design ==&lt;br /&gt;
&lt;br /&gt;
Narrowing the focus for an initial usable version:&lt;br /&gt;
* Dublin core (maybe qualified)&lt;br /&gt;
* framework that allows multiple authority sources&lt;br /&gt;
* NOT focusing on author names ([http://www.orcid.org/ ORCID] is already working on this), except the fact that they are strings, and we'll do basic string matching&lt;br /&gt;
* 1 to 1 matching.  Even if you want to eventually match with multiple authorities, you'd only do one at a time&lt;br /&gt;
&lt;br /&gt;
Possible authority sources:&lt;br /&gt;
* PubMed&lt;br /&gt;
** Sample pubmed query (in Java): [https://wiki.duraspace.org/display/DSPACE/PubMedPrefill-PubmedPrefillStep.java DSpace PubMedPrefillStep.java] (From [https://wiki.duraspace.org/display/DSPACE/PopulateMetadataFromPubMed Populate Metadata from PubMed])&lt;br /&gt;
*** See 'retrievePubmedXML()' in above java code for actual call to PubMed&lt;br /&gt;
*** Mapping happens here: See [https://wiki.duraspace.org/display/DSPACE/PubMedPrefill-pmid+dim.xsl pmid-to-dim.xsl] for a sample XSLT crosswalk to translate PubMed format to a qualified dublin core (internal DSpace metadata format)&lt;br /&gt;
** More examples of querying PubMed: http://www.my-whiteboard.com/how-to-automate-pubmed-search-using-perl-php-or-java/&lt;br /&gt;
** Useful tool for finding PubMed IDs: http://www.ncbi.nlm.nih.gov/entrez/getids.cgi&lt;br /&gt;
* CrossRef&lt;br /&gt;
** simply send the DOI to CrossRef and get JSON/XML back&lt;br /&gt;
*** http://api.labs.crossref.org/10.1111/j.1558-5646.2009.00626.x.json&lt;br /&gt;
*** http://api.labs.crossref.org/10.2307/1935157.xml&lt;br /&gt;
*** [http://code.google.com/p/dryad/source/browse/trunk/dryad/dspace/modules/doi/dspace-doi-webapp/src/main/java/org/dspace/doi/DOIServlet.java java code that includes a lookup]&lt;br /&gt;
** [http://labs.crossref.org/site/crossref_metadata_search.html Metadata Search] -- send a text query, receive a list of matching records&lt;br /&gt;
** [http://labs.crossref.org/site/quick_and_dirty_api_guide.html OpenURL search]&lt;br /&gt;
* Google Scholar - does it have an API?&lt;br /&gt;
* [http://www.mendeley.com mendeley] - [http://dev.mendeley.com/ Mendeley API]&lt;br /&gt;
* [http://vivoweb.org/ vivo]&lt;br /&gt;
* [http://bibapp.org/ bibapp]&lt;br /&gt;
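The CrossRef lookup described above can be sketched as follows. The base URL matches the sample links on this page; the helper names, and the assumption that the labs service still responds, are ours.

```python
import urllib.request

# Base URL taken from the sample links above.
CROSSREF_BASE = 'http://api.labs.crossref.org/'

def crossref_url(doi, fmt='json'):
    """Build the CrossRef labs API URL for a DOI in the requested format."""
    return CROSSREF_BASE + doi + '.' + fmt

def fetch_authority_record(doi, fmt='json', timeout=10):
    """Retrieve the authority metadata for a DOI (network call; may fail)."""
    with urllib.request.urlopen(crossref_url(doi, fmt), timeout=timeout) as resp:
        return resp.read().decode('utf-8')
```

For example, `crossref_url('10.2307/1935157', 'xml')` reproduces the second sample link above.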
&lt;br /&gt;
Thoughts / Questions:&lt;br /&gt;
* Is there a way to do most/all of this via JavaScript/AJAX/jQuery?  Could it be a simple JavaScript framework you could &amp;quot;drop&amp;quot; into any metadata editing interface?&lt;br /&gt;
** Unfortunately, it seems this wouldn't work out.  To query external authorities directly from the browser, they'd all need to support [http://en.wikipedia.org/wiki/JSON#JSONP JSONP] or similar (and they don't)&lt;br /&gt;
&lt;br /&gt;
== Code ==&lt;br /&gt;
&lt;br /&gt;
* [http://gitref.org/ quick reference for Git]&lt;br /&gt;
* [https://github.com/ryscher/hamr Ryan's really stupid scratch implementation]&lt;br /&gt;
&lt;br /&gt;
=== Draft Matching Algorithm ===&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
function compareRecords(localDubCore, authDubCore)&lt;br /&gt;
    recordMatches = []&lt;br /&gt;
    for each element-type:&lt;br /&gt;
        loc = array of local values&lt;br /&gt;
        auth = array of authority values&lt;br /&gt;
        // arrays are actually lists of dictionaries&lt;br /&gt;
        // a1&lt;br /&gt;
        // 0    value=&amp;quot;Benson, Arnold&amp;quot;, match=&amp;quot;&amp;quot;, strength=&amp;quot;&amp;quot;&lt;br /&gt;
        // 1    value=&amp;quot;Terrence, D.&amp;quot;, match=&amp;quot;a2[3]&amp;quot;, strength=&amp;quot;100%&amp;quot;&lt;br /&gt;
        elementMatches = compareElements(loc, auth)&lt;br /&gt;
        recordMatches.add(elementMatches)&lt;br /&gt;
        &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
function compareElements(loc, auth)&lt;br /&gt;
    output = []&lt;br /&gt;
    //nested loops run through the values and assign the strongest match to each element&lt;br /&gt;
    for each element in loc&lt;br /&gt;
        for each element in auth&lt;br /&gt;
            strength = string distance between the two elements&lt;br /&gt;
            if strength = 100&lt;br /&gt;
                //if the match is perfect, pop each element and add their values to the output array&lt;br /&gt;
                //output array is also list of dictionaries&lt;br /&gt;
                //0    loc=&amp;quot;Hector&amp;quot;, auth=&amp;quot;Hector&amp;quot;, strength=&amp;quot;100&amp;quot;&lt;br /&gt;
                //1    loc=&amp;quot;Albert&amp;quot;, auth=&amp;quot;Alberto&amp;quot;, strength=&amp;quot;90&amp;quot;&lt;br /&gt;
            if strength &amp;gt; auth element's current strength value&lt;br /&gt;
                overwrite auth element's strength and match values&lt;br /&gt;
            if strength &amp;gt; loc element's current strength value&lt;br /&gt;
                overwrite loc element's strength and match values&lt;br /&gt;
    //this second set of non-nested loops pulls out the strongest matches&lt;br /&gt;
    for each item in auth&lt;br /&gt;
        //x = some arbitrary barrier for a decent enough match&lt;br /&gt;
        if element strength &amp;gt; x AND if matching element is still in the loc list&lt;br /&gt;
            pop each element and add their values to output array&lt;br /&gt;
    for each item in loc&lt;br /&gt;
        if element strength &amp;gt; x AND if matching element is still in the auth list&lt;br /&gt;
            pop each element and add their values to output array&lt;br /&gt;
    //now do cleanup and look for values that have no decent matches&lt;br /&gt;
    for each element in loc&lt;br /&gt;
        pop element and add to output array without match //x   loc=&amp;quot;Heyward&amp;quot;, auth=&amp;quot;&amp;quot;, strength=&amp;quot;&amp;quot;&lt;br /&gt;
    for each element in auth&lt;br /&gt;
        pop element and add to output array without match //x   loc=&amp;quot;&amp;quot;, auth=&amp;quot;Perry&amp;quot;, strength=&amp;quot;&amp;quot;&lt;br /&gt;
    return output           &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
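A runnable Python sketch of the draft algorithm above. difflib's SequenceMatcher ratio (scaled to 0-100) stands in for the unspecified &amp;quot;string distance&amp;quot;, and the threshold of 60 is an arbitrary placeholder for x; both are assumptions.

```python
from difflib import SequenceMatcher

def compare_elements(loc, auth, threshold=60):
    """Pair local and authority values by best string similarity (greedy)."""
    output = []
    loc = list(loc)
    auth = list(auth)
    # score every local/authority pair
    pairs = []
    for lv in loc:
        for av in auth:
            strength = round(SequenceMatcher(None, lv, av).ratio() * 100)
            pairs.append((strength, lv, av))
    # pop the strongest acceptable matches first
    pairs.sort(reverse=True)
    for strength, lv, av in pairs:
        if strength >= threshold and lv in loc and av in auth:
            loc.remove(lv)
            auth.remove(av)
            output.append({'loc': lv, 'auth': av, 'strength': strength})
    # cleanup: values with no decent match are emitted unpaired
    for lv in loc:
        output.append({'loc': lv, 'auth': '', 'strength': ''})
    for av in auth:
        output.append({'loc': '', 'auth': av, 'strength': ''})
    return output
```

compareRecords would simply call this once per element type and collect the results, as in the pseudocode.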
&lt;br /&gt;
== Output Spec ==&lt;br /&gt;
&lt;br /&gt;
* We will use a simple XML output consisting of paired (and possibly unpaired) values.&lt;br /&gt;
* The root element will contain an attribute signifying the source of the authority metadata.&lt;br /&gt;
* The &amp;lt;match&amp;gt; element will be used to pair values, with a strength attribute to signify the string distance.&lt;br /&gt;
* Within each match element will be exactly 2 metadata elements with attributes signifying the source of each value: either the local input or the remote authority data.&lt;br /&gt;
* A &amp;lt;nonmatch&amp;gt; element will be used for unpaired values.&lt;br /&gt;
&lt;br /&gt;
=== Sample Output ===&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;lt;hamr authority=&amp;quot;PubMed&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;match strength=&amp;quot;100%&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;creator src=&amp;quot;input&amp;quot;&amp;gt;Trojan, Tommy&amp;lt;/creator&amp;gt;&lt;br /&gt;
        &amp;lt;creator src=&amp;quot;authority&amp;quot;&amp;gt;Trojan, Tommy&amp;lt;/creator&amp;gt;&lt;br /&gt;
    &amp;lt;/match&amp;gt;&lt;br /&gt;
    &amp;lt;match strength=&amp;quot;90%&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;title src=&amp;quot;input&amp;quot;&amp;gt;Great American Article&amp;lt;/title&amp;gt;&lt;br /&gt;
        &amp;lt;title src=&amp;quot;authority&amp;quot;&amp;gt;Great American Article, The&amp;lt;/title&amp;gt;&lt;br /&gt;
    &amp;lt;/match&amp;gt;&lt;br /&gt;
    &amp;lt;nonmatch&amp;gt;&lt;br /&gt;
        &amp;lt;subject src=&amp;quot;input&amp;quot;&amp;gt;Medical Stuff&amp;lt;/subject&amp;gt;&lt;br /&gt;
    &amp;lt;/nonmatch&amp;gt;&lt;br /&gt;
    &amp;lt;nonmatch&amp;gt;&lt;br /&gt;
        &amp;lt;type src=&amp;quot;authority&amp;quot;&amp;gt;text&amp;lt;/type&amp;gt;&lt;br /&gt;
    &amp;lt;/nonmatch&amp;gt;&lt;br /&gt;
&amp;lt;/hamr&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
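The output spec can be sketched with ElementTree. The element and attribute names follow the sample above; the function name and the input shape (a list of field-name/match-dict pairs) are assumptions.

```python
import xml.etree.ElementTree as ET

def build_hamr_xml(authority, matches):
    """Serialize paired and unpaired values into the HAMR output format."""
    root = ET.Element('hamr', authority=authority)
    for field, m in matches:
        if m['loc'] and m['auth']:
            pair = ET.SubElement(root, 'match', strength=str(m['strength']))
            ET.SubElement(pair, field, src='input').text = m['loc']
            ET.SubElement(pair, field, src='authority').text = m['auth']
        else:
            non = ET.SubElement(root, 'nonmatch')
            src = 'input' if m['loc'] else 'authority'
            ET.SubElement(non, field, src=src).text = m['loc'] or m['auth']
    return ET.tostring(root, encoding='unicode')
```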
&lt;br /&gt;
== Need to do ==&lt;br /&gt;
&lt;br /&gt;
# Implement metadata retrieval from authority ''(done for CrossRef in Ryan's code)''&lt;br /&gt;
# Design structure of plugins&lt;br /&gt;
## crosswalk from authority format to simple dc&lt;br /&gt;
# Design matching algorithm&lt;/div&gt;</summary>
		<author><name>Ryscher</name></author>	</entry>

	<entry>
		<id>https://wiki.code4lib.org/index.php?title=HAMR:_Human/Authority_Metadata_Reconciliation&amp;diff=7546</id>
		<title>HAMR: Human/Authority Metadata Reconciliation</title>
		<link rel="alternate" type="text/html" href="https://wiki.code4lib.org/index.php?title=HAMR:_Human/Authority_Metadata_Reconciliation&amp;diff=7546"/>
				<updated>2011-02-17T04:24:42Z</updated>
		
		<summary type="html">&lt;p&gt;Ryscher: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[HAMR: Human/Authority Metadata Reconciliation]]&lt;br /&gt;
&lt;br /&gt;
Sean Chen, Tim Donohue, Joshua Gomez, Ranti Junus, Ryan Scherle&lt;br /&gt;
&lt;br /&gt;
A tool for a curator to determine whether the various fields of a metadata record are correct. It takes a metadata record, locates any identifiers (e.g., DOI, PMID), retrieves a copy of the metadata record from an authoritative source (e.g., CrossRef, PubMed), and displays a human-readable page that compares fields in the initial record with fields in the authoritative record. Each field is color-coded based on how well it matches, so the curator can quickly identify discrepancies.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Narrowing the focus for today:&lt;br /&gt;
* Dublin Core (maybe qualified)&lt;br /&gt;
* framework that allows multiple authority sources&lt;br /&gt;
* NOT focusing on author names ([http://www.orcid.org/ ORCID] is already working on this); we'll treat them as plain strings and do basic string matching&lt;br /&gt;
* 1-to-1 matching.  Even if you eventually want to match against multiple authorities, you'd only do one at a time&lt;br /&gt;
&lt;br /&gt;
Possible authority sources:&lt;br /&gt;
* PubMed&lt;br /&gt;
** Sample PubMed query (in Java): [https://wiki.duraspace.org/display/DSPACE/PubMedPrefill-PubmedPrefillStep.java DSpace PubMedPrefillStep.java] (from [https://wiki.duraspace.org/display/DSPACE/PopulateMetadataFromPubMed Populate Metadata from PubMed])&lt;br /&gt;
*** See 'retrievePubmedXML()' in the above Java code for the actual call to PubMed&lt;br /&gt;
*** Mapping happens here: see [https://wiki.duraspace.org/display/DSPACE/PubMedPrefill-pmid+dim.xsl pmid-to-dim.xsl] for a sample XSLT crosswalk that translates the PubMed format to qualified Dublin Core (DSpace's internal metadata format)&lt;br /&gt;
** More examples of querying PubMed: http://www.my-whiteboard.com/how-to-automate-pubmed-search-using-perl-php-or-java/&lt;br /&gt;
** Useful tool for finding PubMed IDs: http://www.ncbi.nlm.nih.gov/entrez/getids.cgi&lt;br /&gt;
* CrossRef&lt;br /&gt;
** simply send the DOI to CrossRef and get JSON/XML back&lt;br /&gt;
*** http://api.labs.crossref.org/10.1111/j.1558-5646.2009.00626.x.json&lt;br /&gt;
*** http://api.labs.crossref.org/10.2307/1935157.xml&lt;br /&gt;
*** [http://code.google.com/p/dryad/source/browse/trunk/dryad/dspace/modules/doi/dspace-doi-webapp/src/main/java/org/dspace/doi/DOIServlet.java java code that includes a lookup]&lt;br /&gt;
** [http://labs.crossref.org/site/crossref_metadata_search.html Metadata Search] -- send a text query, receive a list of matching records&lt;br /&gt;
** [http://labs.crossref.org/site/quick_and_dirty_api_guide.html OpenURL search]&lt;br /&gt;
* Google Scholar - does it have an API?&lt;br /&gt;
* [http://www.mendeley.com mendeley] - [http://dev.mendeley.com/ Mendeley API]&lt;br /&gt;
* [http://vivoweb.org/ vivo]&lt;br /&gt;
* [http://bibapp.org/ bibapp]&lt;br /&gt;
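One way to sketch the PubMed retrieval in Python, assuming NCBI's standard E-utilities efetch endpoint (presumably the same service the linked Java code calls); the endpoint choice and helper name are assumptions, not taken from this page.

```python
from urllib.parse import urlencode

# NCBI E-utilities efetch endpoint (an assumption; see the linked Java example).
EFETCH = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'

def pubmed_fetch_url(pmid):
    """Build an efetch URL that returns the PubMed record as XML."""
    params = urlencode({'db': 'pubmed', 'id': str(pmid), 'retmode': 'xml'})
    return EFETCH + '?' + params
```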
&lt;br /&gt;
Thoughts / Questions:&lt;br /&gt;
* Is there a way to do most/all of this via JavaScript/AJAX/jQuery?  Could it be a simple JavaScript framework you could &amp;quot;drop&amp;quot; into any metadata editing interface?&lt;br /&gt;
** Unfortunately, it seems this wouldn't work out.  To query external authorities directly from the browser, they'd all need to support [http://en.wikipedia.org/wiki/JSON#JSONP JSONP] or similar (and they don't)&lt;br /&gt;
&lt;br /&gt;
== Code ==&lt;br /&gt;
&lt;br /&gt;
* [http://gitref.org/ quick reference for Git]&lt;br /&gt;
* [https://github.com/ryscher/hamr Ryan's really stupid scratch implementation]&lt;br /&gt;
&lt;br /&gt;
=== Draft Matching Algorithm ===&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
function compareRecords(localDubCore, authDubCore)&lt;br /&gt;
    recordMatches = []&lt;br /&gt;
    for each element-type:&lt;br /&gt;
        loc = array of local values&lt;br /&gt;
        auth = array of authority values&lt;br /&gt;
        // arrays are actually lists of dictionaries&lt;br /&gt;
        // a1&lt;br /&gt;
        // 0    value=&amp;quot;Benson, Arnold&amp;quot;, match=&amp;quot;&amp;quot;, strength=&amp;quot;&amp;quot;&lt;br /&gt;
        // 1    value=&amp;quot;Terrence, D.&amp;quot;, match=&amp;quot;a2[3]&amp;quot;, strength=&amp;quot;100%&amp;quot;&lt;br /&gt;
        elementMatches = compareElements(loc, auth)&lt;br /&gt;
        recordMatches.add(elementMatches)&lt;br /&gt;
        &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
function compareElements(loc, auth)&lt;br /&gt;
    output = []&lt;br /&gt;
    //nested loops run through the values and assign the strongest match to each element&lt;br /&gt;
    for each element in loc&lt;br /&gt;
        for each element in auth&lt;br /&gt;
            strength = string distance between the two elements&lt;br /&gt;
            if strength = 100&lt;br /&gt;
                //if the match is perfect, pop each element and add their values to the output array&lt;br /&gt;
                //output array is also list of dictionaries&lt;br /&gt;
                //0    loc=&amp;quot;Hector&amp;quot;, auth=&amp;quot;Hector&amp;quot;, strength=&amp;quot;100&amp;quot;&lt;br /&gt;
                //1    loc=&amp;quot;Albert&amp;quot;, auth=&amp;quot;Alberto&amp;quot;, strength=&amp;quot;90&amp;quot;&lt;br /&gt;
            if strength &amp;gt; auth element's current strength value&lt;br /&gt;
                overwrite auth element's strength and match values&lt;br /&gt;
            if strength &amp;gt; loc element's current strength value&lt;br /&gt;
                overwrite loc element's strength and match values&lt;br /&gt;
    //this second set of non-nested loops pulls out the strongest matches&lt;br /&gt;
    for each item in auth&lt;br /&gt;
        //x = some arbitrary barrier for a decent enough match&lt;br /&gt;
        if element strength &amp;gt; x AND if matching element is still in the loc list&lt;br /&gt;
            pop each element and add their values to output array&lt;br /&gt;
    for each item in loc&lt;br /&gt;
        if element strength &amp;gt; x AND if matching element is still in the auth list&lt;br /&gt;
            pop each element and add their values to output array&lt;br /&gt;
    //now do cleanup and look for values that have no decent matches&lt;br /&gt;
    for each element in loc&lt;br /&gt;
        pop element and add to output array without match //x   loc=&amp;quot;Heyward&amp;quot;, auth=&amp;quot;&amp;quot;, strength=&amp;quot;&amp;quot;&lt;br /&gt;
    for each element in auth&lt;br /&gt;
        pop element and add to output array without match //x   loc=&amp;quot;&amp;quot;, auth=&amp;quot;Perry&amp;quot;, strength=&amp;quot;&amp;quot;&lt;br /&gt;
    return output           &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
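A minimal sketch of the &amp;quot;string distance&amp;quot; step in the pseudocode above, assuming Levenshtein edit distance rescaled to the 0-100 strength range the draft uses; the draft leaves the metric open, so this choice is an assumption.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[len(b)]

def strength(a, b):
    """Scale edit distance to a 0-100 match strength (100 = identical)."""
    if not a and not b:
        return 100
    return round(100 * (1 - levenshtein(a, b) / max(len(a), len(b))))
```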
&lt;br /&gt;
== Output Spec ==&lt;br /&gt;
&lt;br /&gt;
* We will use a simple XML output consisting of paired (and possibly unpaired) values.&lt;br /&gt;
* The root element will contain an attribute signifying the source of the authority metadata.&lt;br /&gt;
* The &amp;lt;match&amp;gt; element will be used to pair values, with a strength attribute to signify the string distance.&lt;br /&gt;
* Within each match element will be exactly 2 metadata elements with attributes signifying the source of each value: either the local input or the remote authority data.&lt;br /&gt;
* A &amp;lt;nonmatch&amp;gt; element will be used for unpaired values.&lt;br /&gt;
&lt;br /&gt;
=== Sample Output ===&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;lt;hamr authority=&amp;quot;PubMed&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;match strength=&amp;quot;100%&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;creator src=&amp;quot;input&amp;quot;&amp;gt;Trojan, Tommy&amp;lt;/creator&amp;gt;&lt;br /&gt;
        &amp;lt;creator src=&amp;quot;authority&amp;quot;&amp;gt;Trojan, Tommy&amp;lt;/creator&amp;gt;&lt;br /&gt;
    &amp;lt;/match&amp;gt;&lt;br /&gt;
    &amp;lt;match strength=&amp;quot;90%&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;title src=&amp;quot;input&amp;quot;&amp;gt;Great American Article&amp;lt;/title&amp;gt;&lt;br /&gt;
        &amp;lt;title src=&amp;quot;authority&amp;quot;&amp;gt;Great American Article, The&amp;lt;/title&amp;gt;&lt;br /&gt;
    &amp;lt;/match&amp;gt;&lt;br /&gt;
    &amp;lt;nonmatch&amp;gt;&lt;br /&gt;
        &amp;lt;subject src=&amp;quot;input&amp;quot;&amp;gt;Medical Stuff&amp;lt;/subject&amp;gt;&lt;br /&gt;
    &amp;lt;/nonmatch&amp;gt;&lt;br /&gt;
    &amp;lt;nonmatch&amp;gt;&lt;br /&gt;
        &amp;lt;type src=&amp;quot;authority&amp;quot;&amp;gt;text&amp;lt;/type&amp;gt;&lt;br /&gt;
    &amp;lt;/nonmatch&amp;gt;&lt;br /&gt;
&amp;lt;/hamr&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== UI Example (static) ==&lt;br /&gt;
http://dl.dropbox.com/u/9074989/code4lib/unverified.html&lt;br /&gt;
&lt;br /&gt;
== Need to do ==&lt;br /&gt;
&lt;br /&gt;
# Implement metadata retrieval from authority ''(done for CrossRef in Ryan's code)''&lt;br /&gt;
# Design structure of plugins&lt;br /&gt;
## crosswalk from authority format to simple dc&lt;br /&gt;
# Design matching algorithm&lt;/div&gt;</summary>
		<author><name>Ryscher</name></author>	</entry>

	<entry>
		<id>https://wiki.code4lib.org/index.php?title=C4L2011_social_activities&amp;diff=7514</id>
		<title>C4L2011 social activities</title>
		<link rel="alternate" type="text/html" href="https://wiki.code4lib.org/index.php?title=C4L2011_social_activities&amp;diff=7514"/>
				<updated>2011-02-10T02:52:27Z</updated>
		
		<summary type="html">&lt;p&gt;Ryscher: /* Werewolf Signup */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''This page is under development'''&lt;br /&gt;
&lt;br /&gt;
==Code4Lib 2011 social activities ==&lt;br /&gt;
&lt;br /&gt;
[http://maps.google.com/maps/ms?hl=en&amp;amp;ie=UTF8&amp;amp;msa=0&amp;amp;msid=201419377696104618083.000499eeb466dfb962201&amp;amp;ll=39.169964,-86.53008&amp;amp;spn=0.025286,0.03077&amp;amp;z=15 Code4Lib 2011 Google Map] - It's the ultimate plot device!  See what others are recommending and recommend your own places to see and things to do!&lt;br /&gt;
&lt;br /&gt;
[http://www.imu.indiana.edu/pdfs/imu/pdfs/IMU%20Map%202011.pdf Map of the Indiana Memorial Union] - The conference is happening [http://www.imu.indiana.edu/event/AlumniHall.mov here], [http://www.imu.indiana.edu/img/rooms/whittenberger.jpg here], [http://www.imu.indiana.edu/img/rooms/solarium.jpg here], and [http://www.imu.indiana.edu/event/TreeSuitesRoom.mov here].  Use this map and you can probably find it all.&lt;br /&gt;
&lt;br /&gt;
==Local events==&lt;br /&gt;
&lt;br /&gt;
* [https://wiki.dlib.indiana.edu/confluence/display/EVENTS/Code4Lib+2011+Proposal Original Proposal (suggests some nearby events)]&lt;br /&gt;
* February 4-5, 8-12: [http://www.indiana.edu/~thtr/productions/2010/angelsInAmerica.shtml Angels in America: Part One] at IU Wells-Metz Theatre&lt;br /&gt;
* February 5: [http://music.indiana.edu/events/?e=9221 New Music Ensemble] at Auer Hall, 8 p.m, with [http://www.parnasmusic.com/Index.html duo parnas] performing.&lt;br /&gt;
* February 6: [http://www.indiana.edu/~iucinema/calendar.shtml Who the #$&amp;amp;% is Jackson Pollock] and [http://www.indiana.edu/~iucinema/events2011.21.shtml And Everything is Going Fine] at the new IU Cinema&lt;br /&gt;
* February 7: [http://www.indiana.edu/~iucinema/events2011.33.shtml Reign of Terror] at the new IU Cinema&lt;br /&gt;
* February 7: [http://music.indiana.edu/events/?e=9224 Jazz Ensemble] at Musical Arts Center, 8 p.m.&lt;br /&gt;
* February 7: [https://onestart.iu.edu/ccl-prd/EventMaintenance.do?methodToCall=viewEvent&amp;amp;eventId=488273&amp;amp;pubCalId=GRP1445 Michael Chabon] at Fine Arts Auditorium, 5:30 p.m.&lt;br /&gt;
* February 8: [http://www.theroommovie.com/screeningspop.html Tommy Wiseau's Love Is Blind Tour] (showing of &amp;quot;[[wikipedia:The Room (film)|The Room]]&amp;quot; w/ Q&amp;amp;A!)&lt;br /&gt;
* February 8: [http://music.indiana.edu/events/share.php?e=9225 IU Wind Ensemble] at Musical Arts Center - FREE!&lt;br /&gt;
* February 9: [http://music.indiana.edu/events/share.php?e=9226 IU University Orchestra] at Musical Arts Center - FREE!&lt;br /&gt;
* February 8 and 9: [http://www.iuauditorium.com/site/show-fiddler.html Fiddler on the Roof] at the IU Auditorium - $38-60&lt;br /&gt;
* February 9: [http://dylanettinger.bandcamp.com/ Dylan Ettinger]/[http://www.myspace.com/kamkama Kam Kama] at The Bishop &lt;br /&gt;
* February 10: Bob Marley's band [http://wailers.com/ Legendary Wailers] at [http://www.thebluebird.ws/ Bluebird Nightclub] - $20&lt;br /&gt;
* February 10: Amy Schumer at [http://www.comedyattic.com/index.php?option=com_k2&amp;amp;view=item&amp;amp;id=84:amy-schumer&amp;amp;Itemid=3 The Comedy Attic] - $13&lt;br /&gt;
&lt;br /&gt;
Things to see:&lt;br /&gt;
* Indiana University Art Museum: [http://www.iub.edu/~iuam/section.php?navSection=galleries New in the Galleries]&lt;br /&gt;
* Indiana University SoFA Gallery: [http://www.indiana.edu/~sofa/exhibitions/iu-school-of-fine-arts-student-shows-1/ MFA Painting, Metals, Graphic Design, Ceramics, and Textiles]&lt;br /&gt;
* [http://www.indiana.edu/~mathers/ Indiana University Mathers Museum]&lt;br /&gt;
&lt;br /&gt;
==Recommended Restaurants/Bars (no particular order)==&lt;br /&gt;
&lt;br /&gt;
* [http://www.the-uptown.com/ Uptown Café] - great for breakfast, lunch, or wine&lt;br /&gt;
* [http://www.villagedeli.biz/ Village Deli] - can accommodate large crowds for breakfast or lunch&lt;br /&gt;
* [http://grazieitalianeatery.com/Welcome.html Grazie] - Italian, great wine selection&lt;br /&gt;
* Japanee (320 N. Walnut St) - Bento Box lunch – yum!&lt;br /&gt;
* [http://www.maxsplace.info/ Max’s Place] - pizza and beer&lt;br /&gt;
* [http://www.samirasrestaurant.com/ Samira] - Afghan cuisine – lunch buffet w/ roasted chicken&lt;br /&gt;
* Shanti (221 E. Kirkwood Ave) - great for lunch&lt;br /&gt;
* [http://www.esanthairestaurant.com/ Esan Thai] - delicious, but slow service&lt;br /&gt;
* [http://www.cafedjango.us/ Café Django] - Indian and Thai, great noodle dishes&lt;br /&gt;
* [http://www.stefanoscafe.com/ Stefano’s Ice Café] - best chicken salad sandwich in town&lt;br /&gt;
* [http://www.thelaughingplanetcafe.com/ Laughing Planet] - yummy burritos/nachos&lt;br /&gt;
* [http://www.bbcbagel.com/ Bloomington Bagel Co.] - best bagels in town&lt;br /&gt;
* RockIt's Pizza (222 N. Walnut St) - open late for a slice after visiting nearby bars&lt;br /&gt;
* [http://www.farm-bloomington.com/ Farm] - fun little hipster whiskey bar in basement&lt;br /&gt;
* [http://www.nicksenglishhut.com/ Nick's] - local meat, in-house batter for deep fried goodness&lt;br /&gt;
* [http://www.bbcbloomington.com/ Lennie’s Bar &amp;amp; Grill] - local brew&lt;br /&gt;
* [http://squaredonuts.com Square Donuts] - donuts that are square and fresh and you eat them zomg&lt;br /&gt;
* [http://uplandbeer.com/ Upland Brewing Co.] - more local brew (different from Lennie's)&lt;br /&gt;
* [http://www.irishlion.com/ The Irish Lion] - Irish pub, Guinness on tap&lt;br /&gt;
* [http://www.bloomingpedia.org/wiki/Bloomington_Sandwich_Company Bloomington Sandwich Co.] - great for lunch, yummy reuben&lt;br /&gt;
* Bub’s Burgers and Ice Cream (480 N. Morton St) - eat the giganto burger in a certain time limit, get your picture on the wall (if that's what you're into)&lt;br /&gt;
* [http://www.yogis.com/ Yogi's Grill &amp;amp; Bar] - bazillion beer choices&lt;br /&gt;
* Restaurant Ami (1500 E. 3rd St) - Japanese/Korean, great for lunch&lt;br /&gt;
* [http://www.turkuazcafe.com/ Turkuaz Cafe] - Turkish, pides great for lunch&lt;br /&gt;
* [http://www.crazyhorseindiana.com/ Crazy Horse] - also bazillion beer choices&lt;br /&gt;
* [http://www.finchsbrasserie.com/ Finch's] - great wine selection&lt;br /&gt;
&lt;br /&gt;
==Planned events==&lt;br /&gt;
&lt;br /&gt;
Plan one if you like! Either on your own or you can [[2011 committees_sign-up_page|join the social activities committee]].&lt;br /&gt;
&lt;br /&gt;
=== [[Craft Brew Drinkup]] ===&lt;br /&gt;
'''Tuesday 2/8, 8:30PM, Hospitality suite''' Like good beer? Bring some in your luggage! Some of us are planning on bringing bottles of our favorite local brews to share. Interested? Sign up on the [[Craft Brew Drinkup]] page!&lt;br /&gt;
&lt;br /&gt;
=== Newcomer Dinner ===&lt;br /&gt;
First time at code4lib? Join fellow c4l newbies and veterans for an evening of food, socializing, and stimulating &amp;lt;strike&amp;gt;discussions about&amp;lt;/strike&amp;gt; demonstrations of the many uses of &amp;lt;strike&amp;gt;bacon&amp;lt;/strike&amp;gt; dongles.&lt;br /&gt;
&lt;br /&gt;
Code4Lib veterans, you're invited too. Join us in welcoming the newcomers!&lt;br /&gt;
&lt;br /&gt;
'''Plans'''&lt;br /&gt;
* When: Monday evening (2/7)&lt;br /&gt;
* Time: 6 PM (ish)&lt;br /&gt;
* Mastermind (if you have any questions): [mailto:yoosebj@muohio.edu Becky Yoose]&lt;br /&gt;
&lt;br /&gt;
''Guidelines:''&lt;br /&gt;
*Max of '''6''' per location&lt;br /&gt;
**Please, no waitlisting :(&lt;br /&gt;
*ID yourselves so we can get a good mix of new people and veterans in each group&lt;br /&gt;
**New folks - n&lt;br /&gt;
**c4l vets - v&lt;br /&gt;
*One leader needed for each location (declare yourself! - '''Vets are highly encouraged to lead the group :)''')&lt;br /&gt;
**Leader duties&lt;br /&gt;
***Make reservations if required; otherwise make sure that the restaurant can handle a group of 6 rowdy library coders &lt;br /&gt;
***Herd folks from hotel to restaurant (know where you're going!)&lt;br /&gt;
&lt;br /&gt;
'''Restaurants'''&lt;br /&gt;
&lt;br /&gt;
'''West Side of Campus (towards downtown)'''&lt;br /&gt;
&lt;br /&gt;
'''Indiana Avenue''', across street from campus (5-10 minute walk)&lt;br /&gt;
&lt;br /&gt;
[http://www.buffalouies.com/home.html Buffa Louie's] (Wings/Subs/Sandwiches) - Gables location on Indiana Avenue is a historic site&lt;br /&gt;
&lt;br /&gt;
'''4th St. between IMU and downtown square''' (10-15 minute walk)&lt;br /&gt;
&lt;br /&gt;
[http://www.anyetsangs.com/ Anyetsang's Little Tibet] (Tibetan/Thai/Indian)&lt;br /&gt;
&lt;br /&gt;
'''MEET ON THE MEZZANINE LEVEL NEAR THE COMFY CHAIRS'''&lt;br /&gt;
&lt;br /&gt;
*Dot Porter (leader) - n&lt;br /&gt;
*Jason Ronallo - n&lt;br /&gt;
*Ben Anderson - n&lt;br /&gt;
*Bill Dueber - v&lt;br /&gt;
*Jakub Skoczen - n&lt;br /&gt;
*[mailto:birkin_diana@brown.edu Birkin] - v&lt;br /&gt;
&lt;br /&gt;
[http://www.siamhousebloomington.com/ Siam House] (Thai) ''Meet in the Biddle hotel lobby at 6pm. Look for the short woman in a trench coat and wide brim hat ~Becky''&lt;br /&gt;
*Becky Yoose (leader) - v&lt;br /&gt;
*Margaret Heller - n&lt;br /&gt;
*Bohyun Kim - n&lt;br /&gt;
*Karen Hanson - v&lt;br /&gt;
*Daniel Lovins - v&lt;br /&gt;
*Gerald Snyder - v&lt;br /&gt;
*&amp;lt;strike&amp;gt;Wayne Schneider - v&amp;lt;/strike&amp;gt; off to hear [https://onestart.iu.edu/ccl-prd/EventMaintenance.do?methodToCall=viewEvent&amp;amp;eventId=488273&amp;amp;pubCalId=GRP1445 Michael Chabon] instead.&lt;br /&gt;
&lt;br /&gt;
Puccini's La Dolce Vita (Italian)&lt;br /&gt;
&lt;br /&gt;
'''Kirkwood Ave. (5th St) between IMU and downtown square''' (10-15 minute walk)&lt;br /&gt;
&lt;br /&gt;
[http://www.nicksenglishhut.com/ Nick's English Hut] (American Pub) - NOTE: web site plays IU fight song&lt;br /&gt;
&lt;br /&gt;
[http://www.cafepizzaria.com/ Cafe Pizzaria] (Pizza)&lt;br /&gt;
&lt;br /&gt;
[http://www.falafelsonline.com/Falafels/www.FalafelsOnline.com.html Falafels] (Middle Eastern)&lt;br /&gt;
&lt;br /&gt;
[http://www.finchsbrasserie.com/ Finch's] (Gastropub-ish) Meet in the mezzanine lounge at 6 PM, carefully avoiding contact with the man-eating comfy chairs. ''(8 min walk. Res. at 6:15 PM)''&lt;br /&gt;
* [mailto:cgordon@chillco.com Cary Gordon] (leader) - v (818-694-1626) ''(big guy wearing yellow jacket)''&lt;br /&gt;
* Michael Doran - v&lt;br /&gt;
* Tim Daniels - n&lt;br /&gt;
* Joshua Gomez - n&lt;br /&gt;
* Jenny Reiswig - n&lt;br /&gt;
* Kosuke Tanabe - n&lt;br /&gt;
&lt;br /&gt;
Note: FARMbloomington is closed on Mondays&lt;br /&gt;
&lt;br /&gt;
[http://www.the-uptown.com/ Michael's Uptown Cafe] (Cajun/Creole/American) ''To walk to restaurant, meet in lounge upstairs from hotel lobby in IMU at 6pm.  Reservation for 6:30pm at Uptown (under Julie Hardesty).''&lt;br /&gt;
*[mailto:jlhardes@indiana.edu Julie Hardesty] (leader) - n&lt;br /&gt;
*Jean Rainwater - v&lt;br /&gt;
*D Ruth Bavousett - n&lt;br /&gt;
* Theodor T - n&lt;br /&gt;
*Sarah Weeks - n&lt;br /&gt;
*Mark Mounts - v&lt;br /&gt;
* Takanori Hayashi -n&lt;br /&gt;
&lt;br /&gt;
[http://www.thetrojanhorse.com/ Trojan Horse] (Greek)&lt;br /&gt;
&lt;br /&gt;
Shanti (Indian, Vegetarian options)&lt;br /&gt;
&lt;br /&gt;
'''6th St. between IMU and downtown square''' (10-15 minute walk)&lt;br /&gt;
&lt;br /&gt;
[http://www.runciblespoonrestaurant.com/ Runcible Spoon] (Variety, Vegetarian/Vegan options)&lt;br /&gt;
&lt;br /&gt;
'''Grant St. between 3rd St. and 6th St.''', about halfway between IMU and downtown square (10-15 minute walk)&lt;br /&gt;
&lt;br /&gt;
[http://cafedjango.us/ Cafe Django] (Indian, Thai, Vegetarian options) ''Meet in the Biddle hotel lobby at 6pm.  Look for the guy with the bright blue knitted Mets hat.''&lt;br /&gt;
* Maccabee Levine (leader) - v&lt;br /&gt;
* Andrew Darby - v&lt;br /&gt;
* Michael Slone - n&lt;br /&gt;
* Linda Ballinger - n&lt;br /&gt;
* Nell Taylor - n&lt;br /&gt;
* Richard Anderson - n&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[http://www.bloomingpedia.org/wiki/Snow_Lion Snow Lion] (Tibetan, Indian, Vegetarian options)&lt;br /&gt;
&lt;br /&gt;
'''Lincoln St. between 3rd St. and 6th St.''', about halfway between IMU and downtown square (15-20 minute walk)&lt;br /&gt;
&lt;br /&gt;
[http://www.esanthairestaurant.com/ Esan Thai] (Thai, Vegetarian options) ''Meet outside Whittenberger Auditorium at 6pm''&lt;br /&gt;
* Ryan Scherle (leader) - v&lt;br /&gt;
* Jason Stirnaman (drummer) - v&lt;br /&gt;
* Jon Dunn - n&lt;br /&gt;
* Richard Maiti - n&lt;br /&gt;
* Mike Stroming - n&lt;br /&gt;
* Trish Rose-Sandler (can drive if needed) - n&lt;br /&gt;
&lt;br /&gt;
'''Downtown Square''' (20-25 minute walk)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strike&amp;gt;[http://uplandbeer.com/ Upland Brewing]&amp;lt;/strike&amp;gt; (Microbrewery)&lt;br /&gt;
&lt;br /&gt;
'''CHANGE OF PLAN: Going to [http://yogis.com/ Yogi's] instead; meet in lobby at 5:30PM'''&lt;br /&gt;
&lt;br /&gt;
:Note: Lennie's (below) is also a Microbrewery/restaurant&lt;br /&gt;
&lt;br /&gt;
* THE ARCHITECT'S GROUP&lt;br /&gt;
** Mike Giarlo (leader) - v&lt;br /&gt;
** Jay Luker (wall dodger) - v&lt;br /&gt;
** Benoit Thiell (french tickler) - n&lt;br /&gt;
** Dan Suchy (designated non-driver) - v&lt;br /&gt;
** Matt Critchlow (lupulin addict) - n&lt;br /&gt;
** Esme Cowles (designated non-walker) -v&lt;br /&gt;
* THE ARCHIVIST'S GROUP&lt;br /&gt;
** Mark Matienzo (leader) - v&lt;br /&gt;
** Joe Atzberger (agitator) - v&lt;br /&gt;
** Hillel Arnold (intern) - n&lt;br /&gt;
** Mark Custer (title tk) - n&lt;br /&gt;
** Patrick Force (TBD) - n&lt;br /&gt;
** Adam Wead (zymurgist) - n&lt;br /&gt;
** Christopher Chagal (#7 apparently, but I had to squeeze into one of these groups.  Exhibit A: [http://awesome.good.is/transparency/web/1102/beer-map/flat.html The United States of GOOD beer]) - v&lt;br /&gt;
* THE DIRECTOR'S GROUP (where DIRECTOR = people cooler than Giarlo or Matienzo.  Which is pretty much anyone.  Except McDonald.)&lt;br /&gt;
** Declan Fleming - v&lt;br /&gt;
** Mark Phillips - v&lt;br /&gt;
** Tim Donohue (not a &amp;quot;director&amp;quot;, per se, but I enjoy microbrew beer) - v&lt;br /&gt;
** &amp;lt;strike&amp;gt;Andrea Schurr - n&amp;lt;/strike&amp;gt; - Last minute ship jumping so that I can go to Lennie's.  Have fun!&lt;br /&gt;
** Larry Baerveldt (also not a director, but I like beer, and this group isn't full yet) - n&lt;br /&gt;
** Shaun Ellis (all I know is that I'm following Declan when beer is involved) - n &lt;br /&gt;
** Jon Stroop - v&lt;br /&gt;
&lt;br /&gt;
[http://www.irishlion.com/ Irish Lion] (Irish Pub)&lt;br /&gt;
&lt;br /&gt;
[http://www.crazyhorseindiana.com/ Crazy Horse] (American) ''Meet near Walnut Meeting Room (M015) at 6pm''&lt;br /&gt;
* Joel Richard (leader, usurped. richardjm at si dot edu) - v &lt;br /&gt;
* Genevieve Francis - n&lt;br /&gt;
* Ben Shum (bshum AT biblio DOT org)- n&lt;br /&gt;
* Francis Kayiwa - (former leader francis dot kayiwa AT gmail) n&lt;br /&gt;
* Roberto Hoyle - n&lt;br /&gt;
* Hui Zhang - n (hz3 AT indiana DOT edu)&lt;br /&gt;
&lt;br /&gt;
[http://www.grazieitalianeatery.com/Welcome.html Grazie] (Italian)&lt;br /&gt;
&lt;br /&gt;
[http://www.samirasrestaurant.com/ Samira] (Afghanistan cuisine) - ''let's meet at 6pm in the mezzanine (above the hotel lobby)''&lt;br /&gt;
* Ranti Junus (ranti dot junus at gmail) (lead) - v&lt;br /&gt;
* Andreas Orphanides - v&lt;br /&gt;
* Matt Connolly - v ( back in the game.)&lt;br /&gt;
* Toke Eskildsen - intermediate&lt;br /&gt;
* Mads Villadsen - intermediate&lt;br /&gt;
* Will Kurt - n&lt;br /&gt;
&lt;br /&gt;
[http://www.opietaylors.com/welcome.html Opie Taylors] (American)&lt;br /&gt;
&lt;br /&gt;
[http://www.malibugrill.net/ Malibu Grill] (California)&lt;br /&gt;
&lt;br /&gt;
[http://www.littlezagreb.com/ Janko's Little Zagreb] (Steakhouse)&lt;br /&gt;
&lt;br /&gt;
[http://www.bloomingpedia.org/wiki/El_Norteño El Norteño] (Mexican)&lt;br /&gt;
&lt;br /&gt;
[http://www.maxsplace.info/ Max's Place] (Pizza)&lt;br /&gt;
&lt;br /&gt;
[http://www.restauranttallent.com/ Tallent] (Local/Seasonal)&lt;br /&gt;
''7:00 Reservation for 2 tables, 6 people each; will remix at dessert. Leave from hotel lobby at 6:45''&lt;br /&gt;
&lt;br /&gt;
Leaving Hotel Lobby at 6:25.  Otherwise, meet us there. [http://maps.google.com/maps?f=d&amp;amp;source=s_d&amp;amp;saddr=IU+memorial+union&amp;amp;daddr=208+North+Walnut+Street,+Bloomington,+IN+47404+(Restaurant+Tallent)&amp;amp;hl=en&amp;amp;geocode=FfamVQIdwsXX-iElSoPsShsQPA%3BFfynVQIdlJrX-iFsH61zxeNixA&amp;amp;mra=ltm&amp;amp;dirflg=w&amp;amp;sll=39.167352,-86.527956&amp;amp;sspn=0.009283,0.01811&amp;amp;ie=UTF8&amp;amp;ll=39.1682,-86.527956&amp;amp;spn=0.009283,0.01811&amp;amp;z=16 directions]&lt;br /&gt;
&lt;br /&gt;
ULCERATIVE LOONS GROUP&lt;br /&gt;
* Matt Zumwalt (leader) - v&lt;br /&gt;
* Michael Levy - n&lt;br /&gt;
* Devon Smith - v&lt;br /&gt;
* Sean Hannan - v&lt;br /&gt;
* Steven Miles - n&lt;br /&gt;
* Naomi Dushay - v&lt;br /&gt;
&lt;br /&gt;
TROUBLEMAKERS GROUP&lt;br /&gt;
* Eric Hellman (leader- camo ski jacket) - v&lt;br /&gt;
* Michael Klein (troublemaker) - v&lt;br /&gt;
* Matt Cordial - v&lt;br /&gt;
* Scot Colford (natalie wood to round out the james deans of troublemakers) - v &lt;br /&gt;
* Rachel Frick&lt;br /&gt;
* Ken Irwin - (semi-v) &lt;br /&gt;
&lt;br /&gt;
Roots on the Square (Vegan/Vegetarian)&lt;br /&gt;
&lt;br /&gt;
'''East Side of Campus (away from downtown)'''&lt;br /&gt;
&lt;br /&gt;
'''10th St., east of Wells Library''' (10-15 minute walk)&lt;br /&gt;
&lt;br /&gt;
[http://www.bbcbloomington.com/ Lennie's Bar &amp;amp; Grill] (Pizza/American) + (Microbrewery) - ''Meet in the Biddle hotel lobby at 6:15pm. Reservation made for 6 at 6:45pm''&lt;br /&gt;
&lt;br /&gt;
* [mailto:ryanwick@gmail.com Ryan Wick] (leader) - v&lt;br /&gt;
* June Rayner - n&lt;br /&gt;
* Roni Shwaish - n&lt;br /&gt;
* Kirk Hess (I'm a local &amp;amp; I think Lennie's has better beer &amp;amp; food than Upland.) - n&lt;br /&gt;
* Aaron Collie (Listens to locals; wonders who will be the last person to sign up) - n&lt;br /&gt;
* Jon Gorman - v - I know I make 7, but hope that's not a huge issue. If it turns out to be I'll grab a bar seat.&lt;br /&gt;
Tried to make a reservation for another group, but no go. Will be going to Yogi's.&lt;br /&gt;
* Chelsea Lobdell - n&lt;br /&gt;
* Andrea Schurr - n - jumped ship from Yogi's and made a reservation &lt;br /&gt;
* Mark Leggott - n    (would be 10, if we're doing 12?)&lt;br /&gt;
* Zac Howarth - n   11?&lt;br /&gt;
     (taking the bull by the horns, I just made a second reservation for 6 at 6:45 - pub-side)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''3rd St., east of Music School''' (10-15 minute walk)&lt;br /&gt;
&lt;br /&gt;
[http://www.motherbearspizza.com/index2.html Mother Bear's Pizza] (Pizza)&lt;br /&gt;
&lt;br /&gt;
[http://www.macgyros.com/ The MAC] (Mediterranean)&lt;br /&gt;
&lt;br /&gt;
[http://www.yelp.com/biz/red-chopsticks-bloomington Red Chopsticks] (Sushi/Asian Fusion)&lt;br /&gt;
&lt;br /&gt;
Cafe Ami/Domo (Japanese/Korean)&lt;br /&gt;
&lt;br /&gt;
=== &amp;quot;Social Network Dine Arounds&amp;quot; ===&lt;br /&gt;
&lt;br /&gt;
Wednesday night.  Not sure what these are.  Make it up as you go.  Be social, network, dine around.&lt;br /&gt;
&lt;br /&gt;
'''Snow Lion Tibetan Restaurant (run by the Dalai Lama's brother's neighbor!!!)'''&lt;br /&gt;
&lt;br /&gt;
Wednesday 6PM meeting at the mezzanine above the lobby &lt;br /&gt;
&lt;br /&gt;
The only agenda is relaxing and having a good time :))&lt;br /&gt;
[http://local.yahoo.com/info-16065573-snow-lion-restaurant-bloomington More info]&lt;br /&gt;
* Ranti Junus&lt;br /&gt;
* Becky Yoose&lt;br /&gt;
* Bohyun Kim&lt;br /&gt;
* Dileshni Jayasinghe&lt;br /&gt;
* Will Kurt&lt;br /&gt;
* Ian Mulvany&lt;br /&gt;
&lt;br /&gt;
'''Dine with Hydra'''&lt;br /&gt;
Wednesday Evening &lt;br /&gt;
&lt;br /&gt;
Location: [http://www.irishlion.com/ Irish Lion] upstairs&lt;br /&gt;
Time: 6:30-8:00&lt;br /&gt;
&lt;br /&gt;
Join us if you want to connect with the [http://wiki.duraspace.org/display/hydra/The+Hydra+Project Hydra] developer community.  Questions, ideas, suggestions, etc. all welcome.  Please add yourself to the list if you're coming.&lt;br /&gt;
&lt;br /&gt;
* Matt Zumwalt&lt;br /&gt;
* Bess Sadler&lt;br /&gt;
* Adam Wead &lt;br /&gt;
* Jeremy Nelson&lt;br /&gt;
* Rick Johnson&lt;br /&gt;
* Dan Brubaker Horst&lt;br /&gt;
* Scot Colford&lt;br /&gt;
* Mike Stroming&lt;br /&gt;
* Christopher Curry&lt;br /&gt;
* Banu Lakshminarayanan&lt;br /&gt;
* Richard Anderson&lt;br /&gt;
* Michael Levy&lt;br /&gt;
* Jason Stirnaman&lt;br /&gt;
* Daniel Lovins&lt;br /&gt;
&lt;br /&gt;
'''Anyetsang's Little Tibet''' Wednesday, leaving from the mezzanine above the lobby at 6:30 PM.  Menu available (as a really cheesy set of pictures of the actual menu) [http://www.anyetsangs.com/ right here].  No agenda, no plan for discussion, just social time and Tibetan/Thai/Indian grub.  Add yourself if you want, or just show up.  If you sign up, we won't leave without you!&lt;br /&gt;
* D Ruth Bavousett (suggested it, so is going)&lt;br /&gt;
* Mark Mounts&lt;br /&gt;
* Roberto Hoyle&lt;br /&gt;
* Ian Walls&lt;br /&gt;
* Keith Nickum&lt;br /&gt;
* Ken Irwin&lt;br /&gt;
* Sarah Weeks&lt;br /&gt;
* Jason Fowler&lt;br /&gt;
* Ben Anderson&lt;br /&gt;
&lt;br /&gt;
'''Run to White Castle''' We need cars. &lt;br /&gt;
* Declan Fleming&lt;br /&gt;
* Matt Critchlow&lt;br /&gt;
&lt;br /&gt;
'''[http://uplandbeer.com/ Upland Brewing]''' - Leaving from lobby by hotel registration desk at 6:30 PM&lt;br /&gt;
&lt;br /&gt;
Just throwing this out there to gauge interest. No specific agenda.&lt;br /&gt;
&lt;br /&gt;
* Hillel Arnold&lt;br /&gt;
* Mark Matienzo&lt;br /&gt;
* Andrea Schurr - I was just going to do the same thing.  I'm in.&lt;br /&gt;
* Jon Gorman - i'd be in. Do they serve food? Ya betcha - just linked to website.&lt;br /&gt;
* Chelsea Lobdell - I might be a little later than 6:30 so I will meet everyone there. &lt;br /&gt;
* Joe Atzberger (atz)&lt;br /&gt;
* Wayne Schneider - will meet you there&lt;br /&gt;
* gluejar (Me)&lt;br /&gt;
* Spencer Lamm&lt;br /&gt;
* Benoit Thiell - I'll meet you there.&lt;br /&gt;
* Patrick Force&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''[http://www.finchsbrasserie.com/ Finch's]'''&lt;br /&gt;
Meeting in lobby at 7, leaving no later than 7:15 - reservation made for 8 at 7:30&lt;br /&gt;
* decasm&lt;br /&gt;
* ndushay&lt;br /&gt;
* cbeer&lt;br /&gt;
* MrDys&lt;br /&gt;
* jrochkind&lt;br /&gt;
* tburton-west&lt;br /&gt;
* Mads Villadsen&lt;br /&gt;
* Toke Eskildsen&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''UNC/Dook Basketball Game'''&lt;br /&gt;
Wednesday - TIPOFF @ 9pm - [http://www.nicksenglishhut.com/ Nick's English Hut] - just show up!&lt;br /&gt;
&lt;br /&gt;
'''Thursday breakfast at Village Deli''' Meet in hotel lobby 7:45 AM.&lt;br /&gt;
* Devon Smith/decasm&lt;br /&gt;
* Jason Stirnaman/jstirnaman&lt;br /&gt;
* Keith Nickum&lt;br /&gt;
&lt;br /&gt;
[[Category: Code4Lib2011]]&lt;br /&gt;
&lt;br /&gt;
=== Hanging out in the Hospitality Suite ===&lt;br /&gt;
* Located in Biddle Hotel room 116/117&lt;br /&gt;
* Keyholders: Gabriel Farrell (gsf, @g5f), Mike Giarlo (mjgiarlo, @mjgiarlo), Mark Matienzo (anarchivist, @anarchivist), Devon Smith (decasm)&lt;br /&gt;
&lt;br /&gt;
== Werewolf! ==&lt;br /&gt;
It wouldn't be a tech conference unless we got together one evening to turn into a gang of murdering beasts and hyper-suspicious victims. Facilitated by the one and only mbklein.&lt;br /&gt;
&lt;br /&gt;
* When: Wednesday evening&lt;br /&gt;
* Time: 10-10:30 PM (or whenever enough people wander in)&lt;br /&gt;
* Where: Planning to meet in the Hospitality Suite and move on from there if necessary. Check this space for updates.&lt;br /&gt;
&lt;br /&gt;
=== About Werewolf ===&lt;br /&gt;
&lt;br /&gt;
Werewolf (also known as Mafia) is a parlor game that has become [http://www.wired.co.uk/wired-magazine/archive/2010/03/features/werewolf.aspx the obsession of techie conferences everywhere]. At its most basic, it's a game of information asymmetry -- a battle between an uninformed majority (the townspeople) and an informed minority (the werewolves who live and hunt among them). At its best, it's an off-the-wall paranoid screaming match. There are dozens of variations -- we'll start with the basics and, depending on everyone's stamina and desire to keep playing, save the tricky stuff for later. Hopefully by the end of the evening all the participants will be gibbering, jumpy, sleep-deprived lunatics incapable of trusting even their closest friends.&lt;br /&gt;
&lt;br /&gt;
In other words, ''good times.''&lt;br /&gt;
&lt;br /&gt;
=== Werewolf Signup ===&lt;br /&gt;
&lt;br /&gt;
If you're planning on/thinking about attending, put your name here.&lt;br /&gt;
&lt;br /&gt;
# Michael Klein&lt;br /&gt;
# anarchivist&lt;br /&gt;
# cbeer&lt;br /&gt;
# knickum&lt;br /&gt;
# vampire1&lt;br /&gt;
# vampire2&lt;br /&gt;
# mike D&lt;br /&gt;
# Sarah Weeks&lt;br /&gt;
# my middle name is werewolf&lt;br /&gt;
# do we really hafta sign up?&lt;/div&gt;</summary>
		<author><name>Ryscher</name></author>	</entry>

	<entry>
		<id>https://wiki.code4lib.org/index.php?title=2011_Breakout_Sessions&amp;diff=7410</id>
		<title>2011 Breakout Sessions</title>
		<link rel="alternate" type="text/html" href="https://wiki.code4lib.org/index.php?title=2011_Breakout_Sessions&amp;diff=7410"/>
				<updated>2011-02-09T16:58:27Z</updated>
		
		<summary type="html">&lt;p&gt;Ryscher: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''NOTE: Breakout sessions are usually proposed at the conference or shortly before the conference begins'''&lt;br /&gt;
&lt;br /&gt;
Those interested in the same project/problem can hang out in a space together for 70-minute blocks. Generally the person who suggests the topic will take on the role of moderator to begin and moderate the discussion. Anyone can propose a breakout session - please think about whether you would want a session to be held on Tuesday or Wednesday, depending on the order of talks and who you hope will attend. There are lots of spaces in the IMU where small groups can congregate, and we have a couple of rooms, including the large Alumni Hall space, for this. We will route different proposed sessions to the different rooms depending on a quick show-of-hands survey just before each one begins.&lt;br /&gt;
&lt;br /&gt;
This page will list any sessions proposed, but there will also be flip charts outside the meeting room where more sessions can be proposed. '''Please include your name when proposing a session.'''&lt;br /&gt;
&lt;br /&gt;
'''Tuesday 16:00-17:00:''' Alumni Hall (1-2 groups); Solarium (2-3 groups); Whittenberger Auditorium&lt;br /&gt;
&lt;br /&gt;
* Drupal in Libraries - Cary Gordon and a cast of &amp;lt;strike&amp;gt;1,000s&amp;lt;/strike&amp;gt; &amp;lt;strike&amp;gt;100s&amp;lt;/strike&amp;gt; some - Solarium. &lt;br /&gt;
* Digital Video for research - William Cowan, Indiana University - Can we get beyond YouTube? - Solarium&lt;br /&gt;
* Plone / Zope in Libraries - Maccabee Levine, University of Wisconsin Oshkosh - Alumni Hall&lt;br /&gt;
* Usability research/designing for user experience - what are we doing about it? - Erin White, VCU - Solarium&lt;br /&gt;
* [[Can we hack on this: Open Extensible Proxy: going beyond EZProxy?]] - Terry Reese (Oregon State) and Jeremy Frumkin (University of Arizona) - Alumni Hall&lt;br /&gt;
&lt;br /&gt;
'''Wednesday 14:40-15:50:''' Alumni Hall (1-2 groups); Solarium (2-3 groups); Maple Room; Walnut Room&lt;br /&gt;
&lt;br /&gt;
*  Solr--Tom Burton-West University of Michigan (HathiTrust)&lt;br /&gt;
* Supporting Open Source in Libraries - Peter Murray and Tim Daniels, LYRASIS&lt;br /&gt;
* Digital Library Federation Community Update - Rachel L. Frick, CLIR/DLF&lt;br /&gt;
* Student thesis self-submission with Pylons, Fedora Commons, &amp;amp; MARC - Jeremy Nelson (Colorado College)&lt;br /&gt;
* Collection, appraisal and tools like the [https://github.com/UNC-Libraries/Curators-Workbench Curator's Workbench] - Greg Jansen, UNC Chapel Hill - Solarium&lt;br /&gt;
* How OpenSocial apps and open APIs/data can personalize and enhance search and discovery on ScienceDirect, Scopus, and Hub - Remko Caprio, Developer Platform Evangelist at [http://developer.sciverse.com SciVerse] &lt;br /&gt;
* Blacklight/Hydra - Bess Sadler &amp;amp; Matt Zumwalt&lt;br /&gt;
* [[Can we hack on this: Open Extensible Proxy: going beyond EZProxy?]] Part II - Reese, Frumkin, et al.&lt;br /&gt;
* VuFind Discussion - Demian Katz, Villanova University&lt;br /&gt;
* Data curation - Eric Lease Morgan, and I am hoping we can discuss and share best practices for curating research data across the enterprise. Copyright. Storage. Metadata. Privacy. Etc.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Not yet scheduled:''' ''Organizers: Please move your sessions to Tuesday or Wednesday above.''&lt;br /&gt;
&lt;br /&gt;
* ColdFusion in Libraries - Daria Norris, Free Library of Philadelphia&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Category:Code4Lib2011]]&lt;/div&gt;</summary>
		<author><name>Ryscher</name></author>	</entry>

	<entry>
		<id>https://wiki.code4lib.org/index.php?title=HAMR:_Human/Authority_Metadata_Reconciliation&amp;diff=7274</id>
		<title>HAMR: Human/Authority Metadata Reconciliation</title>
		<link rel="alternate" type="text/html" href="https://wiki.code4lib.org/index.php?title=HAMR:_Human/Authority_Metadata_Reconciliation&amp;diff=7274"/>
				<updated>2011-02-07T20:23:59Z</updated>
		
		<summary type="html">&lt;p&gt;Ryscher: /* Need to do */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[HAMR: Human/Authority Metadata Reconciliation]]&lt;br /&gt;
&lt;br /&gt;
Sean Chen, Tim Donohue, Joshua Gomez, Ranti Junus, Ryan Scherle&lt;br /&gt;
&lt;br /&gt;
A tool for a curator to determine whether the various fields of a metadata record are correct. Takes a metadata record, locates any identifiers (e.g., DOI, PMID). Retrieves a copy of the metadata record from an authoritative source (e.g., CrossRef, PubMed). Displays a human-readable page that compares fields in the initial record with fields in the authoritative record. Each field is color-coded based on how well it matches, so the curator can quickly identify discrepancies.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Narrowing the focus for today:&lt;br /&gt;
* Dublin core (maybe qualified)&lt;br /&gt;
* framework that allows multiple authority sources&lt;br /&gt;
* NOT focusing on author names ([http://www.orcid.org/ ORCID] is already working on this), except the fact that they are strings, and we'll do basic string matching&lt;br /&gt;
* 1 to 1 matching.  Even if you want to eventually match with multiple authorities, you'd only do one at a time&lt;br /&gt;
&lt;br /&gt;
Possible authority sources:&lt;br /&gt;
* PubMed&lt;br /&gt;
** Sample pubmed query (in Java): [https://wiki.duraspace.org/display/DSPACE/PubMedPrefill-PubmedPrefillStep.java DSpace PubMedPrefillStep.java] (From [https://wiki.duraspace.org/display/DSPACE/PopulateMetadataFromPubMed Populate Metadata from PubMed])&lt;br /&gt;
*** See 'retrievePubmedXML()' in above java code for actual call to PubMed&lt;br /&gt;
*** Mapping happens here: See [https://wiki.duraspace.org/display/DSPACE/PubMedPrefill-pmid+dim.xsl pmid-to-dim.xsl] for a sample XSLT crosswalk to translate PubMed format to a qualified dublin core (internal DSpace metadata format)&lt;br /&gt;
** More examples of querying PubMed: http://www.my-whiteboard.com/how-to-automate-pubmed-search-using-perl-php-or-java/&lt;br /&gt;
* CrossRef&lt;br /&gt;
** simply send the DOI to crossref, and get JSON/XML back&lt;br /&gt;
*** http://api.labs.crossref.org/10.1111/j.1558-5646.2009.00626.x.json&lt;br /&gt;
*** http://api.labs.crossref.org/10.2307/1935157.xml&lt;br /&gt;
*** [http://code.google.com/p/dryad/source/browse/trunk/dryad/dspace/modules/doi/dspace-doi-webapp/src/main/java/org/dspace/doi/DOIServlet.java java code that includes a lookup]&lt;br /&gt;
** [http://labs.crossref.org/site/crossref_metadata_search.html Metadata Search] -- send a text query, receive a list of matching records&lt;br /&gt;
** [http://labs.crossref.org/site/quick_and_dirty_api_guide.html OpenURL search]&lt;br /&gt;
* google scholar - does it have an API?&lt;br /&gt;
* [http://www.mendeley.com mendeley] - [http://dev.mendeley.com/ Mendeley API]&lt;br /&gt;
* [http://vivoweb.org/ vivo]&lt;br /&gt;
* [http://bibapp.org/ bibapp]&lt;br /&gt;
&lt;br /&gt;
Thoughts / Questions:&lt;br /&gt;
* Is there a way to do most/all of this via JavaScript/AJAX/jQuery?  Could it be a simple JavaScript framework you could &amp;quot;drop&amp;quot; into any metadata editing interface?&lt;br /&gt;
&lt;br /&gt;
== Code ==&lt;br /&gt;
&lt;br /&gt;
* [http://gitref.org/ quick reference for Git]&lt;br /&gt;
* [https://github.com/ryscher/hamr Ryan's really stupid scratch implementation]&lt;br /&gt;
&lt;br /&gt;
== Output Spec ==&lt;br /&gt;
&lt;br /&gt;
* We will use a simple XML output consisting of paired (and possibly unpaired) values.&lt;br /&gt;
* The root element will contain an attribute signifying the source of the authority metadata.&lt;br /&gt;
* The &amp;lt;match&amp;gt; element will be used to pair values, with a strength attribute to signify the string distance.&lt;br /&gt;
* Within each match element will be exactly 2 metadata elements with attributes signifying the source of each value: either the local input or the remote authority data.&lt;br /&gt;
* A &amp;lt;nonmatch&amp;gt; element will be used for unpaired values.&lt;br /&gt;
&lt;br /&gt;
=== Sample Output ===&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;lt;hamr authority=&amp;quot;PubMed&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;match strength=&amp;quot;100%&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;creator src=&amp;quot;input&amp;quot;&amp;gt;Trojan, Tommy&amp;lt;/creator&amp;gt;&lt;br /&gt;
        &amp;lt;creator src=&amp;quot;authority&amp;quot;&amp;gt;Trojan, Tommy&amp;lt;/creator&amp;gt;&lt;br /&gt;
    &amp;lt;/match&amp;gt;&lt;br /&gt;
    &amp;lt;match strength=&amp;quot;90%&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;title src=&amp;quot;input&amp;quot;&amp;gt;Great American Article&amp;lt;/title&amp;gt;&lt;br /&gt;
        &amp;lt;title src=&amp;quot;authority&amp;quot;&amp;gt;Great American Article, The&amp;lt;/title&amp;gt;&lt;br /&gt;
    &amp;lt;/match&amp;gt;&lt;br /&gt;
    &amp;lt;nonmatch&amp;gt;&lt;br /&gt;
        &amp;lt;subject src=&amp;quot;input&amp;quot;&amp;gt;Medical Stuff&amp;lt;/subject&amp;gt;&lt;br /&gt;
    &amp;lt;/nonmatch&amp;gt;&lt;br /&gt;
    &amp;lt;nonmatch&amp;gt;&lt;br /&gt;
        &amp;lt;type src=&amp;quot;authority&amp;quot;&amp;gt;text&amp;lt;/type&amp;gt;&lt;br /&gt;
    &amp;lt;/nonmatch&amp;gt;&lt;br /&gt;
&amp;lt;/hamr&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Need to do ==&lt;br /&gt;
&lt;br /&gt;
# Implement metadata retrieval from authority ''(done for crossref in ryan's code)''&lt;br /&gt;
# Design structure of plugins&lt;br /&gt;
## crosswalk from authority format to simple dc&lt;br /&gt;
# Design matching algorithm&lt;/div&gt;</summary>
		<author><name>Ryscher</name></author>	</entry>

	<entry>
		<id>https://wiki.code4lib.org/index.php?title=HAMR:_Human/Authority_Metadata_Reconciliation&amp;diff=7273</id>
		<title>HAMR: Human/Authority Metadata Reconciliation</title>
		<link rel="alternate" type="text/html" href="https://wiki.code4lib.org/index.php?title=HAMR:_Human/Authority_Metadata_Reconciliation&amp;diff=7273"/>
				<updated>2011-02-07T20:23:18Z</updated>
		
		<summary type="html">&lt;p&gt;Ryscher: /* Need to do */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[HAMR: Human/Authority Metadata Reconciliation]]&lt;br /&gt;
&lt;br /&gt;
Sean Chen, Tim Donohue, Joshua Gomez, Ranti Junus, Ryan Scherle&lt;br /&gt;
&lt;br /&gt;
A tool for a curator to determine whether the various fields of a metadata record are correct. Takes a metadata record, locates any identifiers (e.g., DOI, PMID). Retrieves a copy of the metadata record from an authoritative source (e.g., CrossRef, PubMed). Displays a human-readable page that compares fields in the initial record with fields in the authoritative record. Each field is color-coded based on how well it matches, so the curator can quickly identify discrepancies.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Narrowing the focus for today:&lt;br /&gt;
* Dublin core (maybe qualified)&lt;br /&gt;
* framework that allows multiple authority sources&lt;br /&gt;
* NOT focusing on author names ([http://www.orcid.org/ ORCID] is already working on this), except the fact that they are strings, and we'll do basic string matching&lt;br /&gt;
* 1 to 1 matching.  Even if you want to eventually match with multiple authorities, you'd only do one at a time&lt;br /&gt;
&lt;br /&gt;
Possible authority sources:&lt;br /&gt;
* PubMed&lt;br /&gt;
** Sample pubmed query (in Java): [https://wiki.duraspace.org/display/DSPACE/PubMedPrefill-PubmedPrefillStep.java DSpace PubMedPrefillStep.java] (From [https://wiki.duraspace.org/display/DSPACE/PopulateMetadataFromPubMed Populate Metadata from PubMed])&lt;br /&gt;
*** See 'retrievePubmedXML()' in above java code for actual call to PubMed&lt;br /&gt;
*** Mapping happens here: See [https://wiki.duraspace.org/display/DSPACE/PubMedPrefill-pmid+dim.xsl pmid-to-dim.xsl] for a sample XSLT crosswalk to translate PubMed format to a qualified dublin core (internal DSpace metadata format)&lt;br /&gt;
** More examples of querying PubMed: http://www.my-whiteboard.com/how-to-automate-pubmed-search-using-perl-php-or-java/&lt;br /&gt;
* CrossRef&lt;br /&gt;
** simply send the DOI to crossref, and get JSON/XML back&lt;br /&gt;
*** http://api.labs.crossref.org/10.1111/j.1558-5646.2009.00626.x.json&lt;br /&gt;
*** http://api.labs.crossref.org/10.2307/1935157.xml&lt;br /&gt;
*** [http://code.google.com/p/dryad/source/browse/trunk/dryad/dspace/modules/doi/dspace-doi-webapp/src/main/java/org/dspace/doi/DOIServlet.java java code that includes a lookup]&lt;br /&gt;
** [http://labs.crossref.org/site/crossref_metadata_search.html Metadata Search] -- send a text query, receive a list of matching records&lt;br /&gt;
** [http://labs.crossref.org/site/quick_and_dirty_api_guide.html OpenURL search]&lt;br /&gt;
* google scholar - does it have an API?&lt;br /&gt;
* [http://www.mendeley.com mendeley] - [http://dev.mendeley.com/ Mendeley API]&lt;br /&gt;
* [http://vivoweb.org/ vivo]&lt;br /&gt;
* [http://bibapp.org/ bibapp]&lt;br /&gt;
&lt;br /&gt;
Thoughts / Questions:&lt;br /&gt;
* Is there a way to do most/all of this via JavaScript/AJAX/jQuery?  Could it be a simple JavaScript framework you could &amp;quot;drop&amp;quot; into any metadata editing interface?&lt;br /&gt;
&lt;br /&gt;
== Code ==&lt;br /&gt;
&lt;br /&gt;
* [http://gitref.org/ quick reference for Git]&lt;br /&gt;
* [https://github.com/ryscher/hamr Ryan's really stupid scratch implementation]&lt;br /&gt;
&lt;br /&gt;
== Output Spec ==&lt;br /&gt;
&lt;br /&gt;
* We will use a simple XML output consisting of paired (and possibly unpaired) values.&lt;br /&gt;
* The root element will contain an attribute signifying the source of the authority metadata.&lt;br /&gt;
* The &amp;lt;match&amp;gt; element will be used to pair values, with a strength attribute to signify the string distance.&lt;br /&gt;
* Within each match element will be exactly 2 metadata elements with attributes signifying the source of each value: either the local input or the remote authority data.&lt;br /&gt;
* A &amp;lt;nonmatch&amp;gt; element will be used for unpaired values.&lt;br /&gt;
&lt;br /&gt;
=== Sample Output ===&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;lt;hamr authority=&amp;quot;PubMed&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;match strength=&amp;quot;100%&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;creator src=&amp;quot;input&amp;quot;&amp;gt;Trojan, Tommy&amp;lt;/creator&amp;gt;&lt;br /&gt;
        &amp;lt;creator src=&amp;quot;authority&amp;quot;&amp;gt;Trojan, Tommy&amp;lt;/creator&amp;gt;&lt;br /&gt;
    &amp;lt;/match&amp;gt;&lt;br /&gt;
    &amp;lt;match strength=&amp;quot;90%&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;title src=&amp;quot;input&amp;quot;&amp;gt;Great American Article&amp;lt;/title&amp;gt;&lt;br /&gt;
        &amp;lt;title src=&amp;quot;authority&amp;quot;&amp;gt;Great American Article, The&amp;lt;/title&amp;gt;&lt;br /&gt;
    &amp;lt;/match&amp;gt;&lt;br /&gt;
    &amp;lt;nonmatch&amp;gt;&lt;br /&gt;
        &amp;lt;subject src=&amp;quot;input&amp;quot;&amp;gt;Medical Stuff&amp;lt;/subject&amp;gt;&lt;br /&gt;
    &amp;lt;/nonmatch&amp;gt;&lt;br /&gt;
    &amp;lt;nonmatch&amp;gt;&lt;br /&gt;
        &amp;lt;type src=&amp;quot;authority&amp;quot;&amp;gt;text&amp;lt;/type&amp;gt;&lt;br /&gt;
    &amp;lt;/nonmatch&amp;gt;&lt;br /&gt;
&amp;lt;/hamr&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Need to do ==&lt;br /&gt;
&lt;br /&gt;
# Implement metadata retrieval from authority ''(done for crossref in ryan's code)''&lt;br /&gt;
# Design structure of plugins&lt;br /&gt;
# Design matching algorithm&lt;/div&gt;</summary>
		<author><name>Ryscher</name></author>	</entry>

	<entry>
		<id>https://wiki.code4lib.org/index.php?title=HAMR:_Human/Authority_Metadata_Reconciliation&amp;diff=7272</id>
		<title>HAMR: Human/Authority Metadata Reconciliation</title>
		<link rel="alternate" type="text/html" href="https://wiki.code4lib.org/index.php?title=HAMR:_Human/Authority_Metadata_Reconciliation&amp;diff=7272"/>
				<updated>2011-02-07T20:22:59Z</updated>
		
		<summary type="html">&lt;p&gt;Ryscher: /* Need to do */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[HAMR: Human/Authority Metadata Reconciliation]]&lt;br /&gt;
&lt;br /&gt;
Sean Chen, Tim Donohue, Joshua Gomez, Ranti Junus, Ryan Scherle&lt;br /&gt;
&lt;br /&gt;
A tool for a curator to determine whether the various fields of a metadata record are correct. Takes a metadata record, locates any identifiers (e.g., DOI, PMID). Retrieves a copy of the metadata record from an authoritative source (e.g., CrossRef, PubMed). Displays a human-readable page that compares fields in the initial record with fields in the authoritative record. Each field is color-coded based on how well it matches, so the curator can quickly identify discrepancies.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Narrowing the focus for today:&lt;br /&gt;
* Dublin core (maybe qualified)&lt;br /&gt;
* framework that allows multiple authority sources&lt;br /&gt;
* NOT focusing on author names ([http://www.orcid.org/ ORCID] is already working on this), except the fact that they are strings, and we'll do basic string matching&lt;br /&gt;
* 1 to 1 matching.  Even if you want to eventually match with multiple authorities, you'd only do one at a time&lt;br /&gt;
&lt;br /&gt;
Possible authority sources:&lt;br /&gt;
* PubMed&lt;br /&gt;
** Sample pubmed query (in Java): [https://wiki.duraspace.org/display/DSPACE/PubMedPrefill-PubmedPrefillStep.java DSpace PubMedPrefillStep.java] (From [https://wiki.duraspace.org/display/DSPACE/PopulateMetadataFromPubMed Populate Metadata from PubMed])&lt;br /&gt;
*** See 'retrievePubmedXML()' in above java code for actual call to PubMed&lt;br /&gt;
*** Mapping happens here: See [https://wiki.duraspace.org/display/DSPACE/PubMedPrefill-pmid+dim.xsl pmid-to-dim.xsl] for a sample XSLT crosswalk to translate PubMed format to a qualified dublin core (internal DSpace metadata format)&lt;br /&gt;
** More examples of querying PubMed: http://www.my-whiteboard.com/how-to-automate-pubmed-search-using-perl-php-or-java/&lt;br /&gt;
* CrossRef&lt;br /&gt;
** simply send the DOI to crossref, and get JSON/XML back&lt;br /&gt;
*** http://api.labs.crossref.org/10.1111/j.1558-5646.2009.00626.x.json&lt;br /&gt;
*** http://api.labs.crossref.org/10.2307/1935157.xml&lt;br /&gt;
*** [http://code.google.com/p/dryad/source/browse/trunk/dryad/dspace/modules/doi/dspace-doi-webapp/src/main/java/org/dspace/doi/DOIServlet.java java code that includes a lookup]&lt;br /&gt;
** [http://labs.crossref.org/site/crossref_metadata_search.html Metadata Search] -- send a text query, receive a list of matching records&lt;br /&gt;
** [http://labs.crossref.org/site/quick_and_dirty_api_guide.html OpenURL search]&lt;br /&gt;
* google scholar - does it have an API?&lt;br /&gt;
* [http://www.mendeley.com mendeley] - [http://dev.mendeley.com/ Mendeley API]&lt;br /&gt;
* [http://vivoweb.org/ vivo]&lt;br /&gt;
* [http://bibapp.org/ bibapp]&lt;br /&gt;
&lt;br /&gt;
Thoughts / Questions:&lt;br /&gt;
* Is there a way to do most/all of this via JavaScript/AJAX/jQuery?  Could it be a simple JavaScript framework you could &amp;quot;drop&amp;quot; into any metadata editing interface?&lt;br /&gt;
&lt;br /&gt;
== Code ==&lt;br /&gt;
&lt;br /&gt;
* [http://gitref.org/ quick reference for Git]&lt;br /&gt;
* [https://github.com/ryscher/hamr Ryan's really stupid scratch implementation]&lt;br /&gt;
&lt;br /&gt;
== Output Spec ==&lt;br /&gt;
&lt;br /&gt;
* We will use a simple XML output consisting of paired (and possibly unpaired) values.&lt;br /&gt;
* The root element will contain an attribute signifying the source of the authority metadata.&lt;br /&gt;
* The &amp;lt;match&amp;gt; element will be used to pair values, with a strength attribute to signify the string distance.&lt;br /&gt;
* Within each match element will be exactly 2 metadata elements with attributes signifying the source of each value: either the local input or the remote authority data.&lt;br /&gt;
* A &amp;lt;nonmatch&amp;gt; element will be used for unpaired values.&lt;br /&gt;
&lt;br /&gt;
=== Sample Output ===&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;lt;hamr authority=&amp;quot;PubMed&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;match strength=&amp;quot;100%&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;creator src=&amp;quot;input&amp;quot;&amp;gt;Trojan, Tommy&amp;lt;/creator&amp;gt;&lt;br /&gt;
        &amp;lt;creator src=&amp;quot;authority&amp;quot;&amp;gt;Trojan, Tommy&amp;lt;/creator&amp;gt;&lt;br /&gt;
    &amp;lt;/match&amp;gt;&lt;br /&gt;
    &amp;lt;match strength=&amp;quot;90%&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;title src=&amp;quot;input&amp;quot;&amp;gt;Great American Article&amp;lt;/title&amp;gt;&lt;br /&gt;
        &amp;lt;title src=&amp;quot;authority&amp;quot;&amp;gt;Great American Article, The&amp;lt;/title&amp;gt;&lt;br /&gt;
    &amp;lt;/match&amp;gt;&lt;br /&gt;
    &amp;lt;nonmatch&amp;gt;&lt;br /&gt;
        &amp;lt;subject src=&amp;quot;input&amp;quot;&amp;gt;Medical Stuff&amp;lt;/subject&amp;gt;&lt;br /&gt;
    &amp;lt;/nonmatch&amp;gt;&lt;br /&gt;
    &amp;lt;nonmatch&amp;gt;&lt;br /&gt;
        &amp;lt;type src=&amp;quot;authority&amp;quot;&amp;gt;text&amp;lt;/type&amp;gt;&lt;br /&gt;
    &amp;lt;/nonmatch&amp;gt;&lt;br /&gt;
&amp;lt;/hamr&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
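To show the format is easy to consume, here is a minimal Python sketch (standard xml.etree only; summarize_hamr is a hypothetical name) that walks a HAMR document built to mirror part of the sample above:&lt;br /&gt;

```python
import xml.etree.ElementTree as ET

def summarize_hamr(root):
    # Collect the authority name, the paired values with their
    # strengths, and any unpaired (nonmatch) values.
    matches = []
    for m in root.findall("match"):
        fields = {el.get("src"): el.text for el in m}
        matches.append((m.get("strength"), fields))
    nonmatches = [(el.tag, el.get("src"), el.text)
                  for nm in root.findall("nonmatch") for el in nm]
    return {"authority": root.get("authority"),
            "matches": matches, "nonmatches": nonmatches}

# Build the title match and subject nonmatch from the sample above.
root = ET.Element("hamr", authority="PubMed")
m = ET.SubElement(root, "match", strength="90%")
ET.SubElement(m, "title", src="input").text = "Great American Article"
ET.SubElement(m, "title", src="authority").text = "Great American Article, The"
nm = ET.SubElement(root, "nonmatch")
ET.SubElement(nm, "subject", src="input").text = "Medical Stuff"
```

A curator-facing display would then only need to color-code each match by its strength value.&lt;br /&gt;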
&lt;br /&gt;
== Need to do ==&lt;br /&gt;
&lt;br /&gt;
# Create basic code framework&lt;br /&gt;
# Implement metadata retrieval from authority ''(done for CrossRef in Ryan's code)''&lt;br /&gt;
# Design structure of plugins&lt;br /&gt;
# Design matching algorithm&lt;/div&gt;</summary>
		<author><name>Ryscher</name></author>	</entry>

	<entry>
		<id>https://wiki.code4lib.org/index.php?title=HAMR:_Human/Authority_Metadata_Reconciliation&amp;diff=7270</id>
		<title>HAMR: Human/Authority Metadata Reconciliation</title>
		<link rel="alternate" type="text/html" href="https://wiki.code4lib.org/index.php?title=HAMR:_Human/Authority_Metadata_Reconciliation&amp;diff=7270"/>
				<updated>2011-02-07T20:19:23Z</updated>
		
		<summary type="html">&lt;p&gt;Ryscher: /* Need to do */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[HAMR: Human/Authority Metadata Reconciliation]]&lt;br /&gt;
&lt;br /&gt;
Sean Chen, Tim Donohue, Joshua Gomez, Ranti Junus, Ryan Scherle&lt;br /&gt;
&lt;br /&gt;
A tool for a curator to determine whether the various fields of a metadata record are correct. Takes a metadata record, locates any identifiers (e.g., DOI, PMID). Retrieves a copy of the metadata record from an authoritative source (e.g., CrossRef, PubMed). Displays a human-readable page that compares fields in the initial record with fields in the authoritative record. Each field is color-coded based on how well it matches, so the curator can quickly identify discrepancies.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Narrowing the focus for today:&lt;br /&gt;
* Dublin Core (maybe qualified)&lt;br /&gt;
* a framework that allows multiple authority sources&lt;br /&gt;
* NOT focusing on author names ([http://www.orcid.org/ ORCID] is already working on this), except insofar as they are strings, and we'll do basic string matching&lt;br /&gt;
* 1-to-1 matching.  Even if you eventually want to match against multiple authorities, you'd only do one at a time&lt;br /&gt;
&lt;br /&gt;
Possible authority sources:&lt;br /&gt;
* PubMed&lt;br /&gt;
** Sample PubMed query (in Java): [https://wiki.duraspace.org/display/DSPACE/PubMedPrefill-PubmedPrefillStep.java DSpace PubMedPrefillStep.java] (from [https://wiki.duraspace.org/display/DSPACE/PopulateMetadataFromPubMed Populate Metadata from PubMed])&lt;br /&gt;
*** See 'retrievePubmedXML()' in the above Java code for the actual call to PubMed&lt;br /&gt;
*** Mapping happens here: see [https://wiki.duraspace.org/display/DSPACE/PubMedPrefill-pmid+dim.xsl pmid-to-dim.xsl] for a sample XSLT crosswalk that translates the PubMed format to qualified Dublin Core (DSpace's internal metadata format)&lt;br /&gt;
** More examples of querying PubMed: http://www.my-whiteboard.com/how-to-automate-pubmed-search-using-perl-php-or-java/&lt;br /&gt;
* CrossRef&lt;br /&gt;
** Simply send the DOI to CrossRef and get JSON/XML back&lt;br /&gt;
*** http://api.labs.crossref.org/10.1111/j.1558-5646.2009.00626.x.json&lt;br /&gt;
*** http://api.labs.crossref.org/10.2307/1935157.xml&lt;br /&gt;
*** [http://code.google.com/p/dryad/source/browse/trunk/dryad/dspace/modules/doi/dspace-doi-webapp/src/main/java/org/dspace/doi/DOIServlet.java java code that includes a lookup]&lt;br /&gt;
** [http://labs.crossref.org/site/crossref_metadata_search.html Metadata Search] -- send a text query, receive a list of matching records&lt;br /&gt;
** [http://labs.crossref.org/site/quick_and_dirty_api_guide.html OpenURL search]&lt;br /&gt;
* Google Scholar - does it have an API?&lt;br /&gt;
* [http://www.mendeley.com Mendeley] - [http://dev.mendeley.com/ Mendeley API]&lt;br /&gt;
* [http://vivoweb.org/ VIVO]&lt;br /&gt;
* [http://bibapp.org/ BibApp]&lt;br /&gt;
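The CrossRef route above is little more than URL construction; a minimal Python sketch (crossref_url and fetch_crossref are hypothetical names, and the live call assumes the labs endpoint listed above is reachable):&lt;br /&gt;

```python
import json
from urllib.request import urlopen

LABS_BASE = "http://api.labs.crossref.org/"

def crossref_url(doi, fmt="json"):
    # The labs API takes the bare DOI as the path plus a format
    # suffix, e.g. http://api.labs.crossref.org/10.2307/1935157.xml
    return f"{LABS_BASE}{doi}.{fmt}"

def fetch_crossref(doi):
    # Illustrative network call; returns the parsed JSON record.
    with urlopen(crossref_url(doi)) as resp:
        return json.load(resp)
```
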
&lt;br /&gt;
Thoughts / Questions:&lt;br /&gt;
* Is there a way to do most/all of this via JavaScript/AJAX/jQuery?  Could it be a simple JavaScript framework you could &amp;quot;drop&amp;quot; into any metadata editing interface?&lt;br /&gt;
&lt;br /&gt;
== Code ==&lt;br /&gt;
&lt;br /&gt;
* [http://gitref.org/ quick reference for Git]&lt;br /&gt;
* [https://github.com/ryscher/hamr Ryan's really stupid scratch implementation]&lt;br /&gt;
&lt;br /&gt;
== Output Spec ==&lt;br /&gt;
&lt;br /&gt;
* We will use a simple XML output consisting of paired (and possibly unpaired) values.&lt;br /&gt;
* The root element will contain an attribute signifying the source of the authority metadata.&lt;br /&gt;
* The &amp;lt;match&amp;gt; element will be used to pair values, with a strength attribute signifying the string similarity.&lt;br /&gt;
* Within each match element will be exactly two metadata elements, with attributes signifying the source of each value: either the local input or the remote authority data.&lt;br /&gt;
* A &amp;lt;nonmatch&amp;gt; element will be used for unpaired values.&lt;br /&gt;
&lt;br /&gt;
=== Sample Output ===&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;lt;hamr authority=&amp;quot;PubMed&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;match strength=&amp;quot;100%&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;creator src=&amp;quot;input&amp;quot;&amp;gt;Trojan, Tommy&amp;lt;/creator&amp;gt;&lt;br /&gt;
        &amp;lt;creator src=&amp;quot;authority&amp;quot;&amp;gt;Trojan, Tommy&amp;lt;/creator&amp;gt;&lt;br /&gt;
    &amp;lt;/match&amp;gt;&lt;br /&gt;
    &amp;lt;match strength=&amp;quot;90%&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;title src=&amp;quot;input&amp;quot;&amp;gt;Great American Article&amp;lt;/title&amp;gt;&lt;br /&gt;
        &amp;lt;title src=&amp;quot;authority&amp;quot;&amp;gt;Great American Article, The&amp;lt;/title&amp;gt;&lt;br /&gt;
    &amp;lt;/match&amp;gt;&lt;br /&gt;
    &amp;lt;nonmatch&amp;gt;&lt;br /&gt;
        &amp;lt;subject src=&amp;quot;input&amp;quot;&amp;gt;Medical Stuff&amp;lt;/subject&amp;gt;&lt;br /&gt;
    &amp;lt;/nonmatch&amp;gt;&lt;br /&gt;
    &amp;lt;nonmatch&amp;gt;&lt;br /&gt;
        &amp;lt;type src=&amp;quot;authority&amp;quot;&amp;gt;text&amp;lt;/type&amp;gt;&lt;br /&gt;
    &amp;lt;/nonmatch&amp;gt;&lt;br /&gt;
&amp;lt;/hamr&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Need to do ==&lt;br /&gt;
&lt;br /&gt;
# Create basic code framework&lt;br /&gt;
# Implement metadata retrieval from authority (done for CrossRef in Ryan's code)&lt;br /&gt;
# Design structure of plugins&lt;br /&gt;
# Design matching algorithm&lt;/div&gt;</summary>
		<author><name>Ryscher</name></author>	</entry>

	<entry>
		<id>https://wiki.code4lib.org/index.php?title=HAMR:_Human/Authority_Metadata_Reconciliation&amp;diff=7269</id>
		<title>HAMR: Human/Authority Metadata Reconciliation</title>
		<link rel="alternate" type="text/html" href="https://wiki.code4lib.org/index.php?title=HAMR:_Human/Authority_Metadata_Reconciliation&amp;diff=7269"/>
				<updated>2011-02-07T20:18:58Z</updated>
		
		<summary type="html">&lt;p&gt;Ryscher: /* Need to do */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[HAMR: Human/Authority Metadata Reconciliation]]&lt;br /&gt;
&lt;br /&gt;
Sean Chen, Tim Donohue, Joshua Gomez, Ranti Junus, Ryan Scherle&lt;br /&gt;
&lt;br /&gt;
A tool for a curator to determine whether the various fields of a metadata record are correct. Takes a metadata record, locates any identifiers (e.g., DOI, PMID). Retrieves a copy of the metadata record from an authoritative source (e.g., CrossRef, PubMed). Displays a human-readable page that compares fields in the initial record with fields in the authoritative record. Each field is color-coded based on how well it matches, so the curator can quickly identify discrepancies.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Narrowing the focus for today:&lt;br /&gt;
* Dublin Core (maybe qualified)&lt;br /&gt;
* a framework that allows multiple authority sources&lt;br /&gt;
* NOT focusing on author names ([http://www.orcid.org/ ORCID] is already working on this), except insofar as they are strings, and we'll do basic string matching&lt;br /&gt;
* 1-to-1 matching.  Even if you eventually want to match against multiple authorities, you'd only do one at a time&lt;br /&gt;
&lt;br /&gt;
Possible authority sources:&lt;br /&gt;
* PubMed&lt;br /&gt;
** Sample PubMed query (in Java): [https://wiki.duraspace.org/display/DSPACE/PubMedPrefill-PubmedPrefillStep.java DSpace PubMedPrefillStep.java] (from [https://wiki.duraspace.org/display/DSPACE/PopulateMetadataFromPubMed Populate Metadata from PubMed])&lt;br /&gt;
*** See 'retrievePubmedXML()' in the above Java code for the actual call to PubMed&lt;br /&gt;
*** Mapping happens here: see [https://wiki.duraspace.org/display/DSPACE/PubMedPrefill-pmid+dim.xsl pmid-to-dim.xsl] for a sample XSLT crosswalk that translates the PubMed format to qualified Dublin Core (DSpace's internal metadata format)&lt;br /&gt;
** More examples of querying PubMed: http://www.my-whiteboard.com/how-to-automate-pubmed-search-using-perl-php-or-java/&lt;br /&gt;
* CrossRef&lt;br /&gt;
** Simply send the DOI to CrossRef and get JSON/XML back&lt;br /&gt;
*** http://api.labs.crossref.org/10.1111/j.1558-5646.2009.00626.x.json&lt;br /&gt;
*** http://api.labs.crossref.org/10.2307/1935157.xml&lt;br /&gt;
*** [http://code.google.com/p/dryad/source/browse/trunk/dryad/dspace/modules/doi/dspace-doi-webapp/src/main/java/org/dspace/doi/DOIServlet.java java code that includes a lookup]&lt;br /&gt;
** [http://labs.crossref.org/site/crossref_metadata_search.html Metadata Search] -- send a text query, receive a list of matching records&lt;br /&gt;
** [http://labs.crossref.org/site/quick_and_dirty_api_guide.html OpenURL search]&lt;br /&gt;
* Google Scholar - does it have an API?&lt;br /&gt;
* [http://www.mendeley.com Mendeley] - [http://dev.mendeley.com/ Mendeley API]&lt;br /&gt;
* [http://vivoweb.org/ VIVO]&lt;br /&gt;
* [http://bibapp.org/ BibApp]&lt;br /&gt;
&lt;br /&gt;
Thoughts / Questions:&lt;br /&gt;
* Is there a way to do most/all of this via JavaScript/AJAX/jQuery?  Could it be a simple JavaScript framework you could &amp;quot;drop&amp;quot; into any metadata editing interface?&lt;br /&gt;
&lt;br /&gt;
== Code ==&lt;br /&gt;
&lt;br /&gt;
* [http://gitref.org/ quick reference for Git]&lt;br /&gt;
* [https://github.com/ryscher/hamr Ryan's really stupid scratch implementation]&lt;br /&gt;
&lt;br /&gt;
== Output Spec ==&lt;br /&gt;
&lt;br /&gt;
* We will use a simple XML output consisting of paired (and possibly unpaired) values.&lt;br /&gt;
* The root element will contain an attribute signifying the source of the authority metadata.&lt;br /&gt;
* The &amp;lt;match&amp;gt; element will be used to pair values, with a strength attribute signifying the string similarity.&lt;br /&gt;
* Within each match element will be exactly two metadata elements, with attributes signifying the source of each value: either the local input or the remote authority data.&lt;br /&gt;
* A &amp;lt;nonmatch&amp;gt; element will be used for unpaired values.&lt;br /&gt;
&lt;br /&gt;
=== Sample Output ===&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;lt;hamr authority=&amp;quot;PubMed&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;match strength=&amp;quot;100%&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;creator src=&amp;quot;input&amp;quot;&amp;gt;Trojan, Tommy&amp;lt;/creator&amp;gt;&lt;br /&gt;
        &amp;lt;creator src=&amp;quot;authority&amp;quot;&amp;gt;Trojan, Tommy&amp;lt;/creator&amp;gt;&lt;br /&gt;
    &amp;lt;/match&amp;gt;&lt;br /&gt;
    &amp;lt;match strength=&amp;quot;90%&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;title src=&amp;quot;input&amp;quot;&amp;gt;Great American Article&amp;lt;/title&amp;gt;&lt;br /&gt;
        &amp;lt;title src=&amp;quot;authority&amp;quot;&amp;gt;Great American Article, The&amp;lt;/title&amp;gt;&lt;br /&gt;
    &amp;lt;/match&amp;gt;&lt;br /&gt;
    &amp;lt;nonmatch&amp;gt;&lt;br /&gt;
        &amp;lt;subject src=&amp;quot;input&amp;quot;&amp;gt;Medical Stuff&amp;lt;/subject&amp;gt;&lt;br /&gt;
    &amp;lt;/nonmatch&amp;gt;&lt;br /&gt;
    &amp;lt;nonmatch&amp;gt;&lt;br /&gt;
        &amp;lt;type src=&amp;quot;authority&amp;quot;&amp;gt;text&amp;lt;/type&amp;gt;&lt;br /&gt;
    &amp;lt;/nonmatch&amp;gt;&lt;br /&gt;
&amp;lt;/hamr&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Need to do ==&lt;br /&gt;
&lt;br /&gt;
# Create basic code framework&lt;br /&gt;
# Implement metadata retrieval from authority&lt;br /&gt;
# Design structure of plugins&lt;br /&gt;
# Design matching algorithm&lt;/div&gt;</summary>
		<author><name>Ryscher</name></author>	</entry>

	<entry>
		<id>https://wiki.code4lib.org/index.php?title=HAMR:_Human/Authority_Metadata_Reconciliation&amp;diff=7251</id>
		<title>HAMR: Human/Authority Metadata Reconciliation</title>
		<link rel="alternate" type="text/html" href="https://wiki.code4lib.org/index.php?title=HAMR:_Human/Authority_Metadata_Reconciliation&amp;diff=7251"/>
				<updated>2011-02-07T19:11:35Z</updated>
		
		<summary type="html">&lt;p&gt;Ryscher: /* Code */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[HAMR: Human/Authority Metadata Reconciliation]]&lt;br /&gt;
&lt;br /&gt;
Sean Chen, Tim Donohue, Joshua Gomez, Ranti Junus, Ryan Scherle&lt;br /&gt;
&lt;br /&gt;
A tool for a curator to determine whether the various fields of a metadata record are correct. Takes a metadata record, locates any identifiers (e.g., DOI, PMID). Retrieves a copy of the metadata record from an authoritative source (e.g., CrossRef, PubMed). Displays a human-readable page that compares fields in the initial record with fields in the authoritative record. Each field is color-coded based on how well it matches, so the curator can quickly identify discrepancies.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Narrowing the focus for today:&lt;br /&gt;
* Dublin Core (maybe qualified)&lt;br /&gt;
* a framework that allows multiple authority sources&lt;br /&gt;
* NOT focusing on author names ([http://www.orcid.org/ ORCID] is already working on this), except insofar as they are strings, and we'll do basic string matching&lt;br /&gt;
&lt;br /&gt;
Possible authority sources:&lt;br /&gt;
* PubMed&lt;br /&gt;
** Sample PubMed query (in Java): [https://wiki.duraspace.org/display/DSPACE/PubMedPrefill-PubmedPrefillStep.java DSpace PubMedPrefillStep.java] (from [https://wiki.duraspace.org/display/DSPACE/PopulateMetadataFromPubMed Populate Metadata from PubMed])&lt;br /&gt;
*** See 'retrievePubmedXML()' in the above Java code for the actual call to PubMed&lt;br /&gt;
*** Mapping happens here: see [https://wiki.duraspace.org/display/DSPACE/PubMedPrefill-pmid+dim.xsl pmid-to-dim.xsl] for a sample XSLT crosswalk that translates the PubMed format to qualified Dublin Core (DSpace's internal metadata format)&lt;br /&gt;
** More examples of querying PubMed: http://www.my-whiteboard.com/how-to-automate-pubmed-search-using-perl-php-or-java/&lt;br /&gt;
* CrossRef&lt;br /&gt;
** Simply send the DOI to CrossRef and get JSON/XML back&lt;br /&gt;
*** http://api.labs.crossref.org/10.1111/j.1558-5646.2009.00626.x.json&lt;br /&gt;
*** http://api.labs.crossref.org/10.2307/1935157.xml&lt;br /&gt;
*** [http://code.google.com/p/dryad/source/browse/trunk/dryad/dspace/modules/doi/dspace-doi-webapp/src/main/java/org/dspace/doi/DOIServlet.java java code that includes a lookup]&lt;br /&gt;
** [http://labs.crossref.org/site/crossref_metadata_search.html Metadata Search] -- send a text query, receive a list of matching records&lt;br /&gt;
** [http://labs.crossref.org/site/quick_and_dirty_api_guide.html OpenURL search]&lt;br /&gt;
* Google Scholar - does it have an API?&lt;br /&gt;
* [http://www.mendeley.com Mendeley] - [http://dev.mendeley.com/ Mendeley API]&lt;br /&gt;
* [http://vivoweb.org/ VIVO]&lt;br /&gt;
* [http://bibapp.org/ BibApp]&lt;br /&gt;
&lt;br /&gt;
Thoughts / Questions:&lt;br /&gt;
* Is there a way to do most/all of this via JavaScript/AJAX/jQuery?  Could it be a simple JavaScript framework you could &amp;quot;drop&amp;quot; into any metadata editing interface?&lt;br /&gt;
&lt;br /&gt;
== Code ==&lt;br /&gt;
&lt;br /&gt;
* [http://gitref.org/ quick reference for Git]&lt;br /&gt;
* [https://github.com/ryscher/hamr Ryan's really stupid scratch implementation]&lt;br /&gt;
&lt;br /&gt;
== Need to do ==&lt;br /&gt;
&lt;br /&gt;
# Create basic code framework&lt;br /&gt;
# Implement metadata retrieval from authority&lt;br /&gt;
# Design matching algorithm&lt;/div&gt;</summary>
		<author><name>Ryscher</name></author>	</entry>

	<entry>
		<id>https://wiki.code4lib.org/index.php?title=HAMR:_Human/Authority_Metadata_Reconciliation&amp;diff=7231</id>
		<title>HAMR: Human/Authority Metadata Reconciliation</title>
		<link rel="alternate" type="text/html" href="https://wiki.code4lib.org/index.php?title=HAMR:_Human/Authority_Metadata_Reconciliation&amp;diff=7231"/>
				<updated>2011-02-07T18:41:11Z</updated>
		
		<summary type="html">&lt;p&gt;Ryscher: /* Code */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[HAMR: Human/Authority Metadata Reconciliation]]&lt;br /&gt;
&lt;br /&gt;
Sean Chen, Tim Donohue, Joshua Gomez, Ranti Junus, Ryan Scherle&lt;br /&gt;
&lt;br /&gt;
A tool for a curator to determine whether the various fields of a metadata record are correct. Takes a metadata record, locates any identifiers (e.g., DOI, PMID). Retrieves a copy of the metadata record from an authoritative source (e.g., CrossRef, PubMed). Displays a human-readable page that compares fields in the initial record with fields in the authoritative record. Each field is color-coded based on how well it matches, so the curator can quickly identify discrepancies.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Narrowing the focus for today:&lt;br /&gt;
* Dublin Core (maybe qualified)&lt;br /&gt;
* a framework that allows multiple authority sources&lt;br /&gt;
* NOT focusing on author names ([http://www.orcid.org/ ORCID] is already working on this), except insofar as they are strings, and we'll do basic string matching&lt;br /&gt;
&lt;br /&gt;
Possible authority sources:&lt;br /&gt;
* PubMed&lt;br /&gt;
** Sample PubMed query (in Java): [https://wiki.duraspace.org/display/DSPACE/PubMedPrefill-PubmedPrefillStep.java DSpace PubMedPrefillStep.java] (from [https://wiki.duraspace.org/display/DSPACE/PopulateMetadataFromPubMed Populate Metadata from PubMed])&lt;br /&gt;
*** See 'retrievePubmedXML()' in the above Java code for the actual call to PubMed&lt;br /&gt;
*** Mapping happens here: see [https://wiki.duraspace.org/display/DSPACE/PubMedPrefill-pmid+dim.xsl pmid-to-dim.xsl] for a sample XSLT crosswalk that translates the PubMed format to qualified Dublin Core (DSpace's internal metadata format)&lt;br /&gt;
** More examples of querying PubMed: http://www.my-whiteboard.com/how-to-automate-pubmed-search-using-perl-php-or-java/&lt;br /&gt;
* CrossRef&lt;br /&gt;
** Simply send the DOI to CrossRef and get JSON/XML back&lt;br /&gt;
*** http://api.labs.crossref.org/10.1111/j.1558-5646.2009.00626.x.json&lt;br /&gt;
*** http://api.labs.crossref.org/10.2307/1935157.xml&lt;br /&gt;
*** [http://code.google.com/p/dryad/source/browse/trunk/dryad/dspace/modules/doi/dspace-doi-webapp/src/main/java/org/dspace/doi/DOIServlet.java java code that includes a lookup]&lt;br /&gt;
** [http://labs.crossref.org/site/crossref_metadata_search.html Metadata Search] -- send a text query, receive a list of matching records&lt;br /&gt;
** [http://labs.crossref.org/site/quick_and_dirty_api_guide.html OpenURL search]&lt;br /&gt;
* Google Scholar - does it have an API?&lt;br /&gt;
* [http://www.mendeley.com Mendeley] - [http://dev.mendeley.com/ Mendeley API]&lt;br /&gt;
* [http://vivoweb.org/ VIVO]&lt;br /&gt;
* [http://bibapp.org/ BibApp]&lt;br /&gt;
&lt;br /&gt;
Thoughts / Questions:&lt;br /&gt;
* Is there a way to do most/all of this via JavaScript/AJAX/jQuery?  Could it be a simple JavaScript framework you could &amp;quot;drop&amp;quot; into any metadata editing interface?&lt;br /&gt;
&lt;br /&gt;
== Code ==&lt;br /&gt;
&lt;br /&gt;
* [http://gitref.org/ quick reference for Git]&lt;br /&gt;
* [https://github.com/ryscher/hamr Ryan's really stupid scratch implementation]&lt;/div&gt;</summary>
		<author><name>Ryscher</name></author>	</entry>

	<entry>
		<id>https://wiki.code4lib.org/index.php?title=HAMR:_Human/Authority_Metadata_Reconciliation&amp;diff=7230</id>
		<title>HAMR: Human/Authority Metadata Reconciliation</title>
		<link rel="alternate" type="text/html" href="https://wiki.code4lib.org/index.php?title=HAMR:_Human/Authority_Metadata_Reconciliation&amp;diff=7230"/>
				<updated>2011-02-07T18:40:01Z</updated>
		
		<summary type="html">&lt;p&gt;Ryscher: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[HAMR: Human/Authority Metadata Reconciliation]]&lt;br /&gt;
&lt;br /&gt;
Sean Chen, Tim Donohue, Joshua Gomez, Ranti Junus, Ryan Scherle&lt;br /&gt;
&lt;br /&gt;
A tool for a curator to determine whether the various fields of a metadata record are correct. Takes a metadata record, locates any identifiers (e.g., DOI, PMID). Retrieves a copy of the metadata record from an authoritative source (e.g., CrossRef, PubMed). Displays a human-readable page that compares fields in the initial record with fields in the authoritative record. Each field is color-coded based on how well it matches, so the curator can quickly identify discrepancies.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Narrowing the focus for today:&lt;br /&gt;
* Dublin Core (maybe qualified)&lt;br /&gt;
* a framework that allows multiple authority sources&lt;br /&gt;
* NOT focusing on author names ([http://www.orcid.org/ ORCID] is already working on this), except insofar as they are strings, and we'll do basic string matching&lt;br /&gt;
&lt;br /&gt;
Possible authority sources:&lt;br /&gt;
* PubMed&lt;br /&gt;
** Sample PubMed query (in Java): [https://wiki.duraspace.org/display/DSPACE/PubMedPrefill-PubmedPrefillStep.java DSpace PubMedPrefillStep.java] (from [https://wiki.duraspace.org/display/DSPACE/PopulateMetadataFromPubMed Populate Metadata from PubMed])&lt;br /&gt;
*** See 'retrievePubmedXML()' in the above Java code for the actual call to PubMed&lt;br /&gt;
*** Mapping happens here: see [https://wiki.duraspace.org/display/DSPACE/PubMedPrefill-pmid+dim.xsl pmid-to-dim.xsl] for a sample XSLT crosswalk that translates the PubMed format to qualified Dublin Core (DSpace's internal metadata format)&lt;br /&gt;
** More examples of querying PubMed: http://www.my-whiteboard.com/how-to-automate-pubmed-search-using-perl-php-or-java/&lt;br /&gt;
* CrossRef&lt;br /&gt;
** Simply send the DOI to CrossRef and get JSON/XML back&lt;br /&gt;
*** http://api.labs.crossref.org/10.1111/j.1558-5646.2009.00626.x.json&lt;br /&gt;
*** http://api.labs.crossref.org/10.2307/1935157.xml&lt;br /&gt;
*** [http://code.google.com/p/dryad/source/browse/trunk/dryad/dspace/modules/doi/dspace-doi-webapp/src/main/java/org/dspace/doi/DOIServlet.java java code that includes a lookup]&lt;br /&gt;
** [http://labs.crossref.org/site/crossref_metadata_search.html Metadata Search] -- send a text query, receive a list of matching records&lt;br /&gt;
** [http://labs.crossref.org/site/quick_and_dirty_api_guide.html OpenURL search]&lt;br /&gt;
* Google Scholar - does it have an API?&lt;br /&gt;
* [http://www.mendeley.com Mendeley] - [http://dev.mendeley.com/ Mendeley API]&lt;br /&gt;
* [http://vivoweb.org/ VIVO]&lt;br /&gt;
* [http://bibapp.org/ BibApp]&lt;br /&gt;
&lt;br /&gt;
Thoughts / Questions:&lt;br /&gt;
* Is there a way to do most/all of this via JavaScript/AJAX/jQuery?  Could it be a simple JavaScript framework you could &amp;quot;drop&amp;quot; into any metadata editing interface?&lt;br /&gt;
&lt;br /&gt;
== Code ==&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/ryscher/hamr Ryan's really stupid scratch implementation]&lt;/div&gt;</summary>
		<author><name>Ryscher</name></author>	</entry>

	<entry>
		<id>https://wiki.code4lib.org/index.php?title=HAMR:_Human/Authority_Metadata_Reconciliation&amp;diff=7217</id>
		<title>HAMR: Human/Authority Metadata Reconciliation</title>
		<link rel="alternate" type="text/html" href="https://wiki.code4lib.org/index.php?title=HAMR:_Human/Authority_Metadata_Reconciliation&amp;diff=7217"/>
				<updated>2011-02-07T16:12:41Z</updated>
		
		<summary type="html">&lt;p&gt;Ryscher: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[HAMR: Human/Authority Metadata Reconciliation]]&lt;br /&gt;
&lt;br /&gt;
Sean Chen, Tim Donohue, Joshua Gomez, Ranti Junus, Ryan Scherle&lt;br /&gt;
&lt;br /&gt;
A tool for a curator to determine whether the various fields of a metadata record are correct. Takes a metadata record, locates any identifiers (e.g., DOI, PMID). Retrieves a copy of the metadata record from an authoritative source (e.g., CrossRef, PubMed). Displays a human-readable page that compares fields in the initial record with fields in the authoritative record. Each field is color-coded based on how well it matches, so the curator can quickly identify discrepancies.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Narrowing the focus for today:&lt;br /&gt;
* Dublin Core (maybe qualified)&lt;br /&gt;
* a framework that allows multiple authority sources&lt;br /&gt;
* NOT focusing on author names ([http://www.orcid.org/ ORCID] is already working on this), except insofar as they are strings, and we'll do basic string matching&lt;br /&gt;
&lt;br /&gt;
Possible authority sources:&lt;br /&gt;
* PubMed&lt;br /&gt;
** Sample PubMed query (in Java): [https://wiki.duraspace.org/display/DSPACE/PubMedPrefill-PubmedPrefillStep.java DSpace PubMedPrefillStep.java] (from [https://wiki.duraspace.org/display/DSPACE/PopulateMetadataFromPubMed Populate Metadata from PubMed])&lt;br /&gt;
*** See 'retrievePubmedXML()' in the above Java code for the actual call to PubMed&lt;br /&gt;
*** Mapping happens here: see [https://wiki.duraspace.org/display/DSPACE/PubMedPrefill-pmid+dim.xsl pmid-to-dim.xsl] for a sample XSLT crosswalk that translates the PubMed format to qualified Dublin Core (DSpace's internal metadata format)&lt;br /&gt;
** More examples of querying PubMed: http://www.my-whiteboard.com/how-to-automate-pubmed-search-using-perl-php-or-java/&lt;br /&gt;
* CrossRef&lt;br /&gt;
** Simply send the DOI to CrossRef and get JSON/XML back&lt;br /&gt;
*** http://api.labs.crossref.org/10.1111/j.1558-5646.2009.00626.x.json&lt;br /&gt;
*** http://api.labs.crossref.org/10.2307/1935157.xml&lt;br /&gt;
*** [http://code.google.com/p/dryad/source/browse/trunk/dryad/dspace/modules/doi/dspace-doi-webapp/src/main/java/org/dspace/doi/DOIServlet.java java code that includes a lookup]&lt;br /&gt;
** [http://labs.crossref.org/site/crossref_metadata_search.html Metadata Search] -- send a text query, receive a list of matching records&lt;br /&gt;
** [http://labs.crossref.org/site/quick_and_dirty_api_guide.html OpenURL search]&lt;br /&gt;
* Google Scholar - does it have an API?&lt;br /&gt;
* [http://www.mendeley.com Mendeley] - [http://dev.mendeley.com/ Mendeley API]&lt;br /&gt;
* [http://vivoweb.org/ VIVO]&lt;br /&gt;
* [http://bibapp.org/ BibApp]&lt;br /&gt;
&lt;br /&gt;
Thoughts / Questions:&lt;br /&gt;
* Is there a way to do most/all of this via JavaScript/AJAX/jQuery?  Could it be a simple JavaScript framework you could &amp;quot;drop&amp;quot; into any metadata editing interface?&lt;/div&gt;</summary>
		<author><name>Ryscher</name></author>	</entry>

	<entry>
		<id>https://wiki.code4lib.org/index.php?title=HAMR:_Human/Authority_Metadata_Reconciliation&amp;diff=7216</id>
		<title>HAMR: Human/Authority Metadata Reconciliation</title>
		<link rel="alternate" type="text/html" href="https://wiki.code4lib.org/index.php?title=HAMR:_Human/Authority_Metadata_Reconciliation&amp;diff=7216"/>
				<updated>2011-02-07T16:11:59Z</updated>
		
		<summary type="html">&lt;p&gt;Ryscher: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[HAMR: Human/Authority Metadata Reconciliation]]&lt;br /&gt;
&lt;br /&gt;
Sean Chen, Tim Donohue, Joshua Gomez, Ranti Junus, Ryan Scherle&lt;br /&gt;
&lt;br /&gt;
A tool for a curator to determine whether the various fields of a metadata record are correct. Takes a metadata record, locates any identifiers (e.g., DOI, PMID). Retrieves a copy of the metadata record from an authoritative source (e.g., CrossRef, PubMed). Displays a human-readable page that compares fields in the initial record with fields in the authoritative record. Each field is color-coded based on how well it matches, so the curator can quickly identify discrepancies.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Focus:&lt;br /&gt;
* Dublin Core (maybe qualified)&lt;br /&gt;
* a framework that allows multiple authority sources&lt;br /&gt;
* NOT focusing on author names ([http://www.orcid.org/ ORCID] is already working on this), except insofar as they are strings, and we'll do basic string matching&lt;br /&gt;
&lt;br /&gt;
Possible authority sources:&lt;br /&gt;
* PubMed&lt;br /&gt;
** Sample PubMed query (in Java): [https://wiki.duraspace.org/display/DSPACE/PubMedPrefill-PubmedPrefillStep.java DSpace PubMedPrefillStep.java] (from [https://wiki.duraspace.org/display/DSPACE/PopulateMetadataFromPubMed Populate Metadata from PubMed])&lt;br /&gt;
*** See 'retrievePubmedXML()' in the above Java code for the actual call to PubMed&lt;br /&gt;
*** Mapping happens here: see [https://wiki.duraspace.org/display/DSPACE/PubMedPrefill-pmid+dim.xsl pmid-to-dim.xsl] for a sample XSLT crosswalk that translates the PubMed format to qualified Dublin Core (DSpace's internal metadata format)&lt;br /&gt;
** More examples of querying PubMed: http://www.my-whiteboard.com/how-to-automate-pubmed-search-using-perl-php-or-java/&lt;br /&gt;
* CrossRef&lt;br /&gt;
** Simply send the DOI to CrossRef and get JSON/XML back&lt;br /&gt;
*** http://api.labs.crossref.org/10.1111/j.1558-5646.2009.00626.x.json&lt;br /&gt;
*** http://api.labs.crossref.org/10.2307/1935157.xml&lt;br /&gt;
*** [http://code.google.com/p/dryad/source/browse/trunk/dryad/dspace/modules/doi/dspace-doi-webapp/src/main/java/org/dspace/doi/DOIServlet.java java code that includes a lookup]&lt;br /&gt;
** [http://labs.crossref.org/site/crossref_metadata_search.html Metadata Search] -- send a text query, receive a list of matching records&lt;br /&gt;
** [http://labs.crossref.org/site/quick_and_dirty_api_guide.html OpenURL search]&lt;br /&gt;
* Google Scholar - does it have an API?&lt;br /&gt;
* [http://www.mendeley.com Mendeley] - [http://dev.mendeley.com/ Mendeley API]&lt;br /&gt;
* [http://vivoweb.org/ VIVO]&lt;br /&gt;
* [http://bibapp.org/ BibApp]&lt;br /&gt;
&lt;br /&gt;
Thoughts / Questions:&lt;br /&gt;
* Is there a way to do most/all of this via JavaScript/AJAX/jQuery?  Could it be a simple JavaScript framework you could &amp;quot;drop&amp;quot; into any metadata editing interface?&lt;/div&gt;</summary>
		<author><name>Ryscher</name></author>	</entry>

	<entry>
		<id>https://wiki.code4lib.org/index.php?title=HAMR:_Human/Authority_Metadata_Reconciliation&amp;diff=7212</id>
		<title>HAMR: Human/Authority Metadata Reconciliation</title>
		<link rel="alternate" type="text/html" href="https://wiki.code4lib.org/index.php?title=HAMR:_Human/Authority_Metadata_Reconciliation&amp;diff=7212"/>
				<updated>2011-02-07T15:59:50Z</updated>
		
		<summary type="html">&lt;p&gt;Ryscher: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[HAMR: Human/Authority Metadata Reconciliation]]&lt;br /&gt;
&lt;br /&gt;
Sean Chen, Tim Donohue, Joshua Gomez, Ranti Junus, Ryan Scherle&lt;br /&gt;
&lt;br /&gt;
A tool for a curator to determine whether the various fields of a metadata record are correct. Takes a metadata record, locates any identifiers (e.g., DOI, PMID). Retrieves a copy of the metadata record from an authoritative source (e.g., CrossRef, PubMed). Displays a human-readable page that compares fields in the initial record with fields in the authoritative record. Each field is color-coded based on how well it matches, so the curator can quickly identify discrepancies.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Focus:&lt;br /&gt;
* Dublin Core (maybe qualified)&lt;br /&gt;
* a framework that allows multiple authority sources&lt;br /&gt;
* NOT focusing on author names ([http://www.orcid.org/ ORCID] is already working on this), except insofar as they are strings, and we'll do basic string matching&lt;br /&gt;
&lt;br /&gt;
Possible authority sources:&lt;br /&gt;
* PubMed&lt;br /&gt;
** Sample PubMed query (in Java): [https://wiki.duraspace.org/display/DSPACE/PubMedPrefill-PubmedPrefillStep.java DSpace PubMedPrefillStep.java] (from [https://wiki.duraspace.org/display/DSPACE/PopulateMetadataFromPubMed Populate Metadata from PubMed])&lt;br /&gt;
*** See 'retrievePubmedXML()' in the above Java code for the actual call to PubMed&lt;br /&gt;
*** Mapping happens here: see [https://wiki.duraspace.org/display/DSPACE/PubMedPrefill-pmid+dim.xsl pmid-to-dim.xsl] for a sample XSLT crosswalk that translates the PubMed format to qualified Dublin Core (DSpace's internal metadata format)&lt;br /&gt;
** More examples of querying PubMed: http://www.my-whiteboard.com/how-to-automate-pubmed-search-using-perl-php-or-java/&lt;br /&gt;
* CrossRef&lt;br /&gt;
** Simply send the DOI to CrossRef and get JSON/XML back&lt;br /&gt;
*** http://api.labs.crossref.org/10.1111/j.1558-5646.2009.00626.x.json&lt;br /&gt;
*** http://api.labs.crossref.org/10.2307/1935157.xml&lt;br /&gt;
*** [http://code.google.com/p/dryad/source/browse/trunk/dryad/dspace/modules/doi/dspace-doi-webapp/src/main/java/org/dspace/doi/DOIServlet.java java code that includes a lookup]&lt;br /&gt;
** [http://labs.crossref.org/site/crossref_metadata_search.html Metadata Search] -- send a text query, receive a list of matching records&lt;br /&gt;
* Google Scholar - does it have an API?&lt;br /&gt;
* [http://www.mendeley.com Mendeley] - [http://dev.mendeley.com/ Mendeley API]&lt;br /&gt;
* [http://vivoweb.org/ VIVO]&lt;br /&gt;
* [http://bibapp.org/ BibApp]&lt;/div&gt;</summary>
		<author><name>Ryscher</name></author>	</entry>

	</feed>