2014 Prepared Talk Proposals
Contents
- 1 2014 Prepared Talk Proposals
- 2 Using Drupal to drive alternative presentation systems
- 3 A Book, a Web Browser and a Tablet: How Bibliotheca Alexandrina's Book Viewer Framework Makes It Possible
- 4 Structured data NOW: seeding schema.org in library systems
- 5 Towards Pasta Code Nirvana: Using JavaScript MVC to Fill Your Programming Ravioli
- 6 WebSockets for Real-Time and Interactive Interfaces
- 7 Rapid Development of Automated Tasks with the File Analyzer
- 8 GeoHydra: How to Build a Geospatial Digital Library with Fedora
- 9 Under the Hood of Hadoop Processing at OCLC Research
2014 Prepared Talk Proposals
Proposals for Prepared Talks:
Prepared talks are 20 minutes (including setup and questions), and should focus on one or more of the following areas:
- Projects you've worked on which incorporate innovative implementation of existing technologies and/or development of new software
- Tools and technologies – How to get the most out of existing tools, standards and protocols (and ideas on how to make them better)
- Technical issues - Big issues in library technology that should be addressed or better understood
- Relevant non-technical issues – Concerns of interest to the Code4Lib community which are not strictly technical in nature, e.g. collaboration, diversity, organizational challenges, etc.
To Propose a Talk
- Log in to the wiki in order to submit a proposal. If you are not already registered, follow the instructions to do so.
- Provide a title and brief (500 words or fewer) description of your proposed talk.
- If you so choose, you may also indicate when, if ever, you have presented at a prior Code4Lib conference. This information is completely optional, but it may assist us in opening the conference to new presenters.
As in past years, the Code4Lib community will vote on proposals that they would like to see included in the program. This year, however, only the top 10 proposals will be guaranteed a slot at the conference. Additional presentations will be selected by the Program Committee in an effort to ensure diversity in program content. Community votes will, of course, still weigh heavily in these decisions.
Presenters whose proposals are selected for inclusion in the program will be guaranteed an opportunity to register for the conference. The standard conference registration fee will still apply.
Proposals can be submitted through Friday, November 8, 2013, at 5 pm PST. Voting will commence on November 18, 2013, and continue through December 6, 2013. The final line-up of presentations will be announced in early January 2014.
Talk Proposals
Using Drupal to drive alternative presentation systems
- Cary Gordon, The Cherry Hill Company, cgordon@chillco.com
Recently, we have been building systems that use AngularJS, Rails, or other frameworks for presentation, while leveraging Drupal's sophisticated content management capabilities on the back end.
So far, these have been one-way systems, but as we move to Drupal 8 we are beginning to explore ways to further decouple the presentation and CMS functions.
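As a rough illustration of this decoupling (not Cherry Hill's actual code), the sketch below assumes Drupal 8's core REST module is enabled and exposes nodes at /node/{id}?_format=json; a front end written in any framework can then pull content out of Drupal and render it however it likes.

```typescript
// Minimal sketch: a decoupled front end pulling content from Drupal over REST.
// Assumes Drupal 8's core REST module exposes nodes at /node/{id}?_format=json;
// the field names below (title, body) follow Drupal's default field structure.
interface DrupalNode {
  title: { value: string }[];
  body: { value: string; format: string }[];
}

async function renderNode(id: number): Promise<void> {
  const res = await fetch(`https://cms.example.org/node/${id}?_format=json`);
  if (!res.ok) throw new Error(`Drupal returned ${res.status}`);
  const node: DrupalNode = await res.json();

  // Hand the content off to whatever presentation layer is in use
  // (AngularJS, a Rails-rendered page, a digital signage player, etc.).
  document.querySelector("#title")!.textContent = node.title[0].value;
  document.querySelector("#body")!.innerHTML = node.body[0].value;
}

renderNode(42).catch(console.error);
```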
A Book, a Web Browser and a Tablet: How Bibliotheca Alexandrina's Book Viewer Framework Makes It Possible
- Mohammed Abu ouda, Bibliotheca Alexandrina (The new Library of Alexandria)
Institutions around the world are engaged in digitization projects that aim to preserve the knowledge held in books and make it available through multiple channels to people around the globe. These efforts will surely help close the digital divide, particularly with the arrival of affordable e-readers, mobile phones, and broader network coverage. However, the digital reading experience has not yet reached its full potential. Many readers miss features they love in their good old paper books and wish to find them in their digital counterparts. In an attempt to create a unique digital reading experience, the Bibliotheca Alexandrina (BA) created a flexible book viewing framework that is used to access its collection of more than 300,000 digital books in five languages, including the largest collection of digitized Arabic books.
Using open source tools, BA used the framework to develop a modular book viewer that can be deployed in different environments and is currently at the heart of various BA projects. The book viewer provides several features that create a more natural reading experience. As with physical books, readers can personalize the books they read by adding annotations such as highlights, underlines, and sticky notes to capture their thoughts and ideas, and they can share books with friends on social networks. Readers can also search across the content of a book and see the results highlighted within its pages. More features can be added to the book viewer through its plugin architecture.
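The proposal does not publish the viewer's plugin API, so the following sketch is purely hypothetical: it only illustrates how a plugin architecture for viewer features such as highlighting might be organized.

```typescript
// Illustrative only: the BA book viewer's actual plugin API is not described in
// this proposal. The interface below is a hypothetical sketch of how viewer
// features (highlights, sticky notes, search) might be registered as plugins.
interface BookViewer {
  onPageRendered(handler: (pageNumber: number, pageEl: HTMLElement) => void): void;
}

interface ViewerPlugin {
  name: string;
  init(viewer: BookViewer): void;
}

// A toy plugin that captures text selections as highlights.
const highlightPlugin: ViewerPlugin = {
  name: "highlight",
  init(viewer) {
    viewer.onPageRendered((pageNumber, pageEl) => {
      pageEl.addEventListener("mouseup", () => {
        const selection = window.getSelection()?.toString();
        if (selection) {
          console.log(`Highlight on page ${pageNumber}: "${selection}"`);
        }
      });
    });
  },
};
```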
Structured data NOW: seeding schema.org in library systems
- Dan Scott, Laurentian University
- Previous Code4Lib presentations: CouchDB is sacrilege... mmm, delicious sacrilege at Code4Lib 2008
The semantic web, linked data, and structured data are all fantastic ideas held back by implementation constraints. No matter how enthusiastic a given library might be about publishing structured data, if its system does not allow customization, or the institution lacks skilled staff, it will not happen. However, if the software in use simply publishes structured data by default, then the web gets populated for free. Really! No extra resources necessary.
This presentation highlights Dan's work with systems such as Evergreen, Koha, and VuFind to enable the publication of schema.org structured data out of the box. Along the way, we reflect on the current state of the W3C Schema.org Bibliographic Extension community group's efforts to shape the evolution of the schema.org vocabulary. Finally, hold on tight as we contemplate next steps and the possibilities of a world where structured data is the norm on the web.
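To make "publishing structured data by default" concrete, here is a minimal sketch of schema.org data for a bibliographic record. It is expressed as JSON-LD for readability; it is not the exact markup Evergreen, Koha, or VuFind emit, which embed the same http://schema.org/Book vocabulary directly in their record display HTML.

```typescript
// A minimal, illustrative schema.org serialization of a catalogue record.
interface BibRecord {
  title: string;
  author: string;
  isbn: string;
}

function toSchemaOrg(record: BibRecord): string {
  return JSON.stringify({
    "@context": "http://schema.org",
    "@type": "Book",
    name: record.title,
    author: { "@type": "Person", name: record.author },
    isbn: record.isbn,
  }, null, 2);
}

// Embed in the record page as <script type="application/ld+json">…</script>
console.log(toSchemaOrg({
  title: "Open Source Library Systems",   // hypothetical example record
  author: "Example Author",
  isbn: "9780000000000",
}));
```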
Towards Pasta Code Nirvana: Using JavaScript MVC to Fill Your Programming Ravioli
- Bret Davidson, North Carolina State University Libraries, bret_davidson@ncsu.edu
- Previous Code4Lib Presentations: Visualizing library data with D3.js at Code4Lib 2013
JavaScript MVC frameworks are ushering in a golden age of robust and responsive web applications that take advantage of evergreen browsers, performant JS engines, and the unprecedented reach provided by billions of personal computing devices. The web browser has emerged as the world’s most popular application runtime, and the complexity[1] and scope of JavaScript applications have exploded accordingly. Server-side web frameworks like Rails and Django have helped developers adhere to best practices like modularity, dependency injection, and unit testing for years, practices that are now being applied to JavaScript development through projects like Backbone[2], Ember[3], and Angular[4].
This talk will discuss the issues JavaScript MVC frameworks are trying to solve, common features like data binding, implications for the future of web development[5], and the appropriateness of JavaScript MVC for library applications.
- [1] http://en.wikipedia.org/wiki/Spaghetti_code
- [2] http://backbonejs.org
- [3] http://emberjs.com
- [4] http://angularjs.org
- [5] http://tomdale.net/2013/09/progressive-enhancement-is-dead/
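Data binding, mentioned above as a common framework feature, is easiest to see in a dependency-free sketch: when the model changes, every view bound to it re-renders itself. The code below is illustrative only and uses none of the named frameworks' actual APIs.

```typescript
// A tiny observable model: views subscribe once and update automatically.
type Listener<T> = (value: T) => void;

class Observable<T> {
  private listeners: Listener<T>[] = [];
  constructor(private value: T) {}

  get(): T {
    return this.value;
  }

  set(value: T): void {
    this.value = value;
    this.listeners.forEach((fn) => fn(value)); // notify every bound view
  }

  subscribe(fn: Listener<T>): void {
    this.listeners.push(fn);
    fn(this.value); // render the initial state
  }
}

// Bind a model property to a DOM element.
const title = new Observable("Towards Pasta Code Nirvana");
title.subscribe((t) => {
  document.querySelector("#talk-title")!.textContent = t;
});

// Changing the model updates the view; no manual DOM bookkeeping needed.
title.set("Using JavaScript MVC to Fill Your Programming Ravioli");
```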
WebSockets for Real-Time and Interactive Interfaces
- Jason Ronallo, NCSU Libraries, jason_ronallo@ncsu.edu
Previous Code4Lib presentations:
Watching the Google Analytics Real-Time dashboard for the first time was mesmerizing. As soon as someone visited a site, I could see what page they were on. For a digital collections site with a lot of images, it was fun to see what visitors were looking at. But getting from Google Analytics to the image or other content currently being viewed was cumbersome. The real-time experience was something I wanted to share with others. I'll show you how I used a WebSocket service to create a real-time interface to digital collections.
In the Hunt Library at NCSU we have some large video walls. I wanted to make HTML-based exhibits that featured viewer interactions. I'll show you how I converted Listen to Wikipedia [1] into a bring-your-own-device interactive exhibit. With WebSockets, any HTML page can be remote controlled by any internet-connected device.
I will attempt to include real-time audience participation.
[1] http://listen.hatnote.com/
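A minimal sketch of the pattern described above: the exhibit page on the video wall and a visitor's phone both connect to the same WebSocket channel, so a tap on the phone drives the wall display. The endpoint and message shape here are invented for illustration, not the actual NCSU service.

```typescript
// Both the video wall and the visitor's device open the same channel.
const socket = new WebSocket("wss://exhibits.example.org/channel/listen-to-wikipedia");

// On the video wall: react to commands pushed from any connected device.
socket.addEventListener("message", (event) => {
  const msg = JSON.parse(event.data);
  if (msg.action === "toggle-language") {
    console.log(`Switching the exhibit to ${msg.language}`);
  }
});

// On a visitor's phone: the same page (or a stripped-down controller page)
// sends commands instead of listening for them.
function toggleLanguage(language: string): void {
  socket.send(JSON.stringify({ action: "toggle-language", language }));
}
```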
Rapid Development of Automated Tasks with the File Analyzer
- Terry Brady, Georgetown University Libraries, twb27@georgetown.edu
The Georgetown University Libraries have customized the File Analyzer and Metadata Harvester application (https://github.com/Georgetown-University-Libraries/File-Analyzer) to solve a number of library automation challenges:
- validating digitized and reformatted files
- validating vendor statistics for COUNTER compliance
- preparing collections of digital files for archiving and ingest
- manipulating ILS import and export files
The File Analyzer application was used by the US National Archives to validate 3.5 million digitized images from the 1940 Census. After implementing a customized ingest workflow within the File Analyzer, the Georgetown University Libraries were able to process an ingest backlog of over a thousand digital resource files into DigitalGeorgetown, the Libraries’ Digital Collections and Institutional Repository platform. Georgetown is currently developing customized workflows that integrate Apache Tika, BagIt, and MARC conversion utilities.
The File Analyzer is a desktop application with a powerful framework for implementing customized file validation and transformation rules. As new rules are deployed, they are presented to users within a user interface that is easy to use yet powerful.
Learn about the functionality that is available for download, how you can use this tool to automate workflows ranging from digital collections to ILS ingests to electronic resource statistics, and the opportunities to collaborate on enhancements to this application!
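The File Analyzer itself is a Java desktop application, and its rule API is not reproduced here; the sketch below only illustrates the general kind of rule such a tool runs, checking a directory of digitized files against a (hypothetical) checksum manifest and reporting which files fail.

```typescript
// Illustrative validation rule, not the File Analyzer API: compare each file
// in a directory against an expected MD5 checksum from a manifest.
import { createHash } from "node:crypto";
import { readFileSync, readdirSync } from "node:fs";
import { join } from "node:path";

interface ValidationResult {
  file: string;
  ok: boolean;
  detail: string;
}

function validateChecksums(dir: string, manifest: Record<string, string>): ValidationResult[] {
  return readdirSync(dir).map((name) => {
    const md5 = createHash("md5").update(readFileSync(join(dir, name))).digest("hex");
    const expected = manifest[name];
    return {
      file: name,
      ok: expected !== undefined && md5 === expected,
      detail: expected === undefined ? "not in manifest" : `md5 ${md5}`,
    };
  });
}

// e.g. validateChecksums("/data/census-scans", manifestLoadedFromCsv)
```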
GeoHydra: How to Build a Geospatial Digital Library with Fedora
- Darren Hardy, Stanford University, drh@stanford.edu
Geographically rich data are exploding in volume, and the prospect of integrating them into existing digital library infrastructures can be intimidating. Building a spatial data infrastructure that integrates with your digital library infrastructure need not be a daunting task. We have successfully deployed a geospatial digital library infrastructure using Fedora and open-source geospatial software [1]. We'll discuss the primary design decisions and technologies that led to a production deployment within a few months. Briefly, our architecture revolves around discovery, delivery, and metadata pipelines built with the open-source OpenGeoPortal [2], Solr [3], GeoServer [4], PostGIS [5], and GeoNetwork [6] technologies, plus the proprietary ESRI ArcMap [7], the GIS industry's workhorse. Finally, we'll discuss the key skill sets needed to build and maintain a spatial data infrastructure.
[1] http://foss4g.org
[2] http://opengeoportal.org
[3] http://solr.apache.org
[4] http://geoserver.org
[5] http://postgis.net
[6] http://geonetwork-opensource.org
[7] http://esri.com
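Discovery in an architecture like this typically means spatial search against Solr. The sketch below is illustrative only: the core name and field name ("geometry") are assumptions, though the Intersects/ENVELOPE filter is Solr's standard spatial query syntax for location fields.

```typescript
// Query a Solr index for records whose footprint intersects a bounding box.
async function searchByBoundingBox(minX: number, maxX: number, minY: number, maxY: number) {
  const params = new URLSearchParams({
    q: "*:*",
    // ENVELOPE takes minX, maxX, maxY, minY; "geometry" is an assumed field name.
    fq: `geometry:"Intersects(ENVELOPE(${minX}, ${maxX}, ${maxY}, ${minY}))"`,
    wt: "json",
  });
  const res = await fetch(`http://localhost:8983/solr/geoportal/select?${params}`);
  const data = await res.json();
  return data.response.docs;
}

// e.g. everything intersecting a box roughly around the San Francisco Bay Area
searchByBoundingBox(-123.0, -121.5, 37.0, 38.5).then(console.log);
```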
Under the Hood of Hadoop Processing at OCLC Research
- Roy Tennant, OCLC Research (http://roytennant.com/)
- Previous Code4Lib presentations: 2006: "The Case for Code4Lib 501c(3)"
Apache Hadoop is widely used by Yahoo!, Google, and many others to process massive amounts of data quickly. OCLC Research uses a 40-node compute cluster running Hadoop and HBase to process the 300 million MARC records of WorldCat in various ways. This presentation will explain how Hadoop MapReduce works and illustrate it with specific examples and code. The role of the JobTracker in both monitoring and reporting on jobs will be explained. String searching WorldCat will also be demonstrated live.
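OCLC's jobs are written against Hadoop's Java API; the dependency-free sketch below only illustrates the MapReduce pattern the talk will walk through, using the string-search example: a map step emits a key/value pair for each matching record, and a reduce step aggregates the matches per key. The record fields and sample data are invented for illustration.

```typescript
// Illustration of the MapReduce pattern (not Hadoop's actual API).
interface MarcRecord {
  oclcNumber: string;
  field245a: string; // title proper
}

// Map: emit (title, 1) for every record whose title contains the search term.
function map(record: MarcRecord, term: string): [string, number][] {
  return record.field245a.toLowerCase().includes(term.toLowerCase())
    ? [[record.field245a, 1]]
    : [];
}

// Shuffle/sort + reduce: Hadoop groups values by key across the cluster;
// here we fold everything in memory just to show the idea.
function reduce(pairs: [string, number][]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const [key, value] of pairs) {
    counts.set(key, (counts.get(key) ?? 0) + value);
  }
  return counts;
}

const records: MarcRecord[] = [
  { oclcNumber: "1", field245a: "Moby Dick" },
  { oclcNumber: "2", field245a: "Moby Dick, or The Whale" },
];
console.log(reduce(records.flatMap((r) => map(r, "moby dick"))));
```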