2012 talks proposals

Deadline for talk submission was ''Sunday, November 20''. (The deadline for 2012 talks proposals is now closed.)
Prepared talks are 20 minutes (including setup and questions), and focus on one or more of the following areas:
== Beyond code: Versioning data with Git and Mercurial. ==
* Charlie Collett, California Digital Library, charlie.collett@ucop.edu
* Martin Haye, California Digital Library, martin.haye@ucop.edu
Mendeley has built the world's largest open database of research, and we've now begun to collect some interesting social metadata around the document metadata. I would like to share with Code4Lib attendees how this resource can be used to do things within your application that have previously been impossible for the library community, or in some cases impossible without expensive database subscriptions. One thing that's now possible is to augment catalog search by surfacing information about content usage, allowing people not only to find things matching a query, but also to find popular things or things read by their colleagues. In addition to augmenting search, you can use this information to augment discovery. Imagine an online exhibit of artifacts from a newly discovered dig linking not just to papers which discuss the artifacts, but to really good, interesting papers about the place and the people who made them. So the big idea is: "How will looking at the literature from a broader perspective than simple citation analysis change how research is done and communicated? How can we build tools that make this process easier and faster?" I can show some examples of applications built with the Mendeley and PLoS APIs to begin to address this question, and I can also present results from Mendeley's developer challenge, which show what kinds of applications researchers are looking for and what kinds people are building, and illustrate some interesting places where the two don't overlap.
Slides from my talk are here: http://db.tt/PMaqFoVw
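As a sketch of the search-augmentation idea above: readership data could be folded into catalog ranking as a score multiplier. The endpoint, parameters, and response fields below are hypothetical placeholders, not the actual Mendeley API:
<pre>
import requests

# Hypothetical endpoint and response shape -- the real Mendeley API
# routes, parameters, and field names may differ.
STATS_URL = "https://api.example.org/catalog/stats"

def readership_boost(doi):
    """Fetch a reader count for a DOI and turn it into a ranking multiplier."""
    resp = requests.get(STATS_URL, params={"doi": doi})
    resp.raise_for_status()
    readers = resp.json().get("reader_count", 0)
    # Dampen raw counts so popular papers help but don't dominate.
    return 1.0 + (readers ** 0.5) / 100.0

def rerank(catalog_hits):
    """Re-order catalog search hits by text score times readership boost."""
    return sorted(catalog_hits,
                  key=lambda hit: hit["score"] * readership_boost(hit["doi"]),
                  reverse=True)
</pre>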
== Your UI can make or break the application (to the user, anyway) ==
== Search Engine Relevancy Tuning - A Static Rank Framework for Solr/Lucene ==
* Mike Schultz, Amazon.com (formerly Summon Search Architect), mike.schultz@gmail.com
Solr/Lucene provides a lot of flexibility for adjusting relevancy scoring and improving search results. Roughly speaking, there are two areas of concern: first, a 'dynamic rank' calculation that is a function of the user query and document text fields; and second, a 'static rank' that is independent of the query and is generally a function of non-text document metadata. In this talk I will outline an easily understood, hand-tunable static rank system with a minimal number of parameters.
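As one illustration of the static/dynamic split (not the specific framework from the talk), Solr's edismax parser accepts a multiplicative boost function, so a query-independent signal stored in a document field can be folded into the text score. The core name and the citation_count field below are assumptions:
<pre>
import requests

SOLR_SELECT = "http://localhost:8983/solr/catalog/select"  # assumed core name

params = {
    "defType": "edismax",
    "q": "semantic web",
    "qf": "title^2 abstract",  # dynamic rank: query against text fields
    # Static rank: a query-independent function of document metadata.
    # A hypothetical citation_count field is log-scaled into [1, 3]
    # and multiplied into the text score.
    "boost": "scale(log(sum(field(citation_count),1)),1,3)",
    "wt": "json",
}

response = requests.get(SOLR_SELECT, params=params)
for doc in response.json()["response"]["docs"]:
    print(doc.get("title"))
</pre>
Because the static rank here is a single monotone function of one field, the scale bounds are the only knobs to hand-tune.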
== DMPTool: Guidance and resources to build a data management plan ==
* Marisa Strong, California Digital Library, marisa.strong@ucop.edu
== The Golden Road (To Unlimited Devotion): Building a Socially Constructed Archive of Grateful Dead Artifacts ==
* Robin Chandler, University of California (Santa Cruz), chandler [at] ucsc [dot] edu
This talk will discuss the challenges of merging a traditional archive with a socially constructed one. We will also present the first round of development and explain how we're using tools like Omeka, ContentDM, UC3 Merritt, djatoka, Kaltura, Google Maps, and Solr to lay the foundation for a robust and engaging site. Future directions, like the integration/development of better curation tools and what we hope to learn from opening the archive to contributions from a large community of fans, will also be discussed.
 
== Library News - A gathering place for library and tech news, and more ==
The existing body of Open Access scholarly research is a well-classified and well-described dataset. In institutional repositories, however, there are often insufficient resources for cataloging and maintaining rich metadata descriptions of contributed content, especially when collections are populated and maintained by non-librarians. A great deal of classifiable detail already exists within the files submitted to scholarly repositories. Using existing open-source technologies capable of extracting this information, submitters and repository maintainers can be offered suggested subject classifications and content types for descriptive metadata during submission and update of repository items. This talk will provide an overview of an approach that uses machine learning to auto-populate subject classifications and content types.
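A minimal sketch of the suggestion step, assuming full text has already been extracted from submitted files (e.g. with a tool like Apache Tika) and that existing repository records supply a labeled training set; the texts and subject labels below are illustrative:
<pre>
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Illustrative training data: extracted full text paired with subject
# classifications already assigned to existing repository items.
train_texts = [
    "gene expression in zebrafish embryos ...",
    "medieval manuscript digitization workflow ...",
    "monetary policy and inflation expectations ...",
]
train_subjects = ["Biology", "Library Science", "Economics"]

model = make_pipeline(TfidfVectorizer(stop_words="english"),
                      LogisticRegression(max_iter=1000))
model.fit(train_texts, train_subjects)

def suggest_subjects(extracted_text, top_n=3):
    """Return the top-N subject suggestions for a new submission."""
    probs = model.predict_proba([extracted_text])[0]
    ranked = sorted(zip(model.classes_, probs), key=lambda p: -p[1])
    return ranked[:top_n]

print(suggest_subjects("a study of fish larval development and gene knockouts"))
</pre>
In practice the suggestions would be shown to the submitter for confirmation rather than written directly into the record.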
 
== Mining Wikipedia for Book Articles ==
* Paul Deschner, Harvard Library Innovation Lab, deschner@law.harvard.edu
 
Suppose you were developing a browsing tool for library materials and wanted to include Wikipedia articles and categories whenever available -- how would you do it? There is no API or other data service that provides a comprehensive listing of every Wikipedia page devoted to the discussion of a book.
 
This talk will focus on the tools, workflows and data sources we have used to approach this problem. Tools and workflows include the use of Infobox ISBNs and other standard identifiers, analysis of Wikipedia categories and category hierarchies, exploitation of article abstracts and titles, and Mechanical Turk resources. Data sources include DBpedia triple stores and Wikimedia XML/SQL dumps. So far, we have harvested around 60,000 book articles. This is an exploration in dealing with open, relatively unstructured Web content, and in aggregating answers to the same question using quite diverse techniques.
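As a sketch of querying one of the data sources named above, the public DBpedia SPARQL endpoint can list articles whose infoboxes carry an ISBN; the dbo:isbn property is assumed here, and property names vary between DBpedia releases:
<pre>
import requests

DBPEDIA_SPARQL = "https://dbpedia.org/sparql"

# dbo:isbn is assumed; DBpedia releases differ in which properties they use.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?article ?isbn WHERE {
  ?article dbo:isbn ?isbn .
} LIMIT 25
"""

resp = requests.get(DBPEDIA_SPARQL,
                    params={"query": QUERY,
                            "format": "application/sparql-results+json"})
for row in resp.json()["results"]["bindings"]:
    print(row["article"]["value"], row["isbn"]["value"])
</pre>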
[[Category: Code4Lib2012]]
 
[[Category:Talk Proposals]]