
2014 Prepared Talk Proposals

702 bytes added, 18:19, 31 October 2013
New proposal
== Under the Hood of Hadoop Processing at OCLC Research ==
Roy Tennant
* Previous Code4Lib presentations: 2006: "The Case for Code4Lib 501c(3)"
Apache Hadoop, the open-source implementation of the MapReduce model described in Google's papers, is widely used by Yahoo! and many others to process massive amounts of data quickly. OCLC Research uses a 40-node compute cluster running Hadoop and HBase to process the 300 million MARC records of WorldCat in various ways. This presentation will explain how Hadoop MapReduce works, illustrated with specific examples and code. The role of the JobTracker in monitoring and reporting on jobs will be explained, and string searching of WorldCat will be demonstrated live.
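The MapReduce model the talk covers can be sketched in a few lines of plain Python: a map phase emits (key, value) pairs, the framework shuffles pairs by key, and a reduce phase aggregates each key's values. This is a toy illustration of the concept, not the Hadoop API, and the record field used here is hypothetical rather than drawn from WorldCat's actual schema.

```python
from collections import defaultdict

def map_phase(records):
    # Map: emit a (key, value) pair per record.
    # Counting records by language code here is a hypothetical example job.
    for rec in records:
        yield rec["lang"], 1

def shuffle(pairs):
    # Shuffle: group all values emitted under the same key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's values into a single result.
    return {key: sum(values) for key, values in groups.items()}

records = [{"lang": "eng"}, {"lang": "fre"}, {"lang": "eng"}]
counts = reduce_phase(shuffle(map_phase(records)))
# counts == {"eng": 2, "fre": 1}
```

In Hadoop itself, the map and reduce functions run in parallel across the cluster's nodes, and the shuffle is handled by the framework between the two phases.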
