Difference between revisions of "2015 Prepared Talk Proposals"

From Code4Lib

Revision as of 16:38, 8 September 2014

Proposals for Prepared Talks:


Please follow the formatting guidelines:


== Talk Title: ==
 
* Speaker's name, affiliation, and email address
* Second speaker's name, affiliation, email address, if second speaker

Abstract of no more than 500 words.

Talk Proposals

The Impossible Search: Pulling data from unknown sources

  • Riley Childs, no official affiliation (currently a Senior in High School at Charlotte United Christian Academy), rchilds (AT) cucawarriors.com

It's easy to search data whose structure you know, but what if you need to pull in data from sources that don't share a standard structure? Searching community events alongside your standard catalog search results is one example. Often the only way to pull these events is through XML, JSON, some other structured format, or even just raw HTML. But how do you get that structure? That simple question is what makes this search seem impossible. Defining and processing the structure takes a lot of manual labor, especially if the data you are pulling is just HTML, and every time you add data to the index you have to run it all through a script to convert it into a format that Solr or another index can use. This talk will focus on Solr, but the principles explained will apply to many other indexes.
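
As a rough illustration of the conversion step the abstract describes, here is a minimal Python sketch that maps one event record from JSON, XML, or raw HTML onto a single flat dictionary of the kind an index could ingest. Everything specific in it is an assumption made for illustration: the function name `normalize_event`, the source field names (`name`, `when`, `title`, `date`), the CSS classes, and the output field names are hypothetical and are not taken from the talk.

```python
import json
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class _EventHTMLParser(HTMLParser):
    """Collects text from elements whose class is 'title' or 'date'.

    The class names are hypothetical; real pages need their own rules.
    """

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in ("title", "date"):
            self._current = css_class

    def handle_data(self, data):
        # Store the first non-blank text that follows a matching start tag.
        if self._current and data.strip():
            self.fields[self._current] = data.strip()
            self._current = None


def normalize_event(raw, fmt):
    """Map one raw event record onto a flat dict with common field names."""
    if fmt == "json":
        record = json.loads(raw)
        title, date = record.get("name"), record.get("when")
    elif fmt == "xml":
        root = ET.fromstring(raw)
        title, date = root.findtext("title"), root.findtext("date")
    elif fmt == "html":
        parser = _EventHTMLParser()
        parser.feed(raw)
        parser.close()
        title, date = parser.fields.get("title"), parser.fields.get("date")
    else:
        raise ValueError(f"unsupported format: {fmt}")
    # Output field names are assumed; they would have to match the index schema.
    return {"title": title, "start_date": date, "source_format": fmt}


if __name__ == "__main__":
    print(normalize_event('{"name": "Story Time", "when": "2015-02-10"}', "json"))
```

Each new source needs its own mapping branch, which is the manual labor the abstract points to. The resulting dictionaries could then be serialized and sent to an index such as Solr, with field names adjusted to the schema actually in use.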