Talk:2015 Prepared Talk Proposals

From Code4Lib

The Impossible Search: Pulling data from unknown sources

  • Riley Childs, no official affiliation (currently a Senior in High School at Charlotte United Christian Academy), rchilds (AT) cucawarriors.com

It's easy to search data whose structure you know, but what if you need to pull in data from sources that have no standard structure? Searching community events alongside your standard catalog results is one example. Often the only way to get these events is as XML, JSON, (insert structured format here), or even raw HTML. But how do you get at that structure? That simple question is what makes this search seem impossible. Defining and extracting the structure takes a lot of manual labor, especially when the source is plain HTML. Then, every time you add data to the index, you have to run it through a script that converts it into a format Solr or another index can use. This talk will focus on Solr, but the principles explained will apply to many other indexes.
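To make the "run it through a script" step concrete, here is a minimal sketch using only the Python standard library. It assumes two hypothetical sources: a JSON feed that is a list of {"name", "date"} objects, and an HTML page that marks each event as <div class="event"> containing an <h2> title and a <time datetime="..."> element. Both are mapped to the same flat dictionaries, which is the shape Solr's JSON update handler accepts as a list of documents. The field names (id, title, start_date, source) are illustrative assumptions, not a required Solr schema, and real pages will need their own per-source extraction rules.

```python
import json
from html.parser import HTMLParser


def events_from_json(text, source):
    """Map a JSON feed (assumed: a list of {"name", "date"} objects) to Solr-style docs."""
    docs = []
    for i, item in enumerate(json.loads(text)):
        docs.append({
            "id": f"{source}-{i}",        # hypothetical unique key: source name + position
            "title": item["name"],
            "start_date": item["date"],
            "source": source,
        })
    return docs


class EventHTMLParser(HTMLParser):
    """Collect events from hypothetical markup:
    <div class="event"><h2>Title</h2><time datetime="...">...</time></div>
    """

    def __init__(self):
        super().__init__()
        self.events = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "event":
            # Each event container starts a new record.
            self.events.append({"title": "", "date": None})
        elif tag == "h2" and self.events:
            self._in_title = True
        elif tag == "time" and self.events:
            # Prefer the machine-readable datetime attribute over the display text.
            self.events[-1]["date"] = attrs.get("datetime")

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.events[-1]["title"] += data.strip()


def events_from_html(text, source):
    """Scrape events from raw HTML and map them to the same doc shape as the JSON feed."""
    parser = EventHTMLParser()
    parser.feed(text)
    parser.close()
    return [
        {"id": f"{source}-{i}", "title": e["title"],
         "start_date": e["date"], "source": source}
        for i, e in enumerate(parser.events)
    ]


if __name__ == "__main__":
    json_feed = '[{"name": "Story Time", "date": "2015-02-10T10:00:00Z"}]'
    html_page = ('<div class="event"><h2>Book Club</h2>'
                 '<time datetime="2015-02-12T18:00:00Z">Feb 12</time></div>')
    docs = events_from_json(json_feed, "city") + events_from_html(html_page, "museum")
    # This JSON array could then be posted to a Solr core's update handler.
    print(json.dumps(docs, indent=2))
```

The point of the sketch is that each source gets its own small extractor, while everything downstream sees one uniform document shape. That is also where the manual labor described above concentrates: every new HTML layout means writing another parser like EventHTMLParser.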