Difference between revisions of "2015 Prepared Talk Proposals"
Revision as of 06:33, 23 September 2014
DRAFT ONLY
Code4Lib 2015 is a loosely-structured conference that provides people working at the intersection of libraries/archives/museums/cultural heritage and technology with a chance to share ideas, be inspired, and forge collaborations. For more information about the Code4Lib community, please visit http://code4lib.org/about/. The conference will be held at the Portland Hilton & Executive Tower in Portland, Oregon, from February 9-12, 2015.
Proposals for Prepared Talks:
We encourage everyone to propose a talk.
Prepared talks are 20 minutes (including setup and questions), and should focus on one or more of the following areas:
- Projects you've worked on which incorporate innovative implementation of existing technologies and/or development of new software
- Tools and technologies – How to get the most out of existing tools, standards and protocols (and ideas on how to make them better)
- Technical issues – Big issues in library technology that should be addressed or better understood
- Relevant non-technical issues – Concerns of interest to the Code4Lib community which are not strictly technical in nature, e.g. collaboration, diversity, organizational challenges, etc.
Proposals can be submitted through Friday, November 7, 2014 at 5pm PST (GMT−8). Voting will start on November 11, 2014 and continue through November 25, 2014. The URL to submit votes will be announced on the Code4Lib website and mailing list and will require an active code4lib.org account to participate. The final list of presentations will be announced in early- to mid-December.
Proposals for Prepared Talks:
Log in to the Code4Lib wiki and edit this wiki page using the prescribed format. If you are not already registered, follow the instructions to do so. Provide a title and brief (500 words or fewer) description of your proposed talk. If you so choose, you may also indicate when, if ever, you have presented at a prior Code4Lib conference. This information is completely optional, but it may assist voters in opening the conference to new presenters.
Please follow the formatting guidelines:
 == Talk Title: ==
 * Speaker's name, email address, and (optional) affiliation
 * Second speaker's name, email address, and affiliation, if second speaker
 Abstract of no more than 500 words.
Talk Proposals
The Impossible Search: Pulling data from unknown sources
- Riley Childs, no official affiliation (currently a Senior in High School at Charlotte United Christian Academy), rchilds (AT) cucawarriors.com
It's easy to search data whose structure you know, but what if you need to pull in data from sources that don't have a standard structure? Searching community events alongside your standard catalog search results is one example: often the only way to pull these events is through XML, JSON, (insert structured format here), or even just raw HTML. But how do you get that structure? That simple question is what makes this search "impossible". Defining and processing the structure takes a lot of manual labor, especially if the data you are pulling is just HTML, and every time you add data to the index you have to run all of it through a script that converts it into a format Solr or another index can use. This talk will focus on Solr, but the principles explained will apply to many other indexes.
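As a rough illustration of the "run the data through a script" step, here is a minimal sketch in Python (standard library only). It is not the speaker's actual code. The HTML layout (`<li class="event">` with `title` and `date` spans), the `html_to_solr_docs` helper, the source label, and the `*_s` field names are all assumptions made up for this example; a real source would need its own hand-written extraction rules, which is exactly the manual labor described above.

```python
import json
from html.parser import HTMLParser


class EventParser(HTMLParser):
    """Collect events from a HYPOTHETICAL layout:
    <li class="event"> containing <span class="title"> and <span class="date">."""

    def __init__(self):
        super().__init__()
        self.events = []      # finished event dicts
        self._current = None  # event being built, or None outside an event
        self._field = None    # field whose text is being read, or None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li" and "event" in classes:
            self._current = {}
        elif tag == "span" and self._current is not None:
            for name in ("title", "date"):
                if name in classes:
                    self._field = name

    def handle_data(self, data):
        # Only keep text that sits inside a recognized field of an event.
        if self._current is not None and self._field:
            text = self._current.get(self._field, "") + data
            self._current[self._field] = text

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current is not None:
            self.events.append(self._current)
            self._current = None


def html_to_solr_docs(html, source):
    """Turn raw HTML into a list of flat dicts shaped like Solr documents.

    The *_s suffixes assume string dynamic fields exist in the schema.
    """
    parser = EventParser()
    parser.feed(html)
    parser.close()
    docs = []
    for i, event in enumerate(parser.events):
        docs.append({
            "id": f"{source}-{i}",
            "title_s": event.get("title", "").strip(),
            "date_s": event.get("date", "").strip(),
            "source_s": source,
        })
    return docs


# Made-up sample input standing in for a community events page.
SAMPLE = """
<ul>
  <li class="event"><span class="title">Story Time</span> <span class="date">2015-02-09</span></li>
  <li class="event"><span class="title">Knitting Circle</span> <span class="date">2015-02-10</span></li>
</ul>
"""

if __name__ == "__main__":
    # The resulting JSON array is the kind of payload that could be sent
    # to a Solr core's update handler; sending it is left out here.
    print(json.dumps(html_to_solr_docs(SAMPLE, "library-events"), indent=2))
```

The point of the sketch is that every rule in `EventParser` is tied to one source's markup, so each new unstructured source means writing and maintaining another set of rules before anything reaches the index.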