2012 Solr Preconference
The Solr 2012 Preconference Notes of Greatness:
How to get titles of big journals like "Nature" to appear near the top:
- create exact title match fields (left- and right-anchored) and boost those.
- create boost fields at index time for specific things you want to boost (like serials)
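One way to picture the two bullets above is as query-time boosts sent to the edismax query parser. This is only a sketch: the field names (title_exact, title, text, format) and the boost weights are hypothetical, and title_exact is assumed to be an untokenized, lowercased copy of the title so that only a whole-title match hits it.

```python
from urllib.parse import urlencode


def build_title_boost_params(user_query):
    """Build Solr query parameters that favor exact title matches.

    All field names and boost values here are illustrative assumptions,
    not taken from any particular schema.
    """
    # Escape backslashes and double quotes so the query can be wrapped
    # in a quoted phrase inside the boost query.
    phrase = user_query.replace("\\", "\\\\").replace('"', '\\"')
    return {
        "defType": "edismax",
        "q": user_query,
        # Ordinary tokenized fields searched for every query.
        "qf": "title^10 text",
        "bq": [
            # Big boost when the whole title equals the query
            # (the left- and right-anchored match).
            'title_exact:"%s"^100' % phrase,
            # Smaller boost from a field populated at index time
            # to mark serials.
            "format:serial^5",
        ],
    }


def to_query_string(params):
    """Encode the parameters; list values become repeated parameters."""
    return urlencode(params, doseq=True)


if __name__ == "__main__":
    print(to_query_string(build_title_boost_params("Nature")))
```

The resulting string can be appended to a select URL. Tuning the weights is a matter of trial and error against real queries.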
Things coming down the pike on solr:
- Solr 4.0: hierarchical facets? (do any nightlies work?)
- new way of handling languages other than CJK
- works fine, but slower than scp for ingests.
- jrochkind: not as much slower as you'd expect.
- ingest with a higher mergeFactor for speed, then optimize down to 1 segment, then use that index read-only
- etsy: http://codeascraft.etsy.com/2012/01/23/solr-bittorrent-index-replication/
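The optimize step in the ingest note above can be triggered by posting an XML optimize message to Solr's update handler. A minimal sketch, assuming a Solr instance at a hypothetical base URL; the mergeFactor itself is set in solrconfig.xml and is not shown here.

```python
from urllib.request import Request, urlopen


def build_optimize_request(solr_url, max_segments=1):
    """Build (but do not send) a request that asks Solr to merge the
    index down to at most max_segments segments."""
    body = '<optimize maxSegments="%d"/>' % max_segments
    return Request(
        solr_url.rstrip("/") + "/update",
        data=body.encode("utf-8"),  # a request with a body is sent as POST
        headers={"Content-Type": "text/xml"},
    )


if __name__ == "__main__":
    # Hypothetical local instance; optimizing a large index can take a while.
    request = build_optimize_request("http://localhost:8983/solr")
    with urlopen(request) as response:
        print(response.status)
```

Running this once after the bulk ingest finishes, and before copying the index to the read-only searchers, matches the order described in the note.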
How to get started on solr?
- don't be afraid to ask on list
- don't be afraid to use jetty
- buy this book: http://www.amazon.com/Apache-Solr-3-1-Cookbook-Rafal/dp/1849512183
- look in comments, don't necessarily trust the wiki
- don't use the example, start from scratch
- lucidimagination.com : run by erikhatcher, solr conference.
How to get data in:
- sunburnt for python ( https://github.com/tow/sunburnt )
- solrmarc for marc
- rsolr for ruby ( https://github.com/mwmitchell/rsolr )
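Underneath, client libraries like the ones above send documents to Solr's update handler. The sketch below builds the XML add message by hand with only the Python standard library, to show what goes over the wire. It is not the API of sunburnt, solrmarc, or rsolr, and the field names and base URL are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request


def build_add_xml(docs):
    """Turn a list of dicts into a Solr <add> message.

    List or tuple values become repeated <field> elements, which is
    how multivalued fields are sent.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, (list, tuple)) else [value]
            for item in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(item)
    return ET.tostring(add, encoding="unicode")


def build_add_request(solr_url, docs):
    """Build (but do not send) the POST request carrying the documents."""
    return Request(
        solr_url.rstrip("/") + "/update",
        data=build_add_xml(docs).encode("utf-8"),
        headers={"Content-Type": "text/xml"},
    )


if __name__ == "__main__":
    # Hypothetical documents; the field names depend on your schema.
    print(build_add_xml([{"id": "1", "title": "Nature", "format": ["serial"]}]))
```

The documents only become searchable after a commit, which is a separate message to the same handler.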
How to track results?
- google analytics
- put in specific fields for tracking what people search for and click, especially which result number.
- write tests for things that should always be true (e.g. hyphenation should work this way)
- be wary of adding too many "result X should appear here" tests
- cucumber http://cukes.info/
- talk to ndushay about stanford's use of this
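The two kinds of test mentioned above (invariants that should always hold, and "result X should appear here" checks) might look roughly like this. It is a plain Python sketch rather than cucumber, and the search argument is a hypothetical function that takes a query string and returns a ranked list of document ids.

```python
def rank_of(doc_ids, expected_id):
    """Return the 1-based position of expected_id, or None if absent."""
    for position, doc_id in enumerate(doc_ids, start=1):
        if doc_id == expected_id:
            return position
    return None


def check_top_n(search, query, expected_id, n=3):
    """'Result X should appear here': expected_id is within the top n."""
    rank = rank_of(search(query), expected_id)
    return rank is not None and rank <= n


def check_same_results(search, query_a, query_b):
    """Invariant: two spellings of a query return the same ranked ids."""
    return search(query_a) == search(query_b)


if __name__ == "__main__":
    # Stand-in for a real Solr call; ids and queries are made up.
    def fake_search(query):
        index = {
            "nature": ["journal-1", "book-2"],
            "e-mail": ["doc-1"],
            "email": ["doc-1"],
        }
        return index.get(query.lower(), [])

    print(check_top_n(fake_search, "Nature", "journal-1", n=1))
    print(check_same_results(fake_search, "e-mail", "email"))
```

Invariant checks like check_same_results tend to survive reindexing and relevance tuning, while a long list of top-n checks has to be revisited whenever the collection changes, which is the reason for the caution in the notes.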