2012 Solr Preconference
From Code4Lib
Latest revision as of 19:26, 6 February 2012
The Solr 2012 Preconference Notes of Greatness:
How to get titles of big journals (like "Nature") to appear near the top:
- create exact title match fields (left and right anchored) and boost those.
- create boost fields at index time for specific things you want to boost (like serials)
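A minimal sketch of the query side of the two points above, assuming an edismax request handler. The field names (title_exact, title, text, is_serial) and the boost values are hypothetical and would need to match your own schema:

```python
from urllib.parse import urlencode


def build_title_boost_params(user_query):
    """Build Solr query parameters that favor exact title matches."""
    return {
        "defType": "edismax",
        "q": user_query,
        # title_exact is assumed to be an anchored, lightly analyzed copy
        # of the title; it gets a much higher weight than the other fields.
        "qf": "title_exact^50 title^5 text",
        # bq adds a boost for documents flagged as serials at index time
        # (is_serial is a hypothetical boost field).
        "bq": "is_serial:true^10",
    }


params = build_title_boost_params("nature")
query_string = urlencode(params)
print(query_string)
```

The boost numbers are placeholders; in practice they get tuned against real queries.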
Things coming down the pike on solr:
- Solr 4.0: hierarchical facets? (do any nightlies work?)
- new way of handling languages other than CJK
Replication:
- works fine, but slower than scp for ingests.
- jrochkind: not as much slower as you'd expect.
- ingest with a higher mergeFactor for speed, then optimize down to 1 segment, then use the index read-only
- etsy: http://codeascraft.etsy.com/2012/01/23/solr-bittorrent-index-replication/
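The "optimize to 1 segment" step can be triggered through the update handler once the ingest has finished. A small sketch that only builds the request URL (it does not contact a server); the base URL is an assumption, and mergeFactor itself is set in solrconfig.xml rather than here:

```python
from urllib.parse import urlencode


def optimize_url(solr_base, max_segments=1):
    """Return the update URL that asks Solr to merge down to max_segments."""
    query = urlencode({"optimize": "true", "maxSegments": max_segments})
    return f"{solr_base.rstrip('/')}/update?{query}"


# Assumed local Solr address; replace with the ingest machine's URL.
url = optimize_url("http://localhost:8983/solr")
print(url)
```

Fetching that URL after the bulk load leaves a single-segment index that can then be copied or replicated to the read-only search machines.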
How to get started on solr?
- don't be afraid to ask on list
- don't be afraid to use jetty
- buy this book: http://www.amazon.com/Apache-Solr-3-1-Cookbook-Rafal/dp/1849512183
- look in comments, don't necessarily trust the wiki
- don't use the example, start from scratch
- lucidimagination.com : run by erikhatcher, solr conference (http://www.lucenerevolution.com/ ).
- https://issues.apache.org/jira/browse/SOLR
- http://lucene.apache.org/solr/mailing_lists.html
How to get data in:
- sunburnt in python ( https://github.com/tow/sunburnt http://pypi.python.org/pypi/sunburnt)
- solrmarc for marc
- rsolr for ruby ( https://github.com/mwmitchell/rsolr )
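Underneath, these clients send documents to Solr's update handler. As a rough illustration (not how sunburnt or rsolr are implemented), here is a sketch that builds the XML add message by hand with the Python standard library; the field names are made up:

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Serialize a list of dicts into a Solr XML <add> update message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # A list or tuple becomes repeated <field> elements (multi-valued).
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)
    return ET.tostring(add, encoding="unicode")


# Hypothetical document; "id" and "title" must exist in your schema.
message = build_add_message([{"id": "1", "title": "Nature"}])
print(message)
```

The resulting string is what gets POSTed to the /update handler with a text/xml content type, followed by a commit.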
How to track results?
- google analytics
- put in specific fields for tracking what people search for and click, especially which result number.
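One possible shape for that click data, as a sketch only. The record layout and helper names are hypothetical, not something presented at the session:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ClickEvent:
    query: str   # what the user searched for
    doc_id: str  # which result they clicked
    rank: int    # 1-based position of that result in the list


def mean_click_rank(events):
    """Average position of clicked results; lower suggests better ranking."""
    if not events:
        return None
    return sum(e.rank for e in events) / len(events)


def most_common_queries(events, n=3):
    """Most frequent queries, ignoring case and surrounding whitespace."""
    counts = Counter(e.query.strip().lower() for e in events)
    return counts.most_common(n)


events = [
    ClickEvent("nature", "j1", 1),
    ClickEvent("Nature", "j1", 3),
    ClickEvent("solr cookbook", "b9", 2),
]
print(mean_click_rank(events), most_common_queries(events, 1))
```

Recording the result number is what makes a measure like mean click rank possible; query text alone would only give the popularity list.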
What are people doing to present info to user? Just from solr?
- one method is to mark some fields in solr as "display only"
- hathi (per Tom Burton-West) takes user to first page of document, then allows user to search within it (against a separate index)
Relevance testing:
- types of tests that should always be true (hyphenation should work this way)
- be wary of adding too many "result X should appear here" tests
- cucumber http://cukes.info/
- talk to ndushay about stanford's use of this
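The two kinds of test above can be sketched in plain Python instead of cucumber. The search function here is a hypothetical in-memory stand-in for a real Solr query, and the document ids are invented:

```python
def rank_of(doc_id, result_ids):
    """Return the 1-based rank of doc_id in result_ids, or None if absent."""
    if doc_id in result_ids:
        return result_ids.index(doc_id) + 1
    return None


def appears_in_top(doc_id, result_ids, n):
    """True if doc_id is among the first n results."""
    rank = rank_of(doc_id, result_ids)
    return rank is not None and rank <= n


# Hypothetical canned results standing in for a live index.
FAKE_INDEX = {
    "e book": ["doc7", "doc2", "doc9"],
    "nature": ["nature-journal", "doc4"],
}


def fake_search(query):
    """Stand-in for a Solr call; treats hyphens as spaces, ignores case."""
    normalized = " ".join(query.lower().replace("-", " ").split())
    return FAKE_INDEX.get(normalized, [])


# Invariant: hyphenated and unhyphenated forms return the same results.
assert fake_search("e-book") == fake_search("e book")
# Specific expectation, kept loose (top 3) rather than pinned to rank 1.
assert appears_in_top("nature-journal", fake_search("Nature"), 3)
```

Invariant tests like the first assertion survive reindexing; tests like the second are more brittle, which is why asserting "in the top N" tends to age better than asserting an exact position.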
Tika (http://tika.apache.org/ ) use with Solr?
- Danish Web archive used it.