Difference between revisions of "2012 Solr Preconference"

From Code4Lib
 
(One intermediate revision by one other user not shown)
Line 23:
  * don't be afraid to use jetty
  * buy this book: http://www.amazon.com/Apache-Solr-3-1-Cookbook-Rafal/dp/1849512183
- * look in comments, dont' necessarily trust the wiki
+ * look in comments, don't necessarily trust the wiki
- * dont' use the example, start from scratch
+ * don't use the example, start from scratch
- * lucidimagination.com :  run by erikhatcher, solr conference.
+ * lucidimagination.com :  run by erikhatcher, solr conference (http://www.lucenerevolution.com/ ).
  * https://issues.apache.org/jira/browse/SOLR
  * http://lucene.apache.org/solr/mailing_lists.html
Line 31:

  How to get data in:
- * sunburn in python ( https://github.com/tow/sunburnt )
+ * sunburnt in python ( https://github.com/tow/sunburnt http://pypi.python.org/pypi/sunburnt)
  * solrmarc for marc
  * rsolr for ruby ( https://github.com/mwmitchell/rsolr )
Line 38:
  How to track results?
  * google analytics
- * put in specific fields for tracking what people search for and click, especially which result number.
+ * put in specific fields for tracking what people search for and click, especially which result number.
+
+ What are people doing to present info to user? Just from solr?
+ * one method is to mark some fields in solr as "display only"
+ * hathi (per Tom Burton-West) takes user to first page of document, then allows user to search, (against a separate index)
+
+ Relevance testing:
+ * types of tests that should always be true (hyphenation should work this way)
+ * worry about adding too many "result X should appear here"
+ * cucumber http://cukes.info/
+ * talk to ndushay about stanford's use of this
+
+ Tika (http://tika.apache.org/ ) use with Solr?
+ * Danish Web archive used it.

Latest revision as of 19:26, 6 February 2012

The Solr 2012 Preconference Notes of Greatness:

How to get titles of big journals (like "Nature") to appear near the top:

  • create exact title match fields (left and right anchored) and boost those
  • create boost fields at index time for specific things you want to boost (like serials)
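
The exact-title idea above can be sketched roughly as follows. This assumes a hypothetical untokenized string field called `title_exact`, filled at index time with a normalized copy of the title, and the edismax query parser. The field names (`title`, `text`, `title_exact`) and the boost values are illustrative assumptions, not something recorded in the session.

```python
import re


def normalize_title(title):
    """Lowercase, replace punctuation with spaces, collapse whitespace.

    The same normalization would have to be applied at index time
    when filling the (hypothetical) title_exact field.
    """
    cleaned = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(cleaned.split())


def title_boost_params(user_query):
    """Build Solr query parameters that boost exact title matches.

    Because title_exact is assumed to be an untokenized string field,
    the match is anchored on both ends: "nature" does not match
    "nature reviews genetics".
    """
    exact = normalize_title(user_query)
    return {
        "defType": "edismax",
        "q": user_query,
        "qf": "title^2 text",                 # assumed field names
        "bq": 'title_exact:"%s"^50' % exact,  # assumed boost value
    }
```

With this, `title_boost_params("Nature")` adds a boost query on `title_exact:"nature"`, so the journal itself should outrank longer titles that merely contain the word. The right boost value has to be found by experiment.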

Things coming down the pike on solr:

  • Solr 4.0: hierarchical facets? (do any nightlies work?)
  • new way of handling languages other than CJK


Replication:


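How to get started on solr?

  • don't be afraid to use jetty
  • buy this book: http://www.amazon.com/Apache-Solr-3-1-Cookbook-Rafal/dp/1849512183
  • look in comments, don't necessarily trust the wiki
  • don't use the example, start from scratch
  • lucidimagination.com : run by erikhatcher, solr conference (http://www.lucenerevolution.com/ ).
  • https://issues.apache.org/jira/browse/SOLR
  • http://lucene.apache.org/solr/mailing_lists.html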


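How to get data in:

  • sunburnt in python ( https://github.com/tow/sunburnt http://pypi.python.org/pypi/sunburnt )
  • solrmarc for marc
  • rsolr for ruby ( https://github.com/mwmitchell/rsolr )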
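
Whatever client is used, what ends up at Solr's update handler can be a plain XML `<add>` message. Here is a minimal sketch of building one with only the Python standard library. The field names in the usage note are made up, and actually posting the message to Solr is left out.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Serialize a list of dicts into a Solr XML <add> message.

    List or tuple values become repeated <field> elements, which is
    how multi-valued fields are sent. Escaping is handled by ElementTree.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)
    return ET.tostring(add, encoding="unicode")
```

For example, `build_add_xml([{"id": "1", "title": "Nature"}])` gives a string that could be posted to the update handler with a `text/xml` content type, followed by a commit.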


How to track results?

  • google analytics
  • add specific fields for tracking what people search for and which results they click, especially the result number.
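
A rough sketch of what such a tracking record could look like. The field names here are hypothetical. It assumes the application knows the `start` offset it sent to Solr and the 1-based position of the clicked hit on the page.

```python
from datetime import datetime, timezone


def click_event(query, doc_id, page_start, position_on_page, when=None):
    """Build one click-tracking record.

    page_start is the 0-based `start` offset used for the results page;
    position_on_page is 1-based, so result_number is the 1-based rank
    of the clicked hit in the overall result list.
    """
    when = when or datetime.now(timezone.utc)
    return {
        "query": query,
        "clicked_id": doc_id,
        "result_number": page_start + position_on_page,
        "timestamp": when.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```

Records like this could go to a log file, a database, or a separate Solr core. The notes do not say which of these people used.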

What are people doing to present info to user? Just from solr?

  • one method is to mark some fields in solr as "display only"
  • hathi (per Tom Burton-West) takes the user to the first page of the document, then allows the user to search (against a separate index)

Relevance testing:

  • types of tests that should always be true (hyphenation should work this way)
  • be wary of adding too many "result X should appear here" tests
  • cucumber http://cukes.info/
  • talk to ndushay about stanford's use of this
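
The "should always be true" kind of test can be sketched without cucumber. The example below checks that spelling variants of a query return the same top results. `toy_search` and `TOY_INDEX` are made-up stand-ins so the sketch runs on its own. In practice `search` would be a function that queries the real Solr index and returns document ids in rank order.

```python
def check_equivalent_queries(search, variants, top_n=10):
    """Return the variants whose top-N ids differ from the first variant's.

    An empty list means the invariant holds.
    """
    baseline = search(variants[0])[:top_n]
    return [v for v in variants[1:] if search(v)[:top_n] != baseline]


# Toy stand-in for a real index (hypothetical data).
TOY_INDEX = {
    "1": "e-mail archives",
    "2": "email handbook",
    "3": "postal history",
}


def toy_search(query):
    """Substring search that ignores case and hyphens."""
    def norm(text):
        return text.lower().replace("-", "")

    q = norm(query)
    return sorted(doc_id for doc_id, title in TOY_INDEX.items() if q in norm(title))
```

A test run would then assert that `check_equivalent_queries(toy_search, ["e-mail", "email"])` is empty. Invariants of this kind tend to survive index changes better than tests that pin a specific document to a specific rank.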

Tika (http://tika.apache.org/ ) use with Solr?

  • Danish Web archive used it.
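
One common route is Solr's extracting request handler (Solr Cell), which runs Tika on an uploaded file and indexes the extracted text. Whether the Danish Web archive went this way is not recorded in these notes. Below is a minimal sketch of building the request URL. It assumes the handler is configured at `/update/extract` in solrconfig.xml, and the base URL and document id are placeholders.

```python
from urllib.parse import urlencode


def extract_url(solr_base, doc_id, commit=True):
    """Build the URL for posting a file to Solr's extracting handler.

    literal.id sets the id field of the resulting document; the file
    itself would be sent as the body of the POST request.
    """
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    return solr_base.rstrip("/") + "/update/extract?" + urlencode(params)
```

For instance, `extract_url("http://localhost:8983/solr", "doc1")` gives the URL a PDF or Word file could be posted to.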