2013 talks proposals

1,447 bytes added, 21:39, 8 November 2012
also display versions of resources available in Memento compliant web
archives and content management systems.
== Practical Relevance Ranking for 10 Million Books ==
* Tom Burton-West, University of Michigan Library,
HathiTrust Full-text Search indexes the full text and metadata of over 10 million books. Tuning relevance ranking for a collection of this size poses many challenges. This talk will discuss some of the underlying issues, some of our experiments to improve relevance ranking, and our ongoing efforts to develop a principled framework for testing changes to relevance ranking.
Some of the topics covered will include:
* Length normalization for indexing the full text of book-length documents
* Indexing granularity for books
* Testing new features in Solr 4.0:
** New ranking formulas that should work better with book-length documents: BM25 and DFR.
** Grouping/Field Collapsing. Can we index 3 billion pages and then use Solr's field collapsing feature to rank books according to the most relevant page(s)?
** Finite State Automata/Block Trees for storing the in-memory index to the index. Will this let us support wildcards/truncation despite over 2 billion unique terms per index?
* Relevance testing methodologies: query log analysis, click models, interleaving, A/B testing, and test-collection-based evaluation.
* Testing of a new high-performance storage system to be installed in early 2013. We will report on any tests we are able to run before the conference.
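
To make two of the topics above concrete, here is a minimal pure-Python sketch of BM25 scoring with length normalization, followed by collapsing page-level scores to a book-level ranking by best page. It is a toy illustration only, not Solr's implementation and not the HathiTrust configuration: the function names, the sample pages, and the parameter values <code>k1=1.2</code> and <code>b=0.75</code> are assumptions chosen for the example, and the IDF uses a common non-negative variant.

```python
import math
from collections import Counter, defaultdict


def bm25_scores(docs, query_terms, k1=1.2, b=0.75):
    """Score tokenized documents against a query with Okapi BM25.

    docs maps a document id to its list of tokens. b controls how
    strongly long documents are penalized (b=0 disables length
    normalization entirely).
    """
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs
    # Document frequency: how many documents contain each query term.
    df = {t: sum(1 for tokens in docs.values() if t in tokens)
          for t in query_terms}
    scores = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        # Length-dependent part of the denominator.
        norm = k1 * (1 - b + b * len(tokens) / avg_len)
        score = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            score += idf * tf[t] * (k1 + 1) / (tf[t] + norm)
        scores[doc_id] = score
    return scores


def collapse_to_books(page_scores):
    """Rank books by their best-scoring page (ids are (book, page) pairs)."""
    best = defaultdict(float)
    for (book_id, _page_no), score in page_scores.items():
        best[book_id] = max(best[book_id], score)
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample data: two short pages and one long page.
pages = {
    ("book_a", 1): "solr relevance ranking for books".split(),
    ("book_a", 2): "storage hardware and index size notes".split(),
    ("book_b", 1): ("relevance " + "filler " * 40).split(),
}

page_scores = bm25_scores(pages, ["relevance", "ranking"])
ranking = collapse_to_books(page_scores)
```

In this sketch the long page in <code>book_b</code> scores lower for "relevance" than it would with <code>b=0.0</code>, which is the effect length normalization is meant to have; whether that penalty is appropriate for book-length documents is exactly the kind of question the talk raises.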
