Changes

2013 talks proposals

1,447 bytes added, 21:39, 8 November 2012


also display versions of resources available in Memento-compliant web archives and content management systems.

== Practical Relevance Ranking for 10 million books. ==

* Tom Burton-West, University of Michigan Library, tburtonw@umich.edu

[http://www.hathitrust.org/ HathiTrust Full-text search] indexes the full text and metadata of over 10 million books. Tuning relevance ranking for a collection of this size poses many challenges. This talk will discuss some of the underlying issues, some of our experiments to improve relevance ranking, and our ongoing efforts to develop a principled framework for testing changes to relevance ranking.

Some of the topics covered will include:

* Length normalization for indexing the full-text of book-length documents

* Indexing granularity for books

* Testing new features in Solr 4.0:

** New ranking formulas that should work better with book-length documents: BM25 and DFR.

** Grouping/Field Collapsing. Can we index 3 billion pages and then use Solr's field collapsing feature to rank books according to the most relevant page(s)?

** Finite State Automata/Block Trees for storing the in-memory index to the index. Will this let us support wildcards/truncation despite over 2 billion unique terms per index?

* Relevance testing methodologies: query log analysis, click models, interleaving, A/B testing, and test-collection-based evaluation.

* Testing of a new high-performance storage system to be installed in early 2013. We will report on any tests we are able to run before the conference.
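
As a rough illustration of why length normalization matters for book-length documents, here is a minimal sketch of a single-term BM25 score in Python. The function name, the parameter names, and the sample numbers are assumptions made for illustration only; this is not HathiTrust's or Solr's code. The idf form used is the one common in Lucene-style implementations.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, doc_freq, num_docs, k1=1.2, b=0.75):
    """Score one query term against one document with BM25.

    tf          -- occurrences of the term in the document
    doc_len     -- length of the document in tokens
    avg_doc_len -- average document length in the collection
    doc_freq    -- number of documents containing the term
    num_docs    -- total number of documents in the collection
    k1          -- term-frequency saturation parameter
    b           -- length-normalization strength (0 = none, 1 = full)
    """
    # Rarer terms get a higher inverse document frequency.
    idf = math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Documents longer than average get a larger normalization factor,
    # which lowers the contribution of each term occurrence.
    length_norm = 1 - b + b * (doc_len / avg_doc_len)
    return idf * (tf * (k1 + 1)) / (tf + k1 * length_norm)


# Hypothetical numbers: the same term frequency in a short pamphlet
# and in a very long book, within a 10-million-document collection.
pamphlet = bm25_term_score(tf=5, doc_len=300, avg_doc_len=100_000,
                           doc_freq=1_000, num_docs=10_000_000)
long_book = bm25_term_score(tf=5, doc_len=300_000, avg_doc_len=100_000,
                            doc_freq=1_000, num_docs=10_000_000)
print(pamphlet, long_book)
```

With <code>b</code> above zero, the long book scores lower than the pamphlet for the same term frequency; setting <code>b=0</code> disables length normalization so both score the same. Choosing <code>k1</code> and <code>b</code> for a collection dominated by very long documents is the kind of tuning question the talk addresses.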

[[Category:Code4Lib2013]]
