2014 Breakout II (Wednesday)

Unusual searches & long searches
This group met to talk about unusual searches, especially extremely long searches, copied and pasted citations, and other issues related to serving niche searches.
Some of the possible solutions include:
*Look for a DOI, ISBN, or other identifier in the query, extract it, and make the request to a service using that ID.
*Remove extraneous characters from the beginning of a string that may indicate copied and pasted text.
*Truncate a long query at a certain character length (80 to 100?), on the assumption that the most useful text appears at the start of the query.
*Use a regex to identify a citation by detecting some combination of words commonly used in citations (Vol., Iss., pp.), four-digit years, and other combinations of numbers.
**It would be useful to test this regex against a search corpus to check for false matches.
**Once a citation is identified, either certain characters could be removed from the query or the query could be passed to a citation parser such as Brown's FreeCite.
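The identifier and citation ideas above can be sketched roughly as follows. This is only an illustration, not something the group produced: the function names (`extract_identifiers`, `looks_like_citation`), the regular expressions, and the threshold of two signals are all assumptions, and the patterns would need testing against a real search corpus for false matches.

```python
import re

# Assumed patterns for illustration; real identifiers vary more than this.
DOI_RE = re.compile(r"\b10\.\d{4,9}/\S+")
ISBN_CANDIDATE_RE = re.compile(r"\b\d[\d-]{8,15}[\dXx]\b")

# Assumed citation signals: volume/issue words, page markers,
# four-digit years, page ranges, and volume(issue) numbers.
CITATION_SIGNALS = [
    re.compile(r"\b(?:vol|iss)\.", re.IGNORECASE),
    re.compile(r"\bpp?\.\s*\d+", re.IGNORECASE),
    re.compile(r"\b(?:1[5-9]|20)\d{2}\b"),
    re.compile(r"\b\d+\s*[-\u2013]\s*\d+\b"),
    re.compile(r"\b\d+\s*\(\d+\)"),
]


def is_valid_isbn(s):
    """Check the ISBN-10 or ISBN-13 check digit of a hyphen-free string."""
    if len(s) == 10 and s[:9].isdigit() and (s[9].isdigit() or s[9] == "X"):
        values = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
        return sum((10 - i) * v for i, v in enumerate(values)) % 11 == 0
    if len(s) == 13 and s.isdigit():
        return sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(s)) % 10 == 0
    return False


def extract_identifiers(query):
    """Return DOIs and checksum-valid ISBNs found in the query."""
    ids = {"doi": [], "isbn": []}
    for m in DOI_RE.finditer(query):
        # Trailing punctuation is usually sentence debris, not part of the DOI.
        ids["doi"].append(m.group(0).rstrip(".,;)"))
    for m in ISBN_CANDIDATE_RE.finditer(query):
        candidate = m.group(0).replace("-", "").upper()
        if is_valid_isbn(candidate):
            ids["isbn"].append(candidate)
    return ids


def looks_like_citation(query, threshold=2):
    """Guess that a query is a citation if enough distinct signals match."""
    hits = sum(1 for pattern in CITATION_SIGNALS if pattern.search(query))
    return hits >= threshold
```

Requiring at least two signals is one way to keep a query such as "history of france 1789" from being treated as a citation just because it contains a year. Any extracted identifier would then be sent to whatever lookup service the library uses; that step is left out here.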
Other things noted:
*If you truncate a query, don't truncate in the middle of a word, or else recall may be worse.
*Log queries that return zero hits as a way to find types of queries that may need some post-processing.
*Is there a way to provide smarter, live results for libraries for things such as library hours, similar to the way Google provides live flight tracking information directly in the results list?
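As a rough sketch of the clean-up and truncation suggestions in these notes, the following shows one way to strip leading non-alphanumeric characters and to cut a long query at a word boundary. The function names and the default limit of 100 characters are assumptions chosen for illustration; the group only floated 80 to 100 as a possible range.

```python
import re


def strip_leading_noise(query):
    """Drop leading punctuation and symbols that often come from copy and paste."""
    return re.sub(r"^[\W_]+", "", query.strip())


def truncate_at_word(query, max_len=100):
    """Shorten a query to at most max_len characters without splitting a word."""
    # Collapse runs of whitespace so that words are separated by single spaces.
    query = " ".join(query.split())
    if len(query) <= max_len:
        return query
    cut = query[:max_len]
    # If the character just past the cut is a space, the cut is already clean.
    if query[max_len] == " ":
        return cut.rstrip()
    head, sep, _ = cut.rpartition(" ")
    # A single token longer than max_len has no boundary; fall back to a hard cut.
    return head if sep else cut
```

For example, `truncate_at_word("the quick brown fox", 12)` drops the partial word "br" and keeps "the quick" rather than searching on a fragment.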

