2014 Breakout II (Wednesday)
20:32, 26 March 2014
==Unusual searches & long searches==
This group met to discuss unusual searches, especially extremely long searches, copied-and-pasted citations, and other issues related to serving niche searches.
Some of the possible solutions include:
*Look for DOIs, ISBNs, or other identifiers in the query, extract them, and send a request to a service using those IDs.
*Remove extraneous characters from the beginning of the query string, since they may indicate copied-and-pasted text.
*Truncate a long query at a fixed character length (80 to 100 characters?), on the assumption that the most useful text appears at the start of the query.
*Use a regex to identify a citation by detecting some combination of words commonly used in citations (Vol., Iss., pp.), four-digit years, and other combinations of numbers.
**It would be useful to test this regex against a search corpus to check for false matches.
**Once a citation is identified, either certain characters could be removed from the query, or the query could be passed to a citation parser such as Brown's FreeCite [http://freecite.library.brown.edu/].
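A minimal Python sketch of pulling DOIs and ISBNs out of a query. The regular expressions are common heuristics assumed for illustration, not an authoritative grammar for either identifier, and `extract_identifiers` is a hypothetical helper. ISBN candidates are kept only when their check digit validates, which filters out many arbitrary runs of digits. Which lookup service then receives the extracted IDs is left open, as in the notes.

```python
import re

# Heuristic DOI pattern (an assumption): "10." + registrant code + "/" + suffix.
DOI_RE = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')
# ISBN-10 or ISBN-13 candidates, allowing hyphens or spaces between digits.
ISBN_RE = re.compile(r'\b(?:97[89][-\s]?)?(?:\d[-\s]?){9}[\dX]\b', re.IGNORECASE)


def _isbn_checksum_ok(digits: str) -> bool:
    """Validate the ISBN-10 (mod 11) or ISBN-13 (mod 10) check digit."""
    if len(digits) == 10:
        # Only the final position of an ISBN-10 may be "X" (value 10).
        if not digits[:9].isdigit() or not (digits[9].isdigit() or digits[9] == "X"):
            return False
        total = sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(digits))
        return total % 11 == 0
    if len(digits) == 13 and digits.isdigit():
        total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(digits))
        return total % 10 == 0
    return False


def extract_identifiers(query: str) -> dict:
    """Return DOIs and checksum-valid ISBNs found in a free-text query."""
    # Trailing punctuation from the surrounding sentence is not part of the DOI.
    dois = [m.group().rstrip(".,;)") for m in DOI_RE.finditer(query)]
    isbns = []
    for m in ISBN_RE.finditer(query):
        digits = re.sub(r"[-\s]", "", m.group()).upper()
        if _isbn_checksum_ok(digits):
            isbns.append(digits)
    return {"doi": dois, "isbn": isbns}
```

For example, `extract_identifiers("see doi:10.1000/xyz123. and ISBN 978-0-306-40615-7")` yields the DOI `10.1000/xyz123` and the ISBN `9780306406157`, either of which could be sent on instead of the raw query text.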
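One possible sketch of stripping extraneous leading characters from pasted text, in Python. Which prefixes count as "extraneous" is an assumption here: whitespace, bullets, quote marks, `>` and `-`, bracketed reference numbers such as `[12]`, and list numbering such as `1.` when followed by a space. A real list would come from looking at actual pasted queries in the logs. `strip_leading_junk` is a hypothetical name.

```python
import re

# Assumed set of leading "junk" that often comes along with pasted citations.
LEADING_JUNK_RE = re.compile(
    r"""^(?:[\s*\u2022"'\u201c\u201d>-]+"""  # whitespace, bullets, quote marks, ">" and "-"
    r"""|\[\d+\]"""  # bracketed reference markers such as [12]
    r"""|\d+[.)](?=\s))+"""  # list numbering such as "1." or "2)" followed by a space
)


def strip_leading_junk(query: str) -> str:
    """Remove extraneous characters from the start of a query, leaving the rest intact."""
    return LEADING_JUNK_RE.sub("", query).strip()
```

Requiring a space after list numbering keeps queries that legitimately start with a number, such as `1984 Orwell`, unchanged.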
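The citation-detection idea could be sketched as a score over several independent signals, flagging a query only when more than one kind of signal appears. The specific patterns, the signal names, and the threshold of 2 are assumptions for illustration; `flagged_queries` shows one way to run the check over a corpus of logged searches and eyeball the matches for false positives.

```python
import re

# Assumed citation signals, based on the markers listed in the notes.
SIGNALS = {
    "marker": re.compile(r"\b(?:vol|iss|no|pp)\.", re.IGNORECASE),  # Vol., Iss., No., pp.
    "year": re.compile(r"\b(?:1[5-9]|20)\d{2}\b"),                  # four-digit year 1500-2099
    "page_range": re.compile(r"\b\d+\s*[-\u2013]\s*\d+\b"),         # e.g. 45-67
    "vol_issue": re.compile(r"\b\d+\s*\(\d+\)"),                    # e.g. 12(3)
}


def citation_score(query: str) -> int:
    """Count how many distinct kinds of signal appear in the query."""
    return sum(1 for rx in SIGNALS.values() if rx.search(query))


def looks_like_citation(query: str, threshold: int = 2) -> bool:
    """A single signal (say, a year) is common in ordinary searches, so require several."""
    return citation_score(query) >= threshold


def flagged_queries(queries, threshold: int = 2):
    """Return every query the heuristic flags, for manual review of false matches."""
    return [q for q in queries if looks_like_citation(q, threshold)]
```

With these patterns, `Smith, J. (2010). Title of article. Journal of Things, vol. 12(3), pp. 45-67.` matches all four signals, while `history of 1984` matches only the year and is not flagged.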
Other things noted:
*If you truncate a query, don't cut it off in the middle of a word, or recall may suffer.
*Log queries that return zero hits as a way to find the types of queries that may need some post-processing.
*Is there a way to provide smarter, live results for libraries, for things such as library hours, similar to the way Google shows live flight-tracking information directly in the results list?
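Truncating a long query without splitting a word can be sketched by adding whole words until the next one would exceed the limit. The default `max_len=100` is an assumption taken from the 80 to 100 character range mentioned above, and `truncate_query` is a hypothetical name. This version also collapses runs of whitespace, which is usually harmless for search input.

```python
def truncate_query(query: str, max_len: int = 100) -> str:
    """Cut a query to at most max_len characters, breaking only between words."""
    result = ""
    for word in query.split():
        candidate = f"{result} {word}" if result else word
        if len(candidate) > max_len:
            break  # adding this word would exceed the limit, so stop before it
        result = candidate
    if not result:
        # The first word alone is longer than max_len (or the query is blank):
        # fall back to a hard character cut.
        return query.strip()[:max_len]
    return result
```

For instance, `truncate_query("alpha beta gamma", max_len=12)` returns `"alpha beta"` rather than `"alpha beta g"`.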
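A minimal sketch of zero-hit logging in Python, using the standard `logging` module and `collections.Counter`. Besides logging each failed query, it tallies normalized query text and a rough query type so recurring kinds of failure stand out. The logger name, the categories in `rough_type`, and their cutoffs are all assumptions for illustration.

```python
import logging
import re
from collections import Counter

logger = logging.getLogger("search.zero_hits")  # hypothetical logger name

zero_hit_queries = Counter()  # normalized query text -> number of zero-hit occurrences
zero_hit_types = Counter()    # rough query type -> number of zero-hit occurrences


def rough_type(query: str) -> str:
    """Assign a crude, assumed category so similar failures can be grouped."""
    if len(query) > 100:
        return "long"
    if re.search(r"\b10\.\d{4,9}/", query):
        return "doi-like"
    if sum(ch.isdigit() for ch in query) >= 4:
        return "numeric-heavy"
    return "other"


def record_search(query: str, hit_count: int) -> None:
    """Log and tally a search only when it returned zero hits."""
    if hit_count != 0:
        return
    normalized = " ".join(query.lower().split())  # case- and whitespace-insensitive key
    kind = rough_type(query)
    zero_hit_queries[normalized] += 1
    zero_hit_types[kind] += 1
    logger.info("zero-hit query (%s): %r", kind, query)


def top_zero_hit_queries(n: int = 10):
    """Most frequent zero-hit queries, as (query, count) pairs."""
    return zero_hit_queries.most_common(n)
```

Reviewing `zero_hit_types` periodically would show, for example, whether long or DOI-like queries make up a large share of failures and so deserve dedicated handling.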