I think the search should take whatever text a user enters and put quotes around it so the system treats it as a phrase search. Most of the user tests we’ve done have shown that this is a problem for users: they seem to expect phrase searching and are surprised by the results. Most users will not go back and add quotes, so I think we should add them by default when they hit enter so that the system matches their expectations.
I agree that a fix for this is of high importance. Another option would be to use AND as the Boolean default rather than the current default of OR. The result set would then contain only records that match ALL of the words the user entered, whereas currently a result can match just one of the words.
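To make the difference between the three options concrete, here is a minimal sketch of OR, AND, and exact-phrase matching over a toy list of titles. The titles, the whitespace tokenizer, and the function names are all made up for illustration; this is not how the actual search backend is implemented.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def match_or(query, doc):
    """True if the document contains at least one query term."""
    return bool(set(tokenize(query)) & set(tokenize(doc)))


def match_and(query, doc):
    """True if the document contains every query term, in any order."""
    return set(tokenize(query)) <= set(tokenize(doc))


def match_phrase(query, doc):
    """True if the query terms appear contiguously and in the same order."""
    q, d = tokenize(query), tokenize(doc)
    n = len(q)
    return n > 0 and any(d[i:i + n] == q for i in range(len(d) - n + 1))


# Hypothetical titles, used only to illustrate the three behaviours.
DOCS = [
    "the old man and the sea",
    "the sea wolf",
    "old possum's book of practical cats",
]


def search(query, matcher):
    """Return the titles accepted by the given matcher, in corpus order."""
    return [doc for doc in DOCS if matcher(query, doc)]


or_hits = search("old sea", match_or)          # every title shares a term
and_hits = search("old sea", match_and)        # only the title with both terms
phrase_hits = search("old sea", match_phrase)  # no title has "old sea" verbatim
```

With the query `old sea`, OR returns all three titles, AND returns only the first, and the phrase match returns nothing.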
Hi @jimfhahn @bethpc, we were briefly discussing this topic in the last AC.
Executing a “phrase search” by default means having a system that offers maximum precision but very low (often zero) recall. That usually results in a very bad search experience for users who are not sure exactly what they are looking for.
Ex1: A user remembers only the surname and part of the title. A phrase search won’t return any relevant results.
Ex2: A user remembers (or rather, is sure they remember) the title of a work but swaps the order of some terms. They won’t get any results.
For Ex2, we could introduce a phrase slop (as the system already does), but it would no longer be an “exact match”.
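To illustrate the slop idea from Ex2, here is a rough sketch of a sloppy phrase match. It assumes a simplified measure: for each query term, take its position in the title minus its position in the query, and use the spread of those shifts as the slop required. The function names and the example title are hypothetical, and no particular engine is guaranteed to compute slop in exactly this way.

```python
from itertools import product


def min_slop(query, doc):
    """Smallest slop under which `query` matches `doc` as a phrase.

    Returns None when some query term is missing from the document.
    Slop is the spread (max - min) of the per-term shifts, where a shift is
    the term's position in the document minus its position in the query.
    """
    q = query.lower().split()
    d = doc.lower().split()
    positions = [[i for i, tok in enumerate(d) if tok == term] for term in q]
    if not q or any(not p for p in positions):
        return None
    best = None
    # Brute force over every assignment of document positions to query terms;
    # fine for a toy example, far too slow for a real index.
    for combo in product(*positions):
        if len(set(combo)) < len(combo):
            continue  # two query terms cannot share one document position
        shifts = [pos - idx for idx, pos in enumerate(combo)]
        needed = max(shifts) - min(shifts)
        if best is None or needed < best:
            best = needed
    return best


def match_sloppy_phrase(query, doc, slop=0):
    """True if the query matches the document within the allowed slop."""
    needed = min_slop(query, doc)
    return needed is not None and needed <= slop


TITLE = "pride and prejudice"  # hypothetical record title
exact = match_sloppy_phrase("pride and prejudice", TITLE)              # slop 0 is enough
swapped_strict = match_sloppy_phrase("prejudice and pride", TITLE)     # fails with slop 0
swapped_loose = match_sloppy_phrase("prejudice and pride", TITLE, slop=4)
```

Under this measure the reversed title needs a slop of 4, so the user in Ex2 gets a hit only once the match is allowed to be that loose, which is exactly why it stops being an “exact match”.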
These are just two examples to make a point: in my opinion, we should collect as many queries, concrete results, and expectations as possible, rather than guessing in advance which search algorithm would be best. In modern search engines, that usually leads to a fairly complex configuration that goes beyond simple Boolean AND/OR logic.
In a search system, search quality tuning and evaluation usually follow these steps:
- Collect as many explicit judgments as possible (query → results + relevance rankings): basically a query with its actual and expected results, plus what was wrong and what was good
- Use those results to model a search logic that fits them
- Set up a system (I described RRE, which is open source, in the past) that helps us compute relevance metrics automatically
A system like that would fit the required search logic and, in addition, act as a regression suite: when we add a feature or tune something, it would tell us whether the collected queries still perform well, and how well.
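As a rough idea of what such a regression suite does, here is a small sketch in plain Python. It is not RRE and does not use its API; the judgments, document ids, canned result lists, and function names are all invented for illustration.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that were judged relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids) / k


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant result, or 0.0 if there is none."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def evaluate(search_fn, judgments, k=1):
    """Run every judged query through search_fn and compute its metrics."""
    report = {}
    for query, relevant in judgments.items():
        ranked = search_fn(query)
        report[query] = {
            "p_at_k": precision_at_k(ranked, relevant, k),
            "rr": reciprocal_rank(ranked, relevant),
        }
    return report


def regressions(baseline, candidate, metric="p_at_k"):
    """Queries whose metric got worse between two evaluation reports."""
    return sorted(q for q in baseline if candidate[q][metric] < baseline[q][metric])


# Hypothetical explicit judgments: query -> ids of the records judged relevant.
JUDGMENTS = {"old sea": {"d1"}, "sea wolf": {"d2"}}

# Canned rankings standing in for two configurations of the search logic.
RESULTS_V1 = {"old sea": ["d1", "d2", "d3"], "sea wolf": ["d2", "d1"]}
RESULTS_V2 = {"old sea": ["d1"], "sea wolf": ["d3", "d1", "d2"]}

report_v1 = evaluate(lambda q: RESULTS_V1[q], JUDGMENTS)
report_v2 = evaluate(lambda q: RESULTS_V2[q], JUDGMENTS)
worse = regressions(report_v1, report_v2)  # queries that degraded in V2
```

Here the second configuration still answers `old sea` correctly but pushes the relevant record for `sea wolf` down to third place, so that query is flagged as a regression.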
- Analyze system logs to capture a large volume of user interactions (clicks, searches, impressions, paging, scrolling)
- Use that data to train a model in a Learning to Rank context
The purpose is to have a “machine learning” pipeline that produces a theoretical model representing and summarizing what Share-VDE users consider “relevant” (from a search perspective).
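As a sketch of the first part of that pipeline, the snippet below turns logged interactions into graded relevance labels, which is the kind of input a Learning to Rank model is typically trained on. The log format, field names, and click-through thresholds are assumptions made up for this example; real logs would also need handling for things like position bias, which is ignored here.

```python
from collections import defaultdict


def click_through_rates(events):
    """Clicks divided by impressions for every (query, doc_id) pair shown."""
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for event in events:
        key = (event["query"], event["doc_id"])
        if event["type"] == "impression":
            impressions[key] += 1
        elif event["type"] == "click":
            clicks[key] += 1
    return {key: clicks[key] / shown for key, shown in impressions.items()}


def to_labels(ctr, thresholds=(0.1, 0.5)):
    """Map each rate to a grade: the number of thresholds it reaches (0, 1 or 2)."""
    return {key: sum(rate >= t for t in thresholds) for key, rate in ctr.items()}


# Hypothetical log: three records shown four times each for the same query,
# then clicked three times, once, and never.
EVENTS = (
    [
        {"query": "old sea", "doc_id": doc_id, "type": "impression"}
        for doc_id in ("d1", "d2", "d3")
        for _ in range(4)
    ]
    + [{"query": "old sea", "doc_id": "d1", "type": "click"}] * 3
    + [{"query": "old sea", "doc_id": "d2", "type": "click"}]
)

CTR = click_through_rates(EVENTS)
LABELS = to_labels(CTR)  # implicit judgments derived from user behaviour
```

The resulting labels play the same role as the explicit judgments collected by hand, except that they come from what users actually clicked.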
Happy to discuss and go deeper into this topic.