C11: user interaction Flashcards
how can the search engine learn from user interactions?
- query modification behaviour (query suggestions)
- interactions with documents (clicks)
query suggestions
goal: find related queries in the query log, based on
- common substring
- co-occurrence in session
- term clustering
- clicks
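As an illustration of the session co-occurrence idea, here is a minimal sketch (made-up log, not from the lecture) that counts how often two queries appear in the same session and suggests the most frequent co-occurring ones:

```python
# Minimal sketch: related-query suggestion via session co-occurrence counts.
from collections import defaultdict
from itertools import combinations

# hypothetical query log: each session is the list of queries issued in it
sessions = [
    ["cheap flights", "cheap flights amsterdam", "klm tickets"],
    ["cheap flights", "budget airlines"],
    ["klm tickets", "cheap flights amsterdam"],
]

co_occurrence = defaultdict(int)
for session in sessions:
    for q1, q2 in combinations(set(session), 2):
        co_occurrence[frozenset((q1, q2))] += 1

def suggest(query, k=3):
    """Return the k queries that co-occur most often with `query` in a session."""
    scores = {
        next(iter(pair - {query})): count
        for pair, count in co_occurrence.items()
        if query in pair
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(suggest("cheap flights"))
```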
how can we use log data for evaluation?
use clicking and browsing behaviour in addition to queries:
- click-through rate: number of clicks a document attracts
- dwell time: time spent on a document
- scrolling behaviour: how users interact with the page
- stopping information: does the user abandon the search engine after a click?
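A minimal sketch (hypothetical log format, not from the lecture) of how such signals are computed from a log, here click-through rate and average dwell time per document:

```python
# Minimal sketch: CTR and average dwell time per document from a click log.
from collections import defaultdict

# each record: (query, doc_id, clicked, dwell_time_in_seconds)
log = [
    ("q1", "d1", True, 35.0),
    ("q1", "d2", False, 0.0),
    ("q2", "d1", True, 4.0),
    ("q2", "d2", True, 120.0),
]

impressions = defaultdict(int)
clicks = defaultdict(int)
dwell = defaultdict(float)

for _, doc, clicked, dwell_time in log:
    impressions[doc] += 1
    if clicked:
        clicks[doc] += 1
        dwell[doc] += dwell_time

for doc in impressions:
    ctr = clicks[doc] / impressions[doc]
    avg_dwell = dwell[doc] / clicks[doc] if clicks[doc] else 0.0
    print(doc, f"CTR={ctr:.2f}", f"avg dwell={avg_dwell:.1f}s")
```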
what are the limitations of query logs?
- information need is unknown (can be partly deduced from previous queries)
- relevance assessments unknown (deduce from clicks + dwell time)
learning from interaction data
implicit feedback, needed if we don’t have explicit relevance assessments
assumption: when the user clicks on a result, it is relevant to them
limitations of implicit feedback
noisy: a non-relevant document might be clicked or a relevant document might not be clicked
biased: clicks happen for reasons other than relevance
- position bias: higher ranked documents get more attention
- selection bias: only interactions on retrieved documents
- presentation bias: results that are presented differently will be treated differently
what is the interpretation of a non-click? => either the document didn’t seem relevant or the user did not see the document
probabilistic model of user clicks
P(clicked(d) | relevance(d), position(d)) = P(clicked(d) | relevance(d), observed(d)) * P(observed(d) | position(d))
a document can only be clicked if the user observed it: observation depends on the position, and clicking an observed document depends on its relevance
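A worked example with made-up numbers (my own illustration): for an equally relevant document, the click probability drops with rank purely because of the observation probability.

```python
# Worked example of the decomposition: click = observed AND clicked-given-relevant.
p_observed_at_position = {1: 1.0, 2: 0.7, 3: 0.4, 4: 0.2}  # assumed position bias
p_click_given_observed_and_relevant = 0.9                  # assumed attractiveness

for position, p_obs in p_observed_at_position.items():
    p_click = p_obs * p_click_given_observed_and_relevant
    print(f"position {position}: P(click | relevant) = {p_click:.2f}")
```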
how to measure the effect of position bias?
Idea: changing the position of a document doesn’t change its relevance, so all changes in click behaviour come from the position bias
intervention in the ranking:
1. swap two documents in the ranking
2. present the modified ranking to some users (A/B test)
3. record the clicks on the document in both original and modified rankings
4. measure the probability of a document being observed based on the clicks
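A minimal sketch (made-up counts) of what step 4 looks like: the same document is shown at two positions, its relevance is unchanged, so the CTR ratio estimates the relative observation probability of the two positions.

```python
# Minimal sketch: relative observation probability from a swap experiment.
clicks_at_pos1, impressions_at_pos1 = 420, 1000   # document at position 1 (original ranking)
clicks_at_pos3, impressions_at_pos3 = 130, 1000   # same document at position 3 (swapped ranking)

ctr_pos1 = clicks_at_pos1 / impressions_at_pos1
ctr_pos3 = clicks_at_pos3 / impressions_at_pos3

# relevance is identical in both conditions, so the CTR drop is attributed to position bias
relative_propensity = ctr_pos3 / ctr_pos1
print(f"P(observed | pos 3) / P(observed | pos 1) = {relative_propensity:.2f}")
```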
how to correct for position bias?
Inverse Propensity Scoring (IPS) estimators can remove bias
Main idea: weigh clicks depending on their observation probability => clicks near the top get low weight, clicks near the bottom get high weight
formula on slide 20, lecture 11
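The slide itself is not reproduced here; as a generic illustration of the IPS idea (my reconstruction, not necessarily the exact slide formula), each click is weighted by the inverse of the probability that its position was observed:

```python
# Minimal IPS-style sketch: inverse-propensity-weighted click counts per document.
propensity = {1: 1.0, 2: 0.7, 3: 0.4, 4: 0.2}  # P(observed | position), assumed known

# hypothetical clicks: (doc_id, position at which it was clicked)
clicks = [("d1", 1), ("d2", 3), ("d2", 4), ("d1", 2)]

ips_score = {}
for doc, position in clicks:
    # a click at a rarely-observed position counts more, which removes position bias
    ips_score[doc] = ips_score.get(doc, 0.0) + 1.0 / propensity[position]

print(ips_score)  # d1: 1/1.0 + 1/0.7 ~ 2.43, d2: 1/0.4 + 1/0.2 = 7.5
```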
simulation of interaction
session simulation:
- simulate queries
- simulate clicks
- simulate user satisfaction
requires a model of the range of user behaviour:
- users do not always behave deterministically
- might make non-optimal choices
- models need to contain noise
click models
How do users examine the result list and where do they click?
cascade assumption: user examines result list from top to bottom
Dependent Click Model (DCM)
- users traverse result lists from top to bottom
- users examine each document as it is encountered
- user decides whether to click on the document or skip it
- after each clicked document the user decides whether or not to continue examining the document list
- relevant documents are more likely to be clicked than non-relevant documents
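A minimal simulation sketch of these assumptions (simplified: one continuation probability instead of a rank-dependent one, made-up attractiveness values):

```python
# Minimal sketch: simulating one user session under a (simplified) Dependent Click Model.
import random

def simulate_dcm_session(attractiveness, p_continue_after_click=0.5):
    """attractiveness[i] = probability of clicking the document at rank i+1 once examined."""
    clicks = []
    for rank, p_click in enumerate(attractiveness, start=1):
        if random.random() < p_click:              # user clicks the examined document
            clicks.append(rank)
            if random.random() >= p_continue_after_click:
                break                              # user is satisfied and stops scanning
        # after a skip (no click), the user always examines the next document
    return clicks

random.seed(0)
print(simulate_dcm_session([0.8, 0.3, 0.6, 0.1, 0.4]))  # ranks that were clicked
```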
advantages of simulation of interaction
- Investigate how the system behaves under certain behaviour
- Potentially a large amount of user data
- Relatively low cost to create and use
- Enable the exact same circumstances to be replicated, repeated, re-used
- Encapsulates our understanding of the process
disadvantages of simulation of interaction
- Models can become complex if we want to mirror realistic user behaviour
- Simulations enable us to explore many possibilities, but which ones should we explore, why, and how do we make sense of the resulting data?
- Does it represent actual user behavior/performance?
- What claims can we make? In what context?
query expansion
expand the query with additional, similar terms: easy to experiment with in a live search engine because no changes to the index are required
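A minimal sketch (the similarity source is hypothetical; it could come from embeddings, a thesaurus, or query logs) of expanding a query before it is sent to the unchanged index:

```python
# Minimal sketch: query expansion with pre-computed similar terms.
similar_terms = {
    "car": ["automobile", "vehicle"],
    "cheap": ["affordable", "budget"],
}

def expand_query(query, max_per_term=1):
    terms = query.split()
    expansion = []
    for term in terms:
        expansion.extend(similar_terms.get(term, [])[:max_per_term])
    return " ".join(terms + expansion)

print(expand_query("cheap car rental"))  # -> "cheap car rental affordable automobile"
```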
document expansion
- documents are longer than queries, so more context for a model to choose appropriate expansion terms
- can be applied at index time, and in parallel to multiple documents
Doc2Query
document expansion: train a sequence-to-sequence model that, given a text from a corpus, produces queries for which that document might be relevant
- train on relevant document-query pairs
- use model to predict relevant queries for docs
- append predicted queries to documents
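A sketch of these three steps at inference time, assuming the Hugging Face `transformers` library and the public `castorini/doc2query-t5-base-msmarco` checkpoint (both are my assumptions, not part of the lecture notes):

```python
# Sketch: Doc2Query-style document expansion with a seq2seq (T5) model.
from transformers import T5Tokenizer, T5ForConditionalGeneration

model_name = "castorini/doc2query-t5-base-msmarco"  # assumed checkpoint
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

document = "The Manhattan Project produced the first nuclear weapons during World War II."

inputs = tokenizer(document, return_tensors="pt", truncation=True)
outputs = model.generate(
    **inputs,
    max_length=64,
    do_sample=True,        # sampling gives more diverse predicted queries
    top_k=10,
    num_return_sequences=3,
)

predicted_queries = [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]
# the expanded document (original text + predicted queries) is what gets indexed
expanded_document = document + " " + " ".join(predicted_queries)
print(expanded_document)
```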
conversational search: different methods
retrieval-based: select best response from a collection of responses
generation-based: generate response in natural language
hybrid: retrieve information, then generate response
pros of retrieval-based methods
- source is transparent
- efficient
- evaluation straightforward
cons of retrieval-based methods
- answer space is limited
- potentially not fluent
- less interactive
pros of generation-based methods
- fluent and human-like
- tailored to user and input
- more interactive
cons of generation-based methods
- not necessarily factual, potentially toxic
- GPU-heavy
- evaluation is challenging
how to evaluate conversational search methods?
retrieval-based methods:
- Precision@n
- Mean Reciprocal Rank (MRR)
- Normalized Discounted Cumulative Gain (NDCG)
generation-based methods (measure word overlap):
- BLEU
- ROUGE
- METEOR
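As an example of the retrieval-based metrics, a minimal sketch of Mean Reciprocal Rank over made-up rankings (only the first relevant result per query counts):

```python
# Minimal sketch: Mean Reciprocal Rank (MRR) over a set of queries.
def mean_reciprocal_rank(rankings, relevant):
    total = 0.0
    for query, ranking in rankings.items():
        for rank, doc in enumerate(ranking, start=1):
            if doc in relevant[query]:
                total += 1.0 / rank
                break  # only the first relevant result contributes
    return total / len(rankings)

rankings = {"q1": ["d3", "d1", "d7"], "q2": ["d5", "d2"]}
relevant = {"q1": {"d1"}, "q2": {"d9"}}
print(mean_reciprocal_rank(rankings, relevant))  # (1/2 + 0) / 2 = 0.25
```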
challenges in conversational search
- coreference issues (referring back to earlier concepts)
- dependence on previous user and system turns
- explicit feedback
- topic-switching user behaviour
ConvPR
Conversational Passage Ranking: user asks a question, model retrieves a relevant passage from a collection
methods: encoder and retrieval models (fine-tune on conversational data)
challenges of conversational search
- logical self-consistency: semantic coherence and internal logic
- safety, transparency, controllability: difficult to control the output of a generative model (could lead to hate speech)
- efficiency: time and memory-consuming training and inference