W9 Neural IR 2 Flashcards
why would we want dense retrieval?
sometimes we need exact matching, but often we also want inexact matching of documents to queries: if we only use exact matching in the 1st stage, we might miss relevant documents
what is dense retrieval?
neural first-stage retrieval, i.e. retrieval using embeddings
- bi-encoder architecture: encoding the query and document independently, then computing the relevance
bi-encoder architecture: 3 steps
1. generate a representation of the query that captures the information need
2. generate a representation of the document that captures the information contained
3. match the query and the document representations to estimate their mutual relevance
how do we measure the relevance between query and document?
use a similarity function (e.g. dot product or cosine similarity) to compute the similarity between the query and document representation vectors
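A minimal sketch (toy vectors, not from the lecture) of two common choices of similarity function, dot product and cosine similarity:

```python
import numpy as np

def dot_score(q_vec, d_vec):
    # inner product of the two representation vectors
    return float(np.dot(q_vec, d_vec))

def cosine_score(q_vec, d_vec):
    # dot product normalized by the vector lengths
    return float(np.dot(q_vec, d_vec) /
                 (np.linalg.norm(q_vec) * np.linalg.norm(d_vec)))

q = np.array([0.2, 0.7, 0.1])   # toy query embedding
d = np.array([0.3, 0.6, 0.0])   # toy document embedding
print(dot_score(q, d), cosine_score(q, d))
```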
what are the 4 differences between cross-encoders and bi-encoders?
1. cross: one encoder for q and d / bi: separate encoders for q and d
2. cross: full interaction between words in q and d / bi: no interaction between words in q and d
3. cross: higher quality ranker / bi: lower quality ranker
4. cross: only possible in re-ranking / bi: highly efficient (also usable in 1st stage)
Sentence-BERT
commonly used bi-encoder, originally designed for sentence similarity but can be used for q,d pairs
it is a pointwise model, because we only take one d into account per learning item. At inference we measure the similarity between q and each d and then sort the docs by this similarity
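A small usage sketch, assuming the sentence-transformers package; the model name "all-MiniLM-L6-v2" is just an example checkpoint, and the texts are toy data:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")   # example checkpoint (assumption)

query = "neural first-stage retrieval"
docs = [
    "Dense retrieval encodes queries and documents independently.",
    "The weather will be sunny tomorrow.",
]

q_emb = model.encode(query, convert_to_tensor=True)   # one vector for the query
d_embs = model.encode(docs, convert_to_tensor=True)   # one vector per document

scores = util.cos_sim(q_emb, d_embs)        # pointwise similarity per document
ranking = scores.argsort(descending=True)   # sort docs by similarity
print(scores, ranking)
```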
what is the goal of training bi-encoders?
given the similarity function, the similarity between the 2 vectors is maximized for docs relevant to q and minimized for docs non-relevant to q
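One possible sketch of this objective, using a margin (triplet) loss on random toy embeddings; the exact loss function varies per model:

```python
import torch
import torch.nn.functional as F

def triplet_loss(q, d_pos, d_neg, margin=1.0):
    s_pos = F.cosine_similarity(q, d_pos, dim=-1)   # similarity to relevant docs
    s_neg = F.cosine_similarity(q, d_neg, dim=-1)   # similarity to non-relevant docs
    # zero loss once the relevant doc beats the non-relevant one by `margin`
    return torch.clamp(margin - (s_pos - s_neg), min=0).mean()

q = torch.randn(8, 128)       # toy batch of query embeddings
d_pos = torch.randn(8, 128)   # embeddings of relevant documents
d_neg = torch.randn(8, 128)   # embeddings of non-relevant documents
print(triplet_loss(q, d_pos, d_neg))
```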
why are bi-encoders less effective than cross-encoders?
cross-encoders can learn relevance signals from attention between the query and candidate texts at each transformer encoder layer
ColBERT
proposed as a model that has the effectiveness of cross-encoders and the efficiency of bi-encoders
- compatible with nearest neighbour search techniques
what is nearest-neighbour search?
finding which document embedding vectors are most similar to the query embedding vector
computing the similarity for every q,d pair exhaustively is not scalable => approximate nearest-neighbour (ANN) search
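A sketch of ANN retrieval, assuming the faiss library; the HNSW index is just one ANN option and all vectors here are random toy data:

```python
import numpy as np
import faiss

dim = 128
doc_embs = np.random.rand(10000, dim).astype("float32")   # toy document embeddings
q_emb = np.random.rand(1, dim).astype("float32")          # toy query embedding

index = faiss.IndexHNSWFlat(dim, 32)   # graph-based ANN index, 32 neighbours per node
index.add(doc_embs)                    # index all document embeddings offline

dists, doc_ids = index.search(q_emb, 10)   # approximate top-10 by L2 distance
print(doc_ids[0])
```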
similarity in ColBERT
the similarity between q and d is the sum, over query terms, of the maximum cosine similarity between each query term and the best-matching term in d
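A sketch of this MaxSim scoring on random per-term embeddings (toy dimensions, not ColBERT's actual encoder):

```python
import torch
import torch.nn.functional as F

def maxsim_score(q_embs, d_embs):
    # q_embs: (num_query_terms, dim), d_embs: (num_doc_terms, dim)
    q_embs = F.normalize(q_embs, dim=-1)   # unit length, so dot product = cosine
    d_embs = F.normalize(d_embs, dim=-1)
    sim = q_embs @ d_embs.T                # similarity of every query term to every doc term
    return sim.max(dim=1).values.sum()     # best doc-term match per query term, summed

q_embs = torch.randn(5, 128)    # toy per-term query embeddings
d_embs = torch.randn(40, 128)   # toy per-term document embeddings
print(maxsim_score(q_embs, d_embs))
```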
ColBERT: training loss
L(q, d+, d-) = -log( e^{s(q,d+)} / (e^{s(q,d+)} + e^{s(q,d-)}) )
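The same pairwise softmax cross-entropy loss computed on toy scores, to make the formula concrete:

```python
import torch
import torch.nn.functional as F

s_pos = torch.tensor(4.2)   # toy score s(q, d+)
s_neg = torch.tensor(1.3)   # toy score s(q, d-)

loss = -torch.log(torch.exp(s_pos) / (torch.exp(s_pos) + torch.exp(s_neg)))
# equivalent, numerically stabler form: softplus of the score difference
loss_stable = F.softplus(s_neg - s_pos)
print(loss, loss_stable)
```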
challenges of long documents
memory burden of reading the whole document in the encoder
long documents mix many topics, so query matches may be spread across the document
the neural model must aggregate the relevant matches from different parts (see the sketch below)
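One common workaround, sketched with toy embeddings: split the document into passages, score each passage against the query, and aggregate the passage scores (here with max-pooling); this is an illustration, not the lecture's specific method:

```python
import numpy as np

def passage_scores(query_emb, passage_embs):
    # one dot-product score per passage
    return passage_embs @ query_emb

def doc_score(query_emb, passage_embs):
    # take the best-matching passage as the document's score
    return float(passage_scores(query_emb, passage_embs).max())

query_emb = np.random.rand(128)          # toy query embedding
passage_embs = np.random.rand(12, 128)   # toy embeddings of 12 passages of one document
print(doc_score(query_emb, passage_embs))
```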
challenges of short documents
fewer query matches to work with
but neural models are more robust to the vocabulary mismatch problem than term-based matching models
what is the long-tail problem?
a good IR method must be able to retrieve
infrequently searched-for documents and perform reasonably well on queries with rare terms