C9: neural IR Flashcards

1
Q

why would we want dense retrieval?

A

sometimes we need exact matching, but often we also want inexact matching of documents to queries: if we use only exact matching in the 1st stage, we might miss relevant documents that express the query's intent with different terms

2
Q

what is dense retrieval?

A

neural first-stage retrieval, i.e. retrieval using dense embeddings
- bi-encoder architecture: encode the query and the document independently, then compute the relevance from the two representations

3
Q

bi-encoder architecture: 3 steps

A
  1. generate a representation of the query that captures the information need
  2. generate a representation of the document that captures the information contained
  3. match the query and the document representations to estimate their mutual relevance
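The three steps can be sketched in plain Python; the bag-of-words `encode` function and tiny vocabulary below are toy stand-ins for the real neural encoders:

```python
# Toy stand-in "encoder": a real bi-encoder would use a neural network
# (e.g. BERT) to map text to a dense vector. Here steps 1 and 2 are
# faked with term counts over a tiny illustrative vocabulary.
VOCAB = ["neural", "retrieval", "dense", "cat"]

def encode(text: str) -> list[float]:
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Step 3: match the two representations to estimate relevance.
def relevance(query: str, doc: str) -> float:
    return dot(encode(query), encode(doc))
```

The key property is that `encode` sees only one text at a time, so document representations can be precomputed and indexed offline.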
4
Q

how do we measure the relevance between query and document?

A

use a similarity function (e.g. the dot product or cosine similarity) to compare the query and document representation vectors
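For example, cosine similarity is a common choice for this function (a minimal sketch over plain lists):

```python
import math

def cosine(u, v):
    # Cosine similarity: dot product of the vectors divided by the
    # product of their Euclidean norms; ranges in [-1, 1].
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)
```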

5
Q

what are the 4 differences between cross-encoders and bi-encoders?

A
  1. cross: one encoder for q and d
    bi: separate encoders for q and d
  2. cross: full interaction between words in q and d
    bi: no interaction between words in q and d
  3. cross: higher quality ranker than bi
  4. cross: only possible in re-ranking
    bi: highly efficient (also in 1st stage)
6
Q

Sentence-BERT

A

commonly used bi-encoder, originally designed for sentence similarity but can be used for q,d pairs

it is a pointwise model, because we only take one d into account per learning item. At inference we measure the similarity between q and each d and then sort the docs by this similarity
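The pointwise inference step can be sketched as follows (the dot product stands in for whatever similarity function the model was trained with):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rank_documents(q_vec, doc_vecs):
    # Pointwise inference: score each document independently against
    # the query, then sort documents by that score (highest first).
    scores = [(dot(q_vec, d), i) for i, d in enumerate(doc_vecs)]
    return [i for _, i in sorted(scores, reverse=True)]
```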

7
Q

what is the goal of training bi-encoders?

A

given the similarity function, the encoders are trained so that the similarity between the two vectors is maximized for documents relevant to q and minimized for documents non-relevant to q

8
Q

why are bi-encoders less effective than cross-encoders?

A

cross-encoders can learn relevance signals from attention between the query and candidate texts at each transformer encoder layer

9
Q

ColBERT

A

proposed as a model that has the effectiveness of cross-encoders and the efficiency of bi-encoders
- compatible with nearest neighbour search techniques

10
Q

what is nearest-neighbour search?

A

finding which document embedding vectors are most similar to the query embedding vector

computing the similarity for every (q, d) pair exhaustively is not scalable => approximate nearest-neighbour (ANN) search
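A sketch of the exhaustive baseline that ANN search replaces (dot product as the assumed similarity; real systems use libraries such as FAISS or HNSW indexes for the approximate version):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def exact_top_k(q_vec, doc_vecs, k):
    # Exact nearest-neighbour search: score every document vector,
    # O(N) similarity computations per query. ANN methods avoid this
    # exhaustive scan at the cost of a small loss in accuracy.
    order = sorted(range(len(doc_vecs)),
                   key=lambda i: dot(q_vec, doc_vecs[i]),
                   reverse=True)
    return order[:k]
```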

11
Q

similarity in ColBERT

A

the similarity between d and q is the sum of maximum cosine similarities: compute the similarity of each query term embedding to every document term embedding, take the maximum over document terms for each query term, and sum these maxima (the MaxSim operator)
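This MaxSim scoring can be sketched directly from the definition (toy embedding lists stand in for the real contextualized token embeddings):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def maxsim(query_embs, doc_embs):
    # For each query term embedding, find the best-matching document
    # term embedding (max cosine similarity), then sum over query terms.
    return sum(max(cosine(q, d) for d in doc_embs) for q in query_embs)
```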

12
Q

ColBERT: query time

A
  1. query term embedding
  2. 1st stage: the top-k texts from the corpus are retrieved for each query embedding
  3. 2nd stage: k candidate texts are scored using all query token representations according to the MaxSim operator and then ranked
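The two-stage query procedure can be sketched end-to-end in plain Python; the dot product stands in for the learned similarity, and the candidate-gathering loop is a simplification of the real ANN-based 1st stage:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def colbert_query(q_embs, docs, k):
    # docs: list of documents, each a list of token embeddings.
    # 1st stage (simplified): for each query embedding, take the k docs
    # with the highest best-token similarity as candidates.
    candidates = set()
    for q in q_embs:
        best = sorted(range(len(docs)),
                      key=lambda i: max(dot(q, t) for t in docs[i]),
                      reverse=True)[:k]
        candidates.update(best)
    # 2nd stage: score every candidate with the MaxSim operator and rank.
    def maxsim(d):
        return sum(max(dot(q, t) for t in d) for q in q_embs)
    return sorted(candidates, key=lambda i: maxsim(docs[i]), reverse=True)
```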
13
Q

ColBERT loss function

A

L(q, d+, d-) = -log( exp(s_{q,d+}) / ( exp(s_{q,d+}) + exp(s_{q,d-}) ) )
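A numeric sketch of this pairwise softmax cross-entropy, where s_{q,d} denotes the model's score for a (query, document) pair:

```python
import math

def pairwise_loss(s_pos, s_neg):
    # L = -log( exp(s_pos) / (exp(s_pos) + exp(s_neg)) ):
    # the loss shrinks as the positive document outscores the negative.
    return -math.log(math.exp(s_pos) / (math.exp(s_pos) + math.exp(s_neg)))
```

With equal scores the loss is log 2; widening the margin between d+ and d- drives it toward 0.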

14
Q

challenges of long documents

A
  • memory burden of reading the whole document in the encoder
  • mixture of many topics, query matches may be spread
  • neural model must aggregate the relevant matches from different parts
15
Q

challenges of short documents

A
  • fewer query matches
  • but a neural model is more robust to the vocabulary mismatch problem than term-based matching models
16
Q

what is the long-tail problem?

A

a good IR method must be able to retrieve infrequently searched-for documents and perform reasonably well on queries with rare terms