IR - Neural Re-Ranking Flashcards

Lecture 6

1
Q

What is re-ranking?

A

It is re-ordering an already retrieved list of candidate documents, typically with a more accurate (but slower) model.

2
Q

What is the normal workflow of relevant document retrieval?

A

Before anything, we do some preprocessing:
- Create an inverted index on the corpus of documents.
- Then, based on the user query, a first-stage ranker ranks all documents and retrieves the top 1000. This ranker should be fast, but it is usually not very accurate.
- These 1000 documents are then fed, together with the query, into a second-stage re-ranker that returns the top 10. The second-stage re-ranker is usually slower (it can be neural, for example) but more accurate.
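
A minimal sketch of this two-stage pipeline, assuming generic scorer callables; the function names, toy documents, and stand-in scorers are illustrative, not part of the lecture:

```python
# Sketch of the two-stage workflow with hypothetical scorer callables.

def rank(query, doc_ids, scorer, k):
    """Score every candidate document with scorer(query, doc_id) and keep the top k."""
    return sorted(doc_ids, key=lambda d: scorer(query, d), reverse=True)[:k]

def retrieve(query, all_doc_ids, fast_scorer, accurate_scorer):
    # Stage 1: cheap first-stage ranker (e.g. BM25 over an inverted index) -> top 1000.
    candidates = rank(query, all_doc_ids, fast_scorer, k=1000)
    # Stage 2: slower but more accurate re-ranker (e.g. a neural model) -> top 10.
    return rank(query, candidates, accurate_scorer, k=10)

# Toy usage with made-up documents and stand-in scorers:
docs = {1: "neural ranking models", 2: "cooking pasta", 3: "bert for ranking"}
fast = lambda q, d: len(set(q.split()) & set(docs[d].split()))   # term overlap
slow = lambda q, d: fast(q, d) + 0.1 * len(docs[d].split())      # pretend "neural" score
print(retrieve("neural ranking", list(docs), fast, slow))        # e.g. [1, 3, 2]
```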

3
Q

What is the core of neural re-ranking models?

A

It is the matching module: it estimates how relevant a specific document is for a given query. It outputs a score, and the re-ranker orders documents based on this score.

Both the documents and the query have to be encoded; then some feature extraction is applied, and afterwards a matching step produces the relevance score.
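
A rough PyTorch skeleton of that encode → feature extraction → matching pipeline; the concrete layer choices (mean pooling, a bilinear matching layer) are my own simplifications, not the lecture's architecture:

```python
import torch
import torch.nn as nn

class MatchingReRanker(nn.Module):
    """Toy re-ranker: encode query and document, extract features, match -> score."""
    def __init__(self, vocab_size=30000, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)   # shared encoder for query and document
        self.extract = nn.Linear(dim, dim)           # stand-in feature extraction
        self.match = nn.Bilinear(dim, dim, 1)        # matching module -> one relevance score

    def forward(self, query_ids, doc_ids):
        q = self.extract(self.embed(query_ids).mean(dim=1))  # pooled query representation
        d = self.extract(self.embed(doc_ids).mean(dim=1))    # pooled document representation
        return self.match(q, d).squeeze(-1)                  # score per (query, document) pair

# scores = MatchingReRanker()(torch.randint(0, 30000, (4, 8)), torch.randint(0, 30000, (4, 64)))
```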

4
Q

How are neural re-rankers trained?

A

Same as dense retrieval models:

Re-rankers can be trained independently of the production system (trained once and done), but training can be repeated if the data distribution shifts.

They are usually trained with triplets: Q, P+, P-. These are converted to embeddings, and a loss function is minimized (for example, one that maximizes the margin between relevant and non-relevant documents).

Training is done end-to-end, but we could freeze some parts for more efficient training.
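
A small sketch of one such training loop with a margin loss over (Q, P+, P-) triplets; the tiny bilinear scorer and the random embeddings are placeholders for a real re-ranker and real data:

```python
import torch
import torch.nn as nn

scorer = nn.Bilinear(64, 64, 1)                     # placeholder score(q_emb, p_emb) -> scalar
optimizer = torch.optim.Adam(scorer.parameters(), lr=1e-3)
loss_fn = nn.MarginRankingLoss(margin=1.0)          # enforce a margin between P+ and P-

for step in range(100):
    q     = torch.randn(32, 64)                     # query embeddings (Q)
    p_pos = torch.randn(32, 64)                     # relevant passage embeddings (P+)
    p_neg = torch.randn(32, 64)                     # non-relevant passage embeddings (P-)
    s_pos = scorer(q, p_pos).squeeze(-1)
    s_neg = scorer(q, p_neg).squeeze(-1)
    # target = 1 means "s_pos should be ranked above s_neg"
    loss = loss_fn(s_pos, s_neg, torch.ones_like(s_pos))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```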

5
Q

How is the evaluation of re-rankers performed?

A

Scoring is done per tuple (1 query, 1 document). Then the list of tuples for each query is sorted by score and evaluated with a ranking metric such as MRR@10.
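
A minimal MRR@10 computation, assuming that for every query we already have a list of (score, is_relevant) tuples; the data layout and the toy numbers are illustrative:

```python
def mrr_at_10(per_query_tuples):
    """per_query_tuples: one list of (score, is_relevant) tuples per query."""
    total = 0.0
    for pairs in per_query_tuples:
        ranked = sorted(pairs, key=lambda p: p[0], reverse=True)[:10]
        for rank, (_, is_relevant) in enumerate(ranked, start=1):
            if is_relevant:
                total += 1.0 / rank   # reciprocal rank of the first relevant document
                break
    return total / len(per_query_tuples)

# Two queries, first relevant document at rank 2 and rank 1 -> (1/2 + 1) / 2 = 0.75
print(mrr_at_10([[(0.9, False), (0.8, True)], [(0.7, True), (0.1, False)]]))
```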

6
Q

Why can’t we compare training loss and IR evaluation (like MRR - Mean Reciprocal Rank)?

A

The loss is only useful at the beginning of training, to check that the network is learning and moving in the right direction; after some time it converges and barely changes. MRR is more fine-grained: it looks at the top 10 documents, for example, not at the margin between relevant and non-relevant documents.

7
Q

Explain the match matrix (MatchPyramid) approach in ranking.

A

First, we embed the query and the document. This can be a word embedding (like word2vec) or BERT, which gives an embedding for each token in the text.

Then we build the match matrix: a similarity (such as cosine similarity) between every pair of query-term embedding and document-term embedding.

This matrix can be seen as an image, and we apply convolution and pooling to extract features from it (like in CNNs). The process highly depends on the configuration: how many convolutional layers, the kernel sizes, which pooling operations, etc. Each convolution/pooling stage reduces the matrix size and compresses the information (extracting features). After some number of conv/pooling layers, an MLP produces the final score.
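
A condensed PyTorch sketch of this pipeline; the exact layer count, kernel size and pooling choice below are arbitrary placeholders for the configuration-dependent parts:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MatchPyramid(nn.Module):
    def __init__(self, vocab_size=30000, dim=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.conv = nn.Conv2d(1, 8, kernel_size=3, padding=1)  # extract local match patterns
        self.pool = nn.AdaptiveMaxPool2d((4, 4))                # pool down to a fixed grid
        self.mlp = nn.Sequential(nn.Linear(8 * 4 * 4, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, query_ids, doc_ids):
        q = F.normalize(self.embed(query_ids), dim=-1)          # (batch, Lq, dim)
        d = F.normalize(self.embed(doc_ids), dim=-1)            # (batch, Ld, dim)
        match = torch.bmm(q, d.transpose(1, 2)).unsqueeze(1)    # cosine match matrix as an "image"
        feats = self.pool(F.relu(self.conv(match))).flatten(1)  # CNN feature extraction
        return self.mlp(feats).squeeze(-1)                      # final relevance score

# score = MatchPyramid()(torch.randint(0, 30000, (2, 8)), torch.randint(0, 30000, (2, 60)))
```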

8
Q

Explain BERTcat

A

BERTcat = BERT + concatenation

The idea is to combine a query and a document into one concatenated input (CLS + query + SEP + document), feed it to BERT, and extract the embedding/vector of the CLS token. This vector is fed into a feed-forward network, and the output is the relevance score between the document and the query.

This process has to be done (in real time) for the query and every candidate passage. That is why it is not used for initial ranking, but only for re-ranking.

Poor interpretability (we only see a score, it is a black box) and slower execution time, but higher accuracy.
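
A sketch of a BERTcat-style scorer using Hugging Face transformers; the linear scoring head here is untrained, so real use would require fine-tuning on relevance data, and the model name is just a common default:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
score_head = torch.nn.Linear(encoder.config.hidden_size, 1)     # feed-forward scoring layer

def bertcat_score(query: str, passage: str) -> float:
    # The tokenizer builds the concatenated input: [CLS] query [SEP] passage [SEP]
    inputs = tokenizer(query, passage, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        cls_vec = encoder(**inputs).last_hidden_state[:, 0]     # vector of the CLS token
    return score_head(cls_vec).item()                           # relevance score for (q, d)

# print(bertcat_score("what is re-ranking", "re-ranking re-orders a retrieved candidate list"))
```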

9
Q

Explain the mono-duo pattern in document ranking

A

There are two stages:

  1. Mono: a pointwise score per document, score(q, d). If we have 1000 candidate documents, we run this for all 1000 and rank them.
  2. Duo: in the second step, we re-rank these documents to get the top 50. We ask “Is d1 more relevant than d2?”, i.e. score(q, d1, d2), which has to be computed 50 * 50 times.

The logic is that this pairwise comparison is closer to how humans would judge documents.
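
A sketch of the mono-duo pattern with hypothetical callables mono(q, d) and duo(q, d1, d2); aggregating the pairwise scores by summing "wins" is one simple choice, not necessarily the lecture's:

```python
def mono_stage(query, doc_ids, mono, k=1000):
    # Pointwise scores score(q, d); keep the best k candidates.
    return sorted(doc_ids, key=lambda d: mono(query, d), reverse=True)[:k]

def duo_stage(query, doc_ids, duo, k=50):
    # Pairwise comparisons "is d1 more relevant than d2?" over the top-k candidates (k * k calls).
    candidates = doc_ids[:k]
    wins = {d: 0.0 for d in candidates}
    for d1 in candidates:
        for d2 in candidates:
            if d1 != d2:
                wins[d1] += duo(query, d1, d2)   # e.g. probability that d1 beats d2
    return sorted(candidates, key=wins.get, reverse=True)

# final_ranking = duo_stage(q, mono_stage(q, all_candidates, mono_model), duo_model)
```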

10
Q

What is query or document expansion?

A

When doing information retrieval or ranking, you add similar words to the query (or document) to enrich the search so that it matches better.
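
A toy illustration of query expansion; the synonym table and the example query are made up:

```python
# Append related terms to the query before matching (made-up expansion table).
expansion_terms = {"car": ["automobile", "vehicle"], "cheap": ["affordable", "budget"]}

def expand_query(query: str) -> str:
    words = query.split()
    extra = [t for w in words for t in expansion_terms.get(w, [])]
    return " ".join(words + extra)

print(expand_query("cheap car insurance"))
# -> "cheap car insurance affordable budget automobile vehicle"
```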

11
Q

How would you do ranking for long documents?

A
  1. Truncate the document so that query + document fit into ~512 input tokens. Works well only if the relevant content is at the beginning.
  2. Slide a window over the whole document, score each window, and take the maximum score over all parts (sketched below).
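
A sketch of option 2, assuming a generic scorer(query, passage) callable such as a BERTcat model; the window and stride sizes are placeholders:

```python
def score_long_document(query, doc_tokens, scorer, window=400, stride=200):
    """Score overlapping windows of the document and return the maximum score."""
    scores = []
    for start in range(0, max(1, len(doc_tokens) - window + stride), stride):
        passage = " ".join(doc_tokens[start:start + window])
        scores.append(scorer(query, passage))
    return max(scores)   # the document counts as relevant as its best window

# Toy usage with a term-overlap scorer standing in for a neural re-ranker:
overlap = lambda q, p: len(set(q.split()) & set(p.split()))
long_doc = ("filler " * 900 + "neural ranking").split()
print(score_long_document("neural ranking", long_doc, overlap))   # -> 2
```
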
12
Q

How can we increase the efficiency (reduce the latency) of ranking?

A
  1. Reduce the model size: use knowledge distillation or simply train a smaller model.
  2. Move computation away from query time by precomputing. For example, document embeddings can be precomputed offline and looked up from an index at query time.
  3. PreTTR: Split the BERT model. The first n layers can be precomputed for the passages, and only the remaining layers are run at query time, combining the precomputed passage representations with the query.

Pro: about the same quality as BERTcat. Con: latency is still not very low, and it adds storage requirements.

  4. ColBERT: Again, split the computation between query and passages (unlike BERTcat's joint input). Precompute the full BERT token embeddings of the passages and store them. At query time, run BERT only for the query, then build the match matrix between each query term and each passage term, take the max over passage terms and sum over query terms to get the score (sketched below).

Pro: very fast at query time. Con: a lot of storage is needed.
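
A small sketch of the ColBERT-style scoring step (max over passage terms, sum over query terms); the random tensors below stand in for real, precomputed BERT token embeddings:

```python
import torch
import torch.nn.functional as F

def colbert_score(query_emb, passage_emb):
    """query_emb: (Lq, dim), passage_emb: (Lp, dim), both L2-normalised token embeddings."""
    sim = query_emb @ passage_emb.T        # match matrix over all query/passage term pairs
    return sim.max(dim=1).values.sum()     # max over passage terms, summed over query terms

# Passage token embeddings are precomputed offline and stored (hence the storage cost):
passage_index = {
    "doc1": F.normalize(torch.randn(80, 128), dim=-1),
    "doc2": F.normalize(torch.randn(120, 128), dim=-1),
}
# At query time only the query is run through BERT (random stand-in here):
query_emb = F.normalize(torch.randn(6, 128), dim=-1)
scores = {doc: colbert_score(query_emb, emb).item() for doc, emb in passage_index.items()}
print(sorted(scores, key=scores.get, reverse=True))
```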
