Question Answering Flashcards
What is IR-based QA?
Information-retrieval-based question answering. Given a question, it retrieves relevant documents from a large corpus (e.g. the web), and Machine Reading Comprehension (MRC) extracts the answer from spans of text in those documents.
What is knowledge-based QA?
It maps the question to a query over a knowledge base (a structured database of facts such as DBpedia) and executes that query to get the answer.
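A minimal sketch of what mapping a question to a knowledge-base query can look like, assuming the SPARQLWrapper package and DBpedia's public endpoint; here the question "Where was Ada Lovelace born?" has already been mapped (by hand, for illustration) to a query over the dbo:birthPlace property:

```python
# Hypothetical illustration: the parsed question becomes a SPARQL query over DBpedia.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dbr: <http://dbpedia.org/resource/>
    SELECT ?place WHERE {
        dbr:Ada_Lovelace dbo:birthPlace ?place .
    }
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["place"]["value"])  # a DBpedia resource URI for the birthplace
```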
What are some other types of QA?
Long-form QA, which handles 'why'-type questions that need long answers
Community QA, which uses question-answer pairs from sources like Quora and Stack Overflow
What is the concept of IR-based QA?
Given a question, return an answer drawn from text spans within a corpus of web documents
What model does IR-based factoid QA use?
Retrieve and Read model
What does the retrieve and read model do?
It first retrieves documents relevant to the query from an index, then uses MRC (the reader) to select the best answer span from those documents.
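A sketch of the retrieve-and-read flow, using a simple TF-IDF retriever from scikit-learn as a stand-in for the index and a hypothetical read_span() function as a placeholder for the MRC reader described in the later cards:

```python
# Retrieve-and-read sketch: TF-IDF stands in for the document index,
# read_span() stands in for the MRC reader.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "Ada Lovelace was born in London in 1815.",
    "The Eiffel Tower is located in Paris.",
    "Python was created by Guido van Rossum.",
]

def retrieve(question, docs, k=1):
    """Return the k documents most similar to the question under TF-IDF."""
    vectorizer = TfidfVectorizer()
    doc_vecs = vectorizer.fit_transform(docs)
    q_vec = vectorizer.transform([question])
    scores = cosine_similarity(q_vec, doc_vecs)[0]
    ranked = scores.argsort()[::-1][:k]
    return [docs[i] for i in ranked]

def read_span(question, passage):
    """Placeholder reader; a real system runs a span-extraction model here."""
    return passage  # return the whole passage as a stand-in answer

question = "Where was Ada Lovelace born?"
for passage in retrieve(question, corpus):
    print(read_span(question, passage))
```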
What are some MRC datasets?
SQuAD 1.1, 2.0
HotpotQA
Natural Questions
TyDi QA
How does MRC work?
It performs answer extraction, which is a span-labelling task. The input is a question paired with a passage. For every passage token the model produces two scores: one for whether the token is the start of the answer span and one for whether it is the end.
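A minimal sketch of span extraction with a SQuAD-fine-tuned BERT reader via Hugging Face transformers (the checkpoint name is one of several publicly available options and is an assumption here):

```python
# Span extraction: the model scores every token as a possible answer start and end.
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

checkpoint = "deepset/bert-base-cased-squad2"  # assumed SQuAD2-fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForQuestionAnswering.from_pretrained(checkpoint)

question = "Where was Ada Lovelace born?"
passage = "Ada Lovelace was born in London in 1815 and worked with Charles Babbage."

inputs = tokenizer(question, passage, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)  # outputs.start_logits / end_logits: one score per token

start = int(outputs.start_logits.argmax())
end = int(outputs.end_logits.argmax())
answer_ids = inputs["input_ids"][0][start : end + 1]
print(tokenizer.decode(answer_ids))  # expected: "London"
```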
How is MRC encoded?
The question and passage are fed to a standard BERT model as one sequence ([CLS] question [SEP] passage [SEP]). We take the final hidden vectors T_i for the passage tokens; a learned span-start embedding S and span-end embedding E score each token as a possible start or end of the answer.
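A sketch of that scoring head, assuming the final hidden states T from BERT are already computed (shape [seq_len, hidden]); S and E are the learned span-start and span-end embeddings:

```python
# Scoring head on top of BERT's final hidden states.
# T: [seq_len, hidden] final hidden vectors; S, E: learned start/end embeddings.
import torch
import torch.nn as nn

hidden_size, seq_len = 768, 20
T = torch.randn(seq_len, hidden_size)        # stand-in for BERT's output
S = nn.Parameter(torch.randn(hidden_size))   # span-start embedding
E = nn.Parameter(torch.randn(hidden_size))   # span-end embedding

start_logits = T @ S   # one start score per token
end_logits = T @ E     # one end score per token

start_probs = torch.softmax(start_logits, dim=0)
end_probs = torch.softmax(end_logits, dim=0)
```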
What loss function is used for MRC?
Cross-entropy loss over the start position plus cross-entropy loss over the end position (the negative log-likelihood of the gold start and end tokens).
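A sketch of that loss with PyTorch, using stand-in logits in place of the span head's output and hypothetical gold positions:

```python
# Training loss: cross-entropy over the start position plus cross-entropy over the end position.
import torch
import torch.nn.functional as F

start_logits = torch.randn(20)   # [seq_len] scores from the span head
end_logits = torch.randn(20)
gold_start, gold_end = torch.tensor(5), torch.tensor(7)  # gold answer span positions

loss = (
    F.cross_entropy(start_logits.unsqueeze(0), gold_start.unsqueeze(0))
    + F.cross_entropy(end_logits.unsqueeze(0), gold_end.unsqueeze(0))
)
```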
How do we find the best span score for MRC?
The score of a candidate span (i, j) is the dot product of the span-start embedding S with the token representation T_i, plus the dot product of the span-end embedding E with T_j. We compute this for all valid spans and take the argmax.
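A sketch of picking the best span from the start and end scores, keeping only spans where the start does not come after the end and capping the span length:

```python
# Best-span selection: score(i, j) = start_logits[i] + end_logits[j], restricted to i <= j.
import torch

start_logits = torch.randn(20)
end_logits = torch.randn(20)
max_answer_len = 30

best_score, best_span = float("-inf"), (0, 0)
for i in range(len(start_logits)):
    for j in range(i, min(i + max_answer_len, len(end_logits))):
        score = float(start_logits[i] + end_logits[j])
        if score > best_score:
            best_score, best_span = score, (i, j)

print(best_span, best_score)
```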
What do we need to ensure with the start and end embeddings?
That the start position is never after the end position (i ≤ j); spans violating this are discarded.
What happens when there are no-answer questions?
The special [CLS] token is used as a proxy span for "no answer": if the highest-scoring span starts and ends at [CLS], the model predicts that the question has no answer in the passage.
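A sketch of the no-answer check used with SQuAD 2.0-style models: the score of the "null" span (start and end at the [CLS] token, index 0) is compared to the best non-null span, optionally with a threshold (the values below are placeholders):

```python
# No-answer handling: compare the [CLS] ("null") span score to the best real span score.
import torch

start_logits = torch.randn(20)
end_logits = torch.randn(20)
best_score, best_span = 4.2, (5, 7)        # best non-null span, as found in the previous sketch
null_score = float(start_logits[0] + end_logits[0])  # [CLS] sits at position 0
no_answer_threshold = 0.0                  # typically tuned on a dev set

if null_score - best_score > no_answer_threshold:
    print("no answer")
else:
    print("answer span:", best_span)
```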
What happens when the passage is longer than the BERT input limit?
BERT's input is limited to 512 tokens, so a sliding window of up to 512 tokens is moved over longer passages and the model is run on each window.
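A sketch of the sliding-window idea: the tokenized passage is split into overlapping windows (the stride controls the overlap), the reader is run on each window, and the highest-scoring span overall is kept:

```python
# Sliding window over a passage that exceeds BERT's 512-token limit.
def sliding_windows(tokens, window=512, stride=256):
    """Yield overlapping token windows so an answer near a boundary is not cut in half."""
    start = 0
    while True:
        yield tokens[start : start + window]
        if start + window >= len(tokens):
            break
        start += stride

long_passage_tokens = ["tok"] * 1300          # stand-in for a tokenized long document
for chunk in sliding_windows(long_passage_tokens):
    pass  # run the reader on each chunk and keep the best-scoring span overall
```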
Where is IR based factual QA likely to fail?
Where answers are rooted in the deep web, so structured databases have to be consulted rather than text documents.