Question Answering Flashcards
What is IR-based QA?
Information-retrieval-based question answering. Given a question, it retrieves relevant documents from a large corpus (e.g. the web), and Machine Reading Comprehension (MRC) extracts the answer from spans of text in those documents.
What is knowledge-based QA?
It maps the question to a query over a knowledge base (a structured database of facts such as DBpedia) and executes that query to get the answer.
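A minimal sketch of what mapping a question to a knowledge-base query can look like, assuming the SPARQLWrapper package and DBpedia's public endpoint; here the question "Where was Ada Lovelace born?" has already been mapped (by hand, for illustration) to a query over the dbo:birthPlace property:

```python
# Hypothetical illustration: the parsed question becomes a SPARQL query over DBpedia.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dbr: <http://dbpedia.org/resource/>
    SELECT ?place WHERE {
        dbr:Ada_Lovelace dbo:birthPlace ?place .
    }
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["place"]["value"])  # a DBpedia resource URI for the birthplace
```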
What are some other types of QA?
Long-form QA, which handles 'why'-type questions that need long answers
Community QA, which uses question-answer pairs from sources like Quora and Stack Overflow
What is the concept of IR-based QA?
Given a question, return an answer drawn from text spans within a corpus of web documents
What model does IR-based factoid QA use?
Retrieve and Read model
What does the retrieve and read model do?
It first retrieves documents relevant to the query from an index, then uses MRC (the reader) to select the best answer span from those documents.
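A sketch of the retrieve-and-read flow, using a simple TF-IDF retriever from scikit-learn as a stand-in for the index and a hypothetical read_span() function as a placeholder for the MRC reader described in the later cards:

```python
# Retrieve-and-read sketch: TF-IDF stands in for the document index,
# read_span() stands in for the MRC reader.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "Ada Lovelace was born in London in 1815.",
    "The Eiffel Tower is located in Paris.",
    "Python was created by Guido van Rossum.",
]

def retrieve(question, docs, k=1):
    """Return the k documents most similar to the question under TF-IDF."""
    vectorizer = TfidfVectorizer()
    doc_vecs = vectorizer.fit_transform(docs)
    q_vec = vectorizer.transform([question])
    scores = cosine_similarity(q_vec, doc_vecs)[0]
    ranked = scores.argsort()[::-1][:k]
    return [docs[i] for i in ranked]

def read_span(question, passage):
    """Placeholder reader; a real system runs a span-extraction model here."""
    return passage  # return the whole passage as a stand-in answer

question = "Where was Ada Lovelace born?"
for passage in retrieve(question, corpus):
    print(read_span(question, passage))
```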
What are some MRC datasets?
SQuAD 1.1, 2.0
HotpotQA
Natural Questions
TyDi QA
How does MRC work?
It performs answer extraction, which is a span-labelling task. The input is a question paired with a passage. For every passage token the model produces two scores: one for whether the token is the start of the answer span and one for whether it is the end.
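A minimal sketch of span extraction with a SQuAD-fine-tuned BERT reader via Hugging Face transformers (the checkpoint name is one of several publicly available options and is an assumption here):

```python
# Span extraction: the model scores every token as a possible answer start and end.
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

checkpoint = "deepset/bert-base-cased-squad2"  # assumed SQuAD2-fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForQuestionAnswering.from_pretrained(checkpoint)

question = "Where was Ada Lovelace born?"
passage = "Ada Lovelace was born in London in 1815 and worked with Charles Babbage."

inputs = tokenizer(question, passage, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)  # outputs.start_logits / end_logits: one score per token

start = int(outputs.start_logits.argmax())
end = int(outputs.end_logits.argmax())
answer_ids = inputs["input_ids"][0][start : end + 1]
print(tokenizer.decode(answer_ids))  # expected: "London"
```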
How is MRC encoded?
The question and passage are fed to a standard BERT model as one sequence ([CLS] question [SEP] passage [SEP]). We take the final hidden vectors T_i for the passage tokens; a learned span-start embedding S and span-end embedding E score each token as a possible start or end of the answer.
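A sketch of that scoring head, assuming the final hidden states T from BERT are already computed (shape [seq_len, hidden]); S and E are the learned span-start and span-end embeddings:

```python
# Scoring head on top of BERT's final hidden states.
# T: [seq_len, hidden] final hidden vectors; S, E: learned start/end embeddings.
import torch
import torch.nn as nn

hidden_size, seq_len = 768, 20
T = torch.randn(seq_len, hidden_size)        # stand-in for BERT's output
S = nn.Parameter(torch.randn(hidden_size))   # span-start embedding
E = nn.Parameter(torch.randn(hidden_size))   # span-end embedding

start_logits = T @ S   # one start score per token
end_logits = T @ E     # one end score per token

start_probs = torch.softmax(start_logits, dim=0)
end_probs = torch.softmax(end_logits, dim=0)
```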
What loss function is used for MRC?
Cross-entropy loss over the start position plus cross-entropy loss over the end position (the negative log-likelihood of the gold start and end tokens).
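A sketch of that loss with PyTorch, using stand-in logits in place of the span head's output and hypothetical gold positions:

```python
# Training loss: cross-entropy over the start position plus cross-entropy over the end position.
import torch
import torch.nn.functional as F

start_logits = torch.randn(20)   # [seq_len] scores from the span head
end_logits = torch.randn(20)
gold_start, gold_end = torch.tensor(5), torch.tensor(7)  # gold answer span positions

loss = (
    F.cross_entropy(start_logits.unsqueeze(0), gold_start.unsqueeze(0))
    + F.cross_entropy(end_logits.unsqueeze(0), gold_end.unsqueeze(0))
)
```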
How do we find the best span score for MRC?
The score of a candidate span (i, j) is the dot product of the span-start embedding S with the token representation T_i, plus the dot product of the span-end embedding E with T_j. We compute this for all valid spans and take the argmax.
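A sketch of picking the best span from the start and end scores, keeping only spans where the start does not come after the end and capping the span length:

```python
# Best-span selection: score(i, j) = start_logits[i] + end_logits[j], restricted to i <= j.
import torch

start_logits = torch.randn(20)
end_logits = torch.randn(20)
max_answer_len = 30

best_score, best_span = float("-inf"), (0, 0)
for i in range(len(start_logits)):
    for j in range(i, min(i + max_answer_len, len(end_logits))):
        score = float(start_logits[i] + end_logits[j])
        if score > best_score:
            best_score, best_span = score, (i, j)

print(best_span, best_score)
```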
What do we need to ensure with the start and end embeddings?
That the start position is never after the end position (i ≤ j); spans violating this are discarded.
What happens when there are no-answer questions?
The special [CLS] token is used as a proxy span for "no answer": if the highest-scoring span starts and ends at [CLS], the model predicts that the question has no answer in the passage.
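A sketch of the no-answer check used with SQuAD 2.0-style models: the score of the "null" span (start and end at the [CLS] token, index 0) is compared to the best non-null span, optionally with a threshold (the values below are placeholders):

```python
# No-answer handling: compare the [CLS] ("null") span score to the best real span score.
import torch

start_logits = torch.randn(20)
end_logits = torch.randn(20)
best_score, best_span = 4.2, (5, 7)        # best non-null span, as found in the previous sketch
null_score = float(start_logits[0] + end_logits[0])  # [CLS] sits at position 0
no_answer_threshold = 0.0                  # typically tuned on a dev set

if null_score - best_score > no_answer_threshold:
    print("no answer")
else:
    print("answer span:", best_span)
```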
What happens when the passage is longer than the BERT input limit?
BERT's input is limited to 512 tokens, so a sliding window of up to 512 tokens is moved over longer passages and the model is run on each window.
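A sketch of the sliding-window idea: the tokenized passage is split into overlapping windows (the stride controls the overlap), the reader is run on each window, and the highest-scoring span overall is kept:

```python
# Sliding window over a passage that exceeds BERT's 512-token limit.
def sliding_windows(tokens, window=512, stride=256):
    """Yield overlapping token windows so an answer near a boundary is not cut in half."""
    start = 0
    while True:
        yield tokens[start : start + window]
        if start + window >= len(tokens):
            break
        start += stride

long_passage_tokens = ["tok"] * 1300          # stand-in for a tokenized long document
for chunk in sliding_windows(long_passage_tokens):
    pass  # run the reader on each chunk and keep the best-scoring span overall
```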
Where is IR based factual QA likely to fail?
Where answers are rooted in the deep web, so structured databases have to be consulted rather than text documents.