Question Answering Flashcards
Question answering (QA) main focus
QA systems focus on factoid questions, that is, questions that can be answered with simple facts.
Question answering (QA) 2 paradigms
Text-based QA:
- use efficient algorithms (information retrieval) to find relevant documents in a text collection
- use reading comprehension algorithms (machine reading) on the relevant documents to select the span of text containing the answer
Knowledge-based QA:
- produce a semantic representation of the query
- match the semantic representation against fact databases
Text-based QA steps
1. Information retrieval (IR): maps input queries to a set of documents from some collection, ordered by relevance.
2. Machine reading:
- the input is a factoid question along with a passage that could contain the answer
- the output is the answer fragment, or else NULL if there is no answer in the passage
EXAMPLE:
question: “How tall is Mt. Everest?”
passage: “Mount Everest, reaching 29,029 feet at its summit, is located in Nepal and Tibet …”
output fragment: “29,029 feet”
Machine reading model
Let q = q1, …, qn be a query and let p = p1, …, pm be a passage, where qt and pt are tokens.
A span is any fragment pi, …, pj of p.
The goal is to compute the probability P(pi, …, pj|q, p) that span pi, …, pj is the answer to q.
Assumption:
P(pi, …, pj|q, p) = Pstart(i|q,p) * Pend(j|q,p)
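Under this factorization, finding the most likely answer reduces to maximizing Pstart(i) * Pend(j) over valid spans with i ≤ j. A minimal Python sketch; best_span and the max_len cap are illustrative choices, not part of the model definition:

import numpy as np

def best_span(p_start, p_end, max_len=30):
    # Return (i, j) maximizing Pstart(i) * Pend(j) subject to i <= j.
    # max_len caps the span length, a common practical restriction.
    best, best_score = (0, 0), 0.0
    for i in range(len(p_start)):
        for j in range(i, min(i + max_len, len(p_end))):
            score = p_start[i] * p_end[j]
            if score > best_score:
                best_score, best = score, (i, j)
    return best

p_start = np.array([0.05, 0.05, 0.1, 0.7, 0.1])   # toy Pstart over 5 positions
p_end   = np.array([0.05, 0.05, 0.1, 0.2, 0.6])   # toy Pend
print(best_span(p_start, p_end))                  # -> (3, 4)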
Answer span extraction using contextual embeddings; start and end vectors, score and fine-tuning loss
Pre-trained BERT (Bidirectional Encoder Representations from Transformers) is used to encode the question and the passage as a single string, separated by a [SEP] token:
- Let e(pi) be the BERT embedding of token pi within passage p.
Start vector S is learned to estimate the start probability for each position i, using a dot product and softmax:
- Pstart(i|q,p) = exp(S·e(pi)) / Σj exp(S·e(pj))
- Similarly, we learn an end vector E to estimate Pend(j|q,p).
- The score of a candidate span from position i to j is:
- score(i,j) = S·e(pi) + E·e(pj)
TRAINING:
For each training instance, compute the negative sum of the log-likelihoods of the correct start and end positions:
L = -log Pstart(i|q,p) - log Pend(j|q,p)
Averaging over all training instances gives the fine-tuning loss.
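A minimal PyTorch sketch of this scoring and loss; the random tensors stand in for actual BERT outputs, and names like gold_i are illustrative:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
m, d = 20, 768                 # passage length; BERT-base hidden size
H = torch.randn(m, d)          # stand-in for BERT embeddings e(p1)..e(pm)
S = torch.randn(d, requires_grad=True)   # start vector (learned)
E = torch.randn(d, requires_grad=True)   # end vector (learned)

start_logits = H @ S           # dot products S·e(pi), one per position
end_logits   = H @ E           # dot products E·e(pj)
p_start = F.softmax(start_logits, dim=0)   # Pstart(i|q,p)
p_end   = F.softmax(end_logits, dim=0)     # Pend(j|q,p)

gold_i, gold_j = 3, 5          # gold start/end for this training instance
loss = -(torch.log(p_start[gold_i]) + torch.log(p_end[gold_j]))
loss.backward()                # gradients reach S and E (and BERT, in practice)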
Negative examples in contextual embeddings
Many datasets contain negative examples, that is, (q,p) pairs in which the answer to q is not in the passage p.
Negative examples are conventionally treated as having start and end indices pointing to the [CLS] special token.
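Since [CLS] occupies position 0 of the BERT input, this convention amounts to labeling unanswerable pairs with start = end = 0, so the same loss applies. A small illustrative helper (span_labels is a hypothetical name):

def span_labels(answer_span, has_answer):
    # Unanswerable (q, p) pairs point both indices at [CLS] (position 0).
    if not has_answer:
        return (0, 0)
    return answer_span                  # (start, end) token positions otherwise

print(span_labels((3, 5), True))        # -> (3, 5)
print(span_labels(None, False))         # -> (0, 0)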
Stanford attentive reader; bilinear product and attention
The Stanford attentive reader is a neural model based on RNNs that uses an attention-like mechanism.
Assume a query q = q1, …, qN and a passage p = p1, …, pM, where qt and pt’ are tokens.
Let e(qt), e(pt’) be the non-contextual embeddings associated with tokens qt and pt’, respectively.
We use a bidirectional LSTM to encode the individual tokens of the query and of the passage (a code sketch follows the list):
- We start with the monodirectional encodings: a forward LSTM and a backward LSTM over the passage yield hidden states hf(t’) and hb(t’) at each position t’ (and likewise for the query).
- We concatenate the monodirectional encodings to encode individual passage tokens: p(t’) = [hf(t’) ; hb(t’)].
- We pick the boundary query encodings to encode the entire query q: q = [hf(N) ; hb(1)], the final states of the forward and backward LSTMs over the query.
- We derive an attention distribution by computing a vector of bilinear products and applying softmax: α(t’) = softmax(q^T W p(t’)), where W is a learned matrix.
- We then use attention to combine the passage tokens and compute an output vector: o = Σt’ α(t’) p(t’).
- Finally, the score of each candidate c is the inner product between the output vector and a learned output embedding of the candidate: score(c) = a(c)·o.
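A minimal PyTorch sketch of the attention step, assuming the BiLSTM encodings are already computed (the random tensors stand in for them):

import torch
import torch.nn.functional as F

torch.manual_seed(0)
M, h = 30, 128                # passage length; LSTM hidden size per direction
P = torch.randn(M, 2 * h)     # BiLSTM passage encodings p(t')
q = torch.randn(2 * h)        # query encoding from boundary BiLSTM states
W = torch.randn(2 * h, 2 * h, requires_grad=True)   # bilinear weight matrix

alpha = F.softmax(P @ (q @ W), dim=0)   # bilinear products q^T W p(t'), softmaxed
o = alpha @ P                           # attention-weighted output vector

C = torch.randn(5, 2 * h)               # output embeddings of 5 candidates
scores = C @ o                          # inner-product score per candidate
print(scores.argmax().item())           # index of the highest-scoring candidate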
Machine reading system evaluation measures
Machine reading systems are often evaluated using two metrics (a sketch of both follows):
- Exact match: the percentage of predicted answers that match the gold answer exactly.
- F1 score: compute precision and recall between the system and gold answers, viewed as bags of tokens, and return the average F1 over all questions.
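A simplified Python sketch of both metrics; real evaluation scripts (e.g. SQuAD's) also strip punctuation and articles before comparing:

from collections import Counter

def exact_match(pred, gold):
    return int(pred.strip().lower() == gold.strip().lower())

def token_f1(pred, gold):
    # Precision and recall over bags of tokens, combined into F1.
    pred_toks, gold_toks = pred.lower().split(), gold.lower().split()
    overlap = sum((Counter(pred_toks) & Counter(gold_toks)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

print(exact_match("29,029 feet", "29,029 feet"))               # -> 1
print(round(token_f1("about 29,029 feet", "29,029 feet"), 2))  # -> 0.8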
Knowledge-based QA, entity linking
Text-based QA uses unstructured textual information, such as text found on the web.
Knowledge-based QA answers a natural language question by mapping it to a query over some structured knowledge repository.
Two main approaches to knowledge-based QA:
- Graph-based QA: models the knowledge base as a graph, with entities as nodes and relations as edges.
- QA by semantic parsing: maps queries to logical formulas, which are then used to query a fact database.
Both approaches to knowledge-based QA require algorithms for entity linking.
Entity linking, mention
Entity linking is the task of associating a mention in text with the representation of some real-world entity in an ontology.
The most common ontology for factoid question-answering is Wikipedia, in which case the task is called wikification.
Entity linking is done in (roughly) two stages:
- mention detection
- mention disambiguation
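A toy Python sketch of the two-stage pipeline; the dictionaries and function names are hypothetical stand-ins for learned mention detectors and Wikipedia-derived candidate tables:

MENTIONS = {"everest", "mount everest"}
CANDIDATES = {
    "everest": ["Mount_Everest", "Everest_(2015_film)"],
}

def detect_mentions(text):
    # Stage 1: mention detection (here: naive single-token dictionary lookup).
    return [tok for tok in text.lower().split() if tok in MENTIONS]

def disambiguate(mention):
    # Stage 2: mention disambiguation (here: pick the first candidate,
    # standing in for the most frequent Wikipedia target).
    return CANDIDATES.get(mention, [None])[0]

for m in detect_mentions("How tall is Everest ?"):
    print(m, "->", disambiguate(m))   # everest -> Mount_Everest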