Week 7 - Span Extraction Flashcards
What can be difficult about span extraction
It is not clear which set of classes applies across a large collection of documents
So we need to identify which tokens in the text are relevant
What are the 5 ways to identify relevant tokens in a text
- Keyword extraction
- Relation extraction
- Open information extraction
- Machine Reading Comprehension
- Question Answering
What is the definition of span extraction
extracting 0,…,n contiguous spans from a piece of text
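A minimal sketch of what "0,…,n contiguous spans" can look like in code, assuming each token already carries a binary relevance label (1 = relevant, 0 = not). The function name and the label scheme are illustrative assumptions, not a fixed convention.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def extract_spans(labels: List[int]) -> List[Span]:
    """Group runs of consecutive tokens labelled 1 into contiguous spans."""
    spans: List[Span] = []
    start = None
    for i, label in enumerate(labels):
        if label == 1 and start is None:
            start = i                      # a new span opens here
        elif label != 1 and start is not None:
            spans.append((start, i))       # the open span closes before i
            start = None
    if start is not None:                  # span running to the end of the text
        spans.append((start, len(labels)))
    return spans


tokens = "Marie Curie was born in Warsaw".split()
labels = [1, 1, 0, 0, 0, 1]               # hypothetical per-token predictions
spans = extract_spans(labels)
texts = [" ".join(tokens[s:e]) for s, e in spans]
```

A text with no relevant tokens yields an empty list, which is the "0 spans" case.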
What are Keywords
contiguous spans that represent and summarise the essential content of the document
What is Relation extraction
Relation classification without the entity mentions being given
Extraction of entities that are related by a fixed set of relations
What is keyword extraction
open-class document categorisation
what is open information extraction
relation extraction without a predefined set of relations
domain-independent
Allows us to convert information that is conveyed in unstructured, textual form into a machine-readable format
what is machine reading comprehension
query-conditioned information extraction
finding a span in a passage that best answers a question referring to the passage
often employed as an endpoint of Question Answering systems
Relation extraction vs open information extraction
fixed vs open vocabulary of relations
issues with open IE
has to make syntactic assumptions about the data, which can lead to imperfect relations being extracted (and some being missed)
should perform some sort of normalisation to arrive at a fixed set of relations, e.g. “munch on”, “chew” and “eat” all express the same relation
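The normalisation step can be sketched as a lookup from surface relation phrases to one canonical relation. The `CANONICAL` table and the example triples are hypothetical and hand-written; a real system would have to learn or curate such a mapping.

```python
# Hypothetical hand-written mapping from surface phrases to a canonical relation.
CANONICAL = {
    "munch on": "eat",
    "chew": "eat",
    "eat": "eat",
}


def normalise_relation(phrase: str) -> str:
    """Map a relation phrase to its canonical form; unknown phrases pass through."""
    key = " ".join(phrase.lower().split())  # lowercase and collapse whitespace
    return CANONICAL.get(key, key)


# Made-up (subject, relation, object) triples as an open IE system might emit them.
triples = [
    ("rabbit", "munch on", "carrot"),
    ("dog", "chew", "bone"),
    ("cat", "eat", "fish"),
]
normalised = [(s, normalise_relation(r), o) for s, r, o in triples]
```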
what is Aggregation
gather and summarize information from many sources, providing a comprehensive view of the data
open IE: precision vs recall
to increase precision -> need stricter rules on relations, which results in fewer false positives but more missed relations (lower recall), and vice versa
this is a tradeoff
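The tradeoff can be made concrete with a small calculation. This is a sketch on made-up data: `gold` is the set of relations that are truly in the text, `strict` and `loose` are hypothetical outputs of a strict and a permissive rule set.

```python
from typing import Set, Tuple


def precision_recall(predicted: Set[str], gold: Set[str]) -> Tuple[float, float]:
    """Precision = TP / |predicted|, recall = TP / |gold|."""
    tp = len(predicted & gold)                       # correctly extracted relations
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


gold = {"a", "b", "c", "d"}                          # relations actually present
strict = {"a", "b"}                                  # few extractions, all correct
loose = {"a", "b", "c", "x", "y", "z"}               # more found, more noise

strict_scores = precision_recall(strict, gold)       # (1.0, 0.5)
loose_scores = precision_recall(loose, gold)         # (0.5, 0.75)
```

The strict rules extract nothing wrong but miss half of the relations; the loose rules find more of them at the cost of three false positives.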
Question answering pipeline
User query “what is…?”
Evidence retrieval sends the query to Wikipedia, search engines, etc.
Question and retrieved documents are sent to the MRC model
MRC model extracts answer from evidence and sends it back
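The four steps can be sketched end to end with toy stand-ins. Everything here is an assumption for illustration: `retrieve` ranks an in-memory corpus by word overlap instead of querying a search engine, and `read` returns the best-overlapping sentence instead of running a trained MRC model.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


def retrieve(query: str, corpus: List[str], k: int = 1) -> List[str]:
    """Toy evidence retrieval: rank documents by word overlap with the query."""
    q = set(tokenize(query))
    ranked = sorted(corpus, key=lambda d: len(q & set(tokenize(d))), reverse=True)
    return ranked[:k]


def read(query: str, passages: List[str]) -> str:
    """Toy MRC stand-in: return the sentence that overlaps most with the query."""
    q = set(tokenize(query))
    sentences = [
        s.strip()
        for p in passages
        for s in re.split(r"(?<=[.!?])\s+", p)
        if s.strip()
    ]
    return max(sentences, key=lambda s: len(q & set(tokenize(s))))


def answer(query: str, corpus: List[str]) -> str:
    evidence = retrieve(query, corpus)   # evidence retrieval
    return read(query, evidence)         # MRC extracts the answer from the evidence


corpus = [
    "Paris is the capital of France. It lies on the Seine.",
    "Berlin is the capital of Germany. It lies on the Spree.",
]
result = answer("What is the capital of France?", corpus)
```

A real MRC model would return a tighter span (here, ideally just "Paris"); the sentence-level answer only keeps the sketch short.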
What is pattern-based relation extraction
can define high precision/low recall patterns to extract relations of interest
these patterns can rely on syntax/semantics/additional knowledge
e.g. X/NP, (RB) found in Y/NP, where X and Y are noun phrases (NP) and (RB) is an optional adverb
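A minimal sketch of applying such a pattern, assuming the sentence has already been POS-tagged and its noun phrases chunked into single (text, tag) items. The matcher treats both the comma and the adverb as optional, which is one reading of the pattern; the function name and the input sentences are made up.

```python
from typing import List, Tuple

Chunk = Tuple[str, str]  # (text, tag), with noun phrases already merged into one NP chunk


def match_found_in(chunks: List[Chunk]) -> List[Tuple[str, str]]:
    """Return (X, Y) pairs matching: X/NP [,] [RB] found in Y/NP."""
    pairs = []
    for i, (text, tag) in enumerate(chunks):
        if tag != "NP":
            continue
        j = i + 1
        if j < len(chunks) and chunks[j][0] == ",":    # optional comma
            j += 1
        if j < len(chunks) and chunks[j][1] == "RB":   # optional adverb
            j += 1
        if (
            j + 2 < len(chunks)
            and chunks[j][0].lower() == "found"
            and chunks[j + 1][0].lower() == "in"
            and chunks[j + 2][1] == "NP"
        ):
            pairs.append((text, chunks[j + 2][0]))
    return pairs


tagged = [
    ("Kangaroos", "NP"), (",", ","), ("commonly", "RB"), ("found", "VBN"),
    ("in", "IN"), ("Australia", "NP"), (",", ","), ("are", "VBP"), ("marsupials", "NP"),
]
pairs = match_found_in(tagged)
```

The pattern is high precision but low recall: "Kangaroos live in Australia" expresses the same relation and is not matched.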
How do we get hold of RE patterns
- linguistic knowledge (human)
- bootstrapping
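Bootstrapping can be sketched as a loop that alternates between learning patterns from known entity pairs and applying those patterns to find new pairs. This is a deliberately simplified illustration, assuming single-word entities, made-up sentences, and patterns that are just the literal text between the two entities.

```python
import re
from typing import List, Set, Tuple


def bootstrap(
    sentences: List[str], seeds: Set[Tuple[str, str]], rounds: int = 2
) -> Tuple[Set[Tuple[str, str]], Set[str]]:
    pairs = set(seeds)
    patterns: Set[str] = set()
    for _ in range(rounds):
        # Step 1: learn patterns, i.e. the text connecting a known pair.
        for s in sentences:
            for x, y in pairs:
                m = re.search(re.escape(x) + r"(.+?)" + re.escape(y), s)
                if m:
                    patterns.add(m.group(1))
        # Step 2: apply every learned pattern to harvest new pairs.
        for s in sentences:
            for p in patterns:
                m = re.fullmatch(r"(\w+)" + re.escape(p) + r"(\w+)\.?", s)
                if m:
                    pairs.add((m.group(1), m.group(2)))
    return pairs, patterns


sentences = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "Rome is the capital of Italy.",
    "Paris lies on the Seine.",
]
pairs, patterns = bootstrap(sentences, {("Paris", "France")})
```

Starting from the single seed pair, the loop learns the pattern " is the capital of " and uses it to pick up the Berlin and Rome pairs.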