Week 7 - Span Extraction Flashcards

Question 1

Q

What can be difficult about span extraction

Answer

A

With large document collections, it is often not clear in advance which set of classes applies
So instead we need to identify which tokens in the text are relevant

Question 2

Q

What are the 5 ways to identify relevant tokens in a text

Answer

A

Keyword extraction
Relation extraction
Open information extraction
Machine Reading Comprehension
Question Answering

Question 3

Q

What is the definition of span extraction

Answer

A

extracting 0,…,n contiguous spans from a piece of text

Question 4

Q

What are Keywords

Answer

A

contiguous spans that represent and summarise the essential content of the document

Question 5

Q

What is Relation extraction

Answer

A

Relation classification where the entity mentions are not given in advance
Extraction of entities that are related by a fixed set of relations

Question 6

Q

What is keyword extraction

Answer

A

open-class document categorisation

Question 7

Q

what is open information extraction

Answer

A

relation extraction without relations
domain-independent

Allows us to convert information that is conveyed in textual, unstructured form into a machine readable format

Question 8

Q

what is machine reading comprehension

Answer

A

query-conditioned information extraction
finding a span in a passage that best answers a question referring to the passage

often employed as an endpoint of Question Answering systems

Question 9

Q

Relation extraction v open information extraction

Answer

A

fixed vs open vocabulary of relations

Question 10

Q

issues with open IE

Answer

A

has to make syntactic assumptions about the data, which can lead to poor extractions (and some relations being missed)

should perform some sort of normalisation to map extracted predicates onto a fixed set of relations, e.g. “munch on”, “chew”, “eat” = same relation
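
A minimal sketch of the normalisation idea, assuming a hand-written synonym table (the `CANONICAL` mapping and the `normalise` helper are hypothetical, not part of any Open IE system):

```python
# Hypothetical synonym table: surface predicate -> canonical relation.
CANONICAL = {"munch on": "eat", "chew": "eat", "eat": "eat"}

def normalise(triple):
    """Map the predicate of a (subject, predicate, object) triple to its canonical form."""
    subject, predicate, obj = triple
    key = predicate.lower()
    # Predicates missing from the table are kept as-is (lowercased).
    return (subject, CANONICAL.get(key, key), obj)

triples = [("cow", "munch on", "grass"), ("cow", "chew", "grass")]
normalised = {normalise(t) for t in triples}
```

Both extractions collapse into the single triple ("cow", "eat", "grass").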

Question 11

Q

what is Aggregation

Answer

A

gather and summarize information from many sources, providing a comprehensive view of the data

Question 12

Q

open IE: precision vs recall

Answer

A

to increase precision -> need stricter rules on relations, which gives fewer false positives but more false negatives (lower recall)
to increase recall -> need looser rules, which gives more false positives (lower precision)
this is a tradeoff
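
The tradeoff can be illustrated with a toy example. This is only a sketch: the gold triples, the confidence scores, and the helper names are made up for illustration.

```python
def precision_recall(extracted, gold):
    """Precision and recall of a set of extracted triples against gold triples."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

def keep_confident(scored, threshold):
    """Keep triples whose (made-up) confidence reaches the threshold."""
    return [triple for triple, confidence in scored if confidence >= threshold]

GOLD = {("Earth", "contains", "uranium"),
        ("cow", "eats", "grass"),
        ("Paris", "lies on", "Seine")}

# (triple, confidence) pairs; two of the five extractions are wrong.
SCORED = [(("Earth", "contains", "uranium"), 0.9),
          (("cow", "eats", "grass"), 0.6),
          (("cow", "eats", "Seine"), 0.5),
          (("Paris", "lies on", "Seine"), 0.3),
          (("Earth", "lies on", "grass"), 0.2)]

strict = precision_recall(keep_confident(SCORED, 0.8), GOLD)  # strict rule
loose = precision_recall(keep_confident(SCORED, 0.1), GOLD)   # loose rule
```

With the strict threshold only one (correct) triple survives: precision 1.0, recall 1/3. With the loose threshold everything is kept: precision 0.6, recall 1.0.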

Question 13

Q

Question answering pipeline

Answer

A

User query “what is…?”
Evidence retrieval makes query to wikipedia/search engines/etc
Question and retrieved documents are sent to MRC model
MRC model extracts answer from evidence and sends it back

Question 14

Q

What are pattern based relation extractions

Answer

A

can define high precision/low recall patterns to extract relations of interest
these patterns can rely on syntax/semantics/additional knowledge

eg X/NP, (RB) found in Y/NP
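
A rough sketch of how such a pattern could be matched over POS-tagged tokens. Assumptions: the input is already tagged as (word, tag) pairs, a noun phrase is approximated by a single noun token (with an optional determiner before Y), and `match_found_in` is a hypothetical helper name.

```python
def match_found_in(tagged):
    """Match  X/NP , (RB) found in Y/NP  over a list of (word, POS tag) pairs."""
    triples = []
    n = len(tagged)
    for i, (word, tag) in enumerate(tagged):
        if not tag.startswith("NN"):          # X must be a noun
            continue
        j = i + 1
        if j < n and tagged[j][1] == ",":     # optional comma
            j += 1
        if j < n and tagged[j][1] == "RB":    # optional adverb
            j += 1
        if (j + 2 < n
                and tagged[j][0].lower() == "found"
                and tagged[j + 1][0].lower() == "in"):
            k = j + 2
            if tagged[k][1] == "DT" and k + 1 < n:   # skip a determiner before Y
                k += 1
            if tagged[k][1].startswith("NN"):        # Y must be a noun
                triples.append((word, "found in", tagged[k][0]))
    return triples

sentence = [("Uranium", "NNP"), (",", ","), ("commonly", "RB"),
            ("found", "VBN"), ("in", "IN"), ("the", "DT"),
            ("Earth", "NNP"), (",", ","), ("is", "VBZ"),
            ("radioactive", "JJ")]
matches = match_found_in(sentence)
```

Here `matches` is [("Uranium", "found in", "Earth")]. The pattern fires rarely but reliably, which is the high precision/low recall behaviour described above.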

Question 15

Q

How do we get hold of RE patterns

Answer

A

linguistic knowledge (human)
bootstrapping

Question 16

Q

What are the three steps of relation extraction bootstrapping

Answer

A

1) For known triples, search corpus to find other ways the triple is described -> find new expressions “Earth contains Uranium”/“Uranium found in the Earth”

2) extract patterns from these expressions and keep only the top-K ranked patterns as the “new” patterns

3) apply the “new” patterns to the same corpus to extract new triples (these new triples represent previously unknown instances of the relation)
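
The three steps can be sketched on a toy corpus. Everything here is an assumption made for illustration: seed triples are reduced to (A, B) entity pairs for one relation, patterns are whole-sentence templates, entities are single words, and patterns are ranked by raw frequency.

```python
import re
from collections import Counter

def find_patterns(corpus, seeds):
    """Step 1: turn sentences that mention a known pair into templates, with counts."""
    counts = Counter()
    for a, b in seeds:
        for sentence in corpus:
            if a in sentence and b in sentence:
                counts[sentence.replace(a, "{A}").replace(b, "{B}")] += 1
    return counts

def apply_patterns(corpus, patterns):
    """Step 3: match templates against the corpus and read off (A, B) pairs."""
    found = set()
    for template in patterns:
        regex = (re.escape(template)
                 .replace(re.escape("{A}"), r"(?P<A>\w+)")
                 .replace(re.escape("{B}"), r"(?P<B>\w+)"))
        for sentence in corpus:
            match = re.fullmatch(regex, sentence)
            if match:
                found.add((match.group("A"), match.group("B")))
    return found

def bootstrap(corpus, seeds, k):
    # Step 2: keep only the k most frequent templates.
    top_k = [p for p, _ in find_patterns(corpus, seeds).most_common(k)]
    return apply_patterns(corpus, top_k) - set(seeds)

CORPUS = ["Earth contains uranium", "uranium is found in Earth",
          "Australia contains gold", "gold is found in Australia",
          "Earth has uranium", "helium is found in stars", "Alice has flu"]
SEEDS = [("Earth", "uranium"), ("Australia", "gold")]

new_with_k2 = bootstrap(CORPUS, SEEDS, k=2)
new_with_k3 = bootstrap(CORPUS, SEEDS, k=3)
```

With k=2 only ("stars", "helium") is added. With k=3 the rare, overly general template "{A} has {B}" is also kept and wrongly adds ("Alice", "flu").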

Question 17

Q

What is an RE triple

Answer

A

subject
predicate: type of relationship “lies on” “eats”
object

Question 18

Q

Bootstrapping - why only the top K patterns

Answer

A

Don't want to drift away from the original relationships
newly found expressions can be too general

Question 19

Q

What is open IE expected output

Answer

A

set of extracted relations

Question 20

Q

Open IE motivation

Answer

A

must be applicable to diverse data
cannot rely on a specific domain
must scale to large data, so a single extraction cannot take ages

Question 21

Q

What is the ReVerb algorithm

Answer

A

open IE solution
Uses POS tagging and finds verbs
extracts relationships from surrounding verb phrases
filters out low confidence relations
scalable, flexible
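
A heavily simplified sketch of the verb-centred idea. This is not the actual ReVerb implementation: it assumes the tokens are already POS-tagged, takes the nearest noun on each side as the arguments, and leaves out the confidence filtering step entirely. `extract_relations` is a hypothetical name.

```python
def extract_relations(tagged):
    """Find verb phrases and pair each with the nearest noun on either side."""
    triples = []
    i, n = 0, len(tagged)
    while i < n:
        if tagged[i][1].startswith("VB"):
            start = i
            # Expand the predicate: verbs/adverbs, then prepositions/particles.
            while i < n and tagged[i][1].startswith(("VB", "RB")):
                i += 1
            while i < n and tagged[i][1] in ("IN", "RP", "TO"):
                i += 1
            end = i
            left = next((w for w, t in reversed(tagged[:start])
                         if t.startswith("NN")), None)
            right = next((w for w, t in tagged[end:]
                          if t.startswith("NN")), None)
            if left and right:
                predicate = " ".join(w for w, _ in tagged[start:end])
                triples.append((left, predicate, right))
        else:
            i += 1
    return triples

sentence = [("Uranium", "NNP"), ("is", "VBZ"), ("found", "VBN"),
            ("in", "IN"), ("the", "DT"), ("Earth", "NNP")]
relations = extract_relations(sentence)
```

Here `relations` is [("Uranium", "is found in", "Earth")].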

Question 22

Q

what is OLLIE

Answer

A

Open IE bootstrapping solution
similar to normal bootstrapping
takes high confidence extractions from ReVerb
finds sentences in a large corpus that contain the words of those extractions and generates patterns from them
(patterns are paths in the dependency parse)
apply patterns to corpus to extract new triples

Question 23

Q

Open IE as sequence labelling

Answer

A

a solution to open ie challenges
for each verb, expand the predicate
for each word: label as argument or non participating

Question 24

Q

How does MRC transformer work at a high level

Answer

A

Question and passage passed to encoder
outputs probability distributions over the tokens, predicting for each token how likely it is to be the beginning/end of the answer span

Question 25

Q

How do we get contextualised embeddings for MRC

Answer

A

As with many tasks, contextual embeddings from pre trained models are expressive enough to be directly plugged into this classifier (eg from BERT)

Question 26

Q

What is MRC transformer architecture

Answer

A

(almost identical to text classification)
contextualised embeddings of questions and passage are concatenated together
passed as input to LM
Multiply output of final layer with two vectors (one for start, one for end) and apply a softmax to obtain probability distributions over tokens (denoting start/end)
minimise the cross-entropy (CE) loss between predicted distributions and ground truth start and end positions
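
A minimal sketch of the start/end head in plain Python, assuming the encoder output is given as one vector per token. The toy hidden states and the vectors `w_start`/`w_end` are made-up values, and the softmax normalisation is an assumption about how the scores become probabilities.

```python
import math

def softmax(scores):
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def span_head(hidden, w_start, w_end):
    """One score per token for start and end, normalised over the tokens."""
    p_start = softmax([dot(h, w_start) for h in hidden])
    p_end = softmax([dot(h, w_end) for h in hidden])
    return p_start, p_end

def span_loss(p_start, p_end, gold_start, gold_end):
    """Cross-entropy against the ground-truth start and end positions."""
    return -math.log(p_start[gold_start]) - math.log(p_end[gold_end])

# Toy final-layer outputs for 4 tokens with hidden size 2.
hidden = [[0.1, 0.0], [2.0, 0.1], [0.0, 0.3], [0.2, 2.5]]
w_start, w_end = [1.0, 0.0], [0.0, 1.0]

p_start, p_end = span_head(hidden, w_start, w_end)
loss_good = span_loss(p_start, p_end, gold_start=1, gold_end=3)
loss_bad = span_loss(p_start, p_end, gold_start=0, gold_end=2)
```

The loss is lower when the gold positions are the ones the head already favours (token 1 as start, token 3 as end).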

Question 27

Q

MRC transformers: selecting the answer span from the predicted start/end probabilities

Answer

A

discard spans that cross special tokens (e.g. over [SEP])
discard spans where the end (Pe) comes before the start (Ps)
cut out spans that are too long
rank based on probabilities
pick the best one
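
The filtering steps above can be sketched as follows. Assumptions: a span is scored by the product of its start and end probabilities, and the special-token positions are passed in as a set of indices (`best_span` and its parameters are hypothetical names).

```python
def best_span(p_start, p_end, special, max_len):
    """Return (score, start, end) of the best valid span, or None."""
    candidates = []
    for s in range(len(p_start)):
        # e starts at s (end never before start) and the span length is capped.
        for e in range(s, min(s + max_len, len(p_end))):
            if any(i in special for i in range(s, e + 1)):
                continue                  # span crosses a special token
            candidates.append((p_start[s] * p_end[e], s, e))
    return max(candidates) if candidates else None

p_start = [0.05, 0.05, 0.1, 0.1, 0.6, 0.1]
p_end = [0.05, 0.3, 0.05, 0.1, 0.1, 0.4]
special = {0, 3}                          # e.g. positions of [CLS] and [SEP]

best = best_span(p_start, p_end, special, max_len=2)
```

Here the best span runs from token 4 to token 5 with a score of about 0.24.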

Question 28

Q

MRC for longer documents

Answer

A

similar to text classification
split in n chunks of size m and stride s
question is prepended to each of the chunks
process chunks independently (no contextualization layer)
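
A small sketch of the chunking step. Assumptions: the stride is taken to mean the step between consecutive chunk starts (some libraries use the word for the overlap instead), inputs are already tokenised, and `chunk_passage` is a hypothetical helper.

```python
def chunk_passage(question, passage, size, stride):
    """Split passage tokens into chunks of `size`, each prefixed with the question.

    Returns (offset, tokens) pairs so spans can be mapped back to the passage.
    Assumes stride >= 1.
    """
    chunks = []
    start = 0
    while True:
        window = passage[start:start + size]
        chunks.append((start, question + ["[SEP]"] + window))
        if start + size >= len(passage):   # last window reaches the end
            break
        start += stride
    return chunks

question = ["where", "is", "uranium", "found"]
passage = ["t%d" % i for i in range(10)]   # placeholder tokens t0..t9

chunks = chunk_passage(question, passage, size=4, stride=2)
offsets = [offset for offset, _ in chunks]
```

Here `offsets` is [0, 2, 4, 6], so neighbouring chunks overlap by two tokens and the last chunk ends at t9.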

Question 29

Q

what is MRC transformer post-processing

Answer

A

for longer documents, needs to take into account all chunks per question
rank them and pick best probability from candidates over all of the chunks
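
A sketch of picking the best candidate over all chunks. It assumes each chunk comes with its offset into the passage and a list of (score, start, end) candidates with chunk-local positions; the function name and the data are made up for illustration.

```python
def best_over_chunks(chunk_candidates):
    """Pick the highest-scoring span over all chunks, in passage coordinates."""
    best = None
    for offset, candidates in chunk_candidates:
        for score, start, end in candidates:
            if best is None or score > best[0]:
                # Shift chunk-local positions back to passage positions.
                best = (score, offset + start, offset + end)
    return best

chunk_candidates = [
    (0, [(0.2, 1, 2)]),
    (2, [(0.7, 0, 1), (0.1, 3, 3)]),
    (4, []),                       # a chunk with no valid span
]
answer = best_over_chunks(chunk_candidates)
```

Here `answer` is (0.7, 2, 3): the winning span comes from the second chunk and is shifted by that chunk's offset.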

Question 30

Q

what does open ie allow us to perform

Answer

A

allows us to perform search, linking, aggregation
