Lecture 5 - Relation Extraction & Question Answering Flashcards
What is a relation triple?
a simple relation consisting of a predicate and 2 arguments (subject - predicate - object (of some sort)) | e.g.: Golden Gate Park location San Francisco
Where is relation extraction used?
- create new structured knowledge bases (useful for any app)
- augment current knowledge bases (adding words to WordNet)
- support in question answering
What is Automated Content Extraction (ACE)?
ACE is an annotation standard/task (not a tool) that defines the 17 most important relations (or at least the ones identified as such)
e.g. PHYSICAL - LOCATED
PERSON - SOCIAL - FAMILY
What are the three methods to extract relations?
- Hand-written patterns
- Supervised Machine Learning
- Semi-supervised and unsupervised
Briefly, what does the Hearst paper say about patterns?
- there are many lexical patterns that can be used to suggest that two entities are in the IS-A (hyponym) relation
- these kinds of patterns can learn the IS-A relation for previously unseen terms, e.g. that a “Bambara ndang” is a kind of “bow lute”
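One Hearst pattern can be sketched with a regex. This is a minimal illustration, not Hearst's actual system: matching noun phrases as short runs of words is a crude placeholder (a real system would use an NP chunker or parser), and the pattern name `SUCH_AS` is mine.

```python
import re

# One Hearst pattern: "X such as Y" suggests Y IS-A X (Y is a hyponym of X).
# The noun-phrase matching (\w+ runs of one or two words) is a naive stand-in.
SUCH_AS = re.compile(r"(\w+(?: \w+)?),? such as (?:the )?(\w+(?: \w+)?)")

def such_as_pairs(text):
    """Return (hyponym, hypernym) pairs suggested by the 'such as' pattern."""
    return [(hypo, hyper) for hyper, hypo in SUCH_AS.findall(text)]

print(such_as_pairs("The bow lute, such as the Bambara ndang, is plucked"))
# → [('Bambara ndang', 'bow lute')]
```

The payoff is exactly the Hearst observation above: the pattern extracts the IS-A pair without ever having seen either term before.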
What are some pros and cons of hand-written relation extraction?
PRO:
Human patterns tend to be high precision
Can be tailored to specific domains
CON:
Human patterns are often low recall
A lot of work to think about all the patterns corresponding to all the relations
What are the steps for supervised relation extraction?
- Choose a set of relations we’d like to extract
- Choose a set of relevant named entities
- Find and label data
* choose a representative corpus
* label the named entities in the corpus
* hand label the relations between them
* break into training, development, test
- Train a classifier on the training set
How do you do classification in supervised relation extraction?
- Find all pairs of named entities
- Decide if two entities are related
- If yes, classify the relation
we need the extra step 2 because a fast binary related/not-related filter lets us drop the many unrelated pairs cheaply before running the full relation classifier
To classify, extract features from the sentence containing the two entities: headwords, bag of words, bigrams, and so on; the gold relation label comes from the hand-labeled data (e.g. the ACE corpus)
Then train a standard classifier (SVM, Naive Bayes, and so on) on that dataset
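The feature-extraction step can be sketched as below, assuming the named entities are already tagged (token spans and NER types given). The feature names and the example sentence are illustrative, not from the lecture; the headword is crudely approximated as the last token of each entity.

```python
# Simple word-based features for a candidate entity pair in one sentence.
def pair_features(tokens, e1, e2, t1, t2):
    """e1, e2: (start, end) token spans; t1, t2: entity types (PER, ORG, ...)."""
    between = tokens[e1[1]:e2[0]]          # words between the two mentions
    return {
        "e1_head": tokens[e1[1] - 1],      # headword approximated as last token
        "e2_head": tokens[e2[1] - 1],
        "entity_types": f"{t1}-{t2}",
        "bow_between": sorted(set(between)),
        "bigrams_between": list(zip(between, between[1:])),
    }

tokens = "American Airlines a unit of AMR immediately matched the move".split()
# "American Airlines" = tokens[0:2] (ORG), "AMR" = tokens[5:6] (ORG)
feats = pair_features(tokens, (0, 2), (5, 6), "ORG", "ORG")
print(feats["entity_types"], feats["bow_between"])
```

Each feature dictionary, paired with its gold relation label, becomes one training instance for the SVM or Naive Bayes classifier.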
How do you evaluate a supervised relation extraction model? What are the formulas?
You compute precision, recall and F1 score
P = # correctly extracted relations/ total # of extracted relations
R = # correctly extracted relations/ total # of gold relations
F1 = 2PR/ (P + R)
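The three formulas can be checked with a tiny function over sets of extracted vs. gold relation triples; the example triples below are made up for illustration.

```python
def prf1(extracted, gold):
    """Precision, recall, and F1 for sets of extracted vs. gold triples."""
    correct = len(extracted & gold)
    p = correct / len(extracted) if extracted else 0.0
    r = correct / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

gold = {("Golden Gate Park", "location", "San Francisco"),
        ("Mark Twain", "buried-in", "Elmira")}
extracted = {("Golden Gate Park", "location", "San Francisco"),
             ("Einstein", "born-in", "Princeton")}
print(prf1(extracted, gold))  # → (0.5, 0.5, 0.5)
```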
What are some pros and cons of supervised relation extraction?
PRO:
can get high accuracies with enough hand-labeled data, if test set is similar enough to training
CON:
labeling a large training set is expensive
supervised models do not generalize well to different genres
What are the three semi-supervised and unsupervised relation extraction models that we learned?
- Bootstrapping (using seeds)
- Distant Supervision
- Unsupervised learning from the web
When should you use bootstrapping relation extraction?
When you don’t have a hand-labeled dataset but you have some seed tuples or some high precision patterns
How does bootstrapping relation extraction work?
- Gather a set of seed pairs that have the relation R
- Iterate:
* Find sentences with these pairs (maybe from web)
* Look at the context between or around the pair and generalize the context to create patterns
* Use the patterns to grep for new pairs
e.g. seed tuple <Mark Twain, Elmira>
grep (Google) for the environments of the seed tuple:
“Mark Twain is buried in Elmira” - X is buried in Y
“The grave of Mark Twain is in Elmira” - The grave of X is in Y
“Elmira is Mark Twain’s final resting place” - Y is X’s final resting place
use these patterns to grep for new tuples
iterate
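One bootstrapping iteration can be sketched with a toy in-memory "corpus" standing in for web search results. The pattern generalization here (replace the seed entities with `{X}`/`{Y}` placeholders and reuse the rest of the sentence verbatim) is deliberately simplistic; real systems generalize contexts much more carefully to avoid semantic drift.

```python
# Toy bootstrapping for the buried-in relation.
corpus = [
    "Mark Twain is buried in Elmira",
    "Emily Dickinson is buried in Amherst",
    "The grave of Mark Twain is in Elmira",
    "The grave of Jane Austen is in Winchester",
]

def bootstrap(seeds, corpus):
    patterns, pairs = set(), set(seeds)
    # Step 1: turn each sentence containing a seed pair into a pattern.
    for x, y in seeds:
        for sent in corpus:
            if x in sent and y in sent:
                patterns.add(sent.replace(x, "{X}").replace(y, "{Y}"))
    # Step 2: "grep" the patterns against the corpus to extract new pairs.
    for pat in patterns:
        prefix, rest = pat.split("{X}")
        middle, suffix = rest.split("{Y}")
        for sent in corpus:
            if sent.startswith(prefix) and sent.endswith(suffix) and middle in sent:
                core = sent[len(prefix): len(sent) - len(suffix)]
                x, _, y = core.partition(middle)
                if x and y:
                    pairs.add((x, y))
    return patterns, pairs

pats, pairs = bootstrap({("Mark Twain", "Elmira")}, corpus)
print(pairs)
```

Starting from the single seed <Mark Twain, Elmira>, the learned patterns pull in <Emily Dickinson, Amherst> and <Jane Austen, Winchester>, which would seed the next iteration.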
What is the Distant Supervision algorithm?
It combines bootstrapping with supervised learning
- use a large database of relations to get a huge # of seed tuples
- create lots of features with these examples
- combine in supervised classifiers
How does the Distant Supervision algorithm work?
- For each relation (e.g born-in)
- For each tuple in a big database (e.g. <Edwin Hubble, Marshfield>, <Albert Einstein, Ulm>)
- Find sentences in large corpus with both entities (e.g. “Hubble was born in Marshfield”, “Einstein, born in Ulm”)
- Extract frequent features (parse, words) (e.g. PER was born in LOC, PER, born in LOC)
- Train supervised classifiers using thousands of features
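The labeling-and-feature-collection steps above can be sketched with a toy knowledge base and corpus (both made up for illustration). Any sentence mentioning both entities of a KB tuple is treated as a positive example of born-in, and the context between the entities, with the entities replaced by type placeholders, is counted as a feature.

```python
from collections import Counter

# Toy distant supervision for the born-in relation.
kb_born_in = {("Edwin Hubble", "Marshfield"), ("Albert Einstein", "Ulm")}
corpus = [
    "Edwin Hubble was born in Marshfield",
    "Albert Einstein, born in Ulm, was a physicist",
    "Edwin Hubble worked at Mount Wilson",   # no place from the KB: ignored
]

def between_features(kb, corpus):
    feats = Counter()
    for person, place in kb:
        for sent in corpus:
            i, j = sent.find(person), sent.find(place)
            if i >= 0 and j > i:
                # Generalize: replace entities with type placeholders.
                feats["PER" + sent[i + len(person):j] + "LOC"] += 1
    return feats

features = between_features(kb_born_in, corpus)
print(features)
```

The resulting counts over features like "PER was born in LOC" and "PER, born in LOC" are exactly the kind of frequent lexical features fed to the supervised classifier in the last step.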