Lecture 5 - Relation Extraction & Question Answering Flashcards

1
Q

What is a relation triple?

A

A simple relation made up of a predicate and two arguments (subject, predicate, object) | e.g.: (Golden Gate Park, location, San Francisco)
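As a minimal sketch, a relation triple can be stored as a small record. The `Triple` name and its field names are assumptions made for illustration; they do not come from the lecture.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """Hypothetical container for one relation triple."""
    subject: str
    predicate: str
    obj: str


# The example from the card above.
t = Triple("Golden Gate Park", "location", "San Francisco")
```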

2
Q

Where is relation extraction used?

A
  • create new structured knowledge bases (useful for any app)
  • augment current knowledge bases (adding words to WordNet)
  • support in question answering
3
Q

What is the Automated Content Extraction (ACE)?

A

ACE is an evaluation program whose 2008 Relation Extraction Task defines 17 relation types, grouped into broader classes
e.g. PHYSICAL - LOCATED
PERSON-SOCIAL - FAMILY

4
Q

What are the three methods to extract relations?

A
  1. Hand-written patterns
  2. Supervised Machine Learning
  3. Semi-supervised and unsupervised
5
Q

Briefly, what does the Hearst paper say about patterns?

A
  • there are a lot of lexical patterns (e.g. “X such as Y”) that suggest two entities are in the IS-A (hyponym) relation
  • these kinds of patterns let us learn the IS-A relation for an unfamiliar term: from “the bow lute, such as the Bambara ndang” we can infer that a Bambara ndang is a kind of bow lute
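The idea can be sketched with two Hearst-style patterns written as regular expressions. This is a toy under stated assumptions: a noun phrase is approximated as one to three words (a real system would use a noun-phrase chunker), and the names `NP`, `DET`, `PATTERNS`, and `extract_is_a` are invented for this example.

```python
import re

# Crude stand-ins: a noun phrase is 1-3 alphabetic words, optionally
# preceded by a determiner. Both are simplifying assumptions.
NP = r"([A-Za-z]+(?: [A-Za-z]+){0,2})"
DET = r"(?:the |an? )?"

PATTERNS = [
    # "X, such as Y"  suggests  Y IS-A X
    (re.compile(DET + NP + r",? such as " + DET + NP, re.IGNORECASE),
     "hypernym_first"),
    # "Y and other X" suggests  Y IS-A X
    (re.compile(DET + NP + r",? and other " + DET + NP, re.IGNORECASE),
     "hyponym_first"),
]


def extract_is_a(sentence):
    """Return (hyponym, hypernym) pairs suggested by the patterns."""
    pairs = []
    for regex, order in PATTERNS:
        for m in regex.finditer(sentence):
            first, second = m.group(1), m.group(2)
            if order == "hypernym_first":
                pairs.append((second, first))
            else:
                pairs.append((first, second))
    return pairs
```

Calling `extract_is_a("The bow lute, such as the Bambara ndang, is plucked.")` pairs "Bambara ndang" (hyponym) with "bow lute" (hypernym). The word-count approximation of a noun phrase will over-match on ordinary sentences, which is one reason hand-written patterns need careful tuning.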
6
Q

What are some pros and cons of hand-written relation extraction?

A

PRO:
Human patterns tend to be high precision
Can be tailored to specific domains

CON:
Human patterns are often low recall
A lot of work to think about all the patterns corresponding to all the relations

7
Q

What are the steps for supervised relation extraction?

A
  1. Choose a set of relations we’d like to extract
  2. Choose a set of relevant named entities
  3. Find and label data
    * choose a representative corpus
    * label the named entities in the corpus
    * hand label the relations between them
    * break into training, development, test
  4. Train a classifier on the training set
8
Q

How do you do classification in supervised relation extraction?

A
  1. Find all pairs of named entities
  2. Decide if two entities are related
  3. If yes, classify the relation

The extra step 2 is worth it because a quick related/unrelated filter discards most pairs, so the relation classifier only has to handle the pairs that are likely to be related

I think how it works is basically this: you turn the sentence into features (headwords, bag of words, bigrams and so on), and you pair them with the label for the relation (the sentence contains the two entities you are looking for and the hand-labeled relation between them); an annotated corpus such as ACE can supply these labels

Then you train a classifier such as an SVM or Naive Bayes on that dataset
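A rough sketch of the feature extraction and the two-stage decision described above. Everything here is an assumption for illustration: the function names, the feature naming scheme, the approximation of a headword as the last token of a mention, and the requirement that the first entity precedes the second. The two classifiers are passed in as plain callables standing in for trained models.

```python
def pair_features(tokens, e1_span, e2_span):
    """Build a sparse feature dict for one entity pair.

    Spans are (start, end) token indices with end exclusive,
    and e1 is assumed to come before e2 in the sentence.
    """
    e1 = tokens[e1_span[0]:e1_span[1]]
    e2 = tokens[e2_span[0]:e2_span[1]]
    between = tokens[e1_span[1]:e2_span[0]]

    feats = {
        "head1=" + e1[-1]: 1,  # headword approximated as last token
        "head2=" + e2[-1]: 1,
    }
    for w in between:  # bag of words between the entities
        feats["bow_between=" + w] = 1
    for a, b in zip(between, between[1:]):  # bigrams between the entities
        feats["bigram_between=" + a + "_" + b] = 1
    return feats


def classify_pair(feats, is_related, relation_type):
    """Step 2 (cheap filter), then step 3 (relation classifier)."""
    if not is_related(feats):
        return None
    return relation_type(feats)
```

For example, `pair_features(["Hubble", "was", "born", "in", "Marshfield"], (0, 1), (4, 5))` yields features such as `head1=Hubble` and `bigram_between=was_born`, which a trained classifier could then consume.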

9
Q

How do you evaluate a supervised relation extraction model? What are the formulas?

A

You compute precision, recall and F1 score

P = # correctly extracted relations/ total # of extracted relations

R = # correctly extracted relations/ total # of gold relations

F1 = 2PR/ (P + R)
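The three formulas can be computed directly from the set of extracted relations and the set of gold relations. This is a small sketch; the function name and the choice to return 0.0 when a denominator is zero are assumptions.

```python
def precision_recall_f1(extracted, gold):
    """Compute P, R and F1 for extracted relations against gold relations."""
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)  # correctly extracted relations

    p = correct / len(extracted) if extracted else 0.0
    r = correct / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
    return p, r, f1
```

With 4 extracted relations of which 3 are correct, and 6 gold relations, this gives P = 0.75, R = 0.5 and F1 = 0.6.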

10
Q

What are some pros and cons of supervised relation extraction?

A

PRO:
Can reach high accuracy with enough hand-labeled data, provided the test set is similar enough to the training set

CON:
Labeling a large training set is expensive
Supervised models do not generalize well to different genres

11
Q

What are the three semi-supervised and unsupervised relation extraction models that we learned?

A
  1. Bootstrapping (using seeds)
  2. Distant Supervision
  3. Unsupervised learning from the web
12
Q

When should you use bootstrapping relation extraction?

A

When you don’t have a hand-labeled dataset but you have some seed tuples or some high precision patterns

13
Q

How does bootstrapping relation extraction work?

A
  1. Gather a set of seed pairs that have the relation R
  2. Iterate:
    * Find sentences with these pairs (maybe from web)
    * Look at the context between or around the pair and generalize the context to create patterns
    * Use the patterns to grep for new pairs

e.g. seed tuple (Mark Twain, Elmira)

grep (Google) for the environments of the seed tuple:
“Mark Twain is buried in Elmira” - X is buried in Y
“The grave of Mark Twain is in Elmira” - The grave of X is in Y
“Elmira is Mark Twain’s final resting place” - Y is X’s final resting place

use these patterns to grep for new tuples

iterate
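The loop above can be sketched on a tiny in-memory corpus instead of the web. This is a toy under stated assumptions: entities are approximated as runs of capitalized words (standing in for a named-entity tagger), a pattern is simply the whole sentence with the pair replaced by `<X>` and `<Y>`, and the names `ENTITY`, `to_regex`, and `bootstrap` are invented for this example.

```python
import re

# Crude stand-in for a named-entity tagger: one or more capitalized words.
ENTITY = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"


def to_regex(pattern):
    """Turn 'The grave of <X> is in <Y>' into a regex with groups x and y."""
    out = ""
    for part in re.split(r"(<X>|<Y>)", pattern):
        if part == "<X>":
            out += "(?P<x>" + ENTITY + ")"
        elif part == "<Y>":
            out += "(?P<y>" + ENTITY + ")"
        else:
            out += re.escape(part)  # literal context words
    return re.compile(out)


def bootstrap(seeds, corpus, iterations=2):
    """Alternate between learning patterns from pairs and pairs from patterns."""
    pairs, patterns = set(seeds), set()
    for _ in range(iterations):
        # Generalize each sentence that mentions a known pair into a pattern.
        for x, y in list(pairs):
            for sent in corpus:
                if x in sent and y in sent:
                    pat = sent.replace(x, "<X>").replace(y, "<Y>")
                    if pat.count("<X>") == 1 and pat.count("<Y>") == 1:
                        patterns.add(pat)
        # Apply every pattern to the corpus to harvest new pairs.
        for pat in patterns:
            regex = to_regex(pat)
            for sent in corpus:
                m = regex.search(sent)
                if m:
                    pairs.add((m.group("x"), m.group("y")))
    return pairs, patterns
```

Starting from the single seed `("Mark Twain", "Elmira")` and a corpus that also contains "Charles Darwin is buried in Westminster Abbey", the pattern `<X> is buried in <Y>` learned from the seed sentence picks up the new pair. Nothing in this sketch scores or filters patterns, so a single bad pattern would add bad pairs on every later iteration.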

14
Q

What is the Distant Supervision algorithm?

A

It combines bootstrapping with supervised learning

  1. use a large database to get a huge number of seed tuples
  2. create lots of features from these examples
  3. combine them in a supervised classifier
15
Q

How does the Distant Supervision algorithm work?

A
  1. For each relation (e.g. born-in)
  2. For each tuple in a big database (e.g. (Hubble, Marshfield), (Einstein, Ulm))
  3. Find sentences in a large corpus with both entities (e.g. “Hubble was born in Marshfield”, “Einstein, born in Ulm”)
  4. Extract frequent features (parse, words) (e.g. “PER was born in LOC”, “PER, born in LOC”)
  5. Train a supervised classifier using thousands of features
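Steps 1 to 4 can be sketched as a function that turns database tuples plus raw sentences into labeled training examples. This is an illustration under assumptions: the function name is invented, the entity types `PER` and `LOC` are hard-coded for the born-in relation, and the only "feature" is the sentence with the entities replaced by their types. Step 5 (training the classifier) is left out.

```python
def distant_training_examples(relation, db_tuples, corpus):
    """Label each sentence that mentions both entities of a known tuple."""
    examples = []
    for e1, e2 in db_tuples:
        for sent in corpus:
            if e1 in sent and e2 in sent:
                # Replace the mentions with type placeholders so the
                # feature generalizes beyond this particular pair.
                feature = sent.replace(e1, "PER").replace(e2, "LOC")
                examples.append((feature, relation))
    return examples
```

With the tuples `("Hubble", "Marshfield")` and `("Einstein", "Ulm")` and the two example sentences from the card, this produces the features "PER was born in LOC" and "PER, born in LOC", both labeled born-in. No human labeled any sentence; the database supplied the labels.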
16
Q

How does unsupervised relation extraction work?

A
  1. Use parsed data to train a “trustworthy tuple” classifier
  2. Single-pass extract all relations between NPs, keep if trustworthy
  3. Assessor ranks relations based on text redundancy
17
Q

How do you evaluate semi-supervised and unsupervised relation extraction models?

A

You can only approximate precision: draw a random sample of relations from the output and check them manually. Recall cannot be computed, because we don't know which relations were missed
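A sketch of the sampling procedure. The function name and parameters are assumptions, and `is_correct` stands in for the human judge who checks each sampled relation by hand.

```python
import random


def estimate_precision(extracted, is_correct, sample_size=100, seed=0):
    """Approximate precision from a random sample of the system output."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    extracted = list(extracted)
    sample = rng.sample(extracted, min(sample_size, len(extracted)))
    if not sample:
        return 0.0
    judged_correct = sum(1 for rel in sample if is_correct(rel))
    return judged_correct / len(sample)
```

The estimate only describes the sampled output, so a larger sample gives a more reliable approximation at the cost of more manual checking.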

18
Q

What are the three types of question answering models?

A

IR-based QA
Knowledge-based QA
Hybrid QA

19
Q

What are the two main question types?

A

Factoid - Where is Apple based?
Complex (narrative) - What do scholars think about Jefferson’s position on dealing with pirates?

Both types of questions can be answered by current systems | Factoid questions are common in commercial applications | Complex questions are mostly handled by research systems

20
Q

What is the idea behind each of the three types of question answering models? (briefly)

A

IR-based: go find the answer in some string on the web

Knowledge-based: build an answer by understanding a parse of the question

Hybrid: take a combination of these 2 approaches (most modern systems)

21
Q

What are the three main steps of IR-based QA?

A

QUESTION PROCESSING

  • detect question type, answer type, focus, relations
  • formulate queries to send to a search engine

PASSAGE RETRIEVAL

  • retrieve ranked documents
  • break into suitable passages and rerank

ANSWER PROCESSING

  • extract candidate answers
  • rank candidates using evidence from the text and external sources
22
Q

How does IR-based QA work, in my own words?

A
  • it starts with a question and begins by extracting information from the question itself (most important and most common: a query to send to an IR engine, and the answer type, which tells us what kind of entity we are looking for)
  • in advance, we collect a lot of documents and index them, so that a query can return the relevant documents
  • from those documents we extract passages (parts of the documents) → these go to answer processing, which uses the answer type to pick out candidates → the system returns the answer
23
Q

What things should the model extract from the question we are asking?

A

“what are the two states that border Florida”

  1. answer type (name, entity, number)
  2. query formulation (two states, border, Florida)
  3. focus detection (two states) - find the question word(s) that can be replaced by the answer
  4. relation extraction (borders(Florida, ?x, north))
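Two of these steps, query formulation and answer type detection, can be sketched with simple rules. This is a toy: the stopword list, the rule table, and the function names are all assumptions, and real systems typically use trained classifiers rather than a handful of prefixes.

```python
# Hypothetical stopword list, just large enough for the examples.
STOPWORDS = {"what", "are", "the", "that", "is", "of", "a", "an",
             "which", "who", "where", "when"}

# Hypothetical rules mapping a question prefix to an answer type.
ANSWER_TYPE_RULES = [
    ("who", "PERSON"),
    ("where", "LOCATION"),
    ("when", "DATE"),
    ("how many", "NUMBER"),
]


def formulate_query(question):
    """Keep the content words of the question as search keywords."""
    words = question.rstrip("?").split()
    return [w for w in words if w.lower() not in STOPWORDS]


def answer_type(question):
    """Guess the answer type from the question's opening words."""
    q = question.lower()
    for prefix, label in ANSWER_TYPE_RULES:
        if q.startswith(prefix):
            return label
    return "UNKNOWN"
```

For the example question, `formulate_query` keeps the keywords two, states, border, and Florida. The prefix rules cannot type a "what" question like this one; that would need a look at the headword after the question word ("states").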
24
Q

Briefly what is knowledge-based and hybrid-based QA?

A

Knowledge-based: builds a semantic representation of the query (times, dates, locations), and then maps these semantics to queries over structured data or resources (geospatial databases, ontologies such as Wikipedia infoboxes, restaurant reviews and so on)

Hybrid: builds a shallow semantic representation of the query, then generates answer candidates using IR methods, and then scores each candidate using richer knowledge sources (like above)