Information Extraction Flashcards

Question 1

Q

Information extraction

Answer

A

The process of Information Extraction turns the unstructured information embedded in texts into structured data, e.g. populating a relational database to enable further processing.

Question 2

Q

Relation Extraction

Answer

A

Finding and classifying semantic relations among entities mentioned in a text.

Question 3

Q

RDF triple

Answer

A

A tuple of entity-relation-entity,
called a subject-predicate-object expression.
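
As a small illustration, a triple can be held in a three-field record. This is only a sketch: the class name, the field names and the example values are assumptions chosen for illustration, not part of any RDF library.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    """A subject-predicate-object expression (hypothetical helper type)."""
    subject: str
    predicate: str
    obj: str  # named "obj" to avoid confusion with Python's built-in "object"

# Illustrative values: the entity "Golden Gate Park" stands in the
# relation "location" to the entity "San Francisco".
triple = Triple("Golden Gate Park", "location", "San Francisco")
print(triple.subject, triple.predicate, triple.obj)
```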

Question 4

Q

5 Classes of algorithms for relation extraction

Answer

A

handwritten patterns
supervised machine learning
semi-supervised via bootstrapping
semi-supervised via distant supervision
unsupervised

Question 5

Q

Semi-supervised Relation Extraction via Bootstrapping

Answer

A

If we have a few high-precision seed patterns, or seed tuples, we can bootstrap a classifier.

Bootstrapping proceeds by taking the entities in the seed pair, and then finding sentences (e.g. on the web) that contain both entities.

From all such sentences, we extract and generalize the context around the entities to learn new patterns.
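
One round of this loop can be sketched as follows. It is a toy version under strong assumptions: the three sentences are made up, a "pattern" is just the literal text between the two entities, and candidate entities are any capitalized words. The function names are hypothetical.

```python
import re

def learn_patterns(sentences, tuples):
    """Use the text between the two seed entities as a (very crude) pattern."""
    patterns = set()
    for x, y in tuples:
        for s in sentences:
            m = re.search(re.escape(x) + r"(.+?)" + re.escape(y), s)
            if m:
                patterns.add(m.group(1))
    return patterns

def apply_patterns(sentences, patterns):
    """Find (X, Y) pairs of capitalized words that surround a learned pattern."""
    found = set()
    for p in patterns:
        regex = re.compile(r"([A-Z]\w+)" + re.escape(p) + r"([A-Z]\w+)")
        for s in sentences:
            for m in regex.finditer(s):
                found.add((m.group(1), m.group(2)))
    return found

# Made-up mini corpus and a single seed tuple.
sentences = [
    "Ryanair has a hub at Charleroi.",
    "Lufthansa has a hub at Frankfurt.",
    "Frankfurt is a city in Germany.",
]
seeds = {("Ryanair", "Charleroi")}

patterns = learn_patterns(sentences, seeds)    # context around the seed pair
tuples = apply_patterns(sentences, patterns)   # seed pair plus newly found pairs
print(patterns, tuples)
```

A real system would generalize the context (for example with wildcards or part-of-speech tags) and repeat the loop, feeding the new tuples back in as seeds.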

Question 6

Q

Semantic drift

Answer

A

In semantic drift, an erroneous pattern introduces erroneous tuples, which in turn lead to further problematic patterns, so that the meaning of the extracted relations ‘drifts’.

Question 7

Q

Relation Extraction

Confidence values in bootstrapping

Answer

A

Bootstrapping systems assign confidence values to new tuples to avoid semantic drift.

Given a document collection D, a current set of tuples, T, and a proposed pattern p, we need to track two factors:

hits(p): the set of tuples in T that p matches while looking in D.
finds(p): the total set of tuples that p finds in D.

Conf(p) = (|hits(p)| / |finds(p)|) x log(|finds(p)|)
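
A minimal sketch of this score, reading it as the fraction of found tuples that are already in T, weighted by the log of how many tuples the pattern finds. The function name and the example tuples are assumptions for illustration.

```python
import math

def pattern_confidence(hits, finds):
    """Share of found tuples that are already known, times log of the number found."""
    if not finds:
        return 0.0  # a pattern that finds nothing gets no confidence
    return len(hits) / len(finds) * math.log(len(finds))

# Hypothetical pattern: it finds 4 tuples in D, 2 of which are already in T.
finds = {("a", "b"), ("c", "d"), ("e", "f"), ("g", "h")}
hits = {("a", "b"), ("c", "d")}
conf = pattern_confidence(hits, finds)
print(conf)
```

The log factor rewards patterns that find many tuples, while the ratio penalizes patterns whose finds are mostly unconfirmed.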

Question 8

Q

Distant Supervision for Relation Extraction

Answer

A

Distant supervision combines the advantages of bootstrapping with supervised learning.

Instead of just a handful of seeds, distant supervision uses a large database to acquire a huge number of seed examples, creates lots of noisy pattern features from all these examples, and then combines them in a supervised classifier.
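
The labeling step can be sketched like this. It is a toy version under stated assumptions: the knowledge base is a small dict, the only feature is the text between the two entities, and the "classifier" is a majority vote per feature standing in for a real supervised model. All names and the third (deliberately noisy) sentence are made up.

```python
from collections import Counter, defaultdict

def label_sentences(sentences, kb):
    """Label every sentence containing a known entity pair with that pair's relation."""
    examples = []
    for s in sentences:
        for (e1, e2), rel in kb.items():
            if e1 in s and e2 in s and s.index(e1) < s.index(e2):
                between = s[s.index(e1) + len(e1):s.index(e2)].strip()
                examples.append((between, rel))
    return examples

def train(examples):
    """Stand-in classifier: map each feature to its most frequent label."""
    counts = defaultdict(Counter)
    for feature, rel in examples:
        counts[feature][rel] += 1
    return {f: c.most_common(1)[0][0] for f, c in counts.items()}

# Hypothetical knowledge base acting as the large set of seeds.
kb = {
    ("Edwin Hubble", "Marshfield"): "place-of-birth",
    ("Albert Einstein", "Ulm"): "place-of-birth",
}
sentences = [
    "Edwin Hubble was born in Marshfield.",
    "Albert Einstein was born in Ulm.",
    "Edwin Hubble grew up far from Marshfield.",  # noisy: not about birth
]

examples = label_sentences(sentences, kb)
model = train(examples)
print(examples, model)
```

The third sentence shows where the noise comes from: any sentence mentioning a known pair gets the pair's label, whether or not it expresses the relation.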

Question 9

Q

Unsupervised Relation Extraction

Open Information Extraction

Answer

A

The task of extracting relations from the web when we have no labeled training data, and not even a list of relations.

Question 10

Q

Open Information Extraction

ReVerb 4 Steps

Answer

A

Run a part-of-speech tagger and entity chunker over the sentence s.
For each verb in s, find the longest sequence of words w that starts with a verb and satisfies syntactic and lexical constraints, merging adjacent matches.
For each phrase w, find the nearest noun phrase x to the left which is not a relative pronoun, wh-word or existential “there”. Find the nearest noun phrase y to the right.
Assign confidence c to the relation r = (x, w, y) using a confidence classifier and return it.
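
The middle two steps can be roughly sketched on a sentence that is already tagged. This is a heavy simplification and not ReVerb itself: the one-letter tags (N noun, V verb, D determiner, A adjective or adverb, P preposition) are an assumed encoding, single nouns stand in for noun phrases, and the lexical constraints, the relative-pronoun check and the confidence classifier are all left out.

```python
import re

# Relation phrase: a verb, optionally followed by nouns/determiners/adjectives
# and ending in a preposition; the outer "+" merges adjacent matches.
RELATION = re.compile(r"(?:V(?:[NDA]*P)?)+")

def extract_relations(tagged):
    """Return (x, w, y) triples from a list of (word, one-letter tag) pairs."""
    tags = "".join(tag for _, tag in tagged)
    relations = []
    for m in RELATION.finditer(tags):
        left = tags.rfind("N", 0, m.start())   # nearest noun to the left
        right = tags.find("N", m.end())        # nearest noun to the right
        if left != -1 and right != -1:
            phrase = " ".join(word for word, _ in tagged[m.start():m.end()])
            relations.append((tagged[left][0], phrase, tagged[right][0]))
    return relations

# Hand-tagged example sentence (tags assigned manually for illustration).
tagged = [("United", "N"), ("has", "V"), ("a", "D"),
          ("hub", "N"), ("in", "P"), ("Chicago", "N")]
relations = extract_relations(tagged)
print(relations)
```

Because the regular expression is greedy, the relation phrase grows to "has a hub in" rather than stopping at "has", which mirrors the "longest sequence" requirement.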

Question 11

Q

Temporal expressions

Answer

A

Expressions that refer to absolute points in time, relative times, durations and sets of those.

Absolute temporal expressions can be mapped directly to calendar dates, times of day, or both.

Relative temporal expressions map to particular times through some other reference point.

Durations denote spans of time at varying levels of granularity.

Question 12

Q

Temporal Normalization

Answer

A

The process of mapping a temporal expression to either a specific point in time, or to a duration.
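
For relative expressions, normalization needs a reference point such as the document date. The following sketch handles only three words and uses an arbitrary anchor date; the lookup table and function name are assumptions for illustration.

```python
from datetime import date, timedelta

# Day offsets for a few relative expressions (deliberately tiny lookup table).
OFFSETS = {"yesterday": -1, "today": 0, "tomorrow": 1}

def normalize(expression, anchor):
    """Map a relative expression to an ISO 8601 date, given an anchor date."""
    offset = OFFSETS[expression.lower()]
    return (anchor + timedelta(days=offset)).isoformat()

anchor = date(2007, 7, 2)  # hypothetical document date
print(normalize("yesterday", anchor))
print(normalize("Tomorrow", anchor))
```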

Question 13

Q

Fully qualified date expression

Answer

A

Contains a year, month and day in some conventional form.
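
A sketch of recognizing such expressions by trying a few conventional formats. The list of formats is an assumption and far from complete, and month names are parsed according to the current locale (English by default).

```python
from datetime import datetime

# A few conventional layouts that include year, month and day.
FORMATS = ["%B %d, %Y", "%d %B %Y", "%m/%d/%Y", "%Y-%m-%d"]

def parse_fully_qualified(text):
    """Return the ISO 8601 date if text is fully qualified, else None."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # this layout does not fit; try the next one
    return None

print(parse_fully_qualified("April 24, 1916"))
print(parse_fully_qualified("April 24"))  # no year, so not fully qualified
```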

Question 14

Q

Event Extraction

Answer

A

The task of identifying mentions of events in texts.

Question 15

Q

7 Allen Relations

Answer

A

A before B
A overlaps B
A meets B
A equals B
A starts B
A finishes B
A during B
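
The seven relations can be told apart by comparing interval endpoints. A minimal sketch, assuming each interval is a (start, end) pair of numbers with start < end; the function name is hypothetical, and cases where only an inverse relation holds (for example B before A) return None.

```python
def allen_relation(a, b):
    """Name the relation of interval a to interval b, or None if only an inverse holds."""
    (a_start, a_end), (b_start, b_end) = a, b
    if a_end < b_start:
        return "before"
    if a_end == b_start:
        return "meets"
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start and a_end < b_end:
        return "starts"
    if a_end == b_end and a_start > b_start:
        return "finishes"
    if a_start > b_start and a_end < b_end:
        return "during"
    if a_start < b_start < a_end < b_end:
        return "overlaps"
    return None  # e.g. a is after b, or a contains b

print(allen_relation((1, 3), (3, 5)))
print(allen_relation((2, 3), (1, 5)))
```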
