C6 Flashcards
information extraction (IE) with applications
discover structured information from unstructured or semi-structured text
applications: automatically identify mentions of medications and side effects in electronic health records, find company names in economic newspaper texts
2 types of information extraction tasks
- Named Entity Recognition (NER)
- Relation extraction
Named Entity Recognition
machine learning task based on sequence labelling:
- word order matters
- one entity can span multiple words
- multiple ways to refer to the same concept
=> extracted entities often need to be linked to a standard form
sequence labelling for NER
- sequence = sentence, element = word, label = entity type
- one label per token
- assigned tags capture both the boundary and the type
IOB tagging
format of training data
- each token gets one label (punctuation marks are separate tokens and are labelled too)
- one tag for the beginning (B) and one for the inside (I) of each entity type
- and one tag (O) for tokens outside any entity
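A worked example (PER and ORG are an illustrative label set), tagging one of the sentences used later in these cards:

```python
# IOB-tagged tokens for: "Tim Wagner is a spokesman for American Airlines."
# Note the multi-word entities and the separately-tagged punctuation token.
tagged = [
    ("Tim", "B-PER"), ("Wagner", "I-PER"),
    ("is", "O"), ("a", "O"), ("spokesman", "O"), ("for", "O"),
    ("American", "B-ORG"), ("Airlines", "I-ORG"), (".", "O"),
]
```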
Hidden Markov Model (HMM)
probabilistic sequence model: given a sequence of units (words), it computes a probability distribution over possible label sequences and chooses the best one
probabilities are estimated by counting on a labelled training corpus
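A minimal sketch of that counting step, assuming a corpus of (word, tag) sentences in the format of the IOB example above; names are illustrative:

```python
from collections import Counter, defaultdict

# Maximum-likelihood estimates of HMM transition and emission probabilities,
# obtained by counting over a labelled corpus of (word, tag) sentences.
def estimate_hmm(corpus):
    transitions = defaultdict(Counter)  # counts for P(tag_i | tag_{i-1})
    emissions = defaultdict(Counter)    # counts for P(word_i | tag_i)
    for sentence in corpus:
        prev = "<START>"
        for word, tag in sentence:
            transitions[prev][tag] += 1
            emissions[tag][word] += 1
            prev = tag

    def normalize(counters):
        # Turn raw counts into conditional probability distributions.
        return {key: {k: c / sum(counts.values()) for k, c in counts.items()}
                for key, counts in counters.items()}

    return normalize(transitions), normalize(emissions)
```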
feature-based NER
supervised learning:
- each word is represented by a feature vector: for the word x_i at position i, the vector describes x_i itself and its context
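A minimal sketch of such a feature function; the particular features and their names are illustrative:

```python
# Describe token x_i and its immediate context as a feature dict.
def word_features(sentence, i):
    word = sentence[i]
    return {
        "word.lower": word.lower(),     # normalized word form
        "word.istitle": word.istitle(), # capitalization hints at names
        "word.isupper": word.isupper(),
        "word.isdigit": word.isdigit(),
        "suffix3": word[-3:],           # crude morphology
        "prev.lower": sentence[i - 1].lower() if i > 0 else "<BOS>",
        "next.lower": sentence[i + 1].lower() if i < len(sentence) - 1 else "<EOS>",
    }
```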
Part-Of-Speech tagging
Part-of-speech (POS) = category of words that have similar grammatical properties
- noun, verb, adjective, adverb
- pronoun, preposition, conjunction, determiner
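A worked example with Universal-Dependencies-style coarse tags (illustrative; tag sets differ between taggers). POS tags are often used as context features for feature-based NER:

```python
# Coarse POS tags for the running example sentence.
pos_tagged = [
    ("Tim", "PROPN"), ("Wagner", "PROPN"), ("is", "AUX"), ("a", "DET"),
    ("spokesman", "NOUN"), ("for", "ADP"),
    ("American", "PROPN"), ("Airlines", "PROPN"), (".", "PUNCT"),
]
```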
Conditional Random Fields (CRF)
generative models like HMMs make it hard to add rich features directly into the model => a more powerful model: the CRF
- discriminative undirected probabilistic graphical model
- can take rich representations of observations (feature vectors)
- takes previous labels and context observations into account
- optimizes the label sequence as a whole; the best sequence is found with the Viterbi algorithm
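A minimal training sketch with the sklearn-crfsuite package (a common choice, assumed installed), reusing the word_features function from the feature-based NER sketch above; the data and labels are illustrative:

```python
import sklearn_crfsuite  # consumes per-token feature dicts like word_features

sentences = [["Tim", "Wagner", "is", "a", "spokesman", "for",
              "American", "Airlines", "."]]
labels = [["B-PER", "I-PER", "O", "O", "O", "O", "B-ORG", "I-ORG", "O"]]

# One feature-dict sequence per sentence.
X_train = [[word_features(s, i) for i in range(len(s))] for s in sentences]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
crf.fit(X_train, labels)
pred = crf.predict(X_train)  # best label sequence per sentence (Viterbi)
```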
commonly used neural sequence model for NER
bi-LSTM-CRF:
LSTM = Long Short-Term Memory, a recurrent neural network (RNN) architecture
a bi-LSTM runs two LSTMs over the sequence, left-to-right and right-to-left, so each token's representation captures context from both directions
but for NER a per-token softmax is insufficient, because neighbouring labels constrain each other (an I tag must follow an I or B tag) => add a CRF layer on top of the bi-LSTM output
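A minimal PyTorch sketch of the bi-LSTM half of the architecture; dimensions and names are illustrative. The bi-LSTM produces one emission score per token per label, and a CRF layer (e.g. the pytorch-crf package) would sit on top of these emissions instead of a softmax, learning transition scores that penalize illegal neighbours such as O -> I-PER:

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size, num_labels, emb_dim=100, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True,
                            batch_first=True)
        # Both directions are concatenated, hence 2 * hidden.
        self.emit = nn.Linear(2 * hidden, num_labels)

    def forward(self, token_ids):            # (batch, seq_len)
        h, _ = self.lstm(self.embed(token_ids))
        return self.emit(h)                  # (batch, seq_len, num_labels)
```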
normalization of extracted mentions
suppose we have to extract company names and stock market info from newspaper text -> multiple extracted mentions can refer to the same concept
in order to normalize these, we need a list of concepts:
- knowledge bases (IMDb, Wikipedia)
- ontologies
ontology linking approaches
- Define it as a text classification task with the ontology items as labels. Challenges: a huge label space, and no training data for all items
- Define it as a term-similarity task: use embeddings trained for synonym detection
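A minimal sketch of the term-similarity approach: embed the extracted mention and every ontology term, then link the mention to the nearest term by cosine similarity. Here embed() is a placeholder for any encoder trained for synonym detection:

```python
import numpy as np

def link(mention, ontology_terms, embed):
    m = embed(mention)

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Return the ontology term whose embedding is closest to the mention's.
    return max(ontology_terms, key=lambda t: cosine(m, embed(t)))
```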
give a relation extraction example and three possible methods
example relations: Tim Wagner is a spokesman for American Airlines; United is a unit of UAL Corp.
methods:
1. Co-occurrence based
2. Supervised learning (most reliable)
3. Distant supervision (if labelled data is limited)
co-occurrence based relation extraction
assumption: entities that frequently co-occur are semantically connected
- use a context window (e.g. sentence) to determine co-occurrence
- we can create a network structure based on this
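A minimal sketch, assuming entity mentions have already been extracted per sentence (the context window here); pair counts then become weighted edges in the network:

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(sentences_with_entities):
    # Each input item is the set of entity mentions found in one sentence.
    counts = Counter()
    for entities in sentences_with_entities:
        for a, b in combinations(sorted(set(entities)), 2):
            counts[(a, b)] += 1
    return counts

edges = cooccurrence_counts([
    {"United", "UAL Corp."},
    {"United", "UAL Corp."},
    {"American Airlines", "Tim Wagner"},
])
# -> Counter({('UAL Corp.', 'United'): 2,
#             ('American Airlines', 'Tim Wagner'): 1})
```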
supervised relation extraction
assumption: two entities, one relation
relation extraction as classification problem
1. Find pairs of named entities (usually in the same sentence).
2. Apply a relation classifier to each pair; the classifier can use any supervised technique.
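A minimal sketch of step 2 as classification (scikit-learn). The features here are just the words between the two entity mentions, and the training pairs and relation labels are made up for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# One training example per entity pair: the text between the two mentions.
between_words = ["is a spokesman for", "is a unit of", "is a unit of"]
relations = ["spokesman_of", "unit_of", "unit_of"]

clf = make_pipeline(CountVectorizer(), LogisticRegression())
clf.fit(between_words, relations)
print(clf.predict(["is a subsidiary of"]))  # likely 'unit_of'
```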