Named Entity Recognition Flashcards by Anna L

What is a named entity?

a real-world object that can be named, e.g. a person, location, time, amount of money or organisation

How long is an entity mention?

can be either a single token or a span of text

What are the possible approaches to named entity recognition?

dictionary lookup
rule based
machine learning

How can we treat named entity recognition as an ML problem?

treat it as a tagging problem

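use BIO tags to mark each entity mention (B = beginning of a mention, I = inside a mention, O = outside any mention) and then find the category, which is attached to the B and I tags, e.g. B-PER, I-PER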
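The tagging view can be sketched in Python as a conversion from entity spans to per-token BIO tags. This is a minimal sketch, assuming mentions are given as hypothetical `(start, end, label)` token spans with `end` exclusive; the function name `spans_to_bio` and the example sentence are illustrative only.

```python
def spans_to_bio(tokens, spans):
    """Convert entity spans to BIO tags.

    spans: list of (start, end, label) token indices, end exclusive (hypothetical format).
    """
    tags = ["O"] * len(tokens)  # every token starts outside any mention
    for start, end, label in spans:
        tags[start] = f"B-{label}"  # first token of the mention
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"  # remaining tokens of the mention
    return tags


tokens = ["Anna", "Smith", "works", "at", "Acme", "Corp", "in", "London"]
spans = [(0, 2, "PER"), (4, 6, "ORG"), (7, 8, "LOC")]
print(list(zip(tokens, spans_to_bio(tokens, spans))))
```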

How many tag classes do we consider with n entity types?

2n + 1 (a B- and an I- tag for each of the n types, plus a single O tag)
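The count can be checked by building the tag set directly. A small sketch; the function name `bio_tag_set` and the five example entity types are hypothetical.

```python
def bio_tag_set(entity_types):
    """Return the BIO tag set: B-X and I-X for each type X, plus one shared O tag."""
    tags = ["O"]
    for entity_type in entity_types:
        tags += [f"B-{entity_type}", f"I-{entity_type}"]
    return tags


types = ["PER", "LOC", "ORG", "TIME", "MONEY"]
tags = bio_tag_set(types)
print(len(tags), tags)  # 11 classes for n = 5, i.e. 2 * 5 + 1
```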

What is the local approach to NER?

tags are independent of each other
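In a local approach, decoding is just an independent argmax at each token. A minimal sketch, assuming the model has already produced a score (e.g. a softmax probability) for every tag at every token; the scores below and the name `local_decode` are hypothetical.

```python
def local_decode(scores, tags):
    """Pick the best tag for each token independently.

    scores[t][k] is the model's score for tag k at token t.
    Neighbouring tags play no part in the choice.
    """
    best = []
    for token_scores in scores:
        k = max(range(len(tags)), key=lambda j: token_scores[j])
        best.append(tags[k])
    return best


tags = ["O", "B-PER", "I-PER"]
# Hypothetical per-token probabilities for a 3-token sentence.
scores = [
    [0.2, 0.7, 0.1],
    [0.5, 0.1, 0.4],  # O narrowly beats I-PER at this token
    [0.6, 0.1, 0.3],
]
print(local_decode(scores, tags))  # ['B-PER', 'O', 'O']
```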

Which methods are local approaches to NER?

RNN, LSTM, BiLSTM (each with a separate softmax over the tags at every token)

What does a global approach to NER mean?

tags are dependent on each other
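Dependence between tags can be made concrete by scoring whole tag sequences: each sequence gets per-token scores plus a score for every adjacent tag pair, and the best sequence is chosen as a unit. A minimal brute-force sketch; the emission and transition values and the function names are hypothetical, and enumerating every sequence is only feasible for tiny inputs.

```python
from itertools import product


def sequence_score(tag_ids, emissions, transitions):
    """Per-token scores plus a transition score for each adjacent tag pair."""
    score = emissions[0][tag_ids[0]]
    for t in range(1, len(tag_ids)):
        score += transitions[tag_ids[t - 1]][tag_ids[t]] + emissions[t][tag_ids[t]]
    return score


def global_decode_bruteforce(emissions, transitions):
    """Return the best tag sequence by enumerating all n_tags ** T sequences."""
    n_tags = len(emissions[0])
    return max(product(range(n_tags), repeat=len(emissions)),
               key=lambda seq: sequence_score(seq, emissions, transitions))


tags = ["O", "B-PER", "I-PER"]
emissions = [[0.2, 0.7, 0.1], [0.5, 0.1, 0.4], [0.6, 0.1, 0.3]]
transitions = [[0.0, 0.0, -10.0],  # from O: O -> I-PER heavily penalised
               [0.0, 0.0, 0.5],    # from B-PER: B-PER -> I-PER rewarded
               [0.0, 0.0, 0.0]]    # from I-PER
best = global_decode_bruteforce(emissions, transitions)
print([tags[k] for k in best])  # ['B-PER', 'I-PER', 'O']
```

With these numbers a per-token argmax would pick O at the second token, but the B-PER to I-PER transition bonus makes I-PER the better choice for the sequence as a whole.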

Which methods are global approaches to NER?

HMM (hidden Markov model)

CRF (conditional random field)

How does an HMM perform NER?

models the sequence by arranging the output (state) variables in a linear chain

input sequence $x$, sequence of hidden states (tags) $y$

$y_t$ depends only on $y_{t-1}$
$x_t$ depends only on $y_t$

Give the equation for the joint probability p(y, x) of a sequence using an HMM

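$$p(y, x) = \prod_{t=1}^{T} p(y_t \mid y_{t-1})\, p(x_t \mid y_t)$$

(with $y_0$ a fixed start state, so the first factor is the initial tag probability)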
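The factorisation can be checked numerically with a toy model. A minimal sketch with hypothetical parameters: `start` stands in for $p(y_1 \mid y_0)$, the tag set is simplified to PER/O, and each emission table lists only the two words used (in a real model the remaining probability mass would go to the rest of the vocabulary).

```python
def hmm_joint_prob(states, observations, start, trans, emit):
    """p(y, x) = p(y_1) p(x_1 | y_1) * prod over t > 1 of p(y_t | y_{t-1}) p(x_t | y_t)."""
    prob = start[states[0]] * emit[states[0]][observations[0]]
    for t in range(1, len(states)):
        prob *= trans[states[t - 1]][states[t]] * emit[states[t]][observations[t]]
    return prob


# Hypothetical toy parameters (emission tables truncated to the words used).
start = {"PER": 0.4, "O": 0.6}
trans = {"PER": {"PER": 0.3, "O": 0.7}, "O": {"PER": 0.2, "O": 0.8}}
emit = {"PER": {"Anna": 0.5, "smiled": 0.01}, "O": {"Anna": 0.001, "smiled": 0.2}}

p = hmm_joint_prob(["PER", "O"], ["Anna", "smiled"], start, trans, emit)
print(round(p, 4))  # 0.028 = 0.4 * 0.5 * 0.7 * 0.2
```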

What is a conditional random field (CRF)?

a discriminative model for sequence labelling

finds the most probable label sequence $y^*$ given the observation sequence $x$

What is the equation for y* using a CRF model?

$$y^* = \arg\max_{y} p(y \mid x)$$
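Since a CRF's p(y|x) is an exponentiated score divided by a normalisation factor that does not depend on y, y* is simply the sequence with the highest unnormalised score. For a linear chain it can be found with the Viterbi algorithm in O(T K^2) time for T tokens and K tags. A minimal sketch, assuming the per-token (emission) and tag-pair (transition) scores are already computed; the score values and the name `viterbi` are hypothetical.

```python
def viterbi(emissions, transitions):
    """Return (best_score, best_path) maximising
    sum_t emissions[t][y_t] + sum_{t>0} transitions[y_{t-1}][y_t].
    """
    n_tags = len(emissions[0])
    best = list(emissions[0])  # best[k]: best score of a path ending in tag k so far
    backpointers = []
    for t in range(1, len(emissions)):
        new_best, ptrs = [], []
        for k in range(n_tags):
            # Best previous tag j for current tag k.
            prev = max(range(n_tags), key=lambda j: best[j] + transitions[j][k])
            new_best.append(best[prev] + transitions[prev][k] + emissions[t][k])
            ptrs.append(prev)
        best = new_best
        backpointers.append(ptrs)
    # Trace back from the best final tag.
    last = max(range(n_tags), key=lambda k: best[k])
    path = [last]
    for ptrs in reversed(backpointers):
        path.append(ptrs[path[-1]])
    path.reverse()
    return best[last], path


tags = ["O", "B-PER", "I-PER"]
emissions = [[0.2, 0.7, 0.1], [0.5, 0.1, 0.4], [0.6, 0.1, 0.3]]
transitions = [[0.0, 0.0, -10.0], [0.0, 0.0, 0.5], [0.0, 0.0, 0.0]]
score, path = viterbi(emissions, transitions)
print([tags[k] for k in path])  # ['B-PER', 'I-PER', 'O']
```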

What is p(y|x) for a linear-chain CRF?

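$$p(y \mid x) = \frac{1}{Z(x)} \exp\left( \sum_{t=1}^{T} \sum_{k} w_k\, f_k(y_t, y_{t-1}, x_t) \right)$$

where $w_k$ is the weight of feature function $f_k$ and the normalisation factor $Z(x)$ is the same exponentiated sum taken over every possible label sequence $y$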
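The formula can be evaluated directly on a tiny example. A minimal sketch with hypothetical feature functions, weights and a two-label tag set; Z(x) is computed here by enumerating every label sequence, which is exponential in sentence length (practical CRF implementations use the forward algorithm instead).

```python
import math
from itertools import product

LABELS = ["O", "PER"]  # hypothetical two-label tag set


# Hypothetical binary feature functions f_k(y_t, y_prev, x_t).
def f_cap_per(y_t, y_prev, x_t):
    """Fires when a capitalised word is labelled PER."""
    return 1.0 if x_t[:1].isupper() and y_t == "PER" else 0.0


def f_per_per(y_t, y_prev, x_t):
    """Fires on a PER -> PER transition."""
    return 1.0 if y_prev == "PER" and y_t == "PER" else 0.0


def f_lower_o(y_t, y_prev, x_t):
    """Fires when a lowercase word is labelled O."""
    return 1.0 if x_t.islower() and y_t == "O" else 0.0


FEATURES = [f_cap_per, f_per_per, f_lower_o]
WEIGHTS = [2.0, 1.0, 1.5]  # hypothetical learned weights w_k


def score(y, x):
    """Unnormalised score: sum over t and k of w_k * f_k(y_t, y_{t-1}, x_t)."""
    total = 0.0
    for t, (y_t, x_t) in enumerate(zip(y, x)):
        y_prev = y[t - 1] if t > 0 else None  # no previous label at the first position
        total += sum(w * f(y_t, y_prev, x_t) for w, f in zip(WEIGHTS, FEATURES))
    return total


def crf_prob(y, x):
    """p(y|x) = exp(score(y, x)) / Z(x); Z(x) sums over every possible label sequence."""
    z = sum(math.exp(score(y2, x)) for y2 in product(LABELS, repeat=len(x)))
    return math.exp(score(y, x)) / z


x = ["Anna", "Smith", "smiled"]
print(crf_prob(["PER", "PER", "O"], x))
total = sum(crf_prob(y, x) for y in product(LABELS, repeat=len(x)))
print(round(total, 6))  # 1.0
```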

What is a feature function?

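a function $f(y_t, y_{t-1}, x_t)$ that characterises the input (based on a certain feature), usually returning 1 if the feature is present and 0 otherwise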

What feature types are there? (7)

contextual
POS tag
trigger words
length in tokens
orthographic (capitals, punctuation, single character)
suffixes
gazetteer features (features from a list of known names, e.g. place names)
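Most of these feature types can be extracted for a single token as a dictionary of named features. A minimal sketch: the tiny gazetteer, trigger-word list and the name `token_features` are hypothetical, and length in tokens is left out because it describes a candidate multi-token span rather than a single token.

```python
import string

# Hypothetical resources, for illustration only.
GAZETTEER_LOC = {"london", "paris", "edinburgh"}
TRIGGER_WORDS = {"mr", "mrs", "dr", "president"}


def token_features(tokens, pos_tags, i):
    """Return a feature dict for tokens[i]."""
    word = tokens[i]
    return {
        # contextual: neighbouring words (sentence boundaries marked)
        "prev_word": tokens[i - 1].lower() if i > 0 else "<s>",
        "next_word": tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>",
        # POS tag of the current token
        "pos": pos_tags[i],
        # trigger word (e.g. a title) directly before the token
        "after_trigger": i > 0 and tokens[i - 1].lower().rstrip(".") in TRIGGER_WORDS,
        # orthographic
        "is_capitalised": word[:1].isupper(),
        "has_punct": any(c in string.punctuation for c in word),
        "is_single_char": len(word) == 1,
        # suffix
        "suffix3": word[-3:].lower(),
        # gazetteer membership
        "in_loc_gazetteer": word.lower() in GAZETTEER_LOC,
    }


tokens = ["Dr.", "Smith", "lives", "in", "London"]
pos_tags = ["NNP", "NNP", "VBZ", "IN", "NNP"]
print(token_features(tokens, pos_tags, 1))
```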

Give an example feature function

$f(y_t, y_{t-1}, x_t) =$ …

1 if the first letter of $x_t$ is uppercase

0 otherwise
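The feature function above translates directly to code. In a CRF, feature functions are also commonly tied to a label, because a feature that ignores the labels adds the same amount to every candidate sequence and cancels out in the normalisation. A minimal sketch; both function names and the B-PER label are hypothetical.

```python
def f_first_upper(y_t, y_prev, x_t):
    """1 if the first letter of x_t is uppercase, 0 otherwise (labels ignored)."""
    return 1 if x_t[:1].isupper() else 0


def f_first_upper_bper(y_t, y_prev, x_t):
    """Label-specific variant: fires only when the capitalised token is tagged B-PER."""
    return 1 if x_t[:1].isupper() and y_t == "B-PER" else 0


print(f_first_upper("O", None, "London"), f_first_upper("O", None, "london"))  # 1 0
```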

The more features…

the more powerful the learner (although many features also raise the risk of overfitting)

What are the benefits of CRF? (3)

features are intuitive
easy to interpret and debug
high performance

What is the disadvantage of CRF?

feature engineering requires domain knowledge

What is the solution to the feature engineering requirements of CRF?

neural networks

Why are neural networks the solution to the feature engineering requirements of CRF?

we can represent word meanings as vectors (word embeddings) in a high-dimensional space that captures the features of each word, so the features are learned rather than hand-engineered

How can we use an RNN for NER?

apply a softmax over the tag set to the RNN's output at each token to predict that token's tag; train on a set of tagged sentences
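A forward pass of such a tagger fits in a few lines of plain Python. A minimal sketch of a simple (Elman) RNN without bias terms; the 2-d embeddings, hand-set weights and the names `rnn_tagger`, `W_xh`, `W_hh`, `W_hy` are hypothetical placeholders, and training (fitting the weights on tagged sentences) is not shown.

```python
import math


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(z):
    """Numerically stable softmax."""
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def rnn_tagger(embeddings, W_xh, W_hh, W_hy, tags):
    """h_t = tanh(W_xh x_t + W_hh h_{t-1}); p(tag | token t) = softmax(W_hy h_t).

    Returns (predicted tag, tag probabilities) for each token.
    """
    h = [0.0] * len(W_hh)  # initial hidden state h_0 = 0
    outputs = []
    for x_t in embeddings:
        h = [math.tanh(a + b) for a, b in zip(matvec(W_xh, x_t), matvec(W_hh, h))]
        probs = softmax(matvec(W_hy, h))
        outputs.append((tags[probs.index(max(probs))], probs))
    return outputs


# Hypothetical embeddings for a 3-token sentence and untrained weights.
tags = ["O", "B-PER", "I-PER"]
embeddings = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
W_xh = [[0.5, -0.2], [0.1, 0.4]]
W_hh = [[0.3, 0.0], [0.0, 0.3]]
W_hy = [[0.2, -0.5], [1.0, 0.3], [-0.4, 0.8]]

for tag, probs in rnn_tagger(embeddings, W_xh, W_hh, W_hy, tags):
    print(tag, [round(p, 3) for p in probs])
```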

What is the equation we minimise when training an RNN for NER?

negative log likelihood:

$$\mathcal{L} = -\sum_{\text{sentences}} \sum_{t} \log p(y_t \mid x_t)$$

where $y_t$ is the gold tag of token $x_t$
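The double sum maps onto a nested loop over sentences and tokens. A minimal sketch, assuming the model's softmax outputs are already available; the data layout (a list of `(probs, gold_index)` pairs per sentence), the probabilities and the name `nll` are hypothetical.

```python
import math


def nll(dataset):
    """-sum over sentences, sum over tokens, of log p(gold tag | token)."""
    return -sum(math.log(probs[gold]) for sentence in dataset for probs, gold in sentence)


# Two hypothetical training sentences (3 tags), gold tag given as an index.
dataset = [
    [([0.1, 0.8, 0.1], 1), ([0.7, 0.1, 0.2], 0)],
    [([0.5, 0.25, 0.25], 0)],
]
print(round(nll(dataset), 4))  # 1.273 = -(log 0.8 + log 0.7 + log 0.5)
```

Raising the probability assigned to each gold tag lowers the loss, and a model that gives every gold tag probability 1 reaches 0.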

In the global approach, for a given sequence we predict...

all tags jointly, i.e. the whole tag sequence

How can we implement a global approach for NER using a CRF?

use a linear-chain CRF, but replace the hand-crafted feature functions with scores $W h_i + b$ computed from the BiLSTM hidden state $h_i$ of each token
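The replacement step can be sketched as computing one score per tag from each token's hidden state. A minimal sketch, assuming the common BiLSTM-CRF formulation in which the CRF keeps a learned tag-to-tag transition matrix; the BiLSTM itself is not implemented, and the hidden states, `W`, `b`, transition values and function names are hypothetical.

```python
def emission_scores(hidden_states, W, b):
    """e_i = W h_i + b: one score per tag for each token's hidden state h_i."""
    return [[sum(w * h for w, h in zip(row, h_i)) + b_k for row, b_k in zip(W, b)]
            for h_i in hidden_states]


def crf_sequence_score(tag_ids, emissions, transitions):
    """Linear-chain CRF score: emission scores plus a transition score per adjacent tag pair."""
    score = emissions[0][tag_ids[0]]
    for t in range(1, len(tag_ids)):
        score += transitions[tag_ids[t - 1]][tag_ids[t]] + emissions[t][tag_ids[t]]
    return score


# Hypothetical BiLSTM outputs (2 tokens, hidden size 2); tags are O, B-PER, I-PER.
hidden_states = [[1.0, 0.0], [0.0, 1.0]]
W = [[0.5, 0.1], [2.0, -1.0], [0.0, 1.5]]
b = [0.1, 0.0, -0.2]
transitions = [[0.0, 0.0, -10.0],  # from O
               [0.0, 0.0, 1.0],    # from B-PER
               [0.0, 0.0, 0.5]]    # from I-PER

e = emission_scores(hidden_states, W, b)
print(crf_sequence_score([1, 2], e, transitions))  # B-PER I-PER: about 4.3
print(crf_sequence_score([0, 2], e, transitions))  # O I-PER: about -8.1
```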

Why is a global approach better than a local one for sequence labelling? (2)

we can encode rules between neighbouring tags, e.g. in BIO, I never comes directly after O
in a local approach, a large number of O tags is a problem, but it does not affect the global approach
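The BIO rule can be written as a check on adjacent tag pairs and turned into a transition mask for a global model. A minimal sketch, assuming the BIO variant in which every mention starts with B (often called IOB2); the function names are hypothetical, and using -inf scores is one way to forbid pairs (a model can also simply learn very low scores for them).

```python
def allowed_transition(prev_tag, tag):
    """IOB2 rule: I-X may only follow B-X or I-X of the same type X."""
    if not tag.startswith("I-"):
        return True
    entity_type = tag[2:]
    return prev_tag in (f"B-{entity_type}", f"I-{entity_type}")


def is_valid_bio(tags):
    """Check a whole tag sequence; the sentence start is treated like O."""
    prev = "O"
    for tag in tags:
        if not allowed_transition(prev, tag):
            return False
        prev = tag
    return True


def transition_mask(tag_set):
    """0.0 for allowed (prev, next) pairs and -inf for forbidden ones.

    Added to a CRF's transition scores, the -inf entries keep a
    max-scoring decoder such as Viterbi from choosing a forbidden pair.
    """
    return [[0.0 if allowed_transition(p, t) else float("-inf") for t in tag_set]
            for p in tag_set]


print(is_valid_bio(["B-PER", "I-PER", "O"]))  # True
print(is_valid_bio(["O", "I-PER"]))           # False
print(is_valid_bio(["B-PER", "I-LOC"]))       # False
```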

Key features of CRF for NER (4)

requires feature engineering
does not need pretrained vectors
interpretable
performs well with many NE categories

Key features of neural networks for NER (4)

do not need hand-crafted features
need pretrained vectors from large models
not easy to interpret
perform less well when there are many NE categories

(29 cards)