Named Entity Recognition Flashcards

1
Q

What is a named entity?

A

A real-world object that can be referred to by a name, e.g. a person, location, organisation, time or amount of money.

2
Q

How long is an entity mention?

A

Either a single token or a span of several tokens (e.g. "New York City").

3
Q

What are the possible approaches to named entity recognition?

A
  • Dictionary lookup
  • Rule-based
  • Machine learning
4
Q

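How can we treat named entity recognition as an ML problem?

A

Treat it as a sequence tagging problem.

Use BIO tags to mark each entity mention (B = beginning, I = inside, O = outside) and attach the entity category to the B and I tags (e.g. B-PER, I-PER).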
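A minimal sketch of the BIO encoding, assuming entity mentions are given as `(start, end, type)` token spans with an exclusive `end`. The function name `spans_to_bio` and the span format are hypothetical, chosen for illustration.

```python
def spans_to_bio(tokens, spans):
    """Convert entity spans to BIO tags.

    tokens: list of token strings
    spans:  list of (start, end, type) with `end` exclusive (assumed format)
    """
    tags = ["O"] * len(tokens)          # every token starts as Outside
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"      # first token of the mention
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"      # remaining tokens inside the mention
    return tags


tokens = ["Barack", "Obama", "visited", "New", "York", "."]
spans = [(0, 2, "PER"), (3, 5, "LOC")]
print(spans_to_bio(tokens, spans))
# ['B-PER', 'I-PER', 'O', 'B-LOC', 'I-LOC', 'O']
```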

5
Q

How many classes do we consider with n entity types?

A

2n + 1: a B- and an I- tag for each entity type, plus a single O tag.
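The count can be checked by building the tag set directly. This is a sketch; the four entity types below are an arbitrary example (n = 4), and `bio_labels` is a hypothetical helper name.

```python
def bio_labels(entity_types):
    """Build the BIO tag set: one B- and one I- tag per type, plus a single O."""
    labels = ["O"]
    for etype in entity_types:
        labels += [f"B-{etype}", f"I-{etype}"]
    return labels


types = ["PER", "LOC", "ORG", "MISC"]   # example types chosen for illustration (n = 4)
labels = bio_labels(types)
print(len(labels))                       # 9 = 2 * 4 + 1
```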

6
Q

What is the local approach to NER?

A

Tags are predicted independently of each other: each token's tag is chosen on its own.

7
Q

What methods are local approaches to NER?

A

RNN, LSTM, BiLSTM (each followed by a per-token softmax over the tags)

8
Q

What does a global approach to NER mean?

A

Tags depend on each other, so the whole tag sequence is predicted jointly.

9
Q

What methods are global approaches to NER?

A

HMM (hidden Markov model)

CRF (conditional random field)

10
Q

How does an HMM perform NER?

A

It models the sequence by arranging the output variables (the tags) in a chain.

Input sequence x, sequence of hidden states (tags) y

y_t depends only on y_{t-1}
x_t depends only on y_t

11
Q

Give the equation for P(y, x) using an HMM

A

$p(y, x) = \prod_{t=1}^{T} p(y_t \mid y_{t-1})\, p(x_t \mid y_t)$, with y_0 a fixed start state
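A small numeric sketch of the HMM joint probability: for each position it multiplies the transition probability p(y_t | y_{t-1}) by the emission probability p(x_t | y_t). All probabilities are made-up toy values, and `"<s>"` is an assumed start state so that p(y_1 | y_0) is defined.

```python
# Toy HMM parameters (made-up numbers for illustration only).
transition = {                      # p(y_t | y_{t-1}); "<s>" is the assumed start state
    "<s>": {"PER": 0.6, "O": 0.4},
    "PER": {"PER": 0.3, "O": 0.7},
    "O":   {"PER": 0.2, "O": 0.8},
}
emission = {                        # p(x_t | y_t), restricted to a two-word vocabulary
    "PER": {"Alice": 0.5, "runs": 0.1},
    "O":   {"Alice": 0.05, "runs": 0.4},
}


def hmm_joint(tags, words):
    """p(y, x) = prod_t p(y_t | y_{t-1}) * p(x_t | y_t)."""
    prob, prev = 1.0, "<s>"
    for tag, word in zip(tags, words):
        prob *= transition[prev][tag] * emission[tag][word]
        prev = tag
    return prob


print(round(hmm_joint(["PER", "O"], ["Alice", "runs"]), 3))   # 0.6*0.5 * 0.7*0.4 = 0.084
```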

12
Q

What is a conditional random field (CRF)?

A

A discriminative model for sequence labelling.

It finds the most probable label sequence y* given the observation sequence x.

13
Q

What is the equation for y* using a CRF model?

A

$y^* = \arg\max_{y} p(y \mid x)$
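In a linear-chain model the argmax does not require enumerating every label sequence: the Viterbi algorithm (dynamic programming, not named on the card) finds it in time linear in the sentence length. This sketch assumes the model's scores are already collapsed into per-position label scores `emit` and transition scores `trans` (hypothetical numbers). Maximising the unnormalised score suffices because exp and the normalisation factor are the same for every candidate y. A brute-force version is included to check the result.

```python
from itertools import product


def viterbi(emit, trans, labels):
    """Highest-scoring label sequence under per-position and transition scores."""
    # best[t][y] = (best score of a prefix ending in label y at position t, that prefix)
    best = [{y: (emit[0][y], [y]) for y in labels}]
    for t in range(1, len(emit)):
        column = {}
        for y in labels:
            score, path = max(
                (best[-1][p][0] + trans[(p, y)] + emit[t][y], best[-1][p][1])
                for p in labels
            )
            column[y] = (score, path + [y])
        best.append(column)
    return max(best[-1].values())[1]


def brute_force(emit, trans, labels):
    """Score every possible sequence (exponential; only for checking)."""
    def score(seq):
        s = emit[0][seq[0]]
        for t in range(1, len(seq)):
            s += trans[(seq[t - 1], seq[t])] + emit[t][seq[t]]
        return s
    return list(max(product(labels, repeat=len(emit)), key=score))


labels = ["B", "I", "O"]
trans = {(p, y): 0.0 for p in labels for y in labels}
trans[("O", "I")] = -10.0                       # discourage I directly after O
emit = [{"B": 1.0, "I": 0.0, "O": 3.0},         # hypothetical per-position scores
        {"B": 0.0, "I": 3.0, "O": 0.0}]
print(viterbi(emit, trans, labels))              # ['B', 'I']
print(brute_force(emit, trans, labels))          # ['B', 'I']
```

Picking the best label at each position separately would give O then I; the transition score makes the joint argmax choose B then I instead.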

14
Q

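What is p(y|x) for a linear-chain CRF?

A

$p(y \mid x) = \frac{1}{Z(x)} \exp\left( \sum_{t=1}^{T} \sum_{k=1}^{K} w_k\, f_k(y_t, y_{t-1}, x_t) \right)$

where Z(x) is the normalisation factor (the same exponential summed over all possible label sequences) and w_k is the weight of feature function f_k.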
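A brute-force sketch of the linear-chain CRF probability. The three binary feature functions and their weights are hypothetical, `"<s>"` is an assumed start label for y_0, and Z(x) is computed by enumerating every label sequence, which is only feasible for tiny examples (real implementations use the forward algorithm).

```python
import math
from itertools import product

LABELS = ["PER", "O"]


# Hypothetical binary feature functions f_k(y_t, y_prev, x_t).
def f_cap_per(y_t, y_prev, x_t):
    return 1.0 if x_t[0].isupper() and y_t == "PER" else 0.0


def f_lower_o(y_t, y_prev, x_t):
    return 1.0 if x_t.islower() and y_t == "O" else 0.0


def f_per_per(y_t, y_prev, x_t):
    return 1.0 if y_prev == "PER" and y_t == "PER" else 0.0


FEATURES = [(1.5, f_cap_per), (1.0, f_lower_o), (0.5, f_per_per)]   # (w_k, f_k), made-up weights


def score(y, x):
    """sum_t sum_k w_k * f_k(y_t, y_{t-1}, x_t), starting from the assumed label "<s>"."""
    total, prev = 0.0, "<s>"
    for y_t, x_t in zip(y, x):
        total += sum(w * f(y_t, prev, x_t) for w, f in FEATURES)
        prev = y_t
    return total


def crf_prob(y, x):
    """p(y | x) = exp(score(y, x)) / Z(x), summing Z over all label sequences."""
    z = sum(math.exp(score(list(c), x)) for c in product(LABELS, repeat=len(x)))
    return math.exp(score(y, x)) / z


x = ["Alice", "runs"]
for c in product(LABELS, repeat=len(x)):
    print(c, round(crf_prob(list(c), x), 3))
```

With these weights ("PER", "O") receives the highest probability, and the four probabilities sum to 1 because of the normalisation.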

15
Q

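What is a feature function?

A

A function f(y_t, y_{t-1}, x_t), often binary, that characterises the input (and labels) at position t based on a certain feature; each feature function has a learned weight in the CRF.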

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What feature types are there? (7)

A
  • Contextual
  • POS tag
  • Trigger words
  • Length in tokens
  • Orthographic (capitals, punctuation, single character)
  • Suffixes
  • Gazetteer features (membership in a list of known names)
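A sketch of a token-level feature extractor covering most of the types above. The gazetteer, the trigger-word list and the function name `token_features` are hypothetical; POS tags are assumed to come from an external tagger. "Length in tokens" applies to candidate mention spans rather than single tokens, so it is left out here.

```python
# Hypothetical resources for illustration: a tiny gazetteer and trigger-word list.
GAZETTEER = {"london", "paris", "berlin"}
TRIGGERS = {"mr.", "dr.", "inc."}


def token_features(tokens, pos_tags, i):
    """Return a dict of features for tokens[i], grouped by feature type."""
    word = tokens[i]
    prev_word = tokens[i - 1] if i > 0 else "<s>"
    next_word = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return {
        # contextual: neighbouring words
        "prev_word": prev_word.lower(),
        "next_word": next_word.lower(),
        # POS tag (assumed to come from an external tagger)
        "pos": pos_tags[i],
        # trigger word directly before the token (e.g. a title such as "Dr.")
        "prev_is_trigger": prev_word.lower() in TRIGGERS,
        # orthographic
        "is_capitalised": word[:1].isupper(),
        "is_punct": all(not c.isalnum() for c in word),
        "is_single_char": len(word) == 1,
        # suffix
        "suffix3": word[-3:].lower(),
        # gazetteer membership
        "in_gazetteer": word.lower() in GAZETTEER,
    }


tokens = ["Dr.", "Smith", "lives", "in", "London"]
pos = ["NNP", "NNP", "VBZ", "IN", "NNP"]
print(token_features(tokens, pos, 1))
```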
17
Q

Give an example feature function

f(y_t, y_{t-1}, x_t) = …

A

1 if the first letter of x_t is uppercase

0 otherwise
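The example feature function written out in code, as a sketch with hypothetical function names. Note that a feature which ignores the labels adds the same value to every candidate label sequence, so it cancels in the normalisation of p(y|x); in practice such a test is conjoined with a label condition, as in the second function (the `B-PER` choice is illustrative).

```python
def f_capitalised(y_t, y_prev, x_t):
    """1 if the first letter of x_t is uppercase, 0 otherwise.

    y_t and y_prev are part of the feature-function signature but unused here.
    """
    return 1 if x_t[:1].isupper() else 0


def f_cap_and_bper(y_t, y_prev, x_t):
    """Same test conjoined with a label, so it can favour some tag sequences."""
    return 1 if x_t[:1].isupper() and y_t == "B-PER" else 0


print(f_capitalised("B-PER", "O", "Obama"))   # 1
print(f_capitalised("O", "O", "visited"))     # 0
print(f_cap_and_bper("O", "O", "Obama"))      # 0
```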

18
Q

The more features…

A

The more powerful the learner.

19
Q

What are the benefits of CRF? (3)

A
  • Features are intuitive
  • Easy to interpret and debug
  • High performance
20
Q

What is the disadvantage of CRF?

A

Feature engineering requires domain knowledge.

21
Q

What is the solution to the feature-engineering requirements of CRF?

A

Neural networks.

22
Q

Why are neural networks the solution to the feature-engineering requirements of CRF?

A

Words can be represented as vectors (embeddings) in a high-dimensional space that captures their features, so the features are learned rather than hand-crafted.

23
Q

How can we use an RNN for NER?

A

Apply a softmax to the RNN output at each position to predict the tag of that token. Train on a set of tagged sentences.
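A minimal forward pass of an Elman-style RNN tagger in plain Python, showing a softmax over tags at every position. The sizes, the tag set and the random weights are assumptions for illustration; the weights are untrained, and the word embeddings are random stand-ins, so the predicted tags are meaningless until the model is fitted on tagged sentences.

```python
import math
import random

random.seed(0)
TAGS = ["B-PER", "I-PER", "O"]
EMB, HID = 4, 3                                      # toy sizes (assumed)


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Untrained, randomly initialised parameters.
W_xh, W_hh, W_hy = rand_matrix(HID, EMB), rand_matrix(HID, HID), rand_matrix(len(TAGS), HID)
b_h, b_y = [0.0] * HID, [0.0] * len(TAGS)


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(z):
    m = max(z)                                       # subtract max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def rnn_tag_probs(embeddings):
    """For each token embedding x_t, return softmax(W_hy h_t + b_y) over TAGS."""
    h = [0.0] * HID
    out = []
    for x in embeddings:
        pre = [a + b + c for a, b, c in zip(matvec(W_xh, x), matvec(W_hh, h), b_h)]
        h = [math.tanh(v) for v in pre]              # h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)
        logits = [a + b for a, b in zip(matvec(W_hy, h), b_y)]
        out.append(softmax(logits))
    return out


sentence = [[random.uniform(-1, 1) for _ in range(EMB)] for _ in range(3)]   # stand-in embeddings
for probs in rnn_tag_probs(sentence):
    print(TAGS[probs.index(max(probs))], [round(p, 3) for p in probs])
```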

24
Q

What is the equation we minimise when training an RNN for NER?

A

The negative log-likelihood, summed over every training sentence s and every tag in it:

$\mathcal{L} = -\sum_{s} \sum_{t=1}^{T_s} \log p(y_t \mid x_t)$
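A sketch of the negative log-likelihood computation. The input format (per-token softmax outputs paired with the index of the gold tag) and the probabilities themselves are made up for illustration.

```python
import math


def nll(dataset):
    """Sum over sentences and over tokens of -log p(gold tag | token).

    dataset: list of sentences; each sentence is a list of (probs, gold_index)
    pairs, where probs is the model's softmax output for one token (assumed format).
    """
    return sum(-math.log(probs[gold]) for sentence in dataset for probs, gold in sentence)


# Made-up model outputs over the tags [B-PER, I-PER, O] for two short sentences.
data = [
    [([0.7, 0.2, 0.1], 0), ([0.1, 0.8, 0.1], 1)],
    [([0.1, 0.1, 0.8], 2)],
]
print(round(nll(data), 3))   # -log 0.7 - 2 log 0.8, about 0.803
```

The loss is zero only when every gold tag receives probability 1, and it grows as the model puts less probability on the correct tags.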

25
Q

In the global approach, for a given sequence we predict…

A

All the tags at once: the tag sequence for the whole token sequence.

26
Q

How can we implement a global approach for NER using a CRF?

A

Use a linear-chain CRF, but replace the hand-crafted feature functions with the scores $W h_i + b$ computed from each token's BiLSTM hidden state $h_i$.

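A sketch of how a BiLSTM-CRF scores a tag sequence: each token's emission scores are $W h_i + b$, and a transition score is added for every pair of adjacent tags. The hidden states `H` are hand-written stand-ins for real BiLSTM outputs, and `W`, `b` and the transition table `A` are hypothetical numbers; enumeration is used for the argmax only because the example is tiny.

```python
from itertools import product

TAGS = ["B", "I", "O"]

# Stand-ins for BiLSTM hidden states h_i (one vector per token).
H = [[0.9, -0.2], [0.1, 0.8], [-0.5, -0.4]]

# Hypothetical projection W (one row per tag) and bias b.
W = [[2.0, 0.0], [0.0, 2.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.5]

# Transition scores A[prev][next]; a large negative score blocks O -> I.
A = {p: {n: 0.0 for n in TAGS} for p in TAGS}
A["O"]["I"] = -100.0


def emissions(H):
    """e_i = W h_i + b: one score per tag, replacing the hand-crafted feature sum."""
    return [[sum(w * h for w, h in zip(row, h_i)) + b_k for row, b_k in zip(W, b)] for h_i in H]


def sequence_score(tags, E):
    """Sum of emission scores plus transition scores for one tag sequence."""
    idx = [TAGS.index(t) for t in tags]
    s = sum(E[i][k] for i, k in enumerate(idx))
    s += sum(A[tags[i - 1]][tags[i]] for i in range(1, len(tags)))
    return s


E = emissions(H)
best = max(product(TAGS, repeat=len(H)), key=lambda y: sequence_score(list(y), E))
print(best)   # ('B', 'I', 'O')
```

Exponentiating these scores and normalising over all tag sequences gives p(y|x) exactly as in the linear-chain CRF, so training and decoding work the same way.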
27
Q

Why is a global approach better than a local one for sequence labelling? (2)

A
  • We can encode rules, e.g. in BIO an I tag never comes directly after O
  • In a local approach, having many O tags is a problem, but this does not affect the global approach

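A local tagger can output sequences such as O, I-PER, which the BIO scheme forbids; a global model can give such transitions a zero probability (or a very negative score). This sketch checks the constraint, assuming typed tags in which I-X must also continue a mention of the same type X. The function name `is_valid_bio` is hypothetical.

```python
def is_valid_bio(tags):
    """Check that every I-X tag follows B-X or I-X of the same type."""
    prev = "O"                      # a sentence behaves as if it starts after an O
    for tag in tags:
        if tag.startswith("I-") and prev[2:] != tag[2:]:
            return False            # I-X after O or after a different type is illegal
        prev = tag
    return True


print(is_valid_bio(["B-PER", "I-PER", "O"]))   # True
print(is_valid_bio(["O", "I-PER"]))            # False
```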
28
Q

Key features of CRF for NER (4)

A
  • Requires feature engineering
  • Needs no pretrained vectors
  • Interpretable
  • Performs well with many NE categories

29
Q

Key features of neural networks for NER (4)

A
  • No need for hand-crafted features
  • Need pretrained vectors from large models
  • Not easy to interpret
  • Perform less well when there are many NE categories