Named Entity Recognition Flashcards

1
Q

What is a named entity?

A

a real-world object that can be named, e.g. a person, location, time, monetary amount or organisation

2
Q

How long is an entity mention

A

can be either a single token or a span of text

3
Q

What are the possible approaches to named entity recognition

A
  • dictionary lookup
  • rule based
  • machine learning
4
Q

How can we treat named entity recognition as an ML problem

A

treat it as a tagging problem

use BIO tags to mark each entity mention (B = beginning, I = inside, O = outside) and attach the entity category to the B and I tags, e.g. B-PER, I-PER
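
As a rough illustration of the BIO scheme, the sketch below converts entity spans into one tag per token. The sentence, the span format (start, end, category) and the category names PER and LOC are assumptions made up for this example.

```python
def spans_to_bio(tokens, spans):
    """Convert entity spans to BIO tags, one tag per token.

    spans: list of (start, end, category) over token indices, end exclusive.
    This span format is an assumption made for this sketch.
    """
    tags = ["O"] * len(tokens)              # every token starts outside an entity
    for start, end, category in spans:
        tags[start] = "B-" + category       # first token of the mention
        for i in range(start + 1, end):
            tags[i] = "I-" + category       # remaining tokens of the mention
    return tags


# Hypothetical example sentence with two entity mentions.
tokens = ["Ada", "Lovelace", "visited", "London", "yesterday"]
spans = [(0, 2, "PER"), (3, 4, "LOC")]
bio_tags = spans_to_bio(tokens, spans)

for token, tag in zip(tokens, bio_tags):
    print(f"{token}\t{tag}")
```

A single-token mention gets only a B tag, while a multi-token mention gets one B tag followed by I tags.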

5
Q

How many classes do we consider with n entity types

A

2n + 1
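
The count can be checked with a small sketch: each entity type contributes a B tag and an I tag, plus one shared O tag. The three type names are placeholders chosen for the example.

```python
def bio_tag_set(entity_types):
    """Build the BIO tag set: one shared O tag plus B-X and I-X per type."""
    tags = ["O"]
    for entity_type in entity_types:
        tags.append("B-" + entity_type)
        tags.append("I-" + entity_type)
    return tags


# Placeholder entity types, assumed for illustration only.
entity_types = ["PER", "LOC", "ORG"]
tag_set = bio_tag_set(entity_types)
print(len(tag_set), tag_set)
```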

6
Q

What is the local approach to NER

A

tags are independent of each other

7
Q

What methods are local approaches to NER

A

RNN, LSTM, BiLSTM (each predicting a tag per token with a softmax)

8
Q

What does a global approach to NER mean

A

tags are dependent on each other

9
Q

What methods are global approaches to NER

A

HMM (hidden Markov model)

CRF (conditional random field)

10
Q

How does an HMM perform NER

A

it models the sequence by arranging the output variables (states) in a chain

input sequence x (tokens), state sequence y (tags)

y_t depends only on y_{t-1}
x_t depends only on y_t

11
Q

Give the equation for the joint probability p(y, x) of the sequences under an HMM

A

p(y, x) = product over t of p(y_t | y_{t-1}) * p(x_t | y_t)
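
A small sketch of the HMM factorisation: for each position, multiply the transition probability p(y_t | y_{t-1}) by the emission probability p(x_t | y_t). All probabilities, the start symbol and the bare B/I/O tags below are made-up toy values, not estimates from any data.

```python
# Toy HMM parameters: hypothetical numbers chosen only for illustration.
START = "<s>"  # assumed start symbol standing in for y_0

transition = {  # p(y_t | y_{t-1}), keyed as (previous tag, current tag)
    (START, "O"): 0.8, (START, "B"): 0.2,
    ("O", "O"): 0.7, ("O", "B"): 0.3,
    ("B", "I"): 0.5, ("B", "O"): 0.5,
    ("I", "I"): 0.4, ("I", "O"): 0.6,
}
emission = {  # p(x_t | y_t), keyed as (tag, token)
    ("O", "in"): 0.5,
    ("B", "New"): 0.4,
    ("I", "York"): 0.6,
}


def hmm_joint(tags, tokens):
    """Joint probability p(y, x) as a product of transition and emission terms."""
    prob = 1.0
    prev = START
    for tag, token in zip(tags, tokens):
        # Missing entries are treated as probability 0 in this sketch.
        prob *= transition.get((prev, tag), 0.0) * emission.get((tag, token), 0.0)
        prev = tag
    return prob


joint = hmm_joint(["O", "B", "I"], ["in", "New", "York"])
print(joint)
```

Because the toy transition table has no entry for O followed by I, any tag sequence containing that pair gets probability 0.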

12
Q

What is a conditional random field (CRF)?

A

a discriminative model for sequence labelling

finds the most probable tag sequence y* given the observation sequence x

13
Q

What is the equation for y* using a CRF model

A

y* = argmax over y of p(y|x)

14
Q

What is p(y|x) for a linear-chain CRF?

A

p(y|x) = (1 / Z(x)) * exp( sum_t sum_k w_k * f_k(y_t, y_{t-1}, x_t) )

where Z(x) is the normalisation factor and w_k is the weight of feature function f_k
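
To make the linear-chain CRF concrete, here is a minimal sketch that scores every possible tag sequence, normalises to get p(y|x), and picks the argmax. The feature functions, their weights and the three bare tags are invented for illustration. Enumerating all sequences is only feasible for tiny inputs; real implementations typically use dynamic programming instead.

```python
import math
from itertools import product

TAGS = ["O", "B", "I"]  # bare BIO tags, assumed for this sketch


# Hypothetical feature functions f_k(y_t, y_prev, x_t); y_prev is None at t = 1.
def f_capital_is_entity(y_t, y_prev, x_t):
    return 1.0 if x_t[0].isupper() and y_t != "O" else 0.0


def f_lower_is_outside(y_t, y_prev, x_t):
    return 1.0 if x_t[0].islower() and y_t == "O" else 0.0


def f_i_continues_entity(y_t, y_prev, x_t):
    return 1.0 if y_t == "I" and y_prev in ("B", "I") else 0.0


def f_i_without_entity(y_t, y_prev, x_t):
    return 1.0 if y_t == "I" and y_prev in ("O", None) else 0.0


# (weight w_k, feature function f_k) pairs; the weights are made up.
FEATURES = [
    (2.0, f_capital_is_entity),
    (2.0, f_lower_is_outside),
    (1.0, f_i_continues_entity),
    (-5.0, f_i_without_entity),
]


def score(tags, tokens):
    """Sum over positions t and features k of w_k * f_k(y_t, y_{t-1}, x_t)."""
    total, prev = 0.0, None
    for y_t, x_t in zip(tags, tokens):
        total += sum(w * f(y_t, prev, x_t) for w, f in FEATURES)
        prev = y_t
    return total


def all_sequences(tokens):
    return product(TAGS, repeat=len(tokens))


def crf_probability(tags, tokens):
    """p(y|x) = exp(score(y, x)) / Z(x), with Z(x) summed over all sequences."""
    z = sum(math.exp(score(candidate, tokens)) for candidate in all_sequences(tokens))
    return math.exp(score(tags, tokens)) / z


def decode(tokens):
    """y* = argmax over y of p(y|x); Z(x) is constant, so compare raw scores."""
    return max(all_sequences(tokens), key=lambda candidate: score(candidate, tokens))


tokens = ["in", "New", "York"]
best = decode(tokens)
print(best, crf_probability(best, tokens))
```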

15
Q

What is a feature function?

A

characterises the input (based on certain features)

16
Q

What feature types are there (7)

A
  • contextual
  • POS tag
  • trigger words
  • length in tokens
  • orthographic (capitals, punctuation, single character)
  • suffixes
  • gazetteer features (features from a list)
17
Q

Give an example feature function

f(yt, yt-1, xt) = …

A

1 if first letter of xt is uppercase

0 otherwise
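
The example above, plus two more binary feature functions, could be sketched as follows. The suffix rule, the city list and the B-LOC tag name are hypothetical choices for illustration, not features taken from any particular system.

```python
# Hypothetical gazetteer (a plain list of known city names).
CITY_GAZETTEER = {"London", "Paris", "Tokyo"}


def f_first_upper(y_t, y_prev, x_t):
    """Orthographic feature: 1 if the first letter of x_t is uppercase, else 0."""
    return 1 if x_t[:1].isupper() else 0


def f_suffix_ing_outside(y_t, y_prev, x_t):
    """Suffix feature (assumed rule): token ends in 'ing' and is tagged O."""
    return 1 if x_t.endswith("ing") and y_t == "O" else 0


def f_gazetteer_city(y_t, y_prev, x_t):
    """Gazetteer feature: token is in the city list and is tagged B-LOC."""
    return 1 if x_t in CITY_GAZETTEER and y_t == "B-LOC" else 0


feature_functions = [f_first_upper, f_suffix_ing_outside, f_gazetteer_city]

# Feature vector for the token "London" tagged B-LOC after an O tag.
vector = [f("B-LOC", "O", "London") for f in feature_functions]
print(vector)
```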

18
Q

The more features…

A

the more powerful the learner

19
Q

What are the benefits of CRF (3)

A
  • features are intuitive
  • easy to interpret and debug
  • high performance
20
Q

What is the disadvantage of CRF

A

feature engineering requires domain knowledge

21
Q

What is the solution to the feature engineering requirements of CRF

A

neural networks

22
Q

Why are neural networks the solution to the feature engineering requirements of CRF

A

they represent word meanings as vectors in a high-dimensional space that captures the features of each word, so the features do not have to be hand-crafted

23
Q

How can we use an RNN for NER

A

apply a softmax on top of the RNN output to predict the tag for each token; train on a set of tagged sentences

24
Q

What is the equation we minimise when training an RNN for NER

A

negative log likelihood:

sum over training sentences, sum over tokens in the sentence: -log p(tag | token)
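
The loss can be sketched in plain Python. The per-token scores below stand in for RNN outputs and are made-up numbers; the tag names are likewise assumed for the example.

```python
import math

TAGS = ["O", "B-PER", "I-PER"]  # assumed tag set for this sketch


def softmax(scores):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def negative_log_likelihood(batch):
    """Sum over sentences and tokens of -log p(gold tag | token).

    batch: list of sentences; each sentence is a list of
    (scores, gold_tag) pairs, where scores has one entry per tag.
    """
    loss = 0.0
    for sentence in batch:
        for scores, gold_tag in sentence:
            probs = softmax(scores)
            loss -= math.log(probs[TAGS.index(gold_tag)])
    return loss


# Hypothetical scores, standing in for the output of an RNN.
batch = [
    [([0.1, 2.0, 0.3], "B-PER"), ([0.2, 0.1, 1.5], "I-PER")],
    [([3.0, 0.0, 0.0], "O")],
]
loss = negative_log_likelihood(batch)
print(loss)
```

The loss shrinks as the model puts more probability on the gold tags, and it is exactly log(number of tags) per token when all scores are equal.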

25
Q

In the global approach, for a given sequence we predict…

A

all tags at once: the whole tag sequence for the token sequence

26
Q

How can we implement a global approach for NER using CRF

A

use a linear-chain CRF, but replace the hand-crafted feature functions with scores W h_i + b computed from the BiLSTM hidden state h_i of each token
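
One way to picture this is a sketch in which per-token scores W h_i + b play the role of the feature functions and a transition table scores adjacent tag pairs. The hidden states, W, b and the transition scores are all hypothetical numbers, and Viterbi decoding is a standard choice assumed here rather than something the card specifies.

```python
TAGS = ["O", "B", "I"]
FORBIDDEN = -1e4  # large penalty standing in for "never allowed"

# Hypothetical BiLSTM hidden states h_i, one 2-d vector per token.
hidden = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

# Hypothetical projection: one row of W and one entry of b per tag.
W = [[2.0, 0.0],   # O
     [0.0, 1.0],   # B
     [0.0, 1.5]]   # I
b = [0.0, 0.0, 0.0]

# transition[prev][cur]: score for moving from tag prev to tag cur.
transition = [[0.0, 0.0, FORBIDDEN],   # O -> I is not allowed
              [0.0, 0.0, 0.5],
              [0.0, 0.0, 0.5]]
start = [0.0, 0.0, FORBIDDEN]          # a sequence cannot start with I


def emission_scores(states):
    """Per-token tag scores W h_i + b."""
    return [[sum(w * h for w, h in zip(row, h_i)) + bias
             for row, bias in zip(W, b)] for h_i in states]


def viterbi(emit):
    """Highest-scoring tag sequence under emission plus transition scores."""
    n = len(TAGS)
    best = [start[k] + emit[0][k] for k in range(n)]
    backpointers = []
    for scores in emit[1:]:
        pointers, new_best = [], []
        for cur in range(n):
            prev = max(range(n), key=lambda p: best[p] + transition[p][cur])
            pointers.append(prev)
            new_best.append(best[prev] + transition[prev][cur] + scores[cur])
        backpointers.append(pointers)
        best = new_best
    path = [max(range(n), key=lambda k: best[k])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [TAGS[k] for k in path]


emit = emission_scores(hidden)
# Local decoding: pick the best tag for each token independently.
local_tags = [TAGS[max(range(len(TAGS)), key=lambda k: s[k])] for s in emit]
global_tags = viterbi(emit)
print(local_tags, global_tags)
```

With these toy numbers, per-token decoding places an I directly after an O, while decoding the whole sequence does not.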

27
Q

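Why is a global approach better than local for sequence labelling (2)

A

we can encode constraints between tags, e.g. in BIO, I never comes directly after O

in a local approach, the large number of O tags hurts the per-token predictions, but it doesn’t affect the global approach, which scores the whole sequence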
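
The BIO constraint can be written as a small check. The sketch below assumes typed tags such as B-PER and I-PER, and additionally requires an I tag to continue a mention of the same category, which is a common convention rather than something stated on the card.

```python
def is_valid_bio(tags):
    """Return True if every I-X tag directly follows B-X or I-X."""
    prev = "O"  # treat the position before the sentence as outside
    for tag in tags:
        if tag.startswith("I-"):
            category = tag[2:]
            if prev not in ("B-" + category, "I-" + category):
                return False
        prev = tag
    return True


print(is_valid_bio(["O", "B-PER", "I-PER", "O"]))
print(is_valid_bio(["O", "I-PER", "I-PER", "O"]))
```

A local tagger can emit the second kind of sequence because each tag is chosen without looking at its neighbours.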

28
Q

Key features of CRF for NER (4)

A

needs feature engineering
doesn’t need pretrained vectors
interpretable
performs well with many NE categories

29
Q

Key features of NN (4)

A

don’t need hand-crafted features
need pretrained vectors from large models
not easy to interpret
performs less well when lots of NE categories