Named Entity Recognition Flashcards
What is a named entity
a real world object that can be named, e.g. person, location, time, money, organisation
How long is an entity mention
can be either a single token or a span of text
What are the possible approaches to named entity recognition
- dictionary lookup
- rule based
- machine learning
How can we treat named entity recognition as an ML problem
treat it as a sequence tagging problem
use BIO tags (Begin, Inside, Outside) to mark each entity mention, then find its category
How many classes do we consider with n entity types
2n + 1
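As a rough illustration of the 2n + 1 count, the sketch below builds a BIO tag set and tags one sentence. The entity types, the example sentence, and the helper names `bio_tagset` and `spans_to_bio` are assumptions made up for this example.

```python
def bio_tagset(entity_types):
    """Return B-X and I-X for every entity type, plus the single O tag."""
    tags = ["O"]
    for etype in entity_types:
        tags.append(f"B-{etype}")
        tags.append(f"I-{etype}")
    return tags


def spans_to_bio(tokens, spans):
    """Convert (start, end, type) spans (end exclusive) to one BIO tag per token."""
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"          # first token of the mention
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"          # remaining tokens of the mention
    return tags


entity_types = ["PER", "LOC", "ORG"]        # hypothetical n = 3 entity types
tagset = bio_tagset(entity_types)           # 2 * 3 + 1 = 7 classes

tokens = ["Ada", "Lovelace", "visited", "London"]
spans = [(0, 2, "PER"), (3, 4, "LOC")]      # hand-written gold spans
tags = spans_to_bio(tokens, spans)
print(len(tagset), tags)
```

Each entity type contributes a B- tag and an I- tag, and the single O tag is shared by all tokens outside any mention.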
What is the local approach to NER
tags are independent of each other
What methods are local approaches to NER
RNN, LSTM, BiLSTM
What does a global approach to NER mean
tags are dependent on each other
What methods are global approaches to NER
HMM (hidden Markov model)
CRF (conditional random field)
How does an HMM perform NER
establish a sequence by arranging the output variables in a chain
input sequence x, state (tag) sequence y
$y_t$ depends only on $y_{t-1}$
$x_t$ depends only on $y_t$
Give the equation for the joint probability p(y, x) using an HMM
$p(y, x) = \prod_t p(y_t \mid y_{t-1}) \, p(x_t \mid y_t)$
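A minimal sketch of the HMM factorisation, in which each tag is conditioned on the previous tag and each token on its own tag. The two-tag set, the `<s>` start state, and all probabilities are made-up toy values; a real HMM would estimate them from tagged data.

```python
# Hypothetical toy parameters, not estimated from any corpus.
transition = {  # p(y_t | y_{t-1}), with "<s>" as an assumed start state
    "<s>": {"O": 0.8, "PER": 0.2},
    "O": {"O": 0.7, "PER": 0.3},
    "PER": {"O": 0.6, "PER": 0.4},
}
emission = {  # p(x_t | y_t)
    "O": {"we": 0.4, "met": 0.5, "Ada": 0.1},
    "PER": {"we": 0.1, "met": 0.1, "Ada": 0.8},
}


def hmm_joint(tokens, tags):
    """Multiply p(y_t | y_{t-1}) * p(x_t | y_t) over all positions t."""
    prob = 1.0
    prev = "<s>"
    for token, tag in zip(tokens, tags):
        prob *= transition[prev][tag] * emission[tag][token]
        prev = tag
    return prob


tokens = ["we", "met", "Ada"]
p_entity = hmm_joint(tokens, ["O", "O", "PER"])
p_no_entity = hmm_joint(tokens, ["O", "O", "O"])
print(p_entity, p_no_entity)
```

With these toy numbers, the tagging that labels "Ada" as PER gets a higher joint probability than the all-O tagging.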
What is a conditional random field (CRF)?
a discriminative model for sequence labelling
finds the most probable tag sequence y* given the observation sequence x
What is the equation for y* using a CRF model
$y^* = \arg\max_y p(y \mid x)$
What is p(y|x) given a linear-chain CRF
$p(y \mid x) = \frac{1}{Z(x)} \exp\left( \sum_t \sum_k w_k \, f_k(y_t, y_{t-1}, x_t) \right)$
where $Z(x)$ is the normalisation factor, $w_k$ is a weight and $f_k$ is a feature function
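A small sketch of the linear-chain CRF probability, with the normalisation factor computed by brute force over every possible tag sequence. The three feature functions, their weights, the two-tag set, and the `<s>` start marker are all assumptions for illustration. Enumerating sequences only works for tiny inputs; practical implementations use dynamic programming instead.

```python
import math
from itertools import product

TAGS = ["O", "PER"]  # hypothetical tag set


# Hypothetical binary feature functions f_k(y_t, y_prev, x_t).
def f_cap_per(y_t, y_prev, x_t):
    return 1.0 if x_t[0].isupper() and y_t == "PER" else 0.0


def f_lower_o(y_t, y_prev, x_t):
    return 1.0 if x_t[0].islower() and y_t == "O" else 0.0


def f_per_per(y_t, y_prev, x_t):
    return 1.0 if y_prev == "PER" and y_t == "PER" else 0.0


FEATURES = [f_cap_per, f_lower_o, f_per_per]
WEIGHTS = [2.0, 1.5, 0.5]  # made-up weights; normally learned from data


def score(tags, tokens):
    """Sum over positions t and features k of w_k * f_k(y_t, y_{t-1}, x_t)."""
    total = 0.0
    prev = "<s>"  # assumed start marker for the first position
    for y_t, x_t in zip(tags, tokens):
        total += sum(w * f(y_t, prev, x_t) for w, f in zip(WEIGHTS, FEATURES))
        prev = y_t
    return total


def crf_prob(tags, tokens):
    """p(y | x) = exp(score(y, x)) / Z(x), with Z(x) summed over all sequences."""
    z = sum(math.exp(score(cand, tokens))
            for cand in product(TAGS, repeat=len(tokens)))
    return math.exp(score(tags, tokens)) / z


tokens = ["we", "met", "Ada"]
all_seqs = list(product(TAGS, repeat=len(tokens)))
best = max(all_seqs, key=lambda seq: crf_prob(seq, tokens))
print(best, crf_prob(best, tokens))
```

Because the normalisation factor is the same for every candidate, the most probable sequence is also the one with the highest unnormalised score.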
What is a feature function?
characterises the input (based on certain features)
What feature types are there (7)
- contextual
- POS tag
- trigger words
- length in tokens
- orthographic (capitals, punctuation, single character)
- suffixes
- gazetteer features (features from a list)
Give an example feature function
$f(y_t, y_{t-1}, x_t)$ = …
1 if the first letter of $x_t$ is uppercase
0 otherwise
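The feature function on this card could be written as below. This is only a sketch: it implements the condition exactly as stated, so the two tag arguments are accepted but unused, and the name `f_capitalised` is an assumption.

```python
def f_capitalised(y_t, y_prev, x_t):
    """Return 1 if the first letter of token x_t is uppercase, otherwise 0.

    y_t and y_prev are part of the feature function signature but are not
    used by this particular condition.
    """
    return 1 if x_t[:1].isupper() else 0


values = [f_capitalised("O", "O", tok) for tok in ["Ada", "visited", "London"]]
print(values)
```

Slicing with `x_t[:1]` keeps the function safe on an empty token, which simply yields 0.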
The more features…
the more powerful the learner
What are the benefits of a CRF (3)
- features are intuitive
- easy to interpret and debug
- high performance
What is the disadvantage of a CRF
feature engineering requires domain knowledge
What is the solution to the feature engineering requirements of a CRF
neural networks
Why are neural networks the solution to the feature engineering requirements of a CRF
we can represent word meanings as vectors in a high-dimensional space that capture the features of the word
How can we use an RNN for NER
apply a softmax to the RNN output at each token to predict its tag; train on a set of tagged sentences
What equation do we minimise when training an RNN for NER
negative log likelihood:
$-\sum_{\text{sentences}} \sum_{\text{tags}} \log p(\text{tag} \mid \text{token})$
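A minimal sketch of this loss in plain Python. The per-token scores stand in for the outputs an RNN would produce before the softmax; they, the three-tag set, and the function names are made up for the example.

```python
import math


def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    m = max(scores)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def negative_log_likelihood(batch_scores, batch_gold):
    """Sum -log p(tag | token) over every token of every sentence."""
    loss = 0.0
    for sent_scores, sent_gold in zip(batch_scores, batch_gold):
        for scores, gold in zip(sent_scores, sent_gold):
            probs = softmax(scores)
            loss -= math.log(probs[gold])    # gold is the index of the correct tag
    return loss


# One sentence of two tokens, scored over the assumed tags [O, B-PER, I-PER].
batch_scores = [[[2.0, 0.0, 0.0], [0.0, 3.0, 0.0]]]
batch_gold = [[0, 1]]                        # gold tags: O, B-PER
loss = negative_log_likelihood(batch_scores, batch_gold)
print(loss)
```

The loss shrinks as the model puts more probability on the gold tag of each token, and each token contributes independently of its neighbours' tags.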
In the global approach, for a given sequence we predict…
all tags jointly, i.e. a whole tag sequence for the token sequence
How can we implement a global approach for NER using a CRF
use a linear-chain CRF, but replace the hand-crafted feature functions with scores $W h_i + b$ computed from the BiLSTM hidden states $h_i$
Why is a global approach better than a local one for sequence labelling (2)
we can encode constraints between tags, e.g. in BIO, I never directly follows O
In a local approach, having many O tags is harmful, but it does not affect the global approach
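To illustrate the BIO constraint, the sketch below compares per-token argmax decoding with Viterbi decoding over a whole sequence. The per-token scores are made-up stand-ins for the outputs of a network such as a BiLSTM, and the transition scores are set by hand to forbid I after O (or at the start) rather than learned.

```python
TAGS = ["O", "B", "I"]
NEG_INF = float("-inf")


def transition(prev, cur):
    """Hand-set transition score: I may not follow O or start a sentence."""
    if cur == "I" and prev in ("O", "<s>"):
        return NEG_INF
    return 0.0


def local_decode(emissions):
    """Pick the best tag for each token independently of its neighbours."""
    return [max(TAGS, key=lambda t: scores[t]) for scores in emissions]


def viterbi(emissions):
    """Find the tag sequence with the highest total emission + transition score."""
    # best[t][tag] = (score of the best path ending in tag at t, previous tag)
    best = [{tag: (transition("<s>", tag) + emissions[0][tag], None) for tag in TAGS}]
    for scores in emissions[1:]:
        column = {}
        for cur in TAGS:
            prev = max(TAGS, key=lambda p: best[-1][p][0] + transition(p, cur))
            column[cur] = (best[-1][prev][0] + transition(prev, cur) + scores[cur], prev)
        best.append(column)
    # Follow the back-pointers from the best final tag.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


emissions = [  # hypothetical per-token tag scores
    {"O": 2.0, "B": 0.5, "I": 0.0},
    {"O": 0.2, "B": 1.0, "I": 1.2},
    {"O": 0.1, "B": 0.3, "I": 1.5},
]
local_tags = local_decode(emissions)
global_tags = viterbi(emissions)
print(local_tags, global_tags)
```

The local decoder is free to output I directly after O, while the global decoder scores whole sequences and so can rule that transition out.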
Key features of CRF for NER (4)
needs feature engineering
no pretrained vectors needed
interpretable
performs well with many NE categories
Key features of NN (4)
don't need hand-crafted features
need pretrained vectors from large models
not easy to interpret
perform less well when there are lots of NE categories