NLP APPLICATIONS Flashcards
What is Information Extraction
Extract information from text
Identify instances of pre-defined entities (dates, names of people, locations) and relations between them
How do IE Systems work overall
Find and understand relevant parts of a document
Produce a structured representation of relevant information (semantically more precise form)
What are Entities
IE Key Step 1
Identification of entities of interest
Entities are named: find the strings in text that denote them
Recognisers are typically designed for each class of interest
What are the various methodologies for identifying entities
dictionaries
rule-based: define patterns
machine learning
What are Relations
IE Key Step 2
Extract specific facts, relations and events by linking entities
uses templates, regular expressions, and grammars
typically designed around important verbs (find the verbs)
relies on various features, e.g. identification of the main syntactic units and their relations
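The verb-centred, pattern-based approach above can be sketched with a regular expression that links two entities around a verb. The entity lists, the "founded" relation, and the example sentence are all invented for illustration:

```python
import re

# Toy relation extraction sketch: a pattern built around the verb "founded"
# linking a PERSON entity to an ORGANISATION entity. All names are examples.
PERSONS = ["Steve Jobs", "Bill Gates"]
ORGS = ["Apple", "Microsoft"]

def extract_founded(sentence):
    """Return (person, org) pairs linked by the verb 'founded'."""
    pattern = re.compile(
        r"(" + "|".join(map(re.escape, PERSONS)) + r")"
        r"\s+founded\s+"
        r"(" + "|".join(map(re.escape, ORGS)) + r")"
    )
    return [(m.group(1), m.group(2)) for m in pattern.finditer(sentence)]

print(extract_founded("Steve Jobs founded Apple in 1976."))
# → [('Steve Jobs', 'Apple')]
```

Real systems generalise this with many templates per relation and syntactic features, but the core idea (entities linked through a verb) is the same.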
What is NER
Named Entity Recognition
A system's ability to find entities in text and classify each into a pre-defined category: person name, organisation, date, etc.
What are the 2 approaches to NER
Knowledge Engineering
-developed by experienced (human) language engineers
Learning Systems
-use statistics or other machine learning
What are 4 NER methods
-dictionary based
-rule-based
-machine learning
-hybrid
NER Methods: Dictionary look-up
System recognises only entities stored in its lists
Adv
-simple, fast, language-independent
Disadv
-often impossible to enumerate all names (create exhaustive list)
-ambiguity: the same name can denote different entities, and names have many variants
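A minimal sketch of dictionary look-up tagging, assuming a hand-made gazetteer (the entries below are invented):

```python
# Dictionary look-up NER sketch: only entities in the list are recognised.
GAZETTEER = {
    "London": "LOCATION",
    "Paris": "LOCATION",
    "IBM": "ORGANISATION",
}

def dictionary_ner(tokens):
    """Tag each token with its entity type if listed, otherwise O."""
    return [(tok, GAZETTEER.get(tok, "O")) for tok in tokens]

print(dictionary_ner(["IBM", "opened", "an", "office", "in", "Paris"]))
```

This also shows the disadvantages concretely: any name missing from the list (or a variant spelling, or a multi-word name like "New York" with this token-level look-up) is silently tagged O.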
NER Methods: Rule-based
Use context clues indicative of specific entity types
but many issues:
- ambiguity in capitalisation, semantics, and structure
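A sketch of a context-clue rule, assuming a hypothetical pattern: a capitalised word following a title like "Mr." or "Dr." is tagged as a person:

```python
import re

# Rule-based NER sketch: a title ("Mr.", "Dr.", ...) is a context clue
# that the next capitalised word is a PERSON. The rule is illustrative only.
TITLE_RULE = re.compile(r"\b(?:Mr|Mrs|Dr|Prof)\.\s+([A-Z][a-z]+)")

def rule_based_persons(text):
    return TITLE_RULE.findall(text)

print(rule_based_persons("Dr. Smith met Mr. Jones at the conference."))
# → ['Smith', 'Jones']
```

The ambiguity issues show up immediately: sentence-initial capitalisation, persons mentioned without a title, and titles used in other structures all break a rule like this, which is why real systems need many hand-crafted rules.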
NER Methods: ML sequence model
Treat named entity recognition as labelling a sequence of tokens in text
output: a sequence of labels, one per token (whether it is an entity and, if so, which type)
Example: HMM NER
Any ML model can learn the entity classification; an HMM is just one way
Each tag in the sequence is a hidden state; the words are the observations
States: use IO(B) tags
Emission probabilities model the likelihood of observing a word given the state we are in
Issue: can't add many features to an HMM to discriminate between tags
HMM is a simple model: a state does not consider its surrounding context, etc.
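The HMM card above can be sketched as a tiny Viterbi decoder over IO tags. All probabilities and the vocabulary are invented for illustration; states are the tags, words are the observations:

```python
import math

# Toy HMM NER: states are IO tags, words are observations.
# All probabilities are made up; unseen words get a small floor probability.
STATES = ["O", "I-LOC"]
start = {"O": 0.8, "I-LOC": 0.2}                       # P(tag | start)
trans = {"O": {"O": 0.8, "I-LOC": 0.2},                # P(tag' | tag)
         "I-LOC": {"O": 0.6, "I-LOC": 0.4}}
emit = {"O": {"visited": 0.4, "the": 0.4, "lovely": 0.2},  # P(word | tag)
        "I-LOC": {"Paris": 0.9}}
FLOOR = 1e-4

def viterbi(words):
    """Return the most likely tag sequence (Viterbi decoding, log space)."""
    V = [{s: (math.log(start[s]) + math.log(emit[s].get(words[0], FLOOR)), [s])
          for s in STATES}]
    for w in words[1:]:
        row = {}
        for s in STATES:
            prev_score, prev_path = max(
                ((V[-1][p][0] + math.log(trans[p][s]), V[-1][p][1])
                 for p in STATES),
                key=lambda x: x[0])
            row[s] = (prev_score + math.log(emit[s].get(w, FLOOR)),
                      prev_path + [s])
        V.append(row)
    return max(V[-1].values(), key=lambda x: x[0])[1]

print(viterbi(["visited", "Paris"]))
# → ['O', 'I-LOC']
```

Note the limitation from the card: the only "features" available are the emission and transition tables, so there is no way to condition on capitalisation, suffixes, or neighbouring words directly.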
How does the CRF NER Classifier work
The objective is to learn the weights that maximize the likelihood of the labeled sequences in the training data
taking a global view of the sequence
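The "global view" can be sketched with a toy linear-chain scoring function: the score of a whole tag sequence is the sum of learned feature weights (emission-like and transition-like), and decoding picks the globally best sequence. The weights here are invented, and decoding is brute force for clarity (Viterbi in practice):

```python
from itertools import product

# Toy linear-chain CRF scoring: weights are invented for illustration.
TAGS = ["O", "B-PER", "I-PER"]
w_emit = {("Barack", "B-PER"): 2.0, ("Obama", "I-PER"): 2.0}
w_trans = {("B-PER", "I-PER"): 1.5, ("O", "B-PER"): 0.5,
           ("O", "I-PER"): -2.0}   # penalise I-PER directly after O

def score(words, tags):
    """Global score of one complete tag sequence for the sentence."""
    s = sum(w_emit.get((w, t), 0.0) for w, t in zip(words, tags))
    s += sum(w_trans.get((a, b), 0.0) for a, b in zip(tags, tags[1:]))
    return s

def decode(words):
    """Pick the tag sequence with the highest global score (brute force)."""
    return max(product(TAGS, repeat=len(words)), key=lambda t: score(words, t))

print(decode(["Barack", "Obama"]))
```

Training a CRF means learning these weights to maximise the likelihood of the labelled training sequences; unlike the HMM, arbitrary features of the words and context can be added as extra weighted terms.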
BERT for NER
BERT on its own is a local approach: it does not take dependencies between output tags into account
It can be fine-tuned to identify entities
It is common to add a CRF layer (or another structured classifier) on top of BERT
local -> global view
How do we fine tune pre-trained models
“transfer learning”, which has 2 steps:
- pretrain a large neural network in an unsupervised way
- fine-tune the NN on a specific task of interest
e.g. use BERT and fine-tune it on an IOB token classification problem
Very common approach for NER
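One practical detail when fine-tuning a BERT-style model on IOB data is aligning word-level labels to subword tokens. A common convention is that the first subword keeps the word's label and the remaining subwords are ignored in the loss (label id -100 in many frameworks). The subword split below is hand-made, not produced by a real tokenizer:

```python
# Sketch of word-level IOB labels -> subword label ids for fine-tuning.
# Convention (assumed): first subword keeps the label, the rest get -100
# so they are ignored by the loss.
IGNORE = -100
label2id = {"O": 0, "B-PER": 1, "I-PER": 2}

def align_labels(word_labels, subword_counts):
    """word_labels[i] belongs to a word split into subword_counts[i] pieces."""
    ids = []
    for label, n in zip(word_labels, subword_counts):
        ids.append(label2id[label])
        ids.extend([IGNORE] * (n - 1))
    return ids

# "Obama" → ["Ob", "##ama"] (2 subwords, label B-PER), "ran" → 1 subword, O
print(align_labels(["B-PER", "O"], [2, 1]))
# → [1, -100, 0]
```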