NLP APPLICATIONS Flashcards

1
Q

What is Information Extraction?

A

Extract information from text
Identify instances of pre-defined entities (dates, names of people, locations) and the relations between them

2
Q

How do IE systems work overall?

A

Find and understand relevant parts of a document
Produce a structured representation of relevant information (semantically more precise form)

3
Q

What are Entities?

A

IE key step 1
Identification of the entities of interest
Entities are named: find the strings in the text that denote them
Recognisers are typically designed for each class of interest

4
Q

What are the various methodologies for identifying entities?

A

dictionaries
rule-based: define patterns
machine learning

5
Q

What are Relations?

A

IE key step 2
Extract specific facts, relations and events by linking entities
Uses templates, regular expressions and grammars
Typically designed around important verbs (find the verbs)

Relies on various features
e.g. identification of the main syntactic units and their relations
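A minimal sketch of the verb-centred approach using a regular expression (the "acquired" pattern and the capitalised-word shape of entities are illustrative assumptions, not a general solution):

```python
import re

# One hand-built relation pattern anchored on the verb "acquired",
# as in "X acquired Y". Entities are crudely approximated as runs
# of capitalised words.
ACQUIRED = re.compile(
    r"(?P<buyer>[A-Z]\w+(?: [A-Z]\w+)*) acquired (?P<target>[A-Z]\w+(?: [A-Z]\w+)*)"
)

def extract_acquisitions(text):
    """Return (buyer, target) pairs matched by the verb-centred pattern."""
    return [(m.group("buyer"), m.group("target")) for m in ACQUIRED.finditer(text)]
```

Real template systems add many such patterns per relation, plus syntactic features.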

6
Q

What is NER?

A

Named Entity Recognition
A system's ability to find an entity and classify it as a person name, organisation, date, etc.

7
Q

What are the 2 approaches to NER?

A

Knowledge Engineering
-developed by experienced (human) language engineers

Learning Systems
-use statistics or other machine learning

8
Q

What are the 4 NER methods?

A

-dictionary based
-rule-based
-machine learning
-hybrid

9
Q

NER Methods: Dictionary look-up

A

The system recognises only entities stored in its lists

Adv
-simple, fast, independent

Disadv
-often impossible to enumerate all names (i.e. to create an exhaustive list)
-ambiguity - name variants
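A toy sketch of dictionary (gazetteer) look-up; the entries are illustrative and deliberately tiny, which is exactly the coverage weakness noted above:

```python
# Minimal gazetteer: lowercased surface form -> entity type.
# "john lewis" illustrates the ambiguity problem (person vs UK retailer).
GAZETTEER = {
    "london": "LOCATION",
    "paris": "LOCATION",
    "john lewis": "PERSON",
}

def dictionary_ner(tokens, max_len=2):
    """Greedy longest-match look-up over token n-grams."""
    entities = []
    i = 0
    while i < len(tokens):
        for n in range(max_len, 0, -1):
            span = " ".join(tokens[i:i + n]).lower()
            if span in GAZETTEER:
                entities.append((" ".join(tokens[i:i + n]), GAZETTEER[span]))
                i += n
                break
        else:
            i += 1  # no entry starts here; move on
    return entities
```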

10
Q

NER Methods: Rule-based

A

Use context clues indicative of specific entity types
but many issues:
- ambiguity in capitalisation, semantics and structure
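A hedged sketch of context-clue rules; the title prefix ("Dr.") and company suffix ("Inc.") patterns are assumed examples of such clues:

```python
import re

# A title like "Mr./Dr." signals a person; a suffix like "Inc./Ltd."
# signals an organisation. Real systems need many more rules.
PERSON_RULE = re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\. ([A-Z][a-z]+)")
ORG_RULE = re.compile(r"\b([A-Z][A-Za-z]+) (?:Inc|Ltd|Corp)\.")

def rule_based_ner(text):
    ents = [(m.group(1), "PERSON") for m in PERSON_RULE.finditer(text)]
    ents += [(m.group(1), "ORGANISATION") for m in ORG_RULE.finditer(text)]
    return ents
```

Such rules are brittle precisely because of the ambiguities listed above.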

11
Q

NER Methods: ML sequence model

A

Consider named entities as sequences of tokens in the text
Output: a sequence of labels, one per token (whether it is part of an entity and, if so, which entity type)

12
Q

Example: HMM NER

A

Any ML sequence model can learn the named-entity classification; this is just one way

Each position in the sequence has a hidden state
States: the IO(B) tags
Emission probabilities model the likelihood of observing a word given the state we are in

Issue: we can't add many features to an HMM to discriminate between tags
The HMM is a simple model: for a given state it does not consider the surrounding context, etc.
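A toy Viterbi decoder over hand-set (assumed) probabilities, showing how start, transition and emission probabilities combine to pick the most likely tag sequence:

```python
# Toy HMM for NER with IO tags. All probabilities are made up.
STATES = ["O", "I-PER"]
START = {"O": 0.8, "I-PER": 0.2}
TRANS = {"O": {"O": 0.8, "I-PER": 0.2},
         "I-PER": {"O": 0.5, "I-PER": 0.5}}
EMIT = {"O": {"saw": 0.4, "yesterday": 0.4, "john": 0.01, "i": 0.19},
        "I-PER": {"john": 0.9, "saw": 0.02, "yesterday": 0.02, "i": 0.06}}

def viterbi(words):
    """Most likely tag sequence under the toy HMM (unseen words get 1e-6)."""
    trellis = [{s: (START[s] * EMIT[s].get(words[0], 1e-6), [s]) for s in STATES}]
    for w in words[1:]:
        col = {}
        for s in STATES:
            prob, path = max(
                (trellis[-1][p][0] * TRANS[p][s] * EMIT[s].get(w, 1e-6),
                 trellis[-1][p][1] + [s])
                for p in STATES
            )
            col[s] = (prob, path)
        trellis.append(col)
    return max(trellis[-1].values())[1]
```

Note the model only sees the current word per state, which is the limitation mentioned above.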

13
Q

How does the CRF NER classifier work?

A

The objective is to learn the weights that maximise the likelihood of the labelled sequences in the training data, taking a global view of the sequence
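A sketch of the global view: score an entire label sequence as a weighted sum of features (word/label and previous-label/label) and pick the best-scoring sequence. The weights here are hand-set assumptions; training would learn them to maximise the likelihood of the labelled training sequences:

```python
from itertools import product

WEIGHTS = {
    ("word=John", "PER"): 2.0,    # observation feature
    ("word=London", "LOC"): 2.0,
    ("bias", "O"): 0.1,           # mild preference for O absent evidence
    ("prev=O", "PER"): 0.5,       # transition feature
}
TAGS = ["O", "PER", "LOC"]

def sequence_score(words, tags):
    score, prev = 0.0, "START"
    for w, t in zip(words, tags):
        score += WEIGHTS.get((f"word={w}", t), 0.0)
        score += WEIGHTS.get(("bias", t), 0.0)
        score += WEIGHTS.get((f"prev={prev}", t), 0.0)
        prev = t
    return score

def best_sequence(words):
    # Exhaustive search stands in for Viterbi decoding at this toy scale.
    return max(product(TAGS, repeat=len(words)),
               key=lambda tags: sequence_score(words, tags))
```

Unlike an HMM, arbitrary overlapping features can be added to WEIGHTS.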

14
Q

BERT for NER

A

BERT is considered a local approach: it does not take dependencies between tags into account
It can be fine-tuned to identify entities

It is common to add a CRF layer (or any classifier) on top of BERT
local -> global view

15
Q

How do we fine-tune pre-trained models?

A

“Transfer learning”, which consists of 2 steps:
- pretrain a large neural network in an unsupervised way
- fine-tune the NN on a specific task of interest

e.g. use BERT and fine-tune it on an IOB token classification problem

Very common approach for NER

16
Q

Traditional ML-based sequence labelling vs Deep learning-based sequence labelling

A

Traditional
-requires feature engineering
-no need for pretrained embeddings
-models are more interpretable

Deep Learning
-no need for feature engineering
-makes use of pretrained embeddings
-models cannot be interpreted, as features are implicit in the hidden layers

17
Q

What is Domain Adaptation?

A

There are different entity types in different domains
How do we adapt a model from one domain to another?
We have to adjust the output layer, and training must be conducted again using both source and target domains, which can be costly
Few-shot learning is one of the approaches

18
Q

Performance metrics in NER

A

Precision = correct answers / answers produced
Recall = correct answers / total possible correct answers

We may also want to take partially correct answers into account - the system may recognise only half of an entity
so we add + 1/2 × partially correct to the numerator of precision and recall
and + partially correct to the denominator
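The adjusted formulas as a sketch; the argument names and the literal "add partial to the denominator" reading are assumptions taken from the card (here "produced" and "possible" are counted before partials are added):

```python
def ner_precision(correct, partial, produced):
    """Precision with half credit for partially correct answers."""
    return (correct + 0.5 * partial) / (produced + partial)

def ner_recall(correct, partial, possible):
    """Recall with half credit for partially correct answers."""
    return (correct + 0.5 * partial) / (possible + partial)
```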

19
Q

Adv and disadv of hand-built patterns for relations

A

Pros
Human patterns tend to be high-precision
Can be tailored to specific domains

Cons
Human patterns are often low-recall
A lot of work to think of all possible patterns
We’d like better accuracy

20
Q

How do we extract relations?

A

Using rules

supervised ML

21
Q

Relation extraction: what are Trigger Words?

A

identify specific trigger words mentioned between entities
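A minimal sketch (the trigger list is illustrative, and the entity token offsets are assumed to come from a prior NER step):

```python
# Flag a relation when a trigger word appears between two recognised entities.
TRIGGERS = {"founded", "acquired", "married"}

def triggered_relation(tokens, e1_end, e2_start):
    """Return the trigger word between entity 1 and entity 2, if any."""
    for tok in tokens[e1_end:e2_start]:
        if tok.lower() in TRIGGERS:
            return tok.lower()
    return None
```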

22
Q

What is Relation Bootstrapping?

A

A semi-supervised method for relation extraction
Minimises reliance on a large training set
Give a few examples or a few high-precision patterns

Gather a set of seed pairs that are linked by the given relation
Iterate:
1. Find sentences with these pairs
2. Look at the context between or around the pair and generalise the context to create patterns
3. Use the patterns to “grep” for more pairs
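The iterated steps as a toy sketch; the seed pair, two-sentence corpus and the crude pattern-generalisation heuristic are all illustrative assumptions:

```python
import re

# Toy bootstrapping iteration for an "author-of" relation.
seeds = {("Tolkien", "The Hobbit")}
corpus = [
    "Tolkien wrote The Hobbit in 1937.",
    "Orwell wrote 1984 during the 1940s.",
]

def bootstrap_once(seeds, corpus):
    # Steps 1-2: find sentences containing a seed pair and generalise
    # the context between the pair into a pattern.
    patterns = set()
    for a, b in seeds:
        for sent in corpus:
            if a in sent and b in sent:
                between = sent[sent.index(a) + len(a):sent.index(b)]
                patterns.add(between.strip())
    # Step 3: "grep" the corpus with the patterns to harvest new pairs.
    # The trailing (?:in|during) is a crude stand-in for detecting where
    # the second argument ends; a real system would use NER here.
    new_pairs = set(seeds)
    for pat in patterns:
        rx = re.compile(r"(\w+) " + re.escape(pat) + r" (.+?) (?:in|during)")
        for sent in corpus:
            m = rx.search(sent)
            if m:
                new_pairs.add((m.group(1), m.group(2)))
    return new_pairs
```

Each iteration grows the seed set, which then generates more patterns.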

23
Q

IE: Some challenges

A

Variability and ambiguity of templates
"John Lewis announced" - company or person?

Co-references
using different words to refer to the same entity
"John likes cats, he says they are soft" - John/he

24
Q

What is Knowledge Distillation?

A

A compression technique in ML
Knowledge from a large, complex model (teacher) is transferred to a smaller, more lightweight model (student)
The student model is more computationally efficient while retaining much of the teacher's knowledge
e.g. BERT -> DistilBERT
(because large models can be computationally challenging to run)
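A sketch of the distillation objective with temperature-softened targets (pure Python; the logits and temperature are made-up illustrations):

```python
import math

def softmax(logits, T=1.0):
    """Softmax with temperature T; higher T gives softer distributions."""
    exps = [math.exp(l / T) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(teacher_logits, student_logits, T=2.0):
    """Cross-entropy between softened teacher and student distributions.

    This is the term the student minimises to match the teacher's
    soft targets (usually combined with a hard-label loss).
    """
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))
```

The loss is smallest when the student's softened distribution matches the teacher's.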

25
Q

What is a few-shot approach?

A

An ML method
A model is trained or fine-tuned with only a very small number of examples per class or task
Particularly valuable in scenarios where obtaining extensive labelled data is challenging, expensive or time-consuming