NLP APPLICATIONS Flashcards

1
Q

What is Information Extraction?

A

Extract information from text
Identify instances of pre-defined entities (dates, names of people, locations) and the relations between them

2
Q

How do IE systems work overall?

A

Find and understand relevant parts of a document
Produce a structured representation of relevant information (semantically more precise form)

3
Q

What are Entities?

A

IE key step 1
Identification of the entities of interest
Entities are named: find the strings in the text that denote them
Recognisers are typically designed for each class of interest

4
Q

What are the various methodologies for identifying entities?

A

dictionaries
rule-based: define patterns
machine learning

5
Q

What are Relations?

A

IE key step 2
Extract specific facts, relations and events by linking entities
Uses templates, regular expressions and grammars
Typically designed around important verbs (find the verbs)

Relies on various features
e.g. identification of the main syntactic units and their relations
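A minimal sketch of the verb-centred approach using a regular expression (the "acquired" pattern and the capitalised-word shape of entities are illustrative assumptions, not a general solution):

```python
import re

# One hand-built relation pattern anchored on the verb "acquired",
# as in "X acquired Y". Entities are crudely approximated as runs
# of capitalised words.
ACQUIRED = re.compile(
    r"(?P<buyer>[A-Z]\w+(?: [A-Z]\w+)*) acquired (?P<target>[A-Z]\w+(?: [A-Z]\w+)*)"
)

def extract_acquisitions(text):
    """Return (buyer, target) pairs matched by the verb-centred pattern."""
    return [(m.group("buyer"), m.group("target")) for m in ACQUIRED.finditer(text)]
```

Real template systems add many such patterns per relation, plus syntactic features.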

6
Q

What is NER?

A

Named Entity Recognition
A system's ability to find an entity and classify it as a person name, organisation, date, etc.

7
Q

What are the 2 approaches to NER?

A

Knowledge Engineering
-developed by experienced (human) language engineers

Learning Systems
-use statistics or other machine learning

8
Q

What are the 4 NER methods?

A

-dictionary based
-rule-based
-machine learning
-hybrid

9
Q

NER Methods: Dictionary look-up

A

The system recognises only entities stored in its lists

Adv
-simple, fast, independent

Disadv
-often impossible to enumerate all names (i.e. to create an exhaustive list)
-ambiguity - name variants
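A toy sketch of dictionary (gazetteer) look-up; the entries are illustrative and deliberately tiny, which is exactly the coverage weakness noted above:

```python
# Minimal gazetteer: lowercased surface form -> entity type.
# "john lewis" illustrates the ambiguity problem (person vs UK retailer).
GAZETTEER = {
    "london": "LOCATION",
    "paris": "LOCATION",
    "john lewis": "PERSON",
}

def dictionary_ner(tokens, max_len=2):
    """Greedy longest-match look-up over token n-grams."""
    entities = []
    i = 0
    while i < len(tokens):
        for n in range(max_len, 0, -1):
            span = " ".join(tokens[i:i + n]).lower()
            if span in GAZETTEER:
                entities.append((" ".join(tokens[i:i + n]), GAZETTEER[span]))
                i += n
                break
        else:
            i += 1  # no entry starts here; move on
    return entities
```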

10
Q

NER Methods: Rule-based

A

Use context clues indicative of specific entity types
but many issues:
- ambiguity in capitalisation, semantics and structure
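A hedged sketch of context-clue rules; the title prefix ("Dr.") and company suffix ("Inc.") patterns are assumed examples of such clues:

```python
import re

# A title like "Mr./Dr." signals a person; a suffix like "Inc./Ltd."
# signals an organisation. Real systems need many more rules.
PERSON_RULE = re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\. ([A-Z][a-z]+)")
ORG_RULE = re.compile(r"\b([A-Z][A-Za-z]+) (?:Inc|Ltd|Corp)\.")

def rule_based_ner(text):
    ents = [(m.group(1), "PERSON") for m in PERSON_RULE.finditer(text)]
    ents += [(m.group(1), "ORGANISATION") for m in ORG_RULE.finditer(text)]
    return ents
```

Such rules are brittle precisely because of the ambiguities listed above.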

11
Q

NER Methods: ML sequence model

A

Consider named entities as sequences of tokens in the text
Output: a sequence of labels, one per token (whether it is part of an entity and, if so, which entity type)

12
Q

Example: HMM NER

A

Any ML sequence model can learn the named-entity classification; this is just one way

Each position in the sequence has a hidden state
States: the IO(B) tags
Emission probabilities model the likelihood of observing a word given the state we are in

Issue: we can't add many features to an HMM to discriminate between tags
The HMM is a simple model: for a given state it does not consider the surrounding context, etc.
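A toy Viterbi decoder over hand-set (assumed) probabilities, showing how start, transition and emission probabilities combine to pick the most likely tag sequence:

```python
# Toy HMM for NER with IO tags. All probabilities are made up.
STATES = ["O", "I-PER"]
START = {"O": 0.8, "I-PER": 0.2}
TRANS = {"O": {"O": 0.8, "I-PER": 0.2},
         "I-PER": {"O": 0.5, "I-PER": 0.5}}
EMIT = {"O": {"saw": 0.4, "yesterday": 0.4, "john": 0.01, "i": 0.19},
        "I-PER": {"john": 0.9, "saw": 0.02, "yesterday": 0.02, "i": 0.06}}

def viterbi(words):
    """Most likely tag sequence under the toy HMM (unseen words get 1e-6)."""
    trellis = [{s: (START[s] * EMIT[s].get(words[0], 1e-6), [s]) for s in STATES}]
    for w in words[1:]:
        col = {}
        for s in STATES:
            prob, path = max(
                (trellis[-1][p][0] * TRANS[p][s] * EMIT[s].get(w, 1e-6),
                 trellis[-1][p][1] + [s])
                for p in STATES
            )
            col[s] = (prob, path)
        trellis.append(col)
    return max(trellis[-1].values())[1]
```

Note the model only sees the current word per state, which is the limitation mentioned above.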

13
Q

How does the CRF NER classifier work?

A

The objective is to learn the weights that maximise the likelihood of the labelled sequences in the training data, taking a global view of the sequence
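A sketch of the global view: score an entire label sequence as a weighted sum of features (word/label and previous-label/label) and pick the best-scoring sequence. The weights here are hand-set assumptions; training would learn them to maximise the likelihood of the labelled training sequences:

```python
from itertools import product

WEIGHTS = {
    ("word=John", "PER"): 2.0,    # observation feature
    ("word=London", "LOC"): 2.0,
    ("bias", "O"): 0.1,           # mild preference for O absent evidence
    ("prev=O", "PER"): 0.5,       # transition feature
}
TAGS = ["O", "PER", "LOC"]

def sequence_score(words, tags):
    score, prev = 0.0, "START"
    for w, t in zip(words, tags):
        score += WEIGHTS.get((f"word={w}", t), 0.0)
        score += WEIGHTS.get(("bias", t), 0.0)
        score += WEIGHTS.get((f"prev={prev}", t), 0.0)
        prev = t
    return score

def best_sequence(words):
    # Exhaustive search stands in for Viterbi decoding at this toy scale.
    return max(product(TAGS, repeat=len(words)),
               key=lambda tags: sequence_score(words, tags))
```

Unlike an HMM, arbitrary overlapping features can be added to WEIGHTS.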

14
Q

BERT for NER

A

BERT is considered a local approach: it does not take dependencies between tags into account
It can be fine-tuned to identify entities

It is common to add a CRF layer (or any classifier) on top of BERT
local -> global view

15
Q

How do we fine-tune pre-trained models?

A

“Transfer learning”, which consists of 2 steps:
- pretrain a large neural network in an unsupervised way
- fine-tune the NN on a specific task of interest

e.g. use BERT and fine-tune it on an IOB token classification problem

Very common approach for NER

16
Q

Traditional ML-based sequence labelling vs Deep learning-based sequence labelling

A

Traditional
-requires feature engineering
-no need for pretrained embeddings
-models are more interpretable

Deep Learning
-no need for feature engineering
-makes use of pretrained embeddings
-models cannot be interpreted, as features are implicit in the hidden layers

17
Q

What is Domain Adaptation?

A

There are different entity types in different domains
How do we adapt a model from one domain to another?
We have to adjust the output layer, and training must be conducted again using both source and target domains, which can be costly
Few-shot learning is one of the approaches

18
Q

Performance metrics in NER

A

Precision = correct answers / answers produced
Recall = correct answers / total possible correct answers

We may also want to take partially correct answers into account - the system may recognise only half of an entity
so we add + 1/2 × partially correct to the numerator of precision and recall
and + partially correct to the denominator
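The adjusted formulas as a sketch; the argument names and the literal "add partial to the denominator" reading are assumptions taken from the card (here "produced" and "possible" are counted before partials are added):

```python
def ner_precision(correct, partial, produced):
    """Precision with half credit for partially correct answers."""
    return (correct + 0.5 * partial) / (produced + partial)

def ner_recall(correct, partial, possible):
    """Recall with half credit for partially correct answers."""
    return (correct + 0.5 * partial) / (possible + partial)
```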

19
Q

Adv and disadv of hand-built patterns for relations

A

Pros
Human patterns tend to be high-precision
Can be tailored to specific domains

Cons
Human patterns are often low-recall
A lot of work to think of all possible patterns
We’d like better accuracy

20
Q

How do we extract relations?

A

Using rules

supervised ML

21
Q

Relation extraction: what are Trigger Words?

A

identify specific trigger words mentioned between entities
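A minimal sketch (the trigger list is illustrative, and the entity token offsets are assumed to come from a prior NER step):

```python
# Flag a relation when a trigger word appears between two recognised entities.
TRIGGERS = {"founded", "acquired", "married"}

def triggered_relation(tokens, e1_end, e2_start):
    """Return the trigger word between entity 1 and entity 2, if any."""
    for tok in tokens[e1_end:e2_start]:
        if tok.lower() in TRIGGERS:
            return tok.lower()
    return None
```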

22
Q

What is Relation Bootstrapping?

A

A semi-supervised method for relation extraction
Minimises reliance on a large training set
Give a few examples or a few high-precision patterns

Gather a set of seed pairs that are linked by the given relation
Iterate:
1. Find sentences with these pairs
2. Look at the context between or around the pair and generalise the context to create patterns
3. Use the patterns to “grep” for more pairs
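The iterated steps as a toy sketch; the seed pair, two-sentence corpus and the crude pattern-generalisation heuristic are all illustrative assumptions:

```python
import re

# Toy bootstrapping iteration for an "author-of" relation.
seeds = {("Tolkien", "The Hobbit")}
corpus = [
    "Tolkien wrote The Hobbit in 1937.",
    "Orwell wrote 1984 during the 1940s.",
]

def bootstrap_once(seeds, corpus):
    # Steps 1-2: find sentences containing a seed pair and generalise
    # the context between the pair into a pattern.
    patterns = set()
    for a, b in seeds:
        for sent in corpus:
            if a in sent and b in sent:
                between = sent[sent.index(a) + len(a):sent.index(b)]
                patterns.add(between.strip())
    # Step 3: "grep" the corpus with the patterns to harvest new pairs.
    # The trailing (?:in|during) is a crude stand-in for detecting where
    # the second argument ends; a real system would use NER here.
    new_pairs = set(seeds)
    for pat in patterns:
        rx = re.compile(r"(\w+) " + re.escape(pat) + r" (.+?) (?:in|during)")
        for sent in corpus:
            m = rx.search(sent)
            if m:
                new_pairs.add((m.group(1), m.group(2)))
    return new_pairs
```

Each iteration grows the seed set, which then generates more patterns.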

23
Q

IE: Some challenges

A

Variability and ambiguity of templates
"John Lewis announced" - company or person?

Co-references
using different words to refer to the same entity
"John likes cats, he says they are soft" - John/he

24
Q

What is Knowledge Distillation?

A

A compression technique in ML
Knowledge from a large, complex model (teacher) is transferred to a smaller, more lightweight model (student)
The student model is more computationally efficient while retaining much of the teacher's knowledge
e.g. BERT -> DistilBERT
(because large models can be computationally challenging to run)
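A sketch of the distillation objective with temperature-softened targets (pure Python; the logits and temperature are made-up illustrations):

```python
import math

def softmax(logits, T=1.0):
    """Softmax with temperature T; higher T gives softer distributions."""
    exps = [math.exp(l / T) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(teacher_logits, student_logits, T=2.0):
    """Cross-entropy between softened teacher and student distributions.

    This is the term the student minimises to match the teacher's
    soft targets (usually combined with a hard-label loss).
    """
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))
```

The loss is smallest when the student's softened distribution matches the teacher's.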

25
Q

What is a few-shot approach?

A

An ML method
A model is trained or fine-tuned with only a very small number of examples per class or task
Particularly valuable in scenarios where obtaining extensive labelled data is challenging, expensive or time-consuming