Week 5 - Sequence Classification Flashcards

1
Q

What is a multinomial classifier

A

Many possible classes, but only one is correct for each input.

2
Q

What is multi-class multi-label classification

A

Many possible classes; between 0 and K of them can be assigned to the text.

3
Q

Sequence classification tasks

A

Sentiment analysis
Fact verification
Relation tasks:
  paraphrase detection
  semantic similarity
  textual entailment

4
Q

What is relation classification

A

Identifying the relation between two entities in text; this can be treated as text classification
if the set of possible relations is constrained.

5
Q

Textual entailment for fact verification

A

The claim is sent to an evidence retrieval module (which queries Wikipedia or a search engine).
The retrieved evidence + claim are passed to a textual entailment classifier.
The classifier decides whether the evidence entails (supports) or contradicts the claim.

6
Q

What are the main parts of deep learning text classification

A

Use embeddings and neural networks to obtain a distributed, contextualised representation of the text.
This representation is then used to predict a probability distribution over the K classes for the input.

7
Q

What is the text classification encoder usually based on

A

Averaging of embeddings,
a CNN/RNN over the embeddings,
or a pre-trained language model, e.g. BERT.
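A minimal sketch of the simplest option, averaging embeddings into one sentence vector before a linear classifier, assuming PyTorch (vocabulary size, dimensions and class count below are illustrative):

```python
import torch
import torch.nn as nn

class AveragePoolClassifier(nn.Module):
    """Encoder that averages token embeddings, followed by a linear classifier."""
    def __init__(self, vocab_size, embed_dim, num_classes):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids, mask):
        # token_ids, mask: (batch, seq_len); mask is 1 for real tokens, 0 for padding
        emb = self.embedding(token_ids)                        # (batch, seq_len, embed_dim)
        mask = mask.float().unsqueeze(-1)
        avg = (emb * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
        return self.classifier(avg)                            # (batch, num_classes) logits

# Illustrative usage with made-up token ids
model = AveragePoolClassifier(vocab_size=30522, embed_dim=128, num_classes=3)
logits = model(torch.tensor([[12, 84, 9, 0]]), torch.tensor([[1, 1, 1, 0]]))
```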

8
Q

Architecture of transformers for text classification

A

A special classification token ([CLS]) is prepended to the input and passed through the self-attention layers, producing a contextualised embedding.
This embedding goes through a FFNN or a softmaxed linear layer to give class probabilities.
Cross-entropy loss is minimised between the predicted and ground-truth labels.
The weights of the language model and of the feed-forward classification head are updated by backpropagation (fine-tuning).
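A minimal sketch of this setup, assuming PyTorch and the Hugging Face transformers library; the model name, class count and example sentence are illustrative:

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
classifier = nn.Linear(encoder.config.hidden_size, 3)      # linear head over 3 classes

inputs = tokenizer(["the film was great"], return_tensors="pt")
cls_vector = encoder(**inputs).last_hidden_state[:, 0]     # contextualised [CLS] embedding
logits = classifier(cls_vector)                            # (batch, num_classes)

labels = torch.tensor([2])                                 # illustrative ground-truth label
loss = nn.CrossEntropyLoss()(logits, labels)               # softmax + cross-entropy in one step
loss.backward()                                            # gradients flow into the head and into BERT (fine-tuning)
```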

9
Q

What is the CLS token

A

Via self-attention, it gathers information from all the other tokens in the sequence.
It can be seen as holding an “average” representation of the whole sentence.

10
Q

What is different in the architecture for multi-label multi-class classification

A

Softmax is not used, because it couples the class probabilities (they must sum to 1) and so assumes the labels depend on each other.
Instead, each label's probability is modelled independently with an element-wise sigmoid, and the per-label losses are reduced to a single scalar.
A decision threshold other than 0.5 is often used.
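A minimal sketch of the multi-label head, assuming PyTorch; the shapes and the 0.3 threshold are illustrative:

```python
import torch
import torch.nn as nn

num_classes = 5
logits = torch.randn(2, num_classes)                 # per-class scores from the encoder head

# Independent per-class probabilities rather than one softmax distribution
probs = torch.sigmoid(logits)
predictions = (probs > 0.3).int()                    # tuned threshold instead of the default 0.5

# Training: binary cross-entropy per label, reduced (averaged) to a single scalar loss
targets = torch.randint(0, 2, (2, num_classes)).float()
loss = nn.BCEWithLogitsLoss()(logits, targets)
```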

11
Q

What is the issue with long document classification

A

Computing self-attention is quadratic in the input length,
so language models are bounded to a fixed maximum input size,
but many documents are much longer than this limit.

12
Q

How do transformers handle the long-document issue

A

- Truncation: keep only the first/last tokens
- Hierarchical approaches
- Dedicated architectures, e.g. the Longformer

13
Q

What is the Longformer

A

It doesn't compute attention of every token with respect to every other token.
Instead, each token attends only to its neighbours to the left and right; variants use a sliding window, a dilated window, or global + sliding-window attention.
At lower layers the window stays close to the token's neighbourhood, capturing syntactic features.
At higher layers attention is spread out further, capturing semantic features.
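A minimal sketch of the sliding-window idea, assuming PyTorch; it just builds a banded attention mask over a full score matrix, whereas the real Longformer avoids computing the masked-out scores at all:

```python
import torch

seq_len, window = 8, 2                       # each token attends to 2 neighbours on each side

positions = torch.arange(seq_len)
# allowed[i, j] is True only if token j lies within `window` positions of token i
allowed = (positions.unsqueeze(0) - positions.unsqueeze(1)).abs() <= window

scores = torch.randn(seq_len, seq_len)       # stand-in for the raw q·k attention scores
scores = scores.masked_fill(~allowed, float("-inf"))
attention = torch.softmax(scores, dim=-1)    # each row only spreads mass over its local window
```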

14
Q

What is the hierarchical approach

A

Long documents are split into n chunks of size m with stride (overlap) s.
Each chunk is embedded with BERT.
The chunk embeddings form the input sequence of an RNN.
The RNN output is passed into a FFNN and then a softmax to predict the class.
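A minimal sketch of this hierarchy, assuming PyTorch and the Hugging Face transformers library; chunk size, stride, hidden size and class count are illustrative:

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def chunk_ids(ids, m=510, s=64):
    """Split token ids into chunks of size m with overlap (stride) s."""
    step = m - s
    return [ids[i:i + m] for i in range(0, max(len(ids) - s, 1), step)]

document = "a very long document " * 500
ids = tokenizer.encode(document, add_special_tokens=False)

# Step 1: embed each chunk with BERT, keeping the [CLS] vector per chunk
chunk_vecs = []
with torch.no_grad():                                        # BERT frozen at this stage
    for chunk in chunk_ids(ids):
        input_ids = torch.tensor([[tokenizer.cls_token_id] + chunk + [tokenizer.sep_token_id]])
        chunk_vecs.append(bert(input_ids).last_hidden_state[:, 0])
chunk_seq = torch.stack(chunk_vecs, dim=1)                   # (1, n_chunks, hidden)

# Step 2: RNN over the chunk embeddings, then a FFNN (softmax is applied inside the loss)
rnn = nn.GRU(bert.config.hidden_size, 256, batch_first=True)
ffnn = nn.Linear(256, 4)
_, last_hidden = rnn(chunk_seq)
logits = ffnn(last_hidden.squeeze(0))                        # (1, num_classes)
```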

15
Q

How do we train the hierarchical approach

A
  1. First fine-tune BERT as a classifier on the text chunks of fixed size m,
     then discard this classifier head.
  2. Train a new RNN on the BERT outputs,
     optimising it to predict the correct classes
     (BERT's parameters are frozen at this stage).
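A minimal sketch of stage 2's freezing step, assuming PyTorch and components like those in the previous sketch; only the RNN classifier's parameters are handed to the optimiser:

```python
import torch
import torch.nn as nn
from transformers import AutoModel

bert = AutoModel.from_pretrained("bert-base-uncased")   # assumed already fine-tuned on chunks in stage 1
rnn = nn.GRU(bert.config.hidden_size, 256, batch_first=True)
ffnn = nn.Linear(256, 4)

# Freeze BERT so backpropagation only updates the RNN and the feed-forward layer
for param in bert.parameters():
    param.requires_grad = False

optimizer = torch.optim.Adam(list(rnn.parameters()) + list(ffnn.parameters()), lr=1e-3)
```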
16
Q

What is the most common metric for text classification

A

Accuracy.
Class-balanced accuracy, precision, recall and F1 can also be used.

17
Q

In single-label classification, what is accuracy equivalent to

A

The micro-averaged F1 score.
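A quick check of this equivalence on a toy single-label example, assuming scikit-learn:

```python
from sklearn.metrics import accuracy_score, f1_score

y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 1]

print(accuracy_score(y_true, y_pred))              # 4/6 correct -> 0.666...
print(f1_score(y_true, y_pred, average="micro"))   # identical: 0.666...
```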

18
Q

What is the issue with accuracy/F1 scores

A

They hide class imbalance (a majority-class baseline can still score well).

19
Q

What metric should we use for imbalanced classes

A

MCC - the Matthews Correlation Coefficient.
It measures how well our predictions correlate with the ground-truth labels beyond chance.
It also generalises to multi-class labels.
However, scores other than 1, 0 and -1 are hard to interpret on their own, so it is best used for comparing models.
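A minimal illustration, assuming scikit-learn: on an imbalanced set, an always-majority-class classifier can look accurate while its MCC shows it is no better than chance:

```python
from sklearn.metrics import accuracy_score, matthews_corrcoef

y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]    # imbalanced: 1 positive, 9 negatives
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]    # always predicts the majority class

print(accuracy_score(y_true, y_pred))      # 0.9 - looks strong
print(matthews_corrcoef(y_true, y_pred))   # 0.0 - no correlation with the ground truth
```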

20
Q

What does an MCC score of 1 mean

A

Perfect correlation: all labels are predicted correctly.

21
Q

What does an MCC score of 0 mean

A

Equivalent to a random guess (no correlation with the ground truth).

22
Q

What does an MCC score of -1 mean

A

Perfect anti-correlation: every prediction is wrong
(probably a label was accidentally flipped somewhere).