Week 5 - Sequence Classification Flashcards
What is a multinomial classifier
many possible classes, but only one is correct for each input
what is multi-class multi-label classification
many possible classes; anywhere between 0 and K of them can be assigned to the text
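A minimal NumPy sketch (my own illustration, with made-up scores) of the difference: a multinomial classifier applies a softmax and picks exactly one class, while a multi-label classifier applies element-wise sigmoids and may assign anywhere from 0 to K labels.

```python
import numpy as np

logits = np.array([2.0, -1.0, 0.5, 1.5])  # scores for K = 4 classes

# Multinomial: softmax -> probabilities sum to 1, exactly one class is chosen
softmax = np.exp(logits) / np.exp(logits).sum()
predicted_class = int(np.argmax(softmax))          # -> 0

# Multi-label: independent sigmoids -> each label decided on its own
sigmoid = 1 / (1 + np.exp(-logits))
predicted_labels = np.where(sigmoid > 0.5)[0]      # -> [0 2 3], could also be empty
```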
Sequence classification tasks
Sentiment analysis
fact verification
relations (between two texts):
- paraphrase detection
- semantic similarity
- textual entailment
What is relation classification
identifying the relation between two entities mentioned in a text; it can be treated as text classification
if the set of possible relations is constrained (a fixed label set)
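A hedged sketch of the idea (the entity-marker formatting and the relation names are my own illustration, not necessarily the lecture's convention): mark the two entities in the sentence and classify the marked sentence into a fixed relation label set with any sequence classifier.

```python
# Constrained set of possible relations -> ordinary text classification labels
RELATIONS = ["founded_by", "born_in", "employed_by", "no_relation"]

def mark_entities(sentence: str, e1: str, e2: str) -> str:
    """Wrap the two entities in markers so the classifier knows which pair to relate."""
    return (sentence.replace(e1, f"[E1] {e1} [/E1]")
                    .replace(e2, f"[E2] {e2} [/E2]"))

text = mark_entities("Steve Jobs founded Apple in 1976.", "Apple", "Steve Jobs")
# -> "[E2] Steve Jobs [/E2] founded [E1] Apple [/E1] in 1976."
# This marked sentence is then fed to a sequence classifier over RELATIONS,
# e.g. a fine-tuned BERT with a softmax over len(RELATIONS) classes.
```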
Textual entailment for fact verification
a claim is sent to an evidence retrieval module (which queries Wikipedia or a search engine)
the retrieved evidence + the claim are sent to a textual entailment classifier
which decides whether the evidence entails or contradicts the claim
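A hedged sketch of the entailment step, assuming the Hugging Face transformers library and the off-the-shelf roberta-large-mnli model (not necessarily the setup from the lecture); evidence retrieval is skipped and the evidence sentence is given directly.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Pretrained NLI model; its labels are contradiction / neutral / entailment
model_name = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

evidence = "The Eiffel Tower is located in Paris, France."   # retrieved evidence
claim = "The Eiffel Tower is in Paris."                      # claim to verify

# Premise = evidence, hypothesis = claim
inputs = tokenizer(evidence, claim, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

label = model.config.id2label[int(logits.softmax(dim=-1).argmax())]
print(label)  # expected: ENTAILMENT (the evidence supports the claim)
```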
what are the main parts of deep learning text classification
use embeddings and neural networks
to obtain a distributed and contextualised representation
this representation is used to predict a probability distribution over the K classes for the input
What is the text classification encoder usually based on
averaging of embeddings
a CNN/RNN over embeddings
or a pre-trained language model, e.g. BERT
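A minimal PyTorch sketch (my own illustration, with made-up dimensions) of the simplest encoder option: average the word embeddings and classify the averaged vector with a linear layer.

```python
import torch
import torch.nn as nn

class AverageEmbeddingClassifier(nn.Module):
    """Embed tokens, average the embeddings, classify with a linear layer."""
    def __init__(self, vocab_size=10_000, embed_dim=100, num_classes=3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids):                  # (batch, seq_len)
        embeds = self.embedding(token_ids)         # (batch, seq_len, embed_dim)
        mask = (token_ids != 0).float().unsqueeze(-1)
        avg = (embeds * mask).sum(1) / mask.sum(1).clamp(min=1)  # masked average
        return self.classifier(avg)                # (batch, num_classes) logits

logits = AverageEmbeddingClassifier()(torch.tensor([[5, 42, 7, 0, 0]]))
print(logits.shape)  # torch.Size([1, 3])
```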
architecture of transformers for text classification
prepend a special classification token (CLS) to the input sequence
the tokens pass through the self-attention layers, giving contextualised embeddings
the CLS embedding is passed through a FFNN or a softmaxed linear layer
cross-entropy loss is minimised between the predicted and ground-truth labels
the weights of the language model and of the feed-forward classification head are updated via backpropagation (fine-tuning)
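A hedged sketch of that pipeline with Hugging Face transformers and PyTorch (model name, label count, and data are placeholders): the CLS embedding from BERT goes through a linear layer, and cross-entropy is minimised against the gold label.

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
classifier = nn.Linear(bert.config.hidden_size, 2)   # 2 classes as an example
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(
    list(bert.parameters()) + list(classifier.parameters()), lr=2e-5)

texts = ["great movie", "terrible movie"]
labels = torch.tensor([1, 0])                        # toy ground-truth labels

inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
cls_embedding = bert(**inputs).last_hidden_state[:, 0]  # embedding of the [CLS] token
logits = classifier(cls_embedding)

loss = loss_fn(logits, labels)   # cross-entropy between prediction and ground truth
loss.backward()                  # backpropagation through the head and BERT
optimizer.step()                 # fine-tuning: the LM weights are updated too
```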
What is the CLS
the special classification token prepended to the input; via self-attention, its embedding gathers information from all the other tokens in the sequence
so it can loosely be said to hold the “average” sentence representation
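A small sketch (my own illustration) of why the CLS position sees the whole sequence: inspect the last-layer attention weights from CLS to every token.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("the cat sat on the mat", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

last_layer = outputs.attentions[-1]              # (batch, heads, seq_len, seq_len)
cls_attention = last_layer[0, :, 0, :].mean(0)   # average over heads: CLS -> every token
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, weight in zip(tokens, cls_attention):
    print(f"{token:>8s}  {weight:.3f}")          # non-zero weight on every token
```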
What is the difference in architecture for multi-label multi-class classification
softmax is not used, because softmax assumes the classes are mutually exclusive (the probabilities must sum to 1)
instead, each label's probability is modelled independently with an element-wise sigmoid, and the per-label losses are reduced to a single scalar for training
a decision threshold other than 0.5 is often used
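A minimal PyTorch sketch of the multi-label head (dimensions, data, and threshold are illustrative): independent sigmoids with a binary cross-entropy per label, and a tuned threshold at prediction time.

```python
import torch
import torch.nn as nn

num_labels = 5
head = nn.Linear(768, num_labels)              # e.g. on top of a CLS embedding
loss_fn = nn.BCEWithLogitsLoss()               # sigmoid + binary cross-entropy per label

cls_embedding = torch.randn(2, 768)            # stand-in for the encoder output
targets = torch.tensor([[1., 0., 1., 0., 0.],  # each example can have 0..K labels
                        [0., 0., 0., 0., 0.]])

logits = head(cls_embedding)
loss = loss_fn(logits, targets)                # per-label losses reduced to one scalar

probs = torch.sigmoid(logits)                  # element-wise sigmoid
threshold = 0.3                                # often tuned rather than fixed at 0.5
predicted = (probs > threshold).int()
```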
What is the issue with long document classification
computing self-attention is quadratic in the input length
so LMs are bounded to a maximum input size (e.g. 512 tokens for BERT)
but many documents are much longer than this
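A quick illustration of the quadratic cost (the sequence lengths are just examples): each layer and head needs an n × n attention matrix.

```python
for n in (512, 4096):
    print(f"n = {n}: attention matrix has {n * n:,} entries per head per layer")
# n = 512:  262,144 entries
# n = 4096: 16,777,216 entries  (8x longer input -> 64x more attention entries)
```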
How do transformers handle the long-document issue
- Truncation: keep only the first/last tokens (see the sketch after this list)
- hierarchical approaches
- a dedicated architecture: the Longformer
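A hedged sketch of the truncation option with a Hugging Face tokenizer: keeping the first tokens is built in, keeping the last tokens needs manual slicing of the token ids.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
long_document = "some very long document ... " * 1000

# Keep only the first tokens up to BERT's maximum input size
first = tokenizer(long_document, truncation=True, max_length=512, return_tensors="pt")
print(first["input_ids"].shape)  # torch.Size([1, 512])

# Keep the last tokens instead: tokenize without special tokens, then slice manually
ids = tokenizer(long_document, add_special_tokens=False)["input_ids"]
last = [tokenizer.cls_token_id] + ids[-510:] + [tokenizer.sep_token_id]
print(len(last))  # 512
```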
What is the Longformer
it doesn't compute attention between every pair of tokens
instead, each token attends to a window of neighbours to its left and right
variants: sliding window, dilated sliding window, global + sliding window attention
at lower layers, attention focuses on the close neighbourhood (syntactic features)
at higher layers, attention is spread out more widely (semantic features)
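A hedged sketch using the Hugging Face Longformer implementation (the checkpoint name and usage are assumptions based on the public allenai/longformer-base-4096 model): sliding-window attention everywhere, plus global attention on the CLS token.

```python
import torch
from transformers import LongformerTokenizer, LongformerForSequenceClassification

model_name = "allenai/longformer-base-4096"          # handles inputs up to 4096 tokens
tokenizer = LongformerTokenizer.from_pretrained(model_name)
model = LongformerForSequenceClassification.from_pretrained(model_name, num_labels=2)

long_document = "a very long document ... " * 500
inputs = tokenizer(long_document, truncation=True, max_length=4096, return_tensors="pt")

# Sliding-window attention for every token; give the CLS token global attention
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

logits = model(**inputs, global_attention_mask=global_attention_mask).logits
```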
What is the hierarchical approach
long documents are split into n chunks of size m with stride (overlap) s
each chunk is embedded with BERT
the chunk embeddings are used as input to an RNN classifier
the RNN output is fed into a FFNN and then a softmax
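A hedged PyTorch sketch of the hierarchy (chunk size, stride, and the GRU are illustrative choices): the document is split into overlapping chunks, each chunk is encoded by BERT into its CLS vector, and an RNN plus FFNN classifies the sequence of chunk vectors.

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def chunk_tokens(token_ids, m=510, s=50):
    """Split a long token sequence into chunks of size m (510 + [CLS] + [SEP] = 512) with overlap s."""
    step = m - s
    return [token_ids[i:i + m] for i in range(0, max(len(token_ids) - s, 1), step)]

class HierarchicalClassifier(nn.Module):
    def __init__(self, hidden=256, num_classes=2):
        super().__init__()
        self.rnn = nn.GRU(bert.config.hidden_size, hidden, batch_first=True)
        self.ffnn = nn.Linear(hidden, num_classes)

    def forward(self, chunk_embeddings):            # (1, n_chunks, 768)
        _, h = self.rnn(chunk_embeddings)           # final RNN state summarises the chunks
        return self.ffnn(h[-1])                     # logits; softmax is applied in the loss

ids = tokenizer("a long document ... " * 600, add_special_tokens=False)["input_ids"]
chunk_vecs = []
with torch.no_grad():                               # BERT used here only as an embedder
    for chunk in chunk_tokens(ids):
        batch = torch.tensor([[tokenizer.cls_token_id] + chunk + [tokenizer.sep_token_id]])
        chunk_vecs.append(bert(input_ids=batch).last_hidden_state[:, 0])

logits = HierarchicalClassifier()(torch.stack(chunk_vecs, dim=1))  # (1, num_classes)
```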
how do we train the hierarchical approach
- first fine-tune BERT as a classifier on the text chunks of fixed size m
- discard this classifier and fine-tune a new RNN on the BERT outputs
- optimise the RNN to predict the correct classes
- (the BERT parameters are frozen at this stage)
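A short self-contained sketch of the second training stage (the RNN/FFNN dimensions and data are placeholders): BERT's parameters are frozen and the optimiser only updates the RNN classifier on top of the fixed chunk embeddings.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

bert = AutoModel.from_pretrained("bert-base-uncased")

# Stage 2: freeze the (already chunk-fine-tuned) BERT encoder
for param in bert.parameters():
    param.requires_grad = False

# Small RNN classifier over the per-chunk CLS embeddings
rnn = nn.GRU(bert.config.hidden_size, 256, batch_first=True)
ffnn = nn.Linear(256, 2)

# Optimise only the RNN + FFNN; BERT contributes fixed chunk embeddings
optimizer = torch.optim.Adam(list(rnn.parameters()) + list(ffnn.parameters()), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

chunk_embeddings = torch.randn(1, 6, bert.config.hidden_size)  # stand-in: 6 frozen BERT chunk vectors
label = torch.tensor([1])                                      # toy ground-truth class

_, h = rnn(chunk_embeddings)
loss = loss_fn(ffnn(h[-1]), label)
loss.backward()        # gradients reach the RNN/FFNN only, not the frozen BERT
optimizer.step()
```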