Session 2 Flashcards
1
Q
Named entity recognition
A
Find spans of text that constitute proper names and tag the type of the entity. Four entity tags are most common: PER (person), LOC (location), ORG (organization), and GPE (geopolitical entity)
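For illustration only, a minimal NER sketch with spaCy (the library and the en_core_web_sm model are not part of the card and are assumed to be installed); note spaCy's label set (PERSON/ORG/GPE/LOC) differs slightly from the PER/LOC/ORG/GPE tags above:

```python
# Illustrative sketch only: NER with spaCy (assumed installed:
#   pip install spacy && python -m spacy download en_core_web_sm)
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Tim Cook visited Apple's offices in Cork, Ireland.")

for ent in doc.ents:
    # each entity is a span of text plus a type label, e.g. PERSON, ORG, GPE
    print(ent.text, ent.label_)
```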
2
Q
POS tagging
A
- part-of-speech tagging
- the process of taking a sequence of words and assigning each word a part-of-speech label such as NOUN or VERB
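A minimal sketch, again using spaCy purely for illustration (same assumed model as in the NER sketch above); token.pos_ holds a coarse universal POS tag such as NOUN or VERB:

```python
# Illustrative sketch only: POS tagging with spaCy
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("She books a flight to Berlin.")

for token in doc:
    # one part-of-speech label per word in the sequence
    print(token.text, token.pos_)
```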
3
Q
RNN
A
- any neural network that contains a cycle in its connections, meaning the value of some unit depends (directly or indirectly) on its own earlier outputs as an input; each prediction is conditioned on what came before, like a memory of the sequence so far
- The same weight matrices (W for the input, U for the recurrent connection, V for the output) are shared across all time steps (minimal sketch below)
- Inputs can therefore be of arbitrary length: the network is simply unrolled once per time step
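A minimal NumPy sketch (not from the card; sizes and initialization are arbitrary) of a simple Elman-style RNN, showing that the same W, U, V are reused at every step, so any sequence length works:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, d_out = 4, 8, 3                 # example sizes (assumed)
W = rng.normal(size=(d_h, d_in))           # input  -> hidden
U = rng.normal(size=(d_h, d_h))            # hidden -> hidden (the recurrent cycle)
V = rng.normal(size=(d_out, d_h))          # hidden -> output

def rnn_forward(xs):
    h = np.zeros(d_h)                      # initial hidden state
    ys = []
    for x in xs:                           # one step per input token, any length
        h = np.tanh(W @ x + U @ h)         # new state depends on input AND previous state
        ys.append(V @ h)                   # output at this step
    return ys

outputs = rnn_forward([rng.normal(size=d_in) for _ in range(5)])
print(len(outputs), outputs[0].shape)      # 5 outputs, each of size d_out
```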
4
Q
RNN Relevance
A
- More efficient than transformer-based models
- Can be trained from scratch (even on your own machine)
5
Q
RNN pro
A
- Handles inputs of arbitrary length
- Can be combined with other NNs
- Can handle a variety of task types, e.g. classification, generation
- Good performance
- No fixed context limit: an FFN only sees a small context window (e.g. 3 words) and ignores everything outside it, whereas an RNN can combine the meanings of the words of a phrase across the whole preceding sequence
6
Q
RNN limit
A
- Uni-directional: later words can be informative, e.g. in "the old man the boat" you need "the boat" to realize that "man" is a verb (the old people man the boat) rather than the head of the noun phrase "the old man"
- Arbitrarily long context is hard in practice: vanishing gradients (multiplying many small values drives the gradient toward 0, e.g. 0.9^50 ≈ 0.005)
- Computationally inconvenient: each step depends on the output of the previous step, so time steps cannot be computed in parallel
7
Q
LSTM
A
- Long Short-Term Memory Network
- c_{t-1} = context vector = what the previous time step wanted us to remember (the important part of the information so far)
- 3 gates (each one "masks" its input & decides what to take from the previous state):
- each gate consists of a feed-forward layer, followed by a sigmoid activation function, followed by a pointwise multiplication with the layer being gated
8
Q
LSTM gate
A
- Forget gate = delete information from the context that is no longer needed
- Add gate = select new information that is added to the current context
- Output gate = select the information required for the current hidden state, i.e. the output that can be used for a label etc. (sketch below)
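A minimal NumPy sketch (not from the card; weight names and sizes are assumptions) of one LSTM time step, spelling out the three gates; each gate is a feed-forward layer followed by a sigmoid whose output pointwise-multiplies ("masks") the vector being gated:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    # forget gate: decide what to delete from the previous context c_{t-1}
    f_t = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev)
    # add (input) gate: decide what new information to write into the context
    i_t = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev)
    g_t = np.tanh(p["W_g"] @ x_t + p["U_g"] @ h_prev)   # candidate new content
    c_t = f_t * c_prev + i_t * g_t                       # updated context vector
    # output gate: decide which part of the context goes into the hidden state
    o_t = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev)
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

d_in, d_h = 4, 8
rng = np.random.default_rng(0)
params = {name: rng.normal(size=(d_h, d_in if name.startswith("W") else d_h))
          for name in ["W_f", "U_f", "W_i", "U_i", "W_g", "U_g", "W_o", "U_o"]}
h, c = lstm_step(rng.normal(size=d_in), np.zeros(d_h), np.zeros(d_h), params)
```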
9
Q
LSTM advantage over RNN
A
- RNN: despite having access to the entire preceding sequence, the information encoded in hidden states tends to be fairly local, more relevant to the most recent parts of the input and recent decisions, because the hidden layer is asked to perform two tasks simultaneously: provide information useful for the current decision, and update & carry forward information needed for future decisions; the LSTM separates these jobs by keeping an explicit context vector managed by its gates
- RNN: vanishing gradient problem
10
Q
BiLSTM
A
- bidirectional LSTM
- consists of two LSTMs that do not share weights: one reads the sequence left-to-right, the other right-to-left, and their outputs are typically concatenated (sketch below)
- Limit: fixed-vocabulary assumption (a word must have occurred in the training data)
- Solutions: 1. add an unknown-word (UNK) token, or 2. train the model at the character level (character & word level can also be combined)
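A minimal PyTorch sketch (not from the card; sizes are arbitrary) showing that a bidirectional LSTM is two LSTMs with separate weights whose per-step outputs are concatenated:

```python
import torch
import torch.nn as nn

d_in, d_h, seq_len = 16, 32, 10            # example sizes (assumed)
bilstm = nn.LSTM(input_size=d_in, hidden_size=d_h,
                 bidirectional=True, batch_first=True)

x = torch.randn(1, seq_len, d_in)          # (batch, time, features)
out, _ = bilstm(x)
print(out.shape)                           # (1, 10, 64): forward + backward states concatenated
```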
11
Q
Multi-task learning
A
- solving multiple tasks while sharing common patterns
12
Q
Multi-task learning - how?
A
- pre-training on raw text
- fine-tuning on annotated data
13
Q
Sequential multi-task learning
A
- Train on one task & save the parameters
- Use (part of) the saved parameters to initialize a new model & train it on the new task (sketch below)
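A minimal PyTorch sketch of this idea (not from the card; the Tagger class, layer names and sizes are made up for illustration): train on task A, save the parameters, then reuse only the shared encoder weights to initialize a model for task B:

```python
import torch
import torch.nn as nn

class Tagger(nn.Module):
    def __init__(self, n_labels):
        super().__init__()
        self.encoder = nn.LSTM(input_size=50, hidden_size=100, batch_first=True)  # shared part
        self.head = nn.Linear(100, n_labels)                                      # task-specific part

    def forward(self, x):
        h, _ = self.encoder(x)
        return self.head(h)

# ... train model_a on task A (e.g. POS tagging), then save its parameters
model_a = Tagger(n_labels=17)
torch.save(model_a.state_dict(), "task_a.pt")

# initialize a new model for task B (e.g. NER) with the saved encoder weights;
# strict=False keeps the new, randomly initialized head for the new label set
model_b = Tagger(n_labels=4)
state = torch.load("task_a.pt")
missing, unexpected = model_b.load_state_dict(
    {k: v for k, v in state.items() if k.startswith("encoder")}, strict=False
)
# ... fine-tune model_b on task B's annotated data
```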
14
Q
Multi-task learning - why?
A
- Efficiency: one model for multiple tasks, no separate training for each (pre-training is very expensive and takes a lot of time)
- Performance: many tasks are related to each other, so it is beneficial to share information between them
- Needs less annotated data, which is expensive to produce