Deep Learning Flashcards
How can we use deep learning for nlp
data driven approach
take input text, embed it in high dimensional vector space
run prediction model on this
both the embedding and the prediction model are usually neural networks
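A minimal sketch of this pipeline (toy vocabulary and sizes are assumptions, not from the cards): embed the tokens, pool them into one vector, and run a small prediction model on top.
```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"the": 0, "cat": 1, "sat": 2}           # toy vocabulary (assumption)
embedding = rng.normal(size=(len(vocab), 8))     # token id -> 8-dim vector
W, b = rng.normal(size=(8, 2)), np.zeros(2)      # tiny prediction "model": one linear layer

tokens = ["the", "cat", "sat"]
vectors = embedding[[vocab[t] for t in tokens]]  # embed the input text
pooled = vectors.mean(axis=0)                    # collapse the sequence to one vector
logits = pooled @ W + b
probs = np.exp(logits) / np.exp(logits).sum()    # probability distribution over classes
print(probs)
```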
How can we use time series for nlp
time series is a set of data points in time order. these datapoints for us are words and the time order is the appearance order in the text
What tasks can we perform using deep learning (4)
sequence classification
sequence labelling
sequence extraction
sequence to sequence translation
what is sequence classification
the output is the probability distribution of classes the text belongs to.
what is span extraction and how is it treated as a classification problem
identify a contiguous span of tokens in the input. treated as classification by returning 2 probability distributions over the tokens,
one for each token being the start of the span
one for each token being the end of the span
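A sketch of the start/end distributions (sizes and weight names are illustrative assumptions): per-token scores are normalised into one start distribution and one end distribution.
```python
import numpy as np

token_reprs = np.random.default_rng(1).normal(size=(6, 4))  # 6 tokens, 4-dim representations
w_start = np.random.default_rng(2).normal(size=4)           # assumed start-scoring weights
w_end = np.random.default_rng(3).normal(size=4)             # assumed end-scoring weights

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

p_start = softmax(token_reprs @ w_start)  # P(token i is the span start)
p_end = softmax(token_reprs @ w_end)      # P(token i is the span end)
span = (int(p_start.argmax()), int(p_end.argmax()))
print(p_start, p_end, span)
```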
what does a ml model for sequence labelling output
the output is a probability distribution for each token over classes
examples of sequence labelling
such as POS tagging, named entity recognition, open information extraction, question type classification
how is sequence to sequence translation performed
input is the sequence and output is another sequence
applications for sequence to sequence translation
translation, summarisation, text generation, question answering
what is the major disadvantage of bag of words
we lose the ordering
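A tiny illustration of the lost ordering: two sentences with opposite meanings get the same bag-of-words representation, because only token counts are kept.
```python
from collections import Counter

a = "dog bites man".split()
b = "man bites dog".split()
print(Counter(a) == Counter(b))  # True: the ordering information is gone
```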
what is a rnn
when processing each token we also use the hidden state output from the previous token
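A minimal sketch of an RNN forward pass (toy sizes, tanh cell; weight names are assumptions): the hidden state from the previous token is fed into the current step.
```python
import numpy as np

rng = np.random.default_rng(0)
W_xh = rng.normal(size=(8, 16))   # input -> hidden weights
W_hh = rng.normal(size=(16, 16))  # previous hidden -> hidden weights
h = np.zeros(16)                  # initial hidden state

inputs = rng.normal(size=(5, 8))  # 5 token embeddings (toy data)
for x in inputs:
    h = np.tanh(x @ W_xh + h @ W_hh)  # reusing h carries information forward in time
print(h.shape)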
what is the vanishing gradient problem
gradients shrink as they are propagated back through many time steps, so the weight updates become effectively zero
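Rough intuition only (the 0.5 factor is an assumed local gradient, not a measured value): backpropagating through many steps multiplies many small factors, so the product shrinks towards zero.
```python
grad = 1.0
for _ in range(100):   # 100 time steps
    grad *= 0.5        # assumed per-step gradient factor < 1
print(grad)            # ~7.9e-31, effectively zero
```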
why is the vanishing gradient problem prevalent in nlp
because of the long term dependencies
what is the solution to the vanishing gradient problem
lstm
what is a lstm
uses a context vector that retains information from past calculations.
at each step it decides how much to add to the context vector and whether anything should be discarded
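A sketch of a single LSTM step using the standard gate equations (toy sizes; not a specific library implementation): the forget gate decides what to discard from the context (cell) vector, the input gate decides how much new information to add.
```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
d_in, d_h = 8, 16
W = {g: rng.normal(size=(d_in + d_h, d_h)) * 0.1 for g in "fico"}  # assumed gate weights

def lstm_step(x, h, c):
    z = np.concatenate([x, h])
    f = sigmoid(z @ W["f"])        # forget gate: what to discard from the context
    i = sigmoid(z @ W["i"])        # input gate: how much new information to add
    c_tilde = np.tanh(z @ W["c"])  # candidate context update
    c = f * c + i * c_tilde        # updated context vector
    o = sigmoid(z @ W["o"])        # output gate
    h = o * np.tanh(c)
    return h, c

h, c = np.zeros(d_h), np.zeros(d_h)
for x in rng.normal(size=(5, d_in)):
    h, c = lstm_step(x, h, c)
print(h.shape, c.shape)
```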
what is a gru
gated recurrent unit. similar to lstm in performance but fewer parameters.
what is cataphora
use of a word or phrase that refers forward to a word or phrase appearing later in the text
what is the main disadvantage of rnn, lstm and gru
they only take into account previous, not future, data
what is a birnn, how do we create one
bidirectional rnn. run a forward rnn and a backward rnn over the sequence and combine (e.g. concatenate) their outputs at each position
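A sketch of a bidirectional RNN (toy sizes; names are illustrative): one RNN reads the sequence left to right, another reads it right to left, and the per-token outputs are concatenated so each position sees both past and future context.
```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, T = 8, 16, 5
inputs = rng.normal(size=(T, d_in))

def run_rnn(xs, W_xh, W_hh):
    h, out = np.zeros(d_h), []
    for x in xs:
        h = np.tanh(x @ W_xh + h @ W_hh)
        out.append(h)
    return np.stack(out)

fwd = run_rnn(inputs, rng.normal(size=(d_in, d_h)), rng.normal(size=(d_h, d_h)))
bwd = run_rnn(inputs[::-1], rng.normal(size=(d_in, d_h)), rng.normal(size=(d_h, d_h)))[::-1]
states = np.concatenate([fwd, bwd], axis=-1)  # (T, 2 * d_h): past + future context per token
print(states.shape)
```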
what is self attention
rather than processing a sequence left to right, learn the relative importance of each token with respect to the others.
To maintain the ordering, we encode the position
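A sketch of single-head scaled dot-product self-attention with position encodings added to the inputs (toy sizes; the random position vectors stand in for a real positional encoding scheme).
```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 5, 16
x = rng.normal(size=(T, d))        # token vectors
pos = rng.normal(size=(T, d)) * 0.1  # placeholder position encodings (assumption)
x = x + pos                          # inject ordering information

W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v

scores = Q @ K.T / np.sqrt(d)                       # relative importance of every token pair
scores = scores - scores.max(axis=-1, keepdims=True)
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
output = weights @ V                                # each token is a weighted mix of all tokens
print(weights.shape, output.shape)
```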
what is a word embedding
a static map from a string (word) to a vector, trained on co-occurrence statistics in a training corpus
what is language modelling
autocomplete
predict the next token given the previous tokens
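A tiny count-based illustration of next-token prediction (the corpus here is a toy assumption; real language models use neural networks instead of counts).
```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat slept".split()
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1            # count which token follows which

def next_token_distribution(prev):
    total = sum(counts[prev].values())
    return {tok: c / total for tok, c in counts[prev].items()}

print(next_token_distribution("the"))  # {'cat': 0.666..., 'mat': 0.333...}
```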
what information does language modelling combine (3)
syntax, semantics, and self-supervision
how do we perform language modelling using a bilstm
predict missing word using left and right of blank
what bilstm model performs language modelling
elmo
what does a bilstm model for language modelling return, how is it optimised
returns 2 probability distributions that we combine
we want to maximise the log likelihood of the expected token jointly for both directions
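The joint objective in the standard ELMo-style bidirectional form (parameter details omitted; symbols are illustrative): maximise, over the tokens of the sequence,
```latex
\sum_{k=1}^{N} \Big( \log p(t_k \mid t_1, \ldots, t_{k-1}) + \log p(t_k \mid t_{k+1}, \ldots, t_N) \Big)
```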
what is a transformer language model
use self attention to replace the rnn in elmo
why do we replace the rnn with self attention
self attention does not require processing the sequence token by token, unlike an rnn
this makes computation parallelisable and optimisation faster
meaning bigger models can be trained on larger data
bigger language models mean better performance
what is the problem when using deep learning on nlp
spurious correlations. a neural network has no way to distinguish spurious from robust correlations
how do we overcome the issue of spurious correlations
more data
what are the two problems with neural networks
- out of distribution generalisation
- interpretability