Lecture 12 Flashcards
Sequence to Sequence, Attention, Transformer
Encoder:
an LSTM that encodes the input sequence into a fixed-length internal
representation W.
Decoder:
another LSTM that takes the internal representation W and generates the output sequence from that vector.
Question Answering
We ingest a sentence of several words, processing it with a recurrent unit such as an LSTM. We preserve the “state” that results from ingesting that sentence into our trained model. The resulting context vector then serves as the context for a decoder module, also made
of LSTMs or GRUs. If we prompt the decoder with a particular “state” vector and a “start of sentence” marker, we can generate output tokens (plus an end of sentence marker).
Seq2Seq
At each time step in the encoder, the RNN takes a word vector (x_i) from the input sequence and the hidden state (H_(i-1)) from the previous time step; the hidden state is updated to (H_i).
The context vector passed to the decoder is the hidden state from the last unit of the encoder
(without the attention mechanism) or a weighted sum of the hidden states of the encoder (with the attention mechanism).
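A minimal numpy sketch of this recurrence and of the two ways of forming the context vector. The tanh cell stands in for an LSTM/GRU, and the names W_x, W_h and all dimensions are illustrative assumptions, not the lecture's notation:

```python
# Encoder recurrence: H_i = f(x_i, H_(i-1)), then build the decoder's context.
import numpy as np

hidden_size, embed_size, seq_len = 8, 6, 5
rng = np.random.default_rng(0)

W_x = rng.normal(size=(hidden_size, embed_size))   # input-to-hidden weights (illustrative)
W_h = rng.normal(size=(hidden_size, hidden_size))  # hidden-to-hidden weights (illustrative)
x = rng.normal(size=(seq_len, embed_size))         # word vectors x_1 .. x_|x|

h = np.zeros(hidden_size)
hidden_states = []
for x_i in x:                                      # one encoder time step per word
    h = np.tanh(W_x @ x_i + W_h @ h)               # update the hidden state
    hidden_states.append(h)
H = np.stack(hidden_states)                        # (seq_len, hidden_size)

# Without attention: the context is simply the final hidden state.
context_no_attention = H[-1]

# With attention: the context is a weighted sum of all encoder hidden states
# (weights shown here as uniform purely for illustration).
alpha = np.full(seq_len, 1.0 / seq_len)
context_with_attention = alpha @ H
print(context_no_attention.shape, context_with_attention.shape)
```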
Inference
The task of applying a trained model to generate a translation is
called inference
MT Keras Overview – Input/Output
- Source sequence for the encoder:
  x = (x1, x2, …, x|x|), initially one-hot encoded and usually fed into a
  word embedding layer
- Target sequence:
  y = (y1, y2, …, y|y|), which exists in two versions: the decoder input
  starts with a start-of-sentence token and the decoder output ends with an
  end-of-sentence token; the two sequences are offset by one time step
- The final decoder output goes through a softmax layer that gives the
  probability of each entry in the vocabulary (see the sketch after this
  list)
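A hedged Keras sketch of this input/output setup, not the exact lecture model: vocabulary sizes and layer widths are placeholder values, and the decoder input/output tensors are assumed to be the shifted target sequences described above.

```python
import tensorflow as tf

src_vocab, tgt_vocab, embed_dim, units = 8000, 8000, 128, 256  # placeholders

# Encoder: embed the source tokens and keep only the final LSTM states.
enc_inputs = tf.keras.Input(shape=(None,), dtype="int32", name="encoder_tokens")
enc_embed = tf.keras.layers.Embedding(src_vocab, embed_dim)(enc_inputs)
_, state_h, state_c = tf.keras.layers.LSTM(units, return_state=True)(enc_embed)

# Decoder: consumes the start-token-prefixed target, initialized with the
# encoder's final states (the context).
dec_inputs = tf.keras.Input(shape=(None,), dtype="int32", name="decoder_tokens")
dec_embed = tf.keras.layers.Embedding(tgt_vocab, embed_dim)(dec_inputs)
dec_seq, _, _ = tf.keras.layers.LSTM(units, return_sequences=True,
                                     return_state=True)(
    dec_embed, initial_state=[state_h, state_c])

# Softmax over the target vocabulary at every decoder time step.
probs = tf.keras.layers.Dense(tgt_vocab, activation="softmax")(dec_seq)

model = tf.keras.Model([enc_inputs, dec_inputs], probs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

Training would pass the source tokens and the start-token-prefixed targets as inputs, with the end-token-suffixed targets (offset by one step) as labels.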
Greedy search:
At each time step, choose the output word with the highest probability.
Beam search:
At each time step, keep the k most probable candidate sequences
(typically 5 <= k <= 10); assemble the overall sequence with the maximum probability.
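A toy sketch contrasting the two decoding strategies. The function step() is a stand-in for the decoder softmax (not a real model), and vocab, eos, and max_len are illustrative values:

```python
import numpy as np

vocab, eos, max_len = 10, 0, 6

def step(prefix):
    """Stand-in for the decoder softmax: P(next token | prefix)."""
    local = np.random.default_rng(abs(hash(tuple(prefix))) % (2**32))
    p = local.random(vocab)
    return p / p.sum()

def greedy():
    seq = []
    while len(seq) < max_len:
        tok = int(np.argmax(step(seq)))      # pick the single best token
        seq.append(tok)
        if tok == eos:
            break
    return seq

def beam_search(k=5):
    beams = [([], 0.0)]                      # (sequence, log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq and seq[-1] == eos:       # finished hypotheses carry over
                candidates.append((seq, score))
                continue
            p = step(seq)
            for tok in np.argsort(p)[-k:]:   # expand the k best continuations
                candidates.append((seq + [int(tok)], score + np.log(p[tok])))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    return beams[0][0]                       # highest-probability sequence found

print("greedy:", greedy())
print("beam  :", beam_search())
```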
Attention Mechanisms
Unfortunately, the single context vector passed from the encoder to the decoder is not always sufficient to produce a good result. Attention addresses this bottleneck.
Drawback of the “Vanilla” Encoder-Decoder
- In the “vanilla” seq2seq model shown earlier, the decoder takes the final
  hidden state of the encoder (the context vector) and uses that to produce
  the target sentence.
- The fixed-size context vector represents the final time step. Loosely
  speaking, the encoding process gives slightly more weight to each
  successive term in the input sentence.
- Earlier terms may be more important than later ones, though, in driving
  the accuracy of the decoder's output.
Attention Mechanism
An “attention mechanism” makes all hidden states of the encoder
visible to the decoder:
- embedding all the words in the input (represented by hidden states) while
  creating the context vector
- a learned mechanism that helps the decoder identify where to pay
  attention in the encoding when predicting at each time step (see the
  sketch after this list)
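A minimal numpy sketch of dot-product attention at one decoder time step (one common scoring choice; the lecture does not fix a particular score function): the decoder state is scored against every encoder hidden state, the scores are softmax-normalized into attention weights, and the context is the weighted sum. Shapes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, hidden = 5, 8
H = rng.normal(size=(seq_len, hidden))   # all encoder hidden states, visible to the decoder
s_t = rng.normal(size=hidden)            # current decoder hidden state

scores = H @ s_t                                 # relevance of each source position
alpha = np.exp(scores) / np.exp(scores).sum()    # attention weights, sum to 1
context = alpha @ H                              # weighted sum of encoder hidden states

print(alpha.round(3), context.shape)
```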
Fully Attention-Based Approaches
What if we avoided recurrent layers (such as LSTMs) entirely and used attention layers instead? This was the insight behind BERT and other transformer-based approaches. Transformer-based models have improved performance on some tasks in comparison to recurrent networks with attention.
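A brief Keras sketch of the idea, using the built-in MultiHeadAttention layer (the core operation of transformer blocks) in place of recurrence; the dimensions are placeholder values, and a full transformer block would also add residual connections, layer normalization, and a feed-forward sublayer.

```python
import tensorflow as tf

x = tf.keras.Input(shape=(None, 128))            # (batch, seq_len, model_dim)
attn = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=32)
y = attn(query=x, value=x, key=x)                # self-attention: no recurrence;
                                                 # every position attends to all others
model = tf.keras.Model(x, y)
model.summary()
```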
Transformer Architectures Proliferate
- The basic transformer architecture from Vaswani et al. (2017, Advances in
  Neural Information Processing Systems) has morphed into numerous
  variations applied to a variety of tasks
- In addition to NLP, transformers have been applied to genetics, computer
  vision, signal processing, and video analysis
- In 2021 alone, more than 6000 papers were published on applications of
  and improvements to BERT