Deep Sequence Modelling Flashcards

1
Q

What is a Sequence Model (SM)?

SMs split data into …

A

Sequence models split data into small chunks / sequences of data in order to solve classification problems.

2
Q

What are the 4 problem types SMs solve?

A
  1. One-to-one - binary classification
  2. Many-to-one - sentiment classification
  3. One-to-many - image captioning
  4. Many-to-many - machine translation
3
Q

Within SMs, what are neurons with recurrence?

Neurons with recurrence compute … current input and previous output …

A

Neurons with recurrence compute, at each time step, a function of the current input and the output of the previous time step (past memory).

4
Q

How do RNNs work?

Applies a ___ relation at every ___ to process a sequence.

A

They apply a recurrence relation at every time step to process a sequence:
h_t = f_W(x_t, h_{t-1})

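The recurrence relation above can be sketched in a few lines of numpy; the weight names `W_xh`, `W_hh`, the bias, and the `tanh` nonlinearity are illustrative assumptions, not specifics from the cards:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b):
    # h_t = f_W(x_t, h_{t-1}): combine current input with past memory
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b)

rng = np.random.default_rng(0)
W_xh = rng.standard_normal((4, 3))   # input-to-hidden weights
W_hh = rng.standard_normal((4, 4))   # hidden-to-hidden weights
b = np.zeros(4)

h = np.zeros(4)                      # initial hidden state
x = rng.standard_normal(3)           # one input vector
h = rnn_step(x, h, W_xh, W_hh, b)    # one application of the recurrence
```

The same `rnn_step` (same f_W, same weights) is applied at every time step; only `x_t` and `h_prev` change.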
5
Q

What is the RNN intuition?

Give the idea of the process

A

Input vector with weights -> update hidden state -> output vector / predicted output

6
Q

What is the computation of an RNN across time?

  1. __ weight matrices
  2. __ across time steps
  3. During forward propagation, compute __ with backpropagation
  4. Sum total of __ across the sequence
A
  1. Reuse the same weight matrices at every time step.
  2. Update the hidden state across time steps.
  3. During forward propagation, compute the loss; gradients come from backpropagation (through time).
  4. Sum the total loss across the whole sequence.
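The four points above can be sketched as a toy unrolled forward pass; the cross-entropy loss, the softmax output layer, and all shapes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
W_xh, W_hh, W_hy = (rng.standard_normal(s) * 0.1
                    for s in [(4, 3), (4, 4), (2, 4)])

def forward(xs, ys):
    """Unroll the RNN: the SAME weights at every step, losses summed."""
    h = np.zeros(4)
    total_loss = 0.0
    for x_t, y_t in zip(xs, ys):
        h = np.tanh(W_xh @ x_t + W_hh @ h)       # reuse the same matrices
        logits = W_hy @ h                        # per-step output
        p = np.exp(logits) / np.exp(logits).sum()
        total_loss += -np.log(p[y_t])            # per-step loss
    return total_loss                            # summed across the sequence

xs = rng.standard_normal((5, 3))   # toy sequence of 5 input vectors
ys = [0, 1, 0, 1, 1]               # toy targets
loss = forward(xs, ys)
```

Backpropagation through time would then differentiate this summed loss with respect to the shared weight matrices.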
7
Q

What are the 4 Design Criteria for SMs?

  1. Handle __ length sequence
  2. Track ____ dependencies
  3. Maintain info about __
  4. Share __ across the seq
A
  1. Handle variable length sequence
  2. Track long-term dependencies
  3. Maintain info about order
  4. Share parameters across the sequence
8
Q

What is the technique called that transforms language into indexes?

Give 1 word

A

Embedding / Encoding

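A sketch of both ideas, assuming a toy corpus: encoding maps each word to an integer index, and an (untrained, random) embedding matrix maps indexes to dense vectors:

```python
import numpy as np

corpus = "deep learning models process sequences of words".split()

# Encoding: map each unique word to an integer index
vocab = {w: i for i, w in enumerate(sorted(set(corpus)))}
indexes = [vocab[w] for w in corpus]

# Embedding: each index selects a dense vector (random here, learned in practice)
rng = np.random.default_rng(2)
embedding = rng.standard_normal((len(vocab), 8))
vectors = embedding[indexes]          # shape: (len(corpus), 8)
```

In a real model the embedding matrix is trained jointly with the rest of the network.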
10
Q

The standard RNN gradient flow involves repeated multiplication by the same weight matrix. What are the 2 issues with this?

  1. Large values cause __ gradients
  2. Small values cause __ gradients
A
  1. Very large values will cause exploding gradients.
  2. Very small values will cause vanishing gradients.
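A numeric sketch of why repeated multiplication causes both issues; a scaled identity matrix stands in for the recurrent weight matrix, so its scale plays the role of the largest singular value (an illustrative assumption, ignoring activation derivatives):

```python
import numpy as np

def repeated_grad_norm(scale, steps=50):
    # Backprop through time multiplies the gradient by the recurrent
    # weight matrix once per time step; here W is a scaled identity.
    W = np.eye(2) * scale
    g = np.ones(2)
    for _ in range(steps):
        g = W @ g
    return np.linalg.norm(g)

exploding = repeated_grad_norm(1.5)   # values > 1 compound -> explodes
vanishing = repeated_grad_norm(0.5)   # values < 1 compound -> vanishes
```

Fifty steps of 1.5x growth or 0.5x shrinkage already push the gradient norm far outside a usable range.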
11
Q

Why is vanishing gradients a big problem?

It causes the model to…

A

It causes the model to lose the ability to learn anything useful from distant time steps, i.e. to capture long-term dependencies.

12
Q

What is the most robust way to mitigate vanishing gradients?

___ cells: Use __ to add or remove info.

A

Gated cells: Use gates to selectively add or remove info within each recurrent unit.

13
Q

What is the Long short-term memory (LSTM) key concept?

__ & __ information

A

Forget & Store information

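The forget & store idea can be sketched as one LSTM step; the single stacked weight matrix `W` producing the forget/input/output gates and the candidate update is a common convention assumed here, not taken from the cards:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b):
    """One LSTM step: gates decide what to forget and what to store."""
    z = W @ np.concatenate([x, h]) + b
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # forget/input/output gates
    c = f * c + i * np.tanh(g)    # forget old cell info, store new info
    h = o * np.tanh(c)            # expose a filtered view of the cell state
    return h, c

H, X = 4, 3                       # hidden and input sizes (illustrative)
rng = np.random.default_rng(3)
W = rng.standard_normal((4 * H, X + H)) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
h, c = lstm_step(rng.standard_normal(X), h, c, W, b)
```

The additive cell update `c = f * c + i * tanh(g)` is what gives gradients a more direct path through time.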
14
Q

How to build a more effective Sequence Model?

Use __ to model seq without recurrence.

A

Use self-attention to model sequences without recurrence.

15
Q

What is the architecture of the Transformer?

A

Self-attention is the foundational mechanism built into the Transformer's neural network architecture.

16
Q

What are the 4 steps of self-attention in a neural network?

  1. Encode __ info
  2. Extract __, __, __ (Q, K, V)
  3. Compute __ weighting
  4. Extracts features with __
A
  1. Encode position info
  2. Extract Query, Key, Value
  3. Compute attention weighting
  4. Extract features with high attention weights
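The four steps above can be sketched as scaled dot-product self-attention in numpy; the crude additive position term and all shapes are illustrative assumptions:

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence X of shape (T, d)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v           # step 2: extract Q, K, V
    scores = Q @ K.T / np.sqrt(K.shape[-1])       # step 3: attention weighting
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V                            # step 4: weighted features

rng = np.random.default_rng(4)
T, d = 5, 8                                       # sequence length, model dim
pos = np.arange(T)[:, None] / T                   # step 1: crude position info
X = rng.standard_normal((T, d)) + pos             # add position to the inputs
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
```

Each output row is a mixture of all value vectors, weighted by how strongly that position's query matches every key.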