Deep Sequence Modelling Flashcards
What is a Sequence Model?
SM split data into …
Sequence models split data into small chunks (sequences) and process them in order to solve prediction problems such as classification.
What are the 4 problem types SMs solve?
- one-to-one - binary classification
- many-to-one - sentiment classification
- one-to-many - image captioning
- many-to-many - machine translation
Within SMs, what are neurons with recurrence?
Neurons with recurrence compute … current input and previous output …
Neurons with recurrence compute, at each time step, a function of the current input and the previous time step's output (past memory).
How do RNNs work?
Apply a ___ relation at every ___ to process a sequence.
They apply a recurrence relation at every time step to process a sequence:
h_t = f_W(x_t, h_{t-1})
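The recurrence relation above can be sketched as a single step function. This is a minimal illustration with made-up dimensions and randomly initialized weights (`W_xh`, `W_hh` are assumptions for the example); `f_W` is taken to be the common tanh parameterization.

```python
import numpy as np

# Hypothetical sizes for illustration only.
input_dim, hidden_dim = 3, 4
rng = np.random.default_rng(0)
W_xh = rng.standard_normal((hidden_dim, input_dim)) * 0.1   # input-to-hidden weights
W_hh = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1  # hidden-to-hidden weights

def rnn_step(x_t, h_prev):
    """One application of the recurrence h_t = f_W(x_t, h_{t-1})."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev)

h = np.zeros(hidden_dim)            # initial hidden state h_0
x = rng.standard_normal(input_dim)  # one input vector x_t
h = rnn_step(x, h)                  # new hidden state h_1
print(h.shape)
```

Because tanh squashes its input, every entry of the hidden state stays in (-1, 1).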
What is the RNN intuition?
Give the idea of the process
Input vector -> update hidden state -> output vector / predicted output
What is the computation of an RNN across time?
- __ weight matrices
- __ across time steps
- During forward propagation, compute the __ at each time step
- Sum the total __ across the sequence
- Reuse the same weight matrices at every time step.
- Update the hidden state across time steps.
- During forward propagation, compute the loss at each time step; gradients are then computed with backpropagation through time.
- Sum the total loss across the whole sequence.
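The four points above can be sketched as an unrolled forward pass. All names and sizes here are illustrative assumptions; the key details are that the same three weight matrices are reused at every step and the per-step losses are summed.

```python
import numpy as np

rng = np.random.default_rng(1)
input_dim, hidden_dim, output_dim, T = 3, 4, 2, 5  # illustrative sizes
W_xh = rng.standard_normal((hidden_dim, input_dim)) * 0.1
W_hh = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1
W_hy = rng.standard_normal((output_dim, hidden_dim)) * 0.1

xs = rng.standard_normal((T, input_dim))   # toy input sequence
ys = rng.standard_normal((T, output_dim))  # toy target sequence

h = np.zeros(hidden_dim)
total_loss = 0.0
for t in range(T):
    # The SAME W_xh, W_hh, W_hy are reused at every time step.
    h = np.tanh(W_xh @ xs[t] + W_hh @ h)        # update hidden state
    y_hat = W_hy @ h                            # predicted output
    total_loss += np.sum((y_hat - ys[t]) ** 2)  # per-step loss, summed
print(total_loss)
```

In training, backpropagation through time would then push gradients of `total_loss` back through every step of this loop.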
What are the 4 Design Criteria for SM?
- Handle __ length sequence
- Track ____ dependencies
- Maintain info about __
- Share __ across the seq
- Handle variable-length sequences
- Track long-term dependencies
- Maintain info about order
- Share parameters across the sequence
What is the technique called that transforms language into indexes?
Give 1 word
Embedding / Encoding
What are the 4 criteria to model sequences?
RNNs meet these criteria
- Handle __ seq
- Track __ dependencies
- Maintain info about __
- Share __ across the seq
- Handle variable-length sequences
- Track long-term dependencies
- Maintain information about order
- Share parameters across the sequence
The standard RNN gradient flow involves repeated multiplication by the same weight matrix. What are the 2 issues with this?
- Large values cause __ gradients
- Small values cause __ gradients
- Values much larger than 1 cause exploding gradients.
- Values much smaller than 1 cause vanishing gradients.
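Both failure modes fall out of repeated multiplication by the same matrix, which is what backpropagation through an unrolled RNN does: the hidden-to-hidden matrix appears once per time step in the chain rule. A toy demonstration (the scaled identity matrix is an assumption chosen to make the eigenvalues obvious):

```python
import numpy as np

def gradient_norm_after(T, scale):
    """Norm of a gradient pushed back through T steps of W = scale * I."""
    W = np.eye(4) * scale   # toy recurrent weight matrix
    grad = np.ones(4)
    for _ in range(T):
        grad = W.T @ grad   # one chain-rule factor per time step
    return np.linalg.norm(grad)

exploding = gradient_norm_after(50, 1.1)  # eigenvalues > 1: grows fast
vanishing = gradient_norm_after(50, 0.9)  # eigenvalues < 1: shrinks fast
print(exploding, vanishing)
```

After only 50 steps the two gradients already differ by several orders of magnitude, which is why long sequences make plain RNNs hard to train.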
Why are vanishing gradients a big problem?
They cause the model to…
They cause the model to lose the ability to learn long-term dependencies: errors from distant time steps stop influencing the weights, so the model cannot learn anything useful from earlier parts of the sequence.
What is the most robust way to mitigate vanishing gradients?
___ cells: Use __ to add or remove info.
Gated cells: Use gates to selectively add or remove info within each recurrent unit.
What is the key concept of Long Short-Term Memory (LSTM)?
__ & __ information
Forget & Store information
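The forget/store idea can be sketched as one LSTM step. This is a minimal, unoptimized version of the standard gate equations; the bias terms are omitted and all weight shapes are assumptions for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step: gates decide what to forget and what to store."""
    Wf, Wi, Wo, Wc = params           # each maps [h_prev; x_t] -> hidden_dim
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(Wf @ z)               # forget gate: what to erase from memory
    i = sigmoid(Wi @ z)               # input gate: what new info to store
    o = sigmoid(Wo @ z)               # output gate: what to expose
    c_tilde = np.tanh(Wc @ z)         # candidate memory content
    c = f * c_prev + i * c_tilde      # forget old info + store new info
    h = o * np.tanh(c)                # hidden state read from memory
    return h, c

rng = np.random.default_rng(2)
hidden_dim, input_dim = 4, 3
params = [rng.standard_normal((hidden_dim, hidden_dim + input_dim)) * 0.1
          for _ in range(4)]
h, c = lstm_step(rng.standard_normal(input_dim),
                 np.zeros(hidden_dim), np.zeros(hidden_dim), params)
print(h.shape, c.shape)
```

Because the cell state `c` is updated additively rather than by repeated matrix multiplication, gradients flow through it more easily, which mitigates vanishing gradients.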
How do you build a more effective Sequence Model?
Use __ to model sequences without recurrence.
Use self-attention to model sequences without recurrence.
What is the Transformer architecture in AI?
Self-attention is the foundational mechanism built into the Transformer's neural network architecture.
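The self-attention mechanism above can be sketched as a single scaled dot-product attention layer. Sizes and weight matrices are illustrative assumptions; the key point is that every position attends to every other position in one matrix operation, with no recurrence.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a whole sequence at once."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv               # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])        # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over keys
    return weights @ V  # each position is a weighted mix of all positions

rng = np.random.default_rng(3)
T, d = 5, 4  # illustrative sequence length and model dimension
X = rng.standard_normal((T, d))                    # embedded input sequence
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)
```

Unlike the RNN loop earlier, nothing here depends on processing positions one at a time, which is what lets Transformers parallelize over the sequence.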