9 - Deep RL Flashcards
Deep Neural Networks (DNN) are function approximators that are a composition of a number of _______________ {features, functions}.
Deep Neural Networks (DNN) are function approximators that are a composition of a number of functions.
The nonlinearity in a DNN is called the ___________ function.
The nonlinearity in a DNN is called the activation function.
Why are Convolutional Neural Network (CNN) popular in Deep Learning and Deep Reinforcement Learning?
They are used throughout computer vision tasks since we can reduce the space-time complexity and include more of the surrounding local structure whereas in a traditional neural network we have trouble with.
What is a stride for a CNN?
A stride is the step size that a filter or mask take when moving across an array of input pixels. For example, a stride of length 1 means we move a 5x5 mask over 1 pixel length at a time.
What are the 3 elements of the Deadly Triad?
1) Function approximation
2) Bootstrapping
3) Off-policy learning
…when all combined can lead to learning that diverges in their value estimates and become unbounded.
For DQN in Atari, what is the specific s in Q(s, a)?
Input state s is a stack of raw pixels from the last 4 (or whatever chosen number of) frames.
What is Experience Replay in DQNs?
Experience Replay is one method that is used to help us overcome unstable learning. DNNs easily overfit on current episodes. To solve this problem, Experience Replay stores experiences including state transitions, rewards and actions, and makes mini-batches to update the neural net. It reduces correlation between experiences in updating the DNN, increases learning speed with mini batch, and reuses past transitions to avoid catastrophic forgetting.
In the seminal 2015 paper by Mnih et al, the DQN model was better than humans on ______________(none, some, most, all) Atari games.
In the seminal 2015 paper by Mnih et al, the DQN model was better than humans on most Atari games.
According to the ablation study, what as the most important study feature?
The most important study feature was found to be Replay (an “ablation” study is when some part of the model or algorithm is removed and we look at whether the performance changes).