W10 Future RL & Wrap-up Flashcards
Tabular RL key words
MDP, grid world, CartPole, tabular Q-learning, exploration vs. exploitation, on-policy vs. off-policy
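As a quick refresher, a minimal tabular Q-learning sketch with epsilon-greedy exploration (the grid-world size, hyperparameters, and the `env.step` interface are illustrative assumptions, not from the course):

```python
import numpy as np

# Tabular Q-learning on a small grid world (sizes are placeholders).
n_states, n_actions = 25, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.99, 0.1  # learning rate, discount, exploration rate

def q_learning_step(env, s):
    # Epsilon-greedy: explore with probability epsilon, otherwise exploit Q.
    if np.random.rand() < epsilon:
        a = np.random.randint(n_actions)
    else:
        a = int(np.argmax(Q[s]))
    s_next, r, done = env.step(a)  # assumed env interface
    # Off-policy TD target: max over next actions, regardless of the behavior policy.
    target = r + (0.0 if done else gamma * np.max(Q[s_next]))
    Q[s, a] += alpha * (target - Q[s, a])
    return s_next, done
```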
1) Deep learning brings what to RL?
2) Pros & Cons of model-free methods?
1) Function approximation for large state spaces, plus techniques to break sample correlations and improve convergence (an experience replay buffer and a separate target network)
2) Pro: model-free methods often reach good-quality optima. Con: they have high sample complexity.
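A minimal DQN-style sketch of those two stabilization ideas (network sizes, hyperparameters, and the 4-dimensional state / 2-action setup are illustrative, CartPole-like):

```python
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

# Replay buffer: uniform sampling of stored transitions breaks the
# correlation between consecutive samples.
buffer = deque(maxlen=10_000)  # holds (s, a, r, s_next, done) tuples

# Online Q-network plus a separate target network (a frozen copy) that
# stabilizes the bootstrap target; sizes here are illustrative.
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net.load_state_dict(q_net.state_dict())  # start in sync
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.99

def train_step(batch_size=32):
    if len(buffer) < batch_size:
        return
    batch = random.sample(buffer, batch_size)  # decorrelated minibatch
    s, a, r, s2, d = (torch.as_tensor(np.asarray(x), dtype=torch.float32)
                      for x in zip(*batch))
    q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():  # no gradients flow through the target network
        target = r + gamma * target_net(s2).max(1).values * (1 - d)
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Every N steps, re-sync: target_net.load_state_dict(q_net.state_dict())
```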
Model-based method summary
Combine planning and learning to improve sample efficiency.
For high-dimensional environments: use uncertainty modeling and latent models/world models to reduce the dimensionality of the planning problem.
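A tabular Dyna-Q sketch of combining planning and learning (sizes and hyperparameters are illustrative): each real transition also trains a model, and extra planning updates replay simulated transitions from that model, cutting the number of real samples needed.

```python
import random
import numpy as np

n_states, n_actions = 25, 4
Q = np.zeros((n_states, n_actions))
model = {}  # learned deterministic model: (s, a) -> (r, s_next)
alpha, gamma, n_planning = 0.1, 0.99, 10

def dyna_q_update(s, a, r, s2):
    # 1) Direct RL update from the real transition.
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
    # 2) Learn the model from the same transition.
    model[(s, a)] = (r, s2)
    # 3) Planning: extra Q-updates on transitions sampled from the model.
    for _ in range(n_planning):
        (ps, pa), (pr, ps2) = random.choice(list(model.items()))
        Q[ps, pa] += alpha * (pr + gamma * Q[ps2].max() - Q[ps, pa])
```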
Main challenge of DRL?
Managing the combinatorial explosion that occurs when a sequence of decisions is chained together. The right kind of inductive bias can exploit structure in this state space.
3 major challenges
1. Solving larger problems faster (reduce sample complexity with latent models, curriculum learning in self-play, transfer learning, meta-learning, and better exploration through intrinsic motivation)
2. More agents (hierarchical methods, population-based self-play leagues)
3. Human interaction (explainable AI, generalization)