W10 Future RL & Wrap-up Flashcards
Tabular RL key words
MDP, grid world, CartPole, tabular Q-learning, exploration vs. exploitation, on-policy vs. off-policy
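As a quick refresher, a minimal tabular Q-learning sketch with epsilon-greedy exploration (the grid-world size, hyperparameters, and the `env.step` interface are illustrative assumptions, not from the course):

```python
import numpy as np

# Tabular Q-learning on a small grid world (sizes are placeholders).
n_states, n_actions = 25, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.99, 0.1  # learning rate, discount, exploration rate

def q_learning_step(env, s):
    # Epsilon-greedy: explore with probability epsilon, otherwise exploit Q.
    if np.random.rand() < epsilon:
        a = np.random.randint(n_actions)
    else:
        a = int(np.argmax(Q[s]))
    s_next, r, done = env.step(a)  # assumed env interface
    # Off-policy TD target: max over next actions, regardless of the behavior policy.
    target = r + (0.0 if done else gamma * np.max(Q[s_next]))
    Q[s, a] += alpha * (target - Q[s, a])
    return s_next, done
```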
1) Deep learning brings what to RL?
2) Pros & Cons of model-free methods?
1) Function approximation for large state spaces, plus techniques to break sample correlations and improve convergence (an experience replay buffer and a separate target network)
2) Pro: model-free methods often reach good-quality optima. Con: they have high sample complexity.
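A minimal DQN-style sketch of those two stabilization ideas (network sizes, hyperparameters, and the 4-dimensional state / 2-action setup are illustrative, CartPole-like):

```python
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

# Replay buffer: uniform sampling of stored transitions breaks the
# correlation between consecutive samples.
buffer = deque(maxlen=10_000)  # holds (s, a, r, s_next, done) tuples

# Online Q-network plus a separate target network (a frozen copy) that
# stabilizes the bootstrap target; sizes here are illustrative.
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net.load_state_dict(q_net.state_dict())  # start in sync
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.99

def train_step(batch_size=32):
    if len(buffer) < batch_size:
        return
    batch = random.sample(buffer, batch_size)  # decorrelated minibatch
    s, a, r, s2, d = (torch.as_tensor(np.asarray(x), dtype=torch.float32)
                      for x in zip(*batch))
    q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():  # no gradients flow through the target network
        target = r + gamma * target_net(s2).max(1).values * (1 - d)
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Every N steps, re-sync: target_net.load_state_dict(q_net.state_dict())
```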
Model-based method summary
Combine planning and learning to improve sample efficiency.
For high-dimensional environments: use uncertainty modeling and latent models/world models to reduce the dimensionality of the planning problem.
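A tabular Dyna-Q sketch of combining planning and learning (sizes and hyperparameters are illustrative): each real transition also trains a model, and extra planning updates replay simulated transitions from that model, cutting the number of real samples needed.

```python
import random
import numpy as np

n_states, n_actions = 25, 4
Q = np.zeros((n_states, n_actions))
model = {}  # learned deterministic model: (s, a) -> (r, s_next)
alpha, gamma, n_planning = 0.1, 0.99, 10

def dyna_q_update(s, a, r, s2):
    # 1) Direct RL update from the real transition.
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
    # 2) Learn the model from the same transition.
    model[(s, a)] = (r, s2)
    # 3) Planning: extra Q-updates on transitions sampled from the model.
    for _ in range(n_planning):
        (ps, pa), (pr, ps2) = random.choice(list(model.items()))
        Q[ps, pa] += alpha * (pr + gamma * Q[ps2].max() - Q[ps, pa])
```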
Main challenge of DRL?
Managing the combinatorial explosion that occurs when a sequence of decisions is chained together. The right kind of inductive bias can exploit structure in this state space.
3 major challenges
1. Solving larger problems faster (reduce sample complexity with latent models, curriculum learning in self-play, transfer learning, meta-learning, and better exploration through intrinsic motivation)
2. More agents (hierarchical methods, population-based self-play leagues)
3. Human interaction (explainable AI, generalization)