CS7642_Week2 Flashcards
How do we evaluate a learner?
- Value of the returned policy
- Computational complexity (time)
- Experience complexity (i.e., how much data it needs)
What are the 3 “classes” of solution methods for solving RL problems? (bonus: what category do TD methods fall into?)
- Model-based
- Value-based (TD methods fall into this)
- Policy-based
What properties must the learning rate have for RL?
- The sum of the learning rates over time must be infinite
- The sum of the squared learning rates must be finite
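In symbols, with alpha_t the learning rate at step t (a decaying schedule such as alpha_t = 1/t satisfies both conditions, while a constant alpha violates the second):

```latex
\sum_{t=1}^{\infty} \alpha_t = \infty,
\qquad
\sum_{t=1}^{\infty} \alpha_t^{2} < \infty
```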
Name some of the differences between TD(0) and TD(1)
TD(0):
- Slow to propagate information
- High bias, low variance
- Converges to the maximum likelihood estimate (MLE) of the values given the observed data
TD(1):
- Equivalent to MC, samples full trajectories
- Requires full trajectory in order to update
- Low bias, high variance
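A minimal tabular sketch of the two update rules (the dict-based value table `V`, the helper names, and the constants are illustrative assumptions, not course code):

```python
GAMMA = 0.9  # discount factor (assumed for illustration)
ALPHA = 0.1  # learning rate (assumed for illustration)

def td0_update(V, s, r, s_next):
    """TD(0): bootstrap from the current estimate of the next state.
    Updates immediately, but information propagates one step at a time."""
    V[s] += ALPHA * (r + GAMMA * V[s_next] - V[s])

def td1_update(V, trajectory):
    """TD(1) / Monte Carlo-style update: wait for the full trajectory, then
    move each visited state toward the actual observed return (unbiased,
    but high variance). trajectory is a list of (state, reward) pairs."""
    G = 0.0
    for s, r in reversed(trajectory):
        G = r + GAMMA * G          # return observed from state s onward
        V[s] += ALPHA * (G - V[s])
```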
What values of lambda tend to work well (empirically speaking) when used in TD(lambda)?
0.3-0.7
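For context, a sketch of one online TD(lambda) step with accumulating eligibility traces, where `lam` is the interpolation knob (lam = 0 recovers TD(0), lam = 1 behaves like TD(1)/MC); `V` and `E` are assumed to be dicts over the same state set, with `E` initialized to zeros:

```python
def td_lambda_update(V, E, s, r, s_next, lam=0.5, alpha=0.1, gamma=0.9):
    """One online TD(lambda) step with accumulating eligibility traces."""
    delta = r + gamma * V[s_next] - V[s]  # one-step TD error
    E[s] += 1.0                           # bump the trace of the visited state
    for state in V:                       # credit every state by its trace
        V[state] += alpha * delta * E[state]
        E[state] *= gamma * lam           # decay traces toward zero
```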
Does Q-learning always converge? If so, what does it converge to?
Yes. Provided the learning rate conditions above hold and every state-action pair is visited infinitely often, Q-learning converges to the optimal action-value function Q*
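A tabular Q-learning sketch showing the update and an epsilon-greedy behavior policy (the dict keyed by `(state, action)` pairs and the helper names are assumptions for illustration); the exploration is what keeps every state-action pair visited:

```python
import random

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Move Q(s, a) toward the off-policy target r + gamma * max_a' Q(s', a')."""
    best_next = max(Q[(s_next, a_next)] for a_next in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def epsilon_greedy(Q, s, actions, eps=0.1):
    """Behavior policy: explore with probability eps, otherwise act greedily."""
    if random.random() < eps:
        return random.choice(list(actions))
    return max(actions, key=lambda a: Q[(s, a)])
```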
What are contractions and non-expansions?
TODO: Watch lesson 5 on convergence (need to particularly pay attention to stuff on contractions and non-expansion at a conceptual level)
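As a reference while reviewing that lesson, the standard definitions (stated in the max norm, the form used for Bellman operators): an operator B is a contraction if it shrinks distances by some factor gamma < 1, and a non-expansion if it never increases them:

```latex
\begin{align*}
\text{Contraction: } & \|BF - BG\|_{\infty} \le \gamma\,\|F - G\|_{\infty}
    \quad \text{for some fixed } \gamma \in [0,1) \text{ and all } F, G \\
\text{Non-expansion: } & \|BF - BG\|_{\infty} \le \|F - G\|_{\infty}
    \quad \text{for all } F, G
\end{align*}
```

The Bellman operator is a gamma-contraction in the max norm, which is what gives it a unique fixed point (Q*) and lets value iteration / Q-learning converge to it.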
What things are contraction mappings / non-expansions?
- Order statistics (e.g., max and min) are non-expansions
- Fixed convex combinations are non-expansions
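A quick sketch of why the max (an order statistic) is a non-expansion, following the standard argument:

```latex
% For every a: f(a) \le g(a) + |f(a) - g(a)| \le \max_a g(a) + \max_a |f(a) - g(a)|.
% Take the max over a on the left, then swap the roles of f and g:
\Bigl|\max_a f(a) - \max_a g(a)\Bigr| \;\le\; \max_a \bigl|f(a) - g(a)\bigr|
```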