Intelligent Agents Flashcards
1
Q
Can MDPs have known states?
A
Yes
2
Q
When is it best to use Q-Learning?
A
When the optimal action depends on the current state and we do not know the reward of each state beforehand.
3
Q
Is Q-learning model-free?
A
Yes
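The following is a minimal sketch of tabular Q-learning, included to illustrate the two cards above. The chain environment, the function name q_learning, and all parameter values are assumptions made up for this example. Note that the update only uses sampled transitions (state, action, reward, next state) and never consults a transition or reward model, which is what "model-free" refers to.

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical chain of states 0..n_states-1.

    Actions: 0 = left, 1 = right. Entering the last state gives reward 1
    and ends the episode. The agent does not know this in advance.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Epsilon-greedy action choice; ties are broken at random.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            # The environment responds (hypothetical dynamics).
            s_next = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s_next == n_states - 1 else 0.0
            # Q-learning update from the sampled transition only.
            target = r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q

q_table = q_learning()
greedy = [0 if left > right else 1 for left, right in q_table[:-1]]
print(greedy)  # greedy action per non-terminal state
```

The learned table depends on the state, so the greedy action is read off per state rather than being a single best arm.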
4
Q
What are the advantages of Thompson sampling over UCB?
A
It is extensible to contextual bandits
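As a rough illustration of Thompson sampling itself (the non-contextual case), here is a sketch for a Bernoulli bandit with Beta posteriors. The function name thompson_sampling, the arm probabilities, and the uniform Beta(1, 1) prior are assumptions chosen for the example, not part of the deck.

```python
import random

def thompson_sampling(true_probs, rounds=2000, seed=0):
    """Thompson sampling for a Bernoulli bandit (illustrative sketch).

    true_probs are hypothetical success probabilities, hidden from the agent.
    Returns how many times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    wins = [1] * k    # Beta alpha parameters, starting from a Beta(1, 1) prior
    losses = [1] * k  # Beta beta parameters
    pulls = [0] * k
    for _ in range(rounds):
        # Draw one plausible success rate per arm from its posterior.
        samples = [rng.betavariate(wins[i], losses[i]) for i in range(k)]
        # Play the arm whose sampled rate is highest.
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Posterior update for the played arm only.
        wins[arm] += reward
        losses[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

print(thompson_sampling([0.2, 0.5, 0.8]))
```

One way to read the card's claim: the posterior over arm rewards can be swapped for a posterior over the parameters of a model that takes a context as input, while the "sample, then act greedily on the sample" loop stays the same.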
5
Q
Why is state factorization important?
A
It allows us to handle the combinatorial explosion of states
6
Q
What does a mixed policy mean?
A
That we assign probabilities to policies rather than choosing a single policy outright.
7
Q
What is needed for a policy to be differentiable?
A
That it is mixed (stochastic), so that the action probabilities change smoothly with the policy parameters
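To make the link between mixed policies and differentiability concrete, here is a small sketch using a softmax policy over action preferences. The softmax parameterisation and the names softmax and grad_log_prob are assumptions for illustration; the deck does not specify a particular policy class. For softmax, the gradient of log pi(a) with respect to the preferences is the one-hot vector for a minus the probability vector.

```python
import math

def softmax(prefs):
    """Turn real-valued action preferences into a mixed policy (probabilities)."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def grad_log_prob(prefs, action):
    """Gradient of log pi(action) with respect to each preference."""
    probs = softmax(prefs)
    return [(1.0 if i == action else 0.0) - probs[i] for i in range(len(prefs))]

prefs = [0.5, -1.0, 2.0]  # hypothetical preferences for three actions
print(softmax(prefs))
print(grad_log_prob(prefs, action=0))
```

A deterministic argmax over the same preferences would jump between actions instead of changing smoothly, so it has no useful gradient with respect to the preferences.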
8
Q
How do policy gradients work?
A