CCC Flashcards
What are the properties of DEC-POMDPs?
Combines elements of game theory and POMDPs
NEXP-Complete
Agents can benefit from communication
The optimal solution balances the cost of communicating against the cost of not communicating
Some algorithms, heuristics, applications are known
What is a DEC-POMDP?
A POMDP with a set of agents taking actions simultaneously, each with its own partial observations, working toward a shared reward
What is inverse reinforcement learning?
An agent uses the environment and observed behavior to infer the reward function that explains that behavior
Briefly describe Maximum Likelihood Inverse RL (MLIRL)
Guess rewards -> Compute policy -> Measure Pr(D|Pi) -> Gradient on R -> Guess rewards
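The loop above can be sketched in a few lines. This is only an illustrative sketch, not a reference implementation: the 3-state chain MDP, the Boltzmann policy with temperature BETA, the finite-difference gradient, and every name below (next_state, compute_policy, mlirl, and so on) are assumptions made for the example.

```python
import math

# Hypothetical 3-state chain: action 0 moves left, action 1 moves right.
N_STATES, N_ACTIONS = 3, 2
GAMMA, BETA = 0.9, 1.0  # discount and Boltzmann temperature (assumed values)

def next_state(s, a):
    return max(s - 1, 0) if a == 0 else min(s + 1, N_STATES - 1)

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def compute_policy(rewards, iters=60):
    """Compute policy: Boltzmann pi(a|s) from per-state rewards via soft value iteration."""
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(iters):
        pi = [softmax([BETA * x for x in row]) for row in q]
        v = [sum(p * x for p, x in zip(pi[s], q[s])) for s in range(N_STATES)]
        # Reward is collected in the state the agent arrives at.
        q = [[rewards[next_state(s, a)] + GAMMA * v[next_state(s, a)]
              for a in range(N_ACTIONS)] for s in range(N_STATES)]
    return [softmax([BETA * x for x in row]) for row in q]

def log_likelihood(rewards, demos):
    """Measure log Pr(D|Pi) for demonstrations given as (state, action) pairs."""
    pi = compute_policy(rewards)
    return sum(math.log(pi[s][a]) for s, a in demos)

def mlirl(demos, steps=100, lr=0.1, eps=1e-4):
    rewards = [0.0] * N_STATES  # guess rewards
    for _ in range(steps):
        grad = []
        for i in range(N_STATES):
            up, down = rewards[:], rewards[:]
            up[i] += eps
            down[i] -= eps
            # Gradient on R, approximated by central finite differences.
            grad.append((log_likelihood(up, demos) - log_likelihood(down, demos)) / (2 * eps))
        rewards = [r + lr * g for r, g in zip(rewards, grad)]  # new guess
    return rewards
```

For example, mlirl([(0, 1), (1, 1), (2, 1)]) takes demonstrations that always move right and should end with a higher reward for state 2 than for state 0.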
What is the general case (multinomial) probability of an action being optimal in policy shaping?
Pr(a|d_a) = C^(delta_a)/[C^(delta_a) + (1-C)^(delta_a)]
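A quick way to get a feel for this formula is to evaluate it. The sketch below assumes C is the probability that a feedback label is correct (0 < C < 1) and delta_a is the number of positive labels minus the number of negative labels for action a; the function name prob_optimal is made up for the example.

```python
def prob_optimal(c, delta):
    """Pr(a|d_a) = C^delta / (C^delta + (1-C)^delta).

    Assumes 0 < c < 1 and delta = (# positive labels) - (# negative labels).
    """
    return c ** delta / (c ** delta + (1 - c) ** delta)

print(prob_optimal(0.8, 2))   # about 0.94: two net positive labels
print(prob_optimal(0.8, 0))   # 0.5: no net evidence either way
print(prob_optimal(0.8, -2))  # about 0.06: two net negative labels
```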
What is the probability of an action being optimal in policy shaping when only one action can be optimal (multi-binomial)?
Pr(a|d_a) ∝ C^(delta_a) * (1-C)^[Sum_(j!=a) delta_j]
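Because this expression is an unnormalized weight rather than a probability, it has to be normalized over the actions before use. The sketch below assumes the sum in the exponent runs over all other actions j != a in the same state, and that 0 < C < 1; the function name single_optimal_distribution is hypothetical.

```python
def single_optimal_distribution(c, deltas):
    """Normalize C^(delta_a) * (1-C)^(sum of the other deltas) over all actions.

    deltas[a] is the net feedback (positive minus negative labels) for action a.
    """
    total = sum(deltas)
    # total - d is the sum of delta_j over every other action j != a.
    weights = [c ** d * (1 - c) ** (total - d) for d in deltas]
    z = sum(weights)
    return [w / z for w in weights]

probs = single_optimal_distribution(0.8, [2, -1, 0])
print(probs)  # the first action, with the most positive feedback, gets most of the mass
```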
How do you represent trajectories as an MDP?
States: partial sequences
Actions: story actions
Model: player model
Rewards: author evaluations