CCC Flashcards

Question 1

What are the properties of DEC-POMDPs?

Answer

Combines elements of game theory and POMDPs
NEXP-complete (for finite horizons)
Agents can benefit from communication
The optimal solution balances the cost of communicating with the cost of not communicating
Some algorithms, heuristics, and applications are known

Question 2

What is a DEC-POMDP?

Answer

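A POMDP with a set of agents taking actions simultaneously, each choosing its action from its own local observations, with a single reward shared by all agents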
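
To make the definition concrete, here is a minimal Python sketch of a DEC-POMDP with a single simulation step. The class and field names, the two-agent target-matching example, and the 0.8-accurate sensor are all hypothetical, and observations are simplified to be drawn independently per agent from the next state. It shows the three defining traits: one joint action per step, one shared reward, and a private observation for each agent.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

JointAction = Tuple[str, ...]


@dataclass
class DecPOMDP:
    """Sketch of a DEC-POMDP: a POMDP driven by the joint action of several agents."""
    agents: List[str]
    states: List[str]
    actions: Dict[str, List[str]]  # each agent's own action set A_i
    transition: Callable[[str, JointAction], Dict[str, float]]  # T(s' | s, joint action)
    reward: Callable[[str, JointAction], float]  # ONE reward shared by all agents
    observe: Callable[[str, str, random.Random], str]  # agent i's private view of s' (simplified)

    def step(self, state: str, joint_action: JointAction, rng: random.Random):
        # All agents act at the same time: exactly one action per agent.
        assert len(joint_action) == len(self.agents)
        for agent, a in zip(self.agents, joint_action):
            assert a in self.actions[agent]
        r = self.reward(state, joint_action)
        dist = self.transition(state, joint_action)
        nxt = rng.choices(list(dist), weights=list(dist.values()))[0]
        # Each agent receives only its own observation, never the true state.
        obs = tuple(self.observe(agent, nxt, rng) for agent in self.agents)
        return nxt, obs, r


# Hypothetical example: both agents must pick the hidden target ("L" or "R").
def noisy_view(agent: str, s: str, rng: random.Random) -> str:
    # Assumed sensor: correct with probability 0.8, independently per agent.
    return s if rng.random() < 0.8 else ("R" if s == "L" else "L")


game = DecPOMDP(
    agents=["alice", "bob"],
    states=["L", "R"],
    actions={"alice": ["L", "R"], "bob": ["L", "R"]},
    transition=lambda s, ja: {"L": 0.5, "R": 0.5},  # target is re-drawn uniformly
    reward=lambda s, ja: 1.0 if all(a == s for a in ja) else 0.0,
    observe=noisy_view,
)

rng = random.Random(0)
next_state, obs, r = game.step("L", ("L", "L"), rng)
```

Both agents chose the true target, so the shared reward r is 1.0; if either agent had chosen differently, both would receive 0.0.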

Question 3

What is inverse reinforcement learning?

Answer

Given the environment (an MDP without a reward function) and observed behavior, an agent infers a reward function that explains that behavior

Question 4

Briefly describe Maximum Likelihood Inverse RL (MLIRL)

Answer

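Guess rewards R -> compute the policy pi -> measure Pr(D|pi), the likelihood of the demonstrations D under pi -> take a gradient step on R to increase Pr(D|pi) -> repeat with the new guess for R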
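
A minimal sketch of this loop on a hypothetical 3-state chain MDP, assuming a Boltzmann (softmax) policy over Q-values so that Pr(D|pi) changes smoothly with R. For brevity the gradient on R is estimated by finite differences rather than derived analytically; the MDP, the demonstrations, and the constants (GAMMA, BETA, learning rate) are illustrative assumptions.

```python
import math

# Hypothetical 3-state chain MDP: action 0 moves left, action 1 moves right.
N_STATES, N_ACTIONS = 3, 2
GAMMA, BETA = 0.9, 2.0  # discount and Boltzmann inverse temperature (assumed values)


def next_state(s: int, a: int) -> int:
    return max(s - 1, 0) if a == 0 else min(s + 1, N_STATES - 1)


def boltzmann_policy(theta, iters=100):
    """Softmax policy over Q-values, with reward R(s) = theta[s]."""
    v = [0.0] * N_STATES
    for _ in range(iters):
        q = [[theta[s] + GAMMA * v[next_state(s, a)] for a in range(N_ACTIONS)]
             for s in range(N_STATES)]
        pi = []
        for s in range(N_STATES):
            m = max(q[s])  # subtract the max for numerical stability
            w = [math.exp(BETA * (x - m)) for x in q[s]]
            z = sum(w)
            pi.append([x / z for x in w])
        # Boltzmann backup: state value = expected Q under the softmax policy
        v = [sum(pi[s][a] * q[s][a] for a in range(N_ACTIONS)) for s in range(N_STATES)]
    return pi


def log_likelihood(theta, demos):
    """log Pr(D | pi): sum of log-probabilities of the demonstrated (s, a) pairs."""
    pi = boltzmann_policy(theta)
    return sum(math.log(pi[s][a]) for s, a in demos)


def mlirl(demos, steps=200, lr=0.1, eps=1e-4):
    theta = [0.0] * N_STATES  # 1. guess rewards
    for _ in range(steps):
        base = log_likelihood(theta, demos)  # 2-3. compute policy, measure Pr(D | pi)
        grad = []
        for i in range(N_STATES):  # 4. gradient on R (finite differences here)
            bumped = theta[:]
            bumped[i] += eps
            grad.append((log_likelihood(bumped, demos) - base) / eps)
        theta = [t + lr * g for t, g in zip(theta, grad)]  # 5. new reward guess
    return theta


demos = [(0, 1), (1, 1), (2, 1)]  # hypothetical expert: always moves right
theta = mlirl(demos)
```

Because the expert always moves right, the learned reward should increase from state 0 to state 2. Adding the same constant to every reward leaves the softmax policy unchanged, so only reward differences can be recovered.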

Question 5

In policy shaping, what is the probability of an action being optimal in the general (multinomial) case?

Answer

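Pr(a|delta_a) = C^(delta_a) / [C^(delta_a) + (1-C)^(delta_a)]
where C is the probability that a feedback label is consistent with the optimal policy and delta_a = (# of "right" labels) - (# of "wrong" labels) for action a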
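
A short Python check of this formula, assuming C is strictly between 0 and 1 (at C = 1 a negative delta_a would divide by zero). The function name prob_optimal is hypothetical.

```python
def prob_optimal(delta_a: int, C: float) -> float:
    """Pr(a is optimal | delta_a) = C^delta_a / (C^delta_a + (1 - C)^delta_a)."""
    num = C ** delta_a
    return num / (num + (1 - C) ** delta_a)


p_two = prob_optimal(2, 0.8)    # two more "right" than "wrong" labels
p_zero = prob_optimal(0, 0.8)   # no net feedback
p_neg = prob_optimal(-1, 0.8)   # one more "wrong" than "right" label
```

With C = 0.8: delta_a = 2 gives 0.64/0.68 ≈ 0.94, delta_a = 0 gives 0.5, and delta_a = -1 gives 1.25/6.25 = 0.2. Dividing numerator and denominator by C^(delta_a) shows the estimate equals 1 / [1 + ((1-C)/C)^(delta_a)], so it depends only on the net count delta_a.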

Question 6

In policy shaping, what is the probability of an action being optimal when only one action can be optimal (multi-binomial case)?

Answer

Pr(a|delta) ∝ C^(delta_a) * (1-C)^[Sum_(j!=a) delta_j], normalized over all actions, where delta is the vector of feedback counts for every action
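
A Python sketch of the single-optimal-action case, assuming C is the feedback consistency (strictly between 0 and 1) and delta_a is the net right-minus-wrong label count for action a; optimal_action_distribution is a hypothetical name. It uses Sum_(j!=a) delta_j = (sum of all deltas) - delta_a and then normalizes so the probabilities over actions sum to 1.

```python
def optimal_action_distribution(deltas, C):
    """Pr(a is the single optimal action | all deltas), normalized over actions."""
    total = sum(deltas)
    # Sum over j != a of delta_j equals total - delta_a.
    scores = [C ** d * (1 - C) ** (total - d) for d in deltas]
    z = sum(scores)
    return [s / z for s in scores]


probs = optimal_action_distribution([2, 0, -1], C=0.8)
```

With C = 0.8 and deltas [2, 0, -1], the unnormalized scores are 3.2, 0.2 and 0.05, giving roughly [0.93, 0.06, 0.01]. Unlike the per-action formula, negative feedback on one action here raises the probability of the others.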

Question 7

How do you represent trajectories as an MDP?

Answer

States: partial sequences
Actions: story actions
Model: player model
Rewards: author evaluations
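
A minimal Python sketch of this mapping, with every story detail hypothetical: three plot points, story actions that hint toward a plot point, an assumed player model that triples the chance of the hinted plot point, and an author evaluation that rewards finding the key before opening the door. States are tuples of plot points seen so far, reward is given only on complete trajectories, and values are computed by exhaustive recursion, which only works for tiny stories.

```python
from functools import lru_cache

# Hypothetical story world: the player triggers plot points in some order.
PLOT_POINTS = ("find_key", "meet_guard", "open_door")
# Hypothetical story actions and the plot point each one nudges the player toward.
STORY_ACTIONS = {"hint_key": "find_key", "hint_guard": "meet_guard", "do_nothing": None}


def player_model(state, action):
    """Assumed player model: Pr(next plot point | partial sequence, story action).
    Untriggered plot points are equally likely, but a hinted one gets 3x the weight."""
    remaining = [p for p in PLOT_POINTS if p not in state]
    weights = [3.0 if STORY_ACTIONS[action] == p else 1.0 for p in remaining]
    total = sum(weights)
    return {p: w / total for p, w in zip(remaining, weights)}


def author_evaluation(sequence):
    """Assumed author preference: the key should be found before the door is opened."""
    return 1.0 if sequence.index("find_key") < sequence.index("open_door") else 0.0


def expected_value(state, action):
    # Expected value of taking a story action in a partial sequence.
    return sum(pr * value(state + (p,)) for p, pr in player_model(state, action).items())


@lru_cache(maxsize=None)
def value(state):
    """Best expected author evaluation from a partial sequence (the MDP state)."""
    if len(state) == len(PLOT_POINTS):  # complete trajectory: reward = author evaluation
        return author_evaluation(state)
    return max(expected_value(state, a) for a in STORY_ACTIONS)


def best_action(state):
    return max(STORY_ACTIONS, key=lambda a: expected_value(state, a))


start_value = value(())
start_action = best_action(())
```

Here hinting at the key first gives the best expected author evaluation (about 0.75), versus 0.5 if no hints are ever given, since the key and door then come in either order with equal chance.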