Lecture 17 - Reinforcement Learning Flashcards
Why is reinforcement learning an important part of AI?
Almost all “natural learning” is done by reinforcement
e.g. learning to read, playing chess, etc.
What are the properties of reinforcement learning?
Agent is learning to choose a sequence of actions
The ultimate consequences of an action may not be apparent until the end of the sequence
When a reward is achieved it may not be due to the most recent action.
No predefined set of training samples/examples
What is the credit assignment problem?
When a reward is achieved it may not be due to the most recent action, but one performed earlier in the sequence.
Describe the components of a Markov Decision Process
Agent operates in a domain represented as a set of distinct states, S
Agent has a set of actions it can perform, A
Time advances in discrete steps
At time t the agent knows the current state s_t and must select an action to perform
When action a_t is performed the agent receives a reward r_t, which may be positive, negative or zero. The reward depends on the current state and action, so it can be determined by a reward function R: r_t = R(s_t, a_t)
The new state s_(t+1) depends on the last state and action, so it can be determined by a transition function T: s_(t+1) = T(s_t, a_t)
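To make these components concrete, here is a minimal sketch in Python. The three-state chain, the reward values and the move table are invented for illustration; they are not from the lecture.

```python
S = ["s0", "s1", "s2"]   # the set of distinct states S
A = ["left", "right"]    # the set of actions A

def R(s, a):
    """Reward function: r_t = R(s_t, a_t)."""
    return 100 if (s == "s1" and a == "right") else 0

def T(s, a):
    """Deterministic transition function: s_(t+1) = T(s_t, a_t)."""
    table = {("s0", "right"): "s1", ("s1", "right"): "s2",
             ("s1", "left"): "s0", ("s2", "left"): "s1"}
    return table.get((s, a), s)   # undefined moves leave the state unchanged

# One discrete time step:
s_t = "s0"            # the agent observes the current state,
a_t = "right"         # selects an action,
r_t = R(s_t, a_t)     # receives a reward (here 0),
s_t1 = T(s_t, a_t)    # and moves to the new state ("s1")
```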
What does an agent in a Markov Decision Process acquire?
A control policy, i.e. a function that determines the best action to take in the current state
Describe the “immediate reward” strategy for determining the best action in a Markov Decision Process, and why it is/isn’t usually used
Choosing the action with the highest immediate reward
Produces a good short term payoff but might not be optimal in the long run
Describe the “total payoff” strategy for determining the best action in a Markov Decision Process, and why it is/isn’t usually used
Maximise the total payoff by choosing the sequence of actions whose rewards sum to the largest total
Not realistic, because it treats a reward in the very distant future as just as valuable as one received immediately, which is not usually the case
Describe the “discounted cumulative reward” strategy for determining the best action in a Markov Decision Process, and why it is/isn’t usually used
Same as total payoff, except each reward is scaled down by a discount factor for every step into the future, so distant rewards are worth less than more immediate ones
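A small sketch of the difference, assuming a discount factor gamma in [0, 1); the value 0.9 and the reward sequence are illustrative choices, not from the lecture:

```python
def discounted_return(rewards, gamma=0.9):
    """Discounted cumulative reward: r_t + gamma*r_(t+1) + gamma^2*r_(t+2) + ..."""
    return sum((gamma ** i) * r for i, r in enumerate(rewards))

rewards = [0, 0, 100]               # illustrative reward sequence
print(sum(rewards))                 # total payoff: 100
print(discounted_return(rewards))   # discounted: 0.9**2 * 100 = 81.0
```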
What is the learning task in Markov Decision Processes?
To discover the optimal control policy, i.e. the best action for each state
If the agent in a Markov Decision Process knows the transition function, the reward function and the discounted value V* of each state, then V* can be used as
an evaluation function for actions
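A sketch of what that evaluation looks like, assuming V_star is an already-computed table mapping each state to its discounted value (the function name and GAMMA value are hypothetical). The agent looks one step ahead through T and R:

```python
GAMMA = 0.9   # assumed discount factor

def best_action(s, actions, T, R, V_star):
    """One-step lookahead: argmax over a of [ R(s, a) + gamma * V*(T(s, a)) ]."""
    return max(actions, key=lambda a: R(s, a) + GAMMA * V_star[T(s, a)])
```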
If an agent in a Markov decision process does not know T or R, no form of evaluation function that requires _____________ is possible
looking ahead
What is the Q function?
An evaluation function of both state and action that estimates the total discounted payoff from choosing a particular action in a particular state
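A sketch of the idea, assuming a deterministic MDP and a Q table keyed by (state, action) pairs. The one-step update shown is the standard deterministic Q-learning rule, stated here as an assumption rather than as this lecture's exact notation; note that choosing an action needs no lookahead through T or R, only the stored Q values:

```python
GAMMA = 0.9

def choose(Q, s, actions):
    """Action selection needs no lookahead: compare stored Q values only."""
    return max(actions, key=lambda a: Q[(s, a)])

def q_update(Q, s, a, r, s_next, actions):
    """Deterministic one-step update: Q(s, a) <- r + gamma * max over a' of Q(s', a')."""
    Q[(s, a)] = r + GAMMA * max(Q[(s_next, a2)] for a2 in actions)
```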
What are two possible Action Selection strategies in Markov Decision Processes?
Uniform Random Selection
Select Highest Expected Cumulative Reward
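Minimal sketches of the two strategies, assuming the same Q table as in the previous snippet; the function names are illustrative:

```python
import random

def uniform_random_selection(Q, s, actions):
    """Explore: every action is equally likely, so every transition gets tried."""
    return random.choice(actions)

def highest_expected_reward(Q, s, actions):
    """Exploit: take the action with the largest current Q estimate."""
    return max(actions, key=lambda a: Q[(s, a)])
```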
What is the advantage and disadvantage of using uniform random selection in Markov Decision Processes?
Advantage: Will explore the entire state space, and hence satisfies the conditions of the convergence theorem
Disadvantage: May spend a great deal of time learning the value of transitions that are not optimal
What is the advantage and disadvantage of selecting the highest expected cumulative reward as the action selection strategy in Markov Decision Processes?
Advantage: Concentrates resources on apparently useful transitions
Disadvantage: May ignore even better pathways that haven’t been explored, and does not satisfy the conditions of the convergence theorem