Reinforcement Learning Flashcards

Question 1

Q

what is reinforcement learning?

Answer

A

an agent acts in an environment, is rewarded depending on its actions, and learns to choose actions that maximize its total reward

Question 2

Q

stochastic environment

Answer

A

no guarantee of the same outcome when taking the same action in the same state
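Here is a minimal Python sketch of one way an environment can be stochastic. It assumes a grid-world-style "slip" model: the chosen direction succeeds 80% of the time, and otherwise the agent slips sideways. The probabilities and the names `SLIP` and `stochastic_outcome` are hypothetical and are not part of these notes.

```python
import random

# Hypothetical slip model: the intended direction succeeds with probability 0.8;
# otherwise the agent moves to one of the two perpendicular directions (0.1 each).
SLIP = {"up": ("left", "right"), "down": ("right", "left"),
        "left": ("down", "up"), "right": ("up", "down")}

def stochastic_outcome(action, rng):
    """Return the direction actually taken when `action` is chosen."""
    roll = rng.random()
    if roll < 0.8:
        return action
    elif roll < 0.9:
        return SLIP[action][0]
    return SLIP[action][1]

# The same action in the same state can produce different outcomes.
rng = random.Random(0)
outcomes = [stochastic_outcome("up", rng) for _ in range(10)]
print(outcomes)
```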

Question 3

Q

what is the agent given in RL?

Answer

A

initial state
set of possible actions
reward associated with each state
not given: the transition model (how actions change the state)

Question 4

Q

state

Answer

A

state of the environment as perceived by the agent

Question 5

Q

policy

Answer

A

strategy that defines the behavior of an agent by mapping states to actions

guides the agent on which action to take in each state so as to maximize cumulative reward over time

“decision-making function” of the agent

Question 6

Q

reward

Answer

A

feedback signal that the agent receives from the environment after taking an action
a quantified (numerical) value

Question 7

Q

utility

Answer

A

subjective value the agent places on being in a given state, capturing both the immediate reward and future (discounted) rewards

Question 8

Q

discount factor

Answer

A

value between 0 and 1 that determines how much importance is placed on future rewards
0: the agent only cares about immediate rewards
closer to 1: the agent considers future rewards almost as important as the current reward
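To see how the discount factor changes the value of a delayed reward, here is a small Python sketch that weights the reward received k steps ahead by γ^k. The reward sequence and the function name `discounted_return` are hypothetical.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where the reward k steps ahead is weighted by gamma**k."""
    total = 0.0
    for k, r in enumerate(rewards):
        total += (gamma ** k) * r
    return total

rewards = [0, 0, 0, 10]  # hypothetical episode: a big reward only at the end
for gamma in (0.0, 0.5, 0.9, 1.0):
    print(gamma, discounted_return(rewards, gamma))
```

For the rewards [0, 0, 0, 10] this gives 0.0 at γ = 0, 1.25 at γ = 0.5, about 7.29 at γ = 0.9 and 10.0 at γ = 1: the closer γ is to 1, the more the future reward counts.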

Question 9

Q

what is the utility of the terminal state?

Answer

A

just the reward of that state, since no future rewards follow it

Question 10

Q

utility of a state formula

Answer

A

$U(s_t) = r(s_t) + \gamma\,U(s_{t+1})$
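The formula can be applied backwards along one episode. Start from the terminal state, whose utility is just its reward, and work towards the start. This is a minimal Python sketch. It assumes the rewards r(s_t) are listed per state in visiting order and that the last state is terminal. The reward values and the name `utilities` are hypothetical.

```python
def utilities(rewards, gamma):
    """Utilities U(s_t) for every state of one episode, computed backwards.

    rewards[t] is r(s_t); the last state is terminal, so its utility is just
    its reward. Earlier states use U(s_t) = r(s_t) + gamma * U(s_{t+1}).
    """
    U = [0.0] * len(rewards)
    U[-1] = rewards[-1]                      # terminal state: utility = reward
    for t in range(len(rewards) - 2, -1, -1):
        U[t] = rewards[t] + gamma * U[t + 1]
    return U

print(utilities([-1, -1, -1, 10], gamma=0.9))
```

For rewards [-1, -1, -1, 10] and γ = 0.9 the utilities come out as roughly 4.58, 6.2, 8.0 and 10.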

Question 11

Q

q-value

Answer

A

expected utility of taking a particular action in a state

Question 12

Q

q table

Answer

A

table of q values for every possible combination of state and action (one row per state, one column per action)
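One possible in-memory layout is a dict of dicts, with one inner dict (row) per state mapping each action to its q value. The state names s0, s1, s2 and the four grid actions below are hypothetical.

```python
# Hypothetical states and actions for a tiny grid world.
states = ["s0", "s1", "s2"]
actions = ["up", "down", "left", "right"]

# One row per state, one column per action; every q value starts at 0.
q_table = {s: {a: 0.0 for a in actions} for s in states}

q_table["s1"]["right"] = 2.5   # e.g. the value after some learning
print(q_table["s1"])
```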

Question 13

Q

q table is used as the agent's

Question 14

Q

how is the q table used to choose an action?

Answer

A

choose the action with the largest q value for the current state; if the q values are accurate, this maximizes the expected reward over time
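This is a minimal sketch of greedy selection. It assumes one row of the q table is stored as a Python dict mapping each action name to its q value, which is a hypothetical layout. Ties go to the first action in the row, because `max` returns the first maximal element.

```python
def greedy_action(q_row):
    """Return the action with the largest q value in one row of the q table."""
    return max(q_row, key=q_row.get)

row = {"up": 0.1, "down": -0.4, "left": 0.0, "right": 0.7}
print(greedy_action(row))
```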

Question 15

Q

softmax

Answer

A

a row of the q table (the q values for one state) can be transformed into a probability distribution over actions using the softmax function
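Here is a minimal Python sketch of turning one row into action probabilities and then sampling an action from them. It uses plain softmax, which is equivalent to a temperature of 1. The dict-per-row layout and the names `softmax` and `sample_action` are hypothetical.

```python
import math
import random

def softmax(q_row):
    """Turn one row of the q table into a probability for each action."""
    m = max(q_row.values())            # subtract the max for numerical stability
    exps = {a: math.exp(q - m) for a, q in q_row.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}

def sample_action(q_row, rng):
    """Pick an action at random, weighted by its softmax probability."""
    probs = softmax(q_row)
    acts = list(probs)
    return rng.choices(acts, weights=[probs[a] for a in acts], k=1)[0]

row = {"up": 1.0, "down": 0.0, "left": 0.0, "right": 2.0}
print(softmax(row))
print(sample_action(row, random.Random(0)))
```

Higher-valued actions are picked more often, but every action keeps some chance of being tried.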

Question 16

Q

agent uses softmax to

Answer

A

balance exploration and exploitation

Question 17

Q

softmax is an alternative to just selecting

Answer

A

the action with the highest q value

Question 18

Q

Temperature parameter

Answer

A

T > 0
higher T encourages exploration by making the action probabilities more even
lower T encourages exploitation, as the action probabilities favour actions with higher q values
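This is a sketch of softmax with temperature. It uses the usual form $P(a \mid s) = \frac{e^{Q(s,a)/T}}{\sum_{a'} e^{Q(s,a')/T}}$ and compares a low, a medium and a high T on the same row. The row values and the name `softmax_with_temperature` are hypothetical.

```python
import math

def softmax_with_temperature(q_row, T):
    """P(a|s) = exp(Q(s,a)/T) / sum over a' of exp(Q(s,a')/T), with T > 0."""
    if T <= 0:
        raise ValueError("temperature T must be > 0")
    m = max(q_row.values())            # shifting by the max does not change the result
    exps = {a: math.exp((q - m) / T) for a, q in q_row.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}

row = {"up": 1.0, "down": 0.0, "left": 0.0, "right": 2.0}
for T in (0.1, 1.0, 10.0):
    probs = softmax_with_temperature(row, T)
    print(T, {a: round(p, 3) for a, p in probs.items()})
```

At T = 0.1 almost all the probability goes to `right` (exploitation). At T = 10 the four actions are close to equally likely (exploration).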

Question 19

Q

exploration

Answer

A

the agent needs to explore the environment (try actions other than the current best) to find out whether a better strategy exists

Question 20

Q

exploitation

Answer

A

the agent needs to exploit what it has already learnt about the environment to make good choices
leverages the agent’s current knowledge

Question 21

Q

temporal difference learning

Answer

A

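model-free learning technique used in reinforcement learning to predict the total reward expected in the future; it is sometimes described as unsupervised because no labelled target values are given, and it learns from the difference between successive predictions (the TD error)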
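Here is a single TD update worked through in Python. It assumes the Q-learning form of the target, r + γ·max over a' of Q(s', a'), which is one common TD rule. The states, rewards, parameter values and the name `td_update` are hypothetical.

```python
def td_update(q_table, s, a, r, s_next, alpha, gamma, terminal=False):
    """One temporal-difference update of q_table[s][a]; returns the TD error.

    Assumes the Q-learning target r + gamma * max_a' Q(s', a');
    a terminal next state contributes no future reward.
    """
    predicted = q_table[s][a]
    target = r if terminal else r + gamma * max(q_table[s_next].values())
    td_error = target - predicted
    q_table[s][a] = predicted + alpha * td_error
    return td_error

q = {"s0": {"left": 0.0, "right": 0.0},
     "s1": {"left": 0.0, "right": 4.0}}
delta = td_update(q, "s0", "right", r=1.0, s_next="s1", alpha=0.5, gamma=0.9)
print(delta, q["s0"]["right"])
```

The TD error is 1 + 0.9 × 4 − 0 = 4.6. With α = 0.5 the entry moves halfway towards the target, to 2.3.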

Question 22

Q

4 steps for temporal difference learning

Answer

A

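1) choose T (the softmax temperature), the discount rate γ and the desired learning rate α
2) set all values in the q table to 0
3) for each episode:
   start at the initial state
   choose an action using the policy (e.g. softmax over the current state's row of the q table)
   take the action and observe the reward r and the next state
   update the q table entry for that state-action pair using the temporal difference update rule (based on the TD error: the difference between the current prediction and the observed reward plus the discounted estimate for the next state)
   continue from the next state until a terminal state is reached
4) repeat for multiple episodes until the q values converge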
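The four steps above can be sketched as a small Python program. Everything concrete in it is an assumption for illustration. The environment is a hypothetical corridor with states 0 to 4: the agent starts at 0, gets +10 for reaching state 4 and -1 for any other move. The parameter values are also chosen for illustration. The TD target uses the Q-learning form; SARSA would instead use the q value of the next action actually chosen. Finally, a fixed episode budget stands in for a real convergence test.

```python
import math
import random

# Hypothetical environment: a corridor of states 0..4, start at 0, state 4 terminal.
N_STATES, START, GOAL = 5, 0, 4
ACTIONS = ("left", "right")

def step(state, action):
    """Deterministic move; reward +10 for reaching the goal, -1 otherwise."""
    nxt = min(state + 1, GOAL) if action == "right" else max(state - 1, 0)
    reward = 10.0 if nxt == GOAL else -1.0
    return nxt, reward

def softmax_action(q_row, T, rng):
    """Sample an action with probability proportional to exp(q / T)."""
    m = max(q_row.values())
    weights = [math.exp((q_row[a] - m) / T) for a in ACTIONS]
    return rng.choices(ACTIONS, weights=weights, k=1)[0]

def td_learning(T=1.0, gamma=0.9, alpha=0.5, n_episodes=500, seed=0):
    rng = random.Random(seed)
    # Steps 1-2: parameters come from the arguments; every q value starts at 0.
    q = {s: {a: 0.0 for a in ACTIONS} for s in range(N_STATES)}
    for _ in range(n_episodes):          # step 3: one pass per episode
        s = START
        while s != GOAL:                 # run until the terminal state
            a = softmax_action(q[s], T, rng)
            s_next, r = step(s, a)
            # Assumed Q-learning target: r + gamma * max_a' Q(s', a');
            # the terminal state adds no future reward.
            target = r if s_next == GOAL else r + gamma * max(q[s_next].values())
            q[s][a] += alpha * (target - q[s][a])   # alpha * TD error
            s = s_next
    return q                             # step 4: fixed episode budget

q = td_learning()
print({s: max(q[s], key=q[s].get) for s in range(GOAL)})
```

After training, the greedy action in every non-terminal state should be `right`. The learned q values approach the true ones, for example Q(3, right) = 10 and Q(2, right) = -1 + 0.9 × 10 = 8.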

(22 cards)