Reinforcement Learning Flashcards

1
Q

what is reinforcement learning?

A

an agent acts in an environment and is rewarded depending on its actions

2
Q

stochastic environment

A

taking the same action in the same state is not guaranteed to give the same outcome

3
Q

what is the agent given in RL?

A

the initial state
the set of possible actions
the reward associated with each state
the transition model is not given

4
Q

state

A

state of the environment as perceived by the agent

5
Q

policy

A

strategy that defines the behavior of an agent by mapping states to actions

guides the agent on what actions to take in each state to maximize cumulative rewards over time

“decision-making function” of the agent

6
Q

reward

A

feedback signal that an agent receives from the environment after taking an action
a quantified (numeric) value

7
Q

utility

A

subjective value the agent places on being in a given state, capturing both the immediate reward and future (discounted) rewards

8
Q

discount factor

A

value between 0 and 1 that determines how much importance is placed on future rewards
0 means the agent only cares about immediate rewards
the closer to 1, the more the agent treats future rewards as almost as important as the current reward
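
A minimal sketch of how the discount factor weights future rewards, assuming a reward k steps ahead is multiplied by the discount factor raised to the power k. The function name `discounted_return` and the reward values are hypothetical, chosen only for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards, each weighted by gamma**k for a reward k steps ahead."""
    total = 0.0
    for k, r in enumerate(rewards):
        total += (gamma ** k) * r
    return total


rewards = [1.0, 1.0, 1.0]  # hypothetical rewards at steps t, t+1, t+2

print(discounted_return(rewards, 0.0))  # only the immediate reward counts
print(discounted_return(rewards, 0.5))  # 1 + 0.5 + 0.25
print(discounted_return(rewards, 1.0))  # all three rewards count equally
```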

9
Q

what is the utility of the terminal state?

A

just the reward of that state

10
Q

utility of a state formula

A

$U(s_t) = r_{s_t} + \gamma U(s_{t+1})$
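
The formula above can be applied backwards along a sequence of states, starting from the terminal state, whose utility is just its reward. This is a rough sketch under that assumption. The function name `utilities`, the reward values and the discount factor of 0.9 are hypothetical.

```python
def utilities(rewards, gamma):
    """Return U(s_t) for each state in a sequence that ends in a terminal state."""
    utils = [0.0] * len(rewards)
    utils[-1] = rewards[-1]  # terminal state: utility is just its reward
    # work backwards: U(s_t) = r(s_t) + gamma * U(s_{t+1})
    for t in range(len(rewards) - 2, -1, -1):
        utils[t] = rewards[t] + gamma * utils[t + 1]
    return utils


# hypothetical rewards for states s_0, s_1, s_2 (s_2 is terminal)
print(utilities([0.0, 0.0, 10.0], gamma=0.9))  # roughly [8.1, 9.0, 10.0]
```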

11
Q

q-value

A

expected utility of taking a particular action in a state

12
Q

q table

A

table of q values for all possible combinations of actions and states

13
Q

the q table is used as the agent's

A

policy

14
Q

how is the q table used to make an action?

A

choose the action with the largest q value in the current state, since it has the maximum expected reward over time
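
A small sketch of greedy action selection from a q table. The table layout (a dict of rows, one per state), the state and action names and the q values are all hypothetical.

```python
# hypothetical q table: one row per state, one q value per action
q_table = {
    "s0": {"left": 0.2, "right": 0.7},
    "s1": {"left": 0.5, "right": 0.1},
}


def greedy_action(q_table, state):
    """Return the action with the largest q value in the given state."""
    row = q_table[state]
    return max(row, key=row.get)


print(greedy_action(q_table, "s0"))  # right
print(greedy_action(q_table, "s1"))  # left
```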

15
Q

softmax

A

a row of the q table can be transformed into a probability distribution over actions using the softmax function
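
Written out, the usual form of the softmax over one row of the q table is given here. The symbols P, Q(s, a) and the sum index b are notation assumed for this note, not taken from the card. T is the temperature, and T = 1 gives the plain softmax.

```latex
P(a \mid s) = \frac{\exp\left(Q(s,a)/T\right)}{\sum_{b} \exp\left(Q(s,b)/T\right)}
```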

16
Q

agent uses softmax to

A

balance exploration and exploitation

17
Q

softmax is alternative to just selecting

A

action with highest q value

18
Q

Temperature parameter

A

T>0
higher T encourages exploration by making the action probabilities more even
lower T encourages exploitation because the action probabilities favour actions with higher q values
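
A minimal sketch of the temperature effect, assuming each q value is divided by T before the softmax is applied. The function names `softmax` and `sample_action` and the q values are hypothetical.

```python
import math
import random


def softmax(q_values, temperature=1.0):
    """Turn a row of q values into action probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    scaled = [q / temperature for q in q_values]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(q_values, temperature=1.0, rng=random):
    """Sample an action index according to the softmax probabilities."""
    probs = softmax(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


row = [1.0, 2.0, 3.0]  # hypothetical q values for three actions

print(softmax(row, temperature=10.0))  # high T: close to uniform
print(softmax(row, temperature=0.1))   # low T: nearly all mass on the best action
print(sample_action(row, temperature=1.0))
```

With a high T the agent picks actions almost at random, and with a low T it behaves almost like always choosing the action with the highest q value.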

19
Q

exploration

A

agent needs to explore environment to find out whether a better strategy exists

20
Q

exploitation

A

agent needs to exploit what it has already learnt about environment to make good choices
leverage agent’s current knowledge

21
Q

temporal difference learning

A

model-free learning technique used in reinforcement learning to predict the total reward expected in the future

22
Q

4 steps for temporal difference learning

A

1) choose the temperature T, the discount rate and the learning rate
2) set all values in the q table to 0
3) for each episode:
   start at the initial state
   choose an action using the policy
   take the action and observe the reward r and the next state
   update the q table entry for the state-action pair using the temporal difference update rule (based on the TD error: the difference between the predicted reward and the actual reward)
   continue from the next state until the episode ends
4) repeat for multiple episodes until the q values converge
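
The steps above can be sketched as follows. The card does not say which temporal difference update rule is meant, so this sketch assumes the Q-learning form (observed reward plus the discounted best q value of the next state). The four-state chain environment, the function names and the parameter values are all hypothetical.

```python
import math
import random

N_STATES = 4       # hypothetical chain: states 0..3, state 3 is terminal
ACTIONS = [0, 1]   # 0 = left, 1 = right
GOAL = N_STATES - 1


def step(state, action):
    """Hypothetical environment: move along the chain, reward 1 on reaching the goal."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return reward, next_state


def softmax_policy(q_row, temperature, rng):
    """Sample an action from the softmax distribution over one q table row."""
    m = max(q_row)
    weights = [math.exp((q - m) / temperature) for q in q_row]
    return rng.choices(ACTIONS, weights=weights, k=1)[0]


def train(episodes=500, temperature=0.5, gamma=0.9, alpha=0.1, seed=0):
    # step 1: T, discount rate (gamma) and learning rate (alpha) are the arguments
    rng = random.Random(seed)
    # step 2: set all values in the q table to 0
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    # steps 3 and 4: run many episodes
    for _ in range(episodes):
        state = 0  # start at the initial state
        while state != GOAL:
            action = softmax_policy(q[state], temperature, rng)
            reward, next_state = step(state, action)
            # assumed Q-learning target: reward plus discounted best next q value
            best_next = 0.0 if next_state == GOAL else max(q[next_state])
            td_error = reward + gamma * best_next - q[state][action]
            q[state][action] += alpha * td_error
            state = next_state
    return q


q_table = train()
for s, row in enumerate(q_table):
    print(s, [round(v, 3) for v in row])
```

In this toy chain the q value for moving right ends up larger than the one for moving left in every non-terminal state, so choosing the action with the largest q value leads to the goal.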