Reinforcement Learning Flashcards
what is reinforcement learning?
an agent acts in an environment and is rewarded depending on its actions
stochastic environment
taking the same action in the same state is not guaranteed to produce the same outcome
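A minimal sketch of a stochastic step, assuming a made-up 1-D world in which the agent sometimes slips and moves the opposite way. The names noisy_step and slip_prob are illustrative, not from any library.

```python
import random

def noisy_step(position, action, rng, slip_prob=0.2):
    """Move by `action` (+1 or -1); with probability slip_prob, move the opposite way."""
    if rng.random() < slip_prob:
        action = -action  # the environment overrides the intended move
    return position + action

rng = random.Random(0)  # seeded so the run is repeatable
# Same state (0) and same action (+1) every time, yet the outcome can differ.
outcomes = {noisy_step(0, +1, rng) for _ in range(100)}
```

With slip_prob set to 0 the same function becomes deterministic, which is the contrast this card is drawing.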
what is the agent given in rl?
initial state
set of possible actions
reward associated with each state
the transition model is not given
state
state of the environment as perceived by the agent
policy
strategy that defines the behavior of an agent by mapping states to actions
guides the agent on what actions to take in each state to maximize cumulative rewards over time
“decision-making function” of the agent
reward
feedback signal that an agent receives from the environment after taking an action
quantified value
utility
subjective value the agent places on being in a given state, capturing both immediate rewards and future (discounted) rewards
discount factor
value between 0 and 1 that determines how much importance is placed on future rewards
0 means the agent cares only about immediate rewards
closer to 1 means the agent considers future rewards almost as important as the current reward
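A small sketch of how the discount factor weights a reward sequence, assuming the common convention that a reward k steps ahead is multiplied by gamma**k. The function name and the reward values are made up for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards, with the reward k steps ahead weighted by gamma**k."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

rewards = [1.0, 1.0, 1.0]  # hypothetical reward sequence

myopic = discounted_return(rewards, 0.0)      # only the immediate reward counts
farsighted = discounted_return(rewards, 0.9)  # 1 + 0.9 + 0.81
```

Here myopic keeps only the first reward, while farsighted counts the later rewards at slightly reduced weight.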
what is the utility of the terminal state?
just the reward of that state
utility of a state formula
U(s_t) = r_{s_t} + \gamma U(s_{t+1})
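The recursion can be sketched by working backward from the terminal state, whose utility is just its reward. This assumes a single fixed, non-empty path of states with known rewards. The function name and the numbers are hypothetical.

```python
def utilities_along_path(rewards, gamma):
    """Apply U(s_t) = r(s_t) + gamma * U(s_{t+1}) backward along one path."""
    utilities = [0.0] * len(rewards)
    utilities[-1] = rewards[-1]  # terminal state: utility is just its reward
    for t in range(len(rewards) - 2, -1, -1):
        utilities[t] = rewards[t] + gamma * utilities[t + 1]
    return utilities

path_rewards = [-1.0, -1.0, 10.0]  # hypothetical rewards; the last state is terminal
utilities = utilities_along_path(path_rewards, gamma=0.5)
# utilities == [1.0, 4.0, 10.0]
```

Each earlier state inherits half of the next state's utility (gamma = 0.5) plus its own reward.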
q-value
expected utility of taking a particular action in a state
q table
table of q values for all possible combinations of actions and states
q table is used as the agent's
policy
how is the q table used to make an action?
choose the action with the largest q value in the current state, which maximizes expected reward over time
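A minimal sketch of greedy action selection, assuming the q table is stored as a dict of dicts keyed by state and then action. The state names, action names, and q values are invented for illustration.

```python
# Hypothetical q table: q_table[state][action] -> q value
q_table = {
    "s0": {"left": 0.2, "right": 0.7},
    "s1": {"left": 0.5, "right": 0.1},
}

def greedy_action(q_table, state):
    """Return the action with the largest q value in the given state's row."""
    row = q_table[state]
    return max(row, key=row.get)

chosen = greedy_action(q_table, "s0")  # "right"
```

Looking up a row and taking its argmax is what lets the table act as a policy: it maps each state to an action.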
softmax
a row of the q table can be transformed into a probability distribution over actions using the softmax function
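A sketch of softmax applied to one q-table row, assuming the row is a plain list of q values and no temperature parameter is used. Subtracting the maximum before exponentiating is a standard numerical-stability step and does not change the result.

```python
import math

def softmax(q_row):
    """Turn a list of q values into probabilities that sum to 1."""
    m = max(q_row)                           # shift for numerical stability
    exps = [math.exp(q - m) for q in q_row]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])  # hypothetical q values for three actions
```

Actions with larger q values get larger probabilities, but every action keeps a nonzero chance of being picked, for example by sampling with random.choices(actions, weights=probs).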