Class 9 Flashcards
supervised learning
type of learning in which an agent passively observes example input/output pairs provided by a teacher and learns a function that maps inputs to outputs
reinforcement learning
type of learning where an agent interacts with the world and periodically receives rewards; the goal is to maximize the expected sum of rewards
markov decision process
sequential decision problem in a fully observable, stochastic environment with a Markovian transition model and additive rewards; defined by a set of states, the actions available in each state, a transition model, and a reward function
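A minimal sketch of an MDP as plain Python data; every state name, action, probability, and reward below is made up for illustration:

```python
# Toy MDP: the transition model maps (state, action) -> list of (probability, next_state);
# rewards map each state to a number. All values here are illustrative.
mdp = {
    "states": ["s0", "s1"],
    "actions": ["stay", "go"],
    "transitions": {
        ("s0", "stay"): [(0.9, "s0"), (0.1, "s1")],
        ("s0", "go"):   [(0.2, "s0"), (0.8, "s1")],
        ("s1", "stay"): [(1.0, "s1")],
        ("s1", "go"):   [(0.5, "s0"), (0.5, "s1")],
    },
    "rewards": {"s0": 0.0, "s1": 1.0},
    "gamma": 0.9,  # discount factor
}
```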
model based reinforcement learning
type of learning that uses a transition model of the environment to help interpret reward signals and make decisions about how to act
utility function
function that gives the expected sum of (possibly discounted) rewards received from a given state onward
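In standard notation, the utility of state s under policy π with discount factor γ:

```latex
U^{\pi}(s) \;=\; \mathbb{E}\!\left[\,\sum_{t=0}^{\infty} \gamma^{t}\, R(S_t) \;\middle|\; S_0 = s,\ \pi \right]
```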
model free reinforcement learning
type of learning where the agent neither knows nor learns a transition model for the environment
action utility learning
type of learning where the agent learns a quality function (Q-function) that gives the expected total reward if a specific action is taken in a given state – the agent then selects the action with the highest Q-value
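In the same notation, the Q-function and the greedy action choice it induces:

```latex
Q(s,a) \;=\; \mathbb{E}\!\left[\,\sum_{t=0}^{\infty} \gamma^{t}\, R(S_t) \;\middle|\; S_0 = s,\ A_0 = a \right],
\qquad \pi(s) \;=\; \operatorname*{argmax}_{a}\, Q(s,a)
```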
reflex agent
agent whose policy maps states directly to actions, typically found by policy search
passive reinforcement learning
type of learning where the agent’s policy is fixed and the task is to learn the utilities of states
passive learning agent
agent that uses a fixed policy to determine its actions and tries to learn a utility function; it does not know the state transition model or the reward function
direct utility estimation
method that estimates a state's utility as the average, over observed trials, of the total reward from that state onward (the reward-to-go)
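A minimal Python sketch, assuming each trial is recorded as a list of (state, reward) pairs (the trial format is an assumption for illustration):

```python
from collections import defaultdict

def direct_utility_estimation(trials, gamma=1.0):
    """Estimate U(s) as the average observed reward-to-go for each state."""
    totals = defaultdict(float)  # sum of reward-to-go samples per state
    counts = defaultdict(int)    # number of samples per state
    for trial in trials:
        reward_to_go = 0.0
        # Walk each trial backwards so reward-to-go accumulates correctly.
        for state, reward in reversed(trial):
            reward_to_go = reward + gamma * reward_to_go
            totals[state] += reward_to_go
            counts[state] += 1
    return {s: totals[s] / counts[s] for s in totals}
```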
adaptive dynamic programming agent
agent that takes advantage of the constraints among the utilities of states by learning the transition model that connects them and solving the corresponding markov decision process
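A dynamic-programming sketch for solving the learned model, reusing the toy mdp dictionary layout from the sketch above (value iteration here; for a fixed policy, plain policy evaluation would also do):

```python
def value_iteration(mdp, epsilon=1e-4):
    """Apply Bellman updates until the utilities stop changing much."""
    U = {s: 0.0 for s in mdp["states"]}
    gamma = mdp["gamma"]
    while True:
        delta = 0.0
        for s in mdp["states"]:
            new_u = mdp["rewards"][s] + gamma * max(
                sum(p * U[s2] for p, s2 in mdp["transitions"][(s, a)])
                for a in mdp["actions"]
            )
            delta = max(delta, abs(new_u - U[s]))
            U[s] = new_u
        if delta < epsilon:
            return U
```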
prioritized sweeping
heuristic that prefers to make adjustments to states whose likely successors have just undergone a large adjustment in their own utility estimates
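A rough Python sketch of the idea; the nested model[s][a] layout, string state names, and priority details are all simplifying assumptions:

```python
import heapq

def prioritized_sweeping(U, model, rewards, gamma, start_state,
                         theta=1e-3, max_updates=100):
    """Sweep states whose successors just changed a lot.
    Assumes model[s][a] -> list of (prob, next_state) with every state as a key."""
    # Predecessor map: which states can lead to which.
    preds = {s: set() for s in model}
    for s in model:
        for outcomes in model[s].values():
            for p, s2 in outcomes:
                if p > 0:
                    preds[s2].add(s)

    queue = [(0.0, start_state)]  # (negated priority, state)
    while queue and max_updates > 0:
        _, s = heapq.heappop(queue)
        max_updates -= 1
        old = U[s]
        # Bellman update: reward plus best expected successor utility.
        U[s] = rewards[s] + gamma * max(
            sum(p * U[s2] for p, s2 in outcomes) for outcomes in model[s].values()
        )
        change = abs(U[s] - old)
        if change > theta:
            # Predecessors of a state that changed a lot get high priority.
            for sp in preds[s]:
                heapq.heappush(queue, (-change, sp))
    return U
```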
absorbing state
state in which no action the agent performs has any effect and no rewards are received
q learning
type of learning that learns an action-utility function (Q-function) instead of a utility function; an off-policy learning algorithm
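A minimal sketch of the tabular update in Python; the learning rate alpha, discount gamma, and dictionary layout are assumptions for illustration:

```python
from collections import defaultdict

Q = defaultdict(float)  # maps (state, action) -> estimated Q-value

def q_learning_update(s, a, r, s2, actions, alpha=0.1, gamma=0.9):
    """Off-policy: bootstrap from the best action in s2,
    regardless of which action the agent actually takes next."""
    best_next = max(Q[(s2, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
```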
bayesian reinforcement learning
type of learning that places prior probabilities over hypotheses about what the true model is; agents of this type tend not to explore as much as they should because they get anchored on their priors
sarsa
state, action, reward, state, action – close to q learning but updates using the Q-value of the action that is actually taken; an on-policy learning algorithm
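The matching sketch for SARSA; compared with the q learning update above, the only change is bootstrapping from a2, the action actually taken in s2:

```python
from collections import defaultdict

Q = defaultdict(float)  # maps (state, action) -> estimated Q-value

def sarsa_update(s, a, r, s2, a2, alpha=0.1, gamma=0.9):
    """On-policy: bootstrap from the Q-value of the action actually taken in s2."""
    Q[(s, a)] += alpha * (r + gamma * Q[(s2, a2)] - Q[(s, a)])
```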
evaluation function
compact measure of desirability for potentially vast state spaces
function approximation
process of constructing a compact approximation of the true utility function or Q-function
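A minimal sketch of the linear case, where the feature values and weights are assumptions for illustration; the weights move in the direction that reduces the TD error:

```python
def q_hat(weights, features):
    """Approximate Q(s, a) as a weighted linear combination of features f(s, a)."""
    return sum(w * f for w, f in zip(weights, features))

def td_update(weights, features, target, alpha=0.01):
    """Nudge each weight to shrink the error between target and current estimate."""
    error = target - q_hat(weights, features)
    return [w + alpha * error * f for w, f in zip(weights, features)]
```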
joint state space
foundation of hierarchical reinforcement learning (HRL); each state is composed of a physical state s and a machine state m
policy search
keep tweaking the policy as long as its performance improves, then stop
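A minimal hill-climbing sketch of that loop; evaluate and tweak are assumed user-supplied callbacks (hypothetical names):

```python
def policy_search(policy, evaluate, tweak, max_tries=1000):
    """Keep a tweaked policy only when it improves measured performance."""
    best_score = evaluate(policy)
    for _ in range(max_tries):
        candidate = tweak(policy)
        score = evaluate(candidate)
        if score > best_score:  # keep improvements, discard the rest
            policy, best_score = candidate, score
    return policy
```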
stochastic policy
policy that specifies the probability of selecting each action a in a state s
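One common form is a softmax (Boltzmann) policy over Q-values; this Python sketch assumes a tabular Q dictionary:

```python
import math
import random

def softmax_policy(Q, state, actions, temperature=1.0):
    """P(a | s) proportional to exp(Q(s, a) / temperature)."""
    prefs = [math.exp(Q.get((state, a), 0.0) / temperature) for a in actions]
    total = sum(prefs)
    return {a: p / total for a, p in zip(actions, prefs)}

def sample_action(Q, state, actions, temperature=1.0):
    probs = softmax_policy(Q, state, actions, temperature)
    return random.choices(list(probs), weights=list(probs.values()))[0]
```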
apprenticeship learning
field that studies the process of learning how to behave well given observations of expert behavior
imitation learning
applying supervised learning to observed state-action pairs to learn a policy
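A minimal sketch using scikit-learn (the library choice, the classifier, and the demonstration data are all assumptions; any supervised learner would do):

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical expert demonstrations: states as feature vectors, actions as labels.
expert_states  = [[0.0, 1.0], [0.5, 0.2], [0.9, 0.9], [0.1, 0.4]]
expert_actions = ["left", "right", "right", "left"]

# Fit a classifier that imitates the expert's state -> action mapping.
policy = DecisionTreeClassifier().fit(expert_states, expert_actions)
print(policy.predict([[0.6, 0.3]]))  # imitated action for a new state
```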
inverse reinforcement learning
learning rewards by observing a policy rather than learning a policy by observing rewards
boltzmann rationality
model that allows for mistakes by the expert: the expert is assumed to be noisily rational, choosing actions with probability proportional to the exponential of their value
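In standard form, with β controlling how close to perfectly rational the expert is:

```latex
P(a \mid s) \;=\; \frac{e^{\beta\, Q(s,a)}}{\sum_{a'} e^{\beta\, Q(s,a')}}
```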
feature mapping
algorithm that assumes the reward function can be written as a weighted linear combination of features
feature expectation
expected discounted value of the feature f when a policy is executed
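In standard notation, with discount factor γ:

```latex
\mu_f(\pi) \;=\; \mathbb{E}\!\left[\,\sum_{t=0}^{\infty} \gamma^{t}\, f(S_t) \;\middle|\; \pi \right]
```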