Class 9 Flashcards
supervised learning
type of learning where an agent passively observes example input/output pairs provided by a teacher
reinforcement learning
type of learning where an agent interacts with the world and periodically receives rewards; the goal is to maximize the expected sum of rewards
Markov decision process
sequential decision problem for a fully observable, stochastic environment, defined by a set of states, the actions available in each state, a Markovian transition model, and a reward function
model-based reinforcement learning
type of learning that uses a transition model of the environment to help interpret reward signals and make decisions about how to act
utility function
function that gives the expected sum of rewards an agent receives from a given state onward
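A minimal sketch of the "sum of rewards from a state onward" idea for one observed reward sequence. The function name `reward_to_go` and the discount factor `gamma` are assumptions for illustration (the card itself does not mention discounting); `gamma=1.0` gives the plain sum.

```python
def reward_to_go(rewards, gamma=1.0):
    """Return, for each time step, the (discounted) sum of rewards from that step onward.

    `rewards` is the sequence of rewards observed along one trajectory.
    `gamma` is a hypothetical discount factor; 1.0 means no discounting.
    """
    totals = [0.0] * len(rewards)
    running = 0.0
    # Walk backward so each step reuses the total already computed for the next step.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        totals[t] = running
    return totals
```

A utility function is the expectation of this quantity over all trajectories the agent might follow; the sketch computes it for a single sampled trajectory only.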
model-free reinforcement learning
type of learning where the agent neither knows nor learns a transition model for the environment
action-utility learning
type of learning where the agent learns a quality function (Q-function) that gives the expected sum of rewards from a state onward if a specific action is taken; in each state the agent picks the action with the highest Q-value
reflex agent
agent that acts on a policy mapping states directly to actions; learning such a policy is called policy search
passive reinforcement learning
type of learning where the agent’s policy is fixed and the task is to learn the utilities of states
passive learning agent
agent that follows a fixed policy to choose its actions and tries to learn a utility function; it does not know the state transition model or the reward function
direct utility estimation
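method that estimates the utility of a state as the average of the total reward observed from that state onward (the reward-to-go) over many trials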
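Direct utility estimation can be sketched as averaging, for each state, the total reward observed from that state onward across trials. The function name, the trial format (a list of `(state, reward)` pairs), and the discount factor `gamma` are assumptions for illustration, and this version averages over every visit to a state.

```python
from collections import defaultdict


def direct_utility_estimate(trials, gamma=1.0):
    """Estimate state utilities by averaging observed reward-to-go samples.

    `trials` is a list of trials; each trial is a list of (state, reward) pairs
    in the order they were visited (an assumed format).
    """
    totals = defaultdict(float)  # sum of reward-to-go samples per state
    counts = defaultdict(int)    # number of samples per state
    for trial in trials:
        running = 0.0
        # Walk the trial backward so `running` is the reward-to-go at each step.
        for state, reward in reversed(trial):
            running = reward + gamma * running
            totals[state] += running
            counts[state] += 1
    return {state: totals[state] / counts[state] for state in totals}
```

Because each state is estimated independently, this approach ignores the fact that utilities of neighboring states constrain one another.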
adaptive dynamic programming agent
agent that takes advantage of the constraints among the utilities of states by learning the transition model that connects them and solving the corresponding Markov decision process
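A rough sketch of the adaptive dynamic programming idea for a passive agent, assuming the policy is fixed so each transition depends only on the current state. The function name, the input formats, the per-state `rewards` table, `gamma`, and the fixed number of `sweeps` are all assumptions for illustration, not a definitive implementation.

```python
from collections import defaultdict


def adp_policy_evaluation(transitions, rewards, gamma=0.9, sweeps=100):
    """Learn a transition model from counts, then evaluate the fixed policy.

    `transitions` is a list of observed (state, next_state) pairs.
    `rewards` maps every state to its observed reward (assumed known here).
    """
    # Step 1: estimate P(next_state | state) from observed frequencies.
    counts = defaultdict(lambda: defaultdict(int))
    for s, s_next in transitions:
        counts[s][s_next] += 1
    model = {
        s: {s2: n / sum(nxt.values()) for s2, n in nxt.items()}
        for s, nxt in counts.items()
    }

    # Step 2: repeatedly apply U(s) = R(s) + gamma * sum_s' P(s'|s) U(s').
    U = {s: 0.0 for s in rewards}
    for _ in range(sweeps):
        U = {
            s: rewards[s]
            + gamma * sum(p * U[s2] for s2, p in model.get(s, {}).items())
            for s in rewards
        }
    return model, U
```

States with no observed successors (such as terminal states) keep a utility equal to their own reward.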
prioritized sweeping
heuristic that prefers to adjust states whose likely successors have just undergone a large adjustment in their own utility estimates
absorbing state
state in which no action the agent takes has any effect and no rewards are received
Q-learning
model-free, off-policy learning algorithm that learns an action-utility function (Q-function) instead of a utility function
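A minimal sketch of one tabular Q-learning update, assuming Q-values are stored in a `defaultdict(float)` keyed by `(state, action)`. The function name, the learning rate `alpha`, and the discount factor `gamma` are assumptions for illustration.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update for the observed transition (s, a, r, s_next).

    `Q` is a defaultdict(float) keyed by (state, action) pairs.
    `actions` lists the actions available in `s_next`; an empty list is
    treated as a terminal state with value 0.
    """
    # Off-policy: the target uses the best next action, not the one actually taken.
    best_next = max((Q[(s_next, a2)] for a2 in actions), default=0.0)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]


def greedy_action(Q, s, actions):
    """Return the action with the highest Q-value in state `s`."""
    return max(actions, key=lambda a: Q[(s, a)])
```

No transition model appears anywhere in the update, which is what makes the method model-free.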