Class 9 Flashcards
supervised learning
type of learning in which an agent passively observes example input/output pairs provided by a teacher and learns a function that maps inputs to outputs
reinforcement learning
type of learning where an agent interacts with the world and periodically receives rewards; the goal is to maximize the expected sum of rewards
markov decision process
sequential decision problem in a fully observable, stochastic environment with a Markovian transition model and additive rewards; defined by a set of states, the actions available in each state, a transition model, and a reward function
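A minimal sketch of an MDP as plain Python data; every state name, action, probability, and reward below is made up for illustration:

```python
# Toy MDP: the transition model maps (state, action) -> list of (probability, next_state);
# rewards map each state to a number. All values here are illustrative.
mdp = {
    "states": ["s0", "s1"],
    "actions": ["stay", "go"],
    "transitions": {
        ("s0", "stay"): [(0.9, "s0"), (0.1, "s1")],
        ("s0", "go"):   [(0.2, "s0"), (0.8, "s1")],
        ("s1", "stay"): [(1.0, "s1")],
        ("s1", "go"):   [(0.5, "s0"), (0.5, "s1")],
    },
    "rewards": {"s0": 0.0, "s1": 1.0},
    "gamma": 0.9,  # discount factor
}
```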
model based reinforcement learning
type of learning that uses a transition model of the environment to help interpret reward signals and make decisions about how to act
utility function
function that gives the expected sum of (possibly discounted) rewards received from a given state onward
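In standard notation, the utility of state s under policy π with discount factor γ:

```latex
U^{\pi}(s) \;=\; \mathbb{E}\!\left[\,\sum_{t=0}^{\infty} \gamma^{t}\, R(S_t) \;\middle|\; S_0 = s,\ \pi \right]
```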
model free reinforcement learning
type of learning where the agent neither knows nor learns a transition model for the environment
action utility learning
type of learning where the agent learns a quality function (Q-function) that gives the expected total reward if a specific action is taken in a given state – the agent then selects the action with the highest Q-value
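In the same notation, the Q-function and the greedy action choice it induces:

```latex
Q(s,a) \;=\; \mathbb{E}\!\left[\,\sum_{t=0}^{\infty} \gamma^{t}\, R(S_t) \;\middle|\; S_0 = s,\ A_0 = a \right],
\qquad \pi(s) \;=\; \operatorname*{argmax}_{a}\, Q(s,a)
```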
reflex agent
agent whose policy maps states directly to actions, typically found by policy search
passive reinforcement learning
type of learning where the agent’s policy is fixed and the task is to learn the utilities of states
passive learning agent
agent that uses a fixed policy to determine its actions and tries to learn a utility function; it does not know the state transition model or the reward function
direct utility estimation
method that estimates a state's utility as the average, over observed trials, of the total reward from that state onward (the reward-to-go)
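A minimal Python sketch, assuming each trial is recorded as a list of (state, reward) pairs (the trial format is an assumption for illustration):

```python
from collections import defaultdict

def direct_utility_estimation(trials, gamma=1.0):
    """Estimate U(s) as the average observed reward-to-go for each state."""
    totals = defaultdict(float)  # sum of reward-to-go samples per state
    counts = defaultdict(int)    # number of samples per state
    for trial in trials:
        reward_to_go = 0.0
        # Walk each trial backwards so reward-to-go accumulates correctly.
        for state, reward in reversed(trial):
            reward_to_go = reward + gamma * reward_to_go
            totals[state] += reward_to_go
            counts[state] += 1
    return {s: totals[s] / counts[s] for s in totals}
```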
adaptive dynamic programming agent
agent that takes advantage of the constraints among the utilities of states by learning the transition model that connects them and solving the corresponding markov decision process
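A dynamic-programming sketch for solving the learned model, reusing the toy mdp dictionary layout from the sketch above (value iteration here; for a fixed policy, plain policy evaluation would also do):

```python
def value_iteration(mdp, epsilon=1e-4):
    """Apply Bellman updates until the utilities stop changing much."""
    U = {s: 0.0 for s in mdp["states"]}
    gamma = mdp["gamma"]
    while True:
        delta = 0.0
        for s in mdp["states"]:
            new_u = mdp["rewards"][s] + gamma * max(
                sum(p * U[s2] for p, s2 in mdp["transitions"][(s, a)])
                for a in mdp["actions"]
            )
            delta = max(delta, abs(new_u - U[s]))
            U[s] = new_u
        if delta < epsilon:
            return U
```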
prioritized sweeping
heuristic that prefers to make adjustments to states whose likely successors have just undergone a large adjustment in their own utility estimates
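A rough Python sketch of the idea; the nested model[s][a] layout, string state names, and priority details are all simplifying assumptions:

```python
import heapq

def prioritized_sweeping(U, model, rewards, gamma, start_state,
                         theta=1e-3, max_updates=100):
    """Sweep states whose successors just changed a lot.
    Assumes model[s][a] -> list of (prob, next_state) with every state as a key."""
    # Predecessor map: which states can lead to which.
    preds = {s: set() for s in model}
    for s in model:
        for outcomes in model[s].values():
            for p, s2 in outcomes:
                if p > 0:
                    preds[s2].add(s)

    queue = [(0.0, start_state)]  # (negated priority, state)
    while queue and max_updates > 0:
        _, s = heapq.heappop(queue)
        max_updates -= 1
        old = U[s]
        # Bellman update: reward plus best expected successor utility.
        U[s] = rewards[s] + gamma * max(
            sum(p * U[s2] for p, s2 in outcomes) for outcomes in model[s].values()
        )
        change = abs(U[s] - old)
        if change > theta:
            # Predecessors of a state that changed a lot get high priority.
            for sp in preds[s]:
                heapq.heappush(queue, (-change, sp))
    return U
```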
absorbing state
state in which no action the agent performs has any effect and no rewards are received
q learning
type of learning that learns an action-utility function (Q-function) instead of a utility function; an off-policy learning algorithm
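A minimal sketch of the tabular update in Python; the learning rate alpha, discount gamma, and dictionary layout are assumptions for illustration:

```python
from collections import defaultdict

Q = defaultdict(float)  # maps (state, action) -> estimated Q-value

def q_learning_update(s, a, r, s2, actions, alpha=0.1, gamma=0.9):
    """Off-policy: bootstrap from the best action in s2,
    regardless of which action the agent actually takes next."""
    best_next = max(Q[(s2, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
```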
bayesian reinforcement learning
type of learning that places prior probabilities over hypotheses about what the true model is; agents of this type tend not to explore as much as they should because they get anchored on their priors
sarsa
state, action, reward, state, action – close to q learning but updates using the Q-value of the action that is actually taken; an on-policy learning algorithm
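The matching sketch for SARSA; compared with the q learning update above, the only change is bootstrapping from a2, the action actually taken in s2:

```python
from collections import defaultdict

Q = defaultdict(float)  # maps (state, action) -> estimated Q-value

def sarsa_update(s, a, r, s2, a2, alpha=0.1, gamma=0.9):
    """On-policy: bootstrap from the Q-value of the action actually taken in s2."""
    Q[(s, a)] += alpha * (r + gamma * Q[(s2, a2)] - Q[(s, a)])
```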
evaluation function
compact measure of desirability for potentially vast state spaces
function approximation
process of constructing a compact approximation of the true utility function or Q-function
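A minimal sketch of the linear case, where the feature values and weights are assumptions for illustration; the weights move in the direction that reduces the TD error:

```python
def q_hat(weights, features):
    """Approximate Q(s, a) as a weighted linear combination of features f(s, a)."""
    return sum(w * f for w, f in zip(weights, features))

def td_update(weights, features, target, alpha=0.01):
    """Nudge each weight to shrink the error between target and current estimate."""
    error = target - q_hat(weights, features)
    return [w + alpha * error * f for w, f in zip(weights, features)]
```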
joint state space
foundation of hierarchical reinforcement learning (HRL); each state is composed of a physical state s and a machine state m
policy search
keep tweaking the policy as long as its performance improves, then stop
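A minimal hill-climbing sketch of that loop; evaluate and tweak are assumed user-supplied callbacks (hypothetical names):

```python
def policy_search(policy, evaluate, tweak, max_tries=1000):
    """Keep a tweaked policy only when it improves measured performance."""
    best_score = evaluate(policy)
    for _ in range(max_tries):
        candidate = tweak(policy)
        score = evaluate(candidate)
        if score > best_score:  # keep improvements, discard the rest
            policy, best_score = candidate, score
    return policy
```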
stochastic policy
policy that specifies the probability of selecting each action a in a state s
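One common form is a softmax (Boltzmann) policy over Q-values; this Python sketch assumes a tabular Q dictionary:

```python
import math
import random

def softmax_policy(Q, state, actions, temperature=1.0):
    """P(a | s) proportional to exp(Q(s, a) / temperature)."""
    prefs = [math.exp(Q.get((state, a), 0.0) / temperature) for a in actions]
    total = sum(prefs)
    return {a: p / total for a, p in zip(actions, prefs)}

def sample_action(Q, state, actions, temperature=1.0):
    probs = softmax_policy(Q, state, actions, temperature)
    return random.choices(list(probs), weights=list(probs.values()))[0]
```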
apprenticeship learning
field that studies the process of learning how to behave well given observations of expert behavior
imitation learning
applying supervised learning to observed state-action pairs to learn a policy
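A minimal sketch using scikit-learn (the library choice, the classifier, and the demonstration data are all assumptions; any supervised learner would do):

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical expert demonstrations: states as feature vectors, actions as labels.
expert_states  = [[0.0, 1.0], [0.5, 0.2], [0.9, 0.9], [0.1, 0.4]]
expert_actions = ["left", "right", "right", "left"]

# Fit a classifier that imitates the expert's state -> action mapping.
policy = DecisionTreeClassifier().fit(expert_states, expert_actions)
print(policy.predict([[0.6, 0.3]]))  # imitated action for a new state
```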
inverse reinforcement learning
learning rewards by observing a policy rather than learning a policy by observing rewards
boltzmann rationality
model that allows for mistakes by the expert: the expert is assumed to be noisily rational, choosing actions with probability proportional to the exponential of their value
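In standard form, with β controlling how close to perfectly rational the expert is:

```latex
P(a \mid s) \;=\; \frac{e^{\beta\, Q(s,a)}}{\sum_{a'} e^{\beta\, Q(s,a')}}
```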
feature mapping
algorithm that assumes the reward function can be written as a weighted linear combination of features
feature expectation
expected discounted value of the feature f when a policy is executed
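In standard notation, with discount factor γ:

```latex
\mu_f(\pi) \;=\; \mathbb{E}\!\left[\,\sum_{t=0}^{\infty} \gamma^{t}\, f(S_t) \;\middle|\; \pi \right]
```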