Class 9 Flashcards

1
Q

supervised learning

A

type of learning in which an agent passively learns by observing example input/output pairs provided by a teacher

2
Q

reinforcement learning

A

type of learning where an agent interacts with the world and periodically receives rewards; the goal is to maximize the expected sum of rewards

3
Q

markov decision process

A

a sequential decision problem for a fully observable, stochastic environment with a Markovian transition model and additive rewards; defined by a set of states, the actions available in each state, a transition model P(s′ | s, a), and a reward function

4
Q

model based reinforcement learning

A

type of learning that uses a transition model of the environment to help interpret reward signals and make decisions about how to act

5
Q

utility function

A

function that computes the expected sum of (possibly discounted) rewards from a given state onward
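The definition above can be sketched directly; a minimal illustration in Python (the function name and discount factor are hypothetical, not from the cards):

```python
def utility(rewards, gamma=0.9):
    """Discounted sum of rewards received from a state onward."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

# Three steps of reward 1 with gamma = 0.5: 1 + 0.5 + 0.25 = 1.75
print(utility([1, 1, 1], gamma=0.5))
```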

6
Q

model free reinforcement learning

A

type of learning where the agent neither knows nor learns a transition model for the environment

7
Q

action utility learning

A

type of learning in which the agent learns a quality (Q) function giving the expected utility of taking a specific action in a specific state – the agent then selects the action with the highest Q-value

8
Q

reflex agent

A

agent whose policy directly maps states to actions; the kind of agent produced by policy search

9
Q

passive reinforcement learning

A

type of learning where the agent’s policy is fixed and the task is to learn the utilities of states

10
Q

passive learning agent

A

agent with a fixed policy that determines its actions; it tries to learn a utility function and knows neither the state transition model nor the reward function

11
Q

direct utility estimation

A

method that estimates the utility of a state as the average, over observed trials, of the total reward from that state onward (the reward-to-go)
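A sketch of how this estimate can be computed from complete trials (the function name and trial format are assumptions, not from the cards):

```python
def direct_utility_estimates(trials, gamma=1.0):
    """Average observed reward-to-go for each state across complete trials.

    Each trial is a list of (state, reward) pairs from one run to a
    terminal state.
    """
    totals, counts = {}, {}
    for trial in trials:
        for i, (state, _) in enumerate(trial):
            # Reward-to-go: discounted sum of rewards from this state on.
            g = sum(gamma ** k * r for k, (_, r) in enumerate(trial[i:]))
            totals[state] = totals.get(state, 0.0) + g
            counts[state] = counts.get(state, 0) + 1
    return {s: totals[s] / counts[s] for s in totals}
```

Because every visited state yields a (state, reward-to-go) sample, this reduces the learning problem to supervised learning.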

12
Q

adaptive dynamic programming agent

A

agent that takes advantage of the constraints among the utilities of states by learning the transition model that connects them and solving the corresponding markov decision process

13
Q

prioritized sweeping

A

heuristic that prefers to make adjustments to states whose likely successors have just undergone a large adjustment in their own utility estimates

14
Q

absorbing state

A

state in which no action the agent performs has any effect and no rewards are received

15
Q

q learning

A

type of learning that learns an action-utility (Q) function instead of a utility function; an off-policy learning algorithm
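A single Q-learning update can be sketched with a dictionary Q-table (names and step-size defaults are illustrative):

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """Off-policy update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    Note the max over next actions, regardless of which one is taken.
    """
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q[(s, a)]
```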

16
Q

bayesian reinforcement learning

A

type of learning that maintains prior probabilities over hypotheses about what the true model is; agents of this type may not explore as much as they should because they get stuck on their priors

17
Q

sarsa

A

state, action, reward, state, action – close to q learning but updates with the Q-value of the action actually taken; an on-policy learning algorithm
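The contrast with Q-learning is the update target; a minimal sketch using a dictionary Q-table (names and defaults are illustrative):

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """On-policy update: the target uses the Q-value of the action a_next
    actually taken in s_next, not the max over all next actions."""
    old = Q.get((s, a), 0.0)
    target = r + gamma * Q.get((s_next, a_next), 0.0)
    Q[(s, a)] = old + alpha * (target - old)
    return Q[(s, a)]
```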

18
Q

evaluation function

A

compact measure of desirability for potentially vast state spaces

19
Q

function approximation

A

process of constructing a compact approximation of the true utility function or Q-function
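A common form is a linear combination of state features; a sketch of one update step toward a target utility (function names are hypothetical):

```python
def u_hat(theta, features):
    """Linear approximation: U(s) is estimated as sum_i theta_i * f_i(s)."""
    return sum(t * f for t, f in zip(theta, features))

def td_step(theta, features, target, alpha=0.1):
    """Move the weights toward a target utility; for a linear approximator
    the gradient of the squared error is just the feature vector."""
    err = target - u_hat(theta, features)
    return [t + alpha * err * f for t, f in zip(theta, features)]
```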

20
Q

joint state space

A

foundation of hierarchical reinforcement learning (HRL); each joint state is composed of a physical state s and a machine state m

21
Q

policy search

A

keep tweaking the policy as long as its performance improves, then stop

22
Q

stochastic policy

A

specifies the probability of selecting an action a in a state s
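One common way to define such a policy is a softmax over Q-values; a minimal sketch (the function name and temperature parameter are illustrative):

```python
import math

def stochastic_policy(q_values, temperature=1.0):
    """Return P(a | s) for each action via a softmax over Q-values."""
    exps = {a: math.exp(q / temperature) for a, q in q_values.items()}
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}
```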

23
Q

apprenticeship learning

A

field that studies the process of learning how to behave well given observations of expert behavior

24
Q

imitation learning

A

applying supervised learning to the observed state/action pairs to learn a policy

25
Q

inverse reinforcement learning

A

learning rewards by observing a policy rather than learning a policy by observing rewards

26
Q

boltzmann rationality

A

assumption that the expert chooses an action with probability proportional to the exponentiated value of that action (e.g., proportional to e^Q(s,a)); this allows for occasional mistakes by the expert when inferring rewards from behavior

27
Q

feature mapping

A

algorithm that assumes the reward function can be written as a weighted linear combination of features

28
Q

feature expectation

A

expected discounted value of the feature f when a policy is executed
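A single-trajectory estimate of this quantity can be sketched as follows (function names are hypothetical); under the feature-mapping assumption, the value of a policy is then just the weight vector dotted with its feature expectations:

```python
def feature_expectation(feature_sequence, gamma=0.9):
    """Discounted sum of feature vectors f(s_t) along one executed
    trajectory: mu_i = sum_t gamma^t * f_i(s_t)."""
    n = len(feature_sequence[0])
    mu = [0.0] * n
    for t, f in enumerate(feature_sequence):
        for i in range(n):
            mu[i] += gamma ** t * f[i]
    return mu

def policy_value(weights, mu):
    """If R(s) = w . f(s), the expected discounted reward of the policy
    is w . mu, where mu is its vector of feature expectations."""
    return sum(w * m for w, m in zip(weights, mu))
```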