Chapter 8 Flashcards

1
Q

reinforcement learning

A

inspired by operant conditioning

contrasts with the supervised-learning method

requires no labeled training examples

an AGENT—the learning program—performs ACTIONS in an ENVIRONMENT (usually a computer simulation) and occasionally receives REWARDS from the environment. These intermittent rewards are the only feedback the agent uses for learning.

2
Q

The promise of reinforcement learning

A

the agent can learn flexible strategies on its own simply by performing actions in the world and occasionally receiving rewards (that is, reinforcement) without humans having to MANUALLY WRITE RULES or DIRECTLY TEACH THE AGENT EVERY POSSIBLE CIRCUMSTANCE

3
Q

state

A

the state of an agent at a given time is the agent’s perception of its current situation.

In the purest form of reinforcement learning, the learning agent doesn’t remember its previous states.

4
Q

what does the algorithm do?

A

tells the agent how to learn from its experiences.

5
Q

Reinforcement learning occurs by

A

having the agent take actions over a series of learning EPISODES, each of which consists of some number of ITERATIONS.

6
Q

What does the agent learn?

A

upon receiving a reward, the agent learns only about:

the STATE and the ACTION that immediately preceded the reward

7
Q

the value of an action

A

The value of action A in state S is a number reflecting the agent’s current prediction of how much reward it will EVENTUALLY obtain if, when in state S, it performs action A, AND THEN CONTINUES PERFORMING HIGH-VALUE ACTIONS
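The card keeps “eventual reward” informal; one standard way to make it precise (my own addition, not from the text) is a discounted sum of future rewards, where a discount factor gamma makes far-off rewards count for less:

```python
# A minimal sketch (illustration only): the "eventual" reward behind an
# action's value, formalized as a discounted sum of future rewards.
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, each discounted by how many steps in the future it arrives."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

# A reward of 1.0 three steps away is worth less than one received now:
print(discounted_return([0.0, 0.0, 0.0, 1.0]))  # gamma**3 * 1.0 = 0.9**3
```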

8
Q

the goal of reinforcement learning

A

for the agent to learn values that are good predictions of upcoming rewards (assuming that the agent keeps doing the right thing after taking the action in question)

9
Q

Q-table

A

a table of states, actions, and values

Given a state, each action in that state has a numerical value; these values change, becoming more accurate predictions of upcoming rewards, as Rosie continues to learn

Reinforcement learning here amounts to the gradual updating of values in the Q-table
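As a concrete illustration, a Q-table can be sketched as a nested mapping from states to action values; the states, actions, and numbers below are invented for a Rosie-like robot, not taken from the text:

```python
# A hypothetical miniature Q-table. Each (state, action) pair maps to the
# agent's current prediction of eventual reward (all values invented).
q_table = {
    "facing_wall":  {"Forward": -0.5, "TurnLeft": 0.2, "TurnRight": 0.1},
    "open_hallway": {"Forward":  0.8, "TurnLeft": 0.0, "TurnRight": 0.0},
}

# Learning is the gradual nudging of these numbers toward better predictions:
q_table["facing_wall"]["TurnLeft"] += 0.1
print(q_table["facing_wall"]["TurnLeft"])
```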

10
Q

essence of q-learning

A

Rosie can now learn something about the action (Forward) she took in the immediately previous state (one step away).

In short: the agent keeps a one-step memory of its previous state and action, so reward can propagate backward into earlier entries of the Q-table.
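This one-step learning rule can be sketched as the standard Q-learning update (my notation; the learning rate `lr` and discount factor `gamma` are assumed parameter values, not from the text). After taking `action` in `state`, receiving `reward`, and observing `next_state`, the agent nudges its stored value toward the reward plus the discounted best value available from the next state:

```python
# A sketch of the standard one-step Q-learning update.
def q_update(q_table, state, action, reward, next_state, lr=0.1, gamma=0.9):
    best_next = max(q_table[next_state].values())     # best value from next state
    target = reward + gamma * best_next               # what the value "should" be
    q_table[state][action] += lr * (target - q_table[state][action])

# Hypothetical usage: a reward of 1.0 pulls the value of ("s0", "Forward")
# a tenth of the way (lr=0.1) toward the target.
q = {"s0": {"Forward": 0.0}, "s1": {"Forward": 0.0}}
q_update(q, "s0", "Forward", reward=1.0, next_state="s1")
print(q["s0"]["Forward"])  # 0.1
```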

11
Q

exploration versus exploitation balance

A

Deciding how much to explore new actions and how much to exploit the actions already found to be valuable.

A naive strategy would be to always choose the action with the highest value for the current state in the Q-table; pure exploitation, however, can keep the agent from ever discovering better actions.

Achieving the right balance is a core issue for making reinforcement learning successful.
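One widely used compromise is epsilon-greedy action selection (my example; the text describes the dilemma but does not name a specific strategy): with small probability epsilon the agent explores a random action, and otherwise exploits the highest-valued action for the current state.

```python
import random

# A minimal epsilon-greedy sketch (action names and values are invented).
def choose_action(q_values, epsilon=0.1):
    if random.random() < epsilon:
        return random.choice(list(q_values))   # explore: random action
    return max(q_values, key=q_values.get)     # exploit: highest-valued action

actions = {"Forward": 0.8, "TurnLeft": 0.1}
print(choose_action(actions, epsilon=0.0))  # epsilon 0 means pure exploitation: Forward
```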

12
Q

two major stumbling blocks that might arise in extrapolating our “training Rosie” example to reinforcement learning in real-world tasks.

A
  1. the Q-table.
    > in complex real-world tasks, it’s impossible to define a small set of “states” that could be listed in a table, so learning via a Q-table like the one in the “Rosie” example is out of the question.
    > For this reason, most modern approaches to reinforcement learning use a neural network instead of a Q-table.
  2. the difficulty, in the real world, of actually carrying out the learning process over many episodes using a real robot.
    > You just wouldn’t have enough time.
    > Moreover, you might risk the robot damaging itself by choosing a wrong action.

the best-known reinforcement-learning successes have been in the domain of game playing.

13
Q

episode of Q-learning

A

at each iteration the learning agent does the following:

  1. it figures out its current state
  2. looks up that state in the Q-table
  3. uses the values in the table to choose an action
  4. performs that action, possibly receives a reward
  5. the learning step: updates the values in its Q-table.
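The five steps above can be sketched as one learning episode in a loop (a minimal sketch: the tiny “corridor” environment, with states 0..3 and a reward only upon reaching state 3, plus all parameter values, are my own invented assumptions, not from the text):

```python
import random

def run_episode(q_table, lr=0.1, gamma=0.9, epsilon=0.1):
    state = 0                                           # 1. figure out current state
    while state != 3:
        q_values = q_table[state]                       # 2. look up state in Q-table
        if random.random() < epsilon:                   # 3. use the values to choose
            action = random.choice(list(q_values))      #    an action (with some
        else:                                           #    random exploration)
            action = max(q_values, key=q_values.get)
        if action == "Forward":                         # 4. perform the action,
            next_state = min(state + 1, 3)              #    possibly receiving
        else:                                           #    a reward
            next_state = max(state - 1, 0)
        reward = 1.0 if next_state == 3 else 0.0
        best_next = max(q_table[next_state].values())   # 5. learning step: update
        q_table[state][action] += lr * (reward + gamma * best_next
                                        - q_table[state][action])
        state = next_state

# After many episodes, "Forward" near the goal accumulates a high value:
q = {s: {"Forward": 0.0, "Back": 0.0} for s in range(4)}
for _ in range(100):
    run_episode(q)
print(round(q[2]["Forward"], 3), round(q[2]["Back"], 3))
```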