Chapter 8 Flashcards
reinforcement learning
inspired by operant conditioning
contrasts with the supervised-learning method
requires no labeled training examples
an AGENT—the learning program—performs ACTIONS in an ENVIRONMENT (usually a computer simulation) and occasionally receives REWARDS from the environment. These intermittent rewards are the only feedback the agent uses for learning.
The promise of reinforcement learning
the agent can learn flexible strategies on its own simply by performing actions in the world and occasionally receiving rewards (that is, reinforcement) without humans having to MANUALLY WRITE RULES or DIRECTLY TEACH THE AGENT EVERY POSSIBLE CIRCUMSTANCE
state
the state of an agent at a given time is the agent’s perception of its current situation.
In the purest form of reinforcement learning, the learning agent doesn’t remember its previous states.
What does the algorithm do?
tells the agent how to learn from its experiences.
Reinforcement learning occurs by
having the agent take actions over a series of learning EPISODES, each of which consists of some number of ITERATIONS.
What does the agent learn?
upon receiving a reward, the agent learns only about:
the STATE and the ACTION that immediately preceded the reward
the value of an action
The value of action A in state S is a number reflecting the agent’s current prediction of how much reward it will EVENTUALLY obtain if, when in state S, it performs action A, AND THEN CONTINUES PERFORMING HIGH-VALUE ACTIONS
the goal of reinforcement learning
for the agent to learn values that are good predictions of upcoming rewards (assuming that the agent keeps doing the right thing after taking the action in question)
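The cards don’t write out how the values are actually adjusted; as one standard formalization (the one-step Q-learning update, with learning rate α and discount factor γ introduced here as assumed symbols, not taken from the cards):

```latex
Q(S, A) \leftarrow Q(S, A) + \alpha \,\bigl[ R + \gamma \max_{A'} Q(S', A') - Q(S, A) \bigr]
```

where R is any reward received and S' is the state reached after taking action A in state S.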
Q-table
a table of states, actions, and values
Given a state, each action in that state has a numerical value; these values will change, becoming more accurate predictions of upcoming rewards, as Rosie continues to learn.
Here, reinforcement learning is the gradual updating of values in the Q-table.
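As a concrete sketch, a Q-table can be as simple as a dictionary mapping (state, action) pairs to numbers; the states and actions below are invented placeholders, not the chapter’s actual “Rosie” setup:

```python
# Q-table: maps (state, action) pairs to value estimates.
# State and action names here are illustrative placeholders.
states = ["one_step_away", "next_to_can", "facing_wall"]
actions = ["Forward", "Turn", "Bend Down"]

# Start every value at 0; learning gradually updates these numbers so
# they become better predictions of upcoming reward.
q_table = {(s, a): 0.0 for s in states for a in actions}

# Example of a value being updated after a reward (hypothetical number):
q_table[("next_to_can", "Bend Down")] = 10.0
```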
essence of Q-learning
Rosie can now learn something about the action (Forward) she took in the immediately previous state (one step away).
so the agent needs a memory of its previous state and action (unlike the “purest form” above).
exploration versus exploitation balance
Deciding how much to explore new actions and how much to exploit (stick with) actions already known to have high values.
A naive strategy would be to always choose the action with the highest value for the current state in the Q-table.
Achieving the right balance is a core issue for making reinforcement learning successful.
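The chapter doesn’t prescribe a particular method for striking this balance, so treat this epsilon-greedy sketch as just one common illustrative option: exploit the highest-valued action most of the time, but explore a random action with small probability epsilon.

```python
import random

def choose_action(q_table, state, actions, epsilon=0.1):
    """Epsilon-greedy selection: mostly exploit, occasionally explore."""
    if random.random() < epsilon:
        # Explore: try a random action, even if its current value is low.
        return random.choice(actions)
    # Exploit: pick the action with the highest current value for this state.
    return max(actions, key=lambda a: q_table[(state, a)])
```

With epsilon = 0 this collapses into the naive always-take-the-best strategy; larger epsilon means more exploration.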
two major stumbling blocks that might arise in extrapolating our “training Rosie” example to reinforcement learning in real-world tasks.
- the Q-table
> in complex real-world tasks, it’s impossible to define a small set of “states” that could be listed in a table, so learning via a Q-table like the one in the “Rosie” example is out of the question.
> For this reason, most modern approaches to reinforcement learning use a neural network instead of a Q-table.
- the difficulty, in the real world, of actually carrying out the learning process over many episodes using a real robot
> You just wouldn’t have enough time.
> Moreover, you might risk the robot damaging itself by choosing the wrong action.
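A rough sketch of the idea behind swapping the table for a network: a function maps a state’s feature vector to estimated values for all actions, so states never have to be enumerated. The tiny one-hidden-layer numpy network below is purely illustrative, not the chapter’s (or any real system’s) architecture:

```python
import numpy as np

class TinyQNetwork:
    """Maps a state feature vector to one estimated value per action."""
    def __init__(self, n_features, n_actions, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(scale=0.1, size=(n_features, hidden))
        self.w2 = rng.normal(scale=0.1, size=(hidden, n_actions))

    def action_values(self, state_features):
        hidden = np.maximum(0, state_features @ self.w1)  # ReLU hidden layer
        return hidden @ self.w2                           # one value per action

# Usage: any state that can be described by features gets value estimates,
# with no need to list every possible state in a table.
net = TinyQNetwork(n_features=8, n_actions=3)
values = net.action_values(np.random.rand(8))
```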
the best-known reinforcement-learning successes have been in the domain of game playing.
episode of Q-learning
at each iteration the learning agent does the following:
- it figures out its current state
- looks up that state in the Q-table
- uses the values in the table to choose an action
- performs that action, possibly receives a reward
- the learning step: updates the values in its Q-table.
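Putting one episode’s steps together as a minimal Python sketch, assuming a gym-style environment object with reset() and step() methods, plus a learning rate alpha and discount gamma; all of these are illustrative assumptions, not details from the cards:

```python
import random

def run_episode(env, q_table, actions, alpha=0.5, gamma=0.9,
                epsilon=0.1, max_iterations=100):
    """One learning episode: figure out state, look it up, choose, act, update."""
    state = env.reset()  # figure out the current (starting) state
    for _ in range(max_iterations):
        # Look up the state in the Q-table and choose an action
        # (epsilon-greedy, as in the earlier sketch).
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: q_table[(state, a)])
        # Perform the action; the environment returns the next state,
        # a reward (often 0), and whether the episode is over.
        next_state, reward, done = env.step(action)
        # Learning step: nudge the value of (state, action) toward the
        # reward plus the best value predicted from the next state.
        best_next = max(q_table[(next_state, a)] for a in actions)
        q_table[(state, action)] += alpha * (
            reward + gamma * best_next - q_table[(state, action)])
        if done:
            break
        state = next_state
    return q_table
```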