Intro: David Silver Flashcards
3 branches of ML ?
Supervised, Unsupervised, Reinforcement Learning
Why is RL different from other ML ?
1) No supervisor
2) Only a reward signal
3) Feedback is delayed, not instantaneous
4) Time matters (sequential data, not a fixed i.i.d. dataset)
5) The agent's actions influence the subsequent data it receives.
Reward Hypothesis ?
All goals can be described by the maximization of expected cumulative reward.
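A minimal formalization of this (the return Gt and the discount factor gamma come from later lectures in the course; they are assumed here for concreteness):

```latex
G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots
    = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1},
\qquad \text{goal: select actions to maximise } \mathbb{E}[G_t].
```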
Describe the Agent process at each timestep t ?
Executes action At,
Receives observation Ot,
Receives scalar reward Rt.
Describe the Environment process at each timestep t ?
Receives action At,
Emits observation Ot+1,
Emits scalar reward Rt+1.
(The full agent-environment interaction loop is sketched below.)
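A minimal Python sketch of this interaction loop; `agent`, `env`, and their `act`/`reset`/`step` methods are hypothetical stand-ins, not part of the lecture:

```python
def run_episode(agent, env, max_steps=1000):
    """Run one agent-environment episode of at most max_steps steps."""
    observation, reward = env.reset(), 0.0  # initial observation, no reward yet
    for t in range(max_steps):
        action = agent.act(observation, reward)        # agent executes A_t
        observation, reward, done = env.step(action)   # env emits O_{t+1}, R_{t+1}
        if done:                                       # episode terminated
            break
```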
“History” in RL ?
The sequence of observations, actions, and rewards up to time t.
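In the lecture's notation, the history up to time t is:

```latex
H_t = O_1, R_1, A_1, \dots, A_{t-1}, O_t, R_t
```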
“State” in RL ?
The information used to determine what happens next.
Describe “State” formally.
St = f(Ht)
“Environment State” ?
The environment's private representation, i.e. whatever data it uses to pick the next observation and reward.
“Agent State” ?
The agent's internal representation, i.e. whatever information the agent uses to pick its next action.
“Information State” ?
An information state (a.k.a. Markov state) contains all useful information from the history.
Markov State ?
The future is independent of the past given the present.
i.e. once the state is known, the past history may be thrown away.
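Formally (in the course's notation), a state St is Markov if and only if:

```latex
\mathbb{P}\left[ S_{t+1} \mid S_t \right] = \mathbb{P}\left[ S_{t+1} \mid S_1, \dots, S_t \right]
```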
Fully Observable Environment ?
The agent directly observes the environment state (observation = agent state = environment state); formally, a Markov Decision Process (MDP).
Partially Observable Environment ?
The agent indirectly observes the environment state (agent state differs from environment state); formally, a Partially Observable MDP (POMDP), so the agent must construct its own state representation.
3 Components of an RL Agent ?
One or more of: Policy, Value function, Model.
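For reference, the standard definitions of the three components from the course; the value function is written with the discounted return, where the discount gamma is the same assumption as above:

```latex
\text{Policy:} \quad \pi(a \mid s) = \mathbb{P}[A_t = a \mid S_t = s]
\text{Value:}  \quad v_\pi(s) = \mathbb{E}_\pi\left[ R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots \mid S_t = s \right]
\text{Model:}  \quad \mathcal{P}^a_{ss'} = \mathbb{P}[S_{t+1} = s' \mid S_t = s, A_t = a], \qquad
               \mathcal{R}^a_s = \mathbb{E}[R_{t+1} \mid S_t = s, A_t = a]
```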