Intro: David Silver Flashcards

1
Q

3 branches of ML ?

A

Supervised, Unsupervised, Reinforcement Learning

2
Q

Why is RL different from other ML ?

A

1) No supervisor
2) Only a reward signal
3) Feedback is delayed, not instantaneous
4) Time matters (sequential, non-i.i.d. data, not a fixed dataset)
5) The agent's actions affect the subsequent data it receives

3
Q

Reward Hypothesis ?

A

All goals can be described by the maximization of expected cumulative reward.

4
Q

Describe the Agent process at each timestep t ?

A

Executes Action A_t
Receives Observation O_t
Receives Scalar Reward R_t

5
Q

Describe the Environment at each timestep t ?

A

Receives Action A_t
Emits Observation O_{t+1}
Emits Scalar Reward R_{t+1}
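The agent and environment cards together describe one interaction loop. A minimal sketch, assuming a hypothetical coin-guessing environment (`GuessCoinEnv`) and a random policy, neither of which is from the lecture; the loop also records the history as (O_t, R_t, A_t) steps, using placeholder values for the first observation and reward.

```python
import random


class GuessCoinEnv:
    """Hypothetical toy environment: reward 1 if the agent guesses the hidden coin."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.coin = self.rng.choice("HT")  # environment state: never shown to the agent

    def step(self, action):
        # Receives A_t; emits R_{t+1} and O_{t+1}
        reward = 1.0 if action == self.coin else 0.0
        self.coin = self.rng.choice("HT")  # hidden state changes for the next step
        observation = "new flip"
        return observation, reward


def run_episode(env, policy, steps):
    """Agent-environment loop; returns the history as (O_t, R_t, A_t) steps."""
    history = []
    observation, reward = "start", 0.0  # placeholder O_1, R_1 (assumption)
    for _ in range(steps):
        action = policy(observation)            # agent executes A_t
        history.append((observation, reward, action))
        observation, reward = env.step(action)  # environment emits O_{t+1}, R_{t+1}
    return history


rng = random.Random(1)
history = run_episode(GuessCoinEnv(), lambda obs: rng.choice("HT"), steps=5)
```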

6
Q

“History” in RL ?

A

Sequence of Observations, Actions and Rewards: H_t = O_1, R_1, A_1, ..., A_{t-1}, O_t, R_t

7
Q

“State” in RL ?

A

The information used to determine what happens next.

8
Q

Describe “State” formally.

A

S_t = f(H_t), i.e. the state is a function of the history.
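S_t = f(H_t) leaves the choice of f open. A minimal sketch of one hypothetical f that keeps only the k most recent observations; the history layout (a list of (observation, reward, action) steps, with action None for the current step) is an assumption for illustration.

```python
def state_from_history(history, k=1):
    """Hypothetical f: S_t = the k most recent observations in H_t.

    history: list of (observation, reward, action) steps; the last step's
    action is None because A_t has not been chosen yet.
    """
    observations = [obs for obs, _reward, _action in history]
    return tuple(observations[-k:])


history = [("o1", 0.0, "a1"), ("o2", 1.0, "a2"), ("o3", 0.0, None)]
s_t = state_from_history(history)            # only the latest observation
s_t_two = state_from_history(history, k=2)   # the last two observations
```

With k = len(history) the state keeps every observation; smaller k trades information for a more compact state.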

9
Q

“Environment State” ?

A

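The environment's private representation S^e_t: whatever data the environment uses to pick the next observation and reward. It is usually not visible to the agent.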

10
Q

“Agent State” ?

A

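The agent's internal representation S^a_t: whatever information the agent uses to pick the next action. It can be any function of the history, S^a_t = f(H_t).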

11
Q

“Information State” ?

A

Also called a “Markov State”: a state that contains all useful information from the history.

12
Q

Markov State ?

A

A state S_t is Markov if and only if P[S_{t+1} | S_t] = P[S_{t+1} | S_1, ..., S_t].
The future is independent of the past given the present, i.e. once the state is known, the history may be discarded.

13
Q

Fully Observable Environment ?

A

Agent directly observes environment state: O_t = S^a_t = S^e_t (formally, a Markov decision process, MDP)

14
Q

Partially Observable Environment ?

A

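Agent indirectly observes environment state, so agent state ≠ environment state (formally, a partially observable MDP, POMDP). The agent must construct its own state representation S^a_t.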
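When the environment state is hidden, one way for the agent to build its own state is a belief: a probability distribution over possible environment states. A minimal Bayes-filter sketch, assuming the agent already knows a transition model and an observation likelihood; the two hidden states "A" and "B" and all the numbers are hypothetical, and actions are ignored for brevity.

```python
def belief_update(belief, transition, likelihood, observation):
    """One Bayes-filter step: b'(s') is proportional to P(o | s') * sum_s P(s' | s) * b(s)."""
    states = list(belief)
    # Predict: push the current belief through the (assumed known) transition model
    predicted = {s2: sum(transition[s][s2] * belief[s] for s in states) for s2 in states}
    # Correct: weight each state by how likely it makes the observation, then normalise
    unnormalised = {s: likelihood[s][observation] * predicted[s] for s in states}
    total = sum(unnormalised.values())
    return {s: p / total for s, p in unnormalised.items()}


# Hypothetical two-state environment: states tend to persist, and each favours one observation
transition = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.1, "B": 0.9}}
likelihood = {"A": {"x": 0.8, "y": 0.2}, "B": {"x": 0.3, "y": 0.7}}
belief = {"A": 0.5, "B": 0.5}
updated = belief_update(belief, transition, likelihood, "x")
```

Seeing "x" shifts the belief towards "A", because "A" is the state more likely to emit it.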

15
Q

3 Components of an RL Agent ?

A

An RL agent may include one or more of: Policy, Value function, Model

16
Q

Agent’s Policy ?

A

The agent's behavior: a map from state to action, either deterministic, a = π(s), or stochastic, π(a|s) = P[A_t = a | S_t = s].
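Both kinds of policy can be sketched as plain functions. The battery-robot states, actions and probabilities below are hypothetical, chosen only to show a deterministic lookup a = π(s) next to a stochastic π(a|s) that is sampled from.

```python
import random


def deterministic_policy(state):
    """a = pi(s): a hypothetical lookup table mapping each state to one action."""
    table = {"low_battery": "recharge", "high_battery": "search"}
    return table[state]


def stochastic_policy_probs(state):
    """pi(a | s) = P[A_t = a | S_t = s] for a hypothetical two-action problem."""
    if state == "low_battery":
        return {"recharge": 0.9, "search": 0.1}
    return {"recharge": 0.2, "search": 0.8}


def sample_action(state, rng):
    """Draw A_t from pi(. | s)."""
    probs = stochastic_policy_probs(state)
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]


action = sample_action("low_battery", random.Random(0))
```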

17
Q

Agent’s Value Function ?

A

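A prediction of future reward, used to evaluate how good or bad each state (and/or action) is: v_π(s) = E_π[R_{t+1} + γR_{t+2} + γ²R_{t+3} + ... | S_t = s].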
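The expectation in v_π(s) can be approximated by averaging observed discounted returns. A minimal Monte Carlo sketch; the reward sequences and γ = 0.9 are hypothetical, and each list stands for the rewards R_{t+1}, R_{t+2}, ... seen after starting in the same state s.

```python
def discounted_return(rewards, gamma):
    """G_t = R_{t+1} + gamma*R_{t+2} + gamma^2*R_{t+3} + ... for a finite reward list."""
    g = 0.0
    for r in reversed(rewards):  # accumulate backwards: G = r + gamma * G
        g = r + gamma * g
    return g


def mc_value_estimate(episodes, gamma):
    """Monte Carlo estimate of v(s): the average return over episodes starting in s."""
    returns = [discounted_return(ep, gamma) for ep in episodes]
    return sum(returns) / len(returns)


# Hypothetical reward sequences observed after starting in the same state s
episodes = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
v_s = mc_value_estimate(episodes, gamma=0.9)
```

A reward received later counts for less: the first episode's return is 0.9² = 0.81, the second's is 1.0.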

18
Q

Agent’s Model ?

A

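The agent's representation of the environment: it predicts what the environment will do next, i.e. the next state (transitions P) and the next immediate reward (R).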
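A model can be learned from experience. A minimal sketch of a tabular model that estimates P[S_{t+1} = s' | S_t = s, A_t = a] and E[R_{t+1} | S_t = s, A_t = a] by counting observed transitions; the class name and the sample transitions are hypothetical.

```python
from collections import defaultdict


class TabularModel:
    """Estimates P[s' | s, a] and E[r | s, a] from observed transitions by counting."""

    def __init__(self):
        self.next_counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> s' -> count
        self.reward_sums = defaultdict(float)                      # (s, a) -> total reward
        self.visits = defaultdict(int)                             # (s, a) -> count

    def update(self, s, a, r, s_next):
        self.next_counts[(s, a)][s_next] += 1
        self.reward_sums[(s, a)] += r
        self.visits[(s, a)] += 1

    def transition_prob(self, s, a, s_next):
        n = self.visits[(s, a)]
        return self.next_counts[(s, a)][s_next] / n if n else 0.0

    def expected_reward(self, s, a):
        n = self.visits[(s, a)]
        return self.reward_sums[(s, a)] / n if n else 0.0


model = TabularModel()
for s, a, r, s_next in [("s0", "go", 1.0, "s1"), ("s0", "go", 0.0, "s0"), ("s0", "go", 1.0, "s1")]:
    model.update(s, a, r, s_next)
```

Unvisited (s, a) pairs return 0.0 here; that default is an arbitrary choice for the sketch.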

19
Q

3 Categories of RL Agents ?

A

Value-Based
Policy-Based
Actor-Critic

20
Q

Value-based RL Agent ?

A

Has a value function; the policy is implicit (e.g. act greedily with respect to the values).
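"Policy is implicit" means the agent stores no separate policy; it derives actions from its values. A minimal sketch, assuming action values Q(s, a) kept in a dictionary (the states, actions and numbers are hypothetical) and greedy action selection.

```python
def greedy_action(q_values, state, actions):
    """Implicit policy of a value-based agent: pick argmax_a Q(s, a).

    Unseen (state, action) pairs default to 0.0; ties go to the first action listed.
    """
    return max(actions, key=lambda a: q_values.get((state, a), 0.0))


q = {("s0", "left"): 0.2, ("s0", "right"): 0.7}
best = greedy_action(q, "s0", ["left", "right"])
```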

21
Q

Policy-based RL Agent ?

A

Has a policy; no value function.

22
Q

Actor-Critic RL Agent ?

A

Has both a policy and a value function.

23
Q

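2 fundamental problems in sequential decision making ?

A

Reinforcement Learning: the environment is initially unknown; the agent interacts with it and improves its policy
Planning: a model of the environment is known; the agent computes with that model (no external interaction) and improves its policy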

24
Q

Exploration vs Exploitation ?

A

Exploration finds more information about the environment
Exploitation uses known information to maximize reward
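A common way to balance the two is epsilon-greedy action selection, sketched below as one illustrative technique (not the only one). The action values, action names and ε are hypothetical.

```python
import random


def epsilon_greedy(q_values, actions, epsilon, rng):
    """With probability epsilon explore (random action); otherwise exploit (greedy action)."""
    if rng.random() < epsilon:
        return rng.choice(actions)                           # exploration
    return max(actions, key=lambda a: q_values.get(a, 0.0))  # exploitation


choice = epsilon_greedy({"left": 0.2, "right": 0.7}, ["left", "right"], epsilon=0.1, rng=random.Random(0))
```

With ε = 0 the agent only exploits; with ε = 1 it only explores. Small values such as 0.1 keep it mostly greedy while still occasionally trying other actions.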