Intro: David Silver RL Flashcards
3 branches of ML ?
Supervised, Unsupervised, Reinforcement Learning
Why is RL different from other ML ?
1) No supervisor, only a reward signal
2) Feedback is delayed, not instantaneous
3) Time matters (sequential data, not a fixed IID dataset)
4) The agent's actions influence the subsequent data it receives
Reward Hypothesis ?
All goals can be described by the maximization of expected cumulative reward.
Describe the Agent process at each timestep t ?
Executes Action At,
Receives Observation Ot,
Receives Scalar Reward Rt
Describe the Environment at each timestep t ?
Receives Action At
Emits observation Ot+1
Emits Scalar Reward Rt+1
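The agent–environment loop described by the two cards above can be sketched in Python. The `DummyEnv` class and the random agent are hypothetical, purely for illustration; they are not from the lecture:

```python
import random

class DummyEnv:
    """Toy environment: reward 1.0 for action 1, else 0.0 (illustrative only)."""
    def step(self, action):
        observation = random.random()          # environment emits O_{t+1}
        reward = 1.0 if action == 1 else 0.0   # environment emits R_{t+1}
        return observation, reward

def run_episode(env, timesteps=5):
    history = []  # the History: sequence of (A_t, O_{t+1}, R_{t+1})
    for t in range(timesteps):
        action = random.choice([0, 1])     # agent executes action A_t
        obs, reward = env.step(action)     # environment receives A_t
        history.append((action, obs, reward))
    return history

history = run_episode(DummyEnv())
```

Note how the returned `history` is exactly the "History" of the next card: the sequence of actions, observations, and rewards.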
“History” in RL ?
Sequence of Observations, Actions and Rewards
“State” in RL ?
The information used to determine what happens next.
Describe “State” formally.
St = f(Ht)
“Environment State” ?
The environment's private representation (the data it uses to pick the next observation/reward).
“Agent State” ?
The agent's internal representation (the information it uses to pick its next action).
“Information State” ?
An information state (a.k.a. Markov state) contains all useful information from the history.
Markov State ?
The future is independent of the past given the present,
i.e. once the state is known, the history may be thrown away.
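Formally, the lecture defines a state St as Markov if and only if:

```latex
\mathbb{P}[S_{t+1} \mid S_t] = \mathbb{P}[S_{t+1} \mid S_1, \dots, S_t]
```

Conditioning on the full history adds nothing beyond conditioning on the current state.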
Fully Observable Environment ?
Agent directly observes environment state
Partially Observable Environment ?
Agent indirectly observes environment state (formally, a POMDP).
3 Components of an RL Agent ?
One or more of: Policy, Value, Model
Agent’s Policy ?
A map from states to actions that determines the agent's behavior.
Agent’s Value Function ?
A prediction of expected future reward; how good each state (or state-action pair) is.
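As a worked form, the lecture writes the value function as the expected discounted future reward from state s under policy π, with discount factor γ:

```latex
v_{\pi}(s) = \mathbb{E}_{\pi}\left[ R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots \mid S_t = s \right]
```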
Agent’s Model ?
The agent's representation of the environment; predicts what the environment will do next.
3 Categories of RL Agents ?
Value-Based
Policy-Based
Actor-Critic
Value-based RL Agent ?
Has Value Function, Policy is implicit.
Policy-based RL Agent ?
Has Policy Function, Value is implicit.
Actor-Critic RL Agent ?
Has both Value and Policy functions.
2 fundamental problems in RL ?
1) Learning: the environment is initially unknown; the agent improves its policy by interacting with it.
2) Planning: a model of the environment is known; the agent improves its policy by computing with the model.
Exploration vs Exploitation ?
Exploration finds more info about environment
Exploitation leverages known env. info to maximize reward