Intro: David Silver Flashcards
3 branches of ML ?
Supervised, Unsupervised, Reinforcement Learning
Why is RL different from other ML ?
1) No supervisor
2) Only a reward signal
3) Feedback is delayed, not instantaneous
4) Time matters (sequential data, not a fixed i.i.d. dataset)
5) The agent's actions influence the subsequent data it receives.
Reward Hypothesis ?
All goals can be described by the maximization of expected cumulative reward.
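A minimal formalization of this (the return Gt and the discount factor gamma come from later lectures in the course; they are assumed here for concreteness):

```latex
G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots
    = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1},
\qquad \text{goal: select actions to maximise } \mathbb{E}[G_t].
```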
Describe the Agent process at each timestep t ?
Executes action At,
Receives observation Ot,
Receives scalar reward Rt.
Describe the Environment process at each timestep t ?
Receives action At,
Emits observation Ot+1,
Emits scalar reward Rt+1.
(The full agent-environment interaction loop is sketched below.)
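A minimal Python sketch of this interaction loop; `agent`, `env`, and their `act`/`reset`/`step` methods are hypothetical stand-ins, not part of the lecture:

```python
def run_episode(agent, env, max_steps=1000):
    """Run one agent-environment episode of at most max_steps steps."""
    observation, reward = env.reset(), 0.0  # initial observation, no reward yet
    for t in range(max_steps):
        action = agent.act(observation, reward)        # agent executes A_t
        observation, reward, done = env.step(action)   # env emits O_{t+1}, R_{t+1}
        if done:                                       # episode terminated
            break
```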
“History” in RL ?
The sequence of observations, actions, and rewards up to time t.
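In the lecture's notation, the history up to time t is:

```latex
H_t = O_1, R_1, A_1, \dots, A_{t-1}, O_t, R_t
```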
“State” in RL ?
The information used to determine what happens next.
Describe “State” formally.
St = f(Ht)
“Environment State” ?
The environment's private representation, i.e. whatever data it uses to pick the next observation and reward.
“Agent State” ?
The agent's internal representation, i.e. whatever information the agent uses to pick its next action.
“Information State” ?
An information state (a.k.a. Markov state) contains all useful information from the history.
Markov State ?
The future is independent of the past given the present.
i.e. once the state is known, the past history may be thrown away.
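Formally (in the course's notation), a state St is Markov if and only if:

```latex
\mathbb{P}\left[ S_{t+1} \mid S_t \right] = \mathbb{P}\left[ S_{t+1} \mid S_1, \dots, S_t \right]
```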
Fully Observable Environment ?
The agent directly observes the environment state (observation = agent state = environment state); formally, a Markov Decision Process (MDP).
Partially Observable Environment ?
The agent indirectly observes the environment state (agent state differs from environment state); formally, a Partially Observable MDP (POMDP), so the agent must construct its own state representation.
3 Components of an RL Agent ?
One or more of: Policy, Value function, Model.
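For reference, the standard definitions of the three components from the course; the value function is written with the discounted return, where the discount gamma is the same assumption as above:

```latex
\text{Policy:} \quad \pi(a \mid s) = \mathbb{P}[A_t = a \mid S_t = s]
\text{Value:}  \quad v_\pi(s) = \mathbb{E}_\pi\left[ R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots \mid S_t = s \right]
\text{Model:}  \quad \mathcal{P}^a_{ss'} = \mathbb{P}[S_{t+1} = s' \mid S_t = s, A_t = a], \qquad
               \mathcal{R}^a_s = \mathbb{E}[R_{t+1} \mid S_t = s, A_t = a]
```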