Reinforcement Learning Flashcards
What is reinforcement learning?
An agent learns by interacting with its environment and receiving positive and negative feedback.
Reward and Cost
What are applications of reinforcement learning?
Robotics / Autonomous Vehicles
Games
Web navigation and chatbots
Recommender Systems
What is S-A-R?
S - Set of States
A - Set of Actions that can be taken in the states
R - Reward Function
Why can agent-environment interaction be said to form a closed loop?
The agent receives information about the environment in the form of a state S and a reward R at time t, and takes an action.
The action then changes the environment, leading to a new state S and reward R at time t+1.
And so it goes on.
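A minimal sketch of this loop, assuming a made-up two-state environment and an agent that picks actions at random. The names `STATES`, `ACTIONS`, `step`, and `run_episode` are hypothetical and exist only for illustration:

```python
import random

# Hypothetical toy environment, invented for illustration only.
STATES = ["left", "right"]
ACTIONS = ["stay", "move"]

def step(state, action):
    """Environment: return (next_state, reward) for the given state and action."""
    if action == "move":
        next_state = "right" if state == "left" else "left"
    else:
        next_state = state
    # Assumed reward function: being in "right" pays 1, otherwise 0.
    reward = 1.0 if next_state == "right" else 0.0
    return next_state, reward

def run_episode(num_steps, seed=0):
    """Run the agent-environment loop and record every transition."""
    rng = random.Random(seed)
    state, reward = "left", 0.0  # S_t and R_t at t = 0
    history = []
    for t in range(num_steps):
        # The agent observes S_t (and R_t) and picks an action; random policy here.
        action = rng.choice(ACTIONS)
        # The environment responds with S_{t+1} and R_{t+1}.
        next_state, next_reward = step(state, action)
        history.append((t, state, reward, action, next_state, next_reward))
        # The new state and reward feed back into the agent: the loop closes.
        state, reward = next_state, next_reward
    return history

history = run_episode(5)
```

Each recorded transition starts from the state the previous one ended in, which is what makes the loop closed.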
What is the Markov Property?
The value of any action the agent chooses depends only on the present state and not on the previous states.
The current state the agent is in contains all the information needed for taking the optimal action.
=> Independence of path
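In transition-probability form, the property is commonly written as follows. The symbols S_t (state at time t), A_t (action at time t), and P are notational assumptions, not taken from the card itself:

```latex
P(S_{t+1} \mid S_t, A_t) = P(S_{t+1} \mid S_1, A_1, \ldots, S_t, A_t)
```

Conditioning on the full history gives the same distribution over the next state as conditioning on the present state and action alone.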
What is the use/function of the Markov Property?
- To take the optimal action given knowledge of the state of the world
- This is difficult if we have to consider all past actions that have led up to the present state
- The goal is simpler if we can summarize the state of the world by a single value that is independent of any set of past states
- Formulating a decision-making process as Markovian makes the mathematics and computation of the optimal action easier and more communicable.
What is temporal difference learning?
A learning algorithm that allows actions to be selected that account for present and predicted future reward given the current state.
=> Builds on a Markov decision process.
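One simple form is the TD(0) update for state values, sketched below. The function name `td0_update`, the learning rate `alpha`, the discount factor `gamma`, and the example states are assumptions chosen for illustration, not part of the card:

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one TD(0) update to values[state] in place and return the TD error.

    values: dict mapping each state to its current value estimate.
    reward: reward received on the transition state -> next_state.
    alpha:  learning rate (assumed value).
    gamma:  discount factor for predicted future reward (assumed value).
    """
    # Target combines the present reward with the predicted future value.
    td_target = reward + gamma * values[next_state]
    # TD error: how far the current estimate is from that target.
    td_error = td_target - values[state]
    # Move the estimate a fraction alpha toward the target.
    values[state] += alpha * td_error
    return td_error

# Hypothetical transition: from state "A" to state "B" with reward 1.0.
values = {"A": 0.0, "B": 0.0}
error = td0_update(values, "A", 1.0, "B")
```

The update uses only the current state, the reward, and the next state, so it relies on the Markov property rather than on the full history.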