Reinforcement Learning Flashcards

Question 1

Q

What is reinforcement learning?

Answer

A

An agent learns by interacting with an environment and receiving positive and negative feedback.

The feedback takes the form of reward and cost.

Question 2

Q

What are applications of reinforcement learning?

Answer

A

Robotics / Autonomous Vehicles

Games

Web navigation and chatbots

Recommender Systems

Question 3

Q

What is S-A-R?

Answer

A

S - Set of States
A - Set of Actions that can be taken in the states
R - Reward Function
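
As a rough illustration, the three components can be written down for a tiny made-up problem. The state names, action names, and reward values below are assumptions invented for this sketch, not part of any standard task.

```python
# Hypothetical three-state problem; every name and number here is illustrative.
STATES = {"start", "middle", "goal"}   # S: set of states
ACTIONS = {"forward", "back"}          # A: set of actions available in the states


def reward(state, action, next_state):
    """R: reward function. Reaching the goal pays 1.0, any other step costs 0.1."""
    return 1.0 if next_state == "goal" else -0.1


print(reward("middle", "forward", "goal"))
print(reward("start", "forward", "middle"))
```

Here the reward depends on the state the agent lands in; other formulations define it on the state alone or on the state-action pair.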

Question 4

Q

Why can agent-environment interaction be said to form a closed loop?

Answer

A

The agent receives information about the environment in the form of a state S and a reward R at time t, and takes an action A.

The action then changes the environment, leading to a new state S and a new reward R at time t+1.

The new state and reward inform the next action, and the cycle repeats.
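
The loop can be sketched in a few lines of Python. The two-state environment, the `step` function, and the `go_right_policy` policy are hypothetical and exist only to make the cycle concrete.

```python
# Hypothetical two-state environment, invented purely for illustration.
def step(state, action):
    """Environment: return (next_state, reward) for the given state and action."""
    if action == "move":
        next_state = "right" if state == "left" else "left"
    else:
        next_state = state
    reward = 1.0 if next_state == "right" else 0.0
    return next_state, reward


def go_right_policy(state):
    """Agent: move when on the left, stay when on the right."""
    return "move" if state == "left" else "stay"


def run_episode(policy, start_state="left", num_steps=5):
    """Run the closed loop and record (t, state, action, reward, next_state)."""
    state = start_state
    history = []
    for t in range(num_steps):
        action = policy(state)                    # agent acts on the state at time t
        next_state, reward = step(state, action)  # environment answers with state and reward at t+1
        history.append((t, state, action, reward, next_state))
        state = next_state                        # the new state feeds the next decision
    return history


history = run_episode(go_right_policy, num_steps=3)
for record in history:
    print(record)
```

Each pass through the `for` loop is one turn of the cycle: the output of the environment becomes the input of the agent, and vice versa.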

Question 5

Q

What is the Markov Property?

Answer

A

The value of any action the agent chooses depends only on the present state, not on previous states.

The current state the agent is in contains all the information needed to take the optimal action.

=> Independence of path
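
One common way to state this formally is shown below. The symbols S_t, A_t, and R_t for the state, action, and reward at time t are notation assumed for this sketch.

```latex
\[
\Pr\left(S_{t+1}=s',\, R_{t+1}=r \mid S_t, A_t\right)
= \Pr\left(S_{t+1}=s',\, R_{t+1}=r \mid S_0, A_0, R_1, \dots, S_t, A_t\right)
\]
```

In words: conditioning on the full history gives the same prediction as conditioning on the current state and action alone.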

Question 6

Q

What is the use/function of the Markov Property?

Answer

A

The goal is to take the optimal action given knowledge of the state of the world.
This is difficult if we have to consider all past actions that led up to the present state.
The goal becomes simpler if we can summarize the state of the world by a single value that is independent of any set of past states.
Formulating a decision-making process as Markovian makes the mathematics and computation of the optimal action easier and more communicable.

Question 7

Q

What is temporal difference learning?

Answer

A

A learning algorithm that allows actions to be selected that account for both present and predicted future reward, given the current state.
=> Applies within a Markov decision process.
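
A minimal sketch of one member of this family, the TD(0) state-value update, is given below. The function name `td0_update`, the step size `alpha`, the discount factor `gamma`, and the two-state value table are assumptions chosen for illustration.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one TD(0) update to the value table in place and return the TD error."""
    # Target combines the present reward with the discounted predicted future value.
    td_target = reward + gamma * values[next_state]
    td_error = td_target - values[state]
    # Move the current estimate a fraction alpha toward the target.
    values[state] += alpha * td_error
    return td_error


# Hypothetical value table for a two-state problem, all estimates starting at zero.
values = {"left": 0.0, "right": 0.0}
td_error = td0_update(values, "left", reward=1.0, next_state="right")
print(values, td_error)
```

This update only estimates state values; methods such as Q-learning or SARSA apply the same idea to state-action values so that the estimates can drive action selection directly.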