Reinforcement Learning Flashcards

1
Q

What is reinforcement learning?

A

An agent learns by interacting with its environment, guided by positive and negative feedback.

Feedback takes the form of reward and cost.

2
Q

What are some applications of reinforcement learning?

A

Robotics / Autonomous Vehicles

Games

Web navigation and chatbots

Recommender Systems

3
Q

What is S-A-R?

A

S - Set of States
A - Set of Actions that can be taken in the states
R - Reward Function

4
Q

Why can agent-environment interaction be said to form a closed loop?

A

The agent receives information about the environment in the form of a state S and a reward R at time t, and takes an action.

The action then changes the environment, leading to a new state S and a new reward R at time t+1.

And so it goes on.
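
The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ToyEnv`, `run_loop`, and the reward rule are hypothetical names and choices made for this example, not part of the flashcard source.

```python
import random


class ToyEnv:
    """Hypothetical two-state environment, used only for illustration."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.state = 0

    def step(self, action):
        # Assumed reward rule: +1 if the action matches the current state, else -1.
        reward = 1 if action == self.state else -1
        # The action's step moves the environment to a new (here random) state.
        self.state = self.rng.randint(0, 1)
        return self.state, reward


def run_loop(env, policy, steps):
    """Run the agent-environment loop for a fixed number of time steps."""
    state = env.state
    total_reward = 0
    history = []
    for t in range(steps):
        action = policy(state)                  # agent acts on state S at time t
        next_state, reward = env.step(action)   # environment returns S and R at t+1
        history.append((t, state, action, reward, next_state))
        total_reward += reward
        state = next_state                      # the new state feeds the next decision
    return total_reward, history


# A policy that simply echoes the state earns +1 at every step in this toy setup.
total_reward, history = run_loop(ToyEnv(), policy=lambda s: s, steps=5)
```

The loop is closed because each `next_state` becomes the `state` that the next action is chosen from.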

5
Q

What is the Markov Property?

A

The value of any action the agent chooses depends only on the present state, not on previous states.

The current state the agent is in contains all the information needed to take the optimal action.

=> Independence of path
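
One common way to write this formally, assuming the usual notation of states S_t, actions A_t, and rewards R_t indexed by time step t (the notation is an assumption; the card itself gives no formula):

```latex
\Pr\left(S_{t+1} = s',\, R_{t+1} = r \mid S_t, A_t\right)
  = \Pr\left(S_{t+1} = s',\, R_{t+1} = r \mid S_0, A_0, S_1, A_1, \dots, S_t, A_t\right)
```

Conditioning on the full history adds nothing beyond the current state and action, which is what "independence of path" expresses.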

6
Q

What is the use/function of the Markov Property?

A
  1. The goal is to take the optimal action given knowledge of the state of the world.
  2. This is difficult if we have to consider all past actions that have led up to the present state.
  3. The goal becomes simpler if we can summarize the value of the state of the world by a single value that is independent of any set of past states.
  4. Formulating a decision-making process as Markovian makes the mathematics and computation of the optimal action easier and easier to communicate.
7
Q

What is temporal difference learning?

A

A learning algorithm that allows actions to be selected in a way that accounts for both the present reward and the predicted future reward, given the current state.
=> Assumes a Markov decision process.
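
As a rough illustration of how present and predicted future reward are combined, here is a minimal sketch of the simplest variant, TD(0) state-value prediction. The card does not name a specific variant, so TD(0), the function name `td0_update`, the three-state chain, and the values of `alpha` and `gamma` are all assumptions chosen for this example.

```python
def td0_update(values, state, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one TD(0) update to values[state] and return the TD error."""
    # Target = present reward + discounted prediction of future reward.
    td_target = reward + gamma * values[next_state]
    td_error = td_target - values[state]
    # Move the current estimate a fraction alpha toward the target.
    values[state] += alpha * td_error
    return td_error


# Hypothetical chain 0 -> 1 -> 2, where state 2 is terminal and
# reaching it yields a reward of 1.
values = [0.0, 0.0, 0.0]
episode = [(0, 0.0, 1), (1, 1.0, 2)]  # (state, reward, next_state)

for _ in range(200):
    for state, reward, next_state in episode:
        td0_update(values, state, reward, next_state)
```

With these assumed settings the estimate for state 1 approaches 1 and the estimate for state 0 approaches 0.9, the reward discounted by one step. Each update uses only the current state, the reward, and the next state, which is where the Markov assumption enters.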
