1 - Intro Flashcards
Reinforcement Learning is about how computer systems can learn to ___________________.
Reinforcement Learning is about how computer systems can learn to take actions in an environment to maximize their cumulative reward over time.
The Markov Assumption assumes that in order to predict the future you’ll use ___________________.
The Markov Assumption assumes that in order to predict the future you’ll use only the present state’s information. Another way of putting this is that the system is “memory-less”: the future states of a process depend only on the present state and not on any of the states that preceded it.
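In symbols, a standard way to write the Markov property (with s_t denoting the state at time t; this notation is assumed, not from the card itself):

P(s_{t+1} \mid s_t) = P(s_{t+1} \mid s_1, s_2, \ldots, s_t)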
What is the difference between state and history?
A state is generally a function of the history. There could be other information an agent would ideally like to use, but we constrain ourselves to defining the state as a function of the history, in other words, of the observations seen so far, the actions the agent has taken, and the rewards it has received.
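Written out, assuming h_t denotes the history up to time t and f is some state-construction function (notation assumed for illustration):

h_t = (o_1, a_1, r_1, \ldots, o_t, a_t, r_t), \qquad s_t = f(h_t)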
What does POMDP stand for?
POMDP stands for Partially Observable Markov Decision Process. In a POMDP, the agent’s state is not the same as the world state: the agent constructs its own state, for example from the history or from a belief over the world state (e.g., poker, where an opponent’s cards are hidden).
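One common way an agent maintains such a belief is Bayesian filtering. A standard belief-update rule (assumed here, not from the card: T is a transition model, O an observation model, a the action taken, o the observation received) is:

b'(s') \propto O(o \mid s', a) \sum_{s} T(s' \mid s, a) \, b(s)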
What is a policy in Reinforcement Learning?
The policy is a function that maps an agent’s states to actions. In basic terms, it is how the agent makes decisions or chooses an action. A policy can be either deterministic or stochastic, and it is conventionally denoted by the Greek letter pi (π).
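As a minimal sketch of the two kinds of policy (the states, actions, and probabilities below are hypothetical, purely for illustration):

```python
import random

# Deterministic policy: pi(s) returns exactly one action per state.
deterministic_pi = {"cold": "heat", "hot": "cool"}

# Stochastic policy: pi(a | s) is a probability distribution over actions.
stochastic_pi = {
    "cold": {"heat": 0.9, "cool": 0.1},
    "hot":  {"heat": 0.1, "cool": 0.9},
}

def act_deterministic(state):
    return deterministic_pi[state]

def act_stochastic(state):
    actions, probs = zip(*stochastic_pi[state].items())
    return random.choices(actions, weights=probs, k=1)[0]

print(act_deterministic("cold"))  # always "heat"
print(act_stochastic("cold"))     # "heat" about 90% of the time
```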
How is a policy evaluated?
We can do this with the value function, which is the expected discounted sum of future rewards under a particular policy pi. It quantifies the goodness or badness of states and actions, and it allows the agent to decide how to act by letting us compare policies.
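Written out, with discount factor \gamma \in [0, 1) (standard notation, assumed here):

V^{\pi}(s) = \mathbb{E}_{\pi}\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t} \mid s_0 = s \right]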
What is a transition matrix for a Markov process?
A transition matrix is a square matrix with non-negative real entries whose rows each sum to 1. It describes the transitions of a Markov chain: the element in the ith row and jth column gives the probability of moving from state i to state j in one time step.
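A minimal sketch, using a hypothetical 3-state chain (the probabilities are made up for illustration):

```python
import numpy as np

# P[i, j] = probability of moving from state i to state j in one step.
P = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
])

assert np.allclose(P.sum(axis=1), 1.0)  # each row is a probability distribution

# Distribution over states after two steps, starting in state 0:
start = np.array([1.0, 0.0, 0.0])
print(start @ P @ P)
```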
What is a horizon in Reinforcement Learning?
The horizon refers to how many steps into the future from the present state the agent considers when accounting for the rewards it could obtain. Horizons can be either finite or infinite. For instance, with a horizon of 1, only the next reward would enter the calculation.
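For a finite horizon H (standard notation, assumed here), the return considered from time t is:

G_t = \sum_{k=1}^{H} r_{t+k}

so a horizon of 1 leaves only the immediate next reward r_{t+1}.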
Value iteration is more important in the applied setting. Value iteration is a method for computing an optimal MDP policy and its value. The Bellman equation is very important. The Bellman equation provides a solution for ___________________.
Value iteration is more important in the applied setting. Value iteration is a method for computing an optimal MDP policy and its value. The Bellman equation is very important. The Bellman equation provides a solution for finding the unknown function V, or the value function, which describes the best possible value of the objective as a function of the state s. It thus allows us to express the value of a state (or state-action pair) in terms of the values of its successors, in essence breaking a complex problem into a sequence of smaller subproblems.
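As a worked equation, the Bellman optimality equation (in its standard form, assumed here) is

V(s) = \max_a \left[ R(s, a) + \gamma \sum_{s'} P(s' \mid s, a) \, V(s') \right]

and the sketch below applies it repeatedly as a value-iteration sweep on a tiny randomly generated MDP (all numbers hypothetical, purely for illustration):

```python
import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9

rng = np.random.default_rng(0)
# P[a, s] is a distribution over next states; R[s, a] is the reward for a in s.
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
R = rng.random((n_states, n_actions))

V = np.zeros(n_states)
for _ in range(1000):
    # Bellman optimality backup: V(s) = max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
    Q = R + gamma * np.einsum("asn,n->sa", P, V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:  # stop once values have converged
        break
    V = V_new

pi = Q.argmax(axis=1)  # greedy policy with respect to the converged values
print(V, pi)
```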