Reinforcement Learning Flashcards
Set of states that the agent may currently be in
Belief state
If the environment is deterministic, an action taken by the agent should result in a belief state of __ size compared to the original
Smaller or equal
If the environment is STOCHASTIC, an action taken by the agent should result in a belief state of ___ size compared to the original
GREATER
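The two cards above can be illustrated with a small sketch. The `predict` helper and both transition tables are hypothetical, invented for this example: in the deterministic table every state has one successor per action, while in the stochastic table the action may also "slip" and leave the agent where it was.

```python
def predict(belief, action, transitions):
    """Return every state reachable from some state in `belief`.

    `transitions` maps (state, action) to the set of possible next states.
    This is a toy model made up for illustration.
    """
    next_belief = set()
    for state in belief:
        next_belief |= transitions[(state, action)]
    return next_belief


# Deterministic: exactly one successor per (state, action) pair.
deterministic = {
    ("A", "right"): {"B"},
    ("B", "right"): {"C"},
    ("C", "right"): {"C"},  # wall: the agent stays in place
}

# Stochastic: the move may succeed or leave the agent in place.
stochastic = {
    ("A", "right"): {"A", "B"},
    ("B", "right"): {"B", "C"},
    ("C", "right"): {"C"},
}

belief = {"A", "B"}
det_belief = predict(belief, "right", deterministic)       # same size as before
shrunk = predict({"B", "C"}, "right", deterministic)       # smaller than before
sto_belief = predict(belief, "right", stochastic)          # larger than before
```

With these tables, the deterministic prediction never has more states than the belief it started from, while the stochastic prediction from `{"A", "B"}` grows to three states.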
Repeatedly executing one action from the solution path and then observing the local environment is called
Act-observe cycle
Can learn to EXPLORE THE TERRITORY, learn WHERE THE REWARDS ARE, and then LEARN THE OPTIMAL POLICY
Uses OBSERVED REWARDS/PUNISHMENTS to learn an OPTIMAL POLICY for an environment
Reinforcement learning
Type of RL that has a fixed policy to execute and learns the reward function and state utilities while executing that fixed policy
PASSIVE RL
Type of RL that changes its policy as it learns the reward function and looks for the optimal control policy
Active RL
Formula for updating U[S]
U[S] ← U[S] + learning rate * (reward[S] + discount factor * U[S'] - U[S])
Learning rate = 1 / (N[S] + 1)
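A minimal sketch of the update on the card above, reading the learning rate as 1 / (N[S] + 1). The function name `td_update`, the state labels, and the choice to increment N[S] after the update are assumptions made for this example; the card does not specify them.

```python
from collections import defaultdict


def td_update(U, N, state, next_state, reward, gamma):
    """Apply one temporal-difference update to U[state] in place."""
    alpha = 1.0 / (N[state] + 1)  # learning rate shrinks as visits grow
    U[state] += alpha * (reward + gamma * U[next_state] - U[state])
    N[state] += 1  # assumed convention: count the visit after updating
    return U[state]


U = defaultdict(float)  # utility estimates, all start at 0
N = defaultdict(int)    # visit counts, all start at 0
U["goal"] = 1.0         # assumed utility of a terminal state

# First visit to s1: alpha = 1, so U[s1] jumps straight to the target.
first = td_update(U, N, "s1", "goal", reward=0.0, gamma=0.5)
# Second visit: alpha = 1/2, so U[s1] moves halfway toward the new target.
second = td_update(U, N, "s1", "goal", reward=1.0, gamma=0.5)
```

Here `first` is 0.5 and `second` is 1.0: the second target is 1.5, and with a learning rate of 1/2 the estimate moves halfway from 0.5 toward it.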
Active reinforcement learning algorithm that CHANGES THE CONTROL POLICY AFTER K ITERATIONS of temporal difference learning
Greedy reinforcement learning
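One way to picture the card above is a loop that runs K temporal-difference updates under a fixed policy and only then switches to the policy that looks best under the current utilities. Everything in this sketch is an assumption made for illustration: the four-cell corridor, the `step` model, the restart at state 0, and the one-step lookahead used to pick the greedy action (which presumes the agent can query the model).

```python
import random

# Hypothetical corridor: states 0..3, reward 1 for entering state 3.
STATES = [0, 1, 2, 3]
ACTIONS = ["right", "left"]
GOAL = 3


def step(state, action):
    """Deterministic toy model: move one cell, clipped to the corridor."""
    nxt = min(state + 1, GOAL) if action == "right" else max(state - 1, 0)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward


def lookahead(U, state, action, gamma):
    """Value of taking `action` in `state` under the current utilities."""
    nxt, reward = step(state, action)
    return reward + gamma * U[nxt]


def greedy_policy(U, gamma):
    """In every state, pick the action with the best one-step lookahead."""
    return {
        s: max(ACTIONS, key=lambda a: lookahead(U, s, a, gamma))
        for s in STATES
    }


def greedy_td(k=20, rounds=10, gamma=0.9, seed=0):
    rng = random.Random(seed)
    U = {s: 0.0 for s in STATES}
    N = {s: 0 for s in STATES}
    policy = {s: rng.choice(ACTIONS) for s in STATES}  # arbitrary start
    state = 0
    for _ in range(rounds):
        for _ in range(k):  # K TD updates while the policy stays fixed
            nxt, reward = step(state, policy[state])
            alpha = 1.0 / (N[state] + 1)
            U[state] += alpha * (reward + gamma * U[nxt] - U[state])
            N[state] += 1
            state = 0 if nxt == GOAL else nxt  # restart after the goal
        policy = greedy_policy(U, gamma)  # policy changes only here
    return policy, U


policy, U = greedy_td()
```

Because the agent never deliberately explores, a greedy learner like this can settle on a poor policy in states it rarely visits; the sketch makes no attempt to correct that.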
Agent knows nothing except WHAT IS LOCALLY AVAILABLE
AGENT EXPLORES ITS SURROUNDINGS, CHECKING FOR REWARDS BASED ON GOALS
Reinforcement learning
Allows us to make sense of previous data
Machine learning
A passive RL algorithm that moves from one state to another
Takes note of the difference between 2 states and computes the value of each state depending on WHERE IT LEADS
Temporal difference learning
Agent knows nothing except what is locally available
Agent explores its surroundings, checking for rewards and penalties, then learns what to do
Reinforcement learning
Agent finds the program / control policy for a given problem using the data supplied to the agent
Machine learning