week 8 - reinforcement learning Flashcards
what is the difference between supervised learning, unsupervised learning and reinforcement learning?
supervised = learns a mapping between data and labels
unsupervised = discovers patterns in the data
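reinforcement = learns which actions to take from a reward signal obtained by interacting with an environment (it is not given labelled examples, so it is not a form of supervised learning)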
what is a pro of reinforcement learning?
it can succeed in solving very complex problems that other ML models can't
e.g. it can be trained to play games like Go
it can also be used to explain human learning. Lots of evidence suggests that components of reinforcement learning algorithms appear to be represented in the brain
how do we represent a reinforcement learning problem?
As a Markov decision process (MDP)
An MDP has 4 components:
State
Action
Transition probabilities (the probability of transitioning to another state given the current state and a particular action)
Reward (the reward signal received after taking an action in a state)
the goal in an MDP is to find the optimal policy: the way of choosing actions in each state that maximises the expected cumulative reward
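A minimal sketch of how an MDP could be written down in Python, to make the components concrete. The state names, action names, probabilities and reward values are all made up for illustration; the reward table stands for the reward signal that reinforcement learning learns from.

```python
# Hypothetical two-state MDP; every name and number here is illustrative.
states = ["cold", "warm"]
actions = ["wait", "heat"]

# transitions[(state, action)] -> {next_state: probability}
transitions = {
    ("cold", "wait"): {"cold": 1.0},
    ("cold", "heat"): {"warm": 0.9, "cold": 0.1},
    ("warm", "wait"): {"warm": 0.5, "cold": 0.5},
    ("warm", "heat"): {"warm": 1.0},
}

# rewards[(state, action)] -> immediate reward for taking that action
rewards = {
    ("cold", "wait"): 0.0,
    ("cold", "heat"): -1.0,
    ("warm", "wait"): 2.0,
    ("warm", "heat"): 1.0,
}

# A policy maps each state to the action to take there.
policy = {"cold": "heat", "warm": "wait"}


def is_valid_mdp(transitions, tol=1e-9):
    """Check that the outgoing probabilities of every (state, action) sum to 1."""
    return all(abs(sum(p.values()) - 1.0) < tol for p in transitions.values())
```

The policy shown is just one possible policy, not necessarily the optimal one; finding the optimal one is what the learning algorithm is for.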
what can be represented as a markov decision process?
almost any problem in which an agent makes a sequence of decisions
e.g. motion control problems, social interactions, games
what is model-free reinforcement learning?
the process of learning which actions produce the most reward, through trial and error
this can be achieved with temporal difference learning
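A rough sketch of the simplest temporal difference update, TD(0) on state values, assuming a value table V stored as a dict. The function name, the state names "A" and "B", and the values of alpha (learning rate) and gamma (discount factor) are illustrative assumptions, not from the lecture.

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """Nudge V[s] towards the one-step target r + gamma * V[s_next]."""
    td_error = r + gamma * V[s_next] - V[s]  # how surprising the outcome was
    V[s] += alpha * td_error
    return td_error


# Hypothetical experience: moved from state "A" to state "B" and received reward 1.
V = {"A": 0.0, "B": 0.0}
td_error = td0_update(V, "A", 1.0, "B")
```

The TD error is the gap between what was predicted and what was actually observed one step later; learning happens by shrinking that gap a little after every step of trial and error.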
what is the SARSA algorithm?
SARSA: State-Action-Reward-State-Action, an on-policy reinforcement learning algorithm.
Model-free: Does not require knowledge of the environment’s dynamics.
Action-value function Q(s, a): Estimates the expected cumulative reward of taking action a in state s and following the policy.
Update Rule: Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_{t+1} + γ Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) ]
On-policy: Updates Q(s_t, a_t) based on the action actually taken, considering the next action a_{t+1}.
Exploration-exploitation: Uses ε-greedy policy to balance exploration and exploitation.
Goal: Learn an optimal policy by iteratively updating Q-values through interaction with the environment.
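The points above can be sketched as a small tabular SARSA loop. The environment is a made-up 5-cell corridor (start in cell 0, reward 1 for reaching cell 4), and the function names and hyperparameter values are assumptions chosen for illustration only.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the terminal goal
ACTIONS = [-1, 1]   # step left or step right


def step(state, action):
    """Hypothetical environment: reward 1 on reaching the goal, else 0."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def epsilon_greedy(Q, state, epsilon, rng):
    """Explore with probability epsilon, otherwise pick a best-valued action."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(Q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if Q[(state, a)] == best])


def sarsa(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        a = epsilon_greedy(Q, s, epsilon, rng)
        done = False
        while not done:
            s_next, r, done = step(s, a)
            # On-policy: the next action is chosen by the same policy
            # and is the one actually taken on the next step.
            a_next = epsilon_greedy(Q, s_next, epsilon, rng)
            # No future value is added once the episode has ended.
            target = r if done else r + gamma * Q[(s_next, a_next)]
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s, a = s_next, a_next
    return Q


Q = sarsa()
# Greedy action per non-terminal state according to the learned Q-values.
greedy_policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)}
```

The five values used in each update (s, a, r, s_next, a_next) are where the name State-Action-Reward-State-Action comes from. In a corridor like this one would expect the greedy action to end up being "step right", though the exact Q-values depend on the seed and hyperparameters.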
sarsa
go over this to understand how it works better, when i’m less sleep deprived