lecture 3 - reinforcement learning
value-based decision making
used because many decisions are driven by the subjective value of the options rather than by objective stimulus properties, unlike the perceptual decisions modeled before (DDM, SDT)
reinforcement learning in AI
how do agents learn to behave in an environment
reinforcement learning in psychology
how do humans learn from rewards
basic mechanisms of interest when using RL in neural models of cognitive processes:
- decision-making
- learning
classical conditioning
- a process in which a new stimulus-response connection is formed through association: a previously neutral stimulus is paired with an unconditioned stimulus and comes to elicit a response on its own
- conditioning happens even without any action from the agent
key stages of classical conditioning
- Before Conditioning: Unconditioned stimulus (US) elicits an unconditioned response (UR). Neutral stimulus (NS) produces no response.
- During Conditioning: Neutral stimulus (NS) is paired with US, leading to UR.
- After Conditioning: The NS becomes a conditioned stimulus (CS), eliciting a conditioned response (CR).
acquisition
the process by which a neutral stimulus gains associative value, leading to a learned response
extinction
the learned association can weaken and eventually disappear if the conditioned stimulus is not reinforced by the unconditioned stimulus
Kamin blocking
a previously learned association prevents the formation of a new association with a second stimulus
rescorla-wagner model
The model learns to predict rewards or punishments from the prediction error (δ), the difference between the actual reward and the expected reward.
rescorla-wagner model: formula
prediction error (δ) = actual reward - predicted reward
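To make the formula concrete, a minimal Python sketch with made-up numbers (0.3 and 1.0 are purely illustrative):

```python
# Worked example with hypothetical numbers:
# the agent expected a reward of 0.3 but received 1.0.
actual_reward = 1.0     # reward actually delivered
predicted_reward = 0.3  # current expectation
delta = actual_reward - predicted_reward
print(delta)  # 0.7 -> positive surprise, so the association strengthens
```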
delta-rule
how much learning occurs on each trial, based on:
- prediction error
- learning rate
- stimulus salience
delta-rule: formula
ΔV=αβ(λ−ΣV)
name the components: ΔV=αβ(λ−ΣV)
ΔV: Amount of learning on a given trial.
α: Learning rate.
β: Salience of the stimulus.
λ: Asymptote of learning (maximum value).
ΣV: Total amount learned so far (expectation).
(λ−ΣV): prediction error (δ)
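A minimal Python sketch of the delta rule across a block of acquisition trials; the parameter values are illustrative assumptions, not taken from the lecture:

```python
# Delta-rule (Rescorla-Wagner) update over acquisition trials.
# All parameter values are illustrative assumptions.
alpha = 0.3  # α: learning rate
beta = 1.0   # β: stimulus salience
lam = 1.0    # λ: asymptote of learning

V = 0.0      # ΣV: total associative strength learned so far
for trial in range(1, 11):
    delta = lam - V            # prediction error (λ − ΣV)
    dV = alpha * beta * delta  # ΔV = αβ(λ − ΣV)
    V += dV
    print(f"trial {trial:2d}: δ = {delta:.3f}, ΔV = {dV:.3f}, ΣV = {V:.3f}")
# δ shrinks each trial, so learning slows as ΣV approaches λ.
```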
delta rule: prediction error
- difference between the value of the feedback (λ) and the current expectation (ΣV) based on prior experience
- when the prediction error is large, learning occurs more rapidly
the larger δ is, the bigger the adjustment made on that trial; when δ = 0 the prediction was perfect and no adjustment is needed
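The same update rule also explains Kamin blocking (the card above): once stimulus A fully predicts the reward, δ = 0 on compound A+B trials, so stimulus B acquires no associative strength. A Python sketch with illustrative parameter values:

```python
# Kamin blocking from the delta rule (illustrative parameter values).
# Phase 1: stimulus A alone is trained to asymptote.
# Phase 2: A and B appear together; ΣV sums over all present stimuli.
alpha, beta, lam = 0.3, 1.0, 1.0
V_A, V_B = 0.0, 0.0

for _ in range(30):                  # phase 1: A alone -> V_A ≈ λ
    V_A += alpha * beta * (lam - V_A)

for _ in range(30):                  # phase 2: compound A+B trials
    delta = lam - (V_A + V_B)        # A already predicts the reward
    V_A += alpha * beta * delta
    V_B += alpha * beta * delta
print(round(V_A, 3), round(V_B, 3))  # V_B stays near 0: blocked
```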
delta rule: value of the learning rate
- determines the steepness of the learning curve
- higher α leads to faster increases in association strength.
- lower α results in slower learning over trials.
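A small comparison of two arbitrarily chosen learning rates, holding everything else fixed:

```python
# Compare acquisition curves for two arbitrary learning rates.
lam, beta = 1.0, 1.0
for alpha in (0.1, 0.5):
    V = 0.0
    curve = []
    for _ in range(10):
        V += alpha * beta * (lam - V)
        curve.append(round(V, 2))
    print(f"α = {alpha}: {curve}")
# α = 0.5 approaches the asymptote in far fewer trials than α = 0.1.
```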
delta rule: λ
- asymptote of learning (the maximum associative strength the US supports)
- a larger λ leads to a steeper learning curve, as the initial prediction error will be larger
- extinction happens when λ = 0 (downward curve)
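Extinction falls out of the same update rule once λ is set to 0; a sketch, assuming acquisition already finished at ΣV = 1:

```python
# During extinction the CS is no longer followed by the US, so λ = 0:
# the prediction error turns negative and ΣV decays back toward 0.
alpha, beta = 0.3, 1.0
V = 1.0  # associative strength at the end of acquisition (assumed)
for trial in range(1, 11):
    delta = 0.0 - V            # λ = 0 during extinction
    V += alpha * beta * delta
    print(f"extinction trial {trial:2d}: δ = {delta:.3f}, ΣV = {V:.3f}")
```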
delta rule: What happens as ΣV approaches λ over trials
As the sum of learned values (ΣV) approaches the asymptote (λ), the prediction error decreases, learning slows down, and eventually stops when the asymptote is reached.
limitation of rescorla-wagner model
- the RW model is designed to predict only the immediate reward based on the current conditioned stimulus (CS)
- doesn’t capture more complicated situations such as higher order conditioning, which involves learning complex associations that extend beyond immediate, single-step predictions
- limits its ability to model scenarios where rewards are delayed or involve a sequence of predictive cues.
- a model is therefore needed that predicts all future rewards, not only the immediate one
temporal difference learning
- extends the RW model to cover all time steps in the trial by means of an eligibility trace
- we are now not only predicting the immediate reward but also accounting for future rewards
eligibility trace
- keeps a running record of which stimuli/states were recently active within the trial
- projects value back in time: earlier state values can be adjusted based on rewards that occur later in the sequence
- enables the model to learn from delayed rewards
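A minimal tabular TD(λ) sketch with an accumulating eligibility trace, for one trial through a chain of states where the reward only arrives at the end; the 5-state setup and all parameter values are illustrative assumptions:

```python
# Tabular TD(λ) with an accumulating eligibility trace.
# One trial: the agent passes through states 0..4 and receives a
# reward of 1.0 only in the final state.
n_states = 5
alpha = 0.3  # learning rate
gamma = 0.9  # discount factor for future rewards
lam = 0.8    # λ here: trace-decay parameter of the eligibility trace

V = [0.0] * n_states  # value estimate per state
e = [0.0] * n_states  # eligibility trace per state

for s in range(n_states):
    reward = 1.0 if s == n_states - 1 else 0.0
    V_next = V[s + 1] if s < n_states - 1 else 0.0
    delta = reward + gamma * V_next - V[s]  # TD prediction error
    e[s] += 1.0                             # mark state s as eligible
    for i in range(n_states):               # credit all recent states
        V[i] += alpha * delta * e[i]
        e[i] *= gamma * lam                 # traces fade over time
print([round(v, 3) for v in V])
# Even states visited well before the reward receive some credit.
```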