Week 9: Rescorla-Wagner and Temporal Difference Learning Flashcards
There are other related model-free algorithms
such as temporal difference (TD) learning
Temporal Difference Learning (2)
- It is also a model-free RL algorithm
- It is different from Q-learning
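A minimal sketch of a tabular TD(0) update for state values, assuming the standard form V(s) ← V(s) + α[r + γV(s′) − V(s)]. The learning rate α, the discount γ, and the three-state toy chain are illustrative assumptions, not taken from these notes. In this formulation TD learning estimates state values V(s), whereas Q-learning estimates action values Q(s, a).

```python
# Tabular TD(0) prediction sketch.
# Assumed (not from the notes): alpha = 0.1, gamma = 0.9, a 3-state toy chain.

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """Move V[s] toward the TD target r + gamma * V[s_next]; return the TD error."""
    td_error = r + gamma * V[s_next] - V[s]  # target minus current estimate
    V[s] += alpha * td_error
    return td_error


# Toy chain: state 0 -> state 1 -> state 2 (terminal).
# Only the transition from state 1 into state 2 gives reward 1.
V = {0: 0.0, 1: 0.0, 2: 0.0}
for episode in range(500):
    td0_update(V, s=0, r=0.0, s_next=1)
    td0_update(V, s=1, r=1.0, s_next=2)  # V[2] is never updated, so it stays 0

print({s: round(v, 3) for s, v in V.items()})
```

V[1] approaches 1 and V[0] approaches γ·V[1] = 0.9: the reward is propagated back one state at a time from experienced transitions alone, with no model of the environment.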
Definition of classical conditioning
a learning process that occurs when two stimuli are repeatedly paired; a response that is at first elicited by the second stimulus is eventually elicited by the first stimulus alone.
UCS
Unconditioned stimulus
UCS is a stimulus that
leads to an automatic response
Neutral stimulus is a
stimulus that does not trigger a response on its own
Conditioned stimulus is
a stimulus that was once neutral (did not trigger a response) but now leads to a response
Unconditioned response (UCR)
is an automatic response that occurs without thought when an unconditioned stimulus is present
Conditioned response (CR)
is a learned response created where no response existed before
Pavlov’s Dog Experiment (4)
- Before conditioning, the dog was presented with food (UCS), which automatically triggered a salivation response (UCR)
- Before conditioning, the dog heard a bell ring (NS), which produced no response from the dog
- During conditioning, the dog was presented with food (UCS) together with the sound of the bell (NS), which led to salivation (UCR)
- After conditioning, the dog salivated (CR) when it heard the bell ring (CS)
Diagram of Pavlov’s dog classical conditioning experiment
Now we view S as
conditioned stimulus (bell after pairing)
Do not confuse ‘S’ with
state
Now we view R as
reinforcement (i.e., the food)
Table of acquisition, extinction, and partial reinforcement in classical conditioning (diagram)
Acquisition in the table is where (3)
S (CS) is paired with reward in Phase 1
Nothing happens in Phase 2
Then we get a response to S
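A minimal simulation of acquisition, assuming the delta-rule update from this deck (w → w + εSδ) with prediction error δ = r − wS. The learning rate ε = 0.1, reward size r = 1 and the 100 trials are illustrative assumptions. Phase 2 contains no trials, so the weight is left unchanged.

```python
# Acquisition under the Rescorla-Wagner (delta) rule.
# Assumed values: eps = 0.1, r = 1 on every Phase 1 trial, 100 trials.

def rw_trial(w, S, r, eps=0.1):
    """One delta-rule update: w -> w + eps * S * (r - w * S)."""
    v = w * S        # predicted reinforcement
    delta = r - v    # prediction error
    return w + eps * S * delta


w = 0.0                          # naive animal: S predicts nothing yet
for _ in range(100):
    w = rw_trial(w, S=1, r=1.0)  # Phase 1: S paired with reward
# Phase 2: nothing happens, so there is no update.

print(f"response strength to S after acquisition: {w:.3f}")
```

Each trial closes a fraction ε of the remaining gap, so after n paired trials w = 1 − (1 − ε)^n, which is why the response to S builds up and then levels off.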
Extinction in the classical conditioning table is where (3)
S (CS) is paired with R in Phase 1
S (CS) is presented on its own in Phase 2
Then we see no response to S
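A minimal simulation of extinction under the same delta-rule assumption (δ = r − wS); ε = 0.1 and 100 trials per phase are illustrative choices, not values from the notes.

```python
# Extinction under the Rescorla-Wagner (delta) rule.
# Assumed values: eps = 0.1, r = 1 when R is present, 100 trials per phase.

def rw_trial(w, S, r, eps=0.1):
    """One delta-rule update: w -> w + eps * S * (r - w * S)."""
    delta = r - w * S  # prediction error
    return w + eps * S * delta


w = 0.0
for _ in range(100):
    w = rw_trial(w, S=1, r=1.0)  # Phase 1: S paired with R
w_after_phase1 = w

for _ in range(100):
    w = rw_trial(w, S=1, r=0.0)  # Phase 2: S alone, so the prediction error is negative
w_after_phase2 = w

print(f"after Phase 1: {w_after_phase1:.3f}, after Phase 2: {w_after_phase2:.3f}")
```

In Phase 2, δ = 0 − w is negative on every trial, so each presentation of S alone shrinks w geometrically back toward 0.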
Partial reinforcement in the classical conditioning table (2)
where we occasionally present S with R
which leads to a weak response to S
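A minimal simulation of partial reinforcement, again assuming the delta rule with δ = r − wS. The reward probability of 0.5, ε = 0.1, the trial count and the fixed random seed are all illustrative assumptions.

```python
import random

# Partial reinforcement under the Rescorla-Wagner (delta) rule.
# Assumed values: eps = 0.1, R accompanies S on a random 50% of trials.

def rw_trial(w, S, r, eps=0.1):
    """One delta-rule update: w -> w + eps * S * (r - w * S)."""
    delta = r - w * S  # prediction error
    return w + eps * S * delta


rng = random.Random(0)  # fixed seed so the run is reproducible
p_reward = 0.5          # assumed probability that R is presented with S
w = 0.0
history = []
for _ in range(20000):
    r = 1.0 if rng.random() < p_reward else 0.0
    w = rw_trial(w, S=1, r=r)
    history.append(w)

avg_w = sum(history[-10000:]) / 10000  # long-run average weight
print(f"average w over the last 10000 trials: {avg_w:.2f}")
```

The average weight settles near the reward probability (about 0.5 here) rather than near 1, which corresponds to the weaker response; with ε = 0.1 the weight still fluctuates from trial to trial around that value.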
Simple way to model the classical conditioning (CC) table (acquisition, extinction, and partial reinforcement) (2)
A stimulus neuron with an input weight onto a reward (r) neuron
If the r neuron is active, we predict reinforcement (reward or punishment)
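A sketch of this one-synapse model, assuming the reward neuron is a linear unit whose activity is the weighted stimulus input v = wS. The function name `reward_neuron_activity` and the weight value 0.8 are hypothetical.

```python
def reward_neuron_activity(w, S):
    """Predicted reinforcement v = w * S from a single stimulus neuron (linear unit assumed)."""
    return w * S


w = 0.8  # hypothetical learned weight from the stimulus neuron to the r neuron
print(reward_neuron_activity(w, S=1))  # stimulus present: r neuron active, reinforcement predicted
print(reward_neuron_activity(w, S=0))  # stimulus absent: no prediction
```

Keeping the unit linear means the prediction scales directly with the learned weight, which is what the delta-rule update then adjusts.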
Use a simple delta-rule model in the model of CC (acquisition, extinction, and partial reinforcement)
If stimulus S is present… (2)
S = 1 (S = 0 if not present):
then update the weight: w → w + εSδ, where ε is the learning rate and δ = r − wS is the prediction error
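A single application of the update rule, assuming the standard Rescorla-Wagner prediction error δ = r − v with v = wS. The numbers (w = 0.25, ε = 0.5, r = 1) are chosen only so the arithmetic is exact in floating point.

```python
def rw_update(w, S, r, eps):
    """Delta-rule step: w -> w + eps * S * delta, with delta = r - w * S."""
    v = w * S       # prediction from the current weight
    delta = r - v   # prediction error
    return w + eps * S * delta


# Stimulus present and rewarded: 0.25 + 0.5 * 1 * (1.0 - 0.25) = 0.625
print(rw_update(w=0.25, S=1, r=1.0, eps=0.5))
# Stimulus absent (S = 0): the S factor zeroes the update, so w stays 0.25
print(rw_update(w=0.25, S=0, r=1.0, eps=0.5))
```

The factor S is what restricts learning to trials on which the stimulus is present: when S = 0 the weight is left untouched regardless of the reward.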