Neural networks for reinforcement learning Flashcards
What neural substrate do reinforcement learning models typically concern?
Unit recordings from mesencephalic DA neurons in monkeys:
Can we explain their firing from models of Reinforcement Learning?
How do these DA neurons behave before learning?
DA cell responds to reward but not to the predictive CS (sound)
How do DA neurons behave following learning?
DA cell does not respond to reward when it is predicted by the CS; Backwards shift of response towards the CS itself!
Does this change if the reward comes unexpectedly?
Cells still responsive to reward when it comes unexpectedly
What temporal dynamics are at play here?
There is a fixed interval between the sound and the liquid reward; the sound is predictive of the reward
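The cards do not name a specific algorithm for these recordings. One commonly used account is temporal-difference (TD) learning, in which DA firing is read as a reward prediction error. The sketch below is only an illustration under that assumption: the time steps, learning rate, and the one-weight-per-time-step representation are all hypothetical choices, not taken from the cards.

```python
import numpy as np

def run_trial(w, cs_t=2, reward_t=7, n_steps=10, alpha=0.3, gamma=1.0,
              cs_present=True, learn=True):
    """Simulate one trial and return the TD error at every time step."""

    def value(t):
        # A reward prediction exists only once the CS has started the trial clock.
        return w[t - cs_t] if (cs_present and t >= cs_t) else 0.0

    delta = np.zeros(n_steps)
    for t in range(1, n_steps):
        reward = 1.0 if t == reward_t else 0.0
        # TD error: reward plus new prediction minus previous prediction.
        delta[t] = reward + gamma * value(t) - value(t - 1)
        if learn and cs_present and t - 1 >= cs_t:
            w[t - 1 - cs_t] += alpha * delta[t]  # correct the previous prediction
    return delta

weights = np.zeros(8)  # one weight per time step from CS onset (10 steps - onset at 2)
before = run_trial(weights, learn=False)
for _ in range(200):
    run_trial(weights)
after = run_trial(weights, learn=False)
unexpected = run_trial(weights, cs_present=False, learn=False)

print("before learning  :", np.round(before, 2))
print("after learning   :", np.round(after, 2))
print("reward without CS:", np.round(unexpected, 2))
```

In this sketch the largest error sits at the reward step before learning and at CS onset after learning, while a reward delivered without the CS still produces an error at the reward step.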
What type of neural network architecture does this model employ?
(iii) Hybrid neural network: feed-forward and feedback/recurrent connections
* Important subclass: reinforcement learning
Describe the architecture of a simple reinforcement learning network subclass of hybrid neural networks
Inputs (P1-5) provide semi connected inputs to:
Hidden layer (3 nodes) provides semi connected inputs to:
Output layer (a1,a2) provides input to:
The environment which both:
Delivers neutral sensory stimuli and
delivers reinforcement (punishment or reward) to post input layers
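The loop described above can be sketched roughly as follows. The layer sizes (5 inputs, 3 hidden nodes, 2 outputs) come from the card. Everything else is an assumption made for illustration: full connectivity stands in for the "semi connected" wiring, the tanh activation and random weights are arbitrary, and `environment` is a hypothetical stand-in that returns only the scalar reinforcement.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes from the card: 5 inputs (p1..p5), 3 hidden nodes, 2 outputs (a1, a2).
W_hidden = rng.normal(scale=0.5, size=(3, 5))
W_out = rng.normal(scale=0.5, size=(2, 3))

def forward(p):
    """Map a sensory pattern p to output activations (a1, a2)."""
    hidden = np.tanh(W_hidden @ p)
    return np.tanh(W_out @ hidden)

def environment(action, rewarded_action=0):
    """Hypothetical environment: scalar reinforcement for the chosen action."""
    return 1.0 if action == rewarded_action else -1.0

p = np.array([1.0, 0.0, 0.0, 1.0, 0.0])  # example sensory pattern
a = forward(p)
action = int(np.argmax(a))               # pick the more active output unit
reinforcement = environment(action)      # reward (+1) or punishment (-1)
print(a, action, reinforcement)
```

No learning happens in this sketch; it only shows the direction of information flow from pattern to action to reinforcement.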
What could be biological correspondence to these variables?
p = sensory patterns such as sound
a = output, let’s say motor output
reward could be prey that was caught and eaten; punishment could be pain
What are some key features of reinforcement learning? (5)
- Instructive signal for learning is one scalar value for the whole network: the Reinforcement signal
- Scalar value can be 1 bit (‘right or wrong’, 0 or 1) or can be graded (‘pretty good…very good’)
- Reinforcement Learning follows operant (instrumental) conditioning, but can also be applied to Pavlovian Conditioning: Stimulus -> Action -> Reinforcement => Modification of network connections
- Reinforcement Learning relies on “learning with a critic” (was the action good or bad?)
- Scalar feedback: only tells how good/bad the action given the stimulus was
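The sequence Stimulus -> Action -> Reinforcement => Modification of connections can be sketched with a small example. The cards do not specify an update rule, so a REINFORCE-style rule with a softmax action choice is swapped in here as an assumption. The task (action index must match stimulus index), the learning rate, and the fixed baseline are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_stimuli, n_actions = 2, 2
W = np.zeros((n_actions, n_stimuli))  # connection weights, stimulus -> action
alpha, baseline = 0.5, 0.5            # assumed learning rate and reward baseline

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for _ in range(200):
    s = rng.integers(n_stimuli)             # stimulus
    probs = softmax(W[:, s])
    a = rng.choice(n_actions, p=probs)      # action
    r = 1.0 if a == s else 0.0              # 1-bit reinforcement from the critic
    chosen = np.eye(n_actions)[a]
    # The same scalar (r - baseline) scales every weight change.
    W[:, s] += alpha * (r - baseline) * (chosen - probs)

learned = [int(np.argmax(W[:, s])) for s in range(n_stimuli)]
print(learned)
```

The only feedback the network receives is `r`; it is never told which action would have been correct.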
What can this learning with a critic be contrasted with? What method does this correspond to?
Reinforcement Learning relies on “learning with a critic” (was the action good or bad?)
Contrasts with: “learning with a teacher” (what was right or wrong in any trial), which corresponds to supervised learning with backpropagation
What can this scalar feedback not tell us?
Scalar feedback only tells how good/bad the action given the stimulus was (just a single number, e.g. 0 or 1), not what the optimal output would have been
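The difference between the two kinds of feedback can be made concrete with a minimal sketch. The output and target values are illustrative only, and scoring the critic by comparing the most active unit is an assumption.

```python
import numpy as np

output = np.array([0.2, 0.9])  # network output (a1, a2), illustrative values
target = np.array([1.0, 0.0])  # optimal output, known only to a "teacher"

# Learning with a teacher: one signed error per output unit.
teacher_signal = target - output

# Learning with a critic: a single number for the whole network.
critic_signal = 1.0 if np.argmax(output) == np.argmax(target) else 0.0

print(teacher_signal)
print(critic_signal)
```

The teacher signal says how each output should change; the critic signal only says that the action was wrong.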
What is meant by the credit assignment problem in reinforcement learning? How can this be subdivided? (2)
- In real (and artificial) life, a reinforcement is usually obtained only after a long sequence of actions (e.g. playing chess – win/lose)
- temporal credit assignment problem: which individual move was particularly good or bad?
- structural credit assignment problem: which individual neuron (unit) behaved correctly or erroneously?
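One common way to approach the temporal part of the problem, not stated in the cards, is to spread a delayed reinforcement backwards over earlier steps with a discount factor. The sketch below assumes that approach; the function name and the value of `gamma` are hypothetical. It does not address structural credit assignment.

```python
def discounted_returns(rewards, gamma=0.9):
    """Return, for each step, the discounted sum of rewards from that step on."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# A five-move game in which only the final move is reinforced (win = 1).
print(discounted_returns([0, 0, 0, 0, 1], gamma=0.5))
```

With `gamma=0.5` this prints `[0.0625, 0.125, 0.25, 0.5, 1.0]`: moves closer to the win receive more credit, which is a heuristic rather than a real answer to which move was decisive.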
Within the taxonomy of mammalian memory systems, where does reinforcement learning belong?
Non-declarative (implicit): RL falls under stimulus-response skill learning (procedural learning) and classical conditioning
What neural substrates are often assigned to these kinds of learning?
Procedural learning: Striatum
Classical conditioning:
Emotional responses: Amygdala
Skeletal musculature: Cerebellum
Give the learning sequences for these types of learning respectively
skill: stimulus => action => outcome
classical cond: stimulus => outcome
Classical Reinforcement Learning captures only two elements of the complex processes underlying operant conditioning. What are these? Describe their learning sequences
Stimulus-response (operant) learning and Pavlovian association
Both concern the transition of a stimulus to a reinforcer
Pavlovian learning assigns motivational value to stimulus and elicits automated (‘reflexive’) reaction (no instrumental action needed to obtain outcome)
Stimulus-response (operant) learning first concerns the transition of a stimulus to a response and a response to a reinforcer
Experimental Psychology produced evidence for additional learning processes within stimulus-response (operant) learning, what are these? (The transitional processes )
- Habits: in real life, stimulus-response learning eventually leads to habit formation (= weakly sensitive to reinforcement)
- Action-outcome learning: associating a response with a reinforcer
What behaviour is associated with action-outcome learning according to Pennartz?
Goal-directedness: determining whether a response is needed to obtain the reinforcer
In Pavlovian learning, the response is not important
How does this learning process contrast with backpropagation?
The critic only evaluates whether the action as a whole was good or not, rather than giving feedback at the level of individual units. Backprop would be learning with a teacher, which is quite artificial.
What is action-outcome learning about? (3)
:: Knowing what you need (most) and how to get it
:: Representing your goal before undertaking action
:: Knowing whether your action is relevant or not
Illustrate the importance of action outcome learning with a dilemma
“Castaway’s dilemma”: why stimulus-response learning is not sufficient
A castaway on an island sees palm trees, what action should he carry out?
One stimulus, multiple options:
1) Search for coconuts?
2) Burn trees to get warm?
3) Build a raft to escape?
Stimulus-response learning does not solve the problem: the stimulus alone does not tell you what to do. Once the goal is identified, you need to know the action required to achieve that goal
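The dilemma can be sketched as a lookup problem. A plain stimulus-to-action table has only one entry for "palm trees", whereas action-outcome knowledge lets the current goal select the action. The outcome labels ("food", "warmth", "escape") and the function name are hypothetical, chosen to match the three options on the card.

```python
# Hypothetical action-outcome knowledge for the single stimulus "palm trees".
ACTION_OUTCOMES = {
    "palm trees": {
        "search for coconuts": "food",
        "burn trees": "warmth",
        "build a raft": "escape",
    }
}

def choose_action(stimulus, goal):
    """Pick the action whose learned outcome matches the current goal."""
    for action, outcome in ACTION_OUTCOMES.get(stimulus, {}).items():
        if outcome == goal:
            return action
    return None  # no known action leads to this goal

for goal in ("food", "warmth", "escape"):
    print(goal, "->", choose_action("palm trees", goal))
```

The stimulus is the same in all three calls; only the goal changes the selected action.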