Reward and learning - Dr McCabe Flashcards
How does a PET scan measure brain activity?
radiation emitted from a radioactive glucose tracer in the blood - active regions in the brain require greater blood supply and therefore release greater quantities of radiation
How is the radiation in a PET scan released/detected
radioactive chemical releases positrons (injected or inhaled)
radioactive tracer fluorodeoxyglucose (FDG)
advantage of PET
measure many aspects of function in the brain
body treats fluorodeoxyglucose in a similar way to normal glucose
disadvantage of PET
poor spatial res but better than EEG/MEG
poor temporal - relies on blood
injection of radioactive
bulky and costly
What does an MRI measure?
signal changes in the brain related to the different magnetic properties of oxygenated and deoxygenated blood - relate to neural activity
advantages of MRI
good spatial res
non invasive
disadvantages of MRI
worse temporal than EEG/MEG (but better than PET)
expensive
participants can't have metal in body, must stay still, and the experience can be claustrophobic
What is learning and why is it necessary
Learning is an innate behavioural response which enables adaptation in novel situations - necessary for survival of the fittest so can quickly respond to stimuli in the environment
basic idea of classical conditioning
unconditioned stimulus provokes an unconditioned response (natural ie startled)
pair with a neutral stimulus that does not evoke a response (ie bell tone)
consistent pairing of the NS with the unconditioned stimulus leads to association of the NS with the response - react to the NS alone
ns becomes conditioned stimulus with conditioned response
How might classical conditioning go wrong?
watson and rayner
9-month-old infant tested for fear response to a range of stimuli
feared loud and unexpected bang - pair with white rat
become fearful of white rat and generalise to other stimuli ie beard
basic idea of operant conditioning
behaviour leads to reward or punishment which determines if the behaviour is repeated or omitted
relationship between the behaviour and its consequence
basic idea of blocking
when add in another stimulus in CC assoc - don't learn a new association as unnecessary, already have assoc and new stimulus doesn't provide any additional information
LEARNING ONLY OCCURS WHEN SOMETHING HAS CHANGED
not because first stimulus pre-empts attention but because second stimulus fails to signal a change in reinforcement
when does learning occur after a conditioned assoc has been formed
when something has changed i.e. information expected has been omitted
when make mistakes
define a prediction error
learning occurs only when what was expected does not occur and therefore made an error in judgement - i.e. the predicted time, occurrence or magnitude of a stimulus is different to what was expected, or the stimulus is omitted completely
what actually occurs is not what was predicted
define the rescorla wagner model
change in predictive value of a stimulus = difference between what actually happens and what you expected to occur (surprise)
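A minimal sketch of the Rescorla-Wagner update for a single cue. The function name `rescorla_wagner`, the learning rate `alpha=0.3` and the trial sequence are illustrative assumptions, not values from the lecture.

```python
def rescorla_wagner(rewards, alpha=0.3, v0=0.0):
    """Return the predictive value V of one cue after each trial.

    rewards: outcome on each trial (1.0 = reward delivered, 0.0 = omitted).
    alpha:   learning rate (illustrative assumption).
    """
    v = v0
    history = []
    for r in rewards:
        surprise = r - v          # what actually happened minus what was expected
        v = v + alpha * surprise  # change in value is proportional to the surprise
        history.append(v)
    return history


# 20 rewarded trials (acquisition) followed by 5 omitted rewards (extinction).
values = rescorla_wagner([1.0] * 20 + [0.0] * 5)
print([round(x, 2) for x in values])
```

The value climbs quickly at first, when surprise is large, then flattens as the reward becomes fully predicted. It starts to fall again as soon as the reward is omitted.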
what is dopamine
dopamine is a neurotransmitter/neuromodulator in the brain
most predominantly targets the striatum for motivation and action
what role does the basal ganglia have with reward learning etc
The basal ganglia, a group of interconnected brain areas located deep within the cerebral hemispheres beneath the cortex, have proved to be at work in learning and the formation of good and bad habits
projects from cortex (sensory and motor) to striatum - lots of connections with dopamine
topographic
define topographic
sets of systematic axonal connections from one neural region to another that preserve (or precisely invert) the spatial relationship between neurons; cells that are close together on the sending surface project to regions that are close together on the target surface
functions of the basal ganglia
thought to be mainly involved with aspects of motor control i.e. disorders such as Parkinson's - reaching and grabbing problems
BUT also
basal ganglia facilitate learning, with the neurotransmitter dopamine important to the process
dopamine released in the basal ganglia system communicates with the prefrontal cortex to allow people to pay attention to tasks, ignore distractions, and update relevant task information in working memory during problem-solving tasks
describe dopamine responses before conditioning is learned (schultz et al 1997)
burst of dopamine activity to unexpected reward - no pairing with stimulus
describe dopamine response during/ after conditioning (schultz et al 1997)
response begins to transfer towards the stimulus/cue presentation instead of at the reward - fires at the prediction of a reward
describe dopamine response when omit reward expected from previous conditioning (schultz et al 1997)
spike at prediction but activity of dopamine depressed below basal firing rate where expect the reward - recognise something has changed
how do dopamine responses change according to probabilities between the stimulus and the reward
0% predictive of cue - spike at reward
50% predictive of cue - spike at cue and reward but both smaller activity
100% predictive of cue - spike only at prediction
dopamine response to reward as a weighted sum of past and current rewards (r-v)
what did ramnani et al 2004 want to investigate?
look at prediction error related activity in human brain during classical conditioning - fMRI used to localise activity in the brain of DA neurons in tracking prediction errors when monetary rewards delivered independent of goal related actions
1- failure of expected rewards and 2- occurrence of unexpected reward
control condition where event occurred as expected
problem with ramnani et al 2004
assume that fMRI signal relates to activity of dopamine neurons but it is not specific - only shows the areas active during learning
what did ramnani et al 2004 find in their fMRI study
unexpected omission of reward
depressed activity anterior superior frontal gyrus, FC, temporal pole and superior temporal sulcus
brain recognised omission by reducing activity
error = depressed activity in ant.PFC and OFC for omission but increased for presentation
define tonic release of dopamine/neuromodulators
sustained release over relatively long periods of time associated with general dopamine activity
define phasic release of dopamine/neuromodulators
bursts of neuronal firing thought to be involved in learning associations between stimuli and consequences
how does dopamine activity relate to the contingencies associated with a reward (fiorillo et al 2003)
track the probability that a cue is related to a reward - as probability of assoc changes, so does dopamine activity - code the discrepancy between expected and actual reward
tonic firing maintained in uncertain trials where probability 50%/0.5
frontotemporal circuit not only process the predictive stimuli and reward but actively encodes the assoc between them
activity of midbrain neuron characteristics (fiorillo, 2003)
signal the prediction of a future reward (spike at cue), unexpected occurrence of a reward (spike at presentation) and its unexpected absence (pause in activity)
what does data show about dopamine and regions of the brain associated with reward learning
anterior prefrontal cortex - responds to both types of prediction error (unexpected and omitted)
medial orbitofrontal cortex - activity changes specific to unexpected reward
activity in frontotemporal circuits actively encodes the association between stimuli and rewards
define temporal difference error
used in reinforcement learning to predict reward over time
(RESCORLA WAGNER DOES NOT ACCOUNT FOR THIS)
timing in trial taken into account to provide a prediction for the time between a stimulus and an expected/unexpected reward
what does temporal difference error imply about reward learning
constantly updating our understanding about the association between a stimulus and a reward
i.e. expect after long/short period of time etc
what do temporal difference errors predict (O’Doherty et al 2006)
before learn - positive PE (assoc strength) response to reward (UCS)
during learning shift to the CS (cue)
unexpected reward lead to positive PE response at delivery
unexpected omission lead to negative PE response at expected delivery
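The four predictions above can be reproduced with a small temporal difference simulation. This is a sketch under stated assumptions: a trial is split into 5 time steps between cue and reward, the cue arrives unpredictably (so the state before it has value 0), and `td_trial`, `alpha` and `gamma` are hypothetical names and values chosen for illustration.

```python
def td_trial(V, reward, alpha=0.2, gamma=1.0):
    """Run one trial and return the TD error at each moment.

    V:      list of learned values, one per time step after cue onset
            (updated in place).
    reward: reward delivered at the end of the last time step.
    Returns a list where index 0 is cue onset and the last index is
    the time the reward is expected.
    """
    n = len(V)
    # Cue onset: jump from the unpredictable inter-trial state (value 0)
    # into the first cue state.
    deltas = [gamma * V[0]]
    for t in range(n):
        r = reward if t == n - 1 else 0.0
        v_next = V[t + 1] if t + 1 < n else 0.0
        delta = r + gamma * v_next - V[t]  # TD prediction error
        V[t] += alpha * delta
        deltas.append(delta)
    return deltas


V = [0.0] * 5
first = td_trial(V, reward=1.0)      # before learning
for _ in range(500):
    late = td_trial(V, reward=1.0)   # after learning
omitted = td_trial(V, reward=0.0)    # expected reward left out

print("before learning:", [round(d, 2) for d in first])
print("after learning: ", [round(d, 2) for d in late])
print("reward omitted: ", [round(d, 2) for d in omitted])
```

Before learning, the only error is a positive one at reward delivery. After learning, it has moved to cue onset. On the omission trial there is a negative error at the time the reward was expected.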
what did O’Doherty et al 2006 want to investigate
see if human ventral striatum, orbitofrontal cortex etc show activity consistent with temporal difference error predictions using fMRI during appetitive conditioning of pleasant taste reward
what did O’Doherty do to look at TD error
calculated TD error from behaviour between cs presentation and reward time
use info as regressor in fmri
look at correlation between change in TD error and brain activity
show parts of brain that track TD error
what did O’Doherty 2006 find
backward shift in time of peak of hemodynamic response in ventral striatum during learning - response transfers from time reward presented towards the time the cue is presented
how did Rolls, McCabe and Redoute 2008 investigate TD error in a probabilistic decision task?
OC -decision influence when high risk for large value reward vs low risk for low reward
ask participants to press left for 10p 90% of the time or right for 30p either 90%, 70% or 60% of the time - told to max winnings
Expected value (EV) calculated for each choice after the choice is made (updated over time by calculating the prediction error between EV and the magnitude of reward) - EV, RM and TD error used as regressors to track changes in the brain with fMRI
Results of Rolls et al 2008 reward magnitude TD errors
TD error correlate with nucleus accumbens, IFG and midbrain activity
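A minimal sketch of the expected value and prediction error for one choice. It assumes the usual definition EV = probability x reward magnitude, that the two options pay 10p and 30p, and the probabilities listed on the card. The helper names `expected_value` and `prediction_error` are hypothetical.

```python
def expected_value(magnitude_pence, percent_chance):
    """EV = reward magnitude x probability of getting it."""
    return magnitude_pence * percent_chance / 100


def prediction_error(reward_obtained, ev):
    """Difference between the reward magnitude obtained and the EV."""
    return reward_obtained - ev


ev_left = expected_value(10, 90)  # safe option
ev_right = {p: expected_value(30, p) for p in (90, 70, 60)}  # riskier option

print("EV left: ", ev_left)
print("EV right:", ev_right)
# A win on the 60% option is better than expected; a loss is worse.
print("PE on win: ", prediction_error(30, ev_right[60]))
print("PE on loss:", prediction_error(0, ev_right[60]))
```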
change behaviour related to contingencies -
RM correlate with OFC
TD error correlate with NA, frontal gyrus and midbrain
EV neg correlate with anterior insula - expect low reward and when uncertain about outcome (50%)
mOFC respond to reward and to prediction - reflect prob how much reward obtained based on risk taken
VS pos correlate with reward obtained in present
how can dopamine info be applied in real world
understand how decisions etc go wrong ie schizophrenia
what did Kumar et al 2008 do
compared adults with MDD on antidepressant medication (citalopram) with non-medicated and acutely medicated controls, scanned with fMRI during a reward learning task - TD error in CC task
depression assoc w/ anhedonia symptoms - assoc w/ reduce DA functioning
what did kumar et al 2008 find - med depressed
depressive have sig reduced reward learning signals in
- VS
- r/d ACC
- retrosplenial cortex
-midbrain
- hippocampus
abnormal TD correlate with illness severity
BUT enhanced signal in VTA - compensatory response for blunted reward sig outside brain stem or due to meds?
define positive reinforcement
rewarded for behaviour to encourage repetition
define negative reinforcement
remove neg consequence to encourage repetition
define punishment
pos or neg to weaken response to a stimulus
when might blocking fail
when the introduction of a second stimulus does signal an increase or decrease in reinforcement/punishment compared with the first stimulus alone
ie stronger shock
rescorla wagner in the explanation of blocking
novel stimulus has no additional predictive value = 0
first stimulus (CS) already established full assoc value with the outcome = 1
neither CS nor novel stimulus changes predictive value because CS already at 1 - no surprise
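A minimal sketch of blocking under the Rescorla-Wagner rule, where all cues present on a trial share one prediction error. The cue labels `A` (first stimulus) and `B` (novel stimulus), the helper `train`, and the learning rate are illustrative assumptions.

```python
def train(V, cues, reward, trials, alpha=0.3):
    """Update the values of the cues present, using one shared error."""
    for _ in range(trials):
        prediction = sum(V[c] for c in cues)  # combined prediction of all cues
        error = reward - prediction           # surprise
        for c in cues:
            V[c] += alpha * error
    return V


# Blocking: A is trained alone first, then A and B are presented together.
blocked = train({"A": 0.0, "B": 0.0}, ["A"], 1.0, 50)
blocked = train(blocked, ["A", "B"], 1.0, 50)

# Control: A and B are presented together from the start.
control = train({"A": 0.0, "B": 0.0}, ["A", "B"], 1.0, 50)

# Blocking fails: the compound signals a bigger outcome than A alone.
unblocked = train({"A": 0.0, "B": 0.0}, ["A"], 1.0, 50)
unblocked = train(unblocked, ["A", "B"], 2.0, 50)

for name, v in [("blocked", blocked), ("control", control), ("unblocked", unblocked)]:
    print(name, {k: round(x, 2) for k, x in v.items()})
```

In the blocked case B ends near 0 because A already predicts the outcome, so there is no surprise left to learn from. In the control and unblocked cases B gains value because the outcome is not fully predicted when B appears.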
DA in prediction error
has been recorded in primates to track and display “error signals” - activity of dopamine neurons appears to follow similar idea presented in rescorla wagner
milner and olds 1954
DA system in rat behaviour
electrode in hypothalamic areas in brain - pleasure centre
stimulated when approached certain parts of cage
found rats would return to place where received previous stimulation
spanagel and weiss 1999
DA associated brain areas
self stimulation found in ventral tegmental area (VTA) which projects to closely associated limbic structures
thought to be involved in DA system for natural rewards and can be disrupted by drugs
koob and le moal 1997
DA and drugs
DA receptors have a reinforcing influence on drug use
increase DA in brain which leads to pleasure
BUT downregulates - need more drug to get same effect which leads to an increase in drug use
areas within the basal ganglia
caudate nucleus, putamen, globus pallidus, substantia nigra, subthalamic nucleus
overlapping functions within the BG
attention selection
switching
internal generation of movement
reinforcement learning
van schouwenburg 2014 BG processes
in behavioural/attentional switching, BG enhances processing of attended features and suppresses unattended by modulating connections from the PFC (top down) to the visual cortex
BG mediates top down connections
dopamine deficiencies and links to disorder
deficiency of DA in nigrostriatal area related to Parkinson's
and disturbance thought to also be responsible for schizophrenic symptoms
schultz et al 1997 importance of dopamine in prediction
prediction gives animal time to prepare for a future stimulus and react in an appropriate manner
reward value not static - assign values at time stimulus is encountered and as a function of continued experience
DA in VTA and SN assoc with reward process
fiorillo et al 2003 prediction of reward
when reward mag and timing constant, error = prob of outcome compared to actual
monkeys conditioned in CC paradigm varying the prob of a liquid reward following a stimulus - measure licking beh with increased prob
as prob increases, so does licking behaviour at presentation of predictive cue
describe pearce-hall theory of attention and learning
attention and learning are proportional to uncertainty about reinforcers
DA facilitates attention and learning when uncertain about reward - attention necessary for learning
BUT when establish an assoc, no further attention required
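A rough sketch of the Pearce-Hall idea, using a simplified hybrid form in which attention to a cue (`alpha`) tracks the recent size of the surprise and scales how much is learned. This is an illustrative assumption, not the exact model from the lecture. The parameters `kappa` and `gamma` are hypothetical, and strictly alternating rewards stand in for a 50% schedule so the result is repeatable.

```python
def pearce_hall(rewards, kappa=0.5, gamma=0.3, alpha0=1.0):
    """Return (value, attention) for one cue after a run of trials."""
    v, alpha = 0.0, alpha0
    for r in rewards:
        delta = r - v                 # surprise on this trial
        v += kappa * alpha * delta    # learning is scaled by current attention
        # Attention moves towards the absolute size of the latest surprise.
        alpha = gamma * abs(delta) + (1 - gamma) * alpha
    return v, alpha


v_certain, alpha_certain = pearce_hall([1.0] * 200)          # reward every trial
v_uncertain, alpha_uncertain = pearce_hall([1.0, 0.0] * 100)  # reward half the time

print("attention when reward is certain:  ", round(alpha_certain, 3))
print("attention when reward is uncertain:", round(alpha_uncertain, 3))
```

Attention drops towards zero once the association is established, and stays high while the outcome remains uncertain.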
bayer and glimcher 2004 prediction error
DA response = weighted sum of current and past rewards
examined activity in single DA neurons during trial and error
neurons encode difference between current reward and the WEIGHTED AVERAGE of previous rewards
DA firing increases when current reward is greater than weighted av of previous BUT stays the same when current is less than weighted av of previous
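A minimal sketch of a prediction error computed against a weighted average of previous rewards. Exponentially decaying weights, with the most recent reward counting most, are an assumption made for illustration. The `decay` value and function names are hypothetical.

```python
def weighted_average(past_rewards, decay=0.5):
    """Weighted average of past rewards; recent rewards get larger weights.

    past_rewards: oldest first, most recent last.
    """
    if not past_rewards:
        return 0.0
    recent_first = list(reversed(past_rewards))
    weights = [decay ** k for k in range(len(recent_first))]  # 1, 0.5, 0.25, ...
    total = sum(w * r for w, r in zip(weights, recent_first))
    return total / sum(weights)


def prediction_error(current_reward, past_rewards, decay=0.5):
    """Current reward minus the weighted average of previous rewards."""
    return current_reward - weighted_average(past_rewards, decay)


history = [0.0, 1.0, 1.0]  # oldest first
print("weighted average:", round(weighted_average(history), 3))
print("PE for reward 1.0:", round(prediction_error(1.0, history), 3))
print("PE for reward 0.0:", round(prediction_error(0.0, history), 3))
```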
ramnani et al 2004 findings fMRI
unexpected reward
activate medial orbital gyrus of OFC, PFC and inferior frontal sulcus
kumar et al 2008 findings med controls
TD signal blunted in rACC, RC, and hippocampus
BUT no increased signal in VTA
- enhanced VTA in depressed unlikely to be due to meds
pizzagalli et al 2009 depression and reward
depressed show reduced positive affect and arousal following gains
sig weaker response to gains in nucleus accumbens and caudate bilaterally
anhedonia and depression severity assoc with reduced caudate volume
- BG in MDD may affect the consummatory phase of reward processing