Learning Flashcards
Pavlovian conditioning
- what is it
- issues
Acquisition of new behavioural response to previously neutral stimulus due to experiencing a predictive relationship between it (CS) and a biologically-relevant stimulus (US)
- appetitive conditioning - CS + something nice (US)
- aversive conditioning - CS + something aversive (US)
- CS-US association means CS elicits CR in absence of US
ISSUES:
- can’t explain blocking: on a pure contiguity account (‘what fires together wires together’) the added CS is paired with the US and should become associated too - but it doesn’t
- inflexible –> reflexive, not under cognitive control; doesn’t capture the causal relationship
Neural basis of pavlovian conditioning
Appetitive conditioning depends on the dopamine system –> involved in reward + learning
- dopamine released from ventral tegmental area + substantia nigra
- mesolimbic projections = VTA –> nucleus accumbens (part of ventral striatum)
- mesocortical projections = VTA –> cortex (particularly frontal lobes)
What does dopamine do in pavlovian conditioning (3 theories)
- evidence for them
- Signals reward –> high reward = high dopamine hit - released in response to receiving a reward
- Incentive motivation –> anticipation of receiving a reward
- Surprise –> it’s the surprise element that is associated with reward (prediction error)
O’Doherty et al. (2002) - fMRI + 2 stimuli (one predicts salt, the other sugar)
- NA + AMY more active during anticipation than receipt
- VTA + SN - more active during anticipation of pleasant outcome than unpleasant
- supports theory 2 (incentive motivation)
Mirenowicz + Schultz (1994) - single-cell recordings from dopamine neurons in monkeys
- initially (just CS = no response; US (sweet) = robust dopamine response)
- after learning –> US (not much response - has been predicted by CS); CS (robust response)
- dopamine in response to predictors of reward stronger than reward itself
- supports theories 2 + 3 (incentive motivation + prediction error)
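The Mirenowicz + Schultz pattern (dopamine response migrates from US to CS over training) falls out of temporal-difference learning. A minimal sketch, assuming one CS at t=0, a reward-of-1 US at t=3, gamma = 1, and a pre-CS background whose value is pinned at 0 (because CS arrival is unpredictable); all names and parameter values are illustrative:

```python
# TD(0) sketch: the prediction error (dopamine-like signal) shifts
# from the US to the CS as within-trial values are learned.
ALPHA = 0.1        # learning rate (assumed)
N_TIMESTEPS = 4    # t=0 is CS onset, t=3 is US delivery

def run_trials(n_trials):
    """Return the TD errors at CS onset and at the US on the last trial."""
    V = [0.0] * N_TIMESTEPS              # value of each within-trial timestep
    for _ in range(n_trials):
        delta_cs = V[0] - 0.0            # background value fixed at 0
        for t in range(N_TIMESTEPS):
            r = 1.0 if t == N_TIMESTEPS - 1 else 0.0
            v_next = V[t + 1] if t + 1 < N_TIMESTEPS else 0.0
            delta = r + v_next - V[t]    # TD error (gamma = 1)
            if t == N_TIMESTEPS - 1:
                delta_us = delta
            V[t] += ALPHA * delta
    return delta_cs, delta_us

early_cs, early_us = run_trials(1)    # first trial: big error at US, none at CS
late_cs, late_us = run_trials(500)    # after learning: big error at CS, ~0 at US
```

The fixed-background simplification is what keeps the error from propagating back before the CS.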
Necessary conditions for learning
- awareness
- temporal contiguity
- salience
- attention
Awareness of CS-US relationship (explicit or implicit knowledge)
- Lovibond (1992) –> plant (CS) + shock (US): awareness = increased SCR; unaware = no real SCR difference
- Hugdahl + Ohman (1977) - instructed extinction (told there is no longer a CS-US relationship): if told = SCR disappeared immediately; if not told = gradual decrease; if the US was biologically relevant - fear-related SCR still present but only decreased gradually
- Bechara et al. (1993) - bilateral amygdala or hippocampal damage (double dissociation): AMG damage = explicit awareness intact but NO conditioned SCR (implicit lost); HC damage = no explicit awareness but SCR INTACT (implicit preserved)
Temporal contiguity: US + CS close together in time
- Shanks et al. (1989) - computer key + outcome –> increasing the delay decreased judgements of causality (from ~70% to chance as the delay grew from 0 to 16 s)
- BUT: flavour-aversion learning (temporal contiguity not necessary):
Andrykowski + Otis (1990) - chemo patients feel nauseous after treatment: no relationship between food + nausea time delay and aversion development (<1 day)
- BUT: blocking (temporal contiguity not sufficient):
Despite contiguity being the same for CS1/CS2 and the US, prior learning of CS1-US blocked new learning about CS2
Tobler et al. (2006) - fMRI - prior learning blocked new learning about a new stimulus despite the same amount of training
Salience: how much you notice/care
- easier to see, new, more important, biological preparedness (inbuilt bias)
Attention: more salient = pay more attention
- previous experience affects attention - latent inhibition (CS pre-exposed without US, retards learning of CS-US after)
- Nelson + Sanjuan (2006) - space-ship computer game, stop clicking when attacked:
pre-exposure: red sensor (meant to be informative of attack but wasn’t)
learning phase: 50% same spacescape, flashes now predictive
worse at suppressing mouse clicking if pre-exposed (context-specific - not the case in a new context)
Extinction and inhibition:
- when does extinction occur
- inhibitory associative strength
- superlearning
- blocking
Extinction occurs when CS-US then CS-no US (pavlovian conditioning)
IAS - learn to anticipate absence of US given a particular CS:
- CS1-US –> CR BUT CS1+CS2 -no US –> no response
- CS2 acquires IAS, CS1 maintains excitatory associative strength
- Lovibond et al. (2002) - conditioned inhibition: CSs A, C, D each paired with shock; CSB alone = no shock; compound CSC+CSE = no shock –> despite extinction of the CE compound, C alone still maintains excitatory associative strength (E acquires the inhibition)
Superlearning: after conditioned inhibition (CS2 having IAS)
- if CS2 (now an inhibitor) is paired with a new stimulus (CS3) and the US occurs –> CS3 must acquire extra-strong excitatory strength to override CS2’s inhibition
- Turner et al. (2004) - predicting food allergies –> right PFC (rPFC) activation when PE is large –> superlearning increases rPFC activation because the PE is high
Blocking –> previous association of CS1 + US then CS2 + CS1 –> same US - blocks learning about CS2 because same outcome
Why does pre-exposure retard learning? (latent inhibition/blocking)
- 2 theories of attention
Mackintosh 1975:
- more attention paid to relevant/reliable CS
- blocking occurs because second CS is worse predictor of US than first CS - decreases attention to it
Pearce & Hall (1980):
- attention is needed only while learning is incomplete –> pay attention to unreliable stimuli, where learning still has to occur; once a CS’s predictions reach a stable asymptote, attention to it declines
Hogarth et al., (2008): distractor X:
- X + A –> noise (reliable predictor of US)
- X + B –> 50% noise (unreliable)
- X + C –> no noise (reliable predictor of absence)
- decreased attention to A and C (reliable and certain), increased attention to B (unreliable) –> suggests PH theory
- measured by fixation
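The Pearce-Hall idea above can be sketched in a few lines: a cue's associability (attention, alpha) tracks its recent absolute prediction error. The three cues mirror Hogarth et al. (2008): A predicts noise on 100% of trials, B on 50%, C on 0%. All parameter values (GAMMA, BETA, trial counts) are assumptions for illustration:

```python
import random

GAMMA = 0.3   # how fast alpha tracks |prediction error| (assumed)
BETA = 0.3    # learning rate for associative strength (assumed)

def train(p_outcome, n_trials=400, seed=0):
    """Train one cue that is followed by the outcome with probability p_outcome."""
    rng = random.Random(seed)
    v, alpha = 0.0, 1.0
    for _ in range(n_trials):
        outcome = 1.0 if rng.random() < p_outcome else 0.0
        pe = outcome - v
        v += BETA * alpha * pe                            # strength update
        alpha = GAMMA * abs(pe) + (1 - GAMMA) * alpha     # Pearce-Hall attention
    return v, alpha

v_a, alpha_a = train(1.0)   # reliable predictor of noise
v_b, alpha_b = train(0.5)   # unreliable (50%)
v_c, alpha_c = train(0.0)   # reliable predictor of no noise
# Expect: alpha_b stays high; alpha_a and alpha_c fall towards zero.
```

This reproduces the fixation result: attention persists only for the uncertain cue B.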
Prediction error
- what is it?
- brain activation
PE:
- if outcome predicted –> no PE (no learning)
- if outcome not predicted –> PE because surprising/unexpected (so learning)
- when CS-US first paired - surprising so learning
- learning complete when reaches asymptote
Brain:
- rPFC sensitive to magnitude of PE regardless of whether excitatory or inhibitory
- superlearning = large activation (Turner et al., 2004)
Rescorla-Wagner rule:
- what is it?
- what does it explain?
- limitations
Change in a CS’s associative strength is proportional to how far the summed current associative strength of all CSs present falls short of perfect prediction (the deviation = prediction error): ΔV = αβ(λ − ΣV)
- assumes a limited amount of associative strength to go around (all cues present share one prediction error)
Explains:
- blocking –> CSA predicts outcome, CSB added but same outcome - no PE, no learning about CSB
- IAS –> novel CSB paired with CSA (conditioned) + no outcome –> very surprising, change in associative strength will be negative - CSB gains IAS
- Superlearning –> the existing IAS is negative, so (λ − ΣV) exceeds λ –> faster-than-normal change because the outcome is extra surprising
BUT: can’t explain latent inhibition
- no PE during pre-exposure (CS - no US) so no learning
- shouldn’t affect later learning of CS-US
- maybe look at Pearce-Hall theory instead - attention paid to uncertain outcomes in past
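A tiny Rescorla-Wagner sketch makes both points above concrete: it produces blocking, but pre-exposure leaves everything untouched, so it cannot produce latent inhibition. The combined learning rate (alpha*beta = 0.2) and lambda = 1 are assumed values:

```python
LR, LAM = 0.2, 1.0   # combined learning rate (alpha*beta) and asymptote (assumed)

def rw_trial(V, cues, outcome):
    """One RW update: every present cue changes by LR * (outcome - sum V)."""
    pe = outcome - sum(V[c] for c in cues)   # one shared prediction error
    for c in cues:
        V[c] += LR * pe

# Blocking: A+ first, then AB+ -- B gains almost nothing.
V = {"A": 0.0, "B": 0.0}
for _ in range(50): rw_trial(V, ["A"], LAM)        # stage 1: A -> US
for _ in range(50): rw_trial(V, ["A", "B"], LAM)   # stage 2: AB -> US
blocked_B = V["B"]                                  # stays near 0

# Control: B+ alone reaches asymptote.
V2 = {"B": 0.0}
for _ in range(50): rw_trial(V2, ["B"], LAM)
control_B = V2["B"]                                 # near 1

# Latent inhibition: pre-exposure (B, no US) gives pe = 0 on every
# trial, so V never moves -- RW wrongly predicts no retardation.
V3 = {"B": 0.0}
for _ in range(50): rw_trial(V3, ["B"], 0.0)        # V3["B"] remains exactly 0
```

Blocking works because A already absorbs the shared prediction error; latent inhibition fails because nothing in the rule changes during US-free pre-exposure.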
Instrumental learning (operant conditioning)
- what is it?
- reinforcement and punishment
- schedules of reinforcement
Operant conditioning = change in behaviour caused by causal relationship between behaviour + biologically relevant stimulus (reinforcer)
Reinforcement (responding increase over trials):
- positive –> add something good
- negative –> remove something bad
Punishment (responding decrease over trials):
- positive –> add something bad
- negative –> remove something good
Schedules of reinforcement - relationship between how often you do behaviour + whether it has outcome:
- RATIO: reward depends on the number of responses (fixed = same number of responses always needed per reward; variable = the number required varies around a fixed average)
- INTERVAL: reward depends on time (fixed = first response after a set interval is rewarded, always the same gap; variable = the interval varies around a fixed average)
- YOKING: one individual (A) on a ratio schedule, the other (B) rewarded at the same times regardless of responding - A responds consistently at a high rate, B responds more when it feels like a long time since the last reward
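The schedule definitions above can be sketched as small reward-decision functions; function names and all parameter values are my own, for illustration only:

```python
import random

def fixed_ratio(n_required):
    """FR schedule: reward every n_required-th response."""
    count = 0
    def respond():
        nonlocal count
        count += 1
        if count == n_required:
            count = 0
            return True
        return False
    return respond

def fixed_interval(interval):
    """FI schedule: reward the first response >= `interval` after the last reward."""
    last_reward = [-interval]            # so the first eligible response pays out
    def respond(t):
        if t - last_reward[0] >= interval:
            last_reward[0] = t
            return True
        return False
    return respond

def variable_ratio(mean_n, rng=random.Random(0)):
    """VR schedule: requirement redrawn each time, so only its average is fixed."""
    target, count = [rng.randint(1, 2 * mean_n - 1)], [0]
    def respond():
        count[0] += 1
        if count[0] >= target[0]:
            count[0] = 0
            target[0] = rng.randint(1, 2 * mean_n - 1)
            return True
        return False
    return respond

fr5 = fixed_ratio(5)
rewards = [fr5() for _ in range(10)]   # only the 5th and 10th responses pay out
```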
Neural basis of operant conditioning
Ventral striatum = important for learning about reward
Dorsal striatum = important in stimulus-response learning
O’Doherty (2004):
- ventral striatum activation proportional to PE in both operant + pav. conditioning
- dorsal striatum activation in instrumental learning only
- suggests PE combined with reinforcement of produced behaviour
Habit vs goal directed behaviour (in operant conditioning)
- evidence
- brain areas
Habit:
stimulus -(outcome/reinforcement)-> response - outcome strengthens S-R relationship
- good because: quick reaction (if dangerous stimuli) + quick learning (for dealing with predictable outcomes)
Goal-directed:
stimulus -> response -> outcome - do behaviour to bring about outcome
- interaction between: representation of causal action-outcome relationship + representation of current incentive value of outcome
Klossek et al., 2008: 2 buttons - to 2 attractive video clips, children satiated on one:
- 3+4yrs - respond more to non-devalued video (goal-directed) [NB: symbol to represent video]
- 1+2yrs - similar response to both
- 1-4yrs [not symbol, still of vid. instead] - respond more to devalued video
so... it was the symbolic representation of the video that couldn’t drive behaviour in the youngest group - not goal-directed
BRAIN:
De Wit et al. (2009) - vmPFC = activated in goal-directed learning; dmPFC = activated in habit learning
Valentin et al. (2007):
- medial orbitofrontal cortex (in vmPFC) activated when comparing action choices to rewarding vs neutral outcomes during training (represents rewardingness)
- after devaluation - increase in medial + central PFC activity when choosing high probability action in valued condition
- represents outcome + how valued it is
Generalisations:
- if normal
- if 2 stimuli with opposite associations
Learning needs to be general enough to be flexible BUT not so general it produces inappropriate behaviour
Normal distribution of stimulus property (x) x associative strength of response (y):
- learning generalises to degree proportional with trained stimulus
- associative learning theory
- less similar –> associative strength decreases
Two stimuli - what to do if as similar to both?
- Peak shift –> peak of responding occurs where the distinction between 2 curves is the greatest (see this in pigeons)
- Rule-based –> peak responding shifts to extremes - most like X vs most like O
Wills + Mackintosh (1998) –> artificial dimension with uncategorisable icons
- humans showed peak shift
- peak performance is for stimuli similar to but not same as trained stimuli
- when humans can’t use rules - do associative learning
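Peak shift itself follows from the associative account above: net responding is an excitatory generalisation gradient around S+ minus an inhibitory one around S-. A sketch assuming Gaussian gradients; the positions, widths, and inhibitory strength are all illustrative assumptions:

```python
import math

S_PLUS, S_MINUS = 0.0, 1.0   # positions on the stimulus dimension (assumed)
SIGMA = 1.0                  # width of both gradients (assumed)
INHIB = 0.5                  # relative strength of the inhibitory gradient (assumed)

def gradient(x, centre):
    """Gaussian generalisation gradient around a trained stimulus."""
    return math.exp(-((x - centre) ** 2) / (2 * SIGMA ** 2))

def net_response(x):
    """Excitation generalised from S+ minus inhibition generalised from S-."""
    return gradient(x, S_PLUS) - INHIB * gradient(x, S_MINUS)

# Grid-search for where responding peaks along the dimension.
xs = [i / 100 for i in range(-300, 301)]
peak = max(xs, key=net_response)
# The peak lies below S+ (displaced away from S-), not at S+ itself.
```

The subtraction pulls the maximum away from S- past S+, which is exactly the pigeon result; the rule-based alternative instead pushes responding to the extreme of the dimension.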
Categorisation
- exemplars
- prototype effect
- typicality effect
Humans group similar stimuli together in categories
Low distortion exemplars = fewer discrepancies to typical category members
- quicker to categorise
High distortion exemplars = fit into category but aren’t very typical
Prototype effect –> even if participant hasn’t seen prototype before, it will still be more accurately categorised than other novel exemplars
Typicality effect –> low-distortion exemplars will be more accurately classified than high distortion exemplars
Theories of categorisation:
- Exemplar theory
- Prototype theory
EXEMPLAR THEORY:
- We have explicit stored memory of exemplars and carry explicit comparisons with new stimuli
- BUT: amnesics show prototype + typicality effects even without explicit memory of the exemplars (Squire + Knowlton, 1995 - as good at categorisation as controls)
PROTOTYPE THEORY:
- during training on a set of exemplars, prototypes are abstracted + generalised to new exemplars based on similarity to prototype
- BUT: the abstraction process can’t be explained by current associative models
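Exemplar theory's "explicit comparison with stored exemplars" can be sketched in a GCM-style way: similarity decays exponentially with distance, and a new item goes to the category whose stored exemplars it is most similar to in total. The gradient parameter and all exemplar values are illustrative assumptions:

```python
import math

C = 2.0   # similarity gradient: higher = sharper, more exemplar-specific (assumed)

def similarity(x, y):
    """Exponential decay of similarity with city-block distance."""
    dist = sum(abs(a - b) for a, b in zip(x, y))
    return math.exp(-C * dist)

def classify(item, categories):
    """categories: dict mapping label -> list of stored exemplars."""
    return max(categories,
               key=lambda lab: sum(similarity(item, e) for e in categories[lab]))

# Two categories of 2-feature exemplars (made-up values).
cats = {"A": [(0.0, 0.1), (0.1, 0.0), (0.2, 0.2)],
        "B": [(1.0, 0.9), (0.9, 1.0), (0.8, 0.8)]}
label = classify((0.15, 0.15), cats)   # closest in summed similarity to A
```

A model like this also reproduces prototype and typicality effects, because the category centre is similar to many stored exemplars at once; the amnesic data are the problem for the "explicit memory" part, not for the similarity computation.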
Rule learning
- patterning (rule vs associative)
- issues with rule learning
Positive patterning: A-, B-, AB+
Negative patterning: A+, B+, AB-
Shanks + Darby (1998) –> allergies + patterning
- if rule-based learning - learn combo = opposite of alone
- if associative learning - think alone = combo
- good learners here use rule-based
- bad learners here learn associatively
BUT: gambler’s fallacy - rule-based learning applied on a false assumption; here associative learners do better
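Why negative patterning (A+, B+, AB-) defeats purely associative learning can be shown directly: in an elemental model the compound's prediction is just the sum of the cue strengths, so it can never be pushed to zero while A and B alone stay excitatory. A sketch with an assumed learning rate:

```python
LR = 0.2   # learning rate (assumed)

def train(trials, V, n_epochs=500):
    """Elemental (Rescorla-Wagner style) learning over repeated epochs."""
    for _ in range(n_epochs):
        for cues, outcome in trials:
            pe = outcome - sum(V[c] for c in cues)   # summed prediction
            for c in cues:
                V[c] += LR * pe

trials = [(["A"], 1.0), (["B"], 1.0), (["A", "B"], 0.0)]  # negative patterning
V = {"A": 0.0, "B": 0.0}
train(trials, V)

pred_ab = V["A"] + V["B"]   # should be 0, but stays well above 0
# The elemental learner is stuck: it over-predicts on AB- trials and
# under-predicts on A+/B+ trials. Solving the problem needs either a
# configural "AB" unit or a rule ("compound = opposite of elements").
```

This is the associative-vs-rule contrast in the Shanks + Darby result: good learners apply the rule, poor learners behave like this summed-elements model.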