Instrumental Learning Flashcards
EARLY WORK
- animal psychologists studying instrumental learning before Pavlov:
1. Small (rats in Hampton Court mazes) - mazes not well geared to studying the learning process
2. Thorndike (cats in puzzle boxes) - better for learning process focus
- learned to escape; faster over trials
INSTRUMENTAL CONDITIONING
- Law of Effect aka. if reward follows animal response -> association between stimuli/response = strengthened (S-R learning)
- concept following naturally from Thorndike's analysis = S-R reflex
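A minimal Python sketch of the Law of Effect as an S-R strengthening rule (the names & learning rate are illustrative assumptions, not from the lecture):

```python
# Illustrative S-R strengthening rule: reward "stamps in" the bond between
# the situation (S) and the response (R); note nothing about the outcome
# itself is stored - that is the core of the S-R account.
strength = {}  # (stimulus, response) -> association strength in [0, 1]

def law_of_effect(stimulus, response, rewarded, alpha=0.1):
    key = (stimulus, response)
    s = strength.get(key, 0.0)
    if rewarded:                   # "satisfying state of affairs"
        s += alpha * (1.0 - s)     # strengthen the S-R bond toward 1
    strength[key] = s
    return s
```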
PROCEDURES
POSITIVE REINFORCEMENT
PUNISHMENT
NEGATIVE REINFORCEMENT
OMISSION TRAINING
POSITIVE REINFORCEMENT
- R -> appetitive aka. ^ R
- reinforcement (reward) follows response
THORNDIKE - animals repeat actions that -> satisfying state of affairs
HULL - drive reduction aka. animal works for food if hungry aka. redefined “satisfying state of affairs”
PUNISHMENT
- R -> aversive aka. less R
- reduces responding
NEGATIVE REINFORCEMENT
- R -> no aversive aka. ^ R
- response stops aversive stimulus that otherwise would have occurred
OMISSION TRAINING
- R -> no appetitive aka. less R
- response cancels reward that would normally occur = omission schedule
- eventually leads to response reduction
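The four procedures fit a 2x2 grid (stimulus type x whether the response produces or removes it); a hedged Python sketch of that grid, with wording taken from the notes above:

```python
# (stimulus type, effect of response on stimulus) -> (procedure, effect on R)
PROCEDURES = {
    ("appetitive", "produces"): ("positive reinforcement", "responding up"),
    ("aversive",   "produces"): ("punishment",             "responding down"),
    ("aversive",   "removes"):  ("negative reinforcement", "responding up"),
    ("appetitive", "removes"):  ("omission training",      "responding down"),
}

print(PROCEDURES[("aversive", "removes")])
# ('negative reinforcement', 'responding up')
```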
SCHEDULES OF REINFORCEMENT
- extinction applies to instrumental conditioning too aka. stop giving reinforcers -> response stops
- BUT we can get away w/reinforcing only some of the responses pps emit & still get stable conditioned responding
- reinforcement schedule = rule for deciding which responses to reinforce
- dif schedules -> dif & highly predictable response patterns; instantly recognisable on a cumulative record
SIMPLE SCHEDULES & EFFECTS
CONTINUOUS REINFORCEMENT
FIXED RATIO
VARIABLE RATIO
FIXED INTERVAL
VARIABLE INTERVAL
CONTINUOUS REINFORCEMENT
- CRF
- reinforces every response
FIXED RATIO
- FR
- reinforce every nth response
- pause after each reinforcer followed by fast responding
VARIABLE RATIO
- VR
- reinforce every nth response on average
- continuous fast responding
FIXED INTERVAL
- FI
- reinforce first response after time (t) elapsed since last reinforcer
- pause after each reinforcement followed by gradually ^ response rate
VARIABLE INTERVAL
- VI
- same as FI
- BUT w/variable time period
- continuous moderate response rate
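A minimal Python sketch of the four simple schedules as "does this response earn a reinforcer?" rules (CRF is just FR w/n = 1; the parameter values & the exponential interval draw are illustrative assumptions):

```python
import random

class FixedRatio:                  # FR(n): reinforce every nth response
    def __init__(self, n):
        self.n, self.count = n, 0  # FR(1) = continuous reinforcement (CRF)
    def respond(self):
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True            # reinforce this response
        return False

class VariableRatio:               # VR(n): every nth response on average
    def __init__(self, n):
        self.n = n
    def respond(self):
        return random.random() < 1.0 / self.n

class FixedInterval:               # FI(t): first response after t elapsed
    def __init__(self, t):         # since the last reinforcer
        self.t, self.last = t, 0.0
    def respond(self, now):
        if now - self.last >= self.t:
            self.last = now
            return True
        return False

class VariableInterval:            # VI(t): same, but the interval varies
    def __init__(self, mean_t):    # around a mean, drawn anew after each
        self.mean_t = mean_t       # reinforcer
        self.last = 0.0
        self.next_t = random.expovariate(1.0 / mean_t)
    def respond(self, now):
        if now - self.last >= self.next_t:
            self.last = now
            self.next_t = random.expovariate(1.0 / self.mean_t)
            return True
        return False
```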
RATIO SCHEDULES
- reinforcement depends on number of responses
- ratio of 1 = continuous reinforcement
- ratio > 1 = partial/intermittent reinforcement
- fixed ratio schedule ie. FR10 (every 10th response)
- variable ratio schedule ie. VR10 (every 10th response on average)
INTERVAL SCHEDULES
- reinforcement depends on time interval
- ratio schedules typically support more rapid responding
- variable ratio (ie. reinforcement on average every 10 responses) smooths out the pause
- interval schedules give rise to a quite specific pattern; as interval (ie. 30s) comes to an end -> responding ^ til pellet obtained then falls back
- smoothed out in variable interval (VI); most commonly used schedule for lever pressing ie. conditioned suppression exps; gives steady responding rate
- first response after certain time gives reward; time varies so average = 30s
INSTRUMENTAL LEARNING = PAVLOVIAN?
- US = reward ie. food/freedom
- UR = natural response ie. eating/approach
- CS = starting condition ie. start maze/puzzle box
- CR = approach
- when rat "learns" to press lever it may just find the lever attractive (stimulus substitution) aka. approaches & bumps it
- is apparent instrumental learning simply an artifact of Pavlovian conditioning?
OMISSION SCHEDULE
- distinguishes between Pavlovian/instrumental conditioning
- if all apparently instrumental learning = Pavlovian conditioning then rat shouldn’t learn this
- if tone sounds -> food delivered BUT only if rat doesn't anticipate it & go into food magazine
- BUT if it does enter, food delivery is cancelled
- rat must learn response of not entering magazine
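A hedged sketch of the omission contingency on a single trial (names illustrative); note a purely Pavlovian rat, which always approaches on the tone, would never earn food:

```python
# One omission trial: tone predicts food, but the Pavlovian approach
# response (entering the magazine during the tone) cancels delivery.
def omission_trial(enters_magazine_during_tone: bool) -> bool:
    return not enters_magazine_during_tone      # True = food delivered

# a purely Pavlovian animal approaches on every tone -> never fed;
# earning food requires learning to withhold the approach response
assert omission_trial(enters_magazine_during_tone=False)      # food
assert not omission_trial(enters_magazine_during_tone=True)   # cancelled
```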
OMISSION SCHEDULE: RESULTS
- rat just about learns not to enter magazine
- when we pair tone/food -> natural tendency = approach magazine when tone sounds
- gradually learns to suppress tendency hence gets more food
GRINDLEY'S BIDIRECTIONAL CONTROL
- another way to check instrumental learning is not just amplification of some pre-existing response to a CS via US pairing (which would NOT be true instrumental learning)
- earliest automated psychology exps on record
- guinea pig which likes carrot
- will get access to carrot if it turns head left when buzzer sounds, shifting lever
- learns to do this
GRINDLEY’S: RESULTS
- guinea pigs will learn to turn head in either direction even though buzzer has same relationship w/reward = evidence that it's not simple Pavlovian conditioning
- then trained to new response to turn head to right
- slow at start as it gives old response BUT learned just as fast as original w/more trials
- if it merely had a Pavlovian tendency to turn left (the buzzer-carrot pairing was unchanged), this cannot explain the reversal
- has learned at least 1 new response consistent w/instrumental learning
CONTEMPORARY ISSUE
- actions/habits; is all instrumental learning the same?
- ANS = no
- in some circumstances S-R account = correct
- clear evidence in others that animal has some expectancy of outcome & modifies beh accordingly
ADAMS & DICKINSON
- earliest evidence of animals having some representation of outcome in instrumental learning
- if outcome devalued (made aversive) -> less responding
- animals trained to lever press for sucrose; went through devaluation phase
- controls = sucrose one day/illness on another (unpaired)
- shouldn't devalue sucrose
- exp animals = sucrose/illness paired; should not like sucrose anymore
- would still press lever when given opportunity BUT not as much
ADAMS & DICKINSON: RESULTS
- reaction could depend on how much training given lever pressing beforehand
- if normally trained (100 trials) = tended not to press lever for sucrose they didn't like (no sucrose delivered in this test aka. extinction)
- if over-trained (500 trials) then they kept pressing lever
ADAMS & DICKINSON: HABITS
- over-trained animals exhibited habits
- S-R account would expect this
- habit = current outcome value has no impact on probability of making response in discriminative stimulus presence
- just seeing lever activates response of pressing it automatically (ie. flipping a light switch just because you see it)
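A toy Python sketch of the habit/action contrast in the devaluation test (the cut-off & rates are invented for illustration, not Adams & Dickinson's numbers):

```python
# Toy model: over-trained responding is habitual (blind to current outcome
# value); moderately trained responding is goal-directed (tracks value).
def press_rate(training_trials, outcome_value, habit_threshold=300):
    # habit_threshold is an invented cut-off between 100 & 500 trials
    if training_trials >= habit_threshold:
        return 1.0                      # S-R habit: presses regardless
    return max(outcome_value, 0.0)      # action: scales w/current value

print(press_rate(100, outcome_value=0.0))  # 0.0 -> devaluation works
print(press_rate(500, outcome_value=0.0))  # 1.0 -> habit persists
```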
COLWILL & RESCORLA
- some representation of outcome is involved in determining performance
- when light on: pressing lever -> food; pulling chain -> sucrose
- when tone sounds: reinforcers swapped around, so pressing lever -> sucrose etc.
- post training: 1 reinforcer devalued by pairing w/illness (ie. sucrose solution)
- then test in extinction (no reinforcers)
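A sketch of the biconditional design as a lookup table, plus the choice rule an outcome-expectant animal could use at test (the mapping follows the notes; the code itself is illustrative):

```python
# (stimulus, response) -> outcome, per the design above
CONTINGENCIES = {
    ("light", "lever"): "food",
    ("light", "chain"): "sucrose",
    ("tone",  "lever"): "sucrose",   # reinforcers swapped under the tone
    ("tone",  "chain"): "food",
}

def preferred_response(stimulus, devalued="sucrose"):
    # an animal representing outcomes avoids whichever response currently
    # leads to the devalued reinforcer - so its choice flips w/the stimulus
    for response in ("lever", "chain"):
        if CONTINGENCIES[(stimulus, response)] != devalued:
            return response

print(preferred_response("light"))  # lever (chain -> sucrose here)
print(preferred_response("tone"))   # chain (lever -> sucrose here)
```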
COLWILL & RESCORLA: RESULTS
- response leading to devalued outcome = performed less than the other one
- BUT response changes depending on whether light/tone = present
- animal has good grasp of what outcome to expect in given situation; avoids the one it doesn’t want
CASTAWAY'S DILEMMA
- instrumental learning results -> Dickinson suggested 2 learning types:
1. actions (require knowledge of expected outcome)
2. habits (S-R)
- tested via castaway's dilemma: castaway on desert island is hungry; eats coconuts; later thirsty but no water; what to do?
- ANS = obvious (drink coconut milk) BUT can animals do this?
CASTAWAY'S DILEMMA: IN THE LAB
WHEN HUNGRY
- both outcomes (ie. food/sugar water) = rewarding; both responses performed
WHEN THIRSTY
- drive state changed; test in extinction so no further training
CASTAWAY'S DILEMMA: DICKINSON (1997) BEFORE
- found no dif in performance of 2 actions
- both performed more in control group who’d not been made thirsty
- BUT interpreted as general activation of available responses by thirst; seemingly reasonable
- no sign of any outcome specific activation of an action
- realised they’d missed something…
CASTAWAY'S DILEMMA: DICKINSON (1997) AFTER
- animals CAN solve the castaway dilemma!
- respond more for sugar water under thirst BUT only if you let it learn that 1 reinforcer (sugar water) = valuable under new drive state (thirst) before test
- new idea incorporated into original design
CASTAWAY'S DILEMMA: ANALYSIS
- incentive learning needed to support drive-related action on basis of available outcomes
- Dickinson argued for model of instrumental performance requiring inference on basis of results
- animal postulated to reason that:
1. it’s thirsty
2. pulling chain -> sugar water
3. sugar water = good when thirsty
4. so it should pull the chain
- each step of the chain must be available for the inference to be possible; animal must know sugar water = valued under thirst (this is the incentive learning)
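The four-step inference written out as a minimal Python sketch (the data structures are an illustrative formalization, not Dickinson's model; "press lever -> food" is an assumed second action):

```python
# Each premise must be separately available for the inference to go through;
# premise 3 is exactly what incentive learning provides.
knowledge = {
    "drive": "thirst",                                   # 1. it's thirsty
    "action_outcomes": {"pull chain": "sugar water",     # 2. chain -> sugar
                        "press lever": "food"},          #    water, etc.
    "valued_under": {("sugar water", "thirst"): True},   # 3. sugar water =
}                                                        #    good when thirsty

def choose_action(k):
    for action, outcome in k["action_outcomes"].items():
        if k["valued_under"].get((outcome, k["drive"])):
            return action                                # 4. pull the chain

print(choose_action(knowledge))  # 'pull chain'
```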
SUMMARY I
- instrumental learning cannot be explained purely as Pavlovian conditioning BUT evidence that both are oft involved in beh control
- 2 forms of instrumental learning:
1. knowledge of action consequences
2. S-R reflex supports habitual responding (via overtraining)
IMPLICATIONS
- consider addiction; role of reinforcement in maintaining drug seeking beh
- over time could -> habit formation causing drug seeking beh to become independent of value of the drug; automatic response literally out of control
SUMMARY II
- instrumental performance that isn’t habit (ie. S-R) based may well differ from habits/Pavlovian conditioning in important respects
- if animal knows consequences of its actions (ie. expected outcome) -> must also represent outcome & relationship to action performed
- can use knowledge to make inference in combination w/other knowledge
- if animal knows outcome = valuable under certain state + outcome produced by given action (never been performed under said state) => can combine knowledge productively to give appropriate response
- this is beyond simple association