Instrumental Learning Flashcards
EARLY WORK
- animal psychologists studying instrumental learning before Pavlov:
1. Small (rats in Hampton Court mazes) - mazes not well geared to studying the learning process
2. Thorndike (cats in puzzle boxes) - better for learning process focus
- learned to escape; faster over trials
INSTRUMENTAL CONDITIONING
- Law of Effect aka. if reward follows animal response -> association between stimuli/response = strengthened (S-R learning)
- concept following naturally from Thorndike's analysis = S-R reflex
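A minimal Python sketch of the Law of Effect as an S-R strengthening rule (the names & learning rate are illustrative assumptions, not from the lecture):

```python
# Illustrative S-R strengthening rule: reward "stamps in" the bond between
# the situation (S) and the response (R); note nothing about the outcome
# itself is stored - that is the core of the S-R account.
strength = {}  # (stimulus, response) -> association strength in [0, 1]

def law_of_effect(stimulus, response, rewarded, alpha=0.1):
    key = (stimulus, response)
    s = strength.get(key, 0.0)
    if rewarded:                   # "satisfying state of affairs"
        s += alpha * (1.0 - s)     # strengthen the S-R bond toward 1
    strength[key] = s
    return s
```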
PROCEDURES
POSITIVE REINFORCEMENT
PUNISHMENT
NEGATIVE REINFORCEMENT
OMISSION TRAINING
POSITIVE REINFORCEMENT
- R -> appetitive aka. ^ R
- reinforcement (reward) follows response
THORNDIKE - animals repeat actions that -> satisfying state of affairs
HULL - drive reduction aka. animal works for food if hungry aka. redefined “satisfying state of affairs”
PUNISHMENT
- R -> aversive aka. less R
- reduces responding
NEGATIVE REINFORCEMENT
- R -> no aversive aka. ^ R
- response stops aversive stimulus that otherwise would have occurred
OMISSION TRAINING
- R -> no appetitive aka. less R
- response cancels reward that would normally occur = omission schedule
- eventually leads to response reduction
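The four procedures fit a 2x2 grid (stimulus type x whether the response produces or removes it); a hedged Python sketch of that grid, with wording taken from the notes above:

```python
# (stimulus type, effect of response on stimulus) -> (procedure, effect on R)
PROCEDURES = {
    ("appetitive", "produces"): ("positive reinforcement", "responding up"),
    ("aversive",   "produces"): ("punishment",             "responding down"),
    ("aversive",   "removes"):  ("negative reinforcement", "responding up"),
    ("appetitive", "removes"):  ("omission training",      "responding down"),
}

print(PROCEDURES[("aversive", "removes")])
# ('negative reinforcement', 'responding up')
```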
SCHEDULES OF REINFORCEMENT
- extinction applies to instrumental conditioning too aka. stop giving reinforcers -> response stops
- BUT we can get away w/reinforcing only some of the responses pps emit & still get stable conditioned responding
- reinforcement schedule = rule for deciding which responses to reinforce
- dif schedules -> dif & highly predictable response patterns; instantly recognisable on a cumulative record
SIMPLE SCHEDULES & EFFECTS
CONTINUOUS REINFORCEMENT
FIXED RATIO
VARIABLE RATIO
FIXED INTERVAL
VARIABLE INTERVAL
CONTINUOUS REINFORCEMENT
- CRF
- reinforces every response
FIXED RATIO
- FR
- reinforce every nth response
- pause after each reinforcer followed by fast responding
VARIABLE RATIO
- VR
- reinforce every nth response on average
- continuous fast responding
FIXED INTERVAL
- FI
- reinforce first response after time (t) elapsed since last reinforcer
- pause after each reinforcement followed by gradually ^ response rate
VARIABLE INTERVAL
- VI
- same as FI
- BUT w/variable time period
- continuous moderate response rate
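A minimal Python sketch of the four simple schedules as "does this response earn a reinforcer?" rules (CRF is just FR w/n = 1; the parameter values & the exponential interval draw are illustrative assumptions):

```python
import random

class FixedRatio:                  # FR(n): reinforce every nth response
    def __init__(self, n):
        self.n, self.count = n, 0  # FR(1) = continuous reinforcement (CRF)
    def respond(self):
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True            # reinforce this response
        return False

class VariableRatio:               # VR(n): every nth response on average
    def __init__(self, n):
        self.n = n
    def respond(self):
        return random.random() < 1.0 / self.n

class FixedInterval:               # FI(t): first response after t elapsed
    def __init__(self, t):         # since the last reinforcer
        self.t, self.last = t, 0.0
    def respond(self, now):
        if now - self.last >= self.t:
            self.last = now
            return True
        return False

class VariableInterval:            # VI(t): same, but the interval varies
    def __init__(self, mean_t):    # around a mean, drawn anew after each
        self.mean_t = mean_t       # reinforcer
        self.last = 0.0
        self.next_t = random.expovariate(1.0 / mean_t)
    def respond(self, now):
        if now - self.last >= self.next_t:
            self.last = now
            self.next_t = random.expovariate(1.0 / self.mean_t)
            return True
        return False
```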
RATIO SCHEDULES
- reinforcement depends on number of responses
- ratio of 1 = continuous reinforcement
- ratio > 1 = partial/intermittent reinforcement
- fixed ratio schedule ie. FR10 (every 10th response)
- variable ratio schedule ie. VR10 (every 10th response on average)
INTERVAL SCHEDULES
- reinforcement depends on time interval
- ratio schedules typically support more rapid responding
- variable ratio (ie. reinforcement on average every 10 responses) smooths out the pause
- interval schedules give rise to a quite specific pattern; as interval (ie. 30s) comes to an end -> responding ^ til pellet obtained then falls back
- smoothed out in variable interval (VI); most commonly used schedule for lever pressing ie. conditioned suppression exps; gives steady responding rate
- first response after certain time gives reward; time varies so average = 30s
INSTRUMENTAL LEARNING = PAVLOVIAN?
- US = reward ie. food/freedom
- UR = natural response ie. eating/approach
- CS = starting condition ie. start maze/puzzle box
- CR = approach
- when rat "learns" to press lever it may just find the lever attractive (stimulus substitution) aka. approaches & bumps it
- is apparent instrumental learning simply an artifact of Pavlovian conditioning?
OMISSION SCHEDULE
- distinguishes between Pavlovian/instrumental conditioning
- if all apparently instrumental learning = Pavlovian conditioning then rat shouldn’t learn this
- if tone sounds -> food delivered BUT only if rat doesn't anticipate it & go into food magazine
- BUT if it does enter, food delivery is cancelled
- rat must learn response of not entering magazine
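A hedged sketch of the omission contingency on a single trial (names illustrative); note a purely Pavlovian rat, which always approaches on the tone, would never earn food:

```python
# One omission trial: tone predicts food, but the Pavlovian approach
# response (entering the magazine during the tone) cancels delivery.
def omission_trial(enters_magazine_during_tone: bool) -> bool:
    return not enters_magazine_during_tone      # True = food delivered

# a purely Pavlovian animal approaches on every tone -> never fed;
# earning food requires learning to withhold the approach response
assert omission_trial(enters_magazine_during_tone=False)      # food
assert not omission_trial(enters_magazine_during_tone=True)   # cancelled
```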
OMISSION SCHEDULE: RESULTS
- rat just about learns not to enter magazine
- when we pair tone/food -> natural tendency = approach magazine when tone sounds
- gradually learns to suppress tendency hence gets more food
GRINDLEY'S BIDIRECTIONAL CONTROL
- another way to check instrumental learning is not just amplification of some pre-existing response to a CS via US pairing (which would NOT be true instrumental learning)
- earliest automated psychology exps on record
- guinea pig which likes carrot
- will get access to carrot if it turns head left when buzzer sounds, shifting lever
- learns to do this
GRINDLEY’S: RESULTS
- guinea pigs will learn to turn head in either direction even though buzzer has same relationship w/reward = evidence that it's not simple Pavlovian conditioning
- then trained to new response to turn head to right
- slow at start as it gives old response BUT learned just as fast as original w/more trials
- if it merely had a Pavlovian tendency to turn left (the buzzer-carrot pairing was unchanged), this cannot explain the reversal
- has learned at least 1 new response consistent w/instrumental learning
CONTEMPORARY ISSUE
- actions/habits; is all instrumental learning the same?
- ANS = no
- in some circumstances S-R account = correct
- clear evidence in others that animal has some expectancy of outcome & modifies beh accordingly
ADAMS & DICKINSON
- earliest evidence of animals having some representation of outcome in instrumental learning
- if outcome devalued (made aversive) -> less responding
- animals trained to lever press for sucrose; went through devaluation phase
- controls = sucrose one day/illness on another (unpaired)
- shouldn't devalue sucrose
- exp animals = sucrose/illness paired; should not like sucrose anymore
- would still press lever when given opportunity BUT not as much
ADAMS & DICKINSON: RESULTS
- reaction could depend on how much training given lever pressing beforehand
- if normally trained (100 trials) = tended not to press lever for sucrose they didn't like (no sucrose delivered in this test aka. extinction)
- if over-trained (500 trials) then they kept pressing lever
ADAMS & DICKINSON: HABITS
- over-trained animals exhibited habits
- S-R account would expect this
- habit = current outcome value has no impact on probability of making response in discriminative stimulus presence
- just seeing lever activates response of pressing it automatically (ie. flipping a light switch just because you see it)
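A toy Python sketch of the habit/action contrast in the devaluation test (the cut-off & rates are invented for illustration, not Adams & Dickinson's numbers):

```python
# Toy model: over-trained responding is habitual (blind to current outcome
# value); moderately trained responding is goal-directed (tracks value).
def press_rate(training_trials, outcome_value, habit_threshold=300):
    # habit_threshold is an invented cut-off between 100 & 500 trials
    if training_trials >= habit_threshold:
        return 1.0                      # S-R habit: presses regardless
    return max(outcome_value, 0.0)      # action: scales w/current value

print(press_rate(100, outcome_value=0.0))  # 0.0 -> devaluation works
print(press_rate(500, outcome_value=0.0))  # 1.0 -> habit persists
```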
COLWILL & RESCORLA
- some representation of outcome is involved in determining performance
- when light on: pressing lever -> food; pulling chain -> sucrose
- when tone sounds: reinforcers swapped around, so pressing lever -> sucrose etc.
- post training: 1 reinforcer devalued by pairing w/illness (ie. sucrose solution)
- then test in extinction (no reinforcers)
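A sketch of the biconditional design as a lookup table, plus the choice rule an outcome-expectant animal could use at test (the mapping follows the notes; the code itself is illustrative):

```python
# (stimulus, response) -> outcome, per the design above
CONTINGENCIES = {
    ("light", "lever"): "food",
    ("light", "chain"): "sucrose",
    ("tone",  "lever"): "sucrose",   # reinforcers swapped under the tone
    ("tone",  "chain"): "food",
}

def preferred_response(stimulus, devalued="sucrose"):
    # an animal representing outcomes avoids whichever response currently
    # leads to the devalued reinforcer - so its choice flips w/the stimulus
    for response in ("lever", "chain"):
        if CONTINGENCIES[(stimulus, response)] != devalued:
            return response

print(preferred_response("light"))  # lever (chain -> sucrose here)
print(preferred_response("tone"))   # chain (lever -> sucrose here)
```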
COLWILL & RESCORLA: RESULTS
- response leading to devalued outcome = performed less than the other one
- BUT response changes depending on whether light/tone = present
- animal has good grasp of what outcome to expect in given situation; avoids the one it doesn’t want
CASTAWAY'S DILEMMA
- instrumental learning results -> Dickinson suggested 2 learning types:
1. actions (require knowledge of expected outcome)
2. habits (S-R)
- tested via castaway's dilemma: castaway on desert island is hungry; eats coconuts; later thirsty but no water; what to do?
- ANS = obvious (drink coconut milk) BUT can animals do this?
CASTAWAY'S DILEMMA: IN THE LAB
WHEN HUNGRY
- both outcomes (ie. food/sugar water) = rewarding; both responses performed
WHEN THIRSTY
- drive state changed; test in extinction so no further training
CASTAWAY'S DILEMMA: DICKINSON (1997) BEFORE
- found no dif in performance of 2 actions
- both performed more in control group who’d not been made thirsty
- BUT interpreted as general activation of available responses by thirst; seemingly reasonable
- no sign of any outcome specific activation of an action
- realised they’d missed something…
CASTAWAY'S DILEMMA: DICKINSON (1997) AFTER
- animals CAN solve the castaway dilemma!
- respond more for sugar water under thirst BUT only if you let it learn that 1 reinforcer (sugar water) = valuable under new drive state (thirst) before test
- new idea incorporated into original design
CASTAWAY'S DILEMMA: ANALYSIS
- incentive learning needed to support drive-related action on basis of available outcomes
- Dickinson argued for model of instrumental performance requiring inference on basis of results
- animal postulated to reason that:
1. it’s thirsty
2. pulling chain -> sugar water
3. sugar water = good when thirsty
4. so it should pull the chain
- each step of the chain must be available for the inference to be possible; animal must know sugar water = valued under thirst (this is the incentive learning)
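The four-step inference written out as a minimal Python sketch (the data structures are an illustrative formalization, not Dickinson's model; "press lever -> food" is an assumed second action):

```python
# Each premise must be separately available for the inference to go through;
# premise 3 is exactly what incentive learning provides.
knowledge = {
    "drive": "thirst",                                   # 1. it's thirsty
    "action_outcomes": {"pull chain": "sugar water",     # 2. chain -> sugar
                        "press lever": "food"},          #    water, etc.
    "valued_under": {("sugar water", "thirst"): True},   # 3. sugar water =
}                                                        #    good when thirsty

def choose_action(k):
    for action, outcome in k["action_outcomes"].items():
        if k["valued_under"].get((outcome, k["drive"])):
            return action                                # 4. pull the chain

print(choose_action(knowledge))  # 'pull chain'
```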
SUMMARY I
- instrumental learning cannot be explained purely as Pavlovian conditioning BUT evidence that both are oft involved in beh control
- 2 forms of instrumental learning:
1. knowledge of action consequences
2. S-R reflex supports habitual responding (via overtraining)
IMPLICATIONS
- consider addiction; role of reinforcement in maintaining drug seeking beh
- over time could -> habit formation causing drug seeking beh to become independent of value of the drug; automatic response literally out of control
SUMMARY II
- instrumental performance that isn’t habit (ie. S-R) based may well differ from habits/Pavlovian conditioning in important respects
- if animal knows consequences of its actions (ie. expected outcome) -> must also represent outcome & relationship to action performed
- can use knowledge to make inference in combination w/other knowledge
- if animal knows outcome = valuable under certain state + outcome produced by given action (never been performed under said state) => can combine knowledge productively to give appropriate response
- this is beyond simple association