Instrumental Learning Flashcards

1
Q

EARLY WORK

A
  • animal psychologists studying instrumental learning before Pavlov:
    1. Small (rats in Hampton Court mazes)
  • issue = mazes not well geared to studying the learning process
    2. Thorndike (cats in puzzle boxes)
  • puzzle boxes better suited to a focus on the learning process
  • cats learned to escape; got faster over trials
2
Q

INSTRUMENTAL CONDITIONING

A
  • Law of Effect ie. if a reward follows an animal's response -> association between stimulus & response = strengthened (S-R learning)
  • concept following naturally from Thorndike’s analysis = the S-R reflex
3
Q

PROCEDURES

A

POSITIVE REINFORCEMENT
PUNISHMENT
NEGATIVE REINFORCEMENT
OMISSION TRAINING

4
Q

POSITIVE REINFORCEMENT

A
  • R -> appetitive outcome ie. ^ R
  • reward follows the response
    THORNDIKE
  • animals repeat actions that -> satisfying state of affairs
    HULL
  • drive reduction ie. animal works for food only if hungry; redefined “satisfying state of affairs”
5
Q

PUNISHMENT

A
  • R -> aversive outcome ie. less R
  • reduces responding
6
Q

NEGATIVE REINFORCEMENT

A
  • R -> no aversive outcome ie. ^ R
  • response stops aversive stimulus that otherwise would have occurred
7
Q

OMISSION TRAINING

A
  • R -> no appetitive outcome ie. less R
  • response cancels reward that would normally occur = omission schedule
  • eventually leads to response reduction
8
Q

SCHEDULES OF REINFORCEMENT

A
  • extinction applies to instrumental conditioning too ie. stop giving reinforcers -> response stops
  • BUT we can get away w/reinforcing only some of the responses subjects emit & still maintain stable conditioned responding
  • reinforcement schedule = rule for deciding which responses to reinforce (see sketch below)
  • dif schedules -> dif, highly predictable response patterns; instantly recognisable on a cumulative record
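
A schedule can be pictured as a yes/no rule consulted after every response. A minimal Python sketch; the function name and the ratio of 5 are purely illustrative (the concrete simple schedules are sketched after the INTERVAL SCHEDULES card below):

    def should_reinforce(responses_since_last_reward: int) -> bool:
        # Illustrative rule: reinforce every 5th response.
        # The choice of rule is exactly what distinguishes the schedules.
        return responses_since_last_reward >= 5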
9
Q

SIMPLE SCHEDULES & EFFECTS

A

CONTINUOUS REINFORCEMENT
FIXED RATIO
VARIABLE RATIO
FIXED INTERVAL
VARIABLE INTERVAL

10
Q

CONTINUOUS REINFORCEMENT

A
  • CRF
  • reinforces every response
11
Q

FIXED RATIO

A
  • FR
  • reinforce every nth response
  • pause after each reinforcer followed by fast responding
12
Q

VARIABLE RATIO

A
  • VR
  • reinforce every nth response on average
  • continuous fast responding
13
Q

FIXED INTERVAL

A
  • FI
  • reinforce first response after time (t) elapsed since last reinforcer
  • pause after each reinforcement followed by gradually ^ response rate (the FI “scallop”)
14
Q

VARIABLE INTERVAL

A
  • VI
  • same as FI
  • BUT w/variable time period
  • continuous moderate response rate
15
Q

RATIO SCHEDULES

A
  • reinforcement depends on the number of responses made
  • ratio of 1 = continuous reinforcement
  • ratio > 1 = partial/intermittent reinforcement
  • notation: fixed ratio schedule ie. FR10 (every 10th response)
  • notation: variable ratio schedule ie. VR10 (every 10th response on average)
16
Q

INTERVAL SCHEDULES

A
  • reinforcement depends on a time interval elapsing
  • ratio schedules typically support more rapid responding than interval schedules
  • variable ratio (ie. reinforcement on average every 10 responses) smooths out the post-reinforcement pause
  • interval schedules give rise to a quite specific pattern: as the interval (ie. 30s) comes to an end -> responding ^ til pellet obtained, then falls back
  • smoothed out in variable interval (VI); most commonly used schedule for lever pressing ie. in conditioned suppression exps; gives a steady response rate
  • in VI the first response after a certain time gives reward; the time varies so that it averages ie. 30s
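
A minimal Python sketch of the four simple schedules as decision rules, assuming n = responses since the last reinforcer and t = seconds since it; the FR10/VR10/30s values follow the examples above, and treating VR as a 1-in-10 chance per response is an illustrative simplification, not a standard implementation:

    import random

    def fixed_ratio(n: int, ratio: int = 10) -> bool:
        # FR10: reinforce every 10th response
        return n >= ratio

    def variable_ratio(mean_ratio: int = 10) -> bool:
        # VR10: reinforce each response with probability 1/10,
        # so reinforcement arrives every 10th response on average
        return random.random() < 1.0 / mean_ratio

    def fixed_interval(t: float, interval: float = 30.0) -> bool:
        # FI30: reinforce the first response made once 30s have elapsed
        return t >= interval

    def variable_interval(t: float, current_interval: float) -> bool:
        # VI30: as FI, but current_interval is redrawn after each
        # reinforcer so that it averages 30s
        return t >= current_interval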
17
Q

INSTRUMENTAL LEARNING = PAVLOVIAN?

A
  • US = reward ie. food/freedom
  • UR = natural response ie. eating/approach
  • CS = starting conditions ie. start of maze/puzzle box
  • CR = approach
  • when a rat “learns” to press the lever it may just find the lever attractive (stimulus substitution) ie. it bumps into it
  • is the apparent instrumental learning simply an artifact produced via Pavlovian conditioning?
18
Q

OMISSION SCHEDULE

A
  • distinguishes between Pavlovian & instrumental conditioning
  • if all apparently instrumental learning = Pavlovian conditioning, then the rat shouldn’t be able to learn this
  • tone sounds -> food delivered BUT only if the rat doesn’t anticipate it & enter the food magazine
  • if it does enter, food delivery is cancelled
  • rat must learn the response of NOT entering the magazine (see sketch below)
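
A minimal sketch of the omission contingency on a single trial; the function and argument names are illustrative, not taken from any specific study:

    def omission_trial(tone_sounded: bool, entered_magazine: bool) -> bool:
        """Return True if food is delivered on this trial."""
        # Food follows the tone, but entering the magazine cancels it,
        # so the reinforced "response" is withholding magazine entry.
        return tone_sounded and not entered_magazine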
19
Q

OMISSION SCHEDULE: RESULTS

A
  • rat just about learns not to enter the magazine
  • pairing tone & food -> natural tendency = approach the magazine when the tone sounds
  • rat gradually learns to suppress this tendency & hence gets more food
20
Q

GRINDLEY’S BIDIRECTIONAL CONTROL

A
  • another way to check that instrumental learning = not due to amplification of some pre-existing response to the CS via US pairing (which would NOT be true instrumental learning)
  • among the earliest automated psychology exps on record
  • guinea pig which likes carrot
  • gets access to carrot if it turns its head to the left when the buzzer sounds, shifting a lever
  • learns to do this
21
Q

GRINDLEY’S: RESULTS

A
  • guinea pigs will learn to turn the head either way while the buzzer has the same relationship w/reward = evidence that it’s not simple Pavlovian conditioning
  • then trained on a new response: turn head to the right
  • slow at the start as it gives the old response BUT becomes just as fast as the original w/more trials
  • if the buzzer merely elicited a tendency to turn left (the previously reinforced response), this cannot explain the reversal
  • the animal has learned at least 1 new response, consistent w/instrumental learning
22
Q

CONTEMPORARY ISSUE

A
  • actions/habits; is all instrumental learning the same?
  • ANS = no
  • in some circumstances S-R account = correct
  • clear evidence in others that animal has some expectancy of outcome & modifies beh accordingly
23
Q

ADAMS & DICKINSON

A
  • earliest evidence of animals having some representation of the outcome in instrumental learning
  • if outcome = made aversive -> less responding
  • animals trained to lever press for sucrose, then put through a devaluation phase
  • controls = sucrose one day & illness the next (unpaired); shouldn’t have any particular effect on the value of sucrose
  • exp animals = sucrose & illness paired; should no longer like sucrose
  • would still press the lever when given the opportunity BUT not as much
24
Q

ADAMS & DICKINSON: RESULTS

A
  • the effect depended on how much lever-press training was given beforehand
  • if normally trained (100 trials) = tended not to press the lever for the sucrose they no longer liked (no sucrose delivered in this test)
  • if over-trained (500 trials) = kept pressing the lever
25
Q

ADAMS & DICKINSON: HABITS

A
  • over-trained animals exhibited habits
  • the S-R account would expect this
  • habit = current outcome value has no impact on the probability of making the response in the presence of the discriminative stimulus
  • just seeing the lever activates the response of pressing it automatically (ie. flipping a light switch just because you see it)
26
Q

COLWILL & RESCORLA

A
  • showed some representation of the outcome is involved in determining performance
  • light on: pressing lever -> food & pulling chain -> sucrose solution
  • tone on: reinforcers swapped around, so pressing lever -> sucrose solution etc.
  • post training = 1 reinforcer devalued by pairing w/illness (here: sucrose solution)
  • then test in extinction (no reinforcers delivered)
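
The design can be summarised as a (signal, response) -> outcome mapping; a toy Python sketch of the predicted choice, with all names illustrative:

    # (signal, response) -> outcome mapping from the training phase
    OUTCOMES = {
        ("light", "press_lever"): "food",
        ("light", "pull_chain"): "sucrose",
        ("tone", "press_lever"): "sucrose",  # reinforcers swapped under the tone
        ("tone", "pull_chain"): "food",
    }
    DEVALUED = {"sucrose"}  # paired with illness after training

    def predicted_choice(signal: str) -> str:
        # An animal that represents expected outcomes should avoid whichever
        # response leads to the devalued outcome under the current signal.
        responses = ("press_lever", "pull_chain")
        return next(r for r in responses if OUTCOMES[(signal, r)] not in DEVALUED)

    # predicted_choice("light") -> "press_lever"; predicted_choice("tone") -> "pull_chain"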
27
Q

COLWILL & RESCORLA: RESULTS

A
  • the response leading to the devalued outcome = performed less than the other one
  • BUT which response is suppressed changes depending on whether the light or the tone is present
  • the animal has a good grasp of what outcome to expect in a given situation & avoids the one it doesn’t want
28
Q

CASTAWAY’S DILEMMA

A
  • instrumental learning results -> Dickinson suggested 2 learning types:
    1. actions (require knowledge of the expected outcome)
    2. habits (S-R)
  • tested via the castaway’s dilemma: a castaway on a desert island is hungry; eats coconuts; then becomes thirsty but has no water; what to do?
  • ANS = obvious to us (drink the coconut milk) BUT can animals make this inference?
29
Q

CASTAWAY’S DILEMMA: IN THE LAB

A

WHEN HUNGRY
- both outcomes = rewarding & both responses performed
WHEN THIRSTY
- drive state changed; test in extinction so no further training occurs

30
Q

CASTAWAY’S DILEMMA: DICKINSON (1997) BEFORE

A
  • found no dif in performance of the 2 actions
  • both performed more than in a control group that had not been made thirsty
  • interpreted as general activation of available responses by thirst; seemingly reasonable
  • no sign of any outcome-specific activation of an action
  • realised they’d missed something…
31
Q

CASTAWAY’S DILEMMA: DICKINSON (1997) AFTER

A
  • animals CAN solve the castaway’s dilemma!
  • rats respond more on the sugar-water action under thirst BUT only if first allowed to learn that the reinforcer (sugar water) = valuable under the new drive state (thirst) before the test
  • this new idea was incorporated into the original design
32
Q

CASTAWAY’S DILEMMA: ANALYSIS

A
  • incentive learning needed to support drive-related action on the basis of the available outcomes
  • Dickinson argued on the basis of these results for a model of instrumental performance requiring inference
  • the animal is postulated to reason that:
    1. it’s thirsty
    2. pulling the chain -> sugar water
    3. sugar water = good when thirsty
    4. so it should pull the chain
  • each step in the chain must be available for the inference to be possible; in particular the animal must know sugar water = valued under thirst (see sketch below)
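
The four-step inference can be caricatured as a chain of stored facts that must all be present; a toy Python sketch, with every proposition and name purely illustrative:

    # Toy knowledge base: each premise corresponds to a step in the chain
    knowledge = {
        "current_state": "thirsty",                        # 1. it's thirsty
        "action_outcomes": {"pull_chain": "sugar_water"},  # 2. chain -> sugar water
        "valued_under": {("sugar_water", "thirsty")},      # 3. from incentive learning
    }

    def infer_action(kb: dict) -> str | None:
        state = kb["current_state"]
        for action, outcome in kb["action_outcomes"].items():
            if (outcome, state) in kb["valued_under"]:
                return action  # 4. so pull the chain
        # Without step 3 (incentive learning), no action is supported,
        # matching the pre-1997 failure to solve the dilemma.
        return None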
33
Q

SUMMARY I

A
  • instrumental learning cannot be explained purely as Pavlovian conditioning BUT there is evidence that both are oft involved in controlling beh
  • 2 forms of instrumental learning:
    1. actions: based on knowledge of the consequences of the action
    2. habits: an S-R reflex supporting habitual responding (develops via overtraining)
34
Q

IMPLICATIONS

A
  • consider addiction & the role of reinforcement in maintaining drug-seeking beh
  • over time this could -> habit formation, causing drug-seeking beh to become independent of the value of the drug; an automatic response literally out of control
35
Q

SUMMARY II

A
  • instrumental performance that isn’t habit (ie. S-R) based may well differ from habits & Pavlovian conditioning in important respects
  • if an animal knows the consequences of its actions (ie. the expected outcome) -> it must also represent the outcome & its relationship to the action performed
  • it can use this knowledge to make inferences in combination w/other knowledge
  • if the animal knows the outcome = valuable under a certain state + the outcome is produced by a given action (even though that action has never been performed under said state) => it can combine this knowledge productively to give the appropriate response
  • this is beyond simple association