Chapter 6: Reinforcement and Choice Flashcards
Intrinsic reinforcer
obtain reinforcing value while engaging in the behaviour
intrinsically motivating
social contact
exercise
Extrinsic reinforcer
things that are provided as a result of the behaviour to encourage more behaviour in the future
ex. reading in children
the only way to teach kids to read is to get them to read, and this usually involves enticing them with social reinforcement (saying “good job”) or other kinds of external reinforcement
More reward does not always mean more ____________. Why?
Reinforcement
bonus for making more parts (i.e. over 50)
the only difference was between the group that got no bonus and the groups that did; the groups that received different bonus amounts did not differ from each other, since they were all equally reinforced
aversives can __________ behaviour
reinforce
the aversiveness drives the behaviour!
Continuous Reinforcement
Behaviour is reinforced every time it occurs
Ratio Schedules
Reinforcer is given after the animal makes the required number of responses
Fixed ratio (FR):
fixed ratio between the number of responses made and reinforcers delivered (e.g., FR 10)
• Key elements: postreinforcement pause, ratio run, and ratio strain (if the requirement jumps too quickly, e.g., from 1 peck to 100 pecks, subjects tend to stop responding)
see graph slide 9
Cumulative Record
Based on old cumulative recorder device (Constant paper output, pen jumps with each response)
rate of responding across time!
Variable Ratio (VR):
Different number of responses are required for the delivery of each reinforcer
Value is equal to the average number of responses made to receive a reinforcer (VR 5)
Responding is based on the average VR and the minimum
ex. gambling
see graph slide 19 (steep slope of responding: responding at a high rate, no postreinforcement pause!)
Interval Schedules
Responses are only reinforced if the response occurs after a certain time interval.
Fixed interval (FI):
a response is reinforced only if it occurs after a set amount of time has elapsed (responses made during the interval don’t matter)
Key elements: fixed interval scallop, limited hold
i.e., if the animal has to wait 10 secs, then after 10 secs has elapsed, the first peck at the key will gain it the reward!
ex. cramming before tests
see graph slide 16 (low rates of responding, scalloped responding, post-reinforcement pause!)
Variable interval (VI):
responses are reinforced if they occur after a variable interval of time
see graph slide 19
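The reinforcement rules of the four basic schedules can be sketched in code. A minimal sketch (the function names and simulated data are my own, not from the slides):

```python
import random

random.seed(0)  # reproducible example

def run_fr(ratio, n_responses):
    """Fixed ratio (FR): a reinforcer follows every `ratio`-th response."""
    reinforcers, count = 0, 0
    for _ in range(n_responses):
        count += 1
        if count == ratio:
            reinforcers += 1
            count = 0
    return reinforcers

def run_vr(mean_ratio, n_responses):
    """Variable ratio (VR): the required count varies; its average is `mean_ratio`."""
    reinforcers, count = 0, 0
    required = random.randint(1, 2 * mean_ratio - 1)  # uniform, mean = mean_ratio
    for _ in range(n_responses):
        count += 1
        if count >= required:
            reinforcers += 1
            count = 0
            required = random.randint(1, 2 * mean_ratio - 1)
    return reinforcers

def run_fi(interval, response_times):
    """Fixed interval (FI): only the first response made `interval` seconds after
    the last reinforcer pays off; responses during the interval don't matter."""
    reinforcers, last = 0, 0.0
    for t in response_times:
        if t - last >= interval:
            reinforcers += 1
            last = t
    return reinforcers

print(run_fr(10, 100))                 # FR 10: 100 pecks -> 10 reinforcers
print(run_fi(10, [1, 5, 11, 12, 25]))  # FI 10: only the pecks at t=11 and t=25 pay
```

A VI schedule would look like `run_fi` with the interval resampled after each reinforcer, just as VR resamples the ratio.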
Reynolds 1975
Ratio and Interval Schedules Compared
• Compared rates of key pecking of pigeons on VR and VI schedules
• Opportunities for reinforcement were made identical for each bird
• The VI bird could receive reward when the VR bird was within one response of its reward
With equivalent rate of reinforcement, variable ratio schedules produce a higher rate of responding than variable interval schedules
Variable schedules produce _________ responding compared to Fixed
Variable schedules produce steadier responding compared to Fixed
fixed = post reinforcement pause
Ratio schedules produce ________ of responding than Interval
Ratio schedules produce higher rates of responding than Interval
Source of Differences Between Ratio and Interval Schedules:
Differential reinforcement of Inter-response times
Ratio schedules reinforce shorter IRTs
Interval schedules reinforce longer IRTs
Source of Differences Between Ratio and Interval Schedules: Feedback function
More feedback (reinforcement) comes with more responding on Ratio schedules; not so for Interval Schedules (different jobs differ on this aspect)
Intermittent Schedules
Fewer reinforcers needed
More resistant to extinction
Variable reinforcement/interval schedules are resistant to extinction
Differential reinforcement of high rates (DRH)
Minimum ratio per interval
Differential reinforcement of low rates (DRL)
Maximum ratio per interval
used to reduce a behaviour you want to occur less often, without eliminating it entirely
Differential reinforcement of paced rates (DRP)
Combines a minimum per interval (as in DRH) with a maximum per interval (as in DRL): only paced responding within that range is reinforced
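These three differential-reinforcement criteria can be sketched as simple checks on the response count per interval (the function names are mine):

```python
def reinforce_drh(responses, minimum):
    """DRH: reinforce only if responding meets a minimum count per interval."""
    return responses >= minimum

def reinforce_drl(responses, maximum):
    """DRL: reinforce only if responding stays at or below a maximum per interval."""
    return responses <= maximum

def reinforce_drp(responses, minimum, maximum):
    """DRP: reinforce only paced responding, i.e. within [minimum, maximum]."""
    return reinforce_drh(responses, minimum) and reinforce_drl(responses, maximum)

print(reinforce_drp(7, minimum=5, maximum=10))   # True: within the paced band
print(reinforce_drp(12, minimum=5, maximum=10))  # False: responding too fast
```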
Duration Schedules:
response must be made continuously for a period of time
Complex schedules:
Conjunctive schedules, Adjusting schedules, Chained schedules…
Noncontingent Schedules: Fixed time (FT)
Reinforcer occurs following predictable amount of time regardless of behaviour
Choice
Usually considered as a cognitive deliberation
- Here measured based on the effect of different, concurrent payoff schedules
With true and fickle “freedom of choice”, choices would be unpredictable
- Understanding choices in terms of consequences allows for prediction
with concurrent schedules!
Herrnstein, 1961
matching law
would the animal be able to figure out how to respond based on how much food it gets??
measurement of choice
concurrent schedules
Measures of choice
- Relative rate of responding (Behaviour)
BL/(BL + BR)
BL = rate of responding to left choice
BR = rate of responding to right choice
(BR + BL) = total responding
- Relative rate of reinforcement
RL/(RL + RR)
Matching law
Herrnstein, 1961:
Proportion of responding (choice) is equal to the proportion of reinforcement for doing so.
There is a correlation between behaviour and the environment.
When the first proportion (responding) equals the second (reinforcement), that equality is the matching law:
Relative rates of responding match relative rates of reinforcement
BL/(BL + BR) = RL/(RL + RR)
OR
BL/BR = RL/RR
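The two measures and the matching prediction can be computed directly (the response and reinforcer counts below are hypothetical):

```python
def relative_rate(left, right):
    """Relative rate of the left option: left / (left + right)."""
    return left / (left + right)

# Hypothetical pigeon data from a concurrent schedule.
B_L, B_R = 750, 250   # pecks on the left and right keys
R_L, R_R = 30, 10     # reinforcers earned on each key

print(relative_rate(B_L, B_R))  # relative responding: 0.75
print(relative_rate(R_L, R_R))  # relative reinforcement: 0.75
# Matching law: the proportions are equal, and so are the simple ratios.
print(B_L / B_R, R_L / R_R)     # 3.0 3.0
```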
Basketball matching
26 players on a large university basketball team
Relative choice of different shot types = relative rate of reinforcement (baskets made)
VR, because you may need to take several shots before one (e.g., a 3-pointer) pays off
BL/BR = b(rL/rR)^s
the generalized (“real”) matching law
b = bias
s = sensitivity
Perfect matching, s = 1
• Undermatching, s < 1 – Decreased sensitivity to rates of reinforcement – The most common finding
Overmatching
s > 1 – Increased sensitivity to rates of reinforcement – “Stick to the best option” – Common with high cost of switching
respond mostly to the best option; common when switching is costly (long changeover delay); the animal will rarely sample other options that pay off at lower rates
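A sketch of the generalized matching law showing how bias (b) and sensitivity (s) change the predicted response ratio (parameter values are illustrative):

```python
def predicted_response_ratio(r_left, r_right, b=1.0, s=1.0):
    """Generalized matching law: B_L / B_R = b * (r_L / r_R) ** s."""
    return b * (r_left / r_right) ** s

# Left option pays off 4x as often as the right:
print(predicted_response_ratio(4, 1, s=1.0))  # perfect matching: 4.0
print(predicted_response_ratio(4, 1, s=0.5))  # undermatching: 2.0 (closer to indifference)
print(predicted_response_ratio(4, 1, s=1.5))  # overmatching: 8.0 ("stick to the best option")
print(predicted_response_ratio(4, 1, b=2.0))  # bias doubles preference regardless of payoffs
```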
Response Bias
Important when there is a difference between operant behaviours
- Commonly: side bias
Important when there is a choice between
reinforcers or responses
- Biological predispositions
- Quality
Matching Law and Simple Schedules
Rate of the operant (Bx) and rate of other (Bo) activities:
Bx/(Bx + Bo) = rx/(rx + ro)
Matching law describes ________, but does not ________
Matching Law describes the behaviour, but does not explain it
Maximizing theories
Organisms distribute their behaviour so as to obtain the maximum amount of reinforcement over time
Explains ratio schedule choice
Doesn’t always hold
Melioration theories
Making the situation “better” than the recent past
Change from one alternative to the next to improve the local rate of reinforcement
Animals respond so that the local rate is the same on each alternative
Predicted issue: Behaviour is strongly controlled by immediate consequences
Serial Reversal Learning
Over many reversals, re-acquisition speeds up
Compare with initial acquisition and number of reversals
- Behavioural Flexibility
Mid-session Reversal
With a reversal part- way through the session:
- Perseverative Errors
- Anticipatory Errors
- Errors shift with changes in time
- Pigeons predict reversal based on time
- Self-Control?
Concurrent Chain Schedules
• Method to determine choice
– e.g., whether variety is preferred
• Different from concurrent schedules since animals are not free to switch
• Able to investigate choice with commitment
Self-Control
Commonly used as “willpower”
- Circular logic
- Describes outcome, not process
Better described as:
- Choice of impulsive vs. delayed options
Temporal Self-Control
Choose: Smaller-Sooner reward (SS) vs. Larger-Later reward (LL)
• Self-control vs. impulsivity
Waiting in Animals
Different species tolerate delays differently, e.g., in the time they will wait for a threefold increase in reward
- Temporal Self-Control
- chimps are very good at this!
Waiting in Humans (Rosati et al. (2007))
- Temporal Self-Control
- chimps are better than humans (except when the reward is money)
Rachlin & Green (1972)
• Pigeons chose the small reward when there was no delay in Phase 1
• In Phase 2, pigeons chose the large, delayed reward when T between the initial and terminal phases was increased
Delay-discounting
Value discounting function : value (V) is directly related to magnitude (M) and inversely related to delay (D), or
V = M/(1 + KD)
V = value of a reinforcer; M = magnitude; K = discounting parameter; D = delay
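A sketch of hyperbolic discounting; with a suitable K it reproduces the Rachlin & Green preference reversal (the K value and the delays here are illustrative, not from the study):

```python
def discounted_value(magnitude, delay, k=1.0):
    """Hyperbolic value discounting: V = M / (1 + K * D)."""
    return magnitude / (1 + k * delay)

# Smaller-sooner (SS): 2 units now.  Larger-later (LL): 4 units after 4 s.
for added_delay in (0, 10):  # T: a common delay added before both options
    ss = discounted_value(2, 0 + added_delay)
    ll = discounted_value(4, 4 + added_delay)
    choice = "SS (impulsive)" if ss > ll else "LL (self-control)"
    print(f"T={added_delay}: V_SS={ss:.2f}, V_LL={ll:.2f} -> {choice}")
# With T=0 the SS reward has the higher value; with T=10 preference reverses to LL.
```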
Madden et al. 1997
Opioid-addicted participants steeply discount delayed money and (especially) heroin
Delay discounting is the tendency for a reward’s value to grow as it moves away from the temporal horizon and towards the “now” (equivalently, for its value to shrink with delay).
Small-But-Cumulative Effects
Malott (1989)
Each choice of an SS reward over an LL reward has only a small effect
- Builds over time
- Difficulty in impulse control
- Establishing rules for acceptable vs. unacceptable behaviour
- Relapse handling: Dealing with steady stream of temptations
Long-term Effects
delayed gratification
Mischel: Delayed Gratification
Eating the first marshmallow / less-preferred food correlated longitudinally with:
- Lower SAT scores
- Less educational and professional achievement
- Higher rates of drug use
Simple Self-Control Methods (Skinner)
Physical restraint
Deprivation/Satiation
Distraction
DRO
Self-Reinforcement
Self-Punishment
Shaping
Clinical Implications of self control issues
ADHD (Predominantly Hyperactive-Impulsive Type)
Substance abuse disorders
Impulsive overeating
Other impulse-control disorders
Pathological gambling