Schedules of Reinforcement and Choice Behaviours Flashcards
schedules of reinforcement
indicates what has to be done for the reinforcer to be delivered, i.e. which occurrence of the response is followed by the reinforcer
- reinforcer delivery can depend on
1. presence of certain stimuli
2. passage of time
3. number of responses
4. etc.
- produce predictable patterns of behaviour (B)
- influence how instrumental responses are learned and maintained by reinforcement
Why are schedules of reinforcement important?
determine:
- rate of instrumental behavior
- pattern of instrumental behavior
- persistence of instrumental behavior
schedules of reinforcement: schedule effects
- highly relevant to motivation of B
- whether a person is industrious or lazy has little to do with personality
- it has more to do with the reinforcement schedule in effect
in the real world, instrumental responses _____ get reinforced each time they occur.
what is the name of this concept?
rarely
intermittent schedules of reinforcement
simple schedule of reinforcement
- single factor determines which occurrence of the instrumental response is reinforced
- e.g. how many responses have occurred
- e.g. how much time has passed before the target response can be reinforced
schedules of reinforcement: ratio
- reinforcement depends only on the number of responses the subject performs; time is irrelevant
- the reinforcer is delivered each time the set number of responses is reached
ratio schedules: CRF (continuous reinforcement)
- each response results in delivery of the reinforcer
- often part of contingency management programs in drug addiction rehabilitation
- e.g. clean urine sample = money reward
- e.g. entering the correct ATM PIN lets you withdraw cash
- this is the only schedule where reinforcement is NOT intermittent
ratio schedules: partial/intermittent
- responding is reinforced only some of the time
- e.g. entering the correct ATM PIN but sometimes receiving an "out of order" message
cumulative record
- way of representing how a response is repeated over time
- shows total (cumulative) number of responses that have occurred up to a particular point in time
ratio run
high and steady rate of responding that completes each ratio requirement
ratio strain
- if the ratio requirement is suddenly increased (e.g., from FR 120 to FR 500), the animal is likely to pause periodically before completing the ratio requirement
- in extreme cases, ratio strain may be so great that the animal stops responding altogether
avoiding ratio strain
must be careful not to raise the ratio requirement too quickly when approaching the desired FR response requirement
what is the best FR schedule when strengthening a new response?
CRF, i.e. FR 1
disadvantages of FR1
- satiety and reduced effort, because the requirement is so easy
- time- and resource-consuming
what is the best approach regarding fixed ratio schedules in order to learn a B?
- moving from a low ratio requirement (a dense schedule) to a high ratio requirement (a lean schedule).
- should be done gradually to avoid “ratio strain” or burnout.
At higher ratios, you can _________ the response rate to a higher/faster level.
At higher ratios, you can increase the response rate to a higher/faster level.
fixed-ratio schedule
- reinforcer earned at specific, predictable response instance in a sequence of responses
- e.g. 10 responses per reinforcer = FR10
- e.g. entering correct cell number (response) = FR; reaching the person = reinforcer
- e.g. being paid per item manufactured in a factory
- delivering a quota of 50 flyers (response); being paid (reinforcer) = FR50
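The logic of an FR schedule is simple enough to sketch in code. The snippet below is purely illustrative (function name and numbers are made up, not from the lecture): it delivers a reinforcer on every 10th response, as on FR 10.

```python
# Illustrative sketch of a fixed-ratio (FR) schedule: the reinforcer
# is delivered each time the set number of responses is reached.

def fr_schedule(ratio):
    """Return a callable that reports True on every `ratio`-th response."""
    count = 0
    def respond():
        nonlocal count
        count += 1
        if count == ratio:
            count = 0
            return True    # ratio requirement met: deliver reinforcer
        return False
    return respond

fr10 = fr_schedule(10)            # FR 10: 10 responses per reinforcer
for response in range(1, 31):     # `response` doubles as the cumulative record
    if fr10():
        print(f"response {response}: reinforcer delivered")
# prints at responses 10, 20, 30
```

A VR schedule would differ only in drawing a new, unpredictable ratio requirement after each reinforcer instead of reusing the same one.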
fixed-ratio schedule: cumulative record
- total nb of responses that have occurred up to a particular point in time or within a specific interval
- a complete visual record of when and how frequently the subject responded during a session
fixed-ratio schedule: post-reinforcement pause
- zero rate of responding that typically occurs just after reinforcement on FR schedules
- controlled by the upcoming ratio requirement (nb of responses)
- arguably should be called a pre-ratio pause: seeing an intimidating task ahead makes you pause
variable ratio (VR)
- unpredictable amount of effort, nb of responses, required to earn the reinforcement
- e.g. pigeon must make 10 responses in trial 1, 13 in trial 2, 7 in trial 3
- predictable pauses in responding are less likely under VR than under FR
Real life examples of VR?
- gambling
- fishing
FR + VR responding rates are similar…
- … provided similar numbers of responses are required
- … but generally, FR responding shows a pause-then-run pattern, while VR schedules produce steady responding
fixed-interval schedule (FI)
- amount of time that must pass before the reinforcer becomes available
- constant from one trial to the next
- e.g. a pigeon is reinforced only for pecks made after 4 minutes have elapsed; it learns to wait after reinforcement and then INCREASE its response rate toward the end of each FI
- addition of timing cue increases duration of post-reinforcement pause, BUT shifts responding time closer to the end of FI
- the schedule only determines when the reinforcer becomes available
- once it is available, the subject must still make the instrumental response to obtain it
variable-interval schedules
- time required to set up reinforcer = unpredictable
- varies from one trial to next, unlike FI
- subject has to respond to obtain the set-up reinforcer
- maintain steady + stable rates of responding without regular pauses (unlike in FR)
- found in situations where an unpredictable amount of time is required to prepare the reinforcer
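A companion sketch for interval schedules (again with illustrative names and numbers, not from the lecture): the interval only sets up reinforcer availability, fixed for FI and unpredictable for VI, and the subject must still respond to collect it.

```python
# Illustrative sketch of interval schedules: the interval determines when
# the reinforcer becomes AVAILABLE; a response is still needed to collect it.
import random

def interval_schedule(mean_interval_s, variable=False):
    """Return a predicate: is the reinforcer set up at time t since the last one?"""
    # FI: constant interval; VI: unpredictable interval around the mean
    setup = random.uniform(0.5, 1.5) * mean_interval_s if variable else mean_interval_s
    return lambda t: t >= setup

fi_4min = interval_schedule(240)                  # FI 4-min
vi_2min = interval_schedule(120, variable=True)   # VI 2-min
for t in (60, 120, 240):
    print(f"t={t}s  FI 4-min set up: {fi_4min(t)}  VI 2-min set up: {vi_2min(t)}")
```

Responding faster than the interval times out earns nothing extra, which is why VI supports steady but moderate response rates.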
ratio vs interval schedules: FR + FI BOTH….
- have post-reinforcement pause after each reinforcer
- produce high rates of responding just before next reinforcer delivery
ratio vs interval schedules: VR + VI BOTH….
have steady rates of responding without predictable pauses
ratio vs interval schedules: VR + VI differences
motivate B differently
- VR induces more responses, motivates most vigorous instrumental B
Why does VR induce a higher rate of responses?
- due to the reinforcement of short inter-response times (IRTs) and the relationship between response rate and reinforcement rate
- the faster the organism completes the VR requirement, the faster it is reinforced
- VI schedules instead favour waiting longer between responses
- frequent responses made before the food is set up (short IRTs) are not reinforced; a response is more likely to be reinforced after the interval has timed out, i.e. after a long IRT (see the sketch below)
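One hedged way to formalize this: assume a random-interval schedule with mean interval T, so the reinforcer is set up with constant probability per unit time. Then the chance that a response ending a given IRT is reinforced grows with the length of that IRT:

```latex
% Probability that a response ending an IRT of length t is reinforced,
% assuming a random-interval schedule with mean interval T:
p(\text{reinforced} \mid \text{IRT} = t) = 1 - e^{-t/T}
% This grows with t: long IRTs (waiting) are selectively reinforced on VI,
% whereas on a random-ratio approximation of VR the probability is
% 1/(\text{mean ratio}) regardless of IRT.
```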
Would you study more for fixed exams or pop quizzes?
pop quizzes (VI) instead of fixed exams (FI)
Why have so many different schedules?
different schedules suit different learning/training techniques and produce different patterns of responding
reasons for high VR vs VI rates
- reinforcement is the consequence of responding
- on VR, the faster the schedule is completed, the sooner the next reinforcer is obtained
- on VI (e.g. a VI 2-min schedule), even if the organism obtains every reinforcer that is set up, there is still a limit on how many reinforcers can be earned in a given amount of time
feedback function
- the relationship between response rates and reinforcement rates calculated over an entire experimental session or an extended period of time
- reinforcement is considered to be the feedback or consequence of responding
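As a sketch of the two feedback functions (B = response rate, R = reinforcement rate; the exact forms are idealizations):

```latex
% Ratio schedule (FR/VR with requirement n): every n-th response on average
% is reinforced, so reinforcement rate grows linearly with response rate:
R_{\text{ratio}}(B) = \frac{B}{n}
% Interval schedule (FI/VI with mean interval t): at most one reinforcer
% per interval on average, so the function flattens at a ceiling:
R_{\text{interval}}(B) \le \frac{1}{t}
% Responding faster always pays on ratio schedules, but stops paying on
% interval schedules once the ceiling is reached.
```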
choice behaviour - concurrent schedule
- offers more than 1 response option/reinforcer at the same time
- allows continuous measurement of choices because subject = free to change back and forth between alternatives
- e.g. slot machines
- investigates mechanisms of choice
- often used in laboratory setting
measuring choice behaviour
- choice is measured by calculating relative rates of responding
- if both alternatives run the same VI schedule, the relative rate of reinforcement is the same for each
- reinforcers earned are then proportional to responses made on each alternative
matching law
- relative rate of responding = relative rate of reinforcement
- even if the 2 alternative responses are not reinforced according to the same schedule, the relative rate of responding still matches the relative rate of reinforcement for each alternative
- choice = not random
- whether B occurs frequently or not depends on:
1. schedule of reinforcement
2. availability of alternative sources of reinforcement
- rates of responding/reinforcement are averaged over the duration of the experiment
- slide 14
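In equation form (Herrnstein's matching law, with B₁, B₂ the response rates and r₁, r₂ the reinforcement rates on the two alternatives):

```latex
\frac{B_1}{B_1 + B_2} = \frac{r_1}{r_1 + r_2}
\qquad\text{equivalently}\qquad
\frac{B_1}{B_2} = \frac{r_1}{r_2}
```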
“…even if the 2 alternative responses are not reinforced according to the same schedule, the relative rate of responding still matches the relative rate of reinforcement…” example
- if schedule 1 (FR 5) requires 5 responses per reinforcer and schedule 2 (FR 10) requires 10 responses per reinforcer…
- the relative rate of responding still tracks the relative rate of reinforcement
- for the same amount of responding, reinforcement on schedule 1 simply occurs twice as often as on schedule 2
differences in the matching of response rates
can be accounted for by the generalized form of the matching law, which adds 2 parameters:
1. response bias
- occurs when the response alternatives require different amounts of effort
- occurs if the reinforcer for one response is more attractive
2. sensitivity
- sensitivity of choice B to the relative rates of reinforcement for the response alternatives
- undermatching: response ratio is less extreme than the reinforcement ratio (sensitivity < 1)
- overmatching: response ratio is more extreme than the reinforcement ratio (sensitivity > 1)
https://www.youtube.com/watch?v=kzrj0CSTq3Q&ab_channel=BrettDiNoviBehavioralKarma
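The generalized form writes the two parameters above as bias (b) and sensitivity (s):

```latex
% Generalized matching law: b = response bias, s = sensitivity
\log\frac{B_1}{B_2} = s\,\log\frac{r_1}{r_2} + \log b
% s = 1, b = 1: perfect matching
% s < 1: undermatching  (response ratio less extreme than reinforcement ratio)
% s > 1: overmatching   (response ratio more extreme than reinforcement ratio)
```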
molar theories
- refer to notebook
molecular theories
- refer to notebook
melioration
- to make something better
- operates on time scale between molar and molecular theories
- focuses on local rates of responding + reinforcement (based on the time period that a subject devotes to a choice alternative)
- predicts subjects will shift their response choice toward alternative that provides highest local rate of reinforcement
- involves assumption of melioration
assumption of melioration
- adjustments in the distribution of B between choice alternatives continue until the subject's responses produce the same local rate of reinforcement on each alternative
- this process yields the matching law
- 3rd mechanism of choice
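A toy simulation of melioration under illustrative numbers (not from the lecture): two alternatives earn fixed reinforcer totals, and the subject keeps shifting time toward whichever has the higher local rate until the local rates equalize, which reproduces matching.

```python
# Toy melioration sketch: shift time toward the alternative with the
# higher LOCAL reinforcement rate until local rates are equal.
earned = [10.0, 5.0]   # reinforcers/hour earned on alternatives 1 and 2
share = [0.5, 0.5]     # proportion of session time devoted to each

for _ in range(60):
    local = [earned[i] / share[i] for i in (0, 1)]  # local rate = earned / time spent
    richer = 0 if local[0] > local[1] else 1
    share[richer] += 0.01                           # small shift toward richer option
    share[1 - richer] -= 0.01

print(f"time split ≈ {share[0]:.2f} / {share[1]:.2f}")
# ≈ 0.67 / 0.33, matching the 10:5 = 2:1 reinforcement ratio (matching law)
```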
concurrent chain schedules
standard concurrent schedule of reinforcement
- 2+ response alternatives available
- switching can occur any time
in complex B, once a choice is made, the remaining options become limited
concurrent chain schedules: studying how choices involve commitment to one choice
stage 1: choice link
- subject chooses between 2 schedules
- either A or B
stage 2: terminal link
- opportunity occurs after initial choice
- the subject is “chained” to the chosen schedule until the end of the trial
refer to drawing in notebook
self-control
- choosing a larger delayed reward over a smaller immediate reward
- why is it hard to work for large but delayed rewards ?
delay discounting
- the value of a reinforcer declines as a function of how long you have to wait to obtain it
- would you want $50 now or next week?
- the value of a reinforcer is also directly related to reward magnitude
- slide 24
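The standard quantitative form is hyperbolic discounting (Mazur), where V is subjective value, A the reward amount, D the delay, and k an individual discounting rate; the k below is purely illustrative:

```latex
V = \frac{A}{1 + kD}
% e.g. with an illustrative k = 0.1/day, $50 delayed by one week:
% V = 50 / (1 + 0.1 \times 7) = 50 / 1.7 \approx \$29.41  (vs. \$50 now)
```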
training self-control
shaping
- training that builds preference for the larger delayed reward, where the delay is increased gradually in steps
introducing a distracting task during the delay period may also increase self-control
- and therefore preference for the delayed larger reward
fMRI
an imaging method
- measures the haemodynamic response of the brain to neural activity
- when there is more neural activity in a specific region, blood flow to that region increases
oxygen is delivered to neurons by …
- hemoglobin = the oxygen-transport protein in red blood cells
- when neuronal activity increases, there is an increased demand for oxygen
most common fMRI method = BOLD (blood oxygenation level dependent signal)
- BOLD corresponds mainly to the concentration of deoxyhemoglobin
an increase in blood flow produces an increase in the ratio of oxygenated to deoxygenated hemoglobin in the active area
- this allows fMRI to produce an effective map of which brain areas are active
- refer to slide 32
reward and movement: basolateral amygdala
a) index of reward magnitude and valence
- positive valence = pleasant emotional stimulus
- negative valence = unpleasant emotional stimulus
b) can motivate B and reinforce new learning
- drug-paired cues can increase drug-taking; this effect can be eliminated by lesioning the BLA
- lesioning the BLA also disrupts escape from a fear-eliciting cue
- BUT lesioning the BLA has no effect on reward devaluation
biological view of reward: striatum
- caudate, putamen, nucleus accumbens
- modifies B by integrating positive/negative outcomes via 2 pathways, eventually projecting to thalamus
1. direct pathway ("go"; slides 35-36) - disinhibits thalamic nuclei
- inhibits inhibitory regions
- activation of the direct pathway results in more output from the thalamus (because it is disinhibited)
2. indirect pathway (“no go”) - to suppress B
refer to diagram in notebook
rewards: rats vs humans
rats
- DA microinjection/electrical stimulation in striatum = highly rewarding
- striatal lesions diminish responding and increase sensitivity to reward devaluation
humans
- disruption of DA input from the substantia nigra results in movement abnormalities (Parkinson's disease)
reward: OFC
- orbitofrontal cortex
- part of the PFC; anatomically connected to the amygdala + striatum, both linked to reward
- OFC implicated in executive control: weighing the relative value of each ALTERNATIVE CHOICE
- reward-related actions = greater activation of medial OFC
- response inhibition = greater activation of lateral OFC
damage to OFC: effects
- the value of the predicted outcome cannot be used to guide B
- difficulty incorporating negative feedback from previous B to guide future B (what did I do wrong?)
- B of such patients = controlled by the impulsive amygdala system
reward: instrumental learning
1. lower-level subcortical structures
- include the amygdala + striatum
- learning occurs in an incremental fashion
- guided by predictability and signal (prediction) error
2. OFC
- capable of biasing/changing B based on new info
- weighs the benefit of delayed rewards + sets goals
- allows delayed gratification, i.e. picking the larger reward
- in rats, greater activation of lower-level limbic areas occurs when the immediate reward is selected
addictive B
- choosing the immediate outcome/reward despite knowledge of long-term negative effects
- opioid addicts undervalue delayed rewards
- may be due to an overactive impulsive system (guided by the amygdala and striatum)
- drug abuse artificially changes the reward system, biasing it towards impulsivity