PSY260 - 5. Operant Conditioning Flashcards
Operant Conditioning
whereby organisms learn to make responses in order to obtain or avoid important consequences
Operant conditioning is a form of associative learning
Operant Conditioning
based on avoiding/obtaining a specific outcome
requires organism operate in its environment to determine outcome
instrumental conditioning
Thorndike
first to study behavioural outputs due to operant conditioning - Puzzle boxes
Thorndike
findings of the puzzle box work suggest organisms:
More likely repeat actions they have experienced as producing satisfying consequences
Less likely repeat actions they have experienced as producing undesirable consequences
law of effect
probability of a particular behavioural response increases/decreases depending on the consequences that have followed that response in the past
law of effect
Stimulus S→Response R→Outcome O
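The law of effect can be illustrated with a toy update rule. This is a minimal sketch, not Thorndike's own formulation: the function name, the `step` size, and the linear update are all assumptions chosen for illustration.

```python
def update_response_probability(p, outcome, step=0.2):
    """Nudge the probability of a response after one S -> R -> O episode.

    Illustrative rule only (assumed, not from the course material):
    a satisfying outcome moves p toward 1, an undesirable one toward 0.
    """
    if outcome == "satisfying":
        return p + step * (1.0 - p)
    if outcome == "undesirable":
        return p - step * p
    return p  # neutral outcome: probability unchanged


# Three satisfying outcomes in a row make the response more likely.
p = 0.5
for _ in range(3):
    p = update_response_probability(p, "satisfying")
print(round(p, 3))
```

The same function lowers `p` when the outcome is "undesirable", which mirrors the two halves of the law of effect.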
Free-Operant Learning
Thorndike’s learning procedures involved discrete trials
Discrete Trials: operant conditioning paradigm where experimenter defines beginning + end of each trial
B.F. Skinner
believed he could refine Thorndike’s techniques, and devised the Skinner box to do this
Skinner Box
conditioning chamber - reinforcement/punishment is automatically delivered when animal makes a response (lever pressing)
trough on one wall - food delivered automatically
•When animal pressed lever, food dropped into trough
Free Operant Paradigm
Skinner’s paradigm: animal can operate apparatus “freely”, responding to obtain reinforcement/avoid punishment, whenever it chooses
Skinner Box: Extinction
- responding decreases (by #13) once the response no longer gets the desired outcome
Free Operant Learning
S (Light ON) → R (Lever Press) → O (Food Release)
S (Light OFF) → R (Lever Press) → O (NO Food Release)
learn to distinguish betw light on/off + understand consequences change
reinforcement
Providing consequences to increase probability of a behaviour occurring again in future
punishment
Providing consequence to decrease probability of a behaviour occurring again in future
Components of Learned Association: 3
stimulus (or set of stimuli)
response (or set of responses)
outcome
Components of Learned Association
3-way association betw S, R, and O
Discriminative Stimuli
stimuli that signal whether particular response will lead to particular outcome
Stimuli
particular set of stimuli, responses + outcomes might become so strongly associated that they become inflexible
habit slip
when a discriminative stimulus is so strongly associated with a response that it triggers the response even when inappropriate – e.g., alarm clock wakes you up and you get dressed for school even on the weekend
Responses
Behaviour given in reaction to stimulus in order for a particular outcome to come about
Shaping
operant conditioning technique in which successive approximations to desired response are reinforced
Chaining
organisms gradually trained to execute complicated sequences of discrete responses
Backward chaining: for longer, more complex sets of steps, the last response is trained first, then earlier steps are added one at a time
Reinforcers
particular consequence for associated behaviour that increases likelihood of behaviour being repeated in future
Primary Reinforcers
stimuli - food, water, sex + sleep - innately reinforcing: organisms naturally driven to obtain these things + tend to repeat behaviours that increase their access to them
Secondary Reinforcers
stimuli with no intrinsic value that have been paired with primary reinforcers/provide access to primary reinforcers
(money, gets us our primary needs)
useful because trainer can deliver reinforcement immediately without waiting till trick is finished
•Although animals will not work for food unless they’re hungry, they may continue to work indefinitely for secondary reinforcers
Punishers
consequence of a behaviour that leads to decreased likelihood of that behaviour occurring again in future
Effectiveness of Punishment: 1. Discriminative stimuli for punishment can encourage cheating
Discriminative stimuli can signal if response will be punished causing someone to alter their behaviour to avoid punishment only when they believe there will be a consequence
Effectiveness of Punishment: 2. Concurrent reinforcement can undermine punishment
Effectiveness of punishment can be counteracted if reinforcement occurs along with punishment
Effectiveness of Punishment: 3. Punishment leads to more variable behaviour
does not specify what alternate response will occur when an organism explores other possible responses
punishment is not a good way to shape/train particular desired behaviours
Effectiveness of Punishment: 3. Punishment leads to more variable behaviour
reinforcement is a faster way to produce learning than simply punishing the alternate undesired response, as it reduces the likelihood of organism exploring undesirable alternate behaviours
Effectiveness of Punishment: 4. Initial intensity of punishment determines effectiveness
most effective if strong punisher used from the outset – if prior weak punishers are initially given instead, they undermine effectiveness of severe punisher when it finally comes later on
Putting it all Together: Building the S-R-O Association
Rules determining when outcomes delivered - reinforcement schedules
Timing Affects Learning
faster if R-O interval is short
Schlinger & Blakely (1994): Immediate reward delivery following lever press = quicker association formation than delayed reward presentation
Timing Affects Learning
closeness in timing important for effectiveness
Reinforcement/punishment = most effective if no delay betw response + its consequence
Timing Affects Learning
society tends to delay delivery of punishment, which undermines punishment’s effectiveness + weakens learning
Timing Affects Learning
Delay betw response + consequence weakens reinforcer/punisher effectiveness because later consequences/outcomes more likely to be associated with other behaviours that occurred during the delay
Self-Control
organism’s willingness to forgo small immediate reinforcement in favor of a large future reinforcement
trade-off
Age impacts ability to wait for delayed reinforcement
Pre-commitments
improve ability to wait for reward
make it harder to go back on commitments needed for long term achievements
Pre-commitments
would need to break their pre-commitment
difficulty associated with breaking a pre-commitment helps people stick to their commitment or promise
Outcomes Can Be Added or Subtracted
When consequence (reinforcer/punisher) is added→positive reinforcement/punishment
Positive Reinforcement
response causes reinforcer to be “added”
over time response becomes more frequent
S (toilet present) → R (empties bladder) → O (praise)
Positive Punishment:
response causes punisher to be “added”
over time response becomes less frequent
S (toilet not present) → R (empties bladder) → O (disapproval)
Consequences Can Be Added or Subtracted
Outcomes/consequences (reinforcers/punishers) can be removed or subtracted to cause learning
Negative Reinforcement
response causes punisher to be subtracted
over time response becomes more frequent
Behaviour is encouraged (reinforced) because it causes something to be taken away/subtracted from environment
S (headache) → R (take aspirin) → O (no more headache)
Negative Punishment
response causes reinforcer to be subtracted
over time response becomes less frequent
Behaviour not encouraged – something subtracted from environ + subtraction punishes behaviour
S (party) → R (late for curfew) → O (grounded)
Negative Punishment
activity/consequence being restricted needs to be deemed enjoyable by person being punished
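The four cases above (added vs. subtracted, reinforcer vs. punisher) form a 2 x 2 table. A minimal lookup sketch, where the function name and the string labels are illustrative assumptions:

```python
def classify_consequence(change, stimulus):
    """Return (type of conditioning, effect on response frequency).

    change:   "added" or "subtracted"
    stimulus: "reinforcer" or "punisher"
    """
    table = {
        ("added", "reinforcer"): ("positive reinforcement", "more frequent"),
        ("added", "punisher"): ("positive punishment", "less frequent"),
        ("subtracted", "punisher"): ("negative reinforcement", "more frequent"),
        ("subtracted", "reinforcer"): ("negative punishment", "less frequent"),
    }
    return table[(change, stimulus)]


# Aspirin example: the response removes a punisher (the headache).
print(classify_consequence("subtracted", "punisher"))
# Curfew example: the response removes a reinforcer (going out).
print(classify_consequence("subtracted", "reinforcer"))
```

"Positive" and "negative" here only mean added or subtracted; whether the behaviour becomes more or less frequent depends on what is added or taken away.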
Continuous Reinforcement
every instance of the response followed by consequence
Partial Reinforcement
only some responses reinforced - intermittent reinforcement schedules
can be applied to reinforcement/punishment
Partial Reinforcement
fixed ratio, fixed interval, variable ratio, variable interval schedule
Fixed Ratio (FR) Schedule
specific # of responses required before reinforcer delivered
Reinforcement comes after fixed # of responses
schedules can increase gradually
Can often lead to a postreinforcement pause
Postreinforcement Pause
FR schedule of reinforcement - brief pause following period of fast responding leading to reinforcement
Fixed Interval (FI) Schedule
first response after fixed amount of time reinforced
Variable Ratio Schedule
certain number of responses, on avg, required before reinforcer is delivered
Reinforces a response after a number of responses that varies around a particular average
Variable Ratio Schedule
Responder never knows exactly when reinforcer is coming
Produces a higher rate of responding than fixed ratio schedules
Variable Interval (VI) Schedule
reinforcement schedule where first response after a varying amount of time, which averages a set value, is reinforced
Reinforces the first response after an interval that averages particular amount of time
Variable Interval (VI) Schedule
Response rate steadier than fixed interval schedule due to element of uncertainty + since animals periodically check for reinforcement availability
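The four partial reinforcement schedules differ only in the rule that decides whether a given response is reinforced. A minimal sketch of those rules follows; the class names, the `respond` method, and the uniform distributions used for the variable schedules are assumptions for illustration, not a standard implementation.

```python
import random


class FixedRatio:
    """FR: every n-th response is reinforced."""

    def __init__(self, n):
        self.n = n
        self.count = 0

    def respond(self, t=None):
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """VR: reinforced after a varying number of responses averaging mean_n."""

    def __init__(self, mean_n, rng=None):
        self.mean_n = mean_n
        self.rng = rng or random.Random()
        self._draw()

    def _draw(self):
        # Assumed distribution: uniform on 1 .. 2*mean_n - 1 (mean = mean_n).
        self.required = self.rng.randint(1, 2 * self.mean_n - 1)
        self.count = 0

    def respond(self, t=None):
        self.count += 1
        if self.count >= self.required:
            self._draw()
            return True
        return False


class FixedInterval:
    """FI: first response after a fixed amount of time is reinforced."""

    def __init__(self, interval):
        self.interval = interval
        self.available_at = interval

    def respond(self, t):
        if t >= self.available_at:
            self.available_at = t + self.interval
            return True
        return False


class VariableInterval:
    """VI: first response after a varying interval averaging mean_interval."""

    def __init__(self, mean_interval, rng=None):
        self.mean_interval = mean_interval
        self.rng = rng or random.Random()
        self.available_at = self._draw()

    def _draw(self):
        # Assumed distribution: uniform on 0 .. 2*mean_interval.
        return self.rng.uniform(0, 2 * self.mean_interval)

    def respond(self, t):
        if t >= self.available_at:
            self.available_at = t + self._draw()
            return True
        return False


fr = FixedRatio(3)
print([fr.respond() for _ in range(6)])
fi = FixedInterval(10)
print([fi.respond(t) for t in (2, 9, 10, 12, 20)])
```

On the ratio schedules, responding faster earns reinforcers sooner; on the interval schedules, extra responses before the interval elapses earn nothing, which fits the lower, more periodic checking described above.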
Protestant ethic effect
idea that rewards should be earned and that hard workers are morally superior to freeloaders
Clark Hull – drive reduction theory
all learning reflects biological need to reduce drives by obtaining primary reinforcers
•primary reinforcers are not always reinforcing, not created equal
• Negative contrast
organisms given a less preferred reinforcer in place of an expected and preferred reinforcer will respond less strongly for the less preferred reinforcer than if they had been given that less preferred reinforcer all along
Choice behavior
•Concurrent reinforcement schedules: organism can make any of several possible responses, each leading to different outcome
•Examine how organisms choose to divide their time + efforts among different options
Channel Surfing
Matching law of choice behavior
given 2 responses reinforced on VI schedules, an organism's relative rate of making each response will match the relative rate of reinforcement for that response
•Rate of response A / rate of response B = rate of VI reinforcement for A / rate of VI reinforcement for B
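A short worked example of the matching relation above. The reinforcement rates (40 and 20 per hour) are made-up numbers for illustration, and the function name is hypothetical. If the response ratio equals the reinforcement ratio, then the share of responses going to A equals A's share of total reinforcement.

```python
def predicted_response_share(reinf_rate_a, reinf_rate_b):
    """Share of all responses predicted for option A under strict matching."""
    return reinf_rate_a / (reinf_rate_a + reinf_rate_b)


# Hypothetical VI schedules: A pays off 40 times/hour, B pays off 20 times/hour.
rate_a, rate_b = 40.0, 20.0
share_a = predicted_response_share(rate_a, rate_b)
ratio_a_to_b = rate_a / rate_b

print(f"Predicted share of responses on A: {share_a:.2f}")
print(f"Predicted response ratio A/B: {ratio_a_to_b:.1f}")
```

With these numbers the organism is predicted to respond on A about twice as often as on B, rather than responding on A exclusively.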
Behavioral economics and bliss point
- Study of how organisms allocate their time and resources among possible options
- Bliss point: allocation of resources that provides maximal subjective value to an individual
The Premack principle: responses as reinforcers
- Opportunity to perform highly frequent behavior can reinforce a less frequent behavior
- Watching television is preferred activity, parent restricts television time, making it contingent on homework
- Response deprivation hypothesis: critical variable is merely which response has been restricted
Dorsal striatum + stimulus-response learning
- Info from sensory cortex to motor cortex can travel via indirect route through basal ganglia
- Dorsal striatum – caudate nucleus + putamen
- Receives highly processed stimulus info from sensory cortical areas and projects to motor cortex, which produces behavioral response
- critical role in operant conditioning, particularly if discriminative stimuli involved
- Individuals with damage/disruption to striatum show deficits in ability to associate a stimulus with the correct response
Orbitofrontal cortex and learning to predict outcomes
- Underside of front of brain; contributes to goal directed behavior by representing predicted outcome
- Receives inputs conveying full range of sensory modalities + visceral sensations [hunger], allowing it to integrate many types of information
- Outputs from orbitofrontal cortex travel to striatum, where they can help determine which motor responses are executed
- Individuals with lesions tend to show inflexible or inappropriate responding
- Important for associating response with particular outcomes
Orbitofrontal cortex and learning to predict outcomes
- During delay, some neurons in orbitofrontal cortex fire differently, depending on whether a reward or a punisher is expected
- Medial portion processes info about reinforcers, lateral portion processes info about punishers
- Some neurons appear to code actual identity of expected outcome
- Neurons may also play a role in helping us select between potential actions based on expected consequences
- Neurons respond with strength proportional to perceived value of each choice
Wanting and liking in the brain
ventral tegmental area (VTA) – small region in midbrain of mammals
•Electrical brain stimulation causes excitement or anticipation of reinforcement
•Hedonic value: goodness of reinforcer, how much we like it
•Motivational value: how much we want a reinforcer and how hard we are willing to work to obtain it
•Only when wanting + liking signals are both present will arrival of reinforcer evoke responding + strengthen S-R association
Dopamine: how the brain signals wanting
- VTA produces dopamine
- Dopamine release from VTA/SNc triggered by encounters with food, sex, drugs of abuse, and secondary reinforcers
- Drugs that interfere with dopamine production/transmission reduce responding in trained animal
• Incentive salience hypothesis of dopamine function
role of dopamine in operant conditioning is to signal how much animal wants particular outcome – how motivated it is to work for it
•Dopamine-depleted animals still willing to eat preferred food if placed in front of them, but unwilling to work hard to earn it
•Increasing brain dopamine levels can increase craving
•Stimulating dopamine system increases wanting but not liking
•Increases in brain dopamine levels tend to enhance new SR learning
Endogenous opioids: how brain signals liking
- Endogenous opioids: naturally occurring neurotransmitter-like substances [peptides] with many of the same effects as opiates
- Released in response to primary reinforcers, may be released in response to secondary reinforcers + pleasurable behaviours
- Differences in amount of endogenous opioid released + in specific opiate receptors activated may help determine an organism's preference for one reinforcer over another
How do wanting and liking interact
- Some endogenous opioids may modulate dopamine release
- Endogenous opioids would signal liking, which in turn would affect the VTA's ability to signal info about wanting