Operant conditioning Flashcards
What is Thorndike’s law of effect?
If, in a specific situation, a response is followed by a reinforcer, the response will become associated with that situation and will be more likely to occur again in that situation.
What is operant conditioning?
The organism operates on its environment in some way to achieve some desirable outcome
Behaviour is associated with consequences
What are 3 Key features of Skinner’s Operant Box?
• Some behaviour that can be done to obtain reward.
―Rate measured by experimenter
• A dispenser of food or liquid used as a reinforcer (reward)
• Tones or lights to signal availability of opportunity for reward or pending punishment
―Used in discrimination and generalisation studies
What is shaping?
• Shaping is the use of reinforcement of successive approximations of a desired behaviour.
• Specifically, when using a shaping technique, each approximate desired behaviour that is demonstrated is reinforced, while behaviours that are not approximations of the desired behaviour are not reinforced
Incrementally build towards a behaviour (step by step)
You know the end point, but lots of actions need to happen in order to get there, so reward each step in order
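A minimal sketch of the step-by-step idea above, assuming behaviour can be scored as a number (higher = closer to the desired behaviour). The function name `shape`, the scores and the criterion values are hypothetical and only illustrate reinforcing successive approximations.

```python
def shape(attempts, criteria):
    """Reinforce successive approximations of a target behaviour.

    attempts: behaviour scores in the order they occur (hypothetical scale).
    criteria: increasing thresholds, one per shaping step (hypothetical).
    Returns a list of (attempt, reinforced, step) tuples.
    """
    step = 0
    log = []
    for attempt in attempts:
        # Only attempts that meet the current approximation are reinforced.
        reinforced = attempt >= criteria[step]
        log.append((attempt, reinforced, step))
        # Once the current approximation is met, require the next one.
        if reinforced and step < len(criteria) - 1:
            step += 1
    return log


log = shape(attempts=[1, 3, 2, 5, 4, 9], criteria=[2, 4, 8])
```

In this run the attempt scored 2 is not reinforced, because by then the criterion has already moved up to 4: an approximation that earned a reward early on stops earning one as the requirement advances.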
What is positive reinforcement?
Something is added to the environment that causes the behaviour to increase in frequency ∴ that something must have been pleasant
What is positive punishment?
Something is added to the environment, that causes the behaviour to decrease in frequency ∴ that something must have been unpleasant
What is negative punishment?
Something is removed from the environment, that causes the behaviour to decrease in frequency ∴ that something must have been pleasant
AKA Response Cost or Omission Training – but regardless of name – they all involve the removal of a stimulus, following the targeted behaviour, that the person values/desires/enjoys.
To facilitate the process they may be reinforced for exhibiting another more desirable behaviour (DRO: Differential Reinforcement of Other behaviour)
If the person makes the “wrong” response then they will lose something of value
So they should learn to inhibit or omit the “wrong” behaviour (omission learning).
What is negative reinforcement?
Something is removed from the environment, that causes the behaviour to increase in frequency ∴ that something must have been unpleasant
Something negative being removed from the environment increases the behaviour that allowed us to avoid the negative (e.g. applying sunscreen)
How do different types of reinforcement interact with different emotions?
- Happiness: Positive Reinforcement; Application of Pleasant Stimulus
- Anger: Omission Learning; Removal of a Pleasant Stimulus = Negative punishment
- Relief: Negative Reinforcement; Removal of an Unpleasant Stimulus
- Fear: Positive Punishment; Application of Unpleasant Stimulus
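The four consequence types differ on only two dimensions: whether a stimulus is added or removed, and whether the behaviour increases or decreases. A small sketch of that 2 × 2 lookup follows; the function name `classify_consequence` and its string arguments are hypothetical.

```python
def classify_consequence(stimulus_change, behaviour_change):
    """Name the operant procedure from two observations.

    stimulus_change: "added" or "removed" (what followed the response).
    behaviour_change: "increase" or "decrease" (in response frequency).
    """
    # Positive/negative describes the stimulus, not whether it is "good".
    sign = {"added": "positive", "removed": "negative"}[stimulus_change]
    # Reinforcement/punishment describes the effect on the behaviour.
    kind = {"increase": "reinforcement", "decrease": "punishment"}[behaviour_change]
    return f"{sign} {kind}"


# Sunscreen example: the unpleasant outcome is removed, the behaviour increases.
sunscreen = classify_consequence("removed", "increase")
```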
What is a continuous schedule of reinforcement?
- Behaviour is followed by a consequence each time it occurs
- Excellent for getting a new behaviour started
- Behaviour stops quickly when reinforcement stops
- Schedule of choice for punishment and time-out
What is thinning intermittent reinforcement?
• One of two methods commonly used:
―Gradually increasing the response ratio or the duration of the time interval between Response –> Reinforcer
Response ratio = how many times the subject has to respond to get a reward
Behaviour can be changed to reach the response ratio we want
Time interval = no matter how many times they respond, the reward only comes at the set time
— Providing instructions such as rules, directions and signs to communicate the schedule of reinforcement.
i.e. give a cue/signal that Reinforcement is on its way
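As a rough illustration of the first method, the sketch below builds a sequence of response ratios that rises gradually from an easy ratio to the target ratio. The function name `thin_ratio`, the additive `step` and the example numbers are assumptions made for illustration, not a prescribed thinning procedure.

```python
def thin_ratio(start, target, step):
    """Return a gradually increasing list of response ratios.

    start: initial responses required per reinforcer (e.g. 1 = continuous).
    target: the final, leaner ratio we want to reach.
    step: how much the requirement grows each stage (hypothetical choice).
    """
    if step <= 0:
        raise ValueError("step must be positive")
    ratios = [start]
    while ratios[-1] < target:
        # Never overshoot the target ratio.
        ratios.append(min(target, ratios[-1] + step))
    return ratios


stages = thin_ratio(start=1, target=10, step=3)
```

Here `stages` is `[1, 4, 7, 10]`: the requirement moves from one response per reward to ten responses per reward in small increments rather than in one jump.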
What are 4 partial schedules used for resistance to extinction?
• Ratio Schedules: (Responses/actions)
• e.g. after the pre-determined number of responses has been made –> outcome
• Interval Schedules: (Time lapse)
• e.g. the 1st response after the specified time has elapsed –> outcome
• Fixed Schedules: (set rate/time)
• e.g. every 5 responses (ratio) or every 5 mins (interval) –> outcome
• i.e. a predictable schedule
• Variable Schedules: (random average)
• e.g. every 2 - 5 responses (ratio) or every 2 - 5 mins (interval) –> outcome
• i.e. an unpredictable schedule
Combinations: • Fixed-Ratio • Variable-Ratio • Fixed-Interval • Variable-Interval
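The four combinations can be sketched as two small classes, one counting responses and one tracking time, each of which becomes "fixed" or "variable" depending on whether the requirement is constant or redrawn from a range. The class names, the `seed` parameter and the choice to time each interval from the last reinforcer are assumptions made for this illustration.

```python
import random


class RatioSchedule:
    """Reinforce after a required number of responses."""

    def __init__(self, low, high=None, seed=0):
        # high omitted -> fixed ratio; otherwise the requirement is redrawn
        # from low..high after every reinforcer (variable ratio).
        self.low = low
        self.high = low if high is None else high
        self.rng = random.Random(seed)
        self.count = 0
        self.required = self.rng.randint(self.low, self.high)

    def respond(self):
        """Register one response; return True if it is reinforced."""
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self.rng.randint(self.low, self.high)
            return True
        return False


class IntervalSchedule:
    """Reinforce the first response after a required time has elapsed."""

    def __init__(self, low, high=None, seed=0):
        # high omitted -> fixed interval; otherwise the wait is redrawn
        # from low..high after every reinforcer (variable interval).
        self.low = low
        self.high = low if high is None else high
        self.rng = random.Random(seed)
        self.last = 0.0  # time of the last reinforcer (assumed start = 0)
        self.wait = self.rng.uniform(self.low, self.high)

    def respond(self, now):
        """Register a response at time `now`; return True if reinforced."""
        if now - self.last >= self.wait:
            self.last = now
            self.wait = self.rng.uniform(self.low, self.high)
            return True
        return False


fr5 = RatioSchedule(5)                      # fixed ratio: every 5 responses
fr_result = [fr5.respond() for _ in range(10)]

fi5 = IntervalSchedule(5)                   # fixed interval: every 5 minutes
fi_result = [fi5.respond(t) for t in (1, 4, 5, 6, 9, 10.5)]

vr = RatioSchedule(2, 5, seed=1)            # variable ratio: every 2 - 5 responses
vr_rewards = sum(vr.respond() for _ in range(100))
```

On the fixed-ratio schedule only the 5th and 10th responses are reinforced. On the fixed-interval schedule the responses at 1, 4, 6 and 9 minutes earn nothing, which is why extra responding does not make the reward come any faster.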
What is a fixed ratio schedule?
Same ratio continues all throughout
• Behaviour/reinforcement (100/1 or 15/1)
• Response Rate: (Higher ratio = faster responding)
• Behaviour: tend to work hard (ratio run); receive reinforcement; then a brief post-reinforcement pause (PRP); then work hard again
Resistance to Extinction: Low
• High rates of responding –> pause after receiving reward (PRP) –> then onwards for the next reward
• Make the number of responses too high –> ratio strain
a disruption in responding due to an overly demanding response requirement
Note also that the closer they get to their target number of responses, the more the rate of bar pressing increases; this is known as a ratio run
What is ratio strain?
― A result of abrupt increases in ratio requirements
― Characteristics include: avoidance, aggression, and unpredictable pauses in responding
― Ratio strain is the point of too much energy expended in exchange for too little in return.
What is the goal gradient hypothesis?
Animals in traversing a maze will move at a progressively more rapid pace as the goal is approached
What is a variable ratio schedule?
• Behaviour/Reinforcement: random/unpredictable number of responses between reinforcements
• Response Rate: Fast
• Behaviour: Work hard and at steady rate
Resistance to Extinction: High
How would the rate of responding be affected if rewards came on a temporal basis?
• Reduce it – no point working if it's not making the rewards come any faster
What is a fixed interval schedule?
• Behaviour/Reinforcement: First response after a fixed amount of time has elapsed
• Response: Scalloped
• Behaviour: High before Reinforcement/Long pause after
• Lowest rate of responding
Resistance to Extinction: Low
What is a variable interval schedule?
• Behaviour/Reinforcement: First response after a time period that varies around an “average” has elapsed
• Response Rate: Slow
• Behaviour: Work at steady rate
Resistance to Extinction: High
What are 4 procedures of differential reinforcement?
Four procedures that incorporate reinforcement to address and treat disruptive behaviours are:
- Differential Reinforcement of Other behaviour (DRO)
- Differential Reinforcement of Low rates of responding (DRL)
- Differential Reinforcement of Incompatible behaviour (DRI)
- Differential Reinforcement of Alternative behaviour (DRA); the alternative is not necessarily incompatible
What is differential reinforcement of other behaviours?
• Differential Reinforcement of Other behaviours (DRO)
―In this case the subject periodically receives the positive reinforcer provided it is engaged in other behaviours.
Not punishing the thing you want to get rid of, rewarding other behaviours
If the target behaviour has not occurred during an interval, the subject is rewarded
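A minimal sketch of the interval rule, assuming fixed back-to-back intervals that do not reset when the target behaviour occurs (real DRO procedures may differ on this). The function name `dro_rewards` and the example times are hypothetical.

```python
def dro_rewards(target_times, interval, session_length):
    """Return the end times of the intervals that earn a reinforcer.

    target_times: times at which the target (unwanted) behaviour occurred.
    interval: length of each DRO interval.
    session_length: total length of the session, in the same time units.
    """
    rewards = []
    start = 0
    while start + interval <= session_length:
        end = start + interval
        # Reward only if the target behaviour was absent for the whole interval.
        if not any(start <= t < end for t in target_times):
            rewards.append(end)
        start = end
    return rewards


rewards = dro_rewards(target_times=[3, 12], interval=5, session_length=20)
```

With the target behaviour occurring at minutes 3 and 12, only the intervals ending at 10 and 20 are rewarded; nothing is taken away after the other two intervals, the reward is simply withheld.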
What are establishing operations?
Establishing Operations (EOs) are factors that affect the effectiveness of reinforcers
• The intensity, amount, and type (i.e. quality) of reinforcer determines its effectiveness
What is the reinforcer magnitude?
The larger the reward, the faster the acquisition of learning
Reinforcer must be of sufficient magnitude for it to be worth making the response
Needs to be a reasonable relationship between the effort required and the size of the reward
• Reward magnitude is often a matter of “being in the eyes of the beholder”.
• It need not be the absolute size that is important but how it is perceived.
• E.g., two groups of rats are reinforced with the same amount of food. Rats run faster for that same amount when it is broken up into more pieces
• Similar studies show that many small reinforcers are generally more effective than a few large ones
• Magnitude is relative to the person.
What are contrast effects?
All a matter of what you have been used to
• Shifting the value of the reward in “mid-stream” is also effective in changing behaviour; this is known as Contrast Effects
• Reinforcer magnitude is all a matter of relativity
• Contrast effects are obtained when the quality of the reinforcer is switched as well
Responding is influenced by the reinforcement characteristics that an organism has come to expect in its past