3A: Operant Conditioning and Schedules of Reinforcement Flashcards
Principles of operant conditioning
Involves the strengthening or weakening of a behaviour as a result of the consequences.
Behaviours are voluntary or goal-directed.
The consequence of the behaviour affects future occurrences of that behaviour.
Reinforcers strengthen a behaviour; punishers weaken it.
Law of effect
Behaviour is controlled by its consequences.
Behaviours that result in pleasant consequences will be more likely in the future.
Behaviours that result in unpleasant consequences will be less likely in the future.
Two types of behaviours
Reflexive: involuntary; also called respondent behaviour.
Operant: voluntary, behaviours controlled by consequences.
Operant antecedents
Discriminative stimulus (S^D): signals that a response will be followed by a consequence (reinforcer or punisher), e.g. a light signals that pressing a lever will now produce food.
Positive reinforcement
When behaviour is strengthened because it is followed by a reinforcing or rewarding stimulus
e.g. smile at someone (R) -> person smiles back (S^R)
Negative reinforcement
When behaviour is strengthened because it is followed by the removal of an aversive stimulus
e.g. Take a panadol (R) -> eliminate a headache (S^R)
Escape learning
Learning of a response that allows a subject to escape an aversive stimulus (e.g. switch off an electric shock).
Avoidance learning
Learning of a response that allows a subject to avoid an aversive stimulus.
e.g. learning that when a light comes on the shock is about to start, and that they must press the bar to prevent the shock.
Operant learning
Any procedure or experience in which a behavior becomes stronger or weaker (e.g., more or less likely to occur), depending on its consequences. Also called instrumental learning.
Using positive reinforcement
It is important for learning that an organism wants to take part in activities and learns new skills via desired behaviours, not because it is scared of a consequence or of being punished.
Primary reinforcers
Unlearned
Inherently reinforcing because they satisfy a biological need (e.g. food, water)
Unconditioned reinforcers
Secondary reinforcers
Conditioned reinforcers
Are learnt or become reinforcers after being associated with primary reinforcers (e.g. money)
Natural reinforcers
Any reinforcer that is the spontaneous consequence of a behavior. Also called automatic reinforcer.
e.g. brush your teeth in the morning and morning breath goes away.
Contrived reinforcer
Any reinforcer that is provided by someone for the purpose of changing behavior.
Contingency
The extent to which the behaviour and the consequence are correlated.
The stronger the correlation, the more effective the reinforcer is likely to be.
e.g. if a rat is just as likely to get food without pressing the lever, it won’t continue pressing the lever.
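One common way to put a number on contingency is to compare how likely the consequence is with and without the behaviour. The sketch below is illustrative only; the function name `delta_p` and the example probabilities are assumptions, not part of the card.

```python
def delta_p(p_given_response: float, p_given_no_response: float) -> float:
    """Contingency as the difference between two conditional probabilities.

    p_given_response:    P(consequence | behaviour performed)
    p_given_no_response: P(consequence | behaviour not performed)
    A value near 1 means a strong contingency; near 0 means none.
    """
    for p in (p_given_response, p_given_no_response):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie between 0 and 1")
    return p_given_response - p_given_no_response


# Hypothetical numbers: food usually follows a lever press and is rarely free.
strong = delta_p(0.9, 0.1)
# Food is just as likely without pressing, so pressing has no contingency.
none = delta_p(0.5, 0.5)
print(strong, none)
```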
Contiguity
The closeness in time between a behaviour and its consequence.
In general, the shorter the interval, the faster learning occurs.
If the delay is too long, the organism may not link the consequence to the behaviour.
Some learning can occur despite a delay, however.
Reinforcer characteristics
Some reinforcers work better than others.
The size and the strength of the reinforcer can impact conditioning.
Generally, a large reinforcer will be more effective than a small one. However, frequent small reinforcers may work better than infrequent large ones.
Behaviour characteristics
Certain aspects of a behaviour may be easier to learn than others.
Remember: task difficulty will vary with species, and it is easier to train/teach behaviours that are somewhat aligned with an animal's natural behaviour.
Motivating operations
Anything that changes the effectiveness of a consequence, either increasing or decreasing it.
Establishing operations
Increase the effectiveness of a consequence.
e.g. the greater the food deprivation, the more powerful food is as a reinforcer.
Abolishing operations
Decrease the effectiveness of a consequence.
Drive reduction theory
The event is reinforcing if it is associated with a reduction of a physiological drive (primary reinforcer).
Not comprehensive enough; an unsatisfactory explanation of reinforcers.
Premack’s principle
Helps us understand what can be used as a reinforcer.
High probability behaviour can be used to reinforce a low probability behaviour.
Reinforcers as behaviours and reinforcement as a sequence of two behaviours:
1. Behaviour being reinforced
2. Behaviour that is the reinforcer
e.g. a thirsty rat will run in order to get access to a drink (drinking reinforces running).
Response deprivation hypothesis
The theory of reinforcement that says a behavior is reinforcing to the extent that the organism has been deprived (relative to its baseline frequency) of performing that behavior.
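A rough way to picture "deprived relative to its baseline frequency" is the fraction of baseline activity the organism is currently prevented from performing. This is a simplified sketch, not the formal statement of the hypothesis; `deprivation_level` and the minute values are hypothetical.

```python
def deprivation_level(baseline_minutes: float, allowed_minutes: float) -> float:
    """Fraction of the baseline level that is being withheld.

    0.0 means no deprivation (the behaviour should not act as a reinforcer);
    values closer to 1.0 mean greater deprivation.
    """
    if baseline_minutes <= 0:
        raise ValueError("baseline must be positive")
    shortfall = baseline_minutes - allowed_minutes
    return max(0.0, shortfall / baseline_minutes)


# A rat that would freely run 60 min/day is limited to 15 min/day.
print(deprivation_level(60, 15))  # 0.75
# Access above baseline: no deprivation, so no reinforcing effect is predicted.
print(deprivation_level(60, 90))  # 0.0
```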
Behavioural bliss point approach
An organism that has free access to alternative activities will organise its behaviour to maximise its overall (optimal) reinforcement.
Escape behaviour (theory of avoidance)
Performing a behaviour stops an aversive stimulus, and as such strengthens that behaviour
Avoidance behaviour (theory of avoidance)
Performing a behaviour prevents an aversive stimulus from happening, and as such strengthens that behaviour.
Two-process theory of avoidance
The view that avoidance and punishment involve two procedures: classical conditioning (fear response) and operant learning (negative reinforcement).
Two-process theory of avoidance problem
Avoidance responses can be extremely persistent.
Possible explanation: anxiety conservation hypothesis - avoidance behaviours occur so quickly that there is insufficient exposure to the CS for extinction to take place.
One-process theory
Escape and avoidance behaviours are reinforced by the reduction of the aversive stimulus.
Acquisition
The initial stage of learning - learning a pattern of responding or the association between behaviour and reinforcer.
A gradual process that requires shaping.
Shaping
The reinforcement of closer and closer approximations of the desired behaviour.
Important when subject does not on its own perform the desired behaviour.
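Shaping can be sketched as a criterion that rises each time it is met. The example below is an assumption-laden illustration: responses are reduced to a single number (e.g. seconds a dog holds a sit), and `shape`, the step size and the data are all hypothetical.

```python
def shape(responses, target, start_criterion, step):
    """Reinforce responses that meet the current criterion, then raise it.

    Returns the reinforced responses and the final criterion.
    """
    criterion = start_criterion
    reinforced = []
    for value in responses:
        if value >= criterion:
            reinforced.append(value)
            # Demand a closer approximation next time, capped at the target.
            criterion = min(target, criterion + step)
    return reinforced, criterion


holds = [1, 2, 2, 3, 5, 4, 6, 8, 9, 10]
print(shape(holds, target=10, start_criterion=2, step=2))
# ([2, 5, 6, 8, 10], 10)
```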
Extinction
The gradual weakening and elimination of the response tendency.
Achieved through halting the reinforcement. The time this takes depends on how resistant the subject is to extinction.
If the response ceases, it has been extinguished.
Chaining
Training a person or animal to perform a sequence of behaviours.
Involves breaking down a behaviour or sequence into its components using task analysis. Then reinforcing the performance of each component.
Forward chaining
Reinforce the first component. Once it is performed reliably, add the second component and reinforce performance of the two together until they are completed without hesitation. Then add the third, and so on.
Backward chaining
Starting with the last link in the chain and building towards the first component.
This is often the more efficient and easier approach.
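The difference between the two chaining orders can be shown by listing what is trained at each stage. A minimal sketch, assuming the task analysis has already produced an ordered list of links; the function names and the three-link chain are hypothetical.

```python
def forward_chaining_stages(links):
    """Stage 1 trains the first link; each later stage adds the next link."""
    return [links[:i] for i in range(1, len(links) + 1)]


def backward_chaining_stages(links):
    """Stage 1 trains the last link; each later stage adds the link before it."""
    return [links[-i:] for i in range(1, len(links) + 1)]


chain = ["A", "B", "C"]
print(forward_chaining_stages(chain))   # [['A'], ['A', 'B'], ['A', 'B', 'C']]
print(backward_chaining_stages(chain))  # [['C'], ['B', 'C'], ['A', 'B', 'C']]
```

In both cases the final stage is the whole chain; only the order in which links are added differs.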
Continuous reinforcement (CRF)
Every occurrence of an operant response is followed by a reinforcer.
Intermittent reinforcement
Only some occurrences of the operant response are followed by a reinforcer.
Closely resembles how reinforcement occurs in real life.
Steady-state behaviours emerge once there has been considerable exposure to the schedule.
Fixed ratio schedule (FR)
Reinforcement depends upon a fixed/predictable number of responses emitted since the last reinforcer.
FR4 = every 4th response is followed by reinforcement.
Post-reinforcement pause.
Low resistance to extinction
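The FR rule can be written as a simple counter. This is a sketch only; `fixed_ratio` is a hypothetical helper that lists which responses earn a reinforcer.

```python
def fixed_ratio(n_responses: int, ratio: int) -> list:
    """Return the 1-based response numbers reinforced under an FR schedule."""
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    return [r for r in range(1, n_responses + 1) if r % ratio == 0]


print(fixed_ratio(12, 4))  # [4, 8, 12]
```

With `ratio=1` every response is reinforced, which is continuous reinforcement (CRF).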
Variable ratio schedule (VR)
Reinforcement depends upon a variable/unpredictable number of responses emitted since the last reinforcer.
High and steady response rates
Little or no post-reinforcement pauses.
High resistance to extinction.
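A VR schedule can be sketched by redrawing the response requirement after every reinforcer. The uniform range used here is an assumption chosen so the requirements average out to the schedule value; real procedures may generate them differently, and `variable_ratio` is a hypothetical name.

```python
import random


def variable_ratio(n_responses: int, mean_ratio: int, seed: int = 0) -> list:
    """Return the 1-based response numbers reinforced under a VR schedule.

    Assumption: each requirement is drawn uniformly from
    1 .. (2 * mean_ratio - 1), so its average equals mean_ratio.
    """
    rng = random.Random(seed)
    reinforced = []
    requirement = rng.randint(1, 2 * mean_ratio - 1)
    count = 0
    for r in range(1, n_responses + 1):
        count += 1
        if count >= requirement:
            reinforced.append(r)
            count = 0
            # The next requirement is unpredictable to the subject.
            requirement = rng.randint(1, 2 * mean_ratio - 1)
    return reinforced


print(variable_ratio(40, mean_ratio=5))
```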
Fixed interval schedule (FI)
A response is reinforced when a fixed/predictable period of time has elapsed since the last reinforcer.
Scallop pattern of responding
Post-reinforcement pause
Low resistance to extinction
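Under FI, responses made before the interval has elapsed earn nothing, and the first response afterwards is reinforced. A minimal sketch, assuming times in seconds measured from the start of the session; `fixed_interval` and the response times are hypothetical.

```python
def fixed_interval(response_times, interval):
    """Return the response times that are reinforced under an FI schedule."""
    reinforced = []
    last_reinforcer = 0.0
    for t in sorted(response_times):
        # Only the first response after the interval has elapsed is reinforced.
        if t - last_reinforcer >= interval:
            reinforced.append(t)
            last_reinforcer = t
    return reinforced


print(fixed_interval([5, 20, 31, 40, 62, 65, 95], interval=30))  # [31, 62, 95]
```

Because early responses are never reinforced, this rule is consistent with the pause after each reinforcer described on the card.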
Variable interval schedule (VI)
A response is reinforced when a variable/unpredictable period of time has elapsed since the last reinforcer.
Moderate-steady rate of responding
No post-reinforcement pause
High resistance to extinction
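A VI schedule works like FI except that the required wait changes after every reinforcer. In this sketch the waits are drawn uniformly between 0 and twice the mean, which is an assumption made for illustration; `variable_interval` is a hypothetical name.

```python
import random


def variable_interval(response_times, mean_interval, seed=0):
    """Return the response times that are reinforced under a VI schedule.

    Assumption: each wait is drawn uniformly from 0 .. 2 * mean_interval.
    """
    rng = random.Random(seed)
    reinforced = []
    available_at = rng.uniform(0, 2 * mean_interval)
    for t in sorted(response_times):
        if t >= available_at:
            reinforced.append(t)
            # The subject cannot predict when the next reinforcer is available.
            available_at = t + rng.uniform(0, 2 * mean_interval)
    return reinforced


# One response per second for a minute on a VI 10-second schedule.
print(variable_interval(range(1, 61), mean_interval=10))
```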
Extinction burst
Temporary increase in the frequency and intensity of responding when extinction is first implemented.
Side effects of extinction
Extinction burst
Increase in variability
Emotional behaviour
Aggression
Resurgence
Depression
Resurgence
Reappearance, during extinction, of a previously successful behaviour. Unusual, but similar to regression.
Spontaneous recovery
Reappearance of extinguished response after rest period
Repeated extinction sessions may be required, because the discriminative stimulus is still present and can prompt the response again.
Differential reinforcement of other behaviour (DRO)
Extinguish the target behaviour while simultaneously reinforcing other behaviour.
Reinforcement is still available in the setting, so side effects of extinction are less likely and the desired outcome can be achieved through alternative behaviours.
Duration schedules
A behaviour must be performed continuously for a period of time (either fixed or variable).
Time schedules
A reinforcer is delivered after a period of time (either fixed or variable) regardless of what behaviour occurs.
Progressive schedules
The requirement for the reinforcement increases in a predetermined way following each reinforcement
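As one example of a predetermined increase, a progressive ratio schedule can add a constant number of responses after each reinforcer. The arithmetic rule and the name `progressive_ratio` are assumptions; other progressions (e.g. geometric) are equally possible.

```python
def progressive_ratio(start: int, step: int, n_reinforcers: int) -> list:
    """Responses required for each successive reinforcer (arithmetic increase)."""
    return [start + step * i for i in range(n_reinforcers)]


print(progressive_ratio(start=2, step=2, n_reinforcers=5))  # [2, 4, 6, 8, 10]
```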
Chained schedules
Sequence of simple schedules in a specific order
Multiple schedules
Two or more simple schedules alternate, each signalled by its own stimulus.
Mixed schedules
Two or more simple schedules alternate, with no stimulus signalling which schedule is currently in effect.
Cooperative schedules
Reinforcement is contingent on the behaviour of two or more individuals
Concurrent schedules
Two or more schedules are available at once and the individual or animal must choose between them
The discrimination hypothesis
Extinction takes longer after intermittent reinforcement because it is harder to discriminate between an intermittent schedule and extinction than between continuous reinforcement and extinction.
The frustration hypothesis
Non-reinforcement of a previously reinforced behaviour is frustrating and as frustration is an aversive state, anything that reduces frustration will be reinforcing.
In a partial reinforcement schedule, performing the behaviour becomes a reinforcer for reducing frustration and as such continues during a phase of extinction.
The sequential hypothesis
The idea that the partial reinforcement effect occurs because the sequence of reinforced and nonreinforced behaviors during intermittent reinforcement becomes a signal for responding during extinction.
The response unit hypothesis
The partial reinforcement effect is due to differences in the definition of a behaviour during intermittent and continuous reinforcement.
Matching law
The principle that, given the opportunity to respond on two or more reinforcement schedules, the rate of responding on each schedule will match the reinforcement available on each schedule.
e.g. if an option delivers 10% of the available reinforcement, about 10% of responses will go to that option.
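In its simplest form the matching law predicts that each option's share of responses equals its share of the reinforcement. The sketch below assumes that strict form (real data often deviate from it); `predicted_response_share` and the reinforcement rates are hypothetical.

```python
def predicted_response_share(reinforcement_rates):
    """Share of responses predicted for each option under strict matching."""
    total = sum(reinforcement_rates)
    if total <= 0:
        raise ValueError("at least one option must deliver reinforcement")
    return [rate / total for rate in reinforcement_rates]


# Option A delivers 10 reinforcers per hour, option B delivers 90.
print(predicted_response_share([10, 90]))  # [0.1, 0.9]
```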
Melioration Theory
Distribution of behaviour in a choice situation shifts toward the alternatives that have higher value, regardless of the long-term effect on the overall amount of reinforcement.