Week 4 Flashcards
Classical Conditioning
Ivan Pavlov
• Learning via association
Classical (Pavlovian) conditioning relies on the
formation of reflexive associations between stimuli,
resulting in involuntary responses
Operant Conditioning
B.F. Skinner
• Learning via reinforcement
Operant conditioning (sometimes called instrumental
conditioning) relies on the consequences of past
actions influencing future behaviour, resulting in
an increase or decrease in voluntary behaviours
Operant Conditioning: Principle
Operates on a simple but powerful principle:
– Consequences lead to change in voluntary
behaviours
– A behaviour that results in a reward tends
to be repeated or become more frequent.
– A behaviour that results in a punishment tends
to be avoided or become less frequent.
A brief history: Thorndike
Edward Thorndike put cats inside puzzle boxes
• Cats could escape the box by pulling a string, stepping on a platform, and turning a latch on the door (etc.)
• Cats get quicker at this with experience
The Law of Effect
‘Of several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected with the situation, so that, when it recurs, they will be more likely to recur.’ – EL Thorndike, 1898.
The tendency to perform an action is strengthened if the action is rewarded and weakened if it is not
Skinner Box
Small box used to conduct operant conditioning experiments on animals
Teaching New Behaviour - example
But how to get the rat to press the lever in the first
place?
• First option: wait…
• Second option: reinforce any behaviour that could
lead to desired behaviour (i.e., shaping)
– selective reinforcement of behaviour resembling the
desired target behaviour (e.g. coming within 5 cm of the
lever, touching the lever).
What is Real World Learning?
• No experimenter/trainer
• Real world reinforcement (e.g. in foraging)
• Animal adapts behaviourally to environmental feedback
Superstitious Behaviour
Skinner (1948) found that if you reward pigeons regardless of what they are doing (e.g. by giving a reward every 15 seconds), they end up repeatedly performing distinct behaviours between food presentations
• Self-perpetuating, because performing the previously reinforced behaviour more often seems to increase the chance of being rewarded again
• Akin to ‘superstitious behaviour’: the animals act as if the things they do cause the rewards (even though there is no causation)
• Random (non-contingent) reinforcement still shapes behaviour
Not just pigeons
– Athletes with warm up rituals
– Lucky clothes
– Other lucky charms
• Even if there is no true association between a
behaviour and an outcome, we
expect and try to find links
– Pedestrian crossing: pushing the traffic-light button repeatedly
Techniques for Teaching New Behaviour
Shaping (scan & capture)
• Baiting
• Mimicking
• Sculpting
• Instruction (language)
– Simply imagining the behaviour-reinforcement pairing
can be enough to “repeat” the behaviour in reality
Teaching New Behaviour: Chaining
Many behaviours are made up of smaller behaviours
• Acquiring a behaviour is easier if done in bits and
pieces
• Can be done forward or backward
• To shape a behaviour, it’s often best to start with the
last behaviour in the chain
– Backward chaining
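The order of training in forward and backward chaining can be sketched in a few lines of Python. This is only an illustration: the function name `training_stages` and the three-step chain are hypothetical and not part of the course material.

```python
def training_stages(chain: list[str], backward: bool = True) -> list[list[str]]:
    """Return the part of the chain practised at each training stage.

    Backward chaining starts with the last behaviour and adds earlier
    ones step by step; forward chaining starts with the first behaviour.
    """
    n = len(chain)
    if backward:
        return [chain[n - k:] for k in range(1, n + 1)]
    return [chain[:k] for k in range(1, n + 1)]


# Hypothetical three-step chain, chosen only for illustration.
chain = ["go to lever", "press lever", "collect food"]

for stage in training_stages(chain, backward=True):
    print(" -> ".join(stage))
```

In the backward version every stage ends on the final step of the chain, so each practice run finishes at the point where the reinforcer is delivered.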
Reinforcers and Punishers
The consequence of one’s actions this time (after a
response, R, follows a stimulus, S) determines the
likelihood of that behaviour happening again when
the next instance of the stimulus occurs…
• Reinforcer: increases behaviour
• Punisher: decreases behaviour
Types of Reinforcers and Punishers
• Positive (add): the person/animal receives something
– A shock
– Ice cream
• Negative (subtract): something is taken away from the person/animal
– Night off from chores
– Removing TV privileges
Reinforcement vs Punishment:
• Reinforcement increases behaviour (and isn’t necessarily rewarding in itself)
• Punishment decreases behaviour (and isn’t necessarily irritating in itself)
Positive vs Negative:
• Positive adds something (and isn’t necessarily good)
• Negative takes away something (and isn’t necessarily bad)
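The two distinctions (add vs take away, increase vs decrease) combine into four procedures. A minimal Python sketch of that lookup follows; the function name `classify_consequence` and its boolean parameters are hypothetical, and the examples in the comments are taken from other cards in this deck.

```python
def classify_consequence(stimulus_added: bool, behaviour_increases: bool) -> str:
    """Name the operant procedure from its two defining features.

    stimulus_added: True if something is added, False if something is taken away.
    behaviour_increases: True if the behaviour becomes more frequent afterwards.
    """
    sign = "positive" if stimulus_added else "negative"
    effect = "reinforcement" if behaviour_increases else "punishment"
    return f"{sign} {effect}"


print(classify_consequence(True, True))    # e.g. gold stars for good behaviour
print(classify_consequence(True, False))   # e.g. getting told off
print(classify_consequence(False, True))   # e.g. a night off from chores
print(classify_consequence(False, False))  # e.g. losing your license
```

Note that the function never asks whether the consequence is pleasant: only what was added or removed, and what happened to the behaviour.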
Positive Reinforcement:
Adds something to increase a behaviour
• Finish your homework and you can have an ice-cream
• Gold stars for good behaviour
Bridging
In animal training, it is useful to teach an association
between a stimulus that can be delivered
immediately and a subsequent reward
– Short phoneme
– Whistle
– Clicker
• The stimulus hence comes to signal the arrival of
the reward (it is a conditioned reinforcer) and
effectively bridges the time between the
behaviour and the primary reinforcement
– Combines classical (associative) and operant conditioning!
Positive Punishment:
Adds something to decrease a behaviour
• Anti-barking collars
• Getting told off
Negative Reinforcement:
Removes something to increase a behaviour
• Removes discomfort (e.g. heat)
• Giving student a night off from chores after good marks
Negative Punishment:
Removes something to decrease a behaviour
• Losing your license
• Being put in time out
Schedules of Reinforcement
• Continuous (CRF): Each response reinforced
• Partial (PRF): Only some responses reinforced
Ratio vs. Interval
Ratio: reinforcement depends on the number of responses made
Interval: reinforcement depends on the time elapsed since the last reinforcement
Schedules of Reinforcement
Partial (PRF): Only some responses reinforced
– Fixed ratio (FR): Every Nth action (e.g., pay on commission)
• 100 responses: rewarded for 10, 20, 30, 40, 50, 60, 70, 80, 90, 100
– Variable ratio (VR): On average every Nth action
(e.g., gambling)
• 100 responses: rewarded for 3, 7, 28, 34, 66, 67, 70, 84, 91, 95
– Fixed interval (FI): First behaviour after N seconds have elapsed (e.g., waiting for a bus)
– Variable interval (VI): First behaviour after a varying delay that averages N seconds (e.g., checking email)
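The schedules above can be sketched as short Python functions that report which responses earn a reward. This is an illustration under stated assumptions, not a standard implementation: the function names are hypothetical, and the variable-ratio version assumes each response is rewarded independently with probability 1/N.

```python
import random


def fixed_ratio(n_responses: int, ratio: int) -> list[int]:
    """Return the response numbers that are rewarded on an FR schedule."""
    return [r for r in range(1, n_responses + 1) if r % ratio == 0]


def variable_ratio(n_responses: int, mean_ratio: int, seed: int = 0) -> list[int]:
    """Return rewarded response numbers on a VR schedule.

    Assumption: each response is rewarded independently with probability
    1 / mean_ratio, so rewards arrive on average every mean_ratio-th response.
    """
    rng = random.Random(seed)
    return [r for r in range(1, n_responses + 1) if rng.random() < 1 / mean_ratio]


def fixed_interval(response_times: list[float], interval: float) -> list[float]:
    """Return the response times that are rewarded on an FI schedule.

    The first response made at least `interval` seconds after the previous
    reward (or after the start) is rewarded; earlier responses earn nothing.
    """
    rewarded = []
    last_reward = 0.0
    for t in sorted(response_times):
        if t - last_reward >= interval:
            rewarded.append(t)
            last_reward = t
    return rewarded


print(fixed_ratio(100, 10))      # every 10th response
print(variable_ratio(100, 10))   # irregular response numbers; depends on the seed
print(fixed_interval([5, 12, 31, 33, 58, 61, 95], 30))  # [31, 61, 95]
```

On the ratio schedules, responding more often brings rewards sooner. On the interval schedule, the responses at 33 and 58 seconds earn nothing because too little time has passed since the last reward.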
Schedules of Reinforcement:
Ratio vs Interval
Ratio schedules are more efficient
– Because the reinforcement directly depends on
how many times the animal produces the action
• VR schedule is most resistant to extinction
– Because it teaches persistence (e.g. gambling)
Why is gambling so addictive?
What if wins were on a fixed ratio schedule?
– E.g., you lose $10 on each of the first 9 bets, but win $80 on every 10th
• You predictably lose $10 for every 10 bets
– No one would ever play!
• But because of the unpredictability of variable ratio:
– Sometimes you get on a lucky run
• E.g., you win $80 twice in a row or three times out of 10
• And this is very reinforcing!
– But you still lose in the long run
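The arithmetic behind this card can be checked with a short Python sketch. The $10 and $80 figures come from the card; the variable-ratio model (each bet wins independently with probability 1/10) and the function names are assumptions made for illustration.

```python
import random

STAKE_LOST = 10   # dollars lost on a losing bet (from the card)
PAYOUT = 80       # dollars won on a winning bet (from the card)


def fixed_ratio_net(n_bets: int) -> int:
    """Net result when every 10th bet wins and all others lose."""
    return sum(PAYOUT if bet % 10 == 0 else -STAKE_LOST for bet in range(1, n_bets + 1))


def variable_ratio_net(n_bets: int, rng: random.Random) -> int:
    """Net result when each bet wins with probability 1/10 (assumed model)."""
    return sum(PAYOUT if rng.random() < 0.1 else -STAKE_LOST for _ in range(n_bets))


print(fixed_ratio_net(10))    # -10: nine $10 losses and one $80 win
print(fixed_ratio_net(100))   # -100

# Expected value per bet under the assumed model: 1 win and 9 losses per 10 bets.
expected_per_bet = (1 * PAYOUT - 9 * STAKE_LOST) / 10
print(expected_per_bet)       # -1.0, i.e. still $10 lost per 10 bets on average

rng = random.Random(1)
blocks = [variable_ratio_net(10, rng) for _ in range(1000)]
print(max(blocks))                # best block of 10 bets (a lucky run)
print(sum(blocks) / len(blocks))  # long-run average per block of 10 bets
```

Both versions lose the same amount on average. The difference is that the variable version occasionally produces a block of 10 bets that ends in profit, which the fixed version never does.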
How To Punish Effectively
- No escape
- As intense as possible (within limits)
- Continuous schedule
- No delay
- Over a short period of time
- No subsequent reinforcement
- Reinforce incompatible, appropriate behaviour concurrently
- Watch for side effects
– Changes in other behaviours
– Aggression
– Fear
– Modelling of violence
– Learned helplessness
Reward Variables
It is not only the schedule that affects conditioning
• Other variables also matter:
– Drive
– Size
– Delay
Reward Variables: Drive
Reinforcement depends on how much the organism wants the reinforcer
• Hungry organism vs sated organism
What motivates your dog?
Individual differences
• E.g. Identify the reward that works best
– Give choices (food, toys etc.) and see what they pick
– Only ever use most desired item for training
• E.g. Identify which dog is likely to make a good sniffer dog
– Throw target in grass and hold dog for 5 sec, then 15 sec, then 30 sec: is dog still searching?
– Will dog search for 60 seconds if target removed?
Reward Variables: Size
• In operant conditioning, size does matter
• Animals in a Skinner box learn faster if they get more food pellets
• BUT: diminishing returns
• Acquisition: faster with large/desired reward
• Extinction: faster with large/desired reward
Reward Variables: Delay
Increasing the delay reduces the learning effect
Big Problem:
– Short term reinforcement vs. long term punishment
Reinforcers work better when
– Drive/Desire is higher
– Reinforcer is larger (but this effect tapers off)
– Reinforcer is immediate
The Three Term Contingency
i.e. not every behaviour is appropriate in every situation
- The discriminative stimulus
– Sets the occasion
- The operant response
– The behaviour
- The outcome (reinforcer/punisher) that follows
– The consequence
Skinner argued that these three terms are the
basis of operant conditioning
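One way to picture the three terms is as a small record with one field per term. The Python sketch below is illustrative only: the class name `Contingency`, the helper `outcome`, and the light cue in the example are hypothetical and not taken from the course material.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Contingency:
    discriminative_stimulus: str  # sets the occasion
    operant_response: str         # the behaviour
    consequence: str              # the outcome that follows


# Hypothetical example: the light cue is an assumption for illustration.
lever_press = Contingency(
    discriminative_stimulus="light on",
    operant_response="press lever",
    consequence="food pellet (reinforcer)",
)


def outcome(contingency: Contingency, stimulus: str, response: str) -> Optional[str]:
    """Return the consequence only when both stimulus and response match."""
    if (stimulus == contingency.discriminative_stimulus
            and response == contingency.operant_response):
        return contingency.consequence
    return None


print(outcome(lever_press, "light on", "press lever"))   # food pellet (reinforcer)
print(outcome(lever_press, "light off", "press lever"))  # None
```

The second call returns nothing because the same behaviour is not reinforced when the discriminative stimulus is absent.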
Stimulus Control
In the three-term contingency a discriminative stimulus serves to signal the occasion when a particular behaviour will be reinforced/punished
• So learning to discriminate the stimulus is key to operant conditioning
• Stimuli become signals if…
– Predictive of a consequence
• Stimulus Generalization
• Stimulus Discrimination
Stimulus control occurs when your behaviour comes to be under the control of the stimulus
• The behaviour happens when the stimulus is present and doesn’t happen when the stimulus is absent
• Much of our everyday behaviour is under stimulus control
• Can you think of examples of human behaviour that seem as if they are under stimulus control?
– Where it always happens when the stimulus is present and never happens when it’s absent?
Examples:
• Traffic lights
• Typical talking distances
• Social drinkers/smokers
• Social behaviours
• Stimulus Generalization
– Definition: When a response is reinforced in the
presence of one stimulus, there is a general tendency
to respond in the presence of new stimuli that have
similar physical properties or have been associated
with the stimulus
• Loose degree of stimulus control
• Stimulus Discrimination
– Definition: Degree to which different stimuli set the
occasion for particular responses
• Precise degree of stimulus control
• How? Stimulus discrimination is taught by using
discrimination training procedures such as
differential reinforcement