Week 4 Flashcards
Classical Conditioning
Ivan Pavlov
• Learning via association
Classical (Pavlovian) conditioning relies on the
formation of reflexive associations between stimuli,
resulting in involuntary responses
Operant Conditioning
B.F. Skinner
• Learning via reinforcement
Operant conditioning (sometimes called instrumental
conditioning) relies on the consequences of past
actions influencing future behaviour, resulting in
increase or decrease of voluntary behaviours
Operant Conditioning: Principle
Operates on a simple but powerful principle:
– Consequences lead to change in voluntary
behaviours
– A behaviour that results in a reward tends
to be repeated or become more frequent.
– A behaviour that results in a punishment tends
to be avoided or become less frequent.
A brief history: Thorndike
Edward Thorndike put cats inside puzzle boxes
• Cats could escape the box by pulling a string, stepping on a platform, turning a latch on the door, etc.
• Cats get quicker at this with experience
The Law of Effect
‘Of several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected with the situation, so that, when it recurs, they will be more likely to recur.’ – E. L. Thorndike, 1898
• The tendency to perform an action is strengthened if the action is rewarded and weakened if it is not
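One way to picture the Law of Effect is as a running "strength" for each response that rises after a satisfying outcome and falls otherwise. The sketch below is only an illustration under assumptions: the update rule, the `step` parameter and the response names are hypothetical and are not Thorndike's own formulation.

```python
def law_of_effect(strengths, response, satisfied, step=0.2):
    """Return updated response strengths after one trial.

    strengths: dict mapping response name -> strength between 0 and 1
    satisfied: True if the response was followed by a satisfying outcome
    step: hypothetical learning rate (an assumption, not from the source)
    """
    updated = dict(strengths)
    s = updated[response]
    if satisfied:
        updated[response] = s + step * (1.0 - s)  # strengthen toward 1
    else:
        updated[response] = s - step * s          # weaken toward 0
    return updated


# A cat in a puzzle box: pulling the string opens the door, scratching does not.
strengths = {"pull_string": 0.1, "scratch_wall": 0.1}
for _ in range(10):
    strengths = law_of_effect(strengths, "pull_string", satisfied=True)
    strengths = law_of_effect(strengths, "scratch_wall", satisfied=False)
```

After ten trials the rewarded response is much stronger than the unrewarded one, which mirrors the cats getting quicker with experience.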
Skinner Box
Small chamber used to study operant conditioning in animals
Teaching New Behaviour - example
But how to get the rat to press the lever in the first
place?
• First option: wait…
• Second option: reinforce any behaviour that could
lead to desired behaviour (i.e., shaping)
– selective reinforcement of behaviour resembling the
desired target behaviour (e.g. going within 5 cm of the
lever, touching the lever).
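Shaping can be sketched as a reward criterion that tightens each time the animal meets it. This is a minimal illustration, assuming a made-up `tighten` factor and invented trial distances; only the 5 cm starting criterion comes from the card.

```python
def shape(distances_cm, start_criterion_cm=5.0, tighten=0.5):
    """Reinforce successive approximations to touching the lever.

    distances_cm: distance from the lever observed on each trial
    start_criterion_cm: distance that first earns a reward (5 cm, as on the card)
    tighten: hypothetical factor shrinking the criterion after each reward
    Returns a list of (distance, rewarded, criterion_used) tuples.
    """
    criterion = start_criterion_cm
    log = []
    for d in distances_cm:
        rewarded = d <= criterion
        log.append((d, rewarded, criterion))
        if rewarded:
            criterion *= tighten  # demand a closer approximation next time
    return log


# Invented example data: the rat drifts closer until it touches the lever (0 cm).
trials = shape([12.0, 4.0, 4.0, 2.0, 0.0])
rewarded_flags = [rewarded for _, rewarded, _ in trials]
```

Note that the second 4 cm approach is no longer rewarded: once a behaviour has been reinforced, only a closer approximation earns the next reward.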
What is Real World Learning?
No experimenter/trainer
• Real world reinforcement (e.g. in foraging)
• Animal adapts behaviourally to environmental feedback
Superstitious Behaviour
Skinner (1948) found that if you reward pigeons regardless of what they are doing (e.g. by giving a reward every 15 seconds), they end up repeatedly performing distinct behaviours between food presentations
• Self-perpetuating because the increased display of what was reinforced before seems to increase the chance of being rewarded again
• Akin to ‘superstitious behaviour’ – reflecting apparent belief that things they do cause the random rewards (even if there is no causation)
• Random reinforcement shapes behaviour
Not just pigeons
– Athletes with warm up rituals
– Lucky clothes
– Other lucky charms
• Even if there is no true association between a
behaviour and an outcome, we
expect and try to find links
– Pressing the button at a pedestrian crossing repeatedly
Techniques for Teaching New Behaviour
Shaping (scan & capture)
• Baiting
• Mimicking
• Sculpting
• Instruction (language)
– Simply imagining the behaviour-reinforcement pairing
can be enough to “repeat” the behaviour in reality
Teaching New Behaviour: Chaining
Many behaviours are made up of smaller behaviours
• Acquiring a behaviour is easier if done in bits and
pieces
• Can be done forward or backward
• To shape a behaviour, it’s often best to start with the
last behaviour in the chain
– Backward chaining
Reinforcers and Punishers
The consequence of one’s actions this time (after a
response, R, follows a stimulus, S) determines the
likelihood of that behaviour happening again when
the next instance of the stimulus occurs…
• Reinforcer: increases behaviour
• Punisher: decreases behaviour
Types of Reinforcers and Punishers
• Positive (add) – The person/animal receives something
– A shock
– Ice cream
• Negative (subtract) – Something is taken away from the person/animal
– Night off from chores
– Removing TV privileges
Reinforcement vs Punishment:
• Reinforcement increases behaviour (and isn’t necessarily rewarding in itself)
• Punishment decreases behaviour (and isn’t necessarily irritating in itself)
Positive vs Negative:
• Positive adds something (and isn’t necessarily good)
• Negative takes away something (and isn’t necessarily bad)
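The two distinctions combine into four cases, which can be checked with a small lookup. The function name and argument labels below are hypothetical, chosen only for this sketch; the examples in the comments are the ones used on the cards.

```python
def classify_consequence(stimulus_change, behaviour_change):
    """Name the type of consequence from its two defining features.

    stimulus_change: "added" or "removed" (positive vs negative)
    behaviour_change: "increases" or "decreases" (reinforcement vs punishment)
    """
    sign = {"added": "positive", "removed": "negative"}[stimulus_change]
    kind = {"increases": "reinforcement", "decreases": "punishment"}[behaviour_change]
    return f"{sign} {kind}"


# Ice cream for finishing homework: something added, behaviour increases.
ice_cream = classify_consequence("added", "increases")
# Anti-barking collar: something added, behaviour decreases.
collar = classify_consequence("added", "decreases")
# Night off from chores after good marks: something removed, behaviour increases.
night_off = classify_consequence("removed", "increases")
# Removing TV privileges: something removed, behaviour decreases.
no_tv = classify_consequence("removed", "decreases")
```

The lookup makes the point on the card concrete: "positive" and "negative" describe only whether something is added or removed, not whether the outcome is pleasant.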
Positive Reinforcement:
• Adds something to increase a behaviour
• Finish your homework and you can have an ice-cream
• Gold stars for good behaviour
Bridging
In animal training, it is useful to teach association
between a stimulus that can be delivered
immediately, and a subsequent reward
– Short phoneme
– Whistle
– Clicker
• The stimulus hence comes to signal the arrival of
the reward – it is a conditioned reinforcer - and
effectively bridges the time between the
behaviour and the primary reinforcement
– Combines classical and operant conditioning!
Positive Punishment:
• Adds something to decrease a behaviour
• Anti-barking collars
• Getting told off
Negative Reinforcement:
• Removes something to increase a behaviour
• Removes discomfort (e.g. heat)
• Giving a student a night off from chores after good marks