Week 4 Flashcards

Question 1

Q

Classical Conditioning

Answer

A

Ivan Pavlov
• Learning via association
• Classical (Pavlovian) conditioning relies on the formation of reflexive associations between stimuli, resulting in involuntary responses

Question 2

Q

Operant Conditioning

Answer

A

B.F. Skinner
• Learning via reinforcement
• Operant conditioning (sometimes called instrumental conditioning) relies on the consequences of past actions influencing future behaviour, resulting in an increase or decrease in voluntary behaviours

Question 3

Q

Operant Conditioning: Principle

Answer

A

Operates on a simple but powerful principle:
– Consequences lead to changes in voluntary behaviour
– A behaviour that results in a reward tends to be repeated or become more frequent
– A behaviour that results in a punishment tends to be avoided or become less frequent

Question 4

Q

A brief history: Thorndike

Answer

A

Edward Thorndike put cats inside puzzle boxes
• Cats could escape the box by pulling a string, stepping on a platform, turning a latch on the door, etc.
• Cats got quicker at escaping with experience

Question 5

Q

The Law of Effect

Answer

A

‘Of several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected with the situation, so that, when it recurs, they will be more likely to recur.’ – E.L. Thorndike, 1911
The tendency to perform an action is strengthened if it is rewarded and weakened if it is not

Question 6

Q

Skinner Box

Answer

A

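A small chamber (also called an operant conditioning chamber) used to conduct operant conditioning experiments on animals, e.g., a rat pressing a lever for food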

Question 7

Q

Teaching New Behaviour: Example

Answer

A

But how do you get the rat to press the lever in the first place?
• First option: wait…
• Second option: reinforce any behaviour that could lead to the desired behaviour (i.e., shaping)
– Selective reinforcement of behaviours that increasingly resemble the desired target behaviour, i.e., successive approximations (e.g., going within 5 cm of the lever, then touching the lever)
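Shaping by successive approximations can be sketched as a toy simulation. This is a minimal, hypothetical sketch rather than an established model: the function `shape`, the rat's "typical distance" from the lever, the noise level and the rule that moves behaviour toward reinforced positions are all assumptions chosen only to show the logic of reinforcing approximations and then tightening the criterion.

```python
import random

def shape(trials, start_criterion, target, step, seed=0):
    """Toy shaping loop (hypothetical model, not a published procedure).

    Each trial the rat ends up some distance (cm) from the lever. If that
    distance is within the current criterion, the trial is reinforced, the
    rat's typical distance shifts toward the reinforced position, and the
    criterion is tightened by `step` (never below `target`).
    Returns (final criterion, number of reinforced trials).
    """
    rng = random.Random(seed)
    typical = start_criterion * 2.0   # assumed starting behaviour: far away
    criterion = start_criterion
    reinforced = 0
    for _ in range(trials):
        # noisy position around the current typical distance (assumed model)
        distance = max(0.0, rng.gauss(typical, 0.5 * typical + 1.0))
        if distance <= criterion:                       # close enough
            reinforced += 1
            typical = 0.5 * typical + 0.5 * distance    # behaviour shifts
            criterion = max(target, criterion - step)   # raise the bar
    return criterion, reinforced

final_criterion, n = shape(trials=200, start_criterion=30.0, target=0.0, step=2.0)
print(f"criterion now {final_criterion:.0f} cm after {n} reinforced trials")
```

Each reinforced trial tightens the criterion by 2 cm until it reaches the target of 0 cm (touching the lever), so the final criterion always equals max(0, 30 - 2 × reinforced trials).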

Question 8

Q

What is Real World Learning?

Answer

A

• No experimenter/trainer
• Real-world reinforcement (e.g., in foraging)
• Animal adapts behaviourally to environmental feedback

Question 9

Q

Superstitious Behaviour

Answer

A

Skinner (1948) found that if you feed pigeons at set times regardless of what they are doing (e.g., every 15 seconds), each bird ends up repeatedly performing its own distinctive behaviour between food presentations
• Self-perpetuating: the more often a previously reinforced behaviour is performed, the more likely it is to be under way when the next reward arrives, so it gets reinforced again
• Akin to ‘superstitious behaviour’: the birds act as if what they do causes the rewards (even though there is no causal link)
• Reinforcement that is random with respect to behaviour can still shape it
• Not just pigeons:
– Athletes with warm-up rituals
– Lucky clothes
– Other lucky charms
• Even when there is no true association between a behaviour and an outcome, we expect and try to find links
– e.g., repeatedly pushing the button at a pedestrian crossing
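Skinner's superstition procedure can be illustrated with a toy simulation. This is a hypothetical sketch under simple assumptions: the behaviour names, the equal starting strengths of 1.0 and the rule that food adds a fixed `boost` to whatever the bird happens to be doing are illustrative choices, not details from Skinner's paper.

```python
import random

def superstition(behaviours, seconds, food_every, boost=1.0, seed=1):
    """Toy model of response-independent feeding (hypothetical rule).

    Food arrives every `food_every` seconds whatever the bird is doing.
    The behaviour under way at that moment gains `boost` strength, which
    makes it more likely to be under way at the next food delivery.
    """
    rng = random.Random(seed)
    strength = {b: 1.0 for b in behaviours}   # all behaviours start equal
    for t in range(1, seconds + 1):
        # the current behaviour is drawn in proportion to its strength
        current = rng.choices(list(strength), weights=list(strength.values()))[0]
        if t % food_every == 0:      # food is NOT contingent on behaviour
            strength[current] += boost
    return strength

s = superstition(["turn", "peck floor", "head bob", "wing flap"],
                 seconds=600, food_every=15)
print(max(s, key=s.get), s)
```

Because strength feeds back into how often a behaviour is sampled, this is a rich-get-richer process: the food tends to end up unevenly spread, favouring whichever behaviour happened to be rewarded early, even though food never depended on behaviour.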

Question 10

Q

Techniques for Teaching New Behaviour

Answer

A

• Shaping (scan & capture)
• Baiting
• Mimicking
• Sculpting
• Instruction (language)
– Simply imagining the behaviour-reinforcement pairing can be enough to “repeat” the behaviour in reality

Question 11

Q

Teaching New Behaviour: Chaining

Answer

A

• Many behaviours are made up of smaller behaviours
• Acquiring a behaviour is easier if done in bits and pieces
• Chaining can be done forward or backward
• To shape a behaviour, it’s often best to start with the last behaviour in the chain
– Backward chaining
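The difference between forward and backward chaining is just the order in which the pieces are trained. A minimal sketch: `chaining_stages` is a hypothetical helper, and the four-step lever chain is a made-up example.

```python
def chaining_stages(chain, backward=True):
    """List the training stages for forward or backward chaining.

    Forward: teach step 1, then steps 1-2, and so on.
    Backward: teach the last step first (it leads straight to the reward),
    then add the step before it, and so on.
    """
    n = len(chain)
    if backward:
        return [chain[n - k:] for k in range(1, n + 1)]
    return [chain[:k] for k in range(1, n + 1)]

steps = ["approach lever", "rear up", "press lever", "go to food tray"]
for stage in chaining_stages(steps):
    print(" -> ".join(stage))
```

In the backward version every stage ends with the step that is closest to the reward, which is one reason it is often the easier way to build a chain.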

Question 12

Q

Reinforcers and Punishers

Answer

A

The consequence of one’s actions this time (after a response, R, follows a stimulus, S) determines the likelihood of that behaviour happening again when the next instance of the stimulus occurs…
• Reinforcer: increases behaviour
• Punisher: decreases behaviour

Question 13

Q

Types of Reinforcers and Punishers

Answer

A

• Positive (add)
– The person/animal receives something (e.g., a shock, ice cream)
• Negative (subtract)
– Something is taken away from the person/animal (e.g., a night off from chores, removing TV privileges)

Question 14

Q

Reinforcement vs Punishment

Answer

A

• Reinforcement increases behaviour (and isn’t necessarily rewarding in itself)
• Punishment decreases behaviour (and isn’t necessarily irritating in itself)

Question 15

Q

Positive vs Negative

Answer

A

• Positive adds something (and isn’t necessarily good)
• Negative takes away something (and isn’t necessarily bad)

Question 16

Q

Positive Reinforcement

Answer

A

Adds something to increase a behaviour
• “Finish your homework and you can have an ice cream”
• Gold stars for good behaviour

Question 17

Q

Bridging

Answer

A

In animal training, it is useful to teach an association between a stimulus that can be delivered immediately and a subsequent reward
– Short phoneme
– Whistle
– Clicker
• The stimulus thus comes to signal the arrival of the reward: it becomes a conditioned reinforcer and effectively bridges the time between the behaviour and the primary reinforcement
– Combines classical (Pavlovian) and operant conditioning!

Question 18

Q

Positive Punishment

Answer

A

Adds something to decrease a behaviour
• Anti-barking collars
• Getting told off

Question 19

Q

Negative Reinforcement

Answer

A

Removes something to increase a behaviour
• Removing discomfort (e.g., switching on a fan removes heat, so you are more likely to switch it on again)
• Giving a student a night off from chores after good marks

Question 20

Q

Negative Punishment

Answer

A

Removes something to decrease a behaviour
• Losing your licence
• Being put in time-out
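The four procedures form a 2 × 2 grid: positive/negative says whether something is added or removed, and reinforcement/punishment says whether the behaviour increases or decreases. A minimal sketch of that lookup; `classify` and its string arguments are hypothetical names chosen for illustration.

```python
def classify(stimulus, behaviour):
    """Name the operant procedure.

    stimulus:  "added" or "removed"       (positive vs negative)
    behaviour: "increases" or "decreases"  (reinforcement vs punishment)
    Any other value raises KeyError.
    """
    sign = {"added": "Positive", "removed": "Negative"}[stimulus]
    kind = {"increases": "reinforcement", "decreases": "punishment"}[behaviour]
    return f"{sign} {kind}"

print(classify("added", "increases"))    # gold star -> Positive reinforcement
print(classify("added", "decreases"))    # told off -> Positive punishment
print(classify("removed", "increases"))  # night off chores -> Negative reinforcement
print(classify("removed", "decreases"))  # losing licence -> Negative punishment
```

Note that the label depends only on what happens to the stimulus and to the behaviour, not on whether the consequence feels pleasant, which matches the point that positive isn’t necessarily good and negative isn’t necessarily bad.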

Question 21

Q

Schedules of Reinforcement

Answer

A

• Continuous (CRF): every response is reinforced
• Partial (PRF): only some responses are reinforced
• Ratio vs. interval
– Ratio: based on the number of responses
– Interval: based on the time elapsed since the last reinforcement

Question 22

Q

Schedules of Reinforcement

Answer

A

Partial (PRF): Only some responses reinforced
– Fixed ratio (FR): Every Nth response (e.g., pay on commission)
• 100 responses: rewarded for 10, 20, 30, 40, 50, 60, 70, 80, 90, 100
– Variable ratio (VR): On average every Nth response (e.g., gambling)
• 100 responses: rewarded for 3, 7, 28, 34, 66, 67, 70, 84, 91, 95
– Fixed interval (FI): First response after N seconds have passed (e.g., waiting for a bus)
– Variable interval (VI): First response after, on average, N seconds have passed (e.g., checking email)
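The four partial schedules differ only in the rule that decides which response earns a reinforcer. A minimal sketch, assuming each requirement is counted from the last reinforcer and that the variable schedules draw requirements uniformly around the mean N (real experiments often use other distributions). `reinforced_responses` is a hypothetical helper, not a standard library function.

```python
import random

def reinforced_responses(kind, times, n, seed=0):
    """Return the response times (in seconds) that earn a reinforcer.

    kind:  "FR", "VR", "FI" or "VI"
    times: sorted times at which the animal responds
    n:     ratio requirement (responses) or interval length (seconds)
    """
    if kind not in ("FR", "VR", "FI", "VI"):
        raise ValueError(f"unknown schedule: {kind}")
    rng = random.Random(seed)

    def next_requirement():
        # Assumption: variable schedules draw uniformly around the mean n
        if kind == "VR":
            return rng.randint(1, 2 * n - 1)   # mean of n responses
        if kind == "VI":
            return rng.uniform(0, 2 * n)       # mean of n seconds
        return n                               # FR and FI: always n

    rewarded = []
    count = 0            # responses since the last reinforcer
    last_reward = 0.0    # time of the last reinforcer
    requirement = next_requirement()
    for t in times:
        count += 1
        if kind in ("FR", "VR"):
            met = count >= requirement               # ratio: count responses
        else:
            met = t - last_reward >= requirement     # interval: time elapsed
        if met:
            rewarded.append(t)
            count, last_reward = 0, t
            requirement = next_requirement()
    return rewarded

every_second = list(range(1, 101))   # 100 responses, one per second
print(reinforced_responses("FR", every_second, 10))
# [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(reinforced_responses("VR", every_second, 10))
```

The FR call reproduces the fixed-ratio example on the card; the VR call gives an irregular list whose spacing averages about 10 responses.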

Question 23

Q

Schedules of Reinforcement: Ratio vs Interval

Answer

A

• Ratio schedules are more efficient
– Because reinforcement depends directly on how many times the animal performs the action, so responding faster earns rewards faster
• VR schedule is most resistant to extinction
– Because it teaches persistence (e.g., gambling)
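Why responding faster pays off on ratio but not interval schedules can be checked with simple arithmetic. A minimal sketch, assuming an animal that responds at a perfectly steady rate for a 60 s session; the helper `rewards_earned` and the FR 5 / FI 10 values are illustrative assumptions.

```python
def rewards_earned(schedule, n, response_every, session=60):
    """Count reinforcers earned in one session (seconds).

    schedule: "FR" (every n-th response) or "FI" (first response after
    n seconds since the last reinforcer). The animal is assumed to respond
    at a perfectly steady rate, once every `response_every` seconds.
    """
    rewards, count, last = 0, 0, 0
    t = response_every
    while t <= session:
        count += 1
        if schedule == "FR":
            met = count == n
        else:
            met = t - last >= n
        if met:
            rewards += 1
            count, last = 0, t
        t += response_every
    return rewards

for label, gap in (("fast (1 response/s)", 1), ("slow (1 response/4 s)", 4)):
    print(label, "FR5:", rewards_earned("FR", 5, gap),
          "FI10:", rewards_earned("FI", 10, gap))
```

Responding four times as fast earns four times as many reinforcers on FR 5 (12 vs 3) but barely changes the payoff on FI 10 (6 vs 5), because on an interval schedule the clock, not the animal, limits how often reinforcement is available.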

Question 24

Q

Why is gambling so addictive?

Answer

A

• What if wins were on a fixed ratio schedule?
– E.g., you lose $10 on each of the first 9 bets but win $80 on every 10th
• You predictably lose $10 for every 10 bets
– No one would ever play!
• But because of the unpredictability of a variable ratio schedule:
– Sometimes you get on a lucky run
• E.g., you win $80 twice in a row, or three times out of 10
• And this is very reinforcing!
– But you still lose in the long run
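The arithmetic behind this card can be sketched in a few lines. The fixed-ratio function follows the card exactly; the variable-ratio version is an assumption made for illustration: each bet independently wins $80 with probability 1/10 and otherwise loses $10, so the average payoff per bet is 0.1 × 80 - 0.9 × 10 = -1, i.e., the same $10 loss per 10 bets. The function names are hypothetical.

```python
import random

def fixed_ratio_net(bets):
    """Lose $10 on each bet except every 10th, which wins $80 (as on the card)."""
    return sum(80 if i % 10 == 0 else -10 for i in range(1, bets + 1))

def variable_ratio_session(bets, seed):
    """Assumed VR version: each bet wins $80 with probability 1/10, else loses $10."""
    rng = random.Random(seed)
    return [80 if rng.random() < 0.1 else -10 for _ in range(bets)]

outcomes = variable_ratio_session(1000, seed=42)
best_run = max(sum(outcomes[i:i + 10]) for i in range(len(outcomes) - 9))

print("fixed ratio, 100 bets:", fixed_ratio_net(100))   # -100
print("variable ratio, 1000 bets:", sum(outcomes))
print("best stretch of 10 bets:", best_run)
```

Under the fixed schedule every stretch of 10 bets loses exactly $10, while the variable schedule has the same average loss but occasionally produces stretches like three wins in 10 bets (+$170), which is the lucky run the card describes.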

Question 25

Q

How To Punish Effectively

Answer

A

• No escape
• As intense as possible (within limits)
• Continuous schedule
• No delay
• Over a short period of time
• No subsequent reinforcement
• Reinforce incompatible, appropriate behaviour concurrently
• Watch for side effects
– Changes in other behaviours
– Aggression
– Fear
– Modelling of violence
– Learned helplessness

Question 26

Q

Reward Variables

Answer

A

It is not only the schedule that affects conditioning
• Other variables also matter:
– Drive
– Size
– Delay

Question 27

Q

Reward Variables: Drive

Answer

A

Reinforcement depends on how much the organism wants the reinforcer
• Hungry organism vs sated organism

Question 28

Q

What motivates your dog?

Answer

A

Individual differences
• E.g., identify the reward that works best
– Give choices (food, toys, etc.) and see what they pick
– Only ever use the most desired item for training
• E.g., identify which dog is likely to make a good sniffer dog
– Throw a target into the grass and hold the dog for 5 s, then 15 s, then 30 s: is the dog still searching?
– Will the dog search for 60 seconds if the target is removed?
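The preference test on this card boils down to counting choices. A minimal sketch: record which item the dog picks on each free-choice trial and train with the one picked most often. `preferred_reward` is a hypothetical helper, and the item names and picks are made up for illustration.

```python
from collections import Counter

def preferred_reward(choices):
    """Return the item picked most often across free-choice trials.

    Ties go to the item that was picked first.
    """
    item, _count = Counter(choices).most_common(1)[0]
    return item

picks = ["ball", "chicken", "chicken", "squeaky toy", "chicken", "ball"]
print(preferred_reward(picks))  # chicken
```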

Question 29

Q

Reward Variables: Size

Answer

A

• In operant conditioning, size does matter
• Animals in a Skinner box learn faster if they get more food pellets
• BUT: diminishing returns (the benefit of larger rewards tapers off)
• Acquisition: faster with a large/desired reward
• Extinction: faster with a large/desired reward

Question 30

Q

Reward Variables: Delay

Answer

A

Increasing the delay reduces the learning effect
• Big problem: short-term reinforcement vs. long-term punishment (e.g., the immediate pleasure of a cigarette vs. its delayed health costs)
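One common way to model why delayed consequences lose their grip is hyperbolic discounting from the delay-discounting literature, V = A / (1 + kD), where A is the amount, D the delay and k a discount rate. This is an added illustration, not part of the original card: the rate k = 0.1 per day and the amounts are arbitrary assumptions, and `discounted` is a hypothetical helper.

```python
def discounted(amount, delay, k=0.1):
    """Hyperbolic discounting, V = A / (1 + k * D).

    amount: size of the consequence (arbitrary units)
    delay:  time until it arrives (days here)
    k:      discount rate; 0.1 per day is an assumed, illustrative value
    """
    return amount / (1 + k * delay)

pleasure_now = discounted(10, delay=0)     # small immediate reinforcer
harm_later = discounted(100, delay=365)    # large punisher a year away
print(pleasure_now, round(harm_later, 2), pleasure_now > harm_later)
```

With these assumed numbers, a punisher ten times larger than the reward still carries less weight once it is a year away, which is the short-term reinforcement vs. long-term punishment problem in miniature.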

Question 31

Q

Reinforcers work better when

Answer

A

– Drive/desire is higher
– Reinforcer is larger (but this effect tapers off)
– Reinforcer is immediate

Question 32

Q

The Three-Term Contingency (i.e., not every behaviour is appropriate in every situation)

Answer

A

• The discriminative stimulus
– Sets the occasion
• The operant response
– The behaviour
• The outcome (reinforcer/punisher) that follows
– The consequence
Skinner argued that these three terms are the basis of operant conditioning

Question 33

Q

Stimulus Control

Answer

A

In the three-term contingency, a discriminative stimulus signals the occasion when a particular behaviour will be reinforced/punished
• So learning to discriminate the stimulus is key to operant conditioning
• Stimuli become signals if they are predictive of a consequence
• Two related processes: stimulus generalization and stimulus discrimination
Stimulus control occurs when your behaviour comes to be under the control of the stimulus
• The behaviour happens when the stimulus is present and doesn’t happen when the stimulus is absent
• Much of our everyday behaviour is under stimulus control
• Can you think of examples of human behaviour that seem to be under stimulus control, i.e., that always happen when the stimulus is present and never happen when it’s absent?
Examples:
• Traffic lights
• Typical talking distances
• Social drinkers/smokers
• Social behaviours

Question 34

Q

Stimulus Generalization

Answer

A

Definition: When a response is reinforced in the presence of one stimulus, there is a general tendency to respond in the presence of new stimuli that have similar physical properties or have been associated with that stimulus
• Loose degree of stimulus control

Question 35

Q

Stimulus Discrimination

Answer

A

Definition: Degree to which different stimuli set the occasion for particular responses
• Precise degree of stimulus control
• How? Stimulus discrimination is taught using discrimination training procedures such as differential reinforcement
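Differential reinforcement can be sketched with a simple learning rule: responding in the presence of one stimulus (S+, e.g., a green light) is reinforced, and responding in the presence of another (S-, e.g., a red light) is not. This is a hypothetical sketch that assumes a linear-operator update, strength += alpha × (outcome - strength); the rule, the learning rate `alpha` and the starting strength are illustrative assumptions, not part of the card.

```python
def discrimination_training(trials, alpha=0.2, start=0.5):
    """Differential reinforcement with an assumed linear-operator rule.

    Each trial presents S+ (response reinforced, outcome 1.0) and
    S- (response not reinforced, outcome 0.0). Strength moves a
    fraction `alpha` of the way toward each outcome.
    """
    strength = {"S+": start, "S-": start}   # equal at first: generalization
    for _ in range(trials):
        for stim, outcome in (("S+", 1.0), ("S-", 0.0)):
            strength[stim] += alpha * (outcome - strength[stim])
    return strength

print(discrimination_training(0))    # both stimuli control responding equally
print(discrimination_training(20))   # responding now depends on the stimulus
```

Before training both stimuli produce the same response strength (loose stimulus control); after 20 rounds the response is strong to S+ and nearly absent to S- (precise stimulus control).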
