Week 4 Flashcards

1
Q

Classical Conditioning

A

Ivan Pavlov
• Learning via association
Classical (Pavlovian) conditioning relies on the
formation of reflexive associations between stimuli,
resulting in involuntary responses

2
Q

Operant Conditioning

A

B.F. Skinner
• Learning via reinforcement
Operant conditioning (sometimes called instrumental
conditioning) relies on the consequences of past
actions influencing future behaviour, resulting in
an increase or decrease in voluntary behaviours

3
Q

Operant Conditioning- Principle

A

Operates on a simple but powerful principle:
– Consequences lead to change in voluntary
behaviours
– A behaviour that results in a reward tends
to be repeated or become more frequent.
– A behaviour that results in a punishment tends
to be avoided or become less frequent.

4
Q

A brief history: Thorndike

A
Edward Thorndike put
cats inside puzzle boxes
• Cats could escape the
box by pulling a string,
stepping on a platform,
and turning a latch on
the door (etc.)
• Cats get quicker at this
with experience
5
Q

The Law of Effect

A
‘Of several responses
made to the same
situation, those which are
accompanied or closely
followed by satisfaction
to the animal will, other
things being equal, be
more firmly connected
with the situation, so
that, when it recurs, they
will be more likely to
recur.’ – EL Thorndike,
1898.
The tendency to perform an action is increased if
rewarded, weakened if it is not
6
Q

Skinner Box

A

A small chamber (operant conditioning chamber) used to study operant conditioning in animals

7
Q

Teaching New Behaviour - example

A

But how to get the rat to press the lever in the first
place?
• First option: wait…
• Second option: reinforce any behaviour that could
lead to the desired behaviour (i.e., shaping)
– selective reinforcement of behaviour that increasingly
resembles the target behaviour (e.g. going within 5 cm of the
lever, then touching the lever).

8
Q

What is Real World Learning?

A
No
experimenter/trainer
• Real world
reinforcement (e.g. in
foraging)
• Animal adapts
behaviourally to
environmental feedback
9
Q

Superstitious Behaviour

A

Skinner (1948) found that if you reward pigeons regardless of what they do (e.g. by giving a reward every 15 seconds), they end up repeatedly performing distinct behaviours between food presentations
• Self-perpetuating, because performing more of whatever was reinforced before seems to increase the chance of being rewarded again
• Akin to ‘superstitious behaviour’: reflects an apparent belief that what they do causes the rewards (even though there is no causation)
• Random reinforcement shapes behaviour
Not just pigeons
– Athletes with warm up rituals
– Lucky clothes
– Other lucky charms
• Even if there is no true association between a
behaviour and an outcome, we
expect and try to find links
– E.g. repeatedly pushing the button at a pedestrian crossing

10
Q

Techniques for Teaching New Behaviour

A

• Shaping (scan & capture)
• Baiting
• Mimicking
• Sculpting
• Instruction (language)
– Simply imagining the behaviour-reinforcement pairing
can be enough to “repeat” the behaviour in reality

11
Q

Teaching New Behaviour: Chaining

A

Many behaviours are made up of smaller behaviours
• Acquiring a behaviour is easier if done in bits and
pieces
• Can be done forward or backward
• To shape a behaviour, it’s often best to start with the
last behaviour in the chain
– Backward chaining

12
Q

Reinforcers and Punishers

A

The consequence of one’s actions this time (after a
response, R, follows a stimulus, S) determines the
likelihood of that behaviour happening again when
the next instance of the stimulus occurs…
• Reinforcer: increases behaviour
• Punisher: decreases behaviour

13
Q

Types of Reinforcers and Punishers

A
• Positive (add): the person/animal receives something
– A shock
– Ice cream
• Negative (subtract): something is taken away from the person/animal
– Night off from chores
– Removing TV privileges
14
Q

Reinforcement vs Punishment:

A
•Reinforcement
increases behaviour
(and isn’t necessarily
rewarding in itself)
•Punishment
decreases behaviour
(and isn’t necessarily
irritating in itself)
15
Q

Positive vs Negative:

A

•Positive adds something (and
isn’t necessarily good)
•Negative takes away something
(and isn’t necessarily bad)
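
The two distinctions (positive vs negative, reinforcement vs punishment) combine into a 2×2 grid. A minimal Python sketch of that grid follows; the function name `classify_consequence` and its parameters are illustrative assumptions, not terms from the lecture.

```python
def classify_consequence(stimulus_added: bool, behaviour_increases: bool) -> str:
    """Name the operant procedure (hypothetical helper for illustration)."""
    # Positive = something is added; negative = something is taken away.
    sign = "positive" if stimulus_added else "negative"
    # Reinforcement = behaviour increases; punishment = behaviour decreases.
    kind = "reinforcement" if behaviour_increases else "punishment"
    return f"{sign} {kind}"


# Examples taken from the cards:
print(classify_consequence(True, True))    # ice cream after homework -> positive reinforcement
print(classify_consequence(True, False))   # getting told off -> positive punishment
print(classify_consequence(False, True))   # night off from chores -> negative reinforcement
print(classify_consequence(False, False))  # removing TV privileges -> negative punishment
```

Note that neither argument asks whether the stimulus is pleasant: the label depends only on what is added or removed and on what happens to the behaviour.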

16
Q

Positive Reinforcement:

A
Adds something to
increase a behaviour
•Finish your
homework and you
can have an ice-cream
•Gold stars for good
behaviour
17
Q

Bridging

A

In animal training, it is useful to teach association
between a stimulus that can be delivered
immediately, and a subsequent reward
– Short phoneme
– Whistle
– Clicker
• The stimulus hence comes to signal the arrival of
the reward – it is a conditioned reinforcer - and
effectively bridges the time between the
behaviour and the primary reinforcement
– Combines classical (Pavlovian) and operant conditioning!

18
Q

Positive Punishment

A

•Adds something to
decrease a behaviour
•Anti-barking collars
•Getting told off

19
Q

Negative Reinforcement:

A
Removes something
to increase a
behaviour
• Removes discomfort
(e.g. heat)
•Giving student a
night off from chores
after good marks
20
Q

Negative Punishment:

A
Removes something
to decrease a
behaviour
•Losing your license
•Being put in time out
21
Q

Schedules of Reinforcement

A

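• Continuous (CRF): Each response reinforced
• Partial (PRF): Only some responses reinforced
Ratio vs. Interval
– Ratio: reinforcement depends on the number of responses made
– Interval: reinforcement depends on how much time has passed (the first response after the interval is reinforced)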

22
Q

Schedules of Reinforcement

A

Partial (PRF): Only some responses reinforced
– Fixed ratio (FR): Every Nth action (e.g., pay on commission)
• 100 responses: rewarded for 10, 20, 30, 40, 50, 60, 70, 80, 90, 100
– Variable ratio (VR): On average every Nth action
(e.g., gambling)
• 100 responses: rewarded for 3, 7, 28, 34, 66, 67, 70, 84, 91, 95
– Fixed interval (FI): First behaviour after N seconds (e.g., waiting for a bus)
– Variable interval (VI): On average, first behaviour after N seconds (e.g., checking email)
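
The two ratio schedules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a standard implementation: the function names are hypothetical, and the variable-ratio version reinforces each response with probability 1/N (one common way to get "on average every Nth response").

```python
import random


def fixed_ratio(n_responses: int, n: int) -> list[int]:
    """Response numbers reinforced under FR-n: every n-th response."""
    return [r for r in range(1, n_responses + 1) if r % n == 0]


def variable_ratio(n_responses: int, n: int, rng: random.Random) -> list[int]:
    """Response numbers reinforced under VR-n (assumed model: each
    response pays off with probability 1/n, so on average every n-th)."""
    return [r for r in range(1, n_responses + 1) if rng.random() < 1 / n]


print(fixed_ratio(100, 10))  # [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(variable_ratio(100, 10, random.Random()))  # unpredictable, roughly 10 entries
```

The fixed-ratio output reproduces the 10, 20, ..., 100 pattern on the card, while the variable-ratio output changes from run to run, which is what makes the next reward unpredictable.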

23
Q

Schedules of Reinforcement: Ratio vs Interval

A

Ratio schedules are more efficient
– Because the reinforcement directly depends on
how many times the animal produces the action
• VR schedule is most resistant to extinction
– Because it teaches persistence (e.g. gambling)

24
Q

Why is gambling so addictive?

A

What if wins were on a fixed ratio schedule?
– E.g., you lose $10 on each of the first 9 bets, but win $80 on every 10th
• You predictably lose $10 for every 10 bets (9 × $10 lost, $80 won)
– No one would ever play!
• But because of the unpredictability of variable ratio:
– Sometimes you get on a lucky run
• E.g., you win $80 twice in a row or three times out of 10
• And this is very reinforcing!
– But you still lose in the long run
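
The arithmetic behind this card can be checked with a short Python sketch. The constant and function names are hypothetical; the stake ($10) and payout ($80) come from the card's example.

```python
STAKE = 10   # dollars lost on each losing bet (from the card)
PAYOUT = 80  # dollars won on each winning bet (from the card)


def net_winnings(wins: int, losses: int) -> int:
    """Net dollars after the given number of winning and losing bets."""
    return wins * PAYOUT - losses * STAKE


# Fixed ratio: exactly 1 win in every 10 bets.
print(net_winnings(wins=1, losses=9))       # -10
# A lucky run under a variable ratio: 3 wins in 10 bets.
print(net_winnings(wins=3, losses=7))       # 170
# Long-run average per bet if 1 bet in 10 wins on average.
print(net_winnings(wins=1, losses=9) / 10)  # -1.0
```

The lucky run is strongly positive even though the long-run average per bet stays negative, which is the point of the card.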

25
Q

How To Punish Effectively

A
  1. No escape
  2. As intense as
    possible (within
    limits)
  3. Continuous schedule
  4. No delay
  5. Over a short period
    of time
  6. No subsequent
    reinforcement
  7. Concurrently reinforce incompatible, appropriate behaviour
  8. Watch for side effects
    – Changes in other behaviours
    – Aggression
    – Fear
    – Modelling of violence
    – Learned helplessness
26
Q

Reward Variables

A
It is not only the schedule that affects conditioning
• Other variables also matter:
– Drive
– Size
– Delay
27
Q

Reward Variables: Drive

A
It’s not only the
schedule that affects
conditioning
– Drive
– Size
– Delay
Reinforcement depends
on how much the
organism wants the
reinforcer
• Hungry organism vs
sated organism
28
Q

What motivates your dog?

A
Individual differences
• E.g. Identify the reward that
works best
– Give choices (food, toys etc.) and
see what they pick
– Only ever use most desired item
for training
• E.g. Identify which dog is likely
to make a good sniffer dog
– Throw target in grass and hold
dog for 5 sec, then 15 sec, then
30 sec – is dog still searching?
– Will dog search for 60 seconds if
target removed?
29
Q

Reward Variables: Size

A
It’s not only the
schedule that affects
conditioning
– Drive
– Size
– Delay
• In operant conditioning,
size does matter
• Animals in a Skinner
box learn faster if they
get more food pellets
• BUT: diminishing
returns
• Acquisition: faster with large/desired reward
• Extinction: faster with large/desired reward
30
Q

Reward Variables: Delay

A

Increasing the delay reduces the learning effect
Big problem: behaviours with short-term reinforcement but long-term punishment
– The immediate reinforcer tends to outweigh the delayed punisher

31
Q

Reinforcers work better when

A

• Drive/desire is higher
• Reinforcer is larger (but this effect tapers off)
• Reinforcer is immediate

32
Q

The Three Term Contingency

i.e. not every behaviour is appropriate in every situation

A
  1. The discriminative stimulus
    - Sets the occasion
  2. The operant response
    - The behaviour
  3. The outcome (reinforcer/punisher) that
    follows
    - The consequence
    Skinner argued that these three terms are the
    basis of operant conditioning
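
As a rough illustration, the three terms can be written as a small Python record. The class name, field names and the traffic-light example values are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreeTermContingency:
    discriminative_stimulus: str  # 1. sets the occasion
    operant_response: str         # 2. the behaviour
    consequence: str              # 3. the reinforcer/punisher that follows


def describe(c: ThreeTermContingency) -> str:
    """Spell the contingency out as one sentence."""
    return (f"When {c.discriminative_stimulus}, "
            f"{c.operant_response} is followed by {c.consequence}.")


# Hypothetical example: the same response has a different outcome
# depending on which stimulus is present.
green = ThreeTermContingency("the light is green", "driving on", "safe progress")
red = ThreeTermContingency("the light is red", "driving on", "a fine")
print(describe(green))
print(describe(red))
```

The same response appears in both records; only the discriminative stimulus tells you which consequence to expect.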
33
Q

Stimulus Control

A
In the three-term contingency a discriminative
stimulus serves to signal the occasion when a
particular behaviour will be reinforced/punished
• So learning to discriminate the stimulus is key to
operant conditioning
• Stimuli become signals if…
– Predictive of a consequence
• Stimulus Generalization
• Stimulus Discrimination
• Stimulus control occurs when your
behaviour comes to be under
the control of the stimulus
• The behaviour happens when
the stimulus is present and
doesn’t happen when the
stimulus is absent
• Much of our everyday
behaviour is under stimulus
control
• Can you think of
examples of human
behaviour that seem as
if they are under
stimulus control?
• Where it always
happens when the
stimulus is present and
never happens when
it’s absent?
Examples:
• Traffic lights
• Typical talking distances
• Social drinkers/smokers
• Social behaviours
34
Q

Stimulus Generalization

A

Definition: When a response is reinforced in the
presence of one stimulus there is a general tendency
to respond in the presence of new stimuli that have
similar physical properties or have been associated
with the stimulus
• Loose degree of stimulus control

35
Q

Stimulus Discrimination

A

Definition: Degree to which different stimuli set the
occasion for particular responses
• Precise degree of stimulus control
• How? Stimulus discrimination is taught by using
discrimination training procedures such as
differential reinforcement
