Operant conditioning Flashcards

1
Q

What is Thorndike’s law of effect?

A

If, in a specific situation, a response is followed by a reinforcer, the response will become associated with that situation and will be more likely to occur again in that situation.

2
Q

What is operant conditioning?

A

The organism operates on its environment in some way to achieve a desirable outcome
Behaviour becomes associated with its consequences

3
Q

What are 3 key features of Skinner’s operant box?

A

• A response that can be made to obtain a reward (e.g. pressing a lever)
―The response rate is measured by the experimenter
• A dispenser of food or liquid used as a reinforcer (reward)
• Tones or lights that signal when a reward is available or a punishment is pending
―Used in discrimination and generalisation studies

4
Q

What is shaping?

A

• Shaping is the reinforcement of successive approximations of a desired behaviour.
• Specifically, each behaviour that approximates the desired behaviour is reinforced when it is demonstrated, while behaviours that are not approximations of the desired behaviour are not reinforced.
Incrementally build towards a behaviour (step by step)
You know the end point, but many intermediate actions have to happen to get there, so reward each step in order as it appears
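To make the step-by-step logic concrete, here is a minimal Python sketch of a shaping session. The stage names, the rule of advancing after 3 consecutive reinforced trials, and the function `shape` are hypothetical choices for illustration, not a standard protocol: a response is reinforced only if it meets or exceeds the current approximation, and the criterion is raised once the current approximation is reliable.

```python
# Hypothetical sketch of shaping: reinforce successive approximations,
# raising the criterion once the current approximation is reliable.

STAGES = ["faces lever", "approaches lever", "touches lever", "presses lever"]


def shape(responses, stages=STAGES, successes_to_advance=3):
    """Simulate a shaping session.

    responses: one int per trial, the index in `stages` of the closest
    approximation the animal produced (-1 = no approximation at all).
    Returns (log, stage_reached) where log holds (response, reinforced) pairs.
    """
    stage = 0   # index of the approximation currently required
    streak = 0  # consecutive reinforced trials at the current criterion
    log = []
    for r in responses:
        reinforced = r >= stage  # meets or exceeds the current approximation
        log.append((r, reinforced))
        if reinforced:
            streak += 1
            if streak >= successes_to_advance and stage < len(stages) - 1:
                stage += 1  # demand a closer approximation from now on
                streak = 0
        else:
            streak = 0  # non-approximations are simply not reinforced
    return log, stages[stage]


log, reached = shape([0, 0, 0, 1, 0, 1, 1, 2, 2, 2])
print(reached)  # touches lever
```

Here the criterion rises from “faces lever” to “touches lever”; the one trial that falls back below the current criterion goes unreinforced.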

5
Q

What is positive reinforcement?

A

Something is added to the environment that causes the behaviour to increase in frequency ∴ that something must have been pleasant

6
Q

What is positive punishment?

A

Something is added to the environment that causes the behaviour to decrease in frequency ∴ that something must have been unpleasant

7
Q

What is negative punishment?

A

Something is removed from the environment that causes the behaviour to decrease in frequency ∴ that something must have been pleasant
AKA response cost or omission training; whatever the name, all involve the removal, following the targeted behaviour, of a stimulus that the person values/desires/enjoys.
To facilitate the process, the person may be reinforced for exhibiting another, more desirable behaviour (DRO: Differential Reinforcement of Other behaviour).
If the person makes the “wrong” response, they lose something of value.
So they should learn to inhibit or omit the “wrong” behaviour (omission learning).

8
Q

What is negative reinforcement?

A

Something is removed from the environment that causes the behaviour to increase in frequency ∴ that something must have been unpleasant
Something negative is removed (or avoided), which increases the behaviour that allowed us to escape or avoid it (e.g. applying sunscreen to avoid sunburn)
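The four procedures in these cards form a 2 × 2 grid: whether a stimulus is added or removed after the response, and whether the behaviour then increases or decreases. A small Python lookup, with a hypothetical function name `classify`, makes the grid explicit:

```python
def classify(stimulus, behaviour):
    """Name the operant procedure (hypothetical helper).

    stimulus: "added" or "removed" (what happens after the response)
    behaviour: "increase" or "decrease" (what happens to its frequency)
    """
    table = {
        ("added", "increase"): "positive reinforcement",
        ("added", "decrease"): "positive punishment",
        ("removed", "increase"): "negative reinforcement",
        ("removed", "decrease"): "negative punishment",
    }
    try:
        return table[(stimulus, behaviour)]
    except KeyError:
        raise ValueError(f"unexpected combination: {stimulus!r}, {behaviour!r}") from None


# Sunscreen: sunburn (unpleasant) is avoided/removed and the behaviour increases.
print(classify("removed", "increase"))  # negative reinforcement
```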

9
Q

How do the different types of reinforcement and punishment relate to different emotions?

A
  • Happiness: Positive Reinforcement; Application of Pleasant Stimulus
  • Anger: Negative Punishment (Omission Learning); Removal of a Pleasant Stimulus
  • Relief: Negative Reinforcement; Removal of an Unpleasant Stimulus
  • Fear: Positive Punishment; Application of Unpleasant Stimulus
10
Q

What is a continuous schedule of reinforcement?

A
  • Behaviour is followed by a consequence each time it occurs
  • Excellent for getting a new behaviour started
  • Behaviour stops quickly when reinforcement stops
  • Schedule of choice for punishment and time-out
11
Q

What is thinning intermittent reinforcement?

A

• Two methods are commonly used:
―Gradually increasing the response ratio or the duration of the time interval between Response –> Reinforcer
Response ratio = how many times the organism has to respond to get a reward
The ratio can be raised step by step until it reaches the response ratio we want
Time interval = no matter how many times they respond, they only get a reward once the interval has elapsed
―Providing instructions such as rules, directions and signs to communicate the schedule of reinforcement.
i.e. give a cue/signal that reinforcement is on its way
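The first method (gradually increasing the response ratio) can be sketched as a plan of ratio requirements. This is a hypothetical illustration: the function name `thinning_plan`, the growth factor of 1.5 and the target of 20 are assumptions chosen for the example, not recommended values. Growing the requirement in small steps rather than one jump is meant to avoid the abrupt increases that produce ratio strain.

```python
def thinning_plan(start_ratio=1, target_ratio=20, growth=1.5):
    """Hypothetical plan for thinning a ratio schedule.

    Each stage multiplies the response requirement by roughly `growth`
    (and always adds at least one response) until `target_ratio` is
    reached, rather than jumping straight from start to target.
    """
    if start_ratio < 1 or target_ratio < start_ratio:
        raise ValueError("need 1 <= start_ratio <= target_ratio")
    plan = [start_ratio]
    while plan[-1] < target_ratio:
        nxt = max(plan[-1] + 1, int(plan[-1] * growth))
        plan.append(min(nxt, target_ratio))  # never overshoot the target
    return plan


print(thinning_plan())  # [1, 2, 3, 4, 6, 9, 13, 19, 20]
```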

12
Q

What are the 4 partial schedules used to build resistance to extinction?

A

• Ratio Schedules: (responses/actions)
• e.g. after the pre-determined number of responses has been made –> outcome

• Interval Schedules: (time lapse)
• e.g. the 1st response after the specified time has elapsed –> outcome

• Fixed Schedules: (set number/time)
• e.g., every 5 responses (ratio) or every 5 mins (interval) –> outcome
• i.e., a predictable schedule

• Variable Schedules: (varies around an average)
• e.g., every 2-5 responses (ratio) or every 2-5 mins (interval) –> outcome
• i.e., an unpredictable schedule

Combinations:
• Fixed-Ratio
• Variable-Ratio
• Fixed-Interval
• Variable-Interval
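The four combinations can be expressed as small decision rules that say whether a given response earns a reinforcer. The following Python sketch is illustrative only: the class names, measuring `t` in seconds, drawing variable requirements uniformly from a range such as 2-5 (following the examples above) and restarting the interval timer at each reinforcer are all assumptions, not a standard laboratory implementation.

```python
import random


class FixedRatio:
    """FR-n: reinforce every n-th response."""

    def __init__(self, n):
        self.n, self.count = n, 0

    def respond(self, t):
        self.count += 1
        if self.count == self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """VR: the number of responses required varies at random between low and high."""

    def __init__(self, low, high, rng):
        self.low, self.high, self.rng = low, high, rng
        self.count, self.required = 0, rng.randint(low, high)

    def respond(self, t):
        self.count += 1
        if self.count == self.required:
            self.count, self.required = 0, self.rng.randint(self.low, self.high)
            return True
        return False


class FixedInterval:
    """FI: reinforce the first response once `seconds` have elapsed."""

    def __init__(self, seconds):
        self.seconds, self.available_at = seconds, seconds

    def respond(self, t):
        if t >= self.available_at:
            self.available_at = t + self.seconds  # assumed: timer restarts at reinforcement
            return True
        return False  # responses before the interval ends earn nothing


class VariableInterval:
    """VI: like FI, but each interval is drawn at random between low and high."""

    def __init__(self, low, high, rng):
        self.low, self.high, self.rng = low, high, rng
        self.available_at = rng.uniform(low, high)

    def respond(self, t):
        if t >= self.available_at:
            self.available_at = t + self.rng.uniform(self.low, self.high)
            return True
        return False


# Usage: FR-5 with 12 responses (t is ignored by ratio schedules).
fr = FixedRatio(5)
print([fr.respond(t) for t in range(12)])
```

With FR-5, the 5th and 10th of the 12 responses are reinforced. On the interval schedules, responses made before the interval has elapsed earn nothing.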
13
Q

What is a fixed ratio schedule?

A

The same ratio applies throughout
• Behaviour/reinforcement: e.g. 100/1 or 15/1 (responses per reinforcer)
• Response Rate: high (higher ratio = faster responding)
• Behaviour: tends to work hard (ratio run), receive reinforcement, take a brief post-reinforcement pause, then work hard again
Resistance to Extinction: Low
• High rates of responding –> pause after receiving reward (post-reinforcement pause, PRP) –> then onwards for the next reward
• Make the number of responses too high –> ratio strain
i.e. a disruption in responding due to an overly demanding response requirement
Note also that the closer they get to their target number of responses, the faster the rate of bar pressing; this is known as a ratio run

14
Q

What is ratio strain?

A

― A result of abrupt increases in ratio requirements
― Characteristics include: avoidance, aggression, and unpredictable pauses in responding
― Ratio strain occurs when too much effort is required in exchange for too little in return.

15
Q

What is the goal gradient hypothesis?

A

Animals traversing a maze move at a progressively faster pace as they approach the goal

16
Q

What is a variable ratio schedule?

A

• Behaviour/Reinforcement: random/unpredictable number of responses between reinforcements
• Response Rate: Fast
• Behaviour: Works hard at a steady rate
Resistance to Extinction: High

17
Q

How would the rate of responding be affected if rewards came on a temporal basis?

A

• Reduce it – there is no point working if it’s not making the rewards come any faster

18
Q

What is a fixed interval schedule?

A

• Behaviour/Reinforcement: the 1st response after a fixed amount of time has elapsed is reinforced
• Response pattern: scalloped
• Behaviour: high rate just before reinforcement, long pause after
• Lowest rate of responding
Resistance to Extinction: Low

19
Q

What is a variable interval schedule?

A
• Behaviour/Reinforcement: the first response after a variable time period (around an “average”) has elapsed is reinforced
• Response Rate: Slow
• Behaviour: Works at a steady rate
Resistance to extinction: High
20
Q

What are 4 procedures of differential reinforcement?

A

Four procedures that incorporate reinforcement to address and treat disruptive behaviours are:

  1. Differential Reinforcement of Other behaviour (DRO)
  2. Differential Reinforcement of Low rates of responding (DRL)
  3. Differential Reinforcement of Incompatible behaviour (DRI)
  4. Differential Reinforcement of Alternative behaviour (DRA), which is not necessarily incompatible
21
Q

What is differential reinforcement of other behaviours?

A

• Differential Reinforcement of Other behaviours (DRO)
―In this case the subject periodically receives the positive reinforcer provided it has engaged in other behaviours instead of the target behaviour.
The behaviour you want to get rid of is not punished; other behaviours are rewarded instead
If the target behaviour has not occurred during an interval, the subject is rewarded
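A minimal Python sketch of this interval rule, assuming a fixed, non-resetting interval (many DRO variants instead restart the interval whenever the target behaviour occurs). The function name `dro` and the example times are hypothetical:

```python
def dro(target_times, session_length, interval):
    """Hypothetical fixed-interval DRO.

    target_times: times (s) at which the target behaviour occurred.
    Returns the end time of every interval that contained no target
    behaviour, i.e. every point at which a reinforcer is delivered.
    """
    reinforcer_times = []
    start = 0
    while start + interval <= session_length:
        end = start + interval
        if not any(start <= t < end for t in target_times):
            reinforcer_times.append(end)  # only "other" behaviour this interval: reward
        start = end  # an occurrence forfeits only that interval's reward
    return reinforcer_times


print(dro([12, 47], session_length=60, interval=10))  # [10, 30, 40, 60]
```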

22
Q

What are establishing operations?

A
Establishing Operations (EOs) are factors that affect the effectiveness of reinforcers
• The intensity, amount, and type (i.e. quality) of reinforcer determines its effectiveness
23
Q

What is the reinforcer magnitude?

A

The larger the reward, the faster learning is acquired
Reinforcer must be of sufficient magnitude for it to be worth making the response
Needs to be a reasonable relationship between the effort required and the size of the reward
• Reward magnitude is often a matter of “being in the eyes of the beholder”.
• It need not be the absolute size that is important but how it is perceived.
• E.g., two groups of rats are reinforced with the same amount of food; rats run faster when that food is broken up into more pieces.
• Similar studies show that many small reinforcers are generally more effective than a few large ones
• Magnitude is relative to the person.

24
Q

What are contrast effects?

A

It is all a matter of what you have been used to
• Shifting the value of the reward “mid-stream” also changes behaviour; these changes are known as contrast effects
• Reinforcer magnitude is all a matter of relativity
• Contrast effects are obtained when the quality of the reinforcer is switched as well
Responding is influenced by the reinforcement characteristics that an organism has come to expect in its past

25
Q

What happens when the reinforcement is delayed?

A

Gradient of delay: the delay decreases the contiguity between response and outcome
Temporal contiguity is an important factor in the effectiveness of operant conditioning. E.g. a dog’s obedience training will be much more effective if the owner rewards the dog with a treat straight after the desired response
• Long delays make it difficult for the person/animal to see the relationship between their response and the consequence.
• A delay allows time for other behaviours to occur during the interval –> superstitious reinforcement of them.
• The deleterious effects of delay can be reduced by providing a signal that the reward is coming, e.g. a clicker.
Responding is much faster when the delay is shorter

26
Q

What is the effect of the speed of the reward?

A

Addiction is linked to the speed of reward
• This is why modern poker machines are much more addictive than the older pokies, the “one-armed bandits”.
• Modern pokies increase the gambling ‘dosage’ to much higher levels.
• All this speed means more bets, and more bets mean more excitement and more excitement means more dopamine.

27
Q

What is the effect of response-reinforcer contingency?

A

• The Reinforcer must be the result of some Response
• The greater the consistency between the Reinforcer and the Response, the quicker/more effective the conditioning.
GOALS MUST BE SET AND MET BEFORE A REWARD IS GIVEN

28
Q

What are primary and secondary reinforcers?

A

A primary reinforcer is a stimulus that is reinforcing even without previous training. Primary reinforcers are biologically relevant stimuli or events i.e. they have survival value.
Examples include food, water, and sex.

A conditioned (secondary) reinforcer is an arbitrary event (such as a tone, clicker or token) that increases the frequency of an operant response. Events that have been associated with rewarding experiences acquire reinforcing power.
They are reinforcing because they permit an organism to obtain a primary reinforcer.

29
Q

What are 3 functions of conditioned reinforcement?

A

Conditioned reinforcers:
• Tell the organism it has done the right thing
• Tell the organism what to do next
• Bridge long periods between unconditioned reinforcers

30
Q

What is clicker training?

A

Conditioned reinforcement
Pair a hand-held clicker with food through straightforward classical conditioning.
The sound of the clicker can then reinforce other behaviours

31
Q

What are 5 advantages of clicker training?

A
  • A clicker sounds the same no matter how you are feeling when you press it
  • A clicker is easier to discriminate from everything else we say to the dogs
  • Split second timing is possible with the clicker thereby reinforcing the precise behaviour.
  • Using a primary reinforcer such as food can cause the dog to focus on the food and the food giver rather than on the behaviour; the clicker keeps the focus on the behaviour.
  • The clicker can reinforce the behaviour immediately
32
Q

What 3 factors affect the strength of a secondary reinforcer?

A

A number of variables affect the strength of a secondary reinforcer:
1. The magnitude of the primary reinforcer
2. The number of pairings (with the primary reinforcer)
3. The time elapsing between the presentation of the secondary reinforcer and the primary reinforcer

33
Q

What was Premack’s theory?

A

“reinforcement involves a relation, typically between two responses, one that is being reinforced and another that is responsible for the reinforcement. This leads to the following generalization: Of any two responses, the more probable response will reinforce the less probable one.”
This generalization, known as the Premack Principle, is usually stated somewhat more simply:
High probability behaviour reinforces low probability behaviour
The theory also states that punishment occurs when the instrumental behaviour leads to a less-preferred response
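The principle can be applied mechanically once baseline probabilities are known. In this hypothetical Python sketch, `premack_pairs` and the activity proportions are invented for illustration; each pair (A, B) means that access to the more probable activity A could be made contingent on doing the less probable activity B:

```python
def premack_pairs(baseline):
    """Return (reinforcer, reinforced) pairs predicted by the Premack principle.

    baseline: mapping from activity to its baseline probability (e.g. the
    proportion of free time spent on it). A more probable activity can
    reinforce any less probable one.
    """
    return [
        (high, low)
        for high, p_high in baseline.items()
        for low, p_low in baseline.items()
        if p_high > p_low  # strictly more probable; ties reinforce nothing
    ]


# Hypothetical baseline: proportion of free time spent on each activity.
print(premack_pairs({"games": 0.5, "reading": 0.3, "homework": 0.2}))
# [('games', 'reading'), ('games', 'homework'), ('reading', 'homework')]
```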

34
Q

What is chaining?

A
  • Chaining refers to a method of teaching a behaviour using behaviour chains. Behaviour chains are sequences of individual behaviours that when linked together form a terminal behaviour.
  • It involves reinforcing individual responses occurring in a sequence to form a complex behaviour. It is frequently used for training behavioural sequences (or “chains”) that are beyond the current repertoire of the learner.
  • The chain of responses is broken down into small steps using task analysis. Parts of a chain are referred to as links.
35
Q

What is a response chain?

A

Response chain: a sequence of behaviours occurring in a specific order, reinforced on the occurrence of the terminal response.
Each step in the response chain acts both as a conditioned reinforcer (SR) for the previous step and as a discriminative stimulus (SD) for the next step

36
Q

What is a discriminative stimulus?

A

A stimulus that indicates whether or not responding will lead to reinforcement

37
Q

What are 2 main chaining techniques?

A
  • Forward Chaining: Using forward chaining, the behaviour is taught in its naturally occurring order.
  • Each step of the sequence is taught and reinforced when completed correctly. Once 1st is mastered –> next step

• Backward Chaining: Using backward chaining, the learner first performs the final behaviour in the sequence at the predetermined criterion level, and reinforcement is delivered.
• Next, reinforcement is delivered when the last and the next-to-last behaviours in the sequence are performed to criterion.
― This sequence proceeds backwards through the chain until all the steps in the task analysis have been introduced in reverse order and practiced cumulatively

• Both techniques are more successful than whole-task learning
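The difference between the two techniques is the order in which links are added. This Python sketch, using a hypothetical five-step hand-washing task analysis and a hypothetical function `training_phases`, lists what the learner must perform in each phase before reinforcement is delivered:

```python
def training_phases(steps, method):
    """Return the training phases for a task-analysed behaviour chain.

    Each phase lists the steps the learner must perform to criterion before
    reinforcement is delivered in that phase.
    """
    n = len(steps)
    if method == "forward":
        # Natural order: step 1, then steps 1-2, then 1-3, ...
        return [steps[:k] for k in range(1, n + 1)]
    if method == "backward":
        # Final step first, then the last two steps, then the last three, ...
        return [steps[n - k:] for k in range(1, n + 1)]
    raise ValueError("method must be 'forward' or 'backward'")


# Hypothetical task analysis for hand washing.
steps = ["turn on tap", "wet hands", "apply soap", "rinse", "dry hands"]
for phase in training_phases(steps, "backward"):
    print(phase)
```

For backward chaining the phases start with “dry hands” alone and end with the whole chain; forward chaining starts with “turn on tap” alone.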

38
Q

What are 4 considerations we need to make about response chains?

A
  • Dependent on reinforcement for continued performance.
  • If a link breaks, all behaviours prior to the broken link will be extinguished.
  • Each reinforcer does not have equal value.
  • Responses farthest from reinforcement are the weakest and easiest to extinguish