PSY260 - 5. Operant Conditioning Flashcards

1
Q

Operant Conditioning

A

Operant conditioning is a form of associative learning whereby organisms learn to make responses in order to obtain or avoid important consequences

2
Q

Operant Conditioning

A

based on avoiding/obtaining a specific outcome
requires the organism to operate on its environment to determine the outcome
also called instrumental conditioning

3
Q

Thorndike

A

first to study behavioural outputs due to operant conditioning, using puzzle boxes

4
Q

Thorndike

A

findings of the puzzle box work suggest organisms:
More likely to repeat actions they have experienced as producing satisfying consequences
Less likely to repeat actions they have experienced as producing undesirable consequences

5
Q

law of effect

A

probability of a particular behavioural response increases or decreases depending on the consequences that have followed that response in the past

6
Q

law of effect

A

Stimulus S→Response R→Outcome O

7
Q

Free-Operant Learning

A

Thorndike’s learning procedures involved discrete trials

Discrete Trials: operant conditioning paradigm where experimenter defines beginning + end of each trial

8
Q

B.F. Skinner

A

believed he could refine Thorndike’s techniques, and devised the Skinner box to do this

9
Q

Skinner Box

A

conditioning chamber - reinforcement/punishment is automatically delivered when animal makes a response (lever pressing)
trough on one wall - food delivered automatically
•When animal pressed lever, food dropped into trough

10
Q

Free Operant Paradigm

A

Skinner’s paradigm: animal can operate apparatus “freely”, responding to obtain reinforcement/avoid punishment, whenever it chooses

11
Q

Skinner Box: Extinction

A

responding decreases (by #13) when the response no longer gets the desired outcome

12
Q

Free Operant Learning

A

S (Light ON) → R (Lever Press) → O (Food Release)
S (Light OFF) → R (Lever Press) → O (NO Food Release)
animal learns to distinguish between light on and light off and that the consequences of pressing change accordingly

13
Q

reinforcement

A

Providing consequences to increase probability of a behaviour occurring again in future

14
Q

punishment

A

Providing consequence to decrease probability of a behaviour occurring again in future

15
Q

Components of Learned Association: 3

A

stimulus (or set of stimuli)
response (or set of responses)
outcome

16
Q

Components of Learned Association

A

3-way association betw S, R, and O

17
Q

Discriminative Stimuli

A

stimuli that signal whether particular response will lead to particular outcome

18
Q

Stimuli

A

particular set of stimuli, responses + outcomes might become so strongly associated that they become inflexible

19
Q

habit slip

A

occurs when a discriminative stimulus is so strongly associated with a response that the response is made even when it is inappropriate, e.g. the alarm clock wakes you up and you get dressed for school even on the weekend

20
Q

Responses

A

Behaviour given in reaction to stimulus in order for a particular outcome to come about

21
Q

Shaping

A

operant conditioning technique in which successive approximations to desired response are reinforced

22
Q

Chaining

A

organisms gradually trained to execute complicated sequences of discrete responses
Backward chaining: the final step is trained first, then earlier steps are added one at a time; suited to longer, more complex sets of steps

23
Q

Reinforcers

A

particular consequence for an associated behaviour that increases the likelihood of the behaviour being repeated in future

24
Q

Primary Reinforcers

A

stimuli - food, water, sex + sleep - innately reinforcing: organisms naturally driven to obtain these things + tend to repeat behaviours that increase their access to them

25
Q

Secondary Reinforcers

A

stimuli with no intrinsic value that have been paired with primary reinforcers or provide access to primary reinforcers
(e.g. money, which gets us our primary needs)
useful because a trainer can deliver reinforcement immediately without waiting till the trick is finished
• Although animals will not work for food unless they're hungry, they may continue to work indefinitely for secondary reinforcers

26
Q

Punishers

A

consequence of a behaviour that leads to decreased likelihood of the behaviour occurring again in future

27
Q

Effectiveness of Punishment: 1. Discriminative stimuli for punishment can encourage cheating

A

Discriminative stimuli can signal if response will be punished causing someone to alter their behaviour to avoid punishment only when they believe there will be a consequence

28
Q

Effectiveness of Punishment: 2. Concurrent reinforcement can undermine punishment

A

Effectiveness of punishment can be counteracted if reinforcement occurs along with punishment

29
Q

Effectiveness of Punishment: 3. Punishment leads to more variable behaviour

A

punishment does not specify which alternate response should occur, so the organism explores other possible responses
punishment is not a good way to shape/train particular desired behaviours

30
Q

Effectiveness of Punishment: 3. Punishment leads to more variable behaviour

A

reinforcement is a faster way to produce learning than simply punishing the alternate undesired response, as it reduces the likelihood of organism exploring undesirable alternate behaviours

31
Q

Effectiveness of Punishment: 4. Initial intensity of punishment determines effectiveness

A

most effective if strong punisher used from the outset – if prior weak punishers are initially given instead, they undermine effectiveness of severe punisher when it finally comes later on

32
Q

Putting it all Together: Building the S-R-O Association

A

Rules determining when outcomes delivered - reinforcement schedules

33
Q

Timing Affects Learning

A

learning is faster if the R-O interval is short
Schlinger & Blakely (1994): Immediate reward delivery following lever press = quicker association formation than delayed reward presentation

34
Q

Timing Affects Learning

A

closeness in timing important for effectiveness

Reinforcement/punishment = most effective if no delay betw response + its consequence

35
Q

Timing Affects Learning

A

society tends to delay delivery of punishment, which undermines punishment's effectiveness + weakens learning

36
Q

Timing Affects Learning

A

Delay betw response + consequence weakens reinforcer/punisher effectiveness because later consequences/outcomes more likely to be associated with other behaviours that occurred during the delay

37
Q

Self-Control

A

organism’s willingness to forgo small immediate reinforcement in favor of a large future reinforcement
trade-off
Age impacts ability to wait for delayed reinforcement

38
Q

Pre-commitments

A

improve ability to wait for reward

make it harder to go back on commitments needed for long term achievements

39
Q

Pre-commitments

A

would need to break their pre-commitment

difficulty associated with breaking a pre-commitment helps people stick to their commitment or promise

40
Q

Outcomes Can Be Added or Subtracted

A

When consequence (reinforcer/punisher) is added→positive reinforcement/punishment

41
Q

Positive Reinforcement

A

response causes reinforcer to be “added”
over time response becomes more frequent
S (toilet present) → R (empties bladder) → O (praise)

42
Q

Positive Punishment:

A

response causes punisher to be “added”
over time response becomes less frequent
S (toilet not present) → R (empties bladder) → O (disapproval)

43
Q

Consequences Can Be Added or Subtracted

A

Outcomes/consequences (reinforcers/punishers) can be removed or subtracted to cause learning

44
Q

Negative Reinforcement

A

response causes punisher to be subtracted
over time response becomes more frequent
Behaviour is encouraged (reinforced) because it causes something to be taken away/subtracted from environment
S (headache) → R (take aspirin) → O (no more headache)

45
Q

Negative Punishment

A

response causes reinforcer to be subtracted
over time response becomes less frequent
Behaviour is discouraged (punished) because it causes something to be taken away/subtracted from environment
S (party) → R (late for curfew) → O (grounded)
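
The four cases on these cards (a reinforcer or punisher is either added or subtracted) can be laid out as a small lookup table. This is a minimal Python sketch for revision purposes; the function name `classify_consequence` and the string labels are illustrative assumptions, not course terminology.

```python
from typing import Tuple


def classify_consequence(kind: str, change: str) -> Tuple[str, str]:
    """Return (procedure name, effect on response frequency).

    kind:   "reinforcer" or "punisher" (what the response acts on)
    change: "added" or "subtracted" (what the response does to it)
    """
    table = {
        ("reinforcer", "added"): ("positive reinforcement", "more frequent"),
        ("punisher", "added"): ("positive punishment", "less frequent"),
        ("punisher", "subtracted"): ("negative reinforcement", "more frequent"),
        ("reinforcer", "subtracted"): ("negative punishment", "less frequent"),
    }
    if (kind, change) not in table:
        raise ValueError(f"unknown combination: {kind!r}, {change!r}")
    return table[(kind, change)]


# Aspirin removes a headache (a punisher is subtracted):
print(classify_consequence("punisher", "subtracted"))
# Being grounded removes freedom (a reinforcer is subtracted):
print(classify_consequence("reinforcer", "subtracted"))
```

"Positive" and "negative" here only mean added and subtracted; whether the response becomes more or less frequent depends on the combination.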

46
Q

Negative Punishment

A

activity/consequence being restricted needs to be deemed enjoyable by person being punished

47
Q

Continuous Reinforcement

A

every instance of the response followed by consequence

48
Q

Partial Reinforcement

A

only some responses reinforced - intermittent reinforcement schedules
can be applied to reinforcement/punishment

49
Q

Partial Reinforcement

A

fixed ratio, fixed interval, variable ratio, variable interval schedule

50
Q

Fixed Ratio (FR) Schedule

A

specific # of responses required before reinforcer delivered
Reinforcement comes after fixed # of responses
schedules can increase gradually
Can often lead to a postreinforcement pause

51
Q

Postreinforcement Pause

A

FR schedule of reinforcement - brief pause following period of fast responding leading to reinforcement

52
Q

Fixed Interval (FI) Schedule

A

first response after fixed amount of time reinforced

53
Q

Variable Ratio Schedule

A

certain number of responses, on avg, required before reinforcer is delivered
Exact number of responses required varies from one reinforcer to the next around that average

54
Q

Variable Ratio Schedule

A

Responder never knows exactly when reinforcer is coming

Produces a higher rate of responding than fixed ratio schedules

55
Q

Variable Interval (VI) Schedule

A

reinforcement schedule where the first response after an amount of time that varies around a fixed average is reinforced
Reinforces the first response after an interval that averages particular amount of time

56
Q

Variable Interval (VI) Schedule

A

Response rate steadier than fixed interval schedule due to element of uncertainty + since animals periodically check for reinforcement availability
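
The four partial reinforcement schedules differ only in the rule that decides whether a given response earns the reinforcer. A rough Python sketch of those rules follows; the function names, the time units, and the uniform random draws used for the variable schedules are all assumptions made for illustration, not a description of any real apparatus.

```python
import random


def fixed_ratio(n):
    """FR: reinforce every n-th response."""
    count = 0

    def respond(t):
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True
        return False
    return respond


def fixed_interval(interval):
    """FI: reinforce the first response made at least `interval`
    time units after the previous reinforcer (or after time 0)."""
    last = 0.0

    def respond(t):
        nonlocal last
        if t - last >= interval:
            last = t
            return True
        return False
    return respond


def variable_ratio(mean_n, rng):
    """VR: required number of responses varies around mean_n.
    Assumption: drawn uniformly from 1 .. 2*mean_n - 1."""
    need, count = rng.randint(1, 2 * mean_n - 1), 0

    def respond(t):
        nonlocal need, count
        count += 1
        if count >= need:
            need, count = rng.randint(1, 2 * mean_n - 1), 0
            return True
        return False
    return respond


def variable_interval(mean_interval, rng):
    """VI: required wait varies around mean_interval.
    Assumption: drawn uniformly from 0 .. 2*mean_interval."""
    wait, last = rng.uniform(0, 2 * mean_interval), 0.0

    def respond(t):
        nonlocal wait, last
        if t - last >= wait:
            wait, last = rng.uniform(0, 2 * mean_interval), t
            return True
        return False
    return respond


# One response per time unit, t = 1 .. 25
fr = fixed_ratio(5)
print([t for t in range(1, 11) if fr(t)])   # [5, 10]
fi = fixed_interval(10)
print([t for t in range(1, 26) if fi(t)])   # [10, 20]
```

Under the ratio rules, responding faster brings reinforcers sooner; under the interval rules, extra responses before the interval has elapsed earn nothing.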

57
Q

Protestant ethic effect

A

belief that rewards should be earned and that hard workers are morally superior to freeloaders

58
Q

Clark Hull – drive reduction theory

A

all learning reflects biological need to reduce drives by obtaining primary reinforcers
•primary reinforcers are not always reinforcing, not created equal

59
Q

Negative contrast

A

organisms given a less preferred reinforcer in place of an expected and preferred reinforcer will respond less strongly for the less preferred reinforcer than if they had been given that less preferred reinforcer all along

60
Q

Choice behavior

A

•Concurrent reinforcement schedules: organism can make any of several possible responses, each leading to different outcome
•Examine how organisms choose to divide their time + efforts among different options
Channel Surfing

61
Q

Matching law of choice behavior

A

given 2 responses reinforced on VI schedules, an organism's relative rate of making each response will match the relative rate of reinforcement for that response
• Rate of response A / rate of response B = rate of VI reinforcement for A / rate of VI reinforcement for B
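
The matching equation can be turned into a predicted share of responses: if response rates are in the same ratio as reinforcement rates, the fraction of responses going to A equals A's fraction of the total reinforcement. A small worked sketch in Python; the function name and the example rates (60 and 30 reinforcers per hour) are hypothetical values chosen for illustration.

```python
def predicted_share_for_a(reinf_rate_a: float, reinf_rate_b: float) -> float:
    """Fraction of all responses predicted to go to option A.

    From  R_A / R_B = r_A / r_B  it follows that
          R_A / (R_A + R_B) = r_A / (r_A + r_B).
    """
    total = reinf_rate_a + reinf_rate_b
    if reinf_rate_a < 0 or reinf_rate_b < 0 or total == 0:
        raise ValueError("rates must be non-negative and not both zero")
    return reinf_rate_a / total


# Hypothetical: A pays off about 60 times/hour, B about 30 times/hour
share = predicted_share_for_a(60, 30)
print(f"Predicted share of responses to A: {share:.2f}")
```

With these made-up rates, A is reinforced twice as often as B, so the organism is predicted to respond on A about twice as often as on B.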

62
Q

Behavioral economics and bliss point

A
  • Study of how organisms allocate their time and resources among possible options
  • Bliss point: allocation of resources that provides maximal subjective value to an individual
63
Q

The Premack principle: responses as reinforcers

A
  • Opportunity to perform highly frequent behavior can reinforce a less frequent behavior
  • Watching television is preferred activity, parent restricts television time, making it contingent on homework
  • Response deprivation hypothesis: critical variable is not which response is normally more frequent, but merely which response has been restricted
64
Q

Dorsal striatum + stimulus-response learning

A
  • Info from sensory cortex to motor cortex can travel via indirect route through basal ganglia
  • Dorsal striatum – caudate nucleus + putamen
  • Receives highly processed stimulus info from sensory cortical areas and projects to motor cortex, which produces the behavioral response
  • critical role in operant conditioning, particularly if discriminative stimuli involved
  • Individuals with damage/disruption to striatum show a deficit in the ability to associate a stimulus with the correct response
65
Q

Orbitofrontal cortex and learning to predict outcomes

A
  • Underside of front of brain; contributes to goal-directed behavior by representing predicted outcomes
  • Receives inputs conveying full range of sensory modalities + visceral sensations [hunger], allowing it to integrate many types of information
  • Outputs from orbitofrontal cortex travel to striatum, where they can help determine which motor responses are executed
  • Individuals with lesions tend to show inflexible or inappropriate responding
  • Important for associating response with particular outcomes
66
Q

Orbitofrontal cortex and learning to predict outcomes

A
  • During delay, some neurons in orbitofrontal cortex fire differently, depending on which reward or punisher is expected
  • Medial portion processes info about reinforcers, lateral portion processes info about punishers
  • Some neurons appear to code actual identity of expected outcome
  • Neurons may also play a role in helping us select between potential actions based on expected consequences
  • Neurons respond with strength proportional to perceived value of each choice
67
Q

Wanting and liking in the brain

A

ventral tegmental area (VTA): small region in the midbrain of mammals
• Electrical brain stimulation causes excitement or anticipation of reinforcement
• Hedonic value: goodness of reinforcer, how much we like it
• Motivational value: how much we want a reinforcer and how hard we are willing to work to obtain it
• Only when wanting + liking signals are both present will arrival of a reinforcer evoke responding + strengthen the S-R association

68
Q

Dopamine: how the brain signals wanting

A
  • VTA produces dopamine
  • Dopamine release from VTA/SNc triggered by encounters with food, sex, drugs of abuse, and secondary reinforcers
  • Drugs that interfere with dopamine production/transmission reduce responding in trained animal
69
Q

Incentive salience hypothesis of dopamine function

A

role of dopamine in operant conditioning is to signal how much an animal wants a particular outcome, i.e. how motivated it is to work for it
• Dopamine-depleted animals are still willing to eat a preferred food if it is placed in front of them, but unwilling to work hard to earn it
•Increasing brain dopamine levels can increase craving
•Stimulating dopamine system increases wanting but not liking
•Increases in brain dopamine levels tend to enhance new SR learning

70
Q

Endogenous opioids: how brain signals liking

A
  • Endogenous opioids: naturally occurring neurotransmitter-like substances [peptides] with the same effects as opiates
  • Released in response to primary reinforcers, may be released in response to secondary reinforcers + pleasurable behaviours
  • Differences in amount of endogenous opioid release + in the specific opiate receptors activated may determine an organism's preference for one reinforcer over another
71
Q

How do wanting and liking interact

A
  • Some endogenous opioids may modulate dopamine release

  • Endogenous opioids may signal liking, which in turn would affect the VTA's ability to signal info about wanting