W8 Learning 2: Operant conditioning Flashcards

1
Q

Action-outcome framework

A

Classical and operant conditioning are experimental paradigms that have led to a highly influential framework for associative learning.

2
Q

Classical conditioning (Pavlov)

A

Stimulus-response associations. Involves the pairing of two stimuli: the Conditioned Stimulus (CS) and the Unconditioned Stimulus (US). The US is associated with a hard-wired response; through conditioning, that response becomes associated with the CS. The CS and US can be temporally separated or overlapping.

3
Q

Operant (instrumental) Conditioning

A

The US is contingent on the animal’s behaviour (e.g. it only occurs when a lever has been pressed), so an action is required. This forms an action-outcome association: the action determines the outcome.

It goes beyond hard-wired unconditioned responses and incorporates more complex behaviour.

4
Q

Learning of action-outcome associations

A

‘Response’ (in operant conditioning): pressing a lever, opening a door, pushing a button etc.
Operant behaviour: under stimulus control, so that the action can be a response to a certain stimulus/situation

The outcome can be a ‘reinforcement’ or a ‘punishment’
Action => Outcome

5
Q

Law of Effect

A

“… responses that create a typically pleasant outcome in a particular situation are more likely to occur again in a similar situation, whereas responses that produce a typically unpleasant outcome are less likely to occur again in the situation” (Thorndike, 1911)

Action is driven by reward (pleasant outcome).

6
Q

Skinner Box (Operant chamber)

A

Allows for a variety of operant conditioning paradigms.
Lights and speakers: stimuli that prompt an action
Lever: records responses
Food dispenser: appetitive stimuli (reward outcomes)
Electrified grid: aversive stimuli (punishment outcomes)
Typically used with rodents, which respond very well in these paradigms.

7
Q

Skinner’s Terminology: Reinforcer

A

an event that increases the likelihood of the action

8
Q

Skinner’s Terminology: Punishment

A

An event that decreases the likelihood of the action (it discourages you from doing something again).

9
Q

Skinner’s terminology: Positive

A

Something has been introduced

10
Q

Skinner’s Terminology: Negative

A

Something has been removed.
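
Combined with the reinforcer/punishment distinction, positive and negative give four named procedures. A minimal Python sketch of that 2×2 naming scheme follows; the function name `classify` and the string labels it accepts are illustrative assumptions, not Skinner’s own notation.

```python
def classify(stimulus_change: str, behaviour_change: str) -> str:
    """Name the procedure from what happened to the stimulus and to the behaviour.

    stimulus_change:  "introduced" or "removed"      (hypothetical labels)
    behaviour_change: "increases" or "decreases"     (hypothetical labels)
    """
    # Positive = something has been introduced; negative = something has been removed.
    sign = {"introduced": "positive", "removed": "negative"}[stimulus_change]
    # Reinforcement increases the likelihood of the action; punishment decreases it.
    kind = {"increases": "reinforcement", "decreases": "punishment"}[behaviour_change]
    return f"{sign} {kind}"


for stimulus in ("introduced", "removed"):
    for behaviour in ("increases", "decreases"):
        print(f"{stimulus:10s} + behaviour {behaviour}: {classify(stimulus, behaviour)}")
```

For example, removing an aversive stimulus so that the behaviour increases is labelled negative reinforcement, not punishment.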

11
Q

Punishment

A

Decreases Behaviour
Less beneficial than Reinforcement
Temporary changes in behaviour – based on coercion
Creates negative/adversarial relationship
When the person who provides the punishment leaves, the unwanted behaviour returns

12
Q

Reinforcement

A

Increases Behaviour
More beneficial than punishment
More likely to result in long-term changes in behaviour
Creates positive relationship with the person providing reinforcement

13
Q

Classical conditioning: partial reinforcement

A

Trials in which the CS is not followed by the US are interspersed at random, so that the CS is followed by the US only with a certain probability (e.g. 75%). This slows down both acquisition and extinction learning.

14
Q

Partial reinforcement: reinforcement schedules

A

Responses are sometimes reinforced and sometimes not.
Slower initial learning, but greater resistance to extinction.
Because reinforcement does not follow every behaviour, it takes longer for the learner to detect that the reward has stopped, so extinction is slower.

15
Q

Fixed ratio

A

behaviour is reinforced after a specific number of responses. (e.g. giving a child a sweet after reading 5 pages of a book.)

16
Q

Variable ratio

A

behaviour is reinforced after an average, but unpredictable, number of responses (e.g. payoffs from slot machines and other games of chance)

16
Q

Variable interval

A

behaviour is reinforced for the first response after an average, but unpredictable, amount of time has passed (e.g. periodically checking email)

17
Q

Fixed interval

A

behaviour is reinforced for the first response after a specific amount of time has passed (e.g. receiving a monthly salary for work)

18
Q

Fixed Interval (FI) Schedule

A

The first response after a designated amount of time is followed by reinforcement. FI 60s: reinforcement becomes available every 60 seconds.
Produces a characteristic pattern of responding that is observable across species.
A post-reinforcement pause (PRP) is followed by slow responding, then by high rates of responding toward the end of the interval: the “scallop” pattern (e.g. a monthly salary).

18
Q

Variable Ratio (VR) Schedule

A

Responding is reinforced after a randomly determined number of responses. VR15: 15 is the average number of responses required for reinforcement, but the number on any given run could be anywhere between 1 and 29. The rate of responding on VR is typically faster than on a fixed ratio (FR) schedule, and there is no post-reinforcement pause (PRP). Response rates are relatively constant over time.

18
Q

Fixed ratio (FR) schedule

A

The schedule is described by the number of responses required for reinforcement. Continuous reinforcement is technically FR1. The probability of reinforcement increases with successive responses. There is a brief pause in responding (post-reinforcement pause, PRP) after each reinforcement before responding begins again, giving a ‘stair-step’ pattern.
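
The two ratio schedules differ only in how the required response count is chosen. A minimal Python sketch of that rule is given below; the class names are hypothetical, and drawing the VR requirement uniformly from 1 to 2 × mean − 1 (1 to 29 for VR15) is an assumption made to match the range quoted for VR15, not a claim about how any particular experiment sets it.

```python
import random


class FixedRatio:
    """FR n: every n-th response is reinforced (FR1 is continuous reinforcement)."""

    def __init__(self, n):
        self.n = n
        self.count = 0  # responses since the last reinforcer

    def respond(self):
        """Register one response; return True if it is reinforced."""
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """VR mean: the required count is redrawn after every reinforcer."""

    def __init__(self, mean, rng=None):
        self.mean = mean
        self.rng = rng or random.Random()
        self.count = 0
        self.required = self._draw()

    def _draw(self):
        # Assumption: uniform over 1..(2*mean - 1), which averages to `mean`.
        return self.rng.randint(1, 2 * self.mean - 1)

    def respond(self):
        """Register one response; return True if it is reinforced."""
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._draw()
            return True
        return False


fr5 = FixedRatio(5)
print([fr5.respond() for _ in range(10)])   # reinforcer on every 5th response

vr15 = VariableRatio(15)
print(sum(vr15.respond() for _ in range(1500)), "reinforcers in 1500 responses")
```

On the FR schedule the learner can predict which response will pay off; on the VR schedule it cannot, which is the usual explanation for the absence of a post-reinforcement pause.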

19
Q

Variable Interval (VI) Schedule

A

The first response after a randomly determined amount of time is reinforced. VI 60s: an average of 60 seconds between reinforcements, but individual intervals differ from one another.
Response rate is relatively constant, with no PRP (except at unusually low rates of reinforcement). The most commonly used schedule in operant research, because it produces steady, predictable performance.
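
On interval schedules, time rather than response count decides when a reinforcer becomes available, and only the first response after that point is reinforced. A minimal Python sketch under stated assumptions: the class names are hypothetical, time is passed in explicitly as seconds, the clock starts at 0, and VI waits are drawn uniformly from 0 to 2 × mean, which is one simple way to get the stated average.

```python
import random


class FixedInterval:
    """FI t: the first response at least t seconds after the last reinforcer is reinforced."""

    def __init__(self, interval):
        self.interval = interval
        self.last_reinforced = 0.0  # assumption: the session clock starts at 0

    def respond(self, now):
        """Register a response at time `now` (seconds); return True if reinforced."""
        if now - self.last_reinforced >= self.interval:
            self.last_reinforced = now
            return True
        return False


class VariableInterval(FixedInterval):
    """VI mean: like FI, but the required wait is redrawn after each reinforcer."""

    def __init__(self, mean, rng=None):
        self.mean = mean
        self.rng = rng or random.Random()
        super().__init__(self._draw())

    def _draw(self):
        # Assumption: uniform over 0..2*mean seconds, which averages to `mean`.
        return self.rng.uniform(0, 2 * self.mean)

    def respond(self, now):
        reinforced = super().respond(now)
        if reinforced:
            self.interval = self._draw()
        return reinforced


fi60 = FixedInterval(60)
for t in (10, 30, 59, 61, 70, 125):
    print(t, fi60.respond(t))   # only responses at 61 s and 125 s are reinforced

vi60 = VariableInterval(60)
print(sum(vi60.respond(float(t)) for t in range(1, 6001)), "reinforcers in 6000 s")
```

Responses made before the interval has elapsed earn nothing on either schedule, so extra early responding on FI is wasted effort, consistent with the pause-then-accelerate pattern.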

20
Q

Shaping

A

The process of guiding behaviour toward the desired outcome by reinforcing intermediate stages.