W8 Learning 2: Operant conditioning Flashcards
Action-outcome framework
Classical and operant conditioning are experimental paradigms that have led to highly influential frameworks for associative learning.
Classical conditioning (Pavlov)
Stimulus-response associations. Involves the pairing of two stimuli: a conditioned stimulus (CS) and an unconditioned stimulus (US). The US is associated with a hard-wired response. Through conditioning, the response becomes associated with the CS. The CS and US can be temporally separated or overlapping.
Operant (instrumental) Conditioning
The US is contingent on the animal's behaviour (e.g., it only occurs when a lever has been pressed), so an action is needed. Action-outcome association (the action determines the outcome).
It goes beyond hard-wired unconditioned responses and incorporates more complex behaviour.
Learning of action-outcome associations
‘Response’ (in operant conditioning): pressing a lever, opening a door, pushing a button etc.
Operant behaviour: under stimulus control, so the action can be a response to a particular stimulus or situation.
The outcome can be a ‘reinforcement’ or a ‘punishment’
Action => Outcome
Law of Effect
“… responses that create a typically pleasant outcome in a particular situation are more likely to occur again in a similar situation, whereas responses that produce a typically unpleasant outcome are less likely to occur again in the situation” (Thorndike, 1911)
Action is driven by reward (pleasant outcome).
Skinner Box (Operant chamber)
Allows for a variety of operant conditioning paradigms.
Lights and speakers: stimuli that signal when to perform the action
Lever for responses
Food dispenser – appetitive stimuli/rewards: outcomes (reward)
Electrified grid – aversive stimuli/punishment: outcomes (punishment)
Used with rodents, which respond very well to these paradigms.
Skinner’s Terminology: Reinforcer
an event that increases the likelihood of the action
Skinner’s Terminology: Punishment
an event that decreases the likelihood of the action (discourages you from doing it again).
Skinner’s terminology: Positive
Something has been introduced
Skinner’s Terminology: Negative
Something has been removed.
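Combining the two dimensions above (positive/negative and reinforcer/punishment) gives four procedures. The sketch below is a hypothetical helper, `classify_procedure`, that names the procedure from whether a stimulus is added and whether the action becomes more likely; the lever-press examples assume the Skinner box set-up described earlier (food dispenser, electrified grid).

```python
def classify_procedure(stimulus_added: bool, behaviour_increases: bool) -> str:
    """Name the operant procedure from Skinner's two dimensions (hypothetical helper)."""
    # Positive = a stimulus is introduced; negative = a stimulus is removed.
    sign = "positive" if stimulus_added else "negative"
    # Reinforcer = the action becomes more likely; punishment = less likely.
    effect = "reinforcement" if behaviour_increases else "punishment"
    return f"{sign} {effect}"


# Illustrative Skinner-box cases, where the action is a lever press.
print(classify_procedure(True, True))    # food delivered after a press -> positive reinforcement
print(classify_procedure(False, True))   # ongoing shock switched off after a press -> negative reinforcement
print(classify_procedure(True, False))   # shock delivered after a press -> positive punishment
print(classify_procedure(False, False))  # food access removed after a press -> negative punishment
```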
Punishment
Decreases Behaviour
Less beneficial than reinforcement
Temporary changes in behaviour, based on coercion
Creates a negative/adversarial relationship
When the person who provides the punishment leaves, the unwanted behaviour returns
Reinforcement
Increases Behaviour
More beneficial than punishment
More likely to result in long-term changes in behaviour
Creates a positive relationship with the person providing reinforcement
Classical conditioning: partial reinforcement
Intersperse trials in which the CS is not followed by the US. This is done randomly, so that the CS is followed by the US with a certain probability (e.g., 75%). Slows down both acquisition and extinction learning.
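A minimal sketch of how a 75% partial reinforcement sequence could be generated. It assumes each trial is decided independently by a random draw; a real experiment might instead shuffle a fixed block of trials, so `partial_reinforcement_trials` and its parameters are hypothetical.

```python
import random
from typing import List, Optional


def partial_reinforcement_trials(n_trials: int, p_us: float = 0.75,
                                 seed: Optional[int] = None) -> List[bool]:
    """Return one entry per trial: True if the CS is followed by the US."""
    rng = random.Random(seed)  # seeded so a sequence can be reproduced
    # Assumption: each trial independently pairs the CS with the US with probability p_us.
    return [rng.random() < p_us for _ in range(n_trials)]


trials = partial_reinforcement_trials(20, seed=42)
# "+" marks a CS followed by the US, "-" a CS presented alone.
print("".join("+" if paired else "-" for paired in trials))
```

Setting `p_us=1.0` recovers continuous reinforcement, where every CS is followed by the US.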
Partial reinforcement: reinforcement schedules
responses are sometimes reinforced and sometimes not.
Slower initial learning, but greater resistance to extinction
As reinforcement does not follow every behaviour, it takes longer for the learner to detect that reward has stopped, so extinction is slower.
Fixed ratio
behaviour is reinforced after a specific number of responses. (e.g. giving a child a sweet after reading 5 pages of a book.)
Variable ratio
behaviour is reinforced after an average, but unpredictable, number of responses. (e.g. payoffs from slot machines and other games of chance)
Variable interval
behaviour is reinforced for the first response after an average, but unpredictable, amount of time has passed (e.g. periodically checking email)
Fixed interval
behaviour is reinforced for the first response after a specific amount of time has passed (e.g. receiving a monthly salary for work)
Fixed Interval (FI) Schedule
The first response after a designated amount of time is followed by reinforcement. FI 60 s: the first response once 60 seconds have elapsed is reinforced.
Produces a characteristic pattern of responding that is observable across species.
A post-reinforcement pause (PRP), followed by slow responding that accelerates to high rates toward the end of the interval: the "scallop" pattern (e.g. monthly salary).
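The FI rule can be sketched as a function over response times. This assumes the first interval starts at session start (time 0) and that the interval clock restarts at each reinforcement; `fixed_interval` is a hypothetical name, not a standard API.

```python
from typing import List


def fixed_interval(response_times: List[float], interval: float) -> List[float]:
    """Return the times of responses reinforced under an FI schedule."""
    reinforced = []
    # Assumption: the first interval starts at time 0 (session start).
    available_at = interval
    for t in sorted(response_times):
        if t >= available_at:
            reinforced.append(t)
            # Assumption: the interval clock restarts at each reinforcement.
            available_at = t + interval
        # Responses made before available_at earn nothing.
    return reinforced


print(fixed_interval([10, 30, 59, 61, 70, 119, 125], interval=60))  # -> [61, 125]
```

Responses early in the interval are never reinforced, which is consistent with the pause and late acceleration of the scallop.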
Variable Ratio (VR) Schedule
Responding is reinforced after a randomly determined number of responses has been emitted. VR15: on average 15 responses are required for reinforcement, but any single requirement could be anywhere between 1 and 29. The response rate on VR is typically faster than on FR (fixed ratio), and there is no PRP on a VR schedule. Response rates stay relatively constant over time.
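A minimal VR sketch. It assumes a new requirement is drawn after each reinforcer, uniformly between 1 and 2 × mean − 1 (1 to 29 for VR15), so the average is the schedule value; real experiments may use other distributions, and `VariableRatio` is a hypothetical class.

```python
import random
from typing import Optional


class VariableRatio:
    """Hypothetical VR schedule: reinforce after a random number of responses."""

    def __init__(self, mean: int, seed: Optional[int] = None):
        self.mean = mean
        self.rng = random.Random(seed)
        self.count = 0                     # responses since the last reinforcer
        self.requirement = self._draw()    # responses needed for the next reinforcer

    def _draw(self) -> int:
        # Assumption: uniform over 1..(2 * mean - 1), so the requirement averages `mean`.
        return self.rng.randint(1, 2 * self.mean - 1)

    def respond(self) -> bool:
        """Register one response; return True if it is reinforced."""
        self.count += 1
        if self.count >= self.requirement:
            self.count = 0
            self.requirement = self._draw()
            return True
        return False


vr = VariableRatio(15, seed=7)
presses = [vr.respond() for _ in range(100)]
print(sum(presses), "reinforcers in 100 presses")
```

Because the next requirement cannot be predicted, every response has a chance of paying off.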
Fixed ratio (FR) schedule
The number of responses required for reinforcement defines the schedule. Continuous reinforcement is technically FR1. The probability of reinforcement increases with each successive response. There is a brief pause in responding (post-reinforcement pause, PRP) after each reinforcement before responding begins again, giving a "stair-step" pattern.
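The FR rule is a simple counter: every `ratio`-th response is reinforced and the count resets. The function name `fixed_ratio` is hypothetical; the sketch only illustrates the counting rule described above, including FR1 as continuous reinforcement.

```python
from typing import List


def fixed_ratio(n_responses: int, ratio: int) -> List[int]:
    """Return the 1-based numbers of the responses that earn reinforcement."""
    reinforced = []
    count = 0  # responses since the last reinforcer
    for response in range(1, n_responses + 1):
        count += 1
        if count == ratio:
            reinforced.append(response)
            count = 0  # the requirement starts over after each reinforcer
    return reinforced


print(fixed_ratio(12, ratio=5))  # -> [5, 10]
print(fixed_ratio(3, ratio=1))   # FR1, continuous reinforcement -> [1, 2, 3]
```

Each response brings the next reinforcer one step closer, which is why the probability of reinforcement rises with successive responses.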
Variable Interval (VI) Schedule
The first response after a randomly determined amount of time is reinforced. VI 60 s: on average 60 seconds between reinforcements, but individual intervals differ from one another.
Relatively constant response rate with no PRP (except at unusually low rates of reinforcement). The most commonly used schedule in operant research, as it produces steady, predictable performance.
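A VI sketch that works like the FI rule but draws a fresh interval after each reinforcement. It assumes intervals are drawn uniformly between 0 and twice the mean and that the first interval starts at time 0; published VI schedules often use other interval lists, and `variable_interval` is a hypothetical name.

```python
import random
from typing import List, Optional


def variable_interval(response_times: List[float], mean_interval: float,
                      seed: Optional[int] = None) -> List[float]:
    """Return the times of responses reinforced under a VI schedule."""
    rng = random.Random(seed)

    def next_interval() -> float:
        # Assumption: uniform from 0 to 2 * mean_interval, so intervals
        # average mean_interval but differ from one another.
        return rng.uniform(0, 2 * mean_interval)

    reinforced = []
    available_at = next_interval()
    for t in sorted(response_times):
        if t >= available_at:
            reinforced.append(t)  # first response after the interval elapses
            available_at = t + next_interval()
    return reinforced


# One response per second for an hour on VI 60 s.
times = list(range(1, 3601))
print(len(variable_interval(times, mean_interval=60, seed=3)), "reinforcers")
```

Since reinforcers become available with time rather than with effort, a steady rate of responding collects almost all of them.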
Shaping
process of guiding behaviour to the desired outcome through the use of intermediate stages.