The heyday of behaviourism: operant learning Flashcards
What did Edward L. Thorndike study?
‘Human-like’ attributes in animals, explained in terms of reflexes and connections
What was Thorndike's view of human psychology?
> In humans, ‘mental life’ requires the assumption of internal events
-> that mediate the stimulus-response relationship
> Mediation = key to cognitive psychology
- cognitive processes are mediational
What is the behaviourism vs. cognitivism debate about?
> The nature of the mediating events between observable events (input, output)
> Whether they are necessary for a comprehensive model of the mind
What did Thorndike use as a measure of intelligence?
Is it contested?
Ability to learn as measure of intelligence
- historically challenged by later behaviourists
What are Thorndike’s puzzle boxes?
How do they work?
Puzzle box to study learning:
- hungry cat in a cage has to escape to get the food placed out of reach
-> how long does it take the cat to do it?
- at some point, the cat will push the lever that opens the cage door
- with repeated trials, animal learns to push the lever to get the food
What learning process did Thorndike identify with his puzzle boxes?
What is his subsequent learning theory?
Trial and error learning
-> Connectionism
What type of learning did Thorndike set out that differs from classical conditioning?
Operant / Contingency (S-R) learning:
- association of stimulus-response-outcome
(rather than simple pairing)
- when a specific response is made contingent on a specific stimulus being present
Was Thorndike’s operant learning in experiments with animals limited to simple behaviours?
No; Thorndike's experiments showed that animals could learn complex behaviours through the same operant learning
What did Thorndike emphasise, which represented a fundamental step in behaviourist thinking?
Importance of the Effect:
> Critical role of the consequences of a response for the organism and for its future behaviour
> Dissatisfaction -> less likely to repeat behaviour
> Satisfaction -> more likely to repeat behaviour
=> law of effect
What is Thorndike’s concept of 3-Term contingency?
- Situation (stimulus) - antecedent
- Response
- Effect - consequence
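A minimal sketch in Python (the names and the puzzle-box example are illustrative assumptions, not from the source) showing the three-term contingency as a simple record:

```python
# Hypothetical illustration of the three-term contingency as a record:
# antecedent situation, the response emitted, and the effect (consequence).
from collections import namedtuple

Contingency = namedtuple("Contingency", ["situation", "response", "effect"])

# Illustrative puzzle-box example: the effect is what, per the law of effect,
# makes the response more likely in that situation in the future.
escape = Contingency(
    situation="hungry cat confined in box",
    response="press lever",
    effect="door opens, food obtained",
)
print(escape.effect)
```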
What was BF Skinner's contribution to behaviourism?
Defined and codified behaviourism
- standardised the tools
- defined its language
Why was BF Skinner more than an experimental psychologist?
He applied principles of behaviourism:
- to child development
- inter-individual differences
- education
- the criminal justice system
- and, more widely, to shaping society and culture
What is Skinner’s categorisation system?
Describes types of consequences:
- increase of behaviour -> reinforcement
- decrease of behaviour -> punishment
What is Skinner’s meaning of positive and negative in his categorisation system?
> Positive (reinforcement/punishment) = add, present, provide something
> Negative (reinforcement/punishment) = remove, take away something
What is a positive reinforcement (Skinner)?
Stimulus is added/provided contingent on the behaviour
- leading to an increase of behaviour in the future
-> add to increase
What is a negative reinforcement (Skinner)?
Stimulus is removed contingent on the behaviour
- leading to an increase of behaviour in the future
-> remove to increase
What is a positive punishment (Skinner)?
Stimulus is added/provided contingent on the behaviour
- leading to a decrease of behaviour in the future
-> add to decrease
What is a negative punishment (Skinner)?
Stimulus is removed contingent on the behaviour
- leading to a decrease of behaviour in the future
-> remove to decrease
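The four cards above differ along just two dimensions; a minimal Python sketch (the function name is hypothetical) makes the 2x2 categorisation explicit:

```python
# Hypothetical sketch of Skinner's categorisation: the label depends only on
# whether a stimulus is added or removed, and whether the behaviour then
# increases or decreases in the future.
def classify_consequence(stimulus_change: str, behaviour_change: str) -> str:
    kind = "reinforcement" if behaviour_change == "increases" else "punishment"
    sign = "positive" if stimulus_change == "added" else "negative"
    return f"{sign} {kind}"

print(classify_consequence("added", "increases"))    # positive reinforcement: add to increase
print(classify_consequence("removed", "increases"))  # negative reinforcement: remove to increase
print(classify_consequence("added", "decreases"))    # positive punishment: add to decrease
print(classify_consequence("removed", "decreases"))  # negative punishment: remove to decrease
```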
Why do we have to be careful about how we understand and use terms in psychology (e.g. “reinforcement”, “punishment”)?
They have precise technical meanings that differ from their similar everyday usage
e.g. fining someone as retribution for a crime counts as punishment for behaviourists only if the fine leads to a reduction of the offending behaviour in the future
What are primary reinforcers?
Natural, unconditioned reinforcers (e.g. food, water)
- adding or taking away any of the primary reinforcers is a powerful determinant of learning and future behaviour
What does the power of primary reinforcers depend on?
Context dependent
- on the state of the organism at the time
e.g. same food will influence behaviour more if animal deprived/hungry vs. satiated
What are secondary reinforcers?
Conditioned reinforcers
- that have acquired reinforcing properties by association with another reinforcer (often primary)
e.g. ‘Little Albert’
Do primary and secondary reinforcers provide different results in animal behaviour studies?
They have similar properties, and show the same effects of deprivation and satiety
What is a Skinner box?
Skinner's version of the puzzle box: the operant chamber
- operant learning
- reduces behaviour to simple parameters: speed, intensity, duration
What are the common features of a Skinner box?
> Stimulus (e.g. light, speaker)
> Response (e.g. lever)
> Reinforcer or punisher (e.g. food, electric shock)
> Controller
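A minimal simulation sketch in Python (all names, probabilities, and the continuous-reinforcement rule are illustrative assumptions) of how the four components interact:

```python
# Hypothetical sketch of an operant chamber loop: a stimulus is presented,
# responses are detected, and the controller decides whether to deliver the
# reinforcer (here, continuous reinforcement: every response is reinforced).
import random

def run_chamber(n_steps: int = 10, p_response: float = 0.4) -> int:
    pellets = 0
    for step in range(n_steps):
        stimulus_on = True                                        # stimulus: e.g. light above the lever
        pressed = stimulus_on and random.random() < p_response    # response: lever press
        if pressed:                                               # controller: CRF rule
            pellets += 1                                          # reinforcer: food pellet delivered
            print(f"step {step}: lever press -> pellet")
    return pellets

run_chamber()
```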
What do reinforcement schedules refer to in behavioural psychology?
The contingency relationship between behaviour and reinforcer/punisher, in terms of how and when it is delivered
What are the two types of reinforcement schedules?
- Continuous reinforcement schedule (CRF): reinforcement occurs every time a response is made
- Intermittent (partial) reinforcement schedule: reinforcement occurs only after some responses
What are the parameters that define the main types of intermittent reinforcement schedules?
> Time/quantity: ratio or interval
> Certainty/predictability: fixed or variable
What are the 4 types of intermittent reinforcement schedules?
- Fixed ratio (FR): reinforcement after exactly every nth response
- Variable ratio (VR): reinforcement after every nth response on average
- Fixed interval (FI): reinforcement for the first response after exactly t minutes
- Variable interval (VI): reinforcement for the first response after t minutes on average
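A minimal Python sketch (the parameter names are assumptions) of the decision rule behind each of the four schedules, each answering "should this response be reinforced?":

```python
# Hypothetical decision rules for the four intermittent reinforcement schedules.
import random

def fixed_ratio(responses_since_last: int, n: int) -> bool:
    # FR-n: reinforce exactly every nth response
    return responses_since_last >= n

def variable_ratio(n_mean: float) -> bool:
    # VR-n: reinforce each response with probability 1/n, i.e. every nth response on average
    return random.random() < 1.0 / n_mean

def fixed_interval(time_since_last: float, t: float) -> bool:
    # FI-t: reinforce the first response made after t time units have elapsed
    return time_since_last >= t

def variable_interval(time_since_last: float, current_interval: float) -> bool:
    # VI-t: like FI, but current_interval is redrawn around a mean of t after each reinforcement
    return time_since_last >= current_interval
```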
What does a reinforcement schedule influence?
- Learning time
- Intensity and pattern of responding
- Persistence of learnt behaviour
What is the shaping process in operant learning?
Early in learning, it is usually necessary to have a continuous reinforcement schedule or one with a high probability of reinforcement
-> the initial trial and error response is reinforced and likely to be repeated
- once learned, the ratio can be slowly increased across trials so that reinforcement happens less and less while responding is maintained
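A minimal Python sketch (the numbers are illustrative assumptions) of thinning the ratio during shaping: start at continuous reinforcement and raise the response requirement gradually:

```python
# Hypothetical thinning of the reinforcement ratio across trial blocks:
# begin with CRF (ratio 1) and increase the responses required per reward.
def thinning_ratios(n_blocks: int = 5, step: int = 2) -> list:
    ratio = 1          # CRF: every response reinforced
    ratios = []
    for _ in range(n_blocks):
        ratios.append(ratio)
        ratio += step  # reinforcement becomes progressively rarer
    return ratios

print(thinning_ratios())  # -> [1, 3, 5, 7, 9]
```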
What is the characteristic of the behaviour during fixed ratio reinforcement schedules?
‘Staircase pattern’:
- bursts of responding: a steep part of the trace until reinforcement is obtained, at which point responding briefly stops (the trace shows a horizontal line)
- same pattern for all animals during fixed ratio schedules
What is the characteristic of the behaviour during fixed interval reinforcement schedules?
‘Scallop shaped’:
- once the animal has learned the interval, it tends to stop or slow its rate of responding after obtaining the predictably timed reward
- responding speeds up again as the end of the expected interval approaches
What characterises the behaviour during variable reinforcement schedules?
- Continuous behaviour
- The rate of responding slows as the interval between reinforcements becomes longer, or as the number of responses needed on average to get a reward becomes higher
During variable reinforcement schedules, why is the behaviour continuous, even when the animal has just obtained a reward?
Because of uncertainty
- it’s adaptive for the animal to keep responding just in case
During variable reinforcement schedules, why is the rate of responding slower when the interval between reinforcements is larger, or when the number of responses needed on average to get a reward is higher?
Adaptive logic:
- e.g. if the reward is food, animal adjusts the amount of energy it uses by varying the rate of responding
-> this ensures the energy gained exceeds the energy spent
(we don’t assume this process is conscious)
What happens when the response stops being reinforced in reinforcement schedules?
Extinction process
What is the pattern of extinction in fixed reinforcement schedules?
> We typically see a continuation of the behaviour rather than an immediate stop
- even though the non-delivery of the expected reward is more obvious than in variable schedules
> Pre-extinction burst
> Fixed schedules and variable ratio schedules have a similar extinction pattern
What is a pre-extinction burst during reinforcement schedules?
Rate of responding increases before eventually stopping
What is the pattern of extinction in variable reinforcement schedules?
Why?
> Behaviour continues for much longer than in fixed schedules
- because of the degree of uncertainty about when the next reward is expected
- the animal is less able to determine whether this is an unusually long pause or whether the behaviour is no longer being reinforced
> The longer the mean time between reinforcements, the longer the behaviour continues before it starts to extinguish
What is the intermittent/partial reinforcement extinction effect?
Intermittent (partial) reinforcement - on a fixed or variable schedule - is more resistant to extinction than when the animal has previously been continuously reinforced at every response
When the response stops being reinforced, why does the animal's behaviour extinguish?
It's not useful or adaptive to keep responding to a stimulus that no longer produces a reward (reinforcer)
After a response that had stopped being reinforced is reinforced once again, why is the animal's previous behaviour re-established much more quickly than it took to learn originally?
The learned association remained within the animal’s learned repertoire and was available to be used when the circumstances changed
- similar to Ebbinghaus' use of 'savings' scores, where participants had not forgotten the associations within the information and could recall them even after a month