Learning Part 2: Operant Conditioning Flashcards
Operant Conditioning
Goal-directed behaviour
Operant conditioning is concerned with how environmental stimuli shape complex goal-directed behaviours
Edward Thorndike
His experiments, conducted at the turn of the 20th century, paved the way for a behaviourist account of voluntary behaviour
He worked with different animals: e.g. chicks, cats and dogs
He wanted to find out whether animals use reasoning to solve problems
Famous for Thorndike’s puzzle box
Thorndike’s puzzle box
Thorndike’s puzzle box: a cat is placed inside a puzzle box and food is placed outside the box
Is the cat able to work out a mechanism to open the door of the box to obtain the food?
Results:
The cat learned by trial and error (and success): its first attempts were random, then it stumbled across the solution
Cats became faster on subsequent trials in the same puzzle box
Cats learn to associate a response with its rewarding consequence
Consequences shape behaviour: unsuccessful responses are gradually eliminated
The conclusion is that cats learn simple stimulus-response (S-R) associations rather than complex reasoning processes
Law of Effect
Responses followed by a satisfying state of affairs are strengthened and are more likely to occur again (rewards)
Responses followed by an annoying or unsatisfactory state of affairs are weakened and are unlikely to occur again (punishment)
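The two halves of the Law of Effect can be illustrated with a minimal sketch. The update rule, the step size of 0.2 and the response names are illustrative assumptions, not Thorndike's own formulation; the sketch only shows that rewarded responses gain strength while unrewarded ones lose it.

```python
def update_strength(strength, satisfying, step=0.2):
    """Return the new response strength (0..1) after one trial.

    The step size and the 0..1 scale are assumptions made for illustration.
    """
    if satisfying:
        # Satisfying state of affairs: strength moves towards 1.
        return strength + step * (1.0 - strength)
    # Annoying state of affairs: strength moves towards 0.
    return strength - step * strength


# Hypothetical puzzle-box responses that start out equally strong.
strengths = {"press_lever": 0.5, "scratch_wall": 0.5}
for _ in range(5):
    strengths["press_lever"] = update_strength(strengths["press_lever"], True)
    strengths["scratch_wall"] = update_strength(strengths["scratch_wall"], False)

print(strengths)
```

After five trials the rewarded response is stronger than at the start and the unrewarded response is weaker, which mirrors the gradual elimination of unsuccessful responses.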
B.F. Skinner (1904-1990)
He was influenced by Thorndike’s work and described voluntary human behaviour using basic S-R associations, without resorting to mentalistic concepts
“Behaviour operates on the environment to generate consequences.”
Organisms learn which behaviours to emit in order to earn rewards or avoid punishments
Operant describes any active (voluntary) behaviour that is produced in order to generate consequences, or is instrumental in generating consequences
Essentially everyone is trying to gain something desired or avoid something unpleasant
B.F. Skinner (consequences shape behaviour)
Consequences shape behaviour: unsuccessful responses are gradually eliminated
Reinforcement:
Reinforcement occurs when the consequences of an action increase the likelihood of the action being repeated
Reinforcement increases or strengthens the occurrence of a behaviour in the future
Positive reinforcement +
Stimulus or event which, when presented as a consequence of a behaviour, increases the likelihood of that behaviour recurring in the future
Negative reinforcement -
Stimulus or event which, when reduced or terminated, increases the likelihood that an associated behaviour will recur
Continuous reinforcement
Each response is reinforced
Partial reinforcement
Reinforcement is given only for some correct responses
Generates behaviour that persists longer: learners keep “testing” for a reward
Fixed ratio schedule
Rewarded after a fixed number of correct responses
High rate of responding
Faster responses yield quicker payoffs (“bursts”)
e.g. paid for producing a specific number of items
Variable ratio schedule
Rewarded after a varying number of correct responses that centres on an average
High rate of responding: persistent responding
People/animals hope that the next response will bring a reward
e.g. gambling
Fixed interval schedule
Reinforcement for the first correct response after a fixed time period
Flurry of responding right before a reward is due
e.g. test scheduled every four weeks
Variable interval schedule
Rewarded for the first correct response after a varying time period that centres on an average
Less predictable
Slow but steady pattern of responding (“testing”)
e.g. surprise quizzes
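The four partial reinforcement schedules can be sketched as simple decision rules. This is a minimal sketch under stated assumptions: the class names, the `respond` method and the uniform distributions used for the variable schedules are all hypothetical choices made for illustration, and time `t` is just a number supplied by the caller.

```python
import random


class FixedRatio:
    """Reinforce every n-th correct response."""

    def __init__(self, n):
        self.n = n
        self.count = 0

    def respond(self):
        self.count += 1
        if self.count == self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """Reinforce after a varying number of responses that averages mean_n."""

    def __init__(self, mean_n, rng):
        self.mean_n = mean_n
        self.rng = rng
        self.count = 0
        self.required = self._draw()

    def _draw(self):
        # Assumption: uniform on 1 .. 2*mean_n - 1, whose average is mean_n.
        return self.rng.randint(1, 2 * self.mean_n - 1)

    def respond(self):
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._draw()
            return True
        return False


class FixedInterval:
    """Reinforce the first response made once `interval` has elapsed."""

    def __init__(self, interval):
        self.interval = interval
        self.last_reward = 0.0

    def respond(self, t):
        if t - self.last_reward >= self.interval:
            self.last_reward = t
            return True
        return False


class VariableInterval:
    """Reinforce the first response after a varying wait averaging mean_interval."""

    def __init__(self, mean_interval, rng):
        self.mean_interval = mean_interval
        self.rng = rng
        self.last_reward = 0.0
        self.required = self._draw()

    def _draw(self):
        # Assumption: uniform on 0 .. 2*mean_interval, whose average is mean_interval.
        return self.rng.uniform(0, 2 * self.mean_interval)

    def respond(self, t):
        if t - self.last_reward >= self.required:
            self.last_reward = t
            self.required = self._draw()
            return True
        return False


fr = FixedRatio(3)
print([fr.respond() for _ in range(6)])          # [False, False, True, False, False, True]

fi = FixedInterval(10)
print([fi.respond(t) for t in (5, 10, 12, 20)])  # [False, True, False, True]
```

On the ratio schedules the reward depends only on how many responses are made, so responding faster pays off sooner; on the interval schedules extra responses before the time is up earn nothing.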
Shaping
Learning more complex behaviours by reinforcing successive approximations to the desired behaviour:
Reinforce a high-frequency component of the desired response
Drop reinforcement – behaviour becomes more variable again
Await a response that is still close to the desired response – then reintroduce reinforcement
Keep cycling: closer approximations are achieved
Shaping of behaviour which is not in the animal’s natural repertoire
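The idea of reinforcing successive approximations can be sketched as a small simulation. This is a simplified illustration, not a model from the source: behaviour is reduced to a single number, the `shape` function and its parameters are hypothetical, and a response is reinforced only when it comes closer to the target than any earlier reinforced response.

```python
import random


def shape(target, start, trials, rng, spread=1.0):
    """Return the typical response after shaping towards `target`.

    Each trial the learner varies around its current typical response.
    Only responses closer to the target than before are reinforced,
    and a reinforced response becomes the new typical response.
    """
    current = start
    best_distance = abs(target - start)
    for _ in range(trials):
        # Behaviour varies around what was last reinforced.
        response = current + rng.uniform(-spread, spread)
        distance = abs(target - response)
        if distance < best_distance:
            # Closer approximation: reinforce it and tighten the criterion.
            current = response
            best_distance = distance
    return current


final = shape(target=10.0, start=0.0, trials=200, rng=random.Random(1))
print(abs(10.0 - final))
```

Because the criterion only ever tightens, the typical response can never drift further from the target, which is the "keep cycling" step in compressed form.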
Extinction
Extinction occurs when reinforcement is withheld
It is not an immediate process; there is often a brief increase in responding first
Partially reinforced responses are harder to extinguish
Punishment
The use of aversive consequences to reduce undesirable behaviour
Any event which decreases the likelihood that ongoing behaviour will recur
Positive punishment +
Behaviour is followed by the presentation of an aversive stimulus
Stimulus is added to situation
e.g. electric shock
Negative punishment -
Behaviour is followed by withdrawal of rewarding stimulus
Stimulus is taken away
e.g. removal of toys
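Reinforcement and punishment, each in a positive and a negative form, make up a two-by-two grid: is a stimulus added or removed, and does the behaviour become more or less likely? The lookup below is a minimal sketch of that grid; the function name and the string labels are hypothetical.

```python
def classify_consequence(stimulus_change, behaviour_change):
    """Name the operant procedure for a consequence.

    stimulus_change: "added" or "removed"
    behaviour_change: "increases" or "decreases"
    """
    sign = {"added": "positive", "removed": "negative"}[stimulus_change]
    kind = {"increases": "reinforcement", "decreases": "punishment"}[behaviour_change]
    return f"{sign} {kind}"


print(classify_consequence("added", "decreases"))    # positive punishment (e.g. electric shock)
print(classify_consequence("removed", "decreases"))  # negative punishment (e.g. removal of toys)
print(classify_consequence("removed", "increases"))  # negative reinforcement
```

"Positive" and "negative" refer only to whether a stimulus is added or taken away, not to whether the outcome is pleasant.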
Problems associated with punishment
Punishment is more effective when it is swift (no delay) and consistent (not just administered sometimes)
It is less effective than reinforcement because no desired behaviour is established
It does not cause long-term behaviour change: it only suppresses the behaviour
When threat of punishment is removed, the behaviour returns (e.g. speed cameras)
It produces negative feelings and does not promote new learning
It may indeed teach the recipient to use punishment towards others
It is useful if behaviour is dangerous and must be changed/suppressed quickly
Operant Conditioning: Children
Reinforce alternative behaviour that is incompatible with the undesirable behaviour (e.g. respond to normal voice only, not to screaming)
Identify the crucial reinforcer (maintaining the behaviour) and stop reinforcing the problem behaviour (extinction)
Reinforce the non-occurrence of the undesirable behaviour
Remove the opportunity for positive reinforcement
Use strongly reinforcing stimuli, but use variety (e.g. praise, privileges)
Immediate reinforcement after the preferred behaviour
Start with reinforcing all the time, switch to intermittent
Encourage self-reinforcement through pride and a sense of self-control
Martin Seligman (Learned Helplessness)
He investigated the effects of exposure to uncontrollable shock on escape/avoidance learning in dogs
1/3 of dogs exposed to unavoidable shock failed to learn to avoid or escape from an unpleasant or aversive stimulus
First phase: Classical Conditioning
- shock paired with light
Second phase: Operant Conditioning
- learn to jump to the other side of the box when the light is switched on
Basic Principles of Learned Helplessness
Learned helplessness might explain behaviour after abuse and in depression
When the traumatic event first occurs it causes a heightened state of emotionality, which has been called “fear”
Fear continues until the subject learns that he can or cannot control the trauma
“If subject learns that he cannot control the traumatic event, fear decreases and is replaced with depression.” (Seligman, 1979)