Learning Part 2: Operant Conditioning Flashcards
Operant Conditioning
Goal-directed behaviour
Operant conditioning is concerned with how environmental stimuli shape complex goal-directed behaviours
Edward Thorndike
His experiments, conducted at the turn of the 20th century, paved the way for a behaviourist account of voluntary behaviour
He worked with different animals: e.g. chicks, cats and dogs
He wanted to find out whether animals use reasoning to solve problems
Famous for Thorndike’s puzzle box
Thorndike’s puzzle box
Thorndike’s puzzle box: a cat is placed inside a puzzle box and food is placed outside the box
Is the cat able to work out a mechanism to open the door of the box to obtain the food?
Results:
The cat learned by trial and error (and success): its first attempts were random, then it stumbled across the solution
Cats became faster on subsequent trials in the same puzzle box
Cats learn to associate a response with its rewarding consequence
Consequences shape behaviour: unsuccessful responses are gradually eliminated
The conclusion is that cats learn simple stimulus-response (S-R) associations rather than complex reasoning processes
Law of Effect
Responses followed by a satisfying state of affairs are strengthened and are more likely to occur again (rewards)
Responses followed by an annoying or unsatisfactory state of affairs are weakened and are less likely to occur again (punishment)
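The two halves of the Law of Effect can be sketched as a simple update rule. This is only an illustrative toy model, not Thorndike's own formulation: the additive update, the step size, the minimum strength, and the response names (`pull_loop`, `scratch_wall`, `meow`) are all assumptions made for the example.

```python
def apply_law_of_effect(strengths, response, satisfying, step=0.2, floor=0.05):
    """Return new response strengths after one consequence.

    Assumption: strengthening/weakening is modelled as adding or
    subtracting a fixed step; the law itself does not specify this.
    """
    updated = dict(strengths)
    if satisfying:
        # Satisfying state of affairs: the response is strengthened.
        updated[response] += step
    else:
        # Annoying state of affairs: the response is weakened,
        # but never below a small positive floor.
        updated[response] = max(floor, updated[response] - step)
    return updated


def response_probabilities(strengths):
    """Convert strengths into the relative likelihood of each response."""
    total = sum(strengths.values())
    return {name: value / total for name, value in strengths.items()}


# Hypothetical responses a cat might try in the puzzle box.
strengths = {"pull_loop": 1.0, "scratch_wall": 1.0, "meow": 1.0}
strengths = apply_law_of_effect(strengths, "pull_loop", satisfying=True)
strengths = apply_law_of_effect(strengths, "scratch_wall", satisfying=False)
print(response_probabilities(strengths))
```

In this sketch the rewarded response becomes relatively more likely and the unsuccessful one less likely, which mirrors how unsuccessful responses are gradually eliminated.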
B.F. Skinner (1904-1990)
He was influenced by Thorndike’s work and described voluntary human behaviour using basic S-R associations, without resorting to mentalistic concepts
“Behaviour operates on the environment to generate consequences.”
Organisms learn which behaviours are emitted to earn rewards or avoid punishments
Operant describes any active (voluntary) behaviour that is produced in order to generate consequences, or is instrumental in generating consequences
Essentially everyone is trying to gain something desired or avoid something unpleasant
B.F. Skinner (consequences shape behaviour)
Consequences shape behaviour: unsuccessful responses are gradually eliminated
Reinforcement:
Reinforcement occurs when the consequences of an action increase the likelihood of the action being repeated
Reinforcement increases or strengthens the occurrence of a behaviour in the future
Positive reinforcement +
Stimulus or event which, when presented as a consequence of a behaviour, increases the likelihood of that behaviour recurring in the future
Negative reinforcement -
Stimulus or event which, when reduced or terminated, increases the likelihood that an associated behaviour will recur
Continuous reinforcement
Each response is reinforced
Partial reinforcement
Reinforcement is given only for some correct responses
Generates behaviour that persists longer: learners keep “testing” for a reward
Fixed ratio schedule
Rewarded after a fixed number of correct responses
High rate of responding
Faster responses yield quicker payoffs (“bursts”)
e.g. paid for producing a specific number of items
Variable ratio schedule
Rewarded after an unpredictable number of correct responses that varies around an average
High rate of responding: persistent responding
People/animals hope that the next response will bring reward
e.g. gambling
Fixed interval schedule
Reinforcement for first correct response after a fixed time period
Flurry of responding right before a reward is due
e.g. test scheduled every four weeks
Variable interval schedule
Rewarded for the first correct response after an unpredictable time period that varies around an average
Less predictable
Slow but steady pattern of responding (“testing”)
e.g. surprise quizzes
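The four partial reinforcement schedules differ only in the rule that decides whether a given correct response is rewarded. The sketch below is one possible way to express those rules in Python; the function names, the uniform distributions used for the variable schedules, and the time units are assumptions chosen for illustration, not part of the original material.

```python
import random


def fixed_ratio(n):
    """Reward every n-th correct response."""
    count = 0

    def respond(_time):
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True
        return False

    return respond


def variable_ratio(mean_n, rng):
    """Reward after an unpredictable number of responses averaging mean_n.

    Assumption: the required count is drawn uniformly from 1..2*mean_n-1.
    """
    count = 0
    target = rng.randint(1, 2 * mean_n - 1)

    def respond(_time):
        nonlocal count, target
        count += 1
        if count >= target:
            count = 0
            target = rng.randint(1, 2 * mean_n - 1)
            return True
        return False

    return respond


def fixed_interval(period):
    """Reward the first response made once `period` time units have passed."""
    last = 0.0

    def respond(time):
        nonlocal last
        if time - last >= period:
            last = time
            return True
        return False

    return respond


def variable_interval(mean_period, rng):
    """Reward the first response after an unpredictable wait averaging mean_period.

    Assumption: the wait is drawn uniformly from 0..2*mean_period.
    """
    last = 0.0
    wait = rng.uniform(0, 2 * mean_period)

    def respond(time):
        nonlocal last, wait
        if time - last >= wait:
            last = time
            wait = rng.uniform(0, 2 * mean_period)
            return True
        return False

    return respond


# One response per time unit, at times 1..6 and 1..10.
fr = fixed_ratio(3)
print([fr(t) for t in range(1, 7)])   # [False, False, True, False, False, True]
fi = fixed_interval(4)
print([t for t in range(1, 11) if fi(t)])   # [4, 8]
```

Note that on the ratio schedules the time argument is ignored: responding faster brings the reward sooner, which fits the high response rates described above. On the interval schedules extra responses before the interval has elapsed earn nothing, so only the timing of the response matters.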