Lecture 22 - Instrumental conditioning Flashcards
Instrumental conditioning is also called
operant conditioning
Thorndike
In about 1900, he conducted research examining whether animals could solve problems or “think”.
Thorndike designed a variety of “puzzle boxes” from which the cats had to learn to escape.
He measured the behaviours the cats used to escape
Didn’t think that cats could understand the effects of their behaviour
Used graphs (learning curves) to track the rate of learning
A well-rehearsed cat got out faster; if there was a reward such as food, the learning became “stamped in” to its behavioural repertoire
So, had the cat “understood” the solution to the problem?
Thorndike argued “No”, because the learning curves show no sudden “insightful” drop.
Behaviour slowly drifted to the solution of the problem
Law of effect
“Of several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected with the situation, so that, when it recurs, they will be more likely to recur.”
In other words, positive consequences increased the likelihood or probability of a response (rather than a “reflexive” relation).
Positive consequences like food
More likely to occur
Not a guaranteed behaviour like reflexive reactions
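The law of effect can be sketched as a toy calculation. This is only an illustration under stated assumptions, not Thorndike's own model: the response names, the starting strengths and the rule "add 1.0 to a response's strength each time it is followed by satisfaction" are all hypothetical.

```python
def response_probabilities(strengths):
    """Turn response strengths into probabilities that sum to 1."""
    total = sum(strengths.values())
    return {response: s / total for response, s in strengths.items()}


def stamp_in(strengths, response, satisfaction=1.0):
    """Return a copy of strengths with the satisfying response strengthened."""
    updated = dict(strengths)
    updated[response] += satisfaction
    return updated


# Hypothetical repertoire of a cat in a puzzle box: all responses start equal.
strengths = {"pull_loop": 1.0, "scratch_wall": 1.0, "meow": 1.0}

# Track the probability of the response that is followed by release and food.
curve = [response_probabilities(strengths)["pull_loop"]]
for _ in range(5):
    strengths = stamp_in(strengths, "pull_loop")
    curve.append(response_probabilities(strengths)["pull_loop"])

print([round(p, 2) for p in curve])
# [0.33, 0.5, 0.6, 0.67, 0.71, 0.75]
```

In this sketch the probability rises gradually rather than jumping, and it never reaches 1, which matches the idea that the response becomes more likely but is not guaranteed.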
Punishment is seen as the opposite to
reinforcement
“those which are accompanied or closely followed by discomfort to the animal will, other things being equal, have their connections with the situation weakened, so that, when it recurs, they will be less likely to recur.”
Behaviours are “stamped out” if followed by negative consequences.
e.g. discomfort
Therefore less likely to occur again
In Thorndike’s puzzle box, the behaviours followed by release were steadily strengthened, while behaviours unrelated to release faded with time.
Shaping of behaviour with interaction with the environment
The environment selects the “fittest” behaviours, in the same way it selects the “fittest” individuals of a species.
Behaviours in the repertoire followed by positive consequences become more frequent; those followed by negative consequences become less frequent
Learnt behaviours are shaped by the law of effect (e.g. Peachy running to the kitchen when her food drawer opens up)
e.g. bears
Bears develop their own fishing style through experience
Cubs and adult bears differ because the adults are well rehearsed in this activity
Classical vs instrumental conditioning
Classical conditioning is a relation between two stimuli (CS and US). The CS elicits the CR.
Instrumental conditioning concerns the probability or likelihood of a response changing as a function of its consequences. The subject emits the response in order to produce a reward.
Skinner
One of the most famous psychologists. A central figure in the area of psychology known as Behaviourism.
This movement was a reaction against introspectionism towards more objective measurement in psychology.
Before this, there was no objective database; measurement was mostly subjective
Skinner did not favour punishment; he thought that positive reinforcement would be more effective
Experiment (well known)
Reared his infant daughter in an “air crib”. This filtered and controlled the air supply, and a roll of paper replaced nappies.
Skinner claimed this convenient apparatus allowed more time for social interaction, while critics called it dehumanising.
The story that she went crazy as a result is false
“Walden Two” - behavioural utopia.
“Verbal Behaviour”
WWII work - Project Pigeon.
Project Orcon (“organic control”)
Pigeons were taught to pilot missiles and steer them accurately towards a ship
The missile responded to the majority: the direction chosen by 2 out of the 3 pigeons steering it
This technique was never used in actual combat
Pigeons were taught to peck a dot (with food reinforcement); the dot was then placed on a silhouette of a boat so that pecking steered the missile
Instrumentally conditioned animals are used in
the US Navy and in border control (e.g. dogs)
Operant
The operant is the response defined in terms of its environmental effect.
Varying response with the same overall effect
Behaviour = operant
Skinner’s Version of the Law of Effect.
When a response is followed by a reinforcer, the strength of the response increases. When a response is followed by a punisher, the strength of the response decreases.
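Skinner's version of the law can be sketched as a single update function. This is a minimal sketch under an assumed simple linear update rule chosen for illustration, not Skinner's own formulation: the 0-to-1 strength scale, the `rate` parameter and the function name are hypothetical.

```python
def update_strength(strength, consequence, rate=0.5):
    """Return the new response strength (0 to 1) after one consequence.

    A reinforcer moves the strength part of the way towards 1;
    a punisher moves it part of the way towards 0;
    any other consequence leaves it unchanged.
    """
    if consequence == "reinforcer":
        return strength + rate * (1.0 - strength)
    if consequence == "punisher":
        return strength - rate * strength
    return strength


start = 0.5
after_reinforcer = update_strength(start, "reinforcer")
after_punisher = update_strength(start, "punisher")
print(after_reinforcer, after_punisher)
# 0.75 0.25
```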
Acquisition and operant conditioning
Behaviour shaped by successive approximations.
Start with a broad behavioural category and reward any behaviour within it, then refine the category and reinforce the more specific response that you want
Training a rat to press a lever - start with a broad response criterion and progressively narrow it.
The world is a “trainer”. Positive and negative consequences of actions constantly shaping behaviour repertoires.
E.g. learning to write as a kid
In Year 1, anything that looks like a circle with a stick is rewarded as the letter “a”, but by Year 4 a level of accuracy is required, so only well-formed letters are rewarded
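The handwriting example (lenient in Year 1, strict by Year 4) can be sketched as a reinforcement criterion that tightens in stages. This is only a sketch: the accuracy scores (0 to 100) and the thresholds are made-up numbers, and the function names are hypothetical.

```python
def shaping_criteria(start, target, steps):
    """Evenly spaced criteria, from a lenient start to a strict target."""
    return [start + (target - start) * i // (steps - 1) for i in range(steps)]


def is_reinforced(accuracy, criterion):
    """A response is rewarded only if it meets the current criterion."""
    return accuracy >= criterion


# Hypothetical criteria for Years 1 to 4, as minimum accuracy out of 100.
criteria = shaping_criteria(start=20, target=80, steps=4)
print(criteria)
# [20, 40, 60, 80]

rough_letter_a = 30  # a circle with a stick: roughly "a"-shaped
print(is_reinforced(rough_letter_a, criteria[0]))   # Year 1: True
print(is_reinforced(rough_letter_a, criteria[-1]))  # Year 4: False
```

The same rough response is rewarded under the broad early criterion but not under the narrow later one, which is the successive-approximation idea.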
Reinforcement …
Reinforcement increases the likelihood of behaviour. (About increasing rate/probability/likelihood of behaviour)
Positive reinforcement
Adding a stimulus or event contingent upon a response increases that behaviour.
Lab: Provide food to a food-deprived rat for lever-pressing.
Life: Child receives pocket-money for doing “chores”. Affection of partner for “kind” act. Henry the cat “nuzzles” Jessica.
Negative reinforcement
Removing a stimulus or event contingent upon a response increases that behaviour. (The aversive stimulus is decreased or removed; the behaviour still increases.)
Lab: A rat presses a lever to terminate (escape - ends an aversive stimulus that is already present) or prevent (avoidance - stops it happening in the first place) an electric shock through the floor.
Life: Child does homework to avoid detention (or corporal punishment) at school. Don’t stay out all night to avoid partner’s wrath.
Negative reinforcement example
Releasing a prisoner from prison early for good behaviour
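The definitions of positive and negative reinforcement above can be summarised as a small lookup. This is a hedged sketch: the function name and the string labels are hypothetical, and it covers only the cases defined in these cards (a stimulus added or removed, and behaviour that increases or decreases).

```python
def classify(stimulus, behaviour):
    """Name the procedure from what happens to the stimulus and the behaviour.

    stimulus: "added" or "removed" (contingent upon the response)
    behaviour: "increases" or "decreases"
    """
    if stimulus not in ("added", "removed"):
        raise ValueError("stimulus must be 'added' or 'removed'")
    if behaviour not in ("increases", "decreases"):
        raise ValueError("behaviour must be 'increases' or 'decreases'")
    if behaviour == "increases":
        if stimulus == "added":
            return "positive reinforcement"
        return "negative reinforcement"
    # Any consequence that decreases the behaviour counts as punishment.
    return "punishment"


# Food for lever-pressing: a stimulus is added and pressing increases.
print(classify("added", "increases"))    # positive reinforcement
# Lever press ends a shock: a stimulus is removed and pressing increases.
print(classify("removed", "increases"))  # negative reinforcement
```

Note that "negative" refers to removing a stimulus, not to a decrease in behaviour: both kinds of reinforcement increase the response.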