Instrumental Conditioning Flashcards
Thorndike’s puzzle box
the cats showed a gradual decline in the time it took to escape across repeated trials;
this slow, incremental improvement is not the pattern that would be expected if a human were placed repeatedly in the same situation (a human would show a sudden drop once the solution was understood)
Law of effect
a response followed by a satisfying effect is strengthened and likely to occur again in that situation (stamped in);
a response followed by an unsatisfying effect is weakened and less likely to occur again in that situation (stamped out)
Instrumental conditioning
forming an association between a stimulus and a behavioural response; learning the contingency between a behaviour and its consequence
Reinforcers
primary reinforcers = have intrinsic value (food, water, a mate) and can reinforce instrumental conditioning directly;
secondary reinforcers = acquire their value through previous learning (e.g., money), typically by being paired with primary reinforcers through classical conditioning
Operant chamber
a chamber with a lever or other mechanism by which an animal can respond to produce a reinforcer such as a food pellet; Skinner's refinement of Thorndike's puzzle box
Discriminative stimuli
positive (SD) = signals that the response-reinforcer relationship is in effect (valid);
negative (S-delta) = signals that the response-reinforcer relationship is not in effect (invalid)
Differences between CS and SD
CS = automatically elicits a response (classical conditioning); SD = sets the occasion for a response (instrumental conditioning)
Acquisition
the phase of training during which the organism learns the contingency between a response and its consequences
Reward training
occurs when the arrival of an appetitive stimulus following a response increases the probability that the response will occur again
Escape training
occurs when the removal of an aversive stimulus follows a response and leads to an increase in the probability that the response will occur again
Punishment training
occurs when the arrival of an aversive stimulus follows a response, decreasing the likelihood that the response will occur again
Omission training
occurs when a response leads to the removal of an appetitive stimulus, which decreases the probability of the response happening again
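
Taken together, the four training procedures above form a 2 x 2 grid: the consequence is either appetitive or aversive, and the response either produces or removes it. A minimal sketch of that grid, with illustrative labels and a hypothetical function name that are not drawn from any particular textbook:

```python
# A minimal sketch of the four instrumental contingencies as a 2 x 2 grid.
# "appetitive"/"aversive" and "presented"/"removed" are illustrative labels.

def training_type(stimulus: str, change: str) -> str:
    """Classify a contingency by stimulus valence and whether the
    response produces or removes that stimulus."""
    table = {
        ("appetitive", "presented"): "reward training (response increases)",
        ("aversive", "removed"): "escape training (response increases)",
        ("aversive", "presented"): "punishment training (response decreases)",
        ("appetitive", "removed"): "omission training (response decreases)",
    }
    return table[(stimulus, change)]

if __name__ == "__main__":
    print(training_type("appetitive", "presented"))  # reward training
    print(training_type("aversive", "removed"))      # escape training
```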
Immediate vs delayed consequences
reinforcement and punishment are most effective when the consequence immediately follows the target behaviour;
allows an organism to accurately associate the correct behaviour with the reinforcer
Autoshaping
learning that occurs without direct guidance or explicit training (e.g., pigeons come to peck a lit key that reliably precedes food)
Shaping
used for more complex behaviours; the target behaviour is broken down into successive approximations, each of which is reinforced; reduces acquisition time
Chaining
a sequence of responses is developed to build even more complex behaviours; each response is reinforced with the opportunity to perform the next response in the sequence
Contrast effects
negative contrast = after shifting from a high reward to a low reward, the organism responds at a slower rate;
positive contrast = after shifting from a low reward to a high reward, the organism responds at a faster rate
Overjustification effect
when an external reward is introduced, a task that was previously performed for its intrinsic value comes to be viewed as having extrinsic value, undermining intrinsic motivation
Continuous reinforcement
a reinforcer follows every correct response
Partial reinforcement
a reinforcer follows only some of the responses; more resistant to extinction than continuous reinforcement
Fixed ratio
a reinforcer is delivered after a fixed number of responses; following reinforcement there is a post-reinforcement pause, a stop in responding before it starts up again
Ratio strain
as the required number of responses increases, the post-reinforcement pause becomes longer; can lead to a breakpoint at which responding stops altogether
Variable ratio
a reinforcer is delivered after a varying number of responses centred on an average; produces high, steady rates of responding without a post-reinforcement pause
Fixed interval
a reinforcer is delivered for the first response made after a fixed amount of time has elapsed; FI scallop = responses are produced at a low rate early in the interval, with the response rate gradually increasing as the end of the interval approaches
Variable interval
a reinforcer is delivered for the first response made after a varying amount of time has elapsed; results in a steady rate of responding
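
The schedules above differ only in the rule that decides when a response earns the reinforcer: the criterion is either a count of responses (ratio) or elapsed time (interval), and it is either fixed or varies around a mean. A minimal simulation sketch of those rules, assuming hypothetical class names and parameters (FixedRatio, interval_s, etc.) rather than any standard library API:

```python
# A minimal simulation sketch of the four partial-reinforcement schedules.
# Class and parameter names are illustrative only.
import random


class FixedRatio:
    """FR-n: reinforce every n-th response."""
    def __init__(self, n):
        self.n = n
        self.count = 0

    def respond(self, now):
        self.count += 1
        return self.count % self.n == 0


class VariableRatio:
    """VR-n: reinforce after a varying number of responses averaging n."""
    def __init__(self, mean_n):
        self.mean_n = mean_n
        self.next_at = random.randint(1, 2 * mean_n - 1)
        self.count = 0

    def respond(self, now):
        self.count += 1
        if self.count >= self.next_at:
            self.count = 0
            self.next_at = random.randint(1, 2 * self.mean_n - 1)
            return True
        return False


class FixedInterval:
    """FI-t: reinforce the first response made after t seconds have elapsed."""
    def __init__(self, interval_s):
        self.interval_s = interval_s
        self.available_at = interval_s

    def respond(self, now):
        if now >= self.available_at:
            self.available_at = now + self.interval_s
            return True
        return False


class VariableInterval:
    """VI-t: like FI, but the interval varies around a mean of t seconds."""
    def __init__(self, mean_interval_s):
        self.mean = mean_interval_s
        self.available_at = random.uniform(0, 2 * self.mean)

    def respond(self, now):
        if now >= self.available_at:
            self.available_at = now + random.uniform(0, 2 * self.mean)
            return True
        return False


if __name__ == "__main__":
    fr = FixedRatio(5)
    # Responses 5 and 10 earn the reinforcer; the others do not.
    print([fr.respond(now=i) for i in range(1, 11)])
```

Because ratio schedules pay off in proportion to output while interval schedules require only one response once the time criterion is met, ratio schedules tend to support the higher response rates described above.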
Associative learning
in both classical and instrumental conditioning, learning occurs as a result of direct experience
Mirror neurons
a neuron that responds in the same way when the animal performs an action as it does when the animal observes someone else perform that action or imagines performing it