Mazur Chapter 5: Basic Principles of Operant Conditioning Flashcards
Thorndike
Thorndike was the first researcher to systematically study how nonreflexive behaviors can be modified as a result of experience.
Thorndike used a small chamber he called a puzzle box to study operant conditioning. The puzzle boxes ranged in difficulty from simple to complex.
Thorndike believed that organisms’ initial correct responses to leave the chamber were accidental
The gradual improvement over trials strengthened the S – R connection
He formulated the Law of Effect to account for this strengthening of the S – R association.
Law of Effect
Responses that are accompanied by or closely followed by satisfaction to the organism become more strongly connected to the situation, making it more likely that the response will recur
Responses that are accompanied by or closely followed by discomfort to the organism have their connection to the situation weakened, making it less likely that the response will recur
The greater the satisfaction or discomfort, the greater the strengthening or weakening of the association, respectively.
The “satisfying state of affairs” that Thorndike referred to has been replaced by the term “reinforcer.”
Guthrie and Horton: Evidence for a Mechanical Strengthening Process
Guthrie and Horton were two researchers who followed Thorndike’s experimental paradigm.
The learning that took place in the puzzle box involved the strengthening of whatever behavior happened to be followed by escape and food
After their cats mastered the task (i.e., getting out of the box), there was relatively little variability from trial to trial for a given cat, but there was considerable variability from one cat to another
stop-action principle
Brown and Herrnstein (1975) used Guthrie and Horton’s results to add a principle to the Law of Effect, which they called the stop-action principle:
The occurrence of the reinforcer (i.e., escape) serves to stop the organism’s ongoing behavior and to strengthen the association between the situation (the puzzle box) and those precise behaviors that were occurring at the moment of reinforcement
The specific bodily position and muscle movements occurring at the time of reinforcement will have a higher probability of occurring on the next trial
The more those particular behaviors yield the reinforcement on subsequent trials, the stronger the S – R connection will be.
Skinner’s “superstition experiment”
Provided a strong case for the power of accidental reinforcement
Skinner observed that pigeons would repeatedly engage in whatever behavior (e.g., head tossing) happened to be occurring when the reinforcer was delivered. If the first reinforcer occurred immediately after a pigeon had tossed its head upward, that behavior would be more likely to occur in the future.
Skinner believed that many of the superstitious behaviors people engaged in were produced by the same mechanism
Superstitious behaviors frequently arise in situations where an individual has no control over the outcome, such as gambling
They also occur often in sports, sometimes without the athlete’s awareness.
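Skinner's accidental-reinforcement mechanism can be sketched as a toy simulation. Everything here (the behavior names, the 15-tick food schedule, the weight increments) is an illustrative assumption, not from the chapter; the point is only that response-independent food still strengthens whatever behavior happens to coincide with its delivery:

```python
import random

# Toy simulation of accidental ("superstitious") reinforcement: food arrives on
# a fixed schedule regardless of behavior, yet whatever behavior happens to be
# occurring at the moment of delivery is strengthened anyway.

random.seed(42)
behaviors = ["head tossing", "turning", "pecking wall"]  # hypothetical repertoire
weights = {b: 1.0 for b in behaviors}                    # initial response strengths

for tick in range(1, 301):
    # The pigeon emits behaviors in proportion to their current strength.
    current = random.choices(behaviors, weights=[weights[b] for b in behaviors])[0]
    if tick % 15 == 0:           # food every 15 ticks, independent of behavior
        weights[current] += 1.0  # accidental strengthening of whatever occurred

print(max(weights, key=weights.get))  # one behavior typically dominates
```

Whichever behavior happens to coincide with the first few deliveries tends to snowball, mirroring how an arbitrary response like head tossing can come to dominate a pigeon's repertoire.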
Societal Superstitions
Herrnstein (1966) argued that Skinner’s analysis applied only to idiosyncratic behaviors, and that societal superstitions are instead acquired through communication with others, not through direct experience
The initial superstition was likely a result of contingencies of reinforcement that are no longer in effect.
Staddon and Simmelhag: Interim and Terminal Behaviors
Replicated Skinner’s results, but came to a different conclusion about the purpose of the behaviors
They found that there were behavior patterns that occurred frequently in many or all of the pigeons during the interval between food deliveries
These behaviors fall into 2 groups:
- Interim behaviors: Those that occur frequently in the early part of the interval, when the next reinforcer is still some time away
- Terminal behaviors: Behaviors that seldom occur early in the interval but increase in frequency as the time of food delivery approaches
They felt that interim behaviors were not a result of accidental reinforcement; instead, they are behaviors the animal has an innate predisposition to perform when the likelihood of reinforcement is low
Terminal behaviors may not be related to accidental reinforcement either, and could simply frequently occur when food is about to be delivered.
Research has supported Staddon and Simmelhag’s theory, at least in part. Some researchers call interim and terminal behaviors adjunctive behaviors. There is also evidence supporting Skinner’s theory, both in the laboratory and in the field.
Interim behaviors
Those that occur frequently in the early part of the interval, when the next reinforcer is still some time away
Terminal behaviors
Behaviors that seldom occur early in the interval but increase in frequency as the time of food delivery approaches
Shaping
Method of successive approximations toward a desired response
A primary reinforcer is one that naturally strengthens any response it follows, including food, water, sexual pleasure, and comfort.
Shaping involves making use of the natural variability in the subject’s behavior by gradually making your criterion for reinforcement more demanding, until the desired response is executed.
Shaping as a Tool in Behavior Modification
Shaping is frequently used as a method to establish new or better behaviors in a wide range of settings, including athletics and therapy. It can be used with individuals and with groups
One example was to give cocaine users vouchers for movie tickets if their urine samples showed a 25% reduction in cocaine metabolites
The therapists gradually decreased the amount of cocaine metabolites that could be present to receive the reinforcer
This method was more effective than requiring complete abstinence at the beginning of the program
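A minimal sketch of this gradually tightening criterion, in Python. The 25% reduction step comes from the example above, but the baseline value, the later thresholds, and the sample levels are all hypothetical:

```python
# Shaping via successive approximations: each program phase allows a smaller
# fraction of the baseline metabolite level before a voucher is earned.

def meets_criterion(metabolite_level, baseline, allowed_fraction):
    """Give the voucher if the sample is at or below the current threshold."""
    return metabolite_level <= baseline * allowed_fraction

baseline = 100.0                   # hypothetical metabolite level at intake
phases = [0.75, 0.50, 0.25, 0.0]   # allowed fraction of baseline, tightening
samples = [70.0, 55.0, 20.0, 0.0]  # one hypothetical urine sample per phase

for allowed, level in zip(phases, samples):
    earned = meets_criterion(level, baseline, allowed)
    print(f"threshold {allowed:.0%} of baseline: sample {level} -> voucher: {earned}")
```

Note that the second sample misses its tighter criterion and earns no voucher; in practice, a shaping program makes the criterion more demanding only gradually, as the current level of performance becomes reliable.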
Percentile Schedules
Shaping can be made more precise and effective through the use of percentile schedules
A response is reinforced if it is better than a certain percentage of the last several responses that the learner has made.
e.g. smoking cessation: each smoker gave a breath sample once a day and received a small amount of money if the sample had a lower carbon monoxide level than on at least 4 of the last 9 days.
Advantage of percentile schedules
Can be tailored to individual performance
Small, gradual improvements can be rewarded even when a subject is struggling
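The smoking-cessation rule above (reinforce when today's carbon monoxide level is lower than on at least 4 of the last 9 days) can be sketched directly; the CO readings themselves are hypothetical:

```python
from collections import deque

# Percentile schedule from the smoking-cessation example: today's breath sample
# earns the reinforcer if its carbon monoxide (CO) level beats the levels on at
# least 4 of the last 9 days.

def reinforce(today_co, recent_co, k=4):
    """Return True if today's CO reading is lower than at least k recent readings."""
    beaten = sum(1 for past in recent_co if today_co < past)
    return beaten >= k

history = deque(maxlen=9)  # rolling window of the last 9 daily readings
daily_readings = [30, 28, 31, 27, 29, 26, 30, 25, 28, 24, 27]  # hypothetical ppm

for day, co in enumerate(daily_readings, start=1):
    if len(history) == 9:  # the rule needs a full 9-day window first
        print(f"Day {day}: CO={co} ppm, reinforcer earned: {reinforce(co, history)}")
    history.append(co)
```

Because the threshold is the learner's own recent distribution, the criterion automatically tracks individual performance: a struggling smoker can still earn the reinforcer for a day that is merely better than most of their own recent days.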
Versatility of the Shaping Process
The Law of Effect is wider in applicability than classical conditioning, which only applies to those behaviors that are reliably elicited by some stimulus
The stop-action principle applies to any behavior that is produced by the organism
The shaping process extends operant conditioning even further, limited only by the capabilities of the subject (i.e., behaviors the organism is physically able to produce).
Operant Conditioning: terminology
The subject obtains reinforcement by operating on the environment