Mazur Chapter 5: Basic Principles of Operant Conditioning Flashcards
Thorndike
Thorndike was the first researcher to systematically study how nonreflexive behaviors can be modified as a result of experience.
Thorndike used a small chamber he called a puzzle box to study operant conditioning. The puzzle boxes ranged in difficulty from simple to complex.
Thorndike believed that organisms’ initial correct responses to leave the chamber were accidental
The gradual improvement over trials strengthened the S – R connection
He formulated the Law of Effect to account for this strengthening of the S – R association.
Law of Effect
Responses that are accompanied by or closely followed by satisfaction to the organism become more strongly connected to the situation, making it more likely that the response will recur
Responses that are accompanied by or closely followed by discomfort to the organism have their connection to the situation weakened, making it less likely that the response will recur
The greater the satisfaction or discomfort, the greater the strengthening or weakening of the association, respectively.
The “satisfying state of affairs” that Thorndike referred to has been replaced by the term “reinforcer.”
Guthrie and Horton: Evidence for a Mechanical Strengthening Process
Guthrie and Horton were two researchers who followed Thorndike’s experimental paradigm.
The learning that took place in the puzzle box involved the strengthening of whatever behavior happened to be followed by escape and food
After their cats mastered the task (i.e., getting out of the box), there was relatively little variability from trial to trial for a given cat, but there was considerable variability from one cat to another
stop-action principle
Brown and Herrnstein (1975) used Guthrie and Horton’s results to add a principle to the Law of Effect, which they called the stop-action principle:
The occurrence of the reinforcer (i.e., escape) serves to stop the organism’s ongoing behavior and to strengthen the association between the situation (the puzzle box) and those precise behaviors that were occurring at the moment of reinforcement
The specific bodily position and muscle movements occurring at the time of reinforcement will have a higher probability of occurring on the next trial
The more those particular behaviors yield the reinforcement on subsequent trials, the stronger the S – R connection will be.
Skinner’s “superstition experiment”
Provided a strong case for the power of accidental reinforcement
Skinner observed that pigeons would repeatedly engage in whatever behavior (e.g., head tossing) happened to be occurring when the reinforcer was delivered. If the first reinforcer occurred immediately after a pigeon had tossed its head upward, that behavior would be more likely to occur in the future.
Skinner believed that many of the superstitious behaviors people engaged in were produced by the same mechanism
Superstitious behaviors frequently arise in situations where an individual has no control over the outcome, such as gambling
They also occur often in sports, sometimes without the athlete’s awareness.
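Skinner's accidental-reinforcement mechanism can be sketched as a toy simulation. Everything here (the behavior names, the 15-tick food schedule, the weight increments) is an illustrative assumption, not from the chapter; the point is only that response-independent food still strengthens whatever behavior happens to coincide with its delivery:

```python
import random

# Toy simulation of accidental ("superstitious") reinforcement: food arrives on
# a fixed schedule regardless of behavior, yet whatever behavior happens to be
# occurring at the moment of delivery is strengthened anyway.

random.seed(42)
behaviors = ["head tossing", "turning", "pecking wall"]  # hypothetical repertoire
weights = {b: 1.0 for b in behaviors}                    # initial response strengths

for tick in range(1, 301):
    # The pigeon emits behaviors in proportion to their current strength.
    current = random.choices(behaviors, weights=[weights[b] for b in behaviors])[0]
    if tick % 15 == 0:           # food every 15 ticks, independent of behavior
        weights[current] += 1.0  # accidental strengthening of whatever occurred

print(max(weights, key=weights.get))  # one behavior typically dominates
```

Whichever behavior happens to coincide with the first few deliveries tends to snowball, mirroring how an arbitrary response like head tossing can come to dominate a pigeon's repertoire.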
Societal Superstitions
Herrnstein (1966) argued that Skinner’s analysis applied only to idiosyncratic behaviors, and that societal superstitions are instead acquired through communication with others, not through direct experience
The initial superstition was likely a result of contingencies of reinforcement that are no longer in effect.
Staddon and Simmelhag: Interim and Terminal Behaviors
Replicated Skinner’s results, but came to a different conclusion about the purpose of the behaviors
They found that there were behavior patterns that occurred frequently in many or all of the pigeons during the interval between food deliveries
These behaviors fall into 2 groups:
- Interim behaviors: Those that occur frequently in the early part of the interval, when the next reinforcer is still some time away
- Terminal behaviors: Behaviors that seldom occur early in the interval but increase in frequency as the time of food delivery approaches
They felt that interim behaviors were not a result of accidental reinforcement; instead, they are behaviors the animal has an innate predisposition to perform when the likelihood of reinforcement is low
Terminal behaviors may not be related to accidental reinforcement either, and could simply frequently occur when food is about to be delivered.
Research has supported Staddon and Simmelhag’s theory, at least in part. Some researchers call interim and terminal behaviors adjunctive behaviors. There is also evidence supporting Skinner’s theory, both in the laboratory and in the field.
Interim behaviors
Those that occur frequently in the early part of the interval, when the next reinforcer is still some time away
Terminal behaviors
Behaviors that seldom occur early in the interval but increase in frequency as the time of food delivery approaches
Shaping
Method of successive approximations toward a desired response
A primary reinforcer is one that naturally strengthens any response it follows, including food, water, sexual pleasure, and comfort.
Shaping involves making use of the natural variability in the subject’s behavior by gradually making your criterion for reinforcement more demanding, until the desired response is executed.
Shaping as a Tool in Behavior Modification
Shaping is frequently used as a method to establish new or better behaviors in a wide range of settings, including athletics and therapy. It can be used with individuals and with groups
One example was to give cocaine users vouchers for movie tickets if their urine samples showed a 25% reduction in cocaine metabolites
The therapists gradually decreased the amount of cocaine metabolites that could be present to receive the reinforcer
This method was more effective than requiring complete abstinence at the beginning of the program
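A minimal sketch of this gradually tightening criterion, in Python. The 25% reduction step comes from the example above, but the baseline value, the later thresholds, and the sample levels are all hypothetical:

```python
# Shaping via successive approximations: each program phase allows a smaller
# fraction of the baseline metabolite level before a voucher is earned.

def meets_criterion(metabolite_level, baseline, allowed_fraction):
    """Give the voucher if the sample is at or below the current threshold."""
    return metabolite_level <= baseline * allowed_fraction

baseline = 100.0                   # hypothetical metabolite level at intake
phases = [0.75, 0.50, 0.25, 0.0]   # allowed fraction of baseline, tightening
samples = [70.0, 55.0, 20.0, 0.0]  # one hypothetical urine sample per phase

for allowed, level in zip(phases, samples):
    earned = meets_criterion(level, baseline, allowed)
    print(f"threshold {allowed:.0%} of baseline: sample {level} -> voucher: {earned}")
```

Note that the second sample misses its tighter criterion and earns no voucher; in practice, a shaping program makes the criterion more demanding only gradually, as the current level of performance becomes reliable.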
Percentile Schedules
Shaping can be made more precise and effective through the use of percentile schedules
A response is reinforced if it is better than a certain percentage of the last several responses that the learner has made.
e.g. smoking cessation: each smoker gave a breath sample once a day and received a small amount of money if the sample had a lower carbon monoxide level than on at least 4 of the last 9 days.
Advantage of percentile schedules
Can be tailored to individual performance
Small, gradual improvements can be rewarded even when a subject is struggling
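The smoking-cessation rule above (reinforce when today's carbon monoxide level is lower than on at least 4 of the last 9 days) can be sketched directly; the CO readings themselves are hypothetical:

```python
from collections import deque

# Percentile schedule from the smoking-cessation example: today's breath sample
# earns the reinforcer if its carbon monoxide (CO) level beats the levels on at
# least 4 of the last 9 days.

def reinforce(today_co, recent_co, k=4):
    """Return True if today's CO reading is lower than at least k recent readings."""
    beaten = sum(1 for past in recent_co if today_co < past)
    return beaten >= k

history = deque(maxlen=9)  # rolling window of the last 9 daily readings
daily_readings = [30, 28, 31, 27, 29, 26, 30, 25, 28, 24, 27]  # hypothetical ppm

for day, co in enumerate(daily_readings, start=1):
    if len(history) == 9:  # the rule needs a full 9-day window first
        print(f"Day {day}: CO={co} ppm, reinforcer earned: {reinforce(co, history)}")
    history.append(co)
```

Because the threshold is the learner's own recent distribution, the criterion automatically tracks individual performance: a struggling smoker can still earn the reinforcer for a day that is merely better than most of their own recent days.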
Versatility of the Shaping Process
The Law of Effect is wider in applicability than classical conditioning, which only applies to those behaviors that are reliably elicited by some stimulus
The stop-action principle applies to any behavior that is produced by the organism
The shaping process extends operant conditioning even further, limited only by the capabilities of the subject (i.e., behaviors the organism is physically able to produce).
Operant Conditioning: terminology
The subject obtains reinforcement by operating on the environment