Chp 7.3: Operant Conditioning 1 Flashcards
Thorndike’s law of effect
Thorndike’s concept that a response followed by satisfying consequences will become more likely to occur, whereas a response followed by unsatisfying consequences will become less likely to occur
Skinner’s law of effect (reinforcement)
Skinner’s concept that any consequence of a response that increases the probability of that response is a reinforcer. (Unlike Thorndike, Skinner avoids reference to mental states.)
Skinner’s law of effect (punishment)
Skinner’s concept that any consequence of a response that decreases the probability of that response is a punisher.
operant conditioning
a type of learning in which behaviour is modified by its consequences, such as by reinforcement, punishment, and extinction
- For example, we learn that smiling at others is followed by a friendly greeting.
Reinforcement (operant conditioning)
the strengthening of a response by an outcome that follows it
punishment (operant conditioning)
the weakening of a response by an outcome that follows it
ABCs of Operant Conditioning (contingencies)
- antecedents (A), which are stimuli that are present before a behaviour occurs
- behaviours (B) that the organism emits
- consequences (C) that follow the behaviours.
Identify two key differences between classical and operant conditioning
- In classical conditioning, the organism learns an association between two stimuli—the CS and UCS (e.g., a tone and food)—that occurs before the behaviour (e.g., salivation).
- Classical conditioning focuses on elicited behaviours. The conditioned response is triggered involuntarily, almost like a reflex, by a stimulus that precedes it.
- In operant conditioning, the organism learns an association between behaviour and its consequences. Behaviour changes because of events that occur after it.
- Operant conditioning focuses on emitted behaviours: In a given situation, the organism generates responses (e.g., pressing a lever) that are under its physical control.
discriminative stimulus
an antecedent stimulus that signals the likelihood of certain consequences if a response is made
e.g., a red traffic light signals that driving through the intersection is likely to be punished by a ticket or a car crash
positive reinforcement
a response is strengthened by the subsequent presentation (appearance) of a (positive) stimulus
negative reinforcement
a response is strengthened by the subsequent removal of a (noxious) stimulus
operant extinction
the weakening and eventual disappearance of a response because it is no longer reinforced
positive punishment
occurs when a response is weakened by the subsequent presentation (appearance) of a (noxious) stimulus
Describe some disadvantages of using positive punishment to control behaviour
- Punishment suppresses behaviour, but it does not cause the organism to forget how to make the response, and it does not teach a more appropriate alternative response.
- Unlike reinforcement, punishment arouses negative emotions, such as fear and anger, which can produce dislike and avoidance of the person delivering the punishment.
negative punishment
the removal of a (positive) stimulus following an undesired response to weaken it
(e.g., TV privileges are taken away from a misbehaving child who wants attention)
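The four consequence types on the cards above differ on two dimensions: whether a stimulus is presented or removed after the response, and whether the response is strengthened or weakened. A minimal sketch of that 2 x 2 lookup follows; the function name `classify_consequence` and its string labels are made up for illustration and are not terms from the course.

```python
def classify_consequence(stimulus_change: str, behaviour_change: str) -> str:
    """Name the operant contingency from two facts about an episode.

    stimulus_change:  "presented" or "removed" (what follows the response)
    behaviour_change: "strengthened" or "weakened" (effect on the response)
    """
    table = {
        ("presented", "strengthened"): "positive reinforcement",
        ("removed", "strengthened"): "negative reinforcement",
        ("presented", "weakened"): "positive punishment",
        ("removed", "weakened"): "negative punishment",
    }
    return table[(stimulus_change, behaviour_change)]


# TV privileges are removed and the misbehaviour becomes less likely.
print(classify_consequence("removed", "weakened"))
```

Note that "positive" and "negative" describe only whether a stimulus appears or is taken away, not whether the outcome is pleasant.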
Describe some advantages of using negative punishment to control behaviour
- First, although it may arouse temporary frustration or anger, it is less likely to create strong fear or even hatred of the punishing agent
- Second, physical aggression is not being modelled, so there is less opportunity for learning of aggression through observational learning
primary reinforcers
positive reinforcers that satisfy biological needs, such as food and water
secondary (conditioned) reinforcer
a stimulus that acquires reinforcing qualities by being associated with primary reinforcers (e.g., money)
delay of gratification
the ability to forgo immediate rewards for delayed but more satisfying outcomes
shaping
an operant conditioning procedure in which reinforcement begins with a behaviour that the organism can already perform, and then is made contingent (dependent) on behaviours that increasingly approximate the final desired behaviour
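Shaping can be pictured as a criterion that tightens each time it is met. The sketch below is a toy illustration only: it assumes each attempt can be scored as a single number (higher means closer to the final behaviour), and the names `shape`, `start_criterion`, and `step` are hypothetical.

```python
def shape(attempts, target, start_criterion, step):
    """Reinforce successive approximations of a target behaviour.

    An attempt is reinforced when it meets the current criterion. After each
    reinforced attempt the criterion is raised by `step`, up to `target`.
    Returns a list of (attempt, reinforced) pairs.
    """
    criterion = start_criterion
    log = []
    for attempt in attempts:
        reinforced = attempt >= criterion
        log.append((attempt, reinforced))
        if reinforced and criterion < target:
            criterion = min(target, criterion + step)
    return log


# The criterion starts at 1, then rises to 3, then to 5.
print(shape([1, 2, 2, 3, 5], target=5, start_criterion=1, step=2))
# [(1, True), (2, False), (2, False), (3, True), (5, True)]
```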
chaining
an operant conditioning procedure used to develop a sequence (chain) of responses by reinforcing each response with the opportunity to perform the next response
operant generalization
an operant response occurs to a new antecedent stimulus that is similar to the original antecedent stimulus
operant discrimination
an operant response that occurs when a particular antecedent stimulus is present, but not when another antecedent stimulus is present
Ratio schedules
- The number of responses determines reinforcement
- Lead to high response rates if the ratio is not too large
Interval schedules
- The first response after a time interval has elapsed produces reinforcement
- Produce moderate response rate
continuous reinforcement schedule
a reinforcement schedule in which each correct response is followed by reinforcement
partial reinforcement schedule
a schedule in which reinforcement follows some correct responses but not others
fixed-ratio (FR) schedule
a reinforcement schedule in which reinforcement is given after a constant number of correct responses
Smallish FR produce:
- High response rates
- Little pausing
Large FR produce:
- long pauses before responding
e.g., receiving a free coffee after having your loyalty card stamped 10 times
variable-ratio (VR) schedule
- An unpredictable number of responses is required for reinforcement
Smallish VR produce:
- very high response rates
- little or no pausing
e.g., a variable and unpredictable number of responses need to occur before the slot machine pays off
fixed-interval (FI) schedule
- reinforcement is available after a constant length of time
- creates a gradually increasing response rate as time passes
e.g., relatively little studying during the period immediately following each exam, and an increasing amount of studying as the next scheduled exam approaches
variable-interval (VI) schedule
- Reinforcement is available after a changing length of time
- VI tends to produce very steady, but lowish rates of response
e.g., a course might average a quiz every two weeks, but the quizzes can occur at any time. Their unpredictable timing will produce a steadier approach to studying than regularly scheduled quizzes.
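The four partial schedules differ only in the rule that decides whether a given response is reinforced. The sketch below is one possible way to model those rules, not a standard implementation: the class names, the `respond(t)` method, and the uniform distributions used for the variable schedules are all assumptions chosen for illustration.

```python
import random


class FixedRatio:
    """Reinforce every n-th response (FR-n)."""
    def __init__(self, n):
        self.n, self.count = n, 0

    def respond(self, t=None):  # time is ignored on ratio schedules
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """Reinforce after an unpredictable number of responses averaging mean_n."""
    def __init__(self, mean_n, rng):
        self.mean_n, self.rng, self.count = mean_n, rng, 0
        self.required = self._draw()

    def _draw(self):
        # Assumption: uniform on 1..2*mean_n-1, whose mean is mean_n.
        return self.rng.randint(1, 2 * self.mean_n - 1)

    def respond(self, t=None):
        self.count += 1
        if self.count >= self.required:
            self.count, self.required = 0, self._draw()
            return True
        return False


class FixedInterval:
    """Reinforce the first response made after a constant interval."""
    def __init__(self, interval):
        self.interval, self.available_at = interval, interval

    def respond(self, t):
        if t >= self.available_at:
            self.available_at = t + self.interval
            return True
        return False


class VariableInterval:
    """Reinforce the first response after an unpredictable interval."""
    def __init__(self, mean_interval, rng):
        self.mean_interval, self.rng = mean_interval, rng
        self.available_at = self._draw()

    def _draw(self):
        # Assumption: uniform on 0..2*mean_interval, whose mean is mean_interval.
        return self.rng.uniform(0, 2 * self.mean_interval)

    def respond(self, t):
        if t >= self.available_at:
            self.available_at = t + self._draw()
            return True
        return False


# Loyalty card that pays off on every 3rd stamp (FR-3).
fr = FixedRatio(3)
print([fr.respond() for _ in range(6)])
# [False, False, True, False, False, True]
```

On the ratio schedules, responding faster earns reinforcers faster. On the interval schedules, extra responses before the interval has elapsed earn nothing, which fits the lower response rates noted on the interval cards.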