Topic 4: Operant conditioning: reinforcement Flashcards
Operant (instrumental) conditioning
Learning that is controlled by the consequences of the organism's behavior
Reinforcement
-Process in which a behavior is strengthened by the immediate consequence that reliably follows its occurrence
-Strengthened=more likely to occur in future
-Thorndike's law of effect
-Skinner + operant boxes
Thorndike’s law of effect
-“If a response, in the presence of a stimulus, is followed by a satisfying state of affairs, the bond between stimulus and response will be strengthened”
-Satisfaction=Stamping in
-Discomfort=Stamping out
Positive reinforcement
-Adding something good/desirable that makes the behavior more likely to occur in the future
-Eg) a kid who enjoys the attention of being yelled at will yell again in the future
-Adding in the yelling (the "positive," i.e., added, element in this scenario) leads to more yelling in the future (behavior reinforced)
Negative reinforcement
-Taking away something bad/undesirable that encourages behavior to occur more often
-Eg) turning off a loud buzzing noise when a child finishes cleaning their room
-Negative reinforcement does NOT equal punishment
Positive punishment
-Adding something bad/undesirable to make behavior happen less often
-Eg) adding more chores when a child misbehaves
Negative punishment
-Taking away something good that leads to behavior occurring less in the future
-Eg) a kid is having fun but is too loud, so they get a timeout away from the fun. When out of timeout, they won't be loud again, to avoid being sent away from the fun
Antecedent
-The conditions you are in that determine whether a behavior will occur or not
-Could also be called a stimulus
Operant behavior
-A behavior that is strengthened through the process of reinforcement
-Acts on environment to produce a consequence
Operant learning
Change in a behavior as a function of the consequence that followed it
Reinforcement
The procedure of providing consequences for a behavior that increases or maintains the probability of that behavior occurring in the future
Reinforcer
Any event or stimulus that follows an operant response and increases or maintains its future probability
Positive reinforcement #2
Any event or stimulus that, when presented as a consequence of a behavior, increases or maintains the future probability of that behavior
Negative reinforcement #2
Any event or stimulus that, when removed as a consequence of a behavior, increases or maintains the future probability of that behavior
Escape behavior
-When operant behavior increases by removing an ongoing event/stimulus
-Eg) turning off alarm clock or pressing lever to stop electric shock
Avoidance behavior
-When operant behavior increases by preventing the onset of the event or stimulus
-Eg) pressing a lever to prevent an electric shock
Discrete trial procedure
-Instrumental response produced once per trial
-Each training trial ends with removal of animal from the apparatus
-Each trial is done as an isolated chunk
Free-Operant procedure
-Animals remain in apparatus and can make many responses
-No intervention by the experimenter
-Developed by Skinner
-Continuous trials
Cumulative record
-Based on old cumulative recorder device
-Constant paper output; the pen jumps with each response
-Plot of cumulative responses (y-axis) over time (x-axis)
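A minimal plotting sketch (Python; the response times below are made up for illustration, not course data) of how a cumulative record is drawn:

```python
# Cumulative record sketch: hypothetical response times (seconds) plotted as
# cumulative response count (y-axis) against elapsed time (x-axis).
import matplotlib.pyplot as plt

response_times = [2, 5, 6, 9, 15, 16, 17, 25, 26, 30]      # hypothetical data
cumulative_counts = list(range(1, len(response_times) + 1))

# A step plot mimics the recorder pen: flat while nothing happens, a jump at each response.
plt.step(response_times, cumulative_counts, where="post")
plt.xlabel("Time (s)")
plt.ylabel("Cumulative responses")
plt.title("Cumulative record (steeper slope = faster responding)")
plt.show()
```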
Unconditional (primary) reinforcer
-A reinforcer that acquired its properties as a function of the species' evolutionary history
-Stimuli and events have biological importance
-Usually depends on some amount of deprivation
conditional reinforcer
-Otherwise neutral stimuli or events that have acquired the ability to reinforce due to a contingent relationship with other, typically unconditional, reinforcers
Immediacy
-A stimulus is more effective as a reinforcer when it is delivered immediately after the behavior
Specific reinforcer used
-Certain reinforcers are preferred over others
-Chocolate > Sunflower seeds
Task characteristics
Eg) it is easier to reinforce pecking for food in a pigeon than in a hawk, because pecking fits the pigeon's natural repertoire
Contingency
-A stimulus is more effective as a reinforcer when it is delivered contingent on the behavior
-The degree of correlation between a behavior and its consequence
Contiguity
-Nearness of events in time (temporal contiguity) or space (spatial contiguity)
-High contiguity is referred to as pairing
-Less contiguity (longer delays) between the operant response and the reinforcer diminishes the effectiveness of the reinforcer
-Hyperbolic decay function
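A common way to formalize this decay (a sketch, assuming Mazur's hyperbolic discounting equation, the usual form given for delay of reinforcement): V = A / (1 + kD), where V is the effective value of the delayed reinforcer, A is its value if delivered immediately, D is the delay, and k is a discounting-rate parameter. Eg) with A = 10, k = 0.5, and D = 4 s: V = 10 / (1 + 2) ≈ 3.3.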
Motivating operations
-Establishing operations
-Abolishing operations
Establishing operations
-Make a stimulus more effective as a reinforcer at a particular time
-Eg) deprivation
Abolishing operations
-Make a stimulus less potent as a reinforcer at a particular time
-Eg- Satiation
Reinforcer magnitude
-Generally, a more intense stimulus is a more effective reinforcer
-The relationship between size and effectiveness is not linear
-As magnitude increases, each additional increase yields less added benefit (diminishing returns)
-The effectiveness of unconditional reinforcers tends to diminish quickly
Schedule of reinforcement
-A rule describing the delivery of reinforcement
-Different schedules produce unique schedule effects
Schedule effects
-Particular pattern and rate of behavior over time
-Over the long-term, effects are very predictable
-Occur in numerous species (humans too)
Continuous reinforcement schedule
-Behavior is reinforced each time it occurs
-Rate of behavior increases rapidly
-rare in natural environment
intermittent reinforcement schedule
-Four different types
1) Fixed ratio
2) Variable ratio
3) Fixed interval
4) Variable interval
1) Fixed ratio schedule
-Behavior is reinforced after a fixed number of responses
-Generates a post-reinforcement pause (PRP)
-Generates steady run rates following the post-reinforcement pause
Post-reinforcement pause
Pausing typically increases with ratio size and reinforcer magnitude
2) Variable ratio schedule
-Number of responses needed varies each time
-Ratio-requirement varies around an average
-PRP’s are rare and very short, they are influenced by the lowest and/or average ratio
-Produces higher rates than a comparable fixed-ratio schedule
-Common in natural environments
-2 common variations: 1) random ratio, 2) progressive ratio
1) random ratio
-Scheduling is controlled through a random number generator
-Produces similarly high rates of responding
-eg) casinos or video games
2) Progressive ratio
-Ratio requirements move from small to large
-PRPs increase with ratio size
-Creates a "break point," a measure of how hard an organism will work
Fixed-Interval schedule
-Behavior is reinforced the first time it occurs after a fixed period of time
-Produces PRPs
-Responding increases gradually as the interval elapses, creating a scallop shape
-Uncommon in natural environment
Variable interval schedule
-The time that must pass before a response can be reinforced varies each time
-Interval varies around an average
-PRPs are short and rare
-Steady rates of responding, though not as high as VR
-Common in natural environments
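A minimal code sketch (Python; the function names and parameter values are illustrative, not from the course) contrasting how a fixed-ratio rule and a variable-interval rule decide whether a given response earns a reinforcer:

```python
import random

def make_fixed_ratio(n):
    """FR-n: reinforce every n-th response, regardless of timing."""
    count = 0
    def reinforce(time_s):
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True   # deliver reinforcer
        return False
    return reinforce

def make_variable_interval(mean_s):
    """VI: reinforce the first response after an interval that varies around a mean."""
    next_setup = random.expovariate(1.0 / mean_s)   # when the next reinforcer becomes available
    def reinforce(time_s):
        nonlocal next_setup
        if time_s >= next_setup:
            next_setup = time_s + random.expovariate(1.0 / mean_s)
            return True   # deliver reinforcer
        return False
    return reinforce

# Usage: responses at hypothetical times (seconds). FR-3 pays off on every 3rd response;
# VI-10 pays off only if enough time has elapsed, no matter how fast responding is.
fr3, vi10 = make_fixed_ratio(3), make_variable_interval(10.0)
for t in [1, 2, 3, 5, 8, 13, 21, 34]:
    print(t, fr3(t), vi10(t))
```

This also shows why response rate matters so much on ratio schedules (more responses means more reinforcers) but adds little on interval schedules once responding is steady.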
Premack principle
-In nature different behaviors have different probabilities of occurring
-Low-to-high: following a low-probability behavior with a high-probability behavior reinforces the low-probability behavior
-High-to-low: following a high-probability behavior with a low-probability behavior does not reinforce the high-probability behavior
-Any high-probability response can serve as a reinforcer for a lower-probability response
How to test premack principle
1) Establish baseline responding for the different behaviors
2) Run an instrumental conditioning procedure with low-to-high and high-to-low contingencies
Example of premack principle
-If a child prefers playing pinball to eating veggies you can reinforce eating veggies by letting them play pinball each time they eat veggies
-high-probability behavior reinforces low-probability behavior
Problems with premack principle
-Does not nicely account for conditional reinforcement
-A low-probability behavior can reinforce a high-probability behavior when the organism has been deprived of the low-probability behavior
Antecedents/controlling stimuli
-Controlling stimulus is a stimulus that changes the probability of an operant behavior
-2 types
1) Discriminative stimulus
2) Extinction stimulus / S-delta
1) Discriminative stimulus/occasion setter
-A stimulus or event that precedes an operant and sets the occasion for its reinforcement
-Makes behavior more likely to occur in the moment
2) Extinction stimulus
-A stimulus or event that precedes an operant and sets the occasion for non-reinforcement
-Makes behavior less likely to occur in the moment
Antecedents
-Include establishing and abolishing operations as well as control stimuli
-Evoke a behavior
-Alter the current probability of behavior
Consequences
-Include reinforcers and punishers
-Strengthen or weaken the behavior
-Alter the future probability of behavior
When does discrimination occur?
-When the presence or absence of stimuli is the occasion on which a response will be followed by reinforcement
eg) Pecking is only reinforced when the green light is on
-The green light IS the occasion when pecking will be reinforced
What does discrimination refer to?
-The effect an occasion setting contingency has on behavior
-Refers to the effect of the response being more likely to occur in the presence of the SD than its absence
Stimulus control
A change in operant behavior that occurs when either S^D or S-Delta is presented
Discrimination index (ID)/Discrimination ratio
A measure of the stimulus control exerted by an S^D or S-Delta
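A common way to compute it (a sketch, assuming the standard discrimination-ratio formula): ID = responses during S^D / (responses during S^D + responses during S-Delta). An ID of 0.5 indicates no stimulus control; values approaching 1.0 indicate strong control. Eg) 80 responses during S^D and 20 during S-Delta gives ID = 80 / 100 = 0.8.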
Generalization
-Less precise control
-Obtained by training in a wide array of settings/stimuli
Stimulus generalization
1) Process where once a CS has been established, similar stimuli may also produce a CR
2) Process by which, once an operant response occurs to one discriminative stimulus, it also occurs to other similar stimuli
Stimulus discrimination
1) Process where we exhibit a less pronounced CR to CSs that differ from the original CS
2) Process where less responding occurs to stimuli that are different from the original trained stimulus
-Operant responses occur to the trained stimulus but not to others
Concept formation
1) The generalization within classes of stimuli; and
2) The discrimination between classes of stimuli
Generalization in practice
-Occurs when the target behavior occurs in situations other than the specific training conditions
-Ideally it involves having the target behavior occur in all relevant situations
Promoting generalization
-Reinforce occurrences of generalization
-Initially, the training setting and the criterion setting should be quite similar
-Training and criterion settings should then gradually become more dissimilar
Stimulus exemplar
Stimuli that represent the range of relevant stimulus situations in which the response should occur after training
General case programming
Different S^D’s may require different responses to obtain the same reinforcer