Chapter 7 - Learning and Adaptation Flashcards
Father of Operant Conditioning
B.F. Skinner
Operant (behaviour)
A class of behaviour that operates on the environment to produce a common environmental response
Learning
A change in behaviour due to experience
Operant Learning
A change in a class of behaviour as a function of the consequences that followed it
The consequences of our behaviour: reinforcement and punishment
Reinforcement
- The occurrence of a particular behaviour
- Is followed by an immediate consequence
- That results in the strengthening of the behaviour (person is more likely to engage in the behaviour again in the future)
Reinforcement _____ behaviour
INCREASES, due to the reinforcement
Positive Reinforcement
Response produces a good stimulus; something you want
Result: Increase in response rate
Negative reinforcement
Response ELIMINATES / PREVENTS a bad stimulus
Result: Increase or strengthening of the behaviour / response rate
Taking away something bad to increase behaviour
Positive Punishment
Decrease in behaviour by adding something bad
Negative Punishment
Taking away something good to decrease the behaviour
Punishment ____ behaviour
Decreases
Adding a stimulus to increase behaviour -
Positive reinforcement
Removing a stimulus to increase behaviour -
Negative reinforcement
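The four cards above (positive and negative reinforcement, positive and negative punishment) can be read as a two-by-two table: was a stimulus added or removed, and does the behaviour increase or decrease afterwards? A minimal sketch of that lookup follows; the function name and the string labels are illustrative, not from the course material.

```python
def classify_consequence(stimulus_change: str, behaviour_change: str) -> str:
    """Name the operant procedure from two observations.

    stimulus_change:  "added" or "removed" (what followed the response)
    behaviour_change: "increases" or "decreases" (future rate of the response)
    """
    sign = {"added": "Positive", "removed": "Negative"}[stimulus_change]
    kind = {"increases": "Reinforcement", "decreases": "Punishment"}[behaviour_change]
    return f"{sign} {kind}"


# Print all four combinations.
for stimulus in ("added", "removed"):
    for behaviour in ("increases", "decreases"):
        print(stimulus, behaviour, "->", classify_consequence(stimulus, behaviour))
```

Note that "positive" and "negative" describe only whether a stimulus is added or removed, not whether the outcome is pleasant.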
Reward ≠
Reinforcer
Observe what actually happens with the behaviour: what happens to the person before it, what follows it, and what happens if the consequence does not occur?
Escape Behaviour
When operant behaviour increases by REMOVING an ongoing event or stimulus
Avoidance Behaviour
When operant behaviour increases by PREVENTING the onset of an event or stimulus
Aversive Stimulus
An event or stimulus that an organism escapes or avoids
Unconditional (Primary) Reinforcer
A reinforcer that acquired its properties as a function of species EVOLUTIONARY HISTORY
Has to do with SURVIVAL. Biological importance
An organism cannot survive without eating, drinking water, sleeping, etc.
Conditional (Secondary) Reinforcer
Otherwise neutral stimuli or events that have acquired the ability to reinforce due to a contingent relationship with other, typically unconditional reinforcers.
E.g. Money
Immediacy (Contiguity)
A stimulus is more effective as a reinforcer when it is delivered IMMEDIATELY after the behaviour
What variables affect Reinforcement?
- Immediacy (Contiguity)
- Contingency
- Motivating Operations
- Individual Differences
- Magnitude
Contingency
A stimulus is more effective as a reinforcer when it is delivered CONTINGENT on the behaviour
Reinforcer is not randomly given
Motivating Operations
Establishing operations make a stimulus MORE effective as a reinforcer at a particular time.
Abolishing operations make a stimulus LESS effective as a reinforcer at a particular time
Individual differences
Reinforcers vary from person to person
Magnitude
Generally, a more intense stimulus is a more effective reinforcer
Task Characteristics
E.g. Reinforcing a pigeon pecking for food vs. a hawk pecking for food
Establishing Operations (EO)
An operation that INCREASES the effectiveness of a reinforcer
E.g. a bar offering free salty popcorn, which makes beverages more effective as reinforcers
Abolishing Operations (AO)
An operation that DECREASES the effectiveness of a reinforcer
E.g. Satiation - if you are full, you will be less motivated to eat food
Reinforcer Magnitude
Generally, larger reinforcers are more reinforcing than smaller reinforcers
- NOT linear
will $5 be as effective if you just won the lottery?
Premack Principle
H = high probability response
L = low probability response
L ➡️ H, reinforces L
H ➡️ L, DOES NOT reinforce H
Different behaviours have different probabilities of occurring.
E.g. Eating = high probability; Lever pressing = low probability.
Lever pressing leads to more eating, not the other way around
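The Premack rule above reduces to a single comparison of baseline probabilities. A minimal sketch, assuming each behaviour's baseline probability is already known; the numbers below are made up for illustration.

```python
def premack_reinforces(p_consequence: float, p_target: float) -> bool:
    """Return True if access to the consequence behaviour should reinforce
    the target behaviour, i.e. the consequence is the higher-probability one."""
    return p_consequence > p_target


# Hypothetical baseline probabilities (not measured data).
baseline = {"eating": 0.8, "lever pressing": 0.1}

# L -> H: lever pressing followed by eating reinforces lever pressing.
print(premack_reinforces(baseline["eating"], baseline["lever pressing"]))  # True
# H -> L: eating followed by lever pressing does not reinforce eating.
print(premack_reinforces(baseline["lever pressing"], baseline["eating"]))  # False
```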
Continuous Reinforcement (CRF) Schedule
- Behaviour is reinforced EACH TIME it occurs
- Rate of behaviour increases rapidly
- Useful when shaping a new behaviour
- RARE in the natural environment!
Every time you perform the behaviour, you are reinforced
Intermittent Reinforcement Schedule
- Many different types
- Four (4) main types:
- Fixed-ratio (FR)
- Variable-ratio (VR)
- Fixed-Interval (FI)
- Variable-Interval (VI)
Fixed-Ratio (FR)
Behaviour is reinforced after a fixed number of responses
- e.g. FR-120 (every 120th response is reinforced)
What does the Fixed Ratio schedule generate?
Post-Reinforcement Pause (PRP)
- Pausing typically increases with ratio size and reinforcer magnitude
- Appears as a flat horizontal line on the graph right after reinforcement is provided
Steady RUN RATES following the PRP
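A fixed-ratio schedule can be sketched as a counter that delivers a reinforcer on every Nth response. This is an illustration only: the function name is hypothetical, and it models the schedule rule, not the post-reinforcement pause.

```python
def fixed_ratio(ratio: int, n_responses: int) -> list:
    """Return the response numbers that are reinforced under FR-`ratio`."""
    return [r for r in range(1, n_responses + 1) if r % ratio == 0]


# FR-120 over a session of 400 responses.
print(fixed_ratio(120, 400))  # [120, 240, 360]
```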
Variable-Ratio Schedule (VR)
- The number of responses needed varies each time
- Ratio-requirement varies around average
- e.g., VR-360 (reinforcement after an average of 360 responses)
Variable-Ratio Schedule Example:
VR-360
Ratios:
• 1, 10, 20, 30, 60, 100, 180, 240, 300, 360, 420, 480, 540, 600, 660, 690, 690, 720, and 739 responses
• Mean = 360 (Average Ratio)
Shuffled Ordering:
• 20, 240, 720, 420, 480, 60, 10, 690, 30, 739, 360, 690, 300, 1, 660, 600, 540, 100, 180
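The arithmetic behind this VR-360 card can be checked directly: the listed ratios average exactly 360, and the shuffled ordering contains the same values. The variable names below are illustrative.

```python
from itertools import accumulate

ratios = [1, 10, 20, 30, 60, 100, 180, 240, 300, 360,
          420, 480, 540, 600, 660, 690, 690, 720, 739]
shuffled = [20, 240, 720, 420, 480, 60, 10, 690, 30, 739,
            360, 690, 300, 1, 660, 600, 540, 100, 180]

# The average ratio is what the "360" in VR-360 refers to.
average_ratio = sum(ratios) / len(ratios)
print(average_ratio)  # 360.0

# The shuffled list is the same set of ratios in a different order.
print(sorted(shuffled) == ratios)  # True

# Cumulative response counts at which each reinforcer is delivered.
reinforced_at = list(accumulate(shuffled))
print(reinforced_at[:3])  # [20, 260, 980]
```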
What are Post-Reinforcement Pauses (PRPs) like in Variable-Ratio Schedules?
What are they influenced by?
Rare and very short
- influenced by the LOWEST ratio and/or the AVERAGE ratio
What do Variable-Ratio Schedules produce?
Higher rates of responding than a comparable Fixed-Ratio schedule, because you do not know when you will be reinforced
What are Variable-Ratio schedules common in?
Natural environments
Two common variations in Variable-Ratio schedules
- Random-Ratio
- Progressive-Ratio
Random-Ratio (VR)
• Schedule is controlled by a random number generator.
• Produces similarly high rates of responding.
• Type of ratio used in casino games & video games!
Progressive-Ratio (VR)
• Ratio requirements move from small to large
• e.g., 1,2,3,4,5,6,7,8…
• e.g., 2,4,6,8,10 . . .
• PRPs increase with ratio size
• Creates a “BREAK-POINT” measure of how hard an organism will work
• i.e., the point at which the organism stops responding for that reinforcer
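The break-point idea can be sketched as a loop that keeps raising the ratio until the organism gives up. The `will_complete` callback is a hypothetical stand-in for the organism's behaviour, and the threshold of 25 responses is an invented example value.

```python
def run_progressive_ratio(will_complete, step=1, max_trials=1000):
    """Raise the ratio by `step` each trial (step, 2*step, 3*step, ...).

    will_complete(ratio) -> bool is a hypothetical stand-in for whether the
    organism finishes that many responses. Returns the last ratio completed,
    used here as the break point.
    """
    last_completed = 0
    for trial in range(1, max_trials + 1):
        ratio = trial * step
        if not will_complete(ratio):
            break
        last_completed = ratio
    return last_completed


# Assume an organism that will make at most 25 responses per reinforcer.
print(run_progressive_ratio(lambda ratio: ratio <= 25, step=1))  # 25
print(run_progressive_ratio(lambda ratio: ratio <= 25, step=2))  # 24
```

A higher break point suggests the organism will work harder for that reinforcer.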
Fixed-Interval Schedule (FI)
• The first response made after a given period of time has elapsed is reinforced.
• e.g., FI-4min (only a response made once 4 minutes have passed is reinforced; responding earlier earns nothing)
What do Fixed-Interval schedules produce and what increases gradually as a result?
• PRPs
• Responding increases gradually producing a “scallop” shape
- Probably would not study right after an exam, but would study more as another exam approaches
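The fixed-interval rule can be sketched as a timer check: a response is reinforced only if the interval has elapsed since the last reinforcer. This is a rough sketch under two assumptions: the timer restarts at each reinforcer (and at session start), and the response times are invented.

```python
def fixed_interval(interval_s: float, response_times_s: list) -> list:
    """Return the times (in seconds) of the reinforced responses under an FI schedule."""
    reinforced = []
    last_reinforcer = 0.0  # assumption: the timer starts at session start
    for t in sorted(response_times_s):
        if t - last_reinforcer >= interval_s:
            reinforced.append(t)
            last_reinforcer = t  # timer restarts at each reinforcer
    return reinforced


# FI-4min = 240 s; hypothetical response times in seconds.
responses = [30, 100, 200, 250, 300, 480, 490, 500, 760]
print(fixed_interval(240, responses))  # [250, 490, 760]
```

Responses made early in each interval are never reinforced, which is consistent with responding picking up only as the interval nears its end.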
Are Fixed-Interval schedules (FI) common or uncommon in the natural environment?
Uncommon
Variable-Interval Schedule (VI)
• The TIMING of the response needed VARIES each time
• Interval varies around an AVERAGE
• e.g., VI-3min (reinforcement becomes available after intervals that average about 3 minutes, in shuffled order)
• INTERVALS (in seconds): 300, 30, 280, 120, 360, 300, 0, 240, 220, 180, 10, 280, 100, 60
E.g. not knowing how long it is going to rain, or when the bus will come in the snow
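As a quick check on the VI example, the listed intervals can be averaged. They come out just under 3 minutes, which fits the "VI-3" label approximately. Variable names are illustrative.

```python
intervals_s = [300, 30, 280, 120, 360, 300, 0, 240, 220, 180, 10, 280, 100, 60]

mean_s = sum(intervals_s) / len(intervals_s)
print(round(mean_s, 1))       # 177.1 seconds
print(round(mean_s / 60, 2))  # 2.95 minutes
```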
PRPs and rates of responding in Variable-Interval Schedule?
• PRPs are rare and short
• Steady rates of responding,
• BUT NOT AS HIGH AS A VR
Are Variable-Interval Schedules common or uncommon in natural environments?
Common
Extinction
- A behaviour that has been previously reinforced
- No longer results in the reinforcing consequences
- And therefore, the behaviour stops occurring in the future
Extinction Burst
Increase in frequency, duration, and/or intensity of the unreinforced behaviour during the extinction process
- You keep trying to get the reinforcement because you still expect it; the behaviour has not yet been extinguished
- Like repeatedly pressing the button on a vending machine when the snack will not come out, because you already put your coins in
Spontaneous Recovery
The tendency for the EXTINGUISHED behaviour to occur again in situations SIMILAR to those in which it had previously been reinforced
Defining Punishment
- The occurrence of a particular behavior
- Is followed by an immediate consequence
- That results in the WEAKENING of the behavior (i.e., the person is less likely to engage in the behavior again in the future)
Two Ways of punishing
Add a stimulus = Positive Punishment ➕
Remove a stimulus = Negative Punishment ➖
Premack Principle for Reinforcement and Punishment
For Reinforcement:
• High-probability behavior reinforces low-probability behavior
For Punishment:
• Low-probability behavior punishes high-probability behavior.
Extinction vs. Negative Punishment
• Extinction - WITHHOLDING the reinforcer that was maintaining the behavior
• Negative punishment - REMOVING or WITHDRAWING a positive reinforcer after the behavior
Variables Affecting Punishment
• Contingency - the degree of correlation between a behavior and its consequence (the consequence has to occur every time the behaviour happens)
• Contiguity - nearness of events in time (temporal contiguity) or space (spatial contiguity).
• The longer the delay (less contiguity), the slower the learning.
• Intensity - the more intense the punisher is in terms of magnitude, the more effective it typically is.
Introductory Intensity of Punishment and Ethical Consideration
(Variables that affect punishment)
Using an effective level of punishment from the beginning is very important!
• If punishment is to be used, it must be intense enough to suppress the behavior dramatically (always try reinforcement before you try punishment)
The Problems with Punishment
Punishment can induce ESCAPE and AVOIDANCE behaviors.
• Examples:
• Hiding
• Cheating
• Lying
More problems with punishment
• Aggression
(A form of escape: becoming aggressive because you were punished)
• Apathy
(Why do anything if anything you do will be punished?)
• Doesn’t teach acceptable behaviors!
• Abuse
• Imitation of the Punisher
Classical Conditioning
The process of learning to associate two stimuli (associative learning; e.g. associating something with a memory)
• Respondent conditioning
• Pavlovian conditioning
Unconditional Reflex
An Unconditional (or unlearned) stimulus elicits an Unconditional (or unlearned) response
- Natural responses
- Donut (US) causes salivation (UR)
Conditional Reflex
Conditional (or learned) stimulus elicits a Conditional (or learned) response
- When two things are repeatedly paired (bell, donut, bell, donut), the bell alone comes to make you salivate, donut or no donut
Generating a Conditional Reflex
Step 1:
• Make administration of the US CONTINGENT on presentation of the novel stimulus
Neutral Stimulus (🔔) ➡️ US (🍩) ➡️ UR (🤤)
——— Time ——— >
as the neutral stimulus is repeated
it becomes the CS
Step 2:
• Present the CS (formerly the neutral stimulus) on its own
Conditional Stimulus (🔔) ➡️ Conditional Response (🤤)
Variables affecting Respondent Conditioning
AMOUNT of Exposure to the Contingencies (i.e., Number of pairings)
• In general, MORE exposure = GREATER conditional responding
• EARLY exposure produces MORE learning than later exposure
• i.e., Non-linear
• Conditional Responding is asymptotic
• Conditioning can occur at different rates
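The claims above (more pairings give more responding, early pairings add the most, responding levels off at an asymptote) can be illustrated with a simple learning curve. This is a sketch under an assumed update rule, where each pairing closes a fixed fraction of the remaining gap to the asymptote; the rule and the parameter values are assumptions for illustration, not something stated in the course material.

```python
def acquisition_curve(n_pairings: int, learning_rate: float = 0.3,
                      asymptote: float = 1.0) -> list:
    """Conditional response strength after each CS-US pairing.

    Assumed rule: strength += learning_rate * (asymptote - strength).
    A different learning_rate gives a different rate of conditioning.
    """
    strength = 0.0
    curve = []
    for _ in range(n_pairings):
        strength += learning_rate * (asymptote - strength)
        curve.append(strength)
    return curve


curve = acquisition_curve(10)
# Gain contributed by each successive pairing: largest first, then shrinking.
gains = [curve[0]] + [later - earlier for earlier, later in zip(curve, curve[1:])]
for pairing, (strength, gain) in enumerate(zip(curve, gains), start=1):
    print(pairing, round(strength, 3), round(gain, 3))
```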
Other variables affecting Respondent Conditioning
Intertrial-Interval (time between exposures)
• Interval between one CS-US exposure (a trial) and another CS-US exposure (a different trial)
• A long-term contingency is better because you can analyze the data more
Age
• Degenerative/health effects of aging
• Learning history
Conditioned Emotional Responses
An emotional response to a stimulus that is acquired through Respondent Conditioning.
• Classical conditioning in which the conditional response is an emotion
• e.g., Little Albert
• CERs can be positive OR negative