PSYC 361 MT2: OPERANT CONDITIONING (1) Operant Methods & Theories Of Rewards Flashcards
3 Key Elements in Instrumental Learning
1) Environment
2) Instrumental Behaviour
3) Consequence
Instrumental Learning Influences
- Timing of reward delivery
- Rules of reward delivery
- Type of rewards
- Other stimuli associated with rewards
Thorndike: The Law of Effect
- Devised puzzle box to study learning
- Connection formed between lever (S) & response (R) through many trials of cat pressing lever
- Learning = incremental, not insightful
The Law of Effect- SITUATION
Responses that produce satisfying effect in a situation = more likely to occur again in that situation
Responses that produce discomforting effect in a situation = less likely to occur again in that situation
The Law of Effect- STIMULUS
Response in presence of a stimulus followed by satisfying event, association between S & R STRENGTHENED
Response in presence of a stimulus followed by annoying event, association between S & R WEAKENED
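Sketch: Law of Effect as an incremental update (illustration only). The notes give no equation, so the linear update rule, the 0-1 strength scale and the `step` parameter below are assumptions chosen to show gradual strengthening/weakening of an S-R association, not Thorndike's own formulation.

```python
def update_strength(strength, satisfying, step=0.1):
    """Nudge an S-R association strength (0..1) after one trial.

    Assumed rule (not from the notes): move a fraction `step` of the way
    toward 1 after a satisfying outcome, toward 0 after an annoying one.
    """
    target = 1.0 if satisfying else 0.0
    return strength + step * (target - strength)


# Repeated satisfying outcomes: strength rises gradually, trial by trial
strength = 0.0
history = []
for trial in range(20):
    strength = update_strength(strength, satisfying=True)
    history.append(strength)

# One annoying outcome weakens the association again
weakened = update_strength(strength, satisfying=False)
```

Each trial changes strength only a little, which matches "incremental, not insightful" learning: there is no single trial where the association jumps to its maximum.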
Behaviourism: SKINNER
- Studied learning from a behaviourist perspective
- Coined “Operant” = OPERAtes on the environmeNT
(Instrumental conditioning & operant conditioning interchangeable)
Operant Conditioning
Reinforcement: behaviour INCREASES when it produces an APPETITIVE stimulus
Punishment: behaviour DECREASES when it produces an AVERSIVE stimulus
Skinner: stimuli as reinforcers & punishers
- Reward vs. Reinforcer: attractive & motivational property vs behaviour facilitator
Operant Conditioning: +/- CONTINGENCY
Positive: action leads to presentation of stimulus
Negative: action leads to removal of stimulus
Schedules for Reinforcement
Rules for when & how frequently reinforcers are delivered
continuous reinforcement schedule (CRF): every response = reinforcer delivery
partial reinforcement schedule (PRF): ratio, interval, fixed vs variable
Schedules for Reinforcement: PARTIAL REINFORCEMENT SCHEDULE (PRF)
Ratio Schedule: reinforcers delivered based on # of times response occurs
Interval Schedule: reinforcer delivered for the first response made after a set amount of time has elapsed
Fixed vs Variable:
- Fixed = # of responses/amount of time required is the same for every reinforcer
- Variable = overall average known, but # of responses/amount of time required for each reinforcer delivery is uncertain
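Sketch: the four partial schedules as decision rules (illustration only). Each class answers one question: "does this response, made at time t, earn a reinforcer?" The class names, the `respond(t)` method and the uniform distributions used for the variable schedules are assumptions for this sketch; real experiments may draw variable requirements differently.

```python
import random


class FixedRatio:
    """FR-n: every n-th response is reinforced."""
    def __init__(self, n):
        self.n, self.count = n, 0

    def respond(self, t):
        # t (time of response) is irrelevant for ratio schedules
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """VR-n: required # of responses varies, averaging n."""
    def __init__(self, n, rng=None):
        self.n, self.count = n, 0
        self.rng = rng if rng is not None else random.Random()
        self.required = self._draw()

    def _draw(self):
        # Assumption: uniform on 1..2n-1, which has mean n
        return self.rng.randint(1, 2 * self.n - 1)

    def respond(self, t):
        self.count += 1
        if self.count >= self.required:
            self.count, self.required = 0, self._draw()
            return True
        return False


class FixedInterval:
    """FI-s: first response at least s time units after the last reinforcer."""
    def __init__(self, s):
        self.s, self.last = s, 0.0

    def respond(self, t):
        if t - self.last >= self.s:
            self.last = t
            return True
        return False


class VariableInterval:
    """VI-s: like FI, but the required wait varies, averaging s."""
    def __init__(self, s, rng=None):
        self.s, self.last = s, 0.0
        self.rng = rng if rng is not None else random.Random()
        self.wait = self._draw()

    def _draw(self):
        # Assumption: uniform on 0..2s, which has mean s
        return self.rng.uniform(0, 2 * self.s)

    def respond(self, t):
        if t - self.last >= self.wait:
            self.last, self.wait = t, self._draw()
            return True
        return False


# FR-5: ten responses, reinforcer on the 5th and 10th
fr5 = FixedRatio(5)
fr_outcomes = [fr5.respond(t) for t in range(1, 11)]

# FI-10: only the first response after each 10-unit interval pays off
fi10 = FixedInterval(10)
fi_outcomes = [fi10.respond(t) for t in (3, 9, 10, 12, 19, 20, 25)]
```

On ratio schedules only the response count matters; on interval schedules extra responses before the interval ends earn nothing.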
Fixed vs Variable: RESPONDING PATTERNS TO SCHEDULES
VR: steady & robust responding (leads to strongest responding typically)
FR: post-reinforcement pause & ratio run
VI: steady & stable responding
FI: fixed-interval scallop
Extinction
Conditioned response diminishes due to lack of reinforcement; rate affected by previous reinforcement schedule (e.g. slot machines operate on VR, and VR-maintained responding is slow to extinguish)
- learning process; actions no longer produce rewards
- Adaptive: saves energy by reducing unnecessary behaviour
Primary vs Secondary Reinforcers
Primary: often biologically essential (food, water)
Secondary: stimulus previously paired with a primary reinforcer becomes reinforcing in its own right, aka Conditioned Reinforcer (lever, clicker, voucher)
4 Different Functions of Secondary (Conditioned) Reinforcers
- Reinforcing the learning of a new response
- Establishing & maintaining schedules of reinforcement
- Maintaining behaviour during extinction
- Mediating delays between response & delivery of reinforcement
Timing of Reinforcer Delivery
Temporal Contiguity: how soon reinforcer follows response
Immediate reinforcer delivery = max learning
Delays to reinforcer delivery discount the reinforcing effect
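Sketch: delay discounting as a formula (illustration only). The notes say only that delay reduces the reinforcing effect. The hyperbolic form value = amount / (1 + k * delay) and the value of `k` used here are assumptions, one common way of modelling this, not something stated in the notes.

```python
def discounted_value(amount, delay, k=0.1):
    """Assumed hyperbolic discounting: amount / (1 + k * delay).

    amount: size of the reinforcer
    delay:  time between response and reinforcer delivery (>= 0)
    k:      hypothetical discounting rate; larger k = steeper discounting
    """
    if delay < 0:
        raise ValueError("delay must be non-negative")
    return amount / (1 + k * delay)


# Same reinforcer, increasing delays: value only goes down
values = [discounted_value(10, d) for d in (0, 5, 10, 30)]
```

With no delay the reinforcer keeps its full value (immediate delivery = max learning); at delay = 1/k its value has dropped to half.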
Theories of Rewards
Rewards as Reinforcers: rewards are often called reinforcers because a response followed by reward strengthens the association between certain environmental conditions (S) & that response (R)
Rewards as Incentives: the anticipation/expectancy of reward arouses incentive motivation
Theories of Rewards- RESPONSE THEORIES
Skinner: focuses on functional aspect of reinforcers; any stimulus following a response that increases the probability of that response recurring = reinforcer
Approach has considerable practical utility, but it is often difficult to determine what will be a good reinforcer for a given person in a given situation
Theories of Rewards- RESPONSE THEORIES: PREMACK’S PRINCIPLE
Premack’s Principle: if 2 responses are arranged in an operant conditioning procedure, the more probable response will reinforce less probable response; less probable response will not reinforce more probable response
Responses occur @ different probability; what is reinforcing is relative, not absolute— dependent on probability of responses (rats deprived of water/wheel)
Momentary Probability: probability of the behaviour at a given time in a given situation; reflects “value” of behaviour, can be manipulated by deprivation or size of reward
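Sketch: Premack's Principle as a comparison of probabilities (illustration only). The function name and the probability values are hypothetical; they stand in for measured baseline (momentary) probabilities of drinking and wheel running under the two deprivation conditions.

```python
def can_reinforce(p_reinforcer, p_target):
    """True if the candidate reinforcing behaviour is more probable
    than the target behaviour it is meant to strengthen."""
    return p_reinforcer > p_target


# Hypothetical momentary probabilities under each deprivation condition
water_deprived = {"drink": 0.8, "run": 0.2}
wheel_deprived = {"drink": 0.2, "run": 0.8}

# Water-deprived rat: drinking can reinforce running, not the reverse
drink_reinforces_run = can_reinforce(water_deprived["drink"], water_deprived["run"])
run_reinforces_drink = can_reinforce(water_deprived["run"], water_deprived["drink"])

# Wheel-deprived rat: the relation flips
flipped = can_reinforce(wheel_deprived["run"], wheel_deprived["drink"])
```

The same behaviour is a reinforcer in one condition and not in the other, which is the sense in which reinforcement is relative, not absolute.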
Theories of Rewards- MOTIVATIONAL THEORIES
Based on homeostatic model: drive from need, energizes behaviour to reach goal
- need —> drive —> activity —> goal —> reduced drive —> reduced activity
Drive Reduction Theory (Hull): any behavioural outcome that reduces drive is reinforcing
Theories of Rewards- MOTIVATIONAL THEORIES: NEED REDUCTION & DRIVE STIMULUS REDUCTION
(Miller, Kessen): hungry rat either drank milk or had milk directly injected into stomach during T-maze learning; drinking was the better reinforcer
Injecting milk more reinforcing than injecting saline solution
If only need reduction true, drinking/injecting milk should be equally reinforcing
Drinking milk reduced both drive stimulus intensity & need; injecting milk only reduced need (saline neither)
Theories of Rewards- MOTIVATIONAL THEORIES: DRIVE REDUCTION THEORY
Evidence:
- milk vs saline (Miller, Kessen)
- pain avoidance: press bar to avoid shock
- fear reduction: escape in fear-eliciting environment
Evidence Against:
- events that do not reduce drive still reinforce
- self-stimulation of brain (Olds, Milner)
- monkeys work for “sensory experience”
Are rewards necessary for learning?
2 assumptions behind the view that rewards act as reinforcers
1) learning is an associative process
2) role of rewards is to form &/or strengthen associations
- if learning is not associative, or reinforcement is not necessary for learning, these theories are challenged
Latent Learning
Hungry rats learned to run maze whether rewarded or not
Reward made animals run maze faster
Non-rewarded rats still completed maze
Introduction of reward improved performance of previously non-rewarded rats
Latent Learning: learning occurred, but not manifested until reward introduced
Latent Extinction
Extinction of previously rewarded response can occur without performance of response in absence of reward
- rats learn to run down runway to goalbox for reward; if rat is 1st placed in empty goalbox & then allowed to run down runway to empty goalbox (extinction), responding extinguishes faster
- suggesting rat has learned not to expect reward, without having performed the response
Theories of Incentive Motivation
Alternate view to rewards as reinforcers: the anticipation/expectancy of reward arouses incentive motivation, a drive state which prompts us to engage in activities that lead to rewards
Way in which objects & events in environment can acquire high motivational value & drive behaviour, even in absence of clear biological need
Incentive Shifts
Animals respond “better” (faster, more vigorously, more accurately) for bigger rewards
—> do better/more motivated to perform?
Support for Motivational Hypothesis: animals learn to respond for either a small or a large reward & are then shifted from small to large or from large to small
- performance changes to appropriate levels but changes too fast to be explained by learning
Behavioural Contrast
Contrast Effects: the enhancement/diminishment, relative to normal, of perception, cognition & related performance as result of immediately previous/simultaneous exposure to a stimulus of greater/lesser value in the same dimension
Deprivation Effects
Incentive motivational view on deprivation: deprivation does not directly energize the behaviour; rather, it increases incentive motivation by making anticipated incentives more attractive/valuable
- Tolman: only hungry animals show latent learning; for sated animals, food has little/no incentive value
- Alliesthesia (Cabanac): hunger makes food a better incentive, thirst makes water a better incentive, etc., reflecting the palatability of the reward
Summary of Operant Conditioning
Instrumental behaviour is goal-directed & can be modified by learning procedure
Operant method allows for studying learning in freely behaving animals
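Various factors influence instrumental learning (timing, rules & type of rewards, other stimuli associated with rewards)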
- Each reinforcement theory can explain some, but not all instrumental learning phenomena
- Learning & performance may be differentially influenced by motivation; motivation is complicated, so better theories may be needed