final Flashcards
conditioned inhibition
the CS announces no US
conditioned inhibition example
good response > click!; wrong response > no, thank you!
conditioned inhibition dog trainers
call this a “non-reward marker” — something signals nothing will happen (good or bad)
safety signals
can be used to calm down — when this is present, nothing happens
conditioned inhibition definition
procedure in which a CR elicited by a CS is inhibited when another concurrently trained CS signals the absence of a US
CS can be…
excitors or inhibitors — these constructs are crucial in our discussion of motivation (they can either slow things down or accelerate things)
conditioned excitation
CS being associated with a US
Method 1 (intuitive): differential inhibition or discriminative inhibition procedure
classical conditioning: simply pairing a CS with a US or not (as with habituation/dishabituation). Based on a pairing
differential/discriminative inhibition pairings
CSa <> US
CSx <> no US
CSa = clicker, CSx = note
differential/discriminative inhibition animal responds to
Animal responds only to CSa…in other words, it can discriminate between the two
differential/discriminative inhibition: two sounds
What if CSa is C (musical note “do”) and CSx is G (musical note “sol”)? Trying to discriminate between two sounds: condition one with food and the other without. If the animal salivates, it heard C, therefore not G — it discriminates between the two sounds.
Method 2 (could be competing): conditioned inhibition procedure
getting two reinforced stimuli to compete
conditioned inhibition procedure trials
US is presented (reinforced trials): CSa > US
US is omitted (non-reinforced trials): CSa + CSx > nothing
conditioned inhibition outcome
Subjects respond to CSa
Subjects do not respond to CSa + CSx
Subjects also do not respond to CSx (= conditioned inhibitor!)
CSx by itself signals no US (food)
conditioned inhibition CSx
CSx on its own never predicted the food.
conditioned inhibition designed by
Pavlov
information value in conditioning
CS provides information on the US
rescorla experiments
contingencies between CS (tones) and US (shock)
rescorla experiments groups
3 groups:
positive, zero, negative contingency
what is key in rescorla
probability of CS followed by US
rescorla positive contingency
if CS is followed by US in a relatively predictable way — excitation
rescorla zero contingency
completely unpredictable
rescorla negative contingency
inhibition — the reverse of positive contingency. If the US is not immediately preceded by the CS (almost like backward conditioning) — inhibition
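A minimal sketch of the contingency idea in code (the probability values are illustrative assumptions, not from the slides): compare P(US | CS) with P(US | no CS).

```python
# Hedged sketch: Rescorla's contingency as a comparison of two conditional
# probabilities. The probabilities passed below are made up for illustration.
def contingency(p_us_given_cs: float, p_us_given_no_cs: float) -> str:
    """Positive delta -> excitation; negative -> inhibition; zero -> no learning."""
    delta = p_us_given_cs - p_us_given_no_cs
    if delta > 0:
        return "positive contingency (excitation)"
    if delta < 0:
        return "negative contingency (inhibition)"
    return "zero contingency (no learning)"

print(contingency(0.8, 0.2))  # CS predicts US -> excitation
print(contingency(0.5, 0.5))  # US equally likely either way -> no learning
print(contingency(0.2, 0.8))  # US less likely after the CS -> inhibition
```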
appetitive conditioning
cues can announce something pleasurable
appetitive conditioning example
Coyotes associate the human scent or voice with food (litter on trails)
aversive conditioning
Cues can announce something not pleasurable
aversive conditioning example
Humans use human scent with hazing methods to scare the coyotes away
powerful appetitive USs
food, sex, drugs
aversive control, appetitive conditioning
Get coyotes afraid of what they are supposed to avoid. Coyotes predicted the presence of food from humans; food and humans became the same thing — this classical conditioning was the problem. It became operant when the coyotes actually acted on the human to obtain the food (an attack).
aversive conditioning, hazing program
A hazing program may not be successful with coyotes that have already formed that association. It could be useful in young coyotes, to begin conditioning them that humans are dangerous. You have to be consistent in these pairings.
main method of extinction
you stop pairing of CS-US
is extinction unlearning or forgetting?
Yes, but it is also learning a new response — learning what happens when the CS is followed by a US of one nature versus another.
extinction is actually
Responses are suppressed or not expressed
extinction is likely just..
a suppression of the response. It is not permanent; if the right conditions come back together, the response returns
renewal effect
CS-US pairing in context X
Extinction in context Y — now bringing the organism to a different environment on purpose to extinguish the response
CS in context X > renewed
renewal effect two associations
- CS — US
- CS — no US
extinguishing is…
context specific — extinction must be done in the context of acquisition
extinguishing is important in
therapy: e.g., a client might be able to handle the feared stimulus in the therapist’s office but not in the real environment
If you acquired a fear of explosions in war, being exposed just to the sound in a clinic may not help you
extinction is NOT
loss of information
compound conditioning/CS
The presentation of 2 or more CSs, either
- simultaneous
- serial
simultaneous compound CS
CSs presented at same time
serial compound CS
CSs presented in a sequence
compound conditioning example
Tone + light > US (food)
This will condition normally, as if one S
But, if later, the tone is tested alone…the CR won’t be as strong — the compound stimulus will be stronger
If you really want a good response to the tone, condition the tone alone
isolating one part of compound conditioning suggests that
cues CSs are competing
compound conditioning hints at
the importance of having salient cues
salient cues
Salient cues = more intense. The more intense cue is more likely to get conditioned and be noticed
importance of saliency related to what theory
mackintosh attentional theory of conditioning
CS competition — how to win the competition?
- Be more noticeable > overshadowing (because of saliency)
- Be there sooner > blocking (because of temporal priority) — when something happens first, in the sequence of conditioning
overshadowing
- More salient cues get more learning in compound conditioning, still less than if trained alone
- This brings us back to the idea of the information value of cues (a cognitive interpretation)
- If you really want to have a clear effect of one CS, try to isolate that completely and train it alone
Neural networks/information theory involved here in classical conditioning
compound conditioning example
CSa + CSb > US is compound conditioning, ie. competing CSs
when two CSs compete, if one is more salient it will…
overshadow the less salient ones
compound conditioning analogy
Picture book for kids. The book contains text and pictures, but the pictures will most likely overshadow the text
In other words, learning to read this way is not optimal
compound conditioning spatial learning
Similarly, in spatial navigation, landmark cues may overshadow more general spatial cues — most of the time when you navigate proximal cues influence you most. The most important cues are the most salient
compound potentiation
This is the opposite of overshadowing
In this case, there is more conditioning to a weak conditioned stimulus if it is combined with a more salient conditioned stimulus during conditioning
compound potentiation example
The conditioning of a weak odour may be strong if it is combined with a salient taste during conditioning
Eg. Gadbois’s aversion to even the smell of whisky, from his taste aversion to whisky
Taste and odour go together — they are intimately linked stimuli. Smell is usually the early CS that something good/bad is coming.
kamin’s blocking effect example scenario
Scenario: you visit your in-laws every Sunday. Your FIL offers an anise-based liqueur, and you drink it every time even though it makes you sick. One day, he offers you bagels with the anise.
Will you develop an aversion to bagels? No. Even if you get a bit sick, you will attribute your malaise to the anise not the bagels.
kamin’s blocking effect assigned variables
CSa: anise
CSb: bagels
kamin blocking
you were exposed to the aversive CS first, then the bagel (neutral) was combined with it. Only the aversive CS maintains the negative response.
kamin original experiment
The experimental group gets the pairing as described above
When bagels are presented without anise (Test phase) almost no CR
Less response (CR) in the experimental group than the control group
kamin what does prior conditioning do
prior conditioning with the anise has blocked conditioning to the bagels
Basically, the anise already predicted the sickness. Adding the bagels did nothing.
kamin what is the control group
the basic compound conditioning explained previously
original experiment control group
compound conditioning group
original experiment experimental group
pre-trained
table in slides
basics of classical conditioning slide 56
Is overshadowing different from blocking?
Yes — overshadowing occurs in only one conditioning phase (there is no separate pre-training phase, unlike blocking)
overshadowing application
See application mentioned by Boynton for chemotherapy: scapegoating
Give patients a salient new flavour just before chemo (eg. Ginger candy)
This new flavour will overshadow any other flavour eaten in recent meals (just before the chemo)
The ginger candy becomes the scapegoat…problem is…the patient may never be able to stomach ginger candies in the future
information provider
- The CS-US association is an information provider unless the informational value is blocked
- to be useful, a cue needs to be non-redundant (eg. Needs to be a non-redundant predictor)
- Redundant cues are typically not very informative
learning only occurs if…
the CS provides new information on the US
CSs may
compete with each other
a more informative CS will win over
a less informative CS
Two areas of experimental psych have produced solid quantitative models, even “laws”
- Psychophysics and the study of sensation and perception
- Learning theory and the study of conditioning and learning
The Rescorla-Wagner model
- the best-known mathematical model in experimental psych; it has been applied beyond learning theory
- Tries to explain blocking
- A lot about the unconditioned stimulus
The Mackintosh model
- A lot about conditioned stimulus
- Attentional theory (saliency)
The Pearce-Hall model (including Kaye & Pearce)
US and CS modulation, attention, negative transfer
Wagner’s original model (SOP) and AESOP
Priming (Attentional priming), affective dimension
Rescorla-Wagner model
We know that cues compete with each other for associative strength
We know and will reaffirm that contiguity (in time and space) is not the full story. Information produced by the cues needs to be:
Reliable
Non-redundant
Rescorla-Wagner model counterintuitive
the difference between phase 1 and phase 2. All experiments in learning theory have an acquisition phase and what comes after. In the acquisition phase, you do want the CS/US pairing to be tight and predictable. Later, if the CS always announces the US, your animal loses attention (redundancy). Then work with the saliency of the CS/US, or make the pairing less predictable. Uncertainty is what facilitates learning after a while (NOT initially)
error detection and error correction
- Prediction error: The difference between what is predicted, and what actually happens
- Error detection: Ability to detect errors or discrepancies between the predicted and actual occurrences (detect prediction errors)
- Error correction learning: Similar to trial-and-error learning. Learning to reduce the error or discrepancy between predicted and actual occurrences
error correction learning
- This type of learning applies to classical conditioning and many other types of learning, ie. skill and motor learning
- This includes prediction error and error detection
pavlovian classical experiment follows this scheme:
Bell — Food; Salivate (CR) — Salivate (UR)
Or
CS — US; CR — UR
when conditioning to one CS generalizes to another we have…
generalization
generalization graphic
Historically: two views on classical conditioning
SR learning
SS learning
SR learning
CS-UR ~ note that this form of learning is known to occur in some cases
SS learning
US » CS = stimulus substitution theory or “cognitive” theories of classical conditioning; this view was confirmed by Rescorla in 1978.
SR vs. SS learning graphic
Is it possible to respond to a CS (e.g., tone) that was never paired with the US (e.g., food)? (method)
One method can help answer this question: US devaluation
Is it possible to respond to a CS (e.g., tone) that was never paired with the US (e.g., food)? (forms of learning)
- Second order or higher order conditioning
- Sensory pre-conditioning
conclusion about US-CS?
association is essential!
figure from textbook?
???
is sign tracking resistant to outcome devaluation?
Sign tracking but not goal tracking is resistant to outcome devaluation
We will see later that cues paired with a US/reward can…
gain “incentive salience”
what happens if you get a temporal separation of the CS and US in classical conditioning
you will decrease sign tracking, but increase goal tracking.
second-order conditioning and sensory preconditioning graphics
conclusion, learning can occur…
without a US > pro-S-R learning
second order conditioning example
You get bitten (US) by a dog (CS) » Fear (CR)
Then later you associate the park (CS2) with dogs and get fearful (CR) entering the park
second order conditioning graphic
sensory preconditioning graphic
sensory preconditioning appears to be…
pro S-S learning (see below: the target CS is never paired with the US)
sensory preconditioning analogy
“Guilt by association” analogy
- Peter and John hang-out together. Peter is arrested for using cocaine. You assume John is using cocaine.
what happens with sensory preconditioning
Two neutral stimuli get associated: the target CS (tone) never gets paired with the US, yet comes to evoke the CR.
In sensory preconditioning, the two stimuli are paired before the light can even elicit a CR. By that logic, it is a case of S-S learning.
sensory preconditioning is therefore
pure S-S learning
generalization
concise def
responding to a new stimulus because it resembles the trained CS (the CR transfers to similar stimuli)
higher (second) order conditioning
concise def
a second CS (CS2), paired only with an established CS1 and never with the US, comes to elicit the CR
sensory preconditioning
concise def
two neutral stimuli are paired first; one is then conditioned to a US, and the other then elicits the CR despite never being paired with the US
methods in classical conditioning: motor responses
- Proboscis Extension Response (PER) conditioning to artificial odours in bees
- Eyeblink conditioning in rabbits
methods in classical conditioning: emotional responses
- Fear conditioning in rats
- Autoshaping in pigeons
methods in classical conditioning: motivational responses
- Appetitive conditioning in rats
- Taste aversion learning
eyeblink conditioning in rabbits associations
UR = eyeblink
CR = eyeblink
US = puff of air
CS = tone/light
fear conditioning/threat conditioning/conditioned emotional response/conditioned suppression associations
UR = fear (bar pressing is the baseline behaviour, from prior operant training)
CR = fear/suppression of bar pressing
US = shock
CS = tone/light/noise
conditioned fear
the fear response is often “freezing”.
the CS suppresses the bar pressing, therefore
The response (pressing the bar) is suppressed.
in conditioned fear, the suppression is
the dependent variable, i.e., a measure of conditioned fear » suppression ratio
suppression ratio
SR = responding during the CS / (responding during the CS + responding pre-CS)
When the CS does not change the bar pressing rate, the S.R
= 0.5. A complete suppression is 0.
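A minimal sketch of the computation (the response counts are hypothetical):

```python
# Hedged sketch of the suppression ratio; response counts are made up.
def suppression_ratio(presses_during_cs: float, presses_pre_cs: float) -> float:
    """SR = responding during CS / (responding during CS + responding pre-CS)."""
    return presses_during_cs / (presses_during_cs + presses_pre_cs)

print(suppression_ratio(20, 20))  # 0.5 -> the CS does not change bar pressing (no fear)
print(suppression_ratio(0, 20))   # 0.0 -> complete suppression (strong conditioned fear)
```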
fear conditioning is an important concept in…
CBT for fear and anxiety disorders, even some elements of PTSD.
conditioned emotions are important in…
motivational aspects of instrumental conditioning
autoshaping in pigeons: sign tracking
focus on the CS’s, or cues predicting the US.
autoshaping in pigeons: goal tracking
focus on the USs
autoshaping therefore is actually
sign tracking
autoshaping is a convenient way to train animals in
Skinner boxes. Alternative: successive approximations
signal-food conditioning protocol
- You light the “pecking key” for a few seconds, then the pellet magazine opens.
- After a number of pairings (45 or so), the pigeons will peck at the key (it announces the delivery of food, and this is sign tracking — the cue becomes associated with the food).
Note: the food delivery is not contingent upon the pecking. The pigeon does not have to peck the key.
appetitive conditioning in rats
- This is the “magazine approach procedure”: CS (any) - US (food)
- The approach is more likely to happen when the CS is presented. This is goal tracking as defined earlier.
taste aversion learning associations
UR = nausea
CR = nausea
US = drug injection (emetic drug or nausea/vomiting inducing)
CS = flavour S (saccharin)
taste aversion learning CS/US
CS-US can be separated in time by hours and still the conditioning will take place.
taste aversion clear and dramatic example
Clear and dramatic example in humans: observed with chemotherapy.
strength of conditioning factors
Time / timing
Novelty of CS and US
Intensity of CS and US
Pseudoconditioning
four main types of CS/US associations
- Delay conditioning
  - Short-delay
  - Long-delay
- Trace conditioning
- Simultaneous conditioning
- Backward conditioning
- Other factor: Trial spacing
types of conditioning graphic
conditioning works better if…
the CS occurs before the US.
the CS _____ the US
announces
does backward conditioning work well?
typically no
delay conditioning
The interval of time depends on the type of conditioning.
delay conditioning example
Example: Eyeblink (very short) vs taste aversion (very long) conditioning.
trace conditioning
Gap between CS and US = “trace interval”
is trace conditioning good
Fine procedure, but decreased efficiency with increased delay.
trace conditioning possible issues
- The memory of the CS.
- Inability for the animal to differentiate between the trace interval from the time between trials.
simultaneous conditioning
Full temporal overlap of the CS and US.
simultaneous conditioning validity
Usually not great results, but some exceptions.
backward conditioning
May be a signal of “no US” (i.e., a conditioned inhibitor).
backward conditioning may be associated with…
with relief in fear conditioning.
= “safety signal”
conditioning works best if…
trials are spaced-out over time (spaced trials); The ITI is crucial.
ISI?
interstimulus interval
ITI
intertrial interval
are massed trials effective
no
summary of conditioning
Short ISI’s
Long ITI’s
ratio between ITI and duration of CS graphic
novelty of the CS and US is…
very important: Pre-exposure to CS and US before conditioning can interfere with learning.
CS pre-exposure
“latent inhibition”. You can habituate to a potential CS with repeated exposures.
US pre-exposure
Randich & LoLordo: US pre-exposure will delay subsequent conditioning. You can habituate to a potential US with repeated exposures.
CS and US pre-exposure both have
real-life implications, e.g., for fear/phobia acquisition.
intensity of the CS/US: US
The CR will be strong if the US is strong. So the intensity of food or shocks, etc., will influence the magnitude of the response.
intensity of the CS/US: CS
Strong salient CS’s will positively influence learning as well. So a strong tone, or flash of light, etc., will influence the magnitude of the response.
what is best for intensity of CS/US?
With CS’s salient, but not scary (overpowering) is good.
Counterfeit (fake) conditioning: Pseudoconditioning and sensitization
: What if the CS elicits a response like the one that you are trying to condition?
(Example)
You use CS = light.
Response (UR) = blinking.
Natural blinking or conditioned blinking?
So what is going on? » impossible to tell
Counterfeit (fake) conditioning: Pseudoconditioning and sensitization : culprit 1
Sensitization
Counterfeit (fake) conditioning: Pseudoconditioning and sensitization: culprit 2
Pseudoconditioning
counterfeit/pseudo: sensitization example
Blinking for the camera. Initially a natural response, can get sensitized and anticipated.
counterfeit/pseudo: pseudoconditioning example
Mere exposure to the US!
“Increased responding that may occur to a stimulus whose presentations are intermixed with presentation of a US in the absence of the establishment of an association between the stimulus and the US”. (Domjan, 2015) Experiments need to control for both sensitization and pseudoconditioning.
Rescorla and Wagner model: If no CS or a novel CS is followed by a US
- The US will be unexpected
- You have a positive prediction error
- The larger the error, the greater the learning!
Rescorla and Wagner model: If a well learned CS is followed by the expected US
- The US is expected…
- There is no prediction error (because there is no discrepancy)
- There is no learning.
Rescorla and Wagner model: If there is a CS-US association established, but then no US occurs
- The absence of the US will be unexpected
- You have a negative prediction error
- There is a decrease in CS-US association
- This is extinction
What happens if you train a dog so that every time the CS is produced, the US follows (clicker training, phase 1)?
The theory is that technically the CS (clicker) itself becomes reinforcing. After a while the dog might not even need the treat; the click is reinforcing enough. But if the association between the US and CS is weakened, the clicker would not remain as good a predictor over time and would lose its reinforcing value.
Rescorla and Wagner ___ predict that the CS can be the ____ itself
do not, reward
rescorla and wagner…
do not think clicker training would work if the clicker is used in isolation
Basic assumptions of RW
- Each CS has an association weight, or strength of association, with the US.
- Cues (CS’s) compete for associative strength.
- This tells us that the information, or surprise value, of the stimuli involved is important
key element in RW
the element of surprise!
what is surprise RW?
- What is surprise? It is what is unexpected.
- Part of this may be about attention — how much attention are you giving to elements in that model
- Either you think it’s coming and it’s not, or vice versa, or it is a completely novel experience
premise of surprise RW
The US needs to be “surprising”. With time, the US, in any conditioning sessions, becomes less and less “surprising”.
why is surprise needed?
Why? Because the CS becomes a solid predictor of the US.
example of CS being solid predictor of US
clicker training in dogs
learning wont occur anymore when…
the CS predicts the US perfectly.
suprise clicker training example
In clicker training, your treat pouch should be full of a whole bunch of treats of different value (surprise element). Slight variation of the US is good — this is for appetitive learning.
This could apply to a person as well. If a CS announces a shock, the same principles apply, especially if it is always the same small shock. If you start varying the intensity of the shock, the person will start paying attention again.
If the CS/US link is highly predictable there is…
no learning; continued learning depends on not being exactly sure how things will unfold
what happens with repetition in learning
Repetitions get boring after a while; if you can always predict things, you start getting bored. That means, though, that you have already learned the task. But how do you keep a strong response/motivation? This is where unpredictability might help.
- You don’t want the response to start to extinguish
- But if you maintain the link between CS/US it shouldn’t extinguish
how do you avoid extinguishing the connection
maintain the link between CS/US
Does it matter if “learning does not occur anymore”?
you are maintaining the strength of the association… so maybe not.
Does that mean that you can drop the US?
- Then you create a negative prediction error (see above)
- There is a decrease in CS-US association…
- So a US is important (at least some of the time…)
what is important in surprise/learning
Uncertainty/intermittent reward is good for this
If you drop the US you get…
a negative prediction error, you are affecting the CS/US relationship
mathematical model, a.k.a
the neural network-like model:
mathematical model: the peak of learning is
the associative strength between the US and the CS.
how is associative strength developed
Based on strengthening the relationship between the CS and US. The more you do click/treat, the more the dog learns over time that the click predicts the treat.
learning ceiling is denoted by ____, it is
This learning ceiling effect is called lambda (λ); it is the asymptote of the learning curve.
what is the learning ceiling
When dog completely knows that click means treat
The asymptote is very much determined by the value of the US (whether it is good or bad)
If it were about shocks, a more intense shock probably gets you to a higher lambda.
what is really important in lambda
It is all about the US — that is the important part of this. The value of the US is very important.
the associative strength of the US and CS is V (y axis) =
predictive value
the asymptote is determined by
the value of the US
Change in associative strength (formula)
ΔV = αβ(λ − V)
α
CS = α (it comes first) = salience of the CS
β
US = β = salience of the US
V
V = associative strength = learning parameter = how well the CS predicts the US
λ
λ = ceiling effect (how much the animal is going to learn) — stands for the US
Δ
Δ = delta = change
(λ − V)
(λ − V) = surprisingness of the US = prediction error
ΔV = αβ(λ − V) is pretty much
the Rescorla Wagner model
(λ − V)
The difference between the US and what the CS predicts (λ − V) is called the prediction error.
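A hedged sketch of the update rule in code (the parameter values are illustrative assumptions, not from the slides). It also shows extinction, where λ drops to 0 once the US is omitted:

```python
# Hedged sketch of the Rescorla-Wagner update, dV = alpha * beta * (lambda - V).
# alpha = 0.3 and beta = 0.5 are assumed values for illustration.
def rw_update(v: float, alpha: float, beta: float, lam: float) -> float:
    """One trial: V moves toward lambda by a fraction of the prediction error."""
    return v + alpha * beta * (lam - v)

v = 0.0                    # no associative strength before conditioning
for trial in range(10):    # acquisition: CS -> US on every trial, lambda = 1.0
    v = rw_update(v, alpha=0.3, beta=0.5, lam=1.0)
    print(f"acquisition {trial + 1}: V = {v:.3f}")  # negatively accelerated curve toward lambda

for trial in range(10):    # extinction: the US is omitted, so lambda = 0.0 and V decays
    v = rw_update(v, alpha=0.3, beta=0.5, lam=0.0)
    print(f"extinction {trial + 1}: V = {v:.3f}")
```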
as trials of CS-US associations accumulate
the CS becomes a better and better predictor of the US.
the more pairings of the CS/US…
the easier it is for the animal to learn
with time, the prediction error becomes
smaller and smaller
classical conditioning =
adjusting (reducing) the prediction error
If you have a large (magnitude) US (β), lambda will be…
larger too (higher asymptote)
More likely to hit the lambda with…
a strong/salient US
So how can you get the blue line in clicker training? (larger magnitude)
- Use a strong US: e.g., in dog training, liver treats instead of regular kibble
- Likewise, it is easier to get nervous about the possibility of a shock when the shock is very painful
can lambda move?
Lambda can move (you can bypass the ceiling of 1.0), all depending on the strength of β
if the CS is really salient (α)…
the learning will be faster (i.e., it affects the rate of learning).
how can you get faster learning
get a salient CS (α)
salient CS example
e.g., in dog training, a stronger, louder, better defined (crispier) clicker sound.
strong CS, great US =
lot of learning and fast learning.
can boredom occur over time
With time it is possible things get boring; boredom can take place. It is technically possible that after a while these pairings get boring. Then you could replace the clicker with a whistle or something to add an element of surprise/novelty.
A strong US (liver treats), or β…
will impact the associative strength/asymptote or lambda (λ).
A salient CS (crisp click), or α…
will impact the rate of learning.
is it possible for other CSs to be present during conditioning?
yes
If 2 or 3 CS’s are present…
they all contribute to the conditioning (and may compete… as you may remember with blocking)
how did they add the sum of associative strengths to the formula?
This is why they added sigma (Σ) to the formula — the sum of the associative strengths — acknowledging that multiple CSs contribute to, and compete for, the total associative strength.
when might a number of associative strengths be at play
In the acquisition of a traumatic experience it might not always be clear what caused it, but rather an association of many things
compound conditioning is related to
the potentiation effect (see compound potentiation above)
formula with compound conditioning
ΔV = αβ(λ − ΣV)
what does sigma refer to
the sum of the associative strengths of all CSs present
the Kamin blocking experiment: phases
- Phase 1: The light alone is paired with the shock (pre-training)
- Phase 2: The tone competes with the light
- Phase 3: The tone loses (almost no CR to the tone at test)
why does the kamin experiment play out that way
Why? The light already accurately predicts the US (shock).
- The light already acquired the strength of the CS/US pairing
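A hedged sketch of blocking using the compound rule ΔVi = αi·β(λ − ΣV); the saliences, β, and trial counts are illustrative assumptions:

```python
# Hedged sketch of Kamin blocking under the Rescorla-Wagner compound rule.
alpha = {"light": 0.3, "tone": 0.3}   # assumed, equal saliences
beta, lam = 0.5, 1.0
V = {"light": 0.0, "tone": 0.0}

for _ in range(30):          # Phase 1: light alone -> shock; V_light climbs toward lambda
    V["light"] += alpha["light"] * beta * (lam - V["light"])

for _ in range(30):          # Phase 2: light + tone -> shock; both share one prediction error
    error = lam - (V["light"] + V["tone"])
    for cs in V:
        V[cs] += alpha[cs] * beta * error

print(V)  # V_light near 1.0, V_tone near 0.0: the pre-trained light blocks the tone
```

Raising lam in phase 2 (a bigger US) would make the shared error positive again — the unblocking idea covered below.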
Medical diagnosis of dietary (GI) intolerance example
Phase 1: Patient drinks milk » stomach ache
Phase 2: Patient drinks milk and has garlic » stomach ache
Phase 3: Could the garlic cause the stomach ache?
Diagnosis: “Unlikely, eat garlic, but avoid milk”
Think of this example: Medical diagnosis of dietary (GI) intolerance – implication (garlic/milk)
Implication (cognitive): Clearly the physician is thinking of a milk allergy here. But… if she/he had considered that both milk and garlic are common triggers for IBS (therefore a milk — and garlic — intolerance, not a milk allergy), then garlic would have been seen as a potential player.
- Have to test things one by one. If you start combining things you will never be able to isolate the cause — not just because of the metabolic response, but also the chance of classical conditioning
Suppression ratio
There is less learning with a high suppression ratio.
If a new stimulus is presented with an other that was previously conditioned…
then little or no conditioning will occur (resulting in a high suppression ratio).
Remember that you initially pair a tone or noise (CS) to a shock (US).
Then, the same noise AND a light are paired to the shock (US).
the outcome is…
that the response to the light is not well acquired (high suppression ratio).
what did the RW model address
The Rescorla-Wagner model explained blocking — a big victory. However, it does not explain a lot of other things.
how did the RW model explain blocking
By showing that the second stimulus does not acquire associative value in the second phase
how to unblock
modulate the US (saliency of US is the focus of the model)
UNBLOCKING =
Increasing the US (shock) in phase 2.
neural network explanation
???
what decreases during extinction
In other words, V, or the associative strength, will decrease during extinction
what happens to lambda in extinction
So lambda becomes 0 (there is no US anymore)
what happens when you reintroduce a US
There is very strong spontaneous recovery as soon as you reintroduce the US. What may actually work is to re-pair the clicker with the food (as a marker or keep-going signal). The fear is that if you never pair the clicker with the US again, it may change the meaning of the click — the meaning of the click is how it was associated with the US. What if sometimes you click and it is not followed by food, and sometimes you click and it is? The model does not address that.
Conditioned inhibition à la Pavlov:
L (CS) → Shock (US)
L (CS) + N (CS) → no US > the noise becomes an inhibitor
Protection from extinction
If you combine an inhibitor with an excitor during extinction, the inhibitor will protect the excitor from losing associative strength (V).
what does protection from extinction help to explain
This may explain relapse in exposure therapy for fear or phobias — you would try to extinguish the link
in exposure therapy for fear/phobias you would…
extinguish excitors. While doing so, you may have cues that become inhibitory.
initially during fear extinction (with an inhibitor present)…
it looks like you have no more fear. But if later you test the excitor by itself, then fear comes back.
why does that occur during fear extinction?
- Well remember the concept of surprise.
- The noise (inhibitory) predicts no US (no fear). But if you present the light again… fear is back!
- Surprise is the key for learning!
- In some cases you can have two excitors
what happens if you have two excitors?
- Then you have a very efficient extinction!
To get very efficient extinction trials…
combine the CS with other excitors (not inhibitors).
negative contingency
US is less probable in the presence of the CS
- This scenario leads to inhibitory conditioning of the CS
neutral contingency
US is equally probable in the absence or presence of the CS
- This leads to no learning
Anytime you have a CS or US that is conditioned
context is always processed at the same time (comparator theory)
Mackintosh and Turner (1971)
How the attentional theory of conditioning was demonstrated (attentional learning in classical conditioning)
mackintosh and turner groups
Group 1 (control): noise followed by shock; no phase 2; then light + noise followed by a much bigger shock; then test the light and see what happens
Group 2 (experimental): noise followed by shock; then light + noise followed by shock (phase 2); then light + noise followed by a much bigger shock; then test the light
what happened in mackintosh and turner experiment
- Based on Rescorla and Wagner, the groups should not have differed, but the control group (1) shows more conditioning (remember, with the SR, low scores indicate more conditioning)
- Group 1 (control) learns because of unblocking, i.e., the use of a larger shock.
- Group 2 (experimental): The LN - Shock trial interfered with the learning.
- Group 2, phase 2: Subjects learn that the light (L) is redundant, so they pay less attention to it!
what were mackintosh and turner trying to show
that we learn to ignore redundant predictors of a US, good or bad.
what do we learn to do with redundant predictors of a US
learn to ignore them!
So it may not be about the ineffectiveness of the ___ (Rescorla-Wagner), but rather the ineffectiveness of the __ (Mackintosh).
US, CS
so how well does the CS predict the US?
If the CS is a good predictor, then we pay attention to it. If not… we don’t
Attention paid to the CS
how well the CS predicts its consequences
___ is the predictor
CS
Mackintosh model is all about the…
attention we pay to the CS
the more ___ the ___, the faster the learning.
salient, CS
moral of the story
US and CS are important
Group dog clicking classes — how does it work if everything you hear in the room is click?
Think about it. How does the dog know which click is theirs? There must be a mechanism for that dog to understand which click is important to them. Attention is important!
problems with RW model?
- Does not explain the extinction of inhibition well
- Does not explain latent inhibition (pre-exposure to a CS) well
- May have some issues even with the blocking effect
- Miller, Barnet & Grahame identified 23 predictive failures of the model
who raised issues with blocking effect of RW?
Mackintosh raised some issues with how the US needs to be surprising, thinks attention is more important
mackintosh and turner main idea
The more a redundant stimulus is repeated, the more it is ignored — elemental, isolated stimuli acquire a kind of learned irrelevance
mackintosh and turner finding about CS?
a CS that is not redundant (i.e., still surprising) may be a good predictor
what is at the forefront of mackintosh and turner model?
CS
conclusion of mackintosh and turner?
we learn to ignore redundant predictors of a US
- So it may not be about the ineffectiveness of the US but rather the ineffectiveness of the CS
so how well does the CS predict the US?
If the CS is a good predictor, then we pay attention to it. If not…we don’t
cognitive: attention paid to the CS depends on…
how well the CS predicts its consequences
mackintosh model?
- Attention to the CS = alpha in the RW formula
- CS salience (alpha) is key, the more salient — the faster the learning
- Game is to pay attention to the CSs that are the best predictors of the US
what is latent inhibition the same as
CS pre-exposure effect
what is latent inhibition/CS pre-exposure effect?
if a stimulus is not surprising anymore, after a while you will stop paying attention
does the mackintosh model always work?
No — that’s the problem.
the pearce hall model…
changed the landscape of models for classical conditioning
pearce hall two groups?
Group 1 = tone and shock, tone and SHOCK
Group 2 = light and shock, tone and SHOCK
pearce hall results
We would expect group 1 to learn quickly, because the tone is present in phases 1 and 2, with a US of stronger value in phase 2. What we find is the opposite, due to negative transfer.
premise of pearce hall model?
Why pay attention to a well established CS? (Elements of mackintosh here)
Instead, we pay attention to unknown/unconditioned CSs — really coming back to the idea of surprise
main point of pearce hall
organisms attend (respond) to uncertain predictors
pearce hall perspective aligns well with…
modern attentional theory
what is alpha based on in pearce hall model?
As in Mackintosh, alpha is attention, but the value of alpha is based on how surprising the US is — important because alpha brings attention in
what matters in pearce hall model
both CS and US, could argue all elements of RW model are here
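A hedged, simplified sketch of that idea in code (the update rule and values are assumptions, not the full Pearce-Hall model): attention α on each trial equals the absolute prediction error of the preceding trial.

```python
# Hedged sketch of the Pearce-Hall idea: alpha_n = |lambda - V| from trial n-1.
# s (assumed CS intensity) scales learning; all values are illustrative.
s = 0.5
v, alpha = 0.0, 1.0        # full attention to a novel CS

for trial in range(10):    # continuous CS -> US pairings, lambda = 1.0
    lam = 1.0
    v = min(v + s * alpha * lam, lam)  # learning scaled by current attention (clamped at lambda)
    alpha = abs(lam - v)               # next trial's attention = how surprising the US just was
    print(f"trial {trial + 1}: V = {v:.3f}, alpha = {alpha:.3f}")
# alpha falls toward 0 as the CS becomes a perfect predictor: attention is withdrawn.
# Under partial (50%) pairing the error never settles at zero, so alpha stays high —
# consistent with the Kaye & Pearce result described next.
```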
kaye and pearce study
Step 1: Condition rats to light CS — initially orient to it (eg. Sign tracking). If they orient to it it means they are paying attention
Step 2. Three groups
Continuous
Partial
None
continuous group?
CS/US pairing on every trial (R&W in support of this)
partial group?
50% CS/US pairing
none group?
CS never paired with US
kaye and pearce study results
Partial pairing actually worked the best. Continuous pairing is fine but not great; no pairing does not work. Partial pairing is when the rats are really paying attention — in other words, a little uncertainty is not bad.
conclusion of pearce/hall and kaye/pearce
- How surprising the US is on preceding trial is crucial
- How surprising the US is, depends on the CS
- Little bit of uncertainty may actually maintain responses
- The value of alpha remains high
- US component (R&W), CS component (Mackintosh) — this pairing creates uncertainty and ends up being to your advantage
- Explains negative transfer
- Explains latent inhibition
The Pearce and Hall model, via the Kaye and Pearce experiment, reinforces that…
uncertainty is a good thing and both CS/US are important
pearce and hall model summary of negative transfer: two groups?
Group 1 = tone and shock, tone and SHOCK
Group 2 = light and shock, tone and SHOCK
pearce and hall model summary of negative transfer: phase 1
Phase 1: the CS becomes a perfect predictor of the US
- Problem: the US becomes unsurprising with the repetition, so with time, less and less attention is paid to the CS.
pearce and hall model summary of negative transfer: alpha value at beginning of phase 2
is very low
pearce and hall model summary of negative transfer: phase 2
initially little learning is possible
- Negative transfer occurs: you pay less and less attention to the CS
pearce and hall model summary of negative transfer: synthesis
The synthesis is, in a sense, to take all of the components together
The importance of the CS and the US comes together with Pearce and Hall
what are we paying attention to? according to pearce and hall
- Cues with high predictive value (mackintosh)
- Cues high in uncertainty (pearce-hall): organisms attend to uncertain predictors
what does RW suggest (main idea!)
Surprisingness of the US — high saliency of US gets you to pay a lot more attention to predictors
attention is important but depends on…
previous learning
two things we have learned…
You can condition attention AND motivation as well. Cognitive processes of attention and motivation can be conditioned; these are essential to learning.
modern version of short term memory?
working memory
who proposed working memory?
Werner Honig
what is working memory?
Memory that you use as you are doing a task. It could include rats in a Skinner box, or people being given a phone number. A form of short-term memory that lasts as long as the info is relevant (usually not very long).
how does sensory memory go to STM?
self-generated priming
how does LTM go to STM?
retrieval-generated priming
what does wagner do in his new model alone?
Brings back the surprise element here
Brings back primed component of STM
wagner priming (problem?)
If you are primed, in most memory tasks that primed element will be easier to retrieve from memory later. Here, however, a primed element is less surprising — therefore we have a bit of a problem.
two types of priming
self generated
retrieval generated
self generated priming?
via sensory memory (basic structure of the stimulus)
retrieval generated priming?
via LTM (actually from a complex retrieval process of info — already most likely in LTM)
surprise can be reduced in ___ ways
2
retrieval generated priming?
From long term memory via a retrieval cue
self-generated priming?
From sensory memory via a recent presentation
blocking =
retrieval generated priming
retrieval from LTM?
the US is primed by the CS
what is priming of the US with the CS?
blocking
priming can explain…
blocking
- because the surprisingness of the US is dampened
continuous learning =
in the moment, self-generated
long pause, later CS is presented =
long term memory, retrieval generated priming
priming of the CS
- RW: combined processing of the CS and US in STM is important
- Wagner thought that maybe he could look into latent inhibition (CS pre exposure effect again)
- With this model, exposure to the CS before conditioning should reduce the CS surprisingness.
wagner can explain…
habituation
wagner model what is habituation
habituation is simply a decrease in the “surprisingness” of the stimulus, with both types of priming
SOP
Standard Operating Procedures or Sometimes Opponent Process
what kind of model is SOP?
This is a connectionist model, very much a neural network model — parallel processing
who massively influenced SOP?
Donald Hebb
what was SOP originally called?
PDP or parallel distributed processing
what does PDP entail
- Processing in the brain is parallel, not serial (like the information processing models always suggested).
- Processing in the brain is distributed, not localized or modular (like emphasized in [clinical] neuropsychology and evolutionary psychology).
Those models (PDP) have been very successful in some areas of cognitive psychology and neuroscience…
- Learning and memory (including priming)
- Perception / pattern recognition
But (PDP) not so successful in other areas (mostly “higher” cognitive processes):
- Language
- Problem solving
what does SOP do that is new?
address time
SOP (3 features)
- Addresses the timing of the CS and US
- Addresses backward conditioning (US presented before the CS). Explains how you can get conditioned inhibition; timing is still important here for this effect.
- Compound conditioning: Leads to inhibition.
AESOP who proposed?
Wagner and Brandon
AESOP adds?
Wagner and Brandon added the emotional qualities of the stimuli, particularly the US.
what does AE mean in AESOP?
affective extension - emotional aspect
AE suggests that two nodes are activated by a CS
sensory node
affective node
result of AESOP
CS gets linked to both of these nodes
Potentially every time you have CS/US paring…
you are tapping into the sensory system, and likely the limbic system
AESOP is the only theory that says…
pairing can be both sensory and emotional
what is AESOP useful in discussing
It is particularly useful when discussing US’s that may have a strong emotional valence (positive or negative).
what can AESOP help to explain
how the emotional response may potentiate the sensory response
AESOP and trauma?
So this adds an other variable/factor to conditioning. Think about flashbacks in PTSD!
Physical stimuli and emotions being combined
Suddenly AESOP is much about the ___ or ____ and is less stuck in a “____” view of the US.
Suddenly this theory is much about the “sign” or “stimulus” and is less stuck in a “poverty of the stimulus” view of the US.
what is also responsible for poverty of stimulus?
Cognitive psych is also responsible for poverty of stimulus because it focuses on representation of the stimulus by itself. Here we are saying that none of that is terribly valid if you don’t take the emotional value into consideration.
what is the parallel processing dimension of AESOP?
simultaneous sensory and affective processing.
what does AESOP explain
Explains why conditioning works better (or sometimes worse) with strong emotional reactions — e.g., trauma
potentiation between sensory and affective suggests that..
limbic system is working with sensory areas of brain
CS in AESOP
They (Wagner & Brandon) were influenced by Pearce (from Pearce and Hall)
pearce came up with?
configural theory of learning
what is configural learning
The set, or configuration, is what is conditioned to the US
what is elemental learning?
Each CS conditioned independently to the US
pearce configural theory of learning?
configural theory of learning — means that if you have many CSs, Pearce saw them as typically being a whole. In other words, the compound is more than just its parts (vs. the elemental view)
the configurational perspective is ___ in nature
gestalt
configurational perspective gestalt
It means that the whole is more than the sum of its parts.
Example: Play a chord on your piano or guitar, say A minor.
- Means that configurations are meaningful and you can actually be conditioned by configurations
So compound stimuli, especially if in the same sensory modality, are…
more than the sum of the elements.
So just adding different CS’s just…
weakens the value of the single, elemental CS (the compound and the element are, in fact, literally not the same).
From a neural network theory point of view, and consistent with SOP, if you condition ABC as CS’s, but then just test AB, you would …
lose 33% of the conditioning (from R-W, remember that CS’s share a lambda).
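A worked version of that arithmetic (assuming the three CSs earn equal shares of the shared asymptote): V_A = V_B = V_C = λ/3, so testing AB yields V_A + V_B = 2λ/3 ≈ 67% of λ — a loss of about 33%.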
order of progression of theories
Wagner’s original theory ~ SOP ~ AESOP
AESOP explains
- Compound conditioning effects (Rescorla-Wagner)
- Attentional and priming effects (Mackintosh, Pearce-Hall)
- Time constraints in conditioning
- Sensory and emotional dimensions of conditioning
- Stimulus configuration, and even generalization
operant conditioning, focus on…
Focus on response-outcome learning
applications of operant conditioning
- Parenting
- Training (animals, humans, etc.)
- Teaching (behaviourist methods)
- Cognitive Behavioural Therapies (and others), behaviour modification
- Programming of gaming software, gambling games and machines (e.g., slot machines)
- Advertisement
- Any social situations (making compliments, ignoring people, etc)
- The tipping game!
Etc. (so many more examples)
differences between classical and operant conditioning
- The contingency between outcome and response is extremely clear in operant conditioning — that is why the animal learns
- In classical conditioning this is very fuzzy because it is often automatic/unconscious
similarities in classical and operant conditioning
- Both have a learning curve.
- Both have extinction.
thorndike theory
Thorndike: Reinforcement theory » law of effect.
guthrie theory
Contiguity theory » Contiguity is enough.
guthrie contiguity examples
(Why the dog doesn’t learn to hear the porcupine? Because of lack of contiguity in time and space)
Often dogs get hit by porcupine, may feel little bit of pain but there is an analgesic in the quills. Very quickly the animal doesn’t feel pain, the more they start moving the more the quills go in. Why is it that my dog is not learning the porcupine is a big thing from the initial hit? Because of continuity. They end up learning the most pain they experience that day is hours later at the vet clinic. That is what they learn to be afraid of.
stimulus sampling theory
stimulus elements are at the centre of this theory:
Learning curves show an improvement because…
the organism notices more stimulus elements.
for guthrie, a reinforcer is…
just a very salient stimulus — we now know it is more than that.
How close time and space associations are happening is the key for Guthrie
tolmans approach
One of the first cognitive theories (uses the term mental map or cognitive map).
for tolman behaviour is…
inherently flexible.
tolman rats main point
He showed that rats in mazes learn better about places than responses
Main point: They can learn on non-reinforced trials. It becomes obvious when the reward is provided. This is called latent learning
learning is not…
performance!
the role of the reinforcer is to…
provide motivation (an idea developed by Spence and Hull)
tolmans approach order of variables
SD > R > O
SD = discriminative stimulus
R = response
O = outcome
discriminative stimulus gives…
a response for a specific outcome (e.g., kibble in a Skinner box) — operant conditioning. Contingencies must be very explicit.
thorndike box SD/R/O values
SD = box
R = sequence of events to open
O = escape from the box (and food)
for thorndike and guthrie ____ associations
SR
for tolman ___ associations
SS
rewards thorndike idea?
they reinforce
rewards guthrie?
they are a very salient stimulus
rewards tolman/hull/spence?
they motivate
what do rewards actually do (consensus)
In fact they are/do all three — reinforce, salient stimuli and actually motivate.
two types of priming
positive and negative
positive priming
pre-exposure to a stimulus facilitates the retrieval of that stimulus
negative priming
Wagner SOP
the exposure to the stimulus gets in the way of learning
negative transfer
when any type of prior knowledge interferes with current knowledge (learning)
pearce-hall negative transfer
Pearce-Hall negative transfer: pairing a CS with a weak US slows down conditioning when the CS is subsequently paired with a strong US
classical view of dog bite situation
CS is dog, US is bite
AE-SOP view of dog bite situation
The CS is still the dog; now there are two USs — one purely sensory (the pain), one affective (the learned fear)
AESOP explains ____ very well
AESOP explains conditioned emotional responses, often based on fear conditioning, very well
contiguity
all about spatial and temporal (how close in space and time things are)
- CS co-occurs with the US: they are contiguous, or close together, in space and time
contingency
the CS predicts the US: the occurrence of the US is contingent on the prior occurrence of the CS.
two main methods of a skinner box
discrete trial method
free-operant method
discrete trial method
Stand-alone trials, and many of them, in a session. This is controlled by the experimenter.
- Eg. Gadbois lab — presenting dog with a problem. Accumulate trials
free-operant method
The animal or the human control the apparatus/computer.
- Typically used for skinner box — animal controls flow of experiment by giving specific responses to specific stimuli
free operant set up
A typical learning curve as response rate (responses per minute), showing acquisition and extinction (when you stop producing the kibble).
The cumulative recorder taking the cumulative responses.
As you learn they increase, as it extinguishes they decrease
SD → R → O
Able to identify SD as something that should provoke a response and outcome
S△ → R → no O
The other stimulus, the one you don’t want to respond to because it won’t provide an outcome
S-delta, or?
S-
Operant way of teaching discrimination (nose hold in position) example
Eg. You want a dog that can find a specific snake (e.g., ribbon snakes) in a park that is full of different kinds of snakes. The dogs need to tell the difference between ribbon and garter snakes. How do you teach a dog to discriminate just ribbon snakes? We rely on their nose. It is very simple: present the ribbon snake smell and encourage the dog for sniffing (give a treat or click). Then present the other stimulus (garter snake smell) and do not encourage the dog for sniffing (no treat). Do you train no response (just walk away) or a distinct response for a “no” (train them to sniff, then sit down because it is wrong)? The S+ or “yes” response would be a nose hold.
Contrasting approach: systematically reinforcing a response where they initially just show interest and ignore with the S-.
concept of discriminative stimulus
Light on » Press lever » get food
Light off » Press lever » no food
So after a while, the rat presses the lever only or mostly when the light is on.
what does it mean when an operant response is brought under stimulus control
a stimulus will set the occasion for the response.
- The light is a discriminative stimulus (also labelled S^D or S+): It is associated with reinforcement. The other stimulus is the S delta or S-.
occasion-setting in discriminative stimulus is…
operant
____ is necessary for all sorts of operant learning
stimulus control
shaping, autoshaping and ____ behaviours according to skinner
superstitious
superstitious behaviours example
Superstitious behaviours can occur if there is too much of a delay before giving the reward — e.g., because they have to hold their nose in position for 5 seconds. Smarter breeds (e.g., border collies) are more likely to develop superstitious behaviours, because they are trying to determine what you want.
The idea is that you need to extinguish the response, which may feel like you are going backwards. You have to get them to step back, do a normal entrance, then reinforce. How do you do this? Often by going backwards: get them to enter, put them right in front of the vial before the trial starts, and reshape just the nose hold — not the whole entrance. This helps reduce the possibility of reinforcing the wrong behaviour.
habit slips
In maze running, food encountered in the maze won’t stop the running to the goal (the end that is baited).
- If you put the reinforcement before the end goal, often the rat will just run by it and complete the task. That just means they have really learned the task and developed the habit of going through the full actions (almost the opposite of superstitious behaviours in some ways)
Habit slips, or strong discriminative stimulus » response associations, are not uncommon.
habit slip example
- Going to pick-up somebody at the airport but you take the Dartmouth Crossings exit because you go there often.
- These behaviours can be really hard to extinguish
the ______ of the response, does not matter if you focus on the goal
specificity
lashley showed…
that rats that learn to run a dry maze will immediately learn to swim it if it is flooded.
can all dogs trained in the lab transfer to the field
No — only about 50%. The dog knows the odours, but the change in context is enough to confuse them.
transfer of learning is a form of
generalization
specificity of response errors in change of context
might have to do with focus of control, or sign tracking/goal tracking
EDT
errorless discriminative training
EDT Terrace example
The dog learns to respond to the S+ (e.g., scent of lavender).
The S- (scent of oregano) is introduced early.
The S- is not presented in the first trial, and then is presented in weak form at first, and gradually strengthened: this is “fading” (fading-in).
attentional dimension of EDT
Manipulation of the saliency of the S-. Difficult to do if you don’t have the right equipment
EDT is all about
what you are getting the animal/human to pay attention to. Based on a simple premise: during learning you commit the least number of errors possible, and the outcome is that (often) the performance of the animal/human will show fewer errors than with typical discrimination training.
EDT example (long)
The S+ (the one you want the dog to respond to) is always shown at full strength. You present the two stimuli at the same time: the S+ (lavender) is always at full saliency, but not the oregano. Present a very weak, varying amount of the S- (oregano). Sometimes they will look towards the oregano, but ignore that; they typically go towards the more salient stimulus. The idea is that across trials you make the saliency of the S- stronger until the two are at the same level. This works because you indicate right away to the dog which stimulus is important and which to ignore. In some cases, they barely ever learn to respond to the S-, so by the end of the task they have responded only to the S+, even though the S- is at full strength.
why is EDT considered to be errorless
you diminish the amount of potential error that can be created
EDT tried to be used with what population
This got the attention of people working with autistic children. With a trial-and-error method, they often get very frustrated when a response is not reinforced. How do you teach autistic children without them making errors? Techniques were created that resembled EDT in some ways.
main advantages of EDT
Large reduction of errors compared to traditional discrimination learning
Fewer mistakes are possible during training.
No negative emotions during training (e.g., frustration, helplessness, etc.)
The training is potentially very fast
EDT conclusions
Mistakes are NOT necessary for learning!
(Though there is evidence from neural networks that error sometimes helps.)
It is likely an attentional mechanism: You learn (early!) to ignore the irrelevant stimulus and basically never respond to it (or at very low rate and only initially in the training)
Trial and error is NOT necessary, in principle you can learn without the error just by focusing on attentional processes right from the get go
main disadvantages of EDT
Modifications or reversals in training are difficult: e.g., if lavender (S+) becomes the S- and you try to train the S- (oregano) as the S+. Acquisition of the S+ will be delayed and/or impaired.
Example: Training a dog to discriminate (for field search and detection) a specific species of snake within the Thamnophis genus.
If you ever want that dog to switch to Garter snakes as the target species, it may be challenging.
___ of dogs won’t train through EDT easily
2/3
examples of how classical conditioning is involved in operant conditioning
Money — there is a delay, humans still understand this
The clicker in animal training or TAG teach (TAG = Teaching with Acoustical Guidance; e.g., used to train movement precision in dance or sports) — it marks the exact moment they did the exact right movement
the clicker is a ______ stimulus
classically conditioned discriminative
the clicker is a _____ for food (US)
pavlovian CS
why is the clicker used in operant conditioning
because it has acquired a reinforcing value in itself
where did the idea of using just the clicker come from
The situation arises first when trying to train orcas and dolphins in captivity — the idea is that you don't want to interfere with the animals during training. How do you tell them “yes, you got that right, continue”? The conditioned reinforcer was created: not giving a primary, but marking the behaviour with a secondary (the clicker or a whistle).
That in itself is literally telling you that the clicker does not need to be followed by the reinforcer, at least not every time.
history of conditioned reinforcer
When pigeons in the Skinner box were being shaped and giving the right response, the food magazine would drop food and make a click while doing so. Quite accidentally, researchers noticed that when the magazine emptied, pigeons would continue to respond for a long time — most likely they were responding to the click as a conditioned reinforcer.
what are conditioned reinforcers now used in
Used a lot with kids, e.g., stickers.
These are secondary reinforcers — they have been conditioned to come with a reinforcer later.
CS/US link may ____ over time if you stop pairing the click with food for animals
deteriorate
how to use the clicker two steps
- Pair the clicker sound with the primary reinforcer, e.g., food (with as short a delay between the two as possible).
- When that pairing is well established, the clicker can then be used on its own.
when there is both a clicker and food…
the clicker announces the food
what happens when you use the clicker in isolation
In this case, the clicker announces the possibility of food.
“… as long as the secondary reinforcer is occasionally followed by food, the behaviour is maintained”
Or the clicker is used as a marker (yes, this was the right answer, but continue what you are doing; e.g., in “chaining” — more on this below).
Could be called “keep going signal”
Becomes useful in a number of situations — solves the issue of distance, the need for spatial/temporal precision, etc.
why use the clicker
- Necessary because of delay in delivering the primary, usually because of distance between the trainer and the trainee.
- Spatial and/or temporal precision is necessary.
- You are training behavioural chains, and use the clicker as a “keep going” signal (similar arguments as per #1):
chaining
You use chaining to train complex sequences
chaining is common in
sports, music, dance, and other high motor skill learning — e.g., flying an airplane
backward chaining
Backward chaining: 3, then 2, then 1 (reverse order of actions).
Learn the whole sequence by starting at the end and go backwards. Could be useful in correction of superstitious behaviour in a dog for example.
chaining is often use to…
train new pilots in flight simulators
“Stimuli associated with primary reinforcement can strengthen behaviour because these stimuli acquire….
their own reinforcing value”
conditioned (secondary) reinforcers and motivation
If not that, then at least it is feedback that informs the organism it will soon get the primary (sometimes called a “keep going signal” in training circles).
still the idea that ____ is what gets ____ to happen
Idea that dopamine is what gets attention/learning to happen
negative contrast
Rats (and humans) are sensitive to the contrast between reinforcers of different values.
negative contrast three rat groups
Group 1: plain water as reinforcement — fine, especially if animal had been water deprived. But water is not a terribly good reinforcer. Food when hungry works better than water when thirsty.
Group 2: Sweetened water as reinforcement — glucose gives it more value
Group 3: Sweetened water then plain water — you will get fewer responses to plain water (negative contrast).
is reinforcement or punishment better
Science suggests that reinforcement is the way to go, but punishment does work (albeit with negative consequences) if you are just talking about performance.
The question is what supports reinforcement — it is ethics. It is important to make an ethical argument: what ethics tell us is that you will develop a better relationship with the human/animal if you use reinforcement as opposed to punishment.
punishment works to suppress a behaviour, but at a cost…
Emotional distress, frustration, toxic relationship, etc.
And not even necessary… other things work.
Concurrent reinforcement to punishment may still keep the behaviour:
Sam is misbehaving at school and gets punished by the teachers… but the attention he gets from his peers is more reinforcing.
When the contingencies are figured out, it may lead to cheating:
Still producing the “bad” behaviour when the punisher is not around.
best technique of reinforcement/punishment
DRA or differential reinforcement of alternative (other) behaviours
From an ethical perspective people say to always go with
positive reinforcement — that is better because it won't damage the relationship with the person. 98% of the time this is the ethical and scientific perspective
is timing important for…
conditioned reinforcers
timing is particularly important for
early training — kibble has to come immediately after the behaviour you want is produced
continuous reinforcement
every time the animal produces the behaviour it is reinforced.
if you want a behaviour to be easily extinguishable reinforce it _____
continuously.
- Behaviour that is reinforced every time will extinguish immediately when you stop reinforcing it. Typical example in parenting — you give a reward to your child for doing something; one day you don't, and they won't do the behaviour anymore.
continuous reinforcement sets up…
an expectation
When you initially train an animal (or human for that matter), you typically start with
a continuous reinforcement (CRF) schedule.
types of intermittent reinforcement
Fixed ratio
Variable ratio
Fixed interval
Variable interval
how are intermittent reinforcement schedules recorded
Those schedules are recorded by a cumulative recorder and the output is a cumulative record. It shows the pattern of response based on the reinforcement delivery (the schedule).
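A cumulative record is easy to picture in code. A minimal sketch (my illustration; the timestamps are made up): the record is just the running count of responses against time, with reinforcements as tick marks.

```python
# Toy cumulative record: running response count over time,
# with reinforced responses marked (as on a cumulative recorder).

response_times = [1.2, 2.9, 3.4, 5.0, 6.1, 8.7, 9.3]   # seconds (made up)
reinforcement_times = [3.4, 9.3]                        # subset of responses

for i, t in enumerate(response_times):
    mark = " <- reinforced" if t in reinforcement_times else ""
    print(f"t = {t:4.1f} s, cumulative responses = {i + 1}{mark}")
```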
ratio schedules
Ratio is based on a count of the number of occurrences or trials — each trial is every chance you would have to reinforce the organism. In a ratio schedule, you do not reinforce every trial — instead, you skip a few, then reinforce.
- Ratio schedules are based on the rate of response, i.e., a set number of responses (e.g., you reinforce the rat every 5 good responses; this would be an FR 5).
interval schedules
Interval is about the timing that elapses before the reinforcement, not the number of trials.
- Interval schedules are based on an interval of time between reinforcements, i.e., regardless of the number of correct bar presses (assuming some good responses occurred), the rat is reinforced at an interval of time (e.g., every 2 minutes; this would be an FI 2).
what kind of reinforcement were dog trainers taught to use
Dog trainers were taught to do continuous reinforcement, because ‘otherwise the dog would bark or get frustrated’ — Gadbois says if that is happening you are doing it wrong
what is important for intermittent reinforcement
With intermittent reinforcement, for every experiment, people were taught to go slowly. E.g., if you see any stress or aggravation in your rat, you are going too quickly.
Observing your organism is important — figure out what they like or don't. If you sense any frustration, stop, because something is wrong.
95% of the time when you start applying intermittent reinforcement and you get frustration it is because…
you are going too fast
study using intermittent reinforcment with dogs
A study said dogs on intermittent reinforcement got more stressed — but they went from 100% continuous reinforcement immediately to 60%. Imagine your dog gets constantly reinforced and then suddenly only 60% of the time, and they wonder why the dog is stressed. That's because you were going way too fast. The actual way to do it is to very slowly omit a few of the rewards, and with time increasingly omit more. It is that simple. If you start to see any aggression, reinforce a little more.
You are setting up the animal to understand that when they give a right answer, they get feedback that they did — but not always a reward. Eventually you will have pigeons that will give 3000 responses without reinforcement — that is how powerful intermittent reinforcement is when used right.
what is needed for the transition from continuous to intermittent reinforcement
need an adaptation period
fixed ratio
- FR 5 = every five answers you give a reward
- If you are coming from continuous reinforcement, start with an FR 2 for example; the ratio needs to be low early on.
- FR 5 is extremely predictable — they know when the reinforcement is coming. They will tap the lever five times quickly to get the reward; they will accelerate.
- They have an expectation and that is the issue with the fixed ratio — they learn they just have to produce five responses
variable ratio
- Variable ratio = VR 5 +/- 2
- You set the computer to come up with these numbers; what it means is that the animal knows it will get reinforced, but not after a predictable number of responses (here, on average every 5, ranging from 3 to 7).
fixed interval
- FI 10
- Unit of time
- Regardless of the number of responses they give, they will not be reinforced until the interval is over — e.g., 10 minutes
- Now they know if they wait a certain time they will get a reinforcement
- The problem is that they figure out that what they do doesn't really matter until the end of the interval
- Can correct this problem using variable interval
variable interval
- VI 10+/- 3 minutes
- They don't know exactly the interval of time after which they will start getting a reward — therefore they will work really hard in those conditions
- Will typically get you more consistent responding, and a lower chance of extinction
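To make the four schedules concrete, here is a minimal Python sketch (my illustration, not from the lecture). Each schedule is a small controller that answers one question — should this response be reinforced? — and the parameter values are made up.

```python
import random

# Toy implementations of the four intermittent schedules.

def fixed_ratio(n):                 # FR n: reinforce every nth response
    count = 0
    def check(_elapsed):
        nonlocal count
        count += 1
        if count >= n:
            count = 0
            return True
        return False
    return check

def variable_ratio(n, spread):      # VR n +/- spread: nth response on average
    count, target = 0, random.randint(n - spread, n + spread)
    def check(_elapsed):
        nonlocal count, target
        count += 1
        if count >= target:
            count, target = 0, random.randint(n - spread, n + spread)
            return True
        return False
    return check

def fixed_interval(t):              # FI t: first response after t time units
    last = 0.0
    def check(elapsed):
        nonlocal last
        if elapsed - last >= t:
            last = elapsed
            return True
        return False
    return check

def variable_interval(t, spread):   # VI t +/- spread: interval varies
    last, wait = 0.0, random.uniform(t - spread, t + spread)
    def check(elapsed):
        nonlocal last, wait
        if elapsed - last >= wait:
            last, wait = elapsed, random.uniform(t - spread, t + spread)
            return True
        return False
    return check

# Example: a VR 5 +/- 2 schedule over 20 responses.
schedule = variable_ratio(5, 2)
for i in range(20):
    if schedule(i):                 # elapsed time is unused by ratio schedules
        print(f"response {i}: reinforced")
```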
which methods of reinforcement are preferred
Anticipation and learning are connected; anticipation is stronger when you don't know when you are getting the reward — therefore the variable schedules (variable ratio and variable interval) are better.
psyches of dog trainers now are wrong
Most dogs don’t care about the treat because they are so into the task (working breeds). Whole point of doing intermittent reinforcement is that you train your brain that this is fun because you dont know when you are getting the reward — its a surprise! Very resistant to extinction — falls into category of habit learning
When you keep habit learning it becomes pretty much unextinguishable. Problematic when you do therapy on behaviours you are trying to get rid of that have become habits.
problems with fixed ratio or interval
they are predictable!
what is the benefit of variable ratio or variable interval?
By applying those, the rat or pigeon is unsure about when the reward is delivered and is more likely to respond eagerly and the behaviour acquired is much more resistant to extinction
real life is…
intermittent
what relationships are hard to snap out of
Relationships that are hard to snap out of are relationships that are unpredictable, they’re exciting — there is a lot of anticipation
That is because you are on a variable ratio — most social interactions are on intermittent reinforcement
Reinforcement matters in social behaviour, that means that all principles you know from learning theory applies to these situations.
fixed ratio (example and consequence)
Reward every xth response. Consequence: Produces a good amount of behaviour, but lots of post-reinforcement pauses. You know that after you are done that ratio, there is no need to rush.
variable ratio (example and consequence)
The ratio is harder to predict. Consequence: High and persistent rates of responding.
- Slot machines!
fixed interval (example and consequence)
Reward at a fixed (xth) interval of time. Consequence: Low response rate after each reward. Same problem as fixed ratio because it is relatively easy to predict.
variable interval (example and consequence)
Steady rates of behaviour because the outcome is difficult to predict in time.
Social media likes or messages
variable interval: social media
Social media companies have people trained in learning theory to control this: they control when they notify you, how they notify you, and when you stop responding they look at your rate of responses. They use these intermittent schedules of reinforcement, mostly variable ratio/interval.
Can be used for good and bad things!
Variable ratio/interval gives…
more persistent behaviour less chance of extinction
Ratio schedule
Strong correlation between the rate of reinforcement and the rate of behaviour.
Faster response rates (the “hurried-up” nature of the behaviours produced, e.g., lever-pressing).
Interval schedule
Reinforcement rates can be steady, but the rate of responding can vary immensely.
More pauses and wait times between responses.
compound schedules
Use of more than one schedule. In fact, outside of the laboratory, these are by far the most common schedules of reinforcement.
different schedule examples
Mixed/multiple/tandem/chained
matching law
When training an organism you want to see what they respond to best — test what works best. There is some evidence that some species will not deal well with variable intervals (as much as we said they are good if you go slow enough), some species will not work unless it is continuous reinforcement
- Skinner thought there were universal laws here but that is not the case
matching law example
Give two different schedules to a pigeon
A: pecking at disk A is reinforced on a VI 2 min schedule
B: pecking at disk B is reinforced on a VI 1 min schedule
- In this experiment you get more responding on the VI 1 min disk; this could change with time and will depend on how hungry the animal is. Eventually, they may switch to VI 2.
relative rate of responding to a particular choice (response alternative) will match (equal) the…
relative rate of reinforcement for that choice (response alternative).
what does matching law explain the relationship between
explains the relationship between payoff and choice
Idea is that animals will try to maximize reinforcement at a lower cost
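The matching law can be written as an equation: the share of responses on an option equals its share of the obtained reinforcement, B_A / (B_A + B_B) = R_A / (R_A + R_B). A quick computation for the pigeon example above (my numbers; assuming each schedule delivers at its programmed rate):

```python
# Matching law prediction for the two-disk pigeon example.

r_a = 1 / 2   # disk A: VI 2 min -> ~0.5 reinforcers per minute
r_b = 1 / 1   # disk B: VI 1 min -> ~1 reinforcer per minute

share_a = r_a / (r_a + r_b)
share_b = r_b / (r_a + r_b)
print(f"predicted share of pecks on A: {share_a:.0%}")  # ~33%
print(f"predicted share of pecks on B: {share_b:.0%}")  # ~67%
```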
what explains the matching law
- Animals are likely to try to maximize the rate of reinforcement: How much you get per unit of time / session.
- They try to improve the local rate of reinforcement, so they shift between the options (choices). This is called melioration
- Rate is not the same as strength, so both are factors (the same way speed and mass influence momentum in physics): This brings us to momentum
melioration
This depends on the animal's state — it will often simply switch to a better schedule when that fits its current metabolic needs better
what is reinforcement a test of
persistence rather than competence
delayed reinforcement
Sometimes reinforcement is delayed. How well animals take delayed reinforcement matters (if too long, organisms do not take it well). If it is explained to humans — e.g., working for two weeks and then getting paid a lot — humans are okay with that.
Impulsiveness vs self-control (impulse control)
impulsive people are more likely to want their goods right away
procrastination and working priorities
A lot of this has to do with your ability for delayed gratification. The idea is that you have to ease into it and develop a technique of finding small gratifications.
implications of reinforcement
- delayed reinforcement
- impulsiveness vs. self control
- procrastination and working priorities
Example of tasks testable with people and animals: Choice between a large delayed reward vs. a smaller immediate reward.
The general observation: Animals and humans are impulsive, i.e., prefer smaller immediate rewards.
example of delayed reinforcers
biweekly paycheque
delayed reinforcers are…
not as reinforcing as immediate ones.
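A standard way to quantify this is hyperbolic discounting (it is not named in the notes, so treat this as background): subjective value V = A / (1 + kD) falls off with delay D, and the discount rate k indexes impulsiveness. A minimal sketch with made-up numbers:

```python
# Hyperbolic discounting: why delayed reinforcers lose value.
# k is a made-up discount rate; higher k = more impulsive.

def discounted_value(amount, delay, k=0.1):
    """Subjective value of a reward of size `amount` after `delay` units."""
    return amount / (1 + k * delay)

# An impulsive chooser (high k) prefers $50 now over $100 in two weeks:
print(discounted_value(50, delay=0, k=0.5))    # 50.0
print(discounted_value(100, delay=14, k=0.5))  # 12.5
```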
If you are trying to intervene with things that have regular schedules it can be challenging — really hard to get people to adhere to things unless you have …
a pre-commitment strategy
what do pre-commitment strategies have applications for
studying, weight loss, cessation of smoking, snoozing the alarm in the morning, etc.
what are pre commitment strategies
Get people to announce what they are going to do (e.g., New Year's resolutions). You are dangling the possibility of shame if you don't go through with what you said you would.
Example: Dry January and other resolutions that are “publicly” announced.
what are factors in delayed gratification
temperament / personality, development, and psychopathologies
personality example
impulsivity
development example
Impulse control (especially in boys), etc.
psychopathologies example
ADHD, etc.
ADHD is complicated: idea that attention may be a problem. If you cannot focus attention, learning is not going to happen.
strategies for delayed gratification
- Make the immediate reward less “rewarding” (or even coming at a cost).
- Make abstention profitable: You get a reward if you don’t procrastinate.
- Make the associated delayed reward explicitly less appealing if the choice is to get rewarded immediately
Make the immediate reward less “rewarding” (or even coming at a cost).
Put a cost on engaging in the behaviour now. They still get a choice in the matter.
Make abstention profitable
Ironically, that profit can be delayed, it will still work, e.g., “If you finish your homework now, you can play video games for longer tomorrow”.
Appetitive reinforcement?
Make the associated delayed reward explicitly less appealing if the choice is to get rewarded immediately
Personality difference here, impact may depend on whether you can be patient.
what can help delayed grat
Much of this can work if you make “anticipation” exciting (as it should be). It tends to be more rewarding to some than others. “Waiting” does not have to be aversive.
Humans have working memory and prospective memory (planning) — ability to deal with things that haven’t happened yet.
making anticipation exciting depends on
- Working memory: Good working memory = reduced impulsiveness. Distraction from the appeal of the immediate reward is the strategy here. Remember that working memory conceptually overlaps with attention.
- Prospective memory and planning: To process delayed rewards, you need to be able to process the future (i.e., mental “time travel”)
what are Impulsiveness / sensation-seeking / risk-taking implicated in
basal ganglia/dopaminergic system
Personality traits that may modulate and influence behaviour in both humans and animals.
Many of these traits are associated with pathologies or risky behaviours:
Gambling
Addiction
Risk junkies (sky-diving, bungee jumping, risky sexual behaviours, etc)
ADHD
Borderline personality disorder — characterized by risky sexual behaviours
dopaminergic system dogs 1992 study
Dogs in 1992: a paper that showed a massive difference between breeds that have high vs. low dopamine.
High = Border Collie, Jack Russell, Australian Shepherd (more energetic)
Low = Great Dane, Saint Bernard, Newfoundland (more lazy)
The difference is activity level and impulsiveness
Can be applied to friends: you may have friends that prefer skydiving vs. friends that prefer lying on the couch
We now know through neuroimaging that there is a correlation between…
traits and the dopaminergic system — the idea of addictive personalities. We know this is a thing, but we don't know what to do about it.
We think the key to addictive personalities is sensitivity to reinforcement — responding to variable intervals, excitement, etc.
The idea that it is both dopamine quantity and sensitivity — unknown
day-to-day applications for managing delayed grat
- Give choices — with any organism, test your procedure, reinforcement and schedule — every individual may be different
- Help make better choices
- Be clear about the impact of the choices — explain consequences
clinical approaches for managing delayed grat
- Contingency management — directly from learning theory
- Incentive-based treatment
Behavioural economics and neuroeconomics (not huge on exam)
- These two fields have massively influenced research in behaviour and neuroscience, and have been influenced by those sciences as well. They go a little beyond basic cognitive psych and neuro — a weird interaction between rational and irrational behaviour
- We know that humans love to think we are cognitive and rational, good at following rules. When actually, we are unbelievably limbic in the way we make our decisions. What we do is limbic not cortical, based on emotions/raw motivations/reinforcement — applies to psych, bio, etc.
theories of reinforcement
- Drive reduction theory
- The Premack principle
- Behavioural regulation theory
- Selection by consequences
Interestingly, Skinnerian (radical behaviourism) theories are ______ in explaining reinforcement.
not very helpful
skinnerian reinforcment
- They avoid mentalistic concepts and hypothetical/theoretical constructs, e.g., “motivation” — yet they had to adopt one to explain how motivation works (drive theory). The Skinnerian arguments to explain reinforcement are circular arguments (tautologies).
- They do not have a theoretical framework to explain reinforcement, nor do they seem to mind.
Clark Hull and drive reduction theory
theory of attrition/deficit, missing something so you get it — reducing drive
Drive reduction theory:
Reinforcers reduce drives.
drive reduction theory example
Example: Treats reduce hunger (while training a dog; logically, this means that a hungry dog would learn better, faster, etc.).
drive reduction theory is a ____ theory
It is a deficit theory (you do or learn something because you need to reduce a drive, e.g., hunger, thirst, etc.).
Hull eventually realized the drive theory had limitations and started to talk about “incentives”.
In the section on motivation, we will cover the incentive theory that does a much better job at explaining motivation and its role in learning (and behaviour in general).
Neuro and psych have adopted incentive theory — works better than drive because you can classically condition motivation
premack principle aka
the differential probability principle
classical/traditional principle
Contingency between a behaviour and the reinforcement.
premack principle
The contingency is between two behaviours — the contrast between pressing a lever and getting fed; to him this is the essence of reinforcement. More probable responses will reinforce less probable responses
premack principle in clear terms
more probable responses will reinforce less probable responses
The question: What is the contingent behaviour? In the Skinner box, what are the two behaviours?
skinner box: premack
Skinner box:
Bar pressing
Eating
- But the rat would rather eat… In other words, reinforcement happens when the (instrumental) behaviour gives access to a preferred behaviour.
- The bar pressing gets you to eat which is the motivation
- Theory about contrasting two different processes
solution for reinforcement!
Do a preference test! See what is more reinforcing to the subject: Food, toy, play, etc.
Find the behaviour (instrumental act) that provides access to a more preferred behaviour.
premack 1959
play a pinball machine or have candy
Some kids chose candy (they spent more time eating candy than playing pinball).
With these kids, if candy is made contingent on playing pinball, pinball playing increases.
The reverse is true for kids that chose pinball over candy.
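A minimal sketch of the logic of a Premack-style preference test (my illustration; the baseline minutes are made up): whichever behaviour takes more baseline time is the more probable one, so making it contingent should reinforce the less probable one.

```python
# Premack principle: the more probable behaviour (more baseline time)
# can be used to reinforce the less probable one.

baseline_minutes = {"eating candy": 12, "playing pinball": 4}  # made-up data

preferred = max(baseline_minutes, key=baseline_minutes.get)
less_preferred = min(baseline_minutes, key=baseline_minutes.get)

print(f"Make '{preferred}' contingent on '{less_preferred}':")
print(f"-> '{less_preferred}' should increase (Premack principle)")
```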
preference is…
really important!
preference lab example
Dog owners bring the dog's preferred treats to the lab
Behavioural regulation theory (Timberlake et al): premise
All behaviours have a preferred level. If you restrict access to a behaviour, the organism will engage in another behaviour (or behaviours) to gain back access to it.
Behavioural regulation theory: staddon
Bliss point and the minimal distance model. How the organism distributes its behaviours to maximize reinforcement (the bliss point).
So all organisms will learn to maintain a preferred distribution of behaviours.
behavioural regulation theory main assumption
The main assumption of this model is that humans and animals will seek and attempt to maintain an optimal combination of activities (the bliss point).
Idea is that as organisms we try to meet our metabolic needs
Selection by consequences (Donahoe et al.): Premise
Learning is like natural selection.
selection by consequences, premise is like…
“weeding-out” (Bouton’s term) or “pruning” (my term) by reinforcement (or punishment…).
“ingredients” for natural selection
- You need a trait to select. Here it is a behaviour.
- You need a selective agent. Here it is reinforcement (via the teacher/trainer/computer).
selection by consequences mechanisms are similar to…
Similar to natural selection — reinforces the theory that learning is not just about acquiring information, but also pruning information
selection by consequences prerequisites
- variation
- fitness consequences
- mode of inheritance
- limited resources
variation
This is now variability in forms of behaviours, across situations, etc.
fitness consequences
This is now simply reinforcement contingencies
mode of inheritance
The learning process itself (operant conditioning).
limited resources
The constraints on the process, and deprivation (e.g., of food) is the analogy.
- E.g., if you are reinforcing with food, maybe you need hunger. It is sometimes suggested that some dogs skip breakfast, so that they will work harder (to increase the drive)
learning is…
ontogenetic (it happens during a lifetime, i.e., it is a developmental process)
learning happens to…
an individual
natural selection is…
phylogenetic (happens over long periods of time and across generations, i.e., it is an evolutionary process).
natural selection happens to…
species
selection by consequences elements found in connectionist theories
- Neural network models make similar assumptions (pruning processes are described).
- Genetic algorithms and genetic programming: They use explicitly biological terms such as “selection”, “mutation”, “crossover”, etc
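Here is a toy loop (my analogy, following the “ingredients” above; all names and numbers are made up) showing how reinforcement can act as a selective agent: behaviours vary, the reinforced one gains “fitness”, and the others are gradually pruned.

```python
import random

# Selection by consequences as a toy selection loop.

weights = {"sit": 1.0, "bark": 1.0, "spin": 1.0}   # initial variation
target = "sit"                                      # trainer reinforces this

for trial in range(200):
    # Emit a behaviour in proportion to current weights (variation).
    behaviours = list(weights)
    emitted = random.choices(behaviours, [weights[b] for b in behaviours])[0]
    if emitted == target:
        weights[emitted] *= 1.05   # reinforcement: "fitness consequence"
    else:
        weights[emitted] *= 0.98   # non-reinforcement: gradual pruning

print(weights)  # 'sit' should dominate after selection
```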
motivation is a…
hypothetical construct
potential issues with Structuralism, Functionalism, Gestaltism
Potential issues: Introspection, lack of experimental rigour.
radical behaviourism (skinnerian)
Potential issues: No hypothetical constructs or intervening variables tolerated (i.e., mediators, e.g., memory, emotions, motivation, the brain).
Operational behaviourism (modern academic behaviourism).
Potential issues: Still focussed on conditioning to explain all behaviours; and focus on behaviour (as opposed to its underlying processes).
Cognitivism: Focus on thought processes, awareness, consciousness, etc.
Potential issues: Poverty of the stimulus and representationalism.
Connectionism
- PDP models, a.k.a. neural network models
- A form of neural associationism
- Explains well perception, learning, memory, but not so well the higher cognitive processes.
What is motivation? (4)
- Hypothetical (theoretical) construct
- Need for intervening variables » Hull » Operational behaviourism (Tolman)
- So important that Woodworth thought psychology should be called “Motivology”.
- But why did some behaviourists (mostly Hull and Tolman) see the need for motivational concepts?
– The main reason: Explain how reinforcement (and punishment) work.
Where is the evidence for motivation?
behavioural
neurophysiological
behavioural evidence for motivation
- Behaviour is variable — some days you are horny, other days not — this changes how you feel
- Behaviour is persistent — why do we procrastinate? We can persist in good and bad behaviours regardless of the outcome
neurophysiological evidence for motivation
- The impact of physiological states relating to homeostasis (balance, a compensating system), allostasis (systems do not balance easily because there are many things going on in parallel — chaotic and paradoxical), metabolism
- Neurotransmitters: e.g., dopamine (one of the most important NT for motivation, that is why it has a direct impact on learning)
- Hormones: Mostly peptide and steroid hormones
How was motivation historically defined?
- ethology/biology
- psychology
Ethology/biology historical definition
drive as instincts — the hunting drive is not really a desire, it is more an incentive behaviour
psychology historical definition
- Drive theory: a response to deficits (it is a “deficit theory”, focus on -R) — deficit theory, works well with homeostasis, weak theory in some ways
- Incentive theory: focus on +R, intrinsic motivation, etc — much more based on positive reinforcement and classical conditioning
incentive theory motivation
- Incentive motivation: Motivation for instrumental behaviour created by anticipation of a positive reinforcer. Also called the rG-sG mechanism (from Hull & Spence)
- Incentive motivation is about how you acquire a reinforcer
- Incentive learning: A process by which organisms learn about the value of a specific reinforcer while they are in a particular motivational state
- Learning of how to be motivated, you can teach an animal to be motivated — requires not always making things too predictable or too easy
do you need reinforcement for learning?
Technically, no. Incidental learning (latent learning) and implicit learning don't require reinforcement
first demonstration of incidental learning
Tolman and latent learning
are reinforcers necessary for learning
Reinforcers are not necessary for learning but are there to motivate behaviour and give purpose.
Reinforcers affect motivation, not learning.
Learning is not performance…
but motivation is performance.
will learning always extinguish if not reinforced
Nope
Eventually extinction will happen in classical conditioning and operant learning
Habit learning: operant with a lot of repetition; everything gets automatized and the acquired habits will not get extinguished
Habit learning…
- Habit learning was hard to situate in learning theory
- Hull and Lashley (e.g., “Maze running habit”) used the term.
- Or Tolman’s “habit slip”.
- The modern use started with Hirsh (1974), then Mishkin et al. (1984).
concept of habit learning is very close to…
“skill learning” (both with a motor component) and both are seen overall as “implicit” or “non declarative” forms of learning
habit learning and skill learning both seem to be controlled by
same parts of brain — basal ganglia, dopamine, subcortical
Idea you can go beyond conditioning and get really strong learning that is not much at the mercy of external reinforcement because it has become intrinsically motivated
E.g., dogs in the field don't always need a treat — sometimes they are motivated enough just by finding a turtle
The more you do something, the more likely you are to develop intrinsic motivation
Difference between liking and wanting something
types of declarative (Explicit) learning
facts and events
types of nondeclarative (implicit) learning
skills and habits, priming, simple classical conditioning, nonassociative learning
Habit learning is:
Instrumental in nature, and very much associated with “motor learning” and the basal ganglia (as opposed to hippocampal-based learning).
- More subcortical than we would imagine
Hippocampal (contextualizer) learning
Context learning (spatial and temporal). Defined as “rapid learning”.
We also know from Gray that the hippocampus is a “comparator”.
The hippocampus gives context (the what, the when and the where).
Change in context can make you lose some learning or ability to adjust in a certain environment
Basal ganglia-based learning
Motor in nature. Defined as “slow learning”.
Now often defined within the realm of the “cortico-striatal loop” (two loops, one is motor, the other is motivational)
If a rat learns to run the maze but can also do it by swimming, it is not just basal ganglia learning (memorizing specific motor sequences), but also hippocampal. They both work.
The cortico-striatal loop (from Seger & Spiering, 2011)
- Not just cortical or subcortical, but both
-So motor-based learning (habit and skill learning) and motivation are linked in the brain.
-Both are heavily driven by the dopaminergic system.
-Remember: Both “motor” and “motivation” (even “emotion”) have the same etymological root.
–> Why you give stimulants like amphetamines or Ritalin to people with ADHD
All parts of the brain are kind of connected…
…and if you look at the etymology, “motor” and “motivation” go together
the evolution of the concept of motivation: Hull
drive theory
drive theory
- Hull, early work: Reinforcement » reduction of drive (e.g., hunger)
- Behaviour strength = D x H
- Behaviour strength = Drive (need) x Habit (learning)
- Interaction between drive and habit
- Behaviour strength = Need x Learning
Resistance to extinction depends on:
- Degree of food deprivation (how hungry you are): Drive or D
- How much prior reinforcement (reinforcement history) you got: Habit or H
Incentive theory: Hull take 2
- The concept of incentive motivation: Hull & Spence
- Behaviour strength = D x H x K
Behaviour strength = Drive x Habit x Incentive
Behaviour strength = Need x Learning x Incentive
They realized that K (incentive) explains forms of learning that drive alone cannot
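As a worked illustration of the multiplicative formula (my numbers): because behaviour strength is D x H x K, a zero on any factor — e.g., a fully sated animal, D = 0 — predicts no behaviour no matter how strong the habit.

```python
# Hull & Spence's equation, with illustrative values:
# behaviour strength = Drive x Habit x Incentive.

def behaviour_strength(drive, habit, incentive):
    return drive * habit * incentive   # D x H x K

print(behaviour_strength(drive=0.8, habit=0.9, incentive=0.7))  # 0.504: strong
print(behaviour_strength(drive=0.0, habit=0.9, incentive=0.7))  # 0.0: sated animal
```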
Incentive =
motivational effect of reward
Hull’s theory (even the update with Spence) is seen as
outdated today
Now we conceptualize motivation as
response to an anticipation of need
Tolman’s view
Tolman: He argued that animals learn expectancies
Food rewards are confirmatory (or not).
Food = catalyst (Rescorla, 1985).
for a strict, radical behaviourist
- the operational terms “anticipation” and “expectancy” are problematic… but only theoretically.
- So they prefer still the term “reinforcement”
- Clearly the reward value in a learning task is contingent on how easy it is to get it
- The way they see the reinforcer is as a confirmer (“yes, I was right” — like TAG teaching in dance)
Here we lose the radical behaviourists because
(anticipation is involved)
The physiological processes are also not a focus of radical behaviourism.
The processes are invisible to the naked eye… unless you have technology… and we do in neuroscience.
Although the process is classically conditioned…
Indeed, the motivational effects of reward come from classical conditioning (this is “incentive learning”).
This is what incentive learning really is
The motivational effect of reward comes from
classical conditioning
reinforcement affecting motivational state idea
Idea is simple: if you are hungry, you are learning about that state of being hungry. It is actually when you are hungry that you are open to the incentive value of the situation you are in. Only when you are hungry can you learn, because it sets you up for that anticipation. Anticipation itself becomes the reward (remember the clicker!)
Clicker announces food > clicker announces possibility of food > pigeon that is still pecking at same stimulus despite the fact there is no food
This occurs because there is still anticipation!
In a sense, motivational states are occasion setters
incentive learning
-You learn in the hungry state about the incentive value
-So you learn about an incentive, and then:
-You link this to the instrumental action.
-If you experience the +R in a non-hungry state, then you will have no interest in the +R (e.g., food).
-Motivational states are occasion setters! (Davidson, 1993, 1998, etc.) or facilitators (Rescorla, 1985)
-Great review of these ideas in Dickinson & Balleine, 1994
Interlude; what are occasion setters?
Safety signals are negative occasion setters:
“if this light is on, the shock won’t happen”
“if I have my pills, I won’t have a panic attack”
and even: “if I have my pill bottle – even if empty – I won’t have a panic attack”.
We can have modulation in conditioning
Facilitation and facilitators; occasion setting and occasion setter
occasion setters are…
environmental cues (CS’s) that “set the occasion” for conditioning to occur. They “facilitate” the conditioning.
Occasion setting (Holland) =
facilitation (Rescorla)
Occasion setters (Holland) =
facilitators (Rescorla) » motivational states
An o.s. is a
CS that confirms a CS-US pairing.
An o.s. is a
modulator of the conditioning between a CS and a US.
In itself, it becomes rewarding
So we have three ingredients in classical conditioning modulation:
- The CS
- The US
- The modulator (second CS) = occasion setter = facilitator » motivational states
what does the modulator add
This adds what we call “conditional relations” and “conditional control”.
It is not unlike basic discriminations in operant conditioning when we reinforce the S+ (the target stimulus or “good response”), and not the S- (the foil, distractor, or “wrong response”).
pavlovian discrimination
Two ingredients:
1. The target stimulus: The CS that is present on every trial in a CS-US pairing. It is the target because that is the one that the subject responds to.
2. The feature stimulus: The CS that is present only in trials indicating if the US occurs or not (depending on the procedure).
On top of operant discrimination, you can add something from a specific situation using classical conditioning
occasion setter facilitates…
a specific association between stimuli — a facilitator can be there or not, but its presence facilitates something
So back to incentive motivation and learning:
-Hunger triggers the foraging behaviour (system).
-Motivation “energizes” the action.
-But you must learn (“know”) that the action leads to the reinforcer.
-Occasion-setters (or facilitators) help in doing so!
-You must learn that the reinforcer has a positive effect on the motivational state.
-The motivational state increases the desirability of the reinforcer
“Learning will always extinguish if not reinforced”
Actually extended training makes behaviour less sensitive to its consequences » habit learning (e.g., Holland 2004).
It is not about the need, or the reward…
but about the anticipation of the reward…
acquired motivation
Motivation is not from drives/needs, but rather the anticipation of need…
Back to Tolman and latent learning…
1. Exploration of the maze. No reward.
2. Reward in the maze: They excel, with no previous reinforcement history!
So… rewards:
1. Reinforce behaviour
2. Motivate
early behaviourists believed in
gradual learning, not one-trial learning. But if a rat can do it just by exploring with no reinforcement, there is something else going on. Rewards reinforce and motivate behaviour
tolmans ultimate contribution
As Bouton (2016) says elegantly, “a reward at the end of the tunnel” goes a long way
Sometimes just the possibility of it is enough
Flaherty (1996) discusses these phenomena at length in “Incentive relativity”.
crespi (1942)
Negative and positive contrast effects
crespi study
An increase in reward (1 to 16 pellets) results in an increase in running speed and elation = positive contrast effect.
– Acquire the task, at shift time there is no change. The learning is stabilized.
A decrease in reward (256 to 16 pellets) results in a decrease in running speed and depression = negative contrast effect.
– Huge crash in running speed, they adjust their behaviour based on what they are reinforced in
1 pellet to 16 pellets
– Huge jump in their learning! Post-shift is huge
crespi study rationale
Those effects are caused by a change in the expected value of the reward. Not really the reward, but what you expect the reward to be. How much effort you are putting in might be modulated by the expected value of the reward.
No doubt this is paradoxical
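A speculative sketch of the “expected value” idea (my simplification, not Crespi's model): let running speed track a trial-by-trial estimate of the reward, updated by a simple learning rule. This reproduces the adjustment after a shift; the elation/depression over- and undershoot Crespi actually observed would need an additional contrast term.

```python
# Effort tracks the expected value of the reward, updated trial by trial.

def run(pellets_per_trial, learning_rate=0.3):
    expected = 0.0
    speeds = []
    for pellets in pellets_per_trial:
        speeds.append(expected)                           # effort ~ expectation
        expected += learning_rate * (pellets - expected)  # update after reward
    return speeds

trials = [1] * 20 + [16] * 20      # shift from 1 pellet to 16 pellets
speeds = run(trials)
print(f"pre-shift speed ~ {speeds[19]:.1f}, post-shift speed ~ {speeds[-1]:.1f}")
```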
A few words about paradoxical reward effects:
Rewards can sometimes weaken behaviours (in an instrumental context)
– E.g., you get tired of them
Non-rewards can sometimes strengthen behaviours (in an instrumental context)
– E.g., this alludes to incentive/intrinsic motivation
This includes the negative contrast effects mentioned above.
Rewards may not always be as magical as people think, they often and quickly become irrelevant
- Animal trainers seem to forget this — the importance to develop incentive motivation and that you do not need to always reward.
- Gen-X thought always rewarding was a great idea — everyone gets a trophy. Sounds good but it creates more problems than anything else, namely increasing a dependency on extrinsic motivation.
What about extinction and motivation?
Extinction has a lot to do with the magnitude of the reinforcement.
The bigger the reward, the bigger the extinction
The higher the magnitude of a reward, the faster the extinction. (Hulse, 1958; Wagner, 1961)… UNLESS you are dealing with HABIT LEARNING
over learning extinction effect
Many rewarded trials can increase the rate of extinction (as opposed to fewer rewarded trials)
what should you do with high value reinforcement
Maybe make it intermittent
example of intermittent reinforcement
Slot machines are very careful in giving very big payouts very rarely. The small payments keep you there; the inconsistent big prizes are the anticipation.
Can rewards have negative effects?
Extrinsic rewards can affect performance negatively.
“Punished-by-reward” principle in humans
“Punished-by-reward” principle in humans
People like this do not want to take responsibility and are also extremely sensitive to external rewards. When it is not rewarding anymore, they disconnect. This is the danger of using extrinsic motivation too much — the idea is to try to find ways of developing intrinsic motivation
can verbal rewards be good?
For some people these are more important than money
Expectations, overall, are important.
Partial Reinforcement Extinction Effect (PREE) and PERSISTENCE
- Rescorla, 1999 (and a few other studies) — said connection between US/CS was really important
He realized he might be wrong after Pearce/Hall and Wagner
-Partial reinforcement reduces the associative change produced by non- reinforcement.
Two groups of rats: CRF (continuous reinforcement) and PRF (partial reinforcement)
In acquisition, CRF (100% of trials are reinforced): better (faster, stronger acquisition)
-But: In extinction, PRF (e.g., 50% of trials are reinforced) » persistence
running speed (Effort for motivation) study
Running speed (effort for motivation) — in acquisition trials the continuous-reinforcement rats were doing better than the partial ones (the partial group shows a little less effort).
In extinction, the continuous group loses the behaviour much more quickly; running speed goes down dramatically. The partially reinforced rats see their effort go down over time too, but much more slowly.
working dogs hypoglycaemia struggle
Train dogs to recognize signs of anxiety in people with PTSD
Biomedical alert and the lack of maintenance training — e.g., hypoglycaemia in people with diabetes
These events are very rare, so it is hard to reinforce. Even when we have people who volunteer to produce those samples, well over half the time they forget. When they get hypo, they deal with the situation at hand, not with training the dog. The dogs then do not get reinforced. That is problematic — unless during the training they start using partial reinforcement. Real life for these dogs means they will not always get reinforced, but they need to keep detecting.
Performance collapse, in part because of rare events: Few chances for reinforcement… if the responses to the events are even reinforced when they occur!
Learned industriousness (Eisenberger, 1992) » this can transfer to other tasks.
Sequential theory (Capaldi)
How to explain PREE? Two theories:
1. Frustration theory (e.g., Amsel): Not well supported
2. Sequential theory: Not about frustration
1. Extinction = many non-rewarded trials (by definition!)
2. No problem! Especially if this matches acquisition done in PRF!
3. It is the memory of the acquisition phase that “sets the tone” for what to expect.
Set expectations early and your future behaviour is dependent on this. This is why always reinforcing is a bad idea. What happens during the acquisition phase is what matters.
the sequential theory, and small and large rewards: Capaldi and Capaldi (1970), Leonard (1969)
Sequences of:
non-reinforced trials (N)
reinforced trials with large rewards (R)
reinforced trials with small rewards (r).
Two acquisition types:
rNR
RNr
rNR is more resistant to extinction, as suggested by the sequential theory
This is because the memory of non-reward is followed by a large reward (so non-reward becomes a signal to keep responding), and because they expect less initially
is frustration theory very well supported?
no, not very well supported. Frustration does not seem necessary to justify motivation.
In fact, some cognitive factors seem at least as important:
- Memory
- Associative strength
How foraging works example
Important point: there is a lot of resistance from some behaviourists and some animal trainers to the idea that reinforcing all the time is not the best. But if you actually look at how animals get reinforced in the wild, reinforcing every time would collapse very quickly
Looking at success rate in predation: for most is very low
Yet they continue doing it
If they were right that you need to be rewarded every time you try to hunt, most lions after a few trials would quit
Says: hey, our brains are wired for not knowing and instead hoping. That is incentive/intrinsic motivation and the effect of anticipation and expectancy.