Lecture 6 - Operant Conditioning Flashcards
Operant conditioning
(also known as instrumental conditioning and trial-and-error learning)
is associating a voluntary
behavior (‘operation on the environment’) with an outcome.
some action the animal chooses to do is associated with an outcome
Law of Effect
Animals learn that a behavior (or class of similar behaviors) predicts a particular outcome and seek the outcome by performing a particular behavior
Behaviors with good outcomes increase; behaviors with bad outcomes decrease.
(Thorndike, 1911)
Discrete trial paradigm
(Thorndike, 1911)
Cat opens the puzzle box and is reinforced with food reward.
The cat learned that flipping the switch was responsible for it getting out and getting food, so that escape behavior becomes more likely (and faster) in the future.
discrete because every time the cat got out, that was one trial, and for each new trial the cat had to be put back in
B. F. Skinner
free-operant paradigm
refined Thorndike’s method to allow the animal to respond repeatedly
==> allowed the animal to control the rate of responding ==> animal controls when they get the reward (food)
• Skinner box
Skinner Box
little contraption, everything was automated: counted the number of times the lever was pressed, counted the number of times the reward was provided
made it easy to measure this activity over time
instead of recording trials you’re recording behaviors over time
• Behaviors could be automatically recorded
in a Skinner box – count number of behaviors and outcomes.
Acquisition
reinforcing behavior: giving reward for every time the rat presses the lever
the number of responses goes up
extinction
it keeps pressing the lever but no food comes
if you stop reinforcing the behavior then the behavior starts to go away
number of responses decreases
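The acquisition and extinction cards above can be illustrated with a small sketch. It assumes a simple linear update rule (response strength moves a fixed fraction toward 1 when a response is reinforced and toward 0 when it is not). That rule, the function names, the starting strength of 0.1, and the rate of 0.2 are all illustrative assumptions, not something stated in the lecture.

```python
def update_strength(strength: float, reinforced: bool, rate: float = 0.2) -> float:
    """Move response strength toward 1 if reinforced, toward 0 if not.

    Assumed toy rule for illustration only; `rate` is a hypothetical parameter.
    """
    target = 1.0 if reinforced else 0.0
    return strength + rate * (target - strength)


def run_phase(strength: float, reinforced: bool, n_responses: int) -> list[float]:
    """Apply the update for n_responses lever presses and record each value."""
    history = []
    for _ in range(n_responses):
        strength = update_strength(strength, reinforced)
        history.append(round(strength, 3))
    return history


# Acquisition: every lever press is reinforced, so responding goes up.
acquisition = run_phase(0.1, reinforced=True, n_responses=10)
# Extinction: the food stops coming, so responding fades.
extinction = run_phase(acquisition[-1], reinforced=False, n_responses=10)

print("acquisition:", acquisition)
print("extinction: ", extinction)
```

Under these assumptions the first list rises steadily and the second falls steadily, which is the qualitative pattern the cards describe.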
Basic elements of the free-operant paradigm:
- discriminative stimulus (S)
- behavioral response (R)
- outcome (O)
S –> R –> O
Through repeated trials, the animal learns that the outcome is contingent upon
the appropriate response.
discriminative stimulus (S)
that helps you select
the appropriate behavior (e.g. rat can see the lever).
the animal has to be able to ID something in the environment that it’s operating on
behavioral response (R)
or class of similar responses,
is performed in response to the stimulus (e.g. rat pushes lever with either paw).
outcome (O)
follows that either reinforces or punishes the behavior (e.g. rat gets food, good outcome).
reinforcers
Outcomes that increase the likelihood of the behavior
primary reinforcers
secondary reinforcers
primary reinforcers
meet some innate need (e.g. food, water, sleep, and sex).
Note that these are not always reinforcing (e.g. you won’t work for water if already satiated).
Secondary reinforcers
have no intrinsic value, but predict or are associated with primary reinforcers (e.g. money, good grades, gold stars, etc.).
something by itself has no value but through some kind of association it’s learned that this other thing is valuable
punishers
Outcomes that decrease the behavior
primary punisher
secondary punisher
Primary punisher
Pain (shock), nausea, loud noises, social disapproval (?), loss of freedom (jail).
basically just aversive things
Secondary punisher
Monetary fines, demerits, bad grades, etc.
You are about to press a button on your iClicker. When
you see that you got the correct answer to the question,
that acts as a ______________.
Secondary reinforcer
positive (+) conditioning
If an outcome/consequence is added, if you’re given an outcome as a result of your behavior
this has nothing to do with “good” or “bad.”
negative (-) conditioning.
If an outcome/consequence is removed, something is taken away
this has nothing to do with “good” or “bad.”
Positive reinforcement
when you want to increase the behavior (reinforce) and you do it positively
animal rewarded for doing a behavior –> given something to make the behavior more likely
response increases (reinforcement) + consequence is added (positive)
Negative Reinforcement (escape/avoidance)
response increases (reinforcement) + consequence is removed (negative)
want the behavior to increase but take something away ==> if you do something I want, I’ll take away a “bad thing”
Positive punishment
Response decreases (punishment) + Consequence is added (positive)
when you don’t want a behavior and you add something (electric shock)
Negative punishment (omission)
Response decreases (punishment) + Consequence is removed (negative)
I don’t want you to do a behavior so I take something away (money, privileges, etc…)
“No more T.V. for you!”
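The four paradigms above come from answering two independent questions: does the response increase or decrease, and is a consequence added or removed? A minimal sketch of that lookup follows; the function name and the string labels are hypothetical, chosen only for illustration.

```python
def classify_paradigm(response_change: str, consequence: str) -> str:
    """Name the operant paradigm from the two questions on the cards above.

    response_change: "increases" or "decreases" (reinforcement vs. punishment)
    consequence:     "added" or "removed"       (positive vs. negative)
    """
    kind = {"increases": "reinforcement", "decreases": "punishment"}[response_change]
    sign = {"added": "Positive", "removed": "Negative"}[consequence]
    return f"{sign} {kind}"


print(classify_paradigm("increases", "added"))    # rewarded for doing a behavior
print(classify_paradigm("decreases", "removed"))  # "No more T.V. for you!"
```

Note that neither question asks whether the outcome is "good" or "bad"; the sign only records whether something was added or taken away.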
Positive reinforcement example
Eat all your vegetables –> get some dessert.
“do something I want you to do and I give you something”
Positive punishment example
Scratch the couch ==> get sprayed with water;
tease your sibling ==> parental scolding.
Negative Reinforcement example
Shut off the alarm clock (aversive stimulus) ==> removal of an aversive stimulus;
- arm does flailing motion - next morning you're more likely to make that same movement - reinforcement of behavior that takes away an aversive stimulus
take ibuprofen ==> reduce a headache.
- next time you have a headache you're more likely to grab that medicine again - reinforces that behavior of taking the medicine - you're not getting anything, something is being taken away (an aversive stimulus - a headache)
Negative punishment example
Commit armed robbery ==> loss of freedom (jail).
timing and context in operant conditioning
are critical for forming the association.
critical for how effective it’s going to be
If the outcome is delayed….
… the association is not learned as well.
So, punishing your dog for something it did an hour ago is probably not very effective…
any kind of reinforcement to be effective needs to come
almost immediately
Reinforcement schedules
(i.e. how often you get the outcome)
the timing, frequency, and reliability with which an outcome is provided, and how these affect the rate at which associations are learned.
how often and how reliably you get the outcome: going to affect the rate of learning and the effectiveness over time
continuous reinforcement
schedule
When you get a reward after every behavior:
every time the rat presses the lever it gets a reward: no break in the reward: every time you perform the action you get the outcome
partial reinforcement
schedule
anything that isn’t a continuous reinforcement schedule
the outcome follows
less than 100% of the time
variable-ratio schedule
A powerful form of partial reinforcement schedule
steep learning curve: if you don’t know when it’s coming you just keep banging away at the lever
you don’t get the outcome every time, but you get it about every 5 or 10 times - but you can’t predict it (unknown) – the exact timing can’t be predicted.
gambling!!! - the payout is variable
most effective and has the highest curve of learning
fixed ratio
every fifth time you perform the action you get the reward
rats: 5 responses and a pause (a plateau)
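The schedules above differ only in the rule that decides whether a given response earns the outcome. The sketch below is one way to express that, under stated assumptions: the function names are hypothetical, and the variable-ratio schedule is modeled as a constant reward probability per response (1 in `mean_ratio`), which is one simple way to make the payoff unpredictable rather than the only definition.

```python
import random


def continuous(response_number: int) -> bool:
    """Continuous schedule: every response is rewarded."""
    return True


def fixed_ratio(response_number: int, ratio: int = 5) -> bool:
    """Fixed-ratio schedule: every `ratio`-th response is rewarded."""
    return response_number % ratio == 0


def make_variable_ratio(mean_ratio: int = 5, seed: int = 0):
    """Variable-ratio schedule: rewarded on average once per `mean_ratio`
    responses, but the exact response that pays off cannot be predicted."""
    rng = random.Random(seed)  # seeded only so the sketch is repeatable

    def schedule(response_number: int) -> bool:
        # Assumption: each response pays off with probability 1 / mean_ratio.
        return rng.random() < 1.0 / mean_ratio

    return schedule


def rewarded_responses(schedule, n_responses: int = 20) -> list[int]:
    """Return which responses (numbered from 1) earned a reward."""
    return [n for n in range(1, n_responses + 1) if schedule(n)]


print(rewarded_responses(continuous))             # all 20 responses
print(rewarded_responses(fixed_ratio))            # [5, 10, 15, 20]
print(rewarded_responses(make_variable_ratio()))  # irregular, depends on the seed
```

The fixed-ratio output is perfectly regular, while the variable-ratio output has irregular gaps; that unpredictability is what the gambling card points to.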
Sheldon gave Penny chocolate each time she did
something to please him. What kind of paradigm is this?
Positive reinforcement
Sheldon sprayed water on Leonard when he disagreed.
What kind of paradigm is this?
Positive punishment
Sheldon wants Leonard to do it less (punisher) and he is adding something (positive)
something is being added to the situation and he wants him to not perform that behavior again
Reinforcers and punishers can be
equally effective at
changing behavior in laboratory (controlled) conditions; however, punishers can run into problems in the real world.
Problems with punishers?
in the real world when you can’t control those discriminative stimuli or timing as easily
- If you punish a behavior, you may encourage
cheating/circumvention. (“Don’t speed” becomes “Don’t get caught speeding”.)
- Concurrent reinforcement may undermine the punishment. (Student punished for talking in class may be reinforced with approval by other students.)
- Punishment can lead to more variable behavior. (If a specific behavior is decreased, what replaces it?)
- if you punish a child for jumping on the couch, then they may start jumping on the bed (doesn’t get rid of the class of behaviors)
- The punisher needs to be fairly intense from the start (otherwise you may get habituation).
- Punishment can lead to stress and anxiety, which are associated with other undesirable behaviors. (creates states that aren’t conducive to encouraging the behaviors you want)
How do animals get trained to do complex (and sometimes stupid) things?
- You can’t simply reinforce a complex behavior as it may not be done accidentally.
- Use chaining (chained learning) to create a series of reinforced behaviors that build on each other (start with something simple and keep adding one step at a time till you get something that looks much more complex)
squirrel on waterskis
S (See platform) –> R (Stand on platform) –> O (Food reward)
S (See handle) –> R (Stand on platform + place paws on handles) –> O (Food reward)
Operant vs. classical conditioning
Classical conditioning
• Passive: environment works on animal.
• Stimulus evokes a response.
• Animal learns that the CS predicts the US.
• Typically simple associations.
Operant conditioning:
• Active: animal operates on environment.
• A behavioral response
produces an outcome.
• Animal learns that behavior predicts an outcome.
• More flexible and powerful, producing more complexity.
However, the two often work together (e.g. primary and secondary reinforcers can become associated classically).
Evaluating situations to ID what kind of conditioning or paradigm
is it passive or active?
what’s being associated?
the more complex a behavior the more likely it’s operant conditioning
Brain-based models for operant conditioning
any instance of operant conditioning involves the interaction of several neural systems.
Law of effect
origins of operant conditioning
states that animals make associations between voluntary behaviors and
contingent outcomes.
_____ make a behavior more likely
reinforcers
_____ make a behavior less likely
punishers
Both reinforcers and punishers can be due to
intrinsic preferences (primary) or learned associations with intrinsic preferences (secondary).
When you add something to the outcome (give a treat or shock), that is
positive
When you take away something (pain or freedom),
that is
negative
_____ may not always be as effective as reinforcers in the real world, but are equally effective in the lab.
punishers