4 - Operant Conditioning Flashcards
What’s the difference between Pavlovian conditioning and Operant conditioning?
Pavlovian conditioning relies on the formation of reflexive associations between stimuli, resulting in involuntary responses; operant conditioning relies on the consequences of past actions influencing future behaviour, resulting in an increase or decrease in voluntary behaviours
What simple principle does operant conditioning operate on?
Consequences lead to changes in voluntary behaviour: actions that result in a reward tend to be repeated or become more frequent; actions that result in punishment tend to be avoided or become less frequent
Who put cats inside ‘puzzle boxes’ that they could escape by pulling a string, stepping on a platform, and turning a latch on the door?;
What basic thing did he find?
Edward Thorndike;
Cats get quicker at escaping with experience
According to Thorndike, responses accompanied or closely followed by satisfaction to the animal will what?
Be more firmly connected with the situation, so that, when it recurs, their responses will be more likely to recur
When Skinner put animals in a box with different things for them to manipulate (e.g. electric shocks from the floor; a food dispenser & lever), how would he teach them the desired behaviour of pressing the lever?
Either wait for the behaviour to occur by chance, or use shaping (selectively reinforce any behaviour resembling the target, gradually building up to the desired behaviour)
Skinner (1948) discovered that if you reward pigeons at random, regardless of what they are doing, they develop what?;
Random reinforcement shapes behaviour. The reinforcement is correlated with the pigeon’s movements but there is no what?
‘Superstitious behaviour’ – they start to believe that things they do cause the random rewards;
Causation
How do humans behave superstitiously?
Even when there is no true association between a behaviour and an outcome, we expect and try to find links (lucky charms, rituals, pedestrian crossings, etc.)
Many behaviours are made up of smaller behaviours.
Describe Chaining;
Why is backward chaining more effective?
Building up a behaviour by teaching it in bits and pieces and linking the steps together; can be done forwards or backwards;
If you start with the last behaviour in the chain & work backwards, the learner knows the reward comes after the last step, so each new step leads into a familiar sequence that ends in reward and they'll keep learning
What’s the point of classifying consequences of behaviour?
Different consequences alter behaviour in different ways; if you want to alter behaviour, you have to understand the differences
Regarding reinforcers & punishers: when a behaviour (R) follows a stimulus (S), what does the consequence of that behaviour determine?
The likelihood of that behaviour happening again when the next instance of the stimulus occurs
What’s the difference between Positive & Negative Reinforcement? Provide examples of each
Positive is when a stimulus is added to increase desired behaviour (e.g. being given ice-cream after doing homework); Negative is when a stimulus is removed to increase desired behaviour (e.g. being let off chores for doing homework)
What’s the difference between Positive & Negative Punishment? Provide examples of each
Positive is when a stimulus is added to decrease undesired behaviour (e.g. getting smacked); Negative is when a stimulus is removed to decrease undesired behaviour (e.g. losing one's licence)
What are the two types of reinforcement schedule?;
What’s the difference between ratio & interval?
Continuous (CRF) - every response is reinforced; Partial (PRF) - intermittent (only some responses are reinforced);
Ratio is the number of responses required before reinforcement (e.g. every 10 responses); interval is the time that must pass before a response is reinforced (e.g. every 10 mins)
How does Fixed Ratio (FR) work?;
Variable Ratio (VR)?;
Fixed Interval (FI)?;
Variable Interval (VI)?
Response is reinforced every nth time (e.g. newspaper delivery);
On average, every nth - unpredictable (e.g. gambling);
First response after n seconds is reinforced (e.g. waiting for a bus);
On average, first response after n seconds - unpredictable (e.g. checking email)
Which 2 reinforcement schedules show a post-reinforcement pause?
FR (a break after each reward, as the organism knows the next reward is n responses away) & FI (it waits until it thinks the time is nearly up, then ramps up its behaviour, since there is no point responding before then; this leads to a scalloped pattern)
Under which conditions will behaviour continue at a constant rate & why?
VR & VI, as the organism doesn’t know when the reward is coming, so it keeps responding
Which reinforcement schedules are the most efficient?;
Why are VR schedules most resistant to extinction?
Ratio
Because the organism is only sometimes rewarded, it perseveres when rewards stop, so the behaviour is harder to extinguish
Which schedules of punishment are most effective?;
Is reinforcement or punishment more effective?
Continuous;
Reinforcement - it can increase the repertoire of desirable behaviours, which in turn decreases undesirable ones
According to Skinner, what is a problem with punishment?;
What are some other problems?
Its effects are not as permanent as those of reinforcement;
Reduces trust/increases aggression; harder to apply consistently in the real world than in the lab, so not as effective
List some ways to punish effectively
No escape; as intense as possible (within limits); continuous schedule; no delay; over a short period; no subsequent reinforcement
What are some side effects to look for after punishment?;
What’s a better approach to avoid these?
Changes in other behaviours; aggression; fear; modelling of violence; learned helplessness;
Use strategies that will help in the long term rather than being carried away in the moment (e.g. reinforce appropriate behaviour)
Apart from schedule, what other 3 variables affect conditioning?
Drive, Size & Delay
If I reward you for doing your homework with your favourite snack, should I do this when you are hungry or when you’ve just eaten?;
Why?
Hungry;
Reinforcement depends on how much the organism wants the reinforcer; a hungry organism has more drive than a sated one
Should I reward you for doing your homework with a large serve of your favourite snack or one bite of your favourite snack?;
Why?;
What problem can occur with this?
Large serve; In operant conditioning, size does matter (animals in a Skinner box learn faster if they get more food pellets); Diminishing returns (the effect on behaviour tapers off as the reward increases; the more you have, the less one more counts)
What is the speed of acquisition for a large/desired reinforcer compared to a smaller/less desired one?;
What’s the speed of extinction?
Faster;
Also faster - the bigger the change in reinforcement (from a large reward to none), the bigger the change in behaviour (law of diminishing returns)
Is it better to reward you for doing your homework with your favourite snack right now or in half an hour?;
Why?
Right now;
Delay reduces the effect; we prefer immediate rewards and have less motivation for delayed ones; it is also harder to link a delayed consequence to the behaviour (as with delayed punishment)
Skinner argued that the Three-Term Contingency is the basis of operant conditioning. What are its three terms?
The discriminative stimulus (sets the occasion); The operant response (the behaviour); The outcome (i.e. reinforcer/punisher) that follows (the consequence)
When does Stimulus Control occur?;
When does the behaviour happen?
When behaviour comes to be under the control of the stimulus (much of our everyday behaviour);
Only when the stimulus is present & not when absent
When you’re training a dog, if it jumps to the verbal cue “up”, but also to the verbal cues “ah”, “gulp”, “here”, etc., what is occurring?
Stimulus Generalisation
Define Stimulus Generalisation
When a response is reinforced in the presence of one stimulus, there’s a general tendency to respond in the presence of new stimuli that have similar physical properties or have been associated with that stimulus (loose degree of stimulus control)
If the dog only jumps to the verbal cue “up”, but not to the verbal cues “oops”, “here”, etc., or to a tap on the plate, or a clap, what has occurred?
Stimulus Discrimination
Define stimulus discrimination;
How is it taught?
Degree to which antecedent stimuli set the occasion for particular responses; precise degree of stimulus control;
By using discrimination training procedures such as differential reinforcement (e.g. reward the behaviour only when the stimulus is present)
In the three-term contingency, what does a discriminative stimulus serve as?;
So what is key to operant conditioning?
To signal the occasion when a particular behaviour will be reinforced/punished;
Learning to discriminate the stimulus
Stimuli become signals if what?;
If they’re predictive of a consequence
Why does stimulus control use discrimination and shaping?;
It’s easier to refine an existing behaviour by adding a new stimulus than what?
To make sure the organism only responds when the stimulus is present;
Make a new behaviour from scratch
Provide some examples of human behaviour that appears to be under stimulus control
Traffic lights; typical talking distances; social drinkers/smokers; social behaviours