Lecture 24 - Flashcards
Time out
NOT extinction – negative punishment (removing a broad range of positive reinforcers). A brief, safe procedure, but not without potential problems:
Ability to implement it properly.
Time out should actually remove reinforcers.
Is "time in" reinforcing? Time out only works if the setting the child is removed from is reinforcing.
Potential for abuse, because time out is reinforcing for the parent or teacher: it removes an aversive stimulus.
Premack's principle
The Premack principle is a theory of reinforcement which states that a less desired (lower probability) behaviour can be reinforced by the opportunity to engage in a more desired (higher probability) behaviour.
What will be reinforcing?
Trans-situational reinforcement
The theory focuses on general causal stimuli.
Reinforcers and punishers form unique and independent sets of trans-situationally effective stimuli.
Consummatory responses (eating, drinking) are the gold-standard reinforcers, and supposedly cannot themselves be reinforced.
Relatively long-term deprivation is necessary in order to use one of these gold standards.
Premack experiment
Part 1: Non-deprived rats
Measure baseline rates of running and drinking (BASE). More running (○) than drinking (●).
The running wheel is now active only for short periods following some drinking (FR 30). Drinking increases.
Part 2: Deprive the rats of water.
Baseline drinking now much higher (BASE).
Arrange it so that following drinking the running wheel is activated and the rat is forced to run (FR 15, FR 5). The rate of drinking decreases.
Problems for the idea of trans-situational reinforcers:
In Part 1, “running” reinforces a consummatory response (drinking). This should not occur.
In Parts 1 and 2, wheel-running was both a reinforcer and a punisher of drinking. Trans-situational reinforcement doesn’t allow this dual role.
PART 1 - What you have is some rats in their cages, not deprived of food or water. Every day you take them out for 10 minutes and put them in a special little operant chamber. In it is a spigot they can lick to get water, and we count the number of licks they make; there is also a running wheel. Mice and rats will spontaneously go into a running wheel for a jog – you can see this in pet shops, for example. So we take our rats, dump them into the chamber, and measure how much of that 10 minutes (600 seconds, on the y axis) they spend on each activity. We can see the amount of time they spend running and the amount of time they spend drinking, and on a day-by-day basis they spend more time running than drinking. Drinking should be a reinforcer in its own right. Then what Premack did was lock up the running wheel so the rats couldn’t run unless they made 30 licks of water; if they made 30 licks, the wheel would free up and they could go for their run. What we see in the middle panel of the graph (FR 30) is that the rate of running goes down, since the wheel is locked and they cannot do as much running as before. But the rate of drinking goes up: by following drinking with running, we actually reinforce drinking, and the frequency of drinking increases. So running is a reinforcer for rats, and it can even reinforce a consummatory response like drinking. The final panel is the return to baseline, which just shows that they go back to the way they were.
PART 2 - Now a slight variation is done: for a few hours before the rats are put in the chamber, the water is taken away in their home cage. We then put them back in the chamber and see what happens. Now, in the 10 minutes (600 seconds), they spend a lot of time drinking and far less time running – less than their drinking, and less than the amount of running in Part 1. So we’ve changed the probabilities of those two behaviours. Premack then set it up so that if the animals drank, they were forced to run (a little motor would start the running wheel). When this happens, the amount of running goes up, because they are forced to run, but the amount of drinking goes down. FR 15 (every 15 licks you are forced to go for a run), for example, shows the amount of drinking going down quite a long way compared with what it was at baseline.
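To make the two contingencies concrete, here is a minimal Python sketch of the FR schedules described above, assuming a simple lick counter. The FR sizes (30 in Part 1, 15 in Part 2) come from the lecture; the 100-lick session and the helper function are hypothetical, purely for illustration.

```python
# Minimal sketch of the fixed-ratio (FR) contingencies in Premack's experiment.
# FR sizes (30, 15) are from the lecture; the 100-lick session is made up.

def contingency_points(total_licks, ratio):
    """Lick counts at which the contingent event fires under an FR schedule:
    wheel access in Part 1, forced running in Part 2."""
    return [n for n in range(1, total_licks + 1) if n % ratio == 0]

# Part 1 (non-deprived rats): every 30th lick unlocks the wheel, so the
# higher-probability behaviour (running) reinforces drinking.
print(contingency_points(100, 30))   # [30, 60, 90]

# Part 2 (water-deprived rats): every 15th lick starts the motorised wheel,
# so the now lower-probability behaviour (forced running) punishes drinking.
print(contingency_points(100, 15))   # [15, 30, 45, 60, 75, 90]
```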
Now, is running in a running wheel a reinforcer or a punisher for rats? It can be both. So the idea of trans-situational reinforcers – the idea that being a reinforcer or being a punisher is an inherent property of a thing – does not make sense given the results of this experiment. In different contexts, the same activity can be both a reinforcer and a punisher.
So in Parts 1 and 2, running was both a reinforcer and a punisher of drinking. Trans-situational reinforcement theory doesn’t allow for this kind of dual role.
Premack’s approach to what reinforcement is has had lasting effects on behaviour modification and applied psychology generally.
It challenges the concept of reinforcers as stimuli. Instead, behaviours are characterised as either high probability or low probability. A behaviour is reinforced when it is followed by a higher probability behaviour.
Probabilities of behaviour
What will be reinforcing in a situation is high probability behaviour.
e.g. if we deprive a rat of water for a few hours, drinking becomes a high probability behaviour and we can use drinking as a reinforcer.
So a behaviour is reinforced when performing it gives the opportunity to perform a behaviour of higher probability (e.g. instead of food being the reinforcer, eating is the reinforcer).
Because we can measure the probabilities of behaviours beforehand, we should be able to predict which behaviours will reinforce other behaviours in a situation (see the sketch below).
The probabilities of behaviours can vary from situation to situation, or even as a function of time.
We can tailor reinforcers to a particular individual by looking at their free rates of behaviour for various activities (what do they do a lot of in their spare time? For me it would be Netflix; for my dad it would be playing the piano).
In some situations, some behaviours have higher probabilities than others, e.g. using food as reinforcement for answering a question in a lecture will be easier an hour before lunch than an hour after lunch, because the probability of wanting to eat differs between those two situations.
It usually works better before lunch than after.
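As a rough illustration of that predictive idea, here is a minimal Python sketch of the Premack rule, assuming baseline probabilities are estimated from the share of a free-operant session spent on each behaviour. The behaviour names and all the numbers below are hypothetical, not data from the lecture.

```python
# Minimal sketch of the Premack rule: one behaviour can reinforce another
# only if it has the higher baseline probability. Times are hypothetical.

baseline_seconds = {   # time spent on each behaviour in a free 600 s session
    "running": 240,
    "drinking": 60,
    "grooming": 90,
}

total = sum(baseline_seconds.values())
prob = {b: t / total for b, t in baseline_seconds.items()}

def can_reinforce(contingent, instrumental):
    """Access to `contingent` should reinforce `instrumental` only when the
    contingent behaviour has the higher baseline probability."""
    return prob[contingent] > prob[instrumental]

print(can_reinforce("running", "drinking"))   # True: running can reinforce drinking
print(can_reinforce("drinking", "running"))   # False: not for a non-deprived rat
```

Deprivation simply changes the baseline numbers, flipping which behaviours qualify as reinforcers – exactly what happens in Part 2 of Premack's experiment.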
Major influence on behaviour modification.
Increases scope of what can be an effective reinforcer.
Procedures for identifying reinforcers and punishers are clear, yet relatively unobtrusive.
Reinforcers can be tailored for specific situations and to specific individuals
Deprivation is a means of changing the probabilities of certain behaviours in a situation.
Examples of Premack's principle in human studies
Premack principle to control classroom behaviour of nursery school children (Homme et al., 1963)
High probability behaviours were running, screaming, pushing chairs, and doing jigsaws.
Low probability behaviours were sitting quietly and attending.
Sitting quietly was intermittently followed by the sound of a bell and the instruction “Run and scream”. After a while, a signal to stop and engage in another low or high probability behaviour.
Notes
The situation was the classroom behaviour of nursery school children. One of the things they wanted to do was teach the children the skill of sitting quietly and attending while someone reads a story or talks to them, so they would be prepared for the transition into grade one. This was the low probability behaviour; the high probability behaviours were running, screaming, pushing chairs and doing jigsaws. So they took the kids and set up a contingency. The kids were told to sit quietly for five or ten minutes, and then a bell would sound, and when it sounded they would be told to run around and scream. This worked really well, and then instead of having to sit quietly for five minutes it was 10 minutes, then 15 minutes, then 20 minutes, and at the end of this gradually increasing period the bell would ring and they could get up and run and scream. According to this semi-anecdotal paper, by the end of it, if you walked into the room the kids would sit and be well behaved until the bell went, and then you would have all the running, screaming, pushing chairs and doing jigsaws.
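As a toy illustration, here is a minimal Python sketch of the Homme et al. contingency, assuming the sit-quietly requirement is lengthened across sessions as the lecture describes; the loop and the printed instruction are hypothetical, not the authors' actual procedure.

```python
# Minimal sketch of the classroom contingency: the low-probability behaviour
# (sitting quietly) earns access to the high-probability behaviour (running
# and screaming). The lengthening requirement (5 -> 10 -> 15 -> 20 minutes)
# is from the lecture notes; everything else is illustrative.

sit_quietly_minutes = [5, 10, 15, 20]   # requirement grows across sessions

for minutes in sit_quietly_minutes:
    print(f"Sit quietly and attend for {minutes} min, "
          f"then the bell sounds: run and scream!")
```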
Mitchell, W.S., & Stoffelmayr, B.E. (1973). Application of the Premack principle to the behavioral control of extremely inactive schizophrenics. Journal of Applied Behavior Analysis, 6, 419-423.
Two chronic schizophrenics: Coil stripping as part of their “industrial therapy”.
Reinforcers such as cigarettes, sweets, or fruit had proved ineffective.
These reinforcers were tried because the patients refused to take part, and the experimenters were trying to encourage them to do so.
The high probability behaviour was sitting down doing nothing obvious. Being able to sit was made contingent on coil stripping.
Notes
The experimenters measured what the high probability behaviour was for these two chronic catatonic schizophrenics: sitting down doing nothing obvious.
So what they did was take the chairs away, and give them back after the patients spent a certain amount of time engaging with other people in the sheltered workshop.
Baseline shows no change in performance; they are doing nothing.
"Instructions only" means that at regular intervals someone would come up and ask them nicely to come and join everybody else and take part in the group activity; nothing happens here either.
Then there is a change in procedure where the instructions were followed by removal of the chair. Contingent on the chair being given back – on being able to sit down and do nothing obvious for a while – the rate of engagement behaviour increased.
Then they returned to instructions only, and in both cases the rate of behaviour goes down.
Finally they moved to a more ecologically sound version, where the contingencies were arranged so that the patients could keep the chair and keep sitting down as long as they engaged in a reasonable amount of activity with everybody else; the chair would only be taken away after a long period without engagement. Sitting while working gave a nice sort of engagement in the activity for both participants.
The probability/frequency of behaviours depends on the context, and this is what we call stimulus control in operant conditioning.
Stimulus control in instrumental conditioning
To be effective in the environment, instrumental responses must occur on appropriate occasions (at the right time).
Antecedent-Behaviour-Consequences
Also can be seen as stimulus-response-outcomes
Often called the ABCs of behaviour
The frequency and likelihood of certain behaviours changes as a function of the situation
Antecedent stimuli control (cue, signal) instrumental behaviour.
The antecedent controls, cues or signals certain behaviours – it signals both the behaviour itself and the consequences of that behaviour.
Antecedents change the likelihood of behaviours because those behaviours have different consequences, and those consequences are signalled by the antecedent stimulus.
Stimulus generalisation and discrimination
The extent to which stimulus dimensions control behaviour.
If you want to study stimulus control, you tend to use quite specific stimulus dimensions.
Effects of reinforcement and discrimination training
(Jenkins & Harrison, 1960, 1962)
This study shows the effects of different sorts of training on the degree of discrimination and the degree of generalisation that occur.
Pigeons are trained to peck a key (the filled dots show this first group). They are put into an experimental chamber with a key they can peck, and occasionally, on a VI schedule, they get food (e.g. access to wheat for four seconds). The food comes unpredictably, roughly once a minute on average, but this maintains a nice high rate of behaviour: they will probably peck the key 60-80 times a minute under these conditions. While they are doing this, a 1000 Hz tone is playing in the background. Then we test these pigeons by putting them into the exact same apparatus, never providing reinforcement – it is just a quick test: for one minute it is the 1000 Hz tone, for the next minute 2000 Hz, for the next 500 Hz, and so on – and we see whether the rate of responding changes as we change the frequency of the tone. In the situation just described, the proportion of all the test responding that occurs to each stimulus is pretty similar. The maximum is to the stimulus that was trained, but they also respond a lot to any other tone.

Now take another group of pigeons and train them on effectively the same task, but with a slightly different procedure. For two minutes, pecking the key provides food and the 1000 Hz tone is playing; after the two minutes have elapsed, the tone goes off. You can keep on pecking the key, but you will not get food – this is extinction, in other words. Do this for a week or two, bringing the birds in for half an hour or so each day. Very quickly they learn the discrimination: when the 1000 Hz tone is playing they peck like crazy, and when the tone goes off they just go for a wander, look for some food, and bob their heads around. You might end up with a response rate of 60 responses per minute during the reinforced stimulus (S+) and one or two responses per minute during the unreinforced stimulus (S-). Now run exactly the same test on these pigeons (the squares) and you see a very different shape: a maximum proportion of responding to the training stimulus, and still some responding to the stimuli most like it, but the further away you get, the fewer responses the animal makes. We are still getting some generalisation, but more discrimination is occurring.

Now take a third group of pigeons and run the same study, but make the discrimination even more salient: when a 1000 Hz tone (S+) is playing they get food, and when a 950 Hz tone (S-), which is reasonably similar, is playing they do not. Train them until they can make that discrimination, with far more responding to the 1000 Hz tone than to the 950 Hz tone, and then give them the test (the filled triangles). You get responding close to the training stimulus, but see how quickly and suddenly it drops: you do not have to get very far from S+ for discrimination to have its effect. So each of these gradients shows some generalisation and some discrimination.
The degree of generalisation or discrimination you get is a function of the sort of training exposure the organism has had to those stimuli before the test, and so the world shapes behaviour in this way.
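Here is a minimal Python sketch of how a generalisation gradient like Jenkins & Harrison's is computed: each test stimulus is scored as the proportion of all test responses it attracted. The peck counts below are hypothetical, not the actual data.

```python
# Minimal sketch of a generalisation-gradient calculation. Each test tone is
# scored as the proportion of all test responses it drew; counts are made up.

test_pecks = {   # tone frequency (Hz) -> pecks during that tone's test period
    500: 10, 750: 35, 950: 70, 1000: 80, 1050: 65, 1500: 25, 2000: 8,
}

total = sum(test_pecks.values())
gradient = {hz: pecks / total for hz, pecks in test_pecks.items()}

for hz in sorted(gradient):
    print(f"{hz:>5} Hz: {gradient[hz]:.3f}")

# A flat gradient (similar proportions at every tone) means weak stimulus
# control; a sharp peak at the trained S+ (1000 Hz here) means the training
# produced strong discrimination.
```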
Generalisation of punishment - Honig & Slivka (1964)
VI baseline rewards all key colours (wavelengths of light) equally.
Then add occasional punishment for responses to 550 nm only. The birds are trained with a range of key colours (wavelengths of light), and each colour carries the same schedule of reinforcement: whatever colour is being used, food is always available. If you look at the responding to each of these stimuli during the VI baseline, you get similar responses per minute to each of the colours. Then the animals were given mild electric shocks during one of the stimuli: pecking the key at 550 nanometers would occasionally produce a mild shock. You can see how dramatically the response rate changes to that stimulus – it collapses at that value – but you also see how the suppression spreads to the other stimuli in the situation, and stimuli close to that value also show dramatic reductions. As training progresses (look at the open triangles, test days 1-3), the birds get better at the discrimination, and responding recovers to the stimuli more distant from the punished one.

There is a feature of this graph that is really important, because it shows one of the nasty side effects of punishment. Before punishment was introduced there was a very high rate of responding; once punishment was introduced, even after some training, it suppressed the response rate to everything, including other behaviours that could be reinforced. This is one of the risks of using especially strong punishers: punishment may not only suppress the particular behaviour you are trying to diminish, but can also suppress a lot of other behaviours, many of which may be the ones you actually want and which may be appropriate in that situation. So punishment isn't something that is particularly favoured by people who design behaviour modification programmes.