Instrumental conditioning Flashcards

1
Q

What did Thorndike and his puzzle boxes demonstrate?

A

That animals learn a response to get access to a reinforcer
Learning achieved through simple trial and error, with no evidence of any active understanding of how to solve the problem
Learn from consequences of actions

2
Q

What was Thorndike’s “law of effect”?

A

Any behaviour followed by pleasant consequences is likely to be repeated, and any behaviour followed by unpleasant consequences is likely to be stopped

3
Q

How does operant conditioning resemble/differ from classical conditioning?

A

Similarities - Basic principles e.g. acquisition, extinction, spontaneous recovery and stimulus generalisation
Differences - instrumental involves animal having to DO something to get reinforcement i.e. animal has control, voluntary responses

4
Q

What are the 3 basic concepts within instrumental conditioning?

A

R - instrumental response e.g. lever press
rF - reinforcement e.g. access to food
SD - discriminative stimulus i.e. stimulus that informs animal about availability of reinforcement

5
Q

What happens in operant conditioning experiments using the Skinner box?

A

Rat is allowed to perform the lever press to obtain food
As trials continue, the number of presses per minute steadily increases as the lever press becomes more strongly associated with the food reward

6
Q

What does the law of effect suggest regarding what animals actually learn in operant conditioning experiments?

A

By associative learning theory, we have a node that corresponds to the discriminative stimulus i.e. lever, and a node for response (lever press) and reinforcement (food)
Law of effect would suggest association only exists between SD and R i.e. lever and lever press, with reinforcement having modulatory effect aiding establishment of association (initial reinforcement attracts animal close to lever)
Animal learns to press the lever through trial and error, learning from consequences, but cannot predict the consequences of its behaviour because there is no link between R and rF or between SD and rF

7
Q

How did Adams and Dickinson test this theory?

A

PHASE 1 - both exp and control group receive same treatment, presented with lever which released food when pressed so SD-R association learned
PHASE 2 - Devaluation of the reinforcer via induction of illness; exp group received food and illness close together so an association formed; this tests whether the food actually has an impact on the SD-R association
Law of effect would predict that devaluing the reinforcer in this way will not affect results of test - SD-R relationship should still exist and both groups should still perform the same

8
Q

What did Adams and Dickinson find?

A

Animals in exp group stopped responding to large extent while those in control continued high rate of pressing
Challenges law of effect, demonstrating that reinforcement is part of the association and we need to account for its value in understanding how learning occurs

9
Q

How do the behaviourist movement and the modern learning theory differ?

A

Behaviourism - only associations between stimuli and responses established during learning episodes
Modern - more interactive relationship wherein associations may develop between different stimuli, responses and outcomes (reinforcers) in standard conditioning tasks

10
Q

What are the different procedures used to develop understanding of operant conditioning?

A

Positive procedures - response produces presentation of an event; can be reinforcement e.g. food, or punishment e.g. shock
Negative procedures - response prevents or terminates presentation of an event; can be omission of an appetitive event e.g. food withheld, or escape/avoidance of an aversive event e.g. avoiding a shock

11
Q

How else can the procedures be classified?

A

Reinforcement - Increase response rate; can be positive i.e. using food to encourage lever press, or can be negative i.e. learning that pressing lever allows escape and avoidance of shock
Punishment - Decrease response rate; can be positive e.g. giving shock when press lever thus discouraging further pressing, or negative i.e. something appetitive being taken away (common strategy when children misbehave)

12
Q

What is the conditioned emotional response (CER) procedure?

A

STEP 1 - Training in instrumental task (lever press –> food)
STEP 2 - Classical conditioning training once the lever is associated with food (tone –> shock) - the shock is aversive and is the US, and interruption of lever pressing is the UR; the CR is when the tone (CS) alone interrupts lever pressing

13
Q

What do we need to take care with in order for the CER procedure to work correctly?

A

How we conduct step 1 - if the animal gets a food reward every time it presses the lever, it will quickly become satiated and no longer motivated to press; at that point no classical conditioning will be observable
To avoid this we need to limit the number of rewards during a session - this has been shown to effectively sustain a high response rate for long periods

14
Q

What is meant by an interval schedule of reinforcement?

A

A reward is presented every now and again, e.g. every minute

15
Q

What is meant by a fixed interval schedule?

A

Reward for responding after a fixed period of time since the last reward - during the interval any further lever presses have no effect, but once the interval is up, the next lever press produces the reward. Animals somehow learn about time and responses cluster towards the end of each interval; responding is not uniform
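The FI contingency described above can be sketched as a simple timing rule. This is a hypothetical illustration, not from the deck; the function name and press times are invented for the example:

```python
def fixed_interval(presses, interval=60):
    """Sketch of a fixed-interval (FI) schedule: a press is rewarded only
    if at least `interval` seconds have passed since the last reward.
    `presses` is a sorted list of press times (seconds); returns the
    times at which a reward was delivered."""
    rewards = []
    last_reward = 0
    for t in presses:
        if t - last_reward >= interval:
            rewards.append(t)
            last_reward = t
    return rewards

# Presses at 30s and 59s fall inside the interval and earn nothing;
# the presses at 61s and 125s are the first ones after each interval
# elapses, so only they are rewarded.
print(fixed_interval([30, 59, 61, 90, 125], interval=60))  # [61, 125]
```

The key point the sketch makes concrete: presses during the interval are wasted, so an animal that learns the timing concentrates its responding near the end of each interval.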

16
Q

What would we see if we used an FI-60” schedule in the CER procedure?

A

Plateaus where rats have learned to wait approx. a minute before pressing lever again, and that is where we see the increase in responding leading up to each reward point
In the classical conditioning stage the plateaus also represent time when no responses because tone is played and the rat knows to expect a shock

17
Q

What is meant by a variable interval schedule?

A

Animal rewarded for responding after a non-fixed period of time since last reward presented
VI-60” means an AVERAGE of 60 secs between rewards, but not every interval is 60 secs - the animal cannot learn the mean interval and cannot anticipate when the reward will be presented, so responding is uniform

18
Q

How does classical conditioning interact with a variable interval operant conditioning schedule?

A

When the tone is played, it disrupts responding as the rat expects a shock - we see a plateau
As soon as the reward is presented, however, it immediately starts responding again

19
Q

What is meant by a ratio reinforcement schedule?

A

A reward is presented whenever the animal accumulates a set number of responses

20
Q

What is a fixed ratio schedule?

A

Reward after a fixed number of responses - the animal somehow learns to count, e.g. learns that reward follows 5 presses and doesn't respond immediately after each reward; responding is not uniform

21
Q

What happens when we combine the CER procedure with a fixed ratio schedule?

A

Rats respond with runs of five presses until reinforced, and don't start again until the conditioned stimulus (tone) has stopped

22
Q

What is a variable ratio schedule?

A

Animal gets a reward after a non-fixed number of responses - e.g. VR-3 means the number of responses required before reinforcement varies around an average of 3 presses; the animal can't learn the number required, so it responds uniformly
Learning is quick and responding is usually high because the animal doesn't know when the reinforcement will be delivered
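One common way to model a VR schedule is to reward each press independently with probability 1/mean_ratio, so the presses-per-reward count varies around the mean and is unpredictable. A minimal sketch under that assumption (function name and parameters are mine, not from the deck):

```python
import random

def variable_ratio_session(n_presses, mean_ratio=3, seed=0):
    """Sketch of a VR schedule: each press is rewarded independently with
    probability 1/mean_ratio, so the number of presses needed per reward
    varies around mean_ratio and cannot be learned by the animal.
    Returns a list of booleans, one per press (True = rewarded)."""
    rng = random.Random(seed)
    return [rng.random() < 1 / mean_ratio for _ in range(n_presses)]

session = variable_ratio_session(3000)
presses_per_reward = len(session) / max(1, sum(session))
print(round(presses_per_reward, 1))  # close to 3 on average
```

Because no individual press is predictive, there is no point at which pausing pays off, which is why responding under VR is both high and uniform.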

23
Q

So in the CER procedure, what schedule should be used to ensure the animals respond at a high rate and uniformly throughout session?

A

Variable ratio (or variable intervals) - best options to show how classical conditioning affects this type of learning as the animal is responding continuously

24
Q

Summarise the importance of the CER procedure

A

When we pair tone and shock in classical conditioning, learning is silent - the rat simply freezes, so we don't know whether it has associated the two because it is simply not moving; there is nothing to measure
Having learned through instrumental conditioning to press the lever, however, we can see an observable change in behaviour - if rat has learned tone is aversive, it will stop pressing lever (equivalent to summation tests to look for silent learning)

25
Q

Why would we choose a variable schedule for the CER procedure?

A

With fixed options we don’t know if behaviour stops because of the tone or because animal has learned to tell time/count
We don’t know how they are integrating the classical and instrumental conditioning in these options

26
Q

What is a problematic real-world example of where we see instrumental conditioning by a variable ratio schedule?

A

Slot machines - probability of jackpot is constant while number of responses (coins entered) needed to obtain reward is variable
Gambling addictions illustrate the strength and power of this type of schedule - people respond constantly, trying to win, because they don't know when reinforcement will arrive but they know the reward does exist
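The slot-machine case above can be made concrete: with a constant win probability p per play, the number of plays until a win follows a geometric distribution with mean 1/p, so even "overdue" machines are no closer to paying out. A small simulation sketch (the function, p value, and seed are invented for illustration):

```python
import random

def plays_until_jackpot(p=0.01, seed=1):
    """Sketch: count plays until a win when each play wins independently
    with probability p. The count is geometrically distributed with
    mean 1/p - the gambler never knows how close the next win is."""
    rng = random.Random(seed)
    plays = 1
    while rng.random() >= p:  # keep playing until a win occurs
        plays += 1
    return plays

# On average 1/p = 100 plays, but any single run can be far more or fewer.
print(plays_until_jackpot())
```

The memorylessness of this process is exactly what makes the variable-ratio schedule so resistant to extinction.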

27
Q

What is meant by the misbehaviour of organisms?

A

Natural behaviour traits that are unresponsive to whether they are reinforced or not, e.g. pigeons will peck an illuminated disk even when doing so prevents/limits food presentation; natural behaviour is difficult to change