6.2—operant conditioning: learning through consequences Flashcards
1
Q
Operant Conditioning
A
- operant conditioning: a type of learning in which behaviour is influenced by consequences
- very few of our behaviours are random; people tend to repeat actions that previously led to positive or rewarding outcomes
- if a behaviour previously led to a negative outcome, people are less likely to perform it again
- operant conditioning involves voluntary actions (e.g. speaking, listening, starting and stopping an activity, moving toward or away from something)
2
Q
Contingency
A
- contingency: a consequence depends upon an action
- this is important to operant conditioning
- e.g. earning good grades is generally contingent upon studying effectively
- the consequences of a behaviour can be either reinforcing or punishing (figure 6.10)

3
Q
Reinforcement
A
- reinforcement: a process in which an event or reward that follows a response increases the likelihood of that response occurring again
- Thorndike (1905); cats in puzzle boxes were able to escape more rapidly over repeated trials because they learned which responses worked (figure 6.11)
- law of effect: responses followed by satisfaction will occur again; those not followed by satisfaction become less likely

4
Q
Reinforcer
A
- reinforcer: a stimulus that is contingent upon a response, and that increases the probability of that response occurring again
- B.F. Skinner; behaviourist influenced by Thorndike
- operant chambers (or Skinner boxes): include a lever or key that the subject can manipulate; pushing the lever may result in the delivery of a reinforcer (e.g. food)
5
Q
Punishment and Punisher
A
- punishment: a process that decreases the future probability of a response
- punisher: a stimulus that is contingent upon a response, and that results in a decrease in behaviour
- punishment and punishers are defined not by the stimuli themselves, but by their effects on behaviour
- e.g. being yelled at, losing money, or going to jail will all make it less likely that a particular response will occur again
6
Q
Positive Reinforcement
A
- positive reinforcement: the strengthening of behaviour after potential reinforcers such as praise, money, or nourishment follow that behaviour (table 6.2)

7
Q
Negative Reinforcement
A
- negative reinforcement: the strengthening of a behaviour because it removes or diminishes a stimulus (table 6.2)

8
Q
Avoidance Learning | Negative Reinforcement
A
- avoidance learning: a specific type of negative reinforcement that removes the possibility that a stimulus will occur
- e.g. taking a detour to avoid traffic on a particular road
- brain-imaging scans show that a region of the frontal lobes (the orbitofrontal cortex) shows increased activity when a negative outcome is successfully avoided
- avoidance learning (negative reinforcement) uses some of the same brain networks as positive reinforcement
9
Q
Escape Learning | Negative Reinforcement
A
- escape learning: occurs if a response removes a stimulus that is already present
- e.g. covering your ears when you hear really loud music
- you can’t avoid the music, because it’s already present, but you can escape it instead
10
Q
Positive Punishment
A
- positive punishment: a process in which a behaviour decreases in frequency because it was followed by a particular, usually unpleasant, stimulus
- e.g. cat owners using a spray bottle
11
Q
Negative Punishment
A
- negative punishment: when a behaviour decreases because it removes or diminishes a particular stimulus
- e.g. when a parent grounds a child
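
The four definitions on cards 6 to 11 follow a two-by-two rule: "positive" or "negative" says whether a stimulus is added or removed after the response, and "reinforcement" or "punishment" says whether the behaviour then increases or decreases. A minimal Python sketch of that rule follows; the function name `classify_consequence` and its string arguments are hypothetical, chosen only for illustration.

```python
def classify_consequence(stimulus_change: str, behaviour_change: str) -> str:
    """Name the operant process from two observations.

    stimulus_change:  "added" or "removed" (what follows the response)
    behaviour_change: "increases" or "decreases" (how often the response occurs later)
    """
    # "positive" means a stimulus is added; "negative" means one is removed.
    sign = {"added": "positive", "removed": "negative"}[stimulus_change]
    # Reinforcement strengthens behaviour; punishment weakens it.
    process = {"increases": "reinforcement", "decreases": "punishment"}[behaviour_change]
    return f"{sign} {process}"


# Examples from the cards above.
print(classify_consequence("added", "decreases"))    # spray bottle -> positive punishment
print(classify_consequence("removed", "decreases"))  # grounding -> negative punishment
print(classify_consequence("removed", "increases"))  # covering your ears -> negative reinforcement
```

Note that the label depends on the observed effect on behaviour, not on whether the stimulus seems pleasant or unpleasant.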
12
Q
Primary and Secondary Reinforcers
A
- primary reinforcers: reinforcing stimuli that satisfy basic motivational needs, meaning needs that affect an individual’s ability to survive (and, if possible, reproduce)
- e.g. food, water, shelter, and sexual contact
- secondary reinforcers: reinforcing stimuli that acquire their reinforcing effects only after we learn that they have value
- e.g. money and praise
- they are abstract and don’t directly influence survival-related behaviours
- the nucleus accumbens becomes activated when processing rewards (i.e. both primary and secondary reinforcers)
- individual differences in this area may help explain why people differ in how strongly they are motivated by reinforcers
- when behaviour is rewarded, dopamine is released
- dopamine-releasing neurons in the nucleus accumbens and surrounding areas keep track of which behaviours are, or are not, associated with a reward
- they are involved with learning new behaviour-reward associations as well as reinforcement itself
13
Q
Discriminative Stimulus
A
- discriminative stimulus: a cue or event that indicates that a response, if made, will be reinforced
- e.g. before pouring a cup of coffee, we check if the light on the coffee pot is on; a discriminative stimulus tells us that the beverage will be hot and, presumably, reinforcing
- e.g. you will only ask to borrow your parents’ car when they’re in a good mood
- these stimuli demonstrate that we can use cues from our environment to help us decide whether or not to perform a conditioned behaviour
14
Q
Generalization
A
- generalization: when an operant response takes place to a new stimulus that is similar to the stimulus present during original learning
- e.g. a child petting, laughing, and playing with a border collie may lead to him becoming more likely to pet other dogs, or even other furry animals
- in operant conditioning, discrimination and generalization are controlled by dopamine-secreting neurons
- compare this to classical conditioning, where these two effects are due to the strengthening of synapses as a result of simultaneous firing
15
Q
Delayed Reinforcement and Extinction
A
- Thorndike (1911) noticed that reinforcement was more effective if there was very little time between the action and the consequence
- delayed reinforcement is less effective because it is harder to associate the reinforcer with the behaviour after a delay
- e.g. drugs that have their effect soon after they’re taken are generally more addictive than drugs whose effects occur several minutes or hours afterwards
- extinction: the weakening of an operant response when reinforcement is no longer available
- e.g. if you lose your Internet connection, you’ll stop trying to refresh your browser because there’s no reinforcement for doing so
16
Q
Reward Devaluation
A
- behaviours do change when the reinforcer loses some of its appeal
- experiment: rats are trained to press two different levers, each delivering a different reward (i.e. two different rewarding tastes); if experimenters pre-feed a rat with one of these two tastes, it will crave that taste less than the other
17
Q
Shaping
A
- shaping: a procedure in which a specific operant response is created by reinforcing successive approximations of that response
- e.g. toilet training; shaping is done in a step-by-step fashion until the desired response is learned
- chaining: linking together two or more shaped behaviours into a complex action or sequence of actions
- e.g. animal actors in movies were almost certainly trained through lengthy shaping and chaining procedures
- applied behaviour analysis (ABA): using close observation, prompting, and reinforcement to teach behaviours, often to people who experience difficulties and challenges owing to a developmental condition (e.g. autism)
18
Q
Schedules of Reinforcement
A
- schedules of reinforcement: rules that determine when reinforcement is available
- continuous reinforcement: every response made results in reinforcement
- e.g. vending machines deliver a snack every time the correct amount of money is deposited
- partial (intermittent) reinforcement: only a certain number of responses are rewarded, or a certain amount of time must pass before reinforcement is available (figure 6.14)
- e.g. phoning a friend only gets an actual person on the other end of the call some of the time

19
Q
Ratio Schedules and Interval Schedules
A
- ratio schedules: the reinforcements are based on the amount of responding
- tend to generate relatively high rates of responding
- interval schedules: based on the amount of time between reinforcements
- fixed schedule: the schedule of reinforcement remains the same over time
- variable schedule: the schedule of reinforcement, although linked to an average, varies from reinforcement to reinforcement
- fixed-ratio schedule: reinforcement is delivered after a specific number of responses have been completed
- e.g. a rat is required to press a lever 10 times to receive food
- variable-ratio schedule: the number of responses required to receive reinforcement varies according to an average
- e.g. in a VR5 experiment, trials could involve seven lever presses, followed by four, six, three, and so on; but the average is five
- in animal studies, variable-ratio schedules lead to the highest rate of responding of the four types of reinforcement schedules
- fixed-interval schedule: reinforces the first response occurring after a set amount of time passes
- e.g. if your professor gives you an exam every three weeks, your reinforcement for studying is on a fixed-interval schedule
- variable-interval schedule: the first response is reinforced following a variable amount of time
- e.g. if you’re watching a meteor shower, you’d be rewarded for looking up at irregular times; a meteor may fall on average every 5 minutes, but there will be times of inactivity for 1 minute, 8 minutes, 10 minutes, and so on
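
The four schedules on this card are rules for deciding which responses earn a reinforcer, so they can be sketched as small Python functions. This is an illustrative sketch, not a model from the textbook: the function names are hypothetical, and the interval function assumes each interval is timed from the previous reinforced response (or from time 0 for the first one).

```python
def fixed_ratio(n_responses: int, ratio: int) -> list[int]:
    """Return the response numbers (1-based) that are reinforced when every
    `ratio`-th response earns a reinforcer."""
    return [i for i in range(1, n_responses + 1) if i % ratio == 0]


def variable_ratio(requirements: list[int]) -> list[int]:
    """Given the number of responses required on each trial, return the
    cumulative response numbers at which reinforcement is delivered."""
    reinforced, total = [], 0
    for required in requirements:
        total += required
        reinforced.append(total)
    return reinforced


def interval_schedule(response_times: list[float], intervals: list[float]) -> list[float]:
    """Return the times of the reinforced responses.

    The first response made once the current interval has elapsed is
    reinforced. Equal intervals give a fixed-interval schedule; unequal
    intervals give a variable-interval schedule.
    Assumption: each interval starts at the previous reinforced response.
    """
    reinforced = []
    idx = 0
    available_at = intervals[0]
    for t in response_times:
        if t >= available_at:
            reinforced.append(t)
            idx += 1
            if idx >= len(intervals):
                break
            available_at = t + intervals[idx]
    return reinforced


# Fixed ratio: a rat must press the lever 10 times per food pellet.
print(fixed_ratio(30, 10))                 # [10, 20, 30]

# Variable ratio: the VR5 example (7, 4, 6, 3 presses; the mean is 5).
vr5 = [7, 4, 6, 3]
print(sum(vr5) / len(vr5))                 # 5.0
print(variable_ratio(vr5))                 # [7, 11, 17, 20]

# Fixed interval: one response per time unit, reinforcement every 3 units.
print(interval_schedule(list(range(1, 11)), [3, 3, 3]))  # [3, 6, 9]
```

Under the ratio functions, responding faster earns reinforcers sooner, whereas under the interval function extra responses before the interval has elapsed earn nothing.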

20
Q
Partial Reinforcement
A
- partial reinforcement effect: a phenomenon in which organisms that have been conditioned under partial reinforcement resist extinction longer than those conditioned under continuous reinforcement
- e.g. people are only intermittently reinforced for putting money into a slot machine, but a high rate of responding is still maintained and may not decrease until after a great many losses in a row
- this effect is likely due to the fact that the individual is used to not receiving reinforcement for every response, so a lack of reinforcement isn’t surprising and doesn’t alter the motivation to produce the response when the reinforcement isn’t available
- reinforcement and superstition: whether a superstition affects your performance depends on whether or not you allow it to
21
Q
Applying Punishment
A
- people tend to be more sensitive to the unpleasantness of punishment than they are to the pleasures of reward
- e.g. in an experiment, university students playing a computer game found losing $100 to be three times more punishing than gaining $100 was reinforcing
- the use of punishment raises some ethical concerns, especially when it comes to physical means
- while punishment may suppress an unwanted behaviour temporarily, by itself it does not teach which behaviours are appropriate
- punishment of any kind is most effective when combined with reinforcement of an alternative, suitable response
22
Q
Classical and Operant Conditioning
A
- some may want to think of behaviour as being due to either classical conditioning or operant conditioning
- but complex behaviour is influenced by both types of learning, each influencing behaviour in slightly different ways
- e.g. playing slots
- uses a variable-ratio schedule of reinforcement (a type of operant conditioning) that leads to a high response rate
- but the flashing lights and sounds, maybe even the chair, all serve as conditioned stimuli that elicit the conditioned response of excitement associated with gambling