Chapter 5 Flashcards
Operant conditioning is a form of ____ learning.
Associative.
In classical conditioning, do we control the response?
No.
Classical vs. operant conditioning?
In classical conditioning the outcome occurs regardless of the response, whilst in operant conditioning the outcome depends on the response.
Operant conditioning is based on avoiding or obtaining a specific ____.
Outcome.
Operant conditioning requires an ____ to operate in its environment to determine an outcome.
Organism.
Thorndike was the first to study behavioral outputs due to operant conditioning. What was his study? What was the conclusion? What is this idea known as?
Puzzle boxes. Organisms are more likely to repeat actions that produce satisfying consequences, and less likely to repeat actions that do not. This idea is known as the law of effect.
The more time an animal spent in Thorndike’s box, the ____ it learned to escape.
Quicker.
Law of effect?
The probability of a particular behavioral response increases or decreases depending on the consequences that have followed that response in the past.
____ -> ____ -> ____
Stimulus, response, outcome.
Thorndike’s learning procedures involved ____ trials. What is this?
Discrete. Operant conditioning paradigm where the experimenter defines the beginning and end of each trial.
BF Skinner wanted to refine Thorndike’s techniques. How did he do this?
He created the Skinner box, which contrasts with Thorndike’s discrete trials. The Skinner box is a conditioning chamber where reinforcement/punishment is automatically delivered whenever an animal makes a response (e.g. lever pressing). In this case the animal, not the experimenter, determines when each trial starts and ends.
Free Operant Paradigm?
Operant conditioning paradigm where the animal can operate the apparatus “freely”, responding to obtain reinforcement or avoid punishment whenever it chooses. Commonly referred to simply as operant conditioning.
Can we see extinction in operant conditioning?
Yes.
Reinforcement?
Providing consequences to increase probability of behavior occurring again in the future.
Punishment?
Providing consequences to decrease probability of behavior occurring again in the future.
Adding a stimulus to free operant experiments can make them more elaborate. Example?
S(Light ON)->R(Lever press)->O(Food release)
S(Light OFF)->R(Lever press)->O(NO food release)
According to Thorndike and Skinner, operant conditioning consists of what 3 components?
Stimulus, response, outcome.
Discriminative stimuli?
In operant conditioning, stimuli that signal whether a particular response will lead to a particular outcome.
Example of Shaping?
A little kid learning to write the letter “A”: at first his attempt only looks somewhat similar, and he is rewarded for it. Then he is rewarded only as his attempts get closer and closer to a proper A. Shaping: operant conditioning technique in which successive approximations to a desired response are reinforced.
Chaining? Example?
Operant conditioning technique where organisms are gradually trained to execute complicated sequences of discrete responses. E.g. rather than having to learn a whole complicated dance before receiving Smarties, the learner gets a Smartie after every correct dance move.
Operant conditioning?
Process whereby organisms learn to make responses in order to obtain or avoid important consequences
Operant conditioning
Process whereby organisms learn to make or refrain from making certain responses in order to obtain/avoid a certain outcome.
Thorndike’s cat in box
A cat was placed in a box and when it escaped it was given food so it was more likely to do it again
Law of effect
Given a particular stimulus, a response that leads to a desirable outcome will tend to increase in frequency
Classical vs operant conditioning
Classical: organisms experience an outcome (US) whether or not they have learned the conditioned response (CR)
Operant: the outcome O is wholly dependent on whether the organism performs the response R
How are operant and classical conditioning similar?
Both have a negatively accelerated learning curve (time to do something decreases rapidly and then levels off) and they both show extinction
Discrete trial paradigms
The experimenter defines the beginning and end of each trial
Free operant paradigm
The animal can operate the apparatus freely (e.g. running into a maze to get food and then running back out, so the animal itself determines when a trial ends)
Reinforcement does what?
Increase probability of behavior
Skinner box
Cage with a lever that dispenses food into a small tray. Animals exploring the cage would press the lever by accident, receive food, and then dramatically increase their rate of responding
Punishment does what?
Decrease probability of a certain behavior
Cumulative recorder
The height of the line at any given time represents the number of responses that have been made in the entire experiment (cumulative) up to that time
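To make the cumulative record concrete, here is a minimal sketch of how the line's height could be computed from a list of response times. The press times and the function name `cumulative_responses` are made up for illustration; they do not come from any real experiment.

```python
from bisect import bisect_right

def cumulative_responses(response_times, t):
    """Height of the cumulative record at time t:
    the number of responses made up to and including time t."""
    return bisect_right(sorted(response_times), t)

# Hypothetical lever-press times, in seconds from the start of the session.
presses = [2.0, 5.5, 6.0, 6.4, 9.1, 9.3, 9.8]

for t in (0, 5, 6, 10):
    print(t, cumulative_responses(presses, t))
```

The height never decreases: a steep stretch of the line means rapid responding, and a flat stretch means a pause.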
Discriminative stimuli
Stimuli that signal whether a particular response will lead to a particular outcome
Habit slip
The stimulus to response association is so strong that the stimulus seems to evoke the learned response automatically, e.g. dialing a familiar phone number instead of the number you intended to dial
Protestant ethic effect
Rats that have been trained to press a lever to obtain food will often continue to work to obtain food even though they have free food in their cage
What is a response defined by?
The outcome it produces
Shaping: successive approximations to the desired response are reinforced
When a rat in a Skinner box happens to wander near the food tray, the experimenter drops in a piece of food. The rat eats the food, starts to learn an association between the tray and food, and will spend all its time near the food tray. The experimenter then changes the rules: now the rat must also be near the lever before food is dropped. Once the rat has learned this, the rules change again: food is dropped only if the animal is actually touching the lever. Then the rules change once more: food is dropped only if the animal is pressing down the lever.
A reinforcer is a ____.
Consequence
Primary reinforcer
Organisms have an innate drive to obtain these things, e.g. food, water, sex, sleep, and a comfortable temperature
Clark Hull’s drive reduction theory
Drive reduction theory says that organisms are motivated to reduce the state of tension caused when certain biological needs are not satisfied.
What are some complications with primary reinforcers?
An animal will work hard for water, but once it has drunk enough, more water isn’t reinforcing. Also, primary reinforcers are not all equal (an animal will work harder for food it likes)
Secondary reinforcers
Reinforcers that initially have no intrinsic value but that have been paired with primary reinforcers, e.g. money
Token economies
Desired behavior is reinforced with a token which can be exchanged for privileges
By being paired with a primary reinforcer, secondary reinforcers become reinforcers themselves.
Organisms will work to obtain them.
Why do some researchers say that animals aren’t fooled by secondary reinforcers?
They think the animals use the secondary reinforcers as informational feedback that their behavior is on the right track for obtaining a primary reinforcer
A switch in outcome may produce a change in ____.
Responding.
Negative contrast?
Organisms given a less preferred reinforcer in place of an expected and preferred reinforcer will respond less strongly for the less preferred reinforcer than if they had been given that less preferred reinforcer all along
Negative contrast example
While rats can be trained to make lever press responses to obtain either food pellets or water sweetened with sucrose, they tend to prefer the latter. If the sweetened water is used as the reinforcer during the first half of each training session and food pellets as the reinforcer during the second half, rats typically make many more responses during the first half of the session
Thorndike and Skinner concluded that punishment was _______ as reinforcement at controlling behavior
Not as effective
How can discriminative stimuli for punishment encourage cheating?
A speeding driver will see a cop (the discriminative stimulus) and slow down to avoid getting a ticket. But speeding in the absence of a police car will probably not be punished. In this case, punishment doesn’t train driver not to speed. It only teaches him to suppress speeding in the presence of police cars. In effect, driver has learned to cheat
How can concurrent reinforcement undermine punishment?
The effects of punishment can be counteracted if reinforcement occurs along with the punishment. For example, suppose a rat first learns to press a lever for food but later learns that lever presses are punished by shock. Unless the rat has another way to obtain food, it’s likely to keep pressing the lever to obtain food reinforcement in spite of the punishing effects of shock.
How can punishment fail to produce the behavior you want?
Punishment decreases the probability that R will happen in the future… but what will happen instead? The organism will explore other possible responses, which may not be the ones you wanted
Does initial intensity of punishment matter?
Yes, it should be strong from the outset, e.g. rats become numb to mild shock
Reinforcement schedules
Rules determining when outcomes are delivered in an experiment
____ outcomes produce fastest learning.
Immediate.
What results from a long response -> outcome interval?
A weaker association
The time lag between R and O is an important factor in ____.
Self-control. E.g. it’s easy to convince a student to study if a test is coming up tomorrow; it is harder if the test is in 5 weeks. The delay between R (studying) and O (good grades) makes reinforcement less effective in eliciting the response
Pre commitment
Making a choice that is difficult to change later. For example, a dieter will be less likely to cheat if he gets rid of all the sweets in his house
Positive reinforcement versus positive punishment
Positive reinforcement
S (potty present) –> R (empty bladder) –> O (praise)
Performance of the response causes the reinforcer to be “added”
Positive Punishment
S (potty absent) –> R (empty bladder) –> O (Disapproval)
The response must be withheld; if it is not (the toddler wets himself), a punisher is “added” to the environment
Negative reinforcement versus negative punishment
Negative Reinforcement
S (headache) –> R (take aspirin) –> O (no more headache)
Behaviour is encouraged because it causes something unpleasant to be subtracted from the environment
Negative Punishment
S (recess) –> R (aggressive behaviour) –> O (loss of playtime)
Behaviour is discouraged by subtracting something reinforcing (playtime) from the environment
Continuous reinforcement schedule
Each response is always followed by the outcome
Partial reinforcement schedules
Patterns where response is followed by outcome less than 100% of the time
Fixed ratio schedule
A fixed number of responses must be made before outcome is delivered
Fixed interval schedule
Reinforces the first response after a fixed amount of time. Once the time interval has elapsed, the reinforcement remains available until the response occurs and the reinforcement is obtained. Responding constantly before the time interval elapses is a waste of time and effort
Variable ratio schedule
Provides reinforcement after a variable number of responses. This reduces the post-reinforcement pause: responding stays at a steady, high rate because the very next response may result in reinforcement (e.g. a lottery)
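The two ratio schedules can be contrasted with a small simulation. This is only a sketch under stated assumptions: the function names are invented for illustration, and the variable ratio schedule is approximated by reinforcing each response with probability 1/mean (strictly a "random ratio" schedule), so that reinforcement arrives after `mean_n` responses on average.

```python
import random

def fixed_ratio(n):
    """FR-n: every n-th response is reinforced."""
    count = 0
    def respond():
        nonlocal count
        count += 1
        if count == n:
            count = 0          # start counting toward the next reinforcer
            return True
        return False
    return respond

def variable_ratio(mean_n, rng):
    """VR approximation (assumption): each response is reinforced with
    probability 1/mean_n, i.e. after mean_n responses on average."""
    def respond():
        return rng.random() < 1.0 / mean_n
    return respond

fr5 = fixed_ratio(5)
fr_outcomes = [fr5() for _ in range(15)]
# Positions of reinforced responses: the 5th, 10th and 15th.
print([i + 1 for i, reinforced in enumerate(fr_outcomes) if reinforced])

rng = random.Random(0)         # fixed seed so the run is repeatable
vr5 = variable_ratio(5, rng)
vr_outcomes = [vr5() for _ in range(1000)]
print(sum(vr_outcomes))        # roughly 1000 / 5 reinforcers, unpredictably spaced
```

Under the fixed ratio schedule the animal can "know" that the response right after a reinforcer will not pay off, which is where the post-reinforcement pause comes from; under the variable ratio schedule any response might pay off.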
Variable interval schedule
Reinforces the first response after an interval that averages a particular length of time. Response rate under a variable interval schedule is steadier than under a fixed interval schedule because animals check periodically to see whether reinforcement is available
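The interval schedules can be sketched the same way. The code below assumes that the timer for the next interval restarts at the moment a reinforcer is collected, and that variable intervals are drawn uniformly between 5 and 15 seconds (average 10); both choices, and the name `interval_schedule`, are illustrative assumptions rather than anything stated in the cards.

```python
import random

def interval_schedule(next_interval):
    """Reinforce the first response made after the current interval has elapsed.
    next_interval() returns the length of each new interval, in seconds."""
    available_at = next_interval()
    def respond(t):
        nonlocal available_at
        if t >= available_at:
            # Reinforcer collected; assumed: the next interval starts now.
            available_at = t + next_interval()
            return True
        return False               # responding early earns nothing
    return respond

# Fixed interval, 10 s: responses at these times (seconds).
fi10 = interval_schedule(lambda: 10.0)
times = [3, 8, 11, 12, 19, 22, 40]
fi_reinforced = [t for t in times if fi10(t)]
print(fi_reinforced)               # [11, 22, 40]

# Variable interval averaging 10 s: one response per second for 10 minutes.
rng = random.Random(1)             # fixed seed so the run is repeatable
vi10 = interval_schedule(lambda: rng.uniform(5.0, 15.0))
vi_count = sum(vi10(t) for t in range(600))
print(vi_count)
```

In the fixed interval run, the responses at 3, 8, 12 and 19 seconds are wasted effort, and the reinforcer that became available at 32 seconds simply waits until the response at 40.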
Concurrent reinforcement schedule
Organism can make any of several possible responses, each leading to a different outcome. This allows researchers to examine how organisms choose to divide their time and efforts among different options
Matching law of choice behavior
Behavior is correlated with its environment. Given two responses that are reinforced on variable interval schedules, an organism’s relative rate of making each response will match the relative rate of reinforcement for that response
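A worked example may help with the matching law. The sketch below assumes two hypothetical levers, A and B, whose variable interval schedules deliver 40 and 20 reinforcers per hour; these numbers and the function name are invented for illustration.

```python
def predicted_response_share(reinforcement_rates):
    """Matching law: the share of responses given to each option equals
    that option's share of the total reinforcement."""
    total = sum(reinforcement_rates.values())
    return {option: rate / total for option, rate in reinforcement_rates.items()}

# Hypothetical reinforcers per hour on each lever.
rates = {"A": 40, "B": 20}
shares = predicted_response_share(rates)
print(shares)
```

Here lever A supplies two thirds of the reinforcement, so the matching law predicts that about two thirds of the animal's responses will go to lever A and one third to lever B.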
Behavioral economics
Study of how organisms allocate their time and resources among possible options
Economic theory predicts that each consumer will allocate resources in a way that maximizes their ____, or relative satisfaction
Subjective value
The particular allocation of resources that provides maximum subjective value to an individual is called a ____
Bliss point. E.g. Jamie gets $100 each week, which buys either 10 albums or five meals out; his bliss point is eating out twice a week and buying six albums
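The arithmetic behind Jamie's example can be checked with a short sketch. The prices are inferred from the card ($100 buys 10 albums or 5 meals, so $10 per album and $20 per meal); the variable names are illustrative.

```python
BUDGET = 100
ALBUM_PRICE = BUDGET // 10   # $10, since $100 buys 10 albums
MEAL_PRICE = BUDGET // 5     # $20, since $100 buys 5 meals out

# Every (albums, meals) combination that spends the whole budget.
options = [
    ((BUDGET - meals * MEAL_PRICE) // ALBUM_PRICE, meals)
    for meals in range(BUDGET // MEAL_PRICE + 1)
]
print(options)  # [(10, 0), (8, 1), (6, 2), (4, 3), (2, 4), (0, 5)]

bliss_point = (6, 2)         # Jamie's preferred mix: six albums, two meals out
print(bliss_point in options, 6 * ALBUM_PRICE + 2 * MEAL_PRICE)
```

All six options cost the same $100; the bliss point is simply the one Jamie values most, which is a fact about his preferences rather than something the budget alone determines.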
The Premack principle
The opportunity to perform a highly frequent behavior can reinforce a less frequent behavior. For example, Premack gave a group of rats free access to drinking water and a running wheel. On average, rats spent more time running than drinking. He then restricted their access to the wheel: they were allowed to run only after they had drunk a certain amount of water. They learned the response to outcome association and started drinking more water (R) in order to access the wheel (O). Total amount of running decreased and total amount of drinking increased. The activity of running was acting as a reinforcer, and it was increasing the probability of an infrequent behavior
Response deprivation hypothesis
Suggests that the critical variable is not which response is more frequent, but merely which response has been restricted. By restricting the ability to make any response, you can make the opportunity to perform that response more reinforcing. For example, if you have been studying for six hours straight, the idea of taking a break to clean your room (which is something you hate) may begin to look attractive
Cortical areas process ____ information and then send it as primary inputs to the ____, which sends messages to motor neurons in the muscles.
Sensory, motor cortex
Dorsal striatum
Part of the basal ganglia; further divided into the caudate nucleus and putamen. Receives stimulus information from sensory cortical areas and projects to the motor cortex.
Dorsal striatum is critical for what?
It is critical in operant conditioning for learning stimulus to response associations based on feedback. For example, rats with lesions of the DS can learn simple response to outcome associations but only in the absence of discriminative stimuli
Orbitofrontal cortex
One of the brain areas that appear to be important for predicting the outcomes of behavior. This area lies on the underside of the front of the brain and contributes to goal-directed behavior by representing predicted outcomes. It receives input from all sensory modalities. Monkeys with OFC lesions can learn to respond to a stimulus that leads to a reward over a stimulus that does not. But if conditions are reversed and the second stimulus is now the one being rewarded, lesioned monkeys are slower to adapt
Neurons in the orbitofrontal cortex fire differently depending on what?
Whether a reward or punishment is coming. Medial areas of the OFC process information about reinforcers, and lateral areas about punishers.
Ventral tegmental area (VTA) is known as the…?
Pleasure centre
In humans, pimozide suppresses cravings, whilst ____ increases cravings.
Dopamine
Dopamine also strengthens learning of stimulus to response associations during operant conditioning by generally promoting ____.
Synaptic plasticity
Opiates signal ____ in the brain.
Liking
Endogenous opioids?
Naturally occurring peptides with similar effects to opiates. Researchers believe drugs like heroin are so pleasurable because they activate the same brain receptors as endogenous opioids. Endogenous opioids are released in response to primary reinforcers
What is wanting signaled by?
Dopamine
What is liking signaled by?
Endogenous opioids
How does addiction differ from habit?
In degree
What do pathological addicts experience?
Extreme withdrawal symptoms, to the point where they neglect other parts of their life because nothing else is as important as the addictive substance.
Addiction involves not only seeking the high (positive reinforcement) but also avoiding the adverse effects of withdrawal (____), both of which reinforce drug taking.
Negative reinforcement
Many addictive drugs are ____, which target opiate receptors and thereby increase dopamine levels
Opiates
What does cocaine block?
Dopamine reuptake, so it stays in the synapse longer
Many addicts report that they no longer experience a high from cocaine and amphetamines. Why is that?
Their wanting system has disconnected from their liking system.
Behavioral addiction
Addictions to behaviors, rather than drugs, that produce reinforcements or highs, as well as cravings and withdrawal symptoms when the behavior is prevented. The most common example is gambling. Skinner suggested gambling is so addictive because it is reinforced on a variable ratio schedule.
Where does behavioral addiction reflect dysfunction in the brain?
The same place that is affected by drug addictions. For example gambling activates the brain similarly to how cocaine does.
In the US, most treatment plans include what type of therapy?
Cognitive therapy
Addiction is a strong stimulus to response to outcome association, with environmental stimuli triggering the addictive behavior, resulting in the reinforcing outcome of the high. Thus, treatment should reduce the ____ strength.
True
Distancing
Avoiding the stimuli that trigger the unwanted response
Delayed reinforcement
Imposing a fixed delay before giving in to the addiction. Delay between response and outcome weakens learning of the response to outcome association