Task 6 - Instrumental Conditioning Flashcards
Operant conditioning
process whereby organisms learn to make or refrain from making certain responses in order to obtain or avoid certain outcomes
example: Thorndike's puzzle box
Reinforcement
the process of providing an outcome for a behaviour that increases the probability of that behaviour
When deciding whether a paradigm is operant or classical:
- focus on the outcome
- if the outcome happens regardless of what the organism does –> classical
- if the outcome is contingent on the organism making a response –> operant
Free-operant paradigm
the animal can operate the apparatus freely, whenever it chooses (e.g., after Thorndike added a return ramp to his puzzle box)
Discrete trials paradigm
trials are initiated and controlled by the experimenter
Skinner box
Skinner devised a cage with a trough in one wall through which food could be delivered automatically
Cumulative recorder
a learning curve drawn by a pen that moves across a roll of paper at a steady rate and rises by a fixed amount for every response of an organism, such as a lever press by a rat in a Skinner box or a pigeon's peck at an illuminated plastic key – like a car's odometer, the record only accumulates, so a steeper slope means a faster response rate
Discriminative Stimuli
stimuli that signal whether a particular response will lead to a particular outcome
–> they help the learner discriminate or distinguish the conditions where a response will be followed by a particular outcome
Shaping
a training technique in which successive approximations to the desired response are reinforced – e.g., first rewarding a rat for approaching the lever, then for touching it, then for pressing it
Chaining (–> backward chaining)
technique in which organisms are gradually trained to execute sequences of discrete responses
- a technique related to shaping
- it is sometimes more effective to train the steps in reverse order (backward chaining)
Reinforcer
a consequence of behaviour that increases the likelihood of that behaviour in the future
Primary reinforcers
they are of biological value to the organism, and therefore organisms will tend to repeat behaviors that provide access to these things
- examples: food, water, sleep, a comfortable temperature, and sex
Drive reduction theory (Clark Hull)
proposed that all learning reflects the innate, biological need to obtain primary reinforcers
–> complication: primary reinforcers are not always reinforcing
Secondary reinforcers
reinforcers that initially have no biological value but have been paired with (or predict the arrival of) primary reinforcers; they can be as strongly encouraging as primary reinforcers
– Example: money
Token economies
often used in prisons, psychiatric hospitals, and other institutions where the staff has to motivate inmates or patients to behave well and to perform chores such as making beds or taking medications
- tokens function in the same way as money does in the outside world
- animals will also work for secondary reinforcers
negative contrast:
organisms given a less-preferred reinforcer in place of an expected and preferred reinforcer will respond less strongly for the less-preferred reinforcer than if they had received that less-preferred reinforcer all along
– e.g., a monkey throws away its cucumber slice (the less-preferred food) once it has seen grapes being handed out
Punishment
the process of providing outcomes for behaviour that decrease the probability of that behaviour – the response decreases
Punishers or negative outcomes
common punishers for animals include pain, confinement, and exposure to predators (or even the scent of predators)
Four factors that determine how effective punishment will be:
- punishment leads to more variable behaviour
- discriminative stimuli for punishment can encourage cheating
- concurrent reinforcement can undermine the punishment
- initial intensity matters
Differential reinforcement of alternative behaviors (DRA)
a procedure in which, rather than delivering punishment each time the unwanted behaviour is exhibited, preferred alternative behaviours are rewarded instead
Reinforcement schedules
the rules determining when outcomes are delivered in an experiment
Timing affects learning
Normally, immediate outcomes produce the fastest learning
Delays undermine a punishment's effectiveness and may weaken learning
Response-consequence delay
the longer one waits to deliver a punishment, the weaker the association formed between the response and its consequence
Self-control
an organism’s willingness to forego a small immediate reward in favor of a larger future reward
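Self-control failures are often modeled with hyperbolic discounting, in which a reward's subjective value falls with delay: V = A / (1 + kD). This is a standard model from the choice literature rather than a definition from these cards, and the amounts, delays, and discount rate below are made up for illustration:

```python
def discounted_value(amount, delay, k=0.5):
    """Hyperbolic discounting: subjective value of a reward of size
    `amount` delivered after `delay` time units, with discount rate k."""
    return amount / (1 + k * delay)

# A small immediate reward can outweigh a larger delayed one:
small_now = discounted_value(amount=2, delay=0)    # 2.0
large_later = discounted_value(amount=5, delay=4)  # 5 / 3, about 1.67
# small_now > large_later, so the organism forgoes the larger future reward.
```

Because the hyperbola falls steeply at short delays, preferences can reverse: far in advance the larger reward wins, but as the small reward becomes imminent it overtakes it.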
Positive (reinforcement)
positive does not mean good → instead it means added
Positive reinforcement
the desired response causes the reinforcer to be added to the environment
Positive punishment
an undesired response causes a punisher to be added to the environment
Negative reinforcement
behaviour is encouraged (reinforced) because it causes something to be subtracted from the environment – over time the response becomes more frequent
– sometimes called avoidance training
Negative punishment
something is subtracted (negative) from the environment, and this subtraction punishes the behavior
– sometimes called omission training
Negative (reinforcement)
negative does not mean bad, it means subtraction in a mathematical sense
Reinforcement / Punishment
Positive / Negative
(Definition)
The terms reinforcement and punishment describe whether the response increases (reinforcement) or decreases (punishment) as a result of training. The terms positive and negative describe whether the outcome is added (positive) or taken away (negative).
Partial reinforcement schedules
patterns in which an outcome follows a response less than 100 percent of the time
– Example: Becky has to clean her room seven days in a row to obtain her weekly allowance (seven responses for one reinforcement)
Four types of partial reinforcement:
- Fixed-ratio (FR) schedule
- Fixed-interval (FI) schedule
- Variable-ratio (VR) schedule
- Variable-interval (VI) schedule
- Fixed-ratio (FR) schedule
In operant conditioning, a reinforcement schedule in which a specific number of responses is required before a reinforcer is delivered; for example, FR 5 means that reinforcement arrives after every fifth response
Postreinforcement pause
In operant conditioning with a fixed-ratio (FR) schedule of reinforcement, a brief pause following the fast run of responding that leads to reinforcement.
The animal simply takes a break after each reinforcement; the larger the ratio requirement, the longer the pause
- Fixed-interval (FI) schedule
an FI schedule reinforces the first response after a fixed amount of time
- Variable-ratio (VR) schedule
a VR schedule provides reinforcement after a certain average number of responses
–> as a result, there is a steady, high rate of responding even immediately after a reinforcement is delivered, because the very next response just might result in another reinforcement
- Variable-interval (VI) schedule
a VI schedule reinforces the first response after an interval that averages a particular length of time – VI schedules tend to produce higher, steadier rates of responding than FI schedules
In general, variable schedules produce steadier responding than fixed schedules, because the organism cannot predict exactly when the next response will be reinforced
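The ratio schedules above can be sketched as simple rules deciding which responses earn a reinforcer. This is a toy illustration, not an experiment from the cards; the ratio values are arbitrary:

```python
import random

def fixed_ratio(n):
    """FR n: reinforce exactly every n-th response."""
    count = 0
    def respond():
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True   # reinforcer delivered
        return False
    return respond

def variable_ratio(n):
    """VR n: reinforce after a random number of responses averaging n
    (here drawn uniformly from 1 to 2n - 1, which has mean n)."""
    count = 0
    target = random.randint(1, 2 * n - 1)
    def respond():
        nonlocal count, target
        count += 1
        if count >= target:
            count = 0
            target = random.randint(1, 2 * n - 1)
            return True
        return False
    return respond

# An FR 5 schedule reinforces every fifth response, and nothing in between:
fr5 = fixed_ratio(5)
outcomes = [fr5() for _ in range(10)]
# outcomes == [False, False, False, False, True] * 2
```

Under the VR rule the very next response always might pay off, which is one way to see why VR schedules sustain the steady, high response rates described above.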
Concurrent reinforcement schedules
in which the organism can make any of several possible responses, each leading to a different outcome
– linked to behavioural economics –> how organisms use their time and resources
Matching law of choice behaviour
the principle that an organism, given a choice between multiple responses, will make a particular response at a rate proportional to how often that response is reinforced relative to the other choices
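The matching law is usually written as a proportion: B1 / (B1 + B2) = R1 / (R1 + R2), where B is the rate of each response and R is the rate of reinforcement it earns. A minimal sketch (the reinforcement rates are made-up numbers):

```python
def predicted_response_share(r1, r2):
    """Share of total responding allocated to option 1 under strict
    matching, given reinforcement rates r1 and r2 for the two options."""
    return r1 / (r1 + r2)

# If key 1 delivers reinforcement three times as often as key 2,
# matching predicts 75% of the pecks go to key 1:
share = predicted_response_share(r1=30, r2=10)
# share == 0.75
```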
Behavioural economics
the study of how organisms allocate their time and resources among possible options
– Economic theory predicts that each consumer will allocate resources in a way that maximizes her "subjective value," or relative satisfaction (in microeconomics, the word utility is used instead of subjective value). The value is subjective because it differs from person to person
– example: a pigeon could either get one reinforcer after 1 minute or two pellets after 2 minutes
Bliss point
the particular allocation of resources that provides maximal subjective value to an individual
- Changes depending on context
Premack principle
The theory that the opportunity to perform a highly frequent behavior can reinforce a less frequent behavior; later refined as the response deprivation hypothesis.
- Example: if you have been studying for several hours straight, the idea of “taking a break” to clean your room or do the laundry can begin to look downright attractive
- e.g., rats will work for the opportunity to run on their wheel
Response deprivation hypothesis
a refinement of the Premack principle stating that the opportunity to perform any behaviour can be reinforcing if access to that behaviour is restricted → want something because you can’t have it
Basal ganglia
a collection of ganglia (clusters of neurons) through which information from the sensory cortex can also travel to the motor cortex by an indirect route
One part of the basal ganglia is the dorsal striatum, which can be further subdivided into the caudate nucleus and the putamen
dorsal striatum
receives highly processed stimulus information from sensory cortical areas and projects to the motor cortex, which produces a behavioral response
– Plays a critical role in operant conditioning, particularly if discriminative stimuli are involved
Rats with lesions of the dorsal striatum can still learn simple operant responses (e.g., when placed in a Skinner box, lever-press R to obtain food O). But if discriminative stimuli are added (e.g., lever-press R is reinforced only in the presence of a light SD), the lesioned rats are markedly impaired – similar to people whose striatum is disrupted by Parkinson's disease or Huntington's disease
→ the dorsal striatum appears necessary for learning SD → R associations based on feedback about reinforcement and punishment
Orbitofrontal cortex
appears to contribute to goal-directed behavior by representing predicted outcomes
- receives inputs conveying the full range of sensory modalities (sight, touch, sound, etc.) and also visceral sensations (including hunger and thirst), allowing this brain area to integrate many types of information;
- outputs from the orbitofrontal cortex travel to the striatum, where they can help determine which motor responses are executed
Putting the pathway together: sensory cortex (stimulus) → orbitofrontal cortex (predicted outcome) → dorsal striatum (SD → R association and motor learning) → motor cortex (response)
wanting and liking in the brain
later studies identified several brain areas, including the ventral tegmental area (VTA), for whose electrical stimulation rats would work
Ventral tegmental area (VTA)
a small region in the midbrain of rats, humans, and other mammals whose neurons produce dopamine (associated with "wanting"); electrical stimulation of the VTA can have the same effect as a reinforcer
“pleasure centers”
some researchers inferred that the rats “liked” the stimulation, and the VTA and other areas of the brain where electrical stimulation was effective became informally known as “pleasure centers.”
Anhedonia hypothesis
the hypothesis that dopamine mediates pleasure, treating "wanting" and "liking" as the same thing – the incentive salience hypothesis argues against this, holding that dopamine signals "wanting" rather than "liking"
Hedonic value
the subjective “goodness” of a reinforcer, or how much we like it
Motivational value
meaning how much we “want” a reinforcer and how hard we are willing to work to obtain it
Incentive salience hypothesis
The hypothesis that dopamine helps provide organisms with the motivation to work for reinforcement – states that the role of dopamine in operant conditioning is to signal how much the animal “wants” a particular outcome—how motivated it is to work for it
Endogenous opioids
brain chemicals that are naturally occurring neurotransmitter-like substances (peptides) with many of the same effects as opiate drugs
how do “wanting” and “liking” interact
A possible way the two brain systems (for liking and wanting) interact: differences in the amount of endogenous opioid released, and in the specific opiate receptors activated, may help determine an organism's preference for one reinforcer over another
Pathological addiction
a strong habit that is maintained despite harmful consequences
Addiction may involve not only seeking the "high" but also avoiding the adverse effects of withdrawal from the drug. In a sense, the high provides positive reinforcement and the avoidance of withdrawal symptoms provides negative reinforcement – and both processes reinforce the drug-taking response
Behavioural addictions
are addictions to behaviour, rather than drugs, that produce reinforcements or highs, as well as cravings and withdrawal symptoms when the behaviour is prevented
– Perhaps the most widely agreed-upon example of a behavioral addiction is compulsive gambling
Detoxification
clearing the drug from the body; cravings are sometimes managed by substituting a less harmful alternative – like drinking alcohol-free beer
Extinction
if response R stops producing outcome O, the frequency of R should decline
Distancing
avoiding the stimuli that trigger the unwanted response
Differential reinforcement of alternate behaviours (DRA)
e.g., reward yourself (say, with a spa day) if you didn't use the drug, or punish yourself if you did
Delayed reinforcement
whenever the smoker gets the urge to light up, she can impose a fixed delay (e.g., an hour) before giving in to it
most effective treatments
combine cognitive therapy (including counseling and support groups) with behavioral therapy based on conditioning principles—and medication for the most extreme cases
Protestant ethic effect
the finding that organisms will often work for a reinforcer even when the same reinforcer is freely available – e.g., rats will press a lever for food even when identical food sits freely in a dish
Reward prediction hypothesis
the hypothesis that the phasic activity of dopaminergic neurons in the midbrain signals a discrepancy between the predicted and the currently experienced reward of a particular event (a reward prediction error)
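The discrepancy signal can be summarized as delta = actual reward minus predicted reward, with the prediction updated a little on every trial. A minimal sketch in the style of error-driven learning models (the learning rate and reward values are made up):

```python
def update_prediction(v, reward, alpha=0.1):
    """One trial of prediction learning: the prediction v moves toward
    the obtained reward by alpha * delta, where delta = reward - v.
    delta > 0: better than expected (modeled as a dopamine burst);
    delta < 0: worse than expected (modeled as a dopamine dip)."""
    delta = reward - v
    return v + alpha * delta, delta

v = 0.0                      # initially the reward is unpredicted
for _ in range(50):
    v, delta = update_prediction(v, reward=1.0)
# As v approaches 1.0, delta shrinks toward 0: once the reward is fully
# predicted, the model's dopamine signal at reward delivery disappears.
```

This matches the card's definition: the signal tracks the discrepancy between predicted and experienced reward, not the reward itself.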