Ch. 4: Reinforcement Flashcards
Define Operant, Learning & Operant Learning
Operant: A class of behaviour that operates on the environment to produce a common environmental consequence.
Learning: A change in behaviour due to experience
Operant Learning: A change in a class of behaviour as a function of the consequences that followed it.
Learning = conditioning
5 ways reinforcing consequences strengthen behaviour?
Increase frequency, duration, intensity, quickness, variability
Two Ways of Reinforcing
- Add a stimulus = Positive Reinforcement
- Remove a stimulus = Negative Reinforcement
Reward vs. Reinforcer
Reward does not equate to reinforcer
Ex. You give a dog a treat for rolling over. If you later ask the dog to roll over and he doesn’t, the reward didn’t function as a reinforcer.
Plateau
Maximum amount of behaviour that can conceivably be emitted. The probability of a behaviour can never exceed one.
Is Reinforcement a theory?
No, reinforcement is not a theory. It is a functional description. It is not circular.
What are the two types of reinforcers?
- Unconditional (Primary) Reinforcer
- Conditional (Secondary) Reinforcer
Unconditional (primary) reinforcer
A motivating stimulus that does not need to be learned, such as food, water, warmth, oxygen, shelter, etc.
-Depends on some amount of deprivation
-Often species-specific
Conditional (Secondary) Reinforcer
Stimuli, objects, or events that become reinforcing based on their association with a primary reinforcer.
Ex. A dog isn’t born wanting to sit on cue, but when sitting is paired with primary reinforcers such as treats or social interaction, it becomes a secondary reinforcer.
Liberman et al. (1973)
Studied institutionalized patients with schizophrenia. Staff reinforced “rational talk” through positive interactions with patients and did not reinforce “irrational talk”, responding to it with negative interactions instead.
Conditional Reinforcement
Conditional reinforcement is when something becomes rewarding because it’s linked to a real reward.
For example, if you give a dog a treat every time you click a button, the dog will start to like the sound of the click because it knows a treat is coming. The click becomes a conditional reinforcer.
Contingency
The degree of correlation between a behaviour and its consequence.
Reinforcement variables: Contiguity
Nearness of events in time (temporal contiguity) or space (spatial contiguity).
High contiguity is often referred to as “pairing”.
Less contiguity (a longer delay) between the operant response and the reinforcer diminishes the effectiveness of the reinforcer.
Reinforcement variables: Temporal vs spatial contiguity
Temporal contiguity means that two things happen close together in time. In learning, it refers to how closely in time a behavior and its consequence (like a reward or punishment) are linked. The closer they happen, the stronger the connection the brain makes between them. For example, if you give a dog a treat right after it sits, it’s more likely to connect sitting with getting the treat because the two events are closely timed.
Spatial contiguity means that two things happen close together in space. In learning, it refers to how physically close things are when you’re trying to learn something. For example, if words and pictures are shown next to each other on a page, they are easier to learn because the brain connects them more easily when they’re near each other.
Hyperbolic decay function
Describes how something (like a reward or value) becomes less important or less impactful as time passes, but not in a straight line. Instead, it drops quickly at first, then slows down over time.
In simple terms, it’s like saying, “The longer you wait for something, the less you care about it, but that drop in how much you care happens fast at first and then slows down later.”
For example, if you’re waiting for a reward, you might be really excited at first, but the longer you wait, the less excited you get, though your excitement doesn’t disappear completely.
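The "drops fast, then slows down" shape can be sketched numerically. A commonly used form of the hyperbolic function is V = A / (1 + k*D), where A is the reinforcer's immediate value, D is the delay, and k is a sensitivity parameter. The card does not state a formula, so this form, the function name, and the numbers below are assumptions for illustration only.

```python
def hyperbolic_value(amount, delay, k=1.0):
    """Discounted value of a reinforcer delivered after `delay` time units.

    Assumed form: V = A / (1 + k * D). `k` is an illustrative
    sensitivity parameter, not a value taken from the flashcards.
    """
    if delay < 0:
        raise ValueError("delay must be non-negative")
    return amount / (1 + k * delay)


# Value of a reinforcer worth 100 "units" at increasing delays.
delays = [0, 1, 2, 4, 8]
values = [hyperbolic_value(100, d) for d in delays]

# Loss in value between successive delays: large at first, then smaller,
# and the value never reaches zero.
drops = [values[i] - values[i + 1] for i in range(len(values) - 1)]
```

With these assumed numbers, the first unit of delay removes half of the value, while doubling the delay from 4 to 8 removes far less.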
Reinforcer Characteristics: Reinforcer Magnitude
Generally, larger reinforcers are more reinforcing than smaller reinforcers.
-Relation between size and effectiveness is not linear.
-Generally, the more you increase magnitude, the less benefit you get from the increase.
-Effectiveness of unconditional reinforcers tends to diminish quickly.
Reinforcer characteristics: specific reinforcer
Ex: chocolate is yummier than sunflower seeds
Reinforcer Characteristics: Task characteristics
In behavior modification, “task characteristics” as part of reinforcer characteristics refer to how the specific qualities of a task influence how effective a reward (reinforcer) will be. For example, how difficult, complex, or interesting the task is can affect how well a reward works to encourage someone to do it. Tasks that are easy and clear may need smaller reinforcers, while challenging or boring tasks might need stronger rewards to keep someone motivated.
Ex: getting a pigeon to peck for food vs. getting a hawk to peck for food
Reinforcer Characteristics: Motivating operations
Establishing: increases effectiveness
Ex: deprivation
Abolishing: decreases effectiveness
Ex: satiation
Reinforcement Characteristics: Competing contingencies
When there are two or more possible outcomes or consequences for a behavior, and you have to choose between them. Each outcome might have different rewards or punishments, and you weigh which one is more important to you.
CHOICE: Allocation of time among two or more activities.
Ex: Should I watch YouTube or study?
Premack Principle
High-probability behaviour reinforces low-probability behaviour.
The idea that a more enjoyable activity can be used as a reward for doing a less enjoyable activity. In simple terms, it means “If you do something you don’t like first, you get to do something you really enjoy afterward.”
For example, if a child doesn’t like doing homework but loves playing video games, you can say, “First finish your homework, then you can play video games.” The fun activity (video games) motivates them to do the less fun activity (homework).
Problems with Premack Principle
-Doesn’t account for conditional reinforcement effects.
-Low probability behaviour can reinforce high probability behaviour when the organism has been deprived of the low probability behaviour
Schedule of Reinforcement
A rule that describes the delivery of reinforcement. Different schedules produce unique, predictable schedule effects, and these effects occur in numerous species.
Cumulative Record
A plot of cumulative responses (y-axis) over time (x-axis).
It is a visual way of showing how often a behavior happens over time. Imagine a graph where the line goes up every time the behavior is done. The steeper the line, the more the behavior is happening. If the line is flat, it means the behavior isn’t happening at all.
Frequency vs. Cumulative Frequency
-Frequency= How many times something happens in a single period.
-Cumulative frequency= The total number, adding up each period’s frequency.
Frequency is the number of times something happens in a specific time or group. For example, if you count how many times you eat pizza in a week, that number is the frequency.
Cumulative frequency is the running total of how often something happens, adding up each time. For example, if you track how many times you eat pizza each week for a month, and add up each week’s total as you go, that’s cumulative frequency.
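The difference between the two counts can be shown with a short sketch. The weekly pizza counts are made-up numbers used only for illustration.

```python
from itertools import accumulate

# Hypothetical frequency: pizzas eaten in each of four weeks.
weekly_frequency = [2, 0, 3, 1]

# Cumulative frequency: the running total after each week.
cumulative_frequency = list(accumulate(weekly_frequency))
```

Here the running total is 2, 2, 5, 6. It stays flat in week 2 because nothing happened that week, which is the same reason a cumulative record goes flat when the behaviour stops.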
Schedules of Reinforcement: Continuous Reinforcement (CRF) Schedule
-Behaviour is reinforced each time it occurs
-Rate of behaviour increases rapidly (good for new behaviours)
-Rare in the natural environment
Ex. A child is praised every time they clean
Intermittent Reinforcement Schedule
When a reward or reinforcement is given only sometimes after a behavior, not every time.
Ex: gambling
4 main types:
-Fixed-ratio (FR)
-Variable-ratio (VR)
-Fixed-Interval (FI)
-Variable-Interval (VI)
Intermittent reinforcement: Fixed-Ratio Schedule
Behaviour is reinforced after a fixed number of responses.
Important ex: FR-120… pigeon has to peck 120 times for reinforcement
Generates a post-reinforcement pause (PRP), which typically increases with ratio size and reinforcer magnitude.
Generates steady run rates following the PRP.
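The FR rule itself is just a counter, which a minimal sketch can make concrete. The function names are hypothetical, and the sketch models only which responses are reinforced, not the pause or run rate.

```python
def fixed_ratio(ratio):
    """Return a function to call once per response; it reports whether
    that response is reinforced under an FR-`ratio` schedule."""
    count = 0

    def respond():
        nonlocal count
        count += 1
        if count == ratio:
            count = 0  # requirement met: start counting the next run
            return True
        return False

    return respond


# FR-120: every 120th peck is reinforced.
peck = fixed_ratio(120)
outcomes = [peck() for _ in range(360)]
reinforced_at = [i + 1 for i, hit in enumerate(outcomes) if hit]
```

In this run of 360 pecks, reinforcement follows pecks 120, 240, and 360.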
Intermittent Reinforcement: Variable-Ratio Schedule
-Ratio-requirement varies around an average.
Ex: VR-360 if there were 739 responses and the mean was 360
-Works with a shuffled ordering of ratio requirements
Ex. The requirements are randomly ordered from one reinforcer to the next, so the pigeon may have to peck 30 times first to get food, then 720, then 180…
-Fewer post-reinforcement pauses; they are rare and short.
-Produces higher rates
-Common in natural environments
Has two common variations: random ratio and progressive ratio
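A shuffled VR schedule can be sketched as a list of ratio requirements that averages to the schedule value and is presented in random order. The values 30, 720, and 180 come from the example above; 510 is an added assumption so that the mean works out to 360, and the function name is hypothetical.

```python
import random


def shuffled_requirements(requirements, seed=None):
    """Return the ratio requirements in a randomly shuffled order."""
    rng = random.Random(seed)
    order = list(requirements)  # copy so the input list is left unchanged
    rng.shuffle(order)
    return order


# Assumed requirement list chosen so that the average is 360 (VR-360).
requirements = [30, 180, 510, 720]
mean_requirement = sum(requirements) / len(requirements)

# One possible order in which the pigeon meets the requirements.
order = shuffled_requirements(requirements, seed=1)
```

The organism cannot tell which requirement is in effect, only that reinforcement arrives after 360 responses on average.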
Intermittent Reinforcement: Variable-Ratio Schedule (Random-Ratio)
-The schedule is controlled by a random number generator
-Produces similarly high rates of responding
-Type of ratio used in casino games & video games
Intermittent Reinforcement: Variable-Ratio Schedule (Progressive ratio)
-ratio requirements move from small to large
-Ex. 1, 2, 3, 4, 5, 6, 7, 8, 9…
-Post reinforcement pauses (PRPs) increase with ratio size
-Creates a “break-point” measure of how hard organism will work.
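The break-point measure can be sketched as the last ratio requirement an organism completes before it stops meeting the next one. The function name, the step size of 1, and the session data are assumptions for illustration.

```python
from itertools import count


def break_point(responses_emitted, start=1, step=1):
    """Return the last ratio requirement completed on a progressive-ratio
    schedule whose requirements are start, start + step, ...

    `responses_emitted[i]` is how many responses the organism made while
    the i-th requirement was in effect.
    """
    last_completed = 0
    for requirement, emitted in zip(count(start, step), responses_emitted):
        if emitted < requirement:
            break  # the organism quit before meeting this requirement
        last_completed = requirement
    return last_completed


# Hypothetical session: requirements 1..5 are met, then the organism
# gives up after 2 responses when 6 are required.
session = [1, 2, 3, 4, 5, 2]
bp = break_point(session)
```

For this made-up session the break point is 5, the largest requirement the organism was willing to complete.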
Intermittent Reinforcement: Fixed-Interval Schedule
-Behaviour is reinforced the first time it occurs after a given period of time has elapsed.
Ex: FI-4 minutes… reinforcement is delivered for the first response after 4 minutes
-Produces a post-reinforcement pause
-Responding increases gradually, producing a “scallop” shape
-Uncommon in the natural environment
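The FI rule can be sketched as a timer that makes reinforcement available once the interval has elapsed, after which the next response collects it. This sketch assumes the interval is timed from the previous reinforcer (or from the session start at time 0); the function name and response times are hypothetical.

```python
def fixed_interval(interval):
    """Return a function taking a response time (in seconds) that reports
    whether that response is reinforced under an FI schedule."""
    last_reinforcer = 0.0  # assumption: the session starts at t = 0

    def respond(t):
        nonlocal last_reinforcer
        if t - last_reinforcer >= interval:
            last_reinforcer = t  # the next interval starts now
            return True
        return False  # responses during the interval earn nothing

    return respond


# FI-4 minutes, expressed in seconds.
respond = fixed_interval(4 * 60)
response_times = [60, 200, 239, 245, 300, 480, 490]
reinforced = [t for t in response_times if respond(t)]
```

Only the responses at 245 s and 490 s are reinforced. The early responses are wasted effort, which fits the pause-then-accelerate "scallop" pattern described above.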
Intermittent Reinforcement: Variable-Interval Schedule
-Interval varies around an average.
Ex: VI-3 min intervals (in seconds)
-PRPs are rare and short
-Steady rates of responding, though not as high as on a VR schedule
-common in natural environments
Fixed/Variable Duration
-Reinforcer is contingent on continuous performance for a period of time.
Ex: practicing guitar for 30 min
-Many people use these schedules but provide no reinforcer.