Module 2 Flashcards
What is non-associative learning?
- a process in which an organism’s behavior toward a specific stimulus changes over time in the absence of any evident link to (association with) consequences (types: habituation, sensitization, dishabituation)
- e.g., habituation: responding gradually decreases the more often the stimulus is presented
What is associative learning?
behavioral change that accompanies the presentation of two or more stimuli at the same point in time or space
– classical (Pavlovian) conditioning and instrumental conditioning
What is instrumental learning?
a type of learning in which behaviors are strengthened or weakened by their consequences
What are the three key elements of instrumental learning?
- The environment
- The instrumental behaviour
- The consequence
What factors can influence instrumental learning?
- Timing of the reward delivery
- Rules of reward delivery
- Type of rewards
- Other stimuli associated with rewards
Who is Thorndike and what did he study/conclude?
- Edward Thorndike was an American learning theorist:
- Devised the puzzle box to study learning
- Cat learned to press the lever to escape over many trials
- Concluded that a connection is formed between the lever (S) and the response (R) –> S-R learning –> law of effect
- Learning is incremental, not insightful (insightful learning is primarily the case for higher learning)
What is stimulus-response learning?
- S-R learning focuses on the direct association between stimuli and responses. It posits that learning occurs when a specific stimulus consistently elicits a specific response.
- The learner forms a connection between a stimulus (S) and a response (R) without considering the consequences of the response
What is the law of effect?
- This principle emphasizes the role of consequences in learning.
- If a response in the presence of a stimulus is followed by a satisfying event, the association between the stimulus and the response is strengthened; if a response in the presence of a stimulus is followed by an annoying event, the association is weakened
- Argues for connectionism in learning.
- It suggests that the strength of a stimulus-response association is influenced by the satisfaction (or dissatisfaction) that follows the response.
What is the limitation of S-R learning?
Generalization: The example of monkeys transferring tool manipulation between hands illustrates a form of generalization. In this case, the monkeys learn to apply their skills with tools in a flexible way across different contexts, which suggests that their learning involves more than just a simple stimulus-response association.
Who is B.F. Skinner and what did he study?
B. F. Skinner studied learning from a pure behaviourist perspective
* Designed many innovative tools to study learning
* Coined the term “Operant”: Operates on the environment
Trained rats to run down a runway to a goal box containing a reward –> tested whether rats that learn they are rewarded for reaching the goal box run faster. An increase in performance (running speed) would indicate learning
What is the operant chamber/skinner box?
Operant chamber also called “Skinner box”
Operant chamber and cumulative recorder
* In the box there are objects (e.g. lever) the rats can interact with
* Tones and light for training stimuli.
* Shaping: teaching rat to hit lever for sugar reward
* Cumulative recorder: allows continuous recording of free, ongoing instrumental behaviour –> a roll of paper runs continuously at a constant speed, so distance along the paper corresponds to time. With every response performed by the rat, the pen moves one step in the same direction until it resets. Slope = number of responses per unit of time –> rate of response
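The idea that the slope of a cumulative record equals the response rate can be sketched in a few lines of Python. This is only an illustration: the function names and the lever-press times are made up for the example and do not come from any real recording.

```python
from bisect import bisect_right


def cumulative_record(response_times):
    """Return (time, cumulative count) points, one per response."""
    ordered = sorted(response_times)
    return [(t, i + 1) for i, t in enumerate(ordered)]


def response_rate(response_times, start, end):
    """Slope of the cumulative record between start and end
    (responses per unit of time)."""
    if end <= start:
        raise ValueError("end must be later than start")
    ordered = sorted(response_times)
    # Count responses with start < t <= end.
    count = bisect_right(ordered, end) - bisect_right(ordered, start)
    return count / (end - start)


# Hypothetical lever-press times in seconds (illustration data only).
presses = [1.0, 2.0, 3.0, 4.0, 30.0, 60.0]

print(cumulative_record(presses))
print(response_rate(presses, 0, 10))   # steep slope: many presses early on
print(response_rate(presses, 10, 60))  # shallow slope: few presses later
```

A steeper stretch of the record means faster responding; a flat stretch means the animal has stopped responding.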
What is reinforcement/punishment/positive/negative?
Reinforcement: behaviour increases when it produces an appetitive stimulus
Punishment: behaviour decreases when it produces an aversive stimulus
* Stimulus here is the consequence (e.g. foot shock, sugar pellet) (reinforcer vs punisher =whether behaviour increases or decreases)
Positive and negative refer to “contingency” (presence or absence)
* Positive: action leads to presentation of a stimulus
* Negative: action leads to removal of a stimulus
e.g., negative reinforcement –> something bad is taken away
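The two distinctions above (positive vs negative, reinforcement vs punishment) combine into four cases. A minimal sketch of that 2x2 classification, with a hypothetical function name and hypothetical string labels chosen just for this example:

```python
def classify(stimulus_change, behaviour_change):
    """Name the procedure from the contingency and the effect on behaviour.

    stimulus_change:  "presented" or "removed" (what the action does to the stimulus)
    behaviour_change: "increases" or "decreases" (what happens to the behaviour)
    """
    if stimulus_change not in ("presented", "removed"):
        raise ValueError("stimulus_change must be 'presented' or 'removed'")
    if behaviour_change not in ("increases", "decreases"):
        raise ValueError("behaviour_change must be 'increases' or 'decreases'")
    # Positive/negative describes the contingency.
    contingency = "positive" if stimulus_change == "presented" else "negative"
    # Reinforcement/punishment describes the effect on behaviour.
    effect = "reinforcement" if behaviour_change == "increases" else "punishment"
    return f"{contingency} {effect}"


print(classify("presented", "increases"))  # e.g. lever press produces a sugar pellet
print(classify("removed", "increases"))    # e.g. lever press turns off a foot shock
print(classify("presented", "decreases"))  # e.g. lever press produces a foot shock
print(classify("removed", "decreases"))    # e.g. lever press takes the sugar away
```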
What is the difference between reward and reinforcer?
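Reward vs reinforcer: a reward is defined by its attractive and motivational properties; a reinforcer is defined by its effect as a behaviour facilitator (it increases the behaviour it follows)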
What is a continuous reinforcement schedule (CRF)?
every response leads to reinforcer delivery
What is a partial reinforcement schedule (PRF)?
Partial reinforcement schedule (PRF): only some responses lead to reinforcer delivery
* Ratio schedule: reinforcers are delivered based on the number of times a response occurs (e.g. FR10= animal gets reward every 10 lever presses)
* Interval schedule: reinforcers are delivered for the first response that occurs after a certain amount of time has elapsed
* Fixed vs variable:
- Fixed: the number of responses required, or the time that has to elapse, is certain
- Variable: the overall average is known, but the number/time for each reinforcer delivery is uncertain.
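The four partial schedules can be sketched as small Python classes that decide whether a given response earns a reinforcer. This is a rough illustration under stated assumptions: the class names, the once-per-second responder, and the uniform distributions used for the variable schedules are all choices made for this example, not part of the definitions above.

```python
import random


class FixedRatio:
    """FR-n: every n-th response is reinforced."""
    def __init__(self, n):
        self.n = n
        self.count = 0

    def respond(self, t):
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """VR-n: the required number of responses varies around a mean of n."""
    def __init__(self, mean_n, rng):
        self.mean_n = mean_n
        self.rng = rng
        self.count = 0
        self.required = self._draw()

    def _draw(self):
        # Assumption: uniform over 1..(2n - 1), which averages to n.
        return self.rng.randint(1, 2 * self.mean_n - 1)

    def respond(self, t):
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._draw()
            return True
        return False


class FixedInterval:
    """FI-s: the first response at least s seconds after the last
    reinforcer (or session start) is reinforced."""
    def __init__(self, seconds):
        self.seconds = seconds
        self.available_at = seconds

    def respond(self, t):
        if t >= self.available_at:
            self.available_at = t + self.seconds
            return True
        return False


class VariableInterval:
    """VI-s: like FI, but the wait varies around a mean of s seconds."""
    def __init__(self, mean_seconds, rng):
        self.mean_seconds = mean_seconds
        self.rng = rng
        self.available_at = self._draw()

    def _draw(self):
        # Assumption: uniform over 0..2s, which averages to s.
        return self.rng.uniform(0, 2 * self.mean_seconds)

    def respond(self, t):
        if t >= self.available_at:
            self.available_at = t + self._draw()
            return True
        return False


def count_reinforcers(schedule, n_seconds):
    """Hypothetical animal that responds once per second for n_seconds."""
    return sum(schedule.respond(t) for t in range(1, n_seconds + 1))


rng = random.Random(0)  # seeded so the variable schedules are repeatable
print("FR10:", count_reinforcers(FixedRatio(10), 100))
print("VR10:", count_reinforcers(VariableRatio(10, rng), 100))
print("FI10:", count_reinforcers(FixedInterval(10), 100))
print("VI10:", count_reinforcers(VariableInterval(10, rng), 100))
```

On ratio schedules, responding faster earns reinforcers sooner; on interval schedules, extra responses before the time has elapsed earn nothing.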
How do the different schedules of reinforcement produce stereotypical response patterns?
VR: steady and robust responding
FR: post-reinforcement pause + ratio run
VI: steady and stable responding
FI: fixed-interval “scallop”. After reinforcement, animals stop responding. As the time approaches when they predict the next reinforcer will arrive, they start to increase responding
– Typically, VR leads to the strongest responding because it’s unpredictable: to get the most rewards, animals need to respond a lot
What is extinction?
A conditioned response diminishes due to lack of reinforcement
* Is a learning process itself – actions no longer produce rewards (extinction is not about unlearning; it’s new learning that suppresses old learning)
* Adaptive: saves energy by reducing unnecessary behaviour
* Extinction rate is affected by previous reinforcement schedules
– Slot machines operate on VR schedules: hardest to extinguish!
What are primary vs secondary reinforcers?
Primary reinforcers: often biologically essential like food, water, sex etc.
Secondary reinforcers: stimuli that were previously paired with a primary reinforcer and have become reinforcing in their own right, also known as conditioned reinforcers.
* e.g., sound of the lever, clicker, vouchers
What are the four different functions generally ascribed to secondary (conditioned) reinforcers?
- Reinforcing the learning of new responses
- Establishing and maintaining schedules of reinforcement
- Maintaining behaviour during extinction
- Mediating delays between response and delivery of reinforcement
What is temporal contiguity and what was the study by Grice?
- Temporal contiguity: how soon the reinforcer follows the response
- Immediate reinforcer delivery leads to maximum learning (experimentally determined as ~0.5s)
Grice (1948) removed all secondary reinforcers, whereas Wolfe (1934) kept a secondary reinforcer –> What’s the relationship between learning and delay?
- Adding a delay to reinforcer delivery discounts the reinforcing effect
- Delay of up to 10 seconds –> there’s virtually no learning if the reinforcer is delivered 10 sec after the response
- Secondary reinforcer (tone kept in): learning somewhat protected by presence of secondary reinforcer
» In one condition, the primary reinforcer (food) was delayed after the behavior (like pressing a lever).
» In another condition, the primary reinforcer was also delayed, but a secondary reinforcer (the tone) was still present.
» When the tone (secondary reinforcer) was kept in the experiment, learning was somewhat protected from the negative effects of the delay. Even if the food (primary reinforcer) was delayed, the tone provided a cue that helped maintain the connection between the behavior and the expected reward.
How are superstitious behaviours accidental reinforcement learning?
- When a random behaviour is accidentally reinforced by an appetitive outcome that comes in close temporal contiguity
- Highlights the natural tendency to draw causal links between events
- Pigeons were reinforced on a fixed-time schedule (a reward is delivered after a certain amount of time, not contingent on any action) –> over half of the pigeons developed stereotypical behaviour (as if performing this behaviour would earn them the reward)
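The way a fixed-time schedule can accidentally strengthen a random behaviour can be sketched as a toy simulation. Everything here is an assumption made for illustration: the behaviour names, the "strength" numbers, and the rule that a delivery adds a fixed boost to whatever was just emitted. It is not a model taken from the pigeon study.

```python
import random


def simulate_fixed_time(behaviours, interval, duration, rng, boost=1.0):
    """Toy fixed-time schedule: food arrives every `interval` steps no matter
    what the animal does, and the behaviour emitted at that moment is
    strengthened (hypothetical update rule)."""
    strength = {b: 1.0 for b in behaviours}
    deliveries = 0
    for t in range(1, duration + 1):
        # Emit one behaviour per step, in proportion to its current strength.
        weights = [strength[b] for b in behaviours]
        emitted = rng.choices(behaviours, weights=weights)[0]
        if t % interval == 0:
            # Delivery depends only on the clock, not on the behaviour...
            deliveries += 1
            # ...but temporal contiguity strengthens whatever just happened.
            strength[emitted] += boost
    return strength, deliveries


rng = random.Random(42)  # seeded so the run is repeatable
strength, deliveries = simulate_fixed_time(
    ["peck", "turn", "head bob"], interval=15, duration=300, rng=rng
)
print(deliveries, strength)
```

Because a strengthened behaviour is more likely to be emitted at the next delivery, an early accidental pairing can snowball into one dominant, stereotyped behaviour.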
Definition of rewards as reinforcers:
Rewards are often called reinforcers because a response followed by a reward strengthens the association between certain environmental conditions (stimuli) and the response.
* Response theories
* Motivational theories
Definition of rewards as incentives:
the anticipation or expectancy of reward arouses incentive motivation
What was Skinner’s definition of a reinforcer and why is it practical?
- Skinner’s definition of reinforcers focuses on the functional aspect: any stimulus following a response that increases the probability of that response’s recurring is a reinforcer.
- This approach has considerable practical utility because it is often difficult to determine what will be a good reinforcer for a given person in a given situation.
What is Premack’s principle?
Core argument: probability of a response could be low or high
Premack (1959): responses occur with different probabilities. What is reinforcing is relative, not absolute, and depends on the probabilities of the responses.
* Rats deprived of either water or access to a running wheel: the deprived activity can be used to reinforce the other.
* Found the same pattern in children: eating candy and playing pinball can reinforce each other depending on which the child naturally prefers. For children who prefer pinball, playing pinball can be used as the reinforcer for eating candy (they have to eat candy in order to play pinball –> increases the probability of eating candy). Whatever is being deprived, they will perform an action to get access to it (e.g., eating becomes a highly probable behaviour under food deprivation, so deprivation turns eating into a reinforcing response; making the food really good [a larger reward] also increases the probability of eating).
Premack’s principle: if two responses are arranged in an operant conditioning procedure, the more probable response will reinforce the less probable response; the less probable response will not reinforce the more probable response.
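Premack’s principle reduces to a comparison of baseline response probabilities, which can be sketched as below. The function name and the probability values are hypothetical, chosen only to mirror the candy and pinball example.

```python
def can_reinforce(baseline, reinforcing, instrumental):
    """Premack's principle: access to `reinforcing` can reinforce
    `instrumental` only if it is the more probable response at baseline."""
    return baseline[reinforcing] > baseline[instrumental]


# Hypothetical baseline probabilities for two children (illustration only).
prefers_pinball = {"pinball": 0.7, "candy": 0.3}
prefers_candy = {"pinball": 0.2, "candy": 0.8}

# For the child who prefers pinball, pinball can reinforce eating candy...
print(can_reinforce(prefers_pinball, "pinball", "candy"))
# ...but candy cannot reinforce playing pinball.
print(can_reinforce(prefers_pinball, "candy", "pinball"))
# For the child who prefers candy, the relationship is reversed.
print(can_reinforce(prefers_candy, "candy", "pinball"))
```

Deprivation fits the same picture: it raises the baseline probability of the deprived activity, which can flip which response acts as the reinforcer.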
What does response theory argue?
Response theory argues that responses have different probabilities and that you can manipulate those probabilities to achieve a learning outcome