Week 4: Learning Flashcards
Reinforcement learning (RL)
Learning by trial-and-error; a reinforcer is anything that increases the likelihood that a response will occur
Reinforcement learning (goals)
Maximize the occurrence and consumption of rewards, minimize the occurrence and consumption of punishment
Types of reward
Primary/secondary, positive/negative
Timeline of reinforcement learning, conditioning, behaviourism
Law of effect (Thorndike), classical conditioning (Pavlov), instrumental behaviour and conditioning (Skinner), exponential learning (Hull), learning rule (Rescorla-Wagner)
Law of effect
Responses that produce a satisfying effect in a particular situation become more likely to occur again in that situation
Law of effect (experiment)
Cat in a box, mouse maze
Instrumental conditioning
Associations are formed between states and actions, outcome independent (habit)
States
Stimuli or context
Classical (Pavlovian) conditioning
Associations are formed between states and outcomes, trigger an unconditioned response to occur after a conditioned stimulus
Unconditioned response
Reflexive behaviour, involuntary actions, innate behaviour, e.g. salivation, freezing
Behaviourist paradigm
Behaviour is generated through reinforcement/conditioning (learned associations), there is no learning/behaviour without reinforcement
Notion of reinforcing outcome
Reward, objective property of what would reinforce the behaviour
Aim of behaviour
Maximizing the occurrence of reward by trial-and-error
Properties of conditioning and reinforcement learning
Contiguity, contingency
Contiguity
The reward must closely follow in time after the stimulus-response events
Contingency
The stimulus-response events must increase the probability of getting the reward
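A minimal sketch of how contingency can be quantified, assuming the common "delta P" measure: the probability of reward after the response minus the probability of reward without it. The function name and the trial counts are hypothetical and only serve as illustration.

```python
def contingency(rewarded_with, total_with, rewarded_without, total_without):
    """Delta P: P(reward | response) - P(reward | no response)."""
    p_with = rewarded_with / total_with
    p_without = rewarded_without / total_without
    return p_with - p_without


# Hypothetical counts: reward follows 8 of 10 responses, but only 2 of 10 non-responses.
positive = contingency(8, 10, 2, 10)

# Reward is equally likely with or without the response: no contingency.
zero = contingency(5, 10, 5, 10)

print(positive, zero)
```

A value above zero means the response raises the probability of reward; a value of zero means the reward arrives just as often without the response, so under this measure nothing is there to be learned.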
Blocking-paradigm
There is no learning when the stimulus (US) is already completely predicted; learning and association are proportional to surprise
Rescorla-Wagner model
Describes changes in the associative strength (V) between one or several signals (CS) and the subsequent stimulus (US). Higher associative strength leads to a higher likelihood of triggering the UR
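A minimal sketch of the Rescorla-Wagner update for one trial, in its standard form: delta V = alpha * beta * (lambda - sum of V over all CS present). V, CS and US come from the card above; alpha (CS salience), beta (US learning rate) and lambda (maximum strength the US supports) are the conventional parameter names, and the values used here are assumptions chosen for illustration.

```python
def rescorla_wagner_update(strengths, present, lam, alpha=0.3, beta=1.0):
    """Update V for every CS present on one trial and return the prediction error.

    strengths: dict mapping CS name -> associative strength V
    present:   list of CS names shown on this trial
    lam:       strength the US supports (1.0 if the US occurs, 0.0 if it is omitted)
    """
    prediction = sum(strengths[cs] for cs in present)
    error = lam - prediction  # surprise: actual minus predicted outcome
    for cs in present:
        strengths[cs] += alpha * beta * error
    return error


# Hypothetical experiment: a tone is paired with the US on 20 trials.
V = {"tone": 0.0}
errors = [rescorla_wagner_update(V, ["tone"], lam=1.0) for _ in range(20)]

print(round(V["tone"], 3))
print([round(e, 3) for e in errors[:5]])
```

The error is largest on the first trial and shrinks as V approaches lambda, so learning slows down as the US becomes predicted.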
Prediction-error model
Error corresponds to surprise (explains the blocking paradigm); the mismatch between the predicted and the actual outcome is the prediction error
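A small simulation sketch of how a prediction-error rule produces blocking. It assumes a Rescorla-Wagner style update with a single learning rate; the stimulus names A and B, the trial counts and the parameter values are hypothetical.

```python
ALPHA = 0.3   # assumed learning rate
LAMBDA = 1.0  # assumed maximum strength supported by the US


def train(strengths, present, trials, lam=LAMBDA, alpha=ALPHA):
    """Run a number of identical trials; all CS present share one prediction error."""
    for _ in range(trials):
        error = lam - sum(strengths[cs] for cs in present)
        for cs in present:
            strengths[cs] += alpha * error
    return strengths


# Blocking group: A alone is paired with the US first, then the compound A+B.
blocked = train({"A": 0.0, "B": 0.0}, ["A"], 30)
blocked = train(blocked, ["A", "B"], 30)

# Control group: only the compound A+B is paired with the US.
control = train({"A": 0.0, "B": 0.0}, ["A", "B"], 30)

print("B after pretraining on A:", round(blocked["B"], 4))
print("B without pretraining:  ", round(control["B"], 4))
```

In the blocking group A already predicts the US when B is introduced, so the error is close to zero and B gains almost no strength. In the control group the US is still surprising, and A and B share the available strength.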
Schultz et al.
Recorded midbrain (SN, VTA) dopamine neurons in a monkey brain during classical conditioning
Dickinson & Balleine
Investigated goal-directed learning in rats
Model-free reinforcement learning
Habitual/Pavlovian conditioning, instrumental goal-directed behaviour
Habitual/Pavlovian conditioning (model-free learning)
Inflexible associations between states and actions or outcomes; after learning, the association no longer depends on the response-outcome contingency or on the properties of the outcome
Instrumental goal-directed behaviour (model-free learning)
Flexible associations between states, actions and outcomes, sensitive to the outcome value
Model-based goal-directed learning
Purposeful behaviour
Purposeful behaviour (model-based learning)
Rats can learn structures and/or cognitive maps without reinforcement
Model-free goal-directed behaviour
Flexible association between states, action and outcomes, sensitive to the outcome value, forward planning
Model-based goal-directed behaviour
Model of the transitions from states and actions to subsequent states; the model can then be used to evaluate actions in terms of the available outcomes, backward induction
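A minimal sketch of backward induction over a transition model, assuming a tiny two-step task with deterministic transitions. The state names, action names and reward values are hypothetical; the point is only that actions are evaluated by working backwards from the outcomes.

```python
# Hypothetical model: transitions[state][action] = (next_state, reward)
transitions = {
    "start": {"left": ("room_a", 0.0), "right": ("room_b", 0.0)},
    "room_a": {"press": ("end", 1.0), "wait": ("end", 0.0)},
    "room_b": {"press": ("end", 3.0), "wait": ("end", 0.0)},
}


def backward_induction(model, order):
    """Evaluate states from the last decision point back to the first.

    order lists the states so that every successor is valued before its predecessor.
    """
    values = {"end": 0.0}
    policy = {}
    for state in order:
        best_action, best_value = None, float("-inf")
        for action, (next_state, reward) in model[state].items():
            value = reward + values[next_state]
            if value > best_value:
                best_action, best_value = action, value
        values[state] = best_value
        policy[state] = best_action
    return values, policy


order = ["room_a", "room_b", "start"]
values, policy = backward_induction(transitions, order)

# Devalue the outcome in room_b and re-plan with the updated model.
devalued = {**transitions, "room_b": {"press": ("end", 0.0), "wait": ("end", 0.0)}}
new_values, new_policy = backward_induction(devalued, order)

print(policy["start"], new_policy["start"])
```

Because the choice at the start is recomputed from the model, it switches from right to left as soon as the outcome in room_b loses its value, without any new trial-and-error experience at the start state.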