Reinforcement learning and motor sequences Flashcards
Learning from Reward/Reinforcement
The Beginnings
Psychology
* Classical Conditioning
* A learned (reinforced) reflex / response that is evoked by a stimulus
- Pavlov’s Dog
A bell is repeatedly paired with a treat until the bell alone evokes the response
Reinforcement and punishment
Reinforcement: increase behaviour
Punishment: decrease behaviour
Positive: add something
Negative: take away something
Classroom Examples:
Positive Reinforcement – Candy
Negative Reinforcement – Take away homework
Positive Punishment – Writing Lines
Negative Punishment – Take away recess
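The two distinctions above (add vs. take away, increase vs. decrease behaviour) combine into four labels. A minimal sketch of that 2x2 grid, where the function name and the boolean encoding are illustrative assumptions rather than anything from the course material:

```python
def classify_consequence(stimulus_added, behaviour_increases):
    """Label a consequence using the two axes from the cards above."""
    # Positive = something is added, Negative = something is taken away
    valence = "Positive" if stimulus_added else "Negative"
    # Reinforcement = behaviour increases, Punishment = behaviour decreases
    effect = "Reinforcement" if behaviour_increases else "Punishment"
    return f"{valence} {effect}"


# Classroom examples from the card, encoded as (stimulus_added, behaviour_increases)
examples = {
    "Candy": (True, True),
    "Take away homework": (False, True),
    "Writing lines": (True, False),
    "Take away recess": (False, False),
}

for name, (added, increases) in examples.items():
    print(f"{name}: {classify_consequence(added, increases)}")
```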
State, action, reward
Interaction between agent and environment
At each step t the agent:
* Executes action At
* Receives scalar reward Rt
* Receives observation Ot
The environment:
* Receives action At
* Emits a reward Rt
* Emits observation Ot
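The agent/environment exchange listed above can be sketched as a simple loop. This is only an illustration: the class names, the two-action toy environment, and the "switch after no reward" rule are all assumptions made for the example.

```python
class ToyEnvironment:
    """Hypothetical environment: action 1 is rewarded, action 0 is not."""

    def __init__(self):
        self.t = 0

    def step(self, action):
        # Receives action At, emits reward Rt and observation Ot
        self.t += 1
        reward = 1.0 if action == 1 else 0.0
        observation = self.t  # here the observation is just the step count
        return observation, reward


class ToyAgent:
    """Hypothetical agent: repeats a rewarded action, switches otherwise."""

    def __init__(self):
        self.action = 0

    def act(self):
        return self.action

    def observe(self, observation, reward):
        if reward == 0.0:
            self.action = 1 - self.action  # try the other action


def run_interaction(steps):
    env, agent = ToyEnvironment(), ToyAgent()
    log = []
    for _ in range(steps):
        action = agent.act()                    # agent executes At
        observation, reward = env.step(action)  # environment emits Ot, Rt
        agent.observe(observation, reward)      # agent receives Ot, Rt
        log.append((action, reward, observation))
    return log


for action, reward, observation in run_interaction(5):
    print(f"A={action} R={reward} O={observation}")
```

In this toy run the first action goes unrewarded, the agent switches, and the rewarded action is then repeated on every later step.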
The process of reinforcement learning involves learning to link reward with specific actions (and their outcomes) so that rewarded actions are repeated
Reward feedback can be binary (the action is rewarded or not) or a scalar quantity (relative to the utility of the action/reward outcomes)
Human(-like?) Reaching and
Locomotion
Stick man
In both cases, the actions were learned using reward – the action was repeated when it was associated with success (reaching the target, walking).
The goal of reinforcement learning
MAXIMIZE REWARD
* Minimize Loss
Cumulative Reward
- It might be better to sacrifice immediate reward for long-term reward
* Chess
* Investments
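To make the idea of cumulative reward concrete, here is a small sketch comparing two reward sequences. The numbers and the strategy names are made up for illustration; cumulative reward is taken here as the plain sum of the rewards.

```python
def cumulative_reward(rewards):
    """Total reward collected over a whole sequence of steps."""
    return sum(rewards)


# Hypothetical reward sequences (illustrative numbers only)
strategies = {
    "grab reward now": [5, 1, 1, 1],   # large immediate reward, little later
    "wait for payoff": [0, 0, 0, 20],  # nothing at first, large reward later
}

for name, rewards in strategies.items():
    print(f"{name}: first step = {rewards[0]}, cumulative = {cumulative_reward(rewards)}")

best = max(strategies, key=lambda name: cumulative_reward(strategies[name]))
print("Higher cumulative reward:", best)
```

The sequence with the worse first step ends with the higher total, which is the point of the chess and investment examples.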
Actions that are associated with reward become strengthened/repeated (to maximize reward)
Exploration
The (trial and error) process of acquiring more information about the environment by searching possibilities
Searching (many) action possibilities to determine which actions tend to maximize reward
- Example: finding out that the goalie can’t reach low shots
Exploitation
Capitalize on known information to maximize reward
Actions associated with past history of reward tend to be repeated to maximize future reward
- Example: shoot low and score
Tradeoff Between Action Exploration and Exploitation
Shift emphasis from exploring to exploiting to maximize reward
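One common way to illustrate this shift is an epsilon-greedy rule with a decaying exploration rate. The card does not name this technique; it is brought in here only as an example. The function names, the decay schedule, and the fixed payoffs per action are all assumptions.

```python
import random


def epsilon_at(trial, start=1.0, decay=0.95, floor=0.05):
    """Exploration probability: starts high, decays toward a small floor."""
    return max(floor, start * decay ** trial)


def choose_action(estimates, epsilon, rng):
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))  # explore: try a random action
    # exploit: pick the action with the highest estimated reward
    return max(range(len(estimates)), key=lambda a: estimates[a])


def run_bandit(payoffs, trials=200, seed=0):
    """payoffs: hypothetical fixed reward for each action."""
    rng = random.Random(seed)
    estimates = [0.0] * len(payoffs)
    counts = [0] * len(payoffs)
    choices = []
    for t in range(trials):
        action = choose_action(estimates, epsilon_at(t), rng)
        reward = payoffs[action]
        counts[action] += 1
        # running average of the reward observed for this action
        estimates[action] += (reward - estimates[action]) / counts[action]
        choices.append(action)
    return estimates, choices


payoffs = [0.2, 0.8, 0.5]  # e.g. three shot placements with different success
estimates, choices = run_bandit(payoffs)
print("estimated rewards:", estimates)
print("last 10 choices:", choices[-10:])
```

Early trials are mostly random (exploration). As the exploration probability decays, choices are increasingly driven by the learned estimates (exploitation).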
Learning from reward
Reinforcement feedback, hit and shift target
Part one:
When the participant hits the target, it increases in size and the participant hears a pleasant tone
When the participant misses the target, they do not receive any reward feedback
Part two:
The target is shifted without the participants’ knowledge
Absence of reward causes participants to shift their aimpoint
- If there is no reward, participants go back to the exploration phase
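The two parts of this task can be sketched as a simple "repeat after a hit, search after a miss" rule. This is an illustrative toy model, not the procedure or analysis used in the experiment: the search rule (stepping alternately to each side of the last rewarded aimpoint), the target width, and the positions are all assumptions.

```python
def run_task(target_centers, half_width=1.0, step=0.5):
    """Simulate aiming with binary reward feedback (hit or no reward)."""
    anchor = 0.0  # last aimpoint that was rewarded
    misses = 0    # consecutive unrewarded trials
    history = []
    for center in target_centers:
        if misses == 0:
            aim = anchor  # exploit: repeat the rewarded aimpoint
        else:
            # explore: search progressively farther on alternating sides
            offset = step * ((misses + 1) // 2)
            sign = 1 if misses % 2 == 1 else -1
            aim = anchor + sign * offset
        hit = abs(aim - center) <= half_width
        if hit:
            anchor, misses = aim, 0
        else:
            misses += 1
        history.append((aim, hit))
    return history


# Part one: target at 0. Part two: target silently shifted to 3.
centers = [0.0] * 5 + [3.0] * 20
history = run_task(centers)
for trial, (aim, hit) in enumerate(history, start=1):
    print(f"trial {trial:2d}: aim={aim:+.1f} {'hit' if hit else 'miss'}")
```

In the simulated run, the absence of reward after the shift triggers a search, and the aimpoint settles on a new rewarded location.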
Brain Structures Involved in Reinforcement Learning
The basal ganglia are a collection of subcortical structures in the brain.
Dopamine is a neurotransmitter that is part of the brain’s intrinsic reward system. It is produced in the substantia nigra.
Dopamine input to the striatum is critical for learning from reward and strengthening the representation of specific actions.
Striatum: reinforces action based on dopamine release (reward)
Learning to Produce Motor Sequences
(Serial Actions)
Subjects learn to produce sequences of finger movements: discrete actions (individual finger movements) are assembled into a functional sequence
Piano example:
Before training: Key presses are done independently with very little temporal overlap
After training: Key presses are strung together and grouped into more efficient sequences
Subjects get faster and more efficient with practice, with fewer errors. Movements become smoother and more linked together
Learning Causes ‘Chunking’ of Individual Elements in a Motor Sequence
Practice can link sequential actions into a single movement pattern
Sequences = similar elements grouped together
With practice, independent actions are ‘chunked’ into a larger subunit of a movement sequence
Eventually actions can be ‘chunked’ together into a single cohesive movement sequence where successive actions are ‘coarticulated’
Chunking: fusing a series of individual elements into a larger subunit of a movement sequence
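As a rough illustration of chunking, the sketch below groups successive key presses whenever their time windows overlap. The overlap criterion and all timings are assumptions made for the example, not the measure used in the studies behind these cards.

```python
def chunk_presses(presses):
    """Group successive (onset, offset) presses whose time windows overlap."""
    chunks = []
    for onset, offset in presses:
        # A press that starts before the previous one ends joins its chunk
        if chunks and onset < chunks[-1][-1][1]:
            chunks[-1].append((onset, offset))
        else:
            chunks.append([(onset, offset)])
    return chunks


# Hypothetical press timings in seconds
before_training = [(0.0, 0.2), (0.5, 0.7), (1.0, 1.2), (1.5, 1.7)]
after_training = [(0.0, 0.2), (0.15, 0.35), (0.3, 0.5), (0.45, 0.65)]

print("chunks before training:", len(chunk_presses(before_training)))
print("chunks after training:", len(chunk_presses(after_training)))
```

With these made-up timings, the four independent presses before training form four separate units, while the overlapping presses after training fuse into one.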
Co-articulation
Adjacent movement elements influence each other
As we become more efficient, the movement traces on the graphs start to lump together. With experience, they are no longer seen as individual movements
Associative/premotor network (early learning)
Sensory, early learning
Brain regions with increased activity in early stages of learning
Dorsolateral Prefrontal Cortex - strategizing, high level planning
Inferior Parietal Cortex - visual input
Rostral Premotor Areas - motor planning
Cerebellum - correcting errors
Basal Ganglia - Reward process
Learning motor sequences involves a complex and highly distributed network of brain areas. High cognitive demand
Conscious processing
Sensorimotor Network (late learning)
M1 and premotor
Brain regions with increased activity in later stages of learning
Supplementary motor area (SMA)
- storage unit for motor plans
Dorsal premotor area
- motor planning
Primary motor cortex
- sends action potentials (AP) down the spinal cord to the muscles, movement sequencing
In later stages of motor sequence learning, activity shifts to sensory and motor regions of the brain.
Low cognitive demand
Automatic processing