Reinforcement learning and motor sequences Flashcards

1
Q

Learning from Reward/Reinforcement: The Beginnings

A

Psychology
* Classical Conditioning
* A learned (reinforced) reflex / response that is evoked by a stimulus

  • Pavlov’s Dog
    a bell is repeatedly paired with a treat, so the bell alone comes to evoke the response
2
Q

Reinforcement and punishment

A

Reinforcement: increase behaviour
Punishment: decrease behaviour
Positive: add something
Negative: take away something

Classroom Examples:
Positive Reinforcement – Candy
Negative Reinforcement – Take away homework
Positive Punishment – Writing Lines
Negative Punishment – Take away recess

3
Q

State, action, reward

A

Interaction between agent and environment

At each step t the agent:
* Executes action At
* Receives scalar reward Rt
* Receives observation Ot

The environment:
* Receives action At
* Emits a reward Rt
* Emits observation Ot

The process of reinforcement learning involves learning to link reward with specific actions (and their outcomes) so that rewarded actions are repeated

Reward feedback can be binary (the action is rewarded or not) or a scalar quantity (relative to the utility of action/reward outcomes)
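
The agent/environment interaction described above can be sketched as a simple loop. This is only a minimal illustration, not a standard implementation: the class names, the two-action setup, and the reward probabilities are all hypothetical, and the agent here acts at random rather than learning.

```python
import random


class ToyEnvironment:
    """Hypothetical environment: action 1 is rewarded more often than action 0."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.t = 0

    def step(self, action):
        # Receives action At, emits scalar reward Rt and observation Ot.
        self.t += 1
        p_reward = 0.8 if action == 1 else 0.2  # assumed probabilities
        reward = 1.0 if self.rng.random() < p_reward else 0.0  # binary reward
        observation = self.t  # the observation is just the step count here
        return reward, observation


class RandomAgent:
    """Hypothetical agent that picks actions at random and records feedback."""

    def __init__(self, seed=1):
        self.rng = random.Random(seed)
        self.history = []

    def act(self):
        return self.rng.choice([0, 1])  # executes action At

    def observe(self, action, reward, observation):
        # Receives reward Rt and observation Ot for the action just taken.
        self.history.append((action, reward, observation))


def run_interaction(n_steps=10):
    env = ToyEnvironment()
    agent = RandomAgent()
    for _ in range(n_steps):
        action = agent.act()
        reward, observation = env.step(action)
        agent.observe(action, reward, observation)
    return agent.history


history = run_interaction(10)
```

A learning agent would use the recorded (action, reward) pairs to choose rewarded actions more often; the random agent is kept only to show the interaction cycle itself.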

4
Q

Human(-like?) Reaching and
Locomotion

A

Stick man

In both cases (reaching and locomotion), the actions were learned using reward: an action was repeated when it was associated with success (reaching the target, walking).

5
Q

The goal of reinforcement learning

A

MAXIMIZE REWARD
* Minimize Loss

Cumulative Reward
- Might be better to sacrifice immediate reward for long-term reward
* Chess
* Investments

Actions that are associated with reward become strengthened/repeated (to maximize reward)
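
As a small worked illustration of cumulative reward, the sketch below compares two made-up reward sequences: one that takes a large immediate reward and one that gives it up for larger rewards later. The numbers are invented for the example and do not come from any experiment.

```python
def cumulative_reward(rewards):
    """Total reward collected over a sequence of steps."""
    return sum(rewards)


# Hypothetical reward sequences over four steps.
greedy_path = [5, 1, 1, 1]   # takes the large immediate reward
patient_path = [0, 2, 4, 8]  # sacrifices the first step for later payoff

greedy_total = cumulative_reward(greedy_path)    # 5 + 1 + 1 + 1 = 8
patient_total = cumulative_reward(patient_path)  # 0 + 2 + 4 + 8 = 14
```

The greedy path wins on the first step (5 vs 0) but loses on the total (8 vs 14), which is the chess/investment intuition in miniature.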

6
Q

Exploration

A

The (trial and error) process of acquiring more information about the environment by searching possibilities

Searching (many) action possibilities to determine which actions tend to maximize reward

  • Example: by trying different shots, the shooter finds out the goalie can’t reach low
7
Q

Exploitation

A

Capitalize on known information to maximize reward

Actions associated with a past history of reward tend to be repeated to maximize future reward

- Example: shoot low and score

8
Q

Tradeoff Between Action Exploration and Exploitation

A

Shift emphasis from exploring to exploiting to maximize reward

Learning from reward
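
One common way to sketch this shift in code is an epsilon-greedy rule with a decaying exploration rate. Note that epsilon-greedy is an assumption brought in for illustration, not something these cards name, and the aim zones and scoring probabilities below are hypothetical (borrowing the goalie example).

```python
import random


def run_shooter(n_shots=500, seed=0):
    rng = random.Random(seed)
    # Hypothetical scoring probabilities per aim zone (the goalie is weak low).
    p_score = {"high": 0.2, "middle": 0.3, "low": 0.8}
    zones = list(p_score)
    value = {z: 0.0 for z in zones}  # running estimate of reward per zone
    count = {z: 0 for z in zones}    # number of shots taken at each zone
    choices = []
    for shot in range(n_shots):
        # Exploration rate starts at 1.0 (pure exploration) and decays
        # linearly to a floor of 0.05 (mostly exploitation).
        epsilon = max(0.05, 1.0 - shot / 100)
        if rng.random() < epsilon:
            zone = rng.choice(zones)                   # explore
        else:
            zone = max(zones, key=lambda z: value[z])  # exploit
        reward = 1.0 if rng.random() < p_score[zone] else 0.0
        count[zone] += 1
        # Incremental mean: move the estimate toward the observed reward.
        value[zone] += (reward - value[zone]) / count[zone]
        choices.append(zone)
    return value, count, choices
```

Early shots are spread across zones to gather information; later shots mostly go to whichever zone has the highest estimated reward.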

9
Q

Reinforcement feedback, hit and shift target

A

Part one:
When the participant hits the target, it increases in size and the participant hears a pleasant tone

When the participant misses the target, they do not receive any reward feedback

Part two:

The target is shifted without the participants' knowledge
Absence of reward causes participants to shift their aimpoint

  • if there is no reward, participants go back to the exploration phase
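
The hit-and-shift logic can be sketched as a toy simulation. This is a loose illustration under stated assumptions, not the experiment's actual model: the one-dimensional target, its width, the size of the shift, and the "keep the aimpoint after a hit, try a nearby aimpoint after a miss" rule are all hypothetical.

```python
import random


def simulate_reaches(n_trials=200, shift_trial=100, seed=0):
    rng = random.Random(seed)
    target, half_width = 0.0, 1.0  # hypothetical target centre and half-width
    aim = 0.0                      # participant's current aimpoint
    log = []
    for trial in range(n_trials):
        if trial == shift_trial:
            target += 2.0          # unannounced target shift
        hit = abs(aim - target) <= half_width  # binary reward feedback
        log.append((trial, aim, hit))
        if not hit:
            # No reward: explore by trying a new aimpoint near the current one.
            aim += rng.uniform(-1.5, 1.5)
        # Reward: exploit by keeping the same aimpoint on the next trial.
    return log


log = simulate_reaches()
```

Before the shift every reach is rewarded and the aimpoint stays put; the first reach after the shift misses, which sends the simulated participant back into exploration until a rewarded aimpoint is found again.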
10
Q

Brain Structures Involved in Reinforcement Learning

A

The basal ganglia are a collection of subcortical structures in the brain.

Dopamine is a neurotransmitter that is part of the brain’s intrinsic reward system. It is produced in the substantia nigra.

Dopamine input to the striatum is critical for learning from reward and strengthening the representation of specific actions.

Striatum: reinforces action based on dopamine release (reward)

11
Q

Learning to Produce Motor Sequences
(Serial Actions)

A

Subjects learn to produce sequences of finger movements: discrete actions (individual finger movements) are assembled into a functional sequence

Piano example:

Before training: Key presses are done independently, with very little temporal overlap

After training: Key presses are strung together and grouped into more efficient sequences

Subjects get faster and more efficient with practice, and make fewer errors. Movements become smoother and more linked together

12
Q

Learning Causes ‘Chunking’ of Individual Elements in a Motor Sequence

A

Practice can link sequential actions into a single movement pattern

Sequences = similar actions grouped together

With practice, independent actions are ‘chunked’ into a larger subunit of a movement sequence

Eventually, actions can be ‘chunked’ together into a single cohesive movement sequence in which successive actions are ‘coarticulated’

Chunking: fusing a series of individual elements into a larger subunit of a movement sequence

13
Q

Co-articulation

A

Adjacent movement elements influence each other

As we become more efficient, the traces on the graphs start to lump together. With experience, the elements no longer appear as individual movements

14
Q

Associative/premotor network (early learning)

A

Sensory, early learning

Brain regions with increased activity in early stages of learning

Dorsolateral Prefrontal Cortex - strategizing, high-level planning

Inferior Parietal Cortex - visual input

Rostral Premotor Areas - motor planning

Cerebellum - correcting errors

Basal Ganglia - reward processing

Learning motor sequences involves a complex and highly distributed network of brain areas.
High cognitive demand
Conscious processing

15
Q

Sensorimotor Network (late learning)

A

M1 and premotor areas

Brain regions with increased activity in later stages of learning

Supplementary motor area (SMA)
- storage unit for motor plans

Dorsal premotor area
- motor planning

Primary motor cortex
- sends action potentials (APs) down the spinal cord to the muscles, movement sequencing

In later stages of motor sequence learning, activity shifts to sensory and motor regions of the brain.
Low cognitive demand
Automatic processing
