Reinforcement Learning in the Brain Flashcards
What is a neuronal system used for reward prediction?
One of the principal neuronal systems involved in processing reward information appears to be the dopamine (DA) system.
What have behavioural studies shown about DA in the brain?
Behavioral studies show that dopamine projections to the striatum and frontal cortex play a central role in mediating the effects of rewards on approach behavior and learning. These results are derived from selective lesions of different components of dopamine systems, systemic and intracerebral administration of direct and indirect dopamine receptor agonist and antagonist drugs, electrical self-stimulation, and self-administration of major drugs of abuse, such as cocaine, amphetamine, opiates, alcohol, and nicotine (Di Chiara, 1995; Wise et al., 1978).
What studies have been done on impaired DA transmission?
Many studies have also investigated the behavior of animals with impaired dopamine neurotransmission after local or systemic application of dopamine receptor antagonists or destruction of dopamine axons in ventral midbrain, nucleus accumbens, or striatum.
What have studies of impaired DA transmission shown?
Besides observing locomotor and cognitive deficits reminiscent of Parkinsonism, these studies revealed impairments in the processing of reward information. The earliest studies argued for deficits in the subjective, hedonic perception of rewards (Wise 1982; Wise et al. 1978). Further experimentation revealed impaired use of primary rewards and conditioned appetitive stimuli for approach and consummatory behavior (Beninger et al. 1987; Ettenberg 1989; Miller et al. 1990; Salamone 1987; Ungerstedt 1971; Wise and Colle 1984; Wise and Rompre 1989).
Why do the properties of DA neurons suggest they are involved in reward prediction?
About 75% of dopamine neurons show phasic activations when animals touch a small morsel of hidden food during exploratory movements in the absence of other phasic stimuli, without being activated by the movement itself (Romo and Schultz 1990).
What do studies of DA show in behavioural paradigms?
Evidence from multiple behavioural paradigms suggests that dopamine provides basal ganglia target structures with phasic signals that convey a reward prediction error that can influence learning and action selection, particularly in stimulus-driven habitual instrumental behavior (Barto, 1995; Schultz et al., 1997; Wickens & Kotter, 1995).
How might the temporal credit assignment equation relate to dopamine signalling?
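delta(t) = r(t) - [V(t) - gamma V(t+1)]
delta(t) = prediction error, proposed to correspond to the DA signal
r(t) = actual reward
[V(t) - gamma V(t+1)] = predicted reward, where V is the value estimate and gamma is the discount factor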
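The relation between these terms (prediction error = actual reward minus predicted reward) can be illustrated with a minimal Python sketch. The function name `td_error`, the default `gamma=0.9`, and the example values are assumptions chosen for illustration only; they do not come from the studies cited.

```python
def td_error(reward, value_now, value_next, gamma=0.9):
    """Prediction error delta(t) = r(t) - [V(t) - gamma * V(t+1)].

    reward      -- actual reward r(t)
    value_now   -- value estimate V(t)
    value_next  -- value estimate V(t+1)
    gamma       -- discount factor (0.9 is an illustrative assumption)
    """
    predicted_reward = value_now - gamma * value_next
    return reward - predicted_reward


# Three illustrative cases, each with no further reward expected (V(t+1) = 0):
unexpected = td_error(reward=1.0, value_now=0.0, value_next=0.0)  # reward not predicted
predicted = td_error(reward=1.0, value_now=1.0, value_next=0.0)   # reward fully predicted
omitted = td_error(reward=0.0, value_now=1.0, value_next=0.0)     # predicted reward withheld

print(unexpected, predicted, omitted)
```

In this sketch an unpredicted reward gives a positive error, a fully predicted reward gives zero error, and an omitted reward gives a negative error.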
What 2 factors indicate that prediction errors are used for learning?
- cortico-striatal synapses show dopamine-dependent plasticity
- three-factor learning rule: plasticity needs presynaptic activity + postsynaptic activity + dopamine
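One way to picture a three-factor rule is a weight change that is the product of the three factors, so that the weight only changes when all three are present. The multiplicative form, the function name `three_factor_update`, and the learning rate below are simplifying assumptions for illustration, not a model taken from the cited literature.

```python
def three_factor_update(weight, pre, post, dopamine, learning_rate=0.1):
    """Return the updated synaptic weight under a hypothetical three-factor rule.

    pre       -- presynaptic (cortical) activity
    post      -- postsynaptic (striatal) activity
    dopamine  -- dopamine signal, treated here as a signed prediction error
    The change is zero whenever any one of the three factors is zero.
    """
    return weight + learning_rate * pre * post * dopamine


w = 0.5
print(three_factor_update(w, pre=1.0, post=1.0, dopamine=1.0))   # all three present: weight increases
print(three_factor_update(w, pre=1.0, post=1.0, dopamine=0.0))   # no dopamine: weight unchanged
print(three_factor_update(w, pre=0.0, post=1.0, dopamine=1.0))   # no presynaptic activity: weight unchanged
print(three_factor_update(w, pre=1.0, post=1.0, dopamine=-1.0))  # negative error: weight decreases
```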