Planning Flashcards
Match the following names with their hypothesis. Note these are in chronological order of publication: Broca, Gross/Fuster, O’Keefe & Nadel, Ungerleider & Mishkin, Desimone & Duncan, Schultz & Dayan, Cohen & Botvinick
Modular brain, hierarchical brain, cognitive map, parallel streams, biased competition theory, dopamine, response conflict
This lecture was called planning and reasoning because the reasoning aspect is essentially all about reasoning…
Inductively i.e. inferring probable conclusions from premises
If we argue that reinforcement learning drives all behaviour then we are reducing an agent’s brain down to… Does this agent have knowledge e.g…?
A gigantic table of values of different actions in a given state. E.g. which states follow others. No
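Sketch (not from the lecture) of what "a gigantic table of values" means in practice. The state and action names, the learning rate `alpha` and the function `td_update` are all made up for illustration. Note the table stores only cached values: nothing in it says which state follows which.

```python
from collections import defaultdict

def td_update(q, state, action, reward, alpha=0.5):
    """Nudge the cached value of (state, action) toward the reward just received."""
    q[(state, action)] += alpha * (reward - q[(state, action)])
    return q

# The whole "brain" of a purely model-free agent: (state, action) -> value.
q_table = defaultdict(float)

# Three rewarded visits to a hypothetical maze junction.
for _ in range(3):
    td_update(q_table, "junction", "left", reward=1.0)

print(q_table[("junction", "left")])  # 0.875 after three updates with alpha = 0.5
```

The agent can rank actions it has tried, but it cannot answer "where does turning left take me?" because that information was never stored.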
Tolman (1946) presented evidence that animals do have knowledge in the form of cognitive maps. How was this demonstrated?
A rat was trained to enter a tunnel which led to a reward. When this tunnel was blocked at test the rat chose to take the most appropriate alternative tunnel given the inferred angle of the rewarding location from the starting position
Give another example from animal cognition in which reinforcement learning does not suffice and knowledge of the order of states must be known
Aesop’s thirsty crow places stones in a jug to raise the water level so that he can reach/drink the water in the jug. The crow needs knowledge about the links between states which lead to the reward
Give a third example from animal cognition in which knowledge is used to plan for the future (Raby, 2007)
Western scrub jays experience being given food for breakfast in compartment A but not in C. Then when given food in B with free access to neighbouring A & C in the evening, they cache (store/save) the food in C not A, so that should they be placed in C the next morning, breakfast will be available
What is the benefit of having knowledge of the transitions between states e.g. in the form of a cognitive map?
It allows you to learn about rewards offline
Which part of the brain do we use to learn about links between states? TSB Summerfield (2006) who found that…
MTL. The degree of MTL activity during associative encoding predicted the success of recall of which house went with which scene (= the blue trace on the fMRI signal graphs)
In the real world we often do not intend to learn associations between stimuli. Nevertheless we still learn associations. The neural mechanism for this implicit associative encoding was demonstrated by Schapiro (2012). What were the stimuli?
Pps were presented with a recursive structured series of colourful stimuli. Some stimuli nearly always followed each other (strong pairs), whilst other stimuli followed each other at just above-chance (weak pairs).
What did Schapiro (2012) find re: the hippocampus and implicit associative encoding? What does the middle bar of the 3 on Schapiro’s result graph depict?
MVPA showed that strong pair members showed greater hippocampal pattern similarity than weak pair members despite no difference in visual similarity. Shuffled pairs (strong to the left & weak to the right)
What was found from training an agent to move from X to Y with vs. without a planning algorithm called _ _ _ _?
DYNA. Without planning the agent becomes stuck. With planning on the basis of learnt transitions between states, agents show a much improved ability to reach the goal location
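Sketch of the Dyna idea on a toy problem (not the lecture's X-to-Y simulation). Assumptions: a 4-state corridor with a single "move right" action, so the action value reduces to one number per state; `alpha`, `gamma`, the state names and `run_episode` are illustrative. After each real step the agent stores the transition in a model and then replays randomly sampled stored transitions.

```python
import random

STATES = ["s0", "s1", "s2", "goal"]

def run_episode(q, model, n_planning, rng, alpha=0.5, gamma=0.9):
    """Walk the corridor once; after each real step do n_planning simulated updates."""
    for s, s_next in zip(STATES[:-1], STATES[1:]):
        r = 1.0 if s_next == "goal" else 0.0
        # Direct RL: learn from the real transition.
        q[s] += alpha * (r + gamma * q[s_next] - q[s])
        # Model learning: remember which state (and reward) followed s.
        model[s] = (r, s_next)
        # Planning: replay remembered transitions, sampled at random.
        for _ in range(n_planning):
            ps = rng.choice(sorted(model))
            pr, pnext = model[ps]
            q[ps] += alpha * (pr + gamma * q[pnext] - q[ps])
    return q

q_plain = run_episode({s: 0.0 for s in STATES}, {}, 0, random.Random(0))
q_dyna = run_episode({s: 0.0 for s in STATES}, {}, 50, random.Random(0))
print(q_plain["s0"], q_dyna["s0"])
```

After one episode without planning only the state next to the goal has any value, so the start state gives the agent nothing to go on. With planning, value has already spread back to the start.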
What evidence is there of offline learning i.e. whilst not engaged in the activity & instead resting?
After rats have run around a circular apparatus, CA1 place cells in the hippocampus fire in the same but more rapid sequence as during the actual activity
What two types of offline learning exist? One refers to planning and the other “reflecting”. When do they occur?
Replay of cellular activity after the experience. Preplay before the action has been performed. During sleep or quiet resting
As well as occurring in the hippocampus, replay also occurs in _ _ _. Sequences are sometimes replayed ___ as if…
PFC. Backwards as if learning begins at the goal so that the rewarding purpose of an action is propagated backwards through a series of states back to the starting point. This allows us to plan how to reach this goal again in the future
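Sketch of why replaying a sequence backwards is efficient. Assumptions: a made-up 4-state path, one pass of simple temporal-difference updates, learning rate `alpha = 1` and discount `gamma = 0.9`; `replay` is an illustrative helper, not a model of hippocampal or PFC activity.

```python
def replay(path, reward, order, gamma=0.9, alpha=1.0):
    """One pass of TD updates over the transitions of `path`, forwards or backwards."""
    values = {s: 0.0 for s in path}
    transitions = list(zip(path[:-1], path[1:]))
    if order == "backward":
        transitions.reverse()
    for s, s_next in transitions:
        # Reward arrives only on the step into the final (goal) state.
        r = reward if s_next == path[-1] else 0.0
        values[s] += alpha * (r + gamma * values[s_next] - values[s])
    return values

path = ["start", "corridor", "corner", "goal"]
forward = replay(path, 1.0, "forward")
backward = replay(path, 1.0, "backward")
print(forward["start"], backward["start"])
```

A single forward pass leaves the start state at 0 because the states ahead of it had no value yet when it was updated. A single backward pass carries the reward all the way to the start (0.9 × 0.9 × 1 = 0.81).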
Uncertainty-based competition theory posits 2 competing RL (reinforcement learning) mechanisms which link sensation to action - what are they? Where do they likely lie at the neural level?
Habit-based, model free RL vs. goal- & model based RL. Basal ganglia vs. PFC
Note that whilst a cognitive map is an example of knowledge used by model-based RL, the hippocampus/ cognitive map is not the model based RL mechanism itself, as demonstrated by…
The fact that HM has a spatial memory (cognitive map)/ hippocampus deficit but not a planning deficit. The planner (PFC) directs you around the cognitive map
Schoenbaum (2008) provides evidence that rats are not driven entirely by model free, habit-based RL. Two odours were presented to rats:…(complete the task but not the findings)
Odour X predicted banana vs. odour Y predicted grapefruit. Then odour Y was devalued by being paired with LiCl or by feeding the rat to satiety on grapefruit
What were the findings of Schoenbaum’s (2008) experiment with rats, odours, bananas and grapefruit? What do they show?
After devaluation of the outcome predicted by odour Y, rats reduced responding to odour Y but continued to respond to odour X. Rats do not just generically attribute a rewarding tag to odours but use specific knowledge of that odour’s outcome
What area of the brain can be lesioned to impair the flexible pattern of responding shown by rats towards odour X vs. Y in Schoenbaum (2008)?
Orbitofrontal cortex
Shallice & Burgess (1991) found that PFC lesion patients are impaired at planning. This was demonstrated using the ___ ___ task in which Pps had to buy certain products without breaking any of 3 rules: 1)…2)…3)…
Multiple errands task. 1) Don’t leave a designated area, 2) Don’t leave a shop without buying anything & 3) Don’t steal anything
How did the multiple errands task show that PFC lesion patients’ deficit was not in memory?
By giving them a written shopping list and a written copy of the rules
Shallice (1991) also found that PFC lesion patients were impaired at the ___ of ___ task in which you must…
Tower of Hanoi task. Move all the disks from the leftmost peg to the rightmost peg, one disk at a time, without placing a larger disk on top of a smaller disk
Planning could involve imagining a __ of possible moves & searching through a ___ of ___ action sequences & identifying their outcomes
Tree. Variety of possible
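Sketch of planning as tree search. Assumptions: a tiny hand-made world with deterministic transitions; the place names, reward numbers and `best_plan` are invented for illustration. The search tries every action sequence up to a fixed depth and keeps the one with the highest total reward.

```python
def best_plan(state, transitions, rewards, depth):
    """Exhaustively search action sequences up to `depth`; return (total reward, plan)."""
    if depth == 0 or state not in transitions:
        return 0.0, []
    best = (float("-inf"), [])
    for action, next_state in transitions[state].items():
        future, plan = best_plan(next_state, transitions, rewards, depth - 1)
        total = rewards.get(next_state, 0.0) + future
        if total > best[0]:
            best = (total, [action] + plan)
    return best

transitions = {
    "home": {"left": "alley", "right": "street"},
    "alley": {"on": "dead_end"},
    "street": {"on": "cafe"},
}
rewards = {"alley": 0.2, "cafe": 1.0}

value, plan = best_plan("home", transitions, rewards, depth=2)
print(value, plan)  # 1.0 ['right', 'on']
```

Looking only one step ahead would favour "left" (immediate reward 0.2). Searching two steps deep finds the better outcome behind "right". The number of sequences grows with every extra step of depth, which is why this kind of search gets expensive quickly.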
The most anterior portion of PFC is involved in branching control i.e. maintaining the rule applicable to one episode whilst in a new episode. This was demonstrated by Koechlin (1999):…
Pps had to decide whether successive letters were adjacent in the word “tablet”. BA 10 (most anterior PFC) was only activated in the branching condition in which one task (upper letters) had to be held in pending state whilst another task (lower letters) was performed
Was BA10 (most anterior PFC) active in the dual task or delay conditions of Koechlin (1999)?
No
___ PFC is more anterior than rostral PFC
Polar
PFC may code for the value of different possible actions & hence its role in planning/ reflection. This was shown by Boorman (2010) who found that when Pps chose between…, medial BA10 (most anterior PFC) responded to…, whereas lateral BA10 (most anterior PFC) responded to…
2 options, the value of each of which drifted over time (independently of the value of the other option). The value of what you chose. The value of what you didn’t choose
Daw (2011) investigated whether humans use model-free or model-based reinforcement learning. Describe the two-step gambling task used
You first choose between 2 options (A1 & A2) where A1 has a 70% P of taking you to a B1/B2 choice & A2 has a 70% P of taking you to a C1/C2 choice (vs. 30% P of taking you to the alternative slot machine)
Imagine that Pps have experience of the two-step gambling task and that on this trial their first choice takes them to a rewarded 2nd choice. What are the model-free vs. model-based predictions re: the P of Pps staying with this choice on the next trial?
Model-free prediction: More likely to make the same choice because it was rewarded (the history of whether the first choice commonly leads to the favourable 2nd choice is irrelevant). Model-based: More likely to make the same choice again only if this is known to commonly (vs. rarely) lead to the favourable 2nd choice
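Sketch of the two sets of predictions for all four trial types. Assumptions: the 70% common transition comes from the task description above; the neutral value of 0.5 for the second-stage state not visited, and the all-or-none "prefers to stay" output, are simplifications for illustration and not Daw's fitted model.

```python
P_COMMON = 0.7  # each first-stage choice leads to its usual second stage 70% of the time

def model_free_prefers_stay(rewarded, common):
    """Habit: repeat whatever was just rewarded; the transition type is ignored."""
    return rewarded

def model_based_prefers_stay(rewarded, common):
    """Planner: ask which first-stage choice is more likely to reach the better state."""
    v_visited = 1.0 if rewarded else 0.0  # value of the second-stage state just seen
    v_other = 0.5                         # assumed neutral value of the unvisited state
    p_reach = P_COMMON if common else 1 - P_COMMON  # P(same choice reaches it again)
    q_stay = p_reach * v_visited + (1 - p_reach) * v_other
    q_switch = (1 - p_reach) * v_visited + p_reach * v_other
    return q_stay > q_switch

predictions = {
    (rewarded, common): (
        model_free_prefers_stay(rewarded, common),
        model_based_prefers_stay(rewarded, common),
    )
    for rewarded in (True, False)
    for common in (True, False)
}
for trial, (mf, mb) in predictions.items():
    print(trial, "model-free stay:", mf, "model-based stay:", mb)
```

The two accounts disagree only after rare transitions: a rewarded rare trial makes the habit learner stay and the planner switch, and an unrewarded rare trial does the reverse.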
Daw (2011) concludes that…. Draw the graphs for model-free & model-based predictions & actual findings with rewarded vs. unrewarded + common vs. rare on the X axis & stay P on the Y axis (0.5 to 1)
Humans rely on a combination of model-based and model-free reinforcement learning. See slide 27 of lecture 14
Contrary to predictions, Daw (2011) found that the ___ & ___ were activated by ___ model-free RL (encoding prediction errors) & model-based RL in the two-step gambling task
PFC & striatum. Both
Planning is very computationally expensive. E.g. Deep Blue (the chess computer) searched through roughly 200 million possible states of the world/second. How do we overcome this obstacle?
By chunking state space e.g. when planning a trip to Paris I do not consider each step I will take, just that I need to get to London to catch the Eurostar
PFC codes for ___ actions, unlike PMc or supplementary motor cortex which code for specific multicomponential actions e.g. PFC cells…(Tanji, 1994 & 2007)
Abstract e.g. PFC cells would code for 1) an AABB pattern regardless of what A & B were e.g. pull vs. turn or 2) the notion of travelling to an airport rather than taking the 10am train to Heathrow
What is the four rooms problem? Does reinforcement learning suffice? Why?
When an agent in one room has to learn to reach a goal state in another of four rooms. No. Because the agent must learn the value of an intermediate state/option (the doorway to the next room)
Birds neurally segment a song when singing it. HVC (high vocal centre) cells (equivalent to mammalian _ _ _ cells) use ultra-___ code firing in ___ to drive RA (robust nucleus of the arcopallium) neurons. These cells directly control excitation in…
PMc. Sparse. Sequence. Abdominal expiratory muscles to produce the song
Anterior ___ PFC is activated by ___ between hierarchically ordered tasks i.e. codes for links (e.g. the doorway) between chunks of states of the world (e.g. room X & Y)
Ventral. Transitions
Knowledge of alternative possible actions, plans or methods is called ___ knowledge
Counter-factual knowledge
Chunking and developing abstract hierarchical representations of our plans reduces the computational ___ of planning
Expense