wk11 - Learning + Knowledge Flashcards
What are the four experimental effects which show us that attention is important for learning?
- Trade-offs between salience & validity
- Blocking
- Highlighting
- Learning rules of different complexity
What is salience in regards to attention?
How much a cue grabs your attention when all other things are equal.
- In the absence of validity, high salience cues will attract your attention.
- Low salience cues will not attract attention.
What evidence did Kruschke & Johansen (1999) provide for the interaction between salience and validity of cues and their utilisation in a manipulated Posner cuing task? (hint: see image for cues).
Posner Cuing task:
- The participant's task is to learn which of the cues to attend to in order to respond to a target as quickly as possible.
- Two cues = High Salience Cue (C1) & Low Salience Cue (C2).
- Validity of the cues is manipulated
- High validity (HV) / Low validity (LV)
- How much do participants use each cue? (utilisation).
Results:
- Condition 1: C1 = HV & C2 = LV > utilisation is all C1 vs none C2.
- Condition 2: C1 & C2 have equal validity (.8 prediction each) > C2 is used a little more & C1 a little less (relative to Condition 1), i.e. C2 detracts from C1.
- Condition 3: C1 = .8 validity & C2 = .9 validity > equal utilisation.
Inference:
- There is a trade-off between salience & validity.
- Increased validity = increased utilisation.
- Decreased validity = decreased utilisation
- Increased salience = increased utilisation
- Decreased salience = Decreased utilisation
- BUT validity & salience interact
- Increased utilisation of one cue decreases utilisation for other cues.
What is Blocking? (attention/learning)
In a classical conditioning paradigm, an early learning task creates an association between a red light (A) and a reward (X): A > X
- In a late learning task, A + a bell (B) are paired with X.
- AND an alarm (C) and a blue light (D) now predict a new reward (Y).
- A.B > X
- C.D > Y.
Test: B (bell) & D (blue light) are presented together to see which reward the mice will go to: X or Y? Chance would be 50/50, BUT the mice consistently choose Y.
Why?
- All attention is focused on A > X
- When A.B > X is introduced, there is no attention left for B, i.e. A blocks B: attention on one cue blocks learning about another.
- Cue D therefore drives final response
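A minimal sketch of the blocking design above, assuming a simple error-driven delta rule (Rescorla-Wagner style). Note this is a swapped-in technique: it produces blocking through shared prediction error rather than through limited attention as described in these notes, but it gives the same test result. The learning rate and trial counts are arbitrary assumptions, and `train` / `respond` are hypothetical helper names.

```python
CUES = ["A", "B", "C", "D"]
OUTCOMES = ["X", "Y"]

def train(weights, trials, epochs, rate=0.2):
    """Delta-rule learning: cues present on a trial share the prediction error."""
    for _ in range(epochs):
        for cues, outcome in trials:
            for o in OUTCOMES:
                target = 1.0 if o == outcome else 0.0
                prediction = sum(weights[c][o] for c in cues)
                error = target - prediction
                for c in cues:
                    weights[c][o] += rate * error
    return weights

def respond(weights, cues):
    """Summed associative strength towards each outcome for the presented cues."""
    return {o: sum(weights[c][o] for c in cues) for o in OUTCOMES}

weights = {c: {o: 0.0 for o in OUTCOMES} for c in CUES}
train(weights, [("A", "X")], epochs=50)                # early: A > X
train(weights, [("AB", "X"), ("CD", "Y")], epochs=50)  # late: A.B > X, C.D > Y
test = respond(weights, "BD")                          # test: B + D
print(test)
```

Because A already predicts X by the late phase, there is almost no error left for B to learn from, so B gains almost no strength towards X. D shares the Y error with C and ends up with a moderate strength towards Y, so the B + D test favours Y.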
What is highlighting? (learning/attention)
Classical Conditioning Paradigm:
Early Training:
- A.B > X
Late Training:
- A.B > X (twice as likely to be rewarded as A.D > Y below)
- A.D > Y
Test:
- B.D > Y
- Expected to go to X more.
- But preference for Y (guided by D).
Why:
- A & B are already paired with X
- Attention is shifted to D because it alone predicts the unusual event at Y.
- Cue D drives the final response.
- D learning HIGHLIGHTS unusual event.
Highlighting: prior learning about cue A highlights the fact that cue D predicts something different than cue A.
What is Simple Learning Theory?
- Co-occurrences lead to a strengthening of the association between cues & outcomes.
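A minimal sketch of the co-occurrence idea, assuming each pairing of a cue with an outcome adds one unit of associative strength (the unit increment, the trial list and the name `learn_by_cooccurrence` are illustrative assumptions, not a specific published model).

```python
from collections import Counter

def learn_by_cooccurrence(trials):
    """Strength of each (cue, outcome) pair = number of times they co-occurred."""
    strength = Counter()
    for cues, outcome in trials:
        for cue in cues:
            strength[(cue, outcome)] += 1
    return strength

# Hypothetical training set: A.B > X four times, A.D > Y twice.
trials = [("AB", "X")] * 4 + [("AD", "Y")] * 2
strength = learn_by_cooccurrence(trials)
print(strength[("A", "X")], strength[("B", "X")], strength[("D", "Y")])
```

Every cue present on a trial is strengthened equally here; nothing in this sketch weights cues by relevance.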
What is attentional learning theory? And how is it different from simple learning theory?
- Differs by also incorporating the ability to differentially weight cues according to their relevance.
- i.e. co-occurring cues are down-weighted because they don’t show how two cues might differ, and are therefore not relevant.
In the Filtration/Condensation categorisation experiment, which learning theory (simple or attention) better predicts the outcome of learning differences between type 1 (filtration) and type 2 (condensation) categories?
Real Data
- Filtration categories only have 1 feature (height) to focus on & are learned faster.
- Condensation categories have several features & are slower to learn.
Predictions of Learning Theories
Simple Learning Theory predicts no difference in accuracy & speed of learning between the two types of categories.
- It relies on co-occurrence alone, not taking into consideration the simplicity vs complexity of category features.
Attention Learning Theory predicts Type 1 categories will be learned more accurately & faster.
- It takes into account only the relevant dimension.
Attention Learning Theory better fits the real data.
What is one-shot/fast mapping?
Learning by exclusion, based on what is already known (our expectations)
When given data about a relationship that goes against our intuition, and that data is noisy (not a perfect correlation) - what are we likely to do when trying to integrate that information into learning? (wind-speed to fire task & light-switch correspondence task).
We are likely to rely on our intuition to guide our expectations.
Wind-speed to fire spread task:
- Participants were asked to predict the spread of the fire based on the strength of the wind.
- The real relationship is that greater wind = smaller spread, which is counter-intuitive.
- When participants were given this data and then completed the prediction task, they produced imperfect results.
- These results were passed on to subsequent participants, & on & on, creating noisier data that the participants attenuated.
- Until finally, the predicted data began to perfectly represent the intuitive model: greater wind = greater fire spread.
A similar finding was obtained for the counterintuitive light-switch task.
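A toy sketch of the wind-speed transmission chain, assuming each participant's responses are a weighted blend of their intuitive expectation and the data they were shown. The slope values, the `prior_weight` and the function name `transmission_chain` are all illustrative assumptions; noise is not simulated directly but is represented by participants giving the data less than full weight.

```python
def transmission_chain(true_slope, prior_slope, prior_weight, generations):
    """Each learner blends the data they saw with their intuition, then passes it on."""
    slopes = [true_slope]  # generation 0 sees the real (counter-intuitive) relationship
    for _ in range(generations):
        seen = slopes[-1]
        produced = prior_weight * prior_slope + (1 - prior_weight) * seen
        slopes.append(produced)
    return slopes

# -1.0: more wind, less spread (real data). +1.0: more wind, more spread (intuition).
slopes = transmission_chain(true_slope=-1.0, prior_slope=1.0,
                            prior_weight=0.3, generations=10)
for generation, slope in enumerate(slopes):
    print(generation, round(slope, 3))
```

Under these assumptions the relationship drifts from negative to positive within a few generations and ends up close to the intuitive slope.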
What is the difference between Holmesian deduction & judicial exoneration?
Holmesian deduction: once you eliminate the impossible, whatever remains - however improbable - must be the truth.
- Only hypotheses which explain the data are plausible candidates for an explanation.
Judicial exoneration: if one suspect confesses, then we let the other go.
- If one hypothesis clearly explains the data, then other candidates are considered less likely.
What is inference to the best explanation?
- Explanation is hypothesis evaluation.
- Have a prior belief in a hypothesis H, which predicts that we observe data D.
- Observe D then H is supported.
- Even if there are other alternative hypotheses that predict D, you believe your H more.
What is Bayes’ Rule?
- Prior Beliefs X Likelihood of Observed Event.
- How much should I update my belief in a hypothesis after observing some data?
- “Your updated belief should be proportional to your prior belief x the likelihood of the observed data, i.e. how much did your hypothesis actually predict the event?”
- Possible explanations should be combined with the data (our observations & the things which require explaining) to update our belief in each hypothesis.
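Written out, the rule above takes the following standard form, where H is a hypothesis, D is the observed data, and the sum in the denominator runs over all candidate hypotheses H' (the symbols are the usual textbook notation, not taken from the lecture):

```latex
P(H \mid D) = \frac{P(D \mid H)\, P(H)}{\sum_{H'} P(D \mid H')\, P(H')}
\qquad\Longrightarrow\qquad
P(H \mid D) \propto P(H) \times P(D \mid H)
```

P(H) is the prior belief, P(D | H) is the likelihood of the observed event under H, and P(H | D) is the updated belief.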
Coin Flip Example.
- Fair coin - P(HEADS) = .50
- Two-headed Coin - P(HEADS) = 1.0
- Two-tailed Coin - P(HEADS) = 0.0
Flips
- After 2 flips - 2 x tails - Two-head coin hypothesis gone.
- After 9 flips - 9 x tails - hmmm, could be the two-tailed coin.
- 10th flip is a head = two-tailed coin hypothesis gone, so it must be the fair coin.
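A short sketch of the coin example as a Bayesian update, assuming the three hypotheses start with equal prior belief (1/3 each; that prior is an assumption, not stated in the notes). `update` and `posterior` are hypothetical helper names.

```python
from fractions import Fraction

# P(heads) under each hypothesis
P_HEADS = {"fair": Fraction(1, 2), "two-headed": Fraction(1), "two-tailed": Fraction(0)}

def update(beliefs, flip):
    """One step of Bayes' rule: prior x likelihood, then renormalise."""
    unnormalised = {}
    for hypothesis, prior in beliefs.items():
        p_heads = P_HEADS[hypothesis]
        likelihood = p_heads if flip == "H" else 1 - p_heads
        unnormalised[hypothesis] = prior * likelihood
    total = sum(unnormalised.values())
    return {h: value / total for h, value in unnormalised.items()}

def posterior(flips):
    """Belief in each hypothesis after a sequence of flips, e.g. 'TTH'."""
    beliefs = {h: Fraction(1, 3) for h in P_HEADS}  # assumed equal priors
    for flip in flips:
        beliefs = update(beliefs, flip)
    return beliefs

after_2 = posterior("TT")
after_9 = posterior("T" * 9)
tenth_is_tail = posterior("T" * 10)
tenth_is_head = posterior("T" * 9 + "H")
print(after_2, after_9, tenth_is_tail, tenth_is_head, sep="\n")
```

A single tail rules out the two-headed coin, and each further tail shifts belief towards the two-tailed coin. A tenth tail would push belief in the two-tailed coin even higher, whereas a single head rules it out and leaves the fair coin as the only remaining hypothesis.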
What is Occam’s Razor?
- People prefer explanations that explain more data with a minimal number of assumptions
- “The simplest explanation (that fits the observed data) is probably the correct one”.