Lecture 10 Flashcards
1
Q
What is learning?
A
- Not just change in behaviour over time, as learning is not the same as development e.g. you get taller but you didn’t learn to get taller
- Learning is change in behaviour due to reinforcement, e.g. via observational learning and classical conditioning
- Behaviour can also improve over time via evolution and survival of the fittest, but that is passive rather than active, as evolution does the work
- In decision making = maximising reinforcement/utility
- Learning is improved decision making over time
2
Q
How to improve decision making?
A
- Better probability estimation: subjective probability should match objective reality (correspondence) and be internally consistent (coherence)
- Better utility estimation: expected vs experienced utility, i.e. how much you estimate you will like something vs how much you actually liked it
3
Q
What is an example of learning?
A
- Two options: Act 1 or Act 2
- You will either win or lose, and you repeat the decision over time - remove outcome uncertainty by gaining additional info
4
Q
What is the linear model of learning?
A
- Two possible behaviours, each with a probability of being reinforced
- Choice probabilities are defined at time t; as more trials pass, the probability of choosing the reinforced act gets bigger
- At a given time there is a probability that picking act A will be reinforced; you win or lose, and lambda is the learning rate
- On every trial there is an error between anticipation and reality, because the outcome was still probabilistic (win/lose); the error tends to get smaller as you adjust the probability estimate
- Error = teacher value (what happened) minus probability estimate, so with enough time the estimated probability approaches the true probability = decision making improves over time
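The update described in this card can be sketched in Python. This is a minimal sketch, assuming the usual linear-operator form p(t+1) = p(t) + lambda * (teacher - p(t)), where the teacher value is 1 if act A was reinforced on that trial and 0 otherwise. The function names, the starting estimate of 0.5, the learning rate of 0.1 and the true reinforcement probability of 0.7 are illustrative assumptions, not values from the lecture.

```python
import random

def linear_update(p, teacher, lam):
    """One step of the linear learning rule: move p toward the teacher value."""
    error = teacher - p      # gap between what happened and what was anticipated
    return p + lam * error   # lam (lambda) is the learning rate

def simulate(true_prob=0.7, lam=0.1, trials=2000, seed=0):
    """Track the estimated probability that act A is reinforced over trials."""
    rng = random.Random(seed)
    p = 0.5                  # assumed starting estimate (no preference)
    history = []
    for _ in range(trials):
        # Teacher value: 1.0 if act A was reinforced on this trial, else 0.0
        teacher = 1.0 if rng.random() < true_prob else 0.0
        p = linear_update(p, teacher, lam)
        history.append(p)
    return history

history = simulate()
# Average of the late estimates; it should sit close to true_prob.
late_mean = sum(history[-1000:]) / 1000
print(round(late_mean, 2))
```

With a fixed learning rate the estimate keeps fluctuating around the true probability rather than settling on it exactly, which is why the sketch averages over the later trials.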
5
Q
What is the Rescorla Wagner Model with nonlinear response mapping?
A
- Replace external response probabilities with internal response weights; weights are adjusted based on reinforcement on each trial
- Predicts nonnormative probability matching: the probability of a person choosing either option converges on the learned weights, so you can tell if behaviour is probability matching and if it is learning (normative)
- A nonlinear response function predicts maximising e.g. picking one option 100% of the time instead of probability matching
- The response function takes a weight in place of the probability estimate, which changes the overall choice probability
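The contrast between probability matching and maximising can be illustrated with a short Python sketch. It assumes a delta-rule (Rescorla-Wagner style) update of one weight per option and a power-ratio response function with a `gain` parameter; the lecture may use a different nonlinear function, so the response rule, the names and all numeric values here are assumptions for illustration only.

```python
import random

def learn_weights(p_a=0.7, lam=0.01, trials=5000, seed=1):
    """Delta-rule learning of one weight per option (A and B)."""
    rng = random.Random(seed)
    w = {"A": 0.0, "B": 0.0}
    for _ in range(trials):
        # On each trial exactly one option is reinforced (A with probability p_a)
        outcome = "A" if rng.random() < p_a else "B"
        for option in w:
            teacher = 1.0 if option == outcome else 0.0
            w[option] += lam * (teacher - w[option])
    return w

def choice_prob_a(w, gain):
    """Assumed response function: weights raised to a power, then normalised."""
    a = w["A"] ** gain
    b = w["B"] ** gain
    return a / (a + b)

w = learn_weights()
matching = choice_prob_a(w, gain=1)     # linear mapping: choice follows the weights
maximising = choice_prob_a(w, gain=10)  # steep nonlinear mapping: near-exclusive choice
print(round(matching, 2), round(maximising, 2))
```

With `gain=1` the choice probability tracks the learned weights (probability matching); with a large `gain` the same weights produce almost exclusive choice of the more often reinforced option (maximising).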
6
Q
What is calibration?
A
- Forecasters were too cautious and overemphasised uncertainty e.g. saying rain = 0.6 when the true probability of rain = 0.8
- Based on feedback, weather predictions became more calibrated with more information = subjective probabilities matched objective probabilities
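One way to check calibration is to group forecasts by the stated probability and compare each group with how often the event actually happened. The sketch below is in Python and uses made-up data chosen to mirror the 0.6 vs 0.8 example in this card; the function name and the data are assumptions, not real forecasts.

```python
from collections import defaultdict

def calibration_table(forecasts, outcomes):
    """Map each stated probability to the observed frequency of the event."""
    groups = defaultdict(list)
    for stated, happened in zip(forecasts, outcomes):
        groups[stated].append(happened)
    return {stated: sum(obs) / len(obs) for stated, obs in sorted(groups.items())}

# Made-up data: "rain = 0.6" on ten days (it rained on eight of them),
# and "rain = 0.9" on ten days (it rained on nine of them).
forecasts = [0.6] * 10 + [0.9] * 10
outcomes = [1] * 8 + [0] * 2 + [1] * 9 + [0] * 1

table = calibration_table(forecasts, outcomes)
print(table)  # {0.6: 0.8, 0.9: 0.9}
```

In this toy data the 0.9 forecasts are calibrated (stated and observed values agree), while the 0.6 forecasts are too cautious because rain followed on 80% of those days.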
7
Q
What is a nonnormative learning example?
A
- The inverse base-rate effect
- Condition 1 = training trials with corrective feedback; condition 2 = trials without corrective feedback (these present novel combinations of symptoms)
- Have two diseases: one of them is more common than the other
- In this condition, a person can have only one of these diseases
- Patients with the common disease have two symptoms: one is perfectly diagnostic of the common disease, the other is not diagnostic on its own because people with the rare disease have it too (an imperfect predictor)
- People learn to predict with perfect accuracy whether patients have the common or the rare disease
- After training, new patients are introduced who have both predictive symptoms, and ppts pick the rare disease. Both cues are valid but conflict with each other, so ppts should follow the base rate
- Ppts learn to predict the common disease because it occurs a lot, and they associate it with the combination of cues; for patients with the rare disease, they pay more attention to the predictive cue only. The combination of test cues does not match what people have learned, but it matches their representation of the rare disease better than that of the common disease
8
Q
Why does feedback not fix everything?
A
- Inverse base-rate effect is more than base-rate neglect
- Base-rate neglect predicts 50/50 responding, but the inverse base-rate effect shows more rare than common responding
- Inaccurate representation leads to nonnormative response probabilities
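The three predictions can be compared with a small Bayes' rule sketch in Python. It assumes the conflicting test cues are equally likely under both diseases, so only the prior differs between the accounts; the 0.75 base rate and the 0.5 likelihoods are hypothetical numbers chosen for illustration, not values from the lecture.

```python
def posterior_common(base_rate_common, lik_common, lik_rare):
    """Bayes' rule: P(common disease | cues) for two mutually exclusive diseases."""
    prior_rare = 1 - base_rate_common
    numerator = base_rate_common * lik_common
    return numerator / (numerator + prior_rare * lik_rare)

# Assumption: the conflicting cues are equally likely under either disease.
lik_common = lik_rare = 0.5

normative = posterior_common(0.75, lik_common, lik_rare)  # uses the true base rate
neglect = posterior_common(0.5, lik_common, lik_rare)     # ignores it (flat prior)

print(normative, neglect)  # 0.75 0.5
```

The normative answer favours the common disease and base-rate neglect gives 50/50, whereas the inverse base-rate effect means the observed probability of choosing the common disease falls below 0.5, which neither calculation produces.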
9
Q
How to have better utility estimation?
A
- Update expected utilities based on actual utilities
- If you want to know the utility of some event, don’t predict how you will feel (such predictions are biased); look at how someone else experiencing that event now feels