Bayesian lecture flashcards
what does bayesian inference aim to achieve
you've measured something, and you want to say something about the thing that caused it
or, with the brain - light hits our retina and the brain makes sense of whatever caused that light pattern
The likelihood approach is common in classical statistics.
describe this in more detail.
- basically we use the maximum likelihood estimate (often treated as an unbiased estimator)
- this implicitly assumes a relationship between the probability of the outside cause (the coin) and the probability of the data you have observed (1 heads flip)
- basically it looks at the likelihood distribution and picks whatever value is most likely
maximum likelihood function
so we have the distribution of possible heights for the clock tower given our measurement: the likelihood function.
the maximum likelihood estimate just takes the peak of this function (the most likely value) and uses it as the height of the clock tower.
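As a minimal sketch of the clock-tower card (all numbers invented, Gaussian measurement noise assumed), the likelihood can be evaluated over a grid of candidate heights and the ML estimate read off as its peak:

```python
import numpy as np

# Hypothetical clock-tower example: which candidate height x makes the
# noisy observation most probable? True height and noise level are made up.
heights = np.linspace(10.0, 100.0, 901)   # candidate tower heights (m)
observed = 52.0                           # one noisy height measurement (m)
sigma = 5.0                               # assumed measurement noise (m)

# Gaussian likelihood p(r | x) for each candidate height
likelihood = np.exp(-(observed - heights) ** 2 / (2 * sigma ** 2))

# Maximum likelihood estimate: the height that maximises p(r | x)
ml_estimate = heights[np.argmax(likelihood)]
print(round(ml_estimate, 1))  # 52.0 - the MLE just returns the peak
```

Note that the MLE lands exactly on the observation: a single measurement gives a point estimate with no sense of how uncertain it is, which is the problem the next cards pick up.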
describe problems of the frequentist approach
problems with the maximum likelihood method / minimum-sum-of-squares fitting
- overfitting - a single measurement (the side flipped) doesn't tell us much about the variability of the cause (the coin)
- yet the ML viewpoint would say: ah, this is a trick coin, it only gives heads!
- 2nd problem - it gives a single point estimate. it just tells you the most likely value (e.g., heads)
- no information about what other parameter values would also fit the data
- 3rd problem - limited in how you can test your data - you can only use a t-test, F-test or chi-squared test. maybe you decide things don't have to be simple differences between groups and want to test a more complex model. well, tough! with classical statistics you can't.
- 4th problem - the p value DEPENDS on the sample size n and on how the data were collected
the problem of overfitting with classical statistical approaches
where you try to fit too much to your data, fitting structure that isn't actually there (high variance)
because the frequentist approach puts no restriction on model complexity, you can get very sneaky fits that catch every single data point. the problem is that this isn't realistic: there is noise, so points sit away from where they're "meant" to be, and modelling every single dot exactly is a poor representation of the underlying process.
if you imagine re-sampling the red and blue dots, the new data would not contain the same noise, so the wiggly fit would generalise poorly. it would be better to use a simpler model with lower variance (and higher bias), like the black line.
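A quick simulated sketch of this (data and noise level entirely made up): fit the same noisy points with a straight line (high bias) and a degree-9 polynomial (high variance), then score both on a fresh draw from the same underlying relation:

```python
import numpy as np

# Simulated overfitting demo: the true relation is linear, y = 2x + noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y_train = 2 * x + rng.normal(0, 0.3, x.size)   # observed noisy points
y_test = 2 * x + rng.normal(0, 0.3, x.size)    # a re-drawn set of points

simple = np.polyfit(x, y_train, 1)    # low-variance, high-bias model
complex_ = np.polyfit(x, y_train, 9)  # passes (nearly) through every point

def mse(coeffs, y):
    """Mean squared error of a polynomial fit against targets y."""
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

print(mse(simple, y_train), mse(complex_, y_train))  # complex wins on training data
print(mse(simple, y_test), mse(complex_, y_test))    # but its test error balloons
```

The degree-9 fit chases the training noise, so its near-zero training error does not carry over to the re-drawn points.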
what do we mean by the bias-variance tradeoff?
- two types of prediction errors (bias and variance)
- there's a tradeoff between a model's ability to minimize bias and its ability to minimize variance
bias: the difference between the average prediction of our model and the correct value we are trying to predict
variance: the variability of our model's predictions for a given data point, telling us the spread. a model with high variance fits the training data very closely but does not generalise well to data it hasn't seen before
What is the bias-variance trade off
- a complex model - high variance and low bias - leads to variable predictions.
- or a simple model - low variance but high bias - leads to stable predictions.
we see the complex, overfitting model frequently with classical/frequentist statistics (bad)
to infer the properties of x (cause) given r (observed data)
what calculation do we use (according to bayes)
Bayes' rule: p(x|r) = p(r|x) p(x) / p(r)
Bayesian inference is basically an attempt to solve an inverse problem
true or false
true
in what fields can you use Bayes?
machine learning, statistics, mathematics etc.
why do neuroscientists care about Bayes?
people believe the brain accomplishes perception using something like Bayes' formula
what is the likelihood
the probability of the measured data - r in this example - given the cause, written as:
p(r|x)
we measured something ( r ) and want to know something about the height of the tower ( x )
what is the prior
your prior expectation about the variable of interest - in this case, the typical height of a clocktower
p(x)
what is the posterior probability
the probability of the cause (height of clocktower) given the observed data (visual angle)
what you get when you multiply the likelihood by the prior (and normalise)
describe how likelihood and prior can help you infer the properties of something
- inferring speed of a car
- we can see the car is going between 30-50 km/h (likelihood)
- we know cars on this road typically drive at 30 km/h (prior)
- we multiply those two together to give us an optimal estimate
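The car-speed card above can be sketched numerically (all numbers invented): put a Gaussian likelihood and a Gaussian prior on a grid of candidate speeds, multiply, normalise, and read off the peak:

```python
import numpy as np

# Hypothetical car-speed inference on a grid of candidate speeds (km/h).
speeds = np.linspace(0, 80, 801)
like = np.exp(-(speeds - 40) ** 2 / (2 * 10 ** 2))   # vision: ~30-50 km/h
prior = np.exp(-(speeds - 30) ** 2 / (2 * 5 ** 2))   # this road: usually ~30 km/h

posterior = like * prior          # Bayes: multiply likelihood by prior...
posterior /= posterior.sum()      # ...and normalise over the grid

estimate = speeds[np.argmax(posterior)]  # MAP estimate
print(round(estimate, 1))  # 32.0 - between the two, pulled toward the sharper prior
```

Because the prior here is narrower (more reliable) than the likelihood, the estimate lands much closer to 30 than to 40.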
what's the difference between the bayesian and the maximum likelihood approach?
the likelihood approach would be satisfied with what we're seeing alone (the speed of the car), but the bayesian approach takes it a step further and adds prior expectations to the model
how might the prior differ
in its distribution,
- normal distribution (gaussian)
- power law distribution (e.g., the Pareto distribution)
- Exponential-tailed Erlang distribution
- beta distribution - binary data
what affects the weight given to the likelihood
How much variability there is - when perceiving the speed of a car there might be a lot of variability (night time, we don't have our glasses on) or very little (sharp vision, we can clearly see the car is parked).
in the low-variability case the prior doesn't matter much and you get an optimal estimate that is very similar to the likelihood
what things might affect the optimal estimate
- how much weight is given to the likelihood (high or low variability)
- how much weight is given to the prior (high or low variability)
depending on how much weight each gets, one estimate can dominate the other (unless both are given equal weighting)
look at the likelihood estimate here: very little variability, so the heavily weighted likelihood dominates
what if you have equal reliability of both the likelihood and prior?
there is more variability now in the blue line - it becomes fatter
now when calculating the estimate, the uncertainty in your visual estimate (likelihood) is about the same as the uncertainty in the prior expectation - both are equally wide - so when you multiply them together the posterior sits in between.
what kind of model do we typically use in Bayesian statistics?
The Gaussian model (a linear system) - it makes things easier:
multiplying a Gaussian by a Gaussian always results in another Gaussian
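That closed-form property can be written as a tiny helper (example numbers invented): the product of two Gaussians is a Gaussian whose mean is a precision-weighted average and whose variance always shrinks:

```python
# Product of two Gaussians, a minimal sketch of the standard closed form.
def combine(mu1, var1, mu2, var2):
    """Mean and variance of N(mu1, var1) x N(mu2, var2), renormalised."""
    p1, p2 = 1.0 / var1, 1.0 / var2      # precisions = reliabilities
    var = 1.0 / (p1 + p2)                # combined variance always shrinks
    mu = var * (p1 * mu1 + p2 * mu2)     # pulled toward the more reliable cue
    return mu, var

# e.g. likelihood N(40, 100) combined with prior N(30, 25)
print(combine(40.0, 100.0, 30.0, 25.0))  # (32.0, 20.0)
```

The same two lines of algebra cover every "likelihood times prior" and "cue 1 times cue 2" card in this deck, which is exactly why Gaussians make things easier.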
bayesian approach vs max likelihood approach - when is bayesian better?
when you have limited data. after flipping 1 heads, the bayesian approach gives a better optimal estimate (2/3 ≈ 0.67 with a uniform prior) while maximum likelihood says heads every time (1).
after 100 flips, both basically say the same thing
bayesian inference is most useful for making predictions with limited data
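The one-flip coin example works out like this (assuming, as is standard, a uniform Beta(1,1) prior over the heads probability):

```python
# Coin example: after a single heads flip, maximum likelihood says
# p(heads) = 1, while the Bayesian posterior Beta(2, 1) has mean 2/3.
heads, tails = 1, 0

ml_estimate = heads / (heads + tails)        # 1.0 - "trick coin!"

a, b = 1 + heads, 1 + tails                  # Beta(1,1) prior updated by data
bayes_estimate = a / (a + b)                 # posterior mean = 2/3

print(ml_estimate, round(bayes_estimate, 2))  # 1.0 0.67
```

With 100 flips the prior's two pseudo-counts barely matter, so the two estimates converge, matching the card above.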
how do priors affect your interpretation of things (e.g., vision)
Prior knowledge provides ways to disambiguate stimuli (Helmholtz 1867)
so when we see the balls below, it's not clear whether they stick out or cave in, BUT your prior knowledge that light comes from above changes your interpretation of the image
now you interpret the balls as sticking out
prior information can stabilise ….
the estimate you form
what do we mean by optimal organism (the bayesian brain hypothesis)
the idea is that the brain performs optimal inference and chooses the action that maximises the utility function
Describe what Weiss, Simoncelli and Adelson 2002 did in their study
They tested bayesian inference: they asked people to judge the speed of a moving object that varied in uncertainty (could see it very well vs not so well)
the line could move in two ways - 2 likelihoods - and they paired these with the prior (things normally move slowly). this gave them the posterior
then they compared this posterior to subject performance to see if they matched. AND IT FIT WELL!!! the Bayesian model explained subject performance.
key thing we see in this study
- high contrast - likelihood has more of an effect
- low contrast - prior has more of an effect (we perceive things as slower than they are)
describe to a child what the Weiss, Simoncelli and Adelson 2002 study found
they found that when it is easier to see something, our vision has more of an effect on what we perceive, but when things are more foggy our prior expectation has more of an effect (we underestimate the speed of objects)
prior can have an effect on your percept true or false
true
Ernst and Banks 2002 study
- visual and touch
- glasses induced a 3D percept
- dots with some sticking out to form a bar
- showed the bar visually to the subject
- then used force-feedback devices so that when participants felt under the mirror there was a bar they could physically touch
- manipulated both the width of the physical bar and the width of the visual bar
- if the two are matched in reliability - both equally weighted - the optimal estimate is somewhere in between
- if vision is more reliable, the optimal estimate is shifted towards it: the visual estimate influences the percept more.
- found that the more noise was induced in the visual estimate, the less weight participants attached to their visual information
what is the general thing we're learning from both the Weiss and the Ernst studies?
whatever estimate is more reliable - prior vs likelihood (Weiss) or likelihood vs likelihood (Ernst and Banks) - we weight it more
Mamassian & Landy 2002 study
investigating the prior belief we have that light comes from above
- 3 stimuli shown
- the prior that light comes from above influences whether you perceive narrow things sticking out vs wide things sticking out
- the third stimulus is somewhere in between - it's hard to see what's sticking out where
results
- the prior expectation of light coming from above did influence participants' percept of the stimuli
Describe some studies that have shown how human perception relies on this kind of integration of different cues (based on a bayesian framework)
- the ventriloquist act: pair audition and vision and you get a percept of the puppet speaking even though the puppet itself didn't. bayesian cue combination was found to be going on there
- surface slants - you have to guess whether a surface is straight-on or tilted; you combine the likelihood with a prior for that
- depth cues - e.g., inferring the depth of a cylinder based on things like motion and visual circles
Griffiths & Tenenbaum (2006)
Describe this study
asked participants questions about different things: length of poems, length of pharaohs' reigns, length of marriages, movie runtimes
- aimed to see if participants had prior expectations
- compared their answers to the actual distributions (e.g., of movie runtimes)
- participants were actually pretty close at guessing, as though they have a good prior over movie runtimes that they combine with the information about the movie being watched
Wu, Baker et al. (2018)
emotion study
- participants were shown someone's reaction following an outcome and had to infer their desires and beliefs from it
why do we care about action planning?
well, with the bayesian brain hypothesis there was an element of motor movement. remember: it chooses the action that maximises its utility function
action planning (decision)
explain what's going on here
choose the action that maximises your expected gain/utility
- we have some expectation of what the world is like (x)
- pairing this with a certain motor output (m)
- this would give rise to a certain outcome (o)
- estimate the expected gain from the potential outcomes in the world, given your motor command
- we want the one motor action that maximises utility
e.g., in a game - what's the best response I can make in order to get the highest score
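The steps above can be sketched with made-up outcomes, probabilities and utilities: score each candidate motor command m by its expected utility over outcomes o, then pick the best:

```python
# Hypothetical game: two candidate motor commands, three possible outcomes.
# All probabilities and payoffs are invented for illustration.
utilities = {"hit": 100, "miss": 0, "foul": -500}

# outcome distributions p(o | m) for each candidate action
p_outcome = {
    "aim_centre": {"hit": 0.9, "miss": 0.05, "foul": 0.05},
    "aim_safe":   {"hit": 0.7, "miss": 0.3,  "foul": 0.0},
}

def expected_utility(action):
    """Sum over outcomes of p(o | m) * U(o)."""
    return sum(p * utilities[o] for o, p in p_outcome[action].items())

best = max(p_outcome, key=expected_utility)
print(best, expected_utility(best))  # aim_safe 70.0
```

Note the riskier aim has the higher hit rate but a worse expected utility once the rare foul is priced in, which previews the Trommershauser result below.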
Trommershauser et al., 2013
- investigated whether subjects really were trying to maximise their gain
- endpoints followed a normal distribution - participants weren't going to be perfect every time; they might be a little off to the left or right
design
- we have a goal (white circle) and a penalty area (light circle)
- they varied the penalty
results
- the penalty amount changed responses
- participants would aim slightly further from the goal when the penalty was high, so as not to risk the penalty
- the model fit did a really good job of explaining performance, based on the idea that subjects behaved so as to maximise their gain
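A simulation sketch of this setup (all parameters invented, reduced to one dimension): Gaussian motor noise around an aim point, a reward circle and an adjacent penalty circle; the expected-gain-maximising aim point shifts away from the penalty:

```python
import numpy as np

# One-dimensional toy version of the pointing task: endpoints scatter
# with Gaussian motor noise around the chosen aim point.
rng = np.random.default_rng(1)
sigma = 2.0                       # motor noise (arbitrary units)
target_x, penalty_x = 0.0, -6.0   # centres of reward and penalty regions
radius = 4.0
gain, penalty = 100, -500

def expected_gain(aim, n=20000):
    """Monte Carlo estimate of expected score when aiming at `aim`."""
    hits = rng.normal(aim, sigma, n)                     # noisy endpoints
    score = gain * (np.abs(hits - target_x) < radius)
    score = score + penalty * (np.abs(hits - penalty_x) < radius)
    return score.mean()

aims = np.linspace(-2, 6, 41)
best = aims[np.argmax([expected_gain(a) for a in aims])]
print(best)  # the best aim sits to the right of the target centre
```

With a harsh penalty to the left, the optimal aim is not the target centre but a point shifted away from the penalty, mirroring the behaviour the study reports.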
according to the bayesian brain hypothesis - how do humans perceive?
- have a generative model of how the world looks
- do optimal inference
- create an action plan that takes into account their own motor uncertainty and maximises the utility
have any studies combined all of these and tested them on human subjects?
Kording and Wolpert
- required humans to have a generative model, to do inference and to have an action plan
- there was also some learning involved
design
- participants' hands were placed behind a screen
- they could NOT see their own hand
- they start at some point and the goal is to reach towards a target location
- halfway through, participants received some kind of feedback (e.g., the location of the hand reflected by a dot, so they could adjust accordingly)
- sometimes they were given a cloud of dots - making it harder to know where the hand actually was
- they could be given a very large number of dots - making it very hard
- or might not be given any feedback at all
this design allows you to vary the priors, likelihoods and thus the posterior probability
- maybe add a lateral shift to things - cheating! - show them something a little to the left or right, but don't show them how much it's shifted
- participants might have a learned prior expectation about what the shift is likely to be
- likelihood - a single dot (very precise likelihood) or a cloud of dots (variable likelihood)
- they then combine the prior with the likelihood to get the posterior
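That prior-likelihood combination can be sketched with invented numbers: as the feedback gets blurrier, its reliability weight drops and estimates slide from the true shift toward the prior mean:

```python
# Toy version of the lateral-shift logic (all variances invented).
prior_mean, prior_var = 1.0, 0.25        # learned distribution of shifts (cm)
true_shift = 2.0                         # feedback is centred on the true shift

for feedback_var in (0.01, 0.25, 4.0):   # one clear dot ... a big blurry cloud
    # reliability weight on the feedback: its precision over total precision
    w = (1 / feedback_var) / (1 / feedback_var + 1 / prior_var)
    estimate = w * true_shift + (1 - w) * prior_mean
    print(round(w, 2), round(estimate, 2))
```

With the sharp single dot the estimate hugs the true 2 cm shift; with the blurry cloud it collapses almost all the way back to the 1 cm prior, which is the pattern in the results card below.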
results
- how far estimates deviate from the true lateral shift depends on how clear the likelihood is
- participants base their judgments more or less heavily on the feedback accordingly
- if they receive a precise likelihood they rely on it heavily and you get little deviation from the target
- subjects used the information according to how reliable it was (from no feedback through to very reliable feedback)
of course participants also have to take into account their physical uncertainty, which is why we have the element of action control
Name the elements of the optimal bayesian observer
- a generative model of the world
- optimal inference
- action planning that maximises utility
is it possible to test the optimal bayesian observer?
yes! we went through studies that did it (Kording and Wolpert)
we have evidence for all three elements in humans, for some modalities
however, when it comes to higher-level thinking, like the Kahneman and Tversky type stuff, there are also findings that don't follow the optimal bayesian observer
what did Ma et al., (2006) propose
the cue-combination behaviour we see at the behavioural level might also be present at the neural level
neurons responding to cue 1 and cue 2 give rise to different firing rates; these can be combined via a log transformation
it's very possible that neurons in the brain do this kind of combination of information in a weighted way, based on reliability
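The log-transformation idea in miniature (cue likelihoods made up): multiplying cue probabilities is equivalent to adding their logs, which a population of neurons could implement just by summing firing rates:

```python
import math

# Two invented cue likelihoods for the same hypothesis.
p_cue1, p_cue2 = 0.8, 0.6

combined = p_cue1 * p_cue2                          # probability-space combination
log_combined = math.log(p_cue1) + math.log(p_cue2)  # additive "neural" version

# the additive version recovers exactly the same combined evidence
print(math.isclose(combined, math.exp(log_combined)))  # True
```

Addition is cheap for neurons, so representing evidence in log space turns Bayesian multiplication into simple summation of activity.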
so cue combination appears true at the behavioural level… any evidence for it at the neural level?
neurophysiological recordings support the idea that some neurons actually do this kind of cue combination (Fetsch, DeAngelis and Angelaki 2012)
- implies the brain could be doing this kind of encoding
- e.g., visual and movement cues being combined together
what is bayesian inference basically about
the optimal use of information.
you should use all the information available and weight each source by its reliability
to make an inference about an unknown property
summary
- human studies show this specifically in terms of perception, perception + action, and to some degree decision making