Bayesian Lecture Flashcards

1
Q

what does Bayesian inference aim to achieve?

A

you’ve measured something, and you want to say something about the thing that caused it

or with the brain - light hits our retina and the brain makes sense of whatever caused that light pattern

2
Q

The likelihood approach is common in classical statistics.

describe this in more detail.

A
  • basically we use the maximum likelihood estimate (an example of an unbiased estimator)
  • this implicitly assumes a relationship between the probability of the cause (the coin’s bias) and the probability of the data you have observed (one heads flip)
  • it looks at the likelihood function and picks whichever value is most likely
3
Q

maximum likelihood function

A

so we have a distribution over possible heights of the clock tower given the data: the likelihood function.

the maximum likelihood estimate simply takes the peak of this function (the most likely value) and uses it as the height of the clock tower.
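This can be sketched in Python; the measurements, noise level, and grid below are illustrative assumptions, not numbers from the lecture:

```python
import numpy as np

# Hypothetical noisy height measurements of the clock tower (metres)
measurements = np.array([48.0, 52.0, 50.0, 49.0])

# Log-likelihood of each candidate height, assuming Gaussian measurement
# noise with a known sigma (an assumption for this sketch)
heights = np.linspace(40, 60, 2001)
sigma = 2.0
log_lik = np.array([
    -0.5 * np.sum((measurements - h) ** 2) / sigma**2 for h in heights
])

# The maximum likelihood estimate is the peak of the likelihood function
mle = heights[np.argmax(log_lik)]
print(mle)  # for Gaussian noise this matches the sample mean (49.75)
```

The grid search just makes the idea explicit; for Gaussian noise the peak can be found analytically and equals the sample mean.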

4
Q

describe the problems of the frequentist approach

A

problems with the maximum likelihood method / minimum sum of squares:

  • overfitting - a single measurement (the side the coin landed on) doesn’t tell us much about the variability of the cause (the coin)
  • yet the ML viewpoint would say: ah, this is a trick coin, it only gives heads!
  • 2nd problem - it gives a single point estimate. it just tells you the most likely value (e.g., heads)
  • no information about the range of parameters that would also fit the data
  • 3rd problem - limited in how you can test your data: you can only use a t-test, F-test or chi-squared test. maybe you decide things don’t always have to be simple differences and want to test a more complex model. well, tough! with classical statistics you can’t.
  • 4th problem - the p value DEPENDS on n and on how the data was collected
5
Q

the problem of overfitting with classical statistical approaches

A

Where you try to fit too much to your data, fitting structure that isn’t actually there (high variance).

Because the frequentist approach puts no restriction on model complexity, you can get very sneaky fits that pass through every single data point. The problem is that this isn’t realistic: noise moves points away from where they’re “meant” to be, so modelling every single dot exactly is not a good representation of the underlying process.

If you were to collect the data again (imagine redrawing the assembly of red and blue dots), the noise would differ and the overfitted curve would fit poorly; the smoother black line would do better. A simpler model with lower variance (and higher bias) is preferable.

6
Q

what do we mean by the bias-variance tradeoff?

A
  • two types of prediction error (bias and variance)
  • there’s a tradeoff between a model’s ability to minimise bias and its ability to minimise variance

bias: the difference between the average prediction of our model and the correct value we are trying to predict
variance: the variability of our model’s prediction for a given data point, telling us the spread. a model with high variance fits the training data very well but does not generalise to data it hasn’t seen before

7
Q

What is the bias-variance trade off

A
  • complex model - high variance, low bias. leads to variable predictions.
  • or simple model - low variance, high bias. leads to stable predictions.

the complex, overfitting end is what we frequently see with classical/frequentist statistics (bad)

8
Q

to infer the properties of x (the cause) given r (the observed data), what calculation do we use (according to Bayes)?

A

Bayes’ rule: multiply the likelihood by the prior and normalise by the evidence:

p(x|r) = p(r|x) p(x) / p(r)
9
Q

Bayesian inference is basically an attempt to solve an inverse problem.

true or false?

A

true

10
Q

in what fields can you use Bayes?

A

machine learning, statistics, mathematics etc

11
Q

why do neuroscientists care about Bayes?

A

people believe the brain enables perception using a model similar to Bayes’ formula

12
Q

what is the likelihood

A

the probability of the measured data (r) given the cause (x), written as:

p(r|x)

we measured something ( r ) and want to know something about the height of the tower ( x )

13
Q

what is the prior

A

your prior expectation about the variable of interest, in this case the typical height of a clock tower:

p(x)

14
Q

what is the posterior probability

A

the probability of the cause (height of the clock tower) given the observed data (visual angle)

what you get when you multiply the likelihood by the prior (and normalise)

15
Q

describe how likelihood and prior can help you infer the properties of something

A
  • inferring the speed of a car
  • we can see the car is going between 30-50 km/h (likelihood)
  • we know people on this road typically drive at 30 km/h (prior)
  • we multiply those two together to give us an optimal estimate
16
Q

what’s the difference between the Bayesian and maximum likelihood approaches?

A

the likelihood approach would be satisfied with what we’re seeing alone (the speed of the car), but the Bayesian approach takes it a step further and adds prior expectations to the model

17
Q

how might the prior differ

A

in its distribution:

  • normal distribution (Gaussian)
  • power-law distribution (e.g., the Pareto distribution)
  • exponential-tailed Erlang distribution
  • beta distribution - for binary data
18
Q

what affects the weight given to the likelihood

A

How much variability there is - when perceiving the speed of a car there might be a lot of variability (night time, don’t have your glasses on) or very little (sharp vision, you can clearly see the car is parked).

in the low-variability case the prior doesn’t matter much and you get an optimal estimate that is very similar to the likelihood

19
Q

what things might affect the optimal estimate?

A
  • how much weight is given to the likelihood (high or low variability)
  • how much weight is given to the prior (high or low variability)

depending on how much weight each gets, one estimate will dominate the other (unless both are weighted equally)

20
Q

look at the likelihood estimate here: very little variability when the likelihood is heavily weighted.

what if the likelihood and the prior are equally reliable?

A

more variability now in the blue line - it becomes fatter

now when calculating the estimate, the uncertainty in your visual estimate (likelihood) is about the same as the uncertainty in the prior expectation - both equally wide. when you multiply them by each other, the posterior lands in between.

21
Q

what kind of model do we typical use in Bayesian statistics?

A

The Gaussian model (a linear system) - it makes things easier.

multiplying a Gaussian by a Gaussian always results in a Gaussian
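A minimal sketch of why this is convenient, with arbitrary illustrative numbers: the product of two Gaussians has a closed form whose mean is a precision-weighted average.

```python
# Combine a Gaussian prior and a Gaussian likelihood in closed form.
# Precision (1/variance) is the weight each source gets.

def combine_gaussians(mu1, var1, mu2, var2):
    """Mean and variance of the (normalised) product of two Gaussians."""
    precision = 1.0 / var1 + 1.0 / var2
    mean = (mu1 / var1 + mu2 / var2) / precision
    return mean, 1.0 / precision

# Illustrative numbers: prior "cars here do ~30" vs likelihood "looks like ~45"
mean, var = combine_gaussians(30.0, 25.0, 45.0, 25.0)
print(mean, var)  # equal reliabilities -> posterior mean halfway: 37.5
```

Note the posterior variance is smaller than either input variance: combining two Gaussian sources always sharpens the estimate.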

22
Q

Bayesian approach vs maximum likelihood approach: when is Bayesian better?

A

when you have limited data. after flipping one heads, the Bayesian approach gives a better judge of the optimal estimate (0.66) while maximum likelihood says heads (1).

after 100 flips both basically say the same thing

Bayesian inference is most useful for making predictions with limited data
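A minimal sketch of the coin example, assuming a uniform Beta(1,1) prior (one choice that reproduces the 0.66 above): after one heads, the Bayesian posterior mean is 2/3 while the maximum likelihood estimate is 1.

```python
# Beta-Bernoulli model: with a Beta(a, b) prior, after h heads and t tails
# the posterior is Beta(a + h, b + t), with mean (a + h) / (a + b + h + t).

def posterior_mean(heads, tails, a=1.0, b=1.0):
    return (a + heads) / (a + b + heads + tails)

def mle(heads, tails):
    return heads / (heads + tails)

print(mle(1, 0), posterior_mean(1, 0))      # 1.0 vs 0.666... with one flip
print(mle(67, 33), posterior_mean(67, 33))  # with lots of data the two converge
```

With 100 flips the prior's contribution is swamped by the data, which is the "after 100 flips both say the same thing" point.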

23
Q

how do priors affect your interpretation of things (e.g., vision)

A

Prior knowledge provides a way to disambiguate stimuli (Helmholtz 1867)

so when we see the balls below, it’s not clear whether they stick out or cave in, BUT your prior knowledge that light comes from above changes your interpretation of the image

now you interpret the balls as sticking out

24
Q

prior information can stabilise ….

A

the estimate you form

25
Q

what do we mean by an optimal organism (the Bayesian brain hypothesis)?

A

the idea here is that the brain performs optimal inference and chooses the action that maximises the utility function

26
Q

Describe what Weiss, Simoncelli and Adelson 2002 did in their study

A

So they tested Bayesian inference: they asked people to judge the speed of a moving object whose uncertainty varied (could see it very well vs not so well)

the line could be seen moving in two ways - two likelihoods - and these were paired with the prior (things normally move slowly). this gave the posterior

they then compared this posterior to subjects’ performance to see if they matched. AND IT FIT WELL!!! the Bayesian model explained subject performance.

key things we see in this study:

  • high contrast - the likelihood has more of an effect
  • low contrast - the prior has more of an effect (things are perceived as slower than they are)
27
Q

describe to a child what the Weiss, Simoncelli and Adelson 2002 study found

A

they found that when it’s easier to see something, our vision has more of an effect on what we perceive, but when things are more foggy our prior expectation has more of an effect (underestimating the speed of objects)

28
Q

the prior can have an effect on your percept: true or false?

A

true

29
Q

Ernst and Banks 2002 study

A
  • vision and touch
  • glasses induced a 3D percept
  • dots, with some sticking out to form a bar
  • the bar was shown visually to the subject
  • force-feedback devices were used so that when participants felt under the mirror there was a bar they could physically touch
  • both the width of the physical bar and the width of the visual bar were manipulated
  • if the two are matched in reliability, both are weighted equally and the optimal estimate is somewhere in between
  • if vision is more reliable, the optimal estimate shifts towards it - you use your visual estimate more to influence the percept
  • found that the more noise was induced in the visual estimate, the less weight participants attached to their visual information
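The reliability weighting above can be sketched as a precision-weighted average of the visual and haptic estimates (the numbers are illustrative, not data from the study):

```python
# Optimal cue combination: weight each cue by its reliability (1/variance).

def combine_cues(mu_v, var_v, mu_h, var_h):
    w_v = (1.0 / var_v) / (1.0 / var_v + 1.0 / var_h)
    w_h = 1.0 - w_v
    mean = w_v * mu_v + w_h * mu_h
    var = 1.0 / (1.0 / var_v + 1.0 / var_h)
    return mean, var

# Vision says the bar is 60 mm wide, touch says 50 mm (hypothetical values).
print(combine_cues(60, 1.0, 50, 1.0))  # equal reliability -> estimate 55.0
print(combine_cues(60, 9.0, 50, 1.0))  # noisy vision -> estimate shifts toward touch
```

Adding visual noise (larger `var_v`) automatically shrinks the visual weight, which is the pattern the study reports.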
30
Q

what is the general lesson from both the Weiss and Ernst studies?

A

whichever estimate is more reliable (prior vs likelihood, as in the Weiss study; likelihood vs likelihood, as in the Ernst and Banks study), we rely on it more

31
Q

Mamassian & Landy 2002 study

A

investigating the prior belief we have that light comes from above

  • 3 stimuli were shown
  • the prior that light comes from above would influence your percept of the stimuli: seeing narrow things sticking out vs wide things sticking out
  • the third stimulus is somewhere in between - where it’s hard to see what’s sticking out where

results

  • the prior expectation that light comes from above did influence participants’ percept of the stimuli
32
Q

Describe some studies that have shown how human perception relies on this kind of integration of different cues (based on a Bayesian framework)

A
  • the ventriloquist act, where audition and vision are paired and you get a percept of the puppet speaking even though the puppet itself didn’t. Bayesian cue combination was found to be going on there
  • surface slants - you have to guess whether a surface is straight-on or tilted; you combine the likelihood with a prior for that
  • depth cues - e.g., inferring the depth of a cylinder from things like motion and visual circles
33
Q

Griffiths & Tenenbaum (2006)

Describe this study

A

participants were asked questions about different quantities: the length of poems, the length of a pharaoh’s reign, the length of a marriage, movie runtimes

  • aimed to see whether participants had prior expectations
  • compared these to the actual distributions (e.g., of movie runtimes)
  • participants were actually pretty close at guessing, as though they have a good prior over movie runtimes that they combine with the information about the movie in question
34
Q

Wu, aker et al., (2018)

emotion study

A
  • participants were shown a person’s reaction following an outcome and had to infer the person’s desires and beliefs from this
35
Q

why do we care about action planning?

A

Well, the Bayesian brain hypothesis has a motor element: remember, it chooses the action that maximises its utility function.

36
Q

action planning (decision)

explain what’s going on here

A

choose the action that maximises your gain/utility

  • we have some expectation of what the world is like (x)
  • pairing this with a certain motor output (m)
  • this would give rise to a certain outcome (o)
  • estimate the expected gain from the potential outcomes in the world, given your motor control
  • we want the one motor action that maximises utility

e.g., in a game: what’s the best response I can make in order to get the highest score?
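A minimal sketch of the decision step with a toy world: for each candidate motor command, average the gain over the outcomes it could produce (given motor noise), then pick the command with the highest expected utility. All gains and probabilities below are made up.

```python
# Expected-utility action selection: choose the motor command m that
# maximises sum over outcomes o of p(o | m) * gain(o).

gains = {"hit": 10.0, "near_miss": 0.0, "penalty": -20.0}

# Hypothetical outcome probabilities for two candidate aim points
p_outcome = {
    "aim_at_centre":  {"hit": 0.7, "near_miss": 0.1, "penalty": 0.2},
    "aim_off_centre": {"hit": 0.5, "near_miss": 0.45, "penalty": 0.05},
}

def expected_utility(m):
    return sum(p * gains[o] for o, p in p_outcome[m].items())

best = max(p_outcome, key=expected_utility)
print(best, expected_utility(best))  # aiming off-centre wins: fewer costly penalties
```

In this toy setup aiming slightly away from the penalty region has the higher expected utility, mirroring the behaviour reported in the Trommershauser study below.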

37
Q

Trommershauser et al., 2013

A
  • investigated whether subjects really were trying to maximise gain
  • movements follow a normal distribution - participants weren’t going to be perfect every time; they might land a little to the left or right

design

  • there is a goal (white circle) and a penalty area (light)
  • the penalty amount was varied

results

  • the penalty amount changed responses
  • participants would aim slightly further from the goal when the penalty was high, so as not to risk the penalty
  • the model fit did a really good job of explaining performance - based on the idea that they behaved in a way that maximises their gain
38
Q

according to the Bayesian brain hypothesis, how do humans perceive?

A
  • have a generative model of how the world looks
  • do optimal inference
  • create an action plan that takes into account their own motor uncertainty and maximises the utility
39
Q

the Bayesian brain hypothesis believes humans perceive by:

  • having a generative model of how the world looks
  • doing optimal inference
  • creating an action plan that takes into account their own motor uncertainty and maximises utility

have any studies combined all of these and tested them on human subjects?

A

Kording and Wolpert

  • required participants to have a generative model, do inference and form an action plan
  • also involved some learning

design

  • the participant’s hand was placed behind a screen
  • they could NOT see their own hand
  • they start at some point and the goal is to reach towards a target location
  • halfway through, participants received some kind of feedback (e.g., the location of the hand reflected by a dot, so they could adjust accordingly)
  • sometimes they were given a cloud of dots - making it harder to know where the hand actually was
  • they could be given a very large number of dots - making it very hard
  • or they might get no feedback at all

this design lets you vary the priors and likelihoods, and thus the posterior probability

  • a lateral shift could be added - cheating! - showing the hand a little to the left or right without revealing how much it was shifted
  • participants might learn a prior expectation about what the shift is likely to be
  • likelihood - a single dot (very precise likelihood) or a cloud of dots (variable likelihood)
  • they then combine the prior with the likelihood to get the posterior

results

  • judgments deviate from the true lateral shift depending on how clear the likelihood is
  • participants base their judgments more or less heavily on the feedback accordingly
  • if they receive a precise likelihood they rely on it heavily and you get little deviation from the target
  • subjects used the information in proportion to how reliable it was (conditions ranged from no feedback to very reliable feedback)

of course participants also have to take into account their physical uncertainty, which is where the action-control element comes in

40
Q

Name the elements of the optimal Bayesian observer

A

  • a generative model of the world
  • optimal inference
  • an action plan that accounts for motor uncertainty and maximises utility
41
Q

is it possible to test the optimal Bayesian observer?

A

yes! we have gone through studies that did it (Kording and Wolpert)

we have evidence for all three elements in humans, for some modalities

however, when it comes to higher-level thinking - the Kahneman and Tversky type of material - there is also evidence that people choose not to follow the optimal Bayesian observer

42
Q

what did Ma et al. (2006) propose?

A

that the cue combination we see at the behavioural level might also be present at the neural level

neurons respond to cue 1 and cue 2, and their combinations give rise to different firing rates in the brain; these can be combined using a log transformation

it is very possible that neurons in the brain are doing this kind of combination of information in a weighted way, based on reliability
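The log trick can be sketched as: multiplying likelihoods is the same as adding log-likelihoods, which is something firing rates could implement additively. The likelihood functions below are toy values, not neural data.

```python
import numpy as np

# Two cues' likelihood functions over a stimulus variable (toy Gaussians)
x = np.linspace(-5, 5, 201)
lik1 = np.exp(-0.5 * (x - 1.0) ** 2 / 1.0)   # cue 1 prefers x near 1, reliable
lik2 = np.exp(-0.5 * (x + 0.5) ** 2 / 4.0)   # cue 2 prefers x near -0.5, less reliable

# Combining cues by multiplying likelihoods == adding log-likelihoods
combined = lik1 * lik2
log_combined = np.log(lik1) + np.log(lik2)
assert np.allclose(np.log(combined), log_combined)

print(x[np.argmax(combined)])  # peak is pulled toward the reliable cue
```

The combined peak lands nearer cue 1, because the sharper (more reliable) likelihood contributes a steeper log-likelihood and therefore more weight.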

43
Q

so cue combination appears true at the behavioural level… any evidence for it at the neural level?

A

neurophysiological recordings support the idea that some neurons actually do this kind of cue combination (Fetsch, DeAngelis and Angelaki, 2012)

  • implies the brain could be doing this kind of encoding
  • e.g., visual and movement cues being combined together
44
Q

what is Bayesian inference basically about?

A

optimal use of the information:

you should use all the information available, weighting each source by its reliability,

to make an inference about an unknown property

45
Q

summary

A
  • human studies - show this specifically in terms of perception, perception + action, and to some degree decision making