Bayesian Lecture Flashcards
what does Bayesian inference aim to achieve
you've measured something, and you want to say something about the thing that caused it
or with the brain - light hits our retina and the brain makes sense of the thing that caused that light pattern
The likelihood approach is common in classical statistics.
describe this in more detail.
- basically we use the maximum likelihood estimate (an example of an unbiased estimator)
- this implicitly assumes a relationship between the probability of the outside cause (the coin) and the probability of the data you have observed (1 heads flip)
- basically it looks at the likelihood distribution and picks whichever value is most likely
maximum likelihood function
so we have the distribution of possible heights for the clock tower - the likelihood function.
the maximum likelihood estimate just takes the peak of this function (the most likely value) and uses that as the estimate of the height of the clock tower.
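a minimal sketch of this idea, assuming a Gaussian likelihood over candidate tower heights given one noisy measurement (all numbers are made up for illustration):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical setup: the measurement implies some tower height, with noise.
candidate_heights = np.linspace(50, 150, 1001)   # metres (made-up range)
observed_estimate = 92.0                          # height implied by the measured visual angle
measurement_noise = 8.0                           # standard deviation of the measurement

# Gaussian likelihood p(r | x) of the observation under each candidate height
likelihood = norm.pdf(observed_estimate, loc=candidate_heights, scale=measurement_noise)

# The maximum likelihood estimate is just the candidate height where the likelihood peaks.
mle = candidate_heights[np.argmax(likelihood)]
print(f"maximum likelihood estimate of tower height: {mle:.1f} m")
```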
describe problems of the frequentist approach
problems with the maximum likelihood method / minimum sum of squares:
- problems with overfitting - looking at 1 single measurement (the side the coin landed on) doesn't tell us much about the variability of the cause (the coin)
- however the ML viewpoint would say ah! this is a trick coin, it only gives heads!! (see the coin sketch after this list)
- 2nd bad thing - it gives a single point estimate. it just tells you the most likely value (e.g., heads), with no information about what other parameter values would also fit
- 3rd bad - you're limited in how you can test your data - you can only use a t-test, F-test or chi-squared test. maybe you decide things don't have to always be simple differences between conditions and want to test a more complex model. well, tough! with classic statistics you can't.
- 4th bad - the p-value DEPENDS on the sample size n and on how the data were collected
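a toy sketch of the coin-flip point above: after a single heads, the maximum likelihood estimate of P(heads) is 1, while a Bayesian estimate with a flat Beta(1,1) prior is pulled back toward 0.5 (the prior choice here is just an assumption for illustration):

```python
from scipy.stats import beta

# One coin flip, which came up heads.
heads, tails = 1, 0

# Maximum likelihood estimate of P(heads): just the observed proportion.
p_mle = heads / (heads + tails)      # = 1.0 -> "this is a trick coin!"

# Bayesian estimate with a flat Beta(1, 1) prior over P(heads).
# The posterior is Beta(1 + heads, 1 + tails); its mean is a much less
# extreme summary of what one flip can actually tell us.
posterior = beta(1 + heads, 1 + tails)
p_bayes = posterior.mean()           # = 2/3

print(f"MLE: {p_mle:.2f}, posterior mean: {p_bayes:.2f}")
```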
the problem of overfitting with classic statistical approaches
where you try to fit too much to your data, fitting structure that isn't actually there (high variance)
because the frequentist approach has no restriction on model complexity, this can lead to very sneaky fits that catch every single data point. the problem is that this isn't realistic: there is noise, so points land further from where they're "meant" to be, and modelling every single dot exactly might not be a good representation of what the model should be.
if you were to re-fit the model to new data it would be best to use the simpler (black) line. imagine re-collecting the set of red and blue dots - the new data would not contain the same noise, so the wiggly fit would do poorly. it would be better to use a simpler model with lower variance (and higher bias).
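a minimal sketch of overfitting with polynomial fits (the data, noise level, and degrees are all made up for illustration): the high-degree fit chases the training points but generalises worse.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: a simple linear trend plus noise.
x_train = np.linspace(0, 1, 12)
y_train = 2 * x_train + rng.normal(0, 0.3, size=x_train.size)
x_test = np.linspace(0, 1, 50)
y_test = 2 * x_test + rng.normal(0, 0.3, size=x_test.size)

for degree in (1, 9):
    # degree 9 is the "sneaky" wiggly fit that chases every training point
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```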
what do we mean by the bias-variance tradeoff?
- there are two types of prediction error (bias and variance)
- there's a tradeoff between a model's ability to minimise bias and its ability to minimise variance
bias: the difference between the average prediction of our model and the correct value we are trying to predict
variance: the variability of our model's predictions for a given data point, telling us the spread of those predictions. a model with high variance shapes itself to fit the training data very well but does not generalise well to data it hasn't seen before
What is the bias-variance trade off
- a complex model has high variance and low bias, leading to variable predictions.
- a simple model has low variance but high bias, leading to stable (but potentially systematically off) predictions.
- we see the complex model frequently with classical/frequentist statistics (bad - overfitting)
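one way to see the trade-off numerically: refit a simple and a complex model to many re-generated datasets and measure the bias and variance of the prediction at a single test point (the function, noise, and polynomial degrees here are an assumed toy setup, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(1)

def true_fn(x):
    # The "correct value" we are trying to predict.
    return np.sin(2 * np.pi * x)

x_train = np.linspace(0, 1, 15)
x0 = 0.25                                     # single test point

for degree in (1, 9):                         # simple (high bias) vs complex (high variance)
    predictions = []
    for _ in range(500):                      # re-collect the dataset many times
        y_train = true_fn(x_train) + rng.normal(0, 0.3, size=x_train.size)
        coeffs = np.polyfit(x_train, y_train, degree)
        predictions.append(np.polyval(coeffs, x0))
    predictions = np.array(predictions)
    bias = predictions.mean() - true_fn(x0)   # average prediction minus the true value
    variance = predictions.var()              # spread of predictions across datasets
    print(f"degree {degree}: bias {bias:+.3f}, variance {variance:.4f}")
```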
to infer the properties of x (cause) given r (observed data), what calculation do we use (according to Bayes)?
Bayes' rule: posterior ∝ likelihood × prior, i.e. p(x|r) = p(r|x) p(x) / p(r)
Bayesian inference is basically an attempt to solve an inverse problem
true or false
true
in what fields can you use Bayes
machine learning, statistics, mathematics etc
why do neuroscientists care about Bayes?
people believe the brain enables perception using a process similar to Bayes' formula
what is the likelihood
the probability of the measured property (r in this example) given the cause, written as:
p(r|x)
we measured something (r) and want to know something about the height of the tower (x)
what is the prior
your prior expectation about the cause x - in this case, the typical height of a clock tower - written as:
p(x)
what is the posterior probability
the probability of the cause (the height of the clock tower) given the observed data (the visual angle), p(x|r)
it's what you get when you multiply the likelihood by the prior (and normalise)
describe how likelihood and prior can help you infer the properties of something
- inferring the speed of a car
- we can see the car is going somewhere between 30-50 km/h (likelihood)
- we know cars on this road typically drive at about 30 km/h (prior)
- we multiply those two together to give us an optimal (posterior) estimate
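a minimal sketch of that multiplication on a grid of candidate speeds, assuming Gaussian shapes for both the likelihood and the prior (the means and widths are made up):

```python
import numpy as np
from scipy.stats import norm

speeds = np.linspace(0, 80, 801)          # candidate speeds in km/h

# Likelihood: what we can see suggests roughly 30-50 km/h (noisy estimate).
likelihood = norm.pdf(speeds, loc=40, scale=5)

# Prior: cars on this road typically drive around 30 km/h.
prior = norm.pdf(speeds, loc=30, scale=5)

# Posterior is proportional to likelihood x prior (normalised over the grid).
posterior = likelihood * prior
posterior /= posterior.sum()

# The optimal (posterior mean) estimate lands between the two sources.
print(f"posterior mean speed: {np.sum(speeds * posterior):.1f} km/h")
```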
what's the difference between the Bayesian and maximum likelihood approaches
the likelihood approach would be satisfied with what we're seeing alone (the speed of the car), but the Bayesian approach takes it a step further and adds prior expectations to the model
how might the prior differ
in its distribution, e.g.:
- normal (Gaussian) distribution
- power-law distribution (e.g. the Pareto distribution)
- exponential-tailed Erlang distribution
- beta distribution (for binary data)
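a small sketch of what these prior families look like as distribution objects in scipy.stats (the specific parameter values are made up for illustration):

```python
from scipy.stats import norm, pareto, erlang, beta

# One example prior from each family listed above (parameters are made up):
priors = {
    "gaussian":  norm(loc=50, scale=10),    # values clustered around a typical value
    "power law": pareto(b=2.0),             # heavy-tailed (Pareto)
    "erlang":    erlang(a=3, scale=10.0),   # exponential-tailed Erlang
    "beta":      beta(a=2, b=2),            # for probabilities / binary data
}

for name, dist in priors.items():
    print(f"{name:10s} mean = {dist.mean():.2f}")
```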
what affects the weight given to the likelihood
How much variability there is - when perceiving the speed of a car there might be a lot of variability (night time, you don't have your glasses on) or very little variability (sharp vision, you can clearly see the car is parked).
in the low-variability case the prior doesn't matter much and the optimal estimate is very similar to the likelihood
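a small sketch of how the likelihood's reliability shifts the estimate, using the standard precision-weighted average for two Gaussians (the means and standard deviations are made up):

```python
def posterior_mean(prior_mean, prior_sd, like_mean, like_sd):
    """Precision-weighted combination of a Gaussian prior and Gaussian likelihood."""
    w_prior = 1 / prior_sd ** 2      # precision = reliability of the prior
    w_like = 1 / like_sd ** 2        # precision = reliability of the likelihood
    return (w_prior * prior_mean + w_like * like_mean) / (w_prior + w_like)

prior_mean, prior_sd = 30, 5         # this road: typically about 30 km/h

# Night, no glasses: the likelihood is unreliable, so the estimate stays near the prior.
print(posterior_mean(prior_mean, prior_sd, like_mean=45, like_sd=15))

# Sharp vision: the likelihood is very reliable, so the estimate sits close to it.
print(posterior_mean(prior_mean, prior_sd, like_mean=45, like_sd=1))
```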