Bayes Flashcards

1
Q

What is probability?

A

The measure of the likelihood that an event will occur

2
Q

What is a random variable?

A

Usually written X, it is a variable whose possible values are numerical outcomes of a random phenomenon.

3
Q

What two types of random variable are there?

A

1) discrete

2) continuous

4
Q

What is a discrete random variable?

A

A discrete random variable is one which may take on only a countable number of distinct values such as 0, 1, 2, 3, 4, … Discrete random variables are usually (but not necessarily) counts. If a random variable can take only a finite number of distinct values, then it must be discrete. Examples of discrete random variables include the number of children in a family, the Friday night attendance at a cinema, the number of patients in a doctor’s surgery, and the number of defective light bulbs in a box of ten.

5
Q

What is the probability distribution of a discrete random variable?

A

The probability distribution of a discrete random variable is a list of probabilities associated with each of its possible values. It is also sometimes called the probability function or the probability mass function.

6
Q

What probabilities is Bayes’ theorem constituted from?

A

Conditional probability
Joint probability
Marginal probability

7
Q

What is conditional probability?

A

If data are obtained from two (or more) random variables, the probabilities for one may depend on the value of the other(s).

You cannot simply reverse conditional probabilities – P(A|B) and P(B|A) are not interchangeable.

8
Q

What is joint probability?

A

The probability of event Y occurring at the same time as event X occurs.

We can multiply conditionals together to make joint probabilities (and joint probabilities are reversible): P(X, Y) = P(Y|X) P(X) = P(X|Y) P(Y).
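A minimal sketch in Python of building a joint probability from a conditional; the card-drawing numbers are illustrative, not from the source:

```python
# Joint probability from a conditional: P(X, Y) = P(Y|X) P(X).
# Illustrative example: drawing two aces from a deck without replacement.
p_first_ace = 4 / 52              # P(X): first card is an ace
p_second_given_first = 3 / 51     # P(Y|X): second is an ace, given the first was

p_both_aces = p_second_given_first * p_first_ace   # P(X, Y)
print(p_both_aces)  # ≈ 0.0045 (1/221)
```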

9
Q

What is marginal probability?

A

The probability of an event occurring (it could be thought of as the unconditional probability, as it is not conditioned on another event).

You can think of it (as the name suggests) in terms of a table of results: the marginal probability is the one totalled (summed) at the margins, over all values of either the column or the row (see http://bit.ly/2pB0gYi). We just add them up.
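A small sketch of reading marginals off a table, using an illustrative joint table (the numbers are assumptions, not from the source):

```python
import numpy as np

# Hypothetical 2x2 joint probability table P(X, Y):
# rows = values of X, columns = values of Y.
joint = np.array([[0.05, 0.15],
                  [0.10, 0.70]])

p_x = joint.sum(axis=1)  # marginal P(X): total each row at the margin
p_y = joint.sum(axis=0)  # marginal P(Y): total each column at the margin
print(p_x)  # [0.2 0.8]
print(p_y)  # [0.15 0.85]
```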

10
Q

What is independence?

A

Two events are independent if the occurrence of one does not affect the probability of occurrence of the other. In conditional terms, P(A|B) = P(A).

11
Q

Example of independence?

A

Two independent processes, such as the outcome of rolling a die and the outcome of flipping a coin.

The probability of rolling a 5 on the die is completely independent of the coin coming up heads. We can compute both separately and then multiply to get the joint probability. The fact that these processes are independent makes the calculations much easier – a good property to have.
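As a sketch, with the die and coin from the card:

```python
# For independent events the joint probability is just the product
# of the two marginal probabilities.
p_roll_five = 1 / 6   # die shows 5
p_heads = 1 / 2       # coin lands heads

p_joint = p_roll_five * p_heads   # P(5 and heads) = P(5) * P(heads)
print(p_joint)  # ≈ 0.0833 (1/12)
```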

12
Q

If you have many discrete data points, what can you create?

A

A continuous variable.

With many, many bins, a histogram of probabilities becomes a continuous, smooth bell curve (of belief).
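A sketch of this idea (the normal samples and bin count are assumptions for illustration):

```python
import numpy as np
from scipy.stats import norm

# A finely binned histogram of many samples approaches the smooth density.
rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=100_000)

counts, edges = np.histogram(samples, bins=500, density=True)
centres = (edges[:-1] + edges[1:]) / 2

# The binned estimate tracks the true bell curve closely:
print(np.max(np.abs(counts - norm.pdf(centres))))  # small, roughly < 0.05
```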

13
Q

What is a probability density function?

A

The probability density function is the function whose area under the curve is used to specify the probability of the random variable falling within a particular range of values, as opposed to taking on any one value.

Equivalently, the density of a continuous random variable is a function whose value at any given sample (point) in the sample space (the set of possible values taken by the random variable) can be interpreted as the relative likelihood that the value of the random variable would equal that sample.
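A minimal sketch of “probability as area under the density”, using a standard normal as an assumed example:

```python
from scipy.integrate import quad
from scipy.stats import norm

# P(-1 < X < 1) for a standard normal: integrate the pdf over the range.
area, _ = quad(norm.pdf, -1, 1)
print(area)                         # ≈ 0.6827
print(norm.cdf(1) - norm.cdf(-1))   # same answer via the CDF
```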

14
Q

What is the cumulative distribution function (CDF)?

A

The cumulative distribution function (CDF) is the summed amount of probability in a distribution up to a certain point: from the lower end of the range (0, or −∞) to a given point, we integrate all the density values. It is the integral of the probability density function over its range, and it grows as we integrate.

For continuous-valued random variables, instead of specifying individual probabilities, the distribution is described by the CDF, P(X ≤ x), or by its derivative, the probability density function (summing up every value in a continuous way).
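A sketch of the CDF as a running integral of the density (standard normal assumed for illustration):

```python
import numpy as np
from scipy.stats import norm

xs = np.linspace(-5, 5, 2001)   # xs[1000] == 0.0
pdf = norm.pdf(xs)

# Cumulative trapezoidal integration of the pdf from the left edge.
steps = (pdf[1:] + pdf[:-1]) / 2 * np.diff(xs)
cdf_numeric = np.concatenate([[0.0], np.cumsum(steps)])

print(cdf_numeric[1000], norm.cdf(0.0))  # both ≈ 0.5
print(cdf_numeric[-1])                   # ≈ 1.0: it grows to the full area
```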

15
Q

So… what is Bayesian probability about?

A

Using multiple probability terms to quantify the degree of confidence we have that something is the case, based on our current knowledge.

16
Q

Bayes enables us to….

A

Make statements, in a systematic way, about some situation or ‘state of nature’ (unobservable or as yet unobserved) from the partial knowledge available (based on data) – using probability as the measure of uncertainty.

17
Q

Gelman said: “The guiding principle is that the state of knowledge about anything unknown is described by a …”

A

probability distribution

18
Q

State Bayes’ formula …

A

Posterior of the model given the data: P(M|D)
Likelihood of the data given the model: P(D|M)
Prior probability of the model (no data): P(M)
Marginal probability (of the data): P(D)

P(M|D) = P(D|M) P(M) / P(D)
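A minimal sketch of the formula for a discrete pair of models (the prior and likelihood numbers are illustrative):

```python
import numpy as np

prior = np.array([0.5, 0.5])           # P(M): two candidate models
likelihood = np.array([0.8, 0.3])      # P(D|M): how well each predicts the data

marginal = np.sum(likelihood * prior)  # P(D): the normalisation constant
posterior = likelihood * prior / marginal   # P(M|D)
print(posterior)  # ≈ [0.727, 0.273]
```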

19
Q

What is the normalization constant?

A

The marginal probability (of the data): P(D).

20
Q

Name one computational technique to get the normalisation constant.

A

The Monte Carlo method.
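A sketch of the idea, estimating P(D) = ∫ P(D|θ) P(θ) dθ by averaging the likelihood over draws from the prior (the normal model and all numbers are assumptions for illustration):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

data = np.array([1.2, 0.7, 1.5])               # hypothetical observations
theta_draws = rng.normal(0, 2, size=100_000)   # draws from the prior P(theta) = N(0, 2^2)

# Likelihood of the data for each draw, assuming y ~ N(theta, 1).
lik = np.prod(norm.pdf(data[:, None], loc=theta_draws, scale=1.0), axis=0)

p_data = lik.mean()   # Monte Carlo estimate of the normalisation constant P(D)
print(p_data)
```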

21
Q

Name one non-computational technique to get a prior.

A

Take it from someone else’s study (e.g. a partial eta squared effect size estimate).

22
Q

What mostly determines the posterior distribution?

A

The compromise between the likelihood, P(D|M), and the prior, P(M).

23
Q

The more evidence we have, from either P(D|M) or P(M), …

A

the more precise our posterior becomes

24
Q

THE LIKELIHOOD DISTRIBUTION IS NOT….

A

A PROBABILITY DISTRIBUTION.

It is a set of conditional probabilities of a particular outcome, one for each model; these need not sum (integrate) to 1 across models.

25
Q

The denominator P(D) incorporates

A

the probability of all possible outcomes (all data)

26
Q

P(D | M) is not equal to….

A

P(M | D) …..

27
Q

P(D|M) is not equal to P(M|D) … why?

A

Confusing them is sometimes known as the base rate fallacy; the two are certainly not the same. Think of the ‘hypothesitis’ example from the presentation: even an accurate test gives a low P(M|D) when the condition itself is rare.
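A sketch of why the two differ, with illustrative numbers for a rare condition and a fairly accurate test:

```python
p_disease = 0.01              # P(M): base rate of the condition
p_pos_given_disease = 0.95    # P(D|M): test sensitivity
p_pos_given_healthy = 0.05    # false positive rate

p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))   # P(D)

p_disease_given_pos = p_pos_given_disease * p_disease / p_pos   # P(M|D)
print(p_disease_given_pos)   # ≈ 0.16 – nowhere near P(D|M) = 0.95
```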

28
Q

If the frequentist approach uses p values for hypothesis testing, what does Bayes’ theorem use?

A

The Bayes factor.

29
Q

If the frequentist approach (NHST) uses the maximum likelihood estimate with confidence intervals to estimate with uncertainty, what does Bayes use?

A

Bayes uses the posterior distribution, P(M|D), with highest density intervals (HDIs) to estimate with uncertainty.

30
Q

What is the difference between the “frequentist” approach to statistical inference, which is usually based on maximum likelihood estimation (MLE), and the Bayesian approach?

A

MLE chooses the parameters that maximize the likelihood of the data.

In MLE, parameters are assumed to be unknown but fixed, and are estimated with some confidence.

In Bayesian statistics, the uncertainty about the unknown parameters is quantified using probability so that the unknown parameters are regarded as random variables.

31
Q

Confidence intervals are calculated from the…

A

The likelihood distribution

– the distribution of the data given the model parameter θ (treated as fixed).

32
Q

But Bayes calculates the highest density interval (HDI) using…

A

The posterior distribution, p(θ|y), marginalised over everything else (nuisance parameters); the model parameter θ is treated as a random variable.
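One common sample-based way to compute an HDI is to slide a window over the sorted posterior draws and keep the narrowest interval holding the desired mass; a sketch (the posterior samples here are assumed for illustration):

```python
import numpy as np

def hdi(samples, mass=0.95):
    """Shortest interval containing `mass` of the posterior samples."""
    sorted_s = np.sort(samples)
    n = len(sorted_s)
    window = int(np.floor(mass * n))          # draws the interval must cover
    widths = sorted_s[window:] - sorted_s[:n - window]
    start = np.argmin(widths)                 # narrowest such interval
    return sorted_s[start], sorted_s[start + window]

rng = np.random.default_rng(2)
posterior_draws = rng.normal(0, 1, size=50_000)   # hypothetical posterior samples
print(hdi(posterior_draws))   # ≈ (-1.96, 1.96) for a standard normal posterior
```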

33
Q

A prior distribution should …

A

Be a reflection of the pre-existing knowledge about possible parameter values, as far as that is quantifiable.

34
Q

What is a non-informative prior?

A

A prior that expresses no preference among parameter values. E.g. for temperature, every temperature in the range has equal probability (a uniform prior), so we assume we know nothing in advance.

35
Q

What is a conjugate prior?

A

A prior that leads to the same functional form for the prior and the posterior (given the likelihood).
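A sketch of the classic Beta–binomial example (the counts are illustrative): a Beta prior with a binomial likelihood gives a Beta posterior, so the update is just arithmetic on the two counts:

```python
from scipy.stats import beta

a_prior, b_prior = 2, 2        # Beta(2, 2) prior on a coin's bias
heads, flips = 7, 10           # observed data

a_post = a_prior + heads               # posterior is Beta(a_post, b_post):
b_post = b_prior + (flips - heads)     # same functional form as the prior

print(beta(a_post, b_post).mean())  # posterior mean ≈ 0.643
```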

36
Q

To get the Bayes factor we need…

A

THE BAYESIAN INFORMATION CRITERION (BIC)
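A sketch of the standard BIC approximation to the Bayes factor, BF01 ≈ exp((BIC1 − BIC0)/2) (e.g. Wagenmakers, 2007); the BIC values here are hypothetical:

```python
import math

bic_h0 = 102.4   # hypothetical BIC for the null model
bic_h1 = 105.1   # hypothetical BIC for the alternative model

bf_01 = math.exp((bic_h1 - bic_h0) / 2)   # evidence for H0 over H1
print(bf_01)  # ≈ 3.9: moderate evidence for H0 on Jeffreys' scale
```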

37
Q

What is precision?

A

The inverse of the variance: precision = 1/σ².

38
Q

How does precision affect the posterior?

A

The posterior mean is expressed as a weighted average of the prior mean and the observed values, y⃗, with weights proportional to the precisions.

So:

– If the original prior had small variance (high precision), it would strongly influence the posterior mean; conversely, if the prior were broad and imprecise, its term would be very small, and what would contribute most to our estimate of the posterior mean is the data.

– The more precision in the prior, the more it influences the posterior mean; but if the prior is poor, we rely on the data.

– The more data we get, the more the data influence the mean of the posterior.
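A sketch of this weighting for the conjugate normal–normal case with known observation variance (all numbers illustrative):

```python
import numpy as np

mu0, tau0 = 0.0, 1 / 4.0      # prior mean, prior precision (variance 4)
sigma2 = 1.0                  # known observation variance
y = np.array([1.8, 2.2, 2.0, 1.9])
n, ybar = len(y), y.mean()

tau_data = n / sigma2         # data precision grows with sample size

# Posterior mean: precision-weighted average of prior mean and data mean.
mu_post = (tau0 * mu0 + tau_data * ybar) / (tau0 + tau_data)
var_post = 1 / (tau0 + tau_data)

print(mu_post)   # ≈ 1.86: pulled most of the way from 0 towards ybar = 1.975
print(var_post)  # ≈ 0.235: tighter than the prior
```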

39
Q

So the posterior distribution can be summed up in two things?

A

– A compromise between the data and the prior information.

– Modulated by the precision in the likelihood and prior distributions, and by the sample size of the data.

40
Q

Summary …

A

SUMMARY

1) Probability is a quantification of the degree of confidence.

2) Use prior knowledge or belief plus new data to make better predictions (updated beliefs).

3) The posterior is differentially influenced by the data and the prior distributions depending on two factors: the precision (inverse variance) of each distribution, and the sample size (data evidence).

4) Closed forms (analytical expressions) make computations easier, but simulation techniques are required for more complex problems (here she references the normal distribution versus other, more complex ones, and also not knowing the form and letting the posterior guide the process in an iterative fashion).

41
Q

According to Jeffreys (1961), Bayes factors falling
between 1 and 3 are considered…
between 3 and 10 represent…
greater than 10 are considered…

A

“anecdotal” evidence

“moderate” evidence

“strong” evidence.

42
Q

According to Masson (2011), probability values p(H0|D) falling between 0.50 and 0.75 are taken as

A

weak evidence in favor of the null hypothesis.