Part 1 : Probabilistic Data Models Flashcards

1
Q

Deterministic Models Vs Probabilistic Models

A
  • Deterministic models do not explicitly model uncertainties or
    ‘randomness’ in data.
  • Variability of inferences derived from the data is not included.
  • In many tasks, we benefit from modelling uncertainty and randomness.
  • This is explicit in Probabilistic Models.
2
Q

Maximum-Likelihood Estimation

A

is a method of estimating the parameters of a probabilistic model.

  • Assume θ is a vector of all parameters of the probabilistic model
  • MLE is an extremum estimator obtained by maximising an objective
    function of θ
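As a minimal sketch (not from the cards), MLE can be viewed as an extremum estimator by evaluating the log-likelihood on a grid of parameter values and taking the arg max. Here the hypothetical model is a Gaussian with known spread and unknown mean θ:

```python
import numpy as np

# Hypothetical data: 1000 draws from a Gaussian with mean 2.0, sigma 1.0.
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=1000)

# Log-likelihood of the mean theta (constants dropped, sigma known = 1).
thetas = np.linspace(0.0, 4.0, 401)
log_lik = np.array([np.sum(-0.5 * (data - t) ** 2) for t in thetas])

# Extremum estimator: the theta that maximises the objective.
theta_mle = thetas[np.argmax(log_lik)]
```

For this model the closed-form MLE is the sample mean, so the grid search should land on the grid point nearest `data.mean()`.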
3
Q

Difference between Deterministic and Probabilistic Models

A
  • A deterministic model would give one value, the most likely.
  • A probabilistic model quantifies the chance/probability of the selected point being one of the possible classes.
4
Q

MLE - Mathematics

A
Assume f(θ) is an objective function to be optimised (e.g. maximised). The
arg max is the value of θ that attains the maximum value of the objective
function f:

θ̂ = arg max_θ f(θ)

Note: this is different from finding the maximum value of the function,
max_θ f(θ): arg max returns the argument at which the maximum occurs, while
max returns the maximum value itself.
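The max vs arg max distinction can be illustrated with NumPy (a hypothetical three-point grid, not from the cards):

```python
import numpy as np

thetas = np.array([-1.0, 0.0, 1.0])   # candidate parameter values
f = np.array([0.1, 0.7, 0.2])         # objective f(theta) on that grid

max_value = f.max()                   # max_theta f(theta)     -> 0.7
theta_hat = thetas[np.argmax(f)]      # arg max_theta f(theta) -> 0.0
```

`max` reports how good the best value is; `arg max` reports which θ achieves it, which is what an estimator needs.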

5
Q

Probabilistic MLE Approach

A
  • Derive an expression for the conditional probability of observing the
    data D given the parameter a:

p(D|a)

  • Using the observed data, find the parameter value which maximises this
    conditional probability (i.e. the likelihood):

a_ML = arg max_a p(D|a)

  • Assume that observations are independent, a common assumption often
    referred to as i.i.d. (independent and identically distributed).

p(D|a) = Π_{i=1}^{N} p(y_i | x_i, a)

  • Note: Π denotes the product over all N observations; taking logarithms
    turns the product into a sum, which is usually easier to maximise.
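As a hedged sketch of the recipe above, assume a linear model y = a·x + ε with Gaussian noise of known σ (values below are made up for illustration). The i.i.d. product becomes a sum of log-terms, and a grid search recovers the maximum-likelihood slope:

```python
import numpy as np

# Hypothetical data from y = a*x + noise, a_true = 3.0, sigma = 0.5.
rng = np.random.default_rng(1)
a_true, sigma = 3.0, 0.5
x = rng.uniform(0.0, 1.0, size=200)
y = a_true * x + rng.normal(0.0, sigma, size=200)

def log_likelihood(a):
    # sum over i of log p(y_i | x_i, a), Gaussian noise, constants dropped
    resid = y - a * x
    return np.sum(-0.5 * (resid / sigma) ** 2)

a_grid = np.linspace(2.0, 4.0, 2001)
a_ml = a_grid[np.argmax([log_likelihood(a) for a in a_grid])]
```

For this model the closed-form answer is a_ML = Σx_i y_i / Σx_i², so the grid search should agree with it to within the grid spacing.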

6
Q

Probabilistic MLE Large Sample

A
  • For a large sample, the average of the y_i values will be a·x (the noise
    averages out).
  • The ‘spread’ of the y_i values will be the same as for the noise term ε,
    defined by σ².
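A quick simulation (hypothetical values, not from the cards) makes the large-sample claim concrete: with many observations at the same input x, the sample mean of y approaches a·x and the sample spread approaches σ:

```python
import numpy as np

rng = np.random.default_rng(2)
a, x, sigma = 3.0, 2.0, 0.5
eps = rng.normal(0.0, sigma, size=100_000)
y = a * x + eps                  # many observations at the same input x

mean_y = y.mean()                # -> close to a * x = 6.0
std_y = y.std()                  # -> close to sigma = 0.5
```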

7
Q

Probabilistic Variance

A

Var(a_ML) = σ² / Σ_i x_i²

The variance of the estimate is dependent on the input variables.
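The variance formula can be checked empirically (a sketch with made-up values): repeat the experiment many times, compute the MLE each time, and compare the spread of the estimates with σ² / Σx_i². The closed-form MLE Σx_i y_i / Σx_i² for the linear-Gaussian model is assumed here:

```python
import numpy as np

rng = np.random.default_rng(3)
a_true, sigma = 3.0, 0.5
x = rng.uniform(0.5, 1.5, size=50)   # fixed inputs across repetitions

# Repeat the experiment and collect the MLE of a each time.
estimates = []
for _ in range(5000):
    y = a_true * x + rng.normal(0.0, sigma, size=x.size)
    estimates.append(np.sum(x * y) / np.sum(x ** 2))  # closed-form MLE

empirical_var = np.var(estimates)
theoretical_var = sigma ** 2 / np.sum(x ** 2)
```

The two variances should agree up to Monte Carlo error, and both shrink as the inputs x_i grow, which is why the variance "is dependent on the input variables".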

8
Q

Binomial Distribution

A

gives the probability distribution for a discrete variable: the probability
of obtaining exactly D successes out of N trials, where the probability of
success is α and the probability of failure is (1 − α), with 0 ≤ α ≤ 1:

p(D | N, α) = (N choose D) α^D (1 − α)^(N−D)
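The binomial pmf is short enough to write directly with the standard library (`math.comb` gives the binomial coefficient):

```python
from math import comb

def binom_pmf(D, N, alpha):
    """Probability of exactly D successes in N trials, success prob alpha."""
    return comb(N, D) * alpha ** D * (1 - alpha) ** (N - D)

# P(3 heads in 10 fair coin flips) = C(10,3) / 2**10 = 120/1024
p = binom_pmf(3, 10, 0.5)
```

Summing the pmf over D = 0..N gives 1, a quick sanity check that it is a valid distribution.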

9
Q

Maximum a Posteriori (MAP) Estimation

A

When prior knowledge about the parameters is built into MLE.

θ_MAP = arg max_θ p(D|θ) p(θ)

  • Likelihood :: p(D|θ)
  • Prior :: p(θ)
  • Posterior :: proportional to likelihood × prior
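As a hedged example (the coin-flip setup and Beta prior are illustrative, not from the cards): MAP estimation of a coin's success probability α with a Beta(a0, b0) prior, found by grid search over the log-posterior:

```python
import numpy as np

D, N = 7, 10              # 7 successes observed in 10 trials
a0, b0 = 2.0, 2.0         # Beta prior pseudo-counts (prior knowledge)

alphas = np.linspace(1e-3, 1 - 1e-3, 9999)
log_post = (
    D * np.log(alphas) + (N - D) * np.log(1 - alphas)              # log-likelihood
    + (a0 - 1) * np.log(alphas) + (b0 - 1) * np.log(1 - alphas)    # log-prior
)
alpha_map = alphas[np.argmax(log_post)]
```

For a Beta prior the MAP estimate has the closed form (D + a0 − 1) / (N + a0 + b0 − 2) = 8/12 here, pulled towards the prior mean 0.5 relative to the MLE D/N = 0.7.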
10
Q

Conclusion

A
  • Probabilistic models encode randomness in the data
  • They enable predicting confidence (as a probability)
  • Parameters of the model are tuned to the observed data
  • Maximum Likelihood Estimation (MLE) is a recipe for training model parameters
  • MLE does not encode our prior knowledge of possible parameters
  • Maximum a Posteriori (MAP) estimation maximises the likelihood multiplied by a prior