Part 1 : Probabilistic Data Models Flashcards
Deterministic Models Vs Probabilistic Models
- Deterministic models do not explicitly model uncertainties or ‘randomness’ in data.
- Variability of inferences derived from the data is not included.
- In many tasks, we benefit from modelling uncertainty and randomness.
- This is explicit in Probabilistic Models.
Maximum-Likelihood Estimation
is a method of estimating the parameters of a probabilistic model.
- Assume θ is a vector of all parameters of the probabilistic model
- MLE is an extremum estimator obtained by maximising an objective
function of θ
Difference between Deterministic and Probabilistic Models
- A deterministic model would give one value, the most likely.
- A probabilistic model quantifies the chance/probability of the selected point being one of the possible classes.
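The distinction above can be sketched in a few lines of Python. The two-class scores below are assumed values for illustration, not outputs of any particular model:

```python
# Hypothetical two-class example: a deterministic model reports only the
# most likely class, while a probabilistic model reports a probability
# for each possible class.
scores = {"cat": 0.7, "dog": 0.3}  # assumed class probabilities

deterministic_prediction = max(scores, key=scores.get)  # single best label
probabilistic_prediction = scores                       # full distribution

print(deterministic_prediction)   # cat
print(probabilistic_prediction)   # {'cat': 0.7, 'dog': 0.3}
```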
MLE - Mathematics
Assume f(θ) is an objective function to be optimised (e.g. maximised), the arg max corresponds to the value of θ that attains the maximum value of the objective function f.
θ̂ = arg max_θ f(θ)
Note: this is different from maximising the function (i.e. finding the maximum value, max f(θ)).
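The max/arg-max distinction can be checked with a quick grid search. The objective f(θ) = −(θ − 2)² and the grid are assumed for illustration:

```python
# max f(θ) is the best attainable objective VALUE;
# arg max f(θ) is the PARAMETER θ that attains it.
thetas = [i / 10 for i in range(-50, 51)]  # grid over [-5, 5]
f = lambda theta: -(theta - 2) ** 2        # toy objective, peak at θ = 2

max_value = max(f(t) for t in thetas)  # max f(θ)      → 0.0
theta_hat = max(thetas, key=f)         # arg max f(θ)  → 2.0
```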
Probabilistic MLE Approach
- Derive expression for conditional probability of observing data D given
parameter a.
p(D|a)
- Using observed data, find parameter value which maximises the conditional probability (i.e. the likelihood).
a_ML = arg max_a p(D|a)
- Assume that observations are independent - a common assumption often
referred to as i.i.d. independent and identically distributed.
p(D|a) = Π_i p(y_i | x_i, a)
– Note :: Π_i denotes the product over all observations i = 1, …, N.
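The i.i.d. factorisation above is usually maximised through its logarithm, which turns the product into a sum. A minimal sketch, assuming a Gaussian model y_i ~ N(a·x_i, σ²) and toy data (both are illustrative assumptions, not from the flashcards):

```python
import math

# Under i.i.d.: log Π_i p(y_i | x_i, a) = Σ_i log p(y_i | x_i, a)
def log_likelihood(a, xs, ys, sigma=1.0):
    return sum(
        -0.5 * math.log(2 * math.pi * sigma**2)
        - (y - a * x) ** 2 / (2 * sigma**2)
        for x, y in zip(xs, ys)
    )

xs, ys = [1.0, 2.0, 3.0], [2.1, 3.9, 6.2]  # assumed toy data

# Crude grid search for a_ML = arg max_a p(D|a)
a_ml = max((i / 100 for i in range(401)),
           key=lambda a: log_likelihood(a, xs, ys))
```

For this Gaussian linear model the grid result matches the closed form a_ML = Σ x_i y_i / Σ x_i² up to the grid resolution.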
Probabilistic MLE Large Sample
- For the linear model y_i = a x_i + ε_i, the average of the y_i values will be a x_i
- The ‘spread’ will be the same as for ε (epsilon), defined by σ^2
Probabilistic Variance
Var(a_ML) = σ^2 / Σ_i x_i^2
Variance is dependent on the input variables.
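The variance formula can be checked by Monte-Carlo simulation. The true slope, noise level, and inputs below are assumed for illustration:

```python
import random

# Empirically verify Var(a_ML) = σ² / Σ x_i² for y_i = a·x_i + ε_i,
# ε_i ~ N(0, σ²), using the closed-form estimator a_ML = Σ x_i y_i / Σ x_i².
random.seed(0)
a_true, sigma = 2.0, 0.5          # assumed true parameter and noise scale
xs = [1.0, 2.0, 3.0, 4.0]         # assumed inputs
sxx = sum(x * x for x in xs)      # Σ x_i² = 30

estimates = []
for _ in range(20000):
    ys = [a_true * x + random.gauss(0, sigma) for x in xs]
    estimates.append(sum(x * y for x, y in zip(xs, ys)) / sxx)

mean = sum(estimates) / len(estimates)
empirical_var = sum((e - mean) ** 2 for e in estimates) / len(estimates)
# empirical_var should be close to sigma**2 / sxx = 0.25 / 30 ≈ 0.0083
```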
Binomial Distribution
gives the probability of obtaining exactly D successes out of N trials for a discrete variable, where the probability of success is α, the probability of failure is (1 − α), and 0 ≤ α ≤ 1:
p(D | N, α) = C(N, D) α^D (1 − α)^(N−D)
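A short sketch of the binomial PMF and its maximum-likelihood estimate of α; the trial counts below are assumed for illustration:

```python
from math import comb

# Binomial PMF: probability of exactly D successes in N trials,
# each with success probability α.
def binomial_pmf(D, N, alpha):
    return comb(N, D) * alpha**D * (1 - alpha) ** (N - D)

N, D = 10, 7  # assumed: 7 successes observed in 10 trials

# Grid search for α_ML = arg max_α p(D | N, α); for the binomial
# the maximiser is the success fraction D/N.
alpha_ml = max((i / 100 for i in range(101)),
               key=lambda a: binomial_pmf(D, N, a))
# alpha_ml → 0.7, i.e. D/N
```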
Maximum a Posteriori (MAP) Estimation
Builds prior knowledge into MLE by weighting the likelihood with a prior over the parameters.
θ_MAP = arg max_θ p(D|θ) p(θ)
- Likelihood :: p(D|θ)
- Prior :: p(θ)
- Posterior :: p(θ|D) ∝ p(D|θ) p(θ) (combines likelihood and prior)
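A minimal MAP sketch, assuming a Gaussian likelihood for the data and a Gaussian prior θ ~ N(0, τ²); the data and prior scale are illustrative assumptions:

```python
# MAP maximises log-likelihood + log-prior (constants dropped).
def log_posterior(theta, data, sigma=1.0, tau=1.0):
    log_lik = sum(-((y - theta) ** 2) / (2 * sigma**2) for y in data)
    log_prior = -(theta**2) / (2 * tau**2)  # Gaussian prior centred at 0
    return log_lik + log_prior

data = [2.2, 1.8, 2.5]  # assumed observations

theta_map = max((i / 1000 for i in range(3001)),
                key=lambda t: log_posterior(t, data))
theta_ml = sum(data) / len(data)  # plain MLE ignores the prior
```

Because the prior pulls the estimate towards 0, θ_MAP comes out smaller than θ_ML; with an uninformative (flat) prior the two coincide.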
Conclusion
- Probabilistic models encode randomness in the data
- They enable predicting confidence (as a probability)
- Parameters of the model are tuned to fit the observed data
- Maximum Likelihood Estimation (MLE) is a recipe used for training model parameters
- MLE does not encode our prior knowledge of possible parameters
- Maximum a Posteriori (MAP) maximises likelihood along with prior