Week 5: Statistical Modelling Flashcards
Probability Distributions
This approach models uncertainty and quantifies our degree of belief that something will happen.
Probability Distribution Function (PDF)
The area under the curve between two points of a PDF is the probability of the outcome being within the two points.
Cumulative Distribution Function (CDF)
The height of the curve at a point is the chance that the outcome is less than or equal to the point.
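A minimal sketch of the PDF/CDF relationship, assuming a standard normal as the example distribution: the area under the PDF between two points is checked against the difference of CDF values.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# Example distribution (assumed): a standard normal.
dist = norm(loc=0.0, scale=1.0)

a, b = -1.0, 1.0

# Area under the PDF between a and b (numerical integration).
area, _ = quad(dist.pdf, a, b)

# Same probability from the CDF: P(a <= X <= b) = F(b) - F(a).
prob = dist.cdf(b) - dist.cdf(a)

print(area, prob)  # both ~0.6827
```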
Joint Distribution
It’s the probability distribution of all the random variables in the set.
Independence
A and B are independent if
P(A|B) = P(A),
P(B|A) = P(B),
P(A,B) = P(A)*P(B)
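A small illustrative sketch, using a hypothetical joint distribution for two binary variables constructed to be independent, that checks the conditions above numerically.

```python
import numpy as np

# Hypothetical joint distribution P(A, B) for two binary variables,
# built so that A and B are independent: each cell is P(A=a) * P(B=b).
p_a = np.array([0.3, 0.7])          # P(A)
p_b = np.array([0.6, 0.4])          # P(B)
joint = np.outer(p_a, p_b)          # P(A, B) = P(A) * P(B)

# Marginals recovered from the joint.
marg_a = joint.sum(axis=1)
marg_b = joint.sum(axis=0)

# Independence check: P(A, B) == P(A) * P(B) for every cell.
print(np.allclose(joint, np.outer(marg_a, marg_b)))      # True

# Conditional check: P(A | B=b) == P(A) for every b.
cond_a_given_b = joint / marg_b     # columns are P(A | B=b)
print(np.allclose(cond_a_given_b, marg_a[:, None]))      # True
```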
Conditional Independence
A and B are conditionally independent given C
iff P(A,B|C) = P(A|C) * P(B|C), or
iff P(A|B,C) = P(A|C)
Conditional independence doesn’t imply unconditional independence or the other way around.
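A hypothetical numerical example of the last point: two flips A and B of a coin chosen by C are conditionally independent given C, yet not marginally independent.

```python
import numpy as np

# Hypothetical example: C picks one of two coins (equal prior); A and B are
# two flips of the chosen coin. Given C the flips are independent, but
# marginally they are not (both flips carry information about the coin).
p_c = np.array([0.5, 0.5])                 # P(C)
p_heads_given_c = np.array([0.9, 0.1])     # P(A=1|C) = P(B=1|C)

# Joint P(A, B) by marginalising over C, using P(A,B|C) = P(A|C) P(B|C).
p_ab = np.zeros((2, 2))
for c in range(2):
    p_a_c = np.array([1 - p_heads_given_c[c], p_heads_given_c[c]])
    p_ab += p_c[c] * np.outer(p_a_c, p_a_c)

p_a = p_ab.sum(axis=1)
p_b = p_ab.sum(axis=0)

# Conditionally independent given C by construction, but NOT independent:
print(p_ab[1, 1], p_a[1] * p_b[1])   # 0.41 vs 0.25
```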
Representative Sample
A sample from a population that accurately reflects the characteristics of the population.
Prior
The initial probability that hypothesis h holds without having observed the data.
P(h)
Likelihood
The probability of observing data D, given some world where the hypothesis h is true.
P(D|h)
Posterior
The probability that hypothesis h is true, given that we have observed dataset D.
P(h|D)
Likelihoods
When modelling a random process, we don’t know the hypothesis h. We estimate the parameters of a model h by maximising the probability P(D|h) (or L(h|D)) of observing D. Hypotheses aren’t always mutually exclusive and there can be an infinite number of them.
Maximum Likelihood Estimate (MLE)
Calculate the parameters of h so as to maximise the likelihood L(h \mid D).
Goal is \arg \max_h \left\{ L(h \mid D) \right\}
L(h \mid D) = P(D \mid h) = \prod_{i=1}^m P(\boldsymbol{x}_i \mid h)
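A minimal sketch, assuming the hypotheses are 1-D Gaussians parameterised by (\mu, \sigma) and using synthetic data: the log-likelihood is maximised numerically and compared with the known closed-form Gaussian MLE (sample mean and biased standard deviation).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=500)   # synthetic dataset D

# Negative log-likelihood: -log L(h|D) = -sum_i log P(x_i | mu, sigma).
def neg_log_likelihood(params):
    mu, log_sigma = params
    return -np.sum(norm.logpdf(data, loc=mu, scale=np.exp(log_sigma)))

# Numerical argmax_h L(h|D).
res = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]))
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])

# For a Gaussian the MLE has a closed form: sample mean and biased std.
print(mu_hat, data.mean())
print(sigma_hat, data.std())
```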
Bayesian Estimation
Compute a model h of maximum posterior probability Pr(h \mid D)
Goal is \arg \max_h \left\{ P(h \mid D) \right\}
Using Bayes Rule,
P(h \mid D) = \frac{P(D \mid h) \cdot P(h)}{P(D)}
Computing the likelihood P(D \mid h) for data with multiple attributes typically relies on the conditional independence assumption.
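A minimal sketch with a hypothetical discrete hypothesis space (three candidate coin biases): the posterior is computed via Bayes rule and the maximum-posterior hypothesis selected.

```python
import numpy as np

# Hypothetical setup: three candidate hypotheses for a coin's heads
# probability, a prior P(h), and observed data D = 8 heads in 10 flips.
hypotheses = np.array([0.3, 0.5, 0.8])      # candidate values of theta
prior = np.array([0.2, 0.6, 0.2])           # P(h)
heads, tails = 8, 2

# Likelihood P(D|h) for each hypothesis (the constant binomial
# coefficient cancels when the posterior is normalised).
likelihood = hypotheses**heads * (1 - hypotheses)**tails

# Bayes rule: P(h|D) = P(D|h) P(h) / P(D), with P(D) as the normaliser.
posterior = likelihood * prior
posterior /= posterior.sum()

print(posterior)                           # posterior over the hypotheses
print(hypotheses[np.argmax(posterior)])    # maximum-posterior hypothesis
```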
Conditional Independence Assumption
Assuming the attributes are conditionally independent given the hypothesis, the probabilities of the individual attributes given the hypothesis can be multiplied together with the prior of the hypothesis in the numerator, divided by the probability of the data: P(h \mid D) = \frac{P(h) \prod_j P(x_j \mid h)}{P(D)}.
Probability Density Function for Normal Distribution
f(x) = \frac{1}{\sigma \sqrt{2 \pi}}e^{-\frac{1}{2} \left( \frac{(x - \mu)^2}{\sigma^2} \right)}
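A short sketch implementing this density formula directly and comparing it with scipy.stats.norm.pdf (the library is used here only as a reference).

```python
import numpy as np
from scipy.stats import norm

# Direct implementation of the normal density formula above.
def normal_pdf(x, mu, sigma):
    return (1.0 / (sigma * np.sqrt(2 * np.pi))) * np.exp(-0.5 * (x - mu) ** 2 / sigma**2)

x = np.linspace(-3, 7, 5)
print(normal_pdf(x, mu=2.0, sigma=1.5))
print(norm.pdf(x, loc=2.0, scale=1.5))   # matches the formula above
```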
Laplace Estimation
When computing likelihoods for each possible attribute value, add 1 to the count in the numerator and \ell (the number of possible values of the attribute) to the denominator. This prevents zero probability estimates for attribute values that do not appear in the training data.
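A small sketch of the smoothed estimate, assuming a hypothetical categorical attribute with \ell = 3 possible values and an estimate of the form (count + 1)/(n + \ell).

```python
# Hypothetical categorical attribute observed within one class; it can take
# l = 3 possible values, so l is added to the denominator.
observations = ['red', 'red', 'green', 'red', 'green']
possible_values = ['red', 'green', 'blue']
num_values = len(possible_values)        # this is l
n = len(observations)

counts = {v: observations.count(v) for v in possible_values}

# Laplace-smoothed estimate: (count + 1) / (n + l).
smoothed = {v: (counts[v] + 1) / (n + num_values) for v in possible_values}

print(smoothed)   # 'blue' gets a small non-zero estimate instead of 0
```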
Density Estimation
Given a dataset, compute an estimate of an underlying probability density function.
Parametric Models
The number of parameters is fixed and independent of the training set size. These are approximations of reality and incorporate stronger assumptions than non-parametric models. They’re generally more explainable and enable deeper investigations.
Examples include:
- Multivariate Linear Regression
- Neural Networks
- k-Means
- Gaussian
Non-parametric Models
The number of parameters grows as the sample size increases. Their modelling power grows with the data, so they can capture more complex structure while making weaker assumptions than parametric models.
Examples include:
- Decision Tree
- DBSCAN
Multivariate Gaussian/Normal Distribution
f(\boldsymbol{x}) = \frac{1}{(2\pi)^{\frac{n}{2}} \left\lvert \boldsymbol{\Sigma} \right\rvert ^{\frac{1}{2}}} e^{-\frac{1}{2}(\boldsymbol{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\boldsymbol{x} - \boldsymbol{\mu})}
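A short sketch implementing this density directly and comparing it with scipy.stats.multivariate_normal (used here only as a reference implementation).

```python
import numpy as np
from scipy.stats import multivariate_normal

# Direct implementation of the multivariate normal density formula above.
def mvn_pdf(x, mu, sigma):
    n = len(mu)
    diff = x - mu
    norm_const = 1.0 / ((2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(sigma)))
    return norm_const * np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff)

mu = np.array([1.0, -1.0])
sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.5, 0.0])

print(mvn_pdf(x, mu, sigma))
print(multivariate_normal(mean=mu, cov=sigma).pdf(x))   # same value
```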
Iso-density Contours
In these contours, all the points x have equal density. f(x) = c.
This is similar to the elevation maps used in topography.
Poisson Distribution
f(x) = \frac{\delta^x e^{-\delta}}{x !}
\delta = rate at which the events occur
x = random variable corresponding to the number of events
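A short sketch implementing this mass function directly and comparing it with scipy.stats.poisson.pmf.

```python
import numpy as np
from math import factorial
from scipy.stats import poisson

# Direct implementation of the Poisson mass function above (rate delta).
def poisson_pmf(x, delta):
    return (delta**x) * np.exp(-delta) / factorial(x)

delta = 3.0
for x in range(5):
    print(x, poisson_pmf(x, delta), poisson.pmf(x, delta))  # same values
```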
Mixture Model
Consists of multiple component models, each specified by its own parameters.
f(\boldsymbol{x}) = \sum_{k=1}^K \pi_k f_k (\boldsymbol{x}; \boldsymbol{w}_k)
Log-likelihood of a Mixture Model
L(\boldsymbol{\pi}, \boldsymbol{w}_1, \dots, \boldsymbol{w}_K) = \sum_{i=1}^m \log \left[ \sum_{k=1}^K \pi_k f_k (\boldsymbol{x}_i; \boldsymbol{w}_k) \right]
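A minimal sketch computing this log-likelihood for a hypothetical two-component 1-D Gaussian mixture on synthetic data.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical 1-D Gaussian mixture: component weights pi_k and component
# densities f_k parameterised by (mean, std).
weights = np.array([0.4, 0.6])          # pi_k, summing to 1
means = np.array([0.0, 5.0])
stds = np.array([1.0, 2.0])

rng = np.random.default_rng(0)
data = rng.normal(5.0, 2.0, size=100)   # some dataset x_1..x_m

# Mixture density f(x) = sum_k pi_k f_k(x; w_k), evaluated per data point.
per_component = weights * norm.pdf(data[:, None], loc=means, scale=stds)
mixture_density = per_component.sum(axis=1)

# Log-likelihood: sum over data points of the log mixture density.
log_likelihood = np.sum(np.log(mixture_density))
print(log_likelihood)
```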
Gaussian Mixture Model (GMM)
P(\boldsymbol{x}_i) = \sum_{k=1}^K P(C_k) P(\boldsymbol{x}_i \mid C_k)
Expectation-Maximisation (EM) Algorithm
A well-known algorithm for fitting GMMs.
E-Step: compute the responsibilities \pi_{i,k} = P(C_k \mid \boldsymbol{x}_i) using Bayes rule, i.e. \pi_{i,k} \propto P(\boldsymbol{x}_i \mid C_k) P(C_k), and the effective component counts m_k = \sum_{i=1}^m \pi_{i,k}.
M-step: compute the new means, covariances, and component weights
\boldsymbol{\mu}_k \leftarrow \sum_{i=1}^m \left( \frac{\pi_{i,k}}{m_k} \right) \boldsymbol{x}_i
\boldsymbol{\Sigma}_k \leftarrow \sum_{i=1}^m \left( \frac{\pi_{i,k}}{m_k} \right) (\boldsymbol{x}_i - \boldsymbol{\mu}_k) (\boldsymbol{x}_i - \boldsymbol{\mu}_k)^T
\pi_k \leftarrow \frac{m_k}{m}
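A minimal EM sketch for a K-component GMM on synthetic 2-D data, following the E- and M-step updates above; the random initialisation and fixed number of iterations are assumptions, not part of the card.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Synthetic 2-D data from two well-separated clusters (assumed for the demo).
rng = np.random.default_rng(0)
data = np.vstack([rng.normal([0, 0], 1.0, size=(150, 2)),
                  rng.normal([5, 5], 1.0, size=(150, 2))])
m, n = data.shape
K = 2

# Initialisation (assumption): uniform weights, random means, identity covs.
pis = np.full(K, 1.0 / K)                         # component weights pi_k
mus = data[rng.choice(m, K, replace=False)]       # means mu_k
sigmas = np.array([np.eye(n) for _ in range(K)])  # covariances Sigma_k

for _ in range(50):
    # E-step: responsibilities pi_{i,k} = P(C_k | x_i) via Bayes rule.
    resp = np.column_stack([
        pis[k] * multivariate_normal(mus[k], sigmas[k]).pdf(data)
        for k in range(K)
    ])
    resp /= resp.sum(axis=1, keepdims=True)
    m_k = resp.sum(axis=0)                        # m_k = sum_i pi_{i,k}

    # M-step: new means, covariances, and component weights.
    mus = (resp.T @ data) / m_k[:, None]
    for k in range(K):
        diff = data - mus[k]
        sigmas[k] = (resp[:, k][:, None] * diff).T @ diff / m_k[k]
    pis = m_k / m

print(pis)
print(mus)     # approximately the two cluster centres
```

In practice a library implementation such as sklearn.mixture.GaussianMixture would typically be used instead of hand-rolled EM.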