Probabilistic Generative Models Flashcards

1
Q

What are generalized linear models?

A

These are models making predictions of the form y(x) = f(g(x, w)), where g(x, w) = w.T * x + w0 is linear in the parameters w and f is a non-linear activation function.
N.B.: Decision surfaces are still linear functions of x, because y(x) = constant implies g(x, w) = constant.

2
Q

In a 2-class classification problem, what is the prediction of a probabilistic generative model?

A

P(C1|x) = 1 / (1+exp(-a)) = σ(a)
and P(C2|x) = 1 - σ(a),
where a = ln[ (P(x|C1)*P(C1)) / (P(x|C2)*P(C2)) ]
(σ is the sigmoid function)
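
As a quick check, here is a minimal sketch showing that taking a as the log-odds of C1 (the log of the ratio with C1 in the numerator) and applying the sigmoid gives the same posterior as applying Bayes' rule directly. The function names and the numbers are hypothetical, chosen only for illustration.

```python
import math

def sigmoid(a):
    """Logistic sigmoid: 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + math.exp(-a))

def posterior_c1(lik_c1, prior_c1, lik_c2, prior_c2):
    """P(C1|x) computed from the log-odds a = ln[P(x|C1)P(C1) / (P(x|C2)P(C2))]."""
    a = math.log((lik_c1 * prior_c1) / (lik_c2 * prior_c2))
    return sigmoid(a)

# Hypothetical values: P(x|C1) = 0.2, P(C1) = 0.5, P(x|C2) = 0.1, P(C2) = 0.5
p1 = posterior_c1(0.2, 0.5, 0.1, 0.5)

# Direct application of Bayes' rule for comparison
bayes = (0.2 * 0.5) / (0.2 * 0.5 + 0.1 * 0.5)
```

Both p1 and bayes come out to 2/3, which confirms the sign convention: a larger a means a larger P(C1|x).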

3
Q

In a k-class classification problem, what is the prediction of a probabilistic generative model?

A

P(Ck|x) = exp(ak) / Σj exp(aj) = softmax_k(a),
where aj = ln[ P(x|Cj)*P(Cj) ]
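
The softmax form can be sketched as follows. This is an illustrative example with made-up likelihoods and priors; subtracting the maximum before exponentiating is a common numerical-stability trick and is an addition of this sketch, not part of the card.

```python
import math

def softmax(a):
    """Softmax of a list of activations, shifted by max(a) for numerical stability."""
    m = max(a)
    exps = [math.exp(v - m) for v in a]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 3-class example: P(x|Cj) and P(Cj) for j = 1..3
likelihoods = [0.2, 0.1, 0.4]
priors = [0.5, 0.3, 0.2]

# aj = ln[P(x|Cj) * P(Cj)]
a = [math.log(l * p) for l, p in zip(likelihoods, priors)]
posteriors = softmax(a)
```

The resulting posteriors sum to 1 and equal the joint probabilities P(x|Cj)*P(Cj) normalized by their sum, i.e. Bayes' rule.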

4
Q

What are the terms of P(Ck|x) = P(x|Ck)*P(Ck) / P(x) called?

A

P(Ck) is the prior probability, P(Ck|x) is the posterior probability, P(x|Ck) is the class-conditional density, and P(x) is the marginal density of x (the evidence).

5
Q

What is the inductive bias of Gaussian Discriminant Analysis (GDA), a.k.a. “Linear Discriminant Analysis (LDA)”?

A
  1. All class conditional densities are Gaussian.
  2. All classes share the same covariance matrix.
6
Q

What are the parameters of GDA?

A

The prior probabilities P(Ck) and Gaussian means μk of each class, and their common covariance matrix Σ.

7
Q

In a 2-class classification problem, how is the prediction calculated by GDA?

A

P(C1|x) = σ(w.T * x + w0),
where w = Σ^-1 * (μ1 - μ2)
and w0 = -1/2 * μ1.T * Σ^-1 * μ1 + 1/2 * μ2.T * Σ^-1 * μ2 + ln(P(C1)/P(C2))
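
The formulas for w and w0 can be sketched with NumPy as below. The function names (gda_binary_params, predict_c1) and the example means, covariance and prior are assumptions made for illustration.

```python
import numpy as np

def gda_binary_params(mu1, mu2, sigma, prior1):
    """Return (w, w0) for two-class GDA with shared covariance sigma."""
    sigma_inv = np.linalg.inv(sigma)
    w = sigma_inv @ (mu1 - mu2)
    w0 = (-0.5 * mu1 @ sigma_inv @ mu1
          + 0.5 * mu2 @ sigma_inv @ mu2
          + np.log(prior1 / (1.0 - prior1)))
    return w, w0

def predict_c1(x, w, w0):
    """P(C1|x) = sigmoid(w.T x + w0)."""
    return 1.0 / (1.0 + np.exp(-(w @ x + w0)))

# Hypothetical symmetric setup: means at (+1, 0) and (-1, 0), identity covariance, equal priors
mu1 = np.array([1.0, 0.0])
mu2 = np.array([-1.0, 0.0])
sigma = np.eye(2)
w, w0 = gda_binary_params(mu1, mu2, sigma, 0.5)

# The midpoint between the two means should be on the decision boundary
p_mid = predict_c1(np.array([0.0, 0.0]), w, w0)
```

In this symmetric setup w = (2, 0) and w0 = 0, so the midpoint between the means gets P(C1|x) = 0.5, as expected for equal priors.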

8
Q

In a 2-class classification problem, what is the likelihood of the datapoint (x, t)?

A

[P(C1) * N(x|μ1, Σ)]^t * [P(C2) * N(x|μ2, Σ)]^(1-t)

9
Q

What is the i.i.d. assumption?

A

All points of a dataset are independent and identically distributed, i.e. each point is drawn independently from the same distribution.

10
Q

How are the parameters of GDA learned?

A

By maximizing the likelihood function on the training set (i.e. the product of the likelihood of each training point).
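
In practice the product is usually handled as a sum of logs. The following is a minimal one-dimensional sketch for the 2-class case, assuming t = 1 for C1 and t = 0 for C2; the function names and the toy data are hypothetical.

```python
import math

def log_gaussian(x, mean, var):
    """Log of the univariate Gaussian density N(x | mean, var)."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mean) ** 2 / var

def log_likelihood(xs, ts, prior1, mu1, mu2, var):
    """Sum over points of t*ln[P(C1)N(x|mu1,var)] + (1-t)*ln[P(C2)N(x|mu2,var)]."""
    total = 0.0
    for x, t in zip(xs, ts):
        total += t * (math.log(prior1) + log_gaussian(x, mu1, var))
        total += (1 - t) * (math.log(1.0 - prior1) + log_gaussian(x, mu2, var))
    return total

# Toy data: two points per class
xs = [1.0, 3.0, -2.0, -4.0]
ts = [1, 1, 0, 0]

# Class means placed at the per-class sample means (2 and -3) versus both at 0
ll_good = log_likelihood(xs, ts, 0.5, 2.0, -3.0, 1.0)
ll_bad = log_likelihood(xs, ts, 0.5, 0.0, 0.0, 1.0)
```

Placing the means at the per-class sample means yields a higher log-likelihood than placing both at 0, which is what maximization over the parameters exploits.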

11
Q

What are the optimal values of P(Ck), μk and Σ in GDA when maximizing the log-likelihood?

A

The optimal value of P(Ck) is the relative frequency of Ck in the training set.
The optimal value of μk is the mean of x over all training examples of class Ck.
The optimal value of Σ is the weighted average of the per-class covariance matrices, each weighted by its class frequency.
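
These estimates can be sketched with NumPy as follows. The function name fit_gda, the integer label encoding and the toy dataset are assumptions for illustration; the per-class covariances use the maximum-likelihood normalization (dividing by the class count, not by the count minus one).

```python
import numpy as np

def fit_gda(X, t):
    """Maximum-likelihood estimates for GDA.

    X: (N, D) array of inputs, t: (N,) array of integer class labels.
    Returns (priors, means, shared covariance).
    """
    N, D = X.shape
    priors, means = [], []
    sigma = np.zeros((D, D))
    for k in np.unique(t):
        Xk = X[t == k]
        Nk = len(Xk)
        mu_k = Xk.mean(axis=0)
        diff = Xk - mu_k
        S_k = diff.T @ diff / Nk      # per-class ML covariance
        priors.append(Nk / N)         # class frequency
        means.append(mu_k)
        sigma += (Nk / N) * S_k       # weight by class frequency
    return np.array(priors), np.array(means), sigma

# Toy dataset: 2 points in class 0, 4 points in class 1
X = np.array([[0.0, 0.0], [2.0, 0.0],
              [4.0, 1.0], [4.0, 3.0], [6.0, 1.0], [6.0, 3.0]])
t = np.array([0, 0, 1, 1, 1, 1])
priors, means, sigma = fit_gda(X, t)
```

For this toy data the priors are 1/3 and 2/3, the means are (1, 0) and (5, 2), and the shared covariance is the frequency-weighted blend of the two per-class covariances.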

12
Q

What is Quadratic Discriminant Analysis (QDA)?

A

It’s like GDA but without making the assumption that all classes have the same covariance matrix.

13
Q

What is the Naive Bayes (NB) assumption?

A

Features are conditionally independent given the class label, i.e. P(X|Ck) = P(X1, X2, … |Ck) = P(X1|Ck) * P(X2|Ck) * …
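
As an illustration of the factorization, here is a minimal sketch that computes a class-conditional density as a product of per-feature densities. Using univariate Gaussians for each feature is an assumption of this example (the NB assumption itself does not fix the per-feature distribution), and the function names are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Univariate Gaussian density N(x | mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def naive_bayes_likelihood(x, means, variances):
    """P(x|Ck) as the product of per-feature densities P(xi|Ck)."""
    p = 1.0
    for xi, m, v in zip(x, means, variances):
        p *= gaussian_pdf(xi, m, v)
    return p

# Two features, both standard normal under the class, evaluated at the mean
p = naive_bayes_likelihood([0.0, 0.0], [0.0, 0.0], [1.0, 1.0])
```

Only one mean and one variance per feature are needed for each class, instead of a full covariance matrix.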

14
Q

What are Probabilistic Graphical Models (PGM)?

A

They are a middle ground between GDA (where any feature may depend on any other) and NB (where all features are considered conditionally independent given the class): the dependencies between features are set according to a dependency graph.

15
Q

What is Diagonal GDA (DGDA)?

A

It’s GDA, making the NB assumption (the covariance matrix is then diagonal).
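
A short derivation sketch of why the covariance matrix becomes diagonal, writing D for the number of features and σ_i² for the variance of feature i (both symbols are introduced here for illustration). If each feature is Gaussian and features are independent given the class, the product of the univariate densities is a multivariate Gaussian with diagonal covariance:

```latex
\prod_{i=1}^{D} \mathcal{N}(x_i \mid \mu_{k,i}, \sigma_i^2)
  = \frac{1}{(2\pi)^{D/2} \prod_{i} \sigma_i}
    \exp\!\left( -\frac{1}{2} \sum_{i=1}^{D} \frac{(x_i - \mu_{k,i})^2}{\sigma_i^2} \right)
  = \mathcal{N}\!\left(x \mid \mu_k, \operatorname{diag}(\sigma_1^2, \dots, \sigma_D^2)\right)
```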

16
Q

What is Gaussian Naive Bayes (GNB)?

A

It’s QDA, making the NB assumption (or DGDA without sharing the covariance matrix). Each class then has its own diagonal covariance matrix.

17
Q

What is the consequence for the decision boundaries of sharing, or not sharing, the covariance matrix across all classes?

A

If the covariance matrix is shared, decision boundaries are linear; otherwise they are quadratic.
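
A brief derivation sketch for the 2-class case, writing Σ_k for the covariance matrix of class k (notation assumed here). The decision boundary is where the log-odds a(x) equals 0:

```latex
a(x) = \ln \frac{p(x \mid C_1)\, P(C_1)}{p(x \mid C_2)\, P(C_2)}
     = -\tfrac{1}{2} (x - \mu_1)^{\top} \Sigma_1^{-1} (x - \mu_1)
       + \tfrac{1}{2} (x - \mu_2)^{\top} \Sigma_2^{-1} (x - \mu_2)
       - \tfrac{1}{2} \ln \frac{|\Sigma_1|}{|\Sigma_2|}
       + \ln \frac{P(C_1)}{P(C_2)}
```

The only term quadratic in x is

```latex
-\tfrac{1}{2}\, x^{\top} \left( \Sigma_1^{-1} - \Sigma_2^{-1} \right) x
```

which vanishes when Σ_1 = Σ_2 = Σ, leaving a(x) = w.T * x + w0 with w = Σ^-1 * (μ1 - μ2), a linear boundary. When the covariance matrices differ, the quadratic term generally remains and the boundary is a quadratic surface.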