Probabilistic Generative Models Flashcards
What are generalized linear models?
These are models of the form y(x) = f(g(x, w)), where g is a linear function of the parameters w and f is a non-linear activation function.
N.B.: The decision surfaces are still linear functions of x.
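A minimal sketch in NumPy, assuming a sigmoid activation for f (any non-linear f would do):

```python
import numpy as np

def glm_predict(x, w, w0):
    a = w @ x + w0                       # g(x, w): linear in w (and in x)
    return 1.0 / (1.0 + np.exp(-a))      # f: non-linear activation (sigmoid here)

w, w0 = np.array([1.0, -2.0]), 0.1
print(glm_predict(np.array([0.5, 0.5]), w, w0))  # σ(-0.4) ≈ 0.40
```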
In a 2-class classification problem, what is the prediction of a probabilistic generative model?
P(C1|x) = 1 / (1+exp(-a)) = σ(a)
and P(C2|x) = 1 - σ(a),
where a = ln[ (P(x|C1)*P(C1)) / (P(x|C2)*P(C2)) ] (the log posterior odds)
(σ is the sigmoid function)
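A small sketch of this prediction, assuming the class-conditional densities and priors at x are already known (function name is illustrative):

```python
import numpy as np

def posterior_c1(p_x_given_c1, p_c1, p_x_given_c2, p_c2):
    # a = ln[ (P(x|C1)*P(C1)) / (P(x|C2)*P(C2)) ]
    a = np.log((p_x_given_c1 * p_c1) / (p_x_given_c2 * p_c2))
    return 1.0 / (1.0 + np.exp(-a))      # σ(a)

# Class-conditional densities 0.3 and 0.1 at x, equal priors:
print(posterior_c1(0.3, 0.5, 0.1, 0.5))  # 0.3 / (0.3 + 0.1) = 0.75
```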
In a k-class classification problem, what is the prediction of a probabilistic generative model?
P(Ck|x) = exp(ak) / Σj exp(aj) = softmax_k(a), with a = (a1, …, ak),
where aj = ln[ P(x|Cj)*P(Cj) ]
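A NumPy sketch of the k-class case (the max-shift is a standard numerical-stability trick, not part of the formula):

```python
import numpy as np

def posterior_softmax(p_x_given_c, priors):
    # aj = ln[ P(x|Cj)*P(Cj) ]
    a = np.log(np.asarray(p_x_given_c) * np.asarray(priors))
    e = np.exp(a - a.max())              # shift for numerical stability
    return e / e.sum()                   # softmax over the aj

print(posterior_softmax([0.3, 0.1, 0.1], [1/3, 1/3, 1/3]))  # [0.6, 0.2, 0.2]
```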
What are the terms of P(Ck|x) = P(x|Ck)*P(Ck) / P(x) called?
P(Ck) is the prior probability, P(Ck|x) is the posterior probability, P(x) is the evidence (the marginal density of x) and P(x|Ck) is the class-conditional density.
What is the inductive bias of Gaussian Discriminant Analysis (GDA), a.k.a. “Linear Discriminant Analysis (LDA)”?
- All class conditional densities are Gaussian.
- All classes share the same covariance matrix.
What are the parameters of GDA?
The prior probabilities P(Ck) and Gaussian means μk of each class, and their common covariance matrix Σ.
In a 2-class classification problem, how is the prediction calculated by GDA?
P(C1|x) = σ(w.T * x + w0),
where w = Σ^-1 * (μ1 - μ2)
and w0 = -1/2 * μ1.T * Σ^-1 * μ1 + 1/2 * μ2.T * Σ^-1 * μ2 + ln(P(C1)/P(C2))
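A sketch of these formulas in NumPy, assuming μ1, μ2, Σ and the priors are given:

```python
import numpy as np

def gda_weights(mu1, mu2, Sigma, p_c1, p_c2):
    # w = Σ^-1 * (μ1 - μ2)
    Sigma_inv = np.linalg.inv(Sigma)
    w = Sigma_inv @ (mu1 - mu2)
    # w0 = -1/2 μ1ᵀ Σ^-1 μ1 + 1/2 μ2ᵀ Σ^-1 μ2 + ln(P(C1)/P(C2))
    w0 = (-0.5 * mu1 @ Sigma_inv @ mu1
          + 0.5 * mu2 @ Sigma_inv @ mu2
          + np.log(p_c1 / p_c2))
    return w, w0

mu1, mu2 = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
w, w0 = gda_weights(mu1, mu2, np.eye(2), 0.5, 0.5)
x = np.array([0.5, 0.2])
print(1.0 / (1.0 + np.exp(-(w @ x + w0))))  # P(C1|x) ≈ 0.73
```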
In a 2-class classification problem, what is the likelihood of the datapoint (x, t)?
[P(C1) * N(x|μ1, Σ)]^t * [P(C2) * N(x|μ2, Σ)]^(1-t)
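A direct translation of this likelihood, using SciPy's multivariate normal density (function name is illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

def point_likelihood(x, t, mu1, mu2, Sigma, p_c1):
    # t = 1 selects the C1 factor, t = 0 the C2 factor.
    l1 = p_c1 * multivariate_normal.pdf(x, mu1, Sigma)
    l2 = (1 - p_c1) * multivariate_normal.pdf(x, mu2, Sigma)
    return l1 ** t * l2 ** (1 - t)
```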
What is the i.i.d. assumption?
All points of a dataset are independent and identically distributed.
How are the parameters of GDA learned?
By maximizing the likelihood function on the training set (i.e. the product of the likelihood of each training point).
What are the optimal values of P(Ck), μk and Σ using GDA and max. log. likelihood?
The optimal value of P(Ck) is the frequency of Ck in the training set.
The optimal value of μk is the mean of the training inputs x belonging to class Ck.
The optimal value of Σ is the average of the per-class covariance matrices, weighted by the class frequencies.
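A sketch of these maximum-likelihood estimates for the 2-class case, assuming labels t in {0, 1} with t = 1 for C1:

```python
import numpy as np

def fit_gda(X, t):
    # X: (n, d) inputs; t: (n,) labels in {0, 1}, with t = 1 for C1.
    p_c1 = t.mean()                      # P(C1): frequency of C1
    mu1 = X[t == 1].mean(axis=0)         # mean of the C1 examples
    mu2 = X[t == 0].mean(axis=0)         # mean of the C2 examples
    # Shared Σ: per-class (biased/MLE) covariances, weighted by class frequency.
    S1 = np.cov(X[t == 1].T, bias=True)
    S2 = np.cov(X[t == 0].T, bias=True)
    Sigma = p_c1 * S1 + (1 - p_c1) * S2
    return p_c1, mu1, mu2, Sigma
```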
What is Quadratic Discriminant Analysis (QDA)?
It’s like GDA, but without the assumption that all classes share the same covariance matrix; the decision boundaries are then quadratic in x.
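A sketch of the resulting 2-class QDA posterior, with one covariance matrix per class (a is now quadratic in x):

```python
import numpy as np
from scipy.stats import multivariate_normal

def qda_posterior_c1(x, mu1, S1, mu2, S2, p_c1, p_c2):
    # Each class keeps its own covariance matrix S1, S2.
    a = (np.log(p_c1) + multivariate_normal.logpdf(x, mu1, S1)
         - np.log(p_c2) - multivariate_normal.logpdf(x, mu2, S2))
    return 1.0 / (1.0 + np.exp(-a))      # σ(a)
```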
What is the Naive Bayes (NB) assumption?
Features are conditionally independent given the class label, i.e. P(X|Ck) = P(X1, X2, … |Ck) ~= P(X1|Ck) * P(X2|Ck) * …
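A sketch of the factorized log class-conditional, assuming (hypothetically) one 1-D Gaussian per feature:

```python
import numpy as np
from scipy.stats import norm

def nb_log_class_conditional(x, means, stds):
    # log P(x|Ck) ~= Σ_i log P(x_i|Ck), one 1-D density per feature.
    return np.sum(norm.logpdf(x, loc=means, scale=stds))
```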
What are Probabilistic Graphical Models (PGMs)?
They are a trade-off between GDA (where all features are considered dependent) and NB (where all features are considered independent): the dependencies between features are specified by a dependency graph.
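For example, with a chain dependency graph X1 → X2 → X3, the class-conditional density factorizes as P(X1, X2, X3|Ck) = P(X1|Ck) * P(X2|X1, Ck) * P(X3|X2, Ck): X3 depends on X2 but is conditionally independent of X1 given X2.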
What is Diagonal GDA (DGDA)?
It’s GDA, making the NB assumption (the covariance matrix is then diagonal).
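A sketch of the DGDA class-conditional log-density, showing how a diagonal Σ makes the Gaussian factorize over features (var holds the diagonal of Σ):

```python
import numpy as np

def dgda_log_density(x, mu, var):
    # With diagonal Σ (diagonal stored in var), the Gaussian factorizes
    # over features, which is exactly the NB assumption applied to GDA.
    return np.sum(-0.5 * np.log(2 * np.pi * var)
                  - 0.5 * (x - mu) ** 2 / var)
```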