2015 A2 Flashcards

1
Q

Explain the difference between MLE and MAP.

A

MAP includes the prior P(θ), while MLE doesn’t take the prior into account.
In MAP the likelihood is therefore weighted by the prior.
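
As a minimal sketch in symbols (D here denotes the observed data, a name not used on the card):

```latex
% MLE: maximize the likelihood alone
\hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta} \, p(D \mid \theta)

% MAP: maximize the posterior, i.e. the likelihood weighted by the prior
\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta} \, p(\theta \mid D)
                            = \arg\max_{\theta} \, p(D \mid \theta)\, p(\theta)
```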

2
Q

Which probability distributions are modelled by discriminative and generative models, respectively? Assume x is the data and C is the class label. Which distributions are directly optimized during training?

A

Discriminative: directly models and optimizes the posterior p(C|x), which maximizes the separation between classes.
Generative: models and optimizes the joint probability p(x,C).
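
In symbols (x and C as in the question; the Bayes'-rule line is an added note, not on the original card):

```latex
% Discriminative: model the posterior directly
p(C \mid x)

% Generative: model the joint, typically factored as
p(x, C) = p(x \mid C)\, p(C)

% A generative model can still classify, via Bayes' rule:
p(C \mid x) = \frac{p(x \mid C)\, p(C)}{p(x)}
```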

3
Q

What are generative models fundamentally capable of that discriminative models are not? Motivate your answer.

A

Generative models can be used to simulate new data, because they model the full distribution of x and can therefore be sampled from. They can also work with unlabeled data, since modelling p(x) does not require class labels.
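
A small Python sketch of the "simulate new data" point; the choice of scikit-learn's GaussianMixture and the toy dataset are illustrative assumptions, not part of the card:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Unlabeled 2-D data: a generative model can be fit without class labels.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
               rng.normal(4.0, 1.0, size=(100, 2))])

# Fit a generative model of p(x) as a mixture of Gaussians.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

# Because the model defines a full distribution over x,
# it can be sampled to simulate new data points.
X_new, _ = gmm.sample(n_samples=5)
print(X_new)
```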

4
Q

Give an example of a generative and a discriminative model.

A

Generative - Naive Bayes.
Discriminative - Logistic Regression.
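
A hedged sketch fitting both on the same toy data with scikit-learn (the dataset and settings are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

# Toy labelled dataset (illustrative).
X, y = make_classification(n_samples=200, n_features=2,
                           n_informative=2, n_redundant=0,
                           random_state=0)

# Generative: Naive Bayes models p(x|C) and p(C).
gen = GaussianNB().fit(X, y)

# Discriminative: logistic regression models p(C|x) directly.
disc = LogisticRegression().fit(X, y)

print(gen.predict_proba(X[:3]))   # posterior via Bayes' rule
print(disc.predict_proba(X[:3]))  # posterior modelled directly
```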

5
Q

Describe the difference between PCA and LDA projections.

A

PCA - Projects the data points onto a lower-dimensional subspace such that the variance of the projected data is maximized. Doesn’t use the class labels.
LDA - Projects the data points onto a lower-dimensional subspace such that the between-class scatter Sb is maximized and the within-class scatter Sw is minimized. Uses the class labels.
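
The two objectives for a one-dimensional projection direction w, as a sketch (S is the total scatter matrix):

```latex
% PCA: maximize the variance of the projected data (labels ignored)
w_{\mathrm{PCA}} = \arg\max_{w} \frac{w^{\top} S\, w}{w^{\top} w}

% LDA: maximize the Fisher criterion, the ratio of between-class
% scatter S_b to within-class scatter S_w
w_{\mathrm{LDA}} = \arg\max_{w} \frac{w^{\top} S_b\, w}{w^{\top} S_w\, w}
```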

6
Q

Why would either LDA or PCA work better if the goal is to train a classifier on the projected version of the given dataset?
(The dataset has linearly separable classes that don’t overlap.)

A

LDA will work better, since the observations are labelled with classes: it explicitly chooses the projection that separates the classes, whereas PCA may project away the discriminative direction if it carries little variance.

7
Q

Suppose a logistic regression classifier is trained on a two-dimensional dataset with two classes. The features are x0, x1, x2, and the weights of the classifier corresponding to each of these features are w0, w1, w2.

Now assume that we add a regularization term of the form α(w_n)^2 to the negative log-likelihood before minimizing over the weights, where α is a positive real number. If α is very large, how does this affect the weights obtained during minimization?

A

During the minimization the penalty drives the penalized weight w_n toward zero, preventing it from reaching its otherwise optimal value. The corresponding feature is effectively switched off, and the remaining features dominate.
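
In symbols, a sketch of the penalized objective and its limiting behaviour (L denotes the likelihood):

```latex
% Penalized objective: negative log-likelihood plus the quadratic term
J(w) = -\log L(w) + \alpha\, w_n^2

% For very large alpha the penalty dominates, so minimization
% drives the penalized weight toward zero:
\alpha \to \infty \quad \Rightarrow \quad w_n \to 0
```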

8
Q

You need to find the most likely state sequence for the observations [x1, x2, x3]. Name the algorithm you would use for this.

A

Viterbi
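
A minimal, self-contained Viterbi sketch for a discrete HMM; the toy transition/emission numbers are illustrative assumptions, not from the card:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely state sequence for a discrete-observation HMM.

    obs: sequence of observation indices
    pi:  (n_states,) initial state probabilities
    A:   (n_states, n_states) transitions, A[i, j] = p(state j | state i)
    B:   (n_states, n_symbols) emissions, B[i, k] = p(symbol k | state i)
    """
    n_states, T = len(pi), len(obs)
    delta = np.empty((T, n_states))           # best log-prob ending in each state
    psi = np.zeros((T, n_states), dtype=int)  # back-pointers

    logA, logB = np.log(A), np.log(B)
    delta[0] = np.log(pi) + logB[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA  # scores[i, j]: via state i to j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + logB[:, obs[t]]

    # Backtrack from the best final state.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

# Toy 2-state HMM (illustrative numbers).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.5], [0.1, 0.9]])
print(viterbi([0, 1, 1], pi, A, B))  # most likely states for [x1, x2, x3]
```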
