2015 A2 Flashcards

1
Q

Explain the difference between MLE and MAP.

A

MAP includes the prior P(θ), while MLE doesn’t take the prior into account.
In MAP the likelihood is therefore weighted by the prior.
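
As a minimal sketch in symbols (D here denotes the observed data, a name not used on the card):

```latex
% MLE: maximize the likelihood alone
\hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta} \, p(D \mid \theta)

% MAP: maximize the posterior, i.e. the likelihood weighted by the prior
\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta} \, p(\theta \mid D)
                            = \arg\max_{\theta} \, p(D \mid \theta)\, p(\theta)
```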

2
Q

Which probability distributions are modelled by discriminative and generative models, respectively? Assume x is the data and C is the class label. Which distributions are directly optimized during training?

A

Discriminative: directly models and optimizes the posterior p(C|x), which maximizes the separation between classes.
Generative: models and optimizes the joint probability p(x,C).
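
In symbols (x and C as in the question; the Bayes'-rule line is an added note, not on the original card):

```latex
% Discriminative: model the posterior directly
p(C \mid x)

% Generative: model the joint, typically factored as
p(x, C) = p(x \mid C)\, p(C)

% A generative model can still classify, via Bayes' rule:
p(C \mid x) = \frac{p(x \mid C)\, p(C)}{p(x)}
```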

3
Q

What are generative models fundamentally capable of that discriminative models are not? Motivate your answer.

A

Generative models can be used to simulate new data, because they model the full distribution of x and can therefore be sampled from. They can also work with unlabeled data, since modelling p(x) does not require class labels.
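
A small Python sketch of the "simulate new data" point; the choice of scikit-learn's GaussianMixture and the toy dataset are illustrative assumptions, not part of the card:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Unlabeled 2-D data: a generative model can be fit without class labels.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
               rng.normal(4.0, 1.0, size=(100, 2))])

# Fit a generative model of p(x) as a mixture of Gaussians.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

# Because the model defines a full distribution over x,
# it can be sampled to simulate new data points.
X_new, _ = gmm.sample(n_samples=5)
print(X_new)
```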

4
Q

Give an example of a generative and a discriminative model.

A

Generative - Naive Bayes.
Discriminative - Logistic Regression.
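
A hedged sketch fitting both on the same toy data with scikit-learn (the dataset and settings are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

# Toy labelled dataset (illustrative).
X, y = make_classification(n_samples=200, n_features=2,
                           n_informative=2, n_redundant=0,
                           random_state=0)

# Generative: Naive Bayes models p(x|C) and p(C).
gen = GaussianNB().fit(X, y)

# Discriminative: logistic regression models p(C|x) directly.
disc = LogisticRegression().fit(X, y)

print(gen.predict_proba(X[:3]))   # posterior via Bayes' rule
print(disc.predict_proba(X[:3]))  # posterior modelled directly
```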

5
Q

Describe the difference between PCA and LDA projections.

A

PCA - Projects the data points onto a lower-dimensional subspace such that the variance of the projected data is maximized. Doesn’t use the class labels.
LDA - Projects the data points onto a lower-dimensional subspace such that the between-class scatter Sb is maximized and the within-class scatter Sw is minimized. Uses the class labels.
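
The two objectives for a one-dimensional projection direction w, as a sketch (S is the total scatter matrix):

```latex
% PCA: maximize the variance of the projected data (labels ignored)
w_{\mathrm{PCA}} = \arg\max_{w} \frac{w^{\top} S\, w}{w^{\top} w}

% LDA: maximize the Fisher criterion, the ratio of between-class
% scatter S_b to within-class scatter S_w
w_{\mathrm{LDA}} = \arg\max_{w} \frac{w^{\top} S_b\, w}{w^{\top} S_w\, w}
```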

6
Q

Why would either LDA or PCA work better if the goal is to train a classifier on the projected version of the given dataset?
(The dataset has linearly separable classes that don’t overlap.)

A

LDA will work better, since the observations are labelled with classes: it explicitly chooses the projection that separates the classes, whereas PCA may project away the discriminative direction if it carries little variance.

7
Q

Suppose a logistic regression classifier is trained on a two-dimensional dataset with two classes. The features are x0, x1, x2, and the weights of the classifier corresponding to each of these features are w0, w1, w2.

Now assume that we add a regularization term of the form α(w_n)^2 to the negative log-likelihood before minimizing over the weights, where α is a positive real number. If α is very large, how does this affect the weights obtained during minimization?

A

During the minimization the penalty drives the penalized weight w_n toward zero, preventing it from reaching its otherwise optimal value. The corresponding feature is effectively switched off, and the remaining features dominate.
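
In symbols, a sketch of the penalized objective and its limiting behaviour (L denotes the likelihood):

```latex
% Penalized objective: negative log-likelihood plus the quadratic term
J(w) = -\log L(w) + \alpha\, w_n^2

% For very large alpha the penalty dominates, so minimization
% drives the penalized weight toward zero:
\alpha \to \infty \quad \Rightarrow \quad w_n \to 0
```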

8
Q

You need to find the most likely state sequence for the observations [x1, x2, x3]. Name the algorithm you would use for this.

A

Viterbi
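
A minimal, self-contained Viterbi sketch for a discrete HMM; the toy transition/emission numbers are illustrative assumptions, not from the card:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely state sequence for a discrete-observation HMM.

    obs: sequence of observation indices
    pi:  (n_states,) initial state probabilities
    A:   (n_states, n_states) transitions, A[i, j] = p(state j | state i)
    B:   (n_states, n_symbols) emissions, B[i, k] = p(symbol k | state i)
    """
    n_states, T = len(pi), len(obs)
    delta = np.empty((T, n_states))           # best log-prob ending in each state
    psi = np.zeros((T, n_states), dtype=int)  # back-pointers

    logA, logB = np.log(A), np.log(B)
    delta[0] = np.log(pi) + logB[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA  # scores[i, j]: via state i to j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + logB[:, obs[t]]

    # Backtrack from the best final state.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

# Toy 2-state HMM (illustrative numbers).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.5], [0.1, 0.9]])
print(viterbi([0, 1, 1], pi, A, B))  # most likely states for [x1, x2, x3]
```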
