Quiz 5 Flashcards

1
Q

True or False

In Support Vector Machines, we maximize ‖w‖^2/2 subject to the margin constraints.

A

False
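
For reference, the standard hard-margin primal minimizes, rather than maximizes, this quantity:

```latex
\min_{w,\,b}\ \frac{\|w\|^2}{2}
\quad \text{subject to} \quad y_i\,(w^\top x_i + b) \ge 1 \;\;\text{for all } i
```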

2
Q

True or False

In kernelized SVMs, the kernel matrix K has to be positive definite.

A

True
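
A minimal numerical sketch (assuming NumPy; the RBF kernel and random data are illustrative) of checking definiteness: a valid kernel matrix is symmetric with no negative eigenvalues:

```python
import numpy as np

def rbf_kernel_matrix(X, gamma=1.0):
    # K[i, j] = exp(-gamma * ||x_i - x_j||^2)
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq_dists)

X = np.random.randn(20, 3)          # 20 sample points in R^3
K = rbf_kernel_matrix(X)
eigvals = np.linalg.eigvalsh(K)     # K is symmetric, so eigvalsh applies
print(eigvals.min() >= -1e-10)      # no significantly negative eigenvalues
```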

3
Q

True or False

If two random variables are independent, then they have to be uncorrelated.

A

True
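
A quick numerical sketch (assuming NumPy) of both directions: independent draws come out uncorrelated, while y = x² shows the converse fails:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)

# Independent pair: sample correlation is approximately zero.
z = rng.standard_normal(100_000)
print(np.corrcoef(x, z)[0, 1])   # ~0: independence implies uncorrelated

# Converse fails: y = x^2 is fully determined by x, yet uncorrelated with it.
y = x ** 2
print(np.corrcoef(x, y)[0, 1])   # ~0 despite strong dependence
```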

4
Q

True or False

Isocontours of Gaussian distributions have axes whose lengths are proportional to the eigenvalues of the covariance matrix.

A

False
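
A short sketch (assuming NumPy) of the correct relationship: the semi-axis lengths of an isocontour scale with the square roots of the eigenvalues of Σ:

```python
import numpy as np

Sigma = np.array([[4.0, 0.0],
                  [0.0, 1.0]])
eigvals, eigvecs = np.linalg.eigh(Sigma)

# The 1-sigma isocontour of N(0, Sigma) is an ellipse whose semi-axes
# point along the eigenvectors with lengths sqrt(eigenvalue):
print(np.sqrt(eigvals))   # [1., 2.], not [1., 4.]
```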

5
Q

True or False

Cross validation will guarantee that our model does not overfit.

A

False

6
Q

True or False

In logistic regression, the Hessian of the (non-regularized) log likelihood is positive definite.

A

False
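
A numerical sketch (assuming NumPy; the data and weights are random placeholders) of why it is False: the log likelihood is concave, so its Hessian −XᵀSX is negative semidefinite:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
w = rng.standard_normal(3)

s = 1.0 / (1.0 + np.exp(-X @ w))   # sigmoid predictions
S = np.diag(s * (1.0 - s))
H = -X.T @ S @ X                   # Hessian of the log likelihood

print(np.linalg.eigvalsh(H).max() <= 1e-10)  # all eigenvalues <= 0: NSD
```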

7
Q

True or False

Given a binary classification scenario with Gaussian class conditionals and equal prior probabilities, the optimal decision boundary will be linear.

A

False

8
Q

True or False

The hyperparameters in the regularized logistic regression model are η (learning rate) and λ (regularization term).

A

False

9
Q

The Bayes risk for a decision problem is zero when…

A

the class distributions P(X|Y) do not overlap, or the prior probability for one class is 1.

10
Q

Gaussian discriminant analysis…

A

models P(Y = y|X) as a logistic function, is an example of a generative model and can be used to classify points without ever computing an exponential.
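
A sketch of the last point (assuming NumPy; mu, Sigma, and prior stand for already-fitted parameters): comparing log-posterior discriminants classifies a point without evaluating any exponential:

```python
import numpy as np

def log_discriminant(x, mu, Sigma, prior):
    # Q(x) = log( N(x; mu, Sigma) * prior ) up to a shared constant.
    d = x - mu
    return (-0.5 * d @ np.linalg.solve(Sigma, d)
            - 0.5 * np.log(np.linalg.det(Sigma))
            + np.log(prior))

# Predict class C over class D iff Q_C(x) > Q_D(x); only logs, no exp.
```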

11
Q

Ridge regression…

A

reduces variance at the expense of higher bias.
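
For reference, the standard ridge solution; increasing λ shrinks the weights toward zero, trading variance for bias:

```latex
\hat{w}_{\text{ridge}} = (X^\top X + \lambda I)^{-1} X^\top y
```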

12
Q

Logistic regression…

A

minimizes a convex cost function and can be used with a polynomial kernel.

13
Q

In least-squares linear regression, imposing a Gaussian prior on the weights is equivalent to…

A

L2 regularization
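
A sketch of the standard derivation, assuming Gaussian noise with variance σ² and a prior w ~ N(0, σ_p² I) (these symbols are introduced here for illustration):

```latex
\hat{w}_{\mathrm{MAP}}
= \arg\max_w \; \log p(y \mid X, w) + \log p(w)
= \arg\min_w \; \|y - Xw\|^2 + \lambda \|w\|^2,
\qquad \lambda = \sigma^2 / \sigma_p^2
```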

14
Q

In terms of the bias-variance trade-off, which of the following is/are substantially more harmful to the test error than the training error?

Bias
Loss
Variance
Risk

A

Variance

15
Q

In Gaussian discriminant analysis, if two classes come from Gaussian distributions that have different means, may or may not have different covariance matrices, and may or may not have different priors, what are some of the possible decision boundary shapes?

A

1) hyperplane
2) a nonlinear quadric surface (quadric = the isosurface of a quadratic function)
3) the empty set (the classifier always returns the same class)

16
Q

Why might we prefer to minimize the sum of absolute residuals instead of the residual sum of squares for some data sets?

(Hint: What is one of the flaws of least-squares regression?)

A

The sum of absolute residuals is less sensitive to outliers than the residual sum of squares.
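
A tiny numeric sketch of the contrast (the values are made up): squaring lets one outlier dominate the total loss:

```python
residuals = [1, -1, 2, 50]                 # one large outlier
print(sum(abs(r) for r in residuals))      # 54   (outlier contributes 50)
print(sum(r * r for r in residuals))       # 2506 (outlier contributes 2500)
```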

17
Q

You train a linear classifier on 10,000 training points and discover that the training accuracy is only 67%. Which of the following, done in isolation, has a good chance of improving your training accuracy?

Add novel features
Train on more data
Use linear regression
Train on less data

A

Add novel features

Train on less data

18
Q

In least-squares linear regression, adding a regularization term can…

A

increase training error, increase validation error, or decrease validation error (each is possible, depending on λ and the data).

19
Q

Recall the data model, yi = f(Xi) + εi, that justifies the least-squares cost function in regression.

The statistical assumptions of this model, for all i, are…

A

εi comes from a Gaussian distribution, all εi have the same mean, and all yi have the same variance.
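
In symbols, the standard form of these assumptions (zero being the shared mean):

```latex
y_i = f(X_i) + \varepsilon_i,
\qquad \varepsilon_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2)
```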

20
Q

How does ridge regression compare to linear regression with respect to the bias-variance tradeoff?

A

Ridge regression usually has higher bias, and ridge regression’s variance approaches zero as the regularization parameter λ → ∞.

21
Q

Which of the following quantities affect the bias-variance trade-off?

A

λ, the regularization coefficient in ridge regression
C, the slack parameter in soft-margin SVM
d, the polynomial degree in least-squares regression

22
Q

MLE, applied to estimate the mean parameter of a normal distribution N(μ, Σ) with a known covariance matrix Σ, returns…

A

the mean of the sample points.
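
A sketch of the usual derivation: setting the gradient of the log likelihood with respect to μ to zero yields the sample mean:

```latex
\nabla_\mu \sum_{i=1}^{n} \log \mathcal{N}(x_i;\, \mu, \Sigma)
= \Sigma^{-1} \sum_{i=1}^{n} (x_i - \mu) = 0
\quad\Longrightarrow\quad
\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i
```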

23
Q

Maximizing the log likelihood is equivalent to…

A

maximizing the likelihood.

24
Q

What is the maximum number of points in the Bayes optimal decision boundary?

(Note: as the distribution is discrete, we are really asking for the maximum number of integral values of k where the classifier makes a transition from predicting one class to the other.)

A

As f is linear in k, there is only one root, and the decision boundary is a single point.