Quiz 5 Flashcards
True or False
In Support Vector Machines, we maximize (║w║^2)/2 subject to the margin constraints.
False
True or False
In kernelized SVMs, the kernel matrix K has to be positive semidefinite.
True
True or False
If two random variables are independent, then they have to be uncorrelated.
True
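The converse fails, which is worth pinning down with a concrete (made-up) example: an uncorrelated pair that is clearly dependent.

```python
# Sketch (hypothetical example): independence implies zero correlation,
# but the converse fails. Take X uniform on {-1, 0, 1} and Y = X^2.
# Then E[X] = 0 and E[XY] = E[X^3] = 0, so Cov(X, Y) = 0,
# yet Y is a deterministic function of X (clearly dependent).

from fractions import Fraction

support = [-1, 0, 1]      # X uniform on these values
p = Fraction(1, 3)        # P(X = x) for each x

e_x  = sum(p * x        for x in support)   # E[X]
e_y  = sum(p * x * x    for x in support)   # E[Y] = E[X^2]
e_xy = sum(p * x * x**2 for x in support)   # E[XY] = E[X^3]

cov = e_xy - e_x * e_y
print(cov)  # 0 -> uncorrelated, despite Y = X^2 depending on X
```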
True or False
Isocontours of Gaussian distributions have axes whose lengths are proportional to the eigenvalues of the
covariance matrix.
False
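The axis lengths actually scale with the square roots of the eigenvalues; a small sketch with a made-up diagonal covariance:

```python
# Sketch: for covariance Sigma = diag(9, 1), the eigenvalues are 9 and 1.
# The isocontour x^T Sigma^{-1} x = c is the ellipse x1^2/9 + x2^2 = c,
# with semi-axes sqrt(9c) and sqrt(c): axis lengths scale with the SQUARE
# ROOTS of the eigenvalues (3:1), not the eigenvalues themselves (9:1).

import math

eigvals = [9.0, 1.0]   # eigenvalues of the diagonal covariance
c = 2.0                # any contour level works; the ratio is independent of c

semi_axes = [math.sqrt(lam * c) for lam in eigvals]
ratio = semi_axes[0] / semi_axes[1]
print(ratio)  # ~3.0, the ratio of sqrt-eigenvalues, not 9.0
```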
True or False
Cross validation will guarantee that our model does not overfit.
False
True or False
In logistic regression, the Hessian of the (non-regularized) log likelihood is positive definite.
False
True or False
Given a binary classification scenario with Gaussian class conditionals and equal prior probabilities, the
optimal decision boundary will be linear.
False
True or False
The hyperparameters in the regularized logistic regression model are η (learning rate) and λ (regularization
term).
False
The Bayes risk for a decision problem is zero when…
the class distributions P(X|Y) do not overlap and the prior probability for one class is 1.
Gaussian discriminant analysis…
models P(Y = y|X) as a logistic function, is an example of a generative model and can be used to classify points without ever computing an exponential.
Ridge regression…
reduces variance at the expense of higher bias.
Logistic regression…
minimizes a convex cost function and can be used with a polynomial kernel.
In least-squares linear regression, imposing a Gaussian prior on the weights is equivalent to…
L2 regularization
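A sketch of why, via MAP estimation (σ is the noise scale, τ the prior scale):

```latex
% MAP with Gaussian likelihood y_i ~ N(w^T x_i, sigma^2) and prior w ~ N(0, tau^2 I):
\hat{w} = \arg\max_w \; \log P(y \mid X, w) + \log P(w)
        = \arg\min_w \; \frac{1}{2\sigma^2}\sum_i (y_i - w^\top x_i)^2
                      + \frac{1}{2\tau^2}\lVert w \rVert_2^2,
% i.e. ridge regression with \lambda = \sigma^2 / \tau^2.
```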
In terms of the bias-variance trade-off, which of the following is/are substantially more harmful to the test error than the training error?
Bias
Loss
Variance
Risk
Variance
In Gaussian discriminant analysis, if two classes come from Gaussian distributions that have different means, may or may not have different covariance matrices, and may or may not have different priors, what are some of the possible decision boundary shapes?
1) hyperplane
2) a nonlinear quadric surface (quadric = the isosurface of a quadratic function)
3) the empty set (the classifier always returns the same class)
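These shapes fall out of the quadratic discriminant functions; a sketch of the argument:

```latex
% QDA decision rule: pick the class C maximizing the quadratic discriminant
Q_C(x) = -\tfrac{1}{2}(x-\mu_C)^\top \Sigma_C^{-1}(x-\mu_C)
         -\tfrac{1}{2}\ln\lvert\Sigma_C\rvert + \ln \pi_C.
% The boundary Q_1(x) = Q_2(x) is a quadric in general; the quadratic terms
% cancel when \Sigma_1 = \Sigma_2, leaving a hyperplane. If one class's
% discriminant dominates everywhere, the boundary is the empty set.
```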
Why might we prefer to minimize the sum of absolute residuals instead of the residual sum of squares for some data sets?
(Hint: What is one of the
flaws of least-squares regression?)
The sum of absolute residuals is less sensitive to outliers than the residual sum of squares.
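A minimal illustration with made-up data, fitting a constant model (where least squares gives the mean and least absolute deviations gives the median):

```python
# Sketch (hypothetical data): fit a constant model to y = [1, 2, 3, 100].
# The least-squares fit is the mean; the least-absolute-residuals fit is
# the median. One outlier drags the mean far more than the median.

import statistics

y = [1, 2, 3, 100]

ls_fit  = statistics.mean(y)    # minimizes sum of squared residuals -> 26.5
lad_fit = statistics.median(y)  # minimizes sum of absolute residuals -> 2.5

print(ls_fit, lad_fit)  # 26.5 2.5
```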
You train a linear classifier on 10,000 training points and discover that the training accuracy is only 67%. Which
of the following, done in isolation, has a good chance of improving your training accuracy?
Add novel features
Train on more data
Use linear regression
Train on less data
Add novel features
Train on less data
In least-squares linear regression, adding a regularization term can…
increase training error, and either increase or decrease validation error.
Recall the data model, yi = f(Xi) + εi, that justifies the least-squares cost function in regression.
The statistical assumptions of this model, for all i, are…
εi comes from a Gaussian distribution, all εi have the same mean, and all yi have the same variance.
How does ridge regression compare to linear regression with respect to the bias-variance tradeoff?
Ridge regression usually has higher bias, and its variance approaches zero as the regularization parameter λ → ∞.
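A sketch in one dimension, where ridge has a simple closed form (data are made up for illustration):

```python
# Sketch: in one dimension, ridge regression has the closed form
#   w(lambda) = sum(x_i * y_i) / (sum(x_i^2) + lambda).
# As lambda -> infinity, w -> 0, so the predictor becomes the constant 0
# regardless of the training sample: its variance across training sets
# vanishes, while its bias grows.

xs = [1.0, 2.0, 3.0]   # hypothetical inputs
ys = [1.1, 1.9, 3.2]   # hypothetical targets

def ridge_1d(lam):
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

for lam in [0.0, 1.0, 100.0, 1e6]:
    print(lam, ridge_1d(lam))  # weight shrinks toward 0 as lambda grows
```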
Which of the following quantities affect the bias-variance trade-off?
λ, the regularization coefficient in ridge regression
C, the slack parameter in soft-margin SVM
d, the polynomial degree in least-squares regression
MLE, applied to estimate the mean parameter of a normal distribution N(μ, Σ) with a known covariance matrix Σ, returns…
the mean of the sample points.
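A sketch of the derivation:

```latex
% Log-likelihood of n samples x_1, ..., x_n under N(mu, Sigma), Sigma known:
\ell(\mu) = -\frac{1}{2}\sum_{i=1}^{n}(x_i-\mu)^\top \Sigma^{-1}(x_i-\mu) + \text{const}
% Setting the gradient to zero:
\nabla_\mu \ell = \Sigma^{-1}\sum_{i=1}^{n}(x_i - \mu) = 0
\;\Longrightarrow\;
\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i.
```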
Maximizing the log likelihood is equivalent to…
maximizing the likelihood.
What is the maximum number of points in the Bayes optimal decision boundary?
(Note: as the distribution is
discrete, we are really asking for the maximum number of integral values of k where the classifier makes a transition from predicting one class to the other.)
As f is linear in k, there is only one root, and the decision boundary is a single point.