ML midterm Flashcards

questions

1
Q

Consider a set of classifiers that includes all linear classifiers that use different choices of strict subsets of the components of the input vectors x ∈ R^d. Claim: the VC-dimension of this combined set cannot be more than d+1.

A

True

2
Q

It is impossible to overfit to the test set if one only uses it for evaluation.

A

False

3
Q

If the target function is deterministic, overfitting cannot occur.

A

False

4
Q

For a given linear regression problem with weight decay, there is a λ>0 regularization coefficient such that the optimal weight vector of the regularized problem is the same as for the unregularized problem.

A

True
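
For reference, both problems have closed-form solutions, so the claim can be checked directly for any λ. A minimal numpy sketch with hypothetical toy data; w_lin is the unregularized least-squares solution and w_reg the weight-decay (ridge) solution:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 3))                      # hypothetical inputs
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)
    lam = 0.1                                         # regularization coefficient lambda

    # unregularized least squares: w = (X^T X)^{-1} X^T y
    w_lin = np.linalg.solve(X.T @ X, X.T @ y)

    # weight decay (ridge): w = (X^T X + lambda I)^{-1} X^T y
    w_reg = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

    print(w_lin, w_reg)                               # compare the two solutions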

5
Q

If the VC dimension of a model is infinite, then VC theory still guarantees generalization, but the bound is looser.

A

False

6
Q

If a data set of size k cannot be shattered by a hypothesis set H, then k is H’s break point.

A

True

7
Q

The cross-entropy error for a logistic regression model has an upper bound when all the samples are classified correctly.

A

True

8
Q

Consider the following hypotheses: h1(x) = w1 x1 and h2(x) = 1 − w1 x1, where x1 is the first component of the input vector x. Then, for any dataset, the absolute difference between Eout and Ein can be proven to be the same for the two hypotheses.

A

False

9
Q

Underfitting occurs when the in sample error gets larger than the out of sample error.

A

False

10
Q

Hard Support Vector Machines work by trying to find a separating hyperplane that correctly classifies all elements in the training dataset while maximizing the distance of the nearest points to this separator.

A

True
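
A minimal sketch of this idea, assuming scikit-learn is available: the hard margin can be approximated with a linear SVC and a very large penalty C, so that no training error is tolerated (the toy data and variable names are hypothetical):

    import numpy as np
    from sklearn.svm import SVC

    # linearly separable toy data
    X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-3.0, -2.0]])
    y = np.array([1, 1, -1, -1])

    # very large C approximates the hard-margin SVM: classify all points
    # correctly while maximizing the distance of the nearest points
    clf = SVC(kernel="linear", C=1e10)
    clf.fit(X, y)

    print(clf.coef_, clf.intercept_)   # separating hyperplane w, b
    print(clf.support_vectors_)        # the nearest points (support vectors)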

11
Q

Stochastic Gradient Descent is made stochastic with respect to normal Gradient Descent by adding a randomized extra noise term to the estimation of the gradient.

A

False
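
The actual difference is illustrated by this minimal numpy sketch (hypothetical squared-error setup): SGD evaluates the same gradient formula as batch Gradient Descent, but on a single randomly chosen sample per step, with no extra noise term added anywhere:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    y = X @ np.array([1.0, -1.0, 2.0])
    w = np.zeros(3)
    eta = 0.01                                   # step size

    for step in range(2000):
        i = rng.integers(len(X))                 # randomness: pick one sample
        grad_i = 2 * (X[i] @ w - y[i]) * X[i]    # gradient of the error on that sample
        w -= eta * grad_i                        # same update rule as GD, no added noise

    print(w)                                     # approaches [1, -1, 2]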

12
Q

If the target function is not a linear function, then the training dataset cannot be linearly separable.

A

False

13
Q

Polynomial regression is not a linear problem, so Stochastic Gradient Descent cannot be used.

A

False
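
Polynomial regression is linear in the weights once the inputs are mapped to polynomial features, so gradient-based solvers such as SGD apply directly. A sketch under that view (hypothetical data, scikit-learn assumed available):

    import numpy as np
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import SGDRegressor

    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, size=(200, 1))
    y = 1.0 - 2.0 * x[:, 0] + 3.0 * x[:, 0] ** 2       # quadratic target

    # nonlinear transform of the input; the model stays linear in w
    Z = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)

    # Stochastic Gradient Descent on the transformed (linear) problem
    model = SGDRegressor(max_iter=2000, tol=1e-6).fit(Z, y)
    print(model.intercept_, model.coef_)               # roughly [1.0], [-2.0, 3.0]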

14
Q

The Hoeffding Inequality implies that the increase of the generalization error is bounded from below by a logarithmic function of the growth of the size of the training set.

A

False
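
For reference, the Hoeffding bound for a single hypothesis, with dataset size N and tolerance ε, has the form

    \mathbb{P}\big[\,|E_{\text{in}} - E_{\text{out}}| > \epsilon\,\big] \le 2\,e^{-2\epsilon^2 N}

so the probability of a large gap between Ein and Eout decays exponentially as N grows; the inequality gives no lower bound on the generalization error.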

15
Q

The test set can be used to estimate the out of sample error.

A

True

16
Q

The following model is linear (with respect to the parameters w):

h(x) = w1 sin(x1) + e^(w2 x2)

A

False

17
Q

There exists a dataset of size dVC+1 for which Ein>0.

A

True

18
Q

If the training set is linearly separable, then the pocket algorithm returns a worse model (in approximating the target function) than PLA.

A

False

19
Q

The Ein obtained by normal linear regression is not larger than the Ein obtained by linear regression with weight decay.

A

True

20
Q

The gradient of the error with respect to the weights can be estimated on one sample and this estimation is unbiased.

A

True
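
A quick numerical check of this claim (numpy, hypothetical squared-error setup): averaging the per-sample gradients over all samples recovers the full-batch gradient, so a single-sample gradient drawn uniformly at random is an unbiased estimate of it:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))
    y = rng.normal(size=50)
    w = rng.normal(size=3)

    # full-batch gradient of the mean squared error
    batch_grad = 2 * X.T @ (X @ w - y) / len(X)

    # expectation of the single-sample gradient under uniform sampling
    per_sample = np.array([2 * (X[i] @ w - y[i]) * X[i] for i in range(len(X))])
    print(np.allclose(per_sample.mean(axis=0), batch_grad))   # True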

21
Q

The following model is linear (with respect to the parameters):

h(x) = w1 sin(x1) + w2 e^(x2) − w3 x1 x2

A

True
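
Because this model is linear in (w1, w2, w3), its weights can be fit by ordinary least squares on the transformed features; a minimal numpy sketch with hypothetical data generated from known weights:

    import numpy as np

    rng = np.random.default_rng(0)
    x1, x2 = rng.uniform(-1, 1, size=(2, 100))
    y = 0.5 * np.sin(x1) + 2.0 * np.exp(x2) - 1.5 * x1 * x2

    # one column per fixed nonlinear feature; the model is h(x) = Z @ w
    Z = np.column_stack([np.sin(x1), np.exp(x2), -x1 * x2])
    w, *_ = np.linalg.lstsq(Z, y, rcond=None)
    print(w)   # recovers approximately [0.5, 2.0, 1.5]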

22
Q

By using the cross-entropy error measure, logistic regression is guaranteed not to get stuck in a local minimum of the error function.

A

True

23
Q

Ein=0 for all datasets of size at most dVC.

A

True

24
Q

It is possible for a growth function to have more than one break point.

A

False

25
Q

In Gradient Descent, a step size that is too small can lead to slow learning.

A

True

26
Q

The training set can be used to estimate the in sample error

A

True

27
Q

The squared error measure can be decomposed into a bias and a variance term.

A

True
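
The decomposition in question, written with the average hypothesis \bar{h}(x) = E_D[h^{(D)}(x)] over datasets D and a deterministic target f, is

    \mathbb{E}_D\big[(h^{(D)}(x) - f(x))^2\big]
      = \underbrace{\mathbb{E}_D\big[(h^{(D)}(x) - \bar{h}(x))^2\big]}_{\text{variance}}
      + \underbrace{\big(\bar{h}(x) - f(x)\big)^2}_{\text{bias}}

with an additional noise term when the target itself is noisy.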

28
Q

In Gradient Descent, the step size should be as small as possible to avoid instability.

A

False

29
Q

The PLA is guaranteed to terminate in a finite number of steps

A

False

30
Q

In each step of Gradient Descent, the gradient of the error is added to the weights.

A

False

31
Q

The variance of a model measures the expected squared difference between the output of the model and the target value.

A

False

32
Q

When considering a misclassified sample, the PLA algorithm will move the separator in the direction of classifying that sample correctly.

A

True

33
Q

The Hoeffding Inequality implies that the generalisation error decreases as the model complexity is increased

A

False

34
Q

If we decrease the regularization coefficient, the generalization error decreases when using weight decay.

A

False

35
Q

Nonlinear transformation of the inputs cannot increase the VC dimension.

A

False

36
Q

There exists no closed analytical formula to determine the weights of a linear regression problem that minimise the in sample error.

A

False

37
Q

The Gradient Descent algorithm is used to find a local minimum in a logistic regression problem.

A

True
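
A minimal sketch of that use (numpy, hypothetical data): Gradient Descent on the logistic regression cross-entropy error; since this error is convex, the local minimum it finds is also the global one:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1, -1)      # labels in {-1, +1}
    w = np.zeros(2)
    eta = 0.5                                              # step size

    for _ in range(500):
        # gradient of (1/N) sum ln(1 + exp(-y_n w.x_n))
        s = y * (X @ w)
        grad = -(X * (y / (1 + np.exp(s)))[:, None]).mean(axis=0)
        w -= eta * grad                                    # move against the gradient

    print(w)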

38
Q

The Hoeffding inequality implies that the generalisation error grows as the size of the dataset grows

A

False

39
Q

The best stopping criterion for an iterative classification algorithm is to stop only when Ein stops decreasing.

A

False

40
Q

It is impossible for any learning algorithm to generalise to unseen examples, i.e. the output for those will be random.

A

False

41
Q

Stochastic Gradient Descent differs from normal Gradient Descent in that it adds an extra noise term to the estimated gradient.

A

False

42
Q

Regularization is commonly used to speed up the optimization process.

A

False

43
Q

Regularization can improve generalisation.

A

True

44
Q

Using a model with the same complexity as the target function does not prevent overfitting.

A

True

45
Q

When considering a misclassified sample, the PLA algorithm will move the separator in a way that ensures that after this update the sample will be classified correctly.

A

False

46
Q

Increasing the amount of testing data reduces overfitting.

A

False

47
Q

After hyperparameter optimization, the validation error is an unbiased estimate for Eout.

A

False

48
Q

For a hypothesis space H with VC dimension dVC: dVC ≥ N implies that there exists a dataset of size N that H shatters.

A

True

49
Q

Increasing the amount of training data reduces overfitting.

A

True

50
Q

A growth function with a break point k can be bounded by a polynomial of degree k−1.

A

True
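
The bound in question (Sauer's lemma) states that a growth function with break point k satisfies

    m_{\mathcal{H}}(N) \le \sum_{i=0}^{k-1} \binom{N}{i}

and the right-hand side is a polynomial in N of degree k−1.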

51
Q

Overfitting happens when the in sample error is larger (“goes over”) than the out of sample error.

A

False

52
Q

The validation set can be used to optimize hyperparameters.

A

True

53
Q

Polynomial regression is not a linear problem, so Stochastic Gradient Descent cannot be used.

A

False

54
Q

The perceptron model is a linear separator.

A

True

55
Q

Since the (logistic) sigmoid function is not linear, the logistic regression model is also nonlinear.

A

False

56
Q

Overfitting occurs when the out of sample error gets larger than the in sample error.

A

False

57
Q

The cross-entropy error for a logistic regression model does not have an upper bound.

A

True

58
Q

The Rademacher bound can be used to bound the 0-1 loss of a classification
model.

A

True

59
Q

The Ein obtained by normal linear regression is not larger than the Ein obtained by linear regression with weight decay.

A

True

60
Q

In Gradient Descent, as a general rule of thumb, the step size should be chosen to be small enough to keep the magnitude of the gradient under to avoid instability.

A

False

61
Q

When considering a misclassified sample, the PLA algorithm will move the separator in the direction of classifying that sample correctly; however, the sample might still be classified incorrectly after the weight update.

A

True
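
A tiny numpy sketch of one PLA step on a misclassified sample (hypothetical values for w, x and the label y): the update w ← w + y·x moves the separator toward classifying the sample correctly, yet in this instance the sample is still misclassified afterwards:

    import numpy as np

    w = np.array([5.0, 0.0, 0.0])      # current weights
    x = np.array([1.0, 0.1, 0.1])      # sample (with bias component)
    y = -1                             # its true label

    print(np.sign(w @ x))              # +1: misclassified, since y = -1
    w = w + y * x                      # PLA update toward the correct side
    print(np.sign(w @ x))              # still +1: not yet fixed after one update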

62
Q

The weight vector (the normal of the separating hyperplane) in an SVM is in the span of the input samples x_i.

A

True
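
This is visible from the form of the SVM solution: with the multipliers α_i ≥ 0 obtained from the dual problem,

    w = \sum_i \alpha_i\, y_i\, x_i

so w is a linear combination of the training inputs, with nonzero α_i only for the support vectors.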

63
Q

A not linearly separable dataset can become linearly separable if further features are added.

A

True
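
A standard illustration of this (numpy, hypothetical data): points labeled by whether they lie inside a circle are not linearly separable in (x1, x2), but adding the feature x1² + x2² makes them separable by a simple threshold on that new coordinate:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(200, 2))
    y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 < 0.5, 1, -1)   # inside/outside a circle

    # add a feature: squared distance from the origin
    r2 = (X ** 2).sum(axis=1)
    Z = np.column_stack([X, r2])

    # in the extended space the hyperplane r2 = 0.5 separates the classes
    y_hat = np.where(Z[:, 2] < 0.5, 1, -1)
    print((y_hat == y).all())   # True: linearly separable after the transform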

64
Q

In each step of Gradient Descent, the gradient of the error is subtracted from the weights.

A

True

65
Q

If the training set is not linearly separable, then the perceptron learning algorithm always goes into an infinite loop.

A

True
