ML midterm Flashcards
questions
Consider a set of classifiers that includes all linear classifiers that use different choices of strict subsets of the components of the input vectors x ∈ R^d. Claim: the VC dimension of this combined set cannot be more than d + 1.
True
It is impossible to overfit to the test set if one only uses it for evaluation.
False
If the target function is deterministic, overfitting cannot occur.
False
For a given linear regression problem with weight decay, there is a regularization coefficient λ > 0 such that the optimal weight vector of the regularized problem is the same as for the unregularized problem.
True
If the VC dimension of a model is infinite, then VC theory still guarantees generalization, but the bound is looser.
False
If no data set of size k can be shattered by a hypothesis set H, then H's break point is at most k.
True
The cross-entropy error for a logistic regression model has an upper bound when all the samples are classified correctly.
True
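Why: with the per-sample cross-entropy error e_n(w) = ln(1 + e^(−y_n wᵀx_n)) (the Learning From Data convention; assuming this is the form used here), correct classification means y_n wᵀx_n > 0, so
e_n(w) = ln(1 + e^(−y_n wᵀx_n)) < ln(1 + e^0) = ln 2,
and therefore Ein < ln 2 whenever every sample is classified correctly.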
Consider the following hypotheses: h1(x) = w1 x1 and h2(x) = 1 − w1 x1, where x1 is the first component of the input vector x. Then for any dataset, the absolute difference between Eout and Ein can be proven to be the same for the two hypotheses.
False
Underfitting occurs when the in-sample error gets larger than the out-of-sample error.
False
Hard-margin Support Vector Machines work by finding a separating hyperplane that correctly classifies every element of the training dataset while maximizing the distance from the separator to the nearest points (the margin).
True
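A minimal sketch of the hard-margin idea, assuming scikit-learn and numpy are available: a linear SVC with a very large C approximates the hard-margin solution on separable toy data (the exact formulation minimizes ||w||² subject to y_n(wᵀx_n + b) ≥ 1).

# Hard-margin SVM approximation: a huge C forbids margin violations in practice.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [3.0, 1.0]])
y = np.array([-1, -1, 1, 1])          # linearly separable toy data
clf = SVC(kernel="linear", C=1e10).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]
print(w, b, 1.0 / np.linalg.norm(w))  # separator and the margin it achieves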
Stochastic Gradient Descent is made stochastic, relative to ordinary Gradient Descent, by adding a randomized extra noise term to the gradient estimate.
False
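What actually makes it stochastic is the random choice of the sample the gradient is computed on; a minimal sketch with squared error on toy data (numpy assumed):

import numpy as np
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
w = np.zeros(3)
for _ in range(2000):
    n = rng.integers(len(y))                  # the stochastic part: a random sample index
    w -= 0.05 * 2 * (X[n] @ w - y[n]) * X[n]  # exact gradient on that one sample, no added noise
print(w)  # close to [1, -2, 0.5]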
If the target function is not a linear function, then the training dataset cannot be linearly separable.
False
Polynomial regression is not a linear problem, so Stochastic Gradient Descent cannot be used.
False
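Because polynomial regression is linear in the weights once the inputs are transformed, plain SGD applies unchanged; a sketch fitting y ≈ w0 + w1·x + w2·x² on toy data (numpy assumed):

import numpy as np
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=200)
y = 2.0 - x + 3.0 * x**2 + 0.05 * rng.normal(size=200)
Z = np.column_stack([np.ones_like(x), x, x**2])  # nonlinear in x, linear in w
w = np.zeros(3)
for _ in range(5000):
    n = rng.integers(len(y))
    w -= 0.1 * 2 * (Z[n] @ w - y[n]) * Z[n]      # ordinary SGD step on the transformed features
print(w)  # approaches [2, -1, 3]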
The Hoeffding Inequality implies that the generalization error is bounded from below by a logarithmic function of the size of the training set.
False
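For reference, the single-hypothesis Hoeffding bound shrinks with the training set size N:
P[ |Ein − Eout| > ε ] ≤ 2 e^(−2 ε² N),
so with confidence 1 − δ the generalization error is at most ε = sqrt( ln(2/δ) / (2N) ), which decreases like 1/√N.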
The test set can be used to estimate the out-of-sample error.
True
The following model is linear (with respect to the parameters w):
h(x) = w1 sin(x1) + e^(w2 x2)
False
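Why: a model that is linear in the parameters must have the form h(x) = Σ_i w_i φ_i(x) for fixed basis functions φ_i, and e^(w2 x2) cannot be written that way; for instance ∂h/∂w2 = x2 e^(w2 x2) still depends on w2.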
There exists a dataset of size d_VC + 1 on which every hypothesis has Ein > 0.
True
If the training set is linearly separable, then the pocket algorithm returns a worse model (in approximating the target function) than PLA.
False
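A sketch of the pocket algorithm (standard PLA update, numpy assumed): it runs PLA but keeps the best weights seen so far, so on separable data it can also reach Ein = 0.

import numpy as np

def pocket(X, y, steps=1000):
    rng = np.random.default_rng(0)
    w = np.zeros(X.shape[1])
    best_w, best_err = w.copy(), np.mean(np.sign(X @ w) != y)
    for _ in range(steps):
        wrong = np.where(np.sign(X @ w) != y)[0]
        if len(wrong) == 0:
            return w                          # separable case: Ein = 0, same as PLA
        n = rng.choice(wrong)
        w = w + y[n] * X[n]                   # standard PLA update on a misclassified point
        err = np.mean(np.sign(X @ w) != y)
        if err < best_err:                    # the "pocket": remember the best weights so far
            best_w, best_err = w.copy(), err
    return best_w

X = np.column_stack([np.ones(4), [0.0, 1.0, 2.0, 3.0]])  # bias column + 1D separable data
y = np.array([-1, -1, 1, 1])
print(pocket(X, y))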
The Ein obtained by normal linear regression is not larger than the Ein obtained by linear regression with weight decay.
True
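A quick numeric check (numpy assumed, squared in-sample error): the unregularized least-squares fit minimizes Ein by definition, so weight decay can only match it or do worse on Ein.

import numpy as np
rng = np.random.default_rng(2)
X, y, lam = rng.normal(size=(50, 4)), rng.normal(size=50), 1.0
w_lin = np.linalg.solve(X.T @ X, X.T @ y)                    # ordinary least squares
w_reg = np.linalg.solve(X.T @ X + lam * np.eye(4), X.T @ y)  # with weight decay
Ein = lambda w: np.mean((X @ w - y) ** 2)
print(Ein(w_lin) <= Ein(w_reg))  # True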
The gradient of the error with respect to the weights can be estimated on a single, uniformly randomly chosen sample, and this estimate is unbiased.
True
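A numeric sanity check (numpy assumed, squared error): averaging the single-sample gradients over a uniformly chosen index reproduces the full-batch gradient exactly, which is what unbiasedness means here.

import numpy as np
rng = np.random.default_rng(3)
X, y, w = rng.normal(size=(20, 3)), rng.normal(size=20), rng.normal(size=3)
full = 2 * X.T @ (X @ w - y) / len(y)                             # full-batch gradient
single = np.array([2 * (X[n] @ w - y[n]) * X[n] for n in range(len(y))])
print(np.allclose(single.mean(axis=0), full))  # True: E[single-sample gradient] = full gradient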
The following model is linear (with respect to the parameters):
h(x) = w1 sin(x1) + w2 e^(x2) − w3 x1 x2
True
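Since h is linear in (w1, w2, w3), ordinary least squares on the fixed nonlinear features solves it in closed form; a sketch on a noise-free toy target (numpy assumed):

import numpy as np
rng = np.random.default_rng(4)
X = rng.uniform(-1, 1, size=(200, 2))
y = 1.5 * np.sin(X[:, 0]) + 0.5 * np.exp(X[:, 1]) - 2.0 * X[:, 0] * X[:, 1]
Z = np.column_stack([np.sin(X[:, 0]), np.exp(X[:, 1]), X[:, 0] * X[:, 1]])  # fixed features
w = np.linalg.lstsq(Z, y, rcond=None)[0]
print(w)  # recovers [1.5, 0.5, -2.0]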
By using the cross-entropy error measure, logistic regression is guaranteed not to get stuck in a local minimum of the error function.
True
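The guarantee comes from convexity: with θ(s) = 1/(1 + e^(−s)) and labels y_n ∈ {−1, +1}, the Hessian of the cross-entropy Ein is
∇²Ein(w) = (1/N) Σ_n θ(y_n wᵀx_n) (1 − θ(y_n wᵀx_n)) x_n x_nᵀ,
a positive semidefinite matrix, so every local minimum of the error surface is a global one.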
Ein = 0 can be achieved for every dataset of size at most d_VC.
False (the VC dimension only guarantees that some dataset of size d_VC can be shattered, not every one; e.g., three collinear points with alternating labels defeat the 2D perceptron even though its d_VC is 3)
It is possible for a growth function to have more than one break point.
False
In Gradient Descent, too small a step size can lead to slow learning.
True
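A tiny illustration on f(w) = w², whose gradient is 2w: with a step size that is too small the iterates barely move.

def gd(eta, steps=100, w=1.0):
    for _ in range(steps):
        w -= eta * 2 * w    # gradient descent step on f(w) = w**2
    return w

print(gd(0.5))    # 0.0 after one step: the minimizer
print(gd(0.001))  # ~0.82 after 100 steps: learning is slow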
The training set can be used to estimate the in-sample error.
True