ML midterm Flashcards
questions
Consider a set of classifiers that includes all linear classifiers that use different choices of strict subsets of the components of the input vectors x∈Rd. Claim: the VC-dimension of this combined set cannot be more than d+1.
True
It is impossible to overfit to the test set if one only uses it for evaluation.
False
If the target function is deterministic, overfitting cannot occur.
False
For a given linear regression problem with weight decay, there is a regularization coefficient λ>0 such that the optimal weight vector of the regularized problem is the same as for the unregularized problem.
True
If the VC dimension of a model is infinite, then VC theory still guarantees generalization, but the bound is looser.
False
If a data set of size k cannot be shattered by a hypothesis set H, then k is H’s break point.
True
The cross-entropy error for a logistic regression model has an upper bound when all the samples are classified correctly.
True
Consider the following hypotheses: h1(x)=w1x1 and h2(x)=1−w1x1, where x1 is the first component of the input vector x. Then for any dataset, the absolute difference between Eout and Ein can be proven to be the same for the two hypotheses.
False
Underfitting occurs when the in sample error gets larger than the out of sample error.
False
Hard Support Vector Machines work by trying to find a separating hyperplane that correctly classifies all elements in the training dataset while maximizing the distance of the nearest points to this separator.
True
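As a quick, hedged illustration of this maximal-margin idea, here is a minimal sketch assuming NumPy and scikit-learn are available; the toy data and the very large C (used to approximate the hard, no-slack formulation) are made up for illustration. The distance from the separator to the nearest points is 1/||w||.

```python
# A minimal hard-margin SVM sketch on a tiny, linearly separable toy set.
# A very large C approximates the "no slack allowed" hard-margin problem.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 2.0], [2.0, 3.0]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear", C=1e10)           # near-infinite C ~ hard margin
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("margin =", 1.0 / np.linalg.norm(w))   # distance to the nearest points
```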
Stochastic Gradient Descent is made stochastic with respect to normal Gradient Descent by adding a randomized extra noise term to the estimation of the gradient.
False
If the target function is not a linear function, then the training dataset cannot be linearly separable.
False
Polynomial regression is not a linear problem, so Stochastic Gradient Descent cannot be used.
False
The Hoeffding Inequality implies that the generalization error is bounded from below by a logarithmic function of the size of the training set.
False
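For intuition about what Hoeffding actually says, here is a small numeric sketch (ε and the values of N are chosen arbitrarily) of the bound P(|Ein − Eout| > ε) <= 2·exp(−2·ε²·N): the bound shrinks exponentially as the training-set size N grows.

```python
# Numeric illustration of the Hoeffding bound P(|Ein - Eout| > eps) <= 2*exp(-2*eps^2*N):
# the bound shrinks (exponentially) as the training-set size N grows.
import math

eps = 0.1
for N in (10, 100, 1000, 10000):
    bound = 2 * math.exp(-2 * eps**2 * N)
    print(f"N={N:6d}  bound={bound:.4g}")
```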
The test set can be used to estimate the out of sample error.
True
The following model is linear (with respect to the parameters w):
h(x) = w1 sin(x1) + e^(w2 x2)
False
There exists a dataset of size dVC+1 for which Ein>0.
True
If the training set is linearly separable, then the pocket algorithm returns a worse model (in approximating the target function) than PLA.
False
The Ein obtained by normal linear regression is not larger than the Ein obtained by linear regression with weight decay.
True
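This claim can be checked numerically: ordinary least squares minimizes Ein directly, so weight decay can only match or increase the in-sample error. Below is a minimal sketch on synthetic data (the dimensions, λ, and noise level are arbitrary), using the closed forms w = (XᵀX)⁻¹Xᵀy and w_reg = (XᵀX + λI)⁻¹Xᵀy and assuming NumPy is available.

```python
# Compare the in-sample squared error of plain linear regression and of
# linear regression with weight decay (ridge) on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
N, d = 50, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, d))])  # add bias column
y = X @ rng.normal(size=d + 1) + 0.1 * rng.normal(size=N)

lam = 1.0
w_ols = np.linalg.solve(X.T @ X, X.T @ y)                        # unregularized
w_reg = np.linalg.solve(X.T @ X + lam * np.eye(d + 1), X.T @ y)  # weight decay

ein = lambda w: np.mean((X @ w - y) ** 2)
print("Ein (OLS)   =", ein(w_ols))
print("Ein (ridge) =", ein(w_reg))   # always >= Ein (OLS)
```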
The gradient of the error with respect to the weights can be estimated on one sample and this estimation is unbiased.
True
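A small sanity check of the unbiasedness claim for the squared error (synthetic data, arbitrary sizes): averaging the single-sample gradients over a uniformly chosen index reproduces the full-batch gradient exactly.

```python
# The per-sample gradient of the squared error is an unbiased estimate of the
# full gradient: averaging it over all indices recovers the batch gradient.
import numpy as np

rng = np.random.default_rng(1)
N, d = 20, 4
X, y = rng.normal(size=(N, d)), rng.normal(size=N)
w = rng.normal(size=d)

grad_full = 2 * X.T @ (X @ w - y) / N                   # batch gradient
grad_single = lambda n: 2 * X[n] * (X[n] @ w - y[n])    # one-sample gradient
grad_avg = np.mean([grad_single(n) for n in range(N)], axis=0)

print(np.allclose(grad_full, grad_avg))  # True: expectation equals the batch gradient
```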
The following model is linear (with respect to the parameters):
h(x) = w1 sin(x1) + w2 e^(x2) − w3 x1 x2
True
By using the cross-entropy error measure, logistic regression is guaranteed not to get stuck in a local minimum of the error function.
True
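As a hedged sketch of why this holds: the cross-entropy error E(w) = (1/N) Σ ln(1 + exp(−yₙ wᵀxₙ)) is convex in w, so plain gradient descent cannot get trapped in a spurious local minimum. The code below (random synthetic data, arbitrary step size and iteration count) implements that error and a few descent steps.

```python
# Cross-entropy (log-loss) error of logistic regression and plain gradient descent.
import numpy as np

rng = np.random.default_rng(2)
N, d = 100, 3
X, y = rng.normal(size=(N, d)), rng.choice([-1.0, 1.0], size=N)

def cross_entropy(w):
    # E(w) = mean over n of log(1 + exp(-y_n * w.x_n)), written stably
    return np.mean(np.logaddexp(0.0, -y * (X @ w)))

def gradient(w):
    s = 1.0 / (1.0 + np.exp(y * (X @ w)))        # sigmoid(-y_n * w.x_n)
    return -(X * (y * s)[:, None]).mean(axis=0)

w, eta = np.zeros(d), 0.5
for _ in range(200):
    w -= eta * gradient(w)                        # plain gradient descent
print("final cross-entropy:", cross_entropy(w))
```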
Ein=0 for all datasets of size at most dVC.
True
It is possible for a growth function to have more than one break point.
False
In Gradient Descent, too small step size can lead to slow learning.
True
The training set can be used to estimate the in sample error.
True
The squared error measure can be decomposed into bias and variance terms.
True
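A hedged Monte-Carlo sketch of that decomposition (the target f(x) = sin x, the constant model, and the noise level and sizes are all illustrative choices): the measured expected squared error at a fixed test point approximately equals bias² + variance + noise.

```python
# Monte-Carlo sketch of the bias-variance decomposition of the squared error.
import numpy as np

rng = np.random.default_rng(3)
f = np.sin
sigma, N, trials = 0.3, 10, 20000
x_test, y_target = 1.0, f(1.0)

# Fit the constant model h(x) = mean(y) on many independent noisy datasets.
hs = np.array([np.mean(f(rng.uniform(0, 2, N)) + sigma * rng.normal(size=N))
               for _ in range(trials)])

bias2 = (hs.mean() - y_target) ** 2
variance = hs.var()
noise = sigma ** 2
expected_err = np.mean((hs - (y_target + sigma * rng.normal(size=trials))) ** 2)

print("bias^2 + var + noise   =", bias2 + variance + noise)
print("measured expected error =", expected_err)   # approximately equal
```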
In Gradient Descent, the step size should be as small as possible to avoid instability.
False
The PLA is guaranteed to terminate in a finite number of steps.
False
In each step of Gradient Descent, the gradient of the error is added to the weights.
False
The variance of a model measures the expected squared difference between the output of the model and the target value.
False
When considering a misclassified sample, the PLA algorithm will move the separator in the direction of classifying that sample correctly.
True
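For concreteness, here is a minimal sketch (made-up numbers) of the PLA update w ← w + y·x on one misclassified sample: the score wᵀx moves toward the correct sign by exactly ||x||², although, as a later card notes, one update may not be enough to flip it.

```python
# One PLA update on a misclassified sample: w <- w + y*x raises the score
# w.x by ||x||^2, i.e. toward the correct side, but may not yet flip its sign.
import numpy as np

w = np.array([-20.0, 0.5, -1.0])   # current weights (first entry is the bias)
x = np.array([1.0, 2.0, 3.0])      # input with a leading 1 for the bias
y = 1                              # true label

print("score before:", w @ x)      # -22.0 -> misclassified (sign != y)
if np.sign(w @ x) != y:
    w = w + y * x                  # perceptron update rule
print("score after: ", w @ x)      # -8.0 -> improved by ||x||^2 = 14, still wrong
```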
The Hoeffding Inequality implies that the generalization error decreases as the model complexity is increased.
False
If we decrease the regularization coefficient, the generalization error decreases when using weight decay.
False
Nonlinear transformation of the inputs cannot increase the VC dimension.
False
There exists no closed analytical formula to determine the weights of a linear regression problem that minimize the in sample error.
False
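The closed form in question is the normal-equation / pseudo-inverse solution w = (XᵀX)⁻¹Xᵀy. A minimal sketch on synthetic data (the sizes and coefficients are arbitrary), assuming NumPy is available:

```python
# Closed-form (pseudo-inverse) linear regression minimizing the in-sample squared error.
import numpy as np

rng = np.random.default_rng(4)
X = np.column_stack([np.ones(30), rng.normal(size=(30, 2))])  # bias + 2 features
y = X @ np.array([0.5, 2.0, -1.0]) + 0.05 * rng.normal(size=30)

w = np.linalg.pinv(X) @ y          # closed-form least-squares solution
print("w   =", w)
print("Ein =", np.mean((X @ w - y) ** 2))
```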
The Gradient Descent algorithm is used to find a local minimum in a logistic regression problem.
True
The Hoeffding Inequality implies that the generalization error grows as the size of the dataset grows.
False
The best stopping criterion for an iterative classification algorithm is to stop only when Ein stops decreasing.
False
It is impossible for any learning algorithm to generalize to unseen examples, i.e., the output for those will be random.
False
Stochastic Gradient Descent differs from normal Gradient Descent in that it adds an extra noise term after the estimation of the gradient.
False
Regularization is commonly used to speed up the optimization process.
False
Regularization can improve generalization.
True
Using a model with the same complexity as the target function does not prevent overfitting.
True
When considering a misclassified sample, the PLA algorithm will move the separator in a way to ensure that after this update the sample will be classified correctly.
False
Increasing the amount of testing data reduces overfitting.
False
After hyperparameter optimization, the validation error is an unbiased estimate for Eout.
False
For a hypothesis space H with VC dimension dVC: dVC >= N implies that there exists a dataset of size N that H shatters.
True
Increasing the amount of training data reduces overfitting.
True
A growth function with a break point k can be bounded by a polynomial of degree k−1.
True
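A small worked example of that bound (Sauer's lemma): with break point k, m_H(N) <= Σ_{i=0}^{k−1} C(N, i), a polynomial in N of degree k−1, which grows far more slowly than 2^N. Here k = 4, the break point of the 2-D perceptron; the values of N are arbitrary.

```python
# Growth-function bound with break point k: m_H(N) <= sum_{i=0}^{k-1} C(N, i),
# a polynomial in N of degree k-1.
from math import comb

def growth_bound(N, k):
    return sum(comb(N, i) for i in range(k))

k = 4                                      # e.g. the 2-D perceptron's break point
for N in (5, 10, 20, 40):
    print(N, growth_bound(N, k), 2 ** N)   # polynomial bound vs. 2^N
```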
Overfitting happens when the in sample error is larger (“goes over”) than the out of sample error.
False
The validation set can be used to optimize hyperparameters.
True
The perceptron model is a linear separator.
True
Since the (logistic) sigmoid function is not linear, the logistic regression model is also nonlinear.
False
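To see why the answer is False, note that the signal wᵀx enters the model linearly and the decision boundary sigmoid(wᵀx) = 0.5 is exactly the hyperplane wᵀx = 0, so the resulting classifier is a linear separator. A tiny sketch with made-up numbers:

```python
# The logistic model is sigmoid(w.x); its decision boundary sigmoid(w.x) = 0.5
# is the linear hyperplane w.x = 0, so the classifier is a linear separator.
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
w = np.array([0.5, -1.0, 2.0])

x = np.array([1.0, 1.0, 0.25])       # a point with w.x = 0
print(w @ x, sigmoid(w @ x))         # 0.0, 0.5 -> lies on the linear boundary
```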
Overfitting occurs when the out of sample error gets larger than the in sample error.
False
The cross-entropy error for a logistic regression model does not have an upper bound.
True
The Rademacher bound can be used to bound the 0-1 loss of a classification model.
True
The Ein obtained by normal linear regression is not larger than the Ein obtained by linear regression with weight decay.
True
In Gradient Descent, as a general rule of thumb, the step size should be chosen to be small enough to keep the magnitude of the gradient under to avoid instability.
False
When considering a misclassified sample, the PLA algorithm will move the separator in the direction of classifying that sample correctly; however, the sample might still be classified incorrectly after the weight update.
True
The weight vector (the normal of the separating hyperplane) in an SVM is in the span of the input samples x_i.
True
A dataset that is not linearly separable can become linearly separable if further features are added.
True
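As a minimal sketch of this card (toy numbers): four 1-D points labeled by whether |x| exceeds 1.5 are not separable by any threshold on x, but adding the feature x² makes them separable by a linear rule in the transformed space.

```python
# Labels y = sign(|x| - 1.5) are not linearly separable in the original 1-D input,
# but adding the feature x^2 makes them separable by the linear rule sign(x^2 - 2.25).
import numpy as np

x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([+1, -1, -1, +1])          # not separable by any threshold on x

z = np.column_stack([x, x ** 2])        # nonlinear feature transform
w, b = np.array([0.0, 1.0]), -2.25      # hyperplane in the transformed space
print(np.sign(z @ w + b) == y)          # all True -> linearly separable in z
```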
In each step of Gradient Descent, the gradient of the error is subtracted from the weights.
True
If the training set is not linearly separable, then the perceptron learning algorithm always goes into an infinite loop.
True