Quiz - CB Flashcards
To avoid overfitting, we should use a small training dataset.
False. More training data generally reduces overfitting; a small training set makes it worse.
Which of the following classifiers scale(s) poorly with the number of observations in terms of computational cost?
- k-nearest neighbours
- decision tree
- random forest
- SVM with RBF kernel

k-nearest neighbours and SVM with RBF kernel. kNN computes a distance to every stored training point at prediction time, and training a kernel SVM scales roughly between O(n^2) and O(n^3) in the number of observations.
On average, linear SVM performs better than logistic regression.
False; by the no-free-lunch theorem, no classifier outperforms another on average across all problems.
A multi-layer feedforward network with linear activation functions is more powerful than a single-layer feedforward network with linear activation functions.
False. A composition of linear maps is itself linear, W2(W1 x) = (W2 W1) x, so extra linear layers add no representational power over a single linear layer.
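A minimal numpy sketch (not from the card) of why the answer is False; the matrices W1 and W2 are arbitrary illustrative values:

```python
import numpy as np

# Two stacked linear layers collapse into one linear layer:
# W2 @ (W1 @ x) == (W2 @ W1) @ x for every input x.
rng = np.random.default_rng(0)
x = rng.normal(size=3)        # input vector
W1 = rng.normal(size=(4, 3))  # first "layer" weights (illustrative)
W2 = rng.normal(size=(2, 4))  # second "layer" weights (illustrative)

two_layer = W2 @ (W1 @ x)     # stacked linear layers
one_layer = (W2 @ W1) @ x     # equivalent single linear layer
print(np.allclose(two_layer, one_layer))  # True
```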
On an accuracy-vs-epochs graph, how can you tell when there is overfitting and when dropout is being used?
Dropout -> validation accuracy sits above training accuracy early on, since dropout handicaps the network during training but not during evaluation.
Overfitting -> training accuracy keeps climbing above validation accuracy towards the end.
What is a deep neural network?
A neural network with at least two hidden layers and non-linear activations.
In mini-batch gradient descent, the gradient used for updating the parameters is the average of all the gradients computed over the mini-batch.
True
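A minimal sketch of one such update for linear regression with squared error; the data, batch size, and learning rate below are illustrative, not from the card:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))   # mini-batch of 8 examples
y = rng.normal(size=8)
w = np.zeros(3)
lr = 0.1

preds = X @ w
# Per-example gradient of (w.x_i - y_i)^2 with respect to w:
per_example_grads = 2 * (preds - y)[:, None] * X
grad = per_example_grads.mean(axis=0)  # average over the mini-batch
w -= lr * grad                         # single parameter update
```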
In what year was the VGG-16 network published?
2014
Can a network made of convolutional layers be a feedforward neural network?
Yes. Convolutions involve no cycles, so a purely convolutional network is feedforward.
Give the equation for the output shape of a convolutional layer.
output_size = (input_size - kernel_size + 2 * padding) / stride + 1
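A small helper implementing the formula above, with a worked example (a 32x32 input, 5x5 kernel, no padding, stride 1 gives a 28x28 output):

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (input_size - kernel_size + 2 * padding) // stride + 1

assert conv_output_size(32, 5) == 28
```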
Which of the following statements about Batch Normalisation (BN) are correct?
- BN is used as a layer
- with BN, we need to increase the dropout
- BN is useful for domain adaptation
- BN rescales its input values so as to have zero mean and unit variance

1, 3, 4. In practice BN lets you reduce dropout rather than increase it.
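A minimal training-mode BN forward pass in numpy, illustrating statement 4; gamma, beta, and eps follow the usual conventions, and the input values are illustrative:

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalise each feature over the batch to zero mean and unit
    variance, then apply the learnable scale/shift (gamma, beta)."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.random.default_rng(0).normal(loc=5.0, scale=3.0, size=(16, 4))
out = batch_norm(x)
print(out.mean(axis=0).round(6))  # ~0 per feature
print(out.var(axis=0).round(3))   # ~1 per feature
```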
For an image recognition problem (e.g. recognising a cat in a photo), which architecture would be best suited to solve the problem?
A convolutional neural network followed by a sequence of fully connected layers.
Which of the following facts are true?
- Regularisation biases the parameters towards the value 0.
- Regularisation is usually used to reduce underfitting.
- Regularisation is usually used to reduce overfitting.
- Tikhonov regularisation adds significant numerical complexity to least squares.

1, 3. Tikhonov (ridge) regularisation keeps a closed-form solution, w = (XᵀX + λI)⁻¹Xᵀy, so it adds essentially no numerical cost over ordinary least squares.
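A minimal numpy sketch of why statement 4 is false: ridge only adds λI to the normal equations, so solving them costs the same as ordinary least squares (the data and λ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = rng.normal(size=50)
lam = 0.1

w_ols = np.linalg.solve(X.T @ X, X.T @ y)                      # least squares
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)  # Tikhonov/ridge
# Ridge also shrinks the weights towards 0 (statement 1):
print(np.linalg.norm(w_ridge) <= np.linalg.norm(w_ols))        # True
```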
Which of the following facts about Logistic Regression (LR) are true?
- In LR, the error is assumed to be normally distributed.
- In LR, the predicted value h_w(x) is the probability that the data point should be classified as 1 given the observed input feature vector.
- In LR, the predicted value h_w(x) is the actual class value (0 or 1).
- The larger the risk score is, the more likely it is that the class is positive.

2, 4. h_w(x) = 1 / (1 + e^(-w·x)) is monotonically increasing in the risk score w·x.
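A minimal sketch of the LR prediction, assuming the standard sigmoid parameterisation; the weights and inputs are illustrative:

```python
import numpy as np

def h(w, x):
    """P(y = 1 | x): sigmoid of the risk score w.x"""
    return 1.0 / (1.0 + np.exp(-(w @ x)))

# The sigmoid is monotonically increasing, so a larger risk score
# means a higher predicted probability of the positive class.
w = np.array([1.0, -2.0])
print(h(w, np.array([0.5, 0.1])))  # ~0.57 (score 0.3)
print(h(w, np.array([2.0, 0.1])))  # ~0.86 (score 1.8)
```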
Why are deeper Neural Nets more powerful than shallow networks?
Deeper NNs compose features hierarchically: some functions a deep network represents compactly would need exponentially many units in a shallow network, and in practice this also helps them generalise better.