Quiz - CB Flashcards
To avoid overfitting, we should use a small training dataset.
False. A small training set makes overfitting more likely; more training data generally reduces overfitting.
Which of the following classifiers scale(s) poorly with the number of observations in terms of computational cost?
- k-nearest neighbours
- decision tree
- random forest
- SVM with RBF kernel
k-nearest neighbours and SVM with RBF kernel
On average, linear SVM performs better than logistic regression.
False; by the no-free-lunch theorem, no classifier performs better on average across all problems.
A multi-layer feedforward network with linear activation functions is more powerful than a single-layer feedforward network with linear activation functions.
False. A composition of linear functions is itself linear, so stacking linear layers collapses into a single linear layer; without non-linear activations, extra depth adds no representational power.
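A minimal NumPy sketch (with arbitrary weight matrices) showing that two stacked linear layers collapse into one:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))  # first linear layer: 3 inputs -> 4 units
W2 = rng.normal(size=(2, 4))  # second linear layer: 4 units -> 2 outputs
x = rng.normal(size=3)

two_layers = W2 @ (W1 @ x)  # "deep" network with linear activations
one_layer = (W2 @ W1) @ x   # equivalent single linear layer
print(np.allclose(two_layers, one_layer))  # True
```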
Based on an accuracy-vs-epochs graph, how can you tell when there is overfitting and when dropout is used?
Dropout -> validation accuracy is higher than training accuracy at the start.
Overfitting -> training accuracy is higher than validation accuracy at the end.
What is a deep neural network?
A neural network made of at least 2 hidden layers with non-linear activations.
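A minimal Keras sketch (layer sizes and input dimension are arbitrary) of a network meeting this definition, with two hidden layers and ReLU activations:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(64, activation="relu"),    # hidden layer 1 (non-linear)
    tf.keras.layers.Dense(32, activation="relu"),    # hidden layer 2 (non-linear)
    tf.keras.layers.Dense(1, activation="sigmoid"),  # output layer
])
```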
In mini-batch gradient descent, the gradient used for updating the parameters is the average of all the gradients computed over the mini-batch.
True
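A minimal NumPy sketch (using squared error on synthetic data purely as an illustration) of one update step, where the per-example gradients of the mini-batch are averaged before the parameters move:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3))       # one mini-batch of 8 examples
y = rng.normal(size=8)
w = np.zeros(3)
lr = 0.1

# Per-example gradient of 0.5 * (x @ w - y)^2 with respect to w.
grads = (X @ w - y)[:, None] * X  # shape (8, 3): one gradient per example
w -= lr * grads.mean(axis=0)      # the update uses the average gradient
```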
In what year was the VGG-16 network published?
2014
Can a network made of convolutional layers be a feedforward neural network?
True
Give the equation for the output shape of a convolutional layer.
output_size = (input_size - kernel_size + 2 * padding) / stride + 1
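A minimal Python sketch (example sizes are arbitrary) applying the formula along one spatial dimension:

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    """Output size of a convolutional layer along one spatial dimension."""
    return (input_size - kernel_size + 2 * padding) // stride + 1

print(conv_output_size(32, 3, padding=1, stride=1))  # 32: size preserved
print(conv_output_size(32, 3, padding=1, stride=2))  # 16: roughly halved
```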
Which of the following statements about Batch Normalisation (BN) are correct?
- BN is used as a layer
- with BN, we need to increase the dropout.
- BN is useful for domain adaptation.
- BN rescales its input values so as to have 0 mean and variance 1
1, 3, 4
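A minimal NumPy sketch of the rescaling in statement 4 (omitting the learned scale and shift parameters that a full BN layer also applies):

```python
import numpy as np

x = np.random.default_rng(2).normal(loc=5.0, scale=3.0, size=(32, 10))  # a batch
x_norm = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + 1e-5)  # per feature
print(x_norm.mean(axis=0).round(6))  # ~0
print(x_norm.var(axis=0).round(3))   # ~1
```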
Which of the following facts are true?
- Regularisation biases the parameters towards the value 0.
- Regularisation is usually used to reduce underfitting.
- Regularisation is usually used to reduce overfitting.
- Tikhonov regularisation adds significant numerical complexity to least squares.
1, 3
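A minimal NumPy sketch (synthetic data, lambda chosen arbitrarily) of why statement 4 is false: Tikhonov (ridge) regularisation only adds lambda * I to the normal equations of least squares, at essentially no extra cost:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 4))
y = rng.normal(size=50)
lam = 0.5

w_ls = np.linalg.solve(X.T @ X, X.T @ y)                       # least squares
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(4), X.T @ y)  # Tikhonov: same cost
```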
Which of the following facts about Logistic Regression are true?
- In LR, the error is assumed to be normally distributed.
- In LR, the predicted value h_w(x) is the likelihood that the data point should be classified as 1 given the observed input feature vector.
- In LR, the predicted value h_w(x) is the actual class value (0 or 1).
- The larger the risk score is, the more likely it is that the class is positive.
2, 4
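A minimal NumPy sketch (weights and inputs are arbitrary) of statements 2 and 4: h_w(x) is the sigmoid of the risk score w.x, so a larger risk score means a higher probability of the positive class:

```python
import numpy as np

def h(w, x):
    risk_score = w @ x                        # linear risk score
    return 1.0 / (1.0 + np.exp(-risk_score))  # P(class = 1 | x)

w = np.array([0.5, -1.0, 2.0])
print(h(w, np.array([1.0, 0.0, 1.0])))  # risk score 2.5 -> ~0.92
print(h(w, np.array([0.0, 1.0, 0.0])))  # risk score -1.0 -> ~0.27
```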
Why are deeper Neural Nets more powerful than shallow networks?
Deeper NNs generalise better: by composing features layer by layer, they can represent complex functions more compactly than shallow networks.
Of the following activation functions, which one(s) are well designed to prevent the problem of vanishing gradients?
- ReLU
- tanh
- sigmoid
- linear
ReLU (designed to tackle vanishing gradients; its gradient is 1 for all positive inputs) and linear (its gradient is always 1).
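A minimal NumPy sketch comparing gradients at a large pre-activation value, where sigmoid and tanh saturate but ReLU and linear do not:

```python
import numpy as np

z = 5.0  # a large pre-activation value
s = 1 / (1 + np.exp(-z))
print(s * (1 - s))            # sigmoid gradient: ~0.0066 (vanishing)
print(1 - np.tanh(z) ** 2)    # tanh gradient:    ~0.00018 (vanishing)
print(1.0 if z > 0 else 0.0)  # ReLU gradient:    1.0
print(1.0)                    # linear gradient:  always 1.0
```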
T or F
Given a convolutional layer, both 1) setting its stride to 2 and 2) appending a 2x2 MaxPooling layer result in the tensor size being halved horizontally and vertically.
T
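A minimal Keras sketch (arbitrary 32x32x3 input; the stride-2 convolution uses 'same' padding, so the stride alone determines the downsampling) confirming that both options halve the spatial dimensions:

```python
import tensorflow as tf

x = tf.zeros((1, 32, 32, 3))  # batch, height, width, channels
conv_s2 = tf.keras.layers.Conv2D(8, 3, strides=2, padding="same")(x)
pooled = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(x)
print(conv_s2.shape)  # (1, 16, 16, 8)
print(pooled.shape)   # (1, 16, 16, 3)
```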
Which of the following statements about the problem of Vanishing gradients are true:
- VG more likely for deeper networks
- VG more likely when the activation function is tanh.
- VG is more likely when the activation function is ReLu.
- VG is more likely to occur in the later layers of the network (layers closer to the output)
1, 2
VG is more likely to happen in the earlier layers of the network because of backpropagation: the gradient at an earlier layer is a product of more derivative terms, so it is more likely to be close to zero.
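A minimal sketch (ignoring the weight terms, and assuming saturating sigmoid activations) of how the gradient factor reaching the earliest layer shrinks with depth:

```python
sigmoid_grad = 0.25  # the maximum possible derivative of the sigmoid
for depth in (2, 5, 10):
    # One derivative factor is multiplied in per layer during backpropagation.
    print(depth, sigmoid_grad ** depth)  # 0.0625, ~0.001, ~9.5e-07
```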
For an image recognition problem (e.g. recognising a cat in a photo), which architecture would be best suited to solve the problem?
- a sequence of fully connected layers.
- a recurrent neural network
- a convolutional neural network followed by a sequence of fully connected layers.
- a fully convolutional neural network, without dense layers
3
We are trying to classify pictures as part of a web service. The uploaded images are first transformed into visual features by taking the output of the second-to-last layer of the VGG-16 network. Which of the following neural network architectures would be most suitable to complete this task based on these input features?
- a CNN followed by fully connected layers
- a sequence of convolutional layers
- a sequence of fully connected layers
- a sequence of recurrent layers
3
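A minimal Keras sketch of this setup (the head's layer sizes and num_classes are hypothetical): the second-to-last layer of VGG-16 outputs flat 4096-dimensional vectors, so a sequence of fully connected layers (option 3) is the natural classifier on top:

```python
import tensorflow as tf

# Feature extractor: VGG-16 up to its second-to-last layer (fc2, 4096-dim).
base = tf.keras.applications.VGG16(weights="imagenet", include_top=True)
extractor = tf.keras.Model(inputs=base.input, outputs=base.layers[-2].output)
extractor.trainable = False

num_classes = 10  # hypothetical number of picture categories
head = tf.keras.Sequential([
    tf.keras.Input(shape=(4096,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
```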
What can a 1x1 convolution be used for?
- Changing the width/height of a tensor
- Reducing the number of channels of a tensor
- Increasing the number of channels of a tensor
- Applying spatial filtering to an image tensor.
- Applying pooling
2, 3
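A minimal Keras sketch (arbitrary tensor sizes) of a 1x1 convolution reducing and increasing the channel count while leaving width and height unchanged:

```python
import tensorflow as tf

x = tf.zeros((1, 32, 32, 64))  # batch, height, width, channels
reduced = tf.keras.layers.Conv2D(16, kernel_size=1)(x)     # 64 -> 16 channels
increased = tf.keras.layers.Conv2D(128, kernel_size=1)(x)  # 64 -> 128 channels
print(reduced.shape)    # (1, 32, 32, 16)
print(increased.shape)  # (1, 32, 32, 128)
```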