Quiz - CB Flashcards
To avoid overfitting, we should use a small training dataset.
False. A small training set makes overfitting more likely; more training data generally reduces overfitting.
Which of the following classifiers scale(s) poorly with the number of observations in terms of computational cost?
- k-nearest neighbours
- decision tree
- random forest
- SVM with RBF kernel
k-nearest neighbours and SVM with RBF kernel
On average, linear SVM performs better than logistic regression.
False; by the no-free-lunch theorem, no classifier performs better on average across all problems.
A multi-layer feedforward network with linear activation functions is more powerful than a single-layer feedforward network with linear activation functions.
False. A composition of linear functions is itself linear, so stacking linear layers collapses into a single linear layer; without non-linear activations, extra depth adds no representational power.
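A minimal NumPy sketch (with arbitrary weight matrices) showing that two stacked linear layers collapse into one:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))  # first linear layer: 3 inputs -> 4 units
W2 = rng.normal(size=(2, 4))  # second linear layer: 4 units -> 2 outputs
x = rng.normal(size=3)

two_layers = W2 @ (W1 @ x)  # "deep" network with linear activations
one_layer = (W2 @ W1) @ x   # equivalent single linear layer
print(np.allclose(two_layers, one_layer))  # True
```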
Based on an accuracy-vs-epochs graph, how can you tell when there is overfitting and when dropout is used?
Dropout -> validation accuracy is higher than training accuracy at the start.
Overfitting -> training accuracy is higher than validation accuracy at the end.
What is a deep neural network?
A neural network made of at least 2 hidden layers with non-linear activations.
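A minimal Keras sketch (layer sizes and input dimension are arbitrary) of a network meeting this definition, with two hidden layers and ReLU activations:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(64, activation="relu"),    # hidden layer 1 (non-linear)
    tf.keras.layers.Dense(32, activation="relu"),    # hidden layer 2 (non-linear)
    tf.keras.layers.Dense(1, activation="sigmoid"),  # output layer
])
```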
In mini-batch gradient descent, the gradient used for updating the parameters is the average of all the gradients computed over the mini-batch.
True
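A minimal NumPy sketch (using squared error on synthetic data purely as an illustration) of one update step, where the per-example gradients of the mini-batch are averaged before the parameters move:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3))       # one mini-batch of 8 examples
y = rng.normal(size=8)
w = np.zeros(3)
lr = 0.1

# Per-example gradient of 0.5 * (x @ w - y)^2 with respect to w.
grads = (X @ w - y)[:, None] * X  # shape (8, 3): one gradient per example
w -= lr * grads.mean(axis=0)      # the update uses the average gradient
```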
In what year was the VGG-16 network published?
2014
Can a network made of convolutional layers be a feedforward neural network?
True
Give the equation for the output shape of a convolutional layer.
output_size = (input_size - kernel_size + 2 * padding) / stride + 1
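A minimal Python sketch (example sizes are arbitrary) applying the formula along one spatial dimension:

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    """Output size of a convolutional layer along one spatial dimension."""
    return (input_size - kernel_size + 2 * padding) // stride + 1

print(conv_output_size(32, 3, padding=1, stride=1))  # 32: size preserved
print(conv_output_size(32, 3, padding=1, stride=2))  # 16: roughly halved
```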
Which of the following statements about Batch Normalisation (BN) are correct?
- BN is used as a layer
- with BN, we need to increase the dropout.
- BN is useful for domain adaptation.
- BN rescales its input values so as to have 0 mean and variance 1
1, 3, 4
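A minimal NumPy sketch of the rescaling in statement 4 (omitting the learned scale and shift parameters that a full BN layer also applies):

```python
import numpy as np

x = np.random.default_rng(2).normal(loc=5.0, scale=3.0, size=(32, 10))  # a batch
x_norm = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + 1e-5)  # per feature
print(x_norm.mean(axis=0).round(6))  # ~0
print(x_norm.var(axis=0).round(3))   # ~1
```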
Which of the following facts are true?
- Regularisation biases the parameters towards the value 0.
- Regularisation is usually used to reduce underfitting.
- Regularisation is usually used to reduce overfitting.
- Tikhonov regularisation adds significant numerical complexity to least squares.
1, 3
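A minimal NumPy sketch (synthetic data, lambda chosen arbitrarily) of why statement 4 is false: Tikhonov (ridge) regularisation only adds lambda * I to the normal equations of least squares, at essentially no extra cost:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 4))
y = rng.normal(size=50)
lam = 0.5

w_ls = np.linalg.solve(X.T @ X, X.T @ y)                       # least squares
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(4), X.T @ y)  # Tikhonov: same cost
```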
Which of the following facts about Logistic Regression are true?
- In LR, the error is assumed to be normally distributed.
- In LR, the predicted value h_w(x) is the likelihood that the data point should be classified as 1 given the observed input feature vector.
- In LR, the predicted value h_w(x) is the actual class value (0 or 1).
- The larger the risk score is, the more likely it is that the class is positive.
2, 4
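A minimal NumPy sketch (weights and inputs are arbitrary) of statements 2 and 4: h_w(x) is the sigmoid of the risk score w.x, so a larger risk score means a higher probability of the positive class:

```python
import numpy as np

def h(w, x):
    risk_score = w @ x                        # linear risk score
    return 1.0 / (1.0 + np.exp(-risk_score))  # P(class = 1 | x)

w = np.array([0.5, -1.0, 2.0])
print(h(w, np.array([1.0, 0.0, 1.0])))  # risk score 2.5 -> ~0.92
print(h(w, np.array([0.0, 1.0, 0.0])))  # risk score -1.0 -> ~0.27
```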
Why are deeper Neural Nets more powerful than shallow networks?
Deeper NNs generalise better: by composing features layer by layer, they can represent complex functions more compactly than shallow networks.
Of the following activation functions, which one(s) are well designed to prevent the problem of vanishing gradients?
- ReLU
- tanh
- sigmoid
- linear
ReLU (designed to tackle vanishing gradients; its gradient is 1 for all positive inputs) and linear (its gradient is always 1).
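A minimal NumPy sketch comparing gradients at a large pre-activation value, where sigmoid and tanh saturate but ReLU and linear do not:

```python
import numpy as np

z = 5.0  # a large pre-activation value
s = 1 / (1 + np.exp(-z))
print(s * (1 - s))            # sigmoid gradient: ~0.0066 (vanishing)
print(1 - np.tanh(z) ** 2)    # tanh gradient:    ~0.00018 (vanishing)
print(1.0 if z > 0 else 0.0)  # ReLU gradient:    1.0
print(1.0)                    # linear gradient:  always 1.0
```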
T or F
Given a convolutional layer, both 1) setting its stride to 2 and 2) appending a 2x2 MaxPooling layer result in the tensor size being halved horizontally and vertically.
T
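A minimal Keras sketch (arbitrary 32x32x3 input; the stride-2 convolution uses 'same' padding, so the stride alone determines the downsampling) confirming that both options halve the spatial dimensions:

```python
import tensorflow as tf

x = tf.zeros((1, 32, 32, 3))  # batch, height, width, channels
conv_s2 = tf.keras.layers.Conv2D(8, 3, strides=2, padding="same")(x)
pooled = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(x)
print(conv_s2.shape)  # (1, 16, 16, 8)
print(pooled.shape)   # (1, 16, 16, 3)
```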
Which of the following statements about the problem of Vanishing gradients are true:
- VG more likely for deeper networks
- VG more likely when the activation function is tanh.
- VG is more likely when the activation function is ReLu.
- VG is more likely to occur in the later layers of the network (layers closer to the output)
1, 2
VG is more likely to happen in the earlier layers of the network because of backpropagation: the gradient at an earlier layer is a product of more derivative terms, so it is more likely to be close to zero.
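A minimal sketch (ignoring the weight terms, and assuming saturating sigmoid activations) of how the gradient factor reaching the earliest layer shrinks with depth:

```python
sigmoid_grad = 0.25  # the maximum possible derivative of the sigmoid
for depth in (2, 5, 10):
    # One derivative factor is multiplied in per layer during backpropagation.
    print(depth, sigmoid_grad ** depth)  # 0.0625, ~0.001, ~9.5e-07
```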
For an image recognition problem (e.g. recognising a cat in a photo), which architecture would be best suited to solve the problem?
- a sequence of fully connected layers.
- a recurrent neural network
- a convolutional neural network followed by a sequence of fully connected layers.
- a fully convolutional neural network, without dense layers
3
We are trying to classify pictures as part of a web service. The uploaded images are first transformed into visual features by taking the output of the second-to-last layer of the VGG-16 network. Which of the following neural network architectures would be most suitable to complete this task based on these input features?
- a CNN followed by fully connected layers
- a sequence of convolutional layers
- a sequence of fully connected layers
- a sequence of recurrent layers
3
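A minimal Keras sketch of this setup (the head's layer sizes and num_classes are hypothetical): the second-to-last layer of VGG-16 outputs flat 4096-dimensional vectors, so a sequence of fully connected layers (option 3) is the natural classifier on top:

```python
import tensorflow as tf

# Feature extractor: VGG-16 up to its second-to-last layer (fc2, 4096-dim).
base = tf.keras.applications.VGG16(weights="imagenet", include_top=True)
extractor = tf.keras.Model(inputs=base.input, outputs=base.layers[-2].output)
extractor.trainable = False

num_classes = 10  # hypothetical number of picture categories
head = tf.keras.Sequential([
    tf.keras.Input(shape=(4096,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
```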
What can a 1x1 convolution be used for?
- Changing the width/height of a tensor
- Reducing the number of channels of a tensor
- Increasing the number of channels of a tensor
- Applying spatial filtering to an image tensor.
- Applying pooling
2, 3
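A minimal Keras sketch (arbitrary tensor sizes) of a 1x1 convolution reducing and increasing the channel count while leaving width and height unchanged:

```python
import tensorflow as tf

x = tf.zeros((1, 32, 32, 64))  # batch, height, width, channels
reduced = tf.keras.layers.Conv2D(16, kernel_size=1)(x)     # 64 -> 16 channels
increased = tf.keras.layers.Conv2D(128, kernel_size=1)(x)  # 64 -> 128 channels
print(reduced.shape)    # (1, 32, 32, 16)
print(increased.shape)  # (1, 32, 32, 128)
```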