Quiz - CB Flashcards

1
Q

To avoid overfitting, we should use a small training dataset.

A

False

2
Q

Which of the following classifiers scale poorly with the number of observations in terms of computational cost?

  1. k-nearest neighbors
  2. decision tree
  3. random forest
  4. svm w/ rbf kernel
A
  1. k-nearest neighbours - each prediction compares the query against all n training points
  2. SVM with RBF kernel - training operates on an n x n kernel matrix, roughly O(n^2) to O(n^3)
3
Q

On average, linear SVM performs better than logistic regression.

A

False. By the no free lunch theorem, no classifier performs better than another on average over all possible problems.

4
Q

A multi-layer feedforward network with linear activation functions is more powerful than a single-layer feedforward network with linear activation functions.

A

False. A composition of linear maps is itself a linear map, so stacking layers with linear activation functions adds no representational power over a single linear layer.
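
This collapse can be checked numerically. A minimal sketch (all shapes and values are invented):

```python
import numpy as np

# A minimal sketch: two stacked linear layers collapse into a single linear map.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))                 # 5 samples, 3 features
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 2)), rng.normal(size=2)

two_layer = (x @ W1 + b1) @ W2 + b2         # "deep" network with linear activations
W, b = W1 @ W2, b1 @ W2 + b2                # the equivalent single layer
one_layer = x @ W + b

print(np.allclose(two_layer, one_layer))    # True
```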

5
Q

On an accuracy-versus-epochs graph, how can you tell when dropout is being used and when the model is overfitting?

A

Dropout: the validation accuracy sits above the training accuracy (dropout is active when training accuracy is measured), most noticeably early in training.

Overfitting: towards the end of training, the training accuracy keeps climbing while the validation accuracy stalls or drops, leaving training well above validation.

6
Q

What is a deep neural network?

A

A neural network with at least 2 hidden layers and non-linear activation functions.

7
Q

In mini-batch gradient descent, the gradient used for updating the parameters is the average of all the gradients computed over the examples in the mini-batch.

A

True
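
A minimal sketch of the equivalence (illustrative data, mean-squared error on a linear model):

```python
import numpy as np

# A minimal sketch: the mini-batch gradient equals the average of the
# per-example gradients.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(8, 3)), rng.normal(size=8)   # one mini-batch of 8 examples
w = np.zeros(3)

per_example = [2 * (X[i] @ w - y[i]) * X[i] for i in range(len(X))]
average_of_gradients = np.mean(per_example, axis=0)

whole_batch_gradient = 2 * X.T @ (X @ w - y) / len(X)
print(np.allclose(average_of_gradients, whole_batch_gradient))  # True
```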

8
Q

In what year was the VGG-16 network published?

A

2014

9
Q

Can a network made of convolutional layers be a feedforward neural network?

A

Yes. Convolutional layers have no feedback connections, so a stack of them is still a feedforward network.

10
Q

Give the equation for the output shape of a convolutional layer.

A

output_size = (input_size - kernel_size + 2 * padding) / stride + 1, per spatial dimension (rounded down).
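
A minimal sketch of the formula in code (all numbers are illustrative, square input and kernel assumed):

```python
# A minimal sketch of the output-size formula above.
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    return (input_size - kernel_size + 2 * padding) // stride + 1

print(conv_output_size(224, kernel_size=3, padding=1))            # 224 ("same" padding)
print(conv_output_size(32, kernel_size=5, padding=0, stride=2))   # 14
```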

11
Q

Which of the following statements about Batch Normalisation (BN) are correct?

  1. BN is used as a layer.
  2. With BN, we need to increase the dropout.
  3. BN is useful for domain adaptation.
  4. BN rescales its input values so as to have 0 mean and variance 1.
A

1, 3, 4
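
A minimal sketch of what a BN layer computes at training time (illustrative batch; gamma and beta stand in for the learned rescale and shift):

```python
import numpy as np

# A minimal sketch: per-feature statistics over the batch, then rescale/shift.
def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    x_hat = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)  # mean 0, variance 1
    return gamma * x_hat + beta

x = np.random.default_rng(0).normal(loc=5.0, scale=3.0, size=(64, 10))
out = batch_norm(x)
print(out.mean(axis=0).round(3))  # ~0 for every feature
print(out.var(axis=0).round(3))   # ~1 for every feature
```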

12
Q

For an image recognition problem (e.g. recognising a cat in a photo), which architecture would be best suited to solve the problem?

A

A convolutional neural network followed by a sequence of fully connected layers.

13
Q

Which of the following facts are true?

  1. Regularisation biases the parameters towards the value 0.
  2. Regularisation is usually used to reduce underfitting.
  3. Regularisation is usually used to reduce overfitting.
  4. Tikhonov Regularisation adds significant numerical complexity to Least Squares.
A

1, 3
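
On item 4: Tikhonov (ridge) regularisation only adds lambda times the identity to the normal equations, so the least-squares solve is essentially unchanged. A minimal sketch with invented data:

```python
import numpy as np

# A minimal sketch: ridge regression just adds lam * I to the normal equations
# of ordinary least squares, and pulls the weights towards 0.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 5)), rng.normal(size=100)
lam = 0.1

w_ols = np.linalg.solve(X.T @ X, X.T @ y)
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)
print(np.linalg.norm(w_ols), np.linalg.norm(w_ridge))  # ridge norm is smaller
```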

14
Q

Which of the following facts about Logistic Regression are true?

  1. In LR, the error is assumed to be normally distributed.
  2. In LR, the predicted value h_w(x) is the likelihood that the data point should be classified as 1, given the observed input feature vector.
  3. In LR, the predicted value h_w(x) is the actual class value (0 or 1).
  4. The larger the risk score is, the more likely it is that the class is positive
A

2, 4
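
A minimal sketch of items 2 and 4 (weights and features invented): the prediction is a probability obtained from the risk score.

```python
import numpy as np

# A minimal sketch: h_w(x) = sigmoid(w . x) is the estimated probability of
# class 1, not a hard 0/1 label.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.8, -1.2, 0.3])
x = np.array([1.0, 0.5, 2.0])
risk_score = w @ x              # the larger this is, the more likely the class is positive
print(sigmoid(risk_score))      # probability that the class is 1
```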

15
Q

Why are deeper Neural Nets more powerful than shallow networks?

A

Deeper networks compose features hierarchically, so they can represent some functions with far fewer units than a shallow network would need; in practice this also means they generalise better.

16
Q

Of the following activation functions, which one(s) are well designed to prevent the problem of vanishing gradients?

  1. ReLU
  2. tanh
  3. sigmoid
  4. linear
A
  1. ReLU - designed to tackle vanishing gradients; its gradient is 1 for all positive inputs
  4. Linear - its gradient is always 1
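
A minimal sketch comparing the gradient magnitudes (purely illustrative):

```python
import numpy as np

# A minimal sketch: sigmoid'(z) peaks at 0.25 and tanh'(z) at 1 (both decay
# towards 0 for large |z|), while ReLU and linear keep a gradient of 1.
z = np.linspace(-5, 5, 101)
sig = 1 / (1 + np.exp(-z))
print("max sigmoid':", (sig * (1 - sig)).max())          # 0.25
print("max tanh'   :", (1 - np.tanh(z) ** 2).max())      # 1.0
print("relu' (z>0) :", np.where(z > 0, 1.0, 0.0).max())  # 1.0
print("linear'     :", 1.0)                              # constant
```
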
17
Q

True or False:

Given a convolutional layer, both 1) setting its stride to 2 and 2) appending a MaxPooling layer of size 2x2 cause the output tensor size to be halved horizontally and vertically.

A

True
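
A minimal sketch assuming PyTorch (input size and channel counts invented):

```python
import torch
import torch.nn as nn

# A minimal sketch: both a stride-2 convolution and a 2x2 max-pool halve the
# spatial dimensions of a 32x32 input.
x = torch.randn(1, 3, 32, 32)

strided = nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1)
pooled = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.MaxPool2d(2))

print(strided(x).shape)  # torch.Size([1, 8, 16, 16])
print(pooled(x).shape)   # torch.Size([1, 8, 16, 16])
```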

18
Q

Which of the following statements about the problem of Vanishing gradients are true:

  1. VG more likely for deeper networks
  2. VG more likely when the activation function is tanh.
  3. VG is more likely when the activation function is ReLu.
  4. VG is more likely to occur in the later layers of the network (layers closer to the output)
A

1, 2

Vanishing gradients are more likely in the earlier layers of the network (those closer to the input): during backpropagation their gradients are products of many per-layer factors, and when each factor is smaller than 1 the product shrinks towards zero.
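
A minimal sketch of why the product vanishes (the factor value is illustrative):

```python
# A minimal sketch: a gradient reaching an early layer is a product of one
# factor per layer; when each factor is below 1 the product shrinks fast.
factor = 0.25                       # e.g. the largest value sigmoid'(z) can take
for depth in (2, 5, 10, 20):
    print(depth, factor ** depth)   # 0.0625, ~1e-3, ~1e-6, ~1e-12
```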

19
Q

Which of the following statements about Batch Normalisation (BN) are correct?

  1. BN is used as a layer.
  2. With BN, we need to increase the dropout.
  3. BN is useful for domain adaptation.
  4. BN rescales its input values so as to have 0 mean and variance 1.
A

1,3,4

20
Q

For an image recognition problem (e.g. recognising a cat in a photo), which architecture would be best suited to solve the problem?

  1. a sequence of fully connected layers.
  2. a recurrent neural network
  3. a convolutional neural network followed by a sequence of fully connected layers.
  4. a fully convolutional neural network, without dense layers
A

3

21
Q

We are trying to classify pictures as part of a web service. The uploaded images are first transformed into visual features by taking the output of the second-to-last layer of the VGG-16 network. Which of the following neural network architectures would be most suitable to complete this task based on these input features?

  1. a CNN followed by fully connected layers
  2. a sequence of CNNs
  3. a sequence of fully connected layers
  4. a sequence of recurrent layers
A

3
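
A minimal sketch assuming PyTorch, 4096-dimensional VGG-16 features, and 10 target classes (the layer widths and class count are placeholders):

```python
import torch
import torch.nn as nn

# A minimal sketch: precomputed VGG-16 features only need a stack of fully
# connected layers to produce class scores.
classifier = nn.Sequential(
    nn.Linear(4096, 256),
    nn.ReLU(),
    nn.Linear(256, 10),              # per-class scores
)
features = torch.randn(8, 4096)      # a batch of precomputed VGG-16 features
print(classifier(features).shape)    # torch.Size([8, 10])
```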

22
Q

What can a 1x1 convolution be used for?

  1. Changing the width/height of a tensor
  2. Reducing the number of channels of a tensor
  3. Increasing the number of channels of a tensor
  4. Applying spatial filtering to an image tensor.
  5. Applying pooling
A

2, 3
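
A minimal sketch assuming PyTorch (tensor sizes invented):

```python
import torch
import torch.nn as nn

# A minimal sketch: a 1x1 convolution changes the number of channels while
# leaving width and height untouched.
x = torch.randn(1, 64, 28, 28)
reduce_ch = nn.Conv2d(64, 16, kernel_size=1)    # 64 -> 16 channels
expand_ch = nn.Conv2d(64, 128, kernel_size=1)   # 64 -> 128 channels
print(reduce_ch(x).shape)   # torch.Size([1, 16, 28, 28])
print(expand_ch(x).shape)   # torch.Size([1, 128, 28, 28])
```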