Quiz 2 Flashcards

1
Q

You have an input volume of 32×32×3. What are the dimensions of
the resulting volume after convolving a 5×5 kernel with zero padding,
stride of 1, and 2 filters?

A

With padding P = 0 and stride 1, each spatial dimension is 32 − 5 + 1 = 28 and the output depth equals the number of filters, so the output volume is 28 × 28 × 2.

Parameter count = (k1 × k2 × depth + 1) × number of filters = (5 × 5 × 3 + 1) × 2 = 152
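A quick way to check both numbers, assuming PyTorch is available (the snippet is illustrative, not from the course):

```python
import torch

# 2 filters of size 5x5 over a 3-channel input, stride 1, no padding
conv = torch.nn.Conv2d(in_channels=3, out_channels=2, kernel_size=5, stride=1, padding=0)

print(sum(p.numel() for p in conv.parameters()))   # 152 weights + biases
print(conv(torch.zeros(1, 3, 32, 32)).shape)       # torch.Size([1, 2, 28, 28])
```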

2
Q
Consider a document collection made of 100 documents. Given a query q, the set of documents relevant to the user is D* = {d3, d12, d34, d56, d98}. An IR system retrieves the documents D = {d3, d12, d35, d56, d66, d88, d95}.
• Compute the number of True-Positives, True-Negatives, False-Positives, and False-Negatives.
• Compute Precision, Recall, and Accuracy.
A
TP = 3 (d3, d12, d56), FP = 4 (d35, d66, d88, d95), FN = 2 (d34, d98), TN = 100 − 3 − 4 − 2 = 91
Precision = TP/(TP+FP) = 3/7, Recall = TP/(TP+FN) = 3/5, Accuracy = (TP+TN)/100 = 94/100
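A minimal sketch of the same computation using Python sets (the variable names are illustrative):

```python
relevant = {"d3", "d12", "d34", "d56", "d98"}
retrieved = {"d3", "d12", "d35", "d56", "d66", "d88", "d95"}
N = 100  # total documents in the collection

tp = len(relevant & retrieved)   # 3
fp = len(retrieved - relevant)   # 4
fn = len(relevant - retrieved)   # 2
tn = N - tp - fp - fn            # 91

precision = tp / (tp + fp)       # 3/7
recall = tp / (tp + fn)          # 3/5
accuracy = (tp + tn) / N         # 0.94
```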
3
Q

You have an input volume of 32×32×3. What are the dimensions of
the resulting volume after convolving a 5×5 kernel with zero padding,
stride of 1, and 2 filters?
How many weights and biases would you have?

A

Output volume (as in the previous card): 28 × 28 × 2

Weights and biases = (k1 × k2 × depth) × number of filters + number of filters

= (5 × 5 × 3) × 2 + 2 = 152

4
Q

Output size of vanilla Convolution

A

(H − k1 + 1) × (W − k2 + 1), with no padding and stride 1

5
Q

Suppose you have an input volume of dimension 64x64x16. How many
parameters would a single 1x1 convolutional filter have, including the
bias?

A

1 × 1 × 16 + 1 = 17

6
Q

Suppose your input is a 300 by 300 color (RGB) image, and you use
a convolutional layer with 100 filters that are each 5x5. How many
parameters does this layer have including the bias parameters?

A

(5 × 5 × 3 + 1) × 100 = 7600

7
Q

You have an input volume that is 63x63x16 and convolve it with 32
filters that are each 7x7, and stride of 1. You want to use a same
convolution. What is the padding?

A

((63 − 7 + 2P) / 1) + 1 = 63

Solve for P = 3

8
Q

Sigmoid

A

Output range: 0 to 1

Loses gradient (saturates) at both ends

Computation involves an exponential term

9
Q

Tanh

A

Output range: −1 to 1 (zero-centered)

Loses gradient (saturates) at both ends

Still computationally heavy (exponential terms)

10
Q

ReLU

A

No saturation on the positive side

Can cause dead neurons (zero gradient when x ≤ 0)

Cheap to compute

11
Q

Leaky ReLU

A

Small negative slope for x < 0 (the slope is a learnable parameter in the PReLU variant)

No saturation

No dead neurons

Still cheap to compute
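A minimal NumPy sketch of the four activations above (the 0.01 leaky slope is an assumed default):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))        # output in (0, 1), saturates at both ends

def tanh(x):
    return np.tanh(x)                      # output in (-1, 1), zero-centered

def relu(x):
    return np.maximum(0.0, x)              # no positive saturation, zero gradient for x <= 0

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)   # small negative slope avoids dead neurons
```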

12
Q

Which activation is best?

A

ReLU is the typical starting point

Sigmoid is typically avoided (it saturates and is not zero-centered)

13
Q

Initialization

A

Initialization close to a good (local) minimum converges faster and to a better solution

Initializing all weights to the same constant leads to a degenerate solution (symmetric neurons learn identical features)

Xavier Initialization → Lesson 3, Slide 26 (see the sketch below)
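A minimal sketch of Xavier initialization, assuming the uniform (Glorot) variant since the slide itself is not reproduced here:

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng=None):
    # Keep activation variance roughly constant across layers
    if rng is None:
        rng = np.random.default_rng()
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

W = xavier_uniform(256, 128)   # e.g. weights for a 256 -> 128 fully connected layer
```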

14
Q

Issues with optimizers

A

Noisy gradient estimates

Saddle points

Ill-conditioned loss surface

15
Q

Optimization types

A

RMSProp: keep a moving average of squared gradients

Adagrad: use accumulated gradient statistics to reduce the learning rate across iterations

Adam: maintains both first and second moment statistics of the gradients
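A minimal NumPy sketch of the three update rules for a parameter array w with gradient g (hyperparameter defaults are assumed):

```python
import numpy as np

def adagrad_step(w, g, s, lr=1e-2, eps=1e-8):
    s = s + g ** 2                               # accumulate squared gradients
    w -= lr * g / (np.sqrt(s) + eps)             # per-parameter learning-rate decay
    return w, s

def rmsprop_step(w, g, s, lr=1e-3, rho=0.9, eps=1e-8):
    s = rho * s + (1 - rho) * g ** 2             # moving average of squared gradients
    w -= lr * g / (np.sqrt(s) + eps)
    return w, s

def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * g              # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * g ** 2         # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)                 # bias correction for step t >= 1
    v_hat = v / (1 - beta2 ** t)
    w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```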

16
Q

Drop out

A

Dropout: for each node, keep its output with probability p; activations of deactivated nodes are set to zero

In practice, implemented with a mask recomputed each iteration (see the sketch below)

During testing, no nodes are dropped

Can be seen as:

training an ensemble of 2^n networks that share weights, or

forcing the model not to rely too heavily on any particular features
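A minimal sketch of mask-based dropout; the 1/p rescaling (inverted dropout) is an assumed convention so that no scaling is needed at test time:

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    if not training:
        return x                              # no nodes are dropped at test time
    if rng is None:
        rng = np.random.default_rng()
    mask = (rng.random(x.shape) < p) / p      # keep each node with probability p, rescale by 1/p
    return x * mask                           # dropped activations become exactly zero
```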

17
Q

Methods to address class imbalance

A

Sampling

SMOTE (Synthetic Minority Oversampling Technique): identify a minority sample's nearest neighbors in feature space, select one of them, then uniformly sample a new point from the line segment connecting the sample to that neighbor (see the sketch below)

Cost-based learning

Focal Loss: down-weights easy examples (well-classified, high-probability examples)
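A minimal sketch of the SMOTE interpolation step described above (the function name and k are illustrative):

```python
import numpy as np

def smote_sample(minority_X, k=5, rng=None):
    # Pick a minority-class sample, find its k nearest minority neighbors,
    # then sample uniformly along the segment to one of them.
    if rng is None:
        rng = np.random.default_rng()
    x = minority_X[rng.integers(len(minority_X))]
    dists = np.linalg.norm(minority_X - x, axis=1)
    neighbors = np.argsort(dists)[1:k + 1]    # skip the point itself
    x_nn = minority_X[rng.choice(neighbors)]
    return x + rng.uniform() * (x_nn - x)     # new synthetic point on the connecting segment
```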