Quiz 2 Flashcards
You have an input volume of 32×32×3. What are the dimensions of
the resulting volume after convolving a 5×5 kernel with zero padding,
stride of 1, and 2 filters?
With zero padding (P = 0) and stride 1, the output volume is (32 − 5 + 1) × (32 − 5 + 1) × 2 = 28×28×2.
Parameter count = (k1 * k2 * depth + 1) * No. of filters
Therefore, (5 * 5 * 3 + 1) * 2 = 152
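A minimal sketch of this parameter count in Python (the helper name conv_params is illustrative, not from the source):

def conv_params(k1, k2, in_depth, num_filters):
    # k1 * k2 * in_depth weights plus one bias, per filter.
    return (k1 * k2 * in_depth + 1) * num_filters

print(conv_params(5, 5, 3, 2))  # 152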
- Consider a document collection of 100 documents. Given a
query q, the set of documents relevant to the user is D* = {d3, d12,
d34, d56, d98}. An IR system retrieves the following documents: D =
{d3, d12, d35, d56, d66, d88, d95}.
• Compute the number of True-Positives, True-Negatives, False-Positives, False-Negatives
• Compute Precision, Recall, and Accuracy.
TP = 3 (d3, d12, d56); FP = 4 (d35, d66, d88, d95); FN = 2 (d34, d98); TN = 100 − 3 − 4 − 2 = 91
Precision = TP / (TP + FP) = 3/7; Recall = TP / (TP + FN) = 3/5; Accuracy = (TP + TN) / 100 = 94/100
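A minimal sketch of these metrics in Python (set contents taken from the example above; variable names are illustrative):

relevant = {"d3", "d12", "d34", "d56", "d98"}
retrieved = {"d3", "d12", "d35", "d56", "d66", "d88", "d95"}
N = 100  # total documents in the collection

tp = len(relevant & retrieved)   # 3: relevant and retrieved
fp = len(retrieved - relevant)   # 4: retrieved but not relevant
fn = len(relevant - retrieved)   # 2: relevant but not retrieved
tn = N - tp - fp - fn            # 91: everything else

precision = tp / (tp + fp)       # 3/7
recall = tp / (tp + fn)          # 3/5
accuracy = (tp + tn) / N         # 0.94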
You have an input volume of 32×32×3. What are the dimensions of
the resulting volume after convolving a 5×5 kernel with zero padding,
stride of 1, and 2 filters?
4. How many weights and biases would you have?
Weights + biases = (k1 × k2 × depth) × num of filters + num of filters
= (5 * 5 * 3) * 2 + 2 = 150 + 2 = 152
Output size of vanilla (no padding, stride 1) convolution
(H − k1 + 1) × (W − k2 + 1)
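A quick check of this formula in Python (the function name is illustrative):

def valid_conv_output(h, w, k1, k2):
    # No padding, stride 1: each spatial dimension shrinks by (kernel size - 1).
    return (h - k1 + 1, w - k2 + 1)

print(valid_conv_output(32, 32, 5, 5))  # (28, 28)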
Suppose you have an input volume of dimension 64x64x16. How many
parameters would a single 1x1 convolutional filter have, including the
bias?
1 * 1 * 16 + 1 = 17
Suppose your input is a 300 by 300 color (RGB) image, and you use
a convolutional layer with 100 filters that are each 5x5. How many
parameters does this layer have including the bias parameters?
(5 * 5 * 3 + 1) * 100 = 7600
You have an input volume that is 63x63x16 and convolve it with 32
filters that are each 7x7, and stride of 1. You want to use a same
convolution. What is the padding?
((63 − 7 + 2P) / 1) + 1 = 63
which gives 2P = 6, so P = 3 (for stride 1, a same convolution needs P = (k − 1) / 2)
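A minimal sketch of the general output-size formula and the same-padding check (function names are illustrative):

def conv_output_size(n, k, p, s):
    # General formula: floor((n - k + 2p) / s) + 1
    return (n - k + 2 * p) // s + 1

def same_padding(k):
    # Stride 1, odd kernel: padding that preserves spatial size.
    return (k - 1) // 2

print(same_padding(7))                # 3
print(conv_output_size(63, 7, 3, 1))  # 63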
Sigmoid
0 to 1
Loses gradient (saturates) at both ends
Computation involves an exponential, so it is relatively expensive
Tanh
-1 to 1 (centered at 0)
Loses gradient (saturates) at both ends
Still computationally heavy
ReLU
No saturation on the positive end
Can cause dead neurons (zero gradient when x <= 0)
Cheap to compute
Leaky ReLU
Small negative-side slope (a learnable parameter in the PReLU variant)
No saturation
No dead neurons
Still cheap to compute
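A minimal NumPy sketch of these four activations (the leaky slope alpha = 0.01 is an assumed default; in PReLU that slope is learned):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))       # squashes to (0, 1)

def tanh(x):
    return np.tanh(x)                      # squashes to (-1, 1), zero-centered

def relu(x):
    return np.maximum(0.0, x)              # no saturation for x > 0, zero for x <= 0

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)   # small slope keeps gradient alive for x <= 0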
Which activation is best?
ReLU is typical starting point
Sigmoid is typically avoided
Initialization
Initialization that is close to a good (local) minimum will converge faster and to a better solution
Initializing all weights to the same constant leads to a degenerate solution (every neuron learns the same thing)!
Xavier Initialization -> Lesson 3, Slide 26
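A minimal sketch of Xavier (Glorot) initialization, assuming the uniform variant with variance 2 / (fan_in + fan_out); the slide may present a different variant:

import numpy as np

def xavier_uniform(fan_in, fan_out):
    # Keeps activation/gradient variance roughly constant across layers.
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return np.random.uniform(-limit, limit, size=(fan_in, fan_out))

W = xavier_uniform(256, 128)  # e.g. weights for a 256 -> 128 fully connected layer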
Issues with optimizers
Noisy gradient estimates
Saddle points
Ill-conditioned loss surface
Optimization types
RMSProp
Keep a moving average of squared gradients
Adagrad
Accumulates squared gradient statistics to shrink the effective per-parameter learning rate across iterations
Adam
Maintains both first and second moment statistics for gradients
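A minimal sketch of these three update rules on a single parameter (hyperparameter defaults are illustrative, not from the source):

import numpy as np

def adagrad_step(w, g, state, lr=0.01, eps=1e-8):
    # Adagrad: accumulate squared gradients; effective lr shrinks over time.
    state["G"] = state.get("G", 0.0) + g ** 2
    return w - lr * g / (np.sqrt(state["G"]) + eps)

def rmsprop_step(w, g, state, lr=0.001, beta=0.9, eps=1e-8):
    # RMSProp: moving average of squared gradients instead of a running sum.
    state["v"] = beta * state.get("v", 0.0) + (1 - beta) * g ** 2
    return w - lr * g / (np.sqrt(state["v"]) + eps)

def adam_step(w, g, state, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: moving averages of first and second moments, with bias correction.
    state["t"] = state.get("t", 0) + 1
    state["m"] = b1 * state.get("m", 0.0) + (1 - b1) * g
    state["v"] = b2 * state.get("v", 0.0) + (1 - b2) * g ** 2
    m_hat = state["m"] / (1 - b1 ** state["t"])
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)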