Quiz 2 Flashcards

1
Q

You have an input volume of 32×32×3. What are the dimensions of
the resulting volume after convolving a 5×5 kernel with zero padding,
stride of 1, and 2 filters?

A

With padding P = 0 and stride 1, each spatial dimension is 32 − 5 + 1 = 28 and the output depth equals the number of filters, so the output volume is 28 × 28 × 2.

Parameter count = (k1 × k2 × depth + 1) × number of filters = (5 × 5 × 3 + 1) × 2 = 152
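A quick way to check both numbers, assuming PyTorch is available (the snippet is illustrative, not from the course):

```python
import torch

# 2 filters of size 5x5 over a 3-channel input, stride 1, no padding
conv = torch.nn.Conv2d(in_channels=3, out_channels=2, kernel_size=5, stride=1, padding=0)

print(sum(p.numel() for p in conv.parameters()))   # 152 weights + biases
print(conv(torch.zeros(1, 3, 32, 32)).shape)       # torch.Size([1, 2, 28, 28])
```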

2
Q
Consider a document collection made of 100 documents. Given a query q, the set of documents relevant to the user is D* = {d3, d12, d34, d56, d98}. An IR system retrieves the documents D = {d3, d12, d35, d56, d66, d88, d95}.
• Compute the number of True-Positives, True-Negatives, False-Positives, and False-Negatives.
• Compute Precision, Recall, and Accuracy.
A
TP = 3 (d3, d12, d56), FP = 4 (d35, d66, d88, d95), FN = 2 (d34, d98), TN = 100 − 3 − 4 − 2 = 91
Precision = TP/(TP+FP) = 3/7, Recall = TP/(TP+FN) = 3/5, Accuracy = (TP+TN)/100 = 94/100
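A minimal sketch of the same computation using Python sets (the variable names are illustrative):

```python
relevant = {"d3", "d12", "d34", "d56", "d98"}
retrieved = {"d3", "d12", "d35", "d56", "d66", "d88", "d95"}
N = 100  # total documents in the collection

tp = len(relevant & retrieved)   # 3
fp = len(retrieved - relevant)   # 4
fn = len(relevant - retrieved)   # 2
tn = N - tp - fp - fn            # 91

precision = tp / (tp + fp)       # 3/7
recall = tp / (tp + fn)          # 3/5
accuracy = (tp + tn) / N         # 0.94
```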
3
Q

You have an input volume of 32×32×3. What are the dimensions of
the resulting volume after convolving a 5×5 kernel with zero padding,
stride of 1, and 2 filters?
How many weights and biases would you have?

A

Output volume (as in the previous card): 28 × 28 × 2

Weights and biases = (k1 × k2 × depth) × number of filters + number of filters

= (5 × 5 × 3) × 2 + 2 = 152

4
Q

Output size of vanilla Convolution

A

(H − k1 + 1) × (W − k2 + 1), with no padding and stride 1

5
Q

Suppose you have an input volume of dimension 64x64x16. How many
parameters would a single 1x1 convolutional filter have, including the
bias?

A

1 × 1 × 16 + 1 = 17

6
Q

Suppose your input is a 300 by 300 color (RGB) image, and you use
a convolutional layer with 100 filters that are each 5x5. How many
parameters does this layer have including the bias parameters?

A

(5 × 5 × 3 + 1) × 100 = 7600

7
Q

You have an input volume that is 63x63x16 and convolve it with 32
filters that are each 7x7, and stride of 1. You want to use a same
convolution. What is the padding?

A

((63 − 7 + 2P) / 1) + 1 = 63

Solve for P = 3

8
Q

Sigmoid

A

Output range: 0 to 1

Loses gradient (saturates) at both ends

Computation involves an exponential term

9
Q

Tanh

A

Output range: −1 to 1 (zero-centered)

Loses gradient (saturates) at both ends

Still computationally heavy (exponential terms)

10
Q

ReLU

A

No saturation on the positive side

Can cause dead neurons (zero gradient when x ≤ 0)

Cheap to compute

11
Q

Leaky ReLU

A

Small negative slope for x < 0 (the slope is a learnable parameter in the PReLU variant)

No saturation

No dead neurons

Still cheap to compute
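A minimal NumPy sketch of the four activations above (the 0.01 leaky slope is an assumed default):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))        # output in (0, 1), saturates at both ends

def tanh(x):
    return np.tanh(x)                      # output in (-1, 1), zero-centered

def relu(x):
    return np.maximum(0.0, x)              # no positive saturation, zero gradient for x <= 0

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)   # small negative slope avoids dead neurons
```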

12
Q

Which activation is best?

A

ReLU is the typical starting point

Sigmoid is typically avoided (it saturates and is not zero-centered)

13
Q

Initialization

A

Initialization close to a good (local) minimum converges faster and to a better solution

Initializing all weights to the same constant leads to a degenerate solution (symmetric neurons learn identical features)

Xavier Initialization → Lesson 3, Slide 26 (see the sketch below)
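A minimal sketch of Xavier initialization, assuming the uniform (Glorot) variant since the slide itself is not reproduced here:

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng=None):
    # Keep activation variance roughly constant across layers
    if rng is None:
        rng = np.random.default_rng()
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

W = xavier_uniform(256, 128)   # e.g. weights for a 256 -> 128 fully connected layer
```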

14
Q

Issues with optimizers

A

Noisy gradient estimates

Saddle points

Ill-conditioned loss surface

15
Q

Optimization types

A

RMSProp: keep a moving average of squared gradients

Adagrad: use accumulated gradient statistics to reduce the learning rate across iterations

Adam: maintains both first and second moment statistics of the gradients
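A minimal NumPy sketch of the three update rules for a parameter array w with gradient g (hyperparameter defaults are assumed):

```python
import numpy as np

def adagrad_step(w, g, s, lr=1e-2, eps=1e-8):
    s = s + g ** 2                               # accumulate squared gradients
    w -= lr * g / (np.sqrt(s) + eps)             # per-parameter learning-rate decay
    return w, s

def rmsprop_step(w, g, s, lr=1e-3, rho=0.9, eps=1e-8):
    s = rho * s + (1 - rho) * g ** 2             # moving average of squared gradients
    w -= lr * g / (np.sqrt(s) + eps)
    return w, s

def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * g              # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * g ** 2         # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)                 # bias correction for step t >= 1
    v_hat = v / (1 - beta2 ** t)
    w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```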

16
Q

Drop out

A

Dropout: for each node, keep its output with probability p; activations of deactivated nodes are set to zero

In practice, implemented with a mask recomputed each iteration (see the sketch below)

During testing, no nodes are dropped

Can be seen as:

training an ensemble of 2^n networks that share weights, or

forcing the model not to rely too heavily on any particular features
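A minimal sketch of mask-based dropout; the 1/p rescaling (inverted dropout) is an assumed convention so that no scaling is needed at test time:

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    if not training:
        return x                              # no nodes are dropped at test time
    if rng is None:
        rng = np.random.default_rng()
    mask = (rng.random(x.shape) < p) / p      # keep each node with probability p, rescale by 1/p
    return x * mask                           # dropped activations become exactly zero
```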

17
Q

Methods to address class imbalance

A

Sampling

SMOTE (Synthetic Minority Oversampling Technique): identify a minority sample's nearest neighbors in feature space, select one of them, then uniformly sample a new point from the line segment connecting the sample to that neighbor (see the sketch below)

Cost-based learning

Focal Loss: down-weights easy examples (well-classified, high-probability examples)
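A minimal sketch of the SMOTE interpolation step described above (the function name and k are illustrative):

```python
import numpy as np

def smote_sample(minority_X, k=5, rng=None):
    # Pick a minority-class sample, find its k nearest minority neighbors,
    # then sample uniformly along the segment to one of them.
    if rng is None:
        rng = np.random.default_rng()
    x = minority_X[rng.integers(len(minority_X))]
    dists = np.linalg.norm(minority_X - x, axis=1)
    neighbors = np.argsort(dists)[1:k + 1]    # skip the point itself
    x_nn = minority_X[rng.choice(neighbors)]
    return x + rng.uniform() * (x_nn - x)     # new synthetic point on the connecting segment
```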