Multi-Choice: Deep Learning Flashcards
Q1 (Deep Learning). A neural net outputs f(x1)=0.1, f(x2)=0.5, f(x3)=0.7 for targets (1,0,1). Binary cross entropy loss to 2 decimals? (One choice) 1) -1.12 2) 4 3) 1.12 4) 0.77 5) 2.12
Correct item: 3. Explanation: With targets (1,0,1), BCE = -(1/3)[ln(0.1) + ln(1-0.5) + ln(0.7)] ≈ 1.12 (the loss is positive, so -1.12 is wrong).
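A minimal Python check of Q1's arithmetic, assuming the usual natural-log, mean-reduced definition of binary cross-entropy:

```python
import math

# Mean binary cross-entropy over the three (prediction, target) pairs from Q1.
preds = [0.1, 0.5, 0.7]
targets = [1, 0, 1]
bce = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
           for y, p in zip(targets, preds)) / len(preds)
print(round(bce, 2))  # 1.12
```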
Q2 (Deep Learning). A fully connected NN with 5 hidden layers each of 10 neurons, input is 55x55x3, output is size 1. Total number of parameters? (One choice) 1) 91200 2) 91211 3) 91210 4) 91201 5) 91160 6) 91161 7) 9075
Correct item: 2. Explanation: Flattened input is 55x55x3 = 9075. First layer: 9075×10 + 10 = 90760; the four 10->10 layers: 4×(10×10 + 10) = 440; output layer: 10×1 + 1 = 11; total 91211.
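A quick Python sketch of the same count (the only assumption is one bias per output unit of each layer):

```python
# Parameter count of a dense net: 55*55*3 input -> five hidden layers of 10 -> 1 output.
sizes = [55 * 55 * 3] + [10] * 5 + [1]
total = sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))
print(total)  # 91211
```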
Q3 (Deep Learning). A CNN with three conv layers on an RGB 55x55 input: (1) 5x5 kernels, 5 filters, stride 1, pad 0 => (2) 7x7 kernels, 10 filters, stride 2, pad 0 => (3) 7x7 kernels, 20 filters, stride 4, pad 0. Final output dimension? (One choice) 1) 5x5x20 2) 6x6x20 3) 23x23x10 4) 51x51x5 5) 55x55x3 6) 5x5x3
Correct item: 1. Explanation: With out = (in - kernel)/stride + 1 and pad 0, the spatial size goes 55 -> 51 -> 23 -> 5; the last layer has 20 filters, so the output is 5x5x20.
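A small Python sketch of the size arithmetic, assuming the standard no-padding formula out = (in - kernel) // stride + 1:

```python
# Spatial size and depth after each conv layer of Q3.
size, depth = 55, 3
for kernel, filters, stride in [(5, 5, 1), (7, 10, 2), (7, 20, 4)]:
    size = (size - kernel) // stride + 1
    depth = filters
    print(f"{size}x{size}x{depth}")  # 51x51x5, 23x23x10, 5x5x20
```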
Q4 (Deep Learning). Same CNN as Q3. Total number of parameters (including bias)? (One choice) 1) 9820 2) 12660 3) 2460 4) 380 5) 12280 6) 12760
Correct item: 2. Explanation: Each layer has (kernel x kernel x input depth + 1) x filters parameters: layer 1 = (5x5x3+1)x5 = 380, layer 2 = (7x7x5+1)x10 = 2460, layer 3 = (7x7x10+1)x20 = 9820; total 12660.
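The same count in Python (the only assumption is one bias per filter, as the question states):

```python
# Parameters of the three conv layers in Q4: (kernel*kernel*in_depth + 1) * filters.
in_depth, total = 3, 0
for kernel, filters in [(5, 5), (7, 10), (7, 20)]:
    layer = (kernel * kernel * in_depth + 1) * filters
    print(layer)      # 380, 2460, 9820
    total += layer
    in_depth = filters
print(total)          # 12660
```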
Q5 (Deep Learning). A single-hidden-layer net for multiclass: h=σ(W(1)x + b(1)), ŷ=softmax(W(2)h + b(2)). Input dimension=D, classes=k, hidden=H. Total # of parameters? (One choice) 1) (D+1)H + (H+1)k 2) D+H+k 3) HD+kH 4) H+k 5) HD+kH+k
Correct item: 1. Explanation: The first layer has HD weights plus H biases, i.e. (D+1)H; the second has kH weights plus k biases, i.e. (H+1)k.
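A tiny Python sketch of the formula with made-up sizes (D=4, H=3, k=2 are arbitrary):

```python
# Parameter count of the Q5 architecture: weights plus biases in both layers.
def mlp_params(D, H, k):
    return (D + 1) * H + (H + 1) * k

print(mlp_params(4, 3, 2))  # 23 = 5*3 + 4*2
```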
Q6 (Deep Learning). Same net as Q5, cross-entropy error, gradient wrt w(1)ᵢⱼ? (One choice) 1) Σₖ(yₖ - ŷₖ) hᵢ(1-hᵢ) xⱼ 2) Σₖ(yₖ - ŷₖ) w(2)ₖᵢ hᵢ(1-hᵢ) xⱼ 3) Σₖ(yₖ - ŷₖ) w(2)ₖᵢ hᵢ(1-hᵢ) 4) Σₖ(yₖ - ŷₖ) w(2)ₖᵢ hᵢ(1-hᵢ) xᵢ …
Correct item: 2. Explanation: Chain rule: the output error (yₖ - ŷₖ) is backpropagated through w(2)ₖᵢ, multiplied by the sigmoid derivative hᵢ(1-hᵢ), and then by the input xⱼ, giving Σₖ(yₖ - ŷₖ) w(2)ₖᵢ hᵢ(1-hᵢ) xⱼ.
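A NumPy sketch with arbitrary small shapes (D=4, H=3, k=2) that checks this expression against a finite-difference gradient of the cross-entropy loss E = -Σₖ yₖ log ŷₖ; the option-2 expression equals -∂E/∂w(1)ᵢⱼ, so the overall sign only reflects whether one differentiates the loss or the log-likelihood:

```python
import numpy as np

# Gradient check for Q6 with arbitrary small shapes (D=4, H=3, k=2).
rng = np.random.default_rng(0)
D, H, k = 4, 3, 2
W1, b1 = rng.normal(size=(H, D)), rng.normal(size=H)
W2, b2 = rng.normal(size=(k, H)), rng.normal(size=k)
x = rng.normal(size=D)
y = np.array([1.0, 0.0])                              # one-hot target

def forward(W1):
    h = 1.0 / (1.0 + np.exp(-(W1 @ x + b1)))          # sigmoid hidden units
    z = W2 @ h + b2
    yhat = np.exp(z - z.max()); yhat /= yhat.sum()    # softmax outputs
    return h, yhat

h, yhat = forward(W1)
# Option 2: sum_k (y_k - yhat_k) * w2_ki * h_i(1-h_i) * x_j, written as an outer product.
expr = np.outer((W2.T @ (y - yhat)) * h * (1 - h), x)

# Central finite differences of E = -sum_k y_k log(yhat_k) with respect to W1.
eps, grad_fd = 1e-6, np.zeros_like(W1)
for i in range(H):
    for j in range(D):
        Wp, Wm = W1.copy(), W1.copy()
        Wp[i, j] += eps; Wm[i, j] -= eps
        Ep = -np.sum(y * np.log(forward(Wp)[1]))
        Em = -np.sum(y * np.log(forward(Wm)[1]))
        grad_fd[i, j] = (Ep - Em) / (2 * eps)

# The expression matches the numerical gradient up to the sign convention noted above.
print(np.allclose(expr, -grad_fd, atol=1e-5))         # True
```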
Q7 (Deep Learning). Which are true for convolutional layers? (Two correct) 1) Good for all tabular data 2) Trained by gradient descent 3) Fewer params than fully connected for same input 4) Use linear activations
Correct items: 2 and 3. Explanation: Convolutional layers share weights across spatial positions, so they need far fewer parameters than a fully connected layer on the same input, and they are trained with gradient descent like any other differentiable layer. Items 1 and 4 are false: convolutions suit spatially structured data rather than arbitrary tabular data, and they are normally followed by nonlinear activations.
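A back-of-the-envelope Python comparison for item 3 (the 32x32x3 input and 3x3/16-filter layer are made-up sizes):

```python
# A 3x3 conv with 16 filters on a 32x32x3 input vs a fully connected layer
# producing the same 30x30x16 output volume (stride 1, pad 0).
conv_params = (3 * 3 * 3 + 1) * 16                  # weights are shared across positions
fc_params = (32 * 32 * 3 + 1) * (30 * 30 * 16)      # one weight per input-output pair
print(conv_params, fc_params)                       # 448 vs 44251200
```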
Q8 (Deep Learning). Which are true for RNNs? (Two correct) 1) LSTMs fix vanishing gradient issues 2) Param count depends on sequence length 3) RNNs only do many-to-one tasks 4) RNN design uses parameter sharing
Correct items: 1 and 4. Explanation: LSTMs were designed to mitigate vanishing gradients; RNNs reuse the same parameters at every timestep, which is also why the parameter count does not depend on sequence length (so item 2 is false).
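A minimal NumPy sketch of the parameter-sharing point (the 8/16 sizes and tanh cell are arbitrary choices):

```python
import numpy as np

# A plain RNN cell: the same Wx, Wh, b are reused at every timestep,
# so the parameter count is fixed no matter how long the sequence is.
d_in, d_hid = 8, 16
Wx = np.zeros((d_hid, d_in))    # input-to-hidden weights
Wh = np.zeros((d_hid, d_hid))   # hidden-to-hidden weights
b = np.zeros(d_hid)
n_params = Wx.size + Wh.size + b.size   # 400, independent of sequence length

def run(seq):                   # seq has shape (T, d_in); T can be anything
    h = np.zeros(d_hid)
    for x_t in seq:
        h = np.tanh(Wx @ x_t + Wh @ h + b)
    return h

run(np.zeros((5, d_in))); run(np.zeros((500, d_in)))  # same 400 parameters either way
print(n_params)
```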