11 Intro to NN Flashcards
What is the difference between a batch and an epoch in neural‑network training?
A batch is a subset of training samples used for one gradient update; one epoch is completed after every sample in the full training set has been used once.
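A minimal sketch of how batches nest inside epochs (NumPy only; the commented train_step is a hypothetical placeholder for the forward pass, loss, backprop, and update):

```python
import numpy as np

num_samples, batch_size, num_epochs = 1000, 32, 5
X = np.random.rand(num_samples, 20)             # toy feature matrix
y = np.random.randint(0, 2, size=num_samples)   # toy labels

for epoch in range(num_epochs):                  # one epoch = one full pass over the data
    indices = np.random.permutation(num_samples)
    for start in range(0, num_samples, batch_size):
        batch_idx = indices[start:start + batch_size]
        X_batch, y_batch = X[batch_idx], y[batch_idx]  # one batch = one gradient update
        # train_step(X_batch, y_batch)  # hypothetical: forward, loss, backward, weight update
```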
Why does ReLU help mitigate the vanishing‑gradient problem for deep nets?
Its derivative is exactly 1 for positive inputs, so gradients passing through active ReLU units are not scaled down; sigmoid and tanh have derivatives below 1 almost everywhere, so their gradients shrink multiplicatively across many layers.
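A quick NumPy illustration of the ReLU derivative (a sketch, not tied to any framework):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # derivative is 1 where x > 0 and 0 elsewhere, so positive activations pass gradients unchanged
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 3.0])
print(relu(x))       # [0.  0.  0.  0.5 3. ]
print(relu_grad(x))  # [0. 0. 0. 1. 1.]
```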
Give the formula for a residual block in ResNet.
Output = F(x) + x, where F(x) is the learned residual mapping (e.g., two Conv–BN–ReLU layers).
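A minimal Keras-style sketch of such a block (assumes tf.keras; the exact layer choices are illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=64):
    # F(x): two Conv-BN-ReLU layers; padding='same' keeps spatial shape so the add is valid.
    # Assumes x already has `filters` channels so shapes match for the skip connection.
    fx = layers.Conv2D(filters, 3, padding="same")(x)
    fx = layers.BatchNormalization()(fx)
    fx = layers.ReLU()(fx)
    fx = layers.Conv2D(filters, 3, padding="same")(fx)
    fx = layers.BatchNormalization()(fx)
    out = layers.Add()([fx, x])   # output = F(x) + x
    return layers.ReLU()(out)
```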
In one sentence, why do skip connections improve gradient flow?
They provide a direct path with derivative 1, so gradients cannot vanish even if ∂F/∂x is small.
What is the intuition behind learning a residual instead of the full mapping?
If the desired mapping is close to identity, the network only needs to learn small differences (the residual), which is easier to optimize.
List two common loss functions for classification tasks in neural nets.
Binary cross‑entropy (binary or multi‑label outputs) and categorical cross‑entropy (mutually exclusive multiclass outputs).
True/False: Dropout is typically inserted immediately after convolutional layers.
False – it is most often applied after dense (fully connected) layers; conv layers rely more on BatchNorm.
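A typical placement in a small Keras-style classifier (a sketch, assuming tf.keras; layer sizes are illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.BatchNormalization(),           # conv stage regularized mainly by BatchNorm
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),                   # dropout placed after the dense layer
    layers.Dense(10, activation="softmax"),
])
```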
What does a 3×3 filter with stride 2 and ‘valid’ padding do to an input of size 32×32?
Produces a feature map of size 15×15: ⌊(32 − 3)/2⌋ + 1 = 14 + 1 = 15.
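The general 'valid'-padding formula as a small helper (floor division mirrors how frameworks discard the incomplete final window):

```python
def conv_output_size(input_size, kernel_size, stride):
    # 'valid' padding: floor((input - kernel) / stride) + 1
    return (input_size - kernel_size) // stride + 1

print(conv_output_size(32, 3, 2))  # 15
```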
Purpose of max pooling in CNNs?
Downsample feature maps while retaining the strongest activations, adding translation invariance and reducing computation.
Why does data augmentation reduce overfitting?
It shows the model label‑preserving variations of the same data, forcing it to learn invariant features rather than memorizing exact examples.
Softmax vs. sigmoid: when do you use each?
Use sigmoid for independent binary outputs; use softmax when classes are mutually exclusive and probabilities must sum to 1.
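A NumPy sketch contrasting the two on the same logits:

```python
import numpy as np

logits = np.array([2.0, 1.0, 0.1])

sigmoid = 1.0 / (1.0 + np.exp(-logits))          # independent probabilities, need not sum to 1
softmax = np.exp(logits) / np.exp(logits).sum()  # mutually exclusive classes, sums to 1

print(sigmoid, sigmoid.sum())   # approx [0.88 0.73 0.53] -> sum ~2.14
print(softmax, softmax.sum())   # approx [0.66 0.24 0.10] -> sum = 1.0
```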
Define an embedding layer in one sentence.
A trainable lookup table that maps discrete tokens (e.g., words or categories) to dense, low‑dimensional vectors.
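A minimal Keras-style example (vocabulary size and embedding dimension are illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers

embedding = layers.Embedding(input_dim=10_000, output_dim=64)  # 10k token ids -> 64-d vectors
token_ids = tf.constant([[3, 17, 256, 0]])                     # a batch of one 4-token sequence
vectors = embedding(token_ids)
print(vectors.shape)  # (1, 4, 64): each token id is looked up as a trainable 64-d vector
```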
What tensor shape represents a batch of 64 RGB images at 128×128 resolution?
(64, 128, 128, 3) in the channels‑last (NHWC) convention used by TensorFlow/Keras; channels‑first frameworks such as PyTorch would use (64, 3, 128, 128).
Give two operations that constitute data augmentation for images.
Examples: random horizontal flip; random rotation; random zoom; random translation; brightness shift (any two).
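Two of these as Keras preprocessing layers (a sketch, assuming tf.keras ≥ 2.6 where these layers live under tf.keras.layers):

```python
import tensorflow as tf
from tensorflow.keras import layers

augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),   # random horizontal flip
    layers.RandomRotation(0.1),        # random rotation up to ±10% of a full circle (~36°)
])

images = tf.random.uniform((8, 128, 128, 3))   # a toy batch
augmented = augment(images, training=True)     # augmentation is active only in training mode
```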
Key advantage of Inception modules over plain VGG‑style stacking.
Parallel convolutions of multiple sizes let the network capture multi‑scale features without greatly increasing depth or parameters.
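A stripped-down sketch of the parallel-branch idea (not the full Inception module, which also uses 1×1 reductions before the larger convolutions; assumes tf.keras):

```python
import tensorflow as tf
from tensorflow.keras import layers

def mini_inception_block(x):
    # parallel convolutions at multiple scales, concatenated along the channel axis
    b1 = layers.Conv2D(32, 1, padding="same", activation="relu")(x)   # 1x1 branch
    b3 = layers.Conv2D(32, 3, padding="same", activation="relu")(x)   # 3x3 branch
    b5 = layers.Conv2D(32, 5, padding="same", activation="relu")(x)   # 5x5 branch
    pool = layers.MaxPooling2D(3, strides=1, padding="same")(x)       # pooling branch
    return layers.Concatenate()([b1, b3, b5, pool])
```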
What is the main reason residual networks can be trained to 100+ layers while vanilla CNNs struggle?
Skip connections keep gradients alive, preventing vanishing and allowing very deep optimization.
State the forward and backward steps of backpropagation in two bullet points.
Forward: propagate the input through each layer to compute activations and the loss; Backward: apply the chain rule layer by layer to compute the loss gradient for every weight, which the optimizer then uses to update the weights.
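A minimal sketch of the two phases using tf.GradientTape (assumes tf.keras; the optimizer step is the weight update that follows the backward pass):

```python
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
loss_fn = tf.keras.losses.MeanSquaredError()

x = tf.random.normal((16, 4))
y = tf.random.normal((16, 1))

with tf.GradientTape() as tape:
    predictions = model(x)            # forward: layer outputs
    loss = loss_fn(y, predictions)    # forward: loss
grads = tape.gradient(loss, model.trainable_variables)             # backward: chain rule
optimizer.apply_gradients(zip(grads, model.trainable_variables))   # weight update
```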
Which activation is symmetric around zero and often used in shallow regression nets?
Tanh (hyperbolic tangent).