Neural networks Flashcards

1
Q

What is representation learning?

A

It’s the idea of learning the basis functions themselves, so the model “learns” its own (typically higher-dimensional) representation of the data instead of relying on hand-crafted features.

2
Q

What are the best-suited NN output activation functions for regression and for (binary and multi-class) classification?

A
  • Regression: Identity
  • Binary classification: Sigmoid
  • Multi-class classification: Softmax
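
A minimal NumPy sketch of the three output activations; the function names and example logits are illustrative, not from the card:

```python
import numpy as np

def identity(z):
    # Regression: output the raw linear activation.
    return z

def sigmoid(z):
    # Binary classification: squash a single logit into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Multi-class classification: normalize a vector of logits into a
    # probability distribution (subtract the max for numerical stability).
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, -1.0, 0.5])
print(identity(logits))    # e.g. predicted real values
print(sigmoid(logits[0]))  # e.g. P(class = 1) for one logit
print(softmax(logits))     # sums to 1 across the three classes
```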
3
Q

What is the universal approximation theorem?

A

A feed-forward neural network with a single hidden layer (using a non-linear activation) and a linear output unit can approximate any continuous function on a compact domain arbitrarily well, given enough hidden units.
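
One common formal statement of the theorem (assuming a suitable non-polynomial activation σ), sketched in LaTeX:

```latex
% For any continuous f on a compact set K \subset \mathbb{R}^d and any \varepsilon > 0,
% there exist N \in \mathbb{N} and parameters w_j \in \mathbb{R}^d, \; b_j, c_j \in \mathbb{R} such that
\sup_{x \in K} \left| f(x) - \sum_{j=1}^{N} c_j \, \sigma\!\left(w_j^{\top} x + b_j\right) \right| < \varepsilon .
```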

4
Q

What is computed during backpropagation?

A

Backpropagation computes the gradient of the loss (error) function with respect to all the network’s weights, by applying the chain rule backwards through the layers.
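
A minimal NumPy sketch of backprop for a one-hidden-layer regression network with squared-error loss; the network sizes and variable names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))        # 4 samples, 3 features
y = rng.normal(size=(4, 1))        # regression targets
W1, b1 = rng.normal(size=(3, 5)), np.zeros(5)
W2, b2 = rng.normal(size=(5, 1)), np.zeros(1)

# Forward pass
z1 = x @ W1 + b1
h = np.tanh(z1)                    # hidden activations
y_hat = h @ W2 + b2                # identity output (regression)
loss = 0.5 * np.mean((y_hat - y) ** 2)

# Backward pass: chain rule from the loss back to each weight matrix
d_yhat = (y_hat - y) / len(x)      # dL/dy_hat
dW2 = h.T @ d_yhat
db2 = d_yhat.sum(axis=0)
d_h = d_yhat @ W2.T                # propagate the error into the hidden layer
d_z1 = d_h * (1 - np.tanh(z1) ** 2)
dW1 = x.T @ d_z1
db1 = d_z1.sum(axis=0)
```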

5
Q

What are the different ways to fight gradient vanishing?

A
  • (Leaky) Rectified Linear Units (ReLU)
  • Greedy layerwise pre-training
  • Skip (residual) connections
  • Batch normalization
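
A short NumPy sketch of two of these remedies, leaky ReLU and a skip (residual) connection; the layer sizes and the 0.01 slope are illustrative assumptions:

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    # Keeps a small slope (alpha) for negative inputs,
    # so the gradient does not saturate the way sigmoid/tanh do.
    return np.where(z > 0, z, alpha * z)

def residual_block(x, W1, W2):
    # Skip connection: the identity path "x + ..." gives the gradient
    # a direct route around the two weight layers.
    h = leaky_relu(x @ W1)
    return x + h @ W2

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8))
W1 = rng.normal(size=(8, 8)) * 0.1
W2 = rng.normal(size=(8, 8)) * 0.1
print(residual_block(x, W1, W2).shape)  # (2, 8)
```
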
6
Q

How to fight exploding gradient?

A

By using L2 regularization and/or gradient clipping.
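
A minimal NumPy sketch of gradient clipping by global norm; the threshold and example gradients are illustrative assumptions:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=1.0):
    # Rescale all gradients jointly so their global L2 norm does not
    # exceed max_norm; gradients already below the threshold are untouched.
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (total_norm + 1e-12))
    return [g * scale for g in grads]

grads = [np.array([3.0, 4.0]), np.array([12.0])]   # global norm = 13
clipped = clip_by_global_norm(grads, max_norm=1.0)
print(clipped)  # rescaled so the global norm is 1
```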

7
Q

How to improve gradient descent?

A
  • (Mini-)batch or stochastic gradient descent
  • Momentum, i.e. adding a fraction of the previous update:
    θ_{t+1} = θ_t - v_t, with v_t = γ * v_{t-1} + η * ∇J(θ_t)
    (γ is the momentum coefficient)
  • Nesterov Accelerated Gradient (NAG), i.e. computing the “look-ahead” gradient after applying momentum:
    v_t = γ * v_{t-1} + η * ∇J(θ_t - γ * v_{t-1})
  • Adaptive learning rate, i.e. for each parameter i:
    θ_{t+1,i} = θ_{t,i} - α / sqrt(Σ_{p=1…t} g_{p,i}² + ε) * g_{t,i}    (AdaGrad)
    or
    θ_{t+1,i} = θ_{t,i} - α / sqrt(E[g²]_t + ε) * g_{t,i}, with E[g²]_t = ρ * E[g²]_{t-1} + (1-ρ) * g_t²    (RMSProp)
    (ρ ≈ 0.9 is the decay constant)
  • Adaptive moment estimation (Adam), combining momentum and an adaptive learning rate (see the sketch after this list):
    θ_{t+1,i} = θ_{t,i} - α * m̂_t / (sqrt(v̂_t) + ε), with
    m̂_t = m_t / (1 - β1^t), v̂_t = v_t / (1 - β2^t)
    m_t = β1 * m_{t-1} + (1-β1) * g_t
    v_t = β2 * v_{t-1} + (1-β2) * g_t²
    (β1 ≈ 0.9, β2 ≈ 0.999)
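
A minimal NumPy sketch of the momentum and Adam updates above, applied to a toy quadratic objective; the toy objective and hyperparameter values are illustrative assumptions:

```python
import numpy as np

def grad(theta):
    # Gradient of the toy objective J(theta) = 0.5 * ||theta||^2.
    return theta

# Momentum: v_t = gamma * v_{t-1} + eta * grad(theta_t); theta_{t+1} = theta_t - v_t
theta, v = np.array([1.0, -2.0]), np.zeros(2)
gamma, eta = 0.9, 0.1
for _ in range(100):
    v = gamma * v + eta * grad(theta)
    theta = theta - v
print("momentum:", theta)   # close to the minimum at 0

# Adam: bias-corrected first and second moment estimates
theta = np.array([1.0, -2.0])
m, s = np.zeros(2), np.zeros(2)
alpha, beta1, beta2, eps = 0.1, 0.9, 0.999, 1e-8
for t in range(1, 101):
    g = grad(theta)
    m = beta1 * m + (1 - beta1) * g          # m_t
    s = beta2 * s + (1 - beta2) * g ** 2     # v_t
    m_hat = m / (1 - beta1 ** t)
    s_hat = s / (1 - beta2 ** t)
    theta = theta - alpha * m_hat / (np.sqrt(s_hat) + eps)
print("adam:", theta)       # also driven toward 0
```
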
8
Q

When using an adaptive learning rate, how is the learning rate affected by the gradient?

A

Sparse (infrequent) gradients → high effective learning rate
Frequent gradients → low effective learning rate
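
A tiny numeric illustration, assuming the AdaGrad-style accumulator from the previous card: the parameter that rarely receives a gradient keeps a large effective step size, while the frequently updated one shrinks.

```python
import numpy as np

alpha, eps = 0.1, 1e-8
acc = np.zeros(2)   # per-parameter accumulator: sum of squared gradients

for t in range(100):
    # Parameter 0 gets a gradient every step, parameter 1 only every 10th step.
    g = np.array([1.0, 1.0 if t % 10 == 0 else 0.0])
    acc += g ** 2
    effective_lr = alpha / np.sqrt(acc + eps)

print(effective_lr)   # noticeably larger for the sparse parameter 1
```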

9
Q

What are the best-suited models for image and sequence data?

A

Image → Convolutional NN + Transformers
Sequence → Recurrent NN + Transformers

10
Q

What are the best optimizers to use?

A

SGD with momentum (SGDM), RMSProp, Adam (≈1.5th order), Shampoo (2nd order)

11
Q

What are good practices when training a NN?

A
  • Shuffle the training data between epochs
  • Perform model selection (e.g. choose hyperparameters on a validation set)
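
A minimal sketch of per-epoch shuffling in a mini-batch loop; the linear least-squares model and all hyperparameters are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 3)), rng.normal(size=100)
w, lr, batch_size = np.zeros(3), 0.01, 16

for epoch in range(5):
    order = rng.permutation(len(X))          # reshuffle every epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X[idx], y[idx]
        grad = xb.T @ (xb @ w - yb) / len(idx)   # least-squares gradient
        w -= lr * grad
```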