Past Papers Flashcards
What are hyperparameters?
Set before training
Be sure to specify that they DEFINE network’s architecture
For an L1 regularised neural network, write down how the regularisation term changes the way the parameters are modified during back prop
For an L1 regularised neural network, write the loss function expanded around θ* and show what the minimum is for θi
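A sketch of the standard answer to the two L1 cards above (following the usual textbook treatment; assumes the Hessian H of the unregularised loss at its minimum θ* is diagonal):

```latex
% Regularised loss and the resulting gradient step:
\tilde{L}(\theta) = L(\theta) + \lambda \|\theta\|_1,
\qquad
\theta_i \leftarrow \theta_i - \eta\left(\frac{\partial L}{\partial \theta_i}
  + \lambda\,\mathrm{sign}(\theta_i)\right)

% Quadratic expansion around the unregularised minimum \theta^*
% (diagonal Hessian H assumed):
\tilde{L}(\theta) \approx L(\theta^*)
  + \sum_i \left[ \tfrac{1}{2} H_{ii} (\theta_i - \theta_i^*)^2
  + \lambda |\theta_i| \right]

% Componentwise minimiser (soft thresholding):
\theta_i = \mathrm{sign}(\theta_i^*)
  \max\!\left( |\theta_i^*| - \frac{\lambda}{H_{ii}},\ 0 \right)
```

The max(·, 0) is what drives insignificant parameters exactly to zero (the sparsity mentioned in the next card).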
For an L2 regularised neural network:
Write down how regularisation term changes the way the parameters update
Expand around the min of the loss function
Write down an expression for the ith component of the minimum
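A sketch of the standard answer to the three L2 cards above (same setup as the L1 case: quadratic expansion around the unregularised minimum θ* with Hessian H):

```latex
% Regularised loss and the resulting update (weight decay):
\tilde{L}(\theta) = L(\theta) + \tfrac{\lambda}{2}\|\theta\|_2^2,
\qquad
\theta_i \leftarrow (1 - \eta\lambda)\,\theta_i
  - \eta \frac{\partial L}{\partial \theta_i}

% Quadratic expansion around the unregularised minimum \theta^*:
\tilde{L}(\theta) \approx L(\theta^*)
  + \tfrac{1}{2}(\theta - \theta^*)^\top H (\theta - \theta^*)
  + \tfrac{\lambda}{2}\|\theta\|_2^2

% Minimiser; with H diagonal, the i-th component is
\tilde{\theta} = (H + \lambda I)^{-1} H\, \theta^*,
\qquad
\tilde{\theta}_i = \frac{H_{ii}}{H_{ii} + \lambda}\, \theta_i^*
```

So each component is shrunk by a factor H_ii/(H_ii + λ): components with small curvature are shrunk towards zero but (unlike L1) never reach it exactly.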
When doing questions with L1 remember to
Mention that we are introducing sparsity to the solution -> some parameters will go to zero if they are not significant
Describe what early stopping does
At each iteration we check how the error on a held-out validation set behaves. After p (patience) consecutive iterations in which the validation error fails to improve, the algorithm terminates
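A minimal sketch of the patience rule described above, using one common variant (counting iterations with no improvement over the best validation error seen so far); the function name is hypothetical:

```python
def early_stopping(val_errors, patience):
    """Return the iteration index at which training stops, or None if
    stopping never triggers: stop after `patience` consecutive
    iterations with no improvement over the best validation error."""
    best = float("inf")
    bad_streak = 0
    for t, err in enumerate(val_errors):
        if err < best:       # validation error improved
            best = err
            bad_streak = 0
        else:                # no improvement this iteration
            bad_streak += 1
            if bad_streak >= patience:
                return t
    return None
```

In practice the parameters from the iteration with the best validation error are usually restored when stopping triggers.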
GPT4o definition of Universal Approximation Property
A feedforward NN with a single hidden layer containing a finite number of neurons can approximate any continuous function on a compact subset of R^d, given an appropriate activation function
What is the capacity of an infinitely wide neural network with a single hidden layer?
By the UAP, this network can approximate any continuous function on a compact set and therefore has infinite capacity
When asked to compare loss functions?
Remember to check whether the functions are bounded & whether they are differentiable
When discussing MSE?
Remember to mention that it is preferred for regression
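For reference when answering this card, the definition (differentiable everywhere, unbounded above, penalises large errors quadratically):

```latex
\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2
```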
How does L2 regularisation change the way the parameters are updated using backprop?
When finding the values that minimise MSE for linear regression
Mention that the system of equations obtained by setting the derivatives with respect to β to zero must be linearly independent
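A sketch of the derivation this card refers to: setting the derivative of the MSE of a linear model to zero gives the normal equations, which have a unique solution exactly when the columns of X are linearly independent (so that X^T X is invertible):

```latex
% Minimise \|y - X\beta\|_2^2 over \beta:
\frac{\partial}{\partial \beta} \|y - X\beta\|_2^2
  = -2\, X^\top (y - X\beta) = 0
\;\Longrightarrow\; X^\top X \beta = X^\top y
\;\Longrightarrow\; \hat{\beta} = (X^\top X)^{-1} X^\top y
```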
How many times are parameters updated?
N * (1 - validation_split) = num of training samples (X)
X / batch size = num of batches (B)
B * epochs = number of parameter updates
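The three-step count above as a runnable sketch; all the numbers are hypothetical, chosen only to make the arithmetic concrete:

```python
# Hypothetical values for illustration:
N = 10_000              # total samples in the dataset
validation_split = 0.2  # fraction held out for validation
batch_size = 32
epochs = 10

num_train = int(N * (1 - validation_split))  # X = training samples
num_batches = num_train // batch_size        # B = batches per epoch
num_updates = num_batches * epochs           # total parameter updates
```

With these numbers: X = 8000, B = 250, and 2500 parameter updates in total.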
Total # of parameters in a NN
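For a plain fully connected network, each dense layer contributes n_in × n_out weights plus n_out biases; a minimal sketch (the function name is hypothetical):

```python
def dense_param_count(layer_sizes):
    """Total trainable parameters of a fully connected network, where
    layer_sizes = [input_dim, hidden_1, ..., output_dim].
    Each dense layer has n_in * n_out weights plus n_out biases."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# e.g. a 4 -> 8 -> 3 network: (4*8 + 8) + (8*3 + 3) = 67 parameters
```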
UAP conditions on g
Maps R to R
Measurable
Non-polynomial
Bounded on any finite interval
The closure of the set of all discontinuities of g in R has zero Lebesgue measure
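For context, these conditions come from the Leshno et al. (1993) form of the theorem; a hedged summary of its statement:

```latex
% For g satisfying the conditions above, the set
\mathrm{span}\{\, x \mapsto g(w^\top x + b) : w \in \mathbb{R}^d,\ b \in \mathbb{R} \,\}
% is dense in C(K) for every compact K \subset \mathbb{R}^d
% if and only if g is not (almost everywhere) a polynomial.
```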