Loss Flashcards
Hölder continuity
Hölder continuity implies (uniform) continuity, where a smaller exponent α means the function can be rougher
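For reference, the usual definition (a standard statement; C is the Hölder constant and α the exponent):

```latex
% f is Hölder continuous with exponent \alpha \in (0, 1] if there is a C \ge 0 with
\[
  |f(x) - f(y)| \le C \, |x - y|^{\alpha} \quad \text{for all } x, y .
\]
% A smaller \alpha weakens the constraint near x = y, so rougher functions qualify;
% \alpha = 1 is exactly Lipschitz continuity (next card).
```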
Lipschitz continuity
Hölder continuity with α = 1
What does a neural network do fundamentally
It approximates a function
Objective of deep learning
Find a function that accomplishes the given task by turning inputs into outputs in a way that is, in some sense, optimal
General loss function
A function that quantifies the discrepancy between the network's output and the desired output; training seeks parameters that minimise it
Autoencoder
Used in unsupervised learning
Dimensionality reduction; has I = O and aims to reconstruct its input x, where the data must pass through a hidden layer with d units for d ≪ I
Relate autoencoders to PCA
Autoencoders are a nonlinear extension of PCA
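A minimal sketch of the above, assuming PyTorch; the widths I = 784 and d = 32 are illustrative, not from the source:

```python
import torch
import torch.nn as nn

I, d = 784, 32  # input/output width I and bottleneck width d, with d << I (illustrative)

autoencoder = nn.Sequential(
    nn.Linear(I, d), nn.ReLU(),  # encoder: compress x down to d units
    nn.Linear(d, I),             # decoder: reconstruct back up to I units
)

x = torch.randn(16, I)                             # a dummy mini-batch of inputs
loss = nn.functional.mse_loss(autoencoder(x), x)   # reconstruction error against x itself
loss.backward()                                    # gradients for an optimiser step
```

Dropping the ReLU makes the model purely linear; minimising squared reconstruction error then recovers the subspace spanned by the top d principal components, which is the PCA connection above.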
Risk
The expected loss of the network over the data-generating distribution
Mini-batch risk
As the architecture and activation functions have already been specified, the mini-batch risk is now fully determined by the parameter vector
Here #B is the size of B
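The formula that "#B" refers to seems to have been dropped; in standard notation (a reconstruction, with F_θ the network, L the loss, and B the mini-batch):

```latex
\[
  \mathcal{R}(\theta) = \mathbb{E}\big[ L(F_\theta(X), Y) \big],
  \qquad
  \mathcal{R}_B(\theta) = \frac{1}{\#B} \sum_{(x, y) \in B} L\big( F_\theta(x), y \big).
\]
% The mini-batch risk replaces the expectation over the data distribution
% with an average over the #B examples in B.
```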
Compare squared loss to absolute loss
Squared loss targets mean
Abs loss targets median
Squared loss significantly amplifies the loss of a prediction far from the actual value, causing outliers to disproportionately affect training
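A quick numerical check of the two claims above (a sketch; the data, including the outlier 100, is made up):

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0, 100.0])     # small sample with one outlier
grid = np.linspace(0.0, 100.0, 100_001)  # candidate constant predictions

sq = ((y[:, None] - grid[None, :]) ** 2).sum(axis=0)  # total squared loss per candidate
ab = np.abs(y[:, None] - grid[None, :]).sum(axis=0)   # total absolute loss per candidate

print(grid[sq.argmin()], y.mean())       # squared loss minimised at the mean, 26.5
print(grid[ab.argmin()], np.median(y))   # absolute loss minimised on [2, 3],
                                         # the interval containing the median, 2.5
```

The outlier drags the squared-loss minimiser all the way to 26.5, while the absolute-loss minimiser stays with the bulk of the data, which is exactly the point of the next card.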
How to manage limitation of squared loss
To minimise the effect of outliers while retaining the quadratic behaviour of the loss in a δ-neighbourhood of the true value (for small δ > 0), we use the Huber loss
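For reference, the usual form of the Huber loss on a residual a = ŷ − y:

```latex
\[
  L_\delta(a) =
  \begin{cases}
    \tfrac{1}{2} a^2 & \text{if } |a| \le \delta, \\[2pt]
    \delta \big( |a| - \tfrac{1}{2}\delta \big) & \text{otherwise},
  \end{cases}
\]
% quadratic near the true value and linear in the tails; the two branches agree
% in value and slope at |a| = \delta, so L_\delta is continuously differentiable,
% but not twice differentiable there (see the next card).
```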
Alternative to Huber loss
Log cosh loss
Like Huber: behaves quadratically near the true value and almost linearly far away
Unlike Huber: twice differentiable
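Its standard form, summed over the output components:

```latex
\[
  L(\hat{y}, y) = \sum_{i} \log\cosh\big( \hat{y}_i - y_i \big).
\]
% For small residuals \log\cosh(a) \approx a^2 / 2; for large ones \approx |a| - \log 2.
% Its derivative is \tanh(a), which is smooth, so the loss is twice differentiable
% everywhere, unlike Huber.
```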
Weighted sum of 1D losses
Categorical cross entropy
Used to train multi-class classifiers (as opposed to binary ones)
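In formulas (standard form; ŷ is the predicted class distribution, e.g. a softmax output, and y the one-hot encoded label over K classes):

```latex
\[
  L(\hat{y}, y) = - \sum_{k=1}^{K} y_k \log \hat{y}_k .
\]
% With a one-hot y this reduces to -\log \hat{y}_c, the negative log-probability
% assigned to the true class c.
```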
Kullback-Leibler divergence
Measure of the discrepancy between the empirical distribution of labels and the distribution predicted by the network
A measure of how one probability distribution differs from another
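The standard definition for discrete distributions P and Q:

```latex
\[
  D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)} .
\]
% D_{\mathrm{KL}} \ge 0 with equality iff P = Q, and it is not symmetric in P and Q.
% Minimising categorical cross-entropy against the empirical label distribution
% minimises this divergence, since the two differ only by the entropy of P.
```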