Loss Flashcards
Hölder continuity
Hölder continuity implies (uniform) continuity, where a smaller exponent α means the function can be rougher
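For reference, the usual definition (a standard statement; C is the Hölder constant and α the exponent):

```latex
% f is Hölder continuous with exponent \alpha \in (0, 1] if there is a C \ge 0 with
\[
  |f(x) - f(y)| \le C \, |x - y|^{\alpha} \quad \text{for all } x, y .
\]
% A smaller \alpha weakens the constraint near x = y, so rougher functions qualify;
% \alpha = 1 is exactly Lipschitz continuity (next card).
```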
Lipschitz continuity
Hölder continuity with α = 1
What does a neural network do fundamentally
It approximates a function
Objective of deep learning
Find a function that accomplishes the given task by turning inputs into outputs in a way that is, in some sense, optimal
General loss function
A function that quantifies the discrepancy between the network's output and the desired output; training seeks parameters that minimise it
Autoencoder
Used in unsupervised learning
Dimensionality reduction; has I = O and aims to reconstruct its input x, where the data must pass through a hidden layer with d units for d ≪ I
Relate autoencoders to PCA
Autoencoders are a nonlinear extension of PCA
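A minimal sketch of the above, assuming PyTorch; the widths I = 784 and d = 32 are illustrative, not from the source:

```python
import torch
import torch.nn as nn

I, d = 784, 32  # input/output width I and bottleneck width d, with d << I (illustrative)

autoencoder = nn.Sequential(
    nn.Linear(I, d), nn.ReLU(),  # encoder: compress x down to d units
    nn.Linear(d, I),             # decoder: reconstruct back up to I units
)

x = torch.randn(16, I)                             # a dummy mini-batch of inputs
loss = nn.functional.mse_loss(autoencoder(x), x)   # reconstruction error against x itself
loss.backward()                                    # gradients for an optimiser step
```

Dropping the ReLU makes the model purely linear; minimising squared reconstruction error then recovers the subspace spanned by the top d principal components, which is the PCA connection above.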
Risk
The expected loss of the network over the data-generating distribution
Mini-batch risk
As the architecture and activation functions have already been specified, the mini-batch risk is now fully determined by the parameter vector
Here #B is the size of B
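The formula that "#B" refers to seems to have been dropped; in standard notation (a reconstruction, with F_θ the network, L the loss, and B the mini-batch):

```latex
\[
  \mathcal{R}(\theta) = \mathbb{E}\big[ L(F_\theta(X), Y) \big],
  \qquad
  \mathcal{R}_B(\theta) = \frac{1}{\#B} \sum_{(x, y) \in B} L\big( F_\theta(x), y \big).
\]
% The mini-batch risk replaces the expectation over the data distribution
% with an average over the #B examples in B.
```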
Compare squared loss to absolute loss
Squared loss targets mean
Abs loss targets median
Squared loss significantly amplifies the loss of a prediction far from the actual value, causing outliers to disproportionately affect training
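A quick numerical check of the two claims above (a sketch; the data, including the outlier 100, is made up):

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0, 100.0])     # small sample with one outlier
grid = np.linspace(0.0, 100.0, 100_001)  # candidate constant predictions

sq = ((y[:, None] - grid[None, :]) ** 2).sum(axis=0)  # total squared loss per candidate
ab = np.abs(y[:, None] - grid[None, :]).sum(axis=0)   # total absolute loss per candidate

print(grid[sq.argmin()], y.mean())       # squared loss minimised at the mean, 26.5
print(grid[ab.argmin()], np.median(y))   # absolute loss minimised on [2, 3],
                                         # the interval containing the median, 2.5
```

The outlier drags the squared-loss minimiser all the way to 26.5, while the absolute-loss minimiser stays with the bulk of the data, which is exactly the point of the next card.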
How to manage limitation of squared loss
To minimise the effect of outliers while retaining the quadratic behaviour of the loss in a δ-neighbourhood of the true value (for small δ > 0), we use the Huber loss
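For reference, the usual form of the Huber loss on a residual a = ŷ − y:

```latex
\[
  L_\delta(a) =
  \begin{cases}
    \tfrac{1}{2} a^2 & \text{if } |a| \le \delta, \\[2pt]
    \delta \big( |a| - \tfrac{1}{2}\delta \big) & \text{otherwise},
  \end{cases}
\]
% quadratic near the true value and linear in the tails; the two branches agree
% in value and slope at |a| = \delta, so L_\delta is continuously differentiable,
% but not twice differentiable there (see the next card).
```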
Alternative to Huber loss
Log cosh loss
Like Huber: behaves quadratically near the true value and almost linearly far away
Unlike Huber: twice differentiable
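Its standard form, summed over the output components:

```latex
\[
  L(\hat{y}, y) = \sum_{i} \log\cosh\big( \hat{y}_i - y_i \big).
\]
% For small residuals \log\cosh(a) \approx a^2 / 2; for large ones \approx |a| - \log 2.
% Its derivative is \tanh(a), which is smooth, so the loss is twice differentiable
% everywhere, unlike Huber.
```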
Weighted sum of 1D losses
Categorical cross entropy
Used to train multi-class classifiers (as opposed to binary ones)
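In formulas (standard form; ŷ is the predicted class distribution, e.g. a softmax output, and y the one-hot encoded label over K classes):

```latex
\[
  L(\hat{y}, y) = - \sum_{k=1}^{K} y_k \log \hat{y}_k .
\]
% With a one-hot y this reduces to -\log \hat{y}_c, the negative log-probability
% assigned to the true class c.
```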
Kullback-Leibler divergence
Measure of the discrepancy between the empirical distribution of labels and the distribution predicted by the network
A measure of how one probability distribution differs from another
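The standard definition for discrete distributions P and Q:

```latex
\[
  D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)} .
\]
% D_{\mathrm{KL}} \ge 0 with equality iff P = Q, and it is not symmetric in P and Q.
% Minimising categorical cross-entropy against the empirical label distribution
% minimises this divergence, since the two differ only by the entropy of P.
```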