4 Backpropagation Flashcards
What is SGD?
Stochastic gradient descent: gradient descent where each update uses the gradient computed on a single randomly chosen training example (or a small sample) rather than the full training set
What is backpropagation?
A method for efficiently computing the gradients of the loss with respect to all parameters of a neural net, by applying the chain rule backwards through the computational graph
What is (mini) batch gradient descent?
A simple optimization strategy: each update uses the average gradient over a small batch of training examples, a compromise between full-batch gradient descent and SGD (see the sketch below)
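A minimal sketch of the mini-batch update loop (NumPy; the grad_loss function, learning rate lr, and batch_size are illustrative assumptions, not values from these cards):

import numpy as np

def minibatch_sgd(w, X, Y, grad_loss, lr=0.1, batch_size=32, epochs=10):
    """Mini-batch gradient descent: each step uses the average gradient
    over a small randomly sampled batch of training examples."""
    m = X.shape[0]
    for _ in range(epochs):
        idx = np.random.permutation(m)            # shuffle once per epoch
        for start in range(0, m, batch_size):
            batch = idx[start:start + batch_size]
            g = grad_loss(w, X[batch], Y[batch])  # average gradient on the batch
            w = w - lr * g                        # gradient step
    return w

With batch_size=1 this reduces to plain SGD; with batch_size=m it is full-batch gradient descent.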
What is the expected loss?
E_{(x,y)~phat_data}[L(yhat, y)]
The expectation means the average over the training examples.
How do we use maximum likelihood (ML) to optimize w?
If we know the distribution our model assigns, we maximize the log-likelihood:
w_ML = argmax_w L(w) = argmax_w sum_{i=1}^m log p(y_i | x_i)
What is p(y|x) in maximum likelihood?
For least squares (regression) it is Gaussian.
For classification it is multinomial.
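A quick check of why those choices match the losses later in these cards (a sketch, assuming a fixed variance sigma^2 in the Gaussian case): if p(y|x) = N(y; f(x), sigma^2 I), then log p(y|x) = -||y - f(x)||^2 / (2 sigma^2) + const, so maximizing sum_{i=1}^m log p(y_i|x_i) is the same as minimizing the squared error sum_{i=1}^m ||y_i - f(x_i)||^2. If p(y|x) is multinomial with predicted probabilities yhat = f(x) and y is one-hot, then log p(y|x) = sum_k y_k log yhat_k, so maximizing it is the same as minimizing the cross-entropy -sum_k y_k log yhat_k.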
What is cross-entropy?
The statistical divergence between the model's output distribution and the examples in the training set.
Maximizing the log-likelihood (LL) is equivalent to minimizing the cross-entropy (CE):
CE = - sum_{i=1}^m log p(y_i | x_i)
What is cross-entropy also called?
The loss or cost function.
How do we calculate loss in practice?
Take the average (1/m) over the presented examples.
We specify a function f that tries to predict p(y|x); the output is yhat = f(x).
MSE loss: L(yhat, y) = (1/m) sum_{i=1}^m ||y_i - yhat_i||^2
Log loss: L(yhat, y) = -(1/m) sum_{i=1}^m y_i log yhat_i
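A minimal NumPy sketch of both losses, averaged (1/m) over the presented examples; the eps clip is an added numerical-stability assumption:

import numpy as np

def mse_loss(y_hat, y):
    """Mean squared error: (1/m) * sum_i ||y_i - y_hat_i||^2."""
    return np.mean(np.sum((y - y_hat) ** 2, axis=1))

def log_loss(y_hat, y, eps=1e-12):
    """Cross-entropy / log loss: -(1/m) * sum_i y_i . log(y_hat_i)."""
    y_hat = np.clip(y_hat, eps, 1.0)   # avoid log(0)
    return -np.mean(np.sum(y * np.log(y_hat), axis=1))

Here y and y_hat are (m, d) arrays; for the log loss, each row of y is a one-hot target and each row of y_hat a predicted probability vector.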
How do we do backpropagation?
Backpropagate the gradients starting from the loss, applying the chain rule at each node of the computational graph.
Draw a computational graph for a 2-layer NN.
h = f1(a1) = f1(w1 x + b1)
yhat = f2(a2) = f2(w2 h + b2)
Draw your graph on paper.
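A sketch of this graph in code, with the backward pass propagating the gradients from the loss back through every node (W1, b1, W2, b2 are the layer parameters w1, b1, w2, b2 above; sigmoid activations for f1, f2 and a squared-error loss are assumptions, since the cards leave them unspecified):

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Forward pass: h = f1(W1 x + b1), y_hat = f2(W2 h + b2)
def forward(x, W1, b1, W2, b2):
    a1 = W1 @ x + b1
    h = sigmoid(a1)
    a2 = W2 @ h + b2
    y_hat = sigmoid(a2)
    return a1, h, a2, y_hat

# Backward pass: start from dL/dy_hat and apply the chain rule node by node
def backward(x, y, a1, h, a2, y_hat, W2):
    dL_dyhat = 2 * (y_hat - y)               # loss L = ||y_hat - y||^2
    dL_da2 = dL_dyhat * y_hat * (1 - y_hat)  # through f2 (sigmoid)
    dL_dW2 = np.outer(dL_da2, h)             # a2 = W2 h + b2
    dL_db2 = dL_da2
    dL_dh = W2.T @ dL_da2
    dL_da1 = dL_dh * h * (1 - h)             # through f1 (sigmoid)
    dL_dW1 = np.outer(dL_da1, x)             # a1 = W1 x + b1
    dL_db1 = dL_da1
    return dL_dW1, dL_db1, dL_dW2, dL_db2

Example usage with small random parameters:

x, y = np.random.randn(3), np.array([1.0])
W1, b1 = np.random.randn(4, 3), np.zeros(4)
W2, b2 = np.random.randn(1, 4), np.zeros(1)
grads = backward(x, y, *forward(x, W1, b1, W2, b2), W2)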
Gradient descent
Batch gradient descent