4 Backpropagation Flashcards

1
Q

What is SGD?

A

Stochastic gradient descent: gradient descent where each weight update uses the gradient computed on a single randomly chosen training example (or a small mini-batch) rather than the full training set.
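A minimal sketch of an SGD loop, assuming a generic grad_loss(w, x, y) placeholder that returns the per-example gradient (not part of the original card):

```python
import numpy as np

def sgd(w, examples, grad_loss, lr=0.01, epochs=10):
    """Stochastic gradient descent: one update per training example.

    grad_loss(w, x, y) is an assumed placeholder returning dL/dw
    for a single example (x, y).
    """
    for _ in range(epochs):
        np.random.shuffle(examples)            # visit examples in random order
        for x, y in examples:
            w = w - lr * grad_loss(w, x, y)    # step against the gradient
    return w
```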

2
Q

What is backpropagation?

A

A method for computing the gradient of the loss with respect to every parameter of a neural net by applying the chain rule backwards through the computational graph

3
Q

What is (mini) batch gradient descent?

A

An optimization strategy that estimates the gradient on a small random subset (mini-batch) of training examples at each update, trading the stability of full-batch gradient descent for the speed of SGD.
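A minimal sketch of one epoch of mini-batch gradient descent; grad_loss(w, X_batch, y_batch) is an assumed placeholder returning the gradient averaged over a batch:

```python
import numpy as np

def minibatch_gd_epoch(w, X, y, grad_loss, lr=0.01, batch_size=32):
    """One epoch of mini-batch gradient descent over the dataset (X, y)."""
    m = X.shape[0]
    order = np.random.permutation(m)               # shuffle example indices
    for start in range(0, m, batch_size):
        batch = order[start:start + batch_size]
        w = w - lr * grad_loss(w, X[batch], y[batch])   # step on the batch gradient
    return w
```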

4
Q

What is the expected loss?

A

$\mathbb{E}_{(x,y) \sim \hat{p}_{\text{data}}} \, L(\hat{y}, y)$

The expectation means an average over the training examples.
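A minimal sketch of estimating the expected loss as this empirical average; predict and loss_fn are assumed placeholders for the model and the per-example loss:

```python
import numpy as np

def expected_loss(predict, loss_fn, X, y):
    """Empirical estimate of E_{(x,y)~p_data}[L(yhat, y)]:
    the average per-example loss over the m training examples."""
    losses = [loss_fn(predict(x_i), y_i) for x_i, y_i in zip(X, y)]
    return np.mean(losses)
```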

5
Q

How do we use maximum likelihood (ML) to optimize for w?

A

If we know the distribution of our model, maximize the log-likelihood:

$w_{\text{ML}} = \arg\max_w L(w) = \arg\max_w \sum_{i=1}^{m} \log p(y_i \mid x_i)$
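A minimal sketch of this objective for a classifier; probs is an assumed array holding p(y_i | x_i), the model's probability for the true label of each training example:

```python
import numpy as np

def log_likelihood(probs):
    """Sum of log p(y_i | x_i) over the m training examples.

    w_ML is the parameter setting that makes this sum as large as possible.
    """
    return np.sum(np.log(probs))
```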

6
Q

What is p(y|x) in maximum likelihood?

A

For least squares (regression) it is Gaussian.
For classification it is multinomial (categorical).

7
Q

What is cross-entropy?

A

A statistical divergence between the outputs of the model and the distribution of examples in the training set.

Maximizing the log-likelihood is equivalent to minimizing the cross-entropy:

$-\sum_{i=1}^{m} \log p(y_i \mid x_i)$
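A minimal sketch of the cross-entropy for a classifier, assuming one-hot targets y_true and predicted probability rows y_pred:

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Average cross-entropy between one-hot targets and predicted
    probabilities; minimizing it is equivalent to maximizing the log-likelihood."""
    y_pred = np.clip(y_pred, eps, 1.0)                       # avoid log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))
```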
8
Q

What is cross-entropy also called?

A

The loss or the cost function

9
Q

How do we calculate loss in practice?

A

Take the average (1/m) over the presented examples.

We specify an f that tries to predict p(y|x), with output $\hat{y} = f(x)$.
MSE loss: $L(\hat{y}, y) = \frac{1}{m}\sum_{i=1}^{m} \|y_i - \hat{y}_i\|^2$
Log loss: $L(y, \hat{y}) = -\frac{1}{m}\sum_{i=1}^{m} y_i \log \hat{y}_i$
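A minimal sketch of the MSE loss as an average over m examples (the log loss is the cross-entropy sketched on the previous card); y_true and y_pred are assumed NumPy arrays of targets and predictions:

```python
import numpy as np

def mse_loss(y_true, y_pred):
    """Mean squared error: average of ||y_i - yhat_i||^2 over the m examples."""
    return np.mean(np.sum((y_true - y_pred) ** 2, axis=-1))
```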

10
Q

How do we do backpropagation?

A

Propagate the gradients backwards starting from the loss, applying the chain rule at each node of the computational graph.

11
Q

Draw a computational graph for a 2-layer NN

A

$h = f_1(a_1) = f_1(W_1 x + b_1)$
$\hat{y} = f_2(a_2) = f_2(W_2 h + b_2)$

Draw your graph on paper.
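A minimal sketch of the forward pass through this 2-layer graph followed by backpropagation from the loss (card 10); a sigmoid hidden activation, identity output and squared-error loss are assumptions, not specified by the card:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward_backward(x, y, W1, b1, W2, b2):
    """Forward through h = f1(W1 x + b1), yhat = f2(W2 h + b2),
    then backpropagate gradients of the loss from yhat to the weights."""
    # forward pass
    a1 = W1 @ x + b1
    h = sigmoid(a1)                        # f1 = sigmoid (assumption)
    a2 = W2 @ h + b2
    yhat = a2                              # f2 = identity (assumption)
    loss = 0.5 * np.sum((yhat - y) ** 2)   # squared-error loss (assumption)

    # backward pass: chain rule, starting from the loss
    d_a2 = yhat - y                        # dL/da2
    dW2 = np.outer(d_a2, h)                # dL/dW2
    db2 = d_a2                             # dL/db2
    d_h = W2.T @ d_a2                      # dL/dh
    d_a1 = d_h * h * (1 - h)               # dL/da1 (sigmoid derivative)
    dW1 = np.outer(d_a1, x)                # dL/dW1
    db1 = d_a1                             # dL/db1
    return loss, (dW1, db1, dW2, db2)
```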

12
Q

Gradient descent

A

Iteratively update the weights in the direction of the negative gradient of the loss: $w \leftarrow w - \eta \, \nabla_w L(w)$
13
Q

Batch gradient descent

A

Gradient descent where each update uses the gradient computed over the entire training set (contrast with SGD and mini-batch gradient descent).