3 - Backpropagation in computation graphs Flashcards

1
Q

Computational graph

A

eg if f(x,y,z) = (x+y)*z

x = -2 ──┐
         (+)── q = 3 ──┐
y = 5  ──┘             (*)── f = -12
z = -4 ────────────────┘

The forward pass runs left to right; backpropagation goes right to left through these nodes.
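
A minimal Python sketch (my own worked version of this example, not part of the original card) of the forward pass and the right-to-left backward pass through this graph:

x, y, z = -2.0, 5.0, -4.0

# Forward pass (left to right)
q = x + y            # q = 3
f = q * z            # f = -12

# Backward pass (right to left), applying the chain rule at each node
df_df = 1.0          # gradient of the output with respect to itself
df_dq = z * df_df    # MUL gate: pass along the other input's value -> -4
df_dz = q * df_df    # MUL gate: -> 3
df_dx = 1.0 * df_dq  # ADD gate distributes the gradient unchanged -> -4
df_dy = 1.0 * df_dq  # -> -4

print(q, f, df_dx, df_dy, df_dz)  # 3.0 -12.0 -4.0 -4.0 3.0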

2
Q

partial derivative

A

How much the output changes when one input changes, while all the other inputs are held fixed

3
Q

Chain rule

A

If F(x) = f(g(x)) then F’(x) = f’(g(x)) · g’(x)

(’ means derivative)
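
A quick Python check (my own example, assuming F(x) = sin(x**2), so f = sin is the outer function and g(x) = x**2 the inner one) that the chain-rule formula matches a finite-difference estimate:

import math

def F(x):
    return math.sin(x ** 2)

x = 1.3
analytic = math.cos(x ** 2) * 2 * x      # f'(g(x)) * g'(x)

# Finite-difference check of the analytic derivative
h = 1e-6
numeric = (F(x + h) - F(x - h)) / (2 * h)
print(analytic, numeric)                 # both approximately -0.31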

4
Q

Derivatives: if q = x + y then…

A

derivative of q w/r to x = 1
derivative of q w/r to y = 1

5
Q

Derivatives: if f = qz then

A

derivative of f w/r q = z
derivative of f w/r z = q

6
Q

General concept for why chain rule is useful in computational graphs

A

To determine the “effect” of one input on the output, follow the chain of local derivatives from the output back to that input

Ie, in the example, to find the deriv of f w/r x, we multiply the deriv of f w/r q by the deriv of q w/r x: z × 1 = -4 × 1 = -4

7
Q

Is a computational graph the same as a neural network?

A

NO!
A computational graph is much bigger: it breaks the network down into every individual operation (adds, multiplies, activations, etc.), and it is those operations that backpropagation flows through

8
Q

Sigmoid derivative

A

sig(x)(1-sig(x))

That’s
σ(x)(1-σ(x))
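
A quick Python check (my own example) that the analytic derivative σ(x)(1-σ(x)) matches a finite-difference estimate; note it reuses the forward-pass value, which is why sigmoid is cheap to backprop through:

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

x = 0.5
s = sigmoid(x)
grad = s * (1 - s)                # sigma(x) * (1 - sigma(x))

# Finite-difference check
h = 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
print(grad, numeric)              # both approximately 0.235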

9
Q

Patterns in backward flow: ADD gate

A

Gradient distributor: the incoming gradient is passed unchanged to both inputs (each local derivative is 1)

10
Q

Patterns in backward flow: MAX gate

A

Gradient router: the incoming gradient is routed entirely to the input that had the larger value; the other input gets a gradient of 0

11
Q

Patterns in backward flow: MUL gate

A

Gradient switcher: each input receives the upstream gradient multiplied by the value of the other input

EG
x*y where x = 3 and y = -4 and the upstream gradient is 2

would mean that x gets -8 (= -4 × 2) and y gets 6 (= 3 × 2); see the sketch of all three gate patterns below
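
A small Python sketch (my own illustration, not from the deck) of the three backward-flow patterns, given an upstream gradient g flowing into a two-input gate with inputs a and b; the MUL call uses this card's numbers:

def add_backward(a, b, g):
    # ADD gate: gradient distributor -- both inputs receive g unchanged
    return g, g

def max_backward(a, b, g):
    # MAX gate: gradient router -- the larger input gets g, the other gets 0
    return (g, 0.0) if a >= b else (0.0, g)

def mul_backward(a, b, g):
    # MUL gate: gradient switcher -- each input gets g scaled by the OTHER input
    return g * b, g * a

print(add_backward(3.0, -4.0, 2.0))  # (2.0, 2.0)
print(max_backward(3.0, -4.0, 2.0))  # (2.0, 0.0)
print(mul_backward(3.0, -4.0, 2.0))  # (-8.0, 6.0) -- matches the card's example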

12
Q

Jacobian Matrix

A

The matrix of partial derivatives of each element of z (the output) w/r to each element of x (the input): entry (i, j) is the derivative of z_i w/r to x_j
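
A small NumPy sketch (my own example, not from the deck) of a Jacobian for an elementwise op, z = ReLU(x); because each output depends only on the matching input, the Jacobian comes out diagonal:

import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

x = np.array([1.0, -2.0, 3.0])
n = x.size
J = np.zeros((n, n))
h = 1e-6
for j in range(n):
    # Perturb one input element and measure how every output element moves
    xp = x.copy(); xp[j] += h
    xm = x.copy(); xm[j] -= h
    J[:, j] = (relu(xp) - relu(xm)) / (2 * h)

print(J)
# approximately:
# [[1. 0. 0.]
#  [0. 0. 0.]
#  [0. 0. 1.]]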

13
Q

Steps of training a simple net

A

Forward pass
Compute loss
Propagate loss backwards
Step the optimiser (ie update parameters)
Reset all the gradients to 0

(See the code sketch of one such loop below)
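
A minimal sketch of one training loop implementing these five steps, assuming a PyTorch-style setup; the model, data, and hyperparameters here are placeholders of my own, not from the deck:

import torch
import torch.nn as nn

model = nn.Linear(10, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

inputs = torch.randn(32, 10)   # placeholder data
targets = torch.randn(32, 1)

for epoch in range(100):
    outputs = model(inputs)            # 1. forward pass
    loss = loss_fn(outputs, targets)   # 2. compute loss
    loss.backward()                    # 3. propagate loss backwards
    optimizer.step()                   # 4. step the optimiser (update parameters)
    optimizer.zero_grad()              # 5. reset all the gradients to 0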
