3 - Backpropagation in computation graphs Flashcards
Computational graph
eg if f(x,y,z) = (x+y)*z
x = -2
+ (q = 3)
y = 5
* (f = -12)
z = -4
(Imagine lines matching the correct operation)
Backpropagation goes right to left in these
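A minimal sketch of the forward pass through this exact graph (plain Python, values copied from the card above); backprop then walks the same nodes right to left:

```python
# Forward pass through f(x, y, z) = (x + y) * z, using the example values.
x, y, z = -2.0, 5.0, -4.0

q = x + y        # add node: q = 3
f = q * z        # multiply node: f = -12

print(q, f)      # 3.0 -12.0
```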
partial derivative
How much the output changes when one input changes, with the other inputs held fixed
Chain rule
If F(x) = f(g(x)), then F'(x) = f'(g(x))g'(x)
’ means derivative
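A quick numeric sanity check of the chain rule; the functions f(u) = u² and g(x) = 3x + 1 are just my own illustrative choices, not from the course:

```python
# F(x) = f(g(x)) with f(u) = u**2 and g(x) = 3*x + 1
# Chain rule: F'(x) = f'(g(x)) * g'(x) = 2*(3*x + 1) * 3
def F(x):
    return (3 * x + 1) ** 2

x = 0.7
analytic = 2 * (3 * x + 1) * 3                 # chain rule answer
h = 1e-6
numeric = (F(x + h) - F(x - h)) / (2 * h)      # central-difference check

print(analytic, numeric)   # the two values should agree closely
```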
Derivatives: if q = x +y then…
derivative of q w/r to x = 1
derivative of q w/r to y = 1
Derivatives: if f = qz then
derivative of f w/r q = z
derivative of f w/r z = q
General concept for why chain rule is useful in computational graphs
To determine the “effect” of one input on the output, follow the chain from the output to the input
Ie, in the example, to find deriv f w/r x, multiply deriv f w/r q by deriv q w/r x (sketched below)
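Continuing the (x + y) * z example, a sketch of that chain applied as a full backward pass:

```python
# Forward pass (same values as the graph card).
x, y, z = -2.0, 5.0, -4.0
q = x + y            # q = 3
f = q * z            # f = -12

# Backward pass, right to left.
df_df = 1.0                  # gradient of the output w.r.t. itself
df_dq = z * df_df            # deriv f w/r q = z  -> -4
df_dz = q * df_df            # deriv f w/r z = q  ->  3
df_dx = df_dq * 1.0          # chain rule: deriv f w/r q * deriv q w/r x -> -4
df_dy = df_dq * 1.0          # chain rule: deriv f w/r q * deriv q w/r y -> -4

print(df_dx, df_dy, df_dz)   # -4.0 -4.0 3.0
```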
Is a computational graph the same as a neural network?
NO!
A computational graph is much bigger: it breaks the network down into its individual operations, not just layers
Sigmoid derivative
(1 - sig(x))(sig(x))
That’s
(1 - σ(x))σ(x)
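A tiny check (plain Python, x = 0.5 is just an arbitrary test point) that the derivative really is σ(x)(1 - σ(x)):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

x = 0.5
analytic = sigmoid(x) * (1 - sigmoid(x))                   # σ(x)(1 - σ(x))
h = 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)      # central-difference check

print(analytic, numeric)   # both ≈ 0.2350
```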
Patterns in backward flow: ADD gate
gradient distributor
Patterns in backward flow: MAX gate
Gradient Router
Patterns in backward flow: MUL gate
gradient switcher
EG
x*y where x=3 and y=-4 and the gradient is 2
would mean that x has -8 (-4 × 2) and y has 6 (3 × 2)
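A sketch of what all three gate patterns do with an upstream gradient, using the numbers from this example (the variable names are just illustrative):

```python
upstream = 2.0
x, y = 3.0, -4.0

# ADD gate: distributor - passes the same gradient to both inputs.
add_dx, add_dy = upstream, upstream              # 2, 2

# MAX gate: router - all of the gradient goes to the larger input.
max_dx = upstream if x > y else 0.0              # 2 (x won)
max_dy = upstream if y > x else 0.0              # 0

# MUL gate: switcher - each input gets the *other* input times the gradient.
mul_dx = y * upstream                            # -4 * 2 = -8
mul_dy = x * upstream                            #  3 * 2 =  6

print(mul_dx, mul_dy)   # -8.0 6.0, matching the card above
```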
Jacobian Matrix
The matrix whose entries are the derivative of each element of z (the output) w/r to each element of x (the input)
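A small numpy sketch of a Jacobian; the function z = x² (elementwise) is my own toy choice, and for an elementwise op the Jacobian comes out diagonal:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])

def f(x):
    return x ** 2          # elementwise square, so z[i] depends only on x[i]

# Build the Jacobian numerically: J[i, j] = d z[i] / d x[j]
h = 1e-6
n = x.size
J = np.zeros((n, n))
for j in range(n):
    bump = np.zeros(n)
    bump[j] = h
    J[:, j] = (f(x + bump) - f(x - bump)) / (2 * h)

print(np.round(J, 3))
# [[2. 0. 0.]
#  [0. 4. 0.]
#  [0. 0. 6.]]   -> diagonal because the op is elementwise
```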
Steps of training a simple net
Forward pass
Compute loss
Propagate loss backwards
Step the optimiser (ie update parameters)
reset all the gradients to 0
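A minimal training-loop sketch, assuming PyTorch; the model, data, and learning rate are placeholder choices that just make the five steps concrete:

```python
import torch
import torch.nn as nn

# Placeholder model and data, only to make the five steps concrete.
model = nn.Linear(10, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
inputs, targets = torch.randn(32, 10), torch.randn(32, 1)

for epoch in range(100):
    outputs = model(inputs)              # 1. forward pass
    loss = criterion(outputs, targets)   # 2. compute loss
    loss.backward()                      # 3. propagate loss backwards
    optimizer.step()                     # 4. step the optimiser (update parameters)
    optimizer.zero_grad()                # 5. reset all the gradients to 0
```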