Back Propagation Flashcards
What are the steps for training a multilayer neural network using back propagation?
- Initialise every weight to a random value
- Forward pass (compute all the outputs of every neuron)
- Calculate error
- Back propagation
- Update weights
- GOTO Step 2 (a minimal sketch of this loop follows)
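A minimal sketch of the loop above in Python, assuming a made-up tiny network (2 inputs, 3 hidden neurons, 1 output), an arbitrary training example, and an arbitrary learning rate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: set random weights for every weight
V = rng.normal(scale=0.1, size=(2, 3))   # input -> hidden weights
W = rng.normal(scale=0.1, size=(3, 1))   # hidden -> output weights

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([0.5, -0.2])   # one training example (made up)
t = np.array([1.0])         # its target (made up)
eta = 0.5                   # learning rate (made up)

for epoch in range(1000):   # "GOTO Step 2" as a loop
    # Step 2: forward pass (compute every neuron's output)
    z = sigmoid(x @ V)      # hidden activations
    y = sigmoid(z @ W)      # output activation
    # Step 3: calculate error
    E = 0.5 * np.sum((y - t) ** 2)
    if epoch % 200 == 0:
        print(epoch, E)     # the error shrinks as training proceeds
    # Step 4: back propagation (the deltas derived in the cards below)
    delta_out = y * (1 - y) * (y - t)          # dE/da at the output
    delta_hid = z * (1 - z) * (W @ delta_out)  # dE/da at the hidden layer
    # Step 5: update weights
    W -= eta * np.outer(z, delta_out)
    V -= eta * np.outer(x, delta_hid)
```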
What is the process for the forward pass?
- Calculate input neurons
  a. Calculate the weighted sum of the inputs for each neuron
  b. Pass the sum into the function you choose (sigmoid or other) to get the result
- Calculate hidden neurons
  a. Calculate the weighted sum of the inputs for each neuron
  b. Pass the sum into the function you choose (sigmoid or other) to get the result
- Calculate output neurons
  a. Calculate the weighted sum of the inputs for each neuron
  b. Pass the sum into the function you choose (softmax, sigmoid, or other) to get the result (a per-layer sketch follows this list)
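A minimal sketch of steps a and b for a single layer, with made-up weights and inputs:

```python
import math

def sigmoid(x, beta=1.0):
    return 1.0 / (1.0 + math.exp(-beta * x))

inputs = [0.5, -0.2, 0.1]        # outputs of the previous layer (made up)
weights = [[0.4, 0.3, -0.6],     # one weight list per neuron (made up)
           [0.2, -0.5, 0.1]]

layer_out = []
for w in weights:
    a = sum(w_i * x_i for w_i, x_i in zip(w, inputs))  # step a: weighted sum
    layer_out.append(sigmoid(a))                       # step b: activation
print(layer_out)
```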
What is the mean squared error function?
E(X) = 0.5 * ∑_n (y_n - t_n)^2
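The same function in plain Python, assuming y and t are equal-length sequences of outputs and targets:

```python
def mse(y, t):
    # 0.5 * sum of squared differences between outputs and targets
    return 0.5 * sum((y_n - t_n) ** 2 for y_n, t_n in zip(y, t))

print(mse([0.9, 0.2], [1.0, 0.0]))  # 0.5 * (0.01 + 0.04) = 0.025
```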
What is the differential of the sigmoid equation?
σ = 1 / (1 + e^-βx) = (1 + e^-βx)^-1
y = u^-1, u = 1 + e^-βx
dy/du = -u^-2, du/dx = -βe^-βx
σ’ = (-u^-2)(-βe^-βx) = βe^-βx (1 + e^-βx)^-2
σ’ = βe^-βx / (1 + e^-βx)^2
1 - σ = 1 - 1 / (1 + e^-βx)
1 - σ = (1 + e^-βx - 1) / (1 + e^-βx)
1 - σ = e^-βx / (1 + e^-βx)
σ(1 - σ) = e^-βx / (1 + e^-βx)^2, so
σ’ = βσ(1 - σ) (with β = 1 this is σ(1 - σ), the form used in the cards below)
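A quick numerical sanity check of σ’ = βσ(1 - σ) against a central finite-difference estimate; x, β, and the step h are arbitrary:

```python
import math

def sigmoid(x, beta):
    return 1.0 / (1.0 + math.exp(-beta * x))

x, beta, h = 0.3, 2.0, 1e-6
closed_form = beta * sigmoid(x, beta) * (1 - sigmoid(x, beta))
numeric = (sigmoid(x + h, beta) - sigmoid(x - h, beta)) / (2 * h)
print(closed_form, numeric)  # the two estimates agree closely
```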
What is the differential of the mean squared error function with respect to the output neuron?
[Image 6]
E(X) = 0.5 * ∑_n (σ(a) - t_n)^2
y = 0.5 * u^2, u = σ(a) - t_n
dy/du = u, du/da = σ(a)(1 - σ(a))
dE/da = σ(a)(1 - σ(a))u
dE/da = σ(a)(1 - σ(a))(σ(a) - t_n)
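The same result evaluated in Python, with a made-up pre-activation a and target t_n (β = 1):

```python
import math

a, t_n = 0.8, 1.0                # made-up example values
y = 1.0 / (1.0 + math.exp(-a))   # sigma(a)
dE_da = y * (1 - y) * (y - t_n)  # sigma(a)(1 - sigma(a))(sigma(a) - t_n)
print(dE_da)
```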
What is the differential of the weight-input sum (a) with respect to an output neuron weight (w_n)?
[Image 6]
a = w_0z_0 + w_1z_1 + … + w_nz_n
da/dw_n = d/dw_n (w_n z_n) = z_n
What is the differential of the error with respect to a hidden neuron's input sum (b)?
Reminder:
> dE/da = σ(1 - σ(a))(σ(a) - t_n)
[Image 6]
a = w_0z_0 + w_1z_1 + … + w_nz_n, where z_1 = σ(b)
da/db = d/db (w_1σ(b)) (only the w_1z_1 term depends on b)
da/db = w_1σ(b)(1 - σ(b))
da/db = w_1z_1(1 - z_1)
dE/db= dE/da × da/db
dE/da = σ(a)(1 - σ(a))(σ(a) - t_n)
dE/db = σ(a)(1 - σ(a))(σ(a) - t_n)w_1z_1(1 - z_1)
What is the differential of the error with respect to a hidden neuron weight (v_n)?
Reminder:
> dE/db = σ(a)(1 - σ(a))(σ(a) - t_n)w_1z_1(1 - z_1)
[Image 6]
b = v_0x_0 + v_1x_1 + … + v_nx_n
db/dv_n = d/dv_n (v_nx_n) = x_n
dE/dv_n = dE/db × db/dv_n
dE/dv_n = σ(a)(1 - σ(a))(σ(a) - t_n)w_1z_1(1 - z_1)x_n
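The whole chain for this one path, with made-up values for x_n, v_n, w_1, and t_n; the other weights feeding b and a are omitted for brevity:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

x_n, v_n, w_1, t_n = 0.5, 0.4, 0.7, 1.0  # made-up example values

b = v_n * x_n       # hidden input sum (other v terms omitted)
z_1 = sigmoid(b)    # hidden activation
a = w_1 * z_1       # output input sum (other w terms omitted)
y = sigmoid(a)      # network output

dE_da = y * (1 - y) * (y - t_n)        # output delta
dE_db = dE_da * w_1 * z_1 * (1 - z_1)  # chain through w_1 and sigma(b)
dE_dv_n = dE_db * x_n                  # chain through v_n
print(dE_dv_n)
```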
For this example, what is the differential of the error with respect to a_k?
[Image 7]
E(X) = 0.5 * ∑_k (z_k - t_k)^2 = 0.5 * ∑_k (σ(a_k) - t_k)^2
y = 0.5 * u^2, u = σ(a_k) - t_k
dy/du = u, du/da_k = σ(a_k)(1 - σ(a_k))
dE/da_k = σ(a_k)(1 - σ(a_k))u
dE/da_k = σ(a_k)(1 - σ(a_k))(σ(a_k) - t_k)
What is the symbol for the differential of the error with respect to a_k?
δ_k
For this example, what is the differential of the error with respect to w_jk?
[Image 7]
dE/dw_jk = dE/da_k × da_k/dw_jk
a_k = w_0kz_0 + w_1kz_1 + … + w_jkz_j + …
da_k/dw_jk = z_j (only the w_jkz_j term depends on w_jk)
dE/da_k = δ_k
dE/dw_jk = δ_kz_j
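Every gradient dE/dw_jk at once as an outer product, with made-up vectors for z_j and δ_k:

```python
import numpy as np

z = np.array([0.2, 0.7, 0.5])    # hidden activations z_j (made up)
delta = np.array([0.05, -0.02])  # output deltas delta_k (made up)

dE_dW = np.outer(z, delta)       # dE_dW[j, k] = z_j * delta_k
print(dE_dW.shape)               # (3, 2): one entry per weight w_jk
```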
How do you apply gradient descent to the output layer?
[Image 7]
w_jk(t+1) = w_jk(t) - ηδ_kz_j, where η is the learning rate
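One gradient-descent step over the whole output weight matrix, with made-up values:

```python
import numpy as np

W = np.array([[0.1, -0.3],       # output weights w_jk (made up)
              [0.4, 0.2],
              [-0.5, 0.6]])
z = np.array([0.2, 0.7, 0.5])    # hidden activations z_j (made up)
delta = np.array([0.05, -0.02])  # output deltas delta_k (made up)
eta = 0.5                        # learning rate (made up)

W -= eta * np.outer(z, delta)    # w_jk <- w_jk - eta * delta_k * z_j
```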
How do weights propagate through the hidden layers?
The error signal does not flow back along a single path. Each hidden neuron connects to several neurons in the next layer, so its share of the error is the sum of the contributions from every path through those connections.
What is the equation for the differential of the error with respect to a_j?
[Image 7]
dE/da_j = ∑_k dE/da_k × da_k/da_j
dE/da_j = ∑_k δ_k × da_k/da_j
a_k = ∑_j w_jkσ(a_j)
da_k/da_j = w_jkσ(a_j)(1 - σ(a_j)) (only the j-th term of the sum depends on a_j)
da_k/da_j = w_jkz_j(1 - z_j)
dE/da_j = z_j(1 - z_j) ∑_k w_jkδ_k = δ_j
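δ_j for the whole hidden layer at once; the matrix product performs the sum over k, and all values are made up:

```python
import numpy as np

W = np.array([[0.1, -0.3],         # output weights w_jk (made up)
              [0.4, 0.2],
              [-0.5, 0.6]])
z = np.array([0.2, 0.7, 0.5])      # hidden activations z_j (made up)
delta_k = np.array([0.05, -0.02])  # output deltas (made up)

delta_j = z * (1 - z) * (W @ delta_k)  # z_j(1 - z_j) * sum_k w_jk delta_k
print(delta_j)
```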
What is the equation for the differential of the error with respect to the weights w_ij?
[Image 7]
dE/dw_ij = dE/da_j × da_j/dw_ij
da_j/dw_ij = d/dw_ij (∑_i (w_ij z_i))
da_j/dw_ij = z_i
dE/dw_ij = δ_jz_i
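The same outer-product pattern one layer down, with made-up inputs z_i and hidden deltas δ_j:

```python
import numpy as np

x = np.array([0.5, -0.2])                  # inputs z_i (made up)
delta_j = np.array([0.012, -0.004, 0.01])  # hidden deltas delta_j (made up)

dE_dW_in = np.outer(x, delta_j)            # dE_dW_in[i, j] = z_i * delta_j
print(dE_dW_in)
```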