Back Propagation Flashcards

1
Q

What are the steps for training a multilayer neural network using back propagation?

A
  1. Initialise each weight to a small random value
  2. Forward pass (compute all the outputs of every neuron)
  3. Calculate error
  4. Back propagation
  5. Update weights
  6. GOTO Step 2
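
A minimal runnable sketch of this loop (Python with NumPy; a single hidden layer, sigmoid activations, and mean squared error are assumed, and every name here is illustrative rather than taken from the cards):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Step 1: small random weights (see card 18)
rng = np.random.default_rng(0)
V = rng.normal(0.0, 0.1, (3, 2))   # input -> hidden weights
W = rng.normal(0.0, 0.1, (2, 1))   # hidden -> output weights

x = np.array([0.5, -0.2, 0.1])     # one training input
t = np.array([1.0])                # its target
eta = 0.5                          # learning rate

for epoch in range(1000):
    z = sigmoid(x @ V)                      # Step 2: forward pass, hidden layer
    y = sigmoid(z @ W)                      #         forward pass, output layer
    E = 0.5 * np.sum((y - t) ** 2)          # Step 3: calculate error
    delta_k = y * (1 - y) * (y - t)         # Step 4: back propagation (cards 9-16)
    delta_j = z * (1 - z) * (W @ delta_k)
    W -= eta * np.outer(z, delta_k)         # Step 5: update weights
    V -= eta * np.outer(x, delta_j)         # Step 6: loop back to Step 2
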
2
Q

What is the process for the forward pass?

A
  1. Calculate the hidden neurons
    a. Calculate the weighted sum of the inputs for each neuron
    b. Pass the sum into the activation function you choose (sigmoid or other) to get the result
  2. Calculate the output neurons
    a. Calculate the weighted sum of the hidden outputs for each neuron
    b. Pass the sum into the activation function you choose (softmax, sigmoid, or other) to get the result (see the sketch below)
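
A per-neuron sketch of one layer of this pass (Python; the sigmoid activation and all names are illustrative assumptions):

import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def layer_forward(inputs, weights):
    # weights[n] holds the incoming weights of neuron n
    outputs = []
    for neuron_weights in weights:
        # a. weighted sum of the inputs for this neuron
        s = sum(w * x for w, x in zip(neuron_weights, inputs))
        # b. pass the sum through the chosen activation function
        outputs.append(sigmoid(s))
    return outputs

hidden = layer_forward([0.5, -0.2], [[0.1, 0.4], [-0.3, 0.2]])
output = layer_forward(hidden, [[0.6, -0.1]])
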
3
Q

What is the mean squared error function?

A

E(X) = 0.5 * ∑_n (y_n - t_n)^2

where y_n is an output of the network and t_n the corresponding target value.
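
In code (Python; the function name and example values are illustrative):

def mean_squared_error(y, t):
    # E(X) = 0.5 * sum_n (y_n - t_n)^2
    return 0.5 * sum((yn - tn) ** 2 for yn, tn in zip(y, t))

mean_squared_error([0.8, 0.2], [1.0, 0.0])  # 0.5 * (0.04 + 0.04) = 0.04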

4
Q

What is the differential of the sigmoid equation?

σ(x) = 1 / (1 + e^-βx)

A

σ(x) = 1 / (1 + e^-βx) = (1 + e^-βx)^-1

Let y = u^-1 with u = 1 + e^-βx:
dy/du = -u^-2
du/dx = -βe^-βx

σ’ = (-βe^-βx) × (-(1 + e^-βx)^-2)

σ’ = βe^-βx / (1 + e^-βx)^2

Also:
1 - σ = 1 - 1 / (1 + e^-βx)
1 - σ = (1 + e^-βx - 1) / (1 + e^-βx)
1 - σ = e^-βx / (1 + e^-βx)

so the derivative factors into σ times (1 - σ) times β:

σ’ = βσ(1 - σ)

(with β = 1 this reduces to the σ(1 - σ) form used in the later cards)
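
A quick numerical check of σ’ = βσ(1 - σ) against a centred finite difference (Python; the β, x, and step values are arbitrary choices):

import math

beta, x, h = 2.0, 0.3, 1e-6

def sigma(x):
    return 1.0 / (1.0 + math.exp(-beta * x))

analytic = beta * sigma(x) * (1 - sigma(x))
numeric = (sigma(x + h) - sigma(x - h)) / (2 * h)
print(analytic, numeric)  # the two values agree to about 1e-10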

5
Q

What is the differential of the mean squared error function with respect to the output neuron's weight-input sum (a)?
E(X) = 0.5 * ∑(σ(a) - t_n)^2
[Image 6]

A

E(X) = 0.5 * ∑(σ(a) - t_n)^2

Let y = 0.5 * u^2 with u = σ(a) - t_n:
dy/du = u
du/da = σ(a)(1 - σ(a))

dE/da = σ(a)(1 - σ(a))u

dE/da = σ(a)(1 - σ(a))(σ(a) - t_n)

(only the summand containing this neuron's output depends on a, so the ∑ drops out)
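
A numerical check of this gradient for one output neuron (Python; β = 1, and the a and t values are arbitrary choices):

import math

a, t, h = 0.7, 1.0, 1e-6

def sigma(x):
    return 1.0 / (1.0 + math.exp(-x))

def E(a):
    return 0.5 * (sigma(a) - t) ** 2

analytic = sigma(a) * (1 - sigma(a)) * (sigma(a) - t)
numeric = (E(a + h) - E(a - h)) / (2 * h)
print(analytic, numeric)  # the two values agree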

6
Q

What is the differential of the weight-input sum (a) with respect to an output neuron weight (w_n)?
[Image 6]

A

a = w_0z_0 + w_1z_1 + … + w_nz_n

da/dw_n = d/dw_n (w_nz_n) = z_n

(every other term is constant with respect to w_n)

7
Q

What is the differential of the error with respect to a hidden neuron's weight-input sum (b)?
Reminder:
> dE/da = σ(a)(1 - σ(a))(σ(a) - t_n)

[Image 6]

A

a = w_0z_0 + w_1z_1 + … + w_nz_n, where z_1 = σ(b)

da/db = d/db (w_1σ(b))

da/db = w_1σ(b)(1 - σ(b))

da/db = w_1z_1(1 - z_1)

dE/db = dE/da × da/db

dE/da = σ(a)(1 - σ(a))(σ(a) - t_n)

dE/db = σ(a)(1 - σ(a))(σ(a) - t_n)w_1z_1(1 - z_1)

8
Q

What is the differential of the error with respect to a hidden neuron weight (v_n)?
Reminder:
> dE/db = σ(a)(1 - σ(a))(σ(a) - t_n)w_1z_1(1 - z_1)

[Image 6]

A

b = v_0x_0 + v_1x_1 + … + v_nx_n

db/dv_n = d/dv_n (v_nx_n) = x_n

dE/dv_n = dE/db × db/dv_n

dE/dv_n = σ(a)(1 - σ(a))(σ(a) - t_n)w_1z_1(1 - z_1)x_n
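
The whole chain from cards 5-8 can be verified numerically on a tiny one-hidden-neuron network: perturb v_n, recompute the error, and compare against the formula (Python; all values and names are illustrative assumptions):

import math

def sigma(x):
    return 1.0 / (1.0 + math.exp(-x))

x_n, v_n, w_1, t, h = 0.4, 0.3, 0.8, 1.0, 1e-6

def error(v):
    b = v * x_n                      # hidden neuron's weight-input sum
    a = w_1 * sigma(b)               # output neuron's weight-input sum
    return 0.5 * (sigma(a) - t) ** 2

z_1 = sigma(v_n * x_n)
y = sigma(w_1 * z_1)
analytic = y * (1 - y) * (y - t) * w_1 * z_1 * (1 - z_1) * x_n
numeric = (error(v_n + h) - error(v_n - h)) / (2 * h)
print(analytic, numeric)  # the two values agree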

9
Q

For this example, what is the differential of the error with respect to a_k?
[Image 7]

A
E(X) = 0.5 * ∑_k (z_k - t_k)^2
E(X) = 0.5 * ∑_k (σ(a_k) - t_k)^2

Let y = 0.5 * u^2 with u = σ(a_k) - t_k:
dy/du = u
du/da_k = σ(a_k)(1 - σ(a_k))

dE/da_k = σ(a_k)(1 - σ(a_k))u

dE/da_k = σ(a_k)(1 - σ(a_k))(σ(a_k) - t_k)
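
The same formula evaluated for every output neuron at once (Python with NumPy; the a_k and target values are illustrative):

import numpy as np

a_k = np.array([0.7, -0.4])             # output neurons' weight-input sums
t = np.array([1.0, 0.0])                # targets
z_k = 1.0 / (1.0 + np.exp(-a_k))        # sigmoid outputs
delta_k = z_k * (1 - z_k) * (z_k - t)   # dE/da_k for each k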

10
Q

What is the symbol for the differential of the error with respect to a_k?

A

δ_k

11
Q

For this example, what is the differential of the error with respect to w_jk?
[Image 7]

A

dE/dw_jk = dE/da_k × da_k/dw_jk

a_k = w_0kz_0 + w_1kz_1 + … + w_jkz_j + …
da_k/dw_jk = z_j

dE/da_k = δ_k

dE/dw_jk = δ_kz_j
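
In vectorised form this gradient is an outer product over all j and k (Python with NumPy; the z and delta_k values are illustrative assumptions):

import numpy as np

z = np.array([0.2, 0.9])                # hidden outputs z_j
delta_k = np.array([0.05, -0.1])        # output deltas
grad_W = np.outer(z, delta_k)           # grad_W[j, k] = delta_k * z_j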

12
Q

How do you apply gradient descent to the output layer?

[Image 7]

A

w_jk ← w_jk - ηδ_kz_j

where η is the learning rate.
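
As code, updating every output-layer weight at once (Python with NumPy; W, z, delta_k, and eta are illustrative assumptions):

import numpy as np

eta = 0.1                               # learning rate
W = np.array([[0.5, -0.2],
              [0.1,  0.7]])             # W[j, k] = w_jk
z = np.array([0.2, 0.9])                # hidden outputs z_j
delta_k = np.array([0.05, -0.1])        # output deltas
W = W - eta * np.outer(z, delta_k)      # w_jk <- w_jk - eta * delta_k * z_j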

13
Q

How do weights propagate through the hidden layers?

A

The error does not propagate back along a single path. Each hidden neuron connects to many neurons in the next layer, so its error signal is the combination of the signals flowing back along every one of those paths (hence the ∑_k in the next card).

14
Q

What is the equation for the error with respect to a_j?

[Image 7]

A

dE/da_j = ∑_k dE/da_k × da_k/da_j

dE/da_j = ∑_k δ_k × da_k/da_j

a_k = ∑_j w_jkσ(a_j)

da_k/da_j = w_jkσ(a_j)(1 - σ(a_j))   (only the j-th term of a_k's sum depends on a_j)

da_k/da_j = w_jkz_j(1 - z_j)

dE/da_j = z_j(1 - z_j) ∑_k w_jkδ_k = δ_j
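
In vectorised form each hidden delta mixes every output delta it feeds into (Python with NumPy; W, z, and delta_k are the same illustrative values as before):

import numpy as np

W = np.array([[0.5, -0.2],
              [0.1,  0.7]])                # W[j, k] = w_jk
z = np.array([0.2, 0.9])                   # hidden outputs z_j = σ(a_j)
delta_k = np.array([0.05, -0.1])           # output deltas
delta_j = z * (1 - z) * (W @ delta_k)      # dE/da_j = z_j(1-z_j) * Σ_k w_jk δ_k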

15
Q

What is the equation for the error with respect to weights w_ij?
[Image 7]

A

dE/dw_ij = dE/da_j × da_j/dw_ij

da_j/dw_ij = d/dw_ij (∑_i w_ijz_i)

da_j/dw_ij = z_i

dE/da_j = δ_j

dE/dw_ij = δ_jz_i

16
Q

How do you apply gradient descent to the hidden layer?

[Image 7]

A

w_ij ← w_ij - ηδ_jz_i

(the hidden-layer delta δ_j replaces δ_k; otherwise the update mirrors the output layer)

17
Q

Why might the output value change every time the AI is trained?

A

Because the weights start at random values and gradient descent can settle into different local minima of the error surface, each training run may converge to a different set of weights and so produce a different output.

18
Q

What should the weights of the neural network be set to initially?

A

> Set randomly

> Close to 0