Lecture 15 - Neural Networks Part 3 Flashcards
What is the generalised delta rule?
show the formula?
A rule that can be used to modify the weights to minimise |z - y| regardless of the form of the output function, provided that function is differentiable
Δwj = a(z - y)(dy/dwj), where a is the learning rate, z the target output and y the actual output
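A minimal sketch of this update for a single logistic unit (the names `a`, `w`, `x`, `z` are illustrative, not from any library; for a logistic unit dy/dwj = y(1 - y)xj):

```python
from math import exp

def delta_rule_step(w, x, z, a=0.5):
    """One generalised-delta-rule update: w_j += a * (z - y) * dy/dw_j.

    For a logistic output unit, dy/dw_j = y * (1 - y) * x_j,
    where a is the learning rate, z the target and y the output.
    """
    s = sum(wj * xj for wj, xj in zip(w, x))   # weighted input
    y = 1.0 / (1.0 + exp(-s))                  # logistic output
    return [wj + a * (z - y) * y * (1 - y) * xj
            for wj, xj in zip(w, x)]
```

Repeating this step drives |z - y| down, since each weight moves in the direction that reduces the error.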
What is the formula of the logistic function?
What are its properties?
y = 1 / (1 + e^(-w̄·x̄)), where w̄·x̄ is the weighted sum of the inputs
Curve has a sigmoid shape
Differentiable
Monotonically increasing (a larger input always gives a larger output)
Tends to 1 as input tends to +inf
Tends to 0 as input tends to -inf
Equal to 0.5 when input equal to 0
What is the derivative of the logistic function?
y(1-y)
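A quick numeric check of these properties, as a plain-Python sketch:

```python
from math import exp

def logistic(s):
    """y = 1 / (1 + e^(-s)), where s is the weighted input w.x."""
    return 1.0 / (1.0 + exp(-s))

def logistic_derivative(s):
    """dy/ds = y * (1 - y): the derivative reuses the output itself,
    which is what makes the logistic convenient for the delta rule."""
    y = logistic(s)
    return y * (1.0 - y)
```

`logistic(0.0)` gives 0.5, large positive inputs approach 1, large negative inputs approach 0, and the closed-form derivative agrees with a finite-difference estimate.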
Why is training hidden units difficult?
No idea what their expected outputs should be
What is the solution to the difficulty of training hidden units?
Devise a way of making plausible guesses at what the outputs should be
To use the generalised delta rule, the output function must be ____________
Differentiable
How are hidden units trained?
Feeding back the error from the output layer
-> the estimated error of a hidden unit is the weighted sum of the errors of the output units
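That weighted sum can be sketched in one line (variable names are illustrative):

```python
def hidden_unit_error(output_errors, weights_to_outputs):
    """Estimate a hidden unit's error as the weighted sum of the
    errors of the output units it feeds into.

    output_errors[k]      -- error (delta) of output unit k
    weights_to_outputs[k] -- weight from this hidden unit to output unit k
    """
    return sum(e * w for e, w in zip(output_errors, weights_to_outputs))
```

Output units with large errors, reached through large weights, contribute most to the hidden unit's estimated error.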
Passing back a weighted error to train a hidden unit is known as ?
back propagation
What is the issue with a large value of a (the learning rate) in back propagation?
The steps may overshoot and oscillate, which can make it impossible to find a minimum error value
What is the issue with a small value of a (the learning rate) in back propagation?
It prolongs the gradient descent and increases the chance of settling in a local (rather than global) minimum
What are two ways of determining when to stop training using back propagation?
Keep going until the error falls below a given threshold
Keep going until the average change in error is small
How can we reduce overfitting in back propagation networks?
Decrease the number of hidden units
Add weight decay (on each iteration, all weights decrease slightly)
Use a large number of high-quality training samples
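Weight decay can be sketched as multiplying every weight by a factor slightly below 1 each iteration (the factor 0.999 here is illustrative):

```python
def apply_weight_decay(weights, decay=0.999):
    """Shrink all weights slightly on each iteration; weights that
    training does not keep reinforcing drift toward zero, which
    limits how complex the learned function can become."""
    return [w * decay for w in weights]
```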
What is the problem with decreasing the number of hidden units and adding weight decay to combat overfitting in back propagation networks?
Both restrict the complexity of the function the network can represent
What is cross validation, as it relates to overfitting?
Split the training data into two sets: one to modify the weights and one to test them
Evaluate on the test set after each weight change
Use the weights that gave the best test result
For what purpose are back propagation networks particularly useful?
Approximation, regression, prediction, classification