Chapter 8 - Neural Networks Flashcards
what type of neural networks demonstrate above human level performance in chess and go?
convolutional neural networks
what is AlexNet?
a convolutional neural network that outperformed other models in the ImageNet challenge
what is a spiking neural network?
it aims to mimic a biological neuron more closely
give a neuron mathematically in sum notation y(x, w) = ?
f(sum_i w_i x_i + b)
give a neuron mathematically in matrix notation
f(W.X), with the bias absorbed into W as an extra weight on a constant input of 1
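a minimal numpy sketch of both forms; the weights, inputs and the placeholder activation f are made up for illustration:

import numpy as np

def f(z):
    # placeholder activation; swap in threshold/sigmoid/softmax as needed
    return z

def neuron_sum(x, w, b):
    # y = f(sum_i w_i x_i + b)
    return f(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)

def neuron_matrix(x, W):
    # y = f(W . X), bias absorbed as a weight on a constant input of 1
    return f(W @ x)

x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.2])
print(neuron_sum(x, w, b=0.3))                              # -0.45
print(neuron_matrix(np.append(x, 1.0), np.append(w, 0.3)))  # -0.45 again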
give three different activation functions?
threshold, sigmoid, softmax
what is the activation function used for logistic regression?
sigmoid
give the sigmoid activation function y(X,W) = ?
1 / (1 + e^-z), where z = W.X
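a quick numpy sketch of the sigmoid; the example weights and inputs are made up:

import numpy as np

def sigmoid(z):
    # squashes any real-valued z into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.2])
print(sigmoid(w @ x))   # y(X, W) = 1 / (1 + e^-z) with z = W.X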
when is the softmax activation function used?
when we have multiple, mutually exclusive classes
softmax is an extension of…?
the logistic function
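a small numpy sketch of softmax over a made-up vector of class scores (subtracting the max is only for numerical stability, not part of the definition):

import numpy as np

def softmax(z):
    # exponentiate and normalise so the outputs sum to 1 across the classes
    e = np.exp(z - np.max(z))
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))   # roughly [0.66, 0.24, 0.10]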
give the equation for gradient descent, w_new = ?
w_old - lambda (dL/dw)
give the equation for squared error loss, L = ?
0.5(y-t)^2
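a sketch of one gradient-descent step on the squared error loss for a single linear neuron; the learning rate lambda and the data are made up:

import numpy as np

def grad_step(w, x, t, lam=0.1):
    # L = 0.5 * (y - t)^2 with y = w . x, so dL/dw = (y - t) * x
    y = w @ x
    grad = (y - t) * x
    return w - lam * grad          # w_new = w_old - lambda * (dL/dw)

w = np.array([0.1, 0.4])
x = np.array([1.0, 2.0])
print(grad_step(w, x, t=1.0))      # [0.11, 0.42]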
what is backpropagation?
the application of the chain rule to compute the gradient of the loss with respect to each weight in the network
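a worked chain-rule example for a single sigmoid neuron with squared error loss, just to show what backpropagation computes; all the numbers are made up:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, w, t = 2.0, 0.5, 1.0
y = sigmoid(w * x)            # forward pass
# chain rule: dL/dw = dL/dy * dy/dz * dz/dw
dL_dy = y - t                 # from L = 0.5 * (y - t)^2
dy_dz = y * (1 - y)           # derivative of the sigmoid
dz_dw = x
print(dL_dy * dy_dz * dz_dw)  # the gradient fed into the weight update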
what are the two stopping criteria we use for neural networks?
maximum number of epochs
early stopping criteria
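a sketch of both criteria in one training loop; the validation-loss curve is a dummy stand-in for real training, and the patience value is made up:

def train(max_epochs=100, patience=5):
    best, since_best = float("inf"), 0
    for epoch in range(max_epochs):
        val_loss = 1.0 / (epoch + 1) + 0.01 * epoch   # dummy curve standing in for real validation loss
        if val_loss < best:
            best, since_best = val_loss, 0            # still improving
        else:
            since_best += 1
        if since_best >= patience:                    # early stopping criterion
            print("early stop at epoch", epoch)
            break
    return best                                       # otherwise we run until max_epochs

print(train())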
what does the learning rate determine?
how large an adjustment we make to each weight at each iteration
what neural network structure should be sufficient to approximate any function?
a multilayer perceptron with one hidden layer
what is the advantage of adding more layers to a model, rather than more neurons?
increases flexibility with fewer free parameters
what are the three approaches to establishing neural network architecture?
experimentation, heuristics, pre-trained models (transfer learning)
when we add too many layers/neurons to a model, we risk…?
overfitting
what is bias error?
error due to an erroneous assumption in the model
what is variance error?
error due to the algorithm fitting to noise in the training data
what kind of error decreases as we make a model more complex?
bias
describe the idea behind a dropout scheme
begin with an overly complex model
during training, the output of any individual neuron is ignored with probability p
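a numpy sketch of dropout on one layer's outputs; the drop probability p and the activations are made up (rescaling by 1/(1-p) is the common 'inverted dropout' detail, so the expected activation is unchanged):

import numpy as np

def dropout(activations, p=0.5, training=True):
    if not training:
        return activations                 # at test time every neuron is kept
    # each neuron's output is ignored (zeroed) with probability p during training
    mask = (np.random.rand(*activations.shape) >= p).astype(float)
    return activations * mask / (1.0 - p)

print(dropout(np.array([0.2, 1.5, -0.7, 0.9])))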
what is the traditional error curve of simpler models
test error decreases up until the model is sufficiently complex and then increases
what is double descent?
if we continue to increase the model's complexity (e.g. more hidden layers/neurons) past the usual overfitting point, the test error decreases again
what are the two problems that deep(er) neural networks face?
training time
vanishing gradient problem
what is the vanishing gradient problem
the weight update in the early layers can be extremely close to zero
what are the two ways to fix the vanishing gradient problem?
relu, rectified linear unit activation function
feed the output of a neuron directly into a later stage of the network (a skip connection)
describe the relu activation function
negative inputs are mapped to 0; positive inputs pass through unchanged, giving a gradient of 1
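a small numpy sketch of relu and its gradient, matching the card above:

import numpy as np

def relu(z):
    return np.maximum(0.0, z)          # negative inputs are set to 0

def relu_grad(z):
    return (z > 0).astype(float)       # gradient of 1 for positive inputs, 0 otherwise

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(z), relu_grad(z))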
what is the input to a cnn?
raw 2d image
how do we represent each layer of a cnn?
rectangle
what is a fully connected layer?
each neuron in the current layer is connected to all neurons in the next layer
what do the different layers of the cnn learn, how is it different to a standard mlp? (hint: features)
the early layers learn the features that are used for classification, rather than the features needing to be pre-selected by hand
give a kernel for horizontal lines
1 1 1
0 0 0
-1 -1 -1
give a kernel for vertical lines
1 0 -1
1 0 -1
1 0 -1
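a numpy sketch applying both kernels with a plain sliding-window filter (no padding, so the output shrinks); the test image is made up:

import numpy as np

def conv2d(img, kernel):
    kh, kw = kernel.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

horizontal = np.array([[1, 1, 1], [0, 0, 0], [-1, -1, -1]])
vertical = horizontal.T
img = np.zeros((6, 6))
img[:3, :] = 1.0                  # bright top half: a horizontal edge
print(conv2d(img, horizontal))    # strong response along the edge
print(conv2d(img, vertical))      # near-zero response everywhere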
what is padding?
ensures the output remains the same size as the input by extending the original image in a non-informative way (e.g. with zeros)
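a quick sketch of zero padding, one non-informative way to extend the image, so that a 3x3 kernel gives an output the same size as the input:

import numpy as np

img = np.ones((4, 4))
padded = np.pad(img, pad_width=1, mode="constant", constant_values=0)
print(padded.shape)   # (6, 6): a 3x3 kernel on this produces a 4x4 output again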
what is a pooling operation?
select the (max/min/average) value in a local area
what is the effect of pooling, why do we do it?
makes the model more robust to variations in position
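a numpy sketch of 2x2 max pooling over non-overlapping local areas; the input values are made up:

import numpy as np

def max_pool(img, size=2):
    h, w = img.shape[0] // size, img.shape[1] // size
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            # keep only the largest value in each local size-by-size area
            out[i, j] = img[i*size:(i+1)*size, j*size:(j+1)*size].max()
    return out

img = np.array([[1, 2, 0, 1],
                [3, 4, 1, 0],
                [0, 1, 5, 2],
                [2, 0, 3, 6]])
print(max_pool(img))   # [[4. 1.] [2. 6.]]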
what is the flatten layer?
converts the 2d map into a 1d array of features
what is a recurrent neural network
includes loops in the hidden-layer neurons, which allow the network to use historical information
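a numpy sketch of a single recurrent step, showing the loop: the previous hidden state feeds back in alongside the new input (the weights, sizes and tanh activation are made up for illustration):

import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    # h_prev carries the historical information into the current step
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

rng = np.random.default_rng(0)
W_x, W_h, b = rng.normal(size=(3, 2)), rng.normal(size=(3, 3)), np.zeros(3)
h = np.zeros(3)
for x_t in [np.array([1.0, 0.0]), np.array([0.0, 1.0])]:   # a length-2 input sequence
    h = rnn_step(x_t, h, W_x, W_h, b)
print(h)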
on what type of data are recurrent neural networks useful?
time series
sequence
what is a generative adversarial network?
a pair of networks in which a generator learns to produce data that the discriminator thinks is from the training set
what are the two parts of a generative adversarial network?
generator and a discriminator
relu vs sigmoid
relu is faster and solves the vanishing gradient problem
cons of neural networks (2)
does not take into account spatial or temporal information
black box
what is a capsule neural network
we not only have a representation of the image, but also its pose