Deep learning models and methods Flashcards
What is the basic building unit of neural networks?
A two-step computation:
Combine the inputs as a weighted sum
Compute the output by applying an activation function to the weighted sum
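A minimal NumPy sketch of one such unit (the weights, bias, and choice of sigmoid are arbitrary, just for illustration):

```python
import numpy as np

def unit(x, w, b):
    """One unit: weighted sum of the inputs, then an activation (sigmoid here)."""
    z = np.dot(w, x) + b             # step 1: combine inputs as a weighted sum
    return 1.0 / (1.0 + np.exp(-z))  # step 2: squash with an activation function

# Example: three inputs with hypothetical weights and bias
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.2])
print(unit(x, w, b=0.3))
```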
What are the common activation functions for neural nets?
Sigmoid (0 to 1)
Hyperbolic tangent (-1 to +1)
Rectified linear unit (ReLU): 0 if the input is negative, otherwise just the value
For the output layer the identity function can be used (f(x) = x)
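All four are easy to sketch in NumPy:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # output in (0, 1)

def tanh(z):
    return np.tanh(z)                 # output in (-1, +1)

def relu(z):
    return np.maximum(0.0, z)         # 0 for negative inputs, identity otherwise

def identity(z):
    return z                          # often used for the output layer

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), tanh(z), relu(z), identity(z))
```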
Why use activation functions?
They introduce non-linearity: by stacking non-linear transformations, the network can map data that is not linearly separable into a representation that is.
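A quick sketch of the underlying point: without a non-linear activation, stacked layers collapse into a single linear map, so depth buys no extra expressive power (the shapes here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))

# Two linear layers without activations ...
deep = W2 @ (W1 @ x)
# ... equal one linear layer with the product of the weight matrices
shallow = (W2 @ W1) @ x
print(np.allclose(deep, shallow))  # True
```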
What is the typical cost function for a neural net and why do we have it?
The cost function is typically the negative log probability of the correct answer.
We use this since it has a very big gradient when the target value is 1 and the output value is close to 0. This steep gradient when we are badly wrong lets us reach a good cost faster than with e.g. squared error, which plateaus in that regime.
For a linear output unit with Gaussian noise, maximizing the log-likelihood is equivalent to minimizing the mean squared error.
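A small sketch of how this cost behaves (the function name is just illustrative):

```python
import numpy as np

def nll(p_correct):
    """Negative log probability assigned to the correct answer."""
    return -np.log(p_correct)

# The worse the prediction, the steeper the cost rises
# (its gradient, -1/p, blows up as p approaches 0):
for p in [0.9, 0.5, 0.1, 0.01]:
    print(f"p(correct)={p:<5} cost={nll(p):.2f}")
```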
Explain the softmax algorithm
We start from logits: raw scores that can take any real value, possibly much larger than 1.
The softmax function then squashes these values so that they lie between 0 and 1 and sum to 1.
If we take an input of [1, 2, 3, 4, 1, 2, 3], the softmax of that is [0.024, 0.064, 0.175, 0.475, 0.024, 0.064, 0.175]. The output has most of its weight where the ‘4’ was in the original input. This is what the function is normally used for: to highlight the largest values and suppress values which are significantly below the maximum value.
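A minimal sketch reproducing that example (subtracting the max first is a standard trick for numerical stability and does not change the result):

```python
import numpy as np

def softmax(logits):
    """Squash arbitrary real-valued logits into probabilities that sum to 1."""
    z = logits - np.max(logits)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

print(np.round(softmax(np.array([1, 2, 3, 4, 1, 2, 3], dtype=float)), 3))
# [0.024 0.064 0.175 0.475 0.024 0.064 0.175]
```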
What function do we often use for output units?
Linear units (typical for regression)
or
Sigmoid (binary classification) and softmax units (multi-class classification)
What activation function is often used for hidden units?
ReLU / hyperbolic tangent / sigmoid
The hyperbolic tangent often works better than the sigmoid
What are convolutional networks?
They are special neural networks that use convolution (a learned filtering operation) in order to classify the input.
They only work for problems whose input can reasonably be laid out in a grid (e.g. images).
What is the common structure for a convolutional network?
Input layer
convolution layer
ReLU layer (or another kind of detection layer)
pooling layer
… additional layers
…
output layer
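A minimal sketch of this stack, here assuming PyTorch, with hypothetical sizes (28x28 grayscale input, 10 classes):

```python
import torch
import torch.nn as nn

# Hypothetical network mirroring the layer ordering above
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),   # convolution layer
    nn.ReLU(),                                   # detection layer
    nn.MaxPool2d(kernel_size=2, stride=2),       # pooling layer
    nn.Conv2d(8, 16, kernel_size=3, padding=1),  # ... additional layers
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(16 * 7 * 7, 10),                   # output layer
)

print(model(torch.zeros(1, 1, 28, 28)).shape)  # torch.Size([1, 10])
```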
What is convolution?
For convolution we use one or several kernels,
each with a fixed size (e.g. 3x3).
The kernels represent general structure of the image (remember the components of a cross).
We then slide a kernel over the picture: at each position we multiply each kernel element with the pixel underneath, sum the products, and divide by the number of kernel elements. This gives us one number for each pixel where we place the kernel.
Convolution is simply applying this filtering method at every pixel position, yielding a new grid of values (a feature map).
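A naive NumPy sketch of this (the division by the number of kernel elements follows the description above; standard convolution usually keeps just the sum):

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over every valid pixel position:
    multiply element-wise, sum, and divide by the kernel size."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel) / kernel.size
    return out

# Hypothetical 3x3 "cross" kernel applied to a random 5x5 image
image = np.random.rand(5, 5)
cross = np.array([[0, 1, 0],
                  [1, 1, 1],
                  [0, 1, 0]], dtype=float)
print(convolve2d(image, cross).shape)  # (3, 3) feature map
```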
What is max pooling?
Pick a window size (often 2 or 3)
Pick a stride (how far the window moves each step), usually 2
Walk the window across the filtered images
From each window, take the maximum value
This results in a smaller grid that is tolerant to small variations in the image (bad handwriting etc.)
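A minimal NumPy sketch:

```python
import numpy as np

def max_pool(feature_map, window=2, stride=2):
    """Walk a window across the filtered image and keep the max per window."""
    h, w = feature_map.shape
    out_h = (h - window) // stride + 1
    out_w = (w - window) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = feature_map[r:r + window, c:c + window].max()
    return out

fm = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool(fm))  # 2x2 grid of window maxima
```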
What is back propagation?
Let's imagine an error function:
error = right answer - actual answer
The error or cost function tells us how far we are from the correct values.
Instead of computing everything all over again, we go backwards through the network and fine-tune the weights. Thanks to the chain rule, we only have to follow a specific path back from the error to reach the weight we want to change, reusing values already computed on the forward pass. This greatly reduces the overall computing time.
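A minimal sketch of backpropagation for one hidden layer, using sigmoid units and a squared-error cost (the sizes and learning rate are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x, target = np.array([0.5, -0.2]), np.array([1.0])
W1, W2 = rng.normal(size=(3, 2)), rng.normal(size=(1, 3))

for step in range(100):
    # Forward pass
    h = sigmoid(W1 @ x)   # hidden activations
    y = sigmoid(W2 @ h)   # actual answer
    error = y - target

    # Backward pass: reuse h and y instead of recomputing anything
    delta2 = error * y * (1 - y)             # gradient at the output
    delta1 = (W2.T @ delta2) * h * (1 - h)   # gradient pushed back one layer

    W2 -= 0.5 * np.outer(delta2, h)          # fine-tune the weights
    W1 -= 0.5 * np.outer(delta1, x)

print(float(y))  # approaches the target 1.0
```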
What do we do if we have to classify temporal data or data in sequences?
Use recurrent neural networks!
What is the basic principle of recurrent NNs?
The output (hidden state) of the network at one time step is passed on to the next time step along with the new input.
This can be used to classify various sequence tasks.
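A minimal sketch of the recurrence in NumPy (weight shapes are hypothetical, biases omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
W_xh, W_hh = rng.normal(size=(4, 3)), rng.normal(size=(4, 4))

h = np.zeros(4)                      # initial hidden state
sequence = rng.normal(size=(5, 3))   # 5 time steps, 3 features each
for x_t in sequence:
    h = np.tanh(W_xh @ x_t + W_hh @ h)  # state is passed on to the next step

print(h)  # final state summarises the whole sequence
```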
What is the problem with long-term dependencies?
Over many steps, the influence of values from e.g. 100 steps ago has completely vanished.
Gradients propagated over many stages tend to either vanish or explode.
A possible solution is to use Long Short-Term Memory (LSTM) models:
here we keep some memory stored in a cell state, and the task becomes deciding whether to store a new value or keep using the old one.
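A minimal sketch of one LSTM step in NumPy (biases omitted, shapes hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, params):
    """One LSTM step: gates decide whether to keep the old cell state
    or store new information in it."""
    Wf, Wi, Wo, Wc = params
    v = np.concatenate([h, x])
    f = sigmoid(Wf @ v)               # forget gate: keep the old value?
    i = sigmoid(Wi @ v)               # input gate: store something new?
    o = sigmoid(Wo @ v)               # output gate: expose the cell state?
    c = f * c + i * np.tanh(Wc @ v)   # updated cell state (the memory)
    h = o * np.tanh(c)                # new hidden state
    return h, c

rng = np.random.default_rng(0)
params = [rng.normal(size=(4, 7)) for _ in range(4)]  # hidden size 4, input 3
h, c = np.zeros(4), np.zeros(4)
for x in rng.normal(size=(10, 3)):
    h, c = lstm_step(x, h, c, params)
print(c)  # the cell state carries information across many steps
```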