Deep learning models and methods Flashcards
What is the basic building unit of neural networks?
A two-step computation:
Combine the inputs as a weighted sum
Compute the output by applying an activation function to the weighted sum
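A minimal NumPy sketch of one such unit (the weights, bias, and choice of sigmoid are arbitrary, just for illustration):

```python
import numpy as np

def unit(x, w, b):
    """One unit: weighted sum of the inputs, then an activation (sigmoid here)."""
    z = np.dot(w, x) + b             # step 1: combine inputs as a weighted sum
    return 1.0 / (1.0 + np.exp(-z))  # step 2: squash with an activation function

# Example: three inputs with hypothetical weights and bias
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.2])
print(unit(x, w, b=0.3))
```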
What are the common activation functions for neural nets?
Sigmoid (0 to 1)
Hyperbolic tangent (-1 to +1)
Rectified linear unit (ReLU): 0 if the input is negative, otherwise just the value
For the output layer the identity function can be used (f(x) = x)
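All four are easy to sketch in NumPy:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # output in (0, 1)

def tanh(z):
    return np.tanh(z)                 # output in (-1, +1)

def relu(z):
    return np.maximum(0.0, z)         # 0 for negative inputs, identity otherwise

def identity(z):
    return z                          # often used for the output layer

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), tanh(z), relu(z), identity(z))
```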
Why use activation functions?
They introduce non-linearity: by stacking non-linear transformations, the network can map data that is not linearly separable into a representation that is.
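A quick sketch of the underlying point: without a non-linear activation, stacked layers collapse into a single linear map, so depth buys no extra expressive power (the shapes here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))

# Two linear layers without activations ...
deep = W2 @ (W1 @ x)
# ... equal one linear layer with the product of the weight matrices
shallow = (W2 @ W1) @ x
print(np.allclose(deep, shallow))  # True
```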
What is the typical cost function for a neural net and why do we have it?
The cost function is typically the negative log probability of the correct answer.
We use this since it has a very big gradient when the target value is 1 and the output value is close to 0. This steep gradient when we are badly wrong lets us reach a good cost faster than with e.g. squared error, which plateaus in that regime.
For a linear output unit with Gaussian noise, maximizing the log-likelihood is equivalent to minimizing the mean squared error.
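A small sketch of how this cost behaves (the function name is just illustrative):

```python
import numpy as np

def nll(p_correct):
    """Negative log probability assigned to the correct answer."""
    return -np.log(p_correct)

# The worse the prediction, the steeper the cost rises
# (its gradient, -1/p, blows up as p approaches 0):
for p in [0.9, 0.5, 0.1, 0.01]:
    print(f"p(correct)={p:<5} cost={nll(p):.2f}")
```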
Explain the softmax algorithm
We start from logits: raw scores that can take any real value, possibly much larger than 1.
The softmax function then squashes these values so that they lie between 0 and 1 and sum to 1.
If we take an input of [1, 2, 3, 4, 1, 2, 3], the softmax of that is [0.024, 0.064, 0.175, 0.475, 0.024, 0.064, 0.175]. The output has most of its weight where the ‘4’ was in the original input. This is what the function is normally used for: to highlight the largest values and suppress values which are significantly below the maximum value.
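A minimal sketch reproducing that example (subtracting the max first is a standard trick for numerical stability and does not change the result):

```python
import numpy as np

def softmax(logits):
    """Squash arbitrary real-valued logits into probabilities that sum to 1."""
    z = logits - np.max(logits)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

print(np.round(softmax(np.array([1, 2, 3, 4, 1, 2, 3], dtype=float)), 3))
# [0.024 0.064 0.175 0.475 0.024 0.064 0.175]
```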
What function do we often use for output units?
Linear units (typical for regression)
or
Sigmoid (binary classification) and softmax units (multi-class classification)
What activation function is often used for hidden units?
ReLU / hyperbolic tangent / sigmoid
The hyperbolic tangent often works better than the sigmoid
What are convolutional networks?
They are special neural networks that use convolution (a learned filtering operation) in order to classify the input.
They only work for problems whose input can reasonably be laid out in a grid (e.g. images).
What is the common structure for a convolutional network?
Input layer
convolution layer
ReLU layer (or another kind of detection layer)
pooling layer
… additional layers
…
output layer
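A minimal sketch of this stack, here assuming PyTorch, with hypothetical sizes (28x28 grayscale input, 10 classes):

```python
import torch
import torch.nn as nn

# Hypothetical network mirroring the layer ordering above
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),   # convolution layer
    nn.ReLU(),                                   # detection layer
    nn.MaxPool2d(kernel_size=2, stride=2),       # pooling layer
    nn.Conv2d(8, 16, kernel_size=3, padding=1),  # ... additional layers
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(16 * 7 * 7, 10),                   # output layer
)

print(model(torch.zeros(1, 1, 28, 28)).shape)  # torch.Size([1, 10])
```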
What is convolution?
For convolution we use one or several kernels,
each with a fixed size (e.g. 3x3).
The kernels represent general structure of the image (remember the components of a cross).
We then slide a kernel over the picture: at each position we multiply each kernel element with the pixel underneath, sum the products, and divide by the number of kernel elements. This gives us one number for each pixel where we place the kernel.
Convolution is simply applying this filtering method at every pixel position, yielding a new grid of values (a feature map).
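A naive NumPy sketch of this (the division by the number of kernel elements follows the description above; standard convolution usually keeps just the sum):

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over every valid pixel position:
    multiply element-wise, sum, and divide by the kernel size."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel) / kernel.size
    return out

# Hypothetical 3x3 "cross" kernel applied to a random 5x5 image
image = np.random.rand(5, 5)
cross = np.array([[0, 1, 0],
                  [1, 1, 1],
                  [0, 1, 0]], dtype=float)
print(convolve2d(image, cross).shape)  # (3, 3) feature map
```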
What is max pooling?
Pick a window size (often 2 or 3)
Pick a stride (how far the window moves each step), usually 2
Walk the window across the filtered images
From each window, take the maximum value
This results in a smaller grid that is tolerant to small variations in the image (bad handwriting etc.)
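A minimal NumPy sketch:

```python
import numpy as np

def max_pool(feature_map, window=2, stride=2):
    """Walk a window across the filtered image and keep the max per window."""
    h, w = feature_map.shape
    out_h = (h - window) // stride + 1
    out_w = (w - window) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = feature_map[r:r + window, c:c + window].max()
    return out

fm = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool(fm))  # 2x2 grid of window maxima
```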
What is back propagation?
Let's imagine an error function:
error = right answer - actual answer
The error or cost function tells us how far we are from the correct values.
Instead of computing everything all over again, we go backwards through the network and fine-tune the weights. Thanks to the chain rule, we only have to follow a specific path back from the error to reach the weight we want to change, reusing values already computed on the forward pass. This greatly reduces the overall computing time.
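A minimal sketch of backpropagation for one hidden layer, using sigmoid units and a squared-error cost (the sizes and learning rate are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x, target = np.array([0.5, -0.2]), np.array([1.0])
W1, W2 = rng.normal(size=(3, 2)), rng.normal(size=(1, 3))

for step in range(100):
    # Forward pass
    h = sigmoid(W1 @ x)   # hidden activations
    y = sigmoid(W2 @ h)   # actual answer
    error = y - target

    # Backward pass: reuse h and y instead of recomputing anything
    delta2 = error * y * (1 - y)             # gradient at the output
    delta1 = (W2.T @ delta2) * h * (1 - h)   # gradient pushed back one layer

    W2 -= 0.5 * np.outer(delta2, h)          # fine-tune the weights
    W1 -= 0.5 * np.outer(delta1, x)

print(float(y))  # approaches the target 1.0
```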
What do we do if we have to classify temporal data or data in sequences?
Use recurrent neural networks!
What is the basic principle of recurrent NNs?
The output (hidden state) of the network at one time step is passed on to the next time step along with the new input.
This can be used to classify various sequence tasks.
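A minimal sketch of the recurrence in NumPy (weight shapes are hypothetical, biases omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
W_xh, W_hh = rng.normal(size=(4, 3)), rng.normal(size=(4, 4))

h = np.zeros(4)                      # initial hidden state
sequence = rng.normal(size=(5, 3))   # 5 time steps, 3 features each
for x_t in sequence:
    h = np.tanh(W_xh @ x_t + W_hh @ h)  # state is passed on to the next step

print(h)  # final state summarises the whole sequence
```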
What is the problem with long-term dependencies?
Over many steps, the influence of values from e.g. 100 steps ago has completely vanished.
Gradients propagated over many stages tend to either vanish or explode.
A possible solution is to use Long Short-Term Memory (LSTM) models:
here we keep some memory stored in a cell state, and the task becomes deciding whether to store a new value or keep using the old one.
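A minimal sketch of one LSTM step in NumPy (biases omitted, shapes hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, params):
    """One LSTM step: gates decide whether to keep the old cell state
    or store new information in it."""
    Wf, Wi, Wo, Wc = params
    v = np.concatenate([h, x])
    f = sigmoid(Wf @ v)               # forget gate: keep the old value?
    i = sigmoid(Wi @ v)               # input gate: store something new?
    o = sigmoid(Wo @ v)               # output gate: expose the cell state?
    c = f * c + i * np.tanh(Wc @ v)   # updated cell state (the memory)
    h = o * np.tanh(c)                # new hidden state
    return h, c

rng = np.random.default_rng(0)
params = [rng.normal(size=(4, 7)) for _ in range(4)]  # hidden size 4, input 3
h, c = np.zeros(4), np.zeros(4)
for x in rng.normal(size=(10, 3)):
    h, c = lstm_step(x, h, c, params)
print(c)  # the cell state carries information across many steps
```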