03 - Multi Layer Perceptron Flashcards
What are capacity, optimization and generalization?
Capacity: the range or scope of the types of functions that the model can approximate
Optimization: minimization of training error
Generalization: the model’s ability to adapt properly to new, previously unseen data, drawn from the same distribution as the one used to create the model.
Describe a fully connected NN, incl parameter sizes.
Usual form for a neural net: h = g(Wx + b)
→ h: hidden layer response, W: weight matrix (one weight vector per neuron), x: input vector, b: bias vector
Like in regression, we add a bias to be able to offset the response.
- x → n×1 vector
- h → m×1 vector
- W → m×n matrix
- Wx = a → m×1 (pre-activation response)
- b → m×1
- a + b → m×1
- h = g(a + b)
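A minimal NumPy sketch of one fully connected layer with these shapes (the sizes n = 4, m = 3 are illustrative assumptions, and ReLU stands in for a generic activation g):

```python
import numpy as np

n, m = 4, 3                      # input size n, number of neurons m (example values)
rng = np.random.default_rng(0)

x = rng.normal(size=(n, 1))      # input vector, n x 1
W = rng.normal(size=(m, n))      # weight matrix, one weight vector per neuron, m x n
b = np.zeros((m, 1))             # bias vector, m x 1

a = W @ x + b                    # pre-activation response, m x 1
h = np.maximum(a, 0)             # activation g (here ReLU), hidden response, m x 1
print(h.shape)                   # (3, 1)
```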
What are hyperparameters?
In neural nets, we train the weights and biases. Everything else that is adjustable is a hyperparameter:
- Number of layers, propagation type (fully connected, convolutional), activation function, loss function & its parameters, training iterations and batch size
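As an illustration only, such hyperparameters could be collected in a plain configuration dictionary; the names and values below are assumptions, not prescribed by the flashcards:

```python
# Hypothetical hyperparameter configuration; weights and biases are NOT listed here,
# because those are learned during training.
hyperparams = {
    "num_layers": 3,                  # number of hidden layers
    "layer_type": "fully_connected",  # propagation type: fully connected vs convolutional
    "activation": "relu",             # activation function
    "loss": "cross_entropy",          # loss function (and its parameters, if any)
    "epochs": 20,                     # training iterations over the dataset
    "batch_size": 64,                 # mini-batch size
}
```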
Explain the input, hidden and output layers in a NN
Input layer
- Vectorized version of input data
- Sometimes it is preprocessed
- Weights connect to the hidden layer
- Weights & biases are floats, not integers, so the input needs to be converted to floats (see the sketch below)
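A small sketch of this preprocessing step, assuming an 8-bit grayscale image as input (the 28×28 shape and the scaling to [0, 1] are illustrative assumptions):

```python
import numpy as np

# Hypothetical 28x28 grayscale image with integer pixel values 0..255.
image = np.random.randint(0, 256, size=(28, 28), dtype=np.uint8)

# Vectorize: flatten to a column vector, convert to float, and scale to [0, 1].
x = image.reshape(-1, 1).astype(np.float32) / 255.0
print(x.shape, x.dtype)   # (784, 1) float32
```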
Hidden layer(s)
- There is no general answer to how many layers to use; it depends on the task and should be tuned
- In the example the number of perceptrons expands; normally a compressing structure is seen
- E.g. we compress pixel values → features → class probabilities
- Whether we see an expanding or a bottleneck topology is strongly application dependent
Output Layer
- Usually no hidden-layer activation function is used for the output layer; the output unit is instead chosen to match the task (e.g. a probability for every class)
- For regression
- Linear outputs with MSE (Mean Square Error)
- For classification
- Softmax units (logistic sigmoid for two classes)
- many other options for other applications
The output before softmax: o = Wx+b (logits)
Predicted label: ŷ = softmax(o)
Loss is found via negative log-likelihood or cross entropy: NLL/CE(o, y)
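A minimal sketch of the logits → softmax → cross-entropy chain (the 3-class logits and the true label are example values; shifting by the max logit is a standard numerical-stability trick, not something stated on the flashcard):

```python
import numpy as np

def softmax(o):
    # Subtract the max logit for numerical stability; does not change the result.
    e = np.exp(o - np.max(o))
    return e / np.sum(e)

o = np.array([2.0, 0.5, -1.0])   # logits o = Wx + b (example values)
y = 0                            # true class index (example)

y_hat = softmax(o)               # predicted class probabilities
loss = -np.log(y_hat[y])         # negative log-likelihood / cross entropy
print(y_hat, loss)
```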
Normalization vs Standardization
Standardization centers data around a mean of zero and a standard deviation of one
Normalization scales data to a set range, often [0, 1], by using the minimum and maximum values.
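A short NumPy sketch of both (the data values are just an example):

```python
import numpy as np

data = np.array([2.0, 4.0, 6.0, 8.0])   # example values (assumption)

standardized = (data - data.mean()) / data.std()               # mean 0, std 1
normalized = (data - data.min()) / (data.max() - data.min())   # scaled to [0, 1]

print(standardized)   # [-1.3416, -0.4472, 0.4472, 1.3416]
print(normalized)     # [0.0, 0.3333, 0.6667, 1.0]
```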
Typical activation functions
Sigmoid:
- sigma’(a)=sigma(a)(1-sigma(a))
Tanh:
- tanh’(x)=1-tanh(x)^2
ReLU (rectified linear unit):
- relu’(x)=step(x)
- most used
The derivative can be computed from the already-computed activation output, which makes sigmoid and tanh popular; ReLU is the simplest, since its gradient is just a step function.
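A small NumPy sketch of these activations and their derivatives (setting the ReLU gradient at 0 to 0 is a common convention, not specified on the flashcard):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sigmoid_grad(a):
    s = sigmoid(a)
    return s * (1.0 - s)            # reuses the already-computed output sigma(a)

def tanh_grad(a):
    return 1.0 - np.tanh(a) ** 2    # reuses tanh(a)

def relu(a):
    return np.maximum(a, 0.0)

def relu_grad(a):
    return (a > 0).astype(a.dtype)  # step function (gradient at 0 chosen as 0 here)

a = np.array([-2.0, 0.0, 2.0])
print(sigmoid_grad(a), tanh_grad(a), relu_grad(a))
```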
Perceptron → Multilayer Perceptron
One-way computational chain, big function block with many free parameters
- Input processing: h_1 = g_1(W_1x+b_1)
- Processing of first hidden representation: h_2 = g_2(W_2h_1+b_2)
- …keep on going for each layer
Earlier, width was used: many neurons per layer but little depth. This did not work well, so now we go deep and reduce the number of neurons per layer.
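A compact sketch of this layer-by-layer chain (the layer widths, tanh as g, and the small random initialization are illustrative assumptions):

```python
import numpy as np

def forward(x, params, g=np.tanh):
    """One-way chain: h_k = g(W_k h_{k-1} + b_k) for each layer."""
    h = x
    for W, b in params:
        h = g(W @ h + b)
    return h

rng = np.random.default_rng(0)
sizes = [8, 16, 8, 4]            # example layer widths: expand, then compress
params = [(rng.normal(size=(m, n)) * 0.1, np.zeros((m, 1)))
          for n, m in zip(sizes[:-1], sizes[1:])]

x = rng.normal(size=(sizes[0], 1))
h = forward(x, params)
print(h.shape)                   # (4, 1)
```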
MLP Max Likelihood Recap
- So we have a model p, an input x, and try to predict an output y
- We use maximum likelihood (ML) to estimate the model parameters with a training dataset
W_ML = arg max_W E_{(x,y)~*p_data} log p(y|x)
→ the expectation is the mean over the m training examples
*p_data is the empirical distribution, a limited sample since we do not have access to the full population
- The model should follow this empirical distribution, such that we are able to predict future test cases
- To get high classification accuracy, we need a differentiable loss to optimize, such as cross entropy or negative log-likelihood
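A short sketch of this empirical expectation in code: the mean negative log-likelihood over m training examples (the logits and labels below are hypothetical), where maximizing the mean log-likelihood is the same as minimizing this value:

```python
import numpy as np

def softmax(o):
    e = np.exp(o - o.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical model outputs (logits) and labels for m = 4 training examples, 3 classes.
logits = np.array([[ 2.0, 0.1, -1.0],
                   [ 0.3, 1.5,  0.2],
                   [-0.5, 0.0,  2.2],
                   [ 1.0, 1.0,  1.0]])
y = np.array([0, 1, 2, 0])

p = softmax(logits)                              # p(y|x) for every class
nll = -np.mean(np.log(p[np.arange(len(y)), y]))  # empirical expectation: mean over the m examples
print(nll)
```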