3 Multilayer Perceptrons Flashcards
What is the form of a fully connected NN layer?
h = g(Wx + b)
W: Matrix of weights, one vector per neuron
x: one input example (vector)
b: vector of biases, one scalar per neuron
h: hidden layer response
g: the activation function
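A minimal numpy sketch of this layer equation; the layer sizes (3 inputs, 4 neurons) and the choice of ReLU for g are illustrative assumptions:

```python
import numpy as np

def fully_connected(x, W, b, g):
    """One fully connected layer: h = g(Wx + b)."""
    return g(W @ x + b)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))  # weight matrix: one 3-dim weight vector per neuron
b = np.zeros(4)              # bias vector: one scalar per neuron
x = rng.normal(size=3)       # one input example (vector)
h = fully_connected(x, W, b, g=lambda z: np.maximum(0.0, z))  # g = ReLU (assumed)
print(h.shape)               # (4,): hidden layer response, one value per neuron
```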
What are the hyperparameters of multilayer perceptrons?
Number of layers
Propagation types: fully connected, convolutional
Activation functions
Loss functions and parameters
Training iterations and batch size
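As a quick illustration, these hyperparameters could be collected in a config like the hypothetical sketch below; all field names and values are made-up assumptions:

```python
# Hypothetical MLP hyperparameter configuration; names/values are illustrative.
mlp_config = {
    "num_layers": 3,                   # number of layers
    "propagation": "fully_connected",  # vs. "convolutional"
    "activation": "relu",              # activation function
    "loss": "cross_entropy",           # loss function (and its parameters)
    "num_epochs": 50,                  # training iterations
    "batch_size": 32,                  # batch size
}
```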
What is important about the input layer?
- A vectorized version of the input data
- Sometimes preprocessed
- Weights connect it to the hidden layers
What is important about the hidden layer?
Number of hidden layers
Number of neurons
Topology
Design is application-dependent
What is topology?
Refers to the way neurons are connected
Whether the hidden layers are expanding or form a bottleneck
Often the number of neurons is reduced in the layers after the input
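A tiny sketch of what such a bottleneck topology might look like as layer widths; the sizes are made-up assumptions:

```python
# Illustrative bottleneck topology: neuron counts shrink after the input layer.
layer_widths = [784, 256, 64, 10]  # input -> hidden -> hidden -> output
```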
What are the different types of output layer?
For regression:
- Linear output with MSE
For classification:
- Softmax units
- Logistic (sigmoid) units for two classes
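A small numpy sketch of both classification output units; the example logits are made up:

```python
import numpy as np

def softmax(z):
    """Softmax output unit for multi-class classification."""
    z = z - z.max()          # shift by the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def logistic(z):
    """Logistic (sigmoid) output unit for two-class problems."""
    return 1.0 / (1.0 + np.exp(-z))

print(softmax(np.array([2.0, 1.0, 0.1])))  # class probabilities summing to 1
print(logistic(0.5))                       # probability of the positive class
```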
What are the most used activation functions?
Sigmoid: sigmoid(x) = 1/(1 + exp(-x))
tanh: tanh(x)
ReLU: relu(x) = max(0, x)
ReLU typically learns much faster and better than the others
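The three activations as a minimal numpy sketch; the sample inputs are arbitrary:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # squashes to (0, 1)

def tanh(x):
    return np.tanh(x)                 # squashes to (-1, 1)

def relu(x):
    return np.maximum(0.0, x)         # max(0, x): zero for negative inputs

x = np.linspace(-3, 3, 7)
for g in (sigmoid, tanh, relu):
    print(g.__name__, np.round(g(x), 2))
```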
What is the validation set used for?
Fine-tuning hyperparameters (e.g., the number of layers, neurons, or epochs)
What is a common method to increase capacity?
More neurons
What is a multilayer perceptron (MLP)?
What does it look like? (hidden layers)
A feedforward network -> one-way computational chain
Input x, first hidden representation:
h1 = g1(W1 x + b1)
Then the next layer:
h2 = g2(W2 h1 + b2)
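A minimal sketch of this one-way chain in numpy; the layer sizes (5 -> 8 -> 3) and ReLU activations are illustrative assumptions:

```python
import numpy as np

def mlp_forward(x, params, activations):
    """Feedforward chain: h_k = g_k(W_k h_{k-1} + b_k), with h_0 = x."""
    h = x
    for (W, b), g in zip(params, activations):
        h = g(W @ h + b)
    return h

rng = np.random.default_rng(1)
relu = lambda z: np.maximum(0.0, z)
params = [
    (rng.normal(size=(8, 5)), np.zeros(8)),  # layer 1: 5 inputs -> 8 neurons
    (rng.normal(size=(3, 8)), np.zeros(3)),  # layer 2: 8 -> 3 neurons
]
h2 = mlp_forward(rng.normal(size=5), params, activations=[relu, relu])
print(h2.shape)  # (3,)
```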
What is the universal approximation of MLP?
We can think of Multilayer perceptron as a big function block with many free parameters.
Even with a single hidden layer, any function can be represented (as long as we use a non-linearity). We are not guaranteed, however, that the training algorithm will be able to learn that function.
In practice, single-layer nets may not train well for a task. Instead, we go deep and reduce the number of neurons per layer.
What is an Epoch?
And what happens if we use too many or too few?
A hyperparameter that is one complete pass through the entire training dataset.
The model sees one example at a time, in order, and its parameters are updated based on the error made on that example (sketched below).
Typically, DNNs are trained for a large number of epochs.
Too few: may underfit the data (unable to capture the underlying patterns)
Too many: may overfit (fitting noise in the data rather than the underlying patterns)
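A minimal sketch of epochs with per-example (online) updates, for a linear model on made-up data; the learning rate and epoch count are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))              # 100 training examples
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

w, lr, num_epochs = np.zeros(3), 0.01, 20  # num_epochs is the hyperparameter
for epoch in range(num_epochs):            # one epoch = one full pass over the data
    for xi, yi in zip(X, y):               # one example at a time, in order
        error = xi @ w - yi                # error made on this example
        w -= lr * error * xi               # update the parameters on that error
```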
How can we determine the optimal number of epochs?
Early stopping: decides when to stop the training based on validation error (see the sketch below)
Cross-validation
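A sketch of early stopping: halt when the validation loss stops improving for `patience` epochs in a row. `train_one_epoch` and `val_loss` are hypothetical placeholders for your own training code:

```python
import numpy as np

def fit(train_one_epoch, val_loss, max_epochs=200, patience=5):
    """Train until validation loss fails to improve `patience` epochs in a row."""
    best, bad_epochs = np.inf, 0
    for epoch in range(max_epochs):
        train_one_epoch()               # hypothetical: one pass over the data
        loss = val_loss()               # hypothetical: loss on the validation set
        if loss < best:
            best, bad_epochs = loss, 0  # improvement: reset the counter
        else:
            bad_epochs += 1
            if bad_epochs >= patience:  # no recent improvement: stop training
                break
    return best
```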
Generalization error
We only have access to a limited sample (not the full population)
Empirical distribution p̂_data
We want our model to predict future test cases:
It is a measure of how well the DNN is able to generalize its knowledge from the training data to new, unseen data.
What separates machine learning from optimization is that we want the generalization error, also called the test error, to be low as well.
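A toy sketch of measuring this gap: error on the limited training sample vs. error on held-out, unseen data; the data and the fixed decision rule are made up:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(120, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=120) > 0).astype(int)
X_train, y_train = X[:100], y[:100]      # the limited sample we train on
X_test, y_test = X[100:], y[100:]        # unseen data (stand-in for future cases)

predict = lambda Z: (Z[:, 0] > 0).astype(int)    # hypothetical fixed model
train_error = np.mean(predict(X_train) != y_train)
test_error = np.mean(predict(X_test) != y_test)  # estimate of generalization error
print(train_error, test_error)
```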
Recall Maximum likelihood
A neural model p tries to predict an output label y from an image x
ML estimate using a training set:
W_ML = argmax_W E_{(x,y) ~ p̂_data} [log p(y | x; W)]
The expectation is the mean over the m training images
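A tiny numpy sketch of that expectation: it is just the average log-likelihood over the m training examples; the predicted probabilities are made-up values from a hypothetical model:

```python
import numpy as np

# Hypothetical model outputs: p(y_i | x_i; W) for each of m = 4 training images.
probs = np.array([0.9, 0.7, 0.85, 0.6])
objective = np.mean(np.log(probs))  # E_{(x,y) ~ p̂_data}[log p(y | x; W)]
# W_ML maximizes this over W (equivalently, minimizes the negative log-likelihood).
print(objective)
```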