3 Multilayer Perceptrons Flashcards

1
Q

What is the form for fully connected NNs?

A

h = g (Wx + b)
W: Matrix of weights, one vector per neuron
x: one input example (vector)
b: vector of biases, one scalar per neuron
h: hidden layer response
g: the activation function
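As a concrete sketch, the layer above can be written in a few lines of NumPy (the sizes, the seed, and the choice of ReLU for g are illustrative assumptions, not from the card):

```python
import numpy as np

# h = g(Wx + b) for one fully connected layer.
# Sizes are illustrative: 3 inputs, 4 neurons.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))  # one weight vector per neuron (one row each)
b = np.zeros(4)                  # one bias scalar per neuron
x = np.array([1.0, -2.0, 0.5])   # one input example (vector)

def g(z):                        # activation function (ReLU here)
    return np.maximum(0.0, z)

h = g(W @ x + b)                 # hidden layer response, one value per neuron
print(h.shape)                   # (4,)
```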

2
Q

What are the hyperparameters of multilayer perceptrons?

A

Number of layers
Propagation types: fully connected, convolutional
Activation functions
Loss functions and parameters
Training iterations and batch sizes

3
Q

What is important about the input layer?

A
  • A vectorized version of the input data
  • Sometimes preprocessed
  • Weights connect it to the hidden layers
4
Q

What is important about the hidden layer?

A

Number of hidden layers
Number of neurons
Topology

Design is application-dependent

5
Q

What is topology?

A

Refers to the way neurons are connected:
whether the hidden layers are expanding or form a bottleneck.
Often the number of neurons is reduced in the layers after the input.

6
Q

What are the different types of output layer?

A

For regression:
- Linear output with MSE

For classification:
- Softmax units for multiple classes
- Logistic (sigmoid) output for two classes

7
Q

What are the most used activation functions?

A

Sigmoid: sigmoid(x) = 1/(1+exp(-x))
tanh: tanh(x)
ReLU: max(0, x)

ReLU learns much faster and better than the others
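A quick NumPy sketch of these three activations (the sample inputs are arbitrary):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # squashes to (0, 1)

def tanh(x):
    return np.tanh(x)                # squashes to (-1, 1)

def relu(x):
    return np.maximum(0.0, x)        # zeroes out negatives, identity otherwise

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), tanh(z), relu(z))
```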

8
Q

What is the validation set used for?

A

Fine-tuning: tuning the model's hyperparameters and evaluating performance on data not used for training

9
Q

What is a common method to increase capacity?

A

More neurons

10
Q

What is a multilayer perceptron (MLP)?
What does it look like? (hidden layers)

A

A feedforward network -> one-way computational chain

Input x; first hidden representation:
h1 = g1(W1 x + b1)

Then the next layer:
h2 = g2(W2 h1 + b2)
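The two-layer chain above as a NumPy sketch (layer sizes 3 → 5 → 2 and ReLU for both g1 and g2 are illustrative assumptions):

```python
import numpy as np

# Feedforward chain: x -> h1 -> h2, a one-way computational chain.
rng = np.random.default_rng(1)
W1, b1 = rng.standard_normal((5, 3)), np.zeros(5)  # layer 1 parameters
W2, b2 = rng.standard_normal((2, 5)), np.zeros(2)  # layer 2 parameters

def relu(z):
    return np.maximum(0.0, z)

x = rng.standard_normal(3)   # input
h1 = relu(W1 @ x + b1)       # h1 = g1(W1 x + b1)
h2 = relu(W2 @ h1 + b2)      # h2 = g2(W2 h1 + b2)
print(h1.shape, h2.shape)    # (5,) (2,)
```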

11
Q

What is the universal approximation of MLP?

A

We can think of a multilayer perceptron as a big function block with many free parameters.
Even with a single hidden layer, any function can be represented (as long as we use a non-linearity). We are not guaranteed, however, that the training algorithm will be able to learn that function.
In practice, single-hidden-layer nets may not train well on a task. Instead, we go deep and reduce the number of neurons per layer.

12
Q

What is an epoch?
And what happens if we use too many or too few?

A

A hyperparameter: one epoch is one complete pass through the entire training dataset.
The model sees one example at a time, in order, and its parameters are updated based on the error made on each example.
Typically, DNNs are trained for a large number of epochs.
Too few: may underfit the data (not able to capture underlying patterns)
Too many: may overfit (fitting noise in the data rather than underlying patterns)
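A toy sketch of epoch-based training, assuming plain per-example SGD on a noiseless linear problem (the model, data, learning rate, and epoch count are all illustrative):

```python
import numpy as np

# One epoch = one full pass over the training set; the parameters are
# updated after each individual example.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w                        # toy noiseless targets

w = np.zeros(3)
lr, n_epochs = 0.05, 30
for epoch in range(n_epochs):
    for xi, yi in zip(X, y):          # one example at a time, in order
        err = xi @ w - yi             # error made on this example
        w -= lr * err * xi            # update based on that error

print(w)  # approaches true_w as epochs accumulate
```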

13
Q

How can we determine the optimal number of epochs?

A

Early stopping
Cross-validation

Both decide when to stop the training
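A minimal early-stopping sketch, assuming we already have a per-epoch validation loss curve (the function name and the `patience` parameter are illustrative):

```python
def early_stop_epoch(val_losses, patience=3):
    """Return the epoch with the best validation loss, stopping the scan
    once the loss has failed to improve for `patience` epochs in a row."""
    best, best_epoch, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break                 # stop training here
    return best_epoch

print(early_stop_epoch([1.0, 0.8, 0.7, 0.75, 0.9, 1.1]))  # epoch 2
```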

14
Q

Generalization error

A

We only have access to a limited sample (not the full population):
the empirical distribution p̂_data

We want our model to predict future test cases:
It is a measure of how well the DNN is able to generalize its knowledge from the training data to new, unseen data.
What separates machine learning from optimization is that we want the generalization error, also called the test error, to be low as well.

15
Q

Recall Maximum likelihood

A

A neural model p tries to predict an output label y from an image x.
ML estimate using a training set:

W_ML = argmax_W E_(x~p̂_data) log p(y | x)
The expectation is the mean over the m training images
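A tiny numerical sketch of that expectation, assuming the model's predicted class probabilities are given directly (the numbers are made up for illustration):

```python
import numpy as np

# Mean log-likelihood over m = 2 training examples; maximizing this
# over the weights is the ML estimate W_ML.
probs = np.array([
    [0.7, 0.2, 0.1],   # model's p(y | x_1) over 3 classes
    [0.1, 0.8, 0.1],   # model's p(y | x_2)
])
labels = np.array([0, 1])  # true class of each example

log_lik = np.log(probs[np.arange(len(labels)), labels])
mean_log_lik = log_lik.mean()      # the expectation over p̂_data
print(mean_log_lik)
```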

16
Q

Cross entropy

A

Used to determine how well the NN fits the data (the performance of the model on a specific dataset).

  • -log(prediction for the correct class), averaged over the dataset
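A sketch of cross-entropy for classification, assuming the network outputs class probabilities (the function name and the toy numbers are illustrative):

```python
import numpy as np

def cross_entropy(pred_probs, labels):
    """Negative mean log-probability assigned to the correct class;
    lower means the network fits the dataset better."""
    picked = pred_probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(picked))

pred = np.array([[0.9, 0.1],   # confident and correct
                 [0.2, 0.8]])  # fairly confident and correct
print(cross_entropy(pred, np.array([0, 1])))
```
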
17
Q

What is overfitting?

A

Occurs when a model becomes too complex and begins to memorize the training data rather than generalizing to new data:
fitting noise in the data rather than underlying patterns.

18
Q

What can cause overfitting?
And how can we prevent it?

A

Causes of overfitting:
- Too many layers or neurons in the network (more parameters, more complex)
- Insufficient data (a small dataset is easy to memorize)
- Training for too many epochs
- High learning rate: causes the model to converge too quickly to a suboptimal solution

Preventing overfitting:
- Early stopping: monitor the model's performance on a validation set during training; stop when it degrades.
- Data augmentation: provides more diverse training examples to help the model generalize better

19
Q

What is underfitting?

A

Occurs when a model is not complex enough.
It is not able to capture underlying patterns

20
Q

What can cause underfitting?
And how can we prevent it?

A

Causes of underfitting:
- Too few layers or neurons: not enough capacity
- Using the wrong model: e.g. a linear model for a non-linear problem
- Poor choice of activation function
- Overuse of regularization techniques
- Insufficient data: the model may not be able to learn the underlying pattern
- Mismatched model complexity for the task (the "Goldilocks principle": complexity should be just right)

Preventing underfitting:
- Increase model complexity
- Gather more data
- Carefully tune the model's hyperparameters, such as learning rate, batch size, and number of layers/neurons

21
Q

What is a train-val-test split?

A

A method used to divide a dataset into three parts: training set, validation set, and test set. (Don't leak information between them!)

The training set is used to train the model, typically using an optimization algorithm like stochastic gradient descent. Model parameters are adjusted to minimize the loss function.

The validation set is used to tune the model's hyperparameters and to evaluate performance on unseen data during training.

The test set is used as the final evaluation of the model; it measures how well the model generalizes to new data.
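A sketch of such a split in NumPy (the 70/15/15 fractions, the seed, and the function name are illustrative; shuffling before splitting avoids ordering bias):

```python
import numpy as np

def train_val_test_split(X, y, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle once, then carve the index list into three disjoint parts
    so no example (information) leaks between the sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_frac)
    n_val = int(len(X) * val_frac)
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return (X[train_idx], y[train_idx],
            X[val_idx], y[val_idx],
            X[test_idx], y[test_idx])

X = np.arange(100).reshape(100, 1)
y = np.arange(100)
Xtr, ytr, Xval, yval, Xte, yte = train_val_test_split(X, y)
print(len(Xtr), len(Xval), len(Xte))  # 70 15 15
```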