Model Training, Tuning and Evaluation Flashcards
What is an Activation Function?
It is the function within a neuron that determines the node's output based on its input signal.
What is a Linear Activation Function?
It mirrors what came into it as an output. Think of it as a pass-through.
Can a Linear Activation Function perform back propagation?
No. Its derivative is a constant, so back propagation has no useful gradient to work with, and stacked linear layers collapse into a single linear layer anyway.
What is a Binary Step Function?
It is on or off, like a light switch: the output is one of two fixed values depending on whether the input crosses a threshold, so it can't express degrees of confidence or handle multiple classifications.
Why are non-linear activation functions better than linear ones?
They allow for back propagation and multiple layers.
What is a Rectified Linear Unit (ReLU)?
The go-to activation function for deep learning. It outputs the input directly when positive and zero otherwise, i.e. f(x) = max(0, x), which makes it very fast and easy to compute.
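A minimal numpy sketch of ReLU, just to make the definition concrete:

```python
import numpy as np

def relu(x):
    # Output the input when positive, zero otherwise: f(x) = max(0, x)
    return np.maximum(0.0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # [0.  0.  0.  1.5]
```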
What is Leaky ReLU?
It introduces a small slope for inputs below zero (instead of outputting a flat zero), which keeps gradients flowing for negative inputs and helps avoid the "dying ReLU" problem.
What is PReLU?
It is like Leaky ReLU, but the slope in the negative region is learned through back propagation instead of being fixed.
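A minimal numpy sketch of Leaky ReLU, assuming a typical fixed slope of alpha = 0.01 (PReLU is the same function, except alpha would be a learned parameter):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # Small non-zero slope below zero keeps gradients alive for negative inputs.
    # In PReLU, alpha would be learned via back propagation instead of fixed.
    return np.where(x > 0, x, alpha * x)

print(leaky_relu(np.array([-2.0, 3.0])))  # [-0.02  3.  ]
```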
What is Maxout?
It outputs the max of its inputs. Effective, but it doubles the number of parameters that need to be trained.
What is Softmax?
Used as the final output layer of a multi-class classification problem. It converts the raw outputs into a probability for each class, with the probabilities summing to 1. It only handles a single label per sample.
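A minimal sketch of softmax in numpy (subtracting the max before exponentiating is a standard trick for numerical stability):

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability; the result is unchanged.
    exps = np.exp(logits - np.max(logits))
    return exps / exps.sum()  # probabilities summing to 1

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs, probs.sum())  # approx [0.659 0.242 0.099], sums to 1.0
```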
What can Sigmoids do that Softmax cannot?
Assign multiple labels to the same sample (multi-label classification), since each sigmoid output is an independent probability.
What is TanH best for?
RNNs
What is the activation function selection in steps?
Start with ReLU. If you need to do better, try Leaky ReLU. Then try PReLU and Maxout if you have the budget for them.
What is a CNN?
A Convolutional Neural Network
What does a CNN do?
It finds features within your data regardless of where they appear. This could be a phrase in text or an object in an image.
What is the LeNet-5 CNN?
Used for handwriting analysis
What is the AlexNet CNN?
Used for image classification
What is an RNN for?
Sequences of data: time series, web logs, captions, machine translation, etc.
What is a recurrent neuron?
A neuron that feeds its output back into itself, so it retains a memory of what it saw at previous time steps in the sequence.
Can you have a layer of recurrent neurons?
Yes
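A minimal sketch of one step of a recurrent layer, assuming a tanh activation (which is why TanH shows up so often in RNNs); the weights here are random placeholders, not trained values:

```python
import numpy as np

rng = np.random.default_rng(0)
W_x = rng.normal(size=(4, 3))   # input -> hidden weights (placeholder values)
W_h = rng.normal(size=(4, 4))   # hidden -> hidden: the "memory" connection
h = np.zeros(4)                 # hidden state, carried across time steps

for x_t in [rng.normal(size=3) for _ in range(5)]:  # a toy 5-step sequence
    # The neurons "remember": the previous hidden state feeds back in.
    h = np.tanh(W_x @ x_t + W_h @ h)
print(h)
```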
What is an Epoch?
One complete pass through the entire training dataset.
What is Learning Rate?
A hyperparameter that controls how much of the model’s weights are adjusted with respect to the loss (error) after each iteration during training.
What does too high a learning rate cause?
Overshooting the optimal solution.
What does too low a learning rate cause?
Taking too long to find the optimal solution.
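A minimal gradient-descent sketch showing the learning rate's effect, using a toy loss of (w - 3)^2 whose gradient is 2(w - 3); the specific rates are illustrative only:

```python
def descend(lr, steps=20):
    w = 0.0
    for _ in range(steps):
        grad = 2 * (w - 3.0)     # gradient of the toy loss (w - 3)^2
        w -= lr * grad           # the learning rate scales each weight update
    return w

print(descend(lr=0.1))    # converges toward the optimum w = 3
print(descend(lr=1.1))    # too high: overshoots the optimum and diverges
print(descend(lr=0.001))  # too low: barely moves after 20 steps
```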
What is the batch size hyperparameter?
How many training samples are used within each batch of each epoch.
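A minimal sketch of how a dataset gets split into batches within each epoch (reshuffling every epoch is the usual practice):

```python
import numpy as np

X = np.arange(10)          # toy dataset of 10 samples
batch_size = 4             # the hyperparameter in question

for epoch in range(2):
    order = np.random.permutation(len(X))   # reshuffle every epoch
    for start in range(0, len(X), batch_size):
        batch = X[order[start:start + batch_size]]
        # ...one gradient update per batch would happen here...
        print(f"epoch {epoch}, batch {batch}")
```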
What is local minima?
A dip in the loss surface: a point where the loss is lower than at all nearby points but is not the global minimum, so gradient descent can get stuck there.
Do smaller batch sizes get stuck in “local minima”?
Yes, but the noise in small batches helps them work their way back out. Batch sizes that are too large tend to get stuck at the wrong solution and stay there.
What does regularization do?
It prevents overfitting.
What is overfitting?
When a model is good at making predictions on the training data but not on new data it hasn't seen before.
What does dropout do?
It removes neurons at random during each training pass, forcing the network to spread its learning across the remaining neurons. Standard in CNNs.
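A minimal sketch of "inverted" dropout during training, assuming a drop probability of 0.5; real frameworks handle all of this for you:

```python
import numpy as np

def dropout(activations, p_drop=0.5, training=True):
    if not training:
        return activations  # dropout is disabled at inference time
    # Keep each neuron with probability 1 - p_drop.
    mask = np.random.rand(*activations.shape) > p_drop
    # Scale up the survivors ("inverted dropout") so the expected output is unchanged.
    return activations * mask / (1.0 - p_drop)

print(dropout(np.ones(8)))  # roughly half the entries zeroed, the rest scaled to 2.0
```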
Can fewer layers or neurons prevent overfitting?
Yes
What is early stopping?
Stopping training at the epoch where validation accuracy stops improving (or starts degrading), rather than training on and overfitting.
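A minimal sketch of an early-stopping loop with a patience window; `train_one_epoch` and `validate` are hypothetical placeholders for your own training and validation steps:

```python
def train_with_early_stopping(train_one_epoch, validate, max_epochs=100, patience=3):
    best_acc, epochs_without_improvement = 0.0, 0
    for epoch in range(max_epochs):
        train_one_epoch()
        acc = validate()  # validation accuracy this epoch
        if acc > best_acc:
            best_acc, epochs_without_improvement = acc, 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            print(f"stopping early at epoch {epoch}")  # accuracy stopped improving
            break
    return best_acc
```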
What does L1 and L2 Regularization do?
They prevent overfitting by adding a penalty for large weights to the loss function. L1 penalizes the sum of the absolute weights; L2 penalizes the sum of the squared weights.
What is the L1 formula?
The penalty term is the sum of the absolute values of the weights, scaled by a regularization strength: λ Σ|wᵢ|. It tends to drive some weights to exactly zero, performing feature selection.
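A minimal sketch computing both penalty terms over a toy weight vector, with lam standing in for the regularization strength λ:

```python
import numpy as np

w = np.array([0.5, -1.2, 0.0, 2.0])  # toy weights
lam = 0.01                           # regularization strength (lambda)

l1_penalty = lam * np.sum(np.abs(w))  # L1: sum of absolute weights (drives sparsity)
l2_penalty = lam * np.sum(w ** 2)     # L2: sum of squared weights (shrinks all weights)
# Either penalty would be added to the loss before back propagation.
print(l1_penalty, l2_penalty)
```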