March 2025 Flashcards
Main benefit of using logarithms (mathematically speaking)
Multiplication, which can produce very large or very small numbers, can be replaced by addition:
log(a*b) = log(a) + log(b)
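A quick numeric check of the identity (just Python's math module; the factors are arbitrary):

```python
import math

a, b = 1e12, 1e-9           # one very large factor, one very small factor
product = a * b             # about 1000.0

# The log of the product equals the sum of the logs
assert math.isclose(math.log(product), math.log(a) + math.log(b))
print(math.log(a), math.log(b), math.log(product))
```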
Why use the negative log function as a loss function
Because the negative log gives very high loss values when the predicted probability for the correct class is close to zero.
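A small illustration of why: the loss explodes as the predicted probability for the correct class approaches zero.

```python
import math

for p in (0.9, 0.5, 0.1, 0.01, 0.001):
    # -log(p): tiny loss for confident correct predictions, huge loss near zero
    print(f"p={p:<6} -> -log(p) = {-math.log(p):.3f}")
# 0.105, 0.693, 2.303, 4.605, 6.908
```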
Cross Entropy Loss
SoftMax followed by the negative log likelihood loss.
Cross Entropy (etymological description)
Cross entropy is the comparison of two different probability distributions. In classification, this is usually a comparison between the known probability distribution (the labels) and the currently predicted probabilities. You can see how that would lend itself to a loss function.
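A minimal PyTorch sketch (made-up logits and labels) showing the composition: cross entropy is (log-)softmax followed by negative log likelihood, comparing the predicted distribution against the labels.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 1.5,  0.3]])   # raw outputs for 2 samples, 3 classes
targets = torch.tensor([0, 1])               # known labels

# Built-in cross entropy
ce = F.cross_entropy(logits, targets)

# The same thing, spelled out: log-softmax followed by negative log likelihood
nll = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(ce, nll)  # identical values
```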
Consequence of using a learning rate that is too low?
It will take too many epochs to converge, and too many epochs means overfitting (the model will memorize the dataset).
What is the most common last activation layer in a Classification CNN?
SoftMax
What is the most common last activation layer in a Binary Classification CNN?
Sigmoid
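A tiny sketch contrasting the two: softmax turns a vector of logits into a probability distribution over classes, while sigmoid squashes a single logit into a probability for a binary decision.

```python
import torch

multi_logits = torch.tensor([2.0, 0.5, -1.0])
print(torch.softmax(multi_logits, dim=0))   # sums to 1 across the 3 classes

binary_logit = torch.tensor(0.7)
print(torch.sigmoid(binary_logit))          # a single probability in (0, 1)
```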
What is the concept called that protects the general image-classification abilities of a pre-trained model during fine-tuning?
Freezing pre-trained layers.
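A minimal PyTorch sketch of freezing (the two-layer "body" here is hypothetical); fastai's fine_tune manages the freeze/unfreeze cycle for you.

```python
from torch import nn

# Hypothetical pre-trained body plus a new classification head
body = nn.Sequential(nn.Linear(784, 256), nn.ReLU())
head = nn.Linear(256, 10)

# Freeze the pre-trained layers so the general features they encode
# are not disturbed while the new head learns
for p in body.parameters():
    p.requires_grad = False
```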
Does a learning rate have to be a single number?
No, learning rates can differ by layer. Because layers form a gradient of abstraction (early layers learn general features, later layers learn task-specific ones), a single one-size-fits-all learning rate is often inappropriate.
Discriminative Learning Rates
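A minimal sketch using plain PyTorch parameter groups (the model is hypothetical); fastai expresses the same idea by passing a slice of learning rates to its fit methods.

```python
from torch import nn, optim

# Hypothetical model: a pre-trained "body" and a freshly initialized "head"
model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),   # pretend these layers are pre-trained
    nn.Linear(256, 10),               # new classification head
)
body, head = model[:2], model[2:]

# Each parameter group gets its own learning rate:
# small steps for the pre-trained body, larger steps for the new head
opt = optim.SGD([
    {"params": body.parameters(), "lr": 1e-4},
    {"params": head.parameters(), "lr": 1e-2},
])
```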
What should you focus on when training to know if your model is overconfident or beginning to overfit? And what should you NOT focus on?
You should focus on your metrics. You should not focus on the loss.
The loss function is just something you give the model that it can differentiate, so that it can perform SGD.
Downsides of deeper architectures?
More prone to overfitting (more parameters with which to overfit)
Out-of-memory errors, forcing smaller batch sizes
Much longer training times
One way of speeding up the training of deep networks?
Mixed-Precision Training
Using half-precision floating point (fp16) where possible during training.
PyTorch can enable this on CUDA GPUs via its automatic mixed precision (AMP) support.
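A minimal training-step sketch using PyTorch's AMP on a CUDA device; the model, batch, and optimizer are placeholders. fastai exposes the same thing as Learner.to_fp16().

```python
import torch
from torch import nn, optim

device = "cuda"
model = nn.Linear(784, 10).to(device)        # placeholder model
opt = optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()         # rescales gradients so fp16 doesn't underflow

xb = torch.randn(64, 784, device=device)     # fake batch
yb = torch.randint(0, 10, (64,), device=device)

with torch.cuda.amp.autocast():              # runs eligible ops in fp16
    loss = loss_fn(model(xb), yb)

scaler.scale(loss).backward()
scaler.step(opt)
scaler.update()
opt.zero_grad()
```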
/sys
A virtual filesystem on modern Linux distributions that exposes information about, and allows modification of, the devices connected to the system.
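For example, a quick (Linux-only) peek at the device classes it exposes:

```python
from pathlib import Path

# Each entry under /sys/class is a category of device the kernel knows about
for entry in sorted(Path("/sys/class").iterdir()):
    print(entry.name)   # e.g. block, net, tty, ...
```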
F.relu
ReLU (Rectified Linear Unit): replaces every negative number with zero.
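A tiny demonstration of F.relu zeroing the negatives:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 0.0, 1.5])
print(F.relu(x))   # tensor([0.0000, 0.0000, 0.0000, 1.5000])
```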
activation function
a nonlinear layer
Precision
How many of the predicted positives were actually positive.
Precision = TP / (TP + FP)
TP = True Positives
FP = False Positives
Precision is about CUTTING down on false positives.
Recall
How many of the actual positive instances were correctly identified.
Recall = TP / (TP + FN)
TP = True Positives
FN = False Negatives
Recall is about RECALLING all the known positives, at the risk of more false positives.
Harmonic Mean & reason it is used in F1 score
N / (Sum of 1/xi)
lower values have a stronger influence
(It’s a “bad apples” algo)
F1 score
0 to 1
Harmonic mean of Precision and Recall
F1 = 2 * (Precision * Recall) / (Precision + Recall)
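A small worked example with made-up counts, showing how the harmonic mean drags F1 toward the weaker of the two:

```python
tp, fp, fn = 80, 20, 40          # made-up confusion counts

precision = tp / (tp + fp)       # 0.8   -> how many predicted positives were right
recall    = tp / (tp + fn)       # 0.667 -> how many actual positives we found
f1 = 2 * precision * recall / (precision + recall)

print(precision, recall, f1)     # F1 ≈ 0.727, pulled toward the lower value
```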
Way of digging into what your classification model got wrong
Confusion Matrix
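A minimal sketch with scikit-learn (assuming it is installed; labels are made up); fastai wraps the same idea in ClassificationInterpretation.plot_confusion_matrix.

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 2, 2, 2]   # made-up actual labels
y_pred = [0, 1, 1, 1, 2, 0, 2]   # made-up predictions

# Rows = actual class, columns = predicted class; off-diagonal cells are the mistakes
print(confusion_matrix(y_true, y_pred))
```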
PyTorch method that changes the shape of a tensor without changing its contents.
view(-1, 28*28)
-1 is a special parameter to view that means “make this axis as big as necessary to fit all the data.”
Here we are multiplying 28*28, the lengths of the two image dimensions, to get the length of the new (smushed) dimension.
This takes an image and vectorizes it.
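A minimal example, assuming a batch of 28x28 images:

```python
import torch

imgs = torch.randn(64, 28, 28)     # a batch of 64 MNIST-sized images
flat = imgs.view(-1, 28 * 28)      # -1: "make this axis as big as needed"
print(flat.shape)                  # torch.Size([64, 784])
```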
Define a PyTorch Dataset
a collection that contains tuples of independent and dependent variables.
independent = inputs, dependent = targets
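A minimal custom Dataset sketch (toy tensors) that satisfies the (independent, dependent) tuple contract:

```python
import torch
from torch.utils.data import Dataset

class PairsDataset(Dataset):
    """Wraps parallel collections of inputs and targets into (x, y) tuples."""
    def __init__(self, xs, ys):
        self.xs, self.ys = xs, ys

    def __len__(self):
        return len(self.xs)

    def __getitem__(self, i):
        return self.xs[i], self.ys[i]   # (independent, dependent)

ds = PairsDataset(torch.randn(10, 3), torch.arange(10))
print(ds[0])   # a single (input, target) tuple
```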
What is the deal with “L” from fastai?
L is a specialized list-like container provided by fastcore (a dependency of fastai).
It extends Python lists with additional functionality such as element-wise operations, filtering, and mapping.
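A small sketch of those extras (assuming fastcore is installed):

```python
from fastcore.foundation import L

xs = L(1, 2, 3, 4, 5)
print(xs.map(lambda x: x * 10))         # element-wise mapping
print(xs.filter(lambda x: x % 2 == 0))  # filtering
```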
What is the significance of a PyTorch method that ends in an underscore?
The method modifies the tensor in place (like a mutator in the OO world).
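A quick illustration with torch.Tensor.add vs torch.Tensor.add_:

```python
import torch

t = torch.tensor([1.0, 2.0, 3.0])
t.add(10)       # returns a new tensor; t is unchanged
print(t)        # tensor([1., 2., 3.])

t.add_(10)      # trailing underscore: mutates t in place
print(t)        # tensor([11., 12., 13.])
```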