12 - Terra Incognita Flashcards
What is the addition operation described in the context of binary numbers?
Addition is performed modulo 97: any sum that exceeds 96 wraps back around, so every result lies between 0 and 96.
How is the sum expressed in modulo-97 addition?
a + b = x + 97k for some non-negative integer k, where 0 ≤ x ≤ 96; the reported sum is x.
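The wrap-around rule above can be sketched in a few lines of Python (a minimal illustration; the function name `add_mod97` is my own, not from the source):

```python
# Modulo-97 addition: the sum of two numbers in [0, 96] wraps back
# into [0, 96] by discarding whole multiples of 97.
def add_mod97(a: int, b: int) -> int:
    return (a + b) % 97

print(add_mod97(50, 60))  # 50 + 60 = 110 = 13 + 1*97, so the result is 13
print(add_mod97(96, 1))   # wraps all the way around to 0
```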
What does ‘grokking’ refer to in the context of neural networks?
‘Grokking’ describes a phenomenon in which a neural network first appears merely to memorize its training data and then, long after training accuracy has saturated, suddenly generalizes — it has internalized a deep understanding of the underlying rule rather than the individual examples.
What is the relationship between the number of parameters in a neural network and its performance?
More parameters can lead to overfitting, while fewer parameters can lead to underfitting.
What is overfitting in machine learning?
Overfitting occurs when a model learns details and noise in the training data to the extent that it negatively impacts its performance on new data.
What is underfitting in machine learning?
Underfitting occurs when a model is too simple to capture the underlying pattern of the data.
Describe the bias-variance trade-off.
High bias leads to underfitting, while high variance leads to overfitting. The goal is to find a balance between the two.
What is the role of the test dataset in machine learning?
The test dataset is used to evaluate the model’s performance on unseen data, indicating its ability to generalize.
What happens when a model is too complex?
It may overfit the training data, leading to poor performance on test data due to capturing noise.
What is a simple model’s performance on noisy data?
A simple model smooths over the noise but may also miss the underlying pattern, leading to high errors on both the training and test data.
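The contrast between simple and complex models on noisy data can be made concrete with polynomial fitting (a hypothetical sketch, not from the source: a degree-1 line underfits a noisy quadratic, while a degree-15 polynomial chases the noise):

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy training samples of a quadratic ground truth, plus clean test data.
x_train = np.linspace(-1, 1, 20)
y_train = x_train**2 + rng.normal(0, 0.1, size=x_train.shape)
x_test = np.linspace(-1, 1, 50)
y_test = x_test**2

def errors(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_err, test_err

for d in (1, 2, 15):
    tr, te = errors(d)
    print(f"degree {d:2d}: train MSE {tr:.4f}, test MSE {te:.4f}")
```

The degree-1 line has high error on both sets (underfitting); the degree-15 polynomial drives training error toward zero while tracking every variation in the noisy samples (overfitting); degree 2 sits near the sweet spot.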
Fill in the blank: The number of ________ in a model determines its complexity and capacity.
[parameters]
True or False: A model with too few parameters will have a high risk of overfitting.
False — too few parameters creates a risk of underfitting, not overfitting.
What is the consequence of a model that tracks every variation in the training data?
It leads to overfitting and poor generalization to test data.
What does the capacity of a hypothesis class refer to?
It refers to the range of functions that a model can approximate based on its parameters.
What is the universal approximation theorem?
It states that a feedforward neural network with a single hidden layer containing sufficiently many neurons can approximate any continuous function (on a bounded input domain) to arbitrary accuracy.
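One way to see the intuition behind the theorem: a one-hidden-layer ReLU network is a piecewise-linear function, and with enough hidden units its pieces can hug any continuous target. The sketch below hand-constructs (rather than trains) such a network to approximate sin(x); the helper names are my own:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def build_relu_net(knots, targets):
    """Build a one-hidden-layer ReLU net that interpolates `targets` at `knots`.

    Hidden unit i computes relu(x - knots[i]); each output weight is the
    change in slope needed at that knot, so the net is exactly the
    piecewise-linear interpolant of the target points.
    """
    slopes = np.diff(targets) / np.diff(knots)
    weights = np.diff(slopes, prepend=0.0)  # slope increments at each knot

    def net(x):
        hidden = relu(x[:, None] - knots[:-1][None, :])
        return targets[0] + hidden @ weights

    return net

knots = np.linspace(0, np.pi, 50)
net = build_relu_net(knots, np.sin(knots))

x = np.linspace(0, np.pi, 500)
max_err = np.max(np.abs(net(x) - np.sin(x)))
print(f"max error with 49 hidden units: {max_err:.5f}")
```

Doubling the number of knots (hidden units) roughly quarters the maximum error, which is the spirit of the theorem: more neurons buy arbitrarily close approximation.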
How does model complexity affect training risk?
Training risk generally decreases as model complexity increases, since a more flexible model can fit the training data ever more closely; it is the test risk that rises again once the model starts overfitting.
What is the consequence of using a very simple model on complex data?
It will likely result in high training and test errors due to underfitting.
Fill in the blank: The goal of an ML engineer is to find the sweet spot between ________ and variance.
[bias]
What is one of the most significant challenges in model selection?
Determining the right level of complexity to avoid underfitting and overfitting.
What happens to test error when a model is too complex?
The test error tends to increase due to overfitting.
Describe the performance of a model that has a high bias.
It will likely underfit the training data and perform poorly on test data.
What does a highly complex, nonlinear model do during training?
It minimizes training errors but may generalize poorly to test data.
What happens to the capacity of the hypothesis class when the number of parameters is increased?
The capacity increases — the hypothesis class contains a wider range of functions that the model can represent.
What does the dashed curve in the figure represent?
The training risk, or the risk that the model makes errors on the training dataset.