Slides with ChatGPT Flashcards
What is regularization in machine learning?
Regularization consists of strategies explicitly designed to reduce test error, potentially at the expense of increased training error. Its goal is to move a model out of the overfitting regime toward a capacity that matches the complexity of the data.
List some common regularization techniques used in deep learning.
Common techniques include parameter norm penalties, dataset augmentation, ensemble methods, noise injection for robustness, semi-supervised learning, multitask learning, early stopping, parameter tying and sharing, dropout, and adversarial training.
What is a parameter norm penalty, and how is it used in regularization?
Parameter norm penalties limit model capacity by adding a penalty term, Ω(θ), to the objective function J. The penalty is typically applied only to the weights w ⊂ θ (excluding biases) to reduce overfitting.
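A minimal sketch of how the penalty enters the objective, assuming a linear model with mean-squared-error loss and a hypothetical penalty weight alpha:

```python
import numpy as np

def penalized_loss(w, X, y, alpha=0.1):
    """Data-fitting term J(w) plus a norm penalty alpha * Omega(w).

    Only the weights are penalized here; in practice biases are left
    out of Omega, as the card above notes.
    """
    residual = X @ w - y
    data_term = np.mean(residual ** 2)   # J(w): fit to the training data
    penalty = alpha * np.sum(w ** 2)     # Omega(w): limits model capacity
    return data_term + penalty
```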
Define L1 norm penalty and its application in regularization.
The L1 norm penalty is defined as Ω(θ) = ‖w‖₁ = Σᵢ |wᵢ|. This approach, also known as LASSO, encourages sparsity in the model parameters.
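A small illustrative sketch of why the L1 penalty yields sparsity: its proximal (soft-thresholding) step sets small coordinates exactly to zero. The threshold and the weight values below are assumed for illustration.

```python
import numpy as np

def soft_threshold(w, lam):
    """Proximal step for the penalty lam * sum(|w_i|): entries with
    magnitude below lam become exactly zero, larger ones shrink by lam."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

w = np.array([0.03, -0.4, 1.2, -0.02])
print(soft_threshold(w, lam=0.1))   # approximately [0.0, -0.3, 1.1, 0.0]
```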
Define L2 norm penalty and its application in regularization.
The L2 norm penalty is defined as Ω(θ) = ‖w‖₂² = wᵀw. Also known as weight decay or ridge regression, it shrinks weights toward zero without forcing them to be exactly zero.
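A minimal sketch of why the L2 penalty is called weight decay: its gradient contribution shrinks the weights a little on every update (the learning rate and alpha are assumed values).

```python
def sgd_step_with_weight_decay(w, grad_loss, lr=0.01, alpha=0.1):
    """One step on J(w) + alpha * w.T @ w: the penalty's gradient is
    2 * alpha * w, so each update decays the weights toward zero."""
    return w - lr * (grad_loss + 2 * alpha * w)
```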
What is dataset augmentation, and why is it useful?
Dataset augmentation involves generating additional synthetic data to enhance the training set. Examples include image transformations (rotation, flips), noise addition in audio, and synonym replacement in text, helping improve model generalization.
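A minimal sketch of label-preserving image augmentation with NumPy; the (H, W, C) array layout and the specific transforms are assumptions for illustration.

```python
import numpy as np

def augment(image):
    """Return cheap label-preserving variants of an (H, W, C) image;
    real pipelines add crops, small rotations, colour jitter, noise, etc."""
    return [
        image,
        np.flip(image, axis=1),             # horizontal flip
        np.rot90(image, k=1, axes=(0, 1)),  # 90-degree rotation
    ]
```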
Explain ensemble methods and their role in reducing model error.
Ensemble methods combine multiple learners to reduce error by averaging predictions. Techniques like bagging, boosting, and stacking are used to generate diverse models that reduce the variance in predictions.
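A minimal bagging sketch, assuming a simple least-squares base learner; averaging the bootstrap models' predictions is what reduces variance.

```python
import numpy as np

def bagged_predictions(X, y, X_test, n_models=10, seed=0):
    """Fit one least-squares model per bootstrap resample and average
    the resulting predictions on X_test."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))   # bootstrap sample
        w, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        preds.append(X_test @ w)
    return np.mean(preds, axis=0)
```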
Describe dropout as a regularization technique.
Dropout is a technique where units in a neural network are randomly "dropped" (set to zero) during training, forcing the network to learn redundant representations. This can be thought of as training an ensemble of subnetworks.
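A minimal sketch of inverted dropout at training time; the drop probability is an assumed hyperparameter, and at test time activations pass through unchanged.

```python
import numpy as np

def dropout(activations, drop_prob=0.5, training=True, rng=None):
    """Randomly zero units during training and rescale the survivors so
    the expected activation matches what the network sees at test time."""
    if not training or drop_prob == 0.0:
        return activations
    rng = rng or np.random.default_rng()
    keep_prob = 1.0 - drop_prob
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob
```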
What is the early stopping technique in model training?
Early stopping involves monitoring validation error during training and halting training when validation error no longer improves, preventing overfitting to the training data.
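A minimal early-stopping loop with a patience counter; `train_one_epoch` and `validation_error` are hypothetical callables standing in for a real training setup.

```python
def train_with_early_stopping(model, train_one_epoch, validation_error,
                              max_epochs=100, patience=5):
    """Stop once validation error has not improved for `patience` epochs;
    a full implementation would also restore the best-so-far parameters."""
    best_err, since_best = float("inf"), 0
    for _ in range(max_epochs):
        train_one_epoch(model)
        err = validation_error(model)
        if err < best_err:
            best_err, since_best = err, 0
        else:
            since_best += 1
            if since_best >= patience:
                break
    return model, best_err
```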
Define parameter tying and parameter sharing.
Parameter tying keeps parameters close by adding a penalty to the loss, while parameter sharing enforces identical parameters, common in CNNs, to reduce memory usage and improve efficiency.
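A tiny sketch of the distinction, with hypothetical weight vectors: tying adds a closeness penalty to the loss, while sharing literally reuses one parameter array.

```python
import numpy as np

def tying_penalty(w_a, w_b, lam=0.1):
    """Parameter tying: penalize the distance between two related weight
    vectors so they stay close without being forced to be identical."""
    return lam * np.sum((w_a - w_b) ** 2)

# Parameter sharing, by contrast, reuses the very same array (e.g. one
# convolution kernel applied at every image location), so only one copy
# of the weights is stored and updated.
```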
Explain the difference between deterministic and stochastic gradient descent.
Deterministic gradient descent uses the whole dataset for each update, while stochastic gradient descent (SGD) updates weights based on individual data points or small minibatches, adding noise to the learning process.
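A minimal sketch contrasting the two update rules for a linear model with mean-squared-error loss; the minibatch size is an assumed value.

```python
import numpy as np

def mse_grad(w, X, y):
    """Gradient of mean squared error for predictions X @ w."""
    return 2 * X.T @ (X @ w - y) / len(X)

def full_batch_step(w, X, y, lr=0.01):
    return w - lr * mse_grad(w, X, y)                # uses the whole dataset

def minibatch_sgd_step(w, X, y, lr=0.01, batch_size=32, rng=None):
    rng = rng or np.random.default_rng()
    idx = rng.integers(0, len(X), size=batch_size)   # noisy gradient estimate
    return w - lr * mse_grad(w, X[idx], y[idx])
```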
What is momentum in optimization, and why is it used?
Momentum is a technique that accelerates gradient descent by maintaining a moving average of past gradients, helping the model navigate along consistent gradient directions and dampen oscillations.
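A minimal sketch of the classical momentum update; the learning rate and momentum coefficient are assumed values.

```python
def momentum_step(w, v, grad, lr=0.01, beta=0.9):
    """The velocity v accumulates past gradients, so consistent descent
    directions build up speed while oscillating components cancel out."""
    v = beta * v - lr * grad
    return w + v, v
```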
Describe the Adam optimizer.
Adam (Adaptive Moments) is an optimization algorithm that adjusts learning rates based on the moments (mean and uncentered variance) of the gradients, providing more efficient convergence.
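A minimal sketch of one Adam update with bias-corrected moment estimates, using the commonly cited default hyperparameters.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Keep exponential moving averages of the gradient (m) and its
    elementwise square (v), correct their initialization bias at step
    t >= 1, and scale the update per parameter."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```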
What are Convolutional Neural Networks (CNNs) designed for?
CNNs are designed for data with grid-like structure, such as images (2D grids of pixels) and time-series data (1D grids of samples). They use convolution in place of general matrix multiplication in certain layers.
What is a convolution operation in CNNs?
Convolution in CNNs involves sliding a filter (kernel) over the input to compute feature maps, capturing spatial hierarchies in data by leveraging sparse interactions, parameter sharing, and equivariance.
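A minimal single-channel "valid" convolution sketch (implemented as cross-correlation, as most deep learning frameworks do); the input and kernel are assumed to be 2-D arrays.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the same small kernel over every position of the image and
    return the feature map: parameter sharing plus sparse interactions."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out
```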
Define pooling in the context of CNNs.
Pooling reduces the spatial dimensions of feature maps, typically by taking the max or average in each neighborhood. This introduces translation invariance and reduces the number of parameters in the network.
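A minimal non-overlapping max-pooling sketch for a 2-D feature map; the window size is an assumed hyperparameter and ragged edges are simply dropped.

```python
import numpy as np

def max_pool2d(feature_map, size=2):
    """Keep the largest value in each size x size window, which shrinks
    the spatial dimensions and gives some translation invariance."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size                      # drop ragged edges
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))
```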
What are Recurrent Neural Networks (RNNs) used for?
RNNs are designed for sequential data and allow parameter sharing over time, making them effective for tasks like language modeling and time-series forecasting.
Explain the concept of unfolding in RNNs.
Unfolding is the process of representing the computational graph of an RNN across time steps, allowing the network to learn temporal dependencies through recurrent connections.
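A minimal sketch of a vanilla RNN unrolled over a sequence; the tanh activation is the standard choice, and the weight shapes are assumed to match.

```python
import numpy as np

def rnn_forward(x_seq, h0, W_xh, W_hh, b_h):
    """The same weights (W_xh, W_hh, b_h) are applied at every time step,
    and each hidden state depends on the previous one, which is how the
    unfolded graph captures temporal dependencies."""
    h, hidden_states = h0, []
    for x_t in x_seq:                    # one step of the unrolled graph
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        hidden_states.append(h)
    return hidden_states
```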
Summarize the bias-variance trade-off in machine learning.
The bias-variance trade-off describes how increasing model complexity reduces bias but increases variance. A balanced model minimizes both to avoid underfitting and overfitting.
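A small illustrative simulation of the trade-off, fitting polynomials of low and high degree to many noisy samples of a sine curve; all constants are assumed for illustration only.

```python
import numpy as np

def bias_variance_demo(degree, n_trials=200, n_points=20, noise=0.3, seed=0):
    """Estimate squared bias and variance of polynomial fits of a given
    degree by refitting on many independent noisy training sets."""
    rng = np.random.default_rng(seed)
    x_test = np.linspace(0, 1, 50)
    preds = []
    for _ in range(n_trials):
        x = rng.uniform(0, 1, n_points)
        y = np.sin(2 * np.pi * x) + noise * rng.standard_normal(n_points)
        preds.append(np.polyval(np.polyfit(x, y, degree), x_test))
    preds = np.array(preds)
    true = np.sin(2 * np.pi * x_test)
    bias_sq = np.mean((preds.mean(axis=0) - true) ** 2)
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance

# A low-degree fit tends to show higher bias, a high-degree fit higher variance.
print(bias_variance_demo(degree=1))
print(bias_variance_demo(degree=9))
```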