lesson_3_flashcards
What is regularization in deep learning?
Techniques that reduce overfitting, for example by penalizing large weights (L2), encouraging sparsity (L1), or randomly deactivating units (dropout).
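Example: a minimal sketch of an L2 weight penalty added to a training loss (NumPy; the function name, lam, and the weight list are illustrative, not from the lesson):

    import numpy as np

    def l2_regularized_loss(data_loss, weights, lam=1e-4):
        # Add lambda times the sum of squared weights to the base loss.
        penalty = lam * sum(np.sum(W ** 2) for W in weights)
        return data_loss + penalty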
How does dropout regularization work?
At each training iteration, each node is randomly ‘dropped’ (deactivated) with probability p, preventing over-reliance on specific features.
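Example: a sketch of the common ‘inverted dropout’ variant, which rescales surviving activations by 1/(1-p) so no change is needed at test time (names and the p=0.5 default are illustrative):

    import numpy as np

    def dropout(x, p=0.5, training=True):
        # Zero each unit with probability p during training; scale the
        # survivors so the expected activation matches test time.
        if not training:
            return x
        mask = (np.random.rand(*x.shape) > p) / (1.0 - p)
        return x * mask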
What is batch normalization?
A method that normalizes a layer's activations over each mini-batch, then applies a learnable scale and shift, improving gradient flow and training stability.
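Example: a sketch of the batch-norm forward pass in training mode; at inference, running averages of the batch statistics would be used instead (names are illustrative):

    import numpy as np

    def batch_norm(x, gamma, beta, eps=1e-5):
        # Normalize each feature over the mini-batch (rows of x),
        # then scale and shift with learnable gamma and beta.
        mu = x.mean(axis=0)
        var = x.var(axis=0)
        x_hat = (x - mu) / np.sqrt(var + eps)
        return gamma * x_hat + beta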
What is data augmentation?
Techniques like flipping, rotating, or adding noise to expand datasets artificially, improving model generalization.
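Example: a sketch of two simple augmentations, assuming images are float arrays scaled to [0, 1] (the function name and noise level are illustrative):

    import numpy as np

    def augment(img, rng=None):
        # Randomly flip the image horizontally, then add small pixel noise.
        if rng is None:
            rng = np.random.default_rng()
        if rng.random() < 0.5:
            img = img[:, ::-1]
        img = img + rng.normal(0.0, 0.01, img.shape)
        return np.clip(img, 0.0, 1.0)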
Why is weight initialization important?
Proper initialization ensures effective gradient flow, avoiding vanishing or exploding gradients, and helps models converge faster.
What is Xavier initialization?
A method to maintain consistent variance of activations across layers by scaling weights based on the number of input and output nodes.
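Example: a sketch of Xavier/Glorot initialization for a dense layer's weight matrix, using the standard 2 / (fan_in + fan_out) variance (names are illustrative):

    import numpy as np

    def xavier_init(fan_in, fan_out, rng=None):
        # Variance 2/(fan_in + fan_out) keeps activation variance
        # roughly constant from layer to layer.
        if rng is None:
            rng = np.random.default_rng()
        std = np.sqrt(2.0 / (fan_in + fan_out))
        return rng.normal(0.0, std, size=(fan_in, fan_out))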
What are optimizers in deep learning?
Algorithms like SGD, Adam, and RMSProp used to adjust model parameters to minimize the loss function.
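Example: a sketch of the simplest optimizer update, one vanilla SGD step (names are illustrative; Adam and RMSProp add per-parameter adaptive scaling on top of this idea):

    def sgd_step(params, grads, lr=0.01):
        # Move each parameter a small step against its gradient.
        return [p - lr * g for p, g in zip(params, grads)]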
What is momentum in optimization?
A technique to smooth updates by incorporating past gradients, improving convergence speed and stability.
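Example: a sketch of one common momentum formulation, keeping a running average of past gradients (exact formulations vary across libraries; names are illustrative):

    def momentum_step(params, grads, velocity, lr=0.01, beta=0.9):
        # Accumulate past gradients into a velocity, then step along it.
        velocity = [beta * v + g for v, g in zip(velocity, grads)]
        params = [p - lr * v for p, v in zip(params, velocity)]
        return params, velocity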
What is the vanishing gradient problem?
A phenomenon where gradients become very small as they propagate back through layers, slowing or halting learning.
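Example: a small numeric illustration of why this happens with saturating activations; the sigmoid's gradient is at most 0.25, so backpropagating through 20 sigmoid layers shrinks the signal by at least a factor of 0.25**20:

    import numpy as np

    def sigmoid_grad(x):
        s = 1.0 / (1.0 + np.exp(-x))
        return s * (1.0 - s)

    grad = 1.0
    for _ in range(20):
        grad *= sigmoid_grad(0.0)  # 0.25, the sigmoid's steepest point
    print(grad)  # 0.25**20, roughly 9.1e-13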
What are the key considerations for designing a neural network architecture?
Understanding the data, selecting appropriate layers, ensuring gradient flow, and leveraging domain-specific insights.
What is the purpose of normalization in deep learning?
To standardize input or layer data, ensuring balanced and effective gradient flow throughout the network.
What is data preprocessing?
Preparing raw data for training through techniques like normalization, scaling, or encoding.
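Example: a sketch of two common preprocessing steps, per-feature standardization and one-hot label encoding (names are illustrative):

    import numpy as np

    def standardize(X, eps=1e-8):
        # Zero-mean, unit-variance scaling, computed per feature (column).
        return (X - X.mean(axis=0)) / (X.std(axis=0) + eps)

    def one_hot(labels, num_classes):
        # Encode integer class labels as one-hot row vectors.
        return np.eye(num_classes)[labels]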
How does ReLU improve gradient flow?
By providing non-saturating gradients for positive inputs, ensuring gradients remain large enough to drive learning.
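Example: a sketch of ReLU and its gradient; the gradient is exactly 1 for positive inputs (no saturation) and 0 otherwise:

    import numpy as np

    def relu(x):
        return np.maximum(0.0, x)

    def relu_grad(x):
        # 1 where the input was positive, 0 elsewhere.
        return (x > 0).astype(float)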
What is the importance of hyperparameter tuning?
Adjusting settings like learning rate, batch size, or regularization strength can significantly impact model performance.
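Example: a sketch of a naive grid search over two hyperparameters; train_and_evaluate is a hypothetical stand-in for a real training run that returns a validation score:

    import itertools

    def train_and_evaluate(lr, weight_decay):
        # Hypothetical placeholder: a real version would train a model
        # and return its validation score.
        return -abs(lr - 0.01) - weight_decay

    grid = itertools.product((1e-3, 1e-2, 1e-1), (0.0, 1e-4, 1e-2))
    best_lr, best_wd = max(grid, key=lambda cfg: train_and_evaluate(*cfg))
    print(best_lr, best_wd)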
What is the difference between overfitting and underfitting?
Overfitting occurs when a model performs well on training data but poorly on unseen data; underfitting occurs when a model fails to capture the underlying patterns even in the training data.