Lecture 16 - Introduction to deep learning Flashcards
What is regularization in machine learning?
Techniques designed to reduce test error, even at the cost of a higher training error, by keeping model complexity in check relative to the available data.
Think of regularization as a “balancing act” that prevents overfitting by simplifying the model’s behavior.
Name three common regularization techniques in deep learning.
Dropout: Temporarily disables random neurons during training.
Early Stopping: Halts training when validation error stops improving.
Parameter Norm Penalties: Penalizes large weights (e.g., L1/L2 norms).
What is dropout, and why is it used?
Dropout prevents overfitting by randomly dropping units (neurons) during training, making the model less reliant on specific features.
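A minimal sketch of dropout in a small network, assuming PyTorch; the layer sizes and the drop rate p=0.5 are illustrative choices, not values from the lecture.

```python
import torch
import torch.nn as nn

# Two-layer network with dropout between the layers (p=0.5 is an assumed rate).
model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly zeroes 50% of activations during training
    nn.Linear(64, 2),
)

x = torch.randn(8, 20)
model.train()            # dropout active: random units are dropped on each forward pass
y_train = model(x)
model.eval()             # dropout disabled: all units are used at evaluation time
y_eval = model(x)
```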
Define Early Stopping.
A method to avoid overfitting by stopping training when validation set performance no longer improves.
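A minimal sketch of the early-stopping logic with a patience counter; the hooks run_epoch and val_loss_fn (and the patience value) are hypothetical placeholders supplied by the caller, not part of the lecture material.

```python
def train_with_early_stopping(run_epoch, val_loss_fn, patience=5, max_epochs=200):
    """run_epoch() performs one training epoch; val_loss_fn() returns the current
    validation loss. Both are hypothetical hooks supplied by the caller."""
    best = float("inf")
    epochs_since_improvement = 0
    for epoch in range(max_epochs):
        run_epoch()
        val = val_loss_fn()
        if val < best:
            best = val                      # new best: typically checkpoint the weights here
            epochs_since_improvement = 0
        else:
            epochs_since_improvement += 1
            if epochs_since_improvement >= patience:
                break                       # validation stopped improving: halt training
    return best
```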
What is stochastic gradient descent (SGD)?
A method that updates model weights by calculating gradients on small, random batches of data.
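A minimal sketch of SGD on a toy linear-regression problem in NumPy; the dataset, batch size, and learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 5)), rng.normal(size=1000)   # toy dataset
w = np.zeros(5)
lr, batch_size = 0.1, 32

for step in range(100):
    idx = rng.choice(len(X), size=batch_size, replace=False)   # random mini-batch
    Xb, yb = X[idx], y[idx]
    grad = 2 * Xb.T @ (Xb @ w - yb) / batch_size                # gradient of the mean squared error
    w -= lr * grad                                              # SGD update: w <- w - lr * grad
```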
How does momentum improve optimization?
Momentum accumulates an exponentially decaying average of past gradients, which accelerates convergence along consistent directions and smooths out oscillating updates.
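A minimal sketch of the momentum update (the heavy-ball form used by most frameworks); the learning rate and decay factor beta are assumed values.

```python
import numpy as np

def momentum_step(w, grad, velocity, lr=0.1, beta=0.9):
    velocity = beta * velocity + grad   # running accumulation of past gradients
    w = w - lr * velocity               # step along the smoothed direction
    return w, velocity

w = np.array([1.0, -2.0])
velocity = np.zeros_like(w)             # start with zero velocity
grad = np.array([0.5, 0.1])             # gradient from the current mini-batch
w, velocity = momentum_step(w, grad, velocity)
```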
What is the Adam optimizer?
Adam combines momentum (a running average of gradients) with per-parameter adaptive learning rates (scaled by a running average of squared gradients) for efficient optimization.
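A minimal sketch of a single Adam update in NumPy, using the commonly cited default hyperparameters; a framework optimizer would handle the state bookkeeping for you.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the step count starting at 1."""
    m = beta1 * m + (1 - beta1) * grad           # momentum: moving average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2      # adaptive term: moving average of squared gradients
    m_hat = m / (1 - beta1 ** t)                 # bias correction for the zero initialization
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter adaptive step
    return w, m, v
```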
What is the primary purpose of Convolutional Neural Networks (CNNs)?
To process grid-like data such as images or time-series.
What are the three key stages of a CNN layer?
Convolution: Detects features using filters.
Non-linearity: Introduces non-linear decision boundaries (e.g., ReLU).
Pooling: Reduces spatial dimensions for efficiency and invariance.
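A minimal sketch of the three stages chained as one PyTorch block; the channel counts, kernel sizes, and image dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

layer = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1),  # convolution: feature detection
    nn.ReLU(),                                                             # non-linearity
    nn.MaxPool2d(kernel_size=2),                                           # pooling: halve spatial size
)

images = torch.randn(8, 3, 32, 32)   # batch of 8 RGB images, 32x32 pixels
features = layer(images)
print(features.shape)                # torch.Size([8, 16, 16, 16])
```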
What are recurrent neural networks (RNNs) used for?
Processing sequential data (e.g., text, time series) by sharing parameters across time steps.
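A minimal sketch of sequence processing with PyTorch's nn.RNN, which applies the same weights at every time step; the input size, hidden size, and sequence length are illustrative assumptions.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=10, hidden_size=32, batch_first=True)

x = torch.randn(4, 25, 10)         # batch of 4 sequences, 25 time steps, 10 features each
outputs, h_n = rnn(x)              # outputs: hidden state at every step; h_n: final hidden state
print(outputs.shape, h_n.shape)    # torch.Size([4, 25, 32]) torch.Size([1, 4, 32])
```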
How do RNNs differ from CNNs?
RNNs process data across time (sequentially), while CNNs focus on spatial patterns in grid-like data.
What are parameter norm penalties?
Add terms to the loss function to penalize large weights.
L1 Norm: Promotes sparsity by driving some weights to exactly zero.
L2 Norm: Penalizes the squared magnitude of weights (weight decay), shrinking them toward zero.
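A minimal sketch, assuming PyTorch, of adding L1 and L2 penalty terms to a loss; the penalty coefficients are assumed values.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
x, y = torch.randn(16, 10), torch.randn(16, 1)

mse = nn.functional.mse_loss(model(x), y)
l2_penalty = sum((p ** 2).sum() for p in model.parameters())   # discourages large weights
l1_penalty = sum(p.abs().sum() for p in model.parameters())    # encourages sparse weights
loss = mse + 1e-4 * l2_penalty + 1e-5 * l1_penalty

# L2 is often applied instead through the optimizer's weight_decay argument:
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
```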
Why is dataset augmentation useful?
Generates synthetic data to reduce overfitting, especially in data-scarce domains.
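A minimal sketch of an image-augmentation pipeline using torchvision transforms; the specific transforms and their parameters are illustrative assumptions.

```python
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),               # mirror images at random
    transforms.RandomCrop(32, padding=4),                  # random shifts via padded crops
    transforms.ColorJitter(brightness=0.2, contrast=0.2),  # small photometric changes
    transforms.ToTensor(),
])
# Each epoch sees a slightly different version of every image, acting as extra synthetic data.
```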