lesson_2_flashcards
What is a computation graph?
A directed acyclic graph representing a function as interconnected modules, allowing gradient computations for optimization.
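A minimal sketch (function and values are hypothetical): a small function decomposed into graph nodes, each performing one operation, wired input to output with no cycles.

```python
# f(x, y) = (x + y) * x as a tiny computation graph:
# node 1 (add) feeds node 2 (multiply); edges carry intermediate values.
def f(x, y):
    a = x + y      # node 1: add
    out = a * x    # node 2: multiply
    return out

print(f(2.0, 3.0))  # (2 + 3) * 2 = 10.0
```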
Why must layers in computation graphs be differentiable?
Differentiability ensures gradients can be computed, which are essential for optimization using gradient descent.
What is backpropagation?
An algorithm to compute gradients for all parameters in a computation graph by recursively applying the chain rule from outputs to inputs.
What are the steps of backpropagation?
1. Forward pass to compute activations; 2. Backward pass to compute gradients of the loss with respect to the parameters using the chain rule.
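A hand-rolled sketch of both passes for a one-parameter-pair model, treating the squared output as the loss (all values are hypothetical):

```python
# Forward pass: compute and cache intermediate activations.
w, b, x = 3.0, 1.0, 2.0
z = w * x + b            # linear node: z = 7.0
loss = z ** 2            # loss node: 49.0

# Backward pass: apply the chain rule from the loss back to parameters.
dloss_dz = 2 * z         # d(z^2)/dz = 2z = 14.0
dloss_dw = dloss_dz * x  # dz/dw = x  -> 28.0
dloss_db = dloss_dz * 1  # dz/db = 1  -> 14.0
print(dloss_dw, dloss_db)
```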
What is the difference between forward and reverse mode automatic differentiation?
Forward mode propagates derivatives alongside the computation from inputs to outputs, while reverse mode propagates them from outputs back to inputs; reverse mode is more efficient when one scalar output (the loss) depends on many inputs (the parameters), which is the typical deep learning setting.
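A minimal sketch of forward mode using dual numbers (the `Dual` class is a hypothetical toy, not a library API); the comment at the end contrasts it with reverse mode:

```python
class Dual:
    """Forward-mode AD: carry (value, derivative) through each op."""
    def __init__(self, val, dot):
        self.val, self.dot = val, dot
    def __add__(self, other):
        return Dual(self.val + other.val, self.dot + other.dot)
    def __mul__(self, other):
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

# Forward mode: seed dx/dx = 1 at the input, read df/dx at the output.
x = Dual(2.0, 1.0)
f = x * x + x            # f = x^2 + x
print(f.val, f.dot)      # 6.0, df/dx = 2x + 1 = 5.0

# Reverse mode (what backpropagation does) instead seeds df/df = 1 at the
# output and sweeps backward once, yielding df/dx for *every* input at once,
# which is why it wins when inputs vastly outnumber outputs.
```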
What is the significance of automatic differentiation in deep learning?
It automates gradient computation for arbitrary computation graphs, simplifying implementation and enabling differentiable programming.
How does the chain rule apply in backpropagation?
Gradients of the loss with respect to parameters are computed as products of intermediate gradients along the computation graph.
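A worked example (values hypothetical): the gradient of a nested function is the product of the local derivatives along the graph, checked here against a finite difference.

```python
import math

# L(w) = sigmoid(w * x) ** 2; the chain rule multiplies local derivatives.
x, w = 1.5, 0.8

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

z = w * x
s = sigmoid(z)

# dL/dw = dL/ds * ds/dz * dz/dw  (product of intermediate gradients)
grad = (2 * s) * (s * (1 - s)) * x

# Finite-difference check of the same derivative.
eps = 1e-6
num = (sigmoid((w + eps) * x) ** 2 - sigmoid((w - eps) * x) ** 2) / (2 * eps)
print(grad, num)  # should agree to ~6 decimal places
```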
What is a fully connected (linear) layer in neural networks?
A layer where each input is connected to every output through a weighted sum, followed by an optional activation function.
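A minimal numpy sketch (shapes and initialization are illustrative only):

```python
import numpy as np

def linear(x, W, b):
    """Fully connected layer: every input feeds every output unit."""
    return W @ x + b          # one weighted sum per output unit

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))   # 4 inputs -> 3 outputs
b = np.zeros(3)
x = rng.normal(size=4)
y = np.maximum(0.0, linear(x, W, b))  # optional activation (ReLU here)
print(y.shape)  # (3,)
```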
What are hidden layers in a neural network?
Layers between input and output that learn intermediate features, increasing the representational power of the model.
What is a rectified linear unit (ReLU)?
A non-linear activation function defined as max(0, x); its gradient does not saturate for positive inputs, giving better gradient flow than sigmoid functions.
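A short sketch of ReLU and its derivative (the subgradient at x = 0 is taken as 0 here, a common convention):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # Derivative is 1 for x > 0, 0 otherwise.
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))       # [0.  0.  0.  1.5]
print(relu_grad(x))  # [0.  0.  0.  1.]
```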
What is the role of the Jacobian in neural networks?
The matrix of partial derivatives of a vector-valued function with respect to its inputs; backpropagation chains Jacobian-vector products to compute gradients efficiently.
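A concrete case (values hypothetical): for a linear map y = Wx, the Jacobian dy/dx is exactly W, verified here column by column with finite differences.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))
x = rng.normal(size=3)

# For y = W @ x, the Jacobian dy/dx is exactly W.
eps = 1e-6
J = np.zeros((2, 3))
for j in range(3):
    e = np.zeros(3); e[j] = eps
    J[:, j] = (W @ (x + e) - W @ (x - e)) / (2 * eps)

print(np.allclose(J, W))  # True
```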
What is logistic regression in the context of computation graphs?
A binary classifier using the sigmoid function on a weighted sum of inputs, represented as a simple computation graph.
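A minimal sketch of that graph (weights and data are hypothetical), including the classic gradient of the binary cross-entropy loss, which collapses to (p - y) * x:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Computation graph: x -> (w . x + b) -> sigmoid -> probability.
w = np.array([0.5, -1.2])
b = 0.1
x = np.array([2.0, 0.5])
y = 1.0                          # true binary label

p = sigmoid(w @ x + b)           # predicted P(y = 1 | x)
grad_w = (p - y) * x             # gradient of binary cross-entropy wrt w
print(p, grad_w)
```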
What is gradient flow in deep learning?
The propagation of gradient information through a network during backpropagation, critical for effective learning.
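A small demonstration of why gradient flow matters (depth and activation are illustrative): each sigmoid layer multiplies the gradient by s(1 - s) <= 0.25, so the signal can shrink geometrically with depth.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

grad = 1.0
z = 0.0  # pre-activations at 0 give the *best* case, s'(0) = 0.25
for layer in range(10):
    s = sigmoid(z)
    grad *= s * (1 - s)
print(grad)  # ~0.25**10, about 1e-6: the gradient has nearly vanished
```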
What is differentiable programming?
A paradigm where entire programs, including control flows, are made differentiable to enable optimization through backpropagation.
What are mini-batches, and why are they used?
Small subsets of training data used in gradient descent to balance computational efficiency and stable optimization.
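A minimal mini-batch SGD sketch on a toy linear regression (dataset, learning rate, and batch size are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))            # toy dataset
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=1000)

w = np.zeros(5)
lr, batch_size = 0.1, 32
for epoch in range(20):
    perm = rng.permutation(len(X))        # reshuffle each epoch
    for i in range(0, len(X), batch_size):
        idx = perm[i:i + batch_size]
        Xb, yb = X[idx], y[idx]
        # Gradient of mean squared error on this mini-batch only.
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(idx)
        w -= lr * grad
print(w.round(2))  # roughly [1., -2., 0.5, 0., 3.]
```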