Neural Networks Flashcards
What is deep learning, and how does it contrast with other machine learning algorithms?
- Subset of ML concerned with multi-layer neural networks: using backpropagation and principles loosely inspired by neuroscience to model large, complex datasets, including unlabelled or semi-structured data. Rather than relying on hand-engineered features, deep learning learns layered representations of the data through neural nets, and it is used in supervised, unsupervised, and reinforcement learning settings.
Vanishing Gradient
- Happens when training deep NNs, particularly those with many layers, such as recurrent neural networks and deep feedforward networks. It is characterized by the gradients of the loss function with respect to the weights becoming very small, approaching zero, as they are propagated back through the layers during backpropagation. This results in slow or stalled learning because the weights are updated only minimally.
- What causes them?
- Activation Functions
- Sigmoid and hyperbolic tangent (tanh) squash their inputs into a small range (between 0 and 1 for sigmoid, between -1 and 1 for tanh), so their derivatives are small (at most 0.25 for sigmoid). When these small gradients are multiplied across many layers during backpropagation, they shrink exponentially (see the sketch at the end of this card).
- Deep Networks
- Many Layers: in networks with many layers, gradients must be backpropagated through every layer. If the gradients are small at any layer, they continue to diminish as they are propagated backward.
- Weight Initialization:
- Poor initialization can exacerbate the vanishing gradient problem. If weights are initialized too small, the output of each layer and the corresponding gradients will also be small.
- Effects:
- Slow Convergence: training takes a long time to converge, and may never converge at all.
- Poor Performance
- Difficulty in Training Deep Networks.
- Solutions:
- Activation Functions
- ReLU and Leaky ReLU do not suffer from this problem as much because the gradient is not squashed to small values (the derivative is 1 for positive inputs).
- Swish and GELU: newer activation functions that also help mitigate vanishing gradients.
- Weight Initialization
- Initialize weights with a variance that keeps the output of each layer within a reasonable range (e.g., Xavier/Glorot or He initialization).
- Batch Normalization
- Normalize the inputs of each layer to maintain gradient flow, making the network less sensitive to weight initialization.
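To make the shrinkage concrete, here is a minimal numpy sketch (the layer count is an illustrative assumption, and weight factors are omitted for simplicity) showing a gradient collapsing toward zero as it is multiplied by sigmoid derivatives layer after layer:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # peaks at 0.25 when x = 0

# Simulate backpropagating a gradient of 1.0 through 30 layers: each layer
# multiplies the gradient by the local sigmoid derivative at a random
# pre-activation.
rng = np.random.default_rng(0)
grad = 1.0
for _ in range(30):
    grad *= sigmoid_derivative(rng.normal())

print(f"gradient after 30 layers: {grad:.2e}")  # on the order of 1e-20, effectively zero
```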
Exploding Gradient
- Occurs when gradients grow exponentially large as they are propagated backward through the layers. Symptoms include divergence, an oscillating loss, and numerical instability (overflow or NaN weights).
- How to fix
- Gradient Clipping: cap the gradient norm (or its individual values) at a threshold before each weight update (see the sketch after this list).
- Regularization
- L2: adding a penalty to large weights helps constrain the growth of weights during training
- Dropout: Randomly dropping units during training can prevent overfitting and control gradient growth.
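A minimal gradient-clipping sketch, assuming PyTorch is available; the model, toy data, and `max_norm` value are illustrative, not prescriptive:

```python
import torch
import torch.nn as nn

# Illustrative model and toy data; any module and loss would do.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

x = torch.randn(16, 10)
y = torch.randn(16, 1)

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()

# Rescale gradients so their global L2 norm is at most 1.0, preventing
# one huge gradient from blowing up the weights on this update.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```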
Explain backpropagation in detail
- Process of adjusting the weights of the network to minimize the error between the actual output and desired output.
- Calculates gradient of the loss function with respect to each weight by using the chain rule of calculus.
- Steps:
- Forward Pass: input data is passed through the network; each neuron computes a linear combination of its inputs, applies an activation function, and passes the result to the next layer
- Compute Loss: output from forward pass is compared to the true labels to compute loss using loss function (MSE, Cross Entropy Loss)
- Backward Pass: the gradient of the loss function with respect to each weight is computed; the chain rule is applied to propagate the error backward through the network, layer by layer, from the output layer to the input layer
- Weight Update: weights are updated using an optimization algorithm (e.g., gradient descent) to minimize the loss (see the sketch below)
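The four steps above in a minimal numpy sketch (layer sizes, sigmoid + MSE, and the learning rate are illustrative choices; biases are omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))               # 4 samples, 3 features
y = rng.normal(size=(4, 1))               # 4 targets

W1 = rng.normal(scale=0.5, size=(3, 5))   # input -> hidden
W2 = rng.normal(scale=0.5, size=(5, 1))   # hidden -> output

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# 1. Forward pass: linear combination + activation at each layer.
h = sigmoid(x @ W1)
y_hat = h @ W2                            # linear output layer

# 2. Compute loss (MSE).
loss = np.mean((y_hat - y) ** 2)

# 3. Backward pass: chain rule, from the output layer back to the input.
d_y_hat = 2 * (y_hat - y) / len(y)        # dLoss/dy_hat
d_W2 = h.T @ d_y_hat
d_h = d_y_hat @ W2.T
d_z1 = d_h * h * (1 - h)                  # sigmoid derivative
d_W1 = x.T @ d_z1

# 4. Weight update: one step of gradient descent.
lr = 0.1
W1 -= lr * d_W1
W2 -= lr * d_W2
print(f"loss: {loss:.4f}")
```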
What is a perceptron?
A perceptron is the simplest type of artificial neural network: a single layer of weights with a step activation function, which outputs 1 when the weighted sum of its inputs exceeds a threshold and 0 otherwise.
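A minimal sketch of a perceptron learning the AND function with the classic perceptron update rule (the learning rate and epoch count are illustrative):

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])  # AND truth table

w = np.zeros(2)
b = 0.0
lr = 0.1
step = lambda z: 1 if z > 0 else 0  # step activation

for _ in range(10):
    for xi, target in zip(X, y):
        error = target - step(xi @ w + b)
        # Update weights only when the prediction is wrong.
        w += lr * error * xi
        b += lr * error

print([step(xi @ w + b) for xi in X])  # expected: [0, 0, 0, 1]
```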
What is an activation function?
An activation function introduces non-linearity into a neural network, enabling it to model complex relationships. Examples: ReLU, sigmoid, and tanh
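The three examples, sketched in numpy for concreteness:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)        # max(0, x): zero for negatives

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # squashes into (0, 1)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))
print(sigmoid(x))
print(np.tanh(x))                    # squashes into (-1, 1)
```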
What is gradient descent in neural networks?
Gradient descent is an optimization algorithm that iteratively updates the network’s weights by moving them in the direction of the negative gradient of the loss function: w ← w − η∇L(w), where η is the learning rate.
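A minimal sketch on a single weight, minimizing the illustrative loss L(w) = (w − 3)²:

```python
# Gradient descent: w <- w - lr * dL/dw, repeated until convergence.
def grad(w):
    return 2.0 * (w - 3.0)  # derivative of (w - 3)^2

w = 0.0    # initial weight
lr = 0.1   # learning rate (eta)
for _ in range(50):
    w -= lr * grad(w)

print(f"w = {w:.4f}")  # approaches the minimum at w = 3
```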
What is a feedforward neural network?
A neural network where information moves in only one direction – from input nodes, through hidden nodes, to output nodes – without cycles.
What is a loss function in neural networks?
A loss function measures how well the neural network’s predictions match the actual data. Common examples include MSE for regression and cross-entropy for classification
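Minimal numpy sketches of both examples, with toy predictions:

```python
import numpy as np

# MSE for regression.
y_true = np.array([3.0, -0.5, 2.0])
y_pred = np.array([2.5, 0.0, 2.0])
mse = np.mean((y_true - y_pred) ** 2)

# Binary cross-entropy for classification (eps guards against log(0)).
labels = np.array([1, 0, 1])
probs = np.array([0.9, 0.2, 0.7])
eps = 1e-12
bce = -np.mean(labels * np.log(probs + eps)
               + (1 - labels) * np.log(1 - probs + eps))

print(f"MSE = {mse:.4f}, cross-entropy = {bce:.4f}")
```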
What is backpropagation?
It is an algorithm used for training neural networks, where the gradient of the loss function is calculated and propagated backward through the network to update weights.
What is ReLU?
Rectified Linear Unit (ReLU) is an activation function defined as f(x) = max(0, x). It introduces non-linearity without squashing positive inputs, so it largely avoids the vanishing gradients caused by sigmoid and tanh.
What is dropout in neural networks?
Dropout is a regularization technique in which randomly selected neurons are ignored during training, preventing overfitting by ensuring the network doesn’t rely too heavily on any one neuron (loosely analogous to pruning a tree).
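A minimal sketch of (inverted) dropout applied to one layer’s activations; the drop probability is illustrative:

```python
import numpy as np

def dropout(activations, p_drop=0.5, training=True):
    # Inference uses all units; dropout is applied only while training.
    if not training:
        return activations
    # Zero out units with probability p_drop, scaling survivors so the
    # expected activation stays the same (inverted dropout).
    mask = np.random.rand(*activations.shape) >= p_drop
    return activations * mask / (1.0 - p_drop)

h = np.ones((2, 8))  # toy layer activations
print(dropout(h))
```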
What is a convolutional neural network (CNN)?
A CNN is a type of deep neural network commonly used for image-processing tasks. It uses convolutional layers that slide learned filters over the input to extract local features such as edges and textures.
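A minimal sketch of one convolutional layer, assuming PyTorch; the channel counts and kernel size are illustrative:

```python
import torch
import torch.nn as nn

# 1 input channel (grayscale), 8 learned 3x3 filters; padding keeps the size.
conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1)

image = torch.randn(1, 1, 28, 28)  # (batch, channels, height, width)
features = conv(image)             # one feature map per filter
print(features.shape)              # torch.Size([1, 8, 28, 28])
```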
What is a recurrent neural network (RNN)?
RNNs are a type of neural network in which connections form a directed cycle, allowing a hidden state to carry information from one time step to the next. This makes them effective for sequential data like time series or language.
What is a long short-term memory (LSTM)?
A type of RNN that can learn long-term dependencies; gated memory cells (with input, forget, and output gates) control what information is stored, updated, or discarded over time.
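A minimal sketch of running a batch of sequences through an LSTM, assuming PyTorch; the sizes are illustrative:

```python
import torch
import torch.nn as nn

# 4 input features per time step, hidden/cell state of size 16.
lstm = nn.LSTM(input_size=4, hidden_size=16, batch_first=True)

x = torch.randn(2, 10, 4)      # (batch, sequence length, features)
output, (h_n, c_n) = lstm(x)   # h_n / c_n: final hidden and cell states
print(output.shape)            # torch.Size([2, 10, 16]): one output per step
```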