Neural Networks Flashcards

1
Q

What is deep learning, and how does it contrast with other machine learning algorithms?

A
  • Deep learning is a subset of ML concerned with neural networks: it uses backpropagation and certain principles loosely inspired by neuroscience to model large sets of unlabelled or semi-structured data more accurately. In that sense, deep learning can act as an unsupervised representation-learning approach that learns features of the data through neural nets, whereas most traditional ML algorithms rely on hand-engineered features and labelled data.
2
Q

Vanishing Gradient

A
  • Happens when training deep NNs, particularly those with many layers, such as recurrent neural networks and deep feedforward networks. It is characterized by the gradients of the loss function with respect to the weights becoming very small, approaching zero, as they are propagated back through the layers during backpropagation. This results in slow or stalled learning because the weights are updated only minimally.
  • What causes them?
    • Activation Functions
      • Sigmoid and hyperbolic tangent (tanh) squash their inputs into a small range (between 0 and 1 for sigmoid, and -1 and 1 for tanh), so their derivatives are small. When these small gradients are multiplied across many layers during backpropagation, they become exponentially smaller.
    • Deep Networks
      • Many Layers: in networks with many layers, gradients must be backpropagated through each layer. If the gradients are small at any layer, they continue to diminish as they are propagated backward.
    • Weight Initialization
      • Poor initialization can exacerbate the vanishing gradient problem. If weights are initialized too small, the output of each layer and the corresponding gradients will also be small.
  • Effects:
    • Slow Convergence: training takes a long time to converge, and may never converge.
    • Poor Performance
    • Difficulty in Training Deep Networks.
  • Solutions (see the code sketch after this list):
    • Activation Functions
      • ReLU and Leaky ReLU do not suffer from this problem as much because the gradient is not squashed to small values.
      • Swish and GELU: newer activation functions that also help mitigate vanishing gradients.
    • Weight Initialization
      • Initialize weights with a variance that keeps the output of each layer within a reasonable range.
    • Batch Normalization
      • Normalize inputs of each layer to maintain gradient flow, making network less sensitive to weight initialization.
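A minimal sketch of these mitigations, assuming PyTorch is available (the layer width, depth, and batch size are illustrative assumptions):

```python
import torch
import torch.nn as nn

# A deep feedforward block combining the mitigations above:
# ReLU activations, He (Kaiming) weight initialization, and batch normalization.
class DeepBlock(nn.Module):
    def __init__(self, dim: int = 128, depth: int = 10):
        super().__init__()
        layers = []
        for _ in range(depth):
            linear = nn.Linear(dim, dim)
            # He initialization keeps activation/gradient variance roughly
            # constant across layers when ReLU is used.
            nn.init.kaiming_normal_(linear.weight, nonlinearity="relu")
            nn.init.zeros_(linear.bias)
            layers += [linear, nn.BatchNorm1d(dim), nn.ReLU()]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

x = torch.randn(32, 128)   # batch of 32 illustrative inputs
out = DeepBlock()(x)       # gradients can flow through all 10 layers without collapsing
```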
3
Q

Exploding Gradient

A
  • The opposite of the vanishing gradient problem: the gradients of the loss with respect to the weights grow very large as they are propagated backward through the network, causing divergence, an oscillating loss, and numerical instability (e.g. NaN values).
  • How to fix (see the sketch after this list):
    • Gradient Clipping
    • Regularization
      • L2: adding a penalty to large weights helps constrain the growth of weights during training
      • Dropout: Randomly dropping units during training can prevent overfitting and control gradient growth.
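A hedged sketch of gradient clipping (plus L2 via weight decay) in PyTorch; the stand-in model, data, and the clipping threshold of 1.0 are illustrative assumptions:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)                                 # stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            weight_decay=1e-4)           # weight_decay = L2 penalty
x, y = torch.randn(16, 10), torch.randn(16, 1)

loss = nn.functional.mse_loss(model(x), y)
loss.backward()
# Rescale gradients so their global norm does not exceed 1.0,
# preventing a single huge (exploding) update.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```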
4
Q

Explain backpropagation in detail

A
  • Process of adjusting the weights of the network to minimize the error between the actual output and desired output.
  • Calculates gradient of the loss function with respect to each weight by using the chain rule of calculus.
  • Steps:
  • Forward Pass: input data is passed through the network; each neuron performs a linear combination of its inputs, applies an activation function, and passes the result to the next layer.
  • Compute Loss: the output of the forward pass is compared to the true labels to compute the loss using a loss function (e.g. MSE, cross-entropy loss).
  • Backward Pass: the gradient of the loss function with respect to each weight is computed, and the chain rule is applied to propagate the error backward through the network, layer by layer, from the output layer to the input layer.
  • Weight Update: weights are updated using an optimization algorithm (e.g. gradient descent) to minimize the loss. (See the sketch below.)
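A concrete sketch of the four steps using plain NumPy on a tiny two-layer network; the sizes, data, and learning rate are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny 2-layer network: linear -> ReLU -> linear, trained with MSE.
x = rng.normal(size=(16, 3))                 # illustrative inputs
y = rng.normal(size=(16, 1))                 # illustrative targets
W1, b1 = rng.normal(size=(3, 8)) * 0.1, np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)) * 0.1, np.zeros(1)
lr = 0.05

for step in range(100):
    # ---- forward pass: linear combination + activation at each layer ----
    z1 = x @ W1 + b1
    a1 = np.maximum(z1, 0)                   # ReLU
    y_hat = a1 @ W2 + b2

    # ---- compute loss (mean squared error) ----
    loss = np.mean((y_hat - y) ** 2)

    # ---- backward pass: chain rule, layer by layer ----
    d_yhat = 2 * (y_hat - y) / len(x)        # dL/dy_hat
    dW2, db2 = a1.T @ d_yhat, d_yhat.sum(axis=0)
    d_a1 = d_yhat @ W2.T                     # propagate the error to the hidden layer
    d_z1 = d_a1 * (z1 > 0)                   # ReLU derivative
    dW1, db1 = x.T @ d_z1, d_z1.sum(axis=0)

    # ---- weight update (plain gradient descent) ----
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```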
5
Q

What is a perceptron?

A

A perceptron is the simplest type of artificial neural network: a single layer of weights followed by a step activation function, acting as a binary (linear) classifier.
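A quick illustrative sketch (the weights are hand-picked to implement a logical AND):

```python
import numpy as np

def perceptron_predict(x, w, b):
    # Single layer of weights followed by a step activation:
    # output 1 if w.x + b > 0, otherwise 0.
    return int(np.dot(w, x) + b > 0)

w, b = np.array([1.0, 1.0]), -1.5    # hand-picked weights for AND
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron_predict(np.array(x), w, b))   # -> 0, 0, 0, 1
```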

6
Q

What is an activation function?

A

An activation function introduces non-linearity to a neural network, enabling it to solve complex problems. Examples: ReLU, sigmoid, and tanh.
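A small sketch of the three example functions (the input values are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))   # squashes inputs into (0, 1)

def tanh(x):
    return np.tanh(x)             # squashes inputs into (-1, 1)

def relu(x):
    return np.maximum(x, 0)       # 0 for negative inputs, identity for positive

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x), tanh(x), relu(x))
```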

7
Q

What is gradient descent in neural networks?

A

Gradient descent is an optimization algorithm that iteratively updates the network’s weights in the direction of the negative gradient of the loss function, in order to minimize it.
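A minimal one-parameter sketch (the loss function, starting point, and learning rate are illustrative):

```python
# Gradient descent on a single weight minimizing f(w) = (w - 3)^2.
w, lr = 0.0, 0.1
for step in range(50):
    grad = 2 * (w - 3)   # derivative of the loss with respect to w
    w -= lr * grad       # step in the direction of the negative gradient
print(round(w, 3))       # converges toward the minimum at w = 3
```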

8
Q

What is a feedforward neural network?

A

A neural network where information moves in only one direction – from input nodes, through hidden nodes, to output nodes – without cycles.

9
Q

What is a loss function in neural networks?

A

A loss function measures how well the neural network’s predictions match the actual data. Common examples include MSE for regression and cross-entropy for classification
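A small illustrative sketch of both losses in NumPy (the values are made up):

```python
import numpy as np

# Mean squared error for regression.
y_true, y_pred = np.array([1.0, 2.0, 3.0]), np.array([1.1, 1.8, 3.3])
mse = np.mean((y_true - y_pred) ** 2)

# Cross-entropy for classification: one-hot target vs. predicted probabilities.
t = np.array([0, 0, 1])                 # true class is the third one
p = np.array([0.1, 0.2, 0.7])           # predicted probabilities
cross_entropy = -np.sum(t * np.log(p))  # = -log(0.7)

print(mse, cross_entropy)
```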

10
Q

What is backpropagation?

A

It is an algorithm used for training neural networks, where the gradient of the loss function is calculated and propagated backward through the network to update weights.

11
Q

What is ReLU?

A

Rectified Linear Unit (ReLU) is an activation function defined as f(x) = max(0, x). It introduces non-linearity without saturating, so it does not cause vanishing gradients the way sigmoid and tanh can.

12
Q

What is dropout in neural networks?

A

Dropout is a regularization technique where random neurons are ignored during training, preventing overfitting by ensuring the network doesn’t rely too heavily on any one neuron (similar in spirit to pruning a tree).
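A quick PyTorch sketch (the dropout rate of 0.5 is an illustrative assumption):

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)   # each unit is zeroed with probability 0.5
x = torch.ones(1, 8)

drop.train()               # training mode: random units dropped, survivors scaled by 1/(1-p)
print(drop(x))

drop.eval()                # evaluation mode: dropout disabled, input passes through unchanged
print(drop(x))
```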

13
Q

What is a convolutional neural network (CNN)?

A

A CNN is a type of deep neural network commonly used in image-processing tasks. It uses convolutional layers to extract features from input images.
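A minimal PyTorch sketch of a CNN for, say, 28x28 single-channel images (all sizes and the class count are illustrative assumptions):

```python
import torch
import torch.nn as nn

# Convolutional layers extract local features; a fully connected head classifies.
cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # 1 input channel -> 16 feature maps
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 28x28 -> 14x14
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),                   # 10 output classes
)

logits = cnn(torch.randn(4, 1, 28, 28))          # batch of 4 fake 28x28 images
print(logits.shape)                              # torch.Size([4, 10])
```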

14
Q

What is a recurrent neural network (RNN)?

A

RNNs are a type of neural network in which connections form a directed cycle, making them effective for sequential data like time series or language.
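A minimal PyTorch sketch (the batch size, sequence length, and feature sizes are illustrative):

```python
import torch
import torch.nn as nn

# An RNN processes a sequence step by step, carrying a hidden state forward.
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(4, 20, 8)        # batch of 4 sequences, 20 time steps, 8 features each
outputs, h_n = rnn(x)            # outputs: hidden state at every step; h_n: final hidden state
print(outputs.shape, h_n.shape)  # torch.Size([4, 20, 16]) torch.Size([1, 4, 16])
```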

15
Q

What is a long short-term memory (LSTM)?

A

Type of RNN that can learn long-term dependencies using memory cells to store information over time.

16
Q

What is a fully connected layer?

A

A layer in which each neuron is connected to every neuron in the previous and the next layer, often used at the end of a network for final classification.

17
Q

What is batch normalization

A

A technique that normalizes the inputs to each layer in a network, speeding up training and improving performance by reducing internal covariate shift.

  • Internal covariate shift: a phenomenon that occurs in neural networks when the distribution of inputs to a layer changes during training. This happens because the network’s parameters are updated, which changes the distribution of inputs to subsequent layers: in deep networks, the output of each layer feeds into the next, so when the parameters of one layer change, the distribution of inputs to the next layer changes as well.
18
Q

What is weight initialization?

A

Setting initial weights for a neural network before training. Poor initialization can lead to slow convergence or model failure. Techniques include Xavier and He initialization.
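A short PyTorch sketch of both techniques (the layer size is an illustrative assumption):

```python
import torch.nn as nn

layer = nn.Linear(256, 256)

# Xavier (Glorot) initialization: suited to sigmoid/tanh activations.
nn.init.xavier_uniform_(layer.weight)

# He (Kaiming) initialization: suited to ReLU activations.
nn.init.kaiming_normal_(layer.weight, nonlinearity="relu")

nn.init.zeros_(layer.bias)
```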

19
Q

What is learning rate?

A

A hyperparameter that controls the step size during gradient descent. Too large a rate can cause overshooting, while too small a rate can cause slow convergence

20
Q

What is a pooling layer in CNNs?

A

A layer used to down-sample the spatial dimensions (width, height) of the input, reducing the number of parameters and computation in the network.

Say a picture is very high-resolution and has 6 shades of pink in one corner; the pooling layer can merge those 6 values into 1 representative value, simplifying the model.

Examples: Max Pooling, Average Pooling.
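A tiny PyTorch sketch on a hand-made 4x4 input (the values are illustrative):

```python
import torch
import torch.nn as nn

x = torch.tensor([[[[1., 2., 5., 6.],
                    [3., 4., 7., 8.],
                    [0., 1., 2., 3.],
                    [1., 2., 4., 5.]]]])   # one 4x4 single-channel "image"

print(nn.MaxPool2d(2)(x))   # keeps the maximum of each 2x2 patch -> 2x2 output
print(nn.AvgPool2d(2)(x))   # keeps the average of each 2x2 patch -> 2x2 output
```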

21
Q

What is transfer learning?

A

A technique where a pre-trained model (usually on a large dataset) is fine-tuned for a specific task on a smaller dataset, improving performance and reducing training time.
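A hedged sketch using a torchvision model as the pre-trained network (the choice of ResNet-18, the 5-class head, and a recent torchvision weights API are assumptions):

```python
import torch.nn as nn
from torchvision import models

# Load a network pre-trained on ImageNet.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pre-trained feature extractor.
for param in model.parameters():
    param.requires_grad = False

# Replace the final classification head for the new task (assumed 5 classes);
# only this layer is then trained (fine-tuned) on the smaller dataset.
model.fc = nn.Linear(model.fc.in_features, 5)
```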

22
Q

What is a softmax function?

A

A function that converts logits (raw model predictions) into probabilities for multi-class classification, ensuring they sum to 1.

Sigmoid is used for two class classification.

A, B, C, D -> model -> logits -> softmax -> 10% A, 20% B, 60% C, 10% D (the cross-entropy loss is then computed from these probabilities).
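A small NumPy sketch (the logit values are illustrative):

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability, exponentiate, normalize to sum to 1.
    z = np.exp(logits - np.max(logits))
    return z / z.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))   # ~[0.66, 0.24, 0.10], sums to 1
```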

23
Q

Adam Optimizer

A

Adam (Adaptive Moment Estimation) is an optimization algorithm used to train deep neural networks. It improves on standard Stochastic Gradient Descent (SGD), is often treated as a default choice for deep learning, and is known for fast convergence and robustness across problems.
Adam works by adjusting the effective step size for each parameter based on that parameter’s gradient history, which helps the network learn more efficiently. It keeps track of gradients from previous steps, but it doesn’t simply average them: it maintains exponentially decaying averages of both the gradients (first moment) and the squared gradients (second moment), giving more weight to recent information.
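A minimal PyTorch usage sketch (the stand-in model, data, and the learning rate of 1e-3 are illustrative assumptions):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)                                   # stand-in model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # per-parameter adaptive steps

x, y = torch.randn(32, 10), torch.randn(32, 1)
for step in range(100):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()       # Adam tracks decaying averages of gradients (momentum)
    optimizer.step()      # and squared gradients (per-parameter scaling)
```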