Deep Learning Fundamentals - Training Deep Neural Networks Flashcards
What is a labeled dataset?
A dataset where input data is paired with corresponding output data, enabling a model to learn patterns and make predictions.
Why is the quality and size of a labeled dataset important?
They directly influence the effectiveness of the resulting model.
Give an example of a labeled dataset.
A dataset of images labeled as either “cat” or “dog” to train a model to differentiate between the two.
What is gradient descent?
An optimization algorithm used to minimize a function, often a cost function, by iteratively adjusting model parameters.
How does the learning rate affect gradient descent?
A large learning rate may overshoot the minimum, while a small rate can result in slow convergence.
How does gradient descent refine a model’s parameters?
By iteratively adjusting weights to minimize the cost function.
How are weights updated in backpropagation?
Using the computed gradients and a predefined learning rate.
What is the purpose of testing a model after training?
To assess how well it performs on unseen data, simulating real-world scenarios.
What is a test dataset?
A dataset different from the training dataset, used to evaluate the model’s performance.
What is a neural network? What are weights, and what roles do the learning process and the loss function play?
A neural network is a computational model inspired by the human brain’s network of neurons. It consists of layers of interconnected nodes (neurons) that process data and can learn to perform tasks like classification, regression, and pattern recognition.
What are Weights?
Weights are numerical values that represent the strength of connections between neurons in different layers.
They determine how input data is transformed as it moves through the network.
Adjusting these weights is how the network learns from data.
The Learning Process
Objective:
The goal is to adjust the weights so that the neural network produces outputs that are as close as possible to the desired results.
Loss Function:
A loss function measures how well the neural network’s predictions match the actual targets.
The lower the loss, the better the network is performing.
What is Gradient Descent?
Gradient descent is an optimization algorithm used to minimize the loss function by iteratively adjusting the weights.
It calculates the gradient (slope) of the loss function with respect to each weight and moves the weights in the direction that decreases the loss.
Analogy: Imagine you’re hiking down a mountain in thick fog, and you want to reach the lowest point (the valley). Since you can’t see far ahead, you feel the slope of the ground under your feet and take small steps downhill. Over time, you’ll reach the bottom. This process is similar to gradient descent.
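The idea above can be sketched in a few lines of Python. This is a minimal illustration, not a real training setup: the function f(x) = (x − 3)², its gradient, the starting point, and the learning rate are all made up for the example.

```python
# Gradient descent on f(x) = (x - 3)^2, whose minimum is at x = 3.

def f_grad(x):
    # Derivative of (x - 3)^2 is 2 * (x - 3).
    return 2 * (x - 3)

x = 0.0              # arbitrary starting point
learning_rate = 0.1  # step size

for _ in range(100):
    # Step downhill: move opposite to the gradient (the "slope under your feet").
    x -= learning_rate * f_grad(x)

print(x)  # ends up very close to 3.0, the minimum
```

Each iteration is one small step downhill; after enough steps, x settles at the bottom of the curve.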
What is Learning Rate? How to choose the right learning rate?
The learning rate is a hyperparameter that determines the size of the steps we take to reach the minimum of the loss function.
It controls how much we adjust the weights during each iteration of gradient descent.
Choosing the Right Learning Rate:
Too High: If the learning rate is too high, we might overshoot the minimum, causing the loss to increase or even diverge.
Too Low: If it’s too low, the training process will be very slow, and it might get stuck in a local minimum.
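Both failure modes can be seen on a toy loss f(w) = w², whose gradient is 2w. The three learning rates below are invented purely to show divergence, slow progress, and healthy convergence from the same starting weight.

```python
# Compare learning rates on the toy loss f(w) = w^2 (gradient: 2w).
results = {}
for lr in (1.1, 0.01, 0.4):
    w = 1.0                 # same starting weight for each run
    for _ in range(20):     # 20 gradient descent steps
        w -= lr * 2 * w
    results[lr] = w

# lr = 1.1  -> |w| grows: each step overshoots past the minimum (diverges)
# lr = 0.01 -> w shrinks, but only a little after 20 steps (slow convergence)
# lr = 0.4  -> w is essentially 0: fast, stable convergence
print(results)
```

With lr = 1.1 each update multiplies w by (1 − 2.2) = −1.2, so the loss grows; with lr = 0.01 the multiplier is 0.98, so progress is slow; lr = 0.4 shrinks w by a factor of 5 per step.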
What is Backpropagation? How does it work?
Backpropagation is the algorithm used to calculate the gradients of the loss function with respect to each weight.
It works by applying the chain rule of calculus to propagate the error backward from the output layer to the input layer.
How Backpropagation Works:
Forward Pass: Input data is passed through the network to generate an output.
Compute Loss: The output is compared to the actual target to calculate the loss.
Backward Pass: The gradient of the loss is calculated with respect to each weight, starting from the output layer and moving backward.
Update Weights: Weights are adjusted using the gradients and the learning rate.
Analogy: Think of backpropagation as a way to find out how much each valve (weight) in our water pipe system contributed to the final output so we can adjust them accordingly.
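The four steps above can be traced by hand on a tiny two-layer network. This is an illustrative sketch with made-up numbers: the network is h = w1·x, prediction = w2·h, and the loss is the squared error.

```python
# Backpropagation through a tiny two-layer linear network, by hand.
x, y = 2.0, 10.0       # one training example (input, target) - invented values
w1, w2 = 1.0, 1.0      # initial weights
lr = 0.01              # learning rate

# 1. Forward pass
h = w1 * x                 # hidden value
y_pred = w2 * h            # network output

# 2. Compute loss
loss = (y_pred - y) ** 2   # squared error: (2 - 10)^2 = 64

# 3. Backward pass: apply the chain rule from the output back toward the input
d_loss_d_ypred = 2 * (y_pred - y)   # dLoss/dy_pred
d_loss_d_w2 = d_loss_d_ypred * h    # gradient for the output-layer weight
d_loss_d_h = d_loss_d_ypred * w2    # propagate the error to the hidden layer
d_loss_d_w1 = d_loss_d_h * x        # gradient for the input-layer weight

# 4. Update weights using the gradients and the learning rate
w2 -= lr * d_loss_d_w2
w1 -= lr * d_loss_d_w1
```

Note how the gradient for w1 reuses the error already propagated to the hidden layer; that reuse is what makes backpropagation efficient in deep networks.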
How does the training process work?
Training Process:
Initialization: Start with random weights.
Forward Pass: Compute the output of the network using the current weights.
Compute Loss: Measure how far the network’s prediction is from the actual target.
Backward Pass (Backpropagation): Compute the gradients of the loss with respect to each weight.
Update Weights (Gradient Descent): Adjust the weights in the opposite direction of the gradients.
Repeat: Continue this process for many iterations (epochs) until the loss is minimized.
Give an example of how a simple neural network for predicting house prices works.
Suppose we have a dataset with one input feature—the size of the house (in square meters)—and the target is the price of the house.
Step-by-Step Training:
Initialize Weights:
Start with a random weight w.
Forward Pass:
Predict the price using the current weight: PredictedPrice = w × HouseSize.
Compute Loss:
Use Mean Squared Error (MSE): Loss = (PredictedPrice − ActualPrice)².
Backward Pass (Compute Gradient):
Calculate the gradient of the loss with respect to the weight: dLoss/dw = 2 × (PredictedPrice − ActualPrice) × HouseSize.
Update Weight:
Adjust the weight: w_new = w_old − LearningRate × dLoss/dw.
Iterate:
Repeat steps 2–5 for all data points and for several epochs.
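The steps above can be written as a short training loop. This is a sketch: the dataset (where price happens to be exactly 3 × size) and the learning rate are invented for illustration, and a real model would also start from a random weight rather than zero.

```python
# Training loop for the one-weight house-price model.
data = [(50.0, 150.0), (80.0, 240.0), (100.0, 300.0)]  # (size in m^2, price)

w = 0.0                    # step 1: initialize the weight
learning_rate = 0.00001

for epoch in range(200):                        # step 6: iterate over epochs
    for size, price in data:
        predicted = w * size                    # step 2: forward pass
        loss = (predicted - price) ** 2         # step 3: compute loss (MSE)
        grad = 2 * (predicted - price) * size   # step 4: backward pass
        w -= learning_rate * grad               # step 5: update the weight

print(w)  # converges to about 3.0, the true price per square meter
```

Because the data is perfectly linear, the weight settles at the exact slope; with noisy real data it would settle near the best-fitting value instead.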
Visualizing the Process:
Plotting the Loss Function: If you plot the loss function against the weight, it might look like a U-shaped curve.
Gradient Descent Path: Starting from the initial weight, gradient descent moves the weight down the slope of the loss function to find the minimum point.