Gradient Descent Flashcards
Q: What is gradient descent used for in machine learning?
A: To minimize a cost function by iteratively adjusting parameters to reduce errors.
Q: Why is gradient descent important for linear regression and deep learning models?
A: It finds the parameter values that minimize the cost function, which is crucial for model accuracy.
Q: What is the initial step in the gradient descent algorithm?
A: Start with initial guesses for the parameters, often setting them to zero.
Q: How does gradient descent adjust the parameters during training?
A: By taking iterative steps proportional to the negative gradient (steepest descent) of the cost function.
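A minimal sketch of one such update step in Python, assuming a single-feature linear model f(x) = w*x + b with the squared-error cost (the names w, b, and lr are illustrative):

```python
import numpy as np

def gradient_step(w, b, x, y, lr):
    """One gradient descent update for f(x) = w*x + b under the
    squared-error cost."""
    m = len(x)
    pred = w * x + b                 # current predictions
    dw = np.sum((pred - y) * x) / m  # dJ/dw
    db = np.sum(pred - y) / m        # dJ/db
    # Move against the gradient: the direction of steepest descent
    return w - lr * dw, b - lr * db
```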
Q: What shape does the cost function have for linear regression with the squared error?
A: A convex bowl (sometimes described as a hammock) shape, with a single global minimum and no separate local minima.
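For reference, this bowl comes from the standard squared-error cost over m training examples (x^(i), y^(i)):

```latex
J(w, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)^2,
\qquad f_{w,b}(x) = wx + b
```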
Q: What are local minima in the context of gradient descent?
A: Points where the cost function is lower than at all nearby points but not necessarily at its absolute lowest (the global minimum).
Q: How does the starting point affect the outcome of gradient descent?
A: Different starting points can lead to different local minima because gradient descent takes the path of steepest descent from the initial point.
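A small illustration of this sensitivity, using a made-up non-convex function f(x) = x^4 - 3x^2 + x (note that the convex linear-regression cost above has no such problem):

```python
def descend(x, lr=0.01, steps=500):
    """Minimize f(x) = x**4 - 3*x**2 + x by gradient descent."""
    for _ in range(steps):
        x -= lr * (4 * x**3 - 6 * x + 1)  # step against f'(x)
    return x

print(descend(-2.0))  # ends near x = -1.30, the global minimum
print(descend(2.0))   # ends near x =  1.13, a higher local minimum
```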
Q: What is the main goal of gradient descent?
A: To reduce the cost function to its minimum value by adjusting the model parameters.
Q: What does the term “iteration” refer to in gradient descent?
A: A single update of the model parameters based on the gradient of the cost function.
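A sketch of the full iterative loop, reusing the gradient_step helper above (the toy data and iteration count are arbitrary):

```python
# Toy data lying roughly on y = 2x + 1 (illustrative values)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])

w, b = 0.0, 0.0          # start from zero-initialized parameters
for i in range(1000):    # each pass of this loop is one iteration
    w, b = gradient_step(w, b, x, y, lr=0.05)

print(w, b)  # approaches roughly w = 2, b = 1
```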
Q: What is the direction of steepest descent in gradient descent?
A: The direction in which taking a small step reduces the cost function the most.
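In symbols, each parameter θ_j is nudged against the gradient, with all parameters updated simultaneously (the standard update rule):

```latex
\theta_j := \theta_j - \alpha \, \frac{\partial}{\partial \theta_j} J(\theta)
```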
Q: How does the cost function J change as gradient descent is performed?
A: It decreases as the parameters are adjusted to better fit the training data.
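A tiny helper for monitoring this decrease, continuing the sketch above and matching the cost definition given earlier (one would typically print it every few iterations and confirm it keeps falling):

```python
def cost(w, b, x, y):
    """Squared-error cost J(w, b), averaged over the training set."""
    return np.mean((w * x + b - y) ** 2) / 2
```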
Q: What is a common issue that gradient descent can face with certain cost functions?
A: Getting stuck in local minima instead of finding the global minimum.
Q: What is the benefit of minimizing the cost function using gradient descent?
A: It improves the model’s predictions by finding the optimal set of parameters that best fit the training data.
Q: What does the symbol α represent in gradient descent?
A: The learning rate, which controls the size of the steps taken during the optimization process.
Q: Why is α important in gradient descent?
A: It determines the step size: too small an α makes convergence slow, while too large an α can overshoot the minimum and cause divergence.
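A small sketch of how the choice of α plays out on the toy cost J(w) = w^2, whose gradient is 2w (the three α values are arbitrary illustrations):

```python
def minimize(alpha, w=10.0, steps=20):
    """Run gradient descent on J(w) = w**2."""
    for _ in range(steps):
        w -= alpha * 2 * w  # gradient of w**2 is 2*w
    return w

print(minimize(0.01))  # too small: slow, still far from 0
print(minimize(0.4))   # well chosen: converges rapidly to ~0
print(minimize(1.1))   # too large: overshoots and diverges
```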