Gradient descent Flashcards

1
Q

Q: What is gradient descent used for in machine learning?

A

A: To minimize a cost function by iteratively adjusting parameters to reduce errors.

2
Q

Q: Why is gradient descent important for linear regression and deep learning models?

A

A: It helps find the optimal parameter values that minimize the cost function, crucial for model accuracy.

3
Q

Q: What is the initial step in the gradient descent algorithm?

A

A: Start with initial guesses for the parameters, often setting them to zero.

4
Q

Q: How does gradient descent adjust the parameters during training?

A

A: By taking iterative steps proportional to the negative gradient (steepest descent) of the cost function.

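A minimal Python sketch of the loop cards 3 and 4 describe: parameters start at zero, and each iteration (card 9) steps them along the negative gradient of the cost. The grad argument, learning rate, and step count here are illustrative assumptions.

import numpy as np

def gradient_descent(grad, num_params, alpha=0.01, num_iters=1000):
    # Card 3: start from an initial guess, here all zeros.
    params = np.zeros(num_params)
    for _ in range(num_iters):
        # Cards 4 and 9: one iteration = one step proportional to the
        # negative gradient, i.e. the direction of steepest descent.
        params = params - alpha * grad(params)
    return params
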
5
Q

Q: What shape does the cost function have for linear regression with the squared error?

A

A: A convex bowl (or hammock) shape, which has a single global minimum and no other local minima.

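The squared-error cost this card refers to can be written directly; a small sketch (the 1/(2m) scaling is the usual convention, and the variable names are illustrative):

import numpy as np

def squared_error_cost(w, b, x, y):
    # J(w, b) = (1 / (2m)) * sum((w * x_i + b - y_i) ** 2)
    # This J is convex in (w, b): its surface is the bowl/hammock shape,
    # so it has one global minimum and no other local minima.
    m = len(x)
    error = w * x + b - y
    return np.sum(error ** 2) / (2 * m)
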
6
Q

Q: What are local minima in the context of gradient descent?

A

A: Points where the cost function is locally minimized, but may not be the absolute lowest point (global minimum).

7
Q

Q: How does the starting point affect the outcome of gradient descent?

A

A: Different starting points can lead to different local minima because gradient descent takes the path of steepest descent from the initial point.

8
Q

Q: What is the main goal of gradient descent?

A

A: To reduce the cost function to its minimum value by adjusting the model parameters.

9
Q

Q: What does the term “iteration” refer to in gradient descent?

A

A: A single update of the model parameters based on the gradient of the cost function.

10
Q

Q: What is the direction of steepest descent in gradient descent?

A

A: The direction in which taking a small step reduces the cost function the most.

11
Q

Q: How does the cost function J change as gradient descent is performed?

A

A: It decreases as the parameters are adjusted to better fit the training data.

12
Q

Q: What is a common issue that gradient descent can face with certain cost functions?

A

A: Getting stuck in local minima instead of finding the global minimum.

13
Q

Q: What is the benefit of minimizing the cost function using gradient descent?

A

A: It improves the model’s predictions by finding the optimal set of parameters that best fit the training data.

14
Q

Q: What does the symbol α represent in gradient descent?

A

A: The learning rate, which controls the size of the steps taken during the optimization process.

15
Q

Q: Why is α important in gradient descent?

A

A: It determines how large a step you take toward minimizing the cost function.

16
Q

Q: What does d/dw J(w,b) represent in the gradient descent update rule?

A

A: The derivative (or gradient) of the cost function with respect to w.
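
For the squared-error cost above, d/dw J(w,b) and d/db J(w,b) have simple closed forms; a sketch using the same illustrative variable names:

import numpy as np

def gradients(w, b, x, y):
    # dJ/dw = (1/m) * sum((w*x_i + b - y_i) * x_i)
    # dJ/db = (1/m) * sum( w*x_i + b - y_i )
    m = len(x)
    error = w * x + b - y
    dj_dw = np.sum(error * x) / m
    dj_db = np.sum(error) / m
    return dj_dw, dj_db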

17
Q

Q: What is the purpose of the derivative in gradient descent?

A

A: It indicates the direction in which the cost function increases; thus, taking a step in the negative direction reduces the cost.

18
Q

Q: What is a simultaneous update in gradient descent?

A

A: Updating all parameters (e.g., w and b) at the same time before applying the new values.

19
Q

Q: How do you perform a simultaneous update for parameters w and b?

A

A: Calculate new temporary values for both w and b using the update rules, then assign these temporary values to w and b simultaneously.
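
A self-contained sketch of one simultaneous update as cards 18 and 19 describe it (the toy data, starting values, and learning rate are illustrative):

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
w, b, alpha = 0.0, 0.0, 0.1

# Both gradients are computed from the *current* w and b ...
error = w * x + b - y
tmp_w = w - alpha * np.mean(error * x)   # uses dJ/dw
tmp_b = b - alpha * np.mean(error)       # uses dJ/db
# ... and only then are both parameters assigned, together.
w, b = tmp_w, tmp_b

# Non-simultaneous (incorrect, card 20): updating w first and then recomputing
# the error for b would mix old and new values, giving a different algorithm.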

20
Q

Q: What happens if you incorrectly implement non-simultaneous updates in gradient descent?

A

A: You may not correctly minimize the cost function and could end up with a different algorithm with different properties.

21
Q

Q: How does the learning rate α affect the gradient descent process?

A

A: If it’s too high, the steps might be too large, causing divergence. If it’s too low, the steps might be too small, causing slow convergence.
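
This effect is easy to see on the one-dimensional cost J(w) = w**2, whose gradient is 2w; a sketch with illustrative learning rates and starting point:

def run(alpha, w=10.0, steps=20):
    # Gradient descent on J(w) = w**2, with dJ/dw = 2*w.
    for _ in range(steps):
        w = w - alpha * 2 * w
    return w

print(run(alpha=0.01))  # too small: after 20 steps w is still far from 0
print(run(alpha=0.1))   # moderate: w gets close to the minimum at 0
print(run(alpha=1.1))   # too large: each step overshoots and w diverges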

22
Q

Q: Why is it critical to perform simultaneous updates in gradient descent?

A

A: To ensure both updates are computed from the current values of w and b before either parameter is changed, so the algorithm minimizes the cost function as intended.