Gradient descent Flashcards

1
Q

Q: What is gradient descent used for in machine learning?

A

A: To minimize a cost function by iteratively adjusting parameters to reduce errors.

2
Q

Q: Why is gradient descent important for linear regression and deep learning models?

A

A: It helps find the optimal parameter values that minimize the cost function, crucial for model accuracy.

3
Q

Q: What is the initial step in the gradient descent algorithm?

A

A: Start with initial guesses for the parameters, often setting them to zero.

4
Q

Q: How does gradient descent adjust the parameters during training?

A

A: By taking iterative steps proportional to the negative gradient (steepest descent) of the cost function.

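A minimal Python sketch of the loop cards 3 and 4 describe: parameters start at zero, and each iteration (card 9) steps them along the negative gradient of the cost. The grad argument, learning rate, and step count here are illustrative assumptions.

import numpy as np

def gradient_descent(grad, num_params, alpha=0.01, num_iters=1000):
    # Card 3: start from an initial guess, here all zeros.
    params = np.zeros(num_params)
    for _ in range(num_iters):
        # Cards 4 and 9: one iteration = one step proportional to the
        # negative gradient, i.e. the direction of steepest descent.
        params = params - alpha * grad(params)
    return params
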
5
Q

Q: What shape does the cost function have for linear regression with the squared error?

A

A: A convex bowl (or hammock) shape, which has a single global minimum and no other local minima.

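The squared-error cost this card refers to can be written directly; a small sketch (the 1/(2m) scaling is the usual convention, and the variable names are illustrative):

import numpy as np

def squared_error_cost(w, b, x, y):
    # J(w, b) = (1 / (2m)) * sum((w * x_i + b - y_i) ** 2)
    # This J is convex in (w, b): its surface is the bowl/hammock shape,
    # so it has one global minimum and no other local minima.
    m = len(x)
    error = w * x + b - y
    return np.sum(error ** 2) / (2 * m)
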
6
Q

Q: What are local minima in the context of gradient descent?

A

A: Points where the cost function is locally minimized, but may not be the absolute lowest point (global minimum).

7
Q

Q: How does the starting point affect the outcome of gradient descent?

A

A: Different starting points can lead to different local minima because gradient descent takes the path of steepest descent from the initial point.

8
Q

Q: What is the main goal of gradient descent?

A

A: To reduce the cost function to its minimum value by adjusting the model parameters.

9
Q

Q: What does the term “iteration” refer to in gradient descent?

A

A: A single update of the model parameters based on the gradient of the cost function.

10
Q

Q: What is the direction of steepest descent in gradient descent?

A

A: The direction in which taking a small step reduces the cost function the most.

11
Q

Q: How does the cost function J change as gradient descent is performed?

A

A: It decreases as the parameters are adjusted to better fit the training data.

12
Q

Q: What is a common issue that gradient descent can face with certain cost functions?

A

A: Getting stuck in local minima instead of finding the global minimum.

13
Q

Q: What is the benefit of minimizing the cost function using gradient descent?

A

A: It improves the model’s predictions by finding the optimal set of parameters that best fit the training data.

14
Q

Q: What does the symbol α represent in gradient descent?

A

A: The learning rate, which controls the size of the steps taken during the optimization process.

15
Q

Q: Why is α important in gradient descent?

A

A: It determines how large a step you take toward minimizing the cost function.

16
Q

Q: What does d/dw J(w,b) represent in the gradient descent update rule?

A

A: The derivative (or gradient) of the cost function with respect to w.
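
For the squared-error cost above, d/dw J(w,b) and d/db J(w,b) have simple closed forms; a sketch using the same illustrative variable names:

import numpy as np

def gradients(w, b, x, y):
    # dJ/dw = (1/m) * sum((w*x_i + b - y_i) * x_i)
    # dJ/db = (1/m) * sum( w*x_i + b - y_i )
    m = len(x)
    error = w * x + b - y
    dj_dw = np.sum(error * x) / m
    dj_db = np.sum(error) / m
    return dj_dw, dj_db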

17
Q

Q: What is the purpose of the derivative in gradient descent?

A

A: It indicates the direction in which the cost function increases; thus, taking a step in the negative direction reduces the cost.

18
Q

Q: What is a simultaneous update in gradient descent?

A

A: Updating all parameters (e.g., w and b) at the same time before applying the new values.

19
Q

Q: How do you perform a simultaneous update for parameters w and b?

A

A: Calculate new temporary values for both w and b using the update rules, then assign these temporary values to w and b simultaneously.
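
A self-contained sketch of one simultaneous update as cards 18 and 19 describe it (the toy data, starting values, and learning rate are illustrative):

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
w, b, alpha = 0.0, 0.0, 0.1

# Both gradients are computed from the *current* w and b ...
error = w * x + b - y
tmp_w = w - alpha * np.mean(error * x)   # uses dJ/dw
tmp_b = b - alpha * np.mean(error)       # uses dJ/db
# ... and only then are both parameters assigned, together.
w, b = tmp_w, tmp_b

# Non-simultaneous (incorrect, card 20): updating w first and then recomputing
# the error for b would mix old and new values, giving a different algorithm.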

20
Q

Q: What happens if you incorrectly implement non-simultaneous updates in gradient descent?

A

A: You may not correctly minimize the cost function and could end up with a different algorithm with different properties.

21
Q

Q: How does the learning rate α affect the gradient descent process?

A

A: If it’s too high, the steps might be too large, causing divergence. If it’s too low, the steps might be too small, causing slow convergence.
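
This effect is easy to see on the one-dimensional cost J(w) = w**2, whose gradient is 2w; a sketch with illustrative learning rates and starting point:

def run(alpha, w=10.0, steps=20):
    # Gradient descent on J(w) = w**2, with dJ/dw = 2*w.
    for _ in range(steps):
        w = w - alpha * 2 * w
    return w

print(run(alpha=0.01))  # too small: after 20 steps w is still far from 0
print(run(alpha=0.1))   # moderate: w gets close to the minimum at 0
print(run(alpha=1.1))   # too large: each step overshoots and w diverges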

22
Q

Q: Why is it critical to perform simultaneous updates in gradient descent?

A

A: To ensure both updates are computed from the current values of w and b before either parameter is changed, so the algorithm minimizes the cost function as intended.