Learning rate Flashcards

1
Q: How does the choice of the learning rate α affect gradient descent?

A: It has a huge impact on the efficiency of gradient descent; if α is chosen poorly, the algorithm may converge very slowly or not converge at all.

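To make the update rule concrete, here is a minimal sketch (not part of the original deck) of gradient descent on an assumed toy cost J(w) = (w − 3)²; the function names and numbers are illustrative only.

```python
# Assumed toy cost: J(w) = (w - 3)**2, so dJ/dw = 2 * (w - 3), minimum at w = 3.
def dJ_dw(w):
    return 2 * (w - 3)

def gradient_descent(w_init, alpha, steps=50):
    w = w_init
    for _ in range(steps):
        w = w - alpha * dJ_dw(w)   # update rule: alpha scales every step
    return w

# A well-chosen alpha lands near the minimum at w = 3.
print(gradient_descent(w_init=0.0, alpha=0.1, steps=50))
```
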
2
Q: What happens if the learning rate α is too small in gradient descent?

A: The algorithm takes very small steps, so it converges slowly and can take a long time to reach the minimum.

3
Q: What is the effect of a too large learning rate α in gradient descent?

A: It can cause the algorithm to overshoot the minimum; in the worst case the updates diverge and gradient descent never converges.

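The two failure modes above can be seen side by side in a rough sketch, assuming the toy cost J(w) = w² with its minimum at w = 0; the specific α values are illustrative.

```python
# Assumed toy cost: J(w) = w**2, so dJ/dw = 2 * w and the minimum is at w = 0.
def step(w, alpha):
    return w - alpha * 2 * w

for alpha in (0.001, 0.1, 1.1):       # too small, reasonable, too large
    w = 5.0
    for _ in range(30):
        w = step(w, alpha)
    print(f"alpha={alpha}: w after 30 steps = {w:.4f}")

# alpha=0.001 barely moves w toward 0 (slow convergence),
# alpha=0.1 gets very close to 0,
# alpha=1.1 overshoots on every step and |w| grows (divergence).
```
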
4
Q: What does the derivative term d/dw J(w) indicate in gradient descent?

A: It gives the slope of the cost function at the current w, i.e. the direction and rate of steepest ascent; gradient descent moves w in the opposite direction, so the derivative guides the sign and size of the adjustment to w.

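To make the "direction" part concrete, a small sketch under the same kind of assumption (toy cost J(w) = (w − 3)²): the sign of d/dw J(w) decides which way the update moves w.

```python
# Assumed toy cost: J(w) = (w - 3)**2, minimum at w = 3.
def grad(w):
    return 2 * (w - 3)            # d/dw J(w)

alpha = 0.1
for w in (0.0, 5.0):              # start left of and right of the minimum
    print(f"w={w}: derivative={grad(w):+.1f}, next w={w - alpha * grad(w):.2f}")

# At w=0 the derivative is negative, so the update increases w toward 3;
# at w=5 the derivative is positive, so the update decreases w toward 3.
```
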
5
Q: What is the outcome when the gradient descent reaches a local minimum?

A: The derivative becomes zero, so the update no longer changes the parameter w and gradient descent stays at the local minimum.

6
Q: How does the gradient descent behave if the parameter w is already at a local minimum?

A: The parameter w remains unchanged, because the derivative is zero and the update rule gives w = w − α·0 = w.

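A quick check of this behaviour, assuming a toy cost J(w) = (w − 2)² for illustration: starting exactly at the minimum, the update is a no-op.

```python
# Assumed toy cost: J(w) = (w - 2)**2, minimum at w = 2.
w = 2.0                     # already at the local minimum
alpha = 0.1
grad = 2 * (w - 2)          # derivative is 0 at the minimum
w = w - alpha * grad        # w = w - alpha * 0
print(w)                    # still 2.0: the update leaves w unchanged
```
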
7
Q: What will happen to the cost J if the learning rate α is properly chosen?

A: The cost J will gradually decrease until it reaches a local or global minimum.

8
Q: How does gradient descent ensure convergence to a minimum with a fixed learning rate α?

A: As w approaches the minimum, the derivative decreases, resulting in smaller updates and thus gradual convergence.

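A short sketch of this effect, again on an assumed toy cost J(w) = w²: the derivative, and therefore the step α·dJ/dw, shrinks as w approaches the minimum even though α never changes.

```python
# Assumed toy cost: J(w) = w**2, minimum at w = 0; alpha stays fixed throughout.
w, alpha = 5.0, 0.1
for i in range(5):
    grad = 2 * w                    # derivative shrinks as w approaches 0
    step = alpha * grad             # so the step shrinks too, with fixed alpha
    print(f"iter {i}: w={w:.4f}  derivative={grad:.4f}  step={step:.4f}")
    w -= step
```
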
9
Q: What does it mean if the derivative term in gradient descent is large?

A: The update step will be larger; a large derivative indicates a steeper slope, which calls for a larger adjustment to w.

10
Q: What is the relationship between the slope of the cost function and the size of the steps in gradient descent?

A: A steeper slope (larger derivative) results in larger steps, while a flatter slope (smaller derivative) results in smaller steps.

11
Q: How does gradient descent behave if w is initialized near the minimum and α is too large?

A: It may overshoot the minimum and keep bouncing back and forth without converging.

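A brief illustration of that bouncing, assuming the toy cost J(w) = w² and a deliberately oversized α; the starting point near the minimum is arbitrary.

```python
# Assumed toy cost: J(w) = w**2; w starts near the minimum at 0, alpha is too large.
w, alpha = 0.5, 1.4
for i in range(6):
    w = w - alpha * 2 * w           # each update overshoots and flips the sign of w
    print(f"iter {i}: w = {w:+.3f}")

# w bounces back and forth across the minimum and moves farther away each step.
```
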
12
Q: Why is it crucial to choose an appropriate learning rate α in gradient descent?

A: To ensure efficient convergence to a minimum without overshooting or taking excessively small steps.
