Learning rate Flashcards
Q: How does the choice of the learning rate α affect gradient descent?
A: It strongly affects efficiency: a well-chosen α lets gradient descent converge quickly, while a poorly chosen α can make it very slow or prevent it from working at all.
Q: What happens if the learning rate α is too small in gradient descent?
A: The algorithm will take very small steps, resulting in slow convergence and taking a long time to reach the minimum.
Q: What is the effect of a too large learning rate α in gradient descent?
A: It can cause the algorithm to overshoot the minimum, potentially even causing it to diverge and fail to converge.
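The following minimal sketch illustrates the two previous cards on an assumed toy cost J(w) = w² (so dJ/dw = 2w), which is not part of the cards themselves: a too-small α barely moves w, a moderate α converges toward the minimum at w = 0, and a too-large α overshoots and diverges.

```python
def gradient_descent(alpha, w0=10.0, steps=20):
    """Run gradient descent on the toy cost J(w) = w**2, whose derivative is 2*w."""
    w = w0
    for _ in range(steps):
        grad = 2 * w          # dJ/dw at the current w
        w = w - alpha * grad  # the gradient descent update
    return w

# Too-small alpha: after 20 steps w has barely moved from its start at 10.
print(gradient_descent(alpha=0.001))  # roughly 9.6

# Moderate alpha: w is already close to the minimum at w = 0.
print(gradient_descent(alpha=0.1))    # roughly 0.12

# Too-large alpha: every update overshoots, |w| grows, and the run diverges.
print(gradient_descent(alpha=1.1))    # magnitude in the hundreds and still growing
```

With α = 1.1 each update multiplies w by −1.2, so the iterates alternate in sign and grow in magnitude, which is the divergence described above.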
Q: What does the derivative term d/dw J(w) indicate in gradient descent?
A: It gives the slope of the cost function at the current w, i.e. the direction and rate of steepest ascent; gradient descent subtracts α times this term so that w moves downhill toward lower cost.
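For reference, the full update that uses this term, written in the same notation as the cards, is the standard gradient descent rule: w := w − α ⋅ (d/dw) J(w). A positive slope decreases w and a negative slope increases w, so the update always moves against the slope.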
Q: What is the outcome when the gradient descent reaches a local minimum?
A: The derivative becomes zero, leading to no change in the parameter w, thus maintaining the local minimum position.
Q: How does the gradient descent behave if the parameter w is already at a local minimum?
A: The parameter w remains unchanged, because the derivative is zero there and the update rule gives w = w − α⋅0 = w.
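As a concrete instance on the assumed toy cost J(w) = w² (not from the cards), the minimum sits at w = 0, where the derivative 2w equals 0; the update then gives w = 0 − α⋅0 = 0, so gradient descent stays at the minimum regardless of α.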
Q: What will happen to the cost J if the learning rate α is properly chosen?
A: The cost J will gradually decrease until it reaches a local or global minimum.
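A short sketch of this behaviour, again on the assumed toy cost J(w) = w² with a hypothetical fixed α = 0.1 (values not from the cards), printing the cost each iteration to show it decreasing monotonically toward its minimum value of 0:

```python
def cost_trace(alpha=0.1, w=10.0, steps=6):
    """Track the cost J(w) = w**2 per iteration with a well-chosen fixed alpha."""
    for i in range(steps):
        print(f"iter {i}: J(w) = {w * w:8.3f}")
        w = w - alpha * 2 * w     # gradient descent step on J(w) = w**2
    print(f"final  : J(w) = {w * w:8.3f}")

cost_trace()
```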
Q: How does gradient descent ensure convergence to a minimum with a fixed learning rate α?
A: As w approaches the minimum, the derivative decreases, resulting in smaller updates and thus gradual convergence.
Q: What does it mean if the derivative term in gradient descent is large?
A: The update step will be larger, indicating a steeper slope and a need for a larger adjustment.
Q: What is the relationship between the slope of the cost function and the size of the steps in gradient descent?
A: A steeper slope (larger derivative) results in larger steps, while a flatter slope (smaller derivative) results in smaller steps.
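The next sketch makes the last three cards concrete, again on the assumed toy cost J(w) = w²: it prints the slope 2w and the step α⋅2w at each iteration, showing that with a fixed α the steps shrink automatically as w approaches the minimum, because the slope itself shrinks.

```python
def shrinking_steps(alpha=0.1, w=10.0, steps=8):
    """With a fixed alpha, steps shrink as w nears the minimum of J(w) = w**2."""
    for i in range(steps):
        grad = 2 * w              # slope of J at the current w
        step = alpha * grad       # update size is proportional to the slope
        w = w - step
        print(f"iter {i}: slope = {grad:7.3f}, step = {step:6.3f}, w = {w:6.3f}")

shrinking_steps()
```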
Q: How does gradient descent behave if w is initialized near the minimum and α is too large?
A: It may overshoot the minimum and keep bouncing back and forth without converging.
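A worked instance of this bouncing, on the assumed toy cost J(w) = w² with the hypothetical choice α = 1.5: the update becomes w := w − 1.5⋅2w = −2w, so starting from w = 1 the iterates are 1, −2, 4, −8, …, jumping across the minimum at w = 0 on every step while growing in magnitude.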
Q: Why is it crucial to choose an appropriate learning rate α in gradient descent?
A: To ensure efficient convergence to a minimum without overshooting or taking excessively small steps.