Week 6: Optimization Flashcards

1
Q

Give examples of functions whose minimum point is not a stationary point. Draw graphs.

A

For example: a discontinuous function whose minimum sits at the point of discontinuity, and a function defined on a bounded (closed) interval whose minimum lies at the boundary of the interval. In neither case is the gradient zero at the minimum.
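
A concrete bounded-domain instance (a standard textbook example, not one of the drawn graphs): f(x) = x on the interval [0, 1] attains its minimum at x = 0, yet f'(0) = 1 ≠ 0, so the minimizer is not a stationary point.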

2
Q

What defines a stationary point?

A

Its first derivative (gradient) is zero.
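
In symbols (a standard formulation, assumed here rather than quoted from the course notes): a point \theta^* of a differentiable function f is stationary when

\nabla f(\theta^*) = 0,

i.e. every partial derivative of f vanishes at \theta^*.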

3
Q

Why would we not be able to use a decaying exponential function as an objective function?

A

Because it only approaches its minimum as x goes to infinity; the infimum is never attained at any finite point, so the function has no interior minimum.
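
A concrete instance (a standard example, assumed for illustration): f(x) = e^{-x} satisfies f(x) > 0 for all x while \inf_x f(x) = 0, so the infimum is only approached as x \to \infty and is never attained; no minimizer exists.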

4
Q

Why can we limit ourselves to minimization (and not also maximization) when we do optimization?

A

Since we can always rewrite the maximization problem as minimizing the negative objective function.
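
Written out as the standard identity:

\max_{\theta} f(\theta) = -\min_{\theta} \bigl(-f(\theta)\bigr), \qquad \arg\max_{\theta} f(\theta) = \arg\min_{\theta} \bigl(-f(\theta)\bigr).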

5
Q

What is a typical value for the learning rate gamma?

A

0.01 or 0.05.
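
A minimal sketch of where the learning rate gamma enters a gradient-descent update (the quadratic test function and gamma = 0.01 are illustrative assumptions, not from the course material):

import numpy as np

def gradient_descent(grad, theta0, gamma=0.01, n_iters=1000):
    # Plain gradient descent: theta <- theta - gamma * grad(theta).
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iters):
        theta = theta - gamma * grad(theta)
    return theta

# Illustrative objective f(theta) = ||theta||^2, whose gradient is 2 * theta.
theta_hat = gradient_descent(lambda th: 2 * th, theta0=[3.0, -2.0], gamma=0.01)
print(theta_hat)  # approaches the minimizer [0, 0]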

6
Q

Name the two ways in which optimization is used in machine learning.

A

1) For training a model: we optimize the objective function J(theta), and the optimization variables are the model parameters theta.
2) For tuning hyperparameters, which are set before training: we optimize the objective function with the hyperparameters as the optimization variables.

7
Q

Why is a convex function a good function to optimize?

A

Because every local minimum of a convex function is also a global minimum (and for a strictly convex function that minimum is unique).

8
Q

Give examples of convex cost functions.

A

The cost functions for linear regression, logistic regression and L1-regularised linear regression.

9
Q

Give an example of a non-convex function.

A

The cost function for a deep neural network.

10
Q

State the optimization problem, i.e. the minimization of the cost function, for linear regression.

A

\hat{\theta} = \arg\min_{\theta} \; \frac{1}{n} \, \lVert X\theta - y \rVert_2^2
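
A minimal numerical sketch of solving this least-squares problem (the synthetic X and y are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                      # n = 100 samples, 3 features
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

# Minimizing (1/n) * ||X theta - y||_2^2 is an ordinary least-squares problem.
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta_hat)                                   # close to [1.0, -2.0, 0.5]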

11
Q

Why is coordinate descent particularly fast and efficient for optimizing an L1-regularized linear regression model?

A

Since the model tends to set many coefficients to zero, many of the updates in the coordinate descent will simply set theta_j = 0, due to the sparsity of the optimal theta_hat.
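
A rough sketch of coordinate descent for the L1-regularised (lasso) least-squares problem, showing how each coordinate update is a soft-thresholding step that can set theta_j exactly to zero (the function names and the regularisation strength lam are illustrative assumptions):

import numpy as np

def soft_threshold(z, t):
    # Soft-thresholding operator: returns 0 whenever |z| <= t.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    # Coordinate descent for: min_theta (1/(2n)) * ||X theta - y||_2^2 + lam * ||theta||_1
    n, p = X.shape
    theta = np.zeros(p)
    col_scale = (X ** 2).sum(axis=0) / n           # per-coordinate curvature
    for _ in range(n_sweeps):
        for j in range(p):
            # Residual with coordinate j left out of the current prediction.
            r_j = y - X @ theta + X[:, j] * theta[j]
            rho = X[:, j] @ r_j / n
            theta[j] = soft_threshold(rho, lam) / col_scale[j]
    return theta

# Usage sketch: theta_hat = lasso_coordinate_descent(X, y, lam=0.1)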