Reducing Loss Flashcards
How do we choose a set of model params that minimize loss ?
One way is to compute the gradient of the loss and iteratively step the parameters in the direction that reduces it.
What are hyperparameters ?
the configuration settings used to tune how the model is trained
Why are Iterative Approaches for Reducing Loss so prevalent in ML ?
primarily because they scale so well to large data sets.
What does it mean when a Model has “Converged” ?
we’ve iterated until the overall loss stops changing, or at least changes extremely slowly
How many minimums do Convex problems have ?
exactly one: there is only one place where the slope is exactly 0, and that minimum is where gradient descent converges.
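This can be checked numerically. A minimal sketch, assuming an illustrative convex loss f(w) = (w - 1)**2 and a hypothetical central-difference slope helper:

```python
# Illustrative convex loss: f(w) = (w - 1)**2, with its single minimum at w = 1.

def slope(w, f, h=1e-6):
    # central-difference estimate of df/dw
    return (f(w + h) - f(w - h)) / (2 * h)

f = lambda w: (w - 1) ** 2

# Scanning a grid of points, the slope is negative everywhere below the
# minimum and positive everywhere above it: it crosses zero exactly once.
signs = [slope(w, f) > 0 for w in [-2, -1, 0, 0.5, 1.5, 2, 3]]
print(signs)  # → [False, False, False, False, True, True, True]
```

A non-convex loss would show more than one sign change, which is why convexity makes gradient descent's job easy.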
What is Gradient Descent used for In ML ?
an iterative optimization algorithm for minimizing loss; each step moves the model parameters in the direction of the negative gradient.
What is differential calculus ?
studies the rates at which quantities change (vs integral calculus)
How are gradients used in ML ?
gradients are used in gradient descent to minimize loss. We often have a loss function of many variables that we are trying to minimize, and we do this by repeatedly stepping in the direction of the negative of the gradient of the function.
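A concrete sketch of that loop, assuming an illustrative one-parameter loss f(w) = (w - 3)**2 and an arbitrary learning rate (neither comes from the cards):

```python
# Minimize f(w) = (w - 3)**2 by gradient descent; its gradient is 2*(w - 3).

def gradient(w):
    return 2 * (w - 3)

w = 0.0                # arbitrary starting point
learning_rate = 0.1    # a hyperparameter: the step size

for _ in range(100):
    w = w - learning_rate * gradient(w)  # step in the negative gradient direction

print(round(w, 4))  # → 3.0, the single minimum of this convex loss
```

With a real model the single parameter becomes a vector of weights and the gradient is a vector of partial derivatives, but the update rule is the same.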
What are ML libraries such as TensorFlow used for ?
their functions handle the mathematical computations (for example, gradient descent) so models can be trained without implementing the math from scratch.
Is redundant data bad ?
Some redundancy can be useful to smooth out noisy gradients, but enormous batches tend not to carry much more predictive value than large batches.
What is Stochastic gradient descent (SGD) ?
it uses only a single example (a batch size of 1) per iteration, chosen at random, to estimate the gradient for each step.
Is SGD good ?
given enough iterations it works, but it is very noisy because each step’s gradient comes from a single example.
What is mini-batch SGD ?
a compromise between SGD and full-batch: each mini-batch typically has 10 to 1,000 randomly chosen examples. It reduces the noise of SGD while still being far more efficient than full-batch gradient descent.
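A sketch contrasting the two on a toy 1-D linear model y = w * x (the data, learning rate, and step count are illustrative assumptions): batch_size = 1 gives SGD, batch_size = 32 gives mini-batch SGD.

```python
import random

def sgd_fit(batch_size, steps=300, lr=0.5, seed=0):
    """Fit y = w * x by (mini-batch) SGD; returns the learned weight."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(1000)]
    ys = [2.0 * x for x in xs]          # true weight is 2.0 (noise-free toy data)
    w = 0.0
    for _ in range(steps):
        batch = rng.sample(range(len(xs)), batch_size)
        # gradient of the mean squared error over the batch, w.r.t. w
        grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / batch_size
        w -= lr * grad                  # step against the gradient
    return w

print(round(sgd_fit(batch_size=1), 2))    # SGD: noisy per step, still ≈ 2.0
print(round(sgd_fit(batch_size=32), 2))   # mini-batch: smoother path to ≈ 2.0
```

Both reach the same answer here; the difference shows up on real, noisy data, where larger batches give steadier gradient estimates per step at a higher cost per step.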