probability / gradient descent Flashcards
what is gradient descent
numerical method for finding the input to a function f that minimizes the function
most common use for gradient descent
minimizing empirical risk
when is gradient descent guaranteed to work
convex function (happy face)
convex function
a function f is convex if, for every a and b in the domain of f, the line segment connecting (a, f(a)) and (b, f(b)) lies on or above the graph of f
if f(t) is a function of a single variable and twice differentiable
f(t) is convex if and only if f''(t) = d^2f/dt^2 (t) >= 0 for all t
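The second-derivative test above can be sketched numerically with a central finite difference. The function f(t) = t^4 below is an illustrative choice, not from the cards; since f''(t) = 12t^2 >= 0 everywhere, the approximation should come out nonnegative at every test point.

```python
def second_derivative(f, t, h=1e-5):
    # Central finite-difference approximation of f''(t)
    return (f(t + h) - 2 * f(t) + f(t - h)) / h**2

f = lambda t: t**4  # convex: f''(t) = 12 t^2 >= 0 for all t

# Approximate f'' at a few points; all values should be (numerically) >= 0
vals = [second_derivative(f, t) for t in (-2.0, 0.0, 1.5)]
all_nonneg = all(v >= -1e-6 for v in vals)
```

A tolerance like -1e-6 is used because floating-point cancellation can make an exact zero come out as a tiny negative number.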
if f(t) is convex and differentiable
then gradient descent converges to a global minimum of f as long as the step size is small enough
nonconvex functions and gradient descent
gradient descent might work, but not guaranteed
choosing a step size
constant step size, alpha
t_(i+1) = t_i - alpha * (df/dt)(t_i)
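The constant-step-size update can be sketched as a short loop. The function names, the example f(t) = (t - 3)^2, and the step size 0.1 are illustrative assumptions, not from the cards; since that f is convex, a small enough alpha should drive the iterates to the global minimum at t = 3.

```python
def gradient_descent(df, t0, alpha=0.1, n_steps=100):
    """Iterate the update t_(i+1) = t_i - alpha * df(t_i) for n_steps steps."""
    t = t0
    for _ in range(n_steps):
        t = t - alpha * df(t)
    return t

# Example: f(t) = (t - 3)^2 is convex, with derivative df/dt = 2 * (t - 3)
t_min = gradient_descent(lambda t: 2 * (t - 3), t0=0.0)
```

With alpha = 0.1 each step shrinks the distance to the minimizer by a factor of 0.8, so after 100 steps t_min is essentially 3; a much larger alpha would make the iterates diverge instead.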
experiment
some process whose outcome is random
set
an unordered collection of items, usually denoted with curly brackets
sample space
set of all possible outcomes of an experiment
event
a subset of the sample space, i.e., a set of outcomes
what do probabilities mean
if probability(E) = p, then if we repeat our experiment infinitely many times, the proportion of repetitions in which event E occurs is p
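This long-run-frequency interpretation can be sketched by simulating many repetitions of an experiment. The Bernoulli setup, the probability 0.25, and the trial count below are illustrative assumptions; the empirical proportion should approach p as the number of repetitions grows.

```python
import random

random.seed(0)  # fixed seed so the simulation is reproducible

def empirical_proportion(p, n_trials):
    """Repeat an experiment where event E has probability p, n_trials times;
    return the proportion of repetitions in which E occurred."""
    hits = sum(1 for _ in range(n_trials) if random.random() < p)
    return hits / n_trials

prop = empirical_proportion(0.25, 100_000)  # should be close to 0.25
```

Only infinitely many repetitions give exactly p; for a finite run the proportion fluctuates around p, with the fluctuation shrinking as n_trials grows.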
the sum of the probabilities of each outcome
must be exactly 1
probability distribution
describes the probability of each outcome s in a sample space S
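A discrete probability distribution can be sketched as a mapping from each outcome s in the sample space S to its probability. The fair six-sided die below is an illustrative choice, not from the cards; it also demonstrates that the outcome probabilities must sum to exactly 1.

```python
# Sample space S for a fair six-sided die
sample_space = [1, 2, 3, 4, 5, 6]

# Probability distribution: each outcome s maps to its probability
distribution = {s: 1 / 6 for s in sample_space}

# The probabilities of all outcomes must sum to exactly 1
total = sum(distribution.values())
```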