probability / gradient descent Flashcards
what is gradient descent
numerical method for finding the input to a function f that minimizes the function
most common use for gradient descent
minimizing empirical risk
when is gradient descent guaranteed to work
f is a convex function (bowl-shaped, "happy face") and differentiable, with a small enough step size
convex function
a function f is convex if, for every a, b in the domain of f, the line segment between (a, f(a)) and (b, f(b)) lies on or above the graph of f
if f(t) is a function of a single variable and twice differentiable
f(t) is convex if and only if (d^2f)/(dt^2) (t) >= 0 for all t
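The second-derivative test above can be checked numerically; this is a minimal sketch, where the grid range, step h, and tolerance are illustrative choices, not part of the card.

```python
import math

# Estimate f''(t) with a central finite difference and check that it is
# nonnegative at every point on a grid -- a numerical version of the test
# "f is convex iff f''(t) >= 0 for all t".

def second_derivative(f, t, h=1e-3):
    """Central finite-difference estimate of f''(t)."""
    return (f(t + h) - 2 * f(t) + f(t - h)) / h**2

def looks_convex(f, lo=-3.0, hi=3.0, n=601, tol=-1e-3):
    """True if the estimated f'' is >= tol at every grid point."""
    step = (hi - lo) / (n - 1)
    return all(second_derivative(f, lo + i * step) >= tol for i in range(n))

print(looks_convex(lambda t: t**4))  # True:  f'' = 12 t**2 >= 0
print(looks_convex(math.sin))        # False: f'' = -sin(t) dips below 0
```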
if f(t) is convex and differentiable
then gradient descent converges to a global minimum of f as long as the step size is small enough
nonconvex functions and gradient descent
gradient descent might work, but not guaranteed
choosing a step size
constant step size, alpha
t_(i+1) = t_i - alpha * (df/dt)(t_i)
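The constant-step update above can be sketched in a few lines of Python; the function being minimized, alpha, the starting point, and the iteration count below are illustrative choices.

```python
# Constant-step gradient descent: t_(i+1) = t_i - alpha * f'(t_i).
# Minimizes the convex function f(t) = (t - 3)**2, whose minimum is at t = 3.

def gradient_descent(df, t0, alpha=0.1, iters=200):
    t = t0
    for _ in range(iters):
        t = t - alpha * df(t)   # step downhill along the derivative
    return t

df = lambda t: 2 * (t - 3)      # derivative of (t - 3)**2
t_min = gradient_descent(df, t0=0.0)
print(round(t_min, 6))          # converges to 3.0
```

With a convex f and a small enough alpha this converges to the global minimum, matching the guarantee on the earlier card; too large an alpha can overshoot and diverge.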
experiment
some process whose outcome is random
set
an unordered collection of items, usually denoted with curly brackets
sample space
set of all possible outcomes of an experiment
event
subset of the sample space or a set of outcomes
what do probabilities mean
if probability(E) = p, then if we repeat our experiment infinitely many times, the proportion of repetitions in which event E occurs is p
the sum of the probabilities of each outcome
must be exactly 1
probability distribution
describes the probability of each outcome s in a sample space S
probability of each outcome
must be between 0 and 1
the probability of an event
sum of the probabilities of the outcomes in the event
if the sample space S has n possible outcomes and all outcomes are equally likely
then the probability of any one outcome occurring is 1/n
probability of an event
# of outcomes in event / # of outcomes in sample space
complement rule
the complement of an event A (written A^c) is the set of all outcomes in the sample space that are not in A, so P(A^c) = 1 - P(A)
mutually exclusive
no overlap between the events; they can't both happen at the same time
if A and B are mutually exclusive, then the probability that A or B happens is:
P(A U B) = P(A) + P(B)
U = union, the symbol for "or"
principle of inclusion-exclusion
if A and B are any two events, then P(A U B) = P(A) + P(B) - P(A intersect B)
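Both union rules above can be verified by counting; the die-roll events A, B, and C below are illustrative, with exact arithmetic via fractions.Fraction.

```python
from fractions import Fraction

# Inclusion-exclusion on one die roll: A = "even", B = "at least 5".
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}
B = {5, 6}

def prob(E):
    return Fraction(len(E), len(S))   # equally likely outcomes

lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)
print(lhs == rhs)   # True: P(A U B) = P(A) + P(B) - P(A intersect B)

# Mutually exclusive case: A and C share no outcomes, so the
# intersection term vanishes and P(A U C) = P(A) + P(C).
C = {1, 3}
print(prob(A | C) == prob(A) + prob(C))  # True
```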
the probability that events A and B both happen is
P(A intersect B) = P(A)P(B|A)  (the "|" is read "given that")
P(B|A)
the probability that B happens given that A happened
if P(B|A) = P(B)
then A and B are independent
conditional probability
the probability of an event may change if we have additional information about outcomes
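The conditional-probability cards above can be checked by counting outcomes; the die-roll events A and B below are illustrative.

```python
from fractions import Fraction

# Conditional probability on one die roll: A = "roll > 3", B = "roll is even".
S = {1, 2, 3, 4, 5, 6}
A = {4, 5, 6}
B = {2, 4, 6}

def prob(E):
    return Fraction(len(E), len(S))

p_B_given_A = prob(A & B) / prob(A)          # P(B|A) = P(A intersect B) / P(A)
print(p_B_given_A)                           # 2/3: knowing A raised P(B) from 1/2

# Multiplication rule: P(A intersect B) = P(A) P(B|A)
print(prob(A & B) == prob(A) * p_B_given_A)  # True

# Independence test: here P(B|A) != P(B), so A and B are NOT independent.
print(p_B_given_A == prob(B))                # False
```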
population
the group of n possible elements from which k elements are chosen at random (the chosen k form a sample)
sequence of length k
obtained by selecting k elements from a group of n possible elements with replacement, ORDER MATTERS
number of ways k elements from a group of n possible elements with replacement
n^k
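The n^k count can be confirmed by enumerating the sequences with itertools.product; the elements and k below are illustrative.

```python
from itertools import product

# Sequences of length k from n elements WITH replacement, order matters:
# there are n**k of them.
elements = ["a", "b", "c"]   # n = 3
k = 2
sequences = list(product(elements, repeat=k))   # ('a','a'), ('a','b'), ...
print(len(sequences), len(elements) ** k)       # 9 9
```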
permutation
selecting k elements from a group of n possible elements without replacement
number of ways to select k elements from a group of n possible elements without replacement and order matters
P(n,k) = (n)(n-1) … (n - k + 1) = n! / (n - k)!
combination
set of k elements selected from a group of n possible elements without replacement, order doesn't matter; number of ways: C(n, k) = n! / (k! (n - k)!)
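Both without-replacement counts can be confirmed by enumeration and compared against the closed-form counters in the math module; n = 5 and k = 3 below are illustrative.

```python
from itertools import permutations, combinations
from math import perm, comb, factorial

elements = list(range(5))   # n = 5
k = 3

# Order matters (permutation): P(n, k) = n! / (n - k)!
n_perms = len(list(permutations(elements, k)))
print(n_perms, perm(5, 3), factorial(5) // factorial(5 - 3))  # 60 60 60

# Order doesn't matter (combination): C(n, k) = n! / (k! (n - k)!)
n_combs = len(list(combinations(elements, k)))
print(n_combs, comb(5, 3))  # 10 10
```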
law of total probability
if A is an event and E1, E2, …, Ek form a partition of S, then P(A) = P(A intersect E1) + P(A intersect E2) + … + P(A intersect Ek)
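The law can be checked on a small partition; the die-roll partition and event A below are illustrative.

```python
from fractions import Fraction

# Law of total probability on one die roll.
# Partition of S: E1 = {1,2}, E2 = {3,4}, E3 = {5,6}; event A = "even".
S = {1, 2, 3, 4, 5, 6}
partition = [{1, 2}, {3, 4}, {5, 6}]
A = {2, 4, 6}

def prob(E):
    return Fraction(len(E), len(S))

total = sum(prob(A & E) for E in partition)
print(total == prob(A))  # True: P(A) = sum of P(A intersect Ei) over the parts
```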
Bayes Theorem
P(B|A) = P(B)P(A|B) / P(A)
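Bayes' theorem can be verified by computing P(B|A) two ways on a small sample space; the die-roll events A and B below are illustrative.

```python
from fractions import Fraction

# Bayes' theorem P(B|A) = P(B) P(A|B) / P(A), checked on one die roll.
# B = "roll <= 2", A = "roll is even".
S = {1, 2, 3, 4, 5, 6}
B = {1, 2}
A = {2, 4, 6}

def prob(E):
    return Fraction(len(E), len(S))

p_A_given_B = prob(A & B) / prob(B)
bayes = prob(B) * p_A_given_B / prob(A)   # right-hand side of Bayes' theorem
direct = prob(A & B) / prob(A)            # P(B|A) computed from its definition
print(bayes == direct, direct)            # True 1/3
```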