probability / gradient descent Flashcards
what is gradient descent
numerical method for finding the input to a function f that minimizes the function
most common use for gradient descent
minimizing empirical risk
when is gradient descent guaranteed to work
f is a convex function (bowl-shaped, "happy face") and differentiable, with a small enough step size
convex function
a function f is convex if, for every a, b in the domain of f, the line segment between (a, f(a)) and (b, f(b)) lies on or above the graph of f
if f(t) is a function of a single variable and twice differentiable
f(t) is convex if and only if (d^2f)/(dt^2) (t) >= 0 for all t
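The second-derivative test above can be checked numerically; this is a minimal sketch, where the grid range, step h, and tolerance are illustrative choices, not part of the card.

```python
import math

# Estimate f''(t) with a central finite difference and check that it is
# nonnegative at every point on a grid -- a numerical version of the test
# "f is convex iff f''(t) >= 0 for all t".

def second_derivative(f, t, h=1e-3):
    """Central finite-difference estimate of f''(t)."""
    return (f(t + h) - 2 * f(t) + f(t - h)) / h**2

def looks_convex(f, lo=-3.0, hi=3.0, n=601, tol=-1e-3):
    """True if the estimated f'' is >= tol at every grid point."""
    step = (hi - lo) / (n - 1)
    return all(second_derivative(f, lo + i * step) >= tol for i in range(n))

print(looks_convex(lambda t: t**4))  # True:  f'' = 12 t**2 >= 0
print(looks_convex(math.sin))        # False: f'' = -sin(t) dips below 0
```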
if f(t) is convex and differentiable
then gradient descent converges to a global minimum of f as long as the step size is small enough
nonconvex functions and gradient descent
gradient descent might work, but not guaranteed
choosing a step size
constant step size, alpha
t_(i+1) = t_i - alpha * (df/dt)(t_i)
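The constant-step update above can be sketched in a few lines of Python; the function being minimized, alpha, the starting point, and the iteration count below are illustrative choices.

```python
# Constant-step gradient descent: t_(i+1) = t_i - alpha * f'(t_i).
# Minimizes the convex function f(t) = (t - 3)**2, whose minimum is at t = 3.

def gradient_descent(df, t0, alpha=0.1, iters=200):
    t = t0
    for _ in range(iters):
        t = t - alpha * df(t)   # step downhill along the derivative
    return t

df = lambda t: 2 * (t - 3)      # derivative of (t - 3)**2
t_min = gradient_descent(df, t0=0.0)
print(round(t_min, 6))          # converges to 3.0
```

With a convex f and a small enough alpha this converges to the global minimum, matching the guarantee on the earlier card; too large an alpha can overshoot and diverge.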
experiment
some process whose outcome is random
set
an unordered collection of items, usually denoted with curly brackets
sample space
set of all possible outcomes of an experiment
event
subset of the sample space or a set of outcomes
what do probabilities mean
if probability(E) = p, then if we repeat our experiment infinitely many times, the proportion of repetitions in which event E occurs is p
the sum of the probabilities of each outcome
must be exactly 1
probability distribution
describes the probability of each outcome s in a sample space S
probability of each outcome
must be between 0 and 1
the probability of an event
sum of the probabilities of the outcomes in the event
if the sample space S has n possible outcomes and all outcomes are equally likely
then the probability of any one outcome occurring is 1/n
probability of an event
# of outcomes in event / # of outcomes in sample space
complement rule
the complement of an event A (written A^c) is the set of all outcomes in the sample space that are not in A, so P(A^c) = 1 - P(A)
mutually exclusive
no overlap between the events; they can't both happen at the same time
if A and B are mutually exclusive, then the probability that A or B happens is:
P(A U B) = P(A) + P(B)
U = union, the symbol for "or"
principle of inclusion-exclusion
if A and B are any two events, then P(A U B) = P(A) + P(B) - P(A intersect B)
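Both union rules above can be verified by counting; the die-roll events A, B, and C below are illustrative, with exact arithmetic via fractions.Fraction.

```python
from fractions import Fraction

# Inclusion-exclusion on one die roll: A = "even", B = "at least 5".
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}
B = {5, 6}

def prob(E):
    return Fraction(len(E), len(S))   # equally likely outcomes

lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)
print(lhs == rhs)   # True: P(A U B) = P(A) + P(B) - P(A intersect B)

# Mutually exclusive case: A and C share no outcomes, so the
# intersection term vanishes and P(A U C) = P(A) + P(C).
C = {1, 3}
print(prob(A | C) == prob(A) + prob(C))  # True
```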
the probability that events A and B both happen is
P(A intersect B) = P(A)P(B|A)  (the "|" is read "given that")
P(B|A)
the probability that B happens given that A happened
if P(B|A) = P(B)
then A and B are independent
conditional probability
the probability of an event may change if we have additional information about outcomes
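The conditional-probability cards above can be checked by counting outcomes; the die-roll events A and B below are illustrative.

```python
from fractions import Fraction

# Conditional probability on one die roll: A = "roll > 3", B = "roll is even".
S = {1, 2, 3, 4, 5, 6}
A = {4, 5, 6}
B = {2, 4, 6}

def prob(E):
    return Fraction(len(E), len(S))

p_B_given_A = prob(A & B) / prob(A)          # P(B|A) = P(A intersect B) / P(A)
print(p_B_given_A)                           # 2/3: knowing A raised P(B) from 1/2

# Multiplication rule: P(A intersect B) = P(A) P(B|A)
print(prob(A & B) == prob(A) * p_B_given_A)  # True

# Independence test: here P(B|A) != P(B), so A and B are NOT independent.
print(p_B_given_A == prob(B))                # False
```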
population
the group of n possible elements from which k elements are chosen at random (the chosen k form a sample)
sequence of length k
obtained by selecting k elements from a group of n possible elements with replacement, ORDER MATTERS
number of ways k elements from a group of n possible elements with replacement
n^k
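The n^k count can be confirmed by enumerating the sequences with itertools.product; the elements and k below are illustrative.

```python
from itertools import product

# Sequences of length k from n elements WITH replacement, order matters:
# there are n**k of them.
elements = ["a", "b", "c"]   # n = 3
k = 2
sequences = list(product(elements, repeat=k))   # ('a','a'), ('a','b'), ...
print(len(sequences), len(elements) ** k)       # 9 9
```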
permutation
selecting k elements from a group of n possible elements without replacement
number of ways to select k elements from a group of n possible elements without replacement and order matters
P(n,k) = (n)(n-1) … (n - k + 1) = n! / (n - k)!
combination
set of k elements selected from a group of n possible elements without replacement, order doesn't matter; number of ways: C(n, k) = n! / (k! (n - k)!)
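Both without-replacement counts can be confirmed by enumeration and compared against the closed-form counters in the math module; n = 5 and k = 3 below are illustrative.

```python
from itertools import permutations, combinations
from math import perm, comb, factorial

elements = list(range(5))   # n = 5
k = 3

# Order matters (permutation): P(n, k) = n! / (n - k)!
n_perms = len(list(permutations(elements, k)))
print(n_perms, perm(5, 3), factorial(5) // factorial(5 - 3))  # 60 60 60

# Order doesn't matter (combination): C(n, k) = n! / (k! (n - k)!)
n_combs = len(list(combinations(elements, k)))
print(n_combs, comb(5, 3))  # 10 10
```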
law of total probability
if A is an event and E1, E2, …, Ek form a partition of S, then P(A) = P(A intersect E1) + P(A intersect E2) + … + P(A intersect Ek)
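The law can be checked on a small partition; the die-roll partition and event A below are illustrative.

```python
from fractions import Fraction

# Law of total probability on one die roll.
# Partition of S: E1 = {1,2}, E2 = {3,4}, E3 = {5,6}; event A = "even".
S = {1, 2, 3, 4, 5, 6}
partition = [{1, 2}, {3, 4}, {5, 6}]
A = {2, 4, 6}

def prob(E):
    return Fraction(len(E), len(S))

total = sum(prob(A & E) for E in partition)
print(total == prob(A))  # True: P(A) = sum of P(A intersect Ei) over the parts
```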
Bayes Theorem
P(B|A) = P(B)P(A|B) / P(A)
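Bayes' theorem can be verified by computing P(B|A) two ways on a small sample space; the die-roll events A and B below are illustrative.

```python
from fractions import Fraction

# Bayes' theorem P(B|A) = P(B) P(A|B) / P(A), checked on one die roll.
# B = "roll <= 2", A = "roll is even".
S = {1, 2, 3, 4, 5, 6}
B = {1, 2}
A = {2, 4, 6}

def prob(E):
    return Fraction(len(E), len(S))

p_A_given_B = prob(A & B) / prob(B)
bayes = prob(B) * p_A_given_B / prob(A)   # right-hand side of Bayes' theorem
direct = prob(A & B) / prob(A)            # P(B|A) computed from its definition
print(bayes == direct, direct)            # True 1/3
```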