Equations Flashcards
conditional probability, p(a|b) =
p(a,b) / p(b)
bayes, p(a|b) =
p(b|a)p(a) / p(b)
independent events, p(a,b) =
p(a)p(b)
total probability/marginalisation, p(X=x) =
sumy: p(x|y)p(y)
conditional independence assumption, p(x1,...,xn|y) =
product over i: p(xi|y)
discriminant function, f(x) =
sum j: wjxj - t
perceptron update rule, sigmoid error wj =
wj - (lrate)(f(x) - y)(xj)
sigmoid/logistic regression, f(x) =
1 / (1 + e^-z), where z = wx + b
log loss/cross entropy loss, L(f(x),y) =
-{ylogf(x) + (1-y)log(1-f(x))}
summed log loss/ cross entropy error/ negative log likelihood, E =
- sum i: {yi log f(xi) + (1-yi) log(1-f(xi))}
partial derivative of cross entropy error, dE/dw =
sum i: (f(xi) - yi)(xi)
partial derivative of sigmoid, dy/dz =
y(1-y)
partial derivative of cross entropy error, dE/df(x) =
-[y(1/f(x)) - (1-y)(1/(1-f(x)))]
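A minimal numpy sketch tying together the sigmoid, the per-sample log loss, and the summed gradient dE/dw = sum i: (f(xi) - yi)(xi) from the cards above; the names X, y, w and the toy data are illustrative assumptions, not from the deck.

```python
import numpy as np

def sigmoid(z):
    # f(x) = 1 / (1 + e^-z)
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(p, y):
    # per-sample cross entropy: -{y log f(x) + (1 - y) log(1 - f(x))}
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def cross_entropy_grad(X, y, w):
    # dE/dw = sum i: (f(xi) - yi) xi
    p = sigmoid(X @ w)
    return X.T @ (p - y)

# toy data: 3 samples, 2 features
X = np.array([[1.0, 2.0], [0.5, -1.0], [2.0, 0.3]])
y = np.array([1.0, 0.0, 1.0])
w = np.zeros(2)
print(log_loss(sigmoid(X @ w), y).sum())   # summed log loss E
print(cross_entropy_grad(X, y, w))         # gradient dE/dw
```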
specificity =
TN / (FP+TN)
precision = positive predictive value =
TP / (TP + FP)
recall = sensitivity = tp rate =
TP / P
fp rate =
FP / N
f1 measure =
2 / ((1/precision) + (1/recall))
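A small numpy sketch computing the confusion-matrix metrics above for binary labels in {0, 1}; the function name and toy arrays are illustrative assumptions, and it assumes neither class is empty (no division by zero).

```python
import numpy as np

def confusion_metrics(y_true, y_pred):
    # counts from a binary confusion matrix
    TP = np.sum((y_pred == 1) & (y_true == 1))
    FP = np.sum((y_pred == 1) & (y_true == 0))
    TN = np.sum((y_pred == 0) & (y_true == 0))
    FN = np.sum((y_pred == 0) & (y_true == 1))
    P, N = TP + FN, FP + TN
    recall = TP / P                    # sensitivity / tp rate
    precision = TP / (TP + FP)         # positive predictive value
    specificity = TN / (FP + TN)
    fp_rate = FP / N
    f1 = 2 / (1 / precision + 1 / recall)
    return dict(recall=recall, precision=precision,
                specificity=specificity, fp_rate=fp_rate, f1=f1)

print(confusion_metrics(np.array([1, 0, 1, 1, 0]), np.array([1, 0, 0, 1, 1])))
```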
pearsons correlation coefficient =
sum:(x - xbar)(y - ybar) / sqrt( (sum:(x - xbar)^2)(sum:(y - ybar)^2) )
information gain/ mutual information =
I(X;Y) = H(Y) - H(Y|X)
euclidean distance =
sqrt(sum:(x1-x2)^2)
hamming distance =
sum: delta(xi not equal xj)
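A quick numpy sketch of both distance measures on the cards above; the array names and toy vectors are illustrative, and hamming assumes equal-length vectors.

```python
import numpy as np

def euclidean(x1, x2):
    # sqrt(sum (x1 - x2)^2)
    return np.sqrt(np.sum((x1 - x2) ** 2))

def hamming(xi, xj):
    # sum of delta(xi != xj), i.e. the number of mismatching positions
    return int(np.sum(xi != xj))

print(euclidean(np.array([0.0, 3.0]), np.array([4.0, 0.0])))    # 5.0
print(hamming(np.array([1, 0, 1, 1]), np.array([1, 1, 1, 0])))  # 2
```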
neuron, y(x,w) =
f(wx + b)
softmax =
e^zi / sumk: e^zk
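A small numpy sketch of the softmax; subtracting max(z) is a standard numerical-stability trick that does not change the result. The names are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    # e^zi / sum k: e^zk, shifted by max(z) for numerical stability
    e = np.exp(z - np.max(z))
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # entries are positive and sum to 1
```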
gradient descent, wnew =
wold - (lrate)(dL/dw)
mean squared error loss, MSE =
1/n sum: (y-t)^2
neuron gradient, with sigmoid activation and squared loss, dL/dw =
dL/dy dy/dz dz/dw = (y - t)(y)(1 - y)(x)
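A minimal sketch of one gradient-descent step for a single sigmoid neuron under squared loss, matching the chain rule on this card; the function names, learning rate, and toy data are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gd_step(w, b, x, t, lrate=0.1):
    # forward pass through one neuron: y = f(wx + b)
    y = sigmoid(np.dot(w, x) + b)
    # chain rule: dL/dw = dL/dy * dy/dz * dz/dw = (y - t) * y(1 - y) * x
    grad_w = (y - t) * y * (1 - y) * x
    grad_b = (y - t) * y * (1 - y)
    # gradient descent: wnew = wold - lrate * dL/dw
    return w - lrate * grad_w, b - lrate * grad_b

w, b = np.zeros(2), 0.0
for _ in range(100):
    w, b = gd_step(w, b, np.array([1.0, 2.0]), t=1.0)
print(w, b)   # the output sigmoid(wx + b) moves towards the target t = 1
```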
entropy, H(X) =
- sum: p(x)logp(x)
- sum: p(x)logp(x)
entropy
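A small numpy sketch of entropy and information gain I(X;Y) = H(Y) - H(Y|X), computed from a joint distribution table p(x, y); the table, function names, and choice of log base 2 are illustrative assumptions.

```python
import numpy as np

def entropy(p):
    # H(X) = -sum p(x) log p(x); zero-probability entries contribute 0
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def information_gain(joint):
    # I(X;Y) = H(Y) - H(Y|X) for a joint distribution table p(x, y)
    p_x = joint.sum(axis=1)
    p_y = joint.sum(axis=0)
    # H(Y|X) = sum x: p(x) H(Y | X = x)
    h_y_given_x = sum(p_x[i] * entropy(joint[i] / p_x[i])
                      for i in range(len(p_x)) if p_x[i] > 0)
    return entropy(p_y) - h_y_given_x

joint = np.array([[0.25, 0.25],
                  [0.0,  0.5]])   # toy joint distribution p(x, y)
print(information_gain(joint))
```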
L = 0.5(y-t)^2
squared error loss
e^zi / sumk: e^zk
softmax
information gain =
I(X;Y) = H(Y) - H(Y|X)
mutual information =
I(X;Y) = H(Y) - H(Y|X)
recall =
TP / P
sensitivity =
TP / P
tp rate =
TP / P
precision =
TP / (TP + FP)
positive predictive value =
TP / (TP + FP)
- sum i: {yi log f(xi) + (1-yi) log(1-f(xi))}
summed log loss/ cross entropy error/ negative log likelihood
summed log loss =
- sum i: {yi log f(xi) + (1-yi) log(1-f(xi))}
cross entropy error =
- sum i: {yi log f(xi) + (1-yi) log(1-f(xi))}
negative log likelihood =
- sum i: {yi log f(xi) + (1-yi) log(1-f(xi))}
log loss, L(f(x),y) =
-{ylogf(x) + (1-y)log(1-f(x))}
cross entropy loss, L(f(x),y) =
-{ylogf(x) + (1-y)log(1-f(x))}
sigmoid, f(x) =
1 / (1+e^-z)
logistic regression, f(x) =
1 / (1+e^-z)
bias update for logistic regression, t =
t + lrate(f(x) - y)
bias update for perceptron, t =
t + lrate(yhat - y)
what is P(A or B) if
a) they are disjoint
b) they are not disjoint
a) P(A) + P(B)
b) P(A) + P(B) - P(A and B)
give the bernoulli distribution
P(X = 0) = 1 - p, P(X = 1) = p
give the binomial distribution
P(X = k) = (nCk)(p^k)(1-p)^(n-k)
give the geometric distribution
P(X=x) = (1-p)^(x-1) (p)
give the poisson distribution
P(X=x) = { lambda^x e^(-lambda) } / x!
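A minimal Python sketch of the four pmfs above using only the standard library; the function names and example arguments are illustrative assumptions.

```python
from math import comb, exp, factorial

def bernoulli(k, p):
    # P(X = 0) = 1 - p, P(X = 1) = p
    return p if k == 1 else 1 - p

def binomial(k, n, p):
    # P(X = k) = (nCk)(p^k)(1-p)^(n-k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

def geometric(x, p):
    # P(X = x) = (1-p)^(x-1) p, for x = 1, 2, ...
    return (1 - p)**(x - 1) * p

def poisson(x, lam):
    # P(X = x) = lambda^x e^(-lambda) / x!
    return lam**x * exp(-lam) / factorial(x)

print(binomial(2, 5, 0.3), geometric(3, 0.5), poisson(1, 2.0))
```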
if a discrete r.v. X has a pmf f(X) what is the expected value E[g(x)]
sum i: g(Xi)f(Xi)
if a discrete r.v. X has a pmf f(X) what is the variance V[g(x)]
E[(g(X) - E(g(X)))^2]
E[g(X)^2] - E[g(X)]^2
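A quick sketch of E[g(X)] and V[g(X)] for a discrete pmf given as value/probability lists; the names and the toy pmf are assumptions.

```python
def expectation(values, pmf, g=lambda x: x):
    # E[g(X)] = sum i: g(xi) f(xi)
    return sum(g(x) * p for x, p in zip(values, pmf))

def variance(values, pmf, g=lambda x: x):
    # V[g(X)] = E[g(X)^2] - E[g(X)]^2
    return expectation(values, pmf, lambda x: g(x) ** 2) - expectation(values, pmf, g) ** 2

values = [0, 1, 2]
pmf = [0.25, 0.5, 0.25]          # a valid pmf: probabilities sum to 1
print(expectation(values, pmf))  # 1.0
print(variance(values, pmf))     # 0.5
```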
properties of Expectations
E[aX + b] =
aE[X] + b
properties of variance:
V[aX+b] =
a^2V[X]
give the equation for hinge loss
sum: -y(wx + b)
= sum: -y(yhat)
summed over ONLY the misclassified samples, i.e. those with y(yhat) < 0
when we perform minibatch sgd, what do we multiply sum: dL/dw by to scale it?
n / |S|
n samples / batch size
what is the perceptron weight update, with hinge loss?
wj = wj - (lrate)(-y)(xj), applied ONLY to the misclassified samples (y(yhat) < 0)
= wj + (lrate)(y)(xj)
correctly classified samples contribute no update
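A minimal numpy sketch of this update as one training pass: only misclassified samples move the weights and the threshold, in line with the perceptron bias/threshold card above. Labels are assumed to be in {-1, +1}, the data and epoch count are illustrative, and y(yhat) <= 0 is used so the all-zero starting weights still trigger an update.

```python
import numpy as np

def perceptron_epoch(w, t, X, y, lrate=1.0):
    # one pass over the data with the perceptron criterion:
    # only misclassified samples (y * yhat <= 0) update the weights
    for xi, yi in zip(X, y):
        yhat = np.dot(w, xi) - t            # discriminant f(x) = wx - t
        if yi * yhat <= 0:                  # misclassified
            w = w + lrate * yi * xi         # wj = wj + (lrate)(y)(xj)
            t = t - lrate * yi              # threshold moves opposite to the bias
    return w, t

X = np.array([[2.0, 1.0], [-1.0, -1.5], [1.5, 2.0], [-2.0, -0.5]])
y = np.array([1, -1, 1, -1])                # labels in {-1, +1}
w, t = np.zeros(2), 0.0
for _ in range(10):
    w, t = perceptron_epoch(w, t, X, y)
print(w, t)
```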
what is the loss function (negative log-likelihood) for SGD for logistic regression
- 1/n sumi->n:[yi log f(xi) + (1-yi) log (1-f(xi))]
the same as the summed cross entropy error, but with a 1/n factor to rescale by the sample size
the decision boundary for logistic regression is given by
d = 1 / (1+e^-z)
wx + b = log(d / (1 - d)); at d = 0.5 this gives the boundary wx + b = 0
give the equation for zero mean, unit variance normalisation
(x - x_mean) / sigma
give the equation for restrict range normalisation
(x - x_min) / (x_max - x_min)
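A small numpy sketch of both normalisations applied to a single feature vector; the toy data is an assumption.

```python
import numpy as np

def zero_mean_unit_variance(x):
    # (x - x_mean) / sigma
    return (x - x.mean()) / x.std()

def restrict_range(x):
    # (x - x_min) / (x_max - x_min), maps the feature into [0, 1]
    return (x - x.min()) / (x.max() - x.min())

x = np.array([2.0, 4.0, 6.0, 8.0])
print(zero_mean_unit_variance(x))
print(restrict_range(x))
```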
give the equation for fisher score, F=
(mean1 - mean2)^2 / (v1 + v2)
give a kernel for horizontal lines
1 1 1
0 0 0
-1 -1 -1
give a kernel for vertical lines
1 0 -1
1 0 -1
1 0 -1
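A minimal numpy sketch applying both flashcard kernels with a hand-written valid cross-correlation, so a horizontal-edge image responds only to the horizontal kernel; the toy image and helper name are illustrative assumptions.

```python
import numpy as np

# the two flashcard kernels
horizontal = np.array([[ 1,  1,  1],
                       [ 0,  0,  0],
                       [-1, -1, -1]])
vertical = horizontal.T            # 1 0 -1 in every row

def correlate2d(image, kernel):
    # valid cross-correlation: slide the 3x3 kernel over the image
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# toy image: bright top half, dark bottom half -> a horizontal edge
image = np.vstack([np.ones((3, 5)), np.zeros((3, 5))])
print(correlate2d(image, horizontal))  # strong response around the edge rows
print(correlate2d(image, vertical))    # zero response everywhere
```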
give the distribution update scheme for adaboost, i.e. what do we multiply Dj(i) by
1 / (2ej) if the classification was incorrect
1 / (2(1 - ej)) if the classification was correct
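A small numpy sketch of this reweighting for one boosting round: with ej set to the weak learner's weighted error, the updated distribution still sums to 1 and the misclassified samples end up with half of the total weight. The toy distribution and names are assumptions.

```python
import numpy as np

def adaboost_reweight(D, correct, e):
    # multiply Dj(i) by 1/(2(1 - ej)) if correct, by 1/(2ej) if incorrect
    return np.where(correct, D / (2 * (1 - e)), D / (2 * e))

D = np.full(4, 0.25)                     # current distribution Dj
correct = np.array([True, True, True, False])
e = float(np.sum(D[~correct]))           # weighted error of the weak learner
D_next = adaboost_reweight(D, correct, e)
print(D_next, D_next.sum())              # the misclassified sample now has weight 0.5
```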
if we know that A is conditionally independent of B given C, then P(A|B,C) = ?
P(A|C)
if A is conditionally independent of B given C, then P(A|B,C) = P(A|C), prove it
P(A,B|C) = P(A|C)P(B|C), conditional independence
P(A,B,C)/P(C) = [P(A,C)/P(C)] [P(B,C)/P(C)], by the definition of conditional probability
P(A,B,C) = P(A,C)P(B,C) / P(C), times by P(C)
P(A,B,C)/P(B,C) = P(A,C)/P(C), divide by P(B,C)
P(A|B,C) = P(A|C)
if A and B are conditionally independent given C then we know?
P(A,B|C) = P(A|C)P(B|C)
d e^x / dx = ?
e^x
d ln x/ dx = ?
1 / x
product rule, d(uv)/dx =
u dv/dx + v du/dx
d log f(x) / dx =
f'(x) / f(x)