Equations Flashcards
conditional probability, p(a|b) =
p(a,b) / p(b)
bayes, p(a|b) =
p(b|a)p(a) / p(b)
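
A quick numeric check of Bayes' rule in Python; the prior and likelihoods below are made-up illustration values, and the evidence p(b) comes from total probability.

    # hypothetical prior and likelihoods
    p_a = 0.01
    p_b_given_a = 0.9
    p_b_given_not_a = 0.05
    # total probability: p(b) = p(b|a)p(a) + p(b|not a)p(not a)
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    # Bayes: p(a|b) = p(b|a)p(a) / p(b)
    p_a_given_b = p_b_given_a * p_a / p_b
    print(p_a_given_b)  # about 0.154
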
independent events, p(a,b) =
p(a)p(b)
total probability/marginalisation, p(X=x) =
sum_y: p(x|y)p(y)
conditional independence assumption, p(x1,...,xn|y) =
product_i: p(xi|y)
discriminant function, f(x) =
(sum_i: wi xi) - t
perceptron update rule, sigmoid error wj =
wj - (lrate)(f(x) - y)(xj)
sigmoid/logistic regression, f(x) =
1 / (1+e^-z), where z = wx
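
A minimal Python sketch of the sigmoid, assuming z is a plain float:

    import math

    def sigmoid(z):
        # 1 / (1 + e^-z), squashes any real z into (0, 1)
        return 1.0 / (1.0 + math.exp(-z))

    print(sigmoid(0.0))  # 0.5
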
log loss/cross entropy loss, L(f(x),y) =
-{ylogf(x) + (1-y)log(1-f(x))}
summed log loss / cross entropy error / negative log likelihood, E =
- sum_i: {yi log f(xi) + (1-yi) log(1-f(xi))}
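
A short Python sketch of the summed cross entropy error over a batch; the predictions and labels are toy values, and eps is a small guard against log(0):

    import math

    def cross_entropy_error(preds, labels, eps=1e-12):
        # E = -sum_i { yi log f(xi) + (1 - yi) log(1 - f(xi)) }
        total = 0.0
        for f_x, y in zip(preds, labels):
            total += y * math.log(f_x + eps) + (1 - y) * math.log(1 - f_x + eps)
        return -total

    print(cross_entropy_error([0.9, 0.2, 0.7], [1, 0, 1]))
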
partial derivative of cross entropy error w.r.t. the weights, dE/dw =
sum_i: (f(xi) - yi)(xi)
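
One batch gradient step for logistic regression using this derivative, sketched in Python with made-up data (two features per example):

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    X = [[1.0, 2.0], [2.0, 0.5], [0.5, 1.5]]   # toy inputs
    y = [1, 0, 1]                              # toy labels
    w = [0.0, 0.0]
    lrate = 0.1

    # dE/dwj = sum_i (f(xi) - yi) * xij
    grad = [0.0, 0.0]
    for xi, yi in zip(X, y):
        f_x = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
        for j in range(len(w)):
            grad[j] += (f_x - yi) * xi[j]

    # gradient descent update: wj <- wj - lrate * dE/dwj
    w = [wj - lrate * gj for wj, gj in zip(w, grad)]
    print(w)
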
partial derivative of sigmoid, dy/dz =
y(1-y)
partial derivative of cross entropy error w.r.t. the output, dE/df(x) =
-[y/f(x) - (1-y)/(1-f(x))]
specificity =
TN / (FP+TN)
precision = positive predictive value =
TP / (TP + FP)
recall = sensitivity = tp rate =
TP / P = TP / (TP+FN)
fp rate =
FP / N = FP / (FP+TN)
f1 measure =
2 / ((1/precision) + (1/recall))
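
The metrics above computed from raw confusion-matrix counts in Python; the TP/FP/TN/FN numbers are invented for illustration:

    TP, FP, TN, FN = 40, 10, 45, 5

    specificity = TN / (FP + TN)
    precision = TP / (TP + FP)        # positive predictive value
    recall = TP / (TP + FN)           # sensitivity / TP rate, P = TP + FN
    fp_rate = FP / (FP + TN)          # N = FP + TN
    f1 = 2 / ((1 / precision) + (1 / recall))

    print(specificity, precision, recall, fp_rate, f1)
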
pearsons correlation coefficient =
sum_i: (xi - xbar)(yi - ybar) / sqrt(sum_i: (xi - xbar)^2 * sum_i: (yi - ybar)^2)
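
A plain-Python sketch of Pearson's correlation coefficient; xs and ys are made-up samples:

    import math

    def pearson_r(xs, ys):
        n = len(xs)
        x_bar = sum(xs) / n
        y_bar = sum(ys) / n
        num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        den = math.sqrt(sum((x - x_bar) ** 2 for x in xs)
                        * sum((y - y_bar) ** 2 for y in ys))
        return num / den

    print(pearson_r([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]))  # close to 1
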
information gain/ mutual information =
I(X;Y) = H(Y) - H(Y|X)
euclidean distance =
sqrt(sum_i: (x1i - x2i)^2)
hamming distance =
sum_i: delta(x1i not equal x2i)
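
Both distance measures as short Python functions, assuming equal-length lists:

    import math

    def euclidean(x1, x2):
        # sqrt of the summed squared per-dimension differences
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

    def hamming(x1, x2):
        # number of positions where the two vectors disagree
        return sum(1 for a, b in zip(x1, x2) if a != b)

    print(euclidean([0, 0], [3, 4]))      # 5.0
    print(hamming([1, 0, 1], [1, 1, 0]))  # 2
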
neuron, y(x,w) =
f(wx + b)
softmax, softmax(z)i =
e^zi / sum_k: e^zk
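
A softmax sketch in Python; subtracting max(z) is a standard numerical-stability trick, not part of the formula itself:

    import math

    def softmax(z):
        # e^zi / sum_k e^zk, shifted by max(z) for numerical stability
        m = max(z)
        exps = [math.exp(zi - m) for zi in z]
        s = sum(exps)
        return [e / s for e in exps]

    print(softmax([1.0, 2.0, 3.0]))  # entries sum to 1
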
gradient descent, wnew =
wold - (lrate)(dL/dw)
mean squared error loss, MSE =
1/n sum: (y-t)^2
neuron gradient, with sigmoid activation and squared error loss, dL/dw =
dL/dy dy/dz dz/dw = (y - t) y(1 - y) x
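
One full chain-rule step for a single sigmoid neuron under squared error, following dL/dw = dL/dy dy/dz dz/dw; the input, target and weights are toy values:

    import math

    x = [1.0, 0.5]       # toy input
    t = 1.0              # toy target
    w = [0.2, -0.1]
    b = 0.0
    lrate = 0.5

    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    y = 1.0 / (1.0 + math.exp(-z))   # sigmoid output

    # dL/dy = (y - t), dy/dz = y(1 - y), dz/dwi = xi
    dL_dw = [(y - t) * y * (1 - y) * xi for xi in x]

    # gradient descent update on each weight
    w = [wi - lrate * g for wi, g in zip(w, dL_dw)]
    print(w)
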
entropy, H(X) =
- sum: p(x)logp(x)
- sum: p(x)logp(x)
entropy
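
Entropy of a discrete distribution in Python; the probability vectors are made up, and using log base 2 gives the answer in bits:

    import math

    def entropy(probs):
        # H(X) = -sum_x p(x) log2 p(x), skipping zero-probability outcomes
        return -sum(p * math.log2(p) for p in probs if p > 0)

    print(entropy([0.5, 0.5]))  # 1.0 bit
    print(entropy([0.9, 0.1]))  # about 0.47 bits
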
L = 0.5(y-t)^2
squared error loss