Equations Flashcards

1
Q

conditional probability, p(a|b) =

A

p(a,b) / p(b)

2
Q

bayes, p(a|b) =

A

p(b|a)p(a) / p(b)

3
Q

independent events, p(a,b) =

A

p(a)p(b)

4
Q

total probability/marginalisation, p(X=x) =

A

sum y: p(x|y) p(y)

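The first four cards chain together; here is a minimal Python sketch (not from the deck) that checks them on a small made-up joint distribution over two binary variables:

```python
# Toy joint distribution p(a, b) over two binary variables (illustrative numbers).
p_joint = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.4}

# Marginalisation: p(b) = sum_a p(a, b), and likewise p(a).
p_b = {b: sum(p for (aa, bb), p in p_joint.items() if bb == b) for b in (0, 1)}
p_a = {a: sum(p for (aa, bb), p in p_joint.items() if aa == a) for a in (0, 1)}

# Conditional probability: p(a|b) = p(a, b) / p(b)
p_a_given_b = {(a, b): p_joint[(a, b)] / p_b[b] for (a, b) in p_joint}

# Bayes: p(a|b) = p(b|a) p(a) / p(b)  -- should match the direct conditional.
p_b_given_a = {(b, a): p_joint[(a, b)] / p_a[a] for (a, b) in p_joint}
for (a, b) in p_joint:
    bayes = p_b_given_a[(b, a)] * p_a[a] / p_b[b]
    assert abs(bayes - p_a_given_b[(a, b)]) < 1e-12

# Independence would mean p(a, b) = p(a) p(b); here it does not hold.
print(p_joint[(1, 1)], p_a[1] * p_b[1])  # 0.4 vs 0.6 * 0.5 = 0.3
```
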
5
Q

conditional independence assumption, p(x|y) =

A

prod i: p(xi|y), i.e. multiply the individual conditionals p(xi|y)

6
Q

discriminant function, f(x) =

A

(sum j: wj xj) - t

7
Q

perceptron update rule (sigmoid error), wj =

A

wj - (lrate)(f(x) - y)(xj)

8
Q

sigmoid/logistic regression, f(x) =

A

1 / (1+e^-z)

9
Q

log loss/cross entropy loss, L(f(x),y) =

A

-{ylogf(x) + (1-y)log(1-f(x))}

10
Q

summed log loss/ cross entropy error/ negative log likelihood, E =

A
-sum i: {yi log f(xi) + (1-yi) log(1-f(xi))}

11
Q

partial derivative of cross entropy error, dE/dw =

A

sum i: (f(xi) - yi)(xi)

12
Q

partial derivative of sigmoid, dy/dz =

A

y(1-y)

13
Q

partial derivative of cross entropy error, dE/df(x) =

A

-[y(1/f(x)) - (1-y)(1/(1-f(x)))]

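Cards 7-13 describe the pieces of one logistic-regression gradient step. A short sketch, assuming NumPy and illustrative data and names (`X`, `y`, `lrate`), of how they fit together:

```python
import numpy as np

def sigmoid(z):
    # card 8: f(x) = 1 / (1 + e^-z)
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(f, y):
    # card 9: L(f(x), y) = -{y log f(x) + (1 - y) log(1 - f(x))}
    return -(y * np.log(f) + (1 - y) * np.log(1 - f))

# Illustrative data: 4 samples, 2 features, binary labels.
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
y = np.array([1.0, 0.0, 1.0, 0.0])
w, t, lrate = np.zeros(2), 0.0, 0.1

f = sigmoid(X @ w - t)            # predictions
E = log_loss(f, y).sum()          # card 10: summed log loss / negative log likelihood
dE_dw = X.T @ (f - y)             # card 11: sum_i (f(xi) - yi) xi
w = w - lrate * dE_dw             # card 7: wj <- wj - lrate (f(x) - y) xj
t = t + lrate * (f - y).sum()     # card 47-style threshold update, summed over the batch
print(E, w, t)
```
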
14
Q

specificity =

A

TN / (FP+TN)

15
Q

precision = positive predictive value =

A

TP / (TP + FP)

16
Q

recall = sensitivity = tp rate =

A

TP / P

17
Q

fp rate =

A

FP / N

18
Q

f1 measure =

A

2 / ((1/precision) + (1/recall))

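Cards 14-18 all come from the same confusion matrix; a quick sketch with made-up counts:

```python
# Illustrative confusion-matrix counts.
TP, FP, TN, FN = 40, 10, 45, 5
P = TP + FN   # actual positives
N = TN + FP   # actual negatives

specificity = TN / (FP + TN)          # card 14
precision   = TP / (TP + FP)          # card 15 (positive predictive value)
recall      = TP / P                  # card 16 (sensitivity, TP rate)
fp_rate     = FP / N                  # card 17
f1          = 2 / ((1 / precision) + (1 / recall))  # card 18 (harmonic mean)

print(specificity, precision, recall, fp_rate, f1)
```
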
19
Q

pearsons correlation coefficient =

A

sum: (x - xhat)(y - yhat) / sqrt( (sum: (x - xhat)^2)(sum: (y - yhat)^2) )

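A minimal sketch of card 19, where the means play the role of xhat and yhat (pure Python, illustrative data):

```python
import math

def pearson(xs, ys):
    # r = sum (x - mean_x)(y - mean_y) / sqrt( sum (x - mean_x)^2 * sum (y - mean_y)^2 )
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den

print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))   # perfectly correlated -> 1.0
print(pearson([1, 2, 3, 4], [8, 6, 4, 2]))   # perfectly anti-correlated -> -1.0
```
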
20
Q

information gain/ mutual information =

A

I(X;Y) = H(Y) - H(Y|X)

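A sketch of card 20 (using the entropy definition from card 28), estimating H(Y), H(Y|X) and I(X;Y) from a small made-up joint table:

```python
import math

# Illustrative joint counts over (X, Y); probabilities are counts / total.
joint = {(0, 0): 30, (0, 1): 10, (1, 0): 10, (1, 1): 50}
total = sum(joint.values())
p_xy = {k: v / total for k, v in joint.items()}
p_x = {x: sum(p for (xx, _), p in p_xy.items() if xx == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yy), p in p_xy.items() if yy == y) for y in (0, 1)}

def entropy(dist):
    # card 28: H = -sum p log p  (log base 2, in bits)
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

H_y = entropy(p_y)
# H(Y|X) = sum_x p(x) H(Y | X = x)
H_y_given_x = sum(
    p_x[x] * entropy({y: p_xy[(x, y)] / p_x[x] for y in (0, 1)}) for x in (0, 1)
)
info_gain = H_y - H_y_given_x   # card 20: I(X;Y) = H(Y) - H(Y|X)
print(H_y, H_y_given_x, info_gain)
```
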
21
Q

euclidean distance =

A

sqrt(sum:(x1-x2)^2)

22
Q

hamming distance =

A

sum: delta(xi not equal xj)

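A quick sketch of cards 21-22; the delta in the Hamming distance is just an indicator of a mismatch:

```python
import math

def euclidean(a, b):
    # card 21: sqrt( sum (x1 - x2)^2 )
    return math.sqrt(sum((x1 - x2) ** 2 for x1, x2 in zip(a, b)))

def hamming(a, b):
    # card 22: sum of delta(xi != xj), i.e. count of mismatched positions
    return sum(1 for x1, x2 in zip(a, b) if x1 != x2)

print(euclidean([0, 0], [3, 4]))        # 5.0
print(hamming("karolin", "kathrin"))    # 3
```
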
23
Q

neuron, y(x,w) =

A

f(wx + b)

24
Q

softmax =

A

e^zi / sum k: e^zk

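A sketch of card 24; subtracting the max before exponentiating is a standard numerical-stability trick, not part of the card:

```python
import numpy as np

def softmax(z):
    # card 24: e^zi / sum_k e^zk
    z = z - np.max(z)          # stability shift; does not change the result
    e = np.exp(z)
    return e / e.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs, probs.sum())      # components in (0, 1), summing to 1
```
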
25
Q

gradient descent, wnew =

A

wold - (lrate)(dL/dw)

26
Q

mean squared error loss, MSE =

A

1/n sum: (y-t)^2

27
Q

neuron gradient, with sigmoid activation and squared loss, dL/dw =

A

dL/dy dy/dz dz/dw = (y - t)(y)(1 - y)(x)

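A sketch tying cards 23, 25, 27 and 30 together: one gradient-descent step for a single sigmoid neuron under squared error (data, weights and learning rate are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 2.0])     # one input
t = 1.0                      # target
w, b, lrate = np.array([0.1, -0.1]), 0.0, 0.5

y = sigmoid(w @ x + b)               # card 23: y(x, w) = f(wx + b)
loss = 0.5 * (y - t) ** 2            # card 30: squared error loss
# card 27: dL/dw = dL/dy * dy/dz * dz/dw = (y - t) * y(1 - y) * x
dL_dw = (y - t) * y * (1 - y) * x
dL_db = (y - t) * y * (1 - y)
w = w - lrate * dL_dw                # card 25: w_new = w_old - lrate * dL/dw
b = b - lrate * dL_db
print(y, loss, w, b)
```
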
28
Q

entropy, H(X) =

A

-sum: p(x) log p(x)

29
Q

-sum: p(x) log p(x)

A

entropy

30
Q

L = 0.5(y-t)^2

A

squared error loss

31
Q

e^zi / sum k: e^zk

A

softmax

32
Q

information gain =

A

I(X;Y) = H(Y) - H(Y|X)

33
Q

mutual information =

A

I(X;Y) = H(Y) - H(Y|X)

34
Q

recall =

A

TP / P

35
Q

sensitivity =

A

TP / P

36
Q

tp rate =

A

TP / P

37
Q

precision =

A

TP / (TP + FP)

38
Q

positive predictive value =

A

TP / (TP + FP)

39
Q
-sum i: {yi log f(xi) + (1-yi) log(1-f(xi))}
A

summed log loss/ cross entropy error/ negative log likelihood

40
Q

summed log loss =

A
-sum i: {yi log f(xi) + (1-yi) log(1-f(xi))}

41
Q

cross entropy error =

A
-sum i: {yi log f(xi) + (1-yi) log(1-f(xi))}

42
Q

negative log likelihood =

A
-sum i: {yi log f(xi) + (1-yi) log(1-f(xi))}

43
Q

log loss, L(f(x),y) =

A

-{ylogf(x) + (1-y)log(1-f(x))}

44
Q

cross entropy loss, L(f(x),y) =

A

-{ylogf(x) + (1-y)log(1-f(x))}

45
Q

sigmoid, f(x) =

A

1 / (1+e^-z)

46
Q

logistic regression, f(x) =

A

1 / (1+e^-z)

47
Q

bias update for logistic regression, t =

A

t + lrate(f(x) - y)

48
Q

bias update for perceptron, t =

A

t + lrate(yhat - y)

49
Q

what is P(A or B) if

a) they are disjoint
b) they are not disjoint

A

a) P(A) + P(B)

b) P(A) + P(B) - P(A and B)

50
Q

give the bernoulli distribution

A
P(X = 0) = 1 - p
P(X = 1) = p
51
Q

give the binomial distribution

A

P(X = k) = (nCk)(p^k)(1-p)^(n-k)

52
Q

give the geometric distribution

A

P(X=x) = (1-p)^(x-1) (p)

53
Q

give the poisson distribution

A

P(X=x) = { lambda^x e^(-lambda) } / x!
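
The four pmfs from cards 50-53 written out directly (note the binomial exponent n-k and the geometric exponent x-1):

```python
import math

def bernoulli(k, p):                      # card 50: P(X = 1) = p, P(X = 0) = 1 - p
    return p if k == 1 else 1 - p

def binomial(k, n, p):                    # card 51: (n C k) p^k (1 - p)^(n - k)
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def geometric(x, p):                      # card 52: (1 - p)^(x - 1) * p, x = 1, 2, ...
    return (1 - p) ** (x - 1) * p

def poisson(x, lam):                      # card 53: lambda^x e^(-lambda) / x!
    return lam**x * math.exp(-lam) / math.factorial(x)

print(bernoulli(1, 0.3), binomial(2, 4, 0.5), geometric(3, 0.5), poisson(2, 1.0))
```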

54
Q

if a discrete r.v. X has a pmf f(X), what is the expected value E[g(X)]?

A

sum i: g(Xi)f(Xi)

55
Q

if a discrete r.v. X has a pmf f(X), what is the variance V[g(X)]?

A

E[(g(X) - E(g(X)))^2]

E[g(X)^2] - E[g(X)]^2

56
Q

properties of Expectations

E[aX + b] =

A

aE[X] + b

57
Q

properties of variance:

V[aX+b] =

A

a^2V[X]
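
A small sketch of cards 54-57: E[g(X)] and V[g(X)] from a toy pmf, with a numeric check of the aX + b rules:

```python
# Illustrative pmf for a discrete r.v. X.
pmf = {0: 0.2, 1: 0.5, 2: 0.3}

def expectation(g, pmf):
    # card 54: E[g(X)] = sum_i g(Xi) f(Xi)
    return sum(g(x) * p for x, p in pmf.items())

def variance(g, pmf):
    # card 55: V[g(X)] = E[g(X)^2] - E[g(X)]^2
    return expectation(lambda x: g(x) ** 2, pmf) - expectation(g, pmf) ** 2

a, b = 3.0, 1.0
E_X, V_X = expectation(lambda x: x, pmf), variance(lambda x: x, pmf)
# cards 56-57: E[aX + b] = a E[X] + b, and V[aX + b] = a^2 V[X]
assert abs(expectation(lambda x: a * x + b, pmf) - (a * E_X + b)) < 1e-12
assert abs(variance(lambda x: a * x + b, pmf) - (a**2 * V_X)) < 1e-12
print(E_X, V_X)
```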

58
Q

give the equation for hinge loss

A

sum: -y(wx + b)
= sum: -y(yhat)
i.e. sum -y(wx + b) over ONLY the misclassified samples (those where y(wx + b) < 0)

59
Q

when we perform minibatch SGD, what do we multiply sum: dL/dw by to scale it?

A

n / |S|

n samples / batch size

60
Q

what is the perceptron weight update, with hinge loss?

A

wj = wj - (lrate)(-yhat × y)(xj)
or, if just for the misclassified:
wj = wj - (lrate)(-y)(xj)
= wj + (lrate)(y)(xj)
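
A sketch of the misclassified-sample update from cards 58 and 60, assuming labels in {-1, +1} and illustrative data; the n / |S| rescaling from card 59 would apply if the updates were summed over a minibatch:

```python
import numpy as np

# Illustrative linearly separable data with labels in {-1, +1}.
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b, lrate = np.zeros(2), 0.0, 0.1

for _ in range(20):
    for xi, yi in zip(X, y):
        yhat = w @ xi + b
        if yi * yhat <= 0:                 # card 58: misclassified if y(wx + b) <= 0
            w = w + lrate * yi * xi        # card 60: wj <- wj + lrate * y * xj
            b = b + lrate * yi

print(w, b, np.sign(X @ w + b))            # the signs should match y
```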

61
Q

what is the loss function (negative log-likelihood) for SGD for logistic regression

A
-1/n sum i=1->n: [yi log f(xi) + (1-yi) log(1-f(xi))]

same as the summed negative log likelihood, but with 1/n to rescale by the sample size

62
Q

the decision boundary for logistic regression is given by

A

d = 1 / (1+e^-z)

wx + b = log(d / (1-d))

63
Q

give the equation for zero mean, unit variance normalisation

A

(x - x_mean) / sigma

64
Q

give the equation for restrict range normalisation

A
(x - x_min) / (x_max - x_min)
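
A short sketch of cards 63-64 applied column-wise to an illustrative feature matrix:

```python
import numpy as np

X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])   # illustrative features

# card 63: zero mean, unit variance: (x - x_mean) / sigma
z_scored = (X - X.mean(axis=0)) / X.std(axis=0)

# card 64: restrict range to [0, 1]: (x - x_min) / (x_max - x_min)
min_max = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

print(z_scored)
print(min_max)
```
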
65
Q

give the equation for fisher score, F=

A

(mean1 - mean2)^2 / (v1 + v2)

66
Q

give a kernel for horizontal lines

A

1 1 1
0 0 0
-1 -1 -1

67
Q

give a kernel for vertical lines

A

1 0 -1
1 0 -1
1 0 -1
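
A sketch of cards 66-67: a small hand-rolled "valid" cross-correlation (no library assumed) showing the horizontal-line kernel responding to a horizontal edge in a toy image:

```python
import numpy as np

def conv2d_valid(img, kernel):
    # Slide the kernel over the image and sum elementwise products (cross-correlation).
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

horizontal = np.array([[1, 1, 1], [0, 0, 0], [-1, -1, -1]])   # card 66
vertical = horizontal.T                                        # card 67

# Toy image: bright top half, dark bottom half (a horizontal edge).
img = np.vstack([np.ones((3, 5)), np.zeros((3, 5))])
print(conv2d_valid(img, horizontal))   # strong response in the rows straddling the edge
print(conv2d_valid(img, vertical))     # ~zero response everywhere
```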

68
Q

give the distribution update scheme for adaboost, i.e. what do we multiply Dj(i) by

A

1 / (2 ej) if the classification was incorrect

1 / (2(1 - ej)) if the classification was correct

69
Q

if we know that A is conditionally independent of B given C, then P(A|B,C) = ?

A

P(A|C)

70
Q

if A is conditionally independent of B given C, then P(A|B,C) = P(A|C), prove it

A

P(A,B|C) = P(A|C)P(B|C), conditional independence
P(A,B,C)/P(C) = [P(A,C)/P(C)][P(B,C)/P(C)], rewrite each conditional as a ratio
P(A,B,C) = P(A,C)P(B,C) / P(C), multiply both sides by P(C)
P(A,B,C)/P(B,C) = P(A,C)/P(C), divide both sides by P(B,C)
P(A|B,C) = P(A|C)

71
Q

if A and B are conditionally independent given C then we know?

A

P(A,B|C) = P(A|C)P(B|C)

72
Q

d e^x / dx = ?

A

e^x (and in general, d e^f(x)/dx = f'(x) e^f(x))

73
Q

d ln x/ dx = ?

A

1 / x

74
Q

product rule, d(uv) / dx

A

u dv/dx + v du/dx

75
Q

d log f(x) / dx =

A

f’(x) / f(x)