Week 1 Flashcards

1
Q

Linear modelling

A

Learning a linear relationship between attributes and responses.

2
Q

What does this mean?
t = f(x;a)

A

A function f() that acts on x and has a parameter a.

3
Q

What parameter is known as the intercept?

A

w0 in
f(x) = w0 + w1*x

4
Q

What does the squared loss function describe?

A

How much accuracy we are losing through the use of a certain function to model a phenomenon.

5
Q

What is this?
Ln(y, f(x;w0,w1))

A

The squared loss function, telling us how much accuracy we lose through the use of f(x;w0,w1) to model y.

6
Q

How do you calculate the average loss across a whole dataset?

A

L = (1/N) * SUM(n=1 to N) Ln(t.n, f(x.n; w0, w1)),
i.e. the squared loss function averaged over all datapoints x.n.
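
A minimal NumPy sketch of this average, with assumed toy data and parameters:

```python
import numpy as np

# Assumed toy data and parameters, for illustration only.
x = np.array([0.0, 1.0, 2.0, 3.0])
t = np.array([1.1, 2.9, 5.2, 6.8])
w0, w1 = 1.0, 2.0

# L = (1/N) * SUM(n=1 to N) (t.n - f(x.n; w0, w1))^2
L = np.mean((t - (w0 + w1 * x)) ** 2)
print(L)
```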

7
Q

argmin means…

A

Find the argument that minimises.

8
Q

Bias-variance tradeoff

A

The tradeoff between a model’s ability to generalise and the risk of overfitting.

9
Q

Validation set

A

A second dataset that is used to validate the predictive performance of the model.

10
Q

K-fold cross-validation

A

Splits the data into K equally sized blocks. Each block is used once as the validation set, with the remaining K-1 blocks as the training set.
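
A minimal NumPy sketch of the splitting step, with assumed placeholder values for N and K:

```python
import numpy as np

N, K = 12, 4                            # assumed dataset size and fold count
indices = np.random.permutation(N)
folds = np.array_split(indices, K)      # K (roughly) equally sized blocks

for k in range(K):
    val_idx = folds[k]                  # block k is the validation set
    train_idx = np.concatenate([folds[j] for j in range(K) if j != k])
    # fit on train_idx, evaluate on val_idx, then average the K scores
```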

11
Q

LOOCV (abbreviation)

A

Leave-One-Out Cross-Validation

12
Q

What is LOOCV?

A

A type of K-fold cross-validation where K=N.

13
Q

0! ==

A

1

14
Q

What is a prerequisite for multiplying an n x m matrix A and a q x r matrix B?

A*B is possible if…

A

m == q
So the number of columns of the first matrix needs to be equal to the number of rows in the second matrix.
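
A quick NumPy illustration of the shape rule, using assumed shapes:

```python
import numpy as np

A = np.ones((2, 3))    # n x m, with m = 3
B = np.ones((3, 4))    # q x r, with q = 3

print((A @ B).shape)   # (2, 4): allowed because m == q
# np.ones((2, 3)) @ np.ones((4, 5)) would raise a ValueError, since 3 != 4
```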

15
Q

(X*w)^T can be simplified to…

A

(w^T) * (X^T)

16
Q

(ABCD)^T can be simplified to…

A

( (AB) (CD) )^T

17
Q

( (AB) (CD) )^T can be simplified to…

A

(CD)^T * (AB)^T

18
Q

(CD)^T * (AB)^T can be simplified to…

A

D^T * C^T * B^T * A^T
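
The chain from the last three cards can be checked numerically; a sketch with assumed random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C, D = (rng.standard_normal((3, 3)) for _ in range(4))

# (ABCD)^T == D^T * C^T * B^T * A^T
print(np.allclose((A @ B @ C @ D).T, D.T @ C.T @ B.T @ A.T))  # True
```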

19
Q

What is the partial derivative with respect to w of:
w^T * x

A

x

20
Q

What is the partial derivative with respect to w of:
x^T * w

A

x

21
Q

What is the partial derivative with respect to w of:
w^T * w

A

2w

22
Q

What is the partial derivative with respect to w of:
w^T * c*w

A

2cw
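
The derivative identities on the last four cards can be checked with a central finite difference; a sketch with assumed random vectors and an assumed scalar c:

```python
import numpy as np

rng = np.random.default_rng(1)
w, x = rng.standard_normal(3), rng.standard_normal(3)
c, eps = 2.5, 1e-6

def grad(f, w):
    # Central finite-difference approximation of the gradient of f at w.
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (f(w + e) - f(w - e)) / (2 * eps)
    return g

print(np.allclose(grad(lambda v: v @ x, w), x))                # d(w^T x)/dw = x
print(np.allclose(grad(lambda v: v @ v, w), 2 * w))            # d(w^T w)/dw = 2w
print(np.allclose(grad(lambda v: c * (v @ v), w), 2 * c * w))  # d(w^T cw)/dw = 2cw
```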

23
Q

Multiplying a scalar by an identity matrix results in…

A

a matrix with the scalar value on each diagonal element.

24
Q

The inverse of a matrix that only has values on the diagonal is…

A

Another diagonal matrix in which each diagonal element is the reciprocal (1/value) of the corresponding element in the original.

25
Q

How do we write the optimum value of w?

A

^w
(w with a little hat on it)

26
Q

^w ==

(The formula)

A

(X^T * X)^-1 * X^T * t
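
A minimal NumPy sketch of this closed-form solution, using assumed toy data where t roughly follows 1 + 2x:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
t = np.array([1.1, 2.9, 5.2, 6.8])

X = np.column_stack([np.ones_like(x), x])   # column of 1s for the intercept w0
w_hat = np.linalg.inv(X.T @ X) @ X.T @ t    # (X^T * X)^-1 * X^T * t
print(w_hat)                                # approximately [1, 2]
# Numerically, np.linalg.solve(X.T @ X, X.T @ t) is preferred over inv().
```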

27
Q

Linear regression

A

Supervised learning where t is a real number.

28
Q

Linear classification

A

Supervised learning where t is an element from a finite set.

29
Q

In supervised learning, each datapoint is of the form (x, t) (with t in R for regression). What is the goal?

A

We look for a hypothesis f such that t = f(x), so the model can predict t for new inputs x.

30
Q

Overfitting

A

When the model comes up with a hypothesis that is too complex: it fits the existing data very well but predicts poorly on new data.

31
Q

What does regularisation add to supervised learning?

A

It doesn’t only minimise the loss, but also keeps the weights small: a penalty is added that grows with the size of the weights.

32
Q

What is the central question in the generative modelling problem?

A

Can we build a model that could generate a dataset like ours?

33
Q

What does the equation
f(x;w) = w^T * x
do?

A

It generates a response (essentially the label) for every input vector x.

34
Q

What does the italic N mean?
Example: N(0, sig^2)

A

It means ‘normal distribution with mean 0 and variance sig^2’.

35
Q

For a Gaussian variable, the most likely point corresponds to…

A

the mean

36
Q

Give the name (left side) of the function for the joint density of t over all datapoints in a dataset.

A

p(t | x, w, sig^2)

37
Q

p(t | x, w, sig^2) =

A

PRODUCT(n=1 to N) p(t.n | x.n, w, sig^2)

38
Q

Give the formula for log L, the log likelihood:

A

-(N/2) * log(2*pi) - N * log(sig) - (1/(2*sig^2)) * SUM(n=1 to N) (t.n - w^T * x.n)^2
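
A sketch that evaluates this log likelihood in NumPy; X, t, w and sig are assumed toy values:

```python
import numpy as np

X = np.column_stack([np.ones(4), np.arange(4.0)])
t = np.array([1.1, 2.9, 5.2, 6.8])
w = np.array([1.0, 2.0])
sig = 0.5
N = len(t)

log_L = (-(N / 2) * np.log(2 * np.pi)
         - N * np.log(sig)
         - (1 / (2 * sig**2)) * np.sum((t - X @ w) ** 2))
print(log_L)
```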

39
Q

Give the Bernoulli distribution

A

P(X=x) = q^x * ((1-q)^(1-x))

40
Q

IID (abbreviation)

A

independent and identically distributed

41
Q

When is a matrix A negative definite?

A

If x^T * A * x < 0 for all nonzero real vectors x.

42
Q

How do we actually show negative definiteness?

A

By showing that -(1/sig^2) * z^T * X^T * X * z < 0 for any nonzero vector z, which holds because z^T * X^T * X * z = ||X*z||^2 > 0.
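
A numeric illustration of the argument, with an assumed random X:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((10, 3))
sig = 1.0
z = rng.standard_normal(3)

# z^T * X^T * X * z = ||X*z||^2 > 0, so -(1/sig^2) * z^T * X^T * X * z < 0.
print(z @ (X.T @ X) @ z)                     # positive
print(-(1 / sig**2) * (z @ (X.T @ X) @ z))   # negative
```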

43
Q

What is M with a line on it?

A

The expected value of the squared error between estimated parameter values and the true values.

44
Q

Give the formula for M with the line on it.

A

M̄ = B^2 + V
with B = bias and V = variance

45
Q

A function of a random variable is…

A

itself a random variable.

46
Q

How can you check if two random variables x and y are independent?

A

Check if p(x, y) = p(x) * p(y).
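
A tiny worked check on an assumed joint probability table (p[i, j] = P(X=i, Y=j)):

```python
import numpy as np

p = np.array([[0.08, 0.12],
              [0.32, 0.48]])     # assumed joint distribution of X and Y

px = p.sum(axis=1)               # marginal P(X = i)
py = p.sum(axis=0)               # marginal P(Y = j)
print(np.allclose(p, np.outer(px, py)))  # True: p(x, y) = p(x) * p(y)
```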

47
Q

How do you check conditional independence?

A

p(x,y|z) = p(x|z) * p(y|z)

48
Q

If you know the probability of the outcomes of a random variable X, how do you calculate the expected value E(X)?

A

Multiply each value of X by its probability and add all these products.

E(X) = SUM of all ( x * P(X=x))

49
Q

What should you mention when multiplying probabilities of variables?

A

That the variables are independent.

50
Q

How do you calculate the variance of a random variable?

A

Use the formula
var(X) = E(X^2) - E(X)^2.

51
Q

How do you calculate E(X^2)?

A

The probabilities stay the same as in E(X), but each outcome is weighted by x^2 instead of x: E(X^2) = SUM of all (x^2 * P(X=x)).
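
A worked example covering E(X), E(X^2) and var(X) together, with an assumed distribution:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])    # outcomes
p = np.array([0.5, 0.3, 0.2])    # assumed probabilities, summing to 1

E_X  = np.sum(x * p)             # E(X)   = 0.7
E_X2 = np.sum(x**2 * p)          # E(X^2) = 1.1 (same probabilities, x squared)
var  = E_X2 - E_X**2             # var(X) = 1.1 - 0.49 = 0.61
print(E_X, E_X2, var)
```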

52
Q

What can you say about the hypothesis and the weight vector in a model whose data matrix X has N rows and 1 column?

A

Normally X contains one column per feature plus a column of 1s (for the intercept); with only a single column, that column is the 1s, so this X contains no features.
The weight vector has one value per column, here just w.0, so w^T * x.n = w.0 * 1 = w.0: a single number, giving the same hypothesis h for every datapoint x.n.

53
Q

What is the loss function of the least-squares regression problem?

A

(1/N) * SUM(n=1 to N) (t.n - w^T * x.n)^2

54
Q

The derivative of a sum of terms is equal to…

A

the sum of the derivatives of those terms.

55
Q

The derivative of (ax+b)^q = …

A

q * (ax+b)^(q-1) * a

56
Q

What would the log of the likelihood be:

PRODUCT(n=1 to N) of r^x.n * ((1-r)^(1-x.n))

A

SUM(n=1 to N) of x.n * log(r) + (1 - x.n) * log(1-r)

57
Q

When you are asked to compute the maximum likelihood estimate of a Bernoulli parameter, what do you do?

A

Take the derivative of the log likelihood with respect to the parameter, ∂logL/∂r. If there is a sum in the log L, it stays in the derivative for now. Every term a * log(r) becomes a/r in the derivative (and a * log(1-r) becomes -a/(1-r)).
Equate it to zero and solve for r.
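
A sketch of this recipe on assumed coin-flip data; a grid search confirms the closed-form result r = mean(x):

```python
import numpy as np

x = np.array([1, 0, 1, 1, 0, 1])   # assumed Bernoulli observations

def log_L(r):
    # log likelihood: SUM x.n*log(r) + (1 - x.n)*log(1 - r)
    return np.sum(x * np.log(r) + (1 - x) * np.log(1 - r))

# Setting dlogL/dr = sum(x)/r - (N - sum(x))/(1 - r) = 0 gives r = mean(x).
r_hat = x.mean()
rs = np.linspace(0.01, 0.99, 99)
print(r_hat, rs[np.argmax([log_L(r) for r in rs])])   # both ≈ 0.67
```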

58
Q

Multiplying a negative definite matrix by a negative constant gives…

A

a positive definite matrix.
