Week 1 Flashcards
Linear modelling
Learning a linear relationship between attributes and responses.
What does this mean?
t = f(x;a)
The response t is given by a function f() that acts on the attribute x and has a parameter a.
What parameter is known as the intercept?
w0 in
f(x) = w0 + w1*x
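As a quick sketch (numpy assumed, made-up numbers): predictions come from adding the intercept w0 to w1 times x.

import numpy as np

w0, w1 = 1.0, 2.0                # hypothetical intercept and slope
x = np.array([0.0, 1.0, 2.0])
f_x = w0 + w1 * x                # f(x) = w0 + w1*x
print(f_x)                       # [1. 3. 5.]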
What does the squared loss function describe?
How much accuracy we lose by using a particular function to model a phenomenon: the squared difference between the true response and the model's prediction.
What is this?
Ln(tn, f(xn; w0, w1))
The squared loss for the nth data point, Ln = (tn - f(xn; w0, w1))^2, telling us how much accuracy we lose by using f(xn; w0, w1) to model tn.
How do you calculate the average loss across a whole dataset?
L = (1/N) * SUM(n=1 to N) Ln(tn, f(xn; w0, w1))
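A minimal numpy sketch of the two loss cards above, with made-up data: the per-point squared loss and its average over N points.

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])   # attributes (illustrative only)
t = np.array([1.2, 2.9, 5.1, 6.8])   # responses
w0, w1 = 1.0, 2.0

pred = w0 + w1 * x            # f(xn; w0, w1)
loss_n = (t - pred) ** 2      # squared loss Ln for each point
L = loss_n.mean()             # average loss (1/N) * SUM of Ln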
argmin means…
Find the argument (e.g. the parameter value) that minimises the expression, rather than the minimum value itself.
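For example (numpy), argmin returns which argument achieves the minimum, not the minimum itself:

import numpy as np

losses = np.array([4.0, 1.5, 3.2])   # loss for three candidate models
losses.min()      # 1.5 -> the minimum loss
losses.argmin()   # 1   -> the argument (index) that minimises the loss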
Bias-variance tradeoff
The tradeoff between a model that is too simple (high bias, underfits) and one that is too complex (high variance, overfits): we want enough complexity to fit the data while still generalising to new data.
Validation set
A second dataset, held back from training, that is used to validate the predictive performance of the model.
K-fold cross-validation
Splits the data into K equally sized blocks. Each block is used once as the validation set, with the other K-1 blocks as the training set, and the validation losses are averaged over the K folds.
LOOCV (abbreviation)
Leave-One-Out Cross-Validation
What is LOOCV?
A type of K-fold cross-validation where K=N.
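A minimal sketch of K-fold cross-validation in plain numpy (made-up data; setting K = N would give LOOCV):

import numpy as np

x = np.arange(10.0)
t = 2.0 * x + 1.0 + np.random.randn(10)   # noisy linear data
K = 5                                     # K = len(x) gives LOOCV
folds = np.array_split(np.arange(len(x)), K)

val_losses = []
for fold in folds:
    train = np.setdiff1d(np.arange(len(x)), fold)
    w1, w0 = np.polyfit(x[train], t[train], deg=1)   # least-squares fit on the training blocks
    pred = w0 + w1 * x[fold]
    val_losses.append(np.mean((t[fold] - pred) ** 2))

cv_loss = np.mean(val_losses)   # average validation loss over the K folds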
0! ==
1
What is a prerequisite for multiplying an n x m matrix A and a q x r matrix B?
A*B is possible if…
m == q
So the number of columns of the first matrix needs to be equal to the number of rows in the second matrix.
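For instance (numpy, made-up shapes): a 2x3 matrix can multiply a 3x4 matrix because the inner dimensions match, giving a 2x4 result.

import numpy as np

A = np.ones((2, 3))   # n x m
B = np.ones((3, 4))   # q x r, with m == q
(A @ B).shape         # (2, 4): the result is n x r
# np.ones((2, 3)) @ np.ones((4, 5)) would raise a shape-mismatch error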
(X*w)^T can be simplified to…
(w^T) * (X^T)
(ABCD)^T can be simplified to…
( (AB) (CD) )^T
( (AB) (CD) )^T can be simplified to…
(CD)^T * (AB)^T
(CD)^T * (AB)^T can be simplified to…
D^T * C^T * B^T * A^T
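These identities are easy to check numerically; a minimal sketch with random square matrices (numpy):

import numpy as np

rng = np.random.default_rng(0)
A, B, C, D = [rng.standard_normal((3, 3)) for _ in range(4)]

lhs = (A @ B @ C @ D).T
rhs = D.T @ C.T @ B.T @ A.T
np.allclose(lhs, rhs)   # True: (ABCD)^T = D^T C^T B^T A^T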
What is the partial derivative with respect to w of:
w^T * x
x
What is the partial derivative with respect to w of:
x^T * w
x
What is the partial derivative with respect to w of:
w^T * w
2w
What is the partial derivative with respect to w of:
w^T * c*w
2cw
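These derivatives can be sanity-checked against a finite-difference gradient (a minimal sketch, numpy assumed; numerical_grad is a helper written here, not a library function):

import numpy as np

def numerical_grad(f, w, eps=1e-6):
    # central-difference approximation of the gradient of f at w
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (f(w + e) - f(w - e)) / (2 * eps)
    return g

w = np.array([1.0, -2.0, 0.5])
x = np.array([3.0, 1.0, 4.0])
c = 2.5

np.allclose(numerical_grad(lambda w: w @ x, w), x)                 # d(w^T x)/dw = x
np.allclose(numerical_grad(lambda w: w @ w, w), 2 * w)             # d(w^T w)/dw = 2w
np.allclose(numerical_grad(lambda w: c * (w @ w), w), 2 * c * w)   # d(w^T c w)/dw = 2cw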
Multiplying a scalar by an identity matrix results in…
A diagonal matrix with the scalar value on every diagonal element (and zeros elsewhere).
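For example (numpy):

import numpy as np

3.0 * np.eye(3)
# array([[3., 0., 0.],
#        [0., 3., 0.],
#        [0., 0., 3.]])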