Lecture 6 - Linear Models Flashcards

1
Q

What kind of model are linear models?

A

Geometric models

2
Q

What are linear functions?

A

f(ax1 + bx2) = a·f(x1) + b·f(x2)
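
This property can be checked numerically; a minimal sketch, assuming a hypothetical linear function f(x) = 3x and arbitrary coefficients:

```python
def f(x):
    # A hypothetical linear function: f(x) = 3x
    return 3 * x

a, b = 2.0, 5.0      # arbitrary coefficients
x1, x2 = 1.0, 4.0    # arbitrary inputs

# Linearity: f(a*x1 + b*x2) equals a*f(x1) + b*f(x2)
lhs = f(a * x1 + b * x2)     # f(2*1 + 5*4) = f(22) = 66
rhs = a * f(x1) + b * f(x2)  # 2*3 + 5*12 = 66
```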

3
Q

What do geometric concepts such as lines and planes do?

A
  • Impose structure
  • Represent similarity between points
4
Q

List 4 properties of linear models

A
  • Simple
  • Parametric
  • Stable
  • Prone to underfitting
5
Q

What does parametric mean?

A

We know in advance which parameters need to be learned

6
Q

What is univariate linear regression?

A

In univariate linear regression, f(xi) = a·xi + b, where b is referred to as the intercept and a is called the regression coefficient.

7
Q

What is linear regression?

A

Linear regression is about finding the parameters a and b such that the sum of residuals, the errors f(xi) − f̂(xi), is minimised

8
Q

How do we find the parameters a and b?

A

The goal is minimising the sum of squared residuals. We set the partial derivatives with respect to a and b to 0 and solve for a and b
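
Setting the partial derivatives to zero yields the familiar closed form a = cov(x, y)/var(x), b = ȳ − a·x̄; a minimal sketch with hypothetical toy data (y is roughly 2x + 1 with a little noise):

```python
import numpy as np

# Hypothetical toy data: y is roughly 2x + 1 with a little noise
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.9])

# Setting the partial derivatives of the squared error to zero gives:
#   a = cov(x, y) / var(x),  b = mean(y) - a * mean(x)
a = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - a * x.mean()
```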

9
Q

Linear regression is susceptible to ____

A

outliers

10
Q

Name 2 ways we can evaluate the performance of regression

A
  • Root mean squared error (RMSE)
  • R², the coefficient of determination: R² = 1 − RSS/TSS, where RSS is the residual sum of squares and TSS the total sum of squares
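
These metrics can be computed directly; a minimal sketch on hypothetical predictions:

```python
import numpy as np

# Hypothetical true values and model predictions
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])

rss = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
tss = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
rmse = np.sqrt(rss / len(y_true))            # root mean squared error
r2 = 1 - rss / tss                           # coefficient of determination
```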
11
Q

What can we do about outliers? Name 2 solutions

A
  1. Using the ordinary least squares method
    Train the model
    Filter out the noisy points based on the residuals plot
    Retrain the model
  2. Using the total least squares method
    Total least squares generalises the least squares method to the situation where both the x and y values are noisy.
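
The first approach can be sketched as a train / filter / retrain loop; a minimal version, where the fit helper, the toy data (one outlier at x = 4), and the residual threshold are all hypothetical choices:

```python
import numpy as np

def fit_line(x, y):
    # Ordinary least squares for f(x) = a*x + b
    a = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b = y.mean() - a * x.mean()
    return a, b

# Hypothetical toy data with one outlier at x = 4
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 1.0, 2.0, 3.0, 10.0])

# 1. Train the model
a, b = fit_line(x, y)

# 2. Filter out noisy points based on the residuals
residuals = y - (a * x + b)
keep = np.abs(residuals) < 2.0  # threshold chosen by inspecting the residual plot

# 3. Retrain the model on the remaining points
a2, b2 = fit_line(x[keep], y[keep])
```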
12
Q

If we have very few data points during training, the linear model might not be representative of the test data, why?

A

With few training points the model can overfit: residuals are low on the training data but high on the test data

13
Q

What is regularisation?

A

Regularisation is a general method to avoid overfitting by applying additional constraints to the weight vector w. A common approach is to make sure the weights are, on average, small in magnitude: this is referred to as shrinkage.

14
Q

What is the regularised regression expression?

A

w* = argmin_w (y − Xw)ᵀ(y − Xw) + λ||w||², where λ is the regularisation parameter controlling shrinkage
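
For the shrinkage (ridge) penalty λ||w||², the minimiser has the closed form w = (XᵀX + λI)⁻¹Xᵀy; a minimal sketch, where λ and the toy data are hypothetical:

```python
import numpy as np

# Hypothetical toy design matrix with an intercept column of 1s
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])  # roughly y = 2x + 1
lam = 0.1                           # regularisation parameter lambda

# Ridge closed form: w = (X^T X + lambda * I)^(-1) X^T y
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
```

With a small λ the solution stays close to the ordinary least squares fit (intercept ≈ 1, slope ≈ 2), but is pulled slightly toward zero.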

15
Q

What is a perceptron in linear classification?

A

A linear classifier that will achieve perfect separation on linearly separable data is the perceptron. The perceptron iterates over the training set, updating the weight vector every time it encounters an incorrectly classified example.

16
Q

A simple neural network has which 2 parameters?

A
  • Weight vector w (length: number of features + 1)
  • Learning rate η: how fast the perceptron converges to the separating line
17
Q

What is the weight vector update equation?

A

w' = w + η·yi·xi
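
The update rule can be sketched as a training loop; a minimal version, assuming labels yi in {−1, +1} and inputs extended with a constant 1 for the bias (the toy data are hypothetical):

```python
import numpy as np

def perceptron(X, y, eta=1.0, epochs=100):
    # X: inputs with a leading bias column of 1s; y: labels in {-1, +1}
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:        # misclassified (or on the boundary)
                w = w + eta * yi * xi     # update rule: w' = w + eta * yi * xi
                mistakes += 1
        if mistakes == 0:                 # converged: perfect separation
            break
    return w

# Hypothetical linearly separable toy data: positive iff x > 1.5
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([-1, -1, 1, 1])
w = perceptron(X, y)
```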

18
Q

What is an SVM

A

Support vector machine: for a given training set and decision boundary, let m+ be the smallest margin of any positive example and m− the smallest margin of any negative example. We want the sum m+ + m− to be as large as possible, and we want m+ and m− to be equal.

19
Q

what is margin

A

The margin is m/||w||,
where m is the distance between the decision boundary and the nearest training instances, measured along w.

20
Q

How do we maximise the margin?

A

By minimising ||w||, or equivalently ½||w||²

21
Q

What is the final quadratic optimisation problem for maximizing the margin

A

w*, t* = argmin_{w,t} ½||w||², subject to yi(w·xi − t) ≥ 1 for all training examples (xi, yi)
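
For intuition, the optimisation can be solved by hand on a tiny hypothetical example: two 1-D points, x = 0 labelled −1 and x = 2 labelled +1. The constraints yi(w·xi − t) ≥ 1 are tight at both support vectors, giving w = 1 and t = 1:

```python
# Two support vectors on a line (hypothetical): x=0 labelled -1, x=2 labelled +1
# Tight constraints at the support vectors:
#   -1 * (w*0 - t) = 1   ->  t = 1
#   +1 * (w*2 - t) = 1   ->  2w - t = 1  ->  w = 1
w, t = 1.0, 1.0

# Check the constraints y_i * (w*x_i - t) >= 1
points = [(0.0, -1), (2.0, +1)]
assert all(y * (w * x - t) >= 1 for x, y in points)

# Geometric margin on each side is 1/||w||
margin = 1.0 / abs(w)
```

Any steeper w would also separate the points but give a smaller margin 1/||w||, which is why we minimise ½||w||².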

22
Q

An SVM can have 2 types of margins

A
  1. Hard margin: strict
  2. Soft margin: can allow some exceptions inside the margin
23
Q

What if the data is not linearly separable?

A

Allow a mixture of classes (positive and negative) inside the margin
-> choosing a margin that allows misclassification is an example of the bias-variance trade-off.

24
Q

what is a slack variable

A

It allows some points to be inside the margin or on the wrong side of the decision boundary

25
Q

What is the complexity parameter C?

A

A user-defined regularisation parameter trading off margin maximisation against slack variable minimisation: a high value of C means that margin errors incur a high penalty, while a low value permits more margin errors in order to achieve a large margin.

26
Q

How can we extend a linear classifier to a non-linear one?

A

Via a non-linear mapping of the data from the original input space to a new feature space where linear classification can be applied.

27
Q

What are kernels?

A

A kernel is a function applied to pairs of data points.

28
Q

Name 2 kernels

A
  1. Polynomial kernel
  2. Gaussian kernel
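
Both kernels can be written down directly; a minimal sketch, where the degree, the constant c, gamma, and the test vectors are all hypothetical choices:

```python
import numpy as np

def polynomial_kernel(x, z, degree=2, c=1.0):
    # K(x, z) = (x . z + c)^degree
    return (np.dot(x, z) + c) ** degree

def gaussian_kernel(x, z, gamma=0.5):
    # K(x, z) = exp(-gamma * ||x - z||^2), also known as the RBF kernel
    return np.exp(-gamma * np.linalg.norm(x - z) ** 2)

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.0])
k_poly = polynomial_kernel(x, z)  # (1*3 + 2*0 + 1)^2 = 16
k_rbf = gaussian_kernel(x, z)     # exp(-0.5 * 8) = exp(-4)
```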