Lecture 6 - Linear Models Flashcards

1
Q

What kind of model are linear models?

A

Geometric models

2
Q

What are linear functions?

A

f(ax1 + bx2) = a·f(x1) + b·f(x2)
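
This property can be checked numerically; a minimal sketch, assuming a hypothetical linear function f(x) = 3x and arbitrary coefficients:

```python
def f(x):
    # A hypothetical linear function: f(x) = 3x
    return 3 * x

a, b = 2.0, 5.0      # arbitrary coefficients
x1, x2 = 1.0, 4.0    # arbitrary inputs

# Linearity: f(a*x1 + b*x2) equals a*f(x1) + b*f(x2)
lhs = f(a * x1 + b * x2)     # f(2*1 + 5*4) = f(22) = 66
rhs = a * f(x1) + b * f(x2)  # 2*3 + 5*12 = 66
```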

3
Q

What do geometric concepts such as lines and planes do?

A
  • Impose structure
  • Represent similarity between points
4
Q

List 4 properties of linear models

A
  • Simple
  • Parametric
  • Stable
  • Prone to underfitting
5
Q

What does parametric mean?

A

We know in advance which parameters need to be learned

6
Q

What is univariate linear regression?

A

In univariate linear regression, f(xi) = a·xi + b, where b is referred to as the intercept and a is called the regression coefficient.

7
Q

What is linear regression?

A

Linear regression is about finding the parameters a and b such that the sum of residuals, the errors f(xi) − f̂(xi), is minimised

8
Q

How do we find the parameters a and b?

A

The goal is minimising the sum of squared residuals. We set the partial derivatives with respect to a and b to 0 and solve for a and b
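
Setting the partial derivatives to zero yields the familiar closed form a = cov(x, y)/var(x), b = ȳ − a·x̄; a minimal sketch with hypothetical toy data (y is roughly 2x + 1 with a little noise):

```python
import numpy as np

# Hypothetical toy data: y is roughly 2x + 1 with a little noise
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.9])

# Setting the partial derivatives of the squared error to zero gives:
#   a = cov(x, y) / var(x),  b = mean(y) - a * mean(x)
a = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - a * x.mean()
```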

9
Q

Linear regression is susceptible to ____

A

outliers

10
Q

Name 2 ways we can evaluate the performance of regression

A
  • Root mean squared error (RMSE)
  • R², the coefficient of determination: R² = 1 − RSS/TSS, where RSS is the residual sum of squares and TSS the total sum of squares
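
These metrics can be computed directly; a minimal sketch on hypothetical predictions:

```python
import numpy as np

# Hypothetical true values and model predictions
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])

rss = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
tss = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
rmse = np.sqrt(rss / len(y_true))            # root mean squared error
r2 = 1 - rss / tss                           # coefficient of determination
```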
11
Q

What can we do about outliers? Name 2 solutions

A
  1. Using the ordinary least squares method
    Train the model
    Filter out the noisy points based on the residuals plot
    Retrain the model
  2. Using the total least squares method
    Total least squares generalises the least squares method to the situation where both the x and y values are noisy.
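
The first approach can be sketched as a train / filter / retrain loop; a minimal version, where the fit helper, the toy data (one outlier at x = 4), and the residual threshold are all hypothetical choices:

```python
import numpy as np

def fit_line(x, y):
    # Ordinary least squares for f(x) = a*x + b
    a = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b = y.mean() - a * x.mean()
    return a, b

# Hypothetical toy data with one outlier at x = 4
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 1.0, 2.0, 3.0, 10.0])

# 1. Train the model
a, b = fit_line(x, y)

# 2. Filter out noisy points based on the residuals
residuals = y - (a * x + b)
keep = np.abs(residuals) < 2.0  # threshold chosen by inspecting the residual plot

# 3. Retrain the model on the remaining points
a2, b2 = fit_line(x[keep], y[keep])
```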
12
Q

If we have very few data points during training, the linear model might not be representative of the test data, why?

A

With few training points the model can overfit: residuals are low on the training data but high on the test data

13
Q

What is regularisation?

A

Regularisation is a general method to avoid overfitting by applying additional constraints to the weight vector w. A common approach is to make sure the weights are, on average, small in magnitude: this is referred to as shrinkage.

14
Q

What is the regularised regression expression?

A

w* = argmin_w (y − Xw)ᵀ(y − Xw) + λ||w||², where λ is the regularisation parameter controlling shrinkage
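
For the shrinkage (ridge) penalty λ||w||², the minimiser has the closed form w = (XᵀX + λI)⁻¹Xᵀy; a minimal sketch, where λ and the toy data are hypothetical:

```python
import numpy as np

# Hypothetical toy design matrix with an intercept column of 1s
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])  # roughly y = 2x + 1
lam = 0.1                           # regularisation parameter lambda

# Ridge closed form: w = (X^T X + lambda * I)^(-1) X^T y
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
```

With a small λ the solution stays close to the ordinary least squares fit (intercept ≈ 1, slope ≈ 2), but is pulled slightly toward zero.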

15
Q

What is a perceptron in linear classification?

A

A linear classifier that will achieve perfect separation on linearly separable data is the perceptron. The perceptron iterates over the training set, updating the weight vector every time it encounters an incorrectly classified example.

16
Q

A simple neural network has which 2 parameters?

A
  • Weight vector w (length: number of features + 1)
  • Learning rate η: how fast the perceptron converges to the separating line
17
Q

What is the weight vector update equation?

A

w' = w + η·yi·xi
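
The update rule can be sketched as a training loop; a minimal version, assuming labels yi in {−1, +1} and inputs extended with a constant 1 for the bias (the toy data are hypothetical):

```python
import numpy as np

def perceptron(X, y, eta=1.0, epochs=100):
    # X: inputs with a leading bias column of 1s; y: labels in {-1, +1}
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:        # misclassified (or on the boundary)
                w = w + eta * yi * xi     # update rule: w' = w + eta * yi * xi
                mistakes += 1
        if mistakes == 0:                 # converged: perfect separation
            break
    return w

# Hypothetical linearly separable toy data: positive iff x > 1.5
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([-1, -1, 1, 1])
w = perceptron(X, y)
```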

18
Q

What is an SVM

A

Support vector machine: for a given training set and decision boundary, let m+ be the smallest margin of any positive example and m− the smallest margin of any negative example. We want the sum m+ + m− to be as large as possible, and we want m+ and m− to be equal.

19
Q

what is margin

A

The margin is m/||w||,
where m is the distance between the decision boundary and the nearest training instances, measured along w.

20
Q

How do we maximise the margin?

A

By minimising ||w||, or equivalently ½||w||²

21
Q

What is the final quadratic optimisation problem for maximizing the margin

A

w*, t* = argmin_{w,t} ½||w||², subject to yi(w·xi − t) ≥ 1 for all training examples (xi, yi)
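
For intuition, the optimisation can be solved by hand on a tiny hypothetical example: two 1-D points, x = 0 labelled −1 and x = 2 labelled +1. The constraints yi(w·xi − t) ≥ 1 are tight at both support vectors, giving w = 1 and t = 1:

```python
# Two support vectors on a line (hypothetical): x=0 labelled -1, x=2 labelled +1
# Tight constraints at the support vectors:
#   -1 * (w*0 - t) = 1   ->  t = 1
#   +1 * (w*2 - t) = 1   ->  2w - t = 1  ->  w = 1
w, t = 1.0, 1.0

# Check the constraints y_i * (w*x_i - t) >= 1
points = [(0.0, -1), (2.0, +1)]
assert all(y * (w * x - t) >= 1 for x, y in points)

# Geometric margin on each side is 1/||w||
margin = 1.0 / abs(w)
```

Any steeper w would also separate the points but give a smaller margin 1/||w||, which is why we minimise ½||w||².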

22
Q

An SVM can have 2 types of margins

A
  1. Hard margin: strict
  2. Soft margin: can allow some exceptions inside the margin
23
Q

What if the data is not linearly separable?

A

Allow a mixture of classes (positive and negative) inside the margin
-> choosing a margin that allows misclassification is an example of the bias-variance trade-off.

24
Q

what is a slack variable

A

It allows some points to be inside the margin or on the wrong side of the decision boundary

25
Q

What is the complexity parameter C?

A

A user-defined regularisation parameter trading off margin maximisation against slack variable minimisation: a high value of C means that margin errors incur a high penalty, while a low value permits more margin errors in order to achieve a large margin.

26
Q

How can we extend a linear classifier to a non-linear one?

A

Via a non-linear mapping of the data from the original input space to a new feature space where linear classification can be applied.

27
Q

What are kernels?

A

A kernel is a function applied to pairs of data points.

28
Q

Name 2 kernels

A
  1. Polynomial kernel
  2. Gaussian kernel
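
Both kernels can be written down directly; a minimal sketch, where the degree, the constant c, gamma, and the test vectors are all hypothetical choices:

```python
import numpy as np

def polynomial_kernel(x, z, degree=2, c=1.0):
    # K(x, z) = (x . z + c)^degree
    return (np.dot(x, z) + c) ** degree

def gaussian_kernel(x, z, gamma=0.5):
    # K(x, z) = exp(-gamma * ||x - z||^2), also known as the RBF kernel
    return np.exp(-gamma * np.linalg.norm(x - z) ** 2)

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.0])
k_poly = polynomial_kernel(x, z)  # (1*3 + 2*0 + 1)^2 = 16
k_rbf = gaussian_kernel(x, z)     # exp(-0.5 * 8) = exp(-4)
```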