Regression Flashcards

1
Q

How can Linear Regression be extended?

A

With regularisers, e.g. L2 regularisation (ridge regression)

2
Q

What is regression even for?

A

For predicting continuous (numeric) target values - the setting where classification is inappropriate

3
Q

What is Linear Regression?

A

Linear Regression is an attempt to build a linear model to predict the target values, by finding a weight for each attribute.

It captures a relationship between variables or attributes, under the assumption that the relationship is linear.

x = w0 + sum_i(wi * ai)

x = the (numeric) class value
wi = the weights
ai = the attribute values
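As a minimal sketch of the prediction formula above (the weights and attribute values are made up for illustration):

```python
# Linear regression prediction: x = w0 + sum_i(wi * ai)
def predict(weights, attributes):
    """weights[0] is the intercept w0; weights[1:] pair with the attributes."""
    w0 = weights[0]
    return w0 + sum(w * a for w, a in zip(weights[1:], attributes))

# Hypothetical weights for a 3-attribute data set
weights = [1.0, 0.5, -2.0, 0.25]
attributes = [4.0, 1.0, 8.0]
print(predict(weights, attributes))  # 1.0 + 2.0 - 2.0 + 2.0 = 3.0
```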

4
Q

How to choose the best line for linear regression?

A

Option 1) Find the line that minimises the distance between all points and the line
- e.g. Euclidean distance: d(a, b) = sqrt(sum_i((ai - bi)^2))

Option 2) Least squares estimation: find the line that minimises the sum of the squares of the vertical distances between the predicted and the observed values
- i.e. minimise the Residual Sum of Squares (RSS), aka the Sum of Squared Errors (SSE):
RSS(beta) = sum_i((yi - beta * xi)^2)
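Option 2 can be illustrated for the one-parameter model y ≈ beta * x, where minimising RSS(beta) has a closed-form solution (a sketch on made-up data):

```python
# Least squares for the one-parameter model y ≈ beta * x:
# RSS(beta) = sum_i((yi - beta*xi)^2), minimised in closed form by
# beta = sum(xi*yi) / sum(xi^2)   (set dRSS/dbeta = 0 and solve)
def fit_beta(xs, ys):
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def rss(beta, xs, ys):
    return sum((y - beta * x) ** 2 for x, y in zip(xs, ys))

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.1, 5.9]   # roughly y = 2x (made-up data)
beta = fit_beta(xs, ys)
print(beta, rss(beta, xs, ys))
```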

5
Q

Which metric to use for linear regression to find the best line?

A
  • Actual choice of metric isn't that important; they're all fairly stable

Just use either

  • Root mean-squared error
  • Root relative squared error
  • Correlation coefficient
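For illustration, two of these metrics computed directly on toy data (a sketch; the numbers are made up):

```python
import math

# Root mean-squared error and correlation coefficient, computed directly
def rmse(pred, actual):
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(pred))

def correlation(pred, actual):
    n = len(pred)
    mp, ma = sum(pred) / n, sum(actual) / n
    cov = sum((p - mp) * (a - ma) for p, a in zip(pred, actual))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    sa = math.sqrt(sum((a - ma) ** 2 for a in actual))
    return cov / (sp * sa)

pred = [1.0, 2.0, 3.0]
actual = [1.1, 1.9, 3.2]
print(rmse(pred, actual), correlation(pred, actual))
```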
6
Q

WTF is a Regression Tree?

A

Extension of Decision Trees, where the “class” (value) at each leaf is calculated by averaging over the values of all instances at that node

7
Q

WTF is a Model Tree?

A

Generalised regression trees where the class at each leaf is calculated via linear regression over training instances at that node

Basically partitioning our data set and applying linear regression to each partition

As you work down the tree, the leaf you reach determines which linear regression model to apply to the test instance

8
Q

Regression vs Model trees

A

Model trees have advantages over regression trees in both compactness and prediction accuracy, because model trees can exploit local linearity in the data

Regression trees will never give a predicted value lying outside the range observed in the training cases, whereas model trees can extrapolate

9
Q

How to translate a regression task into a simple classification task?

A

Can map a continuous class onto discrete classes via DISCRETISATION

  • Set the range of the continuous variable that corresponds to each discrete class
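A minimal sketch of discretisation, with hypothetical class labels and range boundaries:

```python
# Discretisation: map a continuous target onto discrete classes by
# assigning a value range to each class (the ranges here are made up)
def discretise(value, boundaries, labels):
    """boundaries are the upper bounds of each class except the last."""
    for bound, label in zip(boundaries, labels):
        if value <= bound:
            return label
    return labels[-1]

labels = ["low", "medium", "high"]
boundaries = [10.0, 20.0]
print(discretise(7.5, boundaries, labels))   # "low"
print(discretise(25.0, boundaries, labels))  # "high"
```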
10
Q

How to translate a classification task into a suite of regression tasks?

A

MULTI-RESPONSE LINEAR REGRESSION

  • Perform one regression per discrete class
    • With all positive instances set to 1 and all negative instances set to 0
  • Classify a given test instance by estimating its value relative to each class, and selecting the class with the highest value
  • Approximates a numeric membership function for each class
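A minimal sketch of the classification step, assuming hypothetical per-class weight vectors have already been learned (they are not fitted here):

```python
# Multi-response linear regression: one linear model per class, trained
# with targets of 1 (positive) / 0 (negative); classify a test instance
# by the class whose model gives the highest output.
def linear_output(weights, attributes):
    return weights[0] + sum(w * a for w, a in zip(weights[1:], attributes))

def classify(models, attributes):
    return max(models, key=lambda cls: linear_output(models[cls], attributes))

models = {
    "yes": [0.1, 0.8, -0.2],   # hypothetical w0, w1, w2 for class "yes"
    "no":  [0.9, -0.5, 0.1],
}
print(classify(models, [1.0, 0.5]))
```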
11
Q

WTF is Maximum Likelihood Estimation?

A

The goal is to search for a value of beta so that the probability P(y = 1 | x) = h_beta(x) is large when x belongs to the "1" class, and small when x belongs to the "0" class (so that P(y = 0 | x) is large)

12
Q

Linear Regression uses gradient descent, what does Logistic Regression use?

A

Logistic Regression tries to maximise the log-likelihood, so it uses gradient ascent (equivalently, gradient descent on the negative log-likelihood)

13
Q

Can Logistic Regression be applied to multi-class classification?

A

By default, no, only for binary classification.

However, it can be extended to multi-class classification by assuming a multinomial distribution.
- Multinomial Logistic Regression
Applies softmax - a generalisation of the logistic function to J dimensions.
- Results in a J-dimensional vector of real values in the range (0, 1) that add up to 1
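The softmax function itself can be sketched as:

```python
import math

# Softmax: generalises the logistic function to J classes; the outputs
# lie in (0, 1) and sum to 1.
def softmax(scores):
    m = max(scores)                        # subtract max for numeric stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
```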

14
Q

Logistic Regression Pros and Cons

A

PROS

  • Simple yet low-bias classifier
  • Unlike Naïve Bayes, not confounded by diverse, correlated features

CONS

  • Slow to train
  • Some feature scaling issues
  • Often needs a lot of data to work well
  • Choosing a regulariser is a nuisance, but important since OVERFITTING is a problem: regularisation adds constraints on the parameter space
15
Q

What is Regression? How is it similar to Classification, and how is it different?

A
Regression is used when the target attribute (class) is numeric (continuous).
Consequently, we can't assess the likelihood of each class like we can in Classification.
16
Q

How do we build a linear regression model? What is RSS and what advantage does it have over some alternatives?

A

RSS (the Residual Sum of Squares) is the sum of squared differences between our predicted values and the actual target values from the training data.

We build the model by learning the weights that minimise RSS, typically using Gradient Descent. An advantage of RSS over some alternatives (e.g. absolute error) is that it is smooth and convex, so Gradient Descent can reach the global minimum.

17
Q

How can we use a Decision Tree to do Regression?

A

We can have a single prediction value at each leaf of the tree (this is a Regression tree)

Alternatively, we can apply a linear regression over the instances at each leaf of the tree (a Model tree)

18
Q

What is Logistic Regression?

A

Logistic Regression is an attempt to build a model where the target is close to “1” for positive instances of the class, and close to “0” for the negative instances of the class

19
Q

How is Logistic Regression similar to Naïve Bayes and how is it different?

A

Both Naïve Bayes and Logistic Regression are attempting to find the class c for a test instance T, by maximising P(c | T)

In Naïve Bayes, we make some simplifying assumptions, most notably that the attributes are conditionally independent given the class label - hence the product over all the attributes

In Logistic Regression, we attempt to model P(c | T) directly, without the simplifying assumption of independence. This is possible because we don't attempt to model how the data is generated; we only attempt to discriminate amongst the various classes

20
Q

What is “Logistic”? What are we “Regressing”?

A

Logistic function: f(m) = 1 / (1 + e^(-m))

Regression output: m = beta . x (the dot product of the weights and the attribute values)
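A minimal sketch of both parts, with made-up weights:

```python
import math

# Logistic function applied to the regression output m = beta . x
def logistic(m):
    return 1.0 / (1.0 + math.exp(-m))

def regression_output(beta, x):
    return sum(b * xi for b, xi in zip(beta, x))

beta = [0.5, -1.0]          # hypothetical weights
x = [2.0, 0.5]
print(logistic(regression_output(beta, x)))
```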

21
Q

How do we train a Logistic Regression model? In particular, what is the significance of the following: argmax_beta(sum_i(yi * log(h_beta(xi)) + (1 - yi) * log(1 - h_beta(xi))))

A

To train, we use the logistic function h_beta(x) = 1 / (1 + e^(-beta . x)) and choose beta to maximise the given expression, which is the log-likelihood of the training data.

  • This means we want the output of the linear regression (beta . x) to be large and positive when the target class is 1, and large and negative when the target class is 0
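A minimal sketch of gradient ascent on this log-likelihood, on a tiny made-up data set (the learning rate and iteration count are arbitrary choices):

```python
import math

# Gradient ascent on the log-likelihood
#   L(beta) = sum_i [ yi*log(h(xi)) + (1-yi)*log(1-h(xi)) ]
# whose gradient for weight j is sum_i (yi - h(xi)) * xij.
def h(beta, x):
    return 1.0 / (1.0 + math.exp(-sum(b * xi for b, xi in zip(beta, x))))

def gradient_ascent_step(beta, xs, ys, lr=0.1):
    grad = [sum((y - h(beta, x)) * x[j] for x, y in zip(xs, ys))
            for j in range(len(beta))]
    return [b + lr * g for b, g in zip(beta, grad)]

# Tiny separable data set (made up); the first feature is a bias term of 1
xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
ys = [0, 0, 1, 1]
beta = [0.0, 0.0]
for _ in range(200):
    beta = gradient_ascent_step(beta, xs, ys)
print(beta)
```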
22
Q

How can Nearest Neighbour/Prototype classifiers be applied to regression tasks?

A

K-NN models can be applied directly, by taking a simple (possibly distance-weighted) combination of the continuous labels associated with the k nearest neighbours.

However, it is less clear how to adapt Nearest Prototype models.
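A minimal sketch of k-NN regression with an unweighted average (the training pairs are made up):

```python
# k-NN regression: predict by averaging the continuous labels of the
# k nearest training instances (unweighted average here; a sketch)
def knn_regress(train, query, k=3):
    """train is a list of (attribute_vector, continuous_label) pairs."""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
    nearest = sorted(train, key=lambda pair: dist(pair[0], query))[:k]
    return sum(label for _, label in nearest) / k

train = [([0.0], 1.0), ([1.0], 2.0), ([2.0], 3.0), ([10.0], 50.0)]
print(knn_regress(train, [1.5], k=3))  # mean of 2.0, 3.0, 1.0 = 2.0
```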

23
Q

How to do Naïve Bayes for continuous-valued features?

A

Probability Density Functions

24
Q

Can Naïve Bayes be applied to regression?

A

Yes, but it’s bad

25
Q

How to deal with continuous features in Decision Trees?

A

Information-Based Supervised Discretisation

26
Q

Can Decision Trees be applied to regression?

A

Yes, Regression Trees and Model Trees

27
Q

How to deal with continuous features in SVMs?

A

Like NN/NP, SVMs natively handle continuous-valued features.

28
Q

Can SVMs be applied to regression?

A

Yes, Support Vector Regressors

Models a tube ("street") around the training data, trying to fit as many points as possible inside it; points on or outside the tube act as the support vectors.

29
Q

How to deal with continuous features in Logistic Regression?

A

Similar to Naïve Bayes, logistic regression models are often defined in terms of discrete features, but kernel density functions can be used to handle continuous features.

30
Q

Why is Logistic Regression considered a classification model?

A

Because it is applied with a binary decision rule (thresholding the predicted probability); however, at its core it very much is a regression model (regressing the log-odds of the class).