Regression Flashcards

1
Q

How can Linear Regression be extended?

A

With regularisers, e.g. L2 regularisation (ridge regression)

2
Q

What is regression even for?

A

For predicting continuous (numeric) target values - the setting where classification is inappropriate

3
Q

What is Linear Regression?

A

Linear Regression is an attempt to build a linear model to predict the target values, by finding a weight for each attribute.

It captures a relationship between variables or attributes, under the assumption that the relationship is linear.

x = w0 + sum_i(wi * ai)

x = the (numeric) class value
wi = the weights
ai = the attribute values
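As a minimal sketch of the prediction formula above (the weights and attribute values are made up for illustration):

```python
# Linear regression prediction: x = w0 + sum_i(wi * ai)
def predict(weights, attributes):
    """weights[0] is the intercept w0; weights[1:] pair with the attributes."""
    w0 = weights[0]
    return w0 + sum(w * a for w, a in zip(weights[1:], attributes))

# Hypothetical weights for a 3-attribute data set
weights = [1.0, 0.5, -2.0, 0.25]
attributes = [4.0, 1.0, 8.0]
print(predict(weights, attributes))  # 1.0 + 2.0 - 2.0 + 2.0 = 3.0
```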

4
Q

How to choose the best line for linear regression?

A

Option 1) Find the line that minimises the distance between all points and the line
- e.g. Euclidean distance: d(a, b) = sqrt(sum_i((ai - bi)^2))

Option 2) Least squares estimation: find the line that minimises the sum of the squares of the vertical distances between the predicted and the observed values
- i.e. minimise the Residual Sum of Squares (RSS), aka the Sum of Squared Errors (SSE):
RSS(beta) = sum_i((yi - beta * xi)^2)
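Option 2 can be illustrated for the one-parameter model y ≈ beta * x, where minimising RSS(beta) has a closed-form solution (a sketch on made-up data):

```python
# Least squares for the one-parameter model y ≈ beta * x:
# RSS(beta) = sum_i((yi - beta*xi)^2), minimised in closed form by
# beta = sum(xi*yi) / sum(xi^2)   (set dRSS/dbeta = 0 and solve)
def fit_beta(xs, ys):
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def rss(beta, xs, ys):
    return sum((y - beta * x) ** 2 for x, y in zip(xs, ys))

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.1, 5.9]   # roughly y = 2x (made-up data)
beta = fit_beta(xs, ys)
print(beta, rss(beta, xs, ys))
```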

5
Q

Which metric to use for linear regression to find the best line?

A
  • Actual choice of metric isn't that important; they're all fairly stable

Just use either

  • Root mean-squared error
  • Root relative squared error
  • Correlation coefficient
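For illustration, two of these metrics computed directly on toy data (a sketch; the numbers are made up):

```python
import math

# Root mean-squared error and correlation coefficient, computed directly
def rmse(pred, actual):
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(pred))

def correlation(pred, actual):
    n = len(pred)
    mp, ma = sum(pred) / n, sum(actual) / n
    cov = sum((p - mp) * (a - ma) for p, a in zip(pred, actual))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    sa = math.sqrt(sum((a - ma) ** 2 for a in actual))
    return cov / (sp * sa)

pred = [1.0, 2.0, 3.0]
actual = [1.1, 1.9, 3.2]
print(rmse(pred, actual), correlation(pred, actual))
```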
6
Q

WTF is a Regression Tree?

A

Extension of Decision Trees, where the “class” (value) at each leaf is calculated by averaging over the values of all instances at that node

7
Q

WTF is a Model Tree?

A

Generalised regression trees where the class at each leaf is calculated via linear regression over training instances at that node

Basically partitioning our data set and applying linear regression to each partition

As you work down the tree, the leaf you reach determines which linear regression model to apply to the test instance

8
Q

Regression vs Model trees

A

Model trees have advantages over regression trees in both compactness and prediction accuracy, because model trees can exploit local linearity in the data

Regression trees will never give a predicted value lying outside the range observed in the training cases, whereas model trees can extrapolate

9
Q

How to translate a regression task into a simple classification task?

A

Can map a continuous class onto discrete classes via DISCRETISATION

  • Set the range of the continuous variable that corresponds to each discrete class
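A minimal sketch of discretisation, with hypothetical class labels and range boundaries:

```python
# Discretisation: map a continuous target onto discrete classes by
# assigning a value range to each class (the ranges here are made up)
def discretise(value, boundaries, labels):
    """boundaries are the upper bounds of each class except the last."""
    for bound, label in zip(boundaries, labels):
        if value <= bound:
            return label
    return labels[-1]

labels = ["low", "medium", "high"]
boundaries = [10.0, 20.0]
print(discretise(7.5, boundaries, labels))   # "low"
print(discretise(25.0, boundaries, labels))  # "high"
```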
10
Q

How to translate a classification task into a suite of regression tasks?

A

MULTI-RESPONSE LINEAR REGRESSION

  • Perform one regression per discrete class
    • With all positive instances set to 1 and all negative instances set to 0
  • Classify a given test instance by estimating its value relative to each class, and selecting the class with the highest value
  • Approximates a numeric membership function for each class
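A minimal sketch of the classification step, assuming hypothetical per-class weight vectors have already been learned (they are not fitted here):

```python
# Multi-response linear regression: one linear model per class, trained
# with targets of 1 (positive) / 0 (negative); classify a test instance
# by the class whose model gives the highest output.
def linear_output(weights, attributes):
    return weights[0] + sum(w * a for w, a in zip(weights[1:], attributes))

def classify(models, attributes):
    return max(models, key=lambda cls: linear_output(models[cls], attributes))

models = {
    "yes": [0.1, 0.8, -0.2],   # hypothetical w0, w1, w2 for class "yes"
    "no":  [0.9, -0.5, 0.1],
}
print(classify(models, [1.0, 0.5]))
```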
11
Q

WTF is Maximum Likelihood Estimation?

A

The goal is to search for a value of beta so that the probability P(y = 1 | x) = h_beta(x) is large when x belongs to the "1" class, and small when x belongs to the "0" class (so that P(y = 0 | x) is large)

12
Q

Linear Regression uses gradient descent, what does Logistic Regression use?

A

Logistic Regression tries to maximise the log-likelihood, so it uses gradient ascent (equivalently, gradient descent on the negative log-likelihood)

13
Q

Can Logistic Regression be applied to multi-class classification?

A

By default, no, only for binary classification.

However, it can be extended to multi-class classification by assuming a multinomial distribution.
- Multinomial Logistic Regression
Applies softmax - a generalisation of the logistic function to J dimensions.
- Results in a J-dimensional vector of real values in the range (0, 1) that add up to 1
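The softmax function itself can be sketched as:

```python
import math

# Softmax: generalises the logistic function to J classes; the outputs
# lie in (0, 1) and sum to 1.
def softmax(scores):
    m = max(scores)                        # subtract max for numeric stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
```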

14
Q

Logistic Regression Pros and Cons

A

PROS

  • Simple yet low-bias classifier
  • Unlike Naïve Bayes, not confounded by diverse, correlated features

CONS

  • Slow to train
  • Some feature scaling issues
  • Often needs a lot of data to work well
  • Choosing a regulariser is a nuisance, but important since OVERFITTING is a problem: regularisation adds constraints on the parameter space
15
Q

What is Regression? How is it similar to Classification, and how is it different?

A
Regression is used when the target attribute (class) is numeric (continuous).
Consequently, we can't assess the likelihood of each class like we can in Classification.
16
Q

How do we build a linear regression model? What is RSS and what advantage does it have over some alternatives?

A

RSS (the Residual Sum of Squares) is the sum of squared differences between our predicted values and the actual target values from the training data.

We build the model by learning the weights that minimise RSS, typically using Gradient Descent. An advantage of RSS over some alternatives (e.g. absolute error) is that it is smooth and convex, so Gradient Descent can reach the global minimum.

17
Q

How can we use a Decision Tree to do Regression?

A

We can have a single prediction value at each leaf of the tree (this is a Regression tree)

Alternatively, we can apply a linear regression over the instances at each leaf of the tree (a Model tree)

18
Q

What is Logistic Regression?

A

Logistic Regression is an attempt to build a model where the target is close to “1” for positive instances of the class, and close to “0” for the negative instances of the class

19
Q

How is Logistic Regression similar to Naïve Bayes and how is it different?

A

Both Naïve Bayes and Logistic Regression are attempting to find the class c for a test instance T, by maximising P(c | T)

In Naïve Bayes, we make some simplifying assumptions, most notably that the attributes are conditionally independent given the class label - hence the product over all the attributes

In Logistic Regression, we attempt to model P(c | T) directly, without the simplifying assumption of independence. This is possible because we don't attempt to model how the data is generated; we only attempt to discriminate amongst the various classes

20
Q

What is “Logistic”? What are we “Regressing”?

A

Logistic function: f(m) = 1 / (1 + e^(-m))

Regression output: m = beta . x (the dot product of the weights and the attribute values)
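A minimal sketch of both parts, with made-up weights:

```python
import math

# Logistic function applied to the regression output m = beta . x
def logistic(m):
    return 1.0 / (1.0 + math.exp(-m))

def regression_output(beta, x):
    return sum(b * xi for b, xi in zip(beta, x))

beta = [0.5, -1.0]          # hypothetical weights
x = [2.0, 0.5]
print(logistic(regression_output(beta, x)))
```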

21
Q

How do we train a Logistic Regression model? In particular, what is the significance of the following: argmax_beta(sum_i(yi * log(h_beta(xi)) + (1 - yi) * log(1 - h_beta(xi))))

A

To train, we use the logistic function h_beta(x) = 1 / (1 + e^(-beta . x)) and choose beta to maximise the given expression, which is the log-likelihood of the training data.

  • This means we want the output of the linear regression (beta . x) to be large and positive when the target class is 1, and large and negative when the target class is 0
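A minimal sketch of gradient ascent on this log-likelihood, on a tiny made-up data set (the learning rate and iteration count are arbitrary choices):

```python
import math

# Gradient ascent on the log-likelihood
#   L(beta) = sum_i [ yi*log(h(xi)) + (1-yi)*log(1-h(xi)) ]
# whose gradient for weight j is sum_i (yi - h(xi)) * xij.
def h(beta, x):
    return 1.0 / (1.0 + math.exp(-sum(b * xi for b, xi in zip(beta, x))))

def gradient_ascent_step(beta, xs, ys, lr=0.1):
    grad = [sum((y - h(beta, x)) * x[j] for x, y in zip(xs, ys))
            for j in range(len(beta))]
    return [b + lr * g for b, g in zip(beta, grad)]

# Tiny separable data set (made up); the first feature is a bias term of 1
xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
ys = [0, 0, 1, 1]
beta = [0.0, 0.0]
for _ in range(200):
    beta = gradient_ascent_step(beta, xs, ys)
print(beta)
```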
22
Q

How can Nearest Neighbour/Prototype classifiers be applied to regression tasks?

A

K-NN models can be applied directly, by taking a simple (possibly distance-weighted) combination of the continuous labels associated with the k nearest neighbours.

However, it is less clear how to adapt Nearest Prototype models.
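A minimal sketch of k-NN regression with an unweighted average (the training pairs are made up):

```python
# k-NN regression: predict by averaging the continuous labels of the
# k nearest training instances (unweighted average here; a sketch)
def knn_regress(train, query, k=3):
    """train is a list of (attribute_vector, continuous_label) pairs."""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
    nearest = sorted(train, key=lambda pair: dist(pair[0], query))[:k]
    return sum(label for _, label in nearest) / k

train = [([0.0], 1.0), ([1.0], 2.0), ([2.0], 3.0), ([10.0], 50.0)]
print(knn_regress(train, [1.5], k=3))  # mean of 2.0, 3.0, 1.0 = 2.0
```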

23
Q

How to do Naïve Bayes for continuous-valued features?

A

Probability Density Functions

24
Q

Can Naïve Bayes be applied to regression?

A

Yes, but it’s bad

25
Q

How to deal with continuous features in Decision Trees?

A

Information-Based Supervised Discretisation

26
Q

Can Decision Trees be applied to regression?

A

Yes, Regression Trees and Model Trees

27
Q

How to deal with continuous features in SVMs?

A

Like NN/NP, SVMs natively handle continuous-valued features.

28
Q

Can SVMs be applied to regression?

A

Yes, Support Vector Regressors

Models a tube ("street") around the training data, trying to fit as many points as possible inside it; points on or outside the tube act as the support vectors.

29
Q

How to deal with continuous features in Logistic Regression?

A

Similar to Naïve Bayes, logistic regression models are often defined in terms of discrete features, but kernel density functions can be used to handle continuous features.

30
Q

Why is Logistic Regression considered a classification model?

A

Because it is applied with a binary decision rule (thresholding the predicted probability); however, at its core it very much is a regression model (regressing the log-odds of the class).