Linear Regression Flashcards
Our job as Machine Learning experts
• Choose a model suitable for classifying the data according to the attributes
• Choose attributes suitable for classifying the data according to the model
• Tune the hyper-parameters of the model
Hyper-parameter tuning
• Compared to the choice of learner and feature representation, the improvement due to “parameter tuning” is typically small
- Usually used as a final stage, to get slightly higher Accuracy with respect to the development data
- Because we are evaluating lots of models, there is a risk of “over-tuning”
- The best choice of hyper-parameters for the development data may not be the best choice of hyper-parameters on the test data
- Hyper-parameter tuning is typically done via grid search (see the sketch below)
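A minimal sketch of grid search over hyper-parameters, assuming a scikit-learn kNN classifier and a separate development set; the parameter grid and variable names are illustrative only.

```python
# Illustrative grid search: pick the hyper-parameters that score best on the dev data.
from itertools import product
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

def grid_search(X_train, y_train, X_dev, y_dev):
    grid = {"n_neighbors": [1, 3, 5, 7], "weights": ["uniform", "distance"]}
    best_acc, best_params = -1.0, None
    for n, w in product(grid["n_neighbors"], grid["weights"]):
        model = KNeighborsClassifier(n_neighbors=n, weights=w).fit(X_train, y_train)
        acc = accuracy_score(y_dev, model.predict(X_dev))  # evaluate on development data
        if acc > best_acc:
            best_acc, best_params = acc, {"n_neighbors": n, "weights": w}
    # Over-tuning risk: the best params on the dev data may not be best on the test data.
    return best_params, best_acc
```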
ML can be viewed as?
optimisation problem
Maximise D(L, θ; F(T)), given an evaluation metric D (like Accuracy), a dataset T, a feature representation F(T), and a learner L with hyper-parameters θ.
Holding F(T) and L fixed, tuning amounts to optimising θ:
θ̂ = argmin_{θ ∈ Θ} Error(θ; L, F(T))
Linear Regression
continuous attributes -> continuous class
Linear regression captures the relationship between two variables, under the assumption that the relationship between them is linear:
- An outcome variable (aka response variable, dependent variable, or label)
- A predictor (aka independent variable, explanatory variable, or feature)
For a single predictor, the model is a line: ŷ = a·x + b
How to choose the best line in Linear Regression?
(1) The line that minimises the distance between all points and the line (Euclidean distance)
(2) Least squares estimation: find the line that minimises the sum of the squares of the vertical distances between the predicted ŷᵢ and the actual yᵢ
• Minimise the Residual Sum of Squares (RSS), aka Sum of Squares Due to Error (SSE): RSS = Σᵢ (yᵢ − ŷᵢ)²
• All attributes are numerical → Grid Search is :-( (not suitable here)
• Partial derivatives can be (easily!) calculated
• (RSS is convex — the local optimum is a global minimum)
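Because the partial derivatives of the RSS can be set to zero and solved directly, simple linear regression also has a closed-form least-squares solution. A minimal sketch, assuming NumPy and illustrative data:

```python
import numpy as np

def least_squares_fit(x, y):
    """Fit y ≈ a*x + b by minimising RSS = Σ (y_i - (a*x_i + b))^2."""
    x_mean, y_mean = x.mean(), y.mean()
    # Setting dRSS/da = 0 and dRSS/db = 0 gives:
    a = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    b = y_mean - a * x_mean
    return a, b

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 4.0, 6.2, 7.9])        # roughly y = 2x
a, b = least_squares_fit(x, y)
rss = np.sum((y - (a * x + b)) ** 2)       # Residual Sum of Squares of the fitted line
```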
How to find the line that has the lowest RSS then?
–> Gradient Descent
Gradient Descent
Gradient descent is an iterative optimisation algorithm for minimising a cost function, and can be applied to minimising the SSE.
We need to pick a value of a and a value of b that minimise the cost function.
If we treat the RSS/SSE as a cost, then every choice of (a, b) gives a certain cost f(a, b); since this function is convex, gradient descent can find the (a, b) that minimise it.
Iterative approximation to Error optimisation
Steps in the Gradient Descent algorithm involve:
• making a prediction for each (training) instance
• comparing the prediction with the actual value
• multiplying by the corresponding attribute value
• updating the weights after all of the training instances have been processed
–> the evaluation metric is effectively built into the model, since we compare the predictions with the actual values during training anyway
Gradient descent takes as input some seed values of a and b and iteratively improves them until we reach the minimum cost, giving us the optimal parameters a_optimal and b_optimal, and hence the desired line y = a_optimal·x + b_optimal (see the sketch below).
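A minimal sketch of batch gradient descent for simple linear regression, assuming NumPy; the seed values, learning rate α, and iteration count are illustrative:

```python
import numpy as np

def gradient_descent(x, y, alpha=0.01, n_iters=1000):
    """Minimise SSE = Σ (y_i - (a*x_i + b))^2 by batch gradient descent."""
    a, b = 0.0, 0.0                             # seed values for the parameters
    n = len(x)
    for _ in range(n_iters):
        y_hat = a * x + b                       # prediction for each training instance
        error = y_hat - y                       # compare prediction with actual value
        grad_a = (2.0 / n) * np.sum(error * x)  # multiply by the corresponding attribute value
        grad_b = (2.0 / n) * np.sum(error)
        a -= alpha * grad_a                     # update weights after all instances processed
        b -= alpha * grad_b
    return a, b
```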
α in GD
- α is a parameter of the algorithm, representing the learning rate (how big a step you take in updating θi).
- If α is too small, the algorithm converges very slowly.
- If α is too large, you might overshoot (miss) the minimum, and the algorithm may fail to converge.
Evaluation of Numeric Prediction
• It clearly doesn’t make sense to evaluate numeric prediction tasks in the same manner as classification tasks, because:
• “direct hits” (exact matches between predicted and actual values) are an unreasonable expectation
• unlike classification, we can make use of the inherent “ordering” and “scale” of the outputs
- RSS
- MSE
- RMSE
- RRSE
- Correlation Coefficient
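A minimal sketch of these metrics, assuming NumPy arrays of actual values y and predictions y_hat; RRSE is taken relative to the baseline that always predicts the mean:

```python
import numpy as np

def regression_metrics(y, y_hat):
    rss = np.sum((y - y_hat) ** 2)                    # Residual Sum of Squares
    mse = rss / len(y)                                # Mean Squared Error
    rmse = np.sqrt(mse)                               # Root Mean Squared Error
    # Root Relative Squared Error: error relative to always predicting the mean of y
    rrse = np.sqrt(rss / np.sum((y - y.mean()) ** 2))
    corr = np.corrcoef(y, y_hat)[0, 1]                # Pearson correlation coefficient
    return {"RSS": rss, "MSE": mse, "RMSE": rmse, "RRSE": rrse, "r": corr}
```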
Which Evaluation Metric to Use?
The relative ranking of methods is reasonably stable across the different metrics, so the actual choice of metric isn't crucial.
Non-linear methods for numeric prediction
- regression trees
- model trees (generalised regression trees)
- locally weighted linear regression
- support vector regression
What to do with discrete attributes in Linear Regression?
Binarization: convert each discrete attribute into one or more binary (0/1) attributes, as sketched below.
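A minimal sketch of binarizing a discrete attribute via one-hot encoding, assuming plain Python and an illustrative “colour” attribute:

```python
def binarize(values, categories=None):
    """Convert a discrete attribute into one binary (0/1) attribute per category."""
    if categories is None:
        categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

# e.g. a "colour" attribute with values red/green/blue
encoded = binarize(["red", "blue", "red", "green"])
# -> [[0, 0, 1], [1, 0, 0], [0, 0, 1], [0, 1, 0]]   (columns: blue, green, red)
```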