Regression Flashcards
What is regression?
- Predict the value of a continuous variable based on other variables assuming a linear or nonlinear model of dependency
Predicted variable (dependent variable): ŷ (y-hat)
Other variables: explanatory (independent) variables
What's the difference between regression and classification?
- Classification predicts nominal (categorical) class attributes, whereas regression predicts continuous attributes
What are regression techniques?
1) Linear Regression
2) Polynomial Regression
3) Local Regression
4) Artificial Neural Networks (ANN)
5) Deep Neural Networks (DNN)
6) K-nearest-Neighbours Regression
Explain k-nearest neighbor regression
- Predict the numeric average of the target values of the k nearest neighbours (see the sketch below)
- Rule of thumb: choose k between 1 and 20
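A minimal sketch of k-NN regression, assuming scikit-learn; the tiny dataset and the choice k = 3 are made up for illustration:

from sklearn.neighbors import KNeighborsRegressor

X_train = [[1.0], [2.0], [3.0], [4.0], [5.0]]   # explanatory variable
y_train = [1.2, 1.9, 3.1, 3.9, 5.2]             # continuous target

# k = 3: the prediction is the average y value of the 3 nearest training examples
knn = KNeighborsRegressor(n_neighbors=3)
knn.fit(X_train, y_train)
print(knn.predict([[2.5]]))                     # average of the 3 nearest neighbours' y values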
How can you evaluate regression models?
Methods for Model Evaluation:
- Cross Validation: 10-fold
- Holdout Validation: 80 % random share for training, 20% for testing
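A minimal sketch of both evaluation setups, assuming scikit-learn; the synthetic data is only for illustration:

import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2.0 * X[:, 0] + rng.normal(0, 1, size=100)

model = LinearRegression()

# 10-fold cross validation: average the score over the 10 folds
scores = cross_val_score(model, X, y, cv=10, scoring="neg_mean_absolute_error")
print(-scores.mean())

# Holdout validation: 80% random share for training, 20% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))   # R² on the held-out 20%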
What metrics for Model Evaluation can be applied?
- Mean Absolute Error (MAE): computes the average deviation between predicted value and actual value
- Mean Squared Error (MSE): Places more emphasis on larger deviation
- Root Mean Squared Error (RMSE): similar scale as MAE with more emphasis on larger deviations
- Pearson's Correlation Coefficient: scores well if high (low) actual values get high (low) predictions
- R Squared (R²): measures the share of the variation in y that is explainable from the explanatory variables
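A minimal sketch of the metrics, assuming scikit-learn and scipy; the predictions are made up:

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from scipy.stats import pearsonr

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.4, 2.0, 6.5])

mae = mean_absolute_error(y_true, y_pred)    # average absolute deviation
mse = mean_squared_error(y_true, y_pred)     # emphasizes larger deviations
rmse = np.sqrt(mse)                          # back on a similar scale as MAE
r, _ = pearsonr(y_true, y_pred)              # Pearson's correlation coefficient
r2 = r2_score(y_true, y_pred)                # explained share of the variation in y
print(mae, mse, rmse, r, r2)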
How do you interpret R²?
R² = 1: perfect model, the total variation of y can be completely explained from X
R² = 0: the model explains none of the variation of y (no better than predicting the mean)
How can you apply regression trees?
- In principle the same as for classification
Differences:
1) Splits are selected by maximizing the MSE reduction (not GINI or entropy)
2) The prediction is the average value of the training examples in a specific leaf (see the sketch below)
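A minimal sketch of a regression tree, assuming a recent scikit-learn version (the criterion name "squared_error" replaced the older "mse"); the data is synthetic:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

# Splits minimize the MSE; each leaf predicts the average y of its training examples
tree = DecisionTreeRegressor(criterion="squared_error", max_depth=3)
tree.fit(X, y)
print(tree.predict([[2.5]]))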
What may happen if your tree has a higher depth?
- It may overfit
- The model starts to memorize individual outliers
What is the assumption of linear regression?
- The target variable y is linearly dependent on explanatory variables x
How do you fit a regression function?
Least-squares approach: Find the weight vector that minimizes the sum of squared error for all training examples
Error: Difference between estimated and real value in training data
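A minimal sketch of the least-squares fit, once in closed form with numpy and once with scikit-learn; the data is synthetic:

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 5.0 + rng.normal(0, 1, size=100)

# Closed-form least squares: minimize the sum of squared errors over the training examples
X_design = np.hstack([np.ones((100, 1)), X])   # add an intercept column
w, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(w)                                       # [intercept, slope]

# The same fit with scikit-learn
lr = LinearRegression().fit(X, y)
print(lr.intercept_, lr.coef_)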
What is ridge regularization?
- Variation of least squares approach (another way to fit a regression function)
- Tries to avoid overfitting by keeping weights small
alpha = 0: ordinary least-squares regression (no penalty)
alpha = 100: strongly regularized, flat curve (strong penalty) - see the sketch below
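A minimal sketch contrasting alpha values, assuming scikit-learn's Ridge; the data is synthetic:

import numpy as np
from sklearn.linear_model import Ridge, LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 1, size=50)

print(LinearRegression().fit(X, y).coef_)   # alpha = 0: ordinary least squares
print(Ridge(alpha=1.0).fit(X, y).coef_)     # mild penalty on large weights
print(Ridge(alpha=100.0).fit(X, y).coef_)   # strong penalty: weights shrink towards 0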
What problems can occur by feature selection for regression?
Problem 1: Highly correlated variables (e.g., height in cm and height in inches)
- the weights become meaningless; one of the variables should be removed
Problem 2: Insignificant variables
- variables that are uncorrelated with the target get w = 0 or relatively small weights assigned
How can you check if a variable with a small weight really is insignificant?
- Statistical test with H0: w = 0 (the variable is insignificant)
- t-stat: number of standard deviations that w is away from 0 (the center of the distribution); a high t-stat means H0 should be rejected
- p-value: probability of wrongly rejecting H0 (a p-value close to 0 means the variable is significant) - see the sketch below
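A minimal sketch of the significance test, assuming statsmodels; x2 is deliberately pure noise, so its t-stat should be near 0 and its p-value large:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)            # insignificant variable
y = 2.0 * x1 + rng.normal(size=100)

X = sm.add_constant(np.column_stack([x1, x2]))
results = sm.OLS(y, X).fit()
print(results.tvalues)               # standard deviations each weight is away from 0
print(results.pvalues)               # small p-value -> reject H0, variable is significant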
What does Interpolation mean?
Interpolating regression:
- predicted values are within the range of the training data values
- is regarded as safe
What does extrapolation mean?
Extrapolating regression:
- May also predict values outside of the training data interval
- Depending on the use case, extrapolation can be relevant (e.g., how far will the sea level have risen by 2050?)
- An explanatory variable outside the range of the training data can result in a predicted dependent variable outside that range
How can the results of a linear regression and a K-NN regression be described?
- Linear regression extrapolates
- KNN and regression trees interpolate
-> Linear regression is sensitive to outliers
Which technique can be applied to non-linear problems?
- Apply transformations to the explanatory variables within the regression function (log, exp, square root; polynomial transformation)
- This allows use of linear regression techniques to fit much more complicated non-linear datasets
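A minimal sketch of such a transformation, assuming scikit-learn; the synthetic data is non-linear in x but linear in log(x):

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.uniform(1, 100, size=200)
y = 3.0 * np.log(x) + rng.normal(0, 0.2, size=200)

X_transformed = np.log(x).reshape(-1, 1)   # transform the explanatory variable
lr = LinearRegression().fit(X_transformed, y)
print(lr.coef_, lr.intercept_)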
Explain polynomial regression
- Extension of linear regression
- can be fitted using the least squares method
- Tendency to overfit the training data for large degrees M (terms up to x^M) - see the sketch below
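A minimal sketch of polynomial regression via a pipeline, assuming scikit-learn; the synthetic data illustrates how a large degree M fits the training data (over-)closely:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = X[:, 0] ** 3 - 2 * X[:, 0] + rng.normal(0, 1, size=50)

for degree in (3, 15):   # modest vs. large M
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(X, y)
    print(degree, model.score(X, y))   # training R²; the large M hugs the training data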
Where does polynomial regression often overfit, and which workarounds can you apply to mitigate overfitting?
- Overfitting often happens in sparse regions
Workarounds:
- decrease M
- increase amount of training data
- Local Regression
What is the idea behind local regression?
Assumption: non-linear problems are approximately linear in local areas
Idea: use linear regression locally for the data point at hand (lazy learning)
- Combination of k-NN and linear regression
How does local regression work?
Given a datapoint:
1) retrieve the k nearest neighbors
2) learn a regression model for those neighbors
3) use the learned model to predict the y value
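A minimal sketch of local regression (k-NN neighbourhood plus a local least-squares fit), assuming scikit-learn; the helper function and data are made up for illustration:

import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

def local_regression_predict(x_query, X_train, y_train, k=15):
    # 1) retrieve the k nearest neighbours of the query point
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    _, idx = nn.kneighbors([x_query])
    # 2) learn a regression model on those neighbours only
    local_model = LinearRegression().fit(X_train[idx[0]], y_train[idx[0]])
    # 3) use the learned model to predict the y value
    return local_model.predict([x_query])[0]

print(local_regression_predict([2.5], X, y))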
What are the advantages of local regression?
Advantage: fits non-linear models well
- good local approximation
- often better than pure k-NN (local regression does not just average the neighbours' values but fits a regression function, which is often more accurate)
What are the disadvantages of local regression?
Disadvantage:
- slow at runtime
- for each test example:
- find k nearest neighbors
- compute local model