Regression Flashcards

1
Q

What is regression?

A
  • Predict the value of a continuous variable based on other variables, assuming a linear or non-linear model of dependency
  • Predicted (dependent) variable: ŷ ("y-hat")
  • Other variables: explanatory variables
2
Q

What's the difference between regression and classification?

A

Classification predicts nominal class attributes, whereas regression predicts continuous attributes.

3
Q

What are regression techniques?

A

1) Linear Regression
2) Polynomial Regression
3) Local Regression
4) Artificial Neural Networks (ANNs)
5) Deep Neural Networks (DNNs)
6) k-Nearest-Neighbours Regression

4
Q

Explain k-nearest neighbor regression

A
  • Use the numeric average of the target values of the k nearest neighbours (e.g. the readings of the k nearest weather stations) as the prediction

Rule of thumb: choose k between 1 and 20
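
A minimal sketch of k-NN regression with scikit-learn; the toy data points are invented purely for illustration:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Toy training data: one feature, continuous target (values invented for illustration)
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([1.2, 1.9, 3.1, 3.9, 5.2, 5.8])

# Prediction = numeric average of the targets of the k nearest training examples
model = KNeighborsRegressor(n_neighbors=3)
model.fit(X, y)
print(model.predict([[3.5]]))  # averages the target values of the 3 closest neighbours
```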

5
Q

How can you evaluate regression models?

A

Methods for Model Evaluation:

  • Cross Validation: 10-fold
  • Holdout Validation: 80% random share for training, 20% for testing
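
A sketch of both evaluation setups with scikit-learn, using synthetic data invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, cross_val_score

# Synthetic data: y ≈ 2x plus noise (invented for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2 * X.ravel() + rng.normal(0, 1, size=100)

# Holdout validation: 80% random share for training, 20% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LinearRegression().fit(X_train, y_train)
print("holdout R^2:", model.score(X_test, y_test))

# 10-fold cross validation
scores = cross_val_score(LinearRegression(), X, y, cv=10)
print("10-fold CV mean R^2:", scores.mean())
```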
6
Q

What metrics for Model Evaluation can be applied?

A
  • Mean Absolute Error (MAE): computes the average deviation between predicted and actual values
  • Mean Squared Error (MSE): places more emphasis on larger deviations
  • Root Mean Squared Error (RMSE): similar scale as MAE, with more emphasis on larger deviations
  • Pearson's Correlation Coefficient: scores well if high (low) actual values get high (low) predictions
  • R Squared (R²): measures the part of the variation in y that is explainable from the explanatory variables
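
A short sketch computing all five metrics on made-up predictions (scikit-learn and SciPy provide the implementations used here):

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Actual vs. predicted values (invented for illustration)
y_true = np.array([3.0, 5.0, 7.5, 10.0])
y_pred = np.array([2.5, 5.5, 7.0, 12.0])

mae = mean_absolute_error(y_true, y_pred)   # average deviation
mse = mean_squared_error(y_true, y_pred)    # emphasizes larger deviations
rmse = np.sqrt(mse)                         # back on the scale of MAE
r, _ = pearsonr(y_true, y_pred)             # Pearson's correlation coefficient
r2 = r2_score(y_true, y_pred)               # share of explained variation
print(mae, mse, rmse, r, r2)
```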
7
Q

How do you interpret R²?

A

R² = 1: perfect model, as the total variation of y can be completely explained from X. R² = 0: the model explains none of the variation (it does no better than always predicting the mean of y).

8
Q

How can you apply regression trees?

A
  • In principle the same as for classification
    Differences:
    1) splits are selected by maximizing the MSE reduction (not Gini or entropy)
    2) the prediction is the average value of the training examples in a specific leaf
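
A minimal regression-tree sketch with scikit-learn; criterion="squared_error" is scikit-learn's name for MSE-based splitting, and the data is invented for illustration:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data with two clearly separated value ranges (invented for illustration)
X = np.array([[1], [2], [3], [10], [11], [12]])
y = np.array([1.0, 1.2, 0.9, 8.0, 8.3, 7.9])

# Splits maximize the MSE reduction; each leaf predicts the average
# target value of the training examples that fall into it
tree = DecisionTreeRegressor(criterion="squared_error", max_depth=2)
tree.fit(X, y)
print(tree.predict([[2], [11]]))  # roughly the two leaf averages
```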
9
Q

What may happen if your tree has a high depth?

A
  • It may overfit: the model starts learning individual outliers in the training data

10
Q

What is the assumption of linear regression?

A
  • The target variable y is linearly dependent on explanatory variables x
11
Q

How do you fit a regression function?

A

Least-squares approach: find the weight vector that minimizes the sum of squared errors over all training examples
Error: difference between the estimated value and the real value in the training data
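
A bare-bones least-squares sketch with NumPy, on invented data, to show what "minimize the sum of squared errors" means in practice:

```python
import numpy as np

# Toy data: y ≈ 2x + 1 with noise (invented for illustration)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Design matrix with a column of ones so the intercept is also a weight
A = np.column_stack([np.ones_like(x), x])

# Least squares: the weight vector minimizing the sum of squared errors
w, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)
print("intercept, slope:", w)
```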

12
Q

What is ridge regularization?

A
  • Variation of least squares approach (another way to fit a regression function)
  • Tries to avoid overfitting by keeping weights small

  • alpha = 0: normal least-squares regression
  • alpha = 100: strongly regularized, flat curve (strong penalty)
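
A small sketch of the effect of alpha with scikit-learn's Ridge, on synthetic data (scikit-learn discourages alpha of exactly 0, so a tiny value stands in for plain least squares):

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data: only three of five features actually matter
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
y = X @ np.array([3.0, 0.0, -2.0, 0.0, 1.0]) + rng.normal(scale=0.5, size=50)

# Larger alpha = stronger penalty on large weights = flatter model
for alpha in [0.001, 1.0, 100.0]:
    model = Ridge(alpha=alpha).fit(X, y)
    print(alpha, np.round(model.coef_, 2))
```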

13
Q

What problems can occur with feature selection for regression?

A

Problem 1: highly correlated variables (e.g. height in cm and height in inches)
- the weights become meaningless; one of the two variables should be removed

Problem 2: insignificant variables
- uncorrelated variables get w = 0 or relatively small weights assigned

14
Q

How can you check if a variable with a small weight really is insignificant?

A
  • Statistical test with H0: w = 0 (the variable is insignificant)
  • t-stat: number of standard deviations that w is away from 0 (the center of the distribution); a high t-stat means that H0 should be rejected
  • p-value: probability of wrongly rejecting H0 (p-value close to 0 → the variable is significant)
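
A sketch with statsmodels, which reports a t-statistic and p-value per weight; the data is synthetic, with the second variable deliberately insignificant:

```python
import numpy as np
import statsmodels.api as sm

# Synthetic data: y depends only on the first of two variables
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = 4.0 * X[:, 0] + rng.normal(size=200)

# OLS tests H0: w = 0 for every weight
results = sm.OLS(y, sm.add_constant(X)).fit()
print(results.tvalues)  # high |t-stat| -> reject H0
print(results.pvalues)  # p-value close to 0 -> variable is significant
```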
15
Q

What does Interpolation mean?

A

Interpolating regression:

  • predicted values are within the range of the training data values
  • is regarded as safe
16
Q

What does extrapolation mean?

A

Extrapolating regression:

  • May also predict values outside of the training data interval
  • Depending on the use case, this can also be relevant (e.g. how far will the sea level have risen by 2050?)

An explanatory variable that is out of range could result in a predicted dependent variable that is out of range.

17
Q

How can the results of linear regression and k-NN regression be described?

A
  • Linear regression extrapolates
  • k-NN and regression trees interpolate

→ Linear regression is sensitive to outliers

18
Q

Which technique can be applied to non-linear problems?

A
  • Apply transformations to the explanatory variables within the regression function (log, exp, square root; polynomial transformation)
  • This allows the use of linear regression techniques to fit much more complicated non-linear datasets
19
Q

Explain polynomial regression

A
  • Extension of linear regression with polynomial terms (x, x², …, x^M)
  • Can be fitted using the least-squares method
  • Tendency to overfit the training data for large degrees M
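
A minimal polynomial-regression sketch with scikit-learn: polynomial feature expansion followed by ordinary linear least squares, on invented data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Noisy non-linear data (invented for illustration)
rng = np.random.default_rng(3)
X = np.sort(rng.uniform(-3, 3, size=(30, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=30)

# Degree M controls the flexibility; a large M tends to overfit
model = make_pipeline(PolynomialFeatures(degree=4), LinearRegression())
model.fit(X, y)
print(model.predict([[0.5]]))
```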
20
Q

Where does polynomial regression often overfit, and which workarounds can you apply to mitigate overfitting?

A
  • Overfitting often happens in sparse regions

Workarounds:

  • decrease M
  • increase amount of training data
  • Local Regression
21
Q

What is the idea behind local regression?

A

Assumption: non-linear problems are approximately linear in local areas
Idea: use linear regression locally for the data point at hand (lazy learning)
- Combination of k-NN and linear regression

22
Q

How does local regression work?

A

Given a data point:

1) retrieve the k nearest neighbors
2) learn a regression model for those neighbors
3) use the learned model to predict the y value
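
A from-scratch sketch of those three steps (the helper name local_regression_predict is made up for this example):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def local_regression_predict(X_train, y_train, x_query, k=5):
    """Lazy learner: fit a linear model only on the neighbourhood of x_query."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    idx = np.argsort(dists)[:k]                                  # 1) k nearest neighbours
    model = LinearRegression().fit(X_train[idx], y_train[idx])   # 2) local linear model
    return model.predict(x_query.reshape(1, -1))[0]              # 3) predict the y value

# Noisy non-linear data (invented for illustration)
rng = np.random.default_rng(4)
X = rng.uniform(0, 10, size=(100, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=100)
print(local_regression_predict(X, y, np.array([2.5]), k=10))
```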

23
Q

What are the advantages of local regression?

A

Advantage: fits non-linear models well

  • good local approximation
  • often better than pure k-NN (local regression does not compute the average of the neighbors but a regression function that is often more accurate)
24
Q

What are the disadvantages of local regression?

A

Disadvantage:

  • slow at runtime
  • for each test example:
    • find k nearest neighbors
    • compute local model
25
Q

How do ANNs for regression differ from ANNs for classification?

A
  • An ANN for classification uses a threshold to determine whether the class is true or false
  • An ANN for regression does not use a cutoff for true or false predictions; it leaves the numeric outputs as they are (between 0 and 1)
26
Q

Can you learn non-linear models with ANNs?

A
  • A single perceptron cannot: it is a linear model where the target variable is a linear combination of the input variables
  • Non-linear regression functions can be approximated by using (multiple) hidden layers (deep ANNs can approximate arbitrary functions)
    But: the flexibility of deep ANNs can result in overfitting → a lot of training data is necessary
27
Q

What is a time series?

A
  • Data points are indexed in time order

- e.g. Stock market prices, Temperature

28
Q

What is different about predicting time series?

A
  • Predict data points that continue the series into the future
  • There is an order in the data
  • Other regression techniques do not use time explicitly
  • Time series forecasting aims to predict future values of the same variable
29
Q

What are the different approaches for time series forecasting?

A

1) Data-driven: smoothing
2) Model-driven:
   a) component models of time series
   b) other regression techniques

30
Q

How does smoothing work?

A

Principle: Use more recent values, as they might matter more

Simple Moving Average (SMA)
- average of last n values

Exponential Moving Average (EMA)
- exponentially decrease weight of older values
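
A short pandas sketch of both smoothers on an invented series (rolling gives the SMA, ewm the EMA):

```python
import pandas as pd

# Invented time series values
series = pd.Series([10, 12, 13, 12, 15, 16, 18, 17, 19, 21])

sma = series.rolling(window=3).mean()          # average of the last n = 3 values
ema = series.ewm(span=3, adjust=False).mean()  # exponentially decaying weights
print(pd.DataFrame({"value": series, "SMA": sma, "EMA": ema}))
```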

31
Q

Explain component models of time series

A

Assume time series to consist of four components:

1) Long-term trend
2) Cyclical effect
3) Seasonal effect
4) Random variation

32
Q

Explain Windowing as time series method

A

Idea: transform the forecasting problem into a classical learning problem (classification or regression)
- only use the last k time periods

E.g. Weather forecasting:
Assumption:
- use the weather from the k previous days
- the older past is irrelevant
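
A minimal windowing sketch: the last k = 3 values become the features and the next value the target, turning the forecast into ordinary regression (temperature values invented):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Invented daily temperatures
temps = pd.Series([14, 15, 17, 16, 18, 19, 21, 20, 22, 23])

# Windowing: features = values at t-3, t-2, t-1; target = value at t
k = 3
frame = pd.concat([temps.shift(i) for i in range(k, 0, -1)] + [temps], axis=1).dropna()
X, y = frame.iloc[:, :k].values, frame.iloc[:, k].values

model = LinearRegression().fit(X, y)
print(model.predict([[20, 22, 23]]))  # forecast from the 3 most recent days
```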

33
Q

What are the components of the generalization error?

A
  • Bias: error due to wrong model complexity (the model underfits the training and test data)
  • Variance: the model's excessive sensitivity to small variations in the training data (flexible models are likely to overfit)
  • Irreducible error: due to the noisiness of the data itself (can only be addressed by cleaning the training data)
34
Q

Explain the bias/variance tradeoff

A
  • The more flexible a model is, the higher the degree of overfitting
  • The less flexible a model is, the higher the degree of underfitting
Both cases (too flexible and too rigid) result in higher errors
→ find the ideal flexibility that results in the lowest error!
35
Q

How can you find the ideal flexibility (bias/variance tradeoff)?

A

1) Test different learning methods (models that are more biased, e.g. linear regression, and models that are more flexible, e.g. polynomial regression)
2) Test different hyperparameters
- degree of the polynomial, ridge alpha
- max depth of tree, min examples per branch
- number of hidden layers of an ANN
3) Improve the training data:
3a) increase the amount of training data
3b) include data with corner cases (make the data more interesting)
3c) cleanse the training data
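
A sketch of step 2 using scikit-learn's validation_curve to compare training and validation scores across one hyperparameter (tree depth as the flexibility), on synthetic data:

```python
import numpy as np
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeRegressor

# Noisy non-linear data (invented for illustration)
rng = np.random.default_rng(5)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=200)

# Vary the flexibility (max_depth) and watch train vs. validation scores diverge
depths = [1, 2, 3, 5, 8, 12]
train_scores, val_scores = validation_curve(
    DecisionTreeRegressor(random_state=0), X, y,
    param_name="max_depth", param_range=depths, cv=5)

for d, tr, va in zip(depths, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"depth={d}: train R^2={tr:.2f}, validation R^2={va:.2f}")
```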

36
Q

What does the learning curve of a low-flexibility model look like as a function of training set size?

A
  • Underfitting: high training and test errors due to bias

- the error remains even if more data is added

37
Q

What does the learning curve of a high-flexibility model look like as a function of training set size?

A
  • Overfitting: the generalization error decreases with increasing training set size
  • with more training data you can also use more flexible models (like deep learning)