Regression Flashcards

1
Q

What is regression?

A
  • Predict the value of a continuous variable based on other variables, assuming a linear or non-linear model of dependency
  • Predicted (dependent) variable: ŷ ("y-hat")
  • Other variables: explanatory variables
2
Q

What's the difference between regression and classification?

A

Classification predicts nominal class attributes, whereas regression predicts continuous attributes.

3
Q

What are regression techniques?

A

1) Linear Regression
2) Polynomial Regression
3) Local Regression
4) Artificial Neural Networks (ANNs)
5) Deep Neural Networks (DNNs)
6) k-Nearest-Neighbours Regression

4
Q

Explain k-nearest neighbor regression

A
  • Use the numeric average of the target values of the k nearest neighbours (e.g. the readings of the k nearest weather stations) as the prediction

Rule of thumb: choose k between 1 and 20
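
A minimal sketch of k-NN regression with scikit-learn; the toy data points are invented purely for illustration:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Toy training data: one feature, continuous target (values invented for illustration)
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([1.2, 1.9, 3.1, 3.9, 5.2, 5.8])

# Prediction = numeric average of the targets of the k nearest training examples
model = KNeighborsRegressor(n_neighbors=3)
model.fit(X, y)
print(model.predict([[3.5]]))  # averages the target values of the 3 closest neighbours
```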

5
Q

How can you evaluate regression models?

A

Methods for Model Evaluation:

  • Cross Validation: 10-fold
  • Holdout Validation: 80% random share for training, 20% for testing
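
A sketch of both evaluation setups with scikit-learn, using synthetic data invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, cross_val_score

# Synthetic data: y ≈ 2x plus noise (invented for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2 * X.ravel() + rng.normal(0, 1, size=100)

# Holdout validation: 80% random share for training, 20% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LinearRegression().fit(X_train, y_train)
print("holdout R^2:", model.score(X_test, y_test))

# 10-fold cross validation
scores = cross_val_score(LinearRegression(), X, y, cv=10)
print("10-fold CV mean R^2:", scores.mean())
```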
6
Q

What metrics for Model Evaluation can be applied?

A
  • Mean Absolute Error (MAE): computes the average deviation between predicted and actual values
  • Mean Squared Error (MSE): places more emphasis on larger deviations
  • Root Mean Squared Error (RMSE): similar scale as MAE, with more emphasis on larger deviations
  • Pearson's Correlation Coefficient: scores well if high (low) actual values get high (low) predictions
  • R Squared (R²): measures the part of the variation in y that is explainable from the explanatory variables
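
A short sketch computing all five metrics on made-up predictions (scikit-learn and SciPy provide the implementations used here):

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Actual vs. predicted values (invented for illustration)
y_true = np.array([3.0, 5.0, 7.5, 10.0])
y_pred = np.array([2.5, 5.5, 7.0, 12.0])

mae = mean_absolute_error(y_true, y_pred)   # average deviation
mse = mean_squared_error(y_true, y_pred)    # emphasizes larger deviations
rmse = np.sqrt(mse)                         # back on the scale of MAE
r, _ = pearsonr(y_true, y_pred)             # Pearson's correlation coefficient
r2 = r2_score(y_true, y_pred)               # share of explained variation
print(mae, mse, rmse, r, r2)
```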
7
Q

How do you interpret R²?

A

R² = 1: perfect model, as the total variation of y can be completely explained from X. R² = 0: the model explains none of the variation (it does no better than always predicting the mean of y).

8
Q

How can you apply regression trees?

A
  • In principle the same as for classification
    Differences:
    1) splits are selected by maximizing the MSE reduction (not Gini or entropy)
    2) the prediction is the average value of the training examples in a specific leaf
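
A minimal regression-tree sketch with scikit-learn; criterion="squared_error" is scikit-learn's name for MSE-based splitting, and the data is invented for illustration:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data with two clearly separated value ranges (invented for illustration)
X = np.array([[1], [2], [3], [10], [11], [12]])
y = np.array([1.0, 1.2, 0.9, 8.0, 8.3, 7.9])

# Splits maximize the MSE reduction; each leaf predicts the average
# target value of the training examples that fall into it
tree = DecisionTreeRegressor(criterion="squared_error", max_depth=2)
tree.fit(X, y)
print(tree.predict([[2], [11]]))  # roughly the two leaf averages
```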
9
Q

What may happen if your tree has a high depth?

A
  • It may overfit: the model starts learning individual outliers in the training data

10
Q

What is the assumption of linear regression?

A
  • The target variable y is linearly dependent on explanatory variables x
11
Q

How do you fit a regression function?

A

Least-squares approach: find the weight vector that minimizes the sum of squared errors over all training examples
Error: difference between the estimated value and the real value in the training data
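
A bare-bones least-squares sketch with NumPy, on invented data, to show what "minimize the sum of squared errors" means in practice:

```python
import numpy as np

# Toy data: y ≈ 2x + 1 with noise (invented for illustration)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Design matrix with a column of ones so the intercept is also a weight
A = np.column_stack([np.ones_like(x), x])

# Least squares: the weight vector minimizing the sum of squared errors
w, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)
print("intercept, slope:", w)
```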

12
Q

What is ridge regularization?

A
  • Variation of least squares approach (another way to fit a regression function)
  • Tries to avoid overfitting by keeping weights small

  • alpha = 0: normal least-squares regression
  • alpha = 100: strongly regularized, flat curve (strong penalty)
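
A small sketch of the effect of alpha with scikit-learn's Ridge, on synthetic data (scikit-learn discourages alpha of exactly 0, so a tiny value stands in for plain least squares):

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data: only three of five features actually matter
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
y = X @ np.array([3.0, 0.0, -2.0, 0.0, 1.0]) + rng.normal(scale=0.5, size=50)

# Larger alpha = stronger penalty on large weights = flatter model
for alpha in [0.001, 1.0, 100.0]:
    model = Ridge(alpha=alpha).fit(X, y)
    print(alpha, np.round(model.coef_, 2))
```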

13
Q

What problems can occur with feature selection for regression?

A

Problem 1: highly correlated variables (e.g. height in cm and height in inches)
- the weights become meaningless; one of the two variables should be removed

Problem 2: insignificant variables
- uncorrelated variables get w = 0 or relatively small weights assigned

14
Q

How can you check if a variable with a small weight really is insignificant?

A
  • Statistical test with H0: w = 0 (the variable is insignificant)
  • t-stat: number of standard deviations that w is away from 0 (the center of the distribution); a high t-stat means that H0 should be rejected
  • p-value: probability of wrongly rejecting H0 (p-value close to 0 → the variable is significant)
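
A sketch with statsmodels, which reports a t-statistic and p-value per weight; the data is synthetic, with the second variable deliberately insignificant:

```python
import numpy as np
import statsmodels.api as sm

# Synthetic data: y depends only on the first of two variables
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = 4.0 * X[:, 0] + rng.normal(size=200)

# OLS tests H0: w = 0 for every weight
results = sm.OLS(y, sm.add_constant(X)).fit()
print(results.tvalues)  # high |t-stat| -> reject H0
print(results.pvalues)  # p-value close to 0 -> variable is significant
```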
15
Q

What does Interpolation mean?

A

Interpolating regression:

  • predicted values are within the range of the training data values
  • is regarded as safe
16
Q

What does extrapolation mean?

A

Extrapolating regression:

  • May also predict values outside of the training data interval
  • Depending on the use case, this can also be relevant (e.g. how far will the sea level have risen by 2050?)

An explanatory variable that is out of range could result in a predicted dependent variable that is out of range.

17
Q

How can the results of linear regression and k-NN regression be described?

A
  • Linear regression extrapolates
  • k-NN and regression trees interpolate

→ Linear regression is sensitive to outliers

18
Q

Which technique can be applied to non-linear problems?

A
  • Apply transformations to the explanatory variables within the regression function (log, exp, square root; polynomial transformation)
  • This allows the use of linear regression techniques to fit much more complicated non-linear datasets
19
Q

Explain polynomial regression

A
  • Extension of linear regression with polynomial terms (x, x², …, x^M)
  • Can be fitted using the least-squares method
  • Tendency to overfit the training data for large degrees M
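
A minimal polynomial-regression sketch with scikit-learn: polynomial feature expansion followed by ordinary linear least squares, on invented data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Noisy non-linear data (invented for illustration)
rng = np.random.default_rng(3)
X = np.sort(rng.uniform(-3, 3, size=(30, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=30)

# Degree M controls the flexibility; a large M tends to overfit
model = make_pipeline(PolynomialFeatures(degree=4), LinearRegression())
model.fit(X, y)
print(model.predict([[0.5]]))
```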
20
Q

Where does polynomial regression often overfit, and which workarounds can you apply to mitigate overfitting?

A
  • Overfitting often happens in sparse regions

Workarounds:

  • decrease M
  • increase amount of training data
  • Local Regression
21
Q

What is the idea behind local regression?

A

Assumption: non-linear problems are approximately linear in local areas
Idea: use linear regression locally for the data point at hand (lazy learning)
- Combination of k-NN and linear regression

22
Q

How does local regression work?

A

Given a data point:

1) retrieve the k nearest neighbors
2) learn a regression model for those neighbors
3) use the learned model to predict the y value
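
A from-scratch sketch of those three steps (the helper name local_regression_predict is made up for this example):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def local_regression_predict(X_train, y_train, x_query, k=5):
    """Lazy learner: fit a linear model only on the neighbourhood of x_query."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    idx = np.argsort(dists)[:k]                                  # 1) k nearest neighbours
    model = LinearRegression().fit(X_train[idx], y_train[idx])   # 2) local linear model
    return model.predict(x_query.reshape(1, -1))[0]              # 3) predict the y value

# Noisy non-linear data (invented for illustration)
rng = np.random.default_rng(4)
X = rng.uniform(0, 10, size=(100, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=100)
print(local_regression_predict(X, y, np.array([2.5]), k=10))
```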

23
Q

What are the advantages of local regression?

A

Advantage: fits non-linear models well

  • good local approximation
  • often better than pure k-NN (local regression does not compute the average of the neighbors but a regression function that is often more accurate)
24
Q

What are the disadvantages of local regression?

A

Disadvantage:

  • slow at runtime
  • for each test example:
    • find k nearest neighbors
    • compute local model
25
Q

How do ANNs for regression differ from ANNs for classification?

A
  • An ANN for classification uses a threshold to determine whether the class is true or false
  • An ANN for regression does not use a cutoff for true or false predictions; it leaves the numeric outputs as they are (between 0 and 1)
26
Q

Can you learn non-linear models with ANNs?

A
  • A single perceptron cannot: it is a linear model where the target variable is a linear combination of the input variables
  • Non-linear regression functions can be approximated by using (multiple) hidden layers (deep ANNs can approximate arbitrary functions)
    But: the flexibility of deep ANNs can result in overfitting → a lot of training data is necessary
27
Q

What is a time series?

A
  • Data points are indexed in time order

- e.g. Stock market prices, Temperature

28
Q

What is different about predicting time series?

A
  • Predict data points that continue the series into the future
  • There is an order in the data
  • Other regression techniques do not use time explicitly
  • Time series forecasting aims to predict future values of the same variable
29
Q

What are the different approaches for time series forecasting?

A

1) Data-driven: smoothing
2) Model-driven:
   a) component models of time series
   b) other regression techniques

30
Q

How does smoothing work?

A

Principle: Use more recent values, as they might matter more

Simple Moving Average (SMA)
- average of last n values

Exponential Moving Average (EMA)
- exponentially decrease weight of older values
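
A short pandas sketch of both smoothers on an invented series (rolling gives the SMA, ewm the EMA):

```python
import pandas as pd

# Invented time series values
series = pd.Series([10, 12, 13, 12, 15, 16, 18, 17, 19, 21])

sma = series.rolling(window=3).mean()          # average of the last n = 3 values
ema = series.ewm(span=3, adjust=False).mean()  # exponentially decaying weights
print(pd.DataFrame({"value": series, "SMA": sma, "EMA": ema}))
```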

31
Q

Explain component models of time series

A

Assume time series to consist of four components:

1) Long-term trend
2) Cyclical effect
3) Seasonal effect
4) Random variation

32
Q

Explain Windowing as time series method

A

Idea: transform the forecasting problem into a classical learning problem (classification or regression)
- only use the last k time periods

E.g. Weather forecasting:
Assumption:
- use the weather from the k previous days
- the older past is irrelevant
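
A minimal windowing sketch: the last k = 3 values become the features and the next value the target, turning the forecast into ordinary regression (temperature values invented):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Invented daily temperatures
temps = pd.Series([14, 15, 17, 16, 18, 19, 21, 20, 22, 23])

# Windowing: features = values at t-3, t-2, t-1; target = value at t
k = 3
frame = pd.concat([temps.shift(i) for i in range(k, 0, -1)] + [temps], axis=1).dropna()
X, y = frame.iloc[:, :k].values, frame.iloc[:, k].values

model = LinearRegression().fit(X, y)
print(model.predict([[20, 22, 23]]))  # forecast from the 3 most recent days
```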

33
Q

What are the components of the generalization error?

A
  • Bias: error due to wrong model complexity (the model underfits the training and test data)
  • Variance: the model's excessive sensitivity to small variations in the training data (flexible models are likely to overfit)
  • Irreducible error: due to the noisiness of the data itself (can only be addressed by cleaning the training data)
34
Q

Explain the bias/variance tradeoff

A
  • The more flexible a model is, the higher the degree of overfitting
  • The less flexible a model is, the higher the degree of underfitting
Both cases (too flexible and too rigid) result in higher errors
→ find the ideal flexibility that results in the lowest error!
35
Q

How can you find the ideal flexibility (bias/variance tradeoff)?

A

1) Test different learning methods (models that are more biased, e.g. linear regression, and models that are more flexible, e.g. polynomial regression)
2) Test different hyperparameters
- degree of the polynomial, ridge alpha
- max depth of tree, min examples per branch
- number of hidden layers of an ANN
3) Improve the training data:
3a) increase the amount of training data
3b) include data with corner cases (make the data more interesting)
3c) cleanse the training data
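
A sketch of step 2 using scikit-learn's validation_curve to compare training and validation scores across one hyperparameter (tree depth as the flexibility), on synthetic data:

```python
import numpy as np
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeRegressor

# Noisy non-linear data (invented for illustration)
rng = np.random.default_rng(5)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=200)

# Vary the flexibility (max_depth) and watch train vs. validation scores diverge
depths = [1, 2, 3, 5, 8, 12]
train_scores, val_scores = validation_curve(
    DecisionTreeRegressor(random_state=0), X, y,
    param_name="max_depth", param_range=depths, cv=5)

for d, tr, va in zip(depths, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"depth={d}: train R^2={tr:.2f}, validation R^2={va:.2f}")
```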

36
Q

What does the learning curve of a low-flexibility model look like as a function of training set size?

A
  • Underfitting: high training and test errors due to bias

- the error remains even if more data is added

37
Q

What does the learning curve of a high-flexibility model look like as a function of training set size?

A
  • Overfitting: the generalization error decreases with increasing training set size
  • with more training data you can also use more flexible models (like deep learning)