Linear Regression Flashcards

1
Q

First step in multiple regression analysis?

A

Test whether at least one predictor is related to the response: compute the F-statistic and examine its associated p-value.
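As a rough illustration, the F-statistic can be computed from the total and residual sums of squares as F = ((TSS - RSS) / p) / (RSS / (n - p - 1)). The sketch below assumes NumPy is available; the helper name `f_statistic` and the simulated data are made up for the example. The p-value would come from an F distribution with p and n - p - 1 degrees of freedom, which is not computed here.

```python
import numpy as np

def f_statistic(X, y):
    """F-statistic for H0: all slope coefficients are zero."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])          # add an intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # least squares fit
    rss = np.sum((y - A @ beta) ** 2)             # residual sum of squares
    tss = np.sum((y - y.mean()) ** 2)             # total sum of squares
    return ((tss - rss) / p) / (rss / (n - p - 1))

# Hypothetical data: one response depends on X, the other is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y_signal = 2.0 * X[:, 0] + rng.normal(size=100)
y_noise = rng.normal(size=100)

f_signal = f_statistic(X, y_signal)   # expected to be far above 1
f_noise = f_statistic(X, y_noise)     # expected to be near 1
print(f_signal, f_noise)
```

An F-statistic close to 1 is what we expect when no predictor is related to the response; a value much larger than 1 is evidence against that null hypothesis.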

2
Q

What are the 3 basic types of variable selection? Describe one of them.

A
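  • Forward selection: begin with the null model (intercept only), fit p simple linear regressions, and add to the null model the variable that gives the lowest RSS. Then add the variable that gives the lowest RSS for the new two-variable model, and repeat until a stopping rule is met.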
  • Backward selection
  • Mixed selection
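Forward selection as described above can be sketched in a few lines. This is a minimal version assuming NumPy; the function names, the simulated data, and the stopping rule (stop after a fixed number `k` of predictors) are assumptions made for the example.

```python
import numpy as np

def rss(X, y, cols):
    """RSS of the least squares fit using an intercept plus the given columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sum((y - A @ beta) ** 2))

def forward_selection(X, y, k):
    """Greedily add k predictors, each time picking the lowest-RSS addition."""
    selected = []                        # start from the null model
    remaining = list(range(X.shape[1]))
    for _ in range(k):
        best = min(remaining, key=lambda j: rss(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical data: only columns 2 and 4 influence the response.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 2] + 1.5 * X[:, 4] + rng.normal(size=200)

order = forward_selection(X, y, 2)
print(order)
```

In practice the stopping rule is usually based on a criterion rather than a fixed count, but the greedy "add the best remaining variable" loop stays the same.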
3
Q

What are the two assumptions of a linear model and how can we remove them?

A
  • Additive assumption: the effect of a change in a predictor Xj on the response Y is independent of the values of the other predictors. Remove: add an interaction term (for example the product Xj*Xk).
  • Linear assumption: the change in the response Y due to a one-unit change in Xj is constant, regardless of the value of Xj. Remove: add a transformed feature to the model, such as Xj^n, log(Xj), or sqrt(Xj).
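To make both fixes concrete, the sketch below compares the RSS of a plain linear fit with the RSS after adding an interaction term or a squared term. It assumes NumPy; the data are simulated and the helper `fit_rss` is hypothetical.

```python
import numpy as np

def fit_rss(A, y):
    """RSS of the least squares fit of y on the columns of A."""
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sum((y - A @ beta) ** 2))

rng = np.random.default_rng(2)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
ones = np.ones(n)

# Response with an interaction: the effect of x1 depends on x2.
y_int = 1 + x1 + x2 + 2 * x1 * x2 + rng.normal(scale=0.5, size=n)
rss_additive = fit_rss(np.column_stack([ones, x1, x2]), y_int)
rss_interaction = fit_rss(np.column_stack([ones, x1, x2, x1 * x2]), y_int)

# Response that is quadratic in x1: the slope changes with x1.
y_quad = 1 + x1 + 1.5 * x1 ** 2 + rng.normal(scale=0.5, size=n)
rss_linear = fit_rss(np.column_stack([ones, x1]), y_quad)
rss_poly = fit_rss(np.column_stack([ones, x1, x1 ** 2]), y_quad)

print(rss_additive, rss_interaction)
print(rss_linear, rss_poly)
```

The model stays linear in its coefficients in both cases; only the set of features changes.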
4
Q

How to show the potential non-linearity of the data?

A

We plot the residuals, yi - yi_estimated, versus the fitted values yi_estimated (or versus xi when there is a single predictor). A clear pattern, such as a U-shape, indicates non-linearity in the data.
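As a numerical stand-in for the residual plot, the sketch below fits a straight line to simulated quadratic data and averages the residuals over the low, middle, and high thirds of the fitted values. It assumes NumPy; the data and the three-bin summary are illustrative choices, not a standard diagnostic.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 4, size=300)
y = x ** 2 + rng.normal(scale=0.3, size=300)   # truly non-linear in x

# Fit a straight line and compute fitted values and residuals.
A = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
fitted = A @ beta
resid = y - fitted

# Average residual in the low, middle, and high thirds of the fitted values.
order = np.argsort(fitted)
bin_means = [float(part.mean()) for part in np.array_split(resid[order], 3)]
print(bin_means)

# With matplotlib installed, plt.scatter(fitted, resid) would show the same U-shape.
```

Positive averages at both ends and a negative average in the middle correspond to the U-shaped pattern a residual plot would show.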

5
Q

Give an example of a problem caused by correlated error terms. How can such a correlation be detected in time series data?

A

Example: if we duplicate all the data, the coefficient estimates won’t change, but the confidence intervals will be too narrow by a factor of sqrt(2).

We can detect it by plotting the residuals versus time and checking for a pattern: with correlated errors, adjacent residuals tend to take similar values, which shows up as local trends in the plot.
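The duplicated-data example can be checked numerically. The sketch below assumes NumPy, uses simulated data, and computes standard errors with the usual least squares formula; the helper `ols_with_se` is hypothetical.

```python
import numpy as np

def ols_with_se(x, y):
    """Return coefficients and their standard errors for y ~ 1 + x."""
    n = len(y)
    A = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ beta) ** 2)
    sigma2 = rss / (n - 2)                  # estimate of the error variance
    cov = sigma2 * np.linalg.inv(A.T @ A)   # covariance of the estimates
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(4)
x = rng.normal(size=500)
y = 1 + 2 * x + rng.normal(size=500)

beta, se = ols_with_se(x, y)
# Duplicate every observation: errors are now perfectly correlated in pairs.
beta_dup, se_dup = ols_with_se(np.tile(x, 2), np.tile(y, 2))

ratio = se[1] / se_dup[1]   # how much narrower the interval for the slope gets
print(beta, beta_dup, ratio)
```

The estimates are identical, yet the reported standard error shrinks by roughly sqrt(2) even though no new information was added.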

6
Q

How to detect non-constant variance in error terms (heteroscedasticity)? What is a possible solution to that?

A

We can plot the residuals versus the fitted values and look for a funnel shape, i.e. residuals whose spread grows with the fitted values.

Solution: transform response using a concave function such as log(Y) or sqrt(Y).
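The effect of the log transform can be illustrated without a plot by comparing the spread of the residuals for large and small fitted values. The sketch assumes NumPy and a simulated response with multiplicative noise, so that log(Y) is linear in x with constant variance; the helper `spread_ratio` is a made-up summary for this example.

```python
import numpy as np

def spread_ratio(x, y):
    """Std of residuals for the larger fitted values over the smaller ones."""
    A = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ beta
    resid = y - fitted
    order = np.argsort(fitted)
    low, high = np.array_split(resid[order], 2)
    return float(high.std() / low.std())

rng = np.random.default_rng(5)
x = rng.uniform(0, 3, size=400)
y = np.exp(1 + x + rng.normal(scale=0.5, size=400))   # noise grows with y

ratio_raw = spread_ratio(x, y)           # well above 1: funnel shape
ratio_log = spread_ratio(x, np.log(y))   # close to 1: roughly constant spread
print(ratio_raw, ratio_log)
```

A ratio well above 1 matches the funnel shape in a residual plot; after the transform the ratio should be close to 1.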

7
Q

What is the difference between an outlier and a high-leverage point? What are the effects of both and how can we detect them?

A

Outlier:

  • Def: a point for which the value of yi is far from the value predicted by the model.
  • Effect: usually has little effect on the least squares line, but can inflate the RSE (and therefore widen confidence intervals and distort p-values) and lower R^2.
  • Detect: studentized residuals (each residual divided by its estimated standard error). Observations whose studentized residuals are greater than 3 in absolute value are potential outliers.

High-leverage point:

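  • Def: a point that has an unusual value of the predictors xi.
  • Effect: can substantially change the estimated regression line.
  • Detect: the leverage statistic hi. Its average over all observations is (p+1)/n, so a value well above that indicates high leverage.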

A good idea is to plot the studentized residuals versus the leverage statistic: observations that are both outliers and high-leverage points are the most harmful.
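Both diagnostics can be computed from the hat matrix. The sketch below assumes NumPy and simulated data with one planted outlier and one planted high-leverage point. The cutoff of 3 for studentized residuals comes from the card above; the cutoff of three times the average leverage is an assumption made for this example.

```python
import numpy as np

def leverage_and_studentized(X, y):
    """Leverage statistics and (internally) studentized residuals."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    H = A @ np.linalg.inv(A.T @ A) @ A.T                # hat matrix
    h = np.diag(H)                                      # leverage statistics
    resid = y - H @ y
    sigma = np.sqrt(np.sum(resid ** 2) / (n - p - 1))   # RSE
    student = resid / (sigma * np.sqrt(1 - h))          # residual / its std. error
    return h, student

rng = np.random.default_rng(7)
x = rng.normal(size=50)
y = 2 * x + rng.normal(scale=0.5, size=50)
y[0] += 6.0               # observation 0: outlier (unusual y for its x)
x[1], y[1] = 8.0, 16.0    # observation 1: high leverage (unusual x, on the trend)

h, student = leverage_and_studentized(x.reshape(-1, 1), y)
outliers = np.flatnonzero(np.abs(student) > 3)
high_leverage = np.flatnonzero(h > 3 * (1 + 1) / 50)    # assumed cutoff: 3 * (p+1)/n
print(outliers, high_leverage)
```

The leverage statistics always sum to p + 1, which is why (p+1)/n is the natural reference value.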

8
Q

Why is collinearity a problem in regression?
How to detect it?
Give two solutions on how to deal with it.

A

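It is difficult to separate out the individual effects of collinear variables on the response. Collinearity inflates the standard errors of the coefficient estimates, so a truly non-zero coefficient may fail to appear significant.

Detect: correlation matrix (for pairs of variables), variance inflation factor (VIF, which also catches collinearity among three or more variables). A VIF above 5 or 10 indicates a problematic amount of collinearity.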

1st solution: drop one of the variables
2nd solution: combine the collinear variables into one predictor
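The VIF of a predictor Xj is 1 / (1 - R^2), where R^2 comes from regressing Xj on all the other predictors. The sketch below assumes NumPy and simulated data in which two predictors are nearly identical; the helper `vif` is hypothetical, and combining the collinear pair by a plain average is one simple choice among several.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X."""
    n, p = X.shape
    out = []
    for j in range(p):
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        beta, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        rss = np.sum((X[:, j] - A @ beta) ** 2)
        tss = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        r2 = 1 - rss / tss                 # R^2 of Xj on the other predictors
        out.append(1 / (1 - r2))
    return np.array(out)

rng = np.random.default_rng(8)
x1 = rng.normal(size=300)
x2 = x1 + 0.1 * rng.normal(size=300)       # nearly a copy of x1
x3 = rng.normal(size=300)                  # unrelated predictor

vifs = vif(np.column_stack([x1, x2, x3]))
# Second solution: combine the collinear pair into a single predictor.
vifs_combined = vif(np.column_stack([(x1 + x2) / 2, x3]))
print(vifs, vifs_combined)
```

The smallest possible VIF is 1, which means no collinearity at all.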
