Regression Flashcards

1
Q

Requirements for Linear Regression

A
  1. Simple random sample (SRS)
  2. Pairs of (x, y) data have a bivariate normal distribution (for each value of x, the corresponding y values are normally distributed); can be checked by examining a scatterplot and double-checking any outliers
  3. Homoscedasticity of residuals (equal variance)
2
Q

LR is a good model if:

A
  1. The regression line on the scatterplot appears to fit the points well
  2. r indicates a linear correlation
  3. High: R-squared/adj R-squared/F-stat/absolute value of the t-statistic
  4. Low: Std Error/AIC/BIC/MAPE/MSE
    * If it is not a good model, the best predicted value of y is the sample mean (y-bar)
3
Q

Goal of linear regression

A

Find the line that minimizes the sum of the squared residuals
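A minimal sketch of the least-squares idea for one predictor, using the standard closed-form slope and intercept. The function names and the small data set are illustrative assumptions, not part of the card.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sse(xs, ys, intercept, slope):
    """Sum of squared residuals for the line y-hat = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Illustrative data (made up for this sketch)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = fit_line(xs, ys)
print(b0, b1, sse(xs, ys, b0, b1))
```

Nudging either coefficient away from the fitted values makes `sse` larger, which is what "minimizes" means here.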

4
Q

Why use a residual plot?

A

Scatterplot with the residuals as the y values; used to assess correlation and regression results; a random scatter of points is what we want; any pattern suggests an underlying non-linear relationship, and a changing “thickness” of the spread suggests unequal variance of the residuals
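As a rough illustration of a "pattern" in the residuals, the sketch below fits a straight line to data that actually follows a curve (y = x squared, chosen only for this example) and lists the residuals that a residual plot would show. The helper name is an assumption.

```python
def residuals(xs, ys):
    """Fit a least-squares line and return the residuals y - y_hat."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Curved data: a straight line is the wrong model here
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]
res = residuals(xs, ys)
print(res)  # positive, negative, negative, negative, positive: a U shape
```

The residuals run positive, then negative, then positive again instead of scattering randomly, which is the kind of pattern the card warns about.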

5
Q

Regression process

A
  1. construct a histogram to initially gauge normality
  2. construct a scatterplot + quantile plot and verify that there is a linear pattern
  3. construct a residual plot and verify that there is no pattern
6
Q

Prediction interval

A

Interval estimate for a predicted value of a variable (an individual y at a given x), as opposed to a confidence interval, which estimates a population parameter
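For simple linear regression with one predictor, a commonly taught form of the prediction interval for an individual y at x = x_0 is sketched below. The notation is assumed here rather than taken from the card: s_e is the standard error of the estimate, and t_{\alpha/2} is the critical t value with n - 2 degrees of freedom.

```latex
\hat{y} - E < y < \hat{y} + E,
\qquad
E = t_{\alpha/2}\, s_e \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}
```

The interval widens as x_0 moves away from x-bar, so predictions far from the center of the data are less precise.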

7
Q

Total deviation of (x, y)

A

vertical distance y minus y-bar, which measures the distance between the point (x, y) and the horizontal line through the sample mean (y-bar)

8
Q

Explained deviation of (x, y)

A

vertical distance y-hat minus y-bar, which measures the distance between the predicted value (y-hat) and the sample mean (y-bar)

9
Q

Unexplained deviation of (x, y)

A

vertical distance y minus y-hat, which is the vertical distance between the point (x, y) and the regression line; also called the residual
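The three deviations fit together as total = explained + unexplained for every point. A small sketch on made-up data (the variable names and data are assumptions for illustration):

```python
# Illustrative data (made up for this sketch)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar

rows = []
for x, y in zip(xs, ys):
    y_hat = intercept + slope * x
    total = y - y_bar          # total deviation
    explained = y_hat - y_bar  # explained deviation
    unexplained = y - y_hat    # unexplained deviation (the residual)
    rows.append((total, explained, unexplained))
    print(f"x={x}: total={total:+.2f} explained={explained:+.2f} "
          f"unexplained={unexplained:+.2f}")
```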

10
Q

coefficient of determination (r-squared)

A

proportion of the variation in the response variable that is explained by the model; R-squared = explained variation / total variation = 1 - (unexplained variation / total variation)
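A short sketch of the calculation, assuming the fitted values come from a least-squares line (here y-hat = 2.2 + 0.6x on made-up data). Both forms of the ratio give the same number for a least-squares fit.

```python
# Illustrative data; y_hats are the fitted values of the least-squares
# line y-hat = 2.2 + 0.6x at x = 1..5 (assumed for this sketch)
ys = [2, 4, 5, 4, 5]
y_hats = [2.8, 3.4, 4.0, 4.6, 5.2]

y_bar = sum(ys) / len(ys)
total_var = sum((y - y_bar) ** 2 for y in ys)                    # total variation
explained_var = sum((yh - y_bar) ** 2 for yh in y_hats)          # explained variation
unexplained_var = sum((y - yh) ** 2 for y, yh in zip(ys, y_hats))  # unexplained variation

r2_from_explained = explained_var / total_var
r2_from_unexplained = 1 - unexplained_var / total_var
print(r2_from_explained, r2_from_unexplained)
```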

11
Q

correlation coefficient (r)

A

measures the strength and direction of the linear correlation between x and y; ranges from -1 to 1
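A minimal sketch of the Pearson correlation coefficient on made-up data; the function name is an assumption. The sign gives the direction and the magnitude gives the strength.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Illustrative data (made up for this sketch)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
r_flipped = pearson_r(xs, [-y for y in ys])  # same strength, opposite direction
print(r, r_flipped)
```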

12
Q

adjusted r-squared

A

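as you add more x variables to your model, R-squared never decreases, since new variables can only add to the total amount of explained variation; adjusted R-squared penalizes the number of predictors, so it rises only when a new variable improves the fit by more than would be expected by chance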
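A sketch of the usual adjustment formula, assuming n observations and k predictor variables (not counting the intercept). The function name and the example numbers are hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k predictors (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical numbers: a second predictor nudges R-squared up only slightly
one_predictor = adjusted_r2(0.60, n=5, k=1)
two_predictors = adjusted_r2(0.61, n=5, k=2)
print(one_predictor, two_predictors)
```

Here R-squared rises from 0.60 to 0.61, yet the adjusted value falls, which signals that the extra variable is not earning its place.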

13
Q

Standard Error

A

Absolute measure of the typical distance that points fall from the regression line; measure of goodness of fit (lower is better); Standard error = sqrt(MSE) = sqrt[SSE / (n - q)], where q = number of coefficients in the model (including the intercept)
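A small sketch of the formula, assuming q counts every coefficient including the intercept (so q = 2 for a line with one predictor). The data and fitted values are made up for illustration.

```python
import math


def standard_error(ys, y_hats, q):
    """sqrt(SSE / (n - q)), with q = number of coefficients in the model."""
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hats))
    return math.sqrt(sse / (len(ys) - q))


# Illustrative data; y_hats are the fitted values of the least-squares
# line y-hat = 2.2 + 0.6x at x = 1..5 (assumed for this sketch)
ys = [2, 4, 5, 4, 5]
y_hats = [2.8, 3.4, 4.0, 4.6, 5.2]
se = standard_error(ys, y_hats, q=2)
print(se)
```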

14
Q

F-statistic

A

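measure of goodness of fit (overall significance of the model); F = MSR / MSE, where MSR = sum of (y-hat - y-bar)^2 / (q - 1), MSE = SSE / (n - q), and q = number of coefficients in the model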
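A sketch of the F-statistic as the ratio MSR / MSE, assuming q counts all coefficients including the intercept. The function name, data, and fitted values are illustrative assumptions.

```python
def f_statistic(ys, y_hats, q):
    """F = MSR / MSE for a model with q coefficients (intercept included)."""
    n = len(ys)
    y_bar = sum(ys) / n
    msr = sum((yh - y_bar) ** 2 for yh in y_hats) / (q - 1)         # mean square regression
    mse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hats)) / (n - q)  # mean square error
    return msr / mse


# Illustrative data; y_hats are the fitted values of the least-squares
# line y-hat = 2.2 + 0.6x at x = 1..5 (assumed for this sketch)
ys = [2, 4, 5, 4, 5]
y_hats = [2.8, 3.4, 4.0, 4.6, 5.2]
f_stat = f_statistic(ys, y_hats, q=2)
print(f_stat)
```

A larger F means the model explains much more variation per coefficient than it leaves unexplained per residual degree of freedom.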

15
Q

AIC

A

Akaike’s Information Criterion; measures the relative quality of an estimated statistical model by trading goodness of fit against the number of parameters; can be used for model selection; lower is better

16
Q

BIC

A

Bayesian Information Criterion; measures the relative quality of an estimated statistical model, with a heavier penalty on the number of parameters than AIC for all but very small samples; can be used for model selection; lower is better
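One common way to compute AIC and BIC for a least-squares model, assuming normally distributed errors and dropping additive constants that are the same for every model fitted to the same data. The function names, the SSE values, and the parameter counts below are hypothetical.

```python
import math


def aic(sse, n, k):
    """AIC up to an additive constant: n * ln(SSE / n) + 2k (Gaussian errors assumed)."""
    return n * math.log(sse / n) + 2 * k


def bic(sse, n, k):
    """BIC up to an additive constant: n * ln(SSE / n) + k * ln(n) (Gaussian errors assumed)."""
    return n * math.log(sse / n) + k * math.log(n)


# Hypothetical comparison: model B fits slightly better but uses more parameters
n = 50
aic_a, bic_a = aic(120.0, n, k=2), bic(120.0, n, k=2)
aic_b, bic_b = aic(118.0, n, k=5), bic(118.0, n, k=5)
print(aic_a, aic_b, bic_a, bic_b)
```

Only differences between models on the same data are meaningful. In this made-up case both criteria prefer model A, because the small drop in SSE does not pay for three extra parameters.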

17
Q

MAPE

A

Mean absolute percentage error: the average of |actual - predicted| / |actual|, expressed as a percentage; lower is better
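A minimal sketch of the calculation; the function name and the sample values are made up for illustration. MAPE is undefined when any actual value is zero, so the sketch rejects that case.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    n = len(actual)
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / n


# Hypothetical values: errors of 10%, 10%, and 0%
score = mape([100, 200, 400], [110, 180, 400])
print(score)
```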