Regression Flashcards

Question 1

Q

Requirements for Linear Regression

Answer

A

Simple random sample (SRS) of paired data
Pairs of (x, y) data have a bivariate normal distribution (for each value of x, the corresponding y values are normally distributed); can be checked by examining a scatterplot and double-checking any outliers
Homoscedasticity of residuals (equal variance)

Question 2

Q

LR is a good model if:

Answer

A

regression line on the scatterplot appears to fit the points well
r indicates a linear correlation
High: R-squared / adjusted R-squared / F-statistic / absolute value of the t-statistic for each coefficient
Low: standard error / AIC / BIC / MAPE / MSE
* If not a good model, the best predicted value of y is the sample mean (y-bar)

Question 3

Q

Goal of linear regression

Answer

A

Find the line that minimizes the sum of the squared residuals (least-squares criterion)
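A minimal sketch of the least-squares idea for one predictor, using the standard closed-form slope and intercept. The toy data and the helper names `fit_line` and `sse` are illustrative assumptions, not part of the original card.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y-hat = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def sse(xs, ys, slope, intercept):
    """Sum of squared residuals for the line y-hat = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Toy data, made up for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_line(xs, ys)
best = sse(xs, ys, slope, intercept)

# Nudging the slope away from the least-squares value increases the sum of squares
worse = sse(xs, ys, slope + 0.1, intercept)
```

Here `best` is smaller than `worse`, which is what "minimizes the sum of squares" means in practice.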

Question 4

Q

Why use a residual plot?

Answer

A

Scatter plot of the (x, y) data with each y value replaced by its residual (y minus y-hat); used to assess correlation and regression results; a random scatter around zero is what we want; a visible pattern suggests an underlying non-linear relationship, and a changing “thickness” of the band suggests unequal variance (heteroscedasticity)
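A small sketch of what a "pattern" in the residuals looks like: a straight line is fitted to data that actually follow a curve. The data (y = x squared) and the variable names are assumptions chosen for illustration.

```python
# Curved toy data: y = x squared (illustrative only)
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
intercept = y_bar - slope * x_bar

# Residual = observed y minus predicted y-hat; these are the y values of a residual plot
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
```

The residuals come out positive at both ends and negative in the middle (a U shape), which is the kind of non-random pattern a residual plot is meant to expose.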

Question 5

Q

Regression process

Answer

A

construct a histogram to initially gauge normality
construct a scatterplot + quantile plot and verify that there is a linear pattern
construct a residual plot and verify that there is no pattern

Question 6

Q

Prediction interval

Answer

A

Interval estimate for a predicted value of a variable (an individual y at a given x), as opposed to a confidence interval, which estimates a population parameter
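For reference, the usual textbook form of the prediction interval for an individual y at a given x_0, assuming simple linear regression with one predictor. The symbols are assumptions not defined on the card: s_e is the standard error of estimate, t_{\alpha/2} is the critical t value with n - 2 degrees of freedom, and \hat{y} is the predicted value at x_0.

```latex
\hat{y} - E < y < \hat{y} + E,
\qquad
E = t_{\alpha/2}\, s_e \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum (x - \bar{x})^2}}
```

The interval widens as x_0 moves away from the sample mean of x.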

Question 7

Q

Total deviation of (x, y)

Answer

A

vertical distance y minus y-bar, which measures the distance between the point (x, y) and the horizontal line through the sample mean (y-bar)

Question 8

Q

Explained deviation of (x, y)

Answer

A

vertical distance y-hat minus y-bar, which measures the distance between the predicted value and the sample mean

Question 9

Q

Unexplained deviation of (x, y)

Answer

A

vertical distance of y minus y-hat, which is the vertical distance between the point (x, y) and the regression line
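A short sketch tying the three deviations together: for every point, total = explained + unexplained, and for a least-squares line the sums of squares also add up (SST = SSR + SSE). The toy data and variable names are illustrative assumptions.

```python
# Toy data, made up for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares line
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
intercept = y_bar - slope * x_bar
y_hats = [intercept + slope * x for x in xs]

total = [y - y_bar for y in ys]                            # y minus y-bar
explained = [y_hat - y_bar for y_hat in y_hats]            # y-hat minus y-bar
unexplained = [y - y_hat for y, y_hat in zip(ys, y_hats)]  # y minus y-hat

sst = sum(d ** 2 for d in total)        # total variation
ssr = sum(d ** 2 for d in explained)    # explained variation
sse = sum(d ** 2 for d in unexplained)  # unexplained variation
```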

Question 10

Q

coefficient of determination (r-squared)

Answer

A

proportion of the variation in the response variable that is explained by the model; R-squared = explained variation / total variation = 1 - (unexplained variation / total variation)
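A minimal sketch of the calculation, computing R-squared as 1 minus unexplained variation over total variation. The function name `r_squared` and the toy data are assumptions; the fitted line y-hat = 2.2 + 0.6x is the least-squares line for these particular points.

```python
def r_squared(ys, y_hats):
    """Coefficient of determination: 1 - SSE / SST."""
    y_bar = sum(ys) / len(ys)
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))  # unexplained variation
    sst = sum((y - y_bar) ** 2 for y in ys)                      # total variation
    return 1 - sse / sst


# Toy data and its least-squares predictions (illustrative)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
y_hats = [2.2 + 0.6 * x for x in xs]

r2 = r_squared(ys, y_hats)
```

For this data about 60% of the variation in y is explained by the line.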

Question 11

Q

correlation coefficient (r)

Answer

A

measures the strength and direction of the linear correlation between x and y; ranges from -1 to 1
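A minimal sketch of the Pearson correlation coefficient in plain Python. The function name `pearson_r` and the toy data are illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Toy data, made up for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
r_reversed = pearson_r(xs, list(reversed(ys)))  # same strength, opposite direction
```

The sign gives the direction and the magnitude gives the strength; with a single predictor, squaring r gives R-squared.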

Question 12

Q

adjusted r-squared

Answer

A

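as you add more X variables to your model, R-squared never decreases, since new variables can only add to the total amount of explained variation; adjusted R-squared penalizes each added variable, so it rises only when the new variable improves the fit by more than would be expected by chance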
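A sketch of the common adjusted R-squared formula, 1 - (1 - R-squared)(n - 1)/(n - k - 1), where n is the sample size and k is the number of predictor variables. The function name and the example numbers are assumptions for illustration.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for n observations and k predictor variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# One predictor, R-squared = 0.60, n = 5 (illustrative numbers)
one_predictor = adjusted_r_squared(0.60, n=5, k=1)

# Adding a second predictor that barely raises R-squared lowers the adjusted value
two_predictors = adjusted_r_squared(0.61, n=5, k=2)
```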

Question 13

Q

Standard Error

Answer

A

Absolute measure (in the units of y) of the typical distance that points fall from the regression line; measure of goodness of fit; = sqrt(MSE) = sqrt[SSE / (n - q)], where q = # of coefficients in the model
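A minimal sketch of sqrt[SSE / (n - q)] in Python. The function name `standard_error` and the toy data are assumptions; q = 2 here because a simple regression line has an intercept and a slope.

```python
import math


def standard_error(ys, y_hats, q):
    """Standard error of estimate: sqrt(SSE / (n - q)), q = number of coefficients."""
    n = len(ys)
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))
    return math.sqrt(sse / (n - q))


# Toy data and its least-squares predictions (illustrative)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
y_hats = [2.2 + 0.6 * x for x in xs]

s_e = standard_error(ys, y_hats, q=2)
```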

Question 14

Q

F-statistic

Answer

A

measure of overall goodness of fit; F = MSR / MSE, where MSR = sigma (pred - mean)^2 / (q - 1) and MSE = SSE / (n - q); q = # of coefficients in the model
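A sketch of the F-statistic as the ratio of the mean square due to regression (MSR) to the mean squared error (MSE), with q the number of coefficients. The function name and toy data are illustrative assumptions.

```python
def f_statistic(ys, y_hats, q):
    """F = MSR / MSE with MSR = SSR / (q - 1) and MSE = SSE / (n - q)."""
    n = len(ys)
    y_bar = sum(ys) / n
    ssr = sum((y_hat - y_bar) ** 2 for y_hat in y_hats)          # explained variation
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))  # unexplained variation
    msr = ssr / (q - 1)
    mse = sse / (n - q)
    return msr / mse


# Toy data and its least-squares predictions (illustrative)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
y_hats = [2.2 + 0.6 * x for x in xs]

f_stat = f_statistic(ys, y_hats, q=2)
```

A larger F means the model explains more variation per coefficient relative to the leftover noise.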

Question 15

Q

AIC

Answer

A

Akaike’s Information Criterion; measures goodness of fit of an estimated statistical model and can be used for model selection; lower is better

Question 16

Q

BIC

Answer

A

Bayesian Information criterion; measures goodness of fit of an estimated statistical model and can be used for model selection; lower is better
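A sketch of the standard definitions of both criteria in terms of the maximized log-likelihood: AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L), where k is the number of estimated parameters and n the sample size. The function names and the example numbers are assumptions for illustration.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 * ln(L)."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L)."""
    return k * math.log(n) - 2 * log_likelihood


# Hypothetical model: log-likelihood of -10.0, 3 parameters, 100 observations
model_aic = aic(-10.0, k=3)
model_bic = bic(-10.0, k=3, n=100)
```

Both reward a higher likelihood and penalize extra parameters; BIC's penalty per parameter is ln(n), so it is the stricter of the two once n is larger than about 7.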

Question 17

Q

MAPE

Answer

A

Mean absolute percentage error (lower the better)
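A minimal sketch of MAPE as the average of |actual - predicted| / |actual|, expressed as a percentage. The function name `mape` and the toy data are assumptions; note that the measure is undefined when any actual value is zero.

```python
def mape(ys, y_hats):
    """Mean absolute percentage error, in percent. Requires every y to be non-zero."""
    n = len(ys)
    return 100 * sum(abs((y - y_hat) / y) for y, y_hat in zip(ys, y_hats)) / n


# Toy data and its least-squares predictions (illustrative)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
y_hats = [2.2 + 0.6 * x for x in xs]

error_pct = mape(ys, y_hats)
```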

(17 cards)