3.2 Multiple Linear Regression Flashcards

1
Q

Explain the key components of a multiple linear regression (MLR) model and their roles in the model equation. Describe the difference between the systematic and random components of an MLR model.

A

The key components of an MLR model are the target variable y, the predictors x1, …, xp, the coefficients b0, b1, …, bp, and the random error ε.

The systematic component b0 + b1x1 + … + bpxp models the expected value of the target variable y given the predictors, while the random component ε models the noise around that expected value and is assumed to be normally distributed with mean 0 and variance σ^2.
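
As a rough illustration of the two components, the sketch below simulates data from an MLR model with NumPy. The coefficient values, noise level, sample size, and seed are all made-up assumptions chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

n = 1000
beta = np.array([2.0, 0.5, -1.5])  # hypothetical b0, b1, b2
sigma = 0.3                        # hypothetical standard deviation of the error

X = rng.normal(size=(n, 2))                       # two predictors x1, x2
systematic = beta[0] + X @ beta[1:]               # b0 + b1*x1 + b2*x2
noise = rng.normal(loc=0.0, scale=sigma, size=n)  # epsilon ~ N(0, sigma^2)
y = systematic + noise                            # observed target
```

Here `systematic` is the part of y explained by the predictors, and `noise` is the part the model treats as random.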

2
Q

Explain the technique used to estimate coefficients in MLR. Interpret the meaning of the estimated coefficients in an MLR prediction function.

A

The technique used to estimate the coefficients in MLR is ordinary least squares (OLS), which seeks to minimize the error sum of squares (SSE). The interpretation of the coefficients is:

  • b0: the expected value of Y when all predictors are equal to zero
  • bj: the expected change in Y for every unit increase in predictor j, assuming all other predictors remain constant
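
A minimal sketch of OLS estimation, assuming NumPy's `np.linalg.lstsq` as the solver and a tiny noise-free toy dataset invented for the example (so the true coefficients are recovered exactly):

```python
import numpy as np

# Toy data generated exactly from y = 1 + 2*x1 + 3*x2 (no noise)
X = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [2.0, 1.0]])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# Add a column of ones so the first coefficient is the intercept b0
design = np.column_stack([np.ones(len(X)), X])

# lstsq returns the coefficients that minimize the error sum of squares
b, *_ = np.linalg.lstsq(design, y, rcond=None)
sse = np.sum((y - design @ b) ** 2)

print(b)    # approximately [1. 2. 3.]
print(sse)  # approximately 0
```

Reading the result: `b[0]` is the prediction when x1 = x2 = 0, and `b[1]` is the change in the prediction per unit increase in x1 with x2 held fixed.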
3
Q

Discuss the purpose and interpretation of the residual standard error in an MLR model.

A

The residual standard error is an estimate of σ, the standard deviation of the error term. A lower residual standard error indicates a better model fit and less influence from the random component.
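
One common way to compute the residual standard error is sqrt(SSE / (n - p - 1)), where n is the number of observations and p the number of predictors. The helper below is a hypothetical function written for this card, not part of any library.

```python
import numpy as np

def residual_standard_error(y, y_hat, p):
    """Estimate sigma as sqrt(SSE / (n - p - 1)) for a model with p predictors."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = y.size
    sse = np.sum((y - y_hat) ** 2)   # error sum of squares
    return np.sqrt(sse / (n - p - 1))

# Five observations, one predictor, one prediction off by 2: SSE = 4, df = 3
rse = residual_standard_error([1, 2, 3, 4, 5], [1, 2, 3, 4, 7], p=1)
```

The result is on the same scale as the target, so it can be read as the typical size of a prediction error.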

4
Q

Explain how higher-order terms can be used to model non-linear relationships in MLR and the challenges they pose for interpretation.

A

Higher-order terms refer to variables raised to integer powers to model non-linear relationships between predictors and the target. They require inclusion of all lower-order terms up to the highest desired order, and they make interpretation more complex because the effect on the target does not change uniformly with each unit change in the predictor.
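
To make the interpretation issue concrete, the sketch below fits a quadratic model on noise-free toy data (values invented for the example). With terms b1*x + b2*x^2, the change in the prediction per unit of x is b1 + 2*b2*x, so it depends on where x currently is.

```python
import numpy as np

x = np.linspace(-2.0, 2.0, 9)
y = 1.0 + 2.0 * x + 3.0 * x ** 2   # toy target with a known quadratic shape

# Include the lower-order term x alongside x^2
design = np.column_stack([np.ones_like(x), x, x ** 2])
b, *_ = np.linalg.lstsq(design, y, rcond=None)

def slope_at(x0):
    """Instantaneous effect of x on the prediction at x = x0."""
    return b[1] + 2.0 * b[2] * x0

print(slope_at(0.0))  # approximately 2
print(slope_at(1.0))  # approximately 8
```

The model is still linear in the coefficients, which is why OLS applies unchanged.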

5
Q

Describe the role of dummy variables in incorporating categorical factors into MLR models and how to interpret their coefficients.

A

Dummy variables allow for the inclusion of categorical factors in MLR. For a factor with w levels, w-1 dummy variables are used, with each coefficient representing the change in the predicted value of Y for that level compared to the reference level, holding the other predictors constant.
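
A small sketch of dummy coding, assuming a made-up factor with w = 3 levels ("A", "B", "C") and "A" as the reference level. Two dummy variables are enough, and each coefficient is the gap between that level's mean and the reference mean.

```python
import numpy as np

levels = np.array(["A", "A", "B", "B", "C", "C"])
y = np.array([9.0, 11.0, 11.0, 13.0, 14.0, 16.0])  # group means: A=10, B=12, C=15

# w - 1 = 2 dummies; level "A" is the reference and gets no column
d_B = (levels == "B").astype(float)
d_C = (levels == "C").astype(float)

design = np.column_stack([np.ones(len(y)), d_B, d_C])
b, *_ = np.linalg.lstsq(design, y, rcond=None)

print(b)  # approximately [10.  2.  5.]
```

The intercept is the mean of the reference level, and the other two entries are the differences B minus A and C minus A.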

6
Q

Explain the concept of interaction terms in MLR and what they achieve.

A

Interaction terms are products of different predictors, introducing dependence between predictors in influencing the target variable. They allow the effect of one predictor on Y to change depending on the value of another predictor.
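
As an illustration, the sketch below fits a model with an x1*x2 interaction on noise-free toy data (coefficients invented for the example). With the interaction included, the effect of x1 on the prediction is b1 + b3*x2 rather than a single constant.

```python
import numpy as np

# 3 x 3 grid of predictor values
g1, g2 = np.meshgrid([0.0, 1.0, 2.0], [0.0, 1.0, 2.0])
x1, x2 = g1.ravel(), g2.ravel()
y = 1.0 + 2.0 * x1 + 3.0 * x2 + 4.0 * x1 * x2   # toy target with a known interaction

# Main effects plus their product
design = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
b, *_ = np.linalg.lstsq(design, y, rcond=None)

def effect_of_x1(x2_value):
    """Change in the prediction per unit of x1, at a given value of x2."""
    return b[1] + b[3] * x2_value

print(effect_of_x1(0.0))  # approximately 2
print(effect_of_x1(2.0))  # approximately 10
```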

7
Q

Discuss the hierarchical principle in the context of including interaction terms.

A

According to the hierarchical principle, if an interaction term is included in the model, its individual terms should also be included, regardless of their significance. Removing the individual terms could alter the interpretation of the interaction.

8
Q

Explain the purpose and interpretation of t tests and F tests.

A

The t test is used to determine the significance of an individual predictor, with a small p-value indicating a significant predictor. The F test is used to determine the joint significance of multiple predictors by comparing nested models.
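
The sketch below computes the t statistics and a nested-model F statistic by hand with NumPy, on simulated data whose coefficients, seed, and sample size are assumptions made up for the example. It stops at the test statistics; p-values would come from the t and F distributions with the matching degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(scale=0.5, size=n)  # x2 has no true effect

def fit(design, target):
    """Return the OLS coefficients and the error sum of squares."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    sse = np.sum((target - design @ coef) ** 2)
    return coef, sse

full = np.column_stack([np.ones(n), x1, x2])
reduced = full[:, :2]                       # nested model without x2

b_full, sse_full = fit(full, y)
_, sse_reduced = fit(reduced, y)

# t statistics: coefficient divided by its standard error
df_resid = n - full.shape[1]
s2 = sse_full / df_resid
se = np.sqrt(np.diag(s2 * np.linalg.inv(full.T @ full)))
t_stats = b_full / se

# F statistic: drop in SSE per removed predictor, relative to the full-model error variance
q = full.shape[1] - reduced.shape[1]
f_stat = ((sse_reduced - sse_full) / q) / (sse_full / df_resid)
```

When the nested models differ by a single predictor, as here, the F statistic equals the square of that predictor's t statistic, which is a handy sanity check.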

9
Q

Describe the key assumptions of MLR and the seven concerns that need attention.

A

Assumptions:

  1. E(ε) = 0
  2. VAR(ε) = σ^2
  3. ε’s are independent
  4. ε’s are normally distributed
  5. no predictor is a linear combination of the other predictors

Concerns associated with the assumptions, respectively:

  1. Residuals with non-zero averages
  2. Heteroscedasticity
  3. Dependent ε’s
  4. non-normally distributed ε’s
  5. collinearity and multicollinearity

The remaining two concerns are not directly associated with violations of the assumptions:

  6. Outliers and high leverage points
  7. High-dimensional data
10
Q

Discuss the role of residuals in evaluating the validity of an MLR model.

A

Residuals, calculated as e = y - y_hat, are the observable estimates of the unobserved errors ε. They should exhibit similar properties to ε as described by the MLR assumptions. Deviations suggest that one or more assumptions are violated and that the model may fit poorly.

11
Q

Explain the concept of perfect collinearity and its impact on MLR parameter estimation.

A

Perfect collinearity arises when a predictor xj is an exact linear combination of other predictors. The design matrix then lacks full column rank, so the least squares equations have infinitely many solutions and OLS cannot produce unique coefficient estimates. In effect, the model is trying to estimate more parameters than there are independent equations.
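
A minimal demonstration with made-up numbers: x3 is built as an exact linear combination of x1 and x2, so the design matrix loses a rank and two different coefficient vectors give identical fitted values.

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
x3 = 2.0 * x1 - x2                      # exact linear combination of x1 and x2

design = np.column_stack([np.ones(5), x1, x2, x3])
rank = np.linalg.matrix_rank(design)    # 3, although there are 4 columns

y = 1.0 + x1 + x2

# Two different coefficient vectors (b0, b1, b2, b3) ...
b_a = np.array([1.0, 1.0, 1.0, 0.0])
b_b = b_a + np.array([0.0, 2.0, -1.0, -1.0])   # shift along 2*x1 - x2 - x3 = 0

# ... that produce exactly the same predictions
same_fit = np.allclose(design @ b_a, design @ b_b)
```

Because both vectors fit the data equally well, minimizing SSE cannot choose between them.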

12
Q

Describe how to interpret residual plots, namely residual vs. prediction plots and qq plots.

A

Residual vs. prediction plots should show a random scatter around e = 0, consistent spread (homoscedasticity), and no discernible trend. In qq plots, the points should closely follow the superimposed line, with deviations suggesting non-normal residuals. Standardized residuals beyond ±3 may indicate outliers.
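
The ±3 rule can be sketched numerically. The example below uses one common definition of a standardized residual, e_i / (s * sqrt(1 - h_ii)), where h_ii is the leverage of observation i; the data and the planted outlier are invented for illustration.

```python
import numpy as np

n = 30
x = np.linspace(0.0, 1.0, n)
y = 1.0 + 2.0 * x + 0.1 * np.sin(np.arange(n))  # small deterministic wiggle standing in for noise
y[15] += 5.0                                     # planted outlier

design = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(design, y, rcond=None)
fitted = design @ b
resid = y - fitted                               # e = y - y_hat

p = design.shape[1] - 1                          # number of predictors
s = np.sqrt(np.sum(resid ** 2) / (n - p - 1))    # residual standard error

# Leverage h_ii: diagonal of the hat matrix X (X'X)^-1 X'
xtx_inv = np.linalg.inv(design.T @ design)
leverage = np.sum((design @ xtx_inv) * design, axis=1)

standardized = resid / (s * np.sqrt(1.0 - leverage))
flagged = np.flatnonzero(np.abs(standardized) > 3)

print(flagged)  # [15]
```

Plotting `resid` against `fitted` gives the residual vs. prediction plot, and comparing the sorted `standardized` values with normal quantiles gives the qq plot.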
