Multiple Linear Regression / Model Selection Flashcards
multivariable modeling considerations
1. Which variables should be included
2. Which observations should be included
3. What variable transformations should be done
4. What variables might be confounders
5. What variables might represent effect modifiers (i.e., require interactions between variables)
6. Whether there are multicollinearity problems
7. Whether there is a sample size / sparse data problem
8. How missing values will be handled
9. Consideration of overfitting
before any multivariable modeling
1. Select variables of interest
2. Define categories, sometimes more than once, for a given variable
3. Examine univariate distributions
4. Examine bivariate distributions
5. Perform univariate analysis for the primary association of interest, with each potential confounder / effect modifier (steps 3-5 are sketched in code after this list)
6. Rethink variables and categories
7. Perform multivariable analysis for the primary association of interest with different combinations of potential confounders / effect modifiers
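A minimal sketch of steps 3-5 in Python with pandas and statsmodels; the file name and the column names (y for the outcome, x1 for the primary exposure, x2 for a potential confounder) are assumptions for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

# Hypothetical dataset: outcome 'y', primary exposure 'x1',
# potential confounder 'x2' (file and column names are assumptions)
df = pd.read_csv("data.csv")

# Step 3: univariate distributions
print(df.describe())            # numeric summaries for each variable
df.hist(figsize=(8, 6))         # histogram of each numeric column

# Step 4: bivariate distributions
print(df.corr(numeric_only=True))               # pairwise correlations
pd.plotting.scatter_matrix(df, figsize=(8, 8))  # all pairwise scatterplots
plt.show()

# Step 5: univariate (crude) analysis of the primary association
crude = smf.ols("y ~ x1", data=df).fit()
print(crude.summary())
```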
multiple linear regression
Models the relationship between one dependent variable and two or more independent variables as a linear function
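In standard notation, the model with k predictors is:

```latex
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon
```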
multiple regression model
- Estimate the multiple linear regression equation
- Test overall significance of the model
- Test significance of each independent variable
- Test relative importance of each independent variable
- Select the best model
- Use the model for prediction and estimation (see the sketch after this list)
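A minimal sketch of these steps with statsmodels; the data file and the column names y, x1, x2 are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("data.csv")  # hypothetical file

# Estimate the multiple linear regression equation
model = smf.ols("y ~ x1 + x2", data=df).fit()

# Overall model significance (F test), per-coefficient t tests, R^2
print(model.summary())
print(model.fvalue, model.f_pvalue)  # overall F test directly

# Use the model for prediction and estimation
new = pd.DataFrame({"x1": [1.0], "x2": [2.0]})  # hypothetical values
print(model.predict(new))
```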
interpretation of regression model
• b0 is the estimate of β0, the intercept of the regression line and the average Y value when all predictors are zero.
• bk is the estimate of βk, one of the partial regression coefficients or 'slopes' of the regression line
o Represents the change in Y for a unit change in Xk with the other predictors held constant (illustrated below)
o i.e., βk is the average slope across all subgroups created by the levels of the other predictors
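A hypothetical illustration (all numbers made up): if the fitted equation were

```latex
\hat{Y} = 10 + 2.5\,X_1 - 0.8\,X_2
```

then b1 = 2.5 says that a one-unit increase in X1, with X2 held constant, is associated with a 2.5-unit increase in the average of Y.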
testing model significance
Tests whether there is a linear relationship between all X variables taken together and Y
Use the F test statistic to test overall significance
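In standard notation, with k predictors and n observations, where SSR is the regression (explained) sum of squares and SSE the error sum of squares, the F statistic is the ratio of explained to unexplained mean squares:

```latex
F = \frac{MSR}{MSE} = \frac{SSR / k}{SSE / (n - k - 1)}
```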
model hypotheses null and alternative
o H0: β1 = β2 = … = βk = 0
• Null: all beta coefficients are zero (excluding the intercept). If there is any relationship between Y and any of the predictors, this will not hold; with real datasets this hypothesis is almost always rejected immediately (often not even worth reporting that we reject it)
o Ha: at least one beta coefficient is not 0
• At least one X variable is related to Y
linear regression assumptions
o Mean of distribution of error is 0
o Distribution of error has constant variance
o Distribution of error is normal
o Errors are independent
to evaluate whether regression assumptions hold
we estimate the errors. These estimated errors are called residuals.
residual calculation
difference between an observed value of Y and the estimated mean based on the associated X value(s).
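In symbols, the residual for observation i is:

```latex
e_i = y_i - \hat{y}_i
```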
residuals useful for
o Diagnostics–techniques for checking assumptions of the regression model
o Understanding the variation in Y that is unexplained by the regression model
o Identifying possible outliers
Residual analysis
• Graphical analysis of residuals (see the sketch after this list)
o Plot residuals vs. Xi values
o Plot residuals vs. predicted values (ŷ)
o Plot histogram or stem-and-leaf of residuals
o Q-Q plot of residuals
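A minimal sketch of these four plots with matplotlib and statsmodels, reusing the hypothetical file and column names from the earlier sketches.

```python
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("data.csv")                   # hypothetical file
model = smf.ols("y ~ x1 + x2", data=df).fit()  # as in the earlier sketch

resid = model.resid           # residuals
fitted = model.fittedvalues   # predicted values (y-hat)

fig, axes = plt.subplots(2, 2, figsize=(9, 7))

# Residuals vs. a predictor (x1 is a hypothetical column name)
axes[0, 0].scatter(df["x1"], resid)
axes[0, 0].set(xlabel="x1", ylabel="residual")

# Residuals vs. predicted values: look for curvature or fanning
axes[0, 1].scatter(fitted, resid)
axes[0, 1].set(xlabel="fitted value", ylabel="residual")

# Histogram of residuals: check rough normality
axes[1, 0].hist(resid, bins=20)
axes[1, 0].set(xlabel="residual")

# Q-Q plot of residuals against the normal distribution
sm.qqplot(resid, line="45", fit=True, ax=axes[1, 1])

plt.tight_layout()
plt.show()
```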
what if diagnostic plots indicate a problem
• Change the model:
o Add or remove variables
o Transform variables or recode categorical variables
o Remove outliers (but be careful!)
• Use a different analytic approach
occam’s razor
- Occam’s Razor: the principle that the simplest explanation is the most plausible unless there is evidence that a more complicated explanation is necessary
- For regression: you want a model with the smallest number of simple predictors that explains the observed data
R squared
Proportion of variation in Y 'explained' by all X variables taken together. Always increases (or stays the same) when a new X variable is added to the model.
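In terms of sums of squares (SST = SSR + SSE, total = explained + unexplained):

```latex
R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}
```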
simply maximizing R squared will
lead to models that vastly overfit the data
model building
• Use specified X variables (chosen based on understanding of the problem and data)
• Stepwise regression
o Computer selects the X variable most highly correlated with Y
o Continues to add or remove variables depending on SSE (forward selection or backward elimination)
forward selection
involves starting with no variables in the model, testing the addition of each variable using a chosen model comparison criterion, adding the variable (if any) that improves the model the most, and repeating this process until none improves the model.
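A minimal sketch of forward selection with statsmodels. The source leaves the comparison criterion open; AIC is used here as one common choice, and all names are illustrative.

```python
import statsmodels.formula.api as smf

def forward_select(df, response, candidates):
    """Greedy forward selection minimizing AIC (the criterion is an
    assumption; any model comparison criterion could be substituted)."""
    remaining = list(candidates)
    selected = []
    best_aic = smf.ols(f"{response} ~ 1", data=df).fit().aic  # intercept-only
    while remaining:
        # Score the model that results from adding each remaining variable
        scores = [
            (smf.ols(f"{response} ~ " + " + ".join(selected + [c]),
                     data=df).fit().aic, c)
            for c in remaining
        ]
        aic, best = min(scores)
        if aic >= best_aic:   # no addition improves the model: stop
            break
        best_aic = aic
        selected.append(best)
        remaining.remove(best)
    return selected

# e.g. forward_select(df, "y", ["x1", "x2", "x3"])
```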
backward elimination
Involves starting with all candidate variables, testing the deletion of each variable using a chosen model comparison criterion, deleting the variable (if any) whose removal improves the model the most, and repeating this process until no further improvement is possible
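The mirror-image sketch, again using AIC as the assumed criterion.

```python
import statsmodels.formula.api as smf

def backward_eliminate(df, response, candidates):
    """Greedy backward elimination minimizing AIC (criterion is an
    assumption, as in the forward-selection sketch)."""
    selected = list(candidates)
    best_aic = smf.ols(f"{response} ~ " + " + ".join(selected),
                       data=df).fit().aic
    while len(selected) > 1:
        # Score the model that results from deleting each variable in turn
        scores = [
            (smf.ols(f"{response} ~ " + " + ".join(v for v in selected if v != c),
                     data=df).fit().aic, c)
            for c in selected
        ]
        aic, worst = min(scores)
        if aic >= best_aic:   # no deletion improves the model: stop
            break
        best_aic = aic
        selected.remove(worst)
    return selected

# e.g. backward_eliminate(df, "y", ["x1", "x2", "x3"])
```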
multicollinearity
- High correlation between X variables
- Coefficients measure combined effect
- Leads to unstable coefficients and potentially misleading conclusions
- Example: BMI and weight in the same model
detecting multicollinearity
• Examine the correlation matrix
o Correlations between pairs of X variables are greater than their correlations with the Y variable
• Look at scatterplots between all pairs of variables (see the sketch below)
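A minimal sketch of both checks with pandas; the file and column names are hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("data.csv")    # hypothetical file
cols = ["y", "x1", "x2", "x3"]  # hypothetical column names

# Correlation matrix: look for X-X correlations that rival or
# exceed the X-Y correlations
print(df[cols].corr(numeric_only=True))

# Scatterplots between all pairs of variables
pd.plotting.scatter_matrix(df[cols], figsize=(8, 8))
plt.show()
```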
remedies for multicollinearity
Eliminate one of the correlated X variables