Multiple Linear Regression / Model Selection Flashcards

1
Q

multivariable modeling considerations

A

1. Which variables should be included
2. Which observations should be included
3. What variable transformations should be done
4. What variables might be confounders
5. What variables might represent effect modifiers (i.e., require interactions between variables)
6. Whether there are multicollinearity problems
7. Whether there is a sample size / sparse data problem
8. How missing values will be handled
9. Whether the model is at risk of overfitting

2
Q

before any multivariable modeling

A
1. Select variables of interest
2. Define categories (sometimes more than once) for a given variable
3. Examine univariate distributions
4. Examine bivariate distributions
5. Perform univariate analysis for the primary association of interest with each potential confounder / effect modifier
6. Rethink variables and categories
7. Perform multivariable analysis for the primary association of interest with different combinations of potential confounders / effect modifiers
3
Q

multiple linear regression

A

Models the relationship between one dependent variable and two or more independent variables as a linear function: Y = β0 + β1X1 + β2X2 + … + βkXk + ε
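A minimal sketch of fitting such a model by least squares, using hypothetical data with two predictors (all variable names and coefficient values here are made up for illustration):

```python
import numpy as np

# Hypothetical data: two predictors (X1, X2) and a response Y
# generated from Y = 2 + 3*X1 - 1.5*X2 + small noise.
rng = np.random.default_rng(0)
n = 50
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
Y = 2.0 + 3.0 * X1 - 1.5 * X2 + rng.normal(scale=0.1, size=n)

# Design matrix with a column of ones for the intercept b0.
X = np.column_stack([np.ones(n), X1, X2])
b, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(b)  # estimates close to the true [2.0, 3.0, -1.5]
```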

4
Q

multiple regression model

A
• Estimate the multiple linear regression equation
• Test overall significance of the model
• Test significance of each independent variable
• Assess relative importance of each independent variable
• Select the best model
• Use the model for prediction and estimation
5
Q

interpretation of regression model

A

• b0 is the estimate of β0, the intercept of the regression line: the average Y value when all predictors are zero.
• bk is the estimate of βk, one of the partial regression coefficients or ‘slopes’ of the regression line.
o Represents the change in Y for a unit change in Xk with the other predictors held constant.
o i.e., βk is the average slope across all subgroups created by the Xk levels.
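The "unit change with other predictors held constant" interpretation can be sketched with hypothetical fitted coefficients (the values and the `predict` helper below are invented for illustration):

```python
# Hypothetical fitted coefficients b = [b0, b1, b2].
b = [2.0, 3.0, -1.5]

def predict(x1, x2):
    # Fitted mean of Y at the given predictor values.
    return b[0] + b[1] * x1 + b[2] * x2

# Increase X1 by one unit while X2 is held constant at 5:
diff = predict(x1=1.0, x2=5.0) - predict(x1=0.0, x2=5.0)
print(diff)  # prints 3.0, i.e., exactly b1
```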

6
Q

testing model significance

A

Tests whether there is a linear relationship between all the X variables taken together and Y.

Uses the F test statistic: F = MSR / MSE = (SSR / k) / (SSE / (n − k − 1)), where k is the number of predictors.
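A minimal sketch of computing the overall F statistic from the sums of squares, on hypothetical data (variable names and the data-generating model are made up for illustration):

```python
import numpy as np

# Hypothetical data: Y depends on X1 but not X2.
rng = np.random.default_rng(1)
n, k = 40, 2
X1, X2 = rng.normal(size=n), rng.normal(size=n)
Y = 1.0 + 2.0 * X1 + rng.normal(size=n)

X = np.column_stack([np.ones(n), X1, X2])
b, *_ = np.linalg.lstsq(X, Y, rcond=None)
Y_hat = X @ b

SSE = np.sum((Y - Y_hat) ** 2)          # unexplained variation
SSR = np.sum((Y_hat - Y.mean()) ** 2)   # explained variation
F = (SSR / k) / (SSE / (n - k - 1))
print(F)  # a large F leads to rejecting H0 that all slopes are zero
```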

7
Q

model hypotheses null and alternative

A

• H0: β1 = β2 = … = βk = 0
o Null: all slope coefficients are zero (the intercept is excluded). If Y has any relationship with any predictor, this is false; with real data it is almost always rejected immediately, often not even worth reporting.
• Ha: at least one βk ≠ 0
o At least one X variable is related to Y.

8
Q

linear regression assumptions

A

o Mean of distribution of error is 0
o Distribution of error has constant variance
o Distribution of error is normal
o Errors are independent

9
Q

to evaluate whether regression assumptions hold

A

we estimate the errors. These estimated errors are called residuals.

10
Q

residual calculation

A

The difference between an observed value of Y and the fitted (predicted) value based on the associated X value(s): ei = Yi − Ŷi.
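A minimal sketch of this calculation, using a hypothetical fitted line Ŷ = 1 + 2X (the data points are made up for illustration):

```python
# Observed data and a hypothetical fitted line Y_hat = 1 + 2*X.
X = [0.0, 1.0, 2.0, 3.0]
Y = [1.2, 2.9, 5.1, 6.8]

Y_hat = [1.0 + 2.0 * x for x in X]
residuals = [y - yh for y, yh in zip(Y, Y_hat)]
print(residuals)  # small residuals indicate a good fit at these points
```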

11
Q

residuals useful for

A

o Diagnostics: techniques for checking the assumptions of the regression model
o Understanding the variation in Y that is unexplained by the regression model
o Identifying possible outliers

12
Q

Residual analysis

A

• Graphical analysis of residuals:
o Plot residuals vs. Xi values
o Plot residuals vs. predicted values (Ŷ)
o Plot histogram or stem-and-leaf of residuals
o Q-Q plot of residuals

13
Q

what if diagnostic plots indicate a problem

A
• Change the model:
o Add or remove variables
o Transform variables or recode categorical variables
o Remove outliers (but be careful!)
• Use a different analytic approach
14
Q

occam’s razor

A
  • Occam’s Razor: the principle that the simplest explanation is the most plausible unless there is evidence that a more complicated explanation is necessary
  • For regression: you want a model with the smallest number of simple predictors that explains the observed data
15
Q

R squared

A

Proportion of variation in Y ‘explained’ by all X variables taken together: R² = SSR / SST = 1 − SSE / SST. It always increases (or stays the same) when a new X variable is added to the model.
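A minimal sketch of this property on hypothetical data: R² does not decrease even when the added predictor is pure noise (all names and values below are invented for illustration):

```python
import numpy as np

# Hypothetical data: Y depends on X1; noise_var is unrelated to Y.
rng = np.random.default_rng(2)
n = 30
X1 = rng.normal(size=n)
noise_var = rng.normal(size=n)
Y = 1.0 + 2.0 * X1 + rng.normal(size=n)

def r_squared(design, y):
    # R^2 = 1 - SSE/SST for an OLS fit to the given design matrix.
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    sse = np.sum((y - design @ b) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - sse / sst

r2_small = r_squared(np.column_stack([np.ones(n), X1]), Y)
r2_big = r_squared(np.column_stack([np.ones(n), X1, noise_var]), Y)
print(r2_small, r2_big)  # r2_big >= r2_small despite the noise column
```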

16
Q

simply maximizing R² will

A

lead to models that vastly overfit the data

17
Q

model building

A

• Use prespecified X variables (chosen based on understanding of the problem and data)
• Stepwise regression:
o Computer selects the X variable most highly correlated with Y
o Continues to add or remove variables depending on SSE (forward selection or backward elimination)

18
Q

forward selection

A

Involves starting with no variables in the model, testing the addition of each variable using a chosen model comparison criterion, adding the variable (if any) that improves the model the most, and repeating this process until no addition improves the model.
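A minimal sketch of this greedy procedure using SSE as the comparison criterion, on hypothetical data (the variable names, the data-generating model, and the stopping threshold are all invented for illustration):

```python
import numpy as np

# Hypothetical data: Y depends on X1 and X2 but not X3.
rng = np.random.default_rng(3)
n = 60
cols = {
    "X1": rng.normal(size=n),
    "X2": rng.normal(size=n),
    "X3": rng.normal(size=n),
}
Y = 1.0 + 3.0 * cols["X1"] - 2.0 * cols["X2"] + rng.normal(scale=0.5, size=n)

def sse(selected):
    # SSE of an OLS fit using the intercept plus the selected columns.
    design = np.column_stack([np.ones(n)] + [cols[c] for c in selected])
    b, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return np.sum((Y - design @ b) ** 2)

selected, current = [], sse([])
while True:
    candidates = [c for c in cols if c not in selected]
    if not candidates:
        break
    best = min(candidates, key=lambda c: sse(selected + [c]))
    if current - sse(selected + [best]) < 1.0:  # hypothetical threshold
        break
    selected.append(best)
    current = sse(selected)

print(selected)  # X1 and X2 should be picked; the noise column usually not
```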

19
Q

backward elimination

A

Involves starting with all candidate variables, testing the deletion of each variable using a chosen model comparison criterion, deleting the variable (if any) whose removal improves the model the most, and repeating this process until no further improvement is possible.

20
Q

multicollinearity

A
  • High correlation between X variables
  • Coefficients measure combined effect
  • Leads to unstable coefficients and potentially misleading conclusions
  • Example: BMI and weight in the same model
21
Q

detecting multicollinearity

A

• Examine the correlation matrix
o Suspect multicollinearity when correlations between pairs of X variables are greater than their correlations with the Y variable
• Look at scatterplots between all pairs of variables
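A minimal sketch of spotting this pattern in the correlation matrix, on hypothetical data where one predictor is nearly a linear function of the other (all names and values are invented for illustration):

```python
import numpy as np

# Hypothetical data: X2 is nearly collinear with X1.
rng = np.random.default_rng(4)
n = 100
X1 = rng.normal(size=n)
X2 = 2.0 * X1 + rng.normal(scale=0.1, size=n)
Y = 1.0 + X1 + rng.normal(size=n)

# np.corrcoef treats each row as one variable.
corr = np.corrcoef([X1, X2, Y])
r_x1_x2 = corr[0, 1]  # correlation between the two predictors
r_x1_y = corr[0, 2]   # correlation of X1 with Y
print(r_x1_x2, r_x1_y)  # predictor-predictor correlation dominates
```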

22
Q

remedies for multicollinearity

A

Eliminate one of the correlated X variables