Stats 4 - Model Selection Flashcards
When creating a model, are all the model terms equally important?
No!
Model terms are NOT equally important!
For example, removing some explanatory terms decreases the explanatory power of the model, but it may not make the model significantly worse
What is the goal of model selection?
To make the model as simple as possible without sacrificing too much explanatory power
“Everything should be made as simple as possible, but no simpler.”
Outline the general process of Model simplification.
- Start with the maximal model –> the model that contains everything that might be important –> include all the terms & interactions that seem biologically relevant
- Simplify the maximal model towards the null model –> the model which states that nothing is important/explains the response variable
- But on the journey to the null model, we reach a point somewhere in between where you can’t remove any further terms without making the model significantly worse: this is called the minimum adequate model.
Explain the different parts of the attached flow diagram.
Model selection is an iterative process
1. Current model
2. Make a list of valid terms to drop (you can’t just drop any term)
3. Remove the least significant term –> the term with the lowest explanatory power (Sum Sq)
4. This creates a new model
5. Compare the current model to the new model using anova(Model1, Model2)
6. Is the new model statistically worse/different?
Test used: F-test –> linear models; AIC –> non-linear models
a) No statistical significance –> the new model becomes the new current model
b) Statistical significance (not good - significant reduction in explanatory power) –> keep the term in the model and remove it from the list of terms that can be dropped
7. Are there any more valid terms to drop?
Yes –> Continue simplification
No –> Minimum adequate model
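The loop in the flow diagram can be sketched in R roughly as follows. This is only an illustration under stated assumptions: the helper name simplify_model is hypothetical, the built-in mtcars data stands in for real data, the least significant term is taken to be the droppable term with the smallest sequential Sum Sq, and the list of blocked terms is rebuilt whenever a new model is accepted.

```r
# Hypothetical helper: backward simplification of a linear model with F-tests.
simplify_model <- function(model, alpha = 0.05) {
  blocked <- character()  # terms whose removal made the model significantly worse
  repeat {
    # Valid terms to drop: not contained in a higher-order interaction, not blocked
    candidates <- setdiff(drop.scope(model), blocked)
    if (length(candidates) == 0) break  # nothing left to try

    # Pick the candidate with the smallest Sum Sq in the ANOVA table of the model
    tab <- anova(model)
    ss <- tab[["Sum Sq"]][match(candidates, rownames(tab))]
    term <- candidates[which.min(ss)]

    # Build the new model without that term and compare it to the current one
    reduced <- update(model, as.formula(paste(". ~ . -", term)))
    p <- anova(model, reduced)[["Pr(>F)"]][2]

    if (p > alpha) {
      model <- reduced      # not significantly worse: accept the simpler model
      blocked <- character()
    } else {
      blocked <- c(blocked, term)  # significantly worse: keep the term
    }
  }
  model
}

# Assumed example: mtcars is a stand-in data set, not the data from the cards
maximal <- lm(mpg ~ (wt + hp + qsec)^2, data = mtcars)
mam <- simplify_model(maximal)
print(formula(mam))
```

The returned model is the point at which no remaining droppable term can be removed without a significant F-test, which corresponds to the minimum adequate model in the diagram.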
How to construct the maximum model?
Make a model that includes all the explanatory variables together with all of their possible interactions
Example:
- Explanatory variables:
GroundDwelling (Categorical)
Trophic Level (Categorical)
Litter size (offspring produced at one birth) (Continuous)
Body Mass (Continuous)
- Response variable:
Genome size
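As a rough sketch, the maximal model formula for this example could be written with the * operator, which expands to every main effect and every interaction. The variable names below (GenomeSize, GroundDwelling, TrophicLevel, LitterSize, logBM) are assumptions modelled on the list above, and no data set is needed just to inspect the terms.

```r
# Hypothetical variable names; a * b expands to a + b + a:b, and so on
max_formula <- GenomeSize ~ GroundDwelling * TrophicLevel * LitterSize * logBM

# List every term the maximal model would contain
max_terms <- attr(terms(max_formula), "term.labels")
print(max_terms)

# 4 main effects + 6 two-way + 4 three-way + 1 four-way interaction
print(length(max_terms))
```

With a real data frame, this formula would be passed to lm() together with the data argument.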
What is a quick way to construct a model that only shows pairwise interaction?
To include only the main effects and their pairwise interactions, use the following formula:
y ~ (a + b + c)^2
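To see what the ^2 shortcut expands to, the terms of the formula can be listed without fitting anything. The names y, a, b and c are the placeholders from the card.

```r
# (a + b + c)^2 keeps main effects and two-way interactions only
pairwise <- y ~ (a + b + c)^2
pair_terms <- attr(terms(pairwise), "term.labels")
print(pair_terms)  # "a" "b" "c" "a:b" "a:c" "b:c"

# The full crossing a * b * c would add the three-way term a:b:c
full_terms <- attr(terms(y ~ a * b * c), "term.labels")
print(setdiff(full_terms, pair_terms))  # "a:b:c"
```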
When performing model simplification, what is the rule for the terms that you are allowed to drop?
Obviously, you only want to remove the non-significant terms
Rule –> You cannot remove a main effect or an interaction while it is still present in a more complex interaction.
So if we have a complex interaction which we want to keep, we cannot simply remove its constituent main effects/interactions
Rule of Thumb –> Start by removing terms from the bottom of summary output
How do you know how much you can simplify your model?
Each time you drop a term –> the model gets a little worse, since the sum of squares that term accounted for is no longer explained (ESS explains less of the observed variation) –> the remaining variables may compensate for some of the loss of explanatory power
Takeaway message –> Model gets a little worse? It’s okay! –> the tiny amount explained by the removed term is not worth keeping –> it makes the model unnecessarily complicated
But wait a minute?!?!? How do we know how much a tiny amount is???
Use the F-test (anova) –> If the F-test shows significance (P<=0.05) –> there has been a significant reduction in explanatory power –> you should NOT remove the term
But…
If the F-Test shows NO statistical significance (P>0.05) –> then you can proceed to remove the term.
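A minimal sketch of this decision in R, assuming the built-in mtcars data as a stand-in and hypothetical model names (current, reduced):

```r
# Stand-in data: treat transmission type (am) as a categorical variable
dat <- transform(mtcars, am = factor(am))

current <- lm(mpg ~ wt * am, data = dat)      # model with the interaction
reduced <- update(current, . ~ . - wt:am)     # same model without it

# F-test comparing the two nested models
cmp <- anova(current, reduced)
print(cmp)

# The P value of the comparison sits in the second row of the table
p <- cmp[["Pr(>F)"]][2]
decision <- if (p > 0.05) "drop the term" else "keep the term"
print(decision)
```

The 0.05 threshold follows the cards; the comparison is only valid when one model is nested inside the other.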
What does the drop.scope function allow you to do?
The drop.scope function is a function in R that tells you which terms you can drop from your model
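As a small illustration, R's drop.scope (from the stats package) can be called on a formula or on a fitted model. The placeholder names y, a, b and c below are assumptions for the sketch.

```r
# Main effects a and b are part of a:b, so they are protected;
# only c and the interaction a:b may be dropped
droppable <- drop.scope(y ~ a + b + c + a:b)
print(droppable)  # "c" "a:b"

# With a three-way interaction present, only that term can be dropped
top_only <- drop.scope(y ~ a * b * c)
print(top_only)  # "a:b:c"
```

On a fitted model the call is the same, for example drop.scope(model), where model is whatever lm object you are simplifying.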
Without having to completely re-write your model, what is a shortcut you can use to update your model?
A shortcut for removing a term from the model is the update function.
It’s like telling R what to change in the model formula on either side of the ~ symbol. The dots in the code (. ~ .) mean ‘use whatever is currently in the response or explanatory variables’
Remember to rename your new updated model
Newname <- update(…..)
Otherwise you will lose your old model and you will NOT be able to compare
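A short sketch of update in use, again assuming mtcars as stand-in data and hypothetical model names (model1, model2):

```r
dat <- transform(mtcars, am = factor(am))

# Current model, kept under its own name
model1 <- lm(mpg ~ wt * am, data = dat)

# New model: same response, same terms, minus the wt:am interaction
model2 <- update(model1, . ~ . - wt:am)

print(formula(model1))
print(formula(model2))

# Both models still exist, so they can be compared afterwards
kept_terms <- attr(terms(model2), "term.labels")
print(kept_terms)  # "wt" "am"
```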
What should you always do after removing a term from your current model?
Run an ANOVA
To compare the Current and New model!
Breakdown the following ANOVA output that compares Model 1 and Model 2.
What are the two things you should consider when dropping a variable?
What term would you drop from the attached ANOVA output?
- Can you even drop that variable –> is the interaction/main effect present in any more complex interactions?
Check using the drop.scope function or use the rule of thumb - start at the bottom of the table
- Examine the Sum Sq column of the ANOVA (Model) output –> which main effect/interaction has the lowest Sum Sq (least explanatory power)?
What term would you drop?
logBM:Trophiclevel –> Lowest of Sum Sq