Lecture 10: Nested models
Nested models
Refer to models that are identical, except for the fact that some parameters are constrained to zero (i.e., fixed to zero) in one of them, while those same parameters are free in the other.
The smaller/constrained model is said to be “nested in” the larger or unconstrained model. By construction, the unconstrained model always fits at least as well as the constrained model (and usually somewhat better).
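For example, a regression that predicts an outcome from x1 alone is nested in a regression that predicts it from x1 and x2: the smaller model is just the larger model with the slope of x2 constrained to zero.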
Nested models help us figure out if adding more features or complexity to a model actually makes a meaningful difference in how well it describes the data. If the improvement is significant, it suggests that the added complexity is justified; if not, sticking with the simpler model might be more reasonable.
Incremental F-tests
Used to determine whether the increase in R-squared (the proportion of variance in the dependent variable explained by the predictors in the model) between two nested models is statistically significant. The incremental F-test compares the variation in the dependent variable explained by an unconstrained model with the variation explained by a constrained model.
The F-test assesses whether the increase in R-squared is greater than what would be expected by chance alone, indicating the significance of adding additional predictors to the model.
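As a minimal sketch of this comparison in Python with statsmodels (the data and variable names x1, x2, y are made up for illustration), anova_lm can take two nested fitted models and report the incremental F-test for the added predictor:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
df["y"] = 1.0 + 0.5 * df["x1"] + 0.3 * df["x2"] + rng.normal(size=n)

constrained = smf.ols("y ~ x1", data=df).fit()         # slope of x2 fixed to zero
unconstrained = smf.ols("y ~ x1 + x2", data=df).fit()  # both slopes free

# The second row of the output holds the incremental F statistic and its p-value
print(anova_lm(constrained, unconstrained))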
Hierarchical regression
Refers to an approach where predictors are added to a regression model in blocks, allowing us to assess the additional variance explained by each block of predictors with an incremental F-test.
Degrees of freedom
A counter that keeps track of how many parameters you could theoretically still estimate, given a dataset of this size. It is a way of keeping track of how many parameters you can afford to fit to a dataset of a given size.
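For example, with n = 100 observations and a regression that estimates an intercept plus 4 slopes, 100 − 5 = 95 degrees of freedom remain for the residuals; every additional parameter you estimate uses up one more.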
In which cases do you use hierarchical regression?
- To determine whether theoretically relevant factors explain significant variance above and beyond demographic characteristics
=> E.g., you want to predict college achievement based on high school GPA while controlling for demographic factors
1) Block 1: Demographic factors (age, sex, neighbourhood quality, SES)
2) Block 2: GPA (a code sketch for this case follows the list below)
- To determine whether a previously neglected factor explains additional variance
=> E.g., you want to show that your new scale of morality explains more variance in certain behavioural outcomes (e.g., “donating to charity”) than an existing scale of morality
1) Block 1: Add all subscales of the old morality scale
2) Block 2: Add all subscales of the new morality scale
- To test the overall effect of adding a single categorical predictor that is represented by multiple dummy variables
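A minimal sketch of the first use case in Python with statsmodels; the data and variable names (age, sex, ses, hs_gpa, college_gpa) are invented for illustration. Each block adds predictors, and the incremental F-test checks whether Block 2 explains significant variance beyond Block 1:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "age": rng.normal(19, 1, n),
    "sex": rng.integers(0, 2, n),
    "ses": rng.normal(0, 1, n),
    "hs_gpa": rng.normal(3.0, 0.5, n),
})
df["college_gpa"] = 2.0 + 0.05 * df["ses"] + 0.4 * df["hs_gpa"] + rng.normal(0, 0.3, n)

# Block 1: demographic factors only
block1 = smf.ols("college_gpa ~ age + sex + ses", data=df).fit()
# Block 2: add high school GPA on top of the demographics
block2 = smf.ols("college_gpa ~ age + sex + ses + hs_gpa", data=df).fit()

# Incremental F-test for the variance explained by Block 2 beyond Block 1
print(anova_lm(block1, block2))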
Formula for the incremental F-test
F = [ (SSRu – SSRc) / (df1u – df1c) ] / ( SSEu / df2u )
SSRu – SSRc = increase in the variance explained (regression sum of squares) when moving from the constrained to the unconstrained model
df1u – df1c = increase in the number of estimated parameters
SSEu = residual (unexplained) sum of squares of the unconstrained model
df2u = degrees of freedom for the residuals of the unconstrained model
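A small sketch that plugs made-up values into the formula above and looks up the p-value in the F distribution; all numbers are purely illustrative.

from scipy.stats import f as f_dist

ssr_u, ssr_c = 480.0, 450.0   # regression (explained) sums of squares
sse_u = 520.0                 # residual sum of squares of the unconstrained model
df1_u, df1_c = 5, 3           # numbers of predictors in each model
df2_u = 194                   # residual df of the unconstrained model, e.g. n - df1_u - 1 with n = 200

F = ((ssr_u - ssr_c) / (df1_u - df1_c)) / (sse_u / df2_u)
p = f_dist.sf(F, df1_u - df1_c, df2_u)   # upper-tail p-value of the incremental F-test
print(F, p)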