Model Estimation & Fit Flashcards
what is estimation
we need to find values for the model parameters θ such that the model-implied covariance matrix Σ(θ) and the sample covariance matrix S are as similar as possible → using Maximum Likelihood estimation
the discrepancy between Σ(θ) and S is operationalized by
the fit function
the fit function's expression is derived based on the assumption of
multivariate normality: y ∼ N(0, Σ)
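Under that assumption, the ML fit function has the standard closed form (p = number of observed variables, S = sample covariance matrix, Σ(θ) = model-implied covariance matrix):

```latex
F_{ML} = \ln\lvert\Sigma(\theta)\rvert + \operatorname{tr}\!\bigl(S\,\Sigma(\theta)^{-1}\bigr) - \ln\lvert S\rvert - p
```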
When does the fit function yield the lowest value
when the model-implied and sample covariance matrices are identical
A value of 0 for the fit function means
that the model implied covariance matrix reproduces the observed covariance matrix perfectly
→ this means our model is just identified!
→ which means there is no meaningful way to assess model fit!
the Maximum Likelihood (ML) approach is
- fairly robust to violations of multivariate normality
- parameter estimates will still be correctly obtained
- but standard errors and model fit may be affected
we have two alternative estimation approaches related to ML
Satorra-Bentler → MLM
why is ML not enough
the optimization can only provide the best set of values it can find for our model parameters
- we still need to assess the quality of the model specification and estimated parameter values
- to determine how well they explain the observed covariance matrix
- this is where model fit and fit indices come into play
model fit =
- evaluate the fit of the specified model given the estimated parameter values
- checking whether there is evidence of model misspecification
The $F_{ml}$ is proportional to the likelihood ratio…
- the likelihood of the specified (hypothesized) model divided by the likelihood of the saturated model
- i.e., the difference of their log-likelihoods
The $F_{ml}$ tells us
- how well the specified model fits compared to the best possible fit → i.e., the saturated model
- we translate it into a summary test statistic that is central to model fit
a value of $F_{ml}$ = 0 is unlikely since we
- do not know the population covariance matrix → we only have a sample S
- even if our model is correctly specified, $F_{ml}$ ≠ 0 due to sampling error
- are not interested in the saturated model → we usually want positive degrees of freedom
T test statistic formula
T = n · $F_{ml}$
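A minimal numerical sketch of $F_{ml}$ and T (assuming numpy; some software uses n - 1 instead of n as the multiplier):

```python
import numpy as np

def f_ml(S, Sigma):
    # F_ml = ln|Sigma| + tr(S Sigma^{-1}) - ln|S| - p
    p = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    return logdet_sigma + np.trace(S @ np.linalg.inv(Sigma)) - logdet_s - p

S = np.array([[1.0, 0.3], [0.3, 1.0]])
n = 200
F_perfect = f_ml(S, S)        # identical matrices: F_ml is zero
T = n * f_ml(S, np.eye(2))    # a misfitting model: T > 0
```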
T test follows a chi-square distribution if
- the model-implied covariance matrix Σ(θ) = the population (true) covariance matrix
- then, T has a central $χ^2$ distribution with as many degrees of freedom as the specified (hypothesized) model
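A small sketch of the degrees-of-freedom count and the resulting p-value (assuming scipy; the function name `model_df` and the example numbers are illustrative):

```python
from scipy.stats import chi2

def model_df(p, q):
    # df = unique elements in a p x p covariance matrix minus q free parameters
    return p * (p + 1) // 2 - q

# illustrative: 4 observed variables, 8 free parameters -> df = 2
df = model_df(4, 8)
# probability of observing T >= 5.99 under the central chi^2 distribution
p_value = chi2.sf(5.99, df)
```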
Types of model fit
Exact, close
For exact fit, we want the chi-square value to be…
LOW, we do not want to reject H0
caveats of exact fit
- for small sample sizes we have a poor approximation of the $χ^2$ distribution
- for large sample sizes the test is overpowered
Two models we compare our model with
baseline model = only item variances (all covariances fixed to zero)
saturated model = df = 0, T = 0
types of fit indices
Incremental (compare to baseline) - how much better does the specified model fit compared to the baseline model? - CFI
absolute (compare to saturated) - how close is the specified model Σ(θ) to the saturated model? - RMSEA
incremental fit indices rules
CFI, NFI, -> the higher the better!
between 0 and 1
.95 = good
.90 = acceptable
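One common definition of CFI (1 minus the ratio of noncentrality estimates, floored at zero) can be sketched as follows; the function name and input values are illustrative:

```python
def cfi(T_m, df_m, T_b, df_b):
    # noncentrality estimates of the specified (m) and baseline (b) models,
    # floored at zero
    d_m = max(T_m - df_m, 0.0)
    d_b = max(T_b - df_b, 0.0)
    if max(d_m, d_b) == 0.0:
        return 1.0  # both models fit at least as well as their df expect
    return 1.0 - d_m / max(d_m, d_b)

# illustrative values: a well-fitting model against a badly fitting baseline
fit = cfi(30.0, 24.0, 900.0, 36.0)  # close to 1
```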
absolute fit indices
RMSEA = measures the amount of misfit per degree of freedom; it quantifies whether the specified model is close to the saturated model
- smaller values are better
proposed benchmarks, i.e., rules of thumb
- < .05 → very good fit or close fit
- .05 – .08 → good fit or fair fit
- .08 – .10 → mediocre fit
- > .10 → poor or unacceptable fit
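A sketch of the usual RMSEA point-estimate formula (illustrative function; conventions differ on whether n or n - 1 appears in the denominator):

```python
import math

def rmsea(T, df, n):
    # point estimate: sqrt(max(T - df, 0) / (df * (n - 1)))
    return math.sqrt(max(T - df, 0.0) / (df * (n - 1)))

# illustrative: T = 20 with df = 10 and n = 201 -> sqrt(10 / 2000)
est = rmsea(20.0, 10, 201)
```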
Two RMSEA tests
Exact fit - H0: RMSEA = 0 (same as the T test statistic)
close fit - H0: RMSEA ≤ .05
significant → not good
poor fit - H0: RMSEA ≥ .08
significant → good!
SRMR
- compares Σ(θ) and S based on the residuals
- it is the square root of the average of the squared values in the residual correlation matrix
a cutoff value < 0.08 is considered a good fit
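A sketch of this computation (assuming numpy; exact standardization conventions vary across software, so treat this as illustrative):

```python
import numpy as np

def srmr(S, Sigma):
    # standardize the residuals with the sample standard deviations,
    # then take the root mean square over the unique (lower-triangle) elements
    d = np.sqrt(np.diag(S))
    resid = (S - Sigma) / np.outer(d, d)
    idx = np.tril_indices_from(resid)  # p(p+1)/2 unique elements
    return np.sqrt(np.mean(resid[idx] ** 2))

S = np.array([[1.0, 0.4], [0.4, 1.0]])
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
value = srmr(S, Sigma)  # small residual of 0.1 in one off-diagonal cell
```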
Goodness of fit
how much variance in S is explained by Σ(θ)…
the proportion of variance in the sample covariance matrix S accounted for by the model-implied covariance matrix Σ(θ)
- takes values in the range 0 to 1
- higher values are better
- > .90 is considered acceptable fit
Select competing models based on…
Nested -> Likelihood ratio test
NOT nested -> information criteria
Types of models to compare
Simple model (more restrictions)
vs.
complicated model (more general, less restrictions)
complicated model has
an additional parameter, so one degree of freedom less.
Which model always fits better? simple or complicated (general) model?
general model!
what if simple and complicated model have similar fit
If they indicate similar model fit → we prefer the simpler model
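The likelihood ratio test for nested models can be sketched as follows (assuming scipy; the T and df values are illustrative):

```python
from scipy.stats import chi2

def lr_test(T_simple, df_simple, T_general, df_general):
    # the simpler (restricted) model can never fit better, so the
    # difference in T is nonnegative and has df_simple - df_general df
    delta_T = T_simple - T_general
    delta_df = df_simple - df_general
    return delta_T, delta_df, chi2.sf(delta_T, delta_df)

# illustrative values: freeing one parameter improves T by 2 points
dT, ddf, p = lr_test(T_simple=32.0, df_simple=25, T_general=30.0, df_general=24)
# if p > .05, the restriction is not rejected -> keep the simpler model
```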
what are information criteria
Information criteria balance model fit and complexity
by weighing the log-likelihood function against model complexity → AIC, BIC
what do we want AIC and BIC to be?
we prefer models that have lower information criteria values
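Minimal sketches of the two criteria (standard formulas; the log-likelihood and parameter counts are illustrative):

```python
import math

def aic(loglik, k):
    # AIC = -2 log-likelihood + 2k, where k = number of free parameters
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    # BIC = -2 log-likelihood + k log(n): heavier complexity penalty as n grows
    return -2.0 * loglik + k * math.log(n)

# illustrative values; the model with the lower criterion is preferred
a = aic(-1234.5, 10)
b = bic(-1234.5, 10, 300)
```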
model modification
- MI = modification index → the expected decrease in the T test statistic
- EPC = expected parameter change → the approximate value for the added model parameter
- we can think of θ as
- a vectorized version of the full model specification of Σ
- θ contains
- only the free model parameters for which we want to estimate values