Midterm Exam Flashcards
Factor analytic (measurement) model
- Is a specified model relating constructs (i.e., factors or latent variables) to measures consistent with the observed data?
- Measurement model relating factors to measures; no directional effects between factors (ex: EFA, CFA, bi-factor models, multitrait, multimethod (MTMM) models)
Path analytic model
- Are the hypothesized directional relations between measured variables consistent with the observed data?
- Directional effects among measures; no latent variables are included
Full structural model
- Are the hypothesized directional relations between constructs consistent with the observed data?
- Includes measurement model and structural effects between factors
- This is a combination of characteristics of path analytic models (directional relations between constructs) and factor analytic/measurement models (relations of constructs to measures).
Model specification
Specification of models and hypotheses using words, model diagrams with appropriate symbols from the Bentler-Weeks notation system (e.g., F1, g1, V1, E1) and equations
Advantage of SEM over regression/ANOVA models
SEM includes measurement error of observed vars in the model, whereas regression/ANOVA assumes observed vars are measured without error (unrealistic)
Variance, covariance, correlation
Variance: a measure of individual differences (spread of scores around the mean)
Covariance: a measure of how 2 vars vary together
Correlation: standardized covariance between 2 vars (rXY = covXY / (sX sY))
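A quick numeric check of rXY = covXY/(sX sY) on made-up scores:

```python
import numpy as np

# Hypothetical scores on two measures
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])

# Sample covariance of X and Y (n - 1 denominator)
cov_xy = np.cov(x, y, ddof=1)[0, 1]

# Correlation = covariance standardized by the two standard deviations
r_xy = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

# Matches numpy's built-in correlation
assert np.isclose(r_xy, np.corrcoef(x, y)[0, 1])
print(round(r_xy, 3))   # 0.8
```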
What is the null hypothesis in SEM?
Null hypothesis suggests perfect fit of specified model to data. Therefore, we DO NOT want to reject the null hypothesis.
Fit is tested through chi-square
- p-value < .05 = We reject our null hypothesis of perfect fit.
- p-value > .05 = fail to reject the null hypothesis
Compare/contrast
JKW/LISREL and Bentler-Weeks notation models
JKW/LISREL:
- characterized by 8 basic matrices containing model parameters symbolized using Greek letters
- comprises structural models and measurement models
- distinguishes between exogenous and endogenous latent vars
Bentler-Weeks:
- parameters found in only 3 matrices
- uses VFED system (Variable, Factor, Error, Disturbance) to name vars and residuals; expressed by equation for each DV and covariance matrix among IVs
- measure vars (Vs) and latent vars (Fs) are handled similarly
Mathematical equations for DVs in a CFA model (containing 2 uncorrelated factors, 6 vars)
In matrix form (Greek symbols): η = Γξ
V1 = g1F1 + E1
V2 = g2F1 + E2
V3 = g3F1 + E3
V4 = g4F2 + E4
V5 = g5F2 + E5
V6 = g6F2 + E6 (expanded form: V6 = 0F1 + g6F2 + 0E1 + 0E2 + 0E3 + 0E4 + 0E5 + E6)
η (eta) = vector containing the DVs (V’s)
Γ (gamma) = matrix containing the weight parameters (g’s, 1’s, and 0’s)
ξ (xi) = vector containing the IVs (F’s and E’s)
What are the g’s in the Γ (gamma) matrix?
g’s are the weights applied to the factors to produce measured vars
What should be noted about the last columns of a weight matrix?
They are generally the weights applied to the errors to produce measured vars and produce an identity matrix (i.e., will have 1’s down the diagonal of the matrix with 0’s in the off-diagonal space)
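For the two-factor, six-variable CFA example in this deck, the weight matrix can be written out explicitly (a sketch, ordering the IV vector as F1, F2, E1, …, E6); note the identity matrix in the last six columns (the weights on the errors):

```latex
\[
\boldsymbol{\eta} =
\begin{bmatrix} V_1 \\ V_2 \\ V_3 \\ V_4 \\ V_5 \\ V_6 \end{bmatrix},
\qquad
\boldsymbol{\Gamma} =
\begin{bmatrix}
g_1 & 0   & 1 & 0 & 0 & 0 & 0 & 0 \\
g_2 & 0   & 0 & 1 & 0 & 0 & 0 & 0 \\
g_3 & 0   & 0 & 0 & 1 & 0 & 0 & 0 \\
0   & g_4 & 0 & 0 & 0 & 1 & 0 & 0 \\
0   & g_5 & 0 & 0 & 0 & 0 & 1 & 0 \\
0   & g_6 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix},
\qquad
\boldsymbol{\xi} =
\begin{bmatrix} F_1 \\ F_2 \\ E_1 \\ E_2 \\ E_3 \\ E_4 \\ E_5 \\ E_6 \end{bmatrix}
\]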
What Greek symbol represents the covariance matrix for IVs?
Phi (Φ)
What are the parameters of an SEM?
- Variances of exogenous vars (i.e., F’s and E’s)
- Covariances of exogenous vars (i.e., F’s and E’s)
- Weights representing directional effects specified in the model (i.e., g’s, 1’s, 0’s)
What is NOT considered a model parameter?
Variances and covariances of measured vars
Define ULI and UVI and what they do
ULI and UVI are constraints that scale the factors (i.e., methods of setting metric of latent vars)
ULI constraint: metric of factor is set by fixing the first loading to 1
UVI constraint: metric of factor is set by fixing the variance of the factor to 1 (thus standardizing factor)
Equation for model df
What does it consist of?
df = [v(v + 1) / 2] - (# of free parameters)
v = number of measured vars
v(v + 1) / 2 = number of unique variances and covariances among measured vars
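A worked example under assumed specifics: the 6-variable, 2-uncorrelated-factor CFA from earlier in this deck, with ULI scaling (first loading per factor fixed to 1):

```python
# Model df for a hypothetical CFA: 6 measured variables, 2 uncorrelated
# factors, ULI scaling (first loading per factor fixed to 1).
v = 6                          # number of measured variables
observed = v * (v + 1) // 2    # unique variances + covariances = 21

free_loadings = 4              # 6 loadings minus 2 fixed to 1 (ULI)
factor_variances = 2
factor_covariances = 0         # fixed to 0 (factors uncorrelated)
error_variances = 6
t = free_loadings + factor_variances + factor_covariances + error_variances

df = observed - t
print(observed, t, df)         # 21 12 9
```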
What are estimated (free) parameters?
- factor variances
- factor covariances
- factor loadings
- error variances
What are free parameters?
Parameters we wish to estimate and are NOT constrained
Fixed parameters/constraints
Parameters constrained to equal:
- 0 (as in a path excluded from the model)
- 1 (as in a factor variance or factor loading used to set the metric)
- a specific value (as when constraining a parameter to be equal across groups)
What is the issue with too many constraints?
Constraints produce some lack of fit
Independent (exogenous) vars
Dependent (endogenous) vars
- Exogenous latent variables are independent variables
- Endogenous latent variables are dependent variables in that they are predicted by exogenous latent variables and/or other endogenous latent variables
What are the model parameters in a CFA and in which matrices do we find them?
- Weights representing directional effects specified in the model (i.e., g’s, 1’s, 0’s): found in the Γ (gamma) weight matrix
- Variances of exogenous vars (i.e., F’s and E’s): found in the Φ (phi) covariance matrix of IVs
- Covariances of exogenous vars (i.e., F’s and E’s): also found in the Φ (phi) matrix
What is the equation for the t rule?
t ≤ p(p + 1) / 2
t = freely estimated parameters
p = measured vars (v’s)
p (p + 1) / 2 = unique variances and covariances among measured vars
same components as df equation
t-rule is necessary but not sufficient identification condition
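The t-rule check can be sketched as a small function (hypothetical parameter counts):

```python
# t-rule: necessary (but not sufficient) condition for identification.
def t_rule_ok(t, p):
    """t = freely estimated parameters, p = number of measured variables."""
    return t <= p * (p + 1) // 2

# Hypothetical 2-factor, 6-variable CFA with 12 free parameters: passes.
print(t_rule_ok(12, 6))   # True (12 <= 21)
# A model with more free parameters than unique (co)variances: fails.
print(t_rule_ok(22, 6))   # False
```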
What is identification?
Mathematically identifying a unique solution for the model parameters. Starts with scaling factors (ULI or UVI); check t-rule; then mathematically test for unique solution for parameters (2-indicator and 3-indicator rule).
2-indicator rule (for CFA)
For a multifactor, standard factor analytic model, if the following conditions are met, they are sufficient to identify a model:
(a) at least two variables are a function of each factor,
(b) each variable is a function of one and only one factor, and
(c) each factor has at least one non-zero covariance with another factor.
3-indicator rule (for CFA)
Three-indicator Rule: For a multifactor, standard factor analytic model, if the following conditions are met, they are sufficient to identify a model:
(a) at least three measured variables are a function of each factor, and
(b) each measured variable is a function of one and only one factor.
Identified vs. unidentified model
Identified = we can uniquely solve for all model parameters
Unidentified = we cannot solve uniquely for all the model parameters, because multiple solutions exist.
Empirical underidentification
A model is mathematically identified, but characteristics in the observed data prevent us from obtaining unique values for model parameters.
Ex: What if V3 is not correlated with V1 and V2 in our data? The covariances between V3 and the other two variables are not useful in solving for model parameters. The model is empirically underidentified.
Null β Rule
- Way to mathematically identify path analytic (or full structural) models
- If all elements in the β matrix are zeros, the model is identified. The β matrix contains all zeros if no dependent variable is a function of any other dependent variable (e.g., a multiple regression model). The errors for these models may or may not be correlated.
Recursive Rule
- Way to mathematically identify path analytic (or full structural) models
- If a model is recursive, it is identified. A recursive model has no reciprocal effects (direct feedback), feedback loops (indirect feedback), or error covariances. A model is recursive if the upper-right triangle of the beta matrix contains all zero values for variables that have been ordered in the standard manner and the covariances among the errors are zero.
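A minimal numeric sketch of the upper-triangle check (hypothetical beta values; error covariances assumed zero):

```python
import numpy as np

# Hypothetical beta matrix (effects of DVs on DVs) for three ordered DVs.
# beta[i, j] = effect of DV j on DV i.
beta = np.array([
    [0.0, 0.0, 0.0],   # DV1 predicted only by exogenous vars
    [0.4, 0.0, 0.0],   # DV2 <- DV1
    [0.2, 0.3, 0.0],   # DV3 <- DV1, DV2
])

# Recursive check: the upper triangle (above the diagonal) must be all zeros
recursive = np.allclose(np.triu(beta, k=1), 0)
print(recursive)   # True -> identified by the recursive rule
```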
Purpose of estimation
Calculate the best estimates for the model parameters such that values in the residual matrix are minimized
Goal of Maximum likelihood estimation (F.ML)
Find parameter values that have the greatest probability of producing the sample data.
Assumptions of ML estimation
- independent observations
- large sample size
- correctly specified model
- multivariate normal data
Iterative procedures in ML estimation
- Start values provide first guesses about model parameters; can be generated by the software or provided by the user.
- Log-likelihood is computed using these starting values in the density function.
- Parameter values are then adjusted such that the log-likelihood value is increased.
- The process continues until the increases in log-likelihood value cease (i.e., they do not increase beyond the set convergence criterion), yielding the maximum likelihood parameter estimates.
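The iterative logic above can be illustrated with a toy problem (not an SEM estimator: gradient ascent on the log-likelihood of a hypothetical normal mean, variance treated as known):

```python
import numpy as np

# Simulated data for a made-up normal-mean ML problem
rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=500)
sigma2 = data.var()                 # treat the variance as known here

def log_likelihood(mu):
    # Normal log-likelihood up to an additive constant
    return -0.5 * np.sum((data - mu) ** 2) / sigma2

mu = 0.0                            # start value (first guess)
step = 0.001
prev = log_likelihood(mu)
while True:
    mu += step * np.sum(data - mu) / sigma2   # move up the gradient
    curr = log_likelihood(mu)
    if curr - prev < 1e-10:          # convergence criterion reached
        break
    prev = curr

# For a normal mean, the ML estimate is the sample mean
assert np.isclose(mu, data.mean(), atol=1e-4)
```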
Reproduced covariance matrix
- Used in ML function to approximate the sample covariance matrix based on the factor model and the estimated parameter
Residual matrix
= the difference between the sample covariance matrix and the reproduced covariance matrix based on the model parameters
- We want to minimize values in the residual matrix in ML estimation (as close to 0.0 as possible)
under what conditions might our measures have nonnormal distributions?
- nonnormality of continuous vars (i.e., skewness, kurtosis)
- coarsely categorized (discrete) data (ex: dichotomous data [yes/no] and Likert-type data)
- all of the above
what are the implications of nonnormality for our choice of estimation procedure?
- nonnormality can alter the estimation of standard errors and chi-square statistic
- we would then use a robust estimator: MLM (Satorra-Bentler scaled chi-square), MLMV, or MLR (the only one of the three that handles missing data)
Evaluation of fit includes…
Global fit: how well the model as a whole reproduces the variance-covariance matrix among the measures (e.g., the model test statistic [chi-square], approximate fit indexes)
Local fit: assessing the individual parameter estimates and residuals to see if they are of the expected magnitude and direction
Model chi-square test
test of exact fit; likelihood ratio test
RMSEA
- root mean square error of approximation
- a parsimony-adjusted index that assesses fit as a function of degrees of freedom
- for a given chi-square, the larger the df, the smaller the RMSEA; the parsimony correction is also reduced in strength with larger sample sizes
RMSEA thresholds
< .05 = close fit of model to data
.05 - .08 = fair fit
.08 - .10 = mediocre fit
> .10 = poor fit
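The cutoffs above can be encoded in a small helper (hypothetical function, not a standard library routine):

```python
# Hypothetical helper encoding the RMSEA cutoffs listed above.
def rmsea_label(rmsea):
    if rmsea < 0.05:
        return "close fit"
    if rmsea <= 0.08:
        return "fair fit"
    if rmsea <= 0.10:
        return "mediocre fit"
    return "poor fit"

print(rmsea_label(0.03))   # close fit
print(rmsea_label(0.09))   # mediocre fit
```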
CFI
- comparative fit index
- an incremental fit index that estimates the relative improvement in fit of model over a baseline model (i.e., the null model which assumes zero covariances among measures)
- CFI ≥ .95 indicates good fit
SRMR
- standardized root mean square residual
- the mean absolute correlation residual
- SRMR ≤ .08 indicates good fit
AIC
- Akaike information criterion
- predictive fit index that assesses fit in hypothetical replication samples of the same size randomly drawn from the same population
- model with the smallest AIC is preferred as the most likely to replicate
Parsimony
Idea of the simplest model that is able to best explain the data using the least number of parameters
Implications of Type I and Type II errors and power in evaluating fit
In SEM, the framework of hypothesis testing does a better job guarding against Type I errors than it does guarding against Type II errors
- more power = greater precision in model estimates
Nested vs. non-nested models
- Nested can be evaluated using chi-square difference test, LM tests, and Wald tests
- Non-nested can be compared with information criteria (AIC and BIC)
chi-square difference test
- If one model (more constrained or restricted model) can be derived by imposing one or more constraints on a second model (less constrained or full model), then more constrained model is nested within the less constrained model.
- A type of specification search
- Assumptions: multivariate normality of vars, large sample size
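A sketch of the computation with made-up fit results (assuming scipy is available):

```python
from scipy.stats import chi2

# Hypothetical fit results: a constrained model nested in a full model
chisq_constrained, df_constrained = 112.4, 26
chisq_full, df_full = 104.1, 24

# Under H0, the difference in chi-square is itself chi-square distributed,
# with df equal to the difference in df (number of added constraints)
delta_chisq = chisq_constrained - chisq_full      # 8.3
delta_df = df_constrained - df_full               # 2
p = chi2.sf(delta_chisq, delta_df)

# p < .05: the constraints significantly worsen fit; retain the full model
print(round(delta_chisq, 1), delta_df, round(p, 4))   # 8.3 2 0.0158
```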
robust chi-square difference tests
MLM, MLR, MLMV
Why are specification searches conducted?
- to add parameters to improve model fit
- to delete parameters to produce more parsimonious models
Wald tests
- a specification search used to delete parameters to produce more parsimonious models
- evaluates whether a model would produce poorer fit in the population if one or more of the free parameters were fixed (non-sig Wald test = no significant loss of fit when parameter is deleted from model, which indicates the parameter can be constrained)
Modification indices/LM tests
- Lagrange Multiplier
- a specification search
- evaluates whether a model would show improved fit in the population if one or more parameters were freed (non-sig LM test = no significant improvement in fit when parameters are added to the model)