Week 4 - Preparing for Analysis Flashcards
What two pieces of information about missing data should every research paper report?
- The extent and nature of missing data
- The procedures used to manage the missing data, including the rationale for the method selected
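What are the three patterns of missingness in missing data?
MCAR - missing completely at random (whether a value is missing is unrelated to any variable, observed or unobserved)
MAR - missing at random (missingness is related to other observed variables but not to the missing values themselves; a dummy variable coding missing vs. non-missing can be compared across other variables to check this)
NMAR - not missing at random (missingness depends on the unobserved values themselves, so there is a systematic pattern; also called nonignorable nonresponse)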
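A minimal sketch of the dummy-variable check, using simulated data in which `age` is always observed and `income` is sometimes missing (the variable names, sample size, and missingness rates are all hypothetical). Cases are split by whether `income` is missing, and mean `age` is compared between the two groups. A clear difference suggests the data are not MCAR. The check cannot separate MAR from NMAR, because the missing values themselves are never observed.

```python
import random
import statistics

def mean_diff_by_missingness(other_var, target_var):
    """Dummy-variable check: mean of a fully observed variable for cases
    missing the target minus the mean for cases where the target is present."""
    missing = [o for o, t in zip(other_var, target_var) if t is None]
    present = [o for o, t in zip(other_var, target_var) if t is not None]
    return statistics.mean(missing) - statistics.mean(present)

rng = random.Random(0)
ages = [rng.uniform(20, 70) for _ in range(2000)]
incomes = [30 + 0.5 * a + rng.gauss(0, 5) for a in ages]

# MCAR: every income value has the same 20% chance of being missing.
mcar = [None if rng.random() < 0.2 else inc for inc in incomes]

# MAR: income is more often missing for older participants (depends on observed age).
mar = [None if rng.random() < (0.6 if a > 50 else 0.05) else inc
       for a, inc in zip(ages, incomes)]

print(f"MCAR age difference: {mean_diff_by_missingness(ages, mcar):.2f}")
print(f"MAR  age difference: {mean_diff_by_missingness(ages, mar):.2f}")
```

Under MCAR the difference hovers near zero; under this MAR setup the missing-income group is noticeably older.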
What is listwise deletion?
Cases with any missing values are deleted from analysis (complete case analysis)
What is pairwise deletion?
The maximum amount of available data is retained. Cases are excluded only from the operations (e.g. a particular correlation) that require the variables they are missing (available case analysis)
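A minimal sketch contrasting listwise and pairwise deletion on a hypothetical five-case data set (variable names and values are made up), with `None` marking a missing value. The helper names `listwise` and `pairwise` are illustrative only.

```python
# Hypothetical toy data set; None marks a missing value.
rows = [
    {"age": 25, "income": 40, "score": 7},
    {"age": 31, "income": None, "score": 5},
    {"age": None, "income": 52, "score": 6},
    {"age": 45, "income": 61, "score": None},
    {"age": 38, "income": 55, "score": 8},
]

def listwise(rows):
    """Complete case analysis: drop any case with a missing value on any variable."""
    return [r for r in rows if all(v is not None for v in r.values())]

def pairwise(rows, a, b):
    """Available case analysis: keep every case observed on this particular pair."""
    return [(r[a], r[b]) for r in rows if r[a] is not None and r[b] is not None]

print("listwise n:", len(listwise(rows)))          # 2
for a, b in [("age", "income"), ("age", "score"), ("income", "score")]:
    print(f"pairwise n for {a}-{b}:", len(pairwise(rows, a, b)))   # 3 each
```

Listwise deletion keeps the same 2 cases for everything, while pairwise deletion keeps 3 cases for each pair, but a different 3 each time. That is why a pairwise correlation matrix can be internally inconsistent.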
What is mean substitution?
Missing values are replaced with the mean of the observed values of that variable (this reduces the variance of the variable, which in turn attenuates its covariances with other variables).
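A small worked example with hypothetical scores. Each imputed value sits exactly at the mean, so it adds nothing to the sum of squared deviations while still increasing n. The variance therefore shrinks.

```python
import statistics

def mean_substitute(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    m = statistics.mean(observed)
    return [m if v is None else v for v in values]

scores = [2.0, 4.0, None, 8.0, None, 6.0]   # hypothetical data
observed = [v for v in scores if v is not None]
filled = mean_substitute(scores)

print(filled)                          # [2.0, 4.0, 5.0, 8.0, 5.0, 6.0]
print(statistics.variance(observed))   # 6.666666666666667 (sum of squares 20 / 3)
print(statistics.variance(filled))     # 4.0 (same sum of squares 20 / 5)
```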
What is regression substitution?
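A regression equation estimated from the non-missing data is used to predict values for the missing data. It is a best guess, but it produces biased variances and covariances: every imputed value falls exactly on the regression line, so variances are underestimated and correlations are inflated.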
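A minimal sketch of regression substitution for one outcome `y` predicted from one fully observed predictor `x` (the data and helper names are hypothetical). The line is fitted by ordinary least squares on the cases where `y` is observed, and each missing `y` is replaced by its predicted value.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

def regression_substitute(xs, ys):
    """Impute missing y values (None) with predictions from a regression of y on x,
    estimated from the cases where y is observed."""
    pairs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a, b = fit_line([x for x, _ in pairs], [y for _, y in pairs])
    return [a + b * x if y is None else y for x, y in zip(xs, ys)], (a, b)

xs = [1, 2, 3, 4, 5, 6, 7]                     # hypothetical predictor
ys = [2.3, 3.8, None, 8.4, None, 11.7, 14.1]   # hypothetical outcome with gaps
filled, (a, b) = regression_substitute(xs, ys)
print([round(v, 2) for v in filled])
```

The imputed values have zero residual around the fitted line, which is the source of the underestimated variance.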
What is the difference between stochastic and nonstochastic imputation methods?
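Nonstochastic (deterministic) methods, such as mean and regression substitution, always impute the same predicted value with no random component. Stochastic methods, such as stochastic regression and multiple imputation, add a random component to each imputed value. Stochastic means having a random probability distribution or pattern that may be analysed statistically but may not be predicted precisely.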
What is pattern-matching imputation (two types)?
Two types
- Hot-deck: missing values are filled in by finding participants in the same data set who match the case with missing data on other variables, and using their values.
- Cold-deck: a variation of hot-deck in which information from external sources (rather than the current data set) is used for the matching.
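A minimal sketch of pattern matching, assuming a simple nearest-neighbour rule: the donor is the case with the smallest summed absolute difference on the matching variables. This rule is a simplification chosen for illustration; many hot-deck procedures instead draw randomly from a set of similar donors. Passing an external `donors` list gives a cold-deck variant. All names and data are hypothetical.

```python
def deck_impute(cases, target, match_on, donors=None):
    """Fill a missing `target` value by copying it from the most similar donor.

    donors=None  -> hot-deck: donors come from the same data set.
    donors=[...] -> cold-deck variant: donors come from an external data set.
    Similarity = sum of absolute differences on the `match_on` variables.
    """
    pool = donors if donors is not None else cases
    pool = [d for d in pool if d[target] is not None]
    result = []
    for case in cases:
        new = dict(case)                     # copy so the input is not modified
        if new[target] is None:
            best = min(pool, key=lambda d: sum(abs(d[v] - case[v]) for v in match_on))
            new[target] = best[target]
        result.append(new)
    return result

cases = [
    {"id": 1, "age": 20, "hours": 10, "score": 55},
    {"id": 2, "age": 35, "hours": 25, "score": 70},
    {"id": 3, "age": 22, "hours": 12, "score": None},
    {"id": 4, "age": 34, "hours": 24, "score": None},
    {"id": 5, "age": 50, "hours": 5, "score": 62},
]
for row in deck_impute(cases, "score", ["age", "hours"]):
    print(row["id"], row["score"])
```

Case 3 is closest to case 1 and receives 55; case 4 is closest to case 2 and receives 70.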
What is stochastic regression?
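A random value (drawn from the distribution of the regression residuals) is added to each imputed predicted value. This reduces the underestimated variance produced by ordinary regression substitution.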
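A minimal sketch contrasting nonstochastic regression substitution with stochastic regression, assuming the random value is drawn from a normal distribution with mean 0 and the standard deviation of the observed residuals. The simulated data (true slope 2, residual SD 1, 40% of `y` missing completely at random) and all helper names are hypothetical. The check measures how far the imputed values scatter around the fitted line.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def regression_impute(xs, ys, rng=None):
    """rng=None  -> nonstochastic: impute the predicted value a + b*x.
    rng given -> stochastic: add a normal residual with the observed residual SD."""
    pairs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a, b = fit_line([x for x, _ in pairs], [y for _, y in pairs])
    resid_sd = statistics.stdev([y - (a + b * x) for x, y in pairs])
    filled = []
    for x, y in zip(xs, ys):
        if y is not None:
            filled.append(y)
        else:
            noise = rng.gauss(0, resid_sd) if rng is not None else 0.0
            filled.append(a + b * x + noise)
    return filled, (a, b)

rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(1000)]
ys_full = [2 * x + rng.gauss(0, 1) for x in xs]
ys = [None if rng.random() < 0.4 else y for y in ys_full]   # 40% MCAR gaps

def imputed_residual_var(filled, ab):
    """Variance of the imputed values around the fitted line."""
    a, b = ab
    resid = [f - (a + b * x) for x, y, f in zip(xs, ys, filled) if y is None]
    return statistics.variance(resid)

det, ab_det = regression_impute(xs, ys)
sto, ab_sto = regression_impute(xs, ys, rng)
print("residual variance, nonstochastic:", round(imputed_residual_var(det, ab_det), 3))
print("residual variance, stochastic:   ", round(imputed_residual_var(sto, ab_sto), 3))
```

The nonstochastic imputations have no scatter at all. The stochastic ones scatter by roughly the residual variance seen in the observed data.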
What is expectation maximisation (EM)?
Two steps
- Expectation (E) step: starting from parameter values estimated with the available data, regression methods based on these values are used to impute the missing data.
- Maximisation (M) step: new parameter values are calculated from the imputed data together with the original observed data. The two steps repeat until the estimates change very little from one iteration to the next.
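A minimal sketch of EM for the simplest case, assuming two variables that follow a bivariate normal distribution, with `x` fully observed and some `y` values missing. The data and function name are hypothetical. In the E step, the expected value of each missing `y` must include both E[y] and E[y²]. Using the predicted value alone would reproduce the variance bias of regression substitution, so the residual variance is added to E[y²].

```python
def em_bivariate(xs, ys, tol=1e-10, max_iter=1000):
    """EM estimates of mean(y), var(y) and cov(x, y) for bivariate normal data
    where x is fully observed and some y values are missing (None)."""
    n = len(xs)
    mu_x = sum(xs) / n
    var_x = sum((x - mu_x) ** 2 for x in xs) / n
    # Starting values from the available data.
    cc = [(x, y) for x, y in zip(xs, ys) if y is not None]
    mu_y = sum(y for _, y in cc) / len(cc)
    var_y = sum((y - mu_y) ** 2 for _, y in cc) / len(cc)
    cov = sum((x - mu_x) * (y - mu_y) for x, y in cc) / len(cc)
    for iteration in range(1, max_iter + 1):
        # E step: regression of y on x implied by the current parameters.
        b = cov / var_x
        a = mu_y - b * mu_x
        resid_var = var_y - b * cov
        s_y = s_yy = s_xy = 0.0
        for x, y in zip(xs, ys):
            if y is None:
                ey = a + b * x               # expected value of the missing y
                eyy = ey ** 2 + resid_var    # expected value of y squared
            else:
                ey, eyy = y, y ** 2
            s_y += ey
            s_yy += eyy
            s_xy += x * ey
        # M step: re-estimate the parameters from observed + expected values.
        new_mu_y = s_y / n
        new_var_y = s_yy / n - new_mu_y ** 2
        new_cov = s_xy / n - mu_x * new_mu_y
        change = max(abs(new_mu_y - mu_y), abs(new_var_y - var_y), abs(new_cov - cov))
        mu_y, var_y, cov = new_mu_y, new_var_y, new_cov
        if change < tol:                     # estimates barely changed: stop
            break
    return mu_y, var_y, cov, iteration

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.0, 4.1, None, 8.3, 9.9, None, 14.2, None]
mu_y, var_y, cov, iters = em_bivariate(xs, ys)
print(f"mean(y)={mu_y:.4f} var(y)={var_y:.4f} cov(x,y)={cov:.4f} after {iters} iterations")
```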
What is maximum likelihood?
Strategies where observed data are used to estimate parameters, which are then used to estimate the missing scores.
What is multiple imputation?
Several imputed data sets are created, each with different plausible values for the missing data. The analysis is carried out separately on each data set, and the final parameter estimates are obtained by averaging the estimates across the multiple analyses. The variability of the estimates within and between the data sets is then combined to construct standard errors and confidence intervals around the parameter estimates (Rubin's rules).
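A minimal sketch of multiple imputation with Rubin's pooling rules. The estimated parameter is the mean of `y`, on hypothetical simulated data. Two simplifications are assumed. First, each data set is imputed by stochastic regression with fixed regression parameters, whereas proper multiple imputation also draws those parameters, so the between-imputation variance here is somewhat understated. Second, the interval uses a normal 1.96 multiplier rather than Rubin's t-based degrees of freedom.

```python
import math
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def impute_once(xs, ys, rng):
    """One stochastic regression imputation of the missing y values (None)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a, b = fit_line([x for x, _ in pairs], [y for _, y in pairs])
    sd = statistics.stdev([y - (a + b * x) for x, y in pairs])
    return [a + b * x + rng.gauss(0, sd) if y is None else y for x, y in zip(xs, ys)]

def pool_rubin(estimates, within_vars):
    """Rubin's rules: returns (pooled estimate, total variance)."""
    m = len(estimates)
    q_bar = statistics.mean(estimates)       # average of the m estimates
    w = statistics.mean(within_vars)         # average within-imputation variance
    b = statistics.variance(estimates)       # between-imputation variance
    return q_bar, w + (1 + 1 / m) * b

rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [None if rng.random() < 0.3 else 2 * x + rng.gauss(0, 1) for x in xs]

m = 20
estimates, within = [], []
for _ in range(m):
    data = impute_once(xs, ys, rng)
    estimates.append(statistics.mean(data))               # parameter: mean of y
    within.append(statistics.variance(data) / len(data))  # its squared standard error

q_bar, total_var = pool_rubin(estimates, within)
half_width = 1.96 * math.sqrt(total_var)                  # normal approximation
print(f"pooled mean = {q_bar:.3f}, 95% CI = ({q_bar - half_width:.3f}, {q_bar + half_width:.3f})")
```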
What is full information maximum likelihood (FIML)?
It estimates parameters directly from all available data, including the observed values of incomplete cases, together with the implied values of the missing data given the observed data.
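A minimal sketch of the casewise likelihood that FIML maximises, assuming a bivariate normal model for hypothetical variables `x` and `y`, where only `y` can be missing. A complete case contributes the joint density of (x, y). A case missing `y` still contributes the marginal density of its observed `x` instead of being dropped. The parameter order `(mu_x, mu_y, var_x, var_y, cov)` is an assumption of this sketch. The numerical search for the parameters that maximise the total is not shown.

```python
import math

def case_loglik(x, y, mu_x, mu_y, var_x, var_y, cov):
    """Log-likelihood contribution of one case under a bivariate normal model.
    Cases missing y contribute only the marginal density of x."""
    if y is None:
        return -0.5 * (math.log(2 * math.pi * var_x) + (x - mu_x) ** 2 / var_x)
    det = var_x * var_y - cov ** 2
    dx, dy = x - mu_x, y - mu_y
    # Quadratic form d' Sigma^-1 d using the closed-form 2x2 inverse.
    quad = (var_y * dx ** 2 - 2 * cov * dx * dy + var_x * dy ** 2) / det
    return -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * quad

def fiml_loglik(xs, ys, params):
    """Sum of casewise log-likelihoods: every case contributes whatever it has."""
    return sum(case_loglik(x, y, *params) for x, y in zip(xs, ys))

xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.5, None, 3.2, None]                   # hypothetical data
params = (2.5, 2.5, 1.25, 1.5, 1.0)           # (mu_x, mu_y, var_x, var_y, cov)
print(fiml_loglik(xs, ys, params))
```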
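What is the central limit theorem (CLT)?
As the sample size increases, the sampling distribution of the sample mean gets closer to a normal distribution, even when the population itself is not normally distributed (provided it has a finite variance).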
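A minimal simulation sketch, assuming a right-skewed exponential population with mean 1 (an arbitrary choice). For each sample size n, many sample means are drawn, and the spread and skewness of those means are reported. As n grows, the skewness of the sampling distribution shrinks toward 0, the normal value, and its standard deviation shrinks roughly as 1/sqrt(n).

```python
import random
import statistics

def sample_means(n, reps, rng):
    """Means of `reps` samples of size n from an exponential(1) population."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

def skewness(values):
    """Moment-based skewness (0 for a perfectly symmetric distribution)."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)

rng = random.Random(7)
for n in (2, 10, 50):
    means = sample_means(n, 5000, rng)
    print(f"n={n:>2}: SD of means={statistics.stdev(means):.3f}, skewness={skewness(means):.2f}")
```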