Screening Flashcards
Ungrouped Data = ?
multiple regression, canonical correlation, factor analysis, or structural equation modeling
Grouped Data = ?
analysis of covariance, multivariate analysis of variance or covariance, profile analysis, discriminant analysis, or multilevel modeling
What can Values be?
Out of range; plausible; accurately coded (check with a frequencies table)
Missing Data?
Suspect it, then test it.
How to test missing data?
Create a dummy variable (missing vs. not missing) and test for between-group differences on the other variables. If less than 5% of the data are missing, any method of recovery yields much the same result, depending on the size of the data set.
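A minimal sketch of the dummy-variable check, using only the Python standard library. The variables `income` and `age` and their values are hypothetical, and Welch's t is one reasonable choice of between-group test, not the only one.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Welch's t statistic and approximate degrees of freedom for two groups."""
    n_a, n_b = len(group_a), len(group_b)
    se2_a = statistics.variance(group_a) / n_a
    se2_b = statistics.variance(group_b) / n_b
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df

# Hypothetical data: income has missing values (None); age is fully observed.
income = [32, None, 41, 38, None, 45, 50, None, 36, 47]
age = [25, 61, 34, 30, 58, 40, 44, 65, 29, 42]

# Dummy variable: 1 = income missing, 0 = income observed.
missing_flag = [1 if value is None else 0 for value in income]
pct_missing = 100 * sum(missing_flag) / len(income)

# Compare age between cases with and without income.
age_when_missing = [a for a, m in zip(age, missing_flag) if m == 1]
age_when_observed = [a for a, m in zip(age, missing_flag) if m == 0]
t_stat, df = welch_t(age_when_missing, age_when_observed)

print(f"{pct_missing:.0f}% missing, t = {t_stat:.2f}, df = {df:.1f}")
```

A large |t| suggests missingness on income is related to age, i.e. the data are not MCAR.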
Types of missing data
1) MCAR (Missing completely at random – i.e. unpredictable, independent of other variables in study)
2) MAR (missing at random, ‘ignorable non-response’ – predictable from other variables – e.g. patients might miss questionnaire)
3) MNAR (missing not at random or ‘non-ignorable’ – missingness is related to the variable itself – serious bias of results, e.g. patients less likely to complete the questionnaire because of their score)
Types of tests for missing data?
1) SPSS MVA (Missing Value Analysis) – finds patterns and replaces values. t-tests are produced to test whether missingness is related to any other variable, for variables with >5% missing data. This tests MCAR, MAR & MNAR. Little's MCAR test: if not significant, assume MCAR; if significant and missingness is related to other IVs (not the DV), then MAR; if significant and related to the DV, then MNAR.
How to deal with missing data?
1) Omission / deletion (still report it) – OK if little is missing (<5%), the data are MAR/MCAR, and the data set is moderate to large. Problematic if the sample is small, the design is experimental, or the data are MNAR.
2) Prior knowledge – Impute or fill in using expert prior knowledge. Could downgrade the variable to a dichotomy (high vs. low).
3) Impute Mean Substitution – Easy, conservative, but reduces variance in data set (because the mean is closer to itself than to the missing value it replaces, and the correlation the variable has with other variables is reduced because of the reduction in variance)
4) Impute Regression – More sophisticated. Other IVs used to write regression equation for variable with missing data (DV). Sometimes repeated regressions, using predicted values from REG1 as new DV. Reduces variance (values closer to the mean) and inflates relationship (scores fit better than they should) between IVs. Relies on good relationship between DV and potential IVs. Can use SPSS MVA for this.
5) Expectation Maximisation (EM) – Possible for MCAR/MAR – EM assumes a normal distribution and forms a missing-data correlation matrix for the partially missing data. It bases inferences about missing values on the likelihood under that distribution. Two steps, repeated iteratively: 1) E step: find the conditional expectation of the missing data given the observed data and the current parameter estimates; 2) M step: maximum likelihood estimation, with values imputed iteratively. Biased because no error is added to the imputed data (biased standard errors).
6) Multiple Imputation – No assumptions about randomness. Complex in SPSS. Can be applied to longitudinal data and time-series and retains sampling variability.
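The effects claimed for methods 3 and 4 can be checked on a small example. This is a sketch with hypothetical data and a single predictor. It compares complete cases with mean substitution and regression imputation, using only the standard library.

```python
import math
import statistics

def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    ss_a = sum((ai - mean_a) ** 2 for ai in a)
    ss_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(ss_a * ss_b)

# Hypothetical data: x is complete, y has three missing values (None).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [3.0, 2.0, None, 6.0, 4.0, None, 9.0, 6.0, None, 10.0]

obs_x = [xi for xi, yi in zip(x, y) if yi is not None]
obs_y = [yi for yi in y if yi is not None]

# Mean substitution: every missing y becomes the mean of the observed y.
y_mean = statistics.mean(obs_y)
y_mean_imputed = [y_mean if yi is None else yi for yi in y]

# Regression imputation: fit y = intercept + slope * x on complete cases,
# then replace each missing y with its predicted value.
x_bar = statistics.mean(obs_x)
slope = (sum((xi - x_bar) * (yi - y_mean) for xi, yi in zip(obs_x, obs_y))
         / sum((xi - x_bar) ** 2 for xi in obs_x))
intercept = y_mean - slope * x_bar
y_reg_imputed = [intercept + slope * xi if yi is None else yi
                 for xi, yi in zip(x, y)]

r_obs = pearson_r(obs_x, obs_y)
r_mean = pearson_r(x, y_mean_imputed)
r_reg = pearson_r(x, y_reg_imputed)

print("variance of y:", statistics.variance(obs_y),
      statistics.variance(y_mean_imputed), statistics.variance(y_reg_imputed))
print("correlation with x:", r_obs, r_mean, r_reg)
```

In this example both imputations shrink the variance of y. Mean substitution lowers the correlation with x, while regression imputation raises it, because the imputed scores sit exactly on the regression line.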
Once missing data have been dealt with, what is good practice?
Repeat the analysis with different methods and compare the results – do not choose the method on the basis of the outcome.
types of outliers?
univariate & multivariate
How to test for univariate outliers?
1) Standardise the variable – flag cases with |z| > 3.29 (p < .001, two-tailed; about 0.1% of a normal sample)
2) Histograms / boxplots
3) Outlier decision should be independent of results
4) Tackling univariate first should limit multivariate
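A minimal sketch of the z-score screen in step 1, with hypothetical scores. Note that with the sample standard deviation the largest possible |z| is (n − 1)/√n, so very small samples cannot reach 3.29 at all.

```python
import statistics

# Hypothetical scores: 19 typical values and one extreme value (120).
scores = [50, 52, 48, 51, 49, 53, 47, 50, 52, 48,
          51, 49, 50, 52, 48, 51, 49, 50, 53, 120]

mean = statistics.mean(scores)
sd = statistics.stdev(scores)  # sample standard deviation (n - 1)

# Standardise each score and flag cases beyond the cut-off.
z_scores = [(s - mean) / sd for s in scores]
CUTOFF = 3.29
outlier_indices = [i for i, z in enumerate(z_scores) if abs(z) > CUTOFF]

for i in outlier_indices:
    print(f"case {i}: score = {scores[i]}, z = {z_scores[i]:.2f}")
```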
How to test for multivariate outliers?
1) Best with formal stats methods
2) Mahalanobis Distance (MD) = distance of a case from the centroid (intersection of the variable means)
3) MD is tested against the chi-square distribution (χ²) with a conservative alpha (p
What is Mahalanobis Distance?
1) Distance of a case from the centroid (intersection of the variable means)
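A sketch of the calculation for two variables, written out by hand so that it needs only the standard library. The data are hypothetical, and the alpha of .001 is an assumption for illustration. With two variables the χ² test has 2 degrees of freedom, for which the p value is exactly exp(−MD/2).

```python
import math

def mahalanobis_sq_2d(xs, ys):
    """Squared Mahalanobis distance of each (x, y) case from the centroid."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Sample variances and covariance (n - 1 in the denominator).
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in zip(xs, ys):
        dx, dy = x - mx, y - my
        # d' S^-1 d, written out with the inverse of the 2x2 covariance matrix.
        distances.append((dx * dx * syy - 2 * dx * dy * sxy + dy * dy * sxx) / det)
    return distances

# Hypothetical data: 19 cases close to the line y = x, plus one case (10, 2)
# that is unremarkable on each variable alone but far from the joint pattern.
xs = list(range(1, 20)) + [10]
ys = [i + (0.5 if i % 2 else -0.5) for i in range(1, 20)] + [2.0]

d2 = mahalanobis_sq_2d(xs, ys)

ALPHA = 0.001  # assumed conservative alpha for this sketch
p_values = [math.exp(-d / 2) for d in d2]  # chi-square upper tail, df = 2 only
flagged = [i for i, p in enumerate(p_values) if p < ALPHA]

for i in flagged:
    print(f"case {i}: MD = {d2[i]:.2f}, p = {p_values[i]:.5f}")
```

The flagged case has z scores well inside ±3.29 on both variables, which is why univariate screening alone would miss it. For more than two variables the exp(−MD/2) shortcut no longer applies and a general χ² routine is needed.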
What is leverage?
1) Leverage = similar to MD but on a different scale, so the same χ² significance tests cannot be used. Leverage is how far out a case is from the others (a high-leverage case can still be in line with them).
What is discrepancy?
1) Discrepancy = how far out of line a case is with the others.
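The difference between the two can be sketched with a one-predictor regression. The data are hypothetical. Leverage is computed as the hat value, 1/n + (x − x̄)²/Sxx, and the raw residual is used here as a simple stand-in for discrepancy.

```python
# Hypothetical data: ten cases on the line y = 2x, one case far out on x
# but still on the line (index 10), and one central case well off the line
# (index 11).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30, 5.5]
y = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 60, 30]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares line fitted to all cases.
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Leverage (hat value): depends only on how far x is from the mean of x.
leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

# Raw residual: how far the case falls from the fitted line.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# With one predictor, MD = (n - 1) * (leverage - 1/n): same information,
# different scale.
md = [(n - 1) * (h - 1 / n) for h in leverage]

most_leverage = max(range(n), key=lambda i: leverage[i])
most_discrepant = max(range(n), key=lambda i: abs(residuals[i]))
print("highest leverage:", most_leverage, round(leverage[most_leverage], 3))
print("largest residual:", most_discrepant, round(residuals[most_discrepant], 2))
```

The far-out case has the highest leverage but a small residual, while the central off-line case has low leverage and the largest residual.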