Chapter 11: Processing and cleaning the data Flashcards
When cleaning the data, what 3 files should you keep of the survey?
- Original raw dataset
- Cleaned dataset
- Enciphered (keyed) dataset
How to check for multivariate outliers?
- Conduct a regression analysis and use Mahalanobis distance
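As a rough illustration of the Mahalanobis distance mentioned above, the sketch below computes the squared distance of each case from the centroid for two variables. It calculates the distance directly instead of going through a regression procedure, and the height/weight values are made up for the example.

```python
from statistics import fmean

def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each (x, y) case from the centroid,
    based on the sample covariance matrix of the two variables."""
    n = len(points)
    mx = fmean([x for x, _ in points])
    my = fmean([y for _, y in points])
    # Entries of the 2x2 sample covariance matrix (divide by n - 1).
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] S^-1 [dx dy]^T, with the 2x2 inverse written out.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances

# Hypothetical (height in cm, weight in kg) cases. The last case has an
# ordinary height and an ordinary weight, but an unusual combination.
data = [(150, 50), (160, 60), (170, 70), (180, 80), (190, 90),
        (155, 56), (165, 64), (175, 76), (185, 84), (160, 85)]

d2 = mahalanobis_sq_2d(data)
most_extreme = max(range(len(d2)), key=d2.__getitem__)
print(most_extreme, round(d2[most_extreme], 2))
```

Cases with a large distance are candidates for closer inspection; which cutoff to use (for example a chi-square critical value) is a separate decision not covered here.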
What are the 3 different missingness mechanisms?
- Missing completely at random (MCAR)
- Missing at random (MAR)
- Missing not at random (MNAR)
What is missing completely at random? (3)
- Probability of missing is the same for all cases
- Missingness does not depend on any data
- Missingness does not follow a pattern
What is missing at random? (3)
- Probability of missing is not the same for all cases and may depend on observed information
- Missingness depends on the observed data, but not on the missing values themselves (thus the missingness can be ignored once the observed data are taken into account)
- Missing data follow a known pattern
What is missing not at random? (2)
- Probability of missing is not the same for all cases and may depend on the missing information itself
- Missingness has a pattern, but pattern is not observed or unknown
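The three mechanisms can be sketched as three different rules for the probability that a value goes missing. This is only a toy illustration: the variables (age, income), the cutoffs, and the probabilities are all hypothetical.

```python
import random

def missing_probability(row, mechanism):
    """Probability that 'income' is missing for this case (toy values)."""
    if mechanism == "MCAR":
        return 0.3  # same for every case, independent of any data
    if mechanism == "MAR":
        # Depends only on an observed variable (age).
        return 0.6 if row["age"] >= 50 else 0.1
    if mechanism == "MNAR":
        # Depends on the value that itself goes missing (income).
        return 0.6 if row["income"] >= 4000 else 0.1
    raise ValueError(f"unknown mechanism: {mechanism}")

def apply_missingness(rows, mechanism, seed=0):
    """Return copies of the rows with 'income' set to None at random."""
    rng = random.Random(seed)
    result = []
    for row in rows:
        new_row = dict(row)
        if rng.random() < missing_probability(row, mechanism):
            new_row["income"] = None
        result.append(new_row)
    return result

rows = [
    {"age": 25, "income": 2500},
    {"age": 62, "income": 2600},
    {"age": 30, "income": 5200},
]

for mechanism in ("MCAR", "MAR", "MNAR"):
    print(mechanism, [missing_probability(r, mechanism) for r in rows])
```

Under MNAR the rule uses the income value, which the analyst no longer sees once it is missing, so the pattern cannot be recovered from the observed data alone.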
With what types of missing data can you use imputations?
- MCAR or MAR
What are ways to treat outliers?
- Minimize influence by changing raw score to something less extreme (mean/mode/median)
- Delete the extreme cases from the analysis
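Both treatments can be sketched in a few lines. The example assumes that extreme cases are identified with fixed lower and upper bounds and that the replacement is the median of the remaining cases; both choices are assumptions for illustration, and the ages are made up.

```python
from statistics import median

def treat_outliers(values, lower, upper, strategy="replace"):
    """Either delete values outside [lower, upper] or replace them
    with the median of the values inside the bounds."""
    is_extreme = [v < lower or v > upper for v in values]
    kept = [v for v, extreme in zip(values, is_extreme) if not extreme]
    if strategy == "delete":
        return kept
    if strategy == "replace":
        centre = median(kept)
        return [centre if extreme else v for v, extreme in zip(values, is_extreme)]
    raise ValueError(f"unknown strategy: {strategy}")

ages = [23, 25, 27, 24, 26, 250]  # hypothetical data with one extreme score

replaced = treat_outliers(ages, lower=0, upper=100, strategy="replace")
deleted = treat_outliers(ages, lower=0, upper=100, strategy="delete")
print(replaced)
print(deleted)
```

The original list is left unchanged, which fits the advice to keep the raw dataset next to the cleaned one.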
What are 2 different imputation strategies?
- Impute the variable mean to the cases with missing results
- Use a regression analysis for imputation value
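A minimal sketch of the two strategies, assuming one predictor x and one variable y with a missing value (coded as None). The regression is a simple least-squares line fitted on the complete cases; the numbers are invented.

```python
from statistics import fmean

def mean_impute(ys):
    """Replace missing values (None) with the mean of the observed values."""
    fill = fmean([y for y in ys if y is not None])
    return [fill if y is None else y for y in ys]

def regression_impute(xs, ys):
    """Replace missing y values with predictions from a least-squares
    line y = intercept + slope * x fitted on the complete cases."""
    pairs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    mx = fmean([x for x, _ in pairs])
    my = fmean([y for _, y in pairs])
    slope = (sum((x - mx) * (y - my) for x, y in pairs)
             / sum((x - mx) ** 2 for x, _ in pairs))
    intercept = my - slope * mx
    return [intercept + slope * x if y is None else y for x, y in zip(xs, ys)]

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, None]  # last case has a missing value

by_mean = mean_impute(ys)
by_regression = regression_impute(xs, ys)
print(by_mean[-1], by_regression[-1])
```

In this toy data the mean imputes a value in the middle of the distribution, while the regression uses the case's x value and so follows the trend in the data.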
How can you check for strange patterns in categorical variables?
- Make a frequency distribution of 2 variables at the same time (a cross-tabulation). A particular combination of values on X and Y can be strange
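Such a joint frequency table can be sketched with a counter over pairs of values. The variables and categories below are hypothetical; the point is that "under 18" combined with "retired" stands out even though each value is valid on its own.

```python
from collections import Counter

# Hypothetical responses on two categorical variables.
responses = [
    {"age_group": "under 18", "employment": "student"},
    {"age_group": "18-64", "employment": "employed"},
    {"age_group": "65+", "employment": "retired"},
    {"age_group": "under 18", "employment": "retired"},
    {"age_group": "18-64", "employment": "employed"},
]

# Count every (age_group, employment) combination.
crosstab = Counter((r["age_group"], r["employment"]) for r in responses)

for (age_group, employment), count in sorted(crosstab.items()):
    print(f"{age_group:10} {employment:10} {count}")
```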
How can you check for strange patterns in continuous variables? (3)
- Bivariate outliers, Mahalanobis distance
- Min-max, means and standard deviation
- Median or mode
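The descriptive checks above can be sketched with Python's statistics module. The values are invented; the 999 stands for something like an undeclared missing-value code, which shows up in the maximum and pulls the mean far away from the median.

```python
from statistics import fmean, median, mode, stdev

def describe(values):
    """Basic descriptives for screening a continuous variable."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": fmean(values),
        "sd": stdev(values),  # sample standard deviation
        "median": median(values),
        "mode": mode(values),
    }

ages = [34, 29, 41, 38, 29, 999]  # hypothetical data
summary = describe(ages)
for name, value in summary.items():
    print(name, value)
```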
What should be included in the codebook of a dataset? (10)
- Respondent / panel recruitment
- ID numbers
- Randomization variables
- Instruction text
- Variables
- Question text
- Value and variable labels
- Routing
- Descriptives
- Screenshots of questionnaire
What is a bivariate outlier?
- A case that is an outlier because of its combination of values on x and y. Neither value is an outlier on its own, but their combination is
Name 3 different types of paradata and explain why it is useful to add them to your dataset.
- Duration of the survey - survey seriousness and understanding
- Duration of the question - question seriousness and understanding
- Browser or device - for detecting errors across different devices / operating systems
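One possible use of duration paradata is flagging respondents who finished implausibly fast. The sketch below assumes a rule of "less than 30% of the median duration"; that cutoff, the respondent IDs, and the durations are all hypothetical.

```python
from statistics import median

def flag_speeders(durations, fraction=0.3):
    """Return the IDs of respondents whose total survey duration is
    below `fraction` times the median duration (assumed cutoff)."""
    cutoff = fraction * median(durations.values())
    return sorted(rid for rid, seconds in durations.items() if seconds < cutoff)

# Hypothetical total survey durations in seconds, keyed by respondent ID.
durations = {1: 600, 2: 580, 3: 95, 4: 640, 5: 610}

speeders = flag_speeders(durations)
print(speeders)
```

A flag like this is a reason to inspect a case more closely, not an automatic reason to delete it.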