Chapter 11: Processing and cleaning the data Flashcards

1
Q

When cleaning the data, what 3 files should you keep of the survey?

A
  1. Original raw dataset
  2. Cleaned dataset
  3. Enciphered (keyed) dataset
2
Q

How do you check for multivariate outliers?

A
  1. Conduct a regression analysis and use Mahalanobis distance
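
As an illustration of the answer above, here is a minimal sketch of squared Mahalanobis distances for two continuous variables, using only the Python standard library. The function name `mahalanobis_2d` and the data are hypothetical; in practice a statistics package would compute this for any number of variables. The last point has unremarkable values on each variable separately, but an unusual combination.

```python
from statistics import mean

def mahalanobis_2d(points):
    """Return the squared Mahalanobis distance of each (x, y) point
    from the centroid of all points (hypothetical helper)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    n = len(points)
    mx, my = mean(xs), mean(ys)
    # Sample variances and covariance (n - 1 in the denominator).
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det == 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] * inverse(S) * [dx dy]^T, written out for a 2x2 S.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(d2)
    return distances

# Made-up data: eight points on a line plus one off-pattern point.
points = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7), (8, 8), (2, 7)]
distances = mahalanobis_2d(points)
most_extreme = points[distances.index(max(distances))]
```

Cases with the largest distances are the candidates to inspect; which cutoff counts as "too large" is a separate decision that this sketch does not make.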
3
Q

What are the 3 different missingness mechanisms?

A
  1. Missing completely at random (MCAR)
  2. Missing at random (MAR)
  3. Missing not at random (MNAR)
4
Q

What is missing completely at random? (3)

A
  1. Probability of missing is the same for all cases
  2. Missingness does not depend on any data
  3. Missingness does not follow a pattern
5
Q

What is missing at random? (3)

A
  1. Probability of missing is not the same for all cases and may depend on observed information
  2. Missingness depends on the observed data, but not on the missing values themselves (thus the missing data can be ignored once the observed data are taken into account)
  3. Missing data follow a known pattern
6
Q

What is missing not at random? (2)

A
  1. Probability of missing is not the same for all cases and may depend on the missing information itself
  2. Missingness has a pattern, but the pattern is unobserved or unknown
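
To make the difference between the three mechanisms concrete, here is a small sketch in which income is the variable that can go missing and age is always observed. All function names, probabilities, thresholds, and data are made up for illustration; the point is only what each probability is allowed to depend on.

```python
import random

def p_missing_mcar(age, income):
    # MCAR: the same probability for every case.
    return 0.2

def p_missing_mar(age, income):
    # MAR: depends only on an observed variable (age), not on income itself.
    return 0.4 if age >= 60 else 0.1

def p_missing_mnar(age, income):
    # MNAR: depends on the value that goes missing (income).
    return 0.4 if income >= 5000 else 0.1

def apply_missingness(cases, p_missing, seed=0):
    """Return the income column with some values replaced by None."""
    rng = random.Random(seed)
    return [None if rng.random() < p_missing(age, income) else income
            for age, income in cases]

# Hypothetical (age, income) cases.
cases = [(25, 2100), (34, 5200), (61, 1800), (67, 6400), (45, 3000)]
incomes_mar = apply_missingness(cases, p_missing_mar)
```

Under MNAR the analyst cannot reconstruct the rule from the data, because the variable that drives the missingness is exactly the one that is missing.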
7
Q

With what types of missing data can you use imputations?

A
  1. MCAR or MAR
8
Q

What are ways to treat outliers?

A
  1. Minimize their influence by changing the raw score to something less extreme (mean/mode/median)
  2. Delete the extreme cases from the analysis
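
The two treatments above can be sketched as follows. This is a minimal example that assumes outliers are defined by a plausible range chosen by the researcher; the helper names, the range, and the data are hypothetical.

```python
from statistics import median

def flag_outliers(values, low, high):
    """Return True for each value outside the plausible range [low, high]."""
    return [not (low <= v <= high) for v in values]

def replace_outliers(values, low, high):
    """Treatment 1: replace each flagged score with the median
    of the scores that were not flagged."""
    flags = flag_outliers(values, low, high)
    center = median([v for v, f in zip(values, flags) if not f])
    return [center if f else v for v, f in zip(values, flags)]

def drop_outliers(values, low, high):
    """Treatment 2: delete the flagged cases from the analysis."""
    return [v for v in values if low <= v <= high]

# Made-up ages; 420 is probably a typing error.
ages = [23, 31, 27, 45, 38, 420]
ages_replaced = replace_outliers(ages, 18, 100)
ages_dropped = drop_outliers(ages, 18, 100)
```

Replacing keeps the sample size but changes a value; deleting keeps only observed values but loses a case.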
9
Q

What are 2 different imputation strategies?

A
  1. Impute the variable mean to the cases with missing results
  2. Use a regression analysis for imputation value
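
Both strategies can be sketched in a few lines. The example below assumes missing values are coded as `None` and uses a simple linear regression with one predictor, fitted on the complete cases; the function names and data are hypothetical.

```python
from statistics import mean

def impute_mean(values):
    """Strategy 1: fill missing values with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def impute_regression(x, y):
    """Strategy 2: fill missing y values with predictions from a
    least-squares regression of y on x, fitted on the complete cases."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    mx = mean([xi for xi, _ in pairs])
    my = mean([yi for _, yi in pairs])
    slope = (sum((xi - mx) * (yi - my) for xi, yi in pairs)
             / sum((xi - mx) ** 2 for xi, _ in pairs))
    intercept = my - slope * mx
    return [intercept + slope * xi if yi is None else yi
            for xi, yi in zip(x, y)]

# Made-up data: y is missing for the last case.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, None]
y_mean_imputed = impute_mean(y)
y_regression_imputed = impute_regression(x, y)
```

In this example the mean strategy fills in the overall average and ignores x, while the regression strategy uses the relation between x and y to predict the missing value.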
10
Q

How can you check for strange patterns in categorical variables?

A
  1. Make a frequency distribution of 2 variables at the same time (a cross-tabulation). A particular combination of a value on X with a value on Y can be strange
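
A joint frequency distribution like the one described above can be sketched with `collections.Counter`. The variable names and respondents are made up; the helper `crosstab` is hypothetical.

```python
from collections import Counter

def crosstab(rows, var_a, var_b):
    """Count how often each combination of two categorical variables occurs."""
    return Counter((row[var_a], row[var_b]) for row in rows)

# Made-up respondents.
respondents = [
    {"age_group": "under 18", "employment": "student"},
    {"age_group": "under 18", "employment": "student"},
    {"age_group": "18-64", "employment": "employed"},
    {"age_group": "65+", "employment": "retired"},
    {"age_group": "under 18", "employment": "retired"},
]

table = crosstab(respondents, "age_group", "employment")
for (age_group, employment), count in sorted(table.items()):
    print(age_group, employment, count)
```

Each value here is plausible on its own; only the joint table shows that one respondent is both under 18 and retired, which deserves a second look.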
11
Q

How can you check for strange patterns in continuous variables? (3)

A
  1. Bivariate outliers, Mahalanobis distance
  2. Min-max, means and standard deviation
  3. Median or mode
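
The descriptive checks above can be sketched with the standard `statistics` module. The data are made up and assume a 1-7 answer scale on which one score was entered incorrectly; the helper `describe` is hypothetical.

```python
from statistics import mean, median, mode, stdev

def describe(values):
    """Return the basic descriptives used to screen a continuous variable."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "sd": stdev(values),
        "median": median(values),
        "mode": mode(values),
    }

# Made-up scores on a 1-7 scale; 77 is outside the scale.
scores = [4, 5, 5, 6, 3, 5, 77]
summary = describe(scores)
```

A maximum outside the answer scale, or a mean far above the median and mode, signals that something in the variable needs checking.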
12
Q

What should be included in the codebook of a dataset? (10)

A
  1. Respondent / panel recruitment
  2. ID numbers
  3. Randomization variables
  4. Instruction text
  5. Variables
  6. Question text
  7. Value and variable labels
  8. Routing
  9. Descriptives
  10. Screenshots of questionnaire
13
Q

What is a bivariate outlier?

A
  1. An outlier that is an outlier because of the combination of its values in both x- and y-space. The values in themselves are not outliers, but their combination is
14
Q

Name 3 different types of paradata and explain why it is useful to add them to your dataset.

A
  1. Duration of the survey - survey seriousness and understanding
  2. Duration of the question - question seriousness and understanding
  3. Browser or device - for detection of errors across different devices / operating systems
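
As one possible use of duration paradata, the sketch below flags respondents who finished much faster than the typical respondent. The rule (less than 30% of the median duration), the function name, and the data are assumptions for illustration only, not an established cutoff.

```python
from statistics import median

def flag_speeders(durations, fraction=0.3):
    """Return the IDs of respondents whose survey duration (in seconds)
    is below `fraction` times the median duration."""
    cutoff = fraction * median(durations.values())
    return sorted(rid for rid, seconds in durations.items() if seconds < cutoff)

# Made-up survey durations in seconds, keyed by respondent ID.
durations = {"r1": 610, "r2": 580, "r3": 95, "r4": 720, "r5": 655}
speeders = flag_speeders(durations)
```

Flagged respondents are candidates for a closer look at answer quality, not automatic exclusions.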