Week 4: Item Non-Response and Imputation Methods Flashcards

1
Q

What are the 3 reasons why item non-response may occur?

A
  • Respondent = does not know the answer, refuses to answer, or accidentally skips the question
  • Interviewer = does not ask the question or does not record the response
  • Processing = response is rejected at the editing stage
2
Q

In which variables is item non-response the highest?

A

Financial variables and derived variables, e.g. total household income from all sources.

3
Q

What is the issue with complete-case analysis?

A
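  • Complete-case analysis deletes all units with incomplete data (in the variables involved)
  • It is inefficient, because observed values on the deleted units are thrown away
  • It is problematic when comparing regression models, because models with different variables are fitted to different subsets of cases
  • It may give biased estimates and invalid inferences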
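
A minimal sketch of complete-case deletion, using made-up records in which `None` marks item non-response. The data and the helper name `complete_cases` are illustrative assumptions. It shows how the retained sample changes with the variables involved, which is what makes model comparison awkward.

```python
# Hypothetical toy survey records; None marks item non-response.
records = [
    {"age": 34, "income": 28000, "cars": 1},
    {"age": 51, "income": None, "cars": 2},
    {"age": 27, "income": 19000, "cars": None},
    {"age": 45, "income": 41000, "cars": 1},
]

def complete_cases(rows, variables):
    """Keep only rows with no missing value in the listed variables."""
    return [r for r in rows if all(r[v] is not None for v in variables)]

# A model using income alone and a model using income and cars
# end up being fitted to different sets of units.
income_only = complete_cases(records, ["income"])
income_and_cars = complete_cases(records, ["income", "cars"])
print(len(records), len(income_only), len(income_and_cars))  # 4 3 2
```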
4
Q

List the 4 imputation methods.

A
  • Mean imputation
  • Deterministic imputation
  • Hot deck imputation
  • Model-based imputation
5
Q

What does imputation do?

A

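Fills in each missing value with a plausible substitute so that a complete data set can be analysed; the aim is to reduce non-response bias.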

6
Q

How does simple mean imputation work?

A

If y is continuous, replace every missing value of y with the respondent sample mean of y.
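
A minimal sketch of simple mean imputation, assuming missing values are coded as `None`. The function name `mean_impute` and the data are illustrative.

```python
from statistics import mean

def mean_impute(values):
    """Replace each None with the mean of the observed (respondent) values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

y = [10.0, None, 14.0, None, 18.0]
y_imputed = mean_impute(y)
print(y_imputed)  # [10.0, 14.0, 14.0, 14.0, 18.0]
```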

7
Q

How does simple mode imputation work?

A

If the variable is categorical (e.g. number of cars), impute the respondent sample mode.
- Or create a separate "missing" category
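
Both options can be sketched in a few lines, assuming `None` marks a missing value. The function names and the label `"missing"` are illustrative choices.

```python
from statistics import mode

def mode_impute(values):
    """Replace each None with the most common observed category."""
    observed = [v for v in values if v is not None]
    fill = mode(observed)
    return [fill if v is None else v for v in values]

def missing_category(values, label="missing"):
    """Alternative: keep non-response visible as its own category."""
    return [label if v is None else v for v in values]

cars = [1, 2, None, 1, 0, None]
print(mode_impute(cars))       # [1, 2, 1, 1, 0, 1]
print(missing_category(cars))  # [1, 2, 'missing', 1, 0, 'missing']
```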

8
Q

List the issues with mean imputation.

A
  • Associations are diluted: estimated correlations are pulled toward zero
  • The distribution is distorted, with a spike at the mean
  • The variance is wrongly estimated (typically underestimated) if the imputed values are treated as real
  • Inferences based on that variance are therefore wrong too
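
The variance point can be checked on a small made-up example. This is only a sketch: it compares the sample variance of the observed values with the variance computed after mean imputation, treating the imputed values as if they were real.

```python
from statistics import mean, variance

y = [10.0, None, 14.0, None, 18.0, 22.0]  # None marks a missing value
observed = [v for v in y if v is not None]
imputed = [mean(observed) if v is None else v for v in y]

var_observed = variance(observed)  # respondents only
var_imputed = variance(imputed)    # imputed values treated as real

# Every imputed value sits exactly at the mean, so it adds nothing to the
# sum of squares while still increasing n: the variance estimate shrinks.
print(var_imputed < var_observed)  # True
```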
9
Q

How do deterministic and logic-based imputation work?

A
  • Impute using logical rules:
    e.g. Age = 9, so deduce marital status = single
    Y1 = no. of dependent children, Y2 = no. of non-dependent children, Y3 = no. of children
    Y1 + Y2 = Y3
    If one of Y1 and Y2 is missing, its value can be deduced from Y3 and the other
  • Last observation carried forward
    > Specific to longitudinal data
    > Replace a missing value with the unit's last observed value; this is problematic for variables that can change (e.g. income)
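
The two rules above can be sketched as follows. The sketch assumes that `None` marks a missing value and that at most one of the three child counts is missing. The function names are illustrative.

```python
def deduce_children(y1, y2, y3):
    """Fill a single missing count using the identity Y1 + Y2 = Y3."""
    if y1 is None and y2 is not None and y3 is not None:
        y1 = y3 - y2
    elif y2 is None and y1 is not None and y3 is not None:
        y2 = y3 - y1
    elif y3 is None and y1 is not None and y2 is not None:
        y3 = y1 + y2
    return y1, y2, y3

def locf(series):
    """Last observation carried forward across one unit's waves.

    Values before the first observed wave stay missing (None).
    """
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled

print(deduce_children(None, 1, 3))            # (2, 1, 3)
print(locf([None, 5, None, None, 7, None]))   # [None, 5, 5, 5, 7, 7]
```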
10
Q

How does hot-deck imputation work?

A

Replace each missing value with an observed value taken from a similar respondent (a donor) in the same data set.

11
Q

What is sequential hot deck?

A
  1. Records are ordered
  2. Each missing value is imputed from the previous responding record in the same imputation class (starting values are needed for classes with no earlier respondent)
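
A minimal sketch of the two steps, assuming the records are already in file order, `None` marks a missing value, and a starting value is supplied for every class. The field names and figures are made up.

```python
def sequential_hot_deck(records, class_key, target, start_values):
    """Impute `target` from the most recent respondent in the same class."""
    last_seen = dict(start_values)  # current donor value for each class
    completed = []
    for rec in records:
        rec = dict(rec)  # work on a copy, leave the input untouched
        cls = rec[class_key]
        if rec[target] is None:
            rec[target] = last_seen[cls]  # take the stored donor value
        else:
            last_seen[cls] = rec[target]  # this respondent becomes the donor
        completed.append(rec)
    return completed

records = [
    {"region": "N", "income": None},   # no earlier donor: uses starting value
    {"region": "N", "income": 25000},
    {"region": "S", "income": 30000},
    {"region": "N", "income": None},
    {"region": "S", "income": None},
    {"region": "S", "income": None},   # same donor is used a second time
]
start = {"N": 20000, "S": 22000}
result = sequential_hot_deck(records, "region", "income", start)
print([r["income"] for r in result])
# [20000, 25000, 30000, 25000, 30000, 30000]
```

The last two records show how one donor can be reused when several non-respondents in a class appear in a row.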
12
Q

List the 3 issues with sequential hot deck.

A
  • With few classes, there is limited control over how similar donor and recipient are
  • With many classes, the same donor is often used several times
  • The choice of starting values is important
13
Q

What is an alternative to the sequential hot deck method?

A

Hierarchical hot deck

14
Q

How does hierarchical hot deck work?

A
  • Sort the file of respondents hierarchically by the matching variables
  • Then match each non-respondent to donors on as many variables as possible, making the final choice of donor at random
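
One way to sketch this idea, under the assumption that the matching variables are listed from most to least important and that the least important one is dropped first when no donor matches. The function name, field names and data are illustrative.

```python
import random

def hierarchical_hot_deck(recipient, donors, match_vars, target, rng):
    """Return (imputed value, number of variables matched)."""
    # Try to match on all variables, then on progressively fewer.
    for k in range(len(match_vars), -1, -1):
        keys = match_vars[:k]
        pool = [d for d in donors if all(d[v] == recipient[v] for v in keys)]
        if pool:
            # Final choice among equally good donors is random.
            return rng.choice(pool)[target], k
    raise ValueError("no donors available")

donors = [
    {"region": "N", "tenure": "own", "age_band": "30-44", "income": 30000},
    {"region": "N", "tenure": "rent", "age_band": "30-44", "income": 21000},
    {"region": "S", "tenure": "own", "age_band": "45-59", "income": 35000},
]
recipient = {"region": "N", "tenure": "own", "age_band": "60+", "income": None}

value, matched = hierarchical_hot_deck(
    recipient, donors, ["region", "tenure", "age_band"], "income",
    random.Random(0),
)
print(value, matched)  # 30000 2
```

Here no donor matches on all three variables, so the match falls back to region and tenure, where exactly one donor remains.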
15
Q

What is the problem with most imputation methods?

A

They do not reflect sampling variation or uncertainty about the regression coefficients.

16
Q

How does multiple imputation reflect sampling variation?

A

Creates several (e.g. five) imputed values for each missing value, each of which is predicted from a slightly different model and each of which also reflects sampling variability.
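
A rough sketch of this idea for one continuous y and one predictor x. It is an illustration under stated assumptions, not a full multiple-imputation procedure: the "slightly different model" is obtained by refitting a straight line to a bootstrap resample of the complete cases, sampling variability is added by drawing a residual at random, and the step of combining estimates across the completed data sets is not shown. All names and data are made up, and the complete cases are assumed to contain at least two distinct x values.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def multiply_impute(xs, ys, m, rng):
    """Return m completed copies of ys; None marks a missing y."""
    complete = [(x, y) for x, y in zip(xs, ys) if y is not None]
    completed_sets = []
    for _ in range(m):
        # Bootstrap the complete cases so each imputation uses a slightly
        # different fitted model (redraw if all x values coincide).
        while True:
            sample = [rng.choice(complete) for _ in complete]
            if len({x for x, _ in sample}) > 1:
                break
        a, b = fit_line([x for x, _ in sample], [y for _, y in sample])
        residuals = [y - (a + b * x) for x, y in sample]
        # Prediction plus a randomly drawn residual for each missing y.
        completed_sets.append([
            a + b * x + rng.choice(residuals) if y is None else y
            for x, y in zip(xs, ys)
        ])
    return completed_sets

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, None, 8.2, None, 12.1]
imputations = multiply_impute(xs, ys, m=5, rng=random.Random(0))
for completed in imputations:
    print([round(v, 2) for v in completed])
```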

17
Q

When should we use hot deck imputation?

A

For categorical variables with few missing cases.

18
Q

When should we use random regression imputation?

A

For variables with substantial rates of missingness (>10%), especially continuous variables.
