Missing data concepts and multiple imputation with Stata Flashcards

1
Q

What are missing data?

A

Observations that could have been made but were not.

2
Q

How can values come to be missing?

A

By design (intentionally), or unintentionally

3
Q

In randomised controlled trials, where are missing values most likely to occur?

A

In the outcome variable(s)

  • that is, in terms of statistical analysis models, the response variables have missing values (MVs), while the explanatory
    variables tend to be fully observed.
  • Randomisation groups, baseline values of the outcome variable and centres (for multi-centre trials) are typically fully observed.
4
Q

In observational studies the explanatory variables (covariates) are just as likely to contain what?

A

Missing values as the outcome measures, that is:

  • the explanatory variable of interest may contain MVs
  • or the covariates included in the model for the purpose of explaining background variability or
    adjustment (confounders) may contain MVs
5
Q

Why can’t we simply analyse the observed data using an appropriate analysis method?

A

There are a number of potential problems:

  • Estimation method no longer valid
  • Loss of precision
  • Departure from the intention-to-treat principle (RCTs)
  • Lack of generalisability
6
Q

What is the main issue with missing data?

A

Bias

7
Q

What is the best solution to dealing with missing data?

A

Avoid it

8
Q

A valid estimator is one that is what?

A
  • unbiased for the parameter of interest
  • and its precision (standard error) can be quantified
9
Q

What is an example of providing a valid estimator?

A

If there were no missing data, then fitting an ANCOVA model to an RCT with a before-after design would provide a valid estimator for the therapy difference.

10
Q

What is a problem that affects the use of valid estimators?

A

In the presence of missing data, analysis methods that would provide valid estimators for the complete data do not necessarily provide valid estimators when applied to the observed data.

11
Q

What issue may still persist even under circumstances where a valid estimator can be obtained from the observed data?

What is an example?

A

This might not be the most efficient estimator.

For example, as implemented in most software packages, repeated measures (M)ANOVA uses only subjects for which the response has been observed at all time points.

  • Complete case analysis suffers a loss of precision since information from cases with partially observed multivariate responses (48%) is ignored
12
Q

What do we ideally want?

A

An analysis method that provides valid inferences in the presence of missing data and uses all the available information.

13
Q

What does the intention to treat (ITT) principle refer to?

A

A type of analysis specific to RCTs. It states that all subjects should be analysed as part of the treatment group to which they were originally assigned, irrespective of the level of treatment received and protocol adherence.

14
Q

What is the purpose of the intention to treat principle?

A

This advice is aimed at maintaining the benefits of randomisation, that is avoiding confounding of the group effect (=avoiding selection bias).

15
Q

What leads to a departure from the ITT principle which can introduce selection bias?

A

Missing values

Example:

  • less chronically mentally ill patients may be less likely to adhere to intensive management and are then more likely to be lost to follow-up in this group.
  • if more chronic cases also tend to have more psychopathology and intensive therapy is beneficial, then the group difference will tend to be underestimated based on the observed data.
16
Q

What is generalisability?

A

Extent to which study results apply to the target population.

17
Q

What can missing data affect?

A

The generalisability of the results from a trial or an observational study.

18
Q

What is an example of missing data affecting generalisability in an RCT?

A

Suppose the most severely ill were most likely to be lost to follow-up (in both randomisation groups)

Then the observed results would be representative of a population in which the less severely ill are over-represented

19
Q

What is the aim of data analysis?

A

Inference for a target population.

20
Q

All data analyses are based on model assumptions about what?

A

target population

sampling process

21
Q

What sampling method is typically used?

A

Random sampling

22
Q

When data are missing and analyses are based on observed data, further assumptions are being made for what reason?

A

To describe how the observed data came about

23
Q

Formally, the missing value generating mechanism is the probability of what?

A

Missing value pattern given the values taken by the (later observed or missing) observations

24
Q
  1. Under MCAR, what does the probability of a missing value pattern not depend on?
  2. The observed data are in fact what and what is this mechanism also known as?
A
  1. Any observed or unobserved measurements or characteristics
  2. A random sample of the intended measures.

Examples:
* A lab sample is dropped.
* The interviewer overlooks a question by accident.

The mechanism is also known as uniform non-response.

A complete case analysis, albeit less precise, remains valid.

25
Q

Under MAR, P(miss) may depend on what?

A

Some observed characteristics, but conditional on these characteristics the probability of a missing pattern does not depend on unobserved data (in other words, it is MCAR or random within classes of the observed characteristics).

Example:
- Chance of missing a language test different in boys and girls

26
Q

What are MAR and MCAR referred to as?

A

Non-informative or ignorable mechanisms, because if MCAR or MAR holds, analyses can ignore P(miss)

27
Q

What does MCAR stand for?

A

Missing completely at random

28
Q

What does MAR stand for?

A

Missing at random

29
Q

Even after considering the information in the observed data, the reason for a value being missing still depends on what?

A

The unseen observations.

Example:
Patients miss their hospital appointments because treatment has caused their condition to deteriorate.

Such a mechanism is referred to as an informative MV mechanism, or missing not at random (MNAR).
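The three mechanisms in the cards above can be written compactly. This is a sketch in notation that does not appear in the original cards: R stands for the missing value pattern, Y_obs for the values that are observed and Y_mis for the values that are not.

```latex
\begin{align*}
\text{MCAR:}\quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) = P(R) \\
\text{MAR:}\quad  & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) = P(R \mid Y_{\text{obs}}) \\
\text{MNAR:}\quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) \text{ still depends on } Y_{\text{mis}} \text{ after conditioning on } Y_{\text{obs}}
\end{align*}
```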

30
Q

No mainstream software exists to deal with MNAR data.

True or false

A

TRUE

31
Q

If the data are missing by design then what do we know?

A

The mechanism by which they were generated (MCAR or MAR).

32
Q

If the data are not missing by design then what do we have to choose between?

A

An informative or a non-informative missingness mechanism, on theoretical (subject-matter) grounds.

33
Q

We can observe all the variables that drive missingness under MNAR.

True or false

A

FALSE

We can never observe all the variables that drive missingness under MNAR.

34
Q

It is not possible to determine empirically whether the mechanism by which the missing values were generated is informative (MNAR) or not.

True or false

A

TRUE

35
Q

How can we look at departures from MCAR?

A

By assessing whether any (fully) observed variables are associated with the MV mechanism.

36
Q

What methods can be used to assess departures from MCAR?

A

Formal or informal methods may be used.

E.g. for a set of fully observed baseline variables

  • compare summaries of baseline variables between subjects with different MV patterns
  • plot respective summaries against MV patterns
  • model probabilities of MV patterns as function of baseline variables and test their effects

e.g. using a logistic regression model, where the response is an indicator variable of missingness in the variable of interest, coded 1 for observed values and 0 for missing values.
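The checks listed above might look like this in Stata. This is a minimal sketch, assuming Stata's example dataset mheart0 (heart attack data in which bmi is partially missing) can be downloaded with webuse. The indicator name bmi_obs is made up for illustration.

```stata
* Assumption: Stata's example data mheart0 (bmi partially missing) is
* available through webuse; any dataset with these variables would do.
webuse mheart0, clear

* Overview of which variables contain missing values.
misstable summarize

* Hypothetical indicator: 1 if bmi is observed, 0 if it is missing.
generate byte bmi_obs = !missing(bmi)

* Informal check: compare a baseline summary between the two MV patterns.
tabstat age, by(bmi_obs) statistics(mean sd n)

* Formal check: model the probability of being observed as a function
* of fully observed variables and test their effects.
logit bmi_obs age female smokes
```

Clear effects of the fully observed variables in the logit model would suggest a departure from MCAR. They cannot distinguish MAR from MNAR.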

37
Q

What is multiple imputation?

A

A three-step process which helps us to analyse data that are missing at random.

38
Q

Before we carry out multiple imputation, what do we have to find?

A

Fully observed variables that are correlated with the partially observed data (and maybe with P(miss)).

39
Q

What is step 1 of a multiple imputation?

A

We multiply impute (fill-in) the missing values with values randomly drawn from a distribution

  • We do this by deploying what is called the imputation model.
  • The imputation model relies on the correlates of the incomplete data.
  • We create several different multiply imputed datasets.
40
Q

What is step 2 of a multiple imputation?

A

We analyse each imputed dataset separately and obtain estimates for the quantities of interest

41
Q

What is step 3 of a multiple imputation?

A

We combine the multiple estimates from step 2
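The combination in step 3 is usually done with Rubin's rules, which the cards do not spell out. As a sketch in notation that is assumed here rather than taken from the cards: M is the number of imputed datasets, and the m-th dataset yields an estimate \hat{Q}_m with estimated variance U_m.

```latex
\begin{align*}
\bar{Q} &= \frac{1}{M}\sum_{m=1}^{M}\hat{Q}_m
  && \text{(combined estimate)} \\
\bar{U} &= \frac{1}{M}\sum_{m=1}^{M} U_m
  && \text{(within-imputation variance)} \\
B &= \frac{1}{M-1}\sum_{m=1}^{M}\bigl(\hat{Q}_m-\bar{Q}\bigr)^2
  && \text{(between-imputation variance)} \\
T &= \bar{U} + \Bigl(1+\frac{1}{M}\Bigr)B
  && \text{(total variance)}
\end{align*}
```

The standard error of the combined estimate is the square root of T, so it reflects both the sampling variability and the extra uncertainty due to the missing values.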

42
Q

What steps of analysis are involved in a multiple imputation?

A
  1. Setup
  2. Imputation
  3. Analysis
  4. Combining
  5. Postestimation
  • Importing
  • Data management
43
Q

What is involved in the setup of a multiple imputation?

A

Choose an mi style (how imputations are stored)
* wide
* mlong
* flong
* flongsep

Register variables
* mi register imputed bmi
* mi register regular attack smokes age hsgrad female
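Put together, the setup step might look like the sketch below. It assumes Stata's example dataset mheart0 (heart attack data in which bmi is partially missing) can be downloaded with webuse. The choice of the mlong style is only for illustration.

```stata
* Assumption: Stata's example data mheart0 is available through webuse.
webuse mheart0, clear

* Choose a style. mlong keeps the original data and stores imputed
* copies only for the observations that have missing values.
mi set mlong

* Declare which variable will be imputed and which are fully observed.
mi register imputed bmi
mi register regular attack smokes age hsgrad female

* Review the mi settings and the amount of missing data.
mi describe
mi misstable summarize
```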

44
Q

What is the imputation step of a multiple imputation dependent upon?

A

Pattern and type of data.

There are different methods for univariate (situations where we only wish to impute one variable), monotone and arbitrary missing data patterns.

45
Q

What variable type follows a univariate pattern and what imputation method is used during the imputation stage?

A

Continuous- regress, pmm, truncreg, intreg

Binary- logit

Categorical- ologit, mlogit

Count- poisson, nbreg

46
Q

What variable type follows a monotone pattern and what imputation method is used during the imputation stage?

A

Mixture- monotone

47
Q

What variable type follows an arbitrary pattern and what imputation method is used during the imputation stage?

A

Continuous- mvn

Mixture- chained

48
Q

Multiple imputation using chained equations (ICE) is performed by what?

A

mi impute chained

49
Q

How are variables imputed using chained equations (ICE)?

A

Variables are imputed iteratively using conditional univariate imputation models. Conditional means that each variable to be imputed is regressed on the other variables in the imputation model: the fully observed variables and, by default, the other variables being imputed.

By default, Stata imputes the variables in order from the most observed to the least observed.

50
Q

What does inserting ‘regress’ before a variable we are imputing ensure?

A

Variable is imputed as continuous

51
Q

What does inserting ‘logit’ before a variable we are imputing ensure?

A

Variable is imputed as binary
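The chained-equations cards above can be illustrated with a small run. This is a minimal sketch on simulated data, so every value is invented for the example. The variable names echo those used elsewhere in the deck, and the sample size, seeds and missingness rates are arbitrary.

```stata
* Simulated data for illustration only.
clear
set seed 12345
set obs 200
generate age = rnormal(55, 10)
generate byte female = runiform() < 0.5
generate bmi = 25 + 0.05*age + rnormal(0, 3)
generate byte smokes = runiform() < 0.4

* Make some values of bmi and smokes missing.
replace bmi = . if runiform() < 0.20
replace smokes = . if runiform() < 0.15

* Setup: style and registration.
mi set mlong
mi register imputed bmi smokes
mi register regular age female

* (regress) imputes bmi as continuous, (logit) imputes smokes as binary.
* Variables to the right of = are fully observed predictors.
mi impute chained (regress) bmi (logit) smokes = age female, add(5) rseed(2024)
```

The rseed() option makes the imputations reproducible, and add(5) requests five imputed datasets.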

52
Q

How can we analyse multiply imputed data in Stata?

A

mi estimate: estimation_command
* regress - Linear regression
* logit - Logistic regression
* poisson - Poisson regression
* stcox - Cox proportional hazards model
* glm - generalised linear model
* xtreg - Fixed- and random-effects linear regression
* mixed - Multilevel mixed-effects linear regression
* svy: Estimation commands for survey data

For a full list type help mi estimate
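A short end-to-end run shows where mi estimate fits. This is a sketch, assuming Stata's example dataset mheart0 (bmi partially missing) can be downloaded with webuse. The number of imputations and the seed are arbitrary choices.

```stata
* Assumption: Stata's example data mheart0 is available through webuse.
webuse mheart0, clear
mi set mlong
mi register imputed bmi

* Step 1: impute bmi with a linear regression imputation model.
mi impute regress bmi attack smokes age hsgrad female, add(20) rseed(2232)

* Steps 2 and 3: fit the logistic model in each imputed dataset
* and combine the results into a single set of estimates.
mi estimate: logit attack smokes age bmi female hsgrad
```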

53
Q

What does Stata classify all commands that can be used after main analysis as?

A

Postestimation

54
Q

What are examples of post-estimation analysis?

A

Transformation

55
Q

What concepts do not have clear interpretation within multiple imputation framework and therefore are not directly applicable to multiple imputation results?

A

Likelihood-ratio tests

56
Q

What are examples of post estimation analyses?

A

Transformation: specify the transformation in mi estimate so that it is calculated within each imputed dataset and then combined.
Any number of transformations can be included: give each one an arbitrary name, e.g. diff, followed by its expression, e.g. the difference between the parameter estimates for smokes and bmi.

Joint tests: test whether several terms jointly have a significant effect on the outcome.
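Both kinds of postestimation analysis might be run as follows. This is a sketch, assuming Stata's example dataset mheart0 can be downloaded with webuse. The name diff is arbitrary, and the difference between the smokes and bmi coefficients is used only because the card mentions it.

```stata
* Assumption: Stata's example data mheart0 is available through webuse.
webuse mheart0, clear
mi set mlong
mi register imputed bmi
mi impute regress bmi attack smokes age hsgrad female, add(20) rseed(2232)

* Joint test: do smokes and bmi together have an effect on attack?
mi estimate: logit attack smokes age bmi female hsgrad
mi test smokes bmi

* Transformation: name it diff and give its expression; it is computed
* within each imputed dataset and then combined.
mi estimate (diff: _b[smokes] - _b[bmi]): logit attack smokes age bmi female hsgrad

* Test whether the transformed quantity is zero.
mi testtransform diff
```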