Causal inference Flashcards

1
Q

Explain the consistency assumption

A

That the observed outcome equals the potential outcome under the treatment actually received, which lets you link observed data to potential outcomes: E(Y | A=a, X=x) = E(Y^a | A=a, X=x)
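
A minimal numpy sketch (not part of the original card, all variable names illustrative) of what consistency buys you: if the observed Y equals the potential outcome under the received treatment, then observed means among the treated equal means of Y^1 among the treated.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
y0 = rng.normal(0.0, 1.0, n)   # potential outcome under control, Y^0
y1 = y0 + 2.0                  # potential outcome under treatment, Y^1
a = rng.integers(0, 2, n)      # treatment actually received
y = np.where(a == 1, y1, y0)   # consistency: observed Y equals Y^A

# By construction, these two means are identical:
print(y[a == 1].mean(), y1[a == 1].mean())
```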

2
Q

Explain the ignorability assumption

A

That there are no unmeasured confounders, meaning that we have controlled for all relevant confounders so that the treatment is effectively randomized. For example (in an extremely simplified model where the only confounder X is age), older people are more likely both to get a medicine and to die within the next 5 years (outcome Y). But if you control for age, which patients get the medicine becomes effectively random, and you can draw causal conclusions about its effect.
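
A hedged simulation of the age example above, assuming numpy (names and probabilities are made up): the naive comparison is confounded, while comparing within age strata recovers the true null effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
old = rng.integers(0, 2, n)                        # confounder X: 1 = old
a = rng.random(n) < np.where(old == 1, 0.8, 0.2)   # old -> more treatment
y = rng.random(n) < np.where(old == 1, 0.4, 0.1)   # old -> more deaths; A has no true effect

naive = y[a].mean() - y[~a].mean()                 # confounded comparison
adjusted = np.mean([y[a & (old == g)].mean() - y[~a & (old == g)].mean()
                    for g in (0, 1)])              # within-stratum comparison
print(f"naive: {naive:.3f}, age-adjusted: {adjusted:.3f}")  # adjusted is ~0
```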

3
Q

Explain the positivity assumption

A

That every treatment level has a nonzero probability for every covariate combination observed in the data: P(A=a | X=x) > 0 for all a and x
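
A small sketch of an empirical positivity check under the assumption of a discrete covariate: every stratum of X should contain both treated and untreated subjects.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.integers(0, 3, 1_000)   # a discrete covariate with 3 levels
a = rng.integers(0, 2, 1_000)   # binary treatment

for level in np.unique(x):
    p_treat = a[x == level].mean()   # empirical P(A=1 | X=x)
    status = "OK" if 0 < p_treat < 1 else "POSITIVITY VIOLATED"
    print(f"X={level}: P(A=1|X)={p_treat:.2f} {status}")
```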

4
Q

What is incident user design?

A

Focusing on “new initiators”, that is, users who have just started a treatment. The effect is then easier to understand than in a case where someone has, e.g., done yoga 70% of the time over the last two years.

5
Q

What is a great benefit of active comparator design?

A

You reduce the amount of confounding. If you, for example, study two groups that took up yoga and Zumba respectively, you can expect those groups to be relatively similar. However, you then do not answer how well yoga works compared to nothing; you compare yoga with Zumba.

This is almost like in DFT physics where you use relative energies because you have no clue what’s going on.

6
Q

Define confounding variables.

A

Confounding variables are variables that affect both the treatment (A) and the outcome (Y)

7
Q

What are chains, forks and colliders (in paths terminology)?

A

Chain: A -> G -> B
Fork: A <- G -> B
Collider: A -> G <- B (G is a collider here)

8
Q

Explain how you can block the following path by conditioning (i.e. controlling, or “fixing the variable” in more intuitive terms):

A -> G -> B, where
A is the weather,
G is the “iciness” of sidewalks,
B is whether people fall on the sidewalk

A

With no conditioning, there will obviously be an association between A and B: people will fall more on the sidewalks in the winter.

However, if you condition on G, you condition on the reason why people fall, and hence you block the association between A and B.
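
A hedged simulation of this chain (names and probabilities invented): the marginal association between weather and falls is clear, but it vanishes within each level of iciness.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
winter = rng.integers(0, 2, n)                          # A: season
icy = rng.random(n) < np.where(winter == 1, 0.7, 0.1)   # G: icy sidewalks
fall = rng.random(n) < np.where(icy, 0.30, 0.05)        # B: depends only on G

print("corr(A, B):", np.corrcoef(winter, fall)[0, 1])   # clearly nonzero
for g in (False, True):                                 # condition on G
    m = icy == g
    print(f"corr(A, B | G={g}):", np.corrcoef(winter[m], fall[m])[0, 1])  # ~0
```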

9
Q

Explain what happens if you condition on G in the following DAG:
A -> G <- B

A

G blocks this path when left alone, but conditioning on it opens up an association between A and B. For example, A and B can be independent coin-flip light switches, while G is the light, which shines only if both A and B are on. In that case, A and B become correlated once you control for G.

One lesson here is that you have to be careful with what you control for.
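
The light-switch example as a runnable sketch (a hypothetical setup, assuming numpy): A and B are independent coin flips and G is on only when both are; conditioning on G=0 induces a negative correlation.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
a = rng.integers(0, 2, n)   # switch A: fair coin flip
b = rng.integers(0, 2, n)   # switch B: independent fair coin flip
g = a & b                   # collider: light shines iff both switches are on

print("corr(A, B):", np.corrcoef(a, b)[0, 1])              # ~0: independent
m = g == 0                                                 # condition on the collider
print("corr(A, B | G=0):", np.corrcoef(a[m], b[m])[0, 1])  # negative: path opened
```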

10
Q

Define d-separation between nodes A and B.

A

A and B are d-separated by a set of nodes C if conditioning on C blocks every path from A to B.

11
Q

Define a backdoor path

A

Paths between treatment (A) and outcome (Y) that travel through arrows going into A, e.g. A <- X -> Y.

Here, it is important to control for X.

12
Q

What are the two parts of the backdoor path criterion?

A
  1. All backdoor paths are blocked
  2. No descendants of the treatment are controlled for (controlling for a descendant can open up paths).
    The set of variables that satisfies these conditions is not unique, i.e. several different sets can work.
13
Q

What is the disjunctive cause criterion?

A

Control for all variables that cause the treatment (A), the outcome (Y), or both.

14
Q

Explain matching.

A

One tries to make observational data look like a randomized study.

Let’s say age is the only covariate, and far more old people than young people get the treatment. By matching, you exclude part of the data to ensure a 50-50 distribution of A=1 and A=0 within each age group.

After this, outcome analysis becomes very simple; it feels a lot like a randomized trial.
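
A minimal exact-matching sketch on a single covariate, assuming pandas (everything here is illustrative): for each treated unit we draw one control of the same age, with replacement.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
n = 5_000
age = rng.integers(20, 80, n)
treated = rng.random(n) < (age - 20) / 60   # older -> more likely treated
df = pd.DataFrame({"age": age, "treated": treated})

controls = df[~df.treated]
matches = []
for _, row in df[df.treated].iterrows():
    pool = controls[controls.age == row.age]   # exact match on age
    if len(pool):                              # skip if positivity fails here
        matches.append(pool.sample(1, random_state=0))
matched = pd.concat([df[df.treated], *matches])
print(matched.groupby("treated").age.mean())   # ages are now balanced
```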

15
Q

Explain stochastic balance and fine balance in terms of matching.

A

Stochastic balance is minimizing the absolute differences between the treated units and their matched controls.
Fine balance is minimizing the differences between the averages of the covariates in the two groups (I think of this intuitively as bias)

16
Q

How can you solve the issue of outliers when computing the Mahalanobis distance? (by using a so-called robust distance measure)

A

Instead of computing the distance from the raw covariate values, you compute it from the rank of each value within its covariate; extreme values then contribute only through their ranks.
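
A hedged sketch of this rank-based idea, assuming numpy/scipy (the textbook robust Mahalanobis distance may differ in details): each covariate is replaced by its ranks before the distance is computed, so one extreme value cannot dominate.

```python
import numpy as np
from scipy.stats import rankdata

def robust_mahalanobis(X: np.ndarray, i: int, j: int) -> float:
    """Mahalanobis distance between units i and j on rank-transformed covariates."""
    R = np.column_stack([rankdata(col) for col in X.T])   # ranks per covariate
    cov_inv = np.linalg.inv(np.cov(R, rowvar=False))
    d = R[i] - R[j]
    return float(np.sqrt(d @ cov_inv @ d))

rng = np.random.default_rng(7)
X = rng.normal(size=(100, 3))
X[0, 0] = 1e6                         # an extreme outlier
print(robust_mahalanobis(X, 0, 1))    # stays bounded thanks to the ranks
```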

17
Q

Explain the bias-variance tradeoff in pair matching vs many-to-one matching.

A

In pair matching (i.e. 1 treated, 1 control) you have low bias but high variance, because of the small amount of data.

In many-to-one matching (i.e. 1 treated, k controls) you have low variance (due to more data) but high bias, because matching to several controls inevitably makes the matches less precise.

18
Q

What is hidden bias?

A

Bias stemming from unobserved variables; two conditions need to be met:

  1. Imbalance in unobserved variables
  2. These variables are confounders
19
Q

What is the point of sensitivity analysis?

A

To figure out how sensitive the conclusions (e.g. rejection of the null hypothesis) are to the following assumption:
that two patients with matching (here, identical) confounding variables X have identical probability of getting the treatment.

20
Q

Define the propensity score.

A

The probability of receiving treatment given the measured covariates: π(x) = P(A=1 | X=x).
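
A minimal sketch of estimating propensity scores, assuming scikit-learn and simulated data: fit a logistic regression of A on X and read off P(A=1 | X) per subject.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(8)
n = 5_000
X = rng.normal(size=(n, 2))                      # covariates
a = rng.random(n) < 1 / (1 + np.exp(-X[:, 0]))   # treatment depends on X

model = LogisticRegression().fit(X, a)
propensity = model.predict_proba(X)[:, 1]        # estimated P(A=1 | X)
print(propensity[:5])
```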

21
Q

What is meant by trimming the tails with respect to propensity scores?

A

There might be people in the control group with extremely low propensity scores (i.e. an extremely low probability of treatment), and there might be people in the treatment group with very high propensity scores.

These subjects may be hard to match properly, so the positivity assumption is (nearly) violated. This can be fixed by “trimming the tails”, that is, excluding this data.
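
A one-step trimming sketch (the percentile cutoffs are a common but arbitrary choice; the propensity array is a stand-in for fitted scores):

```python
import numpy as np

rng = np.random.default_rng(9)
propensity = rng.beta(2, 2, 10_000)              # stand-in for fitted scores
lo, hi = np.percentile(propensity, [2, 98])
keep = (propensity >= lo) & (propensity <= hi)   # drop extreme-score subjects
print(f"kept {keep.mean():.1%} of subjects")
```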

22
Q

What is Inverse Probability of Treatment Weighting?

A

You weight the treatment group by the inverse of its probability of treatment, and the control group by the inverse of its probability of not getting treatment, using the propensity scores.

You then end up with a pseudo-population resembling a randomized trial, in which treatment is independent of the measured confounders.

This pseudo-population can be used to perform regression on the potential outcomes directly.
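
A hedged IPTW sketch with simulated data and the true propensity score (in practice you would plug in an estimated score): treated units get weight 1/e(X), controls 1/(1 - e(X)).

```python
import numpy as np

rng = np.random.default_rng(10)
n = 50_000
x = rng.normal(size=n)                   # confounder
e = 1 / (1 + np.exp(-x))                 # propensity score P(A=1 | X)
a = rng.random(n) < e
y = 1.0 * a + x + rng.normal(size=n)     # true treatment effect = 1

w = np.where(a, 1 / e, 1 / (1 - e))      # inverse-probability weights
iptw = (np.average(y[a], weights=w[a])
        - np.average(y[~a], weights=w[~a]))
print(f"naive: {y[a].mean() - y[~a].mean():.2f}, IPTW: {iptw:.2f}")  # IPTW ~ 1
```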

23
Q

Explain bootstrapping

A

Select random elements (with replacement) from the sample, and calculate the statistic of interest (e.g. the mean) from this resample. Repeat the resampling many times, recalculating the statistic each time; the standard deviation of the resampled statistics estimates the standard error.
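
A compact bootstrap sketch, assuming numpy: resample with replacement, recompute the mean each time, and read the standard error off the spread of the resampled means.

```python
import numpy as np

rng = np.random.default_rng(11)
sample = rng.normal(10, 2, size=200)

boot_means = np.array([
    rng.choice(sample, size=len(sample), replace=True).mean()
    for _ in range(2_000)
])
print(f"bootstrap SE of the mean: {boot_means.std():.3f}")
print(f"analytic  SE of the mean: {sample.std(ddof=1) / np.sqrt(len(sample)):.3f}")
```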

24
Q

Explain three ways to deal with large weights in IPTW.

A
  1. Manually check data points with large weights; there may be a data error there
  2. Trim the tails of the propensity score (e.g. drop scores below the 2nd or above the 98th percentile)
  3. Weight truncation: set all weights above a maximum allowed value (e.g. the 99th percentile) to that maximum value; see the sketch below. This increases bias, but reduces variance.
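
A sketch of item 3, weight truncation (the weight vector is a hypothetical stand-in for fitted IPTW weights):

```python
import numpy as np

rng = np.random.default_rng(12)
w = 1 / rng.beta(2, 5, 10_000)     # stand-in for IPTW weights
cap = np.percentile(w, 99)         # maximum allowed value: 99th percentile
w_truncated = np.minimum(w, cap)   # trade a little bias for less variance
print(f"max weight before: {w.max():.1f}, after: {w_truncated.max():.1f}")
```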