Chapter 10: Introduction to causality Flashcards

1
Q

When were Bayesian networks introduced?

A

In 1985 by Judea Pearl

2
Q

Explain how Bayesian networks are represented

A

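A Bayesian network is a tuple (G, P(· | ·)) where G is a DAG with vertices X1, …, Xn and P(· | ·) is a family of conditional probability tables, one for each Xi given pa(Xi) (for all i). Here pa(Xi) denotes the set of parents of Xi in G. The joint distribution factorizes as P(X1, …, Xn) = P(X1 | pa(X1)) · … · P(Xn | pa(Xn)).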
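
As a rough illustration of the tuple (G, P(· | ·)), the sketch below stores the DAG as a parent map and the conditional probability tables as dictionaries. The three-variable network (Rain, Sprinkler, WetGrass) and all of its numbers are invented for illustration and do not come from the course material.

```python
from itertools import product

# G: each vertex mapped to the tuple of its parents pa(X_i) (hypothetical network).
parents = {"Rain": (), "Sprinkler": ("Rain",), "WetGrass": ("Rain", "Sprinkler")}

# P(. | .): for each vertex, P(X_i = True | parent values). Numbers are invented.
cpt = {
    "Rain": {(): 0.2},
    "Sprinkler": {(True,): 0.01, (False,): 0.4},
    "WetGrass": {(True, True): 0.99, (True, False): 0.8,
                 (False, True): 0.9, (False, False): 0.0},
}

def prob(var, value, assignment):
    """P(var = value | parents of var as set in assignment)."""
    p_true = cpt[var][tuple(assignment[p] for p in parents[var])]
    return p_true if value else 1.0 - p_true

def joint(assignment):
    """Joint probability as the product of all conditional table entries."""
    result = 1.0
    for var in parents:
        result *= prob(var, assignment[var], assignment)
    return result

# Sanity check: the joint distribution sums to 1 over all assignments.
total = sum(joint(dict(zip(parents, values)))
            for values in product([True, False], repeat=len(parents)))
```

Storing only P(Xi = True | pa(Xi)) works here because every variable is binary; a network with more values per variable would need a full table per parent configuration.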

3
Q

When marginalizing probabilities in a Bayesian network, what steps should you follow to calculate the probabilities?

A
  1. Write down the outcomes that you are interested in.
  2. Use the chain rule to split the joint probability of those outcomes into the network's conditional probabilities.
  3. Omit the terms for variables that are neither outcomes you are interested in nor ancestors of those outcomes (they sum to 1), and sum over the remaining unobserved ancestors.
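
The steps above can be sketched on a small invented network with edges Rain → Sprinkler, Rain → WetGrass and Sprinkler → WetGrass. The outcome of interest is Sprinkler = True; WetGrass is not an ancestor of Sprinkler, so its term can be dropped, while Rain is an ancestor and must be summed over. All probabilities are hypothetical.

```python
from itertools import product

# Hypothetical conditional probability tables.
p_rain = {True: 0.2, False: 0.8}                      # P(Rain)
p_sprinkler_given_rain = {True: 0.01, False: 0.4}     # P(Sprinkler=True | Rain)
p_wet_given = {(True, True): 0.99, (True, False): 0.8,
               (False, True): 0.9, (False, False): 0.0}  # P(Wet=True | Rain, Sprinkler)

def joint(r, s, w):
    """Full joint P(Rain=r, Sprinkler=s, WetGrass=w) via the chain rule."""
    ps = p_sprinkler_given_rain[r] if s else 1 - p_sprinkler_given_rain[r]
    pw = p_wet_given[(r, s)] if w else 1 - p_wet_given[(r, s)]
    return p_rain[r] * ps * pw

# Brute force: sum the full joint over every variable that is not of interest.
brute_force = sum(joint(r, True, w) for r, w in product([True, False], repeat=2))

# Pruned: the WetGrass term sums to 1 and is omitted; only the ancestor Rain remains.
pruned = sum(p_rain[r] * p_sprinkler_given_rain[r] for r in [True, False])
```

Both computations give the same marginal; the pruned version simply skips the factor that would have summed to 1 anyway.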
4
Q

Explain the caveat when it comes to correlations in studies.

A

Just because two variables are correlated, it does not necessarily mean that one variable causes the other. There are several reasons why correlation does not imply causation:
1. Third variable problem: A third variable may be causing both of the correlated variables, creating a spurious correlation.
2. Reverse causality: The direction of causation may be the opposite of what is assumed, with the presumed effect actually influencing the presumed cause.
3. Confounding variables: The correlation may be caused by other variables that are not controlled for in the study, creating a spurious correlation.
4. Non-random sampling: The correlation may be caused by the way the sample was selected, rather than any inherent relationship between the variables.
5. Time-lag: The correlation may be caused by the time lag between cause and effect.

5
Q

Explain Simpson’s paradox.

A

Simpson’s Paradox is a statistical phenomenon where a trend that is present in several different groups of data disappears or reverses when the groups are combined. It can happen due to confounding variables, selection bias, or data aggregation. It doesn’t invalidate the data but highlights the need for careful analysis and interpretation of the data. By using different variables in the model, one can reach different conclusions.

For example, consider a study that compares the average test scores of students from two different schools. School A has a higher average score than School B, but when the scores are broken down by gender, the girls at School A have a lower average score than the girls at School B, and the boys at School A also have a lower average score than the boys at School B. The overall trend of higher scores at School A reverses within each gender. This can happen when the schools have different gender compositions and one gender scores higher on average, so that School A's overall average is dominated by the higher-scoring group.
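
A small numeric sketch may make the reversal concrete. The student counts and mean scores below are invented purely for illustration: School B has the higher mean within each group, yet School A has the higher mean overall because most of its students belong to the higher-scoring group.

```python
# (school, group) -> (number of students, mean score). All values are invented.
data = {
    ("A", "girls"): (80, 80.0), ("A", "boys"): (20, 60.0),
    ("B", "girls"): (20, 85.0), ("B", "boys"): (80, 65.0),
}

def group_mean(school, group):
    """Mean score of one group within one school."""
    return data[(school, group)][1]

def overall_mean(school):
    """Mean score of a school, weighting each group by its size."""
    rows = [v for (s, _), v in data.items() if s == school]
    return sum(n * m for n, m in rows) / sum(n for n, _ in rows)

# Within every group School B is ahead...
b_wins_each_group = all(group_mean("B", g) > group_mean("A", g)
                        for g in ("girls", "boys"))
# ...but in the aggregate School A is ahead.
a_wins_overall = overall_mean("A") > overall_mean("B")
```

Here the group composition acts as the confounder: it influences both which school a student attends in the sample and the score.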

6
Q

Give a practical definition of causality

A

X causes Y if and only if changing X leads to a change in Y, keeping all else constant.

7
Q

How would we ideally like to test for causality in an experimental setting?

A

Ideally we want two versions of the world. In one we change X while keeping all else the same, so that we can observe its effect on Y. In the other we change nothing at all. We can then compare Y in the world where X was changed with Y in the world where it was not.

8
Q

Name and rank three methods for practically testing causality in an experimental setting. Also provide a downside of each.

A
  1. Randomization: Randomly assign individuals to the new and old worlds and compare the results afterwards. Downside: possibly risky or unethical. What if the new world is much worse, or much better? An alternative is a multi-armed bandit approach, which assigns most individuals to the best known world but keeps some randomization.
  2. Natural experiments: A research design that uses an existing real-world situation to test for causality. Unlike traditional experiments, which are typically conducted in a laboratory or controlled environment, natural experiments take advantage of naturally occurring variations in the environment, policy changes, or other events to study the effects of an intervention. Downside: lack of control over the variables and the setting.
  3. Conditioning: Controlling for the effects of other variables in order to estimate the causal relationship between two variables. In statistics and causal inference, conditioning is often used to estimate the causal effect of one variable on another when working with observational data. Downside: the estimate is only valid if all relevant confounders have been observed and conditioned on.
9
Q

What is meant by external validity?

A

External validity means that the conclusions of your experiment generalize to the real world. This contrasts with internal validity, which only claims that the results apply to the subjects in that experiment. If the units are a representative sample from some population of units, the conclusions may also be valid externally. External validity is limited by internal validity: if a causal conclusion drawn within a study is invalid, then generalizations of that inference to other contexts will also be invalid.

10
Q

What is meant by a ‘do-operation’? Give one example.

A

The term “do-operation” is used in causal inference and refers to an intervention that sets a variable to a chosen value, written do(X = x), in order to study its causal effect on another variable. One example of a “do-operation” is a randomized controlled trial (RCT) that studies the effectiveness of a new drug. In this example, the researchers randomly assign individuals with a certain medical condition to either the treatment group, which receives the new drug, or the control group, which receives a placebo. The researchers then observe the effect of the treatment on the outcome of interest, such as the improvement of symptoms.

11
Q

Give the definition of a causal Bayesian network.

A

A causal Bayesian network (CBN) is a type of Bayesian network that explicitly represents causality. In a CBN, the directed edges represent causal relationships, and the direction of an edge is the direction of causality. The parents of a node are interpreted as its direct causes. A CBN can be used to infer the causal effect of one variable on another and to predict the outcome of interventions. The difference from an ordinary Bayesian network is semantic: the structure is the same, but the edges now carry a causal meaning.

12
Q

How would a do-action influence the causal Bayesian network?

A

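If a node in the network is assigned a value by a do-action, its parents no longer influence it: the edges from the parents into that node are removed, and the node's conditional probability table is replaced by a distribution that puts all probability on the assigned value. The parents can still influence other nodes they are connected to, and all other conditional probability tables stay unchanged.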
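
A minimal sketch of this, assuming a hypothetical network with edges Z → X, Z → Y and X → Y and invented probabilities: the intervention do(X = 1) replaces the factor P(X | Z) with a point mass, while P(Z) and P(Y | X, Z) are left as they were. The result is compared with ordinary conditioning on X = 1.

```python
from itertools import product

# Hypothetical binary network Z -> X, Z -> Y, X -> Y. Numbers are invented.
p_z = {1: 0.5, 0: 0.5}                                   # P(Z)
p_x1_given_z = {1: 0.9, 0: 0.1}                          # P(X=1 | Z)
p_y1_given_xz = {(1, 1): 0.8, (1, 0): 0.6,
                 (0, 1): 0.5, (0, 0): 0.2}               # P(Y=1 | X, Z)

def bern(p_one, value):
    """Probability of a binary value given P(value = 1)."""
    return p_one if value == 1 else 1.0 - p_one

def joint(z, x, y, do_x=None):
    """Joint probability; with do_x set, P(X | Z) becomes a point mass on do_x."""
    if do_x is None:
        px = bern(p_x1_given_z[z], x)
    else:
        px = 1.0 if x == do_x else 0.0
    return p_z[z] * px * bern(p_y1_given_xz[(x, z)], y)

def p_y1_observing_x1():
    """P(Y=1 | X=1): ordinary conditioning in the unmodified network."""
    num = sum(joint(z, 1, 1) for z in (0, 1))
    den = sum(joint(z, 1, y) for z, y in product((0, 1), repeat=2))
    return num / den

def p_y1_do_x1():
    """P(Y=1 | do(X=1)): marginal of Y in the network after the intervention."""
    return sum(joint(z, x, 1, do_x=1) for z, x in product((0, 1), repeat=2))
```

With these invented numbers the two quantities differ, because Z influences both X and Y in the observational network but no longer influences X after the intervention.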

13
Q

How can the existence of a causal effect be formalized? And subsequently quantified?

A

X has a causal effect on Y if p(Y | do(X = a)) ≠ p(Y | do(X = b)) for two different values a and b of X.

Total average causal effect (ACE) of X = a with respect to X = b on Y is ACE(a, b) = E(Y |do(X = a)) − E(Y |do(X = b))

The amount of causal effect (CE) of X = a with respect to X = b on Y = y is CE(a, b, y) = p(Y = y|do(X = a)) − p(Y = y|do(X = b))
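
The following sketch evaluates these definitions on a hypothetical table of interventional distributions p(Y = y | do(X = x)). The table `p_do`, the value labels "a" and "b", and the helper names are all assumptions made for illustration; in practice the interventional distributions would have to come from an experiment or from a causal Bayesian network.

```python
# Hypothetical interventional distributions p(Y = y | do(X = x)); numbers are invented.
p_do = {
    "a": {0: 0.2, 1: 0.5, 2: 0.3},
    "b": {0: 0.4, 1: 0.4, 2: 0.2},
}

def expectation(dist):
    """E(Y) under a distribution given as {y: probability}."""
    return sum(y * p for y, p in dist.items())

def ace(a, b):
    """ACE(a, b) = E(Y | do(X = a)) - E(Y | do(X = b))."""
    return expectation(p_do[a]) - expectation(p_do[b])

def ce(a, b, y):
    """CE(a, b, y) = p(Y = y | do(X = a)) - p(Y = y | do(X = b))."""
    return p_do[a][y] - p_do[b][y]

def has_causal_effect(a, b, tol=1e-12):
    """True if the two interventional distributions differ for some y."""
    return any(abs(ce(a, b, y)) > tol for y in p_do[a])
```

Note that the ACE can be zero even when a causal effect exists, since two different distributions can share the same expectation; the existence check therefore compares the distributions value by value.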
