Causal Inference Flashcards

Question 1

Q

Define Causality.

Answer

A

Causality: The relationship between cause and effect. A cause has an effect if the cause contributes to the production of the effect.

Question 2

Q

What is determinism?

Give examples of deterministic and non-deterministic causes.

Answer

A

Causes and effects can be deterministic or non-deterministic. A deterministic cause necessarily produces its effect, while a non-deterministic cause makes the effect more likely without guaranteeing it.

Example Deterministic Cause & Effects:
• Putting your hand in a fire will cause it to get burned.
• Typing on a keyboard will input text into a computer.

Example non-deterministic Cause & Effects:
• Paying attention in class leads to better grades.
• Smoking causes cancer.
• Going to a party can lead to catching covid.

Question 3

Q

Why is causal inference important?

Give examples.

Answer

A

Many important questions, especially in business and science, are causal in nature.
• Shopkeepers may wish to know the impact of prices on sales.
• Governments may wish to know the effect of social distancing on the spread of covid-19.

Question 4

Q

What is the difference between a Causal Effect and an Association?
Give examples.

Answer

A

An association means that two variables are statistically related: knowing one helps you predict the other. A causal effect means that changing one variable changes the other. Correlation does not imply causation, and many correlations would suggest spurious causal claims.
• E.g. there is a strong correlation between a country's chocolate consumption and its number of Nobel laureates.
Eating chocolate does NOT make you more likely to become a Nobel laureate. A more plausible explanation is a common cause:
National Wealth -> Chocolate Consumption
National Wealth -> Standard of Education -> Nobel Prizes
We can say that chocolate consumption is predictive of Nobel laureates, but it does NOT cause them.
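The gap between association and causal effect can be sketched with a small simulation. This is only an illustration under an assumed model, not real data: a hypothetical `wealth` variable drives both `chocolate` and `prizes`, and chocolate has no effect on prizes at all. All variable names and coefficients are made up for the example.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)

# Assumed model: wealth is a common cause of chocolate and prizes.
wealth = [rng.gauss(0, 1) for _ in range(20000)]
chocolate = [w + rng.gauss(0, 1) for w in wealth]
prizes = [w + rng.gauss(0, 1) for w in wealth]

# Association: chocolate predicts prizes in the observed data.
observed_corr = pearson(chocolate, prizes)

# Intervention: chocolate is set at random, ignoring wealth.
# Prizes are unchanged because chocolate never entered their equation.
forced_chocolate = [rng.gauss(0, 1) for _ in wealth]
intervened_corr = pearson(forced_chocolate, prizes)

print(f"observed:   {observed_corr:.2f}")
print(f"intervened: {intervened_corr:.2f}")
```

In this assumed model the observed correlation is clearly positive, while the correlation under intervention is close to zero.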

Question 5

Q

Suppose we had two variables, X and Y. The relationship between them is Y = 3X. This means that:
• If we see X go up by one unit, then Y will go up by 3 units.
• If we see Y go up by one unit, then X will go up by 1/3 units.
What can we say about the direction of causality?

Answer

A

We can’t say.

If we MANIPULATE X to increase by 1, we cannot say how Y will behave, because the equation alone tells us nothing about which variable is the cause and which is the effect.

Data alone is not enough to prove causality; we also need a model and domain knowledge to infer causation.
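A minimal sketch of why the equation alone is not enough: two hypothetical structural models (the function names and the `do_x` parameter are invented for this example) both satisfy Y = 3X in observed data, yet respond differently when X is manipulated.

```python
def model_x_causes_y(u, do_x=None):
    """X is set by outside noise u (or by intervention); Y listens to X."""
    x = u if do_x is None else do_x
    y = 3 * x
    return x, y

def model_y_causes_x(u, do_x=None):
    """Y is set by outside noise; X listens to Y unless we intervene on X."""
    y = 3 * u
    x = y / 3 if do_x is None else do_x
    return x, y

# Observation: both models produce the same (X, Y) pair for the same noise.
print(model_x_causes_y(2.0))             # X = 2.0, Y = 6.0
print(model_y_causes_x(2.0))             # X = 2.0, Y = 6.0

# Intervention: force X up by one unit.
print(model_x_causes_y(2.0, do_x=3.0))   # Y follows X and becomes 9.0
print(model_y_causes_x(2.0, do_x=3.0))   # Y ignores X and stays at 6.0
```

The observed data cannot tell the two models apart; only the causal assumptions (or an experiment) can.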

Question 6

Q

Define the following terminology:
• Graph
• Adjacency
• Complete
• Directed Edge
• Parents & Children
• Path
• Directed Path
• Ancestor
• Descendant
• Cycle
• Directed Acyclic Graph (DAG)

Answer

A

Graph: A collection of nodes and the edges that connect them.
Adjacency: Two nodes are adjacent if an edge connects them directly.
Complete: A graph is complete if there is an edge between every pair of nodes.
Directed Edge: An edge with a direction; it goes from one node to another node.
Parents & Children: A directed edge goes from a parent node to a child node.
Path: A sequence of edges which connects one node to another, regardless of the direction of the edges.
Directed Path: A path in which every edge points in the direction of travel, from the start node towards the end node.
Ancestor: X is an ancestor of Y if there is a directed path from X to Y.
Descendant: Y is a descendant of X if there is a directed path from X to Y.
Cycle: A directed path from a node back to itself.
Directed Acyclic Graph (DAG): A graph with only directed edges and no cycles.
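These definitions can be made concrete with a small sketch. The graph below is a made-up example stored as a dictionary from each node to its children; the helper names are illustrative, not taken from any library.

```python
# Hypothetical example graph: X -> M -> Y and W -> Y.
dag = {"X": ["M"], "M": ["Y"], "W": ["Y"], "Y": []}

def parents(graph, node):
    """Nodes with a directed edge pointing into `node`."""
    return {p for p, children in graph.items() if node in children}

def descendants(graph, node):
    """Nodes reachable from `node` along a directed path."""
    found, stack = set(), list(graph[node])
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(graph[current])
    return found

def ancestors(graph, node):
    """Nodes from which `node` is reachable along a directed path."""
    return {other for other in graph if node in descendants(graph, other)}

def has_cycle(graph):
    """A cycle exists if some node is its own descendant."""
    return any(node in descendants(graph, node) for node in graph)

print(parents(dag, "Y"))       # M and W
print(descendants(dag, "X"))   # M and Y
print(ancestors(dag, "Y"))     # X, M and W
print(has_cycle(dag))          # False, so this graph is a DAG
```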

Question 7

Q

Causes can be both direct and indirect. What is a direct and an indirect cause?

Answer

A

Direct Cause: If Y is a child of X, then X is a direct cause of Y.

Indirect Cause: If Y is a descendant of X via a directed path that passes through one or more mediating variables M, then X is an indirect cause of Y.

Question 8

Q

When it comes to investigating a correlation, what is ‘seeing’ vs ‘doing’?

Answer

A

Conditioning (“seeing”): Narrow our focus to the cases where the conditioning variable takes the value we are interested in.
Intervention (“doing”): Fix a variable’s value ourselves. This removes the influence of that variable’s usual causes.

Question 9

Q

Health supplements aren’t generally well tested, but those who take supplements are generally healthier.
P(H = True | S = True) > P(H = True | S = False)

What would be the ‘seeing’ vs ‘doing’ investigation here?

Answer

A

• With conditioning we compare the health of people who take supplements with those who don’t.
• With intervention we would force one group of people to take supplements and force another not to take supplements and then measure their health status.
Seeing estimates P(H | S), while doing estimates P(H | do(S)).
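The difference between P(H | S) and P(H | do(S)) can be sketched with a simulation. The model here is an assumption made purely for illustration: a hypothetical "health-conscious" trait makes people both more likely to take supplements and more likely to be healthy, while the supplement itself does nothing. The probabilities are invented.

```python
import random

def simulate(n, rng, do_s=None):
    """Return (supplement, healthy) pairs; do_s forces the supplement value."""
    rows = []
    for _ in range(n):
        conscious = rng.random() < 0.5
        if do_s is None:
            # Seeing: people choose for themselves, influenced by the trait.
            s = rng.random() < (0.8 if conscious else 0.2)
        else:
            # Doing: the experimenter sets S, cutting the link from the trait.
            s = do_s
        # Health depends on the trait only, never on the supplement.
        h = rng.random() < (0.9 if conscious else 0.5)
        rows.append((s, h))
    return rows

def p_healthy(rows, s_value):
    """Fraction healthy among rows with the given supplement value."""
    matching = [h for s, h in rows if s == s_value]
    return sum(matching) / len(matching)

rng = random.Random(1)
observed = simulate(50000, rng)
seeing_gap = p_healthy(observed, True) - p_healthy(observed, False)

forced_on = simulate(50000, rng, do_s=True)
forced_off = simulate(50000, rng, do_s=False)
doing_gap = p_healthy(forced_on, True) - p_healthy(forced_off, False)

print(f"P(H | S=1) - P(H | S=0):         {seeing_gap:.2f}")
print(f"P(H | do(S=1)) - P(H | do(S=0)): {doing_gap:.2f}")
```

Under these assumptions the "seeing" gap is clearly positive even though the "doing" gap is close to zero.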

Question 10

Q

What are the three types of junction?

Answer

A

Chain

Fork

Collider

Question 11

Q

What are the dependencies of the following chain:

X -> Y -> Z

Answer

A

X and Y are dependent (The amount you smoke affects the amount of tar in your lungs)
Y and Z are dependent (The amount of tar in your lungs affects your chance of developing lung cancer)
X and Z are dependent (The amount you smoke affects your chance of developing lung cancer)
X and Z are independent conditional on Y (If you know the condition of someone’s lungs, the amount they smoke tells you nothing more about their chance of developing cancer.)
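A rough simulation of the chain, with invented probabilities, illustrates these statements. X is heavy smoking, Y is tar in the lungs and Z is cancer; by assumption Z depends on X only through Y.

```python
import random

rng = random.Random(2)
rows = []
for _ in range(100000):
    x = rng.random() < 0.5                    # X: smokes heavily
    y = rng.random() < (0.8 if x else 0.2)    # Y: tar, depends on X
    z = rng.random() < (0.6 if y else 0.1)    # Z: cancer, depends on Y only
    rows.append((x, y, z))

def p_z(rows, x, y=None):
    """Estimate P(Z = True | X = x), optionally also conditioning on Y = y."""
    selected = [r[2] for r in rows if r[0] == x and (y is None or r[1] == y)]
    return sum(selected) / len(selected)

# Without conditioning, X carries information about Z.
marginal_gap = p_z(rows, True) - p_z(rows, False)

# Once Y is fixed, X carries almost no further information about Z.
gap_given_tar = p_z(rows, True, y=True) - p_z(rows, False, y=True)
gap_given_no_tar = p_z(rows, True, y=False) - p_z(rows, False, y=False)

print(f"gap in P(Z) between X values:   {marginal_gap:.2f}")
print(f"same gap given Y = True:        {gap_given_tar:.2f}")
print(f"same gap given Y = False:       {gap_given_no_tar:.2f}")
```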

Question 12

Q

What are the dependencies for the following fork?

Y <- X -> Z

Answer

A

X and Y are dependent (# of ice creams sold depends on the temperature)
X and Z are dependent (# of shark attacks depends on the temperature)
Y and Z are dependent (# of shark attacks is correlated with the number of ice creams sold, although neither causes the other)
Y and Z are independent conditional on X (if you know the temperature, the number of ice creams that were sold tells you nothing more about the number of shark attacks)
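The fork can be sketched with continuous variables. The coefficients below are assumptions chosen for illustration, and conditioning on X is approximated by keeping only days whose (standardised) temperature falls in a narrow band.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(3)
rows = []
for _ in range(100000):
    temp = rng.gauss(0, 1)                     # X: temperature
    ice_creams = 2 * temp + rng.gauss(0, 1)    # Y: depends on X only
    sharks = temp + rng.gauss(0, 1)            # Z: depends on X only
    rows.append((temp, ice_creams, sharks))

# Across all days, Y and Z move together because both follow X.
corr_all = pearson([r[1] for r in rows], [r[2] for r in rows])

# Conditioning on X: compare only days with nearly the same temperature.
band = [r for r in rows if abs(r[0]) < 0.05]
corr_band = pearson([r[1] for r in band], [r[2] for r in band])

print(f"corr(Y, Z) over all days:         {corr_all:.2f}")
print(f"corr(Y, Z) at fixed temperature:  {corr_band:.2f}")
```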

Question 13

Q

What are the dependencies of the following collider?

X -> Z <- Y

Answer

A

X and Z are dependent (Your academic ability impacts your university acceptance)
Y and Z are dependent (Your sporting ability impacts your university acceptance)
X and Y are independent (Your academic ability has no relation to your sporting ability)
X and Y are dependent conditional on Z (Knowing someone got into uni tells you they’re either sporty or clever, so learning that an admitted student is not sporty makes it more likely that they are clever)
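A small simulation shows the collider effect. The admission rule here (admit anyone who is clever or sporty) and the 30% base rates are hypothetical simplifications chosen to keep the sketch short.

```python
import random

rng = random.Random(4)
students = []
for _ in range(100000):
    clever = rng.random() < 0.3     # X: academic ability
    sporty = rng.random() < 0.3     # Y: drawn independently of X
    admitted = clever or sporty     # Z: common effect (assumed rule)
    students.append((clever, sporty, admitted))

def p_clever(rows, sporty, admitted_only=False):
    """Estimate P(clever | sporty), optionally among admitted students only."""
    selected = [c for c, s, a in rows
                if s == sporty and (a or not admitted_only)]
    return sum(selected) / len(selected)

# In the whole population, being sporty says nothing about being clever.
gap_everyone = p_clever(students, True) - p_clever(students, False)

# Among admitted students, non-sporty ones must be clever.
gap_admitted = (p_clever(students, True, admitted_only=True)
                - p_clever(students, False, admitted_only=True))

print(f"gap in P(clever) by sportiness, everyone: {gap_everyone:.2f}")
print(f"gap in P(clever) by sportiness, admitted: {gap_admitted:.2f}")
```

Conditioning on admission creates a strong negative dependence between two traits that were generated independently.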

Question 14

Q

Define path blocking.

Answer

A

In a graph, a path is blocked if information cannot flow from the start to the end of the path. This occurs when:
• The path contains a chain or a fork whose middle node is conditioned on.
• The path contains a collider, and neither the collider node nor any of its descendants is conditioned on.
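The two blocking rules can be written as a short check over one path. This is a sketch under stated assumptions: the graph is a dictionary from each node to its children, the path is a list of nodes that really is a path in that graph, and the function names are illustrative.

```python
def descendants(graph, node):
    """Nodes reachable from `node` along a directed path."""
    found, stack = set(), list(graph[node])
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(graph[current])
    return found

def is_blocked(graph, path, conditioned=()):
    """Return True if the path is blocked given the conditioned nodes."""
    conditioned = set(conditioned)
    for left, mid, right in zip(path, path[1:], path[2:]):
        # A collider has both neighbouring edges pointing into the middle node.
        is_collider = mid in graph[left] and mid in graph[right]
        if is_collider:
            # Blocks unless the collider or one of its descendants is conditioned on.
            if not ({mid} | descendants(graph, mid)) & conditioned:
                return True
        elif mid in conditioned:
            # A chain or fork blocks when its middle node is conditioned on.
            return True
    return False

chain = {"X": ["Y"], "Y": ["Z"], "Z": []}
collider = {"X": ["Z"], "Y": ["Z"], "Z": ["W"], "W": []}

print(is_blocked(chain, ["X", "Y", "Z"]))            # False: information flows
print(is_blocked(chain, ["X", "Y", "Z"], {"Y"}))     # True: middle node conditioned
print(is_blocked(collider, ["X", "Z", "Y"]))         # True: collider not conditioned
print(is_blocked(collider, ["X", "Z", "Y"], {"W"}))  # False: descendant conditioned
```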

Question 15

Q

Suppose you had the following path:
A -> B -> C
Are there any associations and/or causes here?

Answer

A

We know that A and C are associated. This is to say that there is some relationship between them. Knowing one would help you predict the other.
We also know that A is an indirect cause of C. This means that if A changes, C will respond (through the intermediary of B). However, C does NOT cause A.

Question 16

Q

Define D-Separation and D-Connection.

Answer

A

The variables X and Y are D-separated (given a set of conditioned variables) when every path between the two variables is blocked.
Any D-Separated variables are independent, conditional on that set.
The variables X and Y are D-connected when at least one path between the two variables is not blocked.
Any D-Connected variables are generally dependent.
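D-separation can be tested without listing every path. The sketch below uses a different but equivalent technique, the moralised ancestral graph criterion: keep only the variables of interest and their ancestors, link parents that share a child, drop edge directions, remove the conditioned nodes, and check whether X can still reach Y. The graph format (a dictionary from each node to its parents) and the function names are assumptions of this example.

```python
from itertools import combinations

def ancestral_set(parents, nodes):
    """The given nodes together with all of their ancestors."""
    found, stack = set(), list(nodes)
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(parents[node])
    return found

def d_separated(parents, x, y, given=()):
    """True if x and y are d-separated given the conditioned nodes."""
    given = set(given)
    keep = ancestral_set(parents, {x, y} | given)
    # Moralise: connect each node to its parents, and co-parents to each other.
    neighbours = {node: set() for node in keep}
    for child in keep:
        for p in parents[child]:
            neighbours[child].add(p)
            neighbours[p].add(child)
        for p, q in combinations(parents[child], 2):
            neighbours[p].add(q)
            neighbours[q].add(p)
    # Search from x, never passing through a conditioned node.
    seen, stack = set(), [x]
    while stack:
        node = stack.pop()
        if node == y:
            return False
        if node in seen or node in given:
            continue
        seen.add(node)
        stack.extend(neighbours[node])
    return True

chain = {"X": [], "Y": ["X"], "Z": ["Y"]}       # X -> Y -> Z
collider = {"A": [], "C": [], "B": ["A", "C"]}  # A -> B <- C

print(d_separated(chain, "X", "Z"))             # False
print(d_separated(chain, "X", "Z", {"Y"}))      # True
print(d_separated(collider, "A", "C"))          # True
print(d_separated(collider, "A", "C", {"B"}))   # False
```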

Question 17

Q

Suppose you had the following path:

A -> B <- C

Are A and C associated?

Answer

A

Here A and C are not associated. They are D-Separated, and hence independent. We could say that knowing A does not help you know C.
However, if you condition on the collider B, then information can flow, and A and C become dependent.

Question 18

Q

How do you determine associations from data?

Answer

A

One way is to measure the correlation between pairs of variables. Note that the standard (Pearson) correlation coefficient only captures linear relationships.
High correlations indicate associations, but NOT necessarily causation.
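A quick sketch of the linearity caveat, using a hand-written Pearson correlation and made-up data: a perfectly linear relationship scores close to 1, while a perfectly deterministic but symmetric quadratic relationship scores close to 0.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

xs = [i / 10 for i in range(-50, 51)]   # evenly spaced, symmetric around 0
linear = [3 * x for x in xs]            # Y = 3X
quadratic = [x * x for x in xs]         # Y = X^2, fully determined by X

print(f"linear:    {pearson(xs, linear):.3f}")
print(f"quadratic: {pearson(xs, quadratic):.3f}")
```

A near-zero correlation therefore does not rule out an association; it only rules out a linear one.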

Question 19

Q

What is selection bias? Give an example.

Answer

A

Conditioning on certain variables can produce correlations in the selected data that do not exist in the full population. This is called selection (or Berkson’s) bias.

For example, suppose we only consider hospital patients. Most patients are admitted because of a single problem, so a patient who has one condition is less likely to have another.
• This gives the conditions a strong negative correlation that would not exist in a full population dataset.

Question 20

Q

Correlation (association) does not imply causation because there are many types of association. What are the main types of association?

Answer

A

Correlation does not imply causation because there are several types of association. The four main types are:

Causal Relationship: One variable is the cause of another. E.g. Bad news causes stock prices to go down.
Confounding: Two variables share a common cause. E.g. Ice cream sales and crime both increase with temperature.
Selection Bias: We condition on a common effect. E.g. Considering diseases of hospital patients only.
Chance: Many random and spurious correlations exist as a result of chance.

Question 21

Q

Define Confounding.

Answer

A

Confounding is when the relationship between two variables is distorted by other variables. It arises when the treatment and the outcome share a common cause.
• E.g. The relationship between getting a university degree and salary. Other factors, such as intelligence, confound this relationship.

Question 22

Q

How can confounding be fixed?

Answer

A

Confounding can be removed by blocking every ‘backdoor path’. A backdoor path is any path from X to Y which starts with an arrow pointing into X.
We block a path by conditioning on (controlling for) variables along it.
• E.g. If we control for intelligence, then the remaining uni-degree vs salary relationship is a much better estimate of the causal effect of the degree.
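Controlling for a confounder can be sketched as stratification: compare like with like inside each level of the confounder, then average the within-level differences. Everything in this simulation is assumed for illustration, including the binary `smart` variable, the salary units, and the true degree effect of 5.

```python
import random

rng = random.Random(5)
rows = []
for _ in range(100000):
    smart = rng.random() < 0.5                        # confounder
    degree = rng.random() < (0.8 if smart else 0.3)   # treatment, depends on smart
    # Assumed true causal effect of a degree on salary: +5.
    salary = 30 + 5 * degree + 10 * smart + rng.gauss(0, 2)
    rows.append((smart, degree, salary))

def mean_salary(rows, degree, smart=None):
    """Mean salary for a degree value, optionally within one stratum."""
    selected = [y for z, x, y in rows
                if x == degree and (smart is None or z == smart)]
    return sum(selected) / len(selected)

# Naive comparison: mixes the degree effect with the effect of intelligence.
naive = mean_salary(rows, True) - mean_salary(rows, False)

# Backdoor adjustment: weight each stratum's difference by the stratum's share.
adjusted = 0.0
for smart in (True, False):
    weight = sum(1 for z, _, _ in rows if z == smart) / len(rows)
    difference = mean_salary(rows, True, smart) - mean_salary(rows, False, smart)
    adjusted += weight * difference

print(f"naive difference:    {naive:.1f}")
print(f"adjusted difference: {adjusted:.1f}")
```

Under these assumptions the naive difference overstates the effect, while the adjusted difference lands close to the assumed value of 5.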