Causal Inference Flashcards
Define Causality.
Causality: The relationship between cause and effect. A cause has an effect if the cause contributes to the production of the effect.
What is determinism?
Give examples of deterministic and non-deterministic causes.
Causes can be deterministic or non-deterministic. A deterministic cause necessarily produces its effect, while a non-deterministic cause makes the effect more likely without guaranteeing it.
Example Deterministic Cause & Effects:
• Putting your hand in a fire will cause it to get burned.
• Typing on a keyboard will input text into a computer.
Example non-deterministic Cause & Effects:
• Paying attention in class leads to better grades.
• Smoking causes cancer.
• Going to a party can lead to catching COVID-19.
Why is causal inference important?
Give examples.
Many important questions, especially in business and science, are causal in nature.
• Shopkeepers may wish to know the impact of prices on sales.
• Governments may wish to know the effect of social distancing on the spread of COVID-19.
What is the difference between a Causal Effect and an Association?
Give examples.
Correlation does not imply causation. Many correlations exist that would imply spurious causations.
• E.g. there is a strong correlation between a country’s chocolate consumption and its number of Nobel laureates.
Eating chocolate does NOT make you more likely to become a Nobel laureate. Instead, a common cause is supposed to explain the correlation: national wealth drives both chocolate consumption and, through education, Nobel prizes:
Chocolate Consumption <- National Wealth -> Standard of Education -> Nobel Prizes
We can say that chocolate consumption is predictive of Nobel laureates, but it does NOT cause them.
Suppose we had two variables, X and Y. The relationship between them is Y = 3X. This means that:
• If we see X go up by one unit, then Y will go up by 3 units.
• If we see Y go up by one unit, then X will have gone up by 1/3 of a unit.
What can we say about the direction of causality?
We can’t say.
If we MANIPULATE X to increase by 1, we cannot actually say how Y will behave, because the equation alone tells us nothing about which variable is the cause and which is the effect.
Data alone is not enough to prove causality; we also need a model, informed by domain knowledge, to infer causation.
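A minimal sketch of why the data cannot settle the direction. The two generating models below are hypothetical (the function names and the choice of Gaussian noise are assumptions, not part of the flashcards). Both produce observations that satisfy Y = 3X exactly, yet they respond differently when X is set by hand.

```python
import random

def sample_x_causes_y(rng, do_x=None):
    # Hypothetical model A: X is the cause and Y := 3 * X.
    x = rng.gauss(0, 1) if do_x is None else do_x
    y = 3 * x
    return x, y

def sample_y_causes_x(rng, do_x=None):
    # Hypothetical model B: Y is the cause and X := Y / 3.
    y = rng.gauss(0, 1)
    # Setting X by hand overrides its equation, so Y is unaffected.
    x = y / 3 if do_x is None else do_x
    return x, y

rng = random.Random(0)
n = 1000

# Seeing: both models produce data on the line Y = 3X.
observed = [sample_x_causes_y(rng) for _ in range(n)]
observed += [sample_y_causes_x(rng) for _ in range(n)]
same_relationship = all(abs(y - 3 * x) < 1e-9 for x, y in observed)

# Doing: fix X = 2 in each model and look at the average Y.
mean_y_a = sum(sample_x_causes_y(rng, do_x=2.0)[1] for _ in range(n)) / n
mean_y_b = sum(sample_y_causes_x(rng, do_x=2.0)[1] for _ in range(n)) / n

print(same_relationship)  # both models look identical in observational data
print(mean_y_a)           # Y follows X in model A
print(mean_y_b)           # Y stays near its usual average in model B
```

Only the assumed model, not the observed data, tells us which of the two responses to expect.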
Define the following terminology: • Graph • Adjacency • Complete • Directed Edge • Parents & Children • Path • Directed Path • Ancestor • Descendant • Cycle • Directed Acyclic Graph (DAG)
- Graph: A collection of nodes and edges.
- Adjacency: Two nodes are adjacent if an edge connects them.
- Complete: A graph is complete if there is an edge between every pair of nodes.
- Directed Edge: An edge which is directional; it goes from one node to another node.
- Parents & Children: Any directed edge goes from a parent node to a child node.
- Path: A sequence of adjacent edges connecting one node to another, regardless of edge direction.
- Directed Path: A path in which every edge points in the direction of travel, from the start node towards the end node.
- Ancestor: A node is an ancestor of another node if a directed path leads from it to that node.
- Descendant: A node is a descendant of another node if a directed path leads from that node to it.
- Cycle: A directed path from a node to itself.
- Directed Acyclic Graph (DAG): A graph with only directed edges and no cycles.
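The terminology above can be sketched in a few lines. The representation is an assumption made for illustration: a directed graph stored as a dictionary mapping each node to a list of its children, with every node present as a key. The helper names are hypothetical.

```python
def parents(graph, node):
    # A parent is any node with a directed edge into `node`.
    return {p for p, children in graph.items() if node in children}

def descendants(graph, node):
    # Every node reachable from `node` along a directed path.
    seen = set()
    stack = list(graph.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, []))
    return seen

def ancestors(graph, node):
    # Every node that has `node` among its descendants.
    return {n for n in graph if node in descendants(graph, n)}

def is_dag(graph):
    # A cycle exists exactly when some node is its own descendant.
    return all(n not in descendants(graph, n) for n in graph)

# Example graph: A -> B -> D and A -> C -> D.
diamond = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
# Example with a cycle: A -> B -> A.
loop = {"A": ["B"], "B": ["A"]}

print(parents(diamond, "D"))
print(ancestors(diamond, "D"))
print(descendants(diamond, "A"))
print(is_dag(diamond), is_dag(loop))
```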
Causes can be both direct and indirect. What is a direct and an indirect cause?
Direct Cause: If Y is a child of X, then X is a direct cause of Y.
Indirect Cause: If Y is a descendant of X and the effect passes through mediating variables M, then X is an indirect cause of Y.
When it comes to investigating a correlation, what is ‘seeing’ vs ‘doing’?
- Conditioning: Narrow our focus to the cases where the conditioning variable takes the value we are interested in (“seeing”).
- Intervention: Fix a variable’s value ourselves (“doing”). This removes the influence of whatever would normally determine that variable.
Health supplements aren’t generally well tested, but those who take supplements are generally healthier.
P(H = True | S = True) > P(H = True | S = False)
What would be the ‘seeing’ vs ‘doing’ investigation here?
• With conditioning we compare the health of people who take supplements with those that don’t.
• With Intervention we would force one group of people to take supplements and force another not to take supplements and then measure their health status.
Conditioning estimates P(H | S), while intervention estimates P(H | do(S)).
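A toy simulation can show how the two quantities come apart. Everything numeric here is an assumption made for illustration: a hypothetical "healthy lifestyle" confounder makes people both more likely to take supplements and more likely to be healthy, and the supplement itself is given no effect at all.

```python
import random

def simulate(n, rng, do_s=None):
    rows = []
    for _ in range(n):
        lifestyle = rng.random() < 0.5  # hypothetical confounder
        if do_s is None:
            # Seeing: people choose supplements, influenced by lifestyle.
            s = rng.random() < (0.8 if lifestyle else 0.2)
        else:
            # Doing: we set S ourselves, cutting the link from lifestyle.
            s = do_s
        # Assumption: health depends on lifestyle only, not on S.
        h = rng.random() < (0.9 if lifestyle else 0.5)
        rows.append((s, h))
    return rows

def p_healthy(rows, s_value):
    matching = [h for s, h in rows if s == s_value]
    return sum(matching) / len(matching)

rng = random.Random(0)
n = 20000

seeing = simulate(n, rng)
seeing_gap = p_healthy(seeing, True) - p_healthy(seeing, False)

forced_on = simulate(n, rng, do_s=True)
forced_off = simulate(n, rng, do_s=False)
doing_gap = p_healthy(forced_on, True) - p_healthy(forced_off, False)

print(round(seeing_gap, 2))  # clearly positive: takers look healthier
print(round(doing_gap, 2))   # close to zero: forcing S changes nothing
```

In this toy world the observed gap is produced entirely by the confounder, which is why only the intervention reveals that the supplement does nothing.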
What are the three types of junction?
Chain
Fork
Collider
What are the dependencies of the following chain:
X -> Y -> Z
- X and Y are dependent (The amount you smoke affects the amount of tar in your lungs)
- Y and Z are dependent (The amount of tar in your lungs affects your chance of developing lung cancer)
- X and Z are dependent (The amount you smoke affects your chance of developing lung cancer)
- X and Z are independent conditional on Y (If you know the condition of someone’s lungs, the amount they smoke is irrelevant to their chance of developing cancer.)
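The chain's dependencies can be checked exactly on a small binary model. The probabilities below are made up for illustration (X = smoker, Y = tar in lungs, Z = cancer), and the joint distribution is built as P(X) P(Y | X) P(Z | Y), which is what the chain X -> Y -> Z implies.

```python
from itertools import product

# Hypothetical probabilities, chosen only for illustration.
p_x = {1: 0.3, 0: 0.7}           # P(X = x)
p_y_given_x = {1: 0.8, 0: 0.1}   # P(Y = 1 | X = x)
p_z_given_y = {1: 0.4, 0: 0.05}  # P(Z = 1 | Y = y)

def joint(x, y, z):
    # Chain factorisation: P(x, y, z) = P(x) * P(y | x) * P(z | y).
    py = p_y_given_x[x] if y == 1 else 1 - p_y_given_x[x]
    pz = p_z_given_y[y] if z == 1 else 1 - p_z_given_y[y]
    return p_x[x] * py * pz

def prob(event, given=lambda x, y, z: True):
    # P(event | given), summing the joint over all eight outcomes.
    outcomes = list(product([0, 1], repeat=3))
    numerator = sum(joint(*o) for o in outcomes if event(*o) and given(*o))
    denominator = sum(joint(*o) for o in outcomes if given(*o))
    return numerator / denominator

def cancer(x, y, z):
    return z == 1

# X and Z are dependent: smokers have a higher cancer probability.
p_z_smoker = prob(cancer, lambda x, y, z: x == 1)
p_z_nonsmoker = prob(cancer, lambda x, y, z: x == 0)

# Conditional on Y, X no longer matters.
p_z_tar_smoker = prob(cancer, lambda x, y, z: y == 1 and x == 1)
p_z_tar_nonsmoker = prob(cancer, lambda x, y, z: y == 1 and x == 0)

print(round(p_z_smoker, 3), round(p_z_nonsmoker, 3))
print(round(p_z_tar_smoker, 3), round(p_z_tar_nonsmoker, 3))
```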
What are the dependencies for the following fork?
Y <- X -> Z
- X and Y are dependent (# of ice creams sold depends on the temperature)
- X and Z are dependent (# of shark attacks depends on the temperature)
- Y and Z are dependent (# of shark attacks is correlated with the number of ice creams sold, because both respond to the temperature)
- Y and Z are independent conditional on X (if you know the temperature, the number of ice creams sold tells you nothing more about the number of shark attacks)
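A rough numerical check of the fork, under assumed linear relationships with Gaussian noise (the coefficients are invented for illustration). Conditioning on X is approximated here by removing a fitted straight line on temperature from both variables and correlating what is left, which is a partial correlation rather than a full test of conditional independence.

```python
import random

def mean(values):
    return sum(values) / len(values)

def correlation(a, b):
    # Pearson correlation of two equal-length lists.
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def residuals(target, predictor):
    # What remains of `target` after removing a least-squares line on `predictor`.
    mt, mp = mean(target), mean(predictor)
    slope = (sum((p - mp) * (t - mt) for p, t in zip(predictor, target))
             / sum((p - mp) ** 2 for p in predictor))
    return [(t - mt) - slope * (p - mp) for p, t in zip(predictor, target)]

rng = random.Random(2)
# Hypothetical fork: temperature (X) drives both other variables.
temperature = [rng.gauss(25, 5) for _ in range(5000)]
ice_creams = [2.0 * t + rng.gauss(0, 5) for t in temperature]     # Y
shark_attacks = [0.5 * t + rng.gauss(0, 2) for t in temperature]  # Z

marginal = correlation(ice_creams, shark_attacks)
given_temperature = correlation(residuals(ice_creams, temperature),
                                residuals(shark_attacks, temperature))

print(round(marginal, 2))           # clearly positive
print(round(given_temperature, 2))  # close to zero
```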
What are the dependencies of the following collider?
X -> Z <- Y
- X and Z are dependent (Your academic ability impacts your university acceptance)
- Y and Z are dependent (Your sporting ability impacts your university acceptance)
- X and Y are independent (Your academic ability has no relation to your sporting ability)
- X and Y are dependent conditional on Z (Knowing someone got into uni tells you they’re either sporty or clever, so learning that they are not sporty makes it more likely that they are clever)
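The collider effect can be reproduced with a deliberately simplified simulation. The 30% base rates and the admission rule (admitted if clever or sporty) are assumptions chosen to mirror the example, not real admissions data.

```python
import random

rng = random.Random(1)
students = []
for _ in range(20000):
    clever = rng.random() < 0.3  # X, hypothetical base rate
    sporty = rng.random() < 0.3  # Y, drawn independently of X
    admitted = clever or sporty  # Z, simplified admission rule
    students.append((clever, sporty, admitted))

def p_clever(rows):
    return sum(clever for clever, _, _ in rows) / len(rows)

# Whole population: sportiness says nothing about cleverness.
p_all_sporty = p_clever([r for r in students if r[1]])
p_all_unsporty = p_clever([r for r in students if not r[1]])

# Admitted students only (conditioning on the collider Z).
p_adm_sporty = p_clever([r for r in students if r[2] and r[1]])
p_adm_unsporty = p_clever([r for r in students if r[2] and not r[1]])

print(round(p_all_sporty, 2), round(p_all_unsporty, 2))  # roughly equal
print(round(p_adm_sporty, 2), p_adm_unsporty)            # very different
```

Among admitted students, anyone who is not sporty must be clever under this rule, so the two abilities become dependent once Z is known.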
Define path blocking.
In a graph, a path is blocked if information cannot flow from the start to the end of the path. This occurs when:
• A chain or a fork has the middle node conditioned upon.
• The path contains a collider, and neither the collision node nor any of its descendants is conditioned on.
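The two blocking rules can be written as a small check over one path. This is a sketch under stated assumptions: the graph is a dictionary mapping each node to the set of its children, the path is a list of nodes, and the function names are hypothetical.

```python
def descendants(children, node):
    # Every node reachable from `node` along directed edges.
    seen, stack = set(), list(children.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children.get(current, ()))
    return seen

def is_blocked(path, children, conditioned):
    conditioned = set(conditioned)
    # Look at each middle node together with its two neighbours on the path.
    for left, middle, right in zip(path, path[1:], path[2:]):
        is_collider = (middle in children.get(left, ())
                       and middle in children.get(right, ()))
        if is_collider:
            # Blocked unless the collider or one of its descendants is conditioned on.
            opened_by = ({middle} | descendants(children, middle)) & conditioned
            if not opened_by:
                return True
        elif middle in conditioned:
            # A chain or fork is blocked when its middle node is conditioned on.
            return True
    return False

chain = {"X": {"Y"}, "Y": {"Z"}, "Z": set()}
fork = {"X": {"Y", "Z"}, "Y": set(), "Z": set()}
collider = {"X": {"Z"}, "Y": {"Z"}, "Z": {"W"}, "W": set()}

print(is_blocked(["X", "Y", "Z"], chain, set()))     # open
print(is_blocked(["X", "Y", "Z"], chain, {"Y"}))     # blocked
print(is_blocked(["Y", "X", "Z"], fork, {"X"}))      # blocked
print(is_blocked(["X", "Z", "Y"], collider, set()))  # blocked
print(is_blocked(["X", "Z", "Y"], collider, {"W"}))  # open via a descendant
```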
Suppose you had the following path:
A -> B -> C
Are there any associations and/or causes here?
We know that A and C are associated. This is to say that there is some relationship between them. Knowing one would help you predict the other.
We also know that A is an indirect cause of C. This means that if A changes, C will respond (through the intermediary of B). However, C does NOT cause A.