Frosi Flashcards
Which are the 3 Information Content levels of analysis?
Correlations: This level involves identifying relationships where variables move together, either in the same or opposite directions. An example given is the correlation between ice cream consumption and crime rates.
Causation: At this level, variables are linked through cause-and-effect relationships, providing actionable information. This type of analysis often involves interventions, randomized controlled trials (RCTs), or A/B tests. An example is the causal relationship between aspirin use and the reduction of migraine pain.
Mechanisms: The most in-depth analysis involves experiments that reveal the underlying mechanisms of observed phenomena, leading to a broader understanding of laws or regularities across different contexts. For instance, it’s not just that aspirin reduces migraines, but the mechanism involves the dilation of blood vessels, which increases blood flow and affects various types of pain.
Which are the 3 types of experiments we can have?
Randomized Experiments: These are controlled, transparent experiments where subjects are randomly assigned to treatment or control groups. This randomization ensures that the treatment is uncorrelated with other potential confounding variables. Examples include clinical trials in life sciences and field experiments in social sciences.
Natural Experiments: These rely on naturally occurring events or treatments that are effectively random, such as natural disasters or policies randomly assigned by institutions. Examples include the cholera outbreak in London and the Vietnam draft lottery.
Quasi-Experiments: These are experiments where the treatment is assigned by near-random processes, but not through an explicit random assignment by the researcher. Examples include changes in institutional rules that affect subjects differently, like U.S. congressional districts where a candidate narrowly wins.
Which are the 3 critical assumptions necessary for the causal inference of treatment effects?
Random Assignment: This assumption ensures that the assignment of individuals to treatment or control groups is independent of any potential outcomes. This helps establish the causality by eliminating selection bias.
Excludability: Also known as the “exclusion restriction,” this assumption posits that only the treatment influences the outcomes, and all other factors are excluded. This assumption can fail if there are confounding factors (other variables that affect the outcome and are associated with the treatment) or asymmetries in measurement (differences in how outcomes are measured across participants).
Non-interference: Also referred to as the Stable Unit Treatment Value Assumption (SUTVA), it means that the treatment of one unit does not affect the outcomes of another unit. There should be no spillover effects or interactions between units that could confound the treatment effect.
Which are the 3 experiments in which the treatment variable D can be considered exogenous?
Recall the 3 assumptions: random assignment, excludability, and non-interference (SUTVA).
A) In a Randomized Experiment, these three assumptions are typically satisfied by design.
B) In Natural or Quasi-Natural Experiments, these assumptions may hold due to the random-like nature of the treatment assignment, such as when using instrumental variables.
C) With Observational Data, these assumptions do not naturally hold, and researchers must employ robust identification strategies (like matching, difference-in-differences, etc.) to claim causality.
1) Which are the 4 types of random assignments?
2) Why is randomization necessary?
3) How do we check for covariate balance?
1) Answer (a code sketch follows the list):
1. Simple randomization: divide the sample in 2 groups based on a straightforward decision rule (head or tail, random number generator)
2. Stratified randomization: divide the sample by strata (e.g., young vs. old) and then randomly allocate individuals into treated and control groups within the strata
3. Paired randomization: pair units together and randomize within the pairs (extreme case of stratification – e.g., 1 control + 1 treatment paired)
4. Clustered randomization: partition the population into clusters (as in stratified) but assign treatment to random clusters rather than to random units within clusters (e.g., classrooms)
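A minimal sketch of the four schemes, assuming simulated data and hypothetical group sizes (numpy only):

```python
# Sketch of the four random-assignment schemes on hypothetical units.
import numpy as np

rng = np.random.default_rng(42)
n = 100
units = np.arange(n)

# 1. Simple randomization: a fair coin flip per unit.
simple = rng.integers(0, 2, size=n)

# 2. Stratified randomization: randomize separately within each stratum (e.g., young vs. old).
strata = rng.integers(0, 2, size=n)           # 0 = young, 1 = old (assumed covariate)
stratified = np.zeros(n, dtype=int)
for s in (0, 1):
    idx = np.where(strata == s)[0]
    treated = rng.choice(idx, size=len(idx) // 2, replace=False)
    stratified[treated] = 1

# 3. Paired randomization: within each pair, one unit is treated and one is control.
pairs = units.reshape(-1, 2)                  # assumes units are already matched into pairs
paired = np.zeros(n, dtype=int)
for pair in pairs:
    paired[rng.choice(pair)] = 1

# 4. Clustered randomization: assign treatment to whole clusters (e.g., classrooms).
clusters = rng.integers(0, 10, size=n)        # 10 hypothetical classrooms
treated_clusters = rng.choice(10, size=5, replace=False)
clustered = np.isin(clusters, treated_clusters).astype(int)
```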
2) Answer:
The randomized assignment of experimental units to conditions ensures that groups are balanced on covariates that could affect the outcome. The two groups should thus have the same average levels of all the other variables we could study: the same average age, the same gender distribution, and so on.
Why do we want balanced groups?
Because our aim is to “isolate” the average effect of the treatment! The only thing that differs between the two groups is whether the units have received the treatment or not
3) Answer:
To check for covariate balance, we usually compare averages of the variables with a t-test.
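A minimal balance-check sketch on simulated data; the covariate names (age, female) and sample size are hypothetical:

```python
# Compare covariate means across arms with a two-sample t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
treated = rng.integers(0, 2, size=200)        # hypothetical assignment indicator
age = rng.normal(35, 10, size=200)            # pre-treatment covariate
female = rng.integers(0, 2, size=200)         # pre-treatment covariate

for name, x in [("age", age), ("female", female)]:
    t, p = stats.ttest_ind(x[treated == 1], x[treated == 0])
    print(f"{name}: t = {t:.2f}, p = {p:.3f}")  # large p-values suggest balance
```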
What are the advantages of a Random Assignment?
- Groups are equal in expectation on relevant variables at pre-test (i.e., before treatment) → run some tests to see whether randomization has worked!
- Alternative causes are not confounded with the treatment
- Confounding variables are unlikely to be correlated with the treatment
- Error terms are uncorrelated with treatment variables: in an experimental design, random assignment is used to distribute both observed and unobserved variables evenly between the treatment and control groups.
- The selection process is known and can be modeled
How can we establish causality with Natural experiments?
List the techniques to create a valid control group
To establish causality, it’s necessary to construct a reasonable control group using methods such as:
* Matching: Pairing subjects in the treatment group with similar subjects in the control group (see the sketch after this list).
* Synthetic Controls: Creating a composite control group that approximates the characteristics of the treatment group.
* Differences-in-Differences (DiD): Comparing the changes in outcomes over time between the treatment and control groups.
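A minimal illustration of the matching idea, using simulated data and a single covariate (one-to-one nearest-neighbour matching; all names and effect sizes are hypothetical):

```python
# One-covariate nearest-neighbour matching sketch on simulated data.
import numpy as np

rng = np.random.default_rng(1)
x_treat = rng.normal(0.5, 1, size=30)        # covariate for treated units
x_ctrl = rng.normal(0.0, 1, size=300)        # covariate for the pool of potential controls
y_treat = 1.0 + x_treat + rng.normal(size=30)
y_ctrl = x_ctrl + rng.normal(size=300)

# For each treated unit, pick the control with the closest covariate value.
matches = np.array([np.argmin(np.abs(x_ctrl - xt)) for xt in x_treat])
effect = np.mean(y_treat - y_ctrl[matches])  # matched estimate of the treatment effect
print(f"Matched estimate: {effect:.2f}")
```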
How can we establish causality with Quasi experiments?
These experiments involve conditions where the assignment is not random. There are two main types of assignment:
* Self-selection: Subjects choose for themselves whether to be in the treatment group (e.g., enrolling in a program).
* Non-random selection: An administrator makes the selection (e.g., a school principal assigning teachers to classes).
Because selection is not random, it’s not guaranteed that the treatment and control groups are equivalent, and the observed effects might have alternative explanations.
To address these issues, researchers need to apply theory and logic, along with econometric techniques, to rule out these alternative explanations and attempt to isolate the effect of the treatment.
Which are the 2 types of validity?
With regard to experiments, there are two main validity types (Shadish et al., 2002):
* Internal Validity: is the effect we find really causal (i.e., Does the observed covariation between the treatment and the outcome result from a causal relationship)?
* External Validity: would the results hold in other settings (i.e., Would the causal relationship between X and Y hold for different people, treatments, outcome measures, and settings)?
Even if we solved Selection Bias with Randomization, which are the 8 other issues that could arise even if we performed a correct randomization?
- History
External events occurring concurrently with the treatment could cause the observed effect.
Example: we run a study to understand consumer preferences within a specific market segment at the time a new product comes out on the market → our results are likely to be influenced by this new entry (even if mitigated by randomization).
- Maturation
Naturally occurring changes over time could be confused with a treatment effect.
Example: people get better over time not because of a treatment (medicine) but because they recover on their own. This is also mitigated by randomization.
- Attrition and Mortality
Loss of respondents to treatment or to measurement can produce artificial effects if this loss is systematically correlated with certain traits.
Example: if one treatment (e.g., a course) is too difficult, we might see a higher dropout rate than in the control group among people with lower pre-entry scores → we might end up overestimating the effect of the course when we analyze the post-course results, since we only have data on the people who stayed in the program!
- Testing
Exposure to a test can affect scores on subsequent exposures to that test, an occurrence that can be confused with a treatment effect.
Example: taking the GMAT multiple times.
- Instrumentation
The nature of a measure may change over time or across conditions in a way that could be confused with a treatment effect.
- Spillovers / Contamination
When the treatment also has side effects on the control group.
- Partial Compliance
Only a fraction of the individuals who were offered the treatment might actually take it or “absorb” it (or some members of the control group might manage to get the treatment).
- The Hawthorne and John Henry Effects
The mere fact of being under evaluation leads subjects to change their behavior.
When can we claim external validity?
To claim that our results have external validity, meaning that they can be generalized to the whole population, we ideally need a random sample from that population
What is the difference between Lab Experiments and Field Experiments?
What are the implications for validity in the 2 cases?
Lab Experiments:
* Characterized by high control over variables with the only variation being the treatment itself.
* Common in natural and life sciences, e.g., experiments with mice, where all conditions are kept constant except for the treatment. In social sciences, participants are often students or people who know they are part of a study.
* Individuals are aware that they are being observed by researchers.
Field Experiments:
* Occur in less controlled but more realistic environments.
* Derived from agricultural sciences but also used in social sciences with participants like legislators, managers, entrepreneurs who continue their daily activities during the study.
* Designed to be as unobtrusive as possible, often with participants unaware they are part of a study.
Implications for Validity:
Internal Validity Concerns: In field experiments, it is challenging to control for all external variables that participants might encounter in their normal environments. This makes it harder to establish a causal link between the treatment and the outcome.
External Validity Concerns: Lab experiments raise questions about how well the results can be generalized to real-world settings since the conditions are highly controlled and may not reflect the complexity of the outside world.
Should we use pre-treatment values of covariates or post-treatment values of covariates?
Ideally, we add pre-treatment values of covariates: we do not want factors that can be affected by the treatment!
What is the Key Assumption behind Difference in Differences?
The key assumption we are making is that treated units would have experienced the same change in mean outcomes over time as that actually observed among the untreated units
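Under this parallel-trends assumption, the DiD estimator recovers the treatment effect from four group means (standard notation, not shown on the original card):

$$
\widehat{\text{DiD}} = \left(\bar{Y}^{\,post}_{treated} - \bar{Y}^{\,pre}_{treated}\right) - \left(\bar{Y}^{\,post}_{control} - \bar{Y}^{\,pre}_{control}\right)
$$

The second bracket estimates the change the treated units would have experienced without the treatment; subtracting it removes common time trends.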
In cases without randomization, it is easy to include controls that help mitigate concerns about omitted variables. Be sure not to include variables that are themselves an outcome of the treatment: ideally, include pre-treatment (stable) characteristics.
How do fixed effects work in DiD?
Fixed effects are nothing more than dummies that we add to our regression when we are in a panel setting (i.e., repeated observations over time for multiple units).
We control for time and individual differences by adding individual and time dummies (see the sketch below).
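A minimal two-way fixed-effects DiD sketch with statsmodels on a simulated panel; the column names (unit, year, treated_post, y) and effect sizes are hypothetical:

```python
# Two-way fixed-effects DiD regression on a simulated panel.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
panel = pd.DataFrame({
    "unit": np.repeat(np.arange(50), 4),
    "year": np.tile([2018, 2019, 2020, 2021], 50),
})
panel["treated"] = (panel["unit"] < 25).astype(int)
panel["post"] = (panel["year"] >= 2020).astype(int)
panel["treated_post"] = panel["treated"] * panel["post"]
panel["y"] = (0.1 * panel["unit"] + 0.05 * panel["year"]
              + 2.0 * panel["treated_post"] + rng.normal(size=len(panel)))

# C(unit) and C(year) add the individual and time dummies (the fixed effects).
fit = smf.ols("y ~ treated_post + C(unit) + C(year)", data=panel).fit()
print(fit.params["treated_post"])   # DiD estimate of the treatment effect
```

In practice, standard errors in such panel regressions are usually clustered at the unit level.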
How can we reduce standard errors of ATE?
- Precisely measure outcomes.
- Conduct pre-post (before and after treatment) experiments rather than post-only experiments, as changes over time (delta scores) usually exhibit less variance (see the identity after this list).
- Increase the number of subjects, especially in groups where high variability is expected.
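The pre-post point rests on a standard variance identity (not from the original card):

$$
\operatorname{Var}(Y_{post} - Y_{pre}) = \operatorname{Var}(Y_{post}) + \operatorname{Var}(Y_{pre}) - 2\operatorname{Cov}(Y_{post}, Y_{pre})
$$

With equal pre and post variances, the change score has lower variance than the post-only outcome whenever the pre-post correlation exceeds 0.5.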
Can we assess the uncertainty around the ATE from a single experiment with one randomization through Randomization Inference?
Yes
What is Randomization Inference and how does it work?
Randomization Inference is the calculation of p-values based on an inventory of datasets coming from a simulation of all possible randomizations.
Randomization Inference allows for testing the sharp null hypothesis, which states that there is no treatment effect for all observations (i.e., the outcome would be the same with or without the treatment).
Simulation of Randomizations: It involves simulating all possible randomizations of the treatment across observations to create a sampling distribution of the Average Treatment Effect (ATE) under the sharp null hypothesis.
Probability Calculation: From the simulated distribution, the probability of obtaining an estimated ATE as large as the observed one can be calculated, assuming that the true treatment effect is zero (ITE_i = 0).
Large Number of Randomizations: When there are many possible ways to randomize, the sampling distribution can be approximated.
Calculation of P-Values: The p-values, which help determine the significance of the results, can be calculated based on this simulated distribution. This process is termed Randomization Inference.
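A minimal randomization-inference sketch on simulated data (the sample size and effect size are hypothetical); when the number of possible assignments is large, a large number of random permutations approximates the full set:

```python
# Permute treatment labels under the sharp null and compare to the observed ATE.
import numpy as np

rng = np.random.default_rng(3)
n = 100
d = np.zeros(n, dtype=int)
d[rng.choice(n, n // 2, replace=False)] = 1       # one observed randomization
y = rng.normal(size=n) + 0.4 * d                  # hypothetical outcomes with a true effect

observed_ate = y[d == 1].mean() - y[d == 0].mean()

n_perm = 10_000                                   # approximate the set of all randomizations
placebo_ates = np.empty(n_perm)
for i in range(n_perm):
    d_perm = rng.permutation(d)                   # re-randomize under the sharp null
    placebo_ates[i] = y[d_perm == 1].mean() - y[d_perm == 0].mean()

p_value = np.mean(np.abs(placebo_ates) >= abs(observed_ate))   # two-sided p-value
print(p_value)
```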
What is a mediation analysis (related to a mechanism, i.e., the step that comes after causal inference)?
Mediation analysis aims to identify the pathways—the mediators—through which the treatment affects the outcome. Thus, understanding the mediators is crucial for unpacking the causal chain from treatment to outcome, and for possibly finding more efficient or targeted interventions based on those mediators.
It focuses on two key questions:
* Did the treatment cause a change in the mediator?
* Did this change in the mediator lead to a change in the outcome?
Example (from a study on sailors):
* Treatment: Lime-based diet
* Mediator: Increase in Vitamin C intake
* Outcome: Decrease in scurvy-induced deaths
How can mediators be analyzed with a regression?
Mediation Components: the regression approach breaks down the total effect of a treatment into two parts: the direct effect and the indirect (mediated) effect. The total treatment effect is the sum of these two components.
- Direct Effect: The effect of the treatment variable on the outcome variable that is not transmitted through the mediator.
- Indirect Effect: The effect that is transmitted from the treatment to the outcome through the mediator.
Regression Equations: a three-equation system is used to quantify these effects (see the sketch after this list):
* The first equation models the mediator as a function of the treatment.
* The second equation models the outcome as a function of the treatment.
* The third equation models the outcome as a function of both the treatment and the mediator.
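A sketch of the three-equation system with statsmodels on simulated data; the variable names (d = treatment, m = mediator, y = outcome) and coefficients are hypothetical:

```python
# Three-equation mediation system on simulated data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 500
d = rng.integers(0, 2, size=n)
m = 0.8 * d + rng.normal(size=n)                  # mediator responds to the treatment
y = 0.3 * d + 0.5 * m + rng.normal(size=n)        # outcome responds to both
df = pd.DataFrame({"d": d, "m": m, "y": y})

eq1 = smf.ols("m ~ d", data=df).fit()             # mediator on treatment
eq2 = smf.ols("y ~ d", data=df).fit()             # outcome on treatment (total effect)
eq3 = smf.ols("y ~ d + m", data=df).fit()         # outcome on treatment and mediator

direct = eq3.params["d"]
indirect = eq1.params["d"] * eq3.params["m"]
print(direct, indirect, eq2.params["d"])          # total effect ≈ direct + indirect
```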
What are the main challenges and considerations in establishing causal mediation?
1) Problem with Regression-Based Analysis: a significant issue with such an analysis is that the mediator variable M_i is often not manipulated independently through a randomized intervention, which raises concerns about the validity of the mediation effect.
* Solution: To address this, manipulate the treatment variable D_i and the mediator M_i through separate random assignments. For example, in a nutritional study, D_i could be the presence or absence of limes in the diet, while M_i could be the presence or absence of vitamin C, regardless of its source.
2) Challenges in Implementation: Despite this theoretical approach, such designs are difficult to implement in real economic settings due to the complexity and multi-dimensional nature of economic behaviors and outcomes.
3) Warnings:
* Specificity of Manipulation: Ensuring that the treatment only manipulates the mediator of interest and not other potential mediators is challenging.
* Random Allocation: Often, mediators are not randomly allocated, which can bias the estimation of mediation effects.
What is a possible solution to the challenges of causal mediation analysis?
The proposed solution is “Implicit Mediation Analysis,” where researchers adjust elements of the treatment to indirectly measure the effects of mediators. Instead of directly testing how changes in the mediator influence the outcome, researchers look at how different aspects of the treatment influence one or more mediators. In other words, the focus is not on studying how a D_i-induced change in M_i influences Y_i, but rather on the relative effectiveness of different classes of treatments whose attributes affect one or more mediators along the way.
Example:
Research Question: how do conditional cash transfers (D_i) impact schooling enrollment (Y_i) of low-income children?
* Conditional cash transfers = government payments to low-income families who agree to keep their children enrolled in schools
* Experimental Evidence: conditional cash transfers (D_i) lead to improved educational outcomes (Y_i) for children in developing economies
Potential Mediators (M_i)?
* Increased cash (M_1i): by providing cash to families, they can invest it in their children’s education
* Increased conditions (M_2i): by conditioning the cash payment on schooling requirements, families exert greater effort in their children’s schooling
These two mediators have been investigated by assigning families to one of 3 experimental groups (see the sketch after this list):
1. Control group: no cash or instructions from the government
2. Treated group 1: cash without conditions
3. Treated group 2: cash with conditions
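A sketch of how the three arms could be compared in a single regression, on simulated data (group labels, effect sizes, and the enrollment variable are hypothetical):

```python
# Three-arm comparison behind implicit mediation analysis, on simulated data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 900
group = rng.choice(["control", "cash_only", "cash_conditional"], size=n)
enrolled = (0.5
            + 0.05 * (group == "cash_only")          # hypothetical effect of cash alone
            + 0.12 * (group == "cash_conditional")   # hypothetical effect of cash plus conditions
            + rng.normal(0, 0.2, size=n))
df = pd.DataFrame({"group": group, "enrolled": enrolled})

# Comparing each treated arm with the control, and the two arms with each other,
# isolates the incremental contribution of the conditionality mediator over cash alone.
fit = smf.ols("enrolled ~ C(group, Treatment(reference='control'))", data=df).fit()
print(fit.params)
```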
Why is it difficult to establish causal mediation in social sciences?
Because it is difficult (if not impossible) to independently allocate both the treatment and the mediator to units.